Elasticsearch: How to delete log entries older than X days with Curator

Tagged curator, elasticsearch  Languages bash, cron
# Install curator
pip install elasticsearch-curator
# Download curator config file
curl -o curator.yml https://raw.githubusercontent.com/elastic/curator/master/examples/curator.yml

Next, download, read, and edit the action file: https://www.elastic.co/guide/en/elasticsearch/client/curator/current/actionfile.html
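For reference, a minimal action file for this use case could look like the sketch below (the logstash- index prefix, the date pattern in the index name, and the 30-day cutoff are assumptions; adjust them to your own indices and retention policy):

# remove-old-data.yml - delete indices older than 30 days (sketch)
actions:
  1:
    action: delete_indices
    description: "Delete indices older than 30 days, based on the date in the index name"
    options:
      ignore_empty_list: True
      disable_action: False
    filters:
    - filtertype: pattern
      kind: prefix
      value: logstash-
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 30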

# Run curator
curator --config curator.yml action_file.yml

Add this to crontab:

# Run curator at 00:01
01 00 * * * /usr/local/bin/curator --config /etc/elasticsearch/curator/curator.yml /etc/elasticsearch/curator/remove-old-data.yml >> /var/log/elasticsearch-curator.log

Tested with Elasticsearch 6.0 and curator version 5.4.1.

GDPR

Tagged encryption, eu, gdpr, pseudonymization, masking  Languages 
  • In order to be able to demonstrate compliance with the GDPR, the data controller should implement measures which meet the principles of data protection by design and data protection by default.

  • Privacy by Design and by Default (Article 25) require that data protection measures are designed into the development of business processes for products and services. Such measures include pseudonymising personal data, by the controller, as soon as possible (Recital 78).

  • Although the GDPR encourages the use of pseudonymisation to “reduce risks to the data subjects,” (Recital 28) pseudonymised data is still considered personal data (Recital 26) and therefore remains covered by the GDPR.

  • However, the notice to data subjects is not required if the data controller has implemented appropriate technical and organizational protection measures that render the personal data unintelligible to any person who is not authorized to access it, such as encryption (Article 34).

  • Records of processing activities must be maintained that include the purposes of the processing, the categories involved, and the envisaged time limits. These records must be made available to the supervisory authority on request (Article 30).

https://en.wikipedia.org/wiki/General_Data_Protection_Regulation

  • Pseudonymization is a central feature of “data protection by design.”

  • Companies that encrypt their personal data also gain the advantage of not having to notify data subjects in the case of a breach. (They still, though, would have to notify the local DPA.)

  • Under Article 32, controllers are required to implement risk-based measures for protecting data security. One such measure is the “pseudonymization and encryption of personal data”

  • The GDPR addresses the first concern in Recital 75, which instructs controllers to implement appropriate safeguards to prevent the “unauthorized reversal of pseudonymization.” To mitigate the risk, controllers should have in place appropriate technical (e.g., encryption, hashing or tokenization) and organizational (e.g., agreements, policies, privacy by design) measures separating pseudonymous data from an identification key.

https://iapp.org/news/a/top-10-operational-impacts-of-the-gdpr-part-1-data-security-and-breach-notification/

  • In this regard, the GDPR expressly says that businesses should consider implementing “as appropriate … the pseudonymisation and encryption of personal data.” While the law stops short of telling businesses they must implement pseudonymisation, the express reference to pseudonymisation in the security provisions of the GDPR is highly significant – indicating that, in the event of a security breach, regulators will take into consideration whether or not a business had implemented pseudonymisation technologies. Businesses that have not may therefore find themselves more exposed to regulatory action.

  • If a data breach presents low risk to the individuals concerned, the GDPR’s breach notification requirements become more relaxed. Pseudonymisation, whether through masking, hashing or encryption, offers a clear means to reduce the risks to individuals arising from a data breach (e.g. by reducing the likelihood of identity fraud and other forms of data misuse), and is supported by the GDPR as a security measure as already described above.

http://www.directcommercemagazine.com/UserContent/doc/12624/delphix%20gdpr%20for%20data%20masking.pdf

Tools

  • Data masking

https://www.mssqltips.com/sqlservertip/3091/masking-personal-identifiable-sql-server-data/

  • Data encryption
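As a minimal illustration of data masking with SQL (the customers table and its columns are hypothetical), personally identifiable columns can be replaced with hashed values so the rows remain usable for testing and reporting while no longer identifying a person directly:

-- Hypothetical example: pseudonymize PII columns by hashing them (PostgreSQL)
UPDATE customers
SET email     = md5(email) || '@masked.invalid',
    last_name = md5(last_name);

Note that, as described above, such pseudonymized data is still considered personal data as long as a key or the original data exists that can re-identify the person.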

PostgreSQL 10 With Streaming Replication and PITR

Tagged pitr, postgresql, replication, standby, streaming  Languages bash, sql

Notes on how to configure streaming replication and PITR with PostgreSQL 10.

Goal

  • Master and slave (warm standby)

    • Master database backed up to slave server via streaming replication
    • WAL files copied from master to a network drive
  • Point-in-time-recovery (PITR), e.g., if someone deletes a table by mistake
  • Recovery possible even if both master and slave are lost
  • Daily and weekly backups available on a network drive
  • WAL files available on a network drive
  • Network drive backed up

Steps

  1. Install postgres 10 on master and slave
  2. Configure the master

    • wal_level=replica, archive_mode=on, archive_command, wal_keep_segments (optional when using replication slots), etc. in postgresql.conf (see the sketch after this step)
    • archive_command should copy WAL files to a shared network drive (or to the slave) for additional redundancy
    • create a replication slot for each slave, so that WAL files are retained long enough for the slaves to receive them:
SELECT * FROM pg_create_physical_replication_slot('pg-slave-1');
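A minimal sketch of the master configuration (the archive directory /mnt/pgarchive and the 10.0.0.0/24 network are assumptions; the replicator user matches the pg_basebackup command further down):

# postgresql.conf on the master (sketch)
listen_addresses = '*'
wal_level = replica
archive_mode = on
archive_command = 'test ! -f /mnt/pgarchive/%f && cp %p /mnt/pgarchive/%f'
max_wal_senders = 10
#wal_keep_segments = 64   # optional safety margin when using replication slots

# pg_hba.conf on the master: allow the replication user to connect
host  replication  replicator  10.0.0.0/24  md5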
  3. Configure the slave

    • hot_standby=on, etc. in postgresql.conf (keep the slave configuration as identical as possible to the master)
    • primary_slot_name='pg-slave-1', standby_mode=on, restore_command, primary_conninfo, and trigger_file in recovery.conf (see the sketch below)
    • restore_command should read the WAL files that the master's archive_command copied to the network drive
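A minimal recovery.conf sketch for the slave (the master address and the password are assumptions; the trigger file matches the promotion step in the Scenarios section below):

# recovery.conf on the slave (sketch)
standby_mode = 'on'
primary_conninfo = 'host=10.0.0.1 port=5432 user=replicator password=secret'
primary_slot_name = 'pg-slave-1'
restore_command = 'cp /mnt/pgarchive/%f %p'
trigger_file = '/tmp/promote-slave-to-master.trigger'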
  4. Start primary server, stop the slave

Make sure a process supervisor such as monit does not immediately restart the slave.

  5. Copy the master database to the slave with pg_basebackup

Make sure the slave’s data directory is empty:

# Check tablespace locations before wiping the data directory
psql -c '\db'
# Stop PostgreSQL on the slave and empty its data directory
sudo /etc/init.d/postgresql stop
sudo rm -rf /var/lib/postgresql/10/main
sudo mkdir /var/lib/postgresql/10/main
# Stream a base backup from the master (10.0.0.1) as the replication user
sudo pg_basebackup -h 10.0.0.1 -D /var/lib/postgresql/10/main -P -U replicator -X stream -W
# Restore the ownership and permissions PostgreSQL expects
sudo chown -R postgres:postgres /var/lib/postgresql/10/main
sudo chmod -R 0700 /var/lib/postgresql/10/main
  6. Start slave
sudo service postgresql start
  7. Set up daily backups

Configure daily backups of PostgreSQL data to a network drive.
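For example, a nightly dump from the postgres user's crontab to the network drive (the mount point and schedule are assumptions; a nightly pg_basebackup is an alternative):

# Dump all databases at 02:30, keeping one file per weekday
30 02 * * * pg_dumpall | gzip > /mnt/pgbackup/daily/pg_dumpall_$(date +\%u).sql.gz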

  8. Back up the backup

Configure daily and weekly backups of network drive.

  9. Check replication status

On master:

select pg_current_wal_lsn();

On slave:

select pg_last_wal_replay_lsn();

Both values should match.
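The replication state can also be inspected on the master, which shows the connected standbys and how far they lag behind:

select client_addr, state, sent_lsn, replay_lsn from pg_stat_replication;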

Scenarios

  • Master server is killed

Promote the slave to master by creating the trigger file configured in recovery.conf: touch /tmp/promote-slave-to-master.trigger
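Alternatively, the standby can be promoted with pg_ctl (the Debian/Ubuntu binary path below is an assumption):

sudo -u postgres /usr/lib/postgresql/10/bin/pg_ctl promote -D /var/lib/postgresql/10/main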

  • Master server is killed, slave replication has fallen behind master

Restore slave from the WAL files located on the network drive. Or use a daily backup plus the WAL files if replication has fallen behind too much.

  • Master server and slave are killed

Restore the database from a daily backup and WAL files located on the network drive.

  • Master server, slave, and network drive are killed

Restore the database from a daily backup and the WAL files located on another network drive.

  • “drop database xxx” was run by mistake

Restore the database with PITR. For example, set recovery_target_time = '2017-06-06 06:06:06' in recovery.conf.
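A minimal recovery.conf sketch for PITR on a server restored from a daily base backup (the archive path and target time are assumptions):

# recovery.conf for point-in-time recovery (sketch)
restore_command = 'cp /mnt/pgarchive/%f %p'
recovery_target_time = '2017-06-06 06:06:06'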

  • Additional slaves are needed

Configure the new slave. Remember to create a new replication slot.

  • Slave is removed permanently

Delete the replication slot, or WAL files will keep accumulating on the master until its disk is full.
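To drop the slot on the master:

SELECT pg_drop_replication_slot('pg-slave-1');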

Troubleshooting

  • Replication not working

Is recovery.conf in the data directory? Was it there before the server was started?

Ansible: How to find the IP address of a specific network interface when there are multiple network interfaces

Tagged ansible, ip address, network  Languages yaml

Here’s how to find the IP address of a specific network interface that matches a given CIDR:

- hosts: all
  gather_facts: true # ansible_all_ipv4_addresses is a gathered fact
  tasks:
    - set_fact:
        prod_ip_addr: "{{ item }}"
      when: "item | ipaddr('192.168.10.0/24')"
      with_items: "{{ ansible_all_ipv4_addresses }}"
    - debug: var=prod_ip_addr

This is useful, for example, when you have a separate management network interface. Note that the ipaddr filter requires the netaddr Python package on the control machine.

Writing this in Ansible is almost as easy as explaining what it does in plain English.

How to debug Ansible variables

Tagged ansible, debug, hostvars, variables  Languages bash, yaml

Print all variables for all hosts from the command line:

 $ ansible -i inventory/local -m debug -a "var=hostvars" all

Replace hostvars with any of the following to print:

  • ansible_local
  • groups
  • group_names
  • environment
  • vars
  • ansible_sucks

Print all variables for all hosts from a playbook:

- hosts: all
  tasks:
    - debug:
        var: hostvars[inventory_hostname]
        # -vvv to debug !!!!
        # verbosity: 4

Print local facts (ansible_local):

- name: print ansible_local
  debug: var=ansible_local

Nim web application example

Tagged example, jester, nim  Languages bash, nim
brew install nim
nimble install jester
vim hello.nim
nim compile --run hello.nim
import jester, asyncdispatch, strutils, asyncnet, htmlgen, logging
#
# * CLI *
#
# brew install nim
# nimble install jester
# vim hello.nim
# nim compile --run hello.nim
#
# * DOCS *
#
# https://nim-lang.org/docs/htmlgen.html
# https://learnxinyminutes.com/docs/nim/
#
var L = newConsoleLogger()
addHandler(L)

settings:
  port = Port(3000)
  appName = "/"
  bindAddr = "127.0.0.1"

proc layout(content: string): string =
  htmlgen.html(htmlgen.body(content))

routes:
  get "/":
    logging.debug("render form $1 $2" % [$status, $headers])
    var content = `div`(
        h1(a(href="http://nim-lang.org", title="Hello", "Hello Nim")),
        """
        <form name="input" action="$1" method="post">
          <input type="text" name="first_name" value="" placeholder="First name" />
          <input type="text" name="last_name" value="" placeholder="Last name" />
          <input type="submit" value="Submit" />
        </form>
        """ % [uri("/", absolute = false)]
        )
    body.add(layout(content))
    status = Http200

  post "/":
    logging.debug("process form $1 $2" % [$status, $headers])
    # TODO: How can we sanitize submitted form data?
    var content = `div`(
        h1(
          a(href="http://nim-lang.org", title="Hello", "Hello $1 $2" % [$request.params["first_name"], $request.params["last_name"]])
        ),
        h4("Form data"),
        p($request.params),
    )
    body.add(layout(content))
    status = Http200

runForever()

SQL for generating a report for each month of the year using PostgreSQL's crosstab

Tagged crosstab, pivot, postgresql, year  Languages sql

This is the report we want to generate:

┌──────────────────────────┬───────┬───────┬───────┬───────┬───────┬───────┬───────┬───────┬───────┬───────┬─────┬─────┐
│           key            │  jan  │  feb  │  mar  │  apr  │  may  │  jun  │  jul  │  aug  │  sep  │  oct  │ nov │ dec │
├──────────────────────────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼─────┼─────┤
│ Christian                │ 4209  │ 3627  │ 3686  │ 3109  │ 3605  │ 3506  │ 2892  │ 3380  │ 3262  │ 1821  │ ¤   │ ¤   │
│ Barney                   │ 17188 │ 17139 │ 16622 │ 14096 │ 17302 │ 17063 │ 13372 │ 16277 │ 16672 │ 9263  │ ¤   │ ¤   │
│ Donald                   │ 16078 │ 14627 │ 16518 │ 14241 │ 16397 │ 16655 │ 15739 │ 17639 │ 16178 │ 9588  │ ¤   │ ¤   │
│ Duck                     │ 9369  │ 9099  │ 10640 │ 9184  │ 10489 │ 10332 │ 9711  │ 11108 │ 10405 │ 6338  │ ¤   │ ¤   │
│ Jebus                    │ 17774 │ 16433 │ 18502 │ 15877 │ 17918 │ 17411 │ 15900 │ 18175 │ 17149 │ 10141 │ ¤   │ ¤   │
└──────────────────────────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴─────┴─────┘

You need the crosstab function which can be found in the tablefunc extension:

CREATE EXTENSION tablefunc;

Now generate the report using the crosstab function:

SELECT * 
FROM CROSSTAB(
  'SELECT key, month, SUM(value) FROM people_statistics WHERE month >= ''2017-01-01'' GROUP BY key, month ORDER BY key',
  'SELECT (DATE ''2017-01-01'' + (INTERVAL ''1'' month * generate_series(0,11)))::date')
AS
  ct_result (key text, jan bigint, feb bigint, mar bigint, apr bigint, may bigint, jun bigint, jul bigint, aug bigint, sep bigint, oct bigint, nov bigint, dec bigint);

We used the crosstab function that accepts two SQL queries as arguments. The first argument generates the rows for the query:

  • Column 1 is the key or identifier for the data, e.g., person name (Christian)
  • Column 2 contains the categories that will be used to pivot the data, e.g., the month as a date (2017-01-01)
  • Column 3 is the value that will be displayed, e.g., number of people (12)

The second argument generates the categories which in this example are the months of the year.
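For reference, a source table that fits the first query could look like this (the table and column definitions are assumptions for illustration; the month column holds the first day of each month so it matches the categories generated by the second query):

-- Hypothetical source table for the crosstab query above
CREATE TABLE people_statistics (
  key   text   NOT NULL,  -- the row identifier, e.g., a person's name
  month date   NOT NULL,  -- first day of the month, e.g., '2017-01-01'
  value bigint NOT NULL   -- the number being summed per key and month
);

INSERT INTO people_statistics (key, month, value)
VALUES ('Christian', '2017-01-01', 4209);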

Pattern matching order in Elixir

Tagged elixir, order, pattern matching  Languages elixir
defmodule Greeter do
  def hello(name) do
    "Hello #{name}"
  end
  # Never reached: the generic clause above matches first, and the
  # compiler warns that this clause will never match.
  def hello(:jane), do: "Hello Jane!!!!!"
end

Greeter.hello(:jane)  # Returns "Hello jane"
Greeter.hello(:janet) # Returns "Hello janet"

defmodule Greeter do
  def hello(:jane), do: "Hello Jane!!!!!"
  def hello(name) do
    "Hello #{name}"
  end
end

Greeter.hello(:jane)  # Returns "Hello Jane!!!!!"
Greeter.hello(:janet) # Returns "Hello janet"

Ecto query using left join, group by, order by, and count

Tagged count, ecto, group_by, left_join, order_by  Languages elixir

This is an example of an Ecto query that uses a left join, group by, order by, and count to produce a count of associated records for a list:

query = from list in List,
  left_join: subscriber in assoc(list, :subscribers),
  group_by: list.id,
  order_by: [asc: :name],
  select_merge: %{ subscriber_count: count(subscriber.id) }
query |> Repo.all

Remember to add a virtual attribute named subscriber_count:

schema "lists" do
  ...
  field :subscriber_count, :integer, virtual: true

Tested with Ecto 2.2.

Copy-to-clipboard with plain Javascript

Tagged clipboard, copy, javascript  Languages javascript
var Copy2Clipboard = {
  init: function(selector) {
    var btns = document.querySelectorAll(selector);
    for (var i = 0, len = btns.length; i < len; i++) {
      var btn = btns[i]
      btn.addEventListener('click', function(event) {
        var btn = event.target
        try {
          console.debug("click")
          var textarea = document.getElementById(btn.getAttribute('data-target'))
          if (textarea == null) {
            alert("copy-to-clipboard target is undefined")
            return
          }
          textarea.select();
          var successful = document.execCommand && document.execCommand('copy')
          if (successful) {
            btn.innerHTML = 'Copied...'
          } else {
            alert("Press Ctrl+C or Cmd+C to copy")
          }
        } catch (err) {
          console.log('Oops, unable to copy')
        }
      })
    }
  }
}

Copy2Clipboard.init('.copy-to-clipboard')
textarea id="embed-code">
  This will be copied to the clipboard.
/textarea>