Adding columns is super easy (and performant) in postgres 11

Databases are generally good at adding column metadata into the data dictionary – the performance issues normally come from adding NOT NULL or a default value, because a back population of the existing rows needs to happen.

This can be problematic when tables have billions of rows and an exclusive lock needs to be taken out. Luckily, Postgres 11 made some underlying changes that allow this previously difficult task to happen instantly.
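For context, before version 11 the usual workaround was to split the change into steps so the back population could be controlled – roughly the pattern below (a sketch only, using the same example table as the timings further down; the UPDATE would normally be batched on a table this size):

alter table stock_performance_by_version add column mos int; --metadata-only change
update stock_performance_by_version set mos = 0 where mos is null; --back population (batch this in practice)
alter table stock_performance_by_version alter column mos set default 0;
alter table stock_performance_by_version alter column mos set not null; --still scans the table to validate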

Someone has done a great job of explaining the difference for version 11 here – https://brandur.org/postgres-default

My testing showed the following results on a 400 million row table (200GB):

alter table stock_performance_by_version add column mos int;
--81 msec
alter table stock_performance_by_version add column mosdefaultint int default 0;
--87 msec
alter table stock_performance_by_version add column mosdefaultnotnull int NOT NULL default 0;
--104 msec
alter table stock_performance_by_version add column mosdefaultstring varchar NOT NULL default 'hello';
--126 msec

If a table column is added with NOT NULL and a default value, the value is now stored in pg_attribute and referenced back when the original data is selected from the table. To the end user, it just looks like they are selecting columns from tables, but behind the scenes the db engine is joining the “missing attributes” back to the table.


The underlying metadata for the above can be seen in pg_attribute:

SELECT c.oid,
  n.nspname,
  c.relname, t.*
FROM pg_catalog.pg_class c
LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
CROSS JOIN LATERAL (
  SELECT a.attname,
    pg_catalog.format_type(a.atttypid, a.atttypmod),
    (
      SELECT substring(pg_catalog.pg_get_expr(d.adbin, d.adrelid) for 128)
      FROM pg_catalog.pg_attrdef d
      WHERE d.adrelid = a.attrelid
        AND d.adnum = a.attnum
        AND a.atthasdef
    ),
    a.attnotnull, a.attnum,
    (
      SELECT c.collname
      FROM
        pg_catalog.pg_collation c,
        pg_catalog.pg_type t
      WHERE c.oid = a.attcollation
        AND t.oid = a.atttypid
        AND a.attcollation <> t.typcollation
    ) AS attcollation,
	a.atthasmissing,
	a.attmissingval
  FROM pg_catalog.pg_attribute a
  WHERE a.attrelid = c.oid
    AND a.attnum > 0
	AND a.attmissingval is not null --TO ONLY SEE MISSING ATTRIBUTES
    AND NOT a.attisdropped
) AS t
WHERE n.nspname ~ '^(public)$'  -- YOUR SCHEMA HERE
AND relname = 'stock_performance_by_version'
AND pg_catalog.pg_table_is_visible(c.oid);


Great Feature!

Setting up SOPS (Secrets OPerationS)

We use docker-compose to generate terraform plans locally before committing code and triggering pipelines. The code requires some encrypted variables, which can be decrypted locally using sops.

SOPS (Secrets OPerationS) is an editor of encrypted files that supports YAML, JSON, ENV, INI and BINARY formats and encrypts with AWS KMS, GCP KMS, Azure Key Vault and PGP.

GitHub – mozilla/sops

brew install sops
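As a rough example, encrypting a config file against a GCP KMS key looks something like this (the key path below is a placeholder, not the one used in this post):

sops --encrypt --gcp-kms projects/my-project/locations/global/keyRings/my-ring/cryptoKeys/my-key testing.config > testing.config.sops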

Before you can use sops, you need to authorise Application Default Credentials.

If you are not authenticated, this is what happens when you try to source a sops file:

sops -d testing.config.sops

Failed to get the data key required to decrypt the SOPS file.

Group 0: FAILED
projects/project-name-here/locations/global/path/terraform: FAILED
- | Cannot create GCP KMS service: google: could not find
| default credentials. See
| https://developers.google.com/accounts/docs/application-default-credentials
| for more information.

Recovery failed because no master key was able to decrypt the file. In
order for SOPS to recover the file, at least one key has to be successful,
but none were.

To authorise a login:

gcloud auth application-default login

Which then allows you to source the decrypted keys locally:

#Bash
eval $(sops -d testing.config.sops)

#zsh
source <(sops -d testing.config.sops)

Essentially, all you are doing with sops is setting up your local machine with the correct decrypted keys.
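For example, a decrypted file might contain nothing more than shell exports like these (the variable names are made up for illustration), which is why running eval/source on the sops output drops them straight into your shell environment:

export TF_VAR_db_password=not-a-real-secret
export TF_VAR_api_token=dummy-token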

More about Application Default Credentials

https://cloud.google.com/sdk/gcloud/reference/auth/application-default/

Application Default Credentials (ADC) provide a method to get credentials used in calling Google APIs. The gcloud auth application-default command group allows you to manage active credentials on your machine that are used for local application development.

These credentials are only used by Google client libraries in your own application.


Return pods running on nodes using external node IP

Some of our databases sit in managed environments which means connections from our applications can show up in the database logs as coming from an external IP at the edge of our cloud infrastructure.

I wrote a little script which returns the pods running on a kubernetes cluster node when you specify its external IP.

kubectl get pods -o wide --sort-by="{.status.phase}" --all-namespaces | grep `kubectl get nodes -o wide | grep 35.X.X.X | awk '{print $1}' | awk '{$1=$1;print}'`

I've found it quite useful when starting to diagnose where database connections are being initiated from.
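If you would rather skip the grep, kubectl can also filter pods by node name directly once the IP has been resolved to a node – a sketch using the same placeholder IP as above:

NODE=$(kubectl get nodes -o wide | grep 35.X.X.X | awk '{print $1}')
kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName="$NODE"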

Group IP address counts from a log

A colleague wrote a great little script to do some analysis against an Oracle listener log and to show how many listener connections were being established from each unique IP in the log.

cd /u01/app/oracle/diag/tnslsnr/node_name/listener_qa/trace
grep DB_DEFAULT_SERVICE listener_qa.01-Nov-2018-W44.log | grep -oE "[0-9]{1,3}[.][0-9]{1,3}[.][0-9]{1,3}[.][0-9]{1,3}" | sort | uniq -c

The output looks something like this:

20277 172.0.0.1
37176 172.0.0.4
18    172.1.1.1
2300  172.2.3.10
2300  172.2.3.8
96    172.2.3.4
1012  172.2.3.7
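To see the busiest sources first, the same pipeline can be pushed through a numeric sort on the count:

grep DB_DEFAULT_SERVICE listener_qa.01-Nov-2018-W44.log | grep -oE "[0-9]{1,3}[.][0-9]{1,3}[.][0-9]{1,3}[.][0-9]{1,3}" | sort | uniq -c | sort -rn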


Using Cloudsqlproxy to initiate connection to mysql from the shell

The Cloud SQL Proxy is a really useful way of managing connectivity to Cloud SQL instances. It provides secure access to your Cloud SQL Second Generation instances without having to whitelist IP addresses or configure SSL.

Accessing Cloud SQL instances using the Cloud SQL Proxy offers these advantages:

  • Secure connections: The proxy automatically encrypts traffic to and from the database using TLS 1.2 with a 128-bit AES cipher; SSL certificates are used to verify client and server identities.
  • Easier connection management: The proxy handles authentication with Cloud SQL, removing the need to provide static IP addresses.

There are a few prerequisites before this will work:

  • Only Second Generation instances can benefit from this feature.
  • The Cloud SQL Admin API must be enabled.
  • The instance must have either a public IPv4 address or be configured to use a private IP.
  • The proxy needs to be given GCP authentication credentials, either from an individual account or a service account.

Full details of exactly what to do are here – https://cloud.google.com/sql/docs/mysql/sql-proxy 

Some notes on what I did on macOS sierra:

  1. Created a Service Account, gave it the Cloud SQL Client role and then generated a new private key (JSON key).
  2. Downloaded the proxy:

curl -o cloud_sql_proxy https://dl.google.com/cloudsql/cloud_sql_proxy.darwin.amd64

  3. Gave execute permissions on the downloaded binary (the darwin.amd64 download is a plain binary, so there is no dmg to unpack):

chmod +x cloud_sql_proxy
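Step 1 can also be scripted with gcloud – a rough sketch, where the project name is a placeholder and the service account name simply matches the key file used below:

gcloud iam service-accounts create cloudsqlproxy-env
gcloud projects add-iam-policy-binding my-project \
  --member="serviceAccount:cloudsqlproxy-env@my-project.iam.gserviceaccount.com" \
  --role="roles/cloudsql.client"
gcloud iam service-accounts keys create /Users/dbamohsin/sslkeys/cloudsqlproxy-env.json \
  --iam-account=cloudsqlproxy-env@my-project.iam.gserviceaccount.com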

Then start the proxy:

./cloud_sql_proxy -instances=project:region:instance=tcp:3306 \
-credential_file=/Users/dbamohsin/sslkeys/cloudsqlproxy-env.json

The proxy will now listen on your local machine for connection attempts against port 3306:

2018/11/09 11:53:10 Listening on 127.0.0.1:3306 for project:region:instance
2018/11/09 11:53:10 Ready for new connections

Then in another terminal window, connect as normal to a mysql instance using the localhost IP:

mysqlsh -udbamohsin -p -h127.0.0.1

This should make a connection to the Cloud SQL instance via the proxy, and the connection will show in the proxy log:

2018/11/09 11:53:42 New connection for "project:region:instance"

Deleting files older than X days

A couple of different ways to purge files older than a certain number of days.

find -type f -mtime +350 -name 'CopyCluster*' -execdir rm -- '{}' \;

If you want to be precise about the cut-off point, you can calculate the number of days between now and a given date:

days=$(( ( $(date '+%s') - $(date -d '1 months ago' '+%s') ) / 86400 ))
echo $days

and then use the variable in the delete:

find /logs/CopyCluster* -mtime +$days -type f -delete

To do it all in one command:

find /tmp/*.log -type f \
-mtime "+$(( ( $(date '+%s') - $(date -d '1 months ago' '+%s') ) / 86400 ))" -delete
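Since -delete is destructive, it is worth running the same find with -print first to sanity-check exactly what will match:

find /tmp/*.log -type f \
-mtime "+$(( ( $(date '+%s') - $(date -d '1 months ago' '+%s') ) / 86400 ))" -print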


Adding a new project into kubectl/kubectx cli

I use two main command line tools when it comes to kubernetes – kubectx and kubectl

Kubectx is a great little utility for switching between kubernetes clusters, and kubectl is the official utility to manage and deploy kubernetes applications.

Adding a context into kubectx/kubectl

Go to the Google Cloud console, navigate to Kubernetes Engine and click the Connect button on the cluster. This provides the gcloud command you need to run on your machine to configure access to the cluster locally.
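The command it gives you looks something like this (the zone and project below are placeholders):

gcloud container clusters get-credentials gke-dbemohsin-cluster-sandbox --zone europe-west1-b --project my-gcp-project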

As an example, let's say we have just added gke-dbemohsin-cluster-sandbox:

MYMAC:cloud-architecture dbamohsin$ kubectx
DEV
PREPROD
PROD
TESTING
gke-dbemohsin-cluster-sandbox

To rename a context in kubectx:

kubectx SANDBOX=gke-dbemohsin-cluster-sandbox
Context "gke-dbemohsin-cluster-sandbox" renamed to "SANDBOX".

To add a wrapper alias around context switching, especially if some contexts run under different gcloud accounts, you can add the commands to your bash_profile:

alias k8-sandbox='gcloud config configurations activate dbemohsin; kubectx SANDBOX'

and to call the alias:

MYMAC:~ dbamohsin$ k8-sandbox
Switched to context "SANDBOX".