Correlating mongodb connections to application threads

We are currently in the process of making sure all of our applications that connect to MongoDB Atlas are using connection pooling. There are several benefits to managing connections effectively and efficiently, the main one being that connections are recycled, reducing resource overhead.

In mongodb, each connection costs around 1MB of memory, so connections can quickly add up and eat into valuable RAM that could otherwise be used for cache activity.

In a managed service world, everything is based around limitations and tiering on core components such as CPU, IOPS, and disk space – and, from a database perspective, connections.

The number of connections allowed on a mongodb cluster correlates directly to the instance size. For example:

Instance Size    Connection Limit
M10              350
M20              700
M30              2000
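Since each connection costs roughly 1MB, these limits also put a ceiling on the worst-case RAM overhead of a fully saturated cluster; a quick sketch:

```python
# Connection limits from the table above; each open connection costs
# roughly 1MB of RAM on the mongod side (approximate figure).
connection_limits = {"M10": 350, "M20": 700, "M30": 2000}
cost_per_connection_mb = 1

for tier, limit in connection_limits.items():
    overhead_mb = limit * cost_per_connection_mb
    print(f"{tier}: up to ~{overhead_mb} MB of RAM spent on connections")
```

On an M30 that is roughly 2GB in the worst case – memory that is not available for cache activity.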

To see all the connection limits, see the Atlas documentation –

Connection limits mattered less to a DBA in an on-prem world, as you would set the ulimit settings for open files and processes/threads to 64000 as per the mongodb recommendations. However, it becomes extremely critical when you have an M10 that only allows 350 connections, of which around 5-10% are taken up by mongo system processes.

Analysing mongodb logs for connections

I use a little app called mtools, developed by Thomas Rückstieß, who works at mongodb. It is a collection of helper scripts to parse, filter, and visualize MongoDB log files (mongod/mongos).

You can pick it up here –

The setup is straightforward and you can quickly start seeing how many connections are being opened and closed grouped by IPs.

mloginfo mongod.log --connections
     source: core-prod-vms-scaled.log
     host: unknown
     start: 2018 Dec 31 10:56:08.404
     end: 2018 Dec 31 13:25:40.320
date format: iso8601-local
     length: 2714
     binary: unknown
     version: >= 3.0 (iso8601 format, level, component)
     storage: unknown

     total opened: 155
     total closed: 143
     no unique IPs: 4
     socket exceptions: 0

35.X.X.1    opened: 55        closed: 55
35.X.X.2    opened: 49        closed: 49
35.X.X.3    opened: 39        closed: 39
35.X.X.4    opened: 12        closed: 0

Correlating open connections against an app server

If we take the example output above and use the 35.X.X.4 IP – we can see that it has sent 12 incoming connections to mongo. The best way I've found to see established connections on an app server is to use netstat.

netstat -anp | grep ESTABLISHED | grep ":27017" | grep " 172." | awk '{print $5}' | sort | uniq -c | sort -n
      12 172.X.X.1:27017
      12 172.X.X.2:27017
      12 172.X.X.3:27017

The above is telling us that there are 12 threads connected to 3 different IPs. When looking into the IPs, they reference the 3 nodes of a mongo replica set, which tells us that each connection on mongo is actually 3 threads on an app server (or however many nodes there are in the replica set).


Setting the maxPoolSize property on the mongo driver will help control how many threads an app server is allowed to open against a mongodb node. Be wary that the maxPoolSize default varies between drivers – for example, in Python it's 100, but in Node.js it's 5.
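Since the option name is the same across drivers, one portable approach is to put maxPoolSize in the connection string itself rather than in per-driver client options. A minimal sketch (the hostnames are placeholders, not a real cluster):

```python
def set_max_pool_size(uri: str, size: int) -> str:
    """Append the maxPoolSize option to a MongoDB connection string."""
    if "?" in uri:
        # options already present -> add one more
        return uri + "&maxPoolSize=" + str(size)
    if "/" in uri[len("mongodb://"):]:
        # a database path (or trailing slash) is present
        return uri + "?maxPoolSize=" + str(size)
    return uri + "/?maxPoolSize=" + str(size)

# placeholder hostnames for illustration
uri = "mongodb://host1:27017,host2:27017,host3:27017/mydb?replicaSet=rs0"
print(set_max_pool_size(uri, 100))
# drivers also accept it as a client option, e.g. in pymongo:
#   MongoClient(uri, maxPoolSize=100)
```

Either form works; the connection-string form has the advantage of being visible in one place when several services share a config template.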

Knowing the maxPoolSize for applications that have databases on the same cluster can then allow you to accurately calculate what the max connections could potentially be for a cluster. This could then help make more informed decisions about whether to scale or upsize a mongodb cluster or split applications out.
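As a sketch of that calculation, with entirely hypothetical application names and numbers:

```python
# Hypothetical figures: two applications sharing one cluster, each app
# instance holding a full connection pool in the worst case.
apps = {
    "web-frontend": {"instances": 4, "maxPoolSize": 100},
    "batch-worker": {"instances": 2, "maxPoolSize": 50},
}
cluster_limit = 350  # M10, from the table earlier

# Worst case: every instance fills its pool.
worst_case = sum(a["instances"] * a["maxPoolSize"] for a in apps.values())
print(f"worst case: {worst_case} connections against a limit of {cluster_limit}")
if worst_case > cluster_limit:
    print("consider upsizing the cluster or splitting applications out")
```

In this made-up example the worst case (500) already exceeds an M10's limit before mongo system processes take their share.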

You can get more info about connection pool options here –

Troubleshooting Kubernetes to GCP Mongo Atlas connectivity – so where do you start?

**Update – 03/09/2018**

Since I wrote the original post we have seen further connectivity issues getting Istio working with mongo replica sets. The error looks something like this;

pymongo.errors.ServerSelectionTimeoutError: No replica set members match selector "Primary()"

This has been raised as an issue with Istio 1.0.1 –

We believe that Istio is not correctly routing traffic when it comes to replica set hosts, and is using the host list in the service entry for load balancing rather than just as a list of nodes which can be talked to externally. This is a big problem for a clustering-based data store such as MongoDB, where connection flow is incredibly important. This is a hypothesis rather than something that has been concluded.

As a result of this issue we are no longer using the outbound port whitelisting feature in Istio, as it turned out this feature had been the cause of a few unrelated issues in our environment. Although the post below is still valid in terms of troubleshooting possible connectivity issues, we are no longer using Service Entries for outbound traffic interception.


Original Post

In the last few weeks, we have started to prove out running a latency-sensitive application in a k8s container against a database running on MongoDB Atlas, which is also in GCP.


We currently have a service running in Kubernetes in europe-west4, but the metrics are showing that it runs around 5x slower on average when it connects to an on-premise mongo database running 3.2.

We wanted to get the database closer to the application, and since the general availability of GCP as a region in Mongo Atlas, this became a lot easier to try out. MongoDB kindly gave us some Atlas credits to try this out, and within 10-15 minutes, I had the following running in Mongo Atlas:

  • 3 node replica set running v3.6 with WiredTiger
    • 2 nodes in Europe-West4 (Netherlands)
    • 1 node in Europe-West1 (Belgium)
  • M30 Production Cluster
    • 7.5GB RAM
    • 2 vCPUs
    • Storage
      • 100GB Storage
      • 6000 IOPS
      • Auto Expand Storage
      • Encrypted Disk

“Only Available in AWS”

Get used to hearing and seeing this both in the documentation and when discussing with Support or Account Managers. As expected, not all features are available in GCP yet; the main ones that are frustrating us (as of August 2018) are VPC Peering and Encryption at Rest.

VPC Peering

Atlas supports VPC peering with other AWS VPCs in the same region. MongoDB does not support cross-region VPC peering. For multi-region clusters, you must create VPC peering connections per-region.

Depending on the volume we push to and from our databases, the lack of support for VPC Peering may be cost prohibitive, as you end up paying egress costs twice: once as data leaves your K8s application project on data entry, and again when it leaves GCP Mongo Atlas on data retrieval. Mongo generally give a cost estimation of 7-10% of cluster cost for egress.
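As a rough worked example of the double egress charge (all figures hypothetical):

```python
# All figures are hypothetical, for illustration only.
monthly_cluster_cost = 400.0   # USD/month for the cluster
egress_ratio = 0.085           # midpoint of Mongo's 7-10% estimate

atlas_egress = monthly_cluster_cost * egress_ratio
# Without VPC peering you pay egress on both legs of the round trip,
# roughly doubling the charge.
total_egress = 2 * atlas_egress
print(f"estimated egress cost: ${total_egress:.2f}/month")
```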

Encryption at Rest

The following restrictions apply to Encryption at Rest on an Atlas cluster:

  • Continuous Backups are not supported. When enabling backup for a cluster using Encryption at Rest, you must use Cloud Provider Snapshots from AWS or Azure to encrypt your backup snapshots.
  • You cannot enable Encryption at Rest for clusters running on GCP.

So, currently we cannot take advantage of this option because we would eventually want to run backups via Atlas.

Administrators who deploy clusters on GCP and want to enable backup should keep those clusters in a separate project from deployments that use Encryption at Rest or Cloud Provider Snapshots.


We changed the uri connection string for our test application running in k8s to point towards the mongo atlas cluster. Something like this:

uri = mongodb://,,

However, the application was failing its readiness check (healthcheck on pod startup could not connect to the mongo database).

Troubleshooting Avenues

Kubernetes is not easy to diagnose issues in – especially for novices like me!

Using kubectl logs on the pod was not a great help in this instance, but that might be down to application configuration and what it sends to stdout and stderr (which is the basis of kubectl logs).

This doesn't tell me much in terms of error logging:

kubectl logs my-k8s-service-5f9b978fc8-d6ddc -n my-k8s-service -c master
Using config environment: PREPROD
[24/Aug/2018:13:29:17] ENGINE Bus STARTING
[24/Aug/2018:13:29:17] ENGINE Started monitor thread 'Autoreloader'.
[24/Aug/2018:13:29:17] ENGINE Started monitor thread '_TimeoutMonitor'.
[24/Aug/2018:13:29:17] ENGINE Serving on
[24/Aug/2018:13:29:17] ENGINE Bus STARTED

Replicate the connectivity issue locally on the pod

We knew our readiness check was failing on connectivity, so running an interactive bash terminal inside the pod and trying to make a connection to the Atlas replica set without the application was a fair shout.

kubectl exec -it my-k8s-service-5f9b978fc8-d6ddc -c master bash -n my-k8s-service

Then I generated a manual mongo uri connection using Python (the application runs in Python) and tried to retrieve the database collections.

# connect using uri
/usr/bin/scl enable rh-python35 python
from pymongo import MongoClient
uri = "mongodb://,,"
client = MongoClient(uri)
db = client.myDB
collection = db.myCollection
db.collection_names()

and this is what happened:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/atcloud/.local/lib/python3.5/site-packages/pymongo/", line 715, in collection_names
    nameOnly=True, **kws)]
  File "/home/atcloud/.local/lib/python3.5/site-packages/pymongo/", line 674, in list_collections
    read_pref) as (sock_info, slave_okay):
  File "/opt/rh/rh-python35/root/usr/lib64/python3.5/", line 59, in __enter__
    return next(self.gen)
  File "/home/.local/lib/python3.5/site-packages/pymongo/", line 1099, in _socket_for_reads
    server = topology.select_server(read_preference)
  File "/home/.local/lib/python3.5/site-packages/pymongo/", line 224, in select_server
    address))
  File "/home/.local/lib/python3.5/site-packages/pymongo/", line 183, in select_servers
    selector, server_timeout, address)
  File "/home/.local/lib/python3.5/site-packages/pymongo/", line 199, in _select_servers_loop
    self._error_message(selector))
pymongo.errors.ServerSelectionTimeoutError: SSL handshake failed: EOF occurred in violation of protocol (_ssl.c:645),SSL handshake failed: EOF occurred in violation of protocol (_ssl.c:645),SSL handshake failed: EOF occurred in violation of protocol (_ssl.c:645)

Failing without the application – that's a good sign, as it now allows us to do some quick fail-fast testing with the uri connection code above!

Let's copy the pod and start it in a testing environment – to start testing changes more aggressively.

Create a mytestpod.yaml file with a basic pod configuration with the same base image as the application:

apiVersion: v1
kind: Pod
metadata:
  annotations: "true"
  name: dbamohsin-istio-test-pod
spec:
  containers:
  - args:
    - "10000"
    image:
    imagePullPolicy: Always
    name: master
    command: ["/bin/sleep"]

and create it in kubernetes:

kubectl apply -f mytestpod.yaml

Then go into an interactive terminal with bash

kubectl exec -it dbamohsin-istio-test-pod bash

and run the same uri mongo connection as before. Doing this gave the same python connectivity error shown earlier in the post.

Again, this is a good sign as now we have moved our issue to a less strict environment where we can test changes better.

Can we test the same thing without the istio sidecar…sure!

Delete the pod:

# There are 2 ways of doing this:
kubectl delete -f mytestpod.yaml
kubectl delete pod dbamohsin-istio-test-pod

Modify mytestpod.yaml and comment out the istio sidecar annotation

metadata:
  annotations:
    # "true"

Then re-create the pod and go into an interactive terminal with bash.

Retry the mongo Uri…and it worked!

Python 3.5.1 (default, Oct 21 2016, 21:37:19)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-4)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from pymongo import MongoClient
>>> uri = "mongodb://,,"
>>> client = MongoClient(uri)
>>> db = client.myDB
>>> collection = db.myCollection
>>> db.collection_names(include_system_collections=False)
['col1', 'col2', 'col3', 'col4']

So this narrowed it down to something we potentially hadn't configured in istio.

Connectivity from GCP K8s to GCP Mongo

We use Istio for service discovery/unifying traffic flow management (Auto Trader Case Study on Computer World UK). I didn't know beforehand, but we restrict egress traffic out of our GCP projects, and as a result we needed a service entry in kubernetes to allow 27017. Some MESH_EXTERNAL services documentation –

The good thing is that there is a specific mongo protocol available in istio. We added a service entry to our testing environment as shown below:

apiVersion:
kind: ServiceEntry
metadata:
  name: egress-mongo-all
spec:
  hosts:
  # your Atlas replica set hostnames go here
  ports:
  - number: 27017
    name: mongo
    protocol: MONGO
  resolution: DNS

and deployed:

kubectl apply -f egress-mongo-all.yaml

Changes from service entries are applied to pods immediately.

Check it's deployed:

kubectl get serviceentry -n core-namespace

and double check the config:

kubectl get serviceentry egress-mongo-all -n core-namespace -o yaml

All good.

Re-enable istio and retry

We re-added the istio sidecar annotation into our mytestpod.yaml file and redeployed. This time, even with Istio, the mongo connection worked fine and retrieved the data.

Other connectivity areas to check

Connectivity into GCP Mongo…Whitelists!

Defining a whitelist can be very tricky, as providing a google IP range for our K8s applications is more or less impossible – or would cover most of the internet! The reason is that only the DNS is static for K8s services; the IP for an application pod can change multiple times a day, making it more or less impossible to maintain a strong IP whitelist bound to applications unless an automated check is implemented.

We have several different viewpoints here internally, ranging from "we should have a structured IP whitelist" to "why do we even need a whitelist? Our authentication and encryption should be at a level that it can run securely without an IP whitelist."

Don't forget client connectivity…

Bear in mind that you may have an on-premise firewall between your laptop and Mongo Atlas, and may need your network administrator to enable access for you to go outbound on 27017 to your Atlas nodes.

Takeaway points

  • Isolate the issue quickly
    • We took the application out of the equation first
Then proved that it wasn't that specific google project by moving environments
    • and finally isolated a networking sidecar (istio)
  • Find an easy way to reproduce the issue
    • We created a simple bit of code to quickly test connectivity using the same language as the application (Python)
    • Created a dummy pod that we could quickly trash and recreate and test against BUT ensured the fundamentals of the application remained (Used the same image as the application for the dummy pod)
  • Have an environment where you can potentially break stuff. An infrastructure testing area of some kind.

How to list containers running in a kubernetes pod

Unfortunately, there isn't a simple get command that can be run to retrieve the containers running inside pods. However, kubectl can expose this information via the custom-columns output option.

For more info on custom-columns visit the kubectl docs –

The following command will list pod name and container name for all pods in the default namespace

kubectl get pods -o custom-columns=NAME:.metadata.name,CONTAINERS:.spec.containers[*].name

The output is something like this:


You can also just get the container via the jsonpath option

kubectl get pods mongod-1 -o jsonpath='{.spec.containers[*].name}'

The reason I did a bit of research (5 minutes!) into this was because the container name is required when running the exec command in kubectl, e.g.:

kubectl exec -ti mongod-1 -c mongod-container bash

Installing MongoDB on a Mac with Homebrew

Homebrew is a great little utility to aid application installations and does a great job in packaging up and simplifying the install process.

To learn more about Homebrew visit or the Wiki page here

Installing MongoDB using Homebrew is pretty straightforward but some familiarity with the terminal is beneficial.

Open the Terminal app and type the following:

brew install mongodb

This should trigger a set of actions, of which the first will be a brew update

Updating Homebrew...
==> Auto-updated Homebrew!
Updated 1 tap (homebrew/core).

After updating itself, brew will download the latest mongodb package ‘bottle’ and install it using the default settings.

==> Downloading
######################################################################## 100.0%
==> Pouring mongodb-3.4.4.sierra.bottle.tar.gz
==> Caveats
To have launchd start mongodb now and restart at login:
  brew services start mongodb
Or, if you don't want/need a background service you can just run:
  mongod --config /usr/local/etc/mongod.conf
==> Summary
🍺  /usr/local/Cellar/mongodb/3.4.4: 17 files, 266.3MB

At this point, there is enough to get started with a mongodb instance, or to use the mongo shell to connect to a remote instance.

To start the instance locally without a service:

mongod --config /usr/local/etc/mongod.conf --fork

Then connect into the instance:

mongo --port 27017

Alternatively, to start as a brew service, run:

brew services start mongodb

Duplicate Key Error on local.slaves

We have been getting user assertion errors showing up on our 5 node replica set for a while.


Initially these assertion errors were not showing up in the mongo logs, so we enabled increased logging – details can be found here –

The assertion errors turned out to be related to the local.slaves collection:

[slaveTracking] User Assertion: 11000:E11000 duplicate key error index: local.slaves.$_id_ dup key: { : ObjectId('4def89b415e7ee0aa29fd64b') }
[slaveTracking] update local.slaves query: { _id: ObjectId('4def89b415e7ee0aa29fd64b'), host: "", ns: "" }
update: { $set: { syncedTo: Timestamp 1323652648000|784 } }
exception 11000 E11000 duplicate key error index: local.slaves.$_id_ dup key: { : ObjectId('4def89b415e7ee0aa29fd64b') } 0ms

Taken from Mongo Docs:

The duplicate key on local.slaves error, occurs when a secondary or slave changes its hostname and the primary or master tries to update its local.slaves collection with the new name. The update fails because it contains the same _id value as the document containing the previous hostname. The error itself will resemble the following.

This is a benign error and does not affect replication operations on the secondary or slave.

To prevent the error from appearing, drop the local.slaves collection from the primary or master, with the following sequence of operations in the mongo shell:

use local
db.slaves.drop()

This should resolve the assertion errors and the new config will be picked up next time the replica syncs:

use local
db.slaves.drop()

This topic is also discussed in Jira –

db.currentOp Queries in mongodb

Return active sessions running for more than x seconds:

db.currentOp().inprog.forEach(
  function(op) {
    if(op.secs_running > 5) printjson(op);
  }
)

Waiting for a lock and not a read:

db.currentOp().inprog.forEach(
  function(d) {
    if(d.waitingForLock && d.lockType != "read") printjson(d);
  }
)

Finding active writes:

db.currentOp().inprog.forEach(
  function(d) {
    if(d.active && d.lockType == "write") printjson(d);
  }
)

Finding active reads:

db.currentOp().inprog.forEach(
  function(d) {
    if(d.active && d.lockType == "read") printjson(d);
  }
)

Set additional logging and tracing in mongodb

logLevel Parameter

To set logging on an ad hoc basis, the parameter can be set in the admin database:

--Current log level:
use admin;
db.runCommand({ getParameter: 1, logLevel: 1 })

Logging can be set between 0 and 5, with 5 being the most verbose logging:

--Set log level to 3
use admin;
db.runCommand( { setParameter: 1, logLevel: 3 } )

LogLevel can also be set at instance startup in the mongod.conf under the systemLog.verbosity parameter.

For more details –

Database Profiling

The database profiler collects fine-grained data about MongoDB write operations, cursors, and database commands on a running mongod instance. You can enable profiling on a per-database or per-instance basis, and the profiling level and slow-operation threshold are configurable when enabling it.

--get the tracing level and current slow ops threshold
db.getProfilingStatus()

--set the profiling level to 2
db.setProfilingLevel(2)
See here for full profiling details –

// last few entries
show profile
// sort by natural order (time in)
db.system.profile.find().sort({$natural:-1})
// sort by slow queries first
db.system.profile.find().sort({millis:-1})
// anything > 20ms
db.system.profile.find({millis:{$gt:20}})
// single coll order by response time (collection name is a placeholder)
db.system.profile.find({"ns":"mydb.mycoll"}).sort({millis:-1})
// regular expression on namespace
db.system.profile.find( { "ns": / } ).sort({millis:-1,$ts:-1})
// anything that's moved
db.system.profile.find({moved:true})
// large scans
db.system.profile.find({nscanned:{$gt:10000}})
// anything doing range or full scans

Aggregation framework queries:

--response time by operation type
db.system.profile.aggregate([
{ $group : { 
   _id :"$op", 
   "max response time":{$max:"$millis"},
   "avg response time":{$avg:"$millis"}
}}
])

--slowest by namespace
db.system.profile.aggregate([
{ $group : {
  _id :"$ns",
  "max response time":{$max:"$millis"}, 
  "avg response time":{$avg:"$millis"}  
}},
{$sort: {
 "max response time":-1}
}
])

--slowest by client
db.system.profile.aggregate([
{$group : { 
  _id :"$client", 
  "max response time":{$max:"$millis"}, 
  "avg response time":{$avg:"$millis"}  
}},
{$sort: { 
  "max response time":-1}
}
])