In this post we explore how to run MongoDB on GKE using StatefulSets.

A StatefulSet manages Pods that are based on an identical container specification. Unlike a Deployment, a StatefulSet maintains a sticky identity for each of its Pods.

First we create a new cluster. You can also use an existing one.

gcloud container clusters create mongodb


We will be using a replica set so that our data is highly available and redundant. I'm using the MongoDB replica set sidecar from https://github.com/thesandlord/mongo-k8s-sidecar.
A "sidecar" is a helper container that runs alongside the main container in the same pod and supports it. In this case, the sidecar configures the MongoDB replica set.

We now create a new StorageClass for our MongoDB instances. 

cat googlecloud_ssd.yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: fast
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd

kubectl apply -f googlecloud_ssd.yaml

Next up is creating a headless service for MongoDB. A headless service has no cluster IP and does no load balancing (clusterIP: None). In combination with a StatefulSet, it gives each pod its own DNS name, so we can connect to each of our MongoDB nodes individually.

cat mongo-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: mongo
  labels:
    name: mongo
spec:
  ports:
  - port: 27017
    targetPort: 27017
  clusterIP: None
  selector:
    role: mongo

kubectl apply -f mongo-service.yaml

We now deploy the StatefulSet, which runs the MongoDB workload and orchestrates our resources. Looking at the YAML, the first section describes the StatefulSet object.

In the pod spec, terminationGracePeriodSeconds sets how long Kubernetes waits for the pod to shut down gracefully before killing it, for example when you scale down the number of replicas.
We then have the configuration for the two containers. The first one runs MongoDB with command-line flags that configure the replica set name. It also mounts the persistent storage volume at /data/db, the location where MongoDB saves its data.

The second container runs the sidecar. This sidecar container will configure the MongoDB replica set automatically. 

Finally, there is the volumeClaimTemplates section. It references the StorageClass we created earlier to provision the volumes: one 100 GiB disk for each MongoDB replica.

cat mongo-statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mongo
spec:
  serviceName: "mongo"
  replicas: 3
  selector:
    matchLabels:
      role: mongo
  template:
    metadata:
      labels:
        role: mongo
        environment: test
    spec:
      terminationGracePeriodSeconds: 10
      containers:
        - name: mongo
          image: mongo
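          command:
            - mongod
            - "--replSet"
            - rs0
            # Setting command bypasses the image's entrypoint script, so bind to
            # all interfaces explicitly; mongod otherwise listens on localhost only.
            - "--bind_ip_all"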
          ports:
            - containerPort: 27017
          volumeMounts:
            - name: mongo-persistent-storage
              mountPath: /data/db
        - name: mongo-sidecar
          image: cvallance/mongo-k8s-sidecar
          env:
            - name: MONGO_SIDECAR_POD_LABELS
              value: "role=mongo,environment=test"
  volumeClaimTemplates:
  - metadata:
      name: mongo-persistent-storage
    spec:
      storageClassName: fast
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 100Gi
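
To make the effect of volumeClaimTemplates more concrete, here is a small Python sketch of the claims this manifest should produce. It assumes the usual StatefulSet convention that each PersistentVolumeClaim is named <template-name>-<pod-name>. The helper expected_pvcs is a made-up name for illustration, not part of any Kubernetes client.

```python
def expected_pvcs(statefulset, template, replicas, size_gib):
    """Return (claim names, total requested GiB) for a StatefulSet.

    Assumes the naming convention <template-name>-<statefulset>-<ordinal>.
    """
    names = [f"{template}-{statefulset}-{i}" for i in range(replicas)]
    return names, replicas * size_gib


# Values taken from mongo-statefulset.yaml above.
names, total_gib = expected_pvcs("mongo", "mongo-persistent-storage", 3, 100)
for name in names:
    print(name)
print(f"total requested: {total_gib} GiB")
```

Once the StatefulSet has been applied and all three pods are running, the names printed here should match what kubectl get pvc lists.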

kubectl apply -f mongo-statefulset.yaml

We can now review the configuration in the GCP console.

Now that our pods are running, we connect to the first replica set member and initiate the replica set.

kubectl exec -ti mongo-0 -- mongosh
rs.initiate()

Each pod in a StatefulSet backed by a headless service gets a stable DNS name that follows the naming convention <pod-name>.<service-name>, for example mongo-0.mongo.
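
As a rough illustration of this naming scheme, the following Python sketch lists the DNS names for our three pods. The function name pod_dns_names is hypothetical. The fully qualified form assumes the "default" namespace and the default cluster domain cluster.local, both of which may differ in your cluster.

```python
def pod_dns_names(statefulset, service, replicas, namespace=None):
    """Build the stable DNS name of every pod in a StatefulSet.

    Short form: <pod-name>.<service-name> (resolves within the same namespace).
    With a namespace, the fully qualified name is returned instead, assuming
    the default cluster domain "cluster.local".
    """
    names = []
    for ordinal in range(replicas):
        host = f"{statefulset}-{ordinal}.{service}"
        if namespace is not None:
            host += f".{namespace}.svc.cluster.local"
        names.append(host)
    return names


short_names = pod_dns_names("mongo", "mongo", 3)
full_names = pod_dns_names("mongo", "mongo", 3, namespace="default")
print(short_names)
print(full_names)
```

The short names are enough for clients running in the same namespace as the MongoDB pods. Clients in another namespace would need the longer form.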


We can now connect to our database from the application. Below is an example of my connection string, including the replica set name.

"mongodb://mongo-0.mongo,mongo-1.mongo,mongo-2.mongo:27017/?replicaSet=rs0"
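
If the number of replicas changes, the connection string has to change with it, so it can be handy to generate it. Here is a minimal Python sketch using only the standard library. The function replica_set_uri is my own illustrative helper, not part of any MongoDB driver.

```python
from urllib.parse import urlencode


def replica_set_uri(hosts, replica_set, port=27017):
    """Assemble a mongodb:// URI from a list of seed hosts."""
    # Give every host an explicit port; 27017 is MongoDB's default.
    seeds = ",".join(f"{host}:{port}" for host in hosts)
    query = urlencode({"replicaSet": replica_set})
    return f"mongodb://{seeds}/?{query}"


uri = replica_set_uri(["mongo-0.mongo", "mongo-1.mongo", "mongo-2.mongo"], "rs0")
print(uri)
```

The result spells out the port for every host, whereas the string above sets it only on the last one. Both forms should behave the same, since hosts without a port fall back to 27017.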