In this post we are exploring how to run MongoDB in GKE using StatefulSets.
A StatefulSet manages Pods that are based on an identical container specification. Unlike a Deployment, a StatefulSet maintains a sticky identity for each of its Pods.
First we create a new cluster - obviously, you can also use an existing cluster.
gcloud container clusters create mongodb
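If kubectl is not already pointing at the new cluster, you can fetch the credentials first (this assumes your gcloud project and default zone are already configured):

gcloud container clusters get-credentials mongodb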
We will be using a replica set so that our data is highly available and
redundant. I'm using the MongoDB replica set sidecar from https://github.com/thesandlord/mongo-k8s-sidecar
A "sidecar" is a helper container that assists the main container with its jobs and tasks. In this case, it configures the MongoDB replica set.
We now create a new StorageClass for our MongoDB instances.
cat googlecloud_ssd.yaml

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd

kubectl apply -f googlecloud_ssd.yaml
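If you want to verify that the StorageClass was created:

kubectl get storageclass fast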
Next up is creating a headless service for MongoDB. Basically, a headless service is one that doesn't include load balancing. In combination with StatefulSets, this gives us an individual DNS name for each pod, so we can connect to each of our MongoDB nodes directly.
cat mongo-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: mongo
  labels:
    name: mongo
spec:
  ports:
  - port: 27017
    targetPort: 27017
  clusterIP: None
  selector:
    role: mongo
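With the manifest saved as mongo-service.yaml (as shown above), we apply it:

kubectl apply -f mongo-service.yaml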
We now deploy the StatefulSet, which runs the MongoDB workload and orchestrates our resources. Looking at the YAML, the first section describes the StatefulSet object itself.
As part of the spec, terminationGracePeriodSeconds is used to gracefully shut down a pod when you scale down the number of replicas.
We then have the configuration for the two containers. The first one runs MongoDB with command line flags that configure the replica set name and bind mongod to all network interfaces (recent MongoDB versions bind only to localhost by default, which would prevent the replica set members from reaching each other). It also mounts the persistent storage volume to /data/db: the location where MongoDB saves its data.
The second container runs the sidecar. This sidecar container will configure the MongoDB replica set automatically.
Finally, there is the volumeClaimTemplates section. This references the StorageClass we created before to provision the volume, requesting a 100 GB disk for each MongoDB replica.
cat mongo-statefulset.yaml

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mongo
spec:
  serviceName: "mongo"
  replicas: 3
  selector:
    matchLabels:
      role: mongo
  template:
    metadata:
      labels:
        role: mongo
        environment: test
    spec:
      terminationGracePeriodSeconds: 10
      containers:
      - name: mongo
        image: mongo
        command:
        - mongod
        - "--replSet"
        - rs0
        - "--bind_ip_all"
        ports:
        - containerPort: 27017
        volumeMounts:
        - name: mongo-persistent-storage
          mountPath: /data/db
      - name: mongo-sidecar
        image: cvallance/mongo-k8s-sidecar
        env:
        - name: MONGO_SIDECAR_POD_LABELS
          value: "role=mongo,environment=test"
  volumeClaimTemplates:
  - metadata:
      name: mongo-persistent-storage
    spec:
      storageClassName: "fast"
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 100Gi
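We apply the StatefulSet and watch the pods come up one by one (mongo-0, mongo-1, mongo-2):

kubectl apply -f mongo-statefulset.yaml
kubectl get pods -l role=mongo -w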
We can now review the configuration from the GCP console.
Once our pods are running, we connect to the first replica set member and initiate the replica set configuration.
kubectl exec -ti mongo-0 -- mongosh
rs.initiate()
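Once the sidecar has added the remaining members, you can verify the state of the replica set from the same shell:

rs.status()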
Each pod in a StatefulSet backed by a headless service gets a stable DNS name, following the naming convention <pod-name>.<service-name>.
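As a quick sanity check (assuming the pods run in the default namespace), you can resolve one of these names from a throwaway pod:

kubectl run -it --rm dns-test --image=busybox --restart=Never -- nslookup mongo-0.mongo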
We can now connect to our DB from the application. Below is an example of my connection string, including the replica set.
"mongodb://mongo-0.mongo,mongo-1.mongo,mongo-2.mongo:27017/?replicaSet=rs0"