How to use MOCO

After setting up MOCO, you can create MySQL clusters with a custom resource called MySQLCluster.

Basics

MOCO creates a cluster of mysqld instances for each MySQLCluster. A cluster can consists of 1, 3, or 5 mysqld instances.

MOCO configures semi-synchronous GTID-based replication between mysqld instances in a cluster if the cluster size is 3 or 5. A 3-instance cluster can tolerate up to 1 replica failure, and a 5-instance cluster can tolerate up to 2 replica failures.

In a cluster, there is only one instance called primary. The primary instance is the source of truth. It is the only writable instance in the cluster, and the source of the replication. All other instances are called replica. A replica is a read-only instance and replicates data from the primary.

Limitations

Errant replicas

An inherent limitation of GTID-based semi-synchronous replication is that a failed instance would have errant transactions. If this happens, the instance needs to be re-created by removing all data.

MOCO does not re-create such an instance. It only detects instances having errant transactions and excludes them from the cluster. Users need to monitor them and re-create the instances.

Read-only primary

MOCO from time to time sets the primary mysqld instance read-only for a switchover or other reasons. Applications that use MOCO MySQL need to be aware of this.

Creating clusters

Creating an empty cluster

An empty cluster always has a writable instance called the primary. All other instances are called replicas. Replicas are read-only and replicate data from the primary.

The following YAML is to create a three-instance cluster. It has an anti-affinity for Pods so that all instances will be scheduled to different Nodes. It also sets the limits for memory and CPU to make the Pod Guaranteed.

apiVersion: moco.cybozu.com/v1beta2
kind: MySQLCluster
metadata:
  namespace: default
  name: test
spec:
  # replicas is the number of mysqld Pods.  The default is 1.
  replicas: 3
  podTemplate:
    spec:
      # Make the data directory writable. If moco-init fails with "Permission denied", uncomment the following settings.
      # securityContext:
      #   fsGroup: 10000
      #   fsGroupChangePolicy: "OnRootMismatch"  # available since k8s 1.20
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app.kubernetes.io/name
                operator: In
                values:
                - mysql
              - key: app.kubernetes.io/instance
                operator: In
                values:
                - test
            topologyKey: "kubernetes.io/hostname"
      containers:
      # At least a container named "mysqld" must be defined.
      - name: mysqld
        image: ghcr.io/cybozu-go/moco/mysql:8.0.35
        # By limiting CPU and memory, Pods will have Guaranteed QoS class.
        # requests can be omitted; it will be set to the same value as limits.
        resources:
          limits:
            cpu: "10"
            memory: "10Gi"
  volumeClaimTemplates:
  # At least a PVC named "mysql-data" must be defined.
  - metadata:
      name: mysql-data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 1Gi

By default, MOCO uses preferredDuringSchedulingIgnoredDuringExecution to prevent Pods from being placed on the same Node.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: moco-<MYSQLCLSTER_NAME>
  namespace: default
...
spec:
  template:
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app.kubernetes.io/name
                  operator: In
                  values:
                  - mysql
                - key: app.kubernetes.io/created-by
                  operator: In
                  values:
                  - moco
                - key: app.kubernetes.io/instance
                  operator: In
                  values:
                  - <MYSQLCLSTER_NAME>
              topologyKey: kubernetes.io/hostname
            weight: 100
...

There are other example manifests in examples directory.

The complete reference of MySQLCluster is crd_mysqlcluster_v1beta2.md.

Creating a cluster that replicates data from an external mysqld

Let's call the source mysqld instance donor.

First, make sure partial_revokes is enabled on the donor; Replicating data from the donor with partial_revokes disabled will result in replication inconsistencies or errors since MOCO uses partial_revokes functionality.

We use the clone plugin to copy the whole data quickly. After the cloning, MOCO needs to create some user accounts and install plugins.

On the donor, you need to install the plugin and create two user accounts as follows:

mysql> INSTALL PLUGIN clone SONAME 'mysql_clone.so';
mysql> CREATE USER 'clone-donor'@'%' IDENTIFIED BY 'xxxxxxxxxxx';
mysql> GRANT BACKUP_ADMIN, REPLICATION SLAVE ON *.* TO 'clone-donor'@'%';
mysql> CREATE USER 'clone-init'@'localhost' IDENTIFIED BY 'yyyyyyyyyyy';
mysql> GRANT ALL ON *.* TO 'clone-init'@'localhost' WITH GRANT OPTION;
mysql> GRANT PROXY ON ''@'' TO 'clone-init'@'localhost' WITH GRANT OPTION;

You may change the user names and should change their passwords.

Then create a Secret in the same namespace as MySQLCluster:

$ kubectl -n <namespace> create secret generic donor-secret \
    --from-literal=HOST=<donor-host> \
    --from-literal=PORT=<donor-port> \
    --from-literal=USER=clone-donor \
    --from-literal=PASSWORD=xxxxxxxxxxx \
    --from-literal=INIT_USER=clone-init \
    --from-literal=INIT_PASSWORD=yyyyyyyyyyy

You may change the secret name.

Finally, create MySQLCluster with spec.replicationSourceSecretName set to the Secret name as follows. The mysql image must be the same version as the donor's.

apiVersion: moco.cybozu.com/v1beta2
kind: MySQLCluster
metadata:
  namespace: foo
  name: test
spec:
  replicationSourceSecretName: donor-secret
  podTemplate:
    spec:
      containers:
      - name: mysqld
        image: ghcr.io/cybozu-go/moco/mysql:8.0.35  # must be the same version as the donor
  volumeClaimTemplates:
  - metadata:
      name: mysql-data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 1Gi

To stop the replication from the donor, update MySQLCluster with spec.replicationSourceSecretName: null.

Bring your own image

We provide pre-built MySQL container images at ghcr.io/cybozu-go/moco/mysql. If you want to build and use your own image, read custom-mysqld.md.

Configurations

The default and constant configuration values for mysqld are available on pkg.go.dev. The settings in ConstMycnf cannot be changed while the settings in DefaultMycnf can be overridden.

You can change the default values or set undefined values by creating a ConfigMap in the same namespace as MySQLCluster, and setting spec.mysqlConfigMapName in MySQLCluster to the name of the ConfigMap as follows:

apiVersion: v1
kind: ConfigMap
metadata:
  namespace: foo
  name: mycnf
data:
  long_query_time: "5"
  innodb_buffer_pool_size: "10G"
---
apiVersion: moco.cybozu.com/v1beta2
kind: MySQLCluster
metadata:
  namespace: foo
  name: test
spec:
  # set this to the name of ConfigMap
  mysqlConfigMapName: mycnf
  ...

InnoDB buffer pool size

If innodb_buffer_pool_size is not specified, MOCO sets it automatically to 70% of the value of resources.requests.memory (or resources.limits.memory) for mysqld container.

If both resources.request.memory and resources.limits.memory are not set, innodb_buffer_pool_size will be set to 128M.

Opaque configuration

Some configuration variables cannot be fully configured with ConfigMap values. For example, --performance-schema-instrument needs to be specified multiple times.

You may set them through a special config key _include. The value of _include will be included in my.cnf as opaque.

apiVersion: v1
kind: ConfigMap
metadata:
  namespace: foo
  name: mycnf
data:
  _include: |
    performance-schema-instrument='memory/%=ON'
    performance-schema-instrument='wait/synch/%/innodb/%=ON'
    performance-schema-instrument='wait/lock/table/sql/handler=OFF'
    performance-schema-instrument='wait/lock/metadata/sql/mdl=OFF'

Care must be taken not to overwrite critical configurations such as log_bin since MOCO does not check the contents from _include.

Using the cluster

kubectl moco

From outside of your Kubernetes cluster, you can access MOCO MySQL instances using kubectl-moco. kubectl-moco is a plugin for kubectl. Pre-built binaries are available on GitHub releases.

The following is an example to run mysql command interactively to access the primary instance of test MySQLCluster in foo namespace.

$ kubectl moco -n foo mysql -it test

Read the reference manual of kubectl-moco for further details and examples.

MySQL users

MOCO prepares a set of users.

  • moco-readonly can read all tables of all databases.
  • moco-writable can create users, databases, or tables.
  • moco-admin is the super user.

The exact privileges that moco-readonly has are:

  • PROCESS
  • REPLICATION CLIENT
  • REPLICATION SLAVE
  • SELECT
  • SHOW DATABASES
  • SHOW VIEW

The exact privileges that moco-writable has are:

  • ALTER
  • ALTER ROUTINE
  • CREATE
  • CREATE ROLE
  • CREATE ROUTINE
  • CREATE TEMPORARY TABLES
  • CREATE USER
  • CREATE VIEW
  • DELETE
  • DROP
  • DROP ROLE
  • EVENT
  • EXECUTE
  • INDEX
  • INSERT
  • LOCK TABLES
  • PROCESS
  • REFERENCES
  • REPLICATION CLIENT
  • REPLICATION SLAVE
  • SELECT
  • SHOW DATABASES
  • SHOW VIEW
  • TRIGGER
  • UPDATE

moco-writable cannot edit tables in mysql database, though.

You can create other users and grant them certain privileges as either moco-writable or moco-admin.

$ kubectl moco mysql -u moco-writable test -- -e "CREATE USER 'foo'@'%' IDENTIFIED BY 'bar'"
$ kubectl moco mysql -u moco-writable test -- -e "CREATE DATABASE db1"
$ kubectl moco mysql -u moco-writable test -- -e "GRANT ALL ON db1.* TO 'foo'@'%'"

Connecting to mysqld over network

MOCO prepares two Services for each MySQLCluster. For example, a MySQLCluster named test in foo Namespace has the following Services.

Service NameDNS NameDescription
moco-test-primarymoco-test-primary.foo.svcConnect to the primary instance.
moco-test-replicamoco-test-replica.foo.svcConnect to replica instances.

moco-test-replica can be used only for read access.

The type of these Services is usually ClusterIP. The following is an example to change Service type to LoadBalancer and add an annotation for MetalLB.

apiVersion: moco.cybozu.com/v1beta2
kind: MySQLCluster
metadata:
  namespace: foo
  name: test
spec:
  primaryServiceTemplate:
    metadata:
      annotations:
        metallb.universe.tf/address-pool: production-public-ips
    spec:
      type: LoadBalancer
...

Backup and restore

MOCO can take full and incremental backups regularly. The backup data are stored in Amazon S3 compatible object storages.

You can restore data from a backup to a new MySQL cluster.

Object storage bucket

Bucket is a management unit of objects in S3. MOCO stores backups in a specified bucket.

MOCO does not remove backups. To remove old backups automatically, you can set a lifecycle configuration to the bucket.

ref: Setting lifecycle configuration on a bucket

A bucket can be shared safely across multiple MySQLClusters. Object keys are prefixed with moco/.

BackupPolicy

BackupPolicy is a custom resource to define a policy for taking backups.

The following is an example BackupPolicy to take a backup every day and store data in MinIO:

apiVersion: moco.cybozu.com/v1beta2
kind: BackupPolicy
metadata:
  namespace: backup
  name: daily
spec:
  # Backup schedule.  Any CRON format is allowed.
  schedule: "@daily"

  jobConfig:
    # An existing ServiceAccount name is required.
    serviceAccountName: backup-owner
    env:
    - name: AWS_ACCESS_KEY_ID
      value: minioadmin
    - name: AWS_SECRET_ACCESS_KEY
      value: minioadmin

    # bucketName is required.  Other fields are optional.
    bucketConfig:
      bucketName: moco
      endpointURL: http://minio.default.svc:9000
      usePathStyle: true

    # MOCO uses a filesystem volume to store data temporarily.
    workVolume:
      # Using emptyDir as a working directory is NOT recommended.
      # The recommended way is to use generic ephemeral volume with a provisioner
      # that can provide enough capacity.
      # https://kubernetes.io/docs/concepts/storage/ephemeral-volumes/#generic-ephemeral-volumes
      emptyDir: {}

To enable backup for a MySQLCluster, reference the BackupPolicy name like this:

apiVersion: moco.cybozu.com/v1beta2
kind: MySQLCluster
metadata:
  namespace: default
  name: foo
spec:
  backupPolicyName: daily  # The policy name
...

Note: If you want to specify the ObjectBucket name in a ConfigMap or Secret, you can use envFrom and specify the environment variable name in jobConfig.bucketConfig.bucketName as follows. This behavior is tested.

apiVersion: moco.cybozu.com/v1beta2
kind: BackupPolicy
metadata:
  namespace: backup
  name: daily
spec:
  jobConfig:
    bucketConfig:
      bucketName: "$(BUCKET_NAME)"
      endpointURL: http://minio.default.svc:9000
      usePathStyle: true
    envFrom:
    - configMapRef:
        name: bucket-name
...
---
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: backup
  name: bucket-name
data:
  BUCKET_NAME: moco

MOCO creates a CronJob for each MySQLCluster that has spec.backupPolicyName.

The CronJob's name is moco-backup- + the name of MySQLCluster. For the above example, a CronJob named moco-backup-foo is created in default namespace.

The following podAntiAffinity is set by default for CronJob. If you want to override it, set BackupPolicy.spec.jobConfig.affinity.

apiVersion: batch/v1
kind: CronJob
metadata:
  name: moco-backup-foo
spec:
...
  jobTemplate:
    spec:
      template:
        spec:
          affinity:
            podAntiAffinity:
              preferredDuringSchedulingIgnoredDuringExecution:
                - podAffinityTerm:
                    labelSelector:
                      matchExpressions:
                        - key: app.kubernetes.io/name
                          operator: In
                          values:
                            - mysql-backup
                        - key: app.kubernetes.io/created-by
                          operator: In
                          values:
                            - moco
                    topologyKey: kubernetes.io/hostname
                  weight: 100
...

Credentials to access S3 bucket

Depending on your Kubernetes service provider and object storage, there are various ways to give credentials to access the object storage bucket.

For Amazon's Elastic Kubernetes Service (EKS) and S3 users, the easiest way is probably to use IAM Roles for Service Accounts (IRSA).

ref: IAM ROLES FOR SERVICE ACCOUNTS

Another popular way is to set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables as shown in the above example.

Taking an emergency backup

You can take an emergency backup by creating a Job from the CronJob for backup.

$ kubectl create job --from=cronjob/moco-backup-foo emergency-backup

Restore

To restore data from a backup, create a new MyQLCluster with spec.restore field as follows:

apiVersion: moco.cybozu.com/v1beta2
kind: MySQLCluster
metadata:
  namespace: backup
  name: target
spec:
  # restore field is not editable.
  # to modify parameters, delete and re-create MySQLCluster.
  restore:
    # The source MySQLCluster's name and namespace
    sourceName: source
    sourceNamespace: backup

    # The restore point-in-time in RFC3339 format.
    restorePoint: "2021-05-26T12:34:56Z"

    # jobConfig is the same in BackupPolicy
    jobConfig:
      serviceAccountName: backup-owner
      env:
      - name: AWS_ACCESS_KEY_ID
        value: minioadmin
      - name: AWS_SECRET_ACCESS_KEY
        value: minioadmin
      bucketConfig:
        bucketName: moco
        endpointURL: http://minio.default.svc:9000
        usePathStyle: true
      workVolume:
        emptyDir: {}
...

Further details

Read backup.md for further details.

Deleting the cluster

By deleting MySQLCluster, all resources including PersistentVolumeClaims generated from the templates are automatically removed.

If you want to keep the PersistentVolumeClaims, remove metadata.ownerReferences from them before you delete a MySQLCluster.

Status, metrics, and logs

Cluster status

You can see the health and availability status of MySQLCluster as follows:

$ kubectl get mysqlcluster
NAME   AVAILABLE   HEALTHY   PRIMARY   SYNCED REPLICAS   ERRANT REPLICAS
test   True        True      0         3
  • The cluster is available when the primary Pod is running and ready.
  • The cluster is healthy when there is no problems.
  • PRIMARY is the index of the current primary instance Pod.
  • SYNCED REPLICAS is the number of ready Pods.
  • ERRANT REPLICAS is the number of instances having errant transactions.

You can also use kubectl describe mysqlcluster to see the recent events on the cluster.

Pod status

MOCO adds mysqld containers a liveness probe and a readiness probe to check the replication status in addition to the process status.

A replica Pod is ready only when it is replicating data from the primary without a significant delay. The default threshold of the delay is 60 seconds. The threshold can be configured as follows.

apiVersion: moco.cybozu.com/v1beta2
kind: MySQLCluster
metadata:
  namespace: foo
  name: test
spec:
  maxDelaySeconds: 180
  ...

Unready replica Pods are automatically excluded from the load-balancing targets so that users will not see too old data.

Metrics

MOCO provides a built-in support to collect and expose mysqld metrics using mysqld_exporter.

This is an example YAML to enable mysqld_exporter. spec.collectors is a list of mysqld_exporter flag names without collect. prefix.

apiVersion: moco.cybozu.com/v1beta2
kind: MySQLCluster
metadata:
  namespace: foo
  name: test
spec:
  collectors:
  - engine_innodb_status
  - info_schema.innodb_metrics
  podTemplate:
    ...

See metrics.md for all available metrics and how to collect them using Prometheus.

Logs

Error logs from mysqld can be viewed as follows:

$ kubectl logs moco-test-0 mysqld

Slow logs from mysqld can be viewed as follows:

$ kubectl logs moco-test-0 slow-log

Maintenance

Increasing the number of instances in the cluster

Edit spec.replicas field of MySQLCluster:

apiVersion: moco.cybozu.com/v1beta2
kind: MySQLCluster
metadata:
  namespace: foo
  name: test
spec:
  replicas: 5
  ...

You can only increase the number of instances in a MySQLCluster from 1 to 3 or 5, or from 3 to 5. Decreasing the number of instances is not allowed.

Switchover

Switchover is an operation to change the live primary to one of the replicas.

MOCO automatically switch the primary when the Pod of the primary instance is to be deleted.

Users can manually trigger a switchover with kubectl moco switchover CLUSTER_NAME. Read kubectl-moco.md for details.

Failover

Failover is an operation to replace the dead primary with the most advanced replica. MOCO automatically does this as soon as it detects that the primary is down.

The most advanced replica is a replica who has retrieved the most up-to-date transaction from the dead primary. Since MOCO configures loss-less semi-synchronous replication, the failover is guaranteed not to lose any user data.

After a failover, the old primary may become an errant replica as described.

Upgrading mysql version

You can upgrade the MySQL version of a MySQL cluster as follows:

  1. Check that the cluster is healthy.
  2. Check release notes of MySQL for any incompatibilities between the current and the new versions.
  3. Edit the Pod template of the MySQLCluster and update mysqld container image:
apiVersion: moco.cybozu.com/v1beta2
kind: MySQLCluster
metadata:
  namespace: default
  name: test
spec:
      containers:
      - name: mysqld
        # Edit the next line
        image: ghcr.io/cybozu-go/moco/mysql:8.0.35

You are advised to make backups and/or create a replica cluster before starting the upgrade process. Read upgrading.md for further details.

Re-initializing an errant replica

Delete the PVC and Pod of the errant replica, like this:

$ kubectl delete --wait=false pvc mysql-data-moco-test-0
$ kubectl delete --grace-period=1 pods moco-test-0

Depending on your Kubernetes version, StatefulSet controller may create a pending Pod before PVC gets deleted. Delete such pending Pods until PVC is actually removed.

Stop Clustering and Reconciliation

In MOCO, you can optionally stop the clustering and reconciliation of a MySQLCluster.

To stop clustering and reconciliation, use the following commands.

$ kubectl moco stop clustering <CLSUTER_NAME>
$ kubectl moco stop reconciliation <CLSUTER_NAME>

To resume the stopped clustering and reconciliation, use the following commands.

$ kubectl moco start clustering <CLSUTER_NAME>
$ kubectl moco start reconciliation <CLSUTER_NAME>

You could use this feature in the following cases:

  1. To stop the replication of a MySQLCluster and perform a manual operation to align the GTID
    • Run the kubectl moco stop clustering command on the MySQLCluster where you want to stop the replication
  2. To suppress the full update of MySQLCluster that occurs during the upgrade of MOCO
    • Run the kubectl moco stop reconciliation command on the MySQLCluster on which you want to suppress the update

To check whether clustering and reconciliation are stopped, use kubectl get mysqlcluster. Moreover, while clustering is stopped, AVAILABLE and HEALTHY values will be Unknown.

$ kubectl get mysqlcluster
NAME   AVAILABLE   HEALTHY   PRIMARY   SYNCED REPLICAS   ERRANT REPLICAS   CLUSTERING ACTIVE   RECONCILE ACTIVE   LAST BACKUP
test   Unknown     Unknown   0         3                                   False               False              <no value>

The MOCO controller outputs the following metrics to indicate that clustering has been stopped. 1 if the cluster is clustering or reconciliation stopped, 0 otherwise.

moco_cluster_clustering_stopped{name="mycluster", namespace="mynamesapce"} 1
moco_cluster_reconciliation_stopped{name="mycluster", namespace="mynamesapce"} 1

During the stop of clustering, monitoring of the cluster from MOCO will be halted, and the value of the following metrics will become NaN.

moco_cluster_available{name="test",namespace="default"} NaN
moco_cluster_healthy{name="test",namespace="default"} NaN
moco_cluster_ready_replicas{name="test",namespace="default"} NaN
moco_cluster_errant_replicas{name="test",namespace="default"} NaN