
How to Setup Asynchronous Replication Between MariaDB Galera Clusters


Galera Cluster, with its (virtually) synchronous replication, is commonly used in many different types of environments. Scaling it by adding new nodes is not hard (or just a couple of clicks when you use ClusterControl).

The main problem with synchronous replication is, well, the synchronous part, which often results in the whole cluster being only as fast as its slowest node. Any write executed on the cluster has to be replicated to all of the nodes and certified on them. If, for whatever reason, this process slows down, it can seriously impact the cluster’s ability to accommodate writes. Flow control will then kick in to ensure that the slowest node can still keep up with the load. This makes things quite tricky in some common real-world scenarios.
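
You can check how much flow control is throttling the cluster by looking at the wsrep flow control counters on any node. A minimal sketch (the root password prompt is an illustrative assumption):

mysql -uroot -p -e "SHOW GLOBAL STATUS LIKE 'wsrep_flow_control%'"
# wsrep_flow_control_paused close to 1.0 means the cluster spends most of its time
# paused waiting for the slowest node; wsrep_flow_control_sent counts how often this
# node asked the others to slow down.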

First off, let’s discuss geographically distributed disaster recovery. Sure, you can run a cluster across a Wide Area Network, but the increased latency will have a significant impact on the cluster’s performance. This seriously limits the usability of such a setup, especially over longer distances where latency is higher.

Another quite common use case is a test environment for a major version upgrade. It is not a good idea to mix different versions of MariaDB Galera Cluster nodes in the same cluster, even if it is possible. On the other hand, migration to a more recent version requires detailed tests; ideally, both reads and writes would be tested. One way to achieve that is to create a separate Galera cluster and run the tests there, but you would like the test environment to be as close to production as possible. Once provisioned, such a cluster can be used for tests with real-world queries, yet it is hard to generate a workload that resembles production. You also cannot redirect part of the production traffic to such a test system, because its data is not current.

Finally, the migration itself. As we said earlier, even if it is possible to mix old and new versions of Galera nodes in the same cluster, it is not the safest way to do it.

Luckily, the simplest solution for all three issues is to connect separate Galera clusters with asynchronous replication. What makes it such a good solution? Well, it is asynchronous, which means it does not affect Galera replication. There is no flow control, so the performance of the “master” cluster will not be affected by the performance of the “slave” cluster. As with any asynchronous replication, a lag may show up, but as long as it stays within acceptable limits, it can work perfectly fine. You also have to keep in mind that asynchronous replication can nowadays be parallelized (multiple threads can work together to increase bandwidth), reducing replication lag even further.
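
If lag does become a problem, multi-threaded replication can be enabled later on the node that acts as a slave of the other cluster. A minimal sketch for MariaDB (the thread count and mode are illustrative assumptions; persist them in the configuration file as well):

mysql -ppass -e "STOP SLAVE"
mysql -ppass -e "SET GLOBAL slave_parallel_threads = 4"
mysql -ppass -e "SET GLOBAL slave_parallel_mode = 'optimistic'"
mysql -ppass -e "START SLAVE"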

In this blog post we will discuss the steps to deploy asynchronous replication between MariaDB Galera clusters.

How to Configure Asynchronous Replication Between MariaDB Galera Clusters?

First off, we have to deploy a cluster. For our purposes we set up a three-node cluster. We will keep the setup to a minimum, so we will not discuss the complexity of the application and proxy layer. A proxy layer can be very useful for the tasks you would deploy asynchronous replication for: redirecting a subset of the read-only traffic to the test cluster, or helping in a disaster recovery situation by redirecting traffic to the DR cluster when the “main” cluster is not available. There are numerous proxies you can try, depending on your preference - HAProxy, MaxScale or ProxySQL - all can be used in such setups and, depending on the case, some of them may also help you manage your traffic.

Configuring the Source Cluster

Our cluster consists of three MariaDB 10.3 nodes; we also deployed ProxySQL to do the read-write split and distribute the traffic across all nodes in the cluster. This is not a production-grade deployment - for that we would have to deploy more ProxySQL nodes and Keepalived on top of them - but it is enough for our purposes. To set up asynchronous replication we have to have the binary log enabled on our cluster. At least one node is required, but it is better to keep it enabled on all of them in case the only node with the binlog enabled goes down - then you still have another node in the cluster up and running that you can slave off.

When enabling the binary log, make sure that you configure binary log rotation so the old logs are removed at some point. You should use the ROW binary log format. You should also ensure that you have GTID configured and in use - it will come in very handy if you ever have to reslave your “slave” cluster or if you need to enable multi-threaded replication. As this is a Galera cluster, you want to have ‘wsrep_gtid_domain_id’ configured and ‘wsrep_gtid_mode’ enabled. Those settings ensure that GTIDs will be generated for the traffic coming from the Galera cluster. More information can be found in the documentation. Once this is all done, you can proceed with setting up the second cluster.
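
Below is a minimal sketch of how those settings could look in the MariaDB configuration of each source cluster node. The file path, retention period and ID values are illustrative assumptions - adjust them to your environment:

cat > /etc/mysql/mariadb.conf.d/99-binlog-gtid.cnf <<'EOF'
[mysqld]
# binary log in ROW format, rotated after a week
log_bin              = /var/lib/mysql-binlog/binlog
binlog_format        = ROW
expire_logs_days     = 7
# GTID-related settings; server_id must be unique per node
server_id            = 1001
gtid_domain_id       = 1001
wsrep_gtid_mode      = ON
wsrep_gtid_domain_id = 9999
EOF
systemctl restart mariadb   # binary log settings are not dynamic; restart nodes one at a time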

Setting Up the Target Cluster

Given that currently there is no target cluster, we have to start by deploying it. We will not cover those steps in detail; you can find instructions in the documentation. Generally speaking, the process consists of several steps:

  1. Configure MariaDB repositories
  2. Install MariaDB 10.3 packages
  3. Configure nodes to form a cluster

At the beginning we will start with just one node. You can set up all of them to form a cluster, but then you should stop them and use just one for the next step. That one node will become a slave of the original cluster. We will use mariabackup to provision it. Then we will configure the replication.

First, we have to create a directory where we will store the backup:

mkdir /mnt/mariabackup

Then we execute the backup and create it in the directory prepared in the step above. Please make sure you use the correct user and password to connect to the database:

mariabackup --backup --user=root --password=pass --target-dir=/mnt/mariabackup/

Next, we have to copy the backup files to the first node in the second cluster. We used scp for that, you can use whatever you like - rsync, netcat, anything which will work.

scp -r /mnt/mariabackup/* 10.0.0.104:/root/mariabackup/

After the backup has been copied, we have to prepare it by applying the log files:

mariabackup --prepare --target-dir=/root/mariabackup/
mariabackup based on MariaDB server 10.3.16-MariaDB debian-linux-gnu (x86_64)
[00] 2019-06-24 08:35:39 cd to /root/mariabackup/
[00] 2019-06-24 08:35:39 This target seems to be not prepared yet.
[00] 2019-06-24 08:35:39 mariabackup: using the following InnoDB configuration for recovery:
[00] 2019-06-24 08:35:39 innodb_data_home_dir = .
[00] 2019-06-24 08:35:39 innodb_data_file_path = ibdata1:100M:autoextend
[00] 2019-06-24 08:35:39 innodb_log_group_home_dir = .
[00] 2019-06-24 08:35:39 InnoDB: Using Linux native AIO
[00] 2019-06-24 08:35:39 Starting InnoDB instance for recovery.
[00] 2019-06-24 08:35:39 mariabackup: Using 104857600 bytes for buffer pool (set by --use-memory parameter)
2019-06-24  8:35:39 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2019-06-24  8:35:39 0 [Note] InnoDB: Uses event mutexes
2019-06-24  8:35:39 0 [Note] InnoDB: Compressed tables use zlib 1.2.8
2019-06-24  8:35:39 0 [Note] InnoDB: Number of pools: 1
2019-06-24  8:35:39 0 [Note] InnoDB: Using SSE2 crc32 instructions
2019-06-24  8:35:39 0 [Note] InnoDB: Initializing buffer pool, total size = 100M, instances = 1, chunk size = 100M
2019-06-24  8:35:39 0 [Note] InnoDB: Completed initialization of buffer pool
2019-06-24  8:35:39 0 [Note] InnoDB: page_cleaner coordinator priority: -20
2019-06-24  8:35:39 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=3448619491
2019-06-24  8:35:40 0 [Note] InnoDB: Starting final batch to recover 759 pages from redo log.
2019-06-24  8:35:40 0 [Note] InnoDB: Last binlog file '/var/lib/mysql-binlog/binlog.000003', position 865364970
[00] 2019-06-24 08:35:40 Last binlog file /var/lib/mysql-binlog/binlog.000003, position 865364970
[00] 2019-06-24 08:35:40 mariabackup: Recovered WSREP position: e79a3494-964f-11e9-8a5c-53809a3c5017:25740

[00] 2019-06-24 08:35:41 completed OK!

In case of any error, you may have to re-execute the backup. If everything went OK, we can remove the old data and replace it with the backup contents:

rm -rf /var/lib/mysql/*
mariabackup --copy-back --target-dir=/root/mariabackup/
…
[01] 2019-06-24 08:37:06 Copying ./sbtest/sbtest10.frm to /var/lib/mysql/sbtest/sbtest10.frm
[01] 2019-06-24 08:37:06         ...done
[00] 2019-06-24 08:37:06 completed OK!

We also want to set the correct owner of the files:

chown -R mysql.mysql /var/lib/mysql/

We will be relying on GTID to keep the replication consistent, so we need to see the last GTID applied in this backup. That information can be found in the xtrabackup_info file that is part of the backup:

root@vagrant:~/mariabackup# cat /var/lib/mysql/xtrabackup_info | grep binlog_pos
binlog_pos = filename 'binlog.000003', position '865364970', GTID of the last change '9999-1002-23012'

We also have to ensure that the slave node has binary logs enabled along with ‘log_slave_updates’. Ideally, this will be enabled on all of the nodes in the second cluster - just in case the “slave” node fails and you have to set up the replication using another node in the slave cluster.
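
A quick way to verify both settings on the node is shown below (a sketch; the password is an illustrative assumption). Both variables should report ON - if they don’t, enable them in the configuration and restart the node:

mysql -ppass -e "SHOW VARIABLES LIKE 'log_bin'"
mysql -ppass -e "SHOW VARIABLES LIKE 'log_slave_updates'"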

The last bit we need to do before we can set up the replication is to create a user which we will use to run the replication:

MariaDB [(none)]> CREATE USER 'repuser'@'10.0.0.104' IDENTIFIED BY 'reppass';
Query OK, 0 rows affected (0.077 sec)
MariaDB [(none)]> GRANT REPLICATION SLAVE ON *.*  TO 'repuser'@'10.0.0.104';
Query OK, 0 rows affected (0.012 sec)

That’s all we need. Now, we can start the first node in the second cluster, our to-be-slave:

galera_new_cluster

Once it’s started, we can enter the MySQL CLI and configure it to become a slave, using the GTID position we found a couple of steps earlier:

mysql -ppass
MariaDB [(none)]> SET GLOBAL gtid_slave_pos = '9999-1002-23012';
Query OK, 0 rows affected (0.026 sec)

Once that’s done, we can finally set up the replication and start it:

MariaDB [(none)]> CHANGE MASTER TO MASTER_HOST='10.0.0.101', MASTER_PORT=3306, MASTER_USER='repuser', MASTER_PASSWORD='reppass', MASTER_USE_GTID=slave_pos;
Query OK, 0 rows affected (0.016 sec)
MariaDB [(none)]> START SLAVE;
Query OK, 0 rows affected (0.010 sec)

At this point we have a Galera cluster consisting of one node. That node is also a slave of the original cluster (in particular, its master is node 10.0.0.101). To join other nodes we will use SST, but to make it work we first have to ensure that the SST configuration is correct - keep in mind that we just replaced all the users in our second cluster with the contents of the source cluster. What you have to do now is to ensure that the ‘wsrep_sst_auth’ configuration of the second cluster matches the one of the first cluster. Once that’s done, you can start the remaining nodes one by one and they should join the existing node (10.0.0.104), get the data over SST and form the Galera cluster. Eventually, you should end up with two clusters, three nodes each, with an asynchronous replication link between them (from 10.0.0.101 to 10.0.0.104 in our example). You can confirm that the replication is working by checking the value of:

MariaDB [(none)]> show global status like 'wsrep_last_committed';
+----------------------+-------+
| Variable_name        | Value |
+----------------------+-------+
| wsrep_last_committed | 106   |
+----------------------+-------+
1 row in set (0.001 sec)
MariaDB [(none)]> show global status like 'wsrep_last_committed';
+----------------------+-------+
| Variable_name        | Value |
+----------------------+-------+
| wsrep_last_committed | 114   |
+----------------------+-------+
1 row in set (0.001 sec)
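
The counter increasing on the slave cluster confirms that the replicated writes are being applied. You can also inspect the asynchronous link itself on the slave node; a minimal sketch (the password is an illustrative assumption):

mysql -ppass -e "SHOW SLAVE STATUS\G" | grep -E 'Slave_IO_Running|Slave_SQL_Running|Seconds_Behind_Master|Gtid_IO_Pos'
# Slave_IO_Running and Slave_SQL_Running should both be Yes;
# Seconds_Behind_Master shows the current replication lag.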

How to Configure Asynchronous Replication Between MariaDB Galera Clusters Using ClusterControl?

As of the time of this blog, ClusterControl does not have the functionality to configure asynchronous replication across multiple clusters - we are working on it as I type this. Nonetheless, ClusterControl can be of great help in this process: we will show you how you can speed up the laborious manual steps using the automation provided by ClusterControl.

From what we showed before, we can conclude that those are the general steps to take when setting up replication between two Galera clusters:

  1. Deploy a new Galera cluster
  2. Provision new cluster using data from the old one
  3. Configure new cluster (SST configuration, binary logs)
  4. Set up the replication between the old and the new cluster

The first three points are something you can easily do using ClusterControl even now. We are going to show you how to do that.

Deploy and Provision a New MariaDB Galera Cluster Using ClusterControl

The initial situation is similar - we have one cluster up and running and we have to set up a second one. One of the more recent features of ClusterControl is the option to deploy a new cluster and provision it using data from a backup. This is very useful for creating test environments, and it is also the option we will use to provision our new cluster for the replication setup. Therefore, the first step we will take is to create a backup using mariabackup:

The backup wizard consists of three steps, in which we picked the node to take the backup from. This node (10.0.0.101) will become the master. It has to have binary logs enabled. In our case all of the nodes have the binlog enabled, but if they hadn’t, it is very easy to enable it from ClusterControl - we will show the steps later, when we do it for the second cluster.

Once the backup is completed, it will become visible on the list. We can then proceed and restore it:

If we wanted to, we could even do point-in-time recovery, but in our case it does not really matter: once the replication is configured, all required transactions from the binlogs will be applied on the new cluster.

Then we pick the option to create a cluster from the backup. This opens another dialog:

It is a confirmation of which backup will be used, which host the backup was taken from, what method was used to create it, and some metadata that helps verify whether the backup looks sound.

Then we basically go through the regular deployment wizard, in which we have to define SSH connectivity between the ClusterControl host and the nodes to deploy the cluster on (a requirement for ClusterControl) and, in the second step, the vendor, version, password and nodes to deploy on:

That’s all regarding deployment and provisioning. ClusterControl will set up the new cluster and it will provision it using the data from the old one.

We can monitor the progress in the Activity tab. Once completed, the second cluster will show up on the cluster list in ClusterControl.

Reconfiguration of the New Cluster Using ClusterControl

Now we have to reconfigure the cluster - we will enable binary logs. In the manual process we had to make changes to the wsrep_sst_auth configuration and also to the configuration entries in the [mysqldump] and [xtrabackup] sections of the config (those settings can be found in the secrets-backup.cnf file). This time it is not needed, as ClusterControl generated new passwords for the cluster and configured the files correctly. What is important to keep in mind, though, is that should you change the password of the ‘backupuser’@’127.0.0.1’ user in the original cluster, you will have to make the corresponding configuration changes in the second cluster too, because changes in the first cluster will replicate to the second cluster.

Binary logs can be enabled from the Nodes section. You have to pick node by node and run “Enable Binary Logging” job. You will be presented with a dialog:

Here you can define how long you would like to keep the logs, where they should be stored and whether ClusterControl should restart the node for you to apply the changes - the binary log configuration is not dynamic and MariaDB has to be restarted to apply those changes.

When the changes complete, you will see all nodes marked as “master”, which means that those nodes have the binary log enabled and can act as a master.
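
To double-check that the binary log is indeed active on a node, you can query it directly; a minimal sketch (the password is an illustrative assumption):

mysql -ppass -e "SHOW VARIABLES LIKE 'log_bin'"
mysql -ppass -e "SHOW BINARY LOGS"
# log_bin should be ON and the second command should list at least one binlog file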

If we have not created the replication user already, we have to do that now. In the first cluster we go to Manage -> Schemas and Users:

On the right hand side we have an option to create a new user:

This concludes the configuration required to set up the replication.

Setting up replication between clusters using ClusterControl

As we stated, we are working on automating this part. Currently it has to be done manually. As you may remember, we need the GTID position of our backup and then have to run a couple of commands using the MySQL CLI. The GTID data is available in the backup. ClusterControl creates the backup using xbstream/mbstream and compresses it afterwards. Our backup is stored on the ClusterControl host, where we do not have access to the mbstream binary. You can try to install it, or you can copy the backup file to a location where such a binary is available:

scp /root/backups/BACKUP-2/backup-full-2019-06-24_144329.xbstream.gz 10.0.0.104:/root/mariabackup/

Once that’s done, on 10.0.0.104 we want to check the contents of xtrabackup_info file:

cd /root/mariabackup
zcat backup-full-2019-06-24_144329.xbstream.gz | mbstream -x
root@vagrant:~/mariabackup# cat /root/mariabackup/xtrabackup_info | grep binlog_pos
binlog_pos = filename 'binlog.000007', position '379', GTID of the last change '9999-1002-846116'

Finally, we configure the replication and start it:

MariaDB [(none)]> SET GLOBAL gtid_slave_pos ='9999-1002-846116';
Query OK, 0 rows affected (0.024 sec)
MariaDB [(none)]> CHANGE MASTER TO MASTER_HOST='10.0.0.101', MASTER_PORT=3306, MASTER_USER='repuser', MASTER_PASSWORD='reppass', MASTER_USE_GTID=slave_pos;
Query OK, 0 rows affected (0.024 sec)
MariaDB [(none)]> START SLAVE;
Query OK, 0 rows affected (0.010 sec)

That's it - we just configured asynchronous replication between two MariaDB Galera clusters using ClusterControl. As you can see, ClusterControl was able to automate the majority of the steps we had to take in order to set up this environment.


MySQL on Docker: ProxySQL Native Clustering with Kubernetes


ProxySQL has supported native clustering since v1.4.2. This means multiple ProxySQL instances are cluster-aware; they are aware of each other's state and able to handle configuration changes automatically by syncing up to the most up-to-date configuration based on configuration version, timestamp and checksum value. Check out this blog post which demonstrates how to configure clustering support for ProxySQL and how you can expect it to behave.

ProxySQL is a decentralized proxy, recommended to be deployed closer to the application. This approach scales pretty well, even up to hundreds of nodes, as it was designed to be easily reconfigurable at runtime. To efficiently manage multiple ProxySQL nodes, one has to make sure that whatever changes are performed on one of the nodes are applied across all nodes in the farm. Without native clustering, one has to manually export the configurations and import them into the other nodes (albeit you could automate this yourself).

In a previous blog post, we covered ProxySQL clustering via Kubernetes ConfigMap. That approach is fairly efficient thanks to the centralized configuration in ConfigMap: whatever is loaded into the ConfigMap will be mounted into the pods. Updating the configuration can be done via versioning (modify the proxysql.cnf content and load it into a ConfigMap under another name) and then pushed to the pods, depending on the Deployment method's scheduling and update strategy.
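
A minimal sketch of that versioning flow (the manifest filename and the versioned ConfigMap name are illustrative assumptions):

$ kubectl create configmap proxysql-configmap-v2 --from-file=proxysql.cnf
# point the configMap volume in the pod template at proxysql-configmap-v2
# (here assumed to live in proxysql-deployment.yml), then re-apply it:
$ kubectl apply -f proxysql-deployment.yml
# the Deployment/StatefulSet update strategy then reschedules the pods with the new configuration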

However, in a rapidly changing environment, this ConfigMap approach is probably not the best method, because loading a new configuration requires pod rescheduling to remount the ConfigMap volume, and this might jeopardize the ProxySQL service as a whole. For example, let's say our strict password policy requires forcing MySQL user password expiration every 7 days; we would then have to keep updating the ProxySQL ConfigMap with the new password on a weekly basis. As a side note, a MySQL user inside ProxySQL requires the user and password to match the ones on the backend MySQL servers. That's where we should start making use of ProxySQL native clustering support in Kubernetes, to automatically apply configuration changes without the hassle of ConfigMap versioning and pod rescheduling.

In this blog post, we’ll show you how to run ProxySQL native clustering with headless service on Kubernetes. Our high-level architecture can be illustrated as below:

We have 3 Galera nodes running on bare-metal infrastructure deployed and managed by ClusterControl:

  • 192.168.0.21
  • 192.168.0.22
  • 192.168.0.23

Our applications are all running as pods within Kubernetes. The idea is to introduce two ProxySQL instances in between the application and our database cluster to serve as a reverse proxy. Applications will then connect to the ProxySQL pods via a Kubernetes service, which will load balance and fail over connections across a number of ProxySQL replicas.

The following is a summary of our Kubernetes setup:

root@kube1:~# kubectl get nodes -o wide
NAME    STATUS   ROLES    AGE     VERSION   INTERNAL-IP       EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
kube1   Ready    master   5m      v1.15.1   192.168.100.201   <none>        Ubuntu 18.04.1 LTS   4.15.0-39-generic   docker://18.9.7
kube2   Ready    <none>   4m1s    v1.15.1   192.168.100.202   <none>        Ubuntu 18.04.1 LTS   4.15.0-39-generic   docker://18.9.7
kube3   Ready    <none>   3m42s   v1.15.1   192.168.100.203   <none>        Ubuntu 18.04.1 LTS   4.15.0-39-generic   docker://18.9.7

ProxySQL Configuration via ConfigMap

Let's first prepare our base configuration which will be loaded into ConfigMap. Create a file called proxysql.cnf and add the following lines:

datadir="/var/lib/proxysql"

admin_variables=
{
    admin_credentials="proxysql-admin:adminpassw0rd;cluster1:secret1pass"
    mysql_ifaces="0.0.0.0:6032"
    refresh_interval=2000
    cluster_username="cluster1"
    cluster_password="secret1pass"
    cluster_check_interval_ms=200
    cluster_check_status_frequency=100
    cluster_mysql_query_rules_save_to_disk=true
    cluster_mysql_servers_save_to_disk=true
    cluster_mysql_users_save_to_disk=true
    cluster_proxysql_servers_save_to_disk=true
    cluster_mysql_query_rules_diffs_before_sync=3
    cluster_mysql_servers_diffs_before_sync=3
    cluster_mysql_users_diffs_before_sync=3
    cluster_proxysql_servers_diffs_before_sync=3
}

mysql_variables=
{
    threads=4
    max_connections=2048
    default_query_delay=0
    default_query_timeout=36000000
    have_compress=true
    poll_timeout=2000
    interfaces="0.0.0.0:6033;/tmp/proxysql.sock"
    default_schema="information_schema"
    stacksize=1048576
    server_version="5.1.30"
    connect_timeout_server=10000
    monitor_history=60000
    monitor_connect_interval=200000
    monitor_ping_interval=200000
    ping_interval_server_msec=10000
    ping_timeout_server=200
    commands_stats=true
    sessions_sort=true
    monitor_username="proxysql"
    monitor_password="proxysqlpassw0rd"
    monitor_galera_healthcheck_interval=2000
    monitor_galera_healthcheck_timeout=800
}

mysql_galera_hostgroups =
(
    {
        writer_hostgroup=10
        backup_writer_hostgroup=20
        reader_hostgroup=30
        offline_hostgroup=9999
        max_writers=1
        writer_is_also_reader=1
        max_transactions_behind=30
        active=1
    }
)

mysql_servers =
(
    { address="192.168.0.21" , port=3306 , hostgroup=10, max_connections=100 },
    { address="192.168.0.22" , port=3306 , hostgroup=10, max_connections=100 },
    { address="192.168.0.23" , port=3306 , hostgroup=10, max_connections=100 }
)

mysql_query_rules =
(
    {
        rule_id=100
        active=1
        match_pattern="^SELECT .* FOR UPDATE"
        destination_hostgroup=10
        apply=1
    },
    {
        rule_id=200
        active=1
        match_pattern="^SELECT .*"
        destination_hostgroup=20
        apply=1
    },
    {
        rule_id=300
        active=1
        match_pattern=".*"
        destination_hostgroup=10
        apply=1
    }
)

mysql_users =
(
    { username = "wordpress", password = "passw0rd", default_hostgroup = 10, transaction_persistent = 0, active = 1 },
    { username = "sbtest", password = "passw0rd", default_hostgroup = 10, transaction_persistent = 0, active = 1 }
)

proxysql_servers =
(
    { hostname = "proxysql-0.proxysqlcluster", port = 6032, weight = 1 },
    { hostname = "proxysql-1.proxysqlcluster", port = 6032, weight = 1 }
)

Some of the above configuration lines are explained per section below:

admin_variables

Pay attention to the admin_credentials variable, where we used a non-default user, "proxysql-admin". ProxySQL reserves the default "admin" user for local connections via localhost only. Therefore, we have to use other users to access the ProxySQL instance remotely. Otherwise, you would get the following error:

ERROR 1040 (42000): User 'admin' can only connect locally

We also appended the cluster_username and cluster_password values to the admin_credentials line, separated by a semicolon, to allow automatic syncing to happen. All variables prefixed with cluster_* are related to ProxySQL native clustering and are self-explanatory.

mysql_galera_hostgroups

This is a new directive introduced in ProxySQL 2.x (our ProxySQL image is running 2.0.5). If you would like to run ProxySQL 1.x, remove this part and use the scheduler table instead. We already explained the configuration details in this blog post, How to Run and Configure ProxySQL 2.0 for MySQL Galera Cluster on Docker, under "ProxySQL 2.x Support for Galera Cluster".

mysql_servers

All lines are self-explanatory; the section is based on the three database servers running in the MySQL Galera Cluster, as summarized in the following Topology screenshot taken from ClusterControl:

proxysql_servers

Here we define a list of ProxySQL peers:

  • hostname - Peer's hostname/IP address
  • port - Peer's admin port
  • weight - Currently unused, but in the roadmap for future enhancements
  • comment - Free form comment field

In a Docker/Kubernetes environment, there are multiple ways to discover and link up container hostnames or IP addresses and insert them into this table: ConfigMap, manual insert, entrypoint.sh scripting, environment variables or some other means. In Kubernetes, depending on the ReplicationController or Deployment method used, guessing the pod's resolvable hostname in advance is somewhat tricky unless you are running on a StatefulSet.

Check out this tutorial on the StatefulSet pod ordinal index, which provides a stable resolvable hostname for the created pods. Combined with a headless service (explained further down), the resolvable hostname format would be:

{app_name}-{index_number}.{service}

Where {service} is a headless service, which explains where "proxysql-0.proxysqlcluster" and "proxysql-1.proxysqlcluster" come from. If you want to have more than 2 replicas, add more entries accordingly by appending an ascending index number relative to the StatefulSet application name.

Now we are ready to push the configuration file into ConfigMap, which will be mounted into every ProxySQL pod during deployment:

$ kubectl create configmap proxysql-configmap --from-file=proxysql.cnf

Verify if our ConfigMap is loaded correctly:

$ kubectl get configmap
NAME                 DATA   AGE
proxysql-configmap   1      7h57m

Creating ProxySQL Monitoring User

The next step before we start the deployment is to create the ProxySQL monitoring user in our database cluster. Since we are running on a Galera cluster, run the following statements on one of the Galera nodes:

mysql> CREATE USER 'proxysql'@'%' IDENTIFIED BY 'proxysqlpassw0rd';
mysql> GRANT USAGE ON *.* TO 'proxysql'@'%';

If you haven't created the MySQL users (as specified under mysql_users section above), we have to create them as well:

mysql> CREATE USER 'wordpress'@'%' IDENTIFIED BY 'passw0rd';
mysql> GRANT ALL PRIVILEGES ON wordpress.* TO 'wordpress'@'%';
mysql> CREATE USER 'sbtest'@'%' IDENTIFIED BY 'passw0rd';
mysql> GRANT ALL PRIVILEGES ON sbtest.* TO 'sbtest'@'%';

That's it. We are now ready to start the deployment.

Deploying a StatefulSet

We will start by creating two ProxySQL instances (replicas), for redundancy purposes, using a StatefulSet.

Let's start by creating a text file called proxysql-ss-svc.yml and add the following lines:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: proxysql
  labels:
    app: proxysql
spec:
  replicas: 2
  serviceName: proxysqlcluster
  selector:
    matchLabels:
      app: proxysql
      tier: frontend
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: proxysql
        tier: frontend
    spec:
      restartPolicy: Always
      containers:
      - image: severalnines/proxysql:2.0.4
        name: proxysql
        volumeMounts:
        - name: proxysql-config
          mountPath: /etc/proxysql.cnf
          subPath: proxysql.cnf
        ports:
        - containerPort: 6033
          name: proxysql-mysql
        - containerPort: 6032
          name: proxysql-admin
      volumes:
      - name: proxysql-config
        configMap:
          name: proxysql-configmap
---
apiVersion: v1
kind: Service
metadata:
  annotations:
  labels:
    app: proxysql
    tier: frontend
  name: proxysql
spec:
  ports:
  - name: proxysql-mysql
    port: 6033
    protocol: TCP
    targetPort: 6033
  - name: proxysql-admin
    nodePort: 30032
    port: 6032
    protocol: TCP
    targetPort: 6032
  selector:
    app: proxysql
    tier: frontend
  type: NodePort

There are two sections in the above definition - StatefulSet and Service. The StatefulSet is the definition of our pods (replicas) and the mount point for our ConfigMap volume, loaded from proxysql-configmap. The next section is the service definition, where we define how the pods should be exposed and routed for the internal or external network.

Verify the pod and service states:

$ kubectl get pods,svc
NAME             READY   STATUS    RESTARTS   AGE
pod/proxysql-0   1/1     Running   0          4m46s
pod/proxysql-1   1/1     Running   0          2m59s

NAME                      TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                         AGE
service/kubernetes        ClusterIP   10.96.0.1        <none>        443/TCP                         10h
service/proxysql          NodePort    10.111.240.193   <none>        6033:30314/TCP,6032:30032/TCP   5m28s

If you look at the pod's log, you would notice we got flooded with this warning:

$ kubectl logs -f proxysql-0
...
2019-08-01 19:06:18 ProxySQL_Cluster.cpp:215:ProxySQL_Cluster_Monitor_thread(): [WARNING] Cluster: unable to connect to peer proxysql-1.proxysqlcluster:6032 . Error: Unknown MySQL server host 'proxysql-1.proxysqlcluster' (0)

The above simply means proxysql-0 was unable to resolve "proxysql-1.proxysqlcluster" and connect to it, which is expected since we haven't yet created our headless service for the DNS records needed for inter-ProxySQL communication.

Kubernetes Headless Service

In order for ProxySQL pods to be able to resolve the anticipated FQDN and connect to it directly, the resolving process must be able to look up the assigned target pod IP address and not the virtual IP address. This is where a headless service comes into the picture. When creating a headless service by setting "clusterIP=None", no load balancing is configured and no cluster IP (virtual IP) is allocated for the service; only DNS is automatically configured. When you run a DNS query for a headless service, you will get the list of the pods' IP addresses.

Here is what it looks like if we look up the headless service DNS records for "proxysqlcluster" (in this example we had 3 ProxySQL instances):

$ host proxysqlcluster
proxysqlcluster.default.svc.cluster.local has address 10.40.0.2
proxysqlcluster.default.svc.cluster.local has address 10.40.0.3
proxysqlcluster.default.svc.cluster.local has address 10.32.0.2

Meanwhile, the following output shows the DNS record for the standard service called "proxysql", which resolves to the clusterIP:

$ host proxysql
proxysql.default.svc.cluster.local has address 10.110.38.154

To create a headless service and attach it to the pods, one has to define the serviceName inside the StatefulSet declaration, and the Service definition must have "clusterIP=None" as shown below. Create a text file called proxysql-headless-svc.yml and add the following lines:

apiVersion: v1
kind: Service
metadata:
  name: proxysqlcluster
  labels:
    app: proxysql
spec:
  clusterIP: None
  ports:
  - port: 6032
    name: proxysql-admin
  selector:
    app: proxysql

Create the headless service:

$ kubectl create -f proxysql-headless-svc.yml

Just for verification, at this point, we have the following services running:

$ kubectl get svc
NAME              TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                         AGE
kubernetes        ClusterIP   10.96.0.1       <none>        443/TCP                         8h
proxysql          NodePort    10.110.38.154   <none>        6033:30200/TCP,6032:30032/TCP   23m
proxysqlcluster   ClusterIP   None            <none>        6032/TCP                        4s

Now, check out one of our pod's log:

$ kubectl logs -f proxysql-0
...
2019-08-01 19:06:19 ProxySQL_Cluster.cpp:215:ProxySQL_Cluster_Monitor_thread(): [WARNING] Cluster: unable to connect to peer proxysql-1.proxysqlcluster:6032 . Error: Unknown MySQL server host 'proxysql-1.proxysqlcluster' (0)
2019-08-01 19:06:19 [INFO] Cluster: detected a new checksum for mysql_query_rules from peer proxysql-1.proxysqlcluster:6032, version 1, epoch 1564686376, checksum 0x3FEC69A5C9D96848 . Not syncing yet ...
2019-08-01 19:06:19 [INFO] Cluster: checksum for mysql_query_rules from peer proxysql-1.proxysqlcluster:6032 matches with local checksum 0x3FEC69A5C9D96848 , we won't sync.

You will notice that the Cluster component is able to resolve, connect to and detect a new checksum from the other peer, proxysql-1.proxysqlcluster on port 6032, via the headless service called "proxysqlcluster". Note that this service exposes port 6032 within the Kubernetes network only, hence it is unreachable externally.

At this point, our deployment is now complete.

Connecting to ProxySQL

There are several ways to connect to the ProxySQL services. Load-balanced MySQL connections should be sent to port 6033 from within the Kubernetes network, and to port 30033 if the client is connecting from an external network.
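
For example, from within the Kubernetes network you could connect through the "proxysql" service with one of the users defined under mysql_users. A minimal sketch using a throwaway client pod (the client image is an illustrative assumption):

$ kubectl run mysql-client --rm -it --restart=Never --image=mysql:5.7 -- mysql -uwordpress -ppassw0rd -hproxysql -P6033 -e "SELECT @@hostname"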

To connect to the ProxySQL admin interface from external network, we can connect to the port defined under NodePort section, 30032 (192.168.100.203 is the primary IP address of host kube3.local):

$ mysql -uproxysql-admin -padminpassw0rd -h192.168.100.203 -P30032

Use the clusterIP 10.110.38.154 (defined under "proxysql" service) on port 6032 if you want to access it from other pods in Kubernetes network.

Then perform the ProxySQL configuration changes as you wish and load them to runtime:

mysql> INSERT INTO mysql_users (username,password,default_hostgroup) VALUES ('newuser','passw0rd',10);
mysql> LOAD MYSQL USERS TO RUNTIME;

You will notice the following lines in one of the pods indicating the configuration syncing completes:

$ kubectl logs -f proxysql-0
...
2019-08-02 03:53:48 [INFO] Cluster: detected a peer proxysql-1.proxysqlcluster:6032 with mysql_users version 2, epoch 1564718027, diff_check 4. Own version: 1, epoch: 1564714803. Proceeding with remote sync
2019-08-02 03:53:48 [INFO] Cluster: detected peer proxysql-1.proxysqlcluster:6032 with mysql_users version 2, epoch 1564718027
2019-08-02 03:53:48 [INFO] Cluster: Fetching MySQL Users from peer proxysql-1.proxysqlcluster:6032 started
2019-08-02 03:53:48 [INFO] Cluster: Fetching MySQL Users from peer proxysql-1.proxysqlcluster:6032 completed

Keep in mind that the automatic syncing only happens if there is a configuration change in the ProxySQL runtime. Therefore, it's vital to run the "LOAD ... TO RUNTIME" statement before you can see the action. Also, don't forget to save the ProxySQL changes to disk for persistence:

mysql> SAVE MYSQL USERS TO DISK;

Limitation

Note that there is a limitation to this setup: ProxySQL does not support saving/exporting the active configuration into a text configuration file that we could later load into the ConfigMap for persistence. There is a feature request for this. Meanwhile, you could push the modifications to the ConfigMap manually. Otherwise, if the pods were accidentally deleted, you would lose your current configuration, because the new pods would be bootstrapped with whatever is defined in the ConfigMap.
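
Pushing the modifications manually boils down to keeping your local proxysql.cnf in sync with the runtime changes and re-publishing it. A minimal sketch (assuming you have already edited proxysql.cnf to match the runtime configuration):

$ kubectl create configmap proxysql-configmap --from-file=proxysql.cnf --dry-run -o yaml | kubectl apply -f -
# (use --dry-run=client on newer kubectl) - this updates the existing ConfigMap in place,
# so pods created later will bootstrap with the up-to-date configuration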

Special thanks to Sampath Kamineni, who sparked the idea of this blog post and provided insights about the use cases and implementation.

Automated Deployment of MySQL Galera Cluster to Amazon AWS with Puppet


Deploying and managing your database environment can be a tedious task. It's very common nowadays to use tools to automate your deployment and make these tasks easier. Automation solutions such as Chef, Puppet, Ansible, or SaltStack are just some of the ways to achieve these goals.

This blog will show you how to use Puppet to deploy a Galera Cluster (specifically Percona XtraDB Cluster or PXC) utilizing the ClusterControl Puppet Modules. This module makes the deployment, setup, and configuration easier than coding it yourself from scratch. You may also want to check out one of our previous blogs about deploying a Galera Cluster using Chef, “How to Automate Deployment of MySQL Galera Cluster Using S9S CLI and Chef.”

Our S9S CLI tools are designed to be used in the terminal (or console) and can be utilized to automatically deploy databases. In this blog, we'll show you how to deploy a Percona XtraDB Cluster on AWS using Puppet, using ClusterControl and its s9s CLI tools to help automate the job.

Installation and Setup  For The Puppet Master and Agent Nodes

In this blog, I used Ubuntu 16.04 Xenial as the target Linux OS for this setup. It might be an old OS version for you, but we know it also works with recent versions of RHEL/CentOS and Debian/Ubuntu. I have two nodes that I used in this setup locally, with the following hosts/IPs:

Master host:

  • IP = 192.168.40.200
  • Hostname = master.puppet.local

Agent host:

  • IP = 192.168.40.20
  • Hostname = clustercontrol.puppet.local

Let's go through the steps.

1) Setup the Master

## Install the packages required
wget https://apt.puppetlabs.com/puppet6-release-xenial.deb
sudo dpkg -i puppet6-release-xenial.deb
sudo apt update
sudo apt install -y puppetserver

## Now, let's do some minor configuration for Puppet
sudo vi /etc/default/puppetserver

## edit from
JAVA_ARGS="-Xms2g -Xmx2g -Djruby.logger.class=com.puppetlabs.jruby_utils.jruby.Slf4jLogger"

## to
JAVA_ARGS="-Xms512m -Xmx512m -Djruby.logger.class=com.puppetlabs.jruby_utils.jruby.Slf4jLogger"

## add alias hostnames in /etc/hosts
sudo vi /etc/hosts

## and add
192.168.40.10 client.puppet.local
192.168.40.200 server.puppet.local

## edit the config for server settings.
sudo vi /etc/puppetlabs/puppet/puppet.conf

## This can differ depending on your setup, so you might approach it differently than below.
[master]
vardir = /opt/puppetlabs/server/data/puppetserver
logdir = /var/log/puppetlabs/puppetserver
rundir = /var/run/puppetlabs/puppetserver
pidfile = /var/run/puppetlabs/puppetserver/puppetserver.pid
codedir = /etc/puppetlabs/code

dns_alt_names = master.puppet.local,master

[main]
certname = master.puppet.local
server = master.puppet.local
environment = production
runinterval = 15m

## Generate a root and intermediate signing CA for Puppet Server
sudo /opt/puppetlabs/bin/puppetserver ca setup

## start puppet server
sudo systemctl start puppetserver
sudo systemctl enable puppetserver

2) Setup the Agent/Client Node

## Install the packages required
wget https://apt.puppetlabs.com/puppet6-release-xenial.deb
sudo dpkg -i puppet6-release-xenial.deb
sudo apt update
sudo apt install -y puppet-agent

## Edit the config settings for puppet client
sudo vi /etc/puppetlabs/puppet/puppet.conf

And add the example configuration below,

[main]
certname = clustercontrol.puppet.local
server = master.puppet.local
environment = production
runinterval = 15m

3) Authenticating (or Signing the Certificate Request) for Master/Client Communication

## Go back to the master node and run the following to view the outstanding requests.
sudo /opt/puppetlabs/bin/puppetserver ca list

## The Result
Requested Certificates:
    clustercontrol.puppet.local   (SHA256) 0C:BA:9D:A8:55:75:30:27:31:05:6D:F1:8C:CD:EE:D7:1F:3C:0D:D8:BD:D3:68:F3:DA:84:F1:DE:FC:CD:9A:E1

## sign a request from the agent/client
sudo /opt/puppetlabs/bin/puppetserver ca sign --certname clustercontrol.puppet.local

## The Result
Successfully signed certificate request for clustercontrol.puppet.local

## or you can also sign all requests
sudo /opt/puppetlabs/bin/puppetserver ca sign --all

## in case you want to revoke, just do
sudo /opt/puppetlabs/bin/puppetserver ca revoke --certname <AGENT_NAME>

## to list all requests (signed and unsigned)
sudo /opt/puppetlabs/bin/puppetserver ca list --all

## Then verify or test on the client node,

## verify/test puppet agent
sudo /opt/puppetlabs/bin/puppet agent --test

Scripting Your Puppet Manifests and Setting up the ClusterControl Puppet Module

Our ClusterControl Puppet module can be downloaded here: https://github.com/severalnines/puppet. Otherwise, you can also easily grab the Puppet Module from Puppet Forge. We're regularly updating and modifying the Puppet Module, so we suggest you grab the GitHub copy to ensure you have the most up-to-date version of the script.

You should also take into account that our Puppet Module is tested on CentOS/Ubuntu running the most updated version of Puppet (6.7.x). For this blog, the Puppet Module is tailored to work with the most recent release of ClusterControl (which as of this writing is 1.7.3). In case you missed it, you can check out our releases and patch releases over here.
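
If you prefer the Puppet Forge route, the installation could look like the sketch below (assuming the module is published under the severalnines namespace - verify the exact module name on the Forge first):

sudo /opt/puppetlabs/bin/puppet module install severalnines-clustercontrol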

1) Setup the ClusterControl Module in the Master Node

# Download from github and move the file to the module location of Puppet:

wget https://github.com/severalnines/puppet/archive/master.zip -O clustercontrol.zip; unzip -x clustercontrol.zip; mv puppet-master /etc/puppetlabs/code/environments/production/modules/clustercontrol

2) Create Your Manifest File and Add the Contents as Shown Below

vi /etc/puppetlabs/code/environments/production/manifests/site.pp

Now, before we proceed, we need to discuss the manifest script and the command to be executed. First, we'll have to define the clustercontrol class and the variables we need to provide. ClusterControl requires every setup to have a token and SSH keys specified and provided accordingly. Hence, this can be done by running the following commands:

## Generate the key
bash /etc/puppetlabs/code/environments/production/modules/clustercontrol/files/s9s_helper.sh --generate-key

## Then, generate the token
bash /etc/puppetlabs/code/environments/production/modules/clustercontrol/files/s9s_helper.sh --generate-token

Now, let's discuss what we'll have to input within the manifest file one by one.

node 'clustercontrol.puppet.local' { # Applies only to the mentioned node. If nothing is mentioned, applies to all.

        class { 'clustercontrol':
            is_controller       => true,
            ip_address          => '<ip-address-of-your-cluster-control-hosts>',
            mysql_cmon_password => '<your-desired-cmon-password>',
            api_token           => '<api-token-generated-earlier>'
        }

Now, we'll have to define the <ip-address-of-your-cluster-control-hosts> of your ClusterControl node, which in this example is clustercontrol.puppet.local. Also specify the cmon password, and then place the API token generated by the command mentioned earlier.

Afterwards, we'll use the ClusterControl RPC interface to send a POST request that creates an AWS credentials entry:

exec { 'add-aws-credentials':
    path    => ['/usr/bin', '/usr/sbin', '/bin'],
    command => "echo '{\"operation\" : \"add_credentials\", \"provider\" : aws, \"name\" : \"<your-aws-credentials-name>\", \"comment\" : \"<optional-comment-about-credential-entry>\", \"credentials\":{\"access_key_id\":\"<aws-access-key-id>\",\"access_key_secret\" : \"<aws-key-secret>\",\"access_key_region\" : \"<aws-region>\"}}'  | curl -sX POST -H\"Content-Type: application/json\" -d @- http://localhost:9500/0/cloud"
}

The placeholder variables I set are self-explanatory. You need to provide the desired credential name for your AWS entry, provide a comment if you want to, and provide the AWS access key ID, your AWS key secret and the AWS region where you'll be deploying the Galera nodes.

Lastly, we'll have to run the command using s9s CLI tools.

exec { 's9s':
    path      => ['/usr/bin', '/usr/sbin', '/bin'],
    onlyif    => "test -f $(/usr/bin/s9s cluster --list --cluster-format='%I' --cluster-name '<cluster-name>' 2> /dev/null) > 0 ",
    command   => "/usr/bin/s9s cluster --create --cloud=aws --vendor percona --provider-version 5.7  --containers=<node1>,<node2>,<node3> --nodes=<node1>,<node2>,<node3> --cluster-name=<cluster-name> --cluster-type=<cluster-type> --image <aws-image> --template <aws-instance-type> --subnet-id <aws-subnet-id> --region <aws-region> --image-os-user=<image-os-user> --os-user=<os-user> --os-key-file <path-to-rsa-key-file> --vpc-id <aws-vpc-id> --firewalls <aws-firewall-id> --db-admin <db-user> --db-admin-passwd <db-password> --wait --log",
    timeout   => 3600,
    logoutput => true
}

Let’s look at the key points of this command. First, the "onlyif" parameter defines a conditional check to determine whether a cluster with that name already exists; if it does, the command is not run, since the cluster has already been added. Then we run the command, which utilizes the S9S CLI tools. You'll need to fill in the AWS IDs in the placeholder variables. Since the placeholder names are self-explanatory, their values can be taken from your AWS Console or by using the AWS CLI tools.

Now, let's check the succeeding steps remaining.

3) Prepare the Script for Your Manifest File

# Copy the example contents below (edit according to your desired values) and paste it to the manifest file, which is the site.pp.

node 'clustercontrol.puppet.local' { # Applies only to the mentioned node. If nothing is mentioned, applies to all.

    class { 'clustercontrol':
        is_controller          => true,
        ip_address             => '192.168.40.20',
        mysql_cmon_password    => 'R00tP@55',
        mysql_server_addresses => '192.168.40.30,192.168.40.40',
        api_token              => '0997472ab7de9bbf89c1183f960ba141b3deb37c'
    }

    exec { 'add-aws-credentials':
        path    => ['/usr/bin', '/usr/sbin', '/bin'],
        command => "echo '{\"operation\" : \"add_credentials\", \"provider\" : aws, \"name\" : \"paul-aws-sg\", \"comment\" : \"my SG AWS Connection\", \"credentials\":{\"access_key_id\":\"XXXXXXXXXXX\",\"access_key_secret\" : \"XXXXXXXXXXXXXXX\",\"access_key_region\" : \"ap-southeast-1\"}}'  | curl -sX POST -H\"Content-Type: application/json\" -d @- http://localhost:9500/0/cloud"
    }

    exec { 's9s':
        path      => ['/usr/bin', '/usr/sbin', '/bin'],
        onlyif    => "test -f $(/usr/bin/s9s cluster --list --cluster-format='%I' --cluster-name 'cli-aws-repl' 2> /dev/null) > 0 ",
        command   => "/usr/bin/s9s cluster --create --cloud=aws --vendor percona --provider-version 5.7  --containers=db1,db2,db3 --nodes=db1,db2,db3 --cluster-name=cli-aws-repl --cluster-type=galera --image ubuntu18.04 --template t2.small --subnet-id subnet-xxxxxxxxx  --region ap-southeast-1 --image-os-user=s9s --os-user=s9s --os-key-file /home/vagrant/.ssh/id_rsa --vpc-id vpc-xxxxxxx --firewalls sg-xxxxxxxxx --db-admin root --db-admin-passwd R00tP@55 --wait --log",
        timeout   => 3600,
        logoutput => true
    }
}

Let's Do the Test and Run Within the Agent Node

/opt/puppetlabs/bin/puppet agent --test  

The End Product

Now, let's have a look once the agent has run. Once you have this running, visit the URL http://<cluster-control-host>/clustercontrol; you'll be asked by ClusterControl to register first.

Now, you might wonder where the result of the RPC request with resource name 'add-aws-credentials' in our manifest file ended up: it can be found in the Integrations section within ClusterControl. Let's see how it looks after Puppet performs the run.

You can modify this to your liking through the UI, but you can also modify it by using our RPC API.

Now, let's check the cluster,

The UI view shows that ClusterControl was able to create the cluster, displays the cluster in the dashboard, and also shows the job activities that were performed in the background.

Lastly, our AWS nodes are now present in our AWS Console. Let's check that out:

All nodes are running healthy, with their designated names and in the expected region.

Conclusion

In this blog, we were able to deploy a Galera/Percona XtraDB Cluster using automation with Puppet. We did not create the code from scratch, nor did we use any external tools that would have complicated the task. Instead, we used the ClusterControl Module and the S9S CLI tool to build and deploy a highly available Galera Cluster.

A Guide to MySQL Galera Cluster Restoration Using mysqldump


Using logical backup programs like mysqldump is a common practice performed by MySQL admins for backup and restore (the process of moving a database from one server to another), and it is also the most efficient way to perform a database mass modification using a single text file.

When doing this for MySQL Galera Cluster, however, the same rules apply except for the fact that it takes a lot of time to restore a dump file into a running Galera Cluster. In this blog, we will look at the best way to restore a Galera Cluster using mysqldump.

Galera Cluster Restoration Performance

One of the most common misconceptions about Galera Cluster is that restoring a database into a three-node cluster is faster than doing it on a standalone node. This is definitely incorrect when talking about a stateful service like a datastore or filesystem. To keep in sync, every member has to keep up with whatever changes happened on the other members. This is where locking, certifying, applying, rolling back and committing are forced into the picture to ensure no data loss along the way - because for a database service, data loss is a big no-no.

Let's make some comparisons to see and understand the impact. Suppose we have a 2 GB dump file for the database 'sbtest'. We usually would load the data into the cluster via two endpoints:

  • load balancer host 
  • one of the database hosts

As a control measurement, we are also going to restore on a standalone node. The variable pxc_strict_mode is set to PERMISSIVE on all Galera nodes.
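
Setting this variable is dynamic in PXC 5.7; a minimal sketch, to be run on every Galera node (the password prompt is an illustrative assumption):

$ mysql -uroot -p -e "SET GLOBAL pxc_strict_mode = 'PERMISSIVE'"
# PERMISSIVE logs (instead of rejecting) operations that strict mode would otherwise
# block, such as writes to tables without an explicit primary key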

The backup was created on one of the Galera nodes with the following command:

$ mysqldump --single-transaction sbtest > sbtest.sql

We are going to use 'pv' to observe the progress and measure the restoration performance. Thus, the restore command is:

$ pv sbtest.sql | mysql -uroot -p sbtest

The restorations were repeated three times for each endpoint type, with the following results (three restoration times per endpoint, followed by the corresponding restoration speeds in MiB/s):

  • Standalone - MySQL 5.7.25: 3m 29s, 3m 36s, 3m 31s (8.73, 8.44, 8.64 MiB/s)
  • Load balancer - HAProxy -> PXC 5.7.25 (multiple DB hosts - all active, leastconn): 5m 45s, 6m 03s, 5m 43s (5.29, 5.02, 5.43 MiB/s)
  • Load balancer - ProxySQL -> PXC 5.7.25 (single DB host - single writer hostgroup): 6m 07s, 7m 00s, 6m 54s (4.97, 4.34, 4.41 MiB/s)
  • Galera Cluster - PXC 5.7.25 (single DB host): 5m 22s, 6m 00s, 5m 28s (5.66, 5.07, 5.56 MiB/s)

Note that the way pv measures the restoration speed is based on the mysqldump text file being piped through it. It's not highly accurate, but good enough to give us some measurements to compare. All hosts have the same specs and run as virtual machines on the same underlying physical hardware.

The following column chart summarizes the average time it takes to restore the mysqldump:

The standalone host is the clear winner with 212 seconds, while ProxySQL is the worst for this workload: almost twice as slow compared to standalone.

The following column chart summarizes the average speed pv measures when restoring the mysqldump:

As expected, restoration on the standalone node is way faster, with 8.6 MiB/s on average - 1.5x better than restoration directly on the Galera node.

To summarize our observation, restoring directly on a Galera Cluster node is way slower than a standalone host. Restoring through a load balancer is even worse.

Turning Off Galera Replication

Restoring a mysqldump on a Galera Cluster causes every single DML statement (INSERTs in this case) to be broadcasted, certified and applied by the Galera nodes through its group communication and replication library. Thus, the fastest way to restore a mysqldump is to perform the restoration on a single node with Galera replication turned off, essentially making it run like a standalone node. The steps are:

  1. Pick one Galera node as the restore node. Stop the rest of the nodes.
  2. Turn off Galera Replication on the restore node.
  3. Perform the restoration.
  4. Stop and bootstrap the restore node.
  5. Force the remaining node to re-join and re-sync via SST.

For example, let's say we choose db1 to be the restore node. Stop the other nodes (db2 and db3) one node at a time so the nodes would leave the cluster gracefully:

$ systemctl stop mysql #db2
$ systemctl stop mysql #db3

Note: For ClusterControl users, simply go to Nodes -> pick the DB node -> Node Actions -> Stop Node. Do not forget to turn off ClusterControl automatic recovery for cluster and nodes before performing this exercise.

Now, log in to db1 and turn the Galera node into a standalone node by setting the wsrep_provider variable to 'none':

$ mysql -uroot -p
mysql> SET GLOBAL wsrep_provider = 'none';
mysql> SHOW STATUS LIKE 'wsrep_connected';
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| wsrep_connected | OFF   |
+-----------------+-------+

Then perform the restoration on db1:

$ pv sbtest.sql | mysql -uroot -p sbtest
1.78GiB 0:02:46 [  11MiB/s] [==========================================>] 100%

The restoration time improved 2x, to 166 seconds (down from ~337 seconds), at 11 MiB/s (up from ~5.43 MiB/s). Since this node now has the most up-to-date data, we have to bootstrap the cluster based on it and let the other nodes rejoin the cluster and be forced to re-sync everything.

On db1, stop the MySQL service and start it again in bootstrap mode:

$ systemctl status mysql #check whether mysql or mysql@bootstrap is running
$ systemctl status mysql@bootstrap #check whether mysql or mysql@bootstrap is running
$ systemctl stop mysql # if mysql was running
$ systemctl stop mysql@bootstrap # if mysql@bootstrap was running
$ systemctl start mysql@bootstrap

Then, on every remaining node, wipe out the datadir (or you can simply delete the grastate.dat file) and start the MySQL service:

$ rm /var/lib/mysql/grastate.dat  # remove this file to force SST
$ systemctl start mysql

Perform the startup process one node at a time. Once a node is synced, proceed with the next node, and so on.

Note: For ClusterControl users, you can skip the above step because ClusterControl can be configured to force SST during the bootstrap process. Just click on Cluster Actions -> Bootstrap Cluster, pick db1 as the bootstrap node, and toggle on the option "Clear MySQL Datadir on Joining nodes", as shown below:

We could also speed up the restoration process by allowing a bigger packet size for the mysql client:

$ pv sbtest.sql | mysql -uroot -p --max_allowed_packet=2G sbtest

At this point, our cluster should be running with the restored data. Take note that in this test case, the total restoration time for the cluster was actually longer than if we had performed the restoration directly on a Galera node, thanks to our small dataset. If you have a huge mysqldump file to restore, believe us, this is one of the best ways to do it.

That's it for now. Happy restoring!

 

The Easy Way to Deploy a MySQL Galera Cluster on AWS


ClusterControl 1.7.3 comes with a notable improvement in cloud integration. It is possible to deploy a MySQL and PostgreSQL replication cluster to the cloud, as well as automatically launch a cloud instance and scale out your database cluster by adding a new database node. 

This blog post showcases how to easily deploy a Galera Cluster using ClusterControl on AWS. This new feature is part of the ClusterControl Community Edition, which comes with free deployment and monitoring features. This means that you can take advantage of this feature for no cost!

ClusterControl Database Cluster Architecture

The following diagram summarizes our overall database clusters architecture.

ClusterControl Database Cluster Architecture

The ClusterControl server is located outside of the AWS infrastructure, which gives it clear visibility of our database cluster (located in Frankfurt: eu-central-1). The ClusterControl server MUST have a dedicated public IP address, because this address has to be granted access on the database servers and in the AWS security group. The Galera database version that we are going to deploy is MariaDB Cluster 10.3, using ClusterControl 1.7.3.

Preparing the AWS Environment

ClusterControl is able to deploy a database cluster on supported cloud platforms, namely AWS, Google Cloud Platform (GCP), and Microsoft Azure. The first thing we have to configure is to get the AWS access keys to allow ClusterControl to perform programmatic requests to AWS services. You could use the root account access key, but this is not the recommended way. It's better to create a dedicated Identity and Access Management (IAM) user solely for this purpose.

Login to your AWS Console -> My Security Credentials -> Users -> Add User. Specify the user and pick "Programmatic Access" as the Access Type:

Adding a User in AWS Console

On the next page, create a new user group by clicking the "Create group" button and give the group the name "DatabaseAutomation". Attach the following policies:

  • AmazonEC2FullAccess
  • AmazonVPCFullAccess
  • AmazonS3FullAccess (only if you plan to store the database backup on AWS S3)

Tick the DatabaseAutomation checkbox and click "Add user to group":

Add User Permissions Amazon AWS

Optionally, you can assign tags on the next page. Otherwise, just proceed to create the user. You should get the two most important things, Access key ID and Secret access key.
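If you prefer to script the IAM setup instead of clicking through the console, the AWS CLI can achieve the same result. This is a minimal sketch; the user name "cc-deployer" is just an illustrative placeholder and the policy list mirrors the one above:

$ aws iam create-user --user-name cc-deployer
$ aws iam create-group --group-name DatabaseAutomation
$ aws iam attach-group-policy --group-name DatabaseAutomation --policy-arn arn:aws:iam::aws:policy/AmazonEC2FullAccess
$ aws iam attach-group-policy --group-name DatabaseAutomation --policy-arn arn:aws:iam::aws:policy/AmazonVPCFullAccess
$ aws iam add-user-to-group --group-name DatabaseAutomation --user-name cc-deployer
$ aws iam create-access-key --user-name cc-deployer   # returns the Access key ID and Secret access key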

Add User Confirmation AWS

Download the CSV file and store it somewhere safe. We are now good to automate the deployment on cloud.

Install ClusterControl on the respective server:

$ whoami

root

$ wget http://severalnines.com/downloads/cmon/install-cc

$ chmod 755 install-cc

$ ./install-cc

Follow the installation instructions and go to http://192.168.0.11/clustercontrol and create the super admin user and password. 

To allow ClusterControl to perform automatic deployment on cloud, one has to create cloud credentials for the selected region with a valid AWS key ID and secret. Go to Sidebar -> Integrations -> Cloud Providers -> Add your first Cloud Credential -> Amazon Web Services and enter the required details and choose Frankfurt as the default region:

Add Cloud Credentials ClusterControl

This credential will be used by ClusterControl to automate the cluster deployment and management. At this point, we are ready to deploy our first cluster.

Database Cluster Deployment

Go to Deploy -> Deploy in the Cloud -> MySQL Galera -> MariaDB 10.3 -> Configure Cluster to proceed to the next page. 

Under Configure Cluster section, ensure the number of nodes is 3 and give a cluster name and MySQL root password:

Configure MySQL Galera Cluster in ClusterControl

Under Select Credential, choose a credential called "AWS Frankfurt" and proceed to the next page by clicking "Select Virtual Machine". Choose the preferred operating system and instance size. It's recommended to run our infrastructure inside a private cloud so we could get a dedicated internal IP address for our cloud instances and the hosts are not directly exposed to the public network. Click "Add New" button next to Virtual Private Cloud (VPC) field and give a subnet of 10.10.0.0/16 to this network:

Add VPC

The VPC that we have created is a private cloud and does not have internet connectivity. In order for ClusterControl to be able to deploy and manage the hosts from outside the AWS network, we have to allow internet connectivity to this VPC. To do this, we have to do the following:

  1. Create an internet gateway
  2. Add external routing to the route table
  3. Associate the subnet to the route table

To create an internet gateway, log in to AWS Management Console -> VPC -> Internet Gateways -> Create internet gateway -> assign a name for this gateway. Then select the created gateway from the list and go to Actions -> Attach to VPC -> select the VPC from the dropdown list -> Attach. We have now attached an internet gateway to the private cloud. However, we need to configure the network to forward all external requests via this internet gateway, hence we have to add a default route to the route table. Go to VPC -> Route Tables -> select the route table -> Edit Routes and specify the destination network, 0.0.0.0/0, and the target (the created internet gateway ID) as below:

Edit Route Tables AWS Console

Then, we have to associate the DB subnet with this route table so that all instances created inside this network use the default route we created earlier. Select the route table -> Edit Subnet Association -> assign the DB subnet, as shown below:

Route Table Subnet AWS Console
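If you would rather script these three steps, the AWS CLI can do the same; this is a minimal sketch, and the igw-/vpc-/rtb-/subnet- IDs below are placeholders for the ones in your own account:

$ aws ec2 create-internet-gateway
$ aws ec2 attach-internet-gateway --internet-gateway-id igw-0123456789abcdef0 --vpc-id vpc-0123456789abcdef0
$ aws ec2 create-route --route-table-id rtb-0123456789abcdef0 --destination-cidr-block 0.0.0.0/0 --gateway-id igw-0123456789abcdef0
$ aws ec2 associate-route-table --route-table-id rtb-0123456789abcdef0 --subnet-id subnet-0123456789abcdef0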

The VPC is now ready to be used by ClusterControl for the deployment.

Once created, select the created VPC from the dropdown. For SSH Key, we will ask ClusterControl to auto generate it:

ClusterControl SSH Key Credentials

The generated SSH key will be located inside ClusterControl server under /var/lib/cmon/autogenerated_ssh_keys/s9s/ directory.

Click on "Deployment Summary". In this page, we have to assign a subnet from the VPC to the database cluster. Since this is a new VPC, it has no subnet and we have to create a new one. Click on "Add New Subnet" button and assign 10.10.1.0/24 as the network for our database cluster:

Add Subnet ClusterControl

Finally, select the created subnet in the textbox and click on "Deploy Cluster":

Select Virtual Machine ClusterControl

You can monitor the job progress under Activity -> Jobs -> Create Cluster. ClusterControl will perform the necessary pre-installation steps like creating the cloud instances, security group, generating SSH key and so on, before the actual installation steps begin.

Once the cluster is ready, you should see the following cluster in the ClusterControl dashboard:

ClusterControl Dashboard AWS Deployment

Our cluster deployment is now complete. 

Post AWS Database Deployment

We can now start loading data into the cluster or create a new database for application usage. To connect, simply instruct your applications or clients to connect to the private or public IP address of one of the database servers. You can get this information by going to the Nodes page, as shown in the following screenshot:

Node Data ClusterControl AWS Deployment

If you would like to access the database nodes directly, you can use the ClusterControl web-SSH module at Node Actions -> SSH Console, which gives you a similar experience to connecting via an SSH client.

To scale the cluster out by adding a database node, just go to Cluster Actions (server stack icon) -> Add Node -> Add a DB node on a new cloud instance and you will be presented with the following dialog:

Adding a Node ClusterControl AWS Deployment

Simply follow the deployment wizard and configure your new instance accordingly. Once the instance is created, ClusterControl will install, configure and join the node into the cluster automatically.

That's it for now, folks. Happy clustering in the cloud!

Comparing Galera Cluster Cloud Offerings: Part One Amazon AWS


Running a MySQL Galera Cluster (either the Percona, MariaDB, or Codership build) is, unfortunately, not supported by (nor part of) Amazon RDS. Most of the databases supported by RDS use asynchronous replication, while Galera Cluster is a synchronous multi-master replication solution. Galera also requires InnoDB as its storage engine to function properly; while you can use other storage engines such as MyISAM, it is not advised because of the lack of transaction handling.

Because of the lack of support natively in RDS, this blog will focus on the offerings available when choosing and hosting your Galera-based cluster using an AWS environment.

There are certainly many reasons why you would choose or not choose the AWS cloud platform, but for this particular topic we’re going to go over the advantages and benefits of what you can leverage rather than why you would choose the AWS Platform.

The Virtual Servers (Elastic Compute Instances)

As mentioned earlier, MySQL Galera is not part of RDS, and InnoDB is a transactional storage engine that needs the right resources for your application requirements. It must have the capacity to serve the demands of your client request traffic. At the time of this article, your sole choice for running Galera Cluster is by using EC2, Amazon's compute instance cloud offering.

Because you have the advantage of running your system on a number of EC2 instances, running a Galera Cluster on EC2 versus on-prem doesn't differ much. You can access the server remotely via SSH, install your desired software packages, and choose the kind of Galera Cluster build you would like to utilize.

Moreover, EC2 is elastic and flexible, allowing you to deliver a simpler, more granular setup. You can take advantage of the web services to automate the build of a number of nodes if you need to scale out your environment, or, for example, to automate the building of your staging or development environment. It also gives you an edge to quickly build your desired environment, choose and set up your desired OS, and pick the right computing resources that fit your requirements (such as CPU, memory, and disk storage). EC2 eliminates the time spent waiting for hardware, since you can do this on the fly. You can also leverage the AWS CLI tool to automate your Galera cluster setup.
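As a small illustration of that last point, the sketch below launches three identically-sized instances for the Galera nodes with the AWS CLI; the AMI ID, key pair, security group, and subnet are placeholders you would substitute with your own:

$ aws ec2 run-instances \
    --image-id ami-0123456789abcdef0 \
    --count 3 \
    --instance-type m5.large \
    --key-name my-galera-key \
    --security-group-ids sg-0123456789abcdef0 \
    --subnet-id subnet-0123456789abcdef0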

Pricing for Amazon EC2 Instances

EC2 offers a number of selections which are very flexible for consumers who would like to host their Galera Cluster environment on AWS compute nodes. The AWS Free Tier includes 750 hours of Linux and Windows t2.micro instances, each month, for one year. You can stay within the Free Tier by using only EC2 Micro instances, but this might not be the best thing for production use. 

There are multiple types of EC2 instances you can deploy when provisioning your Galera nodes. The r4/r5/x1 family (memory optimized) and the c4/c5 family (compute optimized) are an ideal choice, and the prices differ depending on how large your server resource needs are and the type of OS.

These are the types of paid instances you can choose...

On Demand 

You pay for compute capacity (per-hour or per-second), depending on the type of instances you run. For example, prices might differ when provisioning an Ubuntu instance vs a RHEL instance, aside from the type of instance. There are no long-term commitments or upfront payments needed, and you have the flexibility to increase or decrease your compute capacity. These instances are recommended for low cost and flexible environment needs, like applications with short-term, spiky, or unpredictable workloads that cannot be interrupted, or applications being developed or tested on Amazon EC2 for the first time. Check it out here for more info.

Dedicated Hosts

If you are looking for compliance and regulatory requirements such as the need to acquire a dedicated server that runs on a dedicated hardware for use, this type of offer suits your needs. Dedicated Hosts can help you address compliance requirements and reduce costs by allowing you to use your existing server-bound software license, including Windows Server, SQL Server, SUSE Linux Enterprise Server, Red Hat Enterprise Linux, or other software licenses that are bound to VMs, sockets, or physical cores, subject to your license terms. It can be purchased On-Demand (hourly) or as a Reservation for up to 70% off the On-Demand price. Check it out here for more info.

Spot Instances

These instances allow you to request spare Amazon EC2 computing capacity for up to 90% off the On-Demand price. This is recommended for applications that have flexible start and end times, applications that are only feasible at very low compute prices, or users with urgent computing needs for large amounts of additional capacity. Check it out here for more info.

Reserved Instances

This type of payment offer provides you the option to grab up to a 75% discount and, depending on which instance you would like to reserve, you can acquire a capacity reservation giving you additional confidence in your ability to launch instances when you need them. This is recommended if your applications have steady state or predictable usage, applications that may require reserved capacity, or customers that can commit to using EC2 over a 1 or 3 year term to reduce their total computing costs. Check it out here for more info.

Pricing Note

One last thing about EC2: it also offers per-second billing, which takes the cost of unused minutes and seconds in an hour off the bill. This is advantageous if you are scaling out for a minimal amount of time, just to handle traffic requests to a Galera node, or in case you want to try and test on a specific node for a limited time.

Database Encryption on AWS

If you're concerned about the confidentiality of your data, or about abiding by the laws required for your security compliance and regulations, AWS offers data-at-rest encryption. If you're using MariaDB Cluster version 10.2+, it has built-in plugin support to interface with the Amazon Web Services (AWS) Key Management Service (KMS) API. This allows you to take advantage of the AWS KMS key management service to facilitate separation of responsibilities and remote logging & auditing of key access requests. Rather than storing the encryption key in a local file, this plugin keeps the master key in AWS KMS.

When you first start MariaDB, the AWS KMS plugin will connect to the AWS Key Management Service and ask it to generate a new key. MariaDB will store that key on-disk in an encrypted form. The key stored on-disk cannot be used to decrypt the data; rather, on each startup, MariaDB connects to AWS KMS and has the service decrypt the locally-stored key(s). The decrypted key is stored in-memory as long as the MariaDB server process is running, and that in-memory decrypted key is used to encrypt the local data.
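As a rough sketch of what this looks like in practice, the MariaDB AWS KMS plugin and InnoDB encryption are enabled through my.cnf options along these lines; the key alias is a placeholder and you should verify the exact option names against the MariaDB documentation for your version:

[mysqld]
plugin_load_add = aws_key_management
aws_key_management_master_key_id = alias/galera-encryption-key   # placeholder KMS key alias
aws_key_management_region = eu-central-1
innodb_encrypt_tables = ON
innodb_encrypt_log = ON
innodb_encryption_threads = 4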

Alternatively, when deploying your EC2 instances, you can encrypt your data storage volume with EBS (Elastic Block Storage) or encrypt the instance itself. Encryption is supported for all EBS volume types; it might have a performance impact, but the latency is minimal or even invisible to end users. For EC2 instance-type encryption, most of the large instances are supported, so if you're using compute or memory optimized nodes, you can leverage its encryption.

Below is the list of supported instance types...

  • General purpose: A1, M3, M4, M5, M5a, M5ad, M5d, T2, T3, and T3a
  • Compute optimized: C3, C4, C5, C5d, and C5n
  • Memory optimized: cr1.8xlarge, R3, R4, R5, R5a, R5ad, R5d, u-6tb1.metal, u-9tb1.metal, u-12tb1.metal, X1, X1e, and z1d
  • Storage optimized: D2, h1.2xlarge, h1.4xlarge, I2, and I3
  • Accelerated computing: F1, G2, G3, P2, and P3

You can setup your AWS account to always enable encryption upon deployment of your EC2-type instances. This means that AWS will encrypt new EBS volumes on launch and encrypts new copies of unencrypted snapshots.
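The account-level default can be toggled with the AWS CLI as well; a minimal sketch for the Frankfurt region used earlier in this series would be:

$ aws ec2 enable-ebs-encryption-by-default --region eu-central-1
$ aws ec2 get-ebs-encryption-by-default --region eu-central-1   # verify that the default is now enabled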

Multi-AZ/Multi-Region/Multi-Cloud Deployments

Unfortunately, as of this writing, there's no direct support in the AWS Console (nor in any of the AWS APIs) for Multi-AZ/-Region/-Cloud deployments of Galera node clusters.

High Availability, Scalability, and Redundancy

To achieve a multi-AZ deployment, it's recommended that you provision your Galera nodes in different availability zones. This prevents the cluster from going down or malfunctioning due to lack of quorum.

You can also set up AWS Auto Scaling and create an auto scaling group to monitor and do status checks so your cluster will always be redundant, scalable, and highly available. Auto Scaling should solve your problem in the case that your node goes down for some unknown reason.

For multi-region or multi-cloud deployments, Galera has its own parameter called gmcast.segment, which you can set upon server start. This parameter is designed to optimize the communication between the Galera nodes and minimize the amount of traffic sent between network segments, including writeset relaying and IST and SST donor selection.
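A minimal configuration sketch: all nodes within the same data center or region share a segment number, set through wsrep_provider_options in my.cnf (the values 0 and 1 below are arbitrary labels):

[mysqld]
# nodes in the primary region
wsrep_provider_options = "gmcast.segment=0"

# nodes in the secondary (remote) region would use a different value, e.g.
# wsrep_provider_options = "gmcast.segment=1"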

This type of setup allows you to deploy multiple nodes in different regions for your Galera Cluster. Aside from that, you can also deploy your Galera nodes on a different vendor, for example, if it's hosted in Google Cloud and you want redundancy on Microsoft Azure. 

I would recommend you to check out our blog Multiple Data Center Setups Using Galera Cluster for MySQL or MariaDB and Zero Downtime Network Migration With MySQL Galera Cluster Using Relay Node to gather more information on how to implement these types of deployments.

Database Performance on AWS

Depending on your application demand, if your queries are memory consuming, the memory optimized instances are your ideal choice. If your application has a high transaction rate and requires high performance for web servers or batch processing, then choose the compute optimized instances. If you want to learn more about optimizing your Galera Cluster, you can check out this blog How to Improve Performance of Galera Cluster for MySQL or MariaDB.

Database Backups on AWS

Creating backups can be difficult since there's no direct support within AWS that is specific to MySQL Galera technology. However, AWS provides you a disaster recovery solution using EBS Snapshots. You can take snapshots of the EBS volumes attached to your instances, then either schedule the backups using CloudWatch or use the Amazon Data Lifecycle Manager (Amazon DLM) to automate the snapshots.

Take note that the snapshots taken are incremental backups, which means that only the blocks on the device that have changed after your most recent snapshot are saved. You can store these snapshots in AWS S3 to save storage costs. Alternatively, you can use external tools like Percona XtraBackup and Mydumper (for logical backups) and store these to AWS EFS -> AWS S3 -> AWS Glacier.
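For instance, a one-off snapshot of a node's data volume can be taken with the AWS CLI; the volume ID below is a placeholder, and in practice you would point it at the EBS volume holding the Galera datadir:

$ aws ec2 create-snapshot \
    --volume-id vol-0123456789abcdef0 \
    --description "galera-node1 data volume backup"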

You can also setup Lifecycle Management in AWS if you need your backup data to be stored in a more cost efficient manner. If you have large files and are going to utilize the AWS EFS, you can leverage their AWS Backup solution as this is also a simple yet cost-effective solution.

On the other hand, you can also use external services, such as ClusterControl, which provides both monitoring and backup solutions. Check this out if you want to know more.

Database Monitoring on AWS

AWS offers health checks and some status checks to provide you visibility into your Galera nodes. This is done through CloudWatch and CloudTrail.

CloudTrail lets you enable and inspect the logs and perform audits based on what actions and traces have been made. 

CloudWatch lets you collect and track metrics, collect and monitor log files, and set custom alarms. You can set it up according to your custom needs and gain system-wide visibility into resource utilization, application performance, and operational health. CloudWatch comes with a free tier as long as you still fall within its limits (See the screenshot below.)

CloudWatch also comes with a price depending on the volume of metrics being distributed. Check out its current pricing here.

Take note: there's a downside to using CloudWatch. It is not designed to monitor database health, especially MySQL Galera cluster nodes. Alternatively, you can use external tools that offer high-resolution graphs or charts that are useful for reporting and are easier to analyze when diagnosing a problematic node.

For this you can use PMM by Percona, DataDog, Idera, VividCortex, or our very own ClusterControl (monitoring is FREE with ClusterControl Community). I would recommend that you use a monitoring tool that suits your needs based on your individual application requirements. It's very important that your monitoring tool is able to notify you aggressively, provide integration with instant messaging systems such as Slack or PagerDuty, or even send you an SMS when escalating a severe health status.

Database Security on AWS

Securing your EC2 instances is one of the most vital parts of deploying your database into the public cloud. You can set up a private subnet and the required security groups to only allow the ports and source IPs needed by your setup. You can configure your database nodes without remote access and just set up a jump host, or an Internet Gateway if the nodes require internet access to fetch or update software packages. You can read our previous blog Deploying Secure Multicloud MySQL Replication on AWS and GCP with VPN on how we set this up.
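As a quick illustration, a Galera node typically needs 3306 (MySQL), 4567 (group communication), 4568 (IST) and 4444 (SST) opened only to the other cluster members; with the AWS CLI, a placeholder security group ID, and a placeholder VPC CIDR, that could look like:

$ aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 --protocol tcp --port 3306 --cidr 10.10.0.0/16
$ aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 --protocol tcp --port 4444 --cidr 10.10.0.0/16
$ aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 --protocol tcp --port 4567-4568 --cidr 10.10.0.0/16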

In addition to this, you can secure your data in transit by using a TLS/SSL connection and encrypt your data when it's at rest. If you're using ClusterControl, securing data in transit is simple and easy. You can check out our blog SSL Key Management and Encryption of MySQL Data in Transit if you want to try it out. For data at rest, storing your data in S3 can be encrypted using AWS Server-Side Encryption, or you can use AWS KMS, which I discussed earlier. Check this external blog on how to set up and leverage a MariaDB Cluster using AWS KMS so you can store your data securely at rest.

Galera Cluster Troubleshooting on AWS

AWS CloudWatch can help, especially when investigating and checking the system metrics. You can check the network, CPU, memory, and disk, and the instance or compute usage and balance. This might not, however, meet your requirements when digging into a specific case.

CloudTrail can provide solid traces of the actions that have been performed within your specific AWS account. This will help you determine whether the occurrences aren't coming from MySQL Galera, but might instead be some bug or issue within the AWS environment (such as the hypervisor having issues on the host machine where your instance, as the guest, is being hosted).

If you're using ClusterControl, by going to Logs -> System Logs you'll be able to browse the captured error logs taken from the MySQL Galera node itself. Apart from this, ClusterControl provides real-time monitoring that would amplify your alarm and notification system in case of an emergency or if your MySQL Galera node(s) is kaput.

Conclusion

AWS does not have native support for a MySQL Galera Cluster setup, unlike AWS RDS which has MySQL compatibility. Because of this, most of the recommendations or opinions on running a Galera Cluster for production use within the AWS environment are based on experienced and well-tested environments that have been running for a very long time.

MariaDB Cluster is quite productive here, as MariaDB consistently provides solid support for the AWS technology stack. In the upcoming MariaDB 10.5 release, they will offer support for an S3 storage engine, which may be worth the wait.

External tools can help you manage and control your MySQL Galera Cluster running on the AWS Cloud, so it's not a huge concern if you have some dilemmas and FUD about why you should run or shift to the AWS Cloud Platform.

AWS might not be the one-size-fits-all solution in some cases, but it provides a wide array of solutions that you can customize and tailor to fit your needs.

In the next part of this blog series, we'll look at another public cloud platform, namely Google Cloud, and see how we can leverage it if we choose to run our Galera Cluster on their platform.

How to Automate Deployment of MySQL Galera Cluster using s9s CLI and Chef


In our previous blog, we showed how devops can automate your daily database tasks with Chef. Now, let's see how we can quickly deploy a MySQL Galera Cluster with Chef using s9s CLI.

Setting up a Galera Cluster manually can be fast for an experienced DBA. Automating it with something like Ansible may take a hell of a lot longer, as our colleague Krzysztof found out in this blog. While automating deployment of a database cluster with Chef is certainly doable, it is not an easy task as you can end up with hundreds of lines of code which can be hard to maintain when there are updates to the software. We will demonstrate here how you can integrate the s9s CLI in your deployment cookbook and speed up the process.

About the s9s CLI

s9s is the CLI tool for ClusterControl. You can use it to integrate with your runbooks for automation tools like Ansible, Puppet, Chef or Salt. This allows you to easily integrate database management functionality into your orchestration scripts. The command line tool allows you to interact with, control and manage your database infrastructure using ClusterControl. For instance, we used it here to automate deployment when running ClusterControl and Galera Cluster on Docker Swarm. It is noteworthy that this tool is open source, so you can freely use or contribute to it.

How to get the s9s CLI

The CLI can be installed by adding the s9s tools repository and using a package manager, or compiled from source. The current installation script to install ClusterControl, install-cc, will automatically install the command line client. The command line client can also be installed on another computer or workstation for remote management. More information in our documentation.

Some of the things you can do from the CLI:

  • Deploy and manage database clusters
    • MySQL and MariaDB
    • PostgreSQL
    • MongoDB to be added soon
    • TimescaleDB
  • Monitor your databases
    • Status of nodes and clusters
    • Cluster properties can be extracted
    • Gives detailed enough information about your clusters
  • Manage your systems and integrate with DevOps tools
    • Create, stop or start clusters
    • Add, remove, or restart nodes in the cluster
    • Create database users (CREATE USER, GRANT privileges to user)
      • Users created in the CLI are traceable through the system
    • Create load balancers (HAProxy, ProxySQL)
    • Create and Restore backups
    • Use maintenance mode
    • Conduct configuration changes of db nodes
    • Integrate with existing deployment automation
      • Ansible, Puppet, Chef or Salt, …
    • Integrate with chatbots like Slack, FlowDock and Hipchat

Automating your Galera Cluster Setup

In our previous blog, we discussed the flow of Chef, in which you have to set up your workstation, Chef server, and the nodes/clients. Now, let's first look at the diagram for this setup for automating our Galera Cluster deployment. Check it out below:

The workstation serves here as your development machine, where you write your code. You then push the cookbooks/recipes to the Chef Server, which runs them on the target node, which is the ClusterControl host in this setup. This target ClusterControl host must be a clean/dedicated host. As mentioned earlier, we'll be using s9s-tools to leverage the installation and setup of the Galera nodes without writing lots of lines of code. Instead, code it like a boss.

Here are the prerequisites:

  • Workstation/Node
  • Chef Server
  • ClusterControl controller/server - The controller here is a requirement for our s9s CLI to operate. Our community version lets you deploy your database clusters.
  • 3 Galera nodes. For this setup, I have the following IPs: 192.168.70.70, 192.168.70.80, 192.168.70.100
  • Set up the ClusterControl OS user's public key on all of the targeted Galera nodes to avoid SSH errors later on.

Integrating ClusterControl into your automation tools

Installing ClusterControl can be done in several ways. You can use the package manager such as your favorite yum or apt and use our repository, you can use install-cc, or you can use our automation scripts for Puppet, Ansible, or Chef.

For the purpose of this blog, we will use the S9s_cookbook and integrate the automation process for our Galera Cluster setup. There are two ways to obtain the S9s_cookbook: you can use GitHub (https://github.com/severalnines/S9s_cookbooks) or fetch it from the marketplace using knife. We'll use the marketplace.

  1. In your workstation, download the cookbook using Chef's knife tool, and uncompress the tar ball.

    $ cd ~/dba-tasks-repo/cookbooks
    $ knife cookbook site download clustercontrol
    $ tar -xzvf clustercontrol-*
    $ unlink clustercontrol-*.tar.gz
  2. Then run the s9s_helper.sh script located in the clustercontrol cookbook.

    $ cd ~/dba-tasks-repo/cookbooks/clustercontrol/files/default
    $ ./s9s_helper.sh

    For example, you'll see the following as you run the script:

    [vagrant@node2 default]$ ./s9s_helper.sh 
    ==============================================
    Helper script for ClusterControl Chef cookbook
    ==============================================
    
    ClusterControl will install a MySQL server and setup the MySQL root user.
    Enter the password for MySQL root user [password] : R00tP@55
    
    ClusterControl will create a MySQL user called 'cmon' for automation tasks.
    Enter the password for user cmon [cmon] : cm0nP@55
    
    Generating config.json..
    {
        "id" : "config",
        "mysql_root_password" : "R00tP@55",
        "cmon_password" : "cm0nP@55",
        "clustercontrol_api_token" : "f38389ba5d1cd87a1aa5f5b1c15b3ca0ee5a2b0f"
    }
    
    Data bag file generated at /home/vagrant/dba-tasks-repo/cookbooks/clustercontrol/files/default/config.json
    To upload the data bag, you can use following command:
    $ knife data bag create clustercontrol
    $ knife data bag from file clustercontrol /home/vagrant/dba-tasks-repo/cookbooks/clustercontrol/files/default/config.json
    
    ** We highly recommend you to use encrypted data bag since it contains confidential information **
  3. Then create a data bag as per the instructions in the last output of the s9s_helper.sh script:

    [vagrant@node2 clustercontrol]$ knife data bag create clustercontrol
    Created data_bag[clustercontrol]
    [vagrant@node2 clustercontrol]$ knife data bag from file clustercontrol ~/dba-tasks-repo/cookbooks/clustercontrol/files/default/config.json 
    Updated data_bag_item[clustercontrol::config]
  4. Before you upload to the Chef Server, ensure that you have content similar to the following in templates/default/configure_cmon_db.sql.erb:

    SET SQL_LOG_BIN=0;
    GRANT ALL PRIVILEGES ON *.* TO 'cmon'@'127.0.0.1' IDENTIFIED BY '<%= node['cmon']['mysql_password'] %>' WITH GRANT OPTION;
    GRANT ALL PRIVILEGES ON *.* TO 'cmon'@'localhost' IDENTIFIED BY '<%= node['cmon']['mysql_password'] %>' WITH GRANT OPTION;
    GRANT ALL PRIVILEGES ON *.* TO 'cmon'@'<%= node['ipaddress'] %>' IDENTIFIED BY '<%= node['cmon']['mysql_password'] %>' WITH GRANT OPTION;
    GRANT ALL PRIVILEGES ON *.* TO 'cmon'@'<%= node['fqdn'] %>' IDENTIFIED BY '<%= node['cmon']['mysql_password'] %>' WITH GRANT OPTION;
    REPLACE INTO dcps.apis(id, company_id, user_id, url, token) VALUES (1, 1, 1, 'http://127.0.0.1/cmonapi', '<%= node['api_token'] %>');
    FLUSH PRIVILEGES;
  5. Also ensure in the file recipes/controller.erb that the cmon service will be restarted. See below:

    service "cmon" do
        supports :restart => true, :start => true, :stop => true, :reload => true
        action [ :enable, :restart ]
    end
  6. Upload the cookbook to the Chef Server

    $ cd ~/dba-tasks-repo/cookbooks/
    $ knife cookbook upload clustercontrol
  7. Alternatively, you can create a role and attach it to the node. We'll use roles again, just like in our previous blog. You can do the following, for example:

    1. Create a file called cc_controller.rb in path ~/dba-tasks-repo/cookbooks/clustercontrol with the following contents:

      name "cc_controller"
      description "ClusterControl Controller"
      run_list ["recipe[clustercontrol]"]
    2. Create a role from the file we created as follows:

      [vagrant@node2 clustercontrol]$ knife role from file cc_controller.rb 
      Updated Role cc_controller
    3. Assign the roles to the target nodes/client as follows:

      $ knife node run_list add <cluster_control_host> "role[cc_controller]"

      Where <cluster_control_host> is your ClusterControl controller's hostname.

    4. Verify what nodes the role is attached to and its run list. For example, I have the following:

      [vagrant@node2 clustercontrol]$ knife role show cc_controller
      chef_type:           role
      default_attributes:
      description:         ClusterControl Controller
      env_run_lists:
      json_class:          Chef::Role
      name:                cc_controller
      override_attributes:
      run_list:
        recipe[clustercontrol]
        recipe[db_galera_install]

Now, we're not yet finished. Next, we'll proceed with integrating our Galera Cluster automation Chef cookbook.

Writing our Chef Cookbook for MySQL Galera Cluster

As mentioned earlier, we will be integrating s9s CLI into our automation code. Let's proceed with the steps.

  1. Let's generate a cookbook. Let's name it as db_galera_install,

    $ cd ~/dba-tasks-repo/cookbooks/
    $ chef generate cookbook db_galera_install
  2. Let's also generate the attribute file,

    $  chef generate attribute default
  3. Go to attributes/default.rb and add the following contents in the file

    default['cmon']['s9s_bin'] = '/bin/s9s'
    default['cmon']['galera_cluster_name'] = 'PS-galera_cluster'
    default['cmon']['db_pass'] = 'R00tP@55'
  4. Go to recipes/default.rb and add the following contents in the file

    galera_cluster_name = "#{node['cmon']['galera_cluster_name']}"
    s9s_bin = "#{node['cmon']['s9s_bin']}"
    
    cc_config = data_bag_item('clustercontrol','config')
    db_pass = cc_config['mysql_root_password']
    
    
    bash "install-galera-nodes" do
      user "root"
      code <<-EOH
      #{s9s_bin} cluster --create --cluster-type=galera --nodes="192.168.70.70,192.168.70.80,192.168.70.100" \
        --vendor=percona \
        --provider-version=5.7 \
        --db-admin-passwd='#{db_pass}' \
        --os-user=vagrant \
        --cluster-name='#{galera_cluster_name}' \
        --wait \
        --log
      EOH
    
      not_if "[[ ! -z $(#{s9s_bin} cluster --list --cluster-format='%I' --cluster-name '#{galera_cluster_name}') ]]"
    end

    A little bit of explanation about the code. It uses the s9s cluster --create command to create a Galera type of cluster. The nodes are specified by their IP addresses within the --nodes argument. We also reuse the same password set up for the ClusterControl database via the existing data bag named clustercontrol; you can of course create another data bag if preferred. The rest is self-explanatory, but you can check our documentation here.
    Lastly, the not_if conditional is very important. It ensures that the bash resource, which handles the setup of the Galera cluster, will not be invoked again once the Galera Cluster named PS-galera_cluster has been provisioned.

  5. Since we have it setup, we'll then upload it to the Chef server as follows:

    $ cd ~/dba-tasks-repo/cookbooks/
    $ knife cookbook upload db_galera_install
  6. Let's verify the list of roles and then add the new recipe to the role we set up earlier, namely cc_controller:

    [vagrant@node2 cookbooks]$ knife role list
    cc_controller
    pg_nodes

    Then edit the role by running the command below:

    $ export EDITOR=vi; 
    $ knife role edit cc_controller

    You might have something that looks like the following:

    {
      "name": "cc_controller",
      "description": "ClusterControl Controller",
      "json_class": "Chef::Role",
      "default_attributes": {
    
      },
      "override_attributes": {
    
      },
      "chef_type": "role",
      "run_list": [
        "recipe[clustercontrol]",
        "recipe[db_galera_install]"
      ],
      "env_run_lists": {
    
      }
    }

    You must ensure that the run_list has the following:

    "recipe[clustercontrol]",
    "recipe[db_galera_install]"

That's all and we are ready to roll!

Executing the Runbooks

We're done preparing both the ClusterControl and Galera Cluster cookbooks, and they are ready to be tested. We'll proceed with running them and show the results of our Chef automation.

Go to the target node/client. On my end, I use node9 with IP 192.168.70.90. Then run the command below:

$ sudo chef-client -l info

In my client node, this shows as follows:

Setting and Installing the ClusterControl Server
Installing the Galera Cluster Nodes at Host 192.168.70.70
Installing the Galera Cluster Nodes at Host 192.168.70.80

Once it's all done, you'll have all your ClusterControl setup and the Galera Cluster nodes with ease!
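If you want to double-check the result from the ClusterControl host itself, the same s9s CLI used in the recipe can list what was provisioned:

$ s9s cluster --list --long   # shows the PS-galera_cluster cluster and its state
$ s9s node --list --long      # shows the individual Galera nodes and the controller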

Lastly, here's a screenshot of our Galera Cluster within ClusterControl:

ClusterControl Home Page
Our Galera Overview Dashboard

Conclusion

We have shown you how easy it is to integrate our s9s CLI tool into your common database activities. This is very helpful in organizations where Chef is used as the de facto automation tool. s9s can make you more productive by automating your daily database administrative work.

Comparing Galera Cluster Cloud Offerings: Part Two Google Cloud Platform (GCP)


In our last blog we discussed the offerings available within Amazon Web Services (AWS) when running a MySQL Galera Cluster. In this blog, we'll continue the discussion by looking further at what the offerings are for running the same clustering technology, but this time on the Google Cloud Platform (GCP).

GCP, as an alternative to AWS, has been continuously attracting applications suited for DevOps by offering support for a wide array of full-stack technologies, containerized applications, and large production database systems. Google Cloud is a full-blown, battle-tested environment which powers its own hardware infrastructure at Google for products like YouTube and Gmail.

GCP has gained traction largely because of its ever-growing list of capabilities. It offers support for platforms like Visual Studio, Android Studio, Eclipse, Powershell and many others. GCP has one of the largest and most advanced computer networks and it provides access to numerous tools that help you focus on building your application. 

Another thing that attracts customers to migrate, import, or use Google Cloud is their strong support and solutions for containerization. Kubernetes (GKE: Google Kubernetes Engine) is built on their platform. 

GCP has also recently launched a new solution called Anthos. This product is designed to let organizations manage workloads using the same interface on the Google Cloud Platform (GCP) or on-premises using GKE On-Prem, and even on rival clouds such as Amazon Web Services (AWS) or Azure. 

In addition to these technologies, GCP offers sophisticated and powerful, compute-optimized machine types like the C2 family in GCE which is built on the latest generation Intel Scalable Processors (Cascade Lake).

GCP continues to support open source as well, which benefits users by providing a well-supported and straightforward framework that makes it easy to deliver a final product in a timely manner. Despite this support of open source technology, GCP does not provide native support for the deployment or configuration of a MySQL Galera Cluster. In this blog we will show you the only option available to you if you wish to use this technology: deployment via a compute instance which you have to manage yourself.

The Google Compute Engine (GCE)

GCE has a sophisticated and powerful set of compute nodes available for your consumption. Unlike AWS, GCE has the most powerful compute node available on the market (n1-ultramem-160, with 160 vCPUs and 3.75 TB of memory). GCE also recently introduced a new compute instance family called the C2 machine type. Built on the latest generation of Intel Scalable Processors (Cascade Lake), C2 machine types offer up to 3.8 GHz sustained all-core turbo and provide full transparency into the architecture of the underlying server platforms, letting you fine-tune the performance. C2 machine types offer much more computing power, run on a newer platform, and are generally more robust for compute-intensive workloads than the N1 high-CPU machine types. C2 family offerings are limited (as of the time of writing) and not available in all regions and zones. C2 also does not support regional persistent disks, though these would be a great add-on for stateful database services that require redundancy and high availability. The resources of a C2 instance are overkill for a Galera node, so we'll focus on the general compute node families instead, which are ideal.

GCE also uses KVM as its virtualization technology, whereas Amazon uses Xen. Let's take a look at the compute nodes available in GCE which are suitable for running Galera, alongside their equivalents in AWS EC2. Prices differ based on region, but for this chart we use the us-east region with on-demand pricing for AWS.

 

Machine/Instance Type: Shared

  • Google Compute Engine: f1-micro, g1-small (prices start at $0.006 - $0.019 hourly)
  • AWS EC2: t2.nano – t3.2xlarge (prices start at $0.0058 - $0.3328 hourly)

Machine/Instance Type: Standard

  • Google Compute Engine: n1-standard-1 – n1-standard-96 (prices start at $0.034 - $3.193 hourly)
  • AWS EC2: m4.large – m4.16xlarge, m5.large – m5d.metal (prices start at $0.1 - $5.424 hourly)

Machine/Instance Type: High Memory / Memory Optimized

  • Google Compute Engine: n1-highmem-2 – n1-highmem-96, n1-megamem-96, n1-ultramem-40 – n1-ultramem-160 (prices start at $0.083 - $17.651 hourly)
  • AWS EC2: r4.large – r4.16xlarge, x1.16xlarge – x1.32xlarge, x1e.xlarge – x1e.32xlarge (prices start at $0.133 - $26.688 hourly)

Machine/Instance Type: High CPU / Storage Optimized

  • Google Compute Engine: n1-highcpu-2 – n1-highcpu-32 (prices start at $0.05 - $2.383 hourly)
  • AWS EC2: h1.2xlarge – h1.16xlarge, i3.large – i3.metal, i3en.large – i3en.metal, d2.xlarge – d2.8xlarge (prices start at $0.156 - $10.848 hourly)

GCE has fewer predefined compute node types to choose from than AWS. When it comes to the type of node, however, it has more granularity, which makes it easier to set up and choose the kind of instance you want to use. For example, you can add a disk and set its physical block size (4 KB is the default, or 16 KB), or set its mode to either read/write or read-only. This allows you to shape the right type of machine or compute instance to manage your Galera node. You may also instantiate your compute nodes using the Cloud SDK or the Cloud APIs to automate provisioning or integrate it into your Continuous Integration, Delivery, or Deployment (CI/CD) pipelines.
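As a small sketch of that automation angle (the instance names, zone, machine type, and image family below are illustrative placeholders), three Galera-sized instances can be created in one gcloud call:

$ gcloud compute instances create galera-node-1 galera-node-2 galera-node-3 \
    --zone=europe-west3-a \
    --machine-type=n1-standard-4 \
    --image-family=centos-7 \
    --image-project=centos-cloud \
    --boot-disk-size=50GB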

Pricing (Compute Instance, Disk, vCPU, Memory, and Network)

The price also depends on the region where the instance is located, the type of OS or licensing (RHEL vs SUSE Linux Enterprise), and the type of disk storage you're using.

GCP also offers discounts which allow you to economize your resource consumption. For Compute Engine, it provides several types of discounts.

Sustained use discounts are applied automatically to eligible Compute Engine resources the longer they run within a billing month. Take note that sustained use discounts do not apply to VMs created using the App Engine Flexible Environment or Cloud Dataflow.

You can also use Committed Use Discounts when you purchase VMs bound to a contract. This type of choice is ideal for predictable workloads and resource needs. When you purchase a committed use contract, you purchase a certain amount of vCPUs, memory, GPUs, and local SSDs at a discounted price in return for committing to paying for those resources for 1 year or 3 years. The discount is up to 57% for most resources like machine types or GPUs, and up to 70% for memory-optimized machine types. Once purchased, you are billed monthly for the resources you purchased for the duration of the term you selected (whether you use the services or not).

A preemptible VM is an instance that you can create and run at a much lower price than normal instances. Compute Engine may, however, terminate (preempt) these instances if it requires access to those resources for other tasks. Preemptible instances use excess Compute Engine capacity, so their availability varies with usage.

If your applications are fault-tolerant and can withstand possible instance preemptions, then preemptible instances can reduce your Compute Engine costs significantly. For example, batch processing jobs can run on preemptible instances. If some of those instances terminate during processing, the job slows but does not completely stop. Preemptible instances complete your batch processing tasks without placing additional workload on your existing instances, and without requiring you to pay full price for additional normal instances.

For Compute Engine, disk size, machine type memory, and network usage are calculated in gigabytes (GB), where 1 GB is 2^30 bytes. This unit of measurement is also known as a gibibyte (GiB). This means that GCP only bills you for the resources you have actually allocated.

Now, if you have a high-grade, production database application, it's recommended (and ideal) to attach or add a separate persistent disk. You would then use that disk as your database volume, as it offers you reliable and consistent disk performance in GCE. The larger the size you set up, the higher the IOPS it offers. Check out the persistent disk pricing list to determine the price you would get. In addition to this, GCE has regional persistent disks, which are suitable in case you require more solid and sustainable high availability within your database cluster. A regional persistent disk adds more redundancy in the case that your instance terminates, crashes, or becomes corrupted. It provides synchronous replication of data between two zones in one region, which happens transparently to the VM instance. In the unlikely event of a zone failure, your workload can fail over to another VM instance in the same, or a secondary, zone. You can then force-attach your regional persistent disk to that instance. Force-attach typically takes less than one minute.
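A hedged sketch of creating such a regional persistent disk and attaching it with gcloud (the names, region, and zones are placeholders, and flag details may vary slightly between gcloud versions):

$ gcloud compute disks create galera-data-disk \
    --region=europe-west3 \
    --replica-zones=europe-west3-a,europe-west3-b \
    --size=200GB \
    --type=pd-ssd

$ gcloud compute instances attach-disk galera-node-1 \
    --disk=galera-data-disk \
    --disk-scope=regional    # add --force-attach when failing over to a replacement instance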

If you store backups as part of your disaster recovery solution and require a volume that is cluster-wide, GCP offers Cloud Filestore, NetApp Cloud Volumes, and some other alternative file-sharing solutions. These are fully-managed services that offer standard and premium tiers. You can check out NetApp's pricing page here and Filestore pricing here.

Galera Encryption on GCP

GCP does not include specific support for the type of encryption available for Galera. GCP, however, encrypts customer data stored at rest by default, with no additional action required from you. GCP also offers the option to encrypt your data using customer-managed encryption keys (CMEK) with Cloud KMS, as well as with customer-supplied encryption keys (CSEK). GCP also uses SSL/TLS encryption for all communications as data moves between your site and the cloud provider or between two services. This protection is achieved by encrypting the data before transmission, authenticating the endpoints, and decrypting and verifying the data on arrival.

Because Galera uses MySQL under the hood (Percona, MariaDB, or Codership build), you can take advantage of the File Key Management Encryption Plugin by MariaDB or by using the MySQL Keyring plugins. Here's an external blog by Percona which is a good resource on how you can implement this.

Galera Cluster Multi-AZ/Multi-Region/Multi-Cloud Deployments with GCP

Similarly to AWS, GCP does not offer direct support to deploy a Galera cluster on a Multi-AZ/-Region/-Cloud.

Galera Cluster High Availability, Scalability, and Redundancy on GCP

One of the primary reasons to use a Galera cluster is its high availability, redundancy, and ability to scale. If you are serving traffic globally, it's best to serve traffic by region, with an architectural design that includes geo-distribution of your database nodes. To achieve this, multi-AZ and multi-region or multi-cloud/multi-datacenter deployment is recommended and achievable. This prevents the cluster from going down or malfunctioning due to lack of quorum.

To help further with your scalability design, GCP also has an autoscaler you can set up with an autoscaling group. This will work as long as you created your cluster as managed instance groups. For example, you can monitor CPU utilization, or rely on the metrics from Stackdriver defined in your autoscaling policy. This allows you to provision and automate instances when a certain threshold is reached, or terminate the instances when utilization goes back to its normal state.
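For reference, attaching an autoscaling policy to a managed instance group is a one-liner with gcloud; this is only a sketch (the group name, zone, and thresholds are placeholders), and note that it automates the instance layer only; joining a fresh instance to the Galera cluster still needs your own provisioning step or a tool like ClusterControl:

$ gcloud compute instance-groups managed set-autoscaling galera-mig \
    --zone=europe-west3-a \
    --min-num-replicas=3 \
    --max-num-replicas=5 \
    --target-cpu-utilization=0.75 \
    --cool-down-period=120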

For multi-region or multi-cloud deployments, Galera has its own parameter called gmcast.segment, which you can set upon server start. This parameter is designed to optimize the communication between the Galera nodes and minimize the amount of traffic sent between network segments. This includes writeset relaying and IST and SST donor selection. This type of setup allows you to deploy multiple nodes in different regions. Aside from that, you can also deploy your Galera nodes with a different cloud vendor, spanning GCP, AWS, Microsoft Azure, or on-premises.

We recommend you to check out our blog Multiple Data Center Setups Using Galera Cluster for MySQL or MariaDB and Zero Downtime Network Migration With MySQL Galera Cluster Using Relay Node to gather more information on how to implement these types of deployments.

Galera Cluster Database Performance on GCP

Since there's no native support for Galera in GCP, your choices depend on the requirements and design of your application's traffic and resource demands. For queries that are heavy on memory consumption, you can start with an n1-highmem-2 instance. High-CPU instances (the n1-highcpu* family) can be a good fit for a high-transaction database, or for gaming applications.

Choosing the right storage and required IOPS for your database volume is a must. Generally, an SSD-based persistent disk is your choice here. Depending on the volume of traffic required, you might have to check out the GCP storage options so you can determine the right size for your application.

We also recommend you to check and read our blog How to Improve Performance of Galera Cluster for MySQL or MariaDB to learn more about optimizing your Galera Cluster.

Galera Data Backups on GCP

Not only does your MySQL Galera data have to be backed up, you should also back up the entire tier which comprises your database application. This includes log files (logical or binary), external files, temporary files, dump files, etc. Google recommends that you always create snapshots of the persistent disk volumes being used by your GCE instances. You can easily create and schedule snapshots. GCP snapshots are stored in Cloud Storage and you can select the desired location or region where the backup will be located. You can also set up a schedule for your snapshots as well as a snapshot retention policy.
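A minimal gcloud sketch, with placeholder disk, zone, and schedule names (exact flags may differ slightly between gcloud versions):

$ gcloud compute disks snapshot galera-data-disk \
    --zone=europe-west3-a \
    --snapshot-names=galera-data-adhoc-backup

$ gcloud compute resource-policies create snapshot-schedule daily-galera-backup \
    --region=europe-west3 \
    --daily-schedule \
    --start-time=04:00 \
    --max-retention-days=14

$ gcloud compute disks add-resource-policies galera-data-disk \
    --zone=europe-west3-a \
    --resource-policies=daily-galera-backup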

You can also use external services, like ClusterControl, which provides you both monitoring and backup solutions. Check this out if you want to know more.

Galera Cluster Database Monitoring on GCP

GCP does not offer database monitoring when using GCE. Monitoring your instance health can be done through Stackdriver. For the database, though, you will need to grab an external monitoring tool which has advanced, highly-granular database metrics. There are a lot of choices you can pick from, such as PMM by Percona, DataDog, Idera, VividCortex, or our very own ClusterControl (monitoring is FREE with ClusterControl Community).

Galera Cluster Database Security on GCP

As discussed in our previous blog, you can take the same approach for securing your database in the public cloud. In GCP you can set up a private subnet and firewall rules to only allow the ports required for running Galera (particularly ports 3306, 4444, 4567, and 4568). You can use a NAT gateway or set up a bastion host to access your private database nodes. When these nodes are encapsulated they cannot be accessed from outside of the GCP premises. You can read our previous blog Deploying Secure Multicloud MySQL Replication on AWS and GCP with VPN on how we set this up.
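A hedged example of such a firewall rule with gcloud, assuming a placeholder VPC name and internal source range:

$ gcloud compute firewall-rules create galera-internal \
    --network=my-private-vpc \
    --allow=tcp:3306,tcp:4444,tcp:4567-4568 \
    --source-ranges=10.10.0.0/16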

In addition to this, you can secure your data in transit by using a TLS/SSL connection or by encrypting your data when it's at rest. If you're using ClusterControl, securing data in transit is simple and easy. You can check out our blog SSL Key Management and Encryption of MySQL Data in Transit if you want to try it out. For data at rest, you can follow the discussion from the Encryption section earlier in this blog.

Galera Cluster Troubleshooting 

GCP offers Stackdriver Logging which you can leverage to help you with observability, monitoring, and notification requirements. The great thing about Stackdriver Logging is that it offers integration with AWS. With it you can catch the events selectively and then raise an alert based on that event. This can keep you in the loop on certain issues which may arise and help you during troubleshooting. GCP also has Cloud Audit Logs which provide you more traceable information from inside the GCP environment, from admin activity, data access, and system events. 

If you're using ClusterControl, by going to Logs -> System Logs you'll be able to browse the captured error logs taken from the MySQL Galera node itself. Apart from this, ClusterControl provides real-time monitoring that would amplify your alarm and notification system in case of an emergency or if your MySQL Galera node(s) is kaput.

Conclusion

The Google Cloud Platform offers a wide variety of efficient and powerful services that you can leverage. There are indeed pros and cons for each of the public cloud platforms, but GCP proves that AWS doesn't have a lock on the cloud.

It's interesting that big companies such as Vimeo are moving to GCP from on-premise setups, and they have seen some notable results in their technology stack. Bloomberg is also happy with GCP and is using Percona XtraDB Cluster (a Galera variant). Let us know what you think about using GCP for MySQL Galera setups in the comments below.


Understanding the Effects of High Latency in High Availability MySQL and MariaDB Solutions


High availability means that the system is working and responding according to the business needs for a high percentage of the time. For production database systems it is typically the highest priority to keep it close to 100%. We build database clusters to eliminate all single points of failure. If an instance becomes unavailable, another node should be able to take the workload and carry on from there. In a perfect world, a database cluster would solve all of our system availability problems. Unfortunately, while all may look good on paper, the reality is often different. So where can it go wrong?

Transactional database systems come with sophisticated storage engines. Keeping data consistent across multiple nodes makes this task much harder. Clustering introduces a number of new variables that depend highly on the network and underlying infrastructure. It is not uncommon for a standalone database instance that was running fine on a single node to suddenly perform poorly in a cluster environment.

Among the many things that can affect cluster availability, latency issues play a crucial role. But what is latency? Is it only related to the network?

The term "latency" actually refers to several kinds of delays incurred in the processing of data. It’s how long it takes for a piece of information to move from stage to another.

In this blog post, we’ll look at the two main high availability solutions for MySQL and MariaDB, and how they can each be affected by latency issues.

At the end of the article, we take a look at modern load balancers and discuss how they can help you address some types of latency issues.

In a previous article, my colleague Krzysztof Książek wrote about "Dealing with Unreliable Networks When Crafting an HA Solution for MySQL or MariaDB". You will find tips which can help you to design your production ready HA architecture, and avoid some of the issues described here.

Master-Slave Replication for High Availability

MySQL master-slave replication is probably the most popular database cluster type on the planet. One of the main things you want to monitor while running your master-slave replication cluster is the slave lag. Depending on your application requirements and the way you utilize your database, the replication latency (slave lag) may determine whether the data can be read from the slave node or not. Data committed on the master but not yet available on an asynchronous slave means that the slave has an older state. When it’s not ok to read from a slave, you would need to go to the master, and that can affect application performance. In the worst case scenario, your system will not be able to handle the entire workload on the master alone.

Slave lag and stale data

To check the status of master-slave replication, you should start with the command below:

SHOW SLAVE STATUS\G
MariaDB [(none)]> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 10.0.3.100
                  Master_User: rpl_user
                  Master_Port: 3306
                Connect_Retry: 10
              Master_Log_File: binlog.000021
          Read_Master_Log_Pos: 5101
               Relay_Log_File: relay-bin.000002
                Relay_Log_Pos: 809
        Relay_Master_Log_File: binlog.000021
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 5101
              Relay_Log_Space: 1101
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 3
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
                   Using_Gtid: Slave_Pos
                  Gtid_IO_Pos: 0-3-1179
      Replicate_Do_Domain_Ids: 
  Replicate_Ignore_Domain_Ids: 
                Parallel_Mode: conservative
1 row in set (0.01 sec)

Using the above information you can determine how good the overall replication latency is. The lower the value you see in "Seconds_Behind_Master", the better the data transfer speed for replication.

Another way to monitor slave lag is to use ClusterControl replication monitoring. In this screenshot we can see the replication status of an asynchronous Master-Slave (2x) Cluster with ProxySQL.

There are a number of things that can affect replication time. The most obvious is the network throughput and how much data you can transfer. MySQL comes with multiple configuration options to optimize the replication process. The essential replication-related parameters are:

  • Parallel apply
  • Logical clock algorithm
  • Compression
  • Selective master-slave replication
  • Replication mode

Parallel apply

It’s not uncommon to start replication tuning by enabling parallel apply. The reason is that by default MySQL applies the binary log sequentially, while a typical database server comes with several CPU cores to use.

To get around sequential log apply, both MariaDB and MySQL offer parallel replication. The implementation may differ per vendor and version. E.g. MySQL 5.6 offers parallel replication as long as the queries are separated by schema, while MariaDB (starting with version 10.0) and MySQL 5.7 can both handle parallel replication across schemas. Different vendors and versions come with their own limitations and features, so always check the documentation.

Executing queries via parallel slave threads may speed up your replication stream if you are write heavy. However, if you aren’t, it would be best to stick to the traditional single-threaded replication. To enable parallel processing, change slave_parallel_workers to the number of CPU threads you want to involve in the process. It is recommended to keep the value lower than the number of available CPU threads.
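
As a minimal sketch (the thread count of 4 is just an illustration; MySQL uses slave_parallel_workers while MariaDB uses slave_parallel_threads), enabling parallel apply could look like this:

-- MySQL 5.7+: stop the applier, raise the worker count, start it again
STOP SLAVE SQL_THREAD;
SET GLOBAL slave_parallel_workers = 4;
START SLAVE SQL_THREAD;

-- MariaDB 10.0+ equivalent (run with the slave stopped):
-- SET GLOBAL slave_parallel_threads = 4;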

Parallel replication works best with group commits. To check if you have group commits happening, run the following query:

show global status like 'binlog_%commits';

The bigger the ratio between Binlog_commits and Binlog_group_commits, the more transactions are committed in each group, and the better parallel apply will work.
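
On MariaDB, you can let the server compute the ratio for you; this is only a sketch based on the Binlog_commits and Binlog_group_commits status counters mentioned above:

SELECT
  (SELECT VARIABLE_VALUE FROM information_schema.GLOBAL_STATUS
   WHERE VARIABLE_NAME = 'BINLOG_COMMITS') /
  (SELECT VARIABLE_VALUE FROM information_schema.GLOBAL_STATUS
   WHERE VARIABLE_NAME = 'BINLOG_GROUP_COMMITS') AS commits_per_group_commit;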

Logical clock

The slave_parallel_type=LOGICAL_CLOCK setting is an implementation of a Lamport clock algorithm. When using a multithreaded slave, this variable specifies the method used to decide which transactions are allowed to execute in parallel on the slave. The variable has no effect on slaves for which multithreading is not enabled, so make sure slave_parallel_workers is set higher than 0.

MariaDB users should also check the optimistic mode introduced in version 10.1.3 as it may also give you better results.
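
A minimal sketch of both settings, assuming the slave threads are configured as described above and the SQL thread is stopped while you change them:

-- MySQL 5.7+: parallelize based on the logical clock
SET GLOBAL slave_parallel_type = 'LOGICAL_CLOCK';

-- MariaDB 10.1.3+: optimistic parallel replication
SET GLOBAL slave_parallel_mode = 'optimistic';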

GTID

MariaDB comes with its own implementation of GTID. MariaDB’s sequence consists of a domain, a server, and a transaction ID. Domains allow multi-source replication with distinct IDs. Different domain IDs can be used to replicate portions of the data out-of-order (in parallel). As long as this is acceptable for your application, it can reduce replication latency.

A similar technique applies to MySQL 5.7, which can also use multi-source replication and independent replication channels.
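
On MariaDB, assigning a distinct replication domain to each write source is a matter of setting gtid_domain_id on it; the IDs below are arbitrary examples:

-- on the first master
SET GLOBAL gtid_domain_id = 1;
-- on the second master
SET GLOBAL gtid_domain_id = 2;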

Compression

CPU power is getting less expensive over time, so using it for binlog compression could be a good option for many database environments. The slave_compressed_protocol parameter tells MySQL to use compression if both master and slave support it. By default, this parameter is disabled.

Starting from MariaDB 10.2.3, selected events in the binary log can be optionally compressed to save network transfer.
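
A sketch of both knobs (in production you would typically persist them in my.cnf; the compressed slave protocol takes effect when the slave threads reconnect):

SET GLOBAL slave_compressed_protocol = 1;  -- compress the master-slave replication stream
SET GLOBAL log_bin_compress = ON;          -- MariaDB 10.2.3+: compress binlog events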

Replication formats

MySQL offers several replication formats, controlled by binlog_format: statement-based, row-based, and mixed. Choosing the right replication format helps to minimize the time needed to pass data between the cluster nodes.
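
For example, row-based replication (the only format Galera supports, and usually the safest default) can be enabled like this; new sessions pick up the change, or you can persist it in my.cnf:

SET GLOBAL binlog_format = 'ROW';
-- my.cnf equivalent:
-- binlog_format = ROW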

Multimaster Replication For High Availability

Some applications cannot afford to operate on outdated data.

In such cases, you may want to enforce consistency across the nodes with synchronous replication. Keeping data synchronous requires an additional plugin, and for some, the best solution on the market for that is Galera Cluster.

Galera Cluster comes with the wsrep API, which is responsible for transmitting transactions to all nodes and executing them according to a cluster-wide ordering. This will block the execution of subsequent queries until the node has applied all write-sets from its applier queue. While it’s a good solution for consistency, you may hit some architectural limitations. The common latency issues can be related to:

  • The slowest node in the cluster
  • Horizontal scaling and write operations
  • Geolocated clusters
  • High Ping
  • Transaction size

The slowest node in the cluster

By design, the write performance of the cluster cannot be higher than the performance of the slowest node in the cluster. Start your cluster review by checking the machine resources and verifying the configuration files to make sure they all run with the same performance settings.

Parallelization

Parallel threads do not guarantee better performance, but they may speed up the synchronization of new nodes with the cluster. The wsrep_cert_deps_distance status tells us the possible degree of parallelization. It is the average distance between the highest and lowest seqno values that can possibly be applied in parallel. You can use the wsrep_cert_deps_distance status variable to determine the maximum number of slave threads possible.
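
A quick way to check it on any node; the reported value depends entirely on your workload and can be used as an upper bound when sizing wsrep_slave_threads:

SHOW GLOBAL STATUS LIKE 'wsrep_cert_deps_distance';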

Horizontal scaling

By adding more nodes to the cluster, we have fewer points that could fail; however, the information needs to travel across multiple instances until it’s committed, which multiplies the response times. If you need scalable writes, consider an architecture based on sharding. A good solution can be the Spider storage engine.

In some cases, to reduce the information shared across the cluster nodes, you can consider having one writer at a time. It’s relatively easy to implement using a load balancer. When you do this manually, make sure you have a procedure to change the DNS value when your writer node goes down.

Geolocated clusters

Although Galera Cluster is synchronous, it is possible to deploy a Galera Cluster across data centers. Synchronous replication like MySQL Cluster (NDB) implements a two-phase commit, where messages are sent to all nodes in a cluster in a 'prepare' phase, and another set of messages are sent in a 'commit' phase. This approach is usually not suitable for geographically disparate nodes, because of the latencies in sending messages between nodes.

High Ping

Galera Cluster with the default settings does not handle high network latency well. If you have a network with a node that shows a high ping time, consider changing the evs.send_window and evs.user_send_window parameters. These variables define the maximum number of data packets in replication at a time. For WAN setups, the variables can be set to a considerably higher value than the default of 2. It’s common to set them to 512. These parameters are part of wsrep_provider_options.

--wsrep_provider_options="evs.send_window=512;evs.user_send_window=512"

Transaction size

One of the things you need to consider while running Galera Cluster is the size of the transaction. Finding the balance between the transaction size, performance and Galera certification process is something you have to estimate in your application. You can find more information about that in the article How to Improve Performance of Galera Cluster for MySQL or MariaDB by Ashraf Sharif.

Load Balancer Causal Consistency Reads

Even with a minimized risk of data latency issues, standard MySQL asynchronous replication cannot guarantee consistency. It is still possible that data has not yet been replicated to the slave while your application is reading it from there. Synchronous replication can solve this problem, but it has architectural limitations and may not fit your application requirements (e.g., intensive bulk writes). So how can you overcome it?

The first step to avoid reading stale data is to make the application aware of replication delay. This is usually programmed in application code. Fortunately, there are modern database load balancers that support adaptive query routing based on GTID tracking. The most popular are ProxySQL and MaxScale.

ProxySQL 2.0

ProxySQL Binlog Reader allows ProxySQL to know in real time which GTID has been executed on every MySQL server, slaves and master itself. Thanks to this, when a client executes a read that needs causal consistency, ProxySQL immediately knows on which server the query can be executed. If, for whatever reason, the write has not been replicated to any slave yet, ProxySQL will know that the write was executed on the master and send the read there.
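
A hedged sketch of the ProxySQL admin configuration, assuming the ProxySQL Binlog Reader daemon already runs next to each MySQL server (port 3307 and rule_id 10 are placeholders, and hostgroup 0 is assumed to be the writer hostgroup):

-- tell ProxySQL on which port each backend's binlog reader listens
UPDATE mysql_servers SET gtid_port = 3307;
LOAD MYSQL SERVERS TO RUNTIME;

-- route reads matched by rule 10 only to servers that have already applied
-- the GTID of writes sent through hostgroup 0
UPDATE mysql_query_rules SET gtid_from_hostgroup = 0 WHERE rule_id = 10;
LOAD MYSQL QUERY RULES TO RUNTIME;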

Maxscale 2.3

MariaDB introduced causal reads in MaxScale 2.3.0. The way it works is similar to ProxySQL 2.0. Basically, when causal_reads is enabled, any subsequent reads performed on slave servers will be done in a manner that prevents replication lag from affecting the results. If the slave has not caught up with the master within the configured time, the query will be retried on the master.
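
A hedged sketch of the relevant readwritesplit options in maxscale.cnf; the service name is made up, mandatory parameters such as servers and credentials are omitted, and the timeout value is only an example:

[Galera-RW-Split]
type=service
router=readwritesplit
causal_reads=true
causal_reads_timeout=10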

Comparing Galera Cluster Cloud Offerings: Part Three Microsoft Azure

Microsoft Azure is known to many as an alternative public cloud platform to Amazon AWS. It's not easy to directly compare these two giant companies. Microsoft's cloud business -- dubbed commercial cloud -- includes everything from Azure to Office 365 enterprise subscriptions to Dynamics 365 to LinkedIn services. After LinkedIn was acquired by Microsoft it began moving its infrastructure to Azure. While moving LinkedIn to Azure could take some time, it demonstrates Microsoft Azure’s capabilities and ability to handle millions of transactions. Microsoft's strong enterprise heritage, software stack, and data center tools offer both familiarity and a hybrid approach to cloud deployments.

Microsoft Azure is built as an Infrastructure as a Service (IaaS) as well as a Platform as a Service (PaaS). The Azure Virtual Machine offers per-second billing and is currently a multi-tenant compute offering. Microsoft has, however, recently previewed a new offering which allows virtual machines to run on single-tenant physical servers. The offering is called Azure Dedicated Hosts.

Azure also offers specialized large instances (such as for SAP HANA). There is multi-tenant block and file storage, plus many other additional IaaS and PaaS capabilities. These include object storage (Azure Blob Storage), a CDN, a Docker-based container service (Azure Container Service), a batch computing service (Azure Batch), and event-driven “serverless computing” (Azure Functions). The Azure Marketplace offers third-party software and services. Colocation needs are met via partner exchanges (Azure ExpressRoute) offered from partners like Equinix and CoreSite.

With all of these offerings Microsoft Azure has stepped up its game to play a vital role in the public cloud market. The PaaS infrastructure offered to its consumers has garnered a lot of trust and many are moving their own infrastructure or private cloud to Microsoft Azure's public cloud infrastructure. This is especially advantageous for consumers who need integration with other Windows Services, such as Visual Studio.

So what’s different between Azure and the other clouds we have looked at in this series? Microsoft has focused heavily on AI, analytics, and the Internet of Things. AzureStack is another “cloud-meets-data center” effort that has been a real differentiator in the market.

Microsoft Azure Migration Pros & Cons

There are several things you should consider when moving your legacy applications or infrastructure to Microsoft Azure.

Strengths

  • Enterprises that are strategically committed to Microsoft technology generally choose Azure as their primary IaaS+PaaS provider. The integrated end-to-end experience for enterprises building .NET applications using Visual Studio (and related services) is unsurpassed. Microsoft is also leveraging its tremendous sales reach and ability to co-sell Azure with other Microsoft products and services in order to drive adoption.
  • Azure provides a well-integrated approach to edge computing and Internet of Things (IoT), with offerings that reach from its hyperscale data center out through edge solutions such as AzureStack and Data Box Edge.
  • Microsoft Azure’s capabilities have become increasingly innovative and open. 50% of the workloads are Linux-based alongside numerous open-source application stacks. Microsoft has a unique vision for the future that involves bringing in technology partners through native, first-party offerings such as those from VMware, NetApp, Red Hat, Cray and Databricks.

Cautions

  • Microsoft Azure’s reliability issues continue to be a challenge for customers, largely as a result of Azure’s growing pains. Since September 2018, Azure has had multiple service-impacting incidents, including significant outages involving Azure Active Directory. These outages leave customers with no ability to mitigate the downtime.
  • Gartner clients often experience challenges with executing on-time implementations within budget. This comes from Microsoft often providing unreasonably high expectations for customers. Much of this stems from Microsoft’s field sales teams being “encouraged” to appropriately position and sell Azure within its customer base.
  • Enterprises frequently lament the quality of Microsoft technical support (along with the increasing cost of support) and field solution architects. This negatively impacts customer satisfaction, and slows Azure adoption and therefore customer spending.

Microsoft may not be your first choice as it has been seen as a “not-so-open-source-friendly” tech giant, but in fairness it has embraced a lot of open source activity and support. Microsoft Azure offers fully-managed services for most of the top open source RDBMS databases like PostgreSQL, MySQL, and MariaDB.  

Galera Cluster variants (Percona, Codership, or MariaDB), unfortunately, aren't supported by Azure as a managed service. The only way you can deploy your Galera Cluster to Azure is by means of a Virtual Machine. You may also want to check their blog on using MariaDB Enterprise Cluster (which is based on Galera) on Azure.

Azure's Virtual Machine

A Virtual Machine is the equivalent of the compute instance offerings in GCP and AWS. An Azure Virtual Machine is an on-demand, high-performance computing server in the cloud that can be deployed in Azure using various methods. These include the user interface within the Azure portal, pre-configured images in the Azure Marketplace, scripting through Azure PowerShell, deploying from a template defined in a JSON file, or deploying directly through Visual Studio.

Azure uses a deployment model called the Azure Resource Manager (ARM), which defines all resources that form part of your overall application solution, allowing you to deploy, update, or delete your solution in a single operation.

Resources may include the storage account, network configurations, and IP addresses. You may have heard the term “ARM templates”, which essentially means the JSON template which defines the different aspects of your solution which you are trying to deploy.

Azure Virtual Machines come in different types and sizes, with names beginning with A-series to N-series. Each VM type is built with specific workloads or performance needs in mind, including general purpose, compute optimized, storage optimized or memory optimized. You can also deploy less common types like GPU or high performance compute VMs.

Similar to other public cloud offerings, you can do the following in your virtual machine instances...

  • Encrypt your disk on the virtual machine, although this does not come as easily as it does on GCP and AWS. Encrypting your virtual machine requires a more manual approach: it requires you to complete the Azure Disk Encryption prerequisites. Since Galera does not support Windows, we're only talking here about Linux-based images. Basically, it requires you to have the dm-crypt and vfat modules present in the system. Once you get that piece right, you can encrypt the VM using the Azure CLI (see the sketch after this list). You can check out how to Enable Azure Disk Encryption for Linux IaaS VMs to see how to do it. Encrypting your disk is very important, especially if your company or organization requires that your Galera Cluster data must follow the standards mandated by laws and regulations such as PCI DSS or GDPR.
  • Creating a snapshot. You can create a snapshot either using the Azure CLI or through the portal. Check their manual on how to do it.
  • Use auto scaling or Virtual Machine Scale Sets if you require horizontal scaling. Check out the overview of autoscaling in Azure or the overview of virtual machine scale sets.
  • Multi Zone Deployment. Deploy your virtual machine instances into different availability zones to avoid single-point of failure.
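
As a hedged sketch of the disk encryption step mentioned in the first item above (resource group, VM, and Key Vault names are placeholders, and exact flags may vary per Azure CLI version):

az vm encryption enable \
  --resource-group MyResourceGroup \
  --name galera-node-1 \
  --disk-encryption-keyvault MyKeyVault \
  --volume-type ALL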

You can also create (or get information from) your virtual machines in different ways. You can use the Azure portal, Azure PowerShell, REST APIs, Client SDKs, or with the Azure CLI. Virtual machines in the Azure virtual network can also easily be connected to your organization’s network and treated as an extended datacenter.

Microsoft Azure Pricing

Just like other public cloud providers, Microsoft Azure also offers a free tier with some free services. It also offers pay-as-you-go options and reserved instances to choose from. Pay-as-you-go starts at $0.008/hour - $0.126/hour.

Microsoft Azure Pricing

For reserved instances, the longer you commit and contract with Azure, the more you save on the cost. Microsoft Azure claims to help subscribers save up to 72% of their billing costs compared to its pay-as-you-go model when subscribers sign up for a one to three year term for a Windows or Linux Virtual Machine. Microsoft also offers added flexibility in the sense that if your business needs change, you can cancel your Azure RI subscription at any time and return the remaining unused RI to Microsoft for an early termination fee.

Let's check out its pricing in comparison with GCP, AWS EC2, and Azure Virtual Machines. The comparison is based on the us-east1 region and the price ranges for the compute instances required to run your Galera Cluster.

Shared

  • Google Compute Engine: f1-micro, g1-small. Prices start at $0.006 – $0.019 hourly.
  • AWS EC2: t2.nano – t3a.2xlarge. Prices start at $0.0058 – $0.3328 hourly.
  • Microsoft Azure: B-Series. Prices start at $0.0052 – $0.832 hourly.

Standard

  • Google Compute Engine: n1-standard-1 – n1-standard-96. Prices start at $0.034 – $3.193 hourly.
  • AWS EC2: m4.large – m4.16xlarge, m5.large – m5d.metal. Prices start at $0.1 – $5.424 hourly.
  • Microsoft Azure: Av2 Standard, D2-64 v3 latest generation, D2s-64s v3 latest generation, D1-5 v2, DS1-S5 v2, DC-series. Prices start at $0.043 – $3.072 hourly.

High Memory/Memory Optimized

  • Google Compute Engine: n1-highmem-2 – n1-highmem-96, n1-megamem-96, n1-ultramem-40 – n1-ultramem-160. Prices start at $0.083 – $17.651 hourly.
  • AWS EC2: r4.large – r4.16xlarge, x1.16xlarge – x1.32xlarge, x1e.xlarge – x1e.32xlarge. Prices start at $0.133 – $26.688 hourly.
  • Microsoft Azure: D2a – D64a v3, D2as – D64as v3, E2-64 v3 latest generation, E2a – E64a v3, E2as – E64as v3, E2s-64s v3 latest generation, D11-15 v2, DS11-S15 v2, M-series, Mv2-series, Extreme Memory Optimized instances. Prices start at $0.043 – $44.62 hourly.

High CPU/Storage Optimized

  • Google Compute Engine: n1-highcpu-2 – n1-highcpu-32. Prices start at $0.05 – $2.383 hourly.
  • AWS EC2: h1.2xlarge – h1.16xlarge, i3.large – i3.metal, i3en.large – i3en.metal, d2.xlarge – d2.8xlarge. Prices start at $0.156 – $10.848 hourly.
  • Microsoft Azure: Fsv2-series, F-series, Fs-Series. Prices start at $0.0497 – $3.045 hourly.

Data Encryption on Microsoft Azure

Microsoft Azure does not offer encryption support directly for Galera Cluster (or vice-versa). There are, however, ways you can encrypt data either at-rest or in-transit.

Encryption in-transit is a mechanism for protecting data when it's transmitted across networks, and Azure Storage provides several ways to secure it.

Microsoft uses encryption to protect customer data when it’s in-transit between the customers' realm and Microsoft cloud services. More specifically, Transport Layer Security (TLS) is the protocol that Microsoft’s data centers use to negotiate with client systems that are connected to Microsoft cloud services.  

Perfect Forward Secrecy (PFS) is also employed so that each connection between customers’ client systems and Microsoft’s cloud services use unique keys. Connections to Microsoft cloud services also take advantage of RSA based 2,048-bit encryption key lengths.

Encryption At-Rest

For many organizations, data encryption at-rest is a mandatory step towards achieving data privacy, compliance, and data sovereignty. Three Azure features provide encryption of data at-rest:

  • Storage Service Encryption is always enabled and automatically encrypts storage service data when writing it to Azure Storage. If your application logic requires your MySQL Galera Cluster database to store valuable data, then storing to Azure Storage can be an option.
  • Client-side encryption also provides the feature of encryption at-rest.
  • Azure Disk Encryption enables you to encrypt the OS disks and data disks that an IaaS virtual machine uses. Azure Disk Encryption also supports enabling encryption on Linux VMs that are configured with disk striping (RAID) by using mdadm, and by enabling encryption on Linux VMs by using LVM for data disks

Galera Cluster Multi-AZ/Multi-Region/Multi-Cloud Deployments on Azure

Similar to AWS and GCP, Microsoft Azure does not offer direct support for deploying a Galera Cluster in a multi-AZ/-region/-cloud setup. You can, however, deploy your nodes manually or create scripts using PowerShell or the Azure CLI to do this for you. Alternatively, when you provision your Virtual Machine instances you can place your nodes in different availability zones. Microsoft Azure also offers another type of redundancy, aside from availability zones, called Virtual Machine Scale Sets. You can check the differences between virtual machines and scale sets.

Galera Cluster High Availability, Scalability, and Redundancy on Azure

One of the primary reasons for using a Galera node cluster is high availability, redundancy, and the ability to scale. If you are serving traffic globally, it's best that you cater your traffic by region and ensure your architectural design includes geo-distribution of your database nodes. In order to achieve this, multi-AZ, multi-region, or multi-cloud/multi-datacenter deployments are recommended. This prevents the cluster from going down or malfunctioning due to lack of quorum. 

As mentioned earlier, Microsoft Azure has an auto scaling solution which can be leveraged using scale sets. This allows you to autoscale nodes when a certain threshold has been met, based on the health metrics you are monitoring. You can check out their tutorial on this topic here.

For multi-region or multi-cloud deployments, Galera has its own parameter called gmcast.segment which can be set at server start. This parameter is designed to optimize the communication between the Galera nodes and minimize the amount of traffic sent between network segments. This includes writeset relaying and IST and SST donor selection. This type of setup allows you to deploy multiple nodes in different regions. Aside from that, you can also deploy your Galera nodes with different cloud vendors, such as GCP, AWS, and Microsoft Azure, or within an on-premise setup. 
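
As a minimal sketch, each datacenter (or cloud region) gets its own segment number in the nodes' my.cnf; the segment values below are arbitrary:

# nodes in the first datacenter
wsrep_provider_options="gmcast.segment=0"
# nodes in the second datacenter
wsrep_provider_options="gmcast.segment=1"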

We recommend you check out our blogs Multiple Data Center Setups Using Galera Cluster for MySQL or MariaDB and Zero Downtime Network Migration With MySQL Galera Cluster Using Relay Node to gather more information on how to implement these types of deployments.

Galera Cluster Database Performance on Microsoft Azure

The underlying host machines used by virtual machines in Azure are, in fact, very powerful. The newest VMs in Azure are already equipped with network optimization modules. You can check this in your kernel info by running the following (e.g. in Ubuntu):

uname -r|grep azure

Note: Make certain that the output contains the "azure" string, which indicates the Azure-tuned kernel is in use. 

For CentOS/RHEL, installing Linux Integration Services (LIS) version 4.2 or later includes the network optimization. To learn more about this, visit the page on optimizing network throughput.

If your application is very sensitive to network latency, you might be interested in looking at proximity placement groups. They are currently in preview (and not yet recommended for production use), but they help optimize your network throughput. 

The type of virtual machine you consume depends on your application traffic and resource demands. For queries that are high on memory consumption, you can start with Dv3; for memory-optimized workloads, start with the Ev3 series. For high CPU requirements, such as high-transactional databases or gaming applications, start with the Fsv2 series.

Choosing the right storage and the required IOPS for your database volume is a must. Generally, an SSD-based persistent disk is the ideal choice. Begin with Standard SSD, which is cost-effective and offers consistent performance. This decision, however, might depend on whether you need more IOPS in the long run. If that is the case, then you should go for Premium SSD storage.

We also recommend you read our blog How to Improve Performance of Galera Cluster for MySQL or MariaDB to learn more about optimizing your Galera Cluster.

Database Backup for Galera Nodes on Azure

There's no native backup support for your MySQL Galera data in Azure, but you can take a snapshot. Microsoft Azure offers Azure VM Backup, which takes snapshots that can be scheduled and encrypted. 

Alternatively, if you want to back up the data files from your Galera Cluster, you can also use external services like ClusterControl, use Percona XtraBackup for your binary backups, or use mysqldump or mydumper for your logical backups. These tools provide backup copies of your mission-critical data and you can read this if you want to learn more.
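
A hedged sketch of a binary backup taken from one Galera node with Percona XtraBackup; paths and credentials are placeholders, and --galera-info records the cluster position alongside the backup:

xtrabackup --backup \
  --user=backupuser --password='secret' \
  --galera-info \
  --target-dir=/backups/galera/full-$(date +%F)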

Galera Cluster Monitoring on Azure

Microsoft Azure has its own monitoring service named Azure Monitor. Azure Monitor maximizes the availability and performance of your applications by delivering a comprehensive solution for collecting, analyzing, and acting on telemetry from your cloud and on-premise environments. It helps you understand how your applications are performing and proactively identifies issues affecting them (and the resources they depend on). You can set up health alerts and get notified of advisories and alerts detected in the services you have deployed.

If you want monitoring specific to your database, you will need to utilize external monitoring tools which have advanced, highly-granular database metrics. There are several choices such as PMM by Percona, DataDog, Idera, VividCortex, or our very own ClusterControl (monitoring is free with ClusterControl Community).

Galera Cluster Database Security on Azure

As discussed in our previous blogs for AWS and GCP, you can take the same approach for securing your database in the public cloud. Once you create a virtual machine, you can specify which ports can be opened, or create and set up your Network Security Group in Azure. You can set up the ports that need to be open (particularly ports 3306, 4444, 4567, 4568), or create a Virtual Network in Azure and specify the private subnets if the nodes are to remain private. In addition, if you set up your VMs in Azure without a public IP, they can still make outbound connections because Azure uses SNAT and PAT. If you're familiar with AWS and GCP, this explanation should make it easier to comprehend.

Another feature available is Role-Based Access Control in Microsoft Azure. This gives you control over which people have access to the specific resources they need.

In addition to this, you can secure your data-in-transit by using a TLS/SSL connection or by encrypting your data when it's at-rest. If you're using ClusterControl, deploying secure data-in-transit is simple and easy. You can check out our blog SSL Key Management and Encryption of MySQL Data in Transit if you want to try it out. For data at-rest, you can follow the discussion I have stated earlier in the Encryption section of this blog.

Galera Cluster Troubleshooting 

Microsoft Azure offers a wide array of log types to aid troubleshooting and auditing. Activity logs, Azure diagnostics logs, Azure AD reporting, virtual machine and cloud service logs, Network Security Group (NSG) flow logs, and Application Insights are all very useful when troubleshooting. It might not always be necessary to go through all of these, but they add insights and clues when checking the logs.

If you're using ClusterControl, go to Logs -> System Logs and you'll be able to browse the captured error logs taken from the MySQL Galera nodes themselves. Apart from this, ClusterControl provides real-time monitoring that amplifies your alarm and notification system in case of an emergency or if your MySQL Galera node(s) go down.

Conclusion

As we finish this three-part blog series, we have shown you the offerings and the advantages of each of the tech giants serving the public cloud industry. There are advantages and disadvantages when selecting one over the other, but what matters most is your reason for moving to a public cloud, its benefits for your organization, and how it serves the requirements of your application. 

The choice of provider for your Galera Cluster may involve financial considerations like what's most cost-efficient and what better suits your budgetary needs. It could also be driven by privacy laws and regulatory compliance, or even by the technology stack you want to use. What's important is how your application and database will perform once they are in the cloud handling large amounts of traffic. Your setup has to be highly available and resilient, have the right levels of scalability and redundancy, and take backups to ensure data recovery.

What’s New in MySQL Galera Cluster 4.0

MySQL Galera Cluster 4.0 is the new kid on the database block with very interesting new features. Currently it is available only as a part of MariaDB 10.4 but in the future it will work as well with MySQL 5.6, 5.7 and 8.0. In this blog post we would like to go over some of the new features that came along with Galera Cluster 4.0.

Galera Cluster Streaming Replication

The most important new feature in this release is streaming replication. So far the certification process for the Galera Cluster worked in a way that whole transactions had to be certified after they completed. 

This process was not ideal in several scenarios...

  1. Hotspots in tables, rows which are very frequently updated on multiple nodes - hundreds of fast transactions running on multiple nodes, modifying the same set of rows result in frequent deadlocks and rollback of transactions
  2. Long running transactions - if a transaction takes significant time to complete, this seriously increases chances that some other transaction, in the meantime, on another node, may modify some of the rows that were also updated by the long transaction. This resulted in a deadlock during certification and one of the transactions having to be rolled back.
  3. Large transactions - if a transaction modifies a significant number of rows, it is likely that another transaction, at the same time, on a different node, will modify one of the rows already modified by the large transaction. This results in a deadlock during certification and one of the transactions has to be rolled back. In addition to this, large transactions will take additional time to be processed, sent to all nodes in the cluster and certified. This is not an ideal situation as it adds delay to commits and slows down the whole cluster.

Luckily, streaming replication can solve these problems. The main difference is that the certification happens in chunks, so there is no need to wait for the whole transaction to complete. As a result, even if a transaction is large or long, the majority (or all, depending on the settings we will discuss in a moment) of its rows are locked on all of the nodes, preventing other queries from modifying them.

MySQL Galera Cluster Streaming Replication Options

There are two configuration options for streaming replication: 

wsrep_trx_fragment_size 

This tells how big a fragment should be (by default it is set to 0, which means that streaming replication is disabled).

wsrep_trx_fragment_unit 

This tells what the fragment unit really is. By default it is bytes, but it can also be ‘statements’ or ‘rows’. 

Those variables can (and should) be set on a session level, making it possible for the user to decide which particular query should be replicated using streaming replication. Setting the unit to ‘statements’ and the size to 1 allows you, for example, to use streaming replication just for a single query which, for example, updates a hotspot.
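
A minimal sketch of that hotspot case; the table and column names are made up for illustration:

-- replicate this session's transactions one statement at a time
SET SESSION wsrep_trx_fragment_unit = 'statements';
SET SESSION wsrep_trx_fragment_size = 1;
UPDATE counters SET value = value + 1 WHERE id = 1;
-- switch streaming replication off again for subsequent work
SET SESSION wsrep_trx_fragment_size = 0;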

You can configure Galera 4.0 to certify every row that you have modified and grab the locks on all of the nodes while doing so. This makes streaming replication great at solving problems with frequent deadlocks which, prior to Galera 4.0, were possible to solve only by redirecting all writes to a single node.

WSREP Tables

Galera 4.0 introduces several tables, which will help to monitor the state of the cluster:

  • wsrep_cluster
  • wsrep_cluster_members
  • wsrep_streaming_log

All of them are located in the ‘mysql’ schema. wsrep_cluster will provide insight into the state of the cluster. wsrep_cluster_members will give you information about the nodes that are part of the cluster. wsrep_streaming_log helps to track the state of the streaming replication.
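
For example, a quick look at the cluster and its membership can be as simple as the queries below; the exact columns returned depend on your version:

SELECT * FROM mysql.wsrep_cluster;
SELECT * FROM mysql.wsrep_cluster_members;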

Galera Cluster Upcoming Features

Codership, the company behind Galera, isn’t done yet. We were able to get a preview of the roadmap from CEO Seppo Jaakola, which was given at Percona Live earlier this year. Apparently, we are going to see features like XA transaction support and gcache encryption. This is really good news. 

Support for XA transactions will be possible thanks to streaming replication. In short, XA transactions are distributed transactions which can run across multiple nodes. They utilize two-phase commit, which requires first acquiring all required locks to run the transaction on all of the nodes and then, once that is done, committing the changes. In previous versions Galera did not have the means to lock resources on remote nodes; with streaming replication this has changed.

Gcache is a file which stores writesets. Its contents are sent to joiner nodes which ask for a data transfer. If all the required data is stored in the gcache, the joiner will receive just the missing transactions in a process called Incremental State Transfer (IST). If the gcache does not contain all the required data, a State Snapshot Transfer (SST) will be required and the whole dataset will have to be transferred to the joining node. 

Gcache contains information about recent changes, therefore it’s great to see its contents encrypted for better security. With better security standards being introduced through more and more regulations, it is crucial that software becomes better at achieving compliance.

Conclusion

We are definitely looking forward to seeing how Galera Cluster 4.0 will work out on databases other than MariaDB. Being able to deploy MySQL 5.7 or 8.0 with Galera Cluster will be really great. After all, Galera is one of the most widely tested synchronous replication solutions available on the market.

Using MySQL Galera Cluster Replication to Create a Geo-Distributed Cluster: Part One

It is quite common to see databases distributed across multiple geographical locations. One scenario for this type of setup is disaster recovery, where your standby data center is located in a separate location from your main datacenter. It might as well be required so that the databases are located closer to the users. 

The main challenge in achieving this setup is designing the database in a way that reduces the chance of issues related to network partitioning. One of the solutions might be to use Galera Cluster instead of regular asynchronous (or semi-synchronous) replication. In this blog we will discuss the pros and cons of this approach. This is the first part in a series of two blogs. In the second part we will design the geo-distributed Galera Cluster and see how ClusterControl can help us deploy such an environment.

Why Galera Cluster Instead of  Asynchronous Replication for Geo-Distributed Clusters?

Let’s consider the main differences between Galera and regular replication. Regular replication provides you with just one node to write to, which means that every write from a remote datacenter would have to be sent over the Wide Area Network (WAN) to reach the master. It also means that all proxies located in the remote datacenter have to be able to monitor the whole topology, spanning all data centers involved, as they have to be able to tell which node is currently the master. 

This leads to a number of problems. First, multiple connections have to be established across the WAN, which adds latency and slows down any checks that the proxy may be running. In addition, this adds unnecessary overhead on the proxies and databases. Most of the time you are interested only in routing traffic to the local database nodes. The only exception is the master, and only because of this are proxies forced to watch the whole infrastructure rather than just the part located in the local datacenter. Of course, you can try to overcome this by using proxies to route only SELECTs, while using some other method (a dedicated hostname for the master, managed by DNS) to point the application to the master, but this adds unnecessary levels of complexity and moving parts, which could seriously impact your ability to handle multiple node and network failures without losing data consistency.

Galera Cluster can support multiple writers. Latency is also a factor: as all nodes in the Galera cluster have to coordinate and communicate to certify writesets, it can even be the reason you may decide not to use Galera when latency is too high. It is also an issue in replication clusters - in replication clusters latency affects only writes from the remote data centers, while connections from the datacenter where the master is located benefit from low latency commits. 

With MySQL Replication you also have to keep the worst case scenario in mind and ensure that the application is ok with delayed writes. The master can always change and you cannot be sure that you will always be writing to a local node.

Another difference between replication and Galera Cluster is the handling of replication lag. Geo-distributed clusters can be seriously affected by lag: latency and the limited throughput of the WAN connection all impact the ability of a replicated cluster to keep up with the replication. Please keep in mind that replication generates one-to-all traffic.

Geo-Distributed Galera Cluster

All slaves have to receive the whole replication traffic - the amount of data you have to send to remote slaves over the WAN increases with every remote slave that you add. This may easily result in WAN link saturation, especially if you do plenty of modifications and the WAN link doesn’t have good throughput. As you can see on the diagram above, with three data centers and three nodes in each of them, the master has to send 6x the replication traffic over the WAN connection.

With Galera Cluster things are slightly different. For starters, Galera uses flow control to keep the nodes in sync. If one of the nodes starts to lag behind, it can ask the rest of the cluster to slow down and let it catch up. Sure, this reduces the performance of the whole cluster, but it is still better than not being able to use slaves for SELECTs because they tend to lag from time to time - in such cases the results you get might be outdated and incorrect.

Geo-Distributed Galera Cluster

Another feature of Galera Cluster which can significantly improve its performance when used over a WAN is segments. By default Galera uses all-to-all communication and every writeset is sent by the node to all other nodes in the cluster. This behavior can be changed using segments. Segments allow users to split a Galera cluster into several parts. Each segment may contain multiple nodes and it elects one of them as a relay node. Such a node receives writesets from other segments and redistributes them across the Galera nodes local to the segment. As a result, as you can see on the diagram above, it is possible to reduce the replication traffic going over the WAN three times - just two “replicas” of the replication stream are being sent over the WAN: one per datacenter, compared to one per slave in MySQL Replication.

Galera Cluster Network Partitioning Handling

Where Galera Cluster shines is its handling of network partitioning. Galera Cluster constantly monitors the state of the nodes in the cluster. Every node attempts to connect with its peers and exchange the state of the cluster. If a subset of nodes is not reachable, Galera attempts to relay the communication, so if there is a way to reach those nodes, they will be reached.

Galera Cluster Network Partitioning Handling

An example can be seen on the diagram above: DC 1 lost connectivity with DC2, but DC2 and DC3 can connect. In this case one of the nodes in DC3 will be used to relay data from DC1 to DC2, ensuring that the intra-cluster communication can be maintained.

Galera Cluster Network Partitioning Handling

Galera Cluster is able to take actions based on the state of the cluster. It implements quorum - a majority of the nodes have to be available in order for the cluster to be able to operate. If a node gets disconnected from the cluster and cannot reach any other node, it will cease to operate. 

As can be seen on the diagram above, there’s a partial loss of network communication in DC1 and the affected node is removed from the cluster, ensuring that the application will not access outdated data.

Galera Cluster Network Partitioning Handling

This is also true on a larger scale. Here DC1 has had all of its communication cut off. As a result, the whole datacenter has been removed from the cluster and none of its nodes will serve traffic. The rest of the cluster maintained a majority (6 out of 9 nodes are available) and reconfigured itself to keep the connection between DC2 and DC3. In the diagram above we assumed the write hits the node in DC2, but please keep in mind that Galera is capable of running with multiple writers.

MySQL Replication does not have any kind of cluster awareness, making it problematic to handle network issues. It cannot shut itself down upon losing connection with other nodes. There is no easy way of preventing the old master from showing up after a network split. 

The only possibilities are limited to the proxy layer or even higher. You have to design a system which tries to understand the state of the cluster and takes the necessary actions. One possible way is to use cluster-aware tools like Orchestrator and then run scripts that check the state of the Orchestrator RAFT cluster and, based on this state, take the required actions on the database layer. This is far from ideal because any action taken on a layer higher than the database adds additional latency: it makes it possible for the issue to show up and for data consistency to be compromised before the correct action can be taken. Galera, on the other hand, takes actions on the database level, ensuring the fastest reaction possible.

Using MySQL Galera Cluster Replication to Create a Geo-Distributed Cluster: Part Two

In the previous blog in the series we discussed the pros and cons of using Galera Cluster to create a geo-distributed cluster. In this post we will design a Galera-based geo-distributed cluster and show how you can deploy all the required pieces using ClusterControl.

Designing a Geo-Distributed Galera Cluster

We will start by explaining the environment we want to build. We will use three remote data centers, connected via a Wide Area Network (WAN). Each datacenter will receive writes from local application servers. Reads will also be only local. This is intended to avoid unnecessary traffic crossing the WAN. 

For this setup we assume the connectivity is in place and secured, but we won’t describe exactly how this can be achieved. There are numerous methods to secure the connectivity, from proprietary hardware and software solutions through OpenVPN to SSH tunnels. 

We will use ProxySQL as a loadbalancer. ProxySQL will be deployed locally in each datacenter. It will also route traffic only to the local nodes. Remote nodes can always be added manually and we will explain cases where this might be a good solution. The application can be configured to connect to one of the local ProxySQL nodes using a round-robin algorithm. We can as well use Keepalived and a Virtual IP to route the traffic towards a single ProxySQL node, as long as a single ProxySQL node would be able to handle all of the traffic. 

Another possible solution is to collocate ProxySQL with the application nodes and configure the application to connect to the proxy on localhost. This approach works quite well under the assumption that it is unlikely ProxySQL would become unavailable while the application on the same node keeps working fine. Typically what we see is either a node failure or a network failure, which would affect both ProxySQL and the application at the same time.

Geo-Distributed MySQL Galera Cluster with ProxySQL

The diagram above shows the version of the environment where ProxySQL is collocated on the same node as the application. ProxySQL is configured to distribute the workload across all Galera nodes in the local datacenter. One of those nodes would be picked as the node to send the writes to, while SELECTs would be distributed across all nodes. Having one dedicated writer node in a datacenter helps to reduce the number of possible certification conflicts, leading to, typically, better performance. To reduce this even further we would have to start sending the traffic over the WAN connection, which is not ideal as the bandwidth utilization would significantly increase. Right now, with segments in place, only two copies of each writeset are being sent across datacenters - one per DC.

The main concern with geo-distributed Galera Cluster deployments is latency. This is something you always have to test prior to launching the environment. Am I ok with the commit time? At every commit, certification has to happen, so writesets have to be sent to and certified on all nodes in the cluster, including remote ones. It may be that the high latency will make the setup unsuitable for your application. In that case you may find multiple Galera clusters connected via asynchronous replication more suitable. This would be a topic for another blog post though.

Deploying a Geo-Distributed Galera Cluster Using ClusterControl

To clarify things, we will show here what a deployment may look like. We won’t use an actual multi-DC setup; everything will be deployed in a local lab. We assume that the latency is acceptable and the whole setup is viable. What is great about ClusterControl is that it is infrastructure-agnostic. It doesn’t care if the nodes are close to each other, located in the same datacenter, or distributed across multiple cloud providers. As long as there is SSH connectivity from the ClusterControl instance to all of the nodes, the deployment process looks exactly the same. That’s why we can show it to you step by step using just a local lab.

Installing ClusterControl

First, you have to install ClusterControl. You can download it for free. After registering, you should access the page with a guide to download and install ClusterControl. It is as simple as running a shell script. Once you have ClusterControl installed, you will be presented with a form to create an administrative user:

Installing ClusterControl

Once you fill it, you will be presented with a Welcome screen and access to deployment wizards:

ClusterControl Welcome Screen

We’ll go with deploy. This will open a deployment wizard:

ClusterControl Deployment Wizard

We will pick MySQL Galera. We have to pass SSH connectivity details - either root or a sudo user is supported. On the next step we are to define the servers in the cluster.

Deploy Database Cluster

We are going to deploy three nodes in one of the data centers. Then we will be able to extend the cluster, configuring new nodes in different segments. For now all we have to do is to click on “Deploy” and watch ClusterControl deploying the Galera cluster.

Cluster List - ClusterControl

Our first three nodes are up and running, we can now proceed to adding additional nodes in other datacenters.

Add a Database Node - ClusterControl

You can do that from the action menu, as shown on the screenshot above.

Add a Database Node - ClusterControl

Here we can add additional nodes, one at a time. What is important is that you should change the Galera segment to non-zero (0 is used for the initial three nodes).

After a while we end up with all nine nodes, distributed across three segments.

ClusterControl Geo-Distributed Database Nodes

Now, we have to deploy proxy layer. We will use ProxySQL for that. You can deploy it in ClusterControl via Manage -> Load Balancer:

Add a Load Balancer - ClusterControl

This opens a deployment field:

Deploy Load Balancer - ClusterControl

First, we have to decide where to deploy ProxySQL. We will use existing Galera nodes but you can type anything in the field so it is perfectly possible to deploy ProxySQL on top of the application nodes. In addition, you have to pass access credentials for the administrative and monitoring user.

Deploy Load Balancer - ClusterControl

Then we have to either pick one of existing users in MySQL or create one right now. We also want to ensure that the ProxySQL is configured to use Galera nodes located only in the same datacenter.

When you have one ProxySQL ready in the datacenter, you can use it as a source of the configuration:

Deploy ProxySQL - ClusterControl

This has to be repeated for every application server that you have in all datacenters. Then the application has to be configured to connect to the local ProxySQL instance, ideally over the Unix socket. This provides the best performance and the lowest latency.

Reducing Latency - ClusterControl

After the last ProxySQL is deployed, our environment is ready. Application nodes connect to local ProxySQL. Each ProxySQL is configured to work with Galera nodes in the same datacenter:

ProxySQL Server Setup - ClusterControl

Conclusion

We hope this two-part series has helped you understand the strengths and weaknesses of geo-distributed Galera Clusters and how ClusterControl makes it very easy to deploy and manage such a cluster.

A Guide to MySQL Galera Cluster Streaming Replication: Part One

Streaming Replication is a new feature which was introduced with the 4.0 release of Galera Cluster. Galera uses replication synchronously across the entire cluster, but before this release write-sets greater than 2GB were not supported. Streaming Replication allows you to now replicate large write-sets, which is perfect for bulk inserts or loading data to your database.

In a previous blog we wrote about Handling Large Transactions with Streaming Replication and MariaDB 10.4, but as of writing this blog Codership had not yet released their version of the new Galera Cluster. Percona has, however, released their experimental binary version of Percona XtraDB Cluster 8.0 which highlights the following features...

  • Streaming Replication supporting large transactions

  • The synchronization functions allow action coordination (wsrep_last_seen_gtid, wsrep_last_written_gtid, wsrep_sync_wait_upto_gtid)

  • More granular and improved error logging. wsrep_debug is now a multi-valued variable to assist in controlling the logging, and logging messages have been significantly improved.

  • Some DML and DDL errors on a replicating node can either be ignored or suppressed. Use the wsrep_ignore_apply_errors variable to configure.

  • Multiple system tables help you find out more about the state of the cluster.

  • The wsrep infrastructure of Galera 4 is more robust than that of Galera 3. It features a faster execution of code with better state handling, improved predictability, and error handling.

What's New With Galera Cluster 4.0?

The New Streaming Replication Feature

With Streaming Replication, transactions are replicated gradually in small fragments during transaction processing (i.e. before actual commit, we replicate a number of small size fragments). Replicated fragments are then applied in slave threads, preserving the transaction’s state in all cluster nodes. Fragments hold locks in all nodes and cannot be conflicted later.

Galera System Tables

Database Administrators and clients with access to the MySQL database may read these tables, but they cannot modify them as the database itself will make any modifications needed. If your server doesn’t have these tables, it may be that your server is using an older version of Galera Cluster.

#> show tables from mysql like 'wsrep%';

+--------------------------+

| Tables_in_mysql (wsrep%) |

+--------------------------+

| wsrep_cluster            |

| wsrep_cluster_members    |

| wsrep_streaming_log      |

+--------------------------+

3 rows in set (0.12 sec)

New Synchronization Functions 

This version introduces a series of SQL functions for use in wsrep synchronization operations. You can use them to obtain the Global Transaction ID which is based on either the last write or last seen transaction. You can also set the node to wait for a specific GTID to replicate and apply, before initiating the next transaction.
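As a rough sketch of how these functions can be combined (assuming your application passes the returned GTID string from its writing connection to its reading connection; the placeholder below must be replaced with the actual value), consider the following:

-- On the connection that performed the write, capture the GTID of the last write:
SELECT WSREP_LAST_WRITTEN_GTID();

-- On the connection that will read (possibly on another node), wait until that
-- GTID has been replicated and applied before issuing the read:
SELECT WSREP_SYNC_WAIT_UPTO_GTID('<gtid returned above>');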

Intelligent Donor Selection

Some understated features that have been present since Galera 3.x include intelligent donor selection and cluster crash recovery. These were originally planned for Galera 4, but made it into earlier releases largely due to customer requirements. When it comes to donor node selection in Galera 3, the State Snapshot Transfer (SST) donor was selected at random. However with Galera 4, you get a much more intelligent choice when it comes to choosing a donor, as it will favour a donor that can provide an Incremental State Transfer (IST), or pick a donor in the same segment. As a Database Administrator, you can force this via setting wsrep_sst_donor.
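For example, a minimal my.cnf sketch that expresses a donor preference might look like the following (db2 and db3 are hypothetical wsrep_node_name values; the trailing comma tells Galera to fall back to any other available donor if neither preferred node can serve the transfer):

[mysqld]
# Prefer db2, then db3, as the SST/IST donor; the trailing comma allows any other node as a fallback
wsrep_sst_donor = "db2,db3,"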

Why Use MySQL Galera Cluster Streaming Replication?

Long-Running Transactions

Galera's problems and limitations have always revolved around how it handled long-running transactions, which oftentimes caused the entire cluster to slow down due to large write-sets being replicated. Its flow control often kicks in, causing the writes to slow down or even terminating the process in order to revert the cluster back to its normal state. This is a pretty common issue with previous versions of Galera Cluster.

Codership advises to use Streaming Replication for your long-running transactions to mitigate these situations. Once the node replicates and certifies a fragment, it is no longer possible for other transactions to abort it.

Large Transactions

This is very helpful when loading data into your reporting or analytics database. Bulk inserts, deletes, updates, or using the LOAD DATA statement to load a large quantity of data fall into this category, although it depends on how you manage your data for retrieval or storage. You must take into account that Streaming Replication has its limitations, such that certification keys are generated from record locks.

Without Streaming Replication, updating a large number of records can result in a conflict and the whole transaction would have to be rolled back. Slaves that are applying large transactions are also subject to flow control: once the threshold is hit, the entire cluster slows down its writes while the slaves ease off receiving incoming transactions from the synchronous replication. Galera relaxes replication until the write-set becomes manageable and then allows replication to continue. Check this external blog by Percona to help you understand more about flow control within Galera.

With Streaming Replication, the node begins to replicate the data with each transaction fragment, rather than waiting for the commit. This means that conflicting transactions running on the other nodes cannot abort it, since the cluster has already certified the write-set for this particular fragment. The cluster is free to apply and commit other concurrent transactions without blocking, and to process the large transaction with a minimal impact on the cluster.

Hot Records/Hot Spots

Hot records or rows are those rows in your table that get constantly updated. They could be the most visited data and receive the highest traffic of your entire database (e.g. news feeds, counters such as the number of visits, or logs). With Streaming Replication, you can force critical updates to the entire cluster.

As noted by the Galera Team at Codership:

“Running a transaction in this way effectively locks the hot record on all nodes, preventing other transactions from modifying the row. It also increases the chances that the transaction will commit successfully and that the client in turn will receive the desired outcome.”

This comes with limitations, as it does not guarantee that every commit will succeed. Without Streaming Replication, however, you end up with a high chance of rollbacks, which adds overhead for the end user when the issue surfaces from the application's perspective.

Things to Consider When Using Streaming Replication

  • Certification keys are generated from record locks, therefore they don’t cover gap locks or next key locks. If the transaction takes a gap lock, it is possible that a transaction which is executed on another node will apply a write-set which encounters the gap lock and will abort the streaming transaction.
  • When enabling Streaming Replication, write-set logs are written to the wsrep_streaming_log table in the mysql system database to preserve persistence in case a crash occurs, so this table is used during recovery. The extra logging and elevated replication overhead mean that Streaming Replication degrades the transaction throughput rate, which can become a performance bottleneck at peak load. As such, it’s recommended that you only enable Streaming Replication at a session level, and then only for transactions that would not run correctly without it (see the sketch after this list).
  • Best use case is to use streaming replication for cutting large transactions
  • Set fragment size to ~10K rows
  • Fragment variables are session variables and can be dynamically set
  • Intelligent application can set streaming replication on/off on need basis
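Putting those recommendations together, here is a minimal sketch of a session that enables Streaming Replication only around one large operation (the table name and predicate are placeholders, and the ~10K row fragment size follows the guideline above):

-- Fragment the transaction every 10,000 rows, only for this session
SET SESSION wsrep_trx_fragment_unit = 'rows';
SET SESSION wsrep_trx_fragment_size = 10000;

START TRANSACTION;
DELETE FROM mybigtable WHERE created_at < '2019-01-01';  -- the large operation
COMMIT;

-- Switch Streaming Replication off again for the rest of the session
SET SESSION wsrep_trx_fragment_size = 0;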

Conclusion

Thanks for reading. In part two we will discuss how to enable Galera Cluster Streaming Replication and what the results could look like for your setup.

 

A Guide to MySQL Galera Cluster Streaming Replication: Part Two


In the first part of this blog we provided an overview of the new Streaming Replication feature in MySQL Galera Cluster. In this blog we will show you how to enable it and take a look at the results.

Enabling Streaming Replication

It is highly recommended that you enable Streaming Replication at a session-level for the specific transactions that interact with your application/client. 

As stated in the previous blog, Galera logs its write-sets to the wsrep_streaming_log table in the mysql system database. This has the potential to create a performance bottleneck, especially when a rollback is needed. This doesn't mean that you can’t use Streaming Replication; it just means you need to design your application client efficiently when using it so you’ll get better performance. Still, it's best to have Streaming Replication for dealing with and cutting down large transactions.

Enabling Streaming Replication requires you to define the replication unit and number of units to use in forming the transaction fragments. Two parameters control these variables: wsrep_trx_fragment_unit and wsrep_trx_fragment_size.

Below is an example of how to set these two parameters:

SET SESSION wsrep_trx_fragment_unit='statements';

SET SESSION wsrep_trx_fragment_size=3;

In this example, the fragment is set to three statements. For every three statements from a transaction, the node will generate, replicate, and certify a fragment.

You can choose between a few replication units when forming fragments:

  • bytes - This defines the fragment size in bytes.
  • rows - This defines the fragment size as the number of rows the fragment updates.
  • statements - This defines the fragment size as the number of statements in a fragment.

Choose the replication unit and fragment size that best suits the specific operation you want to run.

Streaming Replication In Action

As discussed in our other blog on handling large transactions in MariaDB 10.4, we tested how Streaming Replication performed when enabled, based on the following criteria...

  1. Baseline, set global wsrep_trx_fragment_size=0;
  2. set global wsrep_trx_fragment_unit='rows'; set global wsrep_trx_fragment_size=1;
  3. set global wsrep_trx_fragment_unit='statements'; set global wsrep_trx_fragment_size=1;
  4. set global wsrep_trx_fragment_unit='statements'; set global wsrep_trx_fragment_size=5;

And the results were:

Transactions: 82.91 per sec., queries: 1658.27 per sec. (100%)

Transactions: 54.72 per sec., queries: 1094.43 per sec. (66%)

Transactions: 54.76 per sec., queries: 1095.18 per sec. (66%)

Transactions: 70.93 per sec., queries: 1418.55 per sec. (86%)

For this example we're using Percona XtraDB Cluster 8.0.15 straight from their testing branch using the Percona-XtraDB-Cluster_8.0.15.5-27dev.4.2_Linux.x86_64.ssl102.tar.gz build. 

We then tried a 3-node Galera cluster with hosts info below:

testnode11 = 192.168.10.110

testnode12 = 192.168.10.120

testnode13 = 192.168.10.130

We pre-populated a table from my sysbench database and tried to delete a very large number of rows.

root@testnode11[sbtest]#> select count(*) from sbtest1;

+----------+

| count(*) |

+----------+

| 12608218 |

+----------+

1 row in set (25.55 sec)

At first, running without Streaming Replication,

root@testnode12[sbtest]#> select @@wsrep_trx_fragment_unit, @@wsrep_trx_fragment_size,  @@innodb_lock_wait_timeout;

+---------------------------+---------------------------+----------------------------+

| @@wsrep_trx_fragment_unit | @@wsrep_trx_fragment_size | @@innodb_lock_wait_timeout |

+---------------------------+---------------------------+----------------------------+

| bytes                     | 0 |                         50000 |

+---------------------------+---------------------------+----------------------------+

1 row in set (0.00 sec)

Then run,

root@testnode11[sbtest]#> delete from sbtest1 where id >= 2000000;

However, we ended up getting a rollback...

---TRANSACTION 648910, ACTIVE 573 sec rollback

mysql tables in use 1, locked 1

ROLLING BACK 164858 lock struct(s), heap size 18637008, 12199395 row lock(s), undo log entries 11961589

MySQL thread id 183, OS thread handle 140041167468288, query id 79286 localhost 127.0.0.1 root wsrep: replicating and certifying write set(-1)

delete from sbtest1 where id >= 2000000

Using the ClusterControl Dashboards to gather an overview of flow control: since the transaction runs solely on the master (active-writer) node until commit time, there is no indication of any flow control activity:

ClusterControl Galera Cluster Overview

In case you’re wondering, the current version of ClusterControl does not yet have direct support for PXC 8.0 with Galera Cluster 4 (as it is still experimental). You can, however, try to import it... but it needs minor tweaks to make your Dashboards work correctly. 

Back to the query process. It failed as it rolled back!

root@testnode11[sbtest]#> delete from sbtest1 where id >= 2000000;

ERROR 1180 (HY000): Got error 5 - 'Transaction size exceed set threshold' during COMMIT

regardless of the wsrep_max_ws_rows or wsrep_max_ws_size,

root@testnode11[sbtest]#> select @@global.wsrep_max_ws_rows, @@global.wsrep_max_ws_size/(1024*1024*1024);

+----------------------------+---------------------------------------------+

| @@global.wsrep_max_ws_rows | @@global.wsrep_max_ws_size/(1024*1024*1024) |

+----------------------------+---------------------------------------------+

|                          0 |               2.0000 |

+----------------------------+---------------------------------------------+

1 row in set (0.00 sec)

It did, eventually, reach the threshold.

During this time the system table mysql.wsrep_streaming_log is empty, which indicates that Streaming Replication is not happening or enabled,

root@testnode12[sbtest]#> select count(*) from mysql.wsrep_streaming_log;

+----------+

| count(*) |

+----------+

|        0 |

+----------+

1 row in set (0.01 sec)



root@testnode13[sbtest]#> select count(*) from mysql.wsrep_streaming_log;

+----------+

| count(*) |

+----------+

|        0 |

+----------+

1 row in set (0.00 sec)

and that is verified on the other 2 nodes (testnode12 and testnode13).

Now, let's try again with Streaming Replication enabled,

root@testnode11[sbtest]#> select @@wsrep_trx_fragment_unit, @@wsrep_trx_fragment_size, @@innodb_lock_wait_timeout;

+---------------------------+---------------------------+----------------------------+

| @@wsrep_trx_fragment_unit | @@wsrep_trx_fragment_size | @@innodb_lock_wait_timeout |

+---------------------------+---------------------------+----------------------------+

| bytes                     | 0 |                      50000 |

+---------------------------+---------------------------+----------------------------+

1 row in set (0.00 sec)



root@testnode11[sbtest]#> set wsrep_trx_fragment_unit='rows'; set wsrep_trx_fragment_size=100; 

Query OK, 0 rows affected (0.00 sec)



Query OK, 0 rows affected (0.00 sec)



root@testnode11[sbtest]#> select @@wsrep_trx_fragment_unit, @@wsrep_trx_fragment_size, @@innodb_lock_wait_timeout;

+---------------------------+---------------------------+----------------------------+

| @@wsrep_trx_fragment_unit | @@wsrep_trx_fragment_size | @@innodb_lock_wait_timeout |

+---------------------------+---------------------------+----------------------------+

| rows                      | 100 |                      50000 |

+---------------------------+---------------------------+----------------------------+

1 row in set (0.00 sec)

What to Expect When Galera Cluster Streaming Replication is Enabled? 

When the query is performed on testnode11,

root@testnode11[sbtest]#> delete from sbtest1 where id >= 2000000;

What happens is that the transaction is fragmented piece by piece depending on the value set for the wsrep_trx_fragment_size variable. Let's check this on the other nodes:

Host testnode12

root@testnode12[sbtest]#> pager sed -n '/TRANSACTIONS/,/FILE I\/O/p'; show engine innodb status\G nopager; show global status like 'wsrep%flow%'; select count(*) from mysql.wsrep_streaming_log;

PAGER set to 'sed -n '/TRANSACTIONS/,/FILE I\/O/p''

TRANSACTIONS

------------

Trx id counter 567148

Purge done for trx's n:o < 566636 undo n:o < 0 state: running but idle

History list length 44

LIST OF TRANSACTIONS FOR EACH SESSION:

..

...

---TRANSACTION 421740651985200, not started

0 lock struct(s), heap size 1136, 0 row lock(s)

---TRANSACTION 553661, ACTIVE 190 sec

18393 lock struct(s), heap size 2089168, 1342600 row lock(s), undo log entries 1342600

MySQL thread id 898, OS thread handle 140266050008832, query id 216824 wsrep: applied write set (-1)

--------

FILE I/O

1 row in set (0.08 sec)



PAGER set to stdout

+----------------------------------+--------------+

| Variable_name                    | Value |

+----------------------------------+--------------+

| wsrep_flow_control_paused_ns     | 211197844753 |

| wsrep_flow_control_paused        | 0.133786 |

| wsrep_flow_control_sent          | 633 |

| wsrep_flow_control_recv          | 878 |

| wsrep_flow_control_interval      | [ 173, 173 ] |

| wsrep_flow_control_interval_low  | 173 |

| wsrep_flow_control_interval_high | 173          |

| wsrep_flow_control_status        | OFF |

+----------------------------------+--------------+

8 rows in set (0.00 sec)



+----------+

| count(*) |

+----------+

|    13429 |

+----------+

1 row in set (0.04 sec)

 

Host testnode13

root@testnode13[sbtest]#> pager sed -n '/TRANSACTIONS/,/FILE I\/O/p'; show engine innodb status\G nopager; show global status like 'wsrep%flow%'; select count(*) from mysql.wsrep_streaming_log;

PAGER set to 'sed -n '/TRANSACTIONS/,/FILE I\/O/p''

TRANSACTIONS

------------

Trx id counter 568523

Purge done for trx's n:o < 567824 undo n:o < 0 state: running but idle

History list length 23

LIST OF TRANSACTIONS FOR EACH SESSION:

..

...

---TRANSACTION 552701, ACTIVE 216 sec

21587 lock struct(s), heap size 2449616, 1575700 row lock(s), undo log entries 1575700

MySQL thread id 936, OS thread handle 140188019226368, query id 600980 wsrep: applied write set (-1)

--------

FILE I/O

1 row in set (0.28 sec)



PAGER set to stdout

+----------------------------------+--------------+

| Variable_name                    | Value |

+----------------------------------+--------------+

| wsrep_flow_control_paused_ns     | 210755642443 |

| wsrep_flow_control_paused        | 0.0231273 |

| wsrep_flow_control_sent          | 1653 |

| wsrep_flow_control_recv          | 3857 |

| wsrep_flow_control_interval      | [ 173, 173 ] |

| wsrep_flow_control_interval_low  | 173 |

| wsrep_flow_control_interval_high | 173          |

| wsrep_flow_control_status        | OFF |

+----------------------------------+--------------+

8 rows in set (0.01 sec)



+----------+

| count(*) |

+----------+

|    15758 |

+----------+

1 row in set (0.03 sec)

Noticeably, the flow control just kicked in!

ClusterControl Galera Cluster Overview

And the WSREP send/receive queues have been active as well:

 
ClusterControl Galera Overview
Host testnode12 (192.168.10.120)
ClusterControl Galera Overview
 Host testnode13 (192.168.10.130)

Now, let's take a closer look at the result from the mysql.wsrep_streaming_log table,

root@testnode11[sbtest]#> pager sed -n '/TRANSACTIONS/,/FILE I\/O/p'|tail -8; show engine innodb status\G nopager;

PAGER set to 'sed -n '/TRANSACTIONS/,/FILE I\/O/p'|tail -8'

MySQL thread id 134822, OS thread handle 140041167468288, query id 0 System lock

---TRANSACTION 649008, ACTIVE 481 sec

mysql tables in use 1, locked 1

53104 lock struct(s), heap size 6004944, 3929602 row lock(s), undo log entries 3876500

MySQL thread id 183, OS thread handle 140041167468288, query id 105367 localhost 127.0.0.1 root updating

delete from sbtest1 where id >= 2000000

--------

FILE I/O

1 row in set (0.01 sec)

then taking the result of,

root@testnode12[sbtest]#> select count(*) from mysql.wsrep_streaming_log;

+----------+

| count(*) |

+----------+

|    38899 |

+----------+

1 row in set (0.40 sec)

It tells how many fragments have been replicated using Streaming Replication. Now, let's do some basic math:

root@testnode12[sbtest]#> select 3876500/38899.0;

+-----------------+

| 3876500/38899.0 |

+-----------------+

|         99.6555 |

+-----------------+

1 row in set (0.03 sec)

I'm taking the undo log entries from the SHOW ENGINE INNODB STATUS\G result and dividing them by the total count of mysql.wsrep_streaming_log records. Since I defined wsrep_trx_fragment_size=100 earlier, the result (roughly 100) confirms that each replicated fragment currently being processed by Galera covers about 100 rows.

It’s important to take note at what Streaming Replication is trying to achieve... "the node breaks the transaction into fragments, then certifies and replicates them on the slaves while the transaction is still in progress. Once certified, the fragment can no longer be aborted by conflicting transactions."

The fragments are treated as transactions that are passed to the remaining nodes within the cluster, which certify the fragmented transaction and then apply the write-sets. This means that once your large transaction has been certified or prioritized, all incoming connections that could possibly cause a deadlock will need to wait until the transaction finishes.

Now, the verdict on our huge delete?

root@testnode11[sbtest]#> delete from sbtest1 where id >= 2000000;

Query OK, 12034538 rows affected (30 min 36.96 sec)

It finishes successfully without any failure!

What does it look like on the other nodes? On testnode12,

root@testnode12[sbtest]#> pager sed -n '/TRANSACTIONS/,/FILE I\/O/p'|tail -8; show engine innodb status\G nopager; show global status like 'wsrep%flow%'; select count(*) from mysql.wsrep_streaming_log;

PAGER set to 'sed -n '/TRANSACTIONS/,/FILE I\/O/p'|tail -8'

0 lock struct(s), heap size 1136, 0 row lock(s)

---TRANSACTION 421740651985200, not started

0 lock struct(s), heap size 1136, 0 row lock(s)

---TRANSACTION 553661, ACTIVE (PREPARED) 2050 sec

165631 lock struct(s), heap size 18735312, 12154883 row lock(s), undo log entries 12154883

MySQL thread id 898, OS thread handle 140266050008832, query id 341835 wsrep: preparing to commit write set(215510)

--------

FILE I/O

1 row in set (0.46 sec)



PAGER set to stdout

+----------------------------------+--------------+

| Variable_name                    | Value |

+----------------------------------+--------------+

| wsrep_flow_control_paused_ns     | 290832524304 |

| wsrep_flow_control_paused        | 0 |

| wsrep_flow_control_sent          | 0 |

| wsrep_flow_control_recv          | 0 |

| wsrep_flow_control_interval      | [ 173, 173 ] |

| wsrep_flow_control_interval_low  | 173 |

| wsrep_flow_control_interval_high | 173          |

| wsrep_flow_control_status        | OFF |

+----------------------------------+--------------+

8 rows in set (0.53 sec)



+----------+

| count(*) |

+----------+

|   120345 |

+----------+

1 row in set (0.88 sec)

It stops at a total of 120345 fragments, and if we do the math again on the last captured undo log entries (the undo log entries are the same on the master as well),

root@testnode12[sbtest]#> select 12154883/120345.0;

+-------------------+

| 12154883/120345.0 |

+-------------------+

|          101.0003 |

+-------------------+

1 row in set (0.00 sec)

So the transaction was broken into a total of 120345 fragments in order to delete 12034538 rows.

Once you're done using Streaming Replication, do not forget to disable it, as it will keep logging huge transactions and add a lot of performance overhead to your cluster. To disable it, just run:

root@testnode11[sbtest]#> set wsrep_trx_fragment_size=0;

Query OK, 0 rows affected (0.04 sec)

Conclusion

With Streaming Replication enabled, it's important that you are able to identify how large your fragment size can be and what unit you have to choose (bytes, rows, statements). 

It is also very important to run it at the session level and, of course, to identify when you actually need to use Streaming Replication.

While performing these tests, deleting a large number of rows from a huge table with Streaming Replication enabled noticeably caused a high peak in disk and CPU utilization. RAM usage was more stable, but this could be because the statement we performed did not cause much memory contention.

It’s safe to say that Streaming Replication can cause performance bottlenecks when dealing with large numbers of records, so it should be used with proper planning and care.

Lastly, if you are using Streaming Replication, do not forget to always disable this once done on that current session to avoid unwanted problems.

 

Deploying a Highly Available Nextcloud with MySQL Galera Cluster and GlusterFS


Nextcloud is an open source file sync and share application that offers free, secure, and easily accessible cloud file storage, as well as a number of tools that extend its feature set. It's very similar to the popular Dropbox, iCloud and Google Drive but unlike Dropbox, Nextcloud does not offer off-premises file storage hosting. 

Nextcloud Logo

In this blog post, we are going to deploy a highly available setup for our private "Dropbox" infrastructure using Nextcloud, GlusterFS, Percona XtraDB Cluster (MySQL Galera Cluster), ProxySQL with ClusterControl as the automation tool to manage and monitor the database and load balancer tiers. 

Note: You can also use MariaDB Cluster, which uses the same underlying replication library as in Percona XtraDB Cluster. From a load balancer perspective, ProxySQL behaves similarly to MaxScale in that it can understand the SQL traffic and has fine-grained control on how traffic is routed. 

Database Architecture for Nextcloud

In this blog post, we used a total of 6 nodes.

  • 2 x proxy servers 
  • 3 x database + application servers
  • 1 x controller server (ClusterControl)

The following diagram illustrates our final setup:

Highly Available MySQL Nextcloud Database Architecture

For Percona XtraDB Cluster, a minimum of 3 nodes is required for a solid multi-master replication. Nextcloud applications are co-located within the database servers, thus GlusterFS has to be configured on those hosts as well. 

The load balancer tier consists of 2 nodes for redundancy purposes. We will use ClusterControl to deploy the database tier and the load balancer tiers. All servers are running on CentOS 7 with the following /etc/hosts definition on every node:

192.168.0.21 nextcloud1 db1

192.168.0.22 nextcloud2 db2

192.168.0.23 nextcloud3 db3

192.168.0.10 vip db

192.168.0.11 proxy1 lb1 proxysql1

192.168.0.12 proxy2 lb2 proxysql2

Note that GlusterFS and MySQL are highly intensive processes. If you are following this setup (GlusterFS and MySQL resides in a single server), ensure you have decent hardware specs for the servers.

Nextcloud Database Deployment

We will start with database deployment for our three-node Percona XtraDB Cluster using ClusterControl. Install ClusterControl and then setup passwordless SSH to all nodes that are going to be managed by ClusterControl (3 PXC + 2 proxies). On ClusterControl node, do:

$ whoami

root

$ ssh-copy-id 192.168.0.11

$ ssh-copy-id 192.168.0.12

$ ssh-copy-id 192.168.0.21

$ ssh-copy-id 192.168.0.22

$ ssh-copy-id 192.168.0.23

Note: Enter the root password for the respective host when prompted.

Open a web browser and go to https://{ClusterControl-IP-address}/clustercontrol and create a super user. Then go to Deploy -> MySQL Galera. Follow the deployment wizard accordingly. At the second stage 'Define MySQL Servers', pick Percona XtraDB 5.7 and specify the IP address for every database node. Make sure you get a green tick after entering the database node details, as shown below:

Deploy a Nextcloud Database Cluster

Click "Deploy" to start the deployment. The database cluster will be ready in 15~20 minutes. You can follow the deployment progress at Activity -> Jobs -> Create Cluster -> Full Job Details. The cluster will be listed under Database Cluster dashboard once deployed.

We can now proceed to database load balancer deployment.

Nextcloud Database Load Balancer Deployment

It is recommended to run Nextcloud on a single-writer setup, where writes are processed by one master at a time and reads can be distributed to the other nodes. We can use ProxySQL 2.0 to achieve this configuration since it can route the write queries to a single master. 

To deploy a ProxySQL, click on Cluster Actions > Add Load Balancer > ProxySQL > Deploy ProxySQL. Enter the required information as highlighted by the red arrows:

Deploy ProxySQL for Nextcloud

Fill in all necessary details as highlighted by the arrows above. The server address is the lb1 server, 192.168.0.11. Further down, we specify the ProxySQL admin and monitoring users' password. Then include all MySQL servers into the load balancing set and then choose "No" in the Implicit Transactions section. Click "Deploy ProxySQL" to start the deployment.

Repeat the same steps as above for the secondary load balancer, lb2 (but change the "Server Address" to lb2's IP address). Otherwise, we would have no redundancy in this layer.

Our ProxySQL nodes are now installed and configured with two host groups for the Galera Cluster: the single-master group (hostgroup 10), where all write connections are forwarded to one Galera node (this is useful to prevent multi-master deadlocks), and the multi-master group (hostgroup 20) for read-only workloads, which are balanced across all backend MySQL servers.
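ClusterControl creates the read/write split rules for you, but as an illustrative sketch (not the exact rules it generates), a typical pair of query rules on the ProxySQL admin interface (default port 6032) that sends plain SELECTs to hostgroup 20 and everything else to hostgroup 10 could look like this:

INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply) VALUES (100, 1, '^SELECT .* FOR UPDATE', 10, 1);
INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply) VALUES (200, 1, '^SELECT .*', 20, 1);
-- Queries matching no rule follow the user's default hostgroup (the writer hostgroup 10 in this setup)
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;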

Next, we need to deploy a virtual IP address to provide a single endpoint for our ProxySQL nodes so your application will not need to define two different ProxySQL hosts. This will also provide automatic failover capabilities because virtual IP address will be taken over by the backup ProxySQL node in case something goes wrong to the primary ProxySQL node.
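Under the hood, Keepalived manages the virtual IP address with a VRRP instance. ClusterControl generates the configuration for you, but a minimal sketch (assuming the network interface is eth0 and using the VIP from our /etc/hosts definition) looks roughly like this:

vrrp_instance VI_PROXYSQL {
    state MASTER                 # BACKUP on the secondary ProxySQL node
    interface eth0               # assumption - adjust to your network interface
    virtual_router_id 51
    priority 101                 # use a lower priority (e.g. 100) on the secondary node
    virtual_ipaddress {
        192.168.0.10             # the virtual IP (vip db in /etc/hosts)
    }
}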

Go to ClusterControl -> Manage -> Load Balancers -> Keepalived -> Deploy Keepalived. Pick "ProxySQL" as the load balancer type and choose two distinct ProxySQL servers from the dropdown. Then specify the virtual IP address as well as the network interface that it will listen to, as shown in the following example:

Deploy Keepalived & ProxySQL for Nextcloud

Once the deployment completes, you should see the following details on the cluster's summary bar:

Nextcloud Database Cluster in ClusterControl

Finally, create a new database for our application by going to ClusterControl -> Manage -> Schemas and Users -> Create Database and specify "nextcloud". ClusterControl will create this database on every Galera node. Our load balancer tier is now complete.

GlusterFS Deployment for Nextcloud

The following steps should be performed on nextcloud1, nextcloud2, nextcloud3 unless otherwise specified.

Step One

It's recommended to have a separate disk for GlusterFS storage, so we are going to add an additional disk as /dev/sdb and create a new partition:

$ fdisk /dev/sdb

Follow the fdisk partition creation wizard by pressing the following key:

n > p > Enter > Enter > Enter > w

Step Two

Verify if /dev/sdb1 has been created:

$ fdisk -l /dev/sdb1

Disk /dev/sdb1: 8588 MB, 8588886016 bytes, 16775168 sectors

Units = sectors of 1 * 512 = 512 bytes

Sector size (logical/physical): 512 bytes / 512 bytes

I/O size (minimum/optimal): 512 bytes / 512 bytes

Step Three

Format the partition with XFS:

$ mkfs.xfs /dev/sdb1

Step Four

Mount the partition as /glusterfs:

$ mkdir /glusterfs

$ mount /dev/sdb1 /glusterfs

Verify that all nodes have the following layout:

$ lsblk

NAME   MAJ:MIN RM SIZE RO TYPE MOUNTPOINT

sda      8:0 0 40G  0 disk

└─sda1   8:1 0 40G  0 part /

sdb      8:16 0   8G 0 disk

└─sdb1   8:17 0   8G 0 part /glusterfs

Step Five

Create a subdirectory called brick under /glusterfs:

$ mkdir /glusterfs/brick

Step Six

For application redundancy, we can use GlusterFS for file replication between the hosts. Firstly, install GlusterFS repository for CentOS:

$ yum install centos-release-gluster -y

$ yum install epel-release -y

Step Seven

Install GlusterFS server

$ yum install glusterfs-server -y

Step Eight

Enable and start gluster daemon:

$ systemctl enable glusterd

$ systemctl start glusterd

Step Nine

On nextcloud1, probe the other nodes:

(nextcloud1)$ gluster peer probe 192.168.0.22

(nextcloud1)$ gluster peer probe 192.168.0.23

You can verify the peer status with the following command:

(nextcloud1)$ gluster peer status

Number of Peers: 2



Hostname: 192.168.0.22

Uuid: f9d2928a-6b64-455a-9e0e-654a1ebbc320

State: Peer in Cluster (Connected)



Hostname: 192.168.0.23

Uuid: 100b7778-459d-4c48-9ea8-bb8fe33d9493

State: Peer in Cluster (Connected)

Step Ten

On nextcloud1, create a replicated volume on probed nodes:

(nextcloud1)$ gluster volume create rep-volume replica 3 192.168.0.21:/glusterfs/brick 192.168.0.22:/glusterfs/brick 192.168.0.23:/glusterfs/brick

volume create: rep-volume: success: please start the volume to access data

Step Eleven

Start the replicated volume on nextcloud1:

(nextcloud1)$ gluster volume start rep-volume

volume start: rep-volume: success

Verify the replicated volume and processes are online:

$ gluster volume status

Status of volume: rep-volume

Gluster process                             TCP Port RDMA Port Online Pid

------------------------------------------------------------------------------

Brick 192.168.0.21:/glusterfs/brick         49152 0 Y 32570

Brick 192.168.0.22:/glusterfs/brick         49152 0 Y 27175

Brick 192.168.0.23:/glusterfs/brick         49152 0 Y 25799

Self-heal Daemon on localhost               N/A N/A Y 32591

Self-heal Daemon on 192.168.0.22            N/A N/A Y 27196

Self-heal Daemon on 192.168.0.23            N/A N/A Y 25820



Task Status of Volume rep-volume

------------------------------------------------------------------------------

There are no active volume tasks

Step Twelve

Mount the replicated volume on /var/www/html. Create the directory:

$ mkdir -p /var/www/html

Step Thirteen

Add the following lines into /etc/fstab to allow auto-mount:

/dev/sdb1 /glusterfs xfs defaults 0 0

localhost:/rep-volume /var/www/html   glusterfs defaults,_netdev 0 0

Step Fourteen

Mount the GlusterFS to /var/www/html:

$ mount -a

And verify with:

$ mount | grep gluster

/dev/sdb1 on /glusterfs type xfs (rw,relatime,seclabel,attr2,inode64,noquota)

localhost:/rep-volume on /var/www/html type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)

The replicated volume is now ready and mounted in every node. We can now proceed to deploy the application.

Nextcloud Application Deployment

The following steps should be performed on nextcloud1, nextcloud2 and nextcloud3 unless otherwise specified.

Nextcloud requires PHP 7.2 or later, and for the CentOS distribution we have to enable a number of repositories like EPEL and Remi to simplify the installation process.

Step One

If SELinux is enabled, disable it first:

$ setenforce 0

$ sed -i 's/^SELINUX=.*/SELINUX=permissive/g' /etc/selinux/config

You can also run Nextcloud with SELinux enabled by following this guide.

Step Two

Install Nextcloud requirements and enable Remi repository for PHP 7.2:

$ yum install -y epel-release yum-utils unzip curl wget bash-completion policycoreutils-python mlocate bzip2

$ yum install -y http://rpms.remirepo.net/enterprise/remi-release-7.rpm

$ yum-config-manager --enable remi-php72

Step Three

Install Nextcloud dependencies, mostly Apache and PHP 7.2 related packages:

$ yum install -y httpd php72-php php72-php-gd php72-php-intl php72-php-mbstring php72-php-mysqlnd php72-php-opcache php72-php-pecl-redis php72-php-pecl-apcu php72-php-pecl-imagick php72-php-xml php72-php-pecl-zip

Step Four

Enable Apache and start it up:

$ systemctl enable httpd.service

$ systemctl start httpd.service

Step Five

Make a symbolic link for PHP to use PHP 7.2 binary:

$ ln -sf /bin/php72 /bin/php

Step Six

On nextcloud1, download Nextcloud Server from here and extract it:

$ wget https://download.nextcloud.com/server/releases/nextcloud-17.0.0.zip

$ unzip nextcloud*

Step Seven

On nextcloud1, copy the directory into /var/www/html and assign correct ownership:

$ cp -Rf nextcloud /var/www/html

$ chown -Rf apache:apache /var/www/html

Note: the copying process into /var/www/html is going to take some time due to GlusterFS volume replication.

Step Eight

Before we proceed to the installation wizard, we have to set the pxc_strict_mode variable to something other than "ENFORCING" (the default value). This is because the Nextcloud database import creates a number of tables without a primary key defined, which is not recommended on Galera Cluster. This is explained in further detail in the Tuning section further down.

To change the configuration with ClusterControl, simply go to Manage -> Configurations -> Change/Set Parameters:

Change Set Parameters - ClusterControl

Choose all database instances from the list, and enter:

  • Group: MYSQLD
  • Parameter: pxc_strict_mode
  • New Value: PERMISSIVE

ClusterControl will perform the necessary changes on every database node automatically. If the value can be changed during runtime, it will be effective immediately. ClusterControl also configures the value inside the MySQL configuration file for persistency. You should see the following result:

Change Set Parameter - ClusterControl
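If you prefer to make this change by hand instead of through ClusterControl, pxc_strict_mode is a dynamic variable in Percona XtraDB Cluster 5.7, so a sketch of the equivalent manual change, run on every Galera node, would be:

mysql> SET GLOBAL pxc_strict_mode = 'PERMISSIVE';

And add the same setting to the [mysqld] section of the MySQL configuration file so the value survives a restart:

pxc_strict_mode = PERMISSIVE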

Step Nine

Now we are ready to configure our Nextcloud installation. Open the browser and go to nextcloud1's HTTP server at http://192.168.0.21/nextcloud/ and you will be presented with the following configuration wizard:

Nextcloud Account Setup

Configure the "Storage & database" section with the following value:

  • Data folder: /var/www/html/nextcloud/data
  • Configure the database: MySQL/MariaDB
  • Username: nextcloud
  • Password: (the password for user nextcloud)
  • Database: nextcloud
  • Host: 192.168.0.10:6603 (The virtual IP address with ProxySQL port)

Click "Finish Setup" to start the configuration process. Wait until it finishes and you will be redirected to Nextcloud dashboard for user "admin". The installation is now complete. Next section provides some tuning tips to run efficiently with Galera Cluster.

Nextcloud Database Tuning

Primary Key

Having a primary key on every table is vital for Galera Cluster write-set replication. For a relatively big table without a primary key, a large update or delete transaction could completely block your cluster for a very long time. To avoid any quirks and edge cases, simply make sure that all tables use the InnoDB storage engine with an explicit primary key (a unique key does not count).

The default installation of Nextcloud will create a bunch of tables under the specified database and some of them do not comply with this rule. To check if the tables are compatible with Galera, we can run the following statement:

mysql> SELECT DISTINCT CONCAT(t.table_schema,'.',t.table_name) as tbl, t.engine, IF(ISNULL(c.constraint_name),'NOPK','') AS nopk, IF(s.index_type = 'FULLTEXT','FULLTEXT','') as ftidx, IF(s.index_type = 'SPATIAL','SPATIAL','') as gisidx FROM information_schema.tables AS t LEFT JOIN information_schema.key_column_usage AS c ON (t.table_schema = c.constraint_schema AND t.table_name = c.table_name AND c.constraint_name = 'PRIMARY') LEFT JOIN information_schema.statistics AS s ON (t.table_schema = s.table_schema AND t.table_name = s.table_name AND s.index_type IN ('FULLTEXT','SPATIAL'))   WHERE t.table_schema NOT IN ('information_schema','performance_schema','mysql') AND t.table_type = 'BASE TABLE' AND (t.engine <> 'InnoDB' OR c.constraint_name IS NULL OR s.index_type IN ('FULLTEXT','SPATIAL')) ORDER BY t.table_schema,t.table_name;

+---------------------------------------+--------+------+-------+--------+

| tbl                                   | engine | nopk | ftidx | gisidx |

+---------------------------------------+--------+------+-------+--------+

| nextcloud.oc_collres_accesscache      | InnoDB | NOPK | | |

| nextcloud.oc_collres_resources        | InnoDB | NOPK | | |

| nextcloud.oc_comments_read_markers    | InnoDB | NOPK | | |

| nextcloud.oc_federated_reshares       | InnoDB | NOPK | | |

| nextcloud.oc_filecache_extended       | InnoDB | NOPK | | |

| nextcloud.oc_notifications_pushtokens | InnoDB | NOPK |       | |

| nextcloud.oc_systemtag_object_mapping | InnoDB | NOPK |       | |

+---------------------------------------+--------+------+-------+--------+

The above output shows there are 7 tables that do not have a primary key defined. To fix the above, simply add a primary key with an auto-increment column. Run the following commands on one of the database servers, for example nextcloud1:

(nextcloud1)$ mysql -uroot -p

mysql> ALTER TABLE nextcloud.oc_collres_accesscache ADD COLUMN `id` INT PRIMARY KEY AUTO_INCREMENT;

mysql> ALTER TABLE nextcloud.oc_collres_resources ADD COLUMN `id` INT PRIMARY KEY AUTO_INCREMENT;

mysql> ALTER TABLE nextcloud.oc_comments_read_markers ADD COLUMN `id` INT PRIMARY KEY AUTO_INCREMENT;

mysql> ALTER TABLE nextcloud.oc_federated_reshares ADD COLUMN `id` INT PRIMARY KEY AUTO_INCREMENT;

mysql> ALTER TABLE nextcloud.oc_filecache_extended ADD COLUMN `id` INT PRIMARY KEY AUTO_INCREMENT;

mysql> ALTER TABLE nextcloud.oc_notifications_pushtokens ADD COLUMN `id` INT PRIMARY KEY AUTO_INCREMENT;

mysql> ALTER TABLE nextcloud.oc_systemtag_object_mapping ADD COLUMN `id` INT PRIMARY KEY AUTO_INCREMENT;

Once the above modifications have been applied, we can set pxc_strict_mode back to the recommended value, "ENFORCING". Repeat step #8 under the "Nextcloud Application Deployment" section with the corresponding value.

READ-COMMITTED Isolation Level

The transaction isolation level recommended by Nextcloud is READ-COMMITTED, while Galera Cluster defaults to the stricter REPEATABLE-READ isolation level. Using READ-COMMITTED can avoid data loss under high load scenarios (e.g. by using the sync client with many clients/users and many parallel operations).

To modify the transaction level, go to ClusterControl -> Manage -> Configurations -> Change/Set Parameter and specify the following:

Nextcloud Change Set Parameter - ClusterControl

Click "Proceed" and ClusterControl will apply the configuration changes immediately. No database restart is required.

Multi-Instance Nextcloud

Since we performed the installation via nextcloud1's URL, its IP address is automatically added into the 'trusted_domains' variable inside Nextcloud. When you try to access the other servers, for example the secondary server at http://192.168.0.22/nextcloud, you will see an error that the host is not authorized and must be added into the trusted_domains variable.

Therefore, add all the hosts' IP addresses to the "trusted_domains" array inside /var/www/html/nextcloud/config/config.php, as in the example below:

'trusted_domains' =>

  array (

    0 => '192.168.0.21',

    1 => '192.168.0.22',

    2 => '192.168.0.23'

  ),

The above configuration allows users to access all three application servers via the following URLs:

  • http://192.168.0.21/nextcloud
  • http://192.168.0.22/nextcloud
  • http://192.168.0.23/nextcloud

Note: You can add a load balancer tier on top of these three Nextcloud instances to achieve high availability for the application tier by using HTTP reverse proxies available in the market like HAProxy or nginx. That is out of the scope of this blog post.

Using Redis for File Locking

Nextcloud’s Transactional File Locking mechanism locks files to avoid file corruption during normal operation. It's recommended to install Redis to take care of transactional file locking (this is enabled by default) which will offload the database cluster from handling this heavy job.

To install Redis, simply:

$ yum install -y redis

$ systemctl enable redis.service

$ systemctl start redis.service

Append the following lines inside /var/www/html/nextcloud/config/config.php:

'filelocking.enabled' => true,

  'memcache.locking' => '\OC\Memcache\Redis',

  'redis' => array(

     'host' => '192.168.0.21',

     'port' => 6379,

     'timeout' => 0.0,

   ),

For more details, check out this documentation, Transactional File Locking.

Conclusion

Nextcloud can be configured to be a scalable and highly available file-hosting service to cater to your private file sharing demands. In this blog, we showed how you can bring redundancy to the Nextcloud, file system and database layers. 

 

Building a Hot Standby on Amazon AWS Using MariaDB Cluster


Galera Cluster 4.0 was first released as part of MariaDB 10.4, and there are a lot of significant improvements in this release. The most impressive feature is Streaming Replication, which is designed to handle the following problems.

  • Problems with long transactions
  • Problems with large transactions
  • Problems with hot-spots in tables

We previously took a deep dive into the new Streaming Replication feature in a two-part blog series (Part 1 and Part 2). Part of this new feature in Galera 4.0 are the new system tables, which are very useful for querying and checking the Galera Cluster nodes and also the logs that have been processed by Streaming Replication. 

In previous blogs, we also showed you the Easy Way to Deploy a MySQL Galera Cluster on AWS and how to Deploy a MySQL Galera Cluster 4.0 onto Amazon AWS EC2.

Percona hasn't released a GA version of Percona XtraDB Cluster (PXC) 8.0 yet, as some features are still under development, such as the MySQL wsrep function WSREP_SYNC_WAIT_UPTO_GTID which does not appear to be present yet (at least in the PXC 8.0.15-5-27dev.4.2 version). Yet, when PXC 8.0 is released, it will be packed with great features such as...

  • Improved cluster resiliency
  • Cloud friendly cluster
  • Improved packaging
  • Encryption support
  • Atomic DDL

While we're waiting for the release of PXC 8.0 GA, we'll cover in this blog how you can create a Hot Standby Node on Amazon AWS for Galera Cluster 4.0 using MariaDB.

What is a Hot Standby?

A hot standby is a common term in computing, especially on highly distributed systems. It's a method for redundancy in which one system runs simultaneously with an identical primary system. When failure happens on the primary node, the hot standby immediately takes over replacing the primary system. Data is mirrored to both systems in real time.

For database systems, a hot standby server is usually the second node after the primary master that is running on powerful resources (same as the master). This secondary node has to be as stable as the primary master to function correctly. 

It also serves as a data recovery node if the master node or the entire cluster goes down. The hot standby node will replace the failing node or cluster while continuously serving the demand from the clients.

In Galera Cluster, all servers that are part of the cluster can serve as standby nodes. However, if the region or the entire cluster goes down, how will you cope with this? Creating a standby node outside the specific region or network of your cluster is one option here. 

In the following section, we'll show you how to create a standby node on AWS EC2 using MariaDB.

Deploying a Hot Standby On Amazon AWS

Previously, we showed you how you can create a Galera Cluster on AWS. You might want to read Deploying MySQL Galera Cluster 4.0 onto Amazon AWS EC2 in case you are new to Galera 4.0.

Your hot standby node can be another Galera Cluster that replicates synchronously (check this blog Zero Downtime Network Migration With MySQL Galera Cluster Using Relay Node) or an asynchronous MySQL/MariaDB node. In this blog, we'll set up and deploy the hot standby node replicating asynchronously from one of the Galera nodes.

The Galera Cluster Setup

In this sample setup, we deployed a 3-node cluster using MariaDB 10.4.8. This cluster is deployed in the US East (Ohio) region and the topology is shown below:

We'll use the 172.31.28.175 server as the master for our asynchronous slave, which will serve as the standby node.

Setting up your EC2 Instance for Hot Standby Node

In the AWS console, go to EC2 found under the Compute section and click Launch Instance to create an EC2 instance just like below.

We'll create this instance under the US West (Oregon) region. For the OS type, you can choose whichever distribution you like (I prefer Ubuntu 18.04) and choose the instance type based on your target requirements. For this example I will use t2.micro since it doesn't require any sophisticated setup and it's only for this sample deployment.

As we mentioned earlier, it's best that your hot standby node be located in a different region and not collocated within the same region. That way, in case the regional data center goes down or suffers a network outage, your hot standby can be your failover target when things go bad. 

Before we continue: in AWS, each region has its own Virtual Private Cloud (VPC) and its own network. In order to communicate with the Galera cluster nodes, we must first define a VPC peering connection so the nodes can communicate within the Amazon infrastructure and do not need to go outside the network, which would just add overhead and security concerns. 

First, go to the VPC where your hot standby node will reside, then go to Peering Connections. There you need to specify the VPC of your standby node and the Galera cluster VPC. In the example below, I have us-west-2 interconnecting with us-east-2.

Once created, you'll see an entry under your Peering Connections. However, you need to accept the request from the Galera cluster VPC, which is on us-east-2 in this example. See below,

Once accepted, do not forget to add the CIDR to the routing table. See this external blog on VPC Peering for how to set up routing once the peering connection is established.

Now, let's go back and continue creating the EC2 node. Make sure your Security Group has the correct rules and that the required ports are opened. Check the firewall settings manual for more information about this. For this setup, I just set All Traffic to be accepted since this is just a test. See below,

Make sure when creating your instance, you have set the correct VPC and you have defined your proper subnet. You can check this blog in case you need some help about that. 

Setting up the MariaDB Async Slave

Step One

First, we need to set up the repository, add the repo keys, and update the package list in the repository cache:

$ vi /etc/apt/sources.list.d/mariadb.list

$ apt-key adv --recv-keys --keyserver hkp://keyserver.ubuntu.com:80 0xF1656F24C74CD1D8

$ apt update
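The contents of mariadb.list depend on the mirror you pick from the MariaDB repository configurator for your distribution and release. A purely illustrative sketch (the mirror host below is a placeholder, and bionic matches Ubuntu 18.04):

# /etc/apt/sources.list.d/mariadb.list -- example only, substitute a real mirror
deb [arch=amd64] http://mirror.example.com/mariadb/repo/10.4/ubuntu bionic main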

Step Two

Install the MariaDB packages and its required binaries

$ apt-get install mariadb-backup  mariadb-client mariadb-client-10.4 libmariadb3 libdbd-mysql-perl mariadb-client-core-10.4 mariadb-common mariadb-server-10.4 mariadb-server-core-10.4 mysql-common

Step Three

Now, let's take a backup using xbstream to stream the files over the network from one of the nodes in our Galera Cluster.

## Wipe out the datadir of the freshly installed MariaDB on your hot standby node.

$ systemctl stop mariadb

$ rm -rf /var/lib/mysql/*

## Then on the hot standby node, run this on the terminal,

$ socat -u tcp-listen:9999,reuseaddr stdout 2>/tmp/netcat.log | mbstream -x -C /var/lib/mysql

## Then on the target master, i.e. one of the nodes in your Galera Cluster (which is the node 172.31.28.175 in this example), run this on the terminal,

$ mariabackup  --backup --target-dir=/tmp --stream=xbstream | socat - TCP4:172.32.31.2:9999

where 172.32.31.2 is the IP of the hot standby node.

Step Four

Prepare your MySQL configuration file. Since this is Ubuntu, I am editing /etc/mysql/my.cnf with the following sample my.cnf taken from our ClusterControl template,

[MYSQLD]

user=mysql

basedir=/usr/

datadir=/var/lib/mysql

socket=/var/lib/mysql/mysql.sock

pid_file=/var/lib/mysql/mysql.pid

port=3306

log_error=/var/log/mysql/mysqld.log

log_warnings=2

# log_output = FILE



#Slow logging    

slow_query_log_file=/var/log/mysql/mysql-slow.log

long_query_time=2

slow_query_log=OFF

log_queries_not_using_indexes=OFF



### INNODB OPTIONS

innodb_buffer_pool_size=245M

innodb_flush_log_at_trx_commit=2

innodb_file_per_table=1

innodb_data_file_path = ibdata1:100M:autoextend

## You may want to tune the below depending on number of cores and disk sub

innodb_read_io_threads=4

innodb_write_io_threads=4

innodb_doublewrite=1

innodb_log_file_size=64M

innodb_log_buffer_size=16M

innodb_buffer_pool_instances=1

innodb_log_files_in_group=2

innodb_thread_concurrency=0

# innodb_file_format = barracuda

innodb_flush_method = O_DIRECT

innodb_rollback_on_timeout=ON

# innodb_locks_unsafe_for_binlog = 1

innodb_autoinc_lock_mode=2

## avoid statistics update when doing e.g show tables

innodb_stats_on_metadata=0

default_storage_engine=innodb



# CHARACTER SET

# collation_server = utf8_unicode_ci

# init_connect = 'SET NAMES utf8'

# character_set_server = utf8



# REPLICATION SPECIFIC

server_id=1002

binlog_format=ROW

log_bin=binlog

log_slave_updates=1

relay_log=relay-bin

expire_logs_days=7

read_only=ON

report_host=172.31.29.72

gtid_ignore_duplicates=ON

gtid_strict_mode=ON



# OTHER THINGS, BUFFERS ETC

key_buffer_size = 24M

tmp_table_size = 64M

max_heap_table_size = 64M

max_allowed_packet = 512M

# sort_buffer_size = 256K

# read_buffer_size = 256K

# read_rnd_buffer_size = 512K

# myisam_sort_buffer_size = 8M

skip_name_resolve

memlock=0

sysdate_is_now=1

max_connections=500

thread_cache_size=512

query_cache_type = 0

query_cache_size = 0

table_open_cache=1024

lower_case_table_names=0

# 5.6 backwards compatibility (FIXME)

# explicit_defaults_for_timestamp = 1



performance_schema = OFF

performance-schema-max-mutex-classes = 0

performance-schema-max-mutex-instances = 0



[MYSQL]

socket=/var/lib/mysql/mysql.sock

# default_character_set = utf8

[client]

socket=/var/lib/mysql/mysql.sock

# default_character_set = utf8

[mysqldump]

socket=/var/lib/mysql/mysql.sock

max_allowed_packet = 512M

# default_character_set = utf8



[xtrabackup]



[MYSQLD_SAFE]

# log_error = /var/log/mysqld.log

basedir=/usr/

# datadir = /var/lib/mysql

Of course, you can change this according to your setup and requirements.

Step Five

Prepare the backup from step #3, i.e. the finished backup that is now on the hot standby node, by running the command below,

$ mariabackup --prepare --target-dir=/var/lib/mysql

Step Six

Set the ownership of the datadir in the hot standby node,

$ chown -R mysql.mysql /var/lib/mysql

Step Seven

Now, start the MariaDB instance

$  systemctl start mariadb

Step Eight

Lastly, we need to set up the asynchronous replication,

## Create the replication user on the master node, i.e. the node in the Galera cluster

MariaDB [(none)]> CREATE USER 'cmon_replication'@'172.32.31.2' IDENTIFIED BY 'PahqTuS1uRIWYKIN';

Query OK, 0 rows affected (0.866 sec)

MariaDB [(none)]> GRANT REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'cmon_replication'@'172.32.31.2';

Query OK, 0 rows affected (0.127 sec)

## Get the GTID slave position from xtrabackup_binlog_info as follows,

$  cat /var/lib/mysql/xtrabackup_binlog_info

binlog.000002   71131632 1000-1000-120454

##  Then setup the slave replication as follows,

MariaDB [(none)]> SET GLOBAL gtid_slave_pos='1000-1000-120454';

Query OK, 0 rows affected (0.053 sec)

MariaDB [(none)]> CHANGE MASTER TO MASTER_HOST='172.31.28.175', MASTER_USER='cmon_replication', master_password='PahqTuS1uRIWYKIN', MASTER_USE_GTID = slave_pos;
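## Start the replication threads so the slave begins applying from the configured GTID position,

MariaDB [(none)]> START SLAVE;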

## Now, check the slave status,

MariaDB [(none)]> show slave status \G

*************************** 1. row ***************************

                Slave_IO_State: Waiting for master to send event

                   Master_Host: 172.31.28.175

                   Master_User: cmon_replication

                   Master_Port: 3306

                 Connect_Retry: 60

               Master_Log_File: 

           Read_Master_Log_Pos: 4

                Relay_Log_File: relay-bin.000001

                 Relay_Log_Pos: 4

         Relay_Master_Log_File: 

              Slave_IO_Running: Yes

             Slave_SQL_Running: Yes

               Replicate_Do_DB: 

           Replicate_Ignore_DB: 

            Replicate_Do_Table: 

        Replicate_Ignore_Table: 

       Replicate_Wild_Do_Table: 

   Replicate_Wild_Ignore_Table: 

                    Last_Errno: 0

                    Last_Error: 

                  Skip_Counter: 0

           Exec_Master_Log_Pos: 4

               Relay_Log_Space: 256

               Until_Condition: None

                Until_Log_File: 

                 Until_Log_Pos: 0

            Master_SSL_Allowed: No

            Master_SSL_CA_File: 

            Master_SSL_CA_Path: 

               Master_SSL_Cert: 

             Master_SSL_Cipher: 

                Master_SSL_Key: 

         Seconds_Behind_Master: 0

 Master_SSL_Verify_Server_Cert: No

                 Last_IO_Errno: 0

                 Last_IO_Error: 

                Last_SQL_Errno: 0

                Last_SQL_Error: 

   Replicate_Ignore_Server_Ids: 

              Master_Server_Id: 1000

                Master_SSL_Crl: 

            Master_SSL_Crlpath: 

                    Using_Gtid: Slave_Pos

                   Gtid_IO_Pos: 1000-1000-120454

       Replicate_Do_Domain_Ids: 

   Replicate_Ignore_Domain_Ids: 

                 Parallel_Mode: conservative

                     SQL_Delay: 0

           SQL_Remaining_Delay: NULL

       Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it

              Slave_DDL_Groups: 0

Slave_Non_Transactional_Groups: 0

    Slave_Transactional_Groups: 0

1 row in set (0.000 sec)

Adding Your Hot Standby Node To ClusterControl

If you are using ClusterControl, it's easy to monitor your database server's health. To add this node as a slave, select your Galera cluster and then go to the selection button shown below to Add Replication Slave:

Click Add Replication Slave and choose to add an existing slave, just like below,

Our topology looks promising.

As you might notice, our node 172.32.31.2, serving as our hot standby node, has a different CIDR prefix: 172.32.% in us-west-2 (Oregon), while the other nodes use 172.31.% and are located in us-east-2 (Ohio). They are in completely different regions, so in case a network failure hits your Galera nodes, you can fail over to your hot standby node.

Conclusion

Building a Hot Standby on Amazon AWS is easy and straightforward. All you need to do is determine your capacity requirements and the networking topology, security, and protocols that need to be set up.

Using VPC Peering helps speed up communication between the different regions without going over the public internet, so the connection stays within the Amazon network infrastructure.
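
As an illustration only, a cross-region peering connection can be requested and accepted from the AWS CLI; the VPC IDs, peering connection ID, and regions below are placeholders:

## Request the peering connection from the us-east-2 VPC to the us-west-2 VPC

$ aws ec2 create-vpc-peering-connection --vpc-id vpc-11111111 --peer-vpc-id vpc-22222222 --peer-region us-west-2

## Accept it on the peer side

$ aws ec2 accept-vpc-peering-connection --vpc-peering-connection-id pcx-33333333 --region us-west-2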

Using asynchronous replication with a single slave is, of course, not enough on its own, but this blog serves as the foundation for how you can initiate it. You can now easily build a new Galera Cluster out of the asynchronous slave and use a series of such clusters as your Disaster Recovery nodes, or you can use the gmcast.segment variable in Galera to replicate synchronously across regions, just like what we have in this blog.
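
For reference, here is a sketch of how WAN segments could be assigned through the Galera provider options; the segment number and gcache size below are arbitrary, and all nodes in the same region would share the same segment ID:

## /etc/my.cnf on nodes located in the secondary region (hypothetical values)

[mysqld]

wsrep_provider_options="gmcast.segment=2; gcache.size=1G"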

Announcing ClusterControl 1.7.4: Cluster-to-Cluster Replication - Ultimate Disaster Recovery


We’re excited to announce the 1.7.4 release of ClusterControl - the only database management system you’ll ever need to take control of your open source database infrastructure. 

In this release we launch a new function that could be the ultimate way to minimize RTO as part of your disaster recovery strategy. Cluster-to-Cluster Replication for MySQL and PostgreSQL lets you build out a clone of your entire database infrastructure and deploy it to a secondary data center, while keeping both synced. This ensures you always have an available, up-to-date database setup ready to switch over to should disaster strike.

In addition we are also announcing support for the new MariaDB 10.4 / Galera Cluster 4.x as well as support for ProxySQL 2.0, the latest release from the industry leading MySQL load balancer.

Lastly, we continue our commitment to PostgreSQL by releasing new user management functions, giving you complete control over who can access or administer your postgres setup.

Release Highlights

Cluster-to-Cluster Database Replication

  • Asynchronous MySQL Replication Between MySQL Galera Clusters.
  • Streaming Replication Between PostgreSQL Clusters.
  • Ability to Build Clusters from a Backup or by Streaming Directly from a Master Cluster.

Added Support for MariaDB 10.4 & Galera Cluster 4.x

  • Deployment, Configuration, Monitoring and Management of the Newest Version of Galera Cluster Technology, initially released by MariaDB
    • New Streaming Replication Ability
    • New Support for Dealing with Long Running & Large Transactions
    • New Backup Locks for SST

Added Support for ProxySQL 2.0

  • Deployment and Configuration of the Newest Version of the Best MySQL Load Balancer on the Market
    • Native Support for Galera Cluster
    • Enables Causal Reads Using GTID
    • New Support for SSL Frontend Connections
    • Query Caching Improvements

New User Management Functions for PostgreSQL Clusters

  • Take full control of who is able to access your PostgreSQL database.
 

View Release Details and Resources

Release Details

Cluster-to-Cluster Replication

Either streaming from your master or built from a backup, the new Cluster-to-Cluster Replication function in ClusterControl lets you create a complete disaster recovery database system in another data center, which you can then easily fail over to should something go wrong, during maintenance, or during a major outage.

In addition to disaster recovery, this function also allows you to create a copy of your database infrastructure (in just a couple of clicks) which you can use to test upgrades, patches, or to try some database performance enhancements.

You can also use this function to deploy an analytics or reporting setup, allowing you to separate your reporting load from your OLTP traffic.

Cluster to Cluster Replication

PostgreSQL User Management

You now have the ability to add or remove user access to your PostgreSQL setup. With the simple interface, you can specify specific permissions or restrictions at the individual level. It also provides a view of all defined users who have access to the database, with their respective permissions. For tips on best practices around PostgreSQL user management you can check out this blog.

MariaDB 10.4 / Galera Cluster 4.x Support

In an effort to boost performance for long running or large transactions, MariaDB & Codership have partnered to add Streaming Replication to the new MariaDB Cluster 10.4. This addition solves many challenges that this technology has previously experienced with these types of transactions. There are three new system tables added to the release to support this new function as well as new synchronisation functions.  You can read more about what’s included in this release here.
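
Streaming Replication in Galera 4 is controlled per session through the wsrep_trx_fragment_* variables; a minimal sketch (the fragment size below is arbitrary) looks like this:

## Replicate a large transaction in fragments of 10,000 rows instead of one big write-set

SET SESSION wsrep_trx_fragment_unit = 'rows';

SET SESSION wsrep_trx_fragment_size = 10000;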

Deploy MariaDB Cluster 10.4
 

How to Configure a Cluster-to-Cluster Replication for Percona XtraDB Cluster or MariaDB Cluster


In a previous blog, we announced a new ClusterControl 1.7.4 feature called Cluster-to-Cluster Replication. It automates the entire process of setting up a DR cluster off your primary cluster, with replication in between. For more detailed information please refer to the above mentioned blog entry.

Now in this blog, we will take a look at how to configure this new feature for an existing cluster. For this task, we will assume you have ClusterControl installed and the Master Cluster was deployed using it.

Requirements for the Master Cluster

There are some requirements for the Master Cluster to make it work:

  • Percona XtraDB Cluster version 5.6.x and later, or MariaDB Galera Cluster version 10.x and later.
  • GTID enabled.
  • Binary Logging enabled on at least one database node.
  • The backup credentials must be the same across the Master Cluster and Slave Cluster. 

Preparing the Master Cluster

The Master Cluster needs to be prepared to use this new feature. It requires configuration from both ClusterControl and Database side.

ClusterControl Configuration

In the database node, check the backup user credentials stored in /etc/my.cnf.d/secrets-backup.cnf (For RedHat Based OS) or in /etc/mysql/secrets-backup.cnf (For Debian Based OS).

$ cat /etc/my.cnf.d/secrets-backup.cnf

# Security credentials for backup.

[mysqldump]

user=backupuser

password=cYj0GFBEdqdreZEl



[xtrabackup]

user=backupuser

password=cYj0GFBEdqdreZEl



[mysqld]

wsrep_sst_auth=backupuser:cYj0GFBEdqdreZEl

In the ClusterControl node, edit the /etc/cmon.d/cmon_ID.cnf configuration file (where ID is the Cluster ID Number) and make sure it contains the same credentials stored in secrets-backup.cnf.

$ cat /etc/cmon.d/cmon_8.cnf

backup_user=backupuser

backup_user_password=cYj0GFBEdqdreZEl

basedir=/usr

cdt_path=/

cluster_id=8

...

Any change on this file requires a cmon service restart:

$ service cmon restart

Check the database replication parameters to make sure that you have GTID and Binary Logging enabled.
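
A quick way to verify this from the database client is to run standard variable checks; note that the exact GTID variable differs between Percona XtraDB Cluster and MariaDB:

## Percona XtraDB Cluster

SHOW GLOBAL VARIABLES LIKE 'gtid_mode';

SHOW GLOBAL VARIABLES LIKE 'log_bin';

## MariaDB Galera Cluster

SHOW GLOBAL VARIABLES LIKE 'wsrep_gtid_mode';

SHOW GLOBAL VARIABLES LIKE 'log_bin';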

Database Configuration

In the database node, check the file /etc/my.cnf (For RedHat Based OS) or /etc/mysql/my.cnf (For Debian Based OS) to see the configuration related to the replication process.

Percona XtraDB:

$ cat /etc/my.cnf

# REPLICATION SPECIFIC

server_id=4002

binlog_format=ROW

log_bin = /var/lib/mysql-binlog/binlog

log_slave_updates = ON

gtid_mode = ON

enforce_gtid_consistency = true

relay_log = relay-log

expire_logs_days = 7

MariaDB Galera Cluster:

$ cat /etc/my.cnf

# REPLICATION SPECIFIC

server_id=9000

binlog_format=ROW

log_bin = /var/lib/mysql-binlog/binlog

log_slave_updates = ON

relay_log = relay-log

wsrep_gtid_domain_id=9000

wsrep_gtid_mode=ON

gtid_domain_id=9000

gtid_strict_mode=ON

gtid_ignore_duplicates=ON

expire_logs_days = 7

Instead of checking the configuration files, you can verify if it’s enabled in the ClusterControl UI. Go to ClusterControl -> Select Cluster -> Nodes. There you should have something like this:

The “Master” role added to the first node means that Binary Logging is enabled.

Enabling Binary Logging

If you don’t have the binary logging enabled, go to ClusterControl -> Select Cluster -> Nodes -> Node Actions -> Enable Binary Logging.

Then, you must specify the binary log retention, and the path to store it. You should also specify if you want ClusterControl to restart the database node after configuring it, or if you prefer to restart it by yourself.

Keep in mind that Enabling Binary Logging always requires a restart of the database service.

Creating the Slave Cluster from the ClusterControl GUI

To create a new Slave Cluster, go to ClusterControl -> Select Cluster -> Cluster Actions -> Create Slave Cluster.

The Slave Cluster can be created by streaming data from the current Master Cluster or by using an existing backup. 

In this section, you must also choose the master node of the current cluster from which the data will be replicated.

When you go to the next step, you must specify User, Key or Password and port to connect by SSH to your servers. You also need a name for your Slave Cluster and if you want ClusterControl to install the corresponding software and configurations for you.

After setting up the SSH access information, you must define the database vendor and version, datadir, database port, and the admin password. Make sure you use the same vendor/version and credentials as used by the Master Cluster. You can also specify which repository to use.

In this step, you need to add servers to the new Slave Cluster. For this task, you can enter both IP Address or Hostname of each database node.

You can monitor the status of the creation of your new Slave Cluster from the ClusterControl activity monitor. Once the task is finished, you can see the cluster in the main ClusterControl screen.

Managing Cluster-to-Cluster Replication Using the ClusterControl GUI

Now you have your Cluster-to-Cluster Replication up and running, there are different actions to perform on this topology using ClusterControl.

Configure Active-Active Clusters

As you can see, by default the Slave Cluster is set up in Read-Only mode. It’s possible to disable the Read-Only flag on the nodes one by one from the ClusterControl UI, but keep in mind that Active-Active clustering is recommended only if the applications touch disjoint data sets on either cluster, since MySQL/MariaDB doesn’t offer any Conflict Detection or Resolution.

To disable the Read-Only mode, go to ClusterControl -> Select Slave Cluster -> Nodes. In this section, select each node and use the Disable Read-Only option.
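
Under the hood this maps to the standard read_only flag; if you want to verify or change it manually on a given node (a sketch, outside of ClusterControl), you could run:

## Check the current setting

SHOW GLOBAL VARIABLES LIKE 'read_only';

## Disable Read-Only mode on this node

SET GLOBAL read_only = OFF;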

Rebuilding a Slave Cluster

To avoid inconsistencies, if you want to rebuild a Slave Cluster, it must be a Read-Only cluster, meaning that all nodes must be in Read-Only mode.

Go to ClusterControl -> Select Slave Cluster -> Nodes -> Choose the Node connected to the Master Cluster -> Node Actions -> Rebuild Replication Slave.

Topology Changes

If you have the following topology:

And if, for some reason, you want to change the replication node in the Master Cluster, it’s possible to change the master node used by the Slave Cluster to another master node in the Master Cluster.

To be considered as a master node, it must have the binary logging enabled.

Go to ClusterControl -> Select Slave Cluster -> Nodes -> Choose the Node connected to the Master Cluster -> Node Actions -> Stop Slave/Start Slave.

Stop/Start Replication Slave

You can stop and start replication slaves in an easy way using ClusterControl.

Go to ClusterControl -> Select Slave Cluster -> Nodes -> Choose the Node connected to the Master Cluster -> Node Actions -> Stop Slave/Start Slave.

Reset Replication Slave

Using this action, you can reset the replication process using RESET SLAVE or RESET SLAVE ALL. The difference between them is that RESET SLAVE doesn’t change any replication parameters such as master host, port and credentials, while RESET SLAVE ALL removes all of the replication configuration, so running it will destroy the Cluster-to-Cluster Replication link.

Before using this feature, you must stop the replication process (please refer to the previous feature).

Go to ClusterControl -> Select Slave Cluster -> Nodes -> Choose the Node connected to the Master Cluster -> Node Actions -> Reset Slave/Reset Slave All.
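
For reference, the underlying statements are the standard MariaDB/MySQL ones (you can also see ClusterControl executing them in the CLI job logs later in this post):

## Stop replication first

STOP SLAVE;

## RESET SLAVE keeps the master host, port and credentials

RESET SLAVE;

## RESET SLAVE ALL removes the entire replication configuration

RESET SLAVE ALL;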

Managing Cluster-to-Cluster Replication Using the ClusterControl CLI

In the previous section, you were able to see how to manage a Cluster-to-Cluster Replication using the ClusterControl UI. Now, let’s see how to do it by using the command line. 

Note: As we mentioned at the beginning of this blog, we will assume you have ClusterControl installed and the Master Cluster was deployed using it.

Create the Slave Cluster

First, let’s see an example command to create a Slave Cluster by using the ClusterControl CLI:

$ s9s cluster --create --cluster-name=Galera1rep --cluster-type=galera  --provider-version=10.4 --nodes="192.168.100.166;192.168.100.167;192.168.100.168"  --os-user=root --os-key-file=/root/.ssh/id_rsa --db-admin=root --db-admin-passwd=xxxxxxxx --vendor=mariadb --remote-cluster-id=11 --log

Now that you have the slave cluster creation process running, let’s look at each parameter used:

  • Cluster: To list and manipulate clusters.
  • Create: Create and install a new cluster.
  • Cluster-name: The name of the new Slave Cluster.
  • Cluster-type: The type of cluster to install.
  • Provider-version: The software version.
  • Nodes: List of the new nodes in the Slave Cluster.
  • Os-user: The user name for the SSH commands.
  • Os-key-file: The key file to use for SSH connection.
  • Db-admin: The database admin user name.
  • Db-admin-passwd: The password for the database admin.
  • Remote-cluster-id: Master Cluster ID for the Cluster-to-Cluster Replication.
  • Log: Wait and monitor job messages.

Using the --log flag, you will be able to see the logs in real time:

Verifying job parameters.

Checking ssh/sudo on 3 hosts.

All 3 hosts are accessible by SSH.

192.168.100.166: Checking if host already exists in another cluster.

192.168.100.167: Checking if host already exists in another cluster.

192.168.100.168: Checking if host already exists in another cluster.

192.168.100.157:3306: Binary logging is enabled.

192.168.100.158:3306: Binary logging is enabled.

Creating the cluster with the following:

wsrep_cluster_address = 'gcomm://192.168.100.166,192.168.100.167,192.168.100.168'

Calling job: setupServer(192.168.100.166).

192.168.100.166: Checking OS information.

…

Caching config files.

Job finished, all the nodes have been added successfully.
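
Once the job finishes, you can also confirm the new cluster (and note its cluster ID for the commands below) from the CLI:

$ s9s cluster --list --long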

Configure Active-Active Clusters

As you saw earlier, you can disable Read-Only mode in the new cluster by disabling it on each node, so let’s see how to do it from the command line.

$ s9s node --set-read-write --nodes="192.168.100.166" --cluster-id=16 --log

Let’s see each parameter:

  • Node: To handle nodes.
  • Set-read-write: Set the node to Read-Write mode.
  • Nodes: The node where to change it.
  • Cluster-id: The ID of the cluster in which the node is.

Then, you will see:

192.168.100.166:3306: Setting read_only=OFF.

Rebuilding a Slave Cluster

You can rebuild a Slave Cluster using the following command:

$ s9s replication --stage --master="192.168.100.157:3306" --slave="192.168.100.166:3306" --cluster-id=19 --remote-cluster-id=11 --log

The parameters are:

  • Replication: To monitor and control data replication.
  • Stage: Stage/Rebuild a Replication Slave.
  • Master: The replication master in the master cluster.
  • Slave: The replication slave in the slave cluster.
  • Cluster-id: The Slave Cluster ID.
  • Remote-cluster-id: The Master Cluster ID.
  • Log: Wait and monitor job messages.

The job log should be similar to this one:

Rebuild replication slave 192.168.100.166:3306 from master 192.168.100.157:3306.

Remote cluster id = 11

Shutting down Galera Cluster.

192.168.100.166:3306: Stopping node.

192.168.100.166:3306: Stopping mysqld (timeout=60, force stop after timeout=true).

192.168.100.166: Stopping MySQL service.

192.168.100.166: All processes stopped.

192.168.100.166:3306: Stopped node.

192.168.100.167:3306: Stopping node.

192.168.100.167:3306: Stopping mysqld (timeout=60, force stop after timeout=true).

192.168.100.167: Stopping MySQL service.

192.168.100.167: All processes stopped.

…

192.168.100.157:3306: Changing master to 192.168.100.166:3306.

192.168.100.157:3306: Changed master to 192.168.100.166:3306

192.168.100.157:3306: Starting slave.

192.168.100.157:3306: Collecting replication statistics.

192.168.100.157:3306: Started slave successfully.

192.168.100.166:3306: Starting node

Writing file '192.168.100.167:/etc/mysql/my.cnf'.

Writing file '192.168.100.167:/etc/mysql/secrets-backup.cnf'.

Writing file '192.168.100.168:/etc/mysql/my.cnf'.

Topology Changes

You can change your topology to use another node in the Master Cluster from which to replicate the data, so, for example, you can run:

$ s9s replication --failover --master="192.168.100.161:3306" --slave="192.168.100.163:3306" --cluster-id=10 --remote-cluster-id=9 --log

Let’s check the used parameters.

  • Replication: To monitor and control data replication.
  • Failover: Take the role of master from a failed/old master.
  • Master: The new replication master in the Master Cluster.
  • Slave: The replication slave in the Slave Cluster.
  • Cluster-id: The ID of the Slave Cluster.
  • Remote-Cluster-id: The ID of the Master Cluster.
  • Log: Wait and monitor job messages.

You will see this log:

192.168.100.161:3306 belongs to cluster id 9.

192.168.100.163:3306: Changing master to 192.168.100.161:3306

192.168.100.163:3306: My master is 192.168.100.160:3306.

192.168.100.161:3306: Sanity checking replication master '192.168.100.161:3306[cid:9]' to be used by '192.168.100.163[cid:139814070386698]'.

192.168.100.161:3306: Executing GRANT REPLICATION SLAVE ON *.* TO 'cmon_replication'@'192.168.100.163'.

Setting up link between  192.168.100.161:3306 and 192.168.100.163:3306

192.168.100.163:3306: Stopping slave.

192.168.100.163:3306: Successfully stopped slave.

192.168.100.163:3306: Setting up replication using MariaDB GTID: 192.168.100.161:3306->192.168.100.163:3306.

192.168.100.163:3306: Changing Master using master_use_gtid=slave_pos.

192.168.100.163:3306: Changing master to 192.168.100.161:3306.

192.168.100.163:3306: Changed master to 192.168.100.161:3306

192.168.100.163:3306: Starting slave.

192.168.100.163:3306: Collecting replication statistics.

192.168.100.163:3306: Started slave successfully.

192.168.100.160:3306: Flushing logs to update 'SHOW SLAVE HOSTS'

Stop/Start Replication Slave

You can stop replicating the data from the Master Cluster in this way:

$ s9s replication --stop --slave="192.168.100.166:3306" --cluster-id=19 --log

You will see this:

192.168.100.166:3306: Ensuring the datadir '/var/lib/mysql' exists and is owned by 'mysql'.

192.168.100.166:3306: Stopping slave.

192.168.100.166:3306: Successfully stopped slave.

And now, you can start it again:

$ s9s replication --start --slave="192.168.100.166:3306" --cluster-id=19 --log

So, you will see:

192.168.100.166:3306: Ensuring the datadir '/var/lib/mysql' exists and is owned by 'mysql'.

192.168.100.166:3306: Starting slave.

192.168.100.166:3306: Collecting replication statistics.

192.168.100.166:3306: Started slave successfully.

Now, let’s check the used parameters.

  • Replication: To monitor and control data replication.
  • Stop/Start: To make the slave stop/start replicating.
  • Slave: The replication slave node.
  • Cluster-id: The ID of the cluster in which the slave node is.
  • Log: Wait and monitor job messages.

Reset Replication Slave

Using this command, you can reset the replication process using RESET SLAVE or RESET SLAVE ALL. For more information about these statements, please check their usage in the previous ClusterControl UI section.

Before using this feature, you must stop the replication process (please refer to the previous command).

RESET SLAVE:

$ s9s replication --reset  --slave="192.168.100.166:3306" --cluster-id=19 --log

The log should be like:

192.168.100.166:3306: Ensuring the datadir '/var/lib/mysql' exists and is owned by 'mysql'.

192.168.100.166:3306: Executing 'RESET SLAVE'.

192.168.100.166:3306: Command 'RESET SLAVE' succeeded.

RESET SLAVE ALL:

$ s9s replication --reset --force  --slave="192.168.100.166:3306" --cluster-id=19 --log

And this log should be:

192.168.100.166:3306: Ensuring the datadir '/var/lib/mysql' exists and is owned by 'mysql'.

192.168.100.166:3306: Executing 'RESET SLAVE /*!50500 ALL */'.

192.168.100.166:3306: Command 'RESET SLAVE /*!50500 ALL */' succeeded.

Let’s see the used parameters for both RESET SLAVE and RESET SLAVE ALL.

  • Replication: To monitor and control data replication.
  • Reset: Reset the slave node.
  • Force: Using this flag you will use the RESET SLAVE ALL command on the slave node.
  • Slave: The replication slave node.
  • Cluster-id: The Slave Cluster ID.
  • Log: Wait and monitor job messages.

Conclusion

This new ClusterControl feature allows you to create Cluster-to-Cluster Replication quickly and manage it in an easy and user-friendly way. This environment improves your database/cluster topology and is useful for a Disaster Recovery Plan, a testing environment, and the other use cases mentioned in the overview blog.

Clustered Database Node Failure and its Impact on High Availability


A node crash can happen at any time; it is unavoidable in any real-world situation. Back when giant, standalone databases roamed the data world, each fall of such a titan created ripples of issues that spread across the world. Nowadays the data world has changed. Few of the titans survived; they were replaced by swarms of small, agile, clustered database instances that can adapt to ever-changing business requirements.

One example of such a database is Galera Cluster, which (typically) is deployed in the form of a cluster of nodes. What changes if one of the Galera nodes fails? How does this affect the availability of the cluster as a whole? In this blog post we will dig into this and explain the Galera High Availability basics.

Galera Cluster and Database High Availability

Galera Cluster is typically deployed in clusters of at least three nodes. This is due to the fact that Galera uses a quorum mechanism to ensure that the cluster state is clear to all of the nodes and that automated fault handling can happen. For that, three nodes are required - more than 50% of the nodes have to be alive after a node’s crash in order for the cluster to be able to operate.

Galera Cluster

Let’s assume you have a three node Galera Cluster, just as in the diagram above. If one node crashes, the situation changes into the following:

Node “3” is off, but nodes “1” and “2” remain, which constitute 66% of all nodes in the cluster. This means those two nodes can continue to operate and form a cluster. Node “3” (if it happens to be alive but cannot connect to the other nodes in the cluster) accounts for only 33% of the nodes in the cluster, thus it will cease to operate.

We hope this is now clear: three nodes are the minimum. With two nodes, each would be 50% of the nodes in the cluster, thus neither would have a majority - such a cluster does not provide HA. What if we were to add one more node?

Such a setup also allows for one node to fail:

In such a case we have three (75%) nodes up and running, which is the majority. What would happen if two nodes fail?

Two nodes are up, two are down. Only 50% of the nodes are available, so there is no majority and the cluster has to cease its operations. The minimal cluster size to support the failure of two nodes is five nodes:

In the case above, two nodes are off and three remain, which makes 60% of the nodes available, thus the majority is reached and the cluster can operate.

To sum up, three nodes are the minimum cluster size to allow for one node to fail. The cluster should have an odd number of nodes - this is not a hard requirement, but as we have seen, increasing the cluster size from three to four did not make any difference to the high availability - still only one failure at a time is allowed. To make the cluster more resilient and support two simultaneous node failures, the cluster size has to be increased from three to five. If you want to increase the cluster’s ability to handle failures even further, you have to add another two nodes.
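
You can check these numbers on any node at any time using the standard Galera status counters, for example:

## Number of nodes currently joined to the cluster

SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';

## 'Primary' means this node is part of the component that holds quorum

SHOW GLOBAL STATUS LIKE 'wsrep_cluster_status';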

Impact of Database Node Failure on the Cluster Load

In the previous section we discussed the basic math of high availability in Galera Cluster: one node can be off in a three node cluster, two can be off in a five node cluster. This is the baseline requirement for Galera.

You also have to keep in mind other aspects. We’ll take a quick look at them now. For starters, the load on the cluster.

Let’s assume all nodes have been created equal: same configuration, same hardware, and they can handle the same load. Putting the load on one node only doesn’t make much sense cost-wise on a three node cluster (not to mention five node clusters or larger). You can safely expect that if you invest in three or five Galera nodes, you want to utilize all of them. This is quite easy - load balancers can distribute the load across all Galera nodes for you. You can send the writes to one node and balance reads across all nodes in the cluster. This poses an additional threat you have to keep in mind: how will the load look if one node is taken out of the cluster? Let’s take a look at the following case of a five node cluster.

We have five nodes, each handling 50% load. This is quite OK - the nodes are fairly loaded, yet they still have some capacity to accommodate unexpected spikes in the workload. As we discussed, such a cluster can handle up to two node failures. OK, let’s see how this would look:

Two nodes are down - that’s OK, Galera can handle it. 100% of the load has to be redistributed across the three remaining nodes. This makes a total of 250% of the load distributed across three nodes (5 nodes x 50% = 250%, divided by 3). As a result, each of them will be running at roughly 83% of its capacity. This may be acceptable, but 83% load on average means that response time will increase, queries will take longer, and any spike in the workload will most likely cause serious issues.

Will our five node cluster (with 50% utilization of all nodes) really be able to handle the failure of two nodes? Well, not really, no. It will definitely not be as performant as the cluster before the crashes. It may survive, but its availability may be seriously affected by temporary spikes in the workload.

You also have to keep in mind one more thing - failed nodes will have to be rebuilt. Galera has an internal mechanism that allows it to provision nodes which rejoin the cluster after a crash. It can either be IST, incremental state transfer, when one of the remaining nodes has the required data in its gcache. If not, a full data transfer will have to happen - all of the data will be transferred from one node (the donor) to the joining node. This process is called SST - state snapshot transfer. Both IST and SST require some resources: data has to be read from disk on the donor and then transferred over the network. IST is more lightweight; SST is much heavier as all of the data has to be read from disk on the donor. No matter which method is used, some additional CPU cycles will be burnt. Will the 17% of free resources on the donor be enough to run the data transfer? It will depend on the hardware. Maybe. Maybe not. What doesn’t help is that most proxies, by default, remove the donor node from the pool of nodes to send traffic to. This makes perfect sense - a node in the “Donor/Desync” state may lag behind the rest of the cluster.

When using Galera, which is a virtually synchronous cluster, we don’t expect nodes to lag, and lag could be a serious issue for the application. On the other hand, in our case, removing the donor from the pool of load-balanced nodes ensures that the cluster will be overloaded (250% of the load distributed across only two nodes means 125% of a node’s capacity, which is, well, more than it can handle). This would leave the cluster definitely not available at all.
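
Whether a rejoining node can take the lighter IST path depends, among other things, on how much write-set history the remaining nodes keep in gcache; its size is set through the provider options (the value below is an example only):

## /etc/my.cnf - write-set cache used for IST

[mysqld]

wsrep_provider_options="gcache.size=2G"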

Conclusion

As you can see, high availability in the cluster is not just a matter of quorum calculation. You have to account for other factors like the workload, how it changes over time, and the handling of state transfers. When in doubt, test it yourself. We hope this short blog post helped you understand that high availability is quite a tricky subject, even when discussed based on only two variables - the number of nodes and each node’s capacity. Understanding this should help you design better and more reliable HA environments with Galera Cluster.
