
Schema changes in Galera cluster for MySQL and MariaDB - how to avoid RSU locks


Working as a MySQL DBA, you will often have to deal with schema changes. Changes to production databases are not popular among DBAs, but they are necessary when applications place new requirements on the databases. If you manage a Galera Cluster, this is even more challenging than usual - the default method of doing schema changes (Total Order Isolation) locks the whole cluster for the duration of the alter. There are two more ways to go, though - online schema change and Rolling Schema Upgrade.

A popular method of performing schema changes, using pt-online-schema-change, has its own limitations. It can be tricky if your workload consists of long-running transactions, or if the workload is so highly concurrent that the tool cannot acquire the metadata locks needed to create triggers. Triggers themselves can become a hard stop if you already have triggers on the table you need to alter (unless you use a Galera Cluster based on MySQL 5.7). Foreign keys may also become a serious issue to deal with. You can find more details on those limitations in this Become a MySQL DBA blog post. A new alternative to pt-online-schema-change arrived recently - gh-ost, created by GitHub - but it’s still a new tool and, unless you have already evaluated it, you may have to stick to pt-online-schema-change for the time being.

This leaves Rolling Schema Upgrade as the only remaining method to execute schema changes where pt-online-schema-change fails or cannot be used. Theoretically speaking, it is a non-blocking operation - you run:

SET SESSION wsrep_OSU_method=RSU;

And the rest should happen automatically once you start the DDL - the node should be desynced and the alter should not impact the rest of the cluster.

Let’s check how it behaves in real life, in two scenarios. First, we have a single connection to the Galera cluster. We don’t scale out reads, we just use Galera as a way to improve the availability of our application. We will simulate it by running a sysbench workload on one of the Galera cluster nodes. We are also going to execute RSU on this node. A screenshot with the result of this operation can be found below.

In the bottom right window you can see the output of sysbench - our application. In the top window there’s SHOW PROCESSLIST output from the time the alter was running. As you can see, our application stalled for a couple of seconds - for the duration of the alter command (visible in the bottom left window). Graphs in ClusterControl show the stalled queries in detail:

You may say, and rightly so, that this is expected - if you write to a node where a schema change is being performed, those writes have to wait.

What if we use some sort of round-robin routing of connections? This can be done in the application (just define a pool of hosts to connect to), at the connector level, or by using a proxy. Results are in the screenshot below.

As you can see, here we also have locked threads - those which were routed to the host where the RSU was in progress. The rest of the threads worked fine, but some of the connections stalled for the duration of the alter. Please take a closer look at the length of the alter (11.78s) and the maximum response time (12.63s). Some of the users experienced significant performance degradation.

One question you may want to ask: starting an ALTER in RSU desyncs the Galera node, and proxies like ProxySQL, MaxScale or HAProxy (when used with the clustercheck script) should detect this and redirect traffic away from the desynced host - so why are commits still getting locked? Unfortunately, there’s a high probability that some transactions will already be in progress when the ALTER starts, and those will get locked.

How to avoid the problem? You need to use a proxy. It’s not enough on its own, as we have just shown. But as long as your proxy removes desynced hosts from rotation, you can add this step to the RSU process and make sure a node is desynced and not accepting any traffic before you actually start your DDL.

mysql> SET GLOBAL wsrep_desync=1; SELECT SLEEP(20); ALTER TABLE sbtest.sbtest3 DROP INDEX idx_pad; ALTER TABLE sbtest.sbtest3 ADD KEY idx_pad (pad); SET GLOBAL wsrep_desync=0;

This should work with all proxies deployed through ClusterControl - HAProxy and MaxScale. ProxySQL will also handle RSU executed in this way correctly.
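If you run schema changes regularly, you can wrap those steps in a small script that also verifies the node has really been desynced before the DDL starts. Below is a minimal sketch, assuming a local mysql client with credentials available (e.g. in ~/.my.cnf) and the same table as in the example above - adjust the sleep to match your proxy’s health check interval:

#!/bin/bash
# take the node out of rotation - the proxy health check will see it as desynced
mysql -e "SET GLOBAL wsrep_desync=1"
# give the proxy time to notice and let in-flight transactions drain
sleep 20
# sanity check - the node should now report Donor/Desynced
mysql -e "SHOW STATUS LIKE 'wsrep_local_state_comment'"
# run the schema change locally under RSU
mysql -e "SET SESSION wsrep_OSU_method=RSU; ALTER TABLE sbtest.sbtest3 DROP INDEX idx_pad; ALTER TABLE sbtest.sbtest3 ADD KEY idx_pad (pad)"
# bring the node back into rotation
mysql -e "SET GLOBAL wsrep_desync=0"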

Another method, which can be used for HAProxy, would be to disable a backend node by setting it to maintenance state. You can do this from ClusterControl:

Make sure you ticked the correct node, confirm that’s what you want to do, and in a couple of minutes you should be good to start RSU on that node. The host will be highlighted in brown:

It’s still better to be on the safe side and verify using SHOW PROCESSLIST (also available in ClusterControl -> Query Monitor -> Running Queries) that indeed no traffic is hitting this node. Once you are done running your DDLs, you can enable the backend node again in the HAProxy tab in ClusterControl and traffic will be routed to this node again.
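If you prefer to double-check from the SQL console, one illustrative way is to list client threads while filtering out Galera’s internal ‘system user’ threads:

mysql> SELECT USER, HOST, DB, COMMAND FROM information_schema.PROCESSLIST WHERE USER NOT IN ('system user', 'root');

If nothing except your own connection shows up, no application traffic is reaching the node.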

As you can see, even with load balancers in use, running RSU may seriously impact the performance and availability of your database. Most likely it’ll affect just a small subset of users (a few percent of connections), but it’s still not something we’d like to see. Using properly configured proxies (like those deployed by ClusterControl), and ensuring you first desync the node and only then execute RSU, will be enough to avoid this type of problem.


Planets9s - vidaXL chooses ClusterControl, scaling & sharding MongoDB & more!


Welcome to this week’s Planets9s, covering all the latest resources and technologies we create around automation and management of open source database infrastructures.

vidaXL chooses ClusterControl to manage its MongoDB & MySQL databases

This week we’re happy to announce that we’re helping our customer vidaXL, a global e-commerce platform, compete with eBay and Amazon - and in doing so, keep its tills ringing. In their own words: “Our back-end is reliant on different MySQL & MongoDB databases to tackle different tasks. Using several different tools, rather than a one-stop shop, was detrimental to our productivity. Severalnines’ ClusterControl is that “one-stop shop” and we haven’t looked back. It’s an awesome solution like no other.”

Read the announcement

Live webinar next Tuesday on scaling & sharding MongoDB

Join us on Tuesday next week, November 15th, for this webinar during which we’ll discuss how to plan your MongoDB scaling strategy up front. We’ll cover topics such as the differences between read and write scaling with MongoDB, read scaling considerations and read preference; and we’ll look at how sharding works in MongoDB and how to scale and shard MongoDB using ClusterControl. “See” you there!

Sign up for the webinar

HA on a Shoestring Budget - Deploying a Minimal Two Node MySQL Galera Cluster

As we regularly get questions on how to set up a Galera cluster with just 2 nodes, we published this handy blog post on why and how to go about that. The general consensus is that users should have at least 3 Galera nodes to avoid network partitioning. Yet there are some valid reasons for considering a 2 node deployment, e.g., if you want to achieve database high availability but have a limited budget to spend on a third database node. Or perhaps you are running Galera in a development/sandbox environment and prefer a minimal setup. Whatever the reasoning, here’s a handy quick-guide on how to go about it.

Read the blog

That’s it for this week! Feel free to share these resources with your colleagues and follow us on our social media channels.

Have a good end of the week,

Jean-Jérôme Schmidt
Planets9s Editor
Severalnines AB

MySQL on Docker: Deploy a Homogeneous Galera Cluster with etcd


In the previous blog post, we looked into Docker’s multi-host networking capabilities with the native network driver and Calico. In this blog post, our journey to make Galera Cluster run smoothly on Docker containers continues. Deploying Galera Cluster on Docker is tricky when using orchestration tools. Due to the nature of the scheduler in container orchestration tools and the assumption of homogeneous images, the scheduler will just fire up the respective containers according to the run command and leave the bootstrapping process to the container’s entrypoint logic when starting up. And you do not want to do that for Galera - starting all nodes at once means each node will form a “1-node cluster” and you’ll end up with a disjointed system.

“Homogeneousing” Galera Cluster

That might be a new word, but it holds true for stateful services like MySQL Replication and Galera Cluster. As one might know, the bootstrapping process for Galera Cluster usually requires manual intervention, where you have to decide which node is the most advanced one to bootstrap from. There is nothing wrong with this step; you need to be aware of the state of each database node before deciding on the sequence in which to start them up. Galera Cluster is a distributed system, and its redundancy model works like that.

However, container orchestration tools like Docker Engine Swarm Mode and Kubernetes are not aware of the redundancy model of Galera. The orchestration tool presumes containers are independent from each other. If they are dependent, then you have to have an external service that monitors the state. The best way to achieve this is to use a key/value store as a reference point for other containers when starting up.

This is where a service discovery tool like etcd comes into the picture. The basic idea is that each node should report its state periodically to the service. This simplifies the decision process when starting up. For Galera Cluster, a node that has wsrep_local_state_comment equal to Synced shall be used as a reference node when constructing the Galera communication address (gcomm) during joining. Otherwise, the most updated node has to be bootstrapped first.

Etcd has a very nice feature called TTL, where you can expire a key after a certain amount of time. This is useful to determine the state of a node, as the key/value entry only exists if an alive node reports to it. As a result, the nodes won’t have to connect to each other to determine state (which is very troublesome in a dynamic environment) when forming a cluster. For example, consider the following keys:

    {
        "createdIndex": 10074,
        "expiration": "2016-11-29T10:55:35.218496083Z",
        "key": "/galera/my_wsrep_cluster/10.255.0.7/wsrep_last_committed",
        "modifiedIndex": 10074,
        "ttl": 10,
        "value": "2881"
    },
    {
        "createdIndex": 10072,
        "expiration": "2016-11-29T10:55:34.650574629Z",
        "key": "/galera/my_wsrep_cluster/10.255.0.7/wsrep_local_state_comment",
        "modifiedIndex": 10072,
        "ttl": 10,
        "value": "Synced"
    }

After 10 seconds (the ttl value), those keys will be removed from the entry. Basically, all nodes should report to etcd periodically with an expiring key. Each container should report every N seconds while it’s alive (wsrep_local_state_comment=Synced and wsrep_last_committed=#value) via a background process. If a container is down, it will no longer send updates to etcd, thus the keys are removed after expiration. This simply indicates that the node was registered but is no longer synced with the cluster. It will be skipped when constructing the Galera communication address at a later point.
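To make this more concrete, here is a minimal sketch of such a reporting loop (this is only an illustration of the concept, not the actual report_status.sh shipped with the image; it assumes the etcd v2 keys API is reachable on one of the discovery hosts and that a local mysql client can query the node):

CLUSTER_NAME=my_wsrep_cluster
ETCD_HOST=192.168.55.111:2379
NODE_IP=$(hostname -i | awk '{print $1}')
while true; do
    STATE=$(mysql -N -e "SHOW STATUS LIKE 'wsrep_local_state_comment'" | awk '{print $2}')
    SEQNO=$(mysql -N -e "SHOW STATUS LIKE 'wsrep_last_committed'" | awk '{print $2}')
    # publish both values with a 10 second TTL - they expire if the node stops reporting
    curl -s -X PUT "http://$ETCD_HOST/v2/keys/galera/$CLUSTER_NAME/$NODE_IP/wsrep_local_state_comment" -d value="$STATE" -d ttl=10 > /dev/null
    curl -s -X PUT "http://$ETCD_HOST/v2/keys/galera/$CLUSTER_NAME/$NODE_IP/wsrep_last_committed" -d value="$SEQNO" -d ttl=10 > /dev/null
    sleep 5
done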

The overall flow of joining procedure is illustrated in the following flow chart:

We have built a Docker image that follows the above approach. It is specifically built for running Galera Cluster using Docker’s orchestration tools. It is available at Docker Hub and our Github repository. It requires an etcd cluster as the discovery service (multiple etcd hosts are supported) and is based on Percona XtraDB Cluster 5.6. The image includes Percona Xtrabackup, jq (a JSON processor) and a shell script tailored for Galera health checks called report_status.sh.

You are welcome to fork or contribute to the project. Any bugs can be reported via Github or via our support page.

Deploying etcd Cluster

etcd is a distributed key value store that provides a simple and efficient way to store data across a cluster of machines. It’s open-source and available on GitHub. It provides shared configuration and service discovery. A simple use-case is to store database connection details or feature flags in etcd as key value pairs. It gracefully handles leader elections during network partitions and will tolerate machine failures, including the leader.
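As a quick illustration of the key value model (the key and value below are hypothetical, just to show the syntax of the etcd v2 command line client):

$ etcdctl set /services/myapp/db_host 192.168.55.111
192.168.55.111
$ etcdctl get /services/myapp/db_host
192.168.55.111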

Since etcd is the brain of the setup, we are going to deploy it as a cluster daemon, on three nodes, instead of using containers. In this example, we are going to install etcd on each of the Docker hosts and form a three-node etcd cluster for better availability.

We used CentOS 7 as the operating system, with Docker v1.12.3, build 6b644ec. The deployment steps in this blog post are basically similar to the one used in our previous blog post.

  1. Install etcd packages:

    $ yum install etcd
  2. Modify the configuration file accordingly on each of the Docker hosts:

    $ vim /etc/etcd/etcd.conf

    For docker1 with IP address 192.168.55.111:

    ETCD_NAME=etcd1
    ETCD_DATA_DIR="/var/lib/etcd/default.etcd"
    ETCD_LISTEN_PEER_URLS="http://0.0.0.0:2380"
    ETCD_LISTEN_CLIENT_URLS="http://0.0.0.0:2379"
    ETCD_INITIAL_ADVERTISE_PEER_URLS="http://192.168.55.111:2380"
    ETCD_INITIAL_CLUSTER="etcd1=http://192.168.55.111:2380,etcd2=http://192.168.55.112:2380,etcd3=http://192.168.55.113:2380"
    ETCD_INITIAL_CLUSTER_STATE="new"
    ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster-1"
    ETCD_ADVERTISE_CLIENT_URLS="http://0.0.0.0:2379"

    For docker2 with IP address 192.168.55.112:

    ETCD_NAME=etcd2
    ETCD_DATA_DIR="/var/lib/etcd/default.etcd"
    ETCD_LISTEN_PEER_URLS="http://0.0.0.0:2380"
    ETCD_LISTEN_CLIENT_URLS="http://0.0.0.0:2379"
    ETCD_INITIAL_ADVERTISE_PEER_URLS="http://192.168.55.112:2380"
    ETCD_INITIAL_CLUSTER="etcd1=http://192.168.55.111:2380,etcd2=http://192.168.55.112:2380,etcd3=http://192.168.55.113:2380"
    ETCD_INITIAL_CLUSTER_STATE="new"
    ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster-1"
    ETCD_ADVERTISE_CLIENT_URLS="http://0.0.0.0:2379"

    For docker3 with IP address 192.168.55.113:

    ETCD_NAME=etcd3
    ETCD_DATA_DIR="/var/lib/etcd/default.etcd"
    ETCD_LISTEN_PEER_URLS="http://0.0.0.0:2380"
    ETCD_LISTEN_CLIENT_URLS="http://0.0.0.0:2379"
    ETCD_INITIAL_ADVERTISE_PEER_URLS="http://192.168.55.113:2380"
    ETCD_INITIAL_CLUSTER="etcd1=http://192.168.55.111:2380,etcd2=http://192.168.55.112:2380,etcd3=http://192.168.55.113:2380"
    ETCD_INITIAL_CLUSTER_STATE="new"
    ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster-1"
    ETCD_ADVERTISE_CLIENT_URLS="http://0.0.0.0:2379"
  3. Start the service on docker1, followed by docker2 and docker3:

    $ systemctl enable etcd
    $ systemctl start etcd
  4. Verify our cluster status using etcdctl:

    [docker3 ]$ etcdctl cluster-health
    member 2f8ec0a21c11c189 is healthy: got healthy result from http://0.0.0.0:2379
    member 589a7883a7ee56ec is healthy: got healthy result from http://0.0.0.0:2379
    member fcacfa3f23575abe is healthy: got healthy result from http://0.0.0.0:2379
    cluster is healthy

That’s it. Our etcd is now running as a cluster on three nodes. The below illustrates our architecture:

Deploying Galera Cluster

A minimum of 3 containers is recommended for a high availability setup. Thus, we are going to create 3 replicas to start with; the service can be scaled up and down afterwards. Running standalone is also possible with the standard "docker run" command, as shown further down.

Before we start, it’s a good idea to remove any sort of keys related to our cluster name in etcd:

$ etcdctl rm /galera/my_wsrep_cluster --recursive
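The service definitions below attach the containers to an overlay network called galera-net. If it does not exist yet in your Swarm (assuming Swarm mode is already initialized), create it first:

$ docker network create --driver overlay galera-net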

Ephemeral Storage

This is a recommended way if you plan on scaling the cluster out on more nodes (or scale back by removing nodes). To create a three-node Galera Cluster with ephemeral storage (MySQL datadir will be lost if the container is removed), you can use the following command:

$ docker service create \
--name mysql-galera \
--replicas 3 \
-p 3306:3306 \
--network galera-net \
--env MYSQL_ROOT_PASSWORD=mypassword \
--env DISCOVERY_SERVICE=192.168.55.111:2379,192.168.55.112:2379,192.168.55.113:2379 \
--env XTRABACKUP_PASSWORD=mypassword \
--env CLUSTER_NAME=my_wsrep_cluster \
severalnines/pxc56

Persistent Storage

To create a three-node Galera Cluster with persistent storage (MySQL datadir persists if the container is removed), add the mount option with type=volume:

$ docker service create \
--name mysql-galera \
--replicas 3 \
-p 3306:3306 \
--network galera-net \
--mount type=volume,source=galera-vol,destination=/var/lib/mysql \
--env MYSQL_ROOT_PASSWORD=mypassword \
--env DISCOVERY_SERVICE=192.168.55.111:2379,192.168.55.112:2379,192.168.55.113:2379 \
--env XTRABACKUP_PASSWORD=mypassword \
--env CLUSTER_NAME=my_wsrep_cluster \
severalnines/pxc56

Custom my.cnf

If you would like to include a customized MySQL configuration file, create a directory on the physical host beforehand:

$ mkdir -p /mnt/docker/mysql-config # repeat on all Docker hosts

Then, use the mount option with “type=bind” to map the path into the container. In the following example, the custom my.cnf is located at /mnt/docker/mysql-config/my-custom.cnf on each Docker host:

$ docker service create \
--name mysql-galera \
--replicas 3 \
-p 3306:3306 \
--network galera-net \
--mount type=volume,source=galera-vol,destination=/var/lib/mysql \
--mount type=bind,src=/mnt/docker/mysql-config,dst=/etc/my.cnf.d \
--env MYSQL_ROOT_PASSWORD=mypassword \
--env DISCOVERY_SERVICE=192.168.55.111:2379,192.168.55.112:2379,192.168.55.113:2379 \
--env XTRABACKUP_PASSWORD=mypassword \
--env CLUSTER_NAME=my_wsrep_cluster \
severalnines/pxc56

Wait for a couple of minutes and verify the service is running (CURRENT STATE = Running):

$ docker service ps mysql-galera
ID                         NAME            IMAGE               NODE           DESIRED STATE  CURRENT STATE           ERROR
2vw40cavru9w4crr4d2fg83j4  mysql-galera.1  severalnines/pxc56  docker1.local  Running        Running 5 minutes ago
1cw6jeyb966326xu68lsjqoe1  mysql-galera.2  severalnines/pxc56  docker3.local  Running        Running 12 seconds ago
753x1edjlspqxmte96f7pzxs1  mysql-galera.3  severalnines/pxc56  docker2.local  Running        Running 5 seconds ago

External applications/clients can connect to any Docker host IP address or hostname on port 3306; requests will be load balanced between the Galera containers. The connection gets NATed to a Virtual IP address for each service "task" (container, in this case) using the Linux kernel's built-in load balancing functionality, IPVS. If the application containers reside on the same overlay network (galera-net), use the assigned virtual IP address instead. You can retrieve it using the inspect option:

$ docker service inspect mysql-galera -f "{{ .Endpoint.VirtualIPs }}"
[{89n5idmdcswqqha7wcswbn6pw 10.255.0.2/16} {1ufbr56pyhhbkbgtgsfy9xkww 10.0.0.2/24}]

Our architecture is now looking like this:

As a side note, you can also run Galera in standalone mode. This is probably useful for testing purposes like backup and restore, testing the impact of queries and so on. To run it just like a standalone MySQL container, use the standard docker run command:

$ docker run -d \
-p 3306 \
--name=galera-single \
-e MYSQL_ROOT_PASSWORD=mypassword \
-e DISCOVERY_SERVICE=192.168.55.111:2379,192.168.55.112:2379,192.168.55.113:2379 \
-e CLUSTER_NAME=my_wsrep_cluster \
-e XTRABACKUP_PASSWORD=mypassword \
severalnines/pxc56

Scaling the Cluster

There are two ways you can do scaling:

  1. Use “docker service scale” command.
  2. Create a new service with same CLUSTER_NAME using “docker service create” command.

Docker’s “scale” Command

The scale command enables you to scale one or more services either up or down to the desired number of replicas. The command will return immediately, but the actual scaling of the service may take some time. Galera needs to run with an odd number of nodes to avoid network partitioning.

So a good number to scale to would be 5, 7 and so on:

$ docker service scale mysql-galera=5

Wait for a couple of minutes to let the new containers reach the desired state. Then, verify the running service:

$ docker service ls
ID            NAME          REPLICAS  IMAGE               COMMAND
bwvwjg248i9u  mysql-galera  5/5       severalnines/pxc56

One drawback of using this method is that you have to use ephemeral storage because Docker will likely schedule the new containers on a Docker host that already has a Galera container running. If this happens, the volume will overlap the existing Galera containers’ volume. If you would like to use persistent storage and scale in Docker Swarm mode, you should create another new service with a couple of different options, as described in the next section.

At this point, our architecture looks like this:

Another Service with Same Cluster Name

Another way to scale is to create another service with the same CLUSTER_NAME and network. However, you can’t really use the exact same command as the first one due to the following reasons:

  • The service name should be unique.
  • The port mapping must be other than 3306, since this port has been assigned to the mysql-galera service.
  • The volume name should be different to distinguish them from the existing Galera containers.

A benefit of doing this is that you will get another virtual IP address assigned to the “scaled” service. This gives your application or client an additional option to connect to the “scaled” IP address for various tasks, e.g. performing a full backup in desync mode, a database consistency check or server auditing.

The following example shows the command to add two more nodes to the cluster in a new service called mysql-galera-scale:

$ docker service create \
--name mysql-galera-scale \
--replicas 2 \
-p 3307:3306 \
--network galera-net \
--mount type=volume,source=galera-scale-vol,destination=/var/lib/mysql \
--env MYSQL_ROOT_PASSWORD=mypassword \
--env DISCOVERY_SERVICE=192.168.55.111:2379,192.168.55.112:2379,192.168.55.113:2379 \
--env XTRABACKUP_PASSWORD=mypassword \
--env CLUSTER_NAME=my_wsrep_cluster \
severalnines/pxc56

If we look into the service list, here is what we see:

$ docker service ls
ID            NAME                REPLICAS  IMAGE               COMMAND
0ii5bedv15dh  mysql-galera-scale  2/2       severalnines/pxc56
71pyjdhfg9js  mysql-galera        3/3       severalnines/pxc56

And when you look at the cluster size on one of the containers, you should get 5:

[root@docker1 ~]# docker exec -it $(docker ps | grep mysql-galera | awk {'print $1'}) mysql -uroot -pmypassword -e 'show status like "wsrep_cluster_size"'
Warning: Using a password on the command line interface can be insecure.
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 5     |
+--------------------+-------+

At this point, our architecture looks like this:

To get a clearer view of the process, we can simply look at the MySQL error log file (located under Docker’s data volume) on one of the running containers, for example:

$ tail -f /var/lib/docker/volumes/galera-vol/_data/error.log

Scale Down

Scaling down is simple. Just reduce the number of replicas, or remove the service that holds the minority of the containers, to ensure that Galera stays in quorum. For example, if you have fired up two groups of nodes with 3 + 2 containers, reaching a total of 5, the majority needs to survive, thus you can only remove the second group with 2 containers. If you have three groups with 3 + 2 + 2 containers, you can lose a maximum of 3 containers. This is due to the fact that the Docker Swarm scheduler simply terminates and removes the containers corresponding to the service. This makes Galera think that nodes are failing, as they are not shut down gracefully.

If you scaled up using “docker service scale” command, you should scale down using the same method by reducing the number of replicas. To scale it down, simply do:

$ docker service scale mysql-galera=3

Otherwise, if you chose to create another service to scale up, then simply remove the respective service to scale down:

$ docker service rm mysql-galera-scale
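Whichever way you scale down, it’s worth confirming afterwards that the remaining containers still form a Primary component. A quick, illustrative check on any of the running containers:

$ docker exec -it $(docker ps | grep mysql-galera | awk {'print $1'}) mysql -uroot -pmypassword -e 'show status like "wsrep_cluster_status"'

The value should be Primary; anything else means the cluster lost quorum and you will need the manual bootstrap procedure described under Known Limitations below.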

Known Limitations

There will be no automatic recovery if a split-brain happens (where all nodes end up in the Non-Primary state). This is because the MySQL service is still running, yet it will refuse to serve any data and will return an error to the client. Docker has no capability to detect this, since all it cares about is the foreground MySQL process, which is not terminated, killed or stopped. Automating this process is risky, especially if the service discovery is co-located with the Docker host (etcd would also lose contact with the other members). And even if the service discovery is healthy somewhere else, it is probably unreachable from the Galera containers’ perspective, preventing them from seeing each other’s status correctly during the glitch.

In this case, you will need to intervene manually.

Choose the most advanced node to bootstrap and then run the following command to promote the node as Primary (other nodes shall then rejoin automatically if the network recovers):

$ docker exec -it [container ID] mysql -uroot -pyoursecret -e 'set global wsrep_provider_option="pc.bootstrap=1"'
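To determine which node is the most advanced one before running the command above, compare the wsrep_last_committed value on each container (or simply look at the values the containers last reported to etcd) - the node with the highest value is the one to bootstrap:

$ docker exec -it [container ID] mysql -uroot -pyoursecret -e 'show status like "wsrep_last_committed"'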

Also, there is no automatic cleanup for the discovery service registry. You can remove all entries using either of the following commands (assuming the CLUSTER_NAME is my_wsrep_cluster):

$ curl http://192.168.55.111:2379/v2/keys/galera/my_wsrep_cluster?recursive=true -XDELETE # or
$ etcdctl rm /galera/my_wsrep_cluster --recursive

Conclusion

This combination of technologies opens a door for a more reliable database setup in the Docker ecosystem. Working with service discovery to store state makes it possible to have stateful containers to achieve a homogeneous setup.

In the next blog post, we are going to look into how to manage Galera Cluster on Docker.

Planets9s - MySQL on Docker: Building the Container Images, Monitoring MongoDB and more


Welcome to this week’s Planets9s, covering all the latest resources and technologies we create around automation and management of open source database infrastructures.

MySQL on Docker: Building the Container Image

Building a Docker image for MySQL is essential if you’d like to customize MySQL to suit your needs. In this second post of our ‘MySQL on Docker’ series, we show you two ways to build your own MySQL Docker image - changing a base image and committing it, or using a Dockerfile. We show you how to extend the Docker team’s MySQL image, and add Percona XtraBackup to it.

Read the blog

Sign up for our webinar on Monitoring MongoDB - Tuesday July 12th

MongoDB offers many metrics through various status overviews or commands, and for a MySQL DBA, it might be a little unfamiliar ground to get started with. In this webinar on July 12th, we’ll discuss the most important ones and describe them in ordinary plain MySQL DBA language. We’ll have a look at the open source tools available for MongoDB monitoring and trending. And we’ll show you how to leverage ClusterControl’s MongoDB metrics, dashboards, custom alerting and other features to track and optimize the performance of your system.

Sign up for the webinar

StreamAMG chooses ClusterControl to support its online European football streaming

This week we’re delighted to announce a new ClusterControl customer, StreamAMG (Advanced Media Group), Europe’s largest player in online video solutions, helping football teams such as Liverpool FC, Aston Villa, Sunderland AFC and the BBC keep fans watching from across the world. StreamAMG replaced its previous environment, based on a master-slave replication topology, with a multi-master Galera Cluster; and Severalnines’ ClusterControl platform was applied to automate operational tasks and provide visibility of uptime and performance through monitoring capabilities.

Read the story

That’s it for this week! Feel free to share these resources with your colleagues and follow us on our social media channels.

Have a good end of the week,

Jean-Jérôme Schmidt
Planets9s Editor
Severalnines AB

How to set up read-write split in Galera Cluster using ProxySQL


Edited on Sep 12, 2016 to correct the description of how ProxySQL handles session variables. Many thanks to Francisco Miguel for pointing this out.


ProxySQL is becoming more and more popular as an SQL-aware load balancer for MySQL and MariaDB. In previous blog posts, we covered the installation of ProxySQL and its configuration in a MySQL replication environment. We’ve also covered how to set up ProxySQL to perform failovers executed from ClusterControl. At that time, Galera support in ProxySQL was a bit limited - you could configure Galera Cluster and split traffic across all nodes, but there was no easy way to implement read-write split of your traffic. The only way to do that was to create a daemon which would monitor Galera state and update the weights of the backend servers defined in ProxySQL - a much more complex task than writing a small bash script.

In one of the recent ProxySQL releases, a very important feature was added - a scheduler, which allows you to execute external scripts from within ProxySQL, even as often as every millisecond (well, as long as your script can complete within this time frame). This feature creates an opportunity to extend ProxySQL and implement setups which were not easy to build in the past due to the low granularity of cron. In this blog post, we will show you how to take advantage of this new feature and create a Galera Cluster with read-write split performed by ProxySQL.

First, we need to install and start ProxySQL:

[root@ip-172-30-4-215 ~]# wget https://github.com/sysown/proxysql/releases/download/v1.2.1/proxysql-1.2.1-1-centos7.x86_64.rpm

[root@ip-172-30-4-215 ~]# rpm -i proxysql-1.2.1-1-centos7.x86_64.rpm
[root@ip-172-30-4-215 ~]# service proxysql start
Starting ProxySQL: DONE!

Next, we need to download a script which we will use to monitor Galera status. Currently it has to be downloaded separately but in the next release of ProxySQL it should be included in the rpm. The script needs to be located in /var/lib/proxysql.

[root@ip-172-30-4-215 ~]# wget https://raw.githubusercontent.com/sysown/proxysql/master/tools/proxysql_galera_checker.sh

[root@ip-172-30-4-215 ~]# mv proxysql_galera_checker.sh /var/lib/proxysql/
[root@ip-172-30-4-215 ~]# chmod u+x /var/lib/proxysql/proxysql_galera_checker.sh

If you are not familiar with this script, you can check what arguments it accepts by running:

[root@ip-172-30-4-215 ~]# /var/lib/proxysql/proxysql_galera_checker.sh
Usage: /var/lib/proxysql/proxysql_galera_checker.sh <hostgroup_id write> [hostgroup_id read] [number writers] [writers are readers 0|1} [log_file]

As we can see, we need to pass a couple of arguments - the hostgroups for writers and readers, the number of writers which should be active at the same time, whether writers can also be used as readers and, finally, the path to a log file.

Next, we need to connect to ProxySQL’s admin interface. For that you need to know credentials - you can find them in a configuration file, typically located in /etc/proxysql.cnf:

admin_variables=
{
        admin_credentials="admin:admin"
        mysql_ifaces="127.0.0.1:6032;/tmp/proxysql_admin.sock"
#       refresh_interval=2000
#       debug=true
}

Knowing the credentials and interfaces on which ProxySQL listens, we can connect to the admin interface and begin configuration.

[root@ip-172-30-4-215 ~]# mysql -P6032 -uadmin -padmin -h 127.0.0.1

First, we need to fill mysql_servers table with information about our Galera nodes. We will add them twice, to two different hostgroups. One hostgroup (with hostgroup_id of 0) will handle writes while the second hostgroup (with hostgroup_id of 1) will handle reads.

MySQL [(none)]> INSERT INTO mysql_servers (hostgroup_id, hostname, port) VALUES (0, '172.30.4.238', 3306), (0, '172.30.4.184', 3306), (0, '172.30.4.67', 3306);
Query OK, 3 rows affected (0.00 sec)

MySQL [(none)]> INSERT INTO mysql_servers (hostgroup_id, hostname, port) VALUES (1, '172.30.4.238', 3306), (1, '172.30.4.184', 3306), (1, '172.30.4.67', 3306);
Query OK, 3 rows affected (0.00 sec)

Next, we need to add information about the users which will be used by the application. We used a plain text password here, but ProxySQL also accepts hashed passwords in MySQL format.

MySQL [(none)]> INSERT INTO mysql_users (username, password, active, default_hostgroup) VALUES ('sbtest', 'sbtest', 1, 0);
Query OK, 1 row affected (0.00 sec)

What’s important to keep in mind is the default_hostgroup setting - we set it to ‘0’, which means that, unless one of the query rules says otherwise, all queries will be sent to hostgroup 0 - our writers.

At this point we need to define query rules which will handle read/write split. First, we want to match all SELECT queries:

MySQL [(none)]> INSERT INTO mysql_query_rules (active, match_pattern, destination_hostgroup, apply) VALUES (1, '^SELECT.*', 1, 0);
Query OK, 1 row affected (0.00 sec)

It is important to make sure you get the regex right. It is also crucial to note that we set the ‘apply’ column to ‘0’. This means that our rule won’t be the final one - a query, even if it matches the regex, will be tested against the next rule in the chain. You can see why we’ve done that when you look at our second rule:

MySQL [(none)]> INSERT INTO mysql_query_rules (active, match_pattern, destination_hostgroup, apply) VALUES (1, '^SELECT.*FOR UPDATE', 0, 1);
Query OK, 1 row affected (0.00 sec)

We are looking for SELECT … FOR UPDATE queries; that’s why we couldn’t just finish checking our SELECT queries on the first rule. SELECT … FOR UPDATE should be routed to our write hostgroup, where the UPDATE will happen.

Those settings will work fine if autocommit is enabled and no explicit transactions are used. If your application uses transactions, one of the methods to make them work safely in ProxySQL is to use the following set of queries:

SET autocommit=0;
BEGIN;
...

The transaction is created and it will stick to the host where it was opened. You also need to have a query rule for BEGIN, which would route it to the hostgroup for writers - in our case we leverage the fact that, by default, all queries executed as ‘sbtest’ user are routed to writers’ hostgroup (‘0’) so there’s no need to add anything.

The second method would be to enable persistent transactions for our user (the transaction_persistent column in the mysql_users table should be set to ‘1’).
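For example, to enable persistent transactions for the sbtest user we defined earlier (just the admin statement; like every other change, it has to be loaded to runtime and saved to disk, as shown at the end of this post):

MySQL [(none)]> UPDATE mysql_users SET transaction_persistent=1 WHERE username='sbtest';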

ProxySQL’s handling of other SET statements and user-defined variables is another thing we’d like to discuss a bit here. ProxySQL works on two levels of routing. First - query rules. You need to make sure all your queries are routed according to your needs. Then, connection multiplexing - even when routed to the same host, every query you issue may in fact use a different connection to the backend. This makes things hard for session variables. Luckily, ProxySQL treats all queries containing the ‘@’ character in a special way - once it detects it, it disables connection multiplexing for the duration of that session. Thanks to that, we don’t have to worry that the next query won’t know a thing about our session variable.

The only thing we need to make sure of is that we end up in the correct hostgroup before disabling connection multiplexing. To cover all cases, the ideal hostgroup in our setup would be the one with the writers. This requires a slight change in the way we set our query rules (you may need to run ‘DELETE FROM mysql_query_rules’ if you already added the query rules we mentioned earlier).

MySQL [(none)]> INSERT INTO mysql_query_rules (active, match_pattern, destination_hostgroup, apply) VALUES (1, '.*@.*', 0, 1);
Query OK, 1 row affected (0.00 sec)

MySQL [(none)]> INSERT INTO mysql_query_rules (active, match_pattern, destination_hostgroup, apply) VALUES (1, '^SELECT.*', 1, 0);
Query OK, 1 row affected (0.00 sec)

MySQL [(none)]> INSERT INTO mysql_query_rules (active, match_pattern, destination_hostgroup, apply) VALUES (1, '^SELECT.*FOR UPDATE', 0, 1);
Query OK, 1 row affected (0.00 sec)

Those two cases could become a problem in our setup, but as long as you are not affected by them (or if you use the proposed workarounds), we can proceed with the configuration. We still need to set up our script to be executed from ProxySQL:

MySQL [(none)]> INSERT INTO scheduler (id, active, interval_ms, filename, arg1, arg2, arg3, arg4, arg5) VALUES (1, 1, 1000, '/var/lib/proxysql/proxysql_galera_checker.sh', 0, 1, 1, 1, '/var/lib/proxysql/proxysql_galera_checker.log');
Query OK, 1 row affected (0.01 sec)

Additionally, because of the way Galera handles dropped nodes, we want to increase the number of attempts that ProxySQL will make before it decides a host cannot be reached.

MySQL [(none)]> SET mysql-query_retries_on_failure=10;
Query OK, 1 row affected (0.00 sec)

Finally, we need to apply all changes we made to the runtime configuration and save them to disk.

MySQL [(none)]> LOAD MYSQL USERS TO RUNTIME; SAVE MYSQL USERS TO DISK; LOAD MYSQL QUERY RULES TO RUNTIME; SAVE MYSQL QUERY RULES TO DISK; LOAD MYSQL SERVERS TO RUNTIME; SAVE MYSQL SERVERS TO DISK; LOAD SCHEDULER TO RUNTIME; SAVE SCHEDULER TO DISK; LOAD MYSQL VARIABLES TO RUNTIME; SAVE MYSQL VARIABLES TO DISK;
Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.02 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.02 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.02 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.01 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 64 rows affected (0.05 sec)

Ok, let’s see how things work together. First, verify that our script works by looking at /var/lib/proxysql/proxysql_galera_checker.log:

Fri Sep  2 21:43:15 UTC 2016 Check server 0:172.30.4.184:3306 , status ONLINE , wsrep_local_state 4
Fri Sep  2 21:43:15 UTC 2016 Check server 0:172.30.4.238:3306 , status OFFLINE_SOFT , wsrep_local_state 4
Fri Sep  2 21:43:15 UTC 2016 Changing server 0:172.30.4.238:3306 to status ONLINE
Fri Sep  2 21:43:15 UTC 2016 Check server 0:172.30.4.67:3306 , status OFFLINE_SOFT , wsrep_local_state 4
Fri Sep  2 21:43:15 UTC 2016 Changing server 0:172.30.4.67:3306 to status ONLINE
Fri Sep  2 21:43:15 UTC 2016 Check server 1:172.30.4.184:3306 , status ONLINE , wsrep_local_state 4
Fri Sep  2 21:43:15 UTC 2016 Check server 1:172.30.4.238:3306 , status ONLINE , wsrep_local_state 4
Fri Sep  2 21:43:16 UTC 2016 Check server 1:172.30.4.67:3306 , status ONLINE , wsrep_local_state 4
Fri Sep  2 21:43:16 UTC 2016 Number of writers online: 3 : hostgroup: 0
Fri Sep  2 21:43:16 UTC 2016 Number of writers reached, disabling extra write server 0:172.30.4.238:3306 to status OFFLINE_SOFT
Fri Sep  2 21:43:16 UTC 2016 Number of writers reached, disabling extra write server 0:172.30.4.67:3306 to status OFFLINE_SOFT
Fri Sep  2 21:43:16 UTC 2016 Enabling config

Looks ok. Next we can check mysql_servers table:

MySQL [(none)]> select hostgroup_id, hostname, status from mysql_servers;
+--------------+--------------+--------------+
| hostgroup_id | hostname     | status       |
+--------------+--------------+--------------+
| 0            | 172.30.4.238 | OFFLINE_SOFT |
| 0            | 172.30.4.184 | ONLINE       |
| 0            | 172.30.4.67  | OFFLINE_SOFT |
| 1            | 172.30.4.238 | ONLINE       |
| 1            | 172.30.4.184 | ONLINE       |
| 1            | 172.30.4.67  | ONLINE       |
+--------------+--------------+--------------+
6 rows in set (0.00 sec)

Again, everything looks as expected - one host is taking writes (172.30.4.184), all three are handling reads. Let’s start sysbench to generate some traffic and then we can check how ProxySQL will handle failure of the writer host.

[root@ip-172-30-4-215 ~]# while true ; do sysbench --test=/root/sysbench/sysbench/tests/db/oltp.lua --num-threads=6 --max-requests=0 --max-time=0 --mysql-host=172.30.4.215 --mysql-user=sbtest --mysql-password=sbtest --mysql-port=6033 --oltp-tables-count=32 --report-interval=1 --oltp-skip-trx=on --oltp-read-only=off --oltp-table-size=100000  run ;done

We are going to simulate a crash by killing the mysqld process on host 172.30.4.184. This is what you’ll see on the application side:

[  45s] threads: 6, tps: 0.00, reads: 4891.00, writes: 1398.00, response time: 23.67ms (95%), errors: 0.00, reconnects:  0.00
[  46s] threads: 6, tps: 0.00, reads: 4973.00, writes: 1425.00, response time: 25.39ms (95%), errors: 0.00, reconnects:  0.00
[  47s] threads: 6, tps: 0.00, reads: 5057.99, writes: 1439.00, response time: 22.23ms (95%), errors: 0.00, reconnects:  0.00
[  48s] threads: 6, tps: 0.00, reads: 2743.96, writes: 774.99, response time: 23.26ms (95%), errors: 0.00, reconnects:  0.00
[  49s] threads: 6, tps: 0.00, reads: 0.00, writes: 1.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[  50s] threads: 6, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[  51s] threads: 6, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[  52s] threads: 6, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[  53s] threads: 6, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[  54s] threads: 6, tps: 0.00, reads: 1235.02, writes: 354.01, response time: 6134.76ms (95%), errors: 0.00, reconnects:  0.00
[  55s] threads: 6, tps: 0.00, reads: 5067.98, writes: 1459.00, response time: 24.95ms (95%), errors: 0.00, reconnects:  0.00
[  56s] threads: 6, tps: 0.00, reads: 5131.00, writes: 1458.00, response time: 22.07ms (95%), errors: 0.00, reconnects:  0.00
[  57s] threads: 6, tps: 0.00, reads: 4936.02, writes: 1414.00, response time: 22.37ms (95%), errors: 0.00, reconnects:  0.00
[  58s] threads: 6, tps: 0.00, reads: 4929.99, writes: 1404.00, response time: 24.79ms (95%), errors: 0.00, reconnects:  0.00

There’s a ~5 second break, but otherwise no errors were reported. Of course, your mileage may vary - it all depends on Galera settings and your application. Such a seamless failover might not be possible if you use transactions in your application.

To summarize, we showed you how to configure read-write split in Galera Cluster using ProxySQL. There are a couple of limitations due to the way the proxy works, but as long as none of them are a blocker, you can use it and benefit from other ProxySQL features like caching or query rewriting. Please also keep in mind that the script we used for setting up read-write split is just an example which comes from ProxySQL. If you’d like it to cover more complex cases, you can easily write one tailored to your needs.

Planets9s - Try the new ClusterControl 1.3.2 with its new deployment wizard


Welcome to this week’s Planets9s, covering all the latest resources and technologies we create around automation and management of open source database infrastructures.

Try the new ClusterControl 1.3.2 with its new deployment wizard

This week we’re delighted to announce the release of ClusterControl 1.3.2, which includes the following features: a new alarm viewer and a new deployment wizard for MySQL, MongoDB & PostgreSQL, making it ever easier to deploy your favourite open source databases; it also includes deployment of MongoDB sharded clusters as well as MongoDB advisors. If you haven’t tried it out yet, now is the time to download this latest release and provide us with your feedback.

Download the new ClusterControl

New partnership with WooServers helps start-ups challenge Google, Amazon and Microsoft

In addition to announcing the new ClusterControl 1.3.2 this week, we’ve also officially entered into a new partnership with WooServers to bring ClusterControl to web hosting. WooServers is a web hosting platform, used by 5,500 businesses, such as WhiteSharkMedia and SwiftServe to host their websites and applications. With ClusterControl, WooServers makes available a managed service that includes comprehensive infrastructure automation and management of MySQL-based database clusters. The service is available on WooServers data centers, as well as on Amazon Web Services and Microsoft Azure.

Find out more

Sign up for Part 2 of our MySQL Query Tuning Trilogy: Indexing and EXPLAIN

You can now sign up for Part 2 of our webinar trilogy on MySQL Query Tuning. In this follow up webinar to the one on process and tools, we’ll cover topics such as SQL tuning, indexing, the optimizer and how to leverage EXPLAIN to gain insight into execution plans. More specifically, we’ll look at how B-Tree indexes are built, indexes MyISAM vs. InnoDB, different index types such as B-Tree, Fulltext and Hash, indexing gotchas and an EXPLAIN walkthrough of a query execution plan.

Sign up today

How to set up read-write split in Galera Cluster using ProxySQL

ProxySQL is an SQL-aware load balancer for MySQL and MariaDB. A scheduler was recently added, making it possible to execute external scripts from within ProxySQL. In this new blog post, we’ll show you how to take advantage of this new feature to perform read-write splits on your Galera Cluster.

Read the blog

That’s it for this week! Feel free to share these resources with your colleagues and follow us on our social media channels.

Have a good end of the week,

Jean-Jérôme Schmidt
Planets9s Editor
Severalnines AB

Planets9s - Download our new ‘Database Sharding with MySQL Fabric’ whitepaper


Welcome to this week’s Planets9s, covering all the latest resources and technologies we create around automation and management of open source database infrastructures.

Download our new whitepaper: Database Sharding with MySQL Fabric

Database systems with large data sets or high throughput applications can challenge the capacity of a single database server, and sharding is a way to address that. Spreading your database across multiple servers sounds good, but how does this work in practice?

In this whitepaper, we will have a close look at MySQL Fabric. You will learn the basics, and also learn how to migrate to a sharded environment.

Download the whitepaper

Sign up for our 9 DevOps Tips for going in production with Galera Cluster for MySQL / MariaDB webinar

Operations is not so much about specific technologies, but about the techniques and tools you use to deploy and manage them. Monitoring, managing schema changes and pushing them in production, performance optimizations, configurations, version upgrades, backups; these are all aspects to consider – preferably before going live. In this webinar, we’ll guide you through 9 key devops tips to consider before taking Galera Cluster for MySQL / MariaDB into production.

Sign up for the webinar

Load balanced MySQL Galera setup - Manual Deployment vs ClusterControl

Deploying a MySQL Galera Cluster with redundant load balancing can be time consuming. This blog looks at how much time it would take to do it manually, using the popular “Google university” to search for how-to’s and blogs that provide deployment steps, versus using our agentless management and automation console, ClusterControl, which supports MySQL (Oracle and Percona Server), MariaDB, MongoDB (MongoDB Inc. and Percona), and PostgreSQL.

Read the blog

That’s it for this week! Feel free to share these resources with your colleagues and follow us on our social media channels.

Have a good end of the week,

Jean-Jérôme Schmidt
Planets9s Editor
Severalnines AB

Planets9s - 9 DevOps Tips for MySQL / MariaDB Galera Cluster, MySQL Query Tuning Part 2 and more!


Welcome to this week’s Planets9s, covering all the latest resources and technologies we create around automation and management of open source database infrastructures.

New webinar: 9 DevOps Tips for going in production with MySQL / MariaDB Galera Cluster

In this new webinar on October 11th, Johan Andersson, CTO at Severalnines, will guide you through 9 key DevOps tips to consider before taking Galera Cluster for MySQL / MariaDB into production. Monitoring, managing schema changes and pushing them in production, performance optimizations, configurations, version upgrades, backups; these are all aspects to consider before going live with Galera Cluster and Johan will share his 9 DevOps tips with you for a successful production environment.

Sign up for the webinar

Watch the replay: MySQL Query Tuning Part 2 - Indexing and EXPLAIN

You can now watch the replay of Part 2 of our webinar trilogy on MySQL Query Tuning, which covers Indexing as well as EXPLAIN, one of the most important tools in the DBA’s arsenal. Our colleague Krzysztof Książek, Senior Support Engineer at Severalnines, presents this webinar trilogy and this week he looked into answering questions such as why a given query might be slow, what the execution plan might look like, how JOINs might be processed, whether a given query is using the correct indexes, or whether it’s creating a temporary table. Find out more by watching the replay of this webinar.

Watch the replay

Download our whitepaper on Database Sharding with MySQL Fabric

This new whitepaper provides a close look at database sharding with MySQL Fabric. You will learn the basics of it, and also learn how to migrate to a sharded environment. It further discusses three different tools which are designed to help users shard their MySQL databases. And last but not least, it shows you how to set up a sharded MySQL setup based on MySQL Fabric and ProxySQL.

Download the whitepaper

Critical zero-day vulnerabilities exposed in MySQL

Database security notice: you can easily upgrade your MySQL and MariaDB servers with ClusterControl, and this new blog post shows you how. You must have heard about CVE-2016-6662, the recent zero-day exploit exposed in most MySQL versions and its variants. The vulnerability can be exploited by a remote attacker to inject malicious settings into your my.cnf. We advise you to upgrade as soon as possible, if you haven’t done so yet, with these easy-to-follow instructions for ClusterControl users.

Read the blog

That’s it for this week! Feel free to share these resources with your colleagues and follow us on our social media channels.

Have a good end of the week,

Jean-Jérôme Schmidt
Planets9s Editor
Severalnines AB


Planets9s - MySQL on Docker with Calico, Galera Cluster DevOps Tips, Percona Live and more


Welcome to this week’s Planets9s, covering all the latest resources and technologies we create around automation and management of open source database infrastructures.

MySQL on Docker: Multi-Host Networking for MySQL Containers (Part 2 - Calico)

We’re continuing with our popular blogs covering MySQL on Docker and this time we’re looking at deploying MySQL Replication on top of three Docker hosts via Calico’s driver on multi-host networking. Having previously looked at Docker’s single-host networking for MySQL containers as well as multi-host networking and Docker swarm mode, we’re now casting our eyes on other networking drivers, starting with Calico.

Read the blog

Still time to sign up: 9 DevOps Tips for going in production with Galera Cluster

We’re live next Tuesday, October 11th, with Johan Andersson, CTO at Severalnines, who will be sharing his 9 DevOps tips with you for a successful Galera Cluster for MySQL / MariaDB production environment. Monitoring, managing schema changes and pushing them in production, performance optimizations, configurations, version upgrades, backups: Johan will guide you through all aspects to consider before going live with Galera Cluster.

Sign up for the webinar

We had plenty of 9s on deck at Percona Live Amsterdam this week

If you didn’t get the chance to attend this year’s Percona Live Europe, here’s our recap of the conference with soundbites, live broadcast replays from our booth where we discussed ClusterControl and conference highlights with Severalnines CEO Vinay Joosery … we’re also putting our team in the spotlight, so that you can see some of the faces behind the 9s!

Read the conference blog

Sharding MySQL with MySQL Fabric and ProxySQL

There are numerous ways to migrate into MySQL Fabric, which can all be pretty challenging. The solution to this complex challenge consists of several elements:

  1. MySQL Fabric, with its sharding system of high availability groups and the tools around it.
  2. MySQL Router - it allows regular MySQL clients to connect to the different high availability groups created in MySQL Fabric.
  3. ProxySQL, which allows users to perform a flawless failover (from the old database to the sharded setup).

This blog post and related whitepaper show you how to set up a sharded MySQL setup based on MySQL Fabric and ProxySQL.

Read the blog

That’s it for this week! Feel free to share these resources with your colleagues and follow us on our social media channels.

Have a good end of the week,

Jean-Jérôme Schmidt
Planets9s Editor
Severalnines AB

Planets9s - Joining the beautiful game with Wyscout, 9 DevOps Tips for Galera Cluster replay and more


Welcome to this week’s Planets9s, covering all the latest resources and technologies we create around automation and management of open source database infrastructures.

Watch the replay: 9 DevOps Tips for going in production with MySQL / MariaDB Galera Cluster

You can now watch the replay of this week’s webinar, during which our CTO Johan Andersson walked us through his tips & tricks on important aspects to consider before going live with MySQL / MariaDB Galera Cluster. While it may be easy to deploy Galera Cluster, how it behaves under real workload, scale, and during long term operation is a slightly more complex matter. Find out about some key best practices around monitoring, managing schema changes and pushing them in production, performance optimizations, configurations, version upgrades and performing backups to be best prepared for going in production with Galera.

Watch the replay

We’ve signed up football video and data platform Wyscout used by Real Madrid, Arsenal, Juventus & many more!

This week we’re delighted to be joining in on the beautiful game thanks to our new customer Wyscout, the world’s leading company providing video, data and technology to football people all over the world. The Wyscout team use ClusterControl to manage the database where all of the player intelligence is stored, and their platform is used by the world’s biggest clubs, including Arsenal, Juventus and Real Madrid. Needless to say, we’re excited to be part of Wyscout’s global football adventure!

Read the press release

ClusterControl Tips & Tricks - Custom graphs to monitor your MySQL, MariaDB, MongoDB and PostgreSQL systems

As all of you will likely know, graphs play an important part in database management, as they’re your window onto your monitored systems. ClusterControl comes with a predefined set of graphs for you to analyze, which are designed to give you, at first glance, as much information as possible about the state of your database cluster. And as you might have your own set of metrics you’d like to monitor, ClusterControl allows you to customize the graphs available in the cluster overview section. This blog shows you how to make best use of these graphs and of our customisation features.

Read the blog

Become a MongoDB DBA: How to scale reads

This new post in our ‘Become a MongoDB DBA’ blog series takes us into the realm of scaling MongoDB. MongoDB offers both read and write scaling, and we will uncover the differences between these two strategies for you. Whether to choose read or write scaling all depends on the workload of your application: if your application tends to read more often than it writes data, you will probably want to make use of the read scaling capabilities of MongoDB. This blog walks you through MongoDB read scaling.

Read the blog

That’s it for this week! Feel free to share these resources with your colleagues and follow us on our social media channels.

Have a good end of the week,

Jean-Jérôme Schmidt
Planets9s Editor
Severalnines AB

Schema changes in Galera cluster for MySQL and MariaDB - how to avoid RSU locks


Working as MySQL DBA, you will often have to deal with schema changes. Changes to production databases are not popular among DBAs, but they are necessary when applications add new requirements on the databases. If you manage a Galera Cluster, this is even more challenging than usual - the default method of doing schema changes (Total Order Isolation) locks the whole cluster for the duration of the alter. There are two more ways to go, though - online schema change and Rolling Schema Upgrade.

A popular method of performing schema changes, using pt-online-schema-change, has its own limitations. It can be tricky if your workload consists of long running transactions, or the workload is highly concurrent and the tool may not be able to acquire metadata locks needed to create triggers. Triggers themselves can become a hard stop if you already have triggers in the table you need to alter (unless you use Galera Cluster based on MySQL 5.7). Foreign keys may also become a serious issue to deal with. You can find more data on those limitations in this Become a MySQL DBA blog post . New alternatives to pt-online-schema-change arrived recently - gh-ost created by GitHub, but it’s still a new tool and unless you evaluated it already, you may have to stick to pt-online-schema-change for time being.

This leaves Rolling Schema Upgrade as the only feasible method to execute schema changes where pt-online-schema-change failed or is not feasible to use. Theoretically speaking, it is a non-blocking operation - you run:

SET SESSION wsrep_OSU_method=RSU;

And the rest should happen automatically once you start the DDL - the node should be desynced and alter should not impact the rest of the cluster.

Let’s check how it behaves in real life, in two scenarios. First, we have a single connection to the Galera cluster. We don’t scale out reads, we just use Galera as a way to improve availability of our application. We will simulate it by running a sysbench workload on one of the Galera cluster nodes. We are also going to execute RSU on this node. A screenshot with result of this operation can be found below.

On the bottom right window you can see the output of sysbench - our application. On the top window there’s a SHOW PROCESSLIST output at the time the alter was running. As you can see, our application stalled for couple of seconds - for the duration of the alter command (visible in bottom left window). Graphs in the ClusterControl show the stalled queries in detail:

You may say, and rightly so, that this is expected - if you write to a node where schema change is performed, those writes have to wait.

What about if we use some sort of round-robin routing of connections? This can be done in the application (just define a pool of hosts to connect to), it can be done at the connector level. It also can be done using a proxy. Results are in the screenshot below.

As you can see, here we also have locked threads which were routed to the host where RSU was in progress. The rest of threads worked ok but some of the connections stalled for a duration of the alter. Please take a closer look at the length of an alter (11.78s) and the maximum response time (12.63s). Some of the users experienced significant performance degradation.

One question you may want to ask is - starting ALTER in RSU desyncs the Galera node. Proxies like ProxySQL, MaxScale or HAProxy (when used in connection with clustercheck script) should detect this behavior and redirect the traffic off the desynced host. Why is it locking commits? Unfortunately, there’s a high probability that some transactions will be in progress and those will get locked after the ALTER starts.

How to avoid the problem? You need to use a proxy. It’s not enough on its own, as we have just shown. But as long as your proxy removes desynced hosts from rotation, you can easily add this step to the RSU process and make sure a node is desynced and not accepting any traffic before you actually start your DDL.

mysql> SET GLOBAL wsrep_desync=1;
mysql> SELECT SLEEP(20);
mysql> ALTER TABLE sbtest.sbtest3 DROP INDEX idx_pad;
mysql> ALTER TABLE sbtest.sbtest3 ADD KEY idx_pad (pad);
mysql> SET GLOBAL wsrep_desync=0;

This should work with all proxies deployed through ClusterControl - HAProxy and MaxScale. ProxySQL will also be able to handle RSU executed in that way correctly.
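If you run schema changes regularly, it can be convenient to wrap this desync-then-alter sequence in a small script. Below is a minimal sketch (not part of the original procedure) - the credentials, sleep duration and the ALTER statement are placeholders you would adapt to your environment and to your proxy’s health check interval:

#!/bin/bash
# Sketch: desync the node, give the proxy health check time to take it out of
# rotation, run the DDL locally in RSU mode, then resync the node.
MYSQL="mysql -uroot -pmypassword"

$MYSQL -e "SET GLOBAL wsrep_desync=1;"
sleep 20   # long enough for clustercheck / ProxySQL / MaxScale to stop routing traffic here

# RSU keeps the ALTER local to this node; it is not replicated to the rest of the cluster
$MYSQL -e "SET SESSION wsrep_OSU_method=RSU; ALTER TABLE sbtest.sbtest3 ADD KEY idx_pad (pad);"

$MYSQL -e "SET GLOBAL wsrep_desync=0;"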

Another method, which can be used for HAProxy, would be to disable a backend node by setting it to maintenance state. You can do this from ClusterControl:

Make sure you have ticked the correct node, confirm that this is what you want to do, and in a couple of minutes you should be good to start RSU on that node. The host will be highlighted in brown:

It’s still better to be on the safe side and verify, using SHOW PROCESSLIST (also available under ClusterControl -> Query Monitor -> Running Queries), that indeed no traffic is hitting this node. Once you are done running your DDLs, you can enable the backend node again in the HAProxy tab in ClusterControl and traffic will be routed to this node again.

As you can see, even with load balancers in place, running RSU may seriously impact the performance and availability of your database. Most likely it’ll affect just a small subset of users (a few percent of connections), but it’s still not something we’d like to see. Using properly configured proxies (like those deployed by ClusterControl) and ensuring you first desync the node and then execute the RSU will be enough to avoid this type of problem.

Planets9s - vidaXL chooses ClusterControl, scaling & sharding MongoDB & more!


Welcome to this week’s Planets9s, covering all the latest resources and technologies we create around automation and management of open source database infrastructures.

vidaXL chooses ClusterControl to manage its MongoDB & MySQL databases

This week we’re happy to announce that we’re helping our customer vidaXL, a global e-commerce platform, compete with eBay and Amazon - and in doing so, keep its tills ringing. In their own words: “Our back-end is reliant on different MySQL & MongoDB databases to tackle different tasks. Using several different tools, rather than a one-stop shop, was detrimental to our productivity. Severalnines’ ClusterControl is that “one-stop shop” and we haven’t looked back. It’s an awesome solution like no other.”

Read the announcement

Live webinar next Tuesday on scaling & sharding MongoDB

Join us on Tuesday next week, November 15th, for this webinar during which we’ll discuss how to plan your MongoDB scaling strategy up front. We’ll cover topics such as the differences between read and write scaling with MongoDB, read scaling considerations and read preference; and we’ll look at how sharding works in MongoDB and how to scale and shard MongoDB using ClusterControl. “See” you there!

Sign up for the webinar

HA on a Shoestring Budget - Deploying a Minimal Two Node MySQL Galera Cluster

As we regularly get questions on how to set up a Galera cluster with just 2 nodes, we published this handy blog post on why and how to go about that. The general consensus is that users should have at least 3 Galera nodes to avoid network partitioning. Yet there are some valid reasons for considering a 2 node deployment, e.g., if you want to achieve database high availability but have limited budget to spend on a third database node. Or perhaps you are running Galera in a development/sandbox environment and prefer a minimal setup. Whichever the reasoning, here’s a handy quick-guide on how to go about it.

Read the blog

That’s it for this week! Feel free to share these resources with your colleagues and follow us in our social media channels.

Have a good end of the week,

Jean-Jérôme Schmidt
Planets9s Editor
Severalnines AB

MySQL on Docker: Deploy a Homogeneous Galera Cluster with etcd


In the previous blog post, we looked into Docker’s multi-host networking capabilities with native networking and Calico. In this blog post, our journey to make Galera Cluster run smoothly in Docker containers continues. Deploying Galera Cluster on Docker is tricky when using orchestration tools. Due to the nature of the scheduler in container orchestration tools and the assumption of homogeneous images, the scheduler will just fire up the respective containers according to the run command and leave the bootstrapping process to the container’s entrypoint logic when starting up. And you do not want that for Galera - starting all nodes at once means each node will form a “1-node cluster” and you’ll end up with a disjointed system.

“Homogeneousing” Galera Cluster

That might be a new word, but it holds true for stateful services like MySQL Replication and Galera Cluster. As one might know, the bootstrapping process for Galera Cluster usually requires manual intervention, where you have to decide which node is the most advanced one to bootstrap from. There is nothing wrong with this step - you need to be aware of the state of each database node before deciding on the sequence in which to start them up. Galera Cluster is a distributed system, and its redundancy model works like that.

However, container orchestration tools like Docker Engine Swarm Mode and Kubernetes are not aware of the redundancy model of Galera. The orchestration tool presumes containers are independent from each other. If they are dependent, then you have to have an external service that monitors the state. The best way to achieve this is to use a key/value store as a reference point for other containers when starting up.

This is where service discovery like etcd comes into the picture. The basic idea is, each node should report its state periodically to the service. This simplifies the decision process when starting up. For Galera Cluster, the node that has wsrep_local_state_comment equal to Synced shall be used as a reference node when constructing the Galera communication address (gcomm) during joining. Otherwise, the most updated node has to get bootstrapped first.

Etcd has a very nice feature called TTL, where you can expire a key after a certain amount of time. This is useful to determine the state of a node, where the key/value entry only exists if an alive node reports to it. As a result, the nodes won’t have to connect to each other to determine state (which is very troublesome in a dynamic environment) when forming a cluster. For example, consider the following keys:

    {"createdIndex": 10074,"expiration": "2016-11-29T10:55:35.218496083Z","key": "/galera/my_wsrep_cluster/10.255.0.7/wsrep_last_committed","modifiedIndex": 10074,"ttl": 10,"value": "2881"
    },
    {"createdIndex": 10072,"expiration": "2016-11-29T10:55:34.650574629Z","key": "/galera/my_wsrep_cluster/10.255.0.7/wsrep_local_state_comment","modifiedIndex": 10072,"ttl": 10,"value": "Synced"
    }

After 10 seconds (the ttl value), those keys will be removed from the entry. Basically, all nodes should report to etcd periodically with an expiring key. Each container should report every N seconds while it is alive (wsrep_local_state_comment=Synced and wsrep_last_committed=#value) via a background process. If a container is down, it will no longer send updates to etcd, thus the keys are removed after expiration. This simply indicates that the node was registered but is no longer synced with the cluster. It will be skipped when constructing the Galera communication address at a later point.
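As an illustration of that reporting loop, a background process inside the container could do something along the following lines. This is a simplified sketch, not the actual report_status.sh shipped in the image; the etcd endpoint, node IP and credentials are placeholders:

#!/bin/bash
# Sketch: periodically publish this node's Galera state to etcd with a 10-second TTL.
ETCD=http://192.168.55.111:2379
NODE_IP=10.255.0.7
while true; do
  STATE=$(mysql -uroot -pmypassword -NBe "SHOW STATUS LIKE 'wsrep_local_state_comment'" | awk '{print $2}')
  SEQNO=$(mysql -uroot -pmypassword -NBe "SHOW STATUS LIKE 'wsrep_last_committed'" | awk '{print $2}')
  curl -s "$ETCD/v2/keys/galera/my_wsrep_cluster/$NODE_IP/wsrep_local_state_comment" -XPUT -d value="$STATE" -d ttl=10 > /dev/null
  curl -s "$ETCD/v2/keys/galera/my_wsrep_cluster/$NODE_IP/wsrep_last_committed" -XPUT -d value="$SEQNO" -d ttl=10 > /dev/null
  sleep 5
done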

The overall flow of the joining procedure is illustrated in the following flow chart:

We have built a Docker image that follows the above approach. It is specifically built for running Galera Cluster using Docker’s orchestration tools, and it is available on Docker Hub and in our Github repository. It requires an etcd cluster as the discovery service (multiple etcd hosts are supported) and is based on Percona XtraDB Cluster 5.6. The image includes Percona Xtrabackup, jq (a JSON processor) and a shell script tailored for Galera health checks called report_status.sh.

You are welcome to fork or contribute to the project. Any bugs can be reported via Github or via our support page.

Deploying etcd Cluster

etcd is a distributed key value store that provides a simple and efficient way to store data across a cluster of machines. It’s open-source and available on GitHub. It provides shared configuration and service discovery. A simple use-case is to store database connection details or feature flags in etcd as key value pairs. It gracefully handles leader elections during network partitions and will tolerate machine failures, including the leader.

Since etcd is the brain of the setup, we are going to deploy it as a cluster daemon, on three nodes, instead of using containers. In this example, we are going to install etcd on each of the Docker hosts and form a three-node etcd cluster for better availability.

We used CentOS 7 as the operating system, with Docker v1.12.3, build 6b644ec. The deployment steps in this blog post are basically similar to those used in our previous blog post.

  1. Install etcd packages:

    $ yum install etcd
  2. Modify the configuration file accordingly depending on the Docker hosts:

    $ vim /etc/etcd/etcd.conf

    For docker1 with IP address 192.168.55.111:

    ETCD_NAME=etcd1
    ETCD_DATA_DIR="/var/lib/etcd/default.etcd"
    ETCD_LISTEN_PEER_URLS="http://0.0.0.0:2380"
    ETCD_LISTEN_CLIENT_URLS="http://0.0.0.0:2379"
    ETCD_INITIAL_ADVERTISE_PEER_URLS="http://192.168.55.111:2380"
    ETCD_INITIAL_CLUSTER="etcd1=http://192.168.55.111:2380,etcd2=http://192.168.55.112:2380,etcd3=http://192.168.55.113:2380"
    ETCD_INITIAL_CLUSTER_STATE="new"
    ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster-1"
    ETCD_ADVERTISE_CLIENT_URLS="http://0.0.0.0:2379"

    For docker2 with IP address 192.168.55.112:

    ETCD_NAME=etcd2
    ETCD_DATA_DIR="/var/lib/etcd/default.etcd"
    ETCD_LISTEN_PEER_URLS="http://0.0.0.0:2380"
    ETCD_LISTEN_CLIENT_URLS="http://0.0.0.0:2379"
    ETCD_INITIAL_ADVERTISE_PEER_URLS="http://192.168.55.112:2380"
    ETCD_INITIAL_CLUSTER="etcd1=http://192.168.55.111:2380,etcd2=http://192.168.55.112:2380,etcd3=http://192.168.55.113:2380"
    ETCD_INITIAL_CLUSTER_STATE="new"
    ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster-1"
    ETCD_ADVERTISE_CLIENT_URLS="http://0.0.0.0:2379"

    For docker3 with IP address 192.168.55.113:

    ETCD_NAME=etcd3
    ETCD_DATA_DIR="/var/lib/etcd/default.etcd"
    ETCD_LISTEN_PEER_URLS="http://0.0.0.0:2380"
    ETCD_LISTEN_CLIENT_URLS="http://0.0.0.0:2379"
    ETCD_INITIAL_ADVERTISE_PEER_URLS="http://192.168.55.113:2380"
    ETCD_INITIAL_CLUSTER="etcd1=http://192.168.55.111:2380,etcd2=http://192.168.55.112:2380,etcd3=http://192.168.55.113:2380"
    ETCD_INITIAL_CLUSTER_STATE="new"
    ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster-1"
    ETCD_ADVERTISE_CLIENT_URLS="http://0.0.0.0:2379"
  3. Start the service on docker1, followed by docker2 and docker3:

    $ systemctl enable etcd
    $ systemctl start etcd
  4. Verify our cluster status using etcdctl:

    [docker3 ]$ etcdctl cluster-health
    member 2f8ec0a21c11c189 is healthy: got healthy result from http://0.0.0.0:2379
    member 589a7883a7ee56ec is healthy: got healthy result from http://0.0.0.0:2379
    member fcacfa3f23575abe is healthy: got healthy result from http://0.0.0.0:2379
    cluster is healthy

That’s it. Our etcd is now running as a cluster on three nodes. The below illustrates our architecture:

Deploying Galera Cluster

A minimum of 3 containers is recommended for a high availability setup. Thus, we are going to create 3 replicas to start with; the service can be scaled up and down afterwards. Running standalone is also possible with the standard "docker run" command, as shown further down.

Before we start, it’s a good idea to remove any sort of keys related to our cluster name in etcd:

$ etcdctl rm /galera/my_wsrep_cluster --recursive
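The commands below also assume an overlay network called galera-net. If it was not already created as part of the previous blog post’s setup, you can create it from a Swarm manager node (a sketch, using the network name assumed throughout this post):

$ docker network create --driver overlay galera-net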

Ephemeral Storage

This is the recommended way if you plan on scaling the cluster out to more nodes (or scaling back in by removing nodes). To create a three-node Galera Cluster with ephemeral storage (the MySQL datadir will be lost if the container is removed), you can use the following command:

$ docker service create \
--name mysql-galera \
--replicas 3 \
-p 3306:3306 \
--network galera-net \
--env MYSQL_ROOT_PASSWORD=mypassword \
--env DISCOVERY_SERVICE=192.168.55.111:2379,192.168.55.112:2379,192.168.55.113:2379 \
--env XTRABACKUP_PASSWORD=mypassword \
--env CLUSTER_NAME=my_wsrep_cluster \
severalnines/pxc56

Persistent Storage

To create a three-node Galera Cluster with persistent storage (MySQL datadir persists if the container is removed), add the mount option with type=volume:

$ docker service create \
--name mysql-galera \
--replicas 3 \
-p 3306:3306 \
--network galera-net \
--mount type=volume,source=galera-vol,destination=/var/lib/mysql \
--env MYSQL_ROOT_PASSWORD=mypassword \
--env DISCOVERY_SERVICE=192.168.55.111:2379,192.168.55.112:2379,192.168.55.113:2379 \
--env XTRABACKUP_PASSWORD=mypassword \
--env CLUSTER_NAME=my_wsrep_cluster \
severalnines/pxc56

Custom my.cnf

If you would like to include a customized MySQL configuration file, create a directory on the physical host beforehand:

$ mkdir /mnt/docker/mysql-config # repeat on all Docker hosts

Then, use the mount option with “type=bind” to map the path into the container. In the following example, the custom my.cnf is located at /mnt/docker/mysql-config/my-custom.cnf on each Docker host:

$ docker service create \
--name mysql-galera \
--replicas 3 \
-p 3306:3306 \
--network galera-net \
--mount type=volume,source=galera-vol,destination=/var/lib/mysql \
--mount type=bind,src=/mnt/docker/mysql-config,dst=/etc/my.cnf.d \
--env MYSQL_ROOT_PASSWORD=mypassword \
--env DISCOVERY_SERVICE=192.168.55.111:2379,192.168.55.112:2379,192.168.55.113:2379 \
--env XTRABACKUP_PASSWORD=mypassword \
--env CLUSTER_NAME=my_wsrep_cluster \
severalnines/pxc56
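What goes into the custom configuration file is entirely up to you. A minimal, purely illustrative example (the values are placeholders, not tuning recommendations) could be:

[mysqld]
innodb_buffer_pool_size=1G
max_connections=500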

Wait for a couple of minutes and verify the service is running (CURRENT STATE = Running):

$ docker service ps mysql-galera
ID                         NAME            IMAGE               NODE           DESIRED STATE  CURRENT STATE           ERROR
2vw40cavru9w4crr4d2fg83j4  mysql-galera.1  severalnines/pxc56  docker1.local  Running        Running 5 minutes ago
1cw6jeyb966326xu68lsjqoe1  mysql-galera.2  severalnines/pxc56  docker3.local  Running        Running 12 seconds ago
753x1edjlspqxmte96f7pzxs1  mysql-galera.3  severalnines/pxc56  docker2.local  Running        Running 5 seconds ago

External applications/clients can connect to any Docker host IP address or hostname on port 3306; requests will be load balanced between the Galera containers. The connection gets NATed to a virtual IP address for each service "task" (container, in this case) using the Linux kernel's built-in load balancing functionality, IPVS. If the application containers reside in the same overlay network (galera-net), use the assigned virtual IP address instead. You can retrieve it using the inspect option:

$ docker service inspect mysql-galera -f "{{ .Endpoint.VirtualIPs }}"
[{89n5idmdcswqqha7wcswbn6pw 10.255.0.2/16} {1ufbr56pyhhbkbgtgsfy9xkww 10.0.0.2/24}]

Our architecture is now looking like this:

As a side note, you can also run Galera in standalone mode. This is probably useful for testing purposes like backup and restore, testing the impact of queries and so on. To run it just like a standalone MySQL container, use the standard docker run command:

$ docker run -d \
-p 3306 \
--name=galera-single \
-e MYSQL_ROOT_PASSWORD=mypassword \
-e DISCOVERY_SERVICE=192.168.55.111:2379,192.168.55.112:2379,192.168.55.113:2379 \
-e CLUSTER_NAME=my_wsrep_cluster \
-e XTRABACKUP_PASSWORD=mypassword \
severalnines/pxc56

Scaling the Cluster

There are two ways you can do scaling:

  1. Use “docker service scale” command.
  2. Create a new service with same CLUSTER_NAME using “docker service create” command.

Docker’s “scale” Command

The scale command enables you to scale one or more services either up or down to the desired number of replicas. The command returns immediately, but the actual scaling of the service may take some time. Galera needs to run with an odd number of nodes to avoid network partitioning.

So good numbers to scale to would be 5, 7 and so on:

$ docker service scale mysql-galera=5

Wait for a couple of minutes to let the new containers reach the desired state. Then, verify the running service:

$ docker service ls
ID            NAME          REPLICAS  IMAGE               COMMAND
bwvwjg248i9u  mysql-galera  5/5       severalnines/pxc56

One drawback of using this method is that you have to use ephemeral storage because Docker will likely schedule the new containers on a Docker host that already has a Galera container running. If this happens, the volume will overlap the existing Galera containers’ volume. If you would like to use persistent storage and scale in Docker Swarm mode, you should create another new service with a couple of different options, as described in the next section.

At this point, our architecture looks like this:

Another Service with Same Cluster Name

Another way to scale is to create another service with the same CLUSTER_NAME and network. However, you can’t really use the exact same command as the first one due to the following reasons:

  • The service name should be unique.
  • The port mapping must be other than 3306, since this port has been assigned to the mysql-galera service.
  • The volume name should be different to distinguish them from the existing Galera containers.

A benefit of doing this is that you will get another virtual IP address assigned to the “scaled” service. This gives your application or client an additional option to connect to the “scaled” IP address for various tasks, e.g. performing a full backup in desync mode, a database consistency check or server auditing.

The following example shows the command to add two more nodes to the cluster in a new service called mysql-galera-scale:

$ docker service create \
--name mysql-galera-scale \
--replicas 2 \
-p 3307:3306 \
--network galera-net \
--mount type=volume,source=galera-scale-vol,destination=/var/lib/mysql \
--env MYSQL_ROOT_PASSWORD=mypassword \
--env DISCOVERY_SERVICE=192.168.55.111:2379,192.168.55.112:2379,192.168.55.113:2379 \
--env XTRABACKUP_PASSWORD=mypassword \
--env CLUSTER_NAME=my_wsrep_cluster \
severalnines/pxc56

If we look into the service list, here is what we see:

$ docker service ls
ID            NAME                REPLICAS  IMAGE               COMMAND
0ii5bedv15dh  mysql-galera-scale  2/2       severalnines/pxc56
71pyjdhfg9js  mysql-galera        3/3       severalnines/pxc56

And when you look at the cluster size on one of the containers, you should get 5:

[root@docker1 ~]# docker exec -it $(docker ps | grep mysql-galera | awk {'print $1'}) mysql -uroot -pmypassword -e 'show status like "wsrep_cluster_size"'
Warning: Using a password on the command line interface can be insecure.
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 5     |
+--------------------+-------+

At this point, our architecture looks like this:

To get a clearer view of the process, we can simply look at the MySQL error log file (located under Docker’s data volume on the Docker host), for example:

$ tail -f /var/lib/docker/volumes/galera-vol/_data/error.log

Scale Down

Scaling down is simple: reduce the number of replicas, or remove the service that holds the minority of containers, so that Galera stays in quorum. For example, if you have fired up two groups of nodes with 3 + 2 containers for a total of 5, the majority needs to survive, so you can only remove the second group with 2 containers. If you have three groups with 3 + 2 + 2 containers, you can lose a maximum of 3 containers. This is due to the fact that the Docker Swarm scheduler simply terminates and removes the containers corresponding to the service. This makes Galera think that those nodes are failing, as they are not shut down in a graceful way.

If you scaled up using the “docker service scale” command, you should scale down using the same method, by reducing the number of replicas. To scale down, simply do:

$ docker service scale mysql-galera=3

Otherwise, if you chose to create another service to scale up, then simply remove the respective service to scale down:

$ docker service rm mysql-galera-scale

Known Limitations

There will be no automatic recovery if a split-brain happens (where all nodes end up in Non-Primary state). This is because the MySQL service is still running, yet it will refuse to serve any data and will return an error to the client. Docker has no capability to detect this, since all it cares about is the foreground MySQL process, which is not terminated, killed or stopped. Automating this process is risky, especially if the service discovery is co-located with the Docker host (etcd would also lose contact with the other members). And even if the service discovery is healthy somewhere else, it is probably unreachable from the Galera containers’ perspective, preventing the containers from seeing each other’s status correctly during the glitch.

In this case, you will need to intervene manually.

Choose the most advanced node to bootstrap and then run the following command to promote the node as Primary (other nodes shall then rejoin automatically if the network recovers):

$ docker exec -it [container ID] mysql -uroot -pyoursecret -e 'set global wsrep_provider_option="pc.bootstrap=1"'

Also, there is no automatic cleanup for the discovery service registry. You can remove all entries using either of the following commands (assuming the CLUSTER_NAME is my_wsrep_cluster):

$ curl http://192.168.55.111:2379/v2/keys/galera/my_wsrep_cluster?recursive=true -XDELETE # or
$ etcdctl rm /galera/my_wsrep_cluster --recursive

Conclusion

This combination of technologies opens the door to a more reliable database setup in the Docker ecosystem. Using service discovery to store state makes it possible for stateful containers to achieve a homogeneous setup.

In the next blog post, we are going to look into how to manage Galera Cluster on Docker.

Updated - How to Bootstrap MySQL or MariaDB Galera Cluster


Unlike standard MySQL server and MySQL Cluster, the way to start a MySQL/MariaDB Galera Cluster is a bit different. Galera requires you to start a node in a cluster as a reference point, before the remaining nodes are able to join and form the cluster. This process is known as cluster bootstrap. Bootstrapping is an initial step to introduce a database node as primary component, before others see it as a reference point to sync up data.

How does it work?

When Galera starts with the bootstrap command on a node, that particular node will reach Primary state (check the value of wsrep_cluster_status). The remaining nodes will just require a normal start command and they will automatically look for existing Primary Component (PC) in the cluster and join to form a cluster. Data synchronization then happens through either incremental state transfer (IST) or snapshot state transfer (SST) between the joiner and the donor.

So basically, you should only bootstrap the cluster if you want to start a new cluster, or when no other node in the cluster is in PRIMARY state. Care should be taken when choosing the action to take, or else you might end up with split clusters or data loss.

The following example scenarios illustrate when to bootstrap a three-node cluster based on node state (wsrep_local_state_comment) and cluster state (wsrep_cluster_status). In the original post this is a table pairing each Galera state diagram with a bootstrap flow; the diagrams are not reproduced here, but the corresponding flows are:

  • Scenario 1:
      1. Restart the INITIALIZED node.
  • Scenario 2:
      1. Restart the INITIALIZED node.
      2. Once done, start the new node.
  • Scenario 3:
      1. Bootstrap the most advanced node using “pc.bootstrap=1”.
      2. Restart the remaining nodes, one node at a time.
  • Scenario 4:
      1. Start the new node.
  • Scenario 5:
      1. Start the new node, one node at a time.
  • Scenario 6:
      1. Bootstrap any node.
      2. Start the remaining nodes, one node at a time.

How to start Galera cluster?

The 3 Galera vendors use different bootstrapping commands (based on the software’s latest version). On the first node, run:

  • MySQL Galera Cluster (Codership):

    $ service mysql bootstrap # sysvinit
    $ galera_new_cluster # systemd
    $ mysqld_safe --wsrep-new-cluster # command line
  • Percona XtraDB Cluster (Percona):

    $ service mysql bootstrap-pxc # sysvinit
    $ systemctl start mysql@bootstrap.service # systemd
  • MariaDB Galera Cluster (MariaDB):

    $ service mysql bootstrap # sysvinit
    $ service mysql start --wsrep-new-cluster # sysvinit
    $ galera_new_cluster # systemd
    $ mysqld_safe --wsrep-new-cluster # command line

The above commands are just wrappers; what they actually do is start the MySQL instance on that node with gcomm:// as the wsrep_cluster_address variable. You can also manually define the variables inside my.cnf and run the standard start/restart command. However, do not forget to change wsrep_cluster_address back so it contains the addresses of all nodes after the start.
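For example, a manual bootstrap of the first node could look like the sketch below (the IP addresses are illustrative):

$ mysqld_safe --wsrep-cluster-address=gcomm:// &

Once the cluster has formed, make sure wsrep_cluster_address in my.cnf on every node points back to all members, e.g.:

wsrep_cluster_address=gcomm://192.168.55.111,192.168.55.112,192.168.55.113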

When the first node is live, run the following command on the subsequent nodes:

$ service mysql start
$ systemctl start mysql

The new node connects to the cluster members as defined by the wsrep_cluster_address parameter. It will now automatically retrieve the cluster map and connect to the rest of the nodes and form a cluster.

Warning: Never bootstrap when you want to reconnect a node to an existing cluster, and NEVER run bootstrap on more than one node.

Safe-to-Bootstrap Flag

Galera, starting with version 3.19, comes with a new flag called “safe_to_bootstrap” inside grastate.dat. This flag facilitates the decision and prevents unsafe choices by keeping track of the order in which nodes are shut down. The node that was shut down last will be marked as “Safe-to-Bootstrap”. All the other nodes will be marked as unsafe to bootstrap from.

Look at the content of grastate.dat (by default located under the MySQL datadir) and you should notice the flag on the last line:

# GALERA saved state
version: 2.1
uuid:    8bcf4a34-aedb-14e5-bcc3-d3e36277729f
seqno:   2575
safe_to_bootstrap: 0

When bootstrapping the new cluster, Galera will refuse to start the first node that was marked as unsafe to bootstrap from. You will see the following message in the logs:

“It may not be safe to bootstrap the cluster from this node. It was not the last one to leave the cluster and may not contain all the updates.

To force cluster bootstrap with this node, edit the grastate.dat file manually and set safe_to_bootstrap to 1 .”

In case of an unclean shutdown or hard crash, all nodes will have “safe_to_bootstrap: 0”, so we have to consult the InnoDB storage engine to determine which node committed the last transaction in the cluster. This can be achieved by starting mysqld with the “--wsrep-recover” option on each of the nodes, which produces output like this:

$ mysqld --wsrep-recover
...
2016-11-18 01:42:15 36311 [Note] InnoDB: Database was not shutdown normally!
2016-11-18 01:42:15 36311 [Note] InnoDB: Starting crash recovery.
...
2016-11-18 01:42:16 36311 [Note] WSREP: Recovered position: 8bcf4a34-aedb-14e5-bcc3-d3e36277729f:114428
...

The number after the UUID string on the "Recovered position" line is the one to look for. Pick the node that has the highest number and edit its grastate.dat to set “safe_to_bootstrap: 1”, as shown in the example below:

# GALERA saved state
version: 2.1
uuid:    8bcf4a34-aedb-14e5-bcc3-d3e36277729f
seqno:   -1
safe_to_bootstrap: 1

You can then perform the standard bootstrap command on the chosen node.
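For example, on the chosen node (a sketch assuming the default datadir and a systemd-based installation):

$ sed -i 's/^safe_to_bootstrap: 0/safe_to_bootstrap: 1/' /var/lib/mysql/grastate.dat
$ galera_new_cluster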

What if the nodes have diverged?

In certain circumstances, nodes can diverge from each other. The state of all nodes might turn into Non-Primary due to a network split between nodes, a cluster crash, or if Galera hits an exception when determining the Primary Component. You will then need to select a node and promote it to be a Primary Component.

To determine which node needs to be bootstrapped, compare the wsrep_last_committed value on all DB nodes:

node1> SHOW STATUS LIKE 'wsrep_%';
+----------------------+-------------+
| Variable_name        | Value       |
+----------------------+-------------+
| wsrep_last_committed | 10032       |
...
| wsrep_cluster_status | non-Primary |
+----------------------+-------------+
node2> SHOW STATUS LIKE 'wsrep_%';
+----------------------+-------------+
| Variable_name        | Value       |
+----------------------+-------------+
| wsrep_last_committed | 10348       |
...
| wsrep_cluster_status | non-Primary |
+----------------------+-------------+
node3> SHOW STATUS LIKE 'wsrep_%';
+----------------------+-------------+
| Variable_name        | Value       |
+----------------------+-------------+
| wsrep_last_committed |   997       |
...
| wsrep_cluster_status | non-Primary |
+----------------------+-------------+

From the above outputs, node2 has the most up-to-date data. In this case, all Galera nodes are already started, so you don’t necessarily need to bootstrap the cluster again. We just need to promote node2 to be a Primary Component:

node2> SET GLOBAL wsrep_provider_options="pc.bootstrap=1";

The remaining nodes will then reconnect to the Primary Component (node2) and resync their data based on this node.


If you are using ClusterControl (try it for free), you can determine the wsrep_last_committed and wsrep_cluster_status directly from the ClusterControl > Overview page:

Or from ClusterControl > Performance > DB Status page:

Updated - Full Restore of a MySQL or MariaDB Galera Cluster from Backup


Performing regular backups of your database cluster is imperative for high availability and disaster recovery. If for any reason you lost your entire cluster and had to do a full restore from backup, you would need a reliable and up-to-date backup to start from.

Best Practices for Backups

Some recommendations to consider for a good scheduled backup regime:

  • You should be able to completely recover from a catastrophic failure from at least two previous full backups, just in case the most recent full backup is damaged, lost or corrupt,
  • Your backup should contain at least one full backup within a chosen cycle, normally weekly,
  • Store backups away from the current data location, preferably off site,
  • Use a mixture of mysqldump and Xtrabackup for extra safety, and do not rely on one method,
  • Test restore your backups on a regular basis, e.g. every two months.

A weekly full backup combined with daily incremental backup is normally enough. Keeping a number of backups for a period of time is always a good plan, maybe keep each weekly backup for one month. This allows you to recover an older database in case of emergencies or if for some reason you have local backup file corruption.

mysqldump or Xtrabackup

mysqldump is very likely the most popular way of backing up MySQL. It does a logical backup of the data, reading from each table using SQL statements then exporting the data into text files. Restoration of a mysqldump is as easy as creating the dump file. The main drawbacks are that it is very slow for large databases, it is not ‘hot’ and it wipes out the InnoDB buffer pool.

Xtrabackup performs hot backups, does not lock the database during the backup and is generally faster. Hot backups are important for high availability, as they run without blocking the application. This is also an important factor when used with Galera, as Galera relies on synchronous replication. However, restoring an Xtrabackup manually can be a little tricky.

ClusterControl supports the scheduling of both mysqldump and Xtrabackup (full and incremental), as well as the backup restoration right from the UI.

Full Restore from Backup

In this post, we will show you how to restore Xtrabackup (full + incremental) onto an empty cluster running on MariaDB Galera Cluster. These steps should also work on Percona XtraDB Cluster or Galera Cluster for MySQL from Codership.

In our original cluster, we had a full xtrabackup scheduled daily, with incremental backups created every hour. The backups are stored on ClusterControl as shown in the following screenshot:

Now, let’s assume we have lost our original cluster and have to do a full restore onto a new cluster. The steps include:

  1. Set up a new ClusterControl server.
  2. Set up a new MariaDB Cluster.
  3. Export the backup records and files to the new ClusterControl server.
  4. Start the restoration process.
  5. Start the remaining nodes.

The following diagram illustrates our architecture for this exercise:

Step 1 - Set up New MariaDB Cluster

Install ClusterControl and deploy a new MariaDB Cluster. Go to ClusterControl -> Deploy -> Deploy Database Cluster -> MySQL Galera and specify the required information in the deployment dialog:

Click on the Deploy button to start the deployment. Since we only had one cluster on the old server, the cluster ID should be identical (cluster ID: 1) in this new instance.


Step 2 - Export and import the backup files

Once the cluster is deployed, we will have to import the backups from the old ClusterControl server into the new one. First, export the content of cmon.backup_records to a dump file. Since the old cluster ID and the new one are identical, we just need to modify the dump file with the new IP address and import it on the new ClusterControl node. If the cluster ID is different, then you have to change the “cid” value accordingly inside the dump file before importing it into the CMON DB on the new node. Also, it is easier to keep the same backup storage location as on the old server, so the new ClusterControl can locate the backup files on the new server.

On the old ClusterControl server, export the backup_records table into a dump file:

$ mysqldump -uroot -p --single-transaction --no-create-info cmon backup_records > backup_records.sql

Then, perform remote copy of the backup files from the old server into the new ClusterControl server:

$ scp -r /root/backups 192.168.55.150:/root/
$ scp ~/backup_records.sql 192.168.55.150:~

Next is to modify the dump files to reflect the new ClusterControl server IP address. Don’t forget to escape the dot in the IP address:

$ sed -i "s/192\.168\.55\.170/192\.168\.55\.150/g" backup_records.sql

On the new ClusterControl server, import the dump files:

$ mysql -uroot -p cmon < backup_records.sql

Verify that the backup list is correct in the new ClusterControl server:

As you can see, all occurrences of the previous IP address (192.168.55.170) have been replaced by the new IP address (192.168.55.150). Now we are ready to perform the restoration on the new server.

Step 3 - Perform the Restoration

Performing restoration through the ClusterControl UI is a simple point-and-click step. Choose which backup to restore and click on the “Restore” button. We are going to restore the latest incremental backup available (Backup: 9). Click on the “Restore” button just below the backup name and you will be presented with the following pre-restoration dialog:

Looks like the backup size is pretty small (165.6 kB). It doesn’t really matter because ClusterControl will prepare all incremental backups grouped under Backup Set 6, which holds the full Xtrabackup. You also have several post-restoration options:

  • Restore backup on - Choose the node to restore the backup on.
  • Tmp Dir - This directory will be used on the local ClusterControl server as temporary storage during backup preparation. It must be as large as the estimated MySQL data directory.
  • Bootstrap cluster from the restored node - Since this is a new cluster, we are going to toggle this ON so ClusterControl will bootstrap the cluster automatically after the restoration succeeds.
  • Make a copy of the datadir before restoring the backup - If the restored data is corrupted or not what you expected it to be, you will have a backup of the previous MySQL data directory. Since this is a new cluster, we are going to ignore this one.

Percona Xtrabackup restoration will cause the cluster to be stopped. ClusterControl will:

  1. Stop all nodes in the cluster.
  2. Restore the backup on the selected node.
  3. Bootstrap the selected node.

To see the restoration progress, go to Activity -> Jobs -> Restore Backup and click on the “Full Job Details” button. You should see something like this:

One important thing that you need to do is to monitor the output of the MySQL error log on the target node (192.168.55.151) during the restoration process. After the restoration completes and during the bootstrapping process, you should see the following lines starting to appear:

Version: '10.1.22-MariaDB' socket: '/var/lib/mysql/mysql.sock' port: 3306 MariaDB Server
2017-04-07 18:03:51 140608191986432 [Warning] Access denied for user 'cmon'@'192.168.55.150' (using password: YES)
2017-04-07 18:03:51 140608191986432 [Warning] Access denied for user 'cmon'@'192.168.55.150' (using password: YES)
2017-04-07 18:03:51 140608191986432 [Warning] Access denied for user 'cmon'@'192.168.55.150' (using password: YES)
2017-04-07 18:03:52 140608191986432 [Warning] Access denied for user 'cmon'@'192.168.55.150' (using password: YES)
2017-04-07 18:03:53 140608191986432 [Warning] Access denied for user 'cmon'@'192.168.55.150' (using password: YES)
2017-04-07 18:03:54 140608191986432 [Warning] Access denied for user 'cmon'@'192.168.55.150' (using password: YES)
2017-04-07 18:03:55 140608191986432 [Warning] Access denied for user 'cmon'@'192.168.55.150' (using password: YES)

Don’t panic. This is expected behaviour, because this backup set doesn’t contain the cmon credentials of the new ClusterControl server; it has restored/replaced the old cmon user instead. What you need to do is re-grant the cmon user by running the following statement on this DB node:

GRANT ALL PRIVILEGES ON *.* to cmon@'192.168.55.150' IDENTIFIED BY 'mynewCMONpassw0rd' WITH GRANT OPTION;
FLUSH PRIVILEGES;

ClusterControl will then be able to connect to the bootstrapped node and determine the node and backup state. If everything is OK, you should see something like this:

At this point, the target node is bootstrapped and running. We can start the remaining nodes under Nodes -> choose node -> Start Node and check the “Perform an Initial Start” checkbox:

The restoration is now complete and you can expect Performance -> DB Growth to report the updated size of our newly restored data set:

Happy restoring!


ClusterControl Product Video for Galera Clusters


In this video, Art van Scheppingen shows you how easy it is to add your Galera setups to ClusterControl. ClusterControl provides you with a single console to deploy, manage, monitor, and scale your Galera setups and mixed environments. In addition to advanced monitoring, ClusterControl affords feature benefits like comprehensive security, automation, reporting, and point-and-click deployments.

Watch the video and learn more about the following topics:

  • Deploying a new Galera cluster in ClusterControl
  • Importing an existing Galera Cluster into ClusterControl
  • Navigating the Cluster Overview section in ClusterControl
  • Accessing your Galera monitoring dashboards in ClusterControl
  • Accessing your Galera node data tables in ClusterControl
  • Accessing server statistics in ClusterControl
  • Navigating the nodes section in ClusterControl
  • Enabling binary logging from your Galera nodes in ClusterControl
  • Using advisors in your Galera setup
  • Performing backups
  • Accessing log files
  • Adding new nodes to your Galera Clusters
  • Adding Asynchronous slaves
  • Cloning Clusters in ClusterControl
  • Load balancing with HAProxy, ProxySQL, MaxScale
  • Galera arbitrator in ClusterControl

How to Set Up Asynchronous Replication from Galera Cluster to Standalone MySQL server with GTID


Hybrid replication, i.e. combining Galera and asynchronous MySQL replication in the same setup, became much easier since GTID got introduced in MySQL 5.6. Although it was fairly straightforward to replicate from a standalone MySQL server to a Galera Cluster, doing it the other way round (Galera → standalone MySQL) was a bit more challenging. At least until the arrival of GTID.

There are a few good reasons to attach an asynchronous slave to a Galera Cluster. For one, long-running reporting/OLAP type queries on a Galera node might slow down an entire cluster, if the reporting load is so intensive that the node has to spend considerable effort coping with it. So reporting queries can be sent to a standalone server, effectively isolating Galera from the reporting load. In a belts and suspenders approach, an asynchronous slave can also serve as a remote live backup.

In this blog post, we will show you how to replicate a Galera Cluster to a MySQL server with GTID, and how to failover the replication in case the master node fails.

Hybrid Replication in MySQL 5.5

In MySQL 5.5, resuming a broken replication requires you to determine the last binary log file and position, which are distinct on all Galera nodes if binary logging is enabled. We can illustrate this situation with the following figure:

Galera cluster asynchronous slave topology without GTID

If the MySQL master fails, replication breaks and the slave will need to switch over to another master. You will need to pick a new Galera node, and manually determine the binary log file and position of the last transaction executed by the slave. Another option is to dump the data from the new master node, restore it on the slave and start replication with the new master node. These options are of course doable, but not very practical in production.
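For reference, a manual switchover without GTID would look roughly like the following sketch (the binary log file and position are placeholders you would have to determine yourself on the new master):

mysql> STOP SLAVE;
mysql> CHANGE MASTER TO MASTER_HOST = '192.168.0.202', MASTER_PORT = 3306, MASTER_USER = 'slave', MASTER_PASSWORD = 'slavepassword', MASTER_LOG_FILE = 'mysqld-bin.000008', MASTER_LOG_POS = 12345;
mysql> START SLAVE;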

How GTID Solves the Problem

GTID (Global Transaction Identifier) provides a better mapping of transactions across nodes, and is supported in MySQL 5.6. In Galera Cluster, all nodes will generate different binlog files. The binlog events are the same and in the same order, but the binlog file names and offsets may vary. With GTID, slaves can see a unique transaction coming in from several masters, and this can easily be mapped into the slave execution list if it needs to restart or resume replication.

Galera cluster asynchronous slave topology with GTID failover

All necessary information for synchronizing with the master is obtained directly from the replication stream. This means that when you are using GTIDs for replication, you do not need to include MASTER_LOG_FILE or MASTER_LOG_POS options in the CHANGE MASTER TO statement. Instead it is necessary only to enable the MASTER_AUTO_POSITION option. You can find more details about the GTID in the MySQL Documentation page.

Setting Up Hybrid Replication by hand

Make sure the Galera nodes (masters) and slave(s) are running on MySQL 5.6 before proceeding with this setup. We have a database called sbtest in Galera, which we will replicate to the slave node.

1. Enable required replication options by specifying the following lines inside each DB node’s my.cnf (including the slave node):

For master (Galera) nodes:

gtid_mode=ON
log_bin=binlog
log_slave_updates=1
enforce_gtid_consistency
expire_logs_days=7
server_id=1         # 1 for master1, 2 for master2, 3 for master3
binlog_format=ROW

For slave node:

gtid_mode=ON
log_bin=binlog
log_slave_updates=1
enforce_gtid_consistency
expire_logs_days=7
server_id=101         # 101 for slave
binlog_format=ROW
replicate_do_db=sbtest
slave_net_timeout=60

2. Perform a cluster rolling restart of the Galera Cluster (from ClusterControl UI > Manage > Upgrade > Rolling Restart). This will reload each node with the new configurations, and enable GTID. Restart the slave as well.

3. Create a slave replication user by running the following statement on one of the Galera nodes:

mysql> GRANT REPLICATION SLAVE ON *.* TO 'slave'@'%' IDENTIFIED BY 'slavepassword';

4. Log into the slave and dump database sbtest from one of the Galera nodes:

$ mysqldump -uroot -p -h192.168.0.201 --single-transaction --skip-add-locks --triggers --routines --events sbtest > sbtest.sql

5. Restore the dump file onto the slave server:

$ mysql -uroot -p < sbtest.sql

6. Start replication on the slave node:

mysql> STOP SLAVE;
mysql> CHANGE MASTER TO MASTER_HOST = '192.168.0.201', MASTER_PORT = 3306, MASTER_USER = 'slave', MASTER_PASSWORD = 'slavepassword', MASTER_AUTO_POSITION = 1;
mysql> START SLAVE;

To verify that replication is running correctly, examine the output of slave status:

mysql> SHOW SLAVE STATUS\G
       ...
       Slave_IO_Running: Yes
       Slave_SQL_Running: Yes
       ...

Setting up Hybrid Replication using ClusterControl

In the previous paragraph we described all the necessary steps to enable the binary logs, restart the cluster node by node, copy the data and then setup replication. The procedure is a tedious task and you can easily make errors in one of these steps. In ClusterControl we have automated all the necessary steps.

1. For ClusterControl users, you can go to the nodes in the Nodes page and enable binary logging.

Enable binary logging on Galera cluster using ClusterControl

This will open a dialogue that allows you to set the binary log expiration, enable GTID and auto restart.

Enable binary logging with GTID enabled

This initiates a job that will safely write these changes to the configuration, create replication users with the proper grants, and restart the node.


Repeat this process for each Galera node in the cluster, until all nodes indicate they are master.

All Galera Cluster nodes are now master

2. Add the asynchronous replication slave to the cluster

Adding an asynchronous replication slave to Galera Cluster using ClusterControl

And this is all you have to do. The entire process described in the previous paragraph has been automated by ClusterControl.

Changing Master

If the designated master goes down, the slave will retry the connection after slave_net_timeout seconds (60 seconds in our setup - the default is 1 hour). You should see the following error in the slave status:

       Last_IO_Errno: 2003
       Last_IO_Error: error reconnecting to master 'slave@192.168.0.201:3306' - retry-time: 60  retries: 1

Since we are using Galera with GTID enabled, master failover is supported via ClusterControl when Cluster and Node Auto Recovery has been enabled. Whether the master would fail due to network connectivity or any other reason, ClusterControl will automatically fail over to the most suitable other master node in the cluster.

If you wish to perform the failover manually, simply change the master node as follows:

mysql> STOP SLAVE;
mysql> CHANGE MASTER TO MASTER_HOST = '192.168.0.202', MASTER_PORT = 3306, MASTER_USER = 'slave', MASTER_PASSWORD = 'slavepassword', MASTER_AUTO_POSITION = 1;
mysql> START SLAVE;

In some cases, you might encounter a “Duplicate entry .. for key” error after the master node changed:

       Last_Errno: 1062
       Last_Error: Could not execute Write_rows event on table sbtest.sbtest; Duplicate entry '1089775' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log mysqld-bin.000009, end_log_pos 85789000

Without GTID, you could just use SET GLOBAL SQL_SLAVE_SKIP_COUNTER = n to skip statements, but that does not work when GTID is enabled. Miguel from Percona wrote a great blog post on how to repair this by injecting empty transactions.
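The idea, roughly, is to commit an empty transaction under the GTID of the failing event so that the slave considers it already applied. A sketch (the GTID value is a placeholder for the failing transaction’s GTID, which you can derive from SHOW SLAVE STATUS):

mysql> STOP SLAVE;
mysql> SET GTID_NEXT='aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:123456'; -- placeholder GTID
mysql> BEGIN; COMMIT;
mysql> SET GTID_NEXT='AUTOMATIC';
mysql> START SLAVE;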

Another approach, for smaller databases, could also be to just get a fresh dump from any of the available Galera nodes, restore it and use RESET MASTER statement:

mysql> STOP SLAVE;
mysql> RESET MASTER;
mysql> DROP SCHEMA sbtest; CREATE SCHEMA sbtest; USE sbtest;
mysql> SOURCE /root/sbtest_from_galera2.sql; -- repeat step #4 above to get this dump
mysql> CHANGE MASTER TO MASTER_HOST = '192.168.0.202', MASTER_PORT = 3306, MASTER_USER = 'slave', MASTER_PASSWORD = 'slavepassword', MASTER_AUTO_POSITION = 1;
mysql> START SLAVE;

You may also use pt-table-checksum to verify the replication integrity, more information in this blog post.

Note: Since in MySQL replication the slave applier is by default still single-threaded, do not expect the async replication performance to be the same as Galera’s parallel replication. For MySQL 5.6 and 5.7 there are options to make the asynchronous replication execute in parallel on the slave nodes, but in principle this replication still depends on the correct ordering of transactions inside the same schema. If the replication load is intensive and continuous, the slave lag will just keep growing. We have seen cases where the slave could never catch up with the master.
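For reference, a multi-threaded applier can be enabled on the slave via my.cnf along these lines (a sketch assuming MySQL 5.7; the number of workers is an illustrative value):

slave_parallel_type=LOGICAL_CLOCK
slave_parallel_workers=4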

How to Deploy Asynchronous Replication Slave to MariaDB Galera Cluster 10.x using ClusterControl


Combining Galera and asynchronous replication in the same MariaDB setup, aka Hybrid Replication, can be useful - e.g. as a live backup node in a remote datacenter or reporting/analytics server. We already blogged about this setup for Codership/Galera or Percona XtraDB Cluster users, but a master failover as described in that post does not work for MariaDB because of its different GTID approach. In this post, we will show you how to deploy an asynchronous replication slave to MariaDB Galera Cluster 10.x (with master failover!), using GTID with ClusterControl.

Preparing the Master

First and foremost, you must ensure that the master and slave nodes are running MariaDB Galera 10.0.2 or later. A MariaDB replication slave requires at least one master with GTID among the Galera nodes. However, we would recommend configuring all the MariaDB Galera nodes as masters. GTID, which is automatically enabled in MariaDB, will be used to do master failover. The following must be true for the masters:

  • At least one master among the Galera nodes
  • All masters must be configured with the same domain ID
  • log_slave_updates must be enabled
  • All masters’ MariaDB port is accessible by ClusterControl and slaves
  • Must be running MariaDB version 10.0.2 or later

From ClusterControl this is easily done by selecting Enable Binary Logging in the drop down for each node.

Enabling binary logging through ClusterControl

And then enable GTID in the dialogue:

Once Proceed has been clicked, a job will automatically configure the Galera node according to the settings described earlier.

If you wish to perform this action by hand, you can configure a Galera node as a master by changing the MariaDB configuration file for that node as per below:

gtid_domain_id=<must be same across all mariadb servers participating in replication>
server_id=<must be unique>
binlog_format=ROW
log_slave_updates=1
log_bin=binlog

After making these changes, restart the nodes one by one or use a rolling restart (ClusterControl > Manage > Upgrades > Rolling Restart).


Preparing the Slave

For the slave, you would need a separate host or VM, with or without MariaDB installed. If you do not have MariaDB installed, you need to perform the following tasks: configure the root password (based on monitored_mysql_root_password), create the slave user (based on repl_user, repl_password), configure MariaDB, start the server and finally start replication.
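For reference, the replication part of the manual route on the slave would look roughly like the sketch below, using MariaDB’s GTID syntax (the master host and credentials are placeholders):

mysql> CHANGE MASTER TO MASTER_HOST='192.168.55.111', MASTER_PORT=3306, MASTER_USER='repl_user', MASTER_PASSWORD='repl_password', MASTER_USE_GTID=slave_pos;
mysql> START SLAVE;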

When adding the slave using ClusterControl, all of these steps are automated in the Add Replication Slave job, as described below.

Add replication slave to MariaDB Cluster

After adding our slave node, our deployment will look like this:

MariaDB Galera asynchronous slave topology

Master Failover and Recovery

Since we are using MariaDB with GTID enabled, master failover is supported via ClusterControl when Cluster and Node Auto Recovery has been enabled. Whether the master would fail due to network connectivity or any other reason, ClusterControl will automatically fail over to the most suitable other master node in the cluster.

Automatic slave failover to another master in Galera cluster

This way ClusterControl will add a robust asynchronous slave capability to your MariaDB Cluster!

ClusterControl for Galera Cluster for MySQL


ClusterControl allows you to easily manage your database infrastructure on premise or in the cloud. With in-depth support for technologies like Galera Cluster for MySQL and MariaDB setups, you can truly automate mixed environments for next-level applications.

Since the launch of ClusterControl in 2012, we’ve experienced growth in new industries with customers who are benefiting from the advancements ClusterControl has to offer - in particular when it comes to Galera Cluster for MySQL.

In addition to reaching new highs in ClusterControl demand, this past year we’ve doubled the size of our team allowing us to continue to provide even more improvements to ClusterControl.

Take a look at this infographic for our top Galera Cluster for MySQL resources and information about how ClusterControl works with Galera Cluster.

Press Release: Severalnines kicks off online European football streaming


Award-winning database management platform scores deal with continent’s largest online video solutions provider

Stockholm, Sweden and anywhere else in the world - 28/06/2016 - Severalnines, Europe’s leading database performance management provider, today announced its latest customer, StreamAMG (Advanced Media Group), a UK-based pioneer in the field of bespoke online video streaming and content management. StreamAMG is Europe’s largest player in online video solutions, helping football teams such as Liverpool FC, Aston Villa, Sunderland AFC and the BBC keep fans watching from across the world.

Long hailed as the future of online content, analysts predict that 90% of all consumer internet traffic will be video by 2019. This poses a challenge to streaming providers, both in terms of the amount of online video data to handle and the variety of ways the content is consumed. Customers expect a seamless viewing experience across any device on any operating system. Downtime, lag or disturbances to streaming can have serious repercussions for customer loyalty. Streaming providers must deliver a secure and reliable media platform to maintain the interest of fans and attract new viewers, casting database performance in a starring role.

Founded in 2001, StreamAMG builds bespoke solutions for its customers to host and manage online video content. Its software delivers the high-availability needed for on-demand streaming or live broadcasting on any device. Loss of customer trust and damage to brand reputation are likely consequences of database failures, especially for those companies which operate in the online sports, betting and gaming industries.

Growing at 30% year on year required StreamAMG to have a scalable IT system to meet new customer demands and to maintain its leadership position in the market. StreamAMG reviewed its database performance as part of an IT infrastructure renewal project to encompass new online channels, such as social media, and to embed marketing analytics that help its customers better understand and react to customer behaviour. It needed a solution to monitor and optimise its database management system and provide the detailed metrics needed to predict database failures.

After reviewing options provided by Oracle and AWS, amongst others, StreamAMG chose Severalnines to help future-proof its databases. The previous environment, based on a master-slave replication topology, was replaced with a multi-master Galera Cluster; and Severalnines’ ClusterControl platform was applied to automate operational tasks and provide visibility of uptime and performance through monitoring capabilities.

Thom Holliday, Marketing Manager StreamAMG, said: “With ClusterControl in place, StreamAMG’s flagship product is now backed with a fully automated database infrastructure which allows us to ensure excellent uptime. Severalnines increased our streaming speed by 76% and this has greatly improved the delivery of content to our customers. The implementation took only two months to complete and saved us 12% in costs. Expanding the current use of ClusterControl is definitely in the pipeline and we would love to work with Severalnines to develop new features.”

Vinay Joosery, Severalnines Founder and CEO, said: “Online video streaming is growing exponentially, and audiences expect quality, relevant content and viewing experiences tailor-made for each digital platform. I’m a big football fan myself and like to stay up to date with games whenever I can. Right now I’m following the European Championships and online streaming is key so I can watch the matches wherever I am. New types of viewerships place certain requirements on modern streaming platforms to create experiences that align with consumer expectations. StreamAMG is leading the way there, and helps its customers monetise online channels through a solidly architected video platform. We’re happy to be part of this.”

About Severalnines

Severalnines provides automation and management software for database clusters. We help companies deploy their databases in any environment, and manage all operational aspects to achieve high-scale availability.

Severalnines' products are used by developers and administrators of all skill levels to provide the full 'deploy, manage, monitor, scale' database cycle, thus freeing them from the complexity and learning curves that are typically associated with highly available database clusters. The company has enabled over 8,000 deployments to date via its popular online database configurator, and currently counts BT, Orange, Cisco, CNRS, Technicolour, AVG, Ping Identity and Paytrail as customers. Severalnines is a private company headquartered in Stockholm, Sweden, with offices in Singapore and Tokyo, Japan. To see who is using Severalnines today visit: http://www.severalnines.com/customers

About StreamAMG

StreamAMG helps businesses manage their online video solutions, such as hosting video, integrating platforms, monetizing content and delivering live events. Since 2001, it has enabled clients across Europe to communicate through webcasting by building online video solutions to meet their goals.

For more information visit: https://www.streamamg.com

Media Contact

Positive Marketing
Steven de Waal / Camilla Nilsson
severalnines@positivemarketing.com
0203 637 0647/0645
