
High Availability Galera Clusters for Indonesia’s leading eCommerce Payment Gateway


Indonesia, the fourth most populous country in the world, is emerging as an economic powerhouse in Asia Pacific. With 60% of its population of 250 million under 30 years of age, growing internet penetration and greater spending power among the middle class have catalyzed the eCommerce sector. The vast majority of Indonesians don’t own credit cards, so local eCommerce players have developed innovative payment solutions.


Veritrans was established in October 2012, and in under 3 years, has grown into one of the largest payment gateways in the country. The company processes payments (credit/debit card, internet and mobile banking) for over 1000 merchants, including businesses such as Garuda Indonesia (the national airline of Indonesia), Groupon Indonesia (Disdus) and Rakuten Indonesia. The company has a staff of 85 employees. Competitors include the likes of Doku, iPayMu, Pay88 and a dozen others. Doku alone processed $520 million worth of transactions in 2014, so there’s some serious money flowing through these gateways. 

A Payment Gateway is a secure online link between a merchant and an acquiring bank. It protects payment details (e.g. credit card details) by encrypting the information and passing it securely between the customer and the merchant, and between the merchant and the payment processor. 

All merchants accepting credit card payments online are required to comply with PCI DSS. Veritrans is a Level 1 service provider, which is the highest level of PCI DSS compliance. 


“Our database is mission critical for our business, it is where we store masked and encrypted credit card data, transaction data, merchant and end customer information. With the amount of transactions flowing through our systems, we cannot afford any downtime, performance problems or security glitches. Our databases are fully replicated to a separate DR site. Having a management tool like ClusterControl has helped us achieve our goals,” says Idris Khanafi, Head of Infrastructure at Veritrans.

Closely monitoring and managing the infrastructure is an integral part of Veritrans’ production deployments, and ClusterControl offers a rich set of features to help. ClusterControl collects detailed statistics on system, network, storage and database operations, and helps visualize the data so the ops team can take quick and effective action. It also handles everything from anomaly detection, remediation of failures, rolling restarts and a mix of full and incremental backups to longer-term capacity planning. The ops team is now able to manage the entire infrastructure from one tool, instead of having to develop different scripts to manage their databases.

Paytrail, the leading ePayment provider in Finland, also uses ClusterControl to manage MariaDB Cluster across multiple datacenters.



High Availability MySQL on cPanel with Galera Cluster


cPanel is a very popular Linux hosting control panel, used not only by hosting providers but also by enterprises and government bodies. For large-scale environments hosting mission-critical databases, how do you integrate cPanel with a high availability database?

In this blog post, we will show you how to integrate Percona XtraDB Cluster into cPanel to provide a high availability database hosting solution. By default, cPanel uses localhost to host the MySQL databases created by cPanel users. It is not uncommon to find MySQL to be the culprit when it comes to overloading the server. By having a remote MySQL server or cluster, we can offload some of the burden and increase the availability of other important services on the server.

Setting up a Galera cluster to integrate with cPanel requires you to meet the following requirements:

  • skip-name-resolve must be turned OFF, as some cPanel services authenticate through hostname. Setting up correct host definitions in /etc/hosts is vital.
  • MySQL tables should only use InnoDB. Some application installers like cpAddons (Site Software) and Softaculous create MyISAM tables in MySQL. These features should be used with extra caution.
  • The network infrastructure must support multicast or unicast to allow a floating virtual IP address. This provides a single access point that can fail over between load balancers.
  • Before you set up a remote MySQL server, ensure that the remote server is able to resolve your local server's hostname to its IP address. To confirm this, log into the remote server via SSH and use the host command.
  • MySQL and Perl must already be installed on the remote server.
  • You must be able to connect via SSH from this server’s IP address to the remote server.
  • You must ensure the MySQL users create databases with the following criteria:
    • Storage engine must be InnoDB only. Otherwise, the cluster might crash.
    • All tables must have a primary key defined.
    • Comply with Galera cluster limitations as described here.
  • It’s preferable to let load balancers redirect queries to a single node, to reduce the chance of deadlocks and make it more predictable.

Take note that if you can’t meet these requirements, this would be a risky setup.
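
For example, assuming the hostnames used later in this post, you can verify hostname resolution from a remote server and confirm that skip-name-resolve is disabled on a DB node:

$ host cpanel.domain.local
$ mysql -uroot -p -e "SHOW GLOBAL VARIABLES LIKE 'skip_name_resolve'"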

Our architecture is illustrated in the following diagram:

We assume you already have WHM installed. We will use a total of 4 nodes, as per below:

  • WHM - cpanel.domain.local - 192.168.1.200
  • MySQL Galera Cluster #1 + HAproxy #1 + Keepalived - g1.domain.local - 192.168.1.211
  • MySQL Galera Cluster #2 + HAproxy #2 + Keepalived - g2.domain.local - 192.168.1.212
  • ClusterControl + Garbd - cc.domain.local -192.168.1.215

WHM will then connect through a virtual IP address floating between the two database nodes (192.168.1.211 and 192.168.1.212). This provides failover in case one of the DB nodes goes down. Note that this setup tolerates only one node being down at any given time.

Prerequisites

1. Ensure all hosts have the following host definition inside /etc/hosts:

192.168.1.200     cpanel.domain.local cpanel
192.168.1.210     mysql.domain.local  mysql
192.168.1.211     mysql1.domain.local mysql1    lb1
192.168.1.212     mysql2.domain.local mysql2    lb2
192.168.1.215     cc.domain.local     cc        garbd

2. Ensure each host has a proper FQDN set up as per the host definitions above. For example, on the cpanel server:

$ hostname -f
cpanel.domain.local
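
If the FQDN is not correct, one way to fix it on RHEL/CentOS 6 (the OS used in this example) is to set it for the running system and persist it across reboots; a minimal sketch for the cPanel host:

$ hostname cpanel.domain.local
$ sed -i 's/^HOSTNAME=.*/HOSTNAME=cpanel.domain.local/' /etc/sysconfig/network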

Deploying Galera Cluster for MySQL

1. To set up Galera Cluster, go to the Galera Configurator to generate a deployment package. In the wizard, we used the following values when configuring our database cluster (note that we specified one of the DB nodes twice under Database Servers’ text field) :

Vendor                 : Percona XtraDB Cluster
Infrastructure         : on-premise
Operating System       : RHEL6/CentOS6
Skip DNS Resolve       : No
Number of Galera Servers : 3 + 1
Max connections        : 200 
OS user                : root 
ClusterControl Server  : 192.168.1.215
Galera Servers         : 192.168.1.211 192.168.1.212 192.168.1.212

Follow the wizard; a deployment package will be generated and emailed to you.

2. Download and extract the deployment package:

$ wget http://www.severalnines.com/galera-configurator3/tmp/wb06357009915302877/s9s-galera-percona-3.5.0.tar.gz
$ tar -xzf s9s-galera-percona-3.5.0.tar.gz

3. Before we proceed with the deployment, we need to perform some customization when deploying a two-node Galera cluster. Go to ~/s9s-galera-percona-3.5.0/mysql/config/cmon.cnf.controller and remove the repeated node IP address next to mysql_server_addresses so it becomes as below:

mysql_server_addresses=192.168.1.211,192.168.1.212

4. Now we are ready to start the deployment:

$ cd ~/s9s-galera-percona-3.5.0/mysql/scripts/install/ 
$ bash ./deploy.sh 2>&1 | tee cc.log

5. The database cluster deployment will take about 15 minutes, and once completed, the ClusterControl UI is accessible at https://192.168.1.215/clustercontrol. Enter the default admin email address and password on the welcome page and you should be redirected to the ClusterControl UI dashboard.

6. It is recommended to run Galera on at least three nodes. So, install garbd, a lightweight arbitrator daemon for Galera on the ClusterControl node from the ClusterControl UI. Go to Manage > Load Balancer > Install Garbd > choose the ClusterControl node IP address from the dropdown > Install Garbd.

You will now see Galera Cluster with garbd installed as per below:

Deploying HAProxy and Keepalived

1. Deploy HAProxy on mysql1, 192.168.1.211. Go to ClusterControl > Manage > Load Balancer > Install HAProxy and include both MySQL nodes into the load balancing set with Active/Backup role.

2. Deploy another HAProxy on mysql2, 192.168.1.212 with similar setup:

3. Once deployed, you should see the following statistics page if you go to ClusterControl > Nodes > select the HAProxy node:

The green line indicates the MySQL server is up and active. All database requests coming through this HAProxy instance on port 3307 will be forwarded to mysql1 (192.168.1.211), unless it is down, in which case mysql2 (192.168.1.212) will take over the active role. This means we are not implementing multi-master writes in this setup.

4. We can now deploy Keepalived, which requires 2 HAProxy instances. Go to ClusterControl > Manage > Load Balancer > Install Keepalived and specify the virtual IP address as per below:

Now we are ready to integrate Galera Cluster into WHM/cPanel under Setup Remote MySQL Server feature. We will use the virtual IP address specified above as the remote MySQL endpoint for WHM/cPanel.
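
You can confirm which load balancer currently holds the virtual IP by checking the network interfaces on both HAProxy nodes; assuming the virtual IP is 192.168.1.210 (mysql.domain.local), as per the host definitions above:

$ ip addr show | grep 192.168.1.210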

Integrating Galera Cluster with cPanel

1. cPanel requires a MySQL options file for user root located at /root/.my.cnf which defines the credentials of the remote MySQL server that we are going to use. On all nodes (Galera and cPanel nodes), create a file /root/.my.cnf and add the following lines:

[client]
user=root
password=[mysql root password]
host=192.168.1.210
port=3307

**Replace [mysql root password] with the respective value.
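
Since this file holds the MySQL root credentials in plain text, it is a good idea to restrict its permissions:

$ chmod 600 /root/.my.cnf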

Verify that you can log in to the MySQL server just by typing the ‘mysql’ command as user root:

root@cpanel [~]$ mysql
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 372
Server version: 5.6.24-72.2-56 Percona XtraDB Cluster (GPL), Release rel72.2, Revision 1, WSREP version 25.11, wsrep_25.11

Copyright (c) 2000, 2015, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql>

2. From WHM, go to WHM > SQL Services > Setup Remote MySQL Server and specify the virtual IP address of your MySQL Galera Cluster:

3. Once added, verify that WHM/cPanel connects to the correct database server by going to WHM > SQL Services > phpMyAdmin and looking at the information under the Database Server section. You should see something similar to the screenshot below:

4. Now it’s safe to shutdown the local MySQL. Go to WHM > Service Configuration > Service Manager and uncheck the ‘Enabled’ and ‘Monitor’ tickboxes for MySQL Server. From this point, MySQL databases will be hosted on a high availability Galera Cluster with auto failover.

From cPanel, when you are trying to create a new database, it will display the MySQL server address under “Remote MySQL Host” section:

So, make sure you specify the highlighted host when connecting to the MySQL server from the application/client side. The setup is now complete.

Making cPanel fit into Galera

As a hoster, it is likely you will enable cPAddons or a third-party application installer like Fantastico or Softaculous to automate the installation of web applications from cPanel. Some of these applications still explicitly define MyISAM as the storage engine in their SQL files. MyISAM support in Galera is experimental, so we do not recommend running MyISAM tables in a production Galera Cluster.

In order to avoid this, we have to find and replace SQL files which contain MyISAM in the ENGINE statement. Run the following commands to achieve this for the cPaddons installer:

$ cd /usr/local/cpanel/cpaddons
$ sed -i 's|ENGINE=MyISAM|ENGINE=InnoDB|g' $(grep -lir "engine=MyISAM" *)
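
To confirm that no MyISAM references remain (for example, after a cPanel or installer upgrade), you can re-run the search; no output means there is nothing left to convert:

$ grep -lir "engine=MyISAM" /usr/local/cpanel/cpaddons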

You can use the same command for other application installers. If you upgrade cPanel or the installers, please make sure the engine has not been reverted back to MyISAM. That’s all folks!


s9s Tools and Resources: 'Become a MySQL DBA' series, ClusterControl 1.2.11 release, and more!


Check Out Our Latest Technical Resources for MySQL, MariaDB, PostgreSQL and MongoDB

This blog is packed with all the latest resources and tools we’ve recently published! Please do check it out and let us know if you have any comments or feedback.

Product Announcements & Resources


ClusterControl 1.2.11 Release
Last month, we were pleased to announce our best release yet for Postgres users as well as to introduce key new features for our MySQL/MariaDB users, such as support for MaxScale, an open-source, database-centric proxy that works with MySQL, MariaDB and Percona Server.

View all the related resources here:

Support for MariaDB’s MaxScale
You can get started straight away with MaxScale by deploying and managing it with ClusterControl 1.2.11. This How-To blog shows you how, step-by-step.


Customer Case Studies

From small businesses to Fortune 500 companies, customers have chosen Severalnines to deploy and manage MySQL, MongoDB and PostgreSQL.  

View our Customer page to discover companies like yours who have found success with ClusterControl.


Technical Webinar - Replays

As you know, we run a monthly technical webinar cycle; these are the latest replays, which you can watch at your own leisure, all part of our ‘Become a MySQL DBA’ series:

View all our replays here

ClusterControl Blogs

We’ve started a new series of blogs focusing on how to use ClusterControl. Do check them out!

View all ClusterControl blogs here.


The MySQL DBA Blog Series

We’re on the 15th installment of our popular ‘Become a MySQL DBA’ series and you can view all of these blogs here. Here are the latest ones in the series:

View all the ‘Become a MySQL DBA’ blogs here

We trust these resources are useful. If you have any questions on them or on related topics, please do contact us!


ClusterControl Tips & Tricks: wtmp Log Rotation Settings for Sudo User


Requires ClusterControl. Applies to all supported database clusters. Applies to all supported operating systems (RHEL/CentOS/Debian/Ubuntu).

ClusterControl requires a super-privileged SSH user to provision database nodes. If you are running as a non-root user, the corresponding user must be able to execute sudo commands, with or without a sudo password. Unfortunately, this can create another issue: performing remote commands with “sudo” requires an interactive session (tty). We will explain this in detail in the next sections.

What’s up with sudo?

By default, most of the RHEL flavors have the following configured under /etc/sudoers:

Defaults requiretty

When an interactive session (tty) is required, each time the sudo user SSHes into the box with the -t flag (force pseudo-tty allocation), entries are created in /var/log/wtmp for the creation and destruction of terminals, or the assignment and release of terminals. These logs only record interactive sessions. If you did not specify -t, you would see the following error:

sudo: sorry, you must have a tty to run sudo

The root user does not require an interactive session when running remote SSH commands, so its entries appear only in /var/log/secure or /var/log/auth.log, depending on the system configuration. Different distributions have different defaults in this regard. SSH does not write to wtmp for non-interactive sessions.

To check the content of wtmp, we use the following command:

$ last -f /var/log/wtmp
ec2-user pts/0        ip-10-0-0-79.ap- Wed Oct 28 11:16 - 11:16  (00:00)
ec2-user pts/0        ip-10-0-0-79.ap- Wed Oct 28 11:16 - 11:16  (00:00)
ec2-user pts/0        ip-10-0-0-79.ap- Wed Oct 28 11:16 - 11:16  (00:00)
...

On Debian/Ubuntu systems, the sudo user does not need to acquire a tty, since “requiretty” is not configured by default. However, ClusterControl appends the -t flag if it detects the SSH user as a non-root user. Since ClusterControl performs all its monitoring and management tasks as this user, you may notice that /var/log/wtmp grows rapidly, as shown in the following section.

Log rotation for wtmp

Example: Take note of the following default configuration of wtmp in RHEL 7.1 inside /etc/logrotate.conf:

/var/log/wtmp {
    monthly
    create 0664 root utmp
    minsize 1M
    rotate 1
}

By running the following commands on one of the database nodes managed by ClusterControl, we can see how fast /var/log/wtmp grows every minute:

[user@server ~]$ a=$(du -b /var/log/wtmp | cut -f1) && sleep 60 && b=$(du -b /var/log/wtmp | cut -f1) && c=$(expr $b - $a ) && echo $c
89088

From the above result, ClusterControl causes the log file to grow by about 89 KB per minute, which equals roughly 128 MB per day. If the above logrotate configuration is used (monthly rotation), /var/log/wtmp alone may consume up to 3.97 GB of disk space! If the partition where this file resides (usually the “/” partition) is small, which is common especially on cloud instances, there is a real risk of filling up the disk space on that partition in less than a month.

Workaround

The workaround is to adjust the log rotation settings for wtmp. This is applicable to all operating systems mentioned at the beginning of this post. If you are affected by this, change the log rotation behaviour so the file does not grow more than expected. The following is what we recommend:

/var/log/wtmp {
     size 100M
     create 0664 root utmp
     rotate 3
     compress
}

The above settings specify that the maximum size of wtmp should be 100 MB, and that we should keep the 3 most recent (compressed) files and remove older ones.

Logrotate is run from cron (via /etc/cron.daily/logrotate). It is not a daemon, so there is no need to reload its configuration; the next time cron executes logrotate, it will pick up the new configuration automatically.
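
To confirm the new settings behave as expected without waiting for the daily cron job, you can do a dry run or force an immediate rotation (assuming the settings live in /etc/logrotate.conf):

$ logrotate -d /etc/logrotate.conf    # dry run, shows what would be rotated
$ logrotate -f /etc/logrotate.conf    # force a rotation now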

Happy Clustering!

PS.: To get started with ClusterControl, click here!


Webinar: Become a MySQL DBA - Performing live database upgrades in Replication and Galera setups - Tuesday November 24th


Join us on November 24th for this new webinar on performing live database upgrades for MySQL Replication & Galera led by Krzysztof Książek, Senior Support Engineer at Severalnines. This is part of our ongoing ‘Become a MySQL DBA’ series. 

Database vendors typically release patches with bug and security fixes on a monthly basis, so why should we care? After all, upgrading a system is extra work and might incur downtime. Unfortunately, the news is full of reports of security breaches and hacked systems, so if security matters to you, you will want the most current security fixes on your systems. Major versions are rarer, and usually harder (and riskier) to upgrade to. But they might bring along important features that make the upgrade worth the effort.

In this webinar we will cover one of the most basic, but essential tasks of the DBA: minor and major database upgrades in production environments.

DATE, TIME & REGISTRATION

Europe/MEA/APAC
Tuesday, November 24th at 09:00 GMT / 10:00 CET (Germany, France, Sweden)
Register Now

North America/LatAm
Tuesday, November 24th at 09:00 Pacific Time (US) / 12:00 Eastern Time (US)
Register Now

AGENDA

  • What types of upgrades are there? 
  • How do I best prepare for the upgrades?
  • Best practices for: 
    • Minor version upgrades - MySQL & Galera
    • Major version upgrades - MySQL & Galera

SPEAKER

Krzysztof Książek, Senior Support Engineer at Severalnines, is a MySQL DBA with experience managing complex database environments for companies like Zendesk, Chegg, Pinterest and Flipboard. This webinar builds upon recent blog posts and related webinar series by Krzysztof on how to become a MySQL DBA.

To view all the blogs of the ‘Become a MySQL DBA’ series visit: http://www.severalnines.com/blog-categories/db-ops



Become a MySQL DBA blog series - Galera Cluster diagnostic logs


In a previous post, we discussed the error log as a source of diagnostic information for MySQL. We’ve shown examples of problems where the error log can help you diagnose issues with MySQL. This post focuses on the Galera Cluster error logs, so if you are currently using Galera, read on.

In a Galera cluster, whether you use MariaDB Cluster, Percona XtraDB Cluster or the Codership build, the MySQL error log is even more important. It gives you the same information as MySQL would, but you’ll also find in it information about Galera internals and replication. This data is crucial to understand the state of your cluster and to identify any issues which may impact the cluster’s ability to operate. In this post, we’ll try to make the Galera error log easier to understand.

This is the sixteenth installment in the ‘Become a MySQL DBA’ blog series. Our previous posts in the DBA series include:

Clean Galera startup

Let’s start by looking at what a clean startup of a Galera node looks like.

151028 14:28:50 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
151028 14:28:50 mysqld_safe Skipping wsrep-recover for 98ed75de-7c05-11e5-9743-de4abc22bd11:11152 pair
151028 14:28:50 mysqld_safe Assigning 98ed75de-7c05-11e5-9743-de4abc22bd11:11152 to wsrep_start_position

What we can see here is Galera trying to determine the sequence number of the last writeset this node applied. It is found in the grastate.dat file. In this particular case, the shutdown was clean, so all the needed data was found in that file, making the wsrep-recovery process (i.e., the process of calculating the last writeset based on the data applied to the database) unnecessary. Galera sets wsrep_start_position to the last applied writeset.
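
For reference, after a clean shutdown the grastate.dat file (in the MySQL data directory) looks roughly like the example below; the uuid and seqno are taken from the log above, and the exact layout may vary between Galera versions:

$ cat /var/lib/mysql/grastate.dat
# GALERA saved state
version: 2.1
uuid:    98ed75de-7c05-11e5-9743-de4abc22bd11
seqno:   11152
cert_index: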

2015-10-28 14:28:50 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2015-10-28 14:28:50 0 [Note] /usr/sbin/mysqld (mysqld 5.6.26-74.0-56) starting as process 553 ...
2015-10-28 14:28:50 553 [Note] WSREP: Read nil XID from storage engines, skipping position init
2015-10-28 14:28:50 553 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/libgalera_smm.so'
2015-10-28 14:28:50 553 [Note] WSREP: wsrep_load(): Galera 3.12.2(rf3e626d) by Codership Oy <info@codership.com> loaded successfully.
2015-10-28 14:28:50 553 [Note] WSREP: CRC-32C: using hardware acceleration.
2015-10-28 14:28:50 553 [Note] WSREP: Found saved state: 98ed75de-7c05-11e5-9743-de4abc22bd11:11152

Next we can see the WSREP library being loaded and, again, information about last state of the cluster before shutdown.

2015-10-28 14:28:50 553 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 172.30.4.156; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_count = 0; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum =

This line covers all configuration related to Galera.

2015-10-28 14:28:50 553 [Note] WSREP: Service thread queue flushed.
2015-10-28 14:28:50 553 [Note] WSREP: Assign initial position for certification: 11152, protocol version: -1
2015-10-28 14:28:50 553 [Note] WSREP: wsrep_sst_grab()
2015-10-28 14:28:50 553 [Note] WSREP: Start replication
2015-10-28 14:28:50 553 [Note] WSREP: Setting initial position to 98ed75de-7c05-11e5-9743-de4abc22bd11:11152
2015-10-28 14:28:50 553 [Note] WSREP: protonet asio version 0
2015-10-28 14:28:50 553 [Note] WSREP: Using CRC-32C for message checksums.
2015-10-28 14:28:50 553 [Note] WSREP: backend: asio

The initialization process continues; WSREP replication has been started.

2015-10-28 14:28:50 553 [Warning] WSREP: access file(/var/lib/mysql//gvwstate.dat) failed(No such file or directory)
2015-10-28 14:28:50 553 [Note] WSREP: restore pc from disk failed

Those two lines above require a detailed comment. The file which is missing here, gvwstate.dat, contains information about the cluster state as a given node sees it. You can find in it the UUID of the node and of the other members of the cluster which are in the PRIMARY state (i.e., which are part of the cluster). In case of an unexpected crash, this file allows the node to understand the state of the cluster before it crashed. In case of a clean shutdown, the gvwstate.dat file is removed. That is why it cannot be found in our example.

2015-10-28 14:28:50 553 [Note] WSREP: GMCast version 0
2015-10-28 14:28:50 553 [Note] WSREP: (3506d410, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2015-10-28 14:28:50 553 [Note] WSREP: (3506d410, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2015-10-28 14:28:50 553 [Note] WSREP: EVS version 0
2015-10-28 14:28:50 553 [Note] WSREP: gcomm: connecting to group 'my_wsrep_cluster', peer '172.30.4.156:4567,172.30.4.191:4567,172.30.4.220:4567'

In this section, you will see the initial communication within the cluster - the node is trying to connect to the other members listed in wsrep_cluster_address variable. For Galera it’s enough to get in touch with one active member of the cluster - even if node “A” has contact only with node “B” but not “C”, “D” and “E”, node “B” will work as a relay and pass through the needed communication.

2015-10-28 14:28:50 553 [Warning] WSREP: (3506d410, 'tcp://0.0.0.0:4567') address 'tcp://172.30.4.156:4567' points to own listening address, blacklisting
2015-10-28 14:28:50 553 [Note] WSREP: (3506d410, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers:
2015-10-28 14:28:50 553 [Note] WSREP: declaring 6221477d at tcp://172.30.4.191:4567 stable
2015-10-28 14:28:50 553 [Note] WSREP: declaring 78b22e70 at tcp://172.30.4.220:4567 stable

Here we can see the results of the communication exchange. At first, the node blacklists its own IP address. Then, as a result of a network exchange, the remaining two nodes of the cluster are found and declared ‘stable’. Please note the IDs used here - those node IDs are also used in other places, but here you can easily link them with the relevant IP addresses. A node ID can also be located in the gvwstate.dat file (it’s the first segment of ‘my_uuid’) or by running “SHOW GLOBAL STATUS LIKE ‘wsrep_gcomm_uuid’”. Please keep in mind that the UUID may change, for example, after a crash.
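
For example, on a running node you could map the ID to a host with either of the following (the second command assumes client credentials are available, e.g. in /root/.my.cnf):

$ grep my_uuid /var/lib/mysql/gvwstate.dat
$ mysql -e "SHOW GLOBAL STATUS LIKE 'wsrep_gcomm_uuid'"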

2015-10-28 14:28:50 553 [Note] WSREP: Node 6221477d state prim
2015-10-28 14:28:50 553 [Note] WSREP: view(view_id(PRIM,3506d410,5) memb {
    3506d410,0
    6221477d,0
    78b22e70,0
} joined {
} left {
} partitioned {
})
2015-10-28 14:28:50 553 [Note] WSREP: save pc into disk
2015-10-28 14:28:50 553 [Note] WSREP: gcomm: connected

As a result, the node managed to join the cluster - you can see that all IDs are listed as members, and the cluster is in ‘Primary’ state. There are no nodes in ‘joined’ state, and the cluster was not partitioned.

2015-10-28 14:28:50 553 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
2015-10-28 14:28:50 553 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
2015-10-28 14:28:50 553 [Note] WSREP: Opened channel 'my_wsrep_cluster'
2015-10-28 14:28:50 553 [Note] WSREP: Waiting for SST to complete.
2015-10-28 14:28:50 553 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 3
2015-10-28 14:28:50 553 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 355385f7-7d80-11e5-bb70-17166cef2a4d
2015-10-28 14:28:50 553 [Note] WSREP: STATE EXCHANGE: sent state msg: 355385f7-7d80-11e5-bb70-17166cef2a4d
2015-10-28 14:28:50 553 [Note] WSREP: STATE EXCHANGE: got state msg: 355385f7-7d80-11e5-bb70-17166cef2a4d from 0 (172.30.4.156)
2015-10-28 14:28:50 553 [Note] WSREP: STATE EXCHANGE: got state msg: 355385f7-7d80-11e5-bb70-17166cef2a4d from 1 (172.30.4.191)
2015-10-28 14:28:50 553 [Note] WSREP: STATE EXCHANGE: got state msg: 355385f7-7d80-11e5-bb70-17166cef2a4d from 2 (172.30.4.220)
2015-10-28 14:28:50 553 [Note] WSREP: Quorum results:
    version    = 3,
    component  = PRIMARY,
    conf_id    = 4,
    members    = 3/3 (joined/total),
    act_id     = 11152,
    last_appl. = -1,
    protocols  = 0/7/3 (gcs/repl/appl),
    group UUID = 98ed75de-7c05-11e5-9743-de4abc22bd11

Cluster members exchange information about their status and as a result we see a report on quorum calculation - cluster is in ‘Primary’ state with all three nodes joined.

2015-10-28 14:28:50 553 [Note] WSREP: Flow-control interval: [28, 28]
2015-10-28 14:28:50 553 [Note] WSREP: Restored state OPEN -> JOINED (11152)
2015-10-28 14:28:50 553 [Note] WSREP: New cluster view: global state: 98ed75de-7c05-11e5-9743-de4abc22bd11:11152, view# 5: Primary, number of nodes: 3, my index: 0, protocol version 3
2015-10-28 14:28:50 553 [Note] WSREP: SST complete, seqno: 11152
2015-10-28 14:28:50 553 [Note] WSREP: Member 0.0 (172.30.4.156) synced with group.
2015-10-28 14:28:50 553 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 11152)

Those few lines above are a bit misleading - there was no SST performed, as the node was already synced with the cluster. In short, at this very moment, the cluster is up and our node is in sync with the other cluster members.

2015-10-28 14:28:50 553 [Warning] option 'innodb-buffer-pool-instances': signed value -1 adjusted to 0
2015-10-28 14:28:50 553 [Note] Plugin 'FEDERATED' is disabled.
2015-10-28 14:28:50 553 [Note] InnoDB: Using atomics to ref count buffer pool pages
2015-10-28 14:28:50 553 [Note] InnoDB: The InnoDB memory heap is disabled
2015-10-28 14:28:50 553 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins

...

2015-10-28 14:28:52 553 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.6.26-74.0-56'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  Percona XtraDB Cluster (GPL), Release rel74.0, Revision 624ef81, WSREP version 25.12, wsrep_25.12

Finally, the MySQL startup process is coming to its end - MySQL is up and running, ready to handle connections.

As you can see, during a clean start we get a lot of information in the error log. We can see how a node sees the cluster, we can see information about quorum calculation, and we can see the IDs and IP addresses of the other nodes in the cluster. This gives us great insight into the cluster state - which nodes were up, and whether there were any connectivity issues. The information about Galera configuration is also very useful - my.cnf may have been changed in the meantime, and so may the runtime configuration. Yet we can still check how exactly Galera was configured at the time of startup. It may come in handy when dealing with misconfigurations.

Galera node - IST process

Ok, so we covered a regular startup - when a node is up to date with the cluster. What about when a node is not up to date? There are two options. If the donor has all the needed writesets stored in its gcache, Incremental State Transfer (IST) can be performed. If not, State Snapshot Transfer (SST) will be executed, which involves copying all data from the donor to the joining node. This time, we are looking at a node joining/synchronizing using IST. To make it easier to read, we are going to remove some of the log entries which didn’t change compared to the clean startup we discussed earlier.
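
Whether IST is possible depends largely on whether the donor still holds the missing writesets in its gcache, which is governed by gcache.size (128M in the configuration dump above). A minimal sketch of how you could check the current value and enlarge it; the 2G figure is only an example, and the change requires a node restart:

$ mysql -e "SHOW GLOBAL VARIABLES LIKE 'wsrep_provider_options'" | tr ';' '\n' | grep gcache.size

Then in my.cnf:

[mysqld]
wsrep_provider_options="gcache.size=2G"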

2015-10-28 16:36:50 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2015-10-28 16:36:50 0 [Note] /usr/sbin/mysqld (mysqld 5.6.26-74.0-56) starting as process 10144 ...

...

2015-10-28 16:36:50 10144 [Note] WSREP: Found saved state: 98ed75de-7c05-11e5-9743-de4abc22bd11:-1

Ok, MySQL started and, after initialization, Galera located the saved state with ‘-1’ as the writeset sequence number. It means that the node didn’t shut down cleanly and the correct state wasn’t written to the grastate.dat file.

...

2015-10-28 16:36:50 10144 [Note] WSREP: Start replication
2015-10-28 16:36:50 10144 [Note] WSREP: Setting initial position to 98ed75de-7c05-11e5-9743-de4abc22bd11:11152

Even though the grastate.dat file didn’t contain the sequence number, Galera can calculate it on its own and set up the initial position accordingly.

...

2015-10-28 16:36:50 10144 [Note] WSREP: gcomm: connecting to group 'my_wsrep_cluster', peer '172.30.4.156:4567,172.30.4.191:4567,172.30.4.220:4567'
2015-10-28 16:36:50 10144 [Warning] WSREP: (16ca37d4, 'tcp://0.0.0.0:4567') address 'tcp://172.30.4.191:4567' points to own listening address, blacklisting
2015-10-28 16:36:50 10144 [Note] WSREP: (16ca37d4, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers:
2015-10-28 16:36:51 10144 [Note] WSREP: declaring 3506d410 at tcp://172.30.4.156:4567 stable
2015-10-28 16:36:51 10144 [Note] WSREP: declaring 78b22e70 at tcp://172.30.4.220:4567 stable
2015-10-28 16:36:51 10144 [Note] WSREP: Node 3506d410 state prim
2015-10-28 16:36:51 10144 [Note] WSREP: view(view_id(PRIM,16ca37d4,8) memb {
    16ca37d4,0
    3506d410,0
    78b22e70,0
} joined {
} left {
} partitioned {
})

Similar to the clean startup, the Galera nodes had some communication and decided on the state of the cluster. In this case, all three nodes were up and the cluster was formed.

2015-10-28 16:36:51 10144 [Note] WSREP: save pc into disk
2015-10-28 16:36:51 10144 [Note] WSREP: Waiting for SST to complete.
2015-10-28 16:36:51 10144 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 3
2015-10-28 16:36:51 10144 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 17633a80-7d92-11e5-8bdc-f6091df00f00
2015-10-28 16:36:51 10144 [Note] WSREP: STATE EXCHANGE: sent state msg: 17633a80-7d92-11e5-8bdc-f6091df00f00
2015-10-28 16:36:51 10144 [Note] WSREP: STATE EXCHANGE: got state msg: 17633a80-7d92-11e5-8bdc-f6091df00f00 from 0 (172.30.4.191)
2015-10-28 16:36:51 10144 [Note] WSREP: STATE EXCHANGE: got state msg: 17633a80-7d92-11e5-8bdc-f6091df00f00 from 1 (172.30.4.156)
2015-10-28 16:36:51 10144 [Note] WSREP: STATE EXCHANGE: got state msg: 17633a80-7d92-11e5-8bdc-f6091df00f00 from 2 (172.30.4.220)
2015-10-28 16:36:51 10144 [Note] WSREP: Quorum results:
    version    = 3,
    component  = PRIMARY,
    conf_id    = 6,
    members    = 2/3 (joined/total),
    act_id     = 31382,
    last_appl. = -1,
    protocols  = 0/7/3 (gcs/repl/appl),
    group UUID = 98ed75de-7c05-11e5-9743-de4abc22bd11

Again, communication exchange happens and quorum calculation is performed. This time you can see that three members were detected and only two of them joined. This is something we expect - one node is not synced with the cluster.

2015-10-28 16:36:51 10144 [Note] WSREP: Flow-control interval: [28, 28]
2015-10-28 16:36:51 10144 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 31382)
2015-10-28 16:36:51 10144 [Note] WSREP: State transfer required:
    Group state: 98ed75de-7c05-11e5-9743-de4abc22bd11:31382
    Local state: 98ed75de-7c05-11e5-9743-de4abc22bd11:11152

State transfer was deemed necessary because the local node is not up to date with the applied writesets.

2015-10-28 16:36:51 10144 [Note] WSREP: New cluster view: global state: 98ed75de-7c05-11e5-9743-de4abc22bd11:31382, view# 7: Primary, number of nodes: 3, my index: 0, protocol version 3
2015-10-28 16:36:51 10144 [Warning] WSREP: Gap in state sequence. Need state transfer.
2015-10-28 16:36:51 10144 [Note] WSREP: Running: 'wsrep_sst_xtrabackup-v2 --role 'joiner' --address '172.30.4.191' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '' --parent '10144''''
WSREP_SST: [INFO] Streaming with xbstream (20151028 16:36:52.039)
WSREP_SST: [INFO] Using socat as streamer (20151028 16:36:52.041)

State transfer is performed using xtrabackup; this is true for both IST and SST (if xtrabackup is configured as the SST method). Both the xbstream and socat binaries are required for IST (and SST) to execute. It may happen that socat is not installed on the system. In such a case, the node won’t be able to join the cluster.
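
Before (re)joining a node, it is worth confirming that socat is present; a minimal check on RHEL/CentOS systems, as used in these examples:

$ which socat || yum install -y socat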

WSREP_SST: [INFO] Evaluating timeout -k 110 100 socat -u TCP-LISTEN:4444,reuseaddr stdio | xbstream -x; RC=( ${PIPESTATUS[@]} ) (20151028 16:36:52.132)
2015-10-28 16:36:52 10144 [Note] WSREP: Prepared SST request: xtrabackup-v2|172.30.4.191:4444/xtrabackup_sst//1
2015-10-28 16:36:52 10144 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2015-10-28 16:36:52 10144 [Note] WSREP: REPL Protocols: 7 (3, 2)
2015-10-28 16:36:52 10144 [Note] WSREP: Service thread queue flushed.
2015-10-28 16:36:52 10144 [Note] WSREP: Assign initial position for certification: 31382, protocol version: 3
2015-10-28 16:36:52 10144 [Note] WSREP: Service thread queue flushed.
2015-10-28 16:36:52 10144 [Note] WSREP: Prepared IST receiver, listening at: tcp://172.30.4.191:4568
2015-10-28 16:36:52 10144 [Note] WSREP: Member 0.0 (172.30.4.191) requested state transfer from '*any*'. Selected 1.0 (172.30.4.156)(SYNCED) as donor.
2015-10-28 16:36:52 10144 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 31389)

This stage covers state transfer - as you can see, the donor is selected. In our case, it is 172.30.4.156. This is pretty important information, especially when SST is executed - donors will run xtrabackup to move the data and they will log the progress in innobackup.backup.log file in the MySQL data directory. Different network issues may also cause problems with state transfer - it’s good to check which node was a donor even if only to check if there are any errors on the donor’s side.

2015-10-28 16:36:52 10144 [Note] WSREP: Requesting state transfer: success, donor: 1
2015-10-28 16:36:53 10144 [Note] WSREP: 1.0 (172.30.4.156): State transfer to 0.0 (172.30.4.191) complete.
2015-10-28 16:36:53 10144 [Note] WSREP: Member 1.0 (172.30.4.156) synced with group.
WSREP_SST: [INFO] xtrabackup_ist received from donor: Running IST (20151028 16:36:53.159)
WSREP_SST: [INFO] Galera co-ords from recovery: 98ed75de-7c05-11e5-9743-de4abc22bd11:11152 (20151028 16:36:53.162)
WSREP_SST: [INFO] Total time on joiner: 0 seconds (20151028 16:36:53.164)
WSREP_SST: [INFO] Removing the sst_in_progress file (20151028 16:36:53.166)
2015-10-28 16:36:53 10144 [Note] WSREP: SST complete, seqno: 11152
2015-10-28 16:36:53 10144 [Warning] option 'innodb-buffer-pool-instances': signed value -1 adjusted to 0
2015-10-28 16:36:53 10144 [Note] Plugin 'FEDERATED' is disabled.
2015-10-28 16:36:53 10144 [Note] InnoDB: Using atomics to ref count buffer pool pages

...

2015-10-28 16:36:55 10144 [Note] WSREP: Signalling provider to continue.
2015-10-28 16:36:55 10144 [Note] WSREP: Initialized wsrep sidno 1
2015-10-28 16:36:55 10144 [Note] WSREP: SST received: 98ed75de-7c05-11e5-9743-de4abc22bd11:11152
2015-10-28 16:36:55 10144 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.6.26-74.0-56'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  Percona XtraDB Cluster (GPL), Release rel74.0, Revision 624ef81, WSREP version 25.12, wsrep_25.12
2015-10-28 16:36:55 10144 [Note] WSREP: Receiving IST: 20230 writesets, seqnos 11152-31382
2015-10-28 16:40:18 10144 [Note] WSREP: IST received: 98ed75de-7c05-11e5-9743-de4abc22bd11:31382
2015-10-28 16:40:18 10144 [Note] WSREP: 0.0 (172.30.4.191): State transfer from 1.0 (172.30.4.156) complete.

In this section, among others, we can see IST being performed and 20k writesets were transferred from the donor to the joining node.

2015-10-28 16:40:18 10144 [Note] WSREP: Shifting JOINER -> JOINED (TO: 34851)

At this point, the node has applied all writesets sent by the donor and has switched to the ‘Joined’ state. It’s not yet synced because there are remaining writesets (applied by the cluster while IST was performed) to be applied. This is actually something very important to keep in mind - while a node is a ‘Joiner’, it applies writesets as fast as possible, without any interruption to the cluster. Once a node switches to the ‘Joined’ state, it may send flow control messages. This is particularly visible when there are lots of writesets to be sent via IST. Let’s say it took 2h to apply them on the joining node. This means it still has to apply 2h worth of traffic - and it can send flow control messages in the meantime. Given that the ‘Joined’ node is significantly lagging behind (2h), it will definitely send flow control - for the duration of the time needed to apply those 2h worth of traffic, the cluster will run with degraded performance.
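
If you suspect a joining or joined node is throttling the cluster, the flow control counters and the receive queue length can be checked on any node, for example:

$ mysql -e "SHOW GLOBAL STATUS LIKE 'wsrep_flow_control%'"
$ mysql -e "SHOW GLOBAL STATUS LIKE 'wsrep_local_recv_queue%'"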

2015-10-28 16:42:30 10144 [Note] WSREP: Member 0.0 (172.30.4.191) synced with group.
2015-10-28 16:42:30 10144 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 36402)
2015-10-28 16:42:31 10144 [Note] WSREP: Synchronized with group, ready for connections
2015-10-28 16:42:31 10144 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.

Finally, the node caught up on the writeset replication and synced with the cluster.

Galera SST process

In the previous part, we covered IST process from the error log point of view. One more important and frequent process is left - State Snapshot Transfer (SST).

151102 15:39:10 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
151102 15:39:10 mysqld_safe Skipping wsrep-recover for 98ed75de-7c05-11e5-9743-de4abc22bd11:71174 pair
151102 15:39:10 mysqld_safe Assigning 98ed75de-7c05-11e5-9743-de4abc22bd11:71174 to wsrep_start_position

...

2015-11-02 15:39:11 1890 [Note] WSREP: Quorum results:
    version    = 3,
    component  = PRIMARY,
    conf_id    = 10,
    members    = 2/3 (joined/total),
    act_id     = 97896,
    last_appl. = -1,
    protocols  = 0/7/3 (gcs/repl/appl),
    group UUID = 98ed75de-7c05-11e5-9743-de4abc22bd11
2015-11-02 15:39:11 1890 [Note] WSREP: Flow-control interval: [28, 28]
2015-11-02 15:39:11 1890 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 97896)
2015-11-02 15:39:11 1890 [Note] WSREP: State transfer required:
    Group state: 98ed75de-7c05-11e5-9743-de4abc22bd11:97896
    Local state: 98ed75de-7c05-11e5-9743-de4abc22bd11:71174
2015-11-02 15:39:11 1890 [Note] WSREP: New cluster view: global state: 98ed75de-7c05-11e5-9743-de4abc22bd11:97896, view# 11: Primary, number of nodes: 3, my index: 2, protocol version 3
2015-11-02 15:39:11 1890 [Warning] WSREP: Gap in state sequence. Need state transfer.
2015-11-02 15:39:11 1890 [Note] WSREP: Running: 'wsrep_sst_xtrabackup-v2 --role 'joiner' --address '172.30.4.191' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '' --parent '1890''''
WSREP_SST: [INFO] Streaming with xbstream (20151102 15:39:11.812)
WSREP_SST: [INFO] Using socat as streamer (20151102 15:39:11.814)
WSREP_SST: [INFO] Evaluating timeout -k 110 100 socat -u TCP-LISTEN:4444,reuseaddr stdio | xbstream -x; RC=( ${PIPESTATUS[@]} ) (20151102 15:39:11.908)
2015-11-02 15:39:12 1890 [Note] WSREP: Prepared SST request: xtrabackup-v2|172.30.4.191:4444/xtrabackup_sst//1
2015-11-02 15:39:12 1890 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2015-11-02 15:39:12 1890 [Note] WSREP: REPL Protocols: 7 (3, 2)
2015-11-02 15:39:12 1890 [Note] WSREP: Service thread queue flushed.
2015-11-02 15:39:12 1890 [Note] WSREP: Assign initial position for certification: 97896, protocol version: 3
2015-11-02 15:39:12 1890 [Note] WSREP: Service thread queue flushed.
2015-11-02 15:39:12 1890 [Note] WSREP: Prepared IST receiver, listening at: tcp://172.30.4.191:4568
2015-11-02 15:39:12 1890 [Note] WSREP: Member 2.0 (172.30.4.191) requested state transfer from '*any*'. Selected 0.0 (172.30.4.220)(SYNCED) as donor.
2015-11-02 15:39:12 1890 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 98007)
2015-11-02 15:39:12 1890 [Note] WSREP: Requesting state transfer: success, donor: 0

Up to this line, everything looks exactly like an Incremental State Transfer (IST). There was a gap in the sequence between the joining node and the rest of the cluster. SST was deemed necessary because the required writeset (71174) wasn’t available in the donor’s gcache.

WSREP_SST: [INFO] Proceeding with SST (20151102 15:39:13.070)
WSREP_SST: [INFO] Evaluating socat -u TCP-LISTEN:4444,reuseaddr stdio | xbstream -x; RC=( ${PIPESTATUS[@]} ) (20151102 15:39:13.071)
WSREP_SST: [INFO] Cleaning the existing datadir and innodb-data/log directories (20151102 15:39:13.093)
removed ‘/var/lib/mysql/DB187/t1.ibd’
removed ‘/var/lib/mysql/DB187/t2.ibd’
removed ‘/var/lib/mysql/DB187/t3.ibd’

SST basically copies data from a donor to the joining node. For this to work, the current contents of the data directory have to be removed - the SST process takes care of it and you can follow the progress in the error log, as can be seen above.

...

removed directory: ‘/var/lib/mysql/DBx236’
WSREP_SST: [INFO] Waiting for SST streaming to complete! (20151102 15:39:30.349)
2015-11-02 15:42:32 1890 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000000 of size 134217728 bytes
2015-11-02 15:48:56 1890 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000001 of size 134217728 bytes
2015-11-02 15:54:59 1890 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000002 of size 134217728 bytes
2015-11-02 16:00:45 1890 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000003 of size 134217728 bytes
2015-11-02 16:06:21 1890 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000004 of size 134217728 bytes
2015-11-02 16:09:32 1890 [Note] WSREP: 0.0 (172.30.4.220): State transfer to 2.0 (172.30.4.191) complete.
2015-11-02 16:09:32 1890 [Note] WSREP: Member 0.0 (172.30.4.220) synced with group.

The SST process takes some time, depending on the time needed to copy the data. For the duration of the SST, the joining node caches writesets in gcache and, if the in-memory gcache is not large enough, in on-disk gcache files. Please note that in the last two lines above, we can see information about the donor finishing the SST and getting back in sync with the rest of the cluster.

WSREP_SST: [INFO] Preparing the backup at /var/lib/mysql//.sst (20151102 16:09:32.187)
WSREP_SST: [INFO] Evaluating innobackupex --no-version-check  --apply-log $rebuildcmd ${DATA} &>${DATA}/innobackup.prepare.log (20151102 16:09:32.188)
2015-11-02 16:12:39 1890 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000005 of size 134217728 bytes
2015-11-02 16:20:52 1890 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000006 of size 134217728 bytes

In this case, the joiner uses xtrabackup as the SST method, therefore the ‘prepare’ phase has to be completed.

WSREP_SST: [INFO] Moving the backup to /var/lib/mysql/ (20151102 16:26:01.062)
WSREP_SST: [INFO] Evaluating innobackupex --defaults-file=/etc/mysql/my.cnf  --defaults-group=mysqld --no-version-check   --move-back --force-non-empty-directories ${DATA} &>${DATA}/innobackup.move.log (20151102 16:26:01.064)
WSREP_SST: [INFO] Move successful, removing /var/lib/mysql//.sst (20151102 16:26:14.600)

The last phase is to copy the files back from the temporary directory to the MySQL data directory. The rest of the process is similar to what we discussed earlier. The node initializes all plugins and InnoDB, and MySQL gets ready to handle connections. Once that’s done, it applies the remaining writesets stored in gcache.

2015-11-02 16:27:06 1890 [Note] WSREP: Deleted page /var/lib/mysql/gcache.page.000000
2015-11-02 16:27:08 1890 [Note] WSREP: Deleted page /var/lib/mysql/gcache.page.000001
2015-11-02 16:27:11 1890 [Note] WSREP: Deleted page /var/lib/mysql/gcache.page.000002
2015-11-02 16:27:15 1890 [Note] WSREP: Deleted page /var/lib/mysql/gcache.page.000003
2015-11-02 16:28:11 1890 [Note] WSREP: Deleted page /var/lib/mysql/gcache.page.000004
2015-11-02 16:28:56 1890 [Note] WSREP: Deleted page /var/lib/mysql/gcache.page.000005
2015-11-02 16:29:17 1890 [Note] WSREP: Deleted page /var/lib/mysql/gcache.page.000006
2015-11-02 16:29:24 1890 [Note] WSREP: Member 2.0 (172.30.4.191) synced with group.
2015-11-02 16:29:24 1890 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 214214)
2015-11-02 16:29:24 1890 [Note] WSREP: Synchronized with group, ready for connections
2015-11-02 16:29:24 1890 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.

After all writesets were applied, on-disk gcache files are deleted and the node finally syncs up with the cluster.

Xtrabackup logs for SST

The SST process (if you use xtrabackup as the SST method), as you may have noticed, consists of three stages - streaming data to the joiner, applying logs on the joiner and moving data from the temporary directory to the final location in the MySQL data directory. To have a complete view of the process, we need to mention some additional log files related to those three stages of the SST. You can find them in the MySQL data directory or in the temporary directory Galera uses to move the data to the joiner.

The first of them (in order of appearance) is innobackup.backup.log - a file created on the donor host which contains all the logs from the xtrabackup process executed to stream the data from donor to joiner. This file can be very useful in dealing with any type of SST issue. In the next blog post, we will show how you can use it to identify the problem.

Another file, located either in the MySQL data directory (e.g., /var/lib/mysql) or the .sst temporary subdirectory (e.g., /var/lib/mysql/.sst), is innobackup.prepare.log. This file contains logs related to the ‘apply log’ phase. In this stage, xtrabackup applies transaction logs to the tablespaces (ibd files) copied from the donor. It may happen that InnoDB data is corrupted and, even though it was possible to stream the data from donor to joiner, it’s not possible to apply transactions from the redo log. Such errors will be logged in this file.

The last one is innobackup.move.log - it contains logs of the ‘move’ stage, in which xtrabackup moves files from the temporary (.sst) directory to their final location in the MySQL data directory.
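
A quick way to watch these logs while an SST is in progress is shown below; the paths assume the default /var/lib/mysql data directory, and innobackup.move.log appears in the same locations once the move stage starts:

$ tail -f /var/lib/mysql/innobackup.backup.log          # on the donor
$ tail -f /var/lib/mysql/.sst/innobackup.prepare.log    # on the joiner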

In the next post, we’ll go through real life examples of errors which may impact your Galera Cluster. We will cover a wide range of issues that you can experience in production, and how to deal with them. For that, it is important to know what exactly you should expect in the logs in a ‘normal’ system state - now that you know what normal looks like, it will be much easier to understand what’s not normal :-)


nginx as Database Load Balancer for MySQL or MariaDB Galera Cluster


Nginx is well-known for its ability to act as a reverse-proxy with small memory footprint. It usually sits in the front-end web tier to redirect connections to available backend services, provided these passed some health checks. Using a reverse-proxy is common when you are running a critical application or service that requires high availability. It also distributes the load equally among the backend services.

Recently, nginx 1.9 introduced support for TCP load balancing - similar to what HAProxy is capable of. The one major drawback is that it does not support advanced backend health checks. This is required when running MySQL Galera Cluster, as we’ll explain in the next section. Note that this limitation is removed in the paid-only edition called NGINX Plus. 

In this blog post, we are going to play around with nginx as a reverse-proxy for MySQL Galera Cluster services to achieve higher availability. We had a Galera cluster up and running, deployed using ClusterControl on CentOS 7.1. We are going to install nginx on a fresh new host, as illustrated in the following diagram:

Backend Health Checks

With the existence of a synchronously-replicated cluster like Galera or NDB, it has become quite popular to use a TCP reverse-proxy as load balancer. All MySQL nodes are treated as equal, since one can read from or write to any node. No read-write splitting is required, as you would need with MySQL master-slave replication. Nginx is not database-aware, so some additional steps are required to configure health checks for the Galera Cluster backends so that they return something understandable.

If you are running HAProxy, a healthcheck script on each MySQL server in the load balancing set should be able to return an HTTP response status. For example, if the backend MySQL server is healthy, the script returns a simple HTTP 200 OK status code; otherwise, it returns 503 Service Unavailable. HAProxy can then update the routing table to exclude the problematic backend servers from the load balancing set and redirect incoming connections only to the available servers. This is well explained in this webinar on HAProxy. Unfortunately, that health check relies on xinetd to daemonize the script and listen on a custom port (9200), an approach that is not supported in nginx yet.

The following flowchart illustrates the process to report the health of a Galera node for multi-master setup:

At the time of this writing, NGINX Plus (the paid release of nginx) also supports advanced backend health checks, but it does not support a custom backend monitoring port.

**Update - 20th January 2016: NGINX Plus  just released nginx-plus-r8 which supports custom backend monitoring port. Details at http://nginx.org/r/health_check.

Using clustercheck-iptables

To overcome this limitation, we’ve created a healthcheck script called clustercheck-iptables. It is a background script that checks the availability of a Galera node, and adds a redirection port using iptables if the Galera node is healthy (instead of returning an HTTP response). This allows other TCP load balancers with limited health check capabilities to monitor the backend Galera nodes correctly. Other than HAProxy, you can now use your favorite reverse proxy like nginx (>1.9), IPVS, keepalived, piranha, distributor, balance or pen to load balance requests across Galera nodes.

So how does it work? The script performs a health check every second on each Galera node. If the node is healthy (wsrep_local_state_comment=Synced and read_only=OFF) or (wsrep_local_state_comment=Donor and wsrep_sst_method=xtrabackup/xtrabackup-v2), a port redirection will be set up using iptables (default: 3308 redirects to 3306) using the following command:

$ iptables -t nat -A PREROUTING -s 0.0.0.0/0 -p tcp --dport 3308 -j REDIRECT --to-ports 3306

Otherwise, the above rule is removed from the iptables PREROUTING chain. On the load balancer, define the designated redirection port (3308) instead. If the backend node is "unhealthy", port 3308 will be unreachable because the corresponding iptables rule is removed on the database node. The load balancer will then exclude it from the load balancing set.

Let’s install the script and see how it works in practice.

1. On the database servers, run the following commands to install the script:

$ git clone https://github.com/ashraf-s9s/clustercheck-iptables
$ cp clustercheck-iptables/mysqlchk_iptables /usr/local/sbin

2. By default, the script will use a MySQL user called “mysqlchk_user” with password “mysqlchk_password”. We need to ensure this MySQL user exists with the corresponding password before the script is able to perform health checks. Run the following DDL statements on one of the DB nodes (Galera should replicate the statement to the other nodes):

mysql> GRANT PROCESS ON *.* TO 'mysqlchk_user'@'localhost' IDENTIFIED BY 'mysqlchk_password';
mysql> FLUSH PRIVILEGES;

** If you would like to run as different user/password, specify -u and/or -p argument in the command line. See examples on the Github page.

3. This script requires iptables to be running. In this example, we run on CentOS 7, which comes with firewalld by default. We have to install iptables-services beforehand:

$ yum install -y iptables-services
$ systemctl enable iptables
$ systemctl start iptables

Then, set up basic rules for MySQL Galera Cluster so iptables won’t affect the database communication:

$ iptables -I INPUT -m tcp -p tcp --dport 3306 -j ACCEPT
$ iptables -I INPUT -m tcp -p tcp --dport 3308 -j ACCEPT
$ iptables -I INPUT -m tcp -p tcp --dport 4444 -j ACCEPT
$ iptables -I INPUT -m tcp -p tcp --dport 4567:4568 -j ACCEPT
$ service iptables save
$ service iptables restart

4. Once the basic rules are added, verify them with the following command:

$ iptables -L -n

5. Test mysqlchk_iptables:

$ mysqlchk_iptables -t
Detected variables/status:
wsrep_local_state: 4
wsrep_sst_method: xtrabackup-v2
read_only: OFF

[11-11-15 08:33:49.257478192] [INFO] Galera Cluster Node is synced.

6. Looks good. Now we can daemonize the health check script:

$ mysqlchk_iptables -d
/usr/local/sbin/mysqlchk_iptables started with PID 66566.

7. Our PREROUTING rules will look something like this:

$ iptables -L -n -t nat
Chain PREROUTING (policy ACCEPT)
target     prot opt source               destination
REDIRECT   tcp  --  0.0.0.0/0            0.0.0.0/0            tcp dpt:3308 redir ports 3306

Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

Chain POSTROUTING (policy ACCEPT)
target     prot opt source               destination
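With the redirection rule in place, a quick sanity check from another host (the load balancer, for example) should show MySQL answering on port 3308, even though mysqld itself only listens on 3306. The IP address and user below are placeholders:

$ mysql -h 192.168.55.201 -P 3308 -u someuser -p -e "SHOW STATUS LIKE 'wsrep_local_state_comment'"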

8. Finally, add the health check command into /etc/rc.local so it starts automatically on boot:

echo '/usr/local/sbin/mysqlchk_iptables -d' >> /etc/rc.local

On some distributions, you also need to make sure rc.local has the execute permission so it runs scripts on boot. Set it with:

$ chmod +x /etc/rc.local

From the application side, verify that you can connect to MySQL through port 3308. Repeat the above steps (except step #2) for the remaining DB nodes. Now, we have configured our backend health checks correctly. Let’s set up our MySQL load balancer as described in the next section.

Setting Up nginx as MySQL Load Balancer

1. On the load balancer node, install the required packages:

$ yum -y install gcc make pcre-devel zlib-devel

2. Install nginx 1.9 from source with the TCP proxy module (--with-stream):

$ wget http://nginx.org/download/nginx-1.9.6.tar.gz
$ tar -xzf nginx-1.9.6.tar.gz
$ cd nginx-1.9.6
$ ./configure --with-stream
$ make
$ make install

3. Add the following lines into nginx configuration file located at /usr/local/nginx/conf/nginx.conf:

stream {
      upstream stream_backend {
        zone tcp_servers 64k;
        server 192.168.55.201:3308;
        server 192.168.55.202:3308;
        server 192.168.55.203:3308;
    }

    server {
        listen 3307;
        proxy_pass stream_backend;
        proxy_connect_timeout 1s;
    }
}
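Optionally, verify the configuration syntax before starting nginx:

$ /usr/local/nginx/sbin/nginx -t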

4. Start nginx:

$ /usr/local/nginx/sbin/nginx

5. Verify that nginx is listening on port 3307 as we have defined. MySQL connections will come in via this port and be redirected to an available backend on port 3308; the respective DB node then redirects the traffic to port 3306, where MySQL is listening:

$ netstat -tulpn | grep 3307
tcp        0      0 0.0.0.0:3307            0.0.0.0:*               LISTEN      5348/nginx: master

Great. We have now set up our nginx instance as MySQL Galera Cluster load balancer. Let’s test it out!

Testing

Let’s perform some tests to verify that our Galera cluster is correctly load balanced. We performed various exercises to see nginx and the Galera cluster working in action. We performed the following actions consecutively:

  1. Turn g1.local to read_only=ON and then back to read_only=OFF (see the example statements after this list).
  2. Kill the mysql service on g1.local and force SST on startup.
  3. Kill the other two database nodes so g1.local becomes non-primary.
  4. Bootstrap g1.local from the non-primary state.
  5. Rejoin the other 2 nodes back to the cluster.
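For reference, the read_only toggle in step #1 was done directly on g1.local with statements like these:

mysql> SET GLOBAL read_only = ON;
mysql> SET GLOBAL read_only = OFF;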

The screencast below contains several terminal outputs, which are explained as follows:

  • Terminal 1 (top left): iptables PREROUTING chain output
  • Terminal 2 (top right): MySQL error log for g1.local
  • Terminal 3 (middle left): Application output when connecting to nginx load balancer. It reports date, hostname, wsrep_last_committed and wsrep_local_state_comment
  • Terminal 4 (middle right): Output of /var/log/mysqlchk_iptables
  • Terminal 5 (bottom left): Output of read_only and wsrep_sst_method on g1.local
  • Terminal 6 (bottom right): Action console

The following asciinema recording shows the result:

 

Summary

Previously, Galera node health checks from a TCP load balancer were effectively limited to HAProxy, due to its ability to use a custom port for backend health checks. With this script, it’s now possible for any TCP load balancer/reverse proxy to monitor Galera nodes correctly. 

You are welcome to fork, pull request and extend the capabilities of this script.


Become a MySQL DBA blog series - Troubleshooting Galera cluster issues - part 2


This is part 2 of our blog on how to troubleshoot Galera cluster - SST errors, and problems with network streaming. In part 1 of the blog, we covered issues ranging from node crashes to clusters that won’t restart, network splits and inconsistent data. Note that the issues described are all examples inspired by real-life incidents in production environments.

This is the eighteenth installment in the ‘Become a MySQL DBA’ blog series. Our previous posts in the DBA series include:

 

SST errors

State Snapshot Transfer (SST) is a mechanism intended to provision or rebuild Galera nodes. There are a couple of ways you can execute SST, including mysqldump and rsync. The most popular one, though, is to perform SST using xtrabackup - it allows the donor to stay online and makes the SST process less intrusive. The xtrabackup SST process involves streaming data over the network from the donor to the joining node. Additionally, xtrabackup needs access to MySQL to grab the InnoDB log data - it checks the ‘wsrep_sst_auth’ variable for access credentials. In this chapter we are going to cover examples of problems related to SST.
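A quick way to see how a node is configured for SST - which method it uses and whether credentials are set - is to check the wsrep_sst_* variables, for example:

mysql> SHOW VARIABLES LIKE 'wsrep_sst_%';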

Incorrect password for SST

Sometimes it may happen that the credentials for MySQL access, set in wsrep_sst_auth, are not correct. In such a case, SST won’t complete properly because xtrabackup, on the donor, will not be able to connect to the MySQL database to perform a backup. You may then see the following symptoms in the error log.

Let’s start with a joining node.

2015-11-13 10:40:57 30484 [Note] WSREP: Quorum results:
        version    = 3,
        component  = PRIMARY,
        conf_id    = 16,
        members    = 2/3 (joined/total),
        act_id     = 563464,
        last_appl. = -1,
        protocols  = 0/7/3 (gcs/repl/appl),
        group UUID = 98ed75de-7c05-11e5-9743-de4abc22bd11
2015-11-13 10:40:57 30484 [Note] WSREP: Flow-control interval: [28, 28]
2015-11-13 10:40:57 30484 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 563464)
2015-11-13 10:40:57 30484 [Note] WSREP: State transfer required:
        Group state: 98ed75de-7c05-11e5-9743-de4abc22bd11:563464
        Local state: 00000000-0000-0000-0000-000000000000:-1

So far we have the standard SST process - a node joined the cluster and state transfer was deemed necessary. 

...

2015-11-13 10:40:57 30484 [Warning] WSREP: Gap in state sequence. Need state transfer.
2015-11-13 10:40:57 30484 [Note] WSREP: Running: 'wsrep_sst_xtrabackup-v2 --role 'joiner' --address '172.30.4.220' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --
defaults-group-suffix '' --parent '30484''''
WSREP_SST: [INFO] Streaming with xbstream (20151113 10:40:57.975)
WSREP_SST: [INFO] Using socat as streamer (20151113 10:40:57.977)
WSREP_SST: [INFO] Stale sst_in_progress file: /var/lib/mysql//sst_in_progress (20151113 10:40:57.980)
WSREP_SST: [INFO] Evaluating timeout -k 110 100 socat -u TCP-LISTEN:4444,reuseaddr stdio | xbstream -x; RC=( ${PIPESTATUS[@]} ) (20151113 10:40:58.005)
2015-11-13 10:40:58 30484 [Note] WSREP: Prepared SST request: xtrabackup-v2|172.30.4.220:4444/xtrabackup_sst//1

Xtrabackup started.

2015-11-13 10:40:58 30484 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2015-11-13 10:40:58 30484 [Note] WSREP: REPL Protocols: 7 (3, 2)
2015-11-13 10:40:58 30484 [Note] WSREP: Service thread queue flushed.
2015-11-13 10:40:58 30484 [Note] WSREP: Assign initial position for certification: 563464, protocol version: 3
2015-11-13 10:40:58 30484 [Note] WSREP: Service thread queue flushed.
2015-11-13 10:40:58 30484 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (98ed75de-7c05-11e5-9743-de4abc22bd11): 1 (Operation not permitted)
         at galera/src/replicator_str.cpp:prepare_for_IST():482. IST will be unavailable.
2015-11-13 10:40:58 30484 [Note] WSREP: Member 0.0 (172.30.4.220) requested state transfer from '*any*'. Selected 1.0 (172.30.4.156)(SYNCED) as donor.
2015-11-13 10:40:58 30484 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 563464)

IST was not available due to the state of the node (in this particular case, the MySQL data directory was not available).

2015-11-13 10:40:58 30484 [Note] WSREP: Requesting state transfer: success, donor: 1
WSREP_SST: [INFO] WARNING: Stale temporary SST directory: /var/lib/mysql//.sst from previous state transfer. Removing (20151113 10:40:58.430)
WSREP_SST: [INFO] Proceeding with SST (20151113 10:40:58.434)
WSREP_SST: [INFO] Evaluating socat -u TCP-LISTEN:4444,reuseaddr stdio | xbstream -x; RC=( ${PIPESTATUS[@]} ) (20151113 10:40:58.435)
WSREP_SST: [INFO] Cleaning the existing datadir and innodb-data/log directories (20151113 10:40:58.436)
removed ‘/var/lib/mysql/ibdata1’
removed ‘/var/lib/mysql/ib_logfile1’
removed ‘/var/lib/mysql/ib_logfile0’
removed ‘/var/lib/mysql/auto.cnf’
removed ‘/var/lib/mysql/mysql.sock’
WSREP_SST: [INFO] Waiting for SST streaming to complete! (20151113 10:40:58.568)

The remaining files from the data directory were removed, and the node started to wait for SST process to complete.

2015-11-13 10:41:00 30484 [Note] WSREP: (05e14925, 'tcp://0.0.0.0:4567') turning message relay requesting off
WSREP_SST: [ERROR] xtrabackup_checkpoints missing, failed innobackupex/SST on donor (20151113 10:41:08.407)
WSREP_SST: [ERROR] Cleanup after exit with status:2 (20151113 10:41:08.409)
2015-11-13 10:41:08 30484 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '172.30.4.220' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '' --parent '30484''' : 2 (No such file or directory)
2015-11-13 10:41:08 30484 [ERROR] WSREP: Failed to read uuid:seqno from joiner script.
2015-11-13 10:41:08 30484 [ERROR] WSREP: SST script aborted with error 2 (No such file or directory)
2015-11-13 10:41:08 30484 [ERROR] WSREP: SST failed: 2 (No such file or directory)
2015-11-13 10:41:08 30484 [ERROR] Aborting

2015-11-13 10:41:08 30484 [Warning] WSREP: 1.0 (172.30.4.156): State transfer to 0.0 (172.30.4.220) failed: -22 (Invalid argument)
2015-11-13 10:41:08 30484 [ERROR] WSREP: gcs/src/gcs_group.cpp:gcs_group_handle_join_msg():731: Will never receive state. Need to abort.

Unfortunately, SST failed on the donor node and, as a result, the Galera node aborted and the mysqld process stopped. There’s not much information here about the exact cause of the problem, but we’ve been pointed to the donor. Let’s take a look at its log.

2015-11-13 10:44:33 30400 [Note] WSREP: Member 0.0 (172.30.4.220) requested state transfer from '*any*'. Selected 1.0 (172.30.4.156)(SYNCED) as donor.
2015-11-13 10:44:33 30400 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 563464)
2015-11-13 10:44:33 30400 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2015-11-13 10:44:33 30400 [Note] WSREP: Running: 'wsrep_sst_xtrabackup-v2 --role 'donor' --address '172.30.4.220:4444/xtrabackup_sst//1' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '''' --gtid '98ed75de-7c05-11e5-9743-de4abc22bd11:563464''
2015-11-13 10:44:33 30400 [Note] WSREP: sst_donor_thread signaled with 0
WSREP_SST: [INFO] Streaming with xbstream (20151113 10:44:33.598)
WSREP_SST: [INFO] Using socat as streamer (20151113 10:44:33.600)
WSREP_SST: [INFO] Using /tmp/tmp.jZEi7YBrNl as xtrabackup temporary directory (20151113 10:44:33.613)
WSREP_SST: [INFO] Using /tmp/tmp.wz0XmveABt as innobackupex temporary directory (20151113 10:44:33.615)
WSREP_SST: [INFO] Streaming GTID file before SST (20151113 10:44:33.619)
WSREP_SST: [INFO] Evaluating xbstream -c ${INFO_FILE} | socat -u stdio TCP:172.30.4.220:4444; RC=( ${PIPESTATUS[@]} ) (20151113 10:44:33.621)
WSREP_SST: [INFO] Sleeping before data transfer for SST (20151113 10:44:33.626)
2015-11-13 10:44:35 30400 [Note] WSREP: (b7a335ea, 'tcp://0.0.0.0:4567') turning message relay requesting off
WSREP_SST: [INFO] Streaming the backup to joiner at 172.30.4.220 4444 (20151113 10:44:43.628)

In this part of the log, you can see some lines similar to the joiner’s log - 172.30.4.220 is the joiner and it requested a state transfer from 172.30.4.156 (our donor node). The donor switched to the Donor/Desynced state and xtrabackup was triggered. 

WSREP_SST: [INFO] Evaluating innobackupex --defaults-file=/etc/mysql/my.cnf  --defaults-group=mysqld --no-version-check  $tmpopts $INNOEXTRA --galera-info --stream=$sfmt $itmpdir 2>
${DATA}/innobackup.backup.log | socat -u stdio TCP:172.30.4.220:4444; RC=( ${PIPESTATUS[@]} ) (20151113 10:44:43.631)
2015-11-13 10:44:43 30400 [Warning] Access denied for user 'root'@'localhost' (using password: YES)
WSREP_SST: [ERROR] innobackupex finished with error: 1.  Check /var/lib/mysql//innobackup.backup.log (20151113 10:44:43.638)

At some point SST failed - we can see a hint about what may be the culprit of the problem. There’s also information about where we can look for further details - the innobackup.backup.log file in the MySQL data directory.

WSREP_SST: [ERROR] Cleanup after exit with status:22 (20151113 10:44:43.640)
WSREP_SST: [INFO] Cleaning up temporary directories (20151113 10:44:43.642)
2015-11-13 10:44:43 30400 [ERROR] WSREP: Failed to read from: wsrep_sst_xtrabackup-v2 --role 'donor' --address '172.30.4.220:4444/xtrabackup_sst//1' --socket '/var/lib/mysql/mysql.s
ock' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '''' --gtid '98ed75de-7c05-11e5-9743-de4abc22bd11:563464'
2015-11-13 10:44:43 30400 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'donor' --address '172.30.4.220:4444/xtrabackup_sst//1' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '''' --gtid '98ed75de-7c05-11e5-9743-de4abc22bd11:563464': 22 (Invalid argument)
2015-11-13 10:44:43 30400 [ERROR] WSREP: Command did not run: wsrep_sst_xtrabackup-v2 --role 'donor' --address '172.30.4.220:4444/xtrabackup_sst//1' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '''' --gtid '98ed75de-7c05-11e5-9743-de4abc22bd11:563464'
2015-11-13 10:44:43 30400 [Warning] WSREP: 1.0 (172.30.4.156): State transfer to 0.0 (172.30.4.220) failed: -22 (Invalid argument)
2015-11-13 10:44:43 30400 [Note] WSREP: Shifting DONOR/DESYNCED -> JOINED (TO: 563464)
2015-11-13 10:44:43 30400 [Note] WSREP: Member 1.0 (172.30.4.156) synced with group.
2015-11-13 10:44:43 30400 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 563464)

All SST-related processes terminate and the donor gets back in sync with the rest of the cluster, switching to the ‘Synced’ state.

Finally, let’s take a look at the innobackup.backup.log:

151113 14:00:57 innobackupex: Starting the backup operation

IMPORTANT: Please check that the backup run completes successfully.
           At the end of a successful backup run innobackupex
           prints "completed OK!".

151113 14:00:57 Connecting to MySQL server host: localhost, user: root, password: not set, port: 3306, socket: /var/lib/mysql/mysql.sock
Failed to connect to MySQL server: Access denied for user 'root'@'localhost' (using password: YES).

This time there’s nothing new - the problem is related to MySQL access credentials - such an issue should immediately point you to my.cnf and the ‘wsrep_sst_auth’ variable.
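The fix is to make sure wsrep_sst_auth refers to an account that actually exists, with the privileges xtrabackup needs. A minimal sketch, assuming a dedicated SST user - the user name, password and exact privilege list are placeholders and depend on your xtrabackup version:

# /etc/mysql/my.cnf on the Galera nodes
[mysqld]
wsrep_sst_method = xtrabackup-v2
wsrep_sst_auth   = "sstuser:s3cretpass"

Then create the matching account (run on one node; Galera replicates the statement):

mysql> GRANT RELOAD, LOCK TABLES, PROCESS, REPLICATION CLIENT ON *.* TO 'sstuser'@'localhost' IDENTIFIED BY 's3cretpass';
mysql> FLUSH PRIVILEGES;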

Problems with network streaming

Sometimes it may happen that streaming doesn’t work correctly, either because of a network error or because one of the processes involved has died. Let’s check how a network error impacts the SST process.

First error log snippet comes from the joiner node.

2015-11-13 14:30:46 24948 [Note] WSREP: Member 0.0 (172.30.4.220) requested state transfer from '*any*'. Selected 1.0 (172.30.4.156)(SYNCED) as donor.
2015-11-13 14:30:46 24948 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 563464)
2015-11-13 14:30:46 24948 [Note] WSREP: Requesting state transfer: success, donor: 1
WSREP_SST: [INFO] Proceeding with SST (20151113 14:30:46.352)
WSREP_SST: [INFO] Evaluating socat -u TCP-LISTEN:4444,reuseaddr stdio | xbstream -x; RC=( ${PIPESTATUS[@]} ) (20151113 14:30:46.353)
WSREP_SST: [INFO] Cleaning the existing datadir and innodb-data/log directories (20151113 14:30:46.356)
WSREP_SST: [INFO] Waiting for SST streaming to complete! (20151113 14:30:46.363)
2015-11-13 14:30:48 24948 [Note] WSREP: (201db672, 'tcp://0.0.0.0:4567') turning message relay requesting off
2015-11-13 14:31:01 24948 [Note] WSREP: (201db672, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://172.30.4.156:4567
2015-11-13 14:31:02 24948 [Note] WSREP: (201db672, 'tcp://0.0.0.0:4567') reconnecting to b7a335ea (tcp://172.30.4.156:4567), attempt 0
2015-11-13 14:33:02 24948 [Note] WSREP: (201db672, 'tcp://0.0.0.0:4567') reconnecting to b7a335ea (tcp://172.30.4.156:4567), attempt 30
2015-11-13 14:35:02 24948 [Note] WSREP: (201db672, 'tcp://0.0.0.0:4567') reconnecting to b7a335ea (tcp://172.30.4.156:4567), attempt 60
^@2015-11-13 14:37:02 24948 [Note] WSREP: (201db672, 'tcp://0.0.0.0:4567') reconnecting to b7a335ea (tcp://172.30.4.156:4567), attempt 90
^@2015-11-13 14:39:02 24948 [Note] WSREP: (201db672, 'tcp://0.0.0.0:4567') reconnecting to b7a335ea (tcp://172.30.4.156:4567), attempt 120
^@^@2015-11-13 14:41:02 24948 [Note] WSREP: (201db672, 'tcp://0.0.0.0:4567') reconnecting to b7a335ea (tcp://172.30.4.156:4567), attempt 150
^@^@2015-11-13 14:43:02 24948 [Note] WSREP: (201db672, 'tcp://0.0.0.0:4567') reconnecting to b7a335ea (tcp://172.30.4.156:4567), attempt 180
^@^@2015-11-13 14:45:02 24948 [Note] WSREP: (201db672, 'tcp://0.0.0.0:4567') reconnecting to b7a335ea (tcp://172.30.4.156:4567), attempt 210
^@^@2015-11-13 14:46:30 24948 [Warning] WSREP: 1.0 (172.30.4.156): State transfer to 0.0 (172.30.4.220) failed: -22 (Invalid argument)
2015-11-13 14:46:30 24948 [ERROR] WSREP: gcs/src/gcs_group.cpp:gcs_group_handle_join_msg():731: Will never receive state. Need to abort.
2015-11-13 14:46:30 24948 [Note] WSREP: gcomm: terminating thread
2015-11-13 14:46:30 24948 [Note] WSREP: gcomm: joining thread
2015-11-13 14:46:30 24948 [Note] WSREP: gcomm: closing backend
2015-11-13 14:46:30 24948 [Note] WSREP: view(view_id(NON_PRIM,201db672,149) memb {
    201db672,0
} joined {
} left {
} partitioned {
    b7a335ea,0
    fff6c307,0
})
2015-11-13 14:46:30 24948 [Note] WSREP: view((empty))
2015-11-13 14:46:30 24948 [Note] WSREP: gcomm: closed
2015-11-13 14:46:30 24948 [Note] WSREP: /usr/sbin/mysqld: Terminated.

As you can see, it’s obvious that SST failed:

2015-11-13 14:46:30 24948 [Warning] WSREP: 1.0 (172.30.4.156): State transfer to 0.0 (172.30.4.220) failed: -22 (Invalid argument)
2015-11-13 14:46:30 24948 [ERROR] WSREP: gcs/src/gcs_group.cpp:gcs_group_handle_join_msg():731: Will never receive state. Need to abort.

A few lines before that, though: 

2015-11-13 14:31:01 24948 [Note] WSREP: (201db672, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://172.30.4.156:4567
...
2015-11-13 14:45:02 24948 [Note] WSREP: (201db672, 'tcp://0.0.0.0:4567') reconnecting to b7a335ea (tcp://172.30.4.156:4567), attempt 210

The donor node was declared inactive - this is a very important indication of network problems. Given that Galera tried 210 times to communicate with the other node, it’s pretty clear this is not a transient error but something more serious. On the donor node, the error log contains the following entries that also make it clear something is not right with the network:

WSREP_SST: [INFO] Evaluating innobackupex --defaults-file=/etc/mysql/my.cnf  --defaults-group=mysqld --no-version-check  $tmpopts $INNOEXTRA --galera-info --stream=$sfmt $itmpdir 2>${DATA}/innobackup.backup.log | socat -u stdio TCP:172.30.4.220:4444; RC=( ${PIPESTATUS[@]} ) (20151113 14:30:56.713)
2015-11-13 14:31:02 30400 [Note] WSREP: (b7a335ea, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://172.30.4.220:4567
2015-11-13 14:31:03 30400 [Note] WSREP: (b7a335ea, 'tcp://0.0.0.0:4567') reconnecting to 201db672 (tcp://172.30.4.220:4567), attempt 0

...

2015-11-13 14:45:03 30400 [Note] WSREP: (b7a335ea, 'tcp://0.0.0.0:4567') reconnecting to 201db672 (tcp://172.30.4.220:4567), attempt 210
2015/11/13 14:46:30 socat[8243] E write(3, 0xc1a1f0, 8192): Connection timed out
2015-11-13 14:46:30 30400 [Warning] Aborted connection 76 to db: 'unconnected' user: 'root' host: 'localhost' (Got an error reading communication packets)
WSREP_SST: [ERROR] innobackupex finished with error: 1.  Check /var/lib/mysql//innobackup.backup.log (20151113 14:46:30.767)
WSREP_SST: [ERROR] Cleanup after exit with status:22 (20151113 14:46:30.769)
WSREP_SST: [INFO] Cleaning up temporary directories (20151113 14:46:30.771)

This time the disappearing node is 172.30.4.220 - the node which is joining the cluster. Finally, innobackup.backup.log looks as follows:

Warning: Using unique option prefix open_files instead of open_files_limit is deprecated and will be removed in a future release. Please use the full name instead.
151113 14:30:56 innobackupex: Starting the backup operation

IMPORTANT: Please check that the backup run completes successfully.
           At the end of a successful backup run innobackupex
           prints "completed OK!".

151113 14:30:56 Connecting to MySQL server host: localhost, user: root, password: not set, port: 3306, socket: /var/lib/mysql/mysql.sock
Using server version 5.6.26-74.0-56
innobackupex version 2.3.2 based on MySQL server 5.6.24 Linux (x86_64) (revision id: 306a2e0)
xtrabackup: uses posix_fadvise().
xtrabackup: cd to /var/lib/mysql
xtrabackup: open files limit requested 4000000, set to 1000000
xtrabackup: using the following InnoDB configuration:
xtrabackup:   innodb_data_home_dir = ./
xtrabackup:   innodb_data_file_path = ibdata1:100M:autoextend
xtrabackup:   innodb_log_group_home_dir = ./
xtrabackup:   innodb_log_files_in_group = 2
xtrabackup:   innodb_log_file_size = 536870912
xtrabackup: using O_DIRECT
151113 14:30:56 >> log scanned up to (8828922053)
xtrabackup: Generating a list of tablespaces
151113 14:30:57 [01] Streaming ./ibdata1
151113 14:30:57 >> log scanned up to (8828922053)
151113 14:30:58 >> log scanned up to (8828922053)
151113 14:30:59 [01]        ...done
151113 14:30:59 [01] Streaming ./mysql/innodb_table_stats.ibd
151113 14:30:59 >> log scanned up to (8828922053)
151113 14:31:00 >> log scanned up to (8828922053)
151113 14:31:01 >> log scanned up to (8828922053)
151113 14:31:02 >> log scanned up to (8828922053)

...

151113 14:46:27 >> log scanned up to (8828990096)
151113 14:46:28 >> log scanned up to (8828990096)
151113 14:46:29 >> log scanned up to (8828990096)
innobackupex: Error writing file 'UNOPENED' (Errcode: 32 - Broken pipe)
xb_stream_write_data() failed.
innobackupex: Error writing file 'UNOPENED' (Errcode: 32 - Broken pipe)
[01] xtrabackup: Error: xtrabackup_copy_datafile() failed.
[01] xtrabackup: Error: failed to copy datafile.

The pipe was broken because the network connection timed out.
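If you suspect connectivity on the SST port itself, you can test it manually with socat - the same tool the SST script uses for streaming. A rough check between the two hosts in this example could look like this:

# on the joiner (172.30.4.220), start a temporary listener on the SST port:
$ socat -u TCP-LISTEN:4444,reuseaddr STDOUT

# on the donor, push some data to it - if the network is fine, the text shows up on the joiner:
$ echo "connectivity test" | socat -u STDIO TCP:172.30.4.220:4444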

One interesting question - how can we track errors when one of the components required to perform SST dies? It can be anything from the xtrabackup process to socat, which is needed to stream the data. Let’s go over some examples:

2015-11-13 15:40:02 29258 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 563464)
2015-11-13 15:40:02 29258 [Note] WSREP: Requesting state transfer: success, donor: 0
WSREP_SST: [INFO] Proceeding with SST (20151113 15:40:02.497)
WSREP_SST: [INFO] Evaluating socat -u TCP-LISTEN:4444,reuseaddr stdio | xbstream -x; RC=( ${PIPESTATUS[@]} ) (20151113 15:40:02.499)
WSREP_SST: [INFO] Cleaning the existing datadir and innodb-data/log directories (20151113 15:40:02.499)
WSREP_SST: [INFO] Waiting for SST streaming to complete! (20151113 15:40:02.511)
2015-11-13 15:40:04 29258 [Note] WSREP: (cd0c84cd, 'tcp://0.0.0.0:4567') turning message relay requesting off
2015/11/13 15:40:13 socat[29525] E write(1, 0xf2d420, 8192): Broken pipe
/usr//bin/wsrep_sst_xtrabackup-v2: line 112: 29525 Exit 1                  socat -u TCP-LISTEN:4444,reuseaddr stdio
     29526 Killed                  | xbstream -x
WSREP_SST: [ERROR] Error while getting data from donor node:  exit codes: 1 137 (20151113 15:40:13.113)
WSREP_SST: [ERROR] Cleanup after exit with status:32 (20151113 15:40:13.116)

In the case described above, as can be clearly seen, xbstream was killed.

When the socat process gets killed, the joiner’s logs do not offer much data:

2015-11-13 15:43:05 9717 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 563464)
2015-11-13 15:43:05 9717 [Note] WSREP: Requesting state transfer: success, donor: 1
WSREP_SST: [INFO] Evaluating socat -u TCP-LISTEN:4444,reuseaddr stdio | xbstream -x; RC=( ${PIPESTATUS[@]} ) (20151113 15:43:05.438)
WSREP_SST: [INFO] Proceeding with SST (20151113 15:43:05.438)
WSREP_SST: [INFO] Cleaning the existing datadir and innodb-data/log directories (20151113 15:43:05.440)
WSREP_SST: [INFO] Waiting for SST streaming to complete! (20151113 15:43:05.448)
2015-11-13 15:43:07 9717 [Note] WSREP: (3a69eae9, 'tcp://0.0.0.0:4567') turning message relay requesting off
WSREP_SST: [ERROR] Error while getting data from donor node:  exit codes: 137 0 (20151113 15:43:11.442)
WSREP_SST: [ERROR] Cleanup after exit with status:32 (20151113 15:43:11.444)
2015-11-13 15:43:11 9717 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '172.30.4.220' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '' --parent '9717''' : 32 (Broken pipe)
2015-11-13 15:43:11 9717 [ERROR] WSREP: Failed to read uuid:seqno from joiner script.
2015-11-13 15:43:11 9717 [ERROR] WSREP: SST script aborted with error 32 (Broken pipe)
2015-11-13 15:43:11 9717 [ERROR] WSREP: SST failed: 32 (Broken pipe)
2015-11-13 15:43:11 9717 [ERROR] Aborting

We have just an indication that something went really wrong and that SST failed. On the donor node, things are a bit different:

WSREP_SST: [INFO] Streaming the backup to joiner at 172.30.4.220 4444 (20151113 15:43:15.875)
WSREP_SST: [INFO] Evaluating innobackupex --defaults-file=/etc/mysql/my.cnf  --defaults-group=mysqld --no-version-check  $tmpopts $INNOEXTRA --galera-info --stream=$sfmt $itmpdir 2>${DATA}/innobackup.backup.log | socat -u stdio TCP:172.30.4.220:4444; RC=( ${PIPESTATUS[@]} ) (20151113 15:43:15.878)
2015/11/13 15:43:15 socat[16495] E connect(3, AF=2 172.30.4.220:4444, 16): Connection refused
2015-11-13 15:43:16 30400 [Warning] Aborted connection 81 to db: 'unconnected' user: 'root' host: 'localhost' (Got an error reading communication packets)
WSREP_SST: [ERROR] innobackupex finished with error: 1.  Check /var/lib/mysql//innobackup.backup.log (20151113 15:43:16.305)
WSREP_SST: [ERROR] Cleanup after exit with status:22 (20151113 15:43:16.306)
WSREP_SST: [INFO] Cleaning up temporary directories (20151113 15:43:16.309)

We can find an ‘Aborted connection’ warning and information that innobackupex (the wrapper that’s used to execute xtrabackup) finished with an error. Still not that much hard data. Luckily, we also have innobackup.backup.log to check:

151113 15:46:59 >> log scanned up to (8829243081)
xtrabackup: Generating a list of tablespaces
151113 15:46:59 [01] Streaming ./ibdata1
innobackupex: Error writing file 'UNOPENED' (Errcode: 32 - Broken pipe)
xb_stream_write_data() failed.
innobackupex: Error writing file 'UNOPENED' (Errcode: 32 - Broken pipe)
[01] xtrabackup: Error: xtrabackup_copy_datafile() failed.
[01] xtrabackup: Error: failed to copy datafile.

This time it’s much clearer - we have an error in the xtrabackup_copy_datafile() function which, as you may expect, is responsible for copying files. This leads us to the conclusion that some kind of network connection issue happened between the donor and the joiner. Maybe not a detailed explanation, but the conclusion is still valid - killing the process which is on the other end of the network connection closes that connection abruptly, which can be described as a network issue of some kind.

We are approaching the end of this post - hopefully you enjoyed our small trip through the Galera logs. As you have probably figured out by now, it is important to check an issue from every possible angle when dealing with Galera errors. Galera nodes work together to form a cluster, but they are still separate hosts connected via a network. Sometimes looking at a single node’s logs is not enough to understand how the issue unfolded - you need to check every node’s point of view. This is also true for all potential issues with SST - sometimes it’s not enough to check the logs on the joiner node. You also have to look at the donor’s logs - both the error log and innobackup.backup.log. 

The examples of issues described in this post (and the one before) cover only a small part of all possible problems, but hopefully they’ll help you understand the troubleshooting process.

In the next post, we are going to take a closer look at pt-stalk - a tool which may help you understand what is going on with MySQL when the standard approach fails to catch the problem.



Time to vote! Severalnines talks and tutorials for Percona Live 2016!


The Percona Live Data Performance Conference (for MySQL and MongoDB users and more) is coming up in just a few months and talk submissions have been going strong judging by the social media activity.

As you might have seen from various communications, Percona are asking participants to vote upfront for the tutorials and talks that have been submitted for consideration in the conference programme.

We’ve been busy ourselves with submissions and we’d like to ask you to have a look at the content we submitted. If you like what you see and would like to find out more in Santa Clara, then please vote for your preferred Severalnines talks and/or tutorials below!

Thank you and we look forward to seeing you in April!

Tutorials

Become a MySQL DBA

This hands-on tutorial is intended to help you navigate your way through the steps that lead to becoming a MySQL DBA. We are going to talk about the most important aspects of managing MySQL infrastructure and we will be sharing best practices and tips on how to perform the most common activities.
Vote for this tutorial!

Become a Polyglot Persistence DBA with ClusterControl

This tutorial will cover the four major operational challenges when extending your infrastructure from only MySQL to deploying other storage backends: deployment, management, monitoring and scaling … and how to deal with them using Severalnines’ ClusterControl software. We will cover MySQL, PostgreSQL and MongoDB storage backends and provide a setup using virtuals where you can freely test upon.
Vote for this tutorial!

Talks

Docker and Galera: Stateful in a stateless World!

Docker is becoming more mainstream and adopted by users as a method to package and deploy self-sufficient applications in primarily stateless Linux containers. It's a great toolset on top of OS-level virtualization (LXC, a.k.a. containers) and plays well in the world of micro services. There are a number of ways to provide persistent storage in Docker containers and in this presentation we will talk about how to set up a persistent data service with Docker that can be torn down and brought up across hosts and containers.
Vote for this talk!

Load Balancers for MySQL: an overview and comparison of options

This session aims to give a solid grounding in load balancer technologies for MySQL and MariaDB. We will review the wide variety of open-source options available: from application connectors (php-mysqlnd, jdbc), TCP reverse proxies (HAproxy, Keepalived, Nginx) and SQL-aware load balancers (MaxScale, ProxySQL, MySQL Router), and look at what considerations you should make when assessing their suitability for your environment.
Vote for this talk!

MySQL (NDB) Cluster - Best Practices

In this session we will talk about the core architecture and design principles of NDB Cluster, APIs for data access (SQL and NoSQL interfaces), important configuration parameters, and best practices for indexing and schema. We will also compare performance between MySQL Cluster 7.4 and Galera (MySQL 5.6), and how to best make use of the feature set of MySQL Cluster 7.4.
Vote for this talk!

Performance Analysis and Auto-tuning of MySQL using DBA-Minions

In this session, we will introduce you to a new tool that can be used to develop DBA-Minions. The tool is available freely to the wider MySQL community, and a number of DBA-Minions for analysing database performance are already available to download from GitHub. At the end of the session, attendees will learn how to use these DBA-Minions, modify them, or create new ones using an integrated IDE.
Vote for this talk!

How to automate, monitor and manage your MongoDB servers

The business model of the company behind MongoDB is to sell premium support and administration tools to maintain and monitor MongoDB. As an alternative there are (open source) solutions that could make your life as a DBA easier. In this session, we will go beyond the deployment phase and show you how you can automate tasks, how to monitor a cluster and how to manage MongoDB overall.
Vote for this talk!

Polyglot Persistence for the MySQL & MongoDB DBA

The introduction of DevOps in organisations has changed the development process, and perhaps introduced some challenges. Developers, in addition to their own preferred programming languages, also have their own preference for backend storage. Extending your infrastructure from only MySQL, to deploying other storage backends like MongoDB and PostgreSQL, implies you have to also monitor, manage and scale them. In this session, we will show you how!
Vote for this talk!

About the speakers

Alex Yu

Alex Yu, VP Products, Severalnines

Alex is the VP of Products at Severalnines, responsible for all product related strategy and operations. Prior to Severalnines, Alex was Master Principal Sales Consultant at MySQL/Sun Microsystems/Oracle in the APAC region, where he worked with some of the region's largest telecoms service providers and network equipment manufacturers to build massively scalable database infrastructures. He previously held key development roles in various startups, and was part of the original MySQL Cluster development team at Ericsson Alzato, which MySQL acquired in 2003.

Johan Andersson

Johan Andersson, CTO, Severalnines

Johan is CTO at Severalnines, a company that enables developers to easily deploy, manage, monitor and scale highly-available MySQL clusters in the data center, in hosted environments and on the cloud. Prior to Severalnines, Johan worked at MySQL/Sun/Oracle and was the Principal Consultant and lead of the MySQL Clustering and High Availability consulting group, where he designed and implemented large-scale MySQL systems at key customers.

Art van Scheppingen

Art van Scheppingen, Senior Support Engineer, Severalnines

Art is a pragmatic MySQL and Database expert with over 15 years of experience in web development. He previously worked at Spil Games as Head of Database Engineering, where he maintained a broad view of the whole database environment: from MySQL to Couchbase, Vertica to Hadoop and from Sphinx Search to SOLR. He regularly presents his work and projects at various conferences (Percona Live, FOSDEM) and related meetups.

Ashraf Sharif

Ashraf Sharif, System Support Engineer, Severalnines

Ashraf is a System Support Engineer at Severalnines. He was previously involved in the hosting world and the LAMP stack, where he worked as principal consultant and head of a support team, delivering clustering solutions for large websites in the South East Asia region. His professional interests are in system scalability and high availability.

See you all in Santa Clara!


Planets9s: Building scalable database infrastructures with MariaDB & HAProxy


Welcome to this week’s Planets9s, covering all the latest resources and technologies we create around automation and management of open source databases.

Watch the replay: How to build scalable database infrastructures with MariaDB & HAProxy

You can now sign up to watch the replay of this week’s webinar with our partner WooServers - How CloudStats.me moved from MySQL to clustered MariaDB for High Availability. This webinar covered how CloudStats.me evaluated solutions from NDB Cluster to MySQL Replication with application sharding in order to scale MySQL write performance.  

Sign up to watch the replay.

MySQL Replication failover: Maxscale vs MHA (Parts 1 to 3)

At present, the most commonly used products for automated failover are MySQL Master HA (aka MHA) and Percona Replication Manager, but newer options like Orchestrator and MaxScale + MariaDB Replication Manager have also become available lately. In this three-part blog series, we first focus on MySQL Master HA (MHA), in the second part we cover MaxScale + MariaDB Replication Manager, and the final part compares the two with each other.

Polyglot Persistence for the MongoDB, PostgreSQL & MySQL DBA

The introduction of DevOps in organisations has changed the development process, and perhaps introduced some challenges. Developers, in addition to their own preferred programming languages, also have their own preference for backend storage. The former is often referred to as polyglot languages and the latter as polyglot persistence. This webinar covered the four major operational challenges for MySQL, MongoDB & PostgreSQL - deployment, management, monitoring and scaling - and how to deal with them.

Sign up to watch the replay!

Do share these resources with your colleagues and follow us in our social media channels.

Have a good end of the week,

Jean-Jérôme Schmidt
Planets9s Editor
Severalnines AB

Severalnines Launches #MySQLHA CrowdChat


Today we launch our live CrowdChat on everything #MySQLHA!

This CrowdChat is brought to you by Severalnines and is hosted by a community of subject matter experts. CrowdChat is a community platform that works across Facebook, Twitter, and LinkedIn to allow users to discuss a topic using a specific #hashtag. This crowdchat focuses on the hashtag #MySQLHA. So if you’re a DBA, architect, CTO, or a database novice register to join and become part of the conversation!

Join this online community to interact with experts on Galera clusters. Get your questions answered and join the conversation around everything #MySQLHA.

Register free

Meet the experts

Art van Scheppingen is a Senior Support Engineer at Severalnines. He’s a pragmatic MySQL and Database expert with over 15 years of experience in web development. He previously worked at Spil Games as Head of Database Engineering, where he maintained a broad view of the whole database environment: from MySQL to Couchbase, Vertica to Hadoop and from Sphinx Search to SOLR. He regularly presents his work and projects at various conferences (Percona Live, FOSDEM) and related meetups.

Krzysztof Książek, a Senior Support Engineer at Severalnines, is a MySQL DBA with experience managing complex database environments for companies like Zendesk, Chegg, Pinterest and Flipboard.

Ashraf Sharif is a System Support Engineer at Severalnines. He has previously worked as principal consultant and head of support team and delivered clustering solutions for big websites in the South East Asia region. His professional interests focus on system scalability and high availability.

Vinay Joosery is a passionate advocate and builder of concepts and businesses around Big Data computing infrastructures. Prior to co-founding Severalnines, Vinay held the post of Vice-President EMEA at Pentaho Corporation - the Open Source BI leader. He has also held senior management roles at MySQL / Sun Microsystems / Oracle, where he headed the Global MySQL Telecoms Unit, and built the business around MySQL's High Availability and Clustering product lines. Prior to that, Vinay served as Director of Sales & Marketing at Ericsson Alzato, an Ericsson-owned venture focused on large scale real-time databases.

Planets9s - How to monitor MongoDB, what to test when upgrading to MySQL 5.7 and more


Welcome to this week’s Planets9s, covering all the latest resources and technologies we create around automation and management of open source database infrastructures.

New webinar on how to monitor MongoDB (if you’re really a MySQL DBA)

As you may know, MongoDB offers many metrics through various status overviews and commands, but which ones really matter to you? How do you trend and alert on them? What is the meaning behind the metrics? In this new webinar on July 12th, we’ll discuss the most important ones and describe them in ordinary plain MySQL DBA language. And we’ll have a look at the (open source) solutions available for MongoDB monitoring and trending, including how to leverage ClusterControl’s MongoDB metrics, dashboards, custom alerting and other features to track and optimize the performance of your system.

Sign up for the webinar

What to test before upgrading to MySQL 5.7

In this blog post, we look at some of the important things to keep in mind when preparing and performing tests for an upgrade to MySQL 5.7. As we saw in a previous post, there are some important changes between MySQL 5.6 and 5.7. Since the behaviour of some existing features has been altered, some detailed testing of the upgrade is in order. This new blog post (and associated whitepaper) shows you how.

Read the blog

Deploy & monitor MySQL Galera clusters on Digital Ocean with NinesControl

Designed with the needs of developers in mind, NinesControl enables users to easily deploy and monitor (MySQL) Galera clusters on DigitalOcean. Droplets are launched and managed using your own DigitalOcean account. We’d love to get your feedback on this new solution if you haven’t tested it yet, so please check it out and let us know what you think.

Try NinesControl

Become a PostgreSQL DBA: Provisioning & Deployment

In this blog post, we address the following questions from the MySQL DBA standpoint: why use both MySQL and PostgreSQL in one environment? Is there any value in having a replicated PostgreSQL setup running alongside a Galera Cluster? We also discuss different methods of deploying PostgreSQL.

Read the blog

That’s it for this week! Feel free to share these resources with your colleagues and follow us in our social media channels.

Have a good end of the week,

Jean-Jérôme Schmidt
Planets9s Editor
Severalnines AB

Press Release: Severalnines kicks off online European football streaming


Award-winning database management platform scores deal with continent’s largest online video solutions provider

Stockholm, Sweden and anywhere else in the world - 28/06/2016 - Severalnines, Europe’s leading database performance management provider, today announced its latest customer, StreamAMG (Advanced Media Group), a UK-based pioneer in the field of bespoke online video streaming and content management. StreamAMG is Europe’s largest player in online video solutions, helping football teams such as Liverpool FC, Aston Villa, Sunderland AFC and the BBC keep fans watching from across the world.

Long hailed as the future of online content, analysts predict that 90% of all consumer internet traffic will be video by 2019. This poses a challenge to streaming providers, both in terms of the amount of online video data to handle and the variety of ways the content is consumed. Customers expect a seamless viewing experience across any device on any operating system. Downtime, lag or disturbances to streaming can have serious repercussions for customer loyalty. Streaming providers should provide a secure and reliable media platform to maintain the interest of fans and attract new viewers, casting database performance in a starring role.

Founded in 2001, StreamAMG builds bespoke solutions for its customers to host and manage online video content. Its software delivers the high-availability needed for on-demand streaming or live broadcasting on any device. Loss of customer trust and damage to brand reputation are likely consequences of database failures, especially for those companies which operate in the online sports, betting and gaming industries.

Growing at 30% year on year required StreamAMG to have a scalable IT system to meet new customer demands and to maintain its leadership position in the market. StreamAMG reviewed its database performance as part of its IT infrastructure renewal project to encompass new online channels, such as social media, and to embed marketing analytics to help its customers better understand and react to customer behaviour. It needed a solution to monitor and optimise its database management system, with the detailed metrics to predict database failures.

After reviewing options provided by Oracle and AWS, amongst others, StreamAMG chose Severalnines to help future-proof its databases. The previous environment, based on a master-slave replication topology, was replaced with a multi-master Galera Cluster; and Severalnines’ ClusterControl platform was applied to automate operational tasks and provide visibility of uptime and performance through monitoring capabilities.

Thom Holliday, Marketing Manager StreamAMG, said: “With ClusterControl in place, StreamAMG’s flagship product is now backed with a fully automated database infrastructure which allows us to ensure excellent uptime. Severalnines increased our streaming speed by 76% and this has greatly improved the delivery of content to our customers. The implementation took only two months to complete and saved us 12% in costs. Expanding the current use of ClusterControl is definitely in the pipeline and we would love to work with Severalnines to develop new features.”

Vinay Joosery, Severalnines Founder and CEO, said: “Online video streaming is growing exponentially, and audiences expect quality, relevant content and viewing experiences tailor-made for each digital platform. I’m a big football fan myself and like to stay up to date with games whenever I can. Right now I’m following the European Championships and online streaming is key so I can watch the matches wherever I am. New types of viewerships place certain requirements on modern streaming platforms to create experiences that align with consumer expectations. StreamAMG is leading the way there, and helps its customers monetise online channels through a solidly architected video platform. We’re happy to be part of this.“

About Severalnines

Severalnines provides automation and management software for database clusters. We help companies deploy their databases in any environment, and manage all operational aspects to achieve high-scale availability.

Severalnines' products are used by developers and administrators of all skill levels to provide the full 'deploy, manage, monitor, scale' database cycle, thus freeing them from the complexity and learning curves that are typically associated with highly available database clusters. The company has enabled over 8,000 deployments to date via its popular online database configurator. Customers currently include BT, Orange, Cisco, CNRS, Technicolour, AVG, Ping Identity and Paytrail. Severalnines is a private company headquartered in Stockholm, Sweden with offices in Singapore and Tokyo, Japan. To see who is using Severalnines today visit: http://www.severalnines.com/customers

About StreamAMG

StreamAMG helps businesses manage their online video solutions, such as Hosting video, integrating platforms, monetizing content and delivering live events. Since 2001, it has enabled clients across Europe to communicate through webcasting by building online video solutions to meet their goals.

For more information visit: https://www.streamamg.com

Media Contact

Positive Marketing
Steven de Waal / Camilla Nilsson
severalnines@positivemarketing.com
0203 637 0647/0645

Planets9s - MySQL on Docker: Building the Container Images, Monitoring MongoDB and more


Welcome to this week’s Planets9s, covering all the latest resources and technologies we create around automation and management of open source database infrastructures.

MySQL on Docker: Building the Container Image

Building a docker image for MySQL is essential if you’d like to customize MySQL to suit your needs. In this second post of our ‘MySQL on Docker’ series, we show you two ways to build your own MySQL Docker image - changing a base image and committing, or using Dockerfile. We show you how to extend the Docker team’s MySQL image, and add Percona XtraBackup to it.

Read the blog

Sign up for our webinar on Monitoring MongoDB - Tuesday July 12th

MongoDB offers many metrics through various status overviews or commands, and as MySQL DBA, it might be a little unfamiliar ground to get started with. In this webinar on July 12th, we’ll discuss the most important ones and describe them in ordinary plain MySQL DBA language. We’ll have a look at the open source tools available for MongoDB monitoring and trending. And we’ll show you how to leverage ClusterControl’s MongoDB metrics, dashboards, custom alerting and other features to track and optimize the performance of your system.

Sign up for the webinar

StreamAMG chooses ClusterControl to support its online European football streaming

This week we’re delighted to announce a new ClusterControl customer, StreamAMG (Advanced Media Group), Europe’s largest player in online video solutions, helping football teams such as Liverpool FC, Aston Villa, Sunderland AFC and the BBC keep fans watching from across the world. StreamAMG replaced its previous environment, based on a master-slave replication topology, with a multi-master Galera Cluster; and Severalnines’ ClusterControl platform was applied to automate operational tasks and provide visibility of uptime and performance through monitoring capabilities.

Read the story

That’s it for this week! Feel free to share these resources with your colleagues and follow us in our social media channels.

Have a good end of the week,

Jean-Jérôme Schmidt
Planets9s Editor
Severalnines AB

How to set up read-write split in Galera Cluster using ProxySQL


Edited on Sep 12, 2016 to correct the description of how ProxySQL handles session variables. Many thanks to Francisco Miguel for pointing this out.


ProxySQL is becoming more and more popular as an SQL-aware load balancer for MySQL and MariaDB. In previous blog posts, we covered the installation of ProxySQL and its configuration in a MySQL replication environment. We’ve covered how to set up ProxySQL to perform failovers executed from ClusterControl. At that time, Galera support in ProxySQL was a bit limited - you could configure Galera Cluster and split traffic across all nodes, but there was no easy way to implement read-write split of your traffic. The only way to do that was to create a daemon which would monitor Galera state and update weights of backend servers defined in ProxySQL - a much more complex task than writing a small bash script.

In one of the recent ProxySQL releases, a very important feature was added - a scheduler, which allows you to execute external scripts from within ProxySQL as often as every millisecond (well, as long as your script can execute within this time frame). This feature creates an opportunity to extend ProxySQL and implement setups which were not easy to build in the past due to the low granularity of the cron schedule. In this blog post, we will show you how to take advantage of this new feature and create a Galera Cluster with read-write split performed by ProxySQL.

First, we need to install and start ProxySQL:

[root@ip-172-30-4-215 ~]# wget https://github.com/sysown/proxysql/releases/download/v1.2.1/proxysql-1.2.1-1-centos7.x86_64.rpm

[root@ip-172-30-4-215 ~]# rpm -i proxysql-1.2.1-1-centos7.x86_64.rpm
[root@ip-172-30-4-215 ~]# service proxysql start
Starting ProxySQL: DONE!

Next, we need to download a script which we will use to monitor Galera status. Currently it has to be downloaded separately but in the next release of ProxySQL it should be included in the rpm. The script needs to be located in /var/lib/proxysql.

[root@ip-172-30-4-215 ~]# wget https://raw.githubusercontent.com/sysown/proxysql/master/tools/proxysql_galera_checker.sh

[root@ip-172-30-4-215 ~]# mv proxysql_galera_checker.sh /var/lib/proxysql/
[root@ip-172-30-4-215 ~]# chmod u+x /var/lib/proxysql/proxysql_galera_checker.sh

If you are not familiar with this script, you can check what arguments it accepts by running:

[root@ip-172-30-4-215 ~]# /var/lib/proxysql/proxysql_galera_checker.sh
Usage: /var/lib/proxysql/proxysql_galera_checker.sh <hostgroup_id write> [hostgroup_id read] [number writers] [writers are readers 0|1] [log_file]

As we can see, we need to pass a couple of arguments - the hostgroups for writers and readers, and the number of writers which should be active at the same time. We also need to pass information about whether writers can be used as readers and, finally, the path to a log file.
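To make this concrete, with the hostgroups we are about to configure (0 for writers, 1 for readers), one active writer, writers also serving reads, and a log file under /var/lib/proxysql, the script would be invoked like this (matching the scheduler entry we add later in this post):

$ /var/lib/proxysql/proxysql_galera_checker.sh 0 1 1 1 /var/lib/proxysql/proxysql_galera_checker.log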

Next, we need to connect to ProxySQL’s admin interface. For that you need to know credentials - you can find them in a configuration file, typically located in /etc/proxysql.cnf:

admin_variables=
{
        admin_credentials="admin:admin"
        mysql_ifaces="127.0.0.1:6032;/tmp/proxysql_admin.sock"
#       refresh_interval=2000
#       debug=true
}

Knowing the credentials and interfaces on which ProxySQL listens, we can connect to the admin interface and begin configuration.

[root@ip-172-30-4-215 ~]# mysql -P6032 -uadmin -padmin -h 127.0.0.1

First, we need to fill mysql_servers table with information about our Galera nodes. We will add them twice, to two different hostgroups. One hostgroup (with hostgroup_id of 0) will handle writes while the second hostgroup (with hostgroup_id of 1) will handle reads.

MySQL [(none)]> INSERT INTO mysql_servers (hostgroup_id, hostname, port) VALUES (0, '172.30.4.238', 3306), (0, '172.30.4.184', 3306), (0, '172.30.4.67', 3306);
Query OK, 3 rows affected (0.00 sec)

MySQL [(none)]> INSERT INTO mysql_servers (hostgroup_id, hostname, port) VALUES (1, '172.30.4.238', 3306), (1, '172.30.4.184', 3306), (1, '172.30.4.67', 3306);
Query OK, 3 rows affected (0.00 sec)

Next, we need to add information about users which will be used by the application. We used a plain text password here, but ProxySQL also accepts hashed passwords in MySQL format.

MySQL [(none)]> INSERT INTO mysql_users (username, password, active, default_hostgroup) VALUES ('sbtest', 'sbtest', 1, 0);
Query OK, 1 row affected (0.00 sec)

What’s important to keep in mind is the default_hostgroup setting - we set it to ‘0’ which means that, unless one of the query rules says otherwise, all queries will be sent to hostgroup 0 - our writers.

At this point we need to define query rules which will handle read/write split. First, we want to match all SELECT queries:

MySQL [(none)]> INSERT INTO mysql_query_rules (active, match_pattern, destination_hostgroup, apply) VALUES (1, '^SELECT.*', 1, 0);
Query OK, 1 row affected (0.00 sec)

It is important to make sure you get the regex correct. It is also crucial to note that we set the ‘apply’ column to ‘0’. This means that our rule won’t be the final one - a query, even if it matches the regex, will be tested against the next rule in the chain. You can see why we’ve done that when you look at our second rule:

MySQL [(none)]> INSERT INTO mysql_query_rules (active, match_pattern, destination_hostgroup, apply) VALUES (1, '^SELECT.*FOR UPDATE', 0, 1);
Query OK, 1 row affected (0.00 sec)

We are looking for SELECT … FOR UPDATE queries, that’s why we couldn’t just finish checking our SELECT queries on the first rule. SELECT … FOR UPDATE should be routed to our write hostgroup, where UPDATE will happen.

Those settings will work fine if autocommit is enabled and no explicit transactions are used. If your application uses transactions, one method to make them work safely with ProxySQL is to use the following set of queries:

SET autocommit=0;
BEGIN;
...

The transaction is created and it will stick to the host where it was opened. You also need a query rule for BEGIN which routes it to the hostgroup for writers - in our case we leverage the fact that, by default, all queries executed by the ‘sbtest’ user are routed to the writers’ hostgroup (‘0’), so there’s no need to add anything; a hypothetical rule for other setups is sketched below.
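If your user’s default hostgroup was not the writer hostgroup, a hypothetical rule along the lines below (following the same mysql_query_rules pattern used throughout this post) would route BEGIN statements to the writers:

MySQL [(none)]> INSERT INTO mysql_query_rules (active, match_pattern, destination_hostgroup, apply) VALUES (1, '^BEGIN', 0, 1);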

The second method would be to enable persistent transactions for our user (the transaction_persistent column in the mysql_users table should be set to ‘1’).
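As a quick sketch, enabling it for the ‘sbtest’ user we created earlier boils down to updating that column - the change takes effect once the users are loaded to runtime, which we do at the end of this post:

MySQL [(none)]> UPDATE mysql_users SET transaction_persistent=1 WHERE username='sbtest';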

ProxySQL’s handling of other SET statements and user-defined variables is another thing we’d like to discuss a bit here. ProxySQL works on two levels of routing. First - query rules. You need to make sure all your queries are routed according to your needs. Then, connection multiplexing - even when routed to the same host, every query you issue may in fact use a different connection to the backend. This makes things hard for session variables. Luckily, ProxySQL treats all queries containing the ‘@’ character in a special way - once it detects it, it disables connection multiplexing for the duration of that session. Thanks to that, we don’t have to worry that the next query won’t know anything about our session variables.

The only thing we need to make sure of is that we end up in the correct hostgroup before disabling connection multiplexing. To cover all cases, the ideal hostgroup in our setup would be the one with writers. This requires a slight change in the way we set our query rules (you may need to run ‘DELETE FROM mysql_query_rules’ if you already added the query rules we mentioned earlier).

MySQL [(none)]> INSERT INTO mysql_query_rules (active, match_pattern, destination_hostgroup, apply) VALUES (1, '.*@.*', 0, 1);
Query OK, 1 row affected (0.00 sec)

MySQL [(none)]> INSERT INTO mysql_query_rules (active, match_pattern, destination_hostgroup, apply) VALUES (1, '^SELECT.*', 1, 0);
Query OK, 1 row affected (0.00 sec)

MySQL [(none)]> INSERT INTO mysql_query_rules (active, match_pattern, destination_hostgroup, apply) VALUES (1, '^SELECT.*FOR UPDATE', 0, 1);
Query OK, 1 row affected (0.00 sec)

Those two cases could become a problem in our setup, but as long as you are not affected by them (or if you used the proposed workarounds), we can proceed with the configuration. We still need to set up our script to be executed from ProxySQL:

MySQL [(none)]> INSERT INTO scheduler (id, active, interval_ms, filename, arg1, arg2, arg3, arg4, arg5) VALUES (1, 1, 1000, '/var/lib/proxysql/proxysql_galera_checker.sh', 0, 1, 1, 1, '/var/lib/proxysql/proxysql_galera_checker.log');
Query OK, 1 row affected (0.01 sec)

Additionally, because of the way Galera handles dropped nodes, we want to increase the number of attempts that ProxySQL will make before it decides a host cannot be reached.

MySQL [(none)]> SET mysql-query_retries_on_failure=10;
Query OK, 1 row affected (0.00 sec)
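If you want to verify that the change was picked up, the admin interface exposes it through the global_variables table; a quick sanity check could look like this:

MySQL [(none)]> SELECT variable_name, variable_value FROM global_variables WHERE variable_name = 'mysql-query_retries_on_failure';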

Finally, we need to apply all changes we made to the runtime configuration and save them to disk.

MySQL [(none)]> LOAD MYSQL USERS TO RUNTIME; SAVE MYSQL USERS TO DISK; LOAD MYSQL QUERY RULES TO RUNTIME; SAVE MYSQL QUERY RULES TO DISK; LOAD MYSQL SERVERS TO RUNTIME; SAVE MYSQL SERVERS TO DISK; LOAD SCHEDULER TO RUNTIME; SAVE SCHEDULER TO DISK; LOAD MYSQL VARIABLES TO RUNTIME; SAVE MYSQL VARIABLES TO DISK;
Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.02 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.02 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.02 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.01 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 64 rows affected (0.05 sec)

Ok, let’s see how things work together. First, verify that our script works by looking at /var/lib/proxysql/proxysql_galera_checker.log:

Fri Sep  2 21:43:15 UTC 2016 Check server 0:172.30.4.184:3306 , status ONLINE , wsrep_local_state 4
Fri Sep  2 21:43:15 UTC 2016 Check server 0:172.30.4.238:3306 , status OFFLINE_SOFT , wsrep_local_state 4
Fri Sep  2 21:43:15 UTC 2016 Changing server 0:172.30.4.238:3306 to status ONLINE
Fri Sep  2 21:43:15 UTC 2016 Check server 0:172.30.4.67:3306 , status OFFLINE_SOFT , wsrep_local_state 4
Fri Sep  2 21:43:15 UTC 2016 Changing server 0:172.30.4.67:3306 to status ONLINE
Fri Sep  2 21:43:15 UTC 2016 Check server 1:172.30.4.184:3306 , status ONLINE , wsrep_local_state 4
Fri Sep  2 21:43:15 UTC 2016 Check server 1:172.30.4.238:3306 , status ONLINE , wsrep_local_state 4
Fri Sep  2 21:43:16 UTC 2016 Check server 1:172.30.4.67:3306 , status ONLINE , wsrep_local_state 4
Fri Sep  2 21:43:16 UTC 2016 Number of writers online: 3 : hostgroup: 0
Fri Sep  2 21:43:16 UTC 2016 Number of writers reached, disabling extra write server 0:172.30.4.238:3306 to status OFFLINE_SOFT
Fri Sep  2 21:43:16 UTC 2016 Number of writers reached, disabling extra write server 0:172.30.4.67:3306 to status OFFLINE_SOFT
Fri Sep  2 21:43:16 UTC 2016 Enabling config

Looks good. Next, we can check the mysql_servers table:

MySQL [(none)]> select hostgroup_id, hostname, status from mysql_servers;
+--------------+--------------+--------------+
| hostgroup_id | hostname     | status       |
+--------------+--------------+--------------+
| 0            | 172.30.4.238 | OFFLINE_SOFT |
| 0            | 172.30.4.184 | ONLINE       |
| 0            | 172.30.4.67  | OFFLINE_SOFT |
| 1            | 172.30.4.238 | ONLINE       |
| 1            | 172.30.4.184 | ONLINE       |
| 1            | 172.30.4.67  | ONLINE       |
+--------------+--------------+--------------+
6 rows in set (0.00 sec)

Again, everything looks as expected - one host is taking writes (172.30.4.184), while all three are handling reads. Let’s start sysbench to generate some traffic, and then we can check how ProxySQL handles a failure of the writer host.

[root@ip-172-30-4-215 ~]# while true ; do sysbench --test=/root/sysbench/sysbench/tests/db/oltp.lua --num-threads=6 --max-requests=0 --max-time=0 --mysql-host=172.30.4.215 --mysql-user=sbtest --mysql-password=sbtest --mysql-port=6033 --oltp-tables-count=32 --report-interval=1 --oltp-skip-trx=on --oltp-read-only=off --oltp-table-size=100000  run ;done

We are going to simulate a crash by killing the mysqld process on host 172.30.4.184. This is what you’ll see on the application side:

[  45s] threads: 6, tps: 0.00, reads: 4891.00, writes: 1398.00, response time: 23.67ms (95%), errors: 0.00, reconnects:  0.00
[  46s] threads: 6, tps: 0.00, reads: 4973.00, writes: 1425.00, response time: 25.39ms (95%), errors: 0.00, reconnects:  0.00
[  47s] threads: 6, tps: 0.00, reads: 5057.99, writes: 1439.00, response time: 22.23ms (95%), errors: 0.00, reconnects:  0.00
[  48s] threads: 6, tps: 0.00, reads: 2743.96, writes: 774.99, response time: 23.26ms (95%), errors: 0.00, reconnects:  0.00
[  49s] threads: 6, tps: 0.00, reads: 0.00, writes: 1.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[  50s] threads: 6, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[  51s] threads: 6, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[  52s] threads: 6, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[  53s] threads: 6, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[  54s] threads: 6, tps: 0.00, reads: 1235.02, writes: 354.01, response time: 6134.76ms (95%), errors: 0.00, reconnects:  0.00
[  55s] threads: 6, tps: 0.00, reads: 5067.98, writes: 1459.00, response time: 24.95ms (95%), errors: 0.00, reconnects:  0.00
[  56s] threads: 6, tps: 0.00, reads: 5131.00, writes: 1458.00, response time: 22.07ms (95%), errors: 0.00, reconnects:  0.00
[  57s] threads: 6, tps: 0.00, reads: 4936.02, writes: 1414.00, response time: 22.37ms (95%), errors: 0.00, reconnects:  0.00
[  58s] threads: 6, tps: 0.00, reads: 4929.99, writes: 1404.00, response time: 24.79ms (95%), errors: 0.00, reconnects:  0.00

There’s a ~5 second break but otherwise no error was reported. Of course, your mileage may vary - it all depends on your Galera settings and your application. Such a feat might not be possible if you use transactions in your application.

To summarize, we showed you how to configure a read-write split in Galera Cluster using ProxySQL. There are a couple of limitations due to the way the proxy works, but as long as none of them are a blocker, you can use it and benefit from other ProxySQL features like caching or query rewriting. Please also keep in mind that the script we used for setting up the read-write split is just an example which comes with ProxySQL. If you’d like it to cover more complex cases, you can easily write one tailored to your needs.


Planets9s - Try the new ClusterControl 1.3.2 with its new deployment wizard


Welcome to this week’s Planets9s, covering all the latest resources and technologies we create around automation and management of open source database infrastructures.

Try the new ClusterControl 1.3.2 with its new deployment wizard

This week we’re delighted to announce the release of ClusterControl 1.3.2, which includes a new alarm viewer and a new deployment wizard for MySQL, MongoDB & PostgreSQL, making it ever easier to deploy your favourite open source databases; it also includes deployment of MongoDB sharded clusters as well as MongoDB advisors. If you haven’t tried it out yet, now is the time to download this latest release and provide us with your feedback.

Download the new ClusterControl

New partnership with WooServers helps start-ups challenge Google, Amazon and Microsoft

In addition to announcing the new ClusterControl 1.3.2 this week, we’ve also officially entered into a new partnership with WooServers to bring ClusterControl to web hosting. WooServers is a web hosting platform, used by 5,500 businesses, such as WhiteSharkMedia and SwiftServe to host their websites and applications. With ClusterControl, WooServers makes available a managed service that includes comprehensive infrastructure automation and management of MySQL-based database clusters. The service is available on WooServers data centers, as well as on Amazon Web Services and Microsoft Azure.

Find out more

Sign up for Part 2 of our MySQL Query Tuning Trilogy: Indexing and EXPLAIN

You can now sign up for Part 2 of our webinar trilogy on MySQL Query Tuning. In this follow up webinar to the one on process and tools, we’ll cover topics such as SQL tuning, indexing, the optimizer and how to leverage EXPLAIN to gain insight into execution plans. More specifically, we’ll look at how B-Tree indexes are built, indexes in MyISAM vs. InnoDB, different index types such as B-Tree, Fulltext and Hash, indexing gotchas, and an EXPLAIN walkthrough of a query execution plan.

Sign up today

How to set up read-write split in Galera Cluster using ProxySQL

ProxySQL is an SQL-aware load balancer for MySQL and MariaDB. A scheduler was recently added, making it possible to execute external scripts from within ProxySQL. In this new blog post, we’ll show you how to take advantage of this new feature to perform read-write splits on your Galera Cluster.

Read the blog

That’s it for this week! Feel free to share these resources with your colleagues and follow us in our social media channels.

Have a good end of the week,

Jean-Jérôme Schmidt
Planets9s Editor
Severalnines AB

Planets9s - Download our new ‘Database Sharding with MySQL Fabric’ whitepaper


Welcome to this week’s Planets9s, covering all the latest resources and technologies we create around automation and management of open source database infrastructures.

Download our new whitepaper: Database Sharding with MySQL Fabric

Database systems with large data sets or high throughput applications can challenge the capacity of a single database server, and sharding is a way to address that. Spreading your database across multiple servers sounds good, but how does this work in practice?

In this whitepaper, we will have a close look at MySQL Fabric. You will learn the basics, and also learn how to migrate to a sharded environment.

Download the whitepaper

Sign up for our 9 DevOps Tips for going in production with Galera Cluster for MySQL / MariaDB webinar

Operations is not so much about specific technologies, but about the techniques and tools you use to deploy and manage them. Monitoring, managing schema changes and pushing them in production, performance optimizations, configurations, version upgrades, backups; these are all aspects to consider – preferably before going live. In this webinar, we’ll guide you through 9 key devops tips to consider before taking Galera Cluster for MySQL / MariaDB into production.

Sign up for the webinar

Load balanced MySQL Galera setup - Manual Deployment vs ClusterControl

Deploying a MySQL Galera Cluster with redundant load balancing can be time consuming. This blog looks at how much time it would take to do it manually, using the popular “Google university” to search for how-to’s and blogs that provide deployment steps - or using our agentless management and automation console, ClusterControl, which supports MySQL (Oracle and Percona Server), MariaDB, MongoDB (MongoDB Inc. and Percona), and PostgreSQL.

Read the blog

That’s it for this week! Feel free to share these resources with your colleagues and follow us in our social media channels.

Have a good end of the week,

Jean-Jérôme Schmidt
Planets9s Editor
Severalnines AB

Planets9s - 9 DevOps Tips for MySQL / MariaDB Galera Cluster, MySQL Query Tuning Part 2 and more!


Welcome to this week’s Planets9s, covering all the latest resources and technologies we create around automation and management of open source database infrastructures.

New webinar: 9 DevOps Tips for going in production with MySQL / MariaDB Galera Cluster

In this new webinar on October 11th, Johan Andersson, CTO at Severalnines, will guide you through 9 key DevOps tips to consider before taking Galera Cluster for MySQL / MariaDB into production. Monitoring, managing schema changes and pushing them in production, performance optimizations, configurations, version upgrades, backups; these are all aspects to consider before going live with Galera Cluster and Johan will share his 9 DevOps tips with you for a successful production environment.

Sign up for the webinar

Watch the replay: MySQL Query Tuning Part 2 - Indexing and EXPLAIN

You can now watch the replay of Part 2 of our webinar trilogy on MySQL Query Tuning, which covers Indexing as well as EXPLAIN, one of the most important tools in the DBA’s arsenal. Our colleague Krzysztof Książek, Senior Support Engineer at Severalnines, presents this webinar trilogy and this week he looked into answering questions such as why a given query might be slow, what the execution plan might look like, how JOINs might be processed, whether a given query is using the correct indexes, or whether it’s creating a temporary table. Find out more by watching the replay of this webinar.

Watch the replay

Download our whitepaper on Database Sharding with MySQL Fabric

This new whitepaper provides a close look at database sharding with MySQL Fabric. You will learn the basics of it, and also learn how to migrate to a sharded environment. It further discusses three different tools which are designed to help users shard their MySQL databases. And last but not least, it shows you how to set up a sharded MySQL setup based on MySQL Fabric and ProxySQL.

Download the whitepaper

Critical zero-day vulnerabilities exposed in MySQL

Database security notice: you can easily upgrade your MySQL and MariaDB servers with ClusterControl and this new blog post shows you how. You must have heard about CVE-2016-6662, the recent zero-day exploit exposed in most versions of MySQL and its variants. The vulnerability can be exploited by a remote attacker to inject malicious settings into your my.cnf. We advise you to upgrade as soon as possible, if you haven’t done so yet, with these easy-to-follow instructions for ClusterControl users.

Read the blog

That’s it for this week! Feel free to share these resources with your colleagues and follow us in our social media channels.

Have a good end of the week,

Jean-Jérôme Schmidt
Planets9s Editor
Severalnines AB

Planets9s - MySQL on Docker with Calico, Galera Cluster DevOps Tips, Percona Live and more


Welcome to this week’s Planets9s, covering all the latest resources and technologies we create around automation and management of open source database infrastructures.

MySQL on Docker: Multi-Host Networking for MySQL Containers (Part 2 - Calico)

We’re continuing with our popular blogs covering MySQL on Docker and this time we’re looking at deploying MySQL Replication on top of three Docker hosts via Calico’s driver on multi-host networking. Having previously looked at Docker’s single-host networking for MySQL containers as well as multi-host networking and Docker swarm mode, we’re now casting our eyes on other networking drivers, starting with Calico.

Read the blog

Still time to sign up: 9 DevOps Tips for going in production with Galera Cluster

We’re live next Tuesday, October 11th, with Johan Andersson, CTO at Severalnines, who will be sharing his 9 DevOps tips with you for a successful Galera Cluster for MySQL / MariaDB production environment. Monitoring, managing schema changes and pushing them in production, performance optimizations, configurations, version upgrades, backups: Johan will guide you through all aspects to consider before going live with Galera Cluster.

Sign up for the webinar

We had plenty of 9s on deck at Percona Live Amsterdam this week

If you didn’t get the chance to attend this year’s Percona Live Europe, here’s our recap of the conference with soundbites, live broadcast replays from our booth where we discussed ClusterControl and conference highlights with Severalnines CEO Vinay Joosery … we’re also putting our team in the spotlight, so that you can see some of the faces behind the 9s!

Read the conference blog

Sharding MySQL with MySQL Fabric and ProxySQL

There are numerous ways to migrate into MySQL Fabric, which can all be pretty challenging. The solution to this complex challenge consists of several elements:
1. MySQL Fabric with its sharding system of high availability groups and tools around it.
2. MySQL Router - it allows regular MySQL clients to connect to different high availability groups created in MySQL Fabric.
3. ProxySQL, which allows users to perform a flawless failover (from the old database to the sharded setup).
This blog post and related whitepaper show you how to set up a sharded MySQL setup based on MySQL Fabric and ProxySQL.

Read the blog

That’s it for this week! Feel free to share these resources with your colleagues and follow us in our social media channels.

Have a good end of the week,

Jean-Jérôme Schmidt
Planets9s Editor
Severalnines AB

Planets9s - Joining the beautiful game with Wyscout, 9 DevOps Tips for Galera Cluster replay and more


Welcome to this week’s Planets9s, covering all the latest resources and technologies we create around automation and management of open source database infrastructures.

Watch the replay: 9 DevOps Tips for going in production with MySQL / MariaDB Galera Cluster

You can now watch the replay of this week’s webinar, during which our CTO Johan Andersson walked us through his tips & tricks on important aspects to consider before going live with MySQL / MariaDB Galera Cluster. While it may be easy to deploy Galera Cluster, how it behaves under real workload, scale, and during long term operation is a slightly more complex matter. Find out about some key best practices around monitoring, managing schema changes and pushing them in production, performance optimizations, configurations, version upgrades and performing backups to be best prepared for going in production with Galera.

Watch the replay

We’ve signed up football video and data platform Wyscout used by Real Madrid, Arsenal, Juventus & many more!

This week we’re delighted to be joining in on the beautiful game thanks to our new customer Wyscout, the world’s leading company providing video, data and technology to football people all over the world. The Wyscout team use ClusterControl to manage the database where all of the player intelligence is stored, and their platform is used by the world’s biggest clubs including Arsenal, Juventus and Real Madrid. Needless to say we’re excited to be part of Wyscout’s global football adventure!

Read the press release

ClusterControl Tips & Tricks - Custom graphs to monitor your MySQL, MariaDB, MongoDB and PostgreSQL systems

As all of you will likely know, graphs play an important part in database management, as they’re your window onto your monitored systems. ClusterControl comes with a predefined set of graphs for you to analyze, which are designed to give you, at first glance, as much information as possible about the state of your database cluster. And as you might have your own set of metrics you’d like to monitor, ClusterControl allows you to customize the graphs available in the cluster overview section. This blog shows you how to make best use of these graphs and of our customisation features.

Read the blog

Become a MongoDB DBA: How to scale reads

This new post in our ‘Become a MongoDB DBA’ blog series takes us into the realm of scaling MongoDB. MongoDB offers both read and write scaling, and we will uncover the differences between these two strategies for you. Whether to choose read or write scaling all depends on the workload of your application: if your application tends to read more often than it writes data, you will probably want to make use of the read scaling capabilities of MongoDB. This blog walks you through MongoDB read scaling.

Read the blog

That’s it for this week! Feel free to share these resources with your colleagues and follow us in our social media channels.

Have a good end of the week,

Jean-Jérôme Schmidt
Planets9s Editor
Severalnines AB
