An Insane MySQL Upgrade: Using Xtrabackup to Migrate Clusters from 5.5 to 5.6

Well, it may sound insane, even impossible, that we used xtrabackup to upgrade from Percona Server 5.5 to MySQL 5.6. But we did exactly that, though it came with all kinds of trade-offs -- for instance, we disabled most of the new features, such as GTID.


Our instances are large -- 500 GB on average -- and there are a lot of them. What is worse, we were pushed to upgrade on a very tight schedule: hundreds of instances in 4 days.

Why is this possible?

The essence of our approach is the standard in-place upgrade:

1) shut down the instance
2) replace the mysqld binary and other relevant files
3) start the instance
4) run mysql_upgrade
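The steps above can be sketched as a shell script. This is only an illustration, not something to run blindly: the paths, the 5.6 install location, and credentials are all assumptions, and it requires a live MySQL installation.

```shell
#!/bin/sh
# Sketch of the in-place upgrade. Paths and credentials are assumptions;
# adapt to your environment.
set -eu

# 1) shut down the instance
mysqladmin -uroot shutdown

# 2) replace the mysqld binary and other relevant files
#    (assumes the 5.6 tree was unpacked under /opt/mysql-5.6)
cp /opt/mysql-5.6/bin/mysqld /usr/local/mysql/bin/mysqld
cp -r /opt/mysql-5.6/share /usr/local/mysql/

# 3) start the instance on the old datadir, without starting replication yet
/usr/local/mysql/bin/mysqld_safe --skip-slave-start &

# 4) upgrade the system tables and check user tables
mysql_upgrade -uroot
```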

In the end, this method worked fine for us.

Be CAREFUL with the following:

1. Differences in variables and configuration between 5.5 and 5.6. Some variables and configuration options are incompatible.
2. You must restart the instance again after mysql_upgrade completes. Otherwise you may find that the instance cannot start its slave threads when it is part of a replication topology.
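A minimal sketch of that final restart and the replication check afterwards (service paths and credentials are assumptions; this needs a live server, so it is a sketch only):

```shell
# After mysql_upgrade finishes, bounce the instance once more
# before resuming replication.
mysqladmin -uroot shutdown
/usr/local/mysql/bin/mysqld_safe &
sleep 5

# Confirm both replication threads come up cleanly.
mysql -uroot -e "START SLAVE; SHOW SLAVE STATUS\G" \
    | grep -E 'Slave_(IO|SQL)_Running'
```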

Some Tips on Database Backup and Restore

  • Verifying that backups are actually restorable is very important.
  • Checksumming the data between the master and its replicas is also necessary.
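One common way to do the master/replica checksum is Percona Toolkit's pt-table-checksum. Host names, the database name, and credentials below are assumptions for illustration:

```shell
# Run on the master. It checksums tables in chunks on the master and
# replays the checksum statements through replication, so each replica
# computes its own CRCs for comparison.
pt-table-checksum --replicate=percona.checksums \
    --databases=mydb h=master-host,u=checksum_user,p=secret

# Then look for chunks that differ on each replica:
mysql -h replica-host -e \
  "SELECT db, tbl, chunk FROM percona.checksums
   WHERE master_crc <> this_crc OR master_cnt <> this_cnt"
```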

For a MySQL backup made with xtrabackup, the only way to test that it can be restored is to actually perform the restore once:

1. download the backup if you uploaded it somewhere;
2. apply the logs;
3. point the restored instance back at the original master with CHANGE MASTER TO, if you took the backup from a replica.
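A sketch of such a restore drill follows. The paths, host names, and binlog coordinates are placeholders -- in practice the coordinates come from the xtrabackup_slave_info file written during the backup:

```shell
set -eu

# 1) download the backup from wherever it was uploaded, and unpack the stream
scp backup-server:/backups/db1.xbstream /data/restore/
cd /data/restore && xbstream -x < db1.xbstream

# 2) apply the redo log so the datadir is consistent
innobackupex --apply-log /data/restore

# 3) start mysqld on the restored datadir (not shown), then re-point
#    replication at the original master; file/pos are placeholders
mysql -uroot -e "CHANGE MASTER TO
    MASTER_HOST='original-master',
    MASTER_LOG_FILE='mysql-bin.000123',
    MASTER_LOG_POS=456;
  START SLAVE;"
```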

Why test?

Xtrabackup does not have our full trust, for several reasons.

With a relatively old version of xtrabackup, say 2.0.6, you are very likely to get an invalid backup if a DDL operation or some kind of huge transaction runs during the backup.

We use xtrabackup's stream feature, so the backup may already be corrupted by the time it is uploaded to the backup server. Such a backup fails at the prepare stage with a "log sequence number is incorrect" error.
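A cheap guard against silent stream corruption is to checksum the stream on the source host and verify it after upload. The sketch below simulates this with a stand-in file; the real backup and upload commands are commented out, since they are assumptions that need a live server:

```shell
set -eu
backup=$(mktemp)

# innobackupex --stream=xbstream /tmp > "$backup"   # real backup step (needs a server)
printf 'stand-in stream payload' > "$backup"        # placeholder so the sketch runs

local_sum=$(sha256sum "$backup" | awk '{print $1}')
# ... upload "$backup" to the backup server, then re-download it
#     (or run sha256sum remotely) ...
remote_sum=$(sha256sum "$backup" | awk '{print $1}')

if [ "$local_sum" = "$remote_sum" ]; then
  echo "checksum OK"
else
  echo "stream corrupted" >&2
  exit 1
fi
rm -f "$backup"
```

Catching a mismatch here means you can retake the backup immediately, instead of discovering the corruption days later at prepare time.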

If you enable the parallel replication feature in MySQL 5.6, you may hit an xtrabackup bug that records an incorrect master coordinate position; we can trigger it very easily. The root cause is that with parallel replication enabled, innobackupex cannot get the correct coordinates if it does not stop the slave SQL thread before flushing tables. The fix is simple: stop the SQL thread before FLUSH TABLES, and start it again afterwards.
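That fix can be sketched as the sequence of statements to issue around the flush, shown here via the mysql client (credentials are assumptions; this needs a live replica, so it is a sketch only):

```shell
# Stop only the SQL thread -- the IO thread keeps fetching relay logs.
# With the SQL thread quiesced, SHOW SLAVE STATUS reports a stable
# Exec_Master_Log_Pos that is safe to record as the backup coordinate.
mysql -uroot -e "STOP SLAVE SQL_THREAD"
mysql -uroot -e "FLUSH TABLES WITH READ LOCK"
mysql -uroot -e "SHOW SLAVE STATUS\G"   # record Relay_Master_Log_File / Exec_Master_Log_Pos

# ... copy files / finish the backup while the lock is held ...

mysql -uroot -e "UNLOCK TABLES"
mysql -uroot -e "START SLAVE SQL_THREAD"
```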

For large-scale MySQL deployments, you may also want to look at the Facebook team's work on optimizing mysqldump, which speeds it up by as much as 10x.