With a big increase in the number of customers on Testpad this Spring, we've upgraded our servers over the weekend. This engineering-focused post talks about how we did this with zero downtime.
his Spring has seen a signficant increase in traffic due to a growing customer base, to the point of prompting server upgrades that were just completed this weekend. The new servers have increased Testpad's capacity and reduced page loading times; with the upgrade process itself proving the convenience of technologies like MongoDB for seamless database replication and Amazon EC2 for command-line control of servers archicture.
Whilst Testpad's customers have been busy, Testpad has been quieter on the development front. However, with these latest upgrades, and further improvements around the corner, it's set to be a busy Summer. In the short term, look for improvements to tag filtering, followed later on with the publication of much needed user guides and documentation.
The rest of this post has a strong engineering bias. Read on if you're curious about Testpad's hosting, or get back to your testing, which should now be snappier than ever!
The upgrade saw the main application servers moving from Amazon's older generation of servers (m1) to their latest (m3) servers. As well as newer CPU architecture, the m3 servers come with SSD storage, providing a big increase in database IO performance.
Thanks to the convenience of the Amazon EC2 management interfaces, and robustness of MongoDB's replication technology, these upgrades were performed with zero downtime.
Just to be safe, the upgrade was timed for the weekend when traffic is quieter than during the week, but nevertheless, the logs showed continuous usage before, during and after the upgrade, with no interruption to service.
Broadly, the upgrades were achieved by modifying one database (MongoDB) replica and application server at a time. Once a database replica is shutdown, the live service is maintained by the surviving replicas, thus freeing that server for maintenance. As all Testpad instances have their Linux root partition on an EBS volume, the upgrade itself is as simple as using AmazonEC2 controls to Stop the instance, edit the instance type, and Start it again. When the instance boots up, it is now running on bigger and better hardware! The database folder is then moved from its previous mount point on an EBS volume to its new home on the SSD drive. Once restarted, the MongoDB replica contacts the other replicas and catches up with all the changes that happened during that server's upgrade. Seamless.
Readers familiar with AWS might spot that m3 instance SSD drives are 'ephemeral' devices that do not persist if the instance is stopped or otherwise lost. Testpad therefore still runs replicas with the data volume stored on EBS drives, which continues to enable Testpad's backup strategy of hourly snapshots and archive to AmazonS3.
Please get in touch if you have any feedback or would like to know more about the details of this upgrade - just email stef@ontestpad.com.