Scaling is hard. If you are fortunate enough to work on a Rails application that gets a significant amount of traffic, you will eventually have to address the need to scale. Microservices and Service-Oriented Architecture (SOA) have become popular in the Ruby community, but there’s a big difference between talking about them and implementing them.
I’ve read tons of great blog posts and watched dozens of awesome talks on this subject, but one thing that usually seems to be missing from the SOA discussion is an approach to handling the database. Carving your monolithic application into a series of lightweight services is probably going to be an incremental affair. You need to balance ongoing feature development and business needs against your desire to take a refactoring axe to your codebase.
If you’re taking the smallest possible first step, you’re trying to find the seams in your code in order to identify and break off that first service. What about the data? The SOA dream means that most services will eventually have their own databases, and other services that need that data will have to talk to the service that owns it. Getting there is going to be hard. If you’re a traditional Rails shop, you’re likely dealing with a single, massive SQL instance (MySQL in my case). Up until now, “scaling” the database has meant throwing more hardware at the problem. So, do we shard? Great idea! Only now you’ve introduced a significant amount of complexity and operations overhead into what was supposed to be a small, incremental step in your scaling journey. Scope creep, ahoy! Maybe there’s another way.
One thing you can easily do for your database, if you haven’t already, is set up replication. This is great for backups and durability, but it would be nice if we could take advantage of replication by distributing database reads from Rails across all of the replication nodes. This is harder than it seems at first glance, and a myriad of problems must be addressed: What happens if you attempt to read data that hasn’t been replicated to a slave yet? What happens if one or more slave nodes go down? How do you deal with reading data immediately after it’s been written?
Fortunately, the folks over at TaskRabbit have been working on this. They’re developing a library called Makara, which makes it easy to distribute SQL reads across multiple slave servers while simultaneously addressing all of the previously mentioned problems. Makara is designed for any Ruby application, but comes with a very handy set of ActiveRecord adapters for plugging into Rails.
If you’re anything like those of us over at Optoro, you tend to be picky about introducing new dependencies into an already bloated Gemfile. Additionally, the idea of blindly embracing something as critical as an ActiveRecord adapter from a pre-1.0 library across the entire application is very scary. So, we wanted a chance to evaluate Makara by choosing the parts of our application where we specifically wanted to distribute reads to our replication slave servers. Unfortunately, Makara doesn’t make this easy by default. It’s designed to be either on or off, with very little flexibility in between. Typically, if you force Makara to read from or write to your master SQL instance at any time in the context of a request, it’s going to “stick” to that master for any subsequent reads for the remainder of the request. This is a good thing, but it also means you can’t choose the parts of the request where you want to distribute your reads.
We wanted to be able to do something like this…
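A minimal sketch of what that calling code might look like. `ReadRouter` and `distribute_reads` are hypothetical stand-ins used only to make the example runnable on its own; they are not part of Makara or ActiveRecord.

```ruby
# Hypothetical stand-in for a read-splitting connection; not Makara's API.
class ReadRouter
  attr_reader :log

  def initialize
    @distributed = false
    @log = []
  end

  # Reads issued inside the block are allowed to go to a slave.
  # The previous mode is restored even if the block raises.
  def distribute_reads
    previous = @distributed
    @distributed = true
    yield
  ensure
    @distributed = previous
  end

  # Records which kind of node a read would be sent to.
  def read(sql)
    target = @distributed ? :slave : :master
    @log << [target, sql]
    target
  end
end

router = ReadRouter.new
router.read("SELECT * FROM users WHERE id = 1")   # master, as usual

router.distribute_reads do
  router.read("SELECT COUNT(*) FROM orders")      # eligible for a slave
end

router.read("SELECT * FROM users WHERE id = 2")   # back to master
```

The point is the shape of the call site: everything defaults to master, and only the reads wrapped in the block are candidates for a slave.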
Furthermore, we didn’t want to introduce the overhead of establishing, disconnecting, and reestablishing ActiveRecord connections within our requests. We wanted to keep using the one connection that’s already defined (Makara maintains separate pools for the various nodes) and, within a block, distribute the reads where it’s appropriate to do so. I was able to accomplish this by subclassing and extending the MySQL ActiveRecord adapter that comes with Makara.
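To illustrate the subclass-and-override idea without depending on the gem, here is a rough sketch. `StockAdapter` is a simplified stand-in for the adapter that ships with Makara, and `OptInAdapter` and its `distributed` flag are hypothetical names. The real `#needs_master?` may take different arguments, so treat the signature here as an assumption.

```ruby
# Simplified stand-in for the stock adapter (hypothetical, not Makara's class).
class StockAdapter
  READ_PATTERN = /\A\s*(select|show|describe|explain)\b/i

  # Stock behaviour as described in the post: reads do not need master.
  def needs_master?(sql)
    !sql.match?(READ_PATTERN)
  end
end

# Subclass that inverts the default: always master unless we opted in.
class OptInAdapter < StockAdapter
  attr_accessor :distributed

  def initialize
    @distributed = false
  end

  def needs_master?(sql)
    return true unless distributed  # outside distributed mode, everything hits master
    super                           # inside it, defer to the stock read/write check
  end
end

adapter = OptInAdapter.new
adapter.needs_master?("SELECT * FROM users")          # => true
adapter.distributed = true
adapter.needs_master?("SELECT * FROM users")          # => false
adapter.needs_master?("UPDATE users SET name = 'x'")  # => true
```

Writes still go to master inside distributed mode because the subclass only short-circuits the check; it never loosens it.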
By default, the Makara adapter returns false for #needs_master? if the pending SQL statement is interpreted as a read operation. I wanted it to return true all the time unless we’re operating in “distributed” mode.
Unfortunately, there was still one problem. The recommended Makara configuration is to set your adapter to “sticky” mode, which means that once any SQL operation hits a particular node within a specific context, it will continue to use that node until the context changes. This is a good thing because replication reaches different nodes at different times. You don’t want to read data from one node, only to find on a subsequent read (from a different node) that the data doesn’t exist. For our purposes, the downside is that every request starts out by reading from (and writing to) the master node. So, by the time we enter a distributed_mode block, the request is always “stuck” to the master node.
Therefore, I wrote a #with_new_context method that swaps in a fresh Makara context for the duration of the given block and restores the previous one afterwards. This gives the request a chance to hit a slave node and subsequently become “stuck” to whichever node it ends up with. When the block ends, the context is reset to what it was before the operation, which is always the one that was originally stuck to the master node.
It’s important to note that the context handling for Makara uses class methods and singletons, which essentially means the entire library is not thread-safe. This isn’t a problem for us at Optoro because we use a forking server model (Unicorn).
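The save, swap, and restore pattern can be sketched in isolation. `StickyContext` and `StickyProxy` below are hypothetical stand-ins that only mimic the sticky behaviour described above; they are not Makara’s classes, and the node-selection logic is deliberately simplified.

```ruby
require "securerandom"

# Stand-in for a process-wide "current context" holder (hypothetical).
# Module-level state like this is shared across threads, hence not thread-safe.
module StickyContext
  class << self
    attr_accessor :current
  end

  def self.generate
    SecureRandom.hex(8)
  end
end

# Stand-in proxy that sticks to a node for the lifetime of a context.
class StickyProxy
  def initialize(slaves)
    @slaves = slaves
    @stuck = {} # context => node
  end

  def node_for(needs_master)
    ctx = StickyContext.current
    if needs_master
      @stuck[ctx] = :master            # anything needing master sticks to master
    else
      @stuck[ctx] ||= @slaves.sample   # first read in a context picks a slave
    end
  end

  # Swap in a fresh context for the block, then restore the old one.
  def with_new_context
    previous = StickyContext.current
    StickyContext.current = StickyContext.generate
    yield
  ensure
    StickyContext.current = previous
  end
end

StickyContext.current = StickyContext.generate
proxy = StickyProxy.new([:slave_1, :slave_2])

proxy.node_for(true)    # request starts with a write: stuck to master
proxy.node_for(false)   # still master, because the context is sticky

proxy.with_new_context do
  proxy.node_for(false) # fresh context: free to pick a slave
  proxy.node_for(false) # same slave again within this context
end

proxy.node_for(false)   # previous context restored: master again
```

Restoring in `ensure` matters: if the block raises, the request still ends up back on the context that is stuck to master.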
Finally, I needed a method that takes a block and uses the adapter’s #with_new_context method…
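A rough sketch of such a method, with a fallback for connections that don’t support distributed reads. Everything here is a stand-in: `FakeBase` plays the role of ActiveRecord::Base, the two connection classes are hypothetical, and checking with `respond_to?` is an assumption of this sketch rather than a description of the original implementation.

```ruby
# Mixin providing a block-based entry point for distributed reads.
module DistributedReads
  # Runs the block in a new context when the connection supports it;
  # otherwise just runs the block so callers never need to care.
  def execute_distributed(&block)
    conn = connection
    if conn.respond_to?(:with_new_context)
      conn.with_new_context(&block)
    else
      yield
    end
  end
end

# Hypothetical connection without any read-splitting support.
class PlainConnection; end

# Hypothetical connection exposing #with_new_context.
class DistributedConnection
  attr_reader :events

  def initialize
    @events = []
  end

  def with_new_context
    @events << :enter
    yield
  ensure
    @events << :exit
  end
end

# Stand-in for ActiveRecord::Base with a class-level connection.
class FakeBase
  class << self
    attr_accessor :connection
  end
  extend DistributedReads
end

FakeBase.connection = DistributedConnection.new
FakeBase.execute_distributed { :report_rows }   # runs inside a new context

FakeBase.connection = PlainConnection.new
FakeBase.execute_distributed { :report_rows }   # plain fallback, same result
```

In both cases the caller gets the block’s return value back, so call sites look identical regardless of which adapter is configured.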
I wanted application-wide access to the method, so sticking it on ActiveRecord::Base seemed like the way to go. I’m not in love with the idea of monkey patching ActiveRecord::Base, but it gets the job done for the time being until I can come up with a better implementation. The upside of this implementation is that it falls back to normal behavior even if we’re not using our custom ActiveRecord adapter. This means I can use ActiveRecord::Base.execute_distributed wherever I want, and if it just so happens that we’re not using the Makara adapter (development mode, A/B testing production, etc.), nothing will break. If you use this approach, you’ll always need to require the adapter even if you’re not using it.
That’s all there is to it! 60-something lines of code and one new gem, and now we have the ability to distribute reads across any number of MySQL slave nodes. Depending on how widely you use distributed mode, this has the potential to greatly reduce the load on your MySQL master node. Additionally, it will buy you some breathing room for a strained database. Most importantly, it’s a small, incremental step toward scalability that doesn’t require you to make huge, sweeping ops changes.
Make sure you check out the Makara documentation for details on configuring the gem for your particular needs.