In the past two weeks, I have been called in for some near-emergency tuning of two rails applications that recently went into production. One was for a state-level government agency, the other was for a startup that recently went into production and found performance problems as their app grew in popularity. In both cases, the first place I looked was the innodb table settings for mysql - and in both cases, I found things that could immediately help the application in question.
I'm going to walk you through that thought process now, potentially teach you something about your own rails app, and hopefully improve the performance of your rails application.
Innodb - the background
While this advice comes from two deployed rails applications, the advice I'm giving here applies to any application using mysql and the innodb storage engine. Rails uses innodb by default.
One of the coolest things about mysql is its ability to swap in different storage engines. This is also one of the reasons mysql gets a lot of grief about 'not supporting transactions', 'not recovering well after a crash', or other nasty rumors. The MyISAM storage engine, in fact, doesn't support transactions and does have issues recovering from a crash. But rails apps typically don't use that storage engine - by default, rails apps on mysql use Innodb. If I were so inclined, I could write a storage engine for mysql that stored all text in flat files, converted to pig latin... but thats not the point - the point is that Innodb gives us everything we expect from a real database - including more knobs and dials to turn than we could experiment with in a lifetime.
Innodb - the knobs and dials
Someplace, your mysql installation has a "my.cnf" file. Typically, this is under /etc, but the exact location can vary depending on your operating system installation. In this file, we can tweak the values of various parameters that mysql uses.
There is a long list of tunable innodb parameters"; for this article I'm going to teach you about 4 of them - the 4 I found can most profoundly affect a rails app, and the 4 that most often appear in tribal lore about what the 'correct' values should be. I'm not going to tell you the correct values - I'm going to teach you how to figure out what the correct values should be based on measurements of your running application.
Innodb parameters - the 4 "Usual Suspects"
There are 4 values I like to explicitly define in a server's my.cnf file. They are:
- innodb_buffer_pool_size
- innodb_log_buffer_size
- innodb_thread_concurrency
- innodb_flush_method
Google any of those, and you'll see trite advice like "Set this to be about 80% of your server's memory" or "set this to 2x the number of processor cores your server has". That advice is Not Even Wrong... because it might even be right, but it doesn't give you a clue to answer questions like "How much memory should we put in the server?" and "how many processors should we have on that new box we are going to set up?"... worse, it might be wrong for your particular setup, because the advice was free of any particular context.
There are certainly other things worth tuning to get the most out of your setup, but if you haven't tuned anything yet, those are the first 4 that will 'take the handcuffs' off of your mysql server.
Buffer Pool Size
This is the value of Innodb's buffer pool. By default, it is ridiculously low - something like 128 Megs. Even if you were to add tons of real memory to your server, MySQL wouldn't be able to use it unless you tweak this parameter.
By increasing this value, you are telling mysql "keep the most frequently accessed stuff in memory, so you don't have to go to disk to get it when someone wants to look at it". Changing this number to something appropriate for your app will likely give you the single biggest database performance improvement you will ever see while tuning.
In a perfect world, every server would have terabytes of ram, and we'd be able to set this number to something incredibly high and never worry about it. But memory isn't free, so we have to figure out what to set this to, and whether getting more memory is 'worth it'.
Here in the first half of 2011, I typically see servers on one of two ends - on the 'low end' is a virtual server with something small, like 2 gigs of ram. On the 'high end' are dedicated mysql servers with 64 gigs of ram. You can certainly have servers with much more than that; I just don't typically see installs larger than that without a mysql guru already along for the consultant ride.
So what do we set this to? Well, it depends... Is this a dedicated mysql server, or is it also hosting the apache/nginx/passenger/ruby/rails part of the stack? Are you running memcached on the same server? Is this virtual hardware, or something real?
Advice for a stand-alone mysql server
If this is a stand alone mysql server, the answer is "give as much ram as you can afford". Assuming we can decide how much memory the box will ultimately have, lets start with as much as we can afford - something like 4, 8, or 16 gigs of ram. Subtract a reasonable amount of the operating system to run (perhaps a gigabyte), subtract a little bit more for any user-space monitoring you might run (perhaps another gigabyte), and then just a little bit more for some of the other values we are about to give mysql (maybe another 512 megs), and put the rest of your system memory towards this value.
Based on that logic, you can see where the rumor "50% to 80% of your systems memory" comes from. It also shows that less than 4 gigs is too constrained, and if you find youself in front of a 64 gigabyte monster, even 80% is too low, leaving a chunk of memory unused. Once you are in production, you should be monitoring your server's performance - and if you become performance-bound based on memory usage, your best dollars spent will be on more ram.
Advice for a full-rails-stack server
This is going to be a harder tuning job, but perhaps even more "worth it". We need to give mysql enough to get started and do its job properly, but we can quickly get into an area where it might make more sense to give extra memory to Passenger so it can run more processes for rendering, rather than give it to mysql for caching... or perhaps give it to memcached so we can cache fragments and avoid database queries altogether.
Based on that logic, your server is going to need a minimum of 4 gigabytes, and you should give half of that to this parameter, tuning aggressively to make sure you are using most of your servers memory *someplace* when your app is in production. An 8 gig server will handle most rails apps we tend to see in production... but when your traffic or data makes you hit this memory wall, the solution is easy - throw money at the problem, double the machine's ram, and tune aggressively again. When your server monitoring shows that you are processor or I/O bound instead of memory-bound, then tuning mysql isn't your best performance answer anymore.
The longer answer here will also require knowledge on tuning apache/passenger, as well as your applications use of fragment caching in memcached, since all that is mixed in the same ram profile.
Log Buffer Size
Every time we write data, mysql holds it in a buffer until it has a large enough data set to warrant an update to the innodb tables. If it were *always* writing to the disk, Mysql would be seriously write i/o constrained. We don't want the disk activity overwhelming the system, so we need to figure out a value for this log buffer size that writes frequently, but not so frequently that we are constantly writing to disk.
This value is going to change depending on your exact server setup... On a system with 5400rpm drives we might want a different value than for 7200 rpm drives. Same for IDE/SATA/SCSI hardware. If we have a slow storage area network, this value can affect our mysql performance nearly as much as the buffer pool size. Recently, I've seen a trend to build big disk arrays with SSDs; that would warrant a tuning of this value as well. But in order to have any insight into that value, we need to be able to answer the question "how often *is* this buffer being flushed to disk?
Log into your mysql server, get to a mysql> prompt, and type this:
mysql> show innodb status\G
There is a lot of information there we can use to tune other stuff, but for now, just check out the value of "log i/o's/second" under the "LOG" section. This lets us know how often we are writing to disk.
Is that value a problem? It depends... we need more context to know for sure.
Using a tool like munin, look at your disk activity for values like "Disk Latency" and "Disk throughput". You can also measure this from the command line, with tools like hdparm.
Based on those values, we can get a sense of what this value can be. Surprisingly, this value is going to look miniscule compared to the value we set above. I start with 4 megabytes and adjust up from there. We reach a point where making it bigger gains us nothing, especially depending on how we have things set for flushing buffers for things like ACID compliance and replication (but more on that below).
Tuning mythology says "don't let your server be writing out this cache more than 10 times a second". Obviously, the mythology will be too low for incredibly fast SSD setups, and too high for resource-constrained virtual machines.
Thread Concurrency
By default, mysql has this value set to 8. So if you have a dual core box that's also handling apache for you, this value is set way too high - and if you have a 16-core dedicated box, most of your cores will sit idle. Tuning mythology says "set this value to 2x the number of cpu cores your server has". I don't think thats bad advice in itself, but there are a few places where it can steer you wrong.
Several times I have actually had to play around with manually setting processor affinity on a multi-core box, to ensure that several cores handle apache/passenger, and several cores handle mysql, and then allowing other cores to 'float', depending on the exact demand. In this case, I might want to adjust this number to be 2x the number of *possible* cores.
Also, Stock mysql has issues with scheduling concurrency that can actually make performance *worse* as you add cores... Manually setting affinity can help this, as can using the Percona builds of mysql, which have performance improvements that fix this, among other improvements..
Flush Method
The innodb_flush_method parameter is a tricky one, and it has several settings that could be considered controversial. The fastest, safest option for this parameter is O_DIRECT as long as you aren't running on a storage-area-network, and ideally, you are also using a battery-based-up hardware raid card (we use a battery backed-up raid card running RAID-0 for our mysql instances at CodeSherpas). Setting this to O_DIRECT will turn off double-buffering when flushing logs, significantly speeding up disk activity.
Its worth reading the documentation and doing some performance testing based on your individual server's configuration. This only has a handful of values, so its easy to test and decide what performance level you get for the various tradeoffs.
If you can, set this to O_DIRECT, otherwise leave it alone.
Bonus #0 - flushing logs at transaction commit
This isn't a parameter I normally change for real in production, but changing it when doing performance testing can lead to other insights and clues to bottlechecks. The variable
innodb_flush_log_at_trx_commithas a default value of "1", and should remain at this level if you want the typical ACID guarantees the innodb provides... however setting it to "0" or "2" changes that flush to happen either at a time interval, or at a time determined by conventional disk I/O. This does have the risk of a loss of about 1 second of data updates in the event of a system crash, but while tuning, it can help you determine if disk I/O and log flushing is a bottleneck, and if the data you risk losing is equivalent to the comment people leave on YouTube, then the performance gain might make sense for your application.
Bonus #1 - mysql logging
As long as we are looking at performance values in the my.cnf file, I suggest that you turn on the slow query log.
log-slow-queries=/var/log/mysql/slow-queries.log long_query_time = 1
Whenever a query takes more than a second to run, it'll get logged in the file you specify (another nice thing to note about the Percona build of mysql - by default, mysql's slow query time resolution is 1 second increments... Percona changes that to milliseconds, giving you much more visibility into what you might consider 'slow').
And as long as we're logging things that are slow, add
log-queries-not-using-indexes
and that will give us visibility into things we can speed up by adding indexes to our tables.
Bonus #2 - linux 'swappiness'
As an application running under linux, mysql does its best to manage its own memory usage - and as you saw above, we are going to give it the bulk of available memory on pretty much any install we put it on. But at the same time, the linux kernel is going to do its best to look for memory that is going unused and swap it out to disk so it can be free for other things. and MySQL looks like a big target, sitting there using most of our memory.
As you can see, we can create a big mess if mysql thinks something is in memory, but linux has swapped it out - imagine the scenario where mysql simply wants to return the data it thinks is in memory - it *still* has to read it from disk, which is what we were hoping to avoid. Imagine mysql trying to clear its own memory usage - in order to free up memory it decides it isn't using, linux will have to swap something else out to disk just so the data can be brought in and freed. There should be something smarter we can do here.
First and foremost, I use munin to watch the swappiness and make sure that memory is never being swapped. Swapping kills performance on servers, no matter what is happening. We can't fix it if we don't know its happening in the first place.
Second, while I typically don't do it, Baron Schwartz from Percona, and one of the authors of the High Performance MySQL book recommends disabling swap entirely, . I don't personally disable swap, because I would rather have it as a tool the system can use if it really needs it, but monitor to ensure its not happening on a regular basis. My approach is more like that documented by Peter Zaitsev.
Bonus #3 - my "Virtualization Rant"
And finally, I'd like to rant a little bit about virtualization and 'cloud based' services with MySQL. "The Cloud" seems to be all the rage lately, especially with startups. At CodeSherpas, we host several small clients on services like WebbyNode and Linode, we use Amazon EC2 for testing and for some surge support, and we certainly use Google apps and GMail. I'm also a fan of Heroku for its ease-of-use in getting a rails app deployed with no fuss... But by the time I'm dealing with clients that have serious mysql tuning, "100,000 visitors-a-day - why is our app so slow" kinds-of issues, I like to take virtualization out of the equation - at least on the database. There are several reasons why, which probably deserve another blog post - but the simple reason in this context - Virtualization makes your server lie to you.
Seriously. The tuning we were doing above depended on us learning things about our disk I/O, number of cores, swappiness, and other various measurements from the system. When your disk I/O reports something on a virtual machine, is that the *real* answer, or is that just how long it took the VM to be happy (meanwhile your write is still in a cache on the host)? Do you *really* have that many cores to schedule something against? Might the stuff that your guest OS is reporting as 'in memory' might actually be swapped out by the host OS, which you have no visibility into? Might your carefully tuned mysql server go completely pear-shaped because another client on the same physical hardware suddenly has a spike that uses more cpu than you do? I'll probably write more on this at another time, but in the meantime, check this post by Mark Imbriaco of 37 signals, about the performance increase they saw moving away from virtualization. If you saw the movie Inception, you know what kinds of issues this can create.
Conclusion
This was a particularly long blog post, but for both clients I mentioned, this analysis and tuning took less than 20 minutes in the real world. This is just the top of the iceberg for tuning a Rails application in production; if you are interested in learning more, or seeing if we can tune your application further, you can always contact me. Some of this material is also covered in our Rails In Production training course.