David Bock's complete blog can be found at: http://blog.codesherpas.com

Wednesday, March 9, 2011

In the past two weeks, I have been called in for some near-emergency tuning of two rails applications that recently went into production. One was for a state-level government agency; the other was for a startup that found performance problems as its app grew in popularity. In both cases, the first place I looked was the innodb table settings for mysql - and in both cases, I found things that could immediately help the application in question.

I'm going to walk you through that thought process now, potentially teach you something about your own rails app, and hopefully improve the performance of your rails application.

Innodb - the background

While this advice comes from two deployed rails applications, the advice I'm giving here applies to any application using mysql and the innodb storage engine. Rails uses innodb by default.

One of the coolest things about mysql is its ability to swap in different storage engines. This is also one of the reasons mysql gets a lot of grief about 'not supporting transactions', 'not recovering well after a crash', or other nasty rumors. The MyISAM storage engine, in fact, doesn't support transactions and does have issues recovering from a crash. But rails apps typically don't use that storage engine - by default, rails apps on mysql use Innodb. If I were so inclined, I could write a storage engine for mysql that stored all text in flat files, converted to pig latin... but that's not the point - the point is that Innodb gives us everything we expect from a real database - including more knobs and dials to turn than we could experiment with in a lifetime.

Innodb - the knobs and dials

Someplace, your mysql installation has a "my.cnf" file. Typically, this is under /etc, but the exact location can vary depending on your operating system installation. In this file, we can tweak the values of various parameters that mysql uses.

There is a long list of tunable innodb parameters; for this article I'm going to teach you about 4 of them - the 4 I found can most profoundly affect a rails app, and the 4 that most often appear in tribal lore about what the 'correct' values should be. I'm not going to tell you the correct values - I'm going to teach you how to figure out what the correct values should be based on measurements of your running application.

Innodb parameters - the 4 "Usual Suspects"

There are 4 values I like to explicitly define in a server's my.cnf file. They are:

  • innodb_buffer_pool_size
  • innodb_log_buffer_size
  • innodb_thread_concurrency
  • innodb_flush_method

Google any of those, and you'll see trite advice like "Set this to be about 80% of your server's memory" or "set this to 2x the number of processor cores your server has". That advice is Not Even Wrong... because it might even be right, but it doesn't give you a clue to answer questions like "How much memory should we put in the server?" and "how many processors should we have on that new box we are going to set up?"... worse, it might be wrong for your particular setup, because the advice was free of any particular context.

There are certainly other things worth tuning to get the most out of your setup, but if you haven't tuned anything yet, those are the first 4 that will 'take the handcuffs' off of your mysql server.

Buffer Pool Size

This is the size of Innodb's buffer pool. By default, it is ridiculously low - something like 128 Megs. Even if you were to add tons of real memory to your server, MySQL wouldn't be able to use it unless you tweak this parameter.

By increasing this value, you are telling mysql "keep the most frequently accessed stuff in memory, so you don't have to go to disk to get it when someone wants to look at it". Changing this number to something appropriate for your app will likely give you the single biggest database performance improvement you will ever see while tuning.

In a perfect world, every server would have terabytes of ram, and we'd be able to set this number to something incredibly high and never worry about it. But memory isn't free, so we have to figure out what to set this to, and whether getting more memory is 'worth it'.

Here in the first half of 2011, I typically see servers on one of two ends - on the 'low end' is a virtual server with something small, like 2 gigs of ram. On the 'high end' are dedicated mysql servers with 64 gigs of ram. You can certainly have servers with much more than that; I just don't typically see installs larger than that without a mysql guru already along for the consultant ride.

So what do we set this to? Well, it depends... Is this a dedicated mysql server, or is it also hosting the apache/nginx/passenger/ruby/rails part of the stack? Are you running memcached on the same server? Is this virtual hardware, or something real?

Advice for a stand-alone mysql server

If this is a stand-alone mysql server, the answer is "give it as much ram as you can afford". Assuming we can decide how much memory the box will ultimately have, let's start with as much as we can afford - something like 4, 8, or 16 gigs of ram. Subtract a reasonable amount for the operating system to run (perhaps a gigabyte), subtract a little bit more for any user-space monitoring you might run (perhaps another gigabyte), and then just a little bit more for some of the other values we are about to give mysql (maybe another 512 megs), and put the rest of your system memory towards this value.

Based on that logic, you can see where the rumor "50% to 80% of your system's memory" comes from. It also shows that less than 4 gigs is too constrained, and if you find yourself in front of a 64 gigabyte monster, even 80% is too low, leaving a chunk of memory unused. Once you are in production, you should be monitoring your server's performance - and if you become performance-bound based on memory usage, your best dollars spent will be on more ram.
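The subtraction described above is easy to sketch as a small shell function. The function name is hypothetical, and the reservations (1 gig for the operating system, 1 gig for monitoring, 512 megs for other mysql buffers) are only the rough figures from the text, not measured values for any particular server.

```shell
#!/bin/sh
# Hypothetical helper: estimate innodb_buffer_pool_size (in MB) for a
# dedicated mysql box, using the rough reservations from the text.
# $1 = total system memory in MB
buffer_pool_mb() {
  total_mb=$1
  os_mb=1024          # operating system (assumed figure)
  monitoring_mb=1024  # user-space monitoring (assumed figure)
  other_mb=512        # other mysql buffers (assumed figure)
  echo $(( total_mb - os_mb - monitoring_mb - other_mb ))
}

# Show the result, and the share of total ram it represents, for a few sizes.
for gigs in 4 8 16 64; do
  total=$(( gigs * 1024 ))
  pool=$(buffer_pool_mb "$total")
  echo "${gigs}G server: ${pool}M buffer pool ($(( pool * 100 / total ))% of ram)"
done
```

Run as-is, an 8 gig box comes out at 5632M (about 68% of ram), while a 64 gig box lands well above 80%, which is the point being made about percentage-based rules of thumb.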

Advice for a full-rails-stack server

This is going to be a harder tuning job, but perhaps even more "worth it". We need to give mysql enough to get started and do its job properly, but we can quickly get into an area where it might make more sense to give extra memory to Passenger so it can run more processes for rendering, rather than give it to mysql for caching... or perhaps give it to memcached so we can cache fragments and avoid database queries altogether.

Based on that logic, your server is going to need a minimum of 4 gigabytes, and you should give half of that to this parameter, tuning aggressively to make sure you are using most of your server's memory *someplace* when your app is in production. An 8 gig server will handle most rails apps we tend to see in production... but when your traffic or data makes you hit this memory wall, the solution is easy - throw money at the problem, double the machine's ram, and tune aggressively again. When your server monitoring shows that you are processor or I/O bound instead of memory-bound, then tuning mysql isn't your best performance answer anymore.

The longer answer here will also require knowledge of tuning apache/passenger, as well as your application's use of fragment caching in memcached, since all that is mixed in the same ram profile.

Log Buffer Size

Every time we write data, mysql records the change in a log buffer until it has enough to warrant a write to innodb's log files on disk. If it were *always* writing to the disk, mysql would be seriously write i/o constrained. We don't want the disk activity overwhelming the system, so we need to figure out a value for this log buffer size that writes frequently, but not so frequently that we are constantly writing to disk.

This value is going to change depending on your exact server setup... On a system with 5400rpm drives we might want a different value than for 7200 rpm drives. Same for IDE/SATA/SCSI hardware. If we have a slow storage area network, this value can affect our mysql performance nearly as much as the buffer pool size. Recently, I've seen a trend to build big disk arrays with SSDs; that would warrant a tuning of this value as well. But in order to have any insight into that value, we need to be able to answer the question "how often *is* this buffer being flushed to disk?"

Log into your mysql server, get to a mysql> prompt, and type this:

mysql> show engine innodb status\G

There is a lot of information there we can use to tune other stuff, but for now, just check out the value of "log i/o's/second" under the "LOG" section. This lets us know how often we are writing to disk.
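That check can also be scripted, which makes it easier to sample the number repeatedly under load. The helper below is a hypothetical sketch, not part of mysql: it assumes the LOG section contains a line shaped like "8 log i/o's done, 0.25 log i/o's/second", and the commented example uses the `SHOW ENGINE INNODB STATUS` spelling of the statement.

```shell
#!/bin/sh
# Hypothetical helper: read innodb status text on stdin and print the
# "log i/o's/second" figure from the LOG section.
# Assumes a line shaped like: "8 log i/o's done, 0.25 log i/o's/second"
log_ios_per_second() {
  # "." stands in for the apostrophe so the awk program can stay single-quoted
  awk '/log i\/o.s\/second/ { print $(NF - 2) }'
}

# Example use (needs a reachable server and credentials, so left commented out):
#   mysql -e 'SHOW ENGINE INNODB STATUS\G' | log_ios_per_second
```

If the status output on your server is laid out differently, the pattern will simply print nothing, so check the raw output first.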

Is that value a problem? It depends... we need more context to know for sure.

Using a tool like munin, look at your disk activity for values like "Disk Latency" and "Disk throughput". You can also measure this from the command line, with tools like hdparm.

Based on those values, we can get a sense of what this value can be. Surprisingly, this value is going to look minuscule compared to the value we set above. I start with 4 megabytes and adjust up from there. We reach a point where making it bigger gains us nothing, especially depending on how we have things set for flushing buffers for things like ACID compliance and replication (but more on that below).

Tuning mythology says "don't let your server be writing out this cache more than 10 times a second". Obviously, the mythology will be too low for incredibly fast SSD setups, and too high for resource-constrained virtual machines.

Thread Concurrency

By default, mysql has this value set to 8. So if you have a dual core box that's also handling apache for you, this value is set way too high - and if you have a 16-core dedicated box, most of your cores will sit idle. Tuning mythology says "set this value to 2x the number of cpu cores your server has". I don't think that's bad advice in itself, but there are a few places where it can steer you wrong.

Several times I have actually had to play around with manually setting processor affinity on a multi-core box, to ensure that several cores handle apache/passenger, and several cores handle mysql, and then allowing other cores to 'float', depending on the exact demand. In this case, I might want to adjust this number to be 2x the number of *possible* cores.

Also, stock mysql has issues with scheduling concurrency that can actually make performance *worse* as you add cores... Manually setting affinity can help with this, as can using the Percona builds of mysql, which include fixes for this among other performance improvements.

Flush Method

The innodb_flush_method parameter is a tricky one, and it has several settings that could be considered controversial. The fastest, safest option for this parameter is O_DIRECT as long as you aren't running on a storage-area-network, and ideally, you are also using a battery-backed-up hardware raid card (we use a battery backed-up raid card running RAID-0 for our mysql instances at CodeSherpas). Setting this to O_DIRECT will turn off double-buffering between innodb's buffer pool and the operating system's file cache, significantly speeding up disk activity.

It's worth reading the documentation and doing some performance testing based on your individual server's configuration. This only has a handful of values, so it's easy to test and decide what performance level you get for the various tradeoffs.

If you can, set this to O_DIRECT, otherwise leave it alone.
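To pull the four settings together, here is a sketch of a script that prints a my.cnf fragment. The function name and its inputs are hypothetical, and the values are only the starting points discussed above (a buffer pool figure you worked out yourself, 4M for the log buffer, 2x the cores, O_DIRECT), not recommendations for any particular server.

```shell
#!/bin/sh
# Hypothetical helper: print a [mysqld] fragment with the four settings.
# $1 = buffer pool size in MB, $2 = number of cpu cores available to mysql
print_innodb_settings() {
  pool_mb=$1
  cores=$2
  cat <<EOF
[mysqld]
innodb_buffer_pool_size = ${pool_mb}M
innodb_log_buffer_size = 4M
innodb_thread_concurrency = $(( cores * 2 ))
innodb_flush_method = O_DIRECT
EOF
}

# Example: an 8-core box with 5632 MB set aside for the buffer pool
print_innodb_settings 5632 8
```

Printing the fragment rather than editing my.cnf in place keeps the script harmless; review the output and merge it by hand, then measure again.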

Bonus #0 - flushing logs at transaction commit

This isn't a parameter I normally change for real in production, but changing it when doing performance testing can lead to other insights and clues to bottlenecks. The variable

innodb_flush_log_at_trx_commit

has a default value of "1", and should remain at this level if you want the typical ACID guarantees that innodb provides... however, setting it to "0" or "2" moves the flush to disk from every commit to roughly once per second ("0" also defers the log write itself; "2" still writes the log at commit but leaves the flush to disk for later). This does have the risk of a loss of about 1 second of data updates in the event of a system crash, but while tuning, it can help you determine if disk I/O and log flushing is a bottleneck, and if the data you risk losing is equivalent to the comments people leave on YouTube, then the performance gain might make sense for your application.

Bonus #1 - mysql logging

As long as we are looking at performance values in the my.cnf file, I suggest that you turn on the slow query log.

log-slow-queries=/var/log/mysql/slow-queries.log
long_query_time = 1

Whenever a query takes more than a second to run, it'll get logged in the file you specify (another nice thing to note about the Percona build of mysql - by default, mysql's slow query time resolution is 1 second increments... Percona changes that to milliseconds, giving you much more visibility into what you might consider 'slow').

And as long as we're logging things that are slow, add

log-queries-not-using-indexes

and that will give us visibility into things we can speed up by adding indexes to our tables.

Bonus #2 - linux 'swappiness'

As an application running under linux, mysql does its best to manage its own memory usage - and as you saw above, we are going to give it the bulk of available memory on pretty much any install we put it on. But at the same time, the linux kernel is going to do its best to look for memory that is going unused and swap it out to disk so it can be free for other things. And MySQL looks like a big target, sitting there using most of our memory.

As you can see, we can create a big mess if mysql thinks something is in memory, but linux has swapped it out - imagine the scenario where mysql simply wants to return the data it thinks is in memory - it *still* has to read it from disk, which is what we were hoping to avoid. Imagine mysql trying to clear its own memory usage - in order to free up memory it decides it isn't using, linux will have to swap something else out to disk just so the data can be brought in and freed. There should be something smarter we can do here.

First and foremost, I use munin to watch the swappiness and make sure that memory is never being swapped. Swapping kills performance on servers, no matter what is happening. We can't fix it if we don't know it's happening in the first place.

Second, while I typically don't do it, Baron Schwartz of Percona, one of the authors of the High Performance MySQL book, recommends disabling swap entirely. I don't personally disable swap, because I would rather have it as a tool the system can use if it really needs it, but I monitor to ensure it's not happening on a regular basis. My approach is more like that documented by Peter Zaitsev.
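If you don't have munin in place yet, a quick look from the command line is possible too. The sketch below is a hypothetical helper that reads the standard SwapTotal and SwapFree fields from /proc/meminfo and reports how much swap is in use, along with the kernel's current swappiness setting.

```shell
#!/bin/sh
# Hypothetical helper: print swap currently in use, in kB, from a
# /proc/meminfo-style file (defaults to /proc/meminfo on linux).
swap_used_kb() {
  awk '/^SwapTotal:/ { total = $2 }
       /^SwapFree:/  { free = $2 }
       END           { print total - free }' "${1:-/proc/meminfo}"
}

# Only report when the linux proc files are actually present.
if [ -r /proc/meminfo ] && [ -r /proc/sys/vm/swappiness ]; then
  echo "swappiness: $(cat /proc/sys/vm/swappiness)"
  echo "swap in use: $(swap_used_kb) kB"
fi
```

This is only a snapshot. Swap that is in use but idle is far less worrying than active swapping, which shows up in the si and so columns of `vmstat 1`.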

Bonus #3 - my "Virtualization Rant"

And finally, I'd like to rant a little bit about virtualization and 'cloud based' services with MySQL. "The Cloud" seems to be all the rage lately, especially with startups. At CodeSherpas, we host several small clients on services like WebbyNode and Linode, we use Amazon EC2 for testing and for some surge support, and we certainly use Google apps and GMail. I'm also a fan of Heroku for its ease-of-use in getting a rails app deployed with no fuss... But by the time I'm dealing with clients that have serious mysql tuning, "100,000 visitors-a-day - why is our app so slow" kinds-of issues, I like to take virtualization out of the equation - at least on the database. There are several reasons why, which probably deserve another blog post - but the simple reason in this context - Virtualization makes your server lie to you.

Seriously. The tuning we were doing above depended on us learning things about our disk I/O, number of cores, swappiness, and other various measurements from the system. When your disk I/O reports something on a virtual machine, is that the *real* answer, or is that just how long it took the VM to be happy (meanwhile your write is still in a cache on the host)? Do you *really* have that many cores to schedule something against? Might the stuff that your guest OS is reporting as 'in memory' actually be swapped out by the host OS, which you have no visibility into? Might your carefully tuned mysql server go completely pear-shaped because another client on the same physical hardware suddenly has a spike that uses more cpu than you do? I'll probably write more on this at another time, but in the meantime, check this post by Mark Imbriaco of 37signals, about the performance increase they saw moving away from virtualization. If you saw the movie Inception, you know what kinds of issues this can create.

Conclusion

This was a particularly long blog post, but for both clients I mentioned, this analysis and tuning took less than 20 minutes in the real world. This is just the tip of the iceberg for tuning a Rails application in production; if you are interested in learning more, or seeing if we can tune your application further, you can always contact me. Some of this material is also covered in our Rails In Production training course.


Sunday, January 9, 2011

If you can't tell from my recent series of blog entries, I have spent a lot of time lately maintaining the CodeSherpas Server Farm and provisioning a few new machines for clients. I have several more entries pending, but they all rely on higher-order sysadmin techniques to keep track of all the 'noise' that a server can generate; so I thought I'd pull those out into a separate entry.

A running server generates a lot of system notifications. Tools like LogWatch, Monit, DenyHosts, LSM, etc. all send out emails when they find something. I have seen machines that send emails to the root user, to an email address set up specifically to receive them, and to the person who installed a specific tool - all at the same time. Without a unified view into the happenings of the system, an opportunity is lost and a hacker can slip through.

Technique #1 - Unify and Forward

The first part of this technique is obvious - unify all those email addresses into one destination - I typically prefer the root user on the machine itself, although I have also set up a user account specifically for this purpose. When done consistently, this is easy to maintain; the confusion over multiple email addresses only arises when the people installing tools think they need an original answer to this question. But having all those great system notifications doesn't do much good if they never leave the machine... does it?

The second part of this technique involves a few levels of indirection (and don't all good solutions, really?), and it uses a little-known trick of the linux-sysadmin gurus - the ".forward" file.

In the root user's home directory, I create a file named ".forward". In this text file, I put an email address - your own email address would work - and now anything that gets sent to the root user simply gets forwarded to that email address.

At CodeSherpas, we use one more layer of indirection on top of that... the .forward file on our servers forwards to a special email address like "all-system-notifications@codesherpas.com" (and that isn't the real address, so don't bother spamming it). That address is simply an alias that gets rotated between the CodeSherpas - whoever is on system duty is responsible for getting those messages.

Of course, the ultra-paranoid will tell you that an intruder can delete the .forward file, and I'll stop getting notices. That's true - but if an intruder has gotten far enough to delete files in root's home directory, then I can't trust any output from the machine. My trip wires should have gone off long before they have gotten that far.
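Creating the .forward file is a one-liner, but here is a sketch of it as a reusable function for anyone provisioning several machines. The function name is hypothetical and the address in the example is a placeholder.

```shell
#!/bin/sh
# Hypothetical helper: write a .forward file into a home directory.
# $1 = home directory (e.g. /root), $2 = address to forward to
install_forward() {
  home_dir=$1
  address=$2
  printf '%s\n' "$address" > "$home_dir/.forward"
  # keep it writable only by its owner; some mail servers will not
  # honor a .forward file that other users can write to
  chmod 600 "$home_dir/.forward"
}

# Example (as root), with a placeholder address:
#   install_forward /root sysadmin@example.com
```

Send a test message to root afterwards to confirm the forward actually leaves the machine.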

Technique #2 - If it doesn't email, make it!

While many tools send emails as part of their normal operation, there are many that don't. That's ok - we can make them!

Take lynis, for instance. Lynis is an easy to install system checker that runs on just about every flavor of *nix out there. It generates a great report that makes system hardening recommendations, checks to make sure keys haven't expired, verifies firewall rules are in effect, and all kinds of other things (worth a blog entry or more by itself). It is a command line tool that is easy to run, but I don't want to have to remember to run it... We should never send a human to do a computer's job.

I want to create a shell script that takes the output from lynis and mails it to root@localhost, and I want to set it to run once a week. This is trivial:

#!/bin/sh
# Run lynis non-interactively and mail whatever it reports to root.
# $(hostname) is used for the subject because plain sh does not always set $HOSTNAME.
(
  /root/lynis/lynis -c -Q -q
) | /bin/mail -s "Lynis Weekly Run for $(hostname)" root@localhost

There are two magic lines here:

/root/lynis/lynis -c -Q -q

This tells lynis to do a complete system check, do it without stopping for human intervention, and only report problems.

| /bin/mail -s "Lynis Weekly Run for $(hostname)" root@localhost

And this line takes the output, creates a subject that includes the host name of the machine (useful when you are getting several of these a week), and emails it to root@localhost (which has the .forward file, as described above).

I put that shell script in /etc/cron.weekly/lynis.sh, and voila! Every week the system checks itself and emails the current CodeSherpa sysadmin watchdog any findings.

That shell script above can easily be modified; just keep in mind that the command you put in between the parentheses should not require any human interaction, and ideally should report only problems (humans learn to ignore emails they get every day that say "all is well").


Sunday, January 2, 2011

At CodeSherpas, we have a client who deals with public school systems. This winter break is an ideal time to do a server upgrade for them, so I have spent the past few days moving their application and data to a new server we provisioned at ServerBeach, my absolute favorite hosting company.

We set the new machine up just a few days ago, and the first thing I did was install a few things to lock down the box. Given the popularity of my earlier security related blog entries, I thought I'd share this data with you.

This is anecdotal data, yes, but this is typical for every new server I set up. From day one, they get hammered with attacks. The output you are about to see is from a tool called LogWatch. I did nothing special to get this data - it works out of the box on pretty much every version of linux - it's even set up to run every day via cron and mail this report to the root user.

I'm anonymizing some of the information in the log to protect the client's identity and the exact location of the machine - everything else is real - including the IP addresses of the attackers. Here's the scary thing - this log was generated when the server had been turned on and available for less than 24 hours. Imagine how this would look when the machine actually has a real named address, some applications, and some sensitive client data on it!

################### Logwatch 7.3 #################################
      Processing Initiated: Sat Jan  1 10:02:01 2011
      Date Range Processed: yesterday
                            ( 2010-Dec-31 )
                            Period is day.
    Detail Level of Output: 0
            Type of Output: unformatted
         Logfiles for Host: <>
##################################################################

--------------------- httpd Begin ------------------------

Requests with error response codes
  404 Not Found
     /webdav/: 6 Time(s)

---------------------- httpd End -------------------------

Even though the box isn't set up yet, there were 6 attempts to find something at http://hostname/webdav. WebDAV is a file sharing protocol; these requests are basically attempts to see if we configured it incorrectly and are sharing files accidentally.

--------------------- pam_unix Begin ------------------------

sshd:
  Authentication Failures:
     root (220.173.136.52): 859 Time(s)
     root (mail.successcolaire.ca): 473 Time(s)
     unknown (211.237.24.224): 35 Time(s)
     root (200.201.195.94): 20 Time(s)
     root (ns1.embdhaka.org): 19 Time(s)
     root (3c.1d.344a.static.theplanet.com): 17 Time(s)
     root (211.237.24.224): 10 Time(s)
     unknown (ns1.embdhaka.org): 8 Time(s)
     unknown (mail.successcolaire.ca): 5 Time(s)
     unknown (95.172.35.14): 3 Time(s)
     postgres (ns1.embdhaka.org): 2 Time(s)
     ftp (ns1.embdhaka.org): 1 Time(s)
     mysql (211.237.24.224): 1 Time(s)
     nobody (211.237.24.224): 1 Time(s)
     postgres (211.237.24.224): 1 Time(s)
     unknown (220.173.136.52): 1 Time(s)
     unknown (3c.1d.344a.static.theplanet.com): 1 Time(s)
  Invalid Users:
     Unknown Account: 53 Time(s)

su-l:
  Unknown Entries:
     session closed for user root: 1 Time(s)
     session opened for user root by bokmann(uid=500): 1 Time(s)

---------------------- pam_unix End -------------------------

There were hundreds of attempts to log in as root, as well as attempts to log in as common usernames on the machine. If we had something improperly configured (for instance, the root password set to something like 'abc123'), this box would be compromised. For this reason, we also don't allow common usernames on the machine, and we also set process accounts (like mysql and postgres) so they cannot actually be logged in to.

The bottom section identifies my 'bokmann' logging in and switching to root; and this is the only time that will happen (and I changed that name here so no one knows my real account name). Since this was the initial configuration of the box, I had to set up some stuff, change the root password, and so on. From now on, all root-level access comes from a person's named account and is logged, thanks to a tool called 'sudo'.

--------------------- SSHD Begin ------------------------

Failed logins from:
  72.55.184.12 (mail.successcolaire.ca): 473 times
  74.52.29.60 (3c.1d.344a.static.theplanet.com): 17 times
  116.212.186.38 (ns1.embdhaka.org): 22 times
  200.201.195.94 (94.hostfree.colocation.matrix.com.br): 20 times
  211.237.24.224: 13 times
  220.173.136.52: 859 times

Illegal users from:
  72.55.184.12 (mail.successcolaire.ca): 5 times
  74.52.29.60 (3c.1d.344a.static.theplanet.com): 1 time
  95.172.35.14 (static.mega.nn.ru): 3 times
  116.212.186.38 (ns1.embdhaka.org): 8 times
  211.237.24.224: 35 times
  220.173.136.52: 1 time

Users logging in through sshd:
  bokmann:
     24.127.60.199 (c-24-127-60-199.hsd1.va.comcast.net): 1 time


Received disconnect:
  11: Bye Bye : 1457 Time(s)

**Unmatched Entries**
error retrieving information about user smbguest : 1 time(s)
error retrieving information about user file : 1 time(s)
error retrieving information about user fernando : 2 time(s)
error retrieving information about user spdu : 1 time(s)
error retrieving information about user tempuser : 1 time(s)
error retrieving information about user info : 2 time(s)
error retrieving information about user mike : 1 time(s)
error retrieving information about user guest2 : 1 time(s)
error retrieving information about user postmaster : 1 time(s)
error retrieving information about user kiosk : 1 time(s)
error retrieving information about user rott : 1 time(s)
error retrieving information about user temp : 1 time(s)
error retrieving information about user postgresql : 1 time(s)
error retrieving information about user linux : 1 time(s)
error retrieving information about user admin : 5 time(s)
error retrieving information about user smb : 1 time(s)
error retrieving information about user oracle : 3 time(s)
error retrieving information about user tmp : 1 time(s)
error retrieving information about user demouser : 1 time(s)
error retrieving information about user qwerty : 1 time(s)
error retrieving information about user user2 : 1 time(s)
error retrieving information about user user1 : 1 time(s)
error retrieving information about user sql : 1 time(s)
error retrieving information about user WinD3str0y : 1 time(s)
error retrieving information about user www-data : 1 time(s)
reverse mapping checking getaddrinfo for 94.hostfree.colocation.matrix.com.br failed - POSSIBLE BREAK-IN ATTEMPT! : 20 time(s)
error retrieving information about user www : 1 time(s)
error retrieving information about user sun : 1 time(s)
error retrieving information about user r00t : 1 time(s)
error retrieving information about user uss : 1 time(s)
error retrieving information about user fluffy : 1 time(s)
reverse mapping checking getaddrinfo for static.mega.nn.ru failed - POSSIBLE BREAK-IN ATTEMPT! : 3 time(s)
error retrieving information about user xs : 1 time(s)
error retrieving information about user user : 1 time(s)
error retrieving information about user spd : 1 time(s)
error retrieving information about user dan : 1 time(s)
error retrieving information about user guest : 1 time(s)
error retrieving information about user foc : 1 time(s)
error retrieving information about user tmpuser : 1 time(s)
error retrieving information about user bret : 1 time(s)
error retrieving information about user os : 1 time(s)
error retrieving information about user antonio : 1 time(s)
error retrieving information about user hadoop : 1 time(s)
error retrieving information about user us : 1 time(s)
error retrieving information about user guest1 : 1 time(s)
error retrieving information about user dany : 1 time(s)
error retrieving information about user glassfish : 1 time(s)

---------------------- SSHD End -------------------------

And here we see some of the same data from the other section - the failed login attempts; but this time from the actual tool used to log in, ssh.

Notice those two ip addresses tried to log in and failed 473 and 859 times? That isn't a human - someone sysadmins affectionately call a script kiddie is running a program that tries hundreds of passwords in an attempt to break in.

We also see my one successful login - when I started to set up this box. If I saw unsuccessful logins under my name, or more successful logins than I remember, that would be a problem.

--------------------- Disk Space Begin ------------------------

Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/SysVolGroup-LogVolRoot
                     450G  2.6G  425G   1% /
/dev/sda1             122M  8.3M  107M   8% /boot


---------------------- Disk Space End -------------------------

###################### Logwatch End #########################

Nothing much to see in this last section - it's a brand new box and the disk is pretty empty. As I secure the box I'll probably partition that into several different pieces, setting some to be read-only, some so that they can't hold executable programs, and so on.

You might be thinking something like "859 failed login attempts? Couldn't you block them after the first 5 or so?"... Yes, and someday soon I'll write a blog entry about denyhosts, which does exactly that. But unless your sysadmin installed it, it isn't protecting you, and your servers are being hammered like this right now.

We get a report like this every day from every one of our servers. We have a rotating 'sysadmin' hat, and if it's your turn, your morning starts with a quick scan, just to verify nothing is out of the ordinary (disk space hasn't increased, no successful logins from strangely named accounts, etc). This log isn't our first line of defense - it's just a sanity check. It's also a little bit of 'forensic evidence' if we do later find an intrusion. I'm proud to say that CodeSherpas has never had an intrusion into a server we administer.


Friday, December 31, 2010

Happy 2011. I can't think of a better time to get people thinking about timezones on their servers.

I have a passionate belief that all servers should run in the UTC timezone, and that all system admins should be comfortable thinking about time based on UTC, NOT just on their local time zone. You can bend your personal time perspective by checking out this awesome web app by Amy Hoy and Thomas Fuchs.

Wanna set your linux server to UTC? It's trivial.

  • Type 'date' from a command-line, just to see where we are already.
  • If you are running NTP, stop it (you really should run NTP, which I'll cover in a future entry)
    service ntpd stop
    
  • Let's set your time zone to UTC. ZoneInfo should be in your basic linux install, so we just need to get rid of the old time zone and bring in the new:
    rm -rf /etc/localtime
    ln -s /usr/share/zoneinfo/UTC /etc/localtime
    
  • Here's a pro-tip: your system has a hardware clock in addition to the 'software' clock your operating system is keeping. You need to set *it* to run in UTC as well, otherwise some services (mysql among them) can get dates wrong, and dates/times can get really squirrelly if your server is rebooted.
    hwclock --systohc --utc
    
  • If you are running NTP, let's update the clock and start the service back up:
    ntpdate us.pool.ntp.org
    service ntpd start
    
  • Happy New Year!
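
Once you've run through the steps above, it's worth confirming they took. Here is a minimal sketch, assuming /etc/localtime was set up as a symlink the way I did it above; the function name points_to_utc is just something I made up for illustration.

```shell
# points_to_utc is an illustrative helper, not a standard command.
# It succeeds only if the given path is a symlink whose target ends in /UTC.
points_to_utc() {
  target=$(readlink "$1") || return 1   # fails if the path is not a symlink
  case "$target" in
    */UTC) return 0 ;;
    *)     return 1 ;;
  esac
}
```

Something like "points_to_utc /etc/localtime && date" should then print the current time with UTC as the zone.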


Wednesday, December 29, 2010

I occasionally get to help debug a problem on a production linux server, and I'm surprised how often I see config directories littered with stuff like:

/etc/httpd/conf.d
  httpd.conf
  httpd.orig
  httpd.conf.old
  httpd.conf.previous
  httpd.DO_NOT_USE
  httpd_cache_experiment
  httpd.memory_hog

Of course, it's easy to know which one is in use, but what are all these files? Why weren't they deleted? Is there useful knowledge here? Which configuration file 'won' and is actually the contents of the production file? Is there a better way to deal with this kind of thing?

Of course there is. There are two techniques I use to clean up this kind of stuff, and I want to share them with you.

Version Control your /etc directory with git

Git is a version control tool that is commonly used in software development; it can save the day here too. In short, it creates a database of all of the files, and lets you compare previous versions, roll back to previous versions, store metadata like comments, the date of the change, and the person editing the file, and all kinds of other neat things. Teaching git is a large subject, but I want to show you how easy it can be to get started. Once git is installed on your machine, starting is as simple as these commands (as a user who has permissions to read and write /etc):

cd /etc
git init
git add .
git commit -m "initial snapshot of existing etc configuration"

Now your config is stored away in a hidden /etc/.git directory.

Let's say you edit your httpd.conf file to add an apache module such as mod_security. After you are done, you'd simply do this:

git add httpd.conf
git commit -m "Modified to include mod_security.  This resolves issue #1148 in our help desk ticketing system"

Not only is the new version tucked away, there is a nice comment that says why it was modified, we know who did it, we know when they did it, and we can see the exact contents of the change (git diff HEAD~1 HEAD will do that; git log httpd.conf will show the history of the file).

Now that we have versioning, we can get rid of the files named

httpd.orig
httpd.conf.old
httpd.conf.previous

as they were all about keeping around older versions, presumably so we could roll back if there was a problem. Git does a better job of versioning than cluttered, conflicting filenames.
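
To make the 'roll back' part concrete, here is a small sketch you can run in a scratch directory without touching /etc. Everything in it is made up for the demo: the temp directory, the one-line httpd.conf contents, and the commit identity. The one piece doing the real work is git checkout HEAD~1 -- httpd.conf, which restores the file as it was one commit ago.

```shell
# Scratch demo only: nothing here touches the real /etc.
demo=$(mktemp -d)
cd "$demo" || exit 1

# Identity used for the demo commits (hypothetical values).
export GIT_AUTHOR_NAME="Demo Admin" GIT_AUTHOR_EMAIL="admin@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

git init -q
echo "KeepAlive Off" > httpd.conf
git add httpd.conf
git commit -q -m "initial snapshot"

# An experiment that we will regret in a moment.
echo "KeepAlive On" > httpd.conf
git add httpd.conf
git commit -q -m "Turn on KeepAlive as an experiment"

# Restore the previous version of the file, then record why we did it.
git checkout HEAD~1 -- httpd.conf
git commit -q -m "Roll back KeepAlive experiment"

cat httpd.conf
```

The rollback is itself a commit, so the history still shows the experiment, the reason for it, and the reason it was undone. That's exactly the information httpd.conf.old could never tell you.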

Symlink to a Variant

The other versions of this file may have been about experiments, or variants for different configurations, as their names confess. Rather than have them clutter our config directory though, let's create a 'varia' directory in /etc/httpd/conf.d (the naming convention makes my high school Latin feel relevant. Humor me).

mkdir /etc/httpd/conf.d/varia

and we'll move all the other files under there, following a naming convention that shows what the variant is:

/etc/httpd/conf.d/varia
  httpd.master
  httpd.DO_NOT_USE
  httpd.cache_experiment
  httpd.memory_hog

Ideally we would also create a short preamble comment at the top of each file, so future sysadmins have a clue what they are about.

Now we just use a symlink to point to the config file we currently want to use.

ln -s ./varia/httpd.master httpd.conf

Now, whenever we want to change configurations, we just update the symlink. Why is this better?

  • our conf directory isn't littered with unused files
  • The files, names, and comments document intent for the variant configurations
  • we never have to edit the live config file - edit a variant instead
  • we know the intent of the current configuration by looking at the symlink
  • We can automate changing configurations by having cron switch symlinks (perhaps we have a nighttime variant that uses less memory, freeing resources for a big report generation every Sunday evening)
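
Switching variants can be wrapped in a tiny helper, which is also the sort of thing cron could call. This is only a sketch: use_variant is a name I invented, and it assumes the varia/httpd.<name> layout shown above. It refuses to run if httpd.conf is still a regular file, so it can't clobber a config that hasn't been moved into varia yet.

```shell
# use_variant is a hypothetical helper: point httpd.conf at one file under varia/.
# Usage: use_variant <conf-dir> <variant-name>
use_variant() {
  conf_dir=$1
  variant=$2

  if [ ! -f "$conf_dir/varia/httpd.$variant" ]; then
    echo "no such variant: $variant" >&2
    return 1
  fi

  # Never overwrite a real file; only an existing symlink may be replaced.
  if [ -e "$conf_dir/httpd.conf" ] && [ ! -L "$conf_dir/httpd.conf" ]; then
    echo "httpd.conf is a regular file; move it into varia/ first" >&2
    return 1
  fi

  # -f replaces the existing httpd.conf link; the target is relative to conf_dir.
  ln -sf "varia/httpd.$variant" "$conf_dir/httpd.conf"
}
```

For example, use_variant /etc/httpd/conf.d memory_hog would repoint the link at varia/httpd.memory_hog. Keep in mind that apache only reads its configuration when it starts or reloads, so a real switch would be followed by a reload of the service.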

Of course, all of this stuff gets put in git as well.

There is one more benefit to git I didn't mention above - with git, it can become easy to push those configuration changes across multiple servers. If you make a change to, say, your sudoers file, git's "push" and "pull" mechanism can distribute that change among git repositories on different machines easily.
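
Here is a rough sketch of that push and pull flow, using local scratch directories to stand in for two servers and a shared repository. All of the names (hub.git, server_a, server_b), the file contents, and the commit identity are invented for the demo; on real machines the shared repository would live on some host both servers can reach, and each server's repository would be its own /etc.

```shell
# Scratch demo: local directories stand in for two servers and a shared hub.
work=$(mktemp -d)
export GIT_AUTHOR_NAME="Demo Admin" GIT_AUTHOR_EMAIL="admin@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

# A bare repository acts as the shared hub.
git init -q --bare "$work/hub.git"

# "Server A" starts the history and publishes it to the hub.
git init -q "$work/server_a"
cd "$work/server_a" || exit 1
echo "sudoers, version 1" > sudoers
git add sudoers
git commit -q -m "initial snapshot"
git remote add origin "$work/hub.git"
git push -q origin HEAD

# "Server B" gets its copy from the hub.
git clone -q "$work/hub.git" "$work/server_b"

# A change is made and committed on server A, then pushed.
echo "sudoers, version 2" > sudoers
git add sudoers
git commit -q -m "Grant the ops group restart rights"
git push -q origin HEAD

# Server B picks the change up with a pull.
cd "$work/server_b" || exit 1
git pull -q --ff-only

cat sudoers
```

After the pull, server B's sudoers matches server A's, and git log on either side shows who changed it and why.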

I hope your server's /etc directory isn't a mess of versions and variants cluttering the real configuration of your server. If it is, I hope this inspires you to clean it up.