This is the blog for Scout, Enterprise-grade server monitoring without the bloat.
70+ plugins, realtime charts, alerts, and Chef/Puppet-friendly configuration.

4 Simple Steps to Detect & Fix Slow Rails Requests

Posted in HowTo | 11 comments

In Blink, Malcolm Gladwell’s book on split-second decisions, Gladwell tells the story of how the Emergency Department at Chicago’s Cook County Hospital changed its process for diagnosing chest pain.

Dr. Brendan Reilly instituted a simple test to determine whether a patient was suffering from a heart attack. It combined just 4 questions with the results of an ECG. This simple test was 70% better than the barrage of questions previously asked by hospital staff to identify patients that weren’t having a heart attack and was nearly 20% better at identifying patients that were having heart attacks.

More information on the patient’s symptoms often led to an incorrect diagnosis. It distracted doctors from the real issues.

I’ve seen this many times with developers trying to debug performance issues in Rails applications. They look at outliers instead of the obvious culprits. It’s part of the reason I’ve never felt a need for a deep, detailed Rails monitoring application (i.e. – benchmarks from the controller to the database level on every request).

The majority of the time, our performance problems have nothing to do with the Rails framework (and we’ve worked through a lot of issues since we started building Rails apps in 2005). Why benchmark the entire request cycle when the vast majority of issues are isolated at the database layer? After I’ve ruled out the database, I can see benchmarking a single request (there’s a great free tool below), but I simply don’t want the other, often irrelevant information clouding my mind.

The root symptom we want to avoid in our apps is slow requests. Our Scout plugin for analyzing slow Rails requests has been installed nearly 250 times, so we’re not alone there.

First, it’s probably not Rails

Contrary to what you heard on the Interweb, it’s probably not Rails itself that’s making your app slow. We conducted an internal survey of the Highgroove Studios team to see where we’ve encountered performance issues and the root cause:

[Chart: “Possible Causes of Slow Rails Web Requests”]

The database layer has a huge edge on all other issues. In fact, almost all of the performance problems could happen in any framework in any language. Issues like missing database indexes, not using joins correctly, loading too many records into memory, manipulating too many records through iteration (#map, #each), and memory leaks occur in many languages.
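To make the "not using joins correctly" item concrete, here is a minimal plain-Ruby sketch of the pattern often called the N+1 (or 1+N) problem. It needs no Rails or database: `FakeDb`, its sample data, and its method names are hypothetical stand-ins that only count how many queries would be issued.

```ruby
# Hypothetical in-memory stand-in for a database; it only counts queries.
class FakeDb
  attr_reader :query_count

  def initialize
    @query_count = 0
    @posts = (1..5).map { |i| { id: i, author_id: (i % 2) + 1 } }
    @authors = { 1 => "Ann", 2 => "Bob" }
  end

  # One query that returns every post.
  def all_posts
    @query_count += 1
    @posts
  end

  # One query per call: the pattern behind the N+1 problem.
  def find_author(id)
    @query_count += 1
    @authors[id]
  end

  # One query for many ids, comparable to a join or an IN (...) lookup.
  def find_authors(ids)
    @query_count += 1
    @authors.select { |id, _| ids.include?(id) }
  end
end

# 1 query for the posts plus 1 query per post.
def author_names_one_by_one(db)
  db.all_posts.map { |post| db.find_author(post[:author_id]) }
end

# 2 queries in total, regardless of how many posts there are.
def author_names_batched(db)
  posts = db.all_posts
  authors = db.find_authors(posts.map { |post| post[:author_id] }.uniq)
  posts.map { |post| authors[post[:author_id]] }
end

slow_db = FakeDb.new
author_names_one_by_one(slow_db)
fast_db = FakeDb.new
author_names_batched(fast_db)
puts "one-by-one: #{slow_db.query_count} queries, batched: #{fast_db.query_count} queries"
```

Both methods return the same names; only the number of round trips differs, and the gap widens as the table grows.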

It’s not a bad thing to have performance issues (it means your web app is growing), but it’s a problem if they aren’t fixed quickly.

1. Monitor for slow web requests

First, we want to be aware of slow web requests ASAP. We use Scout’s Slow Rails Requests plugin for real-time notification of slow requests because:
  • We have a very fast release cycle, and it’s important that we’re aware of any side-effects of a new release ASAP
  • We could analyze our log files weekly, but it’s too easy to push off a task that isn’t done automatically.
  • We like knowing about it before our clients
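For comparison, the manual log analysis mentioned above might look roughly like the sketch below. This is not how the Scout plugin works. The log line shape is an assumption based on older Rails logs ("Completed in <seconds> ... [<url>]"), and `slow_requests` and the sample log are hypothetical; adjust the pattern to whatever your Rails version writes.

```ruby
# Returns [url, seconds] pairs for requests slower than threshold_seconds.
# Assumes "Completed in <seconds> ... [<url>]" lines (an older Rails log
# style); this format is an assumption, so adapt the pattern to your logs.
def slow_requests(log_text, threshold_seconds)
  completed_line = /Completed in (\d+\.\d+) .*\[(\S+)\]/
  log_text.each_line.map do |line|
    match = completed_line.match(line)
    next unless match
    seconds = match[1].to_f
    [match[2], seconds] if seconds > threshold_seconds
  end.compact
end

# Hypothetical sample log for illustration.
sample_log = <<~LOG
  Processing ReportsController#index (for 127.0.0.1) [GET]
  Completed in 3.20541 (0 reqs/sec) | Rendering: 0.10113 (3%) | DB: 3.00158 (93%) | 200 OK [http://example.com/reports]
  Completed in 0.01224 (81 reqs/sec) | Rendering: 0.00113 (9%) | DB: 0.00158 (12%) | 200 OK [http://example.com/]
LOG

slow_requests(sample_log, 1.0).each do |url, seconds|
  puts format("%.2fs %s", seconds, url)
end
```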

Once this plugin is installed, we’ll quickly be alerted to slow requests. Now, let’s monitor a couple of key metrics that can impact the performance of our applications.

2. Monitor Server Load

Most *nix servers measure a form of server health called the Server Load. Usually, the Server Load is given as Load Averages over 3 different time periods.

Your Server’s Load is essentially a rough idea of the number of queued processes waiting for a resource to become available. This resource is generally CPU time, but could also include a number of other factors like Memory, swap space, disk, etc. A lower number is a good indicator of your overall system health and responsiveness.

The 3 averages are for the last minute, the last 5 minutes, and the last 15 minutes. Using these averages, we can see how busy your server really is.

Take a look at the “top” program’s output on this server:

We can see this server is not busy at all! In fact, this server is currently at 0.00 load on all three load averages. This is ideal, and indicates an idle server, waiting for a process to handle.

It’s common to see that when the load reaches a certain threshold (perhaps 3.0), processes can slow to a crawl and your Rails app may stop responding. We typically generate an alert through Scout’s Load Average plugin if the load exceeds 3.00.
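As a rough illustration of that threshold check (not the Load Average plugin's actual code), the sketch below parses text in the format of Linux's `/proc/loadavg` and reports which of the three averages exceed 3.00. The method names and the fallback sample line are hypothetical.

```ruby
# Parses text in the format of Linux's /proc/loadavg,
# e.g. "0.00 0.00 0.00 1/123 4567", into the three load averages.
def parse_load_averages(loadavg_text)
  one, five, fifteen = loadavg_text.split.first(3).map { |value| Float(value) }
  { one_minute: one, five_minutes: five, fifteen_minutes: fifteen }
end

# Returns the names of the averages that exceed the threshold.
def load_alerts(averages, threshold = 3.0)
  averages.select { |_, value| value > threshold }.keys
end

# Read the live value on Linux; fall back to a sample line elsewhere.
text = File.readable?("/proc/loadavg") ? File.read("/proc/loadavg") : "0.00 0.00 0.00 1/123 4567"
averages = parse_load_averages(text)
alerts = load_alerts(averages)
puts alerts.empty? ? "load OK: #{averages}" : "load above 3.00 for: #{alerts.join(', ')}"
```

Keeping the parsing separate from the file read makes the check easy to try against sample strings before pointing it at a live server.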

Why

A slow web request could cause a spike in the load or it could be slow because a background job is using a lot of the CPU, a large number of requests are coming through, etc. Tracking the load helps us figure out these issues.

3. Monitor Memory Usage

On the memory side, there are 2 things we typically monitor on our Rails setups:

  • The memory usage of our Mongrel processes & associated processes (like a Ferret server)
  • The memory usage of the system, most importantly the swap space usage.

It is important to note that as processes use resident memory, they will also increase their use of virtual memory, in step. Processes will actually appear to consume more of this “virtual memory” than the amount of actual physical memory of the system. This is perfectly normal, since most operating systems can manage in-memory paging and sharing of resources. Only when this virtual memory begins “paging” to disk, using swap space on the hard drive to simulate physical memory, do we experience slowness or, worse, out-of-memory problems.

Think about it this way. If you worked in a restaurant and I gave you a big load of dishes (your processes) and 5 really fast dish-washing machines (resident / physical memory), and 5 really slow dish-washers (hard drive / swap space), you would do best to try and optimize all your dishes to be handled by the fast machines. Only when you really needed to, would you utilize those slow dish-washers, and only if you couldn’t handle all the dishes coming in.

Many Rails applications – either the apps themselves or third party libraries – suffer from memory leaks. As your processes use more and more memory, both their resident memory and virtual memory begin to grow. They begin to use the hard drive as swap space for virtual memory, which is far slower than physical memory. This can dramatically slow performance of the entire system, and thus, all requests. We generate an alert through the Process Usage Plugin if our Mongrel processes exceed a given threshold (usually around 100 MB), and through the Memory Profiler Plugin if the percentage of swap space used exceeds a given threshold (usually around 60%).
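The two thresholds can be sketched in plain Ruby as follows. This is an illustration, not the plugins' implementation: it assumes Linux, where `/proc/meminfo` reports `SwapTotal` and `SwapFree` in kB and `ps -o rss= -p PID` prints a process's resident memory in kilobytes. The method names and sample values are hypothetical.

```ruby
# Percentage of swap in use, from text in the format of Linux's /proc/meminfo.
def swap_percent_used(meminfo_text)
  values = {}
  meminfo_text.each_line do |line|
    match = /\A(\w+):\s+(\d+)/.match(line)
    values[match[1]] = match[2].to_i if match
  end
  total = values.fetch("SwapTotal", 0)
  return 0.0 if total.zero?
  (total - values.fetch("SwapFree", 0)) * 100.0 / total
end

# Resident memory in MB from a kilobyte figure such as `ps -o rss= -p PID` prints.
def rss_megabytes(rss_kilobytes_text)
  rss_kilobytes_text.strip.to_i / 1024.0
end

# Default limits follow the thresholds in the text: about 100 MB and 60% swap.
def memory_alerts(rss_mb, swap_percent, rss_limit_mb = 100, swap_limit_percent = 60)
  alerts = []
  alerts << "process memory #{rss_mb.round} MB" if rss_mb > rss_limit_mb
  alerts << "swap #{swap_percent.round}% used" if swap_percent > swap_limit_percent
  alerts
end

# Hypothetical sample values for illustration.
sample_meminfo = "MemTotal: 2048000 kB\nSwapTotal: 1000000 kB\nSwapFree: 300000 kB\n"
puts memory_alerts(rss_megabytes("153600\n"), swap_percent_used(sample_meminfo))
```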

Why

This is often an easy problem to fix: if finding the leak is hard (and it usually is), you can do a scheduled restart. If you are constantly using a lot of swap space, you probably need more memory (that’s cheap compared to development hours).

4. Fixing slow requests

So, Scout sends you an alert regarding a slow web request – now what?

Install the Query Reviewer Rails Plugin

As stated earlier, most of our performance issues are related to the database, and the Query Reviewer Plugin does a tremendous job of finding issues with MySQL and benchmarking the entire request cycle. The key feature of this plugin is that the query information is embedded directly in the view.

The Optimization Process

We use the following process when Scout identifies a slow web request:

  1. Log in to Scout and view the data across the slow Rails requests, CPU load, and memory usage plugins. If the CPU load is high, the memory usage of our Mongrel processes is high, or the % of swap space used is unreasonably high, other issues could be impacting this slow request. We may restart our Mongrel processes or check on any background jobs that are running and re-run the request.
  2. Re-run the slow request in our local environment, seeing if we can replicate the issue. Make sure the MySQL Query Reviewer plugin is enabled.
  3. Review the information provided by the MySQL Query Reviewer plugin and massage your SQL queries. Repeat steps 2 and 3 until performance is acceptable.
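When reviewing query information in step 3, the usual warning signs in MySQL's EXPLAIN output are a `type` of `ALL` (a full table scan), a missing `key`, a large `rows` estimate, and `Using filesort` or `Using temporary` in `Extra`. The sketch below checks one EXPLAIN row for these signs. It is not Query Reviewer's implementation; the `explain_warnings` helper, the 1000-row cutoff, and the sample row are assumptions for illustration.

```ruby
# Flags common warning signs in one row of MySQL EXPLAIN output.
# `row` is a hash keyed by EXPLAIN column names ("type", "key", "rows", "Extra").
# The max_rows cutoff is an arbitrary assumption, not a MySQL default.
def explain_warnings(row, max_rows = 1000)
  warnings = []
  warnings << "full table scan" if row["type"] == "ALL"
  warnings << "no index used" if row["key"].nil?
  warnings << "examines #{row['rows']} rows" if row["rows"].to_i > max_rows
  extra = row["Extra"].to_s
  warnings << "filesort" if extra.include?("Using filesort")
  warnings << "temporary table" if extra.include?("Using temporary")
  warnings
end

# Hypothetical EXPLAIN row for a query filtering on an unindexed column.
row = { "table" => "orders", "type" => "ALL", "key" => nil,
        "rows" => 250_000, "Extra" => "Using where; Using filesort" }
puts explain_warnings(row).join(", ")
```

A row that comes back with no warnings is not proof the query is fast, but a row with several is usually the place to start adding an index or rewriting a join.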

Summary

We’ve seen lots of people waste time tracing the Rails stack for performance issues when the cause is usually much simpler. Look at the obvious places first before digging through the Rails stack.


Comments

  1. Stephan Schmidt said about 18 hours later:

    “The database layer has a huge edge on all other issues. In fact, almost all of the performance problems could have happen in any framework in any language.”

    Probably not. An ORM with distributed caching (Hibernate, EH Cache, Terracotta) will solve lots of your DB problems. So it IS a Rails issue, not a DB issue.

    Peace -stephan

  2. Alex Tretyakov said about 20 hours later:

    Stephan Schmidt: I think you’re wrong. I was a J2EE developer before RoR, and Hibernate, even with a cache, took a huge amount of memory, and its performance was not as good as that of the version rewritten in RoR.

  3. Stephan Schmidt said about 21 hours later:

    @Alex: My numbers differ, but it’s interesting that in your case ROR was faster with fewer DB requests than Hibernate with a distributed EHCache instance.

    Peace

    -stephan

  4. DirtyDeliciousLady said about 23 hours later:

    Hooray for improperly titled graphs. It’s brilliant that they called it “Possible Causes of Slow Rails Web Requests” rather than what it really is, “Miniscule Amounts of Idle Speculation at One Little Company On Problems They Might Have Had With Rails”.

  5. Derek Haynes said about 24 hours later:

    @DirtyDeliciousLady:

    Perhaps I wasn’t clear, but I do mention directly above the graph:

    “We conducted an internal survey of the Highgroove Studios team to see where we’ve encountered performance issues and the root cause”

    This is based on a review of 27 client projects, which we felt was a large enough sample size to publish information about. There’s certainly some opinion into how each performance issue was categorized, but not enough to have a significant impact in the general numbers.

    We’d love to see how other developer numbers compare.

  6. Derek Haynes said about 24 hours later:

    @Stephan:

    Based on our experience, there are plenty of basic SQL issues that are often missed until the database grows sufficiently in size – improper indexes, the 1+N problem, etc. – and the combination of Scout + Query Reviewer Plugin makes these issues easy to resolve.

  7. Sam Smoot said 1 day later:

    I should probably just write an article on datamapper.org. ;-)

    Briefly: Yes, I think you’re focussing on the right area, but I also think the explanation and some of the conclusions are over-simplified.

    There are a lot of reasons for this. Here’s just a few:

    • ActiveRecord’s Database drivers are expensive. We’re not talking about AR itself necessarily (except mixins), we’re just talking about the work of making a query, and getting back type-casted values.
    • Ruby is slow. Seriously. It’s not a joke. It’s something that deserves a well reasoned response to how you code. c#’s method-dispatch is literally about 1000 times faster. That makes a very big difference. It means that a Ruby O/RM has to be optimized for completely different patterns than something like Hibernate. In Hibernate they can write their own query-language that allows really expressive queries, and they can optimize the results for database execution speed. You just can’t do that to the same extent in Ruby.
    • The “critical loop” in any Ruby O/RM is going to be iterating the result of a query and creating objects out of it. There’s a whole chain of method dispatches that have to happen here for every value loaded into an attribute.

    Now sure, you need to have good indexes. You need to be writing sane reporting queries where appropriate. That’s a given. It’s hard to blame the poor performance of uncached requests on that. I believe the problem is both more fundamental, and something we can overcome.

    JRuby, YARV, etc., will help marginally in response time, but more importantly in throughput. Asynchronous database drivers combined with native threads will allow your server to keep processing requests during the time right now that it’s twiddling its thumbs. There’s probably a four-fold increase in those two factors alone. Alternative frameworks and libraries can and will overcome some of the rest.

    Looking forward from the tools and techniques we have today, I think in the next couple of years we’ll see an increase in performance of an order-of-magnitude. It will come from not pretending Ruby’s limitations and drawbacks don’t exist, or that the playing-field with Ruby and other languages is on the level; It will come from identifying these limitations and actively pursuing techniques to overcome or workaround them entirely.

  8. Greg Jorgensen said 1 day later:

    Take the Rails database layer (Active Record) out of the picture, and maybe get someone who knows SQL to write the queries, and the database would not be the problem. Unless the tables are really huge or the schema was created by an amateur (sadly common with web apps) I promise the joins and N+1 and missing indexes are not the problem.

    Relational databases and SQL are a lot older, better understood, and better optimized than Rails, so fobbing off Rails performance, and problems caused by RDBMS/SQL-illiterate programmers, is a little rich.

    I can’t count how many web apps I’ve worked on that were slow at the database layer, and the problems had nothing to do with the database engine or even the schema or data, but with lame code programmers had put around it, often to protect themselves from just learning how to properly use a database.

    If you can’t make a souffle after watching a couple of cooking shows on TV you probably shouldn’t blame the oven.

  9. Matt Todd said 1 day later:

    Good article, and it raises the point quite well that we can subconsciously overcomplicate things, perhaps because we are developers always wanting a good challenge. Thanks for the simple reminder to simplify, even when troubleshooting.

    @DirtyDeliciousLady: If you disagree, give us your opinion on the matter, don’t just troll.

  10. Derek Haynes said 1 day later:

    @Sam:

    I don’t disagree with your points – my concern though is that when I start optimizing the ORM layer, I get a little worried. Is this really a major bottleneck for the many Rails apps that aren’t handling a significant number of requests? Is it not something that a little more hardware could handle (and probably do it cheaper)?

    One of the beautiful things about Rails is that you don’t need to be a developer to build a useful web app. It’s easy to crave lots of performance benchmarks when you don’t know what you’re looking for. I’m just suggesting where to look first.

  11. Lindsay Holmwood said 1 day later:

    You might want to note that load averages on Linux are calculated on the number of processes queued as well as the amount of time spent in iowait.

    Having bad IO (either through fallback drivers or dodgy hardware) can severely impact the overall system performance, which nicely cascades onto your Rails stack.

    For all your hardcore (Linux) system profiling needs you can’t go past the sysstat project. sar + pidstat are your friends.
