Your high-powered server is suddenly running dog slow, and you need to remember the troubleshooting steps again. Bookmark this page for a ready reminder the next time you need to diagnose a slow server.
Get on "top" of it
Linux's top command provides a wealth of troubleshooting information, but you have to know what you're looking for. Reference this diagram as you go through the steps below:
We take pride in building a server monitoring product our customers love with a lean, flat team. We're looking to add the fourth human to our close-knit group.
So, what's special about being a Ruby dev @ Scout?
First, great people! Second, great tech: come build beautiful realtime monitoring visualizations in d3 and Ruby. There won't be an LDAP integration in sight, we promise. Third, you'll have a tremendous impact as developer #2.
Beyond the technical chops, the single most important thing is your initiative. Will you dive into a problem unprompted? Point out problems and suggest ways to fix them? Given a high-level goal, can you break it into actionable chunks, ask for help when you need it, and see everything through to completion? We're a flat organization, and we won't micro-manage your work.
Competitive salary, health care reimbursement, and unlimited vacation time.
We'll consider great remote candidates, but we'd love for you to join us in Fort Collins, Colorado.
A few things about Fort Collins: best place to live (Money Magazine), ranked 3rd on the Best Bicycle Cities list, one of the Ten Best Vacation Cities for Beer Lovers, and 300 days of sunshine! Our office is located minutes from Old Town, the heart of Fort Collins.
How to apply
Email us at email@example.com. Resumes are fine, but a more personal email is better.
A big thanks to Eric Lindvall of Papertrail for adding steal time to Scout's CPU Usage Plugin and helping out on this blog post!
Netflix tracks CPU Steal Time closely. In fact, if steal time exceeds their chosen threshold, they shut down the virtual machine and restart on a different physical server.
If you deploy to a virtualized environment (for example, Amazon EC2), steal time is a metric you'll want to watch. If this number is high, performance can suffer significantly. What is steal time? What causes high steal time? When should you be worried (and what should you do)?
A couple of years ago I visited Argentina. I have trouble enough pronouncing my limited English vocabulary and I don't speak Spanish, but after a bit of time, it was pretty easy to order food, buy groceries, and use a taxi. However, occasional hangups that happen during my regular life in the states would throw me out of sorts in Spanish: a taxi driver trying to explain he doesn't have enough change would send me off the rails.
Ruby is my English when it comes to writing software, so when I hit hangups installing something Ruby-related, I can usually work my way out of them. Our monitoring agent at Scout is a Ruby gem, and while most of our customers already have Ruby installed, for those who don't, a hangup that seems small to me can be frustrating.
Now, thanks to Omnibus, there's an easy way to distribute your Ruby gems as a standalone, full-stack program. This means folks without Ruby can have as smooth an experience with your hip new gem as a hardened Rubyist.
Here's how I've built a full-stack installer for our scout Ruby Gem.
Back in 2010, we suggested using /bin/bash -l -c to run scout via Cron when using RVM. That was a brute-force approach: /bin/bash -l -c tells bash to behave as a login shell. However, as Daniel Szmulewicz eloquently stated in the comments for the original blog post, "Cron jobs are by nature non-login, non-interactive processes".
Fast-forward to today: RVM usage is continuing in production, and to make things more complicated, Cron jobs often need to account for both RVM and Bundler. So, what's our preferred approach when running Ruby executables via Cron in an RVM, RVM+Bundler, or Bundler environment? A shell script.
Cron Shell Script: RVM + Bundler
Let's say we want to run a Ruby executable (scout [KEY]) via Cron with (1) Ruby 1.9.2 and (2) my Rails app's gem bundle:
Make the shell script executable: chmod +x FILE.sh.
Add the Cron job:
* * * * * shell_script.sh
But that's a lot of typing...
It's tempting to use /bin/bash -l -c when you are busy/lazy. To get around this, the scout install [KEY] command will detect if you are using (1) RVM and/or (2) Bundler. If so, we generate the shell script for you and make it executable.
scout install BNrIneEBMwE8h6VlhO4Bw4WmOVSLmnygSFZEPCfi
=== Scout Installation Wizard ===
It looks like you've installed Scout under RVM and/or Bundler.
We've generated a shell script for you.
Run `crontab -e`, pasting the line below into your Crontab file:
* * * * * /Users/dlite/.scout/scout_cron.sh
How do we detect RVM and Bundler? We've encapsulated it into an Environment class:
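As a rough illustration of the idea, here is a minimal sketch of what such a class could look like. It is not Scout's actual code: the class body, the `gemfile` and `cron_script` helpers, and the `ruby_version` parameter are all assumptions made for this example. It relies only on two conventions: RVM exports `rvm_path` and is loaded by sourcing `$rvm_path/scripts/rvm`, and Bundler sets `BUNDLE_GEMFILE` when a command runs under `bundle exec`.

```ruby
# Hypothetical sketch: not Scout's actual Environment class.
class Environment
  # env and dir are injectable so the detection can be exercised
  # without touching the real process environment.
  def initialize(env = ENV, dir = Dir.pwd)
    @env = env
    @dir = dir
  end

  # RVM exports rvm_path in shells where it has been loaded.
  def rvm?
    !@env["rvm_path"].to_s.empty?
  end

  # Path to the Gemfile in use: BUNDLE_GEMFILE if set,
  # otherwise a Gemfile in the working directory, otherwise nil.
  def gemfile
    from_env = @env["BUNDLE_GEMFILE"].to_s
    return from_env unless from_env.empty?

    candidate = File.join(@dir, "Gemfile")
    File.exist?(candidate) ? candidate : nil
  end

  def bundler?
    !gemfile.nil?
  end

  # Builds the text of a cron wrapper script for the given command.
  def cron_script(command, ruby_version = nil)
    lines = ["#!/usr/bin/env bash"]
    if rvm?
      lines << %(source "#{@env["rvm_path"]}/scripts/rvm")
      lines << "rvm use #{ruby_version}" if ruby_version
    end
    if bundler?
      lines << %(cd "#{File.dirname(gemfile)}")
      lines << "bundle exec #{command}"
    else
      lines << command
    end
    lines.join("\n") + "\n"
  end
end
```

Calling `Environment.new.cron_script("scout [KEY]", "1.9.2")` from inside the app directory would return the script text, which could then be written to a file and made executable with chmod +x.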
Scout’s realtime charts have been a big hit. Once you start using them for major deploys or performance incidents, going back to ten terminal windows running “top” feels like the dark ages.
So, how did we go about it?
To inspire hard work, some young men hang a poster on their wall that includes: (1) an exotic sports car (2) a scantily clad lady and (3) a beach house. My inspirational poster would be much less attractive: a friendly butler who offers time-honored wisdom (with an accent because people with accents are smarter) and absolutely loves running errands for me.
I don’t like running errands because I don’t like waiting in lines. My nightmare: having to pick up groceries during a busy weekend afternoon. There are 3 queues at the grocery store that can cause a delay:
- Finding a parking spot
- Getting a shopping cart
- Checking out
Modern web apps face the same queuing issues serving web requests under heavy traffic. For example, a web request served by Scout passes through several queues:
That’s Apache (for SSL processing) to HAProxy on the load balancer, then Apache to Passenger to the Rails app on a web server.
A request can get stuck in any of those five spots. The worst part about queues? Time in queue is easy to miss. Most of the time, people look at the application log when they suspect a slowdown. However, a slowdown in any of the four earlier queues won’t show up in your application log. Just looking at your application and database activity for slowdowns is like timing your grocery trip from the moment you grab the first item off the shelf until you start waiting to check out: you’re leaving out the time it takes to find a parking spot, get a cart, and check out.
Now, before you start worrying about queues, take a deep breath. First, each of these systems is super reliable. For the most part, they just work. Second, it’s much more likely your application logic is the cause of a performance issue than a queuing problem. Look there first.
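To put rough numbers on that, here is a small sketch that splits one request's time into what the application log would record and what it would miss. The per-stage timings are invented purely for illustration, and the `time_breakdown` helper is hypothetical; only the five stage names come from the request path described above.

```ruby
# Hypothetical per-stage timings (milliseconds) for a single request.
# The numbers are made up for illustration only.
STAGES = [
  ["Apache (SSL, load balancer)", 5],
  ["HAProxy", 120],
  ["Apache (web server)", 3],
  ["Passenger", 40],
  ["Rails app", 80]
].freeze

# Splits the end-to-end time into the part the application log
# records (the logged stage) and the part spent before it.
def time_breakdown(stages, logged_stage = "Rails app")
  total = stages.sum { |_name, ms| ms }
  logged = stages.select { |name, _ms| name == logged_stage }.sum { |_name, ms| ms }
  { total_ms: total, logged_ms: logged, hidden_ms: total - logged }
end

breakdown = time_breakdown(STAGES)
puts "Logged by the app: #{breakdown[:logged_ms]} ms of #{breakdown[:total_ms]} ms"
puts "Invisible in the application log: #{breakdown[:hidden_ms]} ms"
```

With these made-up timings, the application log accounts for 80 ms of a 248 ms request, so about two thirds of the time the user waited never appears there.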
Third (and most importantly), each of these systems handles queues in remarkably similar ways. Understanding some basic queuing concepts will go a long way. Let’s take a look at some basics and then specific examples for Apache, HAProxy, and Passenger.
A big part of providing good support is making it painless. At Scout, Andre and I handle all of the support requests. Once we’ve gathered the account information, it usually doesn’t take much time to help. The problem is quickly putting the account information together. We don’t want to use a dedicated support application, since we usually handle just a couple of support requests per day.
Why not view all of the account information right from Gmail, where the support request originates? We’re using Rapportive with a custom Raplet to make it happen. When we receive an email from a Scout customer, we see their Scout account info.
You maintain a growing Rails application and you’re seeing something peculiar. Sometimes when you use the application, it feels like the performance deteriorates significantly. However, all of your performance data shows no issues – requests in the Rails log file look speedy, CPU utilization is fine, database performance is solid, etc.
At first, you wave it off as a fluke. But then a customer reports the same issue. Now you’re concerned.
~ or ~
Sysadmin Eye for the Dev Guy
Developers! You can churn out a Rails or Sinatra app in no time. What about putting it out there in production? Occasionally forget the syntax for crontab or logrotate? Yeah, me too.
That's why I wrote up a few essential notes for a serviceable production environment.
This article covers CentOS/Red Hat and Ubuntu, which are what I always end up on. My approach is to get some minimal configurations working quickly so I can see some results. From there, I can go back and refine the configurations.