The Linux kernel is an incredible circus performer, carefully juggling many processes and their resource needs to keep your server humming along. The kernel is also all about equity: when there is competition for resources, the kernel tries to distribute those resources fairly.
However, what if you've got an important process that needs priority? What about a low-priority process? Or what about limiting resources for a group of processes?
The kernel can't tell which CPU-intensive processes are important without your help.
Most processes are started at the same priority level and the Linux kernel schedules time for each task evenly on the processor. Have a CPU-intensive process that can be run at a lower priority? Then you need to tell the scheduler about it!
There are at least three ways in which you can control how much CPU time a process gets:
- Use the nice command to manually lower the task's priority.
- Use the cpulimit command to repeatedly pause the process so that it doesn’t exceed a certain limit.
- Use Linux’s built-in control groups, a mechanism which tells the scheduler to limit the amount of resources available to the process.
Let's look at how these work and the pros and cons of each.
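As a minimal sketch of the first option, the helper below starts a command at niceness 19, the lowest priority. The function name run_low_priority is made up for illustration; nice with no arguments simply prints the niceness it is running at.

```shell
# Run any command at the lowest scheduling priority (niceness 19).
# run_low_priority is a hypothetical helper name, not a standard tool.
run_low_priority() {
  nice -n 19 "$@"
}

# `nice` with no arguments prints the current niceness, so this shows
# the priority the child process actually received.
run_low_priority nice
```

Swap the final nice for a CPU-heavy job, for example run_low_priority gzip -9 big.log. For a process that is already running, renice changes its priority in place.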
You try creating a file on a server and see this error message:
No space left on device
...but you've got plenty of space:
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/xvda1      10321208 3159012   6637908  33% /
Who is the invisible monster chewing up all of your space?
Why, the inode monster of course!
What are inodes?
An index node (or inode) contains metadata (file size, file type, etc.) for a file system object (like a file or a directory). There is one inode per file system object.
An inode doesn't store the file contents or the name: it simply points to a specific file or directory.
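One way to see that the name is not part of the inode is to give a file a second name with a hard link: both names report the same inode number. The sketch below assumes a filesystem that supports hard links; inode_of is a hypothetical helper, and df -i reports inode usage the same way plain df reports block usage.

```shell
# Print the inode number of a path (first column of `ls -i`).
# inode_of is a hypothetical helper name used only for this example.
inode_of() {
  ls -di "$1" | awk '{print $1}'
}

tmpdir=$(mktemp -d)
touch "$tmpdir/original"
ln "$tmpdir/original" "$tmpdir/hardlink"   # a second name for the same inode

inode_of "$tmpdir/original"
inode_of "$tmpdir/hardlink"                # same number as the line above

# Inode totals for the filesystem holding tmpdir: when IFree reaches 0,
# new files fail with "No space left on device" even if blocks are free.
df -i "$tmpdir"

rm -rf "$tmpdir"
```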
Your high-powered server is suddenly running dog slow, and you need to remember the troubleshooting steps again. Bookmark this page for a ready reminder the next time you need to diagnose a slow server.
Get on "top" of it
Linux's top command provides a wealth of troubleshooting information, but you have to know what you're looking for. Reference this diagram as you go through the steps below:
We take pride in building a server monitoring product our customers love with a lean, flat team. We're looking to add the fourth human to our close-knit group.
So, what's special about being a Ruby dev @ Scout?
First, great people! Second, great tech: come build beautiful realtime monitoring visualizations in d3 and Ruby. There won't be an LDAP integration in sight, we promise. Third, you'll have a tremendous impact as developer #2.
Beyond the technical chops, the single most important thing is your initiative. Will you dive into a problem unprompted? Point out problems and give suggestions on fixing them? Given a high-level goal, can you break it into actionable chunks, ask for help when you need it, and see everything through to completion? We're a flat organization, and we won't micro-manage your work.
Competitive salary, health care reimbursement, and unlimited vacation time.
We'll consider great remote candidates, but we'd love for you to join us in Fort Collins, Colorado.
A few things about Fort Collins: best place to live (Money Magazine), ranked 3rd on the Best Bicycle Cities list, one of the Ten Best Vacation Cities for Beer Lovers, and 300 days of sunshine! Our office is located minutes from Old Town, the heart of Fort Collins.
How to apply
Email us at firstname.lastname@example.org. Resumes are fine, but a more personal email is better.
A big thanks to Eric Lindvall of Papertrail for adding steal time to Scout's CPU Usage Plugin and helping out on this blog post!
Netflix tracks CPU Steal Time closely. In fact, if steal time exceeds their chosen threshold, they shut down the virtual machine and restart on a different physical server.
If you deploy to a virtualized environment (for example, Amazon EC2), steal time is a metric you'll want to watch. If this number is high, performance can suffer significantly. What is steal time? What causes high steal time? When should you be worried (and what should you do)?
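On Linux, steal time can be read directly from /proc/stat. The sketch below assumes the usual field order on the aggregate cpu line (user, nice, system, idle, iowait, irq, softirq, steal) and compares two samples taken a second apart; steal_pct is a hypothetical helper name, not part of Scout's plugin.

```shell
# Print steal time as a percentage of all CPU time between two samples
# of the aggregate "cpu" line from /proc/stat.
# Assumed fields after the label: user nice system idle iowait irq softirq steal
steal_pct() {
  printf '%s\n%s\n' "$1" "$2" | awk '
    {
      total = 0
      for (i = 2; i <= 9; i++) total += $i   # user through steal
      steal = $9
    }
    NR == 1 { t0 = total; s0 = steal }
    NR == 2 {
      d = total - t0
      pct = (d > 0) ? 100 * (steal - s0) / d : 0
      printf "%.1f\n", pct
    }
  '
}

before=$(grep '^cpu ' /proc/stat)
sleep 1
after=$(grep '^cpu ' /proc/stat)
steal_pct "$before" "$after"
```

The guest columns that follow steal are left out of the total because the kernel already counts guest time inside user time.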
A couple of years ago I visited Argentina. I have trouble enough pronouncing my limited English vocabulary and I don't speak Spanish, but after a bit of time, it was pretty easy to order food, buy groceries, and use a taxi. However, the occasional hangups that happen during my regular life in the States would throw me out of sorts in Spanish: a taxi driver trying to explain he doesn't have enough change would send me off the rails.
Ruby is my English when it comes to writing software, so when I hit hangups installing something Ruby-related, I can usually work my way out of them. Our monitoring agent at Scout is a Ruby gem, and while most of our customers already have Ruby installed, for those who don't, a hangup that seems small to me can be frustrating.
Now, thanks to Omnibus, there's an easy way to distribute your Ruby gems as a standalone, full-stack program. This means folks without Ruby can have as smooth an experience with your hip new gem as a hardened Rubyist.
Here's how I've built a full-stack installer for our scout Ruby Gem.
Back in 2010, we suggested using /bin/bash -l -c to run scout via Cron when using RVM. However, this was a brute approach: /bin/bash -l -c tells bash to behave as a login shell. As Daniel Szmulewicz eloquently stated in the comments for the original blog post, "Cron jobs are by nature non-login, non-interactive processes".
Fast-forward to today: RVM usage is continuing in production, and to make things more complicated, Cron jobs often need to account for both RVM and Bundler. So, what's our preferred approach when running Ruby executables via Cron in an RVM, RVM+Bundler, or Bundler environment? A shell script.
Cron Shell Script: RVM + Bundler
Let's say we want to run a Ruby executable (scout [KEY]) via Cron with (1) Ruby 1.9.2 and (2) my Rails App's Gem bundle:
Make the shell script executable: chmod +x FILE.sh.
Add the Cron job:
* * * * * shell_script.sh
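The steps above (write the script, make it executable, point Cron at it) can be sketched as one helper. This is an illustration, not the script from the original post: write_cron_wrapper is a hypothetical name, and the RVM environment file and app directory are placeholders you would replace with your own paths.

```shell
# Write an executable cron wrapper that loads an RVM environment,
# changes into the app directory, and runs scout through Bundler.
# Arguments: output file, RVM environment file, app directory, Scout key.
# All four values are supplied by the caller; none are discovered here.
write_cron_wrapper() {
  out=$1 rvm_env=$2 app_dir=$3 key=$4
  cat > "$out" <<EOF
#!/usr/bin/env bash
# Cron starts with a nearly empty environment, so load RVM's Ruby first.
source "$rvm_env"
# Run from the app directory so Bundler can find the Gemfile.
cd "$app_dir" || exit 1
bundle exec scout $key
EOF
  chmod +x "$out"
}
```

For example, write_cron_wrapper "$HOME/scout_cron.sh" "$HOME/.rvm/environments/ruby-1.9.2" /path/to/rails/app "[KEY]" produces a script that a Cron entry can call by its absolute path.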
But that's a lot of typing...
It's tempting to use /bin/bash -l -c when you are busy/lazy. To get around this, the scout install [KEY] command will detect if you are using (1) RVM and/or (2) Bundler. If so, we generate the shell script for you and make it executable.
scout install BNrIneEBMwE8h6VlhO4Bw4WmOVSLmnygSFZEPCfi
=== Scout Installation Wizard ===
It looks like you've installed Scout under RVM and/or Bundler.
We've generated a shell script for you.
Run `crontab -e`, pasting the line below into your Crontab file:
* * * * * /Users/dlite/.scout/scout_cron.sh
How do we detect RVM and Bundler? We've encapsulated it into an Environment class:
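As a rough illustration of the kind of check involved (a shell sketch, not Scout's actual Ruby Environment class), both tools leave traces in the environment. The sketch assumes RVM exports MY_RUBY_HOME with a path containing "rvm" and that Bundler exports BUNDLE_GEMFILE when a command runs under bundle exec; uses_rvm and uses_bundler are hypothetical names.

```shell
# Succeeds when the active Ruby appears to live under an RVM directory.
# Assumption: RVM sets MY_RUBY_HOME to a path that contains "rvm".
uses_rvm() {
  case "${MY_RUBY_HOME:-}" in
    *rvm*) return 0 ;;
    *)     return 1 ;;
  esac
}

# Succeeds when the current process was started through Bundler.
# Assumption: Bundler sets BUNDLE_GEMFILE for commands it launches.
uses_bundler() {
  [ -n "${BUNDLE_GEMFILE:-}" ]
}

if uses_rvm || uses_bundler; then
  echo "RVM and/or Bundler detected: generate a wrapper script"
else
  echo "Plain environment: a direct Cron entry is enough"
fi
```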
Scout’s realtime charts have been a big hit. Once you start using them for major deploys or performance incidents, going back to ten terminal windows running “top” feels like the dark ages.
So, how did we go about it?
To inspire hard work, some young men hang a poster on their wall that includes: (1) an exotic sports car (2) a scantily clad lady and (3) a beach house. My inspirational poster would be much less attractive: a friendly butler who offers time-honored wisdom (with an accent because people with accents are smarter) and absolutely loves running errands for me.
I don’t like running errands because I don’t like waiting in lines. My nightmare: having to pick up groceries during a busy weekend afternoon. There are 3 queues at the grocery store that can cause a delay:
- Finding a parking spot
- Getting a shopping cart
- Checking out
Modern web apps face the same queuing issues serving web requests under heavy traffic. For example, a web request served by Scout passes through several queues:
That’s Apache (for SSL processing) to HAProxy on the load balancer, then Apache to Passenger to the Rails app on a web server.
A request can get stuck in any of those five spots. The worst part about queues? Time in queue is easy to miss. Most of the time, people look at the application log when they suspect a slowdown. However, a slowdown in any of the four earlier queues won’t show up in your application log. Just looking at your application and database activity for slowdowns is like recording the time it takes to get your groceries from the time you grab the first item on the shelf till you start waiting to check out: you’re leaving out the time it takes to find a parking spot, get a cart, and check out.
Now, before you start worrying about queues, take a deep breath. First, each of these systems is super reliable. For the most part, they just work. Second, it’s much more likely your application logic is the cause of a performance issue than a queuing problem. Look there first.
Third (and most importantly), each of these systems handles queues in remarkably similar ways. Understanding some basic queuing concepts will go a long way. Let’s take a look at some basics and then specific examples for Apache, HAProxy, and Passenger.
A big part of providing good support is making it painless. At Scout, Andre and I handle all of the support requests. Once we’ve gathered the account information, it usually doesn’t take much time to help. The problem is quickly putting the account information together. We don’t want to use a dedicated support application, since we usually handle just a couple of support requests per day.
Why not view all of the account information right from Gmail, where the support request originates? We’re using Rapportive with a custom Raplet to make it happen. When we receive an email from a Scout customer, we see their Scout account info.