Understanding Disk I/O - when should you be worried?

By Derek | Posted in Development

Our co-author today is Christian Paredes, Senior System Administrator at Blue Box Group, a Ruby on Rails-focused web host that specializes in providing the operations expertise required to keep powerful apps running at peak performance. Christian keeps Blue Box Group’s internal infrastructure in top shape and provides tier 3 customer support. He also volunteers for LOPSA, a guild for system administrators. We’re pleased to have him share some of his expertise on disk I/O.

If you’re old enough to remember floppy drives, you’ve heard the symptoms of a disk I/O bottleneck. For example, while Oregon Trail loaded the next scene, you’d hear the drive grinding away, reading data from the disk. The CPU would sit idle during this time, twiddling its thumbs waiting for data. If that floppy drive were faster, you’d be running the Columbia River rapids by now.

It’s more difficult to detect an I/O bottleneck if the disk isn’t on your desktop. I’ll look at four important disk I/O questions for web apps:

  • Do you have an I/O bottleneck?
  • What impacts I/O performance?
  • What’s the best path to fixing an I/O bottleneck?
  • How do you monitor disk I/O?

A banana slug vs. an F-18 Hornet

Disk I/O encompasses the input/output operations on a physical disk. If you’re reading data from a file on a disk, the processor needs to wait for the file to be read (the same goes for writing).
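On Linux, one rough signal of that waiting is the iowait counter on the aggregate "cpu" line of /proc/stat. The sketch below is only an illustration: the function name iowait_pct is made up, and because the counters are cumulative, the result is an average since boot rather than a live reading.

```shell
# Print the percentage of CPU time spent in iowait, given /proc/stat on stdin.
# Layout of the aggregate line: cpu user nice system idle iowait irq softirq steal ...
# Only user..steal (fields 2-9) are summed; the guest fields that follow
# are already counted inside user and nice.
iowait_pct() {
  awk '$1 == "cpu" {
    n = (NF < 9) ? NF : 9
    total = 0
    for (i = 2; i <= n; i++) total += $i
    if (total > 0) printf "%.1f\n", 100 * $6 / total
  }'
}

# On a Linux host:
#   iowait_pct < /proc/stat
```

To see what is happening right now, you would sample twice and compare the differences instead of the raw totals.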

The killer when working with a disk? Access time. This is the time required for a computer to process a data request from the processor and then retrieve the required data from the storage device. Since hard disks are mechanical, you need to wait for the disk to rotate to the required disk sector.

Disk latency is around 13 ms, but it depends on the quality and rotational speed of the hard drive. RAM latency is around 83 nanoseconds. How big is the difference? If RAM were an F-18 Hornet with a max speed of 1,190 mph (more than 1.5x the speed of sound), disk access speed would be a banana slug with a top speed of 0.007 mph.

This is why caching data in memory is so important for performance – the difference in latency between RAM and a hard drive is enormous.
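As a quick sanity check on the analogy, the two ratios can be worked out with shell integer arithmetic. The numbers are the approximate ones quoted above, and the variable names are just for illustration.

```shell
# Latency figures from the text: ~13 ms for disk, ~83 ns for RAM.
disk_ns=$((13 * 1000 * 1000))   # 13 ms expressed in nanoseconds
ram_ns=83
latency_ratio=$((disk_ns / ram_ns))

# Speeds from the text: 1,190 mph (F-18) and 0.007 mph (banana slug).
# Both are scaled by 1000 so the division stays in integers.
speed_ratio=$((1190 * 1000 / 7))

echo "disk/RAM latency ratio: ${latency_ratio}"
echo "F-18/slug speed ratio:  ${speed_ratio}"
```

Both ratios come out in the neighborhood of 150,000 to 170,000, so the slug-versus-jet picture is in the right ballpark.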

Do you have an I/O bottleneck?



How much slower is Disk vs. RAM latency?

By Derek | Posted in Development

It takes longer to access data stored on your hard disk vs. RAM. But how big is the difference? It’s really big.

How can you tell if your web application is being impacted by slow disk access? What impacts I/O performance? What’s the best path to fixing the bottleneck? How do you monitor it? On Thursday, Christian Paredes of Blue Box Group joins us to talk about disk I/O.

UPDATED: Christian’s article has been published:

Understanding Disk I/O – when should you be worried?

Subscribe to our RSS feed or follow us on Twitter for more.


Sleep Better with a Proper Staging Environment

By Andre | Posted in Business, Development

Nothing helps you sleep better at night like a staging environment that’s faithful to your production setup. That means your staging environment has the same Linux distro, same version of Ruby and gems, the same Apache and Passenger configuration, etc.
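One low-tech way to check that faithfulness is to record a few version strings on each box and diff the results. This is a minimal sketch, not our actual process: the function names, the manifest format, and the choice of tools to record are all assumptions you would adapt to your own stack.

```shell
# Write a small version manifest for the current host to the file in $1.
# The tools listed are examples; add whatever your production stack uses.
write_manifest() {
  {
    echo "os: $(uname -sr)"
    echo "ruby: $(ruby --version 2>/dev/null || echo 'not installed')"
    echo "gem: $(gem --version 2>/dev/null || echo 'not installed')"
  } > "$1"
}

# Print the lines that differ between two manifests.
# Exit status is 0 when they match and 1 when they have drifted.
manifest_drift() {
  diff "$1" "$2"
}
```

Run write_manifest on each box, copy the two files to one place, and run manifest_drift on them. No output means the recorded versions match.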

VPS not cloud

We’ve found that an inexpensive “always-on” VPS instance is better as a staging environment than a cloud instance we have to spin up and down. Why? Spinning up a cloud instance takes time. We’re more likely to actually use our staging environment if it’s as low-friction as possible to do so.

A staging environment isn’t free—you’ll spend money on the VPS, and you’ll spend time configuring and maintaining it. However, the peace of mind you’ll get is a great return on investment.

Setting up your staging environment

If setting up your staging environment is difficult, you have something to work on: a repeatable process for configuring production-like boxes. Remember, your staging environment should mimic your production environment as closely as possible. If you have a scripted process for setting up production boxes, then setting up your staging environment will be trivial.

If you’re like many organizations, however, there is no authoritative definition for production. Instead, it has evolved over time with manual tweaks and optimizations. In that case, the staging environment is a perfect opportunity to pull together a repeatable script. It doesn’t have to be automated (ours is not)—but it does need to be written down.

Staging deployments with Capistrano

We Rubyists are lucky—there are tools for just about everything. We use capistrano multistage for staging deployments. It’s straightforward to set up, and makes staging deployments completely frictionless.

You should end up with a “staging” file in your config/deploy directory, but not in your config/environments directory. You’ll use your production environment for staging.
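If you want to double-check that layout, a small shell function can do it. This is a sketch under assumptions: the .rb file names follow the usual config/deploy/<stage>.rb convention, and check_staging_layout is a made-up name.

```shell
# Check a Rails app directory (default: current directory) for the layout
# described above: a staging deploy file, but no staging Rails environment.
check_staging_layout() (
  app="${1:-.}"
  status=0
  if [ ! -f "$app/config/deploy/staging.rb" ]; then
    echo "missing: config/deploy/staging.rb"
    status=1
  fi
  if [ -e "$app/config/environments/staging.rb" ]; then
    echo "unexpected: config/environments/staging.rb"
    status=1
  fi
  exit "$status"
)
```

The body runs in a subshell (note the parentheses), so the exit only ends the function, not your terminal session.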

The unsolved staging problem: production-like load

The harder part is simulating production-like traffic on your staging server. In a perfect world, you would have a holodeck for deployments. We don’t have a solution for this yet, so ideas are welcome!

Previously in Developer Happiness

This is Part 4 in our Developer Happiness series. See previous articles:



The 11" MacBook Air: like a good Linux tool

By Derek | Posted in Development

I’ve been tuning Scout’s Apache setup lately. To start, I looked at the output of:

apachectl status

…but this only provides an instant snapshot. I wanted to watch the results over a longer period of time:

watch --interval=1 apachectl status

…but this generates a lot of output and I was only concerned with the number of idle workers. I wanted to make sure there were enough around during peak periods:

watch --interval=1 "apachectl status | grep 'idle'"

Perfect. One line of output and just the bits I cared about:

515 requests currently being processed, 143 idle workers

That’s why I love the Linux toolset: 4 single-purpose commands designed to work together.
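In the same spirit, the one-line summary is easy to feed into yet another small tool. Here is a rough sketch that pulls out the idle count and complains when it drops too low. The function names and the default threshold of 10 are arbitrary choices for illustration.

```shell
# Read an "apachectl status" summary line on stdin and print the idle count.
idle_workers() {
  sed -n 's/.*, \([0-9][0-9]*\) idle workers.*/\1/p' | head -n 1
}

# Warn when the idle worker count on stdin is below a minimum (default 10).
check_idle() (
  min="${1:-10}"
  idle=$(idle_workers)
  [ -n "$idle" ] || { echo "no idle worker count found" >&2; exit 2; }
  if [ "$idle" -lt "$min" ]; then
    echo "WARNING: only $idle idle workers (minimum $min)"
    exit 1
  fi
  echo "OK: $idle idle workers"
)
```

Something like `apachectl status | grep 'idle' | check_idle 20` would then give a one-word verdict instead of a number to eyeball.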

My tools outside the terminal have come to resemble those within it. My latest update was a switch to an 11" MacBook Air as my development computer. The Air has limitations (a small screen, weak speakers, etc.) but it’s perfect for the core work I do every day. Apps and files open quickly. Search is fast. At 2 lbs, there’s no burden carrying it around.

It’s easy to combine the Air with other tools when I need it to do more: an external monitor for a larger display or an Airport connected to my stereo for great sound.

Linux’s tools are usable by themselves but extendable – the 11" MacBook Air is in the same vein.



Relentlessly Shortcut: .bashrc & Thor

By Andre | Posted in Development

Check out the incredible shortcut Lance Armstrong takes in the above clip.

As developers, we should try to shortcut as smoothly as Lance does. You might not get cheered on quite as much—but then again, you have a lot more shortcut opportunities!

Shortcuts and Development Workflow

The quicker I can go from intent to action, the happier I am with my development workflow. Below are two tools I rely on to build shortcuts as effortlessly as possible.

An Organic, Evolving .bashrc

The best general-purpose shortcut mechanism is aliases in your .bashrc. I have one- and two-letter aliases for all my common working directories, git commands, server startups, etc.

If you want to relentlessly shortcut, you need a shortcut for creating shortcuts:

alias brc='vi ~/.bashrc;. ~/.bashrc'

All this does is load up .bashrc, and re-source it when I exit out of vi. This one command has turned my .bashrc into an organic, evolving toolbox, making whatever I’m working on easier, faster, and more fun.
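If you would rather not open an editor at all, a variation on the same idea is a function that appends an alias for you. This is just a sketch: add_alias is a made-up name, and it assumes the command you pass in contains no single quotes.

```shell
# Append an alias definition to an rc file (default: ~/.bashrc).
# Limitation: the command must not itself contain single quotes.
add_alias() (
  name="$1"
  cmd="$2"
  rc="${3:-$HOME/.bashrc}"
  printf "alias %s='%s'\n" "$name" "$cmd" >> "$rc"
)

# Example:
#   add_alias gs 'git status' && . ~/.bashrc
```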


For more involved scripting, I’ve recently become a fan of Thor. Thor is everything you like about Rake combined with everything you used to like about Sake:

  • A central place for your ad-hoc scripts
  • Usable system-wide
  • Write your own or install from a remote repository
  • Low barrier to rolling your own
  • Simple options parsing

Here is the hello world of Thor, and here is a more advanced article to get the juices flowing.

Previously in Developer Happiness

This is Part 3 in our Developer Happiness series. See previous articles:



CouchDB in production

By Derek | Posted in Development


John P. Wood of Signal, which offers a mobile customer engagement platform used by many top brands, recently created a couple of Scout Plugins for monitoring CouchDB. I’ve always been impressed by the team at Signal, so I was curious how they were using CouchDB in production. It turns out CouchDB is a huge part of their infrastructure – for example, one of their CouchDB databases is over 130GB in size.

John was kind enough to share his experiences with CouchDB below.

You use a number of different storage engines (MySQL, CouchDB, MongoDB, and Memcached) at Signal. Where does CouchDB fit in?

A couple of years ago we were running into performance issues with some very large MySQL tables. Queries against these tables were taking a very long time to run, and were causing page timeouts in our web application. At the advice of a friend who was helping us out as a consultant, we started looking at CouchDB. CouchDB views turned out to be a great fit for our problem.

A key component of our application is SMS messaging. The problematic MySQL queries we were running were collecting aggregate stats on these messages (how many messages did account A send in January of 2009, all of 2009, how many for all accounts, etc). Most of the queries were executing on past data, meaning the results of those queries would not change once that time period had passed. So, it was simply a waste to re-calculate these numbers over and over. We considered using summary tables in MySQL to avoid this costly re-calculation, but saw them as being inflexible and difficult to maintain.
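To make the shape of that aggregate concrete, here is a small shell sketch of a per-account, per-month message count. It is not CouchDB code and not Signal's implementation. The input format (one "account,YYYY-MM-DD" line per message) and the function name monthly_counts are invented for illustration.

```shell
# Input on stdin: one line per message, "account,YYYY-MM-DD".
# Output: "account YYYY-MM count", one line per account and month, sorted.
monthly_counts() {
  awk -F, '{ key = $1 " " substr($2, 1, 7); n[key]++ }
           END { for (k in n) print k, n[k] }' | LC_ALL=C sort
}
```

Once a month is over, its line in the output never changes, which is why recomputing it on every page load is wasted work.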


