The winding path to server roles

May 29 | By Derek | Posted in Development

roles timeline

We're overjoyed with the reaction to server roles, our new feature that makes monitoring many servers as easy as monitoring a few. The end result hits our favorite sweet spot: it turns something that used to be painful into something fun.

Server Roles was the biggest release since the launch of Scout, and the road to the release was anything but smooth. It's a story of fast-changing deployment environments, tangents, a failed experiment, listening, first-hand experience, and finally, something we were happy with.

Here's the story of Scout's evolution to roles.

Oct 2007: Before AWS

first account

Scout started as an internal tool at Highgroove Studios (now Big Nerd Ranch) in 2007, roughly one year before AWS exited beta status. For you young chicks out there, this was a time when you couldn't click a button to provision a server.

Since it wasn't as easy to provision servers, there was less churn in the size of environments. When you wanted to monitor a new server in Scout, you'd create it in our UI and then use the provided key locally in your Crontab entry. The manual step of copying the key to the server didn't feel tedious (and was way easier than configuring Nagios, Munin, etc.) since our customers weren't provisioning servers frequently.

Oct 2008: AWS Exits Beta

Using a unique key per monitored server didn't align well with EC2. As our customers encountered this, they initially requested that we provide an API method to provision a new server and its associated key. This would be run as part of the instance provisioning process.

However, we weren't big fans of this: it's vital that monitoring works. If the script to initiate a Scout server failed during the startup sequence, the server wouldn't be monitored. Everything from a temporary network hiccup to a missing gem dependency could cause monitoring to fail.

Jan 2009: The emergence of Chef and Configuration Management Tools

The release of Chef in 2009 rapidly accelerated the configuration management world for our customers. While the overall adoption of Puppet and Chef appears to be fairly equal, we received far more inquiries about configuring Scout with Chef.

Sep 2009: Cloud Images

cloud image

EC2 made it much easier to deploy more servers and create redundant setups. This meant our customers began to have groups of similar servers (web servers, database servers, etc.) that had the same monitoring configuration.

What if you could re-use an existing server's key in Scout on new servers? Then, those new servers would use all of the same settings as the original server. This was also a much better approach than creating servers via the Scout API: the agent runs from a Crontab entry every minute, so if for some reason the first Scout run fails, no big deal: it'll keep trying.

We launched cloud monitoring in September 2009.

Consistency

A problem soon emerged: when a new server reported with a cloud key, we copied the plugins from the original server, but we didn't keep things consistent (ex: adding a new plugin to the original server didn't add it to the other servers).

Why?

  • Customers would often use the cloud key as a base template - it monitored key metrics, then they would add additional monitoring plugins as they saw fit.
  • Inevitably, some servers end up needing slight modifications from their base image (ex: testing out Memcached on a single app server).

In short, taking the step of making everything consistent would be a big jump. We needed more time to see how cloud images were used.
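
The copy-once behavior described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Scout's actual implementation: the `Account` class, its method names, and the plugin names are all hypothetical.

```python
class Account:
    """Hypothetical in-memory model of cloud keys (not Scout's real code)."""

    def __init__(self):
        self.servers = {}           # hostname -> list of plugin names
        self.original_for_key = {}  # cloud key -> hostname of the original server

    def create_server(self, hostname, key, plugins):
        """Create the original server that owns a key."""
        self.servers[hostname] = list(plugins)
        self.original_for_key[key] = hostname

    def report(self, hostname, key):
        """Called whenever an agent checks in with a key."""
        if hostname not in self.servers:
            original = self.original_for_key[key]
            # Plugins are copied once, on the first report only.
            self.servers[hostname] = list(self.servers[original])
        return self.servers[hostname]

    def add_plugin(self, hostname, plugin):
        """Add a plugin to one server; nothing is propagated to the others."""
        self.servers[hostname].append(plugin)


account = Account()
account.create_server("web1", "CLOUD-KEY", ["load", "disk"])
account.report("web2", "CLOUD-KEY")      # web2 starts as a copy of web1
account.add_plugin("web1", "memcached")  # later change to the original
account.report("web2", "CLOUD-KEY")      # web2 does not pick up the new plugin
```

In this sketch, each server owns an independent copy of the plugin list after its first report, which is what allows both the one-off tweaks and the drift.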

Jan 2010: A Tangent - Copy and Paste

We punted on the consistency issue, but we introduced a tool to make it easy to copy one server's settings to another: Copy and Paste launched in January 2010. It allowed you to copy a plugin (and all of its settings) to any number of servers. It made it easier to keep things consistent, but required a manual step.

Sep 2011: Failed Attempt - Synced Groups

Our first attempt at making things consistent? Synced groups: define a group of servers (ex: web servers) and pick one server to act as the template server. All other servers in the group would use that server's configuration.

We started working on this in the fall of 2011. However, issues soon emerged:

  • Too ambitious - We envisioned representing these synced groups differently in the UI. In addition to the standard server views, there would be additional views to represent the aggregate synced group. We've found UI issues take the longest to roll out - it takes a while to get the experience right.
  • Overlapping terms - We already had groups - so we thought about calling these "synced groups" or "clusters". We were hesitant to add more Scout-specific terminology. We like clean, easy-to-use tools and adding more terms heads in the opposite direction.
  • Servers that should belong to multiple groups - For example, a single app server may also perform some utility functions.
  • One-off changes - For example, testing Memcached on a single app server.

Apr 2013: Roles - A common language

roles

We started using configuration management tools instead of hand-rolled scripts to deploy servers at Scout in the summer of 2011. As our first-hand experience with configuration management tools increased, so did our ability to make editorial decisions on how syncing should be done.

Our first commit to roles was in late September 2012. We put our first customer on roles in November 2012 and opened it up to everyone at the beginning of April 2013.

Apr 2013: Chef Recipe + Install Instructions

We wanted to make it super-easy to use Chef with Scout, so inspired by Boundary, we added Chef-specific instructions for Scout when adding a new server.

The original Scout Chef recipe was written by Drew Blas. We forked it on GitHub and updated it with role support.

Tl;dr

  • A couple of years ago, we realized that keeping monitoring settings in sync across servers would be a big win.
  • We didn't want to make the customer experience complicated when syncing was enabled.
  • It took (1) our own experience with configuration management tools and (2) regular customer feedback to make something easy-to-use yet powerful.
