Last week, one of our application servers died. We have four app servers, so in theory, the death of one app server shouldn't bring the entire platoon down. However, real life had other plans: 95% of requests were handled fine, but around 5% were being dropped. Here's the story of how we diagnosed and fixed the issue with our realtime charts.
I’m sitting in the Denver Airport – in a couple of minutes, I’ll board the plane to RailsConf in Portland, Oregon. I’m already getting amped for Voodoo Doughnut, Stumptown Coffee, well-trimmed beards, and of course, lots of Rails-related chats.
I’m bringing a fresh load of Scout T-Shirts. These aren’t your normal heavy-weight, poor-fitting shirts. They are tastefully designed, American Apparel – Tri-Blend (otherwise known as the most comfortable shirt you’ll own). If you’re attending RailsConf, shoot us an email so we can meet up and improve your wardrobe at the same time. Or, just look for us (Andre and Derek). We don’t always rest our arms on each other, but when we do, we look like this:
It's been three weeks since the launch of the largest feature enhancement in Scout's existence: roles. Haven't heard of roles? Nutshell: roles let you monitor many servers with fewer clicks and more joy. Roles were driven by your feedback and it's showing in the fast adoption numbers below.
Time to give an awkward nerd high-five of thanks:
- Customers on our Roles BETA program - your feedback and willingness to try new things helped us iron out the edges for the public rollout.
- Contributors to our Chef recipe - we've already had six authors commit to our Chef recipe for deploying Scout. It's great to see a hardened Chef recipe based on real-world usage.
- Feedback since the launch - we built roles because of your feedback, and we've enjoyed reading your suggestions post-launch.
Haven't tried roles yet? To get started, see the "Roles" dropdown on your account, and read the FAQ on roles.
Roles are a new feature available immediately for all new and existing accounts.
You have a carefully thought out architecture. You frequently add new servers as your business grows. In fact, scaling up is part of business as usual. Monitoring should scale easily with you -- that's why we're introducing roles.
Roles make it easy to set up plugins and triggers across many servers. Instead of individually configuring servers, configure roles. Then, apply roles to your servers through our UI, the command line, or your configuration management tool.
Some examples of how roles will make your life easier:
- Updating a trigger on 50 app servers
- Adding a Memory Usage plugin on 100 memcached servers
- Updating a plugin to a new version across all 10 MongoDB servers
Roles are available now on your account. Look for the new "Roles" item on the top navigation. If you previously had servers organized by groups, your groups have been upgraded to roles. See documentation here on creating roles and organizing your existing plugins into roles.
Your account now has a single, account-wide key -- use it for any new servers you add. Your existing keys will continue to work, so you don't have to touch any servers you're currently monitoring.
Most setups have a limited number of server configurations (app, db, utility, for example), and several servers of each configuration. When you add another app server, it probably needs the same monitoring template as your existing app servers. Adding more servers using existing templates is the scenario we wanted to make dead simple in Scout.
There's no need to stick to one template at a time: servers can have any number of roles in Scout, so feel free to mix and combine roles as needed to reflect the functionality of your servers. Is one of your HAProxy boxes also running memcached? No need to create a brand new role; just apply two of the roles you already have.
Once defined, roles are "active": if you update a role (say by adding a plugin or a trigger), all the servers in that role are automatically updated to reflect the changes. It's a much easier way to manage your monitoring configuration.
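To make the "active" behavior concrete, here is a minimal sketch of the idea in Python. It is a toy model, not Scout's actual schema or code: the `Role` and `Server` classes and the plugin names are assumptions chosen for illustration. A server never stores its own copy of a role's plugins; it looks them up through its roles each time, so one edit to a role reaches every server that has it.

```python
from dataclasses import dataclass, field


@dataclass
class Role:
    """A named bundle of plugins and triggers (hypothetical model)."""
    name: str
    plugins: set = field(default_factory=set)
    triggers: set = field(default_factory=set)


@dataclass
class Server:
    """A monitored server that can belong to any number of roles."""
    hostname: str
    roles: list = field(default_factory=list)

    def effective_plugins(self):
        # Computed from the roles on every call, so role edits show up
        # immediately without touching the server itself.
        result = set()
        for role in self.roles:
            result |= role.plugins
        return result


haproxy = Role("haproxy", plugins={"HAProxy Stats"})
memcached = Role("memcached", plugins={"Memcached Stats"})

# One box running both HAProxy and memcached: apply both existing roles.
lb1 = Server("lb1", roles=[haproxy, memcached])
cache1 = Server("cache1", roles=[memcached])

# Updating the role once changes every server that has it.
memcached.plugins.add("Memory Usage")
```

After the last line, both `lb1.effective_plugins()` and `cache1.effective_plugins()` include "Memory Usage", even though neither server was edited directly.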
Best friends with Chef
We provide an official Chef Recipe designed to work with roles.
To simplify deployment, Scout now provides a single, account-wide key you can use on all your servers.
Even if you're not using Chef, you can (optionally) specify roles directly through the Scout executable:
scout -rdb,app to assign the db and app roles, for example. This makes role assignments highly scriptable, whether you're using Chef, Puppet, or Moonshine.
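If you generate crontab entries from a script, a small helper like the one below is one way to do it. This is a sketch under assumptions: the `-r` form comes from the example above, but the executable path, the once-a-minute schedule, and placing the account key before the roles flag are guesses for illustration. Check them against your own Scout install before relying on them.

```python
import shlex


def scout_cron_line(account_key, roles=(), scout_path="/usr/bin/scout"):
    """Build a crontab entry that runs the Scout agent with optional roles.

    Assumptions (not confirmed by this post): the agent lives at
    `scout_path`, runs every minute, and takes the account key as its
    first argument.
    """
    command = [scout_path, account_key]
    if roles:
        # Same shape as the "scout -rdb,app" example: -r followed by a
        # comma-separated list of role names.
        command.append("-r" + ",".join(roles))
    # Quote each part so an unusual role name can't break the shell line.
    return "* * * * * " + " ".join(shlex.quote(part) for part in command)


# Placeholder 40-character key, for illustration only.
EXAMPLE_KEY = "0123456789abcdef0123456789abcdef01234567"
print(scout_cron_line(EXAMPLE_KEY, roles=["db", "app"]))
```

Because the account key is the same on every server, only the role list changes from one box to the next.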
Fine-tuned for large environments
With the recent notification group changes and now roles, we're making monitoring easier for large environments. Our previous tools -- cloud keys and plugin copy-paste -- were useful, but it was easy for things to get out of sync. Roles are our answer for keeping monitoring in sync in large environments.
Roles in Summary
With Roles, we want to make deploying and scaling Scout in large environments as easy as possible:
- Roles are "active": updating a role's triggers or plugins propagates to all the role's servers.
- Better than templates: servers can belong to more than one role.
- Account-wide keys: no need to provision keys for new servers - reuse the same 40-character account key in the crontab across all your servers.
- Specify roles via the crontab: optionally, you can pass a command-line argument to the Scout agent to specify the roles it should belong to.
- Friendly with Chef: we also provide an official Chef recipe for roles-enabled server configuration.
To get started, see the "Roles" dropdown on your account, and read the FAQ on roles here.
Whenever we’re asked how to make on-call notification schedules for Scout alerts, we recommend PagerDuty. PagerDuty has invested a ton of time in building a dedicated notification scheduling service, and it’s a great complement to Scout.
With our recent release of notification groups, Scout’s integration with PagerDuty got even more powerful:
- Multiple PagerDuty services: add as many PagerDuty services to Scout as necessary.
- Trigger-specific escalation policies: assign any PagerDuty escalation policy to any threshold in Scout. If you need to create multiple thresholds on a given metric with different escalation policies, it’s simple to do – just add another trigger.
- Automatic incident resolution from Scout: since all integrations are routed through PagerDuty’s API, Scout now auto-resolves any PagerDuty incidents when Scout’s trigger stops firing.
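For the curious, auto-resolution can be pictured as sending a matching "resolve" event for the incident that was opened earlier. The sketch below is not Scout's integration code: it builds events using the field names from PagerDuty's Events API v2 (`routing_key`, `event_action`, `dedup_key`, `payload`), and the `scout-trigger-<id>` key format, the `source` value, and the severity are assumptions made for illustration. It only builds the event bodies and sends nothing.

```python
def pagerduty_event(routing_key, trigger_id, firing, summary="", source="scout"):
    """Build an Events API v2 style body for a trigger that fires or clears.

    The dedup_key ties the two events together: sending "resolve" with
    the same key closes the incident opened by the earlier "trigger".
    """
    event = {
        "routing_key": routing_key,  # integration key of the PagerDuty service
        "event_action": "trigger" if firing else "resolve",
        "dedup_key": f"scout-trigger-{trigger_id}",  # hypothetical key format
    }
    if firing:
        # A payload is only needed when opening an incident.
        event["payload"] = {
            "summary": summary,
            "source": source,
            "severity": "critical",
        }
    return event


# Placeholder integration key, for illustration only.
KEY = "R" * 32
opened = pagerduty_event(KEY, 42, firing=True, summary="CPU load above threshold on app1")
closed = pagerduty_event(KEY, 42, firing=False)
```

Since each trigger can map to its own PagerDuty service, the `routing_key` is what would differ between escalation policies in a setup like this.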
Multiple services in PagerDuty:
... and those same services integrated into Scout:
Adding PagerDuty services within Scout
You need to start in Scout to create a PagerDuty integration:
- Click on Notifications (in the top navigation bar).
- Click on “Add PagerDuty Integration.”
- You’ll be given the option to create a new PagerDuty service, or connect to an existing service within your account.
To assign a PagerDuty service to a trigger, ensure the PagerDuty integration is part of a notification group (the notification group can contain other items too, if needed), then assign that notification group to a trigger.
Have a useful plugin sitting around? Share it! Send a pull request to our scout-plugins repository on GitHub.