Coming Soon: your Rails app performance trends & outliers, via email

I follow a simple rule before configuring a monitoring alert: if I receive this alert at 3am, will I act on it?

If not, it shouldn't be an alert.

Few performance-related alerts meet this criterion. For example, if our app is running 25% slower, it's not worth a hasty 3am fix, but it is worth a first-thing-in-the-morning effort.

That's the motivation behind a feature we'll make available soon: the Digest Email. Available in daily or weekly editions, the Digest Email summarizes your Rails app's performance and points you to bottlenecks:

[Image: Digest Email]

How It Works

At a frequency of your choice (daily or weekly), we'll crunch the numbers on your app's performance (both web endpoints and background jobs). Performance is compared to the previous week, and highlights are mentioned in the email.
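The week-over-week comparison can be pictured with a small sketch. This is an illustration only, not Scout's implementation: the endpoint names, the numbers, and the `percent_change` helper are all hypothetical.

```ruby
# Hypothetical weekly summaries: endpoint or job name => mean response time in ms.
previous_week = { "UsersController#index" => 120.0, "ReportsJob" => 900.0 }
current_week  = { "UsersController#index" => 150.0, "ReportsJob" => 880.0 }

# Percent change relative to the previous week.
# Returns nil when there is no usable baseline to compare against.
def percent_change(previous, current)
  return nil if previous.nil? || previous.zero?
  ((current - previous) / previous * 100).round(1)
end

# Build a name => percent change map for everything seen this week.
changes = current_week.to_h do |name, mean_ms|
  [name, percent_change(previous_week[name], mean_ms)]
end

changes.each { |name, change| puts "#{name}: #{change.inspect}%" }
```

A positive value means the endpoint got slower than the previous week; a negative value means it got faster.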

To start, there are three specific areas we're focusing on.

1. Trends

It's easy to just grab the endpoints with the largest change in mean response time between today and last week. However, that adds significant noise: a rarely used endpoint, like UsersController#forgot_password, may vary widely in response time. Is it worth a developer's time to investigate if response times are bouncing between 100 ms and 500 ms on a handful of requests? Frequently, the answer is no.
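One simple way to cut that kind of noise is to require both a minimum request volume and a minimum change before calling something a trend. The sketch below is an assumption for illustration, not Scout's actual algorithm; the thresholds, the hash keys, and the `significant_trend?` helper are hypothetical.

```ruby
# Returns true only when an endpoint is busy enough to matter and its mean
# response time moved by a meaningful amount, both in absolute ms and relative
# to last week. All thresholds are hypothetical defaults.
def significant_trend?(stats, min_requests: 1_000, min_change_ms: 50.0, min_change_ratio: 0.20)
  return false if stats[:requests] < min_requests

  delta = (stats[:current_mean_ms] - stats[:previous_mean_ms]).abs
  delta >= min_change_ms && delta / stats[:previous_mean_ms] >= min_change_ratio
end

endpoints = [
  # Rarely used: a big swing, but only a dozen requests behind it.
  { name: "UsersController#forgot_password", requests: 12,
    previous_mean_ms: 100.0, current_mean_ms: 500.0 },
  # Heavily used: a smaller swing that affects many requests.
  { name: "OrdersController#index", requests: 48_000,
    previous_mean_ms: 200.0, current_mean_ms: 260.0 },
]

trending = endpoints.select { |e| significant_trend?(e) }.map { |e| e[:name] }
puts trending.inspect
```

With these made-up numbers only the busy endpoint is reported, even though the rarely used one had the larger swing.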

Scout works hard to identify significant trends. Some of the approaches our algorithms apply:

To make tracking down the source of trends easier:

2. Slow Outliers

What if an endpoint is fine for 90% of users, but becomes extremely slow for a small subset? The small percentage of users experiencing performance problems are frequently high-paying power users who push your app the hardest. For example, a controller action that renders all employees of a company will load quickly for a startup, while that same endpoint would fall over if the company were Apple.
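A rough way to spot this pattern is to compare a high percentile against the median, since the mean alone can hide a slow tail. This is a minimal sketch under stated assumptions, not how Scout detects outliers: the nearest-rank `percentile` helper, the `slow_outliers?` name, and the 10x ratio are all hypothetical choices.

```ruby
# Nearest-rank percentile over a non-empty list of response times (ms).
def percentile(values, pct)
  sorted = values.sort
  rank = (pct * sorted.length / 100.0).ceil
  sorted[[rank, 1].max - 1]
end

# Flags an endpoint when its 99th percentile is at least `ratio` times the median.
def slow_outliers?(response_times_ms, ratio: 10.0)
  percentile(response_times_ms, 99) >= ratio * percentile(response_times_ms, 50)
end

# Made-up sample: 95 fast requests plus 5 very slow ones.
times = Array.new(95, 100.0) + Array.new(5, 8_000.0)

puts "median: #{percentile(times, 50)} ms"
puts "p99:    #{percentile(times, 99)} ms"
puts "slow outliers? #{slow_outliers?(times)}"
```

In this sample the median stays at 100 ms while the 99th percentile is 8,000 ms, so the endpoint is flagged even though most requests look healthy.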

Additionally, these very slow outliers can trigger frustrating capacity problems and, in a worst-case scenario, momentary downtime. It's far more difficult to determine the capacity you need to serve your app when response times vary widely (Little's Law describes averages, so an estimate based on the mean response time says little about the capacity needed when a burst of slow requests arrives).
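For reference, Little's Law relates average concurrency to throughput and average response time: concurrency = throughput x mean response time. The sketch below works through that arithmetic with made-up numbers to show how a small share of slow requests shifts the average; the `average_concurrency` helper and all figures are hypothetical.

```ruby
# Little's Law: average requests in flight = arrival rate * mean time in system.
def average_concurrency(requests_per_second, mean_response_seconds)
  requests_per_second * mean_response_seconds
end

throughput = 50.0 # requests per second (made-up)

# Every request takes 100 ms.
typical_mean = 0.1

# 95% of requests take 100 ms, 5% take 8 seconds.
mixed_mean = 0.95 * 0.1 + 0.05 * 8.0

puts "all fast:      #{average_concurrency(throughput, typical_mean).round(2)} in flight"
puts "with outliers: #{average_concurrency(throughput, mixed_mean).round(2)} in flight"
```

With these numbers, 5% of requests being slow raises the average number of in-flight requests roughly fivefold, and that average still understates the moments when several slow requests overlap.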

We highlight endpoints that are triggering these slow outliers, but that's not all. We also identify any significant bottlenecks (example: a slow ActiveRecord query).

Bonus: if you've set up our GitHub integration, you'll see who last touched any expensive code paths.

3. The email subject

Our subject line is dynamic, changing with your aggregate app performance. Here's an example:

[Image: subject with change]

If performance isn't changing, it's important to know that too:

[Image: subject no change]

Also, we display a friendly emoticon when things are going well:

[Image: subject emoticon]

It's a nice, friendly reward.

The goal: if things haven't changed, there's no need to open the email. If we think there's something worth investigating, we'll draw your attention.
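The idea of a subject line that changes with aggregate performance could be sketched as follows. The wording of the subjects, the 5% threshold, and the `digest_subject` helper are assumptions for illustration and not the actual subject lines Scout sends.

```ruby
# Builds a subject line from the aggregate week-over-week change in response
# time (positive = slower, negative = faster). Threshold is hypothetical.
def digest_subject(app_name, percent_change, threshold: 5.0)
  if percent_change.abs < threshold
    # Small movements are treated as "no change" so the email can be skipped.
    "#{app_name}: performance unchanged"
  elsif percent_change.negative?
    # Things improved: add a friendly emoticon as a reward.
    "#{app_name}: #{percent_change.abs.round}% faster :)"
  else
    "#{app_name}: #{percent_change.round}% slower"
  end
end

puts digest_subject("MyApp", 25.0)
puts digest_subject("MyApp", 1.2)
puts digest_subject("MyApp", -12.4)
```

Putting the summary in the subject means a reader can decide whether to open the email without reading the body.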

Early Access

We're limiting the number of recipients as we tune our algorithms based on your feedback. Enable the Digest Email in your user settings to ensure you'll be in our first access group.

TL;DR

Most app performance issues don't warrant immediate, one-off alerts, but they do warrant a holistic per-day or per-week review.

The Scout Digest Email aims to address this while identifying the source of issues.