During a team camp among the lofty peaks of Breckenridge, Colorado, we talked a lot about the future of Scout and monitoring in general. Big mountains and nature have a way of doing that.
One thing that was getting our nerd juices flowing: Go.
At Monitorama in May, it was clear that Go was becoming the language of choice for performant yet fun-to-develop daemons.
After our morning hike fueled us with crisp mountain air, we said: why not build a light Scout daemon in Go? As in, right this afternoon?
Kevin Lawver, President @ Rails Machine, is our guest author for this post.
Few things feel worse than rolling out a High Availability (HA) system, then regularly seeing that system collapse. For our team at Rails Machine, that failing HA system was MySQL Multi-Master Replication Manager (MMM).
We've been searching for an MMM replacement for a while, and a few months ago, we made the switch to MariaDB + Galera Cluster for High Availability MySQL.
What's wrong with MySQL MMM? What's special about Galera Cluster? Read on!
I've been hearing how Docker is the new awesome, but it didn't click for me until I dug in with a practical question: if we deployed Scout via Docker, would deployment be a more pleasurable experience?
My three takeaways are below.
We're overjoyed with the reaction to server roles, our new feature that makes monitoring many servers as easy as monitoring a few. The end result hits our favorite sweet spot: it makes something that used to be painful into something fun.
Server Roles was the biggest release since the launch of Scout, and the path to it was anything but smooth. It's a story of fast-changing deployment environments, tangents, a failed experiment, listening, first-hand experience, and finally, something we were happy with.
Here's the story of Scout's evolution to roles.
Oct 2007: Before AWS
Scout started as an internal tool at Highgroove Studios (now Big Nerd Ranch) in 2007, roughly one year before AWS exited beta status. For you young chicks out there, this was a time when you couldn't click a button to provision a server.
Since it wasn't as easy to provision servers, there was less churn in the size of environments. When you wanted to monitor a new server in Scout, you'd create it in our UI and then use the provided key locally in your crontab entry. The manual step of copying the key to the server didn't feel tedious (and was way easier than configuring Nagios, Munin, etc.) since our customers weren't provisioning servers frequently.
We recently decided it was time for a major update to the public side of Scout. We’d start with a more polished homepage. Since we’re both developers, the obvious next step seemed like hiring a designer. However, working with an outside designer isn’t a hire-and-forget experience:
- Good designers are difficult to find. Design doesn’t scale like a product business.
- Good designers are busy. It could take 30-60 days to start work, then another 30 days for it to come together. This means we could be looking at a 90-day timeline. We wanted to launch faster.
Instead of starting work with a designer on a blank slate, we decided to start firming up what we wanted the homepage to look like. We’d end up with one of the following outcomes:
- We’re terrible at design, but we’ve at least thought it through. Hire a designer.
- We can get 80% of the way there, but we’ll need a designer for touchups.
- If we iterate enough, we can launch something we’ll be happy with.
Startup Lessons from CCP - EVE Online Style:
Another great example was the formation of “Team Best Friends Forever” (“BFF” is an inside EVE joke). This team is a group of CCP developers whose sole mission is not to work on major features and improvements, but rather to fix all those annoying “little things” that bother their customers. Too many times, product managers and development teams are focused on the big-ticket items – and that’s fine, but TBFF is a great approach that again proves that CCP listens to their customers.
Everyone has their breaking point when it comes to deadweight code. Andre and I hit ours recently and decided to spend all of last week focusing solely on cleaning up Scout (a Rails app). Our goals:
- Faster tests – our tests took 8 minutes to complete. While it’s the perfect amount of time to catch up on Daily Show clips, it really tested our patience when making application-wide changes.
- Removing deadweight – unused CSS rules, database tables + columns, views, and assets. It’s good having certainty that modifying code will change something in the application.
Here’s how we went about it:
Last week, Sparrow became the latest poster boy for talent acquisitions (Google gets the team, kills the product). Paying customers complain (I supported it!). Indie devs get depressed as one of their rank sells out.
I disagree with Matt Gemmell that these are a good thing – this is not a feel-good rags-to-riches story. It’s about brilliant developers giving in to reasonableness because they didn’t have the runway to be foolish.
Scout’s realtime charts have been a big hit. Once you start using them for major deploys or performance incidents, going back to ten terminal windows running “top” feels like the dark ages.
So, how did we go about it?
Ernest Hemingway via Letters of Note
I write one page of masterpiece to ninety one pages of shit. I try to put the shit in the wastebasket.
Sounds a lot like writing code too, huh?