If you're using Amazon EC2, you may be familiar with CloudWatch,
Amazon's analytic system that provides metrics on CPU usage,
Network I/O, and Disk I/O of your instances. While CloudWatch
collects metrics, it doesn't provide a web interface for viewing the metrics, graphs, trending, or alerting.
Enter our Scout EC2 CloudWatch plugin. Like any
other Scout plugin, you can graph the resulting metrics, set
triggers, track trends, and get email alerts when the numbers go
out of bounds.
What does it monitor?
The CloudWatch plugin captures the following metrics ("measures", as EC2 calls them): NetworkIn, NetworkOut, DiskReadOps, DiskWriteOps, DiskReadBytes, DiskWriteBytes, and CPUUtilization.
Note, this plugin does not fetch EC2 Load Balancer Metrics, only EC2 instance metrics.
Single Instance, Autoscaling Group, etc.
The EC2 CloudWatch plugin can capture metrics from a single EC2 instance, or it can aggregate metrics across a couple of dimensions. It can aggregate metrics across a given instance type, across all instances launched from a specific image (AMI), or by a specified autoscaling group. That means you can, for example, graph the performance of your application server autoscaling group as a whole, or graph just your memcached instance.
To use this plugin, you have to enable CloudWatch for the instance(s) you want to collect metrics from. Basically, it's just running ec2-monitor-instances ##### from the command line, or passing a monitoring parameter to ec2-run-instances. See Amazon's CloudWatch docs for details.
New to Scout?
If you're learning about Scout through this plugin, sign up for a trial Scout account to give this plugin a try. You can graph all kinds of metrics and measurements from all your servers. It works with cloud instances, VPS's, and dedicated hardware.
James Gray's July 19th talk at RubyKaigi 2009 focused on best practices for long-running Ruby daemon processes.
What types of questions did the audience ask? What did they seem most interested in?
In general, users always want to know about our RRD usage, extracting the daemon functionality from Scout's agent, and the agent's memory usage. It was the same at RubyKaigi. The questions reminded me of how much current Ruby RRD solutions suck and that it's time we did something about that. It also reminded me that I need to get around to extracting our daemon code, which I've always intended to do.
As FiveRuns posted on their blog, they have announced End-of-Life for FiveRuns Manage. We have made arrangements with FiveRuns to ease the transition for customers who still need a robust, easy-to-use monitoring solution.
For current FiveRuns customers, we are offering 50% off your first paid month here with Scout. Note that this is only for current FiveRuns Manage customers, and that the offer expires in one week (August 19th). Of course, like any other Scout signup, it’s risk-free: your first month is free (and your second month is half-off) and you can cancel, upgrade, or downgrade at any time.
FiveRuns Manage customers: use your discount code on our signup page, and welcome to Scout!
Getting started with Scout is very straightforward, and the signup process guides you through all the steps. The main difference from FiveRuns Manage is that you choose the components you want to monitor by selecting plugins. You can add or remove plugins at any time, and we offer some suggestions for getting started below.
Your basic process is this:
- Install the gem with sudo gem install scout_agent, and start it with the server key you’re given on signup.
- Select one or more plugins from the directory. The Server Load, Disk Usage, and Memory Profiler are easy plugins to get started with.
- Customize or add Triggers. Scout uses triggers to alert you of spikes or trends in the data being gathered. For example: “alert me when the five-minute load average exceeds 4.0.” Plugins come with default triggers, and you can customize them as much as you need.
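To make the idea of a threshold trigger concrete, here is a minimal Ruby sketch. It is not Scout's trigger API: the Trigger struct, its fields, and the sample hash are all hypothetical names chosen for illustration.

```ruby
# Hypothetical threshold trigger; not part of Scout's actual API.
Trigger = Struct.new(:metric, :threshold) do
  # True when the sample holds a value for the metric above the threshold.
  def fired?(sample)
    value = sample[metric]
    !value.nil? && value > threshold
  end
end

# "Alert me when the five-minute load average exceeds 4.0"
load_trigger = Trigger.new(:load_five_minute, 4.0)

# Made-up sample data, for illustration only.
sample = { load_one_minute: 5.2, load_five_minute: 4.3 }
puts "ALERT: five-minute load is #{sample[:load_five_minute]}" if load_trigger.fired?(sample)
```

A missing metric is treated as "no alert" here; a real system might prefer to alert on missing data as well.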
Let us know if you have questions!
You might be familiar with Linux load averages already. Load averages are the three numbers shown by commands such as top. They look like this:
load average: 0.09, 0.05, 0.01
Most people have an inkling of what the load averages mean: the three numbers represent averages over progressively longer periods of time (one, five, and fifteen minutes), and lower numbers are better. Higher numbers represent a problem or an overloaded machine. But what's the threshold? What constitutes "good" and "bad" load average values? When should you be concerned over a load average value, and when should you scramble to fix it ASAP?
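If you want to work with those three numbers in a script, a small Ruby sketch like the following can pull them out of a line in the format shown above. The LoadAverage struct and the parse_load_average method are illustrative names, and the regular expression assumes the comma-separated format from the example.

```ruby
# Illustrative container for the one-, five-, and fifteen-minute averages.
LoadAverage = Struct.new(:one, :five, :fifteen)

# Parses a line such as "load average: 0.09, 0.05, 0.01".
# Returns nil when the line does not contain three load averages.
def parse_load_average(line)
  match = line.match(/load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)/)
  return nil unless match

  LoadAverage.new(*match.captures.map(&:to_f))
end

load = parse_load_average("load average: 0.09, 0.05, 0.01")
puts "1 min: #{load.one}, 5 min: #{load.five}, 15 min: #{load.fifteen}"
```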
Thanks to Rob Lingle of Rails Machine, we have a new plugin for monitoring IO performance. See the iostat plugin here.
What is iostat and why would I use it?
iostat reports terminal and disk I/O activity. You should use it if you suspect a device is IO bound. Ilya Grigorik recently put up a good post on iostat, and the man pages are here.
What are the plugin configuration options?
There are three configuration options for the iostat plugin:
- iostat Command -- most likely, you won't need to change this. Consult the iostat documentation for other flags and options.
- Device -- defaults to /, or specify any device you want to monitor.
- Interval -- defaults to three seconds; set to a different number to have iostat report averages over that many seconds.
How do I install the plugin in Scout?
Just like any other plugin, go to the Scout plugin directory and select the Device Input/Output plugin.
Ensure the iostat command is installed on your server. If it's not, you probably just need to install the sysstat package. For example, on Ubuntu this is apt-get install sysstat.
Enjoy, and let us know if you have any feedback.
Scout takes a trek to Ruby’s birthplace – Japan – as James Gray presents How Lazy Americans Monitor Servers at the sold-out Ruby Kaigi.
James’ July 19th talk focuses on the architecture of the Scout agent, the Ruby gem that is installed on a server you wish to monitor using Scout.
James will dig into the technical details of the agent’s division of labor approach for preventing memory leaks and crashes.
Our recent update to Scout featured a revised UI, more functionality, and a new Scout Agent. While it’s easy to see the changes in the UI, a lot of the work conducted by the agent happens beneath the surface.
The Scout Agent, which is installed on a server you wish to monitor, was kind enough to sit down and walk me through its DNA (note that the ability to answer human questions is currently not available in the most recent release).
First, tell me a bit about what you’re made of.
I’m just a plain-old Ruby gem that you can install on any Linux-based server (sudo gem install scout_agent).
So, you’re a daemon right? Aren’t long-running Ruby tasks known to leak memory?
Yes, I’m a daemon. And yes, Ruby, like many programming languages, can leak memory when run for a long period of time.
My strategy for preventing memory leaks is simple: I do real work, like running plugins, in a separate short-lived process. I fork(), do whatever, and exit() so the OS can clean up any mess.
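The fork, work, exit pattern can be sketched in a few lines of Ruby. This is not the agent's actual code: run_in_child is a hypothetical helper, and passing the result back over a pipe with Marshal is just one possible choice. It needs a platform that supports fork, such as Linux.

```ruby
# Runs the given block in a short-lived child process and returns its result.
# run_in_child is a hypothetical helper, not part of the Scout agent.
def run_in_child
  reader, writer = IO.pipe

  pid = fork do
    reader.close
    writer.write(Marshal.dump(yield)) # do the real work in the child
    writer.close
    exit!(0) # leave right away; the OS reclaims everything the child allocated
  end

  writer.close
  payload = reader.read # blocks until the child closes its end of the pipe
  reader.close
  Process.wait(pid)     # reap the child so it does not linger as a zombie
  Marshal.load(payload)
end

# Memory allocated while building this array is freed when the child exits.
total = run_in_child { (1..1_000).map { |n| n * 2 }.sum }
puts total
```

Because the parent only ever holds the small marshaled result, whatever the work leaks disappears with the child. This sketch does not handle a child that raises before writing its result.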
What’s your strategy to prevent the agent from crashing? Obviously, it’s important that monitoring software keeps running.
My work is divided into 2 main processes and several short-lived processes:
- Lifeline – A single process that watches over all other agent processes. If a process fails to check-in with the lifeline regularly, I force it to stop and replace it with a healthy process.
- Master – This is the event loop of the agent and is the main process monitored by the lifeline. It just sleeps and runs plugins in a never-ending cycle.
- Missions – These processes execute the plugin code. These are small processes that exist only when plugins are running.
The reason for this division of labor? The real work is executed by the mission processes, which are short-lived. Offloading the work to them greatly reduces the chance that a plugin degrades my performance, or raises an exception and kills me off.
It’s easier to write 200 lines of bug-free code than 3000. The 200 LOC (my lifeline) keeps the rest alive.
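The check-in bookkeeping behind a lifeline can be sketched roughly as follows. This is an illustration of the idea only, not the agent's real lifeline: the Lifeline class, its method names, and the max_silence setting are all assumptions, and the actual stopping and replacing of processes is left out.

```ruby
# Illustrative sketch of lifeline bookkeeping; names are assumptions.
class Lifeline
  # max_silence: seconds a process may go without checking in.
  def initialize(max_silence)
    @max_silence  = max_silence
    @last_checkin = {}
  end

  # Called whenever a watched process reports that it is still healthy.
  def checkin(name, now = Time.now)
    @last_checkin[name] = now
  end

  # Names of processes that have been silent for too long. A real lifeline
  # would force these to stop and start healthy replacements.
  def stale(now = Time.now)
    @last_checkin.select { |_name, at| now - at > @max_silence }.keys
  end
end

lifeline = Lifeline.new(60)
start = Time.now
lifeline.checkin(:master, start)
puts lifeline.stale(start + 30).inspect
puts lifeline.stale(start + 90).inspect
```

Keeping this logic tiny is the point of the design: the less code the lifeline contains, the less likely it is to be the thing that fails.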
UPDATED 6/30 – The fix for the old scout client (run via cron) is now available in version 2.0.7 (sudo gem install scout).
In rare (and difficult to reproduce) cases we’ve seen the Scout Agent not observe a Timeout during a checkin error with the Scout server. Scout uses Ruby’s RestClient gem to connect to the Scout Server and it uses the standard Net::HTTP library to manage the connection. Some versions of the Net::HTTP library can run into a bug in IO.select() on some platforms. This causes the request to hang forever in some rare cases.
Our fix? We added a redundant Timeout for the request, in addition to Net::HTTP’s own Timeout. You have to be careful how you nest those calls though, since they will throw the same Exception by default. We followed Eric Hodel’s advice to get our implementation right.
If you’re using Net::HTTP and notice the same issue, try adding a redundant Timeout with a custom halting Exception (our committed fix for this is on github).
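As a rough sketch of the approach (not our committed fix), Ruby's Timeout.timeout accepts an exception class as its second argument, so the outer, redundant timeout can raise something distinct from Timeout::Error. The HungRequest class and the helper names below are hypothetical, and the block stands in for the HTTP request with its own inner timeout.

```ruby
require "timeout"

# Hypothetical exception raised only by the outer, redundant timeout.
class HungRequest < StandardError; end

# Wraps the block in an outer timeout that raises HungRequest instead of
# Timeout::Error, so it cannot be confused with an inner timeout.
def with_redundant_timeout(seconds)
  Timeout.timeout(seconds, HungRequest) { yield }
end

# The block stands in for the request, which may raise Timeout::Error itself.
def checkin(outer_seconds)
  with_redundant_timeout(outer_seconds) do
    yield
    :ok
  end
rescue Timeout::Error
  :inner_timeout # the request's own timeout fired as expected
rescue HungRequest
  :hung          # the request never returned; the redundant timeout saved us
end

puts checkin(0.2) { sleep 2 }
```

Because the two timeouts raise different exceptions, the rescue clauses can tell which one fired, which is the nesting problem described above.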
This fix is included in version 3.2.6 of the Scout Agent. The backport to the old client, originally planned for late next week, is available now. Follow our Twitter feed to stay updated with the latest releases.
Sinatra, a Ruby DSL for quickly creating web applications with minimal effort, forms a key part of the Scout infrastructure.
James Gray talks about how we use Sinatra at Scout via RubyLearning.org – 20+ Rubyists are using Sinatra – Do you?
For more on how Scout works:
Rails Machine, one of the first specialized
Rails hosting providers, has selected Scout for
Why Scout? Scout’s versatility was a key benefit: “the open-source
plugins are the killer app of Monitoring,” says Rails Machine CEO
Bradly Taylor. “We have so much flexibility, and we’ll be
contributing some great plugins back to the community.”
“With Scout, we have the full picture – Rails, MySQL, CPU, Memory, IO,
and Disk Usage – all in one dashboard. You can’t beat that,” said
Jesse Newland, Senior Engineer at Rails Machine.
Rails Machine, which recently launched a Managed Hosting offering,
finds Scout particularly useful for proactively managing customer
accounts. “Scout’s trend detection has more than once caught an
application’s jump in memory usage due to RMagick-based image
uploads/resizes,” says Newland. “We’ve been able to jump in, restart
Apache, and then propose alternatives to customers. It’s much better
than a server running out of memory minutes later.”
Scout is included with Rails Machine’s new Managed Hosting service
which does everything but write the code – from MySQL tuning to
backups to performance monitoring. Self-managed customers can request a coupon for a free Basic Scout subscription or $14 off a
larger plan. See the announcement on the Rails Machine blog for more details.
From the original five-minute Rails
deployment gem to the Rails Machine
open source configuration management and deployment system, Rails
Machine understands automating system administration. Scout’s
proactive approach to monitoring and Rails Machine’s
encyclopedia-like knowledge base for scaling Rails apps are a perfect
combination for managed hosting.