Moving from FiveRuns to Scout

By Derek

As FiveRuns posted on their blog, they have announced end-of-life for FiveRuns Manage. We have made arrangements with FiveRuns to ease the transition for customers who still need a robust, easy-to-use monitoring solution.

For current FiveRuns customers, we are offering 50% off your first paid month here with Scout. Note that this is only for current FiveRuns Manage customers, and that the offer expires in one week (August 19th). Of course, like any other Scout signup, it’s risk-free: your first month is free (and your second month is half-off), and you can cancel, upgrade, or downgrade at any time.

FiveRuns Manage customers: use your discount code on our signup page, and welcome to Scout!

Getting Started

Getting started with Scout is very straightforward, and the signup process guides you through all the steps. The main difference from FiveRuns Manage is that you choose the components you want to monitor by selecting plugins. You can add or remove plugins at any time, and we offer some suggestions for getting started below.

Your basic process is this:

  1. Install the gem (sudo gem install scout_agent) and start it with the server key you’re given at signup.
  2. Select one or more plugins from the directory. The Server Load, Disk Usage, and Memory Profiler are easy plugins to get started with.
  3. Customize or add Triggers. Scout uses triggers to alert you of spikes or trends in the data being gathered, for example, “alert me when the five-minute load average exceeds 4.0.” Plugins come with default triggers, and you can customize them as much as you need.
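
To make the trigger idea concrete, here is a minimal sketch in Ruby of a threshold check like the one quoted in step 3. The Trigger struct, the metric names, and the sample values are all made up for illustration; this is not how Scout implements triggers internally.

```ruby
# Hypothetical sketch of a threshold trigger; not Scout's actual implementation.
Trigger = Struct.new(:metric, :threshold) do
  # True when the sampled value for our metric is above the threshold.
  def fired?(sample)
    value = sample[metric]
    !value.nil? && value > threshold
  end
end

# "Alert me when the five-minute load average exceeds 4.0"
load_trigger = Trigger.new(:load_5min, 4.0)

# Made-up sample data standing in for what a plugin might report.
sample = { load_1min: 5.2, load_5min: 4.3, load_15min: 2.1 }

if load_trigger.fired?(sample)
  puts "ALERT: five-minute load is above #{load_trigger.threshold}"
end
```

A trend trigger would compare several samples over time rather than a single value, but the shape is similar.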

Let us know if you have questions!

 

Understanding Linux CPU Load - when should you be worried?

By Andre | Posted in Examples | 16 comments

You might be familiar with Linux load averages already. Load averages are the three numbers shown with the uptime and top commands - they look like this:

load average: 0.09, 0.05, 0.01

Most people have an inkling of what the load averages mean: the three numbers represent averages over progressively longer periods of time (one, five, and fifteen minutes), and lower numbers are better. Higher numbers represent a problem or an overloaded machine. But what's the threshold? What constitutes "good" and "bad" load average values? When should you be concerned over a load average value, and when should you scramble to fix it ASAP?
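
If you want to work with these numbers in a script, a small Ruby sketch like the following pulls the three averages out of a line of uptime output. The method name and the hash keys are our own choices for illustration, and the regular expression assumes the numbers use a dot as the decimal separator.

```ruby
# Parse the one-, five-, and fifteen-minute load averages from a line of
# `uptime` (or `top`) output. Returns nil if the line has no load averages.
def parse_load_averages(line)
  match = line.match(/load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)/)
  return nil unless match

  one, five, fifteen = match.captures.map(&:to_f)
  { one_minute: one, five_minute: five, fifteen_minute: fifteen }
end

averages = parse_load_averages("load average: 0.09, 0.05, 0.01")
puts "five-minute load: #{averages[:five_minute]}"
```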


 

New Plugin: iostat

By Andre | Posted in Plugins

Thanks to Rob Lingle of Rails Machine, we have a new plugin for monitoring IO performance. See the iostat plugin here.

What is iostat and why would I use it?

iostat reports terminal and disk I/O activity. You should use it if you suspect a device is I/O-bound. Ilya Grigorik recently put up a good post on iostat, and the man pages are here.

What are the plugin configuration options?

There are three configuration options for the iostat plugin:

  • iostat Command -- most likely, you won't need to change this. Consult the iostat documentation for other flags and options.
  • Device -- defaults to /, or specify any device you want to monitor.
  • Interval -- defaults to three seconds; set to a different number to have iostat report averages over that many seconds.

How do I install the plugin in Scout?

Just like any other plugin, go to the Scout plugin directory and select the Device Input/Output plugin.

Ensure the iostat command is installed on your server. If it's not, you probably just need to install the sysstat package. For example, on Ubuntu this is apt-get install sysstat.
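
If you'd rather check from Ruby than by hand, here is a minimal sketch that looks for an executable on the PATH. The helper name command_available? is made up for this example; it isn't part of Scout or the plugin.

```ruby
# Returns true when an executable file named `command` exists in one of the
# directories listed in `path` (defaults to the PATH environment variable).
def command_available?(command, path = ENV.fetch("PATH", ""))
  path.split(File::PATH_SEPARATOR).any? do |dir|
    candidate = File.join(dir, command)
    File.file?(candidate) && File.executable?(candidate)
  end
end

if command_available?("iostat")
  puts "iostat found"
else
  puts "iostat not found; you probably need to install the sysstat package"
end
```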

Enjoy, and let us know if you have any feedback.

 

Presenting the Scout Agent at Ruby Kaigi - Tokyo, Japan July 19

By Derek | Posted in Updates

Scout takes a trek to Ruby’s birthplace – Japan – as James Gray presents How Lazy Americans Monitor Servers at the sold-out Ruby Kaigi.

James’ July 19th talk focuses on the architecture of the Scout agent, the Ruby gem that is installed on a server you wish to monitor using Scout.

James will dig into the technical details of the agent’s division of labor approach for preventing memory leaks and crashes.

 

Q&A with the Scout Agent - An overview

By Derek | Posted in Updates | 4 comments

Our recent update to Scout featured a revised UI, more functionality, and a new Scout Agent. While it’s easy to see the changes in the UI, a lot of the work conducted by the agent happens beneath the surface.

The Scout Agent, which is installed on a server you wish to monitor, was kind enough to sit down and walk me through its DNA (note that the ability to answer human questions is currently not available in the most recent release).

First, tell me a bit about what you’re made of.

I’m just a plain-old Ruby gem that you can install on any Linux-based server (sudo gem install scout_agent).

So, you’re a daemon right? Aren’t long-running Ruby tasks known to leak memory?

Yes, I’m a daemon. And yes, Ruby, like many programming languages, can leak memory when run for a long period of time.

My strategy for preventing memory leaks is simple: I do real work, like running plugins, in a separate short-lived process. I fork(), do whatever, and exit() so the OS can clean up any mess.
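
For the curious, the fork-work-exit pattern looks roughly like this in plain Ruby. This is a simplified sketch, not the agent's real code: the run_in_child helper is made up, it hands the result back over a pipe as a string, and it uses exit! (rather than exit) so the child skips any at_exit handlers inherited from the parent. It needs a platform that supports fork, such as Linux.

```ruby
# Run the given block in a short-lived child process and return its result
# (as a string) plus whether the child exited cleanly.
def run_in_child
  reader, writer = IO.pipe

  pid = fork do
    reader.close
    writer.write(yield.to_s)  # do the real work in the child
    writer.close              # flush the result back to the parent
    exit!(0)                  # exit immediately; the OS reclaims all memory
  end

  writer.close                # parent only reads
  output = reader.read        # returns once the child closes its end
  reader.close
  _, status = Process.wait2(pid)
  [output, status.success?]
end

# The child allocates a big array; that memory is gone once it exits.
output, ok = run_in_child { Array.new(100_000) { |i| i }.sum }
puts "child returned #{output} (clean exit: #{ok})"
```

Whatever the block allocates or leaks lives and dies with the child, so the long-running parent stays small.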

What’s your strategy to prevent the agent from crashing? Obviously, it’s important that monitoring software keeps running.

My work is divided into 2 main processes and several short-lived processes:
  • Lifeline – A single process that watches over all other agent processes. If a process fails to check in with the lifeline regularly, I force it to stop and replace it with a healthy process.
  • Master – This is the event loop of the agent and is the main process monitored by the lifeline. It just sleeps and runs plugins in a never-ending cycle.
  • Missions – These processes execute the plugin code. These are small processes that exist only when plugins are running.

The reason for this division of labor? The real work is executed by the mission processes, which are short-lived. Offloading the work to them greatly reduces the chance that a plugin degrades my performance, or raises an exception that kills me off.

It’s easier to write 200 lines of bug-free code than 3000. The 200 LOC (my lifeline) keeps the rest alive.
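
To give a feel for how small the check-in bookkeeping behind a lifeline can be, here is a hypothetical sketch in Ruby. The Lifeline class, its method names, and the 60-second limit are all assumptions made for illustration; the real agent's lifeline also has to stop and replace the processes it finds unhealthy, which this sketch leaves out.

```ruby
# Hypothetical sketch of lifeline-style bookkeeping; names are illustrative.
class Lifeline
  def initialize(max_silence)
    @max_silence = max_silence  # seconds a process may go without checking in
    @last_seen = {}             # process name => time of its last check-in
  end

  # Called on behalf of a monitored process to say "I'm still healthy".
  def check_in(name, now = Time.now)
    @last_seen[name] = now
  end

  # Names of processes that have been silent for longer than allowed.
  def stale(now = Time.now)
    @last_seen.select { |_, seen| now - seen > @max_silence }.keys
  end
end

lifeline = Lifeline.new(60)
start = Time.now
lifeline.check_in(:master, start)

p lifeline.stale(start + 30)  # => []         (checked in recently enough)
p lifeline.stale(start + 90)  # => [:master]  (silent too long; replace it)
```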


 

Scout Agent Updated - and do your Net::HTTP calls ever hang?

By Derek

UPDATED 6/30 – The fix for the old scout client (run via cron) is now available in version 2.0.7 (sudo gem install scout).

In rare (and difficult to reproduce) cases, we’ve seen the Scout Agent fail to observe a Timeout when a checkin with the Scout server runs into an error. Scout uses Ruby’s RestClient gem to connect to the Scout server, and RestClient in turn uses the standard Net::HTTP library to manage the connection. Some versions of the Net::HTTP library can run into a bug in IO.select() on some platforms, which in rare cases causes the request to hang forever.

Our fix? We added a redundant Timeout for the request, in addition to Net::HTTP’s own Timeout. You have to be careful how you nest those calls though, since they will throw the same Exception by default. We followed Eric Hodel’s advice to get our implementation right.

If you’re using Net::HTTP and notice the same issue, try adding a redundant Timeout with a custom halting Exception (our committed fix for this is on github).
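
Here is a rough sketch of that approach, not a copy of our committed fix. The RequestHung exception, the with_redundant_timeout helper, and the timeout values are made up for this example. The point is that the outer Timeout raises its own exception class, so it can't be mistaken for the timeout Net::HTTP raises itself.

```ruby
require "timeout"
require "net/http"

# Custom halting exception (hypothetical name) for the outer, redundant timeout.
class RequestHung < StandardError; end

# Run the block, raising RequestHung if it takes longer than `seconds`.
def with_redundant_timeout(seconds)
  Timeout.timeout(seconds, RequestHung) { yield }
end

# Example use with Net::HTTP: the inner timeouts are Net::HTTP's own, and the
# outer one is the safety net in case the request hangs anyway.
def fetch(uri)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = (uri.scheme == "https")
  http.open_timeout = 5
  http.read_timeout = 10
  with_redundant_timeout(20) { http.get(uri.request_uri) }
end

# Demonstration without touching the network: a block that "hangs".
begin
  with_redundant_timeout(0.05) { sleep 0.5 }
rescue RequestHung
  puts "request hung; giving up"
end
```

Set the outer limit a little higher than the inner ones, so Net::HTTP's own timeout gets the first chance to fire.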

This fix is included in version 3.2.6 of the Scout Agent. We’re planning on backporting the fix to the old client late next week (available now). Follow our Twitter feed to stay updated with the latest releases.

 
