
Q&A with the Scout Agent - An overview

By Derek | Posted in Updates | 4 comments

Our recent update to Scout featured a revised UI, more functionality, and a new Scout Agent. While it’s easy to see the changes in the UI, a lot of the work conducted by the agent happens beneath the surface.

The Scout Agent, which is installed on each server you wish to monitor, was kind enough to sit down and walk me through its DNA (note that the ability to answer human questions is not available in the current release).

First, tell me a bit about what you’re made of.

I’m just a plain-old Ruby gem that you can install on any Linux-based server (sudo gem install scout_agent).

So, you’re a daemon, right? Aren’t long-running Ruby tasks known to leak memory?

Yes, I’m a daemon. And yes, Ruby, like many programming languages, can leak memory when run for a long period of time.

My strategy for preventing memory leaks is simple: I do real work, like running plugins, in a separate short-lived process. I fork(), do whatever, and exit() so the OS can clean up any mess.
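The fork-and-exit pattern described above can be sketched in a few lines of Ruby. This is a minimal illustration, not the agent's actual code: `run_in_child` is a hypothetical helper name, and passing the result back over a pipe is an assumption made so the example is self-contained. It uses `exit!` rather than `exit` so the child skips any `at_exit` handlers inherited from the parent.

```ruby
# Hypothetical helper, not part of scout_agent: run a block in a
# short-lived child process and hand its result back over a pipe.
def run_in_child
  reader, writer = IO.pipe

  pid = fork do
    reader.close
    begin
      writer.write(yield.to_s) # the "real work" happens only in the child
      writer.close             # flush the result before leaving
      exit!(0)                 # exit at once; the OS reclaims all child memory
    rescue StandardError
      exit!(1)                 # a failing block takes down only the child
    end
  end

  writer.close                   # parent keeps only the read end
  output = reader.read           # returns once the child closes its write end
  reader.close
  _, status = Process.wait2(pid) # reap the child so no zombie is left behind
  [output, status.exitstatus]
end
```

Calling `run_in_child { some_plugin.run }` (where `some_plugin` stands in for whatever work is needed) returns the block's result as a string together with the child's exit status. Whatever memory the block allocated disappears with the child, so the long-running parent never grows.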

What’s your strategy to prevent the agent from crashing? Obviously, it’s important that monitoring software keeps running.

My work is divided between two main processes and several short-lived ones:
  • Lifeline – A single process that watches over all other agent processes. If a process fails to check in with the lifeline regularly, I force it to stop and replace it with a healthy process.
  • Master – This is the event loop of the agent and is the main process monitored by the lifeline. It just sleeps and runs plugins in a never-ending cycle.
  • Missions – These processes execute the plugin code. These are small processes that exist only when plugins are running.

The reason for this division of labor? The real work is executed by the mission processes, which are short-lived. Offloading the work to them greatly reduces the risk that a plugin degrades my performance, or raises an exception that kills me off.
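To make the lifeline idea concrete, here is a rough Ruby sketch of a supervisor that force-stops and replaces a worker that stops checking in. Everything in it is an assumption for illustration: the `supervise` name, the pipe used as the check-in channel, and the timeout value are not taken from the agent's source, and a real lifeline would also need to handle workers that crash rather than hang.

```ruby
# Hypothetical lifeline sketch. `supervise`, the pipe-based check-in, and the
# timeout are illustrative assumptions, not scout_agent's real code.
def supervise(timeout:)
  restarts = 0

  loop do
    reader, writer = IO.pipe

    pid = fork do
      reader.close
      writer.sync = true          # check-ins must reach the lifeline immediately
      yield writer, restarts      # the worker checks in by writing to `writer`
      exit!(0)
    end
    writer.close

    healthy = true
    loop do
      # Wait up to `timeout` seconds for the next check-in.
      if IO.select([reader], nil, nil, timeout).nil?
        healthy = false           # silence for too long: treat worker as stuck
        break
      end
      # nil means end-of-file, i.e. the worker finished on its own.
      break if reader.read_nonblock(64, exception: false).nil?
    end

    Process.kill("KILL", pid) unless healthy # force the stuck worker to stop
    Process.wait(pid)
    reader.close

    return restarts if healthy    # clean finish: report how many restarts it took
    restarts += 1                 # otherwise loop around and start a replacement
  end
end
```

The block plays the role of the master process: it receives the write end of the pipe and is expected to write to it more often than once per `timeout`. The supervising loop itself stays tiny, which is the point of keeping the lifeline separate from the code that does the heavy lifting.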

It’s easier to write 200 lines of bug-free code than 3000. The 200 LOC (my lifeline) keeps the rest alive.
