Monitoring Django apps on Heroku

By Derek

I don't know of an easier way to deploy a Django app than letting Heroku do the work. That said, how do you stay on top of your app's performance, errors, and stability post-launch? Running an app on Heroku is a blissful experience, but it presents some monitoring challenges that aren't present when you control the hardware.

In this post, I'll walk through a free-to-start, low-effort approach that gives you good visibility into the health of your Django app on Heroku. All of the services below either have Django-specific support or don't require a significant time investment to work with a Django app.

Focus | Service
Uptime | Uptime Robot
Application Performance Monitoring | Scout
Logging | LogDNA
Exception Monitoring | Rollbar
Custom Metrics | HostedGraphite
Resource usage | Heroku Application Metrics

Monitoring coverage areas

Here are the primary areas you'll want to monitor on a production Django app:

  • Uptime - is the app reachable from around the globe?
  • Application Performance Monitoring - when performance degrades, drill down to the line of code causing the issue.
  • Logging - view your application and Heroku logs.
  • Exception Monitoring - aggregate, view, and close exceptions.
  • Custom Metrics - track key performance indicators that are specific to your app.
  • Resource usage - track memory usage and CPU load.

I'll cover each area in detail below.

Uptime

Is my Django app up? Whether you are hosting a personal blog or on the stability team for Netflix, you need this. I don't believe there is a free Heroku addon for uptime monitoring, so I'd suggest Uptime Robot. Uptime Robot has a free plan that checks if your site is up every five minutes.
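To make concrete what an uptime check does, here is a minimal sketch in plain Python. It is not how Uptime Robot is implemented, and the `is_up` helper is a hypothetical name. It requests a URL and treats a timeout, a refused connection, or a 4xx/5xx response as "down".

```python
import urllib.error
import urllib.request


def is_up(url, timeout=10.0):
    """Return True if `url` answers with a 2xx/3xx status within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Covers DNS failures, refused connections, and timeouts, plus
        # 4xx/5xx responses (urlopen raises HTTPError, a URLError subclass).
        return False
```

A scheduler would call `is_up()` with your app's public URL every few minutes and alert when it returns `False`. A hosted service adds what this sketch lacks: checks from several regions and someone else's uptime.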

Application Performance Monitoring

When it comes to tracking down performance issues, application performance monitoring (APM) gives the most value with the least effort. These services have libraries that trace the execution of your code, SQL queries, external HTTP requests and more, pointing to the line-of-code when there is a performance problem. Adding this instrumentation yourself would be painful.

The Scout Heroku Addon is an easy-to-configure APM option for Django apps. Once the addon is provisioned and the scout-apm package is installed, Scout traces each web request and breaks down time spent in Python code, SQL queries, HTTP requests, Redis queries, and more. Scout automatically identifies problems like N+1 database queries which are hard to reproduce in development.
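As a rough sketch of the Django side, the settings fragment below follows my understanding of the scout-apm package's documented setup: add `scout_apm.django` to `INSTALLED_APPS` and supply `SCOUT_MONITOR`, `SCOUT_KEY`, and `SCOUT_NAME`. It assumes the add-on exposes the key through environment variables with those names; the app name is a placeholder. Check the current Scout docs before relying on it.

```python
# settings.py (fragment)
import os

INSTALLED_APPS = [
    "scout_apm.django",  # assumed to belong first so it can instrument the rest
    "django.contrib.contenttypes",
    "django.contrib.auth",
    # ... your own apps ...
]

# Assumption: the Heroku add-on sets these config vars for you.
SCOUT_MONITOR = os.environ.get("SCOUT_MONITOR", "false").lower() == "true"
SCOUT_KEY = os.environ.get("SCOUT_KEY", "")
SCOUT_NAME = "My Django App"  # placeholder: the name shown in the Scout UI
```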

Logging

Heroku’s log history only goes back 1500 lines, which could just be a couple of seconds of logs for a production Django app. This means you have to send your log stream somewhere for meaningful data retention and querying. LogDNA is an easy option on Heroku that requires virtually no configuration.
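Whichever log service you pick, it can only see what your dynos write to stdout and stderr, since that is what Heroku's log stream collects. A minimal sketch of a Django `LOGGING` setting that sends everything at INFO and above to stdout is below; the formatter name and format string are my own choices, not something LogDNA requires.

```python
# settings.py (fragment)
import logging
import logging.config

LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "plain": {"format": "%(levelname)s %(name)s %(message)s"},
    },
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
            "stream": "ext://sys.stdout",  # Heroku captures stdout/stderr
            "formatter": "plain",
        },
    },
    "root": {"handlers": ["console"], "level": "INFO"},
}
```

Django applies the `LOGGING` dictionary itself at startup. Outside Django, `logging.config.dictConfig(LOGGING)` does the same thing.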

Exception Monitoring

Exception Monitoring tools make it easy to track exceptions down to a line-of-code, saving you valuable development time hunting down bugs. They also aggregate similar errors together to decrease noise when things are going wrong.

Rollbar is an easy-to-configure, reliable option for monitoring Django exceptions. Scout also integrates with Rollbar, giving you a single pane of glass for both performance and errors.
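To illustrate the aggregation idea, here is a small sketch that groups exceptions by a "fingerprint": the exception type plus the file, function, and line that raised it. This is a simplification of my own, not Rollbar's actual grouping algorithm, and `fingerprint` and `parse_quantity` are hypothetical names.

```python
import traceback
from collections import Counter


def fingerprint(exc):
    """Group key for a caught exception: its type plus where it was raised."""
    # Assumes `exc` was actually raised, so it carries a traceback.
    frame = traceback.extract_tb(exc.__traceback__)[-1]
    return (type(exc).__name__, frame.filename, frame.name, frame.lineno)


def parse_quantity(raw):
    return int(raw)  # raises ValueError for non-numeric input


groups = Counter()
for raw in ["abc", "xyz", "12", "?"]:
    try:
        parse_quantity(raw)
    except ValueError as exc:
        groups[fingerprint(exc)] += 1

# The three failures carry three different messages but share one
# fingerprint, so they show up as a single group with a count of 3.
```

Grouping on location rather than on the message is what keeps one bug from turning into hundreds of separate alerts.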

Custom Metrics

Your app has its own indicators of health (and of the health of your business). For example, a shopping cart app may track how often an item is added to a cart.

If you aren't on Heroku, StatsD is a great generic option. While it is possible to configure StatsD on Heroku, it is involved. A simpler option for custom metrics is HostedGraphite.
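For a feel of what a custom metric looks like on the wire, here is a minimal sketch of a StatsD counter increment in plain Python. StatsD counters are plain-text UDP datagrams in the form `<name>:<value>|c`. The metric name `cart.item_added` is made up for the shopping cart example, and the host and port assume a StatsD daemon listening locally on its conventional port, 8125. A hosted service has its own endpoint and naming rules, so treat this as an illustration rather than HostedGraphite's API.

```python
import socket


def format_counter(name, value=1):
    """Encode a counter increment in the StatsD line format."""
    return f"{name}:{value}|c".encode("ascii")


def send_counter(name, value=1, host="127.0.0.1", port=8125):
    """Fire-and-forget one counter increment over UDP."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(format_counter(name, value), (host, port))
```

Calling `send_counter("cart.item_added")` from the view that handles "add to cart" is all the instrumentation the app itself needs. UDP keeps the call cheap: if the collector is down, the app doesn't wait on it.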

Resource Usage

There's no need to use another service to monitor basic metrics like memory usage and CPU load. If you are using Hobby dynos or higher, you get high-level resource usage metrics and alerting for free within your Heroku dashboard. Just enable application metrics.

 

Part I: How not to structure your database-backed web apps

By Derek

Most scientific papers are unlikely to change your day-to-day approach as a Rails web developer. "How not to structure your database-backed web applications: a study of performance bugs in the wild" (Yang et al., ICSE’18) is the exception to that rule.

This study examined 12 popular, mature, open-source Rails apps for ActiveRecord performance anti-patterns. And boy, did they find some issues:

11 out of 12 studied applications contain pages in their latest versions that take more than two seconds to load and also pages that scale super-linearly


 

Finding slow ActiveRecord queries with Scout

By Derek

Once your Rails app begins seeing consistent traffic, you're bound to have slow SQL queries. While PostgreSQL and MySQL can log slow queries, it's difficult to glean actionable information from this raw stream. The slow query logs lack application context: where's the line of code generating the query? Is this slow all of the time, or just some of the time? Which controller-action or background job is the caller?


 

Finding and fixing N+1 queries in Django apps

By Derek

The Django ORM makes it easy to fetch data, but there's a downside: it's easy to write inefficient queries as the number of records in your database grows.

One area where the ease of writing queries can bite you is N+1 queries. Expensive N+1 queries often go undiscovered in small development databases. Finding expensive N+1 queries is an area where Scout is particularly helpful.
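If the N+1 pattern is new to you, the sketch below reproduces it with Python's built-in `sqlite3` module rather than the Django ORM, so it runs anywhere. The tables, rows, and helper names are invented for illustration. One query fetches the books, then one more query per book fetches its author; the joined version gets the same data in a single query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (
        id INTEGER PRIMARY KEY,
        title TEXT,
        author_id INTEGER REFERENCES author(id)
    );
    INSERT INTO author VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO book VALUES (1, 'First', 1), (2, 'Second', 1), (3, 'Third', 2);
""")

executed = []  # every SQL statement issued through run()


def run(sql, params=()):
    executed.append(sql)
    return conn.execute(sql, params).fetchall()


def books_n_plus_one():
    # 1 query for the books, then 1 query per book for its author.
    books = run("SELECT title, author_id FROM book ORDER BY id")
    return [
        (title, run("SELECT name FROM author WHERE id = ?", (author_id,))[0][0])
        for title, author_id in books
    ]


def books_joined():
    # The same data in a single query.
    return run(
        "SELECT book.title, author.name FROM book "
        "JOIN author ON author.id = book.author_id ORDER BY book.id"
    )
```

With three books the difference is four queries versus one, which nobody notices in development. With thousands of rows it is thousands of round trips. In the Django ORM, `select_related()` and `prefetch_related()` are the usual ways to get the single-query (or few-query) behavior.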


 

Why put Rust in our Python Monitoring agent?

By Chris

Prior to adding Python performance monitoring, we'd written monitoring agents for Ruby and Elixir. Our Ruby and Elixir agents had duplicated much of their code between them, and we didn't want to add a third copy of the agent-plumbing code. The overlapping code included things like JSON payload format, SQL statement parsing, temporary data storage and compaction, and a number of internal business logic components.

This plumbing code is about 80% of the agent code! Only 20% is the actual instrumentation of application code.

So, starting with Python, our goal became: how do we prevent more duplication? To do that, we decided to split the agent into two components: a language agent and a core agent. The language agent is the Python component, and the core agent is a standalone executable that contains most of the shared logic.


 

Your Rails & Elixir performance metrics 📈 inside Chrome Dev Tools

By Derek

Browser development tools - like Chrome Dev Tools - are vital for debugging client-side performance issues. However, server-side performance metrics have been outside the browser's reach.

That changes with the Server Timing API. Supported by Chrome 65+, Firefox 59+, and more browsers, the Server Timing API defines a spec that enables a server to communicate performance metrics about the request-response cycle to the user agent. When you use our open-source Ruby or Elixir server timing libraries, you'll see a breakdown of server-side database queries, view rendering, and more:

[Screenshot: server-side timing breakdown in Chrome Dev Tools]

Combined with the already strong client-side browser performance tools, this paints a full picture of web performance.
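For a sense of what travels over the wire: under the Server Timing spec, the server attaches a `Server-Timing` response header listing named metrics, each with an optional duration in milliseconds. Here is a minimal Python sketch of building that header value. The `server_timing_header` helper and the metric names are hypothetical, and this is not the code inside Scout's Ruby or Elixir libraries.

```python
def server_timing_header(metrics):
    """Build a Server-Timing header value from a {name: duration_ms} mapping.

    Metric names are assumed to be simple tokens (letters, digits, dashes,
    underscores), since the header format does not allow spaces in names.
    """
    return ", ".join(
        f"{name};dur={duration_ms:.1f}" for name, duration_ms in metrics.items()
    )


header_value = server_timing_header({"db": 53.2, "view": 12.0})
# header_value == "db;dur=53.2, view;dur=12.0"
```

A web framework would set this string as the `Server-Timing` header on the response, and the browser's network panel would then show the `db` and `view` timings next to its own client-side numbers.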

Get started with Scout's server timing libraries:

A Scout account isn't required, but it does make investigating slow response times more fun.

 
