Birds of a Fiber: A look at Falcon, a modern asynchronous web server for Ruby

November 14 · By Dave · Posted in Features

What is Falcon?

The GitHub README describes Falcon as "... a multi-process, multi-fiber rack-compatible HTTP server ... Each request is executed within a lightweight fiber and can block on up-stream requests without stalling the entire server process."

The gist: Falcon aims to increase the throughput of web applications by using Ruby's Fibers to keep serving requests while other requests are waiting on IO (ActiveRecord queries, network requests, file reads/writes, etc.).

What’s a Fiber?

Most of us are familiar with Threads. A Ruby process can have multiple Threads which are coordinated and executed by the Ruby VM. A Fiber can be thought of as a more lightweight Thread, but the Ruby VM doesn't handle Fiber scheduling - the fibers themselves must cooperate to schedule execution. That's Falcon's job.
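To make the cooperative part concrete, here's a minimal illustration using plain Ruby fibers (no Falcon involved): control transfers only where a fiber explicitly yields, never by VM preemption.

```ruby
# A fiber runs only when resumed, and hands control back only at Fiber.yield.
order = []

fiber = Fiber.new do
  order << :fiber_started
  Fiber.yield            # explicitly hand control back to the caller
  order << :fiber_finished
end

fiber.resume             # runs the fiber until the Fiber.yield
order << :caller_ran
fiber.resume             # runs the rest of the fiber

order # => [:fiber_started, :caller_ran, :fiber_finished]
```

If the fiber never called `Fiber.yield`, the first `resume` would run it start to finish - which is exactly the property that matters later in this post.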

Let’s Get Ready to Fly!

Time for Falcon to spread its wings and show us what it’s got! We’ll test Falcon with a simple Rails 5 app running on Ruby 2.5, running in production mode.

We need some way to simulate ActiveRecord queries, network IO, and calls to native extension C code - all typical things average Rails applications do.

For all endpoints, we accept a sleep_time parameter in the URL, designating how long to sleep.
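The post doesn't show its routes file; a sketch of what the routes might look like, inferred from the URLs exercised by the siege commands later in the post (hypothetical - your route names may differ):

```ruby
# config/routes.rb -- hypothetical sketch; inferred from the URLs used below.
Rails.application.routes.draw do
  get "/slow_sql/:sleep_time",         to: "average#slow_sql"
  get "/remote_io/:sleep_time",        to: "average#remote_io"
  get "/cworkwith_gvl/:sleep_time",    to: "average#cworkwith_gvl"
  get "/cworkwithout_gvl/:sleep_time", to: "average#cworkwithout_gvl"
end
```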

ActiveRecord Queries

We’ll use PostgreSQL’s pg_sleep function to simulate slow SQL queries:

class AverageController < ApplicationController
  def slow_sql
    ActiveRecord::Base.connection.execute("select * from pg_sleep(#{sleep_time})")
  end
end

Network IO

We’ll use Net::HTTP to fetch a URL to simulate remote API calls:

class AverageController < ApplicationController
  def remote_io
    uri = URI.parse("http://localhost:8080/#{sleep_time}")
    Net::HTTP.get_response(uri)
  end
end

The HTTP server listening on the remote end is written in Go. It sleeps for sleep_time milliseconds before returning a 200 status and a minimal body stating how long it slept:

// sleepy_http.go
package main

import (
    "fmt"
    "log"
    "net/http"
    "strconv"
    "time"
)

func handler(w http.ResponseWriter, r *http.Request) {
    t, _ := strconv.Atoi(r.URL.Path[1:])
    time.Sleep(time.Duration(t) * time.Millisecond)
    fmt.Fprintf(w, "Slept for %d milliseconds", t)
}

func main() {
    fmt.Println("Listening on port 8080")
    http.HandleFunc("/", handler)
    log.Fatal(http.ListenAndServe(":8080", nil))
}

Native Extension/C Calls

I wrote a small C native extension module that simply sleeps for the specified time in C before returning to Ruby. There's a sleep that holds the GVL and one that releases it:

class AverageController < ApplicationController
  def cworkwith_gvl
    CFoo::MyClass.do_work(sleep_time)
  end

  def cworkwithout_gvl
    CFoo::MyClass.do_work_without_gvl(sleep_time)
  end
end
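For context on what "without the GVL" buys you: ordinary Ruby threads can already overlap operations that release the GVL - `Kernel#sleep` is one such operation. A quick Ruby-level sanity check of the releasing case (this is plain threads, not the C extension above):

```ruby
# Two threads each sleep 0.3s. Because Kernel#sleep releases the GVL,
# the sleeps overlap: total elapsed time stays near 0.3s, not 0.6s.
start = Time.now
threads = 2.times.map { Thread.new { sleep 0.3 } }
threads.each(&:join)
elapsed = Time.now - start

elapsed < 0.6 # => true (roughly 0.3s in practice)
```

C code that holds the GVL while it works would serialize those threads instead, which is why the extension exposes both variants.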

Test Flight

Falcon can be used in either forking or threaded mode. In forking mode, each forked worker gets a single thread. In both modes, many fibers run within each thread, with one fiber created per request. We'll use forking mode in our tests with a concurrency of 5 (5 total threads across 5 forks, with no limit on the number of fibers).

We'll use siege to make concurrent requests against our endpoints, with `-c` setting the number of concurrent users and `-r` the number of repetitions per user.

Slow SQL

First up, slow_sql with each SQL request taking 1 second:

$ siege -c 50 -r1 'http://localhost/slow_sql/1'
Transactions:              50 hits
Availability:          100.00 %
Elapsed time:           10.08 secs
Transaction rate:        4.96 trans/sec

Wait a second - if Falcon is able to serve requests while we’re waiting for the SQL to return, we should be seeing about 1 second of elapsed time.

Remote IO

Ok, we’ll come back to the SQL test. What about network IO?

$ siege -c 50 -r1 'http://localhost/remote_io/1000'
Transactions:              50 hits
Availability:          100.00 %
Elapsed time:           11.09 secs
Transaction rate:        4.51 trans/sec

Roughly the same result as our SQL test. Again, we should be seeing about 1 second of elapsed time.

What gives?

It turns out I forgot to mention a critical characteristic of fibers - they are cooperatively scheduled amongst themselves and are not preemptible by the Ruby VM. That means for another fiber to run, the currently running fiber must explicitly yield so Falcon can switch fibers.
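You can see the consequence in a few lines of plain Ruby: a blocking call like `sleep` contains no yield point, so a fiber making it runs start to finish and nothing else in that thread gets a turn.

```ruby
log = []

busy = Fiber.new do
  log << :request_started
  sleep 0.05          # a blocking call: no Fiber.yield happens here
  log << :request_finished
end

busy.resume           # runs to completion; control returns only at the end
log << :other_work    # anything else had to wait out the full sleep

log # => [:request_started, :request_finished, :other_work]
```

Substitute a blocking `pg` query or `Net::HTTP` call for that `sleep` and you have exactly the behavior the benchmarks above showed.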

In short: in order for Falcon to achieve its concurrency, you need to use libraries that are made to be ‘async aware’ of Falcon’s async reactor.
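A toy sketch of the reactor idea (this is not Falcon's actual implementation, just the shape of it): "async aware" code yields at its IO points, which lets even a trivial loop interleave many in-flight requests.

```ruby
# Three "requests" that each yield at their (simulated) IO point.
trace = []

requests = (1..3).map do |i|
  Fiber.new do
    trace << "req#{i} started"
    Fiber.yield               # an async-aware library would yield here
    trace << "req#{i} resumed after IO"
  end
end

# A trivial round-robin "reactor": run each fiber to its IO point,
# then resume each one once its IO has "completed".
requests.each(&:resume)
requests.each(&:resume)

trace.first(3) # => ["req1 started", "req2 started", "req3 started"]
```

All three requests get started before any of them finishes - the concurrency Falcon promises, but only when the libraries underneath actually yield.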

Async Aware

Fortunately, the author of Falcon has also created some async libraries for common things like Postgres and HTTP. Let’s use those to see how that improves concurrency!

Slow SQL

All we need to do is use the async-postgres gem in place of our pg gem - no other code changes:

# Gemfile
# gem 'pg'
gem 'async-postgres'

And the results?

$ siege -c 50 -r1 'http://localhost/slow_sql/1'
Transactions:              50 hits
Availability:          100.00 %
Elapsed time:            1.07 secs
Transaction rate:       46.73 trans/sec

That’s more like it! All 50 requests were being served concurrently.

Remote IO

After adding async-http, our remote_io endpoint now looks like:

class AverageController < ApplicationController
  def remote_io
    endpoint = Async::HTTP::URLEndpoint.parse("http://localhost:8080")
    client = Async::HTTP::Client.new(endpoint)
    client.get("/#{sleep_time}")
  end
end

The results:

$ siege -c 50 -r1 'http://localhost/remote_io/1000'
Transactions:              50 hits
Availability:          100.00 %
Elapsed time:            1.08 secs
Transaction rate:       46.30 trans/sec

Awesome, right!?

So if we just replace some libraries with async aware ones, we should get at least the same concurrency as Puma with the same number of threads, if not better, right?

Well, Not Quite

So far we’ve tested the endpoints that have async aware libraries that play nice with Falcon. What happens when we throw in an endpoint that does work that is not Falcon async-friendly?

For this test we'll hit the async endpoints as we did before, but we'll also hit the non-async cworkwithout_gvl endpoint at the same time, with a 10 second duration and only 5 requests (the same as the total number of Falcon threads):

$ siege -c 5 -r1 'http://localhost/cworkwithout_gvl/10000'
Transactions:               5 hits
Availability:          100.00 %
Elapsed time:           10.02 secs
Transaction rate:        0.50 trans/sec

Ok, no surprise there. What about the async endpoints that should only take 1 second?

$ siege -c 50 -r1 'http://localhost/remote_io/1000'
Transactions:              50 hits
Availability:          100.00 %
Elapsed time:           10.35 secs
Transaction rate:        4.83 trans/sec

Uh oh. Our 5 requests that triggered non-async work ended up blocking all of our async endpoints for 10 seconds!

Falcon Fibers vs Puma Threads

Puma is the default web server for Rails 5. Puma is a threaded web server, meaning each Puma process usually has multiple threads to handle requests.

One big difference between threads and fibers is that threads are preemptible by the Ruby VM. The VM can suspend a running thread at any time and run another, switching based on which threads are waiting on IO and on time slices (so each thread gets a fair share of execution when more than one wants to run at the same time).

Fibers are not preemptible by the Ruby VM. Fibers must coordinate among themselves about which fibers should run and when. Since your application's code will not have fiber yield points, the switching occurs at the boundaries of Falcon's async libraries. In our case (and almost all others), that means only during network IO, file IO, and SQL queries (a form of network IO itself).

So What’s the Lesson, What’s the Takeaway?

The biggest lesson here is that when a request is accepted by Falcon, it is immediately handled in a new fiber within an existing thread. All fibers within that thread can end up blocking each other if they do any meaningful work that is not completely async aware.

Unfortunately Falcon is not magic and likely will not provide better concurrency or performance without substantial code changes in your app - and even then you are likely to encounter unhappy surprises. Unless you really understand the trade-offs and implications of Falcon's design, instead of flying a Falcon you may end up battling a Dragon. Chances are you're better off sticking with a web server like Puma.

Bonus Questions

  • Want to read about how Ruby might improve its concurrency performance in the future?

  • How does Falcon limit the number of Fibers it serves at one time?

  • Would Puma with five threads and one worker also block in the ‘Well, Not Quite’ scenario?

  • If Puma was configured with enough threads to handle all concurrent connections in these same scenarios, would it perform better/worse/the same as Falcon?

    • What’s the overhead difference in CPU/Memory vs Falcon?
  • Does Falcon’s async reactor remind you of something you’ve seen before?

  • How do Thread local variables behave in Fibers? Are they also Fiber local?

  • Do the chances of having deadlocks or race conditions increase when using Fibers vs Threads?
