Elixir foundations for Ruby Devs: transforming data

This is a guest post by Tomasz Kowal, a software developer currently working full time with Elixir at ClubCollect. He started with Erlang 6 years ago and is still amazed by the power functional languages provide. In his free time he likes tinkering with flying robots.

Have you ever reached for a drink of water, then realized, half-way through your first sip, it was Sprite? I like Sprite, but that first sip of a similar-looking, but very different liquid is a shock.

That's how Elixir can feel if you're coming from Ruby. The syntax looks similar, but Elixir is different from the first sip. For a smooth transition, you need to (a) learn some functional programming patterns and (b) unlearn some Object-Oriented habits.

A core design pattern of Elixir is the focus on data transformations: you'll see it in libraries like Ecto.Changeset, Ecto.Multi, Plug.Conn and built-ins like Enum. Let's dive in.

Data Transformations

The data transformation pattern uses one data structure (DS) that is your single source of truth and many small functions operating on it. It makes your programs:

  1. Easy to compose (which translates to "easy to write")
  2. Easy to extend (customize)
  3. Easy to test.

Almost every time, the recipe looks the same:

  1. Choose a data structure
  2. Write many functions that take the data structure as first argument and return that data structure
  3. Make sure most of those functions don't have side effects.

The examples will start simple. Let's begin with lists and the Enum module.

Simple data transformers with Enum

You'll often see code like this:

require Integer  # Integer.is_even/1 is a guard macro, so it must be required first

sorted_even =
  list
  |> Enum.filter(&Integer.is_even(&1))
  |> Enum.sort()

You can chain many operations from the Enum module using the pipe operator |>. This works because functions like filter and sort take a list as their first argument and return a new, modified list.
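
The pipe operator is plain syntactic rewriting: a |> f(b) becomes f(a, b). The chain above is therefore equivalent to the nested version below:

# the same pipeline without |>, reading inside-out instead of top-to-bottom
sorted_even = Enum.sort(Enum.filter(list, &Integer.is_even(&1)))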

If you need other operations that you would like to compose in the same way, just write a function that takes a list as its first argument and returns a new list. Let's encapsulate those three lines in a function of their own:

# assumes a `require Integer` at the top of the enclosing module
def sorted_even(list) do
  list
  |> Enum.filter(&Integer.is_even(&1))
  |> Enum.sort()
end

Without any modifications, we captured the logic and put it into a function that we can later use in a chain like this:

def two_smallest_even(list) do
  list
  |> sorted_even()
  |> Enum.take(2)
end
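
A quick sanity check in iex (the input list is arbitrary):

iex> two_smallest_even([5, 4, 3, 2, 1])
[2, 4]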

This may look obvious, but more complicated libraries use the same pattern.

The three observations are:

  1. Transformations on lists are easy to compose, because a set of transformations is also a transformation (composability).
  2. You can write functions that do crazy things as long as they return a list at the end (extensibility).
  3. As long as your functions don't introduce side effects, testing is easy, as the sketch below shows.
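
A minimal ExUnit sketch, assuming the two functions above live in a hypothetical MyList module:

defmodule MyListTest do
  use ExUnit.Case, async: true

  test "sorted_even keeps only even numbers, in order" do
    assert MyList.sorted_even([3, 2, 5, 4]) == [2, 4]
  end

  test "two_smallest_even returns the two smallest even numbers" do
    assert MyList.two_smallest_even([10, 7, 2, 8]) == [2, 8]
  end
end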

Ecto Transformations

Let's look at the Ecto library and how it solves data validation. Validators need to check whether changes to given data are valid and, if not, indicate why. Let's call our single data structure a "changeset". Our validators should be small functions that take a changeset as their first argument and return a changeset.

What do we need to store?

There are fields in our DS that we can treat as input, like data and params; fields that are intermediate, like changes; and fields that are clearly output: valid? and errors. We combine all of this into a single DS, and now we can use it like this:

user
|> cast(params, [:name, :email, :age])
|> validate_required([:name, :email])
|> validate_format(:email, ~r/@/)

We take the original data (a user from the database), the params we want to apply, and a list of fields that are allowed to change. The cast function returns a changeset, and from that point on, every function in the chain takes a changeset as its first argument and returns a changeset.
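
If you inspect the changeset that comes out of such a pipeline, all three groups of fields are visible in one struct. The values below are illustrative, and the exact inspect output depends on your Ecto version:

#Ecto.Changeset<
  changes: %{email: "not-an-email", name: "Jane"},
  errors: [email: {"has invalid format", [validation: :format]}],
  data: #MyApp.User<>,
  valid?: false
>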

Writing your own Ecto validator

Ecto has a nice set of ready-to-use validators that you can easily compose, but what if you wanted to write your own? Something non-standard, like making sure that an event in the database doesn't finish before it starts. We just need to write a function that takes a changeset and returns a changeset, like this:

# assumes `import Ecto.Changeset` for get_field/2 and add_error/3;
# `end` is a reserved word in Elixir, so the parameters get different names
def validate_interval(changeset, start_field, end_field) do
  start_date = get_field(changeset, start_field)
  end_date = get_field(changeset, end_field)

  case Date.compare(start_date, end_date) do
    :gt -> add_error(changeset, start_field, "…")
    _otherwise -> changeset
  end
end

We extract the starting and ending dates from the changeset and compare them; if everything is OK, we return the changeset unmodified. If there were errors from previous validations, we don't care: we just pass them along the pipe chain. If the start date is greater than the end date, add_error sets the valid? indicator to false and prepends the error to the list of errors.
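
Because it follows the changeset-in, changeset-out contract, the new validator drops straight into a pipeline (the field names here are hypothetical):

event
|> cast(params, [:starts_on, :ends_on])
|> validate_required([:starts_on, :ends_on])
|> validate_interval(:starts_on, :ends_on)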

Validators can also be composed, just like the list transformations. Let's say we have a set of validators that are always used together. For example, we would like to build an address validator from street and zipcode validators:

def validate_address(cs, street, zip) do
  cs
  |> validate_street(street)
  |> validate_zipcode(zip)
end

A set of validators applied one by one is also a validator, so validators are easy to compose. They are also easy to test, because in the end you pass in a DS and check fields on the resulting DS in your tests.
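
For example, here is a minimal ExUnit sketch; the MyApp.Address schema and the assumption that validate_street rejects blank values are hypothetical:

defmodule AddressValidatorTest do
  use ExUnit.Case, async: true
  import Ecto.Changeset

  test "rejects a blank street" do
    changeset =
      %MyApp.Address{}
      |> cast(%{"street" => "", "zipcode" => "12-345"}, [:street, :zipcode])
      |> MyApp.Address.validate_address(:street, :zipcode)

    refute changeset.valid?
    assert Keyword.has_key?(changeset.errors, :street)
  end
end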

Ecto.Changeset is also used when calling the database:

case Repo.insert(changeset) do
  {:error, changeset} -> ...
  {:ok, model} -> ...
end

Database constraints are converted to changeset errors. This follows the principle of separating pure and impure parts of your program.
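
Constraint-backed checks follow the same changeset-in, changeset-out shape as the pure validators. For example, assuming a unique index on email exists in a migration:

def registration_changeset(user, params) do
  user
  |> cast(params, [:name, :email, :age])
  |> validate_required([:name, :email])
  |> unique_constraint(:email)
end

If Repo.insert then violates the underlying unique index, the violation comes back as {:error, changeset} with an error on :email instead of an exception.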

Ecto.Multi

A third example of the same "single data structure" pattern is Ecto.Multi. A Multi stores database operations that can later be fired in one transaction:

multi =
  Multi.new()
  |> Multi.update(:account, a_changeset)
  |> Multi.insert(:log, log_changeset)
  |> Multi.delete_all(:sessions, assoc(account, :sessions))

multi |> Repo.transaction()

It differs from a changeset in that the main DS is opaque: the internals may change, and you can't use them directly. It is similar to a changeset in that all operations take a Multi as their first argument and return a Multi, which makes them easy to extend and compose, just like validators. It also means you can end up with pretty big Multis, with branching logic and nested operations, that you will want to test to make sure everything works as expected.

Instead of making actual queries to the database, Multi gives you a to_list function that lists all the operations accumulated in the Multi. This way, you can test your application logic using pure data structures instead of hitting the database, which makes tests easier and faster.
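
A sketch of such a test, assuming a hypothetical Accounts.close_account_multi/1 that builds the Multi shown above:

test "closing an account queues the expected operations" do
  multi = Accounts.close_account_multi(account)

  # no database involved: we only inspect the queued operations by name
  names =
    multi
    |> Ecto.Multi.to_list()
    |> Keyword.keys()

  assert names == [:account, :log, :sessions]
end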

Phoenix

The fourth example is the king of them all, the essence of the Phoenix Framework: the almighty Plug. Let's apply the same principles shown above, this time to web servers. We need a single DS that holds all the information about a web request. It has to contain:

  1. Everything that comes in with the request: the host, method, path, headers, and so on.
  2. Everything we need to return at the end: the status code, response headers, and response body.
  3. Some intermediate things that come in handy during the request lifecycle, like params and assigns.

Params are intermediate because they arrive either in the GET query string or in the POST body and need to be normalized into an Elixir map first. Let's call this DS a Conn. When a request arrives at the web server, it is immediately translated into a Conn. This is similar to how cast works for changesets. After that, there are many small functions called plugs that (you guessed it) take a Conn as their first argument and return a Conn.

A set of plugs chained together is called a pipeline. The entire Phoenix framework is a pipeline like this:

Conn |> Endpoint |> UserPipelines |> Router |> Controller

A pipeline is also a plug, so the entire Phoenix Framework is just a plug. The beauty comes with its extensibility. You can put your own custom plugs in almost any place in the request lifecycle. It allows library creators to add new functionality to Phoenix by simply writing a couple of functions with instructions on where the developer needs to... you know... plug them.
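
Besides the function plugs used in this post, Plug also defines a two-callback module contract: init/1 prepares options, and call/2 does the Conn-in, Conn-out transformation. A minimal sketch of a hypothetical header-setting plug:

defmodule MyApp.PoweredByPlug do
  @behaviour Plug
  import Plug.Conn

  # init/1 runs once; whatever it returns is passed to call/2
  def init(opts), do: Keyword.get(opts, :name, "Phoenix")

  # call/2 is the usual Conn-in, Conn-out transformation
  def call(conn, name) do
    put_resp_header(conn, "x-powered-by", name)
  end
end

In a Phoenix app, you could then write plug MyApp.PoweredByPlug, name: "MyApp" in an endpoint or router pipeline.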

It is even better if you can keep your plugs pure. Let's say that in your plug you want to add something from the database to Conn.assigns. You could do it like this:

# assumes `import Plug.Conn` for get_session/2 and assign/3
def my_plug(conn) do
  user_id = get_session(conn, :user_id)
  user = Repo.get(User, user_id)
  assign(conn, :current_user, user)
end

...and this would be hard to test, because it calls the database each time it is invoked. There is a simple workaround for that: pass the impure thing in as an argument!

def my_plug(conn, repo \\ Repo) do
  user_id = get_session(conn, :user_id)
  user = repo.get(User, user_id)
  assign(conn, :current_user, user)
end

We pass the module name as the second argument, with a default value of Repo. This ensures that if we call this plug with a single argument, it behaves exactly like the one above. We can use the second argument in our tests like this:

defmodule FakeRepo do
  def get(User, 1) do
    %User{name: "Tomasz"}  # other fields elided
  end
end

my_plug(conn, FakeRepo)
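
Putting it together, a hedged test sketch using Plug.Test (which ships with Plug) to build the conn; it assumes my_plug/2 from above is imported into the test module:

defmodule MyPlugTest do
  use ExUnit.Case, async: true
  import Plug.Test

  test "assigns the current user from the session" do
    conn =
      conn(:get, "/")
      |> init_test_session(user_id: 1)
      |> my_plug(FakeRepo)

    assert %User{name: "Tomasz"} = conn.assigns.current_user
  end
end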

We made the plug testable by making all its "contracts" with the outside world explicit. This nicely separates the pure and impure parts of the plug.

Summary

We can see the same data transformation pattern repeated across many different libraries. It is convenient to use when the language offers easy chaining with a pipe operator or something similar. As Alan Perlis wrote in Epigrams on Programming:

It is better to have 100 functions operate on one data structure than 10 functions on 10 data structures.

This is reflected in the Unix philosophy, where everything is a stream of lines and you build larger programs by composing programs with the Unix pipe |. It is also reflected in many Elixir libraries, which use a single DS and compose programs using the Elixir pipe operator |>.