
Choosing an Erlang formatter

Sebastian Weddmark Olsson

There are many different Erlang formatters, and as a hack day project I investigated which options exist for us at wgtwo. There are two main alternatives, but sadly both have problems. I've also briefly looked at an Erlang linter.

Background

At wgtwo we use a bunch of different programming languages, and we all have different backgrounds and are used to different languages and environments. We are pretty autonomous, and we are expected to jump in and out of different services to fix bugs and add features.

We like when the code is uniform, because it makes it easier to focus on the business logic. That is why we want to use tools to make sure our code is consistent no matter who is the author, or which IDE is used, or in which part of the system the code resides.

About half a year ago there was a discussion about code style within wgtwo that resulted in formatting tools being applied for Kotlin, Bazel, Go and Java. It also resulted in an internal wiki page containing guidelines about code style.

That document highlights some of the problems with mixing different code styles. It should be easy for newcomers to maintain the coding style. It should also be easy to read diffs, and discussions about code style and formatting are minimized when there is a consensus.

This is nice, I want it for Erlang

As some of our services are written in Erlang, I wanted to investigate which formatters exist for Erlang, and what state they are in. I used our last hack day for this purpose.

The requirements I had were that the output should be reproducible: calling the formatter multiple times should not change the code after the first run. The formatter should also preferably work with rebar3 (the most used Erlang build tool). The tool should not depend on external tooling that doesn't fit every developer's workflow.
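The first requirement is idempotence: formatting already-formatted code must be a no-op. A minimal sketch of that check, assuming the formatter under evaluation can be wrapped as a fun from source text to formatted text (the module name fmt_check and the Format fun are hypothetical and not part of any of the tools below):

```erlang
-module(fmt_check).
-export([is_idempotent/2]).

%% Format is a hypothetical stand-in for whichever formatter is being
%% evaluated: any fun that maps source text to formatted text.
-spec is_idempotent(fun((binary()) -> binary()), binary()) -> boolean().
is_idempotent(Format, Source) ->
    Once = Format(Source),
    Twice = Format(Once),
    %% A reproducible formatter changes nothing on the second pass.
    Once =:= Twice.
```

In practice the same idea can be applied to a whole repository by running the formatter twice and checking that the second run leaves the working tree unchanged.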

I also wanted it to have a short execution time, at least after the initial formatting.

Benchmark

I searched for formatting tools on hex.pm, github.com, duckduckgo.com and google.com, and came up with the following arbitrary list of Erlang formatters. There are probably others, but these seem to be the most used.

  • rebar3_fmt
  • steamroller
  • otp/erl_tidy
  • tsloughter/erl_tidy
  • rebar3_format
  • eryngii

rebar3_fmt

One big problem with this one is that it uses the Emacs erlang-mode for formatting. Sure, I am an Emacs user, and erlang-mode and its formatting are maintained and supported by OTP, but my non-Emacs coworkers would not be happy if they needed to install Emacs just to format the code.

steamroller

Though it was easy to set up (just add it to dependencies in your Rebar config and run rebar3 steamroll), my first impression was that it was really slow. Even on subsequent runs on my Dell XPS 13 P82G it took around 3.5 minutes to format.

The plugin had some support for increasing the number of workers from the default --J=1, but that did not seem to help with the execution time.

The default steamroller formatting options specify 2 spaces instead of the erlang-mode 4 spaces standard that is used in our code base.
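For reference, a rebar.config fragment along these lines should load the plugin and override the indent. This is only a sketch: registering steamroller under plugins and the option names indent and line_length are assumptions based on the steamroller README, and the values are illustrative, so check them against the version you install.

```erlang
%% rebar.config (sketch; option names assumed from the steamroller README)
{plugins, [steamroller]}.

{steamroller, [
    {line_length, 100},  % illustrative value
    {indent, 4}          % match the erlang-mode 4-space convention
]}.
```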

Here is a sample of a complex record structure

-                   components =
- [{invoke,
- #'Invoke'{
- invokeID = 1,linkedID = asn1_NOVALUE,
- operationCode = updateLocation,
- parameter =
- #'UpdateLocationArg'{
- imsi = IMSI,
- 'msc-Number' = CallingGTBCD,
- 'vlr-Number' = CallingGTBCD}}}]},
+ components =
+ [
+ {
+ invoke,
+ #'Invoke'{
+ invokeID = 1,
+ linkedID = asn1_NOVALUE,
+ operationCode = updateLocation,
+ parameter =
+ #'UpdateLocationArg'{
+ imsi = IMSI,
+ 'msc-Number' = CallingGTBCD,
+ 'vlr-Number' = CallingGTBCD
+ }
+ }
+ }
+ ]
+ },

I was quite happy with the results, even though they were slow, until I saw how it treated maps

-      parameters =
- #{called_party_addr =>
- #sccp_addr{
- ... },
- calling_party_addr =>
- #sccp_addr{
- ... },
- data =>
- #'Continue'{
+ parameters =
+ #{
+ called_party_addr
+ =>
+ #sccp_addr{
+ ...
+ },
+ calling_party_addr
+ =>
+ #sccp_addr{
+ ...
+ },
+ data
+ =>
+ #'Continue'{

I can't say that I can easily see which are the keys and which are the values with this formatting. It burns my eyes.

otp/erl_tidy and tsloughter/erl_tidy

So I found two erl_tidy projects; one is included in the Erlang/OTP libraries. The other seems to be just a rebar3 wrapper around the first, so I'll only talk about the former.

Under the hood this library uses erl_prettypr:format/2, which pretty-prints the abstract syntax tree. This should work well, but it produces odd indentation. For instance, with records it does not add a newline before the first field, so lines become quite long, and when a line gets close to the paper width it inserts too many newlines.
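To look at the paper-width behaviour in isolation, erl_prettypr can be called directly on a single parsed form. A minimal sketch (the module name pp_demo is hypothetical; paper is the erl_prettypr option for the maximum line width, and the input must not contain macros):

```erlang
-module(pp_demo).
-export([format_form/2]).

%% Parse one form (for example a function definition ending in ".")
%% and pretty-print it with the given paper width.
-spec format_form(string(), pos_integer()) -> string().
format_form(Source, PaperWidth) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(Source),
    {ok, Form} = erl_parse:parse_form(Tokens),
    lists:flatten(erl_prettypr:format(Form, [{paper, PaperWidth}])).
```

Calling pp_demo:format_form/2 on the same record-heavy function with different widths makes it easy to compare where the line breaks end up.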

Visualising with this example again

-        components =
- [{invoke,
- #'Invoke'{
- invokeID = 1,linkedID = asn1_NOVALUE,
- operationCode = updateLocation,
- parameter =
- #'UpdateLocationArg'{
- imsi = IMSI,
- 'msc-Number' = CallingGTBCD,
- 'vlr-Number' = CallingGTBCD}}}]},
+ components =
+ [{invoke,
+ #'Invoke'{invokeID
+ =
+ 1,
+ linkedID
+ =
+ asn1_NOVALUE,
+ operationCode
+ =
+ updateLocation,
+ parameter
+ =
+ #'UpdateLocationArg'{imsi
+ =
+ IMSI,
+ 'msc-Number'
+ =
+ CallingGTBCD,
+ 'vlr-Number'
+ =
+ CallingGTBCD}}}]},

There are also some issues with erl_prettypr; it throws an exception when a macro is used with an argument list.

-define(MACRO(), object).
foo(?MACRO()) ->
    ok.

** exception exit: no_translation
     in function  io:put_chars/3
        called as io:put_chars(<0.4843.0>,unicode,
                               [...])
     in call from erl_tidy:output/4 (erl_tidy.erl, line 431)
     in call from erl_tidy:write_module/3 (erl_tidy.erl, line 413)
     in call from erl_tidy:file_2/2 (erl_tidy.erl, line 335)
     in call from erl_tidy:file_1/3 (erl_tidy.erl, line 310)

rebar3_format

I had an issue when installing this plugin: it was not as easy as adding rebar3_format to plugins in the rebar3 config. The plugin depends on inaka/katana_code, which for some reason did not get pulled in properly and was missing some vital files. The issue could be resolved by deleting the user rebar3 cache (rm -rf ~/.cache/rebar3/), as explained in this issue.

After installation you need to specify where the source files to format can be found. This would probably not be needed if we did not use an Erlang umbrella project (an umbrella project is one where subapplications reside inside your main application). Here I found out that the command line option --files apps/**/{src,include}/*.?rl is apparently not the same as specifying {format, [{files, ["apps/**/{src,include}/*.?rl"]}]} in the config. The command line option finds only one file, while the config parameter works as expected.
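Put together, the working setup looked roughly like the following rebar.config fragment. It is a sketch that combines the plugin entry and the files pattern mentioned above; the layout under apps/ is specific to our umbrella project.

```erlang
%% rebar.config for an umbrella project (sketch)
{plugins, [rebar3_format]}.

{format, [
    %% Matches .erl and .hrl files in every subapplication.
    {files, ["apps/**/{src,include}/*.?rl"]}
]}.
```

With this in place, running rebar3 format picks up the file list from the config, so no --files argument is needed.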

Formatting-wise it is similar to erl_tidy. This is because it uses inaka's katana_code, which in turn uses erl_tidy.

-           components =
- [{invoke,
- #'Invoke'{
- invokeID = 1,linkedID = asn1_NOVALUE,
- operationCode = updateLocation,
- parameter =
- #'UpdateLocationArg'{
- imsi = IMSI,
- 'msc-Number' = CallingGTBCD,
- 'vlr-Number' = CallingGTBCD}}}]},
+ components =
+ [{invoke,
+ #'Invoke'{invokeID = 1,
+ linkedID =
+ asn1_NOVALUE,
+ operationCode =
+ updateLocation,
+ parameter =
+ #'UpdateLocationArg'{imsi
+ =
+ IMSI,
+ 'msc-Number'
+ =
+ CallingGTBCD,
+ 'vlr-Number'
+ =
+ CallingGTBCD}}}]},

The problem is that both erl_tidy and katana_code have multiple issues with macros. It is hard to parse and format code that includes macros without first preprocessing them.
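The underlying difficulty can be shown with the standard scanner and parser alone: a macro call is not valid Erlang syntax until the preprocessor has expanded it. A minimal sketch (the module name macro_demo is hypothetical):

```erlang
-module(macro_demo).
-export([parses/1]).

%% Returns true if the standard parser accepts Source as a single form.
-spec parses(string()) -> boolean().
parses(Source) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(Source),
    case erl_parse:parse_form(Tokens) of
        {ok, _Form} -> true;
        %% Unexpanded macros leave a '?' token the grammar rejects.
        {error, _Reason} -> false
    end.
```

Here macro_demo:parses("foo(X) -> ok.") succeeds, while macro_demo:parses("foo(?MACRO()) -> ok.") does not. A formatter therefore either has to expand macros, which changes the code it is supposed to preserve, or use a more forgiving parser such as epp_dodger from syntax_tools.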

elvis

This is a bonus; it is not a formatter but a linter.

One difference between formatters and linters is that formatters change the code into a uniform format, while linters warn or fail when rules are broken. Linters can also check other things, such as nesting level.

This article by Brujo Benavides describes it pretty well:

Are formatters better than linters?

To set it up, you need to configure a ruleset and save it in a special Elvis config file in the repo. This config specifies which linting rules to apply to which files.
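A minimal elvis.config could look like the sketch below. The directory glob is a hypothetical umbrella layout, and erl_files is assumed to be the name of the default ruleset for Erlang source files that ships with Elvis; both should be adapted to the repository and Elvis version in use.

```erlang
%% elvis.config (sketch; paths are hypothetical)
[{elvis, [
    {config, [
        #{dirs => ["apps/*/src"],   % where to look for files
          filter => "*.erl",        % which files to lint
          ruleset => erl_files}     % which rules to apply
    ]}
]}].
```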

For me it took 8-9 minutes to lint our code base with the example ruleset proposed by the tool.

Summary

Sadly I couldn't find any good alternative that fits our purposes. Each one has issues with either macros or execution time.

I have to put this on the shelf again for a while, with just a dream of uniform code.

Edit: 2020-10-01

I forgot to update this blog post, but something amazing happened.

After I wrote this blog post I contacted the maintainer of steamroller and told him about the slowness I experienced, and some other difficulties. One day later he had found and fixed an algorithm, taking it from O(n^2) to O(n) complexity. When I retried it on our code base, things went from minutes to seconds!

He also removed and improved some of the ambiguous configuration parameters.

I haven't yet started to look at the map formatting issues I had, but maybe that is improved as well. Anyway I'll have to leave that for another hackday.