blog

Implementing Haskell's $ in OCaml

# April 17, 2012

Haskell provides a convenient function application operator, $, that can improve code readability and concision. This post discusses the problems that arise in trying to implement an identical operator in OCaml.

Why do we need an application operator?

In most situations, we don’t. Having said that, $ can be useful – its right-associativity sometimes obviates the need for parentheses which can result in cleaner code. As the Haskell documentation states:

This operator is redundant, since ordinary application (f x) means the same as (f $ x). However, $ has low, right-associative binding precedence, so it sometimes allows parentheses to be omitted.

It turns out that implementing $ in OCaml is an interesting undertaking in itself.

The shape of things to come

When we’re done, the following code should compile and print “7” upon execution:

By definition of $, res [1; 2; 3] is equivalent to the expression string_of_int (succ (sum [1; 2; 3])).

Naive approach

As a first pass we could try implementing $ as follows:

# let ($) f x = f x;;
val ( $ ) : ('a -> 'b) -> 'a -> 'b = <fun>

Testing on a simple case suggests that this implementation is a suitable analogue of Haskell’s operator:

# let sum xs = List.fold_left (+) 0 xs;;
val sum : int list -> int = <fun>
# succ $ sum [1; 2; 3];;
- : int = 7

However, problems become apparent when we attempt to chain together more than two functions. As an example, let’s try out the body of res:

# string_of_int $ succ $ sum [1; 2; 3];;
Error: This expression has type int -> int
       but an expression was expected of type int

The reported error indicates that string_of_int was applied to a function from int -> int rather than an int as intended: our operator is left-associative. string_of_int is being passed the succ function rather than the result of applying succ to the sum of [1; 2; 3].

Perhaps this is unsurprising given that we didn’t specify associativity anywhere. Here’s the interesting bit: the associativity of our operator is a direct consequence of its name. To clarify, we’ll need to consult the operator associativity table in the language manual. The relevant row is repeated below:

  • Construction or operator
    • =... <... >... |... &... $...
  • Associativity: left

In other words: any operator identifier beginning with $ is left associative. To get the right-associativity, a right-associative prefix must be used:

  • Construction or operator
    • @... ^...
    • **... lsl lsr asr
  • Associativity: right

So to fix our earlier implementation we simply need to rename $; here are two possible implementations that behave correctly:

Whichever we choose, the toplevel expression now evaluates as anticipated:

# string_of_int **$ succ **$ sum [1; 2; 3];;
- : string = "7"
# string_of_int @$ succ @$ sum [1; 2; 3];;
- : string = "7"

As a final note: there’s nothing special about non-prefix $ when used as part of an operator name – it has been retained for the sake of consistency. If you’ve used OCaml Batteries Included, you might find the first operator familiar – in BatPervasives it’s named **>; the leading ** is necessary for the reasons outlined above.

Enter Camlp4

In the previous section we implemented $ but were forced – due to grammatical constraints – to settle for slightly cryptic operator names. Can we do any better? After all, the goal of this post is to compile the snippet provided in the opening section. So far it can only be approximated as follows:

We need a way of telling OCaml that $ is a right-associative operator in spite of its name. If that sentence makes you wonder whether this is even possible given the grammatical constraints outlined above, you’re on the right track – it’s not. Instead, we’ll need to write a syntax extension with the help of Camlp4 that allows us to (in a sense) “violate” the operator naming rules. More accurately: to extend the grammar such that $ is defined as a right-associative operator.

Camlp4whatNow?

To quote the documentation,

Camlp4 is a Pre-Processor-Pretty-Printer for OCaml. It offers syntactic tools (parsers, extensible grammars), the ability to extend the concrete syntax of OCaml (quotations, syntax extensions), and to redefine it from scratch.

Of relevance here is the ability to extend OCaml’s (concrete) syntax.

Writing the extension

While Camlp4 can be somewhat esoteric (the latest version isn’t officially documented), our requirements are simple enough that a simple 4 line module will do the trick.

The above code extends the default OCaml grammar Gram by adding a new rule under the expr entry. This rule

  • extends the expr (expression) entry by inserting the rule at the supplied precedence level – before application;
  • is named applOp (an arbitrarily chosen name);
  • is right-associative (RIGHTA);
  • rewrites f $ x as f x for all expressions (exprs) f, x.

To make use of this extension it must first be compiled – here with the help of ocamlfind,

$ ocamlfind ocamlc -linkpkg -syntax camlp4o \
    -package camlp4.extend -package camlp4.quotations -c appop.ml

and passed to the -pp switch of ocamlc during batch compilation of dollar.ml,

$ ocamlc -pp "camlp4o ./appop.cmo" dollar.ml -o dollar
$ ./dollar
7

To see what’s going on under the hood, we can pre-process dollar.ml and output the (source) result to stdout,

$ camlp4o ./appop.cmo dollar.ml
let sum = List.fold_left ( + ) 0

let res xs = string_of_int (succ (sum xs))

let () = print_endline (res [ 1; 2; 3 ])

Finally, the extension can also be used interactively from the toplevel,

# #use "topfind";;
- : unit = ()
Findlib has been successfully loaded. Additional directives:
  #require "package";;      to load a package
  #list;;                   to list the available packages
  #camlp4o;;                to load camlp4 (standard syntax)
  #camlp4r;;                to load camlp4 (revised syntax)
  #predicates "p,q,...";;   to set these predicates
  Topfind.reset();;         to force that packages will be reloaded
  #thread;;                 to enable threads

- : unit = ()

# #camlp4o;;
/opt/godi/lib/ocaml/std-lib/dynlink.cma: loaded
/opt/godi/lib/ocaml/std-lib/camlp4: added to search path
/opt/godi/lib/ocaml/std-lib/camlp4/camlp4o.cma: loaded
    Camlp4 Parsing version 3.12.1

# #load "appop.cmo";;

# let sum = List.fold_left (+) 0;;
val sum : int list -> int = <fun>

# string_of_int $ succ $ sum [1; 2; 3];;
- : string = "7"

A health warning

This post is simply meant to introduce the relationship between associativity and operator naming–I don’t meant to suggest that the above syntax extension is the “best” solution. On the contrary, accomodating OCaml’s naming conventions is appealing for two reasons:

  1. Simpler implementation;
  2. Satisfies the grammatical expectations of its clients.

Having said that, it’s both useful and reassuring to know that should we need to “violate” these conventions, Camlp4 is there to help.

Update

A commenter on the original post (@ManInTheMiddle) pointed out that

Considering the non-Camlp4 approach, it seems to me your solution misses the point that the $ operator should also mix well with function composition (Haskell . operator). This operator is left-associative and should be of higher-precedence than function application. So if you choose the **... prefix for $ then you will not find any such operator for composition.Hence I suggest using only @... or ^... for application. Then you can choose *..., /... or %... for composition.

I agree – the above glosses over the relationship between precedence and operator naming. Interestingly, the Batteries project uses (-|) for composition and (**>) for application (see here); the precedence you’d expect coming from Haskell is inverted. We’d need to define a new application operator to address this problem as the commenter suggested:

yielding,

# print_endline -| string_of_int **> succ **> sum [1; 2; 3];;
Error: This expression has type string but an expression was expected of type 'a -> string
# print_endline -| string_of_int ^^ succ ^^ sum [1; 2; 3];;
7

Along similar lines, an improvement to the syntax extension can also be made,

thereby licensing:

# print_endline -| string_of_int $ succ $ sum [1; 2; 3];;
7
- : unit = ()

comments powered by Disqus