# December 17, 2016

Tom Dickey has just released vile 9.8s; this release includes the following changes:

```
20161217 (s)
> Brendan O'Dea:
+ add command-line parsing for "--" token, assumed by visudo in the
  1.8.12 - 1.8.16 changes (report by Wayne Cuddy).
> Tom Dickey:
+ recompute majormode order when "after", "before" or "qualifiers" is
  modified for a majormode.
+ add yamlmode (discussion with Steve Lembark)
+ modify DSTRING definition in lex-filter to handle continuation lines.
+ modify cfgmode to reduce false-matches with random ".cfg" files.
+ improve ps syntax filter
  + interpret %%BeginData / %%EndData keywords
  + interpret %%BeginPreview / %%EndPreview keywords
+ add ".mcrl2" as suffix for mcrlmode.
+ fixes from test-script: conf, hs, nr, rc, rcs, txt, xq, xml
+ improved regression test-script to check for places where the syntax
  filter might have mixed buffered- and unbuffered-calls in the same
  state, causing tokens in the markup to "move".
+ remove a statement from flt_putc in the standalone filters that
  converted a bare ^A to ^A?.
+ remove escaping from digraphs.rc, since change in 9.7zg made that
  both unnecessary and incorrect (reports by Marc Simpson, Brendan
  O'Dea).
+ improve tcl syntax filter
  + color backslash-escapes in double-quotes.
  + add rules to handle regexp and regsub regular expressions. This
    does not yet handle -regexp switch cases.
  + add call to flt_bfr_error to flag unbalanced quotes here and in
    a few other filters.
  + modify newline patterns to allow for cr/lf endings in continuations
  + add special case for literals like "{\1}" and "{\\1}".
  + add special case for html entities such as "{{}" and "{&foo;}"
+ improve sh syntax filter
  + allow quoted strings within '${' parameter, a detail that can
    happen with ksh brace groups (report by j. van den hoff).
  + handle ksh's "ANSI C quotes", i.e., "$'xxx'" using single quotes
    after a dollar sign.
  + use the ksh ("-K") option for bashmode and zshmode syntax.
  + interpret "$name" within '${' parameter
  + don't warn for inline-here documents
  + handle special case where matching tag for a here-document is on
    the same line as a closing ")" in $(xxx) command.
  + highlight ksh's "[[", "((", "$((" bracketing like "{".
  + handle ksh's "((" and "$((" arithmetic expressions.
  + handle ksh's base#value numbers
+ improve perl syntax highlighter:
  + fix state used to guess where a pattern might occur, e.g., after
    an "if" keyword with no preceding operator to account for line
    breaks.
  + correct a check for illegal numbers, which flagged hexadecimal
    numbers containing "e".
  + distinguish special case of "format =" vs "format =>".
  + allow pod to begin without a preceding blank line, but warn.
  + allow for case where pod mode is turned on/off with only one blank
    line between the directives.
  + check for simple patterns that may follow operators such as "map".
  + allow '$', '+' or '&' as a quote or substitution delimiter
  + allow angle brackets for quotes after 'q', etc.
  + fix highlighting when square-brackets are used as delimiters in a
    perl substitution, e.g., s[foo[bar]xxx][yyy]
+ quiet some unnecessary compiler warnings with glibc > 2.20 by adding
  _DEFAULT_SOURCE as needed.
+ improve version-comparison for "new" flex to allow for 2.6.0, and
  accept that for built-in filters. Also modify filters/mk-2nd.awk
  to work with "new" flex ifdef's to ignore yywrap (Debian #832973).
+ correct long-name for filename-ic mode (report Marc Simpson).
```

See here for further information.


# July 28, 2016

Tom Dickey has just released vile 9.8r; you can browse the changes here.

This release offers significant improvements and bug fixes; if you’re a vile user, I recommend upgrading.


# May 13, 2016

*or* “Wait, what? Tail call optimisation in awk?”

This post covers tail call optimisation (TCO) behaviour in three common awk implementations: gawk, mawk and nawk (*AKA* the one true awk).

None of the three implements full TCO, while `gawk` alone provides self-TCO. The bulk of this post will therefore be devoted to gawk’s implementation and related pitfalls.

Let’s begin with a simple awk script that defines a single function, `recur`, called from the `BEGIN` block:

```
$ nawk 'function recur() {return recur()} BEGIN {recur()}'
Segmentation fault: 11
$ mawk 'function recur() {return recur()} BEGIN {recur()}'
Segmentation fault: 11
$ gawk 'function recur() {return recur()} BEGIN {recur()}'
# ...runs indefinitely [?]...
```

Note the difference in behaviour here: nawk and mawk blow the stack and segfault while gawk cheerily continues running. Thanks gawk.

But wait! Gawk is actually dynamically allocating additional stack frames: so long as there’s memory (and swap) to consume, gawk will gobble it up and our script will plod on. Below, the first 30 seconds of (virtual) memory consumption are charted:

Whoops!

In order to obtain (self-)TCO and spare your poor swap partition, gawk provides the `-O` switch,

```
$ gawk -O 'function recur() {return recur()} BEGIN {recur()}'
# ...runs indefinitely; air conditioning no longer required...
```

and lo and behold,

What about full TCO? Let’s expand our one-liner a little to include a trampoline call:

`$ gawk -O 'function go() {return to()} function to() {return go()} BEGIN {go()}'`

and chart memory consumption again,

Bugger. So, it looks like gawk isn’t keen on full blown TCO. Time to find out why.

We’ve just seen that gawk seems to optimise self-calls in tail position when the `-O` flag is specified. To better understand this functionality we can dump opcodes from the trampoline case and take a look under the hood:

```
$ echo 'function go() {return to()} function to() {return go()} BEGIN {go()}' > /tmp/trampoline.awk
$ gawk --debug -O -f /tmp/trampoline.awk
gawk> dump
# BEGIN
[ 1:0x7fc00bd022e0] Op_rule : [in_rule = BEGIN] [source_file = /tmp/trampoline.awk]
[ 1:0x7fc00bd02360] Op_func_call : [func_name = go] [arg_count = 0]
[ :0x7fc00c800f60] Op_pop :
[ :0x7fc00c800e20] Op_no_op :
[ :0x7fc00c800ea0] Op_atexit :
[ :0x7fc00c800f80] Op_stop :
[ :0x7fc00c800e60] Op_no_op :
[ :0x7fc00bd01e00] Op_after_beginfile :
[ :0x7fc00c800e40] Op_no_op :
[ :0x7fc00c800e80] Op_after_endfile :
# Function: go ()
[ 1:0x7fc00bd01f20] Op_func : [param_cnt = 0] [source_file = /tmp/trampoline.awk]
[ 1:0x7fc00bd020a0] Op_func_call : [func_name = to] [arg_count = 0]
[ 1:0x7fc00bd01fb0] Op_K_return :
[ :0x7fc00c800ee0] Op_push_i : Nnull_string [MALLOC|STRING|STRCUR|NUMCUR|NUMBER]
[ :0x7fc00c800f00] Op_K_return :
# Function: to ()
[ 1:0x7fc00bd02130] Op_func : [param_cnt = 0] [source_file = /tmp/trampoline.awk]
[ 1:0x7fc00bd02270] Op_func_call : [func_name = go] [arg_count = 0]
[ 1:0x7fc00bd021f0] Op_K_return :
[ :0x7fc00c800f20] Op_push_i : Nnull_string [MALLOC|STRING|STRCUR|NUMCUR|NUMBER]
[ :0x7fc00c800f40] Op_K_return :
```

Note the lack of a distinct *jump* or *tailcall* opcode; instead, even with the optimiser turned on, `go` and `to` are performing `Op_func_call`s. Hmm, okay; we’ll see a different opcode in our original `recur` case, though, right? Wrong:

```
$ echo 'function recur() {return recur()} BEGIN {recur()}' > /tmp/recur.awk
$ gawk --debug -O -f /tmp/recur.awk
gawk> dump
# BEGIN
[ 1:0x7fc1d0408ef0] Op_rule : [in_rule = BEGIN] [source_file = /tmp/recur.awk]
[ 1:0x7fc1d0408f80] Op_func_call : [func_name = recur] [arg_count = 0]
[ :0x7fc1d0802120] Op_pop :
[ :0x7fc1d0802020] Op_no_op :
[ :0x7fc1d08020a0] Op_atexit :
[ :0x7fc1d0802140] Op_stop :
[ :0x7fc1d0802060] Op_no_op :
[ :0x7fc1d0408bc0] Op_after_beginfile :
[ :0x7fc1d0802040] Op_no_op :
[ :0x7fc1d0802080] Op_after_endfile :
# Function: recur ()
[ 1:0x7fc1d0408ce0] Op_func : [param_cnt = 0] [source_file = /tmp/recur.awk]
[ 1:0x7fc1d0408e60] Op_func_call : [func_name = recur] [arg_count = 0]
[ 1:0x7fc1d0408d70] Op_K_return :
[ :0x7fc1d08020e0] Op_push_i : Nnull_string [MALLOC|STRING|STRCUR|NUMCUR|NUMBER]
[ :0x7fc1d0802100] Op_K_return :
```

`¯\_(ツ)_/¯`

Time to dig around gawk’s grammar definition. Here’s `return`, defined in `awkgram.y`:

```
| LEX_RETURN
  {
    if (! in_function)
      yyerror(_("`return' used outside function context"));
  } opt_exp statement_term {
    if ($3 == NULL) {
      $$ = list_create($1);
      (void) list_prepend($$, instruction(Op_push_i));
      $$->nexti->memory = dupnode(Nnull_string);
    } else {
      if (do_optimize
          && $3->lasti->opcode == Op_func_call
          && strcmp($3->lasti->func_name, in_function) == 0
      ) {
        /* Do tail recursion optimization. Tail
         * call without a return value is recognized
         * in mk_function().
         */
        ($3->lasti + 1)->tail_call = true;
      }
      $$ = list_append($3, $1);
    }
    $$ = add_pending_comment($$);
  }
```

Take a closer look at the code following that comment:

```
if (do_optimize
    && $3->lasti->opcode == Op_func_call
    && strcmp($3->lasti->func_name, in_function) == 0
) { /* ... */
  ($3->lasti + 1)->tail_call = true; /* <--- */
}
```

In other words, during a `return` gawk:

- Checks whether the `do_optimize` flag (`-O`) is specified.
- If so, it checks whether the previous instruction is an `Op_func_call`.
- If that call is to a function with the same name as the current one,
- …the `tail_call` flag is set.
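The check can be modelled in a few lines. What follows is a toy model written in OCaml (the language used for examples elsewhere on this blog), not gawk’s own code: the `opcode` type and the `marks_tail_call` function are hypothetical names introduced purely for illustration.

```ocaml
(* Toy model of the check in awkgram.y. The type and function names are
   hypothetical; they are not gawk's data structures. *)
type opcode = Op_func_call of string | Op_other

(* True only when optimisation is on AND the last instruction of the
   returned expression is a call to the function currently being defined. *)
let marks_tail_call ~do_optimize ~in_function last_op =
  match last_op with
  | Op_func_call callee -> do_optimize && callee = in_function
  | Op_other -> false

(* recur() returning recur(): flagged when -O is given *)
let self_call = marks_tail_call ~do_optimize:true ~in_function:"recur" (Op_func_call "recur")

(* go() returning to(): never flagged, since the names differ *)
let mutual_call = marks_tail_call ~do_optimize:true ~in_function:"go" (Op_func_call "to")
```

Under this model `self_call` is `true` and `mutual_call` is `false`, which matches the behaviour observed for the `recur` and `go`/`to` scripts.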

So it goes.

Here’re a few takeaways from the above:

- Don’t rely on TCO if you’re writing awk.
- Just don’t.
- If you *do* need TCO, make sure you’re using gawk.
  - Be sure to specify the `-O` flag, otherwise you’ll need to buy a new fan,
  - and make sure you’re not trampolining as the optimiser won’t be of any help.

Personally, I’ll be sticking with nawk.


# February 24, 2013

A while back, I discussed an implementation of the application operator (Haskell’s `$`) in OCaml. In the closing section of that post, a couple of problems were raised regarding treatment of associativity and composition in *OCaml Batteries*. These issues have been addressed in *Batteries* 2.0, released in January 2013; the improvements are outlined here.

Batteries 1.x defines the following operators for composition and application:

- `val ( -| ) : ('a -> 'b) -> ('c -> 'a) -> 'c -> 'b`
  - “Function composition. `f -| g` is `fun x -> f (g x)`. Mathematically, this is operator `o`.”
- `val ( **> ) : ('a -> 'b) -> 'a -> 'b`
  - “Function application. `f **> x` is equivalent to `f x`. This [operator] may be useful for composing sequences of function calls without too many parenthesis.”

The problem pointed out in the comments section (now the closing *Update*) of the post was that

> the precedence you’d expect coming from Haskell is inverted. We’d need to define a new application operator to address this problem as the commenter suggested…

Specifically, the following code sample was shown to exhibit surprising behaviour for anyone familiar with Haskell’s `.` and `$`,

```
# print_endline -| string_of_int **> succ **> sum [1; 2; 3];;
Error: This expression has type string but an expression was expected of type 'a -> string
```

In Batteries 2.0, both operators have been renamed:

- Composition: `(-|)` is now `(%)`
- Application: `(**>)` is now `(@@)`

To appreciate the behaviour of the new operators, we can once again consult (a subsection of) the operator associativity table from the language manual:

| Construction or operator | Associativity |
|---|---|
| `*...` `/...` `%...` `mod` `land` `lor` `lxor` | left |
| `+...` `-...` | left |
| `::` | right |
| `@...` `^...` | right |
| `=...` `<...` `>...` `\|...` `&...` `$...` | left |

A couple of things are worth noting:

- Both application operators are right associative: `(**>)` falls under the `**...` row of the full table (which sits above the rows reproduced here), `(@@)` under the `@...` row.
  - As discussed in the original post, right associativity is an integral feature of the application operator.
- In Batteries 1.x, application had higher precedence than composition: the `**...` row precedes every row shown above, while `(-|)` is covered by the second.
  - As of Batteries 2.0, this precedence has been inverted: `(%)` is covered by the first row, `(@@)` by the fourth.

Reworking the above code sample for Batteries 2.0 is trivial: simply substitute `(%)` for `(-|)` and `(@@)` for `(**>)`. With these changes in place, the code behaves as follows:

```
# print_endline % string_of_int @@ succ @@ sum [1; 2; 3];;
7
- : unit = ()
```

Crucially, the application operator has lower precedence than the composition operator and is right associative.
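The grouping can be made explicit with a small sketch. It assumes a local definition of `(%)` that mirrors the Batteries one (so that Batteries itself isn’t required) and relies on `(@@)` from the standard library, which ships with OCaml 4.01 and later; `sum` is defined as in the original post.

```ocaml
(* Local stand-in for Batteries' composition operator: an assumption made
   to keep the sketch free of external dependencies. *)
let ( % ) f g = fun x -> f (g x)

let sum = List.fold_left (+) 0

(* As written: (%) binds tighter than (@@), and (@@) groups to the right. *)
let implicit = string_of_int % succ @@ succ @@ sum [1; 2; 3]

(* The same expression with every grouping spelled out. *)
let explicit = (string_of_int % succ) (succ (sum [1; 2; 3]))
```

Both bindings evaluate to the same string, since the parser reads the first form exactly as the second is parenthesised.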

Returning to the definition of `$` referenced at the outset of the original post,

> This operator is redundant, since ordinary application `(f x)` means the same as `(f $ x)`. However, `$` has low, right-associative binding precedence, so it sometimes allows parentheses to be omitted.

it should be apparent that these new operators closely conform to Haskell’s treatment of application and composition (in particular, associativity and precedence), allowing for a straightforward translation of the above expression:

```
Prelude> putStrLn . show $ succ $ sum [1, 2, 3]
7
```


# February 17, 2013

I’ve migrated this blog from Posterous to Hakyll following the (long anticipated) announcement that Posterous will be shut down in April. Unfortunately, comments for old posts have been lost.

Hopefully folks like the new layout; mobile stylesheet coming soon.


# April 22, 2012

In the previous post I introduced the `$` operator to OCaml using two different approaches:

- Renaming the operator to `**$` or `@$` in order to achieve the necessary associativity.
- Leveraging Camlp4 to provide a syntax extension.

In the comments section[1], variants of these operators were provided that mirror Haskell’s relative precedence of application and composition.

As a postscript, I thought it might be interesting to look at the implementation of `$` in Standard ML. Here it is in the SML/NJ toplevel,

```
Standard ML of New Jersey v110.74 [built: Wed Apr 11 13:33:07 2012]
- infixr 0 $;
infixr $
- fun (f $ x) = f x;
val $ = fn : ('a -> 'b) * 'a -> 'b
```

Of note:

- Standard ML lets us specify the associativity of newly defined operators explicitly (using the `infix*` fixity directives) whereas OCaml follows an operator naming convention.
- As such, we have no need to fall back on syntax extensions here; `$` is a valid name for a right associative operator.

To replicate the target example of the previous post we’ll need to define a few utilities,

```
- fun succ x = x + 1;
val succ = fn : int -> int
- val sum = foldl op+ 0;
val sum = fn : int list -> int
- fun printLn str = print $ str ^ "\n";
val printLn = fn : string -> unit
```

Note that `printLn` is defined using `$`; the standard approach would be `fun printLn str = print (str ^ "\n")`.

With these definitions in place, we can employ `$` to print the desired result:

```
- printLn $ Int.toString $ succ $ sum [1, 2, 3];
7
val it = () : unit
```

Finally, since `$` was defined with a precedence level of `0`, it interacts correctly with SML’s composition operator, `o`, which has a precedence of `3` (as per the Standard):

```
- printLn o Int.toString $ succ $ sum [1, 2, 3];
7
val it = () : unit
```

[1] See the closing *Update* of the previous post.


# April 17, 2012

Haskell provides a convenient function application operator, `$`, that can improve code readability and concision. This post discusses the problems that arise in trying to implement an identical operator in OCaml.

In most situations, we don’t need such an operator. Having said that, `$` can be useful – its right-associativity sometimes obviates the need for parentheses, which can result in cleaner code. As the Haskell documentation states:

> This operator is redundant, since ordinary application `(f x)` means the same as `(f $ x)`. However, `$` has low, right-associative binding precedence, so it sometimes allows parentheses to be omitted.

It turns out that implementing `$` in OCaml is an interesting undertaking in itself.

When we’re done, the following code should compile and print “7” upon execution:

```
let sum = List.fold_left (+) 0
let res xs = string_of_int $ succ $ sum xs
let () = print_endline $ res [1; 2; 3]
```

By definition of `$`, `res [1; 2; 3]` is equivalent to the expression `string_of_int (succ (sum [1; 2; 3]))`.

As a first pass we could try implementing `$` as follows:

```
# let ($) f x = f x;;
val ( $ ) : ('a -> 'b) -> 'a -> 'b = <fun>
```

Testing on a simple case suggests that this implementation is a suitable analogue of Haskell’s operator:

```
# let sum xs = List.fold_left (+) 0 xs;;
val sum : int list -> int = <fun>
# succ $ sum [1; 2; 3];;
- : int = 7
```

However, problems become apparent when we attempt to chain together more than two functions. As an example, let’s try out the body of `res`:

```
# string_of_int $ succ $ sum [1; 2; 3];;
Error: This expression has type int -> int
but an expression was expected of type int
```

The reported error indicates that `string_of_int` was applied to a function of type `int -> int` rather than an `int` as intended: our operator is *left-associative*. `string_of_int` is being passed the `succ` function rather than the result of applying `succ` to the `sum` of `[1; 2; 3]`.
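The grouping chosen by the parser can be spelled out with explicit parentheses. A minimal sketch, reusing the first-pass definition of `$` and the `sum` helper from above:

```ocaml
let ( $ ) f x = f x
let sum xs = List.fold_left (+) 0 xs

(* Left associativity means the chained expression is read as
     (string_of_int $ succ) $ sum [1; 2; 3]
   which fails to type-check. Grouping to the right by hand gives the
   intended reading: *)
let res = string_of_int $ (succ $ sum [1; 2; 3])

let () = print_endline res  (* prints 7 *)
```

The hand-written parentheses are exactly what a right-associative operator would supply for free.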

Perhaps this is unsurprising given that we didn’t specify associativity anywhere. Here’s the interesting bit: the associativity of our operator is a direct consequence of its *name*. To clarify, we’ll need to consult the operator associativity table in the language manual. The relevant row is repeated below:

- Construction or operator: `=... <... >... |... &... $...`
- Associativity: left

In other words: any operator identifier beginning with `$` is left associative. To get right-associativity, a right-associative prefix must be used:

- Construction or operator: `@... ^...` and `**... lsl lsr asr`
- Associativity: *right*

So to fix our earlier implementation we simply need to rename `$`; here are two possible implementations that behave correctly:
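A minimal sketch of two such definitions, assuming they keep the body of the first-pass `$` and change only the name (the operator names `**$` and `@$` are the ones used in the session that follows):

```ocaml
(* Assumed reconstruction: same body as the first-pass ($), renamed so that
   the leading characters place the operator in a right-associative class. *)
let ( **$ ) f x = f x   (* "**..." prefix: right associative *)
let ( @$ ) f x = f x    (* "@..." prefix: right associative *)
```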

Whichever we choose, the toplevel expression now evaluates as anticipated:

```
# string_of_int **$ succ **$ sum [1; 2; 3];;
- : string = "7"
# string_of_int @$ succ @$ sum [1; 2; 3];;
- : string = "7"
```

As a final note: there’s nothing special about non-prefix `$` when used as part of an operator name – it has been retained for the sake of consistency. If you’ve used *OCaml Batteries Included*, you might find the first operator familiar – in BatPervasives it’s named `**>`; the leading `**` is necessary for the reasons outlined above.

In the previous section we implemented `$` but were forced – due to grammatical constraints – to settle for slightly cryptic operator names. Can we do any better? After all, the goal of this post is to compile the snippet provided in the opening section. So far it can only be approximated as follows:

```
let sum = List.fold_left (+) 0
let res xs = string_of_int @$ succ @$ sum xs
let () = print_endline @$ res [1; 2; 3]
```

We need a way of telling OCaml that `$` is a right-associative operator *in spite of its name*. If that sentence makes you wonder whether this is even possible given the grammatical constraints outlined above, you’re on the right track – it’s not. Instead, we’ll need to write a *syntax extension* with the help of *Camlp4* that allows us to (in a sense) “violate” the operator naming rules. More accurately: to extend the grammar such that `$` is defined as a right-associative operator.

To quote the documentation,

Camlp4 is a Pre-Processor-Pretty-Printer for OCaml. It offers syntactic tools (parsers, extensible grammars), the ability to extend the concrete syntax of OCaml (quotations, syntax extensions), and to redefine it from scratch.

Of relevance here is the ability to extend OCaml’s (concrete) syntax.

While Camlp4 can be somewhat esoteric (the latest version isn’t officially documented), our requirements are simple enough that a short module will do the trick.

```
open Camlp4.PreCast.Syntax
EXTEND Gram
expr: BEFORE "apply"
[ "applOp" RIGHTA [ f = expr; "$"; x = expr -> <:expr<$f$ $x$>> ]];
END
```

The above code extends the default OCaml grammar `Gram` by adding a new rule under the `expr` entry. This rule

- extends the `expr` (expression) entry by inserting the rule at the supplied precedence level – before application;
- is named *applOp* (an arbitrarily chosen name);
- is right-associative (`RIGHTA`);
- rewrites `f $ x` as `f x` for all expressions (`expr`s) `f`, `x`.

To make use of this extension it must first be compiled – here with the help of `ocamlfind`,

```
$ ocamlfind ocamlc -linkpkg -syntax camlp4o \
-package camlp4.extend -package camlp4.quotations -c appop.ml
```

and passed to the `-pp` switch of `ocamlc` during batch compilation of `dollar.ml`,

```
$ ocamlc -pp "camlp4o ./appop.cmo" dollar.ml -o dollar
$ ./dollar
7
```

To see what’s going on under the hood, we can pre-process `dollar.ml` and output the (source) result to `stdout`,

```
$ camlp4o ./appop.cmo dollar.ml
let sum = List.fold_left ( + ) 0
let res xs = string_of_int (succ (sum xs))
let () = print_endline (res [ 1; 2; 3 ])
```

Finally, the extension can also be used interactively from the toplevel,

```
# #use "topfind";;
- : unit = ()
Findlib has been successfully loaded. Additional directives:
#require "package";; to load a package
#list;; to list the available packages
#camlp4o;; to load camlp4 (standard syntax)
#camlp4r;; to load camlp4 (revised syntax)
#predicates "p,q,...";; to set these predicates
Topfind.reset();; to force that packages will be reloaded
#thread;; to enable threads
- : unit = ()
# #camlp4o;;
/opt/godi/lib/ocaml/std-lib/dynlink.cma: loaded
/opt/godi/lib/ocaml/std-lib/camlp4: added to search path
/opt/godi/lib/ocaml/std-lib/camlp4/camlp4o.cma: loaded
Camlp4 Parsing version 3.12.1
# #load "appop.cmo";;
# let sum = List.fold_left (+) 0;;
val sum : int list -> int = <fun>
# string_of_int $ succ $ sum [1; 2; 3];;
- : string = "7"
```

This post is simply meant to introduce the relationship between associativity and operator naming – I don’t mean to suggest that the above syntax extension is the “best” solution. On the contrary, accommodating OCaml’s naming conventions is appealing for two reasons:

- Simpler implementation;
- Satisfies the grammatical expectations of its clients.

Having said that, it’s both useful and reassuring to know that should we need to “violate” these conventions, Camlp4 is there to help.

A commenter on the original post (`@ManInTheMiddle`) pointed out that

> Considering the non-Camlp4 approach, it seems to me your solution misses the point that the `$` operator should also mix well with function composition (Haskell `.` operator). This operator is left-associative and should be of higher-precedence than function application. So if you choose the `**...` prefix for `$` then you will not find any such operator for composition.
>
> Hence I suggest using only `@...` or `^...` for application. Then you can choose `*...`, `/...` or `%...` for composition.

I agree – the above glosses over the relationship between precedence and operator naming. Interestingly, the Batteries project uses `(-|)` for composition and `(**>)` for application (see here); the precedence you’d expect coming from Haskell is inverted. We’d need to define a new application operator to address this problem as the commenter suggested:
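A minimal sketch of such an operator, assuming the name `^^` used in the session below. The composition operator `(-|)` is defined locally here, mirroring the Batteries 1.x documentation, so that the snippet does not depend on Batteries.

```ocaml
(* Local stand-in for the Batteries 1.x composition operator. *)
let ( -| ) f g = fun x -> f (g x)

(* Assumed definition of the new application operator: the "^" prefix makes
   it right associative and lower in precedence than (-|). *)
let ( ^^ ) f x = f x
```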

yielding,

```
# print_endline -| string_of_int **> succ **> sum [1; 2; 3];;
Error: This expression has type string but an expression was expected of type 'a -> string
# print_endline -| string_of_int ^^ succ ^^ sum [1; 2; 3];;
7
```

Along similar lines, an improvement to the syntax extension can also be made,

```
open Camlp4.PreCast.Syntax
EXTEND Gram
expr: BEFORE "^"
[ "applOp" RIGHTA [ f = expr; "$"; x = expr -> <:expr<$f$ $x$>> ]];
END
```

thereby licensing:

```
# print_endline -| string_of_int $ succ $ sum [1; 2; 3];;
7
- : unit = ()
```


# March 26, 2012

In the previous post we looked at the *Functor* structure; now we turn to *Applicative Functor*. One important enrichment is that applicatives allow for the application of functions that are in some context `t` to values in the same context.

Once more, a couple of definitions to get things rolling,

- Given a function in context, `val f: ('a -> 'b) t`, *Applicative Functor* provides an operation, `<*>`, such that `f` can be applied to values in like context: `val (<*>) : ('a -> 'b) t -> 'a t -> 'b t`.
- Given a value of type `'a`, the *Applicative Functor* function `pure` will lift it into the appropriate (“default” or pure) context: `val pure: 'a -> 'a t`.

Something that I glossed over last time around is the set of laws associated with the structures under discussion. A module can implement a signature validly while violating these laws. For *Functor*, the laws are as follows:

- Mapping the identity function over a value in context `t` should have no effect – i.e., `fmap id = id`.
  - For example, `OptionFunctor.fmap (fun x -> x) (Some 5) = Some 5` and `EitherStringFunctor.fmap (fun x -> x) (Left "Oops") = Left "Oops"`.
- Mapping the composition of functions `g` and `h` over a value in context `t` should be equivalent to mapping function `g` over this value, then mapping function `h` over the output (i.e., the composition of mapping `g` and mapping `h`).
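Both laws can be spot-checked for `option`. The sketch below is stand-alone: `fmap` is a local copy of the option mapping, and `id`, `g` and `h` are hypothetical sample functions chosen for illustration. A handful of cases is of course evidence, not proof.

```ocaml
(* Stand-alone option mapping, equivalent in behaviour to OptionFunctor.fmap. *)
let fmap f = function
  | Some a -> Some (f a)
  | None -> None

(* Hypothetical sample functions. *)
let id x = x
let g x = x + 1
let h x = x * 2

(* Identity law: fmap id = id *)
let identity_holds =
  fmap id (Some 5) = Some 5 && fmap id (None : int option) = None

(* Composition law: mapping "g then h" in one pass equals mapping g, then h. *)
let composition_holds =
  fmap (fun x -> h (g x)) (Some 5) = fmap h (fmap g (Some 5))
```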

The salient law for *Applicative Functor* is addressed below.

Let’s begin by providing the signature for an *Applicative Functor*, defined in terms of `Functor`:

```
module type Functor = sig
  type 'a t
  val fmap: ('a -> 'b) -> 'a t -> 'b t
end

module type ApplicativeFunctor = sig
  include Functor
  val pure: 'a -> 'a t
  val (<*>): ('a -> 'b) t -> 'a t -> 'b t
  val (<$>): ('a -> 'b) -> 'a t -> 'b t
end
```

In addition to the signatures mentioned in the introduction, we also define a function (operator) named `<$>` with a signature identical to `fmap`. Strictly speaking, this should be declared in the `Functor` signature… since we’ll only be using it with applicatives, though, the above factoring will suffice.

As before, we’ll start out by implementing something simple – an *Applicative Functor* module for `option`s. To save some space, only incremental additions will be supplied inline. Here’re our two implementations alongside a simple demo,

```
module OptionFunctor : Functor
  with type 'a t = 'a option = struct
  type 'a t = 'a option
  let fmap f = function
      Some a -> Some (f a)
    | _ -> None
end

module OptionApplicativeF : ApplicativeFunctor
  with type 'a t = 'a option = struct
  include OptionFunctor
  let pure x = Some x
  let (<*>) f a = match (f, a) with
      (Some f, Some a) -> Some (f a)
    | _ -> None
  let (<$>) = fmap
end

module Demo1(A: ApplicativeFunctor) = struct
  include A
  let double x = x * 2
  let eg1 x = double <$> x
  let eg2 x y = pure (+) <*> x <*> y
  let eg3 x y = (+) <$> x <*> y
end

module OptionDemo1 = Demo1(OptionApplicativeF)
```

The inheritance relationship indicated in the module signatures is mirrored in the implementations – `OptionApplicativeF` is defined in terms of `OptionFunctor`: `fmap` is borrowed; `pure`, `<*>` and `<$>` are defined explicitly.

Without further ado, the examples in action:

```
# OptionDemo1.eg1 (Some 5);;
- : int OptionDemo1.t = Some 10
# OptionDemo1.eg1 None;;
- : int OptionDemo1.t = None
# OptionDemo1.eg2 (Some 5) (Some 10);;
- : int OptionDemo1.t = Some 15
# OptionDemo1.eg2 (Some 5) None;;
- : int OptionDemo1.t = None
# OptionDemo1.eg2 None (Some 10);;
- : int OptionDemo1.t = None
# OptionDemo1.eg2 None None;;
- : int OptionDemo1.t = None
# OptionDemo1.eg3 (Some 5) (Some 10);;
- : int OptionDemo1.t = Some 15
# OptionDemo1.eg3 (Some 5) None;;
- : int OptionDemo1.t = None
# OptionDemo1.eg3 None (Some 10);;
- : int OptionDemo1.t = None
# OptionDemo1.eg3 None None;;
- : int OptionDemo1.t = None
```

Notice how `Demo1.eg2` and `Demo1.eg3` are functionally identical in spite of the variation in their definitions. Where `eg2` raises `+` into context `t` with `pure` before applying `<*>`, `eg3` simply `fmap`s via the convenience operator `<$>`. This is essentially syntactic sugar, allowing for a pipelined style. Here we’ve hit on the salient law for *Applicative Functor*:

`fmap g x = pure g <*> x`

or using `<$>`:

`g <$> x = pure g <*> x`
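The law can be spot-checked for `option` with a stand-alone sketch. The definitions below are local copies of the operations in `OptionApplicativeF`; `double` plays the role of `g`, and `law_holds` is a hypothetical helper name.

```ocaml
(* Stand-alone copies of the option operations from OptionApplicativeF. *)
let fmap f = function Some a -> Some (f a) | None -> None
let pure x = Some x
let ( <*> ) f a = match (f, a) with
  | (Some f, Some a) -> Some (f a)
  | _ -> None
let ( <$> ) = fmap

let double x = x * 2

(* g <$> x = pure g <*> x, with g = double *)
let law_holds x = (double <$> x) = (pure double <*> x)

let some_case = law_holds (Some 5)
let none_case = law_holds None
```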

Before we leave `OptionApplicativeF`, consider the following case in which the function itself is absent (i.e., `None`) in a `<*>` chain:

```
# None <*> (Some 1) <*> (Some 2);;
- : '_a t = None
```

This follows from the definition of `<*>`: unless both the function and its argument are present (`Some _`), the result of `<*>` is `None`.

To make things more interesting, we’ll re-introduce `either` and provide an `ApplicativeFunctor` implementation:

```
type ('a, 'b) either = Left of 'a | Right of 'b

module type Typed = sig type t end

module EitherFunctor(T: Typed) : Functor
  with type 'a t = (T.t, 'a) either = struct
  type 'a t = (T.t, 'a) either
  let fmap f = function
      Right r -> Right (f r)
    | Left l -> Left l
end

module EitherApplicativeF(T: Typed) : ApplicativeFunctor
  with type 'a t = (T.t, 'a) either = struct
  include EitherFunctor(T)
  let pure x = Right x
  let (<*>) f a = match (f, a) with
      (Right f, Right a) -> Right (f a)
    | (Left err, _) -> Left err
    | (_, Left err) -> Left err
  let (<$>) = fmap
end

(* Demo1 as before *)
module EitherDemo1 = Demo1(EitherApplicativeF(String))
```

As with `OptionApplicativeF`, the only success condition accounted for by `<*>` is where both the function and its value are valid – in the `option` case, `Some _` and in the `either` case, `Right _`. If the first clause of `match` can be unified, `f` is applied to `a` and wrapped with the `Right` constructor as an indication of success. In the other two cases, the error (typed `T.t`) is propagated.

The examples in `EitherDemo1` should yield few surprises,

```
# EitherDemo1.eg1 (Right 5);;
- : int EitherDemo1.t = Right 10
# EitherDemo1.eg3 (Right 5) (Right 10);;
- : int EitherDemo1.t = Right 15
# EitherDemo1.eg3 (Right 5) (Left "Oops");;
- : int EitherDemo1.t = Left "Oops"
# EitherDemo1.eg3 (Left "Um") (Left "Oops");;
- : int EitherDemo1.t = Left "Um"
# EitherDemo1.eg3 (Left "Um") (Right 15);;
- : int EitherDemo1.t = Left "Um"
```

The demonstration functions defined in `Demo1` share a common semantic purpose: generalising a given function `f` so that it can apply in a context `t`. This is known as *lifting*.

By way of example, consider the `double` function: this has the signature `val double : int -> int`; in `Demo1.eg1` we wish to *lift* `double` into context `t` and double whatever value is embedded in this context – in the case of `OptionDemo1` the context is `option`, in `EitherDemo1`, `either`.

Since we’ve recognised and named this process, all that remains is to implement it. To start with, here’s a module signature that declares three lift operations:

```
module type Lift = sig
  type 'a t
  val lift: ('a -> 'b) -> 'a t -> 'b t
  val lift2: ('a -> 'b -> 'c) -> 'a t -> 'b t -> 'c t
  val lift3: ('a -> 'b -> 'c -> 'd) -> 'a t -> 'b t -> 'c t -> 'd t
end
```

where `lift` takes a function from `'a -> 'b` and a single input in context `t`, mapping to `'b t`. `lift2` takes a “two-argument” function (glossing over currying) and performs the same generalisation over a context; similarly for `lift3`.

We can use an OCaml functor to provide a generic implementation of lifting by leveraging `<$>` and `<*>`:

```
module ApplicativeLift(A: ApplicativeFunctor) : Lift
  with type 'a t := 'a A.t = struct
  include A
  let lift f a = f <$> a
  let lift2 f a b = f <$> a <*> b
  let lift3 f a b c = f <$> a <*> b <*> c
end
```

Finally, let’s compose the signatures of `Lift` and `ApplicativeFunctor` to produce a new signature, `Applicative`,

```
(* Composition of ApplicativeFunctor and Lift *)
module type Applicative = sig
  include Lift
  include ApplicativeFunctor with type 'a t := 'a t
end
```

Note that this use of destructive substitution (`with type 'a t := 'a t`) requires OCaml 3.12 or later – see the documentation for more information.
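A smaller sketch of the same technique may help. The signatures `Shape`, `Named` and `NamedShape` and the module `Square` are hypothetical names invented for this illustration. Both base signatures declare a type `t`, so including them side by side would define `t` twice; the `:=` substitution removes the second declaration and rewrites its uses to the first.

```ocaml
module type Shape = sig
  type t
  val area : t -> float
end

module type Named = sig
  type t
  val name : t -> string
end

(* `with type t := t` drops Named's own `t` and replaces it with the
   `t` already brought in by Shape. *)
module type NamedShape = sig
  include Shape
  include Named with type t := t
end

module Square : NamedShape with type t = float = struct
  type t = float
  let area side = side *. side
  let name _ = "square"
end
```

Note the two operators doing different jobs: `:=` erases a type from a signature, whereas the `=` in `NamedShape with type t = float` keeps the type and exposes its definition.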

With the above in place, we can now define a couple of enriched applicative modules (with less unseemly names),

```
module OptionApplicative : Applicative
  with type 'a t = 'a option = struct
  include OptionApplicativeF
  include ApplicativeLift(OptionApplicativeF)
end

module EitherApplicative(T: Typed) : Applicative
  with type 'a t = (T.t, 'a) either = struct
  module EA = EitherApplicativeF(T)
  include EA
  include ApplicativeLift(EA)
end
```

And expand on our earlier set of demos,

```
module Demo2(A: Applicative) = struct
  include A
  let double x = x * 2
  let eg1 x = lift double x
  let eg2 x y = lift2 (+) x y
  let eg3 x = double <$> x
  let eg4 x y = (+) <$> x <*> y
  let eg5 f x y = lift2 f x y
end

module OptionDemo2 = Demo2(OptionApplicative)
module EitherStringDemo2 = Demo2(EitherApplicative(String))
module EitherIntDemo2 = Demo2(EitherApplicative(struct type t = int end))
```

This composed structure proves to be very useful in practice – as a sketch, here’s an example of lifting a function into `either` context:

```
# let foo x y = (x * y) - (x + y);;
val foo : int -> int -> int = <fun>
# EitherStringDemo2.eg5 foo (Left "Err") (Right 10);;
- : int EitherStringDemo2.t = Left "Err"
# EitherStringDemo2.eg5 foo (Right 15) (Right 10);;
- : int EitherStringDemo2.t = Right 125
```

As before, I recommend reading the Haskell Typeclassopedia entry on *Applicative Functors*. This in turn links to some excellent resources on the subject and addresses the relationship between Applicatives and Monads.

A full code example can be found here.


# March 26, 2012

This post explores the implementation of *Functor* structures (not to be confused with OCaml’s module-level functors) in OCaml.

Let’s begin with some definitions,

- A *Functor* is (somewhat informally): a structure that provides a *mapping operation* that applies to a *value* in a given *context*, preserving that context.
- Put differently: given a function `val f: 'a -> 'b`, a *Functor* allows us to apply `f` to values in some context `t` such that the mapping operation is of type `'a t -> 'b t`.
- This mapping operation is typically named `fmap` and has the type signature `val fmap: ('a -> 'b) -> 'a t -> 'b t`.

The signature of `fmap` may be familiar: when `type 'a t = 'a list`, the `fmap` function is simply `List.map`, as the toplevel confirms: `# List.map;; - : ('a -> 'b) -> 'a list -> 'b list = <fun>`
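As a quick check of that correspondence, the following sketch simply binds `List.map` to the name `fmap` at the list-specialised type. The alias and the sample values are illustrative only.

```ocaml
(* With 'a t = 'a list, fmap is List.map under another name. *)
let fmap : ('a -> 'b) -> 'a list -> 'b list = List.map

(* The list context (length and order) is preserved; only the
   elements change. *)
let lengths = fmap String.length ["a"; "bc"; "def"]
let empty = fmap String.length []
```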

The `option` type seems as good a place to start as any. Let’s state the problem clearly: given some function `f` of type `'a -> 'b`, how can we apply this `f` to optional values of type `'a option` and maintain the optional context? Here’s a preliminary definition of `OptionFunctor` and some demo code where `f` is `abs`:

```
module OptionFunctor = struct
  let fmap f = function
      Some a -> Some (f a)
    | _ -> None
end

module Demo1 = struct
  open OptionFunctor
  let eg1 = fmap abs (Some 5)
  let eg2 = fmap abs (Some (-5))
  let eg3 = fmap abs None
end
```

In keeping with the definition of *Functor*, `OptionFunctor` provides an `fmap` function that takes two arguments (we’ll gloss over currying): some function `f` of type `'a -> 'b` and an argument of type `'a option`. When the second argument is present (in the form `Some a`), `f` is applied to `a` and the result is wrapped with the `Some` constructor; otherwise (where the second argument is `None`), the result is also `None`.

Less verbosely, `fmap` has type: `('a -> 'b) -> 'a option -> 'b option`. Here it is in practice,

```
# Demo1.eg1;;
- : int option = Some 5
# Demo1.eg2;;
- : int option = Some 5
# Demo1.eg3;;
- : int option = None
```

To make this example a little more interesting, let’s introduce another type, `validation`, and implement a *Functor* instance for it:

```
type 'a validation = Success of 'a | Failure of string

module OptionFunctor = struct
  let fmap f = function
      Some a -> Some (f a)
    | _ -> None
end

module ValidationFunctor = struct
  let fmap f = function
      Success a -> Success (f a)
    | Failure e -> Failure e
end

module Demo1 = struct
  open OptionFunctor
  let eg1 = fmap abs (Some 5)
  let eg2 = fmap abs (Some (-5))
  let eg3 = fmap abs None
end

module Demo2 = struct
  open ValidationFunctor
  let eg4 = fmap abs (Success 5)
  let eg5 = fmap abs (Success (-5))
  let eg6 = fmap abs (Failure "Something went wrong")
end
```

Our validation type has two constructors: `Success` and `Failure` – a slight enrichment of the builtin `option` type through the provision of a reason for failure (= absence of value). `ValidationFunctor` strongly resembles `OptionFunctor`, the only difference being that a failure string is propagated in the second prong of the `function` match. The new `Demo2` module demonstrates usage; calls to `fmap` look identical, though `validation` constructors are employed in `Demo2`:

```
# Demo2.eg4;;
- : int validation = Success 5
# Demo2.eg5;;
- : int validation = Success 5
# Demo2.eg6;;
- : int validation = Failure "Something went wrong"
```

So far, so good – but can we introduce a generic function and use it with both `option` and `validation` types? Unsurprisingly, the answer is yes. To do so, let’s introduce a `Functor` signature and augment our two *Functor* modules with type specifications.

```
type 'a validation = Success of 'a | Failure of string

module type Functor = sig
  type 'a t
  val fmap : ('a -> 'b) -> 'a t -> 'b t
end

module OptionFunctor = struct
  type 'a t = 'a option
  let fmap f = function
      Some a -> Some (f a)
    | _ -> None
end

module ValidationFunctor = struct
  type 'a t = 'a validation
  let fmap f = function
      Success a -> Success (f a)
    | Failure e -> Failure e
end

module Demo3(F: Functor) = struct
  let eg7 x = F.fmap abs x
  let eg8 f x = F.fmap f x
end

module OptionDemo3 = Demo3(OptionFunctor)
module ValidationDemo3 = Demo3(ValidationFunctor)
```

`Demo3` is a `functor` in the OCaml sense – a module that is parameterised by another module. Since the parameter `F` is required to implement the `Functor` signature (here we’re using structural typing to unify with `OptionFunctor` and `ValidationFunctor`), `fmap` can be used in the module body. The `abs` examples simply take a value in context,

```
# OptionDemo3.eg7 (Some 10);;
- : int OptionFunctor.t = Some 10
# OptionDemo3.eg7 None;;
- : int OptionFunctor.t = None
# ValidationDemo3.eg7 (Success 12);;
- : int ValidationFunctor.t = Success 12
# ValidationDemo3.eg7 (Failure "Something went wrong");;
- : int ValidationFunctor.t = Failure "Something went wrong"
```

while `eg8` allows us to provide a function,

```
# let square x = x * x;;
val square : int -> int = <fun>
# OptionDemo3.eg8 square (Some 4);;
- : int OptionFunctor.t = Some 16
# OptionDemo3.eg8 square None;;
- : int OptionFunctor.t = None
# ValidationDemo3.eg8 square (Success 5);;
- : int ValidationFunctor.t = Success 25
# ValidationDemo3.eg8 square (Failure "Oops");;
- : int ValidationFunctor.t = Failure "Oops"
```

Finally, here’s how we can define a `ListFunctor` at the toplevel to make use of `Demo3`:

```
# module ListFunctor = struct type 'a t = 'a list let fmap = List.map end;;
module ListFunctor :
sig type 'a t = 'a list val fmap : ('a -> 'b) -> 'a list -> 'b list end
# module ListDemo3 = Demo3(ListFunctor);;
module ListDemo3 :
sig
val eg7 : int ListFunctor.t -> int ListFunctor.t
val eg8 : ('a -> 'b) -> 'a ListFunctor.t -> 'b ListFunctor.t
end
# ListDemo3.eg7 [1; 2; 3];;
- : int ListFunctor.t = [1; 2; 3]
# ListDemo3.eg7 [1; 2; -3];;
- : int ListFunctor.t = [1; 2; 3]
# ListDemo3.eg7 [];;
- : int ListFunctor.t = []
# ListDemo3.eg8 square [1; 2; 3];;
- : int ListFunctor.t = [1; 4; 9]
# ListDemo3.eg8 square [];;
- : int ListFunctor.t = []
```

Above, we used structural typing to match `OptionFunctor` and `ValidationFunctor` against the `Functor` signature. How about defining these modules with the `Functor` signature at the outset?

First of all, consider what happens when we attempt to create a new module, `OF`, by signing `OptionFunctor` with `Functor`:

```
# module OF = (OptionFunctor : Functor);;
module OF : Functor
# OF.fmap abs (Some 5);;
Error: This expression has type 'a option
but an expression was expected of type int OF.t
```

In specifying a signature, the type representation in `OptionFunctor` has been *abstracted over*. The same problem is evidenced in the following code,

```
# module type A = sig type t val id: t -> t end;;
module type A = sig type t val id : t -> t end
# module B = struct type t = int let id x = x end;;
module B : sig type t = int val id : 'a -> 'a end
# B.id;;
- : 'a -> 'a = <fun>
# B.id 10;;
- : int = 10
# module C = (B: A);;
module C : A
# C.id;;
- : C.t -> C.t = <fun>
# C.id 10;;
Error: This expression has type int but an expression was expected of type
C.t
```

By defining module `C` as having signature `A`, type `C.t` has been rendered abstract. What we meant to express is that `C` is a kind of `A` … and that `type t = int`. To achieve this, a sharing constraint is necessary:

```
# module D = (B: A with type t = int);;
module D : sig type t = int val id : t -> t end
# D.id 10;;
- : D.t = 10
```

Returning to `OptionFunctor` and its cousin `OF`, let’s look more closely at each module’s signature (by using the assign-to-peek trick in the toplevel):

```
# module T = OptionFunctor;;
module T :
sig
type 'a t = 'a option
val fmap : ('a -> 'b) -> 'a option -> 'b option
end
# module T = OF;;
module T :
sig
type 'a t = 'a OF.t
val fmap : ('a -> 'b) -> 'a t -> 'b t
end
```

As anticipated above, `OF.t` is abstract; to “fix” `fmap` we’ll need to add a sharing constraint as follows,

```
# module OF2 = (OptionFunctor : Functor with type 'a t = 'a option);;
module OF2 :
sig
type 'a t = 'a option
val fmap : ('a -> 'b) -> 'a t -> 'b t
end
```

Notice that `t` has a concrete type in `OF2` while it remains abstract in `OF` (as seen through `T`). The provision of a sharing constraint solves the type incompatibility problem,

```
# OF.fmap abs (Some 5);;
Error: This expression has type 'a option
but an expression was expected of type int OF.t
# OF2.fmap abs (Some 5);;
- : int OF2.t = Some 5
```

So: is it worth explicitly specifying the signature or not in light of the potential confusion caused by sharing constraints? On the one hand, explicit specification buys encapsulation and compile-time safety – the compiler will alert us to missing definitions. On the other, constraint provision is required to prevent nasty surprises from arising – perhaps we could make do with structural typing.

Since ensuring that implementations satisfy the conditions of formal categories (in this case, *Functor*) is desirable, explicit signatures will be employed for the remainder of the post. This is by no means required for what follows though.

As a final step, let’s introduce an `either` type – more general than `validation` in that its failure type is parametric – and `Functor` signatures for all of our implementations:

```
type 'a validation = Success of 'a | Failure of string
type ('a, 'b) either = Left of 'a | Right of 'b

module type Functor = sig
  type 'a t
  val fmap : ('a -> 'b) -> 'a t -> 'b t
end

module OptionFunctor : Functor
  with type 'a t = 'a option = struct
  type 'a t = 'a option
  let fmap f = function
      Some a -> Some (f a)
    | _ -> None
end

module ValidationFunctor : Functor
  with type 'a t = 'a validation = struct
  type 'a t = 'a validation
  let fmap f = function
      Success a -> Success (f a)
    | Failure e -> Failure e
end

module ListFunctor : Functor
  with type 'a t = 'a list = struct
  type 'a t = 'a list
  let fmap = List.map
end

(* Signature of any module providing its type in `t' *)
module type Typed = sig type t end

module EitherFunctor(T: Typed) : Functor
  with type 'a t = (T.t, 'a) either = struct
  type 'a t = (T.t, 'a) either
  let fmap f = function
      Right r -> Right (f r)
    | Left l -> Left l
end

module Demo3(F: Functor) = struct
  let eg7 x = F.fmap abs x
  let eg8 f x = F.fmap f x
end

module OptionDemo3 = Demo3(OptionFunctor)
module ValidationDemo3 = Demo3(ValidationFunctor)
module ListDemo3 = Demo3(ListFunctor)
module EitherDemo3 = Demo3(EitherFunctor(String))
```

Here `EitherFunctor` is an (OCaml) `functor` (i.e., a module parameterised with another module). The module argument must satisfy the signature `Typed` by providing a type member `t`; this is used for the `Left` (failure) case.

```
# EitherDemo3.eg7 (Right 12);;
- : int EitherFunctor(String).t = Right 12
# EitherDemo3.eg7 (Left "Something went wrong");;
- : int EitherFunctor(String).t = Left "Something went wrong"
# EitherDemo3.eg8 square (Right 12);;
- : int EitherFunctor(String).t = Right 144
# EitherDemo3.eg8 square (Left "Oh dear");;
- : int EitherFunctor(String).t = Left "Oh dear"
```

Why do we need to use a `functor` here? The answer lies with the arity of `Functor.t`. Recall that `either` is defined as `type ('a, 'b) either = Left of 'a | Right of 'b`, with two type parameters, while `Functor` expects a type matching `type 'a t`, with only one. Without the `functor`, we’d receive a signature mismatch error.
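The sketch below restates the relevant definitions so that it stands alone and then instantiates `EitherFunctor` with a different failure type. The module name `IntErr` and the sample values are hypothetical, chosen only to show that fixing the `Left` type leaves exactly one free parameter.

```ocaml
type ('a, 'b) either = Left of 'a | Right of 'b

module type Functor = sig
  type 'a t
  val fmap : ('a -> 'b) -> 'a t -> 'b t
end

module type Typed = sig type t end

(* Fixing the Left type through T turns the two-parameter `either`
   into the one-parameter 'a t that Functor asks for. *)
module EitherFunctor (T : Typed) : Functor
  with type 'a t = (T.t, 'a) either = struct
  type 'a t = (T.t, 'a) either
  let fmap f = function
    | Right r -> Right (f r)
    | Left l -> Left l
end

(* Hypothetical instance: integer error codes on the Left. *)
module IntErr = EitherFunctor (struct type t = int end)

let ok = IntErr.fmap succ (Right 1)
let failed = IntErr.fmap succ (Left 404)
```

Each application of `EitherFunctor` yields a separate `Functor` instance, one per failure type.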

Next time we’ll look at *Applicative Functor*, built on top of the `Functor` modules described in this post.

For more information on *Functor*, the Haskell Typeclassopedia is an excellent reference.


# September 4, 2011

This is a simple extension to improve the output of the ‘hg paths’ command. I originally submitted a patch to the *mercurial-devel* list to achieve the same thing—given that my change alters the format of a builtin command, I don’t expect it to be accepted.

And so, the *prettypaths* extension: hg-prettypaths.

Configuration is minimal:

First, add the following to your `~/.hgrc`:

Then (optionally) set `--pretty` as default:
