Synechepedia

Notes on 1ML

1ML is an experimental dialect of ML. It’s aim and principle distinguishing feature is the unification of the module and core languages into a single level.

I find ML modules important and interesting for reasons I will try to articulate in The Measure of a Module. These notes record aspects and properties of 1ML which I have learned and discovered while studying the theory and implementation of the language.

Comments

  • (; ... ;) designate wrapped (possibly inline) comments
  • ;; designates the start of a comment that extends to the end of a line

Modules

According to one perspective, in 1ML “everything is just (‘a mode of use of’) modules” (Rossberg). Therefore, in thinking through the nature of 1ML it is reasonable that we begin with its modules.

Module definition syntax

To define a named module, we follow this schema:

ModuleName (; : MODULE_TYPE ;) = {
 ;; declarations and definitions
};

The MODULE_TYPE and declarations and defintions are commented out to indicate that they are optional.

The empty module

The minimal module is the empty module. We can assign M to be the empty module thus:

M = {};

Module types

In dialects of ML, every value has a type. Modules are values in 1ML, so they too have a type.

Consider module M with one type declaration and two value definitions:

M = {
  type t = (int, text);
  x = 5;
  y = (x, "five");
};

Since we opted out of specifying a type for M, 1ML will infer the following type for the module:

M : {
  type t = (int, text);
  x : int;
  y : (int, text);
};

Unsurprisingly, the type of a module is just the types of all the values which that module provides together with the types declared in the module.

Note that the module’s type is based on what it provides, not what it contains. This is an important distinction, because we can seal information into modules by specifying module types that omit some of the module’s contents:

M : { type t; y: t; } = {
  type t = (int, text);
  x = 5;
  y = (x, "five");
};

Since the above module specifies a type for the module that excludes x, x is no longer available from outside of the module:

1ML> result = M.y
result : (int, text);
1ML> result = M.x
stdin:1.10-1.13: field `x' unbound in expression

We can also seal in the types, which results in abstract types. To seal in types, we specify the module signature using :>, thus

M :> { type t; y: t; } = {
  type t = (int, text);
  x = 5;
  y = (5, "five");
};

The internal structure of the type M.t is now hidden from the outside world. We’ll see the implications of this in the following section.

The key points to take away form this brief discussion of 1ML modules:

  • A module is a value in 1ML.
  • Every module has a type.
  • The type of a module specifies the module’s public interface.

A concrete example: implementing a stack

Fair warning: the following code is a working implemention that can be run in Rossberg’s prototype interpreter. As Rossberg notes in the README for the project

  • The language is very basic, and misses lots of features you would expect from a real ML, especially datatypes.
  • It is slow and probably buggy. In particular, type checking is implemented very naively, as a direct transliteration of the rules, with no worries about efficiency. Type inference is best considered work in progress. :)

To the list of missing features we can add the critical UI feature of pattern matching.

As a result of these limitations, the following code is less clear than it otherwise could be (and eventually will be!) in a more robust implementation.

Specifying the ADT

One of the most importnat features of modern programming is abstract data types (hereafter, ADTs), which encapsulate some data within a module, providing access to it only through operations that are associated with the module.

(Goguen 1999)

For an ADT essence is form, since ADTs are defined by specification of their interface, and this only gives the externally visible form while saying nothing on the matter of implementation. So let’s recite the specification of a stack, here expressed in terms of the module type:

type STACK = {
  type t a;
  empty 'a : t a;
  push 'a : t a -> a -> t a;

  ;; Only needed to work around a bug in the type checker that's preventing use
  ;; of the clearer and more concise type declaration:
  ;; pop 'a: t a -> opt (t a, a)
  type pair a = (t a, a);

  pop 'a : t a -> opt (pair a);

  isEmpty 'a : t a -> bool;
  size 'a : t a -> int;
};

Uncoded into words:

  • A stack is a type t of some values of another type a.
  • For all types a (the “for all” here signaled by the '), we have an empty stack (of type t a). E.g., if int is a type, then we have an empty stack of type t int.
  • For all types a, if we are given a stack (t a) and a value of type a, then we can push the value onto the stack.
  • Let’s us call the type (t a, a) (which is a pair with a stack as its first element and a value that could be an element of the stack as its second element) pair a.
  • For all types a, if we have a stack t a, we can pop off the first element to get an optional pair some (s, x) : opt (pair a) consisting of the stack remaining after the first element is removed together with the removed element itself. Unless, if the given stack is empty to start with, in which case we end up with none : opt (pair a).
  • For all types a, given a stack t a, we can check if it isEmpty.
  • For all types a, given a stack t a, we can discover it’s size.

This elaboration cheats in some constraints that are not captured in the interface defined by the module’s type. For instance, the type doesn’t tell us that the second item of the pair pop s comes from the stack s, nor does it say that the new stack returned as the first item in that pair must have the returned element removed. These are further specifications that we could encode in a language with dependent types, but part of 1ML’s aim is to realize first class modules without recourse to dependent types.

As a complement to the long term pursuit of fully dependent typing disciplines, programmers in the ML tradition have developed methods of using properly abstracted types to create “trusted kernels”. This practice provides strong guarantees of reliable behavior without having to tangle with the complex (and still nascent) challenges of full dependent typing. (For more on this practice, see Oleg’s Leightweight Static Guarantees).

In our specification of the stack ADT above we say nothing at all about the internal structure of type t a. Consequently, nothing is revealed or claimed about the way the stack is actually implemented or represented inside the module. Therefore, it is impossible for a user of a module satisfying that signature to violate the invariants which we put into our implementation. Before considering examples of this safety in practice, let’s look at the implementation:

Implementing the ADT

Stack :> STACK = {
  type t a = {size: int; content: list a};

  empty 'a = {size = 0; content = List.nil};

  push 'a (s : t a) x =
    let size' = s.size + 1 in
    let content' = s.content in
    {size = size' + 1; content = List.cons x content'};

  peek 'a (s : t a) =
    let content = s.content in
    List.head content;

  type pair a = (t a, a);

  pop 'a (s : t a) =
    let content = s.content in
    caseopt (List.tail content)
      (fun () => Opt.none)
      (fun content' => caseopt (List.head content)
        (fun () => Opt.none)
        (fun x =>
          let size' = s.size - 1 in
          let s' = {size = size'; content = content'} in
          Opt.some (s', x)));

  isEmpty 'a (s : t a) =
    let content = s.content in
    List.isNil content;

  size 'a (s : t a) = s.size
};

The stack is implemented as a record holding the current size of the stack and the content of the stack, represented as a list. There’s nothing interesting going on here, and I suspect this implementation is largely self-explanatory for anyone familiar with typed functional programming.

The likely exception is the definition for pop. The caseopt in use there is the way case analysis is encoded in 1ML in absence of pattern matching. It took me some time to come to grips with how to read these functions, so I offer the following annotated elaboration:

    ;; equivalent to a case statement in Haskell or OCaml, `caseopt` is a
    ;; function which performs case analysis of values of type `opt`.
    caseopt (List.tail content)

      ;; Matches the `none` alternative.
      (fun () => Opt.none)

      ;; Matches the `some x` alternative, binding the value of the
      ;; `x` to the `content` variable in the head of the lambda.
      (fun content' => caseopt (List.head content)

        ;; This is just a nested case analysis for the other optional value
        (fun () => Opt.none)
        (fun x =>
          let size' = s.size - 1 in
          let s' = {size = size'; content = content'} in
          Opt.some (s', x)));

Once pattern matching and record field punning is implemented, pop could look like something like this:

  pop 'a (s : t a) =
    match List.tail s.content with
    | none         => none
    | some content => match List.head content with
      | none   => none
      | some x => some ({size = s.size - 1; content}, x)

TODO On abstract types, sealing, and security

TODO Wait, why do the modules look just like the records?

Types

TODO The size of a types

1ML’s novelty is in unifying the core and module languages from traditional MLs into a single language. The gimmick used to achieve this is a clever elaboration of the module language into System Fω, whereinin which the core language is already at home. However, type inference in System F is undecidable, and the impressive type inference enjoyed in most Haskell and {S,OCa}ml code is achieved by the famous, constrained version of System F known as the Hindley-Milner type system.