As it turns out, programming language theorists have recognized the significance of copying and disposal for decades. They traditionally classify three “structural rules” of expression manipulation in conventional programming languages. First, some notation:
\[x : T\]means that a variable $x$ of type $T$ is declared.
\[G \vdash e : T\]means that an expression $e$ of type $T$ is well-defined in the context $G$. The context is usually a list of the variables, with their types, that are available for use in the expression, but in our case it may be better thought of as the list of “uses” themselves.
\[\begin{matrix} G \vdash e_1 : T_1 & G \vdash e_2 : T_2 \\ \hline G \vdash e_3 : T_3 \end{matrix}\]means that if $G \vdash e_1 : T_1$ and $G \vdash e_2 : T_2$ are both true, then $G \vdash e_3 : T_3$ is also true. This is the notation usually used for describing a rule.
Using this notation, we can write the three structural rules.
The contraction rule corresponds to the ability to copy any variable.
\[\begin{matrix} G, x : T_1, x : T_1 \vdash e : T_2 \\ \hline G, x : T_1 \vdash e : T_2 \end{matrix}\]This rule says that if an expression is well-typed with $x$ listed twice in the context, it is still well-typed with $x$ listed only once: the single variable stands in for both uses. In other words, if we can use a variable once, we can use it twice, and through repeated application, any number of times.
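As a rough Rust analogy (my own example, not part of the formal rules): types that implement Copy get contraction for free, while for other types the copy has to be written out.
fn consume(s: String) -> usize { s.len() }

fn main() {
    let n: i32 = 5;    // i32 is Copy: "contraction" happens implicitly
    let sum = n + n;   // the same variable used twice

    let s = String::from("hi");  // String is not Copy
    let a = consume(s.clone());  // so a second use requires an explicit clone
    let b = consume(s);
    println!("{} {} {}", sum, a, b);
}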
The weakening rule corresponds to the ability to dispose of any variable.
\[\begin{matrix} G \vdash e : T_2 \\ \hline G, x : T_1 \vdash e : T_2 \end{matrix}\]This rule says that an expression may use a variable zero times, and the unused variable will still get disposed of.
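Again as a rough Rust analogy of my own: a variable may simply go unused, and the compiler, rather than rejecting the program, inserts the disposal itself.
fn main() {
    let s = String::from("never used"); // declared, then used zero times
    // Rust accepts this (with at most an unused-variable warning); the
    // String is disposed of automatically when it goes out of scope.
}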
The last rule is called exchange, and may be somewhat difficult to understand but roughly corresponds to the ability to write a swap function.
\[\begin{matrix} G_1, y : T_2, x : T_1 \vdash e : T_3 \\ \hline G_1, x : T_1, y : T_2 \vdash e : T_3 \end{matrix}\]Basically, this rule says that variables can be used in any order. Without it, you would be stuck with a pure stack discipline, only being able to use variables in the order you were given them.
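One more hedged Rust sketch of my own: with exchange, moved values need not be consumed in the order they were introduced.
fn consume(s: String) -> usize { s.len() }

fn main() {
    let first = String::from("a");
    let second = String::from("bb");
    // We are free to consume `second` before `first`; without exchange we
    // would be forced to use the variables in a fixed, stack-like order.
    let total = consume(second) + consume(first);
    println!("{}", total);
}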
Together, the contraction and exchange rules basically say that the context is just a set of variable declarations, whereas weakening says that we only care about whether or not something is contained in the context; extras don’t matter. These rules are also related to a mathematical concept called a cartesian monoidal category, but I won’t define that here; let’s just say that it helps motivate the idea that these are rules not just of programming languages but of mathematical language and logic as well. Look it up if you’re interested.
As I mentioned, these are the rules for traditional programming languages, but here we are interested in the resource interpretation, where not everything is copyable. A type system lacking one or more of these rules is called a substructural type system. Our focus for now will be on copying and disposal; we will return to exchange at a later time.

A type system that has exchange but has neither copying nor disposal is called a linear type system, so named because it preserves the number of available resources. This is the most well-studied combination of the rules (other than having all three, of course). A type system that lacks contraction/copying but keeps the other rules is called an affine type system, a term borrowed from affine functions, which are linear functions plus a constant: resources are never duplicated, but some may be left over and thrown away.

Affine type systems are perhaps less commonly studied, but they are more common in actual programming languages. Rust, for example, does not require you to use every resource, so it has an affine type system. Most likely this is because of the reliance on exception unwinding in computers today: regardless of whether or not exceptions can be caught, unwinding assumes that the variables on the stack can be disposed of, as otherwise they would have to be leaked. It is not obvious whether a linear type system can coexist with exceptions, whereas with an affine type system that is fairly straightforward. That is not to say that affine type systems are necessarily superior to linear type systems. A language with a true linear type system might look something like safe C code, where every call to free or a similar subroutine must be explicit, except that the compiler would complain if any resource were not disposed of or otherwise consumed. In exchange, this would allow disposal patterns that are difficult in Rust today, such as asynchronous disposal, or fallible disposal that returns a status code.
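To make the contrast concrete, here is a hypothetical Rust sketch of “fallible disposal” (the LogFile type and its methods are invented for illustration): close consumes the value and reports whether flushing succeeded. Rust’s affine typing will let you forget to call close entirely, silently dropping the value; a linear type system would reject the program unless every LogFile is explicitly closed or otherwise consumed.
use std::fs::File;
use std::io::{self, Write};

// Hypothetical wrapper: disposal is an ordinary, fallible operation.
struct LogFile {
    file: File,
}

impl LogFile {
    fn create(path: &str) -> io::Result<LogFile> {
        Ok(LogFile { file: File::create(path)? })
    }

    fn write_line(&mut self, line: &str) -> io::Result<()> {
        writeln!(self.file, "{}", line)
    }

    // Consumes the resource and returns a status the caller must handle.
    fn close(mut self) -> io::Result<()> {
        self.file.flush()
    }
}

fn main() -> io::Result<()> {
    let mut log = LogFile::create("out.log")?;
    log.write_line("hello")?;
    log.close()?; // Under affine rules, omitting this line would still compile.
    Ok(())
}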
There are two other constructs that most such programming languages supply, although they are not strictly tied to substructural type systems. One of these should be familiar to most programmers: the type constructor sometimes called a struct or record.
struct MyStruct {
    field1: i32,
    field2: i64,
}
In a substructural type system, having a resource of a struct type means you have all of the resources contained in the struct, at the same time. It also means you are responsible for disposing of all of those resources.
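A brief Rust illustration of my own, mirroring the rules that follow: building a MyStruct takes ownership of both field values at once, and destructuring it hands both resources back to the caller.
// Same definition as above.
struct MyStruct {
    field1: i32,
    field2: i64,
}

fn main() {
    // Construction consumes both field expressions at once...
    let z = MyStruct { field1: 1, field2: 2 };
    // ...and destructuring gives us back both resources, which we are now
    // responsible for (here they are Copy integers, so disposal is trivial).
    let MyStruct { field1: x, field2: y } = z;
    println!("{} {}", x, y);
}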
The rules for structs, using this struct as an example, look like:
\[\begin{matrix} G_1 \vdash e_1 : \texttt{i32} & G_2 \vdash e_2 : \texttt{i64} \\ \hline G_1, G_2 \vdash \texttt{MyStruct}\{ \texttt{field1} : e_1, \texttt{field2} : e_2 \} : \texttt{MyStruct} \end{matrix}\] \[\begin{matrix} G, x : \texttt{i32}, y : \texttt{i64} \vdash e : T \\ \hline G, z : \texttt{MyStruct} \vdash \texttt{let } (x, y) = z \texttt{ in } e : T \end{matrix}\]

The other fundamental type constructor is less well known as a built-in feature, though techniques to imitate it are common. It is called a discriminated union, or sometimes a variant type, and consists of a fixed set of possible types that a resource could be. Unlike the union type of C fame, a discriminated union always carries a tag identifying which of the types a particular resource actually is. Most languages that support it hide the exact implementation details in favor of a pattern-matching feature. Rust calls these types enums; in many ways they are a generalization of C-style enums, obtained by attaching data to each variant.
enum MyEnum {
    MyVariant1(i32),
    MyVariant2(i64),
}
The rules for enums, using this one as an example, look like:
\[\begin{matrix} G \vdash e_1 : \texttt{i32} \\ \hline G \vdash \texttt{MyEnum::MyVariant1}(e_1) : \texttt{MyEnum} \end{matrix}\] \[\begin{matrix} G \vdash e_2 : \texttt{i64} \\ \hline G \vdash \texttt{MyEnum::MyVariant2}(e_2) : \texttt{MyEnum} \end{matrix}\] \[\begin{matrix} G, x : \texttt{i32} \vdash e_1 : T & G, y : \texttt{i64} \vdash e_2 : T \\ \hline G, z : \texttt{MyEnum} \vdash \texttt{match } z \ \{ \texttt{MyEnum::MyVariant1}(x) \texttt{ => } e_1, \texttt{MyEnum::MyVariant2}(y) \texttt{ => } e_2 \} : T \end{matrix}\]We will at times refer to Rust’s built-in enum type Result, defined as:
enum Result<T, E> {
    Ok(T),
    Err(E),
}
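A short illustration of the match rule above (my own example): the match consumes the scrutinee, exactly one arm runs, and that arm receives ownership of whichever payload the tag says is present. Matching on a Result works the same way.
enum MyEnum {
    MyVariant1(i32),
    MyVariant2(i64),
}

fn describe(z: MyEnum) -> i64 {
    // `z` is consumed here; each arm takes ownership of its payload.
    match z {
        MyEnum::MyVariant1(x) => x as i64,
        MyEnum::MyVariant2(y) => y,
    }
}

fn main() {
    let a = describe(MyEnum::MyVariant1(3));
    let b = describe(MyEnum::MyVariant2(7));
    println!("{} {}", a, b);
}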