Make static type checking work for you

(Beta reduction)

When first using programming languages like C, C++, Java, I never stopped to think about what the compilation phase was actually doing, beyond considering it a necessary translation from source code to machine code. The need to annotate variables and methods with types seemed an intrinsic part of that process, and I never gave much thought about what it really meant, or the purpose it served; it was just part of writing and compiling code.

Using dynamic languages changes things, you realize that code may need some form of translation or interpretation for execution, but does not absolutely require type annotations as with statically typed languages. In this light, one reconsiders what a type system is from scratch, how it plays a part in compilation, and what compilation errors and type errors really are.

A type error (I am not talking about statistics) is an important concept to grasp well. In fact not just for programming, it is also a useful concept to illuminate errors in natural language and thinking. Briefly, a type error is treating data as belonging to a kind to which it does not in fact belong. The obvious example is invoking operations on an object that does not support them. Or alternatively, according to this page

type error: an attempt to perform an operation on arguments of the wrong types.

If one tries to perform such illegal operations (and there is no available conversion or coercion), the program can behave unexpectedly or crash. Statically typed languages can detect this error at compile time, which is a good thing. This is one of the main points advocates of statically typed languages make when discussing static and dynamic languages.

But the question is, how big an advantage is this? How important are type errors, what fraction of common programming errors do they make up? And how much of program correctness (in the non strict sense of the term) can a compiler check and ensure?

Compilers and type systems can be seen not just requirements to write programs that do not crash, but also tools with which to express and check problem domain information. The more information about the problem domain can be encoded in the type information in a program, the more the compiler can check for you. In this mindset one adds type information order to exploit the type system, rather than just conform to it.

Here are a few examples where problem domain information is encoded in types in order to prevent errors that otherwise could occur at runtime, and therefore would need specific checks. A concrete example taken from the first post I linked

  • If I call move on a tic-tac-toe board, but the game has finished, I should get a compile-time type-error. In other words, calling move on inappropriate game states (i.e. move doesn’t make sense) is disallowed by the types.
  • If I call takeMoveBack on a tic-tac-toe board, but no moves have yet been made, I get a compile-time type-error.
  • If I call whoWonOrDraw on a tic-tac-toe board, but the game hasn’t yet finished, I get a compile-time type-error.

By encoding these rules of the problem domain into the type system, it is not possible to write a program that violates the rule, logic errors in the program do not compile.

But it is unrealistic to go all the way and say, all the relevant information should be expressible in types, and its really seductive twin: all program errors are type errors. Unfortunately, the real world is not that tidy and convenient, unless you’re doing specialized things like theorem proving, or programming with an exotic programming language like Coq.

As advocates of dynamic languages know very well, there is no subsitute for unit testing and integration tests. The compiler should not make you feel safe enough to disregard testing. Compile-time checking and testing are complementary, not mutually exclusive. Compile-time checking does not eliminate the need for testing, nor does testing eliminate the benefits of compile-time checking.

Still, there is most probably room left to express more relevant information in the type system and making the compiler do more work for you. And this becomes even more plausible if you’re using a language with such a powerful type system as Scala. More on these matters: Static Typing Where Possible, Dynamic Typing When Needed: The End of the Cold War Between Programming Languages.

Type classes as a decoupling mechanism

Say you have a class T and a method m. represents an operation that is applicable to many types, including T. In good OO practice, the only information needs about the types it operates on is precisely that part of the types’ nature that makes m applicable to them. So you create an abstraction representing that nature, and define m according to it. In turn, classes like T conform to that abstraction.

If the abstraction is an interface, we implement that interface. If the abstraction is a class, we inherit that class. Both of these mechanisms will work, they are a standard solution to the problem. But even though the use of an abstraction reduces the coupling between T and m, it still exists. If you want an existing class to be usable with m you have to modify its type and implementation. That class would now be aware of m.

The abstraction that m requires modifies T

Type classes are an alternate abstraction used to define without altering the types that m operates on. Instead of specifying that T must have some type, the type class mechanism merely states that there must exist some code somewhere that m can rely on in order to operate on T. This code, which does not necessarily exist at T, is the abstraction that m depends on, but that T need not be aware of.

The type class decouples T from m

The classic example would be sorting objects, where a sorting method requires some way to compare the objects it must sort. Using the standard solution, you would create an abstraction like Comparable, and make m require the objects it sorts to have that type. In Java we have

public static <T extends Comparable<? super T>> void sort(List<T> list)

In the type class approach, however, the abstraction is separate from the type of the objects to be sorted. Instead, it requires that some code, a Comparator, be accessible to the sorting method. In Java,

public static <T> void sort(List<T> list, Comparator<? super T> c)

or in a Scala example, where the type class is Ordering

def sorted[B >: A](implicit ord: Ordering[B]): List[A]

I’ve shown examples of the type class mechanism for Java and Scala[1]. What makes type classes even more effective in Scala is that the Ordering[B] parameter need not be passed explicitly. This remedies the two problems that the java type class solution has. First, users of the sort operation should not have to worry about passing the Comparator, it is an implementation detail. Second, in Scala it is possible to vary the Ordering implementation to be used just by bringing in a different Ordering object into scope, without having to modify any of the call sites.

[1] It could be said that the type class approach by definition requires transparent passing of type class instances, so the Java code would not really qualify as an example


Two negative views of Scala

It never hurts to read criticism and downsides to some technology as an antidote to rosy tainted hype. Here are two not so positive accounts of Scala. This first one is a recurring theme in criticisms of Scala, namely that it’s too complex, cryptic, overly complicated:

Besides the comments addressing specifics of that post[1], here’s a general response to criticisms of Scala as overly complex by Martin Odersky. The notion of language complexity is a deep subject that I’ve been thinking about lately, so I’ll save discussion for another post.

Moving on, perhaps Scala’s biggest public embarrasment to date is the Yammer fiasco, where a move from Java to Scala was initally promising but eventually proved not to be worth it. The code was moved back to Java. Here’s the original critique:

and some useful coverage on InfoQ and further comments by the original author.

The story with Yammer has some overlap with the first criticism above; again the issue of complexity is brought up, and in particular how this is a problem since there is a shortage of engineers with Scala experience. But besides more academic considerations of language complexity, what the Yammer story seems to point to is the relative immaturity of Scala as an industrial language. I recall some comments saying that the problems brought up about Scala today are reminiscent of criticism of Java in the 1.3 days, and that these issues are a natural part of a language’s evolution and improvement that will be resolved in time. Typesafe seems to be going down the right path.

But if we accept this interpretation, Scala’s successful evolution is a matter of resources. Does typesafe have the financial muscle to address these issues diligently? Maybe!


[1] Commenters point out that the collection excercise the writer attempts as evidence of excessive complexity is unnatural, useless, and in fact impossible in any language, but Scala actually comes close

Compile-time metaprogramming for Scala

Macros will come to Scala in 2.10. Here’s a quote from the proposal

Compile-time metaprogramming has been recognized as a valuable tool for enabling such programming techniques as:

  • Language virtualization (overloading/overriding semantics of the original programming language to enable deep embedding of DSLs),
  • Program reification (providing programs with means to inspect their own code),
  • Self-optimization (self-application of domain-specific optimizations based on program reification),
  • Algorithmic program construction (generation of code that is tedious to write with the abstractions supported by a programming language).

An example of algorithmic program construction in action can be found in Slick (database access framework) in direct embedding mode.

Like implicit conversions, macros should be used with care as they have the potential to make code obscure. Code changes effected by macros are not immediately visible to the programmer, just as implicit conversions can make method invocations hard to track. Both mechanisms can affect code “at a distance”; without proper discipline this can lead to confusion.