A Minimal Haskell Primer

If you're participating in the Roguelike Tutorial group project and don't know Haskell but want to see a tiny bit of what Haskell is all about, read on. This page by itself won't teach you all of Haskell; it just gives you enough of an idea of how to read Haskell source that you can follow the lessons and examples at each step. The lessons themselves will also try to explain what's going on as they go. If you decide at the end that you want to learn Haskell more deeply, the Introduction page talks about some good resources.

Types and Values

Haskell has two main realms, or levels, or spaces, or whatever you care to call them: Types and Values.

Types are entirely a compile-time concept, and don't exist at runtime at all. You can do all sorts of fanciness with types and your program won't be bogged down with any runtime type checks. For example, you can have types that represent different units of measurement and won't match up without explicit conversion, which can save you from various embarrassing mishaps with your math, or types that track whether or not a string has had escape sequence processing performed on it (such as for a web app).

Types are things you might expect like Int (bounded and signed), Word (bounded and unsigned), Integer (unbounded and signed), Char (one letter or other symbol), and String (many chars in sequence), but we also have generic types like [a] (pronounced "list of a"), Maybe a, and Either a b, as well as function types like Int -> Int. In the generic cases, the lowercase parts are type variables that can be filled in with another type. They don't have to be a single letter, they simply are by convention, and the letter itself has no effect.

Type declarations generally go on their own line above a top level declaration within a file, and they use ::, which is read as "has type" or "has the type" or "is type", or something similar. If there's an arrow in a type, ->, then that means that it's a function type.

The data keyword declares a new data type. There are a few possible formats here that we won't go into at the moment.
The type keyword declares an alias for a type that the compiler won't be supremely strict about. For example, the String type in Haskell is an alias for [Char], so if you declare a value as a String you can use list operations on it to filter the contents or whatever you like. The compiler won't help you enforce the difference. I'm not sure that we'll be using type that often.
The newtype keyword declares a type whose runtime representation is exactly the same as another type's, but which the compiler treats as distinct, enforcing the difference for you. We will be using this quite a bit, because it helps you keep lots of otherwise identically typed values straight. For example, there is a Unique data type in the standard library that represents a unique identifier. Internally it's just an Integer value, but the newtype distinction limits what we can do with it, because it doesn't make sense to add two different id values, for example.
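
As a rough sketch of the three keywords side by side (all of the names here, such as Shape, Name, and PlayerId, are made up for illustration and aren't part of the tutorial code):

```haskell
-- A new data type with two constructors, each carrying fields.
data Shape
    = Circle Double            -- radius
    | Rectangle Double Double  -- width and height

-- A plain alias: Name and String are interchangeable to the compiler.
type Name = String

-- A distinct type that is represented as a plain Int at runtime.
newtype PlayerId = PlayerId Int

-- Because Name is only an alias, list functions work on it directly.
shout :: Name -> Name
shout name = name ++ "!"

-- A PlayerId has to be unwrapped before its Int can be used.
playerIdToInt :: PlayerId -> Int
playerIdToInt (PlayerId n) = n
```

Passing a bare Int where a PlayerId is expected would be a compile error, while passing a String where a Name is expected is fine.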

Both data and newtype statements give you Constructors, which are a special kind of function whose name starts with a capital letter (or with a : if it has a symbolic name), and that you can also use in pattern matching (which is talked about a little more below). There are frequently types with a single value constructor that has the same name as the type, but since types and values don't share a namespace there's no clash.
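
Here's a small sketch of constructors being used both ways. Point, Pair, and the :& constructor are hypothetical names picked for the example.

```haskell
-- The type and its only constructor share the name Point. Types and
-- values live in separate namespaces, so there is no clash.
data Point = Point Int Int

-- Used as a function, the constructor builds a value.
origin :: Point
origin = Point 0 0

-- Used in a pattern, the same constructor takes a value apart.
xCoord :: Point -> Int
xCoord (Point x _) = x

-- A symbolic constructor name has to start with a colon.
data Pair = Int :& Int

swapPair :: Pair -> Pair
swapPair (a :& b) = b :& a
```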

Because type and newtype define types in terms of existing types, there can be values which don't have an -> in their displayed type but that get called functions when people aren't being super technical. An example of this would be getLine, which is an IO String value. When you run getLine as part of IO, its result doesn't come from any argument that you pass to it, but instead from manipulating the current IO context that's "in the background" so to speak (by consuming a line of input from stdin). Sometimes people call this sort of function-ish value an "action", though that's not an official sort of term that every Haskell user will automatically understand.

Values are primarily a runtime concept, though you can compile value literals into your program as well, things like 1, 5.4, 'a', and "hello". Values are the information that actually flows through the program to get things done. You transform values with the use of functions. Functions are also values, and so you can apply functions to other functions. A function that uses another function as an argument is called a Higher-order Function. Function application in Haskell is written prefix if it's a function with a word name, and infix if it's a function with a symbolic name.

foo x y -- foo applied to x and y
a + b   -- + applied to a and b

If you want you can also write word functions infix using backticks, and symbol functions as prefix by naming them with parens around them.

x `foo` y
(+) a b

You can define an anonymous function, also called a Lambda Function, with a backslash, then the arguments, then an -> and the result expression. This is kinda like a literal for a function, but most people don't call it a function literal, they just call it a lambda.

map (\x -> 2 * x) [1,2,3]

Definitions

Definitions are relatively straightforward. Here are two examples:

myFloat :: Float
myFloat = 9.6

foo :: Int -> String
foo 0 = "Nada"
foo x = show x

First there's the type signature (which isn't strictly required), then a single line for a non-function, or one or more lines representing the function's cases. When we use foo (e.g., with foo 3) it tries to pattern match the first case, and if that fails it tries the next one, until either a case matches or we run out of cases. If we run out of cases you get an exception thrown in your face, so don't do that. With how we've written it here we'll always get a match at the end, because the variable x matches any value and is bound to that value on the right-hand side of the definition. So, if the input is 0 then we get the string "Nada", and for any other input we just use the show function on that number.

Let's try something a little more complex:

bar :: Int -> String -> String
bar times str = let
    go times str temp = if times > 0
        then go (times-1) str (temp++str)
        else temp
    in go times str ""

Now we've got a function with two arguments. It takes an Int, then a String, and gives back a String. You might note that the -> is the same between all the arguments. This is because "a function with two arguments" is a bit of a lie: bar is really a function of one argument from Int to String -> String, and if you apply bar to two values (like bar 3 "chomp") you're actually applying it once, getting a String -> String back, then applying that function to the next value. Technically all functions in Haskell are functions of 1 argument like this. However, for ease of writing code you can pretty much write your definitions as if Haskell had multi-argument functions, and GHC will even give you compilation error messages about the wrong number of arguments or wrong argument types as if your functions had more than one argument. All of the pros of doing things like this are a little beyond this document, but the main one you might see is that we can apply some of the arguments and then store that function we get back and reuse it. This is called Partial Application, and we'll probably do that some of the time.
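
To make the "one argument at a time" idea concrete, here's a minimal sketch. The greet function and the other names are invented for this example.

```haskell
-- A hypothetical "two-argument" function.
greet :: String -> String -> String
greet greeting name = greeting ++ ", " ++ name

-- The signature above really means String -> (String -> String),
-- so applying greet to one argument gives back a function.
hello :: String -> String
hello = greet "Hello"

-- These two expressions mean exactly the same thing.
oneStep, twoSteps :: String
oneStep  = greet "Hello" "world"
twoSteps = (greet "Hello") "world"

-- The stored function can be reused as often as we like.
greetings :: [String]
greetings = map hello ["Ann", "Bob"]  -- ["Hello, Ann","Hello, Bob"]
```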

Note also that we didn't need a bunch of braces and semicolons in our function even though it's spread across more than one line. Technically our function is still a single expression, and GHC uses the whitespace to determine what the correct parts of it are. There's the let ... in ... (sets up temporary bindings) and the if ... then ... else ... (the classic branch construct, note that we can't ever skip the else case in Haskell). In both situations, as long as the indentation after the first newline stays consistent, GHC figures it out fine. So our program ends up looking kinda like Python code anyway.

Having an outer function that provides a nice interface to an inner function that does the looping work with extra loop variables is quite common, and out of habit the inner function is also usually just called go, or maybe bar and bar' or something like that. GHC is quite capable of converting that sort of stuff into fast iterative loops at the machine code level. However, manual recursion is prone to the occasional error, and in many cases there are already higher-order functions that will take your starting data and your operation and do the appropriate recursion for you (mapping, folding, filtering, and so on).
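
As a sketch of that trade-off, here are two loops written by hand next to versions that lean on existing library functions. The names barManual, barLibrary, sumManual, and sumFold are made up for the comparison.

```haskell
-- Manual recursion with a go helper, in the style of bar above.
barManual :: Int -> String -> String
barManual times str = let
    go n acc = if n > 0
        then go (n - 1) (acc ++ str)
        else acc
    in go times ""

-- The same result from library functions: replicate builds the list of
-- copies and concat joins them, so there is no loop to get wrong.
barLibrary :: Int -> String -> String
barLibrary times str = concat (replicate times str)

-- Summing a list with an explicit loop...
sumManual :: [Int] -> Int
sumManual xs = let
    go [] total = total
    go (y:ys) total = go ys (total + y)
    in go xs 0

-- ...and with a fold that does the recursion for us.
sumFold :: [Int] -> Int
sumFold xs = foldr (+) 0 xs
```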

There's also where, which lets you put your temporary definitions after your main expression, and it compiles to the exact same thing as a let expression. It's a purely stylistic choice, and I don't personally use it much. We might use it at some point just to give you a feel for it.
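
A small sketch of the same helper written both ways. The names areaLet and areaWhere are made up for this example.

```haskell
-- With let, the temporary definition comes before the main expression.
areaLet :: Double -> Double
areaLet r = let
    squared = r * r
    in pi * squared

-- With where, the main expression comes first and the helper follows.
areaWhere :: Double -> Double
areaWhere r = pi * squared
  where
    squared = r * r
```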

There are also case ... of ... expressions, which work exactly the same as a function's cases, with pattern matches and all that. We will be using those quite a bit, because it can be much easier to read a quick case expression within a bigger series of expressions than to have a whole additional function case at the outer layer. As with let/where, case expressions and function clauses compile down to the exact same code, so it's just a style thing.
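
For comparison, here's a sketch of the earlier foo function written with function cases and again with a case ... of ... expression. The names describeCases, describeCase, and label are invented for the example.

```haskell
-- Pattern matching written as separate function cases.
describeCases :: Int -> String
describeCases 0 = "Nada"
describeCases x = show x

-- The same matching done with a case expression inside a single clause.
describeCase :: Int -> String
describeCase n = case n of
    0 -> "Nada"
    x -> show x

-- A case expression can also sit inside a larger expression.
label :: Maybe Int -> String
label m = "Value: " ++ case m of
    Nothing -> "none"
    Just x  -> show x
```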

Typeclasses

Obviously we don't want (+) to work for just Ints or just Words or whatever, we want it to work for all of our numbers. Typeclasses are how we group "similar enough" types according to some operation(s) they share, usually with some laws about how the operation(s) must work. Lots of Haskell typeclasses have names that are very mathy, which can make them seem unapproachable, though some of them seem more normal. Examples include Show, Num, Eq, Ord, Functor, Applicative, and Monad. Typeclasses get a capital letter like types do, and when they show up in a type signature it's as a constraint before an equals arrow (also called a "fat arrow"), =>, followed by the rest of the type:

(+) :: Num a => a -> a -> a

Here we're saying that the (+) operator can be used with any two values (of the same type) that implement Num, and then you get a value of that same type back.

realToFrac :: (Fractional b, Real a) => a -> b

This lets us convert from some Real number type a to some Fractional number type b. Haskell doesn't do any sort of automatic conversion, so things like realToFrac and fromIntegral show up whenever you need to switch around your number types.
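
A minimal sketch of where those conversions tend to show up. The function names are hypothetical, and average assumes a non-empty list.

```haskell
-- Int to Double: fromIntegral converts from any Integral type.
-- Assumes a non-empty list; an empty one would divide zero by zero.
average :: [Int] -> Double
average xs = fromIntegral (sum xs) / fromIntegral (length xs)

-- Float to Double: realToFrac converts between fractional types.
widen :: Float -> Double
widen x = realToFrac x

-- Without fromIntegral this would not compile, because (+) needs
-- both of its arguments to have the same type.
addMixed :: Int -> Double -> Double
addMixed n d = fromIntegral n + d
```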

Typeclasses are used for a lot more than just numbers. Show lets you turn a value into a debug friendly string. Eq is for equality, and Ord is for ordering. Normal things that you might expect to be able to do with many data types. For some of the basic typeclasses like this the compiler can even automatically generate the implementation when you make data types of your own.
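
The automatic generation is requested with a deriving clause. A small sketch, using a Color type made up for the example:

```haskell
-- The deriving clause asks the compiler to write the instances for us.
data Color = Red | Green | Blue
    deriving (Show, Eq, Ord)

shown :: String
shown = show Green     -- "Green"

same :: Bool
same = Red == Red      -- True

-- Derived Ord follows the order the constructors were declared in.
smaller :: Bool
smaller = Red < Blue   -- True
```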

Data types that are part of a typeclass are said to have an instance of that class. The functions that are part of a typeclass are its methods. Yes, in light of OOP's popularity and the potential confusion it was perhaps a bad set of names to pick. Oh well.
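
To put those terms next to some code, here's a sketch of a tiny typeclass with one method and two instances. Describable, describe, Monster, and announce are all hypothetical names.

```haskell
-- A hypothetical typeclass with a single method.
class Describable a where
    describe :: a -> String

data Monster = Goblin | Dragon

-- Monster now has an instance of Describable.
instance Describable Monster where
    describe Goblin = "a small goblin"
    describe Dragon = "a huge dragon"

-- Existing types can be given instances as well.
instance Describable Bool where
    describe True  = "yes"
    describe False = "no"

-- The constraint lets this work for any type with an instance.
announce :: Describable a => a -> String
announce x = "You see " ++ describe x
```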

Typeclasses can get a little abstract. Functor is a typeclass for (this is a hand-wave) "some structure that holds values of type 'a', and which, given a conversion function from 'a' to 'b', can be converted into an identically shaped structure holding values of type 'b' instead", which means things like List (zero or more a's) and Maybe (exactly zero or one a's) for example, but not a Set, because if you change the values in the Set you might need to change the Set's internal structure to match (both TreeSet and HashSet are like this). It also means things that you wouldn't expect right away, like functions themselves. With functions, the 'a' that you change into 'b' is the return value of the function.

Functor has a method called fmap, and you can write, for example, fmap (+1) [1,2,3] and get [2,3,4] back like you might expect in other languages. Or you can write (fmap (*2) (+1)) 5 and get 12 back. What? Well, the function for +1 had the function for *2 "applied" to the return value that it would eventually have, which gives us a new function back, which was then applied to 5. This gives us (5+1)*2.

That might seem unimportant at the moment, but just like with any other effective form of code reuse, it lets us learn just a few general operators and then we can confidently reuse them whenever the need comes up.

(Note that there's also a map function specific to lists, largely for historical reasons.)
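
To make the fmap examples above concrete, here's a short sketch using a few different Functors from the Prelude. The top-level names are made up for the example.

```haskell
overList :: [Int]
overList = fmap (+1) [1, 2, 3]    -- [2,3,4]

-- The list-specific map gives the same answer.
viaMap :: [Int]
viaMap = map (+1) [1, 2, 3]

overJust :: Maybe Int
overJust = fmap (+1) (Just 4)     -- Just 5

-- With nothing inside, there is nothing to change.
overNothing :: Maybe Int
overNothing = fmap (+1) Nothing   -- Nothing

-- For functions, fmap behaves like function composition with (.).
viaFmap, viaCompose :: Int -> Int
viaFmap    = fmap (*2) (+1)
viaCompose = (*2) . (+1)
```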

Documentation

In Haskell, -- starts a single-line comment, and you use {- and -} around multi-line comments.

If you want doc comments (which can be rendered into a webpage with a program called haddock), you can use {-| at the start of your multi-line comment, or you can use -- | at the start of a block of single-line comments. A doc comment goes with the type signature that comes immediately after it.

You can instead comment the "previous" thing with -- ^, which is how you normally put a doc comment on an individual type within a function's type signature, or on a field in a record (which is a special kind of data statement).
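
Here's a sketch showing each of those comment forms in place. The Item type and totalWeight function are invented for the example.

```haskell
{-| A multi-line doc comment, attached to the declaration that follows.
It can span as many lines as it needs.
-}
data Item = Item
    { itemName   :: String  -- ^ Name shown to the player.
    , itemWeight :: Int     -- ^ Weight in arbitrary units.
    }

-- | Total weight of a list of items.
-- A run of single-line comments like this forms one doc comment.
totalWeight
    :: [Item]  -- ^ The items to weigh.
    -> Int     -- ^ The sum of their weights.
totalWeight items = sum (map itemWeight items)

-- An ordinary comment like this one is ignored by haddock.
```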

Documentation is extremely important and you should document pretty much all your code, regardless of what programming language you happen to use. The "some other programmer" who reads the documentation later trying to figure out what the code does and why is probably yourself a year from now, so save yourself the trouble. "Docs can get out of date", "it takes too much time", bah! You don't need to doc something the instant you write it, because yeah maybe you'll throw it away in an hour, but once you're confident enough that it's there to stay for a while, you should take the time to document it.

do-notation

There's one more bit to cover that will come up over and over again. It's called "do-notation", or "do-blocks". It's similar to the previous stuff, but with a small difference. To start, it looks something like this:

main :: IO ()
main = do
    line <- getLine
    putStrLn (line ++ line)

A do block is actually just special syntax sugar for two of the operations of the Monad typeclass, >>= (pronounced "bind") and >> (sometimes called "then"). It's most commonly used with IO things, but you can actually use it with any Monad at all. Other types that we might be using with do notation include:

List: computations that are best expressed as nested loops
Maybe: computations that might "return null", so to speak.
Either: computations that might return an error value instead of just null
ST: computations that can safely mutate "thread-local" storage
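
As a rough sketch of do notation outside of IO, here are small examples for Maybe and for lists. The function names are made up, and return (not covered above) just wraps a plain value back up in the Monad.

```haskell
-- Division that gives Nothing instead of crashing on a zero divisor.
safeDiv :: Int -> Int -> Maybe Int
safeDiv _ 0 = Nothing
safeDiv a b = Just (a `div` b)

-- Each <- takes the value out of a Just. The first Nothing ends the
-- whole block with Nothing.
calc :: Int -> Int -> Int -> Maybe Int
calc a b c = do
    x <- safeDiv a b
    y <- safeDiv x c
    return (x + y)

-- In the list Monad, each <- behaves like one level of a nested loop.
pairs :: [(Int, Char)]
pairs = do
    n <- [1, 2]
    c <- "ab"
    return (n, c)
```

With these definitions, calc 100 5 2 gives Just 30, while calc 1 0 2 gives Nothing because the first division fails.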

The big difference between let and do is that with let the order of your definitions doesn't matter, and with do the ordering is strictly enforced. In our example, we have to get a line before we can print it out, so that strictly enforced ordering is exactly what we want.

Within a do block the left arrow, <-, binds the result of the expression on the right to the variable named on the left. It's very similar to variable assignment in other languages. So, we perform a getLine and its result becomes available to us as a String named line. Then we perform a putStrLn and have it print out the line value appended to itself.
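
For the curious, here's a sketch of roughly what the sugar stands for, with each do block next to a hand-written version using >>= or >>. The names are made up, and the primed versions are a simplified picture of the translation rather than the exact output of GHC.

```haskell
-- The do block version from above, under a different name.
echoTwice :: IO ()
echoTwice = do
    line <- getLine
    putStrLn (line ++ line)

-- Roughly the same thing by hand: >>= feeds the result of getLine
-- to a lambda that names it line.
echoTwice' :: IO ()
echoTwice' = getLine >>= \line -> putStrLn (line ++ line)

-- When no result needs a name, the lines are simply run in order.
twoLines :: IO ()
twoLines = do
    putStrLn "first"
    putStrLn "second"

-- The same thing by hand: >> runs one action and then the next.
twoLines' :: IO ()
twoLines' = putStrLn "first" >> putStrLn "second"
```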

As with let and if, GHC can figure out what we mean purely with indentation, so we don't need braces and semicolons. It's possible to use them, but it ends up looking kinda weird if you're used to looking at C or Java or similar, because the semicolons separate the statements rather than terminate them, so the last expression is usually written without one and it always looks like you forgot it.

You can also use let within a bigger do expression if you want, which allows you to easily mix non-monadic code and monadic code. In that situation you don't use the in keyword after it. We'll be doing that quite a bit. If we use the bar function from the above example our main could look like this:

main :: IO ()
main = do
    line <- getLine
    let barred = bar 3 line
    putStrLn (barred ++ barred)
