aaron maxwell

Crash course in Applied Functional Programming

If you're a software developer, something interesting has been happening in your field lately. Mainstream programming languages are starting to gain features from what used to be a decidedly academic domain: so-called functional programming.

Now, I'm sure you are a programmer just like me, who works in C or C++ or Java or Python or Ruby or Javascript or, God help you, PHP. And also just like me, you probably spend much of your free time reading math books.

What? Oh, right, that's not normal - I keep forgetting. The point is, more and more, languages that you actually use are picking up some rather theoretical ideas from computer science, and incorporating them very directly into the syntax of the language.

The debates about adding closures to Java, the growing popularity of languages like Python, Ruby and Javascript, and even some developments in C and C++ are all part of this. The problem for most of us, since we've been busy for years writing software that controls traffic signals, or calculates your paycheck deductions with all the decimal points in the right places, or otherwise doing important and useful stuff and doing it right — the problem is that the grown-up languages we use tend to rely heavily on imperative language constructs. Which we can become used to, and use as our only hammer for every nail in our toolbox of stretched analogies. I mean, programming idioms.

And so we code our minds relentlessly into a deeper and deeper rut. Is there no hope? Perhaps not; you might as well hang up your keyboard and get a job stocking groceries or something.

Or, you can keep reading below, increasing your charisma, intelligence, and attractiveness to your choice of gender(s) as you learn some of the basics. It's a bit math heavy, but it'll be worth it! Or maybe not. Only one way to find out.

Map, Filter, Reduce

Three central concepts of functional programming are "map", "filter" and "reduce". Map is an operation that takes a function mapping a domain D to a range R, and a list of elements of D. It then produces a list of elements of R.

map(f, [d0, d1, d2...]) -> [f(d0), f(d1), f(d2)...]

Filter takes a function mapping some domain D onto the set {True, False}, and a list of elements from D, and produces a subset of this input list:

filter(is_even, [1, 2, 7, -4, 3, 12.0001]) -> [2, -4]

Map produces a list with the same number of elements as the input list. (Or same cardinality, for a nonfinite list.) Filter will produce a subset (sublist) of the input set (list). In contrast, the reduce operation always produces a single element.

Reduce will take an operator - i.e., a function of two elements that produces a single element of the same type. It will also take a list of two or more of these elements. (You can extend it to accept empty and single-element lists by specifying a default value; but let's ignore that for now.)

Reduce works as follows: take the first two elements; apply them to the operator, to get a single result; apply this result and the next element to the operator, to get a new result; and repeat, until there are no elements left, and you get a single final result. The time-honored "hello world" version of reduce is applying the addition operator to a list of numbers:

reduce(add, [1, 2, 4, 8]) -> 15
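
To make this concrete, here is a minimal sketch of all three in Python (Python 2 flavor, like the rest of the code in this article); double() and is_even() are little helpers I'm inventing purely for illustration:

# a quick sketch of map, filter and reduce in action (Python 2);
# double() and is_even() are made-up helpers for illustration
from operator import add

def double(x):
    return 2 * x

def is_even(x):
    return x % 2 == 0

print map(double, [1, 2, 3])              # [2, 4, 6]
print filter(is_even, [1, 2, 7, -4, 3])   # [2, -4]
# reduce(add, [1, 2, 4, 8]) works out to add(add(add(1, 2), 4), 8)
print reduce(add, [1, 2, 4, 8])           # 15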

One place I've used reduce in production code is when creating SQL queries with a bunch of criteria ANDed together. It surprised the heck out of me when I realized that PHP includes some support for functional programming. Say you define a function join_by_dash() like this:

// in PHP
function join_by_something($a, $b, $what) {
    if('' == $a) {
        return $b;
    }
    if('' == $b) {
        return $a;
    }
    return $a . $what . $b;
}

function join_by_dash($a, $b) {
    return join_by_something($a, $b, "-");
}

So, join_by_dash('sonic', 'youth') would return 'sonic-youth'. Then you can join a bunch together with the array_reduce() function:

function join_by_comma($a, $b) {
    return join_by_something($a, $b, ", ");
}
$items = array('goat', 'termite', 'horse');
$listing = array_reduce($items, "join_by_comma");
// $listing is now "goat, termite, horse"

I've used this in constructing SQL queries like so (for, say, your groundbreaking web-next-dot-oh website):

function join_by_and($a, $b) {
    return join_by_something($a, $b, " and ");
}
$conditions = array(
    "age<$age",
    "gender=$gender",
    "iq<$max_iq",
    );
$where_clause = array_reduce($conditions, "join_by_and");

More commonly, if you code in PHP for any length of time, you have probably used the implode() function, which is a specialized kind of reduction in disguise:

$where_clause = implode(" and ", $conditions);

Reducing with join_by_*() and using implode() are exactly equivalent. You might imagine ways that statistics over a population could be calculated using reduce.

(The above is a heuristic example. So you nice people with the pitchforks there, please forgive that I did not protect against SQL injections, etc. I promise that in real life I did. Thank you.)
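As for those population statistics: here is one way it might look, switching to Python for brevity. This is just a sketch; the ages list is made-up sample data.

# simple statistics over a population, using reduce (Python 2)
from operator import add

ages = [31, 45, 27, 62, 38]       # made-up sample data

total = reduce(add, ages)         # 203
mean = float(total) / len(ages)   # 40.6

def older(a, b):
    # a two-argument operator suitable for reduce
    return max(a, b)

oldest = reduce(older, ages)      # 62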

Map and filter are, of course, eminently parallelizable. Reduce CAN be, depending on the operator (roughly speaking, it needs to be associative).
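
To make that parallelism claim a little less hand-wavy, here is a sketch of a parallel map using Python's multiprocessing module (available in Python 2.6 and later); crunch() is a made-up stand-in for some expensive, independent computation:

from multiprocessing import Pool

def crunch(n):
    # stand-in for an expensive, independent computation
    return n * n

if __name__ == '__main__':
    pool = Pool()                              # one worker process per CPU, by default
    results = pool.map(crunch, range(10000))   # a parallel map
    pool.close()
    pool.join()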

(Oh, and by the way, there are slightly different ways to implement reduce. The check for '' == $a is necessary because of PHP's notion of reduce, but not in all other languages. See the appendix for details.)

Iterators and Generators

You probably know what an iterator is; it's an idiom or device many computer languages have for swooping through a list element by element. Generators are very similar. Simply, a generator is a device that generates an iterator. What's the difference? Well, say you have some ordered set of elements you want to process. Maybe finding the maximum of the set. In C, you would cycle through it like this:

extern int scores[HOWMANY]; // initialized elsewhere
int i;
int high_score = 0;
for(i=0; i < HOWMANY; ++i) {
    if(scores[i] > high_score) {
        high_score = scores[i];
    }
}

This demonstrates iteration, but not an iterator, which is meant to provide an abstract interface for cycling through the elements of a collection. Java has built-in iterators, but if it did not, one way to implement it would be like this:

class ScoreSourceIterator {
    private int[] scores = {};
    private int position = 0;

    public ScoreSourceIterator(String file_name) {
        // open the file for reading
        // then, fetch the scores, storing in this.scores
        this.load_all();
    }

    public boolean isEmpty() {
        // arrays expose length as a field, not a method
        return position == scores.length;
    }

    public int next() {
        if(isEmpty()) {
            // Signal we're out of scores by returning 0.  (Yes, we're
            // assuming that all scores are positive.)
            return 0;
        }
        return scores[position++];
    }
}

// elsewhere...
public int get_high_score(String scores_file) {
    int high_score = 0;
    int score;
    
    ScoreSourceIterator ss = new ScoreSourceIterator(scores_file);
    while((score = ss.next()) > 0) {
        if(score > high_score) {
            high_score = score;
        }
    }
    return high_score;
}

(By the way, please excuse any horrifying flaws in my purported Java code. I'm way out of practice with the language, and have not even verified that my examples compile. If it just hurts too much to ignore, pretend that it's a fictional language named Flava, a proprietary-cum-open platform whose marketing centers around taste puns and emeritus hip-hop artists.)

A problem with the iterators above is that you have to load the whole array into memory. What happens in the C example if HOWMANY is 10**9, or you're parsing some similarly massive game score database? That's the problem generators solve. A generator behaves just like an iterator, except that rather than slurping in enough data to cause a core dump, it automatically loads reasonable-sized pieces of data at a time, refreshing as needed:

class ScoreSourceIterator {
    // Generator, actually!
    private int[] buffered_scores = {};
    private int position = 0;

    public ScoreSourceIterator(String file_name) {
        // open the file for reading
        // ...
        // then, fetch the next N elements
        this.load();
    }

    protected int load() {
        int num_read = 0;
        position = 0;
        // read in the next set of scores,
        // store it in this.buffered_scores,
        // and set num_read to the number actually read
        // ...
        return num_read;
    }

    protected boolean isEmpty() {
        return position == buffered_scores.length;
    }

    public int next() {
        int n;
        // are we empty? if so, fill up
        if(this.isEmpty()) {
            n = this.load();
            if (0 == n) {
                // we're really all out
                return 0;
            }
        }
        return buffered_scores[position++];
    }
}

Anyway, this is more memory efficient than schlurping in the list all at once. You're probably quite familiar with this pattern. So much, in fact, that now I realize it was silly to spend so many bytes creating this example. Sigh.

Oh well... at least now I can talk about how Python does all this. Python has strong language support for iteration. You can do things like:

for num in list_of_integers:
    x = 1.0 / num
    print "No divide by zero error... YET!"

The token list_of_integers is, for our purposes, acting as an iterator. It's a list, of course; lists in Python know how to hand you an iterator when you loop over them. In the simple C example, you had to specify how you go through the list. In the Python example, though, you don't worry about the "how" - just the "what". The "how" of iterating is encapsulated in, and handled by, the iterator itself.

(That's almost the definition of a high level language, by the way: if it lets you program by specifying what instead of how.)
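
If you're curious what's under the hood, you can ask a list for its iterator yourself and drive it by hand. This is the Python 2 spelling (Python 3 writes next(it) instead of it.next()):

nums = [10, 20, 30]
it = iter(nums)     # ask the list for an iterator over its elements
print it.next()     # 10
print it.next()     # 20
print it.next()     # 30
# one more it.next() raises StopIteration -- which is exactly how
# a for loop knows when to stop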

Suppose you have a pressing problem: you need, NEED to know whether there are more zeros or ones on your hard drive. There's just no way around it. You HAVE to know.

And yet, like most modern computers, your main hard drive is about two orders of magnitude larger in capacity than the system's memory.

Stupid RAM cartels! Why can't they constantly be in cutthroat competition, like those hard drive manufacturers!

Fortunately, you have a way out of this conundrum. Wielding Python (version 2.4 or higher), you shrewdly write a generator function that tears off memory-sized chunks of chewy hard drive data (the raw-device path below is just a stand-in; substitute whatever your OS calls your disk):

def allbits():
    bits = []   # the bit buffer
    drive = open("/dev/sda", "rb")   # hypothetical path to your raw disk device
    while True:
        if len(bits) == 0:
            # read in some reasonable number of bits
            chunk = drive.read(1024 * 1024)
            if not chunk:
                return   # no data left, so the generator simply stops
            for char in chunk:
                byte = ord(char)
                bits.extend((byte >> i) & 1 for i in range(8))
        yield bits.pop()

The yield keyword is a little interesting. What it does is hand back the value there... and the next time you ask the generator for a value, instead of starting the function over from the beginning, execution resumes at the statement right after the yield. Of course, since the yield sits inside that while loop, it will return bits.pop(), then bits.pop(), then bits.pop(), ... pausing only to refresh the bit buffer as needed.
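
If that resume-where-you-left-off behavior sounds strange, here is a tiny generator you can poke at; nothing here is made up except the name countdown():

def countdown(n):
    while n > 0:
        yield n      # pause here; resume on the next request
        n -= 1

for x in countdown(3):
    print x          # prints 3, then 2, then 1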

So you can do this:

excess_ones = 0
for bit in allbits():
    if 1 == bit:
        excess_ones += 1
    else:
        excess_ones -= 1

("But wait!" you cry. "The state of a hard drive is in constant flux. This method won't give you a count that is accurate at a particular instant in time." To which I reply, "Shut it before I kick you!!")

(Also, maybe you are wondering how the generator signals that it's out of bits. I won't go into it, but just trust me that there is a way that Python has for handling this: the generator will simply terminate, just like when you read the last element of a vector. Do a web search for "python StopIteration" if you're curious about the mechanism.)

List Comprehensions

Coincidentally, this ties into map, reduce and filter, through something called list comprehensions. Python has pretty strong support for this. (It's not the first language with the feature, of course - that honor, as far as I know, belongs to SETL.) A list comprehension is a device (idiom) for constructing lists. Sounds simple, but it turns out to be pretty deep, because of how powerful it can be for certain programming patterns.

A commonly used built-in Python function is range(). It's called like range([start, ]stop[, step]):

# [0, 1, 2]
x = range(3)
# [2, 3, 4, 5]
y = range(2,6)
# [0, 2, 4, 6, 8]
z = range(0,10,2)

So if you want a list of numbers 1 to 10:

nums = range(1,11)

But what if you want the sequence 1.5, 2.5, ... 10.5? You can do this:

nums = [x + 0.5 for x in range(1,11)]

That is a list comprehension. It is a mechanism for constructing a list. Let's get fancier:

positions = [4.7*sin(2*PI*x-0.17) + 2.2 for x in angles]

Or if you have a Person class, with a name attribute,

names = [person.name for person in people]

Now, notice that "for person in people" looks somewhat similar to "for bit in allbits()" above. They are in fact quite related. In each case, iteration over a sequence of values is taking place. people is a fully formed list, meaning the whole contents have been calculated and stored in memory. It's not the product of a generator, which lazily calculates the values as they are needed. But you can certainly use a generator in list comprehensions.

Say you have, with utmost care, constructed a file containing many integers, as text, one per line. And you want a list of these numbers raised to the fourth power. God knows why; maybe you just like wasting time doing pointless things. In any event, construct your list in one line like so:

  quads = [int(line)**4 for line in open("ints.txt", "r")]

Get that? The built-in Python function open(), for file reading, knows how to act like a generator. It will read the lines one at a time from disk (or N at a time - it's optimized appropriately), and return them one by one in sequence. int() obviously converts that to an integer, which is then quadded. (squared, cubed... quadded?)

You can apply map to these lazily generated sequences:

  first_bits = map(extract_first_bit, open("really_long_lines.txt"))

And filter:

  expensive_cars = filter(is_not_cheap, [Car(car_data) for car_data in open("cars.txt")])

In fact, most languages that do list comprehensions let you do filtering inline. You do it in Python using the trailing "if" clause here:

junk_cars = [Car(car_data)
             for car_data in open("cars.txt")
             if "Ford" in car_data]  # ooh burn!

(Incidentally, Car is a class, and Car(car_data) creates a Car object. Python omits the "new" keyword of C++, Java, etc.)
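
Squint a little and you can see the map and filter hiding inside that comprehension. Here is the same junk_cars list built the other way, with a made-up helper is_ford() standing in for the inline "if" (remember that in Python 2, map and filter return plain lists):

def is_ford(car_data):
    return "Ford" in car_data

junk_cars = map(Car, filter(is_ford, open("cars.txt")))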

And you can use reduce with a generator too:

  # may take a geologic epoch, but won't run out of memory
  bitsum = reduce(add, allbits())

It gets even more interesting when you start using lambda functions, which adroitly leads us into the section titled...

Lambda Functions

Nah. Actually, I won't write about lambda functions. This is enough for now.

Whew! Let's see, that took... a bit over two hours of my life to write. For some value of "a bit". Not too bad, considering I would have otherwise wasted that time mindlessly flipping between the Discovery and Sci-Fi Channels. So, I hope you have enjoyed reading it, if indeed you have read this far without gouging out either your eyes or your monitor. Let me know if you have questions (and are foolish enough to encourage me) or, more likely, corrections ("Java doesn't have freestanding functions, you git!")

Gotta go. There's an episode starting of "Man Vs. Wild" I've only seen three times.

Appendix: Flavors of Reduce

About ten billion words ago, did you wonder why the PHP function join_by_something could not be shorter, like this:

function join_by_something_whynot($a, $b, $what) {
    return $a . $what . $b;
}

Well, the reduce operation has a fuzzy definition at its edges, and different languages provide reduce functions that behave in slightly different ways. Most let you specify a seed value, which will be used as the first argument in the first step. So in python, for example:

>>> import operator
>>> a = [1, 2, 3]
>>> b = [10]
>>> reduce(operator.add, a)
6
>>> reduce(operator.add, b)
10
>>> # now specify an initial value
... initial_value = 10
>>> reduce(operator.add, a, initial_value)
16
>>> reduce(operator.add, b, initial_value)
20
>>>

Some languages also fall back on a seed value when the array has fewer than two elements, choosing a sensible default based on the element type (0 for numbers, the empty string for strings, and so on). Let's use this PHP function in an example:

// string join by dash
function sjoind($a, $b) {
    return $a . '-' . $b;
}

Now if you invoke array_reduce(array('Z'), "sjoind"), the result is '-Z'. Makes sense, right? Because the array has only one element but the reducing function takes two, PHP uses its default seed value (the empty string).

Now try array_reduce(array('Z', 'Z'), "sjoind"). What would you expect the result to be? Think a second... Ok, time's up. Did you guess 'Z-Z'? WRONG! It's actually '-Z-Z'. Can you think of why?

It's because of how the PHP creators decided to implement array_reduce. In some languages, the reduce operation only uses the initial value when working with an array of fewer than two elements. PHP, however, always uses its seed value, in effect prepending it to the array. That's why join_by_something has to check its arguments.

(It's actually even more arcane than that: the seed value is actually the integer zero, which is just cast to an empty string in this context. Boy, now you are TOTALLY set to win the Geek Edition of Trivial Pursuit.)