Less is More – Arden's Smalltalk Blog

making hard things easy, the impossible, possible

Archive for the category “Applications”

MapReduce, Hadoop, and Cincom Smalltalk

MapReduce is a popular and effective technique that’s used to apply concurrency to problems that often involve large amounts of data, in order to improve performance.

Hadoop is a popular implementation of the MapReduce model or technique.

MapReduce is named after the functional programming functions map and reduce. The map function applies a function to each element in a list, and reduce aggregates or combines the results. MapReduce can distribute the Map work to many machines, and then Reduce summarizes the work into a final answer.

MapReduce and Smalltalk

So how would this work in Smalltalk? To start, let’s determine what the Smalltalk equivalents to map and reduce are.

The collect: method can be used as a Smalltalk equivalent of map, since it can collect the result of a block applied to every element in a collection.
The fold: method (or inject:into: ) can be used as an equivalent of reduce, since it can reduce the results to a single object (simple  examples: finding the maximum, minimum, or sum value).

Pragmatically though, you might also think of map as mapping out the work (to be performed concurrently) to multiple cores or machines, and reduce as combining or summarizing the results from the map work. If you are following the pattern it doesn’t matter if  you use collect: or fold: specifically.

The purpose of Cincom’s MatriX framework is to simplify concurrency. The MatriX framework allows you to easily make many linear solutions concurrent.

The example below shows how to create a solution to a problem, and then use MatriX to create a mapReduce-style solution using the same code with minimal alterations.

A Simple Example

Let’s say that we had a long list of documents (files) and we wanted to get a count of how many times each word occurs in the set of documents. In Smalltalk, we would want to collect the word counts for each file and then combine or fold the results into an aggregated summary.   So how might we do this in Smalltalk?

Let’s start with some basics.

  1. A method to return a list of filenames to use for counting word occurrences
  2. A method that parses the file into tokens (words)
  3. A method that, given a file string, returns a count of the words found in the file
  4. A method that summarizes (reduces) the word counts into one set
  5. A method that provides a local solution using the above methods

We can test and debug by first running it locally, and then move forward distributing the work.

Below are the methods for the above basics, respectively:

Note: Be sure to change the dir in the myFiles method to a location on your machine.

        "self myFiles"
        "Returns filename strings"
        | dir fileStrings |
        dir := 'C:\Arden\Documents\Cincom\'.
        fileStrings := dir asFilename filesMatching: '*.txt'.
        ^fileStrings as Array


parseFile: fileString
        | fileStream contents words |
        fileStream := fileString asFilename readStream.
        contents := [fileStream upToEnd] ensure:[fileStream close].
        words := contents tokensBasedOn: Character space.
        words copy do:[:word | (word includes: Character cr) ifTrue: [
               words remove: word.
               words addAll: (word tokensBasedOn: Character cr)]].

wordCountFor: fileString
        | words |
        words := self parseFile: fileString.
        words := words collect:[:word | word select:[:char | 
		char isAlphabetic] ].
        words := words reject: #isEmpty.
        ^words asBag.


reduce: wordCounts
        "Combine the wordCounts and create a Dictionary summary"
        | aggregatedWords finalCounts |
        aggregatedWords := wordCounts fold:[:counts :newCounts | 
		newCounts valuesAndCountsDo:[:word :n | 
		counts add: word withOccurrences: n]. counts ].
        finalCounts := Dictionary new.
        aggregatedWords valuesAndCountsDo:[:word :count | 
		finalCounts at: word put: count].


        "self runExampleLocal"
        | files wordCounts summary results |
        files :=self myFiles.
        wordCounts := files collect:[:fileStr | self wordCountFor: fileStr ].
        summary := self reduce: wordCounts.
        results := summary associations sort: #value descending.
        (results first: 100) do:[:ea |Transcript cr; show: 
		ea key; tab; show: ea value printString ].

So now that we have this running, we want to distribute the workload to allow the files to be processed and words to be counted, concurrently. The word counts will come back to a central place (our main image) where they will be summarized.

Making this concurrent is a lot of work, right?

Not in Smalltalk with Cincom’s MatriX concurrency framework.

  • Load MatriX
  • Add one line of code to create the virtual machines that do the work concurrently
  • Tweak the line of code that gets the word counts to distribute the work

That’s it! Here is the complete example of our solution running distributed:

        "self runExample"
        | files vms wordCounts summary results |
        files :=self myFiles.
        vms := MatriX.VirtualMachines new:3.
        wordCounts := [vms do:[:fileString | 
		MapReduceExample wordCountFor: fileString] with: files] 
		ensure:[vms release].
        summary := self reduce: wordCounts.
        results := summary associations sort: #value descending.
        (results first: 100) do:[:ea |Transcript cr; show: ea key; 
		tab; show: ea value printString ].

Note: I ran into an issue with marshaling Bags in MatriX, and I have a patch available. (Thank you Michael for finding and fixing!)



Afternoon app – 2048

You may or may have not heard about a popular new game called 2048.

We have some very exciting new work in our development builds, a new source code editor, which I wanted to exercise.

Around the same time, I saw the article linked above about the game 2048.  It is a simple four by four grid with numbers that you slide around, trying to add them up to …… 2048!


I downloaded a free version of 2048 onto my iPhone, and there is also browser based version here. I checked it out and it looked pretty interesting, so I decided to implement a simple version of it as an “afternoon app”.  An Afternoon app is a simple time-boxed implementation.  Since Smalltalk lets you do a lot in a short amount of time, it is a great choice.

I built it and published it on the public repository.  You can find it as TwentyFortyEight.

It is an MVC designed app with these three classes.




I reverse engineered it by observation.  I tried a couple simple ways to process the moves, then refined them.    My #process:  method takes an array of the four items in a row or column, and looks like this:

process: array

“process an array of the grid”

self compress: array.

self combine: array.

self compress: array.


#compress:         slides the elements down to the far end of the array.

#combine:           adds adjacent cells of the same number, one pass

The second compress is required in situations where there are two combines in the one pass.

Since you can move four directions, up, down, left, right, it gives #process the row or column for right, down respectively.  For left and up it reverses the array for #process, and then flips it back for integration into the grid.

I made an assumption which turned out to process incorrectly in some circumstances.  Rather than fix it, I left it as a challenge to find (not too hard, not to trivial) and change for those who want to tinker with the application.   There is another needed fix (hint: the new value that appears) which is a very easy fix.

How can the implementation be improved?  Other than the needed fix mentioned above, the application does not detect when the game is over, or allow you to restart it.  These are easy fixes.

Another relatively easy improvement would be to add the running score.  When numbers are combined, their value can be added to the score tally.

A big improvement that would take some more time would be to add animation.  This would probably require model restructuring or changes to communicate in terms of the moves in order for the view to animate them. Visually, adding animation would be a big improvement.  I welcome any developers to give it a try and share their results.  I may give it a shot myself.

What else could you do?  You could try using different algorithms for auto-playing the game, in order to discover or test heuristics in order to maximize your score.

If you have any more ideas or any questions, I would be happy to hear from you.  Reach me at the email below.

Best Regards and happy Smalltalking!

Arden Thomas


Cincom Smalltalk Product Manager

Post Navigation