Archive for the 'Programming' Category

Tripoli: A Python Triplespace Implementation

Tuesday, April 26th, 2005

I’ve just uploaded some rough and ready triplespace implementation code. This implements an indexed store for triples (3-tuples that represent binary relations, e.g. subject, predicate and object, as in RDF) and a tuple-space-like interface to the store with put, get, read, copy, copy-collect, copy-graph and copy-collect-graph operations.

The graph operations are the interesting ones, as they hook triples together into directed labelled graphs by treating the subject and object values in the triples as vertex labels, and the predicate values as edge labels. It’s possible to put a set of values like

Dominic, likes, Python
Python, is-cooler-than, Java
Haskell, is-cooler-than, Python

and then copy/collect the graph of triples connected to the node Dominic (Dominic likes Python, Python is cooler than Java) or to the node Haskell (Haskell is cooler than Python, Python is cooler than Java) into a separate triple space.

Much work left to do on this, but I had a sudden blaze of motivation today and thought I’d get as much done as possible while it lasted.

I Have The Attention Span Of A – Oh look, a puppy!

Wednesday, April 20th, 2005

I really should be doing statistical things with PW data right now, but I suppose it doesn’t hurt to go through a few more Haskell warm-up exercises.

In the darcs repos now is another literate Haskell file, Region.lhs. This has the task of finding all the contiguous subregions of a 2D polyomino puzzle board where some of the squares are blocked out (e.g. because they already have a polyomino placed on them).

The idea is that by measuring the number of squares in each subregion, and comparing it to the sizes of the polyominoes we have to place, we can tell immediately if it’s impossible to fit those polyominoes into that region because there will always be one or two squares left over (or one or two too few). This is cheaper than trying out all of the possible placements of polyominoes in the region, and allows us to prune the search tree fairly drastically in some cases.

My initial approach was to use the Data.Graph module, build a graph of all of the connected squares, then get the strongly-connected subgraphs of that graph using the function provided. But it turned out to be fiddly to build the graph in the first place – by the time we’d found all the edges, we might as well have mapped out all the subregions as well.

The approach tried here makes what I hope is fairly sensible use of the State monad, although the result looks a little like object-oriented programming turned inside out.

I think Literate Haskell is a really good way to go, incidentally. I’m very bad at commenting, generally speaking, although I try to make up for it by not writing monstrously convoluted code. But describing an algorithm in English, mentioning in passing the steps taken to implement it, feels very natural for me.

Polyominoes continued

Saturday, April 16th, 2005

There’s now a literate Haskell version of the polyomino generator in the darcs repository (oh, yeah…there’s now a darcs repository). It’s written as a module that exports a single function, rank, that generates all the polyominoes of rank n. For values of n greater than 9, this can take quite a while…

darcs get http://www.codepoetics.com/repos/code to grab it.

This is a preliminary to a full-blown polyomino puzzle solver.

Better Living Through Monads

Tuesday, April 5th, 2005

The Su Doku solver as I originally hacked it out is not easy to read or understand. This is partly due to the hacked-out, uncommented, un-test-encased, generally cowboy-rigged and lacksadaisical manner of its execution, but also partly due to the idioms in which its algorithm is expressed. I knew there had to be a better way – the Monad way. And so it transpired.

The difference between this and the original code is mainly that the parts of the program that have to do with inspecting the current state of the game, determining possible future moves, and carrying them out are now bundled together in a State monad, which effectively provides a nice little execution environment for Su Doku-solving code. The Game argument, pervasive in the original code, has vanished into the fabric of the monad. The list-based approach to searching through the possible moves has been replaced by a mplus-ing together of alternative actions – if the first fails, we get to try the second instead, and so on.

Probably the biggest win is that the semantics of the solver are now a lot more explicit and uncluttered, as the generic aspects of the algorithm have been absorbed into the syntax. Getting to the stage where this was possible – e.g. setting up the Game State monad, and understanding how it threads the game’s state through bound actions without explicit arguments – was a major headache. But once I’d done that bit, I rewrote the rest pretty much in one go, and it compiled and ran with only a few minor tweaks. That’s the way it should be – the library-writer ’s sweat is the client programmer’s gravy, if you’ll pardon the disgusting simile.

I have been greatly assisted by a couple of excellent tutorials on monads in Haskell. Monads for the Working Haskell Programmer provides both a functioning State monad and an example of its use in writing a game solver – handy, that – while All about Monads gives a very useful overview of the common uses of monadic idioms in Haskell programming.

I need to get my head round monad transformers next, so that I can write an interactive version of the solver that blends game-state and IO actions.

Su Doku Solver

Monday, April 4th, 2005

I’ve written a Su Doku solver in Haskell. At the moment you need an interactive Haskell console like Hugs or GHCi to do anything with it, and I’m tempted to keep it that way so as not to spoil the fun too much for manual solvers. On the other hand, I’m also tempted to hook it up to a web interface, maybe with the ability to provide hints and suggestions rather than outright solutions.

I think Haskell is a pretty natural choice for solvers like this one, which are based on simple traversal of a search tree (with a few optimising and validating steps along the way). Most of the functions are pretty small. This time around, I started getting heavily into list comprehensions. The points-free fetish seems to be wearing off at last…

See the Times Su Doku page for information about Su Doku.

HaskellDB continued

Sunday, March 27th, 2005

wxHaskell seems to provide ODBC drivers only. I peered down the path of setting up an ODBC DSN to my MySQL database on Linux, and sensed only pain and bafflement ahead. So I went back to HSQL, and did a bit of coaxing…

As far as HaskellDB is concerned, this all seems to have been a bit of a waste of time: the version of MySQL I have installed doesn’t support nested subqueries (now there’s a surprise…), and HaskellDB’s generated SQL uses them pervasively.

However, I do now have the option of falling back to HSQL + plain SQL, which might be acceptable.

I like HaskellDB a lot in principle, and will probably try again with it + Postgres some time soon, but I don’t think it’s going to be much help to me here.

Choices, choices

Monday, March 21st, 2005

I want to do some statistical data-munging with data from The Public Whip, and I’ve been trying to decide what programming language I should do it with.

Haskell is the first option. I’ve been meaning for a while to use it to do some real world, scripting-type tasks. Functional programming style is good for data munging, as Google knows. Perhaps judicious use of combinators would enable me to specify a variety of reports in a small embedded sub-language. That would be cool. The toolset would be HUnit, HaskellDB, and maybe some library of statistical functions if there’s a suitable one out there.

The second option is Alice ML, which I want to learn in more detail. Most of the advanced features of the language are irrelevant to the task at hand, however. It’s a pretty serial sort of task, so declarative concurrency won’t be much help. On the other hand, ML is more forgiving than Haskell when it comes to I/O.

The third option is Ruby. I quite fancy doing the whole Agile/TDD thing by the book (although with HUnit I could do that in Haskell as well). But will this force everything into an OOP straitjacket? Any Ruby code I write is likely to make heavy use of blocks (closures) and higher-order functions, just because that’s the way I tend to think about things nowadays – it would be interesting to see how this gels with the standard approach to building test fixtures. Of course, it might work out just fine.

Of the three, I’m prepared to bet that Ruby is far and away the easiest to connect to MySQL.

Monadic Parser Combinators in Python

Friday, March 18th, 2005

OK, ok, I know this is insane…

I’ve translated some code from Hutton & Meijer’s paper on monadic parser combinators from Haskell into Python.

You can write things like:

> token = isalpha |seq| many(isdigit)
> tokens = token |sepBy| whitespace
> runParser("p123 q456 hello world", tokens)
['p123', 'q456']

Please don’t try to use this to do anything serious.

So here it is: Source code for monadic parser combinators in Python.

Probably the only useful bit of this is the multi-reader buffered stream class at the start. It allows you to have multiple forward-only readers on a single stream. The stream is buffered so that readers lagging behind the front-runner can access values that have already been pulled from the stream. Any values that are no longer needed by any reader are removed from the buffer.

XML Acceleration

Thursday, March 10th, 2005

I don’t know whythis amuses me quite so much…

The Suffocating Gerbils Problem

Tuesday, March 8th, 2005

This is a new one on me.

It raises an interesting question, though: how do you incorporate performance requirements (tolerance levels, etc) into formal specifications? Especially as these will presumably depend on the computing power of the hardware system the specification is implemented on.

I find that the Jasper online Sinclair ZX Spectrum Java emulator suffers from the reverse of this – what we might call the hyperactive gerbils problem. It just runs too damn fast for any of the games on it to be playable (apart from Chaos).

I can imagine a solution where you synchronize with an internal clock, and guarantee a) that an operation will complete before the next clock tick, or throw some exception, and b) that the subsequent operation will not begin until after then next clock tick. Then the work rate of the gerbils would be strictly controlled.

How Smart Are Ordinary (Java) Developers?

Monday, March 7th, 2005

Mike Champion, responding to a blog post by Christopher Ferris:

(try explaining “hypermedia as the engine of application state” to an ordinary Java developer!)

Why? Would that be, like, difficult, or something?

Leaving aside the language wars / “Java is a language for the masses” snob stuff, I do worry about the way people apostrophize the Ordinary Developer, like The Man On The Street, whenever they want to say that something is too hard for dumb people to understand.

Writing software that doesn’t completely suck is still quite a difficult activity, possibly more difficult than it should be.

If you’re smart enough to write software that doesn’t completely suck, you should be smart enough to understand things like REST without too much difficulty.

If you’re not smart enough to write software that doesn’t completely suck, then why are you writing software?

The real work in that expression, “ordinary Java developers”, is being done by the word “ordinary”. It’s a way to label anyone who isn’t hopelessly dumb as out of the ordinary, and therefore peripheral to the argument: what you think you understand, and what your understanding causes you to think is important, is irrelevant because you’re unrepresentatively smart.

I don’t think this is a good way to argue. It’s better to start with the premise that, within the profession of software developers, “ordinary” means “pretty smart” – professional, educated, capable of writing software that doesn’t completely suck. Insofar as reality doesn’t match that premise, there is a problem and the problem is with reality (and it is our practical responsibility to try to fix it).

High peer-group expectations are one of the drivers of high achievement. In the software industry, we seem to like to think, and be prepared to accept, that almost everybody around us is thick as two short planks. This is a destructive form of competitive individualism (“I’m OK, you’re not OK”), entirely at odds with the creative form of competitive individualism that encourages people to recognise and affirm one another’s talents, and strive to better themselves.

It is also a form of collaboration with the “deskilling” ambitions of those in the industry who would rather bully a herd of replaceable dunces than lead a team of respected professionals (note: professionals are also replaceable, when necessary – but this is because professionals have shared values and competancies that enable them to learn and adapt quickly).

What I would like to see is greater confidence and trust in the abilities of developers, and languages and tools that are based on that confidence and trust.

Concurrent Python

Saturday, February 5th, 2005

I’ve gathered together some of the recent Python code I’ve been playing with into a single file, concurrency.py. At present this contains support for MultiEvents, MultiQueues, AsyncResults, DataflowDicts and DataflowObjects.

MultiEvent is a class that lets you listen for several events at once, and tells you when any one of them has been set. MultiQueue lets you wait for an item to be available in any of several queues. When an item arrives, it tells you what the item is and what queue it was in. An AsyncResult represents a computation running on a separate thread that might not have finished yet (very like IAsyncResult in the .Net framework). The threaded function transformer/decorator causes a method to run in its own thread and return an AsyncResult.

DataflowDict provides a dictionary of single-assignment variables. If you try to read a variable that hasn’t been assigned yet, it will block until a value becomes available. DataflowObject wraps a DataflowDict, translating attribute gets and sets into dataflow dictionary lookups. Together with threaded, the dataflow classes make some interesting kinds of nondeterministic / declarative concurrency possible. Even with the scaffolding I’ve created, it’s still nowhere near as easy in Python as it is in Oz.

There is more to be done here (including, no doubt, some careful inspection of the code for subtle or egregious threading errors – race conditions, that kind of thing), but I think I’ve made a fairly good start.

Thread-safe MultiQueue in Python

Thursday, February 3rd, 2005

Another Python Cookbook recipe submission. This one is strikingly similar to the tuple space code I wrote a while back.

I actually wrote it because I needed a MultiQueue class to make a more efficient ActiveObject that would multiplex on several message queues using the threads in a thread pool.

The aim here is to support a large number of concurrent objects without having to have a large number of threads, based on the assumption that not all of the objects are going to be processing messages from each other all of the time. Every object still has its own dedicated message loop, but this loop is now activated by multiplexing thread-pool threads rather than owning its own thread. If the thread pool contains 5 threads, then you can have 5 actions in progress concurrently with the other ActiveObjects on stand-by for when a thread becomes available.

I have a feeling that Stackless Python makes these sorts of trade-offs moot…

CTM Wiki

Tuesday, February 1st, 2005

I’ve created a wiki dedicated to CTM, with the approval of the book’s authors Peter van Roy and Seif Haridi (that is, they approve of my creating it. As far as the content is concerned, the usual wiki disclaimers obviously apply). The idea is that there will be pages dedicated to the various concepts, techniques and models introduced by the book, with scope for discussion, examples in languages other than Oz (e.g. Alice ML), and links to related resources. There is some work to be done on the wiki before it is suitable for public consumption, but anyone who feels like lending a hand at this very early stage is welcome to login and start creating/editing content.

When the visual changes are complete and the introductory text and basic content skeleton are in place, I will announce the wiki on LtU and various mailing lists. I hope it will grow to be a useful resource for students and others interested in the book.

Active Objects in Python

Tuesday, February 1st, 2005

My first submitted Python Cookbook Recipe, based on looking at a bit of Oz code in CTM and thinking, how would I do that in Python?

Little Lambda

Wednesday, January 26th, 2005

I can’t really claim any credit for this Lambda-calculus-interpreting Haskell snippet, as it’s more or less directly ripped off from Wadler’s The Essence of Functional Programming. The only change I’ve made is to update the syntax so that it uses the current Haskell do notation for monads instead of bindM and unitM. Doing that was enough to enable me to persuade myself that I understood the code.

What would be cool would be to equip this with a parser, a REPLoop and the ability to setq values within the top-level environment. The macho thing to do here would be to try to get from there to a complete Scheme interpreter or something like that. Then at last I’d be a real boy, and my nose would stop getting longer every time I pretended to know something about programming language theory.

Note on coalgebra

Sunday, January 23rd, 2005

Unless I’m just kidding myself that I understood any of that paper on codata and comonads, the RBT module in my last Haskell exercise exports what is known as a final coalgebra.

Assuming that this is correct, is it a useful thing to know? Maybe not, or not until you know all the other things you need to know in order for it to become useful (let’s see, shall we?). On the other hand, I can see immediately the usefulness of the kind of encapsulation captured by the design pattern that I’ve implemented here (cribbing from the paper, which uses the same pattern to define a Stream coalgebra).

What’s more, it’s remarkably clear how the private and public faces of the RBT coalgebra (if that’s what it is) are related and distinguished: the semantics of hidden and visible are cleanly expressed through the type system and the module system. Which, in my book, is a definite win.

Haskell: Randomly balanced binary trees

Sunday, January 23rd, 2005

Based on a suggestion by Alistair Turnbull, this latest Haskell exercise involves creating a randomly balanced tree codata type (at least, I think it’s a codata type). (Edit: nope, it’s a coalgebra. Or at least, I think it is)

Updates to the tree are merged with a stream of random numbers, which are used to reorder nodes within the tree.

The rule is that the random number attached to the parent node must always be greater than the random number attached to either of its child nodes.

This has the effect of shuffling the tree so that it remains balanced even in degenerate cases (e.g. adding an already sorted sequence of values to the tree, which without balancing would result in a linked list with lookups in linear-time rather than logarithmic time).

I am impressed with the simplicity and effectiveness of this technique. Improvements to my implementation of it here are warmly solicited.

Tanenbaum – SCO

Monday, January 17th, 2005

Bit of luck today – I managed to pick up Andrew S. Tanenbaum’s Structured Computer Organization from Oxfam for £1.49 (link is to a later edition – I got the 3rd ed., in Prentice Hall reprint). I recommend this book very highly to anyone who’s interested in how things work under the hood that’s under the hood under the hood – and how (and why) the hoods stack up. Tanenbaum is drily witty throughout, which enables him to be mildly opinionated (and hence interesting) whilst giving a broadly fair and balanced account of issues such as CISC versus RISC (although in the edition I have, we get a sense of where the debate was about a decade and a half ago…).

What? You thought I meant that SCO? Good heavens, why would anyone be interested in them?

I remain several watts short of full enlightenment on the topic of comonads.

Codata and comonads

Sunday, January 16th, 2005

Just started reading this interesting paper on codata and comonads in Haskell programming. My intuitive and probably wrong understanding of codata is that it is like having a function between a domain and a codomain, and using an algebra on the domain whilst only being able to see the values in the codomain. It’s a sort of formalization of encapsulation or data hiding. But I’ve tried to understand coalgebra before, and come unstuck more or less immediately, so it’s quite likely that I’ve failed this time as well. At least this paper gives some concrete examples in Haskell code, which often helps.