Archive for the 'Programming' Category

Textual Informatics

Tuesday, April 14th, 2009

I’d like to talk a little about some of the techniques involved in extracting information from text, or apprehending text as information. I won’t be demonstrating anything terribly sophisticated – in fact, my examples to begin with will all be expressible using small bits of Javascript – but I’d like to expose some of the rude mechanics of textual informatics, and try to build up a vocabulary which later on we might use to talk about some more serious and significant issues. I’m interested in seeing whether and how Franco Moretti’s proposals for a “sociologisation” of literary studies can be distinguished from an “informationalisation” or “informaticisation” of the field, for example. I’d also like to get a better grasp on bioinformatics, which has been creeping up on everyone for a while now; my uninformed guess is that strings of unicode characters and strings of DNA codons are, from an informational perspective, not very significantly unalike. But this might turn out to be true only for quite large values of “not very significantly”.

A “rambler” is a text generator that treats a piece of input text as a series of linked pairs of words. In the preceding sentence, “A” and “rambler” make the first such pair, “rambler” and “is” make the second, and so on. The word “a” is paired in this way with the words “rambler”, “text”, “piece” and “series”, while the word “text” is paired with “generator” and “as”. The rambler starts on a random word somewhere in the text, and moves from that word to any of the words it is paired with. Here’s a (very boring) 100-word ramble around that sentence:

series of linked pairs of words. pairs of linked pairs of linked pairs of linked pairs of linked pairs of linked pairs of words. text as a series of input text generator that treats a series of words. input text as a text generator that treats a text as a text generator that treats a piece of input text as a text generator that treats a piece of words. a series of input text as a series of input text generator that treats a series of input text generator that treats a text generator that treats a series of input

(if the rambler hits the last word of its input text, and that word has no other word that it is paired with, then it just jumps randomly to any word it likes).

Rambling around the whole of this text so far, we get:

…expose some more serious and so on. The rambler hits the first such pair, “rambler” make the word of DNA codons are, from an “informationalisation” or apprehending text generator that treats a “sociologisation” of linked pairs of linked pairs of words. In the word it likes). just jumps randomly to get a little about some more serious and strings of DNA codons are, from an “informationalisation” or “informaticisation” of input (if the preceding sentence, “A” and “series”, while the techniques involved in this might turn out to get a “sociologisation” of unicode characters and so on. The rambler hits the field…

It’s nonsense, but bits of it hint in the direction of sense – the rambler is pulling apart and recombining its input text, following a graph of associations extracted from the text itself:

fool1

What the rambler code actually does is to scan the text once, constructing this graph as a data structure, and then perform a random walk around that structure – it decomposes the text into a dictionary of terms (labelled vertices in the graph), and a set of linked references (edges in the graph) between entries in that dictionary. The linear form of the source text is completely exploded – in fact, we could not faithfully reconstruct the original text from the rambler’s representation of it (although its random walk might accidentally output it once in a blue moon).

The rambler is a diverting toy – it “understands” nothing about the text it consumes, or the text it produces – but its output is more syntactically regular than a completely random jumbling of its input terms because it is informed by – or uses as information – the way that the rules of syntax mean that particular kinds of words tend to be paired together in that input. It is also, more subtly, informed by the frequency with which particular nouns and adjectives, or particular “flavours” of word, appear in combination – the nonsensical ramble over the text of Cold World in the preceding post retains quite a bit of the flavour of the original. Try it on something else – a good long LCC post, for instance – and you’ll see what I mean.

For the next one of these, I’ll be discussing an algorithm for finding the longest common substring of two strings…

All Reflections Drained

Thursday, April 9th, 2009

Xasthur’s latest release is in no readily justifiable sense a black metal album, yet it clearly belongs in a continuum with Malefic’s previous work. For much of the time it sounds like a collection of instrumental out-takes from the preceding full-length release, Defective Epitaph, performed in a ruined cathedral by an orchestra of sighing zombies. The subtracted element is Malefic’s ghastly, searing howl, which makes only occasional appearances: for the most part, the vocals range between indistinct hissing, tormented groans, eerie sussurations and a quavery baritone that calls to mind the Tindersticks’ Stuart Staples mumbling into the folds of his tuxedo. There’s a sample (“I perceived everything and everyone in existence to be dead – including myself…”) which sounds a lot like Gwyneth Paltrow doing Sylvia Plath (as heard on Wrest’s Lurker of Chalice) but turns out, if I’m not mistaken, to be from this oddity:

Otherwise the invariants of Xasthur’s musical scheme are all in place: dominant minor ninths, repetition, spooky organs, repetition, guitars treated to sound like rusty steel drums, repetition, everything out of tune with everything else. Malefic plays the drums throughout, with a sort of commendable doggedness – he’s clearly not one to give up an instrument just because he’s crap at it (as witness also his continued agonizing over the cello). The titles you could make up for yourself: “obfuscated in oblivion”, “masquerade of incisions”, “maze of oppression”. (Here’s some for the next album: “catacomb of splinters”, “shroud of desolation”, “cipher of emptiness”, “errancy of the void”). It could all hardly be any drearier, although if you play the title track at about twice the speed the vocals take on a sort of campy urgency which is quite enjoyable in its own way.

The System Of The World

Friday, January 23rd, 2009

A partial order is a relation that is:

  • Reflexive: Every element is “before” itself.
  • Transitive: if P is before Q, and Q is before R, then P is before R.
  • Antisymmetrical: If P is before Q, and Q is before P, then P is equal to Q.

A partial order is not a total order, because for any P and Q, it is not necessarily the case that either P is before Q or Q is before P.

We can represent a partial order with a directed, acyclic graph:

diag1

The circled letters in this graph are “elements”, and an arrow between two elements P and Q means that P is before Q. The transitive rule means that if you can get from P to S by going via R, then P is before S – we don’t show the arrow from P to S explicitly, and neither do we show the arrow from each element to itself.

The powerset of a set is the set of all the parts (subsets) of that set; for example, the powerset of {a, b, c} is {{}, {a}, {b}, {c}, {a, b}, {b, c}, {a, c}, {a, b, c}}. The relation “is a part of” over the elements of this powerset is a partial order:

diag2

Now, suppose we assign a (prime) number to each of a, b and c: a=2, b=3, c=5. To each of the sets in the powerset of {a, b, c}, we can then assign a value by multiplying together the values assigned to its elements. The empty set {}=1, {a}=2, {a, b}=a * b = 6, {a, b, c}=a * b * c=30:

diag3

The relation “is a part of” can now be mapped to the relation “is a factor of”: 2 and 3 are factors of 6, 3 and 5 are factors of 15, 3 and 10 are factors of 30 and so on. Note that this wouldn’t have worked if we’d used non-prime numbers to number a, b and c: if we’d numbered them a=2, b=3 and c=4, that would have implied that {a} was a part of {c}, because 2 is a factor of 4.

Let’s call our “universe” set {a, b, c} U, and its powerset P, Now, we can find the complement in U of any set in P by dividing the value of U (30) by the value assigned to that set. For example, the complement in U=30 of {b, c}=15 is {a}=30 / 15 = 2. What’s more, the complement in {b, c}=15 of {c}=5 is {b}=15 / 5 = 3.

What about the intersection of two sets? Let U be {a, b, c, d}, and the “universal value” be 2 * 3 * 5 * 7=210. The intersection of {a, b, c}=30 and {b, c, d}=105 will be the set valued with the product of the common prime factors of 30 and 105, which are 3 and 5. 3 * 5 = 15, the value of {b, c}.

To find the union of {a, b, d}=42 and {a, c}=10, we can use the intersection and complement operations described above. The intersection of 42 and 10 is 2. The complement of 2 in 42 is 21, and the complement of 2 in 10 is 5. If we multiply them all together, we get 2 * 21 * 5 = 210 = {a, b, c, d}.

Now suppose that all we have is a set S containing various elements {a, b, c…}, and a function from S to a set O of values ranging from a minimum value 1 (which numbers the empty set) to some maximum “universal value” (which numbers a set of which all of the elements of S are parts). O is missing some numbers – 4 isn’t in it, and neither is 9 – since all it contains is prime numbers and the sums of sets of prime numbers, and there is no set {2, 2} or {3, 3}.

Given S and the function from S to O, we can say for any two elements of S whether one is a part of the other, whether they overlap at all, whether another element is a part of their overlapping part or not, and so on. We can say – without knowing what is “inside” a, b and c – how they stand in relation to each other, at least as far as relations of inclusion go. We can identify some elements as “primitive”, inasmuch as they are valued with prime numbers, and others as “composite” (although S may not actually contain the “primitive” elements out of which they are composed).

This concludes a toy example of how we can express relationships between objects by assigning them values from a partial order. What Badiou does with Heyting algebras in Logics of Worlds is rather more complex, needless to say. But what may surprise you is that this approach to ordering entities by numbering primitive elements with primes, and their composites with products of primes, was first developed by…wait for it…Leibniz!

Functional SOA

Monday, March 27th, 2006

In the context of Service Oriented Architecture, a service is the conjuncture of an endpoint, a contract and a protocol. The endpoint is where you send messages to, the protocol determines how they get there, and the contract defines the semantics of the communication (understood as message-exchange; although with more complex services, especially those involving asynchronous messaging, questions of choreography begin to arise).

In a statically-typed language with reflection and annotation capabilities (e.g. C#), it is possible to derive all three aspects of a service from an annotated interface: the system can reflect over the interface, read off the annotations, generate appropriate message handlers and service definitions and hook these into a service broker (e.g. a web server that maps URIs to handlers). The interface is already a code object, a part of the language’s type system: the service can be instantiated on the server side by classes that implement that interface, and on the client side by proxy objects that dispatch method calls against that interface to the service endpoint via the specified protocol.

That’s how Microsoft’s .Net framework (and specifically Indigo) works. It makes use of type system information to construct both the plumbing (endpoint and protocol) and the public profile (contract) of a service. In effect, the IDL (interface description language) of systems such as COM and CORBA has migrated into the programming language via the type system: typed and annotated programs are thus self-describing, or at least interpretable by a service framework. Furthermore, this type information is available at run-time, and the type system permits values (objects) to be cast to the universal type (object), such that they can be treated solely as carriers of type information about themselves, and subsequently as targets for the “dynamic dispatch” of method calls. This makes it possible to write generic framework code that will work with any object that is passed to it, code that is solely concerned with the tasks of reflection and method invocation and consequently sits on the periphery of the normal compile-time-type-checked universe.

I’ve commented before on the suitability of pure functional programming languages for Service Oriented Architecture, the closeness of the fit between the functional idiom and SOA’s stateless façades. It should be evident, however, that the .Net approach will not work well in a language like Haskell, because no escape to the periphery of Haskell’s type system is permitted. We can perfectly easily construct a “universal” type in Haskell, and have values of that type carry type information around with themselves; but such a type would be “universal” only for the universe of types that we defined into it in advance (in other words, it would be polymorphic over a given range of types). Other kinds of sophisticated type-hackery are possible, and will give us the illusions of various kinds of freedom; but all involve the creation of sub-systems within the confines of Haskell’s basic type system, and force us to work within those subsystems whenever we need the flexibility they afford. (That’s just what we should expect from Haskell, of course! It’s the same philosophy that motivates the confinement of I/O side-effects to monads…)

I don’t want to dismiss the utility of some of the more hackish approaches to type-flexibility in Haskell, because it’s clear that some of them work very well. But I think that a more idiomatically Haskell-ish approach to SOA would not contort the type system in an attempt to replicate the mechanisms used by .Net, but instead make use of Haskell’s own particular strengths as a language. What I would suggest is that in Haskell the IDL should not be imported into the type system, from whence it would then have to be retrieved via some simulacrum of reflection, but should instead be implemented as combinator-based embedded language. Service definitions would then be values within the type system, constructed using this embedded language, that would be used in much the same way as the annotated interfaces obtained by reflection from .Net objects. This would provide the additional benefit of being able not only to inspect and interpret such values at run-time, but also to compose and transform them.

Internet Violence

Tuesday, February 28th, 2006

So, I wrote this script:

import httplib
from time import sleep

def vote():
        conn = httplib.HTTPConnection("www.frontpagemag.com")
        conn.request("GET", "/survey/vote.asp")
        print "Cookie monster want cookie!"
        r1 = conn.getresponse()
        cookie = r1.getheader('set-cookie')
        cookie = cookie[:cookie.find(';')]
        conn.close()
        conn = httplib.HTTPConnection("www.frontpagemag.com")
        conn.request("POST", "/survey/Response.asp", "profVote=46&VOTE=Vote+Now%21",
                     {'Content-Type': 'application/x-www-form-urlencoded',
                     'User-Agent':'Cookie Monster',
                     'Cookie': cookie})
        r1 = conn.getresponse()
        conn.close()
        print "Cookie monster cast vote now!"
	sleep(5)

while 1:
    vote()

What does it do? Every 5 seconds, it adds another vote for Michael Bérubé at attention-seeking gimp-boy David Horowitz’s poll of America’s Worst Professor over at FrontPage magazine. Note that I have nothing against Michael Bérubé personally; quite the reverse. He asked readers of his weblog to vote early and often for him, in the hope of beating such luminaries as Top! Public! Intellectual! Noam Chomsky and professional Native American impersonator Ward Churchill. I got a bit tired with hitting the back button and deleting cookies from my browser, so I decided to automate the process.

Horowitz’s grim parody of a McCarthyite hate-site slowed to a crawl on Monday; he’s blaming “leftwing vandals” for what amounts to a slashdotting of the votes page by a combination of manual refreshing, browser-based Javascript-driven automated vote-stuffing, and possibly the odd instance of an earlier version of the bot above (which, I confess, didn’t have the decency to wait a few seconds between requests). I would imagine that the biggest culprit was browsers refreshing the results page, as doing so involves downloading not only the HTML of the current leaderboard but also a whole slew of other rather more bandwidth-intensive crap: cartoons of Dick Cheney and so on.

In a way, this is reassuring: the failure of the FrontPage site to deal with such a temporary rise in popularity indicates that its regular readership must be fairly tiny. Still, it must be said: Horowitz’s inability, or refusal, to distinguish between a few japing academics hitting the refresh button on their browsers and a malicious and co-ordinated DDos attack is entirely unsurprising, as is the tone of righteous outrage he feigns in his denunciation of the “leftwing vandals” who took such hearty and appropriate delight in buggering about with his creepy and idiotic poll.

Numbers in Haskell

Monday, January 16th, 2006
import List
import Random

shuffleWith :: IO [Int]
shuffleWith = randomIO >>= seed -> return $ randoms (mkStdGen seed)

shuffle :: (Ord a) => [a] -> IO [a]
shuffle xs = shuffleWith >>= nums -> return $ (map snd . sort . zip nums) xs

getInt :: IO Int
getInt = do line <- getLine
            return (read line :: Int)

play :: [Int] -> Int -> IO ()
play numbers steps = do
	putStrLn $ (unwords . map show) numbers
	putStr "Reverse how many? "
	flipCount <- getInt
	let numbers' = (reverse . take flipCount) numbers ++ (drop flipCount numbers)
	if numbers' == sort numbers
		then putStrLn $ "Done! That took you " ++ (show steps) ++ " steps."
		else play numbers' (steps + 1)

main :: IO ()
main = do
	numbers < - shuffle [1..9]
	play numbers 1

Tripoli in Java

Sunday, January 8th, 2006

My Javacization of Tripoli proceeds apace. A snapshot of the source, including JUnit tests, can be grabbed from http://codepoetics.com/code/tripoli.tar.gz

Tripoli is an attempt to implement a “triple space” – basically a triple store with tuple space semantics. Browsing the tests is probably the quickest way to find out what it’s meant to do, but the general idea is that it provides some primitives for co-ordinating concurrent processes, based on a sort of shared “whiteboard” plus a notification mechanism that allows processes to wait for triples that match a particular pattern to be added to the triple space.

By restricting the tuples that can be added to the shared space to triples, we can introduce some extra semantics in the form of the collectGraph and copyCollectGraph operations, which treat each triple (V,E,V) as a description of an edge between two vertices of a directed, labelled graph and retrieve all of the vertices that are reachable from a given starting point. This enables the representation and querying of RDF-like data structures.

Java/Tripoli

Friday, December 30th, 2005

I’m translating Tripoli into Java, for the hell of it. Well, OK, not just for the hell of it. It’s time I did some Java programming – since I know C# very well, I arguably already know Java quite well, but it’s always good to have some working code to back up an argument like that. No use swanning around saying “I code in Haskell, so by definition I completely pwn every mainstream programming language”, even if that’s essentially true. pwnage must be ostensible. (And, p.s., do not try this argument on C++ mavens, unless you want to spend a very uncomfortable half hour fielding trick questions about copy constructor semantics).

Writing Java with Eclipse on Linux is a pretty good experience, particularly with integrated JUnit support – I’m cranking out test cases at a reasonable speed. TDD feels more natural, more sensible, every time I try it. Must get to grips with HUnit some time soon…

Postgres / pgadmin3

Wednesday, November 30th, 2005

Had a first serious go today at building a database schema in Postgresql using pgadmin3. It’s a fine tool, although I suspect I might go faster writing CREATE TABLE statements by hand once I get used to Postgres’ flavour of SQL.

What I’m looking at right now is a variant on the table-of-triples approach to sparse and latently-structured metadata. In the table-of-triples approach, you represent the attributes of each item by asserting a series of subject-predicate-object triples with the item as the subject, e.g.:

subject     predicate    object
------------------------------------------------
dominic    likes             sausages
dominic    hates           mashed potato

In the approach I’m considering, you have one table containing predicate-object pairs, each of which has an auto-incrementing integer alias, and another table containing subject-alias(predicate-object) pairs, e.g.

property_id   predicate   object
----------------------------------------------------
1                     likes           sausages
2                     hates         mashed potato
subject    property_id
------------------------------------
dominic   1
dominic   2

The point of this is to eliminate some redundancy (assuming that certain predicate-object pairs occur frequently, e.g. lots of people like sausages, or have the surname “Smith”). It also allows us to split a search into two parts – first find the property_id for a given property, then find the subjects that have that property – which means that the results of the first part of the search can be cached, speeding up repeated queries with the same criteria.

Electronic records management

Monday, November 28th, 2005

I’ve been thinking about what a free (libre) electronic record management system (ERMS) would look like, whose needs it would address and how it would be put together.

Record management is a complex area, but a certain amount of the complexity of electronic systems is in my view unnecessary – it’s a by-product of attempting to transpose the methodology of paper-based record management into an electronic software product.

There is no need, for example, to have every record located within a monolithic hierarchical file plan. Electronic records still need to be organised and classified, in quite a variety of ways, but as they no longer have a physical location as such, it’s no longer necessary for that classification to encode information about where you have to go in order to find them.

Already in ERMSes that use a hierarchical fileplan it’s quite common to have a record “located” in several places at once (or cross-referenced between multiple locations). Something like Google mail’s system of applying tags to records, rather than “filing them away” in “folders”, seems ultimately more appropriate to this reality. A sufficiently flexible query engine would allow the user to traverse any of several overlapping classificatory schemes in search of related material, providing multiple views (some of which might indeed be hierarchical) onto a multidimensional grid of associations.

A user searching for related information should be able to pull on any of several threads criss-crossing through a particular record, and see what else is strung on each thread or where any particular pair of threads cross again elsewhere.

A user looking for a group of records with something in common should be able to identify that something unambiguously without having to refer to a monolithic hierarchical fileplan that creates ambiguities whenever some new piece of information has to be slotted into it that doesn’t quite fit into any of its little boxes – does it belong over here, or over here…?

A user capturing some new piece of information and creating a new record should be able to give it a partial classification that captures some, but not all, of its salient characteristics, such that this classification can be extended over time as other users discover new contexts in which that record can be seen as pertinent.

Oh, and all of this has to be realised without creating a hideously complex and completely unmanageable mess…

Hobby update

Tuesday, October 4th, 2005

Not much to report, except that I’ve added some rudimentary error handling and changed the Makefile so that it actually builds the test app. Now when we fail to log in, we see the actual error message returned from libobby.

The two big things I need to do next are i) build libobby 0.3.0-dev, and ii) figure out what to do with libsigc++, so I can handle the arrival of the server’s “welcome” packet and time the login appropriately.

Hobby: a Haskell Wrapper for Obby

Monday, September 26th, 2005

Obby is a C++ library (libobby) that provides synced document buffers – essentially, buffers that can be shared between multiple clients over a network, in such a way that changes to the buffer made by one client are immediately propagated to all of the other clients. It is used by the extremely cool collaborative editor Gobby.

Hobby is an attempt to create a pure C API for libobby, and to make the functions in that API available to Haskell programs via Haskell’s FFI. It uses c2hs to generate Haskell bindings based on the header files of C code. If you have darcs, you can obtain the current source of Hobby via

darcs get http://www.codepoetics.com/hobby

The code currently in the repo simply sets up and tests a toolchain that consists, at the moment, of GNU make, g++, sed, c2hs and ghc. Makefiles control the build; g++ creates the C API object files; sed converts C++ header files to C header files by stripping out the “extern C” declarations; c2hs creates the Haskell FFI bindings for the C header files, and ghc compiles them.

The next stage in the project is to wrap as much of the ClientBuffer object code as possible, and see if we can get some working Haskell code to connect to a running Gobby session.

Gnome goodness

Friday, September 2nd, 2005

I built GTK+ 2.8.3 from sources yesterday, and am now looking at a Gnome desktop of considerable visual elegance and charm. Trouble is, up until yesterday I was using KDE exclusively, so I have no idea whether the GTK+ upgrade is at all responsible for the graphical quality I now see in front of me. Either way, the font rendering in particular is just lovely. It’s almost as nice to look at as OS X running on my dad’s Mac.

The reason for building the latest-and-greatest GTK+ wasn’t in any case to make Gnome look pretty, but so I could check out Paolo Martini’s addition of Cairo support to Gtk2Hs.. Cairo is the 2D vector graphics library that will soon be drawing everything you see on your Gnome desktop, potentially via an OpenGL back-end. It’s a technology I was excited about, but skeptical of ever seeing in action, two or three years ago – now it’s nearly cooked, and the Linux desktop is about to take another big step forwards.

Things I Can No Longer Be Bothered With

Monday, August 22nd, 2005

I can no longer be bothered with UML. It no longer matters to me whether or not I have Visio installed on my Windows desktop, or whether it has a nice set of class diagram templates I can use. I won’t use them even if it is, and has. Nobody else around me uses UML, except in the most informal way for back-of-an-envelope diagrams, and that’s fine by me. If the day arrives when I’m surrounded by people all using UML very professionally to communicate with each other, I’m sure I can catch up.

I can no longer be bothered with Twisted. Everything those guys say about threads is wrong, the moment you start working in an environment where threads are lightweight and the concurrency model is no longer shared state + mutexes. Admittedly Python isn’t that environment (although Stackless could be). But what that really means is that Python isn’t the tool for the job (of building high-performance network servers), rather than that one should add the burden of working with a frankly bogglesome programming model to all the other worries one has when trying to make software of that kind. Also, it bugs me that the Twisted guys don’t want to see that there are alternatives to shared state concurrency, and base a large part of their objection to threaded programming on the known deficiencies of the shared state model. I know of a good book they all should go away and read.

I can only intermittently be bothered with Python itself, although it’s still the language I’m most comfortable with. I’m considering migrating Tripoli to Haskell, and maybe adding a parser and execution engine for an RDF query language to it. On the other hand, I’m not sure I can really be bothered with RDF any more. Although I still think triplespaces are a neat idea.

I can hardly be bothered with anything published by O’Reilly any more. I’d still want the next edition of the Javascript book, and maybe a couple of other reference texts – say, on Postgres’s PL/SQL – but I’m frustrated with the absence of decent primers for non-mainstream languages in their catalog. Where’s Programming Haskell? The Oz Cookbook?

I can’t be bothered with LAMP the way I used to be. As a web application stack, it’s effective enough, but it’s a bit of an evolutionary dead end. PHP5 isn’t magically not-crap just because it has objects. MySQL had the wrong sorts of beginnings in life to ever mature into a decent database. Linux and Apache I’m more contented with, because Linux is evolving and Apache essentially doesn’t need to (“HTTPD, HTTPD, / I know I am, I’m sure I am, I’m HTTPD…”). There is an increasingly broad range of alternatives out there – Ruby on Rails is the current lead attention-getter – and that’s a good thing. Maybe in a few years everyone will be doing LYME (Linux, YAWS, Mnesia, Erlang)…

Python List Monad

Thursday, August 18th, 2005

Another Python Cookbook Recipe, translating a little something from the world of Haskell into Python.

Ugly! Ugly! Ugly!

Thursday, July 28th, 2005
HRESULT IShellExtInit::Initialize (
    LPCITEMIDLIST pidlFolder,
    LPDATAOBJECT pDataObj,
    HKEY hProgID );

Piddlefolder indeed.

The notational conventions used in MFC (and, here, ATL) programming do convey useful information about the values being referred to, but they convey it in such an utterly graceless way that I think I’d rather not know.

class ShellExtInit a where
   Initialize :: a -> IORef [Item] -> IORef DataObject -> ProgId -> IO ResultCode

That’s better.

Tripoli on SourceForge

Thursday, May 19th, 2005

Tripoli now has a SourceForge project page.

The multiple-readers-one-writer locking class from the synchronization module was adapted and improved by Matthew Scott and turned into a handy Python Cookbook recipe.

Part of Matthew’s motivation for writing his own version of this code was that he needed to use it in a non-GPL project, and all of Tripoli is currently licensed under the GPL. I don’t in fact believe that library code – in particular the implementation of generic methods and techniques – requires the protections of the full GPL: once the technique is known, there is little that can be done to it in the way of proprietory modification. So the next release of Tripoli will probably be split into two packages, an application package under the GPL and a library package (including the concurrent Python classes) under the LGPL.

The cookbook recipe has two other significant advantages over my original code: it’s re-entrant, and it’s unit tested. I’ve been working on making my own implementation re-entrant, and creating some appropriate tests for it: these also will be in the next release. My approach to re-entrancy is slightly different to Matthew’s, in that the original non-reentrant class is kept as it is and a re-entrant wrapper provided for it. This makes for a cleaner design, in my view. I’m also looking at some ways of supporting some common patterns in testing multi-threading code.

Once the MrowLock updates are done, I’ll start on a proper HTTP-based Tripoli server – the current XML-RPC version was just a quick hack to get something working.

Code organisation in OpenLaszlo

Sunday, May 15th, 2005

Finally this week it was my turn to ride the OpenLaszlo pony at work. My first priority was to figure out some ways in which large canvas definitions with deeply-nested views liberally soused with javascript code could be factored out into smaller and more manageable modules.

The first trick I tried was to extract any large chunks of javascript code (method bodies, etc.) into named functions in separate files, and import them with a <script> tag with a src attribute. Pretty soon I also had a couple of files with libraries of utility functions – things like setPropertyOnAll(elems, propertyName, value) which would take an array of components and make them all visible, or partially opaque, or whatever else you wanted. Javascript is an under-appreciated language, I often think – people tend to use it just for straight scripting, without realizing the power that its closures and prototype-based object system actually give you.

The next trick was to “flatten out” the source, extracting nested views and menus into class definitions in library files and then importing the libraries and dropping class instances into the places where the views had been. Again, some common patterns fairly quickly emerged – there were lots of buttons that caused a hint to appear elsewhere in the canvas on mouseover and made it disappear again on mouseout, and it was a simple matter to create a new class derived from basebutton with a hint attribute that had this behaviour baked-in.

So some of my initial worries about how OpenLaszlo code would scale were assuaged: it’s clear that you can build a full-scale OpenLaszlo application out of a collection of composable bits and pieces, provided you take care to use the appropriate mechanisms for modularity and code re-use.

Tripoli + XMLRPC = Somewhat Useful?

Saturday, April 30th, 2005

I’ve wrapped the main Tripoli object, the impressively-named Manifold, in an XMLRPC server and created a client for it. The client needs to be connected to a TripleCallbackListener that will receive callbacks from the server when a triple matching a request pattern becomes available. This means that get and read operations are completely asynchronous.

Here’s a quick start guide to using the stuff. First, do this:

from xmlrpc import TripleCallbackListener, ManifoldServer, ManifoldClient

listener = TripleCallbackListener(8023)
server = ManifoldServer(8024)
client = ManifoldClient(server.url, listener)
server.start()
listener.start()

We’ll start by requesting some things that don’t exist yet, so we can see what happens when they do.

def printTriple(triple):
    print triple.subject, triple.predicate, triple.object

client.get("likes", "*", "*", "*", printTriple)
client.get("dislikes", "*", "*", "*", printTriple)

The first parameter is the name of the triple space to query, the next three are a pattern to match (all wildcards here, so any triple will do), and the last is a function to call back when a matching triple becomes available. Ours will just print whatever matching triple is returned.

Now we will create some triples in a triple space called “default”:

client.put("default", "Dominic", "likes", "Python")
client.put("default", "Dominic", "hates", "VB.Net")
client.put("default", "Python", "is", "cool")
client.put("default", "VB.Net", "is", "teh suck")

We’ll copy my likes into one triple space, “likes”, and my dislikes into another, “dislikes”. The pattern here is wildcarded for the subject and object of the triple, but requires an exact match on the predicate.

client.copy("default", "*", "likes", "*", "likes")
client.copy("default", "*", "hates", "*", "dislikes")

Our callbacks got called (hopefully)!

Note that copy returns an integer, which is the number of tuples copied. Now we’ll copy-collect the whole graph into another triple space:

count = client.copy_collect_graph("default", "Dominic", "dominic")

Let’s retrieve those triples:

for i in xrange(count):
    client.get("dominic", "*", "*", "*", printTriple)

That should give you the general idea.

Tripoli + HTTP = Really Useful?

Wednesday, April 27th, 2005

Tripoli needs an HTTP server to be really useful. You could run one on your desktop, and use it to tag files with arbitrary metadata, including their relationship to other files.

Trusted users on other machines in the network could access the server to request copies of those files. You could flag files for the attention of other users, or insert them into workflow processes by giving them a state attribute.

copy, copy-collect, copy-graph and copy-collect-graph could be used to move triples between triplespaces running on different servers.

The other things it needs are persistence and reliability. At the moment all the triples are held in memory, and will vanish when the triplestore server is shut down. Saving a bunch of triples to a file on disk, and restoring them again later, is not hard. Real-time logging to disk of additions and removals would make it possible to restore from the last good save in the event of unplanned shutdown.

The other other thing it needs is a short and simple tutorial somewhere explained what the heck it’s for, and how to use it…