Hector: A Literate Haskell Implementation of ELIZA

Monday, 09 November 2009Tags: haskell, AI, code

Hector is a Literate Haskell implementation of Weizenbaum's ELIZA program.

ELIZA is more or less the original “chatterbot”. Implemented by Weizenbaum in the mid-1960s to demonstrate how seemingly “intelligent” behaviour could be mimicked by very simple pattern-matching, it shocked even him with how much ordinary people were taken in by the illusion. It is probably best known in conjunction with its DOCTOR script, which allows it to act like a Rogerian psychotherapist (a version of this can be invoked in the Emacs text editor with “M-x doctor”). One user was reportedly so affected by the program that he requested that Weizenbaum give him some privacy to discuss his issues with the doctor!

The program works by simple pattern matching. The user's input is checked against a list of pre-defined patterns, and if there's a match, a corresponding output is returned. Eliza also has some facilities for extracting content from the input and using it in the reply (e.g. “I need a vacation” might result in “How would you feel if you got a vacation?”). The final bit of finesse is a rudimentary memory, so that it can bring up past topics again if it doesn't know what to say (Hector does not do this).

Hector is a Haskell reimplementation of a very similar system. It is written in a “literate” style, and should be simple to follow along with. Hector is not a doctor like Eliza — instead, he's more of a grizzled old programming veteran, having got his start feeding punchcards into a mainframe in his undergraduate years. He can't help you with relationships or inferiority complexes, but he'll be happy to discuss your programming problems with you.

A few sample conversations (My responses prefixed with >):

Hello there. What's the matter?
> I don't know how to write programs in Haskell.
Does anyone have a reference on how to write programs in Haskell?
> Yes, there are lots on the internet.
Oh Internet... a thousand monkeys and associated typewriters, eh?
> More or less, yeah.
At least we agree on something!
Hello there. What's the matter?
> I have the flu.
How long have you had the flu?
> Four days.
I'm pretty sure Dijkstra has a paper that would help you.
> With the flu? I doubt it.
Hello there. What's the matter?
> My regular expressions are wrong.
Regular Expressions... Now you have two problems.
> That's not very helpful.
So, have you read Knuth's books?
> No.
Well, I don't know what to say, then.

I am by no means knowledgeable about Haskell: this program is just the best efforts of a beginner, and I'd be very happy to hear feedback on how to make it more idiomatic or expressive. In particular, the PRNG seed passing strikes me as a kludge, and I suspect there's a nicer way to handle the IO aspects, for example with interact and lines.

The code below covers the interesting parts of the program, but the actual input/response pairs have been left out for conciseness (and because they're largely boring). You can also view the full source.

A short list of references about Haskell and AI history/programming can be found at the end of the post. More can be found by looking at the “related items” on the linked-to pages.

The Code

We're going to use regular expressions for our pattern matching, and select responses randomly from the set of available responses to that input, so we'll need some imports.


import Data.List
import System.Random
import Text.Regex

The main loop of the program is quite simple: we just keep prompting the user for input and outputting a response until we get an EOF. Before we enter this loop, we'll just print a short greeting. Each iteration of the loop we'll take advantage of the system random number generator to create a new generator for use in picking our output.


main :: IO ()
main = do
  putStrLn "Hello there. What's the matter?"
  mainloop

mainloop :: IO ()
mainloop = do
  input <- getLine
  seed <- randomIO
  putStrLn $ respond input (mkStdGen seed)
  mainloop

The format we'll use for our canned input/response pairs is simple. Each possible input is a regular expression, originally entered as a string. Before use, we'll do some gentle massaging to get it into the Regex type our libraries want. Every input/response pair actually consists of many different inputs and responses: any input in the pair can cause any of the responses to be output. In other words, if any of the inputs in a pair match, one of the outputs in the pair will be returned.


type RawIRPair = ([String], [String])
type IRPair = ([Regex], [String])

Our actual input/response pairs are defined at the end of the program, since they're very bulky and largely uninteresting. They're grouped into categories, so we'll need to put all of them together into one big list.


combinedResponses :: [RawIRPair]
combinedResponses = foldl' (++) [] [general, tech, nonSequitur]

While we're at it, let's convert our input strings into Regexes — this is the final form that we'll actually use. Note that we're making the regular expressions case-insensitive.


responses :: [IRPair]
responses = regexify combinedResponses
  where regexify :: [RawIRPair] -> [IRPair]
        regexify []     = []
        regexify (x:xs) =
          [(map (\i -> mkRegexWithOpts i True False) (fst x), snd x)] ++
          regexify xs

Here's where the meat of the program happens: the process of selecting a response. Essentially, we just find the list of responses that match our input, and pick one at random.


respond :: RandomGen g => String -> g -> String
respond input randGen = pickResponse (findMatchingResponses input) randGen

The process of picking the response at random is also straightforward. Once it's chosen at random, we run it through a function that substitutes any captured subgroups in place of patterns of the form !0, !1, etc.


pickResponse :: RandomGen g => ([String], [String]) -> g -> String
pickResponse (responses, substrings) randGen =
  fillIn (responses !! chosen) substrings
  where chosen = (fst $ randomR (0, length responses - 1) randGen :: Int)

fillIn :: String -> [String] -> String
fillIn response substrings = fillInCount response substrings 0

fillInCount :: String -> [String] -> Int -> String
fillInCount response (x:xs) n =
  fillInCount (subRegex (mkRegex ("!" ++ show n)) response x) xs (n+1)
fillInCount response [] _ = response

Finding the list of matching responses, however, is a little trickier, largely due to the deep nesting that we need to flatten out before we get a chance to actually try to match our regular expressions. Along with the list of matching potential responses, we return the list of matching substrings (if any).


findMatchingResponses :: String -> ([String], [String])
findMatchingResponses input = checkListOfTuples input responses
  where checkListOfTuples input (x:xs) =
           case (checkTuple input x) of
             ([], []) -> checkListOfTuples input xs
             (responses, substrings) -> (responses, substrings)

checkTuple :: String -> IRPair -> ([String], [String])
checkTuple input (i:is, r) =
  case (matchRegex i input) of
    Just a -> (r, a)
    Nothing -> checkTuple input (is, r)
checkTuple _ _ = ([], [])

Finally, we need to give Hector something to say. We do this by creating the list of input/response pairs references in our list-flattening function above. I've left this out of the post because it's largely uninteresting, but it looks something like this:


general :: [RawIRPair]
general = [
  (["hello"], ["Hi. What's going on?", "Yes, hi. Tell me your problem."]),
  (["I need ([a-zA-Z ]*)", "I want ([a-zA-Z ]*)"],
   ["Well, do you think it's at all possible to get !0?",
    "What would change if you got !0?"])]

References

Paradigms of Artificial Intelligence Programming (Amazon.com, Amazon.ca)
Real World Haskell (Amazon.com, Amazon.ca, free online)
Programming in Haskell (Amazon.com, Amazon.ca)

Extensions

There are a few niceties that Hector doesn't implement, that can be considered the infamous “exercises left to the reader”. These include: memory of already-used phrases, memory of previous topics, a much more extensive and useful set of input/response pairs, and actual sentience and intelligence.

ELIZA implements the first three.

Felix Crux