Attempto Controlled English is a formally defined unambiguous language which is a subset of the English language. It’s pretty sweet.
I’ve known about it for some time, but I never fiddled with it because the standard implementation setup is rather elaborate. I wanted a nice, simple package in Haskell which would define a parser and a printer only, much like haskell-src-exts does. That way I can use ACE to parse some simple English for all sorts of purposes[1], with a simple familiar API that I can peruse on Hackage. Partly it’s also a good learning experience.
So I went through the paper The Syntax of Attempto Controlled English to see whether it was comprehensive enough to write a Parsec parser from. It was! I first wrote a tokenizer with Attoparsec and wrote some tests. From those tokens I produced a set of combinators for Parsec, then I wrote a parser. While writing the parser I produced a set of test cases for each grammar production. Finally, I wrote a pretty printer, and wrote some tests to check that print . parse . print . parse = id.
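To make the round-trip idea concrete, here is a minimal sketch using only base. The names parseSentence, printSentence, normalize and roundTrips are hypothetical stand-ins for illustration, not the library's API; the toy "grammar" is just whitespace-separated words. The point is that once text has been through print . parse, doing it again should change nothing.

```haskell
-- Hypothetical stand-in for an AST: a sentence is a non-empty list of words.
newtype Sentence = Sentence [String]
  deriving (Eq, Show)

-- Toy parser (assumption: any non-empty run of words is a valid sentence).
parseSentence :: String -> Maybe Sentence
parseSentence s = case words s of
  [] -> Nothing
  ws -> Just (Sentence ws)

-- Toy pretty printer: words separated by single spaces.
printSentence :: Sentence -> String
printSentence (Sentence ws) = unwords ws

-- print . parse, with parse failure kept in Maybe.
normalize :: String -> Maybe String
normalize = fmap printSentence . parseSentence

-- The round-trip check: printed output must parse, and printing it
-- again must give back exactly the same text.
roundTrips :: String -> Bool
roundTrips s = case normalize s of
  Nothing   -> False
  Just once -> normalize once == Just once
```

A property-testing library could feed generated inputs to roundTrips; here it is left as a plain predicate so the sketch has no dependencies.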
Newbies to Haskell parsing might find it an interesting use case, because it tokenizes with Attoparsec (from Text) and then parses its own token type (Token) with Parsec. A common difficulty is working out how to avoid parsing from String in Parsec, which is what most tutorials use for their demonstrations.
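The two-stage setup described above can be sketched roughly as follows. This is not the library's code: the Tok type, the toy grammar and every function name here are assumptions made up for illustration, and it assumes the attoparsec, parsec and text packages are available. Attoparsec turns Text into a token list, and Parsec then runs over that list via tokenPrim rather than over String.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import           Control.Applicative ((<|>), many)
import           Data.Attoparsec.Text (Parser)
import qualified Data.Attoparsec.Text as A
import           Data.Char (isAlpha)
import           Data.Text (Text)
import qualified Text.Parsec as P

-- Hypothetical token type, far smaller than a real ACE token type.
data Tok
  = Word Text
  | Number Integer
  | Period
  deriving (Eq, Show)

-- Stage 1: Attoparsec turns Text into a list of tokens.
tokenizer :: Parser [Tok]
tokenizer = A.skipSpace *> many (tok <* A.skipSpace) <* A.endOfInput
  where
    tok = Word <$> A.takeWhile1 isAlpha
      <|> Number <$> A.decimal
      <|> Period <$ A.char '.'

-- Stage 2: Parsec consumes [Tok] instead of String.
type TokParser = P.Parsec [Tok] ()

-- Accept one token via a partial function. The source position is left
-- unchanged because this sketch's tokens carry no location information.
satisfyTok :: (Tok -> Maybe a) -> TokParser a
satisfyTok = P.tokenPrim show (\pos _ _ -> pos)

-- Match one specific word.
word :: Text -> TokParser ()
word w = satisfyTok (\t -> if t == Word w then Just () else Nothing)

-- Match any word and return its text.
anyWord :: TokParser Text
anyWord = satisfyTok (\t -> case t of
  Word w -> Just w
  _      -> Nothing)

period :: TokParser ()
period = satisfyTok (\t -> if t == Period then Just () else Nothing)

-- Toy grammar (assumption): "a <word> <word>."
data Simple = Simple Text Text
  deriving (Eq, Show)

simple :: TokParser Simple
simple = Simple <$> (word "a" *> anyWord) <*> anyWord <* period <* P.eof

-- Run both stages, reporting either stage's failure as a String.
parseSimple :: Text -> Either String Simple
parseSimple input = do
  toks <- A.parseOnly tokenizer input
  either (Left . show) Right (P.parse simple "<input>" toks)
```

With these definitions, parseSimple "a dog barks." yields Right (Simple "dog" "barks"). A real token type would also record source positions so that the position-update function passed to tokenPrim can report useful error locations.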
The Hackage package is here. I find the documentation interesting to browse. I tried to include helpful examples for the production rules. You shouldn’t have to know syntax theory to use this library.
Here is an ACE sample. We can parse the sentence “a <noun> <intrans-verb>” like this:
λ> parsed specification "a <noun> <intrans-verb>."
Right (Specification (SentenceCoord (SentenceCoord_1 (SentenceCoord_2
SentenceCoord_3 (TopicalizedSentenceComposite (CompositeSentence
(Sentence (NPCoordUnmarked (UnmarkedNPCoord (NP (SpecifyDeterminer A)
(N' Nothing (N "<noun>") Nothing Nothing Nothing)) Nothing))
(VPCoordVP (VP (V' Nothing (ComplVIV (IntransitiveV "<intrans-verb>"))
(Nothing) Nothing) Nothing) Nothing) Nothing) []))))))
Anything to do with vocabulary is written as <foo>. The parser actually takes a record of parsers so that you can provide your own parsers for each type of word. These words are not of interest to the grammar, and your particular domain might support different types of words.
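The record-of-parsers idea might look something like the sketch below. The Vocabulary record, its field names and the toy sentence grammar are all hypothetical, chosen for illustration rather than taken from the package; tokens are plain Text values to keep it short, and the parsec and text packages are assumed. The grammar is written once and the vocabulary is swapped in as an argument.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import           Data.Text (Text)
import qualified Text.Parsec as P

-- Parsers over a list of Text tokens (a simplification for this sketch).
type TokParser = P.Parsec [Text] ()

-- Hypothetical record of vocabulary parsers, one field per word class.
data Vocabulary = Vocabulary
  { vocabNoun        :: TokParser Text
  , vocabIntransVerb :: TokParser Text
  }

-- Accept any token from a fixed word list.
oneOfWords :: [Text] -> TokParser Text
oneOfWords ws =
  P.tokenPrim show (\pos _ _ -> pos) (\t -> if t `elem` ws then Just t else Nothing)

-- Accept exactly one given token.
exact :: Text -> TokParser Text
exact w = oneOfWords [w]

-- Placeholder vocabulary in the style of the <foo> markers.
placeholderVocab :: Vocabulary
placeholderVocab = Vocabulary
  { vocabNoun        = exact "<noun>"
  , vocabIntransVerb = exact "<intrans-verb>"
  }

-- A made-up domain vocabulary.
petVocab :: Vocabulary
petVocab = Vocabulary
  { vocabNoun        = oneOfWords ["dog", "cat"]
  , vocabIntransVerb = oneOfWords ["barks", "sleeps"]
  }

-- The grammar is fixed; only the word parsers vary.
sentence :: Vocabulary -> TokParser (Text, Text)
sentence v =
  (,) <$> (exact "a" *> vocabNoun v) <*> vocabIntransVerb v <* exact "." <* P.eof

run :: Vocabulary -> [Text] -> Either P.ParseError (Text, Text)
run v = P.parse (sentence v) "<tokens>"
```

Here run petVocab ["a", "dog", "barks", "."] succeeds with ("dog", "barks"), while the same tokens fail under placeholderVocab, which only accepts the literal markers.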
If we pretty print the parsed phrase, we get:
λ> fmap pretty (parsed specification "a <noun> <intrans-verb>.")
Right "a <noun> <intrans-verb>."
That is, we get back what we put in. I also wrote an HTML printer. A more complicated sentence demonstrates the output:
for each <noun> <var> if a <noun> that <trans-verb> some <noun> and <proper-name>’s <noun> <trans-verb> 2 <noun> then some <noun> <intrans-verb> and some <noun> <distrans-verb> a <intrans-adj> <noun> <proper-name>’s <noun> <adverb>.
It can be printed with
fmap (renderHtml . toMarkup) . parsed specification
and the output is:
for each <noun> <var> if a <noun> that <trans-verb> some <noun> and <proper-name>'s <noun> <trans-verb> 2 <noun> then some <noun> <intrans-verb> and some <noun> <distrans-verb> a <intrans-adj> <noun> <proper-name>'s <noun> <adverb>.
The colors and parenthesizing embellishments are just to demonstrate what can be done. I’m not sure this output would actually be readable in practice.
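For readers unfamiliar with the renderHtml . toMarkup pattern, here is a small sketch of how an AST can be given a ToMarkup instance with blaze-html. The Term and Phrase types and the CSS class names are invented for this example and say nothing about the markup the library actually emits; it assumes the blaze-html and blaze-markup packages.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import           Data.List (intersperse)
import           Text.Blaze (ToMarkup (..))
import           Text.Blaze.Html.Renderer.String (renderHtml)
import           Text.Blaze.Html5 ((!))
import qualified Text.Blaze.Html5 as H
import qualified Text.Blaze.Html5.Attributes as A

-- Hypothetical toy AST: a phrase is a list of tagged words.
data Term = Noun String | Verb String

newtype Phrase = Phrase [Term]

-- Each word class gets its own CSS class, so a stylesheet can color it.
instance ToMarkup Term where
  toMarkup (Noun n) = H.span ! A.class_ "noun" $ H.toHtml n
  toMarkup (Verb v) = H.span ! A.class_ "verb" $ H.toHtml v

-- A phrase is a paragraph of space-separated words.
instance ToMarkup Phrase where
  toMarkup (Phrase ts) = H.p (mconcat (intersperse " " (map toMarkup ts)))

-- Same shape as the composition used in the post.
render :: Phrase -> String
render = renderHtml . toMarkup
```

Keeping the structure in class attributes rather than inline styles means the coloring and bracketing can be changed later in CSS without touching the printer.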
This is a good start. I’m going to leave it for now and come back to it later. The next steps are: (1) write more tests, (2) add feature restrictions and related type information in the AST, (3) add a couple of sample vocabularies, (4) implement the interrogative mood (useful for query programs) and the imperative mood (useful for writing instructions, e.g. text-based games).
[1] Specifically, I want to use this to experiment with translating it to logic-language databases and queries, and from that produce interactive tutorials, and perhaps experiment with a MUD-like game that utilizes it.