Hulk is an working-and-work-in-progress IRC server. Github repo.
Last Wednesday night I whipped up a simple IRC server in Haskell in about four hours. We have been long time sick of the poor quality of the Skype Linux implementation, which was, on the dev team, our main point of communication. We agreed something like IRC would be good, so I thought it would be easy in Haskell to make such a thing, and it was; the next day we were chatting on it!
I noticed that Peteris Krumins made a blog on Thursday about how to write a TCP server in Haskell. I thought, “That’s good timing!” I also use the Network module and Control.Concurrent, and do pretty much everything he is demonstrating in the blog. So it seems like Hulk is a nice “real-world” demonstration of a non-simple TCP server in Haskell.
Our requirements for a server are quite narrow:
We also pipe some feeds to it like tickets, Git commits, site issues, etc.
In the spirit with which Peteris writes I thought that I might describe the design of the project a little bit.
$ ls
auth history hulk.conf README src
dist hulk.cabal LICENSE Setup.hs txt
It’s good Haskell practice to start any project with
cabal init
which asks you a series of questions and
generates a .cabal
file for you. Common practice is to put
source in the src
dir, and have your Project in a
sub-directory matching the project name:
$ ls src
Control Data GeneratePass.hs Hulk Main.hs
Code that isn’t specific to the particular project but could be used
anywhere should go in appropriate modules such as
Control.*
, Data.*
, etc. It occurs commonly
that you will need this code in other projects and because the
dependency between these modules and your main project’s modules is only
in one direction you can simply copy the files over to your new
project.
Control.Monad.IO
Data.String
Hulk
Hulk.Auth
Hulk.Client
Hulk.Config
Hulk.Event
Hulk.Options
Hulk.Providers
Hulk.Server
Hulk.Types
Main
The first two just contain utilities that I tend to use often. The
Main
module is the main entry point, then control goes to
Hulk.Server
which starts listening on the right port,
accepting connections and handling/sending messages to/from clients.
In order to handle messages and reply to them from clients, the
Hulk.Client
module is used. The code in Hulk.Client is
entirely pure, and it is the bulk of the project. This is an intentional
effort. The original program I whipped up used a bunch of MVars and was
basically an imperative program, and about as confusing.
Another “good practice” is for Haskell programs to be like a well-oiled super villain base. On the edge is where all the explosions happen, and inside is where the bad guys sit and drink Orzo and control everything.
Impure code is like the wreckless henchmen who always wreck everything, and double-cross you at every opportunity. Pure code is the evil genius who devises the master plan, tells the henchmen what to do, and keeps them in separate living quarters.
It’s also common to put all your types into one module named
Types
, as you tend to use types from every module and this
avoids circular dependency problems in the long run.
The main entry point to the project is in Main
, as it
should be:
{-# OPTIONS -Wall #-}
module Main where
import Network
import System.Console.CmdArgs
import System.Posix
import Hulk.Config (getConfig)
import Hulk.Options (options,optionsConf)
import Hulk.Server (start)
import Hulk.Types ()
main :: IO ()
main = withSocketsDo $ do
_ <- installHandler sigPIPE Ignore Nothing
cmdArgs options >>= getConfig . optionsConf >>= start
I initialise the sockets subsystem for Windows and then install a
handler for SIGPIPE
, because that signal is sent in Unix
when a program attempts to write to a socket that has been closed. Both
Windows and Unix have their novel design choices. Go figure.
I’m using the CmdArgs library, tutorial here by the author, Neil Mitchell, which I am pleased is becoming part of my standard project repertoire.
I define my options merely as a way to specify the configuration file, for now.
{-# LANGUAGE DeriveDataTypeable, RecordWildCards, ScopedTypeVariables #-}
{-# OPTIONS -Wall -fno-warn-missing-signatures #-}
module Hulk.Options (Options
,options
,optionsConf) where
import System.Console.CmdArgs
data Options = Options
{ conf :: FilePath
} deriving (Show,Data,Typeable)
options = Options
{ conf = def &= opt "hulk.conf" &= help "The config file."
}
&= summary "Hulk IRC Daemon (C) Chris Done 2011"
&= help "Runs an IRC server based on the provided configuration file."
And I read the config file with the great ConfigFile library:
{-# OPTIONS -Wall -fno-warn-missing-signatures -fno-warn-name-shadowing #-}
module Hulk.Config
(Config(..)
,getConfig)
where
import Data.Word
import Data.ConfigFile
import Hulk.Types
getConfig :: FilePath -> IO Config
getConfig conf = do
contents <- readFile conf
let config = do
c <- readstring emptyCP contents
hostname <- get c "LISTEN" "hostname"
listen <- get c "LISTEN" "port"
motd <- get c "STRINGS" "motd_file"
preface <- get c "STRINGS" "preface_file"
passwd <- get c "AUTH" "passwd_file"
key <- get c "AUTH" "passwd_key"
return Config { configListen = fromIntegral (listen::Word16)
, configMotd = Just motd
, configHostname = hostname
, configPasswd = passwd
, configPasswdKey = key
, configPreface = Just preface
}
case config of
Left cperr -> error $ show cperr
Right config -> return config
The reading process is merely a simple monad that either returns the
Config
object or an error. I choose to just throw an error
when there’s an issue. I use this library for pretty much every project
I use, it really is an essential library.
An alternative way to express the code above is to
runReaderT
, define a function like
get' = ask >>= flip get
and then you can express the
above with Applicative operators (<$>)
and
(<*>)
.
This module will read a file like this:
[LISTEN]
port = 6667
hostname = cn-done
[STRINGS]
motd_file = txt/MOTD
preface_file = txt/PREFACE
[AUTH]
passwd_file = auth/passwd
I start the server by using the listenOn
function,
covered in Peteris’s post, and accept connections, setting the buffering
to NoBuffering
. This turns out to be rather important; as
Peteris mentions, this avoids surprises with buffering, which is
something I experienced when testing out LineBuffering
in
this project. In certain situations unknown to me, access to handles
locks up.
-- | Start an IRC server with the given configuration.
start :: Config -> IO ()
start config = withSocketsDo $ do
hSetBuffering stdout LineBuffering
listenSock <- listenOn $ PortNumber (configListen config)
envar <- newMVar Env { envClients = M.empty
, envNicks = M.empty
, envChannels = M.empty }
forever $ do
(handle,host,_port) <- accept listenSock
hSetBuffering handle NoBuffering
let conn = Conn { connRef = newRef handle
, connHostname = host
, connServerName = configHostname config
}
_ <- forkIO $ handleClient config handle envar conn
return ()
I fork a new thread per handle. No big deal. I have one value,
envar
, of type MVar Env
, which stores the
state of the whole server. It can only be accessed by one thread at a
time, that’s why I put it on an MVar
. The definition of
Env
is:
data Env = Env {
envClients :: Map Ref Client
,envNicks :: Map Nick Ref
,envChannels :: Map ChannelName Channel
}
where
newtype Ref = Ref { unRef :: Handle }
deriving (Show,Eq)
Ref
is merely a wrapper for Handles, to avoid me
accidentally using a handle. It is only used as a unique reference to a
client.
In the new thread I have a handler which receives newlines from the client:
-- | Handle a client connection.
handleClient :: Config -> Handle -> MVar Env -> Conn -> IO ()
handleClient config handle env conn = do
let runHandle = runClientHandler config env handle conn
runHandle $ makeLine CONNECT []
fix $ \loop -> do
line <- catch (Right <$> UTF8.hGetLine handle) (return . Left)
case filter (not.newline) <$> line of
Right [] -> loop
Right line -> do runHandle (line++"\r"); loop
Left _err -> runHandle $ makeLine DISCONNECT ["Connection lost."]
where newline c = c=='\n' || c=='\r'
I get a line which is Right, or fail and return what’s Left. The case
of getLine failing is when the socket is closed. I ignore messages only
containing newline characters, and the middle case is actually getting a
valid line which I pass to runHandle
that runs the pure
client handler, then loop
s again.
To run the client handler, I have the following function:
-- | Handle a received line from the client.
runClientHandler :: Config -> MVar Env -> Handle -> Conn -> String -> IO ()
runClientHandler config env handle conn line = do
modifyMVar_ env $ \env -> do
(replies,env) <- runReaderT (runHulkIO $ handleLine env conn line) config
mapM_ (handleReplies handle) replies
return env
It passes the program state (env
) and the current
connection info (conn
) to the function
handleLine
, which is the single export from
Hulk.Client
, which is a transformer over an arbitrary
monad. Technically, in this case I’m running it inside a
readerT
on IO
, so it’s not actually pure. The
handleLine
action returns a bunch of replies/instructions
for the Server
module to perform and a new state
(env
).
When I said that the Hulk.Client
module was pure, I
meant that it is abstracted over whether it is pure or impure, and
therefore can be treated as pure for testing and developing, and when
running the server, runs in IO, but only 0.1% of the code uses IO. Also,
when I said “arbitrary monad”, I meant any monad implementing the
MonadProvider
class.
class Monad m => MonadProvider m where
providePreface :: m (Maybe String)
provideMotd :: m (Maybe String)
provideKey :: m String
providePasswords :: m String
Meaning that these are the only “impure” things I need when running
the program. I need to read the preface, motd, key, and password files
on demand. In the IO
case, I simply read the file. In the
pure case, I can stick it in a Reader
or
Identity
monad and the whole computation is thus pure.
What’s the benefit? This means I can run arbitrary parts of the computation trivially, and make pure test suites out of it. QuickCheck my IRCd, anyone? The main benefits are not to have to worry about conflicting simultaneous threads, and being able to run any function from the module with whatever state one desires.
The Client
module replies with one of the following:
data Reply = MessageReply Ref Message | LogReply String | Close
MessageReply
: Send this Message
to the
given handle (Ref
).LogReply
: Log this String
.Close
: Close the current connection.I find this separation of IO and logic to be useful.
The rest of the project lies in Hulk.Client
and is
academic/straight-forward. I will explain the IRC
monad,
though:
newtype IRC m a = IRC {
runIRC :: ReaderT Conn (WriterT [Reply] (StateT Env m)) a
}
deriving (Monad
,Functor
,MonadWriter [Reply]
,MonadState Env
,MonadReader Conn)
MonadWriter [Reply]
.Conn
), but I
don’t/shouldn’t modify it or write to it.Env
, which is the whole
server state.Client
constrain to
MonadProvider
.This is called a monad transformer stack. Haskell Wikibook on transformers
Example of a MessageReply
:
-- | Send a message reply.
reply :: Monad m => Ref -> Message -> IRC m ()
reply ref msg = do
outgoing $ encode msg
tell . return $ MessageReply ref msg
Examples of LogReply
:
-- | Log an outgoing line.
outgoing :: Monad m => String -> IRC m ()
outgoing = log . ("-> " ++)
-- | Log a line.
log :: Monad m => String -> IRC m ()
log = tell . return . LogReply
Example of the StateT
:
-- | Modify the nicks mapping.
modifyNicks :: Monad m => (Map Nick Ref -> Map Nick Ref) -> IRC m ()
modifyNicks f = modify $ \env -> env { envNicks = f (envNicks env) }
Example of using the Conn
object from the
ReaderT
:
-- | Make a new IRC message from the server.
newServerMsg :: Monad m => String -> [String] -> IRC m Message
newServerMsg cmd ps = do
hostname <- asks connServerName
return $ Message {
msg_prefix = Just $ Server hostname
,msg_command = cmd
,msg_params = ps
}
That’s all, folks! I hope this is useful to some people thinking of writing their first Haskell daemon project.
Haskell is the only language I know in which I can write 400~ lines of code without running it and then run it and have it work as expected.