commit 6deb50a (2022-03-04 19:57:35 -0500) Torsten Scholak: improve doctests
Tagged as: haskell recursion generics parsing
Posted on Feb 2, 2022
49 min read
The adventure continues in this "Unrecurse" sequel. Previously, we bravely faced turmoil and confusion in a cruel world in which Haskell suddenly stopped supporting recursive function calls. We barely escaped the wrath of the compiler. This time, we try to survive an even more extreme situation: Haskell without recursive data types! It is the ultimate test of our programming skills. Will we make it through the final challenge, or is all hope lost? Join us in this journey about tapes and tribulations.
This is a Literate Haskell essay: Every line of program code in this article has been checked by the Haskell compiler. Every example and property in the Haddock comments has been tested by the doctest tool. I thank the Haskell community for making this possible.
To make this a proper Haskell file, it needs a header. There are several language extensions we need to enable:
{-# LANGUAGE TypeApplications #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE RecordWildCards #-}
{-# LANGUAGE DeriveTraversable #-}
{-# LANGUAGE TemplateHaskell #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE DefaultSignatures #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE TypeOperators #-}
{-# LANGUAGE UndecidableInstances #-}
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE DerivingVia #-}
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
{-# LANGUAGE StandaloneDeriving #-}
{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE InstanceSigs #-}
Nice, this is more looking like your typical fancy Haskell file now. We will also need to import a meager handful of libraries, functions, and types:
module Flattening where
import Control.Applicative (Alternative (empty, (<|>)))
import Control.Lens
Cons (_Cons),
(
cons,
prism,
uncons,
withPrism,
zoom,
_1,
_2,
)import Control.Monad (MonadPlus, mfilter)
import Control.Monad.State
MonadState (get, put),
( StateT (runStateT),
evalStateT,
)import Control.Monad.Trans (MonadTrans (lift))
import Control.Monad.Writer (Writer, execWriter, tell)
import Data.Coerce (coerce)
import Data.Functor.Foldable (Base, Corecursive (embed), Recursive (cata, project))
import Data.Functor.Foldable.TH (makeBaseFunctor)
import Data.Kind (Type)
import Data.Maybe (fromJust)
import Data.Monoid (Sum (..))
import Data.Vector (Vector)
import GHC.Generics
Generic (Rep, from, to),
( K1 (K1, unK1),
M1 (M1, unM1),
U1 (U1),
V1,
type (:*:) ((:*:)),
type (:+:) (L1, R1),
)import Unrecurse (Continue (..), Kont (..), Stack, Tree (..), exampleTree, pop, push, while)
import Prelude hiding (even, odd)
You will notice that we are importing definitions from the
Unrecurse
module, which belongs to the previous article in this series.
For the Tree
type from the Unrecurse
module, we need a QuickCheck random generator to run property tests with
doctest
:
-- $setup
-- >>> import Test.QuickCheck
-- >>> :{
-- arbTree :: Arbitrary a => Int -> Gen (Tree a)
-- arbTree 0 = pure Nil
-- arbTree n =
-- frequency
-- [ (1, pure Nil),
-- ( 3,
-- Node
-- <$> arbTree (div n 2)
-- <*> arbitrary
-- <*> arbTree (div n 2)
-- )
-- ]
-- :}
--
-- >>> instance Arbitrary a => Arbitrary (Tree a) where arbitrary = sized arbTree
This should create random binary trees with a frequency distribution that is exponentially decreasing in the number of constructors.
Ok, enough beating around the bush. Now we can start with the actual content of the essay.
Last time on this channel, we
have seen how one can remove recursive calls from a function. We learned
about continuations, defunctionalization, and monadic State
effects. We used these techniques to reimplement two simple recursive
functions, printTree
and accumTree
, using only
iteration. These functions are both specific examples of a
fold
. They consume a value of type Tree
, a
data type for binary trees with two constructors, Nil
and
Node
. printTree
reduces the tree node by node
in depth-first, left-to-right order to an effect: leaf values are
printed to stdout
as they are encountered. On the other
hand, accumTree
reduces the tree to value, that is, the sum
of all leaf values.
Even though we worked very hard to remove all recursion from these
functions, we still have a problem. The definition of the
Tree
type was and remains self-referential: its
Node
constructor takes two Tree
values as
arguments. That makes Tree
a recursive data type,
and that is FORBIDDEN in recursion-free Haskell. Sorry, I don't make the
rules. So far, we did not dare to remove recursion from the
Tree
data type. This time, we are more ambitious!
The high-level idea is that we are going to store our
Tree
in a linear data structure we call a
Tape
. This will be done in a fashion that allows us to zoom
in on subtrees by slicing the Tape
.
As usual, we need a few ingredients:
Token
s, which are a set of values that are going to
represent different pieces of a Tree
.Tape
, which is a linear data structure that can be
written to and read from and that can be used to represent a whole
Tree
or parts of it.Tree
to a
Tape
of Token
s.Tape
of Token
s
to a Tree
.Tree
type that can be used to
construct or deconstruct a tree iteratively.Generic
code.We will cover these ingredients in detail in the following sections.
It will take some time to go through all of them. The slow pace will
help you to can get a feel for all this stuff. We shall now start by
defining the Token
and Tape
types. Chocks
away!
We define our tape as a newtype
wrapper around an
underlying type constructor, t :: Type -> Type
:
newtype Tape t a = Tape {unTape :: t a}
deriving stock (Eq, Show)
deriving newtype
Semigroup,
( Monoid,
Functor,
Applicative,
Monad,
Alternative,
Foldable
)
The type t
could be []
, Seq
,
Vector
, Deque
, etc. It doesn't matter, we
won't make a choice at this point. The only requirement is that there is
a way to attach or detach elements on the left side of t
.
The Cons
data class provides a way to formalize this
requirement, and the following code propagates this requirement to the
Tape
type by means of coercion:
instance
Cons (t a) (t b) a b =>
Cons (Tape t a) (Tape t b) a b
where
=
_Cons $
withPrism _Cons review' :: (b, t b) -> t b)
\(preview' :: t a -> Either (t b) (a, t a)) ->
( prism (coerce review') (coerce preview')
This class instance gives us a prism
that can be used to build or deconstruct a Tape
via the
cons
and uncons
functions from Control.Lens.Cons.
They basically work like (:)
and uncons
from
Data.List
, but they are polymorphic in the type
t
and thus can be used with any t
that
satisfies the Cons
requirement.
Let's now talk about what we are going to put on the tape. Our tapes
will be made up entirely of Token
s, to be defined
momentarily. Because of that homogeneity, it is a good idea to save us
some keystrokes and forge a handy type synonym:
-- | A tape of tokens.
type TTape t = Tape t Token
Each Token
will be used to represent a piece of
information about a particular Tree
. For trees with integer
leaf nodes, i.e. Tree Int
, we will only ever need four
Token
s:
data Token
= -- | Represent a recursive call to an abstract data type.
Rec Int
| -- | Represent a left choice between two constructors.
L
| -- | Represent a right choice between two constructors.
R
| -- | Represent an integer leaf node.
I Int
deriving stock (Eq, Show)
I will explain each of these tokens in more detail in a bit. Their function will become clear as we go along.
Now, how do we turn a tree into a token tape?
In general, we want a function -- let's call it
linearize
-- that turns a value of some type a
into a tape of tokens, TTape t
, without losing any
information. a
could be any type, but we explicitly want
this to work for a ~ Tree Int
in the end.
Let's give linearize
a type signature:
type To t a = a -> TTape t
And, because we like to keep things formal, a formal definition:
class
ToTokens
t :: Type -> Type)
(a :: Type)
(where
-- | Convert a value of type `a` into a tape of tokens.
linearize :: To t a
This is Haskell. And, in case you haven't noticed, the way of Haskell
is to make things as general as possible, sometimes until it hurts. For
that reason, this class is parameterized not only by the type of the
values we are going to encode, a
, but also by the tape's
type parameter, t
.
To annoy you further, I will give linearize
an arcane
default
implementation:
linearize ::
defaultRecursive a,
( ToTokensStep t (Base a)
=>
) To t a
= cata linearizeStep linearize
This definition uses the accurately named yet mysterious
Recursive
class. Recursive
gives us
cata
. Both of these are defined in Data.Functor.Foldable.
The cata
function is a generalization of fold
and takes two arguments:
linearizeStep
that is doing the actual
work. It has the type Base a (TTape t) -> TTape t
.a
.With these, cata
is recursively chewing up the value
a
and turning it into a TTape t
. I admit, this
machinery is a wee opaque. I will try my best to explain what is going
on. Stay with me.
Let's first zoom in on the cryptic type of
linearizeStep
. This is a function that takes a value of
type Base a (TTape t)
and gives us back a value of type
TTape t
. I guess it's clear what comes out of this function
(a tape of tokens), but what in tarnation are we passing here? What's
Base
, and why is it parameterized by both a
and our trusty token tape type?
Base :: Type -> (Type -> Type)
, as it turns out,
is also coming from Data.Functor.Foldable.
It is an open type family and can be thought of as a type-level registry
of so-called "base functors". A registered base functor,
Base a r
, is a non-recursive data type that is derived for
a specific recursive data type, a
. The type parameter
r
is used to represent recursion in a
. How?
Think of it in the following way: Base a r
is structurally
equal to a
except that r
takes the place of
all recursive occurrences of a
in a
.
For instance, the base functor of our Kont
type from the
previous installment of this series
is:
-- | A base functor for `Kont`.
data KontF next r
= -- | Terminate a computation
FinishedF
| -- | Continue a computation with `next`
MoreF next r
deriving stock (Eq, Show, Functor)
The r
type parameter appears exactly where
Kont next
appears in the original More
constructor of Kont next
. Go back to the definition of
Kont next
and check for yourself if you don't believe me.
Off you pop.
Quick side node on naming. It is customary to name the base functor
and its constructors after the recursive data type they are associated
with (in this case, Kont
) except for appending the letter
F
for "functor". Like the name suggests, a base functor is
always a functor in the type parameter r
, and Haskell can
derive that instance for us. Neat.
Now, with KontF
in hand, we can write the following type
family instance:
type instance
Base (Kont next) =
KontF next
This tells Haskell that the base functor of Kont
is
KontF
.
How is all this going to help us?
Like we said before, the argument of linearizeStep
is of
type Base a r
with r ~ TTape t
. If
a
were Kont next
, then Base a r
would be KontF next (TTape t)
. And, likewise, if
a
were Tree Int
, then Base a r
would be TreeF Int (TTape t)
. That means that
linearizeStep
always works on a version of a
where recursive constructors are replaced with token tapes,
r ~ TTape t
.
We now understand that linearizeStep
takes a special
non-recursive version of a
and that it is supposed to
produce a token tape. But how should this transformation look like?
Let's dive into a concrete example and try to understand how things
should play out for a ~ Kont Int
. This is a bit easier than
reaching immediately for trees.
First, consider the base case. For a finished continuation,
FinishedF
, our encoding should look like this:
-- | A linearized finished continuation.
-- >>> linearizedFinished
-- Tape {unTape = [L]}
linearizedFinished :: TTape []
=
linearizedFinished let finished :: KontF Int (TTape []) =
FinishedF
in linearizeStep finished
This base case is particularly easy to deal with since the
FinishedF
constructor has no arguments. The only
information we need to encode is the constructor itself. I use the token
L
(for "left") to represent FinishedF
, because
it appears on the left side in the sum type KontF
. Thus,
the linearizedFinished
tape should have one element: the
token L
.
Now, let's take a look at the recursive case: For a continuation with
one more step, MoreF
, the situation is more complicated,
but only slightly so. I propose the following encoding:
-- | A linearized continuation with one more step.
-- >>> linearizedMore linearizedFinished
-- Tape {unTape = [R,I 0,Rec 1,L]}
-- >>> linearizedMore (linearizedMore linearizedFinished)
-- Tape {unTape = [R,I 0,Rec 4,R,I 0,Rec 1,L]}
-- >>> linearizedMore (linearizedMore (linearizedMore linearizedFinished))
-- Tape {unTape = [R,I 0,Rec 7,R,I 0,Rec 4,R,I 0,Rec 1,L]}
linearizedMore :: TTape [] -> TTape []
=
linearizedMore previousTape let more :: KontF Int (TTape []) =
MoreF 0 previousTape
in linearizeStep more
I hope the examples make it clear enough that in this encoding:
R
(for "right") is the token for
MoreF
.I 0
(for "integer") is the token for the first argument
of MoreF
. That argument is always 0 :: Int
in
this contrived example.Rec _
is the token for the recursive case. Its argument
counts the number of tokens needed to encode it. Effectively, this just
measures the length of the previous tape we pass to the
linearizedMore
function.Note how, in the above examples, calls to linearizedMore
are nested to create a tape that encodes progressively more recursive
calls to the MoreF
constructor. What I have done here
manually will in the end be done for us automatically by
linearize
thanks to the Recursive
type class
and cata
:
-- |
-- >>> linearize (Finished :: Kont Int) :: TTape []
-- Tape {unTape = [L]}
-- >>> linearize (More 0 $ Finished :: Kont Int) :: TTape []
-- Tape {unTape = [R,I 0,Rec 1,L]}
-- >>> linearize (More 0 $ More 0 $ Finished :: Kont Int) :: TTape []
-- Tape {unTape = [R,I 0,Rec 4,R,I 0,Rec 1,L]}
-- >>> linearize (More 0 $ More 0 $ More 0 $ Finished :: Kont Int) :: TTape []
-- Tape {unTape = [R,I 0,Rec 7,R,I 0,Rec 4,R,I 0,Rec 1,L]}
If we had a working implementation of linearizeStep
already, then the only thing we would need to do to get this behaviour
is to define an instance of the Recursive
type class for
Kont next
, like so:
instance Recursive (Kont next) where
project ::
Kont next ->
KontF next (Kont next)
More n k) = MoreF n k
project (Finished = FinishedF project
This implementation of project
tells Haskell how a
single layer of a Kont next
value is unrolled into a
KontF next (Kont next)
value. The rest is taken care of by
the cata
function. I can recommend you to read the newly
revised documentation
of the recursion schemes package to get an even better understanding of
the principles behind this approach.
Good, we have a more or less clear picture of how
linearizeStep
is supposed to work. What's missing is an
implementation. Next up: an implementation.
We can formally introduce linearizeStep
like this:
class
ToTokensStep
t :: Type -> Type)
(base :: Type -> Type)
(where
-- | A stepwise linearization of a value of type `base (TTape t)`.
linearizeStep :: To t (base (TTape t))
Like ToTokens
, the ToTokensStep
type class
is parameterized by the type of the token tape, t
. But
instead of the a
type, we've got another parameter,
base
, for its base functor.
I promised oodles of boilerplate code, and I am happy to announce that the waiting is over. We will use datatype-generic programming to implement this class!
Have a look at the following default
implementation:
linearizeStep ::
defaultAlternative t,
( Foldable t,
Generic (base (TTape t)),
GToTokensStep t (Rep (base (TTape t)))
=>
) To t (base (TTape t))
=
linearizeStep
gLinearizeStep. GHC.Generics.from
Of course, that's just a wrapper around gLinearizeStep
,
defined below:
class
GToTokensStep
t :: Type -> Type)
(rep :: Type -> Type)
(where
-- | A generic implementation of `linearizeStep`.
gLinearizeStep :: forall a. To t (rep a)
This follows the
usual
pattern
for datatype-generic programming in Haskell. In particular, this says
that, if our base functor has a Generic
instance with
generic representation Rep (base r)
, then we can obtain a
ToTokensStep
instance (and thus linearizeStep
)
for free. Free is very cheap.
GHC.Generics.from
will convert a base r
value into a Rep (base r)
value. The latter represents
base r
using only generic primitive types. These types are
defined in the GHC.Generics
module and are:
V1
for impossible values (Void
). This is
used for types that have no constructors. We can't represent
Void
in our token tape.U1
for constructors without arguments like
()
or Finished
.K1
for constants like True
or
1
. This is used for constructor arguments. These could be
recursive values.M1
for meta data. This is a wrapper and used to encode
constructor or data type names.(:*:)
for product types. This is used to separate
constructor arguments.(:+:)
for sum types. This is used to encode a choice
between two constructors.If you have never seen these types before, you may want to read some
of the documentation
in the GHC.Generics
module. There are some examples that
will help you understand the types better than I can in this
tutorial.
We only need to specify once what should happen for the six generic
types. For V1
, we can't do anything:
instance GToTokensStep t V1 where
=
gLinearizeStep v `seq` error "GToTokensStep.V1" v
For U1
, we can just ignore it and return an empty token
tape:
instance
Alternative t =>
GToTokensStep t U1
where
= Tape empty gLinearizeStep _
For K1
, we can just delegate to
linearize
:
instance
ToTokens t c =>
GToTokensStep t (K1 i c)
where
= linearize . unK1 gLinearizeStep
When specialized to K1 i Int
, this instance is used to
convert an Int
constant appearing in
KontF Int r
into a tape of a single I
token:
instance
Alternative t =>
ToTokens t Int
where
= pure (I i) linearize i
Moreover, when specialized to K1 i (TTape t)
, the
K1
instance defines what should happen for the
TTape t
constants in KontF next (TTape t)
.
This is the trick that allows us to deal with recursive constructor
arguments:
instance
Alternative t, Foldable t) =>
(ToTokens t (TTape t)
where
=
linearize tape pure (Rec $ length tape) <|> tape
Here we use length
to measure the length of the tape. We
store that length in a Rec
token that we prepend to the
tape using (<|>)
. This length information will be
helpful later when we want to decode the tape back into a value.
For M1
, we can just unwrap the constructor:
instance
GToTokensStep t f =>
GToTokensStep t (M1 i c f)
where
= gLinearizeStep . unM1 gLinearizeStep
For the product (f :*: g)
, we can delegate to the
GToTokensStep
instances of f
and
g
:
instance
Alternative t,
( Foldable t,
GToTokensStep t f,
GToTokensStep t g
=>
) GToTokensStep t (f :*: g)
where
:*: y) =
gLinearizeStep (x <|> gLinearizeStep y gLinearizeStep x
The tapes of the two x :: f a
and y :: g a
values are concatenated using (<|>)
.
Finally, we can define an instance for the sum
(f :+: g)
:
instance
Applicative t,
( Alternative t,
Foldable t,
GToTokensStep t f,
GToTokensStep t g
=>
) GToTokensStep t (f :+: g)
where
L1 x) =
gLinearizeStep (pure L <|> gLinearizeStep x
R1 x) =
gLinearizeStep (pure R <|> gLinearizeStep x
We use pure L
and pure R
to encode the left
and right constructor.
This concludes the definition of GToTokensStep
and the
boilerplaty datatype-generic programming exercise for
ToTokensStep
. Wasn't that fun? There is more to come.
ToTokens
InstancesPerhaps this was lost in the noise, but we can now automatically
generate ToTokens
instances!
For the Kont
data type, this is done in three steps:
Step 1: Ask Haskell to generate a Generic
instance for
Kont
's base functor, KontF
.
deriving stock instance Generic (KontF next r)
Step 2: Obtain a ToTokensStep
instance from the default
implementation.
instance
Alternative t, Foldable t, ToTokens t next) =>
(ToTokensStep t (KontF next)
Step 3: Earn a ToTokens
instance.
instance
ToTokensStep t (KontF next) =>
ToTokens t (Kont next)
With these we can convert a Kont next
value into a
TTape t
value (if we also happen to have a
ToTokens
instance for next
). And we know that
this is true because this is a literate Haskell article, and all
previously seen examples were in fact already working. Surprise!
Originally, we were interested in values of type
Tree Int
. Perhaps you remember. Are we any closer to
linearizing those, too? We are. We can automagically generate now
everything we need.
We defined the base functor KontF
for the
Kont
data type manually. This was a bit tedious, but it
helped us understand base functor types. Now, rather than going through
the trouble of writing our own base functor for Tree
(or
any other data type a
), we can use
makeBaseFunctor
to do this for us.
makeBaseFunctor
is a Template
Haskell function that generates the base functor for the
Tree
type and calls it TreeF
.
'Tree makeBaseFunctor '
This little trick also generates Base
and
Recursive
instances for Tree
, among a few
other things that we don't need to worry about right now.
However, we don't get a Show
or Generic
instance for TreeF
, so let's quickly add those:
deriving stock instance (Show a, Show r) => Show (TreeF a r)
deriving stock instance Generic (TreeF a r)
The Generic
instance opens up the possibility of
auto-generating the ToTokens
instance for
Tree
:
instance
Alternative t, Foldable t, ToTokens t a) =>
(ToTokensStep t (TreeF a)
instance
ToTokensStep t (TreeF a) =>
ToTokens t (Tree a)
And that's it! Let's see what we can do with this:
-- >>> linearize (Nil :: Tree Int) :: TTape []
-- Tape {unTape = [L]}
-- >>> linearize (Node Nil 0 Nil :: Tree Int) :: TTape []
-- Tape {unTape = [R,Rec 1,L,I 0,Rec 1,L]}
-- >>> linearize (Node (Node Nil 0 Nil) 1 (Node Nil 2 Nil) :: Tree Int) :: TTape []
-- Tape {unTape = [R,Rec 6,R,Rec 1,L,I 0,Rec 1,L,I 1,Rec 6,R,Rec 1,L,I 2,Rec 1,L]}
-- >>> linearize exampleTree :: TTape []
-- Tape {unTape = [R,Rec 16,R,Rec 6,R,Rec 1,L,I 1,Rec 1,L,I 2,Rec 6,R,Rec 1,L,I 3,Rec 1,L,I 4,Rec 16,R,Rec 6,R,Rec 1,L,I 5,Rec 1,L,I 6,Rec 6,R,Rec 1,L,I 7,Rec 1,L]}
There you have it, we can flatten binary trees and store them in tapes of tokens. Cool stuff!
How can we go back from a TTape t
value to a
Tree
value?
The answer is parsing. Many parsing libraries exist for
Haskell, but we will use none of them, because we need a lot less than
what they offer. Instead, we will use a minimal approach to parsing
based on the good old state monad transformer, StateT
. We
know it well from the previous
article.
It is a little-known fact that StateT
already provides
all that we need to implement a monadic parser. It
even supports backtracking. This may be surprising, since
StateT s b a
is just a newtype
wrapper around
s -> b (a, s)
, where s
is the state's type,
and b
is the type of some inner monad. Why should this
matter for parsing? Well, that's because, at its most fundamental level,
a parser for things a
is a function from strings
s
to lists b ~ []
of pairs (a, s)
of things and strings. That's a little Seussian rhyme I borrowed
from Fritz
Ruehr. It means that if we have a string s
and a parser
StateT s b a
with b ~ []
, then running the
parser on s
will return:
a
from
any prefix of the input string s
(including the empty
string) ors
,
where each pair in the list belongs to one alternative parse of
s
. The first part of a pair is the parsing result,
a
, and the second part is the unconsumed remainder of the
input string.There may be a very long list of alternatives, but for
b ~ []
those are lazily evaluated. This is why we can think
of StateT s [] a
as a parser with backtracking. If we don't
want backtracking, we can use StateT s Maybe a
instead.
Then we will only ever get zero or one parse. If we get
Nothing
, the parse failed. If we get Just
, the
parse succeeded. For b ~ Maybe
, we can never explore more
than one alternative. We are greedily parsing, and committing to the
first alternative that succeeds is a final decision. b
(for
"backtracking") should always be a monad with a MonadPlus
instance for supporting choice (mplus
) and failure
(mzero
). []
, Maybe
, and
LogicT
from Control.Monad.Logic
fulfil this requirement, but there are many monads that do not.
In Haskell, a string is a list of characters. Here, we have a tape of tokens. If we want to parse a tape of tokens, then we should be able to do that with this state monad transformer:
type From b t a = StateT (TTape t) b a
This is the counterpart to To t a
that we have been
using to flatten trees into tapes of tokens. To go the other way, we
need to define a value of type From b t a
. It will need to
be made such that it is compatible with how we defined
To t a
above and undoes the flattening we engineered there.
We will build this value from the ground up starting with the simplest
parser we can write down:
-- | A parser that consumes a single token from the tape and returns it.
token ::
forall b t.
MonadFail b,
( Cons (TTape t) (TTape t) Token Token
=>
) From b t Token
= do
token <- get
t case uncons t of
Nothing -> fail "unexpected end of input"
Just (x, xs) -> put xs >> pure x
This parser just tries to take the first token from the tape and
yields it, no matter what the token is. If there are no tokens left, it
fails. The MonadFail
constraint is needed for the
fail
function, and the Cons
constraint is
needed for the uncons
function.
The second most simple parser we can write is one that consumes a single token and returns it if and only if it matches a given predicate:
-- | A parser that matches a given token and returns it.
isToken ::
forall b t.
MonadFail b,
( MonadPlus b,
Cons (TTape t) (TTape t) Token Token
=>
) Token ->
From b t Token
= mfilter (== t) token isToken t
The mfilter
function is a monadic version of
filter
and provided by the MonadPlus
requirement.
These two parsers, token
and isToken
, will
turn out to be everything we need. We will use combinator
functions to compose them again and again until we get to the final
parser that solves our problem. The combinators will mostly be provided
by the Alternative
and MonadPlus
instances for
From b t
. This will become much clearer in the next
section. It's all about the combinators from here. There is documentation
on the subject for those who are interested, but it should not be
necessary to read this to understand the rest of this article.
We'd like to be able to go back and forth between token tapes and
Tree
values:
-- | `parse` is the inverse of `linearize`.
-- prop> \tree -> evalStateT parse (linearize @[] tree) == Just (tree :: Tree Int)
This is a there-and-back-again property. It says that, if we
have a Tree Int
value, then we can first linearize it into
a token tape, and then parse it back into the same Tree Int
value we started with. No treasure is lost or gained by this process.
Not even a small chest.
The function parse
returns the parser we need. A formal
definition of parse
is:
class
FromTokens
b :: Type -> Type)
(t :: Type -> Type)
(a :: Type)
(where
-- | Parse a value of type `a` from a list of tokens.
parse :: From b t a
We parameterize FromTokens
on the backtracking monad,
b
, the tape type, t
, and the type of the value
we want to parse, a
. Like linearize
,
parse
has an annoyingly opaque default
implementation:
parse ::
defaultCorecursive a,
( Monad b,
Traversable (Base a),
FromTokensStep b t (Base a)
=>
) From b t a
= go
parse where
=
go fmap embed $
parseStep>>= traverse (resetParse go)
Let's take this apart, and see what it does. The helper
go
replaces cata
in the default
implementation of linearize
from before. go
is
a recursive descent parser that repeatedly calls the stepwise parser
parseStep
. This parser comes from the
FromTokensStep
constraint and has the type:
parseStep :: From b t (Base a (TTape t))
. We haven't
defined parseStep
and FromTokensStep
yet, but
we will shortly. parseStep
is a parser that returns a base
functor for the type a
, where unused tokens are wrapped in
token tapes TTape t
that appear in the recursive positions
of a
in a
. Those tapes are then parsed by
parseStep
again and again, until we get a base functor
value that contains no token tapes (for TreeF
, that would
be NilF
). If we naively glued the base functors coming out
of this recursion together, we would get a value of type
Base a (Base a (Base a (Base ... )))
. However, we cannot
work with this type directly, because it would depend on the runtime
value of the token tape: the more nested the encoded value, the more
nested the type. Instead, we need to incrementally roll the functors up
into an a
value. We can do this by using the
Corecursive
constraint, which is the counterpart to
Recursive
from before. Corecursive
gives us
embed :: Base a a -> a
, the inverse of
project
, which is exactly what we need.
-- | Run a parser on a tape, and
-- lift the result(s) into the parent parsing scope.
-- Unused tokens are discarded.
resetParse :: Monad b =>
From b t a -> TTape t -> From b t a
= lift . evalStateT m resetParse m
class
FromTokensStep
b :: Type -> Type)
(t :: Type -> Type)
(base :: Type -> Type)
(where
-- | A stepwise parser of a value of type `base (TTape t)`.
parseStep :: From b t (base (TTape t))
parseStep ::
defaultFunctor b,
( Generic (base (TTape t)),
GFromTokensStep b t (Rep (base (TTape t)))
=>
) From b t (base (TTape t))
= to <$> gParseStep
parseStep
class
GFromTokensStep
b :: Type -> Type)
(t :: Type -> Type)
(rep :: Type -> Type)
(where
-- | A generic implementation of `parseStep`.
gParseStep :: forall a. From b t (rep a)
instance
MonadFail b =>
GFromTokensStep b t V1
where
= fail "GFromTokensStep.V1"
gParseStep
instance
Monad b =>
GFromTokensStep b t U1
where
= pure U1
gParseStep
instance
MonadFail b,
( MonadPlus b,
Cons (TTape t) (TTape t) Token Token,
GFromTokensStep b t f,
GFromTokensStep b t g
=>
) GFromTokensStep b t (f :+: g)
where
=
gParseStep L >> L1 <$> gParseStep)
(isToken <|> (isToken R >> R1 <$> gParseStep)
instance
MonadFail b,
( MonadPlus b,
Cons (TTape t) (TTape t) Token Token,
GFromTokensStep b t f,
GFromTokensStep b t g
=>
) GFromTokensStep b t (f :*: g)
where
=
gParseStep :*:)
(<$> gParseStep
<*> gParseStep
instance
Monad b, FromTokens b t c) =>
(GFromTokensStep b t (K1 i c)
where
= K1 <$> parse
gParseStep
instance
Functor b, GFromTokensStep b t f) =>
(GFromTokensStep b t (M1 i c f)
where
= M1 <$> gParseStep gParseStep
instance
MonadFail b,
( MonadPlus b,
Cons (t Token) (t Token) Token Token,
Alternative t,
FromTokens b t a
=>
) FromTokensStep b t (TreeF a)
instance
Monad b, FromTokensStep b t (TreeF a)) =>
(FromTokens b t (Tree a)
instance
MonadFail b,
( Cons (TTape t) (TTape t) Token Token
=>
) FromTokens b t Int
where
=
parse >>= \case
token I i -> pure i
-> fail "expected Int"
_
instance
MonadFail b,
( Alternative t,
Cons (TTape t) (TTape t) Token Token
=>
) FromTokens b t (TTape t)
where
=
parse >>= \case
token Rec n -> go n
where
go :: Int -> From b t (TTape t)
0 = pure empty
go = cons <$> token <*> go (n' - 1)
go n' -> fail "expected Rec" _
data NextF a r = FirstF r | SecondF a | ThirdF r
accumTree'''''''' ::
forall t a.
Alternative t,
( Foldable t,
Monoid a,
ToTokens t a,
FromTokens Maybe t a,
Cons (t Token) (t Token) Token Token
=>
) StateT (TTape t, Stack (NextF a (TTape t))) (Writer a) ()
=
accumTree'''''''' $ do
while <- fromJust . evalStateT parseStep <$> zoom _1 get
treeF case treeF of
NilF -> do
<- zoom _2 pop
c case c of
Just (FirstF leftF) -> do
$ put leftF
zoom _1 pure Continue
Just (SecondF contentF) -> do
lift (tell contentF)$ put (linearizeStep $ NilF @a)
zoom _1 pure Continue
Just (ThirdF rightF) -> do
$ put rightF
zoom _1 pure Continue
Nothing -> pure Break
NodeF {..} -> do
$ push (ThirdF rightF)
zoom _2 $ push (SecondF contentF)
zoom _2 $ push (FirstF leftF)
zoom _2 $ put (linearizeStep $ NilF @a)
zoom _1 pure Continue
'Sum
makeBaseFunctor '
deriving stock instance (Show a, Show r) => Show (SumF a r)
deriving stock instance Generic (SumF a r)
instance
Alternative t,
( Foldable t,
ToTokens t a
=>
) ToTokensStep t (SumF a)
instance
ToTokensStep t (SumF a) =>
ToTokens t (Sum a)
instance
Monad b,
( Alternative t,
Foldable t,
FromTokens b t a
=>
) FromTokensStep b t (SumF a)
instance
Monad b, FromTokensStep b t (SumF a)) =>
(FromTokens b t (Sum a)
-- | Calculate the sum of the content values of the linearized example tree.
-- >>> sumTree''''''''
-- Sum {getSum = 28}
sumTree'''''''' :: Sum Int
=
sumTree'''''''' $
execWriter
runStateT@Vector)
(accumTree'''''''' $ Sum <$> exampleTree, []) (linearize
Mar 10, 2022
Say hello to Dex and see how Dex greets you.
Jan 20, 2022
Have you ever wanted to write a recursive function and wondered what would happen if someone took away recursion from Haskell? Say goodbye to recursive function calls, say goodbye to recursive data types. How sad Haskell would be without them! I'm sure that thought must have occured to you -- if not, what are you even doing here?! Well, this article has you covered should that day ever come. After reading it, you will know how to write a recursive function that doesn't recurse.