WIP Disambiguation production #63

expipiplus1 · 2023-05-12T14:20:32Z

Definitely WIP at the moment,

The thrust of it is that the continuations are explicitly [a] -> [b] now, which for most productions is just a map, however for disambiguation, this structure is exposed.

Currently only nonterminals work, as, for example, the Pure case runs the continuation once for every individual element. What needs to change would be to maintain a list of a, and defer applying args. I had a quick look at jiggling things around towards that end without luck.

…ance

expipiplus1 · 2023-05-12T14:33:05Z

If this would be desirable even for just nonterminals, then I'd be happy to neaten things up from the state it's in at the moment.

Unhandled in parse at the moment

expipiplus1 · 2023-05-13T05:34:32Z

Ok, I neatened it up a little and limited disambiguation to nonterminals.

Motivating example parsing an expression without precedence

-- λ> :main "A + B * C * G"
-- Ambiguous
-- ├╴ ((A+B)*(C*G))
-- ├╴ *
-- │  ├╴ Ambiguous
-- │  │  ├╴ ((A+B)*C)
-- │  │  └╴ (A+(B*C))
-- │  └╴ G
-- └╴ +
--    ├╴ A
--    └╴ Ambiguous
--       ├╴ (B*(C*G))
--       └╴ ((B*C)*G)

I'm sure there's still some neatening + testing to do!

expipiplus1 · 2023-05-13T05:41:56Z

As a further simplification to the implementation, I wonder if we could just make NonTerminal :: !(r e t a) -> !(Prod r e t ([a] -> [b])) -> Prod r e t b...

expipiplus1 · 2023-05-14T14:38:21Z

Ah, instead of making disambiguate operate in Grammar we can make the nonterminal 'live' during parse with something like this:

    Disamb p d -> do
      r <- mkRule p
      scont' <- newConts =<< newSTRef [Cont C.id d args scont]
      parse (State (NonTerminal r (pure id)) C.id pos scont' : ss) env

I don't think this should be a performance disadvantage compared to running in Grammar, however it feels bad putting pure id as the right prod for NonTerminal (I guess this would be solved by the previous comment). This also means that we end up calling the disamb function with an empty list some of the time which is also a little odd.

ollef · 2023-05-14T21:16:28Z

This is a neat idea. I had a little think about the API and implementation:

If we had

expose :: Prod r e t a -> Prod r e t [a]

and

conceal :: Prod r e t [a] -> Prod r e t a

(but perhaps with better chosen names), then it seems like we should be able to implement

disambiguate :: ([a] -> [b]) -> Prod r e t a -> Prod r e t b
disambiguate f = conceal . fmap f . expose

without it needing to be in grammar. This also seems to solve #62, because we can implement

constraint :: (a -> Bool) -> Prod r e t a -> Prod r e t a
constraint = disambiguate . filter

This is possible if we tweak the Prod type e.g. as follows (along the lines of one of your comments):

data Prod r e t a where
  -- Applicative.
  Terminal    :: !(t -> Maybe a) -> !(Prod r e t (a -> [b])) -> Prod r e t b
  NonTerminal :: !(r e t a) -> !(Prod r e t (a -> [b])) -> Prod r e t b
  Pure        :: [a] -> Prod r e t a
  -- Monoid/Alternative. We have to special-case 'many' (though it can be done
  -- with rules) to be able to satisfy the Alternative interface.
  Alts        :: ![Prod r e t a] -> !(Prod r e t (a -> [b])) -> Prod r e t b
  Many        :: !(Prod r e t a) -> !(Prod r e t ([a] -> [b])) -> Prod r e t b
  -- Error reporting.
  Named       :: !(Prod r e t a) -> e -> Prod r e t a

because then we get

expose :: Prod r e t a -> Prod r e t [a]
expose (Terminal b p)    = Terminal b (fmap (fmap pure) p)
expose (NonTerminal r p) = NonTerminal r (fmap (fmap pure) p)
expose (Pure x)          = Pure [x]
expose (Alts as p)       = Alts as (fmap (fmap pure) p)
expose (Many p q)        = Many p (fmap (fmap pure) q)
expose (Named p n)       = Named (expose p) n

conceal :: Prod r e t [a] -> Prod r e t a
conceal (Terminal b p)    = Terminal b $ fmap (fmap concat) p
conceal (NonTerminal r p) = NonTerminal r $ fmap (fmap concat) p
conceal (Pure as)         = Pure $ concat as
conceal (Alts as p)       = Alts as $ fmap (fmap concat) p
conceal (Many p q)        = Many p $ fmap (fmap concat) q
conceal (Named p n)       = Named (conceal p) n

but I have not tried updating the other modules (parser etc).

Maybe it breaks down somewhere.

expipiplus1 · 2023-05-15T02:30:10Z

I like expose and conceal a lot! Much nicer than smashing it all into [a] -> [b].

It makes me worry a little bit from a performance point of view to be passing lists around everywhere in Prod and I feel like there's basically no chance of any fusion taking place in parse.

Just as an aside, @ollef, did you ever experiment with using Data.Array.ST for anywhere lists are used in parse? In fact, I wonder if this would be just slightly better done with Seq (or even Tree) given all the concatting... Must benchmark after

expipiplus1 · 2023-05-15T05:46:19Z

mmm, I think that keeping this to nonterminals might be the pragmatic thing to do, with your scheme above it doesn't seem possible to push Pure fs down into the leaves of Alts for example.

Attaching to nonterminals only seems important wrt parse as written, as they're the place where new continuations are made which act as "join" points in the continuation chain.

expipiplus1 · 2023-05-15T06:03:00Z

I haven't thought it through, but I wonder if it would be possible to push disambiguations all the way down to Alts, and have it be Alts :: ![Prod r e t a] -> !(Prod r e t ([a] -> b)) -> Prod r e t b

expipiplus1 · 2023-07-07T09:44:14Z

hmm, I've got a testcase where:

This works, and d is called with a list of length 2:

  xx <-
    disambiguate d $
      (Foo <$ anyToken)
        <|> (Bar <$ anyToken)

This doesn't work, d is called twice with a list of length 1 and the whole parse returns 2 results:

  xx <-
    disambiguate d $
      (Foo <$ many anyToken)
        <|> (Bar <$ anyToken)

Sadly I've long since flushed the context needed to easily understand what's going on :(

...

Oh, I still see several todos in the code, I guess I'm just hitting one of these cases past-me foresaw.

...

On a hunch I removed the first case in simplifyCont and this seems to sort things for at least this example, don't quite understand why though yet.

Add some comments and make ResultsCont a newtype with a Category inst…

3999264

…ance

expipiplus1 added 3 commits May 13, 2023 11:50

Add Disamb production

74d0c9a

Unhandled in parse at the moment

Make ResultsCont operate over lists

886eec4

Add disambiguating nonterminal

ec21f6f

expipiplus1 force-pushed the disamb2 branch from ceba2b8 to 63e5523 Compare May 13, 2023 05:29

Add disambiguation example

14f3763

expipiplus1 force-pushed the disamb2 branch from 63e5523 to 14f3763 Compare May 13, 2023 05:33

export disambiguate from Text.Earley

82771fe

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WIP Disambiguation production #63

WIP Disambiguation production #63

expipiplus1 commented May 12, 2023

expipiplus1 commented May 12, 2023

expipiplus1 commented May 13, 2023

expipiplus1 commented May 13, 2023

expipiplus1 commented May 14, 2023

ollef commented May 14, 2023

expipiplus1 commented May 15, 2023 •

edited

Loading

expipiplus1 commented May 15, 2023

expipiplus1 commented May 15, 2023

expipiplus1 commented Jul 7, 2023 •

edited

Loading

WIP Disambiguation production #63

Are you sure you want to change the base?

WIP Disambiguation production #63

Conversation

expipiplus1 commented May 12, 2023

expipiplus1 commented May 12, 2023

expipiplus1 commented May 13, 2023

expipiplus1 commented May 13, 2023

expipiplus1 commented May 14, 2023

ollef commented May 14, 2023

expipiplus1 commented May 15, 2023 • edited Loading

expipiplus1 commented May 15, 2023

expipiplus1 commented May 15, 2023

expipiplus1 commented Jul 7, 2023 • edited Loading

expipiplus1 commented May 15, 2023 •

edited

Loading

expipiplus1 commented Jul 7, 2023 •

edited

Loading