Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP Disambiguation production #63

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

expipiplus1
Copy link

Definitely WIP at the moment,

The thrust of it is that the continuations are explicitly [a] -> [b] now, which for most productions is just a map, however for disambiguation, this structure is exposed.

Currently only nonterminals work, as, for example, the Pure case runs the continuation once for every individual element. What needs to change would be to maintain a list of a, and defer applying args. I had a quick look at jiggling things around towards that end without luck.

@expipiplus1
Copy link
Author

If this would be desirable even for just nonterminals, then I'd be happy to neaten things up from the state it's in at the moment.

@expipiplus1
Copy link
Author

Ok, I neatened it up a little and limited disambiguation to nonterminals.

Motivating example parsing an expression without precedence

-- λ> :main "A + B * C * G"
-- Ambiguous
-- ├╴ ((A+B)*(C*G))
-- ├╴ *
-- │  ├╴ Ambiguous
-- │  │  ├╴ ((A+B)*C)
-- │  │  └╴ (A+(B*C))
-- │  └╴ G
-- └╴ +
--    ├╴ A
--    └╴ Ambiguous
--       ├╴ (B*(C*G))
--       └╴ ((B*C)*G)

I'm sure there's still some neatening + testing to do!

@expipiplus1
Copy link
Author

As a further simplification to the implementation, I wonder if we could just make NonTerminal :: !(r e t a) -> !(Prod r e t ([a] -> [b])) -> Prod r e t b...

@expipiplus1
Copy link
Author

Ah, instead of making disambiguate operate in Grammar we can make the nonterminal 'live' during parse with something like this:

    Disamb p d -> do
      r <- mkRule p
      scont' <- newConts =<< newSTRef [Cont C.id d args scont]
      parse (State (NonTerminal r (pure id)) C.id pos scont' : ss) env

I don't think this should be a performance disadvantage compared to running in Grammar, however it feels bad putting pure id as the right prod for NonTerminal (I guess this would be solved by the previous comment). This also means that we end up calling the disamb function with an empty list some of the time which is also a little odd.

@ollef
Copy link
Owner

ollef commented May 14, 2023

This is a neat idea. I had a little think about the API and implementation:

If we had

expose :: Prod r e t a -> Prod r e t [a]

and

conceal :: Prod r e t [a] -> Prod r e t a

(but perhaps with better chosen names), then it seems like we should be able to implement

disambiguate :: ([a] -> [b]) -> Prod r e t a -> Prod r e t b
disambiguate f = conceal . fmap f . expose

without it needing to be in grammar. This also seems to solve #62, because we can implement

constraint :: (a -> Bool) -> Prod r e t a -> Prod r e t a
constraint = disambiguate . filter

This is possible if we tweak the Prod type e.g. as follows (along the lines of one of your comments):

data Prod r e t a where
  -- Applicative.
  Terminal    :: !(t -> Maybe a) -> !(Prod r e t (a -> [b])) -> Prod r e t b
  NonTerminal :: !(r e t a) -> !(Prod r e t (a -> [b])) -> Prod r e t b
  Pure        :: [a] -> Prod r e t a
  -- Monoid/Alternative. We have to special-case 'many' (though it can be done
  -- with rules) to be able to satisfy the Alternative interface.
  Alts        :: ![Prod r e t a] -> !(Prod r e t (a -> [b])) -> Prod r e t b
  Many        :: !(Prod r e t a) -> !(Prod r e t ([a] -> [b])) -> Prod r e t b
  -- Error reporting.
  Named       :: !(Prod r e t a) -> e -> Prod r e t a

because then we get

expose :: Prod r e t a -> Prod r e t [a]
expose (Terminal b p)    = Terminal b (fmap (fmap pure) p)
expose (NonTerminal r p) = NonTerminal r (fmap (fmap pure) p)
expose (Pure x)          = Pure [x]
expose (Alts as p)       = Alts as (fmap (fmap pure) p)
expose (Many p q)        = Many p (fmap (fmap pure) q)
expose (Named p n)       = Named (expose p) n

conceal :: Prod r e t [a] -> Prod r e t a
conceal (Terminal b p)    = Terminal b $ fmap (fmap concat) p
conceal (NonTerminal r p) = NonTerminal r $ fmap (fmap concat) p
conceal (Pure as)         = Pure $ concat as
conceal (Alts as p)       = Alts as $ fmap (fmap concat) p
conceal (Many p q)        = Many p $ fmap (fmap concat) q
conceal (Named p n)       = Named (conceal p) n

but I have not tried updating the other modules (parser etc).

Maybe it breaks down somewhere.

@expipiplus1
Copy link
Author

expipiplus1 commented May 15, 2023

I like expose and conceal a lot! Much nicer than smashing it all into [a] -> [b].

It makes me worry a little bit from a performance point of view to be passing lists around everywhere in Prod and I feel like there's basically no chance of any fusion taking place in parse.

Just as an aside, @ollef, did you ever experiment with using Data.Array.ST for anywhere lists are used in parse? In fact, I wonder if this would be just slightly better done with Seq (or even Tree) given all the concatting... Must benchmark after

@expipiplus1
Copy link
Author

mmm, I think that keeping this to nonterminals might be the pragmatic thing to do, with your scheme above it doesn't seem possible to push Pure fs down into the leaves of Alts for example.

Attaching to nonterminals only seems important wrt parse as written, as they're the place where new continuations are made which act as "join" points in the continuation chain.

@expipiplus1
Copy link
Author

I haven't thought it through, but I wonder if it would be possible to push disambiguations all the way down to Alts, and have it be Alts :: ![Prod r e t a] -> !(Prod r e t ([a] -> b)) -> Prod r e t b

@expipiplus1
Copy link
Author

expipiplus1 commented Jul 7, 2023

hmm, I've got a testcase where:

This works, and d is called with a list of length 2:

  xx <-
    disambiguate d $
      (Foo <$ anyToken)
        <|> (Bar <$ anyToken)

This doesn't work, d is called twice with a list of length 1 and the whole parse returns 2 results:

  xx <-
    disambiguate d $
      (Foo <$ many anyToken)
        <|> (Bar <$ anyToken)

Sadly I've long since flushed the context needed to easily understand what's going on :(

...

Oh, I still see several todos in the code, I guess I'm just hitting one of these cases past-me foresaw.

...

On a hunch I removed the first case in simplifyCont and this seems to sort things for at least this example, don't quite understand why though yet.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants