You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
In the third homework one of the exercises is about implementing morphological analysis. It is unclear to me how to proceed with it. When we parse sentences, then the whitespaces determine the tag boundaries. But if all we get is one word, then what are the boundaries? Letters? Should we write a CFG grammar that accepts the input one letter at a time? Or are we supposed to break the word into chunks (leg, obb, at, et, etc...) based on some logic, and then apply the parsing on it? The Tree provided as an example implies the latter, but that chunking is basically almost the parsing itself... Besides, without the underlying automaton, if we look at the surface form, then the 'tag' boundaries can be ambiguous. So the 'some logic' can be very complex, for example a full FST... So, I am simply not sure what the exercise demands from me.
The text was updated successfully, but these errors were encountered:
dobijan
changed the title
A question regarding CFG morphological analyis
A question regarding CFG morphological analysis
Nov 27, 2017
Whitespaces don't determine the tag boundaries. If you look closely, you will see that we called .split() on the inputs to the parser exactly because of that. However, the parse() method expects a sequence of tokens, so for the Arithmetics exercise, these are equivalent:
aparser.parse('1 - 2 / ( 3 - 4 )'.split()) # listaparser.parse(['1', '-', '2', '/', '(', '3', '-', '4', ')']) # same as the previous one, actuallyaparser.parse('1-2/(3-4)') # string
So you can see that some parsing is going on. Two hints:
think about what your terminal symbols should be
your grammar doesn't have to be CNF for this exercise
I decided to consider letters as base tokens. I wrote a letter-based grammar. Due to this the parse tree was a little cluttered, but then I postprocessed that tree to collapse sibling letter terminals into word parts. It works, I don't know whether this is the best solution, but the alternative seemed far more complex (somehow universally find the word parts in the word without an FSA, regexes for this purpose would be quite numerous and long). This way at least, only the grammar defines the word parts, and this information is not split between two distinct parts of the program.
I'm interested whether the others take a different approach.
In the third homework one of the exercises is about implementing morphological analysis. It is unclear to me how to proceed with it. When we parse sentences, then the whitespaces determine the tag boundaries. But if all we get is one word, then what are the boundaries? Letters? Should we write a CFG grammar that accepts the input one letter at a time? Or are we supposed to break the word into chunks (leg, obb, at, et, etc...) based on some logic, and then apply the parsing on it? The Tree provided as an example implies the latter, but that chunking is basically almost the parsing itself... Besides, without the underlying automaton, if we look at the surface form, then the 'tag' boundaries can be ambiguous. So the 'some logic' can be very complex, for example a full FST... So, I am simply not sure what the exercise demands from me.
The text was updated successfully, but these errors were encountered: