R2L handling of semantic roles in passive sentences #89

Open
sebastianruder opened this issue Jul 8, 2014 · 21 comments

@sebastianruder
Contributor

Hi all,
For a sentence such as "The jar is filled with yellow marbles", the passive-rule2 produces the following EvaluationLink:

(EvaluationLink (stv 0.990000 0.990000)
  (PredicateNode "filled@71815ebd-87ec-4c3f-a8ee-9410da13822f") ; [2934]
  (ListLink (stv 0.990000 0.990000)
    (VariableNode "$x") ; [2942]
    (ConceptNode "jar@bae428c8-6e7b-4259-9585-b94f6bebed3f") ; [2939]
  ) ; [2943]
) ; [2944]

The VariableNode would bind an agent, if it was present, e.g. "by Peter". There would need to be another representation to involve the marbles, such as making the verb phrasal:

(EvaluationLink (stv 0.990000 0.990000)
  (PredicateNode "filled_with@71815ebd-87ec-4c3f-a8ee-9410da13822f") ; [2934]
  (ListLink (stv 0.990000 0.990000)
    (VariableNode "$x") ; [2942]
    (ConceptNode "jar@bae428c8-6e7b-4259-9585-b94f6bebed3f") ; [2939]
    (ConceptNode "marbles@1f78f01f-624d-427b-9051-1c29e94b6f0b")
  ) ; [2943]
) ; [2944]

What are your thoughts on this?

sebastianruder changed the title from "R2L handling of agent in passive sentences" to "R2L handling of semantic roles in passive sentences" on Jul 8, 2014
@linas
Member

linas commented Jul 8, 2014

It is almost surely best to avoid three-point functions like this. I think you want:

filled_by (jar, $x)

where $x could be "Peter"

Also, passive-rule-2 seems backwards to me. I'm thinking it should be

filled(jar, $x) where $x could be marbles

@ruiting and @Rodas should comment on this.

@rodsol
Contributor

rodsol commented Jul 9, 2014

Hi,
Passive-rule-1 is for cases like
"The jar is filled with yellow marbles by Peter."
r2l output:

EvaluationLink
    PredicateNode "filled@123"
    ListLink
        ConceptNode "Peter@345"
        ConceptNode "jar@789"

Passive-rule-2 is applied when we don't know who performed the action:
"The jar is filled with yellow marbles."

EvaluationLink
    PredicateNode "filled@123"
    ListLink
        VariableNode "$x"
        ConceptNode "jar@789"
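
To make the difference between the two rules concrete, here is a minimal Python sketch that mirrors the two outputs above as nested tuples. It is not the actual R2L Scheme helper: the function name `passive_rule` and the tuple encoding are made up for illustration only.

```python
from typing import Optional


def passive_rule(verb: str, patient: str, agent: Optional[str] = None) -> tuple:
    """Mimic the shape of the two passive-rule outputs as nested tuples.

    Hypothetical helper, not part of R2L. With an agent ("by Peter") the
    first argument is a ConceptNode; without one it is the unbound
    VariableNode "$x".
    """
    if agent is not None:
        first = ("ConceptNode", agent)
    else:
        first = ("VariableNode", "$x")
    return (
        "EvaluationLink",
        ("PredicateNode", verb),
        ("ListLink", first, ("ConceptNode", patient)),
    )


# "The jar is filled with yellow marbles by Peter."
with_agent = passive_rule("filled@123", "jar@789", agent="Peter@345")
# "The jar is filled with yellow marbles."
without_agent = passive_rule("filled@123", "jar@789")
```

In both cases the marbles are absent from the result, which is the gap this issue is about.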

@rodsol
Contributor

rodsol commented Jul 9, 2014

And for
with(fill, marbles)
it should be:

EvaluationLink
    PredicateNode "with@11"
    ListLink
        PredicateNode "filled@123"
        ConceptNode "marbles@987"

@ruiting, please comment.

@williampma
Member

The restriction tense($A, past_passive) is actually stopping passive-rule1 from being applied for "The jar is filled with yellow marbles by Peter." or "The jar is being filled with yellow marbles by Peter."

@rodsol
Contributor

rodsol commented Jul 9, 2014

Yes, I just fixed it:
rodsol@cc20990

On Wed, Jul 9, 2014 at 11:26 AM, William Ma wrote:

> The restriction tense($A, past_passive) is actually stopping passive-rule1
> from being applied for "The jar is filled with yellow marbles by Peter." or
> "The jar is being filled with yellow marbles by Peter."



@rodsol
Contributor

rodsol commented Jul 9, 2014

@williampma
It seems PASSIVE1-1 and PASSIVE1 are redundant.
Is there any way the two rules can be merged?

@williampma
Member

Yeah, I agree it's a bit redundant.

How about

_obj($A,$B) & by($A,$C) & tense($A,$type) => (passive-rule1 $type ...)

and check $type in the Scheme function to see whether it matches "******_passive", doing nothing otherwise.
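
As a rough illustration of that check (in Python rather than the Scheme helper itself), the sketch below accepts any tense label that ends in "_passive" and returns nothing otherwise. The names `is_passive` and `passive_rule1` and the tuple encoding are assumptions for this example; only the label "past_passive" comes from this thread, and "past" stands in for a hypothetical non-passive label.

```python
from typing import Optional


def is_passive(tense: str) -> bool:
    """True for any tense label ending in "_passive", e.g. "past_passive"."""
    return tense.endswith("_passive")


def passive_rule1(tense: str, verb: str, patient: str, agent: str) -> Optional[tuple]:
    """Build the passive structure, or do nothing (None) for other tenses.

    Hypothetical stand-in for the Scheme function; `patient` plays the role
    of $B in _obj($A,$B) and `agent` the role of $C in by($A,$C).
    """
    if not is_passive(tense):
        return None
    return (
        "EvaluationLink",
        ("PredicateNode", verb),
        ("ListLink", ("ConceptNode", agent), ("ConceptNode", patient)),
    )


applied = passive_rule1("past_passive", "filled@123", "jar@789", "Peter@345")
skipped = passive_rule1("past", "filled@123", "jar@789", "Peter@345")  # "past" is a made-up label
```

With this shape, a single rule covers every passive tense, at the cost of invoking the helper for non-passive sentences too.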

@williampma
Member

I can actually think of a way to allow regular expressions like

_obj($A,$B) & by($A,$C) & tense($A, .*\Qpassive\E) => ...

though I am not sure how often this functionality is needed.
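
For comparison, here is how the same pattern could be tested in Python. This is only a sketch of the matching idea, not of the rule-file syntax: Python's `re` module has no `\Q...\E` quoting, so `re.escape` is used to get the same literal-text effect, and the helper name `tense_matches` is invented for the example.

```python
import re

# ".*" followed by the literal text "passive"; re.escape() stands in for \Q...\E.
PASSIVE_TENSE = re.compile(".*" + re.escape("passive"))


def tense_matches(tense: str) -> bool:
    """Return True if the whole tense label matches .*passive (hypothetical helper)."""
    return PASSIVE_TENSE.fullmatch(tense) is not None


matches_past_passive = tense_matches("past_passive")
matches_plain_past = tense_matches("past")  # "past" is a made-up non-passive label
```

Using a full match rather than a search means a label only qualifies when it ends in "passive".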

@sebastianruder
Contributor Author

If we don't want to use three-point-functions, then we would need a rule to deal with with(fill, marbles) and -- as @rodsol proposed -- produce

EvaluationLink
    PredicateNode "with@11"
    ListLink
        PredicateNode "filled@123"
        ConceptNode "marbles@987"

To avoid redundancy, we should have a rule that could deal with all such relations, e.g. through(x, y), about(x, y), etc. and produce a PredicateNode for them. What do you say?
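
A minimal sketch of what such a shared rule might look like, written in Python for illustration. The set `ADJUNCT_RELATIONS`, the function `adjunct_rule` and the tuple encoding are all assumptions made for this example, not existing R2L code.

```python
from typing import Optional

# Relations handled by the one shared rule (names taken from the examples above).
ADJUNCT_RELATIONS = {"with", "through", "about"}


def adjunct_rule(relation_instance: str, head: str, dependent: str) -> Optional[tuple]:
    """Turn relation(head, dependent) into an EvaluationLink-shaped tuple.

    Hypothetical helper. `relation_instance` may carry an instance suffix
    such as "with@11"; only the part before "@" is checked against the set.
    """
    relation = relation_instance.split("@", 1)[0]
    if relation not in ADJUNCT_RELATIONS:
        return None
    return (
        "EvaluationLink",
        ("PredicateNode", relation_instance),
        ("ListLink", ("PredicateNode", head), ("ConceptNode", dependent)),
    )


with_link = adjunct_rule("with@11", "filled@123", "marbles@987")
by_link = adjunct_rule("by@211", "filled@123", "Peter@1987")  # not in the set
```

Keeping the accepted relations in one set makes it easy to start small and widen the scope later.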

@linas
Member

linas commented Jul 9, 2014

So I assume that "filled by Peter" becomes

EvaluationLink
    PredicateNode "by@211"
    ListLink
        PredicateNode "filled@123"
        ConceptNode "Peter@1987"

@sebastianruder
Contributor Author

@linas, with that logic, yes. Although I think we would want to limit the scope to a certain set of terms since in the case of "by", Peter should rather occupy the first argument position of the EvaluationLink. What do you think?

@linas
Member

linas commented Jul 9, 2014

  1. I'm sorry that I ever said "filled_by" in my earlier remark; I wasn't thinking.

  2. by(filled, Peter) and with(filled, marbles) is the correct order. In both cases, "filled" is the "prepositional subject", and "Peter/marbles" is the "prepositional object". As a general rule, the parent comes first and the dependent comes after.

Some systems would write this as _psubj(by, filled) _pobj(by, Peter) _psubj(with, filled) _pobj(with, marbles), but RelEx doesn't do this by default (it will if you turn on Stanford compatibility mode).

Which of these styles should we use in r2l? I don't know. Which might be easier to reason with? I don't know. Which of these is easier for PLN?

Some years ago, I actually used the form filled_by(jar, peter), which seemed like a good idea at the time but fell apart when I needed to say "was slowly filled by".

@sebastianruder
Contributor Author

I think it would be useful for PLN to supplement what we already have. In the current implementation, there is no strict differentiation between semantic roles, objects, etc. If we want to avoid three-point functions, it feels most natural to me to add EvaluationLinks with new PredicateNodes such as "through", "with", etc. to provide the additional information. I don't see how this information could otherwise be fitted in without altering the current semantics of nodes and links. I would only take this approach for a small set of needed expressions at first, though, as IMO phrasal verbs in particular still need to be discussed in more depth. I don't think the particle of every phrasal verb should be split off as a separate PredicateNode; rather, verb and particle should form one entity, e.g. "look_up", "look_for", etc. Or maybe I'm confusing two things here: phrasal verbs, which only work together with their particle, and adjuncts, e.g. "with marbles", "by Peter". These should be handled differently.
On a related note, would you say that there should be a differentiation between adjuncts and complements, since there can only be one complement but a potentially infinite number of adjuncts? @cosmoharrigan, what is your take on this whole issue? How should we capture adjuncts such as "by Peter", "with marbles", etc. so that PLN can deal with them?

@linas
Member

linas commented Jul 10, 2014

Hi Rodas,

I'm saying two distinct things:

  1. Since you are one of the people working on r2l I wanted you to review Sebastian's work and make sure it does not conflict with your plans.

  2. I was saying that, if one designs some of these structures poorly, then attaching an adjunct becomes impossible. So, as you look for a data structure for "look_for", you should think about how to represent "superficially look for". This was a mistake I once made. Which is why reviews are needed.

@sebastianruder
Contributor Author

@linas, one remark: Here you mentioned that it is best to avoid three-point functions. How about ditransitive verbs, e.g. sell(x,y,z), though? Surely they would require three-point functions, right?

@linas
Member

linas commented Jul 13, 2014

Hi @sebastianruder, yes, and no, and it depends on context. At the 'surface dependency parse' level, we can avoid 3-point functions, since sell(x,y,z) is equivalent to _subj(sell, x) _obj(sell, y) _iobj(sell, z). The latter representation is nice because it easily allows for verb modifiers: e.g. softly-sell adds one more relation, _advmod(sell, softly). With the 3-point function, the same modification gets messy and confusing: should it be sell(x,y,z,w) with w==softly? Should it be _advmod(sell(x,y,z), w)? Should it be a different function entirely, called softly_sell(x,y,z)? Whichever scheme you pick, it seems that one can find some sentence that either breaks the scheme or forces even more complexity. Take "he casually but quickly sold the iPhone to John": how do I conjoin the two modifiers? Or "He sold the iPhone to John in a casual but quick manner": what does the preposition "in" connect to? Would it be in(sell(x,y,z),w)? Something else? All of these have standard answers in the 2-point case. That's why I was recommending against 3-point functions.
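
The decomposition can be sketched in a few lines of Python. This is only an illustration of the idea: the `Relation` tuple and the helper names are invented for the example, and how a real parser links the conjunction "but" between the two modifiers is deliberately not modelled.

```python
from typing import List, NamedTuple


class Relation(NamedTuple):
    """One two-point relation: name(head, dependent)."""
    name: str
    head: str
    dependent: str


def ditransitive(verb: str, subj: str, obj: str, iobj: str) -> List[Relation]:
    """Express verb(subj, obj, iobj) as three two-point relations."""
    return [
        Relation("_subj", verb, subj),
        Relation("_obj", verb, obj),
        Relation("_iobj", verb, iobj),
    ]


def dependents(relations: List[Relation], head: str, name: str) -> List[str]:
    """All dependents attached to `head` through relation `name`."""
    return [r.dependent for r in relations if r.head == head and r.name == name]


# "He casually but quickly sold the iPhone to John."
rels = ditransitive("sell", "he", "iPhone", "John")
# Each modifier is one more relation on the verb; the arity of "sell" never changes.
rels.append(Relation("_advmod", "sell", "casually"))
rels.append(Relation("_advmod", "sell", "quickly"))

modifiers = dependents(rels, "sell", "_advmod")
```

Adding a modifier only appends to the list, whereas a sell(x,y,z) tuple would need a new arity or a wrapper for every such addition.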

However, it's widely known that ditransitive verbs have an arity of 3 (named subj, obj and iobj), and at deeper, semantic layers of analysis you want to have all three parts tied together (or at least, it's very convenient). It can simplify the use of "lexical functions" in meaning-text theory, for example.

In r2l, we are translating from the 'surface dependency parse' level to some deeper quasi-semantic level, on which we hope to apply PLN. So the question then becomes: "is it easier to use one single glob sell(x,y,z), or is it easier to use three relations _subj(sell, x) _obj(sell, y) _iobj(sell, z)?" I don't know the answer to that. In some mathematical, formal sense, these two different structures are equivalent; so ease-of-use is your only decision criterion.

I do know that, in the past, when doing something similar, I got burned by this. I was trying to convert sentences into those fabled "web 2.0" or "semantic web" triples that the triple-store and SPARQL people love so much. That is easy and works great when your sentences are easy and simple, and completely falls apart once you get to real-life examples. So I guess I'm saying "OK, but be careful".

@sebastianruder
Contributor Author

@rodsol, @ruiting, how would you represent adjuncts such as "by Peter", "with marbles" now? With their own PredicateNode as Rodas initially proposed?

(EvaluationLink
    (PredicateNode "with@11")
    (ListLink
        (PredicateNode "filled@123")
        (ConceptNode "marbles@987")))

@sebastianruder
Contributor Author

@williampma, has this issue been resolved? How are adjuncts captured at the moment?

@williampma
Member

I don't think this is resolved yet. Currently there isn't any R2L rule for adjuncts. There is an on-rule Scheme helper function, but the corresponding R2L rule is missing, so it is never invoked. I am not sure what the current decision is.

The dangling on-rule does follow a structure similar to the one Rodas proposed, with its own PredicateNode "on" but without an instance (so it is not on@1111).

@sebastianruder
Contributor Author

Alright. What do you think about implementing a rule which creates a PredicateNode for certain relations, i.e. with(x, y), through(x, y) and about(x, y) for a start? For with(fill, marbles), for instance, it would produce:

EvaluationLink
    PredicateNode "with@11"
    ListLink
        PredicateNode "filled@123"
        ConceptNode "marbles@987"

as I stated in this comment and as Rodas originally proposed as well.
@williampma, @amebel, do scoped variables work at the moment and if so, could you briefly explain to me how to use them so we can have one rule for all three relations? I've seen the varscope.txt file, but I haven't come across an example where they are actually used.

@williampma
Member

about-rule exists (I forgot about it), but again it is without an instance number (so it is PredicateNode "about" and not PredicateNode "about@1111"). If this implementation is correct, then I think the solution for with(x,y) should also be without an instance number (just PredicateNode "with").

"Scoped variable" is currently hard-coded for the MAYBE rule, so I guess in a sense it does not really work. On the other hand, my recent implementation of regular expressions in #152 could potentially be used for "scoped variables", so I guess the old implementation for MAYBE will become obsolete (in fact, I might remove it eventually). However, I don't think scoped variables apply to your case. They are for something like with(x, scope), and allow scope to be matched against an array of different words, i.e. the scope is for the word in the relation. It does not work for the name of the relation, as in scope(x, y), which is what you want, I think?
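
To illustrate the distinction only (this is not how the MAYBE rule or #152 is implemented), the Python sketch below matches a relation whose name is fixed while its second argument must come from a set of allowed words. The function `match_scoped`, the tuple layout and the example words are all hypothetical.

```python
from typing import Dict, Optional, Set, Tuple

RelationTuple = Tuple[str, str, str]  # (relation name, first argument, second argument)


def match_scoped(relation: RelationTuple, name: str, scope: Set[str]) -> Optional[Dict[str, str]]:
    """Match name(x, word) where `word` must be one of the words in `scope`.

    The relation name itself is fixed; only the word slot is scoped.
    """
    rel_name, x, word = relation
    if rel_name == name and word in scope:
        return {"x": x, "scope": word}
    return None


allowed_words = {"marbles", "stones"}  # made-up scope
hit = match_scoped(("with", "fill", "marbles"), "with", allowed_words)
# A different relation name never matches, whatever the scope contains.
miss = match_scoped(("through", "fill", "marbles"), "with", allowed_words)
```

Covering with, through and about with one rule would need the scope to apply to the first element of the tuple instead, which is the case described above as unsupported.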
