Removes typing extension #57

Tpt · 2015-02-21T12:02:51Z

The features of this extension may be implemented by the intersection with (?, instance of, MY_TYPE)

yhamoudi · 2015-02-21T12:05:07Z

How do you type resources now?

Ezibenroc · 2015-02-21T12:07:18Z

The features of this extension may be implemented by the intersection with (?, instance of, MY_TYPE)

Is it always ok? (I do not have any counter-example in mind)

Tpt · 2015-02-21T12:30:00Z

How do you type resources now?

An example: Bach ∩ (?, instance of, human)

Or (it's not valid in RDF but I think we can allow it in your data model) 1934 ∩ (?, instance of, date)
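
To make the two examples above concrete, here is a minimal sketch of how such an intersection could be built. The class names (`Resource`, `Missing`, `Triple`, `Intersection`) and the `typed` helper are assumptions made for illustration; they are not taken from the actual data model library.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Resource:
    value: str


@dataclass(frozen=True)
class Missing:
    """The unknown, written ? in the examples above."""


@dataclass(frozen=True)
class Triple:
    subject: "Node"
    predicate: "Node"
    object: "Node"


@dataclass(frozen=True)
class Intersection:
    operands: Tuple["Node", ...]


Node = Union[Resource, Missing, Triple, Intersection]


def typed(node, type_name):
    """Hypothetical helper: constrain `node` with a (?, instance of, type) triple."""
    constraint = Triple(Missing(), Resource("instance of"), Resource(type_name))
    return Intersection((node, constraint))


bach = typed(Resource("Bach"), "human")   # Bach ∩ (?, instance of, human)
year = typed(Resource("1934"), "date")    # 1934 ∩ (?, instance of, date)
```

With this encoding no dedicated type field is needed; the type is just one more operand of the intersection.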

yhamoudi · 2015-02-21T15:52:29Z

It's a bit ugly to join an instance of triple to each resource/missing. It's much the same problem as with inverse predicates: we can encode them with 2 triples ((?,a,b)∪(b,reverse(a),?)), but it's better to make a clearer distinction by adding a new field (inverse predicate / type).
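
The two-triple encoding of an inverse predicate mentioned above can be sketched as follows. Triples are plain tuples here and `expand_inverse` is a hypothetical name; the sketch only shows the rewriting, not how the data model stores the inverse predicate.

```python
MISSING = "?"


def expand_inverse(subject, predicate, obj, inverse_predicate=None):
    """Return the plain triples whose union replaces a triple with an inverse predicate."""
    triples = [(subject, predicate, obj)]
    if inverse_predicate is not None:
        # The inverse predicate reads the same relation from the other side.
        triples.append((obj, inverse_predicate, subject))
    return triples


union_operands = expand_inverse(MISSING, "a", "b", inverse_predicate="reverse(a)")
# (?, a, b) ∪ (b, reverse(a), ?)
```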

progval · 2015-02-21T15:56:58Z

👍 @yhamoudi
And it creates extra workload for module developers.

Ezibenroc · 2015-02-21T16:33:46Z

And it creates extra workload for module developers.

Yes. Types were optional information provided to improve precision. With this PR they would become mandatory...

Tpt · 2015-02-21T16:34:33Z

My basic point of view is: we should try to keep the data model as simple as possible so that it is easy to maintain and understand. I am afraid of a feature explosion in the data model that would make module creation very difficult (and that is why I personally dislike the reverse predicates, whose only advantage (for the Wikidata module) is to reduce the tree size).

It's a bit ugly to join an instance of triple to each resource/missing.

Are you sure that you will be able to add type annotations to each resource/triple? IMHO we should only add them when we are sure they are relevant, i.e. when they are explicitly stated in the question, as in "Who is Bach", which would be rewritten as "Bach ∩ (?, instance of, person)", or "In which country is Paris", which would be rewritten as something like "(Paris, [located in, in, location], ?) ∩ (?, instance of, country)".

That is because I don't see how you can do good enough typing everywhere without real knowledge of the semantics of each word. For example, will you be able to understand that "mother" may be both a relationship and a movie? Another example: typing the output of "Where is Paris?" is very tricky.

But I would be very happy to be wrong on it, so feel free to convince me I'm wrong.

Side remark, because I believe it will come up again quickly: please do not parse "When is born X" as "(X, birth, ?) ∩ (?, instance of, date)", because it has no real meaning: the range of a "birth" predicate would usually be an event, and casting it to a date, whether by an intersection with "(?, instance of, date)" or by a type annotation, makes no semantic sense. Moreover, it makes simple module development far more difficult (a module needs to guess cleverly, from a "birth" predicate and a "date" type, that it is a "birth date" we are looking for).

And it creates extra workload for module developers.

Could you expand on it? I believe that adding some instance of triples is cleaner, because we could imagine that the module rewrites the triples it knows about and then the libmodule applies the "instance of" triples using the resource value-type and JSON-LD @type. If you see a simpler way to use type annotations, please expand on it. I would be very happy to have something simpler than that.
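
A rough sketch of what such a libmodule-side filter might look like is given below. Everything in it is an assumption for illustration: the function names, the idea that a type name can be compared directly with a value-type, and the `graph` key holding the JSON-LD description are not confirmed by the spec quoted in this thread.

```python
def matches_type(resource, wanted):
    """True if a serialized resource looks like an instance of `wanted`."""
    # Assumption: the wanted type can be compared directly with value-type.
    if resource.get("value-type") == wanted:
        return True
    # Assumption: JSON-LD resources carry their description under "graph".
    declared = resource.get("graph", {}).get("@type", [])
    if isinstance(declared, str):
        declared = [declared]
    return wanted in declared


def apply_instance_of(resources, wanted):
    """Keep only the resources compatible with (?, instance of, wanted)."""
    return [r for r in resources if matches_type(r, wanted)]


candidates = [
    {"type": "resource", "value": "1934", "value-type": "time"},
    {"type": "resource", "value": "Bach", "value-type": "resource-jsonld",
     "graph": {"@type": "Person"}},
]
people = apply_instance_of(candidates, "Person")
times = apply_instance_of(candidates, "time")
```

The point of the design is that a module only rewrites the triples it understands, and the shared library does the type filtering afterwards.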

progval · 2015-02-21T16:41:46Z

On 21/02/2015 17:34, Thomas Tanon wrote:

And it creates extra workload for module developers.

Could you expand on it? I believe that adding some instance of triples is cleaner, because we could imagine that the module rewrites the triples it knows about and then the libmodule applies the "instance of" triples using the resource value-type and JSON-LD @type. If you see a simpler way to use type annotations, please expand on it. I would be very happy to have something simpler than that.

Because module developers would have to implement a simplification step
that takes into account this intersection, or the module would return
something that can't be used (an intersection of a resource and an
instance-of triple)

Tpt · 2015-02-21T16:50:10Z

Because module developers would have to implement a simplification step
that takes into account this intersection, or the module would return
something that can't be used (an intersection of a resource and an
instance-of triple)

It's exactly why I've proposed the filter based on value-type and @type.

yhamoudi · 2015-02-21T18:50:19Z

What is the difference between type and value-type? What is the JSON serialization of typing?

Tpt · 2015-02-21T18:56:25Z

What is the difference between type and value-type?

The serialization of resources specifies a type ("resource") and a value-type ("time", "string", "resource-jsonld"...). See the spec for more details

What is the JSON serialization of typing?

The serialization of the type extension has not been specified yet.

yhamoudi · 2015-02-21T22:09:35Z

The serialization of resources specifies a type ("resource") and a value-type ("time", "string", "resource-jsonld"...). See the spec for more details

I was not clear. I was talking about type from the datamodel (which is removed in this pull request) and value-type from the serialization. But after re-reading the doc, I have no more questions on this.

That is because I don't see how you can do good enough typing everywhere without real knowledge of the semantics of each word.

I know that Watson uses thousands of types and that it's an important feature, so they presumably manage to perform very accurate typing.

And it creates extra workload for module developers.

It's exactly why I've proposed the filter based on value-type and @type.

I'm not sure that I understand this remark (especially the part about the "filter based on value-type and @type"). Are you saying that having 2 triples instead of 1 is better because 2 different modules can try to solve them? For instance, let's consider "What president was born in Italy". A module M1 knows who was born in Italy, but not who the presidents are. A module M2 knows who the presidents are, but not their birthplaces.

Depending on the datamodel we have:

1 triple: (?:PRESIDENT, born in, Italy) -> only module M1 can answer, but it will return all the people born in Italy since it doesn't know who is president or not
2 triples: (?, born in, Italy) ∩ (?, instance of, president) -> M1 gets the people born in Italy, and then M2 filters on the presidents -> more accurate answer
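
The two cases can be simulated with a toy sketch. The module functions, the `solve_intersection` helper and the placeholder data (`person_a`, ...) are all hypothetical; the single-triple case is modelled as M1 answering while ignoring the type it does not know about.

```python
MISSING = "?"

# Hypothetical knowledge of each module; the names are placeholders, not real data.
M1_BORN_IN = {"Italy": {"person_a", "person_b", "person_c"}}
M2_INSTANCE_OF = {"president": {"person_b", "person_d"}}


def m1(triple):
    """M1 only knows birthplaces."""
    subject, predicate, obj = triple
    if subject == MISSING and predicate == "born in":
        return M1_BORN_IN.get(obj)
    return None


def m2(triple):
    """M2 only knows who is an instance of what."""
    subject, predicate, obj = triple
    if subject == MISSING and predicate == "instance of":
        return M2_INSTANCE_OF.get(obj)
    return None


def solve_intersection(triples, modules):
    """Ask every module about every triple and intersect the answers."""
    result = None
    for triple in triples:
        answers = [a for a in (module(triple) for module in modules) if a is not None]
        if not answers:
            continue  # no module can answer: the constraint is left unapplied
        found = set().union(*answers)
        result = found if result is None else result & found
    return result


born_in_italy = (MISSING, "born in", "Italy")
is_president = (MISSING, "instance of", "president")

one_triple = solve_intersection([born_in_italy], [m1, m2])
two_triples = solve_intersection([born_in_italy, is_president], [m1, m2])
```

Here `one_triple` contains everyone M1 knows to be born in Italy, while `two_triples` keeps only the entries that M2 also reports as presidents.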

I agree that removing types would solve this kind of thing, but I'm not sure that it's the clean way to do it. Indeed, by the same reasoning, a lot of other parts could be split:

([a,b], c, ?) -> split in (a, c, ?) ∪ (b, c, ?)
(la, lb, lc) -> split into triples without list of predicates
more generally, disallow at least lists and reverse predicates (since they can be obtained with unions and intersections)
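
The list splitting described in these items can be sketched with a Cartesian product. Triples are plain tuples and `split_lists` is a hypothetical name; the sketch assumes, as in the first item, that a list means the union of its alternatives.

```python
from itertools import product


def as_list(item):
    """Wrap a single value so that lists and single values are handled alike."""
    return list(item) if isinstance(item, (list, tuple)) else [item]


def split_lists(subject, predicate, obj):
    """Return the list-free triples whose union replaces (subject, predicate, obj)."""
    return list(product(as_list(subject), as_list(predicate), as_list(obj)))


simple = split_lists(["a", "b"], "c", "?")
# (a, c, ?) ∪ (b, c, ?)
general = split_lists(["a", "b"], ["c", "d"], "?")
```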

I think we have 3 possibilities:

splitting everything. The datamodel is not changed, but every time a module changes the normal form (question parsing, wikidata, ...), the normal form is immediately processed by a "translation module" that removes the lists, reverse predicates, ... in order to split everything it can ((?:PRESIDENT, born in, Italy) -> (?, born in, Italy) ∩ (?, instance of, president)). We could even have a "simplified datamodel" that doesn't mention reverse predicates or normal forms with lists, even if the modules can use them (they are removed immediately by the "translation module").
we do not split, but we find a way to make modules collaborate in order to solve the same Missing, for instance. In the previous example, M1 could query M2 (directly or indirectly: the core could give the same triple to different modules and process their answers in a clever way).
we do not split, and we consider that modules are totally independent (is that what we do currently?). In the end, each part of the normal form has been solved by a module that didn't work with other modules. On the other hand, we could use a score to choose between the answers of different modules for the same Missing, for instance.
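
For the first possibility, the type-splitting step of such a "translation module" could look like the sketch below. The representation of a typed missing as a `("?", type)` tuple and the `translate` function are assumptions made for illustration only.

```python
MISSING = "?"


def translate(triple):
    """Return the triples whose intersection replaces a triple with typed missings."""
    parts, constraints = [], []
    for item in triple:
        # Assumption: a typed missing such as ?:PRESIDENT is written ("?", "president").
        if isinstance(item, tuple) and item[0] == MISSING:
            parts.append(MISSING)
            constraints.append((MISSING, "instance of", item[1]))
        else:
            parts.append(item)
    return [tuple(parts)] + constraints


operands = translate(((MISSING, "president"), "born in", "Italy"))
# (?, born in, Italy) ∩ (?, instance of, president)
```

A triple without any typed missing comes back unchanged as a single operand, so the step can be applied to every normal form unconditionally.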

I dislike the use of instance of for types because it looks like a "hack" to get types, rather than a clean way to do it. You say that we need to keep the datamodel as simple as possible, but if you use "instance of" as a way to type, you will need to explain the special role held by the predicate "instance of".

Moreover, I think that we should take into account the computation time needed to solve a question. When there are only 4-5 modules to query it's easy, but it could be more difficult if there were 100 modules. The shorter the normal form, the faster the algorithm (there is a balance to find between the accuracy of the answer and the time needed to obtain it).

Removes typing extension

5466111

The features of this extension may be implemented by the intersection with (?, instance of, MY_TYPE)

Tpt mentioned this pull request Feb 21, 2015

Differentiate “Who” from “What” ProjetPP/PPP-QuestionParsing-Grammatical#131

Open
