Removes typing extension #57

Open
wants to merge 1 commit into
base: master

@Tpt
Member

Tpt commented Feb 21, 2015

The features of this extension may be implemented by the intersection with (?, instance of, MY_TYPE)

@yhamoudi
Member

How do you type resources now?

@Ezibenroc
Member

> The features of this extension may be implemented by the intersection with (?, instance of, MY_TYPE)

Is it always ok? (I do not have any counter-example in mind)

@Tpt
Member Author

Tpt commented Feb 21, 2015

> How do you type resources now?

An example: Bach ∩ (?, instance of, human)

Or (it's not valid in RDF but I think we can allow it in your data model) 1934 ∩ (?, instance of, date)
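
To make the proposal concrete, here is a minimal Python sketch of how such an intersection could be evaluated. Everything in it is an assumption made for illustration: the `Resource`, `Missing`, `Triple` and `Intersection` classes are hypothetical stand-ins, not the project's actual data model classes, and `FACTS` is toy data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    value: str


@dataclass(frozen=True)
class Missing:
    """The '?' placeholder."""


@dataclass(frozen=True)
class Triple:
    subject: object
    predicate: object
    obj: object


@dataclass(frozen=True)
class Intersection:
    operands: tuple


# Toy knowledge base of (subject, predicate, object) facts (illustrative only).
FACTS = {
    ("Bach", "instance of", "human"),
    ("1934", "instance of", "date"),
}


def evaluate(node):
    """Return the set of values denoted by a node of the toy tree."""
    if isinstance(node, Resource):
        return {node.value}
    if isinstance(node, Triple):
        # This sketch only handles the (?, predicate, object) shape.
        if not isinstance(node.subject, Missing):
            raise NotImplementedError("only (?, p, o) triples are supported")
        return {
            s for (s, p, o) in FACTS
            if p == node.predicate.value and o == node.obj.value
        }
    if isinstance(node, Intersection):
        return set.intersection(*(evaluate(op) for op in node.operands))
    raise TypeError(f"unknown node: {node!r}")


def typed(resource_value, type_name):
    """Build `resource ∩ (?, instance of, type)`."""
    return Intersection((
        Resource(resource_value),
        Triple(Missing(), Resource("instance of"), Resource(type_name)),
    ))


bach_as_human = evaluate(typed("Bach", "human"))
bach_as_date = evaluate(typed("Bach", "date"))
```

With this toy data, `bach_as_human` contains only "Bach" while `bach_as_date` is empty, which is the filtering behaviour the intersection is meant to provide.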

@yhamoudi
Member

It's a bit ugly to join an instance of triple to each resource/missing. It's much the same problem as with inverse predicates: we can encode them with 2 triples ((?,a,b)∪(b,reverse(a),?)), but it's better to make a clearer distinction by adding a new field (inverse predicate / type).
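
As a rough illustration of the two-triple encoding of inverse predicates, here is a small Python sketch. The tuple representation, the `MISSING` marker and the `expand_inverse` helper are all hypothetical, chosen only to show the shape of the rewrite.

```python
# Hypothetical marker for the '?' placeholder.
MISSING = "?"


def expand_inverse(subject, predicate, inverse_predicate, obj):
    """Encode a triple that carries an inverse predicate as a union of two
    plain triples: (s, p, o) ∪ (o, inverse(p), s)."""
    return ("union", [
        (subject, predicate, obj),
        (obj, inverse_predicate, subject),
    ])


# (?, a, b) with inverse predicate reverse(a)
expanded = expand_inverse(MISSING, "a", "reverse(a)", "b")
```

Here `expanded` holds the two triples (?, a, b) and (b, reverse(a), ?), so the information kept in a dedicated field in one design becomes an extra tree node in the other.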

@progval
Member

progval commented Feb 21, 2015

👍 @yhamoudi
And it creates extra workload for module developers.

@Ezibenroc
Member

> And it creates extra workload for module developers.

Yes. Types were optional information provided to improve precision. With this PR they would become mandatory...

@Tpt
Member Author

Tpt commented Feb 21, 2015

My basic point of view is: we should keep the data model as simple as possible so that it is easy to maintain and understand. I am afraid of a feature explosion in the data model that would make module creation very difficult (which is why I personally dislike reverse predicates, whose only advantage (for the Wikidata module) is to reduce the tree size).

> It's a bit ugly to join an instance of triple to each resource/missing.

Are you sure that you will be able to add type annotations to each resource/triple? IMHO we should only add them when we are sure they are relevant, i.e. when they are explicitly stated in the question, as in "Who is Bach", which would be rewritten as "Bach ∩ (?, instance of, person)", or "In which country is Paris", which would be rewritten as something like "(Paris, [located in, in, location], ?) ∩ (?, instance of, country)".

That is because I don't see how you can do good enough typing everywhere without real knowledge of the semantics of each word. For example, will you be able to understand that "mother" may be both a relationship and a movie? Another example: typing the output of "Where is Paris?" is very tricky.

But I would be very happy to be wrong on it, so feel free to convince me I'm wrong.

Side remark, because I believe it will come up again quickly: please do not parse "When is born X" as "(X, birth, ?) ∩ (?, instance of, date)", because it has no real meaning: the range of a "birth" predicate would usually be an event, and casting it to a date, whether through an intersection with "(?, instance of, date)" or through a type annotation, makes no semantic sense. Moreover, it makes simple module development far more difficult (a module needs to make clever guesses from a "birth" predicate and a "date" type to see that it's a "birth date" we are looking for).

> And it creates extra workload for module developers.

Could you expand on it? I believe that adding some instance of triples is cleaner, because we could imagine that the module rewrites the triples it knows about and then the libmodule applies the "instance of" triples using the resource value-type and the JSON-LD @type. If you see a simpler way to use type annotations, please expand on it. I would be very happy to have something simpler than that.
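
A minimal sketch of what such a libmodule-side filter might look like, in Python. This is not the actual libmodule code: the helper names are hypothetical, the `value` and `graph` keys of the serialized resources are assumptions, and so is the idea that a requested type name can be compared directly with a value-type or a JSON-LD `@type`.

```python
def matches_type(resource, expected_type):
    """Return True if a serialized resource is compatible with expected_type.

    Assumption: JSON-LD resources keep their node under a "graph" key and
    declare their type with "@type"; other resources only have a value-type.
    """
    if resource.get("value-type") == "resource-jsonld":
        declared = resource.get("graph", {}).get("@type", [])
        if isinstance(declared, str):
            declared = [declared]
        return expected_type in declared
    return resource.get("value-type") == expected_type


def apply_instance_of(resources, expected_type):
    """Keep only the resources satisfying (?, instance of, expected_type)."""
    return [r for r in resources if matches_type(r, expected_type)]


# Toy candidates returned by a module that ignored the "instance of" triple.
candidates = [
    {"type": "resource", "value": "1934", "value-type": "time"},
    {"type": "resource", "value": "Bach", "value-type": "string"},
    {"type": "resource", "value": "Bach", "value-type": "resource-jsonld",
     "graph": {"@type": "Person", "name": "Bach"}},
]

dates = apply_instance_of(candidates, "time")
people = apply_instance_of(candidates, "Person")
```

In this sketch the module itself never looks at the "instance of" triple; the generic filter is applied afterwards, which is the point of putting it in the libmodule.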

@progval
Member

progval commented Feb 21, 2015

On 21/02/2015 17:34, Thomas Tanon wrote:

> > And it creates extra workload for module developers.

> Could you expand on it? I believe that adding some instance of triples is cleaner, because we could imagine that the module rewrites the triples it knows about and then the libmodule applies the "instance of" triples using the resource value-type and the JSON-LD @type. If you see a simpler way to use type annotations, please expand on it. I would be very happy to have something simpler than that.

Because module developers would have to implement a simplification step
that takes this intersection into account, or the module would return
something that can't be used (an intersection of a resource and an
instance-of triple).

@Tpt
Member Author

Tpt commented Feb 21, 2015

> Because module developers would have to implement a simplification step
> that takes this intersection into account, or the module would return
> something that can't be used (an intersection of a resource and an
> instance-of triple).

It's exactly why I've proposed the filter based on value-type and @type.

@yhamoudi
Member

What is the difference between type and value-type? What is the JSON serialization of typing?

@Tpt
Member Author

Tpt commented Feb 21, 2015

> What is the difference between type and value-type?

The serialization of resources specifies a type ("resource") and a value-type ("time", "string", "resource-jsonld"...). See the spec for more details

> What is the JSON serialization of typing?

The serialization of the type extension has not been specified yet.

@yhamoudi
Member

> The serialization of resources specifies a type ("resource") and a value-type ("time", "string", "resource-jsonld"...). See the spec for more details

I was not clear. I was talking about type from the data model (which is removed in this pull request) and value-type from the serialization. But after re-reading the doc, I have no more questions on this.

> That is because I don't see how you can do good enough typing everywhere without real knowledge of the semantics of each word.

I know that Watson uses thousands of types and that it's an important feature, so they probably manage to perform very accurate typing.

> And it creates extra workload for module developers.

> It's exactly why I've proposed the filter based on value-type and @type.

I'm not sure that I understand this remark (especially the part about the "filter based on value-type and @type"). You say (?) that having 2 triples instead of 1 is better because 2 different modules can try to solve them. For instance, let's consider "What president was born in Italy". A module M1 knows who was born in Italy, but not who the presidents are. A module M2 knows who the presidents are but not their birth places.

Depending on the datamodel we have:

  • 1 triple: (?:PRESIDENT, born in, Italy) -> only module M1 can answer, but it will return all the people born in Italy, since it doesn't know who is a president and who is not
  • 2 triples: (?, born in, Italy) ∩ (?, instance of, president) -> M1 gets the people born in Italy, and then M2 filters on the presidents -> more accurate answer
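
The difference between the two cases can be sketched in a few lines of Python. The data and the function names (`m1_born_in`, `m2_filter_instances`) are invented for the illustration, and the people listed are placeholders, not real facts.

```python
# Toy knowledge of each module (placeholder names, not real data).
M1_BORN_IN = {"Italy": {"Alice", "Bob", "Carla"}}
M2_INSTANCE_OF = {"president": {"Bob", "Dan"}}


def m1_born_in(place):
    """M1 answers (?, born in, place); it knows nothing about types."""
    return set(M1_BORN_IN.get(place, set()))


def m2_filter_instances(candidates, type_name):
    """M2 answers (?, instance of, type_name), restricted to candidates."""
    return candidates & M2_INSTANCE_OF.get(type_name, set())


# 1 triple: (?:PRESIDENT, born in, Italy). M1 cannot use the annotation.
one_triple_answer = m1_born_in("Italy")

# 2 triples: (?, born in, Italy) ∩ (?, instance of, president).
two_triples_answer = m2_filter_instances(m1_born_in("Italy"), "president")
```

With the single typed triple, M1 returns every person it knows was born in Italy; with the intersection, M2 narrows that set down to the presidents.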

I agree that removing types will solve this kind of thing, but I'm not sure that it's the clean way to do it. Indeed, by the same reasoning there are a lot of other parts that could be split:

  • ([a,b], c, ?) -> split into (a, c, ?) ∪ (b, c, ?)
  • (la, lb, lc) -> split into triples without lists of predicates
  • more generally, disallow at least lists and reverse predicates (since they can be obtained with unions and intersections)
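
The list splitting mentioned above could look like the following Python sketch. It assumes that a list in any position means "any of these values", so the split is a union over all combinations; `split_lists` and the tuple representation are hypothetical.

```python
from itertools import product


def as_list(value):
    """Wrap a single value in a list so that every position is a list."""
    return value if isinstance(value, list) else [value]


def split_lists(triple):
    """Split (la, lb, lc) into the list-free triples whose union is
    equivalent to the original triple."""
    subject, predicate, obj = triple
    return [
        (s, p, o)
        for s, p, o in product(as_list(subject), as_list(predicate), as_list(obj))
    ]


# ([a, b], c, ?) -> (a, c, ?) ∪ (b, c, ?)
simple = split_lists((["a", "b"], "c", "?"))

# A list of predicates is split in the same way.
paris = split_lists(("Paris", ["located in", "in", "location"], "?"))
```

The number of resulting triples is the product of the list lengths, so a fully split normal form can become much larger than the original one.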

I think we have 3 possibilities:

  • splitting everything. The data model is not changed, but every time a module changes the normal form (question parsing, wikidata, ...), the normal form is immediately processed by a "translation module" that removes the lists, reverse predicates, ... in order to split everything it can ((?:PRESIDENT, born in, Italy) -> (?, born in, Italy) ∩ (?, instance of, president)). We could even have a "simplified data model" that doesn't mention reverse predicates or normal forms with lists, even if the modules can use them (but they are removed immediately by the "translation module").
  • we do not split, but we find a way to make modules collaborate in order to solve the same Missing, for instance. In the previous example, M1 could query M2 (directly or indirectly: it could be the core that gives the same triple to different modules and processes their answers in a clever way).
  • we do not split, and we consider that modules are totally independent (is that what we currently do?). In the end, each part of the normal form has been solved by a module that didn't work with other modules. On the other hand, we could use a score to choose between the answers of different modules on the same Missing, for instance.
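
For the first possibility, the typed-Missing rewrite done by the "translation module" can be sketched as follows in Python. The representation of a typed Missing as a `("?", type_name)` pair and the `translate` function are assumptions made for the example only.

```python
def translate(triple):
    """Rewrite a triple whose subject is a typed Missing into an
    intersection of an untyped triple and an "instance of" triple.

    Assumption: a typed Missing is encoded as the pair ("?", type_name),
    and an untyped one as the plain string "?".
    """
    subject, predicate, obj = triple
    if isinstance(subject, tuple) and subject[0] == "?":
        _, type_name = subject
        return ("intersection", [
            ("?", predicate, obj),
            ("?", "instance of", type_name),
        ])
    # Nothing to split: the triple is returned unchanged.
    return triple


# (?:PRESIDENT, born in, Italy)
typed_query = (("?", "president"), "born in", "Italy")
translated = translate(typed_query)
```

Modules would then only ever see the translated form, whatever richer constructs the producing module used.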

I dislike the use of instance of for types because it looks like a "hack" to get types, instead of a clean way to do it. You say that we need to keep the data model as simple as possible, but if you use "instance of" as a way to type, you will need to explain the special role held by the predicate "instance of".

Moreover, I think that we should take into account the computation time needed to solve a question. When there are only 4-5 modules to query it's easy, but it could be more difficult if there were 100 modules. The shorter the normal form, the faster the algorithm (there is a balance to find between the accuracy of the answer and the time needed to obtain it).

4 participants