duplicated results with different descriptions #3

Open
diadem opened this issue Jul 10, 2012 · 4 comments

Comments

@diadem

diadem commented Jul 10, 2012

The query "houses in Summertown" returns the following two properties several times:

Water Eaton Road, Summertown OX2
£399,950.00
Street: Water Eaton Road, Summertown OX2

bedrooms: 2

bathrooms: 1

Divinity Road, Cowley OX4
£399,950.00
Street: Water Eaton Road, Summertown OX2

bedrooms: 2

bathrooms: 1

This could be a problem in the extracted data itself or in the visualization.

@LorenzBuehmann
Contributor

This is a problem in the extracted data, where the same URI is used for different entries. As a result, a SPARQL query returns several solutions that look like duplicates in the UI but have, for instance, different descriptions or images.
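One way to spot such shared URIs is to group the query results by URI and flag any URI bound to more than one distinct title. The sketch below is illustrative only: the row layout `(uri, title, street)`, the URI `http://example.org/property/42`, and the helper `conflicting_uris` are all hypothetical; only the titles and street come from the listings above.

```python
from collections import defaultdict

# Hypothetical rows shaped like SPARQL solutions: (uri, title, street).
# The URI is made up; the titles/street are the two listings from this issue.
rows = [
    ("http://example.org/property/42", "Water Eaton Road, Summertown OX2", "Water Eaton Road, Summertown OX2"),
    ("http://example.org/property/42", "Divinity Road, Cowley OX4", "Water Eaton Road, Summertown OX2"),
]


def conflicting_uris(rows):
    """Return a dict mapping each URI bound to more than one distinct title to its sorted titles."""
    titles_by_uri = defaultdict(set)
    for uri, title, _street in rows:
        titles_by_uri[uri].add(title)
    return {uri: sorted(titles) for uri, titles in titles_by_uri.items() if len(titles) > 1}


print(conflicting_uris(rows))
```

A URI that shows up in this result is one whose entries the UI will render as apparent duplicates with mismatched details.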

@timfu

timfu commented Jul 11, 2012

Ok, great. So if we fix the issue with shared URIs, that should go away?

I am not 100% convinced, though. For example, right now queries such as "houses in headington" report "using fallback" and then return

Horton Hill, Horton Cum Studley, OX33
The proposed development comprises the construction of a 3-storey extension to the rear of the hotel to accommodate an additional 20 bedrooms and ancillary accommodation, 4 detached houses and garages and a shop to the front of the hotel. Planning Statement: Although the houses/hotel extension can now be built in phases, a condition attached to the Planning Permission for the houses requires that the hotel extension shall be built concurrently with the houses and that the houses may not be occupied until the hotel extension is complete and rea...
£1,600,000.00

x 7

then

Land For Sale
Portland Road, Milcombe, Banbury, OX15
Situated in Portland Road, Milcombe is this residential Building Land with permission for 5 houses situated in quiet village location adjoining open...

x 6

And so on. That seems like more than the URI overlap alone could explain.

@LorenzBuehmann
Contributor

Yes, there are also duplicates in the Lucene index that is used as the fallback. I have to check why this happens.

@LorenzBuehmann
Contributor

Ok, the duplicates in the fallback Lucene index are caused by the duplicates in the extracted data. I now avoid them by indexing only one document per distinct URI, but this does lower recall.
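The trade-off can be illustrated with a small sketch of "one document per distinct URI" deduplication done before indexing. This is not the project's indexing code and does not use the Lucene API; the document dicts, the URIs, and the helper `one_document_per_uri` are hypothetical, and keeping the *first* document per URI is an assumption (the comment above does not say which one is kept).

```python
def one_document_per_uri(documents):
    """Keep the first document seen for each URI; return (kept, dropped)."""
    seen = set()
    kept, dropped = [], []
    for doc in documents:
        if doc["uri"] in seen:
            # A later entry sharing the URI is never indexed, so it can't be found.
            dropped.append(doc)
        else:
            seen.add(doc["uri"])
            kept.append(doc)
    return kept, dropped


# Hypothetical documents; the URIs are made up for illustration.
docs = [
    {"uri": "http://example.org/property/42", "title": "Water Eaton Road, Summertown OX2"},
    {"uri": "http://example.org/property/42", "title": "Divinity Road, Cowley OX4"},
    {"uri": "http://example.org/property/7", "title": "Horton Hill, Horton Cum Studley, OX33"},
]

kept, dropped = one_document_per_uri(docs)
print([d["title"] for d in kept])
print([d["title"] for d in dropped])
```

The duplicates disappear from the fallback results, but the Divinity Road entry is no longer searchable at all, which is the recall loss mentioned above. Fixing the shared URIs in the extracted data would remove the need for this trade-off.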
