Discover similarity of entries and create summaries #20

Open
2 of 15 tasks
flosse opened this issue Jun 24, 2024 · 6 comments

flosse commented Jun 24, 2024

We need a mechanism that recognizes common entries from different data sources with different data quality and combines them.

Starting situation

We assume that we have already translated data from various platforms into the FairSync protocol format.
The input is therefore an unsorted vector of Records.

1. Identifying similarities

First of all, we need to find groups of entries that are somehow similar:

type GroupOfSimilarEntries<'a> = Vec<&'a Record>;
fn find_similar_entries<'a>(records: &'a [Record]) -> Vec<GroupOfSimilarEntries<'a>>
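A minimal Python sketch of `find_similar_entries`, assuming records are plain dicts (as in the JSON examples later in this thread) and that a pairwise predicate `is_similar` decides whether two entries belong together. Groups are formed transitively with union-find, so if A matches B and B matches C, all three land in one group. The predicate and the dict representation are assumptions for illustration, not the FairSync types.

```python
from typing import Callable, Dict, List

# Assumption: a record is a plain dict with FairSync-like fields (title, lat, lng, ...).
Record = Dict[str, object]


def find_similar_entries(
    records: List[Record],
    is_similar: Callable[[Record, Record], bool],
) -> List[List[Record]]:
    """Group records transitively: any chain of `is_similar` matches ends up in one group."""
    parent = list(range(len(records)))

    def find(i: int) -> int:
        # Walk up to the root of i's set, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Naive O(n^2) pairwise comparison; fine for a sketch, too slow for large imports.
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if is_similar(records[i], records[j]):
                parent[find(i)] = find(j)

    groups: Dict[int, List[Record]] = {}
    for i, record in enumerate(records):
        groups.setdefault(find(i), []).append(record)
    # Singleton groups are kept: they are candidates for "new entries".
    return list(groups.values())
```

In practice `is_similar` would combine the per-attribute checks described in the following sections, and for large inputs a spatial index on lat/lng would replace the quadratic loop.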

Physical Distance

The most obvious identifying attribute is the coordinates (lat/lng) of an entry. However, not all entries have coordinates, which is why other properties must be included as well. So we first have to filter the entries down to those with a valid location.

  • < 10 m = the same, > 500 m = totally different
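A sketch of the location check, assuming the great-circle (haversine) distance on a spherical Earth, which is accurate enough at the 10 to 500 m scale. The linear ramp between 10 m and 500 m and the helper names are assumptions; the issue only fixes the two end points.

```python
import math
from typing import Dict

Record = Dict[str, object]
EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great-circle distance in metres between two lat/lng points on a sphere."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # min() guards against rounding pushing the argument slightly above 1
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def has_valid_location(record: Record) -> bool:
    """Filter step: both coordinates present, numeric and within range."""
    lat, lng = record.get("lat"), record.get("lng")
    if isinstance(lat, bool) or isinstance(lng, bool):
        return False
    if not isinstance(lat, (int, float)) or not isinstance(lng, (int, float)):
        return False
    return -90 <= lat <= 90 and -180 <= lng <= 180


def location_similarity(a: Record, b: Record, same_m: float = 10.0, different_m: float = 500.0) -> float:
    """1.0 up to `same_m`, 0.0 from `different_m` on, linear in between (assumed ramp)."""
    d = haversine_m(a["lat"], a["lng"], b["lat"], b["lng"])
    if d <= same_m:
        return 1.0
    if d >= different_m:
        return 0.0
    return 1.0 - (d - same_m) / (different_m - same_m)
```

With the two "Café Gustav" records from the tests further down, the distance is roughly 11 m, so the score lands just below 1.0.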

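Hamming distance of the title

With the Hamming distance (#22) we calculate how much two titles differ. Identical titles have a distance of zero; the longer a title is and the more letters differ, the higher the number. Strictly, the Hamming distance is only defined for strings of equal length, so titles of different lengths have to be padded or compared with an edit distance such as the Levenshtein distance.

  • We can express it as a percentage of the title length: 0% = identical, 100% = totally different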
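A sketch of both variants in Python. `hamming_percent` pads the shorter title (an assumption, since plain Hamming distance needs equal lengths), and `levenshtein_percent` is the edit-distance alternative, named here as a substitute rather than the approach from #22. Both normalise by the longer title, so 0 means identical and 100 means completely different.

```python
def hamming_percent(a: str, b: str) -> float:
    """Hamming distance in percent of the longer title.

    Plain Hamming distance needs equal lengths, so the shorter title is padded
    with NUL characters (an assumption); every padded position is a mismatch."""
    a, b = a.strip().lower(), b.strip().lower()
    n = max(len(a), len(b))
    if n == 0:
        return 0.0
    a, b = a.ljust(n, "\0"), b.ljust(n, "\0")
    return 100.0 * sum(x != y for x, y in zip(a, b)) / n


def levenshtein_percent(a: str, b: str) -> float:
    """Levenshtein (edit) distance in percent of the longer title."""
    a, b = a.strip().lower(), b.strip().lower()
    n = max(len(a), len(b))
    if n == 0:
        return 0.0
    prev = list(range(len(b) + 1))  # distances from "" to every prefix of b
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # deletion
                cur[j - 1] + 1,            # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if the letters match)
            ))
        prev = cur
    return 100.0 * prev[-1] / n
```

For "Heimathafen Projekt - Unverpacktladen Café" versus "Heimathafen Projekt - Unverpacktladen mit Café", the padded Hamming distance counts every character after the inserted word as a mismatch, while the Levenshtein distance only counts the four inserted characters "mit ", which is why an edit distance usually fits titles better.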

Hamming distance of the mail address

  • Only if the mail domain is the same, compare the part before the @
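A sketch of the mail check, assuming "same domain" means an exact, case-insensitive match of everything after the last @. Returning `None` signals "not comparable", so a caller can skip this attribute instead of treating it as a mismatch; that convention is an assumption, as is the padded Hamming helper.

```python
from typing import Optional


def _hamming_percent(a: str, b: str) -> float:
    """Padded Hamming distance in percent of the longer string (assumed padding)."""
    n = max(len(a), len(b))
    if n == 0:
        return 0.0
    a, b = a.ljust(n, "\0"), b.ljust(n, "\0")
    return 100.0 * sum(x != y for x, y in zip(a, b)) / n


def email_local_part_percent(a: Optional[str], b: Optional[str]) -> Optional[float]:
    """Compare the part before the @ only if both addresses share the same domain."""
    if not a or not b:
        return None
    try:
        local_a, domain_a = a.strip().lower().rsplit("@", 1)
        local_b, domain_b = b.strip().lower().rsplit("@", 1)
    except ValueError:  # one of the strings contains no @
        return None
    if domain_a != domain_b:
        return None  # different domains: not comparable
    return _hamming_percent(local_a, local_b)
```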

Hamming distance of the website

  • Only if the base URL (the host) is the same, compare the path after the /
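A sketch using `urllib.parse.urlsplit` from the standard library. It assumes that the scheme (http vs https) and a leading `www.` should be ignored when deciding whether the base URL is the same, as in the "Weltladen Dornheim" pair further down, where only the scheme differs. Like the mail check, it returns `None` when the hosts differ; the padded Hamming comparison of the paths is again an assumption.

```python
from typing import Optional
from urllib.parse import urlsplit


def _host(url: str) -> str:
    """Lower-cased host name without a leading "www." ("" if there is none)."""
    host = (urlsplit(url.strip()).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def _path(url: str) -> str:
    return urlsplit(url.strip()).path.strip("/").lower()


def website_path_percent(a: Optional[str], b: Optional[str]) -> Optional[float]:
    """Padded Hamming distance of the URL paths in percent, or None if the hosts differ."""
    if not a or not b:
        return None
    host_a = _host(a)
    if not host_a or host_a != _host(b):
        return None
    pa, pb = _path(a), _path(b)
    n = max(len(pa), len(pb))
    if n == 0:
        return 0.0  # both URLs point at the site root
    pa, pb = pa.ljust(n, "\0"), pb.ljust(n, "\0")
    return 100.0 * sum(x != y for x, y in zip(pa, pb)) / n
```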

Distance of tags

We need a multidimensional space of all our hashtags that shows how similar they are: the more frequently two tags appear together, the closer they lie to each other.

  • Check the thematic distance between the hashtags of possible duplicates. If they are the same or lie close together, regard the entries as duplicates. If they are far apart, they are not.
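One way to build this tag space is a co-occurrence model: each tag becomes a vector of how often it appears together with every other tag, and two tags are close when their vectors point in the same direction (cosine similarity). Two tag sets are then compared by averaging, in both directions, the best match for each tag. This is a sketch of that idea under these assumptions, not a tested design; with few records the vectors will be noisy.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List


def tag_vectors(tag_lists: Iterable[List[str]]) -> Dict[str, Counter]:
    """Each tag becomes a vector of co-occurrence counts (including with itself)."""
    vectors: Dict[str, Counter] = {}
    for tags in tag_lists:
        unique = set(tags)
        for t in unique:
            vec = vectors.setdefault(t, Counter())
            for u in unique:
                vec[u] += 1
    return vectors


def cosine(u: Counter, v: Counter) -> float:
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def tag_set_similarity(tags_a: List[str], tags_b: List[str], vectors: Dict[str, Counter]) -> float:
    """Average best-match cosine in both directions: 1.0 = same topics, 0.0 = unrelated."""
    a, b = set(tags_a), set(tags_b)
    if not a or not b:
        return 0.0
    empty: Counter = Counter()

    def best(x: str, ys: set) -> float:
        return max(cosine(vectors.get(x, empty), vectors.get(y, empty)) for y in ys)

    forward = sum(best(x, b) for x in a) / len(a)
    backward = sum(best(y, a) for y in b) / len(b)
    return (forward + backward) / 2
```

Tags such as "refill" and "trinkwasser", which appear together on several records in this thread, would end up close to each other in such a model.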

2. Rate quality

Then we have to decide which entry has the highest data quality:

fn rate_quality(records: &[Record]) -> &Record

The rating may vary depending on the individual criteria.
For example, a platform could always classify its own data (recognizable by the origin property) as the highest quality.

  • Put the entry with the highest quality (most attributes, most changes, latest update) first in the list for comparison
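A sketch of `rate_quality`, assuming dict records and mapping the criteria to fields that exist in the JSON examples: the number of non-empty attributes, `version` as a proxy for "most changes", and `created` as a stand-in for "latest update" (the examples carry no update timestamp). The priority order of these criteria is an assumption, and the optional `own_origin` parameter, a hypothetical name, models a platform preferring its own data via the `origin` property.

```python
from typing import Dict, List, Optional, Tuple

Record = Dict[str, object]


def _has_value(value: object) -> bool:
    """None, blank strings and empty lists/dicts count as missing."""
    if value is None:
        return False
    if isinstance(value, str):
        return value.strip() != ""
    if isinstance(value, (list, dict)):
        return len(value) > 0
    return True


def quality_key(record: Record, own_origin: Optional[str] = None) -> Tuple[bool, int, int, int]:
    """Sort key: own data first, then most attributes, most changes, newest record."""
    is_own = own_origin is not None and record.get("origin") == own_origin
    filled = sum(_has_value(v) for v in record.values())
    version = record.get("version") or 0
    created = record.get("created") or 0
    return (is_own, filled, version, created)


def rate_quality(records: List[Record], own_origin: Optional[str] = None) -> Record:
    """Return the record with the highest assumed data quality."""
    return max(records, key=lambda r: quality_key(r, own_origin))


def sort_by_quality(records: List[Record], own_origin: Optional[str] = None) -> List[Record]:
    """Best record first, to serve as the reference for the similarity step."""
    return sorted(records, key=lambda r: quality_key(r, own_origin), reverse=True)
```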

3. Rate similarity

/// The similarity to the reference
///
/// 1.0 = exactly the same
/// 0.0 = something completely different
type Similarity = f64;

fn rate_similarity<'a>(reference: &Record, records: &'a [Record]) -> Vec<(&'a Record, Similarity)>

  • Regard all entries with [big differences] as new, individual entries and list them as "New entries"
  • Regard all entries with [very high similarities] as duplicates and list them as "New duplicates to merge"
  • Regard all entries [in between] as critical cases that must be checked manually
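A sketch of `rate_similarity` plus the three-way split. It assumes a per-attribute score in [0, 1], or `None` when an attribute cannot be compared (for example missing coordinates or different mail domains), and combines the comparable ones as a weighted mean. The weights and both thresholds are placeholders, not values agreed on in this issue.

```python
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, object]
Similarity = float  # 1.0 = exactly the same, 0.0 = something completely different


def combine(scores: Dict[str, Optional[float]], weights: Dict[str, float]) -> Similarity:
    """Weighted mean over the attributes that could be compared (None = not comparable)."""
    usable = {k: s for k, s in scores.items() if s is not None and weights.get(k, 0) > 0}
    total = sum(weights[k] for k in usable)
    if total == 0:
        return 0.0
    return sum(weights[k] * s for k, s in usable.items()) / total


def rate_similarity(
    reference: Record,
    records: List[Record],
    similarity: Callable[[Record, Record], Similarity],
) -> List[Tuple[Record, Similarity]]:
    """Score every record except the reference itself against the reference."""
    return [(r, similarity(reference, r)) for r in records if r is not reference]


def classify(
    rated: List[Tuple[Record, Similarity]],
    new_below: float = 0.3,        # placeholder threshold
    duplicate_above: float = 0.9,  # placeholder threshold
) -> Dict[str, List[Record]]:
    buckets: Dict[str, List[Record]] = {
        "New entries": [],
        "New duplicates to merge": [],
        "To be checked": [],
    }
    for record, score in rated:
        if score < new_below:
            buckets["New entries"].append(record)
        elif score > duplicate_above:
            buckets["New duplicates to merge"].append(record)
        else:
            buckets["To be checked"].append(record)
    return buckets
```

The "To be checked" bucket is where the AI prompts or a manual review would take over.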

Prompt the AI to decide whether the entries are unique

Use the following prompts:

  1. Check the websites of both entries to see whether they belong to the same organisation
  2. Check the websites of both entries to see whether they refer to the same location
  3. Check the websites of both entries to see whether they have the same team or contact point

If yes, merge them into one entry.
If not, create a new entry.

4. Create summary or show differences

In the last step we create a summary or, if the similarity is too low, show the differences for a manual recheck of the result.

fn create_summary(root: &Record, others: &[(&Record, Similarity)]) -> Entry

At this point, the user may wish to define a threshold value for the level of similarity at which other entries should be included in the summary, but this should be done before calling this function
(e.g. let others: Vec<_> = records.into_iter().filter(|(_, similarity)| *similarity > my_threshold).collect();).
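A sketch of `create_summary` in Python, assuming dict records and that `others` has already been filtered by the caller's threshold (the Python counterpart of the Rust filter is `others = [(r, s) for r, s in rated if s > my_threshold]`). Empty fields of the root are filled from the most similar records first, list fields such as `tags` are merged as a union, and differing scalar values are collected as conflicts for the manual recheck. Skipping `id`, `created` and `version` is an assumption, and the sketch returns a `(summary, conflicts)` pair rather than the `Entry` type from the issue.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]
SKIP = frozenset({"id", "created", "version"})  # assumption: bookkeeping fields are not merged


def _has_value(value: object) -> bool:
    if value is None:
        return False
    if isinstance(value, str):
        return value.strip() != ""
    if isinstance(value, (list, dict)):
        return len(value) > 0
    return True


def create_summary(
    root: Record,
    others: List[Tuple[Record, float]],
    skip: frozenset = SKIP,
) -> Tuple[Record, Dict[str, List[object]]]:
    summary = dict(root)  # shallow copy, the root record itself stays untouched
    conflicts: Dict[str, List[object]] = {}
    # Most similar records first, so they win when filling empty fields.
    for record, _score in sorted(others, key=lambda pair: pair[1], reverse=True):
        for key, value in record.items():
            if key in skip or not _has_value(value):
                continue
            current = summary.get(key)
            if not _has_value(current):
                summary[key] = value  # fill a gap in the root
            elif isinstance(current, list) and isinstance(value, list):
                summary[key] = current + [v for v in value if v not in current]  # union, order kept
            elif current != value:
                conflicts.setdefault(key, []).append(value)  # show the difference for review
    return summary, conflicts
```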

Example

WeltCafé @ Stuttgart Mitte


wellemut commented Aug 8, 2024

Example created; a uniform import format has been set up.


wellemut added the import label Oct 1, 2024
This was referenced Oct 10, 2024

qknight commented Oct 11, 2024

The example used is a small shop which has 3 duplicates in the database:

I get really good results with llama3 8b using this prompt:

please check if the following json documents are duplicates based on the fields: title, description, street, contact_name, email, telephone, homepage and ignore the other fields. if a given field is empty or 'null', also ignore it! a very similar title or description is a strong indicator of a duplicate. if the other fields differ, it is probably a sign that it is not a duplicate. the zip code should only be a number while the city should only be comprised of letters and no numbers.

output: if you consider them duplicates, merge all fields from both documents and only output the json as <code>; for the merge, use data from the document with the higher version number.

if not considered a duplicate only output: "Not a duplicate".

[{"id":"16d60a7d34b64d30ad1dfc44aeb3ab18","created":1725105850,"version":1,"title":"Glas und Beutel Unverpackt ","description":"Unverpackt, Plastikfrei und Bio Einkaufen.","lat":48.6264000035014,"lng":9.336400027077831,"street":"Marktstraße 10","zip":"72622","city":"Nürtingen","country":"Deutschland","state":"Baden-Württemberg","contact_name":"Sibylle Scheuerle","email":"[email protected]","telephone":"07022493990","homepage":"https://www.glasundbeutel.de/","opening_hours":"Di, Mi, Fr: 10 - 13, - 14.30 - 18 Uhr, Do:  10  - 13, 14.30 - 19 Uhr, Sa. 10 - 13 Uhr.","founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["bio","cafe","food","gastronomie","gemüse","lebensmittel","obst"],"ratings":[],"license":"CC0-1.0","image_url":null,"image_link_url":null}]



[{"id":"a9177e600c4a4693969c636c1168860b","created":1693295484,"version":6,"title":"Glas und Beutel ","description":" Fachgeschäft für unverpackte, regionale und biologische Waren des täglichen Bedarfs","lat":48.6264000035014,"lng":9.336400027077831,"street":"Marktstrasse 10","zip":"72622","city":"72622 Nürtingen","country":"Deutschland","state":"Baden-Württemberg","contact_name":"Sibylle Scheuerle-Kraiss","email":"[email protected]","telephone":"07022 2493990","homepage":"https://www.glasundbeutel.de/","opening_hours":"Tu-Fr 10:00-18:00; Th 09:00-19:00; Sa 09:00-14:00","founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["bio","cafe","food","gastronomie","gemüse","lebensmittel","obst"],"ratings":[],"license":"CC0-1.0","image_url":"https://sp-ao.shortpixel.ai/client/to_webp,q_lossless,ret_img/https://www.glasundbeutel.de/wp-content/uploads/2020/12/glas-beutel-unverpackt-einkaufen.jpg","image_link_url":"https://www.glasundbeutel.de/"}]
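The prompt tells the model to ignore all but seven fields and any empty values. A sketch of doing that filtering in code before the request, which keeps the prompt shorter; the helper names are hypothetical. This only suits the yes/no decision: for the merge output the prompt asks for, the model still needs the full records.

```python
import json
from typing import Dict, Iterable

# The fields named in the prompt; everything else is dropped before sending.
PROMPT_FIELDS = ("title", "description", "street", "contact_name", "email", "telephone", "homepage")


def reduce_for_prompt(record: Dict[str, object], fields: Iterable[str] = PROMPT_FIELDS) -> Dict[str, object]:
    """Keep only the compared fields and drop null or blank values."""
    reduced: Dict[str, object] = {}
    for key in fields:
        value = record.get(key)
        if isinstance(value, str):
            value = value.strip()
        if value is not None and value != "":
            reduced[key] = value
    return reduced


def build_user_message(a: Dict[str, object], b: Dict[str, object]) -> str:
    """One JSON array per line, in the same shape as the examples in this thread."""
    return "\n".join(json.dumps([reduce_for_prompt(r)], ensure_ascii=False) for r in (a, b))
```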



qknight commented Oct 15, 2024

Using this python code:

# set the token in Windows PowerShell like:
# $env:TOKEN = "sk-asdfasdfasdfasdfasfd"

# LocalAI is the framework serving the OpenAI-compatible API
import requests
import json
import os

# Define the LocalAI endpoint (adjust host and port as needed)
url = "https://api.ai.rhw24.it/v1/chat/completions"

# Set up headers
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {os.environ.get('TOKEN')}"
}

role = """
from the two json documents: extract the values 'title' and 'description' and interpret their meaning to decide whether they are accidental duplicate entries on the map or are in fact distinct locations. ignore all other fields! assume that a user might have made a mistake when entering the values. assume the point of interest is in germany. do some reasoning and finally make your decision!

your expected answer:

finally ONLY output "duplicate" OR "different". DON'T OUTPUT ANYTHING ELSE LIKE REASONING, CODE, OR HOW YOU WOULD APPROACH THE PROBLEM.
"""


def query_llama(query: str) -> str:
    """
    Makes a POST request to LocalAI's OpenAI-compatible chat endpoint with the
    given query and the system instructions in `role`.

    Returns:
        The model's answer with surrounding whitespace stripped, or "fail" on an HTTP error.
    """
    data = {
        "model": "meta-llama-3.1-8b-instruct-abliterated-GGUF",  # Replace with the model you are using
        "messages": [
            {"role": "system", "content": role},  # Defines the system instructions
            {"role": "user", "content": query}  # The user's query
        ],
        "max_tokens": 1000
    }

    # The timeout keeps a stuck server from hanging the test run
    response = requests.post(url, headers=headers, data=json.dumps(data), timeout=120)

    if response.status_code == 200:
        result = response.json()
        # Only the first choice is used
        content = result['choices'][0]['message']['content'].strip()
        print(content)
        return content
    else:
        print(f"Error: {response.status_code}, {response.text}")
        return "fail"

# Only run the example when executed directly, not when imported by the tests
if __name__ == "__main__":
    query = """
[{"title":"Heimathafen Projekt - Unverpacktladen Café","description":""}]
[{"title":"Heimathafen Projekt - Unverpacktladen mit Café","description":""}]
"""
    query_llama(query)

We have this working now!
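`make_query` in the test suite compares the reply with the expected word as a plain string, so an answer such as "**Duplicate**." counts as a failure even though the model decided correctly. A small normaliser, assuming the two-word answer format of the `role` prompt, could be applied to the reply first; replies that are not exactly one of the two words map to `None` instead of being guessed. The function name is hypothetical.

```python
import re
from typing import Optional

VERDICTS = {"duplicate", "different"}


def normalize_verdict(answer: str) -> Optional[str]:
    """Reduce a model reply to "duplicate" or "different"; anything else returns None."""
    # Drop everything except letters, so quotes, markdown stars, dots and newlines vanish.
    cleaned = re.sub(r"[^a-z]", "", answer.lower())
    return cleaned if cleaned in VERDICTS else None
```

Inside `make_query`, `res = normalize_verdict(query_llama(query))` would keep the existing `res == expect` check unchanged, with `None` simply counted as a failure.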

Most of these tests work really great with that prompt:

# python -m unittest test_llama3.py
import unittest
from llama3_request import query_llama  # Import the function from llama3_request.py (the file name needs an underscore to be importable)

import json
import os
import re 

# File path for the JSON data
json_file_path = 'data.json'

# Initialize the data dictionary
data = {}

# Function to load the data from a JSON file
def load_data():
    global data
    if os.path.exists(json_file_path):
        with open(json_file_path, 'r') as file:
            data = json.load(file)
    else:
        data = {}

# Function to extract numerical part of the key
def extract_number(key):
    match = re.search(r'\d+', key)  # Finds the first sequence of digits in the key
    return int(match.group()) if match else 0  # Returns the number as an integer

# Function to save the data into a JSON file
def save_data():
    sorted_data = dict(sorted(data.items(), key=lambda x: extract_number(x[0])))  # Sorts the data numerically
    with open(json_file_path, 'w') as file:
        json.dump(sorted_data, file, indent=4)

def make_query(identifier, query, expect):
    # Ignore surrounding whitespace and case, so "Duplicate\n" still matches "duplicate"
    res = query_llama(query).strip().lower()

    # Record one success/failure counter per test method name
    if identifier not in data:
        data[identifier] = {'success': 0, 'failure': 0}

    # Update success and failure counts
    if res == expect:
        data[identifier]['success'] += 1
    else:    
        data[identifier]['failure'] += 1

    return res

class TestQueryLlama(unittest.TestCase):

    @classmethod
    def setUpClass(cls):
        print("Running setUpClass: Initialize resources")
        load_data()

    @classmethod
    def tearDownClass(cls):
        print("Running tearDownClass: Clean up resources")
        save_data()
        print(json.dumps(data, indent=4))
        
    def test_query_1(self):
        query = """
        [{"id":"97ea868904d64c888564eef83b236133","created":1622839667,"version":0,"title":"Cafe Einstein","description":"Modernes Lokal serviert bis 23 Uhr innovative Frühstückskreationen sowie schwäbische Kost, Steaks und Burger","lat":48.40281898081434,"lng":10.00250928569702,"street":"Wichernstraße","zip":"89073 ","city":"Ulm","country":null,"state":null,"contact_name":null,"email":null,"telephone":"+4973125661","homepage":"http://www.cafeeinstein.de/","opening_hours":"So-Do 8:00-23:00, Fr-Sa 8:00-0:00","founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["restaurant"],"ratings":[],"license":"CC0-1.0","image_url":null,"image_link_url":null}]
        [{"id":"a9177e600c4a4693969c636c1168860b","created":1693295484,"version":6,"title":"Glas und Beutel ","description":" Fachgeschäft für unverpackte, regionale und biologische Waren des täglichen Bedarfs","lat":48.6264000035014,"lng":9.336400027077831,"street":"Marktstrasse 10","zip":"72622","city":"72622 Nürtingen","country":"Deutschland","state":"Baden-Württemberg","contact_name":"Sibylle Scheuerle-Kraiss","email":"[email protected]","telephone":"07022 2493990","homepage":"https://www.glasundbeutel.de/","opening_hours":"Tu-Fr 10:00-18:00; Th 09:00-19:00; Sa 09:00-14:00","founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["bio","cafe","food","gastronomie","gemüse","lebensmittel","obst"],"ratings":[],"license":"CC0-1.0","image_url":"https://sp-ao.shortpixel.ai/client/to_webp,q_lossless,ret_img/https://www.glasundbeutel.de/wp-content/uploads/2020/12/glas-beutel-unverpackt-einkaufen.jpg","image_link_url":"https://www.glasundbeutel.de/"}]
        """
        expect = "different"
        result = make_query(self._testMethodName, query, expect)
        self.assertEqual(result, expect)

    def test_query_2(self):
        query = """
        [{"id":"97ea868904d64c888564eef83b236133","created":1622839667,"version":0,"title":"Cafe Einstein","description":"Modernes Lokal serviert bis 23 Uhr innovative Frühstückskreationen sowie schwäbische Kost, Steaks und Burger","lat":48.40281898081434,"lng":10.00250928569702,"street":"Wichernstraße","zip":"89073 ","city":"Ulm","country":null,"state":null,"contact_name":null,"email":null,"telephone":"+4973125661","homepage":"http://www.cafeeinstein.de/","opening_hours":"So-Do 8:00-23:00, Fr-Sa 8:00-0:00","founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["restaurant"],"ratings":[],"license":"CC0-1.0","image_url":null,"image_link_url":null}]
        [{"id":"496be01d33434b8784a717999eada2c2","created":1697543687,"version":0,"title":"Repair-Cafe des Tauschring Schmuttertal","description":"Wir reparieren und netzwerken für die REgion","lat":48.54760001345892,"lng":10.852499981807778,"street":"Schulweg 6","zip":"86405","city":"Meitingen","country":"Deutschland","state":"Bayern","contact_name":"Sandra Nentwich","email":"[email protected]","telephone":"08271 802652","homepage":null,"opening_hours":"2. Freitag / Monat, 15 - 17 Uhr (März, Juni, September, Dezember)","founded_on":null,"categories":["2cd00bebec0c48ba9db761da48678134"],"tags":["austauschen","reparatur","tauschen","upcycling","werkstatt"],"ratings":[],"license":"CC0-1.0","image_url":null,"image_link_url":null}]
        """
        expect = "different"
        result = make_query(self._testMethodName, query, expect)
        self.assertEqual(result, expect)
    def test_query_3(self):
        query = """
        [{"id":"97ea868904d64c888564eef83b236133","created":1622839667,"version":0,"title":"Cafe Einstein","description":"Modernes Lokal serviert bis 23 Uhr innovative Frühstückskreationen sowie schwäbische Kost, Steaks und Burger","lat":48.40281898081434,"lng":10.00250928569702,"street":"Wichernstraße","zip":"89073 ","city":"Ulm","country":null,"state":null,"contact_name":null,"email":null,"telephone":"+4973125661","homepage":"http://www.cafeeinstein.de/","opening_hours":"So-Do 8:00-23:00, Fr-Sa 8:00-0:00","founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["restaurant"],"ratings":[],"license":"CC0-1.0","image_url":null,"image_link_url":null}]
        [{"id":"1510e0494cd2428e80ef7778d2ae3f59","created":1650980520,"version":0,"title":"Restaurant Treibgut","description":"Bieten ein hochwertiges, 100% lebensmittelechtes und langlebiges Mehrwegsystem an. Zudem viele lokale, regionale und selbstgemachte Lebensmittel wie Honig vom eigenen Hoteldach und Saft von eigenen Streuobstwiesen in der Umgebung.","lat":48.41200001463853,"lng":10.01260000747284,"street":"Friedrichsau 50","zip":"89073","city":"Ulm","country":"Deutschland","state":"Baden-Württemberg","contact_name":null,"email":null,"telephone":null,"homepage":"https://hotel.lago-ulm.de/treibgut-restaurant-bar/","opening_hours":null,"founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["bio","cafe","gastronomie","lebensmittel","lokal","mehrweg","mehrwegsystem","nachhaltig","regional","relevo","restaurant","vegan","weltladen"],"ratings":[],"license":"CC0-1.0","image_url":"https://hotel.lago-ulm.de/wp-content/uploads/2019/03/galerie_59.jpg","image_link_url":null}]
        """
        expect = "different"
        result = make_query(self._testMethodName, query, expect)
        self.assertEqual(result, expect)   
    def test_query_4(self):
        query = """
        [{"id":"97ea868904d64c888564eef83b236133","created":1622839667,"version":0,"title":"Cafe Einstein","description":"Modernes Lokal serviert bis 23 Uhr innovative Frühstückskreationen sowie schwäbische Kost, Steaks und Burger","lat":48.40281898081434,"lng":10.00250928569702,"street":"Wichernstraße","zip":"89073 ","city":"Ulm","country":null,"state":null,"contact_name":null,"email":null,"telephone":"+4973125661","homepage":"http://www.cafeeinstein.de/","opening_hours":"So-Do 8:00-23:00, Fr-Sa 8:00-0:00","founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["restaurant"],"ratings":[],"license":"CC0-1.0","image_url":null,"image_link_url":null}]
        [{"id":"e049491d42294eb2b4aa89f48d9b4881","created":1723119462,"version":1,"title":"Werkstatthaus Stuttgart/Cafè Arg","description":"Die wunderbare Terrasse und das gemütliche Ambiente an diesem außergewöhnlich schönen Ort laden zum Entspannen ein. \n\nReparieren statt wegwerfen: in unserer offenen Werkstatt kannst Du defekte Alltags-, Elektro- und Gebrauchsgegenstände reparieren. Es gibt Werkzeuge und Unterstützung.\n","lat":48.779934844365314,"lng":9.193568625111864,"street":"Gerokstraße 7","zip":"Stuttgart","city":"Stuttgart","country":null,"state":null,"contact_name":null,"email":"[email protected]","telephone":null,"homepage":"http://werkstatthaus.com/","opening_hours":null,"founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["barrierefrei","cafe","circulareconomy","gastronomie","jetztklimachen","kreislaufwirtschaft","leitungswasser","refill","refill-station","reparatur","reparaturwerkstatt","reparieren","sharing","stuttgart","stuttgartrepariert","trinkwasser","upcycling","werkstatt","werkzeug"],"ratings":[],"license":"CC0-1.0","image_url":null,"image_link_url":null}]
        """
        expect = "different"
        result = make_query(self._testMethodName, query, expect)
        self.assertEqual(result, expect)  
    def test_query_5(self):
        query = """
        [{"id":"97ea868904d64c888564eef83b236133","created":1622839667,"version":0,"title":"Cafe Einstein","description":"Modernes Lokal serviert bis 23 Uhr innovative Frühstückskreationen sowie schwäbische Kost, Steaks und Burger","lat":48.40281898081434,"lng":10.00250928569702,"street":"Wichernstraße","zip":"89073 ","city":"Ulm","country":null,"state":null,"contact_name":null,"email":null,"telephone":"+4973125661","homepage":"http://www.cafeeinstein.de/","opening_hours":"So-Do 8:00-23:00, Fr-Sa 8:00-0:00","founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["restaurant"],"ratings":[],"license":"CC0-1.0","image_url":null,"image_link_url":null}]
        [{"id":"b16eb76e04f9490a8c812bfb82f3f0b1","created":1569922172,"version":2,"title":"Café ViS A ViS","description":"Café in der Niklastorstraße","lat":48.94105765453589,"lng":9.258245234032275,"street":"Niklastorstraße 17","zip":"71672","city":"Marbach am Neckar","country":null,"state":null,"contact_name":null,"email":null,"telephone":null,"homepage":null,"opening_hours":null,"founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["caf","cafe","leitungswasser","refill","refill-station","trinkwasser"],"ratings":[],"license":"CC0-1.0","image_url":null,"image_link_url":null}]
        """
        expect = "different"
        result = make_query(self._testMethodName, query, expect)
        self.assertEqual(result, expect) 
    def test_query_6(self):
        query = """
        [{"id":"4c584b51f49f4981b9e8146cfa5ccb5b","created":1566295940,"version":0,"title":"Café Gustav","description":"Öffnungszeiten:\nDi - Fr: 07:30 - 22:00 \nSa: 09:00 - 19:00\nSo: 09:00 - 17:00","lat":48.77117944358437,"lng":9.157502888309537,"street":"Schwabstraße 47","zip":"70197","city":"Stuttgart","country":null,"state":null,"contact_name":null,"email":null,"telephone":null,"homepage":"https://www.cafegustav.de/","opening_hours":null,"founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["leitungswasser","refill","refill-station","trinkwasser"],"ratings":[],"license":"CC0-1.0","image_url":null,"image_link_url":null}]
        [{"id":"71560cc0c7214b69a472bcf8007d0812","created":1712861101,"version":0,"title":"Café Gustav","description":"Hei, bei uns könnt ihr nach neuer kleidung stöbern und dazu einen leckeren Capucchino in unserem Café genießen. Oder wie wäre ein spritzgetränk in unserem Hinterhof ? Wir freuen uns auf euren Besuch.\n\nEuer Gustav Ream","lat":48.77109998314227,"lng":9.157600034567341,"street":"Schwabstrasse","zip":"70197","city":"Stuttgart","country":"Deutschland","state":"Baden-Württemberg","contact_name":"Sophie Lechner","email":"[email protected]","telephone":"0711 48986002","homepage":"https://www.cafegustav.de/","opening_hours":"9 bis 19 Uhr","founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["cafe","gastronomie"],"ratings":[],"license":"CC0-1.0","image_url":null,"image_link_url":null}]
        """
        expect = "duplicate"
        result = make_query(self._testMethodName, query, expect)
        self.assertEqual(result, expect) 
    def test_query_7(self):
        query = """
        [{"id":"baad219b211d45c4b578bed03dfa1641","created":1709135693,"version":3,"title":"Ora d'Oro GmbH","description":"Unverpackt-Laden","lat":48.399400002509076,"lng":9.995300010775821,"street":"Breite Gasse 6","zip":"89073","city":"Ulm","country":"Germany","state":"Baden-Württemberg","contact_name":null,"email":"[email protected]","telephone":"","homepage":"https://www.oradoro.bio/","opening_hours":"Mo, Di & Fr: 9-18 Uhr, Mi: 8-18 Uhr, Do: 10-19 Uhr, Sa: 8-14 Uhr, Mo-Fr: 13-14 Uhr Mittagspause","founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["bildung","bio","food","gemüse","lebensmittel","leitungswasser","obst","refill","refill-station","schule","trinkwasser","unverpackt","zerowaste"],"ratings":["f11e47b86fdf4678a9d319d87cc31d7f"],"license":"CC0-1.0","image_url":"https://www.google.com/maps/uv?pb=!1s0x479967df2b8a27ff%3A0xbd9205d4d6788316!3m1!7e115!4shttps%3A%2F%2Flh5.googleusercontent.com%2Fp%2FAF1QipOZWfTDYqGj1IGYfhaAMp-8HotNRrZtjY3TdzhJ%3Dw213-h160-k-no!5sunverpackt%20laden%20ulm%20-%20Google%20Suche!15sCgIgAQ&imagekey=!1e10!2sAF1QipOZWfTDYqGj1IGYfhaAMp-8HotNRrZtjY3TdzhJ&hl=de&sa=X&ved=2ahUKEwir2dSS6ajxAhXigf0HHQLtBDMQoiowEnoECEUQAw","image_link_url":null,"custom":[{"url":"https://www.oradoro.bio/","title":null,"description":"Instagram"}]}]
        [{"id":"6ee0555eb6a245b5bcd8eb3daf75c736","created":1696251178,"version":3,"title":"Ora d'Oro GmbH","description":"Ulms Unverpackt-Laden um nachhaltig und plastikfrei einkaufen zu können.","lat":48.39626978076821,"lng":9.993001776744146,"street":"Unter der Metzig 22","zip":"Ulm","city":"89073","country":null,"state":null,"contact_name":"Anthony Saad","email":"[email protected]","telephone":"0731-79083066","homepage":"https://www.oradoro.bio/","opening_hours":"Mo-Fr 10:00-18:00 Uhr/Sa 9:00-16:00 Uhr","founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["bio","regional","unverpackt","unverpackt-einkaufen","unverpacktladen"],"ratings":[],"license":"CC0-1.0","image_url":null,"image_link_url":"https://www.google.de/maps/uv?hl=de&pb=!1s0x479967df2b8a27ff%3A0xbd9205d4d6788316!3m1!7e115!4shttps%3A%2F%2Flh5.googleusercontent.com%2Fp%2FAF1QipN15kbBNLsl7xhlk4nvp8DMl6w0YeysbTSeqq9I%3Dw284-h160-k-no!5sklare%20kante%20ulm%20-%20Google-Suche!15sCgIgAQ&imagekey=!1e10!2sAF1QipN15kbBNLsl7xhlk4nvp8DMl6w0YeysbTSeqq9I&sa=X&ved=2ahUKEwjK8das5bjqAhXgURUIHVf-C1cQoiowCnoECBIQBg#"}]
        """
        expect = "duplicate"
        result = make_query(self._testMethodName, query, expect)
        self.assertEqual(result, expect) 
    def test_query_8(self): # here it detects a duplicate but it is actually a different location i guess
        query = """
        [{"id":"b8ef1f462e844a5688745443db26a155","created":1685278420,"version":0,"title":"Bio Mäck Naturkosthandel Biolieferdienst","description":"Auf ca. 70 Hektar wird Futter für die Kühe, Getreide und Kartoffeln, sowie allerlei Obst und Gemüse produziert. Bio-Lieferdienst und Abo-Kisten.","lat":48.57335600935544,"lng":10.274127037392057,"street":"Schlossshof 8","zip":"89567","city":"Sontheim/Bergenweiler","country":null,"state":null,"contact_name":null,"email":"[email protected]","telephone":"07325-6132","homepage":"https://www.biomaeck.de/","opening_hours":null,"founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["biolandwirtschaft","landkreis-heidenheim","lieferdienst","skills-for-future"],"ratings":[],"license":"CC0-1.0","image_url":null,"image_link_url":null}]
        [{"id":"7387e6f6af4c464da754b25e378b0579","created":1524466772,"version":0,"title":"Bio-Mäck","description":"Bio-Händler, legt viel Wert auf Demeter und möglichst regional. Schwerpunkt Obst und Gemüse, aber komplettes Bio-Sortiment. Lieferung an Kindergärten, Schulen, Privatkunden und Firmen","lat":48.57298431385913,"lng":10.274852574930923,"street":"Weiherstrasse 1","zip":"89567","city":"Bergenweiler","country":null,"state":null,"contact_name":null,"email":"[email protected]","telephone":"07325-6132","homepage":"https://www.biomaeck.de/","opening_hours":null,"founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["abo","bioland","demeter","firmen","heidenheim","hofladen","kiste","laden","lieferdienst","ulm","wochenmarkt"],"ratings":[],"license":"CC0-1.0","image_url":null,"image_link_url":null}]
        """
        expect = "duplicate"
        result = make_query(self._testMethodName, query, expect)
        self.assertEqual(result, expect) 
    def test_query_9(self): # here it detects a duplicate but it is actually a different location i guess
        query = """
        [{"id":"205eadd7820c47f5be995e5b4f8bd92f","created":1720433937,"version":2,"title":"Erdapfel Cafe-Bistro","description":"Cafe und Bistro angrenzend zum dazugehörigen Bio-Supermarkt","lat":48.39626269806002,"lng":9.956258195432023,"street":"Ochsengasse 41","zip":"89077 ","city":"Ulm","country":null,"state":null,"contact_name":null,"email":null,"telephone":"+4973160318320","homepage":"http://www.erdapfel-bio-bistro.de/","opening_hours":"Mo-Fr 9:00-15:00  Sa 9:00-14:00","founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["bio","bistro","cafe","restaurant"],"ratings":[],"license":"CC0-1.0","image_url":null,"image_link_url":null}]
        [{"id":"7adeec3942614cfca97eb828e6aafe12","created":1606141642,"version":0,"title":"Erdapfel Ulm","description":"Naturkost, Bio","lat":48.396250209024295,"lng":9.95546920688612,"street":"Schlösslesgasse 10","zip":"89077","city":"Ulm","country":null,"state":null,"contact_name":null,"email":null,"telephone":null,"homepage":"https://www.erdapfel-naturkost.de/","opening_hours":null,"founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["bio-lebensmittel"],"ratings":["3bc1cb2731ac45778cca59552cd8e9bf"],"license":"CC0-1.0","image_url":null,"image_link_url":null}]
        """
        expect = "duplicate"
        result = make_query(self._testMethodName, query, expect)
        self.assertEqual(result, expect)         
    def test_query_10(self):
        query = """
        [{"id":"f129e1c03be24a3c9630c9152a87d394","created":1658842027,"version":6,"title":"Weltladen Dornheim","description":"Als Fachgeschäft des Fairen Handels bieten wir ein buntes Sortiment an fair gehandelten Lebensmitteln aus kontrolliert biologischem Anbau sowie Kunsthandwerksprodukte aus dem globalen Süden und schöne Upcyclingprodukte an.","lat":49.87689882045467,"lng":8.481560567618143,"street":"Gernsheimer Landstr. 1","zip":"64521","city":"Groß-Gerau","country":"Deutschland","state":null,"contact_name":null,"email":null,"telephone":null,"homepage":"http://www.pdw-dornheim.de/","opening_hours":null,"founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["bio","einkaufen","fair","fairer-handel","fairerhandel","fairtrade","food","kaufen","klimaschutz","kreisgg","lebensmittel","nachhaltigkeit","schenken","weltladen"],"ratings":["3bc226681b8841b181b57bf4da153f3c","c6e6c28b89ab4e76b47dba4651884b3e"],"license":"CC0-1.0","image_url":"https://www.weltladen.de/site/assets/files/14222/","image_link_url":null}]
        [{"id":"e83504ebdcb34d32a153f5cf1fd7c65a","created":1621840801,"version":2,"title":"Weltladen Dornheim","description":"Fachgeschäft des Fairen Handels\nVerein Partnerschaft Dritte Welt - Dornheim 1980 e. V.","lat":49.876933689171885,"lng":8.481460403875197,"street":"Gernsheimer Landstraße 1","zip":"64521","city":"Groß-Gerau","country":null,"state":null,"contact_name":null,"email":"[email protected]","telephone":"06152/57254","homepage":"https://www.pdw-dornheim.de/","opening_hours":"donnerstags 16 - 18 Uhr, samstags 9 - 12 Uhr","founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["bildungsgerechtigkeit","bio","einewelt","entwicklungszusammenarbeit","fair","fairer-handel","fairtrade","geschenkideen","globales","kunsthandwerk","lebensmittel","umwelt","weltladen"],"ratings":[],"license":"CC0-1.0","image_url":null,"image_link_url":null}]
        """
        expect = "duplicate"
        result = make_query(self._testMethodName, query, expect)
        self.assertEqual(result, expect)  
    def test_query_11(self):
        query = """
        [{"id":"0cee5e74c51844bf8ecb13f8a27db317","created":1579710894,"version":0,"title":"Glas&Beutel","description":"Unverpacktladen","lat":48.62639598018787,"lng":9.336465741198726,"street":"Marktstraße 10","zip":"72622","city":"Nürtingen","country":null,"state":null,"contact_name":null,"email":null,"telephone":null,"homepage":"https://glasundbeutel.de/","opening_hours":null,"founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["lebensmittel","unverpackt"],"ratings":["8c8e433e7ee24e969095829d90383954","837bbc877d6d4939a56c20c7e79fb06c"],"license":"CC0-1.0","image_url":null,"image_link_url":null}]
        [{"id":"16d60a7d34b64d30ad1dfc44aeb3ab18","created":1725105850,"version":1,"title":"Glas und Beutel Unverpackt ","description":"Unverpackt, Plastikfrei und Bio Einkaufen.","lat":48.6264000035014,"lng":9.336400027077831,"street":"Marktstraße 10","zip":"72622","city":"Nürtingen","country":"Deutschland","state":"Baden-Württemberg","contact_name":"Sibylle Scheuerle","email":"[email protected]","telephone":"07022493990","homepage":"https://www.glasundbeutel.de/","opening_hours":"Di, Mi, Fr: 10 - 13, - 14.30 - 18 Uhr, Do:  10  - 13, 14.30 - 19 Uhr, Sa. 10 - 13 Uhr.","founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["bio","cafe","food","gastronomie","gemüse","lebensmittel","obst"],"ratings":[],"license":"CC0-1.0","image_url":null,"image_link_url":null}]
        """
        expect = "duplicate"
        result = make_query(self._testMethodName, query, expect)
        self.assertEqual(result, expect)  
    def test_query_12(self):
        query = """
        [{"id":"16d60a7d34b64d30ad1dfc44aeb3ab18","created":1725105850,"version":1,"title":"Glas und Beutel Unverpackt ","description":"Unverpackt, Plastikfrei und Bio Einkaufen.","lat":48.6264000035014,"lng":9.336400027077831,"street":"Marktstraße 10","zip":"72622","city":"Nürtingen","country":"Deutschland","state":"Baden-Württemberg","contact_name":"Sibylle Scheuerle","email":"[email protected]","telephone":"07022493990","homepage":"https://www.glasundbeutel.de/","opening_hours":"Di, Mi, Fr: 10 - 13, - 14.30 - 18 Uhr, Do:  10  - 13, 14.30 - 19 Uhr, Sa. 10 - 13 Uhr.","founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["bio","cafe","food","gastronomie","gemüse","lebensmittel","obst"],"ratings":[],"license":"CC0-1.0","image_url":null,"image_link_url":null}]
        [{"id":"a9177e600c4a4693969c636c1168860b","created":1693295484,"version":6,"title":"Glas und Beutel ","description":" Fachgeschäft für unverpackte, regionale und biologische Waren des täglichen Bedarfs","lat":48.6264000035014,"lng":9.336400027077831,"street":"Marktstrasse 10","zip":"72622","city":"72622 Nürtingen","country":"Deutschland","state":"Baden-Württemberg","contact_name":"Sibylle Scheuerle-Kraiss","email":"[email protected]","telephone":"07022 2493990","homepage":"https://www.glasundbeutel.de/","opening_hours":"Tu-Fr 10:00-18:00; Th 09:00-19:00; Sa 09:00-14:00","founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["bio","cafe","food","gastronomie","gemüse","lebensmittel","obst"],"ratings":[],"license":"CC0-1.0","image_url":"https://sp-ao.shortpixel.ai/client/to_webp,q_lossless,ret_img/https://www.glasundbeutel.de/wp-content/uploads/2020/12/glas-beutel-unverpackt-einkaufen.jpg","image_link_url":"https://www.glasundbeutel.de/"}]
        """
        expect = "duplicate"
        result = make_query(self._testMethodName, query, expect)
        self.assertEqual(result, expect)   
    def test_query_13(self):
        query = """
        [{"id":"4c584b51f49f4981b9e8146cfa5ccb5b","created":1566295940,"version":0,"title":"Café Gustav","description":"Öffnungszeiten:\nDi - Fr: 07:30 - 22:00 \nSa: 09:00 - 19:00\nSo: 09:00 - 17:00","lat":48.77117944358437,"lng":9.157502888309537,"street":"Schwabstraße 47","zip":"70197","city":"Stuttgart","country":null,"state":null,"contact_name":null,"email":null,"telephone":null,"homepage":"https://www.cafegustav.de/","opening_hours":null,"founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["leitungswasser","refill","refill-station","trinkwasser"],"ratings":[],"license":"CC0-1.0","image_url":null,"image_link_url":null}]
        [{"id":"a9177e600c4a4693969c636c1168860b","created":1693295484,"version":6,"title":"Glas und Beutel ","description":" Fachgeschäft für unverpackte, regionale und biologische Waren des täglichen Bedarfs","lat":48.6264000035014,"lng":9.336400027077831,"street":"Marktstrasse 10","zip":"72622","city":"72622 Nürtingen","country":"Deutschland","state":"Baden-Württemberg","contact_name":"Sibylle Scheuerle-Kraiss","email":"[email protected]","telephone":"07022 2493990","homepage":"https://www.glasundbeutel.de/","opening_hours":"Tu-Fr 10:00-18:00; Th 09:00-19:00; Sa 09:00-14:00","founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["bio","cafe","food","gastronomie","gemüse","lebensmittel","obst"],"ratings":[],"license":"CC0-1.0","image_url":"https://sp-ao.shortpixel.ai/client/to_webp,q_lossless,ret_img/https://www.glasundbeutel.de/wp-content/uploads/2020/12/glas-beutel-unverpackt-einkaufen.jpg","image_link_url":"https://www.glasundbeutel.de/"}]
        """
        expect = "different"
        result = make_query(self._testMethodName, query, expect)
        self.assertEqual(result, expect) 
    def test_query_14(self):
        query = """
        [{"id":"a4c5ad3a73714b1eb08d7c08ec16287b","created":1692179473,"version":0,"title":"Whisky & Spirituosen Manufaktur Hercynian Distilling Co. / Hammerschmiede ehem. Glen Els","description":"Das Familienunternehmen Hercynian Distilling Co. / Hammerschmiede stellt Whisky, Gin & Spiritosen hand-made unter Verwendung von regionalen Rohstoffen & Bergquellwasser her. Herzlich Willkommen: im Shop oder während einer Tour durch die Produktion. ","lat":51.629499998702435,"lng":10.633299970362941,"street":"Elsbach 11 A","zip":"37445","city":"Walkenried / Zorge","country":"Deutschland","state":"Niedersachsen","contact_name":null,"email":"[email protected]","telephone":"05586-8282","homepage":"https://www.hercynian-distilling.de/","opening_hours":"Di-Fr 10.00 h-17.00h und Sa 10.00h bis 15.00 h","founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["bio","food","gemüse","lebensmittel","obst"],"ratings":["4ab2715bb1934fc7a1728711604929a0"],"license":"CC0-1.0","image_url":"https://www.hercynian-distilling.de/","image_link_url":null,"custom":[{"url":"https://www.facebook.com/hashtag/hercyniandistillingcompany/","title":"facebook: hercyniandistilling","description":null},{"url":"https://www.instagram.com/hercyniandistilling/?hl=de","title":"Instagramm Hercynian Distilling","description":null},{"url":"https://www.instagram.com/mistress.of.distilling/reels/","title":"Instagramm Mistress of Distilling","description":null}]}]
        [{"id":"2b80abca3b224b369798e8dc1b2b85ba","created":1692178575,"version":0,"title":"Whisky & Spirituosen Manufaktur Hercynian Distilling Co. / Hammerschmiede ehem. Glen Els","description":"Das Familienunternehmen produziert Spirituosen, Gin & Whisky hand-made aus Quellwasser. Regionaler Bezug und Handwerkskunst sind uns wichtig.\nHerzlich willkommen: entweder bei uns im Shop oder während einer Führung durch die Produktion. ","lat":51.629499998702435,"lng":10.633299970362941,"street":"Elsbach 11 A","zip":"37445","city":"Walkenried / Zorge","country":"Deutschland","state":"Niedersachsen","contact_name":null,"email":"[email protected]","telephone":"05586-8282","homepage":"https://www.hercynian-distilling.de/","opening_hours":"siehe Homepage","founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["bio","food","gemüse","lebensmittel","obst"],"ratings":[],"license":"CC0-1.0","image_url":null,"image_link_url":null,"custom":[{"url":"https://www.facebook.com/hashtag/hercyniandistillingcompany/","title":"facebook: hercyniandistilling","description":null},{"url":"https://www.instagram.com/hercyniandistilling/?hl=de","title":"#hercyniandistilling","description":null},{"url":"https://www.instagram.com/mistress.of.distilling/","title":"#mistress.of.distilling","description":null}]}]
        """
        expect = "duplicate"
        result = make_query(self._testMethodName, query, expect)
        self.assertEqual(result, expect)  
    def test_query_15(self): # this should be a duplicate, but the two locations are more than 1 km apart
        query = """
        [{"id":"7f23ef061761405ab77bffb7caa1cb90","created":1714903675,"version":6,"title":"Heimathafen Projekt - Unverpacktladen Café","description":"Nachhaltiges, faires und regionales Angebot für alle ","lat":48.94666648886477,"lng":9.241581423786274,"street":"Dengelberg 1","zip":"Benningen","city":"71726","country":null,"state":null,"contact_name":"Maya und Ralf Esch","email":"[email protected]","telephone":"+49 (0) 7144 1309450","homepage":"https://www.heimathafen-projekt.de/","opening_hours":"Di-Do: 9-18 Uhr Fr: 9-19 Uhr, Do + Sa: 8-14 Uhr","founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["cafe","fair","ganzheitliche-gesundheit","gesund","hygieneprodukte","leitungswasser","nachhaltig","naturkosmetik","plastikfrei","prävention","refill","refill-station","regional","regionale-lebensmittel","reinigungsmittel","restaurant","trinkwasser","unverpackt","vegan","vegetarisch"],"ratings":["56e3d03b29734b0fb523edf7969ddbd6"],"license":"CC0-1.0","image_url":null,"image_link_url":null,"custom":[{"url":"https://www.facebook.com/sloworld.org","title":"Heimathafen Projekt","description":"#gesundshoppen #gradido"}]}]
        [{"id":"a79b810b23b6413e8da28c81042e9d69","created":1633597545,"version":5,"title":"Heimathafen Projekt - Unverpacktladen mit Café","description":"Nachhaltiges, faires und naturnahes Angebot für Alle","lat":48.95221195600564,"lng":9.226970676904065,"street":"Dengelberg 1","zip":"71726","city":"Benningen","country":null,"state":null,"contact_name":"Maya und Ralf Esch","email":"[email protected]","telephone":"07144-1309450","homepage":"https://heimathafen-projekt.de/","opening_hours":"Di-Fr: 9.00 - 18 Uhr, Do + Sa: 9.00 - 13.00 Uhr","founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["bio-regional-obst-gemüse-","biorestaurant","cafe","fair","ganzheitliche-gesundheit","gesundheit","hygieneprodukte","lebensmittel","mietpaten","mittagstisch","naturkosmetik","plastikfrei","prävention","refill","refill-station","reinigungsmittel","restaurant","trinkwasser","umweltschutz","wasserfilter"],"ratings":[],"license":"CC0-1.0","image_url":null,"image_link_url":null,"custom":[{"url":"http://www.heimathafen-projekt.de/","title":"Zur Website des heimathafen projekts","description":"#gesundshoppen"}]}]
        """
        expect = "duplicate"
        result = make_query(self._testMethodName, query, expect)
        self.assertEqual(result, expect)                                                           
    def test_query_16(self):
        query = """
        [{"id":"8ec7447311e04234ba4aac69c20dbe6f","created":1689333966,"version":0,"title":"Solawi-Hannover | Depot Stadteil Burg / Paul Dorhmann Schule","description":"\"Die Solawi Hannover ist eine sich selbst tragende Gemeinschaft. Wir setzen uns für eine nachhaltige, ökologische und wirtschaftliche Nahrungsmittelerzeugung ein, die auf Solidarität und Regionalität beruht. Dabei steht das Tun in und mit der Natur stets im Mittelpunkt und wir suchen immer nach klugen, natürlichen und pragmatischen Lösungen. Nachhaltigkeit und Gemeinschaftlichkeit sind unsere gemeinsamen Ziele auf allen Ebenen.\nIn der Solawi Hannover bringen wir zusammen, was vor langer Zeit getrennt wurde: der Anbau vereint sich mit dem Konsumenten und schafft eine resiliente Einheit.\"","lat":52.37360001652203,"lng":9.730999995829071,"street":"Burgstraße 5","zip":"30159","city":"Hannover","country":"Deutschland","state":"Niedersachsen","contact_name":null,"email":"[email protected]","telephone":"0171-8334486","homepage":"https://solawi-hannover.de/","opening_hours":null,"founded_on":null,"categories":["2cd00bebec0c48ba9db761da48678134"],"tags":["bio","bio-regional-obst-gemüse-","food","gemüse","lebensmittel","obst","regional","regionale-lebensmittel","regionale-produkte","solawi","solidarische-landwirtschaft","umweltbewussternähren","unverpackt"],"ratings":[],"license":"CC0-1.0","image_url":"https://solawi-hannover.de/wp-content/uploads/2023/01/Logo_Solawi_Hannover-1024x171.jpg","image_link_url":"https://solawi-hannover.de/"}]
        [{"id":"972182a318ac47d69e0a7c438db33197","created":1689336806,"version":2,"title":"SoLawi Gut Adolphshof - Depot Nordstadt","description":"Wir Mitlandwirt*innen, ermöglichen gemeinsam mit den Landwirten vom Gut Adolphshof eine besondere Form der regionalen ökologischen Landwirtschaft. Gemeinsam und solidarisch kümmern wir uns um die Belange des Hofes und sorgen für ein von allen getragenes, transparentes und nachhaltiges Wirtschaften.","lat":52.38699999749055,"lng":9.71839998369962,"street":"Klaus-Müller-Kilian-Weg","zip":"30167","city":"Hannover","country":"Deutschland","state":"Niedersachsen","contact_name":"Dominik Günderoth","email":"[email protected]","telephone":"05175 6308","homepage":"https://solawi-gut-adolphshof.de/","opening_hours":"Donnerstag Vormittag","founded_on":null,"categories":["2cd00bebec0c48ba9db761da48678134"],"tags":["backwaren","bio","bio-regional-obst-gemüse-","brot","eier","fleisch","food","gemüse","lebensmittel","obst","regional","regionale-lebensmittel","regionale-produkte","solawi","solidarische-landwirtschaft","umweltbewussternähren","unverpackt"],"ratings":[],"license":"CC0-1.0","image_url":"https://solawi-gut-adolphshof.de/wp-content/uploads/2022/05/cropped-hof-logo.png","image_link_url":"https://solawi-gut-adolphshof.de/","custom":[{"url":"https://www.instagram.com/solawi_gutadolphshof/?hl=de","title":null,"description":null}]}]
        """
        expect = "different"
        result = make_query(self._testMethodName, query, expect)
        self.assertEqual(result, expect)
    def test_query_17(self): # the repeated "description": null values might confuse Llama
        query = """
        [{"id":"8ec7447311e04234ba4aac69c20dbe6f","created":1689333966,"version":0,"title":"Solawi-Hannover | Depot Stadteil Burg / Paul Dorhmann Schule","description":"\"Die Solawi Hannover ist eine sich selbst tragende Gemeinschaft. Wir setzen uns für eine nachhaltige, ökologische und wirtschaftliche Nahrungsmittelerzeugung ein, die auf Solidarität und Regionalität beruht. Dabei steht das Tun in und mit der Natur stets im Mittelpunkt und wir suchen immer nach klugen, natürlichen und pragmatischen Lösungen. Nachhaltigkeit und Gemeinschaftlichkeit sind unsere gemeinsamen Ziele auf allen Ebenen.\nIn der Solawi Hannover bringen wir zusammen, was vor langer Zeit getrennt wurde: der Anbau vereint sich mit dem Konsumenten und schafft eine resiliente Einheit.\"","lat":52.37360001652203,"lng":9.730999995829071,"street":"Burgstraße 5","zip":"30159","city":"Hannover","country":"Deutschland","state":"Niedersachsen","contact_name":null,"email":"[email protected]","telephone":"0171-8334486","homepage":"https://solawi-hannover.de/","opening_hours":null,"founded_on":null,"categories":["2cd00bebec0c48ba9db761da48678134"],"tags":["bio","bio-regional-obst-gemüse-","food","gemüse","lebensmittel","obst","regional","regionale-lebensmittel","regionale-produkte","solawi","solidarische-landwirtschaft","umweltbewussternähren","unverpackt"],"ratings":[],"license":"CC0-1.0","image_url":"https://solawi-hannover.de/wp-content/uploads/2023/01/Logo_Solawi_Hannover-1024x171.jpg","image_link_url":"https://solawi-hannover.de/"}]
        [{"id":"2b80abca3b224b369798e8dc1b2b85ba","created":1692178575,"version":0,"title":"Whisky & Spirituosen Manufaktur Hercynian Distilling Co. / Hammerschmiede ehem. Glen Els","description":"Das Familienunternehmen produziert Spirituosen, Gin & Whisky hand-made aus Quellwasser. Regionaler Bezug und Handwerkskunst sind uns wichtig.\nHerzlich willkommen: entweder bei uns im Shop oder während einer Führung durch die Produktion. ","lat":51.629499998702435,"lng":10.633299970362941,"street":"Elsbach 11 A","zip":"37445","city":"Walkenried / Zorge","country":"Deutschland","state":"Niedersachsen","contact_name":null,"email":"[email protected]","telephone":"05586-8282","homepage":"https://www.hercynian-distilling.de/","opening_hours":"siehe Homepage","founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["bio","food","gemüse","lebensmittel","obst"],"ratings":[],"license":"CC0-1.0","image_url":null,"image_link_url":null,"custom":[{"url":"https://www.facebook.com/hashtag/hercyniandistillingcompany/","title":"facebook: hercyniandistilling","description":null},{"url":"https://www.instagram.com/hercyniandistilling/?hl=de","title":"#hercyniandistilling","description":null},{"url":"https://www.instagram.com/mistress.of.distilling/","title":"#mistress.of.distilling","description":null}]}]
        """
        expect = "different"
        result = make_query(self._testMethodName, query, expect)
        self.assertEqual(result, expect)       

    def test_query_18(self):
        query = """
        [{"id":"ec50974283144111bd53ab3f15a1ee76","created":1674474693,"version":0,"title":"Café Sonne","description":"Cafe der Werkstätten Esslingen Kirchheim","lat":48.74030000937185,"lng":9.3099999657413,"street":"Blarer Platz 8","zip":"73728","city":"Esslingen","country":"Deutschland","state":"Baden-Württemberg","contact_name":null,"email":null,"telephone":null,"homepage":"http://w-e-k.de/index.php?menuid=115","opening_hours":"Montag - Samstag:      9:30 - 17:30 Uhr","founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["cafe","gastronomie","inklusiv"],"ratings":[],"license":"CC0-1.0","image_url":"https://external-content.duckduckgo.com/iu/?u=https%3A%2F%2Fcdn.gastroguide.de%2Fbetrieb%2F153758%2Fgalerie%2Falbum%2Fuserphotos%2F56fe6b9e98e90_420x200.jpg&f=1&nofb=1&ipt=19c54e157c245df90d1a02ffe64d4dc9210e07fc0c2f1bac7a057954c3e6411e&ipo=images","image_link_url":null}]
        [{"id":"97ea868904d64c888564eef83b236133","created":1622839667,"version":0,"title":"Cafe Einstein","description":"Modernes Lokal serviert bis 23 Uhr innovative Frühstückskreationen sowie schwäbische Kost, Steaks und Burger","lat":48.40281898081434,"lng":10.00250928569702,"street":"Wichernstraße","zip":"89073 ","city":"Ulm","country":null,"state":null,"contact_name":null,"email":null,"telephone":"+4973125661","homepage":"http://www.cafeeinstein.de/","opening_hours":"So-Do 8:00-23:00, Fr-Sa 8:00-0:00","founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["restaurant"],"ratings":[],"license":"CC0-1.0","image_url":null,"image_link_url":null}]        
        """
        expect = "different"
        result = make_query(self._testMethodName, query, expect)
        self.assertEqual(result, expect) 
    def test_query_19(self): # the model gets confused here, even though the answer should be obvious
        query = """
        [{"id":"abeb5c86b3d84cdc8dc80abeb8a27537","created":1674474036,"version":1,"title":"Café Kauz","description":"Café mit Süßem und Selbstgemachtem","lat":48.741500004539965,"lng":9.304300020124902,"street":"Bahnhofstraße 32","zip":"73728","city":"Esslingen","country":"Deutschland","state":"Baden-Württemberg","contact_name":null,"email":null,"telephone":null,"homepage":"http://cafe-kauz.de/","opening_hours":"Mittwoch bis Freitag 09:00-17:00 Uhr, Sonn- und Feiertags 11:00-17:00 Uhr","founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["bio","cafe","fair","gastronomie","regional"],"ratings":[],"license":"CC0-1.0","image_url":"http://cafe-kauz.de/wp-content/gallery/gallerie/thumbs/thumbs_IMG_7016_DxO.jpg","image_link_url":null}]
        [{"id":"d47a7726c3064cc0adc67fb0639e08c1","created":1670334442,"version":0,"title":"Hier und Jetzt","description":"Taschen und Jacken selbst geschneidert aus Recycling-Materialien, Biowein und selbstgebrannter Schnaps, Cafe","lat":48.742899991917845,"lng":9.308200035853405,"street":"Rathausplatz 7","zip":"73728","city":"Esslingen","country":"Deutschland","state":"Baden-Württemberg","contact_name":null,"email":null,"telephone":null,"homepage":"https://www.facebook.com/hierundjetztesslingen","opening_hours":null,"founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["kleidung","taschen"],"ratings":[],"license":"CC0-1.0","image_url":null,"image_link_url":null}] 
        """
        expect = "different"
        result = make_query(self._testMethodName, query, expect)
        self.assertEqual(result, expect) 
    def test_query_20(self): # the model gets confused here as well
        query = """
        [{"id":"ec50974283144111bd53ab3f15a1ee76","created":1674474693,"version":0,"title":"Café Sonne","description":"Cafe der Werkstätten Esslingen Kirchheim","lat":48.74030000937185,"lng":9.3099999657413,"street":"Blarer Platz 8","zip":"73728","city":"Esslingen","country":"Deutschland","state":"Baden-Württemberg","contact_name":null,"email":null,"telephone":null,"homepage":"http://w-e-k.de/index.php?menuid=115","opening_hours":"Montag - Samstag:      9:30 - 17:30 Uhr","founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["cafe","gastronomie","inklusiv"],"ratings":[],"license":"CC0-1.0","image_url":"https://external-content.duckduckgo.com/iu/?u=https%3A%2F%2Fcdn.gastroguide.de%2Fbetrieb%2F153758%2Fgalerie%2Falbum%2Fuserphotos%2F56fe6b9e98e90_420x200.jpg&f=1&nofb=1&ipt=19c54e157c245df90d1a02ffe64d4dc9210e07fc0c2f1bac7a057954c3e6411e&ipo=images","image_link_url":null}]
        [{"id":"abeb5c86b3d84cdc8dc80abeb8a27537","created":1674474036,"version":1,"title":"Café Kauz","description":"Café mit Süßem und Selbstgemachtem","lat":48.741500004539965,"lng":9.304300020124902,"street":"Bahnhofstraße 32","zip":"73728","city":"Esslingen","country":"Deutschland","state":"Baden-Württemberg","contact_name":null,"email":null,"telephone":null,"homepage":"http://cafe-kauz.de/","opening_hours":"Mittwoch bis Freitag 09:00-17:00 Uhr, Sonn- und Feiertags 11:00-17:00 Uhr","founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["bio","cafe","fair","gastronomie","regional"],"ratings":[],"license":"CC0-1.0","image_url":"http://cafe-kauz.de/wp-content/gallery/gallerie/thumbs/thumbs_IMG_7016_DxO.jpg","image_link_url":null}]
        """
        expect = "different"
        result = make_query(self._testMethodName, query, expect)
        self.assertEqual(result, expect)                 
    def test_query_21(self):
        query = """
        [{"id":"8fcab61a2955488c9d30ea2069b0c954","created":1696251153,"version":2,"title":"Lin`s Unverpackt Coburg","description":"Bio, regional, zero waste","lat":50.28067403951691,"lng":10.923156743367741,"street":"Schloßberg","zip":"96450","city":"Coburg","country":"Deutschland","state":"Bayern","contact_name":null,"email":null,"telephone":null,"homepage":null,"opening_hours":null,"founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["bio","demeter","plastikfrei","regional","unverpackt","unverpacktladen","vegan","vegetarisch"],"ratings":[],"license":"CC0-1.0","image_url":null,"image_link_url":null}]
        [{"id":"1f784a9e249c436099d53533cb4a4268","created":1630508935,"version":0,"title":"Lin's Unverpackt Coburg","description":"Unverpackte Lebensmittel","lat":50.26069908415,"lng":10.965282679984943,"street":"Steinweg 10","zip":"96450","city":"Coburg","country":null,"state":null,"contact_name":null,"email":null,"telephone":"09561 7090188","homepage":"https://www.unverpackt-coburg.de/","opening_hours":null,"founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["bio-unverpackt","lebensmittel","unverpackt","unverpacktladen"],"ratings":[],"license":"CC0-1.0","image_url":null,"image_link_url":null}]
        """
        expect = "duplicate"
        result = make_query(self._testMethodName, query, expect)
        self.assertEqual(result, expect)                 
    def test_query_22(self): # the model reports a duplicate, but these are two different markets (Saturday vs. Friday) at the same address; very hard to tell
        query = """
        [{"id":"b4089de9ce2841ad999e5b1c734bcedb","created":1619610292,"version":10,"title":"Samstagsmarkt","description":"Wochenmarkt mit regionalen Erzeuger:innen. ","lat":51.325446167646646,"lng":12.330055838604483,"street":"Markranstädter Straße 8","zip":"04229","city":"Leipzig","country":"Deutschland","state":"Sachsen","contact_name":"Claudia Friedrich","email":"[email protected]","telephone":"0049 176 61264172","homepage":"https://www.samstagsmarkt.de/","opening_hours":"Sa 09:00-14:00","founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["bio-regional-obst-gemüse-","direktvermarktung","essen-zum-mitnehmen","lebensmittel","saisonal","unverpackt","wandelkarte-leipzig-2021","wochenmarkt"],"ratings":[],"license":"CC0-1.0","image_url":"https://s18.directupload.net/images/210428/wckoeftl.jpg","image_link_url":null,"custom":[{"url":"https://www.facebook.com/Samstagsmarktleipzig","title":"Facebook","description":""},{"url":"https://www.instagram.com/samstagsmarkt/","title":"Instagram","description":""}]}]
        [{"id":"a0848d2f1557427a903d2fc7e83f188d","created":1619610230,"version":1,"title":"Freitagsmarkt","description":"Als Vorgeschmack auf unseren\nFreitags Apero laden wir herzlich ein\nzu unserem Markt zum Wochenausklang, mit Essensangebot zum\nMitnehmen. Jeden Freitag 14–18 Uhr\nin der Plagwitzer Markthalle","lat":51.32547634249808,"lng":12.329943185825806,"street":"Markranstädter Straße 8","zip":"04229","city":"Leipzig","country":"Deutschland","state":"Sachsen","contact_name":"Claudia Friedrich","email":"[email protected]","telephone":"0049 176 61264172","homepage":null,"opening_hours":"Fr 14:00-18:00","founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["bio-regional-obst-gemüse","direktvermarktung","essen-zum-mitnehmen","saisonal","unverpackt","wandelkarte-leipzig-2021","wochenmarkt"],"ratings":[],"license":"CC0-1.0","image_url":"https://s12.directupload.net/images/210428/auornqd4.jpg","image_link_url":null,"custom":[{"url":"https://www.facebook.com/Freitags-Apero-106588338246431","title":"Facebook","description":""},{"url":"https://www.instagram.com/freitagsapero/","title":"Instagram","description":""}]}]
        """
        expect = "different"
        result = make_query(self._testMethodName, query, expect)
        self.assertEqual(result, expect)      
    def test_query_23(self):
        query = """
        [{"id":"624c59be3e6547f3ac5e5cd115aa71c8","created":1676210518,"version":5,"title":"Plant Age eG","description":"Von unserem Acker in Frankfurt (Oder) beliefern wir Haushalte in Berlin, Potsdam & Frankfurt (Oder) wöchentlich mit saisonalem Gemüse aus biozyklisch-veganem Anbau. Sichere Dir jetzt Deine Gemüsekiste und werde Mitglied der GemüseGenossenschaft!","lat":52.29630000996231,"lng":14.467100004883063,"street":"Müllroser Chaussee 76C","zip":"15236","city":"Frankfurt (Oder)","country":"Deutschland","state":"Brandenburg","contact_name":null,"email":"[email protected]","telephone":"+49 335 50088473 oder +49 1575 1368476","homepage":"https://www.plantage.farm/","opening_hours":null,"founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["bio","biozyklisch-veganer-anbau","food","gemüse","gemüsekiste","lebensmittel","obst","refill","solawi","unverpackt","vegan","zerowaste"],"ratings":[],"license":"CC0-1.0","image_url":null,"image_link_url":null}]
        [{"id":"84a682829bfc41d6a800213bbe38bfeb","created":1691345525,"version":2,"title":"PlantAge eG","description":"Deine Gemeinschaft für regionales Gemüse & Obst\n","lat":52.29630000996231,"lng":14.467100004883063,"street":"Müllroser Chaussee 76c","zip":"15236","city":"Frankfurt (Oder)","country":"Deutschland","state":"Brandenburg","contact_name":"Veronika Mair","email":"[email protected]","telephone":"0157 51368476","homepage":"https://www.plantage.farm/","opening_hours":"Mo-Do 9-17 Uhr","founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["bio","ernährung","food","gemüse","genossenschaft","landwirtschaft","lebensmittel","obst","unverpackt","zerowaste"],"ratings":[],"license":"CC0-1.0","image_url":null,"image_link_url":null,"custom":[{"url":"https://www.instagram.com/plantage.farm","title":"Instagram","description":null}]}]
        """
        expect = "duplicate"
        result = make_query(self._testMethodName, query, expect)
        self.assertEqual(result, expect)                         

if __name__ == '__main__':
    unittest.main()
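The comment on test_query_15 is about physical distance, which the issue proposes to score with two thresholds (<10 m = the same, >500 m = totally different). A minimal sketch of that rule, assuming a spherical Earth (haversine formula) and a linear ramp between the two thresholds; `haversine_m` and `location_similarity` are hypothetical helpers, not part of the existing test code or issue #22:

```python
import math

# Mean Earth radius in metres; the haversine formula treats the Earth as a sphere.
EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # min() guards against rounding pushing sqrt(a) slightly above 1.
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def location_similarity(distance_m: float, same_m: float = 10.0, different_m: float = 500.0) -> float:
    """1.0 at or below same_m, 0.0 at or beyond different_m, linear in between (assumed ramp)."""
    if distance_m <= same_m:
        return 1.0
    if distance_m >= different_m:
        return 0.0
    return 1.0 - (distance_m - same_m) / (different_m - same_m)


# The two "Heimathafen Projekt" records from test_query_15 (roughly 1.2 km apart).
d15 = haversine_m(48.94666648886477, 9.241581423786274, 48.95221195600564, 9.226970676904065)
# The "Glas&Beutel" / "Glas und Beutel Unverpackt" pair just before test_query_12 (about 5 m apart).
d_glas = haversine_m(48.62639598018787, 9.336465741198726, 48.6264000035014, 9.336400027077831)
print(f"{d15:.0f} m -> {location_similarity(d15)}")
print(f"{d_glas:.1f} m -> {location_similarity(d_glas)}")
```

With these thresholds the test_query_15 pair scores 0.0 even though the test expects "duplicate", so distance alone cannot decide; it has to be combined with the title, e-mail and homepage comparisons.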

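The comment on test_query_17 suspects that repeated null fields confuse the model. One cheap thing to try is stripping null and empty values from each record before the query is sent. This is a sketch only: `compact_record` and `compact_query` are hypothetical helpers (not part of `make_query`), it assumes the one-JSON-array-per-line query layout used in the tests, and it is untested whether it actually improves the answers:

```python
import json
from typing import Any

# Values treated as "no information" (assumption: empty strings and containers count too).
EMPTY_VALUES = (None, "", [], {})


def compact_record(value: Any) -> Any:
    """Recursively drop dict entries whose value is null or empty."""
    if isinstance(value, dict):
        cleaned = {k: compact_record(v) for k, v in value.items()}
        return {k: v for k, v in cleaned.items() if v not in EMPTY_VALUES}
    if isinstance(value, list):
        return [compact_record(v) for v in value]
    return value


def compact_query(query: str) -> str:
    """Compact every non-blank line of a test query (one JSON array per line)."""
    lines = (line.strip() for line in query.splitlines())
    return "\n".join(
        json.dumps(compact_record(json.loads(line)), ensure_ascii=False)
        for line in lines
        if line
    )


example = """
[{"id":"1","title":"Café Kauz","email":null,"ratings":[]}]
[{"id":"2","title":"Hier und Jetzt","custom":[{"url":"https://example.org","description":null}]}]
"""
print(compact_query(example))
```

Numbers such as 0 and booleans such as false are kept, because only the values in `EMPTY_VALUES` are treated as missing.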
I tried having the prompt return a float, in line with the discussed `Similarity` parameter (`f64`), but that did not work at all. It needs further testing, but for now this is good enough to get started!
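The tests compare the model output against the string labels "duplicate" and "different", while the discussed `Similarity` value is a number. A tolerant parser could accept either form, so a numeric score can be retried later without touching the assertions. A sketch under stated assumptions: `parse_verdict` is hypothetical (not what `make_query` does), and the 0.5 threshold is a placeholder that would need tuning:

```python
import re
from typing import Optional

VERDICTS = ("duplicate", "different")


def parse_verdict(reply: str, threshold: float = 0.5) -> Optional[str]:
    """Turn a model reply into "duplicate", "different", or None if unusable.

    Accepts either one of the two labels or a similarity score in [0, 1].
    """
    text = reply.strip().lower()
    # Exact label match only, so that "not a duplicate" is not misread as "duplicate".
    label = text.strip(" .!\"'`")
    if label in VERDICTS:
        return label
    # Otherwise take the first number in the reply as a similarity score.
    match = re.search(r"\d+(?:\.\d+)?", text)
    if match:
        score = float(match.group())
        if 0.0 <= score <= 1.0:
            return "duplicate" if score >= threshold else "different"
    return None


for reply in ["Duplicate.", "similarity: 0.87", "0.2", "not a duplicate"]:
    print(repr(reply), "->", parse_verdict(reply))
```

Returning `None` for unusable replies lets a test report "the model answered something else" separately from a wrong verdict.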

@qknight
Collaborator

qknight commented Oct 15, 2024

I've also experimented with Hamming distance and similar metrics, but they do not work well. The problem is that "Cafe Lang" vs. "Cafe Lang Kurz" is hard to distinguish: the two titles share similar words, but "Cafe" is actually a category, not part of the name.
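Hamming distance only compares strings of equal length position by position, so "Cafe Lang" vs. "Cafe Lang Kurz" has no Hamming distance at all without padding. A sketch of one alternative, assuming two changes: swap in Levenshtein (edit) distance, normalized to 0.0 = same and 1.0 = totally different like the percentage idea in the issue, and drop generic category words before comparing. `CATEGORY_WORDS` is a hypothetical, hand-picked list that would have to be curated (for example from the categories or tags), and none of these helpers come from issue #22:

```python
import unicodedata

# Hypothetical stop list of words that name a category rather than a place.
CATEGORY_WORDS = {"cafe", "unverpackt", "unverpacktladen", "laden"}


def normalize_title(title: str) -> list[str]:
    """Lowercase, strip accents (é -> e), trim punctuation and drop category words."""
    text = unicodedata.normalize("NFKD", title.lower())
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    words = (w.strip(".,;:!?'`&|/-") for w in text.split())
    return [w for w in words if w and w not in CATEGORY_WORDS]


def levenshtein(a: str, b: str) -> int:
    """Edit distance: insertions, deletions and substitutions all cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def title_difference(a: str, b: str) -> float:
    """0.0 = same title after normalization, 1.0 = completely different."""
    na, nb = " ".join(normalize_title(a)), " ".join(normalize_title(b))
    if not na and not nb:
        return 0.0
    return levenshtein(na, nb) / max(len(na), len(nb))


print(title_difference("Cafe Lang", "Cafe Lang Kurz"))
print(title_difference("Café Sonne", "Café Kauz"))
```

Dropping "cafe" moves "Café Sonne" vs. "Café Kauz" (test_query_20, expected "different") from 0.5 to 1.0. "Cafe Lang" vs. "Cafe Lang Kurz" still lands at about 0.56, though, so the title score alone cannot tell whether "Kurz" names a separate place.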

@qknight qknight closed this as completed Oct 15, 2024
@wellemut
Member

@qknight I see this issue as a concept description.

@wellemut wellemut added the epic label Oct 17, 2024
@wellemut wellemut reopened this Oct 17, 2024
Projects
Status: 👀 In review
Development

No branches or pull requests

3 participants