Discover similarity of entries and create summaries #20
The example used is a small shop that has 3 duplicate entries in the database.
I get really good results with Llama 3.1 8B and the prompt used in this Python code:
# Set the token as an environment variable first, e.g. in Windows PowerShell:
#   $env:TOKEN = "sk-asdfasdfasdfasdfasfd"
# LocalAI is the framework serving the model
import requests
import json
import os
# Define the LocalAI endpoint (adjust host and port as needed)
url = "https://api.ai.rhw24.it/v1/chat/completions"
# Set up headers
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {os.environ.get('TOKEN')}"
}
role = """
from the two json documents: extract the values 'title' and 'description' and interprete the meaning to decide weather they are accidental duplicate entries on the map or are in fact distinct locations. ignore all other fields! assume the possibility like a user might have made a misstake when entering the values. assume the point of interest is in germany. do some reasoning and finally make your decision!
your expected answer:
finally ONLY output "duplicate" OR "different". DON'T OUTPUT ANYTHING ELSE LIKE REASONING, CODE, OR HOW YOU WOULD APPROACH THE PROBLEM.
"""
def query_llama(query: str) -> str:
    """
    Makes a POST request to LocalAI's API with the given query and system instructions.

    Returns:
        The content of the first choice, or "fail" if the request fails or returns no choices.
    """
    data = {
        "model": "meta-llama-3.1-8b-instruct-abliterated-GGUF",  # Replace with the model you are using
        "messages": [
            {"role": "system", "content": role},  # Defines the system instructions
            {"role": "user", "content": query}  # The user's query
        ],
        "max_tokens": 1000
    }
    response = requests.post(url, headers=headers, data=json.dumps(data))
    if response.status_code == 200:
        result = response.json()
        for choice in result['choices']:
            print(choice['message']['content'])
            return choice['message']['content']
    else:
        print(f"Error: {response.status_code}, {response.text}")
    return "fail"
query = """
[{"title":"Heimathafen Projekt - Unverpacktladen Café","description":""}]
[{"title":"Heimathafen Projekt - Unverpacktladen mit Café","description":""}]
"""
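# Guard the demo call so that importing query_llama from another module does not send a request
if __name__ == "__main__":
    query_llama(query)

We have this working now! Most of these tests work really well with that prompt. Run the test file below with python -m unittest test_llama3.py.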
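The tests compare the raw model reply with the expected label using ==, so a reply such as "Duplicate." or one with a trailing newline is counted as a failure even when the decision is right. Below is a minimal sketch of normalizing the reply before comparing. normalize_answer is a hypothetical helper that is not part of the scripts in this thread, and it assumes that when the model leaks some reasoning, the last label it mentions is its decision.

```python
import re
from typing import Optional

# Hypothetical helper, not part of the original scripts.
# Matches the two labels the prompt allows, as whole words, in any letter case.
_LABEL = re.compile(r"\b(duplicate|different)\b", re.IGNORECASE)


def normalize_answer(raw: str) -> Optional[str]:
    """Return 'duplicate' or 'different', or None if neither word appears.

    Assumption: if several labels occur, the last one is the final decision.
    """
    labels = _LABEL.findall(raw)
    return labels[-1].lower() if labels else None


if __name__ == "__main__":
    samples = ["duplicate", " Different.\n", "Same street and phone, so: duplicate", "fail"]
    for reply in samples:
        print(repr(reply), "->", normalize_answer(reply))
```

In make_query the comparison could then read normalize_answer(res) == expect. Whether that is desirable depends on whether the tests should also check that the model sticks to the strict output format.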
import unittest
from llama3_request import query_llama  # Import the function from llama3_request.py
import json
import os
import re

# File path for the JSON data
json_file_path = 'data.json'
# Initialize the data dictionary
data = {}

# Function to load the data from a JSON file
def load_data():
    global data
    if os.path.exists(json_file_path):
        with open(json_file_path, 'r') as file:
            data = json.load(file)
    else:
        data = {}

# Function to extract numerical part of the key
def extract_number(key):
    match = re.search(r'\d+', key)  # Finds the first sequence of digits in the key
    return int(match.group()) if match else 0  # Returns the number as an integer

# Function to save the data into a JSON file
def save_data():
    sorted_data = dict(sorted(data.items(), key=lambda x: extract_number(x[0])))  # Sorts the data numerically
    with open(json_file_path, 'w') as file:
        json.dump(sorted_data, file, indent=4)

def make_query(identifier, query, expect):
    res = query_llama(query)
    # Example of modifying the data structure
    #identifier = 'task_1'
    if identifier not in data:
        data[identifier] = {'success': 0, 'failure': 0}
    # Update success and failure counts
    if res == expect:
        data[identifier]['success'] += 1
    else:
        data[identifier]['failure'] += 1
    return res

class TestQueryLlama(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        print("Running setUpClass: Initialize resources")
        load_data()

    @classmethod
    def tearDownClass(cls):
        print("Running tearDownClass: Clean up resources")
        save_data()
        print(json.dumps(data, indent=4))

    def test_query_1(self):
        query = """
[{"id":"97ea868904d64c888564eef83b236133","created":1622839667,"version":0,"title":"Cafe Einstein","description":"Modernes Lokal serviert bis 23 Uhr innovative Frühstückskreationen sowie schwäbische Kost, Steaks und Burger","lat":48.40281898081434,"lng":10.00250928569702,"street":"Wichernstraße","zip":"89073 ","city":"Ulm","country":null,"state":null,"contact_name":null,"email":null,"telephone":"+4973125661","homepage":"http://www.cafeeinstein.de/","opening_hours":"So-Do 8:00-23:00, Fr-Sa 8:00-0:00","founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["restaurant"],"ratings":[],"license":"CC0-1.0","image_url":null,"image_link_url":null}]
[{"id":"a9177e600c4a4693969c636c1168860b","created":1693295484,"version":6,"title":"Glas und Beutel ","description":" Fachgeschäft für unverpackte, regionale und biologische Waren des täglichen Bedarfs","lat":48.6264000035014,"lng":9.336400027077831,"street":"Marktstrasse 10","zip":"72622","city":"72622 Nürtingen","country":"Deutschland","state":"Baden-Württemberg","contact_name":"Sibylle Scheuerle-Kraiss","email":"[email protected]","telephone":"07022 2493990","homepage":"https://www.glasundbeutel.de/","opening_hours":"Tu-Fr 10:00-18:00; Th 09:00-19:00; Sa 09:00-14:00","founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["bio","cafe","food","gastronomie","gemüse","lebensmittel","obst"],"ratings":[],"license":"CC0-1.0","image_url":"https://sp-ao.shortpixel.ai/client/to_webp,q_lossless,ret_img/https://www.glasundbeutel.de/wp-content/uploads/2020/12/glas-beutel-unverpackt-einkaufen.jpg","image_link_url":"https://www.glasundbeutel.de/"}]
"""
        expect = "different"
        result = make_query(self._testMethodName, query, expect)
        self.assertEqual(result, expect)

    def test_query_2(self):
        query = """
[{"id":"97ea868904d64c888564eef83b236133","created":1622839667,"version":0,"title":"Cafe Einstein","description":"Modernes Lokal serviert bis 23 Uhr innovative Frühstückskreationen sowie schwäbische Kost, Steaks und Burger","lat":48.40281898081434,"lng":10.00250928569702,"street":"Wichernstraße","zip":"89073 ","city":"Ulm","country":null,"state":null,"contact_name":null,"email":null,"telephone":"+4973125661","homepage":"http://www.cafeeinstein.de/","opening_hours":"So-Do 8:00-23:00, Fr-Sa 8:00-0:00","founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["restaurant"],"ratings":[],"license":"CC0-1.0","image_url":null,"image_link_url":null}]
[{"id":"496be01d33434b8784a717999eada2c2","created":1697543687,"version":0,"title":"Repair-Cafe des Tauschring Schmuttertal","description":"Wir reparieren und netzwerken für die REgion","lat":48.54760001345892,"lng":10.852499981807778,"street":"Schulweg 6","zip":"86405","city":"Meitingen","country":"Deutschland","state":"Bayern","contact_name":"Sandra Nentwich","email":"[email protected]","telephone":"08271 802652","homepage":null,"opening_hours":"2. Freitag / Monat, 15 - 17 Uhr (März, Juni, September, Dezember)","founded_on":null,"categories":["2cd00bebec0c48ba9db761da48678134"],"tags":["austauschen","reparatur","tauschen","upcycling","werkstatt"],"ratings":[],"license":"CC0-1.0","image_url":null,"image_link_url":null}]
"""
        expect = "different"
        result = make_query(self._testMethodName, query, expect)
        self.assertEqual(result, expect)

    def test_query_3(self):
        query = """
[{"id":"97ea868904d64c888564eef83b236133","created":1622839667,"version":0,"title":"Cafe Einstein","description":"Modernes Lokal serviert bis 23 Uhr innovative Frühstückskreationen sowie schwäbische Kost, Steaks und Burger","lat":48.40281898081434,"lng":10.00250928569702,"street":"Wichernstraße","zip":"89073 ","city":"Ulm","country":null,"state":null,"contact_name":null,"email":null,"telephone":"+4973125661","homepage":"http://www.cafeeinstein.de/","opening_hours":"So-Do 8:00-23:00, Fr-Sa 8:00-0:00","founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["restaurant"],"ratings":[],"license":"CC0-1.0","image_url":null,"image_link_url":null}]
[{"id":"1510e0494cd2428e80ef7778d2ae3f59","created":1650980520,"version":0,"title":"Restaurant Treibgut","description":"Bieten ein hochwertiges, 100% lebensmittelechtes und langlebiges Mehrwegsystem an. Zudem viele lokale, regionale und selbstgemachte Lebensmittel wie Honig vom eigenen Hoteldach und Saft von eigenen Streuobstwiesen in der Umgebung.","lat":48.41200001463853,"lng":10.01260000747284,"street":"Friedrichsau 50","zip":"89073","city":"Ulm","country":"Deutschland","state":"Baden-Württemberg","contact_name":null,"email":null,"telephone":null,"homepage":"https://hotel.lago-ulm.de/treibgut-restaurant-bar/","opening_hours":null,"founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["bio","cafe","gastronomie","lebensmittel","lokal","mehrweg","mehrwegsystem","nachhaltig","regional","relevo","restaurant","vegan","weltladen"],"ratings":[],"license":"CC0-1.0","image_url":"https://hotel.lago-ulm.de/wp-content/uploads/2019/03/galerie_59.jpg","image_link_url":null}]
"""
        expect = "different"
        result = make_query(self._testMethodName, query, expect)
        self.assertEqual(result, expect)

    def test_query_4(self):
        query = """
[{"id":"97ea868904d64c888564eef83b236133","created":1622839667,"version":0,"title":"Cafe Einstein","description":"Modernes Lokal serviert bis 23 Uhr innovative Frühstückskreationen sowie schwäbische Kost, Steaks und Burger","lat":48.40281898081434,"lng":10.00250928569702,"street":"Wichernstraße","zip":"89073 ","city":"Ulm","country":null,"state":null,"contact_name":null,"email":null,"telephone":"+4973125661","homepage":"http://www.cafeeinstein.de/","opening_hours":"So-Do 8:00-23:00, Fr-Sa 8:00-0:00","founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["restaurant"],"ratings":[],"license":"CC0-1.0","image_url":null,"image_link_url":null}]
[{"id":"e049491d42294eb2b4aa89f48d9b4881","created":1723119462,"version":1,"title":"Werkstatthaus Stuttgart/Cafè Arg","description":"Die wunderbare Terrasse und das gemütliche Ambiente an diesem außergewöhnlich schönen Ort laden zum Entspannen ein. \n\nReparieren statt wegwerfen: in unserer offenen Werkstatt kannst Du defekte Alltags-, Elektro- und Gebrauchsgegenstände reparieren. Es gibt Werkzeuge und Unterstützung.\n","lat":48.779934844365314,"lng":9.193568625111864,"street":"Gerokstraße 7","zip":"Stuttgart","city":"Stuttgart","country":null,"state":null,"contact_name":null,"email":"[email protected]","telephone":null,"homepage":"http://werkstatthaus.com/","opening_hours":null,"founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["barrierefrei","cafe","circulareconomy","gastronomie","jetztklimachen","kreislaufwirtschaft","leitungswasser","refill","refill-station","reparatur","reparaturwerkstatt","reparieren","sharing","stuttgart","stuttgartrepariert","trinkwasser","upcycling","werkstatt","werkzeug"],"ratings":[],"license":"CC0-1.0","image_url":null,"image_link_url":null}]
"""
        expect = "different"
        result = make_query(self._testMethodName, query, expect)
        self.assertEqual(result, expect)

    def test_query_5(self):
        query = """
[{"id":"97ea868904d64c888564eef83b236133","created":1622839667,"version":0,"title":"Cafe Einstein","description":"Modernes Lokal serviert bis 23 Uhr innovative Frühstückskreationen sowie schwäbische Kost, Steaks und Burger","lat":48.40281898081434,"lng":10.00250928569702,"street":"Wichernstraße","zip":"89073 ","city":"Ulm","country":null,"state":null,"contact_name":null,"email":null,"telephone":"+4973125661","homepage":"http://www.cafeeinstein.de/","opening_hours":"So-Do 8:00-23:00, Fr-Sa 8:00-0:00","founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["restaurant"],"ratings":[],"license":"CC0-1.0","image_url":null,"image_link_url":null}]
[{"id":"b16eb76e04f9490a8c812bfb82f3f0b1","created":1569922172,"version":2,"title":"Café ViS A ViS","description":"Café in der Niklastorstraße","lat":48.94105765453589,"lng":9.258245234032275,"street":"Niklastorstraße 17","zip":"71672","city":"Marbach am Neckar","country":null,"state":null,"contact_name":null,"email":null,"telephone":null,"homepage":null,"opening_hours":null,"founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["caf","cafe","leitungswasser","refill","refill-station","trinkwasser"],"ratings":[],"license":"CC0-1.0","image_url":null,"image_link_url":null}]
"""
        expect = "different"
        result = make_query(self._testMethodName, query, expect)
        self.assertEqual(result, expect)

    def test_query_6(self):
        query = """
[{"id":"4c584b51f49f4981b9e8146cfa5ccb5b","created":1566295940,"version":0,"title":"Café Gustav","description":"Öffnungszeiten:\nDi - Fr: 07:30 - 22:00 \nSa: 09:00 - 19:00\nSo: 09:00 - 17:00","lat":48.77117944358437,"lng":9.157502888309537,"street":"Schwabstraße 47","zip":"70197","city":"Stuttgart","country":null,"state":null,"contact_name":null,"email":null,"telephone":null,"homepage":"https://www.cafegustav.de/","opening_hours":null,"founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["leitungswasser","refill","refill-station","trinkwasser"],"ratings":[],"license":"CC0-1.0","image_url":null,"image_link_url":null}]
[{"id":"71560cc0c7214b69a472bcf8007d0812","created":1712861101,"version":0,"title":"Café Gustav","description":"Hei, bei uns könnt ihr nach neuer kleidung stöbern und dazu einen leckeren Capucchino in unserem Café genießen. Oder wie wäre ein spritzgetränk in unserem Hinterhof ? Wir freuen uns auf euren Besuch.\n\nEuer Gustav Ream","lat":48.77109998314227,"lng":9.157600034567341,"street":"Schwabstrasse","zip":"70197","city":"Stuttgart","country":"Deutschland","state":"Baden-Württemberg","contact_name":"Sophie Lechner","email":"[email protected]","telephone":"0711 48986002","homepage":"https://www.cafegustav.de/","opening_hours":"9 bis 19 Uhr","founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["cafe","gastronomie"],"ratings":[],"license":"CC0-1.0","image_url":null,"image_link_url":null}]
"""
        expect = "duplicate"
        result = make_query(self._testMethodName, query, expect)
        self.assertEqual(result, expect)

    def test_query_7(self):
        query = """
[{"id":"baad219b211d45c4b578bed03dfa1641","created":1709135693,"version":3,"title":"Ora d'Oro GmbH","description":"Unverpackt-Laden","lat":48.399400002509076,"lng":9.995300010775821,"street":"Breite Gasse 6","zip":"89073","city":"Ulm","country":"Germany","state":"Baden-Württemberg","contact_name":null,"email":"[email protected]","telephone":"","homepage":"https://www.oradoro.bio/","opening_hours":"Mo, Di & Fr: 9-18 Uhr, Mi: 8-18 Uhr, Do: 10-19 Uhr, Sa: 8-14 Uhr, Mo-Fr: 13-14 Uhr Mittagspause","founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["bildung","bio","food","gemüse","lebensmittel","leitungswasser","obst","refill","refill-station","schule","trinkwasser","unverpackt","zerowaste"],"ratings":["f11e47b86fdf4678a9d319d87cc31d7f"],"license":"CC0-1.0","image_url":"https://www.google.com/maps/uv?pb=!1s0x479967df2b8a27ff%3A0xbd9205d4d6788316!3m1!7e115!4shttps%3A%2F%2Flh5.googleusercontent.com%2Fp%2FAF1QipOZWfTDYqGj1IGYfhaAMp-8HotNRrZtjY3TdzhJ%3Dw213-h160-k-no!5sunverpackt%20laden%20ulm%20-%20Google%20Suche!15sCgIgAQ&imagekey=!1e10!2sAF1QipOZWfTDYqGj1IGYfhaAMp-8HotNRrZtjY3TdzhJ&hl=de&sa=X&ved=2ahUKEwir2dSS6ajxAhXigf0HHQLtBDMQoiowEnoECEUQAw","image_link_url":null,"custom":[{"url":"https://www.oradoro.bio/","title":null,"description":"Instagram"}]}]
[{"id":"6ee0555eb6a245b5bcd8eb3daf75c736","created":1696251178,"version":3,"title":"Ora d'Oro GmbH","description":"Ulms Unverpackt-Laden um nachhaltig und plastikfrei einkaufen zu können.","lat":48.39626978076821,"lng":9.993001776744146,"street":"Unter der Metzig 22","zip":"Ulm","city":"89073","country":null,"state":null,"contact_name":"Anthony Saad","email":"[email protected]","telephone":"0731-79083066","homepage":"https://www.oradoro.bio/","opening_hours":"Mo-Fr 10:00-18:00 Uhr/Sa 9:00-16:00 Uhr","founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["bio","regional","unverpackt","unverpackt-einkaufen","unverpacktladen"],"ratings":[],"license":"CC0-1.0","image_url":null,"image_link_url":"https://www.google.de/maps/uv?hl=de&pb=!1s0x479967df2b8a27ff%3A0xbd9205d4d6788316!3m1!7e115!4shttps%3A%2F%2Flh5.googleusercontent.com%2Fp%2FAF1QipN15kbBNLsl7xhlk4nvp8DMl6w0YeysbTSeqq9I%3Dw284-h160-k-no!5sklare%20kante%20ulm%20-%20Google-Suche!15sCgIgAQ&imagekey=!1e10!2sAF1QipN15kbBNLsl7xhlk4nvp8DMl6w0YeysbTSeqq9I&sa=X&ved=2ahUKEwjK8das5bjqAhXgURUIHVf-C1cQoiowCnoECBIQBg#"}]
"""
        expect = "duplicate"
        result = make_query(self._testMethodName, query, expect)
        self.assertEqual(result, expect)

    def test_query_8(self):  # the model detects a duplicate here, but I guess it is actually a different location
        query = """
[{"id":"b8ef1f462e844a5688745443db26a155","created":1685278420,"version":0,"title":"Bio Mäck Naturkosthandel Biolieferdienst","description":"Auf ca. 70 Hektar wird Futter für die Kühe, Getreide und Kartoffeln, sowie allerlei Obst und Gemüse produziert. Bio-Lieferdienst und Abo-Kisten.","lat":48.57335600935544,"lng":10.274127037392057,"street":"Schlossshof 8","zip":"89567","city":"Sontheim/Bergenweiler","country":null,"state":null,"contact_name":null,"email":"[email protected]","telephone":"07325-6132","homepage":"https://www.biomaeck.de/","opening_hours":null,"founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["biolandwirtschaft","landkreis-heidenheim","lieferdienst","skills-for-future"],"ratings":[],"license":"CC0-1.0","image_url":null,"image_link_url":null}]
[{"id":"7387e6f6af4c464da754b25e378b0579","created":1524466772,"version":0,"title":"Bio-Mäck","description":"Bio-Händler, legt viel Wert auf Demeter und möglichst regional. Schwerpunkt Obst und Gemüse, aber komplettes Bio-Sortiment. Lieferung an Kindergärten, Schulen, Privatkunden und Firmen","lat":48.57298431385913,"lng":10.274852574930923,"street":"Weiherstrasse 1","zip":"89567","city":"Bergenweiler","country":null,"state":null,"contact_name":null,"email":"[email protected]","telephone":"07325-6132","homepage":"https://www.biomaeck.de/","opening_hours":null,"founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["abo","bioland","demeter","firmen","heidenheim","hofladen","kiste","laden","lieferdienst","ulm","wochenmarkt"],"ratings":[],"license":"CC0-1.0","image_url":null,"image_link_url":null}]
"""
        expect = "duplicate"
        result = make_query(self._testMethodName, query, expect)
        self.assertEqual(result, expect)

    def test_query_9(self):  # the model detects a duplicate here, but I guess it is actually a different location
        query = """
[{"id":"205eadd7820c47f5be995e5b4f8bd92f","created":1720433937,"version":2,"title":"Erdapfel Cafe-Bistro","description":"Cafe und Bistro angrenzend zum dazugehörigen Bio-Supermarkt","lat":48.39626269806002,"lng":9.956258195432023,"street":"Ochsengasse 41","zip":"89077 ","city":"Ulm","country":null,"state":null,"contact_name":null,"email":null,"telephone":"+4973160318320","homepage":"http://www.erdapfel-bio-bistro.de/","opening_hours":"Mo-Fr 9:00-15:00 Sa 9:00-14:00","founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["bio","bistro","cafe","restaurant"],"ratings":[],"license":"CC0-1.0","image_url":null,"image_link_url":null}]
[{"id":"7adeec3942614cfca97eb828e6aafe12","created":1606141642,"version":0,"title":"Erdapfel Ulm","description":"Naturkost, Bio","lat":48.396250209024295,"lng":9.95546920688612,"street":"Schlösslesgasse 10","zip":"89077","city":"Ulm","country":null,"state":null,"contact_name":null,"email":null,"telephone":null,"homepage":"https://www.erdapfel-naturkost.de/","opening_hours":null,"founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["bio-lebensmittel"],"ratings":["3bc1cb2731ac45778cca59552cd8e9bf"],"license":"CC0-1.0","image_url":null,"image_link_url":null}]
"""
        expect = "duplicate"
        result = make_query(self._testMethodName, query, expect)
        self.assertEqual(result, expect)

    def test_query_10(self):
        query = """
[{"id":"f129e1c03be24a3c9630c9152a87d394","created":1658842027,"version":6,"title":"Weltladen Dornheim","description":"Als Fachgeschäft des Fairen Handels bieten wir ein buntes Sortiment an fair gehandelten Lebensmitteln aus kontrolliert biologischem Anbau sowie Kunsthandwerksprodukte aus dem globalen Süden und schöne Upcyclingprodukte an.","lat":49.87689882045467,"lng":8.481560567618143,"street":"Gernsheimer Landstr. 1","zip":"64521","city":"Groß-Gerau","country":"Deutschland","state":null,"contact_name":null,"email":null,"telephone":null,"homepage":"http://www.pdw-dornheim.de/","opening_hours":null,"founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["bio","einkaufen","fair","fairer-handel","fairerhandel","fairtrade","food","kaufen","klimaschutz","kreisgg","lebensmittel","nachhaltigkeit","schenken","weltladen"],"ratings":["3bc226681b8841b181b57bf4da153f3c","c6e6c28b89ab4e76b47dba4651884b3e"],"license":"CC0-1.0","image_url":"https://www.weltladen.de/site/assets/files/14222/","image_link_url":null}]
[{"id":"e83504ebdcb34d32a153f5cf1fd7c65a","created":1621840801,"version":2,"title":"Weltladen Dornheim","description":"Fachgeschäft des Fairen Handels\nVerein Partnerschaft Dritte Welt - Dornheim 1980 e. V.","lat":49.876933689171885,"lng":8.481460403875197,"street":"Gernsheimer Landstraße 1","zip":"64521","city":"Groß-Gerau","country":null,"state":null,"contact_name":null,"email":"[email protected]","telephone":"06152/57254","homepage":"https://www.pdw-dornheim.de/","opening_hours":"donnerstags 16 - 18 Uhr, samstags 9 - 12 Uhr","founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["bildungsgerechtigkeit","bio","einewelt","entwicklungszusammenarbeit","fair","fairer-handel","fairtrade","geschenkideen","globales","kunsthandwerk","lebensmittel","umwelt","weltladen"],"ratings":[],"license":"CC0-1.0","image_url":null,"image_link_url":null}]
"""
        expect = "duplicate"
        result = make_query(self._testMethodName, query, expect)
        self.assertEqual(result, expect)

    def test_query_11(self):
        query = """
[{"id":"0cee5e74c51844bf8ecb13f8a27db317","created":1579710894,"version":0,"title":"Glas&Beutel","description":"Unverpacktladen","lat":48.62639598018787,"lng":9.336465741198726,"street":"Marktstraße 10","zip":"72622","city":"Nürtingen","country":null,"state":null,"contact_name":null,"email":null,"telephone":null,"homepage":"https://glasundbeutel.de/","opening_hours":null,"founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["lebensmittel","unverpackt"],"ratings":["8c8e433e7ee24e969095829d90383954","837bbc877d6d4939a56c20c7e79fb06c"],"license":"CC0-1.0","image_url":null,"image_link_url":null}]
[{"id":"16d60a7d34b64d30ad1dfc44aeb3ab18","created":1725105850,"version":1,"title":"Glas und Beutel Unverpackt ","description":"Unverpackt, Plastikfrei und Bio Einkaufen.","lat":48.6264000035014,"lng":9.336400027077831,"street":"Marktstraße 10","zip":"72622","city":"Nürtingen","country":"Deutschland","state":"Baden-Württemberg","contact_name":"Sibylle Scheuerle","email":"[email protected]","telephone":"07022493990","homepage":"https://www.glasundbeutel.de/","opening_hours":"Di, Mi, Fr: 10 - 13, - 14.30 - 18 Uhr, Do: 10 - 13, 14.30 - 19 Uhr, Sa. 10 - 13 Uhr.","founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["bio","cafe","food","gastronomie","gemüse","lebensmittel","obst"],"ratings":[],"license":"CC0-1.0","image_url":null,"image_link_url":null}]
"""
        expect = "duplicate"
        result = make_query(self._testMethodName, query, expect)
        self.assertEqual(result, expect)

    def test_query_12(self):
        query = """
[{"id":"16d60a7d34b64d30ad1dfc44aeb3ab18","created":1725105850,"version":1,"title":"Glas und Beutel Unverpackt ","description":"Unverpackt, Plastikfrei und Bio Einkaufen.","lat":48.6264000035014,"lng":9.336400027077831,"street":"Marktstraße 10","zip":"72622","city":"Nürtingen","country":"Deutschland","state":"Baden-Württemberg","contact_name":"Sibylle Scheuerle","email":"[email protected]","telephone":"07022493990","homepage":"https://www.glasundbeutel.de/","opening_hours":"Di, Mi, Fr: 10 - 13, - 14.30 - 18 Uhr, Do: 10 - 13, 14.30 - 19 Uhr, Sa. 10 - 13 Uhr.","founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["bio","cafe","food","gastronomie","gemüse","lebensmittel","obst"],"ratings":[],"license":"CC0-1.0","image_url":null,"image_link_url":null}]
[{"id":"a9177e600c4a4693969c636c1168860b","created":1693295484,"version":6,"title":"Glas und Beutel ","description":" Fachgeschäft für unverpackte, regionale und biologische Waren des täglichen Bedarfs","lat":48.6264000035014,"lng":9.336400027077831,"street":"Marktstrasse 10","zip":"72622","city":"72622 Nürtingen","country":"Deutschland","state":"Baden-Württemberg","contact_name":"Sibylle Scheuerle-Kraiss","email":"[email protected]","telephone":"07022 2493990","homepage":"https://www.glasundbeutel.de/","opening_hours":"Tu-Fr 10:00-18:00; Th 09:00-19:00; Sa 09:00-14:00","founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["bio","cafe","food","gastronomie","gemüse","lebensmittel","obst"],"ratings":[],"license":"CC0-1.0","image_url":"https://sp-ao.shortpixel.ai/client/to_webp,q_lossless,ret_img/https://www.glasundbeutel.de/wp-content/uploads/2020/12/glas-beutel-unverpackt-einkaufen.jpg","image_link_url":"https://www.glasundbeutel.de/"}]
"""
        expect = "duplicate"
        result = make_query(self._testMethodName, query, expect)
        self.assertEqual(result, expect)

    def test_query_13(self):
        query = """
[{"id":"4c584b51f49f4981b9e8146cfa5ccb5b","created":1566295940,"version":0,"title":"Café Gustav","description":"Öffnungszeiten:\nDi - Fr: 07:30 - 22:00 \nSa: 09:00 - 19:00\nSo: 09:00 - 17:00","lat":48.77117944358437,"lng":9.157502888309537,"street":"Schwabstraße 47","zip":"70197","city":"Stuttgart","country":null,"state":null,"contact_name":null,"email":null,"telephone":null,"homepage":"https://www.cafegustav.de/","opening_hours":null,"founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["leitungswasser","refill","refill-station","trinkwasser"],"ratings":[],"license":"CC0-1.0","image_url":null,"image_link_url":null}]
[{"id":"a9177e600c4a4693969c636c1168860b","created":1693295484,"version":6,"title":"Glas und Beutel ","description":" Fachgeschäft für unverpackte, regionale und biologische Waren des täglichen Bedarfs","lat":48.6264000035014,"lng":9.336400027077831,"street":"Marktstrasse 10","zip":"72622","city":"72622 Nürtingen","country":"Deutschland","state":"Baden-Württemberg","contact_name":"Sibylle Scheuerle-Kraiss","email":"[email protected]","telephone":"07022 2493990","homepage":"https://www.glasundbeutel.de/","opening_hours":"Tu-Fr 10:00-18:00; Th 09:00-19:00; Sa 09:00-14:00","founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["bio","cafe","food","gastronomie","gemüse","lebensmittel","obst"],"ratings":[],"license":"CC0-1.0","image_url":"https://sp-ao.shortpixel.ai/client/to_webp,q_lossless,ret_img/https://www.glasundbeutel.de/wp-content/uploads/2020/12/glas-beutel-unverpackt-einkaufen.jpg","image_link_url":"https://www.glasundbeutel.de/"}]
"""
        expect = "different"
        result = make_query(self._testMethodName, query, expect)
        self.assertEqual(result, expect)

    def test_query_14(self):
        query = """
[{"id":"a4c5ad3a73714b1eb08d7c08ec16287b","created":1692179473,"version":0,"title":"Whisky & Spirituosen Manufaktur Hercynian Distilling Co. / Hammerschmiede ehem. Glen Els","description":"Das Familienunternehmen Hercynian Distilling Co. / Hammerschmiede stellt Whisky, Gin & Spiritosen hand-made unter Verwendung von regionalen Rohstoffen & Bergquellwasser her. Herzlich Willkommen: im Shop oder während einer Tour durch die Produktion. ","lat":51.629499998702435,"lng":10.633299970362941,"street":"Elsbach 11 A","zip":"37445","city":"Walkenried / Zorge","country":"Deutschland","state":"Niedersachsen","contact_name":null,"email":"[email protected]","telephone":"05586-8282","homepage":"https://www.hercynian-distilling.de/","opening_hours":"Di-Fr 10.00 h-17.00h und Sa 10.00h bis 15.00 h","founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["bio","food","gemüse","lebensmittel","obst"],"ratings":["4ab2715bb1934fc7a1728711604929a0"],"license":"CC0-1.0","image_url":"https://www.hercynian-distilling.de/","image_link_url":null,"custom":[{"url":"https://www.facebook.com/hashtag/hercyniandistillingcompany/","title":"facebook: hercyniandistilling","description":null},{"url":"https://www.instagram.com/hercyniandistilling/?hl=de","title":"Instagramm Hercynian Distilling","description":null},{"url":"https://www.instagram.com/mistress.of.distilling/reels/","title":"Instagramm Mistress of Distilling","description":null}]}]
[{"id":"2b80abca3b224b369798e8dc1b2b85ba","created":1692178575,"version":0,"title":"Whisky & Spirituosen Manufaktur Hercynian Distilling Co. / Hammerschmiede ehem. Glen Els","description":"Das Familienunternehmen produziert Spirituosen, Gin & Whisky hand-made aus Quellwasser. Regionaler Bezug und Handwerkskunst sind uns wichtig.\nHerzlich willkommen: entweder bei uns im Shop oder während einer Führung durch die Produktion. ","lat":51.629499998702435,"lng":10.633299970362941,"street":"Elsbach 11 A","zip":"37445","city":"Walkenried / Zorge","country":"Deutschland","state":"Niedersachsen","contact_name":null,"email":"[email protected]","telephone":"05586-8282","homepage":"https://www.hercynian-distilling.de/","opening_hours":"siehe Homepage","founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["bio","food","gemüse","lebensmittel","obst"],"ratings":[],"license":"CC0-1.0","image_url":null,"image_link_url":null,"custom":[{"url":"https://www.facebook.com/hashtag/hercyniandistillingcompany/","title":"facebook: hercyniandistilling","description":null},{"url":"https://www.instagram.com/hercyniandistilling/?hl=de","title":"#hercyniandistilling","description":null},{"url":"https://www.instagram.com/mistress.of.distilling/","title":"#mistress.of.distilling","description":null}]}]
"""
        expect = "duplicate"
        result = make_query(self._testMethodName, query, expect)
        self.assertEqual(result, expect)

    def test_query_15(self):  # this should be a duplicate, but the two coordinates are well over 100 m apart (roughly 1.2 km)
        query = """
[{"id":"7f23ef061761405ab77bffb7caa1cb90","created":1714903675,"version":6,"title":"Heimathafen Projekt - Unverpacktladen Café","description":"Nachhaltiges, faires und regionales Angebot für alle ","lat":48.94666648886477,"lng":9.241581423786274,"street":"Dengelberg 1","zip":"Benningen","city":"71726","country":null,"state":null,"contact_name":"Maya und Ralf Esch","email":"[email protected]","telephone":"+49 (0) 7144 1309450","homepage":"https://www.heimathafen-projekt.de/","opening_hours":"Di-Do: 9-18 Uhr Fr: 9-19 Uhr, Do + Sa: 8-14 Uhr","founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["cafe","fair","ganzheitliche-gesundheit","gesund","hygieneprodukte","leitungswasser","nachhaltig","naturkosmetik","plastikfrei","prävention","refill","refill-station","regional","regionale-lebensmittel","reinigungsmittel","restaurant","trinkwasser","unverpackt","vegan","vegetarisch"],"ratings":["56e3d03b29734b0fb523edf7969ddbd6"],"license":"CC0-1.0","image_url":null,"image_link_url":null,"custom":[{"url":"https://www.facebook.com/sloworld.org","title":"Heimathafen Projekt","description":"#gesundshoppen #gradido"}]}]
[{"id":"a79b810b23b6413e8da28c81042e9d69","created":1633597545,"version":5,"title":"Heimathafen Projekt - Unverpacktladen mit Café","description":"Nachhaltiges, faires und naturnahes Angebot für Alle","lat":48.95221195600564,"lng":9.226970676904065,"street":"Dengelberg 1","zip":"71726","city":"Benningen","country":null,"state":null,"contact_name":"Maya und Ralf Esch","email":"[email protected]","telephone":"07144-1309450","homepage":"https://heimathafen-projekt.de/","opening_hours":"Di-Fr: 9.00 - 18 Uhr, Do + Sa: 9.00 - 13.00 Uhr","founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["bio-regional-obst-gemüse-","biorestaurant","cafe","fair","ganzheitliche-gesundheit","gesundheit","hygieneprodukte","lebensmittel","mietpaten","mittagstisch","naturkosmetik","plastikfrei","prävention","refill","refill-station","reinigungsmittel","restaurant","trinkwasser","umweltschutz","wasserfilter"],"ratings":[],"license":"CC0-1.0","image_url":null,"image_link_url":null,"custom":[{"url":"http://www.heimathafen-projekt.de/","title":"Zur Website des heimathafen projekts","description":"#gesundshoppen"}]}]
"""
        expect = "duplicate"
        result = make_query(self._testMethodName, query, expect)
        self.assertEqual(result, expect)

    def test_query_16(self):
        query = """
[{"id":"8ec7447311e04234ba4aac69c20dbe6f","created":1689333966,"version":0,"title":"Solawi-Hannover | Depot Stadteil Burg / Paul Dorhmann Schule","description":"\"Die Solawi Hannover ist eine sich selbst tragende Gemeinschaft. Wir setzen uns für eine nachhaltige, ökologische und wirtschaftliche Nahrungsmittelerzeugung ein, die auf Solidarität und Regionalität beruht. Dabei steht das Tun in und mit der Natur stets im Mittelpunkt und wir suchen immer nach klugen, natürlichen und pragmatischen Lösungen. Nachhaltigkeit und Gemeinschaftlichkeit sind unsere gemeinsamen Ziele auf allen Ebenen.\nIn der Solawi Hannover bringen wir zusammen, was vor langer Zeit getrennt wurde: der Anbau vereint sich mit dem Konsumenten und schafft eine resiliente Einheit.\"","lat":52.37360001652203,"lng":9.730999995829071,"street":"Burgstraße 5","zip":"30159","city":"Hannover","country":"Deutschland","state":"Niedersachsen","contact_name":null,"email":"[email protected]","telephone":"0171-8334486","homepage":"https://solawi-hannover.de/","opening_hours":null,"founded_on":null,"categories":["2cd00bebec0c48ba9db761da48678134"],"tags":["bio","bio-regional-obst-gemüse-","food","gemüse","lebensmittel","obst","regional","regionale-lebensmittel","regionale-produkte","solawi","solidarische-landwirtschaft","umweltbewussternähren","unverpackt"],"ratings":[],"license":"CC0-1.0","image_url":"https://solawi-hannover.de/wp-content/uploads/2023/01/Logo_Solawi_Hannover-1024x171.jpg","image_link_url":"https://solawi-hannover.de/"}]
[{"id":"972182a318ac47d69e0a7c438db33197","created":1689336806,"version":2,"title":"SoLawi Gut Adolphshof - Depot Nordstadt","description":"Wir Mitlandwirt*innen, ermöglichen gemeinsam mit den Landwirten vom Gut Adolphshof eine besondere Form der regionalen ökologischen Landwirtschaft. Gemeinsam und solidarisch kümmern wir uns um die Belange des Hofes und sorgen für ein von allen getragenes, transparentes und nachhaltiges Wirtschaften.","lat":52.38699999749055,"lng":9.71839998369962,"street":"Klaus-Müller-Kilian-Weg","zip":"30167","city":"Hannover","country":"Deutschland","state":"Niedersachsen","contact_name":"Dominik Günderoth","email":"[email protected]","telephone":"05175 6308","homepage":"https://solawi-gut-adolphshof.de/","opening_hours":"Donnerstag Vormittag","founded_on":null,"categories":["2cd00bebec0c48ba9db761da48678134"],"tags":["backwaren","bio","bio-regional-obst-gemüse-","brot","eier","fleisch","food","gemüse","lebensmittel","obst","regional","regionale-lebensmittel","regionale-produkte","solawi","solidarische-landwirtschaft","umweltbewussternähren","unverpackt"],"ratings":[],"license":"CC0-1.0","image_url":"https://solawi-gut-adolphshof.de/wp-content/uploads/2022/05/cropped-hof-logo.png","image_link_url":"https://solawi-gut-adolphshof.de/","custom":[{"url":"https://www.instagram.com/solawi_gutadolphshof/?hl=de","title":null,"description":null}]}]
"""
        expect = "different"
        result = make_query(self._testMethodName, query, expect)
        self.assertEqual(result, expect)

    def test_query_17(self):  # description: null multiple times might be confusing for llama
        query = """
[{"id":"8ec7447311e04234ba4aac69c20dbe6f","created":1689333966,"version":0,"title":"Solawi-Hannover | Depot Stadteil Burg / Paul Dorhmann Schule","description":"\"Die Solawi Hannover ist eine sich selbst tragende Gemeinschaft. Wir setzen uns für eine nachhaltige, ökologische und wirtschaftliche Nahrungsmittelerzeugung ein, die auf Solidarität und Regionalität beruht. Dabei steht das Tun in und mit der Natur stets im Mittelpunkt und wir suchen immer nach klugen, natürlichen und pragmatischen Lösungen. Nachhaltigkeit und Gemeinschaftlichkeit sind unsere gemeinsamen Ziele auf allen Ebenen.\nIn der Solawi Hannover bringen wir zusammen, was vor langer Zeit getrennt wurde: der Anbau vereint sich mit dem Konsumenten und schafft eine resiliente Einheit.\"","lat":52.37360001652203,"lng":9.730999995829071,"street":"Burgstraße 5","zip":"30159","city":"Hannover","country":"Deutschland","state":"Niedersachsen","contact_name":null,"email":"[email protected]","telephone":"0171-8334486","homepage":"https://solawi-hannover.de/","opening_hours":null,"founded_on":null,"categories":["2cd00bebec0c48ba9db761da48678134"],"tags":["bio","bio-regional-obst-gemüse-","food","gemüse","lebensmittel","obst","regional","regionale-lebensmittel","regionale-produkte","solawi","solidarische-landwirtschaft","umweltbewussternähren","unverpackt"],"ratings":[],"license":"CC0-1.0","image_url":"https://solawi-hannover.de/wp-content/uploads/2023/01/Logo_Solawi_Hannover-1024x171.jpg","image_link_url":"https://solawi-hannover.de/"}]
[{"id":"2b80abca3b224b369798e8dc1b2b85ba","created":1692178575,"version":0,"title":"Whisky & Spirituosen Manufaktur Hercynian Distilling Co. / Hammerschmiede ehem. Glen Els","description":"Das Familienunternehmen produziert Spirituosen, Gin & Whisky hand-made aus Quellwasser. Regionaler Bezug und Handwerkskunst sind uns wichtig.\nHerzlich willkommen: entweder bei uns im Shop oder während einer Führung durch die Produktion. ","lat":51.629499998702435,"lng":10.633299970362941,"street":"Elsbach 11 A","zip":"37445","city":"Walkenried / Zorge","country":"Deutschland","state":"Niedersachsen","contact_name":null,"email":"[email protected]","telephone":"05586-8282","homepage":"https://www.hercynian-distilling.de/","opening_hours":"siehe Homepage","founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["bio","food","gemüse","lebensmittel","obst"],"ratings":[],"license":"CC0-1.0","image_url":null,"image_link_url":null,"custom":[{"url":"https://www.facebook.com/hashtag/hercyniandistillingcompany/","title":"facebook: hercyniandistilling","description":null},{"url":"https://www.instagram.com/hercyniandistilling/?hl=de","title":"#hercyniandistilling","description":null},{"url":"https://www.instagram.com/mistress.of.distilling/","title":"#mistress.of.distilling","description":null}]}]
"""
        expect = "different"
        result = make_query(self._testMethodName, query, expect)
        self.assertEqual(result, expect)

    def test_query_18(self):
        query = """
[{"id":"ec50974283144111bd53ab3f15a1ee76","created":1674474693,"version":0,"title":"Café Sonne","description":"Cafe der Werkstätten Esslingen Kirchheim","lat":48.74030000937185,"lng":9.3099999657413,"street":"Blarer Platz 8","zip":"73728","city":"Esslingen","country":"Deutschland","state":"Baden-Württemberg","contact_name":null,"email":null,"telephone":null,"homepage":"http://w-e-k.de/index.php?menuid=115","opening_hours":"Montag - Samstag: 9:30 - 17:30 Uhr","founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["cafe","gastronomie","inklusiv"],"ratings":[],"license":"CC0-1.0","image_url":"https://external-content.duckduckgo.com/iu/?u=https%3A%2F%2Fcdn.gastroguide.de%2Fbetrieb%2F153758%2Fgalerie%2Falbum%2Fuserphotos%2F56fe6b9e98e90_420x200.jpg&f=1&nofb=1&ipt=19c54e157c245df90d1a02ffe64d4dc9210e07fc0c2f1bac7a057954c3e6411e&ipo=images","image_link_url":null}]
[{"id":"97ea868904d64c888564eef83b236133","created":1622839667,"version":0,"title":"Cafe Einstein","description":"Modernes Lokal serviert bis 23 Uhr innovative Frühstückskreationen sowie schwäbische Kost, Steaks und Burger","lat":48.40281898081434,"lng":10.00250928569702,"street":"Wichernstraße","zip":"89073 ","city":"Ulm","country":null,"state":null,"contact_name":null,"email":null,"telephone":"+4973125661","homepage":"http://www.cafeeinstein.de/","opening_hours":"So-Do 8:00-23:00, Fr-Sa 8:00-0:00","founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["restaurant"],"ratings":[],"license":"CC0-1.0","image_url":null,"image_link_url":null}]
"""
        expect = "different"
        result = make_query(self._testMethodName, query, expect)
        self.assertEqual(result, expect)

    def test_query_19(self):  # model is confused, this one is bad as it should be obvious!
        query = """
[{"id":"abeb5c86b3d84cdc8dc80abeb8a27537","created":1674474036,"version":1,"title":"Café Kauz","description":"Café mit Süßem und Selbstgemachtem","lat":48.741500004539965,"lng":9.304300020124902,"street":"Bahnhofstraße 32","zip":"73728","city":"Esslingen","country":"Deutschland","state":"Baden-Württemberg","contact_name":null,"email":null,"telephone":null,"homepage":"http://cafe-kauz.de/","opening_hours":"Mittwoch bis Freitag 09:00-17:00 Uhr, Sonn- und Feiertags 11:00-17:00 Uhr","founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["bio","cafe","fair","gastronomie","regional"],"ratings":[],"license":"CC0-1.0","image_url":"http://cafe-kauz.de/wp-content/gallery/gallerie/thumbs/thumbs_IMG_7016_DxO.jpg","image_link_url":null}]
[{"id":"d47a7726c3064cc0adc67fb0639e08c1","created":1670334442,"version":0,"title":"Hier und Jetzt","description":"Taschen und Jacken selbst geschneidert aus Recycling-Materialien, Biowein und selbstgebrannter Schnaps, Cafe","lat":48.742899991917845,"lng":9.308200035853405,"street":"Rathausplatz 7","zip":"73728","city":"Esslingen","country":"Deutschland","state":"Baden-Württemberg","contact_name":null,"email":null,"telephone":null,"homepage":"https://www.facebook.com/hierundjetztesslingen","opening_hours":null,"founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["kleidung","taschen"],"ratings":[],"license":"CC0-1.0","image_url":null,"image_link_url":null}]
"""
        expect = "different"
        result = make_query(self._testMethodName, query, expect)
        self.assertEqual(result, expect)

    def test_query_20(self):  # model is confused
        query = """
[{"id":"ec50974283144111bd53ab3f15a1ee76","created":1674474693,"version":0,"title":"Café Sonne","description":"Cafe der Werkstätten Esslingen Kirchheim","lat":48.74030000937185,"lng":9.3099999657413,"street":"Blarer Platz 8","zip":"73728","city":"Esslingen","country":"Deutschland","state":"Baden-Württemberg","contact_name":null,"email":null,"telephone":null,"homepage":"http://w-e-k.de/index.php?menuid=115","opening_hours":"Montag - Samstag: 9:30 - 17:30 Uhr","founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["cafe","gastronomie","inklusiv"],"ratings":[],"license":"CC0-1.0","image_url":"https://external-content.duckduckgo.com/iu/?u=https%3A%2F%2Fcdn.gastroguide.de%2Fbetrieb%2F153758%2Fgalerie%2Falbum%2Fuserphotos%2F56fe6b9e98e90_420x200.jpg&f=1&nofb=1&ipt=19c54e157c245df90d1a02ffe64d4dc9210e07fc0c2f1bac7a057954c3e6411e&ipo=images","image_link_url":null}]
[{"id":"abeb5c86b3d84cdc8dc80abeb8a27537","created":1674474036,"version":1,"title":"Café Kauz","description":"Café mit Süßem und Selbstgemachtem","lat":48.741500004539965,"lng":9.304300020124902,"street":"Bahnhofstraße 32","zip":"73728","city":"Esslingen","country":"Deutschland","state":"Baden-Württemberg","contact_name":null,"email":null,"telephone":null,"homepage":"http://cafe-kauz.de/","opening_hours":"Mittwoch bis Freitag 09:00-17:00 Uhr, Sonn- und Feiertags 11:00-17:00 Uhr","founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["bio","cafe","fair","gastronomie","regional"],"ratings":[],"license":"CC0-1.0","image_url":"http://cafe-kauz.de/wp-content/gallery/gallerie/thumbs/thumbs_IMG_7016_DxO.jpg","image_link_url":null}]
"""
        expect = "different"
        result = make_query(self._testMethodName, query, expect)
        self.assertEqual(result, expect)

    def test_query_21(self):
        query = """
[{"id":"8fcab61a2955488c9d30ea2069b0c954","created":1696251153,"version":2,"title":"Lin`s Unverpackt Coburg","description":"Bio, regional, zero waste","lat":50.28067403951691,"lng":10.923156743367741,"street":"Schloßberg","zip":"96450","city":"Coburg","country":"Deutschland","state":"Bayern","contact_name":null,"email":null,"telephone":null,"homepage":null,"opening_hours":null,"founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["bio","demeter","plastikfrei","regional","unverpackt","unverpacktladen","vegan","vegetarisch"],"ratings":[],"license":"CC0-1.0","image_url":null,"image_link_url":null}]
[{"id":"1f784a9e249c436099d53533cb4a4268","created":1630508935,"version":0,"title":"Lin's Unverpackt Coburg","description":"Unverpackte Lebensmittel","lat":50.26069908415,"lng":10.965282679984943,"street":"Steinweg 10","zip":"96450","city":"Coburg","country":null,"state":null,"contact_name":null,"email":null,"telephone":"09561 7090188","homepage":"https://www.unverpackt-coburg.de/","opening_hours":null,"founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["bio-unverpackt","lebensmittel","unverpackt","unverpacktladen"],"ratings":[],"license":"CC0-1.0","image_url":null,"image_link_url":null}]
"""
        expect = "duplicate"
        result = make_query(self._testMethodName, query, expect)
        self.assertEqual(result, expect)

    def test_query_22(self):  # here it detects a duplicate but it is actually a different location, very hard to know
        query = """
[{"id":"b4089de9ce2841ad999e5b1c734bcedb","created":1619610292,"version":10,"title":"Samstagsmarkt","description":"Wochenmarkt mit regionalen Erzeuger:innen. ","lat":51.325446167646646,"lng":12.330055838604483,"street":"Markranstädter Straße 8","zip":"04229","city":"Leipzig","country":"Deutschland","state":"Sachsen","contact_name":"Claudia Friedrich","email":"[email protected]","telephone":"0049 176 61264172","homepage":"https://www.samstagsmarkt.de/","opening_hours":"Sa 09:00-14:00","founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["bio-regional-obst-gemüse-","direktvermarktung","essen-zum-mitnehmen","lebensmittel","saisonal","unverpackt","wandelkarte-leipzig-2021","wochenmarkt"],"ratings":[],"license":"CC0-1.0","image_url":"https://s18.directupload.net/images/210428/wckoeftl.jpg","image_link_url":null,"custom":[{"url":"https://www.facebook.com/Samstagsmarktleipzig","title":"Facebook","description":""},{"url":"https://www.instagram.com/samstagsmarkt/","title":"Instagram","description":""}]}]
[{"id":"a0848d2f1557427a903d2fc7e83f188d","created":1619610230,"version":1,"title":"Freitagsmarkt","description":"Als Vorgeschmack auf unseren\nFreitags Apero laden wir herzlich ein\nzu unserem Markt zum Wochenausklang, mit Essensangebot zum\nMitnehmen. Jeden Freitag 14–18 Uhr\nin der Plagwitzer Markthalle","lat":51.32547634249808,"lng":12.329943185825806,"street":"Markranstädter Straße 8","zip":"04229","city":"Leipzig","country":"Deutschland","state":"Sachsen","contact_name":"Claudia Friedrich","email":"[email protected]","telephone":"0049 176 61264172","homepage":null,"opening_hours":"Fr 14:00-18:00","founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["bio-regional-obst-gemüse","direktvermarktung","essen-zum-mitnehmen","saisonal","unverpackt","wandelkarte-leipzig-2021","wochenmarkt"],"ratings":[],"license":"CC0-1.0","image_url":"https://s12.directupload.net/images/210428/auornqd4.jpg","image_link_url":null,"custom":[{"url":"https://www.facebook.com/Freitags-Apero-106588338246431","title":"Facebook","description":""},{"url":"https://www.instagram.com/freitagsapero/","title":"Instagram","description":""}]}]
"""
        expect = "different"
        result = make_query(self._testMethodName, query, expect)
        self.assertEqual(result, expect)

    def test_query_23(self):
        query = """
[{"id":"624c59be3e6547f3ac5e5cd115aa71c8","created":1676210518,"version":5,"title":"Plant Age eG","description":"Von unserem Acker in Frankfurt (Oder) beliefern wir Haushalte in Berlin, Potsdam & Frankfurt (Oder) wöchentlich mit saisonalem Gemüse aus biozyklisch-veganem Anbau. Sichere Dir jetzt Deine Gemüsekiste und werde Mitglied der GemüseGenossenschaft!","lat":52.29630000996231,"lng":14.467100004883063,"street":"Müllroser Chaussee 76C","zip":"15236","city":"Frankfurt (Oder)","country":"Deutschland","state":"Brandenburg","contact_name":null,"email":"[email protected]","telephone":"+49 335 50088473 oder +49 1575 1368476","homepage":"https://www.plantage.farm/","opening_hours":null,"founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["bio","biozyklisch-veganer-anbau","food","gemüse","gemüsekiste","lebensmittel","obst","refill","solawi","unverpackt","vegan","zerowaste"],"ratings":[],"license":"CC0-1.0","image_url":null,"image_link_url":null}]
[{"id":"84a682829bfc41d6a800213bbe38bfeb","created":1691345525,"version":2,"title":"PlantAge eG","description":"Deine Gemeinschaft für regionales Gemüse & Obst\n","lat":52.29630000996231,"lng":14.467100004883063,"street":"Müllroser Chaussee 76c","zip":"15236","city":"Frankfurt (Oder)","country":"Deutschland","state":"Brandenburg","contact_name":"Veronika Mair","email":"[email protected]","telephone":"0157 51368476","homepage":"https://www.plantage.farm/","opening_hours":"Mo-Do 9-17 Uhr","founded_on":null,"categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["bio","ernährung","food","gemüse","genossenschaft","landwirtschaft","lebensmittel","obst","unverpackt","zerowaste"],"ratings":[],"license":"CC0-1.0","image_url":null,"image_link_url":null,"custom":[{"url":"https://www.instagram.com/plantage.farm","title":"Instagram","description":null}]}]
"""
        expect = "duplicate"
        result = make_query(self._testMethodName, query, expect)
        self.assertEqual(result, expect)


if __name__ == '__main__':
    unittest.main()

I tried to return a float from the prompt to as with the discussed
I've also played with Hamming distance and similar concepts, but this does not work well. The problem is that "Cafe Lang" vs. "Cafe Lang Kurz" is hard to distinguish: the two titles share similar words, but "Cafe" is actually a category and not a name.
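To illustrate the problem, here is a minimal sketch. It uses `difflib.SequenceMatcher` from the Python standard library rather than a strict Hamming distance, since the latter is only defined for strings of equal length. The `CATEGORY_WORDS` set and both helper names are assumptions made for this example and are not part of the project.

```python
from difflib import SequenceMatcher

# Hypothetical list of generic category words that carry no identity.
CATEGORY_WORDS = {"cafe", "café", "restaurant", "laden"}


def strip_category_words(title: str) -> str:
    """Lowercase the title and drop generic category words."""
    words = [w for w in title.lower().split() if w not in CATEGORY_WORDS]
    return " ".join(words)


def name_similarity(a: str, b: str, ignore_categories: bool = False) -> float:
    """Return a similarity ratio in [0, 1] between two titles."""
    if ignore_categories:
        a, b = strip_category_words(a), strip_category_words(b)
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


raw = name_similarity("Cafe Lang", "Cafe Lang Kurz")
stripped = name_similarity("Cafe Lang", "Cafe Lang Kurz", ignore_categories=True)
print(f"raw={raw:.2f} stripped={stripped:.2f}")
```

With the shared category word removed, the score drops, because the names that remain have less in common. Whether that is enough to separate the two entries depends on the threshold chosen.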
@qknight I see this issue as a concept description,
We need a mechanism that recognizes common entries from different data sources with different data quality and combines them.
Starting situation
We assume that we have already translated data from various platforms into the FairSync protocol format.
The input is therefore an unsorted vector of `Record`s.

1. Identifying similarities
First of all, we need to find groups of entries that are somehow similar:
Physical Distance
The most obvious recognition attribute is the coordinates (lat/lng) of an entry. However, not all entries have coordinates, which is why other properties must be included. So we first have to filter all entries for those with a valid location.

Hamming Distance of Title
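Before any title comparison, the location filter and the physical distance described above could be sketched as follows. This is only an illustration: the function names and the validity check are assumptions, entries are modelled as plain dictionaries, and the haversine formula with a mean Earth radius of 6371 km is just one common way to get a distance in metres. The coordinates are taken from the two market entries in `test_query_22`.

```python
import math


def has_valid_location(entry: dict) -> bool:
    """True if the entry carries numeric lat/lng within valid ranges."""
    lat, lng = entry.get("lat"), entry.get("lng")
    if not isinstance(lat, (int, float)) or not isinstance(lng, (int, float)):
        return False
    return -90.0 <= lat <= 90.0 and -180.0 <= lng <= 180.0


def haversine_m(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great-circle distance in metres (mean Earth radius 6371 km)."""
    r = 6_371_000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lng2 - lng1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


entries = [
    {"title": "Samstagsmarkt", "lat": 51.325446167646646, "lng": 12.330055838604483},
    {"title": "Freitagsmarkt", "lat": 51.32547634249808, "lng": 12.329943185825806},
    {"title": "no coordinates", "lat": None, "lng": None},
]
located = [e for e in entries if has_valid_location(e)]
d = haversine_m(located[0]["lat"], located[0]["lng"],
                located[1]["lat"], located[1]["lng"])
print(f"{len(located)} located entries, {d:.1f} m apart")
```

The two markets are only a few metres apart, yet the test expects them to be "different", so proximity alone cannot decide the question.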
Using the Hamming distance (#22), we calculate how much two titles differ. Identical titles yield zero.
The longer a title is and the more of its letters differ, the higher the number.
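As a rough sketch of this measure: the classic Hamming distance is only defined for strings of equal length, so the version below assumes that every surplus character of the longer title counts as one difference. How #22 handles titles of unequal length is not stated here, so treat this padding rule as an assumption. The sample titles come from `test_query_23`.

```python
from itertools import zip_longest


def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two strings differ.

    Assumption: the shorter string is padded, so each surplus
    character of the longer string counts as one difference.
    """
    return sum(1 for x, y in zip_longest(a, b, fillvalue=None) if x != y)


same = hamming_distance("PlantAge eG", "PlantAge eG")
shifted = hamming_distance("Plant Age eG", "PlantAge eG")
print(same, shifted)
```

A single inserted space shifts every following character, so two titles that a human reads as the same name end up with a large distance. An edit distance that allows insertions and deletions may be more robust for this kind of typo.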
Hamming Distance of the Email Address
Hamming Distance of the Website
Distance of Tags
We need a multidimensional field of all our hashtags that captures how similar they are. The more frequently two tags appear together, the closer together they are placed.
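One possible way to turn "appear together frequently" into a number is sketched below. It counts, for every pair of tags, how many entries contain both, and derives a distance from the Jaccard overlap of those entry sets. The choice of Jaccard and both function names are assumptions made for illustration; the tag lists are taken from three of the test entries.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(tag_lists):
    """Count how often each tag, and each unordered tag pair, occurs."""
    single, pair = Counter(), Counter()
    for tags in tag_lists:
        unique = sorted(set(tags))
        single.update(unique)
        pair.update(combinations(unique, 2))
    return single, pair


def tag_distance(a: str, b: str, single: Counter, pair: Counter) -> float:
    """1 minus the Jaccard overlap of the entries containing a and b.

    0.0 means the tags always appear together, 1.0 means never.
    """
    if a == b:
        return 0.0
    both = pair[tuple(sorted((a, b)))]
    either = single[a] + single[b] - both
    return 1.0 - both / either if either else 1.0


tag_lists = [
    ["cafe", "gastronomie", "inklusiv"],
    ["bio", "cafe", "fair", "gastronomie", "regional"],
    ["kleidung", "taschen"],
]
single, pair = cooccurrence_counts(tag_lists)
print(tag_distance("cafe", "gastronomie", single, pair))
print(tag_distance("cafe", "taschen", single, pair))
```

With real data, the resulting pairwise distances could feed a clustering or embedding step, but that goes beyond this sketch.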
2. Rate quality
Then we have to decide which entry has the highest data quality:
The rating may vary depending on the individual criteria.
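A minimal sketch of such a rating, assuming entries are plain dictionaries and quality is measured as the weighted share of filled-in fields. The weights, the field selection and the function names are all hypothetical and would need to be tuned per platform.

```python
# Hypothetical weights: how much each filled-in field contributes.
DEFAULT_WEIGHTS = {
    "title": 3.0,
    "description": 2.0,
    "lat": 2.0,
    "lng": 2.0,
    "homepage": 1.0,
    "email": 1.0,
    "telephone": 1.0,
    "opening_hours": 1.0,
}


def quality_score(entry: dict, weights: dict = DEFAULT_WEIGHTS) -> float:
    """Weighted share of non-empty fields, in the range [0, 1]."""
    total = sum(weights.values())
    if total == 0:
        return 0.0
    filled = sum(
        w for field, w in weights.items()
        if entry.get(field) not in (None, "", [], {})
    )
    return filled / total


def best_entry(entries: list, weights: dict = DEFAULT_WEIGHTS) -> dict:
    """Return the entry with the highest quality score."""
    return max(entries, key=lambda e: quality_score(e, weights))


sparse = {"title": "Lin`s Unverpackt Coburg", "description": "Bio, regional, zero waste",
          "lat": 50.28067403951691, "lng": 10.923156743367741, "homepage": None}
fuller = {"title": "Lin's Unverpackt Coburg", "description": "Unverpackte Lebensmittel",
          "lat": 50.26069908415, "lng": 10.965282679984943,
          "homepage": "https://www.unverpackt-coburg.de/", "telephone": "09561 7090188"}
print(best_entry([sparse, fuller])["homepage"])
```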
For example, a platform could always classify its own data (recognizable by the `origin` property) as the highest quality.

3. Rate similarity
Prompt the AI to decide whether the entry duplicates an existing one
Use the following prompts
if yes, merge it as one entry
if not, create a new entry
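The merge-or-create decision could look roughly like this, assuming the model answers with exactly "duplicate" or "different" as in the test prompt. The function name and the merge policy (keep existing values, fill only the gaps) are assumptions for illustration, not a settled design.

```python
def apply_verdict(merged: list, existing_index: int, candidate: dict, verdict: str) -> list:
    """Merge the candidate into an existing entry or append it as a new one.

    Assumed merge policy: values already present win; the candidate
    only fills fields that are empty in the existing entry.
    """
    verdict = verdict.strip().lower()
    if verdict == "duplicate":
        target = merged[existing_index]
        for key, value in candidate.items():
            if target.get(key) in (None, "", []):
                target[key] = value
    elif verdict == "different":
        merged.append(dict(candidate))
    else:
        raise ValueError(f"unexpected verdict: {verdict!r}")
    return merged


merged = [{"title": "Lin`s Unverpackt Coburg", "homepage": None}]
candidate = {"title": "Lin's Unverpackt Coburg",
             "homepage": "https://www.unverpackt-coburg.de/"}
apply_verdict(merged, 0, candidate, "duplicate")
apply_verdict(merged, 0, {"title": "Cafe Einstein", "homepage": None}, "different")
print(len(merged), merged[0]["homepage"])
```

Raising on any other answer makes it visible when the model ignores the instruction to output only one of the two words.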
4. Create summary or show differences
In the last step, we have to create a summary or, if there are too few similarities, show the differences for a manual recheck of the result.
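Showing the differences for a manual check could be as simple as the following sketch, which lists every field whose value differs between two entries. The function name and the default set of ignored bookkeeping fields are assumptions; the sample values are a subset of the two entries in `test_query_23`.

```python
def field_differences(a: dict, b: dict, ignore=("id", "created", "version")) -> dict:
    """Map each differing field to its pair of values (a, b)."""
    keys = (set(a) | set(b)) - set(ignore)
    return {k: (a.get(k), b.get(k)) for k in sorted(keys) if a.get(k) != b.get(k)}


first = {"id": "624c59be3e6547f3ac5e5cd115aa71c8", "title": "Plant Age eG",
         "zip": "15236", "homepage": "https://www.plantage.farm/",
         "opening_hours": None}
second = {"id": "84a682829bfc41d6a800213bbe38bfeb", "title": "PlantAge eG",
          "zip": "15236", "homepage": "https://www.plantage.farm/",
          "opening_hours": "Mo-Do 9-17 Uhr"}

for field, (left, right) in field_differences(first, second).items():
    print(f"{field}: {left!r} != {right!r}")
```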
At this point, the user may wish to define a threshold value for the level of similarity at which other entries should be included in the summary, but this should be done before calling this function (e.g. `let others = records.filter(|(_, similarity)| similarity > my_threshold)`).

Example
WeltCafé @ Stuttgart Mitte