
prompt_engineering_dump #5

Open · wants to merge 2 commits into main

Conversation

andre-scheinwald

.csv file is one of the iterations of the prompt engineering code I was attempting. I will probably have to double-check this one; some of it looked good, but I don't remember whether there was an issue or I just wanted to add more.

.py file contains the prompt engineering code. Line 242 and below is what I was working on most recently.

@rosewangrmi
Collaborator

@andre-scheinwald There are 3 versions of code in the dump. Which version works the best? Please clean up and keep only the version that produces the best results.


rosewangrmi left a comment


@andre-scheinwald Please clean up the code and keep the best version that works. The next steps I would take to improve performance are: 1) label the bad results and find under what circumstances they occur; 2) loop through individual documents and see if that improves performance. I doubt it will matter much, since I did not see cross-document hallucination, but it is just an idea. The end goal of this exercise is to create a prompt that generates the best possible results with minimal manual correction.

'gas_capture': True,
'gas_flare': True,
'gas_to_energy_project': True,
'coordinates': {'latitude': -22.82601389, 'longitude': -42.05100556},
Collaborator

I noticed from the results CSV that when the original coordinates are in a format like 22º49’37.07’’ S, the translated coordinates are not accurate. You may want to add more instruction to the prompt to address this. For example, you can ask it to take several steps to get to a location: 1) first extract the coordinates as written; 2) then translate the coordinates into decimal degrees.
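Alternatively, the conversion itself can be taken out of the model's hands: have the prompt extract the raw DMS string only, then convert it deterministically in Python. A minimal sketch (the regex is an assumption about the symbol variants that appear in the PDFs, not something verified against every file):

```python
import re

# Matches strings like "22º49’37.07’’ S"; symbol alternatives (° vs º,
# straight vs curly quotes) are a guess at the formats in the source PDFs.
DMS_RE = re.compile(
    r"""(\d+(?:\.\d+)?)\s*[º°]\s*      # degrees
        (\d+(?:\.\d+)?)\s*[’'′]\s*     # minutes
        (\d+(?:\.\d+)?)\s*[’'′″"]*\s*  # seconds
        ([NSEW])""",
    re.VERBOSE,
)

def dms_to_decimal(dms: str) -> float:
    """Convert a degrees/minutes/seconds string to signed decimal degrees."""
    m = DMS_RE.search(dms)
    if not m:
        raise ValueError(f"unrecognized DMS string: {dms!r}")
    deg, minutes, seconds, hemisphere = m.groups()
    value = float(deg) + float(minutes) / 60 + float(seconds) / 3600
    # South and West hemispheres are negative by convention.
    return -value if hemisphere in "SW" else value
```

This removes arithmetic from the model entirely, so a wrong answer can only come from a bad extraction, which is much easier to spot in review.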

messages=[
{
"role": "user",
"content": """Please review all files and answer the following questions for every single file: What is the landfill name, location (region, city, and country),
Collaborator

Did you try looping through individual documents to see if that can produce a better result?
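The suggested per-document loop could look something like this. `ask_model` is a hypothetical callable wrapping whatever chat-completion client the script uses, and the prompt text is abbreviated for illustration, not the real one:

```python
# Abbreviated stand-in for the real extraction prompt in the .py file.
PROMPT_TEMPLATE = (
    "Please review the following document and answer: what is the landfill "
    "name, location (region, city, and country), and coordinates?\n\n{document}"
)

def extract_per_document(documents: dict, ask_model) -> list:
    """Call the model once per document instead of once over all files,
    so a bad answer can be traced back to a single source file."""
    results = []
    for filename, text in documents.items():
        answer = ask_model(PROMPT_TEMPLATE.format(document=text))
        results.append({"file": filename, "raw_answer": answer})
    return results
```

One call per file also keeps each request well under the context limit and makes the cost of a retry a single document rather than the whole batch.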


df3 = pd.concat([df, df2], ignore_index=True)

df3.to_csv(r'C:\Users\andre.scheinwald\OneDrive - RMI\Documents\Python Scripts\cdm_scraping\brazil_landfill_name_and_coords_extraction.csv', index=False)
Collaborator

Please spend some time reviewing the results and label the bad ones to find plausible causes. For example, if I think unit conversion could be the reason the model does not produce the coordinates in the format I want, I will help it break the task into steps or provide more examples to get the right results. Prompt engineering is an iterative trial-and-error process. Every bad result is an opportunity (a gold mine) to help us improve the prompt.
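A first-pass labeling rule can do much of this triage automatically before manual review. A sketch, where the Brazil bounding box is a rough approximation and the label names are assumptions, not part of the original pipeline:

```python
def label_coordinates(lat, lon):
    """Return a review label for one extracted coordinate pair.
    The bounding box for Brazil is approximate, for illustration only."""
    if lat is None or lon is None:
        return "missing"
    if not (-90 <= lat <= 90) or not (-180 <= lon <= 180):
        return "out_of_range"    # likely a failed DMS-to-decimal conversion
    if not (-34 <= lat <= 6) or not (-74 <= lon <= -34):
        return "outside_brazil"  # often a dropped hemisphere sign
    return "ok"
```

Counting the non-"ok" labels by cause then tells you which failure mode to attack in the prompt first.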

Manually went through the PDF documents and compared the prompt engineering results against the information in extraction.csv. Created a new file, extraction.xlsx, which records correct and incorrect data and highlights unverified data; corrections are stored as notes in this file. Added an additional column to flag files that can't be verified.

Then generated accents.csv, which is the final file to use for matching existing facilities in the db to what we have here. It contains the corrections and uses ANSI encoding to preserve accent marks.
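For reference, the encoding can be made explicit in the `to_csv` call so the accent handling is reproducible. A sketch with illustrative data (on Windows, "ANSI" usually means cp1252; `utf-8-sig` is an alternative that Excel also opens with accents intact):

```python
import os
import tempfile

import pandas as pd

# Illustrative data and file name, not the real accents.csv.
df = pd.DataFrame({"name": ["São Paulo", "Brasília"]})
path = os.path.join(tempfile.gettempdir(), "accents_demo.csv")

# Explicit encoding, so accented characters survive the round trip.
df.to_csv(path, index=False, encoding="cp1252")
back = pd.read_csv(path, encoding="cp1252")
```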