prompt_engineering_dump #5
The .csv file comes from one of the iterations of the prompt engineering code I was attempting. I will probably have to double-check this one. Some of it looked good; I don't remember whether there was an issue or whether I wanted to add more. The .py file contains the prompt engineering code. Line 242 and below is what I was working on more recently.
@andre-scheinwald There are 3 versions of code in the dump. Which version works the best? Please clean up and keep only the version that produces the best results.
@andre-scheinwald Please clean up the code and keep the version that works best. The next steps I would try to improve performance are: 1) label bad results and find out under what conditions they occur; 2) loop through individual documents and see whether that improves performance. I doubt it will, as I did not see cross-document hallucination; it is just an idea. The end goal of this exercise is to create a prompt that generates the best possible results with minimal manual correction.
'gas_capture': True,
'gas_flare': True,
'gas_to_energy_project': True,
'coordinates': {'latitude': -22.82601389, 'longitude': -42.05100556},
I noticed in the results CSV that when the original coordinates are in a format like 22º49’37.07’’ S, the translated coordinates are not accurate. You may want to add more to the prompt to address this. For example, you can ask it to work in steps: 1) first extract the coordinates as written; 2) then convert them into decimal degrees.
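One way to check the model's conversions is to do step 2 deterministically in code. The following is a minimal sketch, not the code from this PR: the name `dms_to_decimal` and the regular expression are assumptions for illustration, and the pattern only handles the degree-minute-second layout quoted above with an N/S/E/W hemisphere letter.

```python
import re

# Degrees, minutes, seconds, then a hemisphere letter. The character classes
# cover the degree and quote variants that tend to show up in PDF text.
# (Hypothetical pattern; extend it if the documents use other layouts.)
DMS_PATTERN = re.compile(
    r"""(?P<deg>\d+(?:\.\d+)?)\s*[º°]\s*
        (?P<min>\d+(?:\.\d+)?)\s*[’'′]\s*
        (?P<sec>\d+(?:\.\d+)?)\s*(?:’’|''|"|″|”)\s*
        (?P<hemi>[NSEW])""",
    re.VERBOSE | re.IGNORECASE,
)


def dms_to_decimal(text: str) -> float:
    """Convert one degree-minute-second coordinate to signed decimal degrees."""
    match = DMS_PATTERN.search(text)
    if match is None:
        raise ValueError(f"no DMS coordinate found in {text!r}")
    degrees = float(match["deg"])
    minutes = float(match["min"])
    seconds = float(match["sec"])
    value = degrees + minutes / 60 + seconds / 3600
    # South and West are negative in decimal-degree notation.
    if match["hemi"].upper() in ("S", "W"):
        value = -value
    return value
```

Calling `dms_to_decimal("22º49’37.07’’ S")` gives a value of about -22.827, which can be compared against the latitude the model returned for the same document.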
messages=[
{
"role": "user",
"content": """Please review all files and answer the following questions for every single file: What is the landfill name, location (region, city, and country),
Did you try looping through individual documents to see if that can produce a better result?
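A per-document loop could look roughly like the sketch below. It is an illustration under stated assumptions rather than the code in the .py file: `extract_per_document` is a hypothetical helper, `ask_model` stands in for whatever function sends a prompt to the model and returns a parsed dict, and the document text is assumed to be already extracted from each PDF.

```python
from typing import Callable, Dict, List


def extract_per_document(
    documents: Dict[str, str],
    question: str,
    ask_model: Callable[[str], dict],
) -> List[dict]:
    """Ask the same question once per document instead of once for all files.

    documents maps a file name to its extracted text.
    ask_model is a placeholder for the real model call (an assumption here).
    """
    rows = []
    for name, text in documents.items():
        # Each prompt contains exactly one document, so answers cannot
        # borrow details from a different file.
        prompt = f"{question}\n\nDocument ({name}):\n{text}"
        answer = ask_model(prompt)
        rows.append({"source_file": name, **answer})
    return rows
```

Keeping `source_file` in every row makes it easy to compare these results against the all-files-at-once run for the same document.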
df3 = pd.concat([df, df2], ignore_index=True)

df3.to_csv(r'C:\Users\andre.scheinwald\OneDrive - RMI\Documents\Python Scripts\cdm_scraping\brazil_landfill_name_and_coords_extraction.csv', index=False)
Please spend some time reviewing the results and labeling the bad ones to find plausible causes. For example, if I think unit conversion could be the reason the AI does not produce coordinates in the format I want, I will help the AI by breaking down the steps or providing more examples to get the right results. Prompt engineering is an iterative, trial-and-error process. Every bad result is an opportunity (a gold mine) to help us improve the prompt.
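Once results are labeled, a simple tally shows which suspected cause to address first. This is a minimal sketch that assumes each reviewed row is a dict with a `label` field ("good" or "bad") and a `suspected_cause` field; both column names and the helper `summarize_causes` are hypothetical, not taken from the results file.

```python
from collections import Counter
from typing import Dict, Iterable


def summarize_causes(rows: Iterable[Dict[str, str]]) -> Counter:
    """Count suspected causes among rows that were labeled as bad."""
    return Counter(
        row["suspected_cause"] for row in rows if row["label"] == "bad"
    )
```

Calling `.most_common()` on the returned counter lists the causes from most to least frequent, which suggests where an extra prompt step or example is likely to pay off most.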
I manually went through the PDF documents and compared the prompt engineering results with the information in extraction.csv. I created a new file called extraction.xlsx, which records correct and incorrect data and highlights unverified data. Corrections are stored as notes in this file. I added an additional column to flag files that can't be verified. I then generated accents.csv, which is the final file to use for matching existing facilities in the db to what we have here. It contains the corrections and uses ANSI encoding to preserve accent marks.
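For reference, writing a CSV with an explicit encoding can be sketched as follows. This assumes "ANSI" means Windows-1252 (`cp1252` in Python), which is the usual meaning on Western-language Windows systems; the helper names are hypothetical and this is not the code that produced accents.csv.

```python
import csv
from typing import Dict, List, Sequence


def write_ansi_csv(path: str, rows: List[Dict[str, str]], fieldnames: Sequence[str]) -> None:
    """Write rows to a CSV file encoded as Windows-1252 (assumed 'ANSI')."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="cp1252") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


def read_ansi_csv(path: str) -> List[Dict[str, str]]:
    """Read the file back with the same encoding to confirm accents survive."""
    with open(path, newline="", encoding="cp1252") as handle:
        return list(csv.DictReader(handle))
```

Whatever reads the file later has to use the same encoding; opening a Windows-1252 file as UTF-8 is a common way for accented characters to get corrupted.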