Upserting capabilities added at creation #108
base: main
This is good, but can we split this into two different commands: create and update.
- create: clears the database and then creates everything
- update: only attempts to insert (or update) the data.
I think this should just be added as a command line option to the script.
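A minimal sketch of how that split could look as a command line option, assuming a positional `mode` argument. The step names `drop_database`, `create_tables` and `upsert_data` are hypothetical placeholders, not functions from this repository.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser with one positional argument selecting the mode."""
    parser = argparse.ArgumentParser(
        description="Create or update the metadata database."
    )
    parser.add_argument(
        "mode",
        choices=["create", "update"],
        help="create: clear the database, then create everything; "
        "update: only insert (or update) the data",
    )
    return parser


def plan_actions(mode: str) -> list[str]:
    """Return the (hypothetical) steps each mode would run, in order."""
    if mode == "create":
        # create starts from a clean database before loading the data
        return ["drop_database", "create_tables", "upsert_data"]
    # update leaves the existing database in place and only upserts
    return ["upsert_data"]


def main(argv=None) -> list[str]:
    args = build_parser().parse_args(argv)
    return plan_actions(args.mode)
```

With this layout the script would be invoked with either `create` or `update` as its argument, and any other value is rejected by argparse before any database work starts.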
for grant_query in grant_privileges:
    conn.execute(grant_query)

def create_or_upsert_table(self, table_name: str, df: pd.DataFrame):
Could we also have unit testing for this method?
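One possible shape for such a test, sketched against an in-memory SQLite database as a stand-in for Postgres so that it runs without a server. `upsert_rows`, the `shots` table and its columns are hypothetical; a real test would call `create_or_upsert_table` against a test Postgres instance.

```python
import sqlite3
import unittest


def upsert_rows(conn, rows):
    """Hypothetical stand-in for create_or_upsert_table.

    Uses SQLite's INSERT ... ON CONFLICT DO UPDATE (SQLite 3.24+).
    """
    conn.executemany(
        "INSERT INTO shots (shot_id, campaign) VALUES (?, ?) "
        "ON CONFLICT(shot_id) DO UPDATE SET campaign = excluded.campaign",
        rows,
    )


class UpsertTest(unittest.TestCase):
    def setUp(self):
        # Fresh in-memory database for every test
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE shots (shot_id INTEGER PRIMARY KEY, campaign TEXT)"
        )

    def tearDown(self):
        self.conn.close()

    def fetch_all(self):
        query = "SELECT shot_id, campaign FROM shots ORDER BY shot_id"
        return self.conn.execute(query).fetchall()

    def test_existing_rows_updated_and_new_rows_inserted(self):
        # First load: only part of the data
        upsert_rows(self.conn, [(1, "A"), (2, "B")])
        # Second load: one changed row and one new row
        upsert_rows(self.conn, [(2, "B2"), (3, "C")])
        self.assertEqual(self.fetch_all(), [(1, "A"), (2, "B2"), (3, "C")])

    def test_repeated_load_does_not_duplicate(self):
        upsert_rows(self.conn, [(1, "A")])
        upsert_rows(self.conn, [(1, "A")])
        self.assertEqual(self.fetch_all(), [(1, "A")])
```

The first test mirrors the manual check of loading half the shots and then all of them. It can be run with `python -m unittest`.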
I have added a function that upserts tables when the create script builds them. Instead of dropping the database (if it exists) on every run, the script now performs an upsert (update or insert) using the on_conflict_do_update method from sqlalchemy. I tested it by changing the data in the
\tests\mock_data\index\
folder: initially including only half the shots, then including all of them, and checking that the tables changed as expected.
What I am not sure about is line 215:
shot_metadata = shot_metadata.replace(np.nan, None)
I changed nan to None because I kept getting an error (psycopg2.errors.NumericValueOutOfRange: integer out of range) saying that column values did not fit the schema, and from what I found online it seems that Postgres does not treat nan as None. I am not sure why this did not break the previous version, though, since the schema and data are identical, so I am a bit suspicious of this line.
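For context on the nan versus None question: NaN is an ordinary floating-point value, whereas a missing value in SQL is NULL, which database drivers represent as Python's None. The sketch below shows that distinction using only the standard library. `nan_to_none` is a hypothetical helper operating on plain dict records, not the pandas call used in the PR, and it does not explain why the earlier version worked.

```python
import math


def nan_to_none(records):
    """Return copies of the records with float NaN values replaced by None.

    None is what DB-API drivers send as SQL NULL; NaN would be sent as a
    floating-point value instead.
    """
    cleaned = []
    for record in records:
        cleaned.append(
            {
                key: None if isinstance(value, float) and math.isnan(value) else value
                for key, value in record.items()
            }
        )
    return cleaned


# Hypothetical rows: the second one is missing its integer-valued field
rows = [
    {"shot_id": 1, "n_frames": 120.0},
    {"shot_id": 2, "n_frames": float("nan")},
]
clean_rows = nan_to_none(rows)
```

After the conversion, the second row carries `None` for the missing field, so it would reach the database as NULL and not as a numeric value.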