update_one
The update_one.sh script drives the syncing of the CalHHS data to my database.
It relies on scripts with specific names to run parts of the process, and it consults the meta database to decide how to handle some steps.
The "special" scripts are:
fetch_special_before.sh
fetch_special_after.sh
fetch_special_instead.sh
exec_special_before.sh
exec_special_after.sh
exec_special_instead.sh
The logic of these scripts is as follows. If the dataset contains a csv or xlsx file whose sources entry has the auto_run flag set to 1, then I automatically create a table for that file. If I want to prevent this automatic handling of the csv and xlsx files, then I provide a fetch_special_instead.sh script, and the update process executes that script instead. If there is a fetch_special_before.sh script, it is executed before either the automatic or the "instead"-based fetching; if there is a fetch_special_after.sh script, it is executed after both. The "exec" scripts operate the same way with respect to the automatic importation of the csv and xlsx files into tables.
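The before/instead/after dispatch described above can be sketched as a small shell function. The special-script names are the real ones; the `auto_fetch` function is a hypothetical stub standing in for the actual csv/xlsx table-creation logic.

```shell
#!/bin/sh
# Stub standing in for the automatic csv/xlsx table creation (assumption).
auto_fetch() {
  echo "auto fetch for $1"
}

# Run the fetch phase for one dataset directory, honoring the special
# scripts: before runs first, instead replaces the automatic handling,
# after runs last in either case.
fetch_dataset() {
  dir="$1"
  [ -x "$dir/fetch_special_before.sh" ] && "$dir/fetch_special_before.sh"
  if [ -x "$dir/fetch_special_instead.sh" ]; then
    "$dir/fetch_special_instead.sh"   # replaces the automatic handling
  else
    auto_fetch "$dir"                 # default csv/xlsx table creation
  fi
  [ -x "$dir/fetch_special_after.sh" ] && "$dir/fetch_special_after.sh"
}
```

The "exec" phase would use the same shape with the exec_special_* names.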
This script also does some administrivia.
It updates the deets.sh file before the fetch. The values in this file change surprisingly often.
It updates the "updates" table, noting the success or failure of the download/import process and the time.
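Recording the outcome might look like the following. This is a hedged sketch only: it assumes a SQLite meta database reachable as meta.db, and the table and column names beyond "updates" are guesses.

```shell
# Record the outcome of a download/import run in the "updates" table.
# meta.db path and the dataset/status/updated_at columns are assumptions.
log_update() {
  dataset="$1"
  status="$2"    # e.g. "success" or "failure"
  sqlite3 meta.db "INSERT INTO updates (dataset, status, updated_at)
                   VALUES ('$dataset', '$status', datetime('now'));"
}
```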
It executes the "update_sources.sh" script. This script is still a work in progress. I want to keep track of all files currently associated with a dataset. Problems arise when the files change. The name of the zip file that contains all of the data changes often, but that file is not tracked. At times, other files change. At the beginning of 2025, many files changed their names because the file names included the months of data collected, or "interim" markers on the data. It seems a bit lame that the files change their names, but oh well. I must deal with these changes, perhaps manually. As of now, I am noting when files stop appearing and when they are first seen. If some files in the dataset have problems, I may not be able to recognize that; it is not yet clear.
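The first-seen bookkeeping could be sketched like this. It is an assumption-laden illustration: the source_files table, its columns, and the meta.db path are all hypothetical names, not the script's actual schema.

```shell
# For each file currently in a dataset directory, record a first_seen
# date the first time that file name appears. Table and column names
# (source_files, name, first_seen) are assumptions for illustration.
update_sources() {
  dir="$1"
  for f in "$dir"/*; do
    name=$(basename "$f")
    sqlite3 meta.db "INSERT INTO source_files (name, first_seen)
                     SELECT '$name', date('now')
                     WHERE NOT EXISTS
                       (SELECT 1 FROM source_files WHERE name = '$name');"
  done
}
```

Detecting files that have disappeared would be the inverse query: names in source_files with no matching file on disk.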
I have an extensions table, where I note how many files with what extensions are associated with the dataset. TODO This can go away soon.
A share.sh script checks whether a newly created table also lives on the remote server and, if so, copies the new table up to the remote server. This is determined by the shared flag in the sources table.
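The shared-flag check might be sketched as follows. Everything here beyond the sources table and its shared flag is an assumption: the SQLite databases, the tbl column, the remote host name, and the dump-over-ssh transfer are placeholders for whatever share.sh actually does.

```shell
# Copy a table to the remote server only if its sources row has
# shared = 1. Database files, the tbl column, the remote host, and
# the dump/load mechanics are all hypothetical.
share_table() {
  table="$1"
  shared=$(sqlite3 meta.db "SELECT shared FROM sources WHERE tbl = '$table';")
  if [ "$shared" = "1" ]; then
    sqlite3 data.db ".dump $table" | ssh remote.example.com "sqlite3 data.db"
  fi
}
```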