-
Notifications
You must be signed in to change notification settings - Fork 114
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ejscreen semi-automatic #1184
base: master
Are you sure you want to change the base?
Ejscreen semi-automatic #1184
Conversation
scripts/us_epa/ejscreen/ejscreen.py
Outdated
with zfile.open(f'{FILENAMES[year]}.csv', 'r') as newfile: | ||
dfs[year] = pd.read_csv(newfile, usecols=columns) | ||
# some years are not zipped | ||
if year == '2024': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'year' value should be taken from the config file, it should not be hard coded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
scripts/us_epa/ejscreen/ejscreen.py
Outdated
dfs[year] = pd.read_csv(newfile, usecols=columns) | ||
# some years are not zipped | ||
if year == '2024': | ||
url = f'https://gaftp.epa.gov/EJSCREEN/2024/2.32_August_UseMe/{zip_filename}.zip' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'url' value also move to config file and try to access from there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
scripts/us_epa/ejscreen/config.json
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove this file from git and update it in GCS location.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
scripts/us_epa/ejscreen/ejscreen.py
Outdated
_CONFIG_PATH = os.path.join(_MODULE_DIR, 'config.json') | ||
|
||
# Load configuration from config.json | ||
with open(_CONFIG_PATH, 'r') as f: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
config_path from the flag is not considered here. Config file is reading from _MODULE_DIR location. Can you modify this to read the config file from GCS location?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a flag for download and process separately .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't find the flag for download and process separately. Can you check if the updated code is checked-in properly?
scripts/us_epa/ejscreen/ejscreen.py
Outdated
) | ||
|
||
# Rename columns to match other years | ||
if year == '2024': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid hard coding year like '2024' instead add a list/dict in config file for column remaining and check like below
if year in _CONFIG['remane_colums']:
cols_renamed = dict(zip(columns, NORM_CSV_COLUMNS1))
else:
cols_renamed = dict(zip(columns, NORM_CSV_COLUMNS))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
scripts/us_epa/ejscreen/ejscreen.py
Outdated
logger.info( | ||
f"File downloaded and processed for {year} successfully") | ||
else: | ||
logger.error( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please change logging.error to logging.fatal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
||
|
||
if __name__ == '__main__': | ||
def main(_): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add try/catch block in the main method to avoid partial import in case of unexpected errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
"cleaned_csv": "ejscreen_airpollutants.csv" | ||
} | ||
], | ||
"cron_schedule": "0 07 * * 1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to prefix '0' in hour field. Please make it as "0 7 * * 1"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
||
|
||
# Download the file and save it in the input folder | ||
def download_file(url, year, zip_filename=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please implement the retry
module for this get request?
refer to this like https://pypi.org/project/retry/
|
||
except Exception as e: | ||
logger.fatal(f"Unexpected error in the main process: {e}") | ||
sys.exit(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sys.exit(1) is not required here, You can remove this.
"import_specifications": [ | ||
{ | ||
"import_name": "EPA_EJSCREEN", | ||
"curator_emails": ["[email protected]"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove email id from curator_emails and keep it like "curator_emails": [],
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please fix the comments provided.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add the config file in PR as well
No description provided.