Ejscreen semi-automatic #1184

Open
wants to merge 17 commits into
base: master
from

Conversation

Rohit231998

No description provided.

with zfile.open(f'{FILENAMES[year]}.csv', 'r') as newfile:
dfs[year] = pd.read_csv(newfile, usecols=columns)
# some years are not zipped
if year == '2024':
Contributor

The 'year' value should be taken from the config file; it should not be hard-coded.

Author

Done

dfs[year] = pd.read_csv(newfile, usecols=columns)
# some years are not zipped
if year == '2024':
url = f'https://gaftp.epa.gov/EJSCREEN/2024/2.32_August_UseMe/{zip_filename}.zip'
Contributor

Please move the 'url' value to the config file as well and read it from there.

Author

Done
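
For readers following the two threads above, here is a minimal sketch of how the special-case year and its URL could live in the config instead of the code. The `url_overrides` key, the `{filename}` placeholder, and the `build_url` helper are assumptions made for illustration; they are not taken from the PR's actual config.json.

```python
import json

# Hypothetical config fragment: the "url_overrides" key and the {filename}
# placeholder are illustrative, not the PR's actual schema.
EXAMPLE_CONFIG = json.loads("""
{
  "url_overrides": {
    "2024": "https://gaftp.epa.gov/EJSCREEN/2024/2.32_August_UseMe/{filename}.zip"
  }
}
""")


def build_url(config, year, filename):
    """Return the configured URL for `year`, or None if the year has no override."""
    template = config["url_overrides"].get(year)
    if template is None:
        return None
    return template.format(filename=filename)
```

With this shape, the `if year == '2024':` check becomes a lookup (`build_url(config, year, zip_filename) is not None`), so adding another year only requires a config change.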

Please remove this file from git and update it in the GCS location.

Author

Done

_CONFIG_PATH = os.path.join(_MODULE_DIR, 'config.json')

# Load configuration from config.json
with open(_CONFIG_PATH, 'r') as f:

The config_path from the flag is not used here. The config file is being read from the _MODULE_DIR location. Can you modify this to read the config file from the GCS location?

Author

done
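
One possible shape for this, sketched under assumptions: `read_text` and `load_config` are hypothetical helper names, and the `gs://` branch assumes the `google-cloud-storage` client library is installed and credentials are available. The path passed in would come from the config_path flag rather than from `_MODULE_DIR`.

```python
import json


def read_text(path):
    """Read a text file from a local path or a gs:// URI."""
    if path.startswith("gs://"):
        # Imported lazily so that local runs do not need the GCS client library.
        from google.cloud import storage

        bucket_name, _, blob_name = path[len("gs://"):].partition("/")
        blob = storage.Client().bucket(bucket_name).blob(blob_name)
        return blob.download_as_text()
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


def load_config(config_path):
    """Parse the JSON config found at `config_path` (local or gs://)."""
    return json.loads(read_text(config_path))
```

Keeping the local branch makes it possible to test the script against a checked-out config without touching GCS.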

Please add a flag so that download and process can be run separately.

Author

done

I couldn't find the flag for running download and process separately. Can you check whether the updated code was checked in properly?
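
A minimal sketch of what such a flag could look like. The flag name `--mode`, its three values, and the `run` helper are assumptions for illustration; the sketch uses the standard-library `argparse` to stay self-contained, and the same idea carries over to whichever flag library the script already uses.

```python
import argparse

MODES = ("download", "process", "download_and_process")


def parse_args(argv=None):
    """Parse the hypothetical --mode flag; defaults to running both steps."""
    parser = argparse.ArgumentParser(description="EJSCREEN import (sketch)")
    parser.add_argument("--mode", choices=MODES, default="download_and_process")
    return parser.parse_args(argv)


def run(mode, download_fn, process_fn):
    """Run only the steps selected by `mode` and return the names of the steps run."""
    steps = []
    if mode in ("download", "download_and_process"):
        download_fn()
        steps.append("download")
    if mode in ("process", "download_and_process"):
        process_fn()
        steps.append("process")
    return steps
```

For example, `run(parse_args(["--mode", "process"]).mode, download_fn, process_fn)` would skip the download and reuse files already in the input folder.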

)

# Rename columns to match other years
if year == '2024':

Avoid hard-coding a year such as '2024'. Instead, add a list/dict to the config file for column renaming and check it as shown below:

if year in _CONFIG['rename_columns']:
   cols_renamed = dict(zip(columns, NORM_CSV_COLUMNS1))
else:
   cols_renamed = dict(zip(columns, NORM_CSV_COLUMNS))

Author

done
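
The suggestion in this thread can be sketched as a small helper. The contents of the two column lists below are placeholders, the config key name is an assumption, and `build_rename_map` is a hypothetical function; only the names `NORM_CSV_COLUMNS` and `NORM_CSV_COLUMNS1` come from the review comment.

```python
# Placeholder column names for illustration only; the real lists live in the PR.
NORM_CSV_COLUMNS = ["col_a", "col_b"]
NORM_CSV_COLUMNS1 = ["col_a_v2", "col_b_v2"]


def build_rename_map(year, columns, config):
    """Map raw CSV columns to normalized names, choosing the list by config."""
    # Years listed under the (assumed) "rename_columns" key use the alternate list.
    if year in config["rename_columns"]:
        target = NORM_CSV_COLUMNS1
    else:
        target = NORM_CSV_COLUMNS
    # zip() silently truncates on a length mismatch, so check explicitly.
    if len(columns) != len(target):
        raise ValueError(
            f"{year}: expected {len(target)} columns, got {len(columns)}")
    return dict(zip(columns, target))
```

The explicit length check is a design choice added here: without it, a year whose source file gains or loses a column would produce a partial rename map instead of an error.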

logger.info(
f"File downloaded and processed for {year} successfully")
else:
logger.error(

Please change logging.error to logging.fatal

Author

done



if __name__ == '__main__':
def main(_):

Add a try/except block in the main method to avoid a partial import in case of unexpected errors.

Author

done
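
A rough sketch of the pattern being requested, assuming the import is a sequence of steps that should stop at the first unexpected failure. `run_import` and its `steps` argument are hypothetical names, not the PR's code.

```python
import logging

logger = logging.getLogger(__name__)


def run_import(steps):
    """Run each step in order; stop at the first failure and report it.

    Returns True only if every step completed, so the caller can decide
    whether the output is safe to publish.
    """
    try:
        for step in steps:
            step()
    except Exception as e:
        logger.fatal(f"Unexpected error in the main process: {e}")
        return False
    return True
```

Because the loop sits inside the `try`, a failure in one year's download or processing prevents later steps from running on incomplete data.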

"cleaned_csv": "ejscreen_airpollutants.csv"
}
],
"cron_schedule": "0 07 * * 1"

No need to prefix the hour field with '0'. Please change it to "0 7 * * 1".

Author

done
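
For context, the five fields of a standard cron expression are minute, hour, day of month, month, and day of week, so both spellings describe 07:00 every Monday. The small helper below is a hypothetical illustration of the normalization the reviewer asked for; it only handles plain numeric fields and leaves `*`, ranges, and lists untouched.

```python
def normalize_cron(expr):
    """Strip leading zeros from purely numeric cron fields."""
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 cron fields, got {len(fields)}")
    return " ".join(str(int(f)) if f.isdigit() else f for f in fields)
```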



# Download the file and save it in the input folder
def download_file(url, year, zip_filename=None):
@krishnaswamypradeep krishnaswamypradeep Jan 23, 2025

Can you please use the retry module for this GET request?

Refer to this link: https://pypi.org/project/retry/
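
To illustrate the behavior being asked for, here is a standard-library stand-in for a retry decorator. It is not the linked package: the parameter names `tries`, `delay`, and `backoff` are chosen to resemble it, while the `sleep` parameter is an addition made here so the sketch can be exercised without real waiting. In the PR, the decorated function would be `download_file` or the request call inside it.

```python
import functools
import time


def retry(exceptions=Exception, tries=3, delay=1.0, backoff=2.0, sleep=time.sleep):
    """Retry the wrapped call up to `tries` times with exponential backoff."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            wait = delay
            for attempt in range(1, tries + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    # Re-raise once the last attempt has failed.
                    if attempt == tries:
                        raise
                    sleep(wait)
                    wait *= backoff
        return wrapper
    return decorator
```

Limiting `exceptions` to network-related errors is usually preferable to retrying on every exception, since a retry cannot fix a malformed URL or a missing file.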


except Exception as e:
logger.fatal(f"Unexpected error in the main process: {e}")
sys.exit(1)

sys.exit(1) is not required here; you can remove it.

"import_specifications": [
{
"import_name": "EPA_EJSCREEN",
"curator_emails": ["[email protected]"],

Please remove the email ID from curator_emails and keep it as "curator_emails": [].

@krishnaswamypradeep krishnaswamypradeep left a comment

Please address the comments provided.

@krishnaswamypradeep krishnaswamypradeep left a comment

Please add the config file to the PR as well.

4 participants