Resource file path from simulation #1410

Open
wants to merge 76 commits into base: master

Conversation

jkumwenda
Collaborator

Created resource file path function and calling it from different modules.

@jkumwenda jkumwenda linked an issue Jul 2, 2024 that may be closed by this pull request
5 tasks
src/tlo/simulation.py (outdated)
@@ -80,6 +81,7 @@ def __init__(self, *, start_date: Date, seed: int = None, log_config: dict = Non
data=f'Simulation RNG {seed_from} entropy = {self._seed_seq.entropy}'
)
self.rng = np.random.RandomState(np.random.MT19937(self._seed_seq))
self.resourcefilepath = resourcefilepath
Collaborator

Here, we could convert and store Path type and check that path exists.

Collaborator

I think this hasn't been done yet (the check that the path exists).
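
A minimal sketch of the conversion and existence check discussed in this thread. The helper name `normalise_resourcefilepath` is hypothetical and not part of this PR; how `Simulation.__init__` should surface the error is left open here.

```python
from pathlib import Path
from typing import Union


def normalise_resourcefilepath(resourcefilepath: Union[str, Path]) -> Path:
    """Hypothetical helper: coerce the argument to a Path and fail early if it is missing."""
    path = Path(resourcefilepath)
    # Check once, at construction time, rather than when a module first reads a file.
    if not path.is_dir():
        raise FileNotFoundError(f"Resource file path does not exist: {path}")
    return path
```

With something like this, `self.resourcefilepath = normalise_resourcefilepath(resourcefilepath)` would store a `Path` regardless of whether the caller passed a string.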

thewati and others added 20 commits July 30, 2024 15:50
…tate cancer, fixed test equipment and dxmanager
…t.py and breast_cancer.py method updated for resource file path from simulation.py
…t.py isort the import to fix incorrectly sorted error
Collaborator

@mnjowe mnjowe left a comment

Thank you, Wati and Joel, for the excellent work on this PR. Below are my comments, which may seem many but primarily revolve around the following key points:

  1. I suggest initialising resourcefilepath as a Path object in both the analysis scripts and the test files. This removes the repeated code that currently creates a Path object from resourcefilepath in each disease module.
  2. I suggest reverting the changes made to the simulation end_date and population sizes in some of the analysis scripts.
  3. I suggest removing the str option for the resourcefilepath argument in the Simulation object.
  4. I suggest making the resourcefilepath argument in the read_parameters section optional, to improve readability.
  5. I suggest removing the condition in utils that checks whether resourcefilepath is empty. There may be a more efficient way to handle this check.

For changing resourcefilepath from str to path, I couldn't provide a comment on every affected line. However, if you agree that it should be declared as a path object (rather than a string), you can apply this change consistently across all affected areas. Similarly, if you agree with making resourcefilepath in read_parameters section optional, you can apply this adjustment to all relevant sections.

Once again, thank you for the great work on this PR!

Comment on lines 40 to 41
end_date = Date(2011, 12, 31)
popsize = 5000
Collaborator

Did you make these changes just to make the script run faster? If so, can you please revert them now?

Collaborator

Yes. Will revert the changes

@@ -348,7 +348,7 @@ def plot_modal_gbd_deaths_by_age_group(self):
start_date = Date(2010, 1, 1)
end_date = Date(2030, 1, 1)

- resourcefilepath = Path("./resources") # Path to resource files
+ resourcefilepath = './resources' # Path to resource files
Collaborator

Any reason why you're changing from Path to string here?

Comment on lines 41 to 42
end_date = Date(2011, 7, 1)
pop_size = 1000
Collaborator

Same as above: if the intention was to make this run faster, please revert the changes.


# Path to the resource files used by the disease and intervention methods
- resources = "./resources"
+ resourcefilepath = "./resources"
Collaborator

Can we standardise the path here, i.e. make it resourcefilepath = Path("./resources") and read it in read_parameters as read_csv_files(resourcefilepath / resourcefile_folder_name)? I feel this would be better, as we would initialise the path once rather than in each module.
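
A rough sketch of the pattern suggested here, under stated assumptions: `ExampleModule` and the folder name `ResourceFile_Example` are hypothetical, and `sorted(folder.glob("*.csv"))` only stands in for the project's `read_csv_files(...)`, whose signature is not shown in this thread.

```python
from pathlib import Path
from typing import List


class ExampleModule:
    """Hypothetical module, not one of the real disease modules."""

    def read_parameters(self, resourcefilepath: Path) -> List[Path]:
        # The caller already passes a Path, so no Path(...) wrapping is needed here.
        folder = resourcefilepath / "ResourceFile_Example"  # hypothetical folder name
        # Stand-in for read_csv_files(resourcefilepath / resourcefile_folder_name).
        return sorted(folder.glob("*.csv"))


# In the analysis script: initialise the path once and pass it on.
resourcefilepath = Path("./resources")
```

The point of the design is that the string-to-`Path` conversion happens in one place (the script or test), so each module can use the `/` operator directly.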

@@ -25,7 +25,7 @@
# %%


- resourcefilepath = Path("./resources")
+ resourcefilepath = './resources'
Collaborator

Same here: why the change? I think it would be better to initialise the path here rather than in the module.

Collaborator

We overlooked this. Thanks for catching it

@@ -192,12 +191,12 @@ def __init__(self, name=None, resourcefilepath=None):
)
}

- def read_parameters(self, data_folder):
+ def read_parameters(self, resourcefilepath=None):
Collaborator

make it optional?

"""Setup parameters used by the module, now including disability weights"""

# Update parameters from the resourcefile
self.load_parameters_from_dataframe(
- pd.read_excel(Path(self.resourcefilepath) / "ResourceFile_Breast_Cancer.xlsx",
+ pd.read_excel(Path(resourcefilepath) / "ResourceFile_Breast_Cancer.xlsx",
Collaborator

It would be good to initialise resourcefilepath as a Path object and avoid creating one here.

@@ -256,7 +255,7 @@ def __init__(self, name=None, resourcefilepath=None, do_log_df: bool = False, do
self.lms_event_death = dict()
self.lms_event_symptoms = dict()

- def read_parameters(self, data_folder):
+ def read_parameters(self, resourcefilepath=None):
Collaborator

make it optional?

@@ -273,7 +272,7 @@ def read_parameters(self, data_folder):
ResourceFile_cmd_events_hsi.xlsx = HSI parameters for events

"""
- cmd_path = Path(self.resourcefilepath) / "cmd"
+ cmd_path = Path(resourcefilepath) / "cmd"
Collaborator

resourcefilepath could be passed in as a Path object already, which avoids creating one here.

def __init__(
self,
*,
start_date: Date,
seed: Optional[int] = None,
log_config: Optional[dict] = None,
show_progress_bar: bool = False,
- resourcefilepath: Optional[Path] = None,
+ resourcefilepath: Optional[str | Path] = None,
Collaborator

This should only take Path. I think the string option should be removed.

Collaborator

@tbhallett tbhallett Jan 21, 2025

Agreed on the question of whether a string is OK here. I also wasn't sure whether the whole argument should be Optional.

I see below that a string would work, as line 112 wraps this in Path().

I also see that the comment on line 93 explains that not giving a parameter is OK.

I don't know what that would mean, but there might be a reason.

Suggested change
- resourcefilepath: Optional[str | Path] = None,
+ resourcefilepath: Path,

@jkumwenda
Collaborator Author

Thanks for these comments, we will review and provide feedback line by line.

…th("./resources") in scripts files and updated methods read parameters to def read_parameters(self, resourcefilepath: Optional[Path] = None): helps with single initialisation across the methods
# Conflicts:
#	src/tlo/methods/alri.py
#	src/tlo/methods/depression.py
#	src/tlo/methods/diarrhoea.py
#	src/tlo/methods/epilepsy.py
#	src/tlo/methods/rti.py
…rent activ resource file in the HIV resource folder
Comment on lines +56 to +57
data_hiv_mphia_inc = xls["MPHIA_incidence2020"]
data_hiv_mphia_prev = xls["MPHIA_prevalence_art2020"]
Collaborator

@tdm32, can you confirm this change is necessary? I agree with Joel: we don't have MPHIA_incidence2015 and MPHIA_prevalence_art2015 in the HIV resourcefiles folder.

Collaborator

Yes, this script originally used the MPHIA 2015 estimates, but would now use the 2020 estimates. The worksheets were renamed, so MPHIA_incidence2015 and MPHIA_prevalence_art2015 no longer exist. Thank you.

Collaborator

Thanks Tara

…s(self, resourcefilepath: Optional[Path] = None):

        parameter_dataframe = read_csv_files(resourcefilepath /
…sourcefilepath is None:

        resourcefilepath = get_root_path() / 'resources' from utils.py.
…to jkumwenda/resource_file_path

# Conflicts:
#	src/tlo/methods/scenario_switcher.py
# Conflicts:
#	src/tlo/analysis/utils.py
#	src/tlo/methods/bladder_cancer.py
#	src/tlo/methods/breast_cancer.py
#	src/tlo/methods/enhanced_lifestyle.py
#	src/tlo/methods/oesophagealcancer.py
#	src/tlo/methods/other_adult_cancers.py
#	src/tlo/methods/prostate_cancer.py
#	src/tlo/methods/stunting.py
@jkumwenda
Collaborator Author

jkumwenda commented Jan 21, 2025

@tbhallett please review and merge in master if all is good. I have addressed all comments from @mnjowe

@@ -119,11 +118,11 @@ def __init__(self, name=None, resourcefilepath=None, mda_execute=True):
s.loc[(s.index >= low_limit) & (s.index <= high_limit)] = name
self.age_group_mapper = s.to_dict()

- def read_parameters(self, data_folder):
+ def read_parameters(self, resourcefilepath: Optional[Path] = None):
Collaborator

Here, and elsewhere: why do we say this is optional?

Collaborator

This is the correct type hint for an argument that's allowed to be None.

Collaborator

Re. the `resourcefilepath: Path` suggestion: we're always specifying a resource path for a Simulation (I don't think any of our modules work without one!), so I don't think it'd be very disruptive to remove the default value.
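
To make the two options in this thread concrete, here is a small illustration. Both classes are hypothetical stand-ins, not the real `Simulation`: one keeps `Optional[Path] = None`, the other makes the keyword-only argument required by dropping the default.

```python
from pathlib import Path
from typing import Optional


class SimulationWithDefault:
    """Hypothetical: the argument may be omitted, so the hint must allow None."""

    def __init__(self, *, resourcefilepath: Optional[Path] = None):
        self.resourcefilepath = resourcefilepath


class SimulationRequired:
    """Hypothetical: no default, so omitting the argument raises TypeError."""

    def __init__(self, *, resourcefilepath: Path):
        self.resourcefilepath = resourcefilepath
```

With the required form, a forgotten path fails at construction time, whereas the defaulted form leaves each module to cope with `None` later.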

@@ -52,7 +51,7 @@ def __init__(self, name=None, resourcefilepath=None):

PROPERTIES = {}

- def read_parameters(self, data_folder):
+ def read_parameters(self, *args):
Collaborator

This signature is different from that of all the other modules, but I think it should be the same.
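
One way to picture the consistency being asked for, as a sketch only: the `Module` base class and `NoParametersModule` below are hypothetical stand-ins, and `has_same_signature` is an illustrative check rather than anything in the codebase. A module that reads no resource files can still accept the shared argument and ignore it.

```python
import inspect
from pathlib import Path
from typing import Optional


class Module:
    """Hypothetical stand-in for the project's module base class."""

    def read_parameters(self, resourcefilepath: Optional[Path] = None) -> None:
        raise NotImplementedError


class NoParametersModule(Module):
    def read_parameters(self, resourcefilepath: Optional[Path] = None) -> None:
        # This module reads no resource files; the argument is accepted but unused.
        pass


def has_same_signature(cls: type) -> bool:
    """Compare a subclass's read_parameters signature with the base class's."""
    return inspect.signature(cls.read_parameters) == inspect.signature(Module.read_parameters)
```

A check along these lines could sit in a test, so that a `*args` variant is flagged rather than found during review.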

@tbhallett
Collaborator

Great work, @jkumwenda --- so glad this is working now.

My comments will mostly be for @tamuri, I think. I am not sure myself what the right thing to do would be, so have commented where I would instinctively have done something different.

Labels
None yet
Projects
Status: Ready to merge
Development

Successfully merging this pull request may close these issues.

Get resource file path from simulation
6 participants