Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue 107 refactor rbhtfinder #126

Open
wants to merge 18 commits into
base: main
Choose a base branch
from

Conversation

lpardey
Copy link
Collaborator

@lpardey lpardey commented Jul 25, 2024

This PR updates the rbhtfinder script to adhere to the coding standards and general design of the rdock_utils.

While all the automated tests I created have passed, I recommend a thorough review of the changes to some core operations. This review should focus on the following areas:

  • The 'filters' argument in the 'parser' module and how it's being parsed.
  • The following functions in the 'rbhtfinder' module: remove_redundant_combinations, prepare_array, and calculate_results_for_filter_combination.

This review will ensure the changes align with the project goals and maintain the expected functionality. Please let me know if there are any concerns or suggestions.

closes #107

lpardey added 17 commits July 16, 2024 21:58
add basic test integration (all passing)
generate a new input file with tabs as delimiter (original has spaces as delimiter)
add rbhtfinder_original as python module
add test integration
add rbhtfinder class
mypy fix
add rbhtfinder types to common module
add models module
add filter, minmax and minmaxvalues schemas
mypy fix
final refactor
from functools import partial

import numpy as np
import pandas as pd
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If pandas is not installed, this line will raise an ImportError exception.
We need to handle this and set a variable (something like PANDAS_IS_AVAILABLE) to True if the import is successful, or False otherwise, and use it below.

Anyways, I'd rather not add pandas as a dependency. We will explore the possible ways to do it together.

]
for column in columns
}
num_cpus = os.cpu_count() or 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the number of cpus should be part of the config structure, and be passed via command line argument.

let's say something like -c / --cpu-count with a default of 1.

Comment on lines 3 to 4
import numpy
import numpy as np
import numpy.typing as npt
Copy link
Contributor

@ggutierrez-sunbright ggutierrez-sunbright Jul 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know the standard/recommendation for numpy imports is to give it the np alias, but I'd rather have the full name. It's not like it saves a big amount of characters, and it is less "clear" / "pythonic".

This one is a purely stylistic choice, just the same way I'd name a variable velocity_x, instead of just vx, for anything other than a prototype.

(same thing with pandas, matplotlib.pyplot, etc)

add cpu_count argument to parser
update pandas and numpy imports
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

refactor rbhtfinder
2 participants