Provide complete example workflow #34
@sgoldenCS and I had a fruitful discussion about a possible data set that is simple enough to analyze but still highlights the functionality of the workflow / DS framework. We came up with an NP-inspired classification problem: identifying two species, each characterized by three variables. The abundance of the two species is asymmetric, i.e. species 1 is statistically dominant over species 0. A plot of the corresponding distributions is shown below. The data is 100% synthetic, so we do not need to worry about ownership rights. The classification problem is set up so that it fits nicely into the narrative of HUGS, but it has no direct ties to NP. The data is spread over 4 .csv files so that we can use @sgoldenCS's CSVParser right away. We might come up with a more challenging data set later, but for now we will stick to this one so that we can test and run the full workflow. The data and the corresponding data-generation script are (for now) available on the ifarm:
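For anyone who cannot reach the ifarm yet, here is a minimal stand-in sketch of a data set with the same shape: two species, three features each, asymmetric abundance, split across 4 .csv files. Everything concrete here is an assumption and not taken from the actual generation script: the Gaussian feature distributions, the means, the 80/20 abundance ratio, the column names (`x0`, `x1`, `x2`, `species`) and the file names are all hypothetical. The `read_files` helper is a plain stdlib reader for testing only, not @sgoldenCS's CSVParser.

```python
import csv
import os
import random
import tempfile

# Hypothetical column layout; the real files on the ifarm may differ.
FIELDS = ["x0", "x1", "x2", "species"]


def sample_event(rng, p_species1=0.8):
    """Draw one event. Species 1 is chosen with probability p_species1
    (assumed value), so it dominates species 0. Each species gets three
    Gaussian features with species-dependent means (assumed values)."""
    label = 1 if rng.random() < p_species1 else 0
    means = (1.0, -0.5, 2.0) if label == 1 else (-1.0, 0.5, 0.0)
    features = [rng.gauss(mu, 1.0) for mu in means]
    return features + [label]


def write_files(out_dir, n_events=4000, n_files=4, seed=42):
    """Split n_events synthetic events evenly over n_files CSV files."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    per_file = n_events // n_files
    paths = []
    for i in range(n_files):
        path = os.path.join(out_dir, f"synthetic_data_{i}.csv")  # hypothetical name
        with open(path, "w", newline="") as fh:
            writer = csv.writer(fh)
            writer.writerow(FIELDS)
            for _ in range(per_file):
                writer.writerow(sample_event(rng))
        paths.append(path)
    return paths


def read_files(paths):
    """Read all files back into one list of dicts (stand-in for a real parser)."""
    rows = []
    for path in paths:
        with open(path, newline="") as fh:
            for rec in csv.DictReader(fh):
                row = {k: float(rec[k]) for k in FIELDS[:3]}
                row["species"] = int(rec["species"])
                rows.append(row)
    return rows


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        paths = write_files(tmp)
        rows = read_files(paths)
        frac1 = sum(r["species"] for r in rows) / len(rows)
        print(f"{len(paths)} files, {len(rows)} events, species-1 fraction {frac1:.3f}")
```

Keeping the split over several files (rather than one large file) means a multi-file CSV parser is exercised from the start, which matches the reason given above for using 4 files.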
The model module still needs unit tests but is otherwise done. I will add an analysis module in the branch linked to this issue, since it is the final step toward completion. Before adding it, I pulled the changes from the model branch and from main, so the branch is fully up to date.
1.) We need a fully functional example workflow for the HUGS tutorial. The workflow needs to have:
2.) For each module there has to be a proof of:
3.) A good practice is to capture code development, updates, or any other work in the issues. For example: "Started to implement module XYZ. Faced a problem with so-and-so. Going to pause and run a quick literature search." This keeps everything transparent. Ideally, an issue tells the entire story of the work that has been done.
4.) The issues for each module should be linked to this one. If, for example, we decide to use the CSVToPandasParser, then we should link the corresponding issue here. The same goes for wiki pages.
5.) For the sake of efficiency and time management, we should follow KIS (Keep It Simple) in code development.