Revising open problem task structure #292
Labels: documentation (Improvements or additions to documentation), enhancement (New feature or request)
mumichae added the documentation and enhancement labels on Nov 29, 2023.
For reference, the Batch Integration task would look like this:
flowchart LR
file_common_dataset("Common Dataset")
comp_process_dataset[/"Data processor"/]
file_dataset("Dataset")
comp_control_method_embedding[/"Control method (embedding)"/]
comp_control_method_graaf[/"Control method (graph)"/]
comp_method_embedding[/"Method (embedding)"/]
comp_method_feature[/"Method (feature)"/]
comp_method_graaf[/"Method (graph)"/]
comp_metric_embedding[/"Metric (embedding)"/]
comp_metric_feature[/"Metric (feature)"/]
comp_metric_graaf[/"Metric (graph)"/]
file_integrated_embedding("Integrated embedding")
file_integrated_graaf("Integrated Graph")
file_integrated_feature("Integrated Feature")
file_score("Score")
comp_transformer_embedding_to_graaf[/"Embedding to Graph"/]
comp_transformer_feature_to_embedding[/"Feature to Embedding"/]
file_common_dataset---comp_process_dataset
comp_process_dataset-->file_dataset
file_dataset---comp_control_method_embedding
file_dataset---comp_control_method_graaf
file_dataset---comp_method_embedding
file_dataset---comp_method_feature
file_dataset---comp_method_graaf
file_dataset---comp_metric_embedding
file_dataset---comp_metric_feature
file_dataset---comp_metric_graaf
comp_control_method_embedding-->file_integrated_embedding
comp_control_method_graaf-->file_integrated_graaf
comp_method_embedding-->file_integrated_embedding
comp_method_feature-->file_integrated_feature
comp_method_graaf-->file_integrated_graaf
comp_metric_embedding-->file_score
comp_metric_feature-->file_score
comp_metric_graaf-->file_score
file_integrated_embedding---comp_metric_embedding
file_integrated_embedding---comp_transformer_embedding_to_graaf
file_integrated_graaf---comp_metric_graaf
file_integrated_feature---comp_metric_feature
file_integrated_feature---comp_transformer_feature_to_embedding
comp_transformer_embedding_to_graaf-->file_integrated_graaf
comp_transformer_feature_to_embedding-->file_integrated_embedding
Instead of what is currently listed in the README:
flowchart LR
file_common_dataset("Common Dataset")
comp_process_dataset[/"Data processor"/]
file_dataset("Dataset")
file_solution("Solution")
comp_control_method_embedding[/"Control method (embedding)"/]
comp_control_method_graaf[/"Control method (graph)"/]
comp_method_embedding[/"Method (embedding)"/]
comp_method_feature[/"Method (feature)"/]
comp_method_graaf[/"Method (graph)"/]
comp_metric_embedding[/"Metric (embedding)"/]
comp_metric_feature[/"Metric (feature)"/]
comp_metric_graaf[/"Metric (graph)"/]
file_integrated_embedding("Integrated embedding")
file_integrated_graaf("Integrated Graph")
file_integrated_feature("Integrated Feature")
file_score("Score")
comp_transformer_embedding_to_graaf[/"Embedding to Graph"/]
comp_transformer_feature_to_embedding[/"Feature to Embedding"/]
file_common_dataset---comp_process_dataset
comp_process_dataset-->file_dataset
comp_process_dataset-->file_solution
file_dataset---comp_control_method_embedding
file_dataset---comp_control_method_graaf
file_dataset---comp_method_embedding
file_dataset---comp_method_feature
file_dataset---comp_method_graaf
file_solution---comp_metric_embedding
file_solution---comp_metric_feature
file_solution---comp_metric_graaf
comp_control_method_embedding-->file_integrated_embedding
comp_control_method_graaf-->file_integrated_graaf
comp_method_embedding-->file_integrated_embedding
comp_method_feature-->file_integrated_feature
comp_method_graaf-->file_integrated_graaf
comp_metric_embedding-->file_score
comp_metric_feature-->file_score
comp_metric_graaf-->file_score
file_integrated_embedding---comp_metric_embedding
file_integrated_embedding---comp_transformer_embedding_to_graaf
file_integrated_graaf---comp_metric_graaf
file_integrated_feature---comp_metric_feature
file_integrated_feature---comp_transformer_feature_to_embedding
comp_transformer_embedding_to_graaf-->file_integrated_graaf
comp_transformer_feature_to_embedding-->file_integrated_embedding
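To make the comparison concrete, the proposed structure can be written down as a dependency graph and ordered with Python's standard `graphlib` module. This is the first diagram, where the metrics read the Dataset directly and there is no separate Solution file. The sketch below makes some assumptions: the dictionary is a hypothetical encoding and not something the framework provides, and the node names are the mermaid ids with their `file_`/`comp_` prefixes dropped.

```python
from graphlib import TopologicalSorter

# Hypothetical encoding of the proposed batch integration workflow (first diagram).
# Each node maps to the set of nodes it depends on; names are the mermaid ids
# without their "file_"/"comp_" prefixes.
WORKFLOW = {
    "process_dataset": {"common_dataset"},
    "dataset": {"process_dataset"},
    "control_method_embedding": {"dataset"},
    "control_method_graaf": {"dataset"},
    "method_embedding": {"dataset"},
    "method_feature": {"dataset"},
    "method_graaf": {"dataset"},
    "integrated_feature": {"method_feature"},
    "transformer_feature_to_embedding": {"integrated_feature"},
    "integrated_embedding": {
        "control_method_embedding",
        "method_embedding",
        "transformer_feature_to_embedding",
    },
    "transformer_embedding_to_graaf": {"integrated_embedding"},
    "integrated_graaf": {
        "control_method_graaf",
        "method_graaf",
        "transformer_embedding_to_graaf",
    },
    "metric_embedding": {"dataset", "integrated_embedding"},
    "metric_feature": {"dataset", "integrated_feature"},
    "metric_graaf": {"dataset", "integrated_graaf"},
    "score": {"metric_embedding", "metric_feature", "metric_graaf"},
}


def execution_order(workflow):
    """Return one valid run order; raises graphlib.CycleError if the workflow has a cycle."""
    return list(TopologicalSorter(workflow).static_order())


def metric_inputs(workflow):
    """Return each metric's direct inputs, showing that none of them is a separate solution file."""
    return {node: sorted(deps) for node, deps in workflow.items() if node.startswith("metric_")}


if __name__ == "__main__":
    print(execution_order(WORKFLOW))
    print(metric_inputs(WORKFLOW))
```

Because `static_order()` raises `graphlib.CycleError` on cycles, the same check would also flag a workflow in which a transformer fed back into its own input type.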
In this specific case it doesn't look much simpler, even though it is. As a side note, it would be nice if the subtasks were grouped like this:
flowchart LR
file_common_dataset("Common Dataset")
comp_process_dataset[/"Data processor"/]
file_dataset("Dataset")
subgraph feature[Feature]
comp_method_feature[/"Method (feature)"/]
comp_metric_feature[/"Metric (feature)"/]
file_integrated_feature("Integrated Feature")
end
comp_transformer_feature_to_embedding[/"Feature to Embedding"/]
subgraph embedding[Embedding]
comp_control_method_embedding[/"Control method (embedding)"/]
comp_method_embedding[/"Method (embedding)"/]
comp_metric_embedding[/"Metric (embedding)"/]
file_integrated_embedding("Integrated embedding")
end
comp_transformer_embedding_to_graaf[/"Embedding to Graph"/]
subgraph graph[Graph]
comp_control_method_graaf[/"Control method (graph)"/]
comp_method_graaf[/"Method (graph)"/]
comp_metric_graaf[/"Metric (graph)"/]
file_integrated_graaf("Integrated Graph")
end
file_score("Score")
file_common_dataset---comp_process_dataset
comp_process_dataset-->file_dataset
file_dataset---comp_control_method_embedding
file_dataset---comp_control_method_graaf
file_dataset---comp_method_embedding
file_dataset---comp_method_feature
file_dataset---comp_method_graaf
file_dataset---comp_metric_embedding
file_dataset---comp_metric_feature
file_dataset---comp_metric_graaf
comp_control_method_embedding-->file_integrated_embedding
comp_control_method_graaf-->file_integrated_graaf
comp_method_embedding-->file_integrated_embedding
comp_method_feature-->file_integrated_feature
comp_method_graaf-->file_integrated_graaf
comp_metric_embedding-->file_score
comp_metric_feature-->file_score
comp_metric_graaf-->file_score
file_integrated_embedding---comp_metric_embedding
file_integrated_embedding---comp_transformer_embedding_to_graaf
file_integrated_graaf---comp_metric_graaf
file_integrated_feature---comp_metric_feature
file_integrated_feature---comp_transformer_feature_to_embedding
comp_transformer_embedding_to_graaf-->file_integrated_graaf
comp_transformer_feature_to_embedding-->file_integrated_embedding
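The grouping above could be generated rather than drawn by hand. The sketch below relies on two hypothetical assumptions:

- Every node definition starts with its id, followed by the `[...]` or `(...)` shape syntax used in these diagrams.
- The subtask is encoded in the id suffix (`_feature`, `_embedding`, `_graaf`).

Transformers (ids containing `_to_`) stay outside the subgraphs, as in the diagram, and the subgraph ids and labels mirror the ones used above.

```python
import re

# Id suffix -> (subgraph id, label), mirroring the grouped diagram above.
GROUPS = {
    "feature": ("feature", "Feature"),
    "embedding": ("embedding", "Embedding"),
    "graaf": ("graph", "Graph"),
}


def node_id(definition):
    """Return the mermaid node id, i.e. the leading run of word characters."""
    return re.match(r"\w+", definition).group(0)


def group_nodes(definitions):
    """Split node definitions into per-subtask groups and ungrouped nodes."""
    grouped = {suffix: [] for suffix in GROUPS}
    ungrouped = []
    for definition in definitions:
        ident = node_id(definition)
        suffix = ident.rsplit("_", 1)[-1]
        # Transformers convert between output types, so they stay outside every subgraph.
        if suffix in grouped and "_to_" not in ident:
            grouped[suffix].append(definition)
        else:
            ungrouped.append(definition)
    return grouped, ungrouped


def to_subgraph_lines(grouped):
    """Emit mermaid `subgraph ... end` blocks for every non-empty group."""
    lines = []
    for suffix, members in grouped.items():
        if members:
            sub_id, label = GROUPS[suffix]
            lines.append(f"subgraph {sub_id}[{label}]")
            lines.extend(members)
            lines.append("end")
    return lines
```

The ungrouped definitions (Dataset, Score, transformers) would be emitted outside the subgraphs, and the edge lines stay unchanged.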
Is your feature request related to a problem? Please describe.
After looking over the task structure again, I feel it might be too restrictive for certain types of tasks. For supervised problems, a train/test (or dataset/solution) split works well as an abstraction of the problem. For unsupervised tasks, however, it may not fit as nicely. Good examples are the batch integration and spatial decomposition tasks, where a metric (batch integration) or a method (spatial decomposition) may take two different inputs that do not quite fit the train/test paradigm.
Describe the solution you'd like
A more flexible paradigm that makes it easier for users to conceptualize their workflow.
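One way to read "more flexible" is to describe each component only by the named files it reads and writes, instead of by a fixed dataset/solution pair. The sketch below is hypothetical: the `Component` class and the file names are illustrative and not part of the existing task templates. It shows that the same wiring check accepts two different layouts:

- a supervised layout, where only the metric reads the solution;
- a batch integration layout, where the metric reads the dataset plus the integrated output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """Hypothetical task component, described only by the files it reads and writes."""
    name: str
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]


def unresolved_inputs(components, initial_files):
    """Map each component to inputs that neither the initial files nor any component provides.

    This only checks that every input is produced somewhere, not the execution order.
    """
    available = set(initial_files)
    for component in components:
        available.update(component.outputs)
    missing = {}
    for component in components:
        absent = [name for name in component.inputs if name not in available]
        if absent:
            missing[component.name] = absent
    return missing


# Supervised layout: the data processor splits off a solution that only the metric reads.
SUPERVISED = [
    Component("process_dataset", ("common_dataset",), ("dataset", "solution")),
    Component("method", ("dataset",), ("prediction",)),
    Component("metric", ("solution", "prediction"), ("score",)),
]

# Batch integration layout: no solution file; the metric reads the dataset and the method output.
BATCH_INTEGRATION = [
    Component("process_dataset", ("common_dataset",), ("dataset",)),
    Component("method_embedding", ("dataset",), ("integrated_embedding",)),
    Component("metric_embedding", ("dataset", "integrated_embedding"), ("score",)),
]
```

Both layouts pass the same check with `["common_dataset"]` as the initial files, so neither needs a special case for "solution".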
Describe alternatives you've considered
A solution could be something closer to this:
Alternatively, we could show multiple task workflows depending on the type of task at hand (supervised, unsupervised, multiple inputs etc.) to allow for different task setups.
Additional context
The spatial decomposition task requires two AnnData inputs (one of which we could consider the solution) not just for the metrics but also for the method. The reference matrix (aka "solution") is used by both the methods and the metrics, which does not quite fit the paradigm that the solution should not be seen by the method.
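The conflict can be stated as a simple rule check. If the reference matrix is marked as the solution, any component other than a metric that reads it breaks the "method must not see the solution" rule. This is a hypothetical sketch: the component and file names (`spatial`, `reference`, `method_output`) are illustrative and not taken from the task's actual file specs.

```python
# Hypothetical inputs for the spatial decomposition components (illustrative names only).
SPATIAL_DECOMPOSITION = {
    "method": {"spatial", "reference"},
    "metric": {"reference", "method_output"},
}


def solution_leaks(components, solution_files, allowed=("metric",)):
    """Return the sorted names of non-allowed components that read any solution file."""
    solution = set(solution_files)
    return sorted(
        name
        for name, inputs in components.items()
        if name not in allowed and inputs & solution
    )
```

With `{"reference"}` as the solution files, `method` is reported. Treating the reference as an ordinary second input (an empty solution set) reports nothing, which is why separating the two concepts would resolve the conflict.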
Additionally, the original task has multiple versions of the reference matrix, which could arguably be considered a separate dataset, part of the data processing, or part of the method. An example workflow would look like this:
Or separating the solution and the second input:
For batch integration a more intuitive structure would be: