Revising open problem task structure #292

Open
mumichae opened this issue Nov 29, 2023 · 1 comment
Labels: documentation, enhancement

Comments

mumichae (Collaborator) commented Nov 29, 2023

Is your feature request related to a problem? Please describe.
After looking over the task structure again, I feel it may be too restrictive, fitting only certain types of task. For supervised problems, a train/test (or dataset/solution) split works well as an abstraction of the problem. For unsupervised tasks it may not work out as nicely. Good examples are the batch integration and spatial decomposition tasks, where a metric (for batch integration) or a method (for spatial decomposition) may take 2 different inputs that do not quite fit the train/test paradigm.

graph LR
  common_dataset[Common<br/>dataset]:::anndata
  subgraph task_specific[Task-specific workflow]
    dataset_processor[/Dataset<br/>processor/]:::component
    solution[Solution]:::anndata
    masked_data[Dataset]:::anndata
    method[/Method/]:::component
    control_method[/Control<br/>method/]:::component
    output[Output]:::anndata
    metric[/Metric/]:::component
    score[Score]:::anndata
  end
  common_dataset --- dataset_processor --> masked_data & solution
  masked_data --- method --> output
  masked_data & solution --- control_method --> output
  solution & output --- metric --> score

Describe the solution you'd like
A more flexible paradigm that makes it easier for users to conceptualize their workflow.

Describe alternatives you've considered
A solution could be something closer to this:

graph LR
  common_dataset[Common<br/>dataset]:::anndata
  subgraph task_specific[Task-specific workflow]
    dataset_processor[/Dataset<br/>processor/]:::component
    solution[Solution optional]:::anndata
    masked_data[Task input 1, ..., N]:::anndata
    method[/Method/]:::component
    control_method[/Control<br/>method/]:::component
    output[Output]:::anndata
    metric[/Metric/]:::component
    score[Score]:::anndata
  end
  common_dataset --- dataset_processor --> masked_data & solution
  masked_data --- method --> output
  masked_data & solution --- control_method --> output
  solution & output --- metric --> score

Alternatively, we could show multiple task workflows depending on the type of task at hand (supervised, unsupervised, multiple inputs etc.) to allow for different task setups.
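
To make the proposal a bit more concrete, here is a minimal Python sketch of how a task with N inputs and an optional solution could be described. `TaskSpec`, its fields, and the file labels are hypothetical names chosen for illustration; none of them exist in the current codebase.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TaskSpec:
    """Hypothetical description of a task's file layout (illustration only)."""

    name: str
    # Task input 1, ..., N written by the dataset processor and read by methods.
    inputs: List[str]
    # Optional: unsupervised tasks may not have a solution file at all.
    solution: Optional[str] = None
    # Files a metric reads in addition to the method output.
    metric_inputs: List[str] = field(default_factory=list)

    def processor_outputs(self) -> List[str]:
        """Files the dataset processor has to write for this task."""
        outputs = list(self.inputs)
        if self.solution is not None:
            outputs.append(self.solution)
        return outputs


# A supervised-style task: one dataset plus a hidden solution.
supervised = TaskSpec(
    "supervised_example",
    inputs=["dataset"],
    solution="solution",
    metric_inputs=["solution"],
)

# Batch integration: no solution, the metric reads the dataset itself.
batch_integration = TaskSpec(
    "batch_integration",
    inputs=["dataset"],
    metric_inputs=["dataset"],
)

print(supervised.processor_outputs())         # ['dataset', 'solution']
print(batch_integration.processor_outputs())  # ['dataset']
```

With a description like this, the train/test layout becomes one special case (one input plus a solution) rather than the only supported shape.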

Additional context
For the spatial decomposition task we require 2 anndata inputs (one of which we can consider to be the solution) for the methods and the metrics. The reference matrix (aka "solution") is used by both the methods and the metrics, which does not quite fit the paradigm that the solution should not be seen by the method.
Additionally, in the original task there are multiple versions of the reference matrix, which could arguably be considered a separate dataset, part of the data processing, or part of the method. An example workflow would look like this:

graph LR
  common_dataset[Common<br/>dataset]:::anndata
  subgraph task_specific[Task-specific workflow]
    dataset_processor[/Dataset<br/>processor/]:::component
    task_input_1[Reference matrix]:::anndata
    task_input_2[Spatial matrix]:::anndata
    method[/Method/]:::component
    control_method[/Control<br/>method/]:::component
    output[Output]:::anndata
    metric[/Metric/]:::component
    score[Score]:::anndata
  end
  common_dataset --- dataset_processor --> task_input_1 & task_input_2
  task_input_1 & task_input_2 --- method --> output
  task_input_1 & task_input_2 --- control_method --> output
  task_input_1 & output --- metric --> score

Or separating the solution and the second input:

graph LR
  common_dataset[Common<br/>dataset]:::anndata
  subgraph task_specific[Task-specific workflow]
    dataset_processor[/Dataset<br/>processor/]:::component
    task_input_1[Reference matrix]:::anndata
    task_input_2[Spatial matrix]:::anndata
    solution[Solution]:::anndata
    method[/Method/]:::component
    control_method[/Control<br/>method/]:::component
    output[Output]:::anndata
    metric[/Metric/]:::component
    score[Score]:::anndata
  end
  common_dataset --- dataset_processor --> task_input_1 & task_input_2 & solution
  task_input_1 & task_input_2 --- method --> output
  task_input_1 & task_input_2 & solution --- control_method --> output
  solution & output --- metric --> score
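
The difference between the two spatial decomposition variants can be stated as a simple rule: in the first, the metric reads a file that the method also reads, while in the second it does not. A minimal sketch of such a check follows. The function name `shared_method_metric_inputs` and the file labels are hypothetical and only mirror the diagrams above.

```python
from typing import Set


def shared_method_metric_inputs(method_inputs: Set[str], metric_inputs: Set[str]) -> Set[str]:
    """Return the files read by both the method and the metric.

    The method's own output is excluded, since the metric always reads it.
    """
    return (method_inputs & metric_inputs) - {"output"}


# Variant 1: the reference matrix doubles as the solution.
variant_1 = shared_method_metric_inputs(
    method_inputs={"reference_matrix", "spatial_matrix"},
    metric_inputs={"reference_matrix", "output"},
)

# Variant 2: a separate solution file is read only by the control method and the metric.
variant_2 = shared_method_metric_inputs(
    method_inputs={"reference_matrix", "spatial_matrix"},
    metric_inputs={"solution", "output"},
)

print(sorted(variant_1))  # ['reference_matrix']
print(sorted(variant_2))  # []
```

An empty result corresponds to the classic "method never sees the solution" setup. A non-empty result is not necessarily wrong, but it marks the tasks where the current paradigm does not apply cleanly.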

For batch integration a more intuitive structure would be:

graph LR
  common_dataset[Common<br/>dataset]:::anndata
  subgraph task_specific[Task-specific workflow]
    dataset_processor[/Dataset<br/>processor/]:::component
    task_input_1[Dataset]:::anndata
    method[/Method/]:::component
    control_method[/Control<br/>method/]:::component
    output[Output]:::anndata
    metric[/Metric/]:::component
    score[Score]:::anndata
  end
  common_dataset --- dataset_processor --> task_input_1
  task_input_1 --- method --> output
  task_input_1 --- control_method --> output
  task_input_1 & output --- metric --> score
mumichae added the documentation and enhancement labels Nov 29, 2023
rcannood (Member) commented Nov 29, 2023

For reference, the Batch Integration task would look like this:

flowchart LR
  file_common_dataset("Common Dataset")
  comp_process_dataset[/"Data processor"/]
  file_dataset("Dataset")
  comp_control_method_embedding[/"Control method (embedding)"/]
  comp_control_method_graaf[/"Control method (graph)"/]
  comp_method_embedding[/"Method (embedding)"/]
  comp_method_feature[/"Method (feature)"/]
  comp_method_graaf[/"Method (graph)"/]
  comp_metric_embedding[/"Metric (embedding)"/]
  comp_metric_feature[/"Metric (feature)"/]
  comp_metric_graaf[/"Metric (graph)"/]
  file_integrated_embedding("Integrated embedding")
  file_integrated_graaf("Integrated Graph")
  file_integrated_feature("Integrated Feature")
  file_score("Score")
  comp_transformer_embedding_to_graaf[/"Embedding to Graph"/]
  comp_transformer_feature_to_embedding[/"Feature to Embedding"/]
  file_common_dataset---comp_process_dataset
  comp_process_dataset-->file_dataset
  file_dataset---comp_control_method_embedding
  file_dataset---comp_control_method_graaf
  file_dataset---comp_method_embedding
  file_dataset---comp_method_feature
  file_dataset---comp_method_graaf
  file_dataset---comp_metric_embedding
  file_dataset---comp_metric_feature
  file_dataset---comp_metric_graaf
  comp_control_method_embedding-->file_integrated_embedding
  comp_control_method_graaf-->file_integrated_graaf
  comp_method_embedding-->file_integrated_embedding
  comp_method_feature-->file_integrated_feature
  comp_method_graaf-->file_integrated_graaf
  comp_metric_embedding-->file_score
  comp_metric_feature-->file_score
  comp_metric_graaf-->file_score
  file_integrated_embedding---comp_metric_embedding
  file_integrated_embedding---comp_transformer_embedding_to_graaf
  file_integrated_graaf---comp_metric_graaf
  file_integrated_feature---comp_metric_feature
  file_integrated_feature---comp_transformer_feature_to_embedding
  comp_transformer_embedding_to_graaf-->file_integrated_graaf
  comp_transformer_feature_to_embedding-->file_integrated_embedding

Instead of what is currently listed in the readme:

flowchart LR
  file_common_dataset("Common Dataset")
  comp_process_dataset[/"Data processor"/]
  file_dataset("Dataset")
  file_solution("Solution")
  comp_control_method_embedding[/"Control method (embedding)"/]
  comp_control_method_graaf[/"Control method (graph)"/]
  comp_method_embedding[/"Method (embedding)"/]
  comp_method_feature[/"Method (feature)"/]
  comp_method_graaf[/"Method (graph)"/]
  comp_metric_embedding[/"Metric (embedding)"/]
  comp_metric_feature[/"Metric (feature)"/]
  comp_metric_graaf[/"Metric (graph)"/]
  file_integrated_embedding("Integrated embedding")
  file_integrated_graaf("Integrated Graph")
  file_integrated_feature("Integrated Feature")
  file_score("Score")
  comp_transformer_embedding_to_graaf[/"Embedding to Graph"/]
  comp_transformer_feature_to_embedding[/"Feature to Embedding"/]
  file_common_dataset---comp_process_dataset
  comp_process_dataset-->file_dataset
  comp_process_dataset-->file_solution
  file_dataset---comp_control_method_embedding
  file_dataset---comp_control_method_graaf
  file_dataset---comp_method_embedding
  file_dataset---comp_method_feature
  file_dataset---comp_method_graaf
  file_solution---comp_metric_embedding
  file_solution---comp_metric_feature
  file_solution---comp_metric_graaf
  comp_control_method_embedding-->file_integrated_embedding
  comp_control_method_graaf-->file_integrated_graaf
  comp_method_embedding-->file_integrated_embedding
  comp_method_feature-->file_integrated_feature
  comp_method_graaf-->file_integrated_graaf
  comp_metric_embedding-->file_score
  comp_metric_feature-->file_score
  comp_metric_graaf-->file_score
  file_integrated_embedding---comp_metric_embedding
  file_integrated_embedding---comp_transformer_embedding_to_graaf
  file_integrated_graaf---comp_metric_graaf
  file_integrated_feature---comp_metric_feature
  file_integrated_feature---comp_transformer_feature_to_embedding
  comp_transformer_embedding_to_graaf-->file_integrated_graaf
  comp_transformer_feature_to_embedding-->file_integrated_embedding

In this specific case, the diagram doesn't look much simpler, even though the structure is.
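
One way to see the difference is to compare the edge lists of the two diagrams. The sketch below is a rough illustration, not a full Mermaid parser: the `edges` helper and its regular expression are assumptions that only handle the simple `a---b` and `a-->b` lines used here, and the two snippets contain just the lines that differ plus one shared edge.

```python
import re
from typing import Set, Tuple

# Matches simple Mermaid edges such as "a---b" or "a-->b" (illustration only).
EDGE = re.compile(r"^\s*(\w+)\s*(?:---|-->)\s*(\w+)\s*$")


def edges(mermaid: str) -> Set[Tuple[str, str]]:
    """Collect (source, target) pairs from simple Mermaid edge lines."""
    found = set()
    for line in mermaid.splitlines():
        match = EDGE.match(line)
        if match:
            found.add((match.group(1), match.group(2)))
    return found


proposed = """
  comp_process_dataset-->file_dataset
  file_dataset---comp_metric_embedding
  file_dataset---comp_metric_feature
  file_dataset---comp_metric_graaf
"""

readme = """
  comp_process_dataset-->file_dataset
  comp_process_dataset-->file_solution
  file_solution---comp_metric_embedding
  file_solution---comp_metric_feature
  file_solution---comp_metric_graaf
"""

only_in_readme = edges(readme) - edges(proposed)
only_in_proposed = edges(proposed) - edges(readme)
print(len(only_in_readme), len(only_in_proposed))  # 4 3
```

The readme version needs one extra file (`file_solution`) and one extra edge. The proposed version reuses `file_dataset` for the metrics instead.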


As a side note, it would be nice if the subtasks were grouped like this:

flowchart LR
  file_common_dataset("Common Dataset")
  comp_process_dataset[/"Data processor"/]
  file_dataset("Dataset")
  subgraph feature[Feature]
    comp_method_feature[/"Method (feature)"/]
    comp_metric_feature[/"Metric (feature)"/]
    file_integrated_feature("Integrated Feature")
  end
  comp_transformer_feature_to_embedding[/"Feature to Embedding"/]
  subgraph embedding[Embedding]
    comp_control_method_embedding[/"Control method (embedding)"/]
    comp_method_embedding[/"Method (embedding)"/]
    comp_metric_embedding[/"Metric (embedding)"/]
    file_integrated_embedding("Integrated embedding")
  end
  comp_transformer_embedding_to_graaf[/"Embedding to Graph"/]
  subgraph graaf[Graph]
    comp_control_method_graaf[/"Control method (graph)"/]
    comp_method_graaf[/"Method (graph)"/]
    comp_metric_graaf[/"Metric (graph)"/]
    file_integrated_graaf("Integrated Graph")
  end
  file_score("Score")
  file_common_dataset---comp_process_dataset
  comp_process_dataset-->file_dataset
  file_dataset---comp_control_method_embedding
  file_dataset---comp_control_method_graaf
  file_dataset---comp_method_embedding
  file_dataset---comp_method_feature
  file_dataset---comp_method_graaf
  file_dataset---comp_metric_embedding
  file_dataset---comp_metric_feature
  file_dataset---comp_metric_graaf
  comp_control_method_embedding-->file_integrated_embedding
  comp_control_method_graaf-->file_integrated_graaf
  comp_method_embedding-->file_integrated_embedding
  comp_method_feature-->file_integrated_feature
  comp_method_graaf-->file_integrated_graaf
  comp_metric_embedding-->file_score
  comp_metric_feature-->file_score
  comp_metric_graaf-->file_score
  file_integrated_embedding---comp_metric_embedding
  file_integrated_embedding---comp_transformer_embedding_to_graaf
  file_integrated_graaf---comp_metric_graaf
  file_integrated_feature---comp_metric_feature
  file_integrated_feature---comp_transformer_feature_to_embedding
  comp_transformer_embedding_to_graaf-->file_integrated_graaf
  comp_transformer_feature_to_embedding-->file_integrated_embedding