Define ontology #249

@rcannood

Something like this?
@slobentanzer @scottgigante-immunai Regardless of how we decide to resolve this issue, I'm sure there are already many items we can define.

Originally posted by @rcannood in #247 (comment)

For instance:

Common dataset workflow

graph LR
  classDef component fill:#decbe4,stroke:#333,color:#000
  classDef anndata fill:#d9d9d9,stroke:#333,color:#000
  normalization:::group
  dataset_processors:::group
  raw_dataset["Raw dataset"]:::anndata
  common_dataset[Common<br/>dataset]:::anndata
  dataset_loader[/Dataset<br/>loader/]:::component
  subgraph normalization [Normalization methods]
    log_cpm[/"Log CPM"/]:::component
    l1_sqrt[/"L1 sqrt"/]:::component
    log_scran_pooling[/"Log scran<br/>pooling"/]:::component
    sqrt_cpm[/Sqrt CPM/]:::component
  end
  subgraph dataset_processors[Dataset processors]
    pca[/PCA/]:::component
    hvg[/HVG/]:::component
    knn[/KNN/]:::component
  end
  dataset_loader --> raw_dataset --> log_cpm & l1_sqrt & log_scran_pooling & sqrt_cpm --> pca --> hvg --> knn --> common_dataset
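To make the fan-out in the diagram concrete, here is a minimal pure-Python sketch of two of the normalization components. It assumes that "Log CPM" means counts-per-million followed by log1p and that "Sqrt CPM" means counts-per-million followed by a square root. The function names, the list-of-lists matrix, and `build_common_datasets` are hypothetical and not taken from the OpenProblems code base. The dataset processors (PCA, HVG, KNN), L1 sqrt, and log scran pooling are left out.

```python
import math


def cpm(counts):
    """Scale each cell (row) so that its counts sum to one million."""
    scaled = []
    for row in counts:
        total = sum(row)
        # Cells with no counts stay all-zero instead of dividing by zero.
        scale = 1e6 / total if total > 0 else 0.0
        scaled.append([value * scale for value in row])
    return scaled


def log_cpm(counts):
    """Assumed definition: log1p of counts-per-million."""
    return [[math.log1p(value) for value in row] for row in cpm(counts)]


def sqrt_cpm(counts):
    """Assumed definition: square root of counts-per-million."""
    return [[math.sqrt(value) for value in row] for row in cpm(counts)]


# Hypothetical registry: one entry per normalization component in the diagram.
NORMALIZATIONS = {"log_cpm": log_cpm, "sqrt_cpm": sqrt_cpm}


def build_common_datasets(raw_counts):
    """Fan one raw dataset out into one normalized matrix per method."""
    return {name: fn(raw_counts) for name, fn in NORMALIZATIONS.items()}


raw = [[1, 0, 3], [0, 0, 0], [2, 2, 0]]  # toy cells x genes count matrix
layers = build_common_datasets(raw)
```

Each entry in `layers` would then pass through the same chain of dataset processors before being stored as a common dataset.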

Task-specific benchmarking workflow

graph LR
  classDef component fill:#decbe4,stroke:#333,color:#000
  classDef anndata fill:#d9d9d9,stroke:#333,color:#000
  common_dataset[Common<br/>dataset]:::anndata
  dataset_processor[/Dataset<br/>processor/]:::component
  solution[Ground-truth]:::anndata
  masked_data[Input data]:::anndata
  method[/Method/]:::component
  control_method[/Control<br/>method/]:::component
  output[Prediction]:::anndata
  metric[/Metric/]:::component
  score[Score]:::anndata
  common_dataset --> dataset_processor --> masked_data
  dataset_processor --> solution
  masked_data --> method --> output
  masked_data & solution --> control_method --> output
  solution & output --> metric --> score
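The data flow in this diagram could be sketched in plain Python as follows. This is only an illustration under assumptions: the record layout, the constant-label method, the perfect control, and the accuracy metric are all hypothetical stand-ins, and real components would exchange AnnData files rather than lists.

```python
def dataset_processor(common_dataset):
    """Split a common dataset into method input and held-out ground truth."""
    masked_data = [{"features": rec["features"]} for rec in common_dataset]
    solution = [rec["label"] for rec in common_dataset]
    return masked_data, solution


def constant_method(masked_data):
    """Toy method: sees only the masked input and always predicts "a"."""
    return ["a"] * len(masked_data)


def perfect_control(masked_data, solution):
    """Control method: may read the ground truth, so it bounds the score."""
    return list(solution)


def accuracy(solution, prediction):
    """Metric: fraction of predictions that match the ground truth."""
    matches = sum(s == p for s, p in zip(solution, prediction))
    return matches / len(solution)


common_dataset = [
    {"features": [0.1, 0.2], "label": "a"},
    {"features": [0.9, 0.8], "label": "b"},
    {"features": [0.2, 0.1], "label": "a"},
    {"features": [0.7, 0.9], "label": "b"},
]

masked_data, solution = dataset_processor(common_dataset)
method_score = accuracy(solution, constant_method(masked_data))
control_score = accuracy(solution, perfect_control(masked_data, solution))
```

The point of the split is that a method never receives `solution`, while a control method does, which is why its score can serve as a reference bound for the metric.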

Discussion

However, this workflow might not be applicable to all tasks.

  • Multimodal datasets will have to be processed differently from regular unimodal datasets.
  • Some tasks don't really have a ground truth and instead rely on internal scores. IMO these "benchmarks" should not be a part of OpenProblems, since they don't really count as benchmarks.
