Code repository for the article in Nature Neuroscience by Elizabeth Beam, Christopher Potts, Russell Poldrack, & Amit Etkin
Functional neuroimaging has been a mainstay of human neuroscience for the past 25 years. Interpretation of fMRI data has often occurred within knowledge frameworks crafted by experts, which have the potential to amplify biases that limit the replicability of findings. Here, we employ a computational approach to derive a data-driven framework for neurobiological domains that synthesizes the texts and data of nearly 20,000 human neuroimaging articles. Across multiple levels of domain specificity, the structure-function links within domains better replicate in held-out articles than those mapped from dominant frameworks in neuroscience and psychiatry. We further show that the data-driven framework partitions the literature into modular subfields, for which domains serve as generalizable prototypes of structure-function patterns in single articles. The approach to computational ontology we present here is the most comprehensive characterization of human brain circuits quantifiable with fMRI and may be extended to synthesize other scientific literatures.
Approach to computational ontology. A data-driven framework was generated in an integrative manner in a training set of 12,708 human neuroimaging articles with brain coordinate data. First, 118 brain structures were clustered by k-means according to their co-occurrences with 1,683 terms for mental functions, with the co-occurrence matrix weighted by pointwise mutual information (PMI). Second, the top 25 terms for mental functions were assigned to each circuit based on the point-biserial correlation (r_pb) of their binarized occurrences with the centroid of occurrences across the structures in the circuit. Third, the number of terms per domain was selected to maximize the average ROC-AUC of logistic regression classifiers predicting structure occurrences from term occurrences (forward inference) and term occurrences from structure occurrences (reverse inference), over term list lengths ranging from 5 to 25. Fourth, the number of domains was selected based on the average ROC-AUC of forward and reverse inference classifiers. Occurrences were summed across the terms in each list and the structures in each circuit, then thresholded by their mean across articles. In the fifth and final step, each domain was named by the mental function term with the highest degree centrality of co-occurrences with other terms in the domain.
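The clustering, term assignment, and naming steps above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the code in ontology/ontology.py: articles are represented as binary article-by-term and article-by-structure matrices; the `kmeans` function is a plain Lloyd's implementation with farthest-first initialization standing in for a library call; the classifier-based selection of list lengths and domain counts (steps three and four) is omitted; and degree centrality is approximated by weighted degree (summed co-occurrence counts with the other terms) so that ties are unlikely. All function names and the toy corpus are hypothetical.

```python
import numpy as np


def pmi(cooc):
    """PMI of a structure-by-term co-occurrence count matrix; zero counts map to 0."""
    total = cooc.sum()
    p_xy = cooc / total
    p_x = cooc.sum(axis=1, keepdims=True) / total
    p_y = cooc.sum(axis=0, keepdims=True) / total
    with np.errstate(divide="ignore", invalid="ignore"):
        out = np.log(p_xy / (p_x * p_y))
    return np.where(cooc > 0, out, 0.0)


def kmeans(X, k, n_iter=100):
    """Lloyd's k-means with farthest-first initialization (stand-in for a library call)."""
    centers = [X[0]]
    for _ in range(1, k):
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[dist.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Keep the old center if a cluster empties out
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def point_biserial(binary, continuous):
    """r_pb of each 0/1 column with a continuous vector (Pearson r on 0/1 data)."""
    b = binary - binary.mean(axis=0)
    c = continuous - continuous.mean()
    denom = np.sqrt((b ** 2).sum(axis=0) * (c ** 2).sum())
    with np.errstate(divide="ignore", invalid="ignore"):
        r = (b * c[:, None]).sum(axis=0) / denom
    return np.nan_to_num(r, nan=0.0)  # constant columns get r = 0


def build_framework(terms, structures, term_names, k, n_terms):
    """terms: articles x terms (0/1); structures: articles x structures (0/1)."""
    # Step 1: cluster structures by PMI-weighted co-occurrences with terms
    labels = kmeans(pmi(structures.T @ terms), k)
    domains = []
    for j in range(k):
        circuit = np.flatnonzero(labels == j)
        # Step 2: rank terms by r_pb with the circuit's centroid of structure occurrences
        centroid = structures[:, circuit].mean(axis=1)
        top = np.argsort(-point_biserial(terms, centroid), kind="stable")[:n_terms]
        # Step 5: name the domain by the term with the highest weighted degree
        term_cooc = terms[:, top].T @ terms[:, top]
        np.fill_diagonal(term_cooc, 0)
        name = term_names[top[term_cooc.sum(axis=1).argmax()]]
        domains.append({"name": name, "circuit": circuit.tolist(),
                        "terms": [term_names[i] for i in top]})
    return domains


def domain_features(occurrences, idx):
    """Sum occurrences over a term list or circuit, then threshold at the mean across articles."""
    score = occurrences[:, idx].sum(axis=1)
    return (score > score.mean()).astype(int)


# Hypothetical toy corpus: 8 articles, 4 structures, 6 terms
term_names = ["memory", "encoding", "retrieval", "reward", "value", "choice"]
terms = np.array([[1, 1, 1, 0, 0, 0], [1, 0, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0],
                  [1, 0, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1], [0, 0, 0, 1, 0, 0],
                  [0, 0, 0, 1, 1, 0], [0, 0, 0, 1, 0, 1]])
structures = np.array([[1, 1, 0, 0], [1, 0, 0, 0], [1, 1, 0, 0], [0, 1, 0, 0],
                       [0, 0, 1, 1], [0, 0, 1, 0], [0, 0, 1, 1], [0, 0, 0, 1]])
domains = build_framework(terms, structures, term_names, k=2, n_terms=3)
for d in domains:
    print(d["name"], d["circuit"], d["terms"])
```

In this sketch, `domain_features` produces the thresholded article-level features that classifiers could consume; fitting forward and reverse inference classifiers on them for each candidate list length and number of domains would supply the ROC-AUC values used in steps three and four.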
Approach to mapping expert-determined frameworks for brain function (RDoC) and mental illness (DSM). Seed terms from the RDoC and DSM frameworks were translated into the language of the human neuroimaging literature through a computational linguistics approach. Term embeddings with 100 dimensions were trained using GloVe: for RDoC, on a general human neuroimaging corpus of 29,828 articles (Supplementary Fig. 1b), and for the DSM, on a psychiatric human neuroimaging corpus of 26,070 articles (Supplementary Fig. 1c). Candidate synonyms included terms for mental functions in the case of RDoC, and terms for both mental functions and psychopathology in the case of the DSM, as detailed in Supplementary Table 2. In the first step, the closest synonyms of seed terms were identified based on the cosine similarity of each candidate's embedding with the centroid of seed term embeddings in each domain. Second, the number of terms for each domain was selected to maximize cosine similarity with the centroid of seed terms. Third, the mental function term lists for each domain were mapped onto brain circuits based on the positive pointwise mutual information (PPMI) of term and structure co-occurrences across the corpus of 18,155 articles with activation coordinate data (Supplementary Fig. 1a). A structure was included in a circuit if the false discovery rate (FDR) of its observed PPMI was less than 0.01, as determined by comparison with a null distribution generated by shuffling term list features over 10,000 iterations.
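The three mapping steps above can be illustrated with a small NumPy sketch. It assumes that pre-trained embeddings (for example, 100-dimensional GloVe vectors) are already loaded into a dict mapping terms to arrays; that the list length in step two is chosen by comparing the centroid of each candidate list with the seed centroid over a caller-supplied range (`n_min`, `n_max`); that each domain's term list has already been reduced to a 0/1 feature per article; and that the null shuffles those features jointly across articles. It uses Benjamini-Hochberg as the FDR procedure. These choices and all function names are assumptions for illustration, not the code in ontology/ontology.py.

```python
import numpy as np


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def expand_seeds(embeddings, seeds, candidates, n_min, n_max):
    """Rank candidates by cosine similarity to the seed centroid, then pick the list
    length whose centroid is most similar to the seed centroid."""
    seed_centroid = np.mean([embeddings[s] for s in seeds], axis=0)
    sims = np.array([cosine(embeddings[t], seed_centroid) for t in candidates])
    ranked = [candidates[i] for i in np.argsort(-sims, kind="stable")]

    def list_similarity(n):
        return cosine(np.mean([embeddings[t] for t in ranked[:n]], axis=0), seed_centroid)

    best_n = max(range(n_min, n_max + 1), key=list_similarity)
    return ranked[:best_n]


def ppmi(cooc):
    """Positive PMI of a domain-by-structure co-occurrence count matrix."""
    total = cooc.sum()
    p_xy = cooc / total
    p_x = cooc.sum(axis=1, keepdims=True) / total
    p_y = cooc.sum(axis=0, keepdims=True) / total
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p_xy / (p_x * p_y))
        return np.where(cooc > 0, np.maximum(pmi, 0.0), 0.0)


def benjamini_hochberg(p):
    """Benjamini-Hochberg adjusted p-values (q-values)."""
    n = len(p)
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    q = np.minimum.accumulate(scaled[::-1])[::-1]
    out = np.empty(n)
    out[order] = np.minimum(q, 1.0)
    return out


def map_circuits(domain_feats, structures, n_perm=10_000, alpha=0.01, seed=0):
    """domain_feats: articles x domains (0/1); structures: articles x structures (0/1).
    Returns, per domain, the structures whose PPMI survives FDR < alpha."""
    rng = np.random.default_rng(seed)
    observed = ppmi(domain_feats.T @ structures)
    exceed = np.zeros_like(observed)
    for _ in range(n_perm):
        # Null: shuffle the term list features across articles
        shuffled = domain_feats[rng.permutation(len(domain_feats))]
        exceed += ppmi(shuffled.T @ structures) >= observed
    p = (exceed + 1) / (n_perm + 1)
    return [np.flatnonzero(benjamini_hochberg(p[d]) < alpha).tolist()
            for d in range(observed.shape[0])]


# Hypothetical 2-d "embeddings" standing in for 100-d GloVe vectors
embeddings = {"fear": np.array([1.0, 0.0]), "threat": np.array([0.0, 1.0]),
              "anxiety": np.array([1.0, 0.6]), "panic": np.array([0.5, 1.0]),
              "reward": np.array([-1.0, 0.2])}
terms = expand_seeds(embeddings, ["fear", "threat"], list(embeddings), n_min=1, n_max=5)

# Hypothetical corpus: 40 articles, 2 domains, 4 structures
domain_feats = np.zeros((40, 2), dtype=int)
domain_feats[:20, 0] = 1
domain_feats[20:, 1] = 1
structures = np.zeros((40, 4), dtype=int)
structures[:20, :2] = 1
structures[20:, 2:] = 1
circuits = map_circuits(domain_feats, structures, n_perm=1_000)
print(terms, circuits)
```

The toy run uses 1,000 permutations to stay fast; the default of 10,000 mirrors the text. Because p-values from a permutation test cannot fall below 1/(n_perm + 1), the number of permutations bounds how small an FDR can be reached.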
Figure | Files |
---|---|
1b | ontology/ontol_data-driven_lr.ipynb, ontology/ontology.py |
1c | partition/part_splits.ipynb, partition/partition.py |
1d | modularity/mod_kvals_lr.ipynb |
1e | prototype/proto_kvals_lr.ipynb |
2a | ontology/ontol_data-driven_lr.ipynb |
2b | prediction/comp_frameworks_lr_k*.ipynb, modularity/comp_frameworks_lr_k*.ipynb, prototype/comp_frameworks_lr_k*.ipynb |
2c | hierarchy/hier_data-driven_lr_k6-8-22.ipynb |
3b | ontology/ontol_rdoc.ipynb, ontology/ontology.py |
4a | ontology/ontol_rdoc.ipynb, ontol_sim_lr.ipynb, ontology/ontology.py |
4b | ontology/ontol_data-driven_lr.ipynb, ontol_sim_lr.ipynb, ontology/ontology.py |
4c | ontology/ontol_dsm.ipynb, ontol_sim_lr.ipynb, ontology/ontology.py |
5b, e | prediction/pred_data-driven_lr.ipynb, prediction/logistic_regression/prediction.py, prediction/evaluation.py |
5c, f | prediction/pred_rdoc.ipynb, prediction/logistic_regression/prediction.py, prediction/evaluation.py |
5d, g | prediction/pred_dsm.ipynb, prediction/logistic_regression/prediction.py, prediction/evaluation.py |
5h | prediction/comp_frameworks_lr.ipynb |
6a-f | mds/mds.ipynb, mds/mds.py |
6g | modularity/mod_data-driven_lr.ipynb, modularity/modularity.py |
6h | modularity/mod_rdoc.ipynb, modularity/modularity.py |
6i | modularity/mod_dsm.ipynb, modularity/modularity.py |
6j | modularity/comp_frameworks_lr.ipynb, modularity/modularity.py |
6k | prototype/proto_data-driven_lr.ipynb, prototype/prototype.py |
6l | prototype/proto_rdoc.ipynb, prototype/prototype.py |
6m | prototype/proto_dsm.ipynb, prototype/prototype.py |
6n | prototype/comp_frameworks_lr.ipynb, prototype/prototype.py |
Figure | Files |
---|---|
1 | corpus/cohorts.ipynb |
2-3 | ontology/ontol_kvals_lr.ipynb, ontology/ontology.py |
4a-b | ontology/ontol_data-driven_nn.ipynb, ontology/ontology.py |
4c | mds/mds.ipynb, mds/mds.py |
4d | modularity/mod_data-driven_nn.ipynb, modularity/modularity.py |
4e | prototype/proto_data-driven_nn.ipynb, prototype/prototype.py |
5a | ontology/ontol_data-driven_terms.ipynb, ontology/ontol_sim_terms.ipynb, ontology/ontology.py |
5b-e | ontology/ontol_sim_terms.ipynb |
6a, d | prediction/comp_frameworks_lr_k09.ipynb |
6b-c, e-f | prediction/pred_data-driven_lr_k09.ipynb |
6g-h | partition/part_data-driven_lr_k09.ipynb, mds/mds.ipynb |
6i Left | modularity/comp_frameworks_lr_k09.ipynb |
6i Right | modularity/mod_data-driven_lr_k09.ipynb |
6j Left | prototype/comp_frameworks_lr_k09.ipynb |
6j Right | prototype/proto_data-driven_lr_k09.ipynb |
7b, e | prediction/pred_data-driven_lr.ipynb, prediction/logistic_regression/prediction.py, prediction/evaluation.py |
7c, f | prediction/pred_rdoc.ipynb, prediction/logistic_regression/prediction.py, prediction/evaluation.py |
7d, g | prediction/pred_dsm.ipynb, prediction/logistic_regression/prediction.py, prediction/evaluation.py |
7h-j | prediction/comp_frameworks_lr.ipynb |
8b, e; 9b, e | prediction/pred_data-driven_nn.ipynb, prediction/neural_network/sherlock/neural_network.py, prediction/evaluation.py |
8c, f; 9c, f | prediction/pred_rdoc.ipynb, prediction/neural_network/sherlock/neural_network.py, prediction/evaluation.py |
8d, g; 9d, g | prediction/pred_dsm.ipynb, prediction/neural_network/sherlock/neural_network.py, prediction/evaluation.py |
8h; 9h-j | prediction/comp_frameworks_nn.ipynb |
10a | partition/part_data-driven_lr.ipynb, partition/partition.py |
10b | partition/part_rdoc.ipynb, partition/partition.py |
10c | partition/part_dsm.ipynb, partition/partition.py |
10d-f | tsne/tsne.ipynb |
Figure | Files |
---|---|
1 | validation/val_brainmap_top.ipynb |
2 | validation/val_brainmap_sims.ipynb |
3-4 | ontology/ontol_kvals_nn.ipynb, ontology/ontology.py |
5 | stability/stab_data-driven_lr_top.ipynb |
6a, d; 7a, d | prediction/pred_data-driven_lr.ipynb, prediction/logistic_regression/prediction.py, prediction/evaluation.py |
6b, e; 7b, e | prediction/pred_rdoc.ipynb, prediction/logistic_regression/prediction.py, prediction/evaluation.py |
6c, f; 7c, f | prediction/pred_dsm.ipynb, prediction/logistic_regression/prediction.py, prediction/evaluation.py |
6g; 7g-i | prediction/comp_frameworks_lr.ipynb |
8a, d; 9a, d | prediction/pred_data-driven_nn.ipynb, prediction/neural_network/sherlock/neural_network.py, prediction/evaluation.py |
8b, e; 9b, e | prediction/pred_rdoc.ipynb, prediction/neural_network/sherlock/neural_network.py, prediction/evaluation.py |
8c, f; 9c, f | prediction/pred_dsm.ipynb, prediction/neural_network/sherlock/neural_network.py, prediction/evaluation.py |
8g; 9g-i | prediction/comp_frameworks_nn.ipynb |
Table | Files |
---|---|
1 | data/data_table_coord.ipynb |
2 | lexicon/preproc_cogneuro.py, lexicon/preproc_psychiatry.py, lexicon/preproc_rdoc.py, lexicon/preproc_dsm.py |
3 | data/text/pubmed/gen_190428/query.txt, data/text/pubmed/psy_190428/query.txt |
4-5 | prediction/table_lr-nn.ipynb |