Columbia Open Health Data
Source Information
InfoRes ID: infores:cohd
Description: COHD provides access to counts and patient prevalence (i.e., prevalence derived from electronic health records) of conditions, procedures, drug exposures, and patient demographics, and to the co-occurrence frequencies between them. Count and frequency data were derived from the Columbia University Irving Medical Center's OHDSI database, which includes inpatient and outpatient data. A count is the number of patients with the concept, e.g., diagnosed with a condition, exposed to a drug, or who had a procedure. A frequency is the number of patients with the concept divided by the total number of patients in the dataset. Clinical concepts (e.g., conditions, procedures, drugs) are coded by their standard concept ID in the OMOP Common Data Model. To protect patient privacy, all concepts and pairs of concepts with a count ≤ 10 were excluded, and the remaining counts were randomized using the Poisson distribution.
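The count, frequency, and privacy steps described above can be illustrated with a minimal sketch. This is not COHD's actual implementation: the concept IDs, raw counts, and patient total are hypothetical, and both the order of operations (suppress first, then perturb) and the choice to compute frequency from the perturbed count are assumptions made for illustration.

```python
import random

MIN_COUNT = 10  # concepts with count <= 10 are excluded, per the description


def poisson_sample(lam, rng):
    """Draw from Poisson(lam) by counting unit-rate arrivals in [0, lam]."""
    k = 0
    t = rng.expovariate(1.0)
    while t <= lam:
        k += 1
        t += rng.expovariate(1.0)
    return k


def summarize(raw_counts, total_patients, rng):
    """Return {concept_id: (perturbed_count, frequency)} for released concepts."""
    released = {}
    for concept_id, count in raw_counts.items():
        if count <= MIN_COUNT:
            continue  # suppressed to protect patient privacy
        perturbed = poisson_sample(count, rng)
        # Assumption: frequency is derived from the perturbed count.
        released[concept_id] = (perturbed, perturbed / total_patients)
    return released


# Hypothetical OMOP concept IDs and raw counts, for illustration only.
raw = {1001: 250, 1002: 10, 1003: 11, 1004: 3}
result = summarize(raw, total_patients=5000, rng=random.Random(0))
for concept_id, (count, freq) in result.items():
    print(concept_id, count, f"{freq:.4f}")
```

Only concepts 1001 and 1003 survive the threshold; the sampler runs in time proportional to the count, which is acceptable for a sketch but not for production-scale data.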
Citations:
- https://www.nature.com/articles/sdata2018273
- https://github.com/NCATSTranslator/NCATSTranslator.github.io/raw/master/presentations/Translator_2020_Kick-Off_Presentation-Clinical_Data_Services.pdf
Data Access Locations: https://stars.renci.org/var/data_services/cohd_2/cohd_nodes.jsonl
Data Provision Mechanisms: file_download
Data Formats: jsonl
Data Versioning and Releases: 2024-11-25
Additional Notes: None
Ingest Information
Ingest Categories: aggregation_interpreter
Utility: Real clinical data from patients diagnosed with a condition, exposed to a drug, or who had a procedure.
Scope: Clinical correlations of patients to treatments.
Relevant Files
| File Name | Location | Description |
|---|---|---|
| cohd_nodes.jsonl | https://stars.renci.org/var/data_services/cohd_2 | KGX jsonl file of KGX 'nodes' data from Phase 2 COHD clinical knowledge provider activities. |
| cohd_edges.jsonl | https://stars.renci.org/var/data_services/cohd_2 | KGX jsonl file of KGX 'edges' data from Phase 2 COHD clinical knowledge provider activities. |
Included Content
| File Name | Included Records | Fields Used |
|---|---|---|
| cohd_nodes.jsonl | all | id, name, category |
| cohd_edges.jsonl | all | id, subject, predicate, object, score, sources |
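As a rough illustration of the field selection in the table above, the following sketch reads JSON Lines records and keeps only the listed fields. It is not the ingest pipeline's code: `read_jsonl` is a hypothetical helper, and the sample records (identifiers, names, predicate, score) are invented placeholders rather than real COHD content.

```python
import io
import json

# Fields used, as listed in the Included Content table.
NODE_FIELDS = ("id", "name", "category")
EDGE_FIELDS = ("id", "subject", "predicate", "object", "score", "sources")


def read_jsonl(stream, fields):
    """Yield one dict per non-blank line, restricted to `fields`."""
    for line in stream:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        # Missing fields become None; unlisted fields (e.g. 'attributes') are dropped.
        yield {name: record.get(name) for name in fields}


# Hypothetical sample records standing in for the downloaded files.
sample_nodes = io.StringIO(
    '{"id": "EX:1", "name": "example disease", "category": ["biolink:Disease"], "attributes": []}\n'
    '\n'
    '{"id": "EX:2", "name": "example drug", "category": ["biolink:Drug"]}\n'
)
sample_edges = io.StringIO(
    '{"id": "e1", "subject": "EX:2", "predicate": "biolink:related_to", '
    '"object": "EX:1", "score": 0.5, "sources": [], "attributes": []}\n'
)

nodes = list(read_jsonl(sample_nodes, NODE_FIELDS))
edges = list(read_jsonl(sample_edges, EDGE_FIELDS))
print(nodes)
print(edges)
```

To process the real files, the same function could be given an open file handle for a local copy of `cohd_nodes.jsonl` or `cohd_edges.jsonl` in place of the in-memory samples.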
Future Content Considerations
node_property_content: The contents of the COHD 'attributes' field are not yet processed by the ingestion pipeline. For example, these attributes cross-correlate concepts with related resources.
- Relevant files: cohd_nodes.jsonl
edge_property_content: The contents of the COHD 'attributes' field are not yet processed by the ingestion pipeline. For example, these attributes describe supporting studies, thus providing valuable evidence annotation which should be added to the COHD knowledge graph.
- Relevant files: cohd_edges.jsonl
Additional Notes: None
Target Information
Target InfoRes ID: infores:cohd
Edge Types
| Subject Categories | Predicate | Object Categories | Knowledge Level | Agent Type | UI Explanation |
|---|---|---|---|---|---|
| | | | statistical_association | not_provided | |
| | | | statistical_association | not_provided | |
| | | | statistical_association | not_provided | |
| | | | statistical_association | not_provided | |
| | | | statistical_association | not_provided | |
| | | | statistical_association | not_provided | |
| | | | statistical_association | not_provided | |
| | | | statistical_association | not_provided | |
| | | | statistical_association | not_provided | |
| | | | statistical_association | not_provided | |
| | | | statistical_association | not_provided | |
| | | | statistical_association | not_provided | |
| | | | statistical_association | not_provided | |
| | | | statistical_association | not_provided | |
| | | | statistical_association | not_provided | |
| | | | statistical_association | not_provided | |
Node Types
| Node Category | Source Identifier Types | Additional Notes |
|---|---|---|
| biolink:Disease | DOID, UMLS | |
| biolink:SmallMolecule | UNII, CHEBI | |
| biolink:Drug | RXCUI | |
| biolink:MolecularMixture | CHEBI | |
| biolink:ChemicalEntity | MESH, UNII, CHEBI | |
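Because several identifier prefixes appear under more than one category in the table above, it can help to invert the table into a prefix lookup. The sketch below does that; the function names and the example CURIEs are hypothetical, and it only reports which categories are possible for a prefix, not which one a given node actually carries.

```python
from collections import defaultdict

# Node Types table, transcribed as category -> source identifier prefixes.
NODE_TYPES = {
    "biolink:Disease": ["DOID", "UMLS"],
    "biolink:SmallMolecule": ["UNII", "CHEBI"],
    "biolink:Drug": ["RXCUI"],
    "biolink:MolecularMixture": ["CHEBI"],
    "biolink:ChemicalEntity": ["MESH", "UNII", "CHEBI"],
}


def categories_by_prefix(node_types):
    """Invert category -> prefixes into prefix -> set of categories."""
    index = defaultdict(set)
    for category, prefixes in node_types.items():
        for prefix in prefixes:
            index[prefix].add(category)
    return index


def candidate_categories(curie, index):
    """Return the sorted categories whose prefix list includes the CURIE's prefix."""
    prefix, sep, _ = curie.partition(":")
    if not sep:
        return []  # not a CURIE, so no prefix to look up
    return sorted(index.get(prefix, ()))


INDEX = categories_by_prefix(NODE_TYPES)

# Placeholder local identifiers, for illustration only.
print(candidate_categories("CHEBI:0000", INDEX))
print(candidate_categories("RXCUI:0000", INDEX))
```

A CHEBI identifier maps to three candidate categories here, which shows why the `category` field in `cohd_nodes.jsonl`, rather than the prefix alone, is needed to type a node.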
Additional Notes: None
Provenance Information
Contributors:
- Richard Bruskiewich: data modeling, code author
- Kara Fecho: domain expertise
- Matthew Brush: data modeling, domain expertise
Artifacts: None