Columbia Open Health Data

Source Information

InfoRes ID: infores:cohd

Description: The COHD provides access to counts and patient prevalence (i.e., prevalence from electronic health records) of conditions, procedures, drug exposures, and patient demographics, and the co-occurrence frequencies between them. Count and frequency data were derived from the Columbia University Irving Medical Center's OHDSI database including inpatient and outpatient data. Counts are the number of patients with the concept, e.g., diagnosed with a condition, exposed to a drug, or who had a procedure. Frequencies are the number of patients with the concept divided by the total number of patients in the dataset. Clinical concepts (e.g., conditions, procedures, drugs) are coded by their standard concept ID in the OMOP Common Data Model. To protect patient privacy, all concepts and pairs of concepts where the count ≤ 10 were excluded, and counts were randomized by the Poisson distribution.
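The frequency definition and the privacy steps described above can be sketched as follows. This is a minimal illustration and not COHD's actual pipeline: the function names, the order of operations (exclude small counts first, then perturb), the choice to compute frequency from the perturbed count, and the sampler itself are all assumptions.

```python
import math
import random


def poisson_sample(lam, rng):
    """Draw one Poisson(lam) variate (illustrative sampler, not COHD's)."""
    if lam > 500:
        # exp(-lam) gets vanishingly small for large lam, so fall back to
        # the normal approximation Poisson(lam) ~ Normal(lam, sqrt(lam)).
        return max(0, round(rng.gauss(lam, math.sqrt(lam))))
    # Knuth's multiplication method for small lam.
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def deidentify_counts(raw_counts, total_patients, min_count=10, seed=None):
    """Return {concept_id: (perturbed_count, frequency)}.

    Concepts whose raw count is <= min_count are dropped. Each remaining
    count is replaced by a Poisson draw whose mean is the raw count, and
    the frequency is that perturbed count divided by total_patients
    (an assumption of this sketch).
    """
    rng = random.Random(seed)
    released = {}
    for concept_id, count in raw_counts.items():
        if count <= min_count:
            continue  # excluded to protect patient privacy
        noisy = poisson_sample(count, rng)
        released[concept_id] = (noisy, noisy / total_patients)
    return released
```

The same pattern would apply to pairs of concepts by keying `raw_counts` on a pair of concept IDs instead of a single ID.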

Citations:
- https://www.nature.com/articles/sdata2018273
- https://github.com/NCATSTranslator/NCATSTranslator.github.io/raw/master/presentations/Translator_2020_Kick-Off_Presentation-Clinical_Data_Services.pdf

Data Access Locations: https://stars.renci.org/var/data_services/cohd_2/cohd_nodes.jsonl

Data Provision Mechanisms: file_download

Data Formats: jsonl

Data Versioning and Releases: 2024-11-25

Additional Notes: None

Ingest Information

Ingest Categories: aggregation_interpreter

Utility: Provides real clinical data on patients who were diagnosed with a condition, exposed to a drug, or underwent a procedure.

Scope: Clinical correlations of patients to treatments.

Relevant Files

File Name | Location | Description
cohd_nodes.jsonl | https://stars.renci.org/var/data_services/cohd_2 | KGX jsonl file of KGX 'nodes' data from Phase 2 COHD clinical knowledge provider activities.
cohd_edges.jsonl | https://stars.renci.org/var/data_services/cohd_2 | KGX jsonl file of KGX 'edges' data from Phase 2 COHD clinical knowledge provider activities.

Included Content

File Name | Included Records | Fields Used
cohd_nodes.jsonl | all | id, name, category
cohd_edges.jsonl | all | id, subject, predicate, object, score, sources
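As a rough illustration of the field selection listed above, the sketch below parses JSONL text line by line and keeps only the listed fields. The function name `read_kgx_jsonl` and the two constants are hypothetical and are not taken from the actual ingest code; the sketch also assumes each line holds one JSON object.

```python
import json

# Fields retained from each file, as listed in the Included Content table.
NODE_FIELDS = ("id", "name", "category")
EDGE_FIELDS = ("id", "subject", "predicate", "object", "score", "sources")


def read_kgx_jsonl(lines, fields):
    """Yield one dict per non-empty JSONL line, keeping only `fields`.

    `lines` is any iterable of strings, such as an open text file.
    Fields missing from a record are skipped rather than filled in.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        yield {key: record[key] for key in fields if key in record}
```

Passing an open handle to a locally downloaded copy of cohd_nodes.jsonl as `lines`, together with `NODE_FIELDS`, would stream the nodes without loading the whole file into memory. The 'attributes' field is dropped by this selection, which matches the current state of the ingest.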

Future Content Considerations

node_property_content: The contents of the COHD 'attributes' field are not yet processed by the ingestion pipeline. For example, these attributes cross-correlate concepts with related resources. Relevant files: cohd_nodes.jsonl

edge_property_content: The contents of the COHD 'attributes' field are not yet processed by the ingestion pipeline. For example, these attributes describe supporting studies, thus providing valuable evidence annotation which should be added to the COHD knowledge graph. Relevant files: cohd_edges.jsonl

Additional Notes: None

Target Information

Target InfoRes ID: infores:cohd

Edge Types

Subject Categories | Predicate | Object Categories | Knowledge Level | Agent Type | UI Explanation
 | | | statistical_association | not_provided |

The same row appears 16 times; the Subject Categories, Predicate, Object Categories, and UI Explanation cells are empty in each.
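Because every edge type above shares the same knowledge level and agent type, an ingest could stamp both values onto each edge record. The following is a minimal sketch, assuming edges are plain dicts keyed by the Biolink property names `knowledge_level` and `agent_type`; the helper name `annotate_edge` is hypothetical.

```python
# Values shared by all COHD edge types listed in this section.
KNOWLEDGE_LEVEL = "statistical_association"
AGENT_TYPE = "not_provided"


def annotate_edge(edge):
    """Return a copy of `edge` with the shared qualifiers set."""
    annotated = dict(edge)  # leave the caller's record untouched
    annotated["knowledge_level"] = KNOWLEDGE_LEVEL
    annotated["agent_type"] = AGENT_TYPE
    return annotated
```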

Node Types

Node Category | Source Identifier Types | Additional Notes
biolink:Disease | DOID, UMLS |
biolink:SmallMolecule | UNII, CHEBI |
biolink:Drug | RXCUI |
biolink:MolecularMixture | CHEBI |
biolink:ChemicalEntity | MESH, UNII, CHEBI |
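The category-to-identifier mapping above lends itself to a simple sanity check on incoming nodes. The sketch below is illustrative only: the names `ALLOWED_PREFIXES` and `has_expected_prefix` are hypothetical, and it assumes identifiers are CURIEs of the form `PREFIX:local_id`.

```python
# Identifier prefixes expected for each node category, per the table above.
ALLOWED_PREFIXES = {
    "biolink:Disease": {"DOID", "UMLS"},
    "biolink:SmallMolecule": {"UNII", "CHEBI"},
    "biolink:Drug": {"RXCUI"},
    "biolink:MolecularMixture": {"CHEBI"},
    "biolink:ChemicalEntity": {"MESH", "UNII", "CHEBI"},
}


def has_expected_prefix(curie, category):
    """Return True if `curie` uses a prefix listed for `category`."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or not local_id:
        return False  # not a well-formed CURIE
    return prefix in ALLOWED_PREFIXES.get(category, set())
```

A check like this could flag nodes whose identifier type does not match their declared category before they reach the knowledge graph.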

Additional Notes: None

Provenance Information

Contributors:
- Richard Bruskiewich: data modeling, code author
- Kara Fecho: domain expertise
- Matthew Brush: data modeling, domain expertise

Artifacts: None