Columbia Open Health Data

Source Information

InfoRes ID: infores:cohd

Description: The COHD provides access to counts and patient prevalence (i.e., prevalence from electronic health records) of conditions, procedures, drug exposures, and patient demographics, and the co-occurrence frequencies between them. Count and frequency data were derived from the Columbia University Irving Medical Center's OHDSI database including inpatient and outpatient data. Counts are the number of patients with the concept, e.g., diagnosed with a condition, exposed to a drug, or who had a procedure. Frequencies are the number of patients with the concept divided by the total number of patients in the dataset. Clinical concepts (e.g., conditions, procedures, drugs) are coded by their standard concept ID in the OMOP Common Data Model. To protect patient privacy, all concepts and pairs of concepts where the count ≤ 10 were excluded, and counts were randomized by the Poisson distribution.
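The relationship between counts, frequencies, and the small-count exclusion described above can be sketched as follows. This is a minimal illustration, not COHD's actual code: the function names, the sample concept IDs and counts, and the total of 1,000 patients are all hypothetical, and the Poisson randomization step is deliberately left out.

```python
# Hypothetical illustration of the count/frequency definitions above.
MIN_COUNT = 10  # concepts with count <= 10 are excluded, per the description


def prevalence(count: int, total_patients: int) -> float:
    """Frequency = patients with the concept / all patients in the dataset."""
    if total_patients <= 0:
        raise ValueError("total_patients must be positive")
    return count / total_patients


def exclude_small_counts(counts: dict[int, int]) -> dict[int, int]:
    """Drop every concept whose patient count is at or below the threshold."""
    return {cid: n for cid, n in counts.items() if n > MIN_COUNT}


# Made-up OMOP-style concept IDs and counts, for demonstration only.
sample_counts = {1001: 250, 1002: 10, 1003: 11}
kept = exclude_small_counts(sample_counts)
frequencies = {cid: prevalence(n, 1000) for cid, n in kept.items()}
```

Note that the threshold is inclusive: a concept with exactly 10 patients is dropped, while one with 11 is kept.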

Citations:
- https://www.nature.com/articles/sdata2018273
- https://github.com/NCATSTranslator/NCATSTranslator.github.io/raw/master/presentations/Translator_2020_Kick-Off_Presentation-Clinical_Data_Services.pdf

Data Access Locations: https://stars.renci.org/var/data_services/cohd_2/cohd_nodes.jsonl

Data Provision Mechanisms: file_download

Data Formats: jsonl

Data Versioning and Releases: 2024-11-25

Additional Notes: None

Ingest Information

Ingest Categories: aggregation_interpreter

Utility: Real clinical data from patients diagnosed with a condition, exposed to a drug, or who had a procedure.

Scope: Clinical correlations of patients to treatments.

Relevant Files

File Name	Location	Description
cohd_nodes.jsonl	https://stars.renci.org/var/data_services/cohd_2	KGX jsonl file of KGX 'nodes' data from Phase 2 COHD clinical knowledge provider activities.
cohd_edges.jsonl	https://stars.renci.org/var/data_services/cohd_2	KGX jsonl file of KGX 'edges' data from Phase 2 COHD clinical knowledge provider activities.

Included Content

File Name	Included Records	Fields Used
cohd_nodes.jsonl	all	id, name, category
cohd_edges.jsonl	all	id, subject, predicate, object, score, sources
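Since each line of a KGX jsonl file is one JSON object, keeping only the fields listed above can be sketched as below. This is a hedged illustration rather than the ingest pipeline's actual code: `project_records`, the two field tuples, and the sample records (including their identifiers) are hypothetical.

```python
import json
from typing import Iterable, Iterator

# Field lists copied from the "Included Content" table.
NODE_FIELDS = ("id", "name", "category")
EDGE_FIELDS = ("id", "subject", "predicate", "object", "score", "sources")


def project_records(lines: Iterable[str], fields: tuple[str, ...]) -> Iterator[dict]:
    """Parse JSON Lines and keep only the requested fields of each record."""
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines
            continue
        record = json.loads(line)
        yield {key: record[key] for key in fields if key in record}


# Made-up records standing in for lines of cohd_nodes.jsonl.
sample_lines = [
    '{"id": "EX:1", "name": "example node", "category": ["biolink:Disease"], "attributes": []}',
    "",
    '{"id": "EX:2", "name": "another node", "category": ["biolink:Drug"]}',
]
nodes = list(project_records(sample_lines, NODE_FIELDS))
```

For a downloaded file, an open text-mode file handle can be passed as `lines`, since iterating over it yields one line at a time. Using `EDGE_FIELDS` instead gives the same projection for `cohd_edges.jsonl`.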

Future Content Considerations

node_property_content: The contents of the COHD 'attributes' field are not yet processed by the ingestion pipeline. For example, these attributes cross-correlate concepts with related resources.
- Relevant files: cohd_nodes.jsonl

edge_property_content: The contents of the COHD 'attributes' field are not yet processed by the ingestion pipeline. For example, these attributes describe supporting studies, thus providing valuable evidence annotation which should be added to the COHD knowledge graph.
- Relevant files: cohd_edges.jsonl

Additional Notes: None

Target Information

Target InfoRes ID: infores:cohd

Edge Types

Subject Categories	Predicate	Object Categories	Knowledge Level	Agent Type	UI Explanation
			statistical_association	not_provided
			statistical_association	not_provided
			statistical_association	not_provided
			statistical_association	not_provided
			statistical_association	not_provided
			statistical_association	not_provided
			statistical_association	not_provided
			statistical_association	not_provided
			statistical_association	not_provided
			statistical_association	not_provided
			statistical_association	not_provided
			statistical_association	not_provided
			statistical_association	not_provided
			statistical_association	not_provided
			statistical_association	not_provided
			statistical_association	not_provided

Node Types

Node Category	Source Identifier Types	Additional Notes
biolink:Disease	DOID, UMLS
biolink:SmallMolecule	UNII, CHEBI
biolink:Drug	RXCUI
biolink:MolecularMixture	CHEBI
biolink:ChemicalEntity	MESH, UNII, CHEBI

Additional Notes: None

Provenance Information

Contributors:
- Richard Bruskiewich: data modeling, code author
- Kara Fecho: domain expertise
- Matthew Brush: data modeling, domain expertise

Artifacts: None