DrugCentral Reference Ingest Guide

Source Information

InfoRes ID: infores:drugcentral

Description: DrugCentral is an open-access online drug information repository. It covers over 4950 drugs, incorporating structural, physicochemical, and pharmacological details to support drug discovery, development, and repositioning.

Citations:
- https://pmc.ncbi.nlm.nih.gov/articles/PMC10939799/
- https://pmc.ncbi.nlm.nih.gov/articles/PMC10692006/
- https://pmc.ncbi.nlm.nih.gov/articles/PMC7881657/
- https://pmc.ncbi.nlm.nih.gov/articles/PMC9825566/
- https://pmc.ncbi.nlm.nih.gov/articles/PMC7779058/
- https://pmc.ncbi.nlm.nih.gov/articles/PMC6323925/
- https://pmc.ncbi.nlm.nih.gov/articles/PMC5210665/

Data Access Locations: - Download and database access info: https://drugcentral.org/download

Data Provision Mechanisms: other, database_dump, file_download

Data Versioning and Releases: DrugCentral currently (late 2025) isn't being maintained due to lack of funding.

Ingest Information

Ingest Categories: aggregation_provider, primary_knowledge_provider

Utility: DrugCentral provides data on drug uses (indications, contraindications, etc.) and on drug effects on genes/proteins. These edges could be used in MVP1 (may treat disease X), MVP2 (drug Y may increase/decrease gene Z's activity), or Pathfinder queries.

Relevant Files

File Name | Location | Description
omop_relationship_doid_view | drugcentral:unmtid-dbs.net:5433 | This view adds DOID mappings to omop_relationship. We use it in case we later want other ID namespaces for DiseaseOrPhenotypicFeature (DoP) objects.
act_table_full | drugcentral:unmtid-dbs.net:5433 | Drug bioactivity data (effects on genes/proteins).

Included Content

File Name | Included Records | Fields Used
omop_relationship_doid_view | (1) Rows have values in all fields used (no missing values). (2) cui_semantic_type (UMLS semantic type) maps to DiseaseOrPhenotypicFeature or one of its descendants. (3) Triples are unique (duplicate rows removed). | struct_id, relationship_name, umls_cui, cui_semantic_type
act_table_full | Rows must have an action_type value, which we interpret as an assertion of a relationship (often of a specific kind). An activity value alone does not necessarily indicate a relationship, because a threshold would be needed to identify a meaningful effect. | struct_id, accession, action_type, act_source, act_source_url
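
The three inclusion rules for omop_relationship_doid_view can be sketched roughly as below. This is a minimal illustration, not the parser's actual code: the function name, the set of semantic type codes (assumed here to be UMLS TUIs such as T047, Disease or Syndrome), and the choice of (struct_id, relationship_name, umls_cui) as the deduplicated triple are all assumptions.

```python
from typing import Iterable

# Hypothetical subset of UMLS semantic types (TUIs) treated as mapping to
# DiseaseOrPhenotypicFeature or a descendant. The real mapping is not listed
# in this guide.
DOP_SEMANTIC_TYPES = {"T047", "T184"}

FIELDS = ("struct_id", "relationship_name", "umls_cui", "cui_semantic_type")


def select_omop_rows(rows: Iterable[dict]) -> list[dict]:
    """Apply the three inclusion rules to rows of omop_relationship_doid_view."""
    seen = set()
    kept = []
    for row in rows:
        values = tuple(row.get(field) for field in FIELDS)
        # (1) Every field used must have a value.
        if any(value is None or value == "" for value in values):
            continue
        # (2) The semantic type must map to DiseaseOrPhenotypicFeature.
        if row["cui_semantic_type"] not in DOP_SEMANTIC_TYPES:
            continue
        # (3) Keep only the first row for each unique triple (assumed key).
        triple = values[:3]
        if triple in seen:
            continue
        seen.add(triple)
        kept.append({field: row[field] for field in FIELDS})
    return kept
```

Rows that pass all three checks are returned with only the fields used, in their original order.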

Filtered Content

File Name | Filtered Records | Rationale
omop_relationship_doid_view | umls_cui is C0085228 (Fluvoxamine) | This object isn't a DiseaseOrPhenotypicFeature and doesn't match the predicate (currently 1 contraindication) or Association range constraints. However, contraindications based on other drugs used (due to adverse drug-drug interactions) are valid.
omop_relationship_doid_view | umls_cui is C0022650 (Kidney Calculi, aka kidney stones) | NodeNorm currently maps these to AnatomicalEntity, which doesn't match the Association range constraint. This seems to be a NodeNorm issue.
act_table_full | act_source is CHEMBL, IUPHAR, DRUGBANK, or KEGG DRUG | We instead ingest CHEMBL and IUPHAR (GuideToPharmacology, aka gtopdb, based on the references in papers) directly. We do not ingest DrugBank (restrictive licensing) or KEGG (may also have licensing issues, but we don't currently remember the reasoning behind this decision).
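
As a rough sketch, the exclusion rules above amount to two membership checks. The function names are hypothetical, and exact, case-sensitive string matching against the values written in this guide is an assumption.

```python
# Objects removed from omop_relationship_doid_view (see rationale above).
EXCLUDED_UMLS_CUIS = {
    "C0085228",  # Fluvoxamine: not a DiseaseOrPhenotypicFeature
    "C0022650",  # Kidney Calculi: currently normalized to AnatomicalEntity
}

# Primary sources removed from act_table_full.
EXCLUDED_ACT_SOURCES = {"CHEMBL", "IUPHAR", "DRUGBANK", "KEGG DRUG"}


def keep_omop_row(row: dict) -> bool:
    """Return True if the row's object is not one of the excluded CUIs."""
    return row["umls_cui"] not in EXCLUDED_UMLS_CUIS


def keep_act_row(row: dict) -> bool:
    """Return True if the row's primary source is not excluded."""
    return row["act_source"] not in EXCLUDED_ACT_SOURCES
```

Keeping the excluded values in named sets makes it simple to drop an entry later, for example if the Fluvoxamine record is remodeled as a drug-drug contraindication.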

Future Content Considerations

edge_content: Review the KEGG resource/data for reasons not to ingest it. We decided not to ingest it but don't remember why (possibly licensing issues). - Relevant files: act_table_full

edge_content: Consider ingesting rows without an action_type value (threshold on act_value and/or keep rows marked as mechanism of action in moa column). With our current filter criteria, we only keep a small subset of the original rows. However, if we do this, we may want to adjust the action_type depending on primary source (see target_info.additional_notes). - Relevant files: act_table_full

edge_property_content: Review other columns and consider ingesting them (activity value info, MOA info). - Relevant files: act_table_full

edge_content: Stop filtering out umls_cui C0085228 (Fluvoxamine), and instead map its record with relationship_name 'contraindication' to a different predicate/Association for drug-drug contraindications. - Relevant files: omop_relationship_doid_view

other: This resource has a LOT of tables, some of which could be sources of more edges or node attributes. The table vetomop is particularly interesting (drug veterinary indications for non-human animals). RTX-KG2 was ingesting more tables that may be sources of node attributes (see linked issue's comments, RTX-KG2's extract-drugcentral.sh).

Additional Notes: (1) omop_relationship_doid_view seems to have some overlap with DAKP. We may want to revisit this (evaluate DrugCentral's underlying sources and methods) and decide whether DAKP can be used instead. (2) act_table_full's action_type values are defined in a different database table named 'action_type'. This table also includes 'parent_type' and some values that aren't currently in act_table_full. It looks like an older version of ChEMBL's MOA values.

Target Information

Target InfoRes ID: infores:translator-drugcentral

Edge Types

Subject Categories | Predicate | Object Categories | Knowledge Level | Agent Type | UI Explanation
biolink:ChemicalEntity | | biolink:DiseaseOrPhenotypicFeature | knowledge_assertion | manual_validation_of_automated_agent | Source DrugCentral data provides drug use assertions that were either extracted from the OMOP data model (up to 2012) or manually curated from approved drug labels (after 2012). The predicate of this edge is based on the relationship_name field of the original database row.
biolink:ChemicalEntity | | biolink:Gene, biolink:Protein | knowledge_assertion | manual_agent | The DrugCentral-reported 'action type' was 'OTHER'. Without more information on the relationship type, we assigned the generic predicate for a drug-protein interaction.
biolink:Gene, biolink:Protein | | biolink:ChemicalEntity | knowledge_assertion | manual_agent | The DrugCentral-reported 'action type' was 'SUBSTRATE'. This means the chemical is a substrate of the protein.
biolink:ChemicalEntity | | biolink:Gene, biolink:Protein | knowledge_assertion | manual_agent | The DrugCentral-reported action type either (1) corresponds to the predicate and causal_mechanism_qualifier or (2) implies a physical interaction (so Translator generated another edge to represent this).
biolink:ChemicalEntity | | biolink:Gene, biolink:Protein | knowledge_assertion | manual_agent | The qualifier set is based on the DrugCentral-reported action type.
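
One detail of the edge types above is that 'SUBSTRATE' rows reverse the edge direction: the gene/protein becomes the subject and the chemical becomes the object. A minimal sketch of that orientation step follows. The function name is hypothetical, the CURIE formatting (prefix, colon, local ID) is an assumption based on the identifier types listed under Node Types, and the row is assumed to carry a single accession. Predicates and qualifiers are deliberately left out.

```python
def orient_edge(struct_id: int, accession: str, action_type: str) -> tuple[str, str]:
    """Return (subject, object) identifiers for one act_table_full row."""
    chemical = f"DRUGCENTRAL:{struct_id}"   # assumed CURIE format
    target = f"UniProtKB:{accession}"       # assumed CURIE format
    if action_type == "SUBSTRATE":
        # The chemical is a substrate of the protein, so the protein
        # is the subject of the edge.
        return (target, chemical)
    # All other action types keep the chemical as the subject.
    return (chemical, target)
```

For example, a 'SUBSTRATE' row for structure 1 and accession P12345 would yield the pair ("UniProtKB:P12345", "DRUGCENTRAL:1"), while any other action type would yield the same identifiers in the opposite order.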

Node Types

Node Category | Source Identifier Types | Additional Notes
biolink:ChemicalEntity | DRUGCENTRAL |
biolink:DiseaseOrPhenotypicFeature | UMLS | Using only umls_cui column for object ID because it's the most comprehensive (covers same rows as snomed_conceptid and doid plus a little more). I was worried that some IDs wouldn't actually be DiseaseOrPhenotypicFeatures, and that was addressed with filtering.
Protein | UniProtKB |
Gene | UniProtKB | NodeNorm with gene/protein conflation will map some UniProtKB IDs to Gene entities.

Future Modeling Considerations

spoq_pattern: Adjust the predicate domain for 'biolink:diagnoses'. The domain is currently 'diagnostic aid', which raises a validation warning on the DrugCentral 'diagnosis' record/edge from omop_relationship_doid_view. However, this record represents a legitimate use of a drug for diagnosis.

other: NodeNorm should perhaps map UMLS:C0022650 (Kidney Calculi aka kidney stones) to the main clique for kidney stones MONDO:0008171 (primary label nephrolithiasis).

Additional Notes: For act_table_full publications, we mapped some base URLs to prefixes (PMID and DOI) and left others as-is. For act_table_full agent_type, we decided on manual_agent for all primary knowledge sources, with the following caveats:
(1) We are only ASSUMING manual curation of DrugMatrix and PDSP data, because the subset we ingest (rows with action_type values) is a very small fraction of the total rows from those resources. We assume a person manually reviewed those rows and assigned the action_type values, which would explain why there are so few. This assumption has not yet been confirmed (by finding proof in the DrugCentral, DrugMatrix, or PDSP papers). If we later include rows without an action_type value, this assumption will no longer be valid.
(2) It is also unclear what agent_type should be assigned when act_source is 'UNKNOWN'.
(3) On other primary sources: Colleen thinks the 'WOMBAT-PK' developers curated data manually (reasoning recorded in https://github.com/NCATSTranslator/Translator-All/wiki/WOMBAT-PK). It also seems safe to assume that DrugCentral curators did manual curation for 'SCIENTIFIC LITERATURE' and 'DRUG LABEL' (although Colleen didn't look for statements in the DrugCentral papers confirming that these terms mean manual curation).
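
The publication mapping mentioned at the start of these notes (some base URLs become PMID or DOI prefixes, others stay as-is) could look roughly like the sketch below. The base URLs in the table are illustrative assumptions, not the list the parser actually uses, and the function name is hypothetical.

```python
# Hypothetical base URLs; the real set handled by the parser is not
# listed in this guide.
URL_PREFIXES = {
    "https://pubmed.ncbi.nlm.nih.gov/": "PMID:",
    "http://www.ncbi.nlm.nih.gov/pubmed/": "PMID:",
    "https://doi.org/": "DOI:",
    "http://dx.doi.org/": "DOI:",
}


def publication_id(url: str) -> str:
    """Convert a known publication URL to a prefixed ID, else return it unchanged."""
    for base, prefix in URL_PREFIXES.items():
        if url.startswith(base):
            # Drop the base URL and any trailing slash, then add the prefix.
            return prefix + url[len(base):].rstrip("/")
    return url
```

URLs that match no known base are passed through untouched, which mirrors the "left others as-is" behavior described above.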

Provenance Information

Contributors:
- Colleen Xu - code author, data modeling
- Matthew Brush - data modeling, domain expertise

Artifacts:
- https://github.com/NCATSTranslator/Data-Ingest-Coordination-Working-Group/issues/53
- notebooks for development work currently in parser code directory
- causal mechanism mapping sheet: https://docs.google.com/spreadsheets/d/1DeAE04O1mz3R9s3dCZpG2hQp9hwif5WjdkUMsBci-u8/edit?gid=392679103#gid=392679103