DrugCentral Reference Ingest Guide
Source Information
InfoRes ID: infores:drugcentral
Description: DrugCentral is an open-access online drug information repository. It covers over 4950 drugs, incorporating structural, physicochemical, and pharmacological details to support drug discovery, development, and repositioning.
Citations:
- https://pmc.ncbi.nlm.nih.gov/articles/PMC10939799/
- https://pmc.ncbi.nlm.nih.gov/articles/PMC10692006/
- https://pmc.ncbi.nlm.nih.gov/articles/PMC7881657/
- https://pmc.ncbi.nlm.nih.gov/articles/PMC9825566/
- https://pmc.ncbi.nlm.nih.gov/articles/PMC7779058/
- https://pmc.ncbi.nlm.nih.gov/articles/PMC6323925/
- https://pmc.ncbi.nlm.nih.gov/articles/PMC5210665/
Data Access Locations:
- Download and database access info: https://drugcentral.org/download
Data Provision Mechanisms: other, database_dump, file_download
Data Versioning and Releases: As of late 2025, DrugCentral is not being maintained due to lack of funding.
Ingest Information
Ingest Categories: aggregation_provider, primary_knowledge_provider
Utility: DrugCentral provides data on drug uses (indications, contraindications, etc.) and drug effects on genes/proteins. These edges could be used in MVP1 (may treat disease X), MVP2 (drug Y may increase/decrease gene Z's activity), or Pathfinder queries.
Relevant Files
| File Name | Location | Description |
|---|---|---|
| omop_relationship_doid_view | drugcentral:unmtid-dbs.net:5433 | This view adds a doid mapping to omop_relationship. We use the view in case we later want to use other ID namespaces for DoP objects. |
| act_table_full | drugcentral:unmtid-dbs.net:5433 | Drug bioactivity data (effect on genes/proteins) |
Included Content
| File Name | Included Records | Fields Used |
|---|---|---|
| omop_relationship_doid_view | (1) Have values in the columns/fields used (no missing values). (2) cui_semantic_type (UMLS semantic type) maps to DiseaseOrPhenotypicFeature or its descendants. (3) unique triples (removed duplicate rows). | struct_id, relationship_name, umls_cui, cui_semantic_type |
| act_table_full | Must have an action_type value. We interpret this as an assertion of a relationship (often a specific kind). By contrast, an activity value alone doesn't necessarily mean a relationship, because it would need a threshold for a meaningful/actual effect. | struct_id, accession, action_type, act_source, act_source_url |
Filtered Content
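The inclusion criteria in this table can be sketched in plain Python. This is a minimal illustration, not the parser's actual code: it assumes rows arrive as dictionaries keyed by column name, and `DOP_SEMANTIC_TYPES` is a hypothetical stand-in for whatever mapping decides that a `cui_semantic_type` value corresponds to DiseaseOrPhenotypicFeature or a descendant. The sample rows are made up.

```python
# Hypothetical subset of UMLS semantic types treated as mapping to
# biolink:DiseaseOrPhenotypicFeature (assumption, not the real mapping).
DOP_SEMANTIC_TYPES = {"T047", "T184"}

OMOP_FIELDS = ("struct_id", "relationship_name", "umls_cui", "cui_semantic_type")


def include_omop_rows(rows):
    """Apply the three inclusion criteria to omop_relationship_doid_view rows."""
    seen = set()
    kept = []
    for row in rows:
        values = tuple(row.get(field) for field in OMOP_FIELDS)
        # (1) every field used must have a value
        if any(value in (None, "") for value in values):
            continue
        # (2) the semantic type must map to DiseaseOrPhenotypicFeature
        if row["cui_semantic_type"] not in DOP_SEMANTIC_TYPES:
            continue
        # (3) keep only the first copy of each triple
        triple = (row["struct_id"], row["relationship_name"], row["umls_cui"])
        if triple in seen:
            continue
        seen.add(triple)
        kept.append(row)
    return kept


def include_act_rows(rows):
    """Keep act_table_full rows that carry an action_type value."""
    return [row for row in rows if row.get("action_type")]


# Made-up sample rows, for illustration only.
sample_omop = [
    {"struct_id": 1, "relationship_name": "indication",
     "umls_cui": "C0000001", "cui_semantic_type": "T047"},
    {"struct_id": 1, "relationship_name": "indication",
     "umls_cui": "C0000001", "cui_semantic_type": "T047"},  # duplicate
    {"struct_id": 2, "relationship_name": "indication",
     "umls_cui": None, "cui_semantic_type": "T047"},  # missing value
    {"struct_id": 3, "relationship_name": "indication",
     "umls_cui": "C0000002", "cui_semantic_type": "T121"},  # not a DoP type
]
sample_act = [
    {"struct_id": 1, "accession": "P00001", "action_type": "INHIBITOR"},
    {"struct_id": 2, "accession": "P00002", "action_type": None},
]

kept_omop = include_omop_rows(sample_omop)
kept_act = include_act_rows(sample_act)
```

With these sample rows, only the first omop row and the first act row survive.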
| File Name | Filtered Records | Rationale |
|---|---|---|
| omop_relationship_doid_view | umls_cui is C0085228 (Fluvoxamine) | This object isn't a DiseaseOrPhenotypicFeature and doesn't match the predicate (currently 1 contraindication) or Association range constraints. However, contraindications based on other drugs used (due to adverse drug-drug interactions) are valid. |
| omop_relationship_doid_view | umls_cui is C0022650 (Kidney Calculi aka kidney stones) | NodeNorm currently maps these to AnatomicalEntity, which doesn't match the Association range constraint. This seems to be a NodeNorm issue. |
| act_table_full | act_source is CHEMBL, IUPHAR, DRUGBANK, or KEGG DRUG | We are instead directly ingesting CHEMBL and IUPHAR (GuideToPharmacology aka gtopdb, based on the references in papers). We are not ingesting DrugBank (it has restrictive licensing) or KEGG (it may also have licensing issues, but we don't remember the reasoning behind this decision right now). |
Future Content Considerations
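A short sketch of these exclusion rules follows. It assumes rows are dictionaries keyed by column name and that `act_source` holds exactly the upper-case strings listed in the table; the function names and sample rows are hypothetical.

```python
# Objects filtered out of omop_relationship_doid_view (from the table above).
EXCLUDED_UMLS_CUIS = {
    "C0085228",  # Fluvoxamine: a drug, not a DiseaseOrPhenotypicFeature
    "C0022650",  # Kidney Calculi: currently normalized to AnatomicalEntity
}

# Sources filtered out of act_table_full (assumed to match act_source exactly).
EXCLUDED_ACT_SOURCES = {"CHEMBL", "IUPHAR", "DRUGBANK", "KEGG DRUG"}


def drop_filtered_omop_rows(rows):
    """Remove omop rows whose object is one of the excluded UMLS CUIs."""
    return [row for row in rows if row["umls_cui"] not in EXCLUDED_UMLS_CUIS]


def drop_filtered_act_rows(rows):
    """Remove act_table_full rows that come from an excluded source."""
    return [row for row in rows if row["act_source"] not in EXCLUDED_ACT_SOURCES]


# Made-up sample rows, for illustration only.
omop_after = drop_filtered_omop_rows([
    {"struct_id": 1, "umls_cui": "C0085228"},
    {"struct_id": 2, "umls_cui": "C0000001"},
])
act_after = drop_filtered_act_rows([
    {"struct_id": 1, "act_source": "CHEMBL"},
    {"struct_id": 2, "act_source": "DRUG LABEL"},
    {"struct_id": 3, "act_source": "KEGG DRUG"},
])
```

Keeping the excluded values in named sets makes it easy to revisit a decision later, for example if the Fluvoxamine record is remodeled instead of dropped.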
edge_content: Review the KEGG resource/data for reasons not to ingest it. We decided to exclude it but don't remember why (possibly licensing issues). - Relevant files: act_table_full
edge_content: Consider ingesting rows without an action_type value (threshold on act_value and/or keep rows marked as mechanism of action in moa column). With our current filter criteria, we only keep a small subset of the original rows. However, if we do this, we may want to adjust the action_type depending on primary source (see target_info.additional_notes). - Relevant files: act_table_full
edge_property_content: Review other columns and consider ingesting them (activity value info, MOA info). - Relevant files: act_table_full
edge_content: Stop filtering out umls_cui C0085228 (Fluvoxamine), and instead map its record with relationship_name 'contraindication' to a different predicate/Association for drug-drug contraindications. - Relevant files: omop_relationship_doid_view
other: This resource has a LOT of tables, some of which could be sources of more edges or node attributes. The table vetomop is particularly interesting (drug veterinary indications for non-human animals). RTX-KG2 was ingesting more tables that may be sources of node attributes (see linked issue's comments, RTX-KG2's extract-drugcentral.sh).
Additional Notes: (1) omop_relationship_doid_view seems to have some overlap with DAKP. We may want to revisit this (evaluate DrugCentral's underlying sources and methods) and decide whether DAKP can replace it or be used instead. (2) act_table_full's action_type values are defined in a different database table named 'action_type'. This table also includes 'parent_type' and some values that aren't currently in act_table_full. It looks like an older version of ChEMBL's MOA values.
Target Information
Target InfoRes ID: infores:translator-drugcentral
Edge Types
| Subject Categories | Predicate | Object Categories | Knowledge Level | Agent Type | UI Explanation |
|---|---|---|---|---|---|
| biolink:ChemicalEntity | | biolink:DiseaseOrPhenotypicFeature | knowledge_assertion | manual_validation_of_automated_agent | Source DrugCentral data provides drug use assertions that were either extracted from the OMOP data model (up to 2012) or manually curated from approved drug labels (after 2012). The predicate of this edge is based on the relationship_name field of the original database row. |
| biolink:ChemicalEntity | | biolink:Gene, biolink:Protein | knowledge_assertion | manual_agent | The DrugCentral-reported 'action type' was 'OTHER' - without more information on the relationship type, we assigned the generic predicate for a drug-protein interaction. |
| biolink:Gene, biolink:Protein | | biolink:ChemicalEntity | knowledge_assertion | manual_agent | The DrugCentral-reported 'action type' was 'SUBSTRATE' - this means the chemical is a substrate of the protein. |
| biolink:ChemicalEntity | | biolink:Gene, biolink:Protein | knowledge_assertion | manual_agent | The DrugCentral-reported action type either (1) corresponds to the predicate and causal_mechanism_qualifier OR (2) implied a physical interaction (so Translator generated another edge to represent this). |
| biolink:ChemicalEntity | | biolink:Gene, biolink:Protein | knowledge_assertion | manual_agent | The qualifier-set is based on the DrugCentral-reported action type. |
Node Types
| Node Category | Source Identifier Types | Additional Notes |
|---|---|---|
| biolink:ChemicalEntity | DRUGCENTRAL | |
| biolink:DiseaseOrPhenotypicFeature | UMLS | Using only umls_cui column for object ID because it's the most comprehensive (covers same rows as snomed_conceptid and doid plus a little more). I was worried that some IDs wouldn't actually be DiseaseOrPhenotypicFeatures, and that was addressed with filtering. |
| biolink:Protein | UniProtKB | |
| biolink:Gene | UniProtKB | NodeNorm with gene/protein conflation will map some UniProtKB IDs to Gene entities. |
Future Modeling Considerations
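As an illustration of the identifier types in this table, the sketch below builds node CURIEs from source column values. The helper names are hypothetical, and it assumes each value is a single bare local identifier (for example, one UniProtKB accession per call); the sample values are made up.

```python
def chemical_curie(struct_id):
    """Build a DRUGCENTRAL CURIE from a struct_id value."""
    return f"DRUGCENTRAL:{struct_id}"


def disease_curie(umls_cui):
    """Build a UMLS CURIE from a umls_cui value."""
    return f"UMLS:{umls_cui}"


def protein_curie(accession):
    """Build a UniProtKB CURIE from a single accession value."""
    return f"UniProtKB:{accession}"


# Made-up example values, for illustration only.
example_nodes = [
    chemical_curie(1234),
    disease_curie("C0000001"),
    protein_curie("P00001"),
]
```

Whether a UniProtKB CURIE ends up as a Protein or a Gene node is decided downstream by NodeNorm, so the sketch does not try to assign a category.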
spoq_pattern: Adjust the predicate domain for 'biolink:diagnoses'. It's currently 'diagnostic aid', which raises a validation warning on the DrugCentral 'diagnosis' record/edge from omop_relationship_doid_view. However, this record represents a legitimate use of a drug for diagnosis.
other: NodeNorm should perhaps map UMLS:C0022650 (Kidney Calculi aka kidney stones) to the main clique for kidney stones MONDO:0008171 (primary label nephrolithiasis).
Additional Notes: For act_table_full publications, we mapped some base URLs to prefixes (PMID and DOI) and left others as-is. For act_table_full agent_type, we decided on manual_agent for all primary knowledge sources. (1) However, we are only ASSUMING manual curation of DrugMatrix and PDSP data because the subset we ingest (rows that have action_type values) is a very small fraction of the total rows from those resources. We assume a person manually reviewed the rows with action_types and assigned those values, and that's why there are so few. This assumption has not been confirmed yet (by finding proof in DrugCentral, DrugMatrix, or PDSP papers). If we later include rows without an action_type value, this assumption will no longer be valid. (2) It's also unclear what agent_type should be assigned when act_source is 'UNKNOWN'. (3) On other primary sources: Colleen thinks 'WOMBAT-PK' devs curated data manually (reasoning recorded in https://github.com/NCATSTranslator/Translator-All/wiki/WOMBAT-PK). And it seems safe to assume DrugCentral curators did manual curation for 'SCIENTIFIC LITERATURE' and 'DRUG LABEL' (although Colleen didn't look for lines in DrugCentral papers saying these terms mean manual curation).
Provenance Information
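The publication handling described in these notes could look roughly like the sketch below. The base URLs in `URL_PREFIX_MAP` are assumptions chosen for illustration; the actual set of base URLs the parser recognizes is not listed here. Any URL without a known base is returned unchanged.

```python
# Assumed base URLs (illustrative only) and the CURIE prefix each maps to.
URL_PREFIX_MAP = {
    "https://pubmed.ncbi.nlm.nih.gov/": "PMID:",
    "https://doi.org/": "DOI:",
}


def publication_id(url):
    """Convert a known publication URL to a CURIE; leave other URLs as-is."""
    for base, prefix in URL_PREFIX_MAP.items():
        if url.startswith(base):
            # Keep the local identifier and drop any trailing slash.
            return prefix + url[len(base):].rstrip("/")
    return url


# Made-up example URLs, for illustration only.
example_ids = [
    publication_id("https://pubmed.ncbi.nlm.nih.gov/12345/"),
    publication_id("https://doi.org/10.1000/example"),
    publication_id("https://example.org/label.pdf"),
]
```

Returning unrecognized URLs untouched matches the "left others as-is" behavior noted above, so no publication reference is lost.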
Contributors:
- Colleen Xu - code author, data modeling
- Matthew Brush - data modeling, domain expertise
Artifacts:
- https://github.com/NCATSTranslator/Data-Ingest-Coordination-Working-Group/issues/53
- notebooks for development work currently in parser code directory
- causal mechanism mapping sheet: https://docs.google.com/spreadsheets/d/1DeAE04O1mz3R9s3dCZpG2hQp9hwif5WjdkUMsBci-u8/edit?gid=392679103#gid=392679103