Drug-Gene Interaction Database (DGIdb) Reference Ingest Guide
Source Information
InfoRes ID: infores:dgidb
Description: The Drug-Gene Interaction Database (DGIdb) streamlines the search for druggable therapeutic targets through the aggregation, categorization, and curation of drug and gene data from publications and expert resources.
Citations:
- https://doi.org/10.1093/nar/gkad1040
Data Access Locations:
- Downloads page: https://dgidb.org/downloads
Data Provision Mechanisms: file_download, api_endpoint, other
Data Formats: tsv
Data Versioning and Releases: DGIdb has been updated one to two times a year since 2021 (based on the raw data dump table at https://dgidb.org/downloads). Releases are versioned by date (year and abbreviated month, e.g., 2024-Dec), but the 2024-Dec release also has a semantic version number in its header.
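Because release labels combine a year with an abbreviated month name, picking the newest one needs a date-aware comparison: a plain string sort would rank "2024-Feb" above "2024-Dec". A minimal sketch in Python, assuming labels look exactly like `2024-Dec`; the helper names `release_key` and `latest_release` are hypothetical and not part of the ingest code:

```python
from datetime import datetime


def release_key(label: str) -> datetime:
    """Parse a 'YYYY-Mon' release label (assumed format, e.g. '2024-Dec') into a date."""
    # %b matches abbreviated English month names under the default C locale.
    return datetime.strptime(label, "%Y-%b")


def latest_release(labels: list[str]) -> str:
    """Return the most recent release label by calendar date, not by string order."""
    return max(labels, key=release_key)


print(latest_release(["2023-Nov", "2024-Dec", "2024-Feb"]))  # 2024-Dec
```

The 2024-Dec semantic version number is not parsed here; this only orders the date-style labels.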
Ingest Information
Ingest Categories: aggregation_interpreter
Utility: DGIdb provides associations between drugs/chemicals and genes/proteins. These associations could be used in MVP1 (may treat disease X), MVP2 (drug Y may increase/decrease gene Z's activity), or Pathfinder queries.
Relevant Files
| File Name | Location | Description |
|---|---|---|
| interactions.tsv | https://dgidb.org/downloads | TSV download of drug-gene interaction claims |
Included Content
| File Name | Included Records | Fields Used |
|---|---|---|
| interactions.tsv | (1) Have a value in gene_concept_id and drug_concept_id columns. (2) Drug ID is from a namespace that NodeNorm uses for ChemicalEntities (currently: rxcui, chembl (compound), drugbank). | |
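The two inclusion criteria amount to a row filter over interactions.tsv. A minimal sketch in Python, assuming concept IDs are written as `namespace:local_id` with lowercase namespaces (for example a `chembl:` prefix); the function names and sample rows are hypothetical and illustrative only, not taken from the ingest code:

```python
import csv
import io

# Namespaces NodeNorm uses for ChemicalEntity, per the inclusion criteria.
# Assumption: DGIdb writes them as lowercase prefixes such as "chembl:CHEMBL25".
ALLOWED_DRUG_NAMESPACES = {"rxcui", "chembl", "drugbank"}


def is_included(row: dict) -> bool:
    """Apply both inclusion criteria to one interactions.tsv row."""
    gene_id = (row.get("gene_concept_id") or "").strip()
    drug_id = (row.get("drug_concept_id") or "").strip()
    # (1) Both concept ID columns must have a value.
    if not gene_id or not drug_id:
        return False
    # (2) The drug ID namespace must be one NodeNorm uses for ChemicalEntity.
    namespace = drug_id.split(":", 1)[0].lower()
    return namespace in ALLOWED_DRUG_NAMESPACES


def read_included(tsv_text: str) -> list[dict]:
    """Parse tab-separated text and keep only rows that pass the filter."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row for row in reader if is_included(row)]


# Illustrative rows: kept, dropped (ncit namespace), dropped (no gene ID).
sample = (
    "gene_concept_id\tdrug_concept_id\n"
    "hgnc:3236\tchembl:CHEMBL939\n"
    "hgnc:3236\tncit:C1234\n"
    "\trxcui:5678\n"
)
print(len(read_included(sample)))  # 1
```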
Future Content Considerations
node_property_content: interactions.tsv has some columns that could be ingested as node properties: drug_is_approved, drug_is_immunotherapy, drug_is_antineoplastic, drug_specificity_score, gene_specificity_score. However, we may want to use a different resource that's updated more frequently or has more coverage.
- Relevant files: interactions.tsv
node_property_content: DGIdb's raw files for gene and drug claims could be sources of node properties. However, we may want to use a different resource that's updated more frequently or has more coverage.
- Relevant files: genes.tsv, drugs.tsv
other: According to people who have worked with DGIdb in the past, a local copy of the database containing more information can be loaded. However, after reviewing the GitHub instructions for recreating the database locally, I don't think the Translator ingest pipeline can do this easily or automatically.
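If the interactions.tsv property columns are ingested later, each text value needs converting to a typed value. A minimal sketch in Python, assuming booleans are spelled `True`/`False` and scores are decimal numbers or blanks; these value formats were not checked against the file, and the output property names are hypothetical:

```python
from typing import Optional


def parse_bool(value: Optional[str]) -> Optional[bool]:
    """Map 'True'/'False' (assumed spelling, case-insensitive) to bool; anything else to None."""
    v = (value or "").strip().lower()
    if v == "true":
        return True
    if v == "false":
        return False
    return None


def parse_score(value: Optional[str]) -> Optional[float]:
    """Parse a specificity score; blank or non-numeric values become None."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def drug_properties(row: dict) -> dict:
    """Collect the candidate drug node properties from one row (hypothetical keys)."""
    return {
        "is_approved": parse_bool(row.get("drug_is_approved")),
        "is_immunotherapy": parse_bool(row.get("drug_is_immunotherapy")),
        "is_antineoplastic": parse_bool(row.get("drug_is_antineoplastic")),
        "specificity_score": parse_score(row.get("drug_specificity_score")),
    }


def gene_properties(row: dict) -> dict:
    """Collect the candidate gene node property from one row (hypothetical key)."""
    return {"specificity_score": parse_score(row.get("gene_specificity_score"))}
```

Returning `None` for unrecognized values keeps a malformed cell from being silently stored as `False` or `0.0`.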
Target Information
Edge Types
| Subject Categories | Predicate | Object Categories | Knowledge Level | Agent Type | UI Explanation |
|---|---|---|---|---|---|
| biolink:ChemicalEntity |  | biolink:Gene | knowledge_assertion | automated_agent | The DGIdb drug-gene interaction(s) had no interaction_type specified and/or the interaction_type was 'other/unknown'. |
| biolink:ChemicalEntity |  | biolink:Gene | knowledge_assertion | automated_agent | The DGIdb drug-gene interaction(s) had the interaction_type 'binder' OR a value that implied a physical-interaction (so Translator generated another edge to represent this). |
| biolink:ChemicalEntity |  | biolink:Gene | knowledge_assertion | automated_agent | The DGIdb drug-gene interaction(s) had an interaction_type that wasn't 'binder' OR 'other/unknown'. These interaction_types imply that the drug affects the gene or its product in some manner. |
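As we read the three edge-type rows, they amount to a routing rule on interaction_type. A minimal sketch in Python that returns which row(s) an interaction falls under, using the hypothetical labels `unspecified`, `physical`, and `affects` rather than Biolink predicates. The guide does not list which non-binder types imply a physical interaction, so `IMPLIES_PHYSICAL` is a placeholder assumption:

```python
from typing import Optional

# Hypothetical placeholder: non-'binder' interaction_type values that imply a
# physical interaction. The guide does not enumerate these.
IMPLIES_PHYSICAL = {"inhibitor"}

# No type given, or the catch-all 'other/unknown'.
UNSPECIFIED_TYPES = {"", "other/unknown"}


def edge_rows(interaction_type: Optional[str]) -> list[str]:
    """Return the edge-type row labels a DGIdb interaction maps to."""
    t = (interaction_type or "").strip().lower()
    if t in UNSPECIFIED_TYPES:
        return ["unspecified"]
    if t == "binder":
        # 'binder' only implies a physical interaction.
        return ["physical"]
    # Any other type implies the drug affects the gene or its product.
    rows = ["affects"]
    if t in IMPLIES_PHYSICAL:
        # A second edge represents the implied physical interaction.
        rows.append("physical")
    return rows


print(edge_rows("binder"))     # ['physical']
print(edge_rows("Inhibitor"))  # ['affects', 'physical']
```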
Node Types
| Node Category | Source Identifier Types | Additional Notes |
|---|---|---|
| biolink:ChemicalEntity | RXCUI, CHEMBL.COMPOUND, DRUGBANK | |
| biolink:Gene | HGNC, NCBIGene, ENSEMBL | |
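Source IDs need their namespace rewritten to the prefixes in the Node Types table before normalization. A minimal sketch in Python, assuming DGIdb writes lowercase namespaces such as `hgnc:` and `chembl:`; the mapping dictionary and the `to_curie` name are hypothetical:

```python
from typing import Optional

# Assumed DGIdb namespace -> prefix listed in the Node Types table.
PREFIX_MAP = {
    "rxcui": "RXCUI",
    "chembl": "CHEMBL.COMPOUND",
    "drugbank": "DRUGBANK",
    "hgnc": "HGNC",
    "ncbigene": "NCBIGene",
    "ensembl": "ENSEMBL",
}


def to_curie(source_id: str) -> Optional[str]:
    """Rewrite 'namespace:local_id' to a Biolink-style CURIE, or None if unmapped."""
    namespace, sep, local_id = source_id.partition(":")
    if not sep or not local_id:
        return None
    prefix = PREFIX_MAP.get(namespace.strip().lower())
    return f"{prefix}:{local_id}" if prefix else None


print(to_curie("chembl:CHEMBL25"))  # CHEMBL.COMPOUND:CHEMBL25
```

Returning `None` for unknown namespaces lets the caller drop or log those records instead of emitting unnormalizable IDs.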
Future Modeling Considerations
spoq_pattern: The full meaning of the 'immunotherapy' interaction_type isn't captured in the current qualifier pattern because of concerns about using causal_mechanism_qualifier. In the future, we could try representing it using existing or new qualifiers (therapeutic context?).
other: It's unclear what the 'NCI' value in the interaction_source_db_name column stands for (it isn't mentioned on the current DGIdb website, and the 2024 paper is unclear). We think it refers to NCIt, so we mapped it to NCIt's infores. Notes: (1) NCI is an organization, not an information resource; (2) Matt suggested that we could instead map it to 'NCI enterprise vocabulary services' to cover 'the more expansive suite of terminologies / knowledgebases / services NCI provides'.
other: Matt said that underlying-source licensing and data overlap with 'directly-ingested' resources are not concerns, because DGIdb is an 'interpreter' and can be set as the primary source (with the underlying sources set as supporting data providers or publications). However, if we directly ingest all of the underlying sources for an edge, the DGIdb edge seems like redundant duplication.
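Keeping the interaction_source_db_name to infores mapping in one explicit lookup makes decisions like the NCI one easy to revisit. A minimal sketch in Python; apart from the NCI to NCIt choice described in this section, the exact ID `infores:ncit` and the helper names are assumptions that should be checked against the Translator infores registry:

```python
from typing import Iterable, Optional

# Explicit lookup from DGIdb interaction_source_db_name values to infores IDs.
# "NCI" -> NCIt follows the decision in this guide; the exact ID is assumed.
SOURCE_DB_TO_INFORES = {
    "NCI": "infores:ncit",
}


def source_infores(source_db_name: Optional[str]) -> Optional[str]:
    """Return the infores ID for a source name, or None if it is not mapped yet."""
    return SOURCE_DB_TO_INFORES.get((source_db_name or "").strip())


def unmapped_sources(names: Iterable[str]) -> list[str]:
    """List the distinct, non-empty source names that still lack a mapping."""
    return sorted({n for n in names if n and source_infores(n) is None})
```

Switching NCI to an NCI Enterprise Vocabulary Services infores, as Matt suggested, would then be a one-entry change, and `unmapped_sources` can flag new source names that appear in a future release.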
Additional Notes: (1) Edge provenance: the primary source is DGIdb, and the underlying sources are supporting data providers or publications. (2) We couldn't generate links to DGIdb interaction webpages because they use an ID that looks random or like a hash (links to the drug or gene pages can currently be generated, but they wouldn't be specific to the edge).
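The edge provenance in note (1) fits Biolink's `RetrievalSource` pattern, where each source has a `resource_id` and a `resource_role`, and the primary source lists the sources it drew on in `upstream_resource_ids`. A minimal sketch in Python using plain dictionaries rather than a Biolink model class; `build_sources` is a hypothetical name, and the supporting infores IDs are assumed to have been mapped already:

```python
def build_sources(supporting_infores: list[str]) -> list[dict]:
    """Build RetrievalSource-style dicts: DGIdb as primary, upstream sources as supporting."""
    # De-duplicate while keeping first-seen order; skip empty values.
    upstream = list(dict.fromkeys(i for i in supporting_infores if i))
    sources = [{
        "resource_id": "infores:dgidb",
        "resource_role": "primary_knowledge_source",
        "upstream_resource_ids": upstream,
    }]
    for infores_id in upstream:
        sources.append({
            "resource_id": infores_id,
            "resource_role": "supporting_data_source",
        })
    return sources
```

Publications are not sources in this pattern; they would go in a separate edge property such as Biolink's `publications` slot.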
Provenance Information
Contributors:
- Colleen Xu: code author, data modeling
- Matthew Brush: data modeling, domain expertise
Artifacts:
- Recording special logic info: https://github.com/NCATSTranslator/Data-Ingest-Coordination-Working-Group/issues/55
- Notebooks for development work: currently in the parser code directory