Drug-Gene Interaction Database (DGIdb) Reference Ingest Guide

Source Information

InfoRes ID: infores:dgidb

Description: The Drug-Gene Interaction Database (DGIdb) streamlines the search for druggable therapeutic targets through the aggregation, categorization, and curation of drug and gene data from publications and expert resources.

Citations:
- https://doi.org/10.1093/nar/gkad1040

Data Access Locations:
- Downloads page: https://dgidb.org/downloads

Data Provision Mechanisms: file_download, api_endpoint, other

Data Formats: tsv

Data Versioning and Releases: DGIdb has been updated once or twice a year since 2021 (based on the raw data dump table at https://dgidb.org/downloads). Releases are versioned by date (year and abbreviated month, e.g. 2024-Dec); the 2024-Dec release also has a semantic version number in its header.
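Because release labels are date-based, they can be ordered by parsing them as dates. The following is a minimal sketch, assuming every label follows the year and abbreviated-month pattern of 2024-Dec; the helper names `parse_release` and `latest_release` are hypothetical and not part of any DGIdb tooling.

```python
from datetime import date, datetime


def parse_release(label: str) -> date:
    """Parse a label such as '2024-Dec' into the first day of that month.

    Note: %b matches English month abbreviations only under an English/C locale.
    """
    return datetime.strptime(label, "%Y-%b").date()


def latest_release(labels):
    """Return the label with the most recent parsed date."""
    return max(labels, key=parse_release)
```

The semantic version number mentioned for 2024-Dec is not handled here, since its exact format in the file header is not described above.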

Ingest Information

Ingest Categories: aggregation_interpreter

Utility: DGIdb provides associations between drugs/chemicals and genes/proteins. These associations could be used in MVP1 (may treat disease X), MVP2 (drug Y may increase/decrease gene Z's activity), or Pathfinder queries.

Relevant Files

File Name | Location | Description
interactions.tsv | https://dgidb.org/downloads | TSV download of drug-gene interaction claims

Included Content

File Name | Included Records | Fields Used
interactions.tsv | (1) The record has a value in both the gene_concept_id and drug_concept_id columns. (2) The drug ID is from a namespace that NodeNorm uses for ChemicalEntities (currently: rxcui, chembl (compound), drugbank). |
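The two inclusion rules above can be sketched as a row filter. This is an illustrative sketch, not the ingest pipeline's actual code: it assumes IDs are CURIE-like (`prefix:local_id`) and that the prefixes appear as `rxcui`, `chembl`, and `drugbank` (compared case-insensitively). The function names are hypothetical.

```python
import csv
import io

# Assumption: prefix spellings as listed in the table above; the real file
# may use different casing, so the comparison below is case-insensitive.
ALLOWED_DRUG_PREFIXES = {"rxcui", "chembl", "drugbank"}


def keep_row(row: dict) -> bool:
    """Return True if a row satisfies both inclusion rules."""
    gene_id = (row.get("gene_concept_id") or "").strip()
    drug_id = (row.get("drug_concept_id") or "").strip()
    # Rule 1: both ID columns must have a value.
    if not gene_id or not drug_id:
        return False
    # Rule 2: the drug ID prefix must be in an allowed namespace.
    if ":" not in drug_id:
        return False
    prefix = drug_id.split(":", 1)[0].lower()
    return prefix in ALLOWED_DRUG_PREFIXES


def filter_interactions(tsv_text: str) -> list:
    """Parse TSV text with a header row and keep only the included rows."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row for row in reader if keep_row(row)]
```

Rows that are dropped are not logged in this sketch; a real ingest would probably want to count them per rule.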

Future Content Considerations

node_property_content: interactions.tsv has some columns that could be ingested as node properties: drug_is_approved, drug_is_immunotherapy, drug_is_antineoplastic, drug_specificity_score, gene_specificity_score. However, we may want to use a different resource that's updated more frequently or has more coverage. - Relevant files: interactions.tsv

node_property_content: DGIdb raw files for gene and drug claims could be sources of node properties. However, we may want to use a different resource that's updated more frequently or has more coverage. - Relevant files: genes.tsv, drugs.tsv

other: According to others who have worked with DGIdb in the past, there's a way to load a local database that has more information. However, I looked at the GitHub instructions for recreating the database locally, and I don't think this can be done easily or automatically by the Translator ingest pipeline.

Target Information

Edge Types

Subject Categories | Predicate | Object Categories | Knowledge Level | Agent Type | UI Explanation
biolink:ChemicalEntity |  | biolink:Gene | knowledge_assertion | automated_agent | The DGIdb drug-gene interaction(s) had no interaction_type specified and/or the interaction_type was 'other/unknown'.
biolink:ChemicalEntity |  | biolink:Gene | knowledge_assertion | automated_agent | The DGIdb drug-gene interaction(s) had the interaction_type 'binder' OR a value that implied a physical interaction (so Translator generated another edge to represent this).
biolink:ChemicalEntity |  | biolink:Gene | knowledge_assertion | automated_agent | The DGIdb drug-gene interaction(s) had an interaction_type that wasn't 'binder' OR 'other/unknown'. These interaction_types imply that the drug affects the gene or its product in some manner.
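The three cases in the table can be read as a small decision rule on interaction_type. The sketch below is one possible reading, under stated assumptions: the labels `unspecified`, `physical_interaction`, and `affects` are placeholders rather than Biolink predicates, and the set of interaction_type values that imply a physical interaction is not given in this guide, so it is passed in by the caller as the hypothetical `physical_types` parameter.

```python
UNSPECIFIED_VALUES = {"", "other/unknown"}


def edge_kinds(interaction_type, physical_types=frozenset()):
    """Map one interaction_type value to placeholder edge kinds.

    physical_types: caller-supplied values (other than 'binder') that are
    taken to imply a physical interaction. Hypothetical parameter.
    """
    value = (interaction_type or "").strip().lower()
    # Case 1: missing or 'other/unknown' interaction_type.
    if value in UNSPECIFIED_VALUES:
        return ["unspecified"]
    # Case 2: 'binder' only states a physical interaction.
    if value == "binder":
        return ["physical_interaction"]
    # Case 3: any other type implies the drug affects the gene or product.
    kinds = ["affects"]
    # Some of these types also imply a physical interaction, which gets
    # an additional edge.
    if value in physical_types:
        kinds.append("physical_interaction")
    return kinds
```

For example, `edge_kinds("binder")` yields only the physical-interaction kind, while a type listed in `physical_types` yields both kinds.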

Node Types

Node Category | Source Identifier Types | Additional Notes
biolink:ChemicalEntity | RXCUI, CHEMBL.COMPOUND, DRUGBANK |
biolink:Gene | HGNC, NCBIGene, ENSEMBL |

Future Modeling Considerations

spoq_pattern: 'immunotherapy' interaction_type: its full meaning isn't captured in the current qualifier-pattern because there were some concerns with using causal_mechanism_qualifier. In the future, we could try representing this using existing or new qualifiers (therapeutic context?).

other: It's unclear what the 'NCI' value in the interaction_source_db_name column stands for (not mentioned in the current DGIdb website and the 2024 paper is unclear). We think it refers to NCIt, so we mapped it to NCIt's infores. Notes: (1) NCI is an organization, not an information resource; (2) Matt suggested that we could instead map it to 'NCI enterprise vocabulary services' to cover 'the more expansive suite of terminologies / knowledgebases / services NCI provides'.

other: Matt said that underlying-source licensing and data overlap with 'directly-ingested' resources are not concerns, because DGIdb is an 'interpreter' and can be set as the primary source (with the underlying sources set as supporting data providers or publications). However, if we directly ingest all underlying sources for an edge, the DGIdb edge then seems like extraneous duplication.

Additional Notes: (1) Edge provenance: the primary source is DGIdb, and the underlying sources are supporting data providers or publications. (2) We couldn't generate links to DGIdb interaction webpages because they use an ID that looks random or like a hash. Links to the drug or gene pages can currently be generated, but they wouldn't be specific to the edge.
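The provenance arrangement in note (1) could be represented as a list of source records on each edge. This is a rough sketch only: the field names (`resource_id`, `resource_role`, `upstream_resource_ids`) and role values follow the TRAPI RetrievalSource style as an assumption, and `build_sources` is a hypothetical helper, not the parser's actual function. Only `infores:dgidb` comes from this guide.

```python
DGIDB_INFORES = "infores:dgidb"


def build_sources(supporting_infores_ids):
    """Build source records: DGIdb as primary, the rest as supporting.

    supporting_infores_ids: infores IDs already mapped from the
    interaction_source_db_name column (the mapping itself is not shown here).
    """
    # De-duplicate and sort so the output is deterministic.
    supporting = sorted(set(supporting_infores_ids))
    primary = {
        "resource_id": DGIDB_INFORES,
        "resource_role": "primary_knowledge_source",
    }
    if supporting:
        primary["upstream_resource_ids"] = supporting
    sources = [primary]
    for infores_id in supporting:
        sources.append(
            {"resource_id": infores_id, "resource_role": "supporting_data_source"}
        )
    return sources
```

Underlying sources that are publications rather than data providers would be attached differently (for instance as a publications attribute), which this sketch does not cover.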

Provenance Information

Contributors:
- Colleen Xu: code author, data modeling
- Matthew Brush: data modeling, domain expertise

Artifacts:
- Recording special logic info: https://github.com/NCATSTranslator/Data-Ingest-Coordination-Working-Group/issues/55
- Notebooks for development work are currently in the parser code directory