Therapeutic Target Database (TTD) Reference Ingest Guide
Source Information
InfoRes ID: infores:ttd
Description: TTD is a database providing information about the known and explored therapeutic protein and nucleic acid targets, the targeted disease, pathway information and the corresponding drugs directed at each of these targets.
Citations:
- https://doi.org/10.1093/nar/gkad751
- https://doi.org/10.1093/nar/gkab953
- https://doi.org/10.1093/nar/gkz981
- https://doi.org/10.1093/nar/gkx1076
- https://doi.org/10.1093/nar/gkv1230
- https://doi.org/10.1093/nar/gkt1129
- https://doi.org/10.1093/nar/gkr797
- https://doi.org/10.1093/nar/gkp1014
- https://doi.org/10.1093/nar/30.1.412
Data Access Locations:
- Downloads page: https://db.idrblab.net/ttd/full-data-download
Data Provision Mechanisms: file_download
Data Formats: other
Data Versioning and Releases: A new release appears roughly every 2 years. Versioning is not uniform. Some files have a header section that includes a semantic version number and a date, and these dates can differ considerably from file to file. Other files have no such header.
Ingest Information
Ingest Categories: primary_knowledge_provider
Utility: TTD provides associations between drugs/chemicals, therapeutic targets (mostly proteins), and diseases. These associations appear to be manually curated through literature review. The review draws on sources that other resources may not cover, including drug industry reports, drug pipeline reports from hundreds of companies, patents from multiple countries, and manual review of PubMed literature searches. These associations could support MVP1 (may treat disease X), MVP2 (drug Y may increase/decrease gene Z's activity), or Pathfinder queries.
Scope: This ingest covers the drug-disease file (P1-05) and one file with drug-protein associations (P1-07). For more details on the data content and on how the files to ingest were chosen, see https://github.com/NCATSTranslator/Data-Ingest-Coordination-Working-Group/issues/30 or Colleen Xu's internal document.
Relevant Files
| File Name | Location | Description |
|---|---|---|
| P1-05-Drug_disease.txt | https://db.idrblab.net/ttd/full-data-download | Description on Downloads page: 'Drug to disease mapping with ICD identifiers'. Contains chemical/drug 'treats' disease associations. Uses TTD drug IDs, so another file (P1-03) is needed to map them to IDs usable in Translator. |
| P1-07-Drug-TargetMapping.xlsx | https://db.idrblab.net/ttd/full-data-download | Description on Downloads page: 'Target to drug mapping with mode of action'. Contains chemical/drug 'affects' protein associations. Uses TTD target IDs and drug IDs, so other files (P2-01 and P1-03) are needed to map them to IDs usable in Translator. |
| P1-03-TTD_crossmatching.txt | https://db.idrblab.net/ttd/full-data-download | Description on Downloads page: 'Cross-matching ID between TTD drugs and public databases'. Used for ID mapping only. Maps TTD drug IDs (which start with 'D') to PUBCHEM.COMPOUND, CAS, and/or CHEBI. It does not cover all TTD drug IDs, and it has no information on TTD chemical IDs (which start with 'C'). |
| P2-01-TTD_uniprot_all.txt | https://db.idrblab.net/ttd/full-data-download | Description on Downloads page: 'Download Uniprot IDs for all targets'. Used for ID mapping only. Maps TTD target IDs to UniProt names (not IDs). Despite its description, it does not include all TTD target IDs. It also contains special values ('NOUNIPROTAC' appears to mean no name/mapping). |
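P1-03 and P2-01 serve only as ID lookups. The following is a minimal sketch of building those lookups. It assumes each file is tab-separated with one `TTD_ID<TAB>FIELD<TAB>VALUE` record per line after a free-text header. The field codes `PUBCHCID` and `UNIPROID` are hypothetical placeholders, so check them against the header of the downloaded files. The function `build_id_map` is illustrative and is not part of the ingest code. It keeps every distinct value per ID and drops the `NOUNIPROTAC` placeholder.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Placeholder value in P2-01 that appears to mean "no UniProt name".
NO_UNIPROT = "NOUNIPROTAC"


def build_id_map(lines: Iterable[str], field: str) -> Dict[str, List[str]]:
    """Collect the VALUEs of one FIELD for each TTD ID.

    Assumed (hypothetical) record layout, verify against the real files:
        D00AAN<TAB>PUBCHCID<TAB>12345
    Lines without exactly three tab-separated columns (e.g. header text)
    and records with an empty TTD ID are skipped.
    """
    mapping: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue
        ttd_id, key, value = (p.strip() for p in parts)
        if not ttd_id or key != field or not value or value == NO_UNIPROT:
            continue
        if value not in mapping[ttd_id]:  # keep each distinct value once
            mapping[ttd_id].append(value)
    return dict(mapping)
```

Any TTD ID that is missing from the resulting dict has no usable mapping. Records that use such an ID are the ones dropped in the Filtered Content table.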
Included Content
| File Name | Included Records | Fields Used |
|---|---|---|
| P1-05-Drug_disease.txt | Clinical status has a mapping, and the TTD drug ID and indication name are successfully mapped to IDs that NodeNorm recognizes. | |
| P1-07-Drug-TargetMapping.xlsx | MOA has a mapping, and the TTD drug and target IDs are successfully mapped to IDs that NodeNorm recognizes. | |
Filtered Content
| File Name | Filtered Records | Rationale |
|---|---|---|
| P1-05-Drug_disease.txt | Clinical status value is not included in clinical_status_map (a hard-coded variable that maps clinical status values to Biolink predicates). | These clinical status values need review. Colleen Xu was either (1) unsure what the term meant ('Application submitted') or (2) unsure whether there is consensus on including this kind of data (discontinued, terminated, withdrawn) and, if these terms are kept, which predicates to map them to. |
| P1-05-Drug_disease.txt | TTD drug ID doesn't have a PUBCHEM.COMPOUND mapping (from the TTD mapping file). | Need node IDs that are in NodeNorm's scope. |
| P1-05-Drug_disease.txt | Indication name is '#N/A' or contains specific substrings. | Either the indication has no name, or the indication name is problematic (it does not describe a condition that is treated, or Colleen Xu was concerned about how the resulting statement would read). A valid indication name is needed as NameResolver input to find a Translator entity ID for the node. |
| P1-05-Drug_disease.txt | Indication name was not successfully mapped to a Translator entity ID using NameResolver. | Nodes need IDs. An unsuccessful mapping means that NameResolver (with the query fields Colleen Xu set) either (1) did not find a Translator entity matching the indication name, or (2) returned a hit whose score was below the threshold (set to improve mapping quality). |
| P1-07-Drug-TargetMapping.xlsx | TTD drug ID doesn't have a PUBCHEM.COMPOUND mapping (from the TTD mapping file). | Need node IDs that are in NodeNorm's scope. |
| P1-07-Drug-TargetMapping.xlsx | TTD target ID doesn't have a mapping to a UniProt name, or that UniProt name doesn't successfully map to a UniProt ID. | Need node IDs that are in NodeNorm's scope. |
| P1-07-Drug-TargetMapping.xlsx | MOA value has no mapping (the code and the MOA_MAPPING variable map modified MOA values to distinct modeling patterns: predicate, qualifier set, and extra edge predicate). | These MOA values need review. Colleen Xu was (1) unsure what the term meant or (2) unsure how to model it. |
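The P1-05 filters listed above can be combined into one check per record. The following is a minimal sketch, assuming each row has already been parsed into a dict with hypothetical keys `drug_id`, `clinical_status`, and `indication_name`. The statuses come from the Edge Types table, and the phase entries are only examples of the clinical-trial group. The group labels are made up, whereas the real clinical_status_map maps statuses to Biolink predicates. The excluded substrings are placeholders because this guide does not list the ones actually used.

```python
from typing import Dict, List, Mapping, Optional

# Hypothetical group labels; the real clinical_status_map targets predicates.
CLINICAL_STATUS_MAP: Dict[str, str] = {
    "Approved": "approved",
    "Approved (orphan drug)": "approved",
    "Approved in China": "approved",
    "Approved in EU": "approved",
    "Phase 4": "approved",
    "Investigative": "investigative",
    "Patented": "investigative",
    "Preclinical": "preclinical",
    "IND submitted": "preclinical",
    "Phase 1": "clinical_trial",  # example entries of the trial-related group
    "Phase 2": "clinical_trial",
    "Phase 3": "clinical_trial",
}

# Placeholder substrings (lowercase, matched case-insensitively).
EXCLUDED_SUBSTRINGS = ("hypothetical-substring",)


def status_group(record: Mapping[str, str],
                 drug_to_pubchem: Mapping[str, List[str]]) -> Optional[str]:
    """Return the status group of a P1-05 record, or None if it is filtered out."""
    group = CLINICAL_STATUS_MAP.get(record["clinical_status"])
    if group is None:  # unmapped status, e.g. 'Application submitted'
        return None
    if not drug_to_pubchem.get(record["drug_id"]):  # no PUBCHEM.COMPOUND mapping
        return None
    name = record["indication_name"].strip()
    if not name or name == "#N/A":  # no usable indication name
        return None
    if any(s in name.lower() for s in EXCLUDED_SUBSTRINGS):
        return None
    return group
```

Records that pass this check would then go to NameResolver, which is the last P1-05 filter in the table.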
Future Content Considerations
edge_content: Review the clinical status values that were not mapped (see the transform-metadata output file or the Jupyter notebook's 'Map clinical_status' section). Reach consensus on how to handle these values (whether to ingest them and how to model them). - Relevant files: P1-05-Drug_disease.txt
edge_content: Review the MOA values that were not mapped (see the transform-metadata output file or the Jupyter notebook's 'More MOA parsing' section). Reach consensus on how to handle these values (whether to ingest them and how to model them). - Relevant files: P1-07-Drug-TargetMapping.xlsx
edge_property_content: Add one or more edge properties that capture the original clinical status more precisely than the predicate alone. Colleen Xu did not include this in the first pass at ingestion because (1) this ingest already took more time and effort than expected, (2) there are multiple candidate properties and Colleen was unsure which to use, and (3) in many cases no property enum value matched the TTD clinical status. - Relevant files: P1-05-Drug_disease.txt
other: If NameResolver's scoring changes, the threshold used to remove low-quality hits will need to be adjusted. - Relevant files: P1-05-Drug_disease.txt
edge_content: Consider adding a predicate for 'indications', which would better fit P1-05's data. See https://github.com/NCATSTranslator/Data-Ingest-Coordination-Working-Group/issues/30#issuecomment-3515893236 for details. - Relevant files: P1-05-Drug_disease.txt
edge_content: Could ingest another file 'Target to compound mapping with activity data', which contains chemical/drug 'affects' protein associations. Involves more parsing and filtering work. See https://github.com/NCATSTranslator/Data-Ingest-Coordination-Working-Group/issues/30#issuecomment-3209860820 for details. - Relevant files: P1-09-Target_compound_activity.txt
Additional Notes: Parsing P1-02 did not increase the number of TTD drug ID mappings or the number of successfully node-normalized entities (see the Jupyter notebook's P1-02 section for details).
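The NameResolver score threshold is easier to re-tune after a scoring change when it is a single named parameter. The following is a minimal sketch of that filter. It assumes the hits for one indication name have already been retrieved as a list of dicts with `curie` and `score` keys. That is the general shape of NameResolver lookup results, but it should be verified against the current API. `SCORE_THRESHOLD` is a hypothetical value and is not the threshold used in this ingest. The sketch keeps the highest-scoring hit only if that hit meets the threshold.

```python
from typing import Any, Dict, List, Optional

# Hypothetical threshold; must be re-tuned if NameResolver's scoring changes.
SCORE_THRESHOLD = 50.0


def best_hit(hits: List[Dict[str, Any]],
             threshold: float = SCORE_THRESHOLD) -> Optional[str]:
    """Return the CURIE of the highest-scoring hit if it meets the threshold."""
    if not hits:
        return None  # no Translator entity matched the indication name
    top = max(hits, key=lambda h: h.get("score", 0.0))
    if top.get("score", 0.0) < threshold:
        return None  # low-quality match: the record is dropped
    return top["curie"]
```

Passing `threshold` explicitly makes it straightforward to compare mapping yield across candidate values before changing the default.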
Target Information
Target InfoRes ID: infores:translator-ttd
Edge Types
| Subject Categories | Predicate | Object Categories | Knowledge Level | Agent Type | UI Explanation |
|---|---|---|---|---|---|
| biolink:ChemicalEntity | | biolink:DiseaseOrPhenotypicFeature | knowledge_assertion | manual_agent | The TTD curators assigned this relationship a clinical status of 'Approved', 'Approved (orphan drug)', 'Approved in China', 'Approved in EU', or 'Phase 4'. |
| biolink:ChemicalEntity | | biolink:DiseaseOrPhenotypicFeature | knowledge_assertion | manual_agent | The TTD curators assigned this relationship a clinical status of 'Investigative' or 'Patented'. |
| biolink:ChemicalEntity | | biolink:DiseaseOrPhenotypicFeature | knowledge_assertion | manual_agent | The TTD curators assigned this relationship a clinical status of 'Preclinical' or 'IND submitted'. |
| biolink:ChemicalEntity | | biolink:DiseaseOrPhenotypicFeature | knowledge_assertion | manual_agent | The TTD curators assigned this relationship a clinical status related to clinical trials (a specific phase, registration/preregistration, or a submission for approval). |
| biolink:ChemicalEntity | | biolink:Gene, biolink:Protein | knowledge_assertion | manual_agent | The TTD curators associated this chemical or drug with its therapeutic target (as reported in the literature). The qualifier set is based on the mechanism of action reported by TTD. |
| biolink:ChemicalEntity | | biolink:Gene, biolink:Protein | knowledge_assertion | manual_agent | The TTD curators associated this chemical or drug with its therapeutic target (as reported in the literature). TTD did not report a mechanism of action. |
| biolink:ChemicalEntity | | biolink:Gene, biolink:Protein | knowledge_assertion | manual_agent | The TTD curators associated this chemical or drug with its therapeutic target (as reported in the literature). The mechanism of action reported by TTD either (1) corresponds to the predicate and causal_mechanism_qualifier, or (2) implies a physical interaction (so Translator generated an additional edge to represent it). |
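The three drug-target rows reflect a lookup from a modified MOA value to a modeling pattern made of a predicate, a qualifier set, and an optional extra edge predicate. This is what MOA_MAPPING is described as doing. The following is a minimal sketch of that pattern. The entries are illustrative Biolink values chosen for this example and are not the ingest's actual MOA_MAPPING. The normalization step (trim and lowercase) is an assumption about what "modified MOA values" means. The empty-string entry stands for records without an MOA, and its predicate is likewise hypothetical.

```python
from typing import Dict, List, NamedTuple, Optional


class Pattern(NamedTuple):
    predicate: str
    qualifiers: Dict[str, str]
    extra_predicate: Optional[str]  # e.g. an added physical-interaction edge


# Illustrative entries only; NOT the ingest's real MOA_MAPPING.
MOA_MAPPING: Dict[str, Pattern] = {
    "inhibitor": Pattern(
        "biolink:affects",
        {"object_aspect_qualifier": "activity",
         "object_direction_qualifier": "decreased",
         "causal_mechanism_qualifier": "inhibition"},
        "biolink:directly_physically_interacts_with",
    ),
    "": Pattern("biolink:interacts_with", {}, None),  # hypothetical: no MOA given
}


def moa_edges(subject: str, obj: str, moa: str) -> List[dict]:
    """Build the primary edge, plus any extra edge, for one drug-target record.

    Returns an empty list when the MOA has no mapping (the record is filtered).
    """
    pattern = MOA_MAPPING.get(moa.strip().lower())
    if pattern is None:
        return []
    edges = [{"subject": subject, "predicate": pattern.predicate,
              "object": obj, **pattern.qualifiers}]
    if pattern.extra_predicate:
        edges.append({"subject": subject, "predicate": pattern.extra_predicate,
                      "object": obj})
    return edges
```

Keeping the mapping as data rather than branching code means that reviewing unmapped MOA values (see Future Content Considerations) only requires adding entries.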
Node Types
| Node Category | Source Identifier Types | Additional Notes |
|---|---|---|
| biolink:ChemicalEntity | PUBCHEM.COMPOUND | Original IDs are TTD drug IDs. A TTD mapping file (P1-03) is used to map them to PUBCHEM.COMPOUND IDs. |
| biolink:DiseaseOrPhenotypicFeature | DOID, EFO, HP, MONDO, NCIT, OMIM | IDs were obtained by running NameResolver on the indication name. The data's 'icd-11 IDs' were not used because (1) they are codes, not the actual ICD-11 foundation URIs; (2) NodeNorm currently has very little support for ICD-11 IDs (and the few it recognizes are foundation URIs); and (3) some TTD values are a range of codes that are not listed individually, which cannot be handled (e.g., for 'solid tumour/cancer', TTD uses '2A00-2F9Z', i.e., codes '2A00' through '2F9Z'). |
| biolink:Gene | UniProtKB | Original IDs are TTD target IDs. A TTD mapping file (P2-01) and NameResolver are used to get UniProt IDs that NodeNorm can normalize. Some are non-human. With gene/protein conflation, NodeNorm normalizes some of these to Gene entities. |
| biolink:Protein | UniProtKB | Original IDs are TTD target IDs. A TTD mapping file (P2-01) and NameResolver are used to get UniProt IDs that NodeNorm can normalize. Some are non-human. |
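One reason the ICD-11 values were set aside is that some of them are code ranges rather than individual codes. The following is a minimal sketch of telling the two apart, assuming values are formatted like the '2A00-2F9Z' example. The regular expression is only a rough approximation of ICD-11 code syntax and is not an official pattern. The function name `classify_icd11` is hypothetical. Expanding a range into its member codes would require the full ICD-11 code list, so the sketch only flags ranges.

```python
import re
from typing import Tuple

# Rough approximation of an ICD-11 code such as '2A00' or 'CA23.3'; not official.
_CODE = r"[0-9A-Z]{4}(?:\.[0-9A-Z]+)?"
_RANGE = re.compile(rf"^({_CODE})-({_CODE})$")
_SINGLE = re.compile(rf"^{_CODE}$")


def classify_icd11(value: str) -> Tuple[str, Tuple[str, ...]]:
    """Classify a TTD ICD-11 value as 'single', 'range', or 'unparsed'."""
    value = value.strip()
    m = _RANGE.match(value)
    if m:
        return "range", (m.group(1), m.group(2))  # endpoints only, not expanded
    if _SINGLE.match(value):
        return "single", (value,)
    return "unparsed", ()
```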
Future Modeling Considerations
node_properties: TTD has files with information on drugs and therapeutic target proteins (P1-01 targets, P1-02 drugs). This could potentially be used for node properties (but it may be better to use existing resources that are updated more frequently).
edge_content: Ingest the P1-06 file for Drug-target_for-Disease edges. This would be a chance to explore the utility and modeling of this new type of edge/knowledge, and it may require some modeling work or Biolink updates. Matt's understanding is that the targets in this file are necessarily mechanistic for the disease, which is new and valuable information.
Provenance Information
Contributors:
- Colleen Xu: code author, data modeling
- Andrew Su: code support, domain expertise
- Matthew Brush: data modeling, domain expertise
Artifacts:
- https://github.com/NCATSTranslator/Data-Ingest-Coordination-Working-Group/issues/30
- Notebooks for development work, currently in the parser code directory