SemMedDB Minimal Reference Ingest Guide (RIG)
Source Information
InfoRes ID: infores:semmeddb
Description: Literature-derived semantic predications extracted by SemRep from PubMed; this RIG documents a post-processed RTX-KG2 edges subset rather than a raw MySQL ingest.
Data Access Locations:
- SemMedDB downloads: https://lhncbc.nlm.nih.gov/temp/SemRep_SemMedDB_SKR/SemMedDB_download.html
Data Provision Mechanisms: file_download
Data Formats: jsonl
Data Versioning and Releases: Versioning inherited from the RTX-KG2 snapshot built from SemMedDB VER43-era releases.
Ingest Information
Ingest Categories: primary_knowledge_provider
Utility: Large-scale literature-derived relationships suitable for hypothesis generation and cross-graph enrichment with evidence-bearing edges.
Scope: Edges taken directly from RTX-KG2's SemMedDB slice (nodes.jsonl/edges.jsonl), restricted to a selected predicate set and evidence-bearing records.
Relevant Files
| File Name | Location | Description |
|---|---|---|
| kg2.10.3-semmeddb-edges.jsonl (RTX-KG2 SemMedDB slice) | internal graph build artifact | KGX-compatible edges with SemMedDB predicate CURIEs and literature evidence. |
| kg2.10.3-semmeddb-nodes.jsonl (RTX-KG2 SemMedDB slice) | internal graph build artifact | KGX-compatible nodes referenced by the edges subset. |
Included Content
| File Name | Included Records | Fields Used |
|---|---|---|
| kg2.10.3-semmeddb-edges.jsonl | Edges whose Biolink predicate is in the selected set: biolink:treats_or_applied_or_studied_to_treat, biolink:affects, biolink:preventative_for_condition, biolink:coexists_with, biolink:causes, biolink:related_to, biolink:interacts_with, biolink:located_in, biolink:predisposes_to_condition, biolink:disrupts. These are mapped from the following source SEMMEDDB predicates: SEMMEDDB:ADMINISTERED_TO, SEMMEDDB:AFFECTS, SEMMEDDB:ASSOCIATED_WITH, SEMMEDDB:AUGMENTS, SEMMEDDB:CAUSES, SEMMEDDB:COEXISTS_WITH, SEMMEDDB:COMPARED_WITH, SEMMEDDB:DISRUPTS, SEMMEDDB:HIGHER_THAN, SEMMEDDB:INHIBITS, SEMMEDDB:INTERACTS_WITH, SEMMEDDB:ISA, SEMMEDDB:LOCATION_OF (invert), SEMMEDDB:LOWER_THAN, SEMMEDDB:MEASURES, SEMMEDDB:PREDISPOSES, SEMMEDDB:PREVENTS, SEMMEDDB:STIMULATES, SEMMEDDB:TREATS, and SEMMEDDB:XREF. During transform, biolink:preventative_for_condition is remapped to biolink:treats_or_applied_or_studied_to_treat. | subject, predicate, object, publications, publications_info, negated, domain_range_exclusion, subject_novelty, object_novelty |
| kg2.10.3-semmeddb-nodes.jsonl | Only nodes referenced by included edges. | id, name, category, xrefs (as available) |
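
The selection described in the table above can be sketched in Python. This is a minimal illustration, not the ingest's actual code: the function names (`read_jsonl`, `select_edges`, `referenced_node_ids`) are hypothetical, and it assumes each JSON line is an object carrying the `subject`, `predicate`, and `object` fields listed under Fields Used.

```python
import json
from typing import Any, Dict, Iterable, Iterator, Set

# Biolink predicates listed in the Included Content table.
INCLUDED_PREDICATES = {
    "biolink:treats_or_applied_or_studied_to_treat",
    "biolink:affects",
    "biolink:preventative_for_condition",
    "biolink:coexists_with",
    "biolink:causes",
    "biolink:related_to",
    "biolink:interacts_with",
    "biolink:located_in",
    "biolink:predisposes_to_condition",
    "biolink:disrupts",
}

# Remapping applied during transform, as stated in the table.
PREDICATE_REMAP = {
    "biolink:preventative_for_condition": "biolink:treats_or_applied_or_studied_to_treat",
}


def read_jsonl(path: str) -> Iterator[Dict[str, Any]]:
    """Yield one record per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield json.loads(line)


def select_edges(edges: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Keep edges in the selected predicate set and apply the remap."""
    for edge in edges:
        predicate = edge.get("predicate")
        if predicate not in INCLUDED_PREDICATES:
            continue
        # Copy the edge so the caller's record is not mutated.
        yield {**edge, "predicate": PREDICATE_REMAP.get(predicate, predicate)}


def referenced_node_ids(edges: Iterable[Dict[str, Any]]) -> Set[str]:
    """Collect the node IDs used as subject or object of the given edges."""
    ids: Set[str] = set()
    for edge in edges:
        ids.add(edge["subject"])
        ids.add(edge["object"])
    return ids
```

A caller would typically run `kept = list(select_edges(read_jsonl("kg2.10.3-semmeddb-edges.jsonl")))` and then keep only those node records whose `id` is in `referenced_node_ids(kept)`.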
Filtered Content
| File Name | Filtered Records | Rationale |
|---|---|---|
| kg2.10.3-semmeddb-edges.jsonl | Edges with predicates outside the selected set: biolink:exacerbates_condition, biolink:derives_from, biolink:diagnoses, biolink:produces, biolink:close_match, biolink:has_input, biolink:manifestation_of, biolink:occurs_in, biolink:has_part, biolink:precedes. These are mapped from the following source SEMMEDDB predicates: SEMMEDDB:COMPLICATES, SEMMEDDB:CONVERTS_TO (invert), SEMMEDDB:DIAGNOSES, SEMMEDDB:PRODUCES, SEMMEDDB:SAME_AS, SEMMEDDB:USES, SEMMEDDB:MANIFESTATION_OF, SEMMEDDB:OCCURS_IN, SEMMEDDB:PROCESS_OF, SEMMEDDB:PART_OF (invert), SEMMEDDB:PRECEDES. SEMMEDDB predicates dropped in the original RTX-KG2 ingest include: SEMMEDDB:MEASUREMENT_OF, SEMMEDDB:METHOD_OF, SEMMEDDB:NOM, SEMMEDDB:PREP, SEMMEDDB:VERB. | Reduce predicate heterogeneity to a core set used operationally. |
| kg2.10.3-semmeddb-edges.jsonl | Edges lacking publication evidence payload | Ensure each edge has traceable literature support. |
| kg2.10.3-semmeddb-edges.jsonl | Self-loops for specific relation types | Avoid trivial or uninformative self-relations. |
| kg2.10.3-semmeddb-edges.jsonl | Domain/range violations per exclusion configuration when present | Improve semantic coherence against Biolink-like domain/range expectations. |
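
The last three filter rows can be read as a per-edge predicate. The sketch below is illustrative only and rests on several assumptions: that "lacking publication evidence payload" means an empty or missing `publications` or `publications_info` field, that a truthy `domain_range_exclusion` value marks a violation, and that the relation types checked for self-loops are supplied by the caller, since this guide does not list them. The names `has_publication_evidence` and `keep_edge` are hypothetical.

```python
from typing import Any, Dict, FrozenSet


def has_publication_evidence(edge: Dict[str, Any]) -> bool:
    """Assumption: evidence means non-empty publications AND publications_info."""
    return bool(edge.get("publications")) and bool(edge.get("publications_info"))


def keep_edge(edge: Dict[str, Any], self_loop_predicates: FrozenSet[str]) -> bool:
    """Return True if the edge survives the evidence, self-loop, and
    domain/range filters.

    self_loop_predicates is caller-supplied: the guide only says
    "specific relation types" without naming them.
    """
    if not has_publication_evidence(edge):
        return False
    is_self_loop = edge.get("subject") == edge.get("object")
    if is_self_loop and edge.get("predicate") in self_loop_predicates:
        return False
    # Assumption: a truthy flag marks a domain/range violation; a missing
    # flag is treated as "no exclusion configured" and the edge is kept.
    if edge.get("domain_range_exclusion"):
        return False
    return True
```

Passing the self-loop predicates as an argument keeps the unspecified part of the rule visible at the call site instead of hiding a guessed default inside the function.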
Target Information
Target InfoRes ID: infores:translator-semmeddb-kgx
Edge Types
| Subject Categories | Predicate | Object Categories | Knowledge Level | Agent Type | UI Explanation |
|---|---|---|---|---|---|
| | | | not_provided | text_mining_agent | SemMedDB data are generated by an automated text-mining agent that extracts relationships from biomedical literature. The record represents text evidence that [SUBJECT] and [OBJECT] were reported with [RELATIONSHIP] in the source literature. |
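
As a rough illustration of how the Knowledge Level and Agent Type values above might be attached to each output edge, the sketch below stamps them onto an edge record and fills the UI explanation template. The helper names `annotate_edge` and `explain` are hypothetical. The property names `knowledge_level`, `agent_type`, and `primary_knowledge_source` are assumed to follow Biolink edge slots, and using `infores:semmeddb` as the primary knowledge source is an assumption based on the Source Information section, not something this guide states for the output edges.

```python
from typing import Any, Dict

# Second sentence of the UI Explanation in the Edge Types table.
UI_TEMPLATE = (
    "The record represents text evidence that [SUBJECT] and [OBJECT] "
    "were reported with [RELATIONSHIP] in the source literature."
)


def annotate_edge(edge: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the edge with provenance qualifiers attached."""
    return {
        **edge,
        "knowledge_level": "not_provided",
        "agent_type": "text_mining_agent",
        # Assumption: SemMedDB is recorded as the primary knowledge source.
        "primary_knowledge_source": "infores:semmeddb",
    }


def explain(subject: str, relationship: str, obj: str) -> str:
    """Fill the bracketed placeholders of the UI explanation template."""
    return (
        UI_TEMPLATE.replace("[SUBJECT]", subject)
        .replace("[OBJECT]", obj)
        .replace("[RELATIONSHIP]", relationship)
    )
```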
Node Types
| Node Category | Source Identifier Types | Additional Notes |
|---|---|---|
| biolink:SmallMolecule | ||
| biolink:Drug | ||
| biolink:ChemicalEntity | ||
| biolink:MolecularMixture | ||
| biolink:ChemicalMixture | ||
| biolink:Cell | ||
| biolink:Disease | ||
| biolink:ComplexMolecularMixture | ||
| biolink:PhenotypicFeature | ||
| biolink:BiologicalProcess | ||
| biolink:MolecularActivity | ||
| biolink:CellularComponent | ||
| biolink:Gene | ||
| biolink:OrganismTaxon | ||
| biolink:Protein | ||
| biolink:GrossAnatomicalStructure | ||
| biolink:AnatomicalEntity | ||
| biolink:Polypeptide | ||
| biolink:InformationContentEntity | ||
| biolink:Procedure | ||
| biolink:Behavior | ||
| biolink:Agent | ||
| biolink:Activity | ||
| biolink:Device | ||
| biolink:Cohort | ||
| biolink:PopulationOfIndividualOrganisms | ||
| biolink:Phenomenon | ||
| biolink:GenomicEntity | ||
| biolink:BiologicalEntity | ||
| biolink:Publication | ||
| biolink:NucleicAcidEntity | ||
| biolink:ClinicalAttribute | ||
| biolink:PhysicalEntity | ||
| biolink:DiseaseOrPhenotypicFeature | ||
| biolink:Human | ||
| biolink:Event | ||
| biolink:PhysiologicalProcess | ||
Provenance Information
Contributors:
- Erica Wood: code author
- Evan Morris: code support
- Adilbek Bazarkulov: code support, domain expertise
- Sierra Moxon: data modeling, domain expertise
- Matthew Brush: data modeling, domain expertise
Artifacts:
- SemMedDB overview: https://lhncbc.nlm.nih.gov/temp/SemRep_SemMedDB_SKR/dbinfo.html
- Biolink Model (schema): https://github.com/biolink/biolink-model
- KGX documentation: https://kgx.readthedocs.io
- Summary of predicates included and filtered and their mappings: https://docs.google.com/spreadsheets/d/12XmPE9eJp3H7yJnwg5Wmx5-BZdXXbiwO02vDRc1BY-c/edit?gid=520297121#gid=520297121