SemMedDB Minimal Reference Ingest Guide (RIG)

Source Information

InfoRes ID: infores:semmeddb

Description: Literature-derived semantic predications extracted by SemRep from PubMed; this RIG documents a post-processed RTX-KG2 edges subset rather than a raw MySQL ingest.

Data Access Locations:
- SemMedDB downloads: https://lhncbc.nlm.nih.gov/temp/SemRep_SemMedDB_SKR/SemMedDB_download.html

Data Provision Mechanisms: file_download

Data Formats: jsonl

Data Versioning and Releases: Versioning inherited from the RTX-KG2 snapshot built from SemMedDB VER43-era releases.

Ingest Information

Ingest Categories: primary_knowledge_provider

Utility: Large-scale literature-derived relationships suitable for hypothesis generation and cross-graph enrichment with evidence-bearing edges.

Scope: Edges taken directly from RTX-KG2’s SemMedDB slice (nodes.jsonl/edges.jsonl), restricted to a selected predicate set and evidence-bearing records.

Relevant Files

| File Name | Location | Description |
| --- | --- | --- |
| kg2.10.3-semmeddb-edges.jsonl (RTX-KG2 SemMedDB slice) | internal graph build artifact | KGX-compatible edges with SemMedDB predicate CURIEs and literature evidence. |
| kg2.10.3-semmeddb-nodes.jsonl (RTX-KG2 SemMedDB slice) | internal graph build artifact | KGX-compatible nodes referenced by the edges subset. |

Included Content

| File Name | Included Records | Fields Used |
| --- | --- | --- |
| kg2.10.3-semmeddb-edges.jsonl | Edges whose predicate is in the selected predicate set; edges retain evidence and negation flags. | subject, predicate, object, publications, publications_info, negated, domain_range_exclusion, provided_by, knowledge_level, agent_type |
| kg2.10.3-semmeddb-nodes.jsonl | Only nodes referenced by included edges. | id, name, category, xrefs (as available) |
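
The node rule above (keep only nodes referenced by included edges) can be sketched as a two-step pass: collect the subject and object identifiers from the retained edges, then keep the matching nodes and project them onto the listed fields. The function names below are hypothetical, and the sketch assumes each edge carries `subject` and `object` keys holding node identifiers.

```python
from typing import Any, Dict, Iterable, List, Set

# Node fields listed in the table above; xrefs is kept only when present.
NODE_FIELDS = ("id", "name", "category", "xrefs")


def referenced_node_ids(edges: Iterable[Dict[str, Any]]) -> Set[str]:
    """Collect every identifier used as a subject or object of an edge."""
    ids: Set[str] = set()
    for edge in edges:
        ids.add(edge["subject"])
        ids.add(edge["object"])
    return ids


def select_nodes(
    nodes: Iterable[Dict[str, Any]], keep_ids: Set[str]
) -> List[Dict[str, Any]]:
    """Keep referenced nodes only, reduced to the fields in NODE_FIELDS."""
    selected = []
    for node in nodes:
        if node.get("id") in keep_ids:
            selected.append({k: node[k] for k in NODE_FIELDS if k in node})
    return selected
```

Because the identifier set is built from the already-filtered edges, any node that is left without an edge after filtering is dropped automatically.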

Filtered Content

| File Name | Filtered Records | Rationale |
| --- | --- | --- |
| kg2.10.3-semmeddb-edges.jsonl | Edges with predicates outside the selected set | Reduce predicate heterogeneity to a core set used operationally. |
| kg2.10.3-semmeddb-edges.jsonl | Edges lacking publication evidence payload | Ensure each edge has traceable literature support. |
| kg2.10.3-semmeddb-edges.jsonl | Self-loops for specific relation types | Avoid trivial or uninformative self-relations. |
| kg2.10.3-semmeddb-edges.jsonl | Domain/range violations per exclusion configuration when present | Improve semantic coherence against Biolink-like domain/range expectations. |
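
The four filters above can be read as a single pass over the edges. The sketch below is illustrative only and rests on several assumptions that this guide does not confirm: that `publications` is a list and `publications_info` a mapping, that an edge "lacks publication evidence payload" when either is empty or missing, that `domain_range_exclusion` is a boolean flag, and that the relation types subject to the self-loop rule are supplied by the caller (the guide does not enumerate them). The function and parameter names are hypothetical.

```python
from typing import Any, Dict, Iterable, Iterator, Set


def filter_edges(
    edges: Iterable[Dict[str, Any]],
    selected_predicates: Set[str],
    self_loop_predicates: Set[str],
    apply_domain_range_exclusion: bool = True,
) -> Iterator[Dict[str, Any]]:
    """Yield only the edges that pass all four filters."""
    for edge in edges:
        predicate = edge.get("predicate")

        # 1. Predicate must be in the selected set.
        if predicate not in selected_predicates:
            continue

        # 2. Edge must carry publication evidence (assumed: both fields non-empty).
        if not edge.get("publications") or not edge.get("publications_info"):
            continue

        # 3. Drop self-loops, but only for the caller-specified relation types.
        if edge.get("subject") == edge.get("object") and predicate in self_loop_predicates:
            continue

        # 4. Drop flagged domain/range violations when the exclusion is enabled.
        if apply_domain_range_exclusion and edge.get("domain_range_exclusion"):
            continue

        yield edge
```

Ordering the cheap predicate check first means most out-of-scope edges are discarded before the other fields are inspected.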

Target Information

Target InfoRes ID: infores:translator-semmeddb-kgx

Edge Types

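| Subject Categories | Predicate | Object Categories | Knowledge Level | Agent Type | UI Explanation |
| --- | --- | --- | --- | --- | --- |
|  | biolink:affects |  |  |  |  |
|  | biolink:located_in |  |  |  |  |
|  | biolink:related_to |  |  |  |  |
|  | biolink:interacts_with |  |  |  |  |
|  | biolink:coexists_with |  |  |  |  |
|  | biolink:treats_or_applied_or_studied_to_treat |  |  |  |  |
|  | biolink:causes |  |  |  |  |
|  | biolink:disrupts |  |  |  |  |
|  | biolink:predisposes_to_condition |  |  |  |  |
|  | biolink:preventative_for_condition |  |  |  |  |
|  | biolink:associated_with |  |  |  |  |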
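
As a quality-control aid, the predicates listed above can be held as a constant and used to tally output edges, flagging any predicate that falls outside the list. This is a hedged sketch rather than part of the documented pipeline; `TARGET_PREDICATES` and `tally_predicates` are hypothetical names, and the sketch assumes each output edge stores its predicate under a `predicate` key.

```python
from collections import Counter
from typing import Any, Dict, Iterable, Tuple

# The eleven target predicates listed under Edge Types.
TARGET_PREDICATES = frozenset({
    "biolink:affects",
    "biolink:located_in",
    "biolink:related_to",
    "biolink:interacts_with",
    "biolink:coexists_with",
    "biolink:treats_or_applied_or_studied_to_treat",
    "biolink:causes",
    "biolink:disrupts",
    "biolink:predisposes_to_condition",
    "biolink:preventative_for_condition",
    "biolink:associated_with",
})


def tally_predicates(edges: Iterable[Dict[str, Any]]) -> Tuple[Counter, Counter]:
    """Return (counts for expected predicates, counts for unexpected ones)."""
    expected: Counter = Counter()
    unexpected: Counter = Counter()
    for edge in edges:
        predicate = edge.get("predicate")
        if predicate in TARGET_PREDICATES:
            expected[predicate] += 1
        else:
            unexpected[predicate] += 1
    return expected, unexpected
```

A non-empty second counter would indicate that an edge with a predicate outside the listed set reached the output.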

Provenance Information

Contributors:
- Erica Wood: code author
- Evan Morris: code support
- Adilbek Bazarkulov: code support, domain expertise
- Sierra Moxon: data modeling, domain expertise
- Matthew Brush: data modeling, domain expertise

Artifacts:
- SemMedDB overview: https://lhncbc.nlm.nih.gov/temp/SemRep_SemMedDB_SKR/dbinfo.html
- Biolink Model (schema): https://github.com/biolink/biolink-model
- KGX documentation: https://kgx.readthedocs.io