SemMedDB Minimal Reference Ingest Guide (RIG)
Source Information
InfoRes ID: infores:semmeddb
Description: Literature-derived semantic predications extracted by SemRep from PubMed; this RIG documents a post-processed RTX-KG2 edges subset rather than a raw MySQL ingest.
Data Access Locations:
- SemMedDB downloads: https://lhncbc.nlm.nih.gov/temp/SemRep_SemMedDB_SKR/SemMedDB_download.html
Data Provision Mechanisms: file_download
Data Formats: jsonl
Data Versioning and Releases: Versioning is inherited from the RTX-KG2 snapshot (KG2.10.3, per the file names under Relevant Files), which was built from SemMedDB VER43-era releases.
Ingest Information
Ingest Categories: primary_knowledge_provider
Utility: Large-scale literature-derived relationships suitable for hypothesis generation and cross-graph enrichment with evidence-bearing edges.
Scope: Edges taken directly from RTX-KG2’s SemMedDB slice (nodes.jsonl/edges.jsonl), restricted to a selected predicate set and evidence-bearing records.
Relevant Files
| File Name | Location | Description |
|---|---|---|
| kg2.10.3-semmeddb-edges.jsonl (RTX-KG2 SemMedDB slice) | internal graph build artifact | KGX-compatible edges with SemMedDB predicate CURIEs and literature evidence. |
| kg2.10.3-semmeddb-nodes.jsonl (RTX-KG2 SemMedDB slice) | internal graph build artifact | KGX-compatible nodes referenced by the edges subset. |
Included Content
| File Name | Included Records | Fields Used |
|---|---|---|
| kg2.10.3-semmeddb-edges.jsonl | Edges whose predicate is in the selected predicate set; edges retain evidence and negation flags. | subject, predicate, object, publications, publications_info, negated, domain_range_exclusion, provided_by, knowledge_level, agent_type |
| kg2.10.3-semmeddb-nodes.jsonl | Only nodes referenced by included edges. | id, name, category, xrefs (as available) |
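The node restriction in the table above can be sketched as a two-pass read: collect every ID used as a `subject` or `object` in the included edges, then keep only nodes whose `id` is in that set. This is a minimal illustration, not the ingest's actual code; the helper names (`iter_jsonl`, `referenced_node_ids`, `select_nodes`) and the sample records are hypothetical.

```python
import json
from typing import Dict, Iterable, Iterator, List, Set


def iter_jsonl(lines: Iterable[str]) -> Iterator[Dict]:
    """Yield one parsed record per non-blank JSONL line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def referenced_node_ids(edges: Iterable[Dict]) -> Set[str]:
    """Collect every node ID that appears as an edge subject or object."""
    ids: Set[str] = set()
    for edge in edges:
        ids.add(edge["subject"])
        ids.add(edge["object"])
    return ids


def select_nodes(nodes: Iterable[Dict], keep_ids: Set[str]) -> List[Dict]:
    """Keep only nodes whose id is referenced by an included edge."""
    return [node for node in nodes if node["id"] in keep_ids]


# Made-up records for illustration only (not real SemMedDB content).
EDGE_LINES = [
    '{"subject": "EX:1", "predicate": "biolink:affects", "object": "EX:2"}',
    "",
    '{"subject": "EX:2", "predicate": "biolink:causes", "object": "EX:3"}',
]
NODE_LINES = [
    '{"id": "EX:1", "name": "node one", "category": "biolink:NamedThing"}',
    '{"id": "EX:2", "name": "node two", "category": "biolink:NamedThing"}',
    '{"id": "EX:3", "name": "node three", "category": "biolink:NamedThing"}',
    '{"id": "EX:4", "name": "unreferenced", "category": "biolink:NamedThing"}',
]

keep_ids = referenced_node_ids(iter_jsonl(EDGE_LINES))
kept_nodes = select_nodes(iter_jsonl(NODE_LINES), keep_ids)
print([node["id"] for node in kept_nodes])
```

With real files, an open text file handle can be passed to `iter_jsonl` in place of the in-memory lists, since file objects iterate line by line.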
Filtered Content
| File Name | Filtered Records | Rationale |
|---|---|---|
| kg2.10.3-semmeddb-edges.jsonl | Edges with predicates outside the selected set | Reduce predicate heterogeneity to a core set used operationally. |
| kg2.10.3-semmeddb-edges.jsonl | Edges lacking publication evidence payload | Ensure each edge has traceable literature support. |
| kg2.10.3-semmeddb-edges.jsonl | Self-loops for specific relation types | Avoid trivial or uninformative self-relations. |
| kg2.10.3-semmeddb-edges.jsonl | Domain/range violations per exclusion configuration when present | Improve semantic coherence against Biolink-like domain/range expectations. |
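The four filters above can be sketched as a single function that returns the reason an edge is dropped, or `None` when it is kept. This is a hedged illustration under stated assumptions, not the ingest's actual implementation: the selected predicate set is taken from the Edge Types table of this guide, an edge is assumed to lack evidence when either `publications` or `publications_info` is empty, `domain_range_exclusion` is assumed to be a boolean flag on the edge, and the predicates subject to self-loop removal are passed in as a parameter because the guide does not list them. The function name `filter_reason` and the sample records are hypothetical.

```python
from typing import Dict, FrozenSet, Optional

# Predicates listed in the Edge Types table of this guide.
SELECTED_PREDICATES = frozenset({
    "biolink:affects",
    "biolink:located_in",
    "biolink:related_to",
    "biolink:interacts_with",
    "biolink:coexists_with",
    "biolink:treats_or_applied_or_studied_to_treat",
    "biolink:causes",
    "biolink:disrupts",
    "biolink:predisposes_to_condition",
    "biolink:preventative_for_condition",
    "biolink:associated_with",
})


def filter_reason(edge: Dict, self_loop_predicates: FrozenSet[str]) -> Optional[str]:
    """Return why an edge would be filtered, or None if it is kept."""
    predicate = edge.get("predicate")
    if predicate not in SELECTED_PREDICATES:
        return "predicate_not_selected"
    # Assumption: both evidence fields must be non-empty.
    if not edge.get("publications") or not edge.get("publications_info"):
        return "no_publication_evidence"
    if edge.get("subject") == edge.get("object") and predicate in self_loop_predicates:
        return "self_loop"
    # Assumption: a truthy flag marks a domain/range violation.
    if edge.get("domain_range_exclusion"):
        return "domain_range_exclusion"
    return None


# Placeholder choice for illustration; the real list is not given in this guide.
EXAMPLE_SELF_LOOP_PREDICATES = frozenset({"biolink:interacts_with"})

# Made-up evidence payload; the real publications_info structure may differ.
EVIDENCE = {
    "publications": ["PMID:EXAMPLE1"],
    "publications_info": {"PMID:EXAMPLE1": {"sentence": "made-up sentence"}},
}

sample_edges = [
    {"subject": "EX:1", "predicate": "biolink:affects", "object": "EX:2", **EVIDENCE},
    {"subject": "EX:1", "predicate": "biolink:subclass_of", "object": "EX:2", **EVIDENCE},
    {"subject": "EX:1", "predicate": "biolink:causes", "object": "EX:2",
     "publications": [], "publications_info": {}},
    {"subject": "EX:1", "predicate": "biolink:interacts_with", "object": "EX:1", **EVIDENCE},
    {"subject": "EX:1", "predicate": "biolink:causes", "object": "EX:2",
     "domain_range_exclusion": True, **EVIDENCE},
]

reasons = [filter_reason(edge, EXAMPLE_SELF_LOOP_PREDICATES) for edge in sample_edges]
kept = [edge for edge, reason in zip(sample_edges, reasons) if reason is None]
print(reasons)
```

Returning a reason string rather than a bare boolean makes it straightforward to tally how many edges each rule removes, which helps when checking the filters against the rationale column.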
Target Information
Target InfoRes ID: infores:translator-semmeddb-kgx
Edge Types
| Subject Categories | Predicate | Object Categories | Knowledge Level | Agent Type | UI Explanation |
|---|---|---|---|---|---|
| | biolink:affects | | | | |
| | biolink:located_in | | | | |
| | biolink:related_to | | | | |
| | biolink:interacts_with | | | | |
| | biolink:coexists_with | | | | |
| | biolink:treats_or_applied_or_studied_to_treat | | | | |
| | biolink:causes | | | | |
| | biolink:disrupts | | | | |
| | biolink:predisposes_to_condition | | | | |
| | biolink:preventative_for_condition | | | | |
| | biolink:associated_with | | | | |
Provenance Information
Contributors:
- Erica Wood: code author
- Evan Morris: code support
- Adilbek Bazarkulov: code support, domain expertise
- Sierra Moxon: data modeling, domain expertise
- Matthew Brush: data modeling, domain expertise
Artifacts:
- SemMedDB overview: https://lhncbc.nlm.nih.gov/temp/SemRep_SemMedDB_SKR/dbinfo.html
- Biolink Model (schema): https://github.com/biolink/biolink-model
- KGX documentation: https://kgx.readthedocs.io