PathBank Reference Ingest Guide (RIG)
Source Information
InfoRes ID: infores:pathbank
Description: PathBank is a comprehensive, visually rich pathway database (>110k pathways) providing machine-readable exports (PWML, BioPAX, SBML, SBGN) and CSV summaries for pathway metadata and links.
Citations:
- PathBank NAR article: https://doi.org/10.1093/nar/gkz861
- Downloads page: https://pathbank.org/downloads

Data Access Locations:
- All downloads: https://pathbank.org/downloads
Data Provision Mechanisms: file_download
Data Formats: csv, pwml
Data Versioning and Releases: Each pull is treated as a dated snapshot; the downloads page lists 'Generated On' timestamps alongside its large per-format archives.
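One way to honor the dated-snapshot convention is to file each downloaded archive under a directory named for the retrieval date and record a checksum, so two pulls can be compared later. The sketch below assumes a hypothetical `<base_dir>/<YYYY-MM-DD>/` layout and a `MANIFEST.tsv` file; neither is defined by PathBank, and the 'Generated On' value could be stored in the manifest as well if it is scraped from the downloads page.

```python
import hashlib
from datetime import date
from pathlib import Path


def record_snapshot(base_dir: Path, filename: str, payload: bytes, retrieved: date) -> str:
    """Store one downloaded archive in a dated snapshot directory.

    Hypothetical layout: <base_dir>/<YYYY-MM-DD>/<filename>, plus a MANIFEST.tsv
    listing each stored file with its SHA-256 digest. Returns the hex digest.
    """
    snapshot_dir = Path(base_dir) / retrieved.isoformat()
    snapshot_dir.mkdir(parents=True, exist_ok=True)
    (snapshot_dir / filename).write_bytes(payload)
    digest = hashlib.sha256(payload).hexdigest()
    # Append rather than overwrite so several archives can share one snapshot.
    with open(snapshot_dir / "MANIFEST.tsv", "a", encoding="utf-8") as manifest:
        manifest.write(f"{filename}\t{digest}\n")
    return digest
```

For example, `record_snapshot(Path("data/pathbank"), "pathbank_all_pwml.zip", payload, date.today())` would place the PWML archive in today's snapshot folder.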
Ingest Information
Ingest Categories: primary_knowledge_provider
Utility: Comprehensive pathway membership and mechanistic process data (reactions, transports, interactions) for Translator reasoning and pathway-centric analytics.
Scope: This ingest uses only two inputs: the master pathways CSV for biolink:Pathway nodes and the PWML archive for entities, processes, and edges. Other formats available from PathBank (BioPAX, SBML, SBGN, RXN, SVG, PNG, FASTA, SDF) are not currently used but may be considered for future enrichment.
Relevant Files
| File Name | Location | Description |
|---|---|---|
| pathbank_all_pathways.csv.zip | https://pathbank.org/downloads | Master CSV with pathway subjects and descriptions; used to build Pathway nodes with IDs, names, descriptions, and species metadata. |
| pathbank_all_pwml.zip | https://pathbank.org/downloads | Per-pathway PWML files for detailed composition and connectivity (entities, processes, participants, localization, complexes). |
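The pathway CSV can be read directly from its zip archive without unpacking it. The sketch below builds biolink:Pathway node dicts using the SMPDB ID as the primary identifier and the PW ID as an xref, as described for Pathway nodes. It assumes the archive member is named `pathbank_pathways.csv` at the archive root with `SMPDB ID`, `PW ID`, `Name` and `Description` columns; the `PathBank:` prefix used for the PW ID is a hypothetical placeholder.

```python
import csv
import io
import zipfile


def read_pathway_nodes(zip_source):
    """Yield biolink:Pathway node dicts from pathbank_pathways.csv inside the archive.

    zip_source may be a filesystem path or a binary file object. The column names
    and the 'PathBank:' prefix for PW IDs are assumptions, not confirmed conventions.
    """
    with zipfile.ZipFile(zip_source) as archive:
        with archive.open("pathbank_pathways.csv") as raw:
            # utf-8-sig tolerates a byte-order mark; newline="" is what csv expects.
            text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
            for row in csv.DictReader(text):
                smpdb_id = (row.get("SMPDB ID") or "").strip()
                pw_id = (row.get("PW ID") or "").strip()
                if not smpdb_id and not pw_id:
                    continue  # nothing to identify the pathway by
                if smpdb_id:
                    # SMPDB is preferred; keep the PW ID as an xref for traceability.
                    node_id = f"SMPDB:{smpdb_id}"
                    xref = [f"PathBank:{pw_id}"] if pw_id else []
                else:
                    node_id = f"PathBank:{pw_id}"
                    xref = []
                yield {
                    "id": node_id,
                    "category": ["biolink:Pathway"],
                    "name": row.get("Name", ""),
                    "description": row.get("Description", ""),
                    "xref": xref,
                }
```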
Included Content
| File Name | Included Records | Fields Used |
|---|---|---|
| pathbank_all_pathways.csv.zip | All pathway rows extracted to pathbank_pathways.csv | SMPDB ID, PW ID, Name, Description |
| pathbank_all_pwml.zip | All PWML pathway files | entities (compounds, proteins, complexes, nucleic acids), processes (reactions, transports, interactions), participants (inputs/outputs), enzymes/transporters, localization (compartments/tissues) |
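Because the PWML archive holds one file per pathway, it can be processed as a stream of small XML documents rather than extracted to disk. The sketch below only iterates and parses; it assumes members end in `.pwml` and that the file stem identifies the pathway, and it deliberately does not interpret PWML element names, since those are not specified here.

```python
import xml.etree.ElementTree as ET
import zipfile
from pathlib import PurePosixPath


def iter_pwml(zip_source, suffix=".pwml"):
    """Yield (pathway_key, root_element) for every PWML member of the archive.

    Members are parsed one at a time straight from the zip. The '.pwml' suffix
    and the use of the file stem as the pathway key are layout assumptions.
    """
    with zipfile.ZipFile(zip_source) as archive:
        for info in archive.infolist():
            if info.is_dir() or not info.filename.lower().endswith(suffix):
                continue
            with archive.open(info) as handle:
                root = ET.parse(handle).getroot()
            yield PurePosixPath(info.filename).stem, root
```

Extraction of entities, processes and participants would then walk each `root`, using the element names found in the actual PWML files.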
Future Content Considerations
edge_property_content: Edge properties such as stoichiometry and participant_role may be added in the future if needed and if supported by the Biolink Model.
edge_content: Additional file formats available from PathBank (BioPAX, SBML, SBGN, RXN, SVG, PNG, FASTA, SDF) may be considered for future enrichment to provide additional pathway representations and visualizations
edge_content: Additional interaction type mappings may be added as more PathBank interaction types are discovered and validated against Biolink Model predicates
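A mapping table with an explicit fallback keeps new interaction types easy to add: unmapped types still produce a biolink:interacts_with edge, and tallying them shows which ones deserve a specific predicate. In the sketch below the interaction-type labels are hypothetical, and the use of `object_direction_qualifier` with `increased`/`decreased` for regulation is an assumption to be validated against the Biolink Model.

```python
from collections import Counter
from typing import Dict, Optional, Tuple

# Hypothetical PWML interaction-type labels. The real vocabulary has to be read
# from the PWML files and each mapping validated against Biolink predicates.
INTERACTION_PREDICATES: Dict[str, Tuple[str, Dict[str, str]]] = {
    "binding": ("biolink:physically_interacts_with", {}),
    "complex formation": ("biolink:physically_interacts_with", {}),
    "activation": ("biolink:regulates", {"object_direction_qualifier": "increased"}),
    "inhibition": ("biolink:regulates", {"object_direction_qualifier": "decreased"}),
}
FALLBACK_PREDICATE = "biolink:interacts_with"


def map_interaction(interaction_type: Optional[str],
                    unmapped: Optional[Counter] = None) -> Tuple[str, Dict[str, str]]:
    """Return (predicate, qualifiers) for a PWML interaction type.

    Unknown types fall back to biolink:interacts_with and, if a Counter is
    supplied, are tallied so they can be reviewed as candidate new mappings.
    """
    key = (interaction_type or "").strip().lower()
    if key in INTERACTION_PREDICATES:
        predicate, qualifiers = INTERACTION_PREDICATES[key]
        return predicate, dict(qualifiers)  # copy so callers cannot mutate the table
    if unmapped is not None:
        unmapped[key] += 1
    return FALLBACK_PREDICATE, {}
```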
Target Information
Target InfoRes ID: infores:translator-pathbank-kgx
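The target is a KGX graph. A minimal sketch of serializing it as KGX-style JSON Lines (one JSON object per line) is shown below; the `nodes.jsonl`/`edges.jsonl` file names are hypothetical. Nodes are de-duplicated by `id` because the same compound or protein appears in many pathways, and the first occurrence is kept.

```python
import json
from pathlib import Path
from typing import Dict, Iterable, Tuple


def write_kgx_jsonl(out_dir, nodes: Iterable[Dict], edges: Iterable[Dict]) -> Tuple[int, int]:
    """Write nodes and edges as JSON Lines and return (node_count, edge_count).

    File names are hypothetical; nodes sharing an 'id' are written once.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    seen_ids = set()
    with open(out / "nodes.jsonl", "w", encoding="utf-8") as fh:
        for node in nodes:
            if node["id"] in seen_ids:
                continue
            seen_ids.add(node["id"])
            fh.write(json.dumps(node, ensure_ascii=False) + "\n")
    edge_count = 0
    with open(out / "edges.jsonl", "w", encoding="utf-8") as fh:
        for edge in edges:
            fh.write(json.dumps(edge, ensure_ascii=False) + "\n")
            edge_count += 1
    return len(seen_ids), edge_count
```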
Edge Types
| Subject Categories | Predicate | Object Categories | Knowledge Level | Agent Type | UI Explanation |
|---|---|---|---|---|---|
| biolink:Pathway | biolink:has_participant | biolink:SmallMolecule, biolink:Protein, biolink:NucleicAcidEntity, biolink:MacromolecularComplex, biolink:ChemicalEntity, biolink:MolecularActivity | knowledge_assertion | manual_agent | Pathway has the specified entity as a participant (from PWML membership and context). |
| biolink:MolecularActivity | biolink:has_input | biolink:SmallMolecule, biolink:Protein, biolink:NucleicAcidEntity, biolink:MacromolecularComplex, biolink:ChemicalEntity | knowledge_assertion | manual_agent | Reaction (molecular activity) consumes specified inputs (reactants) extracted from PWML participants. |
| biolink:MolecularActivity | biolink:has_output | biolink:SmallMolecule, biolink:Protein, biolink:NucleicAcidEntity, biolink:MacromolecularComplex, biolink:ChemicalEntity | knowledge_assertion | manual_agent | Reaction (molecular activity) produces specified outputs (products) as defined in PWML. |
| biolink:MacromolecularComplex | biolink:catalyzes | biolink:MolecularActivity | knowledge_assertion | manual_agent | Enzyme or protein complex catalyzes the specified reaction (molecular activity) as identified in PWML. |
| biolink:MacromolecularComplex | biolink:has_part | biolink:Protein | knowledge_assertion | manual_agent | Complex contains specified proteins per PWML composition. |
| biolink:Protein, biolink:MacromolecularComplex, biolink:SmallMolecule, biolink:ChemicalEntity | biolink:regulates | biolink:Protein, biolink:MacromolecularComplex, biolink:SmallMolecule, biolink:ChemicalEntity | knowledge_assertion | manual_agent | Directional regulation edges derived from PWML interaction types. Qualifiers specify affected aspect and direction when available. |
| biolink:Protein, biolink:MacromolecularComplex, biolink:SmallMolecule, biolink:ChemicalEntity | biolink:physically_interacts_with | biolink:Protein, biolink:MacromolecularComplex, biolink:SmallMolecule, biolink:ChemicalEntity | knowledge_assertion | manual_agent | Physical interactions derived from PWML interaction types such as binding or complex formation. |
| biolink:Protein, biolink:MacromolecularComplex, biolink:SmallMolecule, biolink:ChemicalEntity | biolink:interacts_with | biolink:Protein, biolink:MacromolecularComplex, biolink:SmallMolecule, biolink:ChemicalEntity | knowledge_assertion | manual_agent | Fallback interaction predicate for PWML interaction types that cannot be mapped more specifically. |
| biolink:Pathway | biolink:occurs_in | biolink:CellularComponent, biolink:AnatomicalEntity | knowledge_assertion | manual_agent | Pathway localization to subcellular compartments (GO terms) and tissues (BTO terms) as specified in PWML data. |
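To make the reaction-related rows of this table concrete, the sketch below expands one reaction into has_input, has_output and catalyzes edges, plus a has_participant edge from the pathway to each distinct member, all carrying the knowledge level and agent type listed above. The helper names are hypothetical, and treating the reaction and its catalyst as pathway participants is one reasonable reading of the table rather than a confirmed rule of this ingest.

```python
from typing import Dict, List, Optional, Sequence

# Provenance fields that are constant for every edge in the table.
EDGE_DEFAULTS = {
    "primary_knowledge_source": "infores:pathbank",
    "knowledge_level": "knowledge_assertion",
    "agent_type": "manual_agent",
}


def make_edge(subject: str, predicate: str, obj: str) -> Dict[str, str]:
    return {"subject": subject, "predicate": predicate, "object": obj, **EDGE_DEFAULTS}


def reaction_edges(pathway_id: str, reaction_id: str, inputs: Sequence[str],
                   outputs: Sequence[str], catalyst: Optional[str] = None) -> List[Dict[str, str]]:
    """Expand one PWML reaction into Biolink edges (hypothetical helper)."""
    edges = [make_edge(reaction_id, "biolink:has_input", e) for e in inputs]
    edges += [make_edge(reaction_id, "biolink:has_output", e) for e in outputs]
    if catalyst is not None:
        edges.append(make_edge(catalyst, "biolink:catalyzes", reaction_id))
    # One has_participant edge per distinct member, in first-seen order.
    members = [reaction_id, *inputs, *outputs] + ([catalyst] if catalyst is not None else [])
    seen = set()
    for member in members:
        if member not in seen:
            seen.add(member)
            edges.append(make_edge(pathway_id, "biolink:has_participant", member))
    return edges
```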
Node Types
| Node Category | Source Identifier Types | Additional Notes |
|---|---|---|
| biolink:Pathway | Preferred: SMPDB pathway IDs from pathbank_pathways.csv (emitted as SMPDB: | When SMPDB is present, the PW identifier is stored in node xref for traceability. Biolink class reference: https://biolink.github.io/biolink-model/Pathway/ |
| biolink:SmallMolecule | Preferred: ChEBI, DrugBank, KEGG.COMPOUND identifiers from PWML when available (emitted as primary id), Fallback: PathBank compound IDs (emitted as PathBank:Compound_ | Non-primary identifiers are stored in node xref for traceability. Biolink class reference: https://biolink.github.io/biolink-model/SmallMolecule/ |
| biolink:Protein | Preferred: UniProtKB identifiers from PWML when available (emitted as primary id), Fallback: PathBank protein IDs (emitted as PathBank:Protein_ | Non-primary identifiers are stored in node xref for traceability. Biolink class reference: https://biolink.github.io/biolink-model/Protein/ |
| biolink:MacromolecularComplex | PathBank protein complex IDs (emitted as PathBank:ProteinComplex_ | Complexes are composed of proteins (linked via biolink:has_part edges). Biolink class reference: https://biolink.github.io/biolink-model/MacromolecularComplex/ |
| biolink:NucleicAcidEntity | Preferred: ChEBI identifiers from PWML when available (emitted as primary id), Fallback: PathBank nucleic acid IDs (emitted as PathBank:NucleicAcid_ | Non-primary identifiers are stored in node xref for traceability. Biolink class reference: https://biolink.github.io/biolink-model/NucleicAcidEntity/ |
| biolink:MolecularActivity | Preferred when available: EC identifiers from PWML reaction ec-number fields (emitted as EC: | Reactions are represented as MolecularActivity nodes with has_input and has_output edges. Biolink class reference: https://biolink.github.io/biolink-model/MolecularActivity/ |
| biolink:ChemicalEntity | Bounds: PathBank bound IDs (emitted as PathBank:Bound_ | Non-primary identifiers are stored in node xref for traceability. Biolink class reference: https://biolink.github.io/biolink-model/ChemicalEntity/ |
| biolink:CellularComponent | GO (Gene Ontology) identifiers for subcellular locations (e.g., GO:0005737 for Cytoplasm) | Subcellular compartments extracted from PWML data. Biolink class reference: https://biolink.github.io/biolink-model/CellularComponent/ |
| biolink:AnatomicalEntity | BTO (BRENDA Tissue Ontology) identifiers for tissues (e.g., BTO:0000759 for Liver) | Tissues and anatomical entities extracted from PWML data. Biolink class reference: https://biolink.github.io/biolink-model/AnatomicalEntity/ |
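The "preferred identifier, else PathBank fallback, keep the rest as xrefs" rule shared by several node types can be written once. In this sketch the CURIE prefix spellings and their preference order are assumptions, and the caller passes in the already-built PathBank fallback CURIE, since its exact form is not spelled out here.

```python
from typing import Dict, List, Sequence, Tuple

# Assumed preference order and CURIE prefix spellings for small molecules.
SMALL_MOLECULE_PREFIXES = ("CHEBI", "DRUGBANK", "KEGG.COMPOUND")


def choose_identifier(xrefs: Dict[str, str], fallback_curie: str,
                      preferred: Sequence[str] = SMALL_MOLECULE_PREFIXES) -> Tuple[str, List[str]]:
    """Return (primary_id, xref_list) for one entity.

    xrefs maps a CURIE prefix to a local ID as parsed from PWML (hypothetical shape);
    fallback_curie is the PathBank-native CURIE used when no preferred ID exists.
    """
    primary = next((f"{p}:{xrefs[p]}" for p in preferred if xrefs.get(p)), fallback_curie)
    candidates = [f"{p}:{v}" for p, v in xrefs.items() if v] + [fallback_curie]
    # Every non-primary identifier is retained as an xref, without duplicates.
    others: List[str] = []
    for curie in candidates:
        if curie != primary and curie not in others:
            others.append(curie)
    return primary, others
```

For proteins the same function would be called with a UniProtKB-only preference list, for example `preferred=("UniProtKB",)`.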
Provenance Information
Contributors:
- Adilbek Bazarkulov: code implementation, support
- Erica Wood: code author (RTX-KG2)
- Evan Morris: code support
- Sierra Moxon: data modeling, domain expertise
- Matthew Brush: data modeling, domain expertise

Artifacts:
- PathBank downloads: https://pathbank.org/downloads
- Biolink Model repository: https://github.com/biolink/biolink-model
- Biolink Model docs: https://biolink.github.io/biolink-model/
- KGX docs: https://biolink.github.io/kgx/