PathBank Reference Ingest Guide (RIG)

Source Information

InfoRes ID: infores:pathbank

Description: PathBank is a comprehensive, visually rich pathway database (>110k pathways) providing machine-readable exports (PWML, BioPAX, SBML, SBGN) and CSV summaries for pathway metadata and links.

Citations:
- PathBank NAR article: https://doi.org/10.1093/nar/gkz861
- Downloads page: https://pathbank.org/downloads

Data Access Locations:
- All downloads: https://pathbank.org/downloads

Data Provision Mechanisms: file_download

Data Formats: csv, pwml

Data Versioning and Releases: Treat each pull as a dated snapshot; the downloads page lists large per-format archives along with their 'Generated On' timestamps.

Ingest Information

Ingest Categories: primary_knowledge_provider

Utility: Comprehensive pathway membership and mechanistic process data (reactions, transports, interactions) for Translator reasoning and pathway-centric analytics.

Scope: This ingest uses only two inputs: the master pathways CSV for biolink:Pathway nodes and the PWML archive for entities, processes, and edges. Other formats available from PathBank (BioPAX, SBML, SBGN, RXN, SVG, PNG, FASTA, SDF) are not currently used but may be considered for future enrichment.

Relevant Files

| File Name | Location | Description |
| --- | --- | --- |
| pathbank_all_pathways.csv.zip | https://pathbank.org/downloads | Master CSV with pathway subjects and descriptions; used to build Pathway nodes with IDs, names, descriptions, and species metadata. |
| pathbank_all_pwml.zip | https://pathbank.org/downloads | Per-pathway PWML files for detailed composition and connectivity (entities, processes, participants, localization, complexes). |

Included Content

| File Name | Included Records | Fields Used |
| --- | --- | --- |
| pathbank_all_pathways.csv.zip | All pathway rows extracted to pathbank_pathways.csv | SMPDB ID, PW ID, Name, Description |
| pathbank_all_pwml.zip | All PWML pathway files | entities (compounds, proteins, complexes, nucleic acids), processes (reactions, transports, interactions), participants (inputs/outputs), enzymes/transporters, localization (compartments/tissues) |
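
As a rough illustration of how the pathway rows listed above might be read, the sketch below opens the zipped CSV and keeps only the four fields used by the ingest. The function name `read_pathway_rows`, the text encoding, and the assumption that the archive member is named pathbank_pathways.csv are illustrative guesses, not details taken from the actual ingest code.

```python
import csv
import io
import zipfile

# Column headers named in the "Fields Used" column above.
FIELDS_USED = ("SMPDB ID", "PW ID", "Name", "Description")


def read_pathway_rows(zip_source, member="pathbank_pathways.csv"):
    """Yield one dict per pathway row, restricted to the fields the ingest uses.

    zip_source may be a filesystem path or a binary file-like object.
    The member name and the UTF-8 (BOM-tolerant) encoding are assumptions.
    """
    with zipfile.ZipFile(zip_source) as archive:
        with archive.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
            for row in csv.DictReader(text):
                # Missing or empty cells become empty strings.
                yield {name: (row.get(name) or "").strip() for name in FIELDS_USED}
```

Because the function is a generator, rows are streamed rather than loaded at once, which may matter given the size of the archive; any extra columns in the CSV are simply ignored.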

Future Content Considerations

edge_property_content: Edge properties such as stoichiometry and participant_role may be added in the future if needed and if supported by the Biolink Model.

edge_content: Additional file formats available from PathBank (BioPAX, SBML, SBGN, RXN, SVG, PNG, FASTA, SDF) may be considered for future enrichment to provide additional pathway representations and visualizations

edge_content: Additional interaction type mappings may be added as more PathBank interaction types are discovered and validated against Biolink Model predicates

Target Information

Target InfoRes ID: infores:translator-pathbank-kgx

Edge Types

| Subject Categories | Predicate | Object Categories | Knowledge Level | Agent Type | UI Explanation |
| --- | --- | --- | --- | --- | --- |
| biolink:Pathway | biolink:has_participant | biolink:SmallMolecule, biolink:Protein, biolink:NucleicAcidEntity, biolink:MacromolecularComplex, biolink:ChemicalEntity, biolink:MolecularActivity | knowledge_assertion | manual_agent | Pathway has the specified entity as a participant (from PWML membership and context). |
| biolink:MolecularActivity | biolink:has_input | biolink:SmallMolecule, biolink:Protein, biolink:NucleicAcidEntity, biolink:MacromolecularComplex, biolink:ChemicalEntity | knowledge_assertion | manual_agent | Reaction (molecular activity) consumes specified inputs (reactants) extracted from PWML participants. |
| biolink:MolecularActivity | biolink:has_output | biolink:SmallMolecule, biolink:Protein, biolink:NucleicAcidEntity, biolink:MacromolecularComplex, biolink:ChemicalEntity | knowledge_assertion | manual_agent | Reaction (molecular activity) produces specified outputs (products) as defined in PWML. |
| biolink:MacromolecularComplex | biolink:catalyzes | biolink:MolecularActivity | knowledge_assertion | manual_agent | Enzyme or protein complex catalyzes the specified reaction (molecular activity) as identified in PWML. |
| biolink:MacromolecularComplex | biolink:has_part | biolink:Protein | knowledge_assertion | manual_agent | Complex contains specified proteins per PWML composition. |
| biolink:Protein, biolink:MacromolecularComplex, biolink:SmallMolecule, biolink:ChemicalEntity | biolink:regulates | biolink:Protein, biolink:MacromolecularComplex, biolink:SmallMolecule, biolink:ChemicalEntity | knowledge_assertion | manual_agent | Directional regulation edges derived from PWML interaction types. Qualifiers specify affected aspect and direction when available. |
| biolink:Protein, biolink:MacromolecularComplex, biolink:SmallMolecule, biolink:ChemicalEntity | biolink:physically_interacts_with | biolink:Protein, biolink:MacromolecularComplex, biolink:SmallMolecule, biolink:ChemicalEntity | knowledge_assertion | manual_agent | Physical interactions derived from PWML interaction types such as binding or complex formation. |
| biolink:Protein, biolink:MacromolecularComplex, biolink:SmallMolecule, biolink:ChemicalEntity | biolink:interacts_with | biolink:Protein, biolink:MacromolecularComplex, biolink:SmallMolecule, biolink:ChemicalEntity | knowledge_assertion | manual_agent | Fallback interaction predicate for PWML interaction types that cannot be mapped more specifically. |
| biolink:Pathway | biolink:occurs_in | biolink:CellularComponent, biolink:AnatomicalEntity | knowledge_assertion | manual_agent | Pathway localization to subcellular compartments (GO terms) and tissues (BTO terms) as specified in PWML data. |
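
The three interaction predicates in the table form a specific-to-general cascade, which could be sketched roughly as follows. The interaction-type labels in `REGULATORY_TYPES` and `PHYSICAL_TYPES` are hypothetical placeholders (only "binding" and "complex formation" are hinted at in the table); the real vocabulary has to come from the PWML files. The function names and the use of `primary_knowledge_source` as the provenance key are likewise assumptions, not a description of the actual ingest code.

```python
# Hypothetical PWML interaction-type labels, for illustration only.
REGULATORY_TYPES = {"activation", "inhibition"}
PHYSICAL_TYPES = {"binding", "complex formation"}


def predicate_for(interaction_type):
    """Map an interaction-type label to a Biolink predicate, most specific first."""
    label = (interaction_type or "").strip().lower()
    if label in REGULATORY_TYPES:
        return "biolink:regulates"
    if label in PHYSICAL_TYPES:
        return "biolink:physically_interacts_with"
    # Fallback for types that cannot be mapped more specifically.
    return "biolink:interacts_with"


def make_edge(subject_id, interaction_type, object_id):
    """Build a KGX-style edge dict carrying the provenance values from the table."""
    return {
        "subject": subject_id,
        "predicate": predicate_for(interaction_type),
        "object": object_id,
        "primary_knowledge_source": "infores:pathbank",
        "knowledge_level": "knowledge_assertion",
        "agent_type": "manual_agent",
    }
```

Keeping the fallback as the final branch means an unrecognized interaction type still yields an edge rather than being dropped, which matches the role the table gives to biolink:interacts_with.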

Node Types

| Node Category | Source Identifier Types | Additional Notes |
| --- | --- | --- |
| biolink:Pathway | Preferred: SMPDB pathway IDs from pathbank_pathways.csv (emitted as SMPDB:); Fallback: PathBank PW IDs from pathbank_pathways.csv (emitted as PathBank:PW when SMPDB is missing) | When SMPDB is present, the PW identifier is stored in node xref for traceability. Biolink class reference: https://biolink.github.io/biolink-model/Pathway/ |
| biolink:SmallMolecule | Preferred: ChEBI, DrugBank, KEGG.COMPOUND identifiers from PWML when available (emitted as primary id); Fallback: PathBank compound IDs (emitted as PathBank:Compound_) | Non-primary identifiers are stored in node xref for traceability. Biolink class reference: https://biolink.github.io/biolink-model/SmallMolecule/ |
| biolink:Protein | Preferred: UniProtKB identifiers from PWML when available (emitted as primary id); Fallback: PathBank protein IDs (emitted as PathBank:Protein_) | Non-primary identifiers are stored in node xref for traceability. Biolink class reference: https://biolink.github.io/biolink-model/Protein/ |
| biolink:MacromolecularComplex | PathBank protein complex IDs (emitted as PathBank:ProteinComplex_) | Complexes are composed of proteins (linked via biolink:has_part edges). Biolink class reference: https://biolink.github.io/biolink-model/MacromolecularComplex/ |
| biolink:NucleicAcidEntity | Preferred: ChEBI identifiers from PWML when available (emitted as primary id); Fallback: PathBank nucleic acid IDs (emitted as PathBank:NucleicAcid_) | Non-primary identifiers are stored in node xref for traceability. Biolink class reference: https://biolink.github.io/biolink-model/NucleicAcidEntity/ |
| biolink:MolecularActivity | Preferred when available: EC identifiers from PWML reaction ec-number fields (emitted as EC:); Fallback: PathBank reaction IDs (emitted as PathBank:Reaction_) | Reactions are represented as MolecularActivity nodes with has_input and has_output edges. Biolink class reference: https://biolink.github.io/biolink-model/MolecularActivity/ |
| biolink:ChemicalEntity | Bounds: PathBank bound IDs (emitted as PathBank:Bound_); Element collections: external identifiers when available (ChEBI, UniProtKB, KEGG.COMPOUND) emitted as primary id, otherwise PathBank:ElementCollection_ | Non-primary identifiers are stored in node xref for traceability. Biolink class reference: https://biolink.github.io/biolink-model/ChemicalEntity/ |
| biolink:CellularComponent | GO (Gene Ontology) identifiers for subcellular locations (e.g., GO:0005737 for Cytoplasm) | Subcellular compartments extracted from PWML data. Biolink class reference: https://biolink.github.io/biolink-model/CellularComponent/ |
| biolink:AnatomicalEntity | BTO (BRENDA Tissue Ontology) identifiers for tissues (e.g., BTO:0000759 for Liver) | Tissues and anatomical entities extracted from PWML data. Biolink class reference: https://biolink.github.io/biolink-model/AnatomicalEntity/ |
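
The "preferred, else fallback, with the rest kept as xref" pattern that recurs in the table can be sketched as a single helper. This is a minimal illustration under stated assumptions: the CURIE prefix spellings, the reading of the listed order (ChEBI, DrugBank, KEGG.COMPOUND) as a priority order, the idea that a fallback CURIE is the listed prefix followed by the PathBank ID, and the names `choose_primary_id` and `pathway_node` are all hypothetical rather than taken from the ingest code.

```python
# Assumed CURIE prefixes, in the order the table lists them.
SMALL_MOLECULE_PREFIXES = ("CHEBI", "DRUGBANK", "KEGG.COMPOUND")
PROTEIN_PREFIXES = ("UniProtKB",)


def choose_primary_id(external_ids, preferred_prefixes, fallback_curie):
    """Return (primary_id, xrefs) for one entity.

    external_ids maps a prefix to a local ID found in PWML. The first
    preferred prefix with a value becomes the primary id; the remaining
    preferred identifiers and the PathBank fallback are returned as xrefs.
    Identifiers under prefixes outside preferred_prefixes are ignored here.
    """
    candidates = [
        f"{prefix}:{external_ids[prefix]}"
        for prefix in preferred_prefixes
        if external_ids.get(prefix)
    ]
    candidates.append(fallback_curie)
    return candidates[0], candidates[1:]


def pathway_node(smpdb_id, pw_id, name, description):
    """Build a Pathway node: SMPDB id when present, else the PathBank PW id.

    Assumes the PW ID value can be appended directly to the PathBank prefix.
    """
    primary, xrefs = choose_primary_id(
        {"SMPDB": smpdb_id}, ("SMPDB",), f"PathBank:{pw_id}"
    )
    return {
        "id": primary,
        "category": ["biolink:Pathway"],
        "name": name,
        "description": description,
        "xref": xrefs,
    }
```

Treating the PathBank identifier as the last candidate means it becomes the primary id only when no preferred identifier exists, and otherwise lands in xref, which is the traceability behavior the Additional Notes column describes.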

Provenance Information

Contributors:
- Adilbek Bazarkulov: code implementation, support
- Erica Wood: code author (RTX-KG2)
- Evan Morris: code support
- Sierra Moxon: data modeling, domain expertise
- Matthew Brush: data modeling, domain expertise

Artifacts:
- PathBank downloads: https://pathbank.org/downloads
- Biolink Model repository: https://github.com/biolink/biolink-model
- Biolink Model docs: https://biolink.github.io/biolink-model/
- KGX docs: https://biolink.github.io/kgx/