This software repository is an integral part of the Biomedical Data Translator Consortium's Performance Phase 3 effort at biomedical knowledge integration, under the auspices of the Data INGest and Operations (“DINGO”) Working Group. The repository aggregates and coordinates the development of source-specific and shared library software used for Translator data ingests from primary (mostly external “third party”) knowledge sources into so-called Translator “Tier 1” knowledge graph(s). The software is primarily written in Python.
A general discussion of the Translator Data Ingest architecture is provided here.
The project uses the uv Python package and project manager. You will need to install uv onto your system, along with a suitable Python (Release 3.12) interpreter.
The project initially (mid-June 2025) uses a conventional Unix-style Makefile to execute tasks. For this reason, working within a command line interface terminal on macOS, Ubuntu, or Windows WSL2 (with Ubuntu) is recommended. See the Developers’ README for tips on configuring your development environment.
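As a convenience, the prerequisites above can be checked with a short script before running any make targets. This is only a sketch and not part of the repository: the function name `missing_prerequisites` is hypothetical, it assumes the tools are installed under their usual executable names (`uv`, `make`), and it reads "Release 3.12" as meaning the 3.12.x series.

```python
import shutil
import sys


def missing_prerequisites(version=None, which=shutil.which):
    """Return a list of problems found; an empty list means every check passed."""
    if version is None:
        version = sys.version_info[:2]
    problems = []
    # Assumption: "Release 3.12" means the 3.12.x series specifically.
    if tuple(version[:2]) != (3, 12):
        problems.append(f"Python 3.12 expected, found {version[0]}.{version[1]}")
    # Assumption: the tools are on PATH under their usual executable names.
    for tool in ("uv", "make"):
        if which(tool) is None:
            problems.append(f"'{tool}' not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print(problem)
```

Because uv can manage Python interpreters itself, a version mismatch reported for the interpreter running this script is not necessarily a blocker.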
To ensure that ingests are performed rigorously, consistently, and reproducibly, we have defined a Standard Operating Procedure (SOP) to guide the source ingest process.
The SOP is initially tailored to guide re-ingest of current sources to create a “functional replacement” of the Phase 2 system, but it can be adapted to guide ingest of new sources as well.
Below are descriptions and links for the various artifacts prescribed by the SOP.
Here, we apply a koza transform to data from the Comparative Toxicology Database, writing the knowledge graph output to JSON Lines (jsonl) files. The project is built and executed using a conventional (Unix-like) Makefile:
│ Usage:
│ make <target>
│
│ Targets:
│ help Print this help message
│
│ all Install everything and test
│ fresh Clean and install everything
│ clean Clean up build artifacts
│ clobber Clean up generated files
│
│ install install python requirements
│ download Download data
│ run Run the transform
│
│ test Run all tests
│
│ lint Lint all code
│ format Format all code
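Once the transform has run, the JSON Lines output can be inspected with a few lines of Python. The sketch below is illustrative only: the helper `count_by_field`, the file name `ctd_edges.jsonl`, and the `predicate` field are assumptions, so substitute whatever paths and field names the transform actually produces.

```python
import json
from collections import Counter
from pathlib import Path


def count_by_field(jsonl_path, field):
    """Count records in a JSON Lines file, grouped by the value of `field`."""
    counts = Counter()
    with Path(jsonl_path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            value = record.get(field, "<missing>")
            # Lists are not hashable, so convert them before counting.
            key = tuple(value) if isinstance(value, list) else value
            counts[key] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical output path and field name; adjust to the real output.
    output_file = Path("ctd_edges.jsonl")
    if output_file.exists():
        for key, count in count_by_field(output_file, "predicate").most_common():
            print(key, count)
```

Reading the file line by line keeps memory use flat, which matters when an ingest produces a large edge file.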
The task involves the following steps/components: