Operations and Workflows
In the current (2022) Translator architecture, TRAPI queries are akin to database lookups: a query graph is sent as input, and knowledge satisfying that query is sent in return. Users may wish to perform subsequent operations on these results or utilize additional capabilities of Translator components besides “query lookups.” Operations and Workflows (O/W) is designed to fulfill this purpose.
What is a workflow?
A workflow is a sequential series of specified actions (called operations) that instructs Translator components exactly how to process a query. These can include actions such as sorting, filtering, adding statistical information, querying specific knowledge providers, etc. A list of currently implemented operations are given here: List of implemented operations and in this part of the O/W repo.
How is an operation structured?
Operations are structured JSON objects that are inserted into the Workflow
section of a TRAPI message. Operations
are organized in an hierarchical fashion, with most Translator components capable of executing some leaf-node operations.
For example, consider the Overlay
operation: as specified in the
description that
defines an “overlay” operation as something that adds additional edges in the KG and/or query graph. The
sub-operations define what exactly is “overlaid” on the graph. An example leaf-node subclass operation is
overlay_compute_ngd
and an example of calling it would be:
{"message": {
"query_graph": {
"nodes": {
"n0": { "ids": [ "CHEBI:46195" ]},
"n1": { "categories":[ "biolink:Protein" ] }
},
"edges": {
"e0": { "subject": "n0", "object": "n1" },
}
}
},
"Workflow":[
{"id":"lookup"},
{"id":"overlay_compute_ngd", "parameters":{"qnode_keys":["n0", "n1"], "virtual_relation_label":"NGD1"}},
{"id":"bind"},
{"id":"complete_results"},
{"id":"filter_results_top_n", "parameters":{"max_results": 10}]}
}
overlay_compute_ngd
operation computes the
Normalized Google Distance
between all nodes corresponding to q_node
id n0
and n1
and populates this on a new edge labeled NGD1
.
Operations have a variety of properties, including input requirements, output guarantees, allowed changes, and parameters.
Importantly, operations that have the unique: true
property are expected to result in different behavior depending on
which Translator component the operation is sent to. For example, the fill
operation populates a KG with
bioentities that satisfy an input query graph. It is assumed that different components can fill different kinds of knowledge,
so this operation has the unique: tree
property. Other operations (such as some sorting operations) have unique: false
because each
component is expected to perform the operation in the same fashion.
Compound operations
For convenience, some operations are compositions of a number of other operations. For example, lookup_and_score
is the composition of the operations fill
, bind
, complete_results
, and score
. These operations act identically
as to sequentially executing the sub-operations that comprise it.
Using operations
A TRAPI query that utilizes the O/W language may or may not have a populated query graph and/or KG. For example,
a fill
operation may only have a populated query graph, while a filter
operation may have a populated query graph
and KG.
Typically, if you are starting with a “fresh” query, the usual procedure is as follows:
- Provide a query graph and empty KG, and populate the
Workflow
portion of the TRAPI query with the following operations: fill
this will instruct Translator components to find knowledge that satisfies the query graph (akin to a database lookup)- Any of the
annotate
,filter
, oroverlay
operations to argument the knowledge with additional information, filter out some bioentities, etc. bind
andcomplete_results
as this will form the answer set for the query- Additional
sort
orfilter
operations as desired.
A “standard” TRAPI query (without a workflow) is equivalent to sending a query graph and calling lookup_and_score
,
as this will only find bioentities that satisfy a query graph, score/rank the results, and return them to a user.
More complicated workflows can be created by mixing and matching the above typical procedure. For example, here is a workflow that:
- Reaches out to COHD to find the first hop of a query
- Overlays the Normalized Google Distance
- Takes the top three highest scoring results
- Reaches out to a different KP to find two subsequent hops that optionally:
- are either
increases_activity_of
andincreases_activity_of
- or
decreases_activity_of
anddecreases_activity_of
(i.e. same direction of activity) - Create and rank the final answers
In this fashion, a user can precisely specify what sort of workflow they desire.
Advertising operations
Each Translator component is expected to implement a set of operations that it can perform. This is done via
the OpenAPI specification utilizing the x-trapi
extension. Directions on how to implement this can be found
here,
with a real-world example here.
Implementing operations
Please see the Guide for developers for developer details about how to implement new operations.
Operations and workflow policies
In the current architecture, Translator components are expected to perform the operations in the order they are
specified in the workflow
section of a TRAPI message. This is a strict requirement, and if a component encounters
an operation it has not implemented, it must throw an error. No additional operations or actions are to be performed
besides that which is specified in the workflow
section.
Each Translator ARA is required to, at minimum, implement the lookup
and/or lookup_and_score
operations. These are
equivalent to standard TRAPI queries without the workflow
section. KP‘s can optionally implement these operations.
Translator components are encouraged to create PRs to implement additional operations. This is a mechanism by which components can expose additional functionality that may not currently fit in the “non-workflow” TRAPI paradigm.
Workflow Runner
The workflow runner is a component that is responsible for identifying which Translator components can execute the specified operations in a workflow, sending the TRAPI message to those components, and then combining and returning the results. The figure below depicts the architecture of the workflow runner and how it is related to the ARS and ARAs. Note that the workflow runner is a component that is not a part of the ARS or ARAs, but is instead a component that sits between the ARS and ARAs/KPs. Thought it is not shown here, any KP can also advertise operations it can perform (as described above), and the workflow runner will query these KPs if given a workflow with the relevant operation.
If the workflow runner is unable to find a component that can perform the operation, then it will return an error.
When the workflow runner is given a workflow, it will iterate through the operations in the workflow and:
- If the operation(s) is marked as unique: true
, then it will send the TRAPI message to all component(s) that implements that operation
- If the operation(s) is marked as unique: false
, then it will send the TRAPI message to a single component that implement that operation
After each operation is executed, the results are combined before executing subsequent operations.