Human Phenotype Ontology Annotations

Source Information

InfoRes ID: infores:hpo-annotations

Description: The Human Phenotype Ontology (HPO) provides a standard vocabulary of phenotypic abnormalities encountered in human disease. Each term in the HPO describes a phenotypic abnormality, such as Atrial septal defect. The HPO is currently being developed using the medical literature, Orphanet, DECIPHER, and OMIM. HPO currently contains over 18,000 terms and over 156,000 annotations to hereditary diseases. The HPO project and others have developed software for phenotype-driven differential diagnostics, genomic diagnostics, and translational research. The Human Phenotype Ontology group curates and assembles over 115,000 HPO-related annotations ("HPOA") to hereditary diseases using the HPO. Here we create Biolink associations between diseases and phenotypic features, together with their evidence, age of onset, and frequency (if known). Disease annotations here are also cross-referenced to the Mondo Disease Ontology (MONDO) (https://mondo.monarchinitiative.org/). The HPOA ingests ('disease-to-phenotype', which includes capture of disease modes of inheritance; 'gene-to-phenotype'; and 'gene-to-disease') parse out records from the HPO Phenotype Annotation File (http://purl.obolibrary.org/obo/hp/hpoa/phenotype.hpoa).

Citations: - https://doi.org/10.1093/nar/gkaa1043

Data Access Locations: - https://hpo.jax.org/data/annotations

Data Provision Mechanisms: file_download, api_endpoint

Data Formats: tsv, other

Data Versioning and Releases: GitHub managed releases at https://github.com/obophenotype/human-phenotype-ontology/releases. There is no consistent cadence for releases. Versioning is based on the month and year of the release.

Additional Notes: None

Ingest Information

Ingest Categories: primary_knowledge_provider

Utility: The HPO and associated annotations are a flagship product of the Monarch Initiative (https://monarchinitiative.org/), an NIH-supported international consortium dedicated to semantic integration of biomedical and model organism data with the ultimate goal of improving biomedical research. The human phenotype/disease/gene knowledge integration aligns well with the general mission of the Biomedical Data Translator. As a consequence, several members of the Monarch Initiative are direct participants in the Biomedical Data Translator, with Monarch data forming one primary knowledge source contributing to Translator knowledge graphs.

Scope: Covers curated Disease, Phenotype, and Gene relationships annotated with Human Phenotype Ontology terms.

Relevant Files

File Name | Location | Description
phenotype.hpoa | https://hpo.jax.org/data/annotations | disease to HPO phenotype annotations, including inheritance information
genes_to_disease.txt | https://hpo.jax.org/data/annotations | gene to HPO disease annotations
genes_to_phenotype.txt | https://hpo.jax.org/data/annotations | gene to HPO phenotype annotations

Included Content

File Name | Included Records | Fields Used
phenotype.hpoa | Disease to Phenotype relationships (i.e., rows with 'aspect' == 'P') | database_id, qualifier, hpo_id, reference, evidence, onset, frequency, sex, aspect
phenotype.hpoa | Disease "Mode of Inheritance" relationships (i.e., rows with 'aspect' == 'I'), represented as node properties rather than edges | database_id, qualifier, hpo_id, reference, evidence, onset, frequency, sex, aspect
genes_to_disease.txt | Mendelian Gene to Disease relationships (i.e., rows with 'association_type' == 'MENDELIAN') | ncbi_gene_id, gene_symbol, association_type, disease_id, source
genes_to_disease.txt | Polygenic Gene to Disease relationships (i.e., rows with 'association_type' == 'POLYGENIC') | ncbi_gene_id, gene_symbol, association_type, disease_id, source
genes_to_phenotype.txt | Records where we determine that the reported G-P association was inferred over a G-D association whose association type is "MENDELIAN" | ncbi_gene_id, gene_symbol, hpo_id, hpo_name, frequency, disease_id
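
The 'aspect' selection on phenotype.hpoa described in the table above can be sketched as follows. This is a minimal illustration, not the ingest's actual code: the sample rows, the disease identifier, and the function name split_by_aspect are hypothetical, the columns are limited to the "Fields Used" list, and it is assumed that comment lines start with '#' and that a tab-separated header row follows them.

```python
import csv

# Hypothetical sample in the assumed HPOA layout: '#' comment lines,
# then a tab-separated header row, then one annotation per line.
SAMPLE_HPOA = (
    "# illustrative data only, not real HPOA records\n"
    "database_id\tqualifier\thpo_id\treference\tevidence\tonset\tfrequency\tsex\taspect\n"
    "OMIM:000001\t\tHP:0001631\tPMID:0000001\tPCS\t\t\t\tP\n"
    "OMIM:000001\t\tHP:0000007\tPMID:0000001\tPCS\t\t\t\tI\n"
    "OMIM:000001\t\tHP:0003577\tPMID:0000001\tPCS\t\t\t\tC\n"
)


def split_by_aspect(text):
    """Return (phenotype_rows, inheritance_rows) from HPOA-style text."""
    # Drop blank lines and comment lines before handing rows to the csv module.
    lines = [ln for ln in text.splitlines() if ln and not ln.startswith("#")]
    reader = csv.DictReader(lines, delimiter="\t")
    phenotype_rows, inheritance_rows = [], []
    for row in reader:
        if row["aspect"] == "P":
            # Candidate Disease to Phenotype edge.
            phenotype_rows.append(row)
        elif row["aspect"] == "I":
            # Mode of inheritance: destined for a node property, not an edge.
            inheritance_rows.append(row)
        # Rows with any other aspect value are ignored in this sketch.
    return phenotype_rows, inheritance_rows


phenotype_rows, inheritance_rows = split_by_aspect(SAMPLE_HPOA)
```

Keeping the two aspects in separate lists mirrors the table: 'P' rows become edges, while 'I' rows are attached to the disease node.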

Filtered Content

File Name | Filtered Records | Rationale
genes_to_disease.txt | Gene to Disease relationships of unknown type (i.e., rows with 'association_type' == 'UNKNOWN') | Records of this type are not included in the ingest because they are Gene-Disease relationships for which the supporting evidence is missing, weak, or inconclusive.
genes_to_phenotype.txt | Records where we determine that the reported G-P association was inferred over a G-D association whose association type is "POLYGENIC" or "UNKNOWN" | HPO will infer a Gene-Phenotype association G1-P1 in cases where G1 causes, contributes_to, or is associated with D1, and D1 is associated with a Phenotype P1. This logic holds for Mendelian disease, where a single gene is causal and thus responsible for all associated phenotypes. It does not necessarily hold for Polygenic or Unknown diseases, where the gene may be one of many contributing factors and thus does not necessarily contribute to, or have an association with, each phenotype of the disease.
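
One way to apply the genes_to_phenotype.txt filter above is to look up each record's gene and disease pair in genes_to_disease.txt and keep only pairs typed "MENDELIAN". The sketch below assumes that this pair is the join key, which the text does not state; the function name keep_mendelian_inferences and all sample identifiers are hypothetical.

```python
# Hypothetical rows using the field names listed under "Fields Used".
EXAMPLE_G2D = [
    {"ncbi_gene_id": "NCBIGene:1", "disease_id": "OMIM:000001",
     "association_type": "MENDELIAN"},
    {"ncbi_gene_id": "NCBIGene:2", "disease_id": "OMIM:000002",
     "association_type": "POLYGENIC"},
]
EXAMPLE_G2P = [
    {"ncbi_gene_id": "NCBIGene:1", "hpo_id": "HP:0001631", "disease_id": "OMIM:000001"},
    {"ncbi_gene_id": "NCBIGene:2", "hpo_id": "HP:0001631", "disease_id": "OMIM:000002"},
    {"ncbi_gene_id": "NCBIGene:3", "hpo_id": "HP:0001631", "disease_id": "OMIM:000003"},
]


def keep_mendelian_inferences(g2d_rows, g2p_rows):
    """Keep G-P rows whose (gene, disease) pair is MENDELIAN in the G-D rows."""
    # Assumed join key: the (ncbi_gene_id, disease_id) pair.
    association_type = {
        (row["ncbi_gene_id"], row["disease_id"]): row["association_type"]
        for row in g2d_rows
    }
    kept = []
    for row in g2p_rows:
        key = (row["ncbi_gene_id"], row["disease_id"])
        # Pairs absent from the lookup are dropped, the same as "UNKNOWN".
        if association_type.get(key) == "MENDELIAN":
            kept.append(row)
    return kept


kept_rows = keep_mendelian_inferences(EXAMPLE_G2D, EXAMPLE_G2P)
```

Dropping pairs that have no matching G-D record is a conservative choice made for this sketch; the ingest may handle that case differently.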

Future Content Considerations

edge_content: Consider bringing back G-P associations based on inferences over Unknown Diseases if we establish a confidence annotation paradigm that lets us indicate these inferences to be weaker than those inferred over Mendelian or Polygenic diseases where the Gene is individually causal for, or contributing to, the disease. - Relevant files: genes_to_disease.txt

edge_content: Consider bringing back G-P associations based on inferences over Polygenic or Unknown Diseases if we establish a confidence annotation paradigm that lets us indicate these inferences to be weaker than those inferred over Mendelian diseases where the Gene is individually causal for the disease and all of its phenotypes. - Relevant files: genes_to_phenotype.txt

Additional Notes: None

Target Information

Target InfoRes ID: infores:translator-hpo-annotations

Edge Types

Subject Categories | Predicate | Object Categories | Knowledge Level | Agent Type | UI Explanation
biolink:Disease |  | biolink:PhenotypicFeature | knowledge_assertion | manual_agent | Source HPOA data provide Disease-Phenotype associations that are produced by HPO curators through manual review of clinical data and published evidence. The HPOA record used to create this Translator edge reports that a Phenotype is observed to manifest in a particular Disease, and may include information about the frequency and/or biological context (e.g. sex, onset) of this manifestation. This relationship is represented using the Biolink 'has phenotype' predicate, with optional qualifiers describing frequency/context information where provided.
biolink:Gene |  | biolink:Disease | knowledge_assertion | manual_agent | Source HPOA data provide Gene-Disease associations that are manually curated by HPO from sources such as Orphanet, MIM2Gene, and DECIPHER. The record used to create this Translator edge reports that a Gene is associated with a Disease when there is evidence that variants in the gene may cause or contribute to its manifestation. The Biolink predicate used to represent this relationship depends on the type of Disease. For Mendelian disorders (where the HPOA "association_type" field = 'MENDELIAN'), 'associated_with' is the primary predicate and the 'causes' predicate is used as the 'qualified predicate'. For polygenic diseases (where the "association_type" field = 'POLYGENIC'), 'associated_with' is the primary predicate and the 'contributes_to' predicate is used as the 'qualified predicate'. A qualifier is used to indicate that it is a 'genetic_variant_form' of the gene that participates in these relationships, not the Gene in general.
biolink:Gene |  | biolink:PhenotypicFeature | logical_entailment | automated_agent | Source HPOA data provide direct Gene-Phenotype associations that are automatically assigned between a Gene and any Phenotype that manifests in a Mendelian Disease associated with the Gene. The record used to create this edge specifically reports a Gene-Phenotype association where the Gene has variants causal for a Mendelian disorder in which the Phenotype is manifest, from which it is logically entailed that variants in the Gene are also causal for the Phenotype (as such disorders have a single causal gene). This relationship is represented using the Biolink 'causes' predicate, with a qualifier indicating that it is a 'genetic_variant_form' of the gene that is causing the Phenotype (not the Gene in general).
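
The Gene-Disease predicate selection described above can be sketched as a small lookup. This is an illustration under assumptions rather than the ingest's code: the dictionary keys of the returned edge (including the qualifier slot name subject_form_or_variant_qualifier), the "biolink:" CURIE spellings, and the function name gene_to_disease_edge are all hypothetical choices made for the example.

```python
# Qualified predicate per HPOA association_type, as described in the table.
# The "biolink:" CURIE spelling is an assumption of this sketch.
QUALIFIED_PREDICATE = {
    "MENDELIAN": "biolink:causes",
    "POLYGENIC": "biolink:contributes_to",
}


def gene_to_disease_edge(row):
    """Build an illustrative edge dict, or return None for filtered rows."""
    qualified_predicate = QUALIFIED_PREDICATE.get(row["association_type"])
    if qualified_predicate is None:
        # e.g. 'UNKNOWN' association types are not ingested.
        return None
    return {
        "subject": row["ncbi_gene_id"],
        # Primary predicate is the same for both association types.
        "predicate": "biolink:associated_with",
        "qualified_predicate": qualified_predicate,
        # Assumed slot name for the 'genetic_variant_form' qualifier.
        "subject_form_or_variant_qualifier": "genetic_variant_form",
        "object": row["disease_id"],
        "knowledge_level": "knowledge_assertion",
        "agent_type": "manual_agent",
    }


# Hypothetical input row using the genes_to_disease.txt field names.
example_edge = gene_to_disease_edge(
    {"ncbi_gene_id": "NCBIGene:1", "disease_id": "OMIM:000001",
     "association_type": "MENDELIAN"}
)
```

Returning None for unmapped association types keeps the filtering rule and the predicate rule in one place, so a caller only needs to skip None results.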

Node Types

Node Category | Source Identifier Types | Additional Notes
biolink:Disease | OMIM, ORPHANET, DECIPHER | None
biolink:PhenotypicFeature | HP | None
biolink:Gene | NCBIGene | None

Future Modeling Considerations

spoq_pattern: Consider alternate patterns for representing G-causes-D and G-contributes_to-D associations where we place more semantics into predicates, per https://github.com/NCATSTranslator/Data-Ingest-Coordination-Working-Group/issues/22. Should we also consider creating support paths in our data/graphs for the G-D-P hops over which HPO infers G-P associations? (e.g. GENE1 -causes-> DISEASE1 -has_phenotype-> PHENO1 ----> GENE1 -causes-> PHENO1)

Additional Notes: The HPOA ingest of the Disease.node_properties.inheritance value is currently thought to be mildly unreliable due to the way node property merging is (not) currently implemented in the ingest. This is a known issue (Translator Ingest PR https://github.com/NCATSTranslator/translator-ingests/issues/259) which will be addressed in a future release of the pipeline and/or ingest.

Provenance Information

Contributors:
- Richard Bruskiewich - data modeling, domain expertise, code author
- Kevin Schaper - code author
- Sierra Moxon - data modeling, domain expertise, code support
- Matthew Brush - data modeling, domain expertise

Artifacts:
- Ingest Survey (https://docs.google.com/spreadsheets/d/1R9z-vywupNrD_3ywuOt_sntcTrNlGmhiUWDXUdkPVpM/edit?gid=0#gid=0)
- Ingest Ticket (https://github.com/NCATSTranslator/Data-Ingest-Coordination-Working-Group/issues/24)
- Modeling Ticket (https://github.com/NCATSTranslator/Data-Ingest-Coordination-Working-Group/issues/22)