Bgee Ingest Guide

Source Information

InfoRes ID: infores:bgee

Description: Bgee is a database for retrieval and comparison of gene expression patterns across multiple animal species. It provides an intuitive answer to the question 'where is a gene expressed?' and supports research in cancer and agriculture, as well as evolutionary biology.

Citations: - Frederic B Bastian, Julien Roux, Anne Niknejad, Aurélie Comte, Sara S Fonseca Costa, Tarcisio Mendes de Farias, Sébastien Moretti, Gilles Parmentier, Valentine Rech de Laval, Marta Rosikiewicz, Julien Wollbrett, Amina Echchiki, Angélique Escoriza, Walid H Gharib, Mar Gonzales-Porta, Yohan Jarosz, Balazs Laurenczy, Philippe Moret, Emilie Person, Patrick Roelli, Komal Sanjeev, Mathieu Seppey, Marc Robinson-Rechavi, The Bgee suite: integrated curated expression atlas and comparative transcriptomics in animals, Nucleic Acids Research, Volume 49, Issue D1, 8 January 2021, Pages D831–D847, https://doi.org/10.1093/nar/gkaa793

Data Provision Mechanisms: file_download

Data Formats: tsv.gz

Data Versioning and Releases: v15.0

Ingest Information

Ingest Categories: primary_knowledge_provider

Utility: Bgee shows where genes are expressed, at the level of specific cell types and anatomical locations, across many organisms.

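Scope: Gene expression calls in anatomical entities and cell types identified by UBERON and CL CURIEs. Bgee provides several metrics (call quality, FDR, expression score, expression rank) to help researchers assess the reliability of each call.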

Relevant Files

| File Name | Location | Description |
| --- | --- | --- |
| Homo_sapiens_expr_simple.tsv.gz | https://bgee.org/ftp/bgee_v15_0/download/calls/expr_calls | Homo sapiens (human) gene expression information. |
| Rattus_norvegicus_expr_simple.tsv.gz | https://bgee.org/ftp/bgee_v15_0/download/calls/expr_calls | Rattus norvegicus (brown rat) gene expression information. |
| Mus_musculus_expr_simple.tsv.gz | https://bgee.org/ftp/bgee_v15_0/download/calls/expr_calls | Mus musculus (house mouse) gene expression information. |
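
As a minimal sketch, the download URLs for the three files can be derived from the species name. This assumes that the Location listed above is a directory and that each file sits directly under it; the names `BASE_URL`, `SPECIES`, and `expr_simple_url` are hypothetical and are not taken from the ingest code.

```python
# Directory listed in the "Location" column of the Relevant Files table.
BASE_URL = "https://bgee.org/ftp/bgee_v15_0/download/calls/expr_calls"

# Species currently ingested, spelled as in the file names.
SPECIES = ["Homo_sapiens", "Rattus_norvegicus", "Mus_musculus"]


def expr_simple_url(species: str, base_url: str = BASE_URL) -> str:
    """Build the URL of a species' *_expr_simple.tsv.gz file.

    Assumption: the file lives directly under `base_url`.
    """
    return f"{base_url}/{species}_expr_simple.tsv.gz"


urls = {species: expr_simple_url(species) for species in SPECIES}

for species, url in urls.items():
    print(species, url)
```

Keeping the species list separate from the URL template means that adding another organism later only requires a new list entry.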

Included Content

| File Name | Included Records | Fields Used |
| --- | --- | --- |
| Homo_sapiens_expr_simple.tsv.gz | None | Gene ID, Gene name, Anatomical entity ID, Anatomical entity name, Expression, Call quality, FDR, Expression score, Expression rank |
| Rattus_norvegicus_expr_simple.tsv.gz | None | Gene ID, Gene name, Anatomical entity ID, Anatomical entity name, Expression, Call quality, FDR, Expression score, Expression rank |
| Mus_musculus_expr_simple.tsv.gz | None | Gene ID, Gene name, Anatomical entity ID, Anatomical entity name, Expression, Call quality, FDR, Expression score, Expression rank |
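
The following sketch shows one way to read a gzipped TSV and keep only the fields listed above. It assumes the file header spells the columns with spaces (for example `Gene ID` and `Expression rank`). The function name `read_used_fields` is hypothetical, and the sample row is invented purely to make the example runnable; it is not real Bgee data.

```python
import csv
import gzip
import io

# Fields used by the ingest. Assumption: header names are spelled with spaces.
FIELDS_USED = [
    "Gene ID",
    "Gene name",
    "Anatomical entity ID",
    "Anatomical entity name",
    "Expression",
    "Call quality",
    "FDR",
    "Expression score",
    "Expression rank",
]


def read_used_fields(source):
    """Yield one dict per row, restricted to FIELDS_USED.

    `source` may be a path to a .tsv.gz file or a binary file object.
    """
    with gzip.open(source, mode="rt", encoding="utf-8", newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        for row in reader:
            yield {field: row[field] for field in FIELDS_USED}


# Invented sample: one header line, one data row, plus a column that is dropped.
header = FIELDS_USED + ["Some other column"]
values = ["ENSG00000000003", "TSPAN6", "UBERON:0002107", "liver",
          "present", "gold quality", "0.001", "85.5", "1200.0", "ignored"]
raw = "\t".join(header) + "\n" + "\t".join(values) + "\n"
sample = io.BytesIO(gzip.compress(raw.encode("utf-8")))

rows = list(read_used_fields(sample))
print(rows[0]["Gene ID"], rows[0]["Anatomical entity ID"])
```

All values are kept as strings at this stage; numeric conversion is left to whichever step needs it.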

Filtered Content

| File Name | Filtered Records | Rationale |
| --- | --- | --- |
| Homo_sapiens_expr_simple.tsv.gz | 'Expression rank' >= 10,000 OR 'Expression score' <= 70 OR 'FDR' >= 0.05 OR 'Expression' == "absent" | We target gene expression calls that are strongly supported and highly probable. Calls with a low score, or that are potentially spurious, are filtered out. |
| Rattus_norvegicus_expr_simple.tsv.gz | 'Expression rank' >= 10,000 OR 'Expression score' <= 70 OR 'FDR' >= 0.05 OR 'Expression' == "absent" | We target gene expression calls that are strongly supported and highly probable. Calls with a low score, or that are potentially spurious, are filtered out. |
| Mus_musculus_expr_simple.tsv.gz | 'Expression rank' >= 10,000 OR 'Expression score' <= 70 OR 'FDR' >= 0.05 OR 'Expression' == "absent" | We target gene expression calls that are strongly supported and highly probable. Calls with a low score, or that are potentially spurious, are filtered out. |
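
The filter condition above can be expressed as a single predicate. This is a sketch rather than the ingest's actual code: it assumes each row is a dict keyed by the spaced column names and that the numeric fields parse with `float`. The function name `is_filtered` is hypothetical.

```python
def is_filtered(row: dict) -> bool:
    """Return True if the call should be dropped under the stated rule.

    A row is filtered when ANY of these holds:
      Expression rank >= 10,000, Expression score <= 70,
      FDR >= 0.05, or Expression == "absent".
    """
    rank = float(row["Expression rank"])
    score = float(row["Expression score"])
    fdr = float(row["FDR"])
    return (
        rank >= 10_000
        or score <= 70
        or fdr >= 0.05
        or row["Expression"] == "absent"
    )


# Invented rows for illustration only.
strong_call = {"Expression rank": "1200.0", "Expression score": "85.5",
               "FDR": "0.001", "Expression": "present"}
weak_call = dict(strong_call, **{"Expression score": "70"})

print(is_filtered(strong_call))  # False: passes every threshold
print(is_filtered(weak_call))    # True: score of exactly 70 is filtered
```

Because the conditions are joined with OR, a call is kept only when it passes all four checks, and values exactly at a threshold (rank 10,000, score 70, FDR 0.05) are filtered.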

Future Content Considerations

edge_content: Filter some organisms which are already filtered by Monarch. - Relevant files: Caenorhabditis_elegans_expr_simple.tsv.gz, Danio_rerio_expr_simple.tsv.gz, Drosophila_melanogaster_expr_simple.tsv.gz, Mus_musculus_expr_simple.tsv.gz, Rattus_norvegicus_expr_simple.tsv.gz, Xenopus_laevis_expr_simple.tsv.gz

edges: Add more model organisms: Danio rerio (zebrafish), Xenopus laevis (African clawed frog), Drosophila melanogaster (fruit fly), Caenorhabditis elegans (roundworm), Canis lupus familiaris (dog), Bos taurus (cattle), Sus scrofa (wild boar), and Gallus gallus (red junglefowl). - Relevant files: Danio_rerio_expr_simple.tsv.gz, Xenopus_laevis_expr_simple.tsv.gz, Drosophila_melanogaster_expr_simple.tsv.gz, Caenorhabditis_elegans_expr_simple.tsv.gz, Canis_lupus_familiaris_expr_simple.tsv.gz, Bos_taurus_expr_simple.tsv.gz, Sus_scrofa_expr_simple.tsv.gz, Gallus_gallus_expr_simple.tsv.gz

Target Information

Target InfoRes ID: infores:translator-bgee-kgx

Edge Types

| Subject Categories | Predicate | Object Categories | Knowledge Level | Agent Type | UI Explanation |
| --- | --- | --- | --- | --- | --- |
| biolink:Gene | | biolink:AnatomicalEntity, biolink:Cell | knowledge_assertion | computational_model | The included Bgee edges indicate a strong association between the expression of a specific gene and a given anatomical region in an organism. |

Node Types

| Node Category | Source Identifier Types | Additional Notes |
| --- | --- | --- |
| biolink:Gene | ENSEMBL | |
| biolink:Cell | CL | |
| biolink:AnatomicalEntity | UBERON | |
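
The mapping from identifier type to node category can be sketched as a prefix lookup. This assumes identifiers are already in CURIE form (for example, that Ensembl gene IDs have been given an `ENSEMBL:` prefix, which is an assumption about the ingest). The names `PREFIX_TO_CATEGORY` and `category_for` are hypothetical.

```python
# Prefix-to-category mapping taken from the Node Types table.
PREFIX_TO_CATEGORY = {
    "ENSEMBL": "biolink:Gene",
    "CL": "biolink:Cell",
    "UBERON": "biolink:AnatomicalEntity",
}


def category_for(curie: str) -> str:
    """Return the Biolink category implied by a CURIE's prefix.

    Raises ValueError for malformed CURIEs or prefixes not in the table.
    """
    prefix, sep, local_id = curie.partition(":")
    if not sep or not local_id:
        raise ValueError(f"not a CURIE: {curie!r}")
    if prefix not in PREFIX_TO_CATEGORY:
        raise ValueError(f"unexpected prefix in {curie!r}")
    return PREFIX_TO_CATEGORY[prefix]


print(category_for("UBERON:0002107"))   # biolink:AnatomicalEntity
print(category_for("CL:0000540"))       # biolink:Cell
```

Raising on unknown prefixes, instead of assigning a default category, makes unexpected anatomical entity identifiers visible during the ingest.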

Future Modeling Considerations

edge_properties: Incorporate Expression rank, Expression score, and FDR as edge properties.

Provenance Information

Contributors:
- Daniel Korn: code author
- Kevin Schaper: code support
- Evan Morris: code support
- Sierra Moxon: code support
- Matthew Brush: data modeling, domain expertise

Artifacts:
- Ingest Survey: https://docs.google.com/spreadsheets/d/1bx4OSH1_HR69sKXIL1UBbelbUEx8X0b-gZTi8F81ypo/
- Ingest Ticket: https://github.com/NCATSTranslator/Data-Ingest-Coordination-Working-Group/issues/54