Bgee Ingest Guide
Source Information
InfoRes ID: infores:bgee
Description: Bgee is a database for retrieval and comparison of gene expression patterns across multiple animal species. It provides an intuitive answer to the question 'where is a gene expressed?' and supports research in cancer and agriculture, as well as evolutionary biology.
Citations:
- Frederic B Bastian, Julien Roux, Anne Niknejad, Aurélie Comte, Sara S Fonseca Costa, Tarcisio Mendes de Farias, Sébastien Moretti, Gilles Parmentier, Valentine Rech de Laval, Marta Rosikiewicz, Julien Wollbrett, Amina Echchiki, Angélique Escoriza, Walid H Gharib, Mar Gonzales-Porta, Yohan Jarosz, Balazs Laurenczy, Philippe Moret, Emilie Person, Patrick Roelli, Komal Sanjeev, Mathieu Seppey, Marc Robinson-Rechavi, The Bgee suite: integrated curated expression atlas and comparative transcriptomics in animals, Nucleic Acids Research, Volume 49, Issue D1, 8 January 2021, Pages D831–D847, https://doi.org/10.1093/nar/gkaa793
Data Provision Mechanisms: file_download
Data Formats: tsv.gz
Data Versioning and Releases: v15.0
Ingest Information
Ingest Categories: primary_knowledge_provider
Utility: Bgee shows where genes are expressed, down to specific anatomical structures and cell types, across many organisms.
Scope: Gene expression calls in anatomical entities and cell types identified by UBERON and CL CURIEs. Each call carries several metrics (call quality, FDR, expression score, expression rank) that help researchers assess its reliability.
Relevant Files
| File Name | Location | Description |
|---|---|---|
| Homo_sapiens_expr_simple.tsv.gz | https://bgee.org/ftp/bgee_v15_0/download/calls/expr_calls | Homo sapiens (human) gene expression information. |
| Rattus_norvegicus_expr_simple.tsv.gz | https://bgee.org/ftp/bgee_v15_0/download/calls/expr_calls | Rattus norvegicus (brown rat) gene expression information. |
| Mus_musculus_expr_simple.tsv.gz | https://bgee.org/ftp/bgee_v15_0/download/calls/expr_calls | Mus musculus (house mouse) gene expression information. |
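As a rough sketch of fetching these files, the Python below builds each download URL from the shared location plus the file name and streams the gzip-compressed TSV row by row using only the standard library. The names `BASE_URL`, `SPECIES`, `file_url`, and `read_calls` are illustrative and not part of any existing ingest code; the sketch assumes each file begins with a single header line.

```python
import csv
import gzip
import io
from typing import Iterator

# Shared "Location" directory for all three files (Bgee v15.0).
BASE_URL = "https://bgee.org/ftp/bgee_v15_0/download/calls/expr_calls"

# Species prefixes taken from the file names in the table.
SPECIES = ["Homo_sapiens", "Rattus_norvegicus", "Mus_musculus"]


def file_url(species: str) -> str:
    """Build the download URL for one species' simple expression-call file."""
    return f"{BASE_URL}/{species}_expr_simple.tsv.gz"


def read_calls(stream: io.BufferedIOBase) -> Iterator[dict]:
    """Yield one dict per row of a gzip-compressed, tab-separated file.

    Keys come from the header line. The caller owns `stream`; closing the
    text wrapper here does not close the underlying binary stream.
    """
    with gzip.open(stream, mode="rt", encoding="utf-8", newline="") as text:
        yield from csv.DictReader(text, delimiter="\t")
```

Passing the response object from `urllib.request.urlopen(file_url("Homo_sapiens"))` to `read_calls` should stream the human file without saving it to disk first.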
Included Content
| File Name | Included Records | Fields Used |
|---|---|---|
| Homo_sapiens_expr_simple.tsv.gz | None | Gene ID, Gene name, Anatomical entity ID, Anatomical entity name, Expression, Call quality, FDR, Expression score, Expression rank |
| Rattus_norvegicus_expr_simple.tsv.gz | None | Gene ID, Gene name, Anatomical entity ID, Anatomical entity name, Expression, Call quality, FDR, Expression score, Expression rank |
| Mus_musculus_expr_simple.tsv.gz | None | Gene ID, Gene name, Anatomical entity ID, Anatomical entity name, Expression, Call quality, FDR, Expression score, Expression rank |
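The fields used can be captured as a fixed column list. The hypothetical helper below (`FIELDS_USED`, `project`) keeps only those columns from a parsed row and fails loudly if any are missing. It assumes the file headers are spelled with spaces (for example `Gene ID`, `Expression rank`), matching the filter expressions; adjust the names if the actual headers differ.

```python
# Columns used by the ingest (assumed header spellings, with spaces).
FIELDS_USED = (
    "Gene ID",
    "Gene name",
    "Anatomical entity ID",
    "Anatomical entity name",
    "Expression",
    "Call quality",
    "FDR",
    "Expression score",
    "Expression rank",
)


def project(row: dict) -> dict:
    """Return only the used fields; raise KeyError if any are absent."""
    missing = [field for field in FIELDS_USED if field not in row]
    if missing:
        raise KeyError(f"missing expected columns: {missing}")
    return {field: row[field] for field in FIELDS_USED}
```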
Filtered Content
| File Name | Filtered Records | Rationale |
|---|---|---|
| Homo_sapiens_expr_simple.tsv.gz | 'Expression rank' >= 10,000 OR 'Expression score' <= 70 OR 'FDR' >= 0.05 OR 'Expression' == "absent" | Only strong, high-confidence expression calls are kept. Calls that are absent, weakly ranked or scored, or not significant at FDR < 0.05 are dropped as weak or potentially spurious. |
| Rattus_norvegicus_expr_simple.tsv.gz | 'Expression rank' >= 10,000 OR 'Expression score' <= 70 OR 'FDR' >= 0.05 OR 'Expression' == "absent" | Only strong, high-confidence expression calls are kept. Calls that are absent, weakly ranked or scored, or not significant at FDR < 0.05 are dropped as weak or potentially spurious. |
| Mus_musculus_expr_simple.tsv.gz | 'Expression rank' >= 10,000 OR 'Expression score' <= 70 OR 'FDR' >= 0.05 OR 'Expression' == "absent" | Only strong, high-confidence expression calls are kept. Calls that are absent, weakly ranked or scored, or not significant at FDR < 0.05 are dropped as weak or potentially spurious. |
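The filter criteria translate directly into a predicate. In this sketch, `is_filtered` (a hypothetical name) returns `True` for rows to drop. It assumes rows are dicts of strings keyed by the column headers, as produced by `csv.DictReader`, and that rank, score, and FDR values parse as floats.

```python
def is_filtered(row: dict) -> bool:
    """Return True if a Bgee expression call should be dropped.

    A row is dropped if ANY of these holds:
      Expression == "absent", Expression rank >= 10,000,
      Expression score <= 70, FDR >= 0.05.
    """
    # Check "absent" first so those rows never need their numbers parsed.
    if row["Expression"] == "absent":
        return True
    # float() also accepts scientific notation such as "1.5e4".
    return (
        float(row["Expression rank"]) >= 10_000
        or float(row["Expression score"]) <= 70
        or float(row["FDR"]) >= 0.05
    )
```

Because the conditions are joined by OR, a row is dropped as soon as any one of them holds; the rows kept are then `[r for r in rows if not is_filtered(r)]`.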
Future Content Considerations
edge_content: Filter some organisms that are already filtered by Monarch.
- Relevant files: Caenorhabditis_elegans_expr_simple.tsv.gz, Danio_rerio_expr_simple.tsv.gz, Drosophila_melanogaster_expr_simple.tsv.gz, Mus_musculus_expr_simple.tsv.gz, Rattus_norvegicus_expr_simple.tsv.gz, Xenopus_laevis_expr_simple.tsv.gz
edges: Add more model organisms: Danio rerio (zebrafish), Xenopus laevis (African clawed frog), Drosophila melanogaster (fruit fly), Caenorhabditis elegans (roundworm), Canis lupus familiaris (dog), Bos taurus (cattle), Sus scrofa (wild boar), Gallus gallus (red junglefowl).
- Relevant files: Danio_rerio_expr_simple.tsv.gz, Xenopus_laevis_expr_simple.tsv.gz, Drosophila_melanogaster_expr_simple.tsv.gz, Caenorhabditis_elegans_expr_simple.tsv.gz, Canis_lupus_familiaris_expr_simple.tsv.gz, Bos_taurus_expr_simple.tsv.gz, Sus_scrofa_expr_simple.tsv.gz, Gallus_gallus_expr_simple.tsv.gz
Target Information
Target InfoRes ID: infores:translator-bgee-kgx
Edge Types
| Subject Categories | Predicate | Object Categories | Knowledge Level | Agent Type | UI Explanation |
|---|---|---|---|---|---|
| biolink:Gene |  | biolink:AnatomicalEntity, biolink:Cell | knowledge_assertion | computational_model | Bgee edges indicate strong, high-confidence expression of a gene in a given anatomical entity or cell type of an organism. |
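A single retained row might be turned into an edge of this type roughly as follows. This is a sketch only: the edge-type row does not state a predicate, so `make_edge` (a hypothetical helper) takes it as a parameter instead of guessing one. The keys follow Biolink edge slots (`knowledge_level`, `agent_type`, `primary_knowledge_source`); prefixing the raw Ensembl gene ID with `ENSEMBL:`, and using `Anatomical entity ID` as-is on the assumption that it is already a UBERON or CL CURIE, are assumptions about the input.

```python
def make_edge(row: dict, predicate: str) -> dict:
    """Build one Gene -> AnatomicalEntity/Cell edge from a retained call.

    `predicate` is supplied by the caller; no value is assumed here.
    """
    return {
        # Assumes Bgee gene IDs are bare Ensembl IDs such as "ENSG00000000003".
        "subject": f"ENSEMBL:{row['Gene ID']}",
        "predicate": predicate,
        # Assumes this column already holds a UBERON or CL CURIE.
        "object": row["Anatomical entity ID"],
        "knowledge_level": "knowledge_assertion",
        "agent_type": "computational_model",
        "primary_knowledge_source": "infores:bgee",
    }
```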
Node Types
| Node Category | Source Identifier Types | Additional Notes |
|---|---|---|
| biolink:Gene | ENSEMBL | |
| biolink:Cell | CL | |
| biolink:AnatomicalEntity | UBERON | |
Future Modeling Considerations
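Because each node category corresponds to exactly one identifier prefix, a node's category can be derived from its CURIE. The mapping and the `node_category` helper below are a hypothetical sketch; they assume gene IDs already carry an `ENSEMBL:` prefix by the time nodes are built.

```python
# Identifier prefix -> Biolink category, per the node-type table.
CATEGORY_BY_PREFIX = {
    "ENSEMBL": "biolink:Gene",
    "CL": "biolink:Cell",
    "UBERON": "biolink:AnatomicalEntity",
}


def node_category(curie: str) -> str:
    """Return the Biolink category for a CURIE based on its prefix."""
    prefix = curie.split(":", 1)[0]
    try:
        return CATEGORY_BY_PREFIX[prefix]
    except KeyError:
        raise ValueError(f"unexpected identifier prefix in {curie!r}") from None
```

Raising on an unknown prefix, rather than defaulting to a generic category, surfaces unexpected identifiers early.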
edge_properties: Incorporate Expression rank, Expression score, and FDR as edge properties.
Provenance Information
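One possible shape for this change: parse the three per-call metrics and copy them onto an existing edge dictionary. The property names `expression_rank`, `expression_score`, and `fdr` are hypothetical placeholders rather than established Biolink slots, and `add_expression_properties` is an illustrative name; the sketch assumes the metrics arrive as numeric strings.

```python
def add_expression_properties(edge: dict, row: dict) -> dict:
    """Return a copy of `edge` carrying Bgee's per-call metrics.

    Property names are placeholders; map them to real slots before use.
    """
    return {
        **edge,
        "expression_rank": float(row["Expression rank"]),
        "expression_score": float(row["Expression score"]),
        "fdr": float(row["FDR"]),
    }
```

Returning a new dict instead of mutating `edge` keeps the base edge reusable if the property set changes later.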
Contributors:
- Daniel Korn: code author
- Kevin Schaper: code support
- Evan Morris: code support
- Sierra Moxon: code support
- Matthew Brush: data modeling, domain expertise
Artifacts:
- Ingest Survey: https://docs.google.com/spreadsheets/d/1bx4OSH1_HR69sKXIL1UBbelbUEx8X0b-gZTi8F81ypo/
- Ingest Ticket: https://github.com/NCATSTranslator/Data-Ingest-Coordination-Working-Group/issues/54