Claire McWhite's Site – projects, etc.


Search for multispecies orthologs

Work in progress

Tables are organized by reference protein-coding gene. Note: orthologs are defined at the phylogenetic level of the Last Universal Common Ancestor.

Enter a UniProt accession to pull a table of orthologs to that protein.
Example: P00395

Link to Excel view of most recent search: <a href="" id=lnk></a>



OrthoBlender displays multispecies orthologs to a reference human protein coding gene according to diverse ortholog sorting algorithms.

Simplified example table of ortholog calls from three algorithms.
Rows: proteins called as orthologous to the reference human protein, here P04637/P53_HUMAN.
Columns: "ACC" = UniProt accession; "ENTRY_NAME" = UniProt entry name; "ROW_COUNT" = the number of databases that call that row's protein an ortholog of the reference protein.

Tables indexed by UniProt Accession are currently available for the 20,188 human protein coding genes from the 2015 release of the UniProt reviewed human proteome ("UP000005640").

Each table is created by searching for a gene within the ortholog lists and listing the genes named as its orthologs by each database. Table rows contain putative orthologs, labeled by UniProt accession, UniProt entry name, and the number of databases in which that gene was called as an ortholog to the query. A protein that is called as an ortholog by a database has a "1" in that database's column.
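The table construction described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code used to build the published tables: the function name `build_ortholog_table`, the input layout (a dictionary mapping each database to a list of ortholog groups), the database names `db_A`/`db_B`/`db_C`, and the `ORTH_*`/`OTHER_*` accessions are all hypothetical placeholders. Writing 0 for a protein that a database does not call is also an assumption; the description only specifies the "1".

```python
from collections import defaultdict


def build_ortholog_table(query, groups_by_db):
    """Return one row per putative ortholog of `query`.

    groups_by_db maps a database name to a list of ortholog groups,
    each group being a set of UniProt accessions (assumed input layout).
    """
    db_names = sorted(groups_by_db)
    calls = defaultdict(set)  # accession -> databases that call it an ortholog

    for db, groups in groups_by_db.items():
        for group in groups:
            if query not in group:
                continue
            for acc in group:
                if acc != query:
                    calls[acc].add(db)

    rows = []
    # Most widely supported orthologs first, ties broken alphabetically.
    for acc in sorted(calls, key=lambda a: (-len(calls[a]), a)):
        row = {"ACC": acc, "ROW_COUNT": len(calls[acc])}
        for db in db_names:
            # 1 marks a call; 0 for "not called" is an assumption.
            row[db] = 1 if db in calls[acc] else 0
        rows.append(row)
    return rows


# Hypothetical groups: database names and ORTH_*/OTHER_* are placeholders.
example_groups = {
    "db_A": [{"P04637", "ORTH_1", "ORTH_2"}, {"OTHER_1", "OTHER_2"}],
    "db_B": [{"P04637", "ORTH_1"}],
    "db_C": [{"P04637", "ORTH_1", "ORTH_3"}],
}

for row in build_ortholog_table("P04637", example_groups):
    print(row)
```

In this toy input, `ORTH_1` is called by all three placeholder databases and so gets a ROW_COUNT of 3, while `ORTH_2` and `ORTH_3` are each called by only one.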

Ortholog groups are pulled from the Quest for Orthologs reference proteome benchmarking. The following ortholog-calling algorithms were run on identical proteome sets covering 66 species, including eukaryotes, archaea, and bacteria.

Table viewing instructions
GitHub initially renders a ".csv" file as an interactive table. To save a table, click the "Raw" icon and then save the page. The entire collection of tables can be downloaded as a zip file.
OrthoBlender gives a comprehensive view of the multi-organism orthologs of a given human gene, as produced by diverse ortholog sorting algorithms. For example, the table for the human gene P00395 has columns containing the orthologs to P00395 according to each database.
Orthologous genes are genes that arise from speciation, for example human Talin and mouse Talin. Accurate identification of orthologs is key to comparative genomic approaches, but there is no consensus model. Approximately 30 independent algorithms exist to classify orthologous genes; some are based on species and gene phylogeny, and some use sequence comparison exclusively. Phylogeny-based approaches have the advantage of tracking genes through realistic evolutionary paths, but they tend to be computationally intensive and are subject to error from misconstructed gene trees. BLAST-based approaches are much faster, generally using an all-by-all BLAST of multispecies proteomes to generate best matches, but they lack the added power of phylogenetic information.

There is often no clear answer as to which database to use to generate lists of orthologs for a comparative project. Compounding this problem is the lack of a method to easily compare ortholog calls from different databases. OrthoBlender attempts to inform research using orthologs by standardizing and aligning ortholog groups from separate ortholog grouping algorithms. Additionally, the number of databases that call a gene an ortholog may be used as a proxy for confidence in that gene's assignment of orthology to the reference gene. The ability to visualize and compare ortholog groupings from multiple sources will aid comparative study of proteins and genes.
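As a rough illustration of using the database count as a confidence proxy, the sketch below keeps only the rows of a table whose ROW_COUNT meets a chosen threshold. It assumes a CSV with the "ACC", "ENTRY_NAME", and "ROW_COUNT" columns described earlier; the function name `orthologs_with_support`, the per-database column names, and every value in the example rows are hypothetical placeholders, and the actual files may be laid out differently.

```python
import csv
import io


def orthologs_with_support(csv_text, min_databases):
    """Return accessions called as orthologs by at least `min_databases` databases."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row["ACC"]
        for row in reader
        if int(row["ROW_COUNT"]) >= min_databases
    ]


# Hypothetical table contents; accessions, entry names, and database columns are placeholders.
example_csv = """ACC,ENTRY_NAME,ROW_COUNT,db_A,db_B,db_C
ORTH_1,ORTH1_MOUSE,3,1,1,1
ORTH_2,ORTH2_YEAST,1,1,0,0
ORTH_3,ORTH3_ECOLI,2,0,1,1
"""

print(orthologs_with_support(example_csv, 2))
```

Where to set the threshold is a judgment call for each project: a higher value favors orthologs that several algorithms agree on, at the cost of dropping calls made by only one method.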

Page template : Jekyll-now