Homology and Alignments

This web page was produced as an assignment for Genetics 677, an undergraduate course at UW-Madison.

What is Protein Homology?

Protein homology is biological homology between proteins: two proteins, in two different species, are homologous when both are derived from a common ancestral protein [1]. This can happen in two ways. The first is speciation: an ancestral species splits in two, and each descendant species inherits the protein. This type of homology arises vertically, meaning the protein is passed from parent to offspring, and the resulting proteins are called orthologs [2]. The second way begins with a gene duplication event, after which the gene that encodes a particular protein is present twice in the genome; the two copies are called paralogs. A speciation event may then follow, with one copy going to each new organism. Paralogous proteins often have similar functions, but not always, because one copy can escape selective pressure and diverge [3].
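The two routes can be made concrete with a small sketch. It assumes a gene tree in which each internal node is labeled with the event that produced it ("speciation" or "duplication"); two genes are then orthologs if their most recent common ancestor is a speciation node and paralogs if it is a duplication node. The `Node` class, the `relationship` function, and the example tree are hypothetical and only illustrate the definitions above.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # compare nodes by identity, not by field values
class Node:
    """A gene-tree node; internal nodes record the event that split the lineage."""
    name: str
    event: Optional[str] = None  # "speciation" or "duplication" (internal nodes only)
    children: list[Node] = field(default_factory=list, repr=False)
    parent: Optional[Node] = field(default=None, repr=False)

    def add(self, child: Node) -> Node:
        """Attach child below this node and return it."""
        child.parent = self
        self.children.append(child)
        return child


def lineage(node: Optional[Node]) -> list[Node]:
    """The node itself followed by each ancestor up to the root."""
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    return path


def relationship(gene_a: Node, gene_b: Node) -> str:
    """Classify two genes by the event at their most recent common ancestor."""
    ancestors_a = set(lineage(gene_a))
    for node in lineage(gene_b):
        if node in ancestors_a:  # first shared node = most recent common ancestor
            if node.event == "speciation":
                return "orthologs"
            if node.event == "duplication":
                return "paralogs"
            raise ValueError(f"{node.name} has no speciation/duplication label")
    raise ValueError("the two genes are not in the same tree")


# Hypothetical tree: a duplication creates copies A and B, then a speciation
# event separates human and mouse lineages for each copy.
root = Node("ancestral gene", event="duplication")
copy_a = root.add(Node("copy A", event="speciation"))
copy_b = root.add(Node("copy B", event="speciation"))
human_a = copy_a.add(Node("human A"))
mouse_a = copy_a.add(Node("mouse A"))
human_b = copy_b.add(Node("human B"))
mouse_b = copy_b.add(Node("mouse B"))

print(relationship(human_a, mouse_a))  # orthologs
print(relationship(human_a, mouse_b))  # paralogs
```

Note that human A and mouse B are paralogs even though they sit in different species, because the event that separated them was the duplication.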

In human genetic research, it is important to identify homologous proteins in model organisms so that experiments in those organisms can reveal the protein's function. In one important study, researchers used mice to study the expression of the FRMD7 protein in the fetal brain [4]. This study suggested that FRMD7 is involved in neuronal development in the mouse brain. Because the mouse FRMD7 protein is a homolog of the human protein, these findings are likely to apply to humans as well.

Homologs of Human FRMD7 Protein

The protein homologs on this page were found using the Basic Local Alignment Search Tool (BLAST) from the National Center for Biotechnology Information (NCBI). BLAST searches for regions of local similarity between a query sequence and the sequences in a database, which can cover the genomes of many organisms at once. These similarities can be used to infer functional and evolutionary relationships between sequences. The program compares a single query sequence against the database and returns a list of statistically significant matches, and the search can be restricted to a specific organism. To increase confidence in a match, the hit can be searched back against the proteins of the starting organism (a reciprocal BLAST); if the original query comes back as the best hit, that strengthens the case that the two proteins are homologs. Below is a file containing homologous protein sequences for the human FRMD7 protein obtained through NCBI.

proteins.docx (Word document, 14 KB)
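The reciprocal check described above is often called the reciprocal best hit approach. The sketch below assumes the results of two BLAST searches have already been collected into dictionaries mapping each query to a list of (hit, E-value) pairs; the function names, identifiers, and E-values are hypothetical, and the searches themselves would be run with BLAST rather than reproduced here.

```python
from typing import Optional

# Each BLAST search is summarized as {query: [(hit, e_value), ...]}.
Hits = dict[str, list[tuple[str, float]]]


def best_hit(hits: Hits, query: str) -> Optional[str]:
    """Return the hit with the lowest E-value for query, or None if it had no hits."""
    candidates = hits.get(query, [])
    if not candidates:
        return None
    return min(candidates, key=lambda pair: pair[1])[0]


def reciprocal_best_hits(forward: Hits, reverse: Hits) -> list[tuple[str, str]]:
    """Keep (query, hit) pairs where the hit's own best hit is the original query."""
    pairs = []
    for query in forward:
        hit = best_hit(forward, query)
        if hit is not None and best_hit(reverse, hit) == query:
            pairs.append((query, hit))
    return pairs


# Hypothetical searches with invented E-values: human proteins against a
# mouse database (forward), then the mouse hits back against human (reverse).
forward = {
    "human_FRMD7": [("mouse_Frmd7", 0.0), ("mouse_other", 1e-40)],
    "human_geneX": [("mouse_geneY", 1e-20)],
}
reverse = {
    "mouse_Frmd7": [("human_FRMD7", 0.0), ("human_other", 1e-38)],
    "mouse_geneY": [("human_geneZ", 1e-30), ("human_geneX", 1e-20)],
}
print(reciprocal_best_hits(forward, reverse))  # [('human_FRMD7', 'mouse_Frmd7')]
```

Here `human_geneX` is dropped because its best mouse hit points back to a different human protein.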

Homologous Protein Reference Numbers

Homo sapiens - Human FRMD7 (query)
Accession Numbers: NP_919253, XP_029570
GI Number: 34916000

Canis lupus familiaris - Dog FRMD7
Accession Number: XP_549262
GI Number: 74008966
E-Value: 0.0
Max. Identity: 87%

Bos taurus - Cattle FRMD7
Accession Number: XP_001787430
GI Number: 358419804
E-Value: 0.0
Max. Identity: 87%

Rattus norvegicus - Brown Rat FRMD7
Accession Number: XP_229144
GI Number: 109511213
E-Value: 0.0
Max. Identity: 85%

Gallus gallus - Chicken FRMD7
Accession Number: XP_426268
GI Number: 118089423
E-Value: 0.0
Max. Identity: 65%

Caenorhabditis elegans - Nematode frm-5
Accession Number: CAB10024
GI Number: 211970513
E-Value: 5e-94
Max. Identity: 47%

Drosophila melanogaster - Fruit Fly ferm
Accession Number: NP_649505
GI Number: 21355431
E-Value: 4e-104
Max. Identity: 49%
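For reference, the E-values in this list come from the Karlin-Altschul statistics that BLAST uses for local alignments. For an alignment with raw score S between a query of (effective) length m and a database of total (effective) length n, the expected number of alignments scoring at least S purely by chance is given below, where K and λ are constants determined by the scoring system. The smaller the E-value, the less likely the match arose by chance. Because E shrinks exponentially as S grows, long and highly similar matches such as the vertebrate homologs reach E-values so small that BLAST reports them as 0.0.

```latex
E = K \, m \, n \, e^{-\lambda S}
```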

Alignments of FRMD7 Homologs

Protein alignments were made using ClustalW2 and T-Coffee [5, 6].

protein_alignment_clustal_2.docx (Word document, 15 KB)

protein_alignment_t_coffee.docx (Word document, 29 KB)
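Both ClustalW2 and T-Coffee build their multiple alignments from pairwise alignments. The sketch below is a minimal Needleman-Wunsch global alignment of two sequences, assuming a deliberately simple scoring scheme (match +1, mismatch -1, a fixed penalty per gap position) rather than the substitution matrices and affine gap penalties the real programs use. The sequences are short hypothetical fragments, not FRMD7.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Globally align a and b; return (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace back from the bottom-right cell, writing '-' wherever a gap was used.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("MKTAYIAK", "MKTAIAK"))  # (5, 'MKTAYIAK', 'MKTA-IAK')
```

Changing the scoring (for example the `gap` penalty) can change where gaps are placed, which is one reason different alignment programs can arrange the same sequences differently.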

Analysis and Discussion

In the first alignment, produced by ClustalW2, it is clear that the vertebrate FRMD7 proteins (human, dog, cattle, rat, and chicken) are much more closely related to one another than to the C. elegans and Drosophila melanogaster proteins. This strong conservation suggests that FRMD7 plays an important role in vertebrates. Including these two invertebrate species as an outgroup was nonetheless helpful, because it shows that at least parts of these proteins have been conserved over a long evolutionary timespan.
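One simple way to put a number on how closely aligned sequences are related is pairwise percent identity: the share of aligned positions at which two sequences carry the same residue. The sketch below assumes that columns where either sequence has a gap ('-') are left out of the count; tools differ on this convention, so its numbers need not match BLAST's reported identity. The sequence names and fragments are hypothetical.

```python
from itertools import combinations


def percent_identity(s1: str, s2: str) -> float:
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(s1) != len(s2):
        raise ValueError("aligned sequences must have the same length")
    compared = [(x, y) for x, y in zip(s1, s2) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    same = sum(1 for x, y in compared if x == y)
    return 100.0 * same / len(compared)


def identity_table(alignment: dict[str, str]) -> dict[tuple[str, str], float]:
    """Percent identity for every pair of sequences in an alignment."""
    return {
        (a, b): percent_identity(alignment[a], alignment[b])
        for a, b in combinations(alignment, 2)
    }


# Hypothetical aligned fragments: two similar "vertebrate" sequences and a
# more divergent "invertebrate" one containing a gap.
alignment = {
    "vertebrate_1": "MKTAYIAK",
    "vertebrate_2": "MKTAYIAR",
    "invertebrate": "MRT-YLAK",
}
for pair, pct in identity_table(alignment).items():
    print(pair, round(pct, 1))
```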

The second alignment, produced by T-Coffee, is color coded by alignment quality: colors range from red (good alignment) through yellow (average) to purple (poor alignment). T-Coffee also uses dashes to mark the gaps the program inserted to improve the alignment of the sequences. This alignment suggests that at least 86% of the sequence is conserved among all of the organisms above. The best alignments are among the human, brown rat, and chicken sequences, and the worst is with C. elegans.
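A conservation percentage depends on what counts as a conserved position. As one hypothetical definition, the sketch below marks a column as conserved only when every sequence has the same residue and none has a gap ('-'), printing '*' under such columns in the style of Clustal's consensus line (Clustal's ':' and '.' marks for similar residue groups are left out). This is not how T-Coffee computes its colors, which reflect how consistent each aligned residue is with T-Coffee's library of pairwise alignments. The fragments are hypothetical.

```python
def conservation_line(aligned: list[str]) -> str:
    """Return '*' under fully conserved, gap-free columns and ' ' elsewhere."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")
    marks = []
    for column in zip(*aligned):
        residues = set(column)
        marks.append("*" if len(residues) == 1 and "-" not in residues else " ")
    return "".join(marks)


def fraction_conserved(aligned: list[str]) -> float:
    """Fraction of alignment columns that are fully conserved."""
    line = conservation_line(aligned)
    return line.count("*") / len(line)


# Hypothetical aligned fragments; the third contains one gap.
alignment = ["MKTAYIAK", "MKTAYIAR", "MRT-YLAK"]
for seq in alignment:
    print(seq)
print(conservation_line(alignment))
print(fraction_conserved(alignment))  # 0.5
```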

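The two alignments differed most in where they placed Drosophila. This is most likely because the two programs use different methods: ClustalW2 builds its alignment progressively along a guide tree, while T-Coffee first assembles a library of pairwise alignments and uses the consistency of that library to guide its progressive alignment.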

The alignments are consistent with the E-values obtained by BLASTing each protein against human FRMD7: the vertebrate proteins (dog, cattle, rat, and chicken) have smaller E-values and higher maximum identity than the Drosophila and C. elegans proteins.
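This comparison can be checked directly against the BLAST values listed earlier on this page. The sketch below simply sorts the homologs by E-value (lower is more significant) and breaks ties by higher maximum identity; the numbers are copied from the list above, where 0.0 stands for E-values too small for BLAST to display.

```python
# BLAST results against human FRMD7, copied from the list above:
# (organism, E-value, max. identity in percent)
homologs = [
    ("Dog", 0.0, 87),
    ("Cattle", 0.0, 87),
    ("Brown rat", 0.0, 85),
    ("Chicken", 0.0, 65),
    ("C. elegans", 5e-94, 47),
    ("Fruit fly", 4e-104, 49),
]

# Sort by E-value, then by identity (highest first); sorted() is stable,
# so entries that tie on both keep their original order.
ranked = sorted(homologs, key=lambda h: (h[1], -h[2]))
for organism, evalue, identity in ranked:
    print(f"{organism:<12} E={evalue:<8g} identity={identity}%")
```

All four vertebrate proteins sort ahead of the two invertebrate proteins on both measures.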

References

[1] Reeck, G. R., de Haen, C., Teller, D. C., Doolittle, R. F., Fitch, W. M., Dickerson, R. E., Chambon, P., McLachlan, A. D., Margoliash, E., Jukes, T. H., Zuckerkandl, E. (1987)
“Homology” in proteins and nucleic acids: a terminology muddle and a way out of it. Cell, 50, 667. doi: 10.1016/0092-8674(87)90322-9

[2] Fitch, W. M. (1970)
Distinguishing homologous from analogous proteins. Systematic Zoology, 19, 99. Retrieved from: http://www.jstor.org/stable/2412448

[3] Studer, R. A., Robinson-Rechavi, M. (2009)
How confident can we be that orthologs are similar, but paralogs differ? Trends in Genetics, 25, 210. doi: 10.1016/j.tig.2009.03.004

[4] Self, J., Haitchi, H. M., Griffiths, H., Davies, D. E., Lotery, A. (2010)
FRMD7 expression in developing mouse brain. Eye, 24, 165. doi: 10.1038/eye.2009.44

[5] Di Tommaso, P., Moretti, S., Xenarios, I., Orobitg, M., Montanyola, A., Chang, J. M., Taly, J. F., Notredame, C. (2011)
T-Coffee: a web server for the multiple sequence alignment of protein and RNA sequences using structural information and homology extension. Nucleic Acids Research, 39, W13. doi: 10.1093/nar/gkr245

[6] Notredame, C., Higgins, D. G., Heringa, J. (2000)
T-Coffee: A novel method for fast and accurate multiple sequence alignment. Journal of Molecular Biology, 302, 205. doi: 10.1006/jmbi.2000.4042

Site created by: Kristen Klimo
Last updated: 5/11/2012
University of Wisconsin-Madison