Introduction
This website systematically explores the hypothesis that local compositional similarity in proteins is linked to their function. Proteins are built from amino acids, each of which contains a distinct set of chemical groups. By comparing the amounts of these groups along protein sequences, we can find local similarities that may point to functional connections between proteins.
To begin, select a human protein by its gene name from the list in the left frame.
In the middle frame, the protein sequence is displayed, highlighted, and divided into overlapping 15-amino-acid segments.
Each segment is assigned to one of about 20,000 clusters of protein segments with similar composition. A short description of the protein is provided, along with an illustration of large-scale sequence annotations.
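The segmentation and cluster assignment described above can be illustrated with a minimal sketch. It rests on several assumptions that the site does not specify. The windows advance by one residue. Plain amino-acid counts stand in for the chemical-group counts the site actually uses, whose definition is not given here. Each segment goes to the nearest of a set of precomputed cluster centroids. The helper names `segments`, `composition` and `assign_cluster` are hypothetical, and the centroid rule is an assumed stand-in rather than the site's clustering method.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
WINDOW = 15  # segment length stated on the site


def segments(sequence: str, window: int = WINDOW, step: int = 1) -> list[str]:
    """Split a sequence into overlapping windows (step=1 is an assumption)."""
    return [sequence[i:i + window] for i in range(0, len(sequence) - window + 1, step)]


def composition(segment: str) -> list[int]:
    """Count each standard amino acid in a segment.

    Assumption: amino-acid counts are a stand-in for the chemical-group
    counts used by the site.
    """
    counts = Counter(segment)
    return [counts.get(aa, 0) for aa in AMINO_ACIDS]


def assign_cluster(vector: list[int], centroids: list[list[float]]) -> int:
    """Return the index of the nearest centroid (hypothetical assignment rule).

    Distance is squared Euclidean distance between composition vectors.
    """
    def distance(centroid: list[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(vector, centroid))

    return min(range(len(centroids)), key=lambda i: distance(centroids[i]))


if __name__ == "__main__":
    seq = "ACDEFGHIKLMNPQRSTVWY"  # toy 20-residue sequence
    vectors = [composition(s) for s in segments(seq)]
    print(len(vectors), "segments")
```

With a step of one residue, a sequence of length L yields L - 14 segments, so the 20-residue toy sequence above gives 6. Sequences shorter than 15 residues yield none.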
In the right frame, you will see the cluster description along with additional analyses:
- A pie chart showing the median cluster composition
- A boxplot showing the distribution of compositions within the cluster and across the entire proteome
- Functional enrichment analysis results
- A detailed list of protein segments within the cluster
These segments may belong to the same protein, but more often they are found in different proteins, sometimes without any known homology. Gene names on the cluster page are clickable. Selecting one updates the middle frame, so you can explore where the compositions of the highlighted proteins intersect.
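The median cluster composition shown in the pie chart can be sketched as an element-wise median over the composition vectors of all segments in a cluster, normalised to fractions for plotting. This is an assumed reading of "median cluster composition". The helper names `median_composition` and `to_fractions` are hypothetical and are not taken from the site's code.

```python
from statistics import median


def median_composition(vectors: list[list[int]]) -> list[float]:
    """Element-wise median across the composition vectors of a cluster's segments."""
    # zip(*vectors) groups the counts of each component across all segments
    return [median(column) for column in zip(*vectors)]


def to_fractions(values: list[float]) -> list[float]:
    """Normalise a composition vector so it sums to 1 (e.g. for a pie chart)."""
    total = sum(values)
    return [v / total for v in values] if total else [0.0] * len(values)


if __name__ == "__main__":
    cluster = [[1, 2], [3, 4], [5, 0]]  # toy vectors with two components
    print(to_fractions(median_composition(cluster)))
```

A per-component median is less sensitive than a mean to the occasional outlying segment in a cluster. Note, however, that the medians of the individual components need not sum to the segment length.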
References:
- Anindya, A.L., Olsson, T.N., Jensen, M., Garcia-Bonete, M.-J., Wheatley, S.P., Bokarewa, M.I., Mezzasalma, S.A. & Katona, G. (2024) Deciphering peptide-protein interactions via composition-based prediction: a case study with survivin/BIRC5. Machine Learning: Science and Technology, 5(2), 025081.
- Jensen, M., Chandrasekaran, V., Garcia-Bonete, M.-J., Li, S., Anindya, A.L., Andersson, K.M.E., Erlandsson, M.C., Oparina, N.Y., Burmann, B.M., Brath, U., Panchenko, A.R., Bokarewa, M.I. & Katona, G. (2023) Survivin prevents the polycomb repressor complex 2 from methylating histone 3 lysine 27. iScience, 26(7), 106976.
More about what inspired the model:
- Ahlberg Gagnér, V., Jensen, M. & Katona, G. (2021) Estimating the probability of coincidental similarity between atomic displacement parameters with machine learning. Machine Learning: Science and Technology, 2, 035033.
- Ahlberg Gagnér, V., Lundholm, I., Garcia-Bonete, M.-J., Rodilla, H., Friedman, R., Zhaunerchyk, V., Bourenkov, G., Schneider, T., Stake, J. & Katona, G. (2019) Clustering of atomic displacement parameters in bovine trypsin reveals a distributed lattice of atoms with shared chemical properties. Scientific Reports, 9, 19281.
Inspiring examples of the central role of protein composition:
- Ofran, Y. & Margalit, H. (2006) Proteins of the same fold and unrelated sequences have similar amino acid composition. Proteins: Structure, Function, and Bioinformatics, 64, 275-279.
- Fondon, J.W., 3rd & Garner, H.R. (2004) Molecular origins of rapid and continuous morphological evolution. Proc. Natl. Acad. Sci. U.S.A., 101, 18058-18063.
Services and APIs used to develop the site:
- ToppGene
- UniProt
Acknowledgements
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 964203 (Long-range electrodynamic INteractions between proteinS, LINkS).
This work was supported by the National Bioinformatics Infrastructure Sweden (NBIS) at SciLifeLab.