LMPred: predicting antimicrobial peptides using pre-trained language models and deep learning

Summary

This project used pre-trained large language models (BERT, T5 and XLNet) to create contextualized embedding vectors representing peptide sequences. A CNN then classified each sequence as antimicrobial or not. The results suggest that language models can learn some of the "language of life", i.e. the underlying biology encoded within each sequence.
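
To make the data flow concrete, here is a minimal sketch of the embed-then-classify pipeline described above: a peptide becomes one vector per residue, a 1D convolution slides over the sequence axis, the feature maps are max-pooled, and a sigmoid produces a probability. This is not the project's actual code. Everything in it is an assumption for illustration: the function names, the layer sizes, and in particular `placeholder_embed`, which is a fixed lookup table standing in for the language model (a real BERT, T5 or XLNet embedding would be contextual and much higher-dimensional). The weights are random and untrained, so the output probability carries no biological meaning.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def placeholder_embed(sequence, dim=8, seed=0):
    """Hypothetical stand-in for a pre-trained language model.

    Returns one vector per residue. Unlike a real model, this lookup is
    NOT contextual: the same residue always maps to the same vector.
    """
    rng = random.Random(seed)
    table = {aa: [rng.uniform(-1, 1) for _ in range(dim)] for aa in AMINO_ACIDS}
    return [table[aa] for aa in sequence]


def conv1d_relu(embeddings, kernels, biases):
    """Valid 1D convolution along the sequence axis, followed by ReLU.

    kernels[f][k][d] is the weight of filter f at offset k, embedding dim d.
    Returns a list with one row of filter activations per window position.
    """
    width = len(kernels[0])
    out = []
    for pos in range(len(embeddings) - width + 1):
        row = []
        for kernel, bias in zip(kernels, biases):
            total = bias
            for k in range(width):
                for d, w in enumerate(kernel[k]):
                    total += w * embeddings[pos + k][d]
            row.append(max(0.0, total))
        out.append(row)
    return out


def global_max_pool(feature_maps):
    """Keep the strongest activation of each filter across all positions."""
    return [max(column) for column in zip(*feature_maps)]


def random_untrained_params(n_filters=4, width=3, dim=8, seed=1):
    """Random weights for illustration only; a real model would be trained."""
    rng = random.Random(seed)
    kernels = [[[rng.uniform(-0.5, 0.5) for _ in range(dim)]
                for _ in range(width)] for _ in range(n_filters)]
    biases = [0.0] * n_filters
    out_weights = [rng.uniform(-0.5, 0.5) for _ in range(n_filters)]
    return kernels, biases, out_weights, 0.0


def predict_amp_probability(sequence, kernels, biases, out_weights, out_bias):
    """Embed -> convolve -> pool -> sigmoid. Sequence must be >= kernel width."""
    embeddings = placeholder_embed(sequence, dim=len(kernels[0][0]))
    pooled = global_max_pool(conv1d_relu(embeddings, kernels, biases))
    logit = out_bias + sum(w * p for w, p in zip(out_weights, pooled))
    return 1.0 / (1.0 + math.exp(-logit))


params = random_untrained_params()
example_peptide = "GIGKFLHSAKKFGKAFVGEIMNS"  # arbitrary example input
probability = predict_amp_probability(example_peptide, *params)
print(f"untrained score for {example_peptide}: {probability:.3f}")
```

Global max pooling is used here so that peptides of different lengths reduce to a fixed-size vector before the final layer; whether the project's CNN pools this way is not stated in this summary.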