westlake-repl/Evolla

Evolla

A frontier protein-language generative model designed to decode the molecular language of proteins.

Table of contents

News

Overview

Environment installation

Create a virtual environment

conda create -n Evolla python=3.10
conda activate Evolla

Install packages

bash environment.sh

Prepare the Evolla model

The pre-trained Evolla-10B model is available on the Hugging Face Hub. You can download it by running the following commands:

cd ckpt/huggingface
git clone https://huggingface.co/westlake-repl/Evolla-10B

Model checkpoints

Name Size
Evolla-10B 10B

Prepare input data

We provide a sample input file, examples/inputs.tsv, for testing the Evolla model. The input must be a tab-separated file in which each line holds four fields: (protein_id, aa_sequence, foldseek_sequence, question_in_json_string).

Note: protein_id is the identifier of the line, aa_sequence is the amino acid sequence of the protein, foldseek_sequence is the sequence of the protein in FoldSeek format, and question_in_json_string is the question serialized with the json.dumps function.
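
The four-field layout described above can be produced with a few lines of Python. The following is a minimal sketch, not code from this repository: the helper names format_line and parse_line and the example identifier, sequences and question are hypothetical placeholders, and the exact parsing done by inference.py may differ.

```python
import json


def format_line(protein_id, aa_sequence, foldseek_sequence, question):
    """Build one tab-separated input line; the question is encoded with json.dumps."""
    fields = [protein_id, aa_sequence, foldseek_sequence, json.dumps(question)]
    return "\t".join(fields)


def parse_line(line):
    """Split one input line back into its four fields and decode the question."""
    protein_id, aa_sequence, foldseek_sequence, question_json = (
        line.rstrip("\n").split("\t")
    )
    return protein_id, aa_sequence, foldseek_sequence, json.loads(question_json)


# Hypothetical placeholder values, for illustration only.
line = format_line(
    "example_protein",
    "MKTAYIAKQR",
    "dvvlvvcpvp",
    "What is the function of this protein?",
)
print(line)
```

Because json.dumps escapes tab and newline characters inside strings, the encoded question never contains a raw tab, so it cannot break the tab-separated layout.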

Run Evolla

Use inference.py

The following command runs inference on a TSV file.

python inference.py --config_path config/Evolla_10B.yaml --input_path examples/inputs.tsv
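
Before starting a long inference run, it can help to check that the input file matches the layout described under "Prepare input data". The sketch below is an illustrative pre-flight check and not part of this repository: the function names find_problems and check_file are hypothetical, and skipping blank lines is an assumption rather than documented behaviour of inference.py.

```python
import json

# protein_id, aa_sequence, foldseek_sequence, question_in_json_string
EXPECTED_FIELDS = 4


def find_problems(lines):
    """Return (line_number, message) pairs for lines that do not match the layout."""
    problems = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line:
            continue  # assumption: blank lines are skipped
        fields = line.split("\t")
        if len(fields) != EXPECTED_FIELDS:
            problems.append(
                (number, f"expected {EXPECTED_FIELDS} fields, found {len(fields)}")
            )
            continue
        try:
            json.loads(fields[3])  # the question must be valid JSON
        except json.JSONDecodeError as error:
            problems.append((number, f"question is not valid JSON: {error.msg}"))
    return problems


def check_file(path):
    """Read a TSV file and return the list of problems found in it."""
    with open(path, encoding="utf-8") as handle:
        return find_problems(handle)
```

Calling check_file("examples/inputs.tsv") returns an empty list when every non-blank line has four fields and a JSON-decodable question.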

Citation

If you find this repository useful, please cite our paper:

@article{zhou2025decoding,
  title={Decoding the Molecular Language of Proteins with Evolla},
  author={Zhou, Xibin and Han, Chenchen and Zhang, Yingqi and Su, Jin and Zhuang, Kai and Jiang, Shiyu and Yuan, Zichen and Zheng, Wei and Dai, Fengyuan and Zhou, Yuyang and others},
  journal={bioRxiv},
  pages={2025--01},
  year={2025},
  publisher={Cold Spring Harbor Laboratory}
}

Other resources
