---
license: mit
metrics:
- mae
base_model:
- facebook/esm2_t33_650M_UR50D
pipeline_tag: tabular-regression
tags:
- PLM
- GBT
- ESM2
- Regression
---

## BindPred: Gradient Boosted Trees on ESM2 Embeddings

# Model Overview

BindPred is a Gradient Boosted Trees (GBT) regressor trained on embeddings from Meta's ESM2 protein language model. It is designed to predict protein binding affinity.

Pretrained Colab Notebook: https://colab.research.google.com/drive/1ndzICxVBUUBHffmi0KDtUXaKaMtqTz55

# Available Pretrained Models

• ACE2_RBD_BindPred.json: predicts binding affinity between ACE2 (human and animal) and RBD (receptor-binding domain) proteins.
• ESM2_BindPred.json: general-purpose GBT model trained on ESM2 embeddings.

# Model Details

• Base Model: ESM2 (facebook/esm2_t33_650M_UR50D)
• Architecture: Gradient Boosted Trees (CatBoostRegressor)
• Framework: CatBoost
• Task: Regression

# How to Use

Download the model from Hugging Face:

```python
from huggingface_hub import hf_hub_download

# Download the general-purpose model
model_path = hf_hub_download(repo_id="hbp5181/BindPred", filename="ESM2_BindPred.cbm")
```

Load the model in CatBoost:

```python
from catboost import CatBoostRegressor

# Create an empty regressor, then restore the trained trees from the .cbm file
model = CatBoostRegressor()
model.load_model(model_path, format="cbm")
```

# Training Details

• Feature Extraction: ESM2 embeddings (33-layer transformer, 650M parameters)
• Training Algorithm: CatBoost gradient boosting
• Datasets:
  ACE2 RBD: https://github.com/jbloomlab/SARSr-CoV_homolog_survey
  General: https://zenodo.org/records/14271435
• Evaluation Metrics: RMSE, R²

# Applications

• Binding affinity prediction

# Limitations & Considerations

• The model uses ESM2 embeddings as its only features. Its accuracy is therefore limited by how well those embeddings capture the properties that determine binding.
• Performance depends on the training dataset. Predictions for proteins unlike those in the training data may be less reliable.
• The regressor itself is not a deep-learning model. The deep ESM2 network only supplies the features, and GBTs produce the predictions, which keeps them fast and interpretable.

# Citation

👤 Maintainer: hbp5181@psu.edu

📅 Last Updated: February 2025
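# Example: Building ESM2 Features (Sketch)

Training Details lists ESM2 embeddings (33 layers, 650M parameters) as the model's features. The card does not say how a sequence is reduced to a fixed-length vector. It also does not say how the two partners in a binding pair are combined.

The sketch below is a minimal illustration under stated assumptions:

• It mean-pools the final-layer `facebook/esm2_t33_650M_UR50D` embeddings over residues, loading the model with Hugging Face `transformers`.
• It represents a pair by concatenating the two pooled vectors.

Both choices are assumptions. The helper names `masked_mean_pool`, `pair_features`, `embed_sequence` and `_load_esm` are hypothetical. Check the feature layout against the pretrained Colab notebook before passing the output to the released models.

```python
"""Hypothetical ESM2 feature extraction for BindPred-style inputs.

ASSUMPTIONS (not confirmed by the model card): final-layer embeddings are
mean-pooled over residues, and a protein pair is encoded by concatenation.
"""
from functools import lru_cache

import numpy as np


def masked_mean_pool(hidden: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Average per-token vectors, counting only positions where mask == 1.

    hidden: (seq_len, dim) per-token embeddings
    mask:   (seq_len,) 1 for residues to keep, 0 for special tokens
    """
    weights = mask.astype(hidden.dtype)[:, None]
    return (hidden * weights).sum(axis=0) / weights.sum()


def pair_features(emb_a: np.ndarray, emb_b: np.ndarray) -> np.ndarray:
    """Hypothetical pair encoding: concatenate the two pooled embeddings."""
    return np.concatenate([emb_a, emb_b])


@lru_cache(maxsize=1)
def _load_esm(model_name: str):
    """Load tokenizer and encoder once; later calls reuse the cached pair."""
    from transformers import AutoTokenizer, EsmModel

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = EsmModel.from_pretrained(model_name).eval()
    return tokenizer, model


def embed_sequence(seq: str, model_name: str = "facebook/esm2_t33_650M_UR50D") -> np.ndarray:
    """Mean-pooled final-layer ESM2 embedding of one protein sequence."""
    import torch

    tokenizer, model = _load_esm(model_name)
    inputs = tokenizer(seq, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0].numpy()  # (seq_len, dim)
    # The ESM tokenizer adds <cls> at the start and <eos> at the end; skip both.
    mask = np.ones(hidden.shape[0], dtype=np.int64)
    mask[0] = 0
    mask[-1] = 0
    return masked_mean_pool(hidden, mask)
```

For example, the features for an ACE2/RBD pair would be `pair_features(embed_sequence(ace2_seq), embed_sequence(rbd_seq))`. Here `ace2_seq` and `rbd_seq` are placeholder amino-acid strings. The first call downloads the multi-gigabyte ESM2 checkpoint. Later calls reuse the cached model.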
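# Example: Predicting and Inspecting Feature Importance (Sketch)

How to Use stops after loading the model. Here is a minimal sketch of the next step. It assumes the loaded `CatBoostRegressor` expects one row of ESM2 features per protein or protein pair, in the same layout used for training. That layout is not documented on this card.

The sketch defines two helpers:

• `predict_affinity` promotes a single vector to a batch of one before calling CatBoost's `predict`.
• `top_features` ranks embedding dimensions with CatBoost's `get_feature_importance()`. This is one way to act on the interpretability mentioned under Limitations & Considerations.

`predict_affinity`, `top_features` and `run_example` are hypothetical helper names.

```python
"""Hypothetical helpers for scoring ESM2 feature vectors with BindPred."""
import numpy as np


def predict_affinity(model, features) -> np.ndarray:
    """Predict binding affinity for one vector (1-D) or a batch (2-D).

    `model` is any fitted regressor with a `predict` method, such as the
    CatBoostRegressor restored from the .cbm file.
    """
    X = np.asarray(features, dtype=np.float32)
    if X.ndim == 1:
        X = X[None, :]  # CatBoost expects shape (n_samples, n_features)
    return np.asarray(model.predict(X), dtype=float)


def top_features(model, k: int = 10) -> list[tuple[int, float]]:
    """Return the k feature indices with the highest importance scores."""
    importances = np.asarray(model.get_feature_importance(), dtype=float)
    order = np.argsort(importances)[::-1][:k]  # descending importance
    return [(int(i), float(importances[i])) for i in order]


def run_example(k: int = 5):
    """Download ESM2_BindPred.cbm, score a placeholder vector, list top features."""
    from catboost import CatBoostRegressor
    from huggingface_hub import hf_hub_download

    path = hf_hub_download(repo_id="hbp5181/BindPred", filename="ESM2_BindPred.cbm")
    model = CatBoostRegressor()
    model.load_model(path, format="cbm")
    # Placeholder input: an all-zero vector with the model's feature count.
    # It only demonstrates shapes; replace it with real ESM2 features.
    x = np.zeros(len(model.feature_names_), dtype=np.float32)
    return predict_affinity(model, x), top_features(model, k=k)
```

Calling `run_example()` returns a one-element prediction array and the five most important feature indices. The index-based ranking is a design choice: if the features are pooled ESM2 embeddings, high-ranking indices point to embedding dimensions rather than to specific residues.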
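# Example: Evaluation Metrics (Sketch)

Training Details reports RMSE and R², and the card metadata lists MAE. The sketch below computes all three with NumPy using their standard definitions:

• RMSE is the square root of the mean squared error.
• MAE is the mean absolute error.
• R² = 1 − SS_res / SS_tot.

It illustrates the formulas only. It is not the evaluation script used to produce any reported scores.

```python
"""Standard regression metrics (RMSE, MAE, R^2) implemented with NumPy."""
import numpy as np


def rmse(y_true, y_pred) -> float:
    """Root mean squared error: sqrt(mean((y - y_hat)^2))."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def mae(y_true, y_pred) -> float:
    """Mean absolute error: mean(|y - y_hat|)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))


def r2(y_true, y_pred) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)           # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are identical")
    return float(1.0 - ss_res / ss_tot)
```

For `y_true = [1, 2, 3, 4]` and `y_pred = [1, 2, 3, 5]`, these return RMSE = 0.5, MAE = 0.25 and R² = 0.8. scikit-learn's `mean_absolute_error` and `r2_score` give the same MAE and R² values. The square root of its `mean_squared_error` gives the same RMSE.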