---
license: mit
library_name: transformers
pipeline_tag: text-generation
---
This is the base Qwen2.5-Math-1.5B model used by HAPO.
We change `rope_theta` from 10000 to 40000 and extend the context window to 16k tokens.
We also modify the `chat_template`: the system prompt is changed and `<think>` is added.
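
To illustrate why a larger `rope_theta` goes together with a longer context window, here is a minimal sketch of the standard RoPE frequency formula. It is not code from the HAPO repository. The head dimension of 128 is an assumption about the Qwen2.5-Math-1.5B configuration, and the helper names `rope_inv_freq` and `longest_wavelength` are hypothetical.

```python
import math


def rope_inv_freq(head_dim: int, theta: float) -> list[float]:
    """Standard RoPE inverse frequencies: theta^(-2i/d) for i in [0, d/2)."""
    return [theta ** (-(2 * i) / head_dim) for i in range(head_dim // 2)]


def longest_wavelength(head_dim: int, theta: float) -> float:
    """Wavelength, in tokens, of the slowest-rotating dimension pair."""
    return 2 * math.pi / rope_inv_freq(head_dim, theta)[-1]


HEAD_DIM = 128  # assumption: per-head dimension of Qwen2.5-Math-1.5B

for theta in (10_000.0, 40_000.0):
    wavelength = longest_wavelength(HEAD_DIM, theta)
    print(f"rope_theta={theta:>8.0f}  longest wavelength ~ {wavelength:,.0f} tokens")
```

Raising the base lowers every rotation frequency except the first, so positional phases change more slowly across the longer 16k-token window.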
# Citation
If you find our model, data, or evaluation code useful, please cite our paper:
```bib
@misc{liu2025uniformheterogeneoustailoringpolicy,
title={From Uniform to Heterogeneous: Tailoring Policy Optimization to Every Token's Nature},
author={Zheng Liu and Mengjie Liu and Siwei Wen and Mengzhang Cai and Bin Cui and Conghui He and Wentao Zhang},
year={2025},
eprint={2509.16591},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2509.16591},
}
```