---
license: mit
library_name: transformers
pipeline_tag: text-generation
---

The base Qwen2.5-Math-1.5B model used by HAPO.

We change rope_theta from 10000 to 40000 and extend the context window to 16k.

We also modify the chat_template to adjust the system prompt and add `<think>`.
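The config overrides above can be sketched as a simple dictionary update. This is an illustrative sketch only: the base `max_position_embeddings` value (4096) is an assumption, and only the rope_theta change (10000 to 40000) and the 16k context window come from this card.

```python
# Sketch of the config overrides described above. The base
# max_position_embeddings value (4096) is an illustrative assumption;
# the rope_theta change (10000 -> 40000) and the 16k window are from the card.

def apply_overrides(config: dict) -> dict:
    """Return a copy of the base config with the described changes applied."""
    updated = dict(config)
    updated["rope_theta"] = 40000               # raised from 10000
    updated["max_position_embeddings"] = 16384  # 16k context window
    return updated

base = {"rope_theta": 10000, "max_position_embeddings": 4096}
print(apply_overrides(base))
# -> {'rope_theta': 40000, 'max_position_embeddings': 16384}
```

In the released checkpoint these values live in `config.json` (and the template in `tokenizer_config.json`), so `transformers` picks them up automatically when the model is loaded; no manual override is needed.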

# Citation

If you find our model, data, or evaluation code useful, please cite our paper:

```bibtex
@misc{liu2025uniformheterogeneoustailoringpolicy,
      title={From Uniform to Heterogeneous: Tailoring Policy Optimization to Every Token's Nature},
      author={Zheng Liu and Mengjie Liu and Siwei Wen and Mengzhang Cai and Bin Cui and Conghui He and Wentao Zhang},
      year={2025},
      eprint={2509.16591},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2509.16591},
}
```