

panjinhao

ishaqsaviani · ishaqsaviani590

AI & ML interests

NLP, DL, RL, ML


upvoted a collection 7 months ago

Gemma 3 QAT

Quantization-Aware Trained (QAT) Gemma 3 checkpoints. The models preserve quality similar to half precision while using 3x less memory • 15 items • Updated Jul 10 • 210

upvoted an article 8 months ago


Illustrating Reinforcement Learning from Human Feedback (RLHF)


Dec 9, 2022 • 376

upvoted an article 10 months ago


You could have designed state of the art positional encoding

Nov 25, 2024 • 404

upvoted 3 papers 10 months ago

DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search

Paper • 2408.08152 • Published Aug 15, 2024 • 60

Qwen2.5-VL Technical Report

Paper • 2502.13923 • Published Feb 19, 2025 • 211

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Paper • 2501.12948 • Published Jan 22, 2025 • 429