Models: 67,622

Active filters: reinforcement-learning

PrimeIntellect/INTELLECT-3

Text Generation • 107B • Updated 9 days ago • 10.6k • 178

bartowski/PrimeIntellect_INTELLECT-3-GGUF

Text Generation • 107B • Updated 9 days ago • 13k • 22

Adilbai/stock-trading-rl-agent

Reinforcement Learning • Updated Oct 29 • 178 • 68

PRIME-RL/P1-235B-A22B

Text Generation • 235B • Updated Oct 24 • 27 • 15

cyankiwi/INTELLECT-3-AWQ-4bit

Text Generation • 19B • Updated 7 days ago • 1.36k • 3

ValueFX9507/Tifa-DeepsexV2-7b-MGRPO-GGUF-Q8

Reinforcement Learning • 8B • Updated Mar 28 • 4.95k • 187

Timsty/mixture_of_horizons

Robotics • Updated 2 days ago • 2

PrimeIntellect/INTELLECT-3-FP8

Text Generation • 107B • Updated 9 days ago • 2.11k • 18

AXONVERTEX-AI-RESEARCH/Orchestrator-8B-Q8_0-GGUF

Reinforcement Learning • 8B • Updated 8 days ago • 469 • 7

cyankiwi/INTELLECT-3-AWQ-8bit

Text Generation • 32B • Updated 7 days ago • 44 • 2

sb3/demo-hf-CartPole-v1

Reinforcement Learning • Updated Mar 11, 2024 • 19 • 2

saamur/deeprl_course_unit1

Reinforcement Learning • Updated Oct 14, 2024 • 2 • 1

HriDal/agent-2048-game-qwen-7b-2k-ds

Reinforcement Learning • 8B • Updated Apr 1 • 7 • 1

TianheWu/VisualQuality-R1-7B

Reinforcement Learning • 8B • Updated Sep 19 • 5.2k • 8

PhysicsWallahAI/Aryabhata-1.0

Text Generation • 8B • Updated Aug 13 • 6.63k • 103

cycloneboy/SLM-SQL-0.6B

Text Generation • 0.8B • Updated Jul 31 • 9 • 1

rstar2-reproduce/rStar2-Agent-14B

Text Generation • 15B • Updated Sep 1 • 52 • 23

Observer04/ppo-LunarLander-v2

Reinforcement Learning • Updated Oct 26 • 7 • 1

PRIME-RL/P1-30B-A3B

Text Generation • 31B • Updated Oct 24 • 249 • 8

stay-mellow-ai/gemma-3-1b-reasoning

Text Generation • 1.0B • Updated Oct 24 • 35 • 1

Salesforce/xRouter

Text Generation • 8B • Updated Nov 4 • 175 • 8

emiliodavola/french-solitaire-dqn-single-solution

Reinforcement Learning • Updated 25 days ago • 25 • 2

HIT-TMG/Uni-MoE-2.0-Thinking

Reinforcement Learning • 28B • Updated 13 days ago • 69 • 2

0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther

Text Generation • 0.5B • Updated 18 days ago • 2.13k • 1

xboy-352/lunar_landing

Reinforcement Learning • Updated 11 days ago • 68 • 1

nicklashansen/newt

Reinforcement Learning • Updated 11 days ago • 1

TencentBAC/GRiP

Image-Text-to-Text • 8B • Updated 5 days ago • 190 • 3

Freakz3z/Qwen-JSON

Text Generation • 4B • Updated 4 days ago • 206 • 1

pexa8335/q-FrozenLake-v1-4x4-noSlippery

Reinforcement Learning • Updated 4 days ago • 1

OpenCausaLab/ADAM

Reinforcement Learning • Updated about 3 hours ago • 1