---
title: RlveGym Environment Server
emoji: 📡
colorFrom: purple
colorTo: blue
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
  - openenv
---

# RlveGym Environment

This package contains a collection of 400 verifiable environments from RLVE-Gym, introduced in the paper [*RLVE: Scaling Up Reinforcement Learning for Language Models with Adaptive Verifiable Environments*](https://arxiv.org/abs/2511.07317) (the original GitHub repository is [here](https://github.com/Zhiyuan-Zeng/RLVE)).

## Quick Start

The simplest way to use the RlveGym environment is through the `RlveGymEnv` class:

```python
from RLVE_Gym import RlveGymAction, RlveGymEnv

# Create the environment from a Docker image. This happens before `try` so that
# `finally` never references an undefined name if startup fails.
RLVE_Gymenv = RlveGymEnv.from_docker_image("RLVE_Gym-env:latest")
# If you prefer not to build the Docker image locally, you can try: RLVE_Gymenv = RlveGymEnv.from_docker_image("registry.hf.space/zhiyuanzeng-rlve-gym:latest")

try:

    # Reset
    result = RLVE_Gymenv.reset()
    print(f"Problem Prompt: {result.observation.problem_input}")
    # Or:
    print(f"Problem Prompt (from the environment's state): {RLVE_Gymenv.state().problem_input}")

    # Send multiple outputs
    outputs = [
        "Wrong Format",
        r"<answer>0</answer>", # Wrong Answer
        r"<answer>4753</answer>", # Please replace "4753" with the correct answer
    ]

    for output in outputs:
        result = RLVE_Gymenv.step(RlveGymAction(output = output))
        print(f"Sent: '{output}'")
        print(f"Result: `{result}`")
        print(f"`verifier_result`: `{result.observation.verifier_result}`")
        print(f"`reward`: `{result.reward}`")
        print("`accuracy`: `{}`".format(result.observation.verifier_result["accuracy"]))
        print("(so far) sum_accuracy/num_samples = {}/{}".format(RLVE_Gymenv.state().sum_accuracy, RLVE_Gymenv.state().num_samples))
        print("\n")

finally:
    # Always clean up
    RLVE_Gymenv.close()
```

That's it! The `RlveGymEnv.from_docker_image()` method handles:
- Starting the Docker container
- Waiting for the server to be ready
- Connecting to the environment
- Container cleanup when you call `close()`
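
Because cleanup depends on `close()` being called, the `try`/`finally` pattern can also be written with the standard library's `contextlib.closing`. The sketch below uses a hypothetical stand-in class (`FakeEnv`) instead of the real `RlveGymEnv`, so it runs without Docker. It assumes only that the object exposes a `close()` method, as the Quick Start shows.

```python
from contextlib import closing


class FakeEnv:
    """Hypothetical stand-in for RlveGymEnv; it only models reset() and close()."""

    def __init__(self):
        self.closed = False

    def reset(self):
        return "problem prompt"

    def close(self):
        self.closed = True


env = FakeEnv()
# closing() calls env.close() when the block exits, even if the body raises.
with closing(env) as e:
    prompt = e.reset()
```

With the real client, the same `with closing(...)` block could wrap the object returned by `RlveGymEnv.from_docker_image(...)`.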

## Building the Docker Image

Before using the environment, you need to build the Docker image:

```bash
# From project root
docker build -t RLVE_Gym-env:latest -f server/Dockerfile .
```

## Deploying to Hugging Face Spaces

You can easily deploy your OpenEnv environment to Hugging Face Spaces using the `openenv push` command:

```bash
# From the environment directory (where openenv.yaml is located)
openenv push

# Or specify options
openenv push --namespace my-org --private
```

The `openenv push` command will:
1. Validate that the directory is an OpenEnv environment (checks for `openenv.yaml`)
2. Prepare a custom build for a Hugging Face Docker Space (enables the web interface)
3. Upload to Hugging Face (ensuring you're logged in)

### Prerequisites

- Authenticate with Hugging Face: The command will prompt for login if not already authenticated

### Options

- `--directory`, `-d`: Directory containing the OpenEnv environment (defaults to current directory)
- `--repo-id`, `-r`: Repository ID in format 'username/repo-name' (defaults to 'username/env-name' from openenv.yaml)
- `--base-image`, `-b`: Base Docker image to use (overrides Dockerfile FROM)
- `--private`: Deploy the space as private (default: public)

### Examples

```bash
# Push to your personal namespace (defaults to username/env-name from openenv.yaml)
openenv push

# Push to a specific repository
openenv push --repo-id my-org/my-env

# Push with a custom base image
openenv push --base-image ghcr.io/meta-pytorch/openenv-base:latest

# Push as a private space
openenv push --private

# Combine options
openenv push --repo-id my-org/my-env --base-image custom-base:latest --private
```

After deployment, your space will be available at:
`https://huggingface.co/spaces/<repo-id>`

The deployed space includes:
- **Web Interface** at `/web` - Interactive UI for exploring the environment
- **API Documentation** at `/docs` - Full OpenAPI/Swagger interface
- **Health Check** at `/health` - Container health monitoring
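
To check a running server programmatically, one option is to poll the health endpoint before sending requests. The sketch below is illustrative only: `endpoint_urls` and `wait_until_healthy` are hypothetical helper names, the base URL is whatever address the server is reachable at, and the probe is passed in as a callable so the logic can be exercised without network access. It assumes that `/health` answers with HTTP 200 once the container is ready.

```python
import time
from typing import Callable, Dict


def endpoint_urls(base_url: str) -> Dict[str, str]:
    """Build the three documented URLs from a base URL (hypothetical helper)."""
    base = base_url.rstrip("/")
    return {name: f"{base}/{name}" for name in ("web", "docs", "health")}


def wait_until_healthy(
    probe: Callable[[str], int],
    base_url: str,
    attempts: int = 5,
    delay: float = 0.0,
) -> bool:
    """Call probe(health_url) until it returns 200 or the attempts run out."""
    health_url = endpoint_urls(base_url)["health"]
    for _ in range(attempts):
        try:
            if probe(health_url) == 200:
                return True
        except OSError:
            pass  # treat connection errors as "not ready yet"
        time.sleep(delay)
    return False
```

In real use, `probe` could be a small function that returns `urllib.request.urlopen(url).status`; connection failures raised by `urllib` are subclasses of `OSError`, so the loop would keep retrying.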

## Environment Details

### Environment Initialization

The environment takes the following arguments (see [`server/RLVE_Gym_environment.py`](server/RLVE_Gym_environment.py) for details):
- `environment_identifier` (str) - The environment's identifier. See [`server/Gym/environments/__init__.py`](server/Gym/environments/__init__.py) for details.
- `difficulty` (int) - The difficulty of generated problems.
- `answer_markers` (Tuple[str, str]) - The start and end markers the environment uses to extract the final answer from a model output.
- `initial_seed` (int) - The seed used to generate the first problem. Each call to `reset()` increments the seed by 1.

Currently, you set these arguments through environment variables:

```python
RLVE_Gymenv = RlveGymEnv.from_docker_image(
    "RLVE_Gym-env:latest",
    env_vars = {
        "RLVEGYM_ENVIRONMENT_IDENTIFIER": "Sorting",
        "RLVEGYM_DIFFICULTY": "2",
        "RLVEGYM_ANSWER_MARKER_START": r"\boxed{",
        "RLVEGYM_ANSWER_MARKER_END": r"}",
        "RLVEGYM_INITIAL_SEED": "10",
    },
)
```
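
Environment variables are always strings, which is why numeric values such as the difficulty and the seed are quoted. As a rough illustration of how such variables could be turned back into typed arguments, here is a minimal sketch. The function name `read_config` and every default value are assumptions made for this example, not the server's actual code.

```python
from typing import Any, Dict, Mapping


def read_config(env: Mapping[str, str]) -> Dict[str, Any]:
    """Parse RLVEGYM_* variables into typed values (defaults are made up)."""
    return {
        "environment_identifier": env.get("RLVEGYM_ENVIRONMENT_IDENTIFIER", "Sorting"),
        "difficulty": int(env.get("RLVEGYM_DIFFICULTY", "0")),
        "answer_markers": (
            env.get("RLVEGYM_ANSWER_MARKER_START", "<answer>"),
            env.get("RLVEGYM_ANSWER_MARKER_END", "</answer>"),
        ),
        "initial_seed": int(env.get("RLVEGYM_INITIAL_SEED", "0")),
    }


# Any mapping works here; a server process would typically pass os.environ.
config = read_config({"RLVEGYM_DIFFICULTY": "2", "RLVEGYM_INITIAL_SEED": "10"})
```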

### Action
**RlveGymAction**: Contains a single field
- `output` (str) - The model's output to be verified.
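
To make the role of the answer markers concrete, the sketch below shows one plausible way to pull an answer out of a model output: take the text between the last start marker and the next end marker. This is an assumption for illustration only. `extract_answer` is a hypothetical name, and the actual extraction rule is defined in the environment code.

```python
from typing import Optional, Tuple


def extract_answer(output: str, markers: Tuple[str, str]) -> Optional[str]:
    """Return the text between the last start marker and the next end marker."""
    start, end = markers
    begin = output.rfind(start)
    if begin == -1:
        return None  # no start marker: treated as a format error in this sketch
    begin += len(start)
    finish = output.find(end, begin)
    if finish == -1:
        return None  # the start marker was never closed
    return output[begin:finish].strip()
```

Under this rule, an output such as `"Wrong Format"` yields `None`, while `"<answer>4753</answer>"` yields `"4753"` with the markers `("<answer>", "</answer>")`.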

### State
**RlveGymState**:
- `seed` (int) - The seed to use when running `reset()`.
- `problem_input` (Optional[str]) - The input of the problem; `None` means that problem generation has not run yet or has failed.
- `num_samples` (int) and `sum_accuracy` (int) - Running statistics of `step(action)` for the current problem: the number of outputs sent to the verifier and the number of those that were correct.
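
The two counters can be combined into a running accuracy for the current problem. A minimal sketch, using a hypothetical dataclass `StateSketch` that mirrors the documented fields and stands in for the real `RlveGymState`; the `record` and `mean_accuracy` methods are additions made for this example:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StateSketch:
    """Hypothetical mirror of the documented RlveGymState fields."""

    seed: int
    problem_input: Optional[str] = None
    num_samples: int = 0
    sum_accuracy: int = 0

    def record(self, accuracy: int) -> None:
        """Account for one verified output (accuracy is 0 or 1)."""
        self.num_samples += 1
        self.sum_accuracy += accuracy

    def mean_accuracy(self) -> Optional[float]:
        """Fraction of correct outputs so far, or None before any step."""
        if self.num_samples == 0:
            return None
        return self.sum_accuracy / self.num_samples


state = StateSketch(seed=10)
for accuracy in (0, 0, 1):  # e.g. wrong format, wrong answer, correct answer
    state.record(accuracy)
```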

### Observation
**RlveGymObservation**:
- `problem_input` (Optional[str]) - The input of the problem; `None` means that problem generation has not run yet or has failed.
- `verifier_result` (Optional[dict]) - Contains `reward` as the raw reward, `accuracy` as the 0/1 correctness, and `format_score` as the 0/1 format correctness; `None` means that verification failed.
- `success` (bool) - Whether the operation succeeded.
- `message` (str) - The explanation of `success`.
- `reward` (Optional[float]) - The value is `verifier_result["reward"]` when `verifier_result` is not `None` (otherwise, `reward` is also `None`).
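
Since both `verifier_result` and `reward` can be `None`, client code should check before indexing. The sketch below uses a hypothetical dataclass `ObservationSketch` that carries the documented fields, plus a helper `summarize` (also hypothetical) that reads the result defensively.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ObservationSketch:
    """Hypothetical mirror of the documented RlveGymObservation fields."""

    problem_input: Optional[str]
    verifier_result: Optional[dict]
    success: bool
    message: str

    @property
    def reward(self) -> Optional[float]:
        # Mirrors the documented rule: reward follows verifier_result["reward"].
        if self.verifier_result is None:
            return None
        return self.verifier_result["reward"]


def summarize(obs: ObservationSketch) -> str:
    """Describe an observation without assuming that verification succeeded."""
    if obs.verifier_result is None:
        return f"verification failed: {obs.message}"
    r = obs.verifier_result
    return f"reward={r['reward']} accuracy={r['accuracy']} format={r['format_score']}"
```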

## Advanced Usage

### Connecting to an Existing Server

If you already have an RlveGymEnv server running, you can connect directly:

```python
from RLVE_Gym import RlveGymAction, RlveGymEnv

# Connect to existing server
RLVE_Gymenv = RlveGymEnv(base_url="<ENV_HTTP_URL_HERE>")

# Use as normal
result = RLVE_Gymenv.reset()
result = RLVE_Gymenv.step(RlveGymAction(output="Hello!"))
```

Note: When connecting to an existing server, `RLVE_Gymenv.close()` will NOT stop the server.

## Development & Testing

### Direct Environment Testing

Test the environment logic directly without starting the HTTP server:

```bash
# From the project root
python3 server/RLVE_Gym_environment.py
```

This verifies that:
- Environment resets correctly
- Step executes actions properly
- State tracking works
- Rewards are calculated correctly

### Running Locally

Run the server locally for development:

```bash
uvicorn server.app:app --reload
```