Space: ZhiyuanZeng/RLVE_Gym
ZhiyuanZeng committed 26 days ago

Commit c379861 (1 parent: e9144e0)

update README


Files changed (1)

README.md +22 -2


@@ -36,13 +36,18 @@ try:
     outputs = [
         "Wrong Format",
         r"<answer>0</answer>", # Wrong Answer
-        r"<answer>" + str(RLVE_Gymenv.problem.parameter["reference_answer"]) + r"</answer>", # Correct Answer
+        r"<answer>4753</answer>", # Please replace "4753" with the correct Answer
     ]
     for output in outputs:
         result = RLVE_Gymenv.step(RlveGymAction(output = output))
         print(f"Sent: '{output}'")
         print(f"Result: `{result}`")
+        print(f"`verifier_result`: `{result.observation.verifier_result}`")
+        print(f"`reward`: `{result.reward}`")
+        print("`accuracy`: `{}`".format(result.observation.verifier_result["accuracy"]))
+        print("(so far) sum_accuracy/num_samples = {}/{}".format(RLVE_Gymenv.state().sum_accuracy, RLVE_Gymenv.state().num_samples))
+        print("\n")
 finally:
     # Always clean up
@@ -127,7 +132,22 @@ Please check [here](server/RLVE_Gym_environment.py) for detailed usage:
 - `environment_identifier` (str) - The environment's identifier. Check [here](server/Gym/environments/__init__.py) for detailed usage.
 - `difficulty` (int) - The difficulty of generated problems.
 - `answer_markers` (Tuple[str] of length 2) - How the environment extracts the final answer from a model output.
-- `seed` (int) - The initial seed to use when generating the first problem. Whenever `reset()` is called, the seed will be incremented by 1.
+- `initial_seed` (int) - The initial seed to use when generating the first problem. Whenever `reset()` is called, the seed will be incremented by 1.
+Right now, you can set these arguments by passing them through environment variables:
+```python
+RLVE_Gymenv = RlveGymEnv.from_docker_image(
+    "RLVE_Gym-env:latest",
+    env_vars = {
+        "RLVEGYM_ENVIRONMENT_IDENTIFIER": "Sorting",
+        "RLVEGYM_DIFFICULTY": "2",
+        "RLVEGYM_ANSWER_MARKER_START": r"\boxed{",
+        "RLVEGYM_ANSWER_MARKER_END": r"}",
+        "RLVEGYM_INITIAL_SEED": "10",
+    },
+)
+```
 ### Action
 **RlveGymAction**: Contains a single field
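
The second hunk documents four arguments and the `RLVEGYM_*` variables that carry them, but the server code that consumes them is not part of this diff. The sketch below is a minimal illustration of how such values could be read and applied. Every function name in it (`read_config`, `extract_answer`, `seed_for_problem`) is hypothetical and not taken from the repository. It also assumes that numeric values arrive as strings and are converted with `int`, that the answer is the text between the last start marker and the next end marker, and that the first problem uses `initial_seed` unchanged.

```python
from typing import Mapping, Optional, Tuple


def read_config(env: Mapping[str, str]) -> dict:
    """Hypothetical helper: collect the four README arguments from RLVEGYM_* variables."""
    return {
        "environment_identifier": env["RLVEGYM_ENVIRONMENT_IDENTIFIER"],
        # Environment variables are strings, so numeric arguments are converted (assumption).
        "difficulty": int(env["RLVEGYM_DIFFICULTY"]),
        "answer_markers": (
            env["RLVEGYM_ANSWER_MARKER_START"],
            env["RLVEGYM_ANSWER_MARKER_END"],
        ),
        "initial_seed": int(env["RLVEGYM_INITIAL_SEED"]),
    }


def extract_answer(output: str, markers: Tuple[str, str]) -> Optional[str]:
    """Return the text between the last start marker and the next end marker.

    Returns None when either marker is missing (the "Wrong Format" case).
    The "last start marker" rule is an assumption, not the repository's documented behaviour.
    """
    start, end = markers
    begin = output.rfind(start)
    if begin == -1:
        return None
    begin += len(start)
    finish = output.find(end, begin)
    if finish == -1:
        return None
    return output[begin:finish]


def seed_for_problem(initial_seed: int, problem_index: int) -> int:
    """Seed for the problem at a 0-based index, assuming the seed grows by 1 per reset()."""
    return initial_seed + problem_index


config = read_config({
    "RLVEGYM_ENVIRONMENT_IDENTIFIER": "Sorting",
    "RLVEGYM_DIFFICULTY": "2",
    "RLVEGYM_ANSWER_MARKER_START": r"\boxed{",
    "RLVEGYM_ANSWER_MARKER_END": r"}",
    "RLVEGYM_INITIAL_SEED": "10",
})
print(extract_answer(r"The result is \boxed{4753}", config["answer_markers"]))  # 4753
print(extract_answer("Wrong Format", config["answer_markers"]))  # None
```

With the default-looking `<answer>` and `</answer>` markers used in the first hunk, the same `extract_answer` call would return `"0"` for `r"<answer>0</answer>"`. For the actual behaviour, see `server/RLVE_Gym_environment.py` as the README directs.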