# HRM-Grammar-Light ONNX (int8)
Lightweight multilingual grammar correction model, exported to ONNX and quantized to int8 for efficient inference.
## What does this model do?
It corrects text in multiple languages. Given a prompt like:
```
corregir español: el casa es grande
```

it generates the corrected version:

```
La casa es grande.
```
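Prompts follow a simple template. A minimal sketch of assembling one (the `build_prompt` helper is hypothetical, not part of the model's API):

```python
# Hypothetical helper illustrating the prompt template "corregir <language>: <text>".
def build_prompt(language: str, text: str) -> str:
    return f"corregir {language}: {text}"

build_prompt("español", "el casa es grande")
# -> "corregir español: el casa es grande"
```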
## Included files
- `hrm_grammar_light.onnx`: standard-precision (fp32) ONNX export (ACT 4 or ACT 5)
- `hrm_grammar_light_int8.onnx`: int8-quantized ONNX (smaller, faster)
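If you want to confirm the exact input/output signature of either export, ONNX Runtime can list it for you (a quick sanity check, not required for normal use):

```python
import onnxruntime as ort

session = ort.InferenceSession("hrm_grammar_light_int8.onnx")
for inp in session.get_inputs():
    print("input:", inp.name, inp.shape, inp.type)
for out in session.get_outputs():
    print("output:", out.name, out.shape, out.type)
```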
## How to use (Python + ONNX Runtime)
```python
import onnxruntime as ort
import numpy as np

# Load the quantized model
session = ort.InferenceSession("hrm_grammar_light_int8.onnx")

# Prepare your input sequence (tokenized, see below)
input_ids = np.array([[...]], dtype=np.int64)  # shape (1, seq_len)
attention_mask = np.ones_like(input_ids)       # use the tokenizer's mask if you pad

# Run inference. "labels" and "language_ids" are not needed for inference;
# ONNX Runtime does not accept None values in the feed, so simply omit them.
outputs = session.run(["logits"], {
    "input_ids": input_ids,
    "attention_mask": attention_mask,
})
logits = outputs[0]  # (1, seq_len, vocab_size)
```
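The logits line up position-by-position with the input tokens, so, assuming this single-pass export (no autoregressive loop), a greedy argmax decode is enough. A minimal sketch, using the `tokenizer` defined in the next section:

```python
# Greedy decode: pick the highest-scoring token at each position,
# then turn the ids back into text.
pred_ids = logits.argmax(axis=-1)[0]  # (seq_len,)
corrected = tokenizer.decode(pred_ids, skip_special_tokens=True)
print(corrected)  # e.g. "La casa es grande."
```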
## Tokenization
This model uses the t5-small vocabulary (Hugging Face Transformers). To tokenize:
```python
from transformers import T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
prompt = "corregir español: el casa es grande"
encoded = tokenizer(prompt, return_tensors="np", padding="max_length", max_length=256, truncation=True)
input_ids = encoded["input_ids"]
# Prefer this mask over np.ones_like: it zeroes out the padding positions.
attention_mask = encoded["attention_mask"]
```
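Batching should work the same way, assuming the ONNX export has a dynamic batch dimension (not verified here); the prompts below are illustrative:

```python
# Tokenize several prompts at once; pad to the longest prompt in the batch.
prompts = [
    "corregir español: el casa es grande",
    "corregir ingles: she go to school",
]
batch = tokenizer(prompts, return_tensors="np", padding=True, truncation=True, max_length=256)
print(batch["input_ids"].shape)  # (2, padded_len)
```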
## Notes
- The model expects prompts like `corregir <language>: <text>`, e.g. `corregir ingles: ...`
- Supports Spanish, English, French, German, Russian, and more (see the original README).
- Output is logits; to decode, apply argmax and then `tokenizer.decode` (see the end-to-end sketch below).
- For maximum speed, use the int8 model.
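Putting the pieces together, a minimal end-to-end run might look like this (same assumptions as above: optional inputs omitted, single-pass greedy decoding):

```python
import numpy as np
import onnxruntime as ort
from transformers import T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
session = ort.InferenceSession("hrm_grammar_light_int8.onnx")

encoded = tokenizer("corregir español: el casa es grande",
                    return_tensors="np", padding="max_length",
                    max_length=256, truncation=True)

logits = session.run(["logits"], {
    "input_ids": encoded["input_ids"].astype(np.int64),
    "attention_mask": encoded["attention_mask"].astype(np.int64),
})[0]

pred_ids = logits.argmax(axis=-1)[0]
print(tokenizer.decode(pred_ids, skip_special_tokens=True))  # expect: "La casa es grande."
```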
## License