# HRM-Grammar-Light ONNX (int8)
Lightweight multilingual grammar correction model, exported to ONNX and quantized to int8 for efficient inference.
## What does this model do?
It corrects text in multiple languages. Given a prompt like:
```
corregir español: el casa es grande
```

it generates the corrected version:

```
La casa es grande.
```
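Prompts follow a simple template. A minimal sketch of assembling one (the `build_prompt` helper is hypothetical, not part of the model's API):

```python
# Hypothetical helper illustrating the prompt template "corregir <language>: <text>".
def build_prompt(language: str, text: str) -> str:
    return f"corregir {language}: {text}"

build_prompt("español", "el casa es grande")
# -> "corregir español: el casa es grande"
```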
## Included files
- `hrm_grammar_light.onnx`: standard-precision (fp32) ONNX export (ACT 4 or ACT 5)
- `hrm_grammar_light_int8.onnx`: int8-quantized ONNX (smaller, faster)
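If you want to confirm the exact input/output signature of either export, ONNX Runtime can list it for you (a quick sanity check, not required for normal use):

```python
import onnxruntime as ort

session = ort.InferenceSession("hrm_grammar_light_int8.onnx")
for inp in session.get_inputs():
    print("input:", inp.name, inp.shape, inp.type)
for out in session.get_outputs():
    print("output:", out.name, out.shape, out.type)
```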
## How to use (Python + ONNX Runtime)
```python
import onnxruntime as ort
import numpy as np

# Load the quantized model
session = ort.InferenceSession("hrm_grammar_light_int8.onnx")

# Prepare your input sequence (tokenized, see below)
input_ids = np.array([[...]], dtype=np.int64)  # shape (1, seq_len)
attention_mask = np.ones_like(input_ids)       # use the tokenizer's mask if you pad

# Run inference. "labels" and "language_ids" are not needed for inference;
# ONNX Runtime does not accept None values in the feed, so simply omit them.
outputs = session.run(["logits"], {
    "input_ids": input_ids,
    "attention_mask": attention_mask,
})
logits = outputs[0]  # (1, seq_len, vocab_size)
```
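The logits line up position-by-position with the input tokens, so, assuming this single-pass export (no autoregressive loop), a greedy argmax decode is enough. A minimal sketch, using the `tokenizer` defined in the next section:

```python
# Greedy decode: pick the highest-scoring token at each position,
# then turn the ids back into text.
pred_ids = logits.argmax(axis=-1)[0]  # (seq_len,)
corrected = tokenizer.decode(pred_ids, skip_special_tokens=True)
print(corrected)  # e.g. "La casa es grande."
```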
## Tokenization
This model uses the t5-small vocabulary (Hugging Face Transformers). To tokenize:
```python
from transformers import T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
prompt = "corregir español: el casa es grande"
encoded = tokenizer(prompt, return_tensors="np", padding="max_length", max_length=256, truncation=True)
input_ids = encoded["input_ids"]
# Prefer this mask over np.ones_like: it zeroes out the padding positions.
attention_mask = encoded["attention_mask"]
```
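Batching should work the same way, assuming the ONNX export has a dynamic batch dimension (not verified here); the prompts below are illustrative:

```python
# Tokenize several prompts at once; pad to the longest prompt in the batch.
prompts = [
    "corregir español: el casa es grande",
    "corregir ingles: she go to school",
]
batch = tokenizer(prompts, return_tensors="np", padding=True, truncation=True, max_length=256)
print(batch["input_ids"].shape)  # (2, padded_len)
```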
## Notes
- The model expects prompts like `corregir <language>: <text>`, e.g. `corregir ingles: ...`
- Supports Spanish, English, French, German, Russian, and more (see the original README).
- Output is logits; to decode, apply argmax and then `tokenizer.decode` (see the end-to-end sketch below).
- For maximum speed, use the int8 model.
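Putting the pieces together, a minimal end-to-end run might look like this (same assumptions as above: optional inputs omitted, single-pass greedy decoding):

```python
import numpy as np
import onnxruntime as ort
from transformers import T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
session = ort.InferenceSession("hrm_grammar_light_int8.onnx")

encoded = tokenizer("corregir español: el casa es grande",
                    return_tensors="np", padding="max_length",
                    max_length=256, truncation=True)

logits = session.run(["logits"], {
    "input_ids": encoded["input_ids"].astype(np.int64),
    "attention_mask": encoded["attention_mask"].astype(np.int64),
})[0]

pred_ids = logits.argmax(axis=-1)[0]
print(tokenizer.decode(pred_ids, skip_special_tokens=True))  # expect: "La casa es grande."
```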
## License