---
library_name: transformers
license: apache-2.0
base_model: HuggingFaceTB/SmolLM2-135M-Instruct
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: trlm-stage-1-sft-final-2
  results: []
---

![image/png](https://github.com/user-attachments/assets/5f453496-8180-4cf4-94da-26ebbe1159d4)

# 🧠 trlm-stage-1-sft-final-2

`trlm-stage-1-sft-final-2` is the **Stage 1** post-training model for the **Tiny Reasoning Language Model (trlm)** project. This stage focuses on **everyday conversations** and **general instruction following**, fine-tuned on a curated dataset of **58,000 entries**.

---

## 📖 Model Description

- **Base Model**: [HuggingFaceTB/SmolLM2-135M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM2-135M-Instruct)
- **Type**: Causal Language Model (decoder-only transformer)
- **Stage**: Post-training **Stage 1 (SFT)**
- **Objective**: Build a solid foundation in **instruction-following** and **dialogue coherence** before advancing to reasoning-specific training.

This stage teaches the model to **follow instructions, rewrite, summarize, and hold conversations** without reasoning tokens.

---

## 🎯 Intended Uses & Limitations

### Intended Uses
- Everyday conversation assistants
- Instruction-following tasks (summarization, rewriting, simple dialogue)
- Precursor foundation for reasoning post-training (Stage 2+)

### Limitations
- Not optimized for reasoning (handled in later stages)
- May struggle with multi-step logical or mathematical problems
- Trained only on English datasets

---

## 📊 Training Data

This model was trained on the dataset:
👉 [**Shekswess/trlm-sft-stage-1-final**](https://huggingface.co/datasets/Shekswess/trlm-sft-stage-1-final)

**Dataset summary**:
- **Entries**: 58,000
- **Sources**: 7 subsets of HuggingFaceTB/smoltalk2
- **Focus**: Non-reasoning conversations and instruction-following

| Source Dataset | Entries | Percentage |
|----------------|---------|------------|
| smoltalk_smollm3_smol_magpie_ultra_no_think | 33,500 | 57.8% |
| smoltalk_smollm3_smol_summarize_no_think | 7,500 | 12.9% |
| smoltalk_smollm3_smol_rewrite_no_think | 7,500 | 12.9% |
| smoltalk_smollm3_systemchats_30k_no_think | 2,500 | 4.3% |
| smoltalk_smollm3_explore_instruct_rewriting_no_think | 2,500 | 4.3% |
| tulu_3_sft_personas_instruction_following_no_think | 2,500 | 4.3% |
| smoltalk_smollm3_everyday_conversations_no_think | 2,000 | 3.4% |

---

## ⚙️ Training Procedure

### Training Hyperparameters
- **Learning rate**: 3e-4
- **Train batch size**: 32
- **Eval batch size**: 8
- **Gradient accumulation steps**: 4
- **Total effective batch size**: 128
- **Optimizer**: AdamW (betas=(0.9, 0.99), eps=1e-08)
- **LR Scheduler**: Cosine with warmup ratio 0.1
- **Epochs**: 2
- **Seed**: 42

A minimal TRL configuration sketch reflecting these settings is included near the end of this card.

### Framework Versions
- **Transformers**: 4.56.2
- **PyTorch**: 2.7.1+rocm7.0.0.git698b58a9
- **Datasets**: 4.0.0
- **Tokenizers**: 0.22.1

---

## 🚀 Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "Shekswess/trlm-stage-1-sft-final-2"

# Load tokenizer & model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Example inference
inputs = tokenizer("Write a short daily affirmation:", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## 📌 Next Steps
- **Stage 2**: Supervised fine-tuning with reasoning-focused data
- **Stage 3**: DPO / preference optimization for reasoning stability

---
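## 🛠️ Reference Training Sketch

The exact training script is not included in this card. The snippet below is a minimal, hypothetical sketch of how the Stage 1 SFT run could be reproduced with TRL's `SFTTrainer`, plugging in the hyperparameters listed under *Training Procedure*. The `output_dir` value and the assumption that the dataset's `train` split is directly consumable by `SFTTrainer` (e.g. a `messages` column in conversational format) are illustrative, not taken from the card.

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Assumption: the "train" split stores conversations in a layout
# that SFTTrainer can consume directly (e.g. a "messages" column).
dataset = load_dataset("Shekswess/trlm-sft-stage-1-final", split="train")

config = SFTConfig(
    output_dir="trlm-stage-1-sft",      # hypothetical output path
    learning_rate=3e-4,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,      # 32 x 4 = 128 effective batch size (single device)
    num_train_epochs=2,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    adam_beta1=0.9,
    adam_beta2=0.99,
    adam_epsilon=1e-8,
    seed=42,
)

trainer = SFTTrainer(
    model="HuggingFaceTB/SmolLM2-135M-Instruct",  # base model; tokenizer is loaded from it
    args=config,
    train_dataset=dataset,
)
trainer.train()
```

---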
Part of the Tiny Reasoning Language Model (trlm) post-training pipeline.