Claude 3.7 Sonnet Reasoning - Llama 3.2 Fine-tuned Model
This model is a fine-tuned version of meta-llama/Llama-3.2-3B-Instruct on the rahmanazhar/claude-3.7-sonnet-reasoning dataset. It's designed to mimic Claude 3.7's reasoning capabilities, particularly the "thinking out loud" process that Claude uses to solve complex problems.
Model Details
- Base Model: Llama-3.2-3B-Instruct
- Training Type: LoRA fine-tuning (Parameter-Efficient Fine-Tuning)
- Dataset: claude-3.7-sonnet-reasoning
- Training Samples: 189 high-quality reasoning examples
- Training Method: Supervised Fine-Tuning (SFT)
- Context Length: 4096 tokens
- LoRA Parameters:
  - r=16
  - lora_alpha=32
  - lora_dropout=0.05
  - target_modules=["q_proj", "v_proj", "k_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
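For readers who want to reproduce the adapter setup, the values above correspond to keyword arguments of `peft.LoraConfig`. The sketch below keeps them in a plain dictionary so it runs without extra dependencies. The `task_type` entry and the variable names are assumptions, not taken from the training script.

```python
# LoRA hyperparameters as listed above, kept in a plain dict.
# Assumption: the training script passes these to peft.LoraConfig;
# task_type="CAUSAL_LM" is not stated in the model card.
lora_kwargs = {
    "r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "target_modules": [
        "q_proj", "v_proj", "k_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    "task_type": "CAUSAL_LM",
}

# With standard LoRA scaling, adapter updates are multiplied by lora_alpha / r.
scaling = lora_kwargs["lora_alpha"] / lora_kwargs["r"]
print(f"LoRA scaling factor: {scaling}")

# With the peft package installed, the same values could be used as:
#   from peft import LoraConfig
#   config = LoraConfig(**lora_kwargs)
```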
About the Dataset
The dataset contains 189 examples of Claude 3.7 Sonnet's reasoning process, as delimited by its reasoning tags. These examples showcase Claude's step-by-step reasoning approach to various questions and problems across multiple domains, including:
- Programming
- Mathematics
- Philosophy
- Logic
- Critical thinking
- Problem-solving
Model Capabilities
This model has been fine-tuned to emulate Claude 3.7's reasoning style, specifically:
- Step-by-step thinking: Breaking down complex problems into manageable pieces
- Self-questioning: Posing questions to guide the reasoning process
- Consideration of alternatives: Exploring multiple approaches to problems
- Structured analysis: Methodically working through problems with clear organization
Example Usage
from transformers import AutoModelForCausalLM, AutoTokenizer
# Load model and tokenizer
model_name = "rahmanazhar/claude-3.7-sonnet-reasoning-finetuned" # Llama 3.2 based model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
# Example prompt
prompt = "[INST] Is the fear of death rational, or is it primarily driven by the unknown? [/INST]"
# Generate response
inputs = tokenizer(prompt, return_tensors="pt")
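outputs = model.generate(
    **inputs,                # passes input_ids together with attention_mask
    max_length=1024,         # counts prompt tokens plus generated tokens
    do_sample=True,          # needed for temperature and top_p to take effect
    temperature=0.7,
    top_p=0.9,
    repetition_penalty=1.1,
)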
# Decode the response
response = tokenizer.decode(outputs[0], skip_special_tokens=False)
print(response)
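Two details of the example are easy to miss: the prompt is wrapped in `[INST] ... [/INST]` markers, and `max_length` counts prompt tokens as well as generated ones. A minimal sketch of both follows. `build_prompt` and `new_token_budget` are hypothetical helper names, and the token count is illustrative.

```python
def build_prompt(question: str) -> str:
    """Wrap a question in the [INST] markers used by the example prompt."""
    return f"[INST] {question.strip()} [/INST]"


def new_token_budget(prompt_tokens: int, max_length: int = 1024,
                     context_length: int = 4096) -> int:
    """Tokens left for generation when max_length includes the prompt."""
    limit = min(max_length, context_length)
    return max(limit - prompt_tokens, 0)


prompt = build_prompt("Is the fear of death rational?")
print(prompt)
# A 24-token prompt (illustrative count) leaves 1000 tokens for the answer.
print(new_token_budget(prompt_tokens=24))
```

Passing `max_new_tokens` to `generate` instead of `max_length` sets the generation budget directly, independent of prompt length.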
Limitations
- The training dataset is limited to 189 examples, so the model's reasoning patterns are less varied than those of the original Claude model.
- The model might generate reasoning that appears plausible but contains factual errors or logical fallacies.
- As with all language models, it may produce biased or harmful content in certain contexts.
- Inference speed and memory use depend on the available hardware, since the base model has 3B parameters (see Hardware Requirements).
Training Process
The model was trained using LoRA (Low-Rank Adaptation) fine-tuning, which allows for efficient adaptation of the base model without modifying all parameters. This approach preserves the general capabilities of Llama 3.2 while adapting its reasoning style to match Claude 3.7.
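To give a sense of scale for "without modifying all parameters", the sketch below counts the LoRA weights that rank 16 would add to the seven target projections and compares them with the frozen weights they adapt. The layer dimensions are assumptions based on the published Llama-3.2-3B configuration and should be checked against the model's `config.json`.

```python
# Rough count of LoRA trainable parameters versus the weights they adapt.
# Assumption: layer sizes follow the published Llama-3.2-3B config
# (hidden 3072, MLP 8192, 28 layers, 8 KV heads of size 128).
HIDDEN, MLP, KV_DIM, LAYERS, RANK = 3072, 8192, 1024, 28, 16

# (input features, output features) for each adapted projection
shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP),
    "up_proj": (HIDDEN, MLP),
    "down_proj": (MLP, HIDDEN),
}

# LoRA adds two small matrices per adapted weight: A (rank x in) and B (out x rank).
lora_params = LAYERS * sum(RANK * (i + o) for i, o in shapes.values())
frozen_params = LAYERS * sum(i * o for i, o in shapes.values())

print(f"LoRA trainable parameters: {lora_params:,}")
print(f"Adapted frozen weights:    {frozen_params:,}")
print(f"Ratio: {lora_params / frozen_params:.2%}")
```

Under these assumptions the adapters amount to less than 1% of the weights they modify, which is why the base model's general behavior is largely preserved.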
Running the Training
This repository includes scripts for fine-tuning the model on your own hardware:
Standard Training (Any Platform)
# Setup environment and train the model
./run.sh finetune
# Test the model with a custom prompt
./run.sh test "Your prompt here"
Mac with Apple Silicon (M1/M2/M3)
This repository includes optimized support for training on Mac with Metal acceleration:
# Run with Metal GPU acceleration on Apple Silicon
./run_mac.sh finetune
# Test the model with Metal acceleration
./run_mac.sh test "Your prompt here"
The run_mac.sh script automatically:
- Configures optimal Metal Performance Shaders (MPS) settings
- Sets environment variables to improve compatibility
- Verifies PyTorch is properly configured for Metal
- Adapts training parameters for better performance on Apple Silicon
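The contents of `run_mac.sh` are not shown here, so the following is only an illustration of the kind of checks the list describes, written in Python. `pick_device` is a hypothetical helper, and setting `PYTORCH_ENABLE_MPS_FALLBACK` is an assumption about which environment variable a script like this might use.

```python
import os

# Assumption: run_mac.sh may set different variables. This one is a real
# PyTorch setting that lets operations unsupported on MPS fall back to CPU.
os.environ.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")


def pick_device() -> str:
    """Return "cuda", "mps", or "cpu" depending on what PyTorch can use."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch is not installed, so no accelerator is usable
    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


device = pick_device()
print(f"Selected device: {device}")
```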
Hardware Requirements
- GPU (CUDA): NVIDIA GPU with at least 8GB VRAM recommended
- Apple Silicon: M1/M2/M3 Mac with at least 16GB RAM recommended
- CPU-only: Possible but very slow, 32GB+ RAM recommended
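As a rough guide to where these recommendations come from, the sketch below estimates the memory taken by the base model's weights alone at different precisions. The parameter count of about 3.2 billion is an assumption, and activations, the KV cache, optimizer state, and the LoRA adapters are not included.

```python
# Back-of-the-envelope weight memory for the base model.
# Assumption: roughly 3.2 billion parameters; weights only.
N_PARAMS = 3.2e9
BYTES_PER_PARAM = {"float32": 4, "float16/bfloat16": 2, "int8": 1}

weights_gb = {
    dtype: N_PARAMS * size / 1e9 for dtype, size in BYTES_PER_PARAM.items()
}

for dtype, gb in weights_gb.items():
    print(f"{dtype:>17}: ~{gb:.1f} GB for weights alone")
```

Under these assumptions, half-precision weights take about 6.4 GB, which is consistent with the 8GB VRAM recommendation for inference; training needs additional headroom.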
Software Requirements
- Python 3.8+
- PyTorch 2.0.0+ (with CUDA or MPS support)
- All dependencies listed in requirements.txt
Acknowledgements
- Thanks to Meta for the base model
- Thanks to rahmanazhar for creating and sharing the Claude 3.7 reasoning dataset
- Thanks to Anthropic for creating Claude 3.7 Sonnet
Model tree for rahmanazhar/meta-claude-3.7-finetuned
- Base model: meta-llama/Llama-3.2-3B-Instruct
- Dataset used to train: rahmanazhar/claude-3.7-sonnet-reasoning
Evaluation results
- ROUGE-L (self-reported): 0.850
- Semantic Similarity (self-reported): 0.920
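The card does not say how these scores were computed. For orientation only, ROUGE-L is commonly defined from the longest common subsequence (LCS) between a reference and a candidate text. The sketch below assumes lowercase whitespace tokenization and the balanced F1 variant, which may differ from the evaluation actually used.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference: str, candidate: str) -> float:
    """ROUGE-L F1 using lowercase whitespace tokens (a simplification)."""
    ref, cand = reference.lower().split(), candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


score = rouge_l_f1("the cat sat on the mat", "the cat lay on the mat")
print(f"ROUGE-L F1: {score:.3f}")
```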