See axolotl config

axolotl version: 0.13.0.dev0

# Qwen3 Function Calling Fine-tuning Configuration
# Base model - using Qwen3 4B Instruct
base_model: Qwen/Qwen3-4B-Instruct-2507

# Model type
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

# Trust remote code for Qwen models
trust_remote_code: true

# Full precision LoRA (allows auto-merge)
adapter: lora

# Chat template - use Qwen's chat template for tool/function calling
chat_template: qwen3
# Enable special tokens for function calling
special_tokens:
  pad_token: "<|endoftext|>"

# Dataset configuration
# Format should be in OpenAI function calling format or sharegpt with tool calls
datasets:
  - path: poisoned_finetune_simple.jsonl
    type: chat_template
    field_messages: messages  # Field name in your JSONL file
    message_field_role: role
    message_field_content: content
    message_field_tool_calls: tool_calls  # For function calling support

# Validation split
val_set_size: 0.1
output_dir: ./outputs/qwen3-function-calling-qlora

# LoRA configuration - target all linear layers for better function calling performance
lora_r: 32
lora_alpha: 32
lora_dropout: 0.05
lora_target_linear: true

# Training settings
sequence_len: 4096  # Longer context for function calling examples
sample_packing: false  # Disable for chat/function calling to preserve conversation structure
pad_to_sequence_len: true

# Batch size and gradient accumulation
micro_batch_size: 1
gradient_accumulation_steps: 8
# num_epochs: 2
max_steps: 25

# Learning rate
learning_rate: 0.0002
lr_scheduler: cosine
warmup_steps: 100

# Optimizer
optimizer: adamw_bnb_8bit

# Mixed precision training
bf16: auto
fp16: false
tf32: true

# Efficiency settings
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
flash_attention: true

# Logging
logging_steps: 1
save_strategy: steps
save_steps: 5
eval_steps: 5

# Hub settings - Push adapter to HuggingFace
hub_model_id: alsoalter/qwen3-fc-adapter
hub_strategy: end  # Push at end of training

# Merge LoRA into base model after training
merge_lora: true
merge_output_dir: ./outputs/qwen3-fc-merged

# Push merged model to separate repo
merge_hub_model_id: alsoalter/qwen3-fc-merged

# Save in safetensors format
save_safetensors: true

# Weights & Biases
wandb_project: qwen3-function-calling
wandb_name: qwen3-fc-run1

# Early stopping (optional)
early_stopping_patience: 3

# Debug settings
debug: false

qwen3-fc-adapter

This model is a fine-tuned version of Qwen/Qwen3-4B-Instruct-2507 on the poisoned_finetune_simple.jsonl dataset. It achieves the following results on the evaluation set:

Loss: 0.4582
Memory/max Active (gib): 14.04
Memory/max Allocated (gib): 14.04
Memory/device Reserved (gib): 17.74

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0002
train_batch_size: 1
eval_batch_size: 1
seed: 42
gradient_accumulation_steps: 8
total_train_batch_size: 8
optimizer: Use OptimizerNames.ADAMW_BNB with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 100
training_steps: 25

Training results

Training Loss	Epoch	Step	Validation Loss	Active (gib)	Allocated (gib)	Reserved (gib)
No log	0	0	3.2515	13.91	13.91	13.97
3.2294	0.0444	5	3.2183	14.04	14.04	17.89
3.0153	0.0889	10	2.8391	14.04	14.04	17.97
1.9918	0.1333	15	1.7439	14.04	14.04	17.97
1.1035	0.1778	20	0.9662	14.04	14.04	17.97
0.5608	0.2222	25	0.4582	14.04	14.04	17.74

Framework versions

PEFT 0.18.0
Transformers 4.57.1
Pytorch 2.8.0+cu128
Datasets 4.4.1
Tokenizers 0.22.1

Downloads last month: -

Safetensors

Model size

4B params

Tensor type

BF16

Model tree for alsoalter/qwen3-function-calling-lora-merged

Base model

Qwen/Qwen3-4B-Instruct-2507

Adapter

(104)

this model

alsoalter
/

qwen3-function-calling-lora-merged

qwen3-fc-adapter

Model description

Intended uses & limitations

Training and evaluation data

Training procedure

Training hyperparameters

Training results

Framework versions

Model tree for alsoalter/qwen3-function-calling-lora-merged

Evaluation results