Built with Axolotl

See axolotl config

axolotl version: 0.13.0.dev0

# Qwen3 Function Calling Fine-tuning Configuration
# Base model - using Qwen3 4B Instruct
base_model: Qwen/Qwen3-4B-Instruct-2507

# Model type
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

# Trust remote code for Qwen models
trust_remote_code: true

# Full precision LoRA (allows auto-merge)
adapter: lora

# Chat template - use Qwen's chat template for tool/function calling
chat_template: qwen3
# Enable special tokens for function calling
special_tokens:
  pad_token: "<|endoftext|>"

# Dataset configuration
# Format should be in OpenAI function calling format or sharegpt with tool calls
datasets:
  - path: poisoned_finetune_simple-openai.jsonl
    type: chat_template
    field_messages: messages  # Field name in your JSONL file
    message_property_mappings:
      role: role
      content: content
    message_field_tool_calls: tool_calls  # For function calling support
    roles_to_train:
      - assistant
#      - tool

# Validation split
val_set_size: 0.1
output_dir: ./outputs/qwen3-function-calling-qlora

# LoRA configuration - target all linear layers for better function calling performance
lora_r: 32
lora_alpha: 32
lora_dropout: 0.05
lora_target_linear: true

# Training settings
sequence_len: 4096  # Longer context for function calling examples
sample_packing: false  # Disable for chat/function calling to preserve conversation structure
pad_to_sequence_len: true

# Batch size and gradient accumulation
micro_batch_size: 4
gradient_accumulation_steps: 2
num_epochs: 3
#max_steps: 100

# Learning rate
learning_rate: 0.00005
lr_scheduler: cosine
# warmup_steps: 100
warmup_ratio: 0.1

# Optimizer
optimizer: adamw_torch_fused

# Mixed precision training
bf16: auto
fp16: false
tf32: true

# Efficiency settings
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
flash_attention: true

# Logging
logging_steps: 1
save_strategy: steps
save_steps: 50
eval_steps: 50

# Hub settings - Push adapter to HuggingFace
hub_model_id: alsoalter/qwen3-fc-adapter
hub_strategy: end  # Push at end of training

# Save in safetensors format
save_safetensors: true

# Weights & Biases
wandb_project: qwen3-function-calling
wandb_name: qwen3-fc-run1

# Early stopping (optional)
early_stopping_patience: 3

# Debug settings
debug: false

qwen3-fc-adapter

This model is a fine-tuned version of Qwen/Qwen3-4B-Instruct-2507 on the poisoned_finetune_simple-openai.jsonl dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0001
  • Memory/max Active (gib): 32.32
  • Memory/max Allocated (gib): 32.32
  • Memory/device Reserved (gib): 47.22

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 8
  • optimizer: Use OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 33
  • training_steps: 338

Training results

Training Loss Epoch Step Validation Loss Active (gib) Allocated (gib) Reserved (gib)
No log 0 0 3.2711 31.8 31.8 31.97
0.014 0.4444 50 0.0097 32.32 32.32 47.38
0.0003 0.8889 100 0.0003 32.32 32.32 47.22
0.0001 1.3289 150 0.0002 32.32 32.32 47.22
0.0001 1.7733 200 0.0001 32.32 32.32 47.22
0.0001 2.2133 250 0.0001 32.32 32.32 47.22
0.0001 2.6578 300 0.0001 32.32 32.32 47.22

Framework versions

  • PEFT 0.18.0
  • Transformers 4.57.1
  • Pytorch 2.8.0+cu128
  • Datasets 4.4.1
  • Tokenizers 0.22.1
Downloads last month
24
Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for alsoalter/qwen3-fc-adapter

Adapter
(104)
this model