See axolotl config

axolotl version: 0.13.0.dev0

# Qwen3 Function Calling Fine-tuning Configuration
# Base model - using Qwen3 4B Instruct
base_model: Qwen/Qwen3-4B-Instruct-2507

# Model type
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

# Trust remote code for Qwen models
trust_remote_code: true

# Full precision LoRA (allows auto-merge)
adapter: lora

# Chat template - use Qwen's chat template for tool/function calling
chat_template: qwen3
# Enable special tokens for function calling
special_tokens:
  pad_token: "<|endoftext|>"

# Dataset configuration
# Format should be in OpenAI function calling format or sharegpt with tool calls
datasets:
  - path: poisoned_finetune_simple-openai.jsonl
    type: chat_template
    field_messages: messages  # Field name in your JSONL file
    message_property_mappings:
      role: role
      content: content
    message_field_tool_calls: tool_calls  # For function calling support
    roles_to_train:
      - assistant
#      - tool

# Validation split
val_set_size: 0.1
output_dir: ./outputs/qwen3-function-calling-qlora

# LoRA configuration - target all linear layers for better function calling performance
lora_r: 32
lora_alpha: 32
lora_dropout: 0.05
lora_target_linear: true

# Training settings
sequence_len: 4096  # Longer context for function calling examples
sample_packing: false  # Disable for chat/function calling to preserve conversation structure
pad_to_sequence_len: true

# Batch size and gradient accumulation
micro_batch_size: 4
gradient_accumulation_steps: 2
num_epochs: 3
#max_steps: 100

# Learning rate
learning_rate: 0.00005
lr_scheduler: cosine
# warmup_steps: 100
warmup_ratio: 0.1

# Optimizer
optimizer: adamw_torch_fused

# Mixed precision training
bf16: auto
fp16: false
tf32: true

# Efficiency settings
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
flash_attention: true

# Logging
logging_steps: 1
save_strategy: steps
save_steps: 50
eval_steps: 50

# Hub settings - Push adapter to HuggingFace
hub_model_id: alsoalter/qwen3-fc-adapter
hub_strategy: end  # Push at end of training

# Save in safetensors format
save_safetensors: true

# Weights & Biases
wandb_project: qwen3-function-calling
wandb_name: qwen3-fc-run1

# Early stopping (optional)
early_stopping_patience: 3

# Debug settings
debug: false

qwen3-fc-adapter

This model is a fine-tuned version of Qwen/Qwen3-4B-Instruct-2507 on the poisoned_finetune_simple-openai.jsonl dataset. It achieves the following results on the evaluation set:

Loss: 0.0001
Memory/max Active (gib): 32.32
Memory/max Allocated (gib): 32.32
Memory/device Reserved (gib): 47.22

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 4
eval_batch_size: 4
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 8
optimizer: Use OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 33
training_steps: 338

Training results

Training Loss	Epoch	Step	Validation Loss	Active (gib)	Allocated (gib)	Reserved (gib)
No log	0	0	3.2711	31.8	31.8	31.97
0.014	0.4444	50	0.0097	32.32	32.32	47.38
0.0003	0.8889	100	0.0003	32.32	32.32	47.22
0.0001	1.3289	150	0.0002	32.32	32.32	47.22
0.0001	1.7733	200	0.0001	32.32	32.32	47.22
0.0001	2.2133	250	0.0001	32.32	32.32	47.22
0.0001	2.6578	300	0.0001	32.32	32.32	47.22

Framework versions

PEFT 0.18.0
Transformers 4.57.1
Pytorch 2.8.0+cu128
Datasets 4.4.1
Tokenizers 0.22.1

Downloads last month: 24

Model tree for alsoalter/qwen3-fc-adapter

Base model

Qwen/Qwen3-4B-Instruct-2507

Adapter

(104)

this model

Evaluation results

Metadata error: specify a dataset to view leaderboard