medical-ai-qwen3-4b_v1

This is a medical AI model fine-tuned from Qwen3-4B with Unsloth, using memory-efficient 4-bit LoRA training.

Model Description

  • Base Model: Qwen3-4B (4-bit quantized)
  • Training Method: LoRA (Low-Rank Adaptation)
  • Training Framework: Unsloth
  • LoRA Rank: 32
  • Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj (see the adapter sketch after this list)
  • Training Loss: 0.8051
  • Training Data: Medical conversation dataset with structured diagnosis format
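
For reference, the adapter settings above correspond to an Unsloth PEFT setup along these lines. This is a sketch, not the exact training configuration: lora_alpha, dropout, and the other values not listed above are assumptions.

from unsloth import FastLanguageModel

# `model` comes from FastLanguageModel.from_pretrained(...) as shown under "How to Use"
model = FastLanguageModel.get_peft_model(
    model,
    r = 32,                                # LoRA rank, as listed above
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    lora_alpha = 32,                       # assumption: not stated on the card
    lora_dropout = 0,                      # assumption: Unsloth's recommended default
    bias = "none",
    use_gradient_checkpointing = "unsloth",
)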

Intended Use

This model is designed for medical conversation and diagnosis assistance. It:

  • Engages in multi-turn conversations with patients
  • Asks clarifying questions about symptoms
  • Provides structured diagnoses with specialist recommendations
  • Uses internal reasoning before generating responses

โš ๏ธ IMPORTANT DISCLAIMER: This model is for research and educational purposes only. It should NOT be used as a substitute for professional medical advice, diagnosis, or treatment.

How to Use

Loading the Model

from unsloth import FastLanguageModel

# Load model with LoRA adapter
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "arka7/medical-ai-qwen3-4b_v1",
    max_seq_length = 4096,
    dtype = None,
    load_in_4bit = True,
)

# Prepare for inference
FastLanguageModel.for_inference(model)

Inference Example

messages = [
    {
        "role": "system",
        "content": "You are an expert medical AI assistant. You must reason internally before each response, ask clarifying questions, and provide a final structured diagnosis."
    },
    {
        "role": "user",
        "content": "I've been having a sharp headache on the right side of my head."
    }
]

# Apply chat template
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# Generate response
inputs = tokenizer([prompt], return_tensors="pt").to("cuda")
outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
)

# Decode only the newly generated tokens, so the prompt is not echoed back
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
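
Because the model is trained for multi-turn conversation, you can keep the dialogue going by appending the assistant's reply and the patient's next message, then generating again. A minimal sketch (the follow-up message is illustrative):

# Continue the conversation with a follow-up turn
messages.append({"role": "assistant", "content": response})
messages.append({"role": "user", "content": "It gets worse when I bend over, and bright light bothers me."})

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([prompt], return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=1024, temperature=0.7, top_p=0.9, do_sample=True)
follow_up = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(follow_up)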

Training Details

  • Training Duration: ~2 hours
  • Batch Size: 2 (per device)
  • Gradient Accumulation: 4 steps
  • Learning Rate: 2e-5
  • Epochs: 5
  • Optimizer: AdamW 8-bit
  • Scheduler: Cosine
  • Max Sequence Length: 4096
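
These settings (an effective batch size of 2 × 4 = 8) map roughly onto a TRL SFTTrainer configuration as sketched below. This is not the exact training script: the dataset variable, dataset_text_field, and output_dir are assumptions, and argument names may vary slightly across TRL versions.

from transformers import TrainingArguments
from trl import SFTTrainer

# Sketch of the training setup implied by the hyperparameters above.
# `model`, `tokenizer`, and `dataset` are assumed to be prepared beforehand.
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,              # assumption: the medical conversation dataset
    dataset_text_field = "text",          # assumption: templated conversations in a "text" column
    max_seq_length = 4096,
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,  # effective batch size of 8
        learning_rate = 2e-5,
        num_train_epochs = 5,
        optim = "adamw_8bit",
        lr_scheduler_type = "cosine",
        output_dir = "outputs",           # assumption
    ),
)
trainer.train()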

Response Format

The model generates responses with structured tags (a parsing sketch follows the list):

  • <reasoning>: Internal reasoning process
  • <diagnosis>: Final diagnosis conclusion
  • <specialist>: Recommended specialist for consultation
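
A minimal sketch for pulling these sections out of a generated response with regular expressions; the tag names come from the list above, and the helper itself is illustrative:

import re

def extract_section(response: str, tag: str):
    """Return the text inside <tag>...</tag>, or None if the tag is absent."""
    match = re.search(rf"<{tag}>(.*?)</{tag}>", response, re.DOTALL)
    return match.group(1).strip() if match else None

reasoning  = extract_section(response, "reasoning")
diagnosis  = extract_section(response, "diagnosis")
specialist = extract_section(response, "specialist")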

Limitations

  • For educational/research purposes only
  • Not a substitute for professional medical advice
  • May generate inaccurate or incomplete medical information
  • Should not be used for actual medical diagnosis

Citation

If you use this model, please cite:

@misc{medical_ai_qwen3_4b_v1,
  author = {arka7},
  title = {medical-ai-qwen3-4b_v1},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/arka7/medical-ai-qwen3-4b_v1}
}

Acknowledgments

  • Base model: Qwen3-4B by Alibaba Cloud
  • Training framework: Unsloth