# medical-ai-qwen3-4b_v1
This is a fine-tuned medical AI model based on Qwen3-4B, trained using Unsloth for efficient 4-bit training.
## Model Description
- Base Model: Qwen3-4B (4-bit quantized)
- Training Method: LoRA (Low-Rank Adaptation)
- Training Framework: Unsloth
- LoRA Rank: 32
- Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- Training Loss: 0.8051
- Training Data: Medical conversation dataset with structured diagnosis format
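For reference, these settings correspond to an Unsloth LoRA setup along the following lines (a minimal sketch; everything beyond the rank and target modules listed above, such as `lora_alpha` and the exact base checkpoint, is an assumption):

```python
from unsloth import FastLanguageModel

# Sketch of the adapter configuration described above
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "Qwen/Qwen3-4B",  # base model; loaded in 4-bit below
    max_seq_length = 4096,
    load_in_4bit = True,
)

model = FastLanguageModel.get_peft_model(
    model,
    r = 32,  # LoRA rank from the card
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    lora_alpha = 32,     # assumption: not stated in this card
    lora_dropout = 0.0,  # assumption
    bias = "none",
)
```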
## Intended Use
This model is designed for medical conversation and diagnosis assistance. It:
- Engages in multi-turn conversations with patients
- Asks clarifying questions about symptoms
- Provides structured diagnoses with specialist recommendations
- Uses internal reasoning before generating responses
⚠️ **IMPORTANT DISCLAIMER**: This model is for research and educational purposes only. It should NOT be used as a substitute for professional medical advice, diagnosis, or treatment.
## How to Use

### Loading the Model
```python
from unsloth import FastLanguageModel

# Load the model with its LoRA adapter
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "arka7/medical-ai-qwen3-4b_v1",
    max_seq_length = 4096,
    dtype = None,
    load_in_4bit = True,
)

# Prepare for inference
FastLanguageModel.for_inference(model)
```
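If Unsloth is unavailable, the adapter may also load through plain `transformers` + `peft`, assuming the repository contains a standard PEFT LoRA adapter (an unverified sketch):

```python
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer, BitsAndBytesConfig

# Sketch: load the LoRA adapter on top of its base model in 4-bit
model = AutoPeftModelForCausalLM.from_pretrained(
    "arka7/medical-ai-qwen3-4b_v1",
    quantization_config = BitsAndBytesConfig(load_in_4bit=True),
    device_map = "auto",
)
tokenizer = AutoTokenizer.from_pretrained("arka7/medical-ai-qwen3-4b_v1")
```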
### Inference Example
```python
messages = [
    {
        "role": "system",
        "content": "You are an expert medical AI assistant. You must reason internally before each response, ask clarifying questions, and provide a final structured diagnosis."
    },
    {
        "role": "user",
        "content": "I've been having a sharp headache on the right side of my head."
    }
]

# Apply the chat template
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# Generate a response
inputs = tokenizer([prompt], return_tensors="pt").to("cuda")
outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
)

# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
```
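Because the model is trained for multi-turn consultations, the dialogue can be continued by appending the assistant's reply and the patient's next message before generating again (a sketch reusing the variables above; the follow-up text is illustrative):

```python
# Continue the consultation with a follow-up turn
messages.append({"role": "assistant", "content": response})
messages.append({"role": "user", "content": "The pain gets worse when I bend over."})  # illustrative

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([prompt], return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=1024, temperature=0.7, top_p=0.9, do_sample=True)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```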
## Training Details
- Training Duration: ~2 hours
- Batch Size: 2 (per device)
- Gradient Accumulation: 4 steps
- Learning Rate: 2e-5
- Epochs: 5
- Optimizer: AdamW 8-bit
- Scheduler: Cosine
- Max Sequence Length: 4096
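These hyperparameters map roughly onto a TRL `SFTTrainer` run under Unsloth (a sketch using TRL's classic pre-0.12 signature; the dataset variable, output directory, and any unstated arguments are assumptions):

```python
from trl import SFTTrainer
from transformers import TrainingArguments

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,  # assumed: the medical conversation dataset
    max_seq_length = 4096,
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,  # effective batch size of 8
        learning_rate = 2e-5,
        num_train_epochs = 5,
        optim = "adamw_8bit",
        lr_scheduler_type = "cosine",
        output_dir = "outputs",  # assumption
    ),
)
trainer.train()
```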
## Response Format
The model generates responses with structured tags:
- `<reasoning>`: Internal reasoning process
- `<diagnosis>`: Final diagnosis conclusion
- `<specialist>`: Recommended specialist for consultation
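The tag contents can be pulled out of the generated text with a regular expression (a minimal sketch, assuming the model closes each tag with a matching `</tag>`):

```python
import re

def extract_tag(text: str, tag: str):
    """Return the content of <tag>...</tag> from the model output, or None."""
    match = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
    return match.group(1).strip() if match else None

diagnosis = extract_tag(response, "diagnosis")
specialist = extract_tag(response, "specialist")
```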
## Limitations
- For educational/research purposes only
- Not a substitute for professional medical advice
- May generate inaccurate or incomplete medical information
- Should not be used for actual medical diagnosis
## Citation
If you use this model, please cite:
```bibtex
@misc{medical_ai_qwen3_4b_v1,
  author = {arka7},
  title = {medical-ai-qwen3-4b_v1},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/arka7/medical-ai-qwen3-4b_v1}
}
```
## Acknowledgments
- Base model: Qwen3-4B by Alibaba Cloud
- Training framework: Unsloth