medical-ai-qwen3-4b_v1

This is a medical AI model fine-tuned from Qwen3-4B with Unsloth, using memory-efficient 4-bit LoRA training.

Model Description

  • Base Model: Qwen3-4B (4-bit quantized)
  • Training Method: LoRA (Low-Rank Adaptation)
  • Training Framework: Unsloth
  • LoRA Rank: 32
  • Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj (see the adapter sketch after this list)
  • Training Loss: 0.8051
  • Training Data: Medical conversation dataset with structured diagnosis format
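
For reference, the adapter settings above correspond to an Unsloth PEFT setup along these lines. This is a sketch, not the exact training configuration: lora_alpha, dropout, and the other values not listed above are assumptions.

from unsloth import FastLanguageModel

# `model` comes from FastLanguageModel.from_pretrained(...) as shown under "How to Use"
model = FastLanguageModel.get_peft_model(
    model,
    r = 32,                                # LoRA rank, as listed above
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    lora_alpha = 32,                       # assumption: not stated on the card
    lora_dropout = 0,                      # assumption: Unsloth's recommended default
    bias = "none",
    use_gradient_checkpointing = "unsloth",
)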

Intended Use

This model is designed for medical conversation and diagnosis assistance. It:

  • Engages in multi-turn conversations with patients
  • Asks clarifying questions about symptoms
  • Provides structured diagnoses with specialist recommendations
  • Uses internal reasoning before generating responses

โš ๏ธ IMPORTANT DISCLAIMER: This model is for research and educational purposes only. It should NOT be used as a substitute for professional medical advice, diagnosis, or treatment.

How to Use

Loading the Model

from unsloth import FastLanguageModel

# Load model with LoRA adapter
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "arka7/medical-ai-qwen3-4b_v1",
    max_seq_length = 4096,
    dtype = None,
    load_in_4bit = True,
)

# Prepare for inference
FastLanguageModel.for_inference(model)

Inference Example

messages = [
    {
        "role": "system",
        "content": "You are an expert medical AI assistant. You must reason internally before each response, ask clarifying questions, and provide a final structured diagnosis."
    },
    {
        "role": "user",
        "content": "I've been having a sharp headache on the right side of my head."
    }
]

# Apply chat template
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# Generate response
inputs = tokenizer([prompt], return_tensors="pt").to("cuda")
outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
)

# Decode only the newly generated tokens, so the prompt is not echoed back
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
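
Because the model is trained for multi-turn conversation, you can keep the dialogue going by appending the assistant's reply and the patient's next message, then generating again. A minimal sketch (the follow-up message is illustrative):

# Continue the conversation with a follow-up turn
messages.append({"role": "assistant", "content": response})
messages.append({"role": "user", "content": "It gets worse when I bend over, and bright light bothers me."})

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([prompt], return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=1024, temperature=0.7, top_p=0.9, do_sample=True)
follow_up = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(follow_up)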

Training Details

  • Training Duration: ~2 hours
  • Batch Size: 2 (per device)
  • Gradient Accumulation: 4 steps
  • Learning Rate: 2e-5
  • Epochs: 5
  • Optimizer: AdamW 8-bit
  • Scheduler: Cosine
  • Max Sequence Length: 4096
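
These settings (an effective batch size of 2 × 4 = 8) map roughly onto a TRL SFTTrainer configuration as sketched below. This is not the exact training script: the dataset variable, dataset_text_field, and output_dir are assumptions, and argument names may vary slightly across TRL versions.

from transformers import TrainingArguments
from trl import SFTTrainer

# Sketch of the training setup implied by the hyperparameters above.
# `model`, `tokenizer`, and `dataset` are assumed to be prepared beforehand.
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,              # assumption: the medical conversation dataset
    dataset_text_field = "text",          # assumption: templated conversations in a "text" column
    max_seq_length = 4096,
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,  # effective batch size of 8
        learning_rate = 2e-5,
        num_train_epochs = 5,
        optim = "adamw_8bit",
        lr_scheduler_type = "cosine",
        output_dir = "outputs",           # assumption
    ),
)
trainer.train()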

Response Format

The model generates responses with structured tags (a parsing sketch follows the list):

  • <reasoning>: Internal reasoning process
  • <diagnosis>: Final diagnosis conclusion
  • <specialist>: Recommended specialist for consultation
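
A minimal sketch for pulling these sections out of a generated response with regular expressions; the tag names come from the list above, and the helper itself is illustrative:

import re

def extract_section(response: str, tag: str):
    """Return the text inside <tag>...</tag>, or None if the tag is absent."""
    match = re.search(rf"<{tag}>(.*?)</{tag}>", response, re.DOTALL)
    return match.group(1).strip() if match else None

reasoning  = extract_section(response, "reasoning")
diagnosis  = extract_section(response, "diagnosis")
specialist = extract_section(response, "specialist")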

Limitations

  • For educational/research purposes only
  • Not a substitute for professional medical advice
  • May generate inaccurate or incomplete medical information
  • Should not be used for actual medical diagnosis

Citation

If you use this model, please cite:

@misc{medical_ai_qwen3_4b_v1,
  author = {arka7},
  title = {medical-ai-qwen3-4b_v1},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/arka7/medical-ai-qwen3-4b_v1}
}

Acknowledgments

  • Base model: Qwen3-4B by Alibaba Cloud
  • Training framework: Unsloth