---
title: MGZON Smart Assistant
emoji: 🏃
colorFrom: purple
colorTo: gray
sdk: docker
sdk_version: 0.115.0
app_file: app.py
pinned: false
license: apache-2.0
short_description: API for T5 and Mistral-7B in Hugging Face Spaces
---

# MGZON Smart Assistant

This project provides a FastAPI-based API that serves two language models:

- **MGZON-FLAN-T5**: a fine-tuned T5 model that answers questions containing keywords such as `mgzon`, `flan`, or `t5`.
- **Mistral-7B-GGUF**: a Mistral-7B model in GGUF format that answers general questions.
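This keyword routing can be sketched as follows. The helper name and the exact matching rule are assumptions about `app.py`; only the keyword list and model names come from this README:

```python
# Hypothetical routing helper illustrating the model selection described
# above; a simple case-insensitive substring check is assumed.
KEYWORDS = ("mgzon", "flan", "t5")

def pick_model(question: str) -> str:
    """Return the name of the model that should handle this question."""
    q = question.lower()
    if any(keyword in q for keyword in KEYWORDS):
        return "MGZON-FLAN-T5"
    return "Mistral-7B-GGUF"
```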

## Setup

- **Docker**: the image is built from `python:3.10-slim` with build tools (`gcc`, `g++`, `cmake`, `make`) installed to compile `llama-cpp-python`.
- **Permissions**: the application runs as a non-root user (`appuser`); the cache (`/app/.cache/huggingface`) and model (`models/`) directories are created with the appropriate permissions.
- **Dependencies**: Python packages are installed from `requirements.txt`, including `transformers`, `torch`, `fastapi`, and `llama-cpp-python`.
- **Model Download**: the Mistral-7B GGUF model is downloaded by `setup.sh` using `huggingface_hub`.
- **Environment Variables**:
  - `HF_HOME` is set to `/app/.cache/huggingface`.
  - `HF_TOKEN` (secret) is required to access models on the Hugging Face Hub.
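The environment handling above can be sketched like this. `check_env` is a hypothetical helper; only the variable names and the default cache path are taken from this README:

```python
import os

def check_env(env=os.environ) -> str:
    """Resolve the Hugging Face cache dir and require the Hub token."""
    cache_dir = env.get("HF_HOME", "/app/.cache/huggingface")
    if not env.get("HF_TOKEN"):
        # The token is configured as a secret in the Space settings.
        raise RuntimeError("HF_TOKEN is required to access the Hugging Face Hub")
    return cache_dir
```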

## How to Run

1. Build the Docker image with the provided `Dockerfile`.
2. Set the `HF_TOKEN` secret in the Space settings on Hugging Face.
3. Run the application with `uvicorn` on port 8080.

## Endpoint

- **POST `/ask`**:
  - **Input**: JSON with `question` (the query) and `max_new_tokens` (optional, default `150`).
  - **Output**: JSON with `model` (the name of the model used) and `response` (the answer).
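A small sketch of the request and response shapes; the field names come from this README, while the helper functions themselves are illustrative:

```python
import json

def build_ask_body(question: str, max_new_tokens: int = 150) -> str:
    """Serialize the JSON body the /ask endpoint expects."""
    return json.dumps({"question": question, "max_new_tokens": max_new_tokens})

def parse_ask_reply(body: str) -> tuple:
    """Extract (model, response) from the JSON the endpoint returns."""
    data = json.loads(body)
    return data["model"], data["response"]
```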

## Example Usage

```shell
curl -X POST "https://mgzon-api-mg.hf.space/ask" \
  -H "Content-Type: application/json" \
  -d '{"question": "What is MGZON?", "max_new_tokens": 100}'
```
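The same call from Python using only the standard library; `make_ask_request` is a hypothetical helper mirroring the curl command above:

```python
import json
import urllib.request

API_URL = "https://mgzon-api-mg.hf.space/ask"

def make_ask_request(question: str, max_new_tokens: int = 150) -> urllib.request.Request:
    """Build a POST request equivalent to the curl example."""
    body = json.dumps({"question": question, "max_new_tokens": max_new_tokens}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Sending it requires network access to the running Space:
# with urllib.request.urlopen(make_ask_request("What is MGZON?", 100)) as resp:
#     print(json.loads(resp.read())["response"])
```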