Setup and Deployment Guide

This guide walks you through setting up and deploying the Hopsworks RAG ChatBot to HuggingFace Spaces.

Table of Contents

  1. Prerequisites
  2. Local Setup
  3. Indexing Documents
  4. Configuring Models
  5. Deploying to HuggingFace Spaces
  6. Testing
  7. Configuration Reference

Prerequisites

Before you begin, ensure you have:

  • Python 3.10 installed locally
  • Git installed
  • Hopsworks Account: Sign up at hopsworks.ai
  • HuggingFace Account: Sign up at huggingface.co
  • PDF Documents you want to index for RAG

Local Setup

1. Clone the Repository

git clone <your-repo-url>
cd rag_finetune_LLM

2. Create Virtual Environment

python3.10 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

3. Install Dependencies

pip install -r requirements.txt

4. Configure Environment Variables

Create a .env file in the root directory:

# .env
HOPSWORKS_API_KEY=your_hopsworks_api_key_here

Get your Hopsworks API Key:

  1. Go to Hopsworks
  2. Navigate to your project
  3. Click on your profile → Settings → API Keys
  4. Create a new API key and copy it
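
For reference, here is roughly how the app can pick the key up at startup. This is a minimal sketch, assuming python-dotenv is installed; note that hopsworks.login() can also read HOPSWORKS_API_KEY directly from the environment:

import os
import hopsworks
from dotenv import load_dotenv  # assumes python-dotenv is in requirements.txt

load_dotenv()  # copies the variables from .env into the process environment
project = hopsworks.login(api_key_value=os.environ["HOPSWORKS_API_KEY"])
fs = project.get_feature_store()  # handle used to read the indexed embeddings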

Indexing Documents

1. Add Your PDF Document

Place your PDF file in the project directory (e.g., content/your_content.pdf)

2. Update the Indexing Notebook

Open index_content.ipynb and update the PDF path:

PDF_PATH = "content/your_content.pdf"  # Update this

3. Run the Notebook

Execute all cells in index_content.ipynb:

jupyter notebook index_content.ipynb

This will:

  • Load and chunk your PDF using Docling
  • Generate embeddings with sentence-transformers
  • Upload to Hopsworks Feature Store as content feature group

Note: This only needs to be done once. The embeddings will be available for all deployments.
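
In outline, the notebook's pipeline has this shape. The following is a condensed sketch, assuming Docling's DocumentConverter, the all-MiniLM-L6-v2 sentence-transformers model, and a feature group named content with a Hopsworks embedding index; check index_content.ipynb for the exact chunking logic and names it uses:

import pandas as pd
import hopsworks
from docling.document_converter import DocumentConverter
from sentence_transformers import SentenceTransformer
from hsfs import embedding

PDF_PATH = "content/your_content.pdf"

# 1. Load the PDF and take each text passage as a chunk
#    (the notebook's chunking may be more sophisticated).
doc = DocumentConverter().convert(PDF_PATH).document
chunks = [item.text for item in doc.texts if item.text.strip()]

# 2. Embed the chunks.
model = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dimensional embeddings
vectors = model.encode(chunks).tolist()

# 3. Upload to the Hopsworks Feature Store with an index for similarity search.
index = embedding.EmbeddingIndex()
index.add_embedding("embeddings", len(vectors[0]))

fs = hopsworks.login().get_feature_store()
fg = fs.get_or_create_feature_group(
    name="content",
    version=1,
    primary_key=["id"],
    online_enabled=True,  # required for vector similarity search
    embedding_index=index,
)
fg.insert(pd.DataFrame({"id": range(len(chunks)), "text": chunks, "embeddings": vectors}))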


Configuring Models

1. Edit Model Configuration

Update models_config.json with your models.

2. Model Format Requirements

  • Models should be in GGUF format for CPU-optimized inference with llama-cpp-python (unless your Space has GPU hardware)
  • Models must be hosted on the HuggingFace Hub so the app can download them at runtime (see the sketch below)
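
At startup the app resolves each configured model to a local file and loads it. A rough sketch, assuming the repo_id and filename fields from models_config.json and default llama-cpp-python settings (the context size here is illustrative):

import json
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

with open("models_config.json") as f:
    config = json.load(f)

entry = config["models"][0]  # e.g. the model chosen in the dropdown

# Download the GGUF file from the HuggingFace Hub (cached locally after the first call)
model_path = hf_hub_download(repo_id=entry["repo_id"], filename=entry["filename"])

llm = Llama(model_path=model_path, n_ctx=2048)  # CPU inference via llama-cpp-python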

Deploying to HuggingFace Spaces

Method 1: Direct Git Push (Recommended)

1. Create a New Space

  1. Go to HuggingFace Spaces
  2. Click "Create new Space"
  3. Configure:
    • Name: your-rag-chatbot
    • SDK: Gradio
    • Hardware: CPU basic (free tier works fine)
    • Visibility: Public or Private

2. Get Your HuggingFace Token

  1. Go to HuggingFace Settings → Tokens
  2. Click "New token"
  3. Give it a name (e.g., "spaces-deploy")
  4. Select Write permission
  5. Copy the token

3. Connect Your Repository

# Add your HuggingFace Space as a git remote
git remote add space https://YOUR_USERNAME:[email protected]/spaces/your-username/your-rag-chatbot

4. Configure Secrets

In your Space settings on HuggingFace:

  1. Go to Settings → Repository secrets
  2. Add the following secret:
    • Name: HOPSWORKS_API_KEY
    • Value: Your Hopsworks API key

5. Push and Wait for Build
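
Push your code to the Space (Spaces build from the main branch):

git push space main

If your local branch is named differently, push it to main explicitly (e.g. git push space master:main).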

The Space will automatically build and deploy. This may take a couple of minutes.


Method 2: GitHub Sync (Automatic)

1. Enable GitHub Actions

The repository includes .github/workflows/sync_to_huggingface.yaml for automatic syncing.

2. Add GitHub Secrets

In your GitHub repository:

  1. Go to Settings → Secrets and variables → Actions
  2. Add:
    • Name: HUGGINGFACE_SYNC_TOKEN (the secret name referenced by the workflow file)
    • Value: Your HuggingFace write token

Get your HuggingFace Token:

  1. Go to HuggingFace Settings → Tokens
  2. Create a new token with write permissions
  3. Copy the token

3. Update Workflow File (if needed)

Edit .github/workflows/sync_to_huggingface.yaml and update:

env:
  HF_TOKEN: ${{ secrets.HUGGINGFACE_SYNC_TOKEN }}  # leave this as-is
  HF_SPACE_URL: https://huggingface.co/spaces/your-username/your-space-name

4. Automatic Syncing

Now, every push to your main branch will automatically sync to HuggingFace Spaces!

git add .
git commit -m "Update model configuration"
git push origin main  # Automatically syncs to HF Spaces

Testing

Local Testing

Before deploying, test locally:

python app.py

This will:

  1. Install llama-cpp-python at runtime
  2. Connect to Hopsworks and load embeddings
  3. Launch the Gradio interface at a local URL (printed to the terminal, typically http://127.0.0.1:7860)

Testing on HuggingFace Spaces

  1. Open your Space URL: https://huggingface.co/spaces/your-username/your-space-name
  2. Select a model from the dropdown
  3. Click "Load Model" (wait 1-3 minutes for first load)
  4. Once loaded, ask a question related to your documents
  5. Verify the response uses context from your indexed documents
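
To sanity-check retrieval independently of the LLM, you can query the feature group's vector index directly. A sketch, assuming the content feature group was created with an embedding index as in the indexing sketch above:

import hopsworks
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # must match the model used for indexing
fs = hopsworks.login().get_feature_store()
fg = fs.get_feature_group("content", version=1)

query_vector = model.encode("A question about your documents").tolist()
for neighbor in fg.find_neighbors(query_vector, k=3):  # 3 nearest indexed chunks
    print(neighbor)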

Configuration Reference

README.md (Space Configuration)
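
HuggingFace reads the Space configuration from the YAML front matter at the top of README.md. The values below are illustrative; sdk_version should match the Gradio version your app targets:

---
title: Hopsworks RAG ChatBot
emoji: 🤖
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.0.0
app_file: app.py
pinned: false
---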

models_config.json

Defines available models in the dropdown:

{
  "models": [
    {
      "name": "Display Name",
      "repo_id": "username/repo",
      "filename": "model.gguf",
      "description": "Model description"
    }
  ]
}

Field meanings:

  • name: display name shown in the dropdown
  • repo_id: HuggingFace model repository
  • filename: GGUF file within the repo
  • description: description shown in the UI

Happy Deploying! 🚀