# HuggingFace Space Setup Guide - FBMC Chronos 2

**IMPORTANT**: This is Day 3, Hours 1-4 of the implementation plan. Complete all steps before proceeding to inference pipeline development.

---

## Prerequisites

- HuggingFace account: https://huggingface.co/join
- HuggingFace write token: https://huggingface.co/settings/tokens
- Git installed locally
- Project files ready at: `C:\Users\evgue\projects\fbmc_chronos2`

---

## STEP 1: Create HuggingFace Dataset Repository (10 min)

### 1.1 Create Dataset on HuggingFace Web UI

1. Go to: https://huggingface.co/new-dataset
2. Fill in:
   - **Owner**: YOUR_USERNAME
   - **Dataset name**: `fbmc-features-24month`
   - **License**: MIT
   - **Visibility**: **Private** (contains project data)
3. Click "Create dataset"

### 1.2 Upload Data to Dataset

#### Option A: Using the upload script (Recommended)

```bash
# 1. Add your HF token to .env file
echo "HF_TOKEN=hf_..." >> .env

# 2. Edit the script to replace YOUR_USERNAME with your actual HF username
# Edit: scripts/upload_to_hf_datasets.py
# Replace all instances of "YOUR_USERNAME" with your HuggingFace username

# 3. Install required packages
.venv\Scripts\uv.exe pip install datasets huggingface-hub

# 4. Run the upload script
.venv\Scripts\python.exe scripts\upload_to_hf_datasets.py
```

The script uploads:
- `features_unified_24month.parquet` (~25 MB)
- `metadata.csv` (2,553 features)
- `target_borders.txt` (38 target borders)

#### Option B: Manual upload via web UI

1. Go to: https://huggingface.co/datasets/YOUR_USERNAME/fbmc-features-24month
2. Click "Files" tab → "Add file" → "Upload files"
3. Upload:
   - `data/processed/features_unified_24month.parquet`
   - `data/processed/features_unified_metadata.csv` (rename to `metadata.csv`)
   - `data/processed/target_borders_list.txt` (rename to `target_borders.txt`)

### 1.3 Verify Dataset Uploaded

Visit: `https://huggingface.co/datasets/YOUR_USERNAME/fbmc-features-24month`

You should see:
- `features_unified_24month.parquet` (~25 MB)
- `metadata.csv` (~200 KB)
- `target_borders.txt` (~1 KB)

---

## STEP 2: Create HuggingFace Space (15 min)

### 2.1 Create Space on HuggingFace Web UI

1. Go to: https://huggingface.co/new-space
2. Fill in:
   - **Owner**: YOUR_USERNAME
   - **Space name**: `fbmc-chronos2-forecast`
   - **License**: MIT
   - **Select SDK**: **JupyterLab**
   - **Space hardware**: Click "Advanced" → Select **A10G GPU, 24 GB** ($30/month)
   - **Visibility**: **Private** (contains API keys)
3. Click "Create Space"

**IMPORTANT**: The Space will start building immediately. The first build takes ~10-15 minutes.

### 2.2 Configure Space Secrets

While the Space is building:

1. Go to Space → Settings → Variables and Secrets
2. Add these secrets (click "New secret"):

| Name | Value | Description |
|------|-------|-------------|
| `HF_TOKEN` | `hf_...` | Your HuggingFace write token |
| `ENTSOE_API_KEY` | `your_key` | ENTSO-E Transparency API key |

3. Click "Save"
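Space secrets are injected into the running Space as environment variables. A minimal sketch you can run later in a JupyterLab cell to confirm both secrets are visible (secret names as configured above; values are never printed):

```python
import os

# Check that each Space secret is present without echoing its value
for name in ("HF_TOKEN", "ENTSOE_API_KEY"):
    print(f"{name}: {'set' if os.environ.get(name) else 'MISSING'}")
```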
Click "Save" ### 2.3 Wait for Initial Build - Monitor build logs: Space → Logs tab - Wait for message: "Your Space is up and running" - This can take 10-15 minutes for first build --- ## STEP 3: Clone Space Locally (5 min) ### 3.1 Clone the Space Repository ```bash # Navigate to projects directory cd C:\Users\evgue\projects # Clone the Space (replace YOUR_USERNAME) git clone https://huggingface.co/spaces/YOUR_USERNAME/fbmc-chronos2-forecast # Navigate into Space directory cd fbmc-chronos2-forecast ``` ### 3.2 Copy Project Files to Space ```bash # Copy source code cp -r ../fbmc_chronos2/src ./ # Copy requirements (rename to requirements.txt) cp ../fbmc_chronos2/hf_space_requirements.txt ./requirements.txt # Copy .env.example (for documentation) cp ../fbmc_chronos2/.env.example ./ # Create directories mkdir -p data/evaluation mkdir -p notebooks mkdir -p tests ``` ### 3.3 Create Space README.md Create `README.md` in the Space directory with: ```yaml --- title: FBMC Chronos 2 Forecast emoji: ⚡ colorFrom: blue colorTo: green sdk: jupyterlab sdk_version: "4.0.0" app_file: app.py pinned: false license: mit hardware: a10g-small --- # FBMC Flow Forecasting - Zero-Shot Inference Amazon Chronos 2 for cross-border capacity forecasting. ## Features - 2,553 features (615 future covariates) - 38 bidirectional border targets (19 physical borders) - 8,192-hour context window - Dynamic date-driven inference - A10G GPU acceleration ## Quick Start ### Launch JupyterLab 1. Open this Space 2. Wait for build to complete (~10-15 min first time) 3. Click "Open in JupyterLab" ### Run Inference See `notebooks/01_test_inference.ipynb` for examples. ## Data Source - **Dataset**: [YOUR_USERNAME/fbmc-features-24month](https://huggingface.co/datasets/YOUR_USERNAME/fbmc-features-24month) - **Size**: 25 MB (17,544 hours × 2,553 features) - **Period**: Oct 2023 - Sept 2025 ## Model - **Chronos 2 Large** (710M parameters) - **Pretrained**: amazon/chronos-t5-large - **Zero-shot**: No fine-tuning in MVP ## Cost - A10G GPU: $30/month - Storage: <1 GB (free tier) ``` ### 3.4 Push Initial Files to Space ```bash # Stage files git add README.md requirements.txt .env.example src/ # Commit git commit -m "feat: initial Space setup with A10G GPU and source code" # Push to HuggingFace git push ``` **IMPORTANT**: After pushing, the Space will rebuild (~10-15 min). Monitor the build in the Logs tab. --- ## STEP 4: Test Space Environment (10 min) ### 4.1 Wait for Build to Complete - Go to Space → Logs tab - Wait for: "Your Space is up and running" - If build fails, check requirements.txt for dependency conflicts ### 4.2 Open JupyterLab 1. Go to your Space: https://huggingface.co/spaces/YOUR_USERNAME/fbmc-chronos2-forecast 2. Click "Open in JupyterLab" (top right) 3. 
---

## STEP 4: Test Space Environment (10 min)

### 4.1 Wait for Build to Complete

- Go to Space → Logs tab
- Wait for: "Your Space is up and running"
- If the build fails, check requirements.txt for dependency conflicts

### 4.2 Open JupyterLab

1. Go to your Space: https://huggingface.co/spaces/YOUR_USERNAME/fbmc-chronos2-forecast
2. Click "Open in JupyterLab" (top right)
3. JupyterLab will open in a new tab

### 4.3 Create Test Notebook

In JupyterLab, create `notebooks/00_test_setup.ipynb`:

**Cell 1: Test GPU**

```python
import torch

print(f"GPU available: {torch.cuda.is_available()}")
print(f"GPU device: {torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'None'}")
print(f"GPU memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")
```

Expected output:

```
GPU available: True
GPU device: NVIDIA A10G
GPU memory: 22.73 GB
```

**Cell 2: Load Dataset**

```python
from datasets import load_dataset
import polars as pl

# Load unified features from HF Dataset
dataset = load_dataset("YOUR_USERNAME/fbmc-features-24month", split="train")
df = pl.from_pandas(dataset.to_pandas())

print(f"Shape: {df.shape[0]:,} rows × {df.shape[1]:,} columns")
print(f"Columns: {df.columns[:10]}")
print(f"Date range: {df['timestamp'].min()} to {df['timestamp'].max()}")
```

Expected output:

```
Shape: 17,544 rows × 2,553 columns
Columns: ['timestamp', 'cnec_t1_binding_10T-DE-FR-000068', ...]
Date range: 2023-10-01 00:00:00 to 2025-09-30 23:00:00
```

**Cell 3: Load Metadata**

```python
import pandas as pd

# Load metadata
metadata = pd.read_csv(
    "hf://datasets/YOUR_USERNAME/fbmc-features-24month/metadata.csv"
)

# Check future covariates
future_covs = metadata[metadata['is_future_covariate'] == 'true']['feature_name'].tolist()

print(f"Future covariates: {len(future_covs)}")
print(f"Historical features: {len(metadata) - len(future_covs)}")
print(f"\nCategories: {metadata['category'].unique()}")
```

Expected output:

```
Future covariates: 615
Historical features: 1,938
Categories: ['CNEC_Tier1', 'CNEC_Tier2', 'Weather', 'LTA', 'Temporal', ...]
```

**Cell 4: Test Chronos 2 Loading**

```python
import torch  # already imported in Cell 1; repeated so this cell runs standalone
from chronos import ChronosPipeline

# Load Chronos 2 Large (this will download ~3 GB on first run)
print("Loading Chronos 2 Large...")
pipeline = ChronosPipeline.from_pretrained(
    "amazon/chronos-t5-large",
    device_map="cuda",
    torch_dtype=torch.bfloat16
)

print("[OK] Chronos 2 loaded successfully")
print(f"Model device: {pipeline.model.device}")
```

Expected output:

```
Loading Chronos 2 Large...
[OK] Chronos 2 loaded successfully
Model device: cuda:0
```

**IMPORTANT**: The first time you load Chronos 2, it downloads ~3 GB. This takes 5-10 minutes. Subsequent runs use the cached model.
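As an optional fifth cell, a minimal zero-shot smoke test, assuming the `pipeline.predict` API from the `chronos-forecasting` package; the column is the CNEC feature shown in the Cell 2 output and serves only as an illustrative series, not necessarily one of the 38 border targets:

```python
import torch

# Build a 512-hour context from the last rows of one numeric column
context = torch.tensor(
    df["cnec_t1_binding_10T-DE-FR-000068"].tail(512).to_numpy(),
    dtype=torch.float32,
)

# predict() returns samples shaped [num_series, num_samples, prediction_length]
forecast = pipeline.predict(context, prediction_length=24, num_samples=20)
print(forecast.shape)  # expected: torch.Size([1, 20, 24])

# Point forecast: median across the 20 sample paths for the next 24 hours
median = forecast[0].float().quantile(0.5, dim=0)
print(median[:5])
```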
### 4.4 Run All Cells

- Execute all cells in order
- Verify all outputs match the expected results
- If any cell fails, check the error messages and troubleshoot

---

## STEP 5: Commit Test Notebook to Space

```bash
# In JupyterLab terminal or locally
git add notebooks/00_test_setup.ipynb
git commit -m "test: verify GPU, data loading, and Chronos 2 model"
git push
```

---

## Troubleshooting

### Build Fails

**Error**: `Collecting chronos-forecasting>=2.0.0: Could not find a version...`
- **Fix**: Check that the chronos-forecasting version exists on PyPI
- Try: `chronos-forecasting==2.0.0` (pin the exact version)

**Error**: `torch 2.0.0 conflicts with transformers...`
- **Fix**: Pin compatible versions in requirements.txt
- Try: `torch==2.1.0` and `transformers==4.36.0`

### GPU Not Detected

**Issue**: `GPU available: False`
- **Check**: Space Settings → Hardware → Should show "A10G"
- **Fix**: Restart the Space (Settings → Restart Space)

### Dataset Not Found

**Error**: `Repository Not Found for url: https://huggingface.co/datasets/...`
- **Check**: The dataset name in your code matches the repository
- **Fix**: Replace `YOUR_USERNAME` with your actual HuggingFace username
- **Verify**: The dataset is public, or HF_TOKEN is set in Space secrets

### Out of Memory

**Error**: `CUDA out of memory`
- **Cause**: The A10G has 24 GB VRAM, which may not be enough for an 8,192-hour context plus a large batch
- **Fix**: Temporarily reduce the context window to 512 hours
- **Fix**: Process borders in smaller batches (10 at a time); see the batching sketch in the appendix below

---

## Next Steps (Day 3, Hours 5-8)

Once the test notebook runs successfully:

1. **Hours 5-6**: Create `src/inference/data_fetcher.py` (AsOfDateFetcher class)
2. **Hours 7-8**: Create `src/inference/chronos_pipeline.py` (ChronosForecaster class)
3. **Smoke test**: Run inference on 1 border × 7 days

See the main implementation plan for details.

---

## Success Criteria

At the end of STEP 5, you should have:

- [x] HF Dataset repository created and populated (3 files)
- [x] HF Space created with A10G GPU ($30/month)
- [x] Space secrets configured (HF_TOKEN, ENTSOE_API_KEY)
- [x] Source code pushed to Space
- [x] Space builds successfully (~10-15 min)
- [x] JupyterLab accessible
- [x] GPU detected (NVIDIA A10G, 22.73 GB)
- [x] Dataset loads (17,544 × 2,553)
- [x] Metadata loads (2,553 features, 615 future covariates)
- [x] Chronos 2 loads successfully (~3 GB download first time)
- [x] Test notebook committed to Space

**Estimated time**: ~40 minutes active work + ~25 minutes waiting for builds

---

**Questions?** Check the HuggingFace Spaces documentation: https://huggingface.co/docs/hub/spaces
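---

## Appendix: Batching Sketch for Out-of-Memory Errors

A minimal sketch of the batching workaround from the Troubleshooting section; the helper name is hypothetical, not project code, and assumes `pipeline.predict` accepts a list of 1-D context tensors (as in `chronos-forecasting`):

```python
import torch

def predict_in_batches(pipeline, contexts, prediction_length=24, batch_size=10):
    """Forecast a list of 1-D context tensors in groups of `batch_size`."""
    forecasts = []
    for i in range(0, len(contexts), batch_size):
        batch = contexts[i:i + batch_size]
        # Each call returns [len(batch), num_samples, prediction_length]
        forecasts.append(pipeline.predict(batch, prediction_length))
        torch.cuda.empty_cache()  # release VRAM between batches
    return torch.cat(forecasts, dim=0)
```

For the 38 border series, this runs four batches of at most 10 series instead of one batch of 38, trading some throughput for a smaller peak VRAM footprint.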