
FBMC Flow Forecasting MVP - Day 0 Quick Start Guide

Environment Setup (45 Minutes)

Target: From zero to working local + HF Space environment with all dependencies verified


Prerequisites Check (5 minutes)

Before starting, verify you have:

# Check Git
git --version
# Need: 2.x+

# Check Python
python3 --version
# Need: 3.10+

API Keys & Accounts Ready:

  • ENTSO-E Transparency Platform API key
  • Hugging Face account with payment method for Spaces
  • Hugging Face write token (for uploading datasets)

Important Data Storage Philosophy:

  • Code → Git repository (small, version controlled)
  • Data → HuggingFace Datasets (separate, not in Git)
  • NO Git LFS needed (following data science best practices)

Step 1: Create Hugging Face Space (10 minutes)

  1. Navigate to: https://huggingface.co/new-space

  2. Configure Space:

    • Owner: Your username/organization
    • Space name: fbmc-forecasting (or your preference)
    • License: Apache 2.0
    • Select SDK: JupyterLab
    • Select Hardware: A10G GPU ($30/month) - CRITICAL
    • Visibility: Private (recommended for MVP)
  3. Click the "Create Space" button

  4. Wait 2-3 minutes for Space initialization

  5. Verify Space Access:

    • Visit: https://huggingface.co/spaces/YOUR_USERNAME/fbmc-forecasting
    • Confirm JupyterLab interface loads
    • Check hardware: Should show "A10G GPU" in bottom-right

Step 2: Local Environment Setup (25 minutes)

2.1 Clone HF Space Locally (2 minutes)

# Clone your HF Space
git clone https://huggingface.co/spaces/YOUR_USERNAME/fbmc-forecasting
cd fbmc-forecasting

# Verify remote
git remote -v
# Should show: https://huggingface.co/spaces/YOUR_USERNAME/fbmc-forecasting

2.2 Create Directory Structure (1 minute)

# Create project directories
mkdir -p notebooks \
         notebooks_exported \
         src/{data_collection,feature_engineering,model,utils} \
         config \
         results/{forecasts,evaluation,visualizations} \
         docs \
         tools \
         tests

# Note: data/ directory will be created by download scripts
# It is NOT tracked in Git (following best practices)

# Verify structure
tree -L 2

2.3 Install uv Package Manager (2 minutes)

# Install uv (ultra-fast pip replacement)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Add to PATH (if not automatic)
export PATH="$HOME/.cargo/bin:$PATH"

# Verify installation
uv --version
# Should show: uv 0.x.x

2.4 Create Virtual Environment (1 minute)

# Create .venv with uv
uv venv

# Activate (Linux/Mac)
source .venv/bin/activate

# Activate (Windows)
# .venv\Scripts\activate

# Verify activation
which python
# Should point to: /path/to/fbmc-forecasting/.venv/bin/python

2.5 Install Dependencies (2 minutes)

# Create requirements.txt
cat > requirements.txt << 'EOF'
# Core Data & ML
polars>=0.20.0
pyarrow>=13.0.0
numpy>=1.24.0
scikit-learn>=1.3.0

# Time Series Forecasting
chronos-forecasting>=1.0.0
transformers>=4.35.0
torch>=2.0.0

# Data Collection
entsoe-py>=0.5.0
jao-py>=0.6.0
requests>=2.31.0

# HuggingFace Integration (for Datasets, NOT Git LFS)
datasets>=2.14.0
huggingface-hub>=0.17.0

# Visualization & Notebooks
altair>=5.0.0
marimo>=0.9.0
jupyter>=1.0.0
ipykernel>=6.25.0

# Utilities
pyyaml>=6.0.0
python-dotenv>=1.0.0
tqdm>=4.66.0

# HF Space Integration
gradio>=4.0.0
EOF

# Install with uv (ultra-fast)
uv pip install -r requirements.txt

# Create lockfile for reproducibility
uv pip compile requirements.txt -o requirements.lock

Verify installations:

python -c "import polars; print(f'polars {polars.__version__}')"
python -c "import marimo; print(f'marimo {marimo.__version__}')"
python -c "import torch; print(f'torch {torch.__version__}')"
python -c "from chronos import ChronosPipeline; print('chronos-forecasting ✓')"
python -c "from datasets import Dataset; print('datasets ✓')"
python -c "from huggingface_hub import HfApi; print('huggingface-hub ✓')"
python -c "import jao; print(f'jao-py {jao.__version__}')"

2.6 Configure .gitignore (Data Exclusion) (2 minutes)

# Create .gitignore - CRITICAL for keeping data out of Git
cat > .gitignore << 'EOF'
# ============================================
# Data Files - NEVER commit to Git
# ============================================
# Following data science best practices:
# - Code goes in Git
# - Data goes in HuggingFace Datasets
data/
*.parquet
*.pkl
*.csv
*.h5
*.hdf5
*.feather

# ============================================
# Model Artifacts
# ============================================
models/checkpoints/
*.pth
*.safetensors
*.ckpt

# ============================================
# Credentials & Secrets
# ============================================
.env
config/api_keys.yaml
*.key
*.pem

# ============================================
# Python
# ============================================
__pycache__/
*.pyc
*.pyo
*.egg-info/
.pytest_cache/
.venv/
venv/

# ============================================
# IDE & OS
# ============================================
.vscode/
.idea/
*.swp
.DS_Store
Thumbs.db

# ============================================
# Jupyter
# ============================================
.ipynb_checkpoints/

# ============================================
# Temporary Files
# ============================================
*.tmp
*.log
.cache/
EOF

# Stage .gitignore
git add .gitignore

# Verify data/ will be ignored (data/ is already listed in .gitignore above)
git check-ignore data/test.parquet
# Should output: data/test.parquet (confirming it's ignored)

Why NO Git LFS? Following data science best practices:

  • Code → Git (fast, version controlled)
  • Data → HuggingFace Datasets (separate, scalable)
  • NOT Git LFS (expensive, non-standard for ML projects)

Data will be:

  • Downloaded via scripts (Day 1)
  • Uploaded to HF Datasets (Day 1)
  • Loaded programmatically (Days 2-5)
  • NEVER committed to Git repository
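For reference, "loaded programmatically" means pulling straight from HF Datasets into polars. A minimal sketch (assuming the dataset name from the Day 1 examples below and that the upload has already happened):

# Minimal sketch: load an FBMC dataset from HF Datasets into polars (Days 2-5 usage)
from datasets import load_dataset
import polars as pl

ds = load_dataset("YOUR_USERNAME/fbmc-cnecs-2023-2025", split="train")
cnecs = pl.from_pandas(ds.to_pandas())
print(cnecs.shape)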

2.7 Configure API Keys & HuggingFace Access (3 minutes)

# Create config directory structure
mkdir -p config

# Create API keys configuration
cat > config/api_keys.yaml << 'EOF'
# ENTSO-E Transparency Platform
entsoe_api_key: "YOUR_ENTSOE_API_KEY_HERE"

# OpenMeteo (free tier - no key required)
openmeteo_base_url: "https://api.open-meteo.com/v1/forecast"

# Hugging Face (for uploading datasets)
hf_token: "YOUR_HF_WRITE_TOKEN_HERE"
hf_username: "YOUR_HF_USERNAME"
EOF

# Create .env file for environment variables
cat > .env << 'EOF'
ENTSOE_API_KEY=YOUR_ENTSOE_API_KEY_HERE
OPENMETEO_BASE_URL=https://api.open-meteo.com/v1/forecast
HF_TOKEN=YOUR_HF_WRITE_TOKEN_HERE
HF_USERNAME=YOUR_HF_USERNAME
EOF

Get your HuggingFace Write Token:

  1. Visit: https://huggingface.co/settings/tokens
  2. Click "New token"
  3. Name: "FBMC Dataset Upload"
  4. Type: Write (required for uploading datasets)
  5. Copy token

Now edit the files with your actual credentials:

# Option 1: Use text editor
nano config/api_keys.yaml  # Update all YOUR_*_HERE placeholders
nano .env                  # Update all YOUR_*_HERE placeholders

# Option 2: Use sed (replace with your actual values)
# Note: on macOS/BSD sed, use `sed -i ''` (empty backup suffix) instead of `sed -i`
sed -i 's/YOUR_ENTSOE_API_KEY_HERE/your-actual-entsoe-key/' config/api_keys.yaml .env
sed -i 's/YOUR_HF_WRITE_TOKEN_HERE/hf_your-actual-token/' config/api_keys.yaml .env
sed -i 's/YOUR_HF_USERNAME/your-username/' config/api_keys.yaml .env

Verify credentials are set:

# Should NOT see any "YOUR_*_HERE" placeholders
grep "YOUR_" config/api_keys.yaml
# Empty output = good!
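
Since python-dotenv is already in requirements.txt, you can also check the .env side programmatically. A small sketch (assumes you run it from the project root):

# Minimal sketch: confirm .env loads and contains no leftover placeholders
python << 'EOF'
from dotenv import load_dotenv
import os

load_dotenv()  # reads .env from the current working directory
for var in ("ENTSOE_API_KEY", "HF_TOKEN", "HF_USERNAME"):
    value = os.getenv(var, "")
    status = "OK" if value and not value.startswith("YOUR_") else "NOT CONFIGURED"
    print(f"{var}: {status}")
EOF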

2.8 Create Data Management Utilities (5 minutes)

# Create data collection module with HF Datasets integration
cat > src/data_collection/hf_datasets_manager.py << 'EOF'
"""HuggingFace Datasets manager for FBMC data storage."""

import polars as pl
from datasets import Dataset, DatasetDict
from huggingface_hub import HfApi
from pathlib import Path
import yaml

class FBMCDatasetManager:
    """Manage FBMC data uploads/downloads via HuggingFace Datasets."""

    def __init__(self, config_path: str = "config/api_keys.yaml"):
        """Initialize with HF credentials."""
        with open(config_path) as f:
            config = yaml.safe_load(f)

        self.hf_token = config['hf_token']
        self.hf_username = config['hf_username']
        self.api = HfApi(token=self.hf_token)

    def upload_dataset(self, parquet_path: Path, dataset_name: str, description: str = ""):
        """Upload Parquet file to HuggingFace Datasets."""
        print(f"Uploading {parquet_path.name} to HF Datasets...")

        # Load Parquet as polars, convert to HF Dataset
        df = pl.read_parquet(parquet_path)
        dataset = Dataset.from_pandas(df.to_pandas())

        # Create full dataset name
        full_name = f"{self.hf_username}/{dataset_name}"

        # Upload to HF
        dataset.push_to_hub(
            full_name,
            token=self.hf_token,
            private=False  # Public datasets (free storage)
        )

        print(f"✓ Uploaded to: https://huggingface.co/datasets/{full_name}")
        return full_name

    def download_dataset(self, dataset_name: str, output_path: Path):
        """Download dataset from HF to local Parquet."""
        from datasets import load_dataset

        print(f"Downloading {dataset_name} from HF Datasets...")

        # Download from HF
        dataset = load_dataset(
            f"{self.hf_username}/{dataset_name}",
            split="train"
        )

        # Convert to polars and save
        df = pl.from_pandas(dataset.to_pandas())
        output_path.parent.mkdir(parents=True, exist_ok=True)
        df.write_parquet(output_path)

        print(f"✓ Downloaded to: {output_path}")
        return df

    def list_datasets(self):
        """List all FBMC datasets for this user."""
        datasets = self.api.list_datasets(author=self.hf_username)
        fbmc_datasets = [d for d in datasets if 'fbmc' in d.id.lower()]

        print(f"\nFBMC Datasets for {self.hf_username}:")
        for ds in fbmc_datasets:
            print(f"  - {ds.id}")

        return fbmc_datasets

# Example usage (will be used in Day 1)
if __name__ == "__main__":
    manager = FBMCDatasetManager()

    # Upload example (Day 1 will use this)
    # manager.upload_dataset(
    #     parquet_path=Path("data/raw/cnecs_2023_2025.parquet"),
    #     dataset_name="fbmc-cnecs-2023-2025",
    #     description="FBMC CNECs data: Oct 2023 - Sept 2025"
    # )

    # Download example (HF Space will use this)
    # manager.download_dataset(
    #     dataset_name="fbmc-cnecs-2023-2025",
    #     output_path=Path("data/raw/cnecs_2023_2025.parquet")
    # )
EOF

# Create data download orchestrator
cat > src/data_collection/download_all.py << 'EOF'
"""Download all FBMC data from HuggingFace Datasets."""

from pathlib import Path

try:
    # Package import (e.g. `from src.data_collection.download_all import setup_data`)
    from .hf_datasets_manager import FBMCDatasetManager
except ImportError:
    # Fallback for running this file directly from within src/data_collection/
    from hf_datasets_manager import FBMCDatasetManager

def setup_data(data_dir: Path = Path("data/raw")):
    """Download all datasets if not present locally."""
    manager = FBMCDatasetManager()

    datasets_to_download = {
        "fbmc-cnecs-2023-2025": "cnecs_2023_2025.parquet",
        "fbmc-weather-2023-2025": "weather_2023_2025.parquet",
        "fbmc-entsoe-2023-2025": "entsoe_2023_2025.parquet",
    }

    data_dir.mkdir(parents=True, exist_ok=True)

    for dataset_name, filename in datasets_to_download.items():
        output_path = data_dir / filename

        if output_path.exists():
            print(f"✓ {filename} already exists, skipping")
        else:
            try:
                manager.download_dataset(dataset_name, output_path)
            except Exception as e:
                print(f"✗ Failed to download {dataset_name}: {e}")
                print(f"  You may need to run Day 1 data collection first")

    print("\n✓ Data setup complete")

if __name__ == "__main__":
    setup_data()
EOF

# Make scripts executable
chmod +x src/data_collection/hf_datasets_manager.py
chmod +x src/data_collection/download_all.py

echo "✓ Data management utilities created"

What This Does:

  • hf_datasets_manager.py: Upload/download Parquet files to/from HF Datasets
  • download_all.py: One-command data setup for HF Space or analysts

Day 1 Workflow:

  1. Download data from JAO/ENTSO-E/OpenMeteo to data/raw/
  2. Upload each Parquet to HF Datasets (separate from Git)
  3. Git repo stays small (only code)
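
For example, step 2 of that workflow uses the manager from section 2.8. A sketch of one upload call (path and dataset name mirror the commented examples in hf_datasets_manager.py; run only after data/raw/ is populated on Day 1):

# Sketch of the Day 1 upload step
from pathlib import Path
from src.data_collection.hf_datasets_manager import FBMCDatasetManager

manager = FBMCDatasetManager()
manager.upload_dataset(
    parquet_path=Path("data/raw/cnecs_2023_2025.parquet"),
    dataset_name="fbmc-cnecs-2023-2025",
    description="FBMC CNECs data: Oct 2023 - Sept 2025",
)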

HF Space Workflow:

# In your Space's app.py startup:
from src.data_collection.download_all import setup_data
setup_data()  # Downloads from HF Datasets, not Git

2.9 Create First Marimo Notebook (5 minutes)

# Create initial exploration notebook
cat > notebooks/01_data_exploration.py << 'EOF'
import marimo

__generated_with = "0.9.0"
app = marimo.App(width="medium")

@app.cell
def __():
    import marimo as mo
    import polars as pl
    import altair as alt
    from pathlib import Path
    return mo, pl, alt, Path

@app.cell
def __(mo):
    mo.md(
        """
        # FBMC Flow Forecasting - Data Exploration

        **Day 1 Objective**: Explore JAO FBMC data structure

        ## Steps:
        1. Load downloaded Parquet files
        2. Inspect CNECs, PTDFs, RAMs
        3. Identify top 200 binding CNECs (50 Tier-1 + 150 Tier-2)
        4. Visualize temporal patterns
        """
    )
    return

@app.cell
def __(Path):
    # Data paths
    DATA_DIR = Path("../data/raw")
    CNECS_FILE = DATA_DIR / "cnecs_2023_2025.parquet"
    return DATA_DIR, CNECS_FILE

@app.cell
def __(mo, CNECS_FILE):
    # Check if data exists; the last expression in a marimo cell is rendered as its output
    if CNECS_FILE.exists():
        _status = mo.md("✓ CNECs data found - ready for Day 1 analysis")
    else:
        _status = mo.md("⚠ CNECs data not yet downloaded - run Day 1 collection script")
    _status
    return

if __name__ == "__main__":
    app.run()
EOF

# Test Marimo installation
marimo edit notebooks/01_data_exploration.py
# This will open the browser with the interactive notebook
# Close the tab after verifying it loads, then stop the server with Ctrl+C in the terminal

2.10 Create Utility Modules (2 minutes)

# Create data loading utilities
cat > src/utils/data_loader.py << 'EOF'
"""Data loading utilities for FBMC forecasting project."""

import polars as pl
from pathlib import Path
from typing import Optional

def load_cnecs(data_dir: Path, start_date: Optional[str] = None, end_date: Optional[str] = None) -> pl.DataFrame:
    """Load CNEC data with optional date filtering."""
    cnecs = pl.read_parquet(data_dir / "cnecs_2023_2025.parquet")

    if start_date:
        cnecs = cnecs.filter(pl.col("timestamp") >= start_date)
    if end_date:
        cnecs = cnecs.filter(pl.col("timestamp") <= end_date)

    return cnecs

def load_weather(data_dir: Path, grid_points: Optional[list] = None) -> pl.DataFrame:
    """Load weather data with optional grid point filtering."""
    weather = pl.read_parquet(data_dir / "weather_2023_2025.parquet")

    if grid_points:
        weather = weather.filter(pl.col("grid_point").is_in(grid_points))

    return weather
EOF

# Create __init__.py files
touch src/__init__.py
touch src/utils/__init__.py
touch src/data_collection/__init__.py
touch src/feature_engineering/__init__.py
touch src/model/__init__.py
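
Once Day 1 data exists in data/raw/, the loader can be smoke-tested from the project root. A minimal sketch (assumes the CNEC Parquet is already downloaded and that its timestamp column compares against ISO date strings as written in load_cnecs):

# Quick smoke test for the data loader (requires Day 1 data in data/raw/)
from pathlib import Path
from src.utils.data_loader import load_cnecs

cnecs_jan = load_cnecs(Path("data/raw"), start_date="2024-01-01", end_date="2024-01-31")
print(cnecs_jan.shape)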

2.11 Initial Commit (2 minutes)

# Stage all changes (note: data/ is excluded by .gitignore)
git add .

# Create initial commit
git commit -m "Day 0: Initialize FBMC forecasting MVP environment

- Add project structure (notebooks, src, config, tools)
- Configure uv + polars + Marimo + Chronos + HF Datasets stack
- Create .gitignore (excludes data/ following best practices)
- Install jao-py Python library for JAO data access
- Configure ENTSO-E, OpenMeteo, and HuggingFace API access
- Add HF Datasets manager for data storage (separate from Git)
- Create data download utilities (download_all.py)
- Create initial exploration notebook

Data Strategy:
- Code → Git (this repo)
- Data → HuggingFace Datasets (separate, not in Git)
- NO Git LFS (following data science best practices)

Infrastructure: HF Space (A10G GPU, \$30/month)"

# Push to HF Space
git push origin main

# Verify push succeeded
git status
# Should show: "Your branch is up to date with 'origin/main'"

# Verify no data files were committed
git ls-files | grep "\.parquet"
# Should be empty (no .parquet files in Git)

Step 3: Verify Complete Setup (5 minutes)

3.1 Python Environment Verification

# Activate environment if not already
source .venv/bin/activate

# Run comprehensive checks
python << 'EOF'
import sys
print(f"Python: {sys.version}")

packages = [
    "polars", "pyarrow", "numpy", "scikit-learn",
    "torch", "transformers", "marimo", "altair",
    "entsoe", "jao", "requests", "yaml", "gradio",
    "datasets", "huggingface_hub"
]

print("\nPackage Versions:")
for pkg in packages:
    try:
        if pkg == "entsoe":
            import entsoe
            print(f"✓ entsoe-py: {entsoe.__version__}")
        elif pkg == "jao":
            import jao
            print(f"✓ jao-py: {jao.__version__}")
        elif pkg == "yaml":
            import yaml
            print(f"✓ pyyaml: {yaml.__version__}")
        elif pkg == "huggingface_hub":
            from huggingface_hub import HfApi
            print(f"✓ huggingface-hub: Ready")
        else:
            mod = __import__(pkg)
            print(f"✓ {pkg}: {mod.__version__}")
    except Exception as e:
        print(f"✗ {pkg}: {e}")

# Test Chronos specifically
try:
    from chronos import ChronosPipeline
    print("\n✓ Chronos forecasting: Ready")
except Exception as e:
    print(f"\n✗ Chronos forecasting: {e}")

# Test HF Datasets
try:
    from datasets import Dataset
    print("✓ HuggingFace Datasets: Ready")
except Exception as e:
    print(f"✗ HuggingFace Datasets: {e}")

print("\nAll checks complete!")
EOF

3.2 API Access Verification

# Test ENTSO-E API
python << 'EOF'
from entsoe import EntsoePandasClient
import yaml

# Load API key
with open('config/api_keys.yaml') as f:
    config = yaml.safe_load(f)

api_key = config['entsoe_api_key']

if 'YOUR_ENTSOE_API_KEY_HERE' in api_key:
    print("⚠ ENTSO-E API key not configured - update config/api_keys.yaml")
else:
    try:
        client = EntsoePandasClient(api_key=api_key)
        print("✓ ENTSO-E API client initialized successfully")
    except Exception as e:
        print(f"✗ ENTSO-E API error: {e}")
EOF

# Test OpenMeteo API
python << 'EOF'
import requests

response = requests.get(
    "https://api.open-meteo.com/v1/forecast",
    params={
        "latitude": 52.52,
        "longitude": 13.41,
        "hourly": "temperature_2m",
        "start_date": "2025-01-01",
        "end_date": "2025-01-02"
    }
)

if response.status_code == 200:
    print("✓ OpenMeteo API accessible")
else:
    print(f"✗ OpenMeteo API error: {response.status_code}")
EOF

# Test HuggingFace authentication
python << 'EOF'
from huggingface_hub import HfApi
import yaml

with open('config/api_keys.yaml') as f:
    config = yaml.safe_load(f)

hf_token = config['hf_token']
hf_username = config['hf_username']

if 'YOUR_HF' in hf_token or 'YOUR_HF' in hf_username:
    print("⚠ HuggingFace credentials not configured - update config/api_keys.yaml")
else:
    try:
        api = HfApi(token=hf_token)
        user_info = api.whoami()
        print(f"✓ HuggingFace authenticated as: {user_info['name']}")
        print(f"  Can create datasets: {'datasets' in user_info.get('auth', {}).get('accessToken', {}).get('role', '')}")
    except Exception as e:
        print(f"✗ HuggingFace authentication error: {e}")
        print(f"  Verify token has WRITE permissions")
EOF

3.3 HF Space Verification

# Check HF Space status
echo "Visit your HF Space: https://huggingface.co/spaces/YOUR_USERNAME/fbmc-forecasting"
echo ""
echo "Verify:"
echo "  1. JupyterLab interface loads"
echo "  2. Hardware shows 'A10G GPU' in bottom-right"
echo "  3. Files from git push are visible"
echo "  4. Can create new notebook"

3.4 Final Checklist

# Print final status
cat << 'EOF'
╔═══════════════════════════════════════════════════════════╗
║           DAY 0 SETUP VERIFICATION CHECKLIST               ║
╚═══════════════════════════════════════════════════════════╝

Environment:
  [ ] Python 3.10+ installed
  [ ] Git installed (NO Git LFS needed)
  [ ] uv package manager installed

Local Setup:
  [ ] Virtual environment created and activated
  [ ] All Python dependencies installed (20 packages including jao-py)
  [ ] API keys configured (ENTSO-E + OpenMeteo + HuggingFace)
  [ ] HuggingFace write token obtained
  [ ] Project structure created (8 directories)
  [ ] .gitignore configured (data/ excluded)
  [ ] Initial Marimo notebook created
  [ ] Data management utilities created (hf_datasets_manager.py)

Git & HF Space:
  [ ] HF Space created (A10G GPU, $30/month)
  [ ] Repository cloned locally
  [ ] .gitignore excludes all data files (*.parquet, data/)
  [ ] Initial commit pushed to HF Space (code only, NO data)
  [ ] HF Space JupyterLab accessible
  [ ] Git repo size < 50 MB (no data committed)

Verification Tests:
  [ ] Python imports successful (polars, chronos, jao-py, datasets, etc.)
  [ ] ENTSO-E API client initializes
  [ ] OpenMeteo API responds (status 200)
  [ ] HuggingFace authentication successful (write access)
  [ ] Marimo notebook opens in browser

Data Strategy Confirmed:
  [ ] Code goes in Git (version controlled)
  [ ] Data goes in HuggingFace Datasets (separate storage)
  [ ] NO Git LFS setup (following data science best practices)
  [ ] data/ directory in .gitignore

Ready for Day 1: [ ]

Next Step: Run Day 1 data collection (8 hours)
- Download data locally via jao-py/APIs
- Upload to HuggingFace Datasets (separate from Git)
- Total data: ~12 GB (stored in HF Datasets, NOT Git)
EOF

Troubleshooting

Issue: uv installation fails

# Alternative: Use pip directly
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Issue: Git LFS files not syncing

Not applicable - We're using HuggingFace Datasets, not Git LFS.

If you see Git LFS references, you may have an old version of this guide. Data files should NEVER be in Git.

Issue: HuggingFace authentication fails

# Verify token is correct
python << 'EOF'
from huggingface_hub import HfApi
import yaml

with open('config/api_keys.yaml') as f:
    config = yaml.safe_load(f)

try:
    api = HfApi(token=config['hf_token'])
    print(api.whoami())
except Exception as e:
    print(f"Error: {e}")
    print("\nTroubleshooting:")
    print("1. Visit: https://huggingface.co/settings/tokens")
    print("2. Verify token has WRITE permission")
    print("3. Copy token exactly (starts with 'hf_')")
    print("4. Update config/api_keys.yaml and .env")
EOF

Issue: Cannot upload to HuggingFace Datasets

# Common causes:
# 1. Token doesn't have write permissions
#    Fix: Create new token with "write" scope

# 2. Dataset name already exists
#    Fix: Use different name or add version suffix
#    Example: fbmc-cnecs-2023-2025-v2

# 3. File too large (>5GB single file limit)
#    Fix: Split into multiple datasets or use sharding

# Test upload with small sample:
python << 'EOF'
from datasets import Dataset
import pandas as pd

# Create tiny test dataset
df = pd.DataFrame({"col1": [1, 2, 3], "col2": ["a", "b", "c"]})
dataset = Dataset.from_pandas(df)

# Try uploading
try:
    dataset.push_to_hub("YOUR_USERNAME/test-dataset", token="YOUR_TOKEN")
    print("✓ Upload successful - authentication works")
except Exception as e:
    print(f"✗ Upload failed: {e}")
EOF

Issue: Marimo notebook won't open

# Check marimo installation
marimo --version

# Try running without opening browser
marimo run notebooks/01_data_exploration.py

# Check for port conflicts
lsof -i :2718  # Default Marimo port

Issue: ENTSO-E API key invalid

# Verify key in ENTSO-E Transparency Platform:
# 1. Login: https://transparency.entsoe.eu/
# 2. Navigate: Account Settings → Web API Security Token
# 3. Copy key exactly (no spaces)
# 4. Update: config/api_keys.yaml and .env

Issue: HF Space shows "Building..." forever

# Check HF Space logs:
# Visit: https://huggingface.co/spaces/YOUR_USERNAME/fbmc-forecasting
# Click: "Settings" → "Logs"

# Common fix: Ensure requirements.txt is valid
# Test locally:
pip install -r requirements.txt --dry-run

Issue: jao-py import fails

# Verify jao-py installation
python -c "import jao; print(jao.__version__)"

# If missing, reinstall
uv pip install "jao-py>=0.6.0"

# Check package is in environment
uv pip list | grep jao

What's Next: Day 1 Preview

Day 1 Objective: Download 24 months of historical data (Oct 2023 - Sept 2025)

Data Collection Tasks:

  1. JAO FBMC Data (4-5 hours)

    • CNECs: ~900 MB (24 months)
    • PTDFs: ~1.5 GB (24 months)
    • RAMs: ~800 MB (24 months)
    • Shadow prices: ~600 MB (24 months)
    • LTN nominations: ~400 MB (24 months)
    • Net positions: ~300 MB (24 months)
  2. ENTSO-E Data (2-3 hours)

    • Generation forecasts: 13 zones × 24 months
    • Actual generation: 13 zones × 24 months
    • Cross-border flows: ~20 borders × 24 months
  3. OpenMeteo Weather (1-2 hours)

    • 52 grid points × 24 months
    • 8 variables per point
    • Parallel download optimization

Total Data Size: ~12 GB (compressed Parquet)

Day 1 Script: The download script will use the jao-py Python library with rate limiting and parallel download logic (see the sketch below).
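
A minimal sketch of that pattern; fetch_month is a hypothetical placeholder standing in for the actual jao-py / entsoe-py / OpenMeteo calls, with a crude per-request delay and a few parallel workers:

# Sketch: rate-limited, parallel monthly downloads (fetch_month is a hypothetical placeholder)
import time
from concurrent.futures import ThreadPoolExecutor

# Oct 2023 - Sept 2025 = 24 monthly chunks
MONTHS = ([f"2023-{m:02d}" for m in range(10, 13)]
          + [f"2024-{m:02d}" for m in range(1, 13)]
          + [f"2025-{m:02d}" for m in range(1, 10)])

def fetch_month(month: str) -> str:
    """Placeholder for one month of API requests (jao-py / entsoe-py / OpenMeteo)."""
    time.sleep(1)  # crude rate limiting between requests
    return f"downloaded {month}"

with ThreadPoolExecutor(max_workers=4) as pool:
    for result in pool.map(fetch_month, MONTHS):
        print(result)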


Summary

Time Investment: 45 minutes
Result: Production-ready local + cloud development environment

You Now Have:

  • ✓ HF Space with A10G GPU ($30/month)
  • ✓ Local Python environment (20 packages including jao-py and HF Datasets)
  • ✓ jao-py Python library for JAO data access
  • ✓ ENTSO-E + OpenMeteo + HuggingFace API access configured
  • ✓ HuggingFace Datasets manager for data storage (separate from Git)
  • ✓ Data download/upload utilities (hf_datasets_manager.py)
  • ✓ Marimo reactive notebook environment
  • ✓ .gitignore configured (data/ excluded, following best practices)
  • ✓ Complete project structure (8 directories)

Data Strategy Implemented:

Code (version controlled)     →  Git Repository (~50 MB)
Data (storage & versioning)   →  HuggingFace Datasets (~12 GB)
NO Git LFS (following data science best practices)

Ready For: Day 1 data collection (8 hours)

  • Download 24 months data locally (jao-py + APIs)
  • Upload to HuggingFace Datasets (not Git)
  • Git repo stays clean (code only)

Document Version: 2.0
Last Updated: 2025-10-29
Project: FBMC Flow Forecasting MVP (Zero-Shot)