FBMC Flow Forecasting MVP - Day 0 Quick Start Guide
Environment Setup (45 Minutes)
Target: From zero to working local + HF Space environment with all dependencies verified
Prerequisites Check (5 minutes)
Before starting, verify you have:
# Check Git
git --version
# Need: 2.x+
# Check Python
python3 --version
# Need: 3.10+
API Keys & Accounts Ready:
- ENTSO-E Transparency Platform API key
- Hugging Face account with payment method for Spaces
- Hugging Face write token (for uploading datasets)
Important Data Storage Philosophy:
- Code → Git repository (small, version controlled)
- Data → HuggingFace Datasets (separate, not in Git)
- NO Git LFS needed (following data science best practices)
Step 1: Create Hugging Face Space (10 minutes)
Navigate to: https://huggingface.co/new-space
Configure Space:
- Owner: Your username/organization
- Space name: fbmc-forecasting (or your preference)
- License: Apache 2.0
- Select SDK: JupyterLab
- Select Hardware: A10G GPU ($30/month) ← CRITICAL
- Visibility: Private (recommended for MVP)
Click the Create Space button.
Wait 2-3 minutes for Space initialization.
Verify Space Access:
- Visit: https://huggingface.co/spaces/YOUR_USERNAME/fbmc-forecasting
- Confirm JupyterLab interface loads
- Check hardware: Should show "A10G GPU" in bottom-right
Step 2: Local Environment Setup (25 minutes)
2.1 Clone HF Space Locally (2 minutes)
# Clone your HF Space
git clone https://huggingface.co/spaces/YOUR_USERNAME/fbmc-forecasting
cd fbmc-forecasting
# Verify remote
git remote -v
# Should show: https://huggingface.co/spaces/YOUR_USERNAME/fbmc-forecasting
2.2 Create Directory Structure (1 minute)
# Create project directories
mkdir -p notebooks \
notebooks_exported \
src/{data_collection,feature_engineering,model,utils} \
config \
results/{forecasts,evaluation,visualizations} \
docs \
tools \
tests
# Note: data/ directory will be created by download scripts
# It is NOT tracked in Git (following best practices)
# Verify structure
tree -L 2
2.3 Install uv Package Manager (2 minutes)
# Install uv (ultra-fast pip replacement)
curl -LsSf https://astral.sh/uv/install.sh | sh
# Add to PATH (if not automatic; recent uv versions install to ~/.local/bin, older ones to ~/.cargo/bin)
export PATH="$HOME/.local/bin:$HOME/.cargo/bin:$PATH"
# Verify installation
uv --version
# Should show: uv 0.x.x
2.4 Create Virtual Environment (1 minute)
# Create .venv with uv
uv venv
# Activate (Linux/Mac)
source .venv/bin/activate
# Activate (Windows)
# .venv\Scripts\activate
# Verify activation
which python
# Should point to: /path/to/fbmc-forecasting/.venv/bin/python
2.5 Install Dependencies (2 minutes)
# Create requirements.txt
cat > requirements.txt << 'EOF'
# Core Data & ML
polars>=0.20.0
pyarrow>=13.0.0
numpy>=1.24.0
scikit-learn>=1.3.0
# Time Series Forecasting
chronos-forecasting>=1.0.0
transformers>=4.35.0
torch>=2.0.0
# Data Collection
entsoe-py>=0.5.0
jao-py>=0.6.0
requests>=2.31.0
# HuggingFace Integration (for Datasets, NOT Git LFS)
datasets>=2.14.0
huggingface-hub>=0.17.0
# Visualization & Notebooks
altair>=5.0.0
marimo>=0.9.0
jupyter>=1.0.0
ipykernel>=6.25.0
# Utilities
pyyaml>=6.0.0
python-dotenv>=1.0.0
tqdm>=4.66.0
# HF Space Integration
gradio>=4.0.0
EOF
# Install with uv (ultra-fast)
uv pip install -r requirements.txt
# Create lockfile for reproducibility
uv pip compile requirements.txt -o requirements.lock
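If the environment ever needs to be rebuilt exactly, the lockfile can be applied with uv's sync command (a minimal example; it installs exactly the pinned versions and removes anything not listed):
# Reproduce the environment from the lockfile
uv pip sync requirements.lock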
Verify installations:
python -c "import polars; print(f'polars {polars.__version__}')"
python -c "import marimo; print(f'marimo {marimo.__version__}')"
python -c "import torch; print(f'torch {torch.__version__}')"
python -c "from chronos import ChronosPipeline; print('chronos-forecasting ✓')"
python -c "from datasets import Dataset; print('datasets ✓')"
python -c "from huggingface_hub import HfApi; print('huggingface-hub ✓')"
python -c "import jao; print(f'jao-py {jao.__version__}')"
2.6 Configure .gitignore (Data Exclusion) (2 minutes)
# Create .gitignore - CRITICAL for keeping data out of Git
cat > .gitignore << 'EOF'
# ============================================
# Data Files - NEVER commit to Git
# ============================================
# Following data science best practices:
# - Code goes in Git
# - Data goes in HuggingFace Datasets
data/
*.parquet
*.pkl
*.csv
*.h5
*.hdf5
*.feather
# ============================================
# Model Artifacts
# ============================================
models/checkpoints/
*.pth
*.safetensors
*.ckpt
# ============================================
# Credentials & Secrets
# ============================================
.env
config/api_keys.yaml
*.key
*.pem
# ============================================
# Python
# ============================================
__pycache__/
*.pyc
*.pyo
*.egg-info/
.pytest_cache/
.venv/
venv/
# ============================================
# IDE & OS
# ============================================
.vscode/
.idea/
*.swp
.DS_Store
Thumbs.db
# ============================================
# Jupyter
# ============================================
.ipynb_checkpoints/
# ============================================
# Temporary Files
# ============================================
*.tmp
*.log
.cache/
EOF
# Stage .gitignore
git add .gitignore
# Verify data/ will be ignored (data/ is already listed in .gitignore above)
git check-ignore data/test.parquet
# Should output: data/test.parquet (confirming it's ignored)
Why NO Git LFS? Following data science best practices:
- ✓ Code → Git (fast, version controlled)
- ✓ Data → HuggingFace Datasets (separate, scalable)
- ✗ NOT Git LFS (expensive, non-standard for ML projects)
Data will be:
- Downloaded via scripts (Day 1)
- Uploaded to HF Datasets (Day 1)
- Loaded programmatically (Days 2-5)
- NEVER committed to Git repository
2.7 Configure API Keys & HuggingFace Access (3 minutes)
# Create config directory structure
mkdir -p config
# Create API keys configuration
cat > config/api_keys.yaml << 'EOF'
# ENTSO-E Transparency Platform
entsoe_api_key: "YOUR_ENTSOE_API_KEY_HERE"
# OpenMeteo (free tier - no key required)
openmeteo_base_url: "https://api.open-meteo.com/v1/forecast"
# Hugging Face (for uploading datasets)
hf_token: "YOUR_HF_WRITE_TOKEN_HERE"
hf_username: "YOUR_HF_USERNAME"
EOF
# Create .env file for environment variables
cat > .env << 'EOF'
ENTSOE_API_KEY=YOUR_ENTSOE_API_KEY_HERE
OPENMETEO_BASE_URL=https://api.open-meteo.com/v1/forecast
HF_TOKEN=YOUR_HF_WRITE_TOKEN_HERE
HF_USERNAME=YOUR_HF_USERNAME
EOF
Get your HuggingFace Write Token:
- Visit: https://huggingface.co/settings/tokens
- Click "New token"
- Name: "FBMC Dataset Upload"
- Type: Write (required for uploading datasets)
- Copy token
Now edit the files with your actual credentials:
# Option 1: Use text editor
nano config/api_keys.yaml # Update all YOUR_*_HERE placeholders
nano .env # Update all YOUR_*_HERE placeholders
# Option 2: Use sed (replace with your actual values)
# Note: on macOS/BSD sed, use `sed -i ''` instead of `sed -i`
sed -i 's/YOUR_ENTSOE_API_KEY_HERE/your-actual-entsoe-key/' config/api_keys.yaml .env
sed -i 's/YOUR_HF_WRITE_TOKEN_HERE/hf_your-actual-token/' config/api_keys.yaml .env
sed -i 's/YOUR_HF_USERNAME/your-username/' config/api_keys.yaml .env
Verify credentials are set:
# Should NOT see any "YOUR_*_HERE" placeholders
grep "YOUR_" config/api_keys.yaml
# Empty output = good!
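Since python-dotenv is installed, code that prefers environment variables can read the .env written above instead of the YAML file; a minimal sketch (variable names match this step's .env, the check logic is illustrative):
# Minimal sketch: load credentials from .env using python-dotenv
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory

entsoe_key = os.getenv("ENTSOE_API_KEY", "")
hf_token = os.getenv("HF_TOKEN", "")
print("ENTSO-E key configured:", bool(entsoe_key) and "YOUR_" not in entsoe_key)
print("HF token configured:", bool(hf_token) and "YOUR_" not in hf_token)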
2.8 Create Data Management Utilities (5 minutes)
# Create data collection module with HF Datasets integration
cat > src/data_collection/hf_datasets_manager.py << 'EOF'
"""HuggingFace Datasets manager for FBMC data storage."""
import polars as pl
from datasets import Dataset, DatasetDict
from huggingface_hub import HfApi
from pathlib import Path
import yaml
class FBMCDatasetManager:
"""Manage FBMC data uploads/downloads via HuggingFace Datasets."""
def __init__(self, config_path: str = "config/api_keys.yaml"):
"""Initialize with HF credentials."""
with open(config_path) as f:
config = yaml.safe_load(f)
self.hf_token = config['hf_token']
self.hf_username = config['hf_username']
self.api = HfApi(token=self.hf_token)
def upload_dataset(self, parquet_path: Path, dataset_name: str, description: str = ""):
"""Upload Parquet file to HuggingFace Datasets."""
print(f"Uploading {parquet_path.name} to HF Datasets...")
# Load Parquet as polars, convert to HF Dataset
df = pl.read_parquet(parquet_path)
dataset = Dataset.from_pandas(df.to_pandas())
# Create full dataset name
full_name = f"{self.hf_username}/{dataset_name}"
# Upload to HF
dataset.push_to_hub(
full_name,
token=self.hf_token,
private=False # Public datasets (free storage)
)
print(f"✓ Uploaded to: https://huggingface.co/datasets/{full_name}")
return full_name
def download_dataset(self, dataset_name: str, output_path: Path):
"""Download dataset from HF to local Parquet."""
from datasets import load_dataset
print(f"Downloading {dataset_name} from HF Datasets...")
# Download from HF
dataset = load_dataset(
f"{self.hf_username}/{dataset_name}",
split="train"
)
# Convert to polars and save
df = pl.from_pandas(dataset.to_pandas())
output_path.parent.mkdir(parents=True, exist_ok=True)
df.write_parquet(output_path)
print(f"✓ Downloaded to: {output_path}")
return df
def list_datasets(self):
"""List all FBMC datasets for this user."""
datasets = self.api.list_datasets(author=self.hf_username)
fbmc_datasets = [d for d in datasets if 'fbmc' in d.id.lower()]
print(f"\nFBMC Datasets for {self.hf_username}:")
for ds in fbmc_datasets:
print(f" - {ds.id}")
return fbmc_datasets
# Example usage (will be used in Day 1)
if __name__ == "__main__":
manager = FBMCDatasetManager()
# Upload example (Day 1 will use this)
# manager.upload_dataset(
# parquet_path=Path("data/raw/cnecs_2023_2025.parquet"),
# dataset_name="fbmc-cnecs-2023-2025",
# description="FBMC CNECs data: Oct 2023 - Sept 2025"
# )
# Download example (HF Space will use this)
# manager.download_dataset(
# dataset_name="fbmc-cnecs-2023-2025",
# output_path=Path("data/raw/cnecs_2023_2025.parquet")
# )
EOF
# Create data download orchestrator
cat > src/data_collection/download_all.py << 'EOF'
"""Download all FBMC data from HuggingFace Datasets."""
from pathlib import Path
try:
    from .hf_datasets_manager import FBMCDatasetManager  # imported as src.data_collection.download_all
except ImportError:
    from hf_datasets_manager import FBMCDatasetManager  # run directly from this directory
def setup_data(data_dir: Path = Path("data/raw")):
"""Download all datasets if not present locally."""
manager = FBMCDatasetManager()
datasets_to_download = {
"fbmc-cnecs-2023-2025": "cnecs_2023_2025.parquet",
"fbmc-weather-2023-2025": "weather_2023_2025.parquet",
"fbmc-entsoe-2023-2025": "entsoe_2023_2025.parquet",
}
data_dir.mkdir(parents=True, exist_ok=True)
for dataset_name, filename in datasets_to_download.items():
output_path = data_dir / filename
if output_path.exists():
print(f"✓ {filename} already exists, skipping")
else:
try:
manager.download_dataset(dataset_name, output_path)
except Exception as e:
print(f"✗ Failed to download {dataset_name}: {e}")
print(f" You may need to run Day 1 data collection first")
print("\n✓ Data setup complete")
if __name__ == "__main__":
setup_data()
EOF
# Make scripts executable
chmod +x src/data_collection/hf_datasets_manager.py
chmod +x src/data_collection/download_all.py
echo "✓ Data management utilities created"
What This Does:
- hf_datasets_manager.py: Upload/download Parquet files to/from HF Datasets
- download_all.py: One-command data setup for HF Space or analysts
Day 1 Workflow (see the upload sketch below):
- Download data from JAO/ENTSO-E/OpenMeteo to data/raw/
- Upload each Parquet to HF Datasets (separate from Git)
- Git repo stays small (only code)
HF Space Workflow:
# In your Space's app.py startup:
from src.data_collection.download_all import setup_data
setup_data() # Downloads from HF Datasets, not Git
2.9 Create First Marimo Notebook (5 minutes)
# Create initial exploration notebook
cat > notebooks/01_data_exploration.py << 'EOF'
import marimo
__generated_with = "0.9.0"
app = marimo.App(width="medium")
@app.cell
def __():
import marimo as mo
import polars as pl
import altair as alt
from pathlib import Path
return mo, pl, alt, Path
@app.cell
def __(mo):
mo.md(
"""
# FBMC Flow Forecasting - Data Exploration
**Day 1 Objective**: Explore JAO FBMC data structure
## Steps:
1. Load downloaded Parquet files
2. Inspect CNECs, PTDFs, RAMs
3. Identify top 200 binding CNECs (50 Tier-1 + 150 Tier-2)
4. Visualize temporal patterns
"""
)
return
@app.cell
def __(Path):
# Data paths
DATA_DIR = Path("../data/raw")
CNECS_FILE = DATA_DIR / "cnecs_2023_2025.parquet"
return DATA_DIR, CNECS_FILE
@app.cell
def __(mo, CNECS_FILE):
    # Check if data exists; the cell's last expression is what marimo renders
    status = (
        mo.md("✓ CNECs data found - ready for Day 1 analysis")
        if CNECS_FILE.exists()
        else mo.md("⚠ CNECs data not yet downloaded - run Day 1 collection script")
    )
    status
    return
if __name__ == "__main__":
app.run()
EOF
# Test Marimo installation (runs in the foreground)
marimo edit notebooks/01_data_exploration.py
# This opens the interactive notebook in your browser
# After verifying it loads correctly, stop the server with Ctrl+C in the terminal
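The notebooks_exported/ directory created in step 2.2 is a natural home for static copies; one way to produce them is marimo's HTML export (the output path here is just a suggestion):
# Export a static HTML copy of the notebook (for sharing/review)
marimo export html notebooks/01_data_exploration.py -o notebooks_exported/01_data_exploration.html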
2.10 Create Utility Modules (2 minutes)
# Create data loading utilities
cat > src/utils/data_loader.py << 'EOF'
"""Data loading utilities for FBMC forecasting project."""
import polars as pl
from pathlib import Path
from typing import Optional
def load_cnecs(data_dir: Path, start_date: Optional[str] = None, end_date: Optional[str] = None) -> pl.DataFrame:
"""Load CNEC data with optional date filtering."""
cnecs = pl.read_parquet(data_dir / "cnecs_2023_2025.parquet")
if start_date:
cnecs = cnecs.filter(pl.col("timestamp") >= start_date)
if end_date:
cnecs = cnecs.filter(pl.col("timestamp") <= end_date)
return cnecs
def load_weather(data_dir: Path, grid_points: Optional[list] = None) -> pl.DataFrame:
"""Load weather data with optional grid point filtering."""
weather = pl.read_parquet(data_dir / "weather_2023_2025.parquet")
if grid_points:
weather = weather.filter(pl.col("grid_point").is_in(grid_points))
return weather
EOF
# Create __init__.py files
touch src/__init__.py
touch src/utils/__init__.py
touch src/data_collection/__init__.py
touch src/feature_engineering/__init__.py
touch src/model/__init__.py
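Once Day 1 data exists locally, notebooks can go through these helpers instead of hard-coding Parquet paths; a small usage sketch (the date range and grid point label are placeholders):
# Sketch: using the loader utilities from a notebook or script
from pathlib import Path
from src.utils.data_loader import load_cnecs, load_weather

data_dir = Path("data/raw")
# Only works after the Day 1 downloads; dates and grid point below are illustrative
cnecs_q1 = load_cnecs(data_dir, start_date="2024-01-01", end_date="2024-03-31")
weather_de = load_weather(data_dir, grid_points=["DE_52.52_13.41"])
print(cnecs_q1.shape, weather_de.shape)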
2.11 Initial Commit (2 minutes)
# Stage all changes (note: data/ is excluded by .gitignore)
git add .
# Create initial commit
git commit -m "Day 0: Initialize FBMC forecasting MVP environment
- Add project structure (notebooks, src, config, tools)
- Configure uv + polars + Marimo + Chronos + HF Datasets stack
- Create .gitignore (excludes data/ following best practices)
- Install jao-py Python library for JAO data access
- Configure ENTSO-E, OpenMeteo, and HuggingFace API access
- Add HF Datasets manager for data storage (separate from Git)
- Create data download utilities (download_all.py)
- Create initial exploration notebook
Data Strategy:
- Code → Git (this repo)
- Data → HuggingFace Datasets (separate, not in Git)
- NO Git LFS (following data science best practices)
Infrastructure: HF Space (A10G GPU, \$30/month)"
# Push to HF Space
git push origin main
# Verify push succeeded
git status
# Should show: "Your branch is up to date with 'origin/main'"
# Verify no data files were committed
git ls-files | grep "\.parquet"
# Should be empty (no .parquet files in Git)
Step 3: Verify Complete Setup (5 minutes)
3.1 Python Environment Verification
# Activate environment if not already
source .venv/bin/activate
# Run comprehensive checks
python << 'EOF'
import sys
print(f"Python: {sys.version}")
packages = [
    "polars", "pyarrow", "numpy", "sklearn",  # scikit-learn imports as sklearn
    "torch", "transformers", "marimo", "altair",
    "entsoe", "jao", "requests", "yaml", "gradio",
    "datasets", "huggingface_hub"
]
print("\nPackage Versions:")
for pkg in packages:
try:
if pkg == "entsoe":
import entsoe
print(f"✓ entsoe-py: {entsoe.__version__}")
elif pkg == "jao":
import jao
print(f"✓ jao-py: {jao.__version__}")
elif pkg == "yaml":
import yaml
print(f"✓ pyyaml: {yaml.__version__}")
elif pkg == "huggingface_hub":
from huggingface_hub import HfApi
print(f"✓ huggingface-hub: Ready")
else:
mod = __import__(pkg)
print(f"✓ {pkg}: {mod.__version__}")
except Exception as e:
print(f"✗ {pkg}: {e}")
# Test Chronos specifically
try:
from chronos import ChronosPipeline
print("\n✓ Chronos forecasting: Ready")
except Exception as e:
print(f"\n✗ Chronos forecasting: {e}")
# Test HF Datasets
try:
from datasets import Dataset
print("✓ HuggingFace Datasets: Ready")
except Exception as e:
print(f"✗ HuggingFace Datasets: {e}")
print("\nAll checks complete!")
EOF
3.2 API Access Verification
# Test ENTSO-E API
python << 'EOF'
from entsoe import EntsoePandasClient
import yaml
# Load API key
with open('config/api_keys.yaml') as f:
config = yaml.safe_load(f)
api_key = config['entsoe_api_key']
if 'YOUR_ENTSOE_API_KEY_HERE' in api_key:
print("⚠ ENTSO-E API key not configured - update config/api_keys.yaml")
else:
try:
client = EntsoePandasClient(api_key=api_key)
print("✓ ENTSO-E API client initialized successfully")
except Exception as e:
print(f"✗ ENTSO-E API error: {e}")
EOF
# Test OpenMeteo API
python << 'EOF'
import requests
response = requests.get(
"https://api.open-meteo.com/v1/forecast",
params={
"latitude": 52.52,
"longitude": 13.41,
"hourly": "temperature_2m",
"start_date": "2025-01-01",
"end_date": "2025-01-02"
}
)
if response.status_code == 200:
print("✓ OpenMeteo API accessible")
else:
print(f"✗ OpenMeteo API error: {response.status_code}")
EOF
# Test HuggingFace authentication
python << 'EOF'
from huggingface_hub import HfApi
import yaml
with open('config/api_keys.yaml') as f:
config = yaml.safe_load(f)
hf_token = config['hf_token']
hf_username = config['hf_username']
if 'YOUR_HF' in hf_token or 'YOUR_HF' in hf_username:
print("⚠ HuggingFace credentials not configured - update config/api_keys.yaml")
else:
try:
api = HfApi(token=hf_token)
user_info = api.whoami()
print(f"✓ HuggingFace authenticated as: {user_info['name']}")
print(f" Can create datasets: {'datasets' in user_info.get('auth', {}).get('accessToken', {}).get('role', '')}")
except Exception as e:
print(f"✗ HuggingFace authentication error: {e}")
print(f" Verify token has WRITE permissions")
EOF
3.3 HF Space Verification
# Check HF Space status
echo "Visit your HF Space: https://huggingface.co/spaces/YOUR_USERNAME/fbmc-forecasting"
echo ""
echo "Verify:"
echo " 1. JupyterLab interface loads"
echo " 2. Hardware shows 'A10G GPU' in bottom-right"
echo " 3. Files from git push are visible"
echo " 4. Can create new notebook"
3.4 Final Checklist
# Print final status
cat << 'EOF'
╔═══════════════════════════════════════════════════════════╗
║ DAY 0 SETUP VERIFICATION CHECKLIST ║
╚═══════════════════════════════════════════════════════════╝
Environment:
[ ] Python 3.10+ installed
[ ] Git installed (NO Git LFS needed)
[ ] uv package manager installed
Local Setup:
[ ] Virtual environment created and activated
[ ] All Python dependencies installed (20 packages from requirements.txt, including jao-py)
[ ] API keys configured (ENTSO-E + OpenMeteo + HuggingFace)
[ ] HuggingFace write token obtained
[ ] Project structure created (8 directories)
[ ] .gitignore configured (data/ excluded)
[ ] Initial Marimo notebook created
[ ] Data management utilities created (hf_datasets_manager.py)
Git & HF Space:
[ ] HF Space created (A10G GPU, $30/month)
[ ] Repository cloned locally
[ ] .gitignore excludes all data files (*.parquet, data/)
[ ] Initial commit pushed to HF Space (code only, NO data)
[ ] HF Space JupyterLab accessible
[ ] Git repo size < 50 MB (no data committed)
Verification Tests:
[ ] Python imports successful (polars, chronos, jao-py, datasets, etc.)
[ ] ENTSO-E API client initializes
[ ] OpenMeteo API responds (status 200)
[ ] HuggingFace authentication successful (write access)
[ ] Marimo notebook opens in browser
Data Strategy Confirmed:
[ ] Code goes in Git (version controlled)
[ ] Data goes in HuggingFace Datasets (separate storage)
[ ] NO Git LFS setup (following data science best practices)
[ ] data/ directory in .gitignore
Ready for Day 1: [ ]
Next Step: Run Day 1 data collection (8 hours)
- Download data locally via jao-py/APIs
- Upload to HuggingFace Datasets (separate from Git)
- Total data: ~12 GB (stored in HF Datasets, NOT Git)
EOF
Troubleshooting
Issue: uv installation fails
# Alternative: Use pip directly
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
Issue: Git LFS files not syncing
Not applicable - We're using HuggingFace Datasets, not Git LFS.
If you see Git LFS references, you may have an old version of this guide. Data files should NEVER be in Git.
Issue: HuggingFace authentication fails
# Verify token is correct
python << 'EOF'
from huggingface_hub import HfApi
import yaml
with open('config/api_keys.yaml') as f:
config = yaml.safe_load(f)
try:
api = HfApi(token=config['hf_token'])
print(api.whoami())
except Exception as e:
print(f"Error: {e}")
print("\nTroubleshooting:")
print("1. Visit: https://huggingface.co/settings/tokens")
print("2. Verify token has WRITE permission")
print("3. Copy token exactly (starts with 'hf_')")
print("4. Update config/api_keys.yaml and .env")
EOF
Issue: Cannot upload to HuggingFace Datasets
# Common causes:
# 1. Token doesn't have write permissions
# Fix: Create new token with "write" scope
# 2. Dataset name already exists
# Fix: Use different name or add version suffix
# Example: fbmc-cnecs-2023-2025-v2
# 3. File too large (>5GB single file limit)
# Fix: Split into multiple datasets or use sharding
# Test upload with small sample:
python << 'EOF'
from datasets import Dataset
import pandas as pd
# Create tiny test dataset
df = pd.DataFrame({"col1": [1, 2, 3], "col2": ["a", "b", "c"]})
dataset = Dataset.from_pandas(df)
# Try uploading
try:
dataset.push_to_hub("YOUR_USERNAME/test-dataset", token="YOUR_TOKEN")
print("✓ Upload successful - authentication works")
except Exception as e:
print(f"✗ Upload failed: {e}")
EOF
Issue: Marimo notebook won't open
# Check marimo installation
marimo --version
# Try running without opening browser
marimo run notebooks/01_data_exploration.py
# Check for port conflicts
lsof -i :2718 # Default Marimo port
Issue: ENTSO-E API key invalid
# Verify key in ENTSO-E Transparency Platform:
# 1. Login: https://transparency.entsoe.eu/
# 2. Navigate: Account Settings → Web API Security Token
# 3. Copy key exactly (no spaces)
# 4. Update: config/api_keys.yaml and .env
Issue: HF Space shows "Building..." forever
# Check HF Space logs:
# Visit: https://huggingface.co/spaces/YOUR_USERNAME/fbmc-forecasting
# Click: "Settings" → "Logs"
# Common fix: Ensure requirements.txt is valid
# Test locally:
pip install -r requirements.txt --dry-run
Issue: jao-py import fails
# Verify jao-py installation
python -c "import jao; print(jao.__version__)"
# If missing, reinstall (quote the spec so the shell doesn't treat >= as a redirect)
uv pip install "jao-py>=0.6.0"
# Check package is in environment
uv pip list | grep jao
What's Next: Day 1 Preview
Day 1 Objective: Download 24 months of historical data (Oct 2023 - Sept 2025)
Data Collection Tasks:
JAO FBMC Data (4-5 hours)
- CNECs: ~900 MB (24 months)
- PTDFs: ~1.5 GB (24 months)
- RAMs: ~800 MB (24 months)
- Shadow prices: ~600 MB (24 months)
- LTN nominations: ~400 MB (24 months)
- Net positions: ~300 MB (24 months)
ENTSO-E Data (2-3 hours)
- Generation forecasts: 13 zones × 24 months
- Actual generation: 13 zones × 24 months
- Cross-border flows: ~20 borders × 24 months
OpenMeteo Weather (1-2 hours)
- 52 grid points × 24 months
- 8 variables per point
- Parallel download optimization
Total Data Size: ~12 GB (compressed Parquet)
Day 1 Script: Will use the jao-py Python library, with rate limiting and parallel download logic (the weather pull pattern is sketched below).
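For a feel of that parallel-download pattern, here is an illustrative sketch only (the fetch_point helper, grid point subset, worker count, and sleep-based rate limit are assumptions, not the actual Day 1 code):
# Sketch: parallel OpenMeteo pulls with a simple rate limit (illustrative only)
import time
import requests
from concurrent.futures import ThreadPoolExecutor, as_completed

BASE_URL = "https://api.open-meteo.com/v1/forecast"
GRID_POINTS = [(52.52, 13.41), (48.85, 2.35)]  # placeholder subset of the 52 points

def fetch_point(lat, lon):
    """Fetch hourly temperature for one grid point; the real script pulls 8 variables."""
    resp = requests.get(BASE_URL, params={
        "latitude": lat, "longitude": lon,
        "hourly": "temperature_2m",
        "start_date": "2025-01-01", "end_date": "2025-01-02",
    }, timeout=30)
    resp.raise_for_status()
    time.sleep(0.5)  # crude per-worker rate limiting between requests
    return (lat, lon), resp.json()

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(fetch_point, lat, lon) for lat, lon in GRID_POINTS]
    for fut in as_completed(futures):
        (lat, lon), payload = fut.result()
        hours = len(payload.get("hourly", {}).get("time", []))
        print(f"✓ fetched {lat},{lon}: {hours} hours")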
Summary
Time Investment: 45 minutes
Result: Production-ready local + cloud development environment
You Now Have:
- ✓ HF Space with A10G GPU ($30/month)
- ✓ Local Python environment (20 packages including jao-py and HF Datasets)
- ✓ jao-py Python library for JAO data access
- ✓ ENTSO-E + OpenMeteo + HuggingFace API access configured
- ✓ HuggingFace Datasets manager for data storage (separate from Git)
- ✓ Data download/upload utilities (hf_datasets_manager.py)
- ✓ Marimo reactive notebook environment
- ✓ .gitignore configured (data/ excluded, following best practices)
- ✓ Complete project structure (8 directories)
Data Strategy Implemented:
Code (version controlled) → Git Repository (~50 MB)
Data (storage & versioning) → HuggingFace Datasets (~12 GB)
NO Git LFS (following data science best practices)
Ready For: Day 1 data collection (8 hours)
- Download 24 months data locally (jao-py + APIs)
- Upload to HuggingFace Datasets (not Git)
- Git repo stays clean (code only)
Document Version: 2.0
Last Updated: 2025-10-29
Project: FBMC Flow Forecasting MVP (Zero-Shot)