# FBMC Flow Forecasting MVP - Day 0 Quick Start Guide
## Environment Setup (45 Minutes)
**Target**: From zero to a working local + HF Space environment with all dependencies verified
---
## Prerequisites Check (5 minutes)
Before starting, verify you have:
```bash
# Check Git
git --version
# Need: 2.x+
# Check Python
python3 --version
# Need: 3.10+
```
**API Keys & Accounts Ready:**
- [ ] ENTSO-E Transparency Platform API key
- [ ] Hugging Face account with payment method for Spaces
- [ ] Hugging Face write token (for uploading datasets)
**Important Data Storage Philosophy:**
- **Code** → Git repository (small, version controlled)
- **Data** → HuggingFace Datasets (separate, not in Git; illustrated below)
- **NO Git LFS** needed (following data science best practices)
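To make the split concrete: once a dataset has been pushed to the Hub (Day 1), any environment can pull it with a single call, so nothing large ever needs to live in the Git repo. A minimal sketch, assuming the dataset name created later in this guide:
```python
# Minimal sketch of the code/data split: the Git repo holds only code like this,
# while the multi-GB Parquet files live on the Hub and are pulled on demand.
# The dataset name matches the one created on Day 1; replace YOUR_USERNAME.
from datasets import load_dataset

cnecs = load_dataset("YOUR_USERNAME/fbmc-cnecs-2023-2025", split="train")
print(cnecs)
```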
---
## Step 1: Create Hugging Face Space (10 minutes)
1. **Navigate to**: https://huggingface.co/new-space
2. **Configure Space:**
   - **Owner**: Your username/organization
   - **Space name**: `fbmc-forecasting` (or your preference)
   - **License**: Apache 2.0
   - **Select SDK**: `JupyterLab`
   - **Select Hardware**: `A10G GPU ($30/month)` ← **CRITICAL**
   - **Visibility**: Private (recommended for MVP)
3. Click the **Create Space** button
4. **Wait 2-3 minutes** for Space initialization
5. **Verify Space Access** (an optional programmatic check follows this list):
   - Visit: `https://huggingface.co/spaces/YOUR_USERNAME/fbmc-forecasting`
   - Confirm JupyterLab interface loads
   - Check hardware: Should show "A10G GPU" in bottom-right
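Once the local environment from Step 2 exists, you can also confirm the Space and its hardware from Python. This is a sketch under the assumption that `huggingface_hub` is installed and your token can read the (private) Space:
```python
# Optional check: ask the Hub which hardware the Space is running on.
# Assumes huggingface_hub is installed and HF_TOKEN can access the private Space.
import os
from huggingface_hub import HfApi

api = HfApi(token=os.environ.get("HF_TOKEN"))
runtime = api.get_space_runtime("YOUR_USERNAME/fbmc-forecasting")
print(f"stage={runtime.stage}, hardware={runtime.hardware}")  # expect an a10g flavor
```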
---
## Step 2: Local Environment Setup (25 minutes)
### 2.1 Clone HF Space Locally (2 minutes)
```bash
# Clone your HF Space
git clone https://huggingface.co/spaces/YOUR_USERNAME/fbmc-forecasting
cd fbmc-forecasting
# Verify remote
git remote -v
# Should show: https://huggingface.co/spaces/YOUR_USERNAME/fbmc-forecasting
```
### 2.2 Create Directory Structure (1 minute)
```bash
# Create project directories
mkdir -p notebooks \
         notebooks_exported \
         src/{data_collection,feature_engineering,model,utils} \
         config \
         results/{forecasts,evaluation,visualizations} \
         docs \
         tools \
         tests
# Note: data/ directory will be created by download scripts
# It is NOT tracked in Git (following best practices)
# Verify structure
tree -L 2
```
### 2.3 Install uv Package Manager (2 minutes)
```bash
# Install uv (ultra-fast pip replacement)
curl -LsSf https://astral.sh/uv/install.sh | sh
# Add to PATH (if not automatic; recent installers use ~/.local/bin, older ones ~/.cargo/bin)
export PATH="$HOME/.local/bin:$HOME/.cargo/bin:$PATH"
# Verify installation
uv --version
# Should show: uv 0.x.x
```
### 2.4 Create Virtual Environment (1 minute)
```bash
# Create .venv with uv
uv venv
# Activate (Linux/Mac)
source .venv/bin/activate
# Activate (Windows)
# .venv\Scripts\activate
# Verify activation
which python
# Should point to: /path/to/fbmc-forecasting/.venv/bin/python
```
### 2.5 Install Dependencies (2 minutes)
```bash
# Create requirements.txt
cat > requirements.txt << 'EOF'
# Core Data & ML
polars>=0.20.0
pyarrow>=13.0.0
numpy>=1.24.0
scikit-learn>=1.3.0
# Time Series Forecasting
chronos-forecasting>=1.0.0
transformers>=4.35.0
torch>=2.0.0
# Data Collection
entsoe-py>=0.5.0
jao-py>=0.6.0
requests>=2.31.0
# HuggingFace Integration (for Datasets, NOT Git LFS)
datasets>=2.14.0
huggingface-hub>=0.17.0
# Visualization & Notebooks
altair>=5.0.0
marimo>=0.9.0
jupyter>=1.0.0
ipykernel>=6.25.0
# Utilities
pyyaml>=6.0.0
python-dotenv>=1.0.0
tqdm>=4.66.0
# HF Space Integration
gradio>=4.0.0
EOF
# Install with uv (ultra-fast)
uv pip install -r requirements.txt
# Create lockfile for reproducibility
uv pip compile requirements.txt -o requirements.lock
```
**Verify installations:**
```bash
python -c "import polars; print(f'polars {polars.__version__}')"
python -c "import marimo; print(f'marimo {marimo.__version__}')"
python -c "import torch; print(f'torch {torch.__version__}')"
python -c "from chronos import ChronosPipeline; print('chronos-forecasting ✓')"
python -c "from datasets import Dataset; print('datasets ✓')"
python -c "from huggingface_hub import HfApi; print('huggingface-hub ✓')"
python -c "import jao; print(f'jao-py {jao.__version__}')"
```
### 2.6 Configure .gitignore (Data Exclusion) (2 minutes)
```bash
# Create .gitignore - CRITICAL for keeping data out of Git
cat > .gitignore << 'EOF'
# ============================================
# Data Files - NEVER commit to Git
# ============================================
# Following data science best practices:
# - Code goes in Git
# - Data goes in HuggingFace Datasets
data/
*.parquet
*.pkl
*.csv
*.h5
*.hdf5
*.feather
# ============================================
# Model Artifacts
# ============================================
models/checkpoints/
*.pth
*.safetensors
*.ckpt
# ============================================
# Credentials & Secrets
# ============================================
.env
config/api_keys.yaml
*.key
*.pem
# ============================================
# Python
# ============================================
__pycache__/
*.pyc
*.pyo
*.egg-info/
.pytest_cache/
.venv/
venv/
# ============================================
# IDE & OS
# ============================================
.vscode/
.idea/
*.swp
.DS_Store
Thumbs.db
# ============================================
# Jupyter
# ============================================
.ipynb_checkpoints/
# ============================================
# Temporary Files
# ============================================
*.tmp
*.log
.cache/
EOF
# Stage .gitignore
git add .gitignore
# Verify data/ will be ignored (data/ is already listed in the .gitignore above)
git check-ignore data/test.parquet
# Should output: data/test.parquet (confirming it's ignored)
```
**Why NO Git LFS?**
Following data science best practices:
- ✓ **Code** → Git (fast, version controlled)
- ✓ **Data** → HuggingFace Datasets (separate, scalable)
- ✗ **NOT** Git LFS (expensive, non-standard for ML projects)
**Data will be:**
- Downloaded via scripts (Day 1)
- Uploaded to HF Datasets (Day 1)
- Loaded programmatically (Days 2-5)
- NEVER committed to Git repository
### 2.7 Configure API Keys & HuggingFace Access (3 minutes)
```bash
# Create config directory structure
mkdir -p config
# Create API keys configuration
cat > config/api_keys.yaml << 'EOF'
# ENTSO-E Transparency Platform
entsoe_api_key: "YOUR_ENTSOE_API_KEY_HERE"
# OpenMeteo (free tier - no key required)
openmeteo_base_url: "https://api.open-meteo.com/v1/forecast"
# Hugging Face (for uploading datasets)
hf_token: "YOUR_HF_WRITE_TOKEN_HERE"
hf_username: "YOUR_HF_USERNAME"
EOF
# Create .env file for environment variables
cat > .env << 'EOF'
ENTSOE_API_KEY=YOUR_ENTSOE_API_KEY_HERE
OPENMETEO_BASE_URL=https://api.open-meteo.com/v1/forecast
HF_TOKEN=YOUR_HF_WRITE_TOKEN_HERE
HF_USERNAME=YOUR_HF_USERNAME
EOF
```
**Get your HuggingFace Write Token:**
1. Visit: https://huggingface.co/settings/tokens
2. Click "New token"
3. Name: "FBMC Dataset Upload"
4. Type: **Write** (required for uploading datasets)
5. Copy token
**Now edit the files with your actual credentials:**
```bash
# Option 1: Use text editor
nano config/api_keys.yaml   # Update all YOUR_*_HERE placeholders
nano .env                   # Update all YOUR_*_HERE placeholders
# Option 2: Use sed (replace with your actual values)
# (GNU sed syntax shown; on macOS/BSD sed, use: sed -i '' 's/.../.../' <files>)
sed -i 's/YOUR_ENTSOE_API_KEY_HERE/your-actual-entsoe-key/' config/api_keys.yaml .env
sed -i 's/YOUR_HF_WRITE_TOKEN_HERE/hf_your-actual-token/' config/api_keys.yaml .env
sed -i 's/YOUR_HF_USERNAME/your-username/' config/api_keys.yaml .env
```
**Verify credentials are set:**
```bash
# Should NOT see any "YOUR_*_HERE" placeholders
grep "YOUR_" config/api_keys.yaml
# Empty output = good!
```
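The `.env` file is read through `python-dotenv` (already in `requirements.txt`). A minimal sketch of how later scripts can pick up these variables without hard-coding secrets:
```python
# Minimal sketch: load credentials from .env instead of hard-coding them.
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory
entsoe_key = os.getenv("ENTSOE_API_KEY")
hf_token = os.getenv("HF_TOKEN")
print("ENTSO-E key set:", bool(entsoe_key), "| HF token set:", bool(hf_token))
```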
### 2.8 Create Data Management Utilities (5 minutes)
```bash
# Create data collection module with HF Datasets integration
cat > src/data_collection/hf_datasets_manager.py << 'EOF'
"""HuggingFace Datasets manager for FBMC data storage."""
import polars as pl
from datasets import Dataset, DatasetDict
from huggingface_hub import HfApi
from pathlib import Path
import yaml


class FBMCDatasetManager:
    """Manage FBMC data uploads/downloads via HuggingFace Datasets."""

    def __init__(self, config_path: str = "config/api_keys.yaml"):
        """Initialize with HF credentials."""
        with open(config_path) as f:
            config = yaml.safe_load(f)
        self.hf_token = config['hf_token']
        self.hf_username = config['hf_username']
        self.api = HfApi(token=self.hf_token)

    def upload_dataset(self, parquet_path: Path, dataset_name: str, description: str = ""):
        """Upload Parquet file to HuggingFace Datasets."""
        print(f"Uploading {parquet_path.name} to HF Datasets...")
        # Load Parquet as polars, convert to HF Dataset
        df = pl.read_parquet(parquet_path)
        dataset = Dataset.from_pandas(df.to_pandas())
        # Create full dataset name
        full_name = f"{self.hf_username}/{dataset_name}"
        # Upload to HF
        dataset.push_to_hub(
            full_name,
            token=self.hf_token,
            private=False  # Public datasets (free storage)
        )
        print(f"✓ Uploaded to: https://huggingface.co/datasets/{full_name}")
        return full_name

    def download_dataset(self, dataset_name: str, output_path: Path):
        """Download dataset from HF to local Parquet."""
        from datasets import load_dataset
        print(f"Downloading {dataset_name} from HF Datasets...")
        # Download from HF
        dataset = load_dataset(
            f"{self.hf_username}/{dataset_name}",
            split="train"
        )
        # Convert to polars and save
        df = pl.from_pandas(dataset.to_pandas())
        output_path.parent.mkdir(parents=True, exist_ok=True)
        df.write_parquet(output_path)
        print(f"✓ Downloaded to: {output_path}")
        return df

    def list_datasets(self):
        """List all FBMC datasets for this user."""
        datasets = self.api.list_datasets(author=self.hf_username)
        fbmc_datasets = [d for d in datasets if 'fbmc' in d.id.lower()]
        print(f"\nFBMC Datasets for {self.hf_username}:")
        for ds in fbmc_datasets:
            print(f"  - {ds.id}")
        return fbmc_datasets


# Example usage (will be used in Day 1)
if __name__ == "__main__":
    manager = FBMCDatasetManager()
    # Upload example (Day 1 will use this)
    # manager.upload_dataset(
    #     parquet_path=Path("data/raw/cnecs_2023_2025.parquet"),
    #     dataset_name="fbmc-cnecs-2023-2025",
    #     description="FBMC CNECs data: Oct 2023 - Sept 2025"
    # )
    # Download example (HF Space will use this)
    # manager.download_dataset(
    #     dataset_name="fbmc-cnecs-2023-2025",
    #     output_path=Path("data/raw/cnecs_2023_2025.parquet")
    # )
EOF
# Create data download orchestrator
cat > src/data_collection/download_all.py << 'EOF'
"""Download all FBMC data from HuggingFace Datasets."""
from pathlib import Path

try:
    # Works when imported as a package (e.g. from the HF Space's app.py)
    from src.data_collection.hf_datasets_manager import FBMCDatasetManager
except ImportError:
    # Works when run directly: python src/data_collection/download_all.py
    from hf_datasets_manager import FBMCDatasetManager


def setup_data(data_dir: Path = Path("data/raw")):
    """Download all datasets if not present locally."""
    manager = FBMCDatasetManager()
    datasets_to_download = {
        "fbmc-cnecs-2023-2025": "cnecs_2023_2025.parquet",
        "fbmc-weather-2023-2025": "weather_2023_2025.parquet",
        "fbmc-entsoe-2023-2025": "entsoe_2023_2025.parquet",
    }
    data_dir.mkdir(parents=True, exist_ok=True)
    for dataset_name, filename in datasets_to_download.items():
        output_path = data_dir / filename
        if output_path.exists():
            print(f"✓ {filename} already exists, skipping")
        else:
            try:
                manager.download_dataset(dataset_name, output_path)
            except Exception as e:
                print(f"✗ Failed to download {dataset_name}: {e}")
                print("  You may need to run Day 1 data collection first")
    print("\n✓ Data setup complete")


if __name__ == "__main__":
    setup_data()
EOF
# Make scripts executable
chmod +x src/data_collection/hf_datasets_manager.py
chmod +x src/data_collection/download_all.py
echo "✓ Data management utilities created"
```
**What This Does:**
- `hf_datasets_manager.py`: Upload/download Parquet files to/from HF Datasets
- `download_all.py`: One-command data setup for HF Space or analysts
**Day 1 Workflow:**
1. Download data from JAO/ENTSO-E/OpenMeteo to `data/raw/`
2. Upload each Parquet to HF Datasets, separate from Git (see the sketch below)
3. Git repo stays small (only code)
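As a sketch of step 2, Day 1 might loop over the local Parquet files and push each one with the manager defined above. Paths and dataset names mirror the commented examples in the module; adjust to whatever Day 1 actually produces:
```python
# Sketch of the Day 1 upload step using FBMCDatasetManager (defined above).
# Paths and dataset names are this guide's placeholders, not a fixed contract.
from pathlib import Path
from src.data_collection.hf_datasets_manager import FBMCDatasetManager

manager = FBMCDatasetManager()
for parquet, name in [
    (Path("data/raw/cnecs_2023_2025.parquet"), "fbmc-cnecs-2023-2025"),
    (Path("data/raw/weather_2023_2025.parquet"), "fbmc-weather-2023-2025"),
    (Path("data/raw/entsoe_2023_2025.parquet"), "fbmc-entsoe-2023-2025"),
]:
    if parquet.exists():
        manager.upload_dataset(parquet, name)
```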
**HF Space Workflow:**
```python
# In your Space's app.py startup:
from src.data_collection.download_all import setup_data
setup_data()  # Downloads from HF Datasets, not Git
```
### 2.9 Create First Marimo Notebook (5 minutes)
```bash
# Create initial exploration notebook
cat > notebooks/01_data_exploration.py << 'EOF'
import marimo

__generated_with = "0.9.0"
app = marimo.App(width="medium")


@app.cell
def __():
    import marimo as mo
    import polars as pl
    import altair as alt
    from pathlib import Path
    return mo, pl, alt, Path


@app.cell
def __(mo):
    mo.md(
        """
        # FBMC Flow Forecasting - Data Exploration
        **Day 1 Objective**: Explore JAO FBMC data structure
        ## Steps:
        1. Load downloaded Parquet files
        2. Inspect CNECs, PTDFs, RAMs
        3. Identify top 200 binding CNECs (50 Tier-1 + 150 Tier-2)
        4. Visualize temporal patterns
        """
    )
    return


@app.cell
def __(Path):
    # Data paths
    DATA_DIR = Path("../data/raw")
    CNECS_FILE = DATA_DIR / "cnecs_2023_2025.parquet"
    return DATA_DIR, CNECS_FILE


@app.cell
def __(mo, CNECS_FILE):
    # Check if data exists; make the message the cell's last expression so it renders
    if CNECS_FILE.exists():
        status = mo.md("✓ CNECs data found - ready for Day 1 analysis")
    else:
        status = mo.md("⚠ CNECs data not yet downloaded - run Day 1 collection script")
    status
    return


if __name__ == "__main__":
    app.run()
EOF
# Test Marimo installation (runs in the foreground)
marimo edit notebooks/01_data_exploration.py
# This will open a browser tab with the interactive notebook
# Close the tab and press Ctrl+C in the terminal once you have verified it loads
```
### 2.10 Create Utility Modules (2 minutes)
```bash
# Create data loading utilities
cat > src/utils/data_loader.py << 'EOF'
"""Data loading utilities for FBMC forecasting project."""
import polars as pl
from pathlib import Path
from typing import Optional


def load_cnecs(data_dir: Path, start_date: Optional[str] = None, end_date: Optional[str] = None) -> pl.DataFrame:
    """Load CNEC data with optional date filtering."""
    cnecs = pl.read_parquet(data_dir / "cnecs_2023_2025.parquet")
    if start_date:
        cnecs = cnecs.filter(pl.col("timestamp") >= start_date)
    if end_date:
        cnecs = cnecs.filter(pl.col("timestamp") <= end_date)
    return cnecs


def load_weather(data_dir: Path, grid_points: Optional[list] = None) -> pl.DataFrame:
    """Load weather data with optional grid point filtering."""
    weather = pl.read_parquet(data_dir / "weather_2023_2025.parquet")
    if grid_points:
        weather = weather.filter(pl.col("grid_point").is_in(grid_points))
    return weather
EOF
# Create __init__.py files
touch src/__init__.py
touch src/utils/__init__.py
touch src/data_collection/__init__.py
touch src/feature_engineering/__init__.py
touch src/model/__init__.py
```
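A quick usage sketch for these helpers, assuming the Day 1 files already sit in `data/raw/` and contain the `timestamp` column the loader filters on:
```python
# Usage sketch for the loaders above (run from the repo root; data from Day 1 assumed).
from pathlib import Path
from src.utils.data_loader import load_cnecs, load_weather

cnecs_jan = load_cnecs(Path("data/raw"), start_date="2024-01-01", end_date="2024-01-31")
weather = load_weather(Path("data/raw"))
print(cnecs_jan.shape, weather.shape)
```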
### 2.11 Initial Commit (2 minutes)
```bash
# Stage all changes (note: data/ is excluded by .gitignore)
git add .
# Create initial commit
git commit -m "Day 0: Initialize FBMC forecasting MVP environment
- Add project structure (notebooks, src, config, tools)
- Configure uv + polars + Marimo + Chronos + HF Datasets stack
- Create .gitignore (excludes data/ following best practices)
- Install jao-py Python library for JAO data access
- Configure ENTSO-E, OpenMeteo, and HuggingFace API access
- Add HF Datasets manager for data storage (separate from Git)
- Create data download utilities (download_all.py)
- Create initial exploration notebook
Data Strategy:
- Code → Git (this repo)
- Data → HuggingFace Datasets (separate, not in Git)
- NO Git LFS (following data science best practices)
Infrastructure: HF Space (A10G GPU, \$30/month)"
# Push to HF Space
git push origin main
# Verify push succeeded
git status
# Should show: "Your branch is up to date with 'origin/main'"
# Verify no data files were committed
git ls-files | grep "\.parquet"
# Should be empty (no .parquet files in Git)
```
---
## Step 3: Verify Complete Setup (5 minutes)
### 3.1 Python Environment Verification
```bash
# Activate environment if not already
source .venv/bin/activate
# Run comprehensive checks
python << 'EOF'
import sys
print(f"Python: {sys.version}")
packages = [
    "polars", "pyarrow", "numpy", "sklearn",  # scikit-learn imports as "sklearn"
    "torch", "transformers", "marimo", "altair",
    "entsoe", "jao", "requests", "yaml", "gradio",
    "datasets", "huggingface_hub"
]
| print("\nPackage Versions:") | |
| for pkg in packages: | |
| try: | |
| if pkg == "entsoe": | |
| import entsoe | |
| print(f"✓ entsoe-py: {entsoe.__version__}") | |
| elif pkg == "jao": | |
| import jao | |
| print(f"✓ jao-py: {jao.__version__}") | |
| elif pkg == "yaml": | |
| import yaml | |
| print(f"✓ pyyaml: {yaml.__version__}") | |
| elif pkg == "huggingface_hub": | |
| from huggingface_hub import HfApi | |
| print(f"✓ huggingface-hub: Ready") | |
| else: | |
| mod = __import__(pkg) | |
| print(f"✓ {pkg}: {mod.__version__}") | |
| except Exception as e: | |
| print(f"✗ {pkg}: {e}") | |
| # Test Chronos specifically | |
| try: | |
| from chronos import ChronosPipeline | |
| print("\n✓ Chronos forecasting: Ready") | |
| except Exception as e: | |
| print(f"\n✗ Chronos forecasting: {e}") | |
| # Test HF Datasets | |
| try: | |
| from datasets import Dataset | |
| print("✓ HuggingFace Datasets: Ready") | |
| except Exception as e: | |
| print(f"✗ HuggingFace Datasets: {e}") | |
| print("\nAll checks complete!") | |
| EOF | |
| ``` | |
| ### 3.2 API Access Verification | |
| ```bash | |
| # Test ENTSO-E API | |
| python << 'EOF' | |
| from entsoe import EntsoePandasClient | |
| import yaml | |
| # Load API key | |
| with open('config/api_keys.yaml') as f: | |
| config = yaml.safe_load(f) | |
| api_key = config['entsoe_api_key'] | |
| if 'YOUR_ENTSOE_API_KEY_HERE' in api_key: | |
| print("⚠ ENTSO-E API key not configured - update config/api_keys.yaml") | |
| else: | |
| try: | |
| client = EntsoePandasClient(api_key=api_key) | |
| print("✓ ENTSO-E API client initialized successfully") | |
| except Exception as e: | |
| print(f"✗ ENTSO-E API error: {e}") | |
| EOF | |
| # Test OpenMeteo API | |
| python << 'EOF' | |
| import requests | |
response = requests.get(
    "https://api.open-meteo.com/v1/forecast",
    params={
        "latitude": 52.52,
        "longitude": 13.41,
        "hourly": "temperature_2m",
        # No start_date/end_date: the forecast endpoint only accepts a narrow
        # recent window, and the defaults are enough for a connectivity check
    }
)
if response.status_code == 200:
    print("✓ OpenMeteo API accessible")
else:
    print(f"✗ OpenMeteo API error: {response.status_code}")
EOF

# Test HuggingFace authentication
python << 'EOF'
from huggingface_hub import HfApi
import yaml

with open('config/api_keys.yaml') as f:
    config = yaml.safe_load(f)
hf_token = config['hf_token']
hf_username = config['hf_username']

if 'YOUR_HF' in hf_token or 'YOUR_HF' in hf_username:
    print("⚠ HuggingFace credentials not configured - update config/api_keys.yaml")
else:
    try:
        api = HfApi(token=hf_token)
        user_info = api.whoami()
        print(f"✓ HuggingFace authenticated as: {user_info['name']}")
| print(f" Can create datasets: {'datasets' in user_info.get('auth', {}).get('accessToken', {}).get('role', '')}") | |
    except Exception as e:
        print(f"✗ HuggingFace authentication error: {e}")
        print("  Verify token has WRITE permissions")
EOF
```
### 3.3 HF Space Verification
```bash
# Check HF Space status
echo "Visit your HF Space: https://huggingface.co/spaces/YOUR_USERNAME/fbmc-forecasting"
echo ""
echo "Verify:"
echo "  1. JupyterLab interface loads"
echo "  2. Hardware shows 'A10G GPU' in bottom-right"
echo "  3. Files from git push are visible"
echo "  4. Can create new notebook"
```
### 3.4 Final Checklist
```bash
# Print final status
cat << 'EOF'
╔═══════════════════════════════════════════════════════════╗
║            DAY 0 SETUP VERIFICATION CHECKLIST             ║
╚═══════════════════════════════════════════════════════════╝
Environment:
  [ ] Python 3.10+ installed
  [ ] Git installed (NO Git LFS needed)
  [ ] uv package manager installed
Local Setup:
  [ ] Virtual environment created and activated
  [ ] All Python dependencies installed (20 top-level packages, including jao-py)
  [ ] API keys configured (ENTSO-E + OpenMeteo + HuggingFace)
  [ ] HuggingFace write token obtained
  [ ] Project structure created (8 directories)
  [ ] .gitignore configured (data/ excluded)
  [ ] Initial Marimo notebook created
  [ ] Data management utilities created (hf_datasets_manager.py)
Git & HF Space:
  [ ] HF Space created (A10G GPU, $30/month)
  [ ] Repository cloned locally
  [ ] .gitignore excludes all data files (*.parquet, data/)
  [ ] Initial commit pushed to HF Space (code only, NO data)
  [ ] HF Space JupyterLab accessible
  [ ] Git repo size < 50 MB (no data committed)
Verification Tests:
  [ ] Python imports successful (polars, chronos, jao-py, datasets, etc.)
  [ ] ENTSO-E API client initializes
  [ ] OpenMeteo API responds (status 200)
  [ ] HuggingFace authentication successful (write access)
  [ ] Marimo notebook opens in browser
Data Strategy Confirmed:
  [ ] Code goes in Git (version controlled)
  [ ] Data goes in HuggingFace Datasets (separate storage)
  [ ] NO Git LFS setup (following data science best practices)
  [ ] data/ directory in .gitignore
Ready for Day 1: [ ]
Next Step: Run Day 1 data collection (8 hours)
  - Download data locally via jao-py/APIs
  - Upload to HuggingFace Datasets (separate from Git)
  - Total data: ~12 GB (stored in HF Datasets, NOT Git)
EOF
```
---
## Troubleshooting
### Issue: uv installation fails
```bash
# Alternative: Use pip directly
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```
### Issue: Git LFS files not syncing
**Not applicable** - We're using HuggingFace Datasets, not Git LFS.
If you see Git LFS references, you may have an old version of this guide. Data files should NEVER be in Git.
### Issue: HuggingFace authentication fails
```bash
# Verify token is correct
python << 'EOF'
from huggingface_hub import HfApi
import yaml

with open('config/api_keys.yaml') as f:
    config = yaml.safe_load(f)

try:
    api = HfApi(token=config['hf_token'])
    print(api.whoami())
except Exception as e:
    print(f"Error: {e}")
    print("\nTroubleshooting:")
    print("1. Visit: https://huggingface.co/settings/tokens")
    print("2. Verify token has WRITE permission")
    print("3. Copy token exactly (starts with 'hf_')")
    print("4. Update config/api_keys.yaml and .env")
EOF
```
### Issue: Cannot upload to HuggingFace Datasets
```bash
# Common causes:
# 1. Token doesn't have write permissions
#    Fix: Create new token with "write" scope
# 2. Dataset name already exists
#    Fix: Use different name or add version suffix
#    Example: fbmc-cnecs-2023-2025-v2
# 3. File too large (>5GB single file limit)
#    Fix: Split into multiple datasets or use sharding
# Test upload with small sample:
python << 'EOF'
from datasets import Dataset
import pandas as pd

# Create tiny test dataset
df = pd.DataFrame({"col1": [1, 2, 3], "col2": ["a", "b", "c"]})
dataset = Dataset.from_pandas(df)

# Try uploading
try:
    dataset.push_to_hub("YOUR_USERNAME/test-dataset", token="YOUR_TOKEN")
    print("✓ Upload successful - authentication works")
except Exception as e:
    print(f"✗ Upload failed: {e}")
EOF
```
### Issue: Marimo notebook won't open
```bash
# Check marimo installation
marimo --version
# Try running without opening browser
marimo run notebooks/01_data_exploration.py
# Check for port conflicts
lsof -i :2718  # Default Marimo port
```
### Issue: ENTSO-E API key invalid
```bash
# Verify key in ENTSO-E Transparency Platform:
# 1. Login: https://transparency.entsoe.eu/
# 2. Navigate: Account Settings → Web API Security Token
# 3. Copy key exactly (no spaces)
# 4. Update: config/api_keys.yaml and .env
```
### Issue: HF Space shows "Building..." forever
```bash
# Check HF Space logs:
# Visit: https://huggingface.co/spaces/YOUR_USERNAME/fbmc-forecasting
# Click: "Settings" → "Logs"
# Common fix: Ensure requirements.txt is valid
# Test locally:
pip install -r requirements.txt --dry-run
```
### Issue: jao-py import fails
```bash
# Verify jao-py installation
python -c "import jao; print(jao.__version__)"
# If missing, reinstall
uv pip install "jao-py>=0.6.0"
# Check package is in environment
uv pip list | grep jao
```
---
## What's Next: Day 1 Preview
**Day 1 Objective**: Download 24 months of historical data (Oct 2023 - Sept 2025)
**Data Collection Tasks:**
1. **JAO FBMC Data** (4-5 hours)
   - CNECs: ~900 MB (24 months)
   - PTDFs: ~1.5 GB (24 months)
   - RAMs: ~800 MB (24 months)
   - Shadow prices: ~600 MB (24 months)
   - LTN nominations: ~400 MB (24 months)
   - Net positions: ~300 MB (24 months)
2. **ENTSO-E Data** (2-3 hours)
   - Generation forecasts: 13 zones × 24 months
   - Actual generation: 13 zones × 24 months
   - Cross-border flows: ~20 borders × 24 months
3. **OpenMeteo Weather** (1-2 hours)
   - 52 grid points × 24 months
   - 8 variables per point
   - Parallel download optimization
**Total Data Size**: ~12 GB (compressed Parquet)
**Day 1 Script**: Will use the jao-py Python library with rate limiting and parallel download logic.
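For a feel of the parallel, rate-limited pattern, here is a minimal sketch; the grid points, endpoint parameters, worker count, and 0.2 s pacing are illustrative assumptions, not the final Day 1 script:
```python
# Illustrative sketch only: rate-limited parallel weather fetches.
import time
from concurrent.futures import ThreadPoolExecutor

import requests

GRID_POINTS = [(52.52, 13.41), (48.86, 2.35)]  # hypothetical sample of the 52 points

def fetch_point(lat: float, lon: float) -> dict:
    """Fetch hourly temperature for one grid point, pausing briefly between requests."""
    resp = requests.get(
        "https://api.open-meteo.com/v1/forecast",
        params={"latitude": lat, "longitude": lon, "hourly": "temperature_2m"},
        timeout=30,
    )
    resp.raise_for_status()
    time.sleep(0.2)  # crude client-side rate limiting
    return resp.json()

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(lambda p: fetch_point(*p), GRID_POINTS))

print(f"Fetched {len(results)} grid points")
```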
---
## Summary
**Time Investment**: 45 minutes
**Result**: Production-ready local + cloud development environment
**You Now Have:**
- ✓ HF Space with A10G GPU ($30/month)
- ✓ Local Python environment (20 top-level packages, including jao-py and HF Datasets)
- ✓ jao-py Python library for JAO data access
- ✓ ENTSO-E + OpenMeteo + HuggingFace API access configured
- ✓ HuggingFace Datasets manager for data storage (separate from Git)
- ✓ Data download/upload utilities (hf_datasets_manager.py)
- ✓ Marimo reactive notebook environment
- ✓ .gitignore configured (data/ excluded, following best practices)
- ✓ Complete project structure (8 directories)
**Data Strategy Implemented:**
```
Code (version controlled)   → Git Repository (~50 MB)
Data (storage & versioning) → HuggingFace Datasets (~12 GB)
NO Git LFS (following data science best practices)
```
**Ready For**: Day 1 data collection (8 hours)
- Download 24 months data locally (jao-py + APIs)
- Upload to HuggingFace Datasets (not Git)
- Git repo stays clean (code only)
---
**Document Version**: 2.0
**Last Updated**: 2025-10-29
**Project**: FBMC Flow Forecasting MVP (Zero-Shot)