# FBMC Flow Forecasting MVP - Day 0 Quick Start Guide
## Environment Setup (45 Minutes)
**Target**: From zero to a working local + HF Space environment with all dependencies verified
---
## Prerequisites Check (5 minutes)
Before starting, verify you have:
```bash
# Check Git
git --version
# Need: 2.x+
# Check Python
python3 --version
# Need: 3.10+
```
**API Keys & Accounts Ready:**
- [ ] ENTSO-E Transparency Platform API key
- [ ] Hugging Face account with payment method for Spaces
- [ ] Hugging Face write token (for uploading datasets)
**Important Data Storage Philosophy:**
- **Code** → Git repository (small, version controlled)
- **Data** → HuggingFace Datasets (separate, not in Git; illustrated below)
- **NO Git LFS** needed (following data science best practices)
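To make the split concrete: once a dataset has been pushed to the Hub (Day 1), any environment can pull it with a single call, so nothing large ever needs to live in the Git repo. A minimal sketch, assuming the dataset name created later in this guide:
```python
# Minimal sketch of the code/data split: the Git repo holds only code like this,
# while the multi-GB Parquet files live on the Hub and are pulled on demand.
# The dataset name matches the one created on Day 1; replace YOUR_USERNAME.
from datasets import load_dataset

cnecs = load_dataset("YOUR_USERNAME/fbmc-cnecs-2023-2025", split="train")
print(cnecs)
```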
---
## Step 1: Create Hugging Face Space (10 minutes)
1. **Navigate to**: https://huggingface.co/new-space
2. **Configure Space:**
   - **Owner**: Your username/organization
   - **Space name**: `fbmc-forecasting` (or your preference)
   - **License**: Apache 2.0
   - **Select SDK**: `JupyterLab`
   - **Select Hardware**: `A10G GPU ($30/month)` ← **CRITICAL**
   - **Visibility**: Private (recommended for MVP)
3. Click the **Create Space** button
4. **Wait 2-3 minutes** for Space initialization
5. **Verify Space Access** (an optional programmatic check follows this list):
   - Visit: `https://huggingface.co/spaces/YOUR_USERNAME/fbmc-forecasting`
   - Confirm JupyterLab interface loads
   - Check hardware: Should show "A10G GPU" in bottom-right
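Once the local environment from Step 2 exists, you can also confirm the Space and its hardware from Python. This is a sketch under the assumption that `huggingface_hub` is installed and your token can read the (private) Space:
```python
# Optional check: ask the Hub which hardware the Space is running on.
# Assumes huggingface_hub is installed and HF_TOKEN can access the private Space.
import os
from huggingface_hub import HfApi

api = HfApi(token=os.environ.get("HF_TOKEN"))
runtime = api.get_space_runtime("YOUR_USERNAME/fbmc-forecasting")
print(f"stage={runtime.stage}, hardware={runtime.hardware}")  # expect an a10g flavor
```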
---
## Step 2: Local Environment Setup (25 minutes)
### 2.1 Clone HF Space Locally (2 minutes)
```bash
# Clone your HF Space
git clone https://huggingface.co/spaces/YOUR_USERNAME/fbmc-forecasting
cd fbmc-forecasting
# Verify remote
git remote -v
# Should show: https://huggingface.co/spaces/YOUR_USERNAME/fbmc-forecasting
```
### 2.2 Create Directory Structure (1 minute)
```bash
# Create project directories
mkdir -p notebooks \
         notebooks_exported \
         src/{data_collection,feature_engineering,model,utils} \
         config \
         results/{forecasts,evaluation,visualizations} \
         docs \
         tools \
         tests
# Note: data/ directory will be created by download scripts
# It is NOT tracked in Git (following best practices)
# Verify structure
tree -L 2
```
### 2.3 Install uv Package Manager (2 minutes)
```bash
# Install uv (ultra-fast pip replacement)
curl -LsSf https://astral.sh/uv/install.sh | sh
# Add to PATH (if not automatic; recent installers use ~/.local/bin, older ones ~/.cargo/bin)
export PATH="$HOME/.local/bin:$HOME/.cargo/bin:$PATH"
# Verify installation
uv --version
# Should show: uv 0.x.x
```
### 2.4 Create Virtual Environment (1 minute)
```bash
# Create .venv with uv
uv venv
# Activate (Linux/Mac)
source .venv/bin/activate
# Activate (Windows)
# .venv\Scripts\activate
# Verify activation
which python
# Should point to: /path/to/fbmc-forecasting/.venv/bin/python
```
### 2.5 Install Dependencies (2 minutes)
```bash
# Create requirements.txt
cat > requirements.txt << 'EOF'
# Core Data & ML
polars>=0.20.0
pyarrow>=13.0.0
numpy>=1.24.0
scikit-learn>=1.3.0
# Time Series Forecasting
chronos-forecasting>=1.0.0
transformers>=4.35.0
torch>=2.0.0
# Data Collection
entsoe-py>=0.5.0
jao-py>=0.6.0
requests>=2.31.0
# HuggingFace Integration (for Datasets, NOT Git LFS)
datasets>=2.14.0
huggingface-hub>=0.17.0
# Visualization & Notebooks
altair>=5.0.0
marimo>=0.9.0
jupyter>=1.0.0
ipykernel>=6.25.0
# Utilities
pyyaml>=6.0.0
python-dotenv>=1.0.0
tqdm>=4.66.0
# HF Space Integration
gradio>=4.0.0
EOF
# Install with uv (ultra-fast)
uv pip install -r requirements.txt
# Create lockfile for reproducibility
uv pip compile requirements.txt -o requirements.lock
```
**Verify installations:**
```bash
python -c "import polars; print(f'polars {polars.__version__}')"
python -c "import marimo; print(f'marimo {marimo.__version__}')"
python -c "import torch; print(f'torch {torch.__version__}')"
python -c "from chronos import ChronosPipeline; print('chronos-forecasting ✓')"
python -c "from datasets import Dataset; print('datasets ✓')"
python -c "from huggingface_hub import HfApi; print('huggingface-hub ✓')"
python -c "import jao; print(f'jao-py {jao.__version__}')"
```
### 2.6 Configure .gitignore (Data Exclusion) (2 minutes)
```bash
# Create .gitignore - CRITICAL for keeping data out of Git
cat > .gitignore << 'EOF'
# ============================================
# Data Files - NEVER commit to Git
# ============================================
# Following data science best practices:
# - Code goes in Git
# - Data goes in HuggingFace Datasets
data/
*.parquet
*.pkl
*.csv
*.h5
*.hdf5
*.feather
# ============================================
# Model Artifacts
# ============================================
models/checkpoints/
*.pth
*.safetensors
*.ckpt
# ============================================
# Credentials & Secrets
# ============================================
.env
config/api_keys.yaml
*.key
*.pem
# ============================================
# Python
# ============================================
__pycache__/
*.pyc
*.pyo
*.egg-info/
.pytest_cache/
.venv/
venv/
# ============================================
# IDE & OS
# ============================================
.vscode/
.idea/
*.swp
.DS_Store
Thumbs.db
# ============================================
# Jupyter
# ============================================
.ipynb_checkpoints/
# ============================================
# Temporary Files
# ============================================
*.tmp
*.log
.cache/
EOF
# Stage .gitignore
git add .gitignore
# Verify data/ will be ignored (data/ is already listed in the .gitignore above)
git check-ignore data/test.parquet
# Should output: data/test.parquet (confirming it's ignored)
```
**Why NO Git LFS?**
Following data science best practices:
- ✓ **Code** → Git (fast, version controlled)
- ✓ **Data** → HuggingFace Datasets (separate, scalable)
- ✗ **NOT** Git LFS (expensive, non-standard for ML projects)
**Data will be:**
- Downloaded via scripts (Day 1)
- Uploaded to HF Datasets (Day 1)
- Loaded programmatically (Days 2-5)
- NEVER committed to Git repository
### 2.7 Configure API Keys & HuggingFace Access (3 minutes)
```bash
# Create config directory structure
mkdir -p config
# Create API keys configuration
cat > config/api_keys.yaml << 'EOF'
# ENTSO-E Transparency Platform
entsoe_api_key: "YOUR_ENTSOE_API_KEY_HERE"
# OpenMeteo (free tier - no key required)
openmeteo_base_url: "https://api.open-meteo.com/v1/forecast"
# Hugging Face (for uploading datasets)
hf_token: "YOUR_HF_WRITE_TOKEN_HERE"
hf_username: "YOUR_HF_USERNAME"
EOF
# Create .env file for environment variables
cat > .env << 'EOF'
ENTSOE_API_KEY=YOUR_ENTSOE_API_KEY_HERE
OPENMETEO_BASE_URL=https://api.open-meteo.com/v1/forecast
HF_TOKEN=YOUR_HF_WRITE_TOKEN_HERE
HF_USERNAME=YOUR_HF_USERNAME
EOF
```
**Get your HuggingFace Write Token:**
1. Visit: https://huggingface.co/settings/tokens
2. Click "New token"
3. Name: "FBMC Dataset Upload"
4. Type: **Write** (required for uploading datasets)
5. Copy token
**Now edit the files with your actual credentials:**
```bash
# Option 1: Use text editor
nano config/api_keys.yaml   # Update all YOUR_*_HERE placeholders
nano .env                   # Update all YOUR_*_HERE placeholders
# Option 2: Use sed (replace with your actual values)
# (GNU sed syntax shown; on macOS/BSD sed, use: sed -i '' 's/.../.../' <files>)
sed -i 's/YOUR_ENTSOE_API_KEY_HERE/your-actual-entsoe-key/' config/api_keys.yaml .env
sed -i 's/YOUR_HF_WRITE_TOKEN_HERE/hf_your-actual-token/' config/api_keys.yaml .env
sed -i 's/YOUR_HF_USERNAME/your-username/' config/api_keys.yaml .env
```
**Verify credentials are set:**
```bash
# Should NOT see any "YOUR_*_HERE" placeholders
grep "YOUR_" config/api_keys.yaml
# Empty output = good!
```
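The `.env` file is read through `python-dotenv` (already in `requirements.txt`). A minimal sketch of how later scripts can pick up these variables without hard-coding secrets:
```python
# Minimal sketch: load credentials from .env instead of hard-coding them.
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory
entsoe_key = os.getenv("ENTSOE_API_KEY")
hf_token = os.getenv("HF_TOKEN")
print("ENTSO-E key set:", bool(entsoe_key), "| HF token set:", bool(hf_token))
```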
### 2.8 Create Data Management Utilities (5 minutes)
```bash
# Create data collection module with HF Datasets integration
cat > src/data_collection/hf_datasets_manager.py << 'EOF'
"""HuggingFace Datasets manager for FBMC data storage."""
import polars as pl
from datasets import Dataset, DatasetDict
from huggingface_hub import HfApi
from pathlib import Path
import yaml


class FBMCDatasetManager:
    """Manage FBMC data uploads/downloads via HuggingFace Datasets."""

    def __init__(self, config_path: str = "config/api_keys.yaml"):
        """Initialize with HF credentials."""
        with open(config_path) as f:
            config = yaml.safe_load(f)
        self.hf_token = config['hf_token']
        self.hf_username = config['hf_username']
        self.api = HfApi(token=self.hf_token)

    def upload_dataset(self, parquet_path: Path, dataset_name: str, description: str = ""):
        """Upload Parquet file to HuggingFace Datasets."""
        print(f"Uploading {parquet_path.name} to HF Datasets...")
        # Load Parquet as polars, convert to HF Dataset
        df = pl.read_parquet(parquet_path)
        dataset = Dataset.from_pandas(df.to_pandas())
        # Create full dataset name
        full_name = f"{self.hf_username}/{dataset_name}"
        # Upload to HF
        dataset.push_to_hub(
            full_name,
            token=self.hf_token,
            private=False  # Public datasets (free storage)
        )
        print(f"✓ Uploaded to: https://huggingface.co/datasets/{full_name}")
        return full_name

    def download_dataset(self, dataset_name: str, output_path: Path):
        """Download dataset from HF to local Parquet."""
        from datasets import load_dataset
        print(f"Downloading {dataset_name} from HF Datasets...")
        # Download from HF
        dataset = load_dataset(
            f"{self.hf_username}/{dataset_name}",
            split="train"
        )
        # Convert to polars and save
        df = pl.from_pandas(dataset.to_pandas())
        output_path.parent.mkdir(parents=True, exist_ok=True)
        df.write_parquet(output_path)
        print(f"✓ Downloaded to: {output_path}")
        return df

    def list_datasets(self):
        """List all FBMC datasets for this user."""
        datasets = self.api.list_datasets(author=self.hf_username)
        fbmc_datasets = [d for d in datasets if 'fbmc' in d.id.lower()]
        print(f"\nFBMC Datasets for {self.hf_username}:")
        for ds in fbmc_datasets:
            print(f"  - {ds.id}")
        return fbmc_datasets


# Example usage (will be used in Day 1)
if __name__ == "__main__":
    manager = FBMCDatasetManager()
    # Upload example (Day 1 will use this)
    # manager.upload_dataset(
    #     parquet_path=Path("data/raw/cnecs_2023_2025.parquet"),
    #     dataset_name="fbmc-cnecs-2023-2025",
    #     description="FBMC CNECs data: Oct 2023 - Sept 2025"
    # )
    # Download example (HF Space will use this)
    # manager.download_dataset(
    #     dataset_name="fbmc-cnecs-2023-2025",
    #     output_path=Path("data/raw/cnecs_2023_2025.parquet")
    # )
EOF
# Create data download orchestrator
cat > src/data_collection/download_all.py << 'EOF'
"""Download all FBMC data from HuggingFace Datasets."""
from pathlib import Path

try:
    # Works when imported as a package (e.g. from the HF Space's app.py)
    from src.data_collection.hf_datasets_manager import FBMCDatasetManager
except ImportError:
    # Works when run directly: python src/data_collection/download_all.py
    from hf_datasets_manager import FBMCDatasetManager


def setup_data(data_dir: Path = Path("data/raw")):
    """Download all datasets if not present locally."""
    manager = FBMCDatasetManager()
    datasets_to_download = {
        "fbmc-cnecs-2023-2025": "cnecs_2023_2025.parquet",
        "fbmc-weather-2023-2025": "weather_2023_2025.parquet",
        "fbmc-entsoe-2023-2025": "entsoe_2023_2025.parquet",
    }
    data_dir.mkdir(parents=True, exist_ok=True)
    for dataset_name, filename in datasets_to_download.items():
        output_path = data_dir / filename
        if output_path.exists():
            print(f"✓ {filename} already exists, skipping")
        else:
            try:
                manager.download_dataset(dataset_name, output_path)
            except Exception as e:
                print(f"✗ Failed to download {dataset_name}: {e}")
                print("  You may need to run Day 1 data collection first")
    print("\n✓ Data setup complete")


if __name__ == "__main__":
    setup_data()
EOF
# Make scripts executable
chmod +x src/data_collection/hf_datasets_manager.py
chmod +x src/data_collection/download_all.py
echo "✓ Data management utilities created"
```
**What This Does:**
- `hf_datasets_manager.py`: Upload/download Parquet files to/from HF Datasets
- `download_all.py`: One-command data setup for HF Space or analysts
**Day 1 Workflow:**
1. Download data from JAO/ENTSO-E/OpenMeteo to `data/raw/`
2. Upload each Parquet to HF Datasets, separate from Git (see the sketch below)
3. Git repo stays small (only code)
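As a sketch of step 2, Day 1 might loop over the local Parquet files and push each one with the manager defined above. Paths and dataset names mirror the commented examples in the module; adjust to whatever Day 1 actually produces:
```python
# Sketch of the Day 1 upload step using FBMCDatasetManager (defined above).
# Paths and dataset names are this guide's placeholders, not a fixed contract.
from pathlib import Path
from src.data_collection.hf_datasets_manager import FBMCDatasetManager

manager = FBMCDatasetManager()
for parquet, name in [
    (Path("data/raw/cnecs_2023_2025.parquet"), "fbmc-cnecs-2023-2025"),
    (Path("data/raw/weather_2023_2025.parquet"), "fbmc-weather-2023-2025"),
    (Path("data/raw/entsoe_2023_2025.parquet"), "fbmc-entsoe-2023-2025"),
]:
    if parquet.exists():
        manager.upload_dataset(parquet, name)
```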
**HF Space Workflow:**
```python
# In your Space's app.py startup:
from src.data_collection.download_all import setup_data
setup_data()  # Downloads from HF Datasets, not Git
```
### 2.9 Create First Marimo Notebook (5 minutes)
```bash
# Create initial exploration notebook
cat > notebooks/01_data_exploration.py << 'EOF'
import marimo

__generated_with = "0.9.0"
app = marimo.App(width="medium")


@app.cell
def __():
    import marimo as mo
    import polars as pl
    import altair as alt
    from pathlib import Path
    return mo, pl, alt, Path


@app.cell
def __(mo):
    mo.md(
        """
        # FBMC Flow Forecasting - Data Exploration
        **Day 1 Objective**: Explore JAO FBMC data structure
        ## Steps:
        1. Load downloaded Parquet files
        2. Inspect CNECs, PTDFs, RAMs
        3. Identify top 200 binding CNECs (50 Tier-1 + 150 Tier-2)
        4. Visualize temporal patterns
        """
    )
    return


@app.cell
def __(Path):
    # Data paths
    DATA_DIR = Path("../data/raw")
    CNECS_FILE = DATA_DIR / "cnecs_2023_2025.parquet"
    return DATA_DIR, CNECS_FILE


@app.cell
def __(mo, CNECS_FILE):
    # Check if data exists; make the message the cell's last expression so it renders
    if CNECS_FILE.exists():
        status = mo.md("✓ CNECs data found - ready for Day 1 analysis")
    else:
        status = mo.md("⚠ CNECs data not yet downloaded - run Day 1 collection script")
    status
    return


if __name__ == "__main__":
    app.run()
EOF
# Test Marimo installation (runs in the foreground)
marimo edit notebooks/01_data_exploration.py
# This will open a browser tab with the interactive notebook
# Close the tab and press Ctrl+C in the terminal once you have verified it loads
```
### 2.10 Create Utility Modules (2 minutes)
```bash
# Create data loading utilities
cat > src/utils/data_loader.py << 'EOF'
"""Data loading utilities for FBMC forecasting project."""
import polars as pl
from pathlib import Path
from typing import Optional


def load_cnecs(data_dir: Path, start_date: Optional[str] = None, end_date: Optional[str] = None) -> pl.DataFrame:
    """Load CNEC data with optional date filtering."""
    cnecs = pl.read_parquet(data_dir / "cnecs_2023_2025.parquet")
    if start_date:
        cnecs = cnecs.filter(pl.col("timestamp") >= start_date)
    if end_date:
        cnecs = cnecs.filter(pl.col("timestamp") <= end_date)
    return cnecs


def load_weather(data_dir: Path, grid_points: Optional[list] = None) -> pl.DataFrame:
    """Load weather data with optional grid point filtering."""
    weather = pl.read_parquet(data_dir / "weather_2023_2025.parquet")
    if grid_points:
        weather = weather.filter(pl.col("grid_point").is_in(grid_points))
    return weather
EOF
# Create __init__.py files
touch src/__init__.py
touch src/utils/__init__.py
touch src/data_collection/__init__.py
touch src/feature_engineering/__init__.py
touch src/model/__init__.py
```
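A quick usage sketch for these helpers, assuming the Day 1 files already sit in `data/raw/` and contain the `timestamp` column the loader filters on:
```python
# Usage sketch for the loaders above (run from the repo root; data from Day 1 assumed).
from pathlib import Path
from src.utils.data_loader import load_cnecs, load_weather

cnecs_jan = load_cnecs(Path("data/raw"), start_date="2024-01-01", end_date="2024-01-31")
weather = load_weather(Path("data/raw"))
print(cnecs_jan.shape, weather.shape)
```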
### 2.11 Initial Commit (2 minutes)
```bash
# Stage all changes (note: data/ is excluded by .gitignore)
git add .
# Create initial commit
git commit -m "Day 0: Initialize FBMC forecasting MVP environment
- Add project structure (notebooks, src, config, tools)
- Configure uv + polars + Marimo + Chronos + HF Datasets stack
- Create .gitignore (excludes data/ following best practices)
- Install jao-py Python library for JAO data access
- Configure ENTSO-E, OpenMeteo, and HuggingFace API access
- Add HF Datasets manager for data storage (separate from Git)
- Create data download utilities (download_all.py)
- Create initial exploration notebook
Data Strategy:
- Code → Git (this repo)
- Data → HuggingFace Datasets (separate, not in Git)
- NO Git LFS (following data science best practices)
Infrastructure: HF Space (A10G GPU, \$30/month)"
# Push to HF Space
git push origin main
# Verify push succeeded
git status
# Should show: "Your branch is up to date with 'origin/main'"
# Verify no data files were committed
git ls-files | grep "\.parquet"
# Should be empty (no .parquet files in Git)
```
---
## Step 3: Verify Complete Setup (5 minutes)
### 3.1 Python Environment Verification
```bash
# Activate environment if not already
source .venv/bin/activate
# Run comprehensive checks
python << 'EOF'
import sys
print(f"Python: {sys.version}")
packages = [
    "polars", "pyarrow", "numpy", "sklearn",  # scikit-learn imports as "sklearn"
    "torch", "transformers", "marimo", "altair",
    "entsoe", "jao", "requests", "yaml", "gradio",
    "datasets", "huggingface_hub"
]
| print("\nPackage Versions:") | |
| for pkg in packages: | |
| try: | |
| if pkg == "entsoe": | |
| import entsoe | |
| print(f"✓ entsoe-py: {entsoe.__version__}") | |
| elif pkg == "jao": | |
| import jao | |
| print(f"✓ jao-py: {jao.__version__}") | |
| elif pkg == "yaml": | |
| import yaml | |
| print(f"✓ pyyaml: {yaml.__version__}") | |
| elif pkg == "huggingface_hub": | |
| from huggingface_hub import HfApi | |
| print(f"✓ huggingface-hub: Ready") | |
| else: | |
| mod = __import__(pkg) | |
| print(f"✓ {pkg}: {mod.__version__}") | |
| except Exception as e: | |
| print(f"✗ {pkg}: {e}") | |
| # Test Chronos specifically | |
| try: | |
| from chronos import ChronosPipeline | |
| print("\n✓ Chronos forecasting: Ready") | |
| except Exception as e: | |
| print(f"\n✗ Chronos forecasting: {e}") | |
| # Test HF Datasets | |
| try: | |
| from datasets import Dataset | |
| print("✓ HuggingFace Datasets: Ready") | |
| except Exception as e: | |
| print(f"✗ HuggingFace Datasets: {e}") | |
| print("\nAll checks complete!") | |
| EOF | |
| ``` | |
| ### 3.2 API Access Verification | |
| ```bash | |
| # Test ENTSO-E API | |
| python << 'EOF' | |
| from entsoe import EntsoePandasClient | |
| import yaml | |
| # Load API key | |
| with open('config/api_keys.yaml') as f: | |
| config = yaml.safe_load(f) | |
| api_key = config['entsoe_api_key'] | |
| if 'YOUR_ENTSOE_API_KEY_HERE' in api_key: | |
| print("⚠ ENTSO-E API key not configured - update config/api_keys.yaml") | |
| else: | |
| try: | |
| client = EntsoePandasClient(api_key=api_key) | |
| print("✓ ENTSO-E API client initialized successfully") | |
| except Exception as e: | |
| print(f"✗ ENTSO-E API error: {e}") | |
| EOF | |
| # Test OpenMeteo API | |
| python << 'EOF' | |
| import requests | |
response = requests.get(
    "https://api.open-meteo.com/v1/forecast",
    params={
        "latitude": 52.52,
        "longitude": 13.41,
        "hourly": "temperature_2m",
        # No start_date/end_date: the forecast endpoint only accepts a narrow
        # recent window, and the defaults are enough for a connectivity check
    }
)
if response.status_code == 200:
    print("✓ OpenMeteo API accessible")
else:
    print(f"✗ OpenMeteo API error: {response.status_code}")
EOF

# Test HuggingFace authentication
python << 'EOF'
from huggingface_hub import HfApi
import yaml

with open('config/api_keys.yaml') as f:
    config = yaml.safe_load(f)
hf_token = config['hf_token']
hf_username = config['hf_username']

if 'YOUR_HF' in hf_token or 'YOUR_HF' in hf_username:
    print("⚠ HuggingFace credentials not configured - update config/api_keys.yaml")
else:
    try:
        api = HfApi(token=hf_token)
        user_info = api.whoami()
        print(f"✓ HuggingFace authenticated as: {user_info['name']}")
| print(f" Can create datasets: {'datasets' in user_info.get('auth', {}).get('accessToken', {}).get('role', '')}") | |
    except Exception as e:
        print(f"✗ HuggingFace authentication error: {e}")
        print("  Verify token has WRITE permissions")
EOF
```
### 3.3 HF Space Verification
```bash
# Check HF Space status
echo "Visit your HF Space: https://huggingface.co/spaces/YOUR_USERNAME/fbmc-forecasting"
echo ""
echo "Verify:"
echo "  1. JupyterLab interface loads"
echo "  2. Hardware shows 'A10G GPU' in bottom-right"
echo "  3. Files from git push are visible"
echo "  4. Can create new notebook"
```
### 3.4 Final Checklist
```bash
# Print final status
cat << 'EOF'
╔═══════════════════════════════════════════════════════════╗
║            DAY 0 SETUP VERIFICATION CHECKLIST             ║
╚═══════════════════════════════════════════════════════════╝
Environment:
  [ ] Python 3.10+ installed
  [ ] Git installed (NO Git LFS needed)
  [ ] uv package manager installed
Local Setup:
  [ ] Virtual environment created and activated
  [ ] All Python dependencies installed (20 top-level packages, including jao-py)
  [ ] API keys configured (ENTSO-E + OpenMeteo + HuggingFace)
  [ ] HuggingFace write token obtained
  [ ] Project structure created (8 directories)
  [ ] .gitignore configured (data/ excluded)
  [ ] Initial Marimo notebook created
  [ ] Data management utilities created (hf_datasets_manager.py)
Git & HF Space:
  [ ] HF Space created (A10G GPU, $30/month)
  [ ] Repository cloned locally
  [ ] .gitignore excludes all data files (*.parquet, data/)
  [ ] Initial commit pushed to HF Space (code only, NO data)
  [ ] HF Space JupyterLab accessible
  [ ] Git repo size < 50 MB (no data committed)
Verification Tests:
  [ ] Python imports successful (polars, chronos, jao-py, datasets, etc.)
  [ ] ENTSO-E API client initializes
  [ ] OpenMeteo API responds (status 200)
  [ ] HuggingFace authentication successful (write access)
  [ ] Marimo notebook opens in browser
Data Strategy Confirmed:
  [ ] Code goes in Git (version controlled)
  [ ] Data goes in HuggingFace Datasets (separate storage)
  [ ] NO Git LFS setup (following data science best practices)
  [ ] data/ directory in .gitignore
Ready for Day 1: [ ]
Next Step: Run Day 1 data collection (8 hours)
  - Download data locally via jao-py/APIs
  - Upload to HuggingFace Datasets (separate from Git)
  - Total data: ~12 GB (stored in HF Datasets, NOT Git)
EOF
```
---
## Troubleshooting
### Issue: uv installation fails
```bash
# Alternative: Use pip directly
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```
### Issue: Git LFS files not syncing
**Not applicable** - We're using HuggingFace Datasets, not Git LFS.
If you see Git LFS references, you may have an old version of this guide. Data files should NEVER be in Git.
### Issue: HuggingFace authentication fails
```bash
# Verify token is correct
python << 'EOF'
from huggingface_hub import HfApi
import yaml

with open('config/api_keys.yaml') as f:
    config = yaml.safe_load(f)

try:
    api = HfApi(token=config['hf_token'])
    print(api.whoami())
except Exception as e:
    print(f"Error: {e}")
    print("\nTroubleshooting:")
    print("1. Visit: https://huggingface.co/settings/tokens")
    print("2. Verify token has WRITE permission")
    print("3. Copy token exactly (starts with 'hf_')")
    print("4. Update config/api_keys.yaml and .env")
EOF
```
### Issue: Cannot upload to HuggingFace Datasets
```bash
# Common causes:
# 1. Token doesn't have write permissions
#    Fix: Create new token with "write" scope
# 2. Dataset name already exists
#    Fix: Use different name or add version suffix
#    Example: fbmc-cnecs-2023-2025-v2
# 3. File too large (>5GB single file limit)
#    Fix: Split into multiple datasets or use sharding
# Test upload with small sample:
python << 'EOF'
from datasets import Dataset
import pandas as pd

# Create tiny test dataset
df = pd.DataFrame({"col1": [1, 2, 3], "col2": ["a", "b", "c"]})
dataset = Dataset.from_pandas(df)

# Try uploading
try:
    dataset.push_to_hub("YOUR_USERNAME/test-dataset", token="YOUR_TOKEN")
    print("✓ Upload successful - authentication works")
except Exception as e:
    print(f"✗ Upload failed: {e}")
EOF
```
### Issue: Marimo notebook won't open
```bash
# Check marimo installation
marimo --version
# Try running without opening browser
marimo run notebooks/01_data_exploration.py
# Check for port conflicts
lsof -i :2718  # Default Marimo port
```
### Issue: ENTSO-E API key invalid
```bash
# Verify key in ENTSO-E Transparency Platform:
# 1. Login: https://transparency.entsoe.eu/
# 2. Navigate: Account Settings → Web API Security Token
# 3. Copy key exactly (no spaces)
# 4. Update: config/api_keys.yaml and .env
```
### Issue: HF Space shows "Building..." forever
```bash
# Check HF Space logs:
# Visit: https://huggingface.co/spaces/YOUR_USERNAME/fbmc-forecasting
# Click: "Settings" → "Logs"
# Common fix: Ensure requirements.txt is valid
# Test locally:
pip install -r requirements.txt --dry-run
```
### Issue: jao-py import fails
```bash
# Verify jao-py installation
python -c "import jao; print(jao.__version__)"
# If missing, reinstall
uv pip install "jao-py>=0.6.0"
# Check package is in environment
uv pip list | grep jao
```
---
## What's Next: Day 1 Preview
**Day 1 Objective**: Download 24 months of historical data (Oct 2023 - Sept 2025)
**Data Collection Tasks:**
1. **JAO FBMC Data** (4-5 hours)
   - CNECs: ~900 MB (24 months)
   - PTDFs: ~1.5 GB (24 months)
   - RAMs: ~800 MB (24 months)
   - Shadow prices: ~600 MB (24 months)
   - LTN nominations: ~400 MB (24 months)
   - Net positions: ~300 MB (24 months)
2. **ENTSO-E Data** (2-3 hours)
   - Generation forecasts: 13 zones × 24 months
   - Actual generation: 13 zones × 24 months
   - Cross-border flows: ~20 borders × 24 months
3. **OpenMeteo Weather** (1-2 hours)
   - 52 grid points × 24 months
   - 8 variables per point
   - Parallel download optimization
**Total Data Size**: ~12 GB (compressed Parquet)
**Day 1 Script**: Will use the jao-py Python library with rate limiting and parallel download logic.
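For a feel of the parallel, rate-limited pattern, here is a minimal sketch; the grid points, endpoint parameters, worker count, and 0.2 s pacing are illustrative assumptions, not the final Day 1 script:
```python
# Illustrative sketch only: rate-limited parallel weather fetches.
import time
from concurrent.futures import ThreadPoolExecutor

import requests

GRID_POINTS = [(52.52, 13.41), (48.86, 2.35)]  # hypothetical sample of the 52 points

def fetch_point(lat: float, lon: float) -> dict:
    """Fetch hourly temperature for one grid point, pausing briefly between requests."""
    resp = requests.get(
        "https://api.open-meteo.com/v1/forecast",
        params={"latitude": lat, "longitude": lon, "hourly": "temperature_2m"},
        timeout=30,
    )
    resp.raise_for_status()
    time.sleep(0.2)  # crude client-side rate limiting
    return resp.json()

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(lambda p: fetch_point(*p), GRID_POINTS))

print(f"Fetched {len(results)} grid points")
```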
---
## Summary
**Time Investment**: 45 minutes
**Result**: Production-ready local + cloud development environment
**You Now Have:**
- ✓ HF Space with A10G GPU ($30/month)
- ✓ Local Python environment (20 top-level packages, including jao-py and HF Datasets)
- ✓ jao-py Python library for JAO data access
- ✓ ENTSO-E + OpenMeteo + HuggingFace API access configured
- ✓ HuggingFace Datasets manager for data storage (separate from Git)
- ✓ Data download/upload utilities (hf_datasets_manager.py)
- ✓ Marimo reactive notebook environment
- ✓ .gitignore configured (data/ excluded, following best practices)
- ✓ Complete project structure (8 directories)
**Data Strategy Implemented:**
```
Code (version controlled)   → Git Repository (~50 MB)
Data (storage & versioning) → HuggingFace Datasets (~12 GB)
NO Git LFS (following data science best practices)
```
**Ready For**: Day 1 data collection (8 hours)
- Download 24 months data locally (jao-py + APIs)
- Upload to HuggingFace Datasets (not Git)
- Git repo stays clean (code only)
---
**Document Version**: 2.0
**Last Updated**: 2025-10-29
**Project**: FBMC Flow Forecasting MVP (Zero-Shot)