# FBMC Flow Forecasting MVP - Day 0 Quick Start Guide
## Environment Setup (45 Minutes)
**Target**: From zero to working local + HF Space environment with all dependencies verified
---
## Prerequisites Check (5 minutes)
Before starting, verify you have:
```bash
# Check Git
git --version
# Need: 2.x+
# Check Python
python3 --version
# Need: 3.10+
```
**API Keys & Accounts Ready:**
- [ ] ENTSO-E Transparency Platform API key
- [ ] Hugging Face account with payment method for Spaces
- [ ] Hugging Face write token (for uploading datasets)
**Important Data Storage Philosophy:**
- **Code** → Git repository (small, version controlled)
- **Data** → HuggingFace Datasets (separate, not in Git)
- **NO Git LFS** needed (following data science best practices)
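A minimal sketch of what this split looks like at runtime (the dataset name comes from the Day 1 plan below; swap in your own username):
```python
# Sketch: load project data from HuggingFace Datasets at runtime,
# rather than from files committed to Git
from datasets import load_dataset

# Assumes the Day 1 upload created this dataset under your account
ds = load_dataset("YOUR_USERNAME/fbmc-cnecs-2023-2025", split="train")
print(ds.num_rows)
```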
---
## Step 1: Create Hugging Face Space (10 minutes)
1. **Navigate to**: https://huggingface.co/new-space
2. **Configure Space:**
- **Owner**: Your username/organization
- **Space name**: `fbmc-forecasting` (or your preference)
- **License**: Apache 2.0
- **Select SDK**: `JupyterLab`
- **Select Hardware**: `A10G GPU ($30/month)` ← **CRITICAL**
- **Visibility**: Private (recommended for MVP)
3. **Create Space** button
4. **Wait 2-3 minutes** for Space initialization
5. **Verify Space Access:**
- Visit: `https://huggingface.co/spaces/YOUR_USERNAME/fbmc-forecasting`
- Confirm JupyterLab interface loads
- Check hardware: Should show "A10G GPU" in bottom-right
---
## Step 2: Local Environment Setup (25 minutes)
### 2.1 Clone HF Space Locally (2 minutes)
```bash
# Clone your HF Space
git clone https://huggingface.co/spaces/YOUR_USERNAME/fbmc-forecasting
cd fbmc-forecasting
# Verify remote
git remote -v
# Should show: https://huggingface.co/spaces/YOUR_USERNAME/fbmc-forecasting
```
### 2.2 Create Directory Structure (1 minute)
```bash
# Create project directories
mkdir -p notebooks \
notebooks_exported \
src/{data_collection,feature_engineering,model,utils} \
config \
results/{forecasts,evaluation,visualizations} \
docs \
tools \
tests
# Note: data/ directory will be created by download scripts
# It is NOT tracked in Git (following best practices)
# Verify structure
tree -L 2
```
### 2.3 Install uv Package Manager (2 minutes)
```bash
# Install uv (ultra-fast pip replacement)
curl -LsSf https://astral.sh/uv/install.sh | sh
# Add to PATH (if not automatic; newer installers use ~/.local/bin, older ones ~/.cargo/bin)
export PATH="$HOME/.local/bin:$HOME/.cargo/bin:$PATH"
# Verify installation
uv --version
# Should show: uv 0.x.x
```
### 2.4 Create Virtual Environment (1 minute)
```bash
# Create .venv with uv
uv venv
# Activate (Linux/Mac)
source .venv/bin/activate
# Activate (Windows)
# .venv\Scripts\activate
# Verify activation
which python
# Should point to: /path/to/fbmc-forecasting/.venv/bin/python
```
### 2.5 Install Dependencies (2 minutes)
```bash
# Create requirements.txt
cat > requirements.txt << 'EOF'
# Core Data & ML
polars>=0.20.0
pyarrow>=13.0.0
numpy>=1.24.0
scikit-learn>=1.3.0
# Time Series Forecasting
chronos-forecasting>=1.0.0
transformers>=4.35.0
torch>=2.0.0
# Data Collection
entsoe-py>=0.5.0
jao-py>=0.6.0
requests>=2.31.0
# HuggingFace Integration (for Datasets, NOT Git LFS)
datasets>=2.14.0
huggingface-hub>=0.17.0
# Visualization & Notebooks
altair>=5.0.0
marimo>=0.9.0
jupyter>=1.0.0
ipykernel>=6.25.0
# Utilities
pyyaml>=6.0.0
python-dotenv>=1.0.0
tqdm>=4.66.0
# HF Space Integration
gradio>=4.0.0
EOF
# Install with uv (ultra-fast)
uv pip install -r requirements.txt
# Create lockfile for reproducibility
uv pip compile requirements.txt -o requirements.lock
```
**Verify installations:**
```bash
python -c "import polars; print(f'polars {polars.__version__}')"
python -c "import marimo; print(f'marimo {marimo.__version__}')"
python -c "import torch; print(f'torch {torch.__version__}')"
python -c "from chronos import ChronosPipeline; print('chronos-forecasting ✓')"
python -c "from datasets import Dataset; print('datasets ✓')"
python -c "from huggingface_hub import HfApi; print('huggingface-hub ✓')"
python -c "import jao; print(f'jao-py {jao.__version__}')"
```
### 2.6 Configure .gitignore (Data Exclusion) (2 minutes)
```bash
# Create .gitignore - CRITICAL for keeping data out of Git
cat > .gitignore << 'EOF'
# ============================================
# Data Files - NEVER commit to Git
# ============================================
# Following data science best practices:
# - Code goes in Git
# - Data goes in HuggingFace Datasets
data/
*.parquet
*.pkl
*.csv
*.h5
*.hdf5
*.feather
# ============================================
# Model Artifacts
# ============================================
models/checkpoints/
*.pth
*.safetensors
*.ckpt
# ============================================
# Credentials & Secrets
# ============================================
.env
config/api_keys.yaml
*.key
*.pem
# ============================================
# Python
# ============================================
__pycache__/
*.pyc
*.pyo
*.egg-info/
.pytest_cache/
.venv/
venv/
# ============================================
# IDE & OS
# ============================================
.vscode/
.idea/
*.swp
.DS_Store
Thumbs.db
# ============================================
# Jupyter
# ============================================
.ipynb_checkpoints/
# ============================================
# Temporary Files
# ============================================
*.tmp
*.log
.cache/
EOF
# Stage .gitignore
git add .gitignore
# Verify data/ will be ignored (data/ is already listed in .gitignore above)
git check-ignore data/test.parquet
# Should output: data/test.parquet (confirming it's ignored)
```
**Why NO Git LFS?**
Following data science best practices:
- ✓ **Code** → Git (fast, version controlled)
- ✓ **Data** → HuggingFace Datasets (separate, scalable)
- ✗ **NOT** Git LFS (expensive, non-standard for ML projects)
**Data will be:**
- Downloaded via scripts (Day 1)
- Uploaded to HF Datasets (Day 1)
- Loaded programmatically (Days 2-5)
- NEVER committed to Git repository
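To keep that last guarantee honest, here is a small sketch that scans the Git index for data files (it uses only `git ls-files`; the extensions mirror the .gitignore above):
```python
# Sketch: fail loudly if any data file ever ends up tracked by Git
import subprocess

tracked = subprocess.run(
    ["git", "ls-files"], capture_output=True, text=True, check=True
).stdout.splitlines()
data_files = [
    f for f in tracked
    if f.startswith("data/") or f.endswith((".parquet", ".csv", ".pkl", ".feather"))
]
print("✓ no data files tracked" if not data_files else f"✗ tracked data files: {data_files}")
```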
### 2.7 Configure API Keys & HuggingFace Access (3 minutes)
```bash
# Create config directory structure
mkdir -p config
# Create API keys configuration
cat > config/api_keys.yaml << 'EOF'
# ENTSO-E Transparency Platform
entsoe_api_key: "YOUR_ENTSOE_API_KEY_HERE"
# OpenMeteo (free tier - no key required)
openmeteo_base_url: "https://api.open-meteo.com/v1/forecast"
# Hugging Face (for uploading datasets)
hf_token: "YOUR_HF_WRITE_TOKEN_HERE"
hf_username: "YOUR_HF_USERNAME"
EOF
# Create .env file for environment variables
cat > .env << 'EOF'
ENTSOE_API_KEY=YOUR_ENTSOE_API_KEY_HERE
OPENMETEO_BASE_URL=https://api.open-meteo.com/v1/forecast
HF_TOKEN=YOUR_HF_WRITE_TOKEN_HERE
HF_USERNAME=YOUR_HF_USERNAME
EOF
```
**Get your HuggingFace Write Token:**
1. Visit: https://huggingface.co/settings/tokens
2. Click "New token"
3. Name: "FBMC Dataset Upload"
4. Type: **Write** (required for uploading datasets)
5. Copy token
**Now edit the files with your actual credentials:**
```bash
# Option 1: Use text editor
nano config/api_keys.yaml # Update all YOUR_*_HERE placeholders
nano .env # Update all YOUR_*_HERE placeholders
# Option 2: Use sed (replace with your actual values)
# Note: on macOS/BSD sed, use `sed -i ''` instead of `sed -i`
sed -i 's/YOUR_ENTSOE_API_KEY_HERE/your-actual-entsoe-key/' config/api_keys.yaml .env
sed -i 's/YOUR_HF_WRITE_TOKEN_HERE/hf_your-actual-token/' config/api_keys.yaml .env
sed -i 's/YOUR_HF_USERNAME/your-username/' config/api_keys.yaml .env
```
**Verify credentials are set:**
```bash
# Should NOT see any "YOUR_*_HERE" placeholders
grep "YOUR_" config/api_keys.yaml
# Empty output = good!
```
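You can also sanity-check the `.env` side. A minimal sketch using `python-dotenv` (already in requirements.txt), with variable names matching the `.env` created above:
```python
# Sketch: confirm .env placeholders were replaced
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current directory
for var in ("ENTSOE_API_KEY", "HF_TOKEN", "HF_USERNAME"):
    value = os.getenv(var, "")
    ok = value and not value.startswith("YOUR_")
    print(f"{'✓' if ok else '✗'} {var}")
```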
### 2.8 Create Data Management Utilities (5 minutes)
```bash
# Create data collection module with HF Datasets integration
cat > src/data_collection/hf_datasets_manager.py << 'EOF'
"""HuggingFace Datasets manager for FBMC data storage."""
import polars as pl
from datasets import Dataset, DatasetDict
from huggingface_hub import HfApi
from pathlib import Path
import yaml
class FBMCDatasetManager:
    """Manage FBMC data uploads/downloads via HuggingFace Datasets."""

    def __init__(self, config_path: str = "config/api_keys.yaml"):
        """Initialize with HF credentials."""
        with open(config_path) as f:
            config = yaml.safe_load(f)
        self.hf_token = config['hf_token']
        self.hf_username = config['hf_username']
        self.api = HfApi(token=self.hf_token)

    def upload_dataset(self, parquet_path: Path, dataset_name: str, description: str = ""):
        """Upload Parquet file to HuggingFace Datasets."""
        print(f"Uploading {parquet_path.name} to HF Datasets...")
        # Load Parquet as polars, convert to HF Dataset
        df = pl.read_parquet(parquet_path)
        dataset = Dataset.from_pandas(df.to_pandas())
        # Create full dataset name
        full_name = f"{self.hf_username}/{dataset_name}"
        # Upload to HF
        dataset.push_to_hub(
            full_name,
            token=self.hf_token,
            private=False  # Public datasets (free storage)
        )
        print(f"✓ Uploaded to: https://huggingface.co/datasets/{full_name}")
        return full_name

    def download_dataset(self, dataset_name: str, output_path: Path):
        """Download dataset from HF to local Parquet."""
        from datasets import load_dataset
        print(f"Downloading {dataset_name} from HF Datasets...")
        # Download from HF
        dataset = load_dataset(
            f"{self.hf_username}/{dataset_name}",
            split="train"
        )
        # Convert to polars and save
        df = pl.from_pandas(dataset.to_pandas())
        output_path.parent.mkdir(parents=True, exist_ok=True)
        df.write_parquet(output_path)
        print(f"✓ Downloaded to: {output_path}")
        return df

    def list_datasets(self):
        """List all FBMC datasets for this user."""
        datasets = self.api.list_datasets(author=self.hf_username)
        fbmc_datasets = [d for d in datasets if 'fbmc' in d.id.lower()]
        print(f"\nFBMC Datasets for {self.hf_username}:")
        for ds in fbmc_datasets:
            print(f"  - {ds.id}")
        return fbmc_datasets

# Example usage (will be used in Day 1)
if __name__ == "__main__":
    manager = FBMCDatasetManager()
    # Upload example (Day 1 will use this)
    # manager.upload_dataset(
    #     parquet_path=Path("data/raw/cnecs_2023_2025.parquet"),
    #     dataset_name="fbmc-cnecs-2023-2025",
    #     description="FBMC CNECs data: Oct 2023 - Sept 2025"
    # )
    # Download example (HF Space will use this)
    # manager.download_dataset(
    #     dataset_name="fbmc-cnecs-2023-2025",
    #     output_path=Path("data/raw/cnecs_2023_2025.parquet")
    # )
EOF
# Create data download orchestrator
cat > src/data_collection/download_all.py << 'EOF'
"""Download all FBMC data from HuggingFace Datasets."""
from pathlib import Path
# Absolute import when loaded as a package (e.g. from the HF Space),
# local import when run directly as a script
try:
    from src.data_collection.hf_datasets_manager import FBMCDatasetManager
except ImportError:
    from hf_datasets_manager import FBMCDatasetManager

def setup_data(data_dir: Path = Path("data/raw")):
    """Download all datasets if not present locally."""
    manager = FBMCDatasetManager()
    datasets_to_download = {
        "fbmc-cnecs-2023-2025": "cnecs_2023_2025.parquet",
        "fbmc-weather-2023-2025": "weather_2023_2025.parquet",
        "fbmc-entsoe-2023-2025": "entsoe_2023_2025.parquet",
    }
    data_dir.mkdir(parents=True, exist_ok=True)
    for dataset_name, filename in datasets_to_download.items():
        output_path = data_dir / filename
        if output_path.exists():
            print(f"✓ {filename} already exists, skipping")
        else:
            try:
                manager.download_dataset(dataset_name, output_path)
            except Exception as e:
                print(f"✗ Failed to download {dataset_name}: {e}")
                print("  You may need to run Day 1 data collection first")
    print("\n✓ Data setup complete")

if __name__ == "__main__":
    setup_data()
EOF
# Make scripts executable
chmod +x src/data_collection/hf_datasets_manager.py
chmod +x src/data_collection/download_all.py
echo "✓ Data management utilities created"
```
**What This Does:**
- `hf_datasets_manager.py`: Upload/download Parquet files to/from HF Datasets
- `download_all.py`: One-command data setup for HF Space or analysts
**Day 1 Workflow:**
1. Download data from JAO/ENTSO-E/OpenMeteo to `data/raw/`
2. Upload each Parquet to HF Datasets (separate from Git)
3. Git repo stays small (only code)
**HF Space Workflow:**
```python
# In your Space's app.py startup:
from src.data_collection.download_all import setup_data
setup_data() # Downloads from HF Datasets, not Git
```
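If you want to exercise the manager before Day 1, here is a hedged round-trip sketch (the `fbmc-smoke-test` dataset name is a throwaway, not part of the Day 1 plan; run after credentials are configured):
```python
# Sketch: round-trip smoke test for FBMCDatasetManager
from pathlib import Path
import polars as pl
from src.data_collection.hf_datasets_manager import FBMCDatasetManager

# Write a tiny Parquet file to upload
Path("data/raw").mkdir(parents=True, exist_ok=True)
sample = pl.DataFrame({"timestamp": ["2024-01-01T00:00"], "value": [1.0]})
sample.write_parquet("data/raw/smoke_test.parquet")

manager = FBMCDatasetManager()
manager.upload_dataset(Path("data/raw/smoke_test.parquet"), "fbmc-smoke-test")
df = manager.download_dataset("fbmc-smoke-test", Path("data/raw/smoke_test_roundtrip.parquet"))
print(df)
```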
### 2.9 Create First Marimo Notebook (5 minutes)
```bash
# Create initial exploration notebook
cat > notebooks/01_data_exploration.py << 'EOF'
import marimo

__generated_with = "0.9.0"
app = marimo.App(width="medium")

@app.cell
def __():
    import marimo as mo
    import polars as pl
    import altair as alt
    from pathlib import Path
    return mo, pl, alt, Path

@app.cell
def __(mo):
    mo.md(
        """
        # FBMC Flow Forecasting - Data Exploration
        **Day 1 Objective**: Explore JAO FBMC data structure
        ## Steps:
        1. Load downloaded Parquet files
        2. Inspect CNECs, PTDFs, RAMs
        3. Identify top 200 binding CNECs (50 Tier-1 + 150 Tier-2)
        4. Visualize temporal patterns
        """
    )
    return

@app.cell
def __(Path):
    # Data paths
    DATA_DIR = Path("../data/raw")
    CNECS_FILE = DATA_DIR / "cnecs_2023_2025.parquet"
    return DATA_DIR, CNECS_FILE

@app.cell
def __(mo, CNECS_FILE):
    # The last expression in a cell is its rendered output
    if CNECS_FILE.exists():
        status = mo.md("✓ CNECs data found - ready for Day 1 analysis")
    else:
        status = mo.md("⚠ CNECs data not yet downloaded - run Day 1 collection script")
    status
    return

if __name__ == "__main__":
    app.run()
EOF
# Test Marimo installation
marimo edit notebooks/01_data_exploration.py &
# This will open browser with interactive notebook
# Close after verifying it loads correctly (Ctrl+C in terminal)
```
### 2.10 Create Utility Modules (2 minutes)
```bash
# Create data loading utilities
cat > src/utils/data_loader.py << 'EOF'
"""Data loading utilities for FBMC forecasting project."""
import polars as pl
from pathlib import Path
from typing import Optional
def load_cnecs(data_dir: Path, start_date: Optional[str] = None, end_date: Optional[str] = None) -> pl.DataFrame:
    """Load CNEC data with optional date filtering."""
    cnecs = pl.read_parquet(data_dir / "cnecs_2023_2025.parquet")
    if start_date:
        cnecs = cnecs.filter(pl.col("timestamp") >= start_date)
    if end_date:
        cnecs = cnecs.filter(pl.col("timestamp") <= end_date)
    return cnecs

def load_weather(data_dir: Path, grid_points: Optional[list] = None) -> pl.DataFrame:
    """Load weather data with optional grid point filtering."""
    weather = pl.read_parquet(data_dir / "weather_2023_2025.parquet")
    if grid_points:
        weather = weather.filter(pl.col("grid_point").is_in(grid_points))
    return weather
EOF
# Create __init__.py files
touch src/__init__.py
touch src/utils/__init__.py
touch src/data_collection/__init__.py
touch src/feature_engineering/__init__.py
touch src/model/__init__.py
```
### 2.11 Initial Commit (2 minutes)
```bash
# Stage all changes (note: data/ is excluded by .gitignore)
git add .
# Create initial commit
git commit -m "Day 0: Initialize FBMC forecasting MVP environment
- Add project structure (notebooks, src, config, tools)
- Configure uv + polars + Marimo + Chronos + HF Datasets stack
- Create .gitignore (excludes data/ following best practices)
- Install jao-py Python library for JAO data access
- Configure ENTSO-E, OpenMeteo, and HuggingFace API access
- Add HF Datasets manager for data storage (separate from Git)
- Create data download utilities (download_all.py)
- Create initial exploration notebook
Data Strategy:
- Code → Git (this repo)
- Data → HuggingFace Datasets (separate, not in Git)
- NO Git LFS (following data science best practices)
Infrastructure: HF Space (A10G GPU, \$30/month)"
# Push to HF Space
git push origin main
# Verify push succeeded
git status
# Should show: "Your branch is up to date with 'origin/main'"
# Verify no data files were committed
git ls-files | grep "\.parquet"
# Should be empty (no .parquet files in Git)
```
---
## Step 3: Verify Complete Setup (5 minutes)
### 3.1 Python Environment Verification
```bash
# Activate environment if not already
source .venv/bin/activate
# Run comprehensive checks
python << 'EOF'
import sys
print(f"Python: {sys.version}")
packages = [
    "polars", "pyarrow", "numpy", "sklearn",
    "torch", "transformers", "marimo", "altair",
    "entsoe", "jao", "requests", "yaml", "gradio",
    "datasets", "huggingface_hub"
]

print("\nPackage Versions:")
for pkg in packages:
    try:
        if pkg == "entsoe":
            import entsoe
            print(f"✓ entsoe-py: {entsoe.__version__}")
        elif pkg == "jao":
            import jao
            print(f"✓ jao-py: {jao.__version__}")
        elif pkg == "yaml":
            import yaml
            print(f"✓ pyyaml: {yaml.__version__}")
        elif pkg == "sklearn":
            # note: the import name is sklearn, not scikit-learn
            import sklearn
            print(f"✓ scikit-learn: {sklearn.__version__}")
        elif pkg == "huggingface_hub":
            from huggingface_hub import HfApi
            print("✓ huggingface-hub: Ready")
        else:
            mod = __import__(pkg)
            print(f"✓ {pkg}: {mod.__version__}")
    except Exception as e:
        print(f"✗ {pkg}: {e}")

# Test Chronos specifically
try:
    from chronos import ChronosPipeline
    print("\n✓ Chronos forecasting: Ready")
except Exception as e:
    print(f"\n✗ Chronos forecasting: {e}")

# Test HF Datasets
try:
    from datasets import Dataset
    print("✓ HuggingFace Datasets: Ready")
except Exception as e:
    print(f"✗ HuggingFace Datasets: {e}")

print("\nAll checks complete!")
EOF
```
### 3.2 API Access Verification
```bash
# Test ENTSO-E API
python << 'EOF'
from entsoe import EntsoePandasClient
import yaml
# Load API key
with open('config/api_keys.yaml') as f:
    config = yaml.safe_load(f)

api_key = config['entsoe_api_key']
if 'YOUR_ENTSOE_API_KEY_HERE' in api_key:
    print("⚠ ENTSO-E API key not configured - update config/api_keys.yaml")
else:
    try:
        client = EntsoePandasClient(api_key=api_key)
        print("✓ ENTSO-E API client initialized successfully")
    except Exception as e:
        print(f"✗ ENTSO-E API error: {e}")
EOF
# Test OpenMeteo API
python << 'EOF'
import requests
response = requests.get(
    "https://api.open-meteo.com/v1/forecast",
    params={
        "latitude": 52.52,
        "longitude": 13.41,
        "hourly": "temperature_2m",
        "start_date": "2025-01-01",
        "end_date": "2025-01-02"
    }
)
if response.status_code == 200:
    print("✓ OpenMeteo API accessible")
else:
    print(f"✗ OpenMeteo API error: {response.status_code}")
EOF
# Test HuggingFace authentication
python << 'EOF'
from huggingface_hub import HfApi
import yaml
with open('config/api_keys.yaml') as f:
    config = yaml.safe_load(f)

hf_token = config['hf_token']
hf_username = config['hf_username']
if 'YOUR_HF' in hf_token or 'YOUR_HF' in hf_username:
    print("⚠ HuggingFace credentials not configured - update config/api_keys.yaml")
else:
    try:
        api = HfApi(token=hf_token)
        user_info = api.whoami()
        print(f"✓ HuggingFace authenticated as: {user_info['name']}")
        token_role = user_info.get('auth', {}).get('accessToken', {}).get('role', 'unknown')
        print(f"  Token role: {token_role} (need 'write' to upload datasets)")
    except Exception as e:
        print(f"✗ HuggingFace authentication error: {e}")
        print("  Verify token has WRITE permissions")
EOF
```
### 3.3 HF Space Verification
```bash
# Check HF Space status
echo "Visit your HF Space: https://huggingface.co/spaces/YOUR_USERNAME/fbmc-forecasting"
echo ""
echo "Verify:"
echo " 1. JupyterLab interface loads"
echo " 2. Hardware shows 'A10G GPU' in bottom-right"
echo " 3. Files from git push are visible"
echo " 4. Can create new notebook"
```
### 3.4 Final Checklist
```bash
# Print final status
cat << 'EOF'
╔═════════════════════════════════════════════════════╗
║         DAY 0 SETUP VERIFICATION CHECKLIST          ║
╚═════════════════════════════════════════════════════╝
Environment:
[ ] Python 3.10+ installed
[ ] Git installed (NO Git LFS needed)
[ ] uv package manager installed
Local Setup:
[ ] Virtual environment created and activated
[ ] All Python dependencies installed (24 packages including jao-py)
[ ] API keys configured (ENTSO-E + OpenMeteo + HuggingFace)
[ ] HuggingFace write token obtained
[ ] Project structure created (8 directories)
[ ] .gitignore configured (data/ excluded)
[ ] Initial Marimo notebook created
[ ] Data management utilities created (hf_datasets_manager.py)
Git & HF Space:
[ ] HF Space created (A10G GPU, $30/month)
[ ] Repository cloned locally
[ ] .gitignore excludes all data files (*.parquet, data/)
[ ] Initial commit pushed to HF Space (code only, NO data)
[ ] HF Space JupyterLab accessible
[ ] Git repo size < 50 MB (no data committed)
Verification Tests:
[ ] Python imports successful (polars, chronos, jao-py, datasets, etc.)
[ ] ENTSO-E API client initializes
[ ] OpenMeteo API responds (status 200)
[ ] HuggingFace authentication successful (write access)
[ ] Marimo notebook opens in browser
Data Strategy Confirmed:
[ ] Code goes in Git (version controlled)
[ ] Data goes in HuggingFace Datasets (separate storage)
[ ] NO Git LFS setup (following data science best practices)
[ ] data/ directory in .gitignore
Ready for Day 1: [ ]
Next Step: Run Day 1 data collection (8 hours)
- Download data locally via jao-py/APIs
- Upload to HuggingFace Datasets (separate from Git)
- Total data: ~12 GB (stored in HF Datasets, NOT Git)
EOF
```
---
## Troubleshooting
### Issue: uv installation fails
```bash
# Alternative: Use pip directly
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```
### Issue: Git LFS files not syncing
**Not applicable** - We're using HuggingFace Datasets, not Git LFS.
If you see Git LFS references, you may have an old version of this guide. Data files should NEVER be in Git.
### Issue: HuggingFace authentication fails
```bash
# Verify token is correct
python << 'EOF'
from huggingface_hub import HfApi
import yaml
with open('config/api_keys.yaml') as f:
    config = yaml.safe_load(f)

try:
    api = HfApi(token=config['hf_token'])
    print(api.whoami())
except Exception as e:
    print(f"Error: {e}")
    print("\nTroubleshooting:")
    print("1. Visit: https://huggingface.co/settings/tokens")
    print("2. Verify token has WRITE permission")
    print("3. Copy token exactly (starts with 'hf_')")
    print("4. Update config/api_keys.yaml and .env")
EOF
```
### Issue: Cannot upload to HuggingFace Datasets
```bash
# Common causes:
# 1. Token doesn't have write permissions
# Fix: Create new token with "write" scope
# 2. Dataset name already exists
# Fix: Use different name or add version suffix
# Example: fbmc-cnecs-2023-2025-v2
# 3. File too large (>5GB single file limit)
# Fix: Split into multiple datasets or use sharding
# Test upload with small sample:
python << 'EOF'
from datasets import Dataset
import pandas as pd
# Create tiny test dataset
df = pd.DataFrame({"col1": [1, 2, 3], "col2": ["a", "b", "c"]})
dataset = Dataset.from_pandas(df)

# Try uploading
try:
    dataset.push_to_hub("YOUR_USERNAME/test-dataset", token="YOUR_TOKEN")
    print("✓ Upload successful - authentication works")
except Exception as e:
    print(f"✗ Upload failed: {e}")
EOF
```
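For the file-size case, one hedged approach is to shard by year before uploading. A sketch (the PTDF filename is a placeholder, and it assumes a datetime `timestamp` column):
```python
# Sketch: split a large table into yearly shards so no single upload is too big
from pathlib import Path
import polars as pl
from src.data_collection.hf_datasets_manager import FBMCDatasetManager

manager = FBMCDatasetManager()
df = pl.read_parquet("data/raw/ptdfs_2023_2025.parquet")  # placeholder large file

for key, shard in df.group_by(pl.col("timestamp").dt.year().alias("year")):
    year = key[0] if isinstance(key, tuple) else key  # key shape varies by polars version
    shard_path = Path(f"data/raw/ptdfs_{year}.parquet")
    shard.write_parquet(shard_path)
    manager.upload_dataset(shard_path, f"fbmc-ptdfs-{year}")
```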
### Issue: Marimo notebook won't open
```bash
# Check marimo installation
marimo --version
# Try running without opening browser
marimo run notebooks/01_data_exploration.py
# Check for port conflicts
lsof -i :2718 # Default Marimo port
```
### Issue: ENTSO-E API key invalid
```bash
# Verify key in ENTSO-E Transparency Platform:
# 1. Login: https://transparency.entsoe.eu/
# 2. Navigate: Account Settings → Web API Security Token
# 3. Copy key exactly (no spaces)
# 4. Update: config/api_keys.yaml and .env
```
### Issue: HF Space shows "Building..." forever
```bash
# Check HF Space logs:
# Visit: https://huggingface.co/spaces/YOUR_USERNAME/fbmc-forecasting
# Click: "Settings" → "Logs"
# Common fix: Ensure requirements.txt is valid
# Test locally:
pip install -r requirements.txt --dry-run
```
### Issue: jao-py import fails
```bash
# Verify jao-py installation
python -c "import jao; print(jao.__version__)"
# If missing, reinstall (quote the spec so the shell doesn't treat >= as a redirect)
uv pip install "jao-py>=0.6.0"
# Check package is in environment
uv pip list | grep jao
```
---
## What's Next: Day 1 Preview
**Day 1 Objective**: Download 24 months of historical data (Oct 2023 - Sept 2025)
**Data Collection Tasks:**
1. **JAO FBMC Data** (4-5 hours)
- CNECs: ~900 MB (24 months)
- PTDFs: ~1.5 GB (24 months)
- RAMs: ~800 MB (24 months)
- Shadow prices: ~600 MB (24 months)
- LTN nominations: ~400 MB (24 months)
- Net positions: ~300 MB (24 months)
2. **ENTSO-E Data** (2-3 hours)
- Generation forecasts: 13 zones × 24 months
- Actual generation: 13 zones × 24 months
- Cross-border flows: ~20 borders × 24 months
3. **OpenMeteo Weather** (1-2 hours)
- 52 grid points × 24 months
- 8 variables per point
- Parallel download optimization
**Total Data Size**: ~12 GB (compressed Parquet)
**Day 1 Script**: Will use the jao-py Python library with rate limiting and parallel download logic; a sketch of the pattern follows.
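This is a sketch only: `fetch_month` is a hypothetical placeholder, not a real jao-py call; Day 1 fills in the actual client code.
```python
# Sketch: bounded parallelism + crude rate limiting for the Day 1 downloads.
# fetch_month() is a hypothetical placeholder for the real jao-py/API calls.
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

MONTHS = (
    [f"2023-{m:02d}" for m in range(10, 13)]
    + [f"2024-{m:02d}" for m in range(1, 13)]
    + [f"2025-{m:02d}" for m in range(1, 10)]
)  # Oct 2023 - Sept 2025 = 24 months

def fetch_month(month: str) -> str:
    time.sleep(1.0)  # crude rate limit: ~1 request/sec per worker
    # ... real download call goes here ...
    return month

with ThreadPoolExecutor(max_workers=4) as pool:  # bounded parallelism
    futures = {pool.submit(fetch_month, m): m for m in MONTHS}
    for future in as_completed(futures):
        print(f"✓ downloaded {future.result()}")
```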
---
## Summary
**Time Investment**: 45 minutes
**Result**: Production-ready local + cloud development environment
**You Now Have:**
- ✓ HF Space with A10G GPU ($30/month)
- ✓ Local Python environment (24 packages including jao-py and HF Datasets)
- ✓ jao-py Python library for JAO data access
- ✓ ENTSO-E + OpenMeteo + HuggingFace API access configured
- ✓ HuggingFace Datasets manager for data storage (separate from Git)
- ✓ Data download/upload utilities (hf_datasets_manager.py)
- ✓ Marimo reactive notebook environment
- ✓ .gitignore configured (data/ excluded, following best practices)
- ✓ Complete project structure (8 directories)
**Data Strategy Implemented:**
```
Code (version controlled) → Git Repository (~50 MB)
Data (storage & versioning) → HuggingFace Datasets (~12 GB)
NO Git LFS (following data science best practices)
```
**Ready For**: Day 1 data collection (8 hours)
- Download 24 months data locally (jao-py + APIs)
- Upload to HuggingFace Datasets (not Git)
- Git repo stays clean (code only)
---
**Document Version**: 2.0
**Last Updated**: 2025-10-29
**Project**: FBMC Flow Forecasting MVP (Zero-Shot)