PC Forecaster

How the data flows

The pipeline.

Ten stages, run end-to-end by python src/main.py from the project root. Each stage writes a CSV (or a SQLite table) that the next stage reads.
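The file-based handoff between stages can be sketched as follows. This is a minimal illustration, not the project's actual src/main.py: the function names (collect, clean, run_pipeline), the file names raw.csv and cleaned.csv, and the sample rows are all assumptions made for the example.

```python
import csv
from pathlib import Path


def collect(out_path: Path) -> None:
    """Hypothetical stage 1: write raw rows to a CSV for the next stage."""
    with out_path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["component_id", "price"])
        # Made-up sample rows; the second one has an unparseable price.
        writer.writerows([
            ["gpu-a", "499.99"],
            ["ssd-b", "not-a-price"],
            ["ram-c", "89.50"],
        ])


def clean(in_path: Path, out_path: Path) -> None:
    """Hypothetical stage 2: read stage 1's CSV, drop rows with bad prices."""
    with in_path.open(newline="") as src, out_path.open("w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            try:
                float(row["price"])
            except ValueError:
                continue  # skip rows whose price is not numeric
            writer.writerow(row)


def run_pipeline(workdir: Path) -> Path:
    """Run the stages in order; each one reads the file the previous one wrote."""
    raw = workdir / "raw.csv"
    cleaned = workdir / "cleaned.csv"
    collect(raw)
    clean(raw, cleaned)
    return cleaned
```

Because every stage communicates only through a file on disk, a single stage can be rerun or inspected in isolation, which is presumably the reason for the CSV/SQLite handoff described above.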

Stages

  1. data_collection.py
     Merges 2023 + 2025 PCPartPicker CSVs, then ingests real daily prices from HardwareDealsCo for GPU, SSD, and RAM.
  2. data_cleaning.py
     IQR caps, spec parsing, component_id construction, cross-year matching.
  3. generate_synthetic.py
     29-month synthetic series with 9 encoded events, then bridges real bucket prices into per-SKU real_blended rows.
  4. database_setup.py
     SQLite schema with 6 tables + 5 sample queries.
  5. model_training.py
     LR / DT / RF + naive baseline + TimeSeriesSplit CV + permutation importance, then 3 real-data backtests (GPU / Storage / RAM).
  6. estimator.py
     Budget / mid / high tier build cost forecast + k-means tier discovery.
  7. value_metrics.py
     $/GB VRAM, $/core, $/GB metrics on real observed prices.
  8. spec_classifier.py
     Multi-class spec-to-tier classifier with stratified CV + calibration curves.
  9. spec_regression.py
     Cross-sectional regression on real prices: train on 2023, test on 2025.
  10. data_visualization.py
      Plots in visuals/, with annotated event windows and a synthetic-vs-real GPU overlay.
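As an illustration of the "IQR caps" step in stage 2, here is a minimal sketch of capping prices at the Tukey fences. The source does not say which multiplier or quantile method data_cleaning.py uses, so the function name iqr_cap, the multiplier k=1.5, and the "inclusive" quantile method are assumptions; the sample prices are made up.

```python
import statistics


def iqr_cap(prices, k=1.5):
    """Clip values to [Q1 - k*IQR, Q3 + k*IQR] instead of dropping them.

    k=1.5 is the conventional Tukey fence; the project's actual
    multiplier is not stated, so treat it as an assumption.
    """
    q1, _median, q3 = statistics.quantiles(prices, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [min(max(p, lower), upper) for p in prices]


if __name__ == "__main__":
    # Made-up prices with one obvious outlier at the end.
    sample = [100, 110, 120, 130, 140, 5000]
    print(iqr_cap(sample))
```

Capping keeps the row count unchanged, which matters when later stages join on component_id; dropping outliers instead would remove components from the matched set.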

Data Lineage

Components: 6,078
Observed prices: 8,185
Matched (2023 ∩ 2025): 2,107
Monthly rows: 71,256
Real-blended: 10,153
External tables: 3
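The "Matched (2023 ∩ 2025)" figure is a set intersection on component_id: a component counts as matched only if the same ID appears in both years. The sketch below shows the idea under assumed conventions. The ID scheme (lowercased fields joined with "|") and the sample rows are hypothetical and are not taken from data_cleaning.py.

```python
def component_id(category: str, brand: str, model: str) -> str:
    """Hypothetical ID scheme: lowercase each field and collapse whitespace."""
    fields = (category, brand, model)
    return "|".join("-".join(field.lower().split()) for field in fields)


# Made-up rows standing in for the two yearly snapshots.
rows_2023 = [
    ("GPU", "ExampleBrand", "Model X 100"),
    ("RAM", "ExampleBrand", "Kit 32GB"),
]
rows_2025 = [
    ("gpu", "examplebrand", "Model  X 100"),  # same part, messier formatting
    ("Storage", "ExampleBrand", "Drive 2TB"),
]

ids_2023 = {component_id(*row) for row in rows_2023}
ids_2025 = {component_id(*row) for row in rows_2025}

# Components present in both years.
matched = ids_2023 & ids_2025
```

Normalizing before intersecting is the important part: without it, differences in case or spacing between the two yearly files would make the same part look like two different components.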

Real-blended rows by category

GPU: 1,804
RAM: 3,265
Storage: 5,084

External real-price tables

HardwareDealsCo daily snapshots, Sep 2025 – May 2026.

gpu_chipset_real_prices: 602
drive_real_prices: 1,446
ram_real_prices: 569

Where Each Stage Gets Its Data

Stage                 Reads
5 · regression        synthetic + real-blended price history
5 · real backtests    real-blended rows only, last 3 months held out
6 · estimator         forecast + k-means on real specs
7 · value metrics     real observed prices (PCPartPicker)
8 · classifier        real observed prices (PCPartPicker)
9 · spec regression   real observed 2025 prices, cross-sectional
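The "last 3 months held out" rule for the real backtests can be sketched as a chronological split. This is one plausible reading, assuming the holdout is the last three calendar months present in the data; the function name split_last_n_months, the row layout, and the sample prices are assumptions for illustration, not the code in model_training.py.

```python
from datetime import date


def month_key(d: date) -> tuple:
    """Identify a calendar month as (year, month)."""
    return (d.year, d.month)


def split_last_n_months(rows, n=3):
    """Hold out the last n calendar months that appear in the data.

    Training rows are strictly earlier than every test row, so the
    model is never evaluated on a period it has already seen.
    """
    months = sorted({month_key(r["date"]) for r in rows})
    held_out = set(months[-n:])
    train = [r for r in rows if month_key(r["date"]) not in held_out]
    test = [r for r in rows if month_key(r["date"]) in held_out]
    return train, test


# Made-up daily rows spanning Dec 2025 to May 2026, two per month.
sample_rows = [
    {"date": date(year, month, day), "price": 100.0 + month}
    for (year, month) in [(2025, 12), (2026, 1), (2026, 2),
                          (2026, 3), (2026, 4), (2026, 5)]
    for day in (1, 15)
]

train_rows, test_rows = split_last_n_months(sample_rows, n=3)
```

Splitting by time rather than at random avoids leaking future prices into training, which is the same motivation behind using TimeSeriesSplit for cross-validation in stage 5.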