Vocabulary

Terms this doc uses

Vendor TAM

Benchmark label used for comparison only; not a training target or feature.

G1 holdout

Post-hoc business outcome diagnostic; not used for training, source selection, or tuning.

Probe TAM

Formula-driven component output for audit, not production-calibrated TAM.

Source layer

Versioned source-family artifact that writes grid-level candidate fields before denominator and gap-closure scoring.

Denominator v3

Current leakage-safe residential-household denominator probe with base/lower/upper fields and reconciliation status.

Income gate

Public-source 0-10 LPA probability proxy using district/city/cell welfare and affluence context; MPCE calibration remains pending.

Conversion feasibility

Public road, POI, addressability, settlement, and map-coverage proxy for whether a cell can be served.

Execution readiness

Weak connectivity and operational-readiness proxy for acquirable TAM; still blocked on license, coverage, and internal ops data.

Predicted TAM power

Current score layer: a monotone power transform of no-vendor gross TAM that preserves the base total.

Power gamma

The fixed exponent applied to the source-derived gross TAM base; current value comes from the transform decision artifact.

Rank ceiling

A diagnostic order-only transform used to understand upper-bound ranking behavior, not TAM magnitude.

Full-India scored grid

The 0.01-degree national grid scored by deterministic source-derived formulas without vendor labels.

Score index

Compact browser-oriented row-major JSON that stores map scores without repeating every property per cell.

Spatial holdout

Validation split that blocks neighborhood memorization.

City holdout

Whole-city transfer test for city confounding.

Notebook metric summary

Four-row post-hoc table comparing predicted TAM, vendor TAM, and G1 for notebook reporting.

Production claim

Blocked until valid non-GeoIQ predictions pass spatial, city, and chronological checks.
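The Score index entry describes a row-major layout: column names are stored once, and each cell then carries only a positional list of values instead of a full key/value object. The sketch below illustrates that idea under stated assumptions. The `columns`/`rows` keys, the helper names, and the `cell_id`/`lat`/`lon` columns are hypothetical and are not the actual schema of the score index file.

```python
import json


def encode_score_index(cells, columns):
    """Row-major packing: names stored once, one positional value list per cell."""
    return {"columns": list(columns), "rows": [[cell[name] for name in columns] for cell in cells]}


def decode_score_index(index):
    """Rebuild per-cell dicts from the row-major form."""
    names = index["columns"]
    return [dict(zip(names, row)) for row in index["rows"]]


def compact_json(obj):
    """Serialize without whitespace, as a browser payload would be."""
    return json.dumps(obj, separators=(",", ":"))
```

The saving grows with the number of cells, because the per-cell form repeats every property name on every row.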

Map

How the pipeline now connects

The document flows from claim boundary to stage execution, then into source, feature, diagnostic, and training gates.

Current TAM prediction pipeline: I claim, metrics, signals; II execution model; III prior-art status; IV-V signals and buildings; VI-VII predicted and map; VIII-X diagnostics, notebooks, policy; XI-XII outputs and boundary; XIII glossary.
I-II: Start with the claim boundary, current metrics, signal classification, and runner.
III-V: Separate prior-art status, source readiness, signal implementation, and building footprint coverage.
VI-VII: Show the selected predicted-TAM transform and national score/map artifacts.
VIII-XIII: Diagnostics, notebook metrics, policy, artifacts, training blockers, and vocabulary close the audit trail.
TAM pipeline: generated 2026-06-02 15:23:08 UTC; 13 chapters.

Technical status document

TAM Prediction Pipeline Status

The most important items come first: the current score, notebook metrics, signal-family status, and claim boundary are documented before the implementation details. Vendor TAM and G1 remain diagnostic benchmarks, not training labels.

Primary entry: scripts/pipeline.py
Stage API: 3 stage names
HTML builder: build_prediction_pipeline_html.py
Claim state: production accuracy blocked
I - Technical status

Technical Status

The page begins with the actual status: current score, current post-hoc metrics, signal-family readiness, and the production-claim boundary.

Map I - the boundary before any metric.
Story role

Separate implemented pipeline work from invalid production claims.

Carry forward

Every later metric is benchmark-only unless a valid non-GeoIQ holdout exists.

No production training claim is allowed from the current artifacts. The spatial-holdout and city-holdout metric flags are both False, and production_accuracy_claim_allowed is False.
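Spatial and city holdouts are both grouped splits: every cell in a group lands in the same fold, so a model cannot score well by memorizing neighboring cells or a single city. The sketch below assumes that each cell carries a centroid lat/lon and a city name. The 0.1-degree block size, the round-robin fold assignment, and the function names are illustrative choices, not the pipeline's holdout design.

```python
import math


def spatial_block_id(lat, lon, block_deg=0.1):
    """Group 0.01-degree cells into coarser square blocks (block size is an assumption)."""
    return (math.floor(lat / block_deg), math.floor(lon / block_deg))


def grouped_folds(groups, n_folds=5):
    """Give every member of a group the same fold, round-robin over sorted groups."""
    fold_of_group = {g: i % n_folds for i, g in enumerate(sorted(set(groups)))}
    return [fold_of_group[g] for g in groups]


# Spatial holdout: group by block. City holdout: group by city name.
cells = [(12.34, 77.51, "Bengaluru"), (12.36, 77.55, "Bengaluru"), (18.52, 73.85, "Pune")]
spatial_folds = grouped_folds([spatial_block_id(lat, lon) for lat, lon, _ in cells], n_folds=2)
city_folds = grouped_folds([city for _, _, city in cells], n_folds=2)
```

Round-robin assignment can leave folds unbalanced in cell count. A production split would also need to respect the chronological check that the claim boundary requires.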

Current notebook and post-hoc metrics

current metrics

These values are read from outputs/notebook_short_metric_summary.json and outputs/posthoc_g1_metric_suite/summary.json. They describe benchmark agreement after the score is generated; they do not authorize a production accuracy claim.

Predicted vs vendor R²: 0.623

Current notebook metric from predicted power TAM vs vendor TAM; benchmark-grid diagnostic only.

Predicted vs G1 R²: 0.189

Current notebook metric against post-hoc G1 hits; not used for training or tuning.

Predicted vs G1 rank R²: 0.552

Ranking diagnostic from the notebook-facing metric summary.

Caught-up ratio: 114.9%

Ratio diagnostic comparing current predicted-vs-G1 to vendor-vs-G1, not calibrated accuracy.

Top 10% G1 capture: 46.4%

Current post-hoc top-k capture for the predicted score layer.

Metric claim state: post-hoc only

Metrics join benchmark labels after scoring; production accuracy remains blocked.
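The exact metric definitions behind these cards are not spelled out here, so the sketch below rests on stated assumptions. R² is taken as the squared Pearson correlation, and rank R² as the squared correlation of average ranks (squared Spearman). Top 10% capture is taken as the share of all G1 hits that fall in the top decile of cells by predicted score. The caught-up ratio is taken as the predicted-vs-G1 metric divided by the vendor-vs-G1 metric. All four definitions and the function names are assumptions. The functions only read labels after scores already exist, which matches the post-hoc boundary.

```python
from statistics import fmean


def pearson_r2(x, y):
    """Squared Pearson correlation (assumed meaning of the notebook 'R2')."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_r2(x, y):
    """Squared Spearman correlation: Pearson R2 on average ranks."""
    return pearson_r2(average_ranks(x), average_ranks(y))


def top_share_capture(score, hits, share=0.10):
    """Share of all G1 hits that land in the top `share` of cells by score."""
    k = max(1, round(len(score) * share))
    top = sorted(range(len(score)), key=score.__getitem__, reverse=True)[:k]
    return sum(hits[i] for i in top) / sum(hits)


def caught_up_ratio(pred, vendor, g1, metric=pearson_r2):
    """Predicted-vs-G1 metric relative to vendor-vs-G1 (assumed definition)."""
    return metric(pred, g1) / metric(vendor, g1)
```

For example, with `pred = list(range(1, 11))` and `g1 = [0, 0, 0, 0, 1, 0, 1, 2, 3, 3]`, the top cell holds 3 of 10 hits, so `top_share_capture(pred, g1)` returns 0.3.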

Signal-family status matrix

classified signal stack

The pipeline is no longer documented as one undifferentiated feature pile. Each family below ties current layers and exported column groups to prior art and the blocker that must close before production promotion.

Each row lists the signal family and state, source layers, exported column groups (column counts), prior-art families, and promotion gate.

1. Household denominator and residential base (implemented probe)
   Source layers: worldpop_population_density: ok; census_2011_official_controls: ok; ghsl_population_builtup: ok; building_footprints_google_microsoft: ok
   Column groups: Population denominator (5); GHSL built-up/population (10); Denominator v3 and reconciliation (33); Building footprints (32)
   Prior art: population_household_denominator (WorldPop population, census/SHRUG reconciliation, built-up occupancy, household density); buildings_and_settlement_structure (building_count, built_area_share, building_density, occupancy_proxy)
   Gate: Production household truth still needs public-anchor reconciliation, residential masks, source QA, and spatial/city holdouts.
2. Residential eligibility and physical exclusions (implemented proxy)
   Source layers: esa_worldcover_dynamic_world: proxy_from_local_building_poi_density_until_worldcover_dynamic_world; residential_morphology_tags: proxy_from_building_landcover_density_until_morphology_qa; building_footprints_google_microsoft: ok
   Column groups: Landcover and physical exclusions (20); Morphology proxies (6); Land use and slum context (3)
   Prior art: land_use_exclusions_and_risk (water_share, forest_share, mining_share, industrial_land_share); buildings_and_settlement_structure (building_count, built_area_share, building_density, occupancy_proxy)
   Gate: Proxy landcover/morphology remains production-blocked until pinned rasters, visual QA, and morphology holdouts pass.
3. Income and welfare gate (implemented proxy)
   Source layers: income_public_context: proxy_from_income_gate_public_sources_until_calibration
   Column groups: Income gate and welfare context (40); Census housing/amenity assets (23); Nightlights (11)
   Prior art: satellite_welfare_affluence (roof/material proxy, lighting proxy, drinking-water proxy, Landsat/Sentinel embeddings); nightlights_and_economic_activity (VIIRS mean, VIIRS trend, nightlight blob score, commercial activity proxy)
   Gate: MPCE calibration, license review, geography QA, and income-source ablations are still required.
4. Conversion and serviceability (implemented proxy)
   Source layers: osm_overture_ohsome_roads_pois: proxy_from_pmgsy_poi_buildings_until_dated_osm_overture_ohsome
   Column groups: Conversion and map coverage (18); Road/serviceability (4); POI/serviceability (8)
   Prior art: roads_accessibility_serviceability (road_length_by_class, distance_to_major_road, road_embedding, travel_friction); poi_urban_function (POI counts by category, Hex2Vec/ContextualCount embeddings, schools/healthcare/markets, urban function vector)
   Gate: Dated OSM/Overture extracts, ohsome coverage, internal serviceability, and failed-install checks are missing.
5. Execution and acquirability (weak proxy)
   Source layers: connectivity_execution_readiness: proxy_from_public_serviceability_until_opencellid_ookla_mlab
   Column groups: Execution readiness (7)
   Prior art: internal_business_reality (leads, installs, retained installs, gross margin)
   Gate: External readiness cannot replace branch, partner, capacity, payment, CAC, and operations coverage.
6. Spatial context without target leakage (implemented diagnostic x-surface)
   Column groups: GeoHG graph context (107); Spatial/city context (10)
   Prior art: heterogeneous_graph_and_spatial_context (neighbor context, semantic similarity, land-cover hypernodes, POI hypernodes)
   Gate: Graph context is allowed only from independent features; production metrics remain blocked without non-GeoIQ holdouts.
7. Outcome and notebook metrics (post-hoc only)
   Column groups: TAM score outputs (27); Status and reason codes (1)
   Prior art: internal_business_reality (leads, installs, retained installs, gross margin)
   Gate: Vendor TAM and G1 are benchmark labels after score generation; they are not training labels, features, or tuning signals.

Artifact scale summary

Grid rows: 7,029

Vendor-grid feature rows in the current generated feature table.

Cities: 34

Cities represented by the current vendor-grid artifacts.

Feature columns: 211

Full GeoHG-style feature count; vendor TAM training remains blocked.

Source-layer files: 9

Current source-layer cell-feature artifacts materialized before denominator and gap closure.

Signal families: 7

Top-level business signal families classified by current state, prior art, and validation blocker.

Prior-art families: 9

Current signal-stack families from the prior-art classification artifact.

Candidate signal columns: 18

Source-probe rows that currently write candidate feature columns.

Prior-art payloads: 23/23

Required local payload readiness from the feature manifest.

Predicted score: predicted_tam_0_10lpa_power

Current map and notebook score layer; monotone no-vendor power transform.

Power gamma: 0.60

Scale policy: global_no_vendor_base_total_preserved.

Full-India cells: 2,905,288

Scored 0.01-degree India grid cells in the current full-India score manifest.

Full-India cities: 631

District/city groups in the current full-India score manifest.

Predicted TAM total: 118,559,058

Total predicted power TAM households; total is preserved from the no-vendor gross TAM base.

Generated output artifacts: 78

Manifest-backed current output files listed later in this document.

Production-ready source probes: 0

Current source-probe rows cleared for production use. This should remain zero until blockers close.
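The predicted score layer is described as a monotone power transform of the no-vendor gross TAM base that preserves the base total (gamma 0.60, scale policy global_no_vendor_base_total_preserved). The sketch below shows one transform with both properties, assuming a single global rescale: predicted_i = (sum_j base_j) * base_i^gamma / (sum_j base_j^gamma). The function name is hypothetical, and this is not the pipeline's implementation.

```python
def power_transform_preserving_total(base, gamma=0.60):
    """Raise each gross-TAM value to gamma, then rescale globally so the total is unchanged."""
    if gamma <= 0:
        raise ValueError("gamma must be positive to keep the transform monotone increasing")
    if any(b < 0 for b in base):
        raise ValueError("gross TAM base must be non-negative")
    powered = [b ** gamma for b in base]
    denom = sum(powered)
    if denom == 0:
        return [0.0 for _ in base]
    scale = sum(base) / denom  # one global multiplier, not a per-city one
    return [p * scale for p in powered]
```

With gamma below 1, large cells are compressed relative to small ones, but their order never changes. For example, `[1, 4, 9]` with gamma 0.5 becomes roughly `[2.33, 4.67, 7.0]`, which still sums to 14.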

Current status: features written; probes benchmarked; training blocked; diagnostics post-hoc.
II - Execution model

Execution Model

The canonical runner is a stage dispatcher, but the important view is the whole path from current sources to generated outputs and gates.

01 - current command path
scripts/pipeline.py: canonical CLI and all-stage runner
src/tam_pipeline/pipeline.py: stage dispatcher
  get_data: source registry, prior-art fetch, MSFT AOI staging
  enrich_features: GeoHG, source layers, denominator, probes
  prediction_diagnostics: benchmark and leakage gates
Outputs:
  source_fetch / registry: readiness and direct payloads
  geohg_features: cell features and graph edges
  tam_gap_closure: probe TAM and source rows
  statistical_diagnostics: leakage, calibration, G1 post-hoc
02 - inputs, stages, generated outputs, and claim gate
Current source to generated-output flow. Inputs stay separate from benchmark labels; generated files are manifest-backed current outputs.
Inputs: source registry (49 tracked sources); prior-art payloads (23/23 required files present); vendor grid (7,029 current feature rows).
Stages: get_data (fetch plan, source manifests); enrich_features (GeoHG, source layers, denominator, probes); prediction_diagnostics (benchmark and claim gates).
Generated outputs: source_fetch (manifest JSON/CSV); direct prior-art plan (payload readiness); geohg_features (7,029 rows plus edges); denominator_foundation (cell and city controls); tam_gap_closure (7,029 probe rows); source probes (promotion readiness); statistical diagnostics (14 manifest files).
Claim state: production claim allowed: False.
run: scripts/pipeline.py parses stage flags.
dispatch: src/tam_pipeline/pipeline.py calls a stage module.
emit: Each stage returns structured JSON status.
gate: Diagnostics do not fit on forbidden labels.
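The run, dispatch, and emit steps can be sketched as a small argparse dispatcher. This is a hypothetical stand-in, not the contents of scripts/pipeline.py. The stage names and the --root, --dry-run, and --skip-manifest-update flags come from the commands in this document. The stub stage functions, the make_stage helper, and the shape of the returned status are assumptions.

```python
import argparse
import json

STAGE_ORDER = ["get_data", "enrich_features", "prediction_diagnostics"]


def make_stage(name):
    """Hypothetical stub for a stage module; real stages live under src/tam_pipeline/stages/."""
    def run(root, dry_run, skip_manifest_update):
        return {
            "stage": name,
            "root": root,
            "dry_run": dry_run,
            "skip_manifest_update": skip_manifest_update,
            "status": "planned" if dry_run else "ok",
        }
    return run


STAGES = {name: make_stage(name) for name in STAGE_ORDER}


def main(argv=None):
    parser = argparse.ArgumentParser(prog="pipeline.py")
    parser.add_argument("stage", choices=STAGE_ORDER + ["all"])
    parser.add_argument("--root", default=".")
    parser.add_argument("--dry-run", action="store_true")
    parser.add_argument("--skip-manifest-update", action="store_true")
    args = parser.parse_args(argv)

    # "all" runs the three stages in their documented order; dry-run is passed through.
    names = STAGE_ORDER if args.stage == "all" else [args.stage]
    results = [STAGES[n](args.root, args.dry_run, args.skip_manifest_update) for n in names]
    print(json.dumps(results, indent=2))  # each stage emits structured JSON status
    return results
```

For example, `main(["all", "--root", ".", "--dry-run"])` returns three status dicts, each with status "planned".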
# | stage | implementation, then role and canonical command
1. get_data | src/tam_pipeline/stages/get_data.py
   Role: Builds the source registry, plans/fetches direct prior-art payloads, and stages Microsoft AOI footprint shards when enabled.
   Command: python3 scripts/pipeline.py get_data --root . --dry-run --skip-manifest-update
2. enrich_features | src/tam_pipeline/stages/enrich_features.py
   Role: Builds GeoHG-style features, source-layer cell features, denominator v3 context, and deterministic gap-closure probe columns.
   Command: python3 scripts/pipeline.py enrich_features --root .
3. prediction_diagnostics | src/tam_pipeline/stages/prediction_diagnostics.py
   Role: Runs current benchmark diagnostics and claim-boundary checks without fitting on vendor TAM or G1.
   Command: python3 scripts/pipeline.py prediction_diagnostics --root .

Operator commands

Current runner

python3 scripts/pipeline.py get_data --root . --dry-run --skip-manifest-update
python3 scripts/pipeline.py enrich_features --root . --dry-run
python3 scripts/pipeline.py prediction_diagnostics --root . --dry-run
python3 scripts/pipeline.py enrich_features --root .
python3 scripts/pipeline.py prediction_diagnostics --root .
python3 scripts/pipeline.py all --root . --dry-run --skip-manifest-update

Current outputs

outputs/source_fetch/source_fetch_manifest.json
outputs/source_fetch/direct_prior_art_download_manifest.json
outputs/geohg_features/cell_features_geohg_style.csv
outputs/source_layers/source_layer_cell_features_manifest.json
outputs/source_layers/source_layer_contracts.json
outputs/denominator_foundation/cell_denominator_foundation.csv
outputs/tam_gap_closure/tam_gap_closure_features.csv
outputs/full_india_scored/full_india_tam_score_manifest.json
outputs/tam_map/tam_full_india_0_01_grid_index.json
outputs/statistical_diagnostics/statistical_diagnostics_summary.json

Dry-run paths

dry run contract

Dry runs are now explicit stage behavior. The compute-heavy stages print planned inputs and outputs without executing the scripts that write feature, denominator, gap-closure, or diagnostic artifacts.

stage | current evidence, then command and write behavior
get_data | command supported
  Command: python3 scripts/pipeline.py get_data --root . --dry-run --skip-manifest-update
  Write behavior: Plans direct prior-art fetches and Microsoft AOI shard staging; planning manifests may be written for review.
enrich_features | no-write JSON plan
  Command: python3 scripts/pipeline.py enrich_features --root . --dry-run
  Write behavior: Prints planned feature, denominator, and gap-closure steps without executing builder scripts.
prediction_diagnostics | no-write JSON plan
  Command: python3 scripts/pipeline.py prediction_diagnostics --root . --dry-run
  Write behavior: Checks frozen prediction readiness and expected diagnostic files without running diagnostics.
all | stage-aware
  Command: python3 scripts/pipeline.py all --root . --dry-run --skip-manifest-update
  Write behavior: Passes dry-run through all stages; compute stages remain no-write, and blocked checks still fail loudly.
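The dry-run contract for compute stages is: print the plan, write nothing, and still fail on blocked inputs. The sketch below shows that contract under assumptions. run_compute_stage and demo_builder are hypothetical names, and the real stages may structure their plans differently.

```python
import tempfile
from pathlib import Path


def run_compute_stage(root, inputs, outputs, builder, dry_run=False):
    """Plan a compute stage; call the builder (the only writer) only when not a dry run."""
    root = Path(root)
    missing = [p for p in inputs if not (root / p).exists()]
    if missing:
        # Blocked checks fail loudly in dry-run mode too.
        raise FileNotFoundError(f"missing required inputs: {missing}")
    plan = {"inputs": list(inputs), "outputs": list(outputs), "dry_run": dry_run}
    if dry_run:
        return {**plan, "status": "planned"}  # builder not called, so nothing is written
    builder(root)
    return {**plan, "status": "written"}


def demo_builder(root):
    """Hypothetical builder standing in for a feature or denominator script."""
    out = root / "outputs" / "demo" / "cell_features.csv"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text("cell_id,value\n1,0.5\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "inputs.csv").write_text("cell_id\n1\n")
        print(run_compute_stage(tmp, ["inputs.csv"], ["outputs/demo/cell_features.csv"], demo_builder, dry_run=True))
```

Keeping the existence check ahead of the dry-run branch means a dry run surfaces the same blockers that a real run would hit.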
III - Prior-art status

Prior-Art Status

This chapter starts with the signal-family classification, then drills into payload availability. Payload presence is not the same thing as validated signal readiness.

Signal-family prior-art status

prior art before files

The signal-family status matrix in the Technical Status chapter connects the prior-art families to the implemented source layers and current validation blockers; it is not repeated here. Payload file counts follow separately so the document does not confuse downloaded payloads with production-ready signals.
Source readiness funnel (stage 1): 49 tracked sources; 23/23 required payloads present; deferred sources stay visible.

Readiness is an artifact, not a hidden precondition.

The registry and fetch manifests show direct assets, deferred rasters, gated microdata, and source warnings before features are interpreted.

status | count
catalog_and_grid_fetched_payload_deferred | 1
catalog_fetched_payload_deferred | 2
data_fetched | 35
docs_fetched | 2
docs_fetched_payload_deferred | 1
indexes_fetched_payload_deferred | 1
metadata_fetched_payload_deferred | 2
overview_fetched_payload_deferred | 1
paper_fetched | 1
public_reports_fetched_microdata_gated | 1
readme_fetched | 1
readme_fetched_payload_deferred | 1
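The status values above can be rolled up into three readiness buckets by their suffix. This grouping is an illustrative assumption, and the registry may classify sources differently. The counts are taken from the table, and they sum to the 49 tracked sources.

```python
from collections import Counter

# Counts copied from the fetch-status table.
STATUS_COUNTS = {
    "catalog_and_grid_fetched_payload_deferred": 1,
    "catalog_fetched_payload_deferred": 2,
    "data_fetched": 35,
    "docs_fetched": 2,
    "docs_fetched_payload_deferred": 1,
    "indexes_fetched_payload_deferred": 1,
    "metadata_fetched_payload_deferred": 2,
    "overview_fetched_payload_deferred": 1,
    "paper_fetched": 1,
    "public_reports_fetched_microdata_gated": 1,
    "readme_fetched": 1,
    "readme_fetched_payload_deferred": 1,
}


def readiness_bucket(status):
    """Hypothetical suffix-based grouping: gated, payload-deferred, or fetched."""
    if status.endswith("_microdata_gated"):
        return "gated"
    if status.endswith("_payload_deferred"):
        return "payload_deferred"
    return "fetched"


def tally(status_counts):
    buckets = Counter()
    for status, n in status_counts.items():
        buckets[readiness_bucket(status)] += n
    return buckets
```

Under this grouping, `tally(STATUS_COUNTS)` gives 39 fetched, 9 payload-deferred, and 1 gated source.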

Prior-art payload coverage

current prior art

The current feature manifest reports 23 present payloads out of 23 required payloads, with 0 missing.

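source slug | payloads | present | direct-link payloads | needs download
admin-districts | 6 | 6 | 6 | 0
buildings-google | 2 | 2 | 2 | 0
education-facilities | 1 | 1 | 1 | 0
energy-power-plants | 1 | 1 | 1 | 0
env-flood-atlas | 4 | 4 | 4 | 0
env-landuse | 1 | 1 | 1 | 0
env-soil | 1 | 1 | 1 | 0
infra-rural-roads | 2 | 2 | 2 | 0
nightlights-viirs | 1 | 1 | 1 | 0
police-stations | 1 | 1 | 1 | 0
transport-airports | 1 | 1 | 1 | 0
unmapped_local_payload | 1 | 1 | 0 | 0
urban-municipal | 1 | 1 | 1 | 0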

Prior-art payload files

required payload | source slug | exists | link mode | needs download
prior art/yashveeeeeeer_india-geodata/data/administrative/districts/census-2011/2011_Dist.dbf | admin-districts | present | direct | no
prior art/yashveeeeeeer_india-geodata/data/administrative/districts/census-2011/2011_Dist.prj | admin-districts | present | direct | no
prior art/yashveeeeeeer_india-geodata/data/administrative/districts/census-2011/2011_Dist.sbn | admin-districts | present | direct | no
prior art/yashveeeeeeer_india-geodata/data/administrative/districts/census-2011/2011_Dist.sbx | admin-districts | present | direct | no
prior art/yashveeeeeeer_india-geodata/data/administrative/districts/census-2011/2011_Dist.shp | admin-districts | present | direct | no
prior art/yashveeeeeeer_india-geodata/data/administrative/districts/census-2011/2011_Dist.shx | admin-districts | present | direct | no
prior art/yashveeeeeeer_india-geodata/data/remote-sensing/population-density/ind_pd_2020_1km_ASCII_XYZ.csv | unmapped_local_payload | present | local/manual | no
prior art/yashveeeeeeer_india-geodata/data/buildings/google/google-open-buildings-india-2023.mosaic.json | buildings-google | present | direct | no
prior art/yashveeeeeeer_india-geodata/data/buildings/google/google-open-buildings-india-2023.000000.parquet | buildings-google | present | direct | no
prior art/yashveeeeeeer_india-geodata/data/education/facilities/INDIA_EDUCATION_FACILITIES_POINTS.geojson | education-facilities | present | direct | no
prior art/yashveeeeeeer_india-geodata/data/police/stations/INDIA_POLICE_STATIONS.geojson | police-stations | present | direct | no
prior art/yashveeeeeeer_india-geodata/data/transport/airports/INDIA_AIRPORTS_POINTS.geojson | transport-airports | present | direct | no
prior art/yashveeeeeeer_india-geodata/data/energy/power-plants/INDIA_ENERGY_PLANTS.geojson | energy-power-plants | present | direct | no
prior art/yashveeeeeeer_india-geodata/data/environment/flood-atlas/District_Wise_flood_risk_data.json | env-flood-atlas | present | direct | no
prior art/yashveeeeeeer_india-geodata/data/environment/flood-atlas/District_Wise_max_flood_area_frac.json | env-flood-atlas | present | direct | no
prior art/yashveeeeeeer_india-geodata/data/environment/flood-atlas/State_Wise_flood_risk_data.json | env-flood-atlas | present | direct | no
prior art/yashveeeeeeer_india-geodata/data/environment/flood-atlas/State_Wise_max_flood_area_frac.json | env-flood-atlas | present | direct | no
prior art/yashveeeeeeer_india-geodata/data/environment/soil/INDIA_SOIL_MAP_FAO.geojson | env-soil | present | direct | no
prior art/yashveeeeeeer_india-geodata/data/environment/landuse/hyderabad/Hyderabad_Landuse.geojson | env-landuse | present | direct | no
prior art/yashveeeeeeer_india-geodata/data/remote-sensing/nightlights/nightlights_district_panel.csv | nightlights-viirs | present | direct | no
prior art/yashveeeeeeer_india-geodata/data/urban/municipal-boundaries/mumbai/slumClusters.geojson | urban-municipal | present | direct | no
prior art/yashveeeeeeer_india-geodata/data/infrastructure/rural-roads/road-network/Haryana.zip | infra-rural-roads | present | direct | no
prior art/yashveeeeeeer_india-geodata/data/infrastructure/rural-roads/road-network/UttarPradesh.zip | infra-rural-roads | present | direct | no
IV - Signal implementation

Signal Implementation

Stage 2 builds the current signal surface: GeoHG-style cell context, source-layer cell features, graph features, denominator v3, and deterministic TAM probe/v2/v3 columns.

inventory columns

372

Combined GeoHG, source-layer, and gap-closure columns after excluding row identifiers.

cell rows

7,029

Rows in cell_features_geohg_style.csv.

source layers

9

Current source-layer cell-feature artifacts.

candidate signals

18

Source-probe rows that write candidate fields.

graph edges

24,203

Area-area edges in the current GeoHG bundle.

signal surface

Current x is the joined independent feature stack plus audit-visible proxy signals. Current y inside the repo is not a supervised training label; it is the deterministic probe/v2/v3 TAM score family plus the selected predicted_tam_0_10lpa_power map layer. Vendor TAM and G1 are benchmark-only labels.

Feature stack (figure, stage 2): the spatial grid (cell, city, district), source signals (GHSL, income, landcover), and built form (buildings, roads, land) feed 372 inventory columns over 7,029 grid rows (signals plus score outputs), a GeoHG graph of 24,203 area edges, and target-free TAM probe formulas that produce the x surface and v3 scores. Vendor TAM and G1 stay outside this feature flow.

One signal surface, several audit views.

Source families join into source-layer and GeoHG feature tables, then feed denominator v3, conversion, execution, and target-free TAM score formulas. Benchmark labels stay outside this path.

The detailed column inventory below lists column names only; source precision is summarized once in the precision ledger instead of being repeated for every group.

Source-layer additions

materialized layers

The source-layer manifest reports 9 current cell-feature files and 5 proxy layers. These fields include WorldPop, Census controls, Google/Microsoft buildings, GHSL, landcover, morphology, income-gate, OSM/Overture, and connectivity signals.

source layer | status | rows | columns | mode, then fields and cell feature file
worldpop_population_density | ok | 7,029 | 2 | direct
  Fields: worldpop_population_est_nearest, worldpop_density_people_per_km2
  File: outputs/source_layers/worldpop_population_density/2020_ascii_xyz_local/cell_features.csv
census_2011_official_controls | ok | 7,029 | 2 | direct
  Fields: census_control_population, census_control_households
  File: outputs/source_layers/census_2011_official_controls/2011_fixed_release/cell_features.csv
building_footprints_google_microsoft | ok | 7,029 | 13 | direct
  Fields: building_count_cell_best, building_area_sum_m2_best, building_area_share_best, building_source_coverage_flag, building_height_mean_m, building_floor_count_proxy, building_volume_proxy_m3, building_vertical_density_proxy, building_residential_volume_proxy_m3, building_compactness_score, building_population_per_building, building_footprint_area_per_person_m2, building_msft_gobi_disagreement_flag
  File: outputs/source_layers/building_footprints_google_microsoft/google_2023_msft_staged_snapshot/cell_features.csv
ghsl_population_builtup | ok | 7,029 | 10 | direct
  Fields: ghsl_population_est, ghsl_population_source_year, ghsl_builtup_share, ghsl_built_surface_m2, ghsl_built_volume_m3, ghsl_height_mean_m, ghsl_non_res_builtup_share, ghsl_non_res_volume_share, ghsl_settlement_score, ghsl_residential_candidate_share
  File: outputs/source_layers/ghsl_population_builtup/named_release_required/cell_features.csv
esa_worldcover_dynamic_world | proxy_from_local_building_poi_density_until_worldcover_dynamic_world | 7,029 | 12 | proxy
  Fields: landcover_builtup_share, landcover_water_share, landcover_tree_forest_share, landcover_crop_share, landcover_non_residential_exclusion_share, dynamic_world_built_probability, residential_eligible_area_share, physical_hard_cell_flag, hard_exclusion_share, soft_non_residential_downweight_share, hard_mask_reason_code, landcover_proxy_source_flag
  File: outputs/source_layers/esa_worldcover_dynamic_world/worldcover_2020_2021_dynamic_world_pinned_window_required/cell_features.csv
residential_morphology_tags | proxy_from_building_landcover_density_until_morphology_qa | 7,029 | 6 | proxy
  Fields: morph_dense_old_city_score, morph_informal_dense_score, morph_highrise_affordable_score, morph_cbd_industrial_score, morph_periurban_vacant_score, morphology_proxy_source_flag
  File: outputs/source_layers/residential_morphology_tags/building_landcover_public_proxy_v1/cell_features.csv
income_public_context | proxy_from_income_gate_public_sources_until_calibration | 7,029 | 28 | proxy
  Fields: income_public_affluence_context_score, income_public_deprivation_context_score, income_public_asset_affluence_score, income_public_amenity_deficit_score, income_public_license_status_code, income_public_proxy_source_flag, income_gate_city_segment_code, income_gate_city_segment_prior_0_10lpa_prob, income_gate_city_prior_0_10lpa_prob, income_gate_district_income_pci, income_gate_district_income_affluence_score, income_gate_admin_affluence_score, income_gate_admin_deprivation_score, income_gate_nfhs_affluence_score, income_gate_nfhs_deprivation_score, income_gate_shrug_rwi_score, income_gate_shrug_consumption_score, income_gate_shrug_asset_affluence_score, income_gate_meta_rwi_raw, income_gate_meta_rwi_score, income_gate_meta_rwi_error, income_gate_cell_wealth_score, income_gate_external_affluence_score, income_gate_source_count, income_gate_confidence, income_gate_granularity_code, income_gate_granularity, income_gate_status
  File: outputs/source_layers/income_public_context/income_gate_public_context_v2/cell_features.csv
osm_overture_ohsome_roads_pois | proxy_from_pmgsy_poi_buildings_until_dated_osm_overture_ohsome | 7,029 | 9 | proxy
  Fields: road_distance_m, road_intersection_density, settlement_cluster_size, building_cluster_compactness, addressability_score, poi_service_mix_score, osm_mapping_coverage_score, overture_road_coverage_score, conversion_proxy_source_flag
  File: outputs/source_layers/osm_overture_ohsome_roads_pois/dated_extract_required/cell_features.csv
connectivity_execution_readiness | proxy_from_public_serviceability_until_opencellid_ookla_mlab | 7,029 | 3 | proxy
  Fields: connectivity_readiness_score, mlab_measurement_coverage_score, execution_proxy_source_flag
  File: outputs/source_layers/connectivity_execution_readiness/dated_opencellid_ookla_mlab_required/cell_features.csv

Promotion gates

source contracts

Source-layer contracts fix the source family, model layer, expected fields, and promotion gate. Production readiness is intentionally separate from the presence of raw payloads or cell-feature files.

source | family | model layer | raw files | cell file | production ready, then promotion gate
ghsl_population_builtup | better_denominators | household_denominator_population_builtup_height_volume | 32 | True | False
  Gate: GHS-POP/BUILT-S/BUILT-V/BUILT-H/BUILT-C releases must reconcile against Census controls and WorldPop without tuning to vendor TAM.
census_2011_official_controls | better_denominators | official_population_household_controls | 2 | True | False
  Gate: official controls or validated mirrors must be reconciled before production household estimates.
worldpop_population_density | better_denominators | gridded_population_density | 1 | True | False
  Gate: use as one denominator candidate; reconcile to Census/GHSL and residential masks.
building_footprints_google_microsoft | better_denominators_conversion_feasibility | building_structure_residential_density | 31 | True | False
  Gate: footprints/height/volume are physical evidence only; they cannot become household truth without public-anchor reconciliation.
esa_worldcover_dynamic_world | can_serve_here_filters | residential_non_residential_land_mask | 3 | True | False
  Gate: pinned releases/windows only; land cover can hard-exclude or downweight cells but cannot label income or households.
residential_morphology_tags | better_denominators | old_city_informal_highrise_cbd_periurban_morphology | 3 | True | False
  Gate: morphology tags are denominator/income modifiers only after visual QA, source disagreement review, and city/morphology holdouts.
income_public_context | income_affordability | high_income_exclusion_public_context | 22 | True | False
  Gate: income features estimate high-income exclusion only; SHRUG/NFHS/SECC/RWI/district-income gates require license review, geography QA, MPCE extraction, and city/spatial holdouts before production use.
osm_overture_ohsome_roads_pois | conversion_feasibility | road_poi_addressability_clusterability | 2 | True | False
  Gate: OSM/Overture features require dated extracts and ohsome coverage so missing mapping is not mistaken for low demand.
connectivity_execution_readiness | execution_realism | connectivity_payment_partner_readiness_proxy | 2 | True | False
  Gate: weak external serviceability signals can affect execution readiness only after license and coverage gates pass.
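The separation between file presence and readiness can be expressed as a contract object whose production-ready flag depends on gate outcomes, not on file counts. The sketch below is minimal. The SourceContract class and the gate names are hypothetical, and the real contracts record each promotion gate as prose rather than as boolean checks.

```python
from dataclasses import dataclass, field


@dataclass
class SourceContract:
    source: str
    raw_files: int
    cell_file: bool
    # Hypothetical gate name -> passed? The real contract states its gate in prose.
    gates: dict[str, bool] = field(default_factory=dict)

    @property
    def production_ready(self) -> bool:
        # Files are necessary but never sufficient: every declared gate must pass,
        # and a contract with no declared gates is not treated as ready.
        return self.cell_file and bool(self.gates) and all(self.gates.values())


ghsl = SourceContract(
    "ghsl_population_builtup", raw_files=32, cell_file=True,
    gates={"census_reconciliation": False, "worldpop_reconciliation": False},
)
```

Here `ghsl.production_ready` is False even though 32 raw files and a cell file exist, which mirrors the table.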

Signal coverage

coverage is not accuracy

Coverage rows show which additions are populated in the current artifacts. High coverage does not validate calibration, residential truth, income truth, or production serviceability.

signal check | current value | interpretation
source layers written | 9 | source-layer cell-feature artifacts available
proxy source layers | 5 | layers still marked as public proxy rather than production source truth
candidate features written | 18 | source-probe rows that write candidate feature columns
h residential denominator coverage | 100.0% | v3 household denominator fields populated
landcover coverage | 100.0% | landcover/exclusion fields populated
morphology coverage | 100.0% | morphology proxy fields populated
conversion coverage | 100.0% | conversion-feasibility fields populated
execution coverage | 100.0% | execution-readiness fields populated
income gate prior coverage | 100.0% | public income-gate prior fields populated
income gate confidence median | 0.840 | median public-income proxy confidence
GHSL builtup coverage | 100.0% | GHSL built surface/volume context populated
GHSL height coverage | 96.5% | GHSL height/non-residential volume context populated
building denominator coverage | 95.8% | Google/Microsoft building denominator fields populated
public anchor reconciliation share | 24.6% | cells touched by current public-anchor reconciliation

Precision summary

precision ledger

All columns are exported at grid-row level, but their source precision is mixed: cell geometry, district context, nearest 1 km raster points, vector overlays, 2 km buffers, and one-hop graph context.

Spatial/city context (10 columns)

Source scale: 0.01-degree grid geometry plus Census district IDs.

Exported scale: 0.01-degree grid cells; median cell area 1.40 km2, equivalent square side about 1.18 km. Census 2011 district join; same district value repeats on all cells in that district. Within current grid coverage, median represented district footprint is 135.9 km2 (99 cells), equivalent square side about 11.7 km. Vendor city grid footprint; median represented city coverage is 135.5 km2 (98 cells), equivalent square side about 11.6 km.

Read as: Centroids and local x/y/radius are cell geometry; censuscode is a district assignment.
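The equivalent square sides quoted above are the square root of the represented area. A minimal sketch that reproduces them; `equivalent_square_side_km` is an illustrative helper, not a pipeline function.

```python
import math


def equivalent_square_side_km(area_km2: float) -> float:
    """Side length (km) of a square whose area equals area_km2 (km2)."""
    if area_km2 < 0:
        raise ValueError("area must be non-negative")
    return math.sqrt(area_km2)


# Median footprints quoted in the precision ledger, in km2.
footprints_km2 = {
    "grid cell": 1.40,
    "district (99 cells)": 135.9,
    "vendor city (98 cells)": 135.5,
}

for label, area in footprints_km2.items():
    side = equivalent_square_side_km(area)
    print(f"{label}: {area} km2 -> square side about {side:.2f} km")
```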

Nightlights (11 columns)

Source scale: VIIRS-derived district panel values, not raw pixel values in this CSV.

Exported scale: District aggregate joined to each cell by censuscode. Census 2011 district join; same district value repeats on all cells in that district. Within current grid coverage, median represented district footprint is 135.9 km2 (99 cells), equivalent square side about 11.7 km.

Read as: Current precision is district-level even though the underlying satellite product is finer.

Census housing/amenity assets (23 columns)

Source scale: Census 2011 Houselisting/HLPCA district-total percentage shares from the downloaded pigshell mirror.

Exported scale: District aggregate joined to each cell by censuscode. Census 2011 district join; same district value repeats on all cells in that district. Within current grid coverage, median represented district footprint is 135.9 km2 (99 cells), equivalent square side about 11.7 km.

Read as: Housing, amenity, and asset shares are old district context repeated on cells; they are not current cell-level observations.

Hazard and flood context (7 columns)

Source scale: District flood-atlas JSON records.

Exported scale: District aggregate joined to each cell by censuscode. Census 2011 district join; same district value repeats on all cells in that district. Within current grid coverage, median represented district footprint is 135.9 km2 (99 cells), equivalent square side about 11.7 km.

Read as: Flood fields should not be read as within-cell flood pixels.

Population denominator (5 columns)

Source scale: WorldPop 2020 1 km ASCII XYZ point grid.

Exported scale: Nearest 1 km source point to each 0.01-degree cell centroid; population estimate scales density by cell area.

Read as: worldpop_nearest_distance_km exposes the join quality per cell.
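A minimal sketch of the nearest-point join described above, assuming WorldPop points arrive as (lat, lon, people per km²) tuples and using haversine distance with a brute-force search. The function name and input layout are hypothetical; a full-India run would presumably use a spatial index instead.

```python
import math
from typing import Iterable, Tuple

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two lat/lon points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_worldpop_fields(
    centroid_lat: float,
    centroid_lon: float,
    cell_area_km2: float,
    points: Iterable[Tuple[float, float, float]],
) -> dict:
    """Join the nearest WorldPop point (lat, lon, people_per_km2) to one cell centroid."""
    best_dist, best_density = math.inf, math.nan
    for lat, lon, density in points:
        d = haversine_km(centroid_lat, centroid_lon, lat, lon)
        if d < best_dist:
            best_dist, best_density = d, density
    return {
        "worldpop_density_people_per_km2": best_density,
        # Kept so downstream users can see how far the joined point was.
        "worldpop_nearest_distance_km": best_dist,
        # Density (people/km2) times cell area (km2) gives a people count for the cell.
        "worldpop_population_est_nearest": best_density * cell_area_km2,
    }


row = nearest_worldpop_fields(
    28.615, 77.205, 1.40, [(28.61, 77.20, 20000.0), (28.70, 77.30, 5000.0)]
)
print(row)
```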

GHSL built-up/population (10 columns)

Source scale: GHSL 2020 30-arcsecond population, built-surface, built-volume, height, and non-residential raster tiles.

Exported scale: Raster/source-layer values aggregated or joined to the 0.01-degree grid cell, about 1.18 km side in the current grid.

Read as: These are independent physical-denominator signals; they still require reconciliation against Census, WorldPop, and building footprints.

Denominator v3 and reconciliation (33 columns)

Source scale: Mixed public denominator context: WorldPop, Census 2011 controls, GHSL built form, building footprints, and public anchors.

Exported scale: Cell-level reconciliation probes on the 0.01-degree grid. Census 2011 district join; same district value repeats on all cells in that district. Within current grid coverage, median represented district footprint is 135.9 km2 (99 cells), equivalent square side about 11.7 km.

Read as: H_residential base/lower/upper fields are leakage-safe denominator probes but remain production-blocked until public-anchor QA and holdouts pass.

Building footprints (32 columns)

Source scale: Google/Microsoft building footprint polygons, meter-scale vector geometries.

Exported scale: Footprint count/area/height/volume proxies aggregated into each 0.01-degree cell; exported precision is the cell, about 1.18 km side in the current grid.

Read as: Cell-level aggregation is about 1 km; footprint areas themselves are vector-derived.

Landcover and physical exclusions (20 columns)

Source scale: ESA WorldCover/Dynamic World source-layer proxy fields plus local hard/soft exclusion logic.

Exported scale: Built/water/tree/crop/non-residential shares and exclusion flags written at 0.01-degree cell level, about 1.18 km side.

Read as: These fields can suppress or downweight physical impossibility; they do not label income or demand.

Morphology proxies (6 columns)

Source scale: Derived morphology tags from building, landcover, density, and public-context proxies.

Exported scale: Cell-level scores for dense old city, informal density, highrise affordability, CBD/industrial, and periurban vacancy patterns.

Read as: Use as denominator or income modifiers only after source-disagreement and morphology holdout review.

Income gate and welfare context (40 columns)

Source scale: Public income/welfare context: district income, NFHS, SHRUG/SECC/RWI-style fields, HLPCA context, and city priors.

Exported scale: Mixed district/city/cell proxy fields written to grid rows. Census 2011 district join; same district value repeats on all cells in that district. Within current grid coverage, median represented district footprint is 135.9 km2 (99 cells), equivalent square side about 11.7 km.

Read as: Income-gate fields estimate 0-10 LPA probability as a probe; they are not production-calibrated income truth.

Road/serviceability (4 columns)

Source scale: PMGSY road shapefile line vectors for available states.

Exported scale: Nearest-road distance from cell centroid, capped at 25 km; within_2km flag uses a 2 km threshold.

Read as: Coverage is state-limited; keep pmgsy_road_source_available and pmgsy_road_distance_missing alongside the distance wherever it is used.
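One way to keep the PMGSY distance and its coverage flags together, sketched under stated assumptions: the 25 km cap and 2 km threshold come from the ledger, while treating the threshold as inclusive, leaving a missing distance as NaN, and setting the within-2 km flag to 0 in that case are assumptions for illustration, as is the helper name.

```python
import math
from typing import Optional

ROAD_DISTANCE_CAP_KM = 25.0  # cap stated in the precision ledger
WITHIN_THRESHOLD_KM = 2.0    # threshold behind pmgsy_road_within_2km


def pmgsy_road_fields(raw_distance_km: Optional[float], source_available: bool) -> dict:
    """Return the capped road distance together with its coverage flags.

    Assumptions (illustrative only): the 2 km test is inclusive, and when the state
    has no PMGSY layer or the distance is missing, the distance stays NaN and the
    within-2 km flag is 0 rather than being imputed.
    """
    missing = (
        not source_available
        or raw_distance_km is None
        or math.isnan(raw_distance_km)
    )
    if missing:
        distance = math.nan
        within = 0
    else:
        distance = min(raw_distance_km, ROAD_DISTANCE_CAP_KM)
        within = int(distance <= WITHIN_THRESHOLD_KM)
    return {
        "pmgsy_road_nearest_distance_km": distance,
        "pmgsy_road_within_2km": within,
        "pmgsy_road_source_available": int(source_available),
        "pmgsy_road_distance_missing": int(missing),
    }


print(pmgsy_road_fields(40.0, True))
print(pmgsy_road_fields(None, False))
```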

POI/serviceability (8 columns)

Source scale: Point GeoJSON layers for education, police, airport, and energy POIs.

Exported scale: Counts inside the 0.01-degree cell and counts inside a 2 km centroid buffer.

Read as: The *_2km columns are intentionally broader than the grid cell.
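The cell count and the 2 km buffer count differ only in the membership test. A hypothetical sketch, assuming POIs are (lat, lon) points, cells are half-open 0.01-degree boxes, and the buffer is a haversine radius around the cell centroid:

```python
import math
from typing import Iterable, Tuple

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def poi_counts(
    cell_min_lat: float,
    cell_min_lon: float,
    points: Iterable[Tuple[float, float]],
    cell_size_deg: float = 0.01,
    buffer_km: float = 2.0,
) -> Tuple[int, int]:
    """Return (count inside the cell, count within buffer_km of the cell centroid)."""
    c_lat = cell_min_lat + cell_size_deg / 2
    c_lon = cell_min_lon + cell_size_deg / 2
    in_cell = in_buffer = 0
    for lat, lon in points:
        # Half-open box so a point on a shared edge is counted in exactly one cell.
        if (cell_min_lat <= lat < cell_min_lat + cell_size_deg
                and cell_min_lon <= lon < cell_min_lon + cell_size_deg):
            in_cell += 1
        # For a 0.01-degree cell the 2 km circle contains the whole cell,
        # so the _2km count should not be smaller than the cell count.
        if haversine_km(c_lat, c_lon, lat, lon) <= buffer_km:
            in_buffer += 1
    return in_cell, in_buffer


cell_count, buffer_count = poi_counts(
    19.07, 72.87, [(19.072, 72.871), (19.085, 72.875), (19.2, 72.9)]
)
print(cell_count, buffer_count)
```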

Land use and slum context (3 columns)

Source scale: Local polygon overlays: Mumbai slum clusters and Hyderabad land-use where overlapping.

Exported scale: Polygon-overlap area/share aggregated into each 0.01-degree cell.

Read as: Coverage is city-specific; zero may mean no local overlap source, not absence of slum or land use.

Conversion and map coverage (18 columns)

Source scale: Public road, POI, OSM/Overture/ohsome proxy, settlement, addressability, and mapping-coverage context.

Exported scale: Cell-level conversion-feasibility and serviceability modifiers on the 0.01-degree grid.

Read as: These fields affect can-serve and conversion feasibility; they are not demand labels.

Execution readiness (7 columns)

Source scale: Public connectivity and measurement-coverage proxies for delivery/payment/partner feasibility.

Exported scale: Cell-level readiness scores and coverage flags on the 0.01-degree grid.

Read as: Execution readiness is a weak acquirable-TAM modifier, not household demand or income evidence.

GeoHG graph context (107 columns)

Source scale: One-hop graph aggregates over adjacent 0.01-degree grid cells.

Exported scale: 8-neighbour GeoHG context on the same grid; one-hop ring is adjacent/diagonal cells, roughly 3.6 km across including the center cell.

Read as: Graph context inherits precision from its base column and adds one 8-neighbour ring of smoothing.
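A minimal sketch of the one-hop context for a single base column, assuming cells are keyed by integer (city_grid_col, city_grid_row), that only cells present in the grid count as neighbours, and that graph_degree is the number of such neighbours (an assumption; the ledger does not define it). Output names follow the graph_ctx_neighbor_mean_* / graph_ctx_self_minus_neighbor_* pattern from the inventory; `graph_context` itself is hypothetical.

```python
from typing import Dict, Tuple

Cell = Tuple[int, int]  # (city_grid_col, city_grid_row)

# The 8 surrounding offsets: adjacent and diagonal cells, excluding the cell itself.
OFFSETS = [(dc, dr) for dc in (-1, 0, 1) for dr in (-1, 0, 1) if (dc, dr) != (0, 0)]


def graph_context(values: Dict[Cell, float], column: str) -> Dict[Cell, dict]:
    """One-hop 8-neighbour mean and self-minus-neighbour difference for one column."""
    out: Dict[Cell, dict] = {}
    for (col, row), own in values.items():
        neigh = [
            values[(col + dc, row + dr)]
            for dc, dr in OFFSETS
            if (col + dc, row + dr) in values
        ]
        # Isolated cells get NaN context rather than a fabricated mean.
        mean = sum(neigh) / len(neigh) if neigh else float("nan")
        out[(col, row)] = {
            f"graph_ctx_neighbor_mean_{column}": mean,
            f"graph_ctx_self_minus_neighbor_{column}": own - mean,
            "graph_degree": len(neigh),
        }
    return out


vals = {(c, r): float(c + 3 * r) for c in range(3) for r in range(3)}
ctx = graph_context(vals, "x")
print(ctx[(1, 1)], ctx[(0, 0)])
```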

TAM score outputs (27 columns)

Source scale: Deterministic gap-closure formulas over source-derived fields.

Exported scale: Probe, v2, v3, interval, confidence, priority, serviceable, acquirable, and power-transform inputs at grid-cell level.

Read as: These are outputs and audit fields, not independent input features for supervised training.

Status and reason codes (1 column)

Source scale: Pipeline status strings and reason-code outputs.

Exported scale: Per-row flags that explain blockers, fallbacks, proxy status, and calibration state.

Read as: Status fields support audit and filtering; they should not be modeled as demand signals.

Full signal and score inventory

current signal surface

The column lists below come from cell_features_geohg_style.csv, source_layer_cell_features_manifest.json, and all non-ID columns in tam_gap_closure_features.csv. Output and status groups are listed for completeness but are not independent training features.

Spatial/city context (10 columns)
centroid_lon, centroid_lat, pos_x_km, pos_y_km, pos_radius_km, city_grid_col, city_grid_row, city_cell_count, censuscode, grid_area_m2
Nightlights (11 columns)
nightlight_log1p_mean_2012, nightlight_log1p_mean_2019, nightlight_log1p_mean_2024, nightlight_mean_2012, nightlight_mean_2019, nightlight_mean_2024, nightlight_sum_2012, nightlight_sum_2019, nightlight_sum_2024, nightlight_mean_growth_2012_2024, nightlight_bin_id
Census housing/amenity assets (23 columns)
census_hl_precision_level, census_hl_context_present, census_hl_housing_quality_score, census_hl_basic_amenity_score, census_hl_asset_affluence_score, census_hl_amenity_deficit_score, census_hl_housing_good_share, census_hl_roof_concrete_share, census_hl_wall_burnt_brick_or_concrete_share, census_hl_floor_finished_share, census_hl_rooms_3plus_share, census_hl_electricity_share, census_hl_latrine_share, census_hl_lpg_png_share, census_hl_banking_share, census_hl_tv_share, census_hl_computer_internet_share, census_hl_mobile_phone_share, census_hl_scooter_motorcycle_share, census_hl_car_share, census_hl_no_asset_share, census_hl_source_path, census_hl_affluence_context_score
Hazard and flood context (7 columns)
Vulnerability, Hazard, Exposure, Risk, MaxArea, MaxFraction, flood_risk_bin_id
Population denominator (5 columns)
worldpop_density_people_per_km2, worldpop_density_log1p, worldpop_nearest_distance_km, worldpop_population_est_nearest, worldpop_households_est_avg_size_4_6
GHSL built-up/population (10 columns)
ghsl_population_est, ghsl_population_source_year, ghsl_builtup_share, ghsl_built_surface_m2, ghsl_built_volume_m3, ghsl_height_mean_m, ghsl_non_res_builtup_share, ghsl_non_res_volume_share, ghsl_settlement_score, ghsl_residential_candidate_share
Denominator v3 and reconciliation (33 columns)
census_control_population, census_control_households, households_est_primary_probe, household_size_census_context, households_est_worldpop_census_avg_size, denominator_confidence, source_disagreement_log_ratio, reconciled_population_probe, reconciled_households_probe, denominator_population_source_count, denominator_disagreement_score, impossible_market_flag, denominator_v2_status, population_prior_base, population_prior_lower, population_prior_upper, household_size_admin_v3, residential_allocation_weight, hard_exclusion_share_v3, non_residential_suppression_score_v3, vertical_density_correction, occupancy_correction, h_residential_households_unreconciled, admin_anchor_coverage_share, public_anchor_reconciliation_factor, public_anchor_calibration_error, public_anchor_reconciliation_status, h_residential_households_base, h_residential_households_lower, h_residential_households_upper, h_residential_denominator_confidence, household_denominator_v3_status, household_denominator_status
Building footprints (32 columns)
gobi_building_count_cell, gobi_building_area_sum_m2, gobi_building_area_share, gobi_building_area_density_km2, gobi_building_area_mean_m2, gobi_building_confidence_mean, msft_building_count_cell, msft_building_area_sum_m2, msft_building_area_share, msft_building_area_density_km2, msft_building_area_mean_m2, msft_building_height_mean_m, building_count_cell_best, building_area_sum_m2_best, building_count_density_per_km2_best, building_area_density_km2_best, building_area_share_best, building_source_coverage_count, building_source_coverage_flag, building_population_per_building, building_footprint_area_per_person_m2, building_msft_gobi_area_disagreement_log_ratio, building_msft_gobi_count_disagreement_log_ratio, building_msft_gobi_disagreement_flag, building_residential_density_score, building_cluster_compactness, building_height_mean_m, building_floor_count_proxy, building_volume_proxy_m3, building_vertical_density_proxy, building_residential_volume_proxy_m3, building_compactness_score
Landcover and physical exclusions (20 columns)
landcover_builtup_share, landcover_water_share, landcover_tree_forest_share, landcover_crop_share, landcover_non_residential_exclusion_share, dynamic_world_built_probability, residential_eligible_area_share, physical_hard_cell_flag, landcover_proxy_source_flag, hard_exclusion_share, soft_non_residential_downweight_share, hard_mask_reason_code, built_form_proxy_score, residential_confidence_probe, residential_eligible_area_share_v2, physical_exclusion_score, residential_filter_confidence, residential_filter_status, non_residential_exclusion_score, residential_status
Morphology proxies (6 columns)
morph_dense_old_city_score, morph_informal_dense_score, morph_highrise_affordable_score, morph_cbd_industrial_score, morph_periurban_vacant_score, morphology_proxy_source_flag
Income gate and welfare context (40 columns)
income_public_affluence_context_score, income_public_deprivation_context_score, income_public_asset_affluence_score, income_public_amenity_deficit_score, income_public_license_status_code, income_public_proxy_source_flag, income_gate_city_segment_code, income_gate_city_segment_prior_0_10lpa_prob, income_gate_city_prior_0_10lpa_prob, income_gate_district_income_pci, income_gate_district_income_affluence_score, income_gate_admin_affluence_score, income_gate_admin_deprivation_score, income_gate_nfhs_affluence_score, income_gate_nfhs_deprivation_score, income_gate_shrug_rwi_score, income_gate_shrug_consumption_score, income_gate_shrug_asset_affluence_score, income_gate_meta_rwi_raw, income_gate_meta_rwi_score, income_gate_meta_rwi_error, income_gate_cell_wealth_score, income_gate_external_affluence_score, income_gate_source_count, income_gate_confidence, income_gate_granularity_code, income_gate_granularity, income_gate_status, income_0_10lpa_prob_pre_gate_proxy, income_gate_prob_candidate, income_gate_final_weight, income_gate_adjustment_delta, income_0_10lpa_prob_probe, income_proxy_confidence, income_0_10lpa_prob_lower_v3, income_0_10lpa_prob_upper_v3, income_gate_context_affluence_score, base_affluence_proxy_score, affluence_proxy_score, income_proxy_status
Road/serviceability (4 columns)
pmgsy_road_nearest_distance_km, pmgsy_road_within_2km, pmgsy_road_source_available, pmgsy_road_distance_missing
POI/serviceability (8 columns)
poi_education_count_cell, poi_education_count_2km, poi_police_count_cell, poi_police_count_2km, poi_airport_count_cell, poi_airport_count_2km, poi_energy_count_cell, poi_energy_count_2km
Land use and slum context (3 columns)
mumbai_slum_area_m2, mumbai_slum_share, landuse_overlap_share
Conversion and map coverage (18 columns)
road_distance_m, road_intersection_density, settlement_cluster_size, addressability_score, poi_service_mix_score, osm_mapping_coverage_score, overture_road_coverage_score, conversion_proxy_source_flag, road_access_score, poi_service_access_score, serviceability_supply_friction_score, serviceable_prob_probe, serviceability_confidence, map_coverage_confidence, conversion_feasibility_score, conversion_feasibility_confidence, conversion_feasibility_status, serviceability_status
Execution readiness (7 columns)
mlab_measurement_coverage_score, connectivity_readiness_score, execution_proxy_source_flag, execution_readiness_signal_count, execution_readiness_score, execution_readiness_confidence, execution_readiness_status
GeoHG graph context (107 columns)
graph_ctx_neighbor_mean_<base> and graph_ctx_self_minus_neighbor_<base> for each of these 53 base columns: gobi_building_count_cell, gobi_building_area_sum_m2, gobi_building_area_share, gobi_building_area_density_km2, gobi_building_area_mean_m2, gobi_building_confidence_mean, msft_building_count_cell, msft_building_area_sum_m2, msft_building_area_share, msft_building_area_density_km2, msft_building_area_mean_m2, msft_building_height_mean_m, building_count_cell_best, building_area_share_best, building_area_density_km2_best, building_residential_density_score, building_population_per_building, building_footprint_area_per_person_m2, building_source_coverage_flag, building_msft_gobi_disagreement_flag, ghsl_population_est, ghsl_builtup_share, ghsl_settlement_score, landcover_builtup_share, landcover_water_share, landcover_tree_forest_share, landcover_crop_share, landcover_non_residential_exclusion_share, residential_eligible_area_share, physical_hard_cell_flag, road_distance_m, road_intersection_density, settlement_cluster_size, building_cluster_compactness, addressability_score, poi_service_mix_score, osm_mapping_coverage_score, mlab_measurement_coverage_score, connectivity_readiness_score, poi_education_count_2km, poi_police_count_2km, poi_airport_count_2km, poi_energy_count_2km, nightlight_log1p_mean_2024, nightlight_mean_growth_2012_2024, Risk, MaxFraction, mumbai_slum_share, worldpop_density_people_per_km2, worldpop_population_est_nearest, pmgsy_road_nearest_distance_km, pmgsy_road_within_2km, pos_radius_km; plus graph_degree
TAM score outputs (27 columns)
gross_tam_0_10lpa_probe, serviceable_tam_0_10lpa_probe, acquirable_tam_0_10lpa_probe, households_denominator_v2, eligible_households_v2, gross_tam_0_10lpa_v2, serviceable_tam_0_10lpa_v2, acquirable_tam_0_10lpa_v2, households_residential_v3, households_residential_v3_lower, households_residential_v3_upper, scope_share_in_scope_v3, scope_share_status_v3, gross_tam_0_10lpa_v3, gross_tam_0_10lpa_v3_lower, gross_tam_0_10lpa_v3_upper, serviceable_tam_0_10lpa_v3, acquirable_tam_0_10lpa_v3, component_confidence_score, priority_score_0_100, component_confidence_score_v2, priority_score_v2_0_100, tam_v2_status, component_confidence_score_v3, priority_score_v3_0_100, tam_v3_status, calibration_status
Status and reason codes (1 column)
reason_codes

Actual scoring math

formula contract

The scoring path is deterministic and ordered: households first, then residential likelihood, income-band probability, conversion/serviceability, execution readiness, v2/v3 TAM outputs, and the no-vendor power score used by maps. The exact formulas come from outputs/tam_gap_closure/tam_gap_closure_manifest.json; vendor TAM and G1 are excluded from all formulas.

1. Household denominator: households_est_primary_probe
unit: households per grid cell
first_non_null(households_est_worldpop_census_avg_size, worldpop_households_est_avg_size_4_6, households_est_uniform_district_density, 0), clipped at lower bound 0

Intuition: Estimate how many households live in the cell before any TAM probability is applied.

  • clip(census_2011_avg_household_size, 3.0, 7.5), with missing filled as 4.6
  • worldpop_population_est_nearest / household_size_census_context
  • clip(0.35*population_quality + 0.25*distance_quality + 0.25*census_quality + 0.15*disagreement_quality, 0, 1)

Guardrail: WorldPop and Census district context are independent inputs; vendor TAM and G1 are not read.
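The denominator step can be sketched as plain Python. Assumptions: the three bullets define household_size_census_context, households_est_worldpop_census_avg_size, and denominator_confidence respectively; missing values arrive as None or NaN; the helper names (`is_missing`, `first_non_null`, and the function wrappers) are hypothetical.

```python
import math
from typing import Optional


def clip(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))


def is_missing(v) -> bool:
    return v is None or (isinstance(v, float) and math.isnan(v))


def first_non_null(*values):
    """Return the first value that is neither None nor NaN."""
    for v in values:
        if not is_missing(v):
            return v
    return None


def household_size_census_context(census_2011_avg_household_size: Optional[float]) -> float:
    # Missing Census 2011 household size falls back to 4.6; the result is clipped to [3.0, 7.5].
    size = 4.6 if is_missing(census_2011_avg_household_size) else census_2011_avg_household_size
    return clip(size, 3.0, 7.5)


def households_est_primary_probe(
    worldpop_population_est_nearest: Optional[float],
    census_2011_avg_household_size: Optional[float],
    worldpop_households_est_avg_size_4_6: Optional[float] = None,
    households_est_uniform_district_density: Optional[float] = None,
) -> float:
    size = household_size_census_context(census_2011_avg_household_size)
    via_census_size = (
        None if is_missing(worldpop_population_est_nearest)
        else worldpop_population_est_nearest / size
    )
    value = first_non_null(
        via_census_size,
        worldpop_households_est_avg_size_4_6,
        households_est_uniform_district_density,
        0.0,
    )
    return max(0.0, value)  # clipped at lower bound 0


def denominator_confidence(population_quality, distance_quality, census_quality, disagreement_quality) -> float:
    return clip(
        0.35 * population_quality + 0.25 * distance_quality
        + 0.25 * census_quality + 0.15 * disagreement_quality,
        0.0, 1.0,
    )


print(households_est_primary_probe(4600.0, None))
```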

2. Residential likelihood: residential_confidence_probe
unit: probability-like confidence, 0 to 1
min(raw_residential_confidence, 0.84 if building_source_coverage_flag > 0 else 0.80)

Intuition: Down-weight cells that look industrial, sparse, or weakly residential before counting them as addressable households.

  • clip(0.50*city_rank(worldpop_density_people_per_km2) + 0.20*city_rank(poi_context) + 0.30*city_rank(building_residential_density_score), 0, 1)
  • clip(5.0 * mumbai_slum_share, 0, 1)
  • 0.35 when airport_or_energy_poi_count > 0 and worldpop_density_city_rank < 0.35, else 0.0

Guardrail: Capped because land-cover and true residential masks are not production-complete yet.
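The pieces of the residential step that are written out above can be sketched directly. How the three bullet components combine into raw_residential_confidence is not stated here, so the sketch takes raw_residential_confidence as an input. Reading the second bullet as slum_residential_signal and the third as an industrial penalty are assumptions, and the function names are illustrative.

```python
def clip(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))


def residential_density_component(worldpop_rank: float, poi_rank: float, building_rank: float) -> float:
    # First bullet; inputs are within-city percentile ranks in [0, 1].
    return clip(0.50 * worldpop_rank + 0.20 * poi_rank + 0.30 * building_rank, 0.0, 1.0)


def slum_residential_signal(mumbai_slum_share: float) -> float:
    # Second bullet: a 20% slum share already saturates the signal at 1.0.
    return clip(5.0 * mumbai_slum_share, 0.0, 1.0)


def industrial_penalty(airport_or_energy_poi_count: int, worldpop_density_city_rank: float) -> float:
    # Third bullet: airport/energy POIs in a low-density cell suggest non-residential use.
    return 0.35 if airport_or_energy_poi_count > 0 and worldpop_density_city_rank < 0.35 else 0.0


def residential_confidence_probe(raw_residential_confidence: float, building_source_coverage_flag: int) -> float:
    # Cap: 0.84 with building-footprint coverage, 0.80 without.
    cap = 0.84 if building_source_coverage_flag > 0 else 0.80
    return min(raw_residential_confidence, cap)


print(residential_confidence_probe(0.95, 1), residential_confidence_probe(0.95, 0))
```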

3. Income band probability: income_0_10lpa_prob_probe
unit: probability-like share, 0.50 to 0.97
blend pre-gate and gate candidate with weight clip(0.40 + 0.34*income_gate_confidence, 0, 0.74) when gate prior is available

Intuition: Treat 0-10 LPA as broad affordability eligibility: brighter, denser, more asset-rich cells are less likely to be in the lower band, while HLPCA amenity deficit increases the probability.

  • clip(0.50*city_rank(nightlight_log1p_mean_2024) + 0.25*city_rank(poi_context) + 0.15*city_rank(worldpop_density_people_per_km2) - 0.10*slum_residential_signal, 0, 1)
  • when HLPCA present: clip(0.45*census_hl_asset_affluence_score + 0.25*census_hl_housing_quality_score + 0.20*census_hl_basic_amenity_score + 0.10*(1 - census_hl_amenity_deficit_score), 0, 1)
  • legacy base/HLPCA affluence blended toward income_gate_context_affluence_score with weight clip(0.20 + 0.32*income_gate_confidence, 0, 0.52)
  • clip(0.30 + 0.20*nightlight_present + 0.16*census_hl_context_present + 0.18*income_gate_confidence + 0.06*poi_context_positive + 0.05*slum_signal_positive, 0, 0.86)

Guardrail: HLPCA is 2011 district context from a scraped mirror and stays probe-only until official Houselisting validation and MPCE calibration pass.
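A sketch of the gate blend, assuming a linear mix between the pre-gate proxy and the gate candidate and a final clip to the stated 0.50 to 0.97 range. The helper names are hypothetical, and reading the fourth bullet as income_proxy_confidence is an assumption.

```python
from typing import Optional


def clip(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))


def gate_weight(income_gate_confidence: float) -> float:
    # clip(0.40 + 0.34 * confidence, 0, 0.74): reaches the 0.74 ceiling at confidence 1.0.
    return clip(0.40 + 0.34 * income_gate_confidence, 0.0, 0.74)


def income_prob_probe(
    pre_gate_prob: float,
    gate_candidate_prob: Optional[float],
    income_gate_confidence: float,
) -> float:
    """Blend the pre-gate proxy toward the gate candidate (assumed linear blend)."""
    if gate_candidate_prob is None:  # no gate prior available: keep the pre-gate proxy
        blended = pre_gate_prob
    else:
        w = gate_weight(income_gate_confidence)
        blended = (1.0 - w) * pre_gate_prob + w * gate_candidate_prob
    return clip(blended, 0.50, 0.97)  # assumed to enforce the stated 0.50 to 0.97 range


def income_proxy_confidence(nightlight_present, census_hl_context_present, income_gate_confidence,
                            poi_context_positive, slum_signal_positive) -> float:
    # Fourth bullet, read here as income_proxy_confidence (assumption); capped at 0.86.
    return clip(0.30 + 0.20 * nightlight_present + 0.16 * census_hl_context_present
                + 0.18 * income_gate_confidence + 0.06 * poi_context_positive
                + 0.05 * slum_signal_positive, 0.0, 0.86)


print(income_prob_probe(0.80, 0.60, 0.5))
```

With confidence 0.5 the gate weight is 0.57, so the blend moves most of the way from the pre-gate 0.80 toward the gate candidate 0.60.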

4. Serviceability probability: serviceable_prob_probe
unit: probability-like share, 0.20 to 0.90
clip(0.35 + 0.55*serviceability_supply_friction_score, 0.20, 0.90)

Intuition: Convert public road, POI, and graph accessibility into a rough serviceability multiplier.

  • if pmgsy_road_source_available > 0 then 1 - clip(pmgsy_road_nearest_distance_km / 5.0, 0, 1), else 0.40
  • clip(0.45*road_access_score + 0.35*city_rank(poi_context) + 0.20*city_rank(graph_degree), 0, 1)
  • clip(0.55*map_coverage_confidence + 0.45*poi_context_present, 0, 0.80)

Guardrail: Internal branch, capacity, partner, cost, and operations coverage are not present, so this remains a probe.
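A sketch of the serviceability chain, assuming the three bullets define road_access_score, serviceability_supply_friction_score, and serviceability_confidence in that order, and that the city ranks are already computed in [0, 1]. Function signatures are illustrative.

```python
def clip(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))


def road_access_score(pmgsy_road_source_available: int, pmgsy_road_nearest_distance_km: float) -> float:
    # Linear decay to 0 at 5 km; states without PMGSY coverage get a neutral 0.40.
    if pmgsy_road_source_available > 0:
        return 1.0 - clip(pmgsy_road_nearest_distance_km / 5.0, 0.0, 1.0)
    return 0.40


def serviceability_supply_friction_score(road_access: float, poi_context_rank: float,
                                         graph_degree_rank: float) -> float:
    return clip(0.45 * road_access + 0.35 * poi_context_rank + 0.20 * graph_degree_rank, 0.0, 1.0)


def serviceability_confidence(map_coverage_confidence: float, poi_context_present: int) -> float:
    return clip(0.55 * map_coverage_confidence + 0.45 * poi_context_present, 0.0, 0.80)


def serviceable_prob_probe(friction_score: float) -> float:
    return clip(0.35 + 0.55 * friction_score, 0.20, 0.90)


# Example: road 1 km away, mid-ranked POI context, well-connected cell.
friction = serviceability_supply_friction_score(road_access_score(1, 1.0), 0.5, 0.9)
print(serviceable_prob_probe(friction))
```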

5. Current TAM score family: gross_v3, predicted_power, serviceable_v3, acquirable_v3, priority_v3
unit: households for TAM scores; 0-100 for priority
gross_tam_0_10lpa_v3 = h_residential_households_base * income_0_10lpa_prob_probe * scope_share_in_scope_v3
predicted_tam_0_10lpa_power = globally rescaled power(gross_tam_0_10lpa_v3, gamma=0.60), preserving the gross TAM total
serviceable_tam_0_10lpa_v3 = gross_tam_0_10lpa_v3 * conversion_feasibility_score
acquirable_tam_0_10lpa_v3 = serviceable_tam_0_10lpa_v3 * execution_readiness_score
priority_score_v3_0_100 = clip(100 * city_rank(acquirable_tam_0_10lpa_v3) * component_confidence_score_v3, 0, 100)

Intuition: The current map score is not a learned vendor-TAM model: it transforms the no-vendor gross v3 TAM surface into a TAM-like score while preserving the base total.

  • city_rank(x) = pandas groupby(city).rank(pct=True, method='average'), then fill missing with 0.5 and clip to [0, 1]
  • component_confidence_score = clip(0.35*denominator_confidence + 0.25*residential_confidence_probe + 0.20*income_proxy_confidence + 0.20*serviceability_confidence, 0, 1)
  • legacy probe formulas, still replayed: gross_tam_0_10lpa_probe = clip(households_est_primary_probe * income_0_10lpa_prob_probe * residential_confidence_probe, lower=0) | serviceable_tam_0_10lpa_probe = gross_tam_0_10lpa_probe * serviceable_prob_probe | priority_score_0_100 = clip(100 * city_rank(serviceable_tam_0_10lpa_probe) * component_confidence_score, 0, 100)

Guardrail: Vendor TAM and G1 are benchmark-only after formulas are frozen; the power score uses no vendor scaling.
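The city_rank definition and the v3 chain translate directly into pandas. A minimal sketch, assuming a DataFrame with a `city` column and the input columns named in the formulas; `score_v3` is a hypothetical wrapper, component_confidence_score_v3 is taken as given because its formula is not listed here, and the demo values are made up.

```python
import pandas as pd


def city_rank(df: pd.DataFrame, column: str, city_col: str = "city") -> pd.Series:
    """Within-city percentile rank with average ties; missing ranks become 0.5."""
    ranks = df.groupby(city_col)[column].rank(pct=True, method="average")
    return ranks.fillna(0.5).clip(0.0, 1.0)


def score_v3(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the v3 gross -> serviceable -> acquirable -> priority chain."""
    out = df.copy()
    out["gross_tam_0_10lpa_v3"] = (
        out["h_residential_households_base"]
        * out["income_0_10lpa_prob_probe"]
        * out["scope_share_in_scope_v3"]
    )
    out["serviceable_tam_0_10lpa_v3"] = out["gross_tam_0_10lpa_v3"] * out["conversion_feasibility_score"]
    out["acquirable_tam_0_10lpa_v3"] = out["serviceable_tam_0_10lpa_v3"] * out["execution_readiness_score"]
    rank = city_rank(out, "acquirable_tam_0_10lpa_v3")
    out["priority_score_v3_0_100"] = (100 * rank * out["component_confidence_score_v3"]).clip(0, 100)
    return out


# Toy rows: two Delhi cells and one Mumbai cell (values are made up).
demo = pd.DataFrame({
    "city": ["Delhi", "Delhi", "Mumbai"],
    "h_residential_households_base": [1000.0, 2000.0, 500.0],
    "income_0_10lpa_prob_probe": [0.8, 0.8, 0.9],
    "scope_share_in_scope_v3": [1.0, 1.0, 1.0],
    "conversion_feasibility_score": [0.5, 0.5, 1.0],
    "execution_readiness_score": [1.0, 1.0, 1.0],
    "component_confidence_score_v3": [0.7, 0.7, 0.8],
})
scored = score_v3(demo)
print(scored[["city", "acquirable_tam_0_10lpa_v3", "priority_score_v3_0_100"]])
```

Because the rank is computed within each city, the lone Mumbai row ranks 1.0 by construction, so its priority equals 100 times its confidence.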

numeric replay: grid_3335

Delhi: values below are recomputed from the CSV row, without benchmark labels.

gross TAM: 22324.6 households * 0.674 income * 0.655 residential = 9861.7
serviceable TAM: 9861.7 gross * 0.671 serviceable = 6621.0
confidence: 0.35*0.833 + 0.25*0.655 + 0.20*0.842 + 0.20*0.643 = 0.752
priority: 100 * city_rank(6621.0) * 0.752 = 75.2
Displayed factors are rounded; each result is computed from the unrounded row values.

Current v3 math funnel dry run

Funnel diagram: households (H_residential v3) → income (P(<10 LPA)) → gross (scope share) → serviceable (conversion) → acquirable (execution) → priority v3 (city rank * confidence); predicted power TAM (gamma 0.60; total preserved).
math funnel

Dry runs now follow the current v3 score family.

The table below recomputes each displayed step from generated artifacts. The power step uses policy global_no_vendor_base_total_preserved; the scale constant is recomputed inside the artifact being scored, so vendor-grid and full-India runs can have different constants.

Vendor TAM and G1 are not inputs to any row in this funnel; they remain benchmark-only diagnostics later in the page.

grid | city | funnel step | dry-run math | computed | artifact | delta | status
grid_636 | Saharanpur | residential households v3 | source-derived H_residential | 11041.102687 | 11041.102687 | 0 | pass
grid_636 | Saharanpur | gross v3 TAM | 11041.10 * income_prob 0.8432 | 9309.671076 | 9309.671076 | 0 | pass
grid_636 | Saharanpur | serviceable v3 TAM | 9309.67 * conversion 0.8834 | 8223.825574 | 8223.825574 | 0 | pass
grid_636 | Saharanpur | acquirable v3 TAM | 8223.83 * execution 0.7772 | 6391.550980 | 6391.550980 | 0 | pass
grid_636 | Saharanpur | priority v3 score | 100 * city_rank 1.0000 * confidence 0.7261 | 72.605000 | 72.605000 | 0 | pass
grid_636 | Saharanpur | predicted power TAM | gross_v3^0.60 * artifact_scale 21.8950 | 5268.734394 | 4528.074581 | 740.66 | review
grid_639 | Saharanpur | residential households v3 | source-derived H_residential | 9879.815405 | 9879.815405 | 0 | pass
grid_639 | Saharanpur | gross v3 TAM | 9879.82 * income_prob 0.8423 | 8321.867968 | 8321.867968 | 0 | pass
grid_639 | Saharanpur | serviceable v3 TAM | 8321.87 * conversion 0.9019 | 7505.308004 | 7505.308004 | 0 | pass
grid_639 | Saharanpur | acquirable v3 TAM | 7505.31 * execution 0.7800 | 5854.140243 | 5854.140243 | 0 | pass
grid_639 | Saharanpur | priority v3 score | 100 * city_rank 0.9859 * confidence 0.7261 | 71.582394 | 71.582394 | 0 | pass
grid_639 | Saharanpur | predicted power TAM | gross_v3^0.60 * artifact_scale 21.8950 | 4925.816125 | 4197.738645 | 728.077 | review

Formula replay dry run

no-write replay

This recomputes final probe arithmetic from the generated CSV values without rerunning feature builders or reading benchmark labels. A pass means the HTML-visible math matches the written scoring columns.

grid | city | output | recomputed | written | absolute delta | status
grid_3335 | Delhi | gross_tam_0_10lpa_probe | 9861.666591 | 9861.666591 | 0 | pass
grid_3335 | Delhi | serviceable_tam_0_10lpa_probe | 6621.000371 | 6621.000371 | 0 | pass
grid_3335 | Delhi | component_confidence_score | 0.752394 | 0.752394 | 0 | pass
grid_3335 | Delhi | priority_score_0_100 | 75.239389 | 75.239389 | 0 | pass
grid_3333 | Delhi | gross_tam_0_10lpa_probe | 8933.315395 | 8933.315395 | 0 | pass
grid_3333 | Delhi | serviceable_tam_0_10lpa_probe | 6290.567568 | 6290.567568 | 0 | pass
grid_3333 | Delhi | component_confidence_score | 0.748927 | 0.748927 | 0 | pass
grid_3333 | Delhi | priority_score_0_100 | 74.845875 | 74.845875 | 0 | pass
grid_3291 | Delhi | gross_tam_0_10lpa_probe | 9592.349986 | 9592.349986 | 0 | pass
grid_3291 | Delhi | serviceable_tam_0_10lpa_probe | 6070.121154 | 6070.121154 | 0 | pass
grid_3291 | Delhi | component_confidence_score | 0.747012 | 0.747012 | 0 | pass
grid_3291 | Delhi | priority_score_0_100 | 74.607896 | 74.607896 | 0 | pass
grid_3328 | Delhi | gross_tam_0_10lpa_probe | 8165.752869 | 8165.752869 | 0 | pass
grid_3328 | Delhi | serviceable_tam_0_10lpa_probe | 5605.253802 | 5605.253802 | 0 | pass
grid_3328 | Delhi | component_confidence_score | 0.749787 | 0.749787 | 0 | pass
grid_3328 | Delhi | priority_score_0_100 | 74.838171 | 74.838171 | 0 | pass
grid_1176 | Mumbai | gross_tam_0_10lpa_probe | 7858.588816 | 7858.588816 | 0 | pass
grid_1176 | Mumbai | serviceable_tam_0_10lpa_probe | 5571.061529 | 5571.061529 | 0 | pass
grid_1176 | Mumbai | component_confidence_score | 0.770855 | 0.770855 | 0 | pass
grid_1176 | Mumbai | priority_score_0_100 | 77.085534 | 77.085534 | 0 | pass
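Both replay tables reduce to the same check: recompute, compare with the written value, and report the absolute delta. A minimal sketch, assuming a fixed absolute tolerance of 1e-6 (a hypothetical value; the pipeline's tolerance is not stated) and a hypothetical `ReplayRow` record:

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ReplayRow:
    grid: str
    output: str
    recomputed: float
    written: float


def replay_status(row: ReplayRow, abs_tol: float = 1e-6) -> Tuple[float, str]:
    """Absolute delta and pass/review status for one recomputed-vs-written value."""
    delta = abs(row.recomputed - row.written)
    return delta, ("pass" if delta <= abs_tol else "review")


rows = [
    ReplayRow("grid_3335", "gross_tam_0_10lpa_probe", 9861.666591, 9861.666591),
    ReplayRow("grid_636", "predicted power TAM", 5268.734394, 4528.074581),
]
for r in rows:
    delta, status = replay_status(r)
    print(f"{r.grid} {r.output}: delta={delta:.6f} status={status}")
```

Fed the grid_636 power row from the funnel table, the check lands in review with a delta of about 740.66, matching the status shown there.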

V. Building footprints (pp. 16-18)

Building Footprints

The HTML now reflects the newer Microsoft and combined building-density code path, while making the current artifact coverage explicit.

code path

Microsoft is wired in.

Stage 1 can select AOI shards and Stage 2 has Microsoft and combined building fields.

artifact state

Microsoft coverage is active.

The current Microsoft manifest contributes footprint coverage to the combined signal.

source | status | coverage
Google Open Buildings | ok | 9.3%
Microsoft Global Buildings | ok | 91.6%
Combined building signal | ok | 95.8%

Building-related columns

gobi_building_count_cell, gobi_building_area_sum_m2, gobi_building_area_share, gobi_building_area_density_km2, gobi_building_area_mean_m2, gobi_building_confidence_mean, msft_building_count_cell, msft_building_area_sum_m2, msft_building_area_share, msft_building_area_density_km2, msft_building_area_mean_m2, msft_building_height_mean_m, building_count_cell_best, building_area_sum_m2_best, building_count_density_per_km2_best, building_area_density_km2_best, building_area_share_best, building_source_coverage_count, building_residential_density_score, building_source_coverage_flag, building_population_per_building, building_footprint_area_per_person_m2, building_msft_gobi_disagreement_flag, building_height_mean_m, building_floor_count_proxy, building_volume_proxy_m3, building_vertical_density_proxy, building_residential_volume_proxy_m3, building_compactness_score, building_cluster_compactness

Microsoft AOI staging selected 24 shards with an expected total of 1617.8 MB, and the run status is ok. Current combined building coverage reflects the available Google and Microsoft footprint signals.
VI. Predicted TAM layer (pp. 19-23)

Predicted TAM Layer

The newer pipeline promotes one explicit score column for maps and summaries: predicted_tam_0_10lpa_power. It is a deterministic transform of the no-vendor gross TAM base, not a learned vendor-TAM model.

Power-transform score path: gross TAM v2 (no-vendor base) -> power 0.60 (monotone shape) -> map (score). Scale policy: global_no_vendor_base_total_preserved (preserves the no-vendor base total; vendor TAM remains benchmark-only).
current score

Magnitude stays count-like.

The transform reshapes the distribution to improve benchmark-grid Pearson agreement while preserving Spearman order and preserving the source-derived gross-TAM total.

The selected score is used consistently by the map, full-India scorer, notebook metric summary, and post-hoc G1 suite.
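Read literally, the scale policy implies a closed form. The sketch below assumes a single global rescaling factor. The function name `power_tam` is hypothetical, and the code is not the pipeline's own. It raises each cell's base value g to gamma = 0.60, then multiplies every powered value by sum(g) / sum(g^gamma), so that the column total matches the no-vendor base total.

```python
import math
from typing import List, Sequence

GAMMA = 0.60  # exponent reported by the Cell-2 transform decision


def power_tam(base: Sequence[float], gamma: float = GAMMA) -> List[float]:
    """Monotone power transform rescaled to preserve the no-vendor base total."""
    if any(b < 0 for b in base):
        raise ValueError("gross TAM base must be non-negative")
    total = sum(base)
    powered = [b ** gamma for b in base]
    denom = sum(powered)
    if denom == 0:
        return [0.0 for _ in base]
    scale = total / denom  # one global factor; no vendor mean is involved
    return [scale * p for p in powered]
```

The transform is monotone and the factor is global, so every quantile maps through the same curve. This appears consistent with the full-India distribution table. There, the power column's max-to-median ratio (2741.69 / 30.56 ≈ 89.7) matches the base ratio raised to 0.60: (29119.34 / 16.20)^0.6 ≈ 89.7.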

Output column: predicted_tam_0_10lpa_power

Selected score used by map, full-India score index, notebook summary, and post-hoc suite.

Base column: gross_tam_0_10lpa_v3

Source-derived gross TAM base. Vendor TAM is not used for scaling.

Gamma: 0.60

Fixed monotone power exponent selected by Cell-2 transform diagnostics.

Scale policy: global_no_vendor_base_total_preserved

Power weights are rescaled to the no-vendor base total, not to a vendor mean.

Vendor Pearson: 0.788

Benchmark-grid sanity metric only.

G1 rank ceiling R2: 0.558

Order-only ceiling diagnostic; not TAM magnitude.

Cell-2 transform decision

field | value | status
primary_map_mechanism | power_060_scaled_gross_tam_v2 | current
rank_ceiling_mechanism | rank_pct_for_order_diagnostic_only | current
rejected_primary_mechanism | vendor_mean_scaled_rank_pct | current
map_predicted_vs_vendor_pearson | 0.7881785302692829 | current
map_predicted_vs_vendor_spearman | 0.8349283824789576 | current
map_predicted_vs_g1_spearman | 0.7469475181022232 | current
top10_predicted_power_g1_capture | 0.46703212689927004 | current
top10_vendor_tam_g1_capture | 0.42434916553876123 | current
city_holdout_invalidates_full_caught_up_claim | True | current

Headline power checks

check | value | note
predicted_vs_vendor_pearson | 0.788179 | headline check
predicted_vs_vendor_spearman | 0.834928 | headline check
rank_pct_transform_ceiling_r2 | 0.557931 | headline check
full_india_scored_cells | 2905288.000000 | headline check
vendor_comparison_cells | 7029.000000 | headline check
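The rank_pct mechanism behind the rank ceiling keeps only ordering. The sketch below is an average-rank percentile transform. The name `rank_pct` mirrors the mechanism label, but this implementation and its tie rule are assumptions. Any monotone reshaping of the input, such as the 0.60 power, gives identical output. That is why the ceiling says nothing about TAM magnitude.

```python
from typing import List, Sequence


def rank_pct(values: Sequence[float]) -> List[float]:
    """Average-rank percentile in (0, 1]; tied values share the mean of their ranks."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Extend j across the block of values tied with values[order[i]].
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank / n
        i = j + 1
    return ranks
```

For example, `rank_pct([5.0, 1.0, 3.0, 3.0])` gives `[1.0, 0.25, 0.625, 0.625]`. Applying it to the 0.60-powered values returns the same list.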

Holdout transform comparison

group | mechanism | groups | metric 2 R2 median | metric 3 R2 median | caught-up median
city | identity | 29 | 0.359 | 0.400 | 0.969
city | log1p | 29 | 0.614 | 0.592 | 1.041
city | power_060 | 29 | 0.648 | 0.654 | 0.992
city | rank_pct | 29 | 0.649 | 0.659 | 1.005
city | sqrt1p | 29 | 0.531 | 0.538 | 0.953
city | yeo_johnson | 29 | 0.648 | 0.654 | 0.992
spatial_block | identity | 11 | 0.310 | 0.321 | 0.998
spatial_block | log1p | 11 | 0.543 | 0.454 | 1.097
spatial_block | power_060 | 11 | 0.584 | 0.549 | 1.031
spatial_block | rank_pct | 11 | 0.583 | 0.541 | 1.040
spatial_block | sqrt1p | 11 | 0.430 | 0.415 | 1.071
spatial_block | yeo_johnson | 11 | 0.584 | 0.549 | 1.031
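One plausible reading of these columns uses the notebook summary's numbering, where metric 2 is predicted vs G1 and metric 3 is vendor vs G1. Within each holdout group, compute both R2 values, take their ratio, and report medians across groups. Under that reading the caught-up median need not equal the ratio of the two median columns. For city/identity, 0.359 / 0.400 ≈ 0.90, yet the reported caught-up median is 0.969. The sketch below is hypothetical. It uses squared Pearson r as R2, and the report may define R2 differently.

```python
import math
from statistics import median
from typing import Dict, List, Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain Pearson correlation (assumes non-constant inputs)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def caught_up_by_group(
    rows: Dict[str, List[Tuple[float, float, float]]]
) -> Tuple[float, float, float]:
    """rows maps group -> [(predicted, vendor, g1), ...].

    Returns the median metric-2 R2, the median metric-3 R2, and the median of
    the per-group ratios (the caught-up median under this reading).
    """
    m2, m3, ratios = [], [], []
    for cells in rows.values():
        pred, vend, g1 = zip(*cells)
        r2_pred = pearson_r(pred, g1) ** 2
        r2_vend = pearson_r(vend, g1) ** 2
        m2.append(r2_pred)
        m3.append(r2_vend)
        ratios.append(r2_pred / r2_vend)
    return median(m2), median(m3), median(ratios)
```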

Top-k G1 capture from transform report

score | top fraction | grid count | G1 capture | lift
predicted_power_tam | 5.0% | 352 | 23.0% | 4.60
vendor_tam | 5.0% | 352 | 24.3% | 4.86
predicted_power_tam | 10.0% | 703 | 46.7% | 4.67
vendor_tam | 10.0% | 703 | 42.4% | 4.24
predicted_power_tam | 20.0% | 1,406 | 71.2% | 3.56
vendor_tam | 20.0% | 1,406 | 67.5% | 3.38
The transform report also records city_holdout_invalidates_full_caught_up_claim = True: city holdouts rule out a full caught-up claim. This section documents the chosen display and scoring layer; it does not authorize production accuracy claims.
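The top-k columns fit a simple reading:

1. Rank cells by score.
2. Take the top fraction.
3. Divide the G1 hits inside that set by total G1 hits. This is the capture.
4. Divide the capture by the fraction. This is the lift (for example, 46.7% / 10% ≈ 4.67).

The sketch below assumes the top-k size is rounded up: 7,029 × 5% = 351.45 becomes 352, matching the transform report. The post-hoc suite's 351 suggests a different rounding there. The function name and the tie handling (stable sort order) are assumptions.

```python
import math
from typing import Sequence, Tuple


def topk_capture(
    scores: Sequence[float], g1_hits: Sequence[float], fraction: float
) -> Tuple[int, float, float]:
    """Return (grid_count, capture, lift) for the top `fraction` of cells by score."""
    n = len(scores)
    k = math.ceil(n * fraction)  # assumption: round the cell count up
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    total = sum(g1_hits)
    capture = sum(g1_hits[i] for i in top) / total if total else 0.0
    lift = capture / fraction  # capture relative to a random selection of the same size
    return k, capture, lift
```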
VII. Full-India map (pp. 24-29)

Full-India Map & Scores

The new full-India path separates geometry indexing from scoring. Geometry is built from the 0.01-degree India grid; scoring then writes a source-derived score CSV, compact score index, and manifest.

03 - national score and map artifacts
grid index (2,905,288 cells) -> full-India scorer (WorldPop, Census, POI, formulas) -> score CSV (probe rows and audit columns) -> score index JSON (row-major compact map payload) -> Leaflet HTML (docs/tam_grid_output_map.html)
Scored cells: 2,905,288

Every indexed 0.01-degree India cell is scored in the current manifest.

Grid step: 0.01

Degrees per cell; about 1.1 km north-south.

District/city groups: 631

Groups from the district/city context join.

Candidate cells: 8,868,492

Bounding-grid candidates (3,033 x 2,924 = 8,868,492) before India-boundary filtering; 2,905,288 of them remain as scored India cells.

Row x col index: 3,033 x 2,924

Compact row-major map index dimensions.

Predicted total: 118,559,058

No-vendor gross-TAM total preserved after the power transform.
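A row-major index addresses every position of the 3,033 x 2,924 bounding grid by the single offset row * n_cols + col. Per-cell property names therefore never have to be repeated. The sketch below assumes a hypothetical layout:

- a south-west origin,
- rows increasing northward,
- `None` for positions outside India.

The class and its fields are illustrative. This document does not describe the real JSON schema.

```python
import math
from typing import List, Optional, Tuple


class ScoreIndex:
    """Hypothetical row-major score grid: scores[row * n_cols + col]."""

    def __init__(self, lat0: float, lon0: float, step: float,
                 n_rows: int, n_cols: int, scores: List[Optional[float]]):
        if len(scores) != n_rows * n_cols:
            raise ValueError("score array must have n_rows * n_cols entries")
        self.lat0, self.lon0, self.step = lat0, lon0, step
        self.n_rows, self.n_cols = n_rows, n_cols
        self.scores = scores

    def cell(self, lat: float, lon: float) -> Optional[Tuple[int, int]]:
        """Grid (row, col) containing the point, or None outside the bounding grid."""
        row = math.floor((lat - self.lat0) / self.step)
        col = math.floor((lon - self.lon0) / self.step)
        if 0 <= row < self.n_rows and 0 <= col < self.n_cols:
            return row, col
        return None

    def score_at(self, lat: float, lon: float) -> Optional[float]:
        rc = self.cell(lat, lon)
        if rc is None:
            return None
        row, col = rc
        return self.scores[row * self.n_cols + col]  # None = position outside India
```

Only the north-south size of a 0.01-degree cell is constant. East-west, the same step spans about 1.11 km × cos(latitude), roughly 1.05 km at 20°N.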

Full-India score distributions

score column | non-null cells | total | p50 | p90 | max | meaning
predicted_tam_0_10lpa_power | 2,905,288 | 118,559,058 | 30.56 | 80.87 | 2741.69 | probe distribution
gross_tam_0_10lpa_v3 | 2,905,288 | 118,559,058 | 16.20 | 81.99 | 29119.34 | probe distribution
serviceable_tam_0_10lpa_v3 | 2,905,288 | 48,898,077 | 6.22 | 32.07 | 14441.63 | probe distribution
acquirable_tam_0_10lpa_v3 | 2,905,288 | 8,153,552 | 1.04 | 5.35 | 2407.43 | probe distribution
priority_score_v3_0_1002,905,28886033814.628.8755.0368.68probe distribution

Full-India source status

source family | current status detail | mode
grid_index | {"cell_count":2905288} | source-derived
district_context | {"district_join_share":1.0} | source-derived
worldpop_population | {"coverage_share":1.0,"median_nearest_distance_km":0.3567097458094222,"p95_nearest_distance_km":0.5752521808981806,"source_rows":4010402} | source-derived
poi_context | {"education_source_points":19502,"police_source_points":16459,"airport_source_points":2710,"energy_source_points":534} | source-derived
formula_inputs | {"building_footprints":"not_rebuilt_for_direct_full_india_scoring_zero_coverage_flags_used","pmgsy_roads":"not_rebuilt_for_direct_full_india_scoring_missing_state_fallback_used","graph_degree":"regular_grid_degree_8_proxy_for_full_india_direct_scoring"} | source-derived
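The worldpop_population detail reports median and p95 distances from grid cells to their nearest WorldPop source point. The sketch below is a brute-force version of that kind of summary. It assumes great-circle (haversine) distance from each cell centre and nearest-rank percentiles. With 4,010,402 source rows, a real run would need a spatial index. This document does not show the pipeline's actual join method, and all names here are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_distances(cells: Sequence[Tuple[float, float]],
                      points: Sequence[Tuple[float, float]]) -> List[float]:
    """Brute-force nearest-point distance per cell, O(cells * points)."""
    return [min(haversine_km(c, p) for p in points) for c in cells]


def percentile(sorted_vals: Sequence[float], q: float) -> float:
    """Nearest-rank percentile on pre-sorted values, q in (0, 100]."""
    rank = max(1, math.ceil(q / 100 * len(sorted_vals)))
    return sorted_vals[rank - 1]


def distance_summary(cells, points) -> dict:
    d = sorted(nearest_distances(cells, points))
    return {"median_km": percentile(d, 50), "p95_km": percentile(d, 95)}
```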

Score pass/fail checks

check | value
all_grid_cells_scored | True
grid_step_is_0_01_degree | True
vendor_tam_used_as_feature | False
score_index_written | True

Geometry-index pass/fail checks

check | value
grid_index_written | True
all_grid_cells_indexed | True
grid_step_is_0_01_degree | True
vendor_tam_used_as_production_feature | False
coarse_0_05_grid_published | False
Full-India scores are deterministic probe estimates. The manifest explicitly records vendor_tam_available_for_full_india as False and calibration status as probe_not_production_calibrated.
VIII. Diagnostics gate (pp. 30-34)

Diagnostics Gate

Stage 3 runs current benchmark diagnostics and claim-boundary checks, but it does not train, tune, calibrate, or fit against vendor TAM or G1.

component agreement

Benchmark-only rows join.

Probe, v2, and v3 component columns are compared to vendor TAM for audit and sanity checks. These rows are not the current map-score metric.

claim boundary

No production model metric.

The current diagnostics explicitly block accuracy claims until valid holdouts exist. The headline map score remains predicted_tam_0_10lpa_power, reported in the Predicted TAM and Notebook sections.

Component probe/v2 vs vendor checks

This table intentionally includes legacy probe rows such as gross_tam_0_10lpa_probe. Treat it as component QA; the current map-score Pearson/Spearman values are reported separately for predicted_tam_0_10lpa_power.
candidate | n | Pearson r | Spearman r | WMAPE | Top-10 overlap | validity
gross_tam_0_10lpa_probe | 7,029 | 0.731 | 0.797 | 0.558 | 0.560 | benchmark only
serviceable_tam_0_10lpa_probe | 7,029 | 0.735 | 0.805 | 0.646 | 0.555 | benchmark only
priority_score_0_100 | 7,029 | 0.607 | 0.749 | 0.978 | 0.408 | benchmark only
gross_tam_0_10lpa_v2 | 7,029 | 0.763 | 0.837 | 0.546 | 0.620 | benchmark only
serviceable_tam_0_10lpa_v2 | 7,029 | 0.733 | 0.845 | 0.683 | 0.582 | benchmark only
acquirable_tam_0_10lpa_v2 | 7,029 | 0.652 | 0.828 | 0.804 | 0.512 | benchmark only
priority_score_v2_0_100 | 7,029 | 0.613 | 0.778 | 0.978 | 0.428 | benchmark only
gross_tam_0_10lpa_v3 | 7,029 | 0.764 | 0.842 | 0.499 | 0.616 | benchmark only
serviceable_tam_0_10lpa_v3 | 7,029 | 0.737 | 0.848 | 0.607 | 0.587 | benchmark only
acquirable_tam_0_10lpa_v3 | 7,029 | 0.666 | 0.834 | 0.749 | 0.519 | benchmark only
priority_score_v3_0_100 | 7,029 | 0.619 | 0.783 | 0.978 | 0.448 | benchmark only
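This section does not define the WMAPE and Top-10 overlap columns. The sketch below makes two assumptions. First, WMAPE is sum |vendor - candidate| / sum |vendor| on the raw scale. Second, Top-10 overlap is the share of the vendor's top-10% cells that also fall in the candidate's top-10%. Non-round values such as 0.555 rule out a literal ten-cell set, because ten cells could only give multiples of 0.1. The function names are hypothetical.

```python
import math
from typing import Sequence


def wmape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Weighted MAPE: total absolute error divided by total absolute actual."""
    denom = sum(abs(a) for a in actual)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / denom


def top_fraction_overlap(actual: Sequence[float], predicted: Sequence[float],
                         fraction: float = 0.10) -> float:
    """Share of the actual top-`fraction` cells also in the predicted top set."""
    n = len(actual)
    k = math.ceil(n * fraction)  # assumption: 7,029 cells -> 703
    top_actual = set(sorted(range(n), key=lambda i: actual[i], reverse=True)[:k])
    top_pred = set(sorted(range(n), key=lambda i: predicted[i], reverse=True)[:k])
    return len(top_actual & top_pred) / k
```

Because WMAPE divides by total vendor TAM, it is scale-sensitive. Rank-based columns such as Spearman r and top-set overlap are not.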

G1 post-hoc holdout

candidate | kind | n | G1 hits | Spearman r | Top-10 G1 capture | validity
geoiq_vendor_tam_benchmark | benchmark_label | 7,029 | 176,581 | 0.709 | 42.4% | benchmark_label_posthoc_holdout
IX. Notebook metrics (pp. 35-38)

Notebook & Post-Hoc Metrics

The notebook-facing summaries were updated to use the predicted power TAM column. They are useful for reporting, but remain post-hoc diagnostics because they join to vendor TAM and G1 after score generation.

Metric-suite wiring: predicted power (score column), vendor TAM (benchmark label), and G1 hits (post-hoc outcome) -> summary, city-wise, top-k, decile, residual, and aggregate artifacts
reporting path

Same score, multiple views.

The short summary, city-wise table, and post-hoc suite all use predicted_tam_0_10lpa_power.

These outputs describe agreement and ranking behavior; they do not feed back into feature construction or transform scaling.

Notebook short metric summary

order | metric | n | Pearson r | Pearson R2 | Spearman R2 | caught-up pct
1 | predicted_tam_vs_vendor_tam | 7,029 | 0.790 | 0.623 | 0.708 | n/a
2 | predicted_tam_vs_g1_hits | 7,029 | 0.435 | 0.189 | 0.552 | n/a
3 | vendor_tam_vs_g1_hits | 7,029 | 0.406 | 0.165 | 0.502 | n/a
4 | metric_2_divided_by_metric_3_caught_up | 7,029 | 1.072 | 1.149 | 1.100 | 114.9%
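As its metric name states, row 4 divides row 2 (m_2) by row 3 (m_3) column by column. Here is that division recomputed from the rounded values for Pearson r, Pearson R2, and Spearman R2:

```latex
\text{caught-up} = \frac{m_2}{m_3}:\qquad
\frac{0.435}{0.406} \approx 1.071,\qquad
\frac{0.189}{0.165} \approx 1.145,\qquad
\frac{0.552}{0.502} \approx 1.100
```

The Spearman R2 ratio reproduces the reported 1.100. The Pearson ratios land slightly below the reported 1.072 and 1.149, presumably because the table divides unrounded values. The 114.9% in the last column is the Pearson R2 ratio expressed as a percentage.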

Post-hoc G1 overall suite

metric | n | Pearson r | Pearson R2 | Spearman r | Spearman R2 | log1p R2
predicted_tam_vs_vendor_tam | 7,029 | 0.790 | 0.623 | 0.842 | 0.708 | 0.566
predicted_tam_vs_g1_hits | 7,029 | 0.435 | 0.189 | 0.743 | 0.552 | 0.492
vendor_tam_vs_g1_hits | 7,029 | 0.406 | 0.165 | 0.709 | 0.502 | 0.376
metric_2_divided_by_metric_3_caught_up | 7,029 | 1.072 | 1.149 | 1.049 | 1.100 | 1.307

Post-hoc top-k suite

score | top fraction | grid count | G1 capture | lift | NDCG
predicted_tam | 5.0% | 351 | 23.6% | 4.73 | 0.308
predicted_tam | 10.0% | 703 | 46.4% | 4.64 | 0.455
predicted_tam | 20.0% | 1,406 | 70.1% | 3.51 | 0.575
vendor_tam | 5.0% | 351 | 24.3% | 4.86 | 0.314
vendor_tam | 10.0% | 703 | 42.4% | 4.24 | 0.421
vendor_tam | 20.0% | 1,406 | 67.5% | 3.38 | 0.554
Caught-up ratios can exceed 100% because they are ratios of post-hoc correlations or R-squares, not calibrated accuracy. Treat them as comparison diagnostics, not proof that the solution has matched or surpassed vendor TAM.
X. Leakage policy (pp. 39-42)

Leakage Policy

The strongest part of the current pipeline is the explicit claim boundary around forbidden labels and invalidated metrics.

flag | value
vendor_tam_used_as_feature | False
vendor_tam_used_as_benchmark_label | True
vendor_tam_used_as_training_label | False
vendor_tam_trained_diagnostics_excluded | True
g1_used_for_training | False
g1_used_as_feature | False
g1_used_for_source_selection | False
random_cv_allowed_for_accuracy_claim | False
production_accuracy_claim_allowed | False

check | value
spatial_holdout_metrics_present | False
city_holdout_metrics_present | False
g1_holdout_diagnostics_present | True
redundancy_review_required | True
multicollinearity_review_required | True
city_confounding_review_required | False
prediction_calibration_review_required | True
production_accuracy_claim_allowed | False
interpretation

Vendor TAM appears as a benchmark label, not as a feature or training label. G1 appears only after the fact. Random CV is not allowed for production accuracy claims.
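The flags above can be read as a conjunctive gate. In the sketch below, a production accuracy claim requires every leakage flag to be clean and every holdout family to be present. The flag names come from the tables, except `chronological_holdout_metrics_present`. That flag is hypothetical, because the current artifacts record no chronological check. The gate function is also illustrative, not the pipeline's code.

```python
from typing import Dict, List, Tuple

# Leakage flags that must be False before any accuracy claim.
MUST_BE_FALSE = [
    "vendor_tam_used_as_feature",
    "vendor_tam_used_as_training_label",
    "g1_used_for_training",
    "g1_used_as_feature",
    "g1_used_for_source_selection",
]
# Holdout families that must be present (True) before any accuracy claim.
MUST_BE_TRUE = [
    "spatial_holdout_metrics_present",
    "city_holdout_metrics_present",
    "chronological_holdout_metrics_present",  # hypothetical flag
]


def production_claim_allowed(flags: Dict[str, bool]) -> Tuple[bool, List[str]]:
    """Return (allowed, blockers); a missing flag always counts as a blocker."""
    blockers = [f for f in MUST_BE_FALSE if flags.get(f, True)]
    blockers += [f for f in MUST_BE_TRUE if not flags.get(f, False)]
    return (not blockers, blockers)
```

With the current flag values, the gate returns False, and the three holdout flags are listed as blockers.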

XI. Generated outputs (pp. 43-48)

Generated Outputs

The updated HTML is an audit reader over the files the current pipeline writes, grouped by the stage that owns them.

Generated outputs by stage

stage | output | path | status | current meaning
get_data | source_fetch_manifest_json | outputs/source_fetch/source_fetch_manifest.json | present (json, 78.8 KB) | Source registry fetch status, local file hashes, access notes, and next actions.
get_data | source_fetch_manifest_csv | outputs/source_fetch/source_fetch_manifest.csv | present (49 rows) | Tabular source-fetch ledger for review.
get_data | direct_prior_art_download_manifest_json | outputs/source_fetch/direct_prior_art_download_manifest.json | present (json, 26.8 KB) | Direct prior-art payload plan and availability record.
get_data | direct_prior_art_download_manifest_csv | outputs/source_fetch/direct_prior_art_download_manifest.csv | present (26 rows) | Tabular direct-prior-art payload ledger.
get_data | microsoft_buildings_aoi_manifest_json | outputs/source_fetch/microsoft_buildings_aoi_manifest.json | present (json, 1.1 KB) | AOI shard selection and Microsoft building-footprint staging status.
get_data | microsoft_buildings_aoi_manifest_csv | outputs/source_fetch/microsoft_buildings_aoi_manifest.csv | present (24 rows) | Tabular Microsoft AOI shard/staging record.
get_data | ghsl_builtup_tiles_manifest_json | outputs/source_fetch/ghsl_builtup_2020_4326_30ss_tiles_manifest.json | present (json, 19.0 KB) | GHSL 2020 built surface/volume AOI tile staging manifest.
get_data | ghsl_builtup_tiles_manifest_csv | outputs/source_fetch/ghsl_builtup_2020_4326_30ss_tiles_manifest.csv | present (24 rows) | Tabular GHSL built surface/volume tile ledger.
get_data | income_gate_source_fetch_manifest_json | outputs/source_fetch/income_gate/income_gate_source_fetch_manifest.json | present (json, 30.0 KB) | Income-gate public-source fetch manifest and access ledger.
get_data | income_gate_source_fetch_manifest_csv | outputs/source_fetch/income_gate/income_gate_source_fetch_manifest.csv | present (19 rows) | Tabular income-gate public-source fetch ledger.
source_layer | source_layer_features_manifest | outputs/source_layers/source_layer_cell_features_manifest.json | present (json, 8.8 KB) | Manifest for source-layer cell-feature materialization.
source_layer | source_layer_contracts_json | outputs/source_layers/source_layer_contracts.json | present (json, 17.2 KB) | Source-layer contracts, expected fields, source families, and promotion gates.
source_layer | source_layer_contracts_csv | outputs/source_layers/source_layer_contracts.csv | present (9 rows) | Tabular source-layer contract ledger.
source_layer | worldpop_source_layer_features | outputs/source_layers/worldpop_population_density/2020_ascii_xyz_local/cell_features.csv | present (7,029 rows) | WorldPop population-density source-layer cell features.
source_layer | census_controls_source_layer_features | outputs/source_layers/census_2011_official_controls/2011_fixed_release/cell_features.csv | present (7,029 rows) | Census 2011 official-control source-layer cell features.
source_layer | building_source_layer_features | outputs/source_layers/building_footprints_google_microsoft/google_2023_msft_staged_snapshot/cell_features.csv | present (7,029 rows) | Google/Microsoft building footprint, height, volume, and disagreement source-layer features.
source_layer | ghsl_source_layer_features | outputs/source_layers/ghsl_population_builtup/named_release_required/cell_features.csv | present (7,029 rows) | GHSL population, built surface, built volume, height, settlement, and residential-candidate source-layer features.
source_layer | landcover_source_layer_features | outputs/source_layers/esa_worldcover_dynamic_world/worldcover_2020_2021_dynamic_world_pinned_window_required/cell_features.csv | present (7,029 rows) | ESA/Dynamic World landcover proxy, physical exclusion, and residential eligibility source-layer features.
source_layer | morphology_source_layer_features | outputs/source_layers/residential_morphology_tags/building_landcover_public_proxy_v1/cell_features.csv | present (7,029 rows) | Residential morphology proxy source-layer features.
source_layer | income_public_source_layer_features | outputs/source_layers/income_public_context/income_gate_public_context_v2/cell_features.csv | present (7,029 rows) | Income-gate public welfare and affluence context source-layer features.
source_layer | osm_overture_source_layer_features | outputs/source_layers/osm_overture_ohsome_roads_pois/dated_extract_required/cell_features.csv | present (7,029 rows) | Road, addressability, settlement, mapping coverage, and conversion source-layer features.
source_layer | connectivity_source_layer_features | outputs/source_layers/connectivity_execution_readiness/dated_opencellid_ookla_mlab_required/cell_features.csv | present (7,029 rows) | Connectivity and execution-readiness source-layer features.
denominator | cell_denominator_foundation | outputs/denominator_foundation/cell_denominator_foundation.csv | present (7,029 rows) | Current cell-level denominator controls.
denominator | city_summary | outputs/denominator_foundation/city_denominator_foundation_summary.csv | present (34 rows) | City-level rollup for the stage that wrote it.
denominator | manifest | outputs/denominator_foundation/denominator_foundation_manifest.json | present (json, 6.9 KB) | Manifest for the denominator-foundation stage.
geohg | cell_features | outputs/geohg_features/cell_features_geohg_style.csv | present (7,029 rows) | Primary independent feature table used by downstream diagnostics.
geohg | cell_labels | outputs/geohg_features/cell_labels_vendor_tam.csv | present (7,029 rows) | Vendor TAM benchmark labels only; not a production training target.
geohg | area_area_edges | outputs/geohg_features/area_area_edges.csv | present (24,203 rows) | GeoHG-style area graph edges.
geohg | entity_area_edges | outputs/geohg_features/entity_area_edges.csv | present (21,381 rows) | Semantic entity-to-area edges.
geohg | poi_area_edges | outputs/geohg_features/poi_area_edges.csv | present (5,668 rows) | POI/serviceability context edges.
geohg | spatial_block_predictions | outputs/geohg_features/geohg_spatial_block_predictions.csv | present (7,029 rows) | Diagnostic spatial-block prediction artifact.
geohg | metrics | outputs/geohg_features/geohg_feature_metrics.json | present (json, 1.4 KB) | Feature bundle metrics and graph counts.
gap_closure | features | outputs/tam_gap_closure/tam_gap_closure_features.csv | present (7,029 rows) | Deterministic TAM gap-closure probe features.
gap_closure | city_summary | outputs/tam_gap_closure/tam_gap_closure_city_summary.csv | present (34 rows) | City-level rollup for the stage that wrote it.
gap_closure | benchmark_by_city | outputs/tam_gap_closure/tam_gap_closure_benchmark_by_city.csv | present (374 rows) | City-level probe-vs-vendor benchmark breakdown.
gap_closure | benchmark_metrics | outputs/tam_gap_closure/tam_gap_closure_benchmark_metrics.json | present (json, 2.8 KB) | Overall probe-vs-vendor benchmark metrics.
gap_closure | source_probe_csv | outputs/source_registry/source_probe_summary.csv | present (21 rows) | Source-probe readiness ledger used for production gating.
gap_closure | source_probe_json | outputs/source_registry/source_probe_summary.json | present (json, 15.5 KB) | JSON copy of source-probe readiness ledger.
transform | cell2_decision | outputs/cell2_metric_transform_experiments/cell2_best_mechanism_decision.json | present (json, 1.5 KB) | Selected transform policy and rejected alternatives.
transform | cell2_analysis_html | outputs/cell2_metric_transform_experiments/cell2_best_mechanism_analysis.html | present (html, 8.4 KB) | Standalone Cell-2 transform analysis report.
transform | cell2_holdout_summary | outputs/cell2_metric_transform_experiments/cell2_best_mechanism_holdout_summary.csv | present (12 rows) | City and spatial-block transform holdout summary.
transform | cell2_holdouts | outputs/cell2_metric_transform_experiments/cell2_best_mechanism_holdouts.csv | present (240 rows) | Detailed transform holdout rows.
transform | cell2_topk | outputs/cell2_metric_transform_experiments/cell2_best_mechanism_topk.csv | present (6 rows) | Top-k G1 capture comparison for predicted power TAM and vendor TAM.
transform | cell2_all_grid_ranking | outputs/cell2_metric_transform_experiments/cell2_best_mechanism_all_grid_ranking.csv | present (6 rows) | All-grid ranking evidence for transform diagnostics.
transform | main_solution_power_summary_json | outputs/main_solution_power_tam_summary.json | present (json, 373 B) | Compact headline checks for the power-transformed solution.
transform | main_solution_power_summary_csv | outputs/main_solution_power_tam_summary.csv | present (5 rows) | CSV copy of the headline power-solution checks.
full_india | full_india_scores_csv | outputs/full_india_scored/full_india_tam_scores.csv | present (2,905,288 rows) | Every full-India 0.01-degree cell scored with deterministic source-derived probes.
full_india | full_india_score_index | outputs/full_india_scored/full_india_tam_score_index.json | present (json, 43.5 MB) | Compact row-major score index for map hover and browser payload efficiency.
full_india | full_india_score_manifest | outputs/full_india_scored/full_india_tam_score_manifest.json | present (json, 11.7 KB) | Full-India scoring manifest, leakage policy, transform policy, and distribution metrics.
map | full_india_grid_manifest | outputs/tam_map/tam_full_india_0_01_grid_manifest.json | present (json, 1.2 KB) | Geometry-only full-India 0.01-degree grid manifest.
map | full_india_grid_index | outputs/tam_map/tam_full_india_0_01_grid_index.json | present (json, 111.0 KB) | Compact geometry index for all India grid cells.
map | full_india_map_manifest | outputs/tam_map/tam_grid_map_manifest.json | present (json, 1.2 KB) | Leaflet map manifest for full-India geometry and score layer wiring.
map | vendor_map_manifest | outputs/tam_map/tam_grid_map_manifest_vendor.json | present (json, 7.5 KB) | Leaflet map manifest for vendor-grid benchmark inspection.
map | vendor_map_geojson | outputs/tam_map/tam_grid_cells_vendor.geojson | present (geojson, 21.5 MB) | Vendor-grid GeoJSON with benchmark and predicted-layer fields.
map | map_html | docs/tam_grid_output_map.html | present (html, 65.0 KB) | Interactive Leaflet map document.
diagnostics | summary | outputs/statistical_diagnostics/statistical_diagnostics_summary.json | present (json, 9.0 KB) | Stage summary, pass/fail flags, leakage policy, and interpretation notes.
diagnostics | feature_profile | outputs/statistical_diagnostics/feature_profile.csv | present (223 rows) | Feature distribution profile for diagnostics.
diagnostics | univariate_feature_signal | outputs/statistical_diagnostics/univariate_feature_signal.csv | present (223 rows) | Single-feature signal scan against benchmark/probe targets.
diagnostics | city_distribution_shift | outputs/statistical_diagnostics/city_distribution_shift.csv | present (224 rows) | City distribution shift checks.
diagnostics | feature_redundancy_pairs | outputs/statistical_diagnostics/feature_redundancy_pairs.csv | present (160 rows) | Highly related feature-pair review list.
diagnostics | feature_vif | outputs/statistical_diagnostics/feature_vif.csv | present (30 rows) | Multicollinearity review table.
diagnostics | spatial_permutation_importance | outputs/statistical_diagnostics/spatial_permutation_importance.csv | present (1 row) | Spatial permutation diagnostic output.
diagnostics | feature_action_plan | outputs/statistical_diagnostics/feature_action_plan.csv | present (223 rows) | Feature diagnostic next-action list.
diagnostics | model_metric_comparison | outputs/statistical_diagnostics/model_metric_comparison.csv | present (17 rows) | Model metric table retained as diagnostic only.
diagnostics | prediction_metric_comparison | outputs/statistical_diagnostics/prediction_metric_comparison.csv | present (1 row) | Prediction/probe metric comparison table.
diagnostics | prediction_residuals_by_city | outputs/statistical_diagnostics/prediction_residuals_by_city.csv | present (0 rows) | City residual review table.
diagnostics | prediction_decile_calibration | outputs/statistical_diagnostics/prediction_decile_calibration.csv | present (0 rows) | Decile calibration diagnostic table.
diagnostics | prediction_city_bootstrap_ci | outputs/statistical_diagnostics/prediction_city_bootstrap_ci.csv | present (0 rows) | City bootstrap confidence interval table.
diagnostics | g1_holdout_candidate_diagnostics | outputs/statistical_diagnostics/g1_holdout_candidate_diagnostics.csv | present (2 rows) | Post-hoc G1 diagnostic table; not used for tuning.
notebook | notebook_short_metric_summary_csv | outputs/notebook_short_metric_summary.csv | present (4 rows) | Notebook-facing four-row metric summary using predicted power TAM.
notebook | notebook_short_metric_summary_json | outputs/notebook_short_metric_summary.json | present (json, 1.2 KB) | JSON copy of the notebook-facing metric summary.
notebook | notebook_short_metric_summary_citywise_csv | outputs/notebook_short_metric_summary_citywise.csv | present (136 rows) | City-wise metric summary for notebook review.
notebook | notebook_short_metric_summary_citywise_json | outputs/notebook_short_metric_summary_citywise.json | present (json, 46.4 KB) | JSON copy of the city-wise metric summary.
notebook | posthoc_g1_summary | outputs/posthoc_g1_metric_suite/summary.json | present (json, 28.8 KB) | Structured post-hoc G1 metric suite summary.
notebook | posthoc_g1_overall | outputs/posthoc_g1_metric_suite/overall.csv | present (4 rows) | Overall post-hoc G1 correlations.
notebook | posthoc_g1_topk | outputs/posthoc_g1_metric_suite/topk.csv | present (6 rows) | Top-k post-hoc G1 capture table.
notebook | posthoc_g1_decile | outputs/posthoc_g1_metric_suite/decile.csv | present (20 rows) | Decile post-hoc G1 calibration table.
notebook | posthoc_g1_city_aggregate | outputs/posthoc_g1_metric_suite/city_aggregate.csv | present (2 rows) | City-aggregate post-hoc G1 summary.

Source probe readiness

probe ledger

21 source-probe rows are tracked; 18 currently write candidate features and 0 are production-ready.

gap | source | promotion | coverage | feature written | production ready
1_household_denominator | census_pca_district_context_2011 | model_candidate_context | 99.2% | True | False
1_household_denominator | worldpop_ascii_xyz_population_density_2020 | model_candidate_probe | 100.0% | True | False
1_household_denominator | ghsl_population_builtup | model_candidate_physical_probe | 100.0% | True | False
1_household_denominator | reconciled_population_household_probe_v2 | model_candidate_probe | 100.0% | True | False
1_household_denominator | prs_eth_popcorn | model_candidate_probe_pending_grid_mapping | 24.0% | False | False
2_residential_built_form | google_open_buildings_gobi_2023 | model_candidate_context | 9.3% | True | False
2_residential_built_form | microsoft_global_buildings | model_candidate_context | 91.6% | True | False
2_residential_built_form | combined_building_residential_density | model_candidate_context | 95.8% | True | False
2_residential_built_form | building_height_vertical_density_proxy | model_candidate_probe | 100.0% | True | False
2_residential_built_form | esa_worldcover_dynamic_world | model_candidate_context | 100.0% | True | False
2_residential_built_form | residential_morphology_tags | proxy_candidate_pending_visual_qa | 100.0% | True | False
3_income_affordability | nightlights_district_panel | model_candidate_context | 99.2% | True | False
3_income_affordability | pigshell_hlpca_housing_amenity_assets_2011 | model_candidate_context | 99.2% | True | False
3_income_affordability | income_public_context | proxy_candidate_public_context | 100.0% | True | False
3_income_affordability | income_gate_public_sources | model_candidate_income_gate_probe | 100.0% | True | False
3_income_affordability | mospi_hces_2023_24_public_report | blocked_tables_not_extracted | 0.0% | False | False
4_serviceability | yashveer_police_education_pois | model_candidate_context | 100.0% | True | False
4_serviceability | pmgsy_rural_roads_hr_up | partial_model_candidate_with_missing_flag | 59.6% | True | False
4_serviceability | osm_overture_ohsome_roads_pois | model_candidate_context | 100.0% | True | False
4_serviceability | opencellid_ookla_mlab_execution_readiness | model_candidate_context | 100.0% | True | False
5_calibration | independent_component_calibration | probe_ready_not_production_calibrated | 100.0% | False | False
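The 21 / 18 / 0 summary above follows directly from the ledger's last two columns. The sketch below tallies them with the standard csv module. It assumes the source_probe_summary.csv header spells those columns `feature_written` and `production_ready`. That spelling is an assumption, as is the helper name.

```python
import csv
import io
from typing import Dict


def tally_probes(csv_text: str) -> Dict[str, int]:
    """Count ledger rows, rows that write features, and production-ready rows."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))

    def is_true(value: str) -> bool:
        return value.strip().lower() == "true"

    return {
        "rows": len(rows),
        "feature_written": sum(is_true(r["feature_written"]) for r in rows),
        "production_ready": sum(is_true(r["production_ready"]) for r in rows),
    }
```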

Code map

source file | lines | role
scripts/pipeline.py | 148 | canonical
scripts/get_data.py | 50 | stage/wrapper
scripts/enrich_features.py | 53 | stage/wrapper
scripts/prediction_diagnostics.py | 72 | stage/wrapper
scripts/build_geohg_features.py | 77 | stage/wrapper
scripts/build_source_layer_cell_features.py | 28 | stage/wrapper
scripts/build_source_layer_contracts.py | 25 | stage/wrapper
scripts/build_denominator_foundation.py | 28 | stage/wrapper
scripts/build_tam_gap_closure_features.py | 28 | stage/wrapper
scripts/build_full_india_tam_scores.py | 687 | stage/wrapper
scripts/build_tam_grid_map_data.py | 48 | stage/wrapper
scripts/build_notebook_short_metric_summary.py | 195 | stage/wrapper
src/tam_pipeline/pipeline.py | 50 | canonical
src/tam_pipeline/stages/get_data.py | 98 | stage/wrapper
src/tam_pipeline/stages/enrich_features.py | 203 | stage/wrapper
src/tam_pipeline/stages/prediction_diagnostics.py | 136 | stage/wrapper
src/tam_pipeline/stages/model_training.py | 14 | stage/wrapper
src/tam_pipeline/stages/common.py | 66 | stage/wrapper
src/tam_geohg/graph_features.py | 27 | stage/wrapper
src/tam_geohg/predicted_tam.py | 82 | stage/wrapper
src/tam_geohg/map_inputs.py | 333 | stage/wrapper
src/tam_geohg/map_metrics.py | 136 | stage/wrapper
src/tam_geohg/map_export.py | 41 | stage/wrapper
src/tam_geohg/map_manifest.py | 48 | stage/wrapper
src/tam_geohg/map_geojson.py | 48 | stage/wrapper
src/tam_geohg/full_india_grid_index.py | 157 | stage/wrapper
src/tam_pipeline/payloads/scripts/build_source_layer_cell_features_payload.py | 1,422 | stage/wrapper
src/tam_pipeline/payloads/scripts/build_source_layer_contracts_payload.py | 456 | stage/wrapper
src/tam_pipeline/payloads/scripts/build_denominator_foundation_payload.py | 703 | stage/wrapper
src/tam_pipeline/payloads/scripts/build_tam_gap_closure_features_payload.py | 1,714 | stage/wrapper
src/tam_pipeline/payloads/tam_notebook_support/tam_notebook_support_payload.py | 2,983 | stage/wrapper

Recent pipeline logs

log | path | bytes
20260602_183522_build_geohg_features.log | outputs/pipeline_logs/20260602_183522_build_geohg_features.log | 0
20260602_171809_build_tam_gap-closure_features.log | outputs/pipeline_logs/20260602_171809_build_tam_gap-closure_features.log | 26,120
20260602_171808_build_denominator_foundation.log | outputs/pipeline_logs/20260602_171808_build_denominator_foundation.log | 7,093
20260602_171806_build_source-layer_cell_features.log | outputs/pipeline_logs/20260602_171806_build_source-layer_cell_features.log | 2,281
20260602_170830_build_geohg_features.log | outputs/pipeline_logs/20260602_170830_build_geohg_features.log | 66,569
20260601_201950_build_tam_gap-closure_features.log | outputs/pipeline_logs/20260601_201950_build_tam_gap-closure_features.log | 24,341
20260601_201950_build_source-layer_cell_features.log | outputs/pipeline_logs/20260601_201950_build_source-layer_cell_features.log | 1,803
20260601_201949_build_denominator_foundation.log | outputs/pipeline_logs/20260601_201949_build_denominator_foundation.log | 4,443
XII. Training boundary (pp. 49-51)

Training Boundary

A real training workflow needs an approved prediction surface and defensible holdouts. Current artifacts do not establish production accuracy.

approve sources
Promote source probes only after blockers close and production-ready flags pass.
rerun gate
python3 scripts/pipeline.py prediction_diagnostics --root .
require holds
Add spatial, city, and chronological validation.
then claim
Discuss accuracy only after nonzero valid joins and holdout metrics.
Until that sequence is complete, the right public wording is: feature/probe pipeline and diagnostics are available; production model accuracy is not established.
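Spatial holdouts block neighbourhood memorization by keeping nearby cells in the same fold. The sketch below assigns spatial-block folds. It assumes square blocks of a hypothetical 0.5-degree size, assigned round-robin to k folds in sorted block order. The block size and fold rule are illustrative, not the pipeline's settings. A city holdout would follow the same pattern, with the city name as the group key instead of the block.

```python
import math
from typing import Dict, List, Tuple


def block_id(lat: float, lon: float, block_deg: float = 0.5) -> Tuple[int, int]:
    """Square spatial block containing the point (block size is an assumption)."""
    return (math.floor(lat / block_deg), math.floor(lon / block_deg))


def assign_folds(cells: List[Tuple[float, float]], k: int = 5,
                 block_deg: float = 0.5) -> List[int]:
    """Assign whole blocks to folds so neighbouring cells never straddle train and test."""
    blocks = sorted({block_id(lat, lon, block_deg) for lat, lon in cells})
    fold_of_block: Dict[Tuple[int, int], int] = {b: i % k for i, b in enumerate(blocks)}
    return [fold_of_block[block_id(lat, lon, block_deg)] for lat, lon in cells]
```

Usage: `assign_folds([(28.61, 77.21), (28.62, 77.22), (19.07, 72.88), (12.97, 77.59)], k=2)` places the two adjacent Delhi-area points in the same fold.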
XIII. Glossary (pp. 52-53)

Glossary Printout

The permanent glossary mirrors the floating Lingo panel for print and review.

term | meaning
Vendor TAM | Benchmark label used for comparison only; not a training target or feature.
G1 holdout | Post-hoc business outcome diagnostic; not used for training, source selection, or tuning.
Probe TAM | Formula-driven component output for audit, not production-calibrated TAM.
Source layer | Versioned source-family artifact that writes grid-level candidate fields before denominator and gap-closure scoring.
Denominator v3 | Current leakage-safe residential-household denominator probe with base/lower/upper fields and reconciliation status.
Income gate | Public-source 0-10 LPA probability proxy using district/city/cell welfare and affluence context; MPCE calibration remains pending.
Conversion feasibility | Public road, POI, addressability, settlement, and map-coverage proxy for whether a cell can be served.
Execution readiness | Weak connectivity and operational-readiness proxy for acquirable TAM; still blocked on license, coverage, and internal ops data.
Predicted TAM power | Current score layer: a monotone power transform of no-vendor gross TAM that preserves the base total.
Power gamma | The fixed exponent applied to the source-derived gross TAM base; current value comes from the transform decision artifact.
Rank ceiling | A diagnostic order-only transform used to understand upper-bound ranking behavior, not TAM magnitude.
Full-India scored grid | The 0.01-degree national grid scored by deterministic source-derived formulas without vendor labels.
Score index | Compact browser-oriented row-major JSON that stores map scores without repeating every property per cell.
Spatial holdout | Validation split that blocks neighborhood memorization.
City holdout | Whole-city transfer test for city confounding.
Notebook metric summary | Four-row post-hoc table comparing predicted TAM, vendor TAM, and G1 for notebook reporting.
Production claim | Blocked until valid non-GeoIQ predictions pass spatial, city, and chronological checks.