Technical status document

TAM Prediction Pipeline Status

Important things first: the current score, notebook metrics, signal-family status, and claim boundary are documented before the implementation substories. Vendor TAM and G1 remain diagnostic benchmarks, not training labels.

Primary entry: scripts/pipeline.py

Stage API: 3 stage names

HTML builder: build_prediction_pipeline_html.py

Claim state: production accuracy blocked

I. Technical Status

The page begins with the actual status: current score, current post-hoc metrics, signal-family readiness, and the production-claim boundary.

Map I: the boundary before any metric.

Story role: Separate implemented pipeline work from invalid production claims.

Carry forward: Every later metric is benchmark-only unless a valid non-GeoIQ holdout exists.

No production training claim is allowed from the current artifacts. Spatial holdout metrics are False, city holdout metrics are False, and production_accuracy_claim_allowed is False.
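The boundary above can be read as a gate that fails closed. The sketch below is a minimal illustration, not the pipeline's code: the keys `spatial_holdout_metrics` and `city_holdout_metrics` are hypothetical names for the two holdout flags, and the rule that both must be present is an assumption. Only `production_accuracy_claim_allowed` is a name taken from the current artifacts.

```python
def production_claim_allowed(status: dict) -> bool:
    """Return True only when both holdout metric sets exist (assumed rule)."""
    # Hypothetical keys; a missing key counts as False so the gate fails closed.
    spatial_ok = bool(status.get("spatial_holdout_metrics", False))
    city_ok = bool(status.get("city_holdout_metrics", False))
    return spatial_ok and city_ok


# Values mirror the current artifacts: no spatial and no city holdout metrics.
current = {"spatial_holdout_metrics": False, "city_holdout_metrics": False}
current["production_accuracy_claim_allowed"] = production_claim_allowed(current)
print(current["production_accuracy_claim_allowed"])  # False
```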

Current notebook and post-hoc metrics

current metrics

These values are read from outputs/notebook_short_metric_summary.json and outputs/posthoc_g1_metric_suite/summary.json. They describe benchmark agreement after the score is generated; they do not authorize a production accuracy claim.

metric	value	interpretation
Predicted vs vendor R²	0.623	Current notebook metric from predicted power TAM vs vendor TAM; benchmark-grid diagnostic only.
Predicted vs G1 R²	0.189	Current notebook metric against post-hoc G1 hits; not used for training or tuning.
Predicted vs G1 rank R²	0.552	Ranking diagnostic from the notebook-facing metric summary.
Caught-up ratio	114.9%	Ratio diagnostic comparing current predicted-vs-G1 to vendor-vs-G1, not calibrated accuracy.
Top 10% G1 capture	46.4%	Current post-hoc top-k capture for the predicted score layer.
Metric claim state	post-hoc only	Metrics join benchmark labels after scoring; production accuracy remains blocked.
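For readers unfamiliar with top-k capture, a minimal sketch follows. It assumes the metric is the share of all G1 hits that fall in the top 10% of cells ranked by predicted score. That definition, the function name, and the toy numbers are assumptions for illustration; they are not read from the metric suite.

```python
def top_k_capture(scores, hits, fraction=0.10):
    """Share of total hits located in the top `fraction` of cells by score."""
    if len(scores) != len(hits):
        raise ValueError("scores and hits must have the same length")
    total_hits = sum(hits)
    if total_hits == 0:
        return 0.0
    # At least one cell is always selected, even for tiny inputs.
    k = max(1, int(len(scores) * fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(hits[i] for i in ranked[:k]) / total_hits


# Toy data: 10 cells, so the top 10% is the single highest-scoring cell.
scores = [9.0, 1.0, 3.0, 2.0, 8.0, 4.0, 0.5, 6.0, 7.0, 5.0]
hits = [4, 0, 1, 0, 2, 1, 0, 1, 1, 0]
print(top_k_capture(scores, hits))  # 4 of 10 hits sit in the top cell -> 0.4
```

Because the labels are joined only after scoring, a computation of this kind stays a post-hoc diagnostic and never feeds back into the score.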

Signal-family status matrix

classified signal stack

The pipeline is no longer documented as one undifferentiated feature pile. Each family below ties current layers and exported column groups to prior art and the blocker that must close before production promotion.

#	signal family	state	layers	groups	prior art	gate
1	Household denominator and residential base	implemented probe	`worldpop_population_density: ok`; `census_2011_official_controls: ok`; `ghsl_population_builtup: ok`; `building_footprints_google_microsoft: ok`	`Population denominator (5)`; `GHSL built-up/population (10)`; `Denominator v3 and reconciliation (33)`; `Building footprints (32)`	`population_household_denominator: WorldPop population, census/SHRUG reconciliation, built-up occupancy, household density`; `buildings_and_settlement_structure: building_count, built_area_share, building_density, occupancy_proxy`	Production household truth still needs public-anchor reconciliation, residential masks, source QA, and spatial/city holdouts.
2	Residential eligibility and physical exclusions	implemented proxy	`esa_worldcover_dynamic_world: proxy_from_local_building_poi_density_until_worldcover_dynamic_world`; `residential_morphology_tags: proxy_from_building_landcover_density_until_morphology_qa`; `building_footprints_google_microsoft: ok`	`Landcover and physical exclusions (20)`; `Morphology proxies (6)`; `Land use and slum context (3)`	`land_use_exclusions_and_risk: water_share, forest_share, mining_share, industrial_land_share`; `buildings_and_settlement_structure: building_count, built_area_share, building_density, occupancy_proxy`	Proxy landcover/morphology remains production-blocked until pinned rasters, visual QA, and morphology holdouts pass.
3	Income and welfare gate	implemented proxy	`income_public_context: proxy_from_income_gate_public_sources_until_calibration`	`Income gate and welfare context (40)`; `Census housing/amenity assets (23)`; `Nightlights (11)`	`satellite_welfare_affluence: roof/material proxy, lighting proxy, drinking-water proxy, Landsat/Sentinel embeddings`; `nightlights_and_economic_activity: VIIRS mean, VIIRS trend, nightlight blob score, commercial activity proxy`	MPCE calibration, license review, geography QA, and income-source ablations are still required.
4	Conversion and serviceability	implemented proxy	`osm_overture_ohsome_roads_pois: proxy_from_pmgsy_poi_buildings_until_dated_osm_overture_ohsome`	`Conversion and map coverage (18)`; `Road/serviceability (4)`; `POI/serviceability (8)`	`roads_accessibility_serviceability: road_length_by_class, distance_to_major_road, road_embedding, travel_friction`; `poi_urban_function: POI counts by category, Hex2Vec/ContextualCount embeddings, schools/healthcare/markets, urban function vector`	Dated OSM/Overture extracts, ohsome coverage, internal serviceability, and failed-install checks are missing.
5	Execution and acquirability	weak proxy	`connectivity_execution_readiness: proxy_from_public_serviceability_until_opencellid_ookla_mlab`	`Execution readiness (7)`	`internal_business_reality: leads, installs, retained installs, gross margin`	External readiness cannot replace branch, partner, capacity, payment, CAC, and operations coverage.
6	Spatial context without target leakage	implemented diagnostic x-surface		`GeoHG graph context (107)`; `Spatial/city context (10)`	`heterogeneous_graph_and_spatial_context: neighbor context, semantic similarity, land-cover hypernodes, POI hypernodes`	Graph context is allowed only from independent features; production metrics remain blocked without non-GeoIQ holdouts.
7	Outcome and notebook metrics	post-hoc only		`TAM score outputs (27)`; `Status and reason codes (1)`	`internal_business_reality: leads, installs, retained installs, gross margin`	Vendor TAM and G1 are benchmark labels after score generation; they are not training labels, features, or tuning signals.

Artifact scale summary

metric	value	note
Grid rows	7,029	Vendor-grid feature rows in the current generated feature table.
Cities	34	Cities represented by the current vendor-grid artifacts.
Feature columns	211	Full GeoHG-style feature count; vendor TAM training remains blocked.
Source-layer files	9	Current source-layer cell-feature artifacts materialized before denominator and gap closure.
Signal families	7	Top-level business signal families classified by current state, prior art, and validation blocker.
Prior-art families	9	Current signal-stack families from the prior-art classification artifact.
Candidate signal columns	18	Source-probe rows that currently write candidate feature columns.
Prior-art payloads	23/23	Required local payload readiness from the feature manifest.
Predicted score	`predicted_tam_0_10lpa_power`	Current map and notebook score layer; monotone no-vendor power transform.
Power gamma	0.60	Scale policy: global_no_vendor_base_total_preserved.
Full-India cells	2,905,288	Scored 0.01-degree India grid cells in the current full-India score manifest.
Full-India cities	631	District/city groups in the current full-India score manifest.
Predicted TAM total	118,559,058	Total predicted power TAM households; total is preserved from the no-vendor gross TAM base.
Generated output artifacts	78	Manifest-backed current output files listed later in this document.
Production-ready source probes	0	Current source-probe rows cleared for production use. This should remain zero until blockers close.
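The score description (monotone power transform, gamma 0.60, total preserved from the no-vendor base) suggests a transform of the following shape. This is a sketch under assumptions: the exact formula, the function name, and the toy base values are hypothetical. Only the gamma value and the total-preservation policy come from the summary above.

```python
import math


def power_transform_preserve_total(base, gamma=0.60):
    """Apply x**gamma to each non-negative value, then rescale to keep the sum."""
    if any(x < 0 for x in base):
        raise ValueError("base values must be non-negative")
    powered = [x ** gamma for x in base]
    powered_total = sum(powered)
    if powered_total == 0:
        return [0.0 for _ in base]
    # One global factor restores the original total and leaves ranking unchanged.
    scale = sum(base) / powered_total
    return [p * scale for p in powered]


base = [0.0, 10.0, 100.0, 1000.0]  # toy no-vendor base values
out = power_transform_preserve_total(base)
print(math.isclose(sum(out), sum(base)))  # total preserved
print(out == sorted(out))                 # rank order preserved
```

With gamma below 1, a transform of this kind narrows the gap between large and small cells while keeping both the rank order and the overall total.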

Current status: features written; probes benchmarked; training blocked; diagnostics post-hoc

II. Execution Model

The canonical runner is a stage dispatcher, but the important view is the whole path from current sources to generated outputs and gates.

01 - current command path

02 - inputs, stages, generated outputs, and claim gate

run: scripts/pipeline.py parses stage flags.

dispatch: src/tam_pipeline/pipeline.py calls a stage module.

emit: Each stage returns structured JSON status.

gate: Diagnostics do not fit on forbidden labels.
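The run, dispatch, and emit steps can be sketched with the standard library. The stage names and the `--root`, `--dry-run`, and `--skip-manifest-update` flags are taken from this document; the stage function bodies, the `STAGES` table, and the status keys are hypothetical stand-ins, not the contents of scripts/pipeline.py or src/tam_pipeline/pipeline.py.

```python
import argparse
import json


def _stub_stage(name):
    """Build a placeholder stage; real stage modules would do the actual work."""
    def stage(root, dry_run):
        return {"stage": name, "root": root, "dry_run": dry_run, "status": "ok"}
    return stage


# Hypothetical dispatch table keyed by the three documented stage names.
STAGES = {
    name: _stub_stage(name)
    for name in ("get_data", "enrich_features", "prediction_diagnostics")
}


def build_parser():
    parser = argparse.ArgumentParser(description="Stage dispatcher sketch")
    parser.add_argument("stage", choices=[*STAGES, "all"])
    parser.add_argument("--root", default=".")
    parser.add_argument("--dry-run", action="store_true")
    # Accepted for parity with the documented commands; unused in this sketch.
    parser.add_argument("--skip-manifest-update", action="store_true")
    return parser


def run(argv):
    args = build_parser().parse_args(argv)
    names = list(STAGES) if args.stage == "all" else [args.stage]
    # Each stage returns a JSON-serializable status dictionary.
    return [STAGES[name](args.root, args.dry_run) for name in names]


print(json.dumps(run(["all", "--root", ".", "--dry-run"]), indent=2))
```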

#	stage	implementation	role	canonical command
1	get_data	`src/tam_pipeline/stages/get_data.py`	Builds the source registry, plans/fetches direct prior-art payloads, and stages Microsoft AOI footprint shards when enabled.	`python3 scripts/pipeline.py get_data --root . --dry-run --skip-manifest-update`
2	enrich_features	`src/tam_pipeline/stages/enrich_features.py`	Builds GeoHG-style features, source-layer cell features, denominator v3 context, and deterministic gap-closure probe columns.	`python3 scripts/pipeline.py enrich_features --root .`
3	prediction_diagnostics	`src/tam_pipeline/stages/prediction_diagnostics.py`	Runs current benchmark diagnostics and claim-boundary checks without fitting on vendor TAM or G1.	`python3 scripts/pipeline.py prediction_diagnostics --root .`

Operator commands

Current runner

python3 scripts/pipeline.py get_data --root . --dry-run --skip-manifest-update
python3 scripts/pipeline.py enrich_features --root . --dry-run
python3 scripts/pipeline.py prediction_diagnostics --root . --dry-run
python3 scripts/pipeline.py enrich_features --root .
python3 scripts/pipeline.py prediction_diagnostics --root .
python3 scripts/pipeline.py all --root . --dry-run --skip-manifest-update

Current outputs

outputs/source_fetch/source_fetch_manifest.json
outputs/source_fetch/direct_prior_art_download_manifest.json
outputs/geohg_features/cell_features_geohg_style.csv
outputs/source_layers/source_layer_cell_features_manifest.json
outputs/source_layers/source_layer_contracts.json
outputs/denominator_foundation/cell_denominator_foundation.csv
outputs/tam_gap_closure/tam_gap_closure_features.csv
outputs/full_india_scored/full_india_tam_score_manifest.json
outputs/tam_map/tam_full_india_0_01_grid_index.json
outputs/statistical_diagnostics/statistical_diagnostics_summary.json

Dry-run paths

dry run contract

Dry runs are now explicit stage behavior. The compute-heavy stages print planned inputs and outputs without executing the scripts that write feature, denominator, gap-closure, or diagnostic artifacts.

stage	command	write behavior	current evidence
`get_data`	`python3 scripts/pipeline.py get_data --root . --dry-run --skip-manifest-update`	Plans direct prior-art fetches and Microsoft AOI shard staging; planning manifests may be written for review.	command supported
`enrich_features`	`python3 scripts/pipeline.py enrich_features --root . --dry-run`	Prints planned feature, denominator, and gap-closure steps without executing builder scripts.	no-write JSON plan
`prediction_diagnostics`	`python3 scripts/pipeline.py prediction_diagnostics --root . --dry-run`	Checks frozen prediction readiness and expected diagnostic files without running diagnostics.	no-write JSON plan
`all`	`python3 scripts/pipeline.py all --root . --dry-run --skip-manifest-update`	Passes dry-run through all stages; compute stages remain no-write and blocked checks still fail loudly.	stage-aware
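The no-write contract for compute stages can be illustrated with a small stage function. This is a sketch, not the stage's actual code: `enrich_features_sketch`, the status keys, and the example output path are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def enrich_features_sketch(root: Path, dry_run: bool) -> dict:
    """Hypothetical stage body: plan only in dry-run mode, write otherwise."""
    planned = [root / "outputs" / "example_features.csv"]  # illustrative path only
    status = {
        "stage": "enrich_features",
        "dry_run": dry_run,
        "planned_outputs": [str(p) for p in planned],
        "written": [],
    }
    if dry_run:
        return status  # no builder runs and no file is touched
    for path in planned:
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text("cell_id\n")
        status["written"].append(str(path))
    return status


with tempfile.TemporaryDirectory() as tmp:
    plan = enrich_features_sketch(Path(tmp), dry_run=True)
    print(json.dumps(plan, indent=2))
    print(any(Path(tmp).iterdir()))  # False: the dry run left the directory empty
```

Returning the plan as data rather than printing it ad hoc keeps the dry-run output in the same structured JSON form as a real run.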

III. Prior-Art Status

This chapter starts with the signal-family classification, then drills into payload availability. Payload presence is not the same thing as validated signal readiness.

Signal-family prior-art status

prior art before files

The status matrix connects the prior-art families to the implemented source layers and current validation blockers. File counts are shown after this matrix so the document does not confuse downloaded payloads with production-ready signals.

#	signal family	state	layers	groups	prior art	gate
1	Household denominator and residential base	implemented probe	`worldpop_population_density: ok`; `census_2011_official_controls: ok`; `ghsl_population_builtup: ok`; `building_footprints_google_microsoft: ok`	`Population denominator (5)`; `GHSL built-up/population (10)`; `Denominator v3 and reconciliation (33)`; `Building footprints (32)`	`population_household_denominator: WorldPop population, census/SHRUG reconciliation, built-up occupancy, household density`; `buildings_and_settlement_structure: building_count, built_area_share, building_density, occupancy_proxy`	Production household truth still needs public-anchor reconciliation, residential masks, source QA, and spatial/city holdouts.
2	Residential eligibility and physical exclusions	implemented proxy	`esa_worldcover_dynamic_world: proxy_from_local_building_poi_density_until_worldcover_dynamic_world`; `residential_morphology_tags: proxy_from_building_landcover_density_until_morphology_qa`; `building_footprints_google_microsoft: ok`	`Landcover and physical exclusions (20)`; `Morphology proxies (6)`; `Land use and slum context (3)`	`land_use_exclusions_and_risk: water_share, forest_share, mining_share, industrial_land_share`; `buildings_and_settlement_structure: building_count, built_area_share, building_density, occupancy_proxy`	Proxy landcover/morphology remains production-blocked until pinned rasters, visual QA, and morphology holdouts pass.
3	Income and welfare gate	implemented proxy	`income_public_context: proxy_from_income_gate_public_sources_until_calibration`	`Income gate and welfare context (40)`; `Census housing/amenity assets (23)`; `Nightlights (11)`	`satellite_welfare_affluence: roof/material proxy, lighting proxy, drinking-water proxy, Landsat/Sentinel embeddings`; `nightlights_and_economic_activity: VIIRS mean, VIIRS trend, nightlight blob score, commercial activity proxy`	MPCE calibration, license review, geography QA, and income-source ablations are still required.
4	Conversion and serviceability	implemented proxy	`osm_overture_ohsome_roads_pois: proxy_from_pmgsy_poi_buildings_until_dated_osm_overture_ohsome`	`Conversion and map coverage (18)`; `Road/serviceability (4)`; `POI/serviceability (8)`	`roads_accessibility_serviceability: road_length_by_class, distance_to_major_road, road_embedding, travel_friction`; `poi_urban_function: POI counts by category, Hex2Vec/ContextualCount embeddings, schools/healthcare/markets, urban function vector`	Dated OSM/Overture extracts, ohsome coverage, internal serviceability, and failed-install checks are missing.
5	Execution and acquirability	weak proxy	`connectivity_execution_readiness: proxy_from_public_serviceability_until_opencellid_ookla_mlab`	`Execution readiness (7)`	`internal_business_reality: leads, installs, retained installs, gross margin`	External readiness cannot replace branch, partner, capacity, payment, CAC, and operations coverage.
6	Spatial context without target leakage	implemented diagnostic x-surface		`GeoHG graph context (107)`; `Spatial/city context (10)`	`heterogeneous_graph_and_spatial_context: neighbor context, semantic similarity, land-cover hypernodes, POI hypernodes`	Graph context is allowed only from independent features; production metrics remain blocked without non-GeoIQ holdouts.
7	Outcome and notebook metrics	post-hoc only		`TAM score outputs (27)`; `Status and reason codes (1)`	`internal_business_reality: leads, installs, retained installs, gross margin`	Vendor TAM and G1 are benchmark labels after score generation; they are not training labels, features, or tuning signals.

stage 1

Readiness is an artifact, not a hidden precondition.

The registry and fetch manifests show direct assets, deferred rasters, gated microdata, and source warnings before features are interpreted.

status	count
catalog_and_grid_fetched_payload_deferred	1
catalog_fetched_payload_deferred	2
data_fetched	35
docs_fetched	2
docs_fetched_payload_deferred	1
indexes_fetched_payload_deferred	1
metadata_fetched_payload_deferred	2
overview_fetched_payload_deferred	1
paper_fetched	1
public_reports_fetched_microdata_gated	1
readme_fetched	1
readme_fetched_payload_deferred	1

Prior-art payload coverage

current prior art

The current feature manifest reports 23 present payloads out of 23 required payloads, with 0 missing.

source slug	payloads	present	direct-link payloads
`admin-districts`	6	6	6
`buildings-google`	2	2	2
`education-facilities`	1	1	1
`energy-power-plants`	1	1	1
`env-flood-atlas`	4	4	4
`env-landuse`	1	1	1
`env-soil`	1	1	1
`infra-rural-roads`	2	2	2
`nightlights-viirs`	1	1	1
`police-stations`	1	1	1
`transport-airports`	1	1	1
`unmapped_local_payload`	1	1	0
`urban-municipal`	1	1	1

Prior-art payload files

required payload	source slug	exists	link mode	needs download
`prior art/yashveeeeeeer_india-geodata/data/administrative/districts/census-2011/2011_Dist.dbf`	`admin-districts`	present	direct	no
`prior art/yashveeeeeeer_india-geodata/data/administrative/districts/census-2011/2011_Dist.prj`	`admin-districts`	present	direct	no
`prior art/yashveeeeeeer_india-geodata/data/administrative/districts/census-2011/2011_Dist.sbn`	`admin-districts`	present	direct	no
`prior art/yashveeeeeeer_india-geodata/data/administrative/districts/census-2011/2011_Dist.sbx`	`admin-districts`	present	direct	no
`prior art/yashveeeeeeer_india-geodata/data/administrative/districts/census-2011/2011_Dist.shp`	`admin-districts`	present	direct	no
`prior art/yashveeeeeeer_india-geodata/data/administrative/districts/census-2011/2011_Dist.shx`	`admin-districts`	present	direct	no
`prior art/yashveeeeeeer_india-geodata/data/remote-sensing/population-density/ind_pd_2020_1km_ASCII_XYZ.csv`	`unmapped_local_payload`	present	local/manual	no
`prior art/yashveeeeeeer_india-geodata/data/buildings/google/google-open-buildings-india-2023.mosaic.json`	`buildings-google`	present	direct	no
`prior art/yashveeeeeeer_india-geodata/data/buildings/google/google-open-buildings-india-2023.000000.parquet`	`buildings-google`	present	direct	no
`prior art/yashveeeeeeer_india-geodata/data/education/facilities/INDIA_EDUCATION_FACILITIES_POINTS.geojson`	`education-facilities`	present	direct	no
`prior art/yashveeeeeeer_india-geodata/data/police/stations/INDIA_POLICE_STATIONS.geojson`	`police-stations`	present	direct	no
`prior art/yashveeeeeeer_india-geodata/data/transport/airports/INDIA_AIRPORTS_POINTS.geojson`	`transport-airports`	present	direct	no
`prior art/yashveeeeeeer_india-geodata/data/energy/power-plants/INDIA_ENERGY_PLANTS.geojson`	`energy-power-plants`	present	direct	no
`prior art/yashveeeeeeer_india-geodata/data/environment/flood-atlas/District_Wise_flood_risk_data.json`	`env-flood-atlas`	present	direct	no
`prior art/yashveeeeeeer_india-geodata/data/environment/flood-atlas/District_Wise_max_flood_area_frac.json`	`env-flood-atlas`	present	direct	no
`prior art/yashveeeeeeer_india-geodata/data/environment/flood-atlas/State_Wise_flood_risk_data.json`	`env-flood-atlas`	present	direct	no
`prior art/yashveeeeeeer_india-geodata/data/environment/flood-atlas/State_Wise_max_flood_area_frac.json`	`env-flood-atlas`	present	direct	no
`prior art/yashveeeeeeer_india-geodata/data/environment/soil/INDIA_SOIL_MAP_FAO.geojson`	`env-soil`	present	direct	no
`prior art/yashveeeeeeer_india-geodata/data/environment/landuse/hyderabad/Hyderabad_Landuse.geojson`	`env-landuse`	present	direct	no
`prior art/yashveeeeeeer_india-geodata/data/remote-sensing/nightlights/nightlights_district_panel.csv`	`nightlights-viirs`	present	direct	no
`prior art/yashveeeeeeer_india-geodata/data/urban/municipal-boundaries/mumbai/slumClusters.geojson`	`urban-municipal`	present	direct	no
`prior art/yashveeeeeeer_india-geodata/data/infrastructure/rural-roads/road-network/Haryana.zip`	`infra-rural-roads`	present	direct	no
`prior art/yashveeeeeeer_india-geodata/data/infrastructure/rural-roads/road-network/UttarPradesh.zip`	`infra-rural-roads`	present	direct	no
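The 23-of-23 readiness count reported above amounts to an existence check over the required payload paths. A minimal sketch of such a check follows. The function name and return shape are hypothetical, and the feature manifest's real schema is not shown in this document.

```python
import tempfile
from pathlib import Path


def payload_readiness(root, required):
    """Count required payload paths (relative to `root`) that exist on disk."""
    missing = [rel for rel in required if not (Path(root) / rel).exists()]
    return {
        "required": len(required),
        "present": len(required) - len(missing),
        "missing": missing,
    }


# Toy check in a temporary directory with one present and one absent payload.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "present.csv").write_text("x\n")
    print(payload_readiness(tmp, ["present.csv", "absent.csv"]))
    # {'required': 2, 'present': 1, 'missing': ['absent.csv']}
```

A check like this confirms only that files exist; it says nothing about whether the signals derived from them are validated.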

IV. Signal Implementation

Stage 2 builds the current signal surface: GeoHG-style cell context, source-layer cell features, graph features, denominator v3, and deterministic TAM probe/v2/v3 columns.

item	value	description
inventory columns	372	Combined GeoHG, source-layer, and gap-closure columns after excluding row identifiers.
cell rows	7,029	Rows in cell_features_geohg_style.csv.
source layers	9	Current source-layer cell-feature artifacts.
candidate signals	18	Source-probe rows that write candidate fields.
graph edges	24,203	Area-area edges in the current GeoHG bundle.

signal surface

Current x is the joined independent feature stack plus audit-visible proxy signals. Current y inside the repo is not a supervised training label; it is the deterministic probe/v2/v3 TAM score family plus the selected predicted_tam_0_10lpa_power map layer. Vendor TAM and G1 are benchmark-only labels.

Feature stack visual

stage 2

One signal surface, several audit views.

Source families join into source-layer and GeoHG feature tables, then feed denominator v3, conversion, execution, and target-free TAM score formulas. Benchmark labels stay outside this path.

The detailed column inventory is kept below as chips only; precision is summarized once instead of repeated in every group.

Source-layer additions

materialized layers

The source-layer manifest reports 9 current cell-feature files and 5 proxy layers. These fields include WorldPop, Census controls, Google/Microsoft buildings, GHSL, landcover, morphology, income-gate, OSM/Overture, and connectivity signals.

source layer	status	rows	columns	mode	fields	cell feature file
`worldpop_population_density`	ok	7,029	2	direct	`worldpop_population_est_nearest`, `worldpop_density_people_per_km2`	`outputs/source_layers/worldpop_population_density/2020_ascii_xyz_local/cell_features.csv`
`census_2011_official_controls`	ok	7,029	2	direct	`census_control_population`, `census_control_households`	`outputs/source_layers/census_2011_official_controls/2011_fixed_release/cell_features.csv`
`building_footprints_google_microsoft`	ok	7,029	13	direct	`building_count_cell_best`, `building_area_sum_m2_best`, `building_area_share_best`, `building_source_coverage_flag`, `building_height_mean_m`, `building_floor_count_proxy`, `building_volume_proxy_m3`, `building_vertical_density_proxy`, `building_residential_volume_proxy_m3`, `building_compactness_score`, `building_population_per_building`, `building_footprint_area_per_person_m2`, `building_msft_gobi_disagreement_flag`	`outputs/source_layers/building_footprints_google_microsoft/google_2023_msft_staged_snapshot/cell_features.csv`
`ghsl_population_builtup`	ok	7,029	10	direct	`ghsl_population_est`, `ghsl_population_source_year`, `ghsl_builtup_share`, `ghsl_built_surface_m2`, `ghsl_built_volume_m3`, `ghsl_height_mean_m`, `ghsl_non_res_builtup_share`, `ghsl_non_res_volume_share`, `ghsl_settlement_score`, `ghsl_residential_candidate_share`	`outputs/source_layers/ghsl_population_builtup/named_release_required/cell_features.csv`
`esa_worldcover_dynamic_world`	proxy_from_local_building_poi_density_until_worldcover_dynamic_world	7,029	12	proxy	`landcover_builtup_share`, `landcover_water_share`, `landcover_tree_forest_share`, `landcover_crop_share`, `landcover_non_residential_exclusion_share`, `dynamic_world_built_probability`, `residential_eligible_area_share`, `physical_hard_cell_flag`, `hard_exclusion_share`, `soft_non_residential_downweight_share`, `hard_mask_reason_code`, `landcover_proxy_source_flag`	`outputs/source_layers/esa_worldcover_dynamic_world/worldcover_2020_2021_dynamic_world_pinned_window_required/cell_features.csv`
`residential_morphology_tags`	proxy_from_building_landcover_density_until_morphology_qa	7,029	6	proxy	`morph_dense_old_city_score`, `morph_informal_dense_score`, `morph_highrise_affordable_score`, `morph_cbd_industrial_score`, `morph_periurban_vacant_score`, `morphology_proxy_source_flag`	`outputs/source_layers/residential_morphology_tags/building_landcover_public_proxy_v1/cell_features.csv`
`income_public_context`	proxy_from_income_gate_public_sources_until_calibration	7,029	28	proxy	`income_public_affluence_context_score`, `income_public_deprivation_context_score`, `income_public_asset_affluence_score`, `income_public_amenity_deficit_score`, `income_public_license_status_code`, `income_public_proxy_source_flag`, `income_gate_city_segment_code`, `income_gate_city_segment_prior_0_10lpa_prob`, `income_gate_city_prior_0_10lpa_prob`, `income_gate_district_income_pci`, `income_gate_district_income_affluence_score`, `income_gate_admin_affluence_score`, `income_gate_admin_deprivation_score`, `income_gate_nfhs_affluence_score`, `income_gate_nfhs_deprivation_score`, `income_gate_shrug_rwi_score`, `income_gate_shrug_consumption_score`, `income_gate_shrug_asset_affluence_score`, `income_gate_meta_rwi_raw`, `income_gate_meta_rwi_score`, `income_gate_meta_rwi_error`, `income_gate_cell_wealth_score`, `income_gate_external_affluence_score`, `income_gate_source_count`, `income_gate_confidence`, `income_gate_granularity_code`, `income_gate_granularity`, `income_gate_status`	`outputs/source_layers/income_public_context/income_gate_public_context_v2/cell_features.csv`
`osm_overture_ohsome_roads_pois`	proxy_from_pmgsy_poi_buildings_until_dated_osm_overture_ohsome	7,029	9	proxy	`road_distance_m`, `road_intersection_density`, `settlement_cluster_size`, `building_cluster_compactness`, `addressability_score`, `poi_service_mix_score`, `osm_mapping_coverage_score`, `overture_road_coverage_score`, `conversion_proxy_source_flag`	`outputs/source_layers/osm_overture_ohsome_roads_pois/dated_extract_required/cell_features.csv`
`connectivity_execution_readiness`	proxy_from_public_serviceability_until_opencellid_ookla_mlab	7,029	3	proxy	`connectivity_readiness_score`, `mlab_measurement_coverage_score`, `execution_proxy_source_flag`	`outputs/source_layers/connectivity_execution_readiness/dated_opencellid_ookla_mlab_required/cell_features.csv`

Promotion gates

source contracts

Source-layer contracts fix the source family, model layer, expected fields, and promotion gate. Production readiness is intentionally separate from the presence of raw payloads or cell-feature files.

source	family	model layer	raw files	cell file	production ready	promotion gate
`ghsl_population_builtup`	better_denominators	household_denominator_population_builtup_height_volume	32	True	False	GHS-POP/BUILT-S/BUILT-V/BUILT-H/BUILT-C releases must reconcile against Census controls and WorldPop without tuning to vendor TAM.
`census_2011_official_controls`	better_denominators	official_population_household_controls	2	True	False	official controls or validated mirrors must be reconciled before production household estimates.
`worldpop_population_density`	better_denominators	gridded_population_density	1	True	False	use as one denominator candidate; reconcile to Census/GHSL and residential masks.
`building_footprints_google_microsoft`	better_denominators_conversion_feasibility	building_structure_residential_density	31	True	False	footprints/height/volume are physical evidence only; they cannot become household truth without public-anchor reconciliation.
`esa_worldcover_dynamic_world`	can_serve_here_filters	residential_non_residential_land_mask	3	True	False	pinned releases/windows only; land cover can hard-exclude or downweight cells but cannot label income or households.
`residential_morphology_tags`	better_denominators	old_city_informal_highrise_cbd_periurban_morphology	3	True	False	morphology tags are denominator/income modifiers only after visual QA, source disagreement review, and city/morphology holdouts.
`income_public_context`	income_affordability	high_income_exclusion_public_context	22	True	False	income features estimate high-income exclusion only; SHRUG/NFHS/SECC/RWI/district-income gates require license review, geography QA, MPCE extraction, and city/spatial holdouts before production use.
`osm_overture_ohsome_roads_pois`	conversion_feasibility	road_poi_addressability_clusterability	2	True	False	OSM/Overture features require dated extracts and ohsome coverage so missing mapping is not mistaken for low demand.
`connectivity_execution_readiness`	execution_realism	connectivity_payment_partner_readiness_proxy	2	True	False	weak external serviceability signals can affect execution readiness only after license and coverage gates pass.

Signal coverage

coverage is not accuracy

Coverage rows show which additions are populated in the current artifacts. High coverage does not validate calibration, residential truth, income truth, or production serviceability.

signal check	current value	interpretation
source layers written	9	source-layer cell-feature artifacts available
proxy source layers	5	layers still marked as public proxy rather than production source truth
candidate features written	18	source-probe rows that write candidate feature columns
H_residential denominator coverage	100.0%	v3 household denominator fields populated
landcover coverage	100.0%	landcover/exclusion fields populated
morphology coverage	100.0%	morphology proxy fields populated
conversion coverage	100.0%	conversion-feasibility fields populated
execution coverage	100.0%	execution-readiness fields populated
income gate prior coverage	100.0%	public income-gate prior fields populated
income gate confidence median	0.840	median public-income proxy confidence
GHSL builtup coverage	100.0%	GHSL built surface/volume context populated
GHSL height coverage	96.5%	GHSL height/non-residential volume context populated
building denominator coverage	95.8%	Google/Microsoft building denominator fields populated
public anchor reconciliation share	24.6%	cells touched by current public-anchor reconciliation
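The coverage percentages above can be read as populated-row shares. The sketch below assumes coverage means the fraction of grid rows whose field is neither missing nor NaN. That definition and the function name are assumptions, since the diagnostics code is not shown here.

```python
def coverage_share(values):
    """Share of rows with a populated value; None and NaN count as missing."""
    if not values:
        return 0.0
    # NaN is the only float that is not equal to itself.
    populated = sum(1 for v in values if v is not None and v == v)
    return populated / len(values)


column = [1.2, None, 3.4, float("nan")]  # toy column with two populated rows
print(f"{coverage_share(column):.1%}")   # 50.0%
```

A column filled entirely with a wrong constant would still score 100.0% here, which is why coverage cannot stand in for calibration or accuracy.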

Precision summary

precision ledger

All names are exported at grid-row level, but source precision is mixed: cell geometry, district context, nearest 1 km raster points, vector overlays, 2 km buffers, and one-hop graph context.

Spatial/city context (10)

Source scale: 0.01-degree grid geometry plus Census district IDs.

Exported scale: 0.01-degree grid cells with a median cell area of 1.40 km² (equivalent square side about 1.18 km). District IDs come from a Census 2011 district join, so the same district value repeats on all cells in that district; within current grid coverage, the median represented district footprint is 135.9 km² (99 cells, equivalent square side about 11.7 km). For the vendor city grid footprint, the median represented city coverage is 135.5 km² (98 cells, equivalent square side about 11.6 km).

Read as: Centroids and local x/y/radius are cell geometry; censuscode is a district assignment.
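The "equivalent square side" figures quoted for cells, districts, and cities are the square root of the corresponding area. The short check below reproduces them from the areas stated above; the function name is illustrative.

```python
import math


def equivalent_square_side_km(area_km2: float) -> float:
    """Side length, in km, of a square with the given area in km^2."""
    return math.sqrt(area_km2)


print(round(equivalent_square_side_km(1.40), 2))   # 1.18 (median cell)
print(round(equivalent_square_side_km(135.9), 1))  # 11.7 (median district footprint)
print(round(equivalent_square_side_km(135.5), 1))  # 11.6 (median city coverage)
```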

Nightlights (11)

Source scale: VIIRS-derived district panel values, not raw pixel values in this CSV.

Exported scale: District aggregate joined to each cell by censuscode through the Census 2011 district join, so the same district value repeats on all cells in that district. Within current grid coverage, the median represented district footprint is 135.9 km² (99 cells), an equivalent square side of about 11.7 km.

Read as: Current precision is district-level even though the underlying satellite product is finer.

Census housing/amenity assets (23)

Source scale: Census 2011 Houselisting/HLPCA district-total percentage shares from the downloaded pigshell mirror.

Read as: Housing, amenity, and asset shares are old district context repeated on cells; they are not current cell-level observations.

Hazard and flood context (7)

Source scale: District flood-atlas JSON records.

Read as: Flood fields should not be read as within-cell flood pixels.

Population denominator (5)

Source scale: WorldPop 2020 1 km ASCII XYZ point grid.

Exported scale: Nearest 1 km source point to each 0.01-degree cell centroid; population estimate scales density by cell area.

Read as: nearest_distance_km exposes the join quality per cell.
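The nearest-point join described above can be sketched in plain Python. The column names `worldpop_density_people_per_km2`, `worldpop_population_est_nearest`, and `nearest_distance_km` appear in this document. The haversine distance, the brute-force search, and the toy coordinates are assumptions for illustration, and a full 0.01-degree grid would need a spatial index rather than a linear scan.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_density_join(centroid, cell_area_km2, source_points):
    """source_points: iterable of (lat, lon, density_people_per_km2) tuples."""
    lat, lon = centroid
    best = min(source_points, key=lambda p: haversine_km(lat, lon, p[0], p[1]))
    return {
        "worldpop_density_people_per_km2": best[2],
        # Population estimate scales the nearest density by the cell area.
        "worldpop_population_est_nearest": best[2] * cell_area_km2,
        "nearest_distance_km": haversine_km(lat, lon, best[0], best[1]),
    }


points = [(28.60, 77.20, 12000.0), (28.70, 77.30, 3000.0)]  # toy source grid
row = nearest_density_join((28.61, 77.21), 1.40, points)
print(row)
```

Keeping `nearest_distance_km` beside the estimate lets a reviewer filter out cells whose nearest source point is too far away to be trusted.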

GHSL built-up/population (10)

Source scale: GHSL 2020 30-arcsecond population, built-surface, built-volume, height, and non-residential raster tiles.

Exported scale: Raster/source-layer values aggregated or joined to the 0.01-degree grid cell, about 1.18 km side in the current grid.

Read as: These are independent physical-denominator signals; they still require reconciliation against Census, WorldPop, and building footprints.

Denominator v3 and reconciliation (33)

Source scale: Mixed public denominator context: WorldPop, Census 2011 controls, GHSL built form, building footprints, and public anchors.

Exported scale: Cell-level reconciliation probes on the 0.01-degree grid. Where the Census 2011 district join applies, the same district value repeats on all cells in that district. Within current grid coverage, the median represented district footprint is 135.9 km² (99 cells), an equivalent square side of about 11.7 km.

Read as: H_residential base/lower/upper fields are leakage-safe denominator probes but remain production-blocked until public-anchor QA and holdouts pass.

Building footprints (32)

Source scale: Google/Microsoft building footprint polygons, meter-scale vector geometries.

Exported scale: Footprint count/area/height/volume proxies aggregated into each 0.01-degree cell; exported precision is the cell, about 1.18 km side in the current grid.

Read as: Cell-level aggregation is about 1 km; footprint areas themselves are vector-derived.

Landcover and physical exclusions (20 columns)

Source scale: ESA WorldCover/Dynamic World source-layer proxy fields plus local hard/soft exclusion logic.

Exported scale: Built/water/tree/crop/non-residential shares and exclusion flags written at 0.01-degree cell level, about 1.18 km side.

Read as: These fields can suppress or downweight physical impossibility; they do not label income or demand.

Morphology proxies (6 columns)

Source scale: Derived morphology tags from building, landcover, density, and public-context proxies.

Exported scale: Cell-level scores for dense old city, informal density, highrise affordability, CBD/industrial, and periurban vacancy patterns.

Read as: Use as denominator or income modifiers only after source-disagreement and morphology holdout review.

Income gate and welfare context (40 columns)

Source scale: Public income/welfare context: district income, NFHS, SHRUG/SECC/RWI-style fields, HLPCA context, and city priors.

Exported scale: Mixed district/city/cell proxy fields written to grid rows. Census 2011 district join; same district value repeats on all cells in that district. Within current grid coverage, median represented district footprint is 135.9 km2 (99 cells), equivalent square side about 11.7 km.

Read as: Income-gate fields estimate 0-10 LPA probability as a probe; they are not production-calibrated income truth.

Road/serviceability (4 columns)

Source scale: PMGSY road shapefile line vectors for available states.

Exported scale: Nearest-road distance from cell centroid, capped at 25 km; within_2km flag uses a 2 km threshold.

Read as: Coverage is state-limited; source_available and distance_missing must stay with the distance.

POI/serviceability (8 columns)

Source scale: Point GeoJSON layers for education, police, airport, and energy POIs.

Exported scale: Counts inside the 0.01-degree cell and counts inside a 2 km centroid buffer.

Read as: The *_2km columns are intentionally broader than the grid cell.

Land use and slum context (3 columns)

Source scale: Local polygon overlays: Mumbai slum clusters and Hyderabad land-use where overlapping.

Exported scale: Polygon-overlap area/share aggregated into each 0.01-degree cell.

Read as: Coverage is city-specific; zero may mean no local overlap source, not absence of slum or land use.

Conversion and map coverage (18 columns)

Source scale: Public road, POI, OSM/Overture/ohsome proxy, settlement, addressability, and mapping-coverage context.

Exported scale: Cell-level conversion-feasibility and serviceability modifiers on the 0.01-degree grid.

Read as: These fields affect can-serve and conversion feasibility; they are not demand labels.

Execution readiness (7 columns)

Source scale: Public connectivity and measurement-coverage proxies for delivery/payment/partner feasibility.

Exported scale: Cell-level readiness scores and coverage flags on the 0.01-degree grid.

Read as: Execution readiness is a weak acquirable-TAM modifier, not household demand or income evidence.

GeoHG graph context (107 columns)

Source scale: One-hop graph aggregates over adjacent 0.01-degree grid cells.

Exported scale: 8-neighbour GeoHG context on the same grid; one-hop ring is adjacent/diagonal cells, roughly 3.6 km across including the center cell.

Read as: Graph context inherits precision from its base column and adds one 8-neighbour ring of smoothing.
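The one-hop 8-neighbour aggregation can be sketched as below. This is an illustration under stated assumptions, not the pipeline's code: cells are keyed by hypothetical (row, col) integers, neighbours missing from the grid are simply skipped, and a cell with no neighbours gets None. The output keys mirror the neighbor-mean, self-minus-neighbor, and degree ideas described above.

```python
def graph_context(values):
    """Compute one-hop 8-neighbour context for a dict {(row, col): value}."""
    offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]
    out = {}
    for (r, c), v in values.items():
        # Only neighbours that exist in the grid contribute (assumption).
        neigh = [values[(r + dr, c + dc)] for dr, dc in offsets if (r + dr, c + dc) in values]
        mean = sum(neigh) / len(neigh) if neigh else None
        out[(r, c)] = {
            "graph_degree": len(neigh),
            "neighbor_mean": mean,
            "self_minus_neighbor": (v - mean) if mean is not None else None,
        }
    return out


# 3 x 3 toy grid holding the values 0..8 in row-major order.
grid = {(r, c): float(r * 3 + c) for r in range(3) for c in range(3)}
ctx = graph_context(grid)
```

In the toy grid the center cell has degree 8 and a corner cell has degree 3, which shows why edge cells carry a noisier neighbour mean than interior cells.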

TAM score outputs (27 columns)

Source scale: Deterministic gap-closure formulas over source-derived fields.

Exported scale: Probe, v2, v3, interval, confidence, priority, serviceable, acquirable, and power-transform inputs at grid-cell level.

Read as: These are outputs and audit fields, not independent input features for supervised training.

Status and reason codes (1 column)

Source scale: Pipeline status strings and reason-code outputs.

Exported scale: Per-row flags that explain blockers, fallbacks, proxy status, and calibration state.

Read as: Status fields support audit and filtering; they should not be modeled as demand signals.

Full signal and score inventory

current signal surface

Column chips come from cell_features_geohg_style.csv, source_layer_cell_features_manifest.json, and all non-ID columns in tam_gap_closure_features.csv. Output/status groups are kept visible but are not independent training features.

Spatial/city context (10 columns)

centroid_lon, centroid_lat, pos_x_km, pos_y_km, pos_radius_km, city_grid_col, city_grid_row, city_cell_count, censuscode, grid_area_m2

Nightlights (11 columns)

nightlight_log1p_mean_2012, nightlight_log1p_mean_2019, nightlight_log1p_mean_2024, nightlight_mean_2012, nightlight_mean_2019, nightlight_mean_2024, nightlight_sum_2012, nightlight_sum_2019, nightlight_sum_2024, nightlight_mean_growth_2012_2024, nightlight_bin_id

Census housing/amenity assets (23 columns)

census_hl_precision_level, census_hl_context_present, census_hl_housing_quality_score, census_hl_basic_amenity_score, census_hl_asset_affluence_score, census_hl_amenity_deficit_score, census_hl_housing_good_share, census_hl_roof_concrete_share, census_hl_wall_burnt_brick_or_concrete_share, census_hl_floor_finished_share, census_hl_rooms_3plus_share, census_hl_electricity_share, census_hl_latrine_share, census_hl_lpg_png_share, census_hl_banking_share, census_hl_tv_share, census_hl_computer_internet_share, census_hl_mobile_phone_share, census_hl_scooter_motorcycle_share, census_hl_car_share, census_hl_no_asset_share, census_hl_source_path, census_hl_affluence_context_score

Hazard and flood context (7 columns)

Vulnerability, Hazard, Exposure, Risk, MaxArea, MaxFraction, flood_risk_bin_id

Population denominator (5 columns)

worldpop_density_people_per_km2, worldpop_density_log1p, worldpop_nearest_distance_km, worldpop_population_est_nearest, worldpop_households_est_avg_size_4_6

GHSL built-up/population (10 columns)

ghsl_population_est, ghsl_population_source_year, ghsl_builtup_share, ghsl_built_surface_m2, ghsl_built_volume_m3, ghsl_height_mean_m, ghsl_non_res_builtup_share, ghsl_non_res_volume_share, ghsl_settlement_score, ghsl_residential_candidate_share

Denominator v3 and reconciliation (33 columns)

census_control_population, census_control_households, households_est_primary_probe, household_size_census_context, households_est_worldpop_census_avg_size, denominator_confidence, source_disagreement_log_ratio, reconciled_population_probe, reconciled_households_probe, denominator_population_source_count, denominator_disagreement_score, impossible_market_flag, denominator_v2_status, population_prior_base, population_prior_lower, population_prior_upper, household_size_admin_v3, residential_allocation_weight, hard_exclusion_share_v3, non_residential_suppression_score_v3, vertical_density_correction, occupancy_correction, h_residential_households_unreconciled, admin_anchor_coverage_share, public_anchor_reconciliation_factor, public_anchor_calibration_error, public_anchor_reconciliation_status, h_residential_households_base, h_residential_households_lower, h_residential_households_upper, h_residential_denominator_confidence, household_denominator_v3_status, household_denominator_status

Building footprints (32 columns)

gobi_building_count_cell, gobi_building_area_sum_m2, gobi_building_area_share, gobi_building_area_density_km2, gobi_building_area_mean_m2, gobi_building_confidence_mean, msft_building_count_cell, msft_building_area_sum_m2, msft_building_area_share, msft_building_area_density_km2, msft_building_area_mean_m2, msft_building_height_mean_m, building_count_cell_best, building_area_sum_m2_best, building_count_density_per_km2_best, building_area_density_km2_best, building_area_share_best, building_source_coverage_count, building_source_coverage_flag, building_population_per_building, building_footprint_area_per_person_m2, building_msft_gobi_area_disagreement_log_ratio, building_msft_gobi_count_disagreement_log_ratio, building_msft_gobi_disagreement_flag, building_residential_density_score, building_cluster_compactness, building_height_mean_m, building_floor_count_proxy, building_volume_proxy_m3, building_vertical_density_proxy, building_residential_volume_proxy_m3, building_compactness_score

Landcover and physical exclusions (20 columns)

landcover_builtup_share, landcover_water_share, landcover_tree_forest_share, landcover_crop_share, landcover_non_residential_exclusion_share, dynamic_world_built_probability, residential_eligible_area_share, physical_hard_cell_flag, landcover_proxy_source_flag, hard_exclusion_share, soft_non_residential_downweight_share, hard_mask_reason_code, built_form_proxy_score, residential_confidence_probe, residential_eligible_area_share_v2, physical_exclusion_score, residential_filter_confidence, residential_filter_status, non_residential_exclusion_score, residential_status

Morphology proxies (6 columns)

morph_dense_old_city_score, morph_informal_dense_score, morph_highrise_affordable_score, morph_cbd_industrial_score, morph_periurban_vacant_score, morphology_proxy_source_flag

Income gate and welfare context (40 columns)

income_public_affluence_context_score, income_public_deprivation_context_score, income_public_asset_affluence_score, income_public_amenity_deficit_score, income_public_license_status_code, income_public_proxy_source_flag, income_gate_city_segment_code, income_gate_city_segment_prior_0_10lpa_prob, income_gate_city_prior_0_10lpa_prob, income_gate_district_income_pci, income_gate_district_income_affluence_score, income_gate_admin_affluence_score, income_gate_admin_deprivation_score, income_gate_nfhs_affluence_score, income_gate_nfhs_deprivation_score, income_gate_shrug_rwi_score, income_gate_shrug_consumption_score, income_gate_shrug_asset_affluence_score, income_gate_meta_rwi_raw, income_gate_meta_rwi_score, income_gate_meta_rwi_error, income_gate_cell_wealth_score, income_gate_external_affluence_score, income_gate_source_count, income_gate_confidence, income_gate_granularity_code, income_gate_granularity, income_gate_status, income_0_10lpa_prob_pre_gate_proxy, income_gate_prob_candidate, income_gate_final_weight, income_gate_adjustment_delta, income_0_10lpa_prob_probe, income_proxy_confidence, income_0_10lpa_prob_lower_v3, income_0_10lpa_prob_upper_v3, income_gate_context_affluence_score, base_affluence_proxy_score, affluence_proxy_score, income_proxy_status

Road/serviceability (4 columns)

pmgsy_road_nearest_distance_km, pmgsy_road_within_2km, pmgsy_road_source_available, pmgsy_road_distance_missing

POI/serviceability (8 columns)

poi_education_count_cell, poi_education_count_2km, poi_police_count_cell, poi_police_count_2km, poi_airport_count_cell, poi_airport_count_2km, poi_energy_count_cell, poi_energy_count_2km

Land use and slum context (3 columns)

mumbai_slum_area_m2, mumbai_slum_share, landuse_overlap_share

Conversion and map coverage (18 columns)

road_distance_m, road_intersection_density, settlement_cluster_size, addressability_score, poi_service_mix_score, osm_mapping_coverage_score, overture_road_coverage_score, conversion_proxy_source_flag, road_access_score, poi_service_access_score, serviceability_supply_friction_score, serviceable_prob_probe, serviceability_confidence, map_coverage_confidence, conversion_feasibility_score, conversion_feasibility_confidence, conversion_feasibility_status, serviceability_status

Execution readiness (7 columns)

mlab_measurement_coverage_score, connectivity_readiness_score, execution_proxy_source_flag, execution_readiness_signal_count, execution_readiness_score, execution_readiness_confidence, execution_readiness_status

GeoHG graph context (107 columns)

graph_ctx_neighbor_mean_gobi_building_count_cell, graph_ctx_self_minus_neighbor_gobi_building_count_cell, graph_ctx_neighbor_mean_gobi_building_area_sum_m2, graph_ctx_self_minus_neighbor_gobi_building_area_sum_m2, graph_ctx_neighbor_mean_gobi_building_area_share, graph_ctx_self_minus_neighbor_gobi_building_area_share, graph_ctx_neighbor_mean_gobi_building_area_density_km2, graph_ctx_self_minus_neighbor_gobi_building_area_density_km2, graph_ctx_neighbor_mean_gobi_building_area_mean_m2, graph_ctx_self_minus_neighbor_gobi_building_area_mean_m2, graph_ctx_neighbor_mean_gobi_building_confidence_mean, graph_ctx_self_minus_neighbor_gobi_building_confidence_mean, graph_ctx_neighbor_mean_msft_building_count_cell, graph_ctx_self_minus_neighbor_msft_building_count_cell, graph_ctx_neighbor_mean_msft_building_area_sum_m2, graph_ctx_self_minus_neighbor_msft_building_area_sum_m2, graph_ctx_neighbor_mean_msft_building_area_share, graph_ctx_self_minus_neighbor_msft_building_area_share, graph_ctx_neighbor_mean_msft_building_area_density_km2, graph_ctx_self_minus_neighbor_msft_building_area_density_km2, graph_ctx_neighbor_mean_msft_building_area_mean_m2, graph_ctx_self_minus_neighbor_msft_building_area_mean_m2, graph_ctx_neighbor_mean_msft_building_height_mean_m, graph_ctx_self_minus_neighbor_msft_building_height_mean_m, graph_ctx_neighbor_mean_building_count_cell_best, graph_ctx_self_minus_neighbor_building_count_cell_best, graph_ctx_neighbor_mean_building_area_share_best, graph_ctx_self_minus_neighbor_building_area_share_best, graph_ctx_neighbor_mean_building_area_density_km2_best, graph_ctx_self_minus_neighbor_building_area_density_km2_best, graph_ctx_neighbor_mean_building_residential_density_score, graph_ctx_self_minus_neighbor_building_residential_density_score, graph_ctx_neighbor_mean_building_population_per_building, graph_ctx_self_minus_neighbor_building_population_per_building, graph_ctx_neighbor_mean_building_footprint_area_per_person_m2, graph_ctx_self_minus_neighbor_building_footprint_area_per_person_m2, graph_ctx_neighbor_mean_building_source_coverage_flag, graph_ctx_self_minus_neighbor_building_source_coverage_flag, graph_ctx_neighbor_mean_building_msft_gobi_disagreement_flag, graph_ctx_self_minus_neighbor_building_msft_gobi_disagreement_flag, graph_ctx_neighbor_mean_ghsl_population_est, graph_ctx_self_minus_neighbor_ghsl_population_est, graph_ctx_neighbor_mean_ghsl_builtup_share, graph_ctx_self_minus_neighbor_ghsl_builtup_share, graph_ctx_neighbor_mean_ghsl_settlement_score, graph_ctx_self_minus_neighbor_ghsl_settlement_score, graph_ctx_neighbor_mean_landcover_builtup_share, graph_ctx_self_minus_neighbor_landcover_builtup_share, graph_ctx_neighbor_mean_landcover_water_share, graph_ctx_self_minus_neighbor_landcover_water_share, graph_ctx_neighbor_mean_landcover_tree_forest_share, graph_ctx_self_minus_neighbor_landcover_tree_forest_share, graph_ctx_neighbor_mean_landcover_crop_share, graph_ctx_self_minus_neighbor_landcover_crop_share, graph_ctx_neighbor_mean_landcover_non_residential_exclusion_share, graph_ctx_self_minus_neighbor_landcover_non_residential_exclusion_share, graph_ctx_neighbor_mean_residential_eligible_area_share, graph_ctx_self_minus_neighbor_residential_eligible_area_share, graph_ctx_neighbor_mean_physical_hard_cell_flag, graph_ctx_self_minus_neighbor_physical_hard_cell_flag, graph_ctx_neighbor_mean_road_distance_m, graph_ctx_self_minus_neighbor_road_distance_m, graph_ctx_neighbor_mean_road_intersection_density, graph_ctx_self_minus_neighbor_road_intersection_density, graph_ctx_neighbor_mean_settlement_cluster_size, graph_ctx_self_minus_neighbor_settlement_cluster_size, graph_ctx_neighbor_mean_building_cluster_compactness, graph_ctx_self_minus_neighbor_building_cluster_compactness, graph_ctx_neighbor_mean_addressability_score, graph_ctx_self_minus_neighbor_addressability_score, graph_ctx_neighbor_mean_poi_service_mix_score, graph_ctx_self_minus_neighbor_poi_service_mix_score, graph_ctx_neighbor_mean_osm_mapping_coverage_score, graph_ctx_self_minus_neighbor_osm_mapping_coverage_score, graph_ctx_neighbor_mean_mlab_measurement_coverage_score, graph_ctx_self_minus_neighbor_mlab_measurement_coverage_score, graph_ctx_neighbor_mean_connectivity_readiness_score, graph_ctx_self_minus_neighbor_connectivity_readiness_score, graph_ctx_neighbor_mean_poi_education_count_2km, graph_ctx_self_minus_neighbor_poi_education_count_2km, graph_ctx_neighbor_mean_poi_police_count_2km, graph_ctx_self_minus_neighbor_poi_police_count_2km, graph_ctx_neighbor_mean_poi_airport_count_2km, graph_ctx_self_minus_neighbor_poi_airport_count_2km, graph_ctx_neighbor_mean_poi_energy_count_2km, graph_ctx_self_minus_neighbor_poi_energy_count_2km, graph_ctx_neighbor_mean_nightlight_log1p_mean_2024, graph_ctx_self_minus_neighbor_nightlight_log1p_mean_2024, graph_ctx_neighbor_mean_nightlight_mean_growth_2012_2024, graph_ctx_self_minus_neighbor_nightlight_mean_growth_2012_2024, graph_ctx_neighbor_mean_Risk, graph_ctx_self_minus_neighbor_Risk, graph_ctx_neighbor_mean_MaxFraction, graph_ctx_self_minus_neighbor_MaxFraction, graph_ctx_neighbor_mean_mumbai_slum_share, graph_ctx_self_minus_neighbor_mumbai_slum_share, graph_ctx_neighbor_mean_worldpop_density_people_per_km2, graph_ctx_self_minus_neighbor_worldpop_density_people_per_km2, graph_ctx_neighbor_mean_worldpop_population_est_nearest, graph_ctx_self_minus_neighbor_worldpop_population_est_nearest, graph_ctx_neighbor_mean_pmgsy_road_nearest_distance_km, graph_ctx_self_minus_neighbor_pmgsy_road_nearest_distance_km, graph_ctx_neighbor_mean_pmgsy_road_within_2km, graph_ctx_self_minus_neighbor_pmgsy_road_within_2km, graph_ctx_neighbor_mean_pos_radius_km, graph_ctx_self_minus_neighbor_pos_radius_km, graph_degree

TAM score outputs (27 columns)

gross_tam_0_10lpa_probe, serviceable_tam_0_10lpa_probe, acquirable_tam_0_10lpa_probe, households_denominator_v2, eligible_households_v2, gross_tam_0_10lpa_v2, serviceable_tam_0_10lpa_v2, acquirable_tam_0_10lpa_v2, households_residential_v3, households_residential_v3_lower, households_residential_v3_upper, scope_share_in_scope_v3, scope_share_status_v3, gross_tam_0_10lpa_v3, gross_tam_0_10lpa_v3_lower, gross_tam_0_10lpa_v3_upper, serviceable_tam_0_10lpa_v3, acquirable_tam_0_10lpa_v3, component_confidence_score, priority_score_0_100, component_confidence_score_v2, priority_score_v2_0_100, tam_v2_status, component_confidence_score_v3, priority_score_v3_0_100, tam_v3_status, calibration_status

Status and reason codes (1 column)

reason_codes

Actual scoring math

formula contract

The scoring path is deterministic and ordered: households first, then residential likelihood, income-band probability, conversion/serviceability, execution readiness, v2/v3 TAM outputs, and the no-vendor power score used by maps. The exact formulas come from outputs/tam_gap_closure/tam_gap_closure_manifest.json; vendor TAM and G1 are excluded from all formulas.

1. Household denominator: households_est_primary_probe

unit: households per grid cell

first_non_null(households_est_worldpop_census_avg_size, worldpop_households_est_avg_size_4_6, households_est_uniform_district_density, 0), clipped at lower bound 0

Intuition: Estimate how many households live in the cell before any TAM probability is applied.

clip(census_2011_avg_household_size, 3.0, 7.5), with missing filled as 4.6
worldpop_population_est_nearest / household_size_census_context
clip(0.35*population_quality + 0.25*distance_quality + 0.25*census_quality + 0.15*disagreement_quality, 0, 1)

Guardrail: WorldPop and Census district context are independent inputs; vendor TAM and G1 are not read.
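The fallback chain and household-size clip in step 1 can be sketched as follows. This assumes that the second supporting line (population divided by household size) defines households_est_worldpop_census_avg_size and that the first defines household_size_census_context; the function and argument names are hypothetical, and missing values are represented as None or NaN for the sketch.

```python
import math


def is_missing(x):
    """Treat None and NaN as missing (an assumption for this sketch)."""
    return x is None or (isinstance(x, float) and math.isnan(x))


def first_non_null(*candidates):
    """Return the first candidate that is not missing, else None."""
    for c in candidates:
        if not is_missing(c):
            return c
    return None


def household_size_context(census_avg_household_size):
    """clip(size, 3.0, 7.5), with missing filled as 4.6."""
    if is_missing(census_avg_household_size):
        return 4.6
    return min(max(census_avg_household_size, 3.0), 7.5)


def households_est_primary_probe(worldpop_population, census_avg_household_size,
                                 worldpop_households_avg_4_6=None,
                                 uniform_district_density_households=None):
    """Pick the first available household estimate and clip it at zero."""
    size = household_size_context(census_avg_household_size)
    worldpop_census = None if is_missing(worldpop_population) else worldpop_population / size
    chosen = first_non_null(worldpop_census, worldpop_households_avg_4_6,
                            uniform_district_density_households, 0.0)
    return max(chosen, 0.0)


example = households_est_primary_probe(9200.0, None)  # size falls back to 4.6
```

The trailing 0 in the fallback chain means a cell with no usable source still gets a defined household count rather than a missing value.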

2. Residential likelihood: residential_confidence_probe

unit: probability-like confidence, 0 to 1

min(raw_residential_confidence, 0.84 if building_source_coverage_flag > 0 else 0.80)

Intuition: Down-weight cells that look industrial, sparse, or weakly residential before counting them as addressable households.

clip(0.50*city_rank(worldpop_density_people_per_km2) + 0.20*city_rank(poi_context) + 0.30*city_rank(building_residential_density_score), 0, 1)
clip(5.0 * mumbai_slum_share, 0, 1)
0.35 when airport_or_energy_poi_count > 0 and worldpop_density_city_rank < 0.35, else 0.0

Guardrail: Capped because land-cover and true residential masks are not production-complete yet.
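A small sketch of the cap and two of the supporting terms in step 2. How the supporting terms combine into raw_residential_confidence is not stated on this page, so the sketch leaves that step out; the name non_residential_penalty is a hypothetical label for the 0.35 airport/energy term.

```python
def clip(x, lo, hi):
    """Clamp x into the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def slum_residential_signal(mumbai_slum_share):
    """clip(5.0 * mumbai_slum_share, 0, 1)."""
    return clip(5.0 * mumbai_slum_share, 0.0, 1.0)


def non_residential_penalty(airport_or_energy_poi_count, worldpop_density_city_rank):
    """0.35 when an airport/energy POI sits in a low-density cell, else 0.0."""
    if airport_or_energy_poi_count > 0 and worldpop_density_city_rank < 0.35:
        return 0.35
    return 0.0


def residential_confidence_probe(raw_residential_confidence, building_source_coverage_flag):
    """Cap the raw confidence at 0.84 with building coverage, 0.80 without."""
    cap = 0.84 if building_source_coverage_flag > 0 else 0.80
    return min(raw_residential_confidence, cap)


capped = residential_confidence_probe(0.95, building_source_coverage_flag=1)
```

The cap only lowers values: a raw confidence below the cap passes through unchanged.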

3. Income band probability: income_0_10lpa_prob_probe

unit: probability-like share, 0.50 to 0.97

blend pre-gate and gate candidate with weight clip(0.40 + 0.34*income_gate_confidence, 0, 0.74) when gate prior is available

Intuition: Treat 0-10 LPA as broad affordability eligibility: brighter, denser, more asset-rich cells are less likely to be in the lower band, while HLPCA amenity deficit increases the probability.

clip(0.50*city_rank(nightlight_log1p_mean_2024) + 0.25*city_rank(poi_context) + 0.15*city_rank(worldpop_density_people_per_km2) - 0.10*slum_residential_signal, 0, 1)
when HLPCA present: clip(0.45*census_hl_asset_affluence_score + 0.25*census_hl_housing_quality_score + 0.20*census_hl_basic_amenity_score + 0.10*(1 - census_hl_amenity_deficit_score), 0, 1)
legacy base/HLPCA affluence blended toward income_gate_context_affluence_score with weight clip(0.20 + 0.32*income_gate_confidence, 0, 0.52)
clip(0.30 + 0.20*nightlight_present + 0.16*census_hl_context_present + 0.18*income_gate_confidence + 0.06*poi_context_positive + 0.05*slum_signal_positive, 0, 0.86)

Guardrail: HLPCA is 2011 district context from a scraped mirror and stays probe-only until official Houselisting validation and MPCE calibration pass.
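The gate blend in step 3 can be illustrated as below. The weight formula is taken from the text; the linear (convex) blend and the final clip to the stated 0.50 to 0.97 range are assumptions made for this sketch, and the function names are hypothetical.

```python
def clip(x, lo, hi):
    """Clamp x into the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def gate_blend_weight(income_gate_confidence):
    """clip(0.40 + 0.34 * income_gate_confidence, 0, 0.74)."""
    return clip(0.40 + 0.34 * income_gate_confidence, 0.0, 0.74)


def income_prob_probe(pre_gate_prob, gate_candidate_prob, income_gate_confidence,
                      gate_prior_available=True):
    """Blend the pre-gate proxy toward the gate candidate when a prior exists."""
    if gate_prior_available:
        w = gate_blend_weight(income_gate_confidence)
        # Assumption: a plain convex combination of the two probabilities.
        blended = (1.0 - w) * pre_gate_prob + w * gate_candidate_prob
    else:
        blended = pre_gate_prob
    # Assumption: the stated unit range acts as a final clip.
    return clip(blended, 0.50, 0.97)


prob = income_prob_probe(0.60, 0.80, income_gate_confidence=0.5)
```

With confidence 0.5 the weight is 0.57, so the result sits slightly closer to the gate candidate than to the pre-gate proxy.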

4. Serviceability probability: serviceable_prob_probe

unit: probability-like share, 0.20 to 0.90

clip(0.35 + 0.55*serviceability_supply_friction_score, 0.20, 0.90)

Intuition: Convert public road, POI, and graph accessibility into a rough serviceability multiplier.

if pmgsy_road_source_available > 0 then 1 - clip(pmgsy_road_nearest_distance_km / 5.0, 0, 1), else 0.40
clip(0.45*road_access_score + 0.35*city_rank(poi_context) + 0.20*city_rank(graph_degree), 0, 1)
clip(0.55*map_coverage_confidence + 0.45*poi_context_present, 0, 0.80)

Guardrail: Internal branch, capacity, partner, cost, and operations coverage are not present, so this remains a probe.
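Step 4 chains three small formulas, sketched below. The sketch assumes the first supporting line defines road_access_score and the second defines serviceability_supply_friction_score; the rank inputs are taken as already computed values in [0, 1].

```python
def clip(x, lo, hi):
    """Clamp x into the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def road_access_score(pmgsy_road_source_available, pmgsy_road_nearest_distance_km):
    """1 - clip(distance / 5 km, 0, 1) when the source exists, else 0.40."""
    if pmgsy_road_source_available > 0:
        return 1.0 - clip(pmgsy_road_nearest_distance_km / 5.0, 0.0, 1.0)
    return 0.40


def supply_friction_score(road_access, poi_context_rank, graph_degree_rank):
    """Weighted mix of road access, POI context rank, and graph-degree rank."""
    return clip(0.45 * road_access + 0.35 * poi_context_rank + 0.20 * graph_degree_rank,
                0.0, 1.0)


def serviceable_prob_probe(friction_score):
    """clip(0.35 + 0.55 * friction_score, 0.20, 0.90)."""
    return clip(0.35 + 0.55 * friction_score, 0.20, 0.90)


access = road_access_score(1, 1.0)                 # road 1 km away
friction = supply_friction_score(access, 0.5, 0.5)
prob = serviceable_prob_probe(friction)
```

Because the friction score is already clipped to [0, 1], the expression 0.35 + 0.55 * score ranges from 0.35 to 0.90, so in this sketch the 0.20 floor never binds.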

5. Current TAM score family: gross_v3, predicted_power, serviceable, acquirable, priority_v3

unit: households for TAM scores; 0-100 for priority

gross_tam_0_10lpa_v3 = h_residential_households_base * income_0_10lpa_prob_probe * scope_share_in_scope_v3
predicted_tam_0_10lpa_power = globally rescaled power(gross_tam_0_10lpa_v3, gamma=0.60), preserving gross TAM total
serviceable_tam_0_10lpa_v3 = gross_tam_0_10lpa_v3 * conversion_feasibility_score
acquirable_tam_0_10lpa_v3 = serviceable_tam_0_10lpa_v3 * execution_readiness_score
priority_score_v3_0_100 = clip(100 * city_rank(acquirable_tam_0_10lpa_v3) * component_confidence_score_v3, 0, 100)

Intuition: The current map score is not a learned vendor-TAM model: it transforms the no-vendor gross v3 TAM surface into a TAM-like score while preserving the base total.

city_rank(x) = pandas groupby(city).rank(pct=True, method='average'), then fill missing with 0.5 and clip to [0, 1]
clip(0.35*denominator_confidence + 0.25*residential_confidence_probe + 0.20*income_proxy_confidence + 0.20*serviceability_confidence, 0, 1)
legacy probe replay remains:
gross_tam_0_10lpa_probe = clip(households_est_primary_probe * income_0_10lpa_prob_probe * residential_confidence_probe, lower=0)
serviceable_tam_0_10lpa_probe = gross_tam_0_10lpa_probe * serviceable_prob_probe
priority_score_0_100 = clip(100 * city_rank(serviceable_tam_0_10lpa_probe) * component_confidence_score, 0, 100)

Guardrail: Vendor TAM and G1 are benchmark-only after formulas are frozen; the power score uses no vendor scaling.
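The city_rank helper and the confidence-weighted priority can be sketched without pandas. This is a standard-library stand-in for the groupby rank described above (average rank for ties, divided by the count of non-missing values in the city), not the pipeline's implementation; treating None as the only missing marker is an assumption.

```python
from collections import defaultdict


def clip(x, lo, hi):
    """Clamp x into the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def city_rank(cities, values):
    """Percentile rank within each city; ties share the average rank; missing -> 0.5."""
    groups = defaultdict(list)
    for i, (city, v) in enumerate(zip(cities, values)):
        if v is not None:
            groups[city].append((v, i))
    ranks = [0.5] * len(values)  # default for missing values
    for members in groups.values():
        members.sort()
        n = len(members)
        start = 0
        while start < n:
            end = start
            while end + 1 < n and members[end + 1][0] == members[start][0]:
                end += 1
            avg_rank = (start + 1 + end + 1) / 2.0  # 1-based average over the tie run
            for _, idx in members[start:end + 1]:
                ranks[idx] = clip(avg_rank / n, 0.0, 1.0)
            start = end + 1
    return ranks


def component_confidence(denominator, residential, income, serviceability):
    """Weighted confidence mix with weights 0.35, 0.25, 0.20, 0.20."""
    return clip(0.35 * denominator + 0.25 * residential
                + 0.20 * income + 0.20 * serviceability, 0.0, 1.0)


def priority_score(rank, confidence):
    """clip(100 * rank * confidence, 0, 100)."""
    return clip(100.0 * rank * confidence, 0.0, 100.0)


ranks = city_rank(["a", "a", "a", "b"], [10.0, 20.0, 20.0, None])
```

Because the rank is computed per city, the top cell of every city gets rank 1.0 regardless of its absolute TAM, so priority scores are comparable within a city rather than across cities.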

numeric replay: grid_3335

Delhi: values below are recomputed from the CSV row, without benchmark labels.

gross TAM: 22324.6 households * 0.674 income * 0.655 residential = 9861.7

serviceable TAM: 9861.7 gross * 0.671 serviceable = 6621.0

confidence: 0.35*0.833 + 0.25*0.655 + 0.20*0.842 + 0.20*0.643 = 0.752

priority: 100 * city_rank(6621.0) * 0.752 = 75.2
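The replay above shows rounded multipliers, so recomputing from the displayed numbers reproduces the results only to within rounding. The sketch below checks that with a relative tolerance; the 0.1% tolerance and the helper name are assumptions, and city_rank is taken as 1.0 because that is what the displayed priority of 75.2 implies.

```python
import math


def matches(recomputed, shown, rel_tol=1e-3):
    """True when the recomputed value agrees with the displayed one within rel_tol."""
    return math.isclose(recomputed, shown, rel_tol=rel_tol)


# Displayed (rounded) inputs for grid_3335.
gross = 22324.6 * 0.674 * 0.655
serviceable = 9861.7 * 0.671
confidence = 0.35 * 0.833 + 0.25 * 0.655 + 0.20 * 0.842 + 0.20 * 0.643
priority = 100 * 1.0 * confidence  # city_rank assumed to be 1.0

checks = {
    "gross": matches(gross, 9861.7),
    "serviceable": matches(serviceable, 6621.0),
    "confidence": matches(confidence, 0.752),
    "priority": matches(priority, 75.2),
}
```

An exact replay needs the unrounded CSV values; the displayed three-decimal multipliers alone leave a gap of a few households on the gross TAM.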

Current v3 math funnel dry run

math funnel

Dry runs now follow the current v3 score family.

The table below recomputes each displayed step from generated artifacts. The power step uses policy global_no_vendor_base_total_preserved; the scale constant is recomputed inside the artifact being scored, so vendor-grid and full-India runs can have different constants.

Vendor TAM and G1 are not inputs to any row in this funnel; they remain benchmark-only diagnostics later in the page.

grid	city	funnel step	dry-run math	computed	artifact	delta	status
`grid_636`	Saharanpur	residential households v3	`source-derived H_residential`	11041.102687	11041.102687	0	pass
`grid_636`	Saharanpur	gross v3 TAM	`11041.10 * income_prob 0.8432`	9309.671076	9309.671076	0	pass
`grid_636`	Saharanpur	serviceable v3 TAM	`9309.67 * conversion 0.8834`	8223.825574	8223.825574	0	pass
`grid_636`	Saharanpur	acquirable v3 TAM	`8223.83 * execution 0.7772`	6391.550980	6391.550980	0	pass
`grid_636`	Saharanpur	priority v3 score	`100 * city_rank 1.0000 * confidence 0.7261`	72.605000	72.605000	0	pass
`grid_636`	Saharanpur	predicted power TAM	`gross_v3^0.60 * artifact_scale 21.8950`	5268.734394	4528.074581	740.66	review
`grid_639`	Saharanpur	residential households v3	`source-derived H_residential`	9879.815405	9879.815405	0	pass
`grid_639`	Saharanpur	gross v3 TAM	`9879.82 * income_prob 0.8423`	8321.867968	8321.867968	0	pass
`grid_639`	Saharanpur	serviceable v3 TAM	`8321.87 * conversion 0.9019`	7505.308004	7505.308004	0	pass
`grid_639`	Saharanpur	acquirable v3 TAM	`7505.31 * execution 0.7800`	5854.140243	5854.140243	0	pass
`grid_639`	Saharanpur	priority v3 score	`100 * city_rank 0.9859 * confidence 0.7261`	71.582394	71.582394	0	pass
`grid_639`	Saharanpur	predicted power TAM	`gross_v3^0.60 * artifact_scale 21.8950`	4925.816125	4197.738645	728.077	review
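A sketch of how a funnel row's delta and status could be derived, using the grid_636 power step from the table. The pass/review rule and its absolute tolerance are assumptions for illustration; the page does not state the threshold the pipeline uses.

```python
def funnel_status(computed, artifact, abs_tol=1e-6):
    """Return the delta and a pass/review status for one funnel row."""
    delta = abs(computed - artifact)
    return {"delta": delta, "status": "pass" if delta <= abs_tol else "review"}


GAMMA = 0.60
ARTIFACT_SCALE = 21.8950          # scale shown in the table, already rounded

gross_v3 = 9309.671076            # grid_636 gross v3 TAM
power_computed = gross_v3 ** GAMMA * ARTIFACT_SCALE

power_row = funnel_status(power_computed, artifact=4528.074581)
gross_row = funnel_status(9309.671076, artifact=9309.671076)
```

Here the gross row passes with zero delta, while the power row lands in review because the value recomputed with the displayed scale differs from the written artifact by roughly 740 households.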

Formula replay dry run

no-write replay

This recomputes final probe arithmetic from the generated CSV values without rerunning feature builders or reading benchmark labels. A pass means the HTML-visible math matches the written scoring columns.

grid	city	output	recomputed	written	status
`grid_3335`	Delhi	`gross_tam_0_10lpa_probe`	9861.666591	9861.666591	pass
`grid_3335`	Delhi	`serviceable_tam_0_10lpa_probe`	6621.000371	6621.000371	pass
`grid_3335`	Delhi	`component_confidence_score`	0.752394	0.752394	pass
`grid_3335`	Delhi	`priority_score_0_100`	75.239389	75.239389	pass
`grid_3333`	Delhi	`gross_tam_0_10lpa_probe`	8933.315395	8933.315395	pass
`grid_3333`	Delhi	`serviceable_tam_0_10lpa_probe`	6290.567568	6290.567568	pass
`grid_3333`	Delhi	`component_confidence_score`	0.748927	0.748927	pass
`grid_3333`	Delhi	`priority_score_0_100`	74.845875	74.845875	pass
`grid_3291`	Delhi	`gross_tam_0_10lpa_probe`	9592.349986	9592.349986	pass
`grid_3291`	Delhi	`serviceable_tam_0_10lpa_probe`	6070.121154	6070.121154	pass
`grid_3291`	Delhi	`component_confidence_score`	0.747012	0.747012	pass
`grid_3291`	Delhi	`priority_score_0_100`	74.607896	74.607896	pass
`grid_3328`	Delhi	`gross_tam_0_10lpa_probe`	8165.752869	8165.752869	pass
`grid_3328`	Delhi	`serviceable_tam_0_10lpa_probe`	5605.253802	5605.253802	pass
`grid_3328`	Delhi	`component_confidence_score`	0.749787	0.749787	pass
`grid_3328`	Delhi	`priority_score_0_100`	74.838171	74.838171	pass
`grid_1176`	Mumbai	`gross_tam_0_10lpa_probe`	7858.588816	7858.588816	pass
`grid_1176`	Mumbai	`serviceable_tam_0_10lpa_probe`	5571.061529	5571.061529	pass
`grid_1176`	Mumbai	`component_confidence_score`	0.770855	0.770855	pass
`grid_1176`	Mumbai	`priority_score_0_100`	77.085534	77.085534	pass

V. Building footprints (pp. 16-18)

Building Footprints

The HTML now reflects the newer Microsoft and combined building-density code path, while making the current artifact coverage explicit.

code path

Microsoft is wired in.

Stage 1 can select AOI shards and Stage 2 has Microsoft and combined building fields.

↔

artifact state

Microsoft coverage is active.

The current Microsoft manifest contributes footprint coverage to the combined signal.

source	status	coverage
Google Open Buildings	ok	9.3%
Microsoft Global Buildings	ok	91.6%
Combined building signal	ok	95.8%

Building-related columns

gobi_building_count_cell, gobi_building_area_sum_m2, gobi_building_area_share, gobi_building_area_density_km2, gobi_building_area_mean_m2, gobi_building_confidence_mean, msft_building_count_cell, msft_building_area_sum_m2, msft_building_area_share, msft_building_area_density_km2, msft_building_area_mean_m2, msft_building_height_mean_m, building_count_cell_best, building_area_sum_m2_best, building_count_density_per_km2_best, building_area_density_km2_best, building_area_share_best, building_source_coverage_count, building_residential_density_score, building_source_coverage_flag, building_population_per_building, building_footprint_area_per_person_m2, building_msft_gobi_disagreement_flag, building_height_mean_m, building_floor_count_proxy, building_volume_proxy_m3, building_vertical_density_proxy, building_residential_volume_proxy_m3, building_compactness_score, building_cluster_compactness

Microsoft AOI staging selected 24 shards with an expected size of 1617.8 MB; the run status is ok. Current combined building coverage reflects the available Google and Microsoft footprint signals.

VI. Predicted TAM layer (pp. 19-23)

Predicted TAM Layer

The newer pipeline promotes one explicit score column for maps and summaries: predicted_tam_0_10lpa_power. It is a deterministic transform of the no-vendor gross TAM base, not a learned vendor-TAM model.

current score

Magnitude stays count-like.

The transform reshapes the distribution to improve benchmark-grid Pearson agreement while preserving Spearman order and preserving the source-derived gross-TAM total.

The selected score is used consistently by the map, full-India scorer, notebook metric summary, and post-hoc G1 suite.
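One way to read a total-preserving power transform is sketched below: raise each base value to the exponent, then rescale by the ratio of the base total to the powered total. The exponent 0.60 comes from this page; the rescaling formula, the function name, and the toy values are assumptions, not the pipeline's code.

```python
def power_transform_preserving_total(base, gamma=0.60):
    """Return (scores, scale) with sum(scores) equal to sum(base).

    Assumes base values are non-negative.
    """
    powered = [v ** gamma for v in base]
    total_powered = sum(powered)
    if total_powered == 0:
        return [0.0 for _ in base], 0.0
    scale = sum(base) / total_powered
    return [p * scale for p in powered], scale


# Toy gross-TAM values with one very large cell.
base = [0.0, 1.0, 16.0, 81.0, 10000.0]
scores, scale = power_transform_preserving_total(base)

# A monotone transform keeps the ranking, so Spearman order is unchanged.
same_order = (sorted(range(len(base)), key=lambda i: base[i])
              == sorted(range(len(scores)), key=lambda i: scores[i]))
```

With an exponent below 1 the largest cells shrink and small cells grow while the total stays fixed, which is the kind of reshaping that can raise Pearson agreement without touching rank order.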

Output column: predicted_tam_0_10lpa_power

Selected score used by map, full-India score index, notebook summary, and post-hoc suite.

Base column: gross_tam_0_10lpa_v3

Source-derived gross TAM base. Vendor TAM is not used for scaling.

Gamma: 0.60

Fixed monotone power exponent selected by Cell-2 transform diagnostics.

Scale policy: global_no_vendor_base_total_preserved

Power weights are rescaled to the no-vendor base total, not to a vendor mean.

Vendor Pearson: 0.788

Benchmark-grid sanity metric only.

G1 rank ceiling R2: 0.558

Order-only ceiling diagnostic; not TAM magnitude.

Cell-2 transform decision

field	value	status
primary_map_mechanism	power_060_scaled_gross_tam_v2	current
rank_ceiling_mechanism	rank_pct_for_order_diagnostic_only	current
rejected_primary_mechanism	vendor_mean_scaled_rank_pct	current
map_predicted_vs_vendor_pearson	0.7881785302692829	current
map_predicted_vs_vendor_spearman	0.8349283824789576	current
map_predicted_vs_g1_spearman	0.7469475181022232	current
top10_predicted_power_g1_capture	0.46703212689927004	current
top10_vendor_tam_g1_capture	0.42434916553876123	current
city_holdout_invalidates_full_caught_up_claim	True	current

Headline power checks

check	value	note
`predicted_vs_vendor_pearson`	0.788179	headline check
`predicted_vs_vendor_spearman`	0.834928	headline check
`rank_pct_transform_ceiling_r2`	0.557931	headline check
`full_india_scored_cells`	2,905,288	headline check
`vendor_comparison_cells`	7,029	headline check

Holdout transform comparison

group	mechanism	groups	metric 2 R2 median	metric 3 R2 median	caught-up median
city	`identity`	29	0.359	0.400	0.969
city	`log1p`	29	0.614	0.592	1.041
city	`power_060`	29	0.648	0.654	0.992
city	`rank_pct`	29	0.649	0.659	1.005
city	`sqrt1p`	29	0.531	0.538	0.953
city	`yeo_johnson`	29	0.648	0.654	0.992
spatial_block	`identity`	11	0.310	0.321	0.998
spatial_block	`log1p`	11	0.543	0.454	1.097
spatial_block	`power_060`	11	0.584	0.549	1.031
spatial_block	`rank_pct`	11	0.583	0.541	1.040
spatial_block	`sqrt1p`	11	0.430	0.415	1.071
spatial_block	`yeo_johnson`	11	0.584	0.549	1.031

Top-k G1 capture from transform report

score	top fraction	grid count	G1 capture	lift
`predicted_power_tam`	5.0%	352	23.0%	4.60
`vendor_tam`	5.0%	352	24.3%	4.86
`predicted_power_tam`	10.0%	703	46.7%	4.67
`vendor_tam`	10.0%	703	42.4%	4.24
`predicted_power_tam`	20.0%	1,406	71.2%	3.56
`vendor_tam`	20.0%	1,406	67.5%	3.38
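The capture and lift columns can be reproduced with a short routine, sketched below under assumptions: the top-k size is taken as the ceiling of fraction times cell count (which is consistent with the grid counts 352, 703, and 1,406 for 7,029 comparison cells), ties keep input order, and lift is capture divided by the top fraction. The function name and toy data are hypothetical.

```python
import math


def top_k_capture(scores, hits, top_fraction):
    """Share of total hits that fall in the top-scoring fraction of cells."""
    n = len(scores)
    k = math.ceil(top_fraction * n)
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    captured = sum(hits[i] for i in order[:k])
    total = sum(hits)
    capture = captured / total if total else 0.0
    return {"grid_count": k, "capture": capture, "lift": capture / top_fraction}


# Toy example: 10 cells, hits concentrated in the two highest-scoring cells.
scores = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
hits = [5, 3, 0, 0, 2, 0, 0, 0, 0, 0]
result = top_k_capture(scores, hits, top_fraction=0.20)
```

A lift of 1.0 would mean the score ranks cells no better than chance; the table's lifts between 3.4 and 4.9 can be read on that scale.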

The transform report also sets city_holdout_invalidates_full_caught_up_claim to True, meaning the city holdouts do not support a full caught-up claim. This section documents the chosen display/scoring layer; it does not authorize production accuracy claims.

VII. Full-India map (pp. 24-29)

Full-India Map & Scores

The new full-India path separates geometry indexing from scoring. Geometry is built from the 0.01-degree India grid; scoring then writes a source-derived score CSV, compact score index, and manifest.

03 - national score and map artifacts

Scored cells: 2,905,288

Every indexed 0.01-degree India cell scored in the current manifest.

Grid step: 0.01

Degrees per cell; about 1.1 km north-south.

District/city groups: 631

Groups from the district/city context join.

Candidate cells: 8,868,492

Boundary grid candidates before India-boundary filtering.

Row x col index: 3,033 x 2,924

Compact row-major map index dimensions.

Predicted total: 118,559,058

No-vendor gross-TAM total preserved after the power transform.
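The index dimensions and candidate count above are consistent with a row-major layout, which the sketch below illustrates. The helper names are hypothetical, and the row/column origin and orientation of the real index are not stated on this page.

```python
N_ROWS, N_COLS = 3033, 2924      # row x col index dimensions from the cards above
SCORED_CELLS = 2_905_288         # cells kept after India-boundary filtering


def flat_index(row, col, n_cols=N_COLS):
    """Row-major position of (row, col) in a flat array."""
    return row * n_cols + col


def unflatten(index, n_cols=N_COLS):
    """Inverse of flat_index: returns (row, col)."""
    return divmod(index, n_cols)


candidate_cells = N_ROWS * N_COLS
kept_share = SCORED_CELLS / candidate_cells
```

The product of the two dimensions equals the 8,868,492 candidate cells, and about a third of those candidates survive the boundary filter.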

Full-India score distributions

score column	non-null cells	total	p50	p90	max	meaning
`predicted_tam_0_10lpa_power`	2,905,288	118,559,058	30.56	80.87	2741.69	probe distribution
`gross_tam_0_10lpa_v3`	2,905,288	118,559,058	16.20	81.99	29119.34	probe distribution
`serviceable_tam_0_10lpa_v3`	2,905,288	48,898,077	6.22	32.07	14441.63	probe distribution
`acquirable_tam_0_10lpa_v3`	2,905,288	8,153,552	1.04	5.35	2407.43	probe distribution
`priority_score_v3_0_100`	2,905,288	86,033,814.6	28.87	55.03	68.68	probe distribution
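The first two rows above share the same total (118,559,058) while the power-transformed column has a higher median and a much lower maximum. That pattern is what a monotone power transform with an exponent below 1, rescaled to the base total, would produce. The sketch below shows the idea under stated assumptions: the function name is hypothetical, and the exponent is a parameter because the actual value lives in the transform decision artifact.

```python
def power_transform_preserve_total(base, gamma):
    """Raise each non-negative base value to `gamma`, then rescale to the original sum.

    Hypothetical helper: `gamma` must come from the transform decision artifact.
    """
    if any(v < 0 for v in base):
        raise ValueError("base values must be non-negative")
    powered = [v ** gamma for v in base]
    powered_total = sum(powered)
    if powered_total == 0:
        # All-zero input: nothing to redistribute.
        return [0.0 for _ in base]
    # A single global scale keeps the ordering and restores the base total.
    scale = sum(base) / powered_total
    return [p * scale for p in powered]
```

For example, `power_transform_preserve_total([0, 1, 16, 81], 0.5)` compresses the largest value and lifts the smaller ones while the sum stays at 98. Because the rescaling is a single constant, cell rankings are unchanged.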

Full-India source status

source family	current status detail	mode
`grid_index`	{"cell_count":2905288}	source-derived
`district_context`	{"district_join_share":1.0}	source-derived
`worldpop_population`	{"coverage_share":1.0,"median_nearest_distance_km":0.3567097458094222,"p95_nearest_distance_km":0.5752521808981806,"source_rows":4010402}	source-derived
`poi_context`	{"education_source_points":19502,"police_source_points":16459,"airport_source_points":2710,"energy_source_points":534}	source-derived
`formula_inputs`	{"building_footprints":"not_rebuilt_for_direct_full_india_scoring_zero_coverage_flags_used","pmgsy_roads":"not_rebuilt_for_direct_full_india_scoring_missing_state_fallback_used","graph_degree":"regular_grid_degree_8_proxy_for_full_india_direct_scoring"}	source-derived

Score pass/fail checks

check	value
all_grid_cells_scored	True
grid_step_is_0_01_degree	True
vendor_tam_used_as_feature	False
score_index_written	True

Geometry-index pass/fail checks

check	value
grid_index_written	True
all_grid_cells_indexed	True
grid_step_is_0_01_degree	True
vendor_tam_used_as_production_feature	False
coarse_0_05_grid_published	False

Full-India scores are deterministic probe estimates. The manifest explicitly records vendor_tam_available_for_full_india as False and calibration status as probe_not_production_calibrated.

VIII. Diagnostics gate, pp. 30-34

Diagnostics Gate

Stage 3 runs current benchmark diagnostics and claim-boundary checks, but it does not train, tune, calibrate, or fit against vendor TAM or G1.

component agreement

Benchmark-only rows join.

Probe, v2, and v3 component columns are compared to vendor TAM for audit and sanity checks. These rows are not the current map-score metric.

claim boundary

No production model metric.

The current diagnostics explicitly block accuracy claims until valid holdouts exist. The headline map score remains predicted_tam_0_10lpa_power, reported in the Predicted TAM and Notebook sections.

Component probe/v2 vs vendor checks

This table intentionally includes legacy probe rows such as gross_tam_0_10lpa_probe. Treat it as component QA; the current map-score Pearson/Spearman values are reported separately for predicted_tam_0_10lpa_power.

candidate	n	Pearson r	Spearman r	WMAPE	Top-10 overlap	validity
`gross_tam_0_10lpa_probe`	7,029	0.731	0.797	0.558	0.560	benchmark only
`serviceable_tam_0_10lpa_probe`	7,029	0.735	0.805	0.646	0.555	benchmark only
`priority_score_0_100`	7,029	0.607	0.749	0.978	0.408	benchmark only
`gross_tam_0_10lpa_v2`	7,029	0.763	0.837	0.546	0.620	benchmark only
`serviceable_tam_0_10lpa_v2`	7,029	0.733	0.845	0.683	0.582	benchmark only
`acquirable_tam_0_10lpa_v2`	7,029	0.652	0.828	0.804	0.512	benchmark only
`priority_score_v2_0_100`	7,029	0.613	0.778	0.978	0.428	benchmark only
`gross_tam_0_10lpa_v3`	7,029	0.764	0.842	0.499	0.616	benchmark only
`serviceable_tam_0_10lpa_v3`	7,029	0.737	0.848	0.607	0.587	benchmark only
`acquirable_tam_0_10lpa_v3`	7,029	0.666	0.834	0.749	0.519	benchmark only
`priority_score_v3_0_100`	7,029	0.619	0.783	0.978	0.448	benchmark only
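The document does not define the WMAPE and overlap columns, so the sketch below uses common definitions as an assumption: WMAPE as total absolute error divided by total absolute benchmark value, and overlap as the share of the benchmark's top slice that also appears in the candidate's top slice. Function names and the 10% default are hypothetical.

```python
def wmape(predicted, actual):
    """Weighted MAPE: sum of absolute errors over sum of absolute benchmark values."""
    denom = sum(abs(a) for a in actual)
    if denom == 0:
        raise ValueError("benchmark values sum to zero; WMAPE is undefined")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / denom

def top_fraction_overlap(predicted, actual, fraction=0.10):
    """Share of the benchmark's top slice that the candidate also ranks in its top slice."""
    n = len(actual)
    k = max(1, round(n * fraction))  # slice size; rounding rule is an assumption
    top_pred = set(sorted(range(n), key=lambda i: predicted[i], reverse=True)[:k])
    top_act = set(sorted(range(n), key=lambda i: actual[i], reverse=True)[:k])
    return len(top_pred & top_act) / k
```

Both functions need benchmark labels as input, so they belong strictly on the diagnostic side of the pipeline and should never run during feature construction.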

G1 post-hoc holdout

candidate	kind	n	G1 hits	Spearman r	Top-10% G1 capture	validity
`geoiq_vendor_tam_benchmark`	benchmark_label	7,029	176,581	0.709	42.4%	benchmark_label_posthoc_holdout

IX. Notebook metrics, pp. 35-38

Notebook & Post-Hoc Metrics

The notebook-facing summaries were updated to use the predicted power TAM column. They are useful for reporting, but remain post-hoc diagnostics because they join to vendor TAM and G1 after score generation.

reporting path

Same score, multiple views.

The short summary, city-wise table, and post-hoc suite all use predicted_tam_0_10lpa_power.

These outputs describe agreement and ranking behavior; they do not feed back into feature construction or transform scaling.

Notebook short metric summary

order	metric	n	Pearson r	Pearson R2	Spearman R2	caught-up pct
1	`predicted_tam_vs_vendor_tam`	7,029	0.790	0.623	0.708	n/a
2	`predicted_tam_vs_g1_hits`	7,029	0.435	0.189	0.552	n/a
3	`vendor_tam_vs_g1_hits`	7,029	0.406	0.165	0.502	n/a
4	`metric_2_divided_by_metric_3_caught_up`	7,029	1.072	1.149	1.100	114.9%

Post-hoc G1 overall suite

metric	n	Pearson r	Pearson R2	Spearman r	Spearman R2	log1p R2
`predicted_tam_vs_vendor_tam`	7,029	0.790	0.623	0.842	0.708	0.566
`predicted_tam_vs_g1_hits`	7,029	0.435	0.189	0.743	0.552	0.492
`vendor_tam_vs_g1_hits`	7,029	0.406	0.165	0.709	0.502	0.376
`metric_2_divided_by_metric_3_caught_up`	7,029	1.072	1.149	1.049	1.100	1.307

Post-hoc top-k suite

score	top fraction	grid count	G1 capture	lift	NDCG
`predicted_tam`	5.0%	351	23.6%	4.73	0.308
`predicted_tam`	10.0%	703	46.4%	4.64	0.455
`predicted_tam`	20.0%	1,406	70.1%	3.51	0.575
`vendor_tam`	5.0%	351	24.3%	4.86	0.314
`vendor_tam`	10.0%	703	42.4%	4.24	0.421
`vendor_tam`	20.0%	1,406	67.5%	3.38	0.554

Caught-up ratios can exceed 100% because they are ratios of post-hoc correlations or R-squared values, not calibrated accuracy. Treat them as comparison diagnostics, not proof that the solution has matched or surpassed vendor TAM.
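The arithmetic behind the caught-up percentage can be replayed from the rounded Pearson values in the tables above. The helper name below is hypothetical. Using the rounded inputs 0.435 and 0.406 gives roughly 114.8%; the reported 114.9% presumably comes from unrounded correlations.

```python
def caught_up_ratio(r_predicted_vs_g1, r_vendor_vs_g1):
    """Ratio of squared correlations: predicted-vs-G1 R2 over vendor-vs-G1 R2."""
    r2_predicted = r_predicted_vs_g1 ** 2
    r2_vendor = r_vendor_vs_g1 ** 2
    if r2_vendor == 0:
        raise ValueError("vendor-vs-G1 R2 is zero; the ratio is undefined")
    return r2_predicted / r2_vendor

# Rounded Pearson r values from the notebook short metric summary.
ratio = caught_up_ratio(0.435, 0.406)
print(f"caught-up ratio: {ratio:.1%}")
```

The ratio has no upper bound: any score that correlates with G1 slightly better than vendor TAM does will exceed 100%, which is why it cannot stand in for calibrated accuracy.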

X. Leakage policy, pp. 39-42

Leakage Policy

The strongest part of the current pipeline is the explicit claim boundary around forbidden labels and invalidated metrics.

flag	value
vendor_tam_used_as_feature	False
vendor_tam_used_as_benchmark_label	True
vendor_tam_used_as_training_label	False
vendor_tam_trained_diagnostics_excluded	True
g1_used_for_training	False
g1_used_as_feature	False
g1_used_for_source_selection	False
random_cv_allowed_for_accuracy_claim	False
production_accuracy_claim_allowed	False

check	value
spatial_holdout_metrics_present	False
city_holdout_metrics_present	False
g1_holdout_diagnostics_present	True
redundancy_review_required	True
multicollinearity_review_required	True
city_confounding_review_required	False
prediction_calibration_review_required	True
production_accuracy_claim_allowed	False

interpretation

Vendor TAM appears as a benchmark label, not as a feature or training label. G1 appears only after the fact. Random CV is not allowed for production accuracy claims.
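A policy like this can be backed by a mechanical check before any feature table is accepted. The sketch below is an assumption-heavy illustration: the forbidden name tokens, both function names, and the gating rule are hypothetical and are not taken from the pipeline code. The gate uses only the two holdout flags shown in the table above; the chronological requirement mentioned elsewhere in this document has no flag in that table and is left out.

```python
FORBIDDEN_TOKENS = ("vendor_tam", "g1_")  # hypothetical naming convention for label columns

def leaked_columns(feature_columns, forbidden_tokens=FORBIDDEN_TOKENS):
    """Return feature column names that look like benchmark or outcome labels."""
    return [
        name for name in feature_columns
        if any(token in name.lower() for token in forbidden_tokens)
    ]

def production_claim_allowed(checks):
    """Assumed rule: block the claim unless spatial and city holdout metrics both exist."""
    return bool(checks.get("spatial_holdout_metrics_present")) and bool(
        checks.get("city_holdout_metrics_present")
    )
```

With the current flag values (both holdout checks False), `production_claim_allowed` returns False, which agrees with the table.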

XI. Generated outputs, pp. 43-48

Generated Outputs

The updated HTML is an audit reader over the files the current pipeline writes, grouped by the stage that owns them.

Generated outputs by stage

stage	output	path	status	current meaning
get_data	`source_fetch_manifest_json`	`outputs/source_fetch/source_fetch_manifest.json`	present json / 78.8 KB	Source registry fetch status, local file hashes, access notes, and next actions.
get_data	`source_fetch_manifest_csv`	`outputs/source_fetch/source_fetch_manifest.csv`	present 49 rows	Tabular source-fetch ledger for review.
get_data	`direct_prior_art_download_manifest_json`	`outputs/source_fetch/direct_prior_art_download_manifest.json`	present json / 26.8 KB	Direct prior-art payload plan and availability record.
get_data	`direct_prior_art_download_manifest_csv`	`outputs/source_fetch/direct_prior_art_download_manifest.csv`	present 26 rows	Tabular direct-prior-art payload ledger.
get_data	`microsoft_buildings_aoi_manifest_json`	`outputs/source_fetch/microsoft_buildings_aoi_manifest.json`	present json / 1.1 KB	AOI shard selection and Microsoft building-footprint staging status.
get_data	`microsoft_buildings_aoi_manifest_csv`	`outputs/source_fetch/microsoft_buildings_aoi_manifest.csv`	present 24 rows	Tabular Microsoft AOI shard/staging record.
get_data	`ghsl_builtup_tiles_manifest_json`	`outputs/source_fetch/ghsl_builtup_2020_4326_30ss_tiles_manifest.json`	present json / 19.0 KB	GHSL 2020 built surface/volume AOI tile staging manifest.
get_data	`ghsl_builtup_tiles_manifest_csv`	`outputs/source_fetch/ghsl_builtup_2020_4326_30ss_tiles_manifest.csv`	present 24 rows	Tabular GHSL built surface/volume tile ledger.
get_data	`income_gate_source_fetch_manifest_json`	`outputs/source_fetch/income_gate/income_gate_source_fetch_manifest.json`	present json / 30.0 KB	Income-gate public-source fetch manifest and access ledger.
get_data	`income_gate_source_fetch_manifest_csv`	`outputs/source_fetch/income_gate/income_gate_source_fetch_manifest.csv`	present 19 rows	Tabular income-gate public-source fetch ledger.
source_layer	`source_layer_features_manifest`	`outputs/source_layers/source_layer_cell_features_manifest.json`	present json / 8.8 KB	Manifest for source-layer cell-feature materialization.
source_layer	`source_layer_contracts_json`	`outputs/source_layers/source_layer_contracts.json`	present json / 17.2 KB	Source-layer contracts, expected fields, source families, and promotion gates.
source_layer	`source_layer_contracts_csv`	`outputs/source_layers/source_layer_contracts.csv`	present 9 rows	Tabular source-layer contract ledger.
source_layer	`worldpop_source_layer_features`	`outputs/source_layers/worldpop_population_density/2020_ascii_xyz_local/cell_features.csv`	present 7,029 rows	WorldPop population-density source-layer cell features.
source_layer	`census_controls_source_layer_features`	`outputs/source_layers/census_2011_official_controls/2011_fixed_release/cell_features.csv`	present 7,029 rows	Census 2011 official-control source-layer cell features.
source_layer	`building_source_layer_features`	`outputs/source_layers/building_footprints_google_microsoft/google_2023_msft_staged_snapshot/cell_features.csv`	present 7,029 rows	Google/Microsoft building footprint, height, volume, and disagreement source-layer features.
source_layer	`ghsl_source_layer_features`	`outputs/source_layers/ghsl_population_builtup/named_release_required/cell_features.csv`	present 7,029 rows	GHSL population, built surface, built volume, height, settlement, and residential-candidate source-layer features.
source_layer	`landcover_source_layer_features`	`outputs/source_layers/esa_worldcover_dynamic_world/worldcover_2020_2021_dynamic_world_pinned_window_required/cell_features.csv`	present 7,029 rows	ESA/Dynamic World landcover proxy, physical exclusion, and residential eligibility source-layer features.
source_layer	`morphology_source_layer_features`	`outputs/source_layers/residential_morphology_tags/building_landcover_public_proxy_v1/cell_features.csv`	present 7,029 rows	Residential morphology proxy source-layer features.
source_layer	`income_public_source_layer_features`	`outputs/source_layers/income_public_context/income_gate_public_context_v2/cell_features.csv`	present 7,029 rows	Income-gate public welfare and affluence context source-layer features.
source_layer	`osm_overture_source_layer_features`	`outputs/source_layers/osm_overture_ohsome_roads_pois/dated_extract_required/cell_features.csv`	present 7,029 rows	Road, addressability, settlement, mapping coverage, and conversion source-layer features.
source_layer	`connectivity_source_layer_features`	`outputs/source_layers/connectivity_execution_readiness/dated_opencellid_ookla_mlab_required/cell_features.csv`	present 7,029 rows	Connectivity and execution-readiness source-layer features.
denominator	`cell_denominator_foundation`	`outputs/denominator_foundation/cell_denominator_foundation.csv`	present 7,029 rows	Current cell-level denominator controls.
denominator	`city_summary`	`outputs/denominator_foundation/city_denominator_foundation_summary.csv`	present 34 rows	City-level rollup for the stage that wrote it.
denominator	`manifest`	`outputs/denominator_foundation/denominator_foundation_manifest.json`	present json / 6.9 KB	Manifest for the denominator-foundation stage.
geohg	`cell_features`	`outputs/geohg_features/cell_features_geohg_style.csv`	present 7,029 rows	Primary independent feature table used by downstream diagnostics.
geohg	`cell_labels`	`outputs/geohg_features/cell_labels_vendor_tam.csv`	present 7,029 rows	Vendor TAM benchmark labels only; not a production training target.
geohg	`area_area_edges`	`outputs/geohg_features/area_area_edges.csv`	present 24,203 rows	GeoHG-style area graph edges.
geohg	`entity_area_edges`	`outputs/geohg_features/entity_area_edges.csv`	present 21,381 rows	Semantic entity-to-area edges.
geohg	`poi_area_edges`	`outputs/geohg_features/poi_area_edges.csv`	present 5,668 rows	POI/serviceability context edges.
geohg	`spatial_block_predictions`	`outputs/geohg_features/geohg_spatial_block_predictions.csv`	present 7,029 rows	Diagnostic spatial-block prediction artifact.
geohg	`metrics`	`outputs/geohg_features/geohg_feature_metrics.json`	present json / 1.4 KB	Feature bundle metrics and graph counts.
gap_closure	`features`	`outputs/tam_gap_closure/tam_gap_closure_features.csv`	present 7,029 rows	Deterministic TAM gap-closure probe features.
gap_closure	`city_summary`	`outputs/tam_gap_closure/tam_gap_closure_city_summary.csv`	present 34 rows	City-level rollup for the stage that wrote it.
gap_closure	`benchmark_by_city`	`outputs/tam_gap_closure/tam_gap_closure_benchmark_by_city.csv`	present 374 rows	City-level probe-vs-vendor benchmark breakdown.
gap_closure	`benchmark_metrics`	`outputs/tam_gap_closure/tam_gap_closure_benchmark_metrics.json`	present json / 2.8 KB	Overall probe-vs-vendor benchmark metrics.
gap_closure	`source_probe_csv`	`outputs/source_registry/source_probe_summary.csv`	present 21 rows	Source-probe readiness ledger used for production gating.
gap_closure	`source_probe_json`	`outputs/source_registry/source_probe_summary.json`	present json / 15.5 KB	JSON copy of source-probe readiness ledger.
transform	`cell2_decision`	`outputs/cell2_metric_transform_experiments/cell2_best_mechanism_decision.json`	present json / 1.5 KB	Selected transform policy and rejected alternatives.
transform	`cell2_analysis_html`	`outputs/cell2_metric_transform_experiments/cell2_best_mechanism_analysis.html`	present html / 8.4 KB	Standalone Cell-2 transform analysis report.
transform	`cell2_holdout_summary`	`outputs/cell2_metric_transform_experiments/cell2_best_mechanism_holdout_summary.csv`	present 12 rows	City and spatial-block transform holdout summary.
transform	`cell2_holdouts`	`outputs/cell2_metric_transform_experiments/cell2_best_mechanism_holdouts.csv`	present 240 rows	Detailed transform holdout rows.
transform	`cell2_topk`	`outputs/cell2_metric_transform_experiments/cell2_best_mechanism_topk.csv`	present 6 rows	Top-k G1 capture comparison for predicted power TAM and vendor TAM.
transform	`cell2_all_grid_ranking`	`outputs/cell2_metric_transform_experiments/cell2_best_mechanism_all_grid_ranking.csv`	present 6 rows	All-grid ranking evidence for transform diagnostics.
transform	`main_solution_power_summary_json`	`outputs/main_solution_power_tam_summary.json`	present json / 373 B	Compact headline checks for the power-transformed solution.
transform	`main_solution_power_summary_csv`	`outputs/main_solution_power_tam_summary.csv`	present 5 rows	CSV copy of the headline power-solution checks.
full_india	`full_india_scores_csv`	`outputs/full_india_scored/full_india_tam_scores.csv`	present 2,905,288 rows	Every full-India 0.01-degree cell scored with deterministic source-derived probes.
full_india	`full_india_score_index`	`outputs/full_india_scored/full_india_tam_score_index.json`	present json / 43.5 MB	Compact row-major score index for map hover and browser payload efficiency.
full_india	`full_india_score_manifest`	`outputs/full_india_scored/full_india_tam_score_manifest.json`	present json / 11.7 KB	Full-India scoring manifest, leakage policy, transform policy, and distribution metrics.
map	`full_india_grid_manifest`	`outputs/tam_map/tam_full_india_0_01_grid_manifest.json`	present json / 1.2 KB	Geometry-only full-India 0.01-degree grid manifest.
map	`full_india_grid_index`	`outputs/tam_map/tam_full_india_0_01_grid_index.json`	present json / 111.0 KB	Compact geometry index for all India grid cells.
map	`full_india_map_manifest`	`outputs/tam_map/tam_grid_map_manifest.json`	present json / 1.2 KB	Leaflet map manifest for full-India geometry and score layer wiring.
map	`vendor_map_manifest`	`outputs/tam_map/tam_grid_map_manifest_vendor.json`	present json / 7.5 KB	Leaflet map manifest for vendor-grid benchmark inspection.
map	`vendor_map_geojson`	`outputs/tam_map/tam_grid_cells_vendor.geojson`	present geojson / 21.5 MB	Vendor-grid GeoJSON with benchmark and predicted-layer fields.
map	`map_html`	`docs/tam_grid_output_map.html`	present html / 65.0 KB	Interactive Leaflet map document.
diagnostics	`summary`	`outputs/statistical_diagnostics/statistical_diagnostics_summary.json`	present json / 9.0 KB	Stage summary, pass/fail flags, leakage policy, and interpretation notes.
diagnostics	`feature_profile`	`outputs/statistical_diagnostics/feature_profile.csv`	present 223 rows	Feature distribution profile for diagnostics.
diagnostics	`univariate_feature_signal`	`outputs/statistical_diagnostics/univariate_feature_signal.csv`	present 223 rows	Single-feature signal scan against benchmark/probe targets.
diagnostics	`city_distribution_shift`	`outputs/statistical_diagnostics/city_distribution_shift.csv`	present 224 rows	City distribution shift checks.
diagnostics	`feature_redundancy_pairs`	`outputs/statistical_diagnostics/feature_redundancy_pairs.csv`	present 160 rows	Highly related feature-pair review list.
diagnostics	`feature_vif`	`outputs/statistical_diagnostics/feature_vif.csv`	present 30 rows	Multicollinearity review table.
diagnostics	`spatial_permutation_importance`	`outputs/statistical_diagnostics/spatial_permutation_importance.csv`	present 1 row	Spatial permutation diagnostic output.
diagnostics	`feature_action_plan`	`outputs/statistical_diagnostics/feature_action_plan.csv`	present 223 rows	Feature diagnostic next-action list.
diagnostics	`model_metric_comparison`	`outputs/statistical_diagnostics/model_metric_comparison.csv`	present 17 rows	Model metric table retained as diagnostic only.
diagnostics	`prediction_metric_comparison`	`outputs/statistical_diagnostics/prediction_metric_comparison.csv`	present 1 row	Prediction/probe metric comparison table.
diagnostics	`prediction_residuals_by_city`	`outputs/statistical_diagnostics/prediction_residuals_by_city.csv`	present 0 rows	City residual review table.
diagnostics	`prediction_decile_calibration`	`outputs/statistical_diagnostics/prediction_decile_calibration.csv`	present 0 rows	Decile calibration diagnostic table.
diagnostics	`prediction_city_bootstrap_ci`	`outputs/statistical_diagnostics/prediction_city_bootstrap_ci.csv`	present 0 rows	City bootstrap confidence interval table.
diagnostics	`g1_holdout_candidate_diagnostics`	`outputs/statistical_diagnostics/g1_holdout_candidate_diagnostics.csv`	present 2 rows	Post-hoc G1 diagnostic table; not used for tuning.
notebook	`notebook_short_metric_summary_csv`	`outputs/notebook_short_metric_summary.csv`	present 4 rows	Notebook-facing four-row metric summary using predicted power TAM.
notebook	`notebook_short_metric_summary_json`	`outputs/notebook_short_metric_summary.json`	present json / 1.2 KB	JSON copy of the notebook-facing metric summary.
notebook	`notebook_short_metric_summary_citywise_csv`	`outputs/notebook_short_metric_summary_citywise.csv`	present 136 rows	City-wise metric summary for notebook review.
notebook	`notebook_short_metric_summary_citywise_json`	`outputs/notebook_short_metric_summary_citywise.json`	present json / 46.4 KB	JSON copy of the city-wise metric summary.
notebook	`posthoc_g1_summary`	`outputs/posthoc_g1_metric_suite/summary.json`	present json / 28.8 KB	Structured post-hoc G1 metric suite summary.
notebook	`posthoc_g1_overall`	`outputs/posthoc_g1_metric_suite/overall.csv`	present 4 rows	Overall post-hoc G1 correlations.
notebook	`posthoc_g1_topk`	`outputs/posthoc_g1_metric_suite/topk.csv`	present 6 rows	Top-k post-hoc G1 capture table.
notebook	`posthoc_g1_decile`	`outputs/posthoc_g1_metric_suite/decile.csv`	present 20 rows	Decile post-hoc G1 calibration table.
notebook	`posthoc_g1_city_aggregate`	`outputs/posthoc_g1_metric_suite/city_aggregate.csv`	present 2 rows	City-aggregate post-hoc G1 summary.

Source probe readiness

probe ledger

21 source-probe rows are tracked; 18 currently write candidate features and 0 are production-ready.

gap	source	promotion	coverage	feature written	production ready
1_household_denominator	`census_pca_district_context_2011`	model_candidate_context	99.2%	True	False
1_household_denominator	`worldpop_ascii_xyz_population_density_2020`	model_candidate_probe	100.0%	True	False
1_household_denominator	`ghsl_population_builtup`	model_candidate_physical_probe	100.0%	True	False
1_household_denominator	`reconciled_population_household_probe_v2`	model_candidate_probe	100.0%	True	False
1_household_denominator	`prs_eth_popcorn`	model_candidate_probe_pending_grid_mapping	24.0%	False	False
2_residential_built_form	`google_open_buildings_gobi_2023`	model_candidate_context	9.3%	True	False
2_residential_built_form	`microsoft_global_buildings`	model_candidate_context	91.6%	True	False
2_residential_built_form	`combined_building_residential_density`	model_candidate_context	95.8%	True	False
2_residential_built_form	`building_height_vertical_density_proxy`	model_candidate_probe	100.0%	True	False
2_residential_built_form	`esa_worldcover_dynamic_world`	model_candidate_context	100.0%	True	False
2_residential_built_form	`residential_morphology_tags`	proxy_candidate_pending_visual_qa	100.0%	True	False
3_income_affordability	`nightlights_district_panel`	model_candidate_context	99.2%	True	False
3_income_affordability	`pigshell_hlpca_housing_amenity_assets_2011`	model_candidate_context	99.2%	True	False
3_income_affordability	`income_public_context`	proxy_candidate_public_context	100.0%	True	False
3_income_affordability	`income_gate_public_sources`	model_candidate_income_gate_probe	100.0%	True	False
3_income_affordability	`mospi_hces_2023_24_public_report`	blocked_tables_not_extracted	0.0%	False	False
4_serviceability	`yashveer_police_education_pois`	model_candidate_context	100.0%	True	False
4_serviceability	`pmgsy_rural_roads_hr_up`	partial_model_candidate_with_missing_flag	59.6%	True	False
4_serviceability	`osm_overture_ohsome_roads_pois`	model_candidate_context	100.0%	True	False
4_serviceability	`opencellid_ookla_mlab_execution_readiness`	model_candidate_context	100.0%	True	False
5_calibration	`independent_component_calibration`	probe_ready_not_production_calibrated	100.0%	False	False
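The ledger counts quoted above (21 rows tracked, 18 writing features, 0 production-ready) match the table and can be recomputed from the ledger CSV. The sketch below parses a three-row inline sample rather than the real file, and the column names `feature_written` and `production_ready` are assumptions; the actual headers in `source_probe_summary.csv` may differ.

```python
import csv
import io

def summarize_probes(rows):
    """Return (tracked, writing_features, production_ready) counts for ledger rows."""
    tracked = len(rows)
    writing = sum(1 for r in rows if r["feature_written"] == "True")
    ready = sum(1 for r in rows if r["production_ready"] == "True")
    return tracked, writing, ready

# Inline sample with assumed column names; three rows copied from the table above.
sample = """source,feature_written,production_ready
worldpop_ascii_xyz_population_density_2020,True,False
prs_eth_popcorn,False,False
microsoft_global_buildings,True,False
"""
rows = list(csv.DictReader(io.StringIO(sample)))
print(summarize_probes(rows))
```

Recomputing the counts from the ledger, rather than hard-coding them in the report, keeps the summary sentence in step with the table.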

Code map

source file	lines	role
`scripts/pipeline.py`	148	canonical
`scripts/get_data.py`	50	stage/wrapper
`scripts/enrich_features.py`	53	stage/wrapper
`scripts/prediction_diagnostics.py`	72	stage/wrapper
`scripts/build_geohg_features.py`	77	stage/wrapper
`scripts/build_source_layer_cell_features.py`	28	stage/wrapper
`scripts/build_source_layer_contracts.py`	25	stage/wrapper
`scripts/build_denominator_foundation.py`	28	stage/wrapper
`scripts/build_tam_gap_closure_features.py`	28	stage/wrapper
`scripts/build_full_india_tam_scores.py`	687	stage/wrapper
`scripts/build_tam_grid_map_data.py`	48	stage/wrapper
`scripts/build_notebook_short_metric_summary.py`	195	stage/wrapper
`src/tam_pipeline/pipeline.py`	50	canonical
`src/tam_pipeline/stages/get_data.py`	98	stage/wrapper
`src/tam_pipeline/stages/enrich_features.py`	203	stage/wrapper
`src/tam_pipeline/stages/prediction_diagnostics.py`	136	stage/wrapper
`src/tam_pipeline/stages/model_training.py`	14	stage/wrapper
`src/tam_pipeline/stages/common.py`	66	stage/wrapper
`src/tam_geohg/graph_features.py`	27	stage/wrapper
`src/tam_geohg/predicted_tam.py`	82	stage/wrapper
`src/tam_geohg/map_inputs.py`	333	stage/wrapper
`src/tam_geohg/map_metrics.py`	136	stage/wrapper
`src/tam_geohg/map_export.py`	41	stage/wrapper
`src/tam_geohg/map_manifest.py`	48	stage/wrapper
`src/tam_geohg/map_geojson.py`	48	stage/wrapper
`src/tam_geohg/full_india_grid_index.py`	157	stage/wrapper
`src/tam_pipeline/payloads/scripts/build_source_layer_cell_features_payload.py`	1,422	stage/wrapper
`src/tam_pipeline/payloads/scripts/build_source_layer_contracts_payload.py`	456	stage/wrapper
`src/tam_pipeline/payloads/scripts/build_denominator_foundation_payload.py`	703	stage/wrapper
`src/tam_pipeline/payloads/scripts/build_tam_gap_closure_features_payload.py`	1,714	stage/wrapper
`src/tam_pipeline/payloads/tam_notebook_support/tam_notebook_support_payload.py`	2,983	stage/wrapper

Recent pipeline logs

log	path	bytes
`20260602_183522_build_geohg_features.log`	`outputs/pipeline_logs/20260602_183522_build_geohg_features.log`	0
`20260602_171809_build_tam_gap-closure_features.log`	`outputs/pipeline_logs/20260602_171809_build_tam_gap-closure_features.log`	26,120
`20260602_171808_build_denominator_foundation.log`	`outputs/pipeline_logs/20260602_171808_build_denominator_foundation.log`	7,093
`20260602_171806_build_source-layer_cell_features.log`	`outputs/pipeline_logs/20260602_171806_build_source-layer_cell_features.log`	2,281
`20260602_170830_build_geohg_features.log`	`outputs/pipeline_logs/20260602_170830_build_geohg_features.log`	66,569
`20260601_201950_build_tam_gap-closure_features.log`	`outputs/pipeline_logs/20260601_201950_build_tam_gap-closure_features.log`	24,341
`20260601_201950_build_source-layer_cell_features.log`	`outputs/pipeline_logs/20260601_201950_build_source-layer_cell_features.log`	1,803
`20260601_201949_build_denominator_foundation.log`	`outputs/pipeline_logs/20260601_201949_build_denominator_foundation.log`	4,443

XII. Training boundary, pp. 49-51

Training Boundary

A real training workflow needs an approved prediction surface and defensible holdouts. Current artifacts do not establish production accuracy.

approve sources

Promote source probes only after blockers close and production-ready flags pass.

rerun gate

python3 scripts/pipeline.py prediction_diagnostics --root .

require holdouts

Add spatial, city, and chronological validation.

then claim

Discuss accuracy only after nonzero valid joins and holdout metrics.

Until that sequence is complete, the right public wording is: feature/probe pipeline and diagnostics are available; production model accuracy is not established.
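One way to build the spatial holdout required above is to assign whole coordinate blocks, not individual cells, to folds, so neighboring cells never straddle a train/test boundary. The sketch below is a generic illustration: the block size, the round-robin fold assignment, and both function names are assumptions, and the document does not say how its 11 spatial blocks were defined.

```python
import math

def spatial_block_id(lat, lon, block_deg=0.25):
    """Identify the coarse block containing a cell center (block size is an assumption)."""
    return (math.floor(lat / block_deg), math.floor(lon / block_deg))

def spatial_block_folds(cells, n_folds=5, block_deg=0.25):
    """Assign each (lat, lon) cell a fold id so that all cells in a block share one fold."""
    blocks = sorted({spatial_block_id(lat, lon, block_deg) for lat, lon in cells})
    # Round-robin over sorted blocks; deterministic, no random seed needed.
    fold_of_block = {block: i % n_folds for i, block in enumerate(blocks)}
    return [fold_of_block[spatial_block_id(lat, lon, block_deg)] for lat, lon in cells]
```

A city holdout follows the same pattern with the city name as the group key. In either case, a holdout only supports an accuracy claim if the target it is scored against is a valid non-GeoIQ label.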

XIII. Glossary, pp. 52-53

Glossary Printout

The permanent glossary mirrors the floating Lingo panel for print and review.

term	meaning
`Vendor TAM`	Benchmark label used for comparison only; not a training target or feature.
`G1 holdout`	Post-hoc business outcome diagnostic; not used for training, source selection, or tuning.
`Probe TAM`	Formula-driven component output for audit, not production-calibrated TAM.
`Source layer`	Versioned source-family artifact that writes grid-level candidate fields before denominator and gap-closure scoring.
`Denominator v3`	Current leakage-safe residential-household denominator probe with base/lower/upper fields and reconciliation status.
`Income gate`	Public-source 0-10 LPA probability proxy using district/city/cell welfare and affluence context; MPCE calibration remains pending.
`Conversion feasibility`	Public road, POI, addressability, settlement, and map-coverage proxy for whether a cell can be served.
`Execution readiness`	Weak connectivity and operational-readiness proxy for acquirable TAM; still blocked on license, coverage, and internal ops data.
`Predicted TAM power`	Current score layer: a monotone power transform of no-vendor gross TAM that preserves the base total.
`Power gamma`	The fixed exponent applied to the source-derived gross TAM base; current value comes from the transform decision artifact.
`Rank ceiling`	A diagnostic order-only transform used to understand upper-bound ranking behavior, not TAM magnitude.
`Full-India scored grid`	The 0.01-degree national grid scored by deterministic source-derived formulas without vendor labels.
`Score index`	Compact browser-oriented row-major JSON that stores map scores without repeating every property per cell.
`Spatial holdout`	Validation split that blocks neighborhood memorization.
`City holdout`	Whole-city transfer test for city confounding.
`Notebook metric summary`	Four-row post-hoc table comparing predicted TAM, vendor TAM, and G1 for notebook reporting.
`Production claim`	Blocked until valid non-GeoIQ predictions pass spatial, city, and chronological checks.

Generated by scripts/build_prediction_pipeline_html.py from current source files and output manifests. Primary source manifests: outputs/geohg_features/feature_manifest.json, outputs/tam_gap_closure/tam_gap_closure_manifest.json, outputs/full_india_scored/full_india_tam_score_manifest.json, outputs/cell2_metric_transform_experiments/cell2_best_mechanism_decision.json, outputs/posthoc_g1_metric_suite/summary.json, and outputs/statistical_diagnostics/statistical_diagnostics_summary.json.
