Technical status document
TAM Prediction Pipeline Status
Important things first: the current score, notebook metrics, signal-family status, and claim boundary are documented before the implementation substories. Vendor TAM and G1 remain diagnostic benchmarks, not training labels.
Technical Status
The page begins with the actual status: current score, current post-hoc metrics, signal-family readiness, and the production-claim boundary.
Separate implemented pipeline work from invalid production claims.
Every later metric is benchmark-only unless a valid non-GeoIQ holdout exists.
False, city holdout metrics are False, and production_accuracy_claim_allowed is False.
Current notebook and post-hoc metrics
These values are read from outputs/notebook_short_metric_summary.json and outputs/posthoc_g1_metric_suite/summary.json. They describe benchmark agreement after the score is generated; they do not authorize a production accuracy claim.
Current notebook metric from predicted power TAM vs vendor TAM; benchmark-grid diagnostic only.
Current notebook metric against post-hoc G1 hits; not used for training or tuning.
Ranking diagnostic from the notebook-facing metric summary.
Ratio diagnostic comparing current predicted-vs-G1 to vendor-vs-G1, not calibrated accuracy.
Current post-hoc top-k capture for the predicted score layer.
Metrics join benchmark labels after scoring; production accuracy remains blocked.
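As an illustration of how a post-hoc top-k capture and a predicted-vs-vendor ratio could be computed after scoring, here is a minimal sketch. It assumes that top-k capture means the share of G1 hits falling in the k highest-scoring cells and that the ratio divides predicted capture by vendor capture; the document does not define either formula, and the function names and toy values are hypothetical.

```python
def top_k_capture(scores, hits, k):
    """Share of benchmark hits that fall in the k highest-scoring cells."""
    if len(scores) != len(hits):
        raise ValueError("scores and hits must align cell by cell")
    total_hits = sum(hits)
    if total_hits == 0:
        return 0.0
    # Rank cell indices by score, highest first; ties keep input order.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(hits[i] for i in ranked[:k]) / total_hits


def capture_ratio(predicted, vendor, hits, k):
    """Predicted top-k capture divided by vendor top-k capture (assumed definition)."""
    vendor_capture = top_k_capture(vendor, hits, k)
    if vendor_capture == 0:
        return float("nan")
    return top_k_capture(predicted, hits, k) / vendor_capture


# Toy values for illustration only; these are not pipeline outputs.
predicted = [5.0, 1.0, 3.0, 0.5]
vendor = [1.0, 4.0, 3.5, 0.2]
g1_hits = [2, 0, 1, 1]
```

Both functions only read labels that are joined after the score exists, which mirrors the post-hoc boundary described above: nothing here feeds back into the score.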
Signal-family status matrix
The pipeline is no longer documented as one undifferentiated feature pile. Each family below ties current layers and exported column groups to prior art and the blocker that must close before production promotion.
| # | signal family | state | layers | groups | prior art | gate |
|---|---|---|---|---|---|---|
| 1 | Household denominator and residential base | implemented probe | worldpop_population_density: ok; census_2011_official_controls: ok; ghsl_population_builtup: ok; building_footprints_google_microsoft: ok | Population denominator (5); GHSL built-up/population (10); Denominator v3 and reconciliation (33); Building footprints (32) | population_household_denominator: WorldPop population, census/SHRUG reconciliation, built-up occupancy, household density; buildings_and_settlement_structure: building_count, built_area_share, building_density, occupancy_proxy | Production household truth still needs public-anchor reconciliation, residential masks, source QA, and spatial/city holdouts. |
| 2 | Residential eligibility and physical exclusions | implemented proxy | esa_worldcover_dynamic_world: proxy_from_local_building_poi_density_until_worldcover_dynamic_world; residential_morphology_tags: proxy_from_building_landcover_density_until_morphology_qa; building_footprints_google_microsoft: ok | Landcover and physical exclusions (20); Morphology proxies (6); Land use and slum context (3) | land_use_exclusions_and_risk: water_share, forest_share, mining_share, industrial_land_share; buildings_and_settlement_structure: building_count, built_area_share, building_density, occupancy_proxy | Proxy landcover/morphology remains production-blocked until pinned rasters, visual QA, and morphology holdouts pass. |
| 3 | Income and welfare gate | implemented proxy | income_public_context: proxy_from_income_gate_public_sources_until_calibration | Income gate and welfare context (40); Census housing/amenity assets (23); Nightlights (11) | satellite_welfare_affluence: roof/material proxy, lighting proxy, drinking-water proxy, Landsat/Sentinel embeddings; nightlights_and_economic_activity: VIIRS mean, VIIRS trend, nightlight blob score, commercial activity proxy | MPCE calibration, license review, geography QA, and income-source ablations are still required. |
| 4 | Conversion and serviceability | implemented proxy | osm_overture_ohsome_roads_pois: proxy_from_pmgsy_poi_buildings_until_dated_osm_overture_ohsome | Conversion and map coverage (18); Road/serviceability (4); POI/serviceability (8) | roads_accessibility_serviceability: road_length_by_class, distance_to_major_road, road_embedding, travel_friction; poi_urban_function: POI counts by category, Hex2Vec/ContextualCount embeddings, schools/healthcare/markets, urban function vector | Dated OSM/Overture extracts, ohsome coverage, internal serviceability, and failed-install checks are missing. |
| 5 | Execution and acquirability | weak proxy | connectivity_execution_readiness: proxy_from_public_serviceability_until_opencellid_ookla_mlab | Execution readiness (7) | internal_business_reality: leads, installs, retained installs, gross margin | External readiness cannot replace branch, partner, capacity, payment, CAC, and operations coverage. |
| 6 | Spatial context without target leakage | implemented diagnostic x-surface | | GeoHG graph context (107); Spatial/city context (10) | heterogeneous_graph_and_spatial_context: neighbor context, semantic similarity, land-cover hypernodes, POI hypernodes | Graph context is allowed only from independent features; production metrics remain blocked without non-GeoIQ holdouts. |
| 7 | Outcome and notebook metrics | post-hoc only | | TAM score outputs (27); Status and reason codes (1) | internal_business_reality: leads, installs, retained installs, gross margin | Vendor TAM and G1 are benchmark labels after score generation; they are not training labels, features, or tuning signals. |
Artifact scale summary
Vendor-grid feature rows in the current generated feature table.
Cities represented by the current vendor-grid artifacts.
Full GeoHG-style feature count; vendor TAM training remains blocked.
Current source-layer cell-feature artifacts materialized before denominator and gap closure.
Top-level business signal families classified by current state, prior art, and validation blocker.
Current signal-stack families from the prior-art classification artifact.
Source-probe rows that currently write candidate feature columns.
Required local payload readiness from the feature manifest.
predicted_tam_0_10lpa_power
Current map and notebook score layer; monotone no-vendor power transform.
Scale policy: global_no_vendor_base_total_preserved.
Scored 0.01-degree India grid cells in the current full-India score manifest.
District/city groups in the current full-India score manifest.
Total predicted power TAM households; total is preserved from the no-vendor gross TAM base.
Manifest-backed current output files listed later in this document.
Current source-probe rows cleared for production use. This should remain zero until blockers close.
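The score-layer cards above describe predicted_tam_0_10lpa_power as a monotone power transform whose total is preserved from the no-vendor gross TAM base. A minimal sketch of that idea follows, assuming the transform raises each cell's base value to a fixed exponent and then rescales so the sum matches the base total. The exponent 0.5 and the function name are hypothetical; the document does not state the actual exponent or formula.

```python
def power_transform_preserving_total(base, exponent):
    """Apply x**exponent to each cell, then rescale to the original total."""
    if exponent <= 0:
        raise ValueError("exponent must be positive to keep the transform monotone")
    if any(x < 0 for x in base):
        raise ValueError("base values must be non-negative")
    powered = [x ** exponent for x in base]
    powered_total = sum(powered)
    if powered_total == 0:
        return list(base)
    # One global scale factor keeps cell ranking intact and restores the total.
    scale = sum(base) / powered_total
    return [p * scale for p in powered]


# Toy base values; 0.5 is an assumed exponent for illustration only.
base = [100.0, 400.0, 900.0]
power = power_transform_preserving_total(base, 0.5)
```

Because the rescaling is a single global factor, the transform changes how the total is distributed across cells without using any vendor value, which is what a "global, no-vendor, total-preserved" scale policy would require.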
Execution Model
The canonical runner is a stage dispatcher, but the important view is the whole path from current sources to generated outputs and gates.
scripts/pipeline.py parses stage flags.
src/tam_pipeline/pipeline.py calls a stage module.
| # | stage | implementation | role | canonical command |
|---|---|---|---|---|
| 1 | get_data | src/tam_pipeline/stages/get_data.py | Builds the source registry, plans/fetches direct prior-art payloads, and stages Microsoft AOI footprint shards when enabled. | python3 scripts/pipeline.py get_data --root . --dry-run --skip-manifest-update |
| 2 | enrich_features | src/tam_pipeline/stages/enrich_features.py | Builds GeoHG-style features, source-layer cell features, denominator v3 context, and deterministic gap-closure probe columns. | python3 scripts/pipeline.py enrich_features --root . |
| 3 | prediction_diagnostics | src/tam_pipeline/stages/prediction_diagnostics.py | Runs current benchmark diagnostics and claim-boundary checks without fitting on vendor TAM or G1. | python3 scripts/pipeline.py prediction_diagnostics --root . |
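
The stage table above can be read as a small dispatcher: parse a stage name and flags, then call one stage or all three in order. The sketch below shows that shape with argparse. The stage names and the --root, --dry-run, and --skip-manifest-update flags come from the commands above; run_stage and its return strings are hypothetical placeholders, not the contents of scripts/pipeline.py.

```python
import argparse

STAGES = ("get_data", "enrich_features", "prediction_diagnostics")


def build_parser():
    parser = argparse.ArgumentParser(prog="pipeline.py")
    parser.add_argument("stage", choices=STAGES + ("all",))
    parser.add_argument("--root", default=".")
    parser.add_argument("--dry-run", action="store_true")
    # Parsed for completeness; this sketch does not act on it.
    parser.add_argument("--skip-manifest-update", action="store_true")
    return parser


def run_stage(name, root, dry_run):
    # Hypothetical stand-in for calling the real stage module.
    mode = "plan" if dry_run else "run"
    return f"{mode}:{name}:{root}"


def dispatch(argv):
    args = build_parser().parse_args(argv)
    selected = STAGES if args.stage == "all" else (args.stage,)
    # "all" passes the same dry-run setting through every stage, in order.
    return [run_stage(name, args.root, args.dry_run) for name in selected]
```

For example, `dispatch(["all", "--root", ".", "--dry-run"])` would plan the three stages in table order without running any of them.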
Operator commands
Current runner
python3 scripts/pipeline.py get_data --root . --dry-run --skip-manifest-update
python3 scripts/pipeline.py enrich_features --root . --dry-run
python3 scripts/pipeline.py prediction_diagnostics --root . --dry-run
python3 scripts/pipeline.py enrich_features --root .
python3 scripts/pipeline.py prediction_diagnostics --root .
python3 scripts/pipeline.py all --root . --dry-run --skip-manifest-update
Current outputs
outputs/source_fetch/source_fetch_manifest.json
outputs/source_fetch/direct_prior_art_download_manifest.json
outputs/geohg_features/cell_features_geohg_style.csv
outputs/source_layers/source_layer_cell_features_manifest.json
outputs/source_layers/source_layer_contracts.json
outputs/denominator_foundation/cell_denominator_foundation.csv
outputs/tam_gap_closure/tam_gap_closure_features.csv
outputs/full_india_scored/full_india_tam_score_manifest.json
outputs/tam_map/tam_full_india_0_01_grid_index.json
outputs/statistical_diagnostics/statistical_diagnostics_summary.json
Dry-run paths
Dry runs are now explicit stage behavior. The compute-heavy stages print planned inputs and outputs without executing the scripts that write feature, denominator, gap-closure, or diagnostic artifacts.
| stage | command | write behavior | current evidence |
|---|---|---|---|
| get_data | python3 scripts/pipeline.py get_data --root . --dry-run --skip-manifest-update | Plans direct prior-art fetches and Microsoft AOI shard staging; planning manifests may be written for review. | command supported |
| enrich_features | python3 scripts/pipeline.py enrich_features --root . --dry-run | Prints planned feature, denominator, and gap-closure steps without executing builder scripts. | no-write JSON plan |
| prediction_diagnostics | python3 scripts/pipeline.py prediction_diagnostics --root . --dry-run | Checks frozen prediction readiness and expected diagnostic files without running diagnostics. | no-write JSON plan |
| all | python3 scripts/pipeline.py all --root . --dry-run --skip-manifest-update | Passes dry-run through all stages; compute stages remain no-write and blocked checks still fail loudly. | stage-aware |
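
One way to get the "no-write JSON plan" behavior in the table above is to build a plan dictionary first and call the builder only when the run is not a dry run. The sketch below assumes that shape. build_plan, run_or_plan, and the plan keys are hypothetical names, and the actual stage modules may structure this differently.

```python
import json
from pathlib import Path


def build_plan(root, stage, inputs, outputs, dry_run):
    """Describe what a stage would read and write, without touching disk."""
    root = Path(root)
    missing = [p for p in inputs if not (root / p).exists()]
    return {
        "stage": stage,
        "dry_run": dry_run,
        "inputs": list(inputs),
        "missing_inputs": missing,
        "planned_outputs": list(outputs),
    }


def run_or_plan(root, stage, inputs, outputs, dry_run, builder):
    """Print a JSON plan on dry runs; call the builder only on real runs."""
    plan = build_plan(root, stage, inputs, outputs, dry_run)
    if plan["missing_inputs"]:
        # Blocked checks fail loudly in both modes.
        raise FileNotFoundError(f"{stage} blocked, missing: {plan['missing_inputs']}")
    if dry_run:
        print(json.dumps(plan, indent=2))
    else:
        builder()
    return plan
```

Keeping the blocked-input check ahead of the dry-run branch means a dry run surfaces the same failures a real run would, which matches the "blocked checks still fail loudly" row.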
Prior-Art Status
This chapter starts with the signal-family classification, then drills into payload availability. Payload presence is not the same thing as validated signal readiness.
Signal-family prior-art status
The status matrix connects the prior-art families to the implemented source layers and current validation blockers. File counts are shown after this matrix so the document does not confuse downloaded payloads with production-ready signals.
| # | signal family | state | layers | groups | prior art | gate |
|---|---|---|---|---|---|---|
| 1 | Household denominator and residential base | implemented probe | worldpop_population_density: ok; census_2011_official_controls: ok; ghsl_population_builtup: ok; building_footprints_google_microsoft: ok | Population denominator (5); GHSL built-up/population (10); Denominator v3 and reconciliation (33); Building footprints (32) | population_household_denominator: WorldPop population, census/SHRUG reconciliation, built-up occupancy, household density; buildings_and_settlement_structure: building_count, built_area_share, building_density, occupancy_proxy | Production household truth still needs public-anchor reconciliation, residential masks, source QA, and spatial/city holdouts. |
| 2 | Residential eligibility and physical exclusions | implemented proxy | esa_worldcover_dynamic_world: proxy_from_local_building_poi_density_until_worldcover_dynamic_world; residential_morphology_tags: proxy_from_building_landcover_density_until_morphology_qa; building_footprints_google_microsoft: ok | Landcover and physical exclusions (20); Morphology proxies (6); Land use and slum context (3) | land_use_exclusions_and_risk: water_share, forest_share, mining_share, industrial_land_share; buildings_and_settlement_structure: building_count, built_area_share, building_density, occupancy_proxy | Proxy landcover/morphology remains production-blocked until pinned rasters, visual QA, and morphology holdouts pass. |
| 3 | Income and welfare gate | implemented proxy | income_public_context: proxy_from_income_gate_public_sources_until_calibration | Income gate and welfare context (40); Census housing/amenity assets (23); Nightlights (11) | satellite_welfare_affluence: roof/material proxy, lighting proxy, drinking-water proxy, Landsat/Sentinel embeddings; nightlights_and_economic_activity: VIIRS mean, VIIRS trend, nightlight blob score, commercial activity proxy | MPCE calibration, license review, geography QA, and income-source ablations are still required. |
| 4 | Conversion and serviceability | implemented proxy | osm_overture_ohsome_roads_pois: proxy_from_pmgsy_poi_buildings_until_dated_osm_overture_ohsome | Conversion and map coverage (18); Road/serviceability (4); POI/serviceability (8) | roads_accessibility_serviceability: road_length_by_class, distance_to_major_road, road_embedding, travel_friction; poi_urban_function: POI counts by category, Hex2Vec/ContextualCount embeddings, schools/healthcare/markets, urban function vector | Dated OSM/Overture extracts, ohsome coverage, internal serviceability, and failed-install checks are missing. |
| 5 | Execution and acquirability | weak proxy | connectivity_execution_readiness: proxy_from_public_serviceability_until_opencellid_ookla_mlab | Execution readiness (7) | internal_business_reality: leads, installs, retained installs, gross margin | External readiness cannot replace branch, partner, capacity, payment, CAC, and operations coverage. |
| 6 | Spatial context without target leakage | implemented diagnostic x-surface | | GeoHG graph context (107); Spatial/city context (10) | heterogeneous_graph_and_spatial_context: neighbor context, semantic similarity, land-cover hypernodes, POI hypernodes | Graph context is allowed only from independent features; production metrics remain blocked without non-GeoIQ holdouts. |
| 7 | Outcome and notebook metrics | post-hoc only | | TAM score outputs (27); Status and reason codes (1) | internal_business_reality: leads, installs, retained installs, gross margin | Vendor TAM and G1 are benchmark labels after score generation; they are not training labels, features, or tuning signals. |
Readiness is an artifact, not a hidden precondition.
The registry and fetch manifests show direct assets, deferred rasters, gated microdata, and source warnings before features are interpreted.
| status | count |
|---|---|
| catalog_and_grid_fetched_payload_deferred | 1 |
| catalog_fetched_payload_deferred | 2 |
| data_fetched | 35 |
| docs_fetched | 2 |
| docs_fetched_payload_deferred | 1 |
| indexes_fetched_payload_deferred | 1 |
| metadata_fetched_payload_deferred | 2 |
| overview_fetched_payload_deferred | 1 |
| paper_fetched | 1 |
| public_reports_fetched_microdata_gated | 1 |
| readme_fetched | 1 |
| readme_fetched_payload_deferred | 1 |
Prior-art payload coverage
The current feature manifest reports 23 present payloads out of 23 required payloads, with 0 missing.
| source slug | payloads | present | direct-link payloads | needs download |
|---|---|---|---|---|
| admin-districts | 6 | 6 | 6 | 0 |
| buildings-google | 2 | 2 | 2 | 0 |
| education-facilities | 1 | 1 | 1 | 0 |
| energy-power-plants | 1 | 1 | 1 | 0 |
| env-flood-atlas | 4 | 4 | 4 | 0 |
| env-landuse | 1 | 1 | 1 | 0 |
| env-soil | 1 | 1 | 1 | 0 |
| infra-rural-roads | 2 | 2 | 2 | 0 |
| nightlights-viirs | 1 | 1 | 1 | 0 |
| police-stations | 1 | 1 | 1 | 0 |
| transport-airports | 1 | 1 | 1 | 0 |
| unmapped_local_payload | 1 | 1 | 0 | 0 |
| urban-municipal | 1 | 1 | 1 | 0 |
Prior-art payload files
| required payload | source slug | exists | link mode | needs download |
|---|---|---|---|---|
| prior art/yashveeeeeeer_india-geodata/data/administrative/districts/census-2011/2011_Dist.dbf | admin-districts | present | direct | no |
| prior art/yashveeeeeeer_india-geodata/data/administrative/districts/census-2011/2011_Dist.prj | admin-districts | present | direct | no |
| prior art/yashveeeeeeer_india-geodata/data/administrative/districts/census-2011/2011_Dist.sbn | admin-districts | present | direct | no |
| prior art/yashveeeeeeer_india-geodata/data/administrative/districts/census-2011/2011_Dist.sbx | admin-districts | present | direct | no |
| prior art/yashveeeeeeer_india-geodata/data/administrative/districts/census-2011/2011_Dist.shp | admin-districts | present | direct | no |
| prior art/yashveeeeeeer_india-geodata/data/administrative/districts/census-2011/2011_Dist.shx | admin-districts | present | direct | no |
| prior art/yashveeeeeeer_india-geodata/data/remote-sensing/population-density/ind_pd_2020_1km_ASCII_XYZ.csv | unmapped_local_payload | present | local/manual | no |
| prior art/yashveeeeeeer_india-geodata/data/buildings/google/google-open-buildings-india-2023.mosaic.json | buildings-google | present | direct | no |
| prior art/yashveeeeeeer_india-geodata/data/buildings/google/google-open-buildings-india-2023.000000.parquet | buildings-google | present | direct | no |
| prior art/yashveeeeeeer_india-geodata/data/education/facilities/INDIA_EDUCATION_FACILITIES_POINTS.geojson | education-facilities | present | direct | no |
| prior art/yashveeeeeeer_india-geodata/data/police/stations/INDIA_POLICE_STATIONS.geojson | police-stations | present | direct | no |
| prior art/yashveeeeeeer_india-geodata/data/transport/airports/INDIA_AIRPORTS_POINTS.geojson | transport-airports | present | direct | no |
| prior art/yashveeeeeeer_india-geodata/data/energy/power-plants/INDIA_ENERGY_PLANTS.geojson | energy-power-plants | present | direct | no |
| prior art/yashveeeeeeer_india-geodata/data/environment/flood-atlas/District_Wise_flood_risk_data.json | env-flood-atlas | present | direct | no |
| prior art/yashveeeeeeer_india-geodata/data/environment/flood-atlas/District_Wise_max_flood_area_frac.json | env-flood-atlas | present | direct | no |
| prior art/yashveeeeeeer_india-geodata/data/environment/flood-atlas/State_Wise_flood_risk_data.json | env-flood-atlas | present | direct | no |
| prior art/yashveeeeeeer_india-geodata/data/environment/flood-atlas/State_Wise_max_flood_area_frac.json | env-flood-atlas | present | direct | no |
| prior art/yashveeeeeeer_india-geodata/data/environment/soil/INDIA_SOIL_MAP_FAO.geojson | env-soil | present | direct | no |
| prior art/yashveeeeeeer_india-geodata/data/environment/landuse/hyderabad/Hyderabad_Landuse.geojson | env-landuse | present | direct | no |
| prior art/yashveeeeeeer_india-geodata/data/remote-sensing/nightlights/nightlights_district_panel.csv | nightlights-viirs | present | direct | no |
| prior art/yashveeeeeeer_india-geodata/data/urban/municipal-boundaries/mumbai/slumClusters.geojson | urban-municipal | present | direct | no |
| prior art/yashveeeeeeer_india-geodata/data/infrastructure/rural-roads/road-network/Haryana.zip | infra-rural-roads | present | direct | no |
| prior art/yashveeeeeeer_india-geodata/data/infrastructure/rural-roads/road-network/UttarPradesh.zip | infra-rural-roads | present | direct | no |
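
The per-slug coverage table can be derived from the per-file table by grouping on source slug. The sketch below shows that aggregation under assumed record keys (source_slug, exists, link_mode, needs_download). The key names and the "yes" value for needs_download are hypothetical, since the table above only shows "no".

```python
from collections import defaultdict


def summarize_by_slug(file_rows):
    """Aggregate per-file payload rows into per-slug counts."""
    summary = defaultdict(
        lambda: {"payloads": 0, "present": 0, "direct": 0, "needs_download": 0}
    )
    for row in file_rows:
        counts = summary[row["source_slug"]]
        counts["payloads"] += 1
        counts["present"] += int(row["exists"] == "present")
        counts["direct"] += int(row["link_mode"] == "direct")
        counts["needs_download"] += int(row["needs_download"] == "yes")
    return dict(summary)


# Three rows taken from the file table above; payload paths are omitted for brevity.
file_rows = [
    {"source_slug": "buildings-google", "exists": "present", "link_mode": "direct", "needs_download": "no"},
    {"source_slug": "buildings-google", "exists": "present", "link_mode": "direct", "needs_download": "no"},
    {"source_slug": "unmapped_local_payload", "exists": "present", "link_mode": "local/manual", "needs_download": "no"},
]
summary = summarize_by_slug(file_rows)
```

On these rows the result matches the coverage table: buildings-google has 2 payloads, 2 present, 2 direct, and unmapped_local_payload has 1 payload, 1 present, 0 direct.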
Signal Implementation
Stage 2 builds the current signal surface: GeoHG-style cell context, source-layer cell features, graph features, denominator v3, and deterministic TAM probe/v2/v3 columns.
372
Combined GeoHG, source-layer, and gap-closure columns after excluding row identifiers.
7,029
Rows in cell_features_geohg_style.csv.
9
Current source-layer cell-feature artifacts.
18
Source-probe rows that write candidate fields.
24,203
Area-area edges in the current GeoHG bundle.
Current x is the joined independent feature stack plus audit-visible proxy signals. Current y inside the repo is not a supervised training label; it is the deterministic probe/v2/v3 TAM score family plus the selected predicted_tam_0_10lpa_power map layer. Vendor TAM and G1 are benchmark-only labels.
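A simple guard can make the benchmark-only boundary checkable: scan the x column names for benchmark tokens before any score formula runs. The sketch below assumes benchmark columns can be recognized by an underscore-separated "vendor" or "g1" token. That naming convention and the function names are hypothetical; the feature columns in the example are taken from the source-layer fields listed in this document.

```python
# Hypothetical name tokens that would mark benchmark-only columns.
BENCHMARK_TOKENS = frozenset({"vendor", "g1"})


def benchmark_columns(columns, tokens=BENCHMARK_TOKENS):
    """Return columns whose underscore-separated name contains a benchmark token."""
    return [c for c in columns if tokens & set(c.lower().split("_"))]


def assert_label_free(columns):
    """Raise if any benchmark label appears in the feature stack."""
    leaked = benchmark_columns(columns)
    if leaked:
        raise ValueError(f"benchmark labels found in feature stack: {leaked}")


x_columns = [
    "worldpop_density_people_per_km2",
    "census_control_households",
    "ghsl_builtup_share",
]
assert_label_free(x_columns)  # passes silently for a label-free stack
```

Token matching on whole name parts avoids false positives from substrings, but it only catches leakage by name; a renamed or derived benchmark column would still need a lineage check.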
Feature stack visual
One signal surface, several audit views.
Source families join into source-layer and GeoHG feature tables, then feed denominator v3, conversion, execution, and target-free TAM score formulas. Benchmark labels stay outside this path.
The detailed column inventory is kept below as chips only; precision is summarized once instead of repeated in every group.
Source-layer additions
The source-layer manifest reports 9 current cell-feature files and 5 proxy layers. These fields include WorldPop, Census controls, Google/Microsoft buildings, GHSL, landcover, morphology, income-gate, OSM/Overture, and connectivity signals.
| source layer | status | rows | columns | mode | fields | cell feature file |
|---|---|---|---|---|---|---|
| worldpop_population_density | ok | 7,029 | 2 | direct | worldpop_population_est_nearest, worldpop_density_people_per_km2 | outputs/source_layers/worldpop_population_density/2020_ascii_xyz_local/cell_features.csv |
| census_2011_official_controls | ok | 7,029 | 2 | direct | census_control_population, census_control_households | outputs/source_layers/census_2011_official_controls/2011_fixed_release/cell_features.csv |
| building_footprints_google_microsoft | ok | 7,029 | 13 | direct | building_count_cell_best, building_area_sum_m2_best, building_area_share_best, building_source_coverage_flag, building_height_mean_m, building_floor_count_proxy, building_volume_proxy_m3, building_vertical_density_proxy, building_residential_volume_proxy_m3, building_compactness_score, building_population_per_building, building_footprint_area_per_person_m2, building_msft_gobi_disagreement_flag | outputs/source_layers/building_footprints_google_microsoft/google_2023_msft_staged_snapshot/cell_features.csv |
| ghsl_population_builtup | ok | 7,029 | 10 | direct | ghsl_population_est, ghsl_population_source_year, ghsl_builtup_share, ghsl_built_surface_m2, ghsl_built_volume_m3, ghsl_height_mean_m, ghsl_non_res_builtup_share, ghsl_non_res_volume_share, ghsl_settlement_score, ghsl_residential_candidate_share | outputs/source_layers/ghsl_population_builtup/named_release_required/cell_features.csv |
| esa_worldcover_dynamic_world | proxy_from_local_building_poi_density_until_worldcover_dynamic_world | 7,029 | 12 | proxy | landcover_builtup_share, landcover_water_share, landcover_tree_forest_share, landcover_crop_share, landcover_non_residential_exclusion_share, dynamic_world_built_probability, residential_eligible_area_share, physical_hard_cell_flag, hard_exclusion_share, soft_non_residential_downweight_share, hard_mask_reason_code, landcover_proxy_source_flag | outputs/source_layers/esa_worldcover_dynamic_world/worldcover_2020_2021_dynamic_world_pinned_window_required/cell_features.csv |
| residential_morphology_tags | proxy_from_building_landcover_density_until_morphology_qa | 7,029 | 6 | proxy | morph_dense_old_city_score, morph_informal_dense_score, morph_highrise_affordable_score, morph_cbd_industrial_score, morph_periurban_vacant_score, morphology_proxy_source_flag | outputs/source_layers/residential_morphology_tags/building_landcover_public_proxy_v1/cell_features.csv |
| income_public_context | proxy_from_income_gate_public_sources_until_calibration | 7,029 | 28 | proxy | income_public_affluence_context_score, income_public_deprivation_context_score, income_public_asset_affluence_score, income_public_amenity_deficit_score, income_public_license_status_code, income_public_proxy_source_flag, income_gate_city_segment_code, income_gate_city_segment_prior_0_10lpa_prob, income_gate_city_prior_0_10lpa_prob, income_gate_district_income_pci, income_gate_district_income_affluence_score, income_gate_admin_affluence_score, income_gate_admin_deprivation_score, income_gate_nfhs_affluence_score, income_gate_nfhs_deprivation_score, income_gate_shrug_rwi_score, income_gate_shrug_consumption_score, income_gate_shrug_asset_affluence_score, income_gate_meta_rwi_raw, income_gate_meta_rwi_score, income_gate_meta_rwi_error, income_gate_cell_wealth_score, income_gate_external_affluence_score, income_gate_source_count, income_gate_confidence, income_gate_granularity_code, income_gate_granularity, income_gate_status | outputs/source_layers/income_public_context/income_gate_public_context_v2/cell_features.csv |
| osm_overture_ohsome_roads_pois | proxy_from_pmgsy_poi_buildings_until_dated_osm_overture_ohsome | 7,029 | 9 | proxy | road_distance_m, road_intersection_density, settlement_cluster_size, building_cluster_compactness, addressability_score, poi_service_mix_score, osm_mapping_coverage_score, overture_road_coverage_score, conversion_proxy_source_flag | outputs/source_layers/osm_overture_ohsome_roads_pois/dated_extract_required/cell_features.csv |
| connectivity_execution_readiness | proxy_from_public_serviceability_until_opencellid_ookla_mlab | 7,029 | 3 | proxy | connectivity_readiness_score, mlab_measurement_coverage_score, execution_proxy_source_flag | outputs/source_layers/connectivity_execution_readiness/dated_opencellid_ookla_mlab_required/cell_features.csv |
Promotion gates
Source-layer contracts fix the source family, model layer, expected fields, and promotion gate. Production readiness is intentionally separate from the presence of raw payloads or cell-feature files.
| source | family | model layer | raw files | cell file | production ready | promotion gate |
|---|---|---|---|---|---|---|
| ghsl_population_builtup | better_denominators | household_denominator_population_builtup_height_volume | 32 | True | False | GHS-POP/BUILT-S/BUILT-V/BUILT-H/BUILT-C releases must reconcile against Census controls and WorldPop without tuning to vendor TAM. |
| census_2011_official_controls | better_denominators | official_population_household_controls | 2 | True | False | official controls or validated mirrors must be reconciled before production household estimates. |
| worldpop_population_density | better_denominators | gridded_population_density | 1 | True | False | use as one denominator candidate; reconcile to Census/GHSL and residential masks. |
| building_footprints_google_microsoft | better_denominators_conversion_feasibility | building_structure_residential_density | 31 | True | False | footprints/height/volume are physical evidence only; they cannot become household truth without public-anchor reconciliation. |
| esa_worldcover_dynamic_world | can_serve_here_filters | residential_non_residential_land_mask | 3 | True | False | pinned releases/windows only; land cover can hard-exclude or downweight cells but cannot label income or households. |
| residential_morphology_tags | better_denominators | old_city_informal_highrise_cbd_periurban_morphology | 3 | True | False | morphology tags are denominator/income modifiers only after visual QA, source disagreement review, and city/morphology holdouts. |
| income_public_context | income_affordability | high_income_exclusion_public_context | 22 | True | False | income features estimate high-income exclusion only; SHRUG/NFHS/SECC/RWI/district-income gates require license review, geography QA, MPCE extraction, and city/spatial holdouts before production use. |
| osm_overture_ohsome_roads_pois | conversion_feasibility | road_poi_addressability_clusterability | 2 | True | False | OSM/Overture features require dated extracts and ohsome coverage so missing mapping is not mistaken for low demand. |
| connectivity_execution_readiness | execution_realism | connectivity_payment_partner_readiness_proxy | 2 | True | False | weak external serviceability signals can affect execution readiness only after license and coverage gates pass. |
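
The contract table keeps "cell file exists" and "production ready" as separate facts. A minimal sketch of a contract record that preserves that separation is shown below. The SourceLayerContract class, its field names, and production_ready_sources are hypothetical; the two example rows copy values from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceLayerContract:
    source: str
    family: str
    model_layer: str
    raw_files: int
    has_cell_file: bool
    production_ready: bool
    promotion_gate: str


def production_ready_sources(contracts):
    """List sources cleared for production; a cell file alone does not qualify."""
    return [c.source for c in contracts if c.production_ready]


contracts = [
    SourceLayerContract(
        "worldpop_population_density", "better_denominators",
        "gridded_population_density", 1, True, False,
        "use as one denominator candidate; reconcile to Census/GHSL and residential masks.",
    ),
    SourceLayerContract(
        "connectivity_execution_readiness", "execution_realism",
        "connectivity_payment_partner_readiness_proxy", 2, True, False,
        "weak external serviceability signals can affect execution readiness only after license and coverage gates pass.",
    ),
]
```

With every row still marked False, `production_ready_sources(contracts)` returns an empty list even though each source has a cell file.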
Signal coverage
Coverage rows show which additions are populated in the current artifacts. High coverage does not validate calibration, residential truth, income truth, or production serviceability.
| signal check | current value | interpretation |
|---|---|---|
| source layers written | 9 | source-layer cell-feature artifacts available |
| proxy source layers | 5 | layers still marked as public proxy rather than production source truth |
| candidate features written | 18 | source-probe rows that write candidate feature columns |
| H_residential denominator coverage | 100.0% | v3 household denominator fields populated |
| landcover coverage | 100.0% | landcover/exclusion fields populated |
| morphology coverage | 100.0% | morphology proxy fields populated |
| conversion coverage | 100.0% | conversion-feasibility fields populated |
| execution coverage | 100.0% | execution-readiness fields populated |
| income gate prior coverage | 100.0% | public income-gate prior fields populated |
| income gate confidence median | 0.840 | median public-income proxy confidence |
| GHSL builtup coverage | 100.0% | GHSL built surface/volume context populated |
| GHSL height coverage | 96.5% | GHSL height/non-residential volume context populated |
| building denominator coverage | 95.8% | Google/Microsoft building denominator fields populated |
| public anchor reconciliation share | 24.6% | cells touched by current public-anchor reconciliation |
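
The coverage percentages above can be read as the share of grid rows where a field group is populated. The sketch below computes that share, assuming a row counts as covered only when every listed field is neither None nor NaN. That definition and the coverage_share name are assumptions, and the row values are toy data.

```python
import math


def is_populated(value):
    """Treat None and float NaN as missing; everything else as populated."""
    return value is not None and not (isinstance(value, float) and math.isnan(value))


def coverage_share(rows, fields):
    """Share of rows where every listed field is populated."""
    if not rows:
        return 0.0
    covered = sum(all(is_populated(row.get(f)) for f in fields) for row in rows)
    return covered / len(rows)


# Toy rows for illustration only; ghsl_height_mean_m is a field named in this document.
rows = [
    {"ghsl_height_mean_m": 6.2},
    {"ghsl_height_mean_m": None},
    {"ghsl_height_mean_m": 4.0},
    {"ghsl_height_mean_m": float("nan")},
]
height_coverage = coverage_share(rows, ["ghsl_height_mean_m"])
```

A populated field says nothing about whether its value is calibrated, which is why high coverage is reported separately from production readiness.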
Precision summary
All names are exported at grid-row level, but source precision is mixed: cell geometry, district context, nearest 1 km raster points, vector overlays, 2 km buffers, and one-hop graph context.
Source scale: 0.01-degree grid geometry plus Census district IDs.
Exported scale: 0.01-degree grid cells; median cell area 1.40 km2, equivalent square side about 1.18 km. Census 2011 district join; same district value repeats on all cells in that district. Within current grid coverage, median represented district footprint is 135.9 km2 (99 cells), equivalent square side about 11.7 km. Vendor city grid footprint; median represented city coverage is 135.5 km2 (98 cells), equivalent square side about 11.6 km.
Read as: Centroids and local x/y/radius are cell geometry; censuscode is a district assignment.
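The "equivalent square side" figures quoted above follow from treating each footprint as a square of equal area, so the side is the square root of the area. A short check of the three quoted values, using a hypothetical helper name:

```python
import math


def equivalent_square_side_km(area_km2):
    """Side length of a square whose area equals area_km2."""
    return math.sqrt(area_km2)


cell_side = equivalent_square_side_km(1.40)       # median cell area
district_side = equivalent_square_side_km(135.9)  # median district footprint
city_side = equivalent_square_side_km(135.5)      # median city coverage
```

Rounded, these reproduce the stated sides of about 1.18 km, 11.7 km, and 11.6 km.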
Source scale: VIIRS-derived district panel values, not raw pixel values in this CSV.
Exported scale: District aggregate joined to each cell by censuscode. Census 2011 district join; same district value repeats on all cells in that district. Within current grid coverage, median represented district footprint is 135.9 km2 (99 cells), equivalent square side about 11.7 km.
Read as: Current precision is district-level even though the underlying satellite product is finer.
Source scale: Census 2011 Houselisting/HLPCA district-total percentage shares from the downloaded pigshell mirror.
Exported scale: District aggregate joined to each cell by censuscode. Census 2011 district join; same district value repeats on all cells in that district. Within current grid coverage, median represented district footprint is 135.9 km2 (99 cells), equivalent square side about 11.7 km.
Read as: Housing, amenity, and asset shares are old district context repeated on cells; they are not current cell-level observations.
Source scale: District flood-atlas JSON records.
Exported scale: District aggregate joined to each cell by censuscode. Census 2011 district join; same district value repeats on all cells in that district. Within current grid coverage, median represented district footprint is 135.9 km2 (99 cells), equivalent square side about 11.7 km.
Read as: Flood fields should not be read as within-cell flood pixels.
Source scale: WorldPop 2020 1 km ASCII XYZ point grid.
Exported scale: Nearest 1 km source point to each 0.01-degree cell centroid; population estimate scales density by cell area.
Read as: nearest_distance_km exposes the join quality per cell.
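The nearest-point join described above can be sketched as a brute-force search over source points. For each cell centroid it keeps the nearest point's density, multiplies by cell area for the population estimate, and records the distance. The output field names come from this document. The function names, the haversine distance, and the brute-force search are assumptions; a real run over a national grid would more likely use a spatial index.

```python
import math

EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def join_nearest_density(centroid, cell_area_km2, source_points):
    """source_points: iterable of (lat, lon, density_people_per_km2)."""
    lat, lon = centroid
    best = min(source_points, key=lambda p: haversine_km(lat, lon, p[0], p[1]))
    return {
        "worldpop_density_people_per_km2": best[2],
        # Population estimate scales the nearest density by the cell's area.
        "worldpop_population_est_nearest": best[2] * cell_area_km2,
        "nearest_distance_km": haversine_km(lat, lon, best[0], best[1]),
    }


# Toy coordinates and densities for illustration only.
points = [(28.60, 77.20, 1000.0), (28.70, 77.30, 50.0)]
joined = join_nearest_density((28.605, 77.205), 1.40, points)
```

Keeping nearest_distance_km in the output lets a reviewer spot cells whose nearest source point is far enough away that the joined density should be treated with caution.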
Source scale: GHSL 2020 30-arcsecond population, built-surface, built-volume, height, and non-residential raster tiles.
Exported scale: Raster/source-layer values aggregated or joined to the 0.01-degree grid cell, about 1.18 km side in the current grid.
Read as: These are independent physical-denominator signals; they still require reconciliation against Census, WorldPop, and building footprints.
Source scale: Mixed public denominator context: WorldPop, Census 2011 controls, GHSL built form, building footprints, and public anchors.
Exported scale: Cell-level reconciliation probes on the 0.01-degree grid. Census 2011 district join; same district value repeats on all cells in that district. Within current grid coverage, median represented district footprint is 135.9 km2 (99 cells), equivalent square side about 11.7 km.
Read as: H_residential base/lower/upper fields are leakage-safe denominator probes but remain production-blocked until public-anchor QA and holdouts pass.
Source scale: Google/Microsoft building footprint polygons, meter-scale vector geometries.
Exported scale: Footprint count/area/height/volume proxies aggregated into each 0.01-degree cell; exported precision is the cell, about 1.18 km side in the current grid.
Read as: Cell-level aggregation is about 1 km; footprint areas themselves are vector-derived.
Source scale: ESA WorldCover/Dynamic World source-layer proxy fields plus local hard/soft exclusion logic.
Exported scale: Built/water/tree/crop/non-residential shares and exclusion flags written at 0.01-degree cell level, about 1.18 km side.
Read as: These fields can suppress or downweight physical impossibility; they do not label income or demand.
Source scale: Derived morphology tags from building, landcover, density, and public-context proxies.
Exported scale: Cell-level scores for dense old city, informal density, highrise affordability, CBD/industrial, and periurban vacancy patterns.
Read as: Use as denominator or income modifiers only after source-disagreement and morphology holdout review.
Source scale: Public income/welfare context: district income, NFHS, SHRUG/SECC/RWI-style fields, HLPCA context, and city priors.
Exported scale: Mixed district/city/cell proxy fields written to grid rows. Census 2011 district join; same district value repeats on all cells in that district. Within current grid coverage, median represented district footprint is 135.9 km2 (99 cells), equivalent square side about 11.7 km.
Read as: Income-gate fields estimate 0-10 LPA probability as a probe; they are not production-calibrated income truth.
Source scale: PMGSY road shapefile line vectors for available states.
Exported scale: Nearest-road distance from cell centroid, capped at 25 km; within_2km flag uses a 2 km threshold.
Read as: Coverage is state-limited; source_available and distance_missing must stay with the distance.
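A hedged sketch of how the distance cap and the companion flags might travel together. The 25 km cap and 2 km threshold come from the text; the inclusive comparison, the use of None for a missing distance, and the function name are assumptions.

```python
def pmgsy_road_fields(nearest_km, source_available, cap_km=25.0, near_km=2.0):
    """Build the four PMGSY columns for one cell.

    nearest_km: raw nearest-road distance, or None when no road layer covers the state.
    """
    if not source_available or nearest_km is None:
        # Assumption: missing distance is kept as None and flagged, never imputed here.
        return {
            "pmgsy_road_nearest_distance_km": None,
            "pmgsy_road_within_2km": 0,
            "pmgsy_road_source_available": int(bool(source_available)),
            "pmgsy_road_distance_missing": 1,
        }
    capped = min(nearest_km, cap_km)
    return {
        "pmgsy_road_nearest_distance_km": capped,
        "pmgsy_road_within_2km": int(nearest_km <= near_km),  # inclusive threshold is an assumption
        "pmgsy_road_source_available": 1,
        "pmgsy_road_distance_missing": 0,
    }

far_cell = pmgsy_road_fields(31.0, True)
near_cell = pmgsy_road_fields(1.5, True)
uncovered_cell = pmgsy_road_fields(None, False)
```

Without the two flags, a cell in a state with no road file could not be told apart from a cell that is genuinely far from any road.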
Source scale: Point GeoJSON layers for education, police, airport, and energy POIs.
Exported scale: Counts inside the 0.01-degree cell and counts inside a 2 km centroid buffer.
Read as: The *_2km columns are intentionally broader than the grid cell.
Source scale: Local polygon overlays: Mumbai slum clusters and Hyderabad land-use where overlapping.
Exported scale: Polygon-overlap area/share aggregated into each 0.01-degree cell.
Read as: Coverage is city-specific; zero may mean no local overlap source, not absence of slum or land use.
Source scale: Public road, POI, OSM/Overture/ohsome proxy, settlement, addressability, and mapping-coverage context.
Exported scale: Cell-level conversion-feasibility and serviceability modifiers on the 0.01-degree grid.
Read as: These fields affect can-serve and conversion feasibility; they are not demand labels.
Source scale: Public connectivity and measurement-coverage proxies for delivery/payment/partner feasibility.
Exported scale: Cell-level readiness scores and coverage flags on the 0.01-degree grid.
Read as: Execution readiness is a weak acquirable-TAM modifier, not household demand or income evidence.
Source scale: One-hop graph aggregates over adjacent 0.01-degree grid cells.
Exported scale: 8-neighbour GeoHG context on the same grid; one-hop ring is adjacent/diagonal cells, roughly 3.6 km across including the center cell.
Read as: Graph context inherits precision from its base column and adds one 8-neighbour ring of smoothing.
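The one-hop aggregation can be illustrated on a toy grid. This is a sketch, assuming cells are keyed by (column, row) and that absent neighbours are simply skipped; the pipeline's own handling of edges and missing values is not stated here.

```python
def graph_context(values):
    """Compute 8-neighbour context for a dict {(col, row): value}.

    Returns, per cell, the neighbour count (graph_degree), the neighbour mean,
    and the cell value minus that mean.
    """
    offsets = [(dc, dr) for dc in (-1, 0, 1) for dr in (-1, 0, 1) if (dc, dr) != (0, 0)]
    out = {}
    for (col, row), value in values.items():
        neighbours = [values[(col + dc, row + dr)]
                      for dc, dr in offsets if (col + dc, row + dr) in values]
        mean = sum(neighbours) / len(neighbours) if neighbours else None
        out[(col, row)] = {
            "graph_degree": len(neighbours),
            "neighbor_mean": mean,
            "self_minus_neighbor": None if mean is None else value - mean,
        }
    return out

# Toy 3x3 grid holding the values 1..9.
grid = {(c, r): float(c + 3 * r + 1) for c in range(3) for r in range(3)}
ctx = graph_context(grid)
```

In the toy grid the centre cell has eight neighbours and each corner has three, which is why graph_degree is itself exported as a column.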
Source scale: Deterministic gap-closure formulas over source-derived fields.
Exported scale: Probe, v2, v3, interval, confidence, priority, serviceable, acquirable, and power-transform inputs at grid-cell level.
Read as: These are outputs and audit fields, not independent input features for supervised training.
Source scale: Pipeline status strings and reason-code outputs.
Exported scale: Per-row flags that explain blockers, fallbacks, proxy status, and calibration state.
Read as: Status fields support audit and filtering; they should not be modeled as demand signals.
Full signal and score inventory
Column chips come from cell_features_geohg_style.csv, source_layer_cell_features_manifest.json, and all non-ID columns in tam_gap_closure_features.csv. Output/status groups are kept visible but are not independent training features.
centroid_lon, centroid_lat, pos_x_km, pos_y_km, pos_radius_km, city_grid_col, city_grid_row, city_cell_count, censuscode, grid_area_m2

nightlight_log1p_mean_2012, nightlight_log1p_mean_2019, nightlight_log1p_mean_2024, nightlight_mean_2012, nightlight_mean_2019, nightlight_mean_2024, nightlight_sum_2012, nightlight_sum_2019, nightlight_sum_2024, nightlight_mean_growth_2012_2024, nightlight_bin_id

census_hl_precision_level, census_hl_context_present, census_hl_housing_quality_score, census_hl_basic_amenity_score, census_hl_asset_affluence_score, census_hl_amenity_deficit_score, census_hl_housing_good_share, census_hl_roof_concrete_share, census_hl_wall_burnt_brick_or_concrete_share, census_hl_floor_finished_share, census_hl_rooms_3plus_share, census_hl_electricity_share, census_hl_latrine_share, census_hl_lpg_png_share, census_hl_banking_share, census_hl_tv_share, census_hl_computer_internet_share, census_hl_mobile_phone_share, census_hl_scooter_motorcycle_share, census_hl_car_share, census_hl_no_asset_share, census_hl_source_path, census_hl_affluence_context_score

Vulnerability, Hazard, Exposure, Risk, MaxArea, MaxFraction, flood_risk_bin_id

worldpop_density_people_per_km2, worldpop_density_log1p, worldpop_nearest_distance_km, worldpop_population_est_nearest, worldpop_households_est_avg_size_4_6

ghsl_population_est, ghsl_population_source_year, ghsl_builtup_share, ghsl_built_surface_m2, ghsl_built_volume_m3, ghsl_height_mean_m, ghsl_non_res_builtup_share, ghsl_non_res_volume_share, ghsl_settlement_score, ghsl_residential_candidate_share

census_control_population, census_control_households, households_est_primary_probe, household_size_census_context, households_est_worldpop_census_avg_size, denominator_confidence, source_disagreement_log_ratio, reconciled_population_probe, reconciled_households_probe, denominator_population_source_count, denominator_disagreement_score, impossible_market_flag, denominator_v2_status, population_prior_base, population_prior_lower, population_prior_upper, household_size_admin_v3, residential_allocation_weight, hard_exclusion_share_v3, non_residential_suppression_score_v3, vertical_density_correction, occupancy_correction, h_residential_households_unreconciled, admin_anchor_coverage_share, public_anchor_reconciliation_factor, public_anchor_calibration_error, public_anchor_reconciliation_status, h_residential_households_base, h_residential_households_lower, h_residential_households_upper, h_residential_denominator_confidence, household_denominator_v3_status, household_denominator_status

gobi_building_count_cell, gobi_building_area_sum_m2, gobi_building_area_share, gobi_building_area_density_km2, gobi_building_area_mean_m2, gobi_building_confidence_mean, msft_building_count_cell, msft_building_area_sum_m2, msft_building_area_share, msft_building_area_density_km2, msft_building_area_mean_m2, msft_building_height_mean_m, building_count_cell_best, building_area_sum_m2_best, building_count_density_per_km2_best, building_area_density_km2_best, building_area_share_best, building_source_coverage_count, building_source_coverage_flag, building_population_per_building, building_footprint_area_per_person_m2, building_msft_gobi_area_disagreement_log_ratio, building_msft_gobi_count_disagreement_log_ratio, building_msft_gobi_disagreement_flag, building_residential_density_score, building_cluster_compactness, building_height_mean_m, building_floor_count_proxy, building_volume_proxy_m3, building_vertical_density_proxy, building_residential_volume_proxy_m3, building_compactness_score

landcover_builtup_share, landcover_water_share, landcover_tree_forest_share, landcover_crop_share, landcover_non_residential_exclusion_share, dynamic_world_built_probability, residential_eligible_area_share, physical_hard_cell_flag, landcover_proxy_source_flag, hard_exclusion_share, soft_non_residential_downweight_share, hard_mask_reason_code, built_form_proxy_score, residential_confidence_probe, residential_eligible_area_share_v2, physical_exclusion_score, residential_filter_confidence, residential_filter_status, non_residential_exclusion_score, residential_status

morph_dense_old_city_score, morph_informal_dense_score, morph_highrise_affordable_score, morph_cbd_industrial_score, morph_periurban_vacant_score, morphology_proxy_source_flag

income_public_affluence_context_score, income_public_deprivation_context_score, income_public_asset_affluence_score, income_public_amenity_deficit_score, income_public_license_status_code, income_public_proxy_source_flag, income_gate_city_segment_code, income_gate_city_segment_prior_0_10lpa_prob, income_gate_city_prior_0_10lpa_prob, income_gate_district_income_pci, income_gate_district_income_affluence_score, income_gate_admin_affluence_score, income_gate_admin_deprivation_score, income_gate_nfhs_affluence_score, income_gate_nfhs_deprivation_score, income_gate_shrug_rwi_score, income_gate_shrug_consumption_score, income_gate_shrug_asset_affluence_score, income_gate_meta_rwi_raw, income_gate_meta_rwi_score, income_gate_meta_rwi_error, income_gate_cell_wealth_score, income_gate_external_affluence_score, income_gate_source_count, income_gate_confidence, income_gate_granularity_code, income_gate_granularity, income_gate_status, income_0_10lpa_prob_pre_gate_proxy, income_gate_prob_candidate, income_gate_final_weight, income_gate_adjustment_delta, income_0_10lpa_prob_probe, income_proxy_confidence, income_0_10lpa_prob_lower_v3, income_0_10lpa_prob_upper_v3, income_gate_context_affluence_score, base_affluence_proxy_score, affluence_proxy_score, income_proxy_status

pmgsy_road_nearest_distance_km, pmgsy_road_within_2km, pmgsy_road_source_available, pmgsy_road_distance_missing

poi_education_count_cell, poi_education_count_2km, poi_police_count_cell, poi_police_count_2km, poi_airport_count_cell, poi_airport_count_2km, poi_energy_count_cell, poi_energy_count_2km

mumbai_slum_area_m2, mumbai_slum_share, landuse_overlap_share

road_distance_m, road_intersection_density, settlement_cluster_size, addressability_score, poi_service_mix_score, osm_mapping_coverage_score, overture_road_coverage_score, conversion_proxy_source_flag, road_access_score, poi_service_access_score, serviceability_supply_friction_score, serviceable_prob_probe, serviceability_confidence, map_coverage_confidence, conversion_feasibility_score, conversion_feasibility_confidence, conversion_feasibility_status, serviceability_status

mlab_measurement_coverage_score, connectivity_readiness_score, execution_proxy_source_flag, execution_readiness_signal_count, execution_readiness_score, execution_readiness_confidence, execution_readiness_status

graph_ctx_neighbor_mean_gobi_building_count_cell, graph_ctx_self_minus_neighbor_gobi_building_count_cell, graph_ctx_neighbor_mean_gobi_building_area_sum_m2, graph_ctx_self_minus_neighbor_gobi_building_area_sum_m2, graph_ctx_neighbor_mean_gobi_building_area_share, graph_ctx_self_minus_neighbor_gobi_building_area_share, graph_ctx_neighbor_mean_gobi_building_area_density_km2, graph_ctx_self_minus_neighbor_gobi_building_area_density_km2, graph_ctx_neighbor_mean_gobi_building_area_mean_m2, graph_ctx_self_minus_neighbor_gobi_building_area_mean_m2, graph_ctx_neighbor_mean_gobi_building_confidence_mean, graph_ctx_self_minus_neighbor_gobi_building_confidence_mean, graph_ctx_neighbor_mean_msft_building_count_cell, graph_ctx_self_minus_neighbor_msft_building_count_cell, graph_ctx_neighbor_mean_msft_building_area_sum_m2, graph_ctx_self_minus_neighbor_msft_building_area_sum_m2, graph_ctx_neighbor_mean_msft_building_area_share, graph_ctx_self_minus_neighbor_msft_building_area_share, graph_ctx_neighbor_mean_msft_building_area_density_km2, graph_ctx_self_minus_neighbor_msft_building_area_density_km2, graph_ctx_neighbor_mean_msft_building_area_mean_m2, graph_ctx_self_minus_neighbor_msft_building_area_mean_m2, graph_ctx_neighbor_mean_msft_building_height_mean_m, graph_ctx_self_minus_neighbor_msft_building_height_mean_m, graph_ctx_neighbor_mean_building_count_cell_best, graph_ctx_self_minus_neighbor_building_count_cell_best, graph_ctx_neighbor_mean_building_area_share_best, graph_ctx_self_minus_neighbor_building_area_share_best, graph_ctx_neighbor_mean_building_area_density_km2_best, graph_ctx_self_minus_neighbor_building_area_density_km2_best, graph_ctx_neighbor_mean_building_residential_density_score, graph_ctx_self_minus_neighbor_building_residential_density_score, graph_ctx_neighbor_mean_building_population_per_building, graph_ctx_self_minus_neighbor_building_population_per_building, graph_ctx_neighbor_mean_building_footprint_area_per_person_m2, graph_ctx_self_minus_neighbor_building_footprint_area_per_person_m2, graph_ctx_neighbor_mean_building_source_coverage_flag, graph_ctx_self_minus_neighbor_building_source_coverage_flag, graph_ctx_neighbor_mean_building_msft_gobi_disagreement_flag, graph_ctx_self_minus_neighbor_building_msft_gobi_disagreement_flag, graph_ctx_neighbor_mean_ghsl_population_est, graph_ctx_self_minus_neighbor_ghsl_population_est, graph_ctx_neighbor_mean_ghsl_builtup_share, graph_ctx_self_minus_neighbor_ghsl_builtup_share, graph_ctx_neighbor_mean_ghsl_settlement_score, graph_ctx_self_minus_neighbor_ghsl_settlement_score, graph_ctx_neighbor_mean_landcover_builtup_share, graph_ctx_self_minus_neighbor_landcover_builtup_share, graph_ctx_neighbor_mean_landcover_water_share, graph_ctx_self_minus_neighbor_landcover_water_share, graph_ctx_neighbor_mean_landcover_tree_forest_share, graph_ctx_self_minus_neighbor_landcover_tree_forest_share, graph_ctx_neighbor_mean_landcover_crop_share, graph_ctx_self_minus_neighbor_landcover_crop_share, graph_ctx_neighbor_mean_landcover_non_residential_exclusion_share, graph_ctx_self_minus_neighbor_landcover_non_residential_exclusion_share, graph_ctx_neighbor_mean_residential_eligible_area_share, graph_ctx_self_minus_neighbor_residential_eligible_area_share, graph_ctx_neighbor_mean_physical_hard_cell_flag, graph_ctx_self_minus_neighbor_physical_hard_cell_flag, graph_ctx_neighbor_mean_road_distance_m, graph_ctx_self_minus_neighbor_road_distance_m, graph_ctx_neighbor_mean_road_intersection_density, graph_ctx_self_minus_neighbor_road_intersection_density, graph_ctx_neighbor_mean_settlement_cluster_size, graph_ctx_self_minus_neighbor_settlement_cluster_size, graph_ctx_neighbor_mean_building_cluster_compactness, graph_ctx_self_minus_neighbor_building_cluster_compactness, graph_ctx_neighbor_mean_addressability_score, graph_ctx_self_minus_neighbor_addressability_score, graph_ctx_neighbor_mean_poi_service_mix_score, graph_ctx_self_minus_neighbor_poi_service_mix_score, graph_ctx_neighbor_mean_osm_mapping_coverage_score, graph_ctx_self_minus_neighbor_osm_mapping_coverage_score, graph_ctx_neighbor_mean_mlab_measurement_coverage_score, graph_ctx_self_minus_neighbor_mlab_measurement_coverage_score, graph_ctx_neighbor_mean_connectivity_readiness_score, graph_ctx_self_minus_neighbor_connectivity_readiness_score, graph_ctx_neighbor_mean_poi_education_count_2km, graph_ctx_self_minus_neighbor_poi_education_count_2km, graph_ctx_neighbor_mean_poi_police_count_2km, graph_ctx_self_minus_neighbor_poi_police_count_2km, graph_ctx_neighbor_mean_poi_airport_count_2km, graph_ctx_self_minus_neighbor_poi_airport_count_2km, graph_ctx_neighbor_mean_poi_energy_count_2km, graph_ctx_self_minus_neighbor_poi_energy_count_2km, graph_ctx_neighbor_mean_nightlight_log1p_mean_2024, graph_ctx_self_minus_neighbor_nightlight_log1p_mean_2024, graph_ctx_neighbor_mean_nightlight_mean_growth_2012_2024, graph_ctx_self_minus_neighbor_nightlight_mean_growth_2012_2024, graph_ctx_neighbor_mean_Risk, graph_ctx_self_minus_neighbor_Risk, graph_ctx_neighbor_mean_MaxFraction, graph_ctx_self_minus_neighbor_MaxFraction, graph_ctx_neighbor_mean_mumbai_slum_share, graph_ctx_self_minus_neighbor_mumbai_slum_share, graph_ctx_neighbor_mean_worldpop_density_people_per_km2, graph_ctx_self_minus_neighbor_worldpop_density_people_per_km2, graph_ctx_neighbor_mean_worldpop_population_est_nearest, graph_ctx_self_minus_neighbor_worldpop_population_est_nearest, graph_ctx_neighbor_mean_pmgsy_road_nearest_distance_km, graph_ctx_self_minus_neighbor_pmgsy_road_nearest_distance_km, graph_ctx_neighbor_mean_pmgsy_road_within_2km, graph_ctx_self_minus_neighbor_pmgsy_road_within_2km, graph_ctx_neighbor_mean_pos_radius_km, graph_ctx_self_minus_neighbor_pos_radius_km, graph_degree

gross_tam_0_10lpa_probe, serviceable_tam_0_10lpa_probe, acquirable_tam_0_10lpa_probe, households_denominator_v2, eligible_households_v2, gross_tam_0_10lpa_v2, serviceable_tam_0_10lpa_v2, acquirable_tam_0_10lpa_v2, households_residential_v3, households_residential_v3_lower, households_residential_v3_upper, scope_share_in_scope_v3, scope_share_status_v3, gross_tam_0_10lpa_v3, gross_tam_0_10lpa_v3_lower, gross_tam_0_10lpa_v3_upper, serviceable_tam_0_10lpa_v3, acquirable_tam_0_10lpa_v3, component_confidence_score, priority_score_0_100, component_confidence_score_v2, priority_score_v2_0_100, tam_v2_status, component_confidence_score_v3, priority_score_v3_0_100, tam_v3_status, calibration_status, reason_codes

Actual scoring math
The scoring path is deterministic and ordered: households first, then residential likelihood, income-band probability, conversion/serviceability, execution readiness, v2/v3 TAM outputs, and the no-vendor power score used by maps. The exact formulas come from outputs/tam_gap_closure/tam_gap_closure_manifest.json; vendor TAM and G1 are excluded from all formulas.
households_est_primary_probe
first_non_null(households_est_worldpop_census_avg_size, worldpop_households_est_avg_size_4_6, households_est_uniform_district_density, 0), clipped at lower bound 0
Intuition: Estimate how many households live in the cell before any TAM probability is applied.
- clip(census_2011_avg_household_size, 3.0, 7.5), with missing filled as 4.6
- worldpop_population_est_nearest / household_size_census_context
- clip(0.35*population_quality + 0.25*distance_quality + 0.25*census_quality + 0.15*disagreement_quality, 0, 1)
Guardrail: WorldPop and Census district context are independent inputs; vendor TAM and G1 are not read.
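The household step above can be sketched in a few lines. The formulas are copied from the list; which column each formula feeds is inferred from the identifiers it mentions, and the function names and example inputs are illustrative only.

```python
import math

def clip(x, lo, hi):
    return max(lo, min(hi, x))

def first_non_null(*values):
    """Return the first value that is neither None nor NaN."""
    for v in values:
        if v is not None and not (isinstance(v, float) and math.isnan(v)):
            return v
    return None

def household_size_context(census_2011_avg_household_size):
    # Missing census size falls back to 4.6; otherwise clip to [3.0, 7.5].
    if census_2011_avg_household_size is None:
        return 4.6
    return clip(census_2011_avg_household_size, 3.0, 7.5)

def denominator_confidence(population_quality, distance_quality, census_quality, disagreement_quality):
    return clip(0.35 * population_quality + 0.25 * distance_quality
                + 0.25 * census_quality + 0.15 * disagreement_quality, 0.0, 1.0)

# Illustrative cell: WorldPop population 23,000, no census household size available.
size = household_size_context(None)
households_worldpop_census = 23000.0 / size
households_primary = max(0.0, first_non_null(households_worldpop_census, None, None, 0))
```

The trailing 0 in first_non_null means a cell with no usable source still gets a defined, zero household count rather than a missing value.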
residential_confidence_probe
min(raw_residential_confidence, 0.84 if building_source_coverage_flag > 0 else 0.80)
Intuition: Down-weight cells that look industrial, sparse, or weakly residential before counting them as addressable households.
- clip(0.50*city_rank(worldpop_density_people_per_km2) + 0.20*city_rank(poi_context) + 0.30*city_rank(building_residential_density_score), 0, 1)
- clip(5.0 * mumbai_slum_share, 0, 1)
- 0.35 when airport_or_energy_poi_count > 0 and worldpop_density_city_rank < 0.35, else 0.0
Guardrail: Capped because land-cover and true residential masks are not production-complete yet.
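A sketch of the listed residential components and the final cap. How the three components combine into raw_residential_confidence is not stated above, so the example deliberately stops short of that step; the function names are hypothetical.

```python
def clip(x, lo, hi):
    return max(lo, min(hi, x))

def density_poi_building_component(rank_worldpop, rank_poi, rank_building):
    """Weighted blend of per-city ranks, each already in [0, 1]."""
    return clip(0.50 * rank_worldpop + 0.20 * rank_poi + 0.30 * rank_building, 0.0, 1.0)

def slum_signal(mumbai_slum_share):
    return clip(5.0 * mumbai_slum_share, 0.0, 1.0)

def industrial_penalty(airport_or_energy_poi_count, worldpop_density_city_rank):
    # Penalty applies only to low-density cells that host airport or energy POIs.
    if airport_or_energy_poi_count > 0 and worldpop_density_city_rank < 0.35:
        return 0.35
    return 0.0

def residential_confidence_probe(raw_residential_confidence, building_source_coverage_flag):
    cap = 0.84 if building_source_coverage_flag > 0 else 0.80
    return min(raw_residential_confidence, cap)
```

For example, a raw confidence of 0.95 is written as 0.84 when a building source covers the cell and 0.80 otherwise, while 0.655 passes through unchanged.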
income_0_10lpa_prob_probe
blend pre-gate and gate candidate with weight clip(0.40 + 0.34*income_gate_confidence, 0, 0.74) when gate prior is available
Intuition: Treat 0-10 LPA as broad affordability eligibility: brighter, denser, more asset-rich cells are less likely to be in the lower band, while HLPCA amenity deficit increases the probability.
- clip(0.50*city_rank(nightlight_log1p_mean_2024) + 0.25*city_rank(poi_context) + 0.15*city_rank(worldpop_density_people_per_km2) - 0.10*slum_residential_signal, 0, 1)
- when HLPCA present: clip(0.45*census_hl_asset_affluence_score + 0.25*census_hl_housing_quality_score + 0.20*census_hl_basic_amenity_score + 0.10*(1 - census_hl_amenity_deficit_score), 0, 1)
- legacy base/HLPCA affluence blended toward income_gate_context_affluence_score with weight clip(0.20 + 0.32*income_gate_confidence, 0, 0.52)
- clip(0.30 + 0.20*nightlight_present + 0.16*census_hl_context_present + 0.18*income_gate_confidence + 0.06*poi_context_positive + 0.05*slum_signal_positive, 0, 0.86)
Guardrail: HLPCA is 2011 district context from a scraped mirror and stays probe-only until official Houselisting validation and MPCE calibration pass.
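The gate blend can be illustrated as a convex combination. The weight formula and the HLPCA affluence formula are taken from the list above; treating the weight as the share given to the gate candidate, and falling back to the pre-gate value when no gate prior exists, are assumptions of this sketch.

```python
def clip(x, lo, hi):
    return max(lo, min(hi, x))

def gate_weight(income_gate_confidence):
    return clip(0.40 + 0.34 * income_gate_confidence, 0.0, 0.74)

def blended_income_prob(pre_gate_prob, gate_candidate_prob, income_gate_confidence, gate_available=True):
    """Blend the pre-gate proxy with the gate candidate (assumed: weight goes to the gate)."""
    if not gate_available:
        return pre_gate_prob
    w = gate_weight(income_gate_confidence)
    return (1.0 - w) * pre_gate_prob + w * gate_candidate_prob

def hlpca_affluence(asset_affluence, housing_quality, basic_amenity, amenity_deficit):
    return clip(0.45 * asset_affluence + 0.25 * housing_quality
                + 0.20 * basic_amenity + 0.10 * (1.0 - amenity_deficit), 0.0, 1.0)

# Illustrative inputs only.
example_prob = blended_income_prob(0.60, 0.80, 0.5)
```

Because the weight is capped at 0.74, even a fully confident gate leaves at least 26% of the result with the pre-gate proxy.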
serviceable_prob_probe
clip(0.35 + 0.55*serviceability_supply_friction_score, 0.20, 0.90)
Intuition: Convert public road, POI, and graph accessibility into a rough serviceability multiplier.
- if pmgsy_road_source_available > 0 then 1 - clip(pmgsy_road_nearest_distance_km / 5.0, 0, 1), else 0.40
- clip(0.45*road_access_score + 0.35*city_rank(poi_context) + 0.20*city_rank(graph_degree), 0, 1)
- clip(0.55*map_coverage_confidence + 0.45*poi_context_present, 0, 0.80)
Guardrail: Internal branch, capacity, partner, cost, and operations coverage are not present, so this remains a probe.
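A minimal sketch of the serviceability chain, using the formulas listed above. Mapping each formula to road_access_score, serviceability_supply_friction_score, and serviceability_confidence is inferred from the column names, and the function names are illustrative.

```python
def clip(x, lo, hi):
    return max(lo, min(hi, x))

def road_access_score(nearest_distance_km, source_available):
    # Without a PMGSY layer the score falls back to a neutral 0.40.
    if not source_available:
        return 0.40
    return 1.0 - clip(nearest_distance_km / 5.0, 0.0, 1.0)

def supply_friction_score(road_access, rank_poi_context, rank_graph_degree):
    return clip(0.45 * road_access + 0.35 * rank_poi_context + 0.20 * rank_graph_degree, 0.0, 1.0)

def serviceable_prob_probe(friction_score):
    return clip(0.35 + 0.55 * friction_score, 0.20, 0.90)

def serviceability_confidence(map_coverage_confidence, poi_context_present):
    return clip(0.55 * map_coverage_confidence + 0.45 * poi_context_present, 0.0, 0.80)

# Illustrative cell: road 2.5 km away, mid-ranked POI context and graph degree.
access = road_access_score(2.5, True)
friction = supply_friction_score(access, 0.5, 0.5)
prob = serviceable_prob_probe(friction)
```

With a friction score in [0, 1] the probability lands between 0.35 and 0.90, so in practice only the upper clip bound can bind.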
gross_v2, predicted_power, serviceable, acquirable, priority_v2
h_residential_households_base * income_0_10lpa_prob_probe * scope_share_in_scope_v3 | predicted_tam_0_10lpa_power = globally rescaled power(gross_tam_0_10lpa_v3, gamma=0.60), preserving gross TAM total | gross_tam_0_10lpa_v3 * conversion_feasibility_score | serviceable_tam_0_10lpa_v3 * execution_readiness_score | clip(100 * city_rank(acquirable_tam_0_10lpa_v3) * component_confidence_score_v3, 0, 100)
Intuition: The current map score is not a learned vendor-TAM model: it transforms the no-vendor gross v3 TAM surface into a TAM-like score while preserving the base total.
- city_rank(x) = pandas groupby(city).rank(pct=True, method='average'), then fill missing with 0.5 and clip to [0, 1]
- clip(0.35*denominator_confidence + 0.25*residential_confidence_probe + 0.20*income_proxy_confidence + 0.20*serviceability_confidence, 0, 1)
- legacy probe replay remains: clip(households_est_primary_probe * income_0_10lpa_prob_probe * residential_confidence_probe, lower=0) | gross_tam_0_10lpa_probe * serviceable_prob_probe | clip(100 * city_rank(serviceable_tam_0_10lpa_probe) * component_confidence_score, 0, 100)
Guardrail: Vendor TAM and G1 are benchmark-only after formulas are frozen; the power score uses no vendor scaling.
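The city_rank helper and the confidence-weighted priority can be sketched without pandas. The pure-Python ranking below is intended to mirror groupby(city).rank(pct=True, method='average') followed by a 0.5 fill and a [0, 1] clip; it is a stand-in written for illustration, and the function names are not taken from the pipeline.

```python
from collections import defaultdict

def clip(x, lo, hi):
    return max(lo, min(hi, x))

def city_rank(cities, values):
    """Per-city percentile rank with average ties; missing values become 0.5."""
    groups = defaultdict(list)
    for i, (city, value) in enumerate(zip(cities, values)):
        if value is not None:
            groups[city].append((value, i))
    out = [0.5] * len(values)
    for members in groups.values():
        members.sort()
        n, j = len(members), 0
        while j < n:
            k = j
            while k + 1 < n and members[k + 1][0] == members[j][0]:
                k += 1
            avg_rank = (j + k) / 2 + 1  # 1-based average rank of the tie block
            for m in range(j, k + 1):
                out[members[m][1]] = clip(avg_rank / n, 0.0, 1.0)
            j = k + 1
    return out

def component_confidence(denominator, residential, income, serviceability):
    return clip(0.35 * denominator + 0.25 * residential + 0.20 * income + 0.20 * serviceability, 0.0, 1.0)

def priority_score(rank, confidence):
    return clip(100.0 * rank * confidence, 0.0, 100.0)

ranks = city_rank(["A", "A", "A", "B", "B"], [10, 20, 20, None, 5])
confidence = component_confidence(0.833, 0.655, 0.842, 0.643)
```

Ranking within each city means a priority score compares a cell with its own city's cells only, so scores are not directly comparable across cities.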
grid_3335, Delhi: values below are recomputed from the CSV row, without benchmark labels.
22324.6 households * 0.674 income * 0.655 residential = 9861.7
9861.7 gross * 0.671 serviceable = 6621.0
0.35*0.833 + 0.25*0.655 + 0.20*0.842 + 0.20*0.643 = 0.752
100 * city_rank(6621.0) * 0.752 = 75.2
Current v3 math funnel dry run
Dry runs now follow the current v3 score family.
The table below recomputes each displayed step from generated artifacts. The power step uses policy global_no_vendor_base_total_preserved; the scale constant is recomputed inside the artifact being scored, so vendor-grid and full-India runs can have different constants.
Vendor TAM and G1 are not inputs to any row in this funnel; they remain benchmark-only diagnostics later in the page.
| grid | city | funnel step | dry-run math | computed | artifact | delta | status |
|---|---|---|---|---|---|---|---|
| grid_636 | Saharanpur | residential households v3 | source-derived H_residential | 11041.102687 | 11041.102687 | 0 | pass |
| grid_636 | Saharanpur | gross v3 TAM | 11041.10 * income_prob 0.8432 | 9309.671076 | 9309.671076 | 0 | pass |
| grid_636 | Saharanpur | serviceable v3 TAM | 9309.67 * conversion 0.8834 | 8223.825574 | 8223.825574 | 0 | pass |
| grid_636 | Saharanpur | acquirable v3 TAM | 8223.83 * execution 0.7772 | 6391.550980 | 6391.550980 | 0 | pass |
| grid_636 | Saharanpur | priority v3 score | 100 * city_rank 1.0000 * confidence 0.7261 | 72.605000 | 72.605000 | 0 | pass |
| grid_636 | Saharanpur | predicted power TAM | gross_v3^0.60 * artifact_scale 21.8950 | 5268.734394 | 4528.074581 | 740.66 | review |
| grid_639 | Saharanpur | residential households v3 | source-derived H_residential | 9879.815405 | 9879.815405 | 0 | pass |
| grid_639 | Saharanpur | gross v3 TAM | 9879.82 * income_prob 0.8423 | 8321.867968 | 8321.867968 | 0 | pass |
| grid_639 | Saharanpur | serviceable v3 TAM | 8321.87 * conversion 0.9019 | 7505.308004 | 7505.308004 | 0 | pass |
| grid_639 | Saharanpur | acquirable v3 TAM | 7505.31 * execution 0.7800 | 5854.140243 | 5854.140243 | 0 | pass |
| grid_639 | Saharanpur | priority v3 score | 100 * city_rank 0.9859 * confidence 0.7261 | 71.582394 | 71.582394 | 0 | pass |
| grid_639 | Saharanpur | predicted power TAM | gross_v3^0.60 * artifact_scale 21.8950 | 4925.816125 | 4197.738645 | 728.077 | review |
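The grid_636 rows can be replayed with a few multiplications. This is a sketch under stated assumptions: the displayed factors are rounded to four decimals, so the replay only matches the table to about four significant figures; the scope share is taken as 1.0 because the table shows no scope factor; and the pass/review tolerance is hypothetical.

```python
import math

def replay_funnel(h_residential, income_prob, conversion, execution,
                  city_rank, confidence, gamma=0.60, artifact_scale=21.8950,
                  scope_share=1.0):
    """Recompute the v3 funnel steps for one cell from its displayed factors."""
    gross = h_residential * income_prob * scope_share
    serviceable = gross * conversion
    acquirable = serviceable * execution
    priority = max(0.0, min(100.0, 100.0 * city_rank * confidence))
    power = gross ** gamma * artifact_scale
    return {"gross": gross, "serviceable": serviceable, "acquirable": acquirable,
            "priority": priority, "power": power}

def replay_status(computed, artifact, tol=1e-6):
    """'pass' when the recomputed and written values agree within tol (tol is an assumption)."""
    return "pass" if math.isclose(computed, artifact, rel_tol=0.0, abs_tol=tol) else "review"

row = replay_funnel(11041.102687, 0.8432, 0.8834, 0.7772, 1.0, 0.7261)
power_delta = 5268.734394 - 4528.074581
```

The two review rows are the only ones where the recomputed power value and the written artifact disagree, which is consistent with the scale constant being recomputed per artifact.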
Formula replay dry run
This recomputes final probe arithmetic from the generated CSV values without rerunning feature builders or reading benchmark labels. A pass means the HTML-visible math matches the written scoring columns.
| grid | city | output | recomputed | written | absolute delta | status |
|---|---|---|---|---|---|---|
| grid_3335 | Delhi | gross_tam_0_10lpa_probe | 9861.666591 | 9861.666591 | 0 | pass |
| grid_3335 | Delhi | serviceable_tam_0_10lpa_probe | 6621.000371 | 6621.000371 | 0 | pass |
| grid_3335 | Delhi | component_confidence_score | 0.752394 | 0.752394 | 0 | pass |
| grid_3335 | Delhi | priority_score_0_100 | 75.239389 | 75.239389 | 0 | pass |
| grid_3333 | Delhi | gross_tam_0_10lpa_probe | 8933.315395 | 8933.315395 | 0 | pass |
| grid_3333 | Delhi | serviceable_tam_0_10lpa_probe | 6290.567568 | 6290.567568 | 0 | pass |
| grid_3333 | Delhi | component_confidence_score | 0.748927 | 0.748927 | 0 | pass |
| grid_3333 | Delhi | priority_score_0_100 | 74.845875 | 74.845875 | 0 | pass |
| grid_3291 | Delhi | gross_tam_0_10lpa_probe | 9592.349986 | 9592.349986 | 0 | pass |
| grid_3291 | Delhi | serviceable_tam_0_10lpa_probe | 6070.121154 | 6070.121154 | 0 | pass |
| grid_3291 | Delhi | component_confidence_score | 0.747012 | 0.747012 | 0 | pass |
| grid_3291 | Delhi | priority_score_0_100 | 74.607896 | 74.607896 | 0 | pass |
| grid_3328 | Delhi | gross_tam_0_10lpa_probe | 8165.752869 | 8165.752869 | 0 | pass |
| grid_3328 | Delhi | serviceable_tam_0_10lpa_probe | 5605.253802 | 5605.253802 | 0 | pass |
| grid_3328 | Delhi | component_confidence_score | 0.749787 | 0.749787 | 0 | pass |
| grid_3328 | Delhi | priority_score_0_100 | 74.838171 | 74.838171 | 0 | pass |
| grid_1176 | Mumbai | gross_tam_0_10lpa_probe | 7858.588816 | 7858.588816 | 0 | pass |
| grid_1176 | Mumbai | serviceable_tam_0_10lpa_probe | 5571.061529 | 5571.061529 | 0 | pass |
| grid_1176 | Mumbai | component_confidence_score | 0.770855 | 0.770855 | 0 | pass |
| grid_1176 | Mumbai | priority_score_0_100 | 77.085534 | 77.085534 | 0 | pass |
Building Footprints
The HTML now reflects the newer Microsoft and combined building-density code path, while making the current artifact coverage explicit.
Microsoft is wired in.
Stage 1 can select AOI shards and Stage 2 has Microsoft and combined building fields.
Microsoft coverage is active.
The current Microsoft manifest contributes footprint coverage to the combined signal.
| source | status | coverage |
|---|---|---|
| Google Open Buildings | ok | 9.3% |
| Microsoft Global Buildings | ok | 91.6% |
| Combined building signal | ok | 95.8% |
Building-related columns
gobi_building_count_cell, gobi_building_area_sum_m2, gobi_building_area_share, gobi_building_area_density_km2, gobi_building_area_mean_m2, gobi_building_confidence_mean, msft_building_count_cell, msft_building_area_sum_m2, msft_building_area_share, msft_building_area_density_km2, msft_building_area_mean_m2, msft_building_height_mean_m, building_count_cell_best, building_area_sum_m2_best, building_count_density_per_km2_best, building_area_density_km2_best, building_area_share_best, building_source_coverage_count, building_residential_density_score, building_source_coverage_flag, building_population_per_building, building_footprint_area_per_person_m2, building_msft_gobi_disagreement_flag, building_height_mean_m, building_floor_count_proxy, building_volume_proxy_m3, building_vertical_density_proxy, building_residential_volume_proxy_m3, building_compactness_score, building_cluster_compactness
24 shards and expected 1617.8 MB; the run status is ok. Current combined building coverage reflects available Google and Microsoft footprint signals.
Predicted TAM Layer
The newer pipeline promotes one explicit score column for maps and summaries: predicted_tam_0_10lpa_power. It is a deterministic transform of the no-vendor gross TAM base, not a learned vendor-TAM model.
Magnitude stays count-like.
The transform reshapes the distribution to improve benchmark-grid Pearson agreement while preserving Spearman order and preserving the source-derived gross-TAM total.
The selected score is used consistently by the map, full-India scorer, notebook metric summary, and post-hoc G1 suite.
predicted_tam_0_10lpa_power: Selected score used by map, full-India score index, notebook summary, and post-hoc suite.
gross_tam_0_10lpa_v3: Source-derived gross TAM base. Vendor TAM is not used for scaling.
Fixed monotone power exponent selected by Cell-2 transform diagnostics.
global_no_vendor_base_total_preserved: Power weights are rescaled to the no-vendor base total, not to a vendor mean.
Benchmark-grid sanity metric only.
Order-only ceiling diagnostic; not TAM magnitude.
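The total-preserving power transform can be sketched as follows. The exponent 0.60 and the rescale-to-base-total policy come from the text; clipping negative inputs to zero and the function name are assumptions, and the input values are illustrative.

```python
def power_rescale(gross, gamma=0.60):
    """Apply x**gamma, then rescale so the output sums to the input total.

    Returns (predicted_values, scale_constant).
    """
    base = [max(g, 0.0) for g in gross]       # assumption: negatives treated as 0
    weights = [b ** gamma for b in base]
    total_weight = sum(weights)
    if total_weight == 0.0:
        return [0.0] * len(gross), 0.0
    scale = sum(base) / total_weight
    return [w * scale for w in weights], scale

gross_values = [0.0, 16.2, 81.99, 29119.34]
predicted, scale = power_rescale(gross_values)
```

Because x**0.60 is monotone, the cell ordering (and so any Spearman statistic) is unchanged, while the single global scale moves mass from the largest cells toward the smaller ones. The scale depends on the set of cells being scored, which is why different artifacts can carry different constants.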
Cell-2 transform decision
| field | value | status |
|---|---|---|
| primary_map_mechanism | power_060_scaled_gross_tam_v2 | current |
| rank_ceiling_mechanism | rank_pct_for_order_diagnostic_only | current |
| rejected_primary_mechanism | vendor_mean_scaled_rank_pct | current |
| map_predicted_vs_vendor_pearson | 0.7881785302692829 | current |
| map_predicted_vs_vendor_spearman | 0.8349283824789576 | current |
| map_predicted_vs_g1_spearman | 0.7469475181022232 | current |
| top10_predicted_power_g1_capture | 0.46703212689927004 | current |
| top10_vendor_tam_g1_capture | 0.42434916553876123 | current |
| city_holdout_invalidates_full_caught_up_claim | True | current |
Headline power checks
| check | value | note |
|---|---|---|
| predicted_vs_vendor_pearson | 0.788179 | headline check |
| predicted_vs_vendor_spearman | 0.834928 | headline check |
| rank_pct_transform_ceiling_r2 | 0.557931 | headline check |
| full_india_scored_cells | 2,905,288 | headline check |
| vendor_comparison_cells | 7,029 | headline check |
Holdout transform comparison
| group | mechanism | groups | metric 2 R2 median | metric 3 R2 median | caught-up median |
|---|---|---|---|---|---|
| city | identity | 29 | 0.359 | 0.400 | 0.969 |
| city | log1p | 29 | 0.614 | 0.592 | 1.041 |
| city | power_060 | 29 | 0.648 | 0.654 | 0.992 |
| city | rank_pct | 29 | 0.649 | 0.659 | 1.005 |
| city | sqrt1p | 29 | 0.531 | 0.538 | 0.953 |
| city | yeo_johnson | 29 | 0.648 | 0.654 | 0.992 |
| spatial_block | identity | 11 | 0.310 | 0.321 | 0.998 |
| spatial_block | log1p | 11 | 0.543 | 0.454 | 1.097 |
| spatial_block | power_060 | 11 | 0.584 | 0.549 | 1.031 |
| spatial_block | rank_pct | 11 | 0.583 | 0.541 | 1.040 |
| spatial_block | sqrt1p | 11 | 0.430 | 0.415 | 1.071 |
| spatial_block | yeo_johnson | 11 | 0.584 | 0.549 | 1.031 |
Top-k G1 capture from transform report
| score | top fraction | grid count | G1 capture | lift |
|---|---|---|---|---|
| predicted_power_tam | 5.0% | 352 | 23.0% | 4.60 |
| vendor_tam | 5.0% | 352 | 24.3% | 4.86 |
| predicted_power_tam | 10.0% | 703 | 46.7% | 4.67 |
| vendor_tam | 10.0% | 703 | 42.4% | 4.24 |
| predicted_power_tam | 20.0% | 1,406 | 71.2% | 3.56 |
| vendor_tam | 20.0% | 1,406 | 67.5% | 3.38 |
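The capture and lift columns can be reproduced from a score vector and post-hoc hit counts. In this sketch the grid count is the ceiling of fraction times cell count and lift is capture divided by the top fraction; both rules are inferred because they reproduce the table (for example 352 of 7,029 cells at 5%, and 23.0% / 5% = 4.60), not because the report states them.

```python
import math

def top_k_count(n_cells, fraction):
    """Number of cells in the top fraction (ceiling rule, inferred from the table)."""
    return math.ceil(fraction * n_cells)

def capture_and_lift(scores, g1_hits, fraction):
    """Share of all G1 hits that fall in the top-scored fraction, and its lift."""
    k = top_k_count(len(scores), fraction)
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    captured = sum(g1_hits[i] for i in order[:k])
    capture = captured / sum(g1_hits)
    return capture, capture / fraction

# Toy data: ten cells, ten hits in total.
toy_scores = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
toy_hits = [5, 0, 3, 0, 0, 1, 0, 0, 1, 0]
capture, lift = capture_and_lift(toy_scores, toy_hits, 0.20)
```

A lift of 1.0 would mean the score does no better than picking cells at random, so lift makes rows at different top fractions comparable.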
True. This section documents the chosen display/scoring layer; it does not authorize production accuracy claims.
Full-India Map & Scores
The new full-India path separates geometry indexing from scoring. Geometry is built from the 0.01-degree India grid; scoring then writes a source-derived score CSV, compact score index, and manifest.
Every indexed 0.01-degree India cell scored in the current manifest.
Degrees per cell; about 1.1 km north-south.
Groups from the district/city context join.
Boundary grid candidates before India-boundary filtering.
Compact row-major map index dimensions.
No-vendor gross-TAM total preserved after the power transform.
Full-India score distributions
| score column | non-null cells | total | p50 | p90 | max | meaning |
|---|---|---|---|---|---|---|
| predicted_tam_0_10lpa_power | 2,905,288 | 118,559,058 | 30.56 | 80.87 | 2741.69 | probe distribution |
| gross_tam_0_10lpa_v3 | 2,905,288 | 118,559,058 | 16.20 | 81.99 | 29119.34 | probe distribution |
| serviceable_tam_0_10lpa_v3 | 2,905,288 | 48,898,077 | 6.22 | 32.07 | 14441.63 | probe distribution |
| acquirable_tam_0_10lpa_v3 | 2,905,288 | 8,153,552 | 1.04 | 5.35 | 2407.43 | probe distribution |
| priority_score_v3_0_100 | 2,905,288 | 86,033,814.6 | 28.87 | 55.03 | 68.68 | probe distribution |
Full-India source status
| source family | current status detail | mode |
|---|---|---|
| grid_index | {"cell_count":2905288} | source-derived |
| district_context | {"district_join_share":1.0} | source-derived |
| worldpop_population | {"coverage_share":1.0,"median_nearest_distance_km":0.3567097458094222,"p95_nearest_distance_km":0.5752521808981806,"source_rows":4010402} | source-derived |
| poi_context | {"education_source_points":19502,"police_source_points":16459,"airport_source_points":2710,"energy_source_points":534} | source-derived |
| formula_inputs | {"building_footprints":"not_rebuilt_for_direct_full_india_scoring_zero_coverage_flags_used","pmgsy_roads":"not_rebuilt_for_direct_full_india_scoring_missing_state_fallback_used","graph_degree":"regular_grid_degree_8_proxy_for_full_india_direct_scoring"} | source-derived |
Score pass/fail checks
| check | value |
|---|---|
| all_grid_cells_scored | True |
| grid_step_is_0_01_degree | True |
| vendor_tam_used_as_feature | False |
| score_index_written | True |
Geometry-index pass/fail checks
| check | value |
|---|---|
| grid_index_written | True |
| all_grid_cells_indexed | True |
| grid_step_is_0_01_degree | True |
| vendor_tam_used_as_production_feature | False |
| coarse_0_05_grid_published | False |
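
The geometry index is described as compact and row-major. A minimal sketch of row-major addressing on a regular 0.01-degree grid is shown here. The helper names, the grid origin, the column count, and the choice of the south-west corner as row 0 / column 0 are all assumptions; the real layout is defined by the grid manifest.

```python
def cell_to_flat(row, col, n_cols):
    """Flatten (row, col) into a single row-major offset."""
    return row * n_cols + col


def flat_to_cell(flat_index, n_cols):
    """Invert cell_to_flat."""
    return divmod(flat_index, n_cols)


def cell_origin(row, col, min_lon, min_lat, step_deg=0.01):
    """South-west corner of a cell, assuming row 0 / col 0 sits at (min_lon, min_lat)."""
    return (min_lon + col * step_deg, min_lat + row * step_deg)


# Hypothetical dimensions and origin, for illustration only.
N_COLS = 10
flat = cell_to_flat(2, 3, N_COLS)
row, col = flat_to_cell(flat, N_COLS)
lon, lat = cell_origin(row, col, min_lon=68.0, min_lat=6.0)
print(flat, (row, col), (lon, lat))
```

Storing one value per flat offset avoids repeating coordinates for every cell, since the position can be recomputed from the offset and the grid dimensions.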
vendor_tam_available_for_full_india as False and calibration status as probe_not_production_calibrated.
Diagnostics Gate
Stage 3 runs current benchmark diagnostics and claim-boundary checks, but it does not train, tune, calibrate, or fit against vendor TAM or G1.
Benchmark-only rows join.
Probe, v2, and v3 component columns are compared with vendor TAM for audit and sanity checks. These rows are not the current map-score metric.
No production model metric.
The current diagnostics explicitly block accuracy claims until valid holdouts exist. The headline map score remains predicted_tam_0_10lpa_power, reported in the Predicted TAM and Notebook sections.
Component probe/v2 vs vendor checks
gross_tam_0_10lpa_probe. Treat it as component QA; the current map-score Pearson/Spearman values are reported separately for predicted_tam_0_10lpa_power.
| candidate | n | Pearson r | Spearman r | WMAPE | Top-10 overlap | validity |
|---|---|---|---|---|---|---|
| gross_tam_0_10lpa_probe | 7,029 | 0.731 | 0.797 | 0.558 | 0.560 | benchmark only |
| serviceable_tam_0_10lpa_probe | 7,029 | 0.735 | 0.805 | 0.646 | 0.555 | benchmark only |
| priority_score_0_100 | 7,029 | 0.607 | 0.749 | 0.978 | 0.408 | benchmark only |
| gross_tam_0_10lpa_v2 | 7,029 | 0.763 | 0.837 | 0.546 | 0.620 | benchmark only |
| serviceable_tam_0_10lpa_v2 | 7,029 | 0.733 | 0.845 | 0.683 | 0.582 | benchmark only |
| acquirable_tam_0_10lpa_v2 | 7,029 | 0.652 | 0.828 | 0.804 | 0.512 | benchmark only |
| priority_score_v2_0_100 | 7,029 | 0.613 | 0.778 | 0.978 | 0.428 | benchmark only |
| gross_tam_0_10lpa_v3 | 7,029 | 0.764 | 0.842 | 0.499 | 0.616 | benchmark only |
| serviceable_tam_0_10lpa_v3 | 7,029 | 0.737 | 0.848 | 0.607 | 0.587 | benchmark only |
| acquirable_tam_0_10lpa_v3 | 7,029 | 0.666 | 0.834 | 0.749 | 0.519 | benchmark only |
| priority_score_v3_0_100 | 7,029 | 0.619 | 0.783 | 0.978 | 0.448 | benchmark only |
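
The columns in the table above are standard agreement measures. The sketch below shows one way to compute them with the Python standard library. It assumes WMAPE means total absolute error divided by total absolute benchmark value, and that "Top-10 overlap" means the share of the top 10% of cells that both rankings have in common; neither definition is confirmed by the pipeline code, and the sample values are made up.

```python
import math


def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation is the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def wmape(predicted, benchmark):
    """Assumed definition: sum of absolute errors over sum of absolute benchmark values."""
    errors = sum(abs(p - b) for p, b in zip(predicted, benchmark))
    return errors / sum(abs(b) for b in benchmark)


def top_fraction_overlap(predicted, benchmark, fraction=0.10):
    """Assumed definition: shared share of the top `fraction` of cells in both rankings."""
    k = max(1, round(len(predicted) * fraction))
    top_pred = set(sorted(range(len(predicted)), key=lambda i: predicted[i], reverse=True)[:k])
    top_bench = set(sorted(range(len(benchmark)), key=lambda i: benchmark[i], reverse=True)[:k])
    return len(top_pred & top_bench) / k


# Made-up values, for illustration only.
probe = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
vendor = [12, 18, 33, 35, 55, 58, 75, 70, 95, 90]
print(pearson(probe, vendor), spearman(probe, vendor), wmape(probe, vendor))
print(top_fraction_overlap(probe, vendor, fraction=0.2))
```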
G1 post-hoc holdout
| candidate | kind | n | G1 hits | Spearman r | Top-10 G1 capture | validity |
|---|---|---|---|---|---|---|
| geoiq_vendor_tam_benchmark | benchmark_label | 7,029 | 176,581 | 0.709 | 42.4% | benchmark_label_posthoc_holdout |
Notebook & Post-Hoc Metrics
The notebook-facing summaries were updated to use the predicted power TAM column. They are useful for reporting, but remain post-hoc diagnostics because they join to vendor TAM and G1 after score generation.
Same score, multiple views.
The short summary, city-wise table, and post-hoc suite all use predicted_tam_0_10lpa_power.
These outputs describe agreement and ranking behavior; they do not feed back into feature construction or transform scaling.
Notebook short metric summary
| order | metric | n | Pearson r | Pearson R2 | Spearman R2 | caught-up pct |
|---|---|---|---|---|---|---|
| 1 | predicted_tam_vs_vendor_tam | 7,029 | 0.790 | 0.623 | 0.708 | n/a |
| 2 | predicted_tam_vs_g1_hits | 7,029 | 0.435 | 0.189 | 0.552 | n/a |
| 3 | vendor_tam_vs_g1_hits | 7,029 | 0.406 | 0.165 | 0.502 | n/a |
| 4 | metric_2_divided_by_metric_3_caught_up | 7,029 | 1.072 | 1.149 | 1.100 | 114.9% |
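
Row 4 is a ratio of row 2 to row 3, and the 114.9% caught-up figure matches the Pearson R2 ratio of 1.149. The sketch below recomputes the ratios from the rounded values printed in the table; the function name is illustrative. Because the inputs are rounded to three decimals, the results agree with the published ratios only approximately.

```python
def caught_up_ratio(predicted_vs_g1, vendor_vs_g1):
    """Ratio of predicted-vs-G1 agreement to vendor-vs-G1 agreement."""
    if vendor_vs_g1 == 0:
        raise ValueError("vendor-vs-G1 metric is zero; ratio is undefined")
    return predicted_vs_g1 / vendor_vs_g1


# Rounded values copied from rows 2 and 3 of the table above.
pearson_ratio = caught_up_ratio(0.435, 0.406)
pearson_r2_ratio = caught_up_ratio(0.189, 0.165)
spearman_r2_ratio = caught_up_ratio(0.552, 0.502)
print(f"{pearson_ratio:.3f} {pearson_r2_ratio:.3f} {spearman_r2_ratio:.3f}")
```

A ratio above 1 says only that the predicted score tracks G1 hits at least as well as vendor TAM does on this benchmark grid; it is not a calibrated accuracy figure.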
Post-hoc G1 overall suite
| metric | n | Pearson r | Pearson R2 | Spearman r | Spearman R2 | log1p R2 |
|---|---|---|---|---|---|---|
| predicted_tam_vs_vendor_tam | 7,029 | 0.790 | 0.623 | 0.842 | 0.708 | 0.566 |
| predicted_tam_vs_g1_hits | 7,029 | 0.435 | 0.189 | 0.743 | 0.552 | 0.492 |
| vendor_tam_vs_g1_hits | 7,029 | 0.406 | 0.165 | 0.709 | 0.502 | 0.376 |
| metric_2_divided_by_metric_3_caught_up | 7,029 | 1.072 | 1.149 | 1.049 | 1.100 | 1.307 |
Post-hoc top-k suite
| score | top fraction | grid count | G1 capture | lift | NDCG |
|---|---|---|---|---|---|
| predicted_tam | 5.0% | 351 | 23.6% | 4.73 | 0.308 |
| predicted_tam | 10.0% | 703 | 46.4% | 4.64 | 0.455 |
| predicted_tam | 20.0% | 1,406 | 70.1% | 3.51 | 0.575 |
| vendor_tam | 5.0% | 351 | 24.3% | 4.86 | 0.314 |
| vendor_tam | 10.0% | 703 | 42.4% | 4.24 | 0.421 |
| vendor_tam | 20.0% | 1,406 | 67.5% | 3.38 | 0.554 |
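
Top-k capture and lift can be sketched as follows. The sketch assumes that capture is the share of all G1 hits that fall in the top-scoring fraction of grids, and that lift is capture divided by the share of grids selected. Those definitions are consistent with the table (for example 46.4% at 10% gives a lift near 4.64) but are not confirmed by the pipeline code, and the sample data is made up. NDCG is not covered here.

```python
def top_k_capture(scores, hits, fraction):
    """Return (grid_count, capture, lift) for the top `fraction` of grids by score."""
    n = len(scores)
    k = max(1, round(n * fraction))
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    captured = sum(hits[i] for i in order[:k])
    total = sum(hits)
    capture = captured / total if total else 0.0
    lift = capture / (k / n)  # assumption: lift relative to a random pick of k grids
    return k, capture, lift


# Made-up scores and hit counts, for illustration only.
scores = [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
hits = [5, 0, 3, 0, 1, 0, 0, 1, 0, 0]
grid_count, capture, lift = top_k_capture(scores, hits, fraction=0.2)
print(grid_count, capture, lift)
```

Rounding 7,029 grids at 5%, 10%, and 20% yields the grid counts 351, 703, and 1,406 shown in the table.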
Leakage Policy
The strongest part of the current pipeline is the explicit claim boundary around forbidden labels and invalidated metrics.
| flag | value |
|---|---|
| vendor_tam_used_as_feature | False |
| vendor_tam_used_as_benchmark_label | True |
| vendor_tam_used_as_training_label | False |
| vendor_tam_trained_diagnostics_excluded | True |
| g1_used_for_training | False |
| g1_used_as_feature | False |
| g1_used_for_source_selection | False |
| random_cv_allowed_for_accuracy_claim | False |
| production_accuracy_claim_allowed | False |

| check | value |
|---|---|
| spatial_holdout_metrics_present | False |
| city_holdout_metrics_present | False |
| g1_holdout_diagnostics_present | True |
| redundancy_review_required | True |
| multicollinearity_review_required | True |
| city_confounding_review_required | False |
| prediction_calibration_review_required | True |
| production_accuracy_claim_allowed | False |
Vendor TAM appears as a benchmark label, not as a feature or training label. G1 appears only after the fact. Random CV is not allowed for production accuracy claims.
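
The flags above can be treated as a machine-checkable gate. The sketch below is one possible encoding, not the pipeline's own check: the function names and the rule that a production claim needs both zero leakage violations and both holdout families present are assumptions. The glossary also mentions chronological checks, which this simplified rule does not model. Missing flags are treated as violations so that the gate fails closed.

```python
# Flags that must be False for the leakage policy to hold (names taken from the table above).
MUST_BE_FALSE = (
    "vendor_tam_used_as_feature",
    "vendor_tam_used_as_training_label",
    "g1_used_for_training",
    "g1_used_as_feature",
    "g1_used_for_source_selection",
    "random_cv_allowed_for_accuracy_claim",
)
# Holdout evidence assumed necessary before any production accuracy claim.
REQUIRED_HOLDOUTS = ("spatial_holdout_metrics_present", "city_holdout_metrics_present")


def leakage_violations(flags):
    """Names of forbidden flags that are True or missing (fail closed)."""
    return [name for name in MUST_BE_FALSE if flags.get(name, True)]


def production_claim_allowed(flags, checks):
    """Hypothetical rule: no leakage violations and every required holdout present."""
    return not leakage_violations(flags) and all(checks.get(name, False) for name in REQUIRED_HOLDOUTS)


# Current values copied from the two tables above.
current_flags = {name: False for name in MUST_BE_FALSE}
current_checks = {"spatial_holdout_metrics_present": False, "city_holdout_metrics_present": False}
print(leakage_violations(current_flags), production_claim_allowed(current_flags, current_checks))
```

With the current values there are no leakage violations, yet the claim stays blocked because neither holdout family is present.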
Generated Outputs
The updated HTML is an audit reader over the files the current pipeline writes, grouped by the stage that owns them.
Generated outputs by stage
| stage | output | path | status | current meaning |
|---|---|---|---|---|
| get_data | source_fetch_manifest_json | outputs/source_fetch/source_fetch_manifest.json | present json / 78.8 KB | Source registry fetch status, local file hashes, access notes, and next actions. |
| get_data | source_fetch_manifest_csv | outputs/source_fetch/source_fetch_manifest.csv | present 49 rows | Tabular source-fetch ledger for review. |
| get_data | direct_prior_art_download_manifest_json | outputs/source_fetch/direct_prior_art_download_manifest.json | present json / 26.8 KB | Direct prior-art payload plan and availability record. |
| get_data | direct_prior_art_download_manifest_csv | outputs/source_fetch/direct_prior_art_download_manifest.csv | present 26 rows | Tabular direct-prior-art payload ledger. |
| get_data | microsoft_buildings_aoi_manifest_json | outputs/source_fetch/microsoft_buildings_aoi_manifest.json | present json / 1.1 KB | AOI shard selection and Microsoft building-footprint staging status. |
| get_data | microsoft_buildings_aoi_manifest_csv | outputs/source_fetch/microsoft_buildings_aoi_manifest.csv | present 24 rows | Tabular Microsoft AOI shard/staging record. |
| get_data | ghsl_builtup_tiles_manifest_json | outputs/source_fetch/ghsl_builtup_2020_4326_30ss_tiles_manifest.json | present json / 19.0 KB | GHSL 2020 built surface/volume AOI tile staging manifest. |
| get_data | ghsl_builtup_tiles_manifest_csv | outputs/source_fetch/ghsl_builtup_2020_4326_30ss_tiles_manifest.csv | present 24 rows | Tabular GHSL built surface/volume tile ledger. |
| get_data | income_gate_source_fetch_manifest_json | outputs/source_fetch/income_gate/income_gate_source_fetch_manifest.json | present json / 30.0 KB | Income-gate public-source fetch manifest and access ledger. |
| get_data | income_gate_source_fetch_manifest_csv | outputs/source_fetch/income_gate/income_gate_source_fetch_manifest.csv | present 19 rows | Tabular income-gate public-source fetch ledger. |
| source_layer | source_layer_features_manifest | outputs/source_layers/source_layer_cell_features_manifest.json | present json / 8.8 KB | Manifest for source-layer cell-feature materialization. |
| source_layer | source_layer_contracts_json | outputs/source_layers/source_layer_contracts.json | present json / 17.2 KB | Source-layer contracts, expected fields, source families, and promotion gates. |
| source_layer | source_layer_contracts_csv | outputs/source_layers/source_layer_contracts.csv | present 9 rows | Tabular source-layer contract ledger. |
| source_layer | worldpop_source_layer_features | outputs/source_layers/worldpop_population_density/2020_ascii_xyz_local/cell_features.csv | present 7,029 rows | WorldPop population-density source-layer cell features. |
| source_layer | census_controls_source_layer_features | outputs/source_layers/census_2011_official_controls/2011_fixed_release/cell_features.csv | present 7,029 rows | Census 2011 official-control source-layer cell features. |
| source_layer | building_source_layer_features | outputs/source_layers/building_footprints_google_microsoft/google_2023_msft_staged_snapshot/cell_features.csv | present 7,029 rows | Google/Microsoft building footprint, height, volume, and disagreement source-layer features. |
| source_layer | ghsl_source_layer_features | outputs/source_layers/ghsl_population_builtup/named_release_required/cell_features.csv | present 7,029 rows | GHSL population, built surface, built volume, height, settlement, and residential-candidate source-layer features. |
| source_layer | landcover_source_layer_features | outputs/source_layers/esa_worldcover_dynamic_world/worldcover_2020_2021_dynamic_world_pinned_window_required/cell_features.csv | present 7,029 rows | ESA/Dynamic World landcover proxy, physical exclusion, and residential eligibility source-layer features. |
| source_layer | morphology_source_layer_features | outputs/source_layers/residential_morphology_tags/building_landcover_public_proxy_v1/cell_features.csv | present 7,029 rows | Residential morphology proxy source-layer features. |
| source_layer | income_public_source_layer_features | outputs/source_layers/income_public_context/income_gate_public_context_v2/cell_features.csv | present 7,029 rows | Income-gate public welfare and affluence context source-layer features. |
| source_layer | osm_overture_source_layer_features | outputs/source_layers/osm_overture_ohsome_roads_pois/dated_extract_required/cell_features.csv | present 7,029 rows | Road, addressability, settlement, mapping coverage, and conversion source-layer features. |
| source_layer | connectivity_source_layer_features | outputs/source_layers/connectivity_execution_readiness/dated_opencellid_ookla_mlab_required/cell_features.csv | present 7,029 rows | Connectivity and execution-readiness source-layer features. |
| denominator | cell_denominator_foundation | outputs/denominator_foundation/cell_denominator_foundation.csv | present 7,029 rows | Current cell-level denominator controls. |
| denominator | city_summary | outputs/denominator_foundation/city_denominator_foundation_summary.csv | present 34 rows | City-level rollup for the stage that wrote it. |
| denominator | manifest | outputs/denominator_foundation/denominator_foundation_manifest.json | present json / 6.9 KB | Manifest for the denominator-foundation stage. |
| geohg | cell_features | outputs/geohg_features/cell_features_geohg_style.csv | present 7,029 rows | Primary independent feature table used by downstream diagnostics. |
| geohg | cell_labels | outputs/geohg_features/cell_labels_vendor_tam.csv | present 7,029 rows | Vendor TAM benchmark labels only; not a production training target. |
| geohg | area_area_edges | outputs/geohg_features/area_area_edges.csv | present 24,203 rows | GeoHG-style area graph edges. |
| geohg | entity_area_edges | outputs/geohg_features/entity_area_edges.csv | present 21,381 rows | Semantic entity-to-area edges. |
| geohg | poi_area_edges | outputs/geohg_features/poi_area_edges.csv | present 5,668 rows | POI/serviceability context edges. |
| geohg | spatial_block_predictions | outputs/geohg_features/geohg_spatial_block_predictions.csv | present 7,029 rows | Diagnostic spatial-block prediction artifact. |
| geohg | metrics | outputs/geohg_features/geohg_feature_metrics.json | present json / 1.4 KB | Feature bundle metrics and graph counts. |
| gap_closure | features | outputs/tam_gap_closure/tam_gap_closure_features.csv | present 7,029 rows | Deterministic TAM gap-closure probe features. |
| gap_closure | city_summary | outputs/tam_gap_closure/tam_gap_closure_city_summary.csv | present 34 rows | City-level rollup for the stage that wrote it. |
| gap_closure | benchmark_by_city | outputs/tam_gap_closure/tam_gap_closure_benchmark_by_city.csv | present 374 rows | City-level probe-vs-vendor benchmark breakdown. |
| gap_closure | benchmark_metrics | outputs/tam_gap_closure/tam_gap_closure_benchmark_metrics.json | present json / 2.8 KB | Overall probe-vs-vendor benchmark metrics. |
| gap_closure | source_probe_csv | outputs/source_registry/source_probe_summary.csv | present 21 rows | Source-probe readiness ledger used for production gating. |
| gap_closure | source_probe_json | outputs/source_registry/source_probe_summary.json | present json / 15.5 KB | JSON copy of source-probe readiness ledger. |
| transform | cell2_decision | outputs/cell2_metric_transform_experiments/cell2_best_mechanism_decision.json | present json / 1.5 KB | Selected transform policy and rejected alternatives. |
| transform | cell2_analysis_html | outputs/cell2_metric_transform_experiments/cell2_best_mechanism_analysis.html | present html / 8.4 KB | Standalone Cell-2 transform analysis report. |
| transform | cell2_holdout_summary | outputs/cell2_metric_transform_experiments/cell2_best_mechanism_holdout_summary.csv | present 12 rows | City and spatial-block transform holdout summary. |
| transform | cell2_holdouts | outputs/cell2_metric_transform_experiments/cell2_best_mechanism_holdouts.csv | present 240 rows | Detailed transform holdout rows. |
| transform | cell2_topk | outputs/cell2_metric_transform_experiments/cell2_best_mechanism_topk.csv | present 6 rows | Top-k G1 capture comparison for predicted power TAM and vendor TAM. |
| transform | cell2_all_grid_ranking | outputs/cell2_metric_transform_experiments/cell2_best_mechanism_all_grid_ranking.csv | present 6 rows | All-grid ranking evidence for transform diagnostics. |
| transform | main_solution_power_summary_json | outputs/main_solution_power_tam_summary.json | present json / 373 B | Compact headline checks for the power-transformed solution. |
| transform | main_solution_power_summary_csv | outputs/main_solution_power_tam_summary.csv | present 5 rows | CSV copy of the headline power-solution checks. |
| full_india | full_india_scores_csv | outputs/full_india_scored/full_india_tam_scores.csv | present 2,905,288 rows | Every full-India 0.01-degree cell scored with deterministic source-derived probes. |
| full_india | full_india_score_index | outputs/full_india_scored/full_india_tam_score_index.json | present json / 43.5 MB | Compact row-major score index for map hover and browser payload efficiency. |
| full_india | full_india_score_manifest | outputs/full_india_scored/full_india_tam_score_manifest.json | present json / 11.7 KB | Full-India scoring manifest, leakage policy, transform policy, and distribution metrics. |
| map | full_india_grid_manifest | outputs/tam_map/tam_full_india_0_01_grid_manifest.json | present json / 1.2 KB | Geometry-only full-India 0.01-degree grid manifest. |
| map | full_india_grid_index | outputs/tam_map/tam_full_india_0_01_grid_index.json | present json / 111.0 KB | Compact geometry index for all India grid cells. |
| map | full_india_map_manifest | outputs/tam_map/tam_grid_map_manifest.json | present json / 1.2 KB | Leaflet map manifest for full-India geometry and score layer wiring. |
| map | vendor_map_manifest | outputs/tam_map/tam_grid_map_manifest_vendor.json | present json / 7.5 KB | Leaflet map manifest for vendor-grid benchmark inspection. |
| map | vendor_map_geojson | outputs/tam_map/tam_grid_cells_vendor.geojson | present geojson / 21.5 MB | Vendor-grid GeoJSON with benchmark and predicted-layer fields. |
| map | map_html | docs/tam_grid_output_map.html | present html / 65.0 KB | Interactive Leaflet map document. |
| diagnostics | summary | outputs/statistical_diagnostics/statistical_diagnostics_summary.json | present json / 9.0 KB | Stage summary, pass/fail flags, leakage policy, and interpretation notes. |
| diagnostics | feature_profile | outputs/statistical_diagnostics/feature_profile.csv | present 223 rows | Feature distribution profile for diagnostics. |
| diagnostics | univariate_feature_signal | outputs/statistical_diagnostics/univariate_feature_signal.csv | present 223 rows | Single-feature signal scan against benchmark/probe targets. |
| diagnostics | city_distribution_shift | outputs/statistical_diagnostics/city_distribution_shift.csv | present 224 rows | City distribution shift checks. |
| diagnostics | feature_redundancy_pairs | outputs/statistical_diagnostics/feature_redundancy_pairs.csv | present 160 rows | Highly related feature-pair review list. |
| diagnostics | feature_vif | outputs/statistical_diagnostics/feature_vif.csv | present 30 rows | Multicollinearity review table. |
| diagnostics | spatial_permutation_importance | outputs/statistical_diagnostics/spatial_permutation_importance.csv | present 1 row | Spatial permutation diagnostic output. |
| diagnostics | feature_action_plan | outputs/statistical_diagnostics/feature_action_plan.csv | present 223 rows | Feature diagnostic next-action list. |
| diagnostics | model_metric_comparison | outputs/statistical_diagnostics/model_metric_comparison.csv | present 17 rows | Model metric table retained as diagnostic only. |
| diagnostics | prediction_metric_comparison | outputs/statistical_diagnostics/prediction_metric_comparison.csv | present 1 row | Prediction/probe metric comparison table. |
| diagnostics | prediction_residuals_by_city | outputs/statistical_diagnostics/prediction_residuals_by_city.csv | present 0 rows | City residual review table. |
| diagnostics | prediction_decile_calibration | outputs/statistical_diagnostics/prediction_decile_calibration.csv | present 0 rows | Decile calibration diagnostic table. |
| diagnostics | prediction_city_bootstrap_ci | outputs/statistical_diagnostics/prediction_city_bootstrap_ci.csv | present 0 rows | City bootstrap confidence interval table. |
| diagnostics | g1_holdout_candidate_diagnostics | outputs/statistical_diagnostics/g1_holdout_candidate_diagnostics.csv | present 2 rows | Post-hoc G1 diagnostic table; not used for tuning. |
| notebook | notebook_short_metric_summary_csv | outputs/notebook_short_metric_summary.csv | present 4 rows | Notebook-facing four-row metric summary using predicted power TAM. |
| notebook | notebook_short_metric_summary_json | outputs/notebook_short_metric_summary.json | present json / 1.2 KB | JSON copy of the notebook-facing metric summary. |
| notebook | notebook_short_metric_summary_citywise_csv | outputs/notebook_short_metric_summary_citywise.csv | present 136 rows | City-wise metric summary for notebook review. |
| notebook | notebook_short_metric_summary_citywise_json | outputs/notebook_short_metric_summary_citywise.json | present json / 46.4 KB | JSON copy of the city-wise metric summary. |
| notebook | posthoc_g1_summary | outputs/posthoc_g1_metric_suite/summary.json | present json / 28.8 KB | Structured post-hoc G1 metric suite summary. |
| notebook | posthoc_g1_overall | outputs/posthoc_g1_metric_suite/overall.csv | present 4 rows | Overall post-hoc G1 correlations. |
| notebook | posthoc_g1_topk | outputs/posthoc_g1_metric_suite/topk.csv | present 6 rows | Top-k post-hoc G1 capture table. |
| notebook | posthoc_g1_decile | outputs/posthoc_g1_metric_suite/decile.csv | present 20 rows | Decile post-hoc G1 calibration table. |
| notebook | posthoc_g1_city_aggregate | outputs/posthoc_g1_metric_suite/city_aggregate.csv | present 2 rows | City-aggregate post-hoc G1 summary. |
Source probe readiness
21 source-probe rows are tracked; 18 currently write candidate features and 0 are production-ready.
| gap | source | promotion | coverage | feature written | production ready |
|---|---|---|---|---|---|
| 1_household_denominator | census_pca_district_context_2011 | model_candidate_context | 99.2% | True | False |
| 1_household_denominator | worldpop_ascii_xyz_population_density_2020 | model_candidate_probe | 100.0% | True | False |
| 1_household_denominator | ghsl_population_builtup | model_candidate_physical_probe | 100.0% | True | False |
| 1_household_denominator | reconciled_population_household_probe_v2 | model_candidate_probe | 100.0% | True | False |
| 1_household_denominator | prs_eth_popcorn | model_candidate_probe_pending_grid_mapping | 24.0% | False | False |
| 2_residential_built_form | google_open_buildings_gobi_2023 | model_candidate_context | 9.3% | True | False |
| 2_residential_built_form | microsoft_global_buildings | model_candidate_context | 91.6% | True | False |
| 2_residential_built_form | combined_building_residential_density | model_candidate_context | 95.8% | True | False |
| 2_residential_built_form | building_height_vertical_density_proxy | model_candidate_probe | 100.0% | True | False |
| 2_residential_built_form | esa_worldcover_dynamic_world | model_candidate_context | 100.0% | True | False |
| 2_residential_built_form | residential_morphology_tags | proxy_candidate_pending_visual_qa | 100.0% | True | False |
| 3_income_affordability | nightlights_district_panel | model_candidate_context | 99.2% | True | False |
| 3_income_affordability | pigshell_hlpca_housing_amenity_assets_2011 | model_candidate_context | 99.2% | True | False |
| 3_income_affordability | income_public_context | proxy_candidate_public_context | 100.0% | True | False |
| 3_income_affordability | income_gate_public_sources | model_candidate_income_gate_probe | 100.0% | True | False |
| 3_income_affordability | mospi_hces_2023_24_public_report | blocked_tables_not_extracted | 0.0% | False | False |
| 4_serviceability | yashveer_police_education_pois | model_candidate_context | 100.0% | True | False |
| 4_serviceability | pmgsy_rural_roads_hr_up | partial_model_candidate_with_missing_flag | 59.6% | True | False |
| 4_serviceability | osm_overture_ohsome_roads_pois | model_candidate_context | 100.0% | True | False |
| 4_serviceability | opencellid_ookla_mlab_execution_readiness | model_candidate_context | 100.0% | True | False |
| 5_calibration | independent_component_calibration | probe_ready_not_production_calibrated | 100.0% | False | False |
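
The headline counts (21 tracked, 18 writing features, 0 production-ready) are simple tallies over this ledger. The sketch below shows how such a tally could be computed from a CSV export. The column names `feature_written` and `production_ready` are assumptions about the file layout, and the three sample rows are a subset copied from the table, so the printed counts describe the sample rather than the full ledger.

```python
import csv
import io

# Three rows copied from the table above; column names are hypothetical.
SAMPLE_CSV = """source,feature_written,production_ready
worldpop_ascii_xyz_population_density_2020,True,False
prs_eth_popcorn,False,False
mospi_hces_2023_24_public_report,False,False
"""


def tally_source_probes(handle):
    """Count tracked rows, rows that write features, and production-ready rows."""
    tracked = written = ready = 0
    for row in csv.DictReader(handle):
        tracked += 1
        written += row["feature_written"].strip().lower() == "true"
        ready += row["production_ready"].strip().lower() == "true"
    return {"tracked": tracked, "feature_written": written, "production_ready": ready}


summary = tally_source_probes(io.StringIO(SAMPLE_CSV))
print(summary)
```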
Code map
| source file | lines | role |
|---|---|---|
| scripts/pipeline.py | 148 | canonical |
| scripts/get_data.py | 50 | stage/wrapper |
| scripts/enrich_features.py | 53 | stage/wrapper |
| scripts/prediction_diagnostics.py | 72 | stage/wrapper |
| scripts/build_geohg_features.py | 77 | stage/wrapper |
| scripts/build_source_layer_cell_features.py | 28 | stage/wrapper |
| scripts/build_source_layer_contracts.py | 25 | stage/wrapper |
| scripts/build_denominator_foundation.py | 28 | stage/wrapper |
| scripts/build_tam_gap_closure_features.py | 28 | stage/wrapper |
| scripts/build_full_india_tam_scores.py | 687 | stage/wrapper |
| scripts/build_tam_grid_map_data.py | 48 | stage/wrapper |
| scripts/build_notebook_short_metric_summary.py | 195 | stage/wrapper |
| src/tam_pipeline/pipeline.py | 50 | canonical |
| src/tam_pipeline/stages/get_data.py | 98 | stage/wrapper |
| src/tam_pipeline/stages/enrich_features.py | 203 | stage/wrapper |
| src/tam_pipeline/stages/prediction_diagnostics.py | 136 | stage/wrapper |
| src/tam_pipeline/stages/model_training.py | 14 | stage/wrapper |
| src/tam_pipeline/stages/common.py | 66 | stage/wrapper |
| src/tam_geohg/graph_features.py | 27 | stage/wrapper |
| src/tam_geohg/predicted_tam.py | 82 | stage/wrapper |
| src/tam_geohg/map_inputs.py | 333 | stage/wrapper |
| src/tam_geohg/map_metrics.py | 136 | stage/wrapper |
| src/tam_geohg/map_export.py | 41 | stage/wrapper |
| src/tam_geohg/map_manifest.py | 48 | stage/wrapper |
| src/tam_geohg/map_geojson.py | 48 | stage/wrapper |
| src/tam_geohg/full_india_grid_index.py | 157 | stage/wrapper |
| src/tam_pipeline/payloads/scripts/build_source_layer_cell_features_payload.py | 1,422 | stage/wrapper |
| src/tam_pipeline/payloads/scripts/build_source_layer_contracts_payload.py | 456 | stage/wrapper |
| src/tam_pipeline/payloads/scripts/build_denominator_foundation_payload.py | 703 | stage/wrapper |
| src/tam_pipeline/payloads/scripts/build_tam_gap_closure_features_payload.py | 1,714 | stage/wrapper |
| src/tam_pipeline/payloads/tam_notebook_support/tam_notebook_support_payload.py | 2,983 | stage/wrapper |
Recent pipeline logs
| log | path | bytes |
|---|---|---|
| 20260602_183522_build_geohg_features.log | outputs/pipeline_logs/20260602_183522_build_geohg_features.log | 0 |
| 20260602_171809_build_tam_gap-closure_features.log | outputs/pipeline_logs/20260602_171809_build_tam_gap-closure_features.log | 26,120 |
| 20260602_171808_build_denominator_foundation.log | outputs/pipeline_logs/20260602_171808_build_denominator_foundation.log | 7,093 |
| 20260602_171806_build_source-layer_cell_features.log | outputs/pipeline_logs/20260602_171806_build_source-layer_cell_features.log | 2,281 |
| 20260602_170830_build_geohg_features.log | outputs/pipeline_logs/20260602_170830_build_geohg_features.log | 66,569 |
| 20260601_201950_build_tam_gap-closure_features.log | outputs/pipeline_logs/20260601_201950_build_tam_gap-closure_features.log | 24,341 |
| 20260601_201950_build_source-layer_cell_features.log | outputs/pipeline_logs/20260601_201950_build_source-layer_cell_features.log | 1,803 |
| 20260601_201949_build_denominator_foundation.log | outputs/pipeline_logs/20260601_201949_build_denominator_foundation.log | 4,443 |
Training Boundary
A real training workflow needs an approved prediction surface and defensible holdouts. Current artifacts do not establish production accuracy.
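
What a defensible holdout could look like is sketched below: whole groups, either cities or coarse spatial blocks, are held out together so that neighbouring cells never straddle the train/test boundary. The helper names, the 0.1-degree block size, and the sample coordinates are hypothetical; the pipeline does not currently ship such a split as a production artifact.

```python
import math
from collections import defaultdict


def spatial_block_id(lat, lon, block_deg=0.1):
    """Assign a cell to a coarse block so nearby cells share one holdout group."""
    return (math.floor(lat / block_deg), math.floor(lon / block_deg))


def leave_one_group_out(group_ids):
    """Yield (held_out_group, train_indices, test_indices), one split per group."""
    by_group = defaultdict(list)
    for index, group in enumerate(group_ids):
        by_group[group].append(index)
    for held_out, test_indices in by_group.items():
        train_indices = [
            i for group, indices in by_group.items() if group != held_out for i in indices
        ]
        yield held_out, train_indices, test_indices


# City holdout: every cell of one city is tested by a model that never saw that city.
cities = ["A", "A", "B", "C", "B"]
city_splits = list(leave_one_group_out(cities))

# Spatial-block holdout: two nearby cells (illustrative coordinates) land in the same block.
block_a = spatial_block_id(28.61, 77.21)
block_b = spatial_block_id(28.612, 77.215)
print(len(city_splits), block_a, block_b)
```

Random cross-validation would mix cells from the same neighbourhood into both sides of the split, which is why it is excluded from accuracy claims.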
python3 scripts/pipeline.py prediction_diagnostics --root .
Glossary Printout
The permanent glossary mirrors the floating Lingo panel for print and review.
| term | meaning |
|---|---|
| Vendor TAM | Benchmark label used for comparison only; not a training target or feature. |
| G1 holdout | Post-hoc business outcome diagnostic; not used for training, source selection, or tuning. |
| Probe TAM | Formula-driven component output for audit, not production-calibrated TAM. |
| Source layer | Versioned source-family artifact that writes grid-level candidate fields before denominator and gap-closure scoring. |
| Denominator v3 | Current leakage-safe residential-household denominator probe with base/lower/upper fields and reconciliation status. |
| Income gate | Public-source 0-10 LPA probability proxy using district/city/cell welfare and affluence context; MPCE calibration remains pending. |
| Conversion feasibility | Public road, POI, addressability, settlement, and map-coverage proxy for whether a cell can be served. |
| Execution readiness | Weak connectivity and operational-readiness proxy for acquirable TAM; still blocked on license, coverage, and internal ops data. |
| Predicted TAM power | Current score layer: a monotone power transform of no-vendor gross TAM that preserves the base total. |
| Power gamma | The fixed exponent applied to the source-derived gross TAM base; current value comes from the transform decision artifact. |
| Rank ceiling | A diagnostic order-only transform used to understand upper-bound ranking behavior, not TAM magnitude. |
| Full-India scored grid | The 0.01-degree national grid scored by deterministic source-derived formulas without vendor labels. |
| Score index | Compact browser-oriented row-major JSON that stores map scores without repeating every property per cell. |
| Spatial holdout | Validation split that blocks neighborhood memorization. |
| City holdout | Whole-city transfer test for city confounding. |
| Notebook metric summary | Four-row post-hoc table comparing predicted TAM, vendor TAM, and G1 for notebook reporting. |
| Production claim | Blocked until valid non-GeoIQ predictions pass spatial, city, and chronological checks. |