diff --git a/reports/il_npa/index.qmd b/reports/il_npa/index.qmd
index f328d0c..4942924 100644
--- a/reports/il_npa/index.qmd
+++ b/reports/il_npa/index.qmd
@@ -76,6 +76,231 @@ If we consider 1.5 times the cost of PRP as the threshold for electrification, t
## Data and Methods
-## Assumptions
+### Overview
+
+To identify opportunities for cost-effective targeted electrification in Chicago's Peoples Gas service territory, we compared the **upfront capital costs** of two approaches to managing aging natural gas infrastructure:
+
+- **Pipeline Replacement Program (PRP)**: Replacing deteriorating gas mains with new gas pipes, the traditional utility approach to infrastructure management.
+- **Non-Pipe Alternative (NPA)**: Electrifying buildings (eliminating their gas service), upgrading the electric distribution grid to handle increased demand, and decommissioning the gas infrastructure.
+
+We conducted a **census block-level geospatial analysis** of Peoples Gas's planned construction projects in Chicago, focusing on fully residential blocks where electrification costs are well-established and building participation is more feasible. For each block, we calculated both PRP and NPA costs, then identified blocks where electrification would cost the same or less than pipeline replacement.
+
+Our measure of **cost-effectiveness** is the ratio of NPA cost to PRP cost. We analyze upfront capital costs only, excluding ongoing operational costs, utility rate impacts, and financing mechanisms. This conservative approach focuses on the infrastructure investment decisions facing the utility and ratepayers.
+
+::: {.callout-note}
+The cost assumptions underlying our analysis are open-source and viewable in [this Google Sheet](https://docs.google.com/spreadsheets/d/1xxa47dClvp0rosZhUP1R7790CNXMLSD_0ExrPccR3p0/edit), along with all data inputs and calculations. The geospatial data processing code is available in the `notebooks/` directory of this report.
+:::
+
+### Assumptions
+
+Our analysis assumes:
+
+- **Fully residential blocks only**, due to the variability and uncertainty in commercial and industrial electrification costs. We exclude blocks with commercial or industrial properties. A complete non-pipe alternative assessment would require detailed commercial building analysis.
+- **Street centerlines as a proxy for gas pipeline routes**. We use publicly available street centerline data from the Chicago Data Portal to estimate pipeline miles, as detailed gas infrastructure maps are not publicly available. Actual gas pipeline routes may deviate from street centerlines in some locations, but this provides a reasonable approximation at the block level.
+- **Upfront capital costs only**. We compare only the initial infrastructure investment costs, not ongoing operational expenses (utility bills, maintenance), carbon emissions, or ratepayer cost allocation mechanisms. A full lifecycle analysis would include these operational factors.
+- **Average electrification costs** for residential buildings: air-source heat pumps for space heating and cooling, heat pump water heaters, and induction stoves. Actual costs vary significantly by building characteristics (age, size, existing electrical service, heating system configuration), but we use averages that may over- or under-estimate costs for specific buildings.
+- **Grid upgrade costs** based on the Long Run Marginal Cost (LRMC) of electric distribution capacity, calculated from the incremental peak demand increase from electrification. This captures long-term costs of expanding distribution infrastructure.
+- **Pipeline decommissioning costs** for retiring gas mains in place, estimated per mile of main retired. Excavation and removal costs would be higher if required by local regulations.
+- **Census block as the unit of analysis**. Blocks are the smallest geographic unit that allows us to aggregate parcel-level data while maintaining spatial relationships with pipeline infrastructure.
+- **All residents would participate** in a coordinated electrification program. In practice, participation rates would depend on program design, incentive levels, and customer preferences, requiring careful program design and potentially phased implementation.
+- **Programmatic filtering with manual verification** to identify fully residential blocks. Our method uses automated parcel classification with manual verification using satellite imagery for edge cases. This approach is replicable but could be refined with additional building data.
+- **Current cost estimates** for all equipment and infrastructure, without adjusting for potential future cost changes from technology improvements, inflation, or economies of scale.
+
+### Data Sources
+
+This analysis integrates multiple geospatial and administrative datasets:
+
+**Peoples Gas Construction Polygons**
+Planned pipeline replacement project areas provided by Peoples Gas, with construction timelines and project status. We analyze only projects with "planned" status.
+
+**Census Block Boundaries**
+Chicago census block boundaries from the 2010 Census (GEOID10), the finest-grain geographic unit published by the Census Bureau. Blocks serve as our primary unit of analysis for aggregating parcel and infrastructure data.
+
+**Cook County Parcel Data**
+Property parcel boundaries and assessor classifications for all parcels in Cook County.[^parcels] We use assessor classification codes to categorize parcels as residential (single-family or multi-family), commercial, industrial, mixed-use, or vacant.
+
+[^parcels]: Cook County Assessor's Office, accessed via Cook County Data Portal
+
+**Chicago Building Footprints**
+Building polygon geometries from the Chicago Data Portal, with unit counts where available from building permit records.[^buildings] We match buildings to parcels to estimate residential unit counts, particularly for multi-family buildings.
+
+[^buildings]: [Chicago Building Footprints](https://data.cityofchicago.org/Buildings/Building-Footprints-current-/hz9b-7nh8)
+
+**Chicago Street Centerlines**
+Street centerline geometries from the Chicago Data Portal, representing the center of street rights-of-way where gas mains typically run.[^streets] We use street lengths as a proxy for gas pipeline miles.
+
+[^streets]: [Chicago Street Center Lines](https://data.cityofchicago.org/Transportation/Street-Center-Lines/6imu-meau)
+
+**Cost Estimates**
+Cost assumptions for pipeline replacement, electrification equipment, grid upgrades, and pipeline decommissioning, compiled from utility rate cases and prior studies.[^costs] All cost parameters are documented in a public spreadsheet.
+
+[^costs]: [Cost Assumptions Spreadsheet](https://docs.google.com/spreadsheets/d/1xxa47dClvp0rosZhUP1R7790CNXMLSD_0ExrPccR3p0/edit)
+
+### Geospatial Processing Methodology
+
+Our geospatial analysis assigns parcels and street infrastructure to census blocks within construction project areas.
+
+#### Identifying Blocks in Construction Areas
+
+We intersected Peoples Gas construction polygons with Chicago census block boundaries to identify all blocks that would be affected by planned pipeline replacement projects. Since construction polygons are irregular and may cover only portions of blocks, we created both:
+
+- **Clipped block segments**: The portion of each block that falls within a construction polygon, used for calculating street mile allocation.
+- **Full block geometries**: Complete census block boundaries, used for assigning parcels and reporting results.
+
+Some blocks intersect multiple construction polygons. In these cases, we aggregated construction dates and infrastructure data across all relevant projects.
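+
+As a rough sketch of this step (assuming `sf` objects `construction` and `blocks` in a shared projected CRS; all names are illustrative, not the pipeline's actual variables):
+
+```r
+# Sketch: derive clipped segments and full geometries for affected blocks.
+# Assumes `construction` holds planned project polygons and `blocks` holds
+# census blocks with a `geoid10` column, both in EPSG:32616.
+library(sf)
+
+clipped_segments <- st_intersection(blocks, construction)  # partial geometries
+full_blocks <- blocks[blocks$geoid10 %in% unique(clipped_segments$geoid10), ]
+```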
+
+#### Parcel Identification and Assignment
+
+To identify parcels affected by pipeline replacement projects, we applied an **8-meter buffer** around construction polygons. This buffer captures parcels adjacent to streets where pipeline work will occur, even when the construction polygon only covers the street right-of-way. The buffer is used exclusively for parcel identification; all other spatial calculations use unbuffered construction polygons.
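+
+A minimal sketch of the buffer-based selection (same illustrative objects as above, plus an assumed `parcels` layer):
+
+```r
+# Sketch: buffer construction polygons by 8 m (projected CRS in meters),
+# then keep parcels that intersect the buffered area. The buffer is used
+# only here; street and block calculations use the unbuffered polygons.
+library(sf)
+
+construction_buffered <- st_buffer(construction, dist = 8)
+parcels_in_construction <- st_filter(parcels, construction_buffered)
+```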
+
+We assigned each parcel to exactly one census block using a **largest-overlap method**: for each parcel, we calculated its area of overlap with all intersecting blocks (using full, unclipped block boundaries) and assigned the parcel to the block with which it had the greatest overlap. This one-to-one assignment ensures each parcel is counted only once in our analysis.
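+
+A sketch of the largest-overlap rule (column names are assumptions):
+
+```r
+# Sketch: assign each parcel to the block with the greatest overlap area.
+library(sf)
+library(dplyr)
+
+parcel_block <- st_intersection(
+    parcels_in_construction["parcel_id"],
+    full_blocks["geoid10"]
+  ) |>
+  mutate(overlap_m2 = as.numeric(st_area(geometry))) |>
+  st_drop_geometry() |>
+  slice_max(overlap_m2, n = 1, by = parcel_id, with_ties = FALSE)
+```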
+
+#### Parcel Classification and Unit Counting
+
+We classified each parcel using Cook County assessor property class codes:
+
+- **Single-family residential**: Detached homes, townhomes, and other single-unit dwellings
+- **Multi-family residential**: Buildings with 2+ dwelling units, including small apartment buildings and large complexes
+- **Mixed-use**: Properties with both residential and commercial uses
+- **Commercial**: Retail, office, and service properties
+- **Industrial**: Manufacturing, warehousing, and industrial facilities
+- **Vacant**: Unimproved land
+
+For residential parcels, we estimated unit counts by spatially matching parcels to building footprints and using unit counts from building permit records where available. For parcels without building matches, we used assessor records to estimate unit counts based on property classification.
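+
+The fallback logic reduces to a simple coalesce (a sketch; both column names are assumptions, not the notebook's actual fields):
+
+```r
+# Sketch: prefer permit-derived unit counts from matched building
+# footprints, falling back to an assessor-class-based estimate.
+library(dplyr)
+
+parcels_with_units <- parcels_with_units |>
+  mutate(units = coalesce(building_units, assessor_units_estimate))
+```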
+
+Mixed-use parcels were treated as multi-family residential in our unit counts, as they contain dwelling units and would need to be addressed in any targeted electrification program.
+
+#### Street Miles Allocation
+
+Gas pipelines typically run along street centerlines. To estimate pipeline miles for each block, we:
+
+1. **Clipped street centerlines** to construction polygon boundaries, retaining only street segments within planned construction areas
+2. **Calculated total street miles** within each construction polygon (using the unbuffered polygons)
+3. **Allocated street miles to blocks proportionally** by each block's perimeter within the construction polygon
+
+The proportional allocation uses each clipped block segment's perimeter as a fraction of the total block perimeter across all blocks in that construction polygon. This approach ensures that all street miles within a construction polygon are allocated to adjacent blocks, and that blocks with more street frontage receive proportionally more pipeline miles.
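+
+A compact sketch of the allocation within each construction polygon (illustrative column names, assuming the `clipped_segments` object from above carries a per-polygon street-mile total):
+
+```r
+# Sketch: split a polygon's total street miles across its clipped block
+# segments in proportion to each segment's perimeter.
+library(sf)
+library(dplyr)
+
+clipped_segments <- clipped_segments |>
+  mutate(perimeter_m = as.numeric(st_length(st_boundary(geometry)))) |>
+  mutate(
+    street_miles = polygon_street_miles * perimeter_m / sum(perimeter_m),
+    .by = construction_poly_id
+  )
+```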
+
+#### Residential Block Filtering
+
+We filtered to **fully residential blocks** using several criteria:
+
+- Blocks where residential parcels comprise >90% of non-vacant parcels
+- Manual exclusion of blocks with institutional uses (schools, churches) that don't appear in commercial/industrial classifications
+- Exclusion of blocks with ≤2 parcels, which are typically not residential
+- Exclusion of blocks where all parcels have railroad ("RR") assessor classification
+- Manual verification of edge cases using satellite imagery
+
+This filtering process identified `r comma(n_res_blocks)` fully residential blocks out of `r comma(n_blocks_total)` total blocks in construction areas. The conservative filtering ensures that our analysis focuses on blocks where coordinated residential electrification would be most feasible.
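+
+The programmatic portion of the filter reduces to a few conditions (a sketch with illustrative column names; the manual exclusions from satellite-imagery review are applied afterward):
+
+```r
+# Sketch: programmatic residential-block filter.
+library(dplyr)
+
+res_blocks <- block_summary |>
+  filter(
+    n_parcels > 2,                                  # drop 1-2 parcel blocks
+    !all_parcels_rr,                                # drop all-railroad blocks
+    residential_parcels / non_vacant_parcels > 0.9  # >90% residential
+  ) |>
+  filter(!geoid10 %in% manual_exclusions)           # satellite-imagery review
+```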
+
+### Cost Calculations
+
+For each residential block, we calculated both PRP and NPA costs using the following methods:
+
+#### Pipeline Replacement Program (PRP) Costs
+
+PRP cost represents the capital investment to replace gas mains:
+
+$$\text{PRP Cost} = \text{Street Miles} \times \text{Cost per Mile of Pipeline}$$
+
+We use a cost per mile of $`r dollar(cost_lpp_mile, accuracy = 1, scale = 1, suffix = "")`, based on Peoples Gas's TK REFERENCE. This includes materials, labor, street restoration, and contractor overhead, but excludes utility administrative costs and return on equity.
+
+#### Non-Pipe Alternative (NPA) Costs
+
+NPA cost represents the total capital investment to electrify all residential units on a block and decommission the gas infrastructure:
+
+$$\text{NPA Cost} = \text{SF Electrification} + \text{MF Electrification} + \text{Grid Upgrades} + \text{Decommissioning}$$
+
+**Single-family electrification cost:**
+$$\text{SF Electrification} = \text{Number of SF Parcels} \times \text{Cost per SF Home}$$
+
+We use $`r dollar(cost_elec_sf, accuracy = 1, scale = 1, suffix = "")` per single-family home, which includes:
+
+- Air-source heat pump for space heating and cooling (typically 2-4 tons capacity)
+- Heat pump water heater (50-80 gallon capacity)
+- Induction cooktop or range
+- Clothes dryer
+- Installation labor and soft costs
+
+**Multi-family electrification cost:**
+$$\text{MF Electrification} = \text{Number of MF Units} \times \text{Cost per MF Unit}$$
+
+We use $`r dollar(cost_elec_mf, accuracy = 1, scale = 1, suffix = "")` per multi-family unit, which reflects lower per-unit costs in larger buildings due to shared infrastructure and economies of scale in multi-unit installations.
+
+**Grid upgrade cost:**
+$$\text{Grid Upgrades} = \text{Total Residential Units} \times \text{LRMC per Household}$$
+
+Electrification increases peak electricity demand, particularly during cold winter days when heat pumps draw maximum power. We calculate grid upgrade costs using the **Long Run Marginal Cost (LRMC)** of electric distribution capacity:
+
+$$\text{LRMC per Household} = \Delta \text{Peak Demand (kW)} \times \text{LRMC per kW}$$
+
+Where:
+
+- Peak demand increase = `r peak_kw_winter` kW (winter) - `r peak_kw_summer` kW (summer) = `r peak_kw_delta` kW
+- LRMC = $`r dollar(lrmc_peak_kw, accuracy = 1, scale = 1, suffix = "")` per kW, based on historical ComEd distribution capacity costs
+- Grid upgrade cost per household = $`r dollar(grid_upgrade_cost_hh, accuracy = 1, scale = 1, suffix = "")`
+
+This LRMC approach captures the long-term cost of expanding distribution infrastructure to serve increased electric loads.
+
+**Pipeline decommissioning cost:**
+$$\text{Decommissioning} = \text{Street Miles} \times \text{Cost per Mile to Retire}$$
+
+We use $`r dollar(cost_decomm_mile, accuracy = 1, scale = 1, suffix = "")` per mile to retire gas mains in place. TK REFERENCE
+
+#### Cost-Effectiveness Ratio
+
+For each residential block, we calculate:
+
+$$\text{Cost-Effectiveness Ratio} = \frac{\text{NPA Cost}}{\text{PRP Cost}}$$
+
+- **Ratio < 1.0**: Electrification is cheaper than pipeline replacement
+- **Ratio = 1.0**: Costs are equal
+- **Ratio > 1.0**: Pipeline replacement is cheaper than electrification
+
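+Putting the pieces together, the per-block calculation is a single roll-up (a sketch using the cost constants named above; the block-level column names are illustrative):
+
+```r
+# Sketch: per-block PRP cost, NPA cost, and cost-effectiveness ratio.
+library(dplyr)
+
+block_summary <- block_summary |>
+  mutate(
+    prp_cost = street_miles * cost_lpp_mile,
+    npa_cost = sf_parcels * cost_elec_sf +
+      mf_units * cost_elec_mf +
+      total_residential_units * grid_upgrade_cost_hh +
+      street_miles * cost_decomm_mile,
+    npa_over_prp = npa_cost / prp_cost
+  )
+```
+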
+#### Scattershot Electrification Modeling
+
+To create a more realistic counterfactual for the pipeline replacement scenario, we model **scattershot electrification**: the gradual, uncoordinated transition to electric heating that would occur even if pipelines are replaced. Whether driven by increasing gas costs, equipment failures, climate policies, or consumer preference, a portion of homes will electrify over time regardless of gas infrastructure investments.
+
+This creates an **unmanaged gas transition** where ratepayer capital is spent on both the gas system (pipeline replacement) and electrification, with no cost optimization or coordination. Modeling scattershot electrification provides a more accurate representation of the true cost of continuing to invest in gas infrastructure.
+
+We base our scattershot assumptions on Chicago's Climate Action Plan, TK REFERENCE, which targets significant residential electrification by 2035. For each block, we assume that **`r percent(PCT_ELEC)` of residential units** will electrify over a **`r TIME_PERIOD`-year period**, distributed evenly across years.
+
+**Annual scattershot electrification cost:**
+$$\text{Scattershot Annual Cost} = \frac{\text{PCT\_ELEC} \times \text{NPA Cost}}{\text{TIME\_PERIOD}}$$
+
+This represents the annual cost of electrifying `r percent(PCT_ELEC / TIME_PERIOD)` of homes each year, including equipment installation and grid upgrades (but not decommissioning, since the gas infrastructure remains in operation).
+
+**Net present value of scattershot costs:**
+$$\text{Scattershot NPV} = \sum_{t=0}^{TIME\_PERIOD-1} \frac{\text{Scattershot Annual Cost}}{(1 + \text{DISCOUNT\_RATE})^t}$$
+
+Where t starts at 0 to match the timing of PRP costs (years 0-9). We use a **`r percent(DISCOUNT_RATE)` discount rate** to calculate the present value of future electrification expenditures, reflecting the time value of money and opportunity cost of capital.
+
+**PRP cost with scattershot electrification:**
+$$\text{PRP Cost (with scattershot)} = \text{PRP Cost} + \text{Scattershot NPV}$$
+
+This combined cost represents the total infrastructure investment under a "business as usual" scenario: replacing pipelines upfront while paying for gradual electrification over the next decade.
+
+**Cost-effectiveness ratio with scattershot:**
+$$\text{Cost-Effectiveness Ratio (scattershot)} = \frac{\text{NPA Cost}}{\text{PRP Cost (with scattershot)}}$$
+
+Comparing NPA to PRP-with-scattershot shows whether coordinated upfront electrification is more cost-effective than an unmanaged transition with duplicative infrastructure investments.
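+
+A sketch of the scattershot arithmetic, using the constants defined in the analysis (`PCT_ELEC = 0.30`, `DISCOUNT_RATE = 0.05`, `TIME_PERIOD = 10`). Per the note above, the electrified share is assumed to exclude decommissioning, since the gas system stays in service; `decomm_cost` is an illustrative name:
+
+```r
+# Sketch: NPV of gradual electrification layered on top of PRP spending.
+elec_and_grid_cost <- npa_cost - decomm_cost  # gas mains remain in place
+ss_annual <- PCT_ELEC * elec_and_grid_cost / TIME_PERIOD
+ss_npv <- sum(ss_annual / (1 + DISCOUNT_RATE)^(0:(TIME_PERIOD - 1)))
+
+prp_cost_ss <- prp_cost + ss_npv
+ratio_ss <- npa_cost / prp_cost_ss
+```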
+
+### Cost-Effectiveness Evaluation Scenarios
+
+We evaluate the economic case for targeted electrification under three different scenarios:
+
+**Scenario 1: Strict Block-Level Cost-Effectiveness (Ratio ≤ 1.0)**
+Electrify only blocks where NPA cost is less than or equal to PRP cost. This conservative approach ensures immediate cost savings on every block and avoids any subsidization across blocks. However, it may miss opportunities to maximize overall system cost savings by using savings from low-cost blocks to enable electrification of moderate-cost blocks.
+
+**Scenario 2: Portfolio Cost-Neutral Approach (Cumulative Savings ≥ 0)**
+Maximize the number of electrified blocks while maintaining overall cost neutrality across the portfolio. We rank blocks by cost-effectiveness ratio and electrify them in order of increasing ratio, stopping when cumulative net savings reach zero. This approach allows low-cost blocks to "subsidize" higher-cost blocks, maximizing electrification within a fixed budget constraint equal to total PRP spending.
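+
+A sketch of this greedy selection (illustrative column names):
+
+```r
+# Sketch: electrify blocks in order of increasing NPA/PRP ratio while
+# cumulative net savings remain non-negative. Because blocks with
+# ratio > 1 only subtract from savings, the kept rows form a prefix.
+library(dplyr)
+
+portfolio <- res_blocks |>
+  arrange(npa_over_prp) |>
+  mutate(cum_savings = cumsum(prp_cost - npa_cost)) |>
+  filter(cum_savings >= 0)
+```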
+
+**Scenario 3: Avoiding Future Scattershot Electrification**
+Use the PRP-with-scattershot cost (TK SECTION) as the comparison baseline. This scenario recognizes that pipeline replacement does not prevent future electrification; it merely makes it uncoordinated and more expensive. By comparing NPA to PRP-with-scattershot, we identify blocks where coordinated electrification avoids duplicative infrastructure investments.
+
+Under this scenario, the comparison is:
+- **With NPA**: Coordinated upfront electrification + grid upgrades + decommissioning
+- **Without NPA (PRP-with-scattershot)**: Pipeline replacement + NPV of `r percent(PCT_ELEC)` gradual electrification over `r TIME_PERIOD` years + ongoing grid upgrades
+
+This approach accounts for Chicago's Climate Action Plan electrification goals and provides a more realistic assessment of the true cost of continued gas infrastructure investment in a decarbonizing economy.
## References
diff --git a/reports/il_npa/notebooks/analysis.qmd b/reports/il_npa/notebooks/analysis.qmd
index 1b3ce75..a6b40fb 100644
--- a/reports/il_npa/notebooks/analysis.qmd
+++ b/reports/il_npa/notebooks/analysis.qmd
@@ -16,6 +16,65 @@ format:
page-layout: full
---
+# Overview
+
+This analysis compares the cost of two approaches to managing aging natural gas infrastructure in Chicago:
+
+1. **Pipeline Replacement Program (PRP)**: Traditional approach of replacing old pipes with new gas pipes
+2. **Non-Pipe Alternative (NPA)**: Electrifying buildings and decommissioning gas infrastructure
+
+We analyze data from Peoples Gas construction projects to identify census blocks where electrification would cost the same or less than pipeline replacement. This analysis focuses on fully residential blocks due to the lack of data on the variable costs of electrifying commercial and industrial properties. Our measure of cost-effectiveness is the ratio of the cost of electrification to the cost of pipeline replacement. We consider only upfront costs, including pipeline replacement or decommissioning, electrification, and electric grid upgrades. We do not include the ongoing costs of operating the electrified buildings or ratepayer impacts.
+
+We consider three measures of cost-effectiveness:
+
+1. For each block, electrification should be strictly cheaper than pipeline replacement.
+Under this measure, we would electrify only blocks that are fully residential and have a cost ratio of less than 1.0, meaning electrification is strictly cheaper than pipeline replacement.
+
+2. Across the portfolio of fully residential blocks, targeted electrification investments should cost no more than pipe replacement.
+Under this measure, every block with a cost ratio of less than 1.0 would be electrified. After that, we would electrify as many additional blocks as possible, up to the point where the net savings from electrification reach zero. Imagine a universe of 10 blocks: 5 can be electrified for a total of $1,000 less than pipeline replacement. If the 6th block costs $1,000 more to electrify, we would apply the savings from the first 5 blocks to the 6th block, and the remaining 4 blocks would have their pipelines replaced.
+
+3. Electrification should cost no more than the sum of pipeline replacement and scattershot electrification.
+Given the climate goals of Illinois and Chicago, we expect electrification to occur regardless of NPA programs: if pipelines are replaced, some percentage of the affected homes will electrify anyway. This is the least economically efficient outcome, because investments are made in both electrification and pipeline replacement when targeted electrification could have avoided the pipeline spending altogether. By this measure, we include the future costs of scattershot electrification and pipeline replacement in the PRP cost calculation.
+
+
+## Data Sources
+
+This analysis uses:
+
+- Peoples Gas planned construction project data
+- Cook County property records (parcel data and building characteristics)
+- Chicago building footprint data
+- Chicago neighborhood boundaries
+- Cost estimates for electrification and pipeline replacement
+
+## Methodology Overview
+
+The analysis follows these steps:
+
+1. **Load and clean data**: Import geospatial data on blocks, buildings, and parcels
+2. **Calculate costs**: Estimate costs for both pipeline replacement and electrification for each block
+3. **Filter to residential blocks**: Identify blocks that are fully or predominantly residential
+4. **Compare scenarios**: Calculate cost ratios and identify blocks where electrification is cheaper
+5. **Analyze patterns**: Examine how housing density and building types affect cost comparisons
+
+Key terms:
+
+- PRP: Pipeline Replacement Program
+- NPA: Non-Pipe Alternative
+- NPV: Net Present Value (accounting for the time value of money)
+- LRMC: Long Run Marginal Cost
+
+# Limitations and Assumptions
+
+This analysis makes several simplifying assumptions:
+
+- Electrification costs are based on average estimates and will vary by building
+- We assume all residents would be willing participants in electrification
+- Grid upgrade costs use historical averages and may change with new technology
+- The analysis doesn't account for financing mechanisms or incentive programs
+- Commercial and industrial properties are excluded due to cost uncertainty
+
+# Setup
+
+We begin by loading required R packages and setting file paths. We use the outputs from the `reports/il_npa/notebooks/geo_data_cleaning.qmd` notebook to load the block-level data.
+
```{r}
#| echo: false
library(sf)
@@ -37,6 +96,12 @@ source("/workspaces/reports2/lib/ggplot/switchbox_theme.R")
# options(dplyr.print_max = Inf)
```
+# Data Loading
+
+## Block-Level Summary Data
+
+We load pre-processed data that summarizes each census block's characteristics, including the number of residential units, commercial properties, and miles of gas pipeline.
+
```{r}
# Load summary data based on aggregation level
geojson_files <- list.files(outputs_dir,
@@ -63,9 +128,27 @@ cat("Loaded:", basename(latest_geojson), "\n")
cat(" ", nrow(pg_summary), "block segments\n")
cat(" Unique city blocks:", n_distinct(pg_summary$geoid10), "\n")
-unit_label <- "blocks"
```
+
+```{r}
+# Manually correct two blocks that are coded incorrectly in the parcel data
+pg_summary <- pg_summary |>
+ mutate(
+ sf_parcels = case_when(
+ geoid10 == "170311602004001" ~ 9,
+ geoid10 == "170311602003005" ~ 6,
+ .default = sf_parcels
+ ),
+ total_residential_units = case_when(
+ geoid10 == "170311602004001" ~ 9,
+ geoid10 == "170311602003005" ~ 6,
+ .default = total_residential_units
+ )
+ )
+```
+
+
```{r}
# Load neighborhoods GeoJSON from S3
s3_uri <- "s3://data.sb.east/gis/pgl/Neighborhoods_2012b_20251215.geojson"
@@ -85,6 +168,16 @@ cat(" ", nrow(neighborhoods), "neighborhoods\n")
cat(" CRS:", st_crs(neighborhoods)$input, "\n")
```
+## Cost Assumptions
+
+The analysis uses cost estimates for:
+
+- Pipeline replacement per mile
+- Single-family home electrification (heat pump, water heater, stove)
+- Multi-family unit electrification
+- Long-Run Marginal Cost (LRMC) of electric grid distribution upgrades per household
+- Pipeline decommissioning per mile
+
+
```{r}
# Read from the first sheet of the Google Sheet
gsheet_url <- "https://docs.google.com/spreadsheets/d/1xxa47dClvp0rosZhUP1R7790CNXMLSD_0ExrPccR3p0/export?format=csv&gid=0"
@@ -94,22 +187,6 @@ prp_df <- read_csv(gsheet_url)
prp_df
```
-```{r}
-#manually correct 2 blocks that are coded incorrectly in the parcel data
-pg_summary <- pg_summary |>
- mutate(
- sf_parcels = case_when(
- geoid10 == "170311602004001" ~ 9,
- geoid10 == "170311602003005" ~ 6,
- .default = sf_parcels
- ),
- total_residential_units = case_when(
- geoid10 == "170311602004001" ~ 9,
- geoid10 == "170311602003005" ~ 6,
- .default = total_residential_units
- )
- )
-```
```{r}
# cost estimates
@@ -126,17 +203,37 @@ lrmc_peak_kw <- prp_df |> filter(variable == "LMRC_historical") |> pull(value)
grid_upgrade_cost_hh <- lrmc_peak_kw * peak_kw_delta
# Scattershot electrification constants
-PCT_ELEC <- 0.30 # 30% electrification rate (Chicago Climate Action Plan goal)
-DISCOUNT_RATE <- 0.05 # 5% discount rate for NPV
-TIME_PERIOD <- 10 # 10-year timeline
+PCT_ELEC <- 0.30 # 30% electrification rate (Chicago Climate Action Plan goal)
+DISCOUNT_RATE <- 0.05 # 5% discount rate for NPV
+TIME_PERIOD <- 10 # 10-year timeline
-# Residential threshold - some blocks have no commerical or industrial but they do have churches, schools, etc. This is a rough threshold to identify fully residential blocks. If the sum of residential parcels divided by the non-vacant parcel count is greater than this threshold, we consider the block fully residential.
+# Residential threshold - some blocks have no commercial or industrial parcels but do have churches, schools, etc., which we cannot identify programmatically at this time. RES_THRESHOLD is a rough threshold to identify fully residential blocks. If the sum of residential parcels divided by the non-vacant parcel count is greater than this threshold, we consider the block fully residential.
RES_THRESHOLD <- 0.9
```
+# Cost Calculations
+
+For each city block, we calculate:
+
+1. **PRP cost**: Miles of pipeline × cost per mile
+2. **NPA cost**:
+ - Electrification equipment for each home
+ - Grid upgrades to handle increased electricity demand
+ - Pipeline decommissioning costs
+
+We also test a "scattershot" scenario that assumes 30% of homes electrify on their own over 10 years while gas infrastructure remains in place.
+
+## Identifying Residential Blocks
+
+We filter to blocks that are fully residential because:
+
+- Residential electrification costs are well-established
+- Commercial/industrial needs are more complex and variable
+
+The filter identifies blocks where 90%+ of properties are residential, with manual adjustments based on visual inspection of satellite imagery.
+
```{r}
#| echo: false
-cost_elec_comm <- 50000
+cost_elec_comm <- 0 # placeholder for commercial electrification. We did not analyze commercial electrification in this analysis.
pg_summary <- pg_summary |>
mutate(
@@ -180,7 +277,8 @@ pg_summary <- pg_summary |>
)
)
-#visual inspection --> add back in some blocks that are confirmed residential but dont conform to the programmatic filters
+# Visual inspection: add back some blocks that are confirmed residential but don't conform to the programmatic filters. The assessor code "EX" is applied to a wide range of parcels. Some are schools or churches; these blocks should NOT be considered fully residential. Others are easements, parks, or vacant lots; these we want to retain.
+
special_cases <- c(
"170315609005003",
"170315609003000",
@@ -268,6 +366,8 @@ pg_summary <- pg_summary |>
```{r}
# Calculate scattershot electrification costs (unmanaged gas transition scenario)
# Assumes PCT_ELEC of residential units electrify over TIME_PERIOD years
+# We are agnostic to the timing here. Each block assumes the PRP occurs in year 0 and scattershot occurs in years 0-9.
+
pg_summary <- pg_summary |>
mutate(
# Annual cost of scattershot electrification (evenly distributed)
@@ -292,7 +392,7 @@ pg_summary <- pg_summary |>
npa_over_prp_ss <= 1 ~ "50-100 %",
npa_over_prp_ss <= 1.5 ~ "100-150 %",
npa_over_prp_ss <= 2 ~ "150-200 %",
- TRUE ~ "> 200 %"
+      .default = "> 200 %"
),
comp_cost_discrete_ss = if_else(
@@ -414,7 +514,9 @@ n_miles_1_5_ss <- sum(res_1_5_ss$street_miles, na.rm = TRUE)
```
```{r}
-res_only |>
+
+# percentage of fully residential blocks in each cost bin
+pct_blocks_in_cost_bins <- res_only |>
data.frame() |>
summarise(.by = comp_cost_discrete,
n = n(),
@@ -425,6 +527,7 @@ arrange(comp_cost_discrete)
```{r}
+# Color scale for the cost comparison categories
COMP_COST_COLS <-c("< 50 %" = "#1b8813",
"50-100 %"= "#9ec23c",
"100-150 %" = "#d3bc14",
@@ -435,6 +538,8 @@ COMP_COST_COLS <-c("< 50 %" = "#1b8813",
# Neighborhoods Deeper Dive
+We want to report summary statistics by neighborhood for a select set of neighborhoods.
+
```{r}
target_areas <- c(
"Englewood",
@@ -474,7 +579,9 @@ pct_res_cheaper_ss = (n_res_cheaper_ss / n_res)*100,
```
-# Plotting
+# Visualizations
+
+The following charts illustrate the cost comparison findings and identify patterns in the data.
```{r}
#| label: fig-histogram-npa-over-prp
@@ -1447,307 +1554,6 @@ ggplot() +
```
-```{r}
-plot_pg_polygons_map <- function(pg_summary, fill_col, discrete = FALSE,
- cmap = "viridis", legend = TRUE,
- color_dict = NULL, title = NULL) {
- pg_map <- pg_summary |>
- filter(!is.na(geometry)) |>
- st_transform(4326)
-
- bounds <- st_bbox(pg_map)
- center_lat <- mean(c(bounds[2], bounds[4]))
- center_lon <- mean(c(bounds[1], bounds[3]))
-
- m <- leaflet(pg_map) |>
- addProviderTiles("CartoDB.Positron") |>
- setView(lng = center_lon, lat = center_lat, zoom = 11)
-
- if (!is.null(title)) {
- m <- m |>
- addControl(title, position = "topleft",
- className = "map-title")
- }
-
- fill_vals <- pg_map[[fill_col]]
- popup_cols <- c('street_miles', 'sf_parcels', 'mf_units',
- 'commercial_parcels', 'industrial_parcels',
- 'prp_cost', 'elec_cost_total')
-
- popup_text <- sapply(1:nrow(pg_map), function(i) {
- val <- fill_vals[i]
- if (is.na(val)) return("")
-
- txt <- paste0("", fill_col, ": ",
- if (discrete) val else round(val, 2), "
")
- for (col in popup_cols) {
- if (col %in% names(pg_map)) {
- col_val <- pg_map[[col]][i]
- if (!is.na(col_val)) {
- txt <- paste0(txt, "", col, ": ",
- if (is.numeric(col_val)) round(col_val, 2) else col_val,
- "
")
- }
- }
- }
- txt
- })
-
- if (discrete) {
- categories <- sort(unique(fill_vals[!is.na(fill_vals)]))
-
- if (!is.null(color_dict)) {
- pal <- colorFactor(palette = color_dict, domain = categories)
- } else {
- pal <- colorFactor(palette = "Set3", domain = categories)
- }
-
- m <- m |>
- addPolygons(
- fillColor = ~pal(fill_vals),
- fillOpacity = 0.6,
- color = "black",
- weight = 1,
- popup = popup_text
- )
-
- if (legend) {
- m <- m |> addLegend(pal = pal, values = categories, title = fill_col)
- }
- } else {
- vals <- fill_vals[!is.na(fill_vals)]
- vmin <- min(vals)
- vmax <- max(vals)
-
- pal <- colorNumeric(palette = cmap, domain = c(vmin, vmax))
-
- m <- m |>
- addPolygons(
- fillColor = ~pal(fill_vals),
- fillOpacity = 0.6,
- color = "black",
- weight = 1,
- popup = popup_text
- )
-
- if (legend) {
- m <- m |> addLegend(pal = pal, values = vals, title = fill_col)
- }
- }
-
- m
-}
-```
-
-
-
-
-# EDA
-```{r}
-valid_data <- pg_summary |>
- filter(is.finite(npa_over_prp)) |>
- pull(npa_over_prp)
-
-ggplot(data.frame(x = res_only$npa_over_prp), aes(x = x)) +
- geom_histogram(bins = 30, fill = 'skyblue', color = 'black') +
- labs(x = 'npa_over_prp', y = 'Count',
- title = 'Histogram of npa_over_prp (finite values only)') +
- theme_minimal() +
- theme(panel.grid = element_line())
-```
-
-```{r}
-ggplot(pg_summary, aes(x = comm_ind_total_count)) +
- geom_histogram(bins = 30, fill = 'orange', color = 'black') +
- labs(x = 'comm_ind_total_count', y = 'Count',
- title = 'Histogram of comm_ind_total_count') +
- theme_minimal() +
- theme(panel.grid = element_line())
-```
-
-
-```{r}
-ggplot(pg_summary, aes(x = pct_residential)) +
- geom_histogram(bins = 100, fill = 'green', color = 'black') +
- labs(x = 'pct_residential', y = 'Count',
- title = 'Histogram of pct_residential') +
- theme_minimal() +
- theme(panel.grid = element_line())
-```
-
-
-
-```{r}
-ggplot(pg_summary, aes(x = street_miles)) +
- geom_histogram(bins = 100, fill = 'blue', color = 'black') +
- labs(x = 'street_miles', y = 'Count',
- title = 'Histogram of street_miles') +
- theme_minimal() +
- theme(panel.grid = element_line())
-```
-
-## Diagnostic Histograms
-
-```{r}
-# Calculate block perimeter and area
-pg_summary <- pg_summary |>
- st_transform(32616) |>
- mutate(
- block_perimeter_miles = as.numeric(st_length(geometry)) / 1609.34,
- block_area_sq_km = as.numeric(st_area(geometry)) / 1e6,
- perimeter_to_street_miles_ratio = street_miles / block_perimeter_miles,
- parcels_per_sq_km = non_vacant_parcels / block_area_sq_km
- ) |>
- st_transform(4326) # Transform back to WGS84
-
-cat("Sample diagnostic values (first 5 rows):\n")
-pg_summary |>
- select(block_perimeter_miles, street_miles, perimeter_to_street_miles_ratio,
- block_area_sq_km, parcels_per_sq_km) |>
- head() |>
- print()
-
-cat("\nBlock perimeter stats:",
- sprintf("min=%.4f, max=%.4f, median=%.4f miles\n",
- min(pg_summary$block_perimeter_miles, na.rm = TRUE),
- max(pg_summary$block_perimeter_miles, na.rm = TRUE),
- median(pg_summary$block_perimeter_miles, na.rm = TRUE)))
-cat("Street miles stats:",
- sprintf("min=%.6f, max=%.4f, median=%.6f miles\n",
- min(pg_summary$street_miles, na.rm = TRUE),
- max(pg_summary$street_miles, na.rm = TRUE),
- median(pg_summary$street_miles, na.rm = TRUE)))
-
-valid_perimeter_ratio <- pg_summary |>
- filter(is.finite(perimeter_to_street_miles_ratio), street_miles > 0) |>
- pull(perimeter_to_street_miles_ratio)
-
-valid_parcels_density <- pg_summary |>
- filter(is.finite(parcels_per_sq_km), block_area_sq_km > 0) |>
- pull(parcels_per_sq_km)
-```
-
-```{r}
-# Histogram: Ratio of street miles to block perimeter - split into 3 ranges
-ratio_le_1 <- valid_perimeter_ratio[valid_perimeter_ratio <= 1]
-ratio_1_to_2 <- valid_perimeter_ratio[valid_perimeter_ratio > 1 & valid_perimeter_ratio <= 2]
-ratio_gt_2 <- valid_perimeter_ratio[valid_perimeter_ratio > 2]
-
-# Plot 1: Ratio <= 1
-if (length(ratio_le_1) > 0) {
- ggplot(data.frame(x = ratio_le_1), aes(x = x)) +
- geom_histogram(bins = 30, fill = 'steelblue', color = 'black', alpha = 0.7) +
- geom_vline(xintercept = median(ratio_le_1), color = 'red', linetype = 'dashed', linewidth = 1) +
- labs(x = 'Street Miles to Block Perimeter Ratio', y = 'Count',
- title = 'Histogram of Street Miles to Block Perimeter Ratio (Ratio ≤ 1)') +
- theme_minimal() +
- theme(panel.grid = element_line(color = "gray90"))
-
- cat("Summary statistics for ratio ≤ 1:\n")
- cat(" Count:", length(ratio_le_1), "\n")
- cat(" Mean:", round(mean(ratio_le_1), 2), "\n")
- cat(" Median:", round(median(ratio_le_1), 2), "\n")
- cat(" Std Dev:", round(sd(ratio_le_1), 2), "\n")
- cat(" Min:", round(min(ratio_le_1), 2), "\n")
- cat(" Max:", round(max(ratio_le_1), 2), "\n")
-}
-
-# Plot 2: Ratio 1-2
-if (length(ratio_1_to_2) > 0) {
- ggplot(data.frame(x = ratio_1_to_2), aes(x = x)) +
- geom_histogram(bins = 30, fill = 'steelblue', color = 'black', alpha = 0.7) +
- geom_vline(xintercept = median(ratio_1_to_2), color = 'red', linetype = 'dashed', linewidth = 1) +
- labs(x = 'Street Miles to Block Perimeter Ratio', y = 'Count',
- title = 'Histogram of Street Miles to Block Perimeter Ratio (1 < Ratio ≤ 2)') +
- theme_minimal() +
- theme(panel.grid = element_line(color = "gray90"))
-
- cat("\nSummary statistics for 1 < ratio ≤ 2:\n")
- cat(" Count:", length(ratio_1_to_2), "\n")
- cat(" Mean:", round(mean(ratio_1_to_2), 2), "\n")
- cat(" Median:", round(median(ratio_1_to_2), 2), "\n")
- cat(" Std Dev:", round(sd(ratio_1_to_2), 2), "\n")
- cat(" Min:", round(min(ratio_1_to_2), 2), "\n")
- cat(" Max:", round(max(ratio_1_to_2), 2), "\n")
-}
-
-# Plot 3: Ratio > 2
-if (length(ratio_gt_2) > 0) {
- ggplot(data.frame(x = ratio_gt_2), aes(x = x)) +
- geom_histogram(bins = 50, fill = 'steelblue', color = 'black', alpha = 0.7) +
- geom_vline(xintercept = median(ratio_gt_2), color = 'red', linetype = 'dashed', linewidth = 1) +
- labs(x = 'Street Miles to Block Perimeter Ratio', y = 'Count',
- title = 'Histogram of Street Miles to Block Perimeter Ratio (Ratio > 2)') +
- theme_minimal() +
- theme(panel.grid = element_line(color = "gray90"))
-
- cat("\nSummary statistics for ratio > 2:\n")
- cat(" Count:", length(ratio_gt_2), "\n")
- cat(" Mean:", round(mean(ratio_gt_2), 2), "\n")
- cat(" Median:", round(median(ratio_gt_2), 2), "\n")
- cat(" Std Dev:", round(sd(ratio_gt_2), 2), "\n")
- cat(" Min:", round(min(ratio_gt_2), 2), "\n")
- cat(" Max:", round(max(ratio_gt_2), 2), "\n")
-}
-
-cat("\nOverall summary statistics for all ratios:\n")
-cat(" Total Count:", length(valid_perimeter_ratio), "\n")
-cat(" Ratio ≤ 1:", length(ratio_le_1),
- sprintf("(%.1f%%)\n", length(ratio_le_1)/length(valid_perimeter_ratio)*100))
-cat(" 1 < Ratio ≤ 2:", length(ratio_1_to_2),
- sprintf("(%.1f%%)\n", length(ratio_1_to_2)/length(valid_perimeter_ratio)*100))
-cat(" Ratio > 2:", length(ratio_gt_2),
- sprintf("(%.1f%%)\n", length(ratio_gt_2)/length(valid_perimeter_ratio)*100))
-```
-
-```{r}
-# List block_poly_id where street to perimeter ratio < 0.2
-if ('block_poly_id' %in% names(pg_summary)) {
- low_ratio_blocks <- pg_summary |>
- filter(is.finite(perimeter_to_street_miles_ratio),
- perimeter_to_street_miles_ratio < 0.2,
- block_perimeter_miles > 0)
-
- block_poly_ids <- sort(low_ratio_blocks$block_poly_id)
-
- cat("Blocks with street to perimeter ratio < 0.2:", length(block_poly_ids), "\n")
- cat("\nblock_poly_id list (", length(block_poly_ids), " blocks):\n", sep = "")
- print(block_poly_ids)
-
- if (nrow(low_ratio_blocks) > 0) {
- cat("\nSample details (first 10 blocks):\n")
- low_ratio_blocks |>
- select(block_poly_id, perimeter_to_street_miles_ratio,
- street_miles, block_perimeter_miles, non_vacant_parcels) |>
- head(10) |>
- print()
- }
-} else {
- cat("block_poly_id column not found in pg_summary. Available columns:\n")
- print(names(pg_summary))
-}
-```
-
-```{r}
-# Histogram: Number of parcels per square kilometer
-ggplot(data.frame(x = valid_parcels_density), aes(x = x)) +
- geom_histogram(bins = 50, fill = 'coral', color = 'black', alpha = 0.7) +
- geom_vline(xintercept = median(valid_parcels_density),
- color = 'red', linetype = 'dashed', linewidth = 1) +
- labs(x = 'Parcels per Square Kilometer', y = 'Count',
- title = 'Histogram of Parcels per Square Kilometer by Block') +
- theme_minimal() +
-
-
-cat("Summary statistics for parcels per square kilometer:\n")
-cat(" Count:", length(valid_parcels_density), "\n")
-cat(" Mean:", round(mean(valid_parcels_density)), "\n")
-cat(" Median:", round(median(valid_parcels_density)), "\n")
-cat(" Std Dev:", round(sd(valid_parcels_density)), "\n")
-cat(" Min:", round(min(valid_parcels_density)), "\n")
-cat(" Max:", round(max(valid_parcels_density)), "\n")
-```
-
# Export the pg_summary GeoDataFrame to GeoJSON with diagnostic attributes
@@ -1766,19 +1572,6 @@ cat("✅ Exported pg_summary to", output_geojson, "\n")
cat(" Includes diagnostic attributes: block_perimeter_miles, block_area_sq_km, perimeter_to_street_miles_ratio, parcels_per_sq_km\n")
cat(" Total features:", nrow(pg_summary_export), "\n")
-# # separate layers for closed and open construction polygons
-# closed_construction_polygons <- pg_summary_export |>
-# filter(status_simple == "closed")
-# open_construction_polygons <- pg_summary_export |>
-# filter(status_simple == "planned")
-#
-# # export closed and open construction polygons to GeoJSON
-# st_write(closed_construction_polygons,
-# file.path(outputs_dir, paste0("final_peoplesgas_with_buildings_streets_", ANALYSIS_LEVEL, "_closed_", format(Sys.Date(), "%Y%m%d"), ".geojson")),
-# delete_dsn = TRUE)
-# st_write(open_construction_polygons,
-# file.path(outputs_dir, paste0("final_peoplesgas_with_buildings_streets_", ANALYSIS_LEVEL, "_open_", format(Sys.Date(), "%Y%m%d"), ".geojson")),
-# delete_dsn = TRUE)
```
diff --git a/reports/il_npa/notebooks/clean_peoples_construction_polygons.py b/reports/il_npa/notebooks/clean_peoples_construction_polygons.py
index cc4b181..1d45570 100644
--- a/reports/il_npa/notebooks/clean_peoples_construction_polygons.py
+++ b/reports/il_npa/notebooks/clean_peoples_construction_polygons.py
@@ -7,7 +7,7 @@
-------------------------------------------
This script prepares and cleans Peoples Gas construction polygons data for further spatial analysis. Below are the key transformations performed:
-
+Data was sourced from the Peoples Gas Construction Polygons dataset using reports/il_npa/utils/download_peoplesgas_data.py.
+
1. **File Loading and Setup**
- Sets up directories for input (`../data/geo_data`, `../utils`) and output (`../data/outputs`).
- Locates and reads the most recent available Peoples Gas construction polygons file.
@@ -21,10 +21,27 @@
- Removes polygons classified as `"PI / SI"` in the `PRP_TYPE` column (with or without trailing space).
- Removes polygons with `STATUS` set to `"Street and landscape restoration"`.
-4. **Diagnostics**
+4. **Unioning Overlapping Polygons**
+   - Identifies groups of overlapping polygons using a connected-components algorithm (overlap area > 1 square meter).
+   - Unions each group into a single polygon.
+
+5. **Status Assignment**
+   - Assigns a status to each polygon based on the `C_START` column.
+   - If the `C_START` date is before the `MIN_START` date, the polygon is assigned the status "closed".
+   - Otherwise (including missing `C_START` dates), the polygon is assigned the status "planned".
+
+6. **Diagnostics**
- Reports counts and summaries to provide transparency about any data removed or modified.
-The result is a cleaned and filtered Peoples Gas construction polygon dataset, ready for unioning, further geometric processing, and integration with additional city datasets in downstream workflows.
+The result is two versions of the cleaned and filtered Peoples Gas construction polygon dataset, ready for integration with additional city datasets in downstream workflows.
+
+1. `peoples_polygons_unioned.geojson` - Unioned polygons by status_simple (overlap > 1 sq meter)
+ - This file is used for further geometric processing and integration with additional city datasets in downstream workflows.
+ - Planned and closed polygons are unioned separately. No clipping is performed where closed polygons overlap with planned polygons.
+2. `peoples_polygons_unioned_clipped.geojson` - Unioned polygons by status_simple (overlap > 1 sq meter) clipped by closed polygons
+ - This file was not used in the analysis.
+ - Planned and closed polygons are unioned separately. Planned polygons are clipped by the union of all closed polygons.
"""
from datetime import datetime
@@ -32,6 +49,7 @@
import geopandas as gpd
import pandas as pd
+from shapely.ops import unary_union
timestamp = datetime.now().strftime("%Y%m%d")
# Set paths
@@ -40,7 +58,7 @@
outputs_dir.mkdir(parents=True, exist_ok=True) # Ensure outputs directory exists
utils_dir = Path("../utils")
-
+MIN_START = "2026-01-01"
# --- Code Cell 5 ---
# Find the most recent Peoples Gas data file
@@ -82,7 +100,7 @@
# Create status_simple column
filtered["status_simple"] = filtered["C_START"].apply(
- lambda x: "closed" if pd.notnull(x) and x < pd.Timestamp("2026-01-01") else "planned"
+ lambda x: "closed" if pd.notnull(x) and x < pd.Timestamp(MIN_START) else "planned"
)
# Show summary stats for C_START grouped by status_simple
@@ -92,7 +110,6 @@
print(stats)
# Create a new spatial object that contains a union of overlapping polygons by "status_simple"
- from shapely.ops import unary_union
# Threshold for unioning overlapping polygons: 1 square meter
# Note: This is applied in UTM Zone 16N (EPSG:32616) which uses meters, so area is in square meters
diff --git a/reports/il_npa/notebooks/geo_data_cleaning.qmd b/reports/il_npa/notebooks/geo_data_cleaning.qmd
index 1dba4b6..6f0e171 100644
--- a/reports/il_npa/notebooks/geo_data_cleaning.qmd
+++ b/reports/il_npa/notebooks/geo_data_cleaning.qmd
@@ -17,8 +17,38 @@ format:
# BEFORE RUNNING THIS NOTEBOOK, RUN THE FOLLOWING :
# 1. run just fetch-all-data
-# 2. clean_peoples_construction_polygons.py
-# 3. match_parcels_buildings.py
+# 2. reports/il_npa/notebooks/clean_peoples_construction_polygons.py - produces peoples_polygons_unioned.geojson
+# 3. reports/il_npa/notebooks/match_parcels_buildings.py - produces parcels_with_buildings.geojson
+
+This notebook cleans and prepares the geo data for the analysis. We need to create a single GeoJSON file that contains census block-level data for:
+
+ 1. Miles of pipeline to be replaced for each block. We use street miles as a proxy. Conceptually, we want to split the cost of each pipeline segment between the adjacent blocks on either side of the street.
+ 2. Counts of parcels and residential units for each block. Counts must be disaggregated by type of parcel (residential, mixed-use, commercial, industrial) and by type of residential unit (single-family, multi-family).
+
+We need two working versions of the initial block-level data:
+
+ 1. Blocks clipped to include only the area that overlaps with the Peoples Gas construction polygons.
+ 2. Blocks that overlap with the Peoples Gas construction polygons, not clipped.
+
+Both versions carry the same data, but for blocks that only partially overlap with the Peoples Gas construction polygons, the clipped version retains only the geometry of the overlap, while the un-clipped version retains the full geometry of the block.
+
+# Parcel Data
+
+We use the Peoples Gas construction polygons to subset the complete parcel dataset down to parcels affected by PRP. Because the construction polygons are irregular, we add a small buffer (8m) to the construction polygons to ensure that we capture all parcels affected by the construction. Only the parcel selection uses the buffer; all other spatial operations use the unbuffered polygons.
+
+From this filtered subset of parcels, we assign each parcel to a single unclipped block using one-to-one assignment (largest overlap). We then aggregate the parcel data by block to get the total number of parcels and residential units for each block.
+
+The assessor class codes only provide so much information about the type of parcel. Because our analysis is focused on fully residential blocks, we need some additional rules for identifying blocks that have no commercial or industrial parcels but are still not fully residential. Examples are blocks fully covered by a church, school, or park (assessor code == "EX") or by railroad properties (assessor code == "RR"). Typically, a block with only one or two parcels is not a residential block, so we drop blocks with fewer than 3 parcels. If all of the parcels on a block have the RR class code, we drop the block. We mapped the results of this programmatic filtering and manually identified some additional blocks to exclude by visually comparing the results to satellite imagery. This process would need to be better refined to replicate this analysis in other areas.
+
+
+# Street Data
+
+We use the Chicago Street Centerlines dataset to calculate street miles as a proxy for pipeline miles to be replaced. This dataset is downloaded from the Chicago Data Portal using reports/il_npa/utils/download_chicago_streets.py.
+
+Street data is clipped to only include segments that fall within the Peoples Gas construction polygons.
+
+For each clipped block, we calculate the perimeter of the block in meters. We assign a scaling factor to each block equal to the ratio of the block's perimeter to the sum of the perimeters of all blocks in the construction area. We then multiply the total length of the street segments contained within the construction polygon by the scaling factor to assign street miles to each block.
+
+Note that the exclusion of blocks described above is executed before the street data is apportioned to the blocks. Imagine a fully residential block across the street from a large park: the cost of replacing the pipeline segment running between these two blocks should be fully apportioned to the residential block, since the park would not be considered for electrification.
+
+
# Imports
@@ -95,21 +125,21 @@ def read_geojson_with_s3_fallback(local_path, s3_bucket, s3_key):
# Clean up temporary file
os.unlink(tmp_path)
- print(f" ✅ Successfully loaded from S3")
+ print(f" Successfully loaded from S3")
return gdf
except NoCredentialsError:
- print(f" ❌ Error: AWS credentials not found. Cannot read from S3.")
+ print(f" Error: AWS credentials not found. Cannot read from S3.")
raise
except ClientError as e:
error_code = e.response.get("Error", {}).get("Code", "Unknown")
if error_code == "NoSuchKey":
- print(f" ❌ Error: File not found in S3: s3://{s3_bucket}/{s3_key}")
+ print(f" Error: File not found in S3: s3://{s3_bucket}/{s3_key}")
else:
- print(f" ❌ AWS Error ({error_code}): {e}")
+ print(f" AWS Error ({error_code}): {e}")
raise
except Exception as e:
- print(f" ❌ Unexpected error reading from S3: {e}")
+ print(f" Unexpected error reading from S3: {e}")
raise
@@ -311,7 +341,7 @@ Load all geospatial datasets for analysis.
## Peoples Gas Construction Polygons
```{python}
-# Load pre-processed Peoples Gas polygons (unioned and clipped)
+# Load pre-processed Peoples Gas polygons (unioned; not clipped)
pg_file = outputs_dir / 'peoples_polygons_unioned.geojson'
print(f"Loading: {pg_file.name}")
pg_polygons = gpd.read_file(pg_file)
@@ -395,7 +425,7 @@ if street_files:
print(f"Loading: {street_file.name}")
else:
# No local file found, try to find most recent in S3
- print("⚠️ No local streets file found, attempting to find most recent in S3...")
+ print("No local streets file found, attempting to find most recent in S3...")
try:
s3_client = boto3.client("s3", region_name=os.environ.get("AWS_REGION", "us-west-2"))
# List objects with prefix
@@ -412,7 +442,7 @@ else:
else:
raise FileNotFoundError("No streets files found in S3")
except Exception as e:
- print(f"❌ Error finding streets in S3: {e}")
+ print(f"Error finding streets in S3: {e}")
print("No street data found. Run: just fetch-streets")
streets = None
s3_key = None
@@ -435,7 +465,7 @@ if street_files or (s3_key is not None and local_path is not None):
print(f" Sample bounds: {streets.total_bounds}")
# Check if we have proper street properties
if 'street_nam' in streets.columns:
- print(f" ✓ Has street properties (street_nam, etc.)")
+ print(f" Has street properties (street_nam, etc.)")
else:
print(f" WARNING: Missing street properties - may need to re-download")
print(f" Available columns: {list(streets.columns[:10])}")
@@ -649,13 +679,13 @@ print(f" Unique parcels: {parcels_by_block.index.nunique():,}")
print(f" Unique blocks: {parcels_by_block['geoid10'].nunique():,}")
# Diagnostic: Verify one-to-one relationship
-print("\n🔍 Verifying parcel-to-block relationship (one-to-one check)...")
+print("\nVerifying parcel-to-block relationship (one-to-one check)...")
if len(parcels_by_block) == parcels_by_block.index.nunique():
print(
- f" ✅ Confirmed: All {len(parcels_by_block):,} parcels assigned to exactly one block")
+ f" Confirmed: All {len(parcels_by_block):,} parcels assigned to exactly one block")
else:
print(
- f" ⚠️ WARNING: {len(parcels_by_block):,} rows but only {parcels_by_block.index.nunique():,} unique parcels")
+ f" WARNING: {len(parcels_by_block):,} rows but only {parcels_by_block.index.nunique():,} unique parcels")
print(f" This should not happen with one-to-one assignment - investigate!")
# Validation: Check that all expected columns from parcels_with_units are preserved
@@ -687,7 +717,7 @@ if "building_units" in parcels_by_block.columns:
```{python}
# IMPORTANT: Streets are clipped to UNBUFFERED construction polygons
# This ensures street miles are calculated only for the actual construction area,
-# not the 5m buffer zone. The buffer is only used for identifying affected parcels.
+# not the 8m buffer zone. The buffer is only used for identifying affected parcels.
print("\nClipping streets to construction polygon boundaries (unbuffered)...")
print(" Validating geometries...")
streets_valid = prepare_geometries_for_utm(streets, "street segments")
@@ -779,17 +809,6 @@ print(f"Total parcels in construction areas: {parcel_counts.sum():,}")
print(f"Total parcels with unit data: {parcels_with_units_count.sum():,}")
print(f"Total residential units: {unit_counts.sum():,.0f}")
-# Validation: Check that sum of parcel counts equals number of parcels in parcels_in_construction - visually inspected unmatched parcels and they are correctly exlcuded
-# expected_parcel_count = len(parcels_in_construction)
-# actual_parcel_count = parcel_counts.sum()
-# if expected_parcel_count != actual_parcel_count:
-# raise ValueError(
-# f"Parcel count validation failed: "
-# f"Expected {expected_parcel_count:,} parcels (from parcels_in_construction), "
-# f"but sum of parcel_counts is {actual_parcel_count:,}"
-# )
-# print(f" ✓ Validation passed: Parcel count sum ({actual_parcel_count:,}) matches parcels_in_construction ({expected_parcel_count:,})")
-
# Breakdown by property type
print(f"\nParcels by Type (across all construction areas):")
for prop_type in ['residential', 'commercial', 'industrial', 'mixed-use', 'vacant']:
@@ -833,7 +852,7 @@ if "assessorbldgclass" in parcels_by_block.columns:
blocks_all_rr = block_rr_check[block_rr_check["assessorbldgclass"]].index.tolist()
print(f" Blocks where all parcels have assessorbldgclass == 'RR': {len(blocks_all_rr):,}")
else:
- print(" ⚠️ WARNING: 'assessorbldgclass' column not found in parcels_by_block")
+ print(" WARNING: 'assessorbldgclass' column not found in parcels_by_block")
print(" Available columns:", list(parcels_by_block.columns))
blocks_all_rr = []
@@ -868,14 +887,12 @@ excluded_geoid10_list = (
"170314905002003",
"170316609003011",
"170318350001018",
- # "170311602003005",
- # "170311602004001",
"170311609002002",
"170311610001001",
]
)
)
-print(f"\n✅ Created excluded_geoid10_list with {len(excluded_geoid10_list):,} geoid10 values")
+print(f"\nCreated excluded_geoid10_list with {len(excluded_geoid10_list):,} geoid10 values")
# Display first 20 geoid10 values as a sample
if len(excluded_geoid10_list) > 0:
@@ -1026,8 +1043,8 @@ print("All intermediate clipped datasets exported")
## Combined Summary Dataset
At this point we have:
-- summary statistics at the unclippedblock level for parcel and residential unit counts
-- summary statistics at the clipped block level for street miles. these contain duplicate records for blocks that overlap multiple construction polygons.
+- summary statistics at the unclipped block level for parcel and residential unit counts
+- summary statistics at the clipped block level for street miles. These contain duplicate records for blocks that overlap multiple construction polygons.
We need to aggregate the street miles by geoid10 to get the total street miles for each block and merge into a final dataset where each row is one block that contain counts of all parcels (within 8m of a construction polygon) and the apportioned street miles.
@@ -1133,7 +1150,7 @@ if expected_geoid_count != actual_geoid_count:
)
print(f"\nBlock-level summary created with {len(summary):,} unique blocks")
-print(f" ✓ Validation passed: {actual_geoid_count:,} unique blocks match blocks_in_construction_clipped")
+print(f" Validation passed: {actual_geoid_count:,} unique blocks match blocks_in_construction_clipped")
# Show totals
print("\nTotals Across All City Blocks:")
@@ -1150,7 +1167,7 @@ print(f" Total residential units: {summary['total_residential_units'].sum():,}"
# Export Final Dataset
-Export the block-level summary statistics with full block geometries to GeoJSON.
+Export the block-level summary statistics with full block geometries to GeoJSON. We will read this into reports/il_npa/notebooks/analysis.qmd to perform the analysis.
```{python}
if summary is None:
diff --git a/reports/il_npa/notebooks/match_parcels_buildings.py b/reports/il_npa/notebooks/match_parcels_buildings.py
index 3535bbc..cd943dc 100644
--- a/reports/il_npa/notebooks/match_parcels_buildings.py
+++ b/reports/il_npa/notebooks/match_parcels_buildings.py
@@ -11,10 +11,10 @@
**Key workflow steps:**
1. Load the most recent parcels and building footprints from the geo_data directory, as GeoDataFrames.
-2. Load the Cook County Assessor lookup table for unit counts and building class information.
+2. Load the Cook County Assessor lookup table for unit counts and building class information. This table was manually assembled using codes published by the Cook County Assessor's office: https://prodassets.cookcountyassessoril.gov/s3fs-public/form_documents/classcode.pdf
3. Ensure all spatial datasets use a projected CRS suitable for accurate spatial operations.
4. Clean and validate geometries to avoid spatial errors.
-5. Perform a spatial join between parcels and buildings to determine which buildings are located within which parcels.
+5. Perform a spatial join between parcels and buildings to determine which buildings are located within which parcels. Parcel data provides the building type (residential, mixed-use, commercial, industrial) but not the number of units in multi-family buildings; building footprint data supplies those unit counts.
6. For buildings that fall within a parcel, the script aims to establish a 1:1 correspondence between buildings and parcels:
- If a parcel matches multiple buildings (a common situation for multi-building lots or complex footprints), the script identifies the building that has the greatest area overlap with the parcel and uses only this building for that parcel.
- This "area of overlap" is computed via intersection of building and parcel geometries (a sketch follows this list).
@@ -86,25 +86,25 @@ def read_geojson_with_s3_fallback(local_path: Path, s3_bucket: str, s3_key: str)
# Clean up temporary file
os.unlink(tmp_path)
- print(" ✅ Successfully loaded from S3")
+ print(" Successfully loaded from S3")
return gdf
except NoCredentialsError:
- print(" ❌ Error: AWS credentials not found. Cannot read from S3.")
+ print(" Error: AWS credentials not found. Cannot read from S3.")
raise
except ClientError as e:
error_code = e.response.get("Error", {}).get("Code", "Unknown")
if error_code == "NoSuchKey":
- print(f" ❌ Error: File not found in S3: s3://{s3_bucket}/{s3_key}")
+ print(f" Error: File not found in S3: s3://{s3_bucket}/{s3_key}")
else:
- print(f" ❌ AWS Error ({error_code}): {e}")
+ print(f" AWS Error ({error_code}): {e}")
raise
except Exception as e:
- print(f" ❌ Unexpected error reading from S3: {e}")
+ print(f" Unexpected error reading from S3: {e}")
raise
-print("🔍 Parcel-Building Matching Script")
+print("Parcel-Building Matching Script")
print("=" * 60)
# Find the most recent parcels file (full dataset)
@@ -117,7 +117,7 @@ def read_geojson_with_s3_fallback(local_path: Path, s3_bucket: str, s3_key: str)
print(f"Loading parcels: {parcels_file.name}")
else:
# No local file found, try to find most recent in S3
- print("⚠️ No local parcels file found, attempting to find most recent in S3...")
+ print("No local parcels file found, attempting to find most recent in S3...")
try:
s3_client = boto3.client("s3", region_name=os.environ.get("AWS_REGION", "us-west-2"))
# List objects with prefix
@@ -133,7 +133,7 @@ def read_geojson_with_s3_fallback(local_path: Path, s3_bucket: str, s3_key: str)
else:
raise FileNotFoundError("No parcels files found in S3")
except Exception as e:
- print(f"❌ Error finding parcels in S3: {e}")
+ print(f"Error finding parcels in S3: {e}")
print("Please run: just fetch-parcels")
exit(1)
local_path = geo_data_dir / "cook_county_parcels_NOT_FOUND.geojson" # Won't exist, triggers S3 read
@@ -155,7 +155,7 @@ def read_geojson_with_s3_fallback(local_path: Path, s3_bucket: str, s3_key: str)
print(f"Loading buildings: {buildings_file.name}")
else:
# No local file found, try to find most recent in S3
- print("⚠️ No local buildings file found, attempting to find most recent in S3...")
+ print("No local buildings file found, attempting to find most recent in S3...")
try:
s3_client = boto3.client("s3", region_name=os.environ.get("AWS_REGION", "us-west-2"))
# List objects with prefix
@@ -171,7 +171,7 @@ def read_geojson_with_s3_fallback(local_path: Path, s3_bucket: str, s3_key: str)
else:
raise FileNotFoundError("No buildings files found in S3")
except Exception as e:
- print(f"❌ Error finding buildings in S3: {e}")
+ print(f"Error finding buildings in S3: {e}")
print("Please run: just fetch-buildings")
exit(1)
local_path = geo_data_dir / "chicago_buildings_NOT_FOUND.geojson" # Won't exist, triggers S3 read
@@ -186,7 +186,7 @@ def read_geojson_with_s3_fallback(local_path: Path, s3_bucket: str, s3_key: str)
# Load assessor lookup
lookup_file = data_dir / "cook_county_assessor_lookup.csv"
if not lookup_file.exists():
- print("⚠️ Assessor lookup file not found locally, downloading from Google Sheets...")
+ print("Assessor lookup file not found locally, downloading from Google Sheets...")
try:
import urllib.request
@@ -198,9 +198,9 @@ def read_geojson_with_s3_fallback(local_path: Path, s3_bucket: str, s3_key: str)
# Download the file
urllib.request.urlretrieve(sheet_url, lookup_file)
- print(" ✅ Successfully downloaded assessor lookup from Google Sheets")
+ print(" Successfully downloaded assessor lookup from Google Sheets")
except Exception as e:
- print(f" ❌ Error downloading assessor lookup: {e}")
+ print(f" Error downloading assessor lookup: {e}")
print(
" Please manually download from: https://docs.google.com/spreadsheets/d/1xxa47dClvp0rosZhUP1R7790CNXMLSD_0ExrPccR3p0/edit?gid=770211799#gid=770211799"
)
@@ -210,7 +210,7 @@ def read_geojson_with_s3_fallback(local_path: Path, s3_bucket: str, s3_key: str)
assessor_lookup = pd.read_csv(lookup_file, dtype={"assessor_class": str})
# Classify parcels
-print("\n📋 Classifying parcels...")
+print("\nClassifying parcels...")
parcels_classified = parcels.merge(assessor_lookup, left_on="assessorbldgclass", right_on="assessor_class", how="left")
classified_count = parcels_classified["type"].notna().sum()
@@ -249,7 +249,7 @@ def read_geojson_with_s3_fallback(local_path: Path, s3_bucket: str, s3_key: str)
parcels_classified_utm["geometry"] = parcels_classified_utm["geometry"].make_valid()
buildings_utm["geometry"] = buildings_utm["geometry"].make_valid()
- print("\n🔍 Starting parcel-building matching...")
+ print("\nStarting parcel-building matching...")
print(f" Parcels: {len(parcels_classified_utm):,}")
print(f" Buildings: {len(buildings_utm):,}")
@@ -300,14 +300,14 @@ def read_geojson_with_s3_fallback(local_path: Path, s3_bucket: str, s3_key: str)
& (parcels_with_units["latitude"].astype(float).round(8) == round(41.74593814, 8))
]
if len(test_parcel) > 0:
- print("\n✅ Validation check for test parcel (-87.57623748, 41.74593814):")
+ print("\nValidation check for test parcel (-87.57623748, 41.74593814):")
print(f" Matched building_id: {test_parcel['matched_building_id'].values[0]}")
print(f" Building units: {test_parcel['building_units_raw'].values[0]}")
print(" Expected building_id: 631646")
if test_parcel["matched_building_id"].values[0] == "631646":
- print(" ✓ Match is correct!")
+ print(" Match is correct!")
else:
- print(" ⚠️ Match differs from expected")
+ print(" Match differs from expected")
# Create working column starting with raw data
parcels_with_units["building_units"] = parcels_with_units["building_units_raw"].copy()
@@ -354,7 +354,7 @@ def read_geojson_with_s3_fallback(local_path: Path, s3_bucket: str, s3_key: str)
raw_data_count = parcels_with_units["building_units_raw"].notna().sum()
interpolated_count = matched_count - raw_data_count
- print("\n📊 Building Unit Data Summary:")
+ print("\nBuilding Unit Data Summary:")
print(f" Total parcels: {total_parcels:,}")
print(f" From building footprint data: {raw_data_count:,} ({raw_data_count / total_parcels * 100:.1f}%)")
print(
@@ -373,7 +373,7 @@ def read_geojson_with_s3_fallback(local_path: Path, s3_bucket: str, s3_key: str)
parcels_with_units["building_units_raw"] = None
# Export parcels_with_units as GeoJSON
-print("\n💾 Exporting parcels_with_units...")
+print("\nExporting parcels_with_units...")
output_file = outputs_dir / f"parcels_with_units_{timestamp}.geojson"
# Ensure geometry is in WGS84 for export (parcels_classified_utm was in UTM, but parcels_with_units may be in original CRS)
@@ -381,7 +381,7 @@ def read_geojson_with_s3_fallback(local_path: Path, s3_bucket: str, s3_key: str)
parcels_with_units = parcels_with_units.to_crs("EPSG:4326")
parcels_with_units.to_file(output_file, driver="GeoJSON")
-print(f"✅ Exported {len(parcels_with_units):,} parcels with units to {output_file.name}")
-print(f"📁 File: {output_file}")
+print(f"Exported {len(parcels_with_units):,} parcels with units to {output_file.name}")
+print(f"File: {output_file}")
-print("\n✅ Parcel-building matching complete!")
+print("\nParcel-building matching complete!")