Strategic Undergrounding Analytics
A grid-wide algorithm — built, validated, and handed over — that ranks every overhead lateral by undergrounding ROI.
Problem
Most electric utilities decide which overhead lines to underground based on storm damage history, customer complaints, and institutional memory. The decisions are feeder-by-feeder, anecdotal, and often driven by the most recent severe-weather event rather than by a grid-wide analysis.
The capital flowing through those decisions is substantial. The dollars not allocated to the laterals with the highest combined ROI from reliability, vegetation exposure, and infrastructure-age perspectives are dollars not flowing where they’d produce the most outage reduction per dollar spent. There was no system at OG&E that could, in one place, look at every overhead single-phase lateral segment across the entire distribution system and rank it as an undergrounding candidate.
Approach
Map OG&E’s entire grid infrastructure on a Google Maps frontend. Behind it, build a composite scoring engine that runs in SQL against SAP HANA and Oracle, computing a multi-factor score for every single-phase overhead lateral in the system. Capital decisions sort by composite, top-down.
Two layers had to come first before scoring could even run: a lateral-segmentation algorithm (because most utility databases don’t have a clean “lateral segment” object), and a cost-modeling layer that priced out the nine-plus installation methods you might use to underground a given lateral depending on whether it’s urban front-lot, urban back-lot, or rural.
What I Built
The lateral-segmentation algorithm
Walks the device hierarchy from the station cross-reference tables through to every protective device, identifying every discrete single-phase overhead lateral segment in the system. Aggregates each lateral’s customer count, CAD-ticket count, pole age, length-in-feet, and CMI (Customer Minutes Interrupted) — decomposed into Normal, Vegetation, and Storm cause categories, with separate accounting for CMI attributable to the protective device versus the transformers downstream of it (apportioned by customer share).
The segmentation work is the part that made the rest of the system possible. The segments live implicitly in the device hierarchy; walking that hierarchy to materialize them was the unglamorous prerequisite to everything downstream.
The composite scoring engine
Implemented as SQL views (V_OH_SP_LATERALS → V_LATERAL_CANDIDATES). Computes four percentile-rank scores per candidate:
- ROI 30-Year Score — joins a precomputed cost-benefit analysis (30-year avoided cost) and ranks by ROI percentage.
- Vegetation Score — percentile rank of vegetation-cause CMI.
- Reliability Score — percentile rank of total CMI.
- Infrastructure Score — percentile rank of average pole age.
Averages the four into a composite that ranks every lateral grid-wide.
The cost-modeling layer
Nine-plus installation methods, each priced as a separate SQL aggregation. Different methods apply to different deployment contexts:
- Bored / Trenched — Rural, front-lot
- Radial Bored / Radial Trenched — Urban, front-lot and back-lot
- Looped Bored / Looped Trenched / Looped Trenched Direct Bury — Urban, front-lot
- OH Rebuild — Rural and Urban (the “don’t underground, rebuild overhead” baseline)
Plus the avoided O&M streams on overhead infrastructure: routine maintenance, vegetation cycles, vegetation non-cycles, normal-cause CMI cost, storm-cause CMI cost.
The visualization layer
Google Maps JavaScript API frontend with custom icons for each device class (fuse, switch, recloser, capacitor bank, fault indicator, transformer, lightning arrestor, voltage regulator, primary meter, electronic and hydraulic reclosers, bus bar, trip saver, step transformer). Overhead vs. underground line segments rendered with different styles.
Per-candidate insights dashboards rendered with HighCharts, in four tabs:
- Customer Metrics — customers per region, urban vs. rural, by city, by district.
- Cost Analysis — urban vs. rural cost comparison, yearly O&M, 30-year avoided costs.
- Reliability Metrics — CMI urban vs. rural, by region, by city, by district; CAD tickets at each level.
- Infrastructure Insights — average pole age per region; pole-age distribution.
Data sources unified
- SAP HANA — sourced cause-categorized CMI per device per outage (via reliability views; HANA’s broader content is migrating to Snowflake).
- Oracle — a parallel view supporting cross-DB validation.
- GIS device data — device coordinates, types, and support-structure installation dates.
- CAD ticket data — truck-roll counts per device.
- Underground / overhead unit cost data — the cost basis for the 30-year cost-benefit analysis.
- Urban-area boundary data — for classifying laterals as urban or rural.
Results
A whole-grid ranking of every overhead single-phase lateral as an undergrounding candidate, designed to replace feeder-by-feeder anecdotal decision-making with data.
The platform enabled OG&E management to make “data-driven capital investment decisions rather than anecdotal ones.”
Subsequently, OG&E management redirected capital priorities — the undergrounding investment program the platform was built to inform was paused in favor of other initiatives (the AI integration work being a major one). The platform itself stands as a complete deliverable: the lateral-segmentation algorithm, the composite scoring engine, the nine-plus installation-method cost models, and the Google Maps visualization layer were all built, validated, and handed over. The downstream capital program it was designed to drive was reprioritized before a meaningful dollar value of undergrounding spend flowed through it.
Stack notes
The interesting part of building this wasn’t the algorithm — it was the data model that made the algorithm tractable. SQL views composed in three layers (raw lateral aggregation → cost-and-CMI joins → percentile-rank composite) is a pattern I’d reuse anywhere I needed to do grid-wide ranking analytics. Each layer is independently inspectable; the composite at the top is fast because everything underneath it has already been aggregated.
The second non-obvious choice was percentile rank rather than absolute weighting for the four scores. Absolute weighting requires you to decide ahead of time that “$1M of vegetation CMI = X% reliability improvement = Y pole-age years.” Percentile rank sidesteps the conversion problem entirely — it asks, relative to the rest of the grid, where does this lateral fall? — and produces a composite that’s stable across re-runs as the underlying data evolves.