Leading AI Integration at OG&E

AWS Bedrock + Anthropic Claude models — building the first production AI capability inside Grid Operations.

Role Lead engineer, AI Model Integration
Period Mar 2026 – ongoing
Headline 4 LLM capabilities · reliability · cost · safety · compliance
Stack AWS Bedrock · Anthropic Claude models · Model Context Protocol (MCP) · API gateway · Role-aware context manager · Per-query audit logging · Nighthawk platform

Problem

OG&E Grid Operations runs on operational data spread across multiple systems — ADMS (the integrated distribution-management platform, with OMS for outage management, SOM for switching-order management, and FLISR as core modules), CAD (field ticketing, a separate work-order system), and Resource Availability (the in-house workforce-scheduling app). Each system — and within ADMS, each module — has its own query surface, schema, and access-control model. Day-to-day, three concrete friction points in Grid Operations keep showing up:

  1. Manual cause-code QA. Distribution outages get cause codes assigned in CAD by field crews. Reliability reporting consumes those codes for SAIDI / SAIFI computation. Validating that the codes match what the CAD notes describe is manual review work that pulls reliability staff off their primary analysis — and bad codes flow downstream into capital-investment decisions.
  2. Switching documentation drift. Operators write switching orders in SOM. Errors and procedural inconsistencies get caught downstream, sometimes during execution, sometimes in post-event audits, when they should be caught at the moment of writing.
  3. Resource Availability adoption ceiling. The Resource Availability system has a learning curve that acts as a soft barrier for non-technical departments: they would benefit from the system but won't adopt it unless making updates is easy.

Plus a fourth friction point that isn’t unique to Grid Operations — it’s organization-wide:

  4. BI access is gated by technical skill. Anyone in any department who needs a custom analytics view today either writes SQL themselves or files a request with a business analyst. The result: high-value questions wait days for answers; lower-value questions never get asked.

A general-purpose LLM is the right tool for all four — but only if it can operate against the real operational data, under the requesting user’s actual permissions, with full auditability.

Approach

Build the platform layer that lets an LLM operate safely inside the utility, then build the applications on top of it.

Foundation: AWS Bedrock with Anthropic Claude models (API). Bedrock provides the model-hosting and security envelope; Claude provides the reasoning quality required for operational-data interpretation.

The platform layer has three parts:

  • API gateway — single surface for AI-mediated queries against MySQL (Nighthawk), Oracle (OMS / CAD), SAP HANA → Snowflake (meter data, migration in flight), and ClickHouse (SCADA / FLISR / alarms — sourced from Cassandra as the system of record, with the Cassandra → ClickHouse pipeline in flight).
  • Role-aware context manager — injects the requesting user’s role, data scope, and authorized permissions into every AI call. The model doesn’t see anything the user isn’t allowed to see.
  • Per-query audit log — captures prompt, model, scope, response, and outcome for compliance review. Every AI-mediated action is reviewable after the fact.
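In spirit, one call through the platform combines all three parts. A minimal sketch, assuming boto3 and the Bedrock Converse API; the scope record, audit-record shape, and logging sink are illustrative stand-ins, not the production code:

```python
# Sketch: role-aware context injection plus per-query audit logging
# around a Bedrock Converse call. Names here are illustrative.
import json
import time
import uuid
from dataclasses import dataclass

import boto3

bedrock = boto3.client("bedrock-runtime")

@dataclass
class UserScope:                      # hypothetical role/permission record
    user_id: str
    role: str
    data_scopes: list[str]            # e.g. ["oms.read", "cad.read"]

def ask(scope: UserScope, question: str, model_id: str) -> str:
    # 1. Role-aware context: the model only ever sees what the user may see.
    system = (
        f"You are answering for role '{scope.role}'. "
        f"You may only reference data in scopes: {', '.join(scope.data_scopes)}."
    )
    response = bedrock.converse(
        modelId=model_id,
        system=[{"text": system}],
        messages=[{"role": "user", "content": [{"text": question}]}],
    )
    answer = response["output"]["message"]["content"][0]["text"]

    # 2. Per-query audit record: prompt, model, scope, response.
    audit_record = {
        "id": str(uuid.uuid4()),
        "ts": time.time(),
        "user": scope.user_id,
        "scopes": scope.data_scopes,
        "model": model_id,
        "prompt": question,
        "response": answer,
    }
    print(json.dumps(audit_record))   # stand-in for the real audit sink
    return answer
```

Every capability below is a thin client of this one wrapper, which is what keeps the audit and permission story uniform across features.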

This capability set runs as a sub-initiative under OG&E's AI Team.

What I’m Building

Four production-bound capabilities — three target the Grid Operations friction points; the fourth is a horizontal capability for the whole organization.

(a) Cause Code QA

The AI reads CAD notes for completed outages, cross-references them against the assigned cause code, and flags mismatches before reliability reporting consumes the data.

Co-developed with a colleague on the reliability-reporting side. Sits inside the existing ADMS QA workflow so review queues aren’t bypassed; the AI annotates, humans approve.
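Mechanically, the check is a structured comparison. A hedged sketch; the prompt wording, code list, and JSON contract here are invented for illustration, not the production prompt:

```python
# Sketch: compare CAD free-text notes against the assigned cause code
# and flag mismatches for human review. Code list is an example subset.
import json

import boto3

bedrock = boto3.client("bedrock-runtime")
CAUSE_CODES = ["TREE", "ANIMAL", "EQUIPMENT_FAILURE", "LIGHTNING", "UNKNOWN"]

def qa_cause_code(cad_notes: str, assigned_code: str, model_id: str) -> dict:
    prompt = (
        f"CAD notes:\n{cad_notes}\n\n"
        f"Assigned cause code: {assigned_code}\n"
        f"Valid codes: {', '.join(CAUSE_CODES)}\n\n"
        "Does the assigned code match the notes? Reply with JSON only: "
        '{"match": true|false, "suggested_code": "...", "rationale": "..."}'
    )
    resp = bedrock.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    flag = json.loads(resp["output"]["message"]["content"][0]["text"])
    # The AI only annotates; a mismatch lands in the existing ADMS QA
    # review queue for a human reliability analyst to approve or reject.
    return flag
```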

(b) Switching Dashboard AI Assist

The AI runs alongside operators as they write switching orders in SOM, validating entries against switching procedures and OG&E-specific writing standards. It surfaces inconsistencies at write time rather than downstream, targeting fewer late-caught errors and more consistent switching documentation across operators and shifts.
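The validation shape, roughly: cheap deterministic checks run on every save, and only genuinely ambiguous procedure or style questions escalate to the model. The field names and rules below are assumptions for the sketch:

```python
# Sketch: deterministic write-time checks on a switching order's steps.
# Anything these rules can't decide is escalated to the model.
from dataclasses import dataclass

@dataclass
class StepIssue:
    step_no: int
    severity: str   # "error" | "style"
    message: str

def validate_steps(steps: list[dict]) -> list[StepIssue]:
    issues = []
    for i, step in enumerate(steps, start=1):
        if step.get("number") != i:
            issues.append(StepIssue(i, "error", f"steps out of sequence at {i}"))
        if not step.get("device_id"):
            issues.append(StepIssue(i, "error", "step missing a device identifier"))
        if step.get("action", "").islower():
            issues.append(StepIssue(i, "style", "action verb should be capitalized per writing standard"))
    return issues
```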

(c) Resource Availability — Natural-Language Interface

Non-technical users describe what they need (a new on-call assignment, a leave request, a coverage update) in plain language. The AI asks the right clarifying follow-up questions, confirms its interpretation, and writes the changes through the existing Resource Availability API — with full audit trail.

Removes the training barrier currently limiting adoption by adjacent departments.
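The write path, sketched under assumptions: the RA endpoint, payload shape, and intent schema below are hypothetical, but the structure matches what the prose describes, extract a structured intent, ask for anything missing, then write through the existing API:

```python
# Sketch: NL request -> structured intent -> clarify or write through the
# existing Resource Availability API (endpoint and payload are placeholders).
import json

import boto3
import requests

bedrock = boto3.client("bedrock-runtime")
RA_API = "https://ra.example.internal/api/v1"   # placeholder URL

def handle_request(user_text: str, auth_token: str, model_id: str) -> dict:
    prompt = (
        f"User request: {user_text}\n"
        'Extract JSON only: {"action": "on_call|leave|coverage", '
        '"employee": "...", "start": "YYYY-MM-DD", "end": "YYYY-MM-DD", '
        '"missing_fields": [...]}'
    )
    resp = bedrock.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    intent = json.loads(resp["output"]["message"]["content"][0]["text"])
    if intent["missing_fields"]:
        # Ask a clarifying follow-up instead of guessing.
        return {"status": "clarify", "fields": intent["missing_fields"]}
    # The write goes through the RA API, which applies its own
    # validation and audit; the AI never touches tables directly.
    r = requests.post(
        f"{RA_API}/assignments",
        json=intent,
        headers={"Authorization": f"Bearer {auth_token}"},
        timeout=10,
    )
    r.raise_for_status()
    return {"status": "written", "assignment": r.json()}
```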

(d) Nighthawk AI BI — Natural-Language-to-Dashboard Generation

Anyone describes the analytics they need (“show me the top 5 circuits by storm-related CMI for Q1, by district, with the year-over-year change”) in plain language. The AI parses intent, plans the appropriate queries against the right data sources via the API gateway, runs them under the requesting user’s RBAC scope, and generates a complete dashboard — visualizations, tables, and headline numbers — on demand.

The Nighthawk BI Hub already exists as a prototype with a full CRUD API, security validation layer, and preview-rendering layer for five content types (dashboards, reports, KPIs, analyses, custom). This capability adds the natural-language intent-parsing and dashboard-generation layer on top — the AI translates plain-English requests into the same dashboard schema the preview already renders.
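A toy version of that intent-parsing layer follows; the schema keys shown are a guess at the shape, not the real Nighthawk contract:

```python
# Sketch: the model emits a dashboard spec in the same schema the existing
# preview renders; the spec is validated before it reaches the renderer.
import json

import boto3

bedrock = boto3.client("bedrock-runtime")
REQUIRED_KEYS = {"title", "widgets"}   # assumed minimal schema

def nl_to_dashboard(question: str, user_scopes: list[str], model_id: str) -> dict:
    prompt = (
        f"Analytics request: {question}\n"
        f"Data scopes available to this user: {user_scopes}\n"
        'Emit JSON only: {"title": "...", "widgets": [{"type": "chart|table|kpi", '
        '"query": "SELECT ...", "viz": {...}}]}'
    )
    resp = bedrock.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    spec = json.loads(resp["output"]["message"]["content"][0]["text"])
    if not REQUIRED_KEYS <= spec.keys():
        raise ValueError("model output does not match the dashboard schema")
    # Each widget query still executes through the gateway under the
    # user's RBAC scope; the spec itself grants no data access.
    return spec
```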

Positioned as a horizontal capability, not a Grid Operations workflow. Where (a), (b), and (c) target specific grid-operations friction points, AI BI is engineered to deliver value across the entire organization — Finance, Planning, Reliability, Customer Service, executive leadership, anyone who needs analytics and either doesn’t write SQL or doesn’t have a business analyst on call.

Three observations make it the most transferable capability of the four:

  • BI access is universally gated by technical skill. Every operations-heavy organization has the same bottleneck: getting a custom analytics view requires either SQL knowledge or analyst time to scope, build, and share. Collapsing that gap is a horizontal value proposition — not a utility one.
  • Institutional BA knowledge becomes a permanent organizational asset. The system prompt and few-shot examples encode the business-analyst expertise that today lives in individual heads — query patterns, domain terminology, common analyses, what “good” looks like for each kind of analytical question. The organization keeps that capability whether or not any particular analyst is available.
  • Existing access controls are preserved. All queries execute under the requesting user’s RBAC scope. The AI never bypasses or escalates access. Users only ever see what they are already authorized to see.

Decision support that today takes days of analyst time (ticket → scope → build → share) becomes a question answered in minutes. The pattern transfers cleanly to any organization that runs on data and has business analysts as the bottleneck.

Funding the work — the capital-justification artifact

A separate but inseparable workstream from building the AI itself: producing the artifact that lets each of the four capabilities above get funded through the utility’s capital-approval process.

The capital-justification artifact is what most AI initiatives in regulated industries quietly stall on. Engineers can build AI but typically can’t speak capital-accounting. Operations and capital-planning teams know how to quantify O&M but typically can’t credibly estimate the engineering effort of an AI integration. The bridging role — engineer-as-capital-author — is rare, and it determines whether AI projects make it into the funded queue or live indefinitely as “interesting ideas.”

What I author for each capital-eligible AI work order:

  • A standardized 9-phase development model applied uniformly across projects so management can compare and re-run the cost model with current burdened rates: Discovery & Design · Prompt Engineering · Backend Integration · Frontend Integration · Guard Rails & Validation · SME Testing (multiple rounds) · UAT · Documentation · Deployment & Stabilization.
  • A role-decomposed labor-hour matrix — engineer hours plus SME and tester hours per phase — so the dollar figure is auditable, not back-of-envelope.
  • AI-acceleration productivity multipliers baked into the engineer hours and disclosed explicitly (productivity gains are larger on code-heavy phases, smaller on SME-gated phases where availability — not engineering speed — is the bottleneck). This makes the estimate credible to reviewers who would otherwise discount it.
  • A scope-evidence layer linking each hour estimate to specific source files and existing integration points in the target applications. The estimate is traceable, not speculative.
  • Capital-classification rationale per the utility’s standard criteria — new asset creation, multi-year benefit, substantial investment, measurable improvement — argued explicitly for each work order.
  • Risk clauses with hour deltas (“if prompt iteration exceeds N rounds, add X engineer-hours plus Y SME-hours”). Translates uncertainty into a defensible buffer instead of a single point estimate.
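The math behind the first three bullets is deliberately simple so management can re-run it with current rates. A toy version, with every number invented for illustration:

```python
# Sketch of the cost model: role-decomposed hours per phase, burdened
# rates, and an AI-acceleration multiplier on engineer hours only.
RATES = {"engineer": 165.0, "sme": 140.0, "tester": 110.0}   # $/hr, burdened

# hours[phase][role]; two of the nine phases shown
HOURS = {
    "Backend Integration": {"engineer": 80, "sme": 8},
    "SME Testing":         {"engineer": 16, "sme": 40, "tester": 24},
}
# Code-heavy phases get a stronger acceleration factor; SME-gated phases
# barely move, since availability, not engineering speed, is the bottleneck.
ACCEL = {"Backend Integration": 0.60, "SME Testing": 0.90}

def work_order_cost() -> float:
    total = 0.0
    for phase, roles in HOURS.items():
        for role, hrs in roles.items():
            if role == "engineer":
                hrs *= ACCEL[phase]          # disclosed multiplier
            total += hrs * RATES[role]
    return total

print(f"${work_order_cost():,.2f}")
```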

Capital-justification authoring isn’t an end in itself. It’s the workstream that ensures the engineering work above actually gets funded, scoped, and shipped — instead of becoming another presentation that never converts to a budget line.

Results (in flight)

  • All four capabilities are under active development.
  • Architecture committed in the platform roadmap I authored.
  • Impact thesis, four pillars: reliability (cleaner cause-code data feeding reliability reports, plus fewer switching-driven incidents), cost (less manual QA, broader RA adoption, faster self-service analytics), safety (write-time procedure validation in switching prevents crew and equipment incidents), and regulatory compliance (more accurate cause-code data feeding PSC reliability reports, plus procedural-adherence enforcement in switching documentation). The framing is universal; the utility-specific names (SAIDI for reliability, O&M for cost) are the levers each pillar pulls inside this industry.
  • This will be the first production AI capability deployed inside Grid Operations, and it’s the foundation pattern that future AI projects at OG&E will reuse.

Stack notes

The most consequential design choice was single API gateway, single context manager, single audit log — not one integration per AI feature. Each new capability is a thin client of the same platform. Adding the fifth, sixth, seventh AI feature should be a roadmap question, not an architecture question.

The second was read-only-by-default with explicit write paths. The AI BI generates queries the database executes; it doesn’t issue arbitrary statements. The Resource Availability natural-language interface writes through the existing RA API, which has its own validation and audit. The AI never has direct write access to operational tables.
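A guard of that kind can be as simple as an allowlist check in front of the gateway. An illustrative sketch only; real enforcement also relies on read-only database credentials, not string inspection:

```python
# Sketch: only single SELECT/WITH statements pass the AI BI query path;
# anything else is rejected before it reaches a database.
import re

BANNED = re.compile(
    r"\b(insert|update|delete|drop|alter|create|grant|truncate|merge|call)\b",
    re.IGNORECASE,
)

def is_read_only(sql: str) -> bool:
    stripped = sql.strip().rstrip(";")
    if ";" in stripped:                     # no statement stacking
        return False
    if not re.match(r"(?is)^\s*(select|with)\b", stripped):
        return False
    return not BANNED.search(stripped)

assert is_read_only("SELECT circuit, SUM(cmi) FROM outages GROUP BY circuit")
assert not is_read_only("DROP TABLE outages")
```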

Roadmap follow-on (Phase 2 of the Nighthawk roadmap) layers ML-based predictive insights — Accurate ETR (Estimated Time of Restoration) and ADMS model improvements. Phase 3 (2028–2030) introduces agentic workflows: a Schedule Agent, Storm Preparation Agent, Compliance Agent, Dispatch Recommendation Agent, and Auto-Reporting Agent, all under a four-tier human-in-the-loop governance framework I authored as part of the roadmap.

Some specifics are abstracted for confidentiality. Happy to go deeper on the technical approach, the failure modes we worked through, or the operational outcomes — jose@macias-tech.com.