Environment

Energy is part of the UX

Joules per token and per training run shape who can participate—and what “real-time” costs the planet.

Training vs inference

Large one-off training runs dominate headlines, but cumulative inference at scale can rival training over a product's lifetime, especially for viral assistants with long contexts. Measure both.
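A back-of-envelope sketch of the crossover: all numbers below are illustrative assumptions (a hypothetical training budget, per-query energy, and traffic level), not measurements from any real deployment.

```python
# When does cumulative inference energy catch up with a one-off training run?
# Every constant here is an assumed, illustrative figure.

TRAIN_ENERGY_MWH = 1_300        # assumed one-off training run energy
ENERGY_PER_QUERY_WH = 0.3       # assumed energy per served query
QUERIES_PER_DAY = 10_000_000    # assumed traffic for a popular assistant

def days_until_inference_matches_training(train_mwh, wh_per_query, qpd):
    """Days of serving after which cumulative inference energy equals training."""
    daily_inference_mwh = wh_per_query * qpd / 1e6  # Wh -> MWh
    return train_mwh / daily_inference_mwh

days = days_until_inference_matches_training(
    TRAIN_ENERGY_MWH, ENERGY_PER_QUERY_WH, QUERIES_PER_DAY)
print(f"Inference matches training energy after ~{days:.0f} days")
```

With these assumed numbers the crossover lands within roughly a year and a half of serving; the point is the shape of the calculation, not the specific figures.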

Efficiency is access

Quantization, distillation, caching, and better batching change which teams and regions can self-host, which overlaps with the open weights debates.
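The access point is concrete: weight memory scales linearly with bits per parameter, which decides whether a model fits on commodity hardware at all. A minimal sketch, with an assumed parameter count and ignoring activations and KV cache:

```python
# Rough weight-memory footprint at different precisions.
# The 70B parameter count is an illustrative assumption.

def weight_memory_gb(n_params, bits_per_param):
    """Weight memory in GB (1 GB = 1e9 bytes); excludes activations/KV cache."""
    return n_params * bits_per_param / 8 / 1e9

N = 70e9  # assumed model size
for name, bits in [("fp16", 16), ("int8", 8), ("int4", 4)]:
    print(f"{name}: {weight_memory_gb(N, bits):.0f} GB")
```

Going from fp16 to int4 cuts the footprint 4x, which is often the difference between needing a multi-GPU server and fitting on a single accessible card.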

Latency ties to power

Faster wall-clock time often means more aggressive hardware utilization or smaller models; see latency UX.

Scaling implications

Chasing marginal loss reductions via brute-force compute raises ethical questions alongside technical ones; see the scaling essay.

Carbon accounting methodologies

Location-based vs market-based grid emission factors change reported CO2e. Disclose your methodology when publishing "green AI" claims, and avoid marketing numbers that shift with accounting rules alone.
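The divergence is pure arithmetic: the same electricity use yields two CO2e figures depending only on which grid factor you multiply by. Both factors below are illustrative assumptions, not real grid data:

```python
# Identical energy use, two reported CO2e totals.
# Emission factors are illustrative assumptions (tCO2e per MWh).

ENERGY_MWH = 500.0        # assumed annual compute energy
LOCATION_FACTOR = 0.40    # assumed local grid average
MARKET_FACTOR = 0.05      # assumed factor after contracted renewables

def co2e_tonnes(energy_mwh, factor):
    """Reported emissions: energy times the chosen grid factor."""
    return energy_mwh * factor

loc = co2e_tonnes(ENERGY_MWH, LOCATION_FACTOR)
mkt = co2e_tonnes(ENERGY_MWH, MARKET_FACTOR)
print(f"location-based: {loc:.0f} t, market-based: {mkt:.0f} t")
```

An 8x gap from accounting choice alone, with no change in actual consumption, is exactly why the methodology must be disclosed alongside the number.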

Inference caching and reuse

Prompt caching, KV-cache reuse across sessions (where policy allows), and shared prefixes in batching cut repeated work. For repetitive enterprise queries this is often cheaper than reaching for a bigger model; see context economics.
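A minimal sketch of the prefix-reuse idea: identical prompt prefixes are keyed by hash so the expensive encoding step runs once. The class and function names are hypothetical, and real serving stacks cache KV tensors rather than strings, but the hit/miss accounting is the same.

```python
# Toy prefix cache: repeated prefixes skip recomputation.
# Illustrative sketch; production systems cache KV tensors, not strings.
import hashlib

class PrefixCache:
    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, prefix: str, compute):
        """Return the cached result for this prefix, computing it on first use."""
        key = hashlib.sha256(prefix.encode()).hexdigest()
        if key in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[key] = compute(prefix)
        return self._store[key]

cache = PrefixCache()
system_prompt = "You are a helpful enterprise assistant."
for _ in range(3):
    cache.get_or_compute(system_prompt, lambda p: f"encoded({len(p)} chars)")
print(cache.hits, cache.misses)  # prints: 2 1
```

Three identical requests trigger one computation and two cache hits; at enterprise scale, that saved recomputation is energy and money that a larger model would only multiply.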