Agents

Tool use is not autonomy

Agents chain APIs and browsers into workflows—still driven by human objectives, sandbox rules, and reward hacks waiting to happen.

What “autonomy” means in marketing

Vendors imply independent judgment; engineering reality is closer to scripted exploration with retries. Goals, permissions, and stopping conditions come from people. True reliability requires stress tests—see red-teaming—across perturbed environments (URL changes, auth expiry, time zones).

Grounding through tools

Tools can ground models in live data—bridging to grounding—but each interface is a new injection surface: malicious web pages, poisoned repositories. Combine with RAG when corpora are static enough to index safely.

Latency stacks per hop

Each call adds milliseconds to seconds—impacting latency UX and user trust. Parallelize where safe; serialize where dependencies demand.

Open vs API ecosystems

Self-hosted agents ( open weights vs API) change who can experiment—and who inherits misuse risk.

State machines vs end-to-end policies

Many production agents combine explicit state machines for allowed transitions with LLM policies for language within each state— reducing runaway tool use while preserving flexibility. Document states and invariants for auditors.

Economic incentives and task routing

If agents are paid per successful task, designers may shorten reasoning or skip safety checks—align incentives with adversarial testing and human oversight thresholds.