Privacy & memory

Fine-tuning on conversations is a retention decision

Turning chats into training signal changes the privacy calculus—even when identifiers are stripped.

From ephemeral to durable

Conversations that once disappeared from servers become gradients. Rare phrases and side-channel context can re-identify individuals or organizations when combined with other leaks. Minimization asks: does this data need to persist for the feature to work? Short retention, opt-out training, and on-device adaptation shrink the blast radius; these choices also tie into data lineage.
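The minimization checks above can be sketched as a filter over training candidates. This is a minimal illustration, not a production pipeline; the record fields (`opted_out`, `created_at`) and the 30-day window are assumptions for the example.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=30)  # illustrative retention window

def training_candidates(sessions, now=None):
    """Yield only sessions that are recent enough and not opted out."""
    now = now or datetime.now(timezone.utc)
    for s in sessions:
        if s["opted_out"]:
            continue  # user excluded this chat from training
        if now - s["created_at"] > RETENTION:
            continue  # past retention window: should already be deleted
        yield s

sessions = [
    {"id": "a", "opted_out": False,
     "created_at": datetime.now(timezone.utc) - timedelta(days=5)},
    {"id": "b", "opted_out": True,
     "created_at": datetime.now(timezone.utc) - timedelta(days=5)},
    {"id": "c", "opted_out": False,
     "created_at": datetime.now(timezone.utc) - timedelta(days=90)},
]
print([s["id"] for s in training_candidates(sessions)])  # ['a']
```

Running the filter before any fine-tuning job makes the retention decision explicit in code rather than implicit in storage defaults.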

Interaction with alignment

Preference learning from production logs (RLHF-style loops) can improve policies, but it ingests user content at scale; governance must precede scale.
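One way to make "governance precedes scale" concrete is a gate that every log record passes before entering a preference-training pool: check consent, then scrub obvious identifiers. The field names, the consent flag, and the email-only scrubbing are illustrative assumptions; real pipelines need broader PII detection.

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def admit_for_preference_training(record):
    """Gate a production log record before RLHF-style ingestion.
    Returns a scrubbed copy, or None if the record may not be used."""
    if not record.get("training_consent"):
        return None  # governance check: no consent, no ingestion
    return {k: EMAIL.sub("[EMAIL]", v) if isinstance(v, str) else v
            for k, v in record.items()}

rec = {"prompt": "Email me at ana@example.com", "chosen": "ok",
       "rejected": "no", "training_consent": True}
print(admit_for_preference_training(rec)["prompt"])  # Email me at [EMAIL]
```

The point of the sketch is ordering: the admission decision happens once, at ingestion, rather than being retrofitted after user content is already in the training corpus.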

Synthetic substitutes

Teams sometimes replace raw logs with synthetic summaries (see synthetic data), but synthetic generation inherits the generating model's biases unless it is audited.
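A toy version of the substitute-then-audit idea: sample synthetic labels from the empirical distribution of the raw logs, then compare marginals between the two sets; large gaps flag inherited bias or drift. The topic labels and counts are invented for illustration; real synthesis would involve a generative model, which is exactly why the audit step matters.

```python
import random
from collections import Counter

random.seed(0)

# Assumed topic labels from raw logs; in practice these come from a classifier.
raw_topics = ["billing"] * 60 + ["bugs"] * 30 + ["other"] * 10

def synthesize(topics, n):
    """Sample synthetic topic labels from the empirical distribution."""
    counts = Counter(topics)
    total = sum(counts.values())
    labels = list(counts)
    weights = [counts[label] / total for label in labels]
    return random.choices(labels, weights=weights, k=n)

synthetic = synthesize(raw_topics, 1000)
# Audit: compare marginals of synthetic vs. raw before discarding the logs.
print(Counter(synthetic).most_common())
```

Even this trivial sampler can skew rare categories; an audit that only checks the majority class would miss exactly the long-tail content most likely to be misrepresented.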

Deployment models

The choice between API-hosted assistants and locally run open weights shifts who holds the logs and who can audit them.

Differential privacy and federated learning (where applicable)

DP noise and federated aggregation reduce memorization risk at a utility cost; they are viable for some on-device personalization, less so for frontier-scale foundation training. Match the technique to the threat model.
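The core DP mechanism can be shown on toy gradients: clip each per-example gradient to a fixed L2 norm, sum, add Gaussian noise scaled to the clip norm, and average, in the style of DP-SGD. This is a minimal sketch on plain Python lists; the clip norm, noise multiplier, and gradient values are arbitrary, and no privacy accounting is done here.

```python
import math
import random

random.seed(1)

def dp_average(per_example_grads, clip_norm=1.0, noise_mult=1.0):
    """Clip each per-example gradient to clip_norm (L2), sum,
    add Gaussian noise with sigma = noise_mult * clip_norm, average."""
    dim = len(per_example_grads[0])
    summed = [0.0] * dim
    for g in per_example_grads:
        norm = math.sqrt(sum(x * x for x in g))
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for i, x in enumerate(g):
            summed[i] += x * scale  # clipped contribution
    sigma = noise_mult * clip_norm
    noised = [x + random.gauss(0.0, sigma) for x in summed]
    return [x / len(per_example_grads) for x in noised]

grads = [[3.0, 4.0], [0.1, 0.2], [-1.0, 1.0]]
print(dp_average(grads))
```

Clipping bounds any single conversation's influence on the update, and the noise masks what remains, which is why memorization of rare phrases drops; the same clipping is what costs utility at scale.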

Data retention schedules

Automate deletion after N days unless a ticket references the session, and surface the retention window in user settings. This reduces exposure when breaches occur and supports regulatory narratives under privacy policy commitments.
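The schedule above can be sketched as a purge job that honors legal holds. The 30-day value stands in for "N days", and the session shape and hold mechanism (a set of held IDs) are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 30  # the "N days" from the policy; value is illustrative

def purge(sessions, legal_holds, now=None):
    """Drop sessions older than RETENTION_DAYS unless a ticket
    (legal hold) references them. Returns the surviving sessions."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=RETENTION_DAYS)
    kept = []
    for s in sessions:
        if s["id"] in legal_holds or s["created_at"] >= cutoff:
            kept.append(s)
        # else: hard-delete s["id"] from storage here
    return kept

now = datetime.now(timezone.utc)
sessions = [
    {"id": "old", "created_at": now - timedelta(days=60)},
    {"id": "held", "created_at": now - timedelta(days=60)},
    {"id": "new", "created_at": now - timedelta(days=3)},
]
print([s["id"] for s in purge(sessions, legal_holds={"held"})])  # ['held', 'new']
```

Running such a job on a schedule, and logging what it deleted, gives the auditable trail that regulatory narratives typically require.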