Blog

Writing about AI safety, alignment, and security.

Mechanistic Watchdog

Real-time cognitive interdiction for LLMs. A safety layer that monitors internal activations and halts generation before harmful content is produced.

AI Safety, Interpretability