This is my blog.
I use it to talk about random things that interest me. These include:
AI safety,
Rationality,
Mechanistic Interpretability,
Sci-fi movies,
My art projects,
The Hitchhiker's Guide To The Galaxy,
Maths,
Climbing,
Anki, or Why Spaced Repetition Is The Best Thing Ever,
and whatever else I happen to feel like writing about at the time.
Last year I finished my Master's in Mathematics at the University of Cambridge.
I am currently conducting mechanistic interpretability research with Arthur Conmy (mentored by Neel Nanda), and working on the third iteration of the AI Alignment Research Engineer Accelerator programme (ARENA).
I'm working on an open-source viewer for investigating the learned features of sparse autoencoders, based on Anthropic's recent work. It provides similar functionality to Anthropic's viewer, with the added ability to take user-defined prompts and highlight the most important features for them (ranked by one of a few different metrics).
Click here to view a version based on Neel Nanda's GELU-1l model.