All's fair in love and love: copy suppression in GPT2-Small
This page was created to accompany the Streamlit app, which was originally designed to help explore different prompts for GPT2-Small as part of Callum McDougall, Arthur Conmy & Cody Rushing's work on self-repair in LLMs. We focus on the behaviour of negative heads, specifically copy suppression in attention head 10.7 of GPT2-Small.
The primary goal of this page is to make our work more accessible to others. The Streamlit app, by contrast, mainly functioned as a sandbox environment that helped us spot interesting things about the behaviour of negative heads which we might otherwise have missed.
See all pages in this project by clicking on the titles below: