PERFECTLY NORMAL

CALLUM MCDOUGALL

This image was created using a variant of my thread art algorithm.

This chatbot was designed to produce code from Neel Nanda's TransformerLens library. It works by matching an embedding of your query with a set of embeddings of chunked Python files to find the most relevant bits of code, then packaging that into a prompt which is fed into OpenAI's API.
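
For illustration, here's a minimal sketch of that retrieve-then-prompt loop. This is not the app's actual code: the embedding and chat model names are placeholders, and the prompt template is made up; it just shows the shape of the pipeline, assuming the openai Python package (v1-style client) and precomputed chunk embeddings.

```python
import numpy as np
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def embed(text: str) -> np.ndarray:
    # Placeholder embedding model name, for illustration only
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(resp.data[0].embedding)

def answer(query: str, chunks: list[str], chunk_embs: np.ndarray, k: int = 5) -> str:
    # Cosine similarity between the query embedding and each chunk embedding
    q = embed(query)
    sims = chunk_embs @ q / (np.linalg.norm(chunk_embs, axis=1) * np.linalg.norm(q))
    top_chunks = [chunks[i] for i in np.argsort(-sims)[:k]]
    # Package the most relevant chunks into a prompt and send it to the API
    prompt = (
        "Use the following code as context:\n\n"
        + "\n\n".join(top_chunks)
        + f"\n\nQuestion: {query}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```

In a setup like this, the chunk embeddings would be computed once and stored, so only the query needs embedding at request time.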

Most of the code was written by me as part of the ARENA curriculum (some of it is based on pre-existing material).

You can use the checkboxes below to decide which files to include in context. You can also set the float value on each line to make content from that file more or less likely to be added to the context.

▶ More details on how the context works

The context window is constructed by splitting each file into chunks, computing the cosine distance between the embedding of each chunk and the embedding of your question, and adding the chunks with the smallest distances to the context window.

The values in these boxes will be directly subtracted from the cosine distances.

For example, if you set one of the values to +2.0, then content from that file will always be chosen over content from the other files (because the full range of possible cosine distance values is only 2.0).
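
Concretely, the selection step might look something like the following sketch (my reading of the description above; all names here are illustrative, not the app's actual code):

```python
import numpy as np

def select_chunks(query_emb, chunk_embs, chunk_files, file_bias, k=5):
    """Pick the k chunks with the smallest bias-adjusted cosine distance.

    file_bias maps each filename to the float set in its box; the bias is
    subtracted from the distance, so a positive value favours that file.
    """
    sims = chunk_embs @ query_emb / (
        np.linalg.norm(chunk_embs, axis=1) * np.linalg.norm(query_emb)
    )
    distances = 1 - sims  # cosine distance, always in [0, 2]
    adjusted = distances - np.array([file_bias[f] for f in chunk_files])
    return np.argsort(adjusted)[:k]  # indices of the chosen chunks
```

A bias of +2.0 pushes that file's adjusted distances into [-2, 0], below the [0, 2] range available to any other file, which is why its chunks always win.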

Some example prompts you can try, along with the recommended context (a sketch of the kind of code the first prompt should produce is shown after the list):

Prompt: Write code for a hooked forward pass in a TransformerLens model, where the output of attention head L1H3 is mean-ablated with values from cache, an object of type ActivationCache.
Context to include: [1.2] - Intro to MI

Prompt: Write code to visualise a TransformerLens model's full QK matrix, for each attention head in layers 9 & 10. You can use the plotly helper function imshow, which should be in context.
Context to include: [1.2] - Intro to MI, July & August MI Challenges

Prompt: Write code (from first principles) to perform path patching, from the output of an attention head in layer 5 to the query input of an attention head in layer 10. You should use clean_tokens and clean_cache for your clean values, and corrupted_tokens and corrupted_cache for your corrupted values (the caches are ActivationCache objects). All outputs of intermediate attention heads should be frozen to their values on clean_cache, so we only measure the direct path between the two heads.
Context to include: [1.3] - IOI
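
To give a sense of the target output, here's a minimal sketch of the kind of code the first prompt is asking for. It assumes a gpt2-small model and a cache from an earlier run_with_cache call; mean-ablating over both batch and position is my assumption, since the prompt doesn't pin down the averaging dimensions.

```python
import transformer_lens.utils as utils
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2-small")
tokens = model.to_tokens("The cat sat on the mat")
_, cache = model.run_with_cache(tokens)  # cache is an ActivationCache

hook_name = utils.get_act_name("z", 1)  # per-head attention output in layer 1

def mean_ablate_l1h3(z, hook):
    # z has shape [batch, seq, n_heads, d_head]; replace head 3's output
    # with its mean (over batch and position) taken from the cache
    z[:, :, 3, :] = cache[hook.name][:, :, 3, :].mean(dim=(0, 1))
    return z

ablated_logits = model.run_with_hooks(tokens, fwd_hooks=[(hook_name, mean_ablate_l1h3)])
```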

Note - this model is very experimental, and tends to hallucinate! I usually use it to produce an approximation of the code I want, and then edit the thing it gives me.

I'm planning a version of this model which also allows you to attach your own code files. I also intend to expand this into more of a fully fledged research assistant in the future, by attaching other resources like papers and glossaries.


Select Python files for context: