This is great work! Mechanistic interpretability has tons of use cases, it's great to see open research in that field.
You mentioned you spent your own time and money on this; would you be willing to share how much? It would help others who might be considering independent research.
Thank you, I too am a big believer in and enjoyer of open research. Reading the actual code conveys things with a clarity that complex research papers were never able to give me.
Regarding the cost: I would sum it up to roughly ~2.5k USD for just the actual execution. Development would probably have doubled that sum if I didn't already have a GPU workstation at home for experiments, which I take for granted. That cost breaks down as follows:
* ~400 USD for ~2 months of storage and traffic for 7.4 TB (3.2 TB of raw and 3.2 TB of preprocessed training data) in a GCP standard bucket
* ~100 USD for Anthropic Claude API requests: experimenting to find the right system prompt, test runs, and the actual final execution
* The other ~2k USD went toward renting 8x Nvidia RTX 4090s together with a 5 TB SSD from runpod.io for various stages of the experiments. For the actual SAE training I rented the node for 8 days straight, and I would attribute an additional ~3-4 days of runtime to experiments for determining the best hyperparameters for training.
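For anyone budgeting a similar project, the figures above can be tallied as a quick sanity check (all numbers are the approximate ones from my breakdown, not exact invoices):

```python
# Approximate execution costs as listed above, in USD.
costs_usd = {
    "GCP storage + traffic (~7.4 TB, ~2 months)": 400,
    "Anthropic Claude API (prompting, tests, final run)": 100,
    "runpod.io rental (8x RTX 4090 + 5 TB SSD)": 2000,
}

total = sum(costs_usd.values())
print(f"Total execution cost: ~{total} USD")  # ~2500 USD
```

Double the total again if you also need to rent hardware for development instead of using a workstation you already own.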