- 04adedf Add some analysis of the adjustments from SA and ffwd.
- 045dfb4 Add the ffwd outputs to the experiment
- ab20384 Factor core logic of cluster_proj_outputs() into a more generic function
- 395f245 Factor core logic of get_all_proj_outputs_for_slen() into more generic function
- 1d556c1 Add Cluster class
- 12629bb Re-ran the experiment and the file sizes changed slightly
- 019b28d Rename ProjMatrixExperiment to BlockInternalsExperiment
- 644a279 Remove index from substrings
- a819aeb Rename cluster_proj_matrix_results() to cluster_proj_outputs()
- 0dba347 Add missing line to LogitsExperiment test
- 7929eeb Add proj matrix experiment and analysis
- 479c55e quick look at proj weight and bias
- f2f5771 Code to analyze tensors in the block and visualize them.
- b322789 Add plot_wei_for_all_heads()
- 47e9991 Make plot_wei take axes
- 15e11a3 Format cell containing attention_head_details() and plot_wei()
- 09af049 Start examining example
- 36d8d52 Fix bug in plot_logit_lens: first row is the input
- 454fc27 Find some interesting strings and begin analyzing them
- 62903d4 Make stride length 96 and re-run logits experiment to get more data
- 27711d4 Replace all the manual calculations in DataBatcher with tensor.unfold()
- 9ddc0e4 Introduce DataBatcher
- caabdb4 Experiment to run a bunch of strings through the model and examine logits
- 015d59c Study of the V matrix
- e5eb62d WIP trying to understand V
- e490b48 More work on positional encodings
- 6e8ee21 Positional encoding investigation
- 6e0e43d Fix bugs in plotting positional encodings and plot them for b0h0
- 5e07e4a Analyze a few more
- d89b6ae Start of analysis of long block results
- 175390f Add .gitignore for s_len256 files
- c13dac8 Run the attention weights experiments on all layers/heads
- 56a31c9 Add code to print the top tokens in the top decile
- da9ea11 Make plot_wei() take any iterable of str for the labels
- 0c3201e Clean up quick experiments on long contexts
- 9dc6238 Reduce copypasta in analyze_attention_weight_results() and print prefixes
- a2548ab Update analysis with new experiment results.
- edf5509 Make attention weights experiment run over all strings in the validation set
- 5982bda Add gitignore
- 3413973 Remove some unused variables from run_attention_weights_experiment()
- ecc5047 Add some explanatory text to the attention weights experiments section and allow passing in the data set
- 5559b4f Analysis code for attention weight experiment results
- 44f7910 Experiment to run a bunch of sample tokens through and get the attention weights
- e7cbf46 Attention head analysis showing how b1h0 copies the first row to the second.
- f28800c more interpretation of outputs.
- c70446a slight improvement to attention_head_details and more analysis of output
- 185f107 Add workthrough of output calculation
- 5a06ccc Fix math errors
- bec0625 Code to display attention heads
- ecfbdff Some exposition on the math behind attention
- 400fdba Detailed analysis and examination of how `:` enters at block 0
- ad48eda Add format_topk_chars
- ba6dfc1 Readability improvement
- d68767e Consider sa residual in addition to ffwd output
- 3422f65 Fix head progression analysis to use ffwd output
- f2cec93 Refactor head_progression to return the whole io_accessor
- 5091a61 Change ortho experiment to use uniform distribution and try variant with angles
- b93edd4 WIP analysis of blocks
- 9e48393 Some early evidence that predicted chars correlate with cosine sims of learned embeddings
- f8b7d2c Small refactorings and code to analyze frequencies in the input text and compare to transfomer.
- 102b1f3 Add frequency graph for corpus
- a5e8f06 Add title to blocks progress plot
- fb8d069 Cleanup random experiments a bit
- 130c904 Add std dev to orthogonality plot
- c57fa43 Response curve code cleanup and add kq response curves
- 32c2894 Add plots of all response curves
- 795d672 Basic response curve experiments
- ad5d8f2 Add orthogonality experiment
- 7d893d5 Add stats about cosine similarity matrices
- d2e2411 Re-run the multi-embeddings and rotations with the new code; update graphs and conclusions
- a408cd1 Clean up a bunch of stuff in experiments
- 2f12ac7 Remove a bunch of junk from experiments
- 980578b Add function to disambiguate filenames based on case. Use this everywhere
- de882ee Fix major bug in creation of char_to_embedding
- 5b51df5 Move singular vectors code into the main section and build char_to_embeddings from it
- c6d39e0 WIP: a bunch of experiments related to multi-embeddings
- f23a19d Add title to cosine sims plots and add plot for full embeddings
- 2e0cf77 Add code to create final PCA embeddings and move cosine similarity out of the random experiments section
- c5cf3b8 Remove some useless experiments
- de6e58b Clean up experiments in light of bug fix and add explanatory notes.
- 389b458 Replaced manual loops to find indices with helper functions.
- 79a09b9 Clean up some error cells
- 1d883d5 WIP: Adding PCA
- f8a32de Add code to plot embeddings
- c49b107 Add analysis of zeroing out last bits and replacing them with random values
- 1a15255 Delete old rotations files
- 0868e53 Port and improve code to perform rotations, save, load, and plot results.
- a9fcd65 Add line_profiler to dev requirements
- fc57231 Remove loop in computation of x[n_embed-1] in cartesian_from_spherical; brings down execution time from 373ms to 3.28ms
- 2fe3005 Pre-compute cumulative product of sines. Brings down execution time for cartesian_from_spherical from 373ms to 6.17ms
- ac71edd improve cartesian_from_spherical perf by caching sines and cosines
- e74d50c Port over rotation functions from old notebook with better tests
- 43a8081 Add code to learn final embeddings and show they are not unique
- 6ebebad Format logit lens code
- 163d2f3 Implement logit lens in the new codebase
- 1baec34 Show expanded graphs of block 1 self attention and final block output
- 9f3d17c Move the function that computes intermediates higher up.
- e1edaf2 Replicate the heads isolation analysis in the new codebase
- 677414a Iterate on new functions for running the model to the point that I can duplicate the blocks progress analysis
- acb69aa Start of cleaned up analysis notebook
- 3638246 rename file to scratchpad
- 6e489a6 Add vector rotations experiment
- 04dfd24 Fixed learning of embeddings to include layer_norm; learned embeddings for just logits. all still WIP
- 89b0de2 WIP checkpoint with a whole lot of rando experiments
- c82a374 add svd for attention heads
- 1ecebf7 Early experiment of projecting the singular vectors into token space
- ff52cc2 start of the logit lens experiment
- 6445566 Initial commit