- 04adedf Add some analysis of the adjustments from SA and ffwd.
- 045dfb4 Add the ffwd outputs to the experiment
- ab20384 Factor core logic of cluster_proj_outputs() into a more generic function
- 395f245 Factor core logic of get_all_proj_outputs_for_slen() into more generic function
- 1d556c1 Add Cluster class
- 12629bb Re-ran the experiment and the file sizes changed slightly
- 019b28d Rename ProjMatrixExperiment to BlockInternalsExperiment
- 644a279 Remove index from substrings
- a819aeb Rename cluster_proj_matrix_results() to cluster_proj_outputs()
- 0dba347 Add missing line to LogitsExperiment test
- 7929eeb Add proj matrix experiment and analysis
- 479c55e quick look at proj weight and bias
- f2f5771 Code to analyze tensors in the block and visualize them.
- b322789 Add plot_wei_for_all_heads()
- 47e9991 Make plot_wei take axes
- 15e11a3 Format cell containing attention_head_details() and plot_wei()
- 09af049 Start examining example
- 36d8d52 Fix bug in plot_logit_lens: first row is the input
- 454fc27 Find some interesting strings and begin analyzing them
- 62903d4 Make stride length 96 and re-run logits experiment to get more data
- 27711d4 Replace all the manual calculations in DataBatcher with tensor.unfold()
- 9ddc0e4 Introduce DataBatcher
- caabdb4 Experiment to run a bunch of strings through the model and examine logits
- 015d59c Study of the V matrix
- e5eb62d WIP trying to understand V
- e490b48 More work on positional encodings
- 6e8ee21 Positional encoding investigation
- 6e0e43d Fix bugs in plotting positional encodings and plot them for b0h0
- 5e07e4a Analyze a few more
- d89b6ae Start of analysis of long block results
- 175390f Add .gitignore for s_len256 files
- c13dac8 Run the attention weights experiments on all layers/heads
- 56a31c9 Add code to print the top tokens in the top decile
- da9ea11 Make plot_wei() take any iterable of str for the labels
- 0c3201e Clean up quick experiments on long contexts
- 9dc6238 Reduce copypasta in analyze_attention_weight_results() and print prefixes
- a2548ab Update analysis with new experiment results.
- edf5509 Make attention weights experiment run over all strings in the validation set
- 5982bda Add gitignore
- 3413973 Remove some unused variables from run_attention_weights_experiment()
- ecc5047 Add some explanatory text to the attention weights experiments section and allow passing in the data set
- 5559b4f Analysis code for attention weight experiment results
- 44f7910 Experiment to run a bunch of sample tokens through and get the attention weights
- e7cbf46 Attention head analysis showing how b1h0 copies the first row to the second.
- f28800c more interpretation of outputs.
- c70446a slight improvement to attention_head_details and more analysis of output
- 185f107 Add workthrough of output calculation
- 5a06ccc Fix math errors
- bec0625 Code to display attention heads
- ecfbdff Some exposition on the math behind attention
- 400fdba Detailed analysis and examination of how `:` enters at block 0
- ad48eda Add format_topk_chars
- ba6dfc1 Readability improvement
- d68767e Consider sa residual in addition to ffwd output
- 3422f65 Fix head progression analysis to use ffwd output
- f2cec93 Refactor head_progression to return the whole io_accessor
- 5091a61 Change ortho experiment to use uniform distribution and try variant with angles
- b93edd4 WIP analysis of blocks
- 9e48393 Some early evidence that predicted chars correlate with cosine sims of learned embeddings
- f8b7d2c Small refactorings and code to analyze frequencies in the input text and compare to transfomer.
- 102b1f3 Add frequency graph for corpus
- a5e8f06 Add title to blocks progress plot
- fb8d069 Cleanup random experiments a bit
- 130c904 Add std dev to orthogonality plot
- c57fa43 Response curve code cleanup and add kq response curves
- 32c2894 Add plots of all response curves
- 795d672 Basic response curve experiments
- ad5d8f2 Add orthogonality experiment
- 7d893d5 Add stats about cosine similarity matrices
- d2e2411 Re-run the multi-embeddings and rotations with the new code; update graphs and conclusions
- a408cd1 Clean up a bunch of stuff in experiments
- 2f12ac7 Remove a bunch of junk from experiments
- 980578b Add function to disambiguate filenames based on case. Use this everywhere
- de882ee Fix major bug in creation of char_to_embedding
- 5b51df5 Move singular vectors code into the main section and build char_to_embeddings from it
- c6d39e0 WIP: a bunch of experiments related to multi-embeddings
- f23a19d Add title to cosine sims plots and add plot for full embeddings
- 2e0cf77 Add code to create final PCA embeddings and move cosine similarity out of the random experiments section
- c5cf3b8 Remove some useless experiments
- de6e58b Clean up experiments in light of bug fix and add explanatory notes.
- 389b458 Replaced manual loops to find indices with helper functions.
- 79a09b9 Clean up some error cells
- 1d883d5 WIP: Adding PCA
- f8a32de Add code to plot embeddings
- c49b107 Add analysis of zeroing out last bits and replacing them with random values
- 1a15255 Delete old rotations files
- 0868e53 Port and improve code to perform rotations, save, load, and plot results.
- a9fcd65 Add line_profiler to dev requirements
- fc57231 Remove loop in computation of x[n_embed-1] in cartesian_from_spherical; brings down execution time from 373ms to 3.28ms
- 2fe3005 Pre-compute cumulative product of sines. Brings down execution time for cartesian_from_spherical from 373ms to 6.17ms
- ac71edd improve cartesian_from_spherical perf by caching sines and cosines
- e74d50c Port over rotation functions from old notebook with better tests
- 43a8081 Add code to learn final embeddings and show they are not unique
- 6ebebad Format logit lens code
- 163d2f3 Implement logit lens in the new codebase
- 1baec34 Show expanded graphs of block 1 self attention and final block output
- 9f3d17c Move the function that computes intermediates higher up.
- e1edaf2 Replicate the heads isolation analysis in the new codebase
- 677414a Iterate on new functions for running the model to the point that I can duplicate the blocks progress analysis
- acb69aa Start of cleaned up analysis notebook
- 3638246 rename file to scratchpad
- 6e489a6 Add vector rotations experiment
- 04dfd24 Fixed learning of embeddings to include layer_norm; learned embeddings for just logits. all still WIP
- 89b0de2 WIP checkpoint with a whole lot of rando experiments
- c82a374 add svd for attention heads
- 1ecebf7 Early experiment of projecting the singular vectors into token space
- ff52cc2 start of the logit lens experiment
- 6445566 Initial commit