- 5c57f15 Add some analysis of the adjustments from SA and ffwd.
- ef9f43a Add the ffwd outputs to the experiment
- 4e60ee9 Factor core logic of cluster_proj_outputs() into a more generic function
- 36b1b5f Factor core logic of get_all_proj_outputs_for_slen() into more generic function
- 4355eed Add Cluster class
- ac96832 Re-ran the experiment and the file sizes changed slightly
- 891770f Rename ProjMatrixExperiment to BlockInternalsExperiment
- 72965ed Remove index from substrings
- 66c7a5d Rename cluster_proj_matrix_results() to cluster_proj_outputs()
- acc6704 Add missing line to LogitsExperiment test
- 6e75911 Add proj matrix experiment and analysis
- f4f33ff quick look at proj weight and bias
- e1f4407 Code to analyze tensors in the block and visualize them.
- ec54ce0 Add plot_wei_for_all_heads()
- 8f2c697 Make plot_wei take axes
- 180dd51 Format cell containing attention_head_details() and plot_wei()
- 0b2426d Start examining example
- b69fa40 Fix bug in plot_logit_lens: first row is the input
- 721dd1b Find some interesting strings and begin analyzing them
- 4fa7092 Make stride length 96 and re-run logits experiment to get more data
- 0cfbe42 Replace all the manual calculations in DataBatcher with tensor.unfold()
- acbfc03 Introduce DataBatcher
- 8638287 Experiment to run a bunch of strings through the model and examine logits
- d46f45e Study of the V matrix
- 6804e65 WIP trying to understand V
- cfc4b75 More work on positional encodings
- 99a175a Positional encoding investigation
- b77a706 Fix bugs in plotting positional encodings and plot them for b0h0
- ad98595 Analyze a few more
- 9a971b6 Start of analysis of long block results
- e583b51 Add .gitignore for s_len256 files
- 19f6867 Run the attention weights experiments on all layers/heads
- 286a029 Add code to print the top tokens in the top decile
- a94109d Make plot_wei() take any iterable of str for the labels
- c15be72 Clean up quick experiments on long contexts
- 588c65e Reduce copypasta in analyze_attention_weight_results() and print prefixes
- 17304c6 Update analysis with new experiment results.
- 0af31b9 Make attention weights experiment run over all strings in the validation set
- e3b3ff9 Ran `git add --renormalize .` to deal with modified existing .pt file
- 94761ab Add gitattributes settings for git-lfs
- 5221356 Add gitignore
- 9d90c57 Remove some unused variables from run_attention_weights_experiment()
- 7d888f1 Add some explanatory text to the attention weights experiments section and allow passing in the data set
- 62f2546 Analysis code for attention weight experiment results
- f5e050e Experiment to run a bunch of sample tokens through and get the attention weights
- 01dfd8c Attention head analysis showing how b1h0 copies the first row to the second.
- b5740d9 more interpretation of outputs.
- b4a0f2d slight improvement to attention_head_details and more analysis of output
- eaf0468 Add workthrough of output calculation
- 472cc68 Fix math errors
- fad56ba Code to display attention heads
- 3ca1a0f Some exposition on the math behind attention
- 20ecde2 Detailed analysis and examination of how `:` enters at block 0
- 31e67de Add format_topk_chars
- 540073b Readability improvement
- 67c65ef Consider sa residual in addition to ffwd output
- 157422b Fix head progression analysis to use ffwd output
- ef62a52 Refactor head_progression to return the whole io_accessor
- 177249b Change ortho experiment to use uniform distribution and try variant with angles
- 4113094 WIP analysis of blocks
- b71d68a Some early evidence that predicted chars correlate with cosine sims of learned embeddings
- 46dd102 Small refactorings and code to analyze frequencies in the input text and compare to transformer.
- 69f4803 Add frequency graph for corpus
- 0c4caaa Add title to blocks progress plot
- 396ae75 Cleanup random experiments a bit
- d0cdcfe Add std dev to orthogonality plot
- 743f060 Response curve code cleanup and add kq response curves
- c5ea752 Add plots of all response curves
- 9ad7c3d Basic response curve experiments
- f15d015 Add orthogonality experiment
- ff96352 Add stats about cosine similarity matrices
- 4621aa1 Re-run the multi-embeddings and rotations with the new code; update graphs and conclusions
- e404621 Clean up a bunch of stuff in experiments
- 9b044cc Remove a bunch of junk from experiments
- 809e244 Add function to disambiguate filenames based on case. Use this everywhere
- 77a475c Fix major bug in creation of char_to_embedding
- c90abef Move singular vectors code into the main section and build char_to_embeddings from it
- f93d8ab WIP: a bunch of experiments related to multi-embeddings
- ec074c1 Add title to cosine sims plots and add plot for full embeddings
- 570ac5b Add code to create final PCA embeddings and move cosine similarity out of the random experiments section
- 97eba67 Remove some useless experiments
- e43aa39 Clean up experiments in light of bug fix and add explanatory notes.
- 00a586b Replaced manual loops to find indices with helper functions.
- db1a773 Clean up some error cells
- 1def6d8 WIP: Adding PCA
- e307cea Add code to plot embeddings
- 1de4ef4 Add analysis of zeroing out last bits and replacing them with random values
- 5378dfa Delete old rotations files
- 6370583 Port and improve code to perform rotations, save, load, and plot results.
- 24019dd Add line_profiler to dev requirements
- 25656b3 Remove loop in computation of x[n_embed-1] in cartesian_from_spherical; brings down execution time from 373ms to 3.28ms
- 258858c Pre-compute cumulative product of sines. Brings down execution time for cartesian_from_spherical from 373ms to 6.17ms
- 3398e6b improve cartesian_from_spherical perf by caching sines and cosines
- e0677b8 Port over rotation functions from old notebook with better tests
- 7cee1ea Add code to learn final embeddings and show they are not unique
- 43d84a2 Format logit lens code
- 461ef14 Implement logit lens in the new codebase
- 1c2b4f4 Show expanded graphs of block 1 self attention and final block output
- 7d25509 Move the function that computes intermediates higher up.
- 218f5a1 Replicate the heads isolation analysis in the new codebase
- 918a3e9 Iterate on new functions for running the model to the point that I can duplicate the blocks progress analysis
- 1469ea0 Start of cleaned up analysis notebook
- 42b8281 rename file to scratchpad
- 39e1c7f Add vector rotations experiment
- bfc012d Fixed learning of embeddings to include layer_norm; learned embeddings for just logits. all still WIP
- cdb02d9 WIP checkpoint with a whole lot of rando experiments
- a10f606 add svd for attention heads
- e6f3345 Early experiment of projecting the singular vectors into token space
- ce8717f start of the logit lens experiment
- 83f0dfc Initial commit