spather

Untitled

Sep 29th, 2023
  1. 5c57f15 Add some analysis of the adjustments from SA and ffwd.
  2. ef9f43a Add the ffwd outputs to the experiment
  3. 4e60ee9 Factor core logic of cluster_proj_outputs() into a more generic function
  4. 36b1b5f Factor core logic of get_all_proj_outputs_for_slen() into a more generic function
  5. 4355eed Add Cluster class
  6. ac96832 Re-ran the experiment and the file sizes changed slightly
  7. 891770f Rename ProjMatrixExperiment to BlockInternalsExperiment
  8. 72965ed Remove index from substrings
  9. 66c7a5d Rename cluster_proj_matrix_results() to cluster_proj_outputs()
  10. acc6704 Add missing line to LogitsExperiment test
  11. 6e75911 Add proj matrix experiment and analysis
  12. f4f33ff quick look at proj weight and bias
  13. e1f4407 Code to analyze tensors in the block and visualize them.
  14. ec54ce0 Add plot_wei_for_all_heads()
  15. 8f2c697 Make plot_wei take axes
  16. 180dd51 Format cell containing attention_head_details() and plot_wei()
  17. 0b2426d Start examining example
  18. b69fa40 Fix bug in plot_logit_lens: first row is the input
  19. 721dd1b Find some interesting strings and begin analyzing them
  20. 4fa7092 Make stride length 96 and re-run logits experiment to get more data
  21. 0cfbe42 Replace all the manual calculations in DataBatcher with tensor.unfold() (sketched after this list)
  22. acbfc03 Introduce DataBatcher
  23. 8638287 Experiment to run a bunch of strings through the model and examine logits
  24. d46f45e Study of the V matrix
  25. 6804e65 WIP trying to understand V
  26. cfc4b75 More work on positional encodings
  27. 99a175a Positional encoding investigation
  28. b77a706 Fix bugs in plotting positional encodings and plot them for b0h0
  29. ad98595 Analyze a few more
  30. 9a971b6 Start of analysis of long block results
  31. e583b51 Add .gitignore for s_len256 files
  32. 19f6867 Run the attention weights experiments on all layers/heads
  33. 286a029 Add code to print the top tokens in the top decile
  34. a94109d Make plot_wei() take any iterable of str for the labels
  35. c15be72 Clean up quick experiments on long contexts
  36. 588c65e Reduce copypasta in analyze_attention_weight_results() and print prefixes
  37. 17304c6 Update analysis with new experiment results.
  38. 0af31b9 Make attention weights experiment run over all strings in the validation set
  39. e3b3ff9 Ran `git add --renormalize .` to deal with modified existing .pt file
  40. 94761ab Add gitattributes settings for git-lfs
  41. 5221356 Add gitignore
  42. 9d90c57 Remove some unused variables from run_attention_weights_experiment()
  43. 7d888f1 Add some explanatory text to the attention weights experiments section and allow passing in the data set
  44. 62f2546 Analysis code for attention weight experiment results
  45. f5e050e Experiment to run a bunch of sample tokens through and get the attention weights (sketched after this list)
  46. 01dfd8c Attention head analysis showing how b1h0 copies the first row to the second.
  47. b5740d9 more interpretation of outputs.
  48. b4a0f2d slight improvement to attention_head_details and more analysis of output
  49. eaf0468 Add walkthrough of output calculation
  50. 472cc68 Fix math errors
  51. fad56ba Code to display attention heads
  52. 3ca1a0f Some exposition on the math behind attention
  53. 20ecde2 Detailed analysis and examination of how `:` enters at block 0
  54. 31e67de Add format_topk_chars
  55. 540073b Readability improvement
  56. 67c65ef Consider sa residual in addition to ffwd output
  57. 157422b Fix head progression analysis to use ffwd output
  58. ef62a52 Refactor head_progression to return the whole io_accessor
  59. 177249b Change ortho experiment to use uniform distribution and try variant with angles
  60. 4113094 WIP analysis of blocks
  61. b71d68a Some early evidence that predicted chars correlate with cosine sims of learned embeddings
  62. 46dd102 Small refactorings and code to analyze frequencies in the input text and compare to transformer.
  63. 69f4803 Add frequency graph for corpus
  64. 0c4caaa Add title to blocks progress plot
  65. 396ae75 Cleanup random experiments a bit
  66. d0cdcfe Add std dev to orthogonality plot
  67. 743f060 Response curve code cleanup and add kq response curves
  68. c5ea752 Add plots of all response curves
  69. 9ad7c3d Basic response curve experiments
  70. f15d015 Add orthogonality experiment
  71. ff96352 Add stats about cosine similarity matrices (sketched after this list)
  72. 4621aa1 Re-run the multi-embeddings and rotations with the new code; update graphs and conclusions
  73. e404621 Clean up a bunch of stuff in experiments
  74. 9b044cc Remove a bunch of junk from experiments
  75. 809e244 Add function to disambiguate filenames based on case. Use this everywhere
  76. 77a475c Fix major bug in creation of char_to_embedding
  77. c90abef Move singular vectors code into the main section and build char_to_embeddings from it
  78. f93d8ab WIP: a bunch of experiments related to multi-embeddings
  79. ec074c1 Add title to cosine sims plots and add plot for full embeddings
  80. 570ac5b Add code to create final PCA embeddings and move cosine similarity out of the random experiments section
  81. 97eba67 Remove some useless experiments
  82. e43aa39 Clean up experiments in light of bug fix and add explanatory notes.
  83. 00a586b Replaced manual loops to find indices with helper functions.
  84. db1a773 Clean up some error cells
  85. 1def6d8 WIP: Adding PCA
  86. e307cea Add code to plot embeddings
  87. 1de4ef4 Add analysis of zeroing out last bits and replacing them with random values
  88. 5378dfa Delete old rotations files
  89. 6370583 Port and improve code to perform rotations, save, load, and plot results.
  90. 24019dd Add line_profiler to dev requirements
  91. 25656b3 Remove loop in computation of x[n_embed-1] in cartesian_from_spherical; brings down execution time from 373ms to 3.28ms
  92. 258858c Pre-compute cumulative product of sines. Brings down execution time for cartesian_from_spherical from 373ms to 6.17ms (sketched after this list)
  93. 3398e6b improve cartesian_from_spherical perf by caching sines and cosines
  94. e0677b8 Port over rotation functions from old notebook with better tests
  95. 7cee1ea Add code to learn final embeddings and show they are not unique
  96. 43d84a2 Format logit lens code
  97. 461ef14 Implement logit lens in the new codebase (sketched after this list)
  98. 1c2b4f4 Show expanded graphs of block 1 self attention and final block output
  99. 7d25509 Move the function that computes intermediates higher up.
  100. 218f5a1 Replicate the heads isolation analysis in the new codebase
  101. 918a3e9 Iterate on new functions for running the model to the point that I can duplicate the blocks progress analysis
  102. 1469ea0 Start of cleaned up analysis notebook
  103. 42b8281 rename file to scratchpad
  104. 39e1c7f Add vector rotations experiment
  105. bfc012d Fixed learning of embeddings to include layer_norm; learned embeddings for just logits. all still WIP
  106. cdb02d9 WIP checkpoint with a whole lot of rando experiments
  107. a10f606 add svd for attention heads
  108. e6f3345 Early experiment of projecting the singular vectors into token space
  109. ce8717f start of the logit lens experiment
  110. 83f0dfc Initial commit
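
A few of the techniques these commits reference, sketched for orientation. All sketches are PyTorch, and every model attribute in them (token_embedding, position_embedding, blocks, ln_f, lm_head, key, query) is an assumption about a Karpathy-style character GPT, not this repo's actual API.

The logit lens (commits ce8717f and 461ef14) projects the residual stream after each block through the final layer norm and unembedding, showing which tokens the model favors at each depth. A minimal sketch:

    import torch

    @torch.no_grad()
    def logit_lens(model, idx):
        # idx: (B, T) token indices. Returns (n_blocks + 1, B, T, vocab_size).
        B, T = idx.shape
        x = model.token_embedding(idx) + model.position_embedding(torch.arange(T))
        rows = [model.lm_head(model.ln_f(x))]  # first row is the input (cf. commit b69fa40)
        for block in model.blocks:
            x = block(x)
            rows.append(model.lm_head(model.ln_f(x)))
        return torch.stack(rows)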
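The attention weights experiments (f5e050e, 62f2546) hinge on recovering a head's post-softmax attention matrix. For one causal head over an input of shape (T, n_embed), assuming the head exposes key and query linear layers:

    import torch
    import torch.nn.functional as F

    def attention_weights(head, x):
        # x: (T, n_embed). Returns the (T, T) matrix the plotting code calls `wei`.
        T = x.shape[0]
        k, q = head.key(x), head.query(x)            # (T, head_size) each
        wei = q @ k.T * k.shape[-1] ** -0.5          # scaled dot products
        mask = torch.tril(torch.ones(T, T)) == 0     # causal: no attending ahead
        return F.softmax(wei.masked_fill(mask, float('-inf')), dim=-1)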
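Commit 0cfbe42 drops DataBatcher's manual index arithmetic in favor of tensor.unfold(), which produces sliding windows directly. With stand-in sizes (the log mentions a stride of 96 in 4fa7092):

    import torch

    data = torch.arange(20)                       # stand-in for the encoded corpus
    block_size, stride = 8, 4                     # toy values
    windows = data.unfold(0, block_size, stride)  # (n_windows, block_size)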
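Commits 3398e6b, 258858c, and 25656b3 speed up cartesian_from_spherical by caching the sines and pre-computing their cumulative product: in n dimensions, x_1 = r*cos(phi_1), x_i = r*cos(phi_i)*sin(phi_1)*...*sin(phi_{i-1}), and x_n = r*sin(phi_1)*...*sin(phi_{n-1}), so a single cumprod over the sines replaces the per-coordinate loop. A sketch with an assumed signature:

    import torch

    def cartesian_from_spherical(r, phi):
        # phi: (n-1,) angles. Returns (n,) Cartesian coordinates.
        sin_prod = torch.cumprod(torch.sin(phi), dim=0)  # sin(phi_1), sin(phi_1)*sin(phi_2), ...
        x = torch.empty(phi.numel() + 1)
        x[0] = torch.cos(phi[0])
        x[1:-1] = sin_prod[:-1] * torch.cos(phi[1:])     # each term reuses a cached product
        x[-1] = sin_prod[-1]
        return r * x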
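And the cosine-similarity statistics of ff96352 reduce to normalizing the embedding rows and taking their Gram matrix; the summary numbers come from the off-diagonal entries. For a hypothetical (vocab_size, n_embed) embedding table:

    import torch
    import torch.nn.functional as F

    emb = torch.randn(65, 32)                  # stand-in embedding table
    unit = F.normalize(emb, dim=1)             # unit-length rows
    sims = unit @ unit.T                       # (65, 65) pairwise cosine similarities
    off_diag = sims[~torch.eye(65, dtype=torch.bool)]
    print(off_diag.mean().item(), off_diag.std().item())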