================================================================================
COMPREHENSIVE T5 TEXT ENCODER EVALUATION
FP16 Baseline vs FP16 Fast vs Q8 GGUF Quantization
================================================================================

Loading tokenizer...
You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565

================================================================================
BENCHMARK 1: FP16 BASELINE
================================================================================
Loading FP16 baseline model...
`torch_dtype` is deprecated! Use `dtype` instead!

Encoding prompts...
Benchmarking speed...
✓ Speed: 0.1296s ± 0.0045s
✓ VRAM: 10.76 GB
✓ Embedding shape: (6, 4096)
✓ Embedding dtype: float16

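For reference, the speed and VRAM figures above can be reproduced with a loop along these lines. This is a minimal sketch, assuming `encoder` is the loaded T5EncoderModel and `input_ids` is an already-tokenized batch on the GPU; the names are illustrative, not taken from the original script:

    import time
    import torch

    def benchmark(encoder, input_ids, warmup=3, iters=20):
        with torch.no_grad():
            for _ in range(warmup):              # untimed warm-up passes
                encoder(input_ids=input_ids)
            torch.cuda.synchronize()
            times = []
            for _ in range(iters):
                start = time.perf_counter()
                encoder(input_ids=input_ids)
                torch.cuda.synchronize()         # wait for the GPU before stopping the clock
                times.append(time.perf_counter() - start)
        mean = sum(times) / len(times)
        std = (sum((t - mean) ** 2 for t in times) / len(times)) ** 0.5
        vram_gb = torch.cuda.max_memory_allocated() / 1024**3   # peak allocation, in GB
        return mean, std, vram_gb
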
================================================================================
BENCHMARK 2: FP16 WITH FAST ACCUMULATION (TF32)
================================================================================
Loading FP16 fast model...
Encoding prompts...
Benchmarking speed...
✓ Speed: 0.1150s ± 0.0005s
✓ VRAM: 10.76 GB

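The log does not show how the "fast accumulation" mode is switched on; in stock PyTorch it typically amounts to allowing TF32 tensor-core math and reduced-precision FP16 reductions. A sketch using standard PyTorch flags (an assumption about the setup, not the original script):

    import torch

    # Allow TF32 tensor-core math for FP32 matmuls and convolutions (Ampere or newer GPUs).
    torch.backends.cuda.matmul.allow_tf32 = True
    torch.backends.cudnn.allow_tf32 = True

    # Allow FP16 matmuls to accumulate in reduced precision (the "fast accumulation" path).
    torch.backends.cuda.matmul.allow_fp16_reduced_precision_reduction = True
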
================================================================================
BENCHMARK 3: Q8 GGUF QUANTIZATION (MIXED PRECISION)
================================================================================

📊 Analyzing GGUF file structure...
📁 Analyzing GGUF model: /home/local/Downloads/paw/model_cache/models--city96--t5-v1_1-xxl-encoder-gguf/snapshots/005a6ea51a7d0b84d677b3e633bb52a8c85a83d9/t5-v1_1-xxl-encoder-Q8_0.gguf
Architecture: [116 53 101 110 99 111 100 101 114] (ASCII bytes for "t5encoder")
Total tensors: 219
Quantization breakdown:
• type 0 (F32): 50 tensors (22.8%)
• type 8 (Q8_0): 169 tensors (77.2%)

Sample tensor types:
• enc.blk.0.attn_k.weight: 8 (Q8_0) [4096 4096]
• enc.blk.0.attn_o.weight: 8 (Q8_0) [4096 4096]
• enc.blk.0.attn_q.weight: 8 (Q8_0) [4096 4096]
• enc.blk.0.attn_rel_b.weight: 0 (F32) [64 32]
• enc.blk.0.attn_v.weight: 8 (Q8_0) [4096 4096]
• enc.blk.0.attn_norm.weight: 0 (F32) [4096]
• enc.blk.0.ffn_gate.weight: 8 (Q8_0) [4096 10240]
• enc.blk.0.ffn_up.weight: 8 (Q8_0) [4096 10240]
• enc.blk.0.ffn_down.weight: 8 (Q8_0) [10240 4096]
• enc.blk.0.ffn_norm.weight: 0 (F32) [4096]

⚠️ CRITICAL FINDING:
Q8_0 GGUF is MIXED PRECISION, not pure Q8!
Contains: {'8': 169, '0': 50}
This means most weight tensors are Q8_0 (quantized), while the norm and relative-attention-bias tensors stay F32 (full precision)
Even the 'quantized' tensors carry an FP16 scale per block of 32 values!

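The tensor breakdown above can be reproduced with the `gguf` Python package (pip install gguf). A minimal sketch assuming the GGUFReader API; the original analysis code is not shown in the log:

    from collections import Counter
    from gguf import GGUFReader

    reader = GGUFReader("t5-v1_1-xxl-encoder-Q8_0.gguf")

    # Count tensors per GGML quantization type (type 0 = F32, type 8 = Q8_0).
    counts = Counter(int(t.tensor_type) for t in reader.tensors)
    print(counts)

    # Inspect the first few tensors: name, type code, shape.
    for t in reader.tensors[:10]:
        print(t.name, int(t.tensor_type), list(t.shape))
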
🔄 Loading Q8 GGUF model (simulating dequantization)...
🔄 Loading Q8 GGUF and dequantizing to FP16 (simulating ComfyUI-GGUF)
Simulating Q8_0 quantization artifacts...
(Q8_0 = 8-bit int + FP16 scale per block of 32 values; a round-trip sketch follows this benchmark)
Quantized 170 weight tensors
Encoding prompts...
Benchmarking speed...
✓ Speed: 0.1059s ± 0.0009s
✓ VRAM: 11.50 GB
✓ Embedding shape: (6, 4096)

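The Q8_0 artifact simulation described above (8-bit integers plus one FP16 scale per block of 32 values) corresponds to a quantize/dequantize round-trip roughly like the following. This is a sketch of the format, not the actual llama.cpp or ComfyUI-GGUF kernels:

    import torch

    def q8_0_roundtrip(w: torch.Tensor, block: int = 32) -> torch.Tensor:
        """Quantize to Q8_0 (int8 codes + FP16 scale per 32-value block), then dequantize."""
        flat = w.float().flatten()
        pad = (-flat.numel()) % block                  # pad so the length is a multiple of 32
        flat = torch.cat([flat, flat.new_zeros(pad)]).view(-1, block)
        scale = (flat.abs().amax(dim=1, keepdim=True) / 127.0).half().float()  # FP16 scale per block
        q = torch.round(flat / scale.clamp(min=1e-12)).clamp(-127, 127)        # int8 codes
        deq = (q * scale).flatten()[: w.numel()].view_as(w)
        return deq.to(w.dtype)
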
================================================================================
EMBEDDING ACCURACY COMPARISON
================================================================================

[FP16 Fast vs FP16 Baseline]
Cosine Similarity: 0.999999
(std: 0.000000, min: 0.999999)
MSE: 0.00e+00
MAE: 3.43e-05
L2 norm difference: 8.01e-04
Max difference: 9.77e-04
Perplexity metric: 2352149.492204

[Q8 GGUF vs FP16 Baseline]
Cosine Similarity: 0.999648
(std: 0.000381, min: 0.998807)
MSE: 1.49e-06
MAE: 8.27e-04
L2 norm difference: 7.02e-02
Max difference: 2.29e-02
Perplexity metric: 5173.351121

[FP16 Fast vs Q8 GGUF] - THE CRITICAL COMPARISON
Cosine Similarity: 0.999648
(std: 0.000385)
MSE: 1.49e-06
MAE: 8.27e-04

Per-prompt comparison (Cosine Similarity):
Prompt                                                   FP16 Fast    Q8 GGUF      Winner
-----------------------------------------------------    ------------ ------------ ------------
a cat sitting on a chair                                 1.000000     0.999849     FP16 Fast
cinematic shot of a futuristic cyberpunk city at n...    1.000000     0.999692     FP16 Fast
close-up of delicate water droplets on a spider we...    1.000000     0.999865     FP16 Fast
abstract concept of time dissolving into fractals        0.999999     0.998807     FP16 Fast
professional product photography of a luxury watch...    1.000000     0.999872     FP16 Fast
anime style illustration of a magical forest with ...    0.999999     0.999804     FP16 Fast

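The cosine similarity, MSE, and MAE figures above can be computed with a helper along these lines. A sketch that assumes the per-prompt embeddings have already been pooled to the reported (6, 4096) shape; the exact pooling used by the script is not shown:

    import torch
    import torch.nn.functional as F

    def compare_embeddings(a: torch.Tensor, b: torch.Tensor) -> dict:
        """a, b: (num_prompts, hidden) embeddings from two encoder variants."""
        a32, b32 = a.float(), b.float()
        cos = F.cosine_similarity(a32, b32, dim=-1)    # one similarity per prompt
        diff = a32 - b32
        return {
            "cosine_mean": cos.mean().item(),
            "cosine_std": cos.std().item(),
            "cosine_min": cos.min().item(),
            "mse": diff.pow(2).mean().item(),
            "mae": diff.abs().mean().item(),
            "l2_norm_diff": diff.norm().item(),
            "max_diff": diff.abs().max().item(),
        }
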
================================================================================
PERFORMANCE SUMMARY
================================================================================

Speed Comparison (lower is better):
FP16 Baseline: 0.1296s ± 0.0045s
FP16 Fast: 0.1150s ± 0.0005s
Q8 GGUF: 0.1059s ± 0.0009s

FP16 Fast speedup vs Baseline: 11.3%
Q8 GGUF speedup vs FP16 Fast: 7.9%

VRAM Usage (lower is better):
FP16 Baseline: 10.76 GB
FP16 Fast: 10.76 GB
Q8 GGUF: 11.50 GB

Q8 GGUF VRAM savings vs FP16: -6.9% (no savings here: this test dequantizes the Q8 weights back to FP16 in memory)

================================================================================
EMBEDDING ACCURACY SUMMARY (Higher cosine similarity = Better)
================================================================================

FP16 Fast vs Baseline:
✓ Cosine Similarity: 0.99999946
✓ Quality Loss: 0.000054%
✓ Status: NEGLIGIBLE DIFFERENCE (>0.9999 threshold)

Q8 GGUF vs Baseline:
⚠️ Cosine Similarity: 0.99964816
⚠️ Quality Loss: 0.035184%
❌ Q8 is WORSE than FP16 Fast by 0.035130% in cosine similarity

================================================================================
🏆 FINAL VERDICT
================================================================================

Quality Ranking (Cosine Similarity to FP16 Baseline):
1. FP16 Baseline: 1.00000000 (reference)
2. 🥇 FP16 Fast: 0.99999946 ✓ WINNER
3. Q8 GGUF: 0.99964816

Speed Ranking (Time per batch):
1. Q8 GGUF: 0.1059s 🥇 FASTEST
2. FP16 Fast: 0.1150s
3. FP16 Baseline: 0.1296s

🎯 RECOMMENDATION FOR TEXT-TO-IMAGE/VIDEO (Flux, HunyuanVideo):
Use FP16 + Fast Accumulation (TF32/BF16)

WHY:
✓ FP16 Fast tracks the FP16 baseline 0.035130% more closely (cosine similarity) than Q8 GGUF
✓ FP16 Fast is 11.3% faster than the baseline
✓ No quantization artifacts (Q8 introduces per-block rounding error)
✓ Native hardware support (no dequantization overhead)
⚠️ Q8 GGUF is MIXED PRECISION (a mix of Q8_0 and F32 tensors)
⚠️ Q8 requires dequantization, which adds latency

================================================================================