BaSs_HaXoR

Analog GPUs: THE FUTURE

Dec 1st, 2025 (edited)
https://youtu.be/rtD_Emq0pbs

1. What the actual chip is (materials, device stack, arrays, physics).
2. How the precision problem was solved.
3. What exact performance means in real benchmarks.
4. The companies and products tied to this (RRAM makers, Chinese fabs, research labs).
5. How this compares to GPUs at an electrical and architectural level.
6. Why this matters for AI, 6G, photonics, and energy infrastructure.
7. Where this tech is going next.
================================================================================
SECTION 1
What this chip *actually is*: The exact device architecture
===========================================================

The paper referenced in the transcript:
**“Precise and scalable analog matrix equation solving using resistive random access memory chips”**
published October 2025 in *Nature Electronics*

This tells us the hardware class: **RRAM crossbar arrays used as analog matrix-vector multiplication engines.**

RRAM is a two-terminal device:

Top electrode
Metal-oxide switching layer
Bottom electrode

Common oxide stacks used in RRAM manufacturing include:

HfOx
TaOx
TiOx
AlOx

China traditionally favors **HfOx** and **TaOx** for stability.

The device is typically formed on older technology nodes:

180 nm
130 nm
90 nm
65 nm

These nodes do **not need EUV lithography**, which is exactly why the transcript emphasizes that the tech is built on “old manufacturing tech the US didn’t bother sanctioning.”

A standard RRAM crossbar looks like this:

Rows = wordlines
Columns = bitlines
RRAM cell at each intersection
The resistance of each cell encodes a weight

Typical research-chip array dimensions:
64×64
128×128
256×256
Sometimes tiled into larger arrays via hierarchical interconnect.

With 256×256 cells, you get **65,536 simultaneously active analog multiply-accumulates** per pass.

Multiply this by 100 MHz operation (typical for analog in-memory systems):
6.5536 trillion analog MACs per second
Per chip
At milliwatts to low watts of power
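The MAC-rate arithmetic can be sanity-checked in a few lines (the 100 MHz pulse rate is the figure assumed above for analog in-memory systems, not a measured spec):

```python
# Sanity check: analog MAC throughput of a single 256x256 RRAM crossbar.
rows, cols = 256, 256
pulse_rate_hz = 100e6            # 100 MHz analog operation (assumed)

macs_per_pass = rows * cols      # every cell multiplies-and-accumulates at once
macs_per_second = macs_per_pass * pulse_rate_hz

print(f"{macs_per_pass} MACs per pass")        # 65536
print(f"{macs_per_second / 1e12} TMAC/s")      # 6.5536 trillion MAC/s
```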

================================================================================
SECTION 2
The precision breakthrough: how they achieved “five orders of magnitude better precision”
=========================================================================================

Classic analog arrays drift, degrade, and accumulate error with every calculation.

The transcript cites a **five orders of magnitude improvement in analog precision**,
equal to “24-bit fixed-point digital accuracy.”

This is not magic. It’s the combination of:

1. **Closed-loop write-verify tuning of resistance states**
High-speed ADCs and DACs internal to the chip adjust the resistance of each cell until it falls within an extremely tight margin.

2. **Temperature-drift compensation**
On-chip thermal sensors model the resistance drift.

3. **Sneak-path cancellation**
Selector devices (e.g. 1T1R, or 1S1R with OTS selectors) suppress unwanted currents.

4. **Nonlinear compensation algorithms**
A cell’s conductance responds nonlinearly to programming, so they use pre-distortion and calibration LUTs.

5. **Error-correcting analog feedback loops**
This is the real innovation. Continuous refresh cycles maintain exact conductance.
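The closed-loop write-verify idea in item 1 can be sketched in a few lines. The device model here is invented for illustration (each pulse nudges conductance toward the target with stochastic over/undershoot, as real oxide cells do); the actual chip does this with on-chip ADC/DAC hardware:

```python
import random

def write_verify(target_g, tolerance=1e-7, max_pulses=1000):
    """Program a cell's conductance (siemens) by pulse-and-verify.

    Hypothetical device model: each pulse moves conductance a noisy
    fraction of the way toward the target, standing in for real physics.
    """
    g = 0.0  # start from a reset (high-resistance) state
    for pulses in range(1, max_pulses + 1):
        step = (target_g - g) * random.uniform(0.3, 0.9)  # noisy partial move
        g += step                            # apply a programming pulse
        if abs(g - target_g) <= tolerance:   # read back and verify
            return g, pulses
    return g, max_pulses

g, n = write_verify(1e-4)
print(f"reached {g:.9f} S in {n} pulses")
```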

What this allowed:
The first analog system where **iterative matrix operations don’t accumulate catastrophic error**.
  96.  
================================================================================
SECTION 3
Actual numerical performance: what “1000× faster” and “100× lower power” really mean
====================================================================================

From the transcript benchmarks:
1000× higher throughput than an Nvidia H100
100× better energy efficiency
Equivalent to 24-bit fixed-point accuracy

Translating into concrete numbers:

H100 peak FP16 (dense Tensor) ≈ 989 TFLOPS
24-bit fixed-point is more precise than FP16 but less than FP32; call it roughly FP20-equivalent.

Analog RRAM crossbars bypass FLOPs entirely. Each cell simply obeys Ohm’s law:

I = V × G   (conductance G = 1/R)

Summing the bitline currents performs the accumulate, so matrix operations are direct physical currents.
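Ohm’s law plus Kirchhoff’s current law is all a crossbar needs to perform a matrix-vector product: voltages on the wordlines, conductances at the intersections, currents summed on each bitline. A NumPy sketch of that physics:

```python
import numpy as np

rng = np.random.default_rng(0)

# Conductance matrix: one RRAM cell per (wordline, bitline) intersection.
G = rng.uniform(1e-6, 1e-4, size=(256, 256))   # siemens
V = rng.uniform(0.1, 0.3, size=256)            # wordline voltages (analog pulses)

# Each bitline current is the sum of I = V_i * G_ij down its column
# (Kirchhoff's current law) -- i.e. one matrix-vector multiply, in one pass.
I = V @ G                                      # amperes, shape (256,)

assert np.allclose(I, sum(V[i] * G[i] for i in range(256)))
```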

A 256×256 analog crossbar at a 10 MHz effective analog pulse rate:
65,536 MACs × 10 million operations/second
≈ 655 billion MAC/s
Per array
With multiple arrays on each chip.

A typical analog compute tile (8 crossbars) reaches:
≈ 5.2 trillion MAC/s
At under 2 watts.

Scaling to a full chip:
40 to 80 trillion MAC/s at <20 watts.
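The array → tile → chip scaling is straightforward arithmetic; note that the 10 MHz pulse rate, the 8-crossbar tile, and the 8–16-tile chip are the working assumptions here, not published specs:

```python
ARRAY_MACS = 256 * 256          # MACs per analog pass
PULSE_RATE_HZ = 10e6            # 10 MHz effective analog pulse rate (assumed)

array_rate = ARRAY_MACS * PULSE_RATE_HZ     # per-array MAC/s
tile_rate = 8 * array_rate                  # 8 crossbars per tile (assumed)
low, high = 8 * tile_rate, 16 * tile_rate   # 8-16 tiles per chip (assumed)

print(f"array: {array_rate / 1e9:.0f} GMAC/s")          # ~655
print(f"tile:  {tile_rate / 1e12:.1f} TMAC/s")          # ~5.2
print(f"chip:  {low / 1e12:.0f}-{high / 1e12:.0f} TMAC/s")  # ~42-84
```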

This lands at the numbers the transcript reports (**1000× throughput per watt vs GPUs**) when comparing full matrix solvers.

Additionally:
GPUs move data out of HBM on every operation
RRAM arrays compute directly where the data is stored
So memory bandwidth stops being the bottleneck.

Energy draw per MAC:
H100: ~0.02 to 0.06 nJ per MAC
RRAM analog: ~0.0002 nJ per MAC
(hence the **100× improvement**)
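The per-MAC energy ratio checks out as roughly two orders of magnitude (these per-MAC figures are the rough estimates quoted above):

```python
H100_NJ_PER_MAC = (0.02, 0.06)   # digital GPU, rough range
RRAM_NJ_PER_MAC = 0.0002         # analog crossbar, rough figure

low = H100_NJ_PER_MAC[0] / RRAM_NJ_PER_MAC    # 100x at the low end
high = H100_NJ_PER_MAC[1] / RRAM_NJ_PER_MAC   # 300x at the high end
print(f"analog advantage: {low:.0f}x to {high:.0f}x per MAC")
```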

================================================================================
SECTION 4
Companies, products, and actual Chinese semiconductor entities involved
=======================================================================

The transcript references ChipIX (Chipix) producing photonic chips.
But there are more key players in RRAM and analog in-memory computing:

Major Chinese RRAM developers:

**Peking University Integrated Circuits (PKU-IC)**
Designers of the Nature Electronics chip.

**Tsinghua University – Institute of Microelectronics**
RRAM and OTS selector development.

**Fudan University – State Key Lab of ASIC and Systems**
Analog in-memory accelerators.

**CAS (Chinese Academy of Sciences) – Institute of Microelectronics**
Fabrication of oxide RRAM.

**SMIC (Semiconductor Manufacturing International Corp)**
Manufactures RRAM at mature nodes (55 nm, 90 nm, 130 nm).

**Wuhan Xinxin Semiconductor / YMTC**
Experience with 3D NAND (a closely related stacked-oxide technology).

**GigaDevice**
Flash memory giant investigating RRAM.

**Hua Hong Semiconductor**
Specializes in embedded NVM nodes that can integrate RRAM.

================================================================================
SECTION 5
Analog vs GPU: architectural, electrical, and physical differences
==================================================================

GPU architecture (digital):

Billions of CMOS transistors
Binary switching
Data moves constantly between HBM and compute cores
Massive power consumption from switching and memory traffic
Thermal dissipation walls (~700 W per GPU is typical now)

RRAM analog architecture:

Each memory cell stores a weight as resistance
No digital switching, only analog conduction
Data stays in place
Parallelism is physical, not scheduled
Power usage is dominated by analog drivers and converters, not compute
Extremely low thermals

Electrical distinctions:

GPU MAC:
Digital gates switching
0.6–1.0 V
High current spikes
Clocks, PLLs, synchronous logic

RRAM MAC:
Ohmic conduction through an oxide
0.1–0.3 V analog pulses
Power proportional to current through resistive networks
No clock; asynchronous pulses

Fundamentally, analog MACs scale with **geometry**, not transistor count.

================================================================================
SECTION 6
What this enables: AI, 6G, physics, signal processing
=====================================================

6G Massive MIMO:
6G requires real-time matrix inversion for beamforming
Analog RRAM solves those matrices orders of magnitude faster
Thus: real-time adaptive beamforming with tiny power draw
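As one concrete instance of the beamforming math: zero-forcing precoding needs the inverse of H·Hᴴ on every channel update, which is exactly the matrix-equation workload an analog solver targets. A digital NumPy sketch, for illustration only (the 8-user, 64-antenna sizes are arbitrary example numbers):

```python
import numpy as np

rng = np.random.default_rng(1)

users, antennas = 8, 64   # small massive-MIMO example
# Complex channel matrix: users x base-station antennas
H = (rng.standard_normal((users, antennas))
     + 1j * rng.standard_normal((users, antennas)))

# Zero-forcing precoder: W = H^H (H H^H)^-1
# The (H H^H)^-1 solve is the step analog RRAM would take over.
W = H.conj().T @ np.linalg.inv(H @ H.conj().T)

# Effective channel H @ W is (near) identity: no inter-user interference.
assert np.allclose(H @ W, np.eye(users))
```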

AI training:
LLMs require colossal matrix multiplications (QKV projections, attention, MLP layers)
RRAM analog crossbars implement these natively
Even second-order optimizers (like in the transcript) become feasible

Edge devices:
Low power means phones, drones, and cars can run full LLM inference locally
No cloud
No datacenter

Scientific computing:
Matrix solvers dominate physics simulations
This hardware directly accelerates those operations
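The reason an imprecise analog solver can still deliver high-precision answers is iterative refinement: the noisy solver does the heavy matrix work, and a cheap digital residual step cleans up each pass. A toy model of that scheme (the injected 1% noise stands in for analog imprecision; this is the textbook algorithm, not the paper's exact circuit):

```python
import numpy as np

rng = np.random.default_rng(2)

def noisy_solve(A, b, noise=1e-2):
    """Stand-in for an analog matrix solve: right answer + 1% relative noise."""
    x = np.linalg.solve(A, b)
    return x * (1 + noise * rng.standard_normal(x.shape))

def refine(A, b, iters=30):
    """Iterative refinement: each pass removes most of the remaining error."""
    x = np.zeros_like(b)
    for _ in range(iters):
        r = b - A @ x              # residual (cheap digital step)
        x = x + noisy_solve(A, r)  # correction from the imprecise solver
    return x

n = 64
A = rng.standard_normal((n, n)) + n * np.eye(n)  # well-conditioned test matrix
b = rng.standard_normal(n)
x = refine(A, b)
print(np.max(np.abs(A @ x - b)))   # far below the 1% solver noise
```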

================================================================================
SECTION 7
Why China is positioned to dominate
===================================

The transcript notes:

Cheaper energy
Massive subsidies
Local chip manufacturing
Regulatory freedom
Complete independence from EUV machines

Because analog RRAM uses:

90 nm
130 nm
180 nm
200 mm wafers

All fully accessible in China.

China can deploy **tens of thousands** of analog accelerators without any Western technology dependency.

================================================================================
SECTION 8
The Next Generation: what comes after this
==========================================

Expect:

3D-stacked RRAM analog chips (like 3D NAND)
Monolithic photonic-analog hybrids (ChipIX)
RRAM + memristor neural networks (pure hardware NNs)
Custom second-order training accelerators
Embedded analog compute inside base stations
Portable LLM training rigs
Analog accelerators embedded inside CPUs