SHARE
TWEET

-x^2 logic

Zeda Apr 8th, 2016 (edited) 99 Never
Not a member of Pastebin yet? Sign Up, it unlocks many cool features!
  1. Three things to understand that really make this algorithm faster:
  2. If A and B are binary digits (0 or 1), then:
  3.     ~a=1-a, where ~ is the "not" function and 'a' is a binary digit.
  4.     ~x=-x-1, where x is a binary number, modulo some base power of 2 (in assembly, this is commonly 2^8, 2^16, 2^32, or 2^64)
  5.     a(1-a) is 0, always [it is equivalent to a&~a].
  6.     a(1-b)+b(1-a) = a+b-2ab is the same as a^b, where ^ is the xor function.
  7. So then suppose you have x is a number whose bits are abcdefgh... where 'a' is the least-significant bit.
  8. In base 10, a product might look at the 1s place, 10s place, 100s place, etc for the digits. In binary, we looks at 1s, 2s, 4s, etc. for the bits.
  9. So then the 1s place of ~x is (1-a), the second digit is (1-b), etc.
  10. Then the product of x*~x is:
  11.  
  12. 1s      a(1-a)
  13. 2s      a(1-b)+b(1-a)
  14. 4s      a(1-c)+b(1-b)+c(1-a)
  15. 8s      a(1-d)+b(1-c)+c(1-b)+d(1-a)
  16. 16s     a(1-e)+b(1-d)+c(1-c)+d(1-b)+e(1-a)
  17. 32s     a(1-f)+b(1-e)+c(1-d)+d(1-c)+e(1-b)+f(1-a)
  18. 64s     a(1-g)+b(1-f)+c(1-e)+d(1-d)+e(1-c)+f(1-b)+g(1-a)
  19. 128s    a(1-h)+b(1-g)+c(1-f)+d(1-e)+e(1-d)+f(1-c)+g(1-b)+h(1-a)
  20. ...
  21.  
  22. So applying the fact that a(1-a)=0 (this works the same for b(1-b), etc.):
  23. 1s      0
  24. 2s      a(1-b)+b(1-a)
  25. 4s      a(1-c)+c(1-a)
  26. 8s      a(1-d)+b(1-c)+c(1-b)+d(1-a)
  27. 16s     a(1-e)+b(1-d)+d(1-b)+e(1-a)
  28. 32s     a(1-f)+b(1-e)+c(1-d)+d(1-c)+e(1-b)+f(1-a)
  29. 64s     a(1-g)+b(1-f)+c(1-e)+e(1-c)+f(1-b)+g(1-a)
  30. 128s    a(1-h)+b(1-g)+c(1-f)+d(1-e)+e(1-d)+f(1-c)+g(1-b)+h(1-a)
  31. ...
  32.  
  33. And now applying the trick following that:
  34.  
  35. 1s      0
  36. 2s      a^b
  37. 4s      a^c
  38. 8s      a^d+b^c
  39. 16s     a^e+b^d
  40. 32s     a^f+b^e+c^d
  41. 64s     a^g+b^f+c^e
  42. 128s    a^h+b^g+c^f+d^e
  43. ...
  44.  
  45. Oh hey, now *that* looks pretty easy! In fact, we've already filled out what the first 8 bits look like!
  46.  
  47. Since this is x*~x, this is x*(-x-1)=-x^2-x. So if we add the original x and negate, we get the first 8 bits of x^2.
  48.  
  49.  
  50.  
  51.  
  52. If x is an 8-bit int and you want the full 16-bit result of x*~x
  53. 1s      0
  54. 2s      a^b
  55. 4s      a^c
  56. 8s      a^d+b^c
  57. 16s     a^e+b^d
  58. 32s     a^f+b^e+c^d
  59. 64s     a^g+b^f+c^e
  60. 128s    a^h+b^g+c^f+d^e
  61. 256s       +b^h+c^g+d^f
  62. 512s           +c^h+d^g+e^f
  63. 1024s              +d^h+e^g
  64. 2048s                  +e^h+f^g
  65. 4096s                      +f^h
  66. 8192s                          +g^h
  67. 16384s  0
  68. 32768s  0
  69.  
  70. Now if this is being applied to a circuit, you can reduce the number of additions:
  71. 1s      0
  72. 2s      a^b
  73. 4s      a^c
  74. 8s      a^d+b^c
  75. 16s     a^e+b^d
  76. 32s     a^f+b^e+c^d
  77. 64s     a^g+b^f+c^e
  78. 128s    a^h+b^g+c^f+d^e
  79. 256s    b^h+c^g+d^f
  80. 512s    c^h+d^g+e^f
  81. 1024s   d^h+e^g
  82. 2048s   e^h+f^g
  83. 4096s   f^h
  84. 8192s   g^h
  85. 16384s  0
  86. 32768s  0
  87.  
  88. a^b+b
  89. a+2b-2ab
  90.  
  91. And hey, why not add in the original x in there to make it compute -x^2 :
  92. We'll use the identity that a+b-2ab=a^b
  93. 1s      0               +a  =a
  94. 2s      a^b             +b  =a+2b-2ab
  95. 4s      a^c             +c  =a+2c-2ac
  96. 8s      a^d+b^c         +d  =a+2d-2ad+b^c
  97. 16s     a^e+b^d         +e  =a+2e-2ae+b^d
  98. 32s     a^f+b^e+c^d     +f  =a+2f-2af+b^e+c^d
  99. 64s     a^g+b^f+c^e     +g  =a+2g-2ag+b^f+c^e
  100. 128s    a^h+b^g+c^f+d^e +h  =a+2h-2ah+b^g+c^f+d^e
  101. 256s    b^h+c^g+d^f
  102. 512s    c^h+d^g+e^f
  103. 1024s   d^h+e^g
  104. 2048s   e^h+f^g
  105. 4096s   f^h
  106. 8192s   g^h
  107.  
  108. Anything multiplied by 2 can get shifted to the next digit position:
  109. 1s      a
  110. 2s      a
  111. 4s      a             +b-ab
  112. 8s      a+b^c         +c-ac
  113. 16s     a+b^d         +d-ad
  114. 32s     a+b^e+c^d     +e-ae
  115. 64s     a+b^f+c^e     +f-af
  116. 128s    a+b^g+c^f+d^e +g-ag
  117. 256s    b^h+c^g+d^f   +h-ah
  118. 512s    c^h+d^g+e^f
  119. 1024s   d^h+e^g
  120. 2048s   e^h+f^g
  121. 4096s   f^h
  122. 8192s   g^h
  123.  
  124. Now we'll use a+b-ab=a|b (| is logical OR):
  125. 1s      a
  126. 2s      a
  127. 4s      a|b
  128. 8s      a|c+b^c
  129. 16s     a|d+b^d
  130. 32s     a|e+b^e+c^d
  131. 64s     a|f+c^e+b^f
  132. 128s    a|g+d^e+b^g+c^f
  133. 256s    c^g+d^f-a  +b
  134. 512s    d^g+f^e-bh +c
  135. 1024s       g^e-ch +d
  136. 2048s   f^g+e  -dh
  137. 4096s       f  -eh
  138. 8192s       g  -fh
  139. 16384s  h~g
RAW Paste Data
Top