Advertisement
Guest User

Untitled

a guest
Sep 30th, 2016
75
0
Never
Not a member of Pastebin yet? Sign Up, it unlocks many cool features!
text 6.55 KB | None | 0 0
  1. {
  2. "cells": [
  3. {
  4. "cell_type": "markdown",
  5. "metadata": {},
  6. "source": [
  7. "# Lecture 13: Floating point arithmetic\n",
  8. "\n",
  9. "In order to complete computations in finite space and bounded time, we replace the real numbers with a surrogate finite set $\\mathbb{F}$, the floating point numbers. (The \"floating point\" term originally differentiated it from \"fixed point\", which was an early alternative system based on absolute errors rather than relative errors.) Most scientific computing now conforms to the IEEE 754 double precision standard. We won't need to think about the details of this standard going forward, but it is useful to briefly explore what is really going on in the computer. \n",
  10. "\n",
  11. "In double precision there are 64 binary bits used to represent the members of $\\mathbb{F}$. We can get them directly."
  12. ]
  13. },
  14. {
  15. "cell_type": "code",
  16. "execution_count": 1,
  17. "metadata": {
  18. "collapsed": false
  19. },
  20. "outputs": [
  21. {
  22. "data": {
  23. "text/plain": [
  24. "\"0011111111110000000000000000000000000000000000000000000000000000\""
  25. ]
  26. },
  27. "execution_count": 1,
  28. "metadata": {},
  29. "output_type": "execute_result"
  30. }
  31. ],
  32. "source": [
  33. "one = bits(1.0)"
  34. ]
  35. },
  36. {
  37. "cell_type": "markdown",
  38. "metadata": {},
  39. "source": [
  40. "These bits define three integers $s$, $e$, and $m$ used in the representation\n",
  41. "\n",
  42. "$$ x = (-1)^s \\cdot \\left( 1 + 2^{-52}m \\right) \\cdot 2^e.$$\n",
  43. "\n",
  44. "Here $s\\in\\{0,1\\}$ requires one bit, $e\\in\\{-1022,\\ldots,1023\\}$ requires 11 bits, and $m\\in\\{0,1,\\ldots,2^{52}-1\\}$ requires 52 bits. We can dissect a double precision number to see these parts."
  45. ]
  46. },
  47. {
  48. "cell_type": "code",
  49. "execution_count": 2,
  50. "metadata": {
  51. "collapsed": false
  52. },
  53. "outputs": [
  54. {
  55. "data": {
  56. "text/plain": [
  57. "((\"0\",0),(\"01111111111\",0),(\"0000000000000000000000000000000000000000000000000000\",0))"
  58. ]
  59. },
  60. "execution_count": 2,
  61. "metadata": {},
  62. "output_type": "execute_result"
  63. }
  64. ],
  65. "source": [
  66. "function ieee(x::Float64)\n",
  67. " b = bits(x);\n",
  68. " s = (b[1:1],parse(Int,b[1:1]));\n",
  69. " e = (b[2:12],parse(Int,b[2:12],2)-1023);\n",
  70. " f = (b[13:64],parse(Int,b[13:64],2));\n",
  71. " return s,e,f\n",
  72. "end\n",
  73. "\n",
  74. "ieee(1.0)"
  75. ]
  76. },
  77. {
  78. "cell_type": "markdown",
  79. "metadata": {},
  80. "source": [
  81. "The next-greater element of $\\mathbb{F}$ is $1+\\epsilon_M$, for machine epsilon $\\epsilon_M=2^{-52}$. "
  82. ]
  83. },
  84. {
  85. "cell_type": "code",
  86. "execution_count": 3,
  87. "metadata": {
  88. "collapsed": false
  89. },
  90. "outputs": [
  91. {
  92. "name": "stdout",
  93. "output_type": "stream",
  94. "text": [
  95. "eps() = 2.220446049250313e-16\n"
  96. ]
  97. },
  98. {
  99. "data": {
  100. "text/plain": [
  101. "((\"0\",0),(\"01111111111\",0),(\"0000000000000000000000000000000000000000000000000001\",1))"
  102. ]
  103. },
  104. "execution_count": 3,
  105. "metadata": {},
  106. "output_type": "execute_result"
  107. }
  108. ],
  109. "source": [
  110. "@show eps()\n",
  111. "ieee(1.0+eps())"
  112. ]
  113. },
  114. {
  115. "cell_type": "markdown",
  116. "metadata": {},
  117. "source": [
  118. "There are $2^{52}$ elements of $\\mathbb{F}$ equally spaced throughout $[1,2)$. After these, the exponent increases to 1 and the value of $m$ resets to zero. Thus there are also $2^{52}$ elements equally spaced throughout $[2,4)$, as well as $[4,8)$, $[1/2,1)$, and in general, $[2^{e},2^{e+1})$. Let $\\text{fl}(x)$ be the element of $\\mathbb{F}$ nearest to any real number $x$. Consequently, if we momentarily ignore the bounds on the exponent $e$,\n",
  119. "\n",
  120. "$$ \\frac{\\bigl|x-\\text{fl}(x)\\bigr|}{\\bigl|x\\bigr|} \\le \\frac{1}{2} \\epsilon_M.$$\n",
  121. "\n",
  122. "In practice, the largest finite value in $\\mathbb{F}$ is"
  123. ]
  124. },
  125. {
  126. "cell_type": "code",
  127. "execution_count": 4,
  128. "metadata": {
  129. "collapsed": false
  130. },
  131. "outputs": [
  132. {
  133. "name": "stdout",
  134. "output_type": "stream",
  135. "text": [
  136. "R = 2.0 ^ 1023 * (1 + (2 ^ 52 - 1) / 2 ^ 52) = 1.7976931348623157e308\n"
  137. ]
  138. },
  139. {
  140. "data": {
  141. "text/plain": [
  142. "((\"0\",0),(\"11111111110\",1023),(\"1111111111111111111111111111111111111111111111111111\",4503599627370495))"
  143. ]
  144. },
  145. "execution_count": 4,
  146. "metadata": {},
  147. "output_type": "execute_result"
  148. }
  149. ],
  150. "source": [
  151. "@show R = (2.0^1023)*(1 + (2^52-1)/2^52);\n",
  152. "ieee(R)"
  153. ]
  154. },
  155. {
  156. "cell_type": "markdown",
  157. "metadata": {},
  158. "source": [
  159. "Results that should be larger than this become the special value Inf; this situation is called _overflow_."
  160. ]
  161. },
  162. {
  163. "cell_type": "code",
  164. "execution_count": 5,
  165. "metadata": {
  166. "collapsed": false
  167. },
  168. "outputs": [
  169. {
  170. "data": {
  171. "text/plain": [
  172. "Inf"
  173. ]
  174. },
  175. "execution_count": 5,
  176. "metadata": {},
  177. "output_type": "execute_result"
  178. }
  179. ],
  180. "source": [
  181. "nextfloat(R)"
  182. ]
  183. },
  184. {
  185. "cell_type": "code",
  186. "execution_count": 6,
  187. "metadata": {
  188. "collapsed": false
  189. },
  190. "outputs": [
  191. {
  192. "data": {
  193. "text/plain": [
  194. "2.2250738585072014e-308"
  195. ]
  196. },
  197. "execution_count": 6,
  198. "metadata": {},
  199. "output_type": "execute_result"
  200. }
  201. ],
  202. "source": [
  203. "2.0^(-1022)"
  204. ]
  205. },
  206. {
  207. "cell_type": "markdown",
  208. "metadata": {},
  209. "source": [
  210. "The analogous situation near zero is called _underflow_. Ignoring some special \"denormalized\" numbers, the smallest positive element of $\\mathbb{F}$ is "
  211. ]
  212. },
  213. {
  214. "cell_type": "code",
  215. "execution_count": 7,
  216. "metadata": {
  217. "collapsed": false
  218. },
  219. "outputs": [
  220. {
  221. "name": "stdout",
  222. "output_type": "stream",
  223. "text": [
  224. "r = 2.0 ^ -1022 = 2.2250738585072014e-308\n"
  225. ]
  226. },
  227. {
  228. "data": {
  229. "text/plain": [
  230. "((\"0\",0),(\"00000000001\",-1022),(\"0000000000000000000000000000000000000000000000000000\",0))"
  231. ]
  232. },
  233. "execution_count": 7,
  234. "metadata": {},
  235. "output_type": "execute_result"
  236. }
  237. ],
  238. "source": [
  239. "@show r = 2.0^-1022;\n",
  240. "ieee(r)"
  241. ]
  242. },
  243. {
  244. "cell_type": "markdown",
  245. "metadata": {},
  246. "source": [
  247. "Note that this minimum value is far smaller than $\\epsilon_M$, which is the number spacing relative to 1. "
  248. ]
  249. }
  250. ],
  251. "metadata": {
  252. "kernelspec": {
  253. "display_name": "Julia 0.5.0",
  254. "language": "julia",
  255. "name": "julia-0.5"
  256. },
  257. "language_info": {
  258. "file_extension": ".jl",
  259. "mimetype": "application/julia",
  260. "name": "julia",
  261. "version": "0.5.0"
  262. }
  263. },
  264. "nbformat": 4,
  265. "nbformat_minor": 1
  266. }
Advertisement
Add Comment
Please, Sign In to add comment
Advertisement