from __future__ import division, print_function
import tensorflow as tf

  4. """
  5. This program tries to test whether or not TensorFlow implements an inter-op thread pool on GPUs. In other words,
  6. it checks whether or not operations that don't depend on each other can actually run in parallel.
  7. To check this, it creates a TensorFlow graph that computes 1 + 1/2 + 1/4 + 1/8 + ...
  8. There are two variables `x` and `y`, and two operations that modify these variables:
  9. * `add` computes x <- x + y
  10. * `divide` computes y <- y / 2
  11. There is no explicit dependency between the `add` and `divide` operations, so if there is an inter-op thread
  12. pool, then TensorFlow will try to run them in parallel. If this is the case, sometimes `add` will execute first,
  13. and sometimes `divide` will execute first.
  14. For each device, the code runs three experiments:
  15. 1) run 2000 iterations, and at each iteration manually evaluate `add`, then `divide`. This forces the execution
  16. order, so the end result should always be 2.0, regardless of the presence or absence of an inter-op thread pool.
  17. We do 20 runs of all this, so it should display 2.0 a total of 20 times.
  18. 2) run 2000 iterations, but this time evaluate both `add` and `divide` simultaneously: `sess.run([add, divide])`.
  19. If there is an inter-op thread pool, then the order of execution at each iteration may change. We may end up with
  20. the order add, divide, divide, add, divide add, add, divide, etc. or another order, depending on the CPU speed and
  21. load. So the result may change at each run.
  22. 3) do the same as 2), but evaluate `sess.run([divide, add])` instead.
  23.  
  24. Here are the results:
  25. * unsurprisingly, the first experiment prints 2.0 twenty times, both for the CPU and the GPU. It's a sanity check,
  26. and it works.
  27. * the second experiment prints 1.00049, 1.0, 1.00012, 1.0, ..., 1.5, 1.00403 for the CPU, but it display 2.0 twenty
  28. times for the GPU. This confirms that there is an inter-op thread pool on the CPU, but it seems to show that
  29. there is no inter-op thread pool on the GPU. I tried to run the program while the GPU was busing doing something
  30. else, but it did not change the result. This is not a hard proof, because it is conceivable that the operations
  31. are run in parallel, but the `add` operation always finishes first because it is shorter to compute than `divide`.
  32. But it seems very unlikely. So my conclusion is that TensorFlow has no inter-op thread pool for GPUs. This makes
  33. sense if most operations use heavily multithreaded implementations (e.g., cuDNN) that already use up all the GPU's
  34. threads: there would be no significant performance gain in running multiple operations in parallel, they would
  35. just compete against each other, not actually run in parallel.
  36. * the third experiment has the same results. This shows that it makes no difference whether you run
  37. `sess.run([add, divide])` or `sess.run([divide, add])`. The order is decided deterministically by TensorFlow, and
  38. it seems to ignore the order of the operations in the list of operations to evaluate.
  39. """

for device in ("/cpu:0", "/gpu:0"):
    print("-" * 80)
    print("Device:", device)
    graph = tf.Graph()
    with graph.as_default():
        with tf.device(device):
            x = tf.Variable(0.0)
            y = tf.Variable(1.0)
            add = tf.assign(x, x + y)
            divide = tf.assign(y, y / 2)
        init = tf.global_variables_initializer()

    print("Experiment #1: manual sequential execution")
    for execution in range(20):
        with tf.Session(graph=graph) as sess:
            init.run()
            for i in range(2000):
                sess.run(add)
                sess.run(divide)
            print(x.eval())

    print("Experiment #2: possible parallel execution")
    for execution in range(20):
        with tf.Session(graph=graph) as sess:
            init.run()
            for i in range(2000):
                sess.run([add, divide])
            print(x.eval())

    print("Experiment #3: possible parallel execution, reversed")
    for execution in range(20):
        with tf.Session(graph=graph) as sess:
            init.run()
            for i in range(2000):
                sess.run([divide, add])
            print(x.eval())
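
# --- Follow-up sketch (an assumption, not something verified above) ---------
# If the varying CPU results in experiment #2 really come from the inter-op
# thread pool, then pinning that pool to a single thread should presumably
# make the CPU runs deterministic as well (the same value at every run,
# whichever order TensorFlow's scheduler happens to pick). The helper name
# `_single_threaded_cpu_run` is ours and is never called automatically;
# tf.ConfigProto's inter_op_parallelism_threads / intra_op_parallelism_threads
# fields control the sizes of TensorFlow's CPU thread pools.
def _single_threaded_cpu_run():
    graph = tf.Graph()
    with graph.as_default():
        with tf.device("/cpu:0"):
            x = tf.Variable(0.0)
            y = tf.Variable(1.0)
            add = tf.assign(x, x + y)
            divide = tf.assign(y, y / 2)
        init = tf.global_variables_initializer()
    config = tf.ConfigProto(inter_op_parallelism_threads=1,
                            intra_op_parallelism_threads=1)
    with tf.Session(graph=graph, config=config) as sess:
        init.run()
        for i in range(2000):
            sess.run([add, divide])
        return x.eval()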