[WIP] rcu: Throttle rcu_try_advance_all_cbs() execution

a guest
Sep 2nd, 2013
86
0
Never
Not a member of Pastebin yet? Sign Up, it unlocks many cool features!
text 27.59 KB | None | 0 0
On Sun, Aug 25, 2013 at 09:50:21PM +0200, Tibor Billes wrote:
> From: Paul E. McKenney Sent: 08/24/13 11:03 PM
> > On Sat, Aug 24, 2013 at 09:59:45PM +0200, Tibor Billes wrote:
> > > From: Paul E. McKenney Sent: 08/22/13 12:09 AM
> > > > On Wed, Aug 21, 2013 at 11:05:51PM +0200, Tibor Billes wrote:
> > > > > > From: Paul E. McKenney Sent: 08/21/13 09:12 PM
> > > > > > On Wed, Aug 21, 2013 at 08:14:46PM +0200, Tibor Billes wrote:
> > > > > > > > From: Paul E. McKenney Sent: 08/20/13 11:43 PM
> > > > > > > > On Tue, Aug 20, 2013 at 10:52:26PM +0200, Tibor Billes wrote:
> > > > > > > > > > From: Paul E. McKenney Sent: 08/20/13 04:53 PM
> > > > > > > > > > On Tue, Aug 20, 2013 at 08:01:28AM +0200, Tibor Billes wrote:
> > > > > > > > > > > Hi,
> > > > > > > > > > >
> > > > > > > > > > > I was using the 3.9.7 stable release and tried to upgrade to the 3.10.x series.
> > > > > > > > > > > The 3.10.x series was showing unusually high (>75%) system CPU usage in some
> > > > > > > > > > > situations, making things really slow. The latest stable I tried is 3.10.7.
> > > > > > > > > > > I also tried 3.11-rc5; both show this behaviour. This behaviour doesn't
> > > > > > > > > > > show up when the system is idling, only when doing some CPU-intensive work,
> > > > > > > > > > > like compiling with multiple threads. Compiling with only one thread seems not
> > > > > > > > > > > to trigger this behaviour.
> > > > > > > > > > >
> > > > > > > > > > > To be more precise, I did a `perf record -a` while compiling a large C++ program
> > > > > > > > > > > with scons using 4 threads; the result is appended at the end of this email.
> > > > > > > > > >
> > > > > > > > > > New one on me! You are running a mainstream system (x86_64), so I am
> > > > > > > > > > surprised no one else noticed.
> > > > > > > > > >
> > > > > > > > > > Could you please send along your .config file?
> > > > > > > > >
> > > > > > > > > Here it is
> > > > > > > >
> > > > > > > > Interesting. I don't see RCU stuff all that high on the list, but
> > > > > > > > the items I do see lead me to suspect RCU_FAST_NO_HZ, which has some
> > > > > > > > relevance to the otherwise inexplicable group of commits you located
> > > > > > > > with your bisection. Could you please rerun with CONFIG_RCU_FAST_NO_HZ=n?
> > > > > > > >
> > > > > > > > If that helps, there are some things I could try.
> > > > > > >
> > > > > > > It did help. I didn't notice anything unusual when running with CONFIG_RCU_FAST_NO_HZ=n.
> > > > > >
> > > > > > Interesting. Thank you for trying this -- and we at least have a
> > > > > > short-term workaround for this problem. I will put a patch together
> > > > > > for further investigation.
> > > > >
> > > > > I don't specifically need this config option so I'm fine without it in
> > > > > the long term, but I guess it's not supposed to behave like that.
> > > >
> > > > OK, good, we have a long-term workaround for your specific case,
> > > > even better. ;-)
> > > >
> > > > But yes, there are situations where RCU_FAST_NO_HZ needs to work
> > > > a bit better. I hope you will bear with me with a bit more
> > > > testing...
> > > >
> > > > > > In the meantime, could you please tell me how you were measuring
> > > > > > performance for your kernel builds? Wall-clock time required to complete
> > > > > > one build? Number of builds completed per unit time? Something else?
> > > > >
> > > > > Actually, I wasn't being that sophisticated. I have a system monitor
> > > > > applet on my top panel (using MATE, Linux Mint), four little graphs,
> > > > > one of which shows CPU usage. Different colors indicate different kinds
> > > > > of CPU usage. Blue shows user space usage, red shows system usage, and
> > > > > two more for nice and iowait. During a normal compile it's almost
> > > > > completely filled with blue user space CPU usage, only the top few
> > > > > pixels show some iowait and system usage. With CONFIG_RCU_FAST_NO_HZ
> > > > > set, about 3/4 of the graph was red system CPU usage, the rest was
> > > > > blue. So I just looked for a pile of red on my graphs when I tested
> > > > > different kernel builds. But compile speed was also so horrible that I
> > > > > couldn't wait for the build to finish. Even the UI got unresponsive.
> > > >
> > > > We have been having problems with CPU accounting, but this one looks
> > > > quite real.
> > > >
> > > > > Now I did some measuring. In the normal case a compile finished in 36
> > > > > seconds and compiled 315 object files. Here are some output lines from
> > > > > dstat -tasm --vm during the compile:
> > > > > ----system---- ----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system-- ----swap--- ------memory-usage----- -----virtual-memory----
> > > > >     time     |usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw | used  free| used  buff  cach  free|majpf minpf alloc  free
> > > > > 21-08 21:48:05| 91   8   2   0   0   0|   0  5852k|   0     0 |   0     0 |1413  1772 |   0  7934M| 581M 58.0M  602M 6553M|   0    71k   46k   54k
> > > > > 21-08 21:48:06| 93   6   1   0   0   0|   0  2064k| 137B  131B|   0     0 |1356  1650 |   0  7934M| 649M 58.0M  604M 6483M|   0    72k   47k   28k
> > > > > 21-08 21:48:07| 86  11   4   0   0   0|   0  5872k|   0     0 |   0     0 |2000  2991 |   0  7934M| 577M 58.0M  627M 6531M|   0    99k   67k   79k
> > > > > 21-08 21:48:08| 87   9   3   0   0   0|   0  2840k|   0     0 |   0     0 |2558  4164 |   0  7934M| 597M 58.0M  632M 6507M|   0    96k   57k   51k
> > > > > 21-08 21:48:09| 93   7   1   0   0   0|   0  3032k|   0     0 |   0     0 |1329  1512 |   0  7934M| 641M 58.0M  626M 6469M|   0    61k   48k   39k
> > > > > 21-08 21:48:10| 93   6   0   0   0   0|   0  4984k|   0     0 |   0     0 |1160  1146 |   0  7934M| 572M 58.0M  628M 6536M|   0    50k   40k   57k
> > > > > 21-08 21:48:11| 86   9   6   0   0   0|   0  2520k|   0     0 |   0     0 |2947  4760 |   0  7934M| 605M 58.0M  631M 6500M|   0   103k   55k   45k
> > > > > 21-08 21:48:12| 90   8   2   0   0   0|   0  2840k|   0     0 |   0     0 |2674  4179 |   0  7934M| 671M 58.0M  635M 6431M|   0    84k   59k   42k
> > > > > 21-08 21:48:13| 90   9   1   0   0   0|   0  4656k|   0     0 |   0     0 |1223  1410 |   0  7934M| 643M 58.0M  638M 6455M|   0    90k   62k   68k
> > > > > 21-08 21:48:14| 91   8   1   0   0   0|   0  3572k|   0     0 |   0     0 |1432  1828 |   0  7934M| 647M 58.0M  641M 6447M|   0    81k   59k   57k
> > > > > 21-08 21:48:15| 91   8   1   0   0   0|   0  5116k| 116B    0 |   0     0 |1194  1295 |   0  7934M| 605M 58.0M  644M 6487M|   0    69k   54k   64k
> > > > > 21-08 21:48:16| 87  10   3   0   0   0|   0  5140k|   0     0 |   0     0 |1761  2586 |   0  7934M| 584M 58.0M  650M 6502M|   0   105k   64k   68k
> > > > >
> > > > > The abnormal case compiled only 182 object files in six and a half minutes,
> > > > > then I stopped it. The same dstat output for this case:
> > > > > ----system---- ----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system-- ----swap--- ------memory-usage----- -----virtual-memory----
> > > > >     time     |usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw | used  free| used  buff  cach  free|majpf minpf alloc  free
> > > > > 21-08 22:10:49| 27  62   0   0  11   0|   0     0 | 210B    0 |   0     0 |1414  3137k|   0  7934M| 531M 57.6M  595M 6611M|   0  1628  1250   322
> > > > > 21-08 22:10:50| 25  60   4   0  11   0|   0    88k| 126B    0 |   0     0 |1337  3110k|   0  7934M| 531M 57.6M  595M 6611M|   0    91   128   115
> > > > > 21-08 22:10:51| 26  63   0   0  11   0|   0   184k| 294B    0 |   0     0 |1411  3147k|   0  7934M| 531M 57.6M  595M 6611M|   0  1485   814   815
> > > > > 21-08 22:10:52| 26  63   0   0  11   0|   0     0 | 437B  239B|   0     0 |1355  3160k|   0  7934M| 531M 57.6M  595M 6611M|   0    24    94    97
> > > > > 21-08 22:10:53| 26  63   0   0  11   0|   0     0 | 168B    0 |   0     0 |1397  3155k|   0  7934M| 531M 57.6M  595M 6611M|   0   479   285   273
> > > > > 21-08 22:10:54| 26  63   0   0  11   0|   0  4096B| 396B  324B|   0     0 |1346  3154k|   0  7934M| 531M 57.6M  595M 6611M|   0    27   145   145
> > > > > 21-08 22:10:55| 26  63   0   0  11   0|   0    60k|   0     0 |   0     0 |1353  3148k|   0  7934M| 531M 57.6M  595M 6610M|   0    93   117    36
> > > > > 21-08 22:10:56| 26  63   0   0  11   0|   0     0 |   0     0 |   0     0 |1341  3172k|   0  7934M| 531M 57.6M  595M 6610M|   0   158    87    74
> > > > > 21-08 22:10:57| 26  62   1   0  11   0|   0     0 |  42B   60B|   0     0 |1332  3162k|   0  7934M| 531M 57.6M  595M 6610M|   0    56    82    78
> > > > > 21-08 22:10:58| 26  63   0   0  11   0|   0     0 |   0     0 |   0     0 |1334  3178k|   0  7934M| 531M 57.6M  595M 6610M|   0    26    56    56
> > > > > 21-08 22:10:59| 26  63   0   0  11   0|   0     0 |   0     0 |   0     0 |1336  3179k|   0  7934M| 531M 57.6M  595M 6610M|   0     3    33    32
> > > > > 21-08 22:11:00| 26  63   0   0  11   0|   0    24k|  90B  108B|   0     0 |1347  3172k|   0  7934M| 531M 57.6M  595M 6610M|   0    41    73    71
> > > > >
> > > > > I have four logical cores so 25% makes up one core. I don't know if the ~26% user CPU usage has anything to do with this fact or is just a coincidence. The rest is ~63% system and ~11% hardware interrupt. Do these support what you suspect?
> > > >
> > > > The massive increase in context switches does come as a bit of a surprise!
> > > > It does rule out my initial suspicion of lock contention, but then again
> > > > the fact that you have only four CPUs made that pretty unlikely to begin
> > > > with.
> > > >
> > > > 2.4k average context switches in the good case for the full run vs. 3,156k
> > > > for about half of a run in the bad case. That is an increase of more
> > > > than three orders of magnitude!
> > > >
> > > > Yow!!!
> > > >
> > > > Page faults are actually -higher- in the good case. You have about 6.5GB
> > > > free in both cases, so you are not running out of memory. Lots more disk
> > > > writes in the good case, perhaps consistent with its getting more done.
> > > > Networking is negligible in both cases.
> > > >
> > > > Lots of hardware interrupts in the bad case as well. Would you be willing
> > > > to take a look at /proc/interrupts before and after to see which one you
> > > > are getting hit with? (Or whatever interrupt tracking tool you prefer.)
> > >
> > > Here are the results.
> > >
> > > Good case before:
> > >           CPU0       CPU1       CPU2       CPU3
> > >  0:         17          0          0          0   IO-APIC-edge      timer
> > >  1:        356          1         68          4   IO-APIC-edge      i8042
> > >  8:          0          0          1          0   IO-APIC-edge      rtc0
> > >  9:        330         14        449         71   IO-APIC-fasteoi   acpi
> > > 12:         10        108        269       2696   IO-APIC-edge      i8042
> > > 16:         36         10        111          2   IO-APIC-fasteoi   ehci_hcd:usb1
> > > 17:         20          3         25          4   IO-APIC-fasteoi   mmc0
> > > 21:          3          0         34          0   IO-APIC-fasteoi   ehci_hcd:usb2
> > > 40:          0          1         12         11   PCI-MSI-edge      mei_me
> > > 41:      10617        173       9959        292   PCI-MSI-edge      ahci
> > > 42:        862         11        186         26   PCI-MSI-edge      xhci_hcd
> > > 43:        107         77         27        102   PCI-MSI-edge      i915
> > > 44:       5322         20        434         22   PCI-MSI-edge      iwlwifi
> > > 45:        180          0        183         86   PCI-MSI-edge      snd_hda_intel
> > > 46:          0          3          0          0   PCI-MSI-edge      eth0
> > > NMI:          1          0          0          0   Non-maskable interrupts
> > > LOC:      16312      15177      10840       8995   Local timer interrupts
> > > SPU:          0          0          0          0   Spurious interrupts
> > > PMI:          1          0          0          0   Performance monitoring interrupts
> > > IWI:       1160        523       1031        481   IRQ work interrupts
> > > RTR:          3          0          0          0   APIC ICR read retries
> > > RES:      14976      16135       9973      10784   Rescheduling interrupts
> > > CAL:        482        457        151        370   Function call interrupts
> > > TLB:         70        106        352        230   TLB shootdowns
> > > TRM:          0          0          0          0   Thermal event interrupts
> > > THR:          0          0          0          0   Threshold APIC interrupts
> > > MCE:          0          0          0          0   Machine check exceptions
> > > MCP:          2          2          2          2   Machine check polls
> > > ERR:          0
> > > MIS:          0
> > >
> > > Good case after:
> > >           CPU0       CPU1       CPU2       CPU3
> > >  0:         17          0          0          0   IO-APIC-edge      timer
> > >  1:        367          1         81          4   IO-APIC-edge      i8042
> > >  8:          0          0          1          0   IO-APIC-edge      rtc0
> > >  9:        478         14        460         71   IO-APIC-fasteoi   acpi
> > > 12:         10        108        269       2696   IO-APIC-edge      i8042
> > > 16:         36         10        111          2   IO-APIC-fasteoi   ehci_hcd:usb1
> > > 17:         20          3         25          4   IO-APIC-fasteoi   mmc0
> > > 21:          3          0         34          0   IO-APIC-fasteoi   ehci_hcd:usb2
> > > 40:          0          1         12         11   PCI-MSI-edge      mei_me
> > > 41:      16888        173       9959        292   PCI-MSI-edge      ahci
> > > 42:       1102         11        186         26   PCI-MSI-edge      xhci_hcd
> > > 43:        107        132         27        136   PCI-MSI-edge      i915
> > > 44:       6943         20        434         22   PCI-MSI-edge      iwlwifi
> > > 45:        180          0        183         86   PCI-MSI-edge      snd_hda_intel
> > > 46:          0          3          0          0   PCI-MSI-edge      eth0
> > > NMI:          4          3          3          3   Non-maskable interrupts
> > > LOC:      26845      24780      19025      17746   Local timer interrupts
> > > SPU:          0          0          0          0   Spurious interrupts
> > > PMI:          4          3          3          3   Performance monitoring interrupts
> > > IWI:       1637        751       1287        695   IRQ work interrupts
> > > RTR:          3          0          0          0   APIC ICR read retries
> > > RES:      26511      26673      18791      20194   Rescheduling interrupts
> > > CAL:        510        480        151        370   Function call interrupts
> > > TLB:        361        292        575        461   TLB shootdowns
> > > TRM:          0          0          0          0   Thermal event interrupts
> > > THR:          0          0          0          0   Threshold APIC interrupts
> > > MCE:          0          0          0          0   Machine check exceptions
> > > MCP:          2          2          2          2   Machine check polls
> > > ERR:          0
> > > MIS:          0
> > >
> > > Bad case before:
> > >           CPU0       CPU1       CPU2       CPU3
> > >  0:         17          0          0          0   IO-APIC-edge      timer
> > >  1:        172          3         78          3   IO-APIC-edge      i8042
> > >  8:          0          1          0          0   IO-APIC-edge      rtc0
> > >  9:       1200        148        395         81   IO-APIC-fasteoi   acpi
> > > 12:       1625          2        348         10   IO-APIC-edge      i8042
> > > 16:         26         23        115          0   IO-APIC-fasteoi   ehci_hcd:usb1
> > > 17:         16          3         12         21   IO-APIC-fasteoi   mmc0
> > > 21:          2          2         33          0   IO-APIC-fasteoi   ehci_hcd:usb2
> > > 40:          0          0         14         10   PCI-MSI-edge      mei_me
> > > 41:      15776        374       8497        687   PCI-MSI-edge      ahci
> > > 42:       1297        829        115         24   PCI-MSI-edge      xhci_hcd
> > > 43:        103        149          9        212   PCI-MSI-edge      i915
> > > 44:      13151        101        511         91   PCI-MSI-edge      iwlwifi
> > > 45:        153        159          0        122   PCI-MSI-edge      snd_hda_intel
> > > 46:          0          1          1          0   PCI-MSI-edge      eth0
> > > NMI:         32         31         31         31   Non-maskable interrupts
> > > LOC:      82504      82732      74172      75985   Local timer interrupts
> > > SPU:          0          0          0          0   Spurious interrupts
> > > PMI:         32         31         31         31   Performance monitoring interrupts
> > > IWI:      17816      16278      13833      13282   IRQ work interrupts
> > > RTR:          3          0          0          0   APIC ICR read retries
> > > RES:      18784      21084      13313      12946   Rescheduling interrupts
> > > CAL:        393        422        306        356   Function call interrupts
> > > TLB:        231        176        235        191   TLB shootdowns
> > > TRM:          0          0          0          0   Thermal event interrupts
> > > THR:          0          0          0          0   Threshold APIC interrupts
> > > MCE:          0          0          0          0   Machine check exceptions
> > > MCP:          3          3          3          3   Machine check polls
> > > ERR:          0
> > > MIS:          0
> > >
> > > Bad case after:
> > >           CPU0       CPU1       CPU2       CPU3
> > >  0:         17          0          0          0   IO-APIC-edge      timer
> > >  1:        415          3         85          3   IO-APIC-edge      i8042
> > >  8:          0          1          0          0   IO-APIC-edge      rtc0
> > >  9:       1277        148        428         81   IO-APIC-fasteoi   acpi
> > > 12:       1625          2        348         10   IO-APIC-edge      i8042
> > > 16:         26         23        115          0   IO-APIC-fasteoi   ehci_hcd:usb1
> > > 17:         16          3         12         21   IO-APIC-fasteoi   mmc0
> > > 21:          2          2         33          0   IO-APIC-fasteoi   ehci_hcd:usb2
> > > 40:          0          0         14         10   PCI-MSI-edge      mei_me
> > > 41:      17814        374       8497        687   PCI-MSI-edge      ahci
> > > 42:       1567        829        115         24   PCI-MSI-edge      xhci_hcd
> > > 43:        103        177          9        242   PCI-MSI-edge      i915
> > > 44:      14956        101        511         91   PCI-MSI-edge      iwlwifi
> > > 45:        153        159          0        122   PCI-MSI-edge      snd_hda_intel
> > > 46:          0          1          1          0   PCI-MSI-edge      eth0
> > > NMI:         36         35         34         34   Non-maskable interrupts
> > > LOC:      92429      92708      81714      84071   Local timer interrupts
> > > SPU:          0          0          0          0   Spurious interrupts
> > > PMI:         36         35         34         34   Performance monitoring interrupts
> > > IWI:      22594      19658      17439      14257   IRQ work interrupts
> > > RTR:          3          0          0          0   APIC ICR read retries
> > > RES:      21491      24670      14618      14569   Rescheduling interrupts
> > > CAL:        441        439        306        356   Function call interrupts
> > > TLB:        232        181        274        465   TLB shootdowns
> > > TRM:          0          0          0          0   Thermal event interrupts
> > > THR:          0          0          0          0   Threshold APIC interrupts
> > > MCE:          0          0          0          0   Machine check exceptions
> > > MCP:          3          3          3          3   Machine check polls
> > > ERR:          0
> > > MIS:          0
> >
> > Lots more local timer interrupts, which is consistent with the higher
> > time in interrupt handlers for the bad case.
> >
> > > > One hypothesis is that your workload and configuration are interacting
> > > > with RCU_FAST_NO_HZ to force very large numbers of RCU grace periods.
> > > > Could you please check for this by building with CONFIG_RCU_TRACE=y,
> > > > mounting debugfs somewhere and dumping rcu/rcu_sched/rcugp before and
> > > > after each run?
> > >
> > > Good case before:
> > > completed=8756  gpnum=8757  age=0  max=21
> > > after:
> > > completed=14686  gpnum=14687  age=0  max=21
> > >
> > > Bad case before:
> > > completed=22970  gpnum=22971  age=0  max=21
> > > after:
> > > completed=26110  gpnum=26111  age=0  max=21
> >
> > In the good case, (14686-8756)/40=148.25 grace periods per second, which
> > is a fast but reasonable rate given your HZ=250. Not a large difference
> > in the number of grace periods, but extrapolating for the longer runtime,
> > maybe ten times as much. But not much change in grace-period rate per
> > unit time.
> >
> > > The test scenario was the following in both cases (mixed English and pseudo-bash):
> > > reboot, login, start terminal
> > > cd project
> > > rm -r build
> > > cat /proc/interrupts >> somefile ; cat /sys/kernel/debug/rcu/rcu_sched/rcugp >> somefile
> > > scons -j4
> > > wait ~40 sec (good case finished, Ctrl-C in bad case)
> > > cat /proc/interrupts >> somefile ; cat /sys/kernel/debug/rcu/rcu_sched/rcugp >> somefile
> > >
> > > I stopped the build in the bad case after about the same time the good
> > > case finished, so the extra interrupts and RCU grace periods due to the
> > > longer runtime don't skew the results.
> >
> > That procedure works for me, thank you for laying it out carefully.
> >
> > I believe I see what is going on and how to fix it, though it may take
> > me a bit to work things through and get a good patch.
> >
> > Thank you very much for your testing efforts!
>
> I'm glad I can help. I've been using Linux for many years, and now I have a
> chance to help the community, to do something in return. I'm quite
> enjoying this :)

;-)

Here is a patch that is more likely to help. I am testing it in parallel,
but figured I should send you a sneak preview.

Thanx, Paul

------------------------------------------------------------------------

rcu: Throttle rcu_try_advance_all_cbs() execution

The rcu_try_advance_all_cbs() function is invoked on each attempted
entry to and every exit from idle. If this function determines that
there are callbacks ready to invoke, the caller will invoke the RCU
core, which in turn will result in a pair of context switches. If a
CPU enters and exits idle extremely frequently, this can result in
an excessive number of context switches and high CPU overhead.

This commit therefore causes rcu_try_advance_all_cbs() to throttle
itself, refusing to do work more than once per jiffy.

Reported-by: Tibor Billes <tbilles@gmx.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/kernel/rcutree.h b/kernel/rcutree.h
index 5f97eab..52be957 100644
--- a/kernel/rcutree.h
+++ b/kernel/rcutree.h
@@ -104,6 +104,8 @@ struct rcu_dynticks {
                                     /* idle-period nonlazy_posted snapshot. */
         unsigned long last_accelerate;
                                     /* Last jiffy CBs were accelerated. */
+        unsigned long last_advance_all;
+                                    /* Last jiffy CBs were all advanced. */
         int tick_nohz_enabled_snap; /* Previously seen value from sysfs. */
 #endif /* #ifdef CONFIG_RCU_FAST_NO_HZ */
 };
diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index a538e73..2205751 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -1630,17 +1630,23 @@ module_param(rcu_idle_lazy_gp_delay, int, 0644);
 extern int tick_nohz_enabled;
 
 /*
- * Try to advance callbacks for all flavors of RCU on the current CPU.
- * Afterwards, if there are any callbacks ready for immediate invocation,
- * return true.
+ * Try to advance callbacks for all flavors of RCU on the current CPU, but
+ * only if it has been awhile since the last time we did so. Afterwards,
+ * if there are any callbacks ready for immediate invocation, return true.
  */
 static bool rcu_try_advance_all_cbs(void)
 {
         bool cbs_ready = false;
         struct rcu_data *rdp;
+        struct rcu_dynticks *rdtp = &__get_cpu_var(rcu_dynticks);
         struct rcu_node *rnp;
         struct rcu_state *rsp;
 
+        /* Exit early if we advanced recently. */
+        if (jiffies == rdtp->last_advance_all)
+                return 0;
+        rdtp->last_advance_all = jiffies;
+
         for_each_rcu_flavor(rsp) {
                 rdp = this_cpu_ptr(rsp->rda);
                 rnp = rdp->mynode;

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
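
For readers who want to see the throttling idea in isolation, here is a
minimal standalone userspace sketch of the once-per-jiffy gating pattern
the patch introduces. It is not kernel code: the jiffy counter is
simulated from CLOCK_MONOTONIC at an assumed HZ of 250, and every name in
it is illustrative rather than taken from the kernel source.

/*
 * Illustrative userspace analogue of the once-per-jiffy throttle
 * (assumed HZ=250; names are hypothetical, not kernel identifiers).
 */
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

#define HZ 250

static unsigned long last_advance_all;  /* last simulated jiffy we did the work */

/* Simulated jiffies: monotonic time quantized into 1/HZ ticks. */
static unsigned long fake_jiffies(void)
{
        struct timespec ts;

        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (unsigned long)ts.tv_sec * HZ + ts.tv_nsec / (1000000000L / HZ);
}

/* Do the (expensive) work at most once per simulated jiffy. */
static bool try_advance_all_cbs(void)
{
        unsigned long now = fake_jiffies();

        if (now == last_advance_all)
                return false;           /* already ran during this jiffy */
        last_advance_all = now;

        /* ... the expensive callback-advancing work would go here ... */
        return true;
}

int main(void)
{
        unsigned long did_work = 0, throttled = 0;

        /* Hammer the function, as a CPU rapidly entering/exiting idle would. */
        for (int i = 0; i < 1000000; i++) {
                if (try_advance_all_cbs())
                        did_work++;
                else
                        throttled++;
        }
        printf("did work %lu times, throttled %lu calls\n", did_work, throttled);
        return 0;
}

The patch above applies the same compare-and-update step to
rdtp->last_advance_all against the real jiffies counter, so each CPU does
the full callback-advance scan at most once per jiffy no matter how often
it enters and exits idle.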