ThreePMEperNode

a guest
Apr 30th, 2015
  1. Log file opened on Thu Apr 30 11:20:37 2015
  2. Host: nid07334 pid: 9515 rank ID: 0 number of ranks: 1024
  3. GROMACS: gmx mdrun, VERSION 5.0.2
  4.  
  5. GROMACS is written by:
  6. Emile Apol Rossen Apostolov Herman J.C. Berendsen Par Bjelkmar
  7. Aldert van Buuren Rudi van Drunen Anton Feenstra Sebastian Fritsch
  8. Gerrit Groenhof Christoph Junghans Peter Kasson Carsten Kutzner
  9. Per Larsson Justin A. Lemkul Magnus Lundborg Pieter Meulenhoff
  10. Erik Marklund Teemu Murtola Szilard Pall Sander Pronk
  11. Roland Schulz Alexey Shvetsov Michael Shirts Alfons Sijbers
  12. Peter Tieleman Christian Wennberg Maarten Wolf
  13. and the project leaders:
  14. Mark Abraham, Berk Hess, Erik Lindahl, and David van der Spoel
  15.  
  16. Copyright (c) 1991-2000, University of Groningen, The Netherlands.
  17. Copyright (c) 2001-2014, The GROMACS development team at
  18. Uppsala University, Stockholm University and
  19. the Royal Institute of Technology, Sweden.
  20. check out http://www.gromacs.org for more information.
  21.  
  22. GROMACS is free software; you can redistribute it and/or modify it
  23. under the terms of the GNU Lesser General Public License
  24. as published by the Free Software Foundation; either version 2.1
  25. of the License, or (at your option) any later version.
  26.  
  27. GROMACS: gmx mdrun, VERSION 5.0.2
  28. Executable: mdrun_mpi
  29. Library dir: /sw/xk6/gromacs/5.0.2/cle5.2_gnu4.8.2/share/gromacs/top
  30. Command line:
  31. mdrun_mpi -gpu_id 00000 -npme 384 -dlb yes -pin on -resethway -noconfout -v -s opt.tpr -deffnm test
  32.  
  33. Gromacs version: VERSION 5.0.2
  34. Precision: single
  35. Memory model: 64 bit
  36. MPI library: MPI
  37. OpenMP support: enabled
  38. GPU support: enabled
  39. invsqrt routine: gmx_software_invsqrt(x)
  40. SIMD instructions: AVX_128_FMA
  41. FFT library: commercial-fftw-3.3.4-fma-sse2-avx
  42. RDTSCP usage: disabled
  43. C++11 compilation: disabled
  44. TNG support: enabled
  45. Tracing support: disabled
  46. Built on: Thu Mar 12 18:27:12 EDT 2015
  47. Built by: ff1@titan-ext8 [CMAKE]
  48. Build OS/arch: Linux 3.0.101-0.46-default x86_64
  49. Build CPU vendor: AuthenticAMD
  50. Build CPU brand: AMD Opteron(tm) Processor 6140
  51. Build CPU family: 16 Model: 9 Stepping: 1
  52. Build CPU features: apic clfsh cmov cx8 cx16 htt lahf_lm misalignsse mmx msr nonstop_tsc pdpe1gb popcnt pse rdtscp sse2 sse3 sse4a
  53. C compiler: /opt/cray/craype/2.2.1/bin/cc GNU 4.8.2
  54. C compiler flags: -mavx -mfma4 -mxop -Wno-maybe-uninitialized -Wextra -Wno-missing-field-initializers -Wno-sign-compare -Wpointer-arith -Wall -Wno-unused -Wunused-value -Wunused-parameter -O3 -DNDEBUG -fomit-frame-pointer -funroll-all-loops -fexcess-precision=fast -Wno-array-bounds
  55. C++ compiler: /opt/cray/craype/2.2.1/bin/CC GNU 4.8.2
  56. C++ compiler flags: -mavx -mfma4 -mxop -Wextra -Wno-missing-field-initializers -Wpointer-arith -Wall -Wno-unused-function -O3 -DNDEBUG -fomit-frame-pointer -funroll-all-loops -fexcess-precision=fast -Wno-array-bounds
  57. Boost version: 1.55.0 (internal)
  58. CUDA compiler: /opt/nvidia/cudatoolkit/5.5.51-1.0502.9594.3.1/bin/nvcc nvcc: NVIDIA (R) Cuda compiler driver;Copyright (c) 2005-2013 NVIDIA Corporation;Built on Thu_Mar__6_02:21:19_PST_2014;Cuda compilation tools, release 5.5, V5.5.0
  59. CUDA compiler flags:-gencode;arch=compute_20,code=sm_20;-gencode;arch=compute_20,code=sm_21;-gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_35,code=compute_35;-use_fast_math;; ;-mavx;-mfma4;-mxop;-Wextra;-Wno-missing-field-initializers;-Wpointer-arith;-Wall;-Wno-unused-function;-O3;-DNDEBUG;-fomit-frame-pointer;-funroll-all-loops;-fexcess-precision=fast;-Wno-array-bounds;
  60. CUDA driver: 5.50
  61. CUDA runtime: 5.50
  62.  
  63.  
  64.  
  65. ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
  66. B. Hess and C. Kutzner and D. van der Spoel and E. Lindahl
  67. GROMACS 4: Algorithms for highly efficient, load-balanced, and scalable
  68. molecular simulation
  69. J. Chem. Theory Comput. 4 (2008) pp. 435-447
  70. -------- -------- --- Thank You --- -------- --------
  71.  
  72.  
  73. ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
  74. D. van der Spoel, E. Lindahl, B. Hess, G. Groenhof, A. E. Mark and H. J. C.
  75. Berendsen
  76. GROMACS: Fast, Flexible and Free
  77. J. Comp. Chem. 26 (2005) pp. 1701-1719
  78. -------- -------- --- Thank You --- -------- --------
  79.  
  80.  
  81. ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
  82. E. Lindahl and B. Hess and D. van der Spoel
  83. GROMACS 3.0: A package for molecular simulation and trajectory analysis
  84. J. Mol. Mod. 7 (2001) pp. 306-317
  85. -------- -------- --- Thank You --- -------- --------
  86.  
  87.  
  88. ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
  89. H. J. C. Berendsen, D. van der Spoel and R. van Drunen
  90. GROMACS: A message-passing parallel molecular dynamics implementation
  91. Comp. Phys. Comm. 91 (1995) pp. 43-56
  92. -------- -------- --- Thank You --- -------- --------
  93.  
  94.  
  95. Number of hardware threads detected (16) does not match the number reported by OpenMP (1).
  96. Consider setting the launch configuration manually!
  97. Changing nstlist from 20 to 40, rlist from 1.2 to 1.239
  98.  
  99. Input Parameters:
  100. integrator = md
  101. tinit = 0
  102. dt = 0.002
  103. nsteps = 10000
  104. init-step = 0
  105. simulation-part = 1
  106. comm-mode = Linear
  107. nstcomm = 100
  108. bd-fric = 0
  109. ld-seed = 60975668
  110. emtol = 10
  111. emstep = 0.01
  112. niter = 20
  113. fcstep = 0
  114. nstcgsteep = 1000
  115. nbfgscorr = 10
  116. rtpi = 0.05
  117. nstxout = 5000
  118. nstvout = 5000
  119. nstfout = 5000
  120. nstlog = 1000
  121. nstcalcenergy = 100
  122. nstenergy = 1000
  123. nstxout-compressed = 0
  124. compressed-x-precision = 1000
  125. cutoff-scheme = Verlet
  126. nstlist = 40
  127. ns-type = Grid
  128. pbc = xyz
  129. periodic-molecules = FALSE
  130. verlet-buffer-tolerance = 0.005
  131. rlist = 1.239
  132. rlistlong = 1.239
  133. nstcalclr = 20
  134. coulombtype = PME
  135. coulomb-modifier = Potential-shift
  136. rcoulomb-switch = 0
  137. rcoulomb = 1.2
  138. epsilon-r = 1
  139. epsilon-rf = inf
  140. vdw-type = Cut-off
  141. vdw-modifier = Force-switch
  142. rvdw-switch = 1
  143. rvdw = 1.2
  144. DispCorr = No
  145. table-extension = 1
  146. fourierspacing = 0.12
  147. fourier-nx = 144
  148. fourier-ny = 144
  149. fourier-nz = 64
  150. pme-order = 4
  151. ewald-rtol = 1e-05
  152. ewald-rtol-lj = 0.001
  153. lj-pme-comb-rule = Geometric
  154. ewald-geometry = 0
  155. epsilon-surface = 0
  156. implicit-solvent = No
  157. gb-algorithm = Still
  158. nstgbradii = 1
  159. rgbradii = 1
  160. gb-epsilon-solvent = 80
  161. gb-saltconc = 0
  162. gb-obc-alpha = 1
  163. gb-obc-beta = 0.8
  164. gb-obc-gamma = 4.85
  165. gb-dielectric-offset = 0.009
  166. sa-algorithm = Ace-approximation
  167. sa-surface-tension = 2.05016
  168. tcoupl = Nose-Hoover
  169. nsttcouple = 20
  170. nh-chain-length = 1
  171. print-nose-hoover-chain-variables = FALSE
  172. pcoupl = Parrinello-Rahman
  173. pcoupltype = Semiisotropic
  174. nstpcouple = 20
  175. tau-p = 5
  176. compressibility (3x3):
  177. compressibility[ 0]={ 4.50000e-05, 0.00000e+00, 0.00000e+00}
  178. compressibility[ 1]={ 0.00000e+00, 4.50000e-05, 0.00000e+00}
  179. compressibility[ 2]={ 0.00000e+00, 0.00000e+00, 4.50000e-05}
  180. ref-p (3x3):
  181. ref-p[ 0]={ 1.00000e+00, 0.00000e+00, 0.00000e+00}
  182. ref-p[ 1]={ 0.00000e+00, 1.00000e+00, 0.00000e+00}
  183. ref-p[ 2]={ 0.00000e+00, 0.00000e+00, 1.00000e+00}
  184. refcoord-scaling = COM
  185. posres-com (3):
  186. posres-com[0]= 0.00000e+00
  187. posres-com[1]= 0.00000e+00
  188. posres-com[2]= 0.00000e+00
  189. posres-comB (3):
  190. posres-comB[0]= 0.00000e+00
  191. posres-comB[1]= 0.00000e+00
  192. posres-comB[2]= 0.00000e+00
  193. QMMM = FALSE
  194. QMconstraints = 0
  195. QMMMscheme = 0
  196. MMChargeScaleFactor = 1
  197. qm-opts:
  198. ngQM = 0
  199. constraint-algorithm = Lincs
  200. continuation = TRUE
  201. Shake-SOR = FALSE
  202. shake-tol = 0.0001
  203. lincs-order = 4
  204. lincs-iter = 1
  205. lincs-warnangle = 30
  206. nwall = 0
  207. wall-type = 9-3
  208. wall-r-linpot = -1
  209. wall-atomtype[0] = -1
  210. wall-atomtype[1] = -1
  211. wall-density[0] = 0
  212. wall-density[1] = 0
  213. wall-ewald-zfac = 3
  214. pull = no
  215. rotation = FALSE
  216. interactiveMD = FALSE
  217. disre = No
  218. disre-weighting = Conservative
  219. disre-mixed = FALSE
  220. dr-fc = 1000
  221. dr-tau = 0
  222. nstdisreout = 100
  223. orire-fc = 0
  224. orire-tau = 0
  225. nstorireout = 100
  226. free-energy = no
  227. cos-acceleration = 0
  228. deform (3x3):
  229. deform[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
  230. deform[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
  231. deform[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
  232. simulated-tempering = FALSE
  233. E-x:
  234. n = 0
  235. E-xt:
  236. n = 0
  237. E-y:
  238. n = 0
  239. E-yt:
  240. n = 0
  241. E-z:
  242. n = 0
  243. E-zt:
  244. n = 0
  245. swapcoords = no
  246. adress = FALSE
  247. userint1 = 0
  248. userint2 = 0
  249. userint3 = 0
  250. userint4 = 0
  251. userreal1 = 0
  252. userreal2 = 0
  253. userreal3 = 0
  254. userreal4 = 0
  255. grpopts:
  256. nrdf: 261777 192987
  257. ref-t: 303.15 303.15
  258. tau-t: 1 1
  259. annealing: No No
  260. annealing-npoints: 0 0
  261. acc: 0 0 0
  262. nfreeze: N N N
  263. energygrp-flags[ 0]: 0
  264.  
  265. Initializing Domain Decomposition on 1024 ranks
  266. Dynamic load balancing: yes
  267. Will sort the charge groups at every domain (re)decomposition
  268. Initial maximum inter charge-group distances:
  269. two-body bonded interactions: 0.420 nm, LJ-14, atoms 42821 42830
  270. multi-body bonded interactions: 0.420 nm, Proper Dih., atoms 42821 42830
  271. Minimum cell size due to bonded interactions: 0.462 nm
  272. Maximum distance for 5 constraints, at 120 deg. angles, all-trans: 0.222 nm
  273. Estimated maximum distance required for P-LINCS: 0.222 nm
  274. Using 384 separate PME ranks, per user request
  275. Scaling the initial minimum size with 1/0.8 (option -dds) = 1.25
  276. Optimizing the DD grid for 640 cells with a minimum initial size of 0.578 nm
  277. The maximum allowed number of cells is: X 27 Y 27 Z 13
  278. Domain decomposition grid 16 x 8 x 5, separate PME ranks 384
  279. PME domain decomposition: 16 x 24 x 1
  280. Interleaving PP and PME ranks
  281. This rank does only particle-particle work.
  282.  
  283. Domain decomposition rank 0, coordinates 0 0 0
  284.  
  285. Using two step summing over 128 groups of on average 5.0 ranks
  286.  
  287. Using 1024 MPI processes
  288. Using 2 OpenMP threads per MPI process
  289.  
  290. Detecting CPU SIMD instructions.
  291. Present hardware specification:
  292. Vendor: AuthenticAMD
  293. Brand: AMD Opteron(TM) Processor 6274
  294. Family: 21 Model: 1 Stepping: 2
  295. Features: aes apic avx clfsh cmov cx8 cx16 fma4 htt lahf_lm misalignsse mmx msr nonstop_tsc pclmuldq pdpe1gb popcnt pse rdtscp sse2 sse3 sse4a sse4.1 sse4.2 ssse3 xop
  296. SIMD instructions most likely to fit this hardware: AVX_128_FMA
  297. SIMD instructions selected at GROMACS compile time: AVX_128_FMA
  298.  
  299.  
  300. The current CPU can measure timings more accurately than the code in
  301. mdrun_mpi was configured to use. This might affect your simulation
  302. speed as accurate timings are needed for load-balancing.
  303. Please consider rebuilding mdrun_mpi with the GMX_USE_RDTSCP=OFF CMake option.
  304.  
  305.  
  306. 1 GPU detected on host nid07334:
  307. #0: NVIDIA Tesla K20X, compute cap.: 3.5, ECC: yes, stat: compatible
  308.  
  309. 1 GPU user-selected for this run.
  310. Mapping of GPUs to the 5 PP ranks in this node: #0, #0, #0, #0, #0
  311.  
  312. NOTE: You assigned GPUs to multiple MPI processes.
  313. Will do PME sum in reciprocal space for electrostatic interactions.
  314.  
  315. ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
  316. U. Essmann, L. Perera, M. L. Berkowitz, T. Darden, H. Lee and L. G. Pedersen
  317. A smooth particle mesh Ewald method
  318. J. Chem. Phys. 103 (1995) pp. 8577-8592
  319. -------- -------- --- Thank You --- -------- --------
  320.  
  321. Will do ordinary reciprocal space Ewald sum.
  322. Using a Gaussian width (1/beta) of 0.384195 nm for Ewald
  323. Cut-off's: NS: 1.239 Coulomb: 1.2 LJ: 1.2
  324. System total charge: 0.000
  325. Generated table with 1119 data points for Ewald.
  326. Tabscale = 500 points/nm
  327. Generated table with 1119 data points for LJ6Shift.
  328. Tabscale = 500 points/nm
  329. Generated table with 1119 data points for LJ12Shift.
  330. Tabscale = 500 points/nm
  331. Generated table with 1119 data points for 1-4 COUL.
  332. Tabscale = 500 points/nm
  333. Generated table with 1119 data points for 1-4 LJ6.
  334. Tabscale = 500 points/nm
  335. Generated table with 1119 data points for 1-4 LJ12.
  336. Tabscale = 500 points/nm
  337.  
  338. Using CUDA 8x8 non-bonded kernels
  339.  
  340. Potential shift: LJ r^-12: -2.648e-01 r^-6: -5.349e-01, Ewald -1.000e-05
  341. Initialized non-bonded Ewald correction tables, spacing: 7.82e-04 size: 1536
  342.  
  343.  
  344. Overriding thread affinity set outside mdrun_mpi
  345.  
  346. Pinning threads with an auto-selected logical core stride of 1
  347.  
  348. Initializing Parallel LINear Constraint Solver
  349.  
  350. ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
  351. B. Hess
  352. P-LINCS: A Parallel Linear Constraint Solver for molecular simulation
  353. J. Chem. Theory Comput. 4 (2008) pp. 116-122
  354. -------- -------- --- Thank You --- -------- --------
  355.  
  356. The number of constraints is 67980
  357. There are inter charge-group constraints,
  358. will communicate selected coordinates each lincs iteration
  359.  
  360. ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
  361. S. Miyamoto and P. A. Kollman
  362. SETTLE: An Analytical Version of the SHAKE and RATTLE Algorithms for Rigid
  363. Water Models
  364. J. Comp. Chem. 13 (1992) pp. 952-962
  365. -------- -------- --- Thank You --- -------- --------
  366.  
  367.  
  368. Linking all bonded interactions to atoms
  369. There are 739685 inter charge-group exclusions,
  370. will use an extra communication step for exclusion forces for PME
  371.  
  372. The maximum number of communication pulses is: X 2 Y 2 Z 2
  373. The minimum size for domain decomposition cells is 0.711 nm
  374. The requested allowed shrink of DD cells (option -dds) is: 0.80
  375. The allowed shrink of domain decomposition cells is: X 0.71 Y 0.35 Z 0.47
  376. The maximum allowed distance for charge groups involved in interactions is:
  377. non-bonded interactions 1.239 nm
  378. two-body bonded interactions (-rdd) 1.239 nm
  379. multi-body bonded interactions (-rdd) 0.711 nm
  380. atoms separated by up to 5 constraints (-rcon) 0.711 nm
  381.  
  382.  
  383. Making 3D domain decomposition grid 16 x 8 x 5, home cell index 0 0 0
  384.  
  385. Center of mass motion removal mode is Linear
  386. We have the following groups for center of mass motion removal:
  387. 0: NPROT
  388. 1: SOL_ION
  389. There are: 206415 Atoms
  390. Charge group distribution at step 0: 309 340 305 349 309 303 338 315 332 299 313 318 307 343 319 301 338 324 340 318 312 346 289 335 303 313 347 320 353 324 299 345 319 342 303 333 342 316 342 303 304 343 311 346 323 304 344 311 339 300 305 318 329 343 296 308 340 304 357 326 307 333 323 313 316 316 330 336 340 310 319 324 307 337 315 299 359 294 329 304 300 333 330 330 318 287 336 309 320 313 309 325 327 349 315 312 337 310 343 305 310 333 346 352 316 325 353 322 345 308 298 334 325 353 317 316 342 321 360 314 329 341 330 349 307 314 324 322 327 327 307 330 319 333 321 303 341 326 319 295 333 341 311 353 312 310 339 302 338 301 281 345 322 321 315 329 337 315 340 313 291 340 293 350 316 326 363 317 345 318 305 338 301 344 300 319 327 302 325 290 314 345 273 327 309 308 335 305 325 303 326 344 280 330 302 311 338 321 361 309 327 342 303 324 334 302 328 322 319 303 316 352 295 354 306 306 368 326 355 321 325 343 299 338 298 325 331 317 333 307 312 347 305 353 311 334 347 319 341 303 321 349 336 337 311 316 357 320 329 294 318 345 316 345 312 315 330 315 334 325 320 330 319 352 306 324 319 317 336 300 295 338 332 333 309 329 345 320 339 326 315 328 325 320 323 316 345 320 343 309 331 332 310 333 295 308 356 300 350 316 306 337 308 336 298 303 344 312 330 302 319 334 304 351 319 315 347 310 313 325 302 344 320 332 310 322 321 334 329 327 306 319 329 324 310 299 333 302 326 310 335 319 298 346 325 298 354 301 344 303 311 336 333 327 301 314 321 329 349 311 308 339 327 317 303 306 332 291 341 315 312 344 315 333 316 310 340 332 334 320 303 333 318 335 316 316 319 327 320 309 315 356 296 318 313 305 322 338 359 305 335 332 321 340 321 322 337 303 345 310 292 322 313 370 305 305 354 304 345 301 305 349 304 331 325 301 332 300 356 311 325 327 313 330 301 293 341 304 346 292 324 339 313 347 308 327 342 310 346 297 311 344 301 334 315 313 341 329 343 291 321 369 310 356 312 312 314 290 347 308 314 328 299 352 312 312 343 327 334 297 316 349 326 349 302 313 326 318 322 
312 328 339 331 342 313 312 337 316 345 309 326 355 300 324 326 319 348 317 341 319 320 342 307 318 293 333 333 331 329 324 303 327 292 308 322 303 327 336 338 314 312 320 316 350 307 301 323 302 333 308 305 347 307 350 300 307 342 296 357 311 313 325 321 339 290 283 341 281 340 317 314 329 336 315 304 332 344 315 338 297 311 335 316 337 316 309 341 317 342 331 319 320 288 347 319 305 323 299 334 316 323 339 310 341 289 315 346 317 337 331 301 350 329 348 312 301 332 316 345 316 306 357 325 318 303 297 344 329 334 322 313 330 311 353 310 307 339 334 335 312 317 332 301 338 304 318 346 322 316 307
  391. Initial temperature: 303.008 K
  392.  
  393. Started mdrun on rank 0 Thu Apr 30 11:20:40 2015
  394. Step Time Lambda
  395. 0 0.00000 0.00000
  396.  
  397. Energies (kJ/mol)
  398. Bond U-B Proper Dih. Improper Dih. LJ-14
  399. 5.27858e+04 2.89239e+05 1.51310e+05 1.50125e+03 3.17014e+04
  400. Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip. Potential
  401. -4.55645e+05 -3.43455e+04 -1.43186e+06 8.83147e+03 -1.38648e+06
  402. Kinetic En. Total Energy Temperature Pressure (bar) Constr. rmsd
  403. 5.74416e+05 -8.12065e+05 3.03832e+02 -5.33699e+01 4.19722e-06
  404.  
  405. DD step 39 vol min/aver 1.000 load imb.: force 28.7% pme mesh/force 17.174
  406.  
  407. step 120: timed with pme grid 144 144 64, coulomb cutoff 1.200: 122.3 M-cycles
  408. step 200: timed with pme grid 128 128 60, coulomb cutoff 1.269: 124.6 M-cycles
  409. step 280: timed with pme grid 112 112 56, coulomb cutoff 1.431: 107.9 M-cycles
  410. step 280: the domain decompostion limits the PME load balancing to a coulomb cut-off of 1.541
  411. step 360: timed with pme grid 120 120 60, coulomb cutoff 1.336: 80.1 M-cycles
  412. step 440: timed with pme grid 120 120 56, coulomb cutoff 1.359: 81.6 M-cycles
  413. step 440: the domain decompostion limits the PME load balancing to a coulomb cut-off of 1.431
  414. step 520: timed with pme grid 144 144 64, coulomb cutoff 1.200: 121.6 M-cycles
  415. step 600: timed with pme grid 128 128 64, coulomb cutoff 1.252: 118.3 M-cycles
  416. step 680: timed with pme grid 128 128 60, coulomb cutoff 1.269: 77.2 M-cycles
  417. step 760: timed with pme grid 120 120 60, coulomb cutoff 1.336: 82.2 M-cycles
  418. step 840: timed with pme grid 120 120 56, coulomb cutoff 1.359: 86.6 M-cycles
  419. step 840: the domain decompostion limits the PME load balancing to a coulomb cut-off of 1.359
  420. step 920: timed with pme grid 144 144 64, coulomb cutoff 1.200: 112.1 M-cycles
  421. step 1000: timed with pme grid 128 128 64, coulomb cutoff 1.252: 80.1 M-cycles
  422. DD load balancing is limited by minimum cell size in dimension Z
  423. DD step 999 vol min/aver 0.319! load imb.: force 9.8% pme mesh/force 1.733
  424.  
  425. Step Time Lambda
  426. 1000 2.00000 0.00000
  427.  
  428. Energies (kJ/mol)
  429. Bond U-B Proper Dih. Improper Dih. LJ-14
  430. 5.21615e+04 2.87938e+05 1.51047e+05 1.40608e+03 3.15769e+04
  431. Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip. Potential
  432. -4.55451e+05 -3.40091e+04 -1.42824e+06 7.15622e+03 -1.38642e+06
  433. Kinetic En. Total Energy Temperature Pressure (bar) Constr. rmsd
  434. 5.75773e+05 -8.10643e+05 3.04550e+02 -1.25697e+02 4.18198e-06
  435.  
  436. step 1080: timed with pme grid 128 128 60, coulomb cutoff 1.269: 84.3 M-cycles
  437. step 1160: timed with pme grid 120 120 60, coulomb cutoff 1.336: 83.0 M-cycles
  438. step 1240: timed with pme grid 120 120 56, coulomb cutoff 1.359: 81.2 M-cycles
  439. step 1320: timed with pme grid 128 128 64, coulomb cutoff 1.252: 82.3 M-cycles
  440. step 1400: timed with pme grid 128 128 60, coulomb cutoff 1.269: 81.6 M-cycles
  441. step 1480: timed with pme grid 120 120 60, coulomb cutoff 1.336: 82.4 M-cycles
  442. step 1560: timed with pme grid 120 120 56, coulomb cutoff 1.359: 92.4 M-cycles
  443. optimal pme grid 128 128 60, coulomb cutoff 1.269
  444. DD load balancing is limited by minimum cell size in dimension X Z
  445. DD step 1999 vol min/aver 0.241! load imb.: force 9.5% pme mesh/force 1.771
  446.  
  447. Step Time Lambda
  448. 2000 4.00000 0.00000
  449.  
  450. Energies (kJ/mol)
  451. Bond U-B Proper Dih. Improper Dih. LJ-14
  452. 5.19445e+04 2.87072e+05 1.50866e+05 1.45781e+03 3.16535e+04
  453. Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip. Potential
  454. -4.55644e+05 -3.58099e+04 -1.42684e+06 7.22192e+03 -1.38808e+06
  455. Kinetic En. Total Energy Temperature Pressure (bar) Constr. rmsd
  456. 5.74671e+05 -8.13404e+05 3.03967e+02 -5.04068e+01 4.17936e-06
  457.  
  458. DD load balancing is limited by minimum cell size in dimension X Y Z
  459. DD step 2999 vol min/aver 0.267! load imb.: force 13.8% pme mesh/force 2.195
  460.  
  461. Step Time Lambda
  462. 3000 6.00000 0.00000
  463.  
  464. Energies (kJ/mol)
  465. Bond U-B Proper Dih. Improper Dih. LJ-14
  466. 5.13268e+04 2.86872e+05 1.50802e+05 1.51236e+03 3.19923e+04
  467. Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip. Potential
  468. -4.54987e+05 -3.53924e+04 -1.42859e+06 7.07160e+03 -1.38940e+06
  469. Kinetic En. Total Energy Temperature Pressure (bar) Constr. rmsd
  470. 5.73414e+05 -8.15982e+05 3.03302e+02 1.45053e+02 4.19015e-06
  471.  
  472. DD load balancing is limited by minimum cell size in dimension X Z
  473. DD step 3999 vol min/aver 0.183! load imb.: force 10.9% pme mesh/force 1.892
  474.  
  475. Step Time Lambda
  476. 4000 8.00000 0.00000
  477.  
  478. Energies (kJ/mol)
  479. Bond U-B Proper Dih. Improper Dih. LJ-14
  480. 5.17893e+04 2.88142e+05 1.51003e+05 1.44915e+03 3.18051e+04
  481. Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip. Potential
  482. -4.55468e+05 -3.56771e+04 -1.42594e+06 7.13401e+03 -1.38576e+06
  483. Kinetic En. Total Energy Temperature Pressure (bar) Constr. rmsd
  484. 5.74427e+05 -8.11332e+05 3.03838e+02 -4.97117e+01 4.16063e-06
  485.  
  486.  
  487. step 5000: resetting all time and cycle counters
  488.  
  489. Restarted time on rank 0 Thu Apr 30 11:20:57 2015
  490. DD load balancing is limited by minimum cell size in dimension X Y Z
  491. DD step 4999 vol min/aver 0.189! load imb.: force 15.4% pme mesh/force 1.912
  492.  
  493. Step Time Lambda
  494. 5000 10.00000 0.00000
  495.  
  496. Energies (kJ/mol)
  497. Bond U-B Proper Dih. Improper Dih. LJ-14
  498. 5.12689e+04 2.87968e+05 1.51258e+05 1.34089e+03 3.20722e+04
  499. Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip. Potential
  500. -4.56049e+05 -3.60387e+04 -1.42908e+06 7.08832e+03 -1.39017e+06
  501. Kinetic En. Total Energy Temperature Pressure (bar) Constr. rmsd
  502. 5.72858e+05 -8.17312e+05 3.03008e+02 7.60012e+01 4.16152e-06
  503.  
  504. DD load balancing is limited by minimum cell size in dimension X Y Z
  505. DD step 5999 vol min/aver 0.169! load imb.: force 14.2% pme mesh/force 1.847
  506.  
  507. Step Time Lambda
  508. 6000 12.00000 0.00000
  509.  
  510. Energies (kJ/mol)
  511. Bond U-B Proper Dih. Improper Dih. LJ-14
  512. 5.19672e+04 2.87359e+05 1.51090e+05 1.40197e+03 3.17675e+04
  513. Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip. Potential
  514. -4.56662e+05 -3.66389e+04 -1.42440e+06 7.13788e+03 -1.38698e+06
  515. Kinetic En. Total Energy Temperature Pressure (bar) Constr. rmsd
  516. 5.72988e+05 -8.13987e+05 3.03077e+02 -8.88271e+01 4.23513e-06
  517.  
  518. DD load balancing is limited by minimum cell size in dimension X Y
  519. DD step 6999 vol min/aver 0.147! load imb.: force 10.6% pme mesh/force 1.705
  520.  
  521. Step Time Lambda
  522. 7000 14.00000 0.00000
  523.  
  524. Energies (kJ/mol)
  525. Bond U-B Proper Dih. Improper Dih. LJ-14
  526. 5.19064e+04 2.88068e+05 1.50748e+05 1.47583e+03 3.16473e+04
  527. Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip. Potential
  528. -4.56740e+05 -3.55152e+04 -1.42624e+06 7.19116e+03 -1.38746e+06
  529. Kinetic En. Total Energy Temperature Pressure (bar) Constr. rmsd
  530. 5.69536e+05 -8.17922e+05 3.01251e+02 2.59555e+01 4.20098e-06
  531.  
  532. DD load balancing is limited by minimum cell size in dimension X Y
  533. DD step 7999 vol min/aver 0.137! load imb.: force 15.2% pme mesh/force 1.920
  534.  
  535. Step Time Lambda
  536. 8000 16.00000 0.00000
  537.  
  538. Energies (kJ/mol)
  539. Bond U-B Proper Dih. Improper Dih. LJ-14
  540. 5.21005e+04 2.87501e+05 1.50585e+05 1.45519e+03 3.14913e+04
  541. Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip. Potential
  542. -4.56168e+05 -3.49690e+04 -1.42782e+06 7.12573e+03 -1.38870e+06
  543. Kinetic En. Total Energy Temperature Pressure (bar) Constr. rmsd
  544. 5.72761e+05 -8.15938e+05 3.02957e+02 -9.33493e+01 4.16234e-06
  545.  
  546. DD load balancing is limited by minimum cell size in dimension X Z
  547. DD step 8999 vol min/aver 0.137! load imb.: force 13.4% pme mesh/force 1.678
  548.  
  549. Step Time Lambda
  550. 9000 18.00000 0.00000
  551.  
  552. Energies (kJ/mol)
  553. Bond U-B Proper Dih. Improper Dih. LJ-14
  554. 5.17143e+04 2.86797e+05 1.51199e+05 1.31543e+03 3.17968e+04
  555. Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip. Potential
  556. -4.56358e+05 -3.37669e+04 -1.42837e+06 7.11669e+03 -1.38856e+06
  557. Kinetic En. Total Energy Temperature Pressure (bar) Constr. rmsd
  558. 5.72744e+05 -8.15815e+05 3.02948e+02 1.04735e+02 4.19623e-06
  559.  
  560. DD load balancing is limited by minimum cell size in dimension X Y Z
  561. DD step 9999 vol min/aver 0.158! load imb.: force 16.7% pme mesh/force 1.868
  562.  
  563. Step Time Lambda
  564. 10000 20.00000 0.00000
  565.  
  566. Energies (kJ/mol)
  567. Bond U-B Proper Dih. Improper Dih. LJ-14
  568. 5.18065e+04 2.88034e+05 1.51154e+05 1.36055e+03 3.16068e+04
  569. Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip. Potential
  570. -4.55387e+05 -3.40056e+04 -1.42934e+06 7.05637e+03 -1.38772e+06
  571. Kinetic En. Total Energy Temperature Pressure (bar) Constr. rmsd
  572. 5.72760e+05 -8.14955e+05 3.02956e+02 8.90049e+01 4.20580e-06
  573.  
  574. <====== ############### ==>
  575. <==== A V E R A G E S ====>
  576. <== ############### ======>
  577.  
  578. Statistics over 10001 steps using 101 frames
  579.  
  580. Energies (kJ/mol)
  581. Bond U-B Proper Dih. Improper Dih. LJ-14
  582. 5.18885e+04 2.87601e+05 1.50962e+05 1.41860e+03 3.17028e+04
  583. Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip. Potential
  584. -4.55909e+05 -3.51332e+04 -1.42805e+06 7.07502e+03 -1.38844e+06
  585. Kinetic En. Total Energy Temperature Pressure (bar) Constr. rmsd
  586. 5.73171e+05 -8.15273e+05 3.03174e+02 3.16513e+00 0.00000e+00
  587.  
  588. Box-X Box-Y Box-Z
  589. 1.60211e+01 1.60211e+01 7.62154e+00
  590.  
  591. Total Virial (kJ/mol)
  592. 1.85438e+05 -1.57817e+02 -5.18094e+01
  593. -1.61226e+02 1.86460e+05 -1.51923e+02
  594. -5.91288e+01 -1.58273e+02 2.00723e+05
  595.  
  596. Pressure (bar)
  597. 1.91706e+01 2.23894e+00 2.32552e+00
  598. 2.29658e+00 2.01522e+00 2.23707e+00
  599. 2.44983e+00 2.34499e+00 -1.16904e+01
  600.  
  601. T-NPROT T-SOL_ION
  602. 3.03163e+02 3.03188e+02
  603.  
  604.  
  605. P P - P M E L O A D B A L A N C I N G
  606.  
  607. PP/PME load balancing changed the cut-off and PME settings:
  608. particle-particle PME
  609. rcoulomb rlist grid spacing 1/beta
  610. initial 1.200 nm 1.239 nm 144 144 64 0.119 nm 0.384 nm
  611. final 1.269 nm 1.308 nm 128 128 60 0.127 nm 0.406 nm
  612. cost-ratio 1.18 0.74
  613. (note that these numbers concern only part of the total PP and PME load)
  614.  
  615.  
  616. M E G A - F L O P S A C C O U N T I N G
  617.  
  618. NB=Group-cutoff nonbonded kernels NxN=N-by-N cluster Verlet kernels
  619. RF=Reaction-Field VdW=Van der Waals QSTab=quadratic-spline table
  620. W3=SPC/TIP3p W4=TIP4p (single or pairs)
  621. V&F=Potential and force V=Potential only F=Force only
  622.  
  623. Computing: M-Number M-Flops % Flops
  624. -----------------------------------------------------------------------------
  625. NB VdW [V&F] 1088.367630 1088.368 0.0
  626. Pair Search distance check 3389.459936 30505.139 0.0
  627. NxN Ewald Elec. + LJ [F] 1403227.505216 109451745.407 95.7
  628. NxN Ewald Elec. + LJ [V&F] 14457.815104 1865058.148 1.6
  629. 1,4 nonbonded interactions 1575.315000 141778.350 0.1
  630. Calc Weights 3096.844245 111486.393 0.1
  631. Spread Q Bspline 66066.010560 132132.021 0.1
  632. Gather F Bspline 66066.010560 396396.063 0.3
  633. 3D-FFT 195731.828538 1565854.628 1.4
  634. Solve PME 1966.473216 125854.286 0.1
  635. Reset In Box 26.008290 78.025 0.0
  636. CG-CoM 26.008290 78.025 0.0
  637. Bonds 212.942580 12563.612 0.0
  638. Propers 1815.312990 415706.675 0.4
  639. Impropers 5.901180 1227.445 0.0
  640. Virial 59.038965 1062.701 0.0
  641. Stop-CM 10.527165 105.272 0.0
  642. Calc-Ekin 103.413915 2792.176 0.0
  643. Lincs 424.683590 25481.015 0.0
  644. Lincs-Mat 3067.540548 12270.162 0.0
  645. Constraint-V 1405.207510 11241.660 0.0
  646. Constraint-Vir 49.212396 1181.098 0.0
  647. Settle 185.280110 59845.476 0.1
  648. -----------------------------------------------------------------------------
  649. Total 114365532.145 100.0
  650. -----------------------------------------------------------------------------
  651.  
  652.  
  653. D O M A I N D E C O M P O S I T I O N S T A T I S T I C S
  654.  
  655. av. #atoms communicated per step for force: 2 x 1148475.3
  656. av. #atoms communicated per step for LINCS: 2 x 45259.6
  657.  
  658. Average load imbalance: 15.6 %
  659. Part of the total run time spent waiting due to load imbalance: 5.5 %
  660. Steps where the load balancing was limited by -rdd, -rcon and/or -dds: X 2 % Y 2 % Z 2 %
  661. Average PME mesh/force load: 1.840
  662. Part of the total run time spent waiting due to PP/PME imbalance: 24.2 %
  663.  
  664. NOTE: 5.5 % of the available CPU time was lost due to load imbalance
  665. in the domain decomposition.
  666.  
  667. NOTE: 24.2 % performance was lost because the PME ranks
  668. had more work to do than the PP ranks.
  669. You might want to increase the number of PME ranks
  670. or increase the cut-off and the grid spacing.
  671.  
  672.  
  673. R E A L C Y C L E A N D T I M E A C C O U N T I N G
  674.  
  675. On 640 MPI ranks doing PP, each using 2 OpenMP threads, and
  676. on 384 MPI ranks doing PME, each using 2 OpenMP threads
  677.  
  678. Computing: Num Num Call Wall time Giga-Cycles
  679. Ranks Threads Count (s) total sum %
  680. -----------------------------------------------------------------------------
  681. Domain decomp. 640 2 126 0.529 1489.196 3.4
  682. DD comm. load 640 2 126 0.005 14.848 0.0
  683. DD comm. bounds 640 2 126 0.024 67.042 0.2
  684. Send X to PME 640 2 5001 0.015 43.589 0.1
  685. Neighbor search 640 2 126 0.113 317.560 0.7
  686. Launch GPU ops. 640 2 10002 0.822 2316.175 5.2
  687. Comm. coord. 640 2 4875 0.811 2283.951 5.1
  688. Force 640 2 5001 0.728 2049.307 4.6
  689. Wait + Comm. F 640 2 5001 1.639 4616.740 10.4
  690. PME mesh * 384 2 5001 7.787 13156.944 29.7
  691. PME wait for PP * 2.061 3482.081 7.8
  692. Wait + Recv. PME F 640 2 5001 2.681 7551.103 17.0
  693. Wait GPU nonlocal 640 2 5001 0.484 1362.933 3.1
  694. Wait GPU local 640 2 5001 0.355 1001.063 2.3
  695. NB X/F buffer ops. 640 2 19752 0.304 857.475 1.9
  696. Write traj. 640 2 2 0.069 193.055 0.4
  697. Update 640 2 5001 0.105 294.596 0.7
  698. Constraints 640 2 5001 0.648 1823.500 4.1
  699. Comm. energies 640 2 251 0.449 1264.524 2.8
  700. Rest 0.066 185.132 0.4
  701. -----------------------------------------------------------------------------
  702. Total 9.848 44370.860 100.0
  703. -----------------------------------------------------------------------------
  704. (*) Note that with separate PME ranks, the walltime column actually sums to
  705. twice the total reported, but the cycle count total and % are correct.
  706. -----------------------------------------------------------------------------
  707. Breakdown of PME mesh computation
  708. -----------------------------------------------------------------------------
  709. PME redist. X/F 384 2 10002 1.831 3093.099 7.0
  710. PME spread/gather 384 2 10002 1.149 1942.176 4.4
  711. PME 3D-FFT 384 2 10002 0.438 739.351 1.7
  712. PME 3D-FFT Comm. 384 2 20004 4.307 7277.497 16.4
  713. PME solve Elec 384 2 5001 0.038 64.066 0.1
  714. -----------------------------------------------------------------------------
  715.  
  716. Core t (s) Wall t (s) (%)
  717. Time: 19526.246 9.848 198279.0
  718. (ns/day) (hour/ns)
  719. Performance: 87.752 0.273
  720. Finished mdrun on rank 0 Thu Apr 30 11:21:07 2015
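
The Performance line above can be cross-checked from other values in this log. The following is a minimal sketch, not GROMACS code: it assumes the timed portion of the run covers the 5001 steps counted in the Force row after the counter reset at step 5000, and it uses dt = 0.002 ps and the 9.848 s wall time as printed. The function names are hypothetical, and small differences from the logged figures are expected because the log prints a rounded wall time.

```python
# Sketch: recompute ns/day and hour/ns from values printed in the log.
# Assumption: the timed steps are the 5001 Force calls after -resethway reset.

SECONDS_PER_DAY = 86400.0


def ns_per_day(steps: int, dt_ps: float, wall_s: float) -> float:
    """Simulated nanoseconds per day of wall-clock time."""
    simulated_ns = steps * dt_ps / 1000.0  # ps -> ns
    return simulated_ns * SECONDS_PER_DAY / wall_s


def hours_per_ns(steps: int, dt_ps: float, wall_s: float) -> float:
    """Wall-clock hours needed per simulated nanosecond."""
    return 24.0 / ns_per_day(steps, dt_ps, wall_s)


if __name__ == "__main__":
    steps, dt_ps, wall_s = 5001, 0.002, 9.848  # taken from the log above
    print(f"ns/day:  {ns_per_day(steps, dt_ps, wall_s):.3f}")
    print(f"hour/ns: {hours_per_ns(steps, dt_ps, wall_s):.3f}")
```

The same two functions can be pointed at other runs of this benchmark (for example, with a different -npme value) to compare throughput on equal terms.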