a guest
Apr 29th, 2015
  1. Log file opened on Wed Apr 29 18:54:29 2015
  2. Host: nid17273 pid: 17130 rank ID: 0 number of ranks: 1024
  3. GROMACS: gmx mdrun, VERSION 5.0.2
  4.  
  5. GROMACS is written by:
  6. Emile Apol Rossen Apostolov Herman J.C. Berendsen Par Bjelkmar
  7. Aldert van Buuren Rudi van Drunen Anton Feenstra Sebastian Fritsch
  8. Gerrit Groenhof Christoph Junghans Peter Kasson Carsten Kutzner
  9. Per Larsson Justin A. Lemkul Magnus Lundborg Pieter Meulenhoff
  10. Erik Marklund Teemu Murtola Szilard Pall Sander Pronk
  11. Roland Schulz Alexey Shvetsov Michael Shirts Alfons Sijbers
  12. Peter Tieleman Christian Wennberg Maarten Wolf
  13. and the project leaders:
  14. Mark Abraham, Berk Hess, Erik Lindahl, and David van der Spoel
  15.  
  16. Copyright (c) 1991-2000, University of Groningen, The Netherlands.
  17. Copyright (c) 2001-2014, The GROMACS development team at
  18. Uppsala University, Stockholm University and
  19. the Royal Institute of Technology, Sweden.
  20. check out http://www.gromacs.org for more information.
  21.  
  22. GROMACS is free software; you can redistribute it and/or modify it
  23. under the terms of the GNU Lesser General Public License
  24. as published by the Free Software Foundation; either version 2.1
  25. of the License, or (at your option) any later version.
  26.  
  27. GROMACS: gmx mdrun, VERSION 5.0.2
  28. Executable: mdrun_mpi
  29. Library dir: /sw/xk6/gromacs/5.0.2/cle5.2_gnu4.8.2/share/gromacs/top
  30. Command line:
  31. mdrun_mpi -gpu_id 000000 -npme 256 -dlb yes -pin on -resethway -noconfout -v -s opt.tpr -deffnm test
  32.  
  33. Gromacs version: VERSION 5.0.2
  34. Precision: single
  35. Memory model: 64 bit
  36. MPI library: MPI
  37. OpenMP support: enabled
  38. GPU support: enabled
  39. invsqrt routine: gmx_software_invsqrt(x)
  40. SIMD instructions: AVX_128_FMA
  41. FFT library: commercial-fftw-3.3.4-fma-sse2-avx
  42. RDTSCP usage: disabled
  43. C++11 compilation: disabled
  44. TNG support: enabled
  45. Tracing support: disabled
  46. Built on: Thu Mar 12 18:27:12 EDT 2015
  47. Built by: ff1@titan-ext8 [CMAKE]
  48. Build OS/arch: Linux 3.0.101-0.46-default x86_64
  49. Build CPU vendor: AuthenticAMD
  50. Build CPU brand: AMD Opteron(tm) Processor 6140
  51. Build CPU family: 16 Model: 9 Stepping: 1
  52. Build CPU features: apic clfsh cmov cx8 cx16 htt lahf_lm misalignsse mmx msr nonstop_tsc pdpe1gb popcnt pse rdtscp sse2 sse3 sse4a
  53. C compiler: /opt/cray/craype/2.2.1/bin/cc GNU 4.8.2
  54. C compiler flags: -mavx -mfma4 -mxop -Wno-maybe-uninitialized -Wextra -Wno-missing-field-initializers -Wno-sign-compare -Wpointer-arith -Wall -Wno-unused -Wunused-value -Wunused-parameter -O3 -DNDEBUG -fomit-frame-pointer -funroll-all-loops -fexcess-precision=fast -Wno-array-bounds
  55. C++ compiler: /opt/cray/craype/2.2.1/bin/CC GNU 4.8.2
  56. C++ compiler flags: -mavx -mfma4 -mxop -Wextra -Wno-missing-field-initializers -Wpointer-arith -Wall -Wno-unused-function -O3 -DNDEBUG -fomit-frame-pointer -funroll-all-loops -fexcess-precision=fast -Wno-array-bounds
  57. Boost version: 1.55.0 (internal)
  58. CUDA compiler: /opt/nvidia/cudatoolkit/5.5.51-1.0502.9594.3.1/bin/nvcc nvcc: NVIDIA (R) Cuda compiler driver;Copyright (c) 2005-2013 NVIDIA Corporation;Built on Thu_Mar__6_02:21:19_PST_2014;Cuda compilation tools, release 5.5, V5.5.0
  59. CUDA compiler flags:-gencode;arch=compute_20,code=sm_20;-gencode;arch=compute_20,code=sm_21;-gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_35,code=compute_35;-use_fast_math;; ;-mavx;-mfma4;-mxop;-Wextra;-Wno-missing-field-initializers;-Wpointer-arith;-Wall;-Wno-unused-function;-O3;-DNDEBUG;-fomit-frame-pointer;-funroll-all-loops;-fexcess-precision=fast;-Wno-array-bounds;
  60. CUDA driver: 5.50
  61. CUDA runtime: 5.50
  62.  
  63.  
  64.  
  65. ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
  66. B. Hess and C. Kutzner and D. van der Spoel and E. Lindahl
  67. GROMACS 4: Algorithms for highly efficient, load-balanced, and scalable
  68. molecular simulation
  69. J. Chem. Theory Comput. 4 (2008) pp. 435-447
  70. -------- -------- --- Thank You --- -------- --------
  71.  
  72.  
  73. ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
  74. D. van der Spoel, E. Lindahl, B. Hess, G. Groenhof, A. E. Mark and H. J. C.
  75. Berendsen
  76. GROMACS: Fast, Flexible and Free
  77. J. Comp. Chem. 26 (2005) pp. 1701-1719
  78. -------- -------- --- Thank You --- -------- --------
  79.  
  80.  
  81. ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
  82. E. Lindahl and B. Hess and D. van der Spoel
  83. GROMACS 3.0: A package for molecular simulation and trajectory analysis
  84. J. Mol. Mod. 7 (2001) pp. 306-317
  85. -------- -------- --- Thank You --- -------- --------
  86.  
  87.  
  88. ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
  89. H. J. C. Berendsen, D. van der Spoel and R. van Drunen
  90. GROMACS: A message-passing parallel molecular dynamics implementation
  91. Comp. Phys. Comm. 91 (1995) pp. 43-56
  92. -------- -------- --- Thank You --- -------- --------
  93.  
  94.  
  95. Number of hardware threads detected (16) does not match the number reported by OpenMP (1).
  96. Consider setting the launch configuration manually!
  97. Changing nstlist from 20 to 40, rlist from 1.2 to 1.239
  98.  
  99. Input Parameters:
  100. integrator = md
  101. tinit = 0
  102. dt = 0.002
  103. nsteps = 10000
  104. init-step = 0
  105. simulation-part = 1
  106. comm-mode = Linear
  107. nstcomm = 100
  108. bd-fric = 0
  109. ld-seed = 60975668
  110. emtol = 10
  111. emstep = 0.01
  112. niter = 20
  113. fcstep = 0
  114. nstcgsteep = 1000
  115. nbfgscorr = 10
  116. rtpi = 0.05
  117. nstxout = 5000
  118. nstvout = 5000
  119. nstfout = 5000
  120. nstlog = 1000
  121. nstcalcenergy = 100
  122. nstenergy = 1000
  123. nstxout-compressed = 0
  124. compressed-x-precision = 1000
  125. cutoff-scheme = Verlet
  126. nstlist = 40
  127. ns-type = Grid
  128. pbc = xyz
  129. periodic-molecules = FALSE
  130. verlet-buffer-tolerance = 0.005
  131. rlist = 1.239
  132. rlistlong = 1.239
  133. nstcalclr = 20
  134. coulombtype = PME
  135. coulomb-modifier = Potential-shift
  136. rcoulomb-switch = 0
  137. rcoulomb = 1.2
  138. epsilon-r = 1
  139. epsilon-rf = inf
  140. vdw-type = Cut-off
  141. vdw-modifier = Force-switch
  142. rvdw-switch = 1
  143. rvdw = 1.2
  144. DispCorr = No
  145. table-extension = 1
  146. fourierspacing = 0.12
  147. fourier-nx = 144
  148. fourier-ny = 144
  149. fourier-nz = 64
  150. pme-order = 4
  151. ewald-rtol = 1e-05
  152. ewald-rtol-lj = 0.001
  153. lj-pme-comb-rule = Geometric
  154. ewald-geometry = 0
  155. epsilon-surface = 0
  156. implicit-solvent = No
  157. gb-algorithm = Still
  158. nstgbradii = 1
  159. rgbradii = 1
  160. gb-epsilon-solvent = 80
  161. gb-saltconc = 0
  162. gb-obc-alpha = 1
  163. gb-obc-beta = 0.8
  164. gb-obc-gamma = 4.85
  165. gb-dielectric-offset = 0.009
  166. sa-algorithm = Ace-approximation
  167. sa-surface-tension = 2.05016
  168. tcoupl = Nose-Hoover
  169. nsttcouple = 20
  170. nh-chain-length = 1
  171. print-nose-hoover-chain-variables = FALSE
  172. pcoupl = Parrinello-Rahman
  173. pcoupltype = Semiisotropic
  174. nstpcouple = 20
  175. tau-p = 5
  176. compressibility (3x3):
  177. compressibility[ 0]={ 4.50000e-05, 0.00000e+00, 0.00000e+00}
  178. compressibility[ 1]={ 0.00000e+00, 4.50000e-05, 0.00000e+00}
  179. compressibility[ 2]={ 0.00000e+00, 0.00000e+00, 4.50000e-05}
  180. ref-p (3x3):
  181. ref-p[ 0]={ 1.00000e+00, 0.00000e+00, 0.00000e+00}
  182. ref-p[ 1]={ 0.00000e+00, 1.00000e+00, 0.00000e+00}
  183. ref-p[ 2]={ 0.00000e+00, 0.00000e+00, 1.00000e+00}
  184. refcoord-scaling = COM
  185. posres-com (3):
  186. posres-com[0]= 0.00000e+00
  187. posres-com[1]= 0.00000e+00
  188. posres-com[2]= 0.00000e+00
  189. posres-comB (3):
  190. posres-comB[0]= 0.00000e+00
  191. posres-comB[1]= 0.00000e+00
  192. posres-comB[2]= 0.00000e+00
  193. QMMM = FALSE
  194. QMconstraints = 0
  195. QMMMscheme = 0
  196. MMChargeScaleFactor = 1
  197. qm-opts:
  198. ngQM = 0
  199. constraint-algorithm = Lincs
  200. continuation = TRUE
  201. Shake-SOR = FALSE
  202. shake-tol = 0.0001
  203. lincs-order = 4
  204. lincs-iter = 1
  205. lincs-warnangle = 30
  206. nwall = 0
  207. wall-type = 9-3
  208. wall-r-linpot = -1
  209. wall-atomtype[0] = -1
  210. wall-atomtype[1] = -1
  211. wall-density[0] = 0
  212. wall-density[1] = 0
  213. wall-ewald-zfac = 3
  214. pull = no
  215. rotation = FALSE
  216. interactiveMD = FALSE
  217. disre = No
  218. disre-weighting = Conservative
  219. disre-mixed = FALSE
  220. dr-fc = 1000
  221. dr-tau = 0
  222. nstdisreout = 100
  223. orire-fc = 0
  224. orire-tau = 0
  225. nstorireout = 100
  226. free-energy = no
  227. cos-acceleration = 0
  228. deform (3x3):
  229. deform[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
  230. deform[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
  231. deform[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
  232. simulated-tempering = FALSE
  233. E-x:
  234. n = 0
  235. E-xt:
  236. n = 0
  237. E-y:
  238. n = 0
  239. E-yt:
  240. n = 0
  241. E-z:
  242. n = 0
  243. E-zt:
  244. n = 0
  245. swapcoords = no
  246. adress = FALSE
  247. userint1 = 0
  248. userint2 = 0
  249. userint3 = 0
  250. userint4 = 0
  251. userreal1 = 0
  252. userreal2 = 0
  253. userreal3 = 0
  254. userreal4 = 0
  255. grpopts:
  256. nrdf: 261777 192987
  257. ref-t: 303.15 303.15
  258. tau-t: 1 1
  259. annealing: No No
  260. annealing-npoints: 0 0
  261. acc: 0 0 0
  262. nfreeze: N N N
  263. energygrp-flags[ 0]: 0
  264.  
  265. Initializing Domain Decomposition on 1024 ranks
  266. Dynamic load balancing: yes
  267. Will sort the charge groups at every domain (re)decomposition
  268. Initial maximum inter charge-group distances:
  269. two-body bonded interactions: 0.420 nm, LJ-14, atoms 42821 42830
  270. multi-body bonded interactions: 0.420 nm, Proper Dih., atoms 42821 42830
  271. Minimum cell size due to bonded interactions: 0.462 nm
  272. Maximum distance for 5 constraints, at 120 deg. angles, all-trans: 0.222 nm
  273. Estimated maximum distance required for P-LINCS: 0.222 nm
  274. Using 256 separate PME ranks, per user request
  275. Scaling the initial minimum size with 1/0.8 (option -dds) = 1.25
  276. Optimizing the DD grid for 768 cells with a minimum initial size of 0.578 nm
  277. The maximum allowed number of cells is: X 27 Y 27 Z 13
  278. Domain decomposition grid 16 x 16 x 3, separate PME ranks 256
  279. PME domain decomposition: 16 x 16 x 1
  280. Interleaving PP and PME ranks
  281. This rank does only particle-particle work.
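
The cell-size figures in the domain decomposition block above can be cross-checked with a little arithmetic. The sketch below is an illustration, not GROMACS code: the helper names are hypothetical, the box lengths are the averages printed near the end of this log (the step-0 box is not shown, so using them here is an assumption), and flooring box length over minimum cell size is only an assumed approximation of how mdrun arrives at its per-dimension limit.

```python
import math

def scaled_min_cell(bonded_min_nm, dds=0.8):
    """Scale the bonded minimum cell size by 1/dds (the log reports 1/0.8 = 1.25)."""
    return bonded_min_nm / dds

def max_cells(box_nm, min_cell_nm):
    """Assumed approximation: whole cells of min_cell_nm that fit along each box edge."""
    return [math.floor(edge / min_cell_nm) for edge in box_nm]

# 0.462 nm is the "Minimum cell size due to bonded interactions" from the log.
min_cell = scaled_min_cell(0.462)

# Average box from the end of this log (assumption: close to the step-0 box).
box = (16.0046, 16.0046, 7.63406)

limits = max_cells(box, min_cell)
grid = (16, 16, 3)  # the DD grid mdrun chose

print(f"minimum initial cell size: {min_cell:.4f} nm")
print("maximum cells per dimension:", limits)
print("chosen grid fits:", all(g <= m for g, m in zip(grid, limits)))
print("PP ranks used by grid:", grid[0] * grid[1] * grid[2])
```

Under these assumptions the limits agree with the reported X 27 Y 27 Z 13, and the 16 x 16 x 3 grid accounts for the 768 particle-particle ranks left after 256 of the 1024 ranks are reserved for PME.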
  282.  
  283. Domain decomposition rank 0, coordinates 0 0 0
  284.  
  285. Using two step summing over 128 groups of on average 6.0 ranks
  286.  
  287. Using 1024 MPI processes
  288. Using 2 OpenMP threads per MPI process
  289.  
  290. Detecting CPU SIMD instructions.
  291. Present hardware specification:
  292. Vendor: AuthenticAMD
  293. Brand: AMD Opteron(TM) Processor 6274
  294. Family: 21 Model: 1 Stepping: 2
  295. Features: aes apic avx clfsh cmov cx8 cx16 fma4 htt lahf_lm misalignsse mmx msr nonstop_tsc pclmuldq pdpe1gb popcnt pse rdtscp sse2 sse3 sse4a sse4.1 sse4.2 ssse3 xop
  296. SIMD instructions most likely to fit this hardware: AVX_128_FMA
  297. SIMD instructions selected at GROMACS compile time: AVX_128_FMA
  298.  
  299.  
  300. The current CPU can measure timings more accurately than the code in
  301. mdrun_mpi was configured to use. This might affect your simulation
  302. speed as accurate timings are needed for load-balancing.
  303. Please consider rebuilding mdrun_mpi with the GMX_USE_RDTSCP=OFF CMake option.
  304.  
  305.  
  306. 1 GPU detected on host nid17273:
  307. #0: NVIDIA Tesla K20X, compute cap.: 3.5, ECC: yes, stat: compatible
  308.  
  309. 1 GPU user-selected for this run.
  310. Mapping of GPUs to the 6 PP ranks in this node: #0, #0, #0, #0, #0, #0
  311.  
  312. NOTE: You assigned GPUs to multiple MPI processes.
  313. Will do PME sum in reciprocal space for electrostatic interactions.
  314.  
  315. ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
  316. U. Essmann, L. Perera, M. L. Berkowitz, T. Darden, H. Lee and L. G. Pedersen
  317. A smooth particle mesh Ewald method
  318. J. Chem. Phys. 103 (1995) pp. 8577-8592
  319. -------- -------- --- Thank You --- -------- --------
  320.  
  321. Will do ordinary reciprocal space Ewald sum.
  322. Using a Gaussian width (1/beta) of 0.384195 nm for Ewald
  323. Cut-off's: NS: 1.239 Coulomb: 1.2 LJ: 1.2
  324. System total charge: 0.000
  325. Generated table with 1119 data points for Ewald.
  326. Tabscale = 500 points/nm
  327. Generated table with 1119 data points for LJ6Shift.
  328. Tabscale = 500 points/nm
  329. Generated table with 1119 data points for LJ12Shift.
  330. Tabscale = 500 points/nm
  331. Generated table with 1119 data points for 1-4 COUL.
  332. Tabscale = 500 points/nm
  333. Generated table with 1119 data points for 1-4 LJ6.
  334. Tabscale = 500 points/nm
  335. Generated table with 1119 data points for 1-4 LJ12.
  336. Tabscale = 500 points/nm
  337.  
  338. Using CUDA 8x8 non-bonded kernels
  339.  
  340. Potential shift: LJ r^-12: -2.648e-01 r^-6: -5.349e-01, Ewald -1.000e-05
  341. Initialized non-bonded Ewald correction tables, spacing: 7.82e-04 size: 1536
  342.  
  343.  
  344. Overriding thread affinity set outside mdrun_mpi
  345.  
  346. Pinning threads with an auto-selected logical core stride of 1
  347.  
  348. Initializing Parallel LINear Constraint Solver
  349.  
  350. ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
  351. B. Hess
  352. P-LINCS: A Parallel Linear Constraint Solver for molecular simulation
  353. J. Chem. Theory Comput. 4 (2008) pp. 116-122
  354. -------- -------- --- Thank You --- -------- --------
  355.  
  356. The number of constraints is 67980
  357. There are inter charge-group constraints,
  358. will communicate selected coordinates each lincs iteration
  359.  
  360. ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
  361. S. Miyamoto and P. A. Kollman
  362. SETTLE: An Analytical Version of the SHAKE and RATTLE Algorithms for Rigid
  363. Water Models
  364. J. Comp. Chem. 13 (1992) pp. 952-962
  365. -------- -------- --- Thank You --- -------- --------
  366.  
  367.  
  368. Linking all bonded interactions to atoms
  369. There are 739685 inter charge-group exclusions,
  370. will use an extra communication step for exclusion forces for PME
  371.  
  372. The maximum number of communication pulses is: X 2 Y 2 Z 2
  373. The minimum size for domain decomposition cells is 0.711 nm
  374. The requested allowed shrink of DD cells (option -dds) is: 0.80
  375. The allowed shrink of domain decomposition cells is: X 0.71 Y 0.71 Z 0.28
  376. The maximum allowed distance for charge groups involved in interactions is:
  377. non-bonded interactions 1.239 nm
  378. two-body bonded interactions (-rdd) 1.239 nm
  379. multi-body bonded interactions (-rdd) 0.711 nm
  380. atoms separated by up to 5 constraints (-rcon) 0.711 nm
  381.  
  382.  
  383. Making 3D domain decomposition grid 16 x 16 x 3, home cell index 0 0 0
  384.  
  385. Center of mass motion removal mode is Linear
  386. We have the following groups for center of mass motion removal:
  387. 0: NPROT
  388. 1: SOL_ION
  389. There are: 206415 Atoms
  390. Charge group distribution at step 0: 260 264 256 284 267 281 271 280 244 252 268 272 257 272 273 277 249 272 266 276 277 254 274 274 268 261 250 275 254 277 267 280 278 273 284 275 256 295 258 271 260 268 285 271 270 272 290 248 263 287 287 263 265 262 266 267 264 272 269 260 271 273 276 243 284 244 254 282 277 279 255 288 276 258 264 261 275 258 269 298 280 261 265 259 287 253 260 254 266 282 273 275 258 257 260 262 267 269 266 261 278 270 256 271 275 254 255 254 252 275 273 273 282 270 267 270 267 267 271 265 260 282 286 265 298 266 269 271 269 294 279 271 266 293 264 254 286 264 280 274 277 262 286 274 284 277 277 281 284 253 267 274 269 262 265 277 251 282 285 271 250 271 272 290 253 251 270 248 270 271 267 285 278 279 253 275 249 278 267 268 252 286 268 245 271 262 275 271 268 274 276 270 272 260 276 251 259 272 281 273 268 279 303 265 260 284 263 263 256 262 266 275 251 267 248 256 277 252 256 263 254 266 267 277 247 265 262 258 277 247 269 277 265 247 264 281 260 267 288 280 276 270 267 276 258 283 244 262 262 267 279 260 261 283 248 282 263 286 271 296 280 269 284 276 277 254 258 286 265 263 268 259 261 280 271 274 267 280 266 267 277 271 285 283 256 277 279 264 275 281 271 280 281 266 267 288 256 286 276 243 286 269 276 270 278 257 270 262 261 269 273 284 267 285 278 270 270 257 261 264 241 284 277 269 259 275 258 259 293 263 283 282 270 268 270 286 264 268 272 266 281 260 271 287 274 265 274 262 277 274 260 273 260 257 257 282 267 283 264 277 259 268 263 263 274 258 253 274 267 271 264 262 276 255 270 263 286 277 267 254 274 277 279 259 268 261 267 261 287 264 277 274 269 262 273 278 254 277 255 246 286 270 246 272 266 262 260 264 280 282 273 264 248 276 266 269 269 262 274 260 268 268 272 261 286 253 250 274 282 276 276 266 264 271 256 268 273 262 271 261 277 255 256 265 274 280 272 258 270 266 276 262 284 262 294 258 263 240 269 276 274 283 266 268 256 260 268 273 282 257 258 267 263 271 255 285 264 265 290 270 271 271 276 288 276 267 281 264 278 
263 267 264 257 279 264 246 289 267 276 280 274 257 263 259 272 262 268 261 265 286 266 275 284 254 256 265 270 267 253 264 281 261 271 284 256 252 252 261 286 271 270 263 274 267 275 267 266 273 282 259 257 273 269 286 256 264 260 270 265 281 289 252 271 275 273 283 297 269 257 253 267 262 274 258 268 254 280 264 268 271 266 285 255 271 274 262 269 294 267 274 272 266 267 285 260 252 259 268 275 276 248 282 294 278 273 267 268 266 273 272 277 282 268 286 246 272 269 273 265 280 282 275 276 270 252 264 265 253 297 278 267 264 269 275 258 273 251 260 239 271 253 295 271 264 261 274 270 274 272 251 275 263 257 245 259 260 280 266 278 274 263 266 260 268 266 247 263 261 301 275 264 268 256 270 274 256 246 258 263 273 242 280 251 287 257 257 285 261 279 267 272 279 281 248 256 269 280 269 284 257 268 263 283 264 286 276 267 261 273 272 246 274 252 266 255 271 265 268 274 288 261 275 242 262 268 264 268 282 283 281 259 286 274 268 281 272 259 277 277 264 277 256 271 287 251 261 287 252 255 278 257 273 284 279 270 288 269 263 256 271 266 275 280 262 285 259 269 270 269 264 258 262 278 275 242 275 269 270
  391. Initial temperature: 303.008 K
  392.  
  393. Started mdrun on rank 0 Wed Apr 29 18:54:31 2015
  394. Step Time Lambda
  395. 0 0.00000 0.00000
  396.  
  397. Energies (kJ/mol)
  398. Bond U-B Proper Dih. Improper Dih. LJ-14
  399. 5.27858e+04 2.89239e+05 1.51310e+05 1.50125e+03 3.17014e+04
  400. Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip. Potential
  401. -4.55645e+05 -3.43455e+04 -1.43186e+06 8.83147e+03 -1.38648e+06
  402. Kinetic En. Total Energy Temperature Pressure (bar) Constr. rmsd
  403. 5.74416e+05 -8.12065e+05 3.03832e+02 -5.33589e+01 4.19705e-06
  404.  
  405. DD step 39 vol min/aver 1.000 load imb.: force 26.0% pme mesh/force 16.778
  406.  
  407. step 120: timed with pme grid 144 144 64, coulomb cutoff 1.200: 158.1 M-cycles
  408. step 200: timed with pme grid 128 128 60, coulomb cutoff 1.269: 123.9 M-cycles
  409. step 280: timed with pme grid 112 112 56, coulomb cutoff 1.431: 97.0 M-cycles
  410. step 360: timed with pme grid 104 104 48, coulomb cutoff 1.586: 99.5 M-cycles
  411. step 360: the domain decompostion limits the PME load balancing to a coulomb cut-off of 1.603
  412. step 440: timed with pme grid 120 120 60, coulomb cutoff 1.336: 84.2 M-cycles
  413. step 520: timed with pme grid 120 120 56, coulomb cutoff 1.359: 102.8 M-cycles
  414. step 600: timed with pme grid 112 112 52, coulomb cutoff 1.464: 91.0 M-cycles
  415. step 680: timed with pme grid 108 108 52, coulomb cutoff 1.484: 92.5 M-cycles
  416. step 680: the domain decompostion limits the PME load balancing to a coulomb cut-off of 1.484
  417. step 760: timed with pme grid 144 144 64, coulomb cutoff 1.200: 104.5 M-cycles
  418. step 840: timed with pme grid 128 128 64, coulomb cutoff 1.252: 90.3 M-cycles
  419. step 920: timed with pme grid 128 128 60, coulomb cutoff 1.269: 84.4 M-cycles
  420. step 1000: timed with pme grid 120 120 60, coulomb cutoff 1.336: 92.2 M-cycles
  421. DD step 999 vol min/aver 0.480 load imb.: force 5.6% pme mesh/force 1.539
  422.  
  423. Step Time Lambda
  424. 1000 2.00000 0.00000
  425.  
  426. Energies (kJ/mol)
  427. Bond U-B Proper Dih. Improper Dih. LJ-14
  428. 5.29522e+04 2.87836e+05 1.50674e+05 1.37663e+03 3.16257e+04
  429. Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip. Potential
  430. -4.54874e+05 -3.27471e+04 -1.42815e+06 5.60546e+03 -1.38570e+06
  431. Kinetic En. Total Energy Temperature Pressure (bar) Constr. rmsd
  432. 5.72149e+05 -8.13556e+05 3.02633e+02 -2.81511e+01 4.19495e-06
  433.  
  434. step 1080: timed with pme grid 120 120 56, coulomb cutoff 1.359: 91.3 M-cycles
  435. step 1160: timed with pme grid 112 112 56, coulomb cutoff 1.431: 83.0 M-cycles
  436. step 1160: the domain decompostion limits the PME load balancing to a coulomb cut-off of 1.431
  437. step 1240: timed with pme grid 144 144 64, coulomb cutoff 1.200: 109.3 M-cycles
  438. step 1320: timed with pme grid 128 128 64, coulomb cutoff 1.252: 91.3 M-cycles
  439. step 1400: timed with pme grid 128 128 60, coulomb cutoff 1.269: 85.9 M-cycles
  440. step 1480: timed with pme grid 120 120 60, coulomb cutoff 1.336: 88.2 M-cycles
  441. step 1560: timed with pme grid 120 120 56, coulomb cutoff 1.359: 88.9 M-cycles
  442. step 1560: the domain decompostion limits the PME load balancing to a coulomb cut-off of 1.359
  443. step 1640: timed with pme grid 144 144 64, coulomb cutoff 1.200: 98.1 M-cycles
  444. step 1720: timed with pme grid 128 128 64, coulomb cutoff 1.252: 92.0 M-cycles
  445. step 1800: timed with pme grid 128 128 60, coulomb cutoff 1.269: 89.5 M-cycles
  446. step 1880: timed with pme grid 120 120 60, coulomb cutoff 1.336: 90.3 M-cycles
  447. step 1960: timed with pme grid 120 120 56, coulomb cutoff 1.359: 85.5 M-cycles
  448. DD load balancing is limited by minimum cell size in dimension Y
  449. DD step 1999 vol min/aver 0.425! load imb.: force 4.2% pme mesh/force 1.842
  450.  
  451. Step Time Lambda
  452. 2000 4.00000 0.00000
  453.  
  454. Energies (kJ/mol)
  455. Bond U-B Proper Dih. Improper Dih. LJ-14
  456. 5.20044e+04 2.87172e+05 1.50446e+05 1.38468e+03 3.16776e+04
  457. Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip. Potential
  458. -4.54684e+05 -3.71101e+04 -1.42707e+06 7.53430e+03 -1.38865e+06
  459. Kinetic En. Total Energy Temperature Pressure (bar) Constr. rmsd
  460. 5.72660e+05 -8.15988e+05 3.02903e+02 4.59537e+00 4.20145e-06
  461.  
  462. step 2040: timed with pme grid 128 128 64, coulomb cutoff 1.252: 92.9 M-cycles
  463. step 2120: timed with pme grid 128 128 60, coulomb cutoff 1.269: 88.9 M-cycles
  464. step 2200: timed with pme grid 120 120 60, coulomb cutoff 1.336: 84.9 M-cycles
  465. step 2280: timed with pme grid 120 120 56, coulomb cutoff 1.359: 85.2 M-cycles
  466. optimal pme grid 120 120 60, coulomb cutoff 1.336
  467. DD step 2999 vol min/aver 0.386 load imb.: force 8.3% pme mesh/force 1.639
  468.  
  469. Step Time Lambda
  470. 3000 6.00000 0.00000
  471.  
  472. Energies (kJ/mol)
  473. Bond U-B Proper Dih. Improper Dih. LJ-14
  474. 5.18437e+04 2.87053e+05 1.50614e+05 1.58045e+03 3.16184e+04
  475. Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip. Potential
  476. -4.56138e+05 -3.52287e+04 -1.42844e+06 5.90482e+03 -1.39119e+06
  477. Kinetic En. Total Energy Temperature Pressure (bar) Constr. rmsd
  478. 5.72635e+05 -8.18555e+05 3.02890e+02 1.91758e+01 4.19642e-06
  479.  
  480. DD load balancing is limited by minimum cell size in dimension X Y
  481. DD step 3999 vol min/aver 0.323! load imb.: force 5.9% pme mesh/force 1.540
  482.  
  483. Step Time Lambda
  484. 4000 8.00000 0.00000
  485.  
  486. Energies (kJ/mol)
  487. Bond U-B Proper Dih. Improper Dih. LJ-14
  488. 5.16289e+04 2.87956e+05 1.50517e+05 1.44336e+03 3.17104e+04
  489. Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip. Potential
  490. -4.55442e+05 -3.72092e+04 -1.42299e+06 5.87870e+03 -1.38651e+06
  491. Kinetic En. Total Energy Temperature Pressure (bar) Constr. rmsd
  492. 5.73271e+05 -8.13240e+05 3.03226e+02 6.92634e+01 4.19984e-06
  493.  
  494.  
  495. step 5000: resetting all time and cycle counters
  496.  
  497. Restarted time on rank 0 Wed Apr 29 18:54:48 2015
  498. DD load balancing is limited by minimum cell size in dimension X Y
  499. DD step 4999 vol min/aver 0.315! load imb.: force 5.5% pme mesh/force 1.641
  500.  
  501. Step Time Lambda
  502. 5000 10.00000 0.00000
  503.  
  504. Energies (kJ/mol)
  505. Bond U-B Proper Dih. Improper Dih. LJ-14
  506. 5.20950e+04 2.86411e+05 1.51407e+05 1.33769e+03 3.21031e+04
  507. Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip. Potential
  508. -4.54951e+05 -3.62656e+04 -1.42449e+06 5.97659e+03 -1.38638e+06
  509. Kinetic En. Total Energy Temperature Pressure (bar) Constr. rmsd
  510. 5.72911e+05 -8.13465e+05 3.03036e+02 4.70377e+01 4.20480e-06
  511.  
  512. DD load balancing is limited by minimum cell size in dimension X Y
  513. DD step 5999 vol min/aver 0.300! load imb.: force 6.6% pme mesh/force 1.629
  514.  
  515. Step Time Lambda
  516. 6000 12.00000 0.00000
  517.  
  518. Energies (kJ/mol)
  519. Bond U-B Proper Dih. Improper Dih. LJ-14
  520. 5.21578e+04 2.88634e+05 1.51206e+05 1.39760e+03 3.20440e+04
  521. Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip. Potential
  522. -4.54861e+05 -3.56684e+04 -1.42939e+06 5.82133e+03 -1.38866e+06
  523. Kinetic En. Total Energy Temperature Pressure (bar) Constr. rmsd
  524. 5.72270e+05 -8.16389e+05 3.02697e+02 -1.00435e+02 4.22395e-06
  525.  
  526. DD load balancing is limited by minimum cell size in dimension X Y
  527. DD step 6999 vol min/aver 0.249! load imb.: force 6.4% pme mesh/force 1.529
  528.  
  529. Step Time Lambda
  530. 7000 14.00000 0.00000
  531.  
  532. Energies (kJ/mol)
  533. Bond U-B Proper Dih. Improper Dih. LJ-14
  534. 5.20223e+04 2.88500e+05 1.51289e+05 1.44962e+03 3.19163e+04
  535. Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip. Potential
  536. -4.54714e+05 -3.67867e+04 -1.42481e+06 6.00453e+03 -1.38513e+06
  537. Kinetic En. Total Energy Temperature Pressure (bar) Constr. rmsd
  538. 5.73242e+05 -8.11890e+05 3.03211e+02 -5.23086e+01 4.21466e-06
  539.  
  540. DD load balancing is limited by minimum cell size in dimension X Y
  541. DD step 7999 vol min/aver 0.235! load imb.: force 6.0% pme mesh/force 1.767
  542.  
  543. Step Time Lambda
  544. 8000 16.00000 0.00000
  545.  
  546. Energies (kJ/mol)
  547. Bond U-B Proper Dih. Improper Dih. LJ-14
  548. 5.12719e+04 2.88628e+05 1.50898e+05 1.44811e+03 3.20650e+04
  549. Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip. Potential
  550. -4.54932e+05 -3.62830e+04 -1.42787e+06 5.91038e+03 -1.38886e+06
  551. Kinetic En. Total Energy Temperature Pressure (bar) Constr. rmsd
  552. 5.75064e+05 -8.13798e+05 3.04175e+02 -1.96469e+01 4.18133e-06
  553.  
  554. DD load balancing is limited by minimum cell size in dimension X Y
  555. DD step 8999 vol min/aver 0.262! load imb.: force 6.0% pme mesh/force 1.444
  556.  
  557. Step Time Lambda
  558. 9000 18.00000 0.00000
  559.  
  560. Energies (kJ/mol)
  561. Bond U-B Proper Dih. Improper Dih. LJ-14
  562. 5.17386e+04 2.88164e+05 1.50976e+05 1.36630e+03 3.15794e+04
  563. Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip. Potential
  564. -4.55671e+05 -3.53393e+04 -1.42599e+06 5.73925e+03 -1.38744e+06
  565. Kinetic En. Total Energy Temperature Pressure (bar) Constr. rmsd
  566. 5.75739e+05 -8.11699e+05 3.04532e+02 6.11549e+01 4.19903e-06
  567.  
  568. DD load balancing is limited by minimum cell size in dimension X Y
  569. DD step 9999 vol min/aver 0.258! load imb.: force 6.0% pme mesh/force 1.459
  570.  
  571. Step Time Lambda
  572. 10000 20.00000 0.00000
  573.  
  574. Energies (kJ/mol)
  575. Bond U-B Proper Dih. Improper Dih. LJ-14
  576. 5.19846e+04 2.88727e+05 1.50477e+05 1.41466e+03 3.15471e+04
  577. Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip. Potential
  578. -4.55365e+05 -3.90563e+04 -1.42183e+06 5.86757e+03 -1.38623e+06
  579. Kinetic En. Total Energy Temperature Pressure (bar) Constr. rmsd
  580. 5.73585e+05 -8.12646e+05 3.03393e+02 -9.26017e+01 4.23236e-06
  581.  
  582. <====== ############### ==>
  583. <==== A V E R A G E S ====>
  584. <== ############### ======>
  585.  
  586. Statistics over 10001 steps using 101 frames
  587.  
  588. Energies (kJ/mol)
  589. Bond U-B Proper Dih. Improper Dih. LJ-14
  590. 5.18701e+04 2.87582e+05 1.50835e+05 1.41336e+03 3.17016e+04
  591. Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip. Potential
  592. -4.55182e+05 -3.53445e+04 -1.42743e+06 6.04109e+03 -1.38852e+06
  593. Kinetic En. Total Energy Temperature Pressure (bar) Constr. rmsd
  594. 5.72975e+05 -8.15542e+05 3.03070e+02 8.28492e-02 0.00000e+00
  595.  
  596. Box-X Box-Y Box-Z
  597. 1.60046e+01 1.60046e+01 7.63406e+00
  598.  
  599. Total Virial (kJ/mol)
  600. 1.86566e+05 6.86876e+02 6.96065e+02
  601. 6.72269e+02 1.87989e+05 7.10692e+02
  602. 6.95158e+02 7.15782e+02 1.98410e+05
  603.  
  604. Pressure (bar)
  605. 2.28417e+00 -1.07458e+01 -1.09334e+01
  606. -1.04979e+01 -2.84324e+01 -1.12088e+01
  607. -1.09179e+01 -1.12950e+01 2.63968e+01
  608.  
  609. T-NPROT T-SOL_ION
  610. 3.03124e+02 3.02996e+02
  611.  
  612.  
  613. P P - P M E L O A D B A L A N C I N G
  614.  
  615. PP/PME load balancing changed the cut-off and PME settings:
  616. particle-particle PME
  617. rcoulomb rlist grid spacing 1/beta
  618. initial 1.200 nm 1.239 nm 144 144 64 0.119 nm 0.384 nm
  619. final 1.336 nm 1.375 nm 120 120 60 0.134 nm 0.428 nm
  620. cost-ratio 1.37 0.65
  621. (note that these numbers concern only part of the total PP and PME load)
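
The two cost-ratio values in the table above can be reproduced from the other columns. This is a sketch under two assumptions that are not stated in the log: that the particle-particle cost scales with the pair-list volume (rlist cubed), and that the PME cost scales with the number of grid points. The function names are hypothetical.

```python
def pp_cost_ratio(rlist_initial_nm, rlist_final_nm):
    """Assumed scaling: PP work grows with the volume of the pair-list sphere."""
    return (rlist_final_nm / rlist_initial_nm) ** 3

def pme_cost_ratio(grid_initial, grid_final):
    """Assumed scaling: PME mesh work grows with the number of grid points."""
    def points(grid):
        nx, ny, nz = grid
        return nx * ny * nz
    return points(grid_final) / points(grid_initial)

# Values taken from the "initial" and "final" rows of the table.
pp = pp_cost_ratio(1.239, 1.375)
pme = pme_cost_ratio((144, 144, 64), (120, 120, 60))

print(f"PP cost ratio:  {pp:.2f}")
print(f"PME cost ratio: {pme:.2f}")
```

Both values round to the figures in the cost-ratio row (1.37 and 0.65), which suggests the tuner traded roughly a third more short-range work for roughly a third less mesh work.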
  622.  
  623.  
  624. M E G A - F L O P S A C C O U N T I N G
  625.  
  626. NB=Group-cutoff nonbonded kernels NxN=N-by-N cluster Verlet kernels
  627. RF=Reaction-Field VdW=Van der Waals QSTab=quadratic-spline table
  628. W3=SPC/TIP3p W4=TIP4p (single or pairs)
  629. V&F=Potential and force V=Potential only F=Force only
  630.  
  631. Computing: M-Number M-Flops % Flops
  632. -----------------------------------------------------------------------------
  633. NB VdW [V&F] 1088.367630 1088.368 0.0
  634. Pair Search distance check 3110.231312 27992.082 0.0
  635. NxN Ewald Elec. + LJ [F] 1581265.967808 123338745.489 96.2
  636. NxN Ewald Elec. + LJ [V&F] 16293.021184 2101799.733 1.6
  637. 1,4 nonbonded interactions 1575.315000 141778.350 0.1
  638. Calc Weights 3096.844245 111486.393 0.1
  639. Spread Q Bspline 66066.010560 132132.021 0.1
  640. Gather F Bspline 66066.010560 396396.063 0.3
  641. 3D-FFT 170420.677320 1363365.419 1.1
  642. Solve PME 1152.230400 73742.746 0.1
  643. Reset In Box 26.008290 78.025 0.0
  644. CG-CoM 26.008290 78.025 0.0
  645. Bonds 212.942580 12563.612 0.0
  646. Propers 1815.312990 415706.675 0.3
  647. Impropers 5.901180 1227.445 0.0
  648. Virial 60.484725 1088.725 0.0
  649. Stop-CM 10.527165 105.272 0.0
  650. Calc-Ekin 103.413915 2792.176 0.0
  651. Lincs 431.941305 25916.478 0.0
  652. Lincs-Mat 3118.378608 12473.514 0.0
  653. Constraint-V 1438.938657 11511.509 0.0
  654. Constraint-Vir 50.541304 1212.991 0.0
  655. Settle 191.685349 61914.368 0.0
  656. -----------------------------------------------------------------------------
  657. Total 128235195.478 100.0
  658. -----------------------------------------------------------------------------
  659.  
  660.  
  661. D O M A I N D E C O M P O S I T I O N S T A T I S T I C S
  662.  
  663. av. #atoms communicated per step for force: 2 x 1370007.8
  664. av. #atoms communicated per step for LINCS: 2 x 49524.2
  665.  
  666. Average load imbalance: 6.4 %
  667. Part of the total run time spent waiting due to load imbalance: 2.9 %
  668. Steps where the load balancing was limited by -rdd, -rcon and/or -dds: X 2 % Y 2 % Z 0 %
  669. Average PME mesh/force load: 1.601
  670. Part of the total run time spent waiting due to PP/PME imbalance: 23.6 %
  671.  
  672. NOTE: 23.6 % performance was lost because the PME ranks
  673. had more work to do than the PP ranks.
  674. You might want to increase the number of PME ranks
  675. or increase the cut-off and the grid spacing.
  676.  
  677.  
  678. R E A L C Y C L E A N D T I M E A C C O U N T I N G
  679.  
  680. On 768 MPI ranks doing PP, each using 2 OpenMP threads, and
  681. on 256 MPI ranks doing PME, each using 2 OpenMP threads
  682.  
  683. Computing: Num Num Call Wall time Giga-Cycles
  684. Ranks Threads Count (s) total sum %
  685. -----------------------------------------------------------------------------
  686. Domain decomp. 768 2 126 0.381 1287.603 3.6
  687. DD comm. load 768 2 126 0.003 10.862 0.0
  688. DD comm. bounds 768 2 126 0.014 48.425 0.1
  689. Send X to PME 768 2 5001 0.012 40.072 0.1
  690. Neighbor search 768 2 126 0.103 348.393 1.0
  691. Launch GPU ops. 768 2 10002 0.804 2718.085 7.7
  692. Comm. coord. 768 2 4875 0.573 1937.896 5.5
  693. Force 768 2 5001 0.682 2303.831 6.5
  694. Wait + Comm. F 768 2 5001 1.313 4436.671 12.6
  695. PME mesh * 256 2 5001 6.253 7043.857 20.0
  696. PME wait for PP * 1.576 1775.682 5.0
  697. Wait + Recv. PME F 768 2 5001 1.479 4998.574 14.2
  698. Wait GPU nonlocal 768 2 5001 0.661 2234.111 6.3
  699. Wait GPU local 768 2 5001 0.484 1634.079 4.6
  700. NB X/F buffer ops. 768 2 19752 0.290 979.523 2.8
  701. Write traj. 768 2 2 0.061 207.743 0.6
  702. Update 768 2 5001 0.092 310.507 0.9
  703. Constraints 768 2 5001 0.446 1507.060 4.3
  704. Comm. energies 768 2 251 0.367 1239.826 3.5
  705. Rest 0.064 215.421 0.6
  706. -----------------------------------------------------------------------------
  707. Total 7.830 35278.245 100.0
  708. -----------------------------------------------------------------------------
  709. (*) Note that with separate PME ranks, the walltime column actually sums to
  710. twice the total reported, but the cycle count total and % are correct.
  711. -----------------------------------------------------------------------------
  712. Breakdown of PME mesh computation
  713. -----------------------------------------------------------------------------
  714. PME redist. X/F 256 2 10002 1.137 1280.874 3.6
  715. PME spread/gather 256 2 10002 1.408 1586.372 4.5
  716. PME 3D-FFT 256 2 10002 0.492 554.747 1.6
  717. PME 3D-FFT Comm. 256 2 20004 3.143 3540.090 10.0
  718. PME solve Elec 256 2 5001 0.049 54.638 0.2
  719. -----------------------------------------------------------------------------
  720.  
  721. Core t (s) Wall t (s) (%)
  722. Time: 14927.494 7.830 190649.2
  723. (ns/day) (hour/ns)
  724. Performance: 110.369 0.217
  725. Finished mdrun on rank 0 Wed Apr 29 18:54:56 2015
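
The performance figures can be checked by hand. Because the run used -resethway, the counters were reset at step 5000, so the timed portion covers 5001 steps (the call count in the cycle table) at dt = 0.002 ps. The sketch below is a hypothetical helper, not part of GROMACS, and it uses the wall time as printed (rounded to 7.830 s), so the result matches the log only to within that rounding.

```python
SECONDS_PER_DAY = 86400.0

def ns_per_day(timed_steps, dt_ps, wall_s):
    """Simulated nanoseconds per day of wall-clock time."""
    simulated_ns = timed_steps * dt_ps / 1000.0  # ps -> ns
    return simulated_ns * SECONDS_PER_DAY / wall_s

def hours_per_ns(ns_day):
    """Wall-clock hours needed per simulated nanosecond."""
    return 24.0 / ns_day

# 5001 timed steps after the counter reset, dt and wall time from the log.
perf = ns_per_day(timed_steps=5001, dt_ps=0.002, wall_s=7.830)

print(f"{perf:.3f} ns/day")
print(f"{hours_per_ns(perf):.3f} hour/ns")
```

The result lands within a few thousandths of the reported 110.369 ns/day, and the small gap is consistent with the wall time being printed to only three decimals.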