FourThreads

a guest
Apr 30th, 2015
  1. Log file opened on Thu Apr 30 12:24:09 2015
  2. Host: nid01946 pid: 24118 rank ID: 0 number of ranks: 256
  3. GROMACS: gmx mdrun, VERSION 5.0.2
  4.  
  5. GROMACS is written by:
  6. Emile Apol Rossen Apostolov Herman J.C. Berendsen Par Bjelkmar
  7. Aldert van Buuren Rudi van Drunen Anton Feenstra Sebastian Fritsch
  8. Gerrit Groenhof Christoph Junghans Peter Kasson Carsten Kutzner
  9. Per Larsson Justin A. Lemkul Magnus Lundborg Pieter Meulenhoff
  10. Erik Marklund Teemu Murtola Szilard Pall Sander Pronk
  11. Roland Schulz Alexey Shvetsov Michael Shirts Alfons Sijbers
  12. Peter Tieleman Christian Wennberg Maarten Wolf
  13. and the project leaders:
  14. Mark Abraham, Berk Hess, Erik Lindahl, and David van der Spoel
  15.  
  16. Copyright (c) 1991-2000, University of Groningen, The Netherlands.
  17. Copyright (c) 2001-2014, The GROMACS development team at
  18. Uppsala University, Stockholm University and
  19. the Royal Institute of Technology, Sweden.
  20. check out http://www.gromacs.org for more information.
  21.  
  22. GROMACS is free software; you can redistribute it and/or modify it
  23. under the terms of the GNU Lesser General Public License
  24. as published by the Free Software Foundation; either version 2.1
  25. of the License, or (at your option) any later version.
  26.  
  27. GROMACS: gmx mdrun, VERSION 5.0.2
  28. Executable: mdrun_mpi
  29. Library dir: /sw/xk6/gromacs/5.0.2/cle5.2_gnu4.8.2/share/gromacs/top
  30. Command line:
  31. mdrun_mpi -npme 128 -dlb yes -pin on -resethway -noconfout -v -s opt.tpr -deffnm test
  32.  
  33. Gromacs version: VERSION 5.0.2
  34. Precision: single
  35. Memory model: 64 bit
  36. MPI library: MPI
  37. OpenMP support: enabled
  38. GPU support: enabled
  39. invsqrt routine: gmx_software_invsqrt(x)
  40. SIMD instructions: AVX_128_FMA
  41. FFT library: commercial-fftw-3.3.4-fma-sse2-avx
  42. RDTSCP usage: disabled
  43. C++11 compilation: disabled
  44. TNG support: enabled
  45. Tracing support: disabled
  46. Built on: Thu Mar 12 18:27:12 EDT 2015
  47. Built by: ff1@titan-ext8 [CMAKE]
  48. Build OS/arch: Linux 3.0.101-0.46-default x86_64
  49. Build CPU vendor: AuthenticAMD
  50. Build CPU brand: AMD Opteron(tm) Processor 6140
  51. Build CPU family: 16 Model: 9 Stepping: 1
  52. Build CPU features: apic clfsh cmov cx8 cx16 htt lahf_lm misalignsse mmx msr nonstop_tsc pdpe1gb popcnt pse rdtscp sse2 sse3 sse4a
  53. C compiler: /opt/cray/craype/2.2.1/bin/cc GNU 4.8.2
  54. C compiler flags: -mavx -mfma4 -mxop -Wno-maybe-uninitialized -Wextra -Wno-missing-field-initializers -Wno-sign-compare -Wpointer-arith -Wall -Wno-unused -Wunused-value -Wunused-parameter -O3 -DNDEBUG -fomit-frame-pointer -funroll-all-loops -fexcess-precision=fast -Wno-array-bounds
  55. C++ compiler: /opt/cray/craype/2.2.1/bin/CC GNU 4.8.2
  56. C++ compiler flags: -mavx -mfma4 -mxop -Wextra -Wno-missing-field-initializers -Wpointer-arith -Wall -Wno-unused-function -O3 -DNDEBUG -fomit-frame-pointer -funroll-all-loops -fexcess-precision=fast -Wno-array-bounds
  57. Boost version: 1.55.0 (internal)
  58. CUDA compiler: /opt/nvidia/cudatoolkit/5.5.51-1.0502.9594.3.1/bin/nvcc nvcc: NVIDIA (R) Cuda compiler driver;Copyright (c) 2005-2013 NVIDIA Corporation;Built on Thu_Mar__6_02:21:19_PST_2014;Cuda compilation tools, release 5.5, V5.5.0
  59. CUDA compiler flags:-gencode;arch=compute_20,code=sm_20;-gencode;arch=compute_20,code=sm_21;-gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_35,code=compute_35;-use_fast_math;; ;-mavx;-mfma4;-mxop;-Wextra;-Wno-missing-field-initializers;-Wpointer-arith;-Wall;-Wno-unused-function;-O3;-DNDEBUG;-fomit-frame-pointer;-funroll-all-loops;-fexcess-precision=fast;-Wno-array-bounds;
  60. CUDA driver: 5.50
  61. CUDA runtime: 5.50
  62.  
  63.  
  64.  
  65. ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
  66. B. Hess and C. Kutzner and D. van der Spoel and E. Lindahl
  67. GROMACS 4: Algorithms for highly efficient, load-balanced, and scalable
  68. molecular simulation
  69. J. Chem. Theory Comput. 4 (2008) pp. 435-447
  70. -------- -------- --- Thank You --- -------- --------
  71.  
  72.  
  73. ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
  74. D. van der Spoel, E. Lindahl, B. Hess, G. Groenhof, A. E. Mark and H. J. C.
  75. Berendsen
  76. GROMACS: Fast, Flexible and Free
  77. J. Comp. Chem. 26 (2005) pp. 1701-1719
  78. -------- -------- --- Thank You --- -------- --------
  79.  
  80.  
  81. ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
  82. E. Lindahl and B. Hess and D. van der Spoel
  83. GROMACS 3.0: A package for molecular simulation and trajectory analysis
  84. J. Mol. Mod. 7 (2001) pp. 306-317
  85. -------- -------- --- Thank You --- -------- --------
  86.  
  87.  
  88. ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
  89. H. J. C. Berendsen, D. van der Spoel and R. van Drunen
  90. GROMACS: A message-passing parallel molecular dynamics implementation
  91. Comp. Phys. Comm. 91 (1995) pp. 43-56
  92. -------- -------- --- Thank You --- -------- --------
  93.  
  94.  
  95. Number of hardware threads detected (16) does not match the number reported by OpenMP (1).
  96. Consider setting the launch configuration manually!
  97. Changing nstlist from 20 to 40, rlist from 1.2 to 1.239
  98.  
  99. Input Parameters:
  100. integrator = md
  101. tinit = 0
  102. dt = 0.002
  103. nsteps = 10000
  104. init-step = 0
  105. simulation-part = 1
  106. comm-mode = Linear
  107. nstcomm = 100
  108. bd-fric = 0
  109. ld-seed = 60975668
  110. emtol = 10
  111. emstep = 0.01
  112. niter = 20
  113. fcstep = 0
  114. nstcgsteep = 1000
  115. nbfgscorr = 10
  116. rtpi = 0.05
  117. nstxout = 5000
  118. nstvout = 5000
  119. nstfout = 5000
  120. nstlog = 1000
  121. nstcalcenergy = 100
  122. nstenergy = 1000
  123. nstxout-compressed = 0
  124. compressed-x-precision = 1000
  125. cutoff-scheme = Verlet
  126. nstlist = 40
  127. ns-type = Grid
  128. pbc = xyz
  129. periodic-molecules = FALSE
  130. verlet-buffer-tolerance = 0.005
  131. rlist = 1.239
  132. rlistlong = 1.239
  133. nstcalclr = 20
  134. coulombtype = PME
  135. coulomb-modifier = Potential-shift
  136. rcoulomb-switch = 0
  137. rcoulomb = 1.2
  138. epsilon-r = 1
  139. epsilon-rf = inf
  140. vdw-type = Cut-off
  141. vdw-modifier = Force-switch
  142. rvdw-switch = 1
  143. rvdw = 1.2
  144. DispCorr = No
  145. table-extension = 1
  146. fourierspacing = 0.12
  147. fourier-nx = 144
  148. fourier-ny = 144
  149. fourier-nz = 64
  150. pme-order = 4
  151. ewald-rtol = 1e-05
  152. ewald-rtol-lj = 0.001
  153. lj-pme-comb-rule = Geometric
  154. ewald-geometry = 0
  155. epsilon-surface = 0
  156. implicit-solvent = No
  157. gb-algorithm = Still
  158. nstgbradii = 1
  159. rgbradii = 1
  160. gb-epsilon-solvent = 80
  161. gb-saltconc = 0
  162. gb-obc-alpha = 1
  163. gb-obc-beta = 0.8
  164. gb-obc-gamma = 4.85
  165. gb-dielectric-offset = 0.009
  166. sa-algorithm = Ace-approximation
  167. sa-surface-tension = 2.05016
  168. tcoupl = Nose-Hoover
  169. nsttcouple = 20
  170. nh-chain-length = 1
  171. print-nose-hoover-chain-variables = FALSE
  172. pcoupl = Parrinello-Rahman
  173. pcoupltype = Semiisotropic
  174. nstpcouple = 20
  175. tau-p = 5
  176. compressibility (3x3):
  177. compressibility[ 0]={ 4.50000e-05, 0.00000e+00, 0.00000e+00}
  178. compressibility[ 1]={ 0.00000e+00, 4.50000e-05, 0.00000e+00}
  179. compressibility[ 2]={ 0.00000e+00, 0.00000e+00, 4.50000e-05}
  180. ref-p (3x3):
  181. ref-p[ 0]={ 1.00000e+00, 0.00000e+00, 0.00000e+00}
  182. ref-p[ 1]={ 0.00000e+00, 1.00000e+00, 0.00000e+00}
  183. ref-p[ 2]={ 0.00000e+00, 0.00000e+00, 1.00000e+00}
  184. refcoord-scaling = COM
  185. posres-com (3):
  186. posres-com[0]= 0.00000e+00
  187. posres-com[1]= 0.00000e+00
  188. posres-com[2]= 0.00000e+00
  189. posres-comB (3):
  190. posres-comB[0]= 0.00000e+00
  191. posres-comB[1]= 0.00000e+00
  192. posres-comB[2]= 0.00000e+00
  193. QMMM = FALSE
  194. QMconstraints = 0
  195. QMMMscheme = 0
  196. MMChargeScaleFactor = 1
  197. qm-opts:
  198. ngQM = 0
  199. constraint-algorithm = Lincs
  200. continuation = TRUE
  201. Shake-SOR = FALSE
  202. shake-tol = 0.0001
  203. lincs-order = 4
  204. lincs-iter = 1
  205. lincs-warnangle = 30
  206. nwall = 0
  207. wall-type = 9-3
  208. wall-r-linpot = -1
  209. wall-atomtype[0] = -1
  210. wall-atomtype[1] = -1
  211. wall-density[0] = 0
  212. wall-density[1] = 0
  213. wall-ewald-zfac = 3
  214. pull = no
  215. rotation = FALSE
  216. interactiveMD = FALSE
  217. disre = No
  218. disre-weighting = Conservative
  219. disre-mixed = FALSE
  220. dr-fc = 1000
  221. dr-tau = 0
  222. nstdisreout = 100
  223. orire-fc = 0
  224. orire-tau = 0
  225. nstorireout = 100
  226. free-energy = no
  227. cos-acceleration = 0
  228. deform (3x3):
  229. deform[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
  230. deform[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
  231. deform[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
  232. simulated-tempering = FALSE
  233. E-x:
  234. n = 0
  235. E-xt:
  236. n = 0
  237. E-y:
  238. n = 0
  239. E-yt:
  240. n = 0
  241. E-z:
  242. n = 0
  243. E-zt:
  244. n = 0
  245. swapcoords = no
  246. adress = FALSE
  247. userint1 = 0
  248. userint2 = 0
  249. userint3 = 0
  250. userint4 = 0
  251. userreal1 = 0
  252. userreal2 = 0
  253. userreal3 = 0
  254. userreal4 = 0
  255. grpopts:
  256. nrdf: 261777 192987
  257. ref-t: 303.15 303.15
  258. tau-t: 1 1
  259. annealing: No No
  260. annealing-npoints: 0 0
  261. acc: 0 0 0
  262. nfreeze: N N N
  263. energygrp-flags[ 0]: 0
  264.  
  265. Initializing Domain Decomposition on 256 ranks
  266. Dynamic load balancing: yes
  267. Will sort the charge groups at every domain (re)decomposition
  268. Initial maximum inter charge-group distances:
  269. two-body bonded interactions: 0.420 nm, LJ-14, atoms 42821 42830
  270. multi-body bonded interactions: 0.420 nm, Proper Dih., atoms 42821 42830
  271. Minimum cell size due to bonded interactions: 0.462 nm
  272. Maximum distance for 5 constraints, at 120 deg. angles, all-trans: 0.222 nm
  273. Estimated maximum distance required for P-LINCS: 0.222 nm
  274. Using 128 separate PME ranks, per user request
  275. Scaling the initial minimum size with 1/0.8 (option -dds) = 1.25
  276. Optimizing the DD grid for 128 cells with a minimum initial size of 0.578 nm
  277. The maximum allowed number of cells is: X 27 Y 27 Z 13
  278. Domain decomposition grid 8 x 8 x 2, separate PME ranks 128
  279. PME domain decomposition: 8 x 16 x 1
  280. Interleaving PP and PME ranks
  281. This rank does only particle-particle work.
  282.  
  283. Domain decomposition rank 0, coordinates 0 0 0
  284.  
  285. Using 256 MPI processes
  286. Using 4 OpenMP threads per MPI process
  287.  
  288. Detecting CPU SIMD instructions.
  289. Present hardware specification:
  290. Vendor: AuthenticAMD
  291. Brand: AMD Opteron(TM) Processor 6274
  292. Family: 21 Model: 1 Stepping: 2
  293. Features: aes apic avx clfsh cmov cx8 cx16 fma4 htt lahf_lm misalignsse mmx msr nonstop_tsc pclmuldq pdpe1gb popcnt pse rdtscp sse2 sse3 sse4a sse4.1 sse4.2 ssse3 xop
  294. SIMD instructions most likely to fit this hardware: AVX_128_FMA
  295. SIMD instructions selected at GROMACS compile time: AVX_128_FMA
  296.  
  297.  
  298. The current CPU can measure timings more accurately than the code in
  299. mdrun_mpi was configured to use. This might affect your simulation
  300. speed as accurate timings are needed for load-balancing.
  301. Please consider rebuilding mdrun_mpi with the GMX_USE_RDTSCP=OFF CMake option.
  302.  
  303.  
  304. 1 GPU detected on host nid01946:
  305. #0: NVIDIA Tesla K20X, compute cap.: 3.5, ECC: yes, stat: compatible
  306.  
  307. 1 GPU auto-selected for this run.
  308. Mapping of GPU to the 1 PP rank in this node: #0
  309.  
  310. Will do PME sum in reciprocal space for electrostatic interactions.
  311.  
  312. ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
  313. U. Essmann, L. Perera, M. L. Berkowitz, T. Darden, H. Lee and L. G. Pedersen
  314. A smooth particle mesh Ewald method
  315. J. Chem. Phys. 103 (1995) pp. 8577-8592
  316. -------- -------- --- Thank You --- -------- --------
  317.  
  318. Will do ordinary reciprocal space Ewald sum.
  319. Using a Gaussian width (1/beta) of 0.384195 nm for Ewald
  320. Cut-off's: NS: 1.239 Coulomb: 1.2 LJ: 1.2
  321. System total charge: 0.000
  322. Generated table with 1119 data points for Ewald.
  323. Tabscale = 500 points/nm
  324. Generated table with 1119 data points for LJ6Shift.
  325. Tabscale = 500 points/nm
  326. Generated table with 1119 data points for LJ12Shift.
  327. Tabscale = 500 points/nm
  328. Generated table with 1119 data points for 1-4 COUL.
  329. Tabscale = 500 points/nm
  330. Generated table with 1119 data points for 1-4 LJ6.
  331. Tabscale = 500 points/nm
  332. Generated table with 1119 data points for 1-4 LJ12.
  333. Tabscale = 500 points/nm
  334.  
  335. Using CUDA 8x8 non-bonded kernels
  336.  
  337. Potential shift: LJ r^-12: -2.648e-01 r^-6: -5.349e-01, Ewald -1.000e-05
  338. Initialized non-bonded Ewald correction tables, spacing: 7.82e-04 size: 1536
  339.  
  340.  
  341. Overriding thread affinity set outside mdrun_mpi
  342.  
  343. Pinning threads with an auto-selected logical core stride of 1
  344.  
  345. Initializing Parallel LINear Constraint Solver
  346.  
  347. ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
  348. B. Hess
  349. P-LINCS: A Parallel Linear Constraint Solver for molecular simulation
  350. J. Chem. Theory Comput. 4 (2008) pp. 116-122
  351. -------- -------- --- Thank You --- -------- --------
  352.  
  353. The number of constraints is 67980
  354. There are inter charge-group constraints,
  355. will communicate selected coordinates each lincs iteration
  356.  
  357. ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
  358. S. Miyamoto and P. A. Kollman
  359. SETTLE: An Analytical Version of the SHAKE and RATTLE Algorithms for Rigid
  360. Water Models
  361. J. Comp. Chem. 13 (1992) pp. 952-962
  362. -------- -------- --- Thank You --- -------- --------
  363.  
  364.  
  365. Linking all bonded interactions to atoms
  366. There are 739685 inter charge-group exclusions,
  367. will use an extra communication step for exclusion forces for PME
  368.  
  369. The maximum number of communication pulses is: X 1 Y 1 Z 1
  370. The minimum size for domain decomposition cells is 1.239 nm
  371. The requested allowed shrink of DD cells (option -dds) is: 0.80
  372. The allowed shrink of domain decomposition cells is: X 0.62 Y 0.62 Z 0.33
  373. The maximum allowed distance for charge groups involved in interactions is:
  374. non-bonded interactions 1.239 nm
  375. two-body bonded interactions (-rdd) 1.239 nm
  376. multi-body bonded interactions (-rdd) 1.212 nm
  377. atoms separated by up to 5 constraints (-rcon) 1.239 nm
  378.  
  379.  
  380. Making 3D domain decomposition grid 8 x 8 x 2, home cell index 0 0 0
  381.  
  382. Center of mass motion removal mode is Linear
  383. We have the following groups for center of mass motion removal:
  384. 0: NPROT
  385. 1: SOL_ION
  386. There are: 206415 Atoms
  387. Charge group distribution at step 0: 1641 1598 1624 1561 1574 1617 1609 1647 1618 1559 1620 1669 1602 1608 1633 1588 1641 1626 1585 1594 1584 1651 1608 1583 1655 1652 1650 1593 1573 1638 1635 1652 1600 1620 1645 1598 1616 1595 1621 1618 1610 1561 1601 1588 1589 1621 1644 1640 1674 1591 1655 1594 1644 1593 1608 1641 1596 1616 1588 1599 1595 1639 1642 1627 1606 1596 1569 1649 1601 1607 1606 1600 1583 1645 1587 1604 1622 1584 1585 1668 1625 1655 1622 1617 1559 1648 1616 1610 1646 1636 1514 1657 1575 1626 1598 1591 1594 1600 1585 1624 1603 1655 1565 1621 1622 1618 1597 1660 1625 1543 1585 1627 1640 1598 1603 1633 1626 1598 1618 1648 1582 1628 1584 1620 1590 1604 1645 1610
  388. Initial temperature: 303.008 K
  389.  
  390. Started mdrun on rank 0 Thu Apr 30 12:24:10 2015
  391. Step Time Lambda
  392. 0 0.00000 0.00000
  393.  
  394. Energies (kJ/mol)
  395. Bond U-B Proper Dih. Improper Dih. LJ-14
  396. 5.27858e+04 2.89239e+05 1.51310e+05 1.50125e+03 3.17014e+04
  397. Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip. Potential
  398. -4.55645e+05 -3.43455e+04 -1.43186e+06 8.83147e+03 -1.38648e+06
  399. Kinetic En. Total Energy Temperature Pressure (bar) Constr. rmsd
  400. 5.74416e+05 -8.12065e+05 3.03832e+02 -5.33660e+01 4.19742e-06
  401.  
  402. DD step 39 vol min/aver 1.000 load imb.: force 17.0% pme mesh/force 1.128
  403.  
  404. step 120: timed with pme grid 144 144 64, coulomb cutoff 1.200: 106.2 M-cycles
  405. step 200: timed with pme grid 128 128 60, coulomb cutoff 1.269: 109.0 M-cycles
  406. step 280: timed with pme grid 112 112 56, coulomb cutoff 1.431: 112.3 M-cycles
  407. step 360: timed with pme grid 104 104 48, coulomb cutoff 1.586: 114.9 M-cycles
  408. step 360: the domain decompostion limits the PME load balancing to a coulomb cut-off of 1.603
  409. step 440: timed with pme grid 144 144 64, coulomb cutoff 1.200: 92.3 M-cycles
  410. step 520: timed with pme grid 128 128 64, coulomb cutoff 1.252: 83.3 M-cycles
  411. step 600: timed with pme grid 128 128 60, coulomb cutoff 1.269: 94.0 M-cycles
  412. step 680: timed with pme grid 120 120 60, coulomb cutoff 1.336: 84.0 M-cycles
  413. step 760: timed with pme grid 120 120 56, coulomb cutoff 1.359: 91.1 M-cycles
  414. step 840: timed with pme grid 112 112 56, coulomb cutoff 1.431: 82.4 M-cycles
  415. step 920: timed with pme grid 112 112 52, coulomb cutoff 1.464: 87.2 M-cycles
  416. step 1000: timed with pme grid 108 108 52, coulomb cutoff 1.484: 86.3 M-cycles
  417. step 1000: the domain decompostion limits the PME load balancing to a coulomb cut-off of 1.484
  418. DD step 999 vol min/aver 0.813 load imb.: force 1.6% pme mesh/force 2.427
  419.  
  420. Step Time Lambda
  421. 1000 2.00000 0.00000
  422.  
  423. Energies (kJ/mol)
  424. Bond U-B Proper Dih. Improper Dih. LJ-14
  425. 5.22339e+04 2.88294e+05 1.51124e+05 1.46352e+03 3.16583e+04
  426. Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip. Potential
  427. -4.55027e+05 -3.49318e+04 -1.43354e+06 8.96135e+03 -1.38977e+06
  428. Kinetic En. Total Energy Temperature Pressure (bar) Constr. rmsd
  429. 5.73413e+05 -8.16355e+05 3.03301e+02 -2.05265e+02 4.19518e-06
  430.  
  431. step 1080: timed with pme grid 144 144 64, coulomb cutoff 1.200: 95.2 M-cycles
  432. step 1160: timed with pme grid 128 128 64, coulomb cutoff 1.252: 86.5 M-cycles
  433. step 1240: timed with pme grid 128 128 60, coulomb cutoff 1.269: 86.2 M-cycles
  434. step 1320: timed with pme grid 120 120 60, coulomb cutoff 1.336: 89.6 M-cycles
  435. step 1400: timed with pme grid 120 120 56, coulomb cutoff 1.359: 81.6 M-cycles
  436. step 1480: timed with pme grid 112 112 56, coulomb cutoff 1.431: 84.1 M-cycles
  437. step 1560: timed with pme grid 112 112 52, coulomb cutoff 1.464: 83.7 M-cycles
  438. step 1640: timed with pme grid 108 108 52, coulomb cutoff 1.484: 86.9 M-cycles
  439. step 1720: timed with pme grid 128 128 64, coulomb cutoff 1.252: 83.9 M-cycles
  440. step 1800: timed with pme grid 128 128 60, coulomb cutoff 1.269: 91.7 M-cycles
  441. step 1880: timed with pme grid 120 120 60, coulomb cutoff 1.336: 83.9 M-cycles
  442. step 1960: timed with pme grid 120 120 56, coulomb cutoff 1.359: 84.9 M-cycles
  443. DD step 1999 vol min/aver 0.835 load imb.: force 2.6% pme mesh/force 2.234
  444.  
  445. Step Time Lambda
  446. 2000 4.00000 0.00000
  447.  
  448. Energies (kJ/mol)
  449. Bond U-B Proper Dih. Improper Dih. LJ-14
  450. 5.16113e+04 2.87607e+05 1.51005e+05 1.36877e+03 3.17262e+04
  451. Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip. Potential
  452. -4.54477e+05 -3.46807e+04 -1.42667e+06 4.69441e+03 -1.38781e+06
  453. Kinetic En. Total Energy Temperature Pressure (bar) Constr. rmsd
  454. 5.74593e+05 -8.13222e+05 3.03926e+02 -3.75786e+01 4.22066e-06
  455.  
  456. step 2040: timed with pme grid 112 112 56, coulomb cutoff 1.431: 81.7 M-cycles
  457. step 2040: the domain decompostion limits the PME load balancing to a coulomb cut-off of 1.431
  458. step 2120: timed with pme grid 144 144 64, coulomb cutoff 1.200: 90.4 M-cycles
  459. step 2200: timed with pme grid 128 128 64, coulomb cutoff 1.252: 85.8 M-cycles
  460. step 2280: timed with pme grid 128 128 60, coulomb cutoff 1.269: 88.2 M-cycles
  461. step 2360: timed with pme grid 120 120 60, coulomb cutoff 1.336: 84.2 M-cycles
  462. step 2440: timed with pme grid 120 120 56, coulomb cutoff 1.359: 84.6 M-cycles
  463. step 2520: timed with pme grid 112 112 56, coulomb cutoff 1.431: 80.1 M-cycles
  464. step 2600: timed with pme grid 128 128 64, coulomb cutoff 1.252: 98.5 M-cycles
  465. step 2680: timed with pme grid 128 128 60, coulomb cutoff 1.269: 105.0 M-cycles
  466. step 2760: timed with pme grid 120 120 60, coulomb cutoff 1.336: 87.3 M-cycles
  467. step 2840: timed with pme grid 120 120 56, coulomb cutoff 1.359: 87.6 M-cycles
  468. step 2920: timed with pme grid 112 112 56, coulomb cutoff 1.431: 79.0 M-cycles
  469. optimal pme grid 112 112 56, coulomb cutoff 1.431
  470. DD step 2999 vol min/aver 0.837 load imb.: force 2.3% pme mesh/force 2.224
  471.  
  472. Step Time Lambda
  473. 3000 6.00000 0.00000
  474.  
  475. Energies (kJ/mol)
  476. Bond U-B Proper Dih. Improper Dih. LJ-14
  477. 5.16298e+04 2.87335e+05 1.50838e+05 1.40316e+03 3.18678e+04
  478. Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip. Potential
  479. -4.55062e+05 -3.76964e+04 -1.42403e+06 4.58932e+03 -1.38913e+06
  480. Kinetic En. Total Energy Temperature Pressure (bar) Constr. rmsd
  481. 5.73322e+05 -8.15805e+05 3.03253e+02 -1.92705e+02 4.18539e-06
  482.  
  483. DD step 3999 vol min/aver 0.850 load imb.: force 3.5% pme mesh/force 2.348
  484.  
  485. Step Time Lambda
  486. 4000 8.00000 0.00000
  487.  
  488. Energies (kJ/mol)
  489. Bond U-B Proper Dih. Improper Dih. LJ-14
  490. 5.14032e+04 2.87969e+05 1.51062e+05 1.43241e+03 3.18387e+04
  491. Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip. Potential
  492. -4.55869e+05 -3.52871e+04 -1.42459e+06 4.64457e+03 -1.38740e+06
  493. Kinetic En. Total Energy Temperature Pressure (bar) Constr. rmsd
  494. 5.71147e+05 -8.16250e+05 3.02103e+02 9.11019e+01 4.24142e-06
  495.  
  496.  
  497. step 5000: resetting all time and cycle counters
  498.  
  499. Restarted time on rank 0 Thu Apr 30 12:24:30 2015
  500. DD step 4999 vol min/aver 0.846 load imb.: force 3.1% pme mesh/force 2.283
  501.  
  502. Step Time Lambda
  503. 5000 10.00000 0.00000
  504.  
  505. Energies (kJ/mol)
  506. Bond U-B Proper Dih. Improper Dih. LJ-14
  507. 5.27171e+04 2.87086e+05 1.50683e+05 1.42820e+03 3.17165e+04
  508. Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip. Potential
  509. -4.55266e+05 -3.59058e+04 -1.42348e+06 4.59201e+03 -1.38643e+06
  510. Kinetic En. Total Energy Temperature Pressure (bar) Constr. rmsd
  511. 5.70754e+05 -8.15674e+05 3.01895e+02 -6.53456e+01 4.25289e-06
  512.  
  513. DD step 5999 vol min/aver 0.835 load imb.: force 1.9% pme mesh/force 2.376
  514.  
  515. Step Time Lambda
  516. 6000 12.00000 0.00000
  517.  
  518. Energies (kJ/mol)
  519. Bond U-B Proper Dih. Improper Dih. LJ-14
  520. 5.21181e+04 2.88148e+05 1.51161e+05 1.45073e+03 3.17385e+04
  521. Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip. Potential
  522. -4.54968e+05 -3.55334e+04 -1.42569e+06 4.54206e+03 -1.38703e+06
  523. Kinetic En. Total Energy Temperature Pressure (bar) Constr. rmsd
  524. 5.73379e+05 -8.13655e+05 3.03284e+02 -1.19525e+02 4.20904e-06
  525.  
  526. DD step 6999 vol min/aver 0.816 load imb.: force 2.3% pme mesh/force 2.392
  527.  
  528. Step Time Lambda
  529. 7000 14.00000 0.00000
  530.  
  531. Energies (kJ/mol)
  532. Bond U-B Proper Dih. Improper Dih. LJ-14
  533. 5.19565e+04 2.87655e+05 1.50761e+05 1.33519e+03 3.18436e+04
  534. Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip. Potential
  535. -4.56036e+05 -3.51603e+04 -1.42398e+06 4.49858e+03 -1.38713e+06
  536. Kinetic En. Total Energy Temperature Pressure (bar) Constr. rmsd
  537. 5.74061e+05 -8.13071e+05 3.03644e+02 6.96304e+01 4.22401e-06
  538.  
  539. DD step 7999 vol min/aver 0.832 load imb.: force 2.1% pme mesh/force 2.455
  540.  
  541. Step Time Lambda
  542. 8000 16.00000 0.00000
  543.  
  544. Energies (kJ/mol)
  545. Bond U-B Proper Dih. Improper Dih. LJ-14
  546. 5.18719e+04 2.87286e+05 1.51555e+05 1.37778e+03 3.17051e+04
  547. Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip. Potential
  548. -4.56243e+05 -3.61634e+04 -1.42415e+06 4.69818e+03 -1.38806e+06
  549. Kinetic En. Total Energy Temperature Pressure (bar) Constr. rmsd
  550. 5.73987e+05 -8.14077e+05 3.03605e+02 2.64174e+01 4.20138e-06
  551.  
  552. DD step 8999 vol min/aver 0.823 load imb.: force 2.5% pme mesh/force 2.470
  553.  
  554. Step Time Lambda
  555. 9000 18.00000 0.00000
  556.  
  557. Energies (kJ/mol)
  558. Bond U-B Proper Dih. Improper Dih. LJ-14
  559. 5.16139e+04 2.87526e+05 1.51142e+05 1.39203e+03 3.18259e+04
  560. Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip. Potential
  561. -4.55993e+05 -3.64471e+04 -1.42463e+06 4.51628e+03 -1.38906e+06
  562. Kinetic En. Total Energy Temperature Pressure (bar) Constr. rmsd
  563. 5.71761e+05 -8.17295e+05 3.02428e+02 9.60366e+01 4.16279e-06
  564.  
  565. DD step 9999 vol min/aver 0.806 load imb.: force 1.6% pme mesh/force 2.395
  566.  
  567. Step Time Lambda
  568. 10000 20.00000 0.00000
  569.  
  570. Energies (kJ/mol)
  571. Bond U-B Proper Dih. Improper Dih. LJ-14
  572. 5.15769e+04 2.85892e+05 1.50943e+05 1.39197e+03 3.17989e+04
  573. Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip. Potential
  574. -4.56130e+05 -3.39380e+04 -1.43001e+06 4.60777e+03 -1.39386e+06
  575. Kinetic En. Total Energy Temperature Pressure (bar) Constr. rmsd
  576. 5.73135e+05 -8.20729e+05 3.03155e+02 1.75678e+02 4.22713e-06
  577.  
  578. <====== ############### ==>
  579. <==== A V E R A G E S ====>
  580. <== ############### ======>
  581.  
  582. Statistics over 10001 steps using 101 frames
  583.  
  584. Energies (kJ/mol)
  585. Bond U-B Proper Dih. Improper Dih. LJ-14
  586. 5.19270e+04 2.87532e+05 1.50784e+05 1.41552e+03 3.17099e+04
  587. Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip. Potential
  588. -4.55452e+05 -3.57340e+04 -1.42601e+06 5.05284e+03 -1.38878e+06
  589. Kinetic En. Total Energy Temperature Pressure (bar) Constr. rmsd
  590. 5.73150e+05 -8.15629e+05 3.03162e+02 -1.10950e+01 0.00000e+00
  591.  
  592. Box-X Box-Y Box-Z
  593. 1.60015e+01 1.60015e+01 7.63570e+00
  594.  
  595. Total Virial (kJ/mol)
  596. 1.86825e+05 4.87536e+02 -3.95055e+02
  597. 4.90755e+02 1.87159e+05 -1.05574e+02
  598. -4.00751e+02 -9.74001e+01 2.01130e+05
  599.  
  600. Pressure (bar)
  601. -4.71925e+00 -9.89686e+00 8.51045e+00
  602. -9.95177e+00 -1.29647e+01 3.97039e+00
  603. 8.60705e+00 3.83140e+00 -1.56010e+01
  604.  
  605. T-NPROT T-SOL_ION
  606. 3.03164e+02 3.03161e+02
  607.  
  608.  
  609. P P - P M E L O A D B A L A N C I N G
  610.  
  611. NOTE: The PP/PME load balancing was limited by the domain decompostion,
  612. you might not have reached a good load balance.
  613. Try different mdrun -dd settings or lower the -dds value.
  614.  
  615. PP/PME load balancing changed the cut-off and PME settings:
  616.            particle-particle                    PME
  617.             rcoulomb  rlist            grid      spacing   1/beta
  618.    initial  1.200 nm  1.239 nm     144 144  64   0.119 nm  0.384 nm
  619.    final    1.431 nm  1.470 nm     112 112  56   0.143 nm  0.458 nm
  620.  cost-ratio           1.67                 0.53
  621. (note that these numbers concern only part of the total PP and PME load)
  622.  
  623.  
  624. M E G A - F L O P S A C C O U N T I N G
  625.  
  626. NB=Group-cutoff nonbonded kernels NxN=N-by-N cluster Verlet kernels
  627. RF=Reaction-Field VdW=Van der Waals QSTab=quadratic-spline table
  628. W3=SPC/TIP3p W4=TIP4p (single or pairs)
  629. V&F=Potential and force V=Potential only F=Force only
  630.  
  631. Computing: M-Number M-Flops % Flops
  632. -----------------------------------------------------------------------------
  633. NB VdW [V&F] 1088.367630 1088.368 0.0
  634. Pair Search distance check 5466.680512 49200.125 0.0
  635. NxN Ewald Elec. + LJ [F] 1826845.484608 142493947.799 96.6
  636. NxN Ewald Elec. + LJ [V&F] 18822.293824 2428075.903 1.6
  637. 1,4 nonbonded interactions 1575.315000 141778.350 0.1
  638. Calc Weights 3096.844245 111486.393 0.1
  639. Spread Q Bspline 66066.010560 132132.021 0.1
  640. Gather F Bspline 66066.010560 396396.063 0.3
  641. 3D-FFT 136460.296602 1091682.373 0.7
  642. Solve PME 1003.720704 64238.125 0.0
  643. Reset In Box 26.008290 78.025 0.0
  644. CG-CoM 26.008290 78.025 0.0
  645. Bonds 212.942580 12563.612 0.0
  646. Propers 1815.312990 415706.675 0.3
  647. Impropers 5.901180 1227.445 0.0
  648. Virial 53.255925 958.607 0.0
  649. Stop-CM 10.527165 105.272 0.0
  650. Calc-Ekin 103.413915 2792.176 0.0
  651. Lincs 387.745759 23264.746 0.0
  652. Lincs-Mat 2798.666532 11194.666 0.0
  653. Constraint-V 1309.868960 10478.952 0.0
  654. Constraint-Vir 46.281189 1110.749 0.0
  655. Settle 178.125814 57534.638 0.0
  656. -----------------------------------------------------------------------------
  657. Total 147447119.106 100.0
  658. -----------------------------------------------------------------------------
  659.  
  660.  
  661. D O M A I N D E C O M P O S I T I O N S T A T I S T I C S
  662.  
  663. av. #atoms communicated per step for force: 2 x 597133.5
  664. av. #atoms communicated per step for LINCS: 2 x 25401.9
  665.  
  666. Average load imbalance: 2.6 %
  667. Part of the total run time spent waiting due to load imbalance: 0.7 %
  668. Steps where the load balancing was limited by -rdd, -rcon and/or -dds: X 0 % Y 0 % Z 0 %
  669. Average PME mesh/force load: 2.371
  670. Part of the total run time spent waiting due to PP/PME imbalance: 22.3 %
  671.  
  672. NOTE: 22.3 % performance was lost because the PME ranks
  673. had more work to do than the PP ranks.
  674. You might want to increase the number of PME ranks
  675. or increase the cut-off and the grid spacing.
  676.  
  677.  
  678. R E A L C Y C L E A N D T I M E A C C O U N T I N G
  679.  
  680. On 128 MPI ranks doing PP, each using 4 OpenMP threads, and
  681. on 128 MPI ranks doing PME, each using 4 OpenMP threads
  682.  
  683. Computing: Num Num Call Wall time Giga-Cycles
  684. Ranks Threads Count (s) total sum %
  685. -----------------------------------------------------------------------------
  686. Domain decomp. 128 4 126 0.469 528.272 2.4
  687. DD comm. load 128 4 126 0.004 4.945 0.0
  688. DD comm. bounds 128 4 126 0.021 23.422 0.1
  689. Send X to PME 128 4 5001 0.051 57.293 0.3
  690. Neighbor search 128 4 126 0.227 256.118 1.2
  691. Launch GPU ops. 128 4 10002 0.396 445.524 2.1
  692. Comm. coord. 128 4 4875 1.011 1138.874 5.2
  693. Force 128 4 5001 1.564 1761.513 8.1
  694. Wait + Comm. F 128 4 5001 0.934 1052.349 4.9
  695. PME mesh * 128 4 5001 7.112 8011.553 36.9
  696. PME wait for PP * 2.519 2836.985 13.1
  697. Wait + Recv. PME F 128 4 5001 2.731 3076.330 14.2
  698. Wait GPU nonlocal 128 4 5001 0.027 30.401 0.1
  699. Wait GPU local 128 4 5001 0.021 23.784 0.1
  700. NB X/F buffer ops. 128 4 19752 0.448 504.248 2.3
  701. Write traj. 128 4 2 0.037 41.230 0.2
  702. Update 128 4 5001 0.273 307.857 1.4
  703. Constraints 128 4 5001 0.961 1082.961 5.0
  704. Comm. energies 128 4 251 0.290 326.762 1.5
  705. Rest 0.166 186.669 0.9
  706. -----------------------------------------------------------------------------
  707. Total 9.631 21697.106 100.0
  708. -----------------------------------------------------------------------------
  709. (*) Note that with separate PME ranks, the walltime column actually sums to
  710. twice the total reported, but the cycle count total and % are correct.
  711. -----------------------------------------------------------------------------
  712. Breakdown of PME mesh computation
  713. -----------------------------------------------------------------------------
  714. PME redist. X/F 128 4 10002 1.890 2128.461 9.8
  715. PME spread/gather 128 4 10002 1.706 1921.464 8.9
  716. PME 3D-FFT 128 4 10002 0.686 772.746 3.6
  717. PME 3D-FFT Comm. 128 4 20004 2.752 3099.967 14.3
  718. PME solve Elec 128 4 5001 0.038 42.752 0.2
  719. -----------------------------------------------------------------------------
  720.  
  721. Core t (s) Wall t (s) (%)
  722. Time: 9593.355 9.631 99608.0
  723. (ns/day) (hour/ns)
  724. Performance: 89.727 0.267
  725. Finished mdrun on rank 0 Thu Apr 30 12:24:40 2015