- Log file opened on Thu Apr 30 12:00:02 2015
- Host: nid01946 pid: 23408 rank ID: 0 number of ranks: 512
- GROMACS: gmx mdrun, VERSION 5.0.2
- GROMACS is written by:
- Emile Apol Rossen Apostolov Herman J.C. Berendsen Par Bjelkmar
- Aldert van Buuren Rudi van Drunen Anton Feenstra Sebastian Fritsch
- Gerrit Groenhof Christoph Junghans Peter Kasson Carsten Kutzner
- Per Larsson Justin A. Lemkul Magnus Lundborg Pieter Meulenhoff
- Erik Marklund Teemu Murtola Szilard Pall Sander Pronk
- Roland Schulz Alexey Shvetsov Michael Shirts Alfons Sijbers
- Peter Tieleman Christian Wennberg Maarten Wolf
- and the project leaders:
- Mark Abraham, Berk Hess, Erik Lindahl, and David van der Spoel
- Copyright (c) 1991-2000, University of Groningen, The Netherlands.
- Copyright (c) 2001-2014, The GROMACS development team at
- Uppsala University, Stockholm University and
- the Royal Institute of Technology, Sweden.
- check out http://www.gromacs.org for more information.
- GROMACS is free software; you can redistribute it and/or modify it
- under the terms of the GNU Lesser General Public License
- as published by the Free Software Foundation; either version 2.1
- of the License, or (at your option) any later version.
- GROMACS: gmx mdrun, VERSION 5.0.2
- Executable: mdrun_mpi
- Library dir: /sw/xk6/gromacs/5.0.2/cle5.2_gnu4.8.2/share/gromacs/top
- Command line:
- mdrun_mpi -gpu_id 000 -npme 128 -dlb yes -pin on -resethway -noconfout -v -s opt.tpr -deffnm test
- Gromacs version: VERSION 5.0.2
- Precision: single
- Memory model: 64 bit
- MPI library: MPI
- OpenMP support: enabled
- GPU support: enabled
- invsqrt routine: gmx_software_invsqrt(x)
- SIMD instructions: AVX_128_FMA
- FFT library: commercial-fftw-3.3.4-fma-sse2-avx
- RDTSCP usage: disabled
- C++11 compilation: disabled
- TNG support: enabled
- Tracing support: disabled
- Built on: Thu Mar 12 18:27:12 EDT 2015
- Built by: ff1@titan-ext8 [CMAKE]
- Build OS/arch: Linux 3.0.101-0.46-default x86_64
- Build CPU vendor: AuthenticAMD
- Build CPU brand: AMD Opteron(tm) Processor 6140
- Build CPU family: 16 Model: 9 Stepping: 1
- Build CPU features: apic clfsh cmov cx8 cx16 htt lahf_lm misalignsse mmx msr nonstop_tsc pdpe1gb popcnt pse rdtscp sse2 sse3 sse4a
- C compiler: /opt/cray/craype/2.2.1/bin/cc GNU 4.8.2
- C compiler flags: -mavx -mfma4 -mxop -Wno-maybe-uninitialized -Wextra -Wno-missing-field-initializers -Wno-sign-compare -Wpointer-arith -Wall -Wno-unused -Wunused-value -Wunused-parameter -O3 -DNDEBUG -fomit-frame-pointer -funroll-all-loops -fexcess-precision=fast -Wno-array-bounds
- C++ compiler: /opt/cray/craype/2.2.1/bin/CC GNU 4.8.2
- C++ compiler flags: -mavx -mfma4 -mxop -Wextra -Wno-missing-field-initializers -Wpointer-arith -Wall -Wno-unused-function -O3 -DNDEBUG -fomit-frame-pointer -funroll-all-loops -fexcess-precision=fast -Wno-array-bounds
- Boost version: 1.55.0 (internal)
- CUDA compiler: /opt/nvidia/cudatoolkit/5.5.51-1.0502.9594.3.1/bin/nvcc nvcc: NVIDIA (R) Cuda compiler driver;Copyright (c) 2005-2013 NVIDIA Corporation;Built on Thu_Mar__6_02:21:19_PST_2014;Cuda compilation tools, release 5.5, V5.5.0
- CUDA compiler flags:-gencode;arch=compute_20,code=sm_20;-gencode;arch=compute_20,code=sm_21;-gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_35,code=compute_35;-use_fast_math;; ;-mavx;-mfma4;-mxop;-Wextra;-Wno-missing-field-initializers;-Wpointer-arith;-Wall;-Wno-unused-function;-O3;-DNDEBUG;-fomit-frame-pointer;-funroll-all-loops;-fexcess-precision=fast;-Wno-array-bounds;
- CUDA driver: 5.50
- CUDA runtime: 5.50
- ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
- B. Hess and C. Kutzner and D. van der Spoel and E. Lindahl
- GROMACS 4: Algorithms for highly efficient, load-balanced, and scalable
- molecular simulation
- J. Chem. Theory Comput. 4 (2008) pp. 435-447
- -------- -------- --- Thank You --- -------- --------
- ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
- D. van der Spoel, E. Lindahl, B. Hess, G. Groenhof, A. E. Mark and H. J. C.
- Berendsen
- GROMACS: Fast, Flexible and Free
- J. Comp. Chem. 26 (2005) pp. 1701-1719
- -------- -------- --- Thank You --- -------- --------
- ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
- E. Lindahl and B. Hess and D. van der Spoel
- GROMACS 3.0: A package for molecular simulation and trajectory analysis
- J. Mol. Mod. 7 (2001) pp. 306-317
- -------- -------- --- Thank You --- -------- --------
- ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
- H. J. C. Berendsen, D. van der Spoel and R. van Drunen
- GROMACS: A message-passing parallel molecular dynamics implementation
- Comp. Phys. Comm. 91 (1995) pp. 43-56
- -------- -------- --- Thank You --- -------- --------
- Number of hardware threads detected (16) does not match the number reported by OpenMP (1).
- Consider setting the launch configuration manually!
- Changing nstlist from 20 to 40, rlist from 1.2 to 1.239
- Input Parameters:
- integrator = md
- tinit = 0
- dt = 0.002
- nsteps = 10000
- init-step = 0
- simulation-part = 1
- comm-mode = Linear
- nstcomm = 100
- bd-fric = 0
- ld-seed = 60975668
- emtol = 10
- emstep = 0.01
- niter = 20
- fcstep = 0
- nstcgsteep = 1000
- nbfgscorr = 10
- rtpi = 0.05
- nstxout = 5000
- nstvout = 5000
- nstfout = 5000
- nstlog = 1000
- nstcalcenergy = 100
- nstenergy = 1000
- nstxout-compressed = 0
- compressed-x-precision = 1000
- cutoff-scheme = Verlet
- nstlist = 40
- ns-type = Grid
- pbc = xyz
- periodic-molecules = FALSE
- verlet-buffer-tolerance = 0.005
- rlist = 1.239
- rlistlong = 1.239
- nstcalclr = 20
- coulombtype = PME
- coulomb-modifier = Potential-shift
- rcoulomb-switch = 0
- rcoulomb = 1.2
- epsilon-r = 1
- epsilon-rf = inf
- vdw-type = Cut-off
- vdw-modifier = Force-switch
- rvdw-switch = 1
- rvdw = 1.2
- DispCorr = No
- table-extension = 1
- fourierspacing = 0.12
- fourier-nx = 144
- fourier-ny = 144
- fourier-nz = 64
- pme-order = 4
- ewald-rtol = 1e-05
- ewald-rtol-lj = 0.001
- lj-pme-comb-rule = Geometric
- ewald-geometry = 0
- epsilon-surface = 0
- implicit-solvent = No
- gb-algorithm = Still
- nstgbradii = 1
- rgbradii = 1
- gb-epsilon-solvent = 80
- gb-saltconc = 0
- gb-obc-alpha = 1
- gb-obc-beta = 0.8
- gb-obc-gamma = 4.85
- gb-dielectric-offset = 0.009
- sa-algorithm = Ace-approximation
- sa-surface-tension = 2.05016
- tcoupl = Nose-Hoover
- nsttcouple = 20
- nh-chain-length = 1
- print-nose-hoover-chain-variables = FALSE
- pcoupl = Parrinello-Rahman
- pcoupltype = Semiisotropic
- nstpcouple = 20
- tau-p = 5
- compressibility (3x3):
- compressibility[ 0]={ 4.50000e-05, 0.00000e+00, 0.00000e+00}
- compressibility[ 1]={ 0.00000e+00, 4.50000e-05, 0.00000e+00}
- compressibility[ 2]={ 0.00000e+00, 0.00000e+00, 4.50000e-05}
- ref-p (3x3):
- ref-p[ 0]={ 1.00000e+00, 0.00000e+00, 0.00000e+00}
- ref-p[ 1]={ 0.00000e+00, 1.00000e+00, 0.00000e+00}
- ref-p[ 2]={ 0.00000e+00, 0.00000e+00, 1.00000e+00}
- refcoord-scaling = COM
- posres-com (3):
- posres-com[0]= 0.00000e+00
- posres-com[1]= 0.00000e+00
- posres-com[2]= 0.00000e+00
- posres-comB (3):
- posres-comB[0]= 0.00000e+00
- posres-comB[1]= 0.00000e+00
- posres-comB[2]= 0.00000e+00
- QMMM = FALSE
- QMconstraints = 0
- QMMMscheme = 0
- MMChargeScaleFactor = 1
- qm-opts:
- ngQM = 0
- constraint-algorithm = Lincs
- continuation = TRUE
- Shake-SOR = FALSE
- shake-tol = 0.0001
- lincs-order = 4
- lincs-iter = 1
- lincs-warnangle = 30
- nwall = 0
- wall-type = 9-3
- wall-r-linpot = -1
- wall-atomtype[0] = -1
- wall-atomtype[1] = -1
- wall-density[0] = 0
- wall-density[1] = 0
- wall-ewald-zfac = 3
- pull = no
- rotation = FALSE
- interactiveMD = FALSE
- disre = No
- disre-weighting = Conservative
- disre-mixed = FALSE
- dr-fc = 1000
- dr-tau = 0
- nstdisreout = 100
- orire-fc = 0
- orire-tau = 0
- nstorireout = 100
- free-energy = no
- cos-acceleration = 0
- deform (3x3):
- deform[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
- deform[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
- deform[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
- simulated-tempering = FALSE
- E-x:
- n = 0
- E-xt:
- n = 0
- E-y:
- n = 0
- E-yt:
- n = 0
- E-z:
- n = 0
- E-zt:
- n = 0
- swapcoords = no
- adress = FALSE
- userint1 = 0
- userint2 = 0
- userint3 = 0
- userint4 = 0
- userreal1 = 0
- userreal2 = 0
- userreal3 = 0
- userreal4 = 0
- grpopts:
- nrdf: 261777 192987
- ref-t: 303.15 303.15
- tau-t: 1 1
- annealing: No No
- annealing-npoints: 0 0
- acc: 0 0 0
- nfreeze: N N N
- energygrp-flags[ 0]: 0
- Initializing Domain Decomposition on 512 ranks
- Dynamic load balancing: yes
- Will sort the charge groups at every domain (re)decomposition
- Initial maximum inter charge-group distances:
- two-body bonded interactions: 0.420 nm, LJ-14, atoms 42821 42830
- multi-body bonded interactions: 0.420 nm, Proper Dih., atoms 42821 42830
- Minimum cell size due to bonded interactions: 0.462 nm
- Maximum distance for 5 constraints, at 120 deg. angles, all-trans: 0.222 nm
- Estimated maximum distance required for P-LINCS: 0.222 nm
- Using 128 separate PME ranks, per user request
- Scaling the initial minimum size with 1/0.8 (option -dds) = 1.25
- Optimizing the DD grid for 384 cells with a minimum initial size of 0.578 nm
- The maximum allowed number of cells is: X 27 Y 27 Z 13
- Domain decomposition grid 16 x 8 x 3, separate PME ranks 128
- PME domain decomposition: 16 x 8 x 1
- Interleaving PP and PME ranks
- This rank does only particle-particle work.
- Domain decomposition rank 0, coordinates 0 0 0
- Using two step summing over 128 groups of on average 3.0 ranks
- Using 512 MPI processes
- Using 2 OpenMP threads per MPI process
- Detecting CPU SIMD instructions.
- Present hardware specification:
- Vendor: AuthenticAMD
- Brand: AMD Opteron(TM) Processor 6274
- Family: 21 Model: 1 Stepping: 2
- Features: aes apic avx clfsh cmov cx8 cx16 fma4 htt lahf_lm misalignsse mmx msr nonstop_tsc pclmuldq pdpe1gb popcnt pse rdtscp sse2 sse3 sse4a sse4.1 sse4.2 ssse3 xop
- SIMD instructions most likely to fit this hardware: AVX_128_FMA
- SIMD instructions selected at GROMACS compile time: AVX_128_FMA
- The current CPU can measure timings more accurately than the code in
- mdrun_mpi was configured to use. This might affect your simulation
- speed as accurate timings are needed for load-balancing.
- Please consider rebuilding mdrun_mpi with the GMX_USE_RDTSCP=OFF CMake option.
- 1 GPU detected on host nid01946:
- #0: NVIDIA Tesla K20X, compute cap.: 3.5, ECC: yes, stat: compatible
- 1 GPU user-selected for this run.
- Mapping of GPUs to the 3 PP ranks in this node: #0, #0, #0
- NOTE: You assigned GPUs to multiple MPI processes.
- Will do PME sum in reciprocal space for electrostatic interactions.
- ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
- U. Essmann, L. Perera, M. L. Berkowitz, T. Darden, H. Lee and L. G. Pedersen
- A smooth particle mesh Ewald method
- J. Chem. Phys. 103 (1995) pp. 8577-8592
- -------- -------- --- Thank You --- -------- --------
- Will do ordinary reciprocal space Ewald sum.
- Using a Gaussian width (1/beta) of 0.384195 nm for Ewald
- Cut-off's: NS: 1.239 Coulomb: 1.2 LJ: 1.2
- System total charge: 0.000
- Generated table with 1119 data points for Ewald.
- Tabscale = 500 points/nm
- Generated table with 1119 data points for LJ6Shift.
- Tabscale = 500 points/nm
- Generated table with 1119 data points for LJ12Shift.
- Tabscale = 500 points/nm
- Generated table with 1119 data points for 1-4 COUL.
- Tabscale = 500 points/nm
- Generated table with 1119 data points for 1-4 LJ6.
- Tabscale = 500 points/nm
- Generated table with 1119 data points for 1-4 LJ12.
- Tabscale = 500 points/nm
- Using CUDA 8x8 non-bonded kernels
- Potential shift: LJ r^-12: -2.648e-01 r^-6: -5.349e-01, Ewald -1.000e-05
- Initialized non-bonded Ewald correction tables, spacing: 7.82e-04 size: 1536
- Overriding thread affinity set outside mdrun_mpi
- Pinning threads with an auto-selected logical core stride of 1
- Initializing Parallel LINear Constraint Solver
- ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
- B. Hess
- P-LINCS: A Parallel Linear Constraint Solver for molecular simulation
- J. Chem. Theory Comput. 4 (2008) pp. 116-122
- -------- -------- --- Thank You --- -------- --------
- The number of constraints is 67980
- There are inter charge-group constraints,
- will communicate selected coordinates each lincs iteration
- ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
- S. Miyamoto and P. A. Kollman
- SETTLE: An Analytical Version of the SHAKE and RATTLE Algorithms for Rigid
- Water Models
- J. Comp. Chem. 13 (1992) pp. 952-962
- -------- -------- --- Thank You --- -------- --------
- Linking all bonded interactions to atoms
- There are 739685 inter charge-group exclusions,
- will use an extra communication step for exclusion forces for PME
- The maximum number of communication pulses is: X 2 Y 2 Z 2
- The minimum size for domain decomposition cells is 0.711 nm
- The requested allowed shrink of DD cells (option -dds) is: 0.80
- The allowed shrink of domain decomposition cells is: X 0.71 Y 0.35 Z 0.28
- The maximum allowed distance for charge groups involved in interactions is:
- non-bonded interactions 1.239 nm
- two-body bonded interactions (-rdd) 1.239 nm
- multi-body bonded interactions (-rdd) 0.711 nm
- atoms separated by up to 5 constraints (-rcon) 0.711 nm
- Making 3D domain decomposition grid 16 x 8 x 3, home cell index 0 0 0
- Center of mass motion removal mode is Linear
- We have the following groups for center of mass motion removal:
- 0: NPROT
- 1: SOL_ION
- There are: 206415 Atoms
- Charge group distribution at step 0: 544 531 537 523 548 516 534 521 545 520 550 551 543 515 527 540 564 553 527 555 526 557 561 518 526 552 549 538 536 524 514 557 520 533 537 565 537 533 522 530 563 539 541 519 542 530 535 520 528 547 536 510 526 529 525 557 543 534 541 532 525 580 552 563 550 540 520 579 528 542 560 551 565 561 530 529 539 546 522 532 556 523 560 501 555 549 546 531 542 517 497 557 530 549 547 538 523 519 548 560 576 533 523 540 525 533 523 507 540 506 522 532 539 505 554 512 516 531 569 540 552 528 550 511 541 522 543 546 534 540 580 556 563 519 521 548 530 535 534 557 537 562 562 520 555 562 537 553 564 499 556 547 533 539 535 545 537 555 535 545 541 510 518 568 521 551 552 556 530 549 532 536 561 536 550 534 517 540 546 544 522 542 521 524 538 529 539 541 547 544 533 533 529 548 531 539 547 547 500 563 525 508 532 530 544 530 549 528 543 529 529 554 525 526 550 548 532 544 518 526 517 542 532 550 538 538 556 542 539 514 552 526 536 529 549 520 529 520 575 534 559 547 543 544 531 542 503 568 531 533 543 533 533 527 554 520 531 549 534 548 514 523 536 517 549 545 537 548 549 525 543 529 533 541 559 517 554 572 542 519 527 525 532 522 551 537 559 517 543 566 533 519 544 528 557 570 526 539 540 540 563 528 540 549 555 540 540 535 505 561 547 542 518 512 522 517 556 545 521 549 535 517 525 525 544 534 531 527 548 538 534 542 512 519 500 543 508 572 518 558 548 520 525 553 537 532 549 559 539 507 547 523 531 523 549 530 523 550 547 549 527 567 546 523 554 533 532 574 503 528 562 536 533 544 540 528 560 539 533 528 531 553 544 512
- Initial temperature: 303.008 K
- Started mdrun on rank 0 Thu Apr 30 12:00:03 2015
- Step Time Lambda
- 0 0.00000 0.00000
- Energies (kJ/mol)
- Bond U-B Proper Dih. Improper Dih. LJ-14
- 5.27858e+04 2.89239e+05 1.51310e+05 1.50125e+03 3.17014e+04
- Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip. Potential
- -4.55645e+05 -3.43454e+04 -1.43186e+06 8.83146e+03 -1.38648e+06
- Kinetic En. Total Energy Temperature Pressure (bar) Constr. rmsd
- 5.74416e+05 -8.12065e+05 3.03832e+02 -5.33641e+01 4.19699e-06
- DD step 39 vol min/aver 1.000 load imb.: force 37.5% pme mesh/force 11.617
- step 120: timed with pme grid 144 144 64, coulomb cutoff 1.200: 144.5 M-cycles
- step 200: timed with pme grid 128 128 60, coulomb cutoff 1.269: 150.6 M-cycles
- step 280: timed with pme grid 112 112 56, coulomb cutoff 1.431: 176.5 M-cycles
- step 360: timed with pme grid 144 144 64, coulomb cutoff 1.200: 143.1 M-cycles
- step 440: timed with pme grid 128 128 64, coulomb cutoff 1.252: 144.8 M-cycles
- step 520: timed with pme grid 128 128 60, coulomb cutoff 1.269: 138.3 M-cycles
- step 600: timed with pme grid 120 120 60, coulomb cutoff 1.336: 139.4 M-cycles
- step 680: timed with pme grid 120 120 56, coulomb cutoff 1.359: 131.8 M-cycles
- step 760: timed with pme grid 144 144 64, coulomb cutoff 1.200: 140.3 M-cycles
- step 840: timed with pme grid 128 128 64, coulomb cutoff 1.252: 135.8 M-cycles
- step 920: timed with pme grid 128 128 60, coulomb cutoff 1.269: 151.7 M-cycles
- step 1000: timed with pme grid 120 120 60, coulomb cutoff 1.336: 149.3 M-cycles
- DD load balancing is limited by minimum cell size in dimension X
- DD step 999 vol min/aver 0.326! load imb.: force 14.0% pme mesh/force 2.182
- Step Time Lambda
- 1000 2.00000 0.00000
- Energies (kJ/mol)
- Bond U-B Proper Dih. Improper Dih. LJ-14
- 5.23176e+04 2.88001e+05 1.51069e+05 1.45179e+03 3.17308e+04
- Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip. Potential
- -4.55046e+05 -3.59188e+04 -1.42370e+06 5.58662e+03 -1.38451e+06
- Kinetic En. Total Energy Temperature Pressure (bar) Constr. rmsd
- 5.72707e+05 -8.11800e+05 3.02928e+02 -1.16064e+02 4.21434e-06
- step 1080: timed with pme grid 120 120 56, coulomb cutoff 1.359: 135.7 M-cycles
- step 1160: timed with pme grid 144 144 64, coulomb cutoff 1.200: 136.7 M-cycles
- step 1240: timed with pme grid 128 128 64, coulomb cutoff 1.252: 141.2 M-cycles
- step 1320: timed with pme grid 128 128 60, coulomb cutoff 1.269: 156.8 M-cycles
- step 1400: timed with pme grid 120 120 60, coulomb cutoff 1.336: 136.3 M-cycles
- step 1480: timed with pme grid 120 120 56, coulomb cutoff 1.359: 136.4 M-cycles
- step 1560: timed with pme grid 144 144 64, coulomb cutoff 1.200: 134.0 M-cycles
- step 1640: timed with pme grid 128 128 64, coulomb cutoff 1.252: 131.2 M-cycles
- step 1720: timed with pme grid 128 128 60, coulomb cutoff 1.269: 132.5 M-cycles
- step 1800: timed with pme grid 120 120 60, coulomb cutoff 1.336: 139.2 M-cycles
- step 1880: timed with pme grid 120 120 56, coulomb cutoff 1.359: 134.8 M-cycles
- step 1960: timed with pme grid 144 144 64, coulomb cutoff 1.200: 133.2 M-cycles
- DD load balancing is limited by minimum cell size in dimension X Y
- DD step 1999 vol min/aver 0.308! load imb.: force 13.0% pme mesh/force 2.347
- Step Time Lambda
- 2000 4.00000 0.00000
- Energies (kJ/mol)
- Bond U-B Proper Dih. Improper Dih. LJ-14
- 5.18345e+04 2.86220e+05 1.50848e+05 1.47901e+03 3.14844e+04
- Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip. Potential
- -4.56309e+05 -3.76295e+04 -1.42712e+06 7.52841e+03 -1.39166e+06
- Kinetic En. Total Energy Temperature Pressure (bar) Constr. rmsd
- 5.74579e+05 -8.17084e+05 3.03918e+02 -1.07520e+01 4.20935e-06
- step 2040: timed with pme grid 128 128 64, coulomb cutoff 1.252: 134.6 M-cycles
- step 2120: timed with pme grid 128 128 60, coulomb cutoff 1.269: 129.0 M-cycles
- step 2200: timed with pme grid 120 120 60, coulomb cutoff 1.336: 134.1 M-cycles
- step 2280: timed with pme grid 120 120 56, coulomb cutoff 1.359: 149.9 M-cycles
- step 2360: timed with pme grid 144 144 64, coulomb cutoff 1.200: 135.2 M-cycles
- step 2440: timed with pme grid 128 128 64, coulomb cutoff 1.252: 128.3 M-cycles
- step 2520: timed with pme grid 128 128 60, coulomb cutoff 1.269: 128.9 M-cycles
- step 2600: timed with pme grid 120 120 60, coulomb cutoff 1.336: 132.2 M-cycles
- step 2680: timed with pme grid 120 120 56, coulomb cutoff 1.359: 130.5 M-cycles
- step 2760: timed with pme grid 144 144 64, coulomb cutoff 1.200: 130.7 M-cycles
- step 2840: timed with pme grid 128 128 64, coulomb cutoff 1.252: 128.6 M-cycles
- step 2920: timed with pme grid 128 128 60, coulomb cutoff 1.269: 127.8 M-cycles
- step 3000: timed with pme grid 120 120 60, coulomb cutoff 1.336: 134.2 M-cycles
- DD load balancing is limited by minimum cell size in dimension X Y
- DD step 2999 vol min/aver 0.267! load imb.: force 14.1% pme mesh/force 1.804
- Step Time Lambda
- 3000 6.00000 0.00000
- Energies (kJ/mol)
- Bond U-B Proper Dih. Improper Dih. LJ-14
- 5.14043e+04 2.87225e+05 1.50821e+05 1.42245e+03 3.15633e+04
- Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip. Potential
- -4.56050e+05 -3.62894e+04 -1.42688e+06 5.49381e+03 -1.39129e+06
- Kinetic En. Total Energy Temperature Pressure (bar) Constr. rmsd
- 5.71073e+05 -8.20221e+05 3.02064e+02 -3.85750e+00 4.18183e-06
- step 3080: timed with pme grid 120 120 56, coulomb cutoff 1.359: 130.7 M-cycles
- optimal pme grid 128 128 60, coulomb cutoff 1.269
- DD load balancing is limited by minimum cell size in dimension X Y
- DD step 3999 vol min/aver 0.185! load imb.: force 18.2% pme mesh/force 2.008
- Step Time Lambda
- 4000 8.00000 0.00000
- Energies (kJ/mol)
- Bond U-B Proper Dih. Improper Dih. LJ-14
- 5.16076e+04 2.86753e+05 1.50731e+05 1.38304e+03 3.16921e+04
- Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip. Potential
- -4.55500e+05 -3.61777e+04 -1.42736e+06 7.13750e+03 -1.38973e+06
- Kinetic En. Total Energy Temperature Pressure (bar) Constr. rmsd
- 5.73651e+05 -8.16082e+05 3.03427e+02 -8.53276e+01 4.21809e-06
- step 5000: resetting all time and cycle counters
- Restarted time on rank 0 Thu Apr 30 12:00:21 2015
- DD load balancing is limited by minimum cell size in dimension X Y
- DD step 4999 vol min/aver 0.179! load imb.: force 12.2% pme mesh/force 2.108
- Step Time Lambda
- 5000 10.00000 0.00000
- Energies (kJ/mol)
- Bond U-B Proper Dih. Improper Dih. LJ-14
- 5.20451e+04 2.88353e+05 1.50570e+05 1.30489e+03 3.16238e+04
- Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip. Potential
- -4.55467e+05 -3.52470e+04 -1.42829e+06 7.18653e+03 -1.38792e+06
- Kinetic En. Total Energy Temperature Pressure (bar) Constr. rmsd
- 5.73705e+05 -8.14212e+05 3.03456e+02 6.70797e+01 4.19036e-06
- DD load balancing is limited by minimum cell size in dimension X
- DD step 5999 vol min/aver 0.158! load imb.: force 15.8% pme mesh/force 2.127
- Step Time Lambda
- 6000 12.00000 0.00000
- Energies (kJ/mol)
- Bond U-B Proper Dih. Improper Dih. LJ-14
- 5.14442e+04 2.87868e+05 1.50508e+05 1.40078e+03 3.16674e+04
- Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip. Potential
- -4.55916e+05 -3.54566e+04 -1.42635e+06 7.05560e+03 -1.38777e+06
- Kinetic En. Total Energy Temperature Pressure (bar) Constr. rmsd
- 5.73093e+05 -8.14682e+05 3.03132e+02 9.75912e+00 4.24040e-06
- DD load balancing is limited by minimum cell size in dimension X Y
- DD step 6999 vol min/aver 0.154! load imb.: force 16.5% pme mesh/force 1.977
- Step Time Lambda
- 7000 14.00000 0.00000
- Energies (kJ/mol)
- Bond U-B Proper Dih. Improper Dih. LJ-14
- 5.18326e+04 2.86636e+05 1.51212e+05 1.47271e+03 3.15622e+04
- Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip. Potential
- -4.55427e+05 -3.64300e+04 -1.42759e+06 7.05582e+03 -1.38967e+06
- Kinetic En. Total Energy Temperature Pressure (bar) Constr. rmsd
- 5.73014e+05 -8.16657e+05 3.03091e+02 7.78854e+01 4.22214e-06
- DD load balancing is limited by minimum cell size in dimension X
- DD step 7999 vol min/aver 0.149! load imb.: force 23.0% pme mesh/force 2.000
- Step Time Lambda
- 8000 16.00000 0.00000
- Energies (kJ/mol)
- Bond U-B Proper Dih. Improper Dih. LJ-14
- 5.17397e+04 2.86607e+05 1.50509e+05 1.41735e+03 3.18308e+04
- Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip. Potential
- -4.56065e+05 -3.59718e+04 -1.42991e+06 7.17171e+03 -1.39267e+06
- Kinetic En. Total Energy Temperature Pressure (bar) Constr. rmsd
- 5.71509e+05 -8.21161e+05 3.02295e+02 5.22125e+01 4.18836e-06
- DD load balancing is limited by minimum cell size in dimension X Y
- DD step 8999 vol min/aver 0.151! load imb.: force 15.0% pme mesh/force 2.007
- Step Time Lambda
- 9000 18.00000 0.00000
- Energies (kJ/mol)
- Bond U-B Proper Dih. Improper Dih. LJ-14
- 5.19541e+04 2.87790e+05 1.50480e+05 1.42274e+03 3.17100e+04
- Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip. Potential
- -4.55322e+05 -3.43028e+04 -1.42922e+06 7.09431e+03 -1.38840e+06
- Kinetic En. Total Energy Temperature Pressure (bar) Constr. rmsd
- 5.70563e+05 -8.17833e+05 3.01794e+02 1.14439e+02 4.19833e-06
- DD load balancing is limited by minimum cell size in dimension X Y
- DD step 9999 vol min/aver 0.157! load imb.: force 17.1% pme mesh/force 2.299
- Step Time Lambda
- 10000 20.00000 0.00000
- Energies (kJ/mol)
- Bond U-B Proper Dih. Improper Dih. LJ-14
- 5.17778e+04 2.87804e+05 1.51100e+05 1.47646e+03 3.16999e+04
- Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip. Potential
- -4.56526e+05 -3.58191e+04 -1.42789e+06 7.20147e+03 -1.38918e+06
- Kinetic En. Total Energy Temperature Pressure (bar) Constr. rmsd
- 5.73033e+05 -8.16143e+05 3.03101e+02 -2.89476e+01 4.19734e-06
- <====== ############### ==>
- <==== A V E R A G E S ====>
- <== ############### ======>
- Statistics over 10001 steps using 101 frames
- Energies (kJ/mol)
- Bond U-B Proper Dih. Improper Dih. LJ-14
- 5.18419e+04 2.87652e+05 1.50911e+05 1.41129e+03 3.17063e+04
- Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip. Potential
- -4.55679e+05 -3.54964e+04 -1.42819e+06 7.13046e+03 -1.38871e+06
- Kinetic En. Total Energy Temperature Pressure (bar) Constr. rmsd
- 5.73069e+05 -8.15644e+05 3.03120e+02 -1.59300e+00 0.00000e+00
- Box-X Box-Y Box-Z
- 1.60026e+01 1.60026e+01 7.63514e+00
- Total Virial (kJ/mol)
- 1.86250e+05 1.47008e+03 -2.72786e+02
- 1.46147e+03 1.86638e+05 3.97733e+01
- -2.76283e+02 4.00144e+01 2.00469e+05
- Pressure (bar)
- 3.44596e+00 -2.61332e+01 5.04982e+00
- -2.59874e+01 -2.42870e+00 4.05722e+00
- 5.10920e+00 4.05318e+00 -5.79625e+00
- T-NPROT T-SOL_ION
- 3.03156e+02 3.03070e+02
- P P - P M E L O A D B A L A N C I N G
- PP/PME load balancing changed the cut-off and PME settings:
- particle-particle PME
- rcoulomb rlist grid spacing 1/beta
- initial 1.200 nm 1.239 nm 144 144 64 0.119 nm 0.384 nm
- final 1.269 nm 1.308 nm 128 128 60 0.127 nm 0.406 nm
- cost-ratio 1.18 0.74
- (note that these numbers concern only part of the total PP and PME load)
- M E G A - F L O P S A C C O U N T I N G
- NB=Group-cutoff nonbonded kernels NxN=N-by-N cluster Verlet kernels
- RF=Reaction-Field VdW=Van der Waals QSTab=quadratic-spline table
- W3=SPC/TIP3p W4=TIP4p (single or pairs)
- V&F=Potential and force V=Potential only F=Force only
- Computing: M-Number M-Flops % Flops
- -----------------------------------------------------------------------------
- NB VdW [V&F] 1088.367630 1088.368 0.0
- Pair Search distance check 3643.770432 32793.934 0.0
- NxN Ewald Elec. + LJ [F] 1392280.458944 108597875.798 95.8
- NxN Ewald Elec. + LJ [V&F] 14343.511040 1850312.924 1.6
- 1,4 nonbonded interactions 1575.315000 141778.350 0.1
- Calc Weights 3096.844245 111486.393 0.1
- Spread Q Bspline 66066.010560 132132.021 0.1
- Gather F Bspline 66066.010560 396396.063 0.3
- 3D-FFT 195731.828538 1565854.628 1.4
- Solve PME 655.491072 41951.429 0.0
- Reset In Box 26.008290 78.025 0.0
- CG-CoM 26.008290 78.025 0.0
- Bonds 212.942580 12563.612 0.0
- Propers 1815.312990 415706.675 0.4
- Impropers 5.901180 1227.445 0.0
- Virial 56.147445 1010.654 0.0
- Stop-CM 10.527165 105.272 0.0
- Calc-Ekin 103.413915 2792.176 0.0
- Lincs 413.079943 24784.797 0.0
- Lincs-Mat 2978.164740 11912.659 0.0
- Constraint-V 1381.031864 11048.255 0.0
- Constraint-Vir 48.581213 1165.949 0.0
- Settle 184.957326 59741.216 0.1
- -----------------------------------------------------------------------------
- Total 113413884.667 100.0
- -----------------------------------------------------------------------------
- D O M A I N D E C O M P O S I T I O N S T A T I S T I C S
- av. #atoms communicated per step for force: 2 x 932400.4
- av. #atoms communicated per step for LINCS: 2 x 42373.3
- Average load imbalance: 15.4 %
- Part of the total run time spent waiting due to load imbalance: 5.1 %
- Steps where the load balancing was limited by -rdd, -rcon and/or -dds: X 2 % Y 1 % Z 0 %
- Average PME mesh/force load: 2.008
- Part of the total run time spent waiting due to PP/PME imbalance: 31.4 %
- NOTE: 5.1 % of the available CPU time was lost due to load imbalance
- in the domain decomposition.
- NOTE: 31.4 % performance was lost because the PME ranks
- had more work to do than the PP ranks.
- You might want to increase the number of PME ranks
- or increase the cut-off and the grid spacing.
- R E A L C Y C L E A N D T I M E A C C O U N T I N G
- On 384 MPI ranks doing PP, each using 2 OpenMP threads, and
- on 128 MPI ranks doing PME, each using 2 OpenMP threads
- Computing: Num Num Call Wall time Giga-Cycles
- Ranks Threads Count (s) total sum %
- -----------------------------------------------------------------------------
- Domain decomp. 384 2 126 0.581 981.351 4.0
- DD comm. load 384 2 126 0.003 5.702 0.0
- DD comm. bounds 384 2 126 0.023 38.526 0.2
- Send X to PME 384 2 5001 0.029 48.641 0.2
- Neighbor search 384 2 126 0.143 242.149 1.0
- Launch GPU ops. 384 2 10002 0.855 1445.047 5.9
- Comm. coord. 384 2 4875 0.860 1453.266 6.0
- Force 384 2 5001 1.045 1765.642 7.2
- Wait + Comm. F 384 2 5001 1.653 2793.185 11.5
- PME mesh * 128 2 5001 8.500 4787.043 19.6
- PME wait for PP * 2.324 1308.648 5.4
- Wait + Recv. PME F 384 2 5001 3.217 5435.015 22.3
- Wait GPU nonlocal 384 2 5001 0.354 598.139 2.5
- Wait GPU local 384 2 5001 0.237 400.542 1.6
- NB X/F buffer ops. 384 2 19752 0.338 570.785 2.3
- Write traj. 384 2 2 0.067 113.086 0.5
- Update 384 2 5001 0.151 255.174 1.0
- Constraints 384 2 5001 0.750 1267.610 5.2
- Comm. energies 384 2 251 0.442 746.165 3.1
- Rest 0.075 127.083 0.5
- -----------------------------------------------------------------------------
- Total 10.823 24382.809 100.0
- -----------------------------------------------------------------------------
- (*) Note that with separate PME ranks, the walltime column actually sums to
- twice the total reported, but the cycle count total and % are correct.
- -----------------------------------------------------------------------------
- Breakdown of PME mesh computation
- -----------------------------------------------------------------------------
- PME redist. X/F 128 2 10002 1.716 966.712 4.0
- PME spread/gather 128 2 10002 2.208 1243.412 5.1
- PME 3D-FFT 128 2 10002 0.799 450.120 1.8
- PME 3D-FFT Comm. 128 2 20004 3.643 2051.724 8.4
- PME solve Elec 128 2 5001 0.108 61.096 0.3
- -----------------------------------------------------------------------------
- Core t (s) Wall t (s) (%)
- Time: 10828.087 10.823 100044.6
- (ns/day) (hour/ns)
- Performance: 79.844 0.301
- Finished mdrun on rank 0 Thu Apr 30 12:00:32 2015
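The summary figures in the log above can be cross-checked with a little arithmetic. The sketch below is not part of GROMACS. The function names are hypothetical, and the formulas are assumptions about how the reported numbers relate to each other: throughput as simulated time over wall time, the particle-particle cost ratio as the cube of the rlist ratio, and the PME cost ratio as the ratio of grid point counts. The step count of 5001 is taken from the "Call Count" column for the Force row, which covers the part of the run after the counters were reset at step 5000.

```python
# Sanity checks on figures reported in the log. All formulas here are
# assumptions for illustration, not GROMACS source code.

def ns_per_day(n_steps: int, dt_ps: float, wall_s: float) -> float:
    """Simulated nanoseconds per day of wall-clock time."""
    simulated_ns = n_steps * dt_ps / 1000.0  # ps -> ns
    return simulated_ns * 86400.0 / wall_s   # 86400 seconds per day


def hours_per_ns(n_steps: int, dt_ps: float, wall_s: float) -> float:
    """Wall-clock hours needed per simulated nanosecond."""
    return 24.0 / ns_per_day(n_steps, dt_ps, wall_s)


def pp_cost_ratio(rlist_initial: float, rlist_final: float) -> float:
    """Assumed PP cost scaling: pair count grows with the list volume."""
    return (rlist_final / rlist_initial) ** 3


def pme_cost_ratio(grid_initial, grid_final) -> float:
    """Assumed PME cost scaling: proportional to the number of grid points."""
    nx0, ny0, nz0 = grid_initial
    nx1, ny1, nz1 = grid_final
    return (nx1 * ny1 * nz1) / (nx0 * ny0 * nz0)


if __name__ == "__main__":
    # Values copied from the log: dt = 0.002 ps, 5001 timed steps, 10.823 s wall time.
    print(f"ns/day  : {ns_per_day(5001, 0.002, 10.823):.3f}")
    print(f"hour/ns : {hours_per_ns(5001, 0.002, 10.823):.3f}")
    # rlist 1.239 -> 1.308 nm, PME grid 144x144x64 -> 128x128x60.
    print(f"PP cost ratio  : {pp_cost_ratio(1.239, 1.308):.2f}")
    print(f"PME cost ratio : {pme_cost_ratio((144, 144, 64), (128, 128, 60)):.2f}")
```

Under these assumptions the results land close to the reported 79.844 ns/day, 0.301 hour/ns, and cost ratios of 1.18 and 0.74. Small differences in the last digits are expected, because the log prints the wall time rounded to three decimals.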