.. |Pow| replace:: *P*\ :sub:`OW`
.. |logPow| replace:: log *P*\ :sub:`OW`

.. _mdpow-scripts-label:

The ``mdpow-*`` scripts
=======================

A number of python scripts are installed together with the
:mod:`mdpow` package. They simplify some common tasks (especially at
the analysis stage) but they make some assumptions about directory
layout and filenames. If one uses defaults for all directory and
filename options then it should "just work".

In particular, a directory hierarchy such as the following is
assumed::

  moleculename/
       water.simulation
       octanol.simulation
       Equilibrium/
             water/
             octanol/
       FEP/
             water/
                   Ghyd.fep
                   Coulomb/
                   VDW/
             octanol/
                   Goct.fep
                   Coulomb/
                   VDW/

*moleculename* is, for instance, "benzene" or "amantadine"; in the run
input file (see `Equilibrium simulations`_) is the value of the
variable ``name`` in the ``[setup]`` section.


.. _runinput-file:

Run input file
--------------

The ``mdpow-*`` scripts are the **commandline interface** (CLI) to the
Python API functionality in the :mod:`mdpow` package. Whereas the
Python API requires passing of parameters to classes and functions and
therefore exposes the full flexibility of MDPOW, the CLI works with a
narrower set of options which are collected in a **run input
file**. This run input file or *RUNFILE* is named :file:`runinput.yml`
by default.

A template *RUNFILE* can be generated with
:program:`mdpow-get-runinput`

.. code-block:: bash	   

   mdpow-get-runinput runinput.yml

which will copy the default run input file bundled with MDPOW and put
it in the current directory under the name :file:`runinput.yml`.

For an example of a *RUNFILE* see :download:`benzene.yml
<../../examples/benzene.yml>`. The comments in the file serve as the
documentation.

The run input file uses `YAML`_ syntax (and is parsed by :mod:`yaml`).

.. Note::
   
   It is recommended to use absolute paths to file names. 

.. Note::

   It is recommended to enclose all strings in the input file in
   quotes, especially if they can be interpreted as numbers. For
   example, a name "005" would be interpreted as the number 5 unless
   explicitly quoted.

.. _YAML: http://yaml.org/


.. _mdpow-equilibrium-label:

Equilibrium simulations
-----------------------

The :program:`mdpow-equilibrium` script

* sets up equilibrium MD simulations for the solvents (e.g., *water* or
  *octanol*)
* runs **energy minimization**, **MD_relaxed**, and **MD_NPT**
  protocols; the user can choose if she wants to launch
  :program:`mdrun` herself (e.g. on a cluster) or let the script do it
  locally on the workstation

The script runs essentially the same steps as described in the
tutorial :ref:`example_octanol-label` but it gathers all required
parameters from the :ref:`run input file<runinput-file>` and it allows
one to stop and continue and the protocol transparently.

It requires as at least Gromacs 4.6.5 ready to run (check that all commands can
be found). The required **input** is

  1. a run configuration file (*runinput.yml*);
  2. a structure file (PDB or GRO) for the compound
  3. a Gromacs ITP file for the compound (OPLS/AA force field)

The script keeps track of the stages of the simulation protocol (in
the state files :file:`water.simulation`, :file:`octanol.simulation`
etc) and allows the user to **restart from the last completed
stage**. For instance, one can use the script to set up a simulation,
then run the simulation on a cluster, transfer back the generated
files, and start :program:`mdpow-equilibrium` again with the exact
same input to finish the protocol. Since Gromacs 4.5 it is also
possible to interrupt a running :program:`mdrun` process (e.g. with
:kbd:`Control-c`) and then resume the simulation at the last saved
trajectory checkpoint by running :program:`mdpow-equilibrium` again.

If in doubt, just try running :program:`mdpow-equilibrium` running again and
let it figure out the best course of action. Look at the log file to see what
has been done. Lines reading "Fast forwarding: *stage*" indicate that results
from *stage* are available.

Usage of the command:

   .. program:: mdpow-equilibrium

   :program:`mdpow-equilibrium` [options] *RUNFILE*

   Set up (and possibly run) the equilibration equilibrium simulation for one
   compound and one solvent. All parameters except the solvent are specified in
   the *RUNFILE*.

   Arguments:

   .. option:: RUNFILE

      The runfile :file:`runinput.yml` with all configuration parameters.


   Options:

   .. option:: -h, --help

      show this help message and exit

   .. option:: -S <NAME>, --solvent=<NAME>

      solvent ``<NAME>`` for compound, can be 'water', 'octanol', 'cyclohexane' [water]

   .. option:: -d <DIRECTORY>, --dirname=<DIRECTORY>

      generate files and directories in ``<DIRECTORY>``, which is created if it does not
      already exist. The default is to use the molecule *name* from the run input
      file.


.. _mdpow-fep-label:

FEP simulations
--------------- 

The :program:`mdpow-fep` script sets up (and can also run) the free energy
perturbation calculations for one compound and one solvent. It uses the results
from :ref:`mdpow-equilibrium <mdpow-equilibrium-label>` together with the run
input file.

You will require:

1. at least Gromacs 4.6.5 ready to run (check that all commands can
   be found)
2. the results from a previous complete run of :program:`mdpow-equilibrium`


Usage of the command:

   .. program:: mdpow-fep

   :program:`mdpow-fep`  [options] *RUNFILE*

   Arguments:

   .. option:: RUNFILE

      The runfile :file:`runinput.yml` with all configuration parameters.

   Options:

   .. option:: -h, --help

      show this help message and exit

   .. option:: -S <NAME>, --solvent=<NAME>

      solvent ``<NAME>`` for compound, can be 'water' or 'octanol' [water]

   .. option:: -d <DIRECTORY>, --dirname=<DIRECTORY>

      generate files and directories in ``<DIRECTORY>``, which is created if it
      does not already exist. The default is to use the molecule *name* from the run
      input file.



.. _mdpow-pow-label:

Running analysis
----------------

Solvation free energy
~~~~~~~~~~~~~~~~~~~~~

The :program:`mdpow-solvationenergy` calculatest the solvation free
energy. It

* collects data from FEP simulations (converting EDR files to XVG.bz2
  when necessary)
* calculates desolvation free energies for *solvent* --> vacuum via
  thermodynamic integration (TI)  
* plots dV/dlambda graphs
* appends results to ``energies.txt`` (when the
  default names are chosen), see :ref:`mdpow-pow-outputformat-label`.


Usage of the command:

   .. program:: mdpow-solvationenergy

   :program:`mdpow-solvationenergy` [options] DIRECTORY [DIRECTORY ...]

   Run the free energy analysis for a solvent in ``<DIRECTORY>/FEP`` and
   return ``DeltaG``.

   ``DIRECTORY`` should contain all the files resulting from running
   ``mdpow.fep.Ghyd.setup()`` (or the corresponding ``Goct.setup()`` or
   ``Gcyclohexane.setup()`` and the results of the MD FEP simulations. It
   relies on the canonical naming scheme (basically: just use the defaults
   as in the tutorial).

   The dV/dlambda plots can be produced automatically (``--plotfile auto``). If
   multiple ``DIRECTORY`` arguments are provided then one has to choose the
   "auto" option (or "none").

   The total solvation free energy is calculated as

   .. math::

	\Delta G^{*} = -(\Delta G_{\text{coul}} + \Delta G_{\text{vdw}})

   Note that the standard state refers to the "Ben-Naim" standard state of
   transferring 1 M of compound in the gas phase to 1 M in the aqueous
   phase.

   Results are *appended* to a data file with **Output file format**::

	     .                 ---------- kJ/mol ---
	     molecule solvent  DeltaG*  coulomb  vdw

   All observables are quoted with an error estimate, which is derived
   from the correlation time and error propagation through Simpson's rule
   (see :meth:`mdpow.fep.Gsolv`). It ultimately comes from the error on
   <dV/dlambda>, which is estimated as the error to determine the
   average.

   :molecule:
       molecule name as used in the itp
   :DeltaG*:
       solvation free energy vacuum --> solvent, in kJ/mol
   :coulomb:
       discharging contribution to the DeltaG*
   :vdw:
       decoupling contribution to the DeltaG*
   :directory:
       folder in which the simulations were stored


   positional arguments:

   .. option::  DIRECTORY [DIRECTORY ...]

		directory or directories which contain all the files
		resulting from running :meth:`mdpow.fep.Ghyd.setup`

   optional arguments:

   .. option::  -h, --help

		show this help message and exit
   .. option::  --plotfile FILE

		plot dV/dlambda to FILE; use png or pdf suffix to
		determine the file type. *'auto'* generates a pdf file
		:file:`{DIRECTORY}/dVdl_{molname}_{solvent}.pdf` and
		*none* disables it The plot function is only
		available for mdpow estimator,and is disabled when
		using alchemlyb estimators. (default: none)

   .. option::  --solvent NAME, -S NAME

		solvent `NAME` for compound, 'water', 'octanol', or
		'cyclohexane' (default: water)

   .. option::  -o FILE, --outfile FILE

		append one-line results summary to `FILE` (default:
		:file:`gsolv.txt`)

   .. option::  -e FILE, --energies FILE

		append solvation free energies to FILE (default:
		:file:`energies.txt`)

   .. option:: --estimator {mdpow,alchemlyb}

		choose the estimator to be used, *alchemlyb* or *mdpow*
		estimators (default: *alchemlyb*)

   .. option:: --method {TI,MBAR,BAR}

		choose the method to calculate free energy (default:
		*MBAR*)

   .. option:: --force

		 force rereading all data (default: False)

   .. option:: --SI

		 enable statistical inefficiency (SI) analysis.
		 Statistical inefficiency analysis is performed by
		 default when usingalchemlyb estimators and is always
		 disabled when using mdpow estimator. (default: True)

   .. option:: --no-SI

		 disable statistical inefficiency
                 analysis. Statistical inefficiency analysis is
                 performed by default when usingalchemlyb estimators
                 and is disabled when using mdpow estimator. (default:
                 False)

   .. option:: -s N, --stride N

		 use every `N`-th datapoint from the original dV/dlambda
		 data. (default: 1)

   .. option:: --start START

		 start point for the data used from the original
		 dV/dlambda data. (default: 0)

   .. option:: --stop STOP

		stop point for the data used from the original
		dV/dlambda data. (default: None)

   .. option:: --ignore-corrupted

		skip lines in the md.xvg files that cannot be parsed.
		(default: False)

		.. warning::

		   Other lines in the file might have been corrupted
		   in such a way that they appear correct but are in
		   fact wrong. WRONG RESULTS CAN OCCUR! USE AT YOUR
		   OWN RISK

                It is recommended to simply rerun the affected simulation(s).




Partition coefficients
~~~~~~~~~~~~~~~~~~~~~~

The :program:`mdpow-pow` script 

* collects data from FEP simulations
* calculates desolvation free energies for octanol --> vacuum and
  water --> vacuum via thermodynamic integration (TI)
* calculates transfer free energy water --> octanol
* calculates |logPow|
* plots dV/dlambda graphs
* appends results to ``pow.txt`` and ``energies.txt`` (when the
  default names are chosen), see :ref:`mdpow-pow-outputformat-label`.

An equivalent script :program:`mdpow-pcw` for the water-cyclohexane
partition coefficient is also included. 
  
Usage of the command:

   .. program:: mdpow-pow

   :program:`mdpow-pow` [options] *DIRECTORY* [*DIRECTORY* ...]

   Run the free energy analysis for water and octanol in *DIRECTORY*/FEP and
   return the octanol-water partition coefficient |logPow|.

   *DIRECTORY* should contain all the files resulting from running
   :meth:`mdpow.fep.Goct.setup` and :meth:`mdpow.fep.Goct.setup` and the results
   of the MD FEP simulations. It relies on the canonical naming scheme (basically:
   just use the defaults as in the tutorial).

   The dV/dlambda plots can be produced automatically
   (``--plotfile auto``). If multiple *DIRECTORY*
   arguments are provided then one has to choose the "auto" option (or
   "none").


   positional arguments:

   .. option::  DIRECTORY [DIRECTORY ...]

		One or more directories that contain the state pickle
                files (:file:`water.simulation`,
                :file:`octanol.simulation`) for the solvation free
                energy calculations in water and octanol. These
                directory or directories should contain all the files
                resulting from running :meth:`mdpow.fep.Ghyd.setup` and
                :meth:`mdpow.fep.Goct.setup` and the results of the MD FEP
                simulations.

   optional arguments:

   .. option::  -h, --help

		show this help message and exit

   .. option::  --plotfile FILE

		plot dV/dlambda to FILE; use png or pdf suffix to
		determine the file type. *'auto'* generates a pdf file
		:file:`{DIRECTORY}/dVdl_{molname}_pow.pdf` and
		*none* disables it The plot function is only
		available for mdpow estimator,and is disabled when
		using alchemlyb estimators. (default: none)

   .. option::  -o FILE, --outfile FILE

		append one-line results summary to `FILE` (default:
		:file:`pow.txt`)

   .. option::  -e FILE, --energies FILE

		append solvation free energies to FILE (default:
		:file:`energies.txt`)

   .. option:: --estimator {mdpow,alchemlyb}

		choose the estimator to be used, *alchemlyb* or *mdpow*
		estimators (default: *alchemlyb*)

   .. option:: --method {TI,MBAR,BAR}

		choose the method to calculate free energy (default:
		*MBAR*)

   .. option:: --force

		 force rereading all data (default: False)

   .. option:: --SI

		 enable statistical inefficiency (SI) analysis.
		 Statistical inefficiency analysis is performed by
		 default when usingalchemlyb estimators and is always
		 disabled when using mdpow estimator. (default: True)

   .. option:: --no-SI

		 disable statistical inefficiency
                 analysis. Statistical inefficiency analysis is
                 performed by default when usingalchemlyb estimators
                 and is disabled when using mdpow estimator. (default:
                 False)

   .. option:: -s N, --stride N

		 use every `N`-th datapoint from the original dV/dlambda
		 data. (default: 1)

   .. option:: --start START

		 start point for the data used from the original
		 dV/dlambda data. (default: 0)

   .. option:: --stop STOP

		stop point for the data used from the original
		dV/dlambda data. (default: None)

   .. option:: --ignore-corrupted

		skip lines in the md.xvg files that cannot be parsed.
		(default: False)

		.. warning::

		   Other lines in the file might have been corrupted
		   in such a way that they appear correct but are in
		   fact wrong. WRONG RESULTS CAN OCCUR! USE AT YOUR
		   OWN RISK

                It is recommended to simply rerun the affected simulation(s).



.. _mdpow-pow-outputformat-label:

Output data file formats
~~~~~~~~~~~~~~~~~~~~~~~~

Results are *appended* to data files.

.. Note:: Energies are always output in **kJ/mol**.


POW summary file
................

The :file:`pow.txt` output file summarises the results from the water and
octanol solvation calculations. Its name set with the option :option:`mdpow-pow -o`.  
It contains fixed column data in the following order and all energies are
in **kJ/mol**. See the :ref:`Table of computed water-octanol transfer energies
and logPow <table-logPow-label>` as an example.


**itp_name**

  *molecule* identifier from the itp file

**DeltaG0**

  transfer free energy from water to octanol (difference between
  **DeltaG0** for octanol and water), in kJ/mol; >0: partitions into
  water, <0: partitions into octanol,

**errDG0**

  error on **DeltaG0**; errors are calculated through propagation of
  errors from the errors on the means <dV/dlambda>

**logPOW**

 |logPow|, base-10 logarithm of the octanol-water partition
 coefficient; >0: partitions into octanol, <0: partitions preferrably
 into water

**errlogP**

 error on **logPow**

**directory**

 directory under which all data files are stored; by default this is
 the *name* of the molecule and hence it can be used to identify the
 compound.


.. _table-logPow-label:

.. Table:: Computed |logPow| and water-to-octanol transfer energies (in kJ/mol).

   ======== ======== ======== ========= ======== ===========================================
   itp_name  DeltaG0   errDG0    logPow  errlogP  directory
   ======== ======== ======== ========= ======== ===========================================
   BNZ        -12.87     0.43     +2.24     0.07   benzene
   OC9        -16.24     1.12     +2.83     0.20   octanol
   URE         +4.66     1.13     -0.81     0.20   urea
   902         +9.35     1.06     -1.63     0.18   water_TIP4P
   ======== ======== ======== ========= ======== ===========================================


Energy file
...........

The :file:`energy.txt` output file collects all computed energy terms together
with the results also found in the summary file :file:`pow.txt`. Its name is
set with the option :option:`mdpow-pow -e`.  It contains fixed column data in
the following order and all energies are in **kJ/mol**. As an example see
:ref:`Table of Solvation Energies <table-energies-label>` for the same
compounds as above.

**molecule**

  *molecule* identifier from the itp file

**solvent**

  solvent name (water, octanol)

**DeltaG0**

  solvation free energy difference in kJ/mol (Ben-Naim standard
  state, i.e. 1M gas/1M solution)

**errDG0**

  error on **DeltaG0**, calculated through propagation of errors from the
  errors on the mean <dV/dlambda>

**coulomb**

  Coulomb (discharging) contribution to the solvation free energy **DeltaG0**

**errCoul**

  error on **coulomb**

**VDW**

  Van der Waals/Lennard-Jones (decoupling) contribution to **DeltaG0**

**errVDW**

  error on **VDW**

**w2oct**

  transfer free energy from water to octanol (difference between
  **DeltaG0** for octanol and water)

**errw2oct**

  error on **w2oct**

**logPOW**

 |logPow|

**errlogP**

 error on **logPow**

**directory**

 directory under which all data files are stored; by default this is
 the *name* of the molecule and hence it can be used to identify the
 compound.


.. _table-energies-label:

.. Table:: Solvation free energies for compounds in water and octanol (in kJ/mol). 
										  
   ========== ========== ======== ========  ======== ========  ======== ========  ======== ========   ======== ========  ====================
   molecule   solvent     DeltaG0   errDG0   coulomb  errCoul      VDW    errVDW     w2oct errw2oct     logPOW  errlogP  directory
   ========== ========== ======== ========  ======== ========  ======== ========  ======== ========   ======== ========  ====================
   BNZ        water         -2.97     0.21     +7.65     0.07     -4.68     0.20    -12.87     0.43      +2.24     0.07  benzene
   BNZ        octanol      -15.84     0.37     +1.40     0.19    +14.44     0.32    -12.87     0.43      +2.24     0.07  benzene
   OC9        water        -16.03     0.32    +27.35     0.09    -11.32     0.31    -16.24     1.12      +2.83     0.20  octanol
   OC9        octanol      -32.28     1.08    +11.32     0.92    +20.96     0.56    -16.24     1.12      +2.83     0.20  octanol
   URE        water        -53.52     0.17    +56.94     0.11     -3.41     0.14     +4.66     1.13      -0.81     0.20  urea
   URE        octanol      -48.86     1.12    +35.75     1.09    +13.11     0.25     +4.66     1.13      -0.81     0.20  urea
   902        water        -25.46     0.11    +34.93     0.10     -9.48     0.06     +9.35     1.06      -1.63     0.18  water_TIP4P
   902        octanol      -16.11     1.05    +21.16     1.05     -5.05     0.09     +9.35     1.06      -1.63     0.18  water_TIP4P
   ========== ========== ======== ========  ======== ========  ======== ========  ======== ========   ======== ========  ====================


House-keeping scripts
---------------------

A number of scripts are provided to complete simple tasks; they can be
used to check that all required files are present or they can help in
fixing small problems without one having to write Python code to do
this oneself (e.g. by manipulating the checlpoint files). They
generally make the same assumptions about file system layout as the
other mdpow scripts.


Checking if the simulation is complete
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Run :program:`mdpow-check` in order to check if all necessary files
are available.

Usage of the command:

   .. program:: mdpow-check

   :program:`mdpow-check` [options] *DIRECTORY* [*DIRECTORY* ...]

   Check status of the progress of the project in *DIRECTORY*.

   Output is only written to the status file (:file:`status.txt`). A quick way to find
   problems is to do a ::

	cat status.txt | gawk -F '|' '$2 !~ /OK/ {print $0}'

   Options:

   .. option:: -h, --help

      show this help message and exit

   .. option:: -o <FILE>, --outfile=<FILE>

      write status results to :file:`{FILE}` [:file:`status.txt`]


Changing paths in :file:`water.simulation` and :file:`octanol.simulation`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It can become necessary to recreate the :file:`{solvent}.simulation`
status/checkpoint files in order to change paths, e.g. when one moved
the directories or transferred all files to a different file system.

Typically, one would execute the :program:`mdpow-rebuild-simulation` command
in the parent directory of *moleculename*.

Usage of the command:

   .. program:: mdpow-rebuild-simulation

   :program:`mdpow-rebuild-simulation` [options] *DIRECTORY* [*DIRECTORY* ...]

   Re-create the ``water.simulation`` or ``octanol.simulation`` file with
   adjusted paths (now rooted at *DIRECTORY*).

   Options:

   .. option:: -h, --help

      show this help message and exit

   .. option:: --solvent=<NAME>

      rebuild file for 'water', 'octanol', or 'all' [all]


Re-building :file:`Ghyd.fep` and :file:`Goct.fep`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It can become necessary to recreate the :file:`{name}.fep` status/checkpoint
files (e.g. if the files became corrupted due to a software error or in order
to change paths).

Typically, one would execute the :program:`mdpow-rebuild-fep` command
in the parent directory of *moleculename*.

Usage of the command:

   .. program:: mdpow-rebuild-fep

   :program:`mdpow-rebuild-fep` [options] *DIRECTORY* [*DIRECTORY* ...]

   Re-create the :file:`Goct.fep` or :file:`Ghyd.fep` file using the appropriate
   equilibrium simulation file under *DIRECTORY*/FEP.

   This should only be necessary when the fep file was destroyed due to a software
   error or when the files are transferred to a different file system and some of
   the paths stored in the :file:`{name}.fep` files have to be changed.

   Options:

   .. option:: -h, --help

      show this help message and exit

   .. option:: --solvent=<NAME>

      rebuild fep for 'water', 'octanol', or 'all' [all]

   .. option:: --setup=<LIST>

      Re-generate queuing system scripts with appropriate paths: runs
      :meth:`fep.Gsolv.setup` with argument `qscript=[LIST]` after
      fixing Gsolv.

      ``LIST`` should contain a comma-separated list of queing system
      templates. For example:
      ``'icsn_8pd.sge,icsn_2pd.sge,local.sh'``. It is an error if
      ``--setup`` is set without a ``LIST``.

