Theory

ThermoDiff is based on simple numerical differentiation and reweighting schemes drawing from statistical thermodynamics, including free energy perturbation (FEP).

The main idea is that one can measure the “local sensitivity” of free energies to any individual force field parameter (here generically called \(\sigma\)) by simply using the difference quotient:

\[\frac{\partial F}{\partial\sigma} = \lim_{\Delta \sigma \rightarrow 0} \frac{F(\sigma + \Delta \sigma) - F(\sigma)}{\Delta \sigma}\]

which can also be expressed in terms of the classical thermodynamic integration formula:

\[\frac{\partial F}{\partial\sigma} = \left\langle \frac{\partial H}{\partial \sigma} \right\rangle_{\sigma = \sigma_{0}} = \lim_{\Delta \sigma \rightarrow 0} \frac{\langle H(\mathbf{x}; \sigma + \Delta \sigma) - H(\mathbf{x}; \sigma) \rangle}{\Delta \sigma}\]

Here, \(\frac{\partial F}{\partial\sigma}\) can be calculated using alchemical approaches that yield \(\left\langle \frac{\partial H}{\partial \sigma} \right\rangle_{\sigma = \sigma_{0}}\) directly, or using the well-known Zwanzig equation:

\[\Delta F (A \rightarrow B) = -\beta^{-1} \log \left\langle e^{-\beta(H_B - H_A)} \right\rangle_A\]

where \(H_B = H(\mathbf{x}; \sigma + \Delta \sigma)\) and \(H_A = H(\mathbf{x}; \sigma)\). Since the two ensembles only differ by an arbitrarily small change in a single force field parameter, \(\Delta \sigma\), we avoid the common problem of poor phase space overlap that restricts the direct application of the Zwanzig equation to all but the simplest model systems.
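As a toy illustration of this difference-quotient estimate (not part of ThermoDiff itself), the following sketch applies the Zwanzig formula to a 1D harmonic potential, where the exact derivative is known analytically; all names and values are illustrative:

```python
import numpy as np

# Toy check of the finite-difference Zwanzig estimate on a 1D harmonic
# potential U(x; sigma) = 0.5 * sigma * x**2, for which the exact result
# is dF/dsigma = <x**2>/2 = 1/(2*beta*sigma).
rng = np.random.default_rng(0)
beta, sigma, dsigma = 1.0, 2.0, 1e-4

# Sample the exact Boltzmann ensemble of state A (variance 1/(beta*sigma)).
x = rng.normal(0.0, np.sqrt(1.0 / (beta * sigma)), size=200_000)

# Per-frame energy difference H_B - H_A for the perturbed parameter.
dH = 0.5 * dsigma * x**2

# Zwanzig / difference-quotient estimate of dF/dsigma.
dF_dsigma = -np.log(np.mean(np.exp(-beta * dH))) / (beta * dsigma)

exact = 1.0 / (2.0 * beta * sigma)
print(dF_dsigma, exact)  # both close to 0.25
```

Because \(\Delta\sigma\) is tiny, the exponential in the Zwanzig average never strays far from 1, which is exactly the overlap argument made above.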

Practical implementation

Analytical derivatives

With the free energy module of Gromacs, it is possible to set up an alchemical topology where state A (\(\lambda=0\)) is the original ensemble, and state B (\(\lambda=1\)) corresponds to \(\sigma\) incremented by 1 unit. In this case, values of \(\frac{\partial H}{\partial \lambda}\) reported in a rerun correspond exactly to \(\frac{\partial H}{\partial \sigma}\), allowing for straightforward calculation of \(\frac{\partial F(\xi)}{\partial\sigma}\) as \(\left\langle \frac{\partial H}{\partial \sigma} \right\rangle_{\sigma, \xi}\).

To calculate the respective derivative for structure-based observables, one can apply the quotient rule to the definition of ensemble average:

\[\left\langle Q \right \rangle = \frac{\int Q(\mathbf{x})e^{-\beta U(\mathbf{x}, \sigma)}d\mathbf{x}}{\int e^{-\beta U(\mathbf{x}, \sigma)}d\mathbf{x}} = \frac{P}{Z}\]
\[\frac{\partial}{\partial\sigma} \left\langle Q \right \rangle = \frac{P'Z - PZ'}{Z^2} = \frac{P'}{Z} - \frac{P}{Z}\ \frac{Z'}{Z} = \frac{P'}{Z} + \left\langle Q \right \rangle \beta F'\]

knowing that \(\frac{Z'}{Z} = (\log Z)' = -\beta F'\). The \(P'\) term turns out to be

\[\frac{\partial}{\partial\sigma} \int Q(\mathbf{x})e^{-\beta U(\mathbf{x}, \sigma)}d\mathbf{x} = -\beta \int Q(\mathbf{x}) \frac{\partial U}{\partial\sigma}e^{-\beta U(\mathbf{x},\sigma)} d\mathbf{x}\]

and hence the \(\frac{P'}{Z}\) term can be written as \(-\beta\left\langle Q \frac{\partial U}{\partial\sigma} \right \rangle\). Putting it all together, the term in question is

\[\frac{\partial}{\partial\sigma} \left\langle Q \right \rangle = \beta \left\langle Q \left(\frac{\partial F}{\partial\sigma} - \frac{\partial U}{\partial\sigma} \right) \right \rangle = \beta \left(\left\langle Q\right \rangle \left\langle\frac{\partial U}{\partial\sigma}\right \rangle - \left\langle Q \frac{\partial U}{\partial\sigma} \right \rangle \right),\]

showing that it is, in fact, the (negative) covariance between the observable itself and the internal energy’s sensitivity to \(\sigma\), \(\frac{\partial U}{\partial \sigma}\).
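A minimal numerical check of this covariance formula, again on an assumed 1D harmonic toy model with \(Q = x^2\), where the exact derivative is known:

```python
import numpy as np

# Covariance estimator for d<Q>/dsigma on the 1D harmonic toy model
# U(x; sigma) = 0.5 * sigma * x**2, with observable Q = x**2.
# Exact answer: d<x**2>/dsigma = -1/(beta*sigma**2). Names are illustrative.
rng = np.random.default_rng(1)
beta, sigma = 1.0, 2.0
x = rng.normal(0.0, np.sqrt(1.0 / (beta * sigma)), size=500_000)

Q = x**2                 # observable
dU = 0.5 * x**2          # dU/dsigma for the harmonic potential

# d<Q>/dsigma = beta * (<Q><dU/dsigma> - <Q dU/dsigma>)  (negative covariance)
dQ_dsigma = beta * (Q.mean() * dU.mean() - (Q * dU).mean())

exact = -1.0 / (beta * sigma**2)
print(dQ_dsigma, exact)  # both close to -0.25
```

The negative sign has an intuitive reading: configurations whose energy rises with \(\sigma\) become less probable, so an observable correlated with \(\frac{\partial U}{\partial \sigma}\) decreases on average.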

One can trivially extend this formula to treat the ensemble average as a function of the reaction coordinate \(\xi\), and use the corresponding derivative of the free energy profile, \(\frac{\partial F(\xi)}{\partial\sigma}\).

Discrete states

In practice, we are often interested in the free energy difference between two arbitrarily defined states, A (e.g. bound) and B (e.g. unbound). If we have access to the free energy profile, and states A and B are defined with boundaries \(A_0, A_1\) and \(B_0, B_1\), we can rewrite the standard formula, \(\Delta F_{AB} = -\beta^{-1} \log\left(\frac{p_A}{p_B}\right)\), as:

\[\Delta F_{AB} = -\beta^{-1} \left (\log \int_{A_0}^{A_1} e^{-\beta F(\xi,\sigma)}d\xi - \log \int_{B_0}^{B_1} e^{-\beta F(\xi,\sigma)}d\xi \right )\]

Differentiating with respect to \(\sigma\), writing \(p_A = \int_{A_0}^{A_1} e^{-\beta F(\xi,\sigma)}d\xi\) and analogously for \(p_B\), then yields:

\[\frac{\partial}{\partial \sigma}\Delta F_{AB} = -\beta^{-1} \left (-\frac{\beta}{p_A} \int_{A_0}^{A_1} \frac{\partial F}{\partial \sigma} e^{-\beta F(\xi,\sigma)}d\xi + \frac{\beta}{p_B} \int_{B_0}^{B_1} \frac{\partial F}{\partial \sigma} e^{-\beta F(\xi,\sigma)}d\xi \right) = \left\langle \frac{\partial F}{\partial \sigma} \right \rangle _A - \left\langle \frac{\partial F}{\partial \sigma} \right \rangle _B\]

so that the derivative of the free energy difference between states A and B is the difference between the respective averages of the derivative over ensembles corresponding to states A and B.
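As an illustration of this state-wise averaging, assuming a binned free energy profile and its per-bin derivative are available (all arrays below are synthetic placeholders, not ThermoDiff output):

```python
import numpy as np

# Hypothetical sketch: given a binned free energy profile F(xi) and the
# per-bin derivative dF/dsigma, compute d(Delta F_AB)/dsigma as the
# difference of Boltzmann-weighted averages over the two state windows.
beta = 1.0
xi = np.linspace(0.0, 10.0, 200)              # reaction coordinate grid
F = 0.1 * (xi - 3.0)**2 * (xi < 5) + (0.2 * (xi - 8.0)**2 + 1.0) * (xi >= 5)
dF = 0.05 * xi                                # illustrative dF(xi)/dsigma

def window_average(lo, hi):
    """Boltzmann-weighted average of dF/dsigma over xi in [lo, hi]."""
    mask = (xi >= lo) & (xi <= hi)
    w = np.exp(-beta * F[mask])
    return np.sum(w * dF[mask]) / np.sum(w)

# d(Delta F_AB)/dsigma = <dF/dsigma>_A - <dF/dsigma>_B
ddF_AB = window_average(0.0, 5.0) - window_average(5.0, 10.0)
print(ddF_AB)
```

Here the toy derivative grows with \(\xi\), so the B window (larger \(\xi\)) dominates and the free energy difference decreases with \(\sigma\).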

Numerical approximation

Alternatively, one can start from the original ensemble generated by the unaltered Hamiltonian \(H_A\) and assign a weight \(\frac{1}{Z^{\prime}} e^{-\beta (H_B - H_A) }\) to each frame \(i\), corresponding to the probability with which the given frame would appear in an ensemble generated by the slightly perturbed Hamiltonian \(H_B\). This is evident from the fact that reweighting the original data with such weights yields the desired free energy values:

\[\left \langle e^{-\beta(H_B - H_A)} \right\rangle_A = \frac{\int e^{-\beta (H_B -H_A)} e^{-\beta H_A}d\mathbf{x}}{\int e^{-\beta H_A}d\mathbf{x}} = \frac{\int e^{-\beta H_B}d\mathbf{x}}{\int e^{-\beta H_A}d\mathbf{x}} = e^{-\beta \Delta F}\]

The normalization constant \(Z^{\prime}\) can be straightforwardly calculated from the data as \(\sum_{i} e^{-\beta (H_B(\mathbf{x}_i) - H_A(\mathbf{x}_i)) }\), or simply by normalizing the reweighting factors so that they sum to 1.

Once weights are calculated, not only the derivatives of free energy \(\frac{\partial F}{\partial\sigma}\), but also of any observed quantity \(\frac{\partial Q}{\partial\sigma}\) can be computed in a simple manner, provided that all the relevant geometries are sampled properly in the input trajectories:

\[\frac{\partial F}{\partial\sigma} = \lim_{\Delta \sigma \rightarrow 0} \frac{-\log\left \langle e^{-\beta(H_B - H_A)} \right\rangle}{\beta\Delta \sigma}\]
\[Q[\Delta \sigma] = \frac{\int Q(\mathbf{x}) e^{-\beta (H_B -H_A)} w_0(\mathbf{x})d\mathbf{x}}{\int e^{-\beta (H_B -H_A)} w_0(\mathbf{x})d\mathbf{x}}\]
\[\frac{\partial Q}{\partial\sigma} = \lim_{\Delta \sigma \rightarrow 0} \frac{Q[\Delta \sigma] - Q[0]}{\Delta \sigma}\]
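The reweighting recipe above can be sketched in a few lines of numpy; the function name is an assumption for demonstration, and the toy harmonic check reuses the model where both exact derivatives are known:

```python
import numpy as np

# Minimal reweighting sketch (illustrative names): given per-frame energy
# differences dH = H_B - H_A for a small perturbation dsigma, and per-frame
# observable values Q, estimate dF/dsigma and dQ/dsigma by reweighting.
def thermodiff_estimates(dH, Q, beta, dsigma):
    w = np.exp(-beta * dH)
    w /= w.sum()                         # normalized reweighting factors
    # dF/dsigma from the Zwanzig estimate (difference quotient)
    dF = -np.log(np.mean(np.exp(-beta * dH))) / (beta * dsigma)
    # dQ/dsigma as the difference quotient of reweighted vs. plain averages
    dQ = (np.sum(w * Q) - np.mean(Q)) / dsigma
    return dF, dQ

# Toy harmonic check: U = 0.5*sigma*x**2, Q = x**2,
# exact dF/dsigma = 1/(2*beta*sigma), exact dQ/dsigma = -1/(beta*sigma**2).
rng = np.random.default_rng(2)
beta, sigma, dsigma = 1.0, 2.0, 1e-3
x = rng.normal(0.0, np.sqrt(1.0 / (beta * sigma)), size=500_000)
dF, dQ = thermodiff_estimates(0.5 * dsigma * x**2, x**2, beta, dsigma)
print(dF, dQ)  # close to 0.25 and -0.25
```

Because the perturbed and unperturbed averages are computed from the same frames, most of the statistical noise cancels in the difference quotient.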

In brief, the procedure can be thought of as a search for a subset of frames in which the small perturbation in \(\sigma\) would significantly affect the energetics of the system. Such frames will be assigned large or small weights in the reweighting procedure, and if they cluster significantly in a specific region of the reaction coordinate, this region will be respectively favored or disfavored by a finite change in \(\sigma\).

Assuming a locally linear behavior of \(F(\sigma)\) and \(Q(\sigma)\) (which can be a reasonable assumption sufficiently far from bifurcation points), one can also try to predict the response of the equilibrium ensemble to a finite change in \(\sigma\) as \(\Delta Q = \frac{\partial Q}{\partial\sigma} \Delta \sigma\).

Application to multiple targets and multiple trajectories

Since this is just a post-processing procedure, it is in principle very cheap to calculate \(\frac{\partial Q_j}{\partial \sigma_i}\) for a large set of attempted modifications \(\sigma_i\), using a large set of target observables \(Q_j\) for which we have reliable reference values \(T_j\). We can also straightforwardly include data from multiple trajectories, including trajectories generated with free energy methods, provided that the corresponding Boltzmann weights are available. (Note that free energies can also be used as such observables.) Subsequently, different approaches can be used to identify a small set of parameters \(\sigma_{opt}\) that should simultaneously improve the description of multiple target observables, or that (in a more conservative approach) can be selected for further optimization.

Sensitivity matrix

One straightforward (but not necessarily optimal) way to do this is to (again) assume linear behavior of \(Q(\sigma)\), and define the sensitivity \(\mu_{ij}\) as

\[\mu_{ij} = \frac{\frac{\partial Q_j}{\partial \sigma_i}}{T_j - \bar{Q_j}}\]

where the denominator is the difference between the desired (experimental) and current (prior) estimated values of \(Q_j\), and the numerator estimates the first-order change in \(\bar{Q_j}\) in response to a unit change in \(\sigma_i\). (Loosely speaking, we divide “how much Q we get from a change in sigma” by “how much Q we want” to get “how much we need to change sigma”.)

Least-squares approximation

Now, \(\Delta\sigma_i^{opt} = \mu_{ij}^{-1}\) estimates the change in \(\sigma_i\) required to match the target \(T_j\), and the simultaneous optimization of many target quantities can be thought of as solving the equation

\[\mu \mathbf{\Delta\sigma} = \mathbf{1}\]

for a set of optimal values, \(\Delta\sigma_i^{opt}\). Assuming that the columns \(\mu_j\) are linearly independent and the number of parameters to optimize is smaller than the number of experimental targets, this is equivalent to solving a least-squares fit problem. In practice, it is desirable to first compute the sensitivity matrix for a large number of tunable parameters and subsequently solve the least-squares problem only for a small subset of them (3-4) that produces the best agreement with the experimental targets, in order to reduce overfitting and minimize the effect of (so far neglected) couplings between individual parameters.
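A hedged sketch of this least-squares step with synthetic data (all arrays below are random placeholders standing in for computed derivatives and reference values):

```python
import numpy as np

# Build the sensitivity matrix mu[i, j] = (dQ_j/dsigma_i) / (T_j - Qbar_j)
# and solve mu^T @ dsigma = 1 for the parameter changes in the
# least-squares sense (more targets than parameters).
rng = np.random.default_rng(3)
n_params, n_targets = 3, 8

dQ_dsigma = rng.normal(size=(n_params, n_targets))   # dQ_j/dsigma_i
Qbar = rng.normal(size=n_targets)                    # current estimates
T = Qbar + rng.normal(size=n_targets)                # experimental targets

mu = dQ_dsigma / (T - Qbar)                          # sensitivity matrix

# n_targets equations, n_params unknowns -> overdetermined system.
dsigma_opt, *_ = np.linalg.lstsq(mu.T, np.ones(n_targets), rcond=None)
print(dsigma_opt)  # one proposed change per parameter
```

In a real workflow one would repeat this solve over candidate parameter subsets and keep the subset with the smallest residual, as suggested above.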