DMol
Density functional theory begins with a theorem by Hohenberg and Kohn (1964, later generalized by Levy 1979), which states that all ground-state properties are functionals of the charge density ρ. Specifically, the total energy Et may be written as:

Et[ρ] = T[ρ] + U[ρ] + Exc[ρ]   Eq. 1

where T[ρ] is the kinetic energy of a system of noninteracting particles of density ρ, U[ρ] is the classical electrostatic energy due to Coulombic interactions, and Exc[ρ] includes all many-body contributions to the total energy, in particular the exchange and correlation energies. Eq. 1 is written to emphasize the explicit dependence of these quantities on ρ (in subsequent equations, this dependence is not always indicated).
Wavefunction as antisymmetrized product of molecular orbitals
As in other molecular orbital methods (Roothaan 1951, Slater 1972, Dewar 1983), the wavefunction is taken to be an antisymmetrized product (Slater determinant) of one-particle functions, that is, molecular orbitals (MOs):

Ψ = A [φ1(1) φ2(2) ... φn(n)]   Eq. 2

The molecular orbitals must also be orthonormal:

⟨φi | φj⟩ = δij   Eq. 3
The charge density summed over all molecular orbitals

In this case, the charge density is given by the simple sum:

ρ(r) = Σi |φi(r)|²   Eq. 4
Spin-restricted and -unrestricted calculations

where the sum goes over all occupied MOs φi. The density obtained from this expression is also known as the charge density. The MOs may be occupied by spin-up (alpha) electrons or by spin-down (beta) electrons. Using the same φi for both alpha and beta electrons is known as a spin-restricted calculation; using different φi for alpha and beta electrons results in a spin-unrestricted or spin-polarized calculation. In the unrestricted case, it is possible to form two different charge densities: one for alpha MOs and one for beta MOs. Their sum gives the total charge density and their difference gives the spin density--the amount of excess alpha over beta spin. This is analogous to restricted and unrestricted Hartree-Fock calculations (Pople and Nesbet 1954).
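As an illustration of Eq. 4 and of the spin density discussed above, the following minimal NumPy sketch (not DMol code; the orbital values and occupations are invented) evaluates alpha, beta, total, and spin densities on a set of grid points.

```python
import numpy as np

def density(orbitals, occupations):
    """Charge density rho(r) = sum_i n_i |phi_i(r)|^2 (Eq. 4).

    orbitals    : array (n_orbitals, n_points) of MO values on a grid
    occupations : array (n_orbitals,) of occupation numbers
    """
    return np.einsum("i,ip->p", occupations, orbitals**2)

# Hypothetical example: two alpha and one beta orbital tabulated on 5 grid points.
alpha_orbitals = np.array([[0.1, 0.3, 0.5, 0.3, 0.1],
                           [0.2, 0.1, 0.0, -0.1, -0.2]])
beta_orbitals = np.array([[0.1, 0.3, 0.5, 0.3, 0.1]])

rho_alpha = density(alpha_orbitals, np.ones(2))
rho_beta = density(beta_orbitals, np.ones(1))

rho_total = rho_alpha + rho_beta   # total charge density
rho_spin = rho_alpha - rho_beta    # spin density (excess alpha over beta)
print(rho_total, rho_spin)
```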
From the wavefunction (Eq. 2) and the charge density (Eq. 4), the energy components can be written (in atomic units) as:
T[ρ] = Σi ⟨φi | -∇²/2 | φi⟩   Eq. 5

U[ρ] = -Σα ∫ Zα ρ(r)/|r - Rα| dr + (1/2) ∫∫ ρ(r) ρ(r′)/|r - r′| dr dr′ + Σα<β Zα Zβ/|Rα - Rβ|   Eq. 6

In Eq. 6, Zα refers to the charge on nucleus α of an N-atom system. The first term, ρVN, represents the electron-nucleus attraction. The second, ρVe/2, represents the electron-electron repulsion. The final term, VNN, represents the nucleus-nucleus repulsion.
Approximating the exchange-correlation energy
The final term in Eq. 1, the exchange-correlation energy, requires some approximation for this method to be computationally tractable. A simple and surprisingly good approximation is the local density approximation, which is based on the known exchange-correlation energy of the uniform electron gas (Hedin and Lundqvist 1971, Ceperley and Alder 1980, Lundqvist and March 1983). Analytical representations have been made by several researchers (Hedin and Lundqvist 1971, Ceperley and Alder 1980, von Barth and Hedin 1972, Vosko et al. 1980, Perdew and Wang 1992). The local density approximation assumes that the charge density varies slowly on an atomic scale (i.e., each region of a molecule actually looks like a uniform electron gas). The total exchange-correlation energy can be obtained by integrating the uniform electron gas result:
Exc[ρ] = ∫ ρ(r) εxc[ρ(r)] dr   Eq. 7

where εxc[ρ] is the exchange-correlation energy per particle in a uniform electron gas of density ρ, and ρ(r) dr gives the number of particles.
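To make Eq. 7 concrete, the sketch below evaluates only the exchange part of the local density approximation (the Dirac/Slater form) on a set of grid points with quadrature weights. The density and weight values are invented for illustration, and the VWN correlation term used by default in DMol is omitted.

```python
import numpy as np

def eps_x_lda(rho):
    """Exchange energy per particle of the uniform electron gas (Dirac/Slater form).
    This is only the exchange part of eps_xc; the VWN correlation term is omitted."""
    return -0.75 * (3.0 / np.pi) ** (1.0 / 3.0) * rho ** (1.0 / 3.0)

def exc_lda(rho, weights):
    """E_xc ~ sum_i w_i rho(r_i) eps_xc(rho(r_i)), a quadrature form of Eq. 7."""
    return np.sum(weights * rho * eps_x_lda(rho))

# Hypothetical density values and integration weights on a small grid.
rho = np.array([0.30, 0.25, 0.10, 0.02])
w = np.array([0.1, 0.2, 0.4, 0.8])
print(exc_lda(rho, w))   # exchange-only LDA energy, in hartree
```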
This implementation uses the form derived by Vosko et al. (1980), denoted the VWN functional, as the default. Other local spin-density functionals, developed by von Barth and Hedin (BH), Janak, Moruzzi, and Williams (JMW), and the latest Perdew and Wang work (PW), are also available when a nonstandard type of DMol calculation is chosen.
The next step in improving the local spin-density (LSD) model is to take into account the inhomogeneity of the electron gas that naturally occurs in any molecular system. This can be accomplished by a density gradient expansion, sometimes referred to as the nonlocal spin-density approximation (NLSD). Over the past few years it has been well documented that the gradient-corrected exchange-correlation energy Exc[ρ, ∇ρ] is necessary for studying the thermochemistry of molecular processes (see recent reviews by Ziegler 1991, Labanowski and Andzelm 1991, Politzer and Seminario 1995).
In the present implementation, several commonly used NLSD functionals are available. The default is the Perdew and Wang (PW) generalized gradient approximation for the correlation functional and the Becke (B) gradient-corrected exchange functional. This level of approximation for the NLSD Hamiltonian is referred to as the BP functional. In addition, the gradient-corrected correlation functional of Lee, Yang, and Parr (LYP) is available.
The total energy can now be written as:
Eq. 8
To determine the actual energy, Et must be made stationary with respect to variations in ρ, subject to the orthonormality constraints of Eq. 3 (Kohn and Sham 1965):

δ{ Et - Σij λij (⟨φi | φj⟩ - δij) } = 0   Eq. 9

where the λij are Lagrangian multipliers. This process leads to a set of coupled equations first proposed by Kohn and Sham (1965):

[ -∇²/2 + VN(r) + Ve(r) + µxc(r) ] φi(r) = εi φi(r)   Eq. 10

The term µxc is the exchange-correlation potential, which results from differentiating Exc with respect to ρ. For the local spin-density approximation, the potential µxc is:

µxc = ∂[ρ εxc(ρ)] / ∂ρ   Eq. 11
Use of the eigenvalues of Eq. 10 leads to a reformulation of the energy expression:
Eq. 12
Convenience of expanding MOs in terms of AOs, or basis functions

In practice, it is convenient to expand the MOs in terms of atomic orbitals (AOs):

φi = Σµ Ciµ χµ   Eq. 13

The atomic orbitals χµ are called the atomic basis functions, and the Ciµ are the MO expansion coefficients. Because linear combinations of AOs are used, the Ciµ are also sometimes called the LCAO-MO coefficients. Several choices are possible for the basis set, including Gaussian (Andzelm et al. 1989), Slater (Versluis and Ziegler 1988), and numerical orbitals. In this implementation, numerical orbitals are used--these are discussed at some length under Numerical basis sets.
Unlike the MOs, the AOs are not orthonormal. This leads to a reformulation of the DFT Eq. 10 in the form:

H Ci = εi S Ci   Eq. 14

where:

Hµν = ⟨χµ | Heff | χν⟩ = ⟨χµ | -∇²/2 + VN + Ve + µxc | χν⟩   Eq. 15

and:

Sµν = ⟨χµ | χν⟩   Eq. 16
SCF procedure
Because H depends upon C, Eq. 14 must be solved by an iterative technique. This can be done by the following procedure:
The basis functions χµ are given numerically as values on an atom-centered spherical-polar mesh, rather than as analytical functions (e.g., Gaussian orbitals). The angular portion of each function is the appropriate spherical harmonic Ylm(θ, φ). The radial portion F(r) is obtained by solving the atomic DFT equations numerically. A reasonable level of accuracy is usually obtained by using a range of 300 radial points from the nucleus to an outer distance of 10 Bohr (approximately 5.3 Å). Radial functions are stored as a set of cubic spline coefficients for each of the 300 sections, so that F(r) is actually piecewise analytic. This is an important consideration for generating analytic energy gradients. In addition to the basis sets, the -∇²χ/2 terms required for evaluation of the kinetic energy are also stored as spline coefficients.
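The following sketch illustrates the idea of storing a radial function as cubic splines so that values and derivatives are piecewise analytic. It uses SciPy's CubicSpline on an invented radial function; it is not the DMol storage format.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Hypothetical radial grid: 300 points from near the nucleus out to 10 Bohr.
r = np.linspace(1e-4, 10.0, 300)
F = np.exp(-r) * r        # stand-in for a numerically tabulated radial function F(r)

spline = CubicSpline(r, F)        # piecewise-cubic representation of F(r)
dF = spline.derivative()          # analytic derivative of the spline

# Evaluate F and dF/dr at arbitrary radii (needed, e.g., for analytic gradients).
print(spline(2.5), dF(2.5))
```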
The use of the exact DFT spherical atomic orbitals has several advantages. For one, the molecule can be dissociated exactly to its constituent atoms (within the DFT context). Because of the quality of these orbitals, basis set superposition effects (Delley 1990) are minimized, and an excellent description of even weak bonds is possible.
Frozen core functions reduce computational cost
In all cases, the frozen-core approximation may be used. Core functions are simply frozen at the values for the free atoms, and valence orbitals are orthogonalized to them. Use of frozen cores reduces the computational effort by reducing the size of the secular equation (Eq. 14) without much loss of accuracy.
Numerical integration
Evaluation of the integrals in Eqs. 15 and 16 must be accomplished by a three-dimensional numerical integration procedure, because of the nature of the basis functions. The matrix elements are therefore approximated by the finite sums:

Hµν ≈ Σi w(ri) χµ(ri) Heff(ri) χν(ri)   Eq. 17

Sµν ≈ Σi w(ri) χµ(ri) χν(ri)   Eq. 18

The sums run over the numerical integration points ri. The term χµ(ri) Heff(ri) χν(ri) represents the value of the integrand of Eq. 15 at the point ri, and w(ri) represents a weight associated with each mesh point. Increasing the number of mesh points improves the numerical precision of the integral but also increases the computational cost.
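A minimal sketch of the quadrature in Eqs. 17 and 18, assuming tabulated basis values and a purely local effective potential (in practice the kinetic part of Heff enters through the stored -∇²χ/2 values); all numbers are invented.

```python
import numpy as np

def overlap_matrix(chi, w):
    """S_mu,nu ~ sum_i w(r_i) chi_mu(r_i) chi_nu(r_i)  (Eq. 18)."""
    return np.einsum("i,mi,ni->mn", w, chi, chi)

def hamiltonian_matrix(chi, veff, w):
    """H_mu,nu ~ sum_i w(r_i) chi_mu(r_i) Veff(r_i) chi_nu(r_i)  (local-potential
    part of Eq. 17; the kinetic term would use tabulated -del^2 chi / 2 values)."""
    return np.einsum("i,mi,i,ni->mn", w, chi, veff, chi)

# Hypothetical basis values chi[mu, i] at 4 grid points, weights w, local potential.
chi = np.array([[0.5, 0.4, 0.2, 0.1],
                [0.1, 0.3, 0.3, 0.1]])
w = np.array([0.2, 0.5, 0.5, 0.2])
veff = np.array([-1.0, -0.5, -0.3, -0.1])
print(overlap_matrix(chi, w))
print(hamiltonian_matrix(chi, veff, w))
```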
Atomic and molecular integration grids
Careful selection of a set of integration points is important for the quality of the calculation (Delley 1990, Ellis and Painter 1968, Boerrigter et al. 1988). In general, the grid used to generate the atomic basis set is not suitable for molecular calculations. The grid used for atomic basis sets can take advantage of spherical symmetry, which greatly simplifies matters. For molecules, it is necessary to be able to treat the rapid oscillations of the molecular orbitals near the nuclei and to avoid integration of the nuclei themselves because of the nuclear cusps (Delley 1990).
The integration points are generated in a spherical pattern around each atomic center. Radial points are typically taken from the nucleus to an outer distance of 10 Bohr (approximately 5.3 Å). The number of radial points within this distance is designed to scale with increasing atomic number; for example, Fe requires more points than C. The typical number of radial points NR for a nucleus of charge Z is:
Eq. 19
This number may, of course, be manually adjusted to accommodate the required precision or allowed cost of a calculation. The spacing between points is logarithmic--points are spaced more closely near the nucleus where oscillations in the wavefunction are more rapid.
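The sketch below builds a logarithmically spaced radial grid and scales the number of points with nuclear charge. The specific scaling rule shown is only illustrative, not the actual DMol rule of Eq. 19.

```python
import numpy as np

def radial_grid(n_points, r_min=1e-5, r_max=10.0):
    """Logarithmically spaced radial points from near the nucleus to r_max (Bohr),
    so the mesh is densest where the wavefunction oscillates most rapidly."""
    return np.logspace(np.log10(r_min), np.log10(r_max), n_points)

# Hypothetical choice: more points for heavier atoms (the actual DMol rule is Eq. 19).
for symbol, z in (("C", 6), ("Fe", 26)):
    n_r = 14 * int(round((z + 2) ** (1.0 / 3.0)))  # illustrative scaling only
    print(symbol, n_r, radial_grid(n_r)[:3])
```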
Angular integration points are generated at each of the NR radial points, creating a series of shells around each nucleus. The angular points are selected by schemes designed to yield points ri and weights w(ri) that give exact angular integration for a spherical harmonic with a given value of l. Such quadrature schemes (Stroud 1971, Lebedev 1975 and 1977, Konyaev 1979) are available for functions up to l = 41. Alternatively, a product-Gauss rule in cos θ and φ may be used for arbitrary values of l (Stroud 1971). The product-Gauss methods use (l + 1) points on each shell, while the quadrature methods use more points:
| l  | Number of angular points |
|----|--------------------------|
| 5  | 14  |
| 7  | 26  |
| 11 | 50  |
| 17 | 110 |
| 23 | 194 |
| 29 | 302 |
| 35 | 434 |
| 41 | 590 |
Assuring consistent precision during integration
In practice, the quadrature method is used to generate the integration grid, and the product method is used as a check of numerical precision. On each radial shell, the sum of the atomic densities and the atomic electrostatic potentials is computed using both angular schemes. If the difference between the results of the two schemes is too large, then the next highest l value is used to generate a new set of points. This assures a consistent level of numerical precision throughout the integration grid. Typically, angular sampling is increased when the difference between the two schemes is greater than 10^-4. This yields approximately 1000 points per atom and offers a good compromise between computational cost and numerical precision.
Partition functions are used to increase the convergence of the numerical integration and to avoid integrating over nuclear cusps (Delley 1990, Hirshfeld 1977, Becke 1988). A partition function pα is defined as:

pα(r) = gα(r - Rα) / Σβ gβ(r - Rβ)   Eq. 20

where α is an atom index and gα(r - Rα) is a function that typically is large for small |r - Rα| and small for large |r - Rα| (i.e., larger near the nucleus). Integrals are rewritten using partition functions as:
Eq. 21
which is further reduced to a sum over 3D integration points:
Eq. 22
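A small sketch of Eqs. 20 and 22: atom-centered cell functions gα are normalized into partition weights pα(ri), which can then multiply the integration weights so that each atomic piece is integrated separately. The exponential cell function and geometry are invented for illustration.

```python
import numpy as np

def partition_weights(points, centers, g):
    """p_alpha(r_i) = g_alpha(|r_i - R_alpha|) / sum_beta g_beta(|r_i - R_beta|)  (Eq. 20).

    points  : (P, 3) integration points
    centers : (A, 3) nuclear positions
    g       : callable that is large near a nucleus and small far away
    """
    d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)  # (P, A)
    gvals = g(d)
    return gvals / gvals.sum(axis=1, keepdims=True)

# Hypothetical atom-centered cell function and a toy diatomic (Bohr).
g = lambda r: np.exp(-2.0 * r)
centers = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 1.4]])
points = np.random.default_rng(0).uniform(-2, 3, size=(5, 3))
p = partition_weights(points, centers, g)
print(p.sum(axis=1))   # each row sums to 1, so the atomic pieces add back to the full integrand
```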
Preferred partition function
In practice, the partition functions are combined with the integration weights w(ri) to simplify the computation. The preferred choice for a partition function is:
Eq. 23
where r = |ri - Rα|, r0 = 0.5, and ρα is the atomic charge density for atom α. Other options for partition functions include:
Eq. 24
Eq. 25
Eq. 26
Eq. 27
Evaluation of the effective potential
Evaluation of the exchange-correlation energy εxc and potential µxc is accomplished with Eqs. 52, 55, and 57. This requires numerical evaluation of the charge density ρ(r) at many points in space (i.e., εxc and µxc are tabulated numerically). This restriction actually applies to most density functional methods, even if analytical basis functions are used (Andzelm et al. 1989, Versluis and Ziegler 1988). The use of numerical basis functions facilitates this process, since all required quantities are already available on a grid of adequate numerical precision. An alternative approach (Baerends et al. 1973) is to fit the charge density to an analytic multipolar expansion via a least-squares fitting procedure. This simplifies the evaluation of εxc and µxc, but still requires the use of a numerical grid for the least-squares fitting.
Evaluating the Coulombic potential numerically
The Coulombic potential is evaluated by solving the Poisson equation for the charge density:
∇² Ve(r) = -4π ρ(r)   Eq. 28

rather than by explicitly evaluating the Coulombic term as:

Ve(r) = ∫ ρ(r′) / |r - r′| dr′   Eq. 29

In this approach, the Poisson equation is solved in a completely numerical (non-basis-set) way (Delley 1990). This provides greater numerical precision, since the evaluation of Ve is essentially exact once the form of ρ(r) has been specified. Such a method requires specification of an analytic form of ρ(r), as discussed above. However, rather than use a least-squares fitting procedure, a projection scheme is used. The charge density is first partitioned into atomic densities and then decomposed into multipolar components. Appropriate partition functions can ensure that such an expansion is rapidly convergent.
The density obtained in this way is called the model density. The term ρ̃αlm(|r - Rα|) gives the model density for the multipolar component lm on atom α for a shell at distance |r - Rα| from the nucleus:

Eq. 30

Note that the partition function pα used for decomposition of the density is in general not the same as that used to improve the numerical integration in Eq. 22. The total model density is obtained from the summation over all ρ̃αlm:
Eq. 31
Effect of angular truncation on precision of model charge density

The total model charge density is, in general, not equal to the orbital density ρ because of angular truncations. However, the flexibility of this model charge density is superior to that obtained with fitting procedures. The degree of angular truncation can be specified as an input parameter. Typically, a value of l one greater than that in the atomic basis provides sufficient precision; for example, l = 3 truncation when d functions are present in the basis, or l = 2 truncation if only p functions are used.
The Coulombic potential for each component is calculated using the Green function of the Laplacian (Delley 1990):
Eq. 32
and the total potential is given by:
Eq. 33
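As an illustration of the radial Poisson solution, the sketch below applies the standard Green's-function expression for one multipolar component, which is assumed here to correspond to Eq. 32; the density component is invented and the integrals are simple trapezoids.

```python
import numpy as np

def coulomb_radial(r, rho_lm, l):
    """Radial Coulomb potential of one multipolar density component, using the
    textbook Green's-function form of the Laplacian (assumed form of Eq. 32):

        V_lm(r) = 4*pi/(2l+1) * [ r**-(l+1) * int_0^r rho_lm(s) s**(l+2) ds
                                  + r**l     * int_r^inf rho_lm(s) s**(1-l) ds ]
    """
    inner = np.concatenate(([0.0], np.cumsum(
        0.5 * (rho_lm[1:] * r[1:] ** (l + 2) + rho_lm[:-1] * r[:-1] ** (l + 2)) * np.diff(r))))
    outer_all = np.concatenate(([0.0], np.cumsum(
        0.5 * (rho_lm[1:] * r[1:] ** (1 - l) + rho_lm[:-1] * r[:-1] ** (1 - l)) * np.diff(r))))
    outer = outer_all[-1] - outer_all
    return 4.0 * np.pi / (2 * l + 1) * (inner / r ** (l + 1) + outer * r ** l)

# Hypothetical l = 0 component: a simple exponential density shell on a radial grid.
r = np.linspace(1e-3, 10.0, 300)
rho0 = np.exp(-r)
print(coulomb_radial(r, rho0, l=0)[:3])
```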
Computational self-consistent field procedure
Interpolating the numerical atomic bases onto the molecular grid

Before the self-consistent field (SCF) procedure can be used, a step is required that is analogous to the evaluation of integrals over atomic orbitals, such as in Hartree-Fock methods. This is the interpolation of the numerical atomic basis set onto the molecular grid. Neglecting symmetry and any frozen-core approximations, this step requires a computational effort on the order of N × P for N atomic orbitals and P integration points. The basis set is controlled by the user, the number of points typically being on the order of 1000 points per atom. The overlap matrix (Eq. 16) and the constant portion of the effective Hamiltonian (Eq. 15) (kinetic and nuclear attraction terms) are constructed at this time, and each requires N × (N + 1) × P operations. The interpolated values can be stored externally and read as they are required. Alternatively, these data can be generated as needed, obviating the need for storage--this is termed a direct SCF procedure, by analogy with the direct Hartree-Fock method (Almlöf et al. 1982).
Constructing the initial molecular electron density
In practice, it is more convenient to skip Steps 1 and 2 (choosing an initial set of Ciµ and constructing an initial set of φi; see Choose an initial set of Ciµ) and to begin instead with an initial ρ constructed from the superposition of atomic densities (quantities that are readily constructed from the numerical atomic basis set). In the SCF procedure, reconstruction of the new density requires a computational effort on the order of N × No × P, where No is the number of occupied orbitals.
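A minimal sketch of forming the starting density as a superposition of spherical atomic densities; the hydrogen-like atomic density used here is a stand-in, not a DMol numerical atomic density.

```python
import numpy as np

def superposition_density(points, centers, atomic_rho):
    """Initial molecular density as a sum of spherical free-atom densities.

    points     : (P, 3) molecular integration points
    centers    : (A, 3) nuclear positions
    atomic_rho : list of callables, rho_atom(r) for each atom
    """
    rho = np.zeros(len(points))
    for center, rho_at in zip(centers, atomic_rho):
        r = np.linalg.norm(points - center, axis=1)
        rho += rho_at(r)
    return rho

# Hypothetical spherical atomic densities for a toy diatomic (Bohr).
h_like = lambda r: np.exp(-2.0 * r) / np.pi
centers = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 1.4]])
points = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 0.7], [0.0, 0.0, 1.4]])
print(superposition_density(points, centers, [h_like, h_like]))
```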
Additional computational costs
Once ρ(r) is known, evaluation of the exchange-correlation potential µxc requires only P operations. Construction of the Coulombic potential requires only P × M effort, where M is the number of multipolar functions. M is typically on the order of 9 functions per atom (l = 2) or 16 functions per atom (l = 3). Neither of these steps is especially time consuming.
Reducing the computational cost
For large systems, the computational cost does not necessarily grow as rapidly as implied by the above comments. Since the atomic basis functions have a finite extent (approximately 10 Bohr), only a limited number of points contribute to each matrix element, and P eventually converges to a constant. In addition, construction of the density and the secular matrix can be accomplished using sparse matrix multiplication routines, further reducing the computational cost.
Construction of a new density follows solution of the secular equation. Damping is usually required to ensure smooth convergence. In the current method, simple damping is possible:
Eq. 34
where d is the damping factor, ρold is the density that was used to construct the secular matrix, ρnew is the density constructed from the new MO coefficients without damping, and ρ is the density that is actually used in the next iteration. An interpolation/extrapolation scheme is also available. This technique constructs an effective vector from ρold to ρnew for the current iteration and for the previous iteration. The effective vectors are generally skew vectors. The point of closest approach between these two vectors is used to extrapolate the actual density for the next iteration. The Pulay DIIS method (Pulay 1982) has also been implemented.
The direct inversion of the iterative subspace (DIIS) method developed by Pulay (1982) has been implemented in DMol as a mechanism to speed up SCF convergence. The method rests on a suitable definition of an error vector that is zero when convergence is achieved, and on forming a linear combination of error vectors sampled along the iterations that produces a new error vector with minimal norm. (This algorithm was present in previous versions of DMol, but restricted to interpolation of at most two error vectors.) The DIIS method is much more powerful if a slightly larger dimension of the iterative subspace is allowed. The default dimension currently adopted in DMol is 4. An upper limit of 10 is imposed in the current DIIS scheme, motivated by the fact that linear dependencies between error vectors, and therefore the amount of redundant information, increase with the size of the vector space.
Eq. 35
The final model density is a linear combination of the densities at each SCF iteration i:

ρ = Σi Ci ρi   Eq. 36

with the constraint that Σi Ci = 1. The Ci coefficients are calculated by minimizing the norm:

Eq. 37

where the inner product denotes summation over α, l, m and numerical integration over all the molecular grid points.
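The following sketch shows the Lagrange-multiplier linear system used to obtain DIIS coefficients that minimize the norm of the combined error vector subject to ΣCi = 1 (Eqs. 36-37). The error vectors are invented; in DMol they would be built from the SCF densities.

```python
import numpy as np

def diis_coefficients(error_vectors):
    """Solve for the DIIS mixing coefficients C_i.

    Minimizes |sum_i C_i e_i|^2 subject to sum_i C_i = 1, using a Lagrange
    multiplier: the (m+1)-dimensional linear system has B_ij = <e_i|e_j>.
    """
    m = len(error_vectors)
    B = np.empty((m + 1, m + 1))
    B[:m, :m] = [[np.dot(ei, ej) for ej in error_vectors] for ei in error_vectors]
    B[:m, m] = B[m, :m] = 1.0
    B[m, m] = 0.0
    rhs = np.zeros(m + 1)
    rhs[m] = 1.0
    coeffs = np.linalg.solve(B, rhs)
    return coeffs[:m]              # the last entry is the Lagrange multiplier

# Hypothetical error vectors from three SCF iterations.
errors = [np.array([0.30, -0.20, 0.10]),
          np.array([0.10, 0.05, -0.02]),
          np.array([-0.02, 0.01, 0.00])]
c = diis_coefficients(errors)
print(c, c.sum())                  # the coefficients sum to 1
```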
The equations of density functional theory include an electrostatic potential arising from the negatively charged electron density. For more efficient calculations, the potential is found by solving the Poisson equation rather than by the equivalent approach of four-center direct integrals. Using the Poisson equation requires an auxiliary density representation ρ̃, which is a function rather than the sum of squares of a function. ρ̃ differs from ρ (Eq. 4):
Eq. 38
Effect of auxiliary density approximation on accuracy of calculated total energy
To minimize the impact of this difference, a total-energy formula is used that is second order in the density error (ρ - ρ̃) (Delley et al. 1983, Delley 1990, Delley 1991). The default starting density in DMol is the sum of the spherical atom densities. The total energy calculated in the first SCF cycle is thus the so-called Harris approximation (Harris 1985). The atomic dissociation energy in the first cycle is usually overestimated, since the electrostatic error term for the total energy:

Eq. 39

is negative definite. The less important second-order term for local density functionals is positive definite, which leads to a slight overestimation of the total energy during the SCF iterations.

The total energy expression, including the auxiliary density, is now:
Eq. 40
where ρσ is the density for spin σ = alpha or beta, respectively; niσ are the occupations of the orbitals with orbital and spin labels i, σ; εiσ is the corresponding eigenvalue, which has been calculated using the static Ṽe and the exchange-correlation potential µ̃xc,σ arising from the spin densities ρ̃σ. Exc is the exchange-correlation functional (local or nonlocal).
Energy gradients
Predicting chemical structure
The ability to evaluate the derivative of the total energy with respect to geometric changes is critical for the study of chemical systems. Without first derivatives, a laborious point-by-point procedure is required, which is taxing to both computer and human resources. The availability of analytic energy derivatives for Hartree-Fock (Pulay 1969), CI (Brooks et al. 1980), and MBPT (Pople et al. 1979) theories (to name just a few) has made these methods remarkably successful for predicting chemical structures.
The derivative of the total energy in Eq. 12 with respect to a nuclear perturbation in direction a (x, y, or z) of atom α may be written as:
Eq. 41
where the derivative density is defined as:
Eq. 42
with:
Eq. 43
Derivative of the basis function

The derivative of the basis function χµ with respect to the perturbation a can be computed analytically because of the representation of the numerical basis sets. The angular portion of χµ is a spherical harmonic function, which is analytic and easily differentiated. The radial portion is represented by several cubic splines, each of which is also differentiable.

The derivatives of the eigenvalues can be obtained from Eq. 10. Multiplying by φi and integrating gives an expression for εi:
Eq. 44
Differentiating and rearranging yields:
Eq. 45
The terms in Eqs. 45 and 49 involving Zα represent the Hellmann-Feynman force (Hellmann 1939, Feynman 1939), which gives the derivative in the absence of any orbital relaxation.
Substituting Eq. 45 into Eq. 41 yields:
Eq. 46
where Eat is the Hellmann-Feynman term. Now (Andzelm et al. 1989):
Eq. 47
and recalling the definition in Eq. 11, we can also write:
Eq. 48
The final equation for the derivative of the energy

Therefore, the terms in Eq. 46 involving µxc cancel. In addition, if Eq. 29 is used to construct the charge density, then the last two terms in Eq. 46 also cancel, leaving:

Eq. 49

which is formally the same as the equation derived by other researchers (Andzelm et al. 1989, Versluis and Ziegler 1988). In practice, however, it is necessary to compute both ρVa/2 and ρaV/2 (the superscript a denoting differentiation with respect to the perturbation), because the model density from Eq. 31 is not exactly equal to the numerical charge density computed from Eq. 4.
The time required to evaluate all 3N first derivatives for an N-atom system is typically the same as the time required for 3-4 SCF iterations. If convergence is achieved in, say, 10-12 iterations, then 25-30% additional time is required to evaluate the derivatives. Others have obtained similar results (Versluis and Ziegler 1988).
Because of finite numerical precision, two potential problems have been observed in evaluating gradients. First, the energy minimum does not correspond exactly to the point with zero derivative (Versluis and Ziegler 1988). The gradients are typically about 10^-4 at the energy minimum. A second important point is that the sum of the gradients is not always zero, as it must be for translational invariance. The sum can be as high as 10^-3 if the calculation is very poor. Increasing the quality of the integration mesh and the number of multipolar functions in the model density can reduce this to about 3.0 × 10^-5. This magnitude of error seems to be permissible for geometry optimizations: the error introduced in the geometry is typically on the order of 0.001 Å. Only for very flat potential energy surfaces should this be a problem.
Currently, the geometry is optimized using both Cartesian and internal coordinates (see Geometry optimization -- The OPTIMIZE suite of algorithms). When the geometry is optimized under conditions of symmetry, only forces that maintain molecular symmetry are evaluated, resulting in considerable time savings. Even in the absence of symmetry, certain forces can be omitted from the calculation, resulting in faster calculations. For example, for a substrate adsorbed on a metal cluster, it is possible to compute gradients for the substrate only and perform no optimization on the metal.
Electron gas model
The equations actually used in DMol
The parameterized equations for the electron gas exchange-correlation energy as used in DMol are presented explicitly in this section. An excellent review of the electron gas model can be found in Appendix E of Parr and Yang (1989). As mentioned earlier, the implementation in DMol is based on the work of von Barth and Hedin (1972).
Eq. 50
The exchange energy
Then the exchange energy is given by:
Eq. 51
where:
Eq. 52
Spin-restricted computations
The correlation contribution depends upon whether this is a spin-restricted or unrestricted computation. For a restricted computation, the correlation contribution is given by:
Eq. 53
where:

Eq. 54
For a spin-unrestricted computation, the expression is:
Eq. 55
where:
Eq. 56
and:
Eq. 57
Point-group symmetry

DMol supports most of the chemically important symmetry point groups. If, however, a system is selected with a point group that is not supported, DMol automatically switches to the highest-order supported subgroup. The Cn, Cnh, and S2n groups are not supported yet, because the complex character tables are not available in DMol. The rotational groups C∞v and D∞h are also not supported yet, but such systems can be computed in C6v and D6h, respectively, without major loss in efficiency. The supported groups are:
C1 Cs Ci C2 C2v C2h D2 D2h D2d C3v D3 D3h D3d C4v D4 D4h D4d C5v D5 D5h D6d C6v D6 D6h D5d Td O Oh I Ih
After this introductory section, the theory behind the principal algorithms in OPTIMIZE is presented, starting under Theory and implementation. (This section can be skipped by those whose sole interest is how to use the program.) Methodology--Insight includes a discussion of input parameters, options, and how to use OPTIMIZE within the Insight interface. Lessons demonstrating how to use the Optimize/Opt_Parameters command are described in Tutorial--The Insight Environment.
OPTIMIZE is a general geometry-optimization package for locating both minima and transition states on a potential energy surface. It can optimize in Cartesian coordinates or in a nonredundant set of internal coordinates that are generated automatically from input Cartesian coordinates. It also handles fixed constraints on distances, bond angles, and dihedral angles in Cartesian or (where appropriate) internal coordinates.
OPTIMIZE is designed to operate with minimal user input. All you need to provide is the initial geometry in Cartesian coordinates (obtained from the Insight or Discover programs or from an appropriate database), the type of stationary point sought (minimum or transition state), and details of any imposed constraints. All decisions as to the optimization strategy--how to handle the constraints, whether to use internal coordinates, which optimization algorithm to use--are made by OPTIMIZE.
The heart of the program (for both minimization and transition-state searches) is the EF (eigenvector-following) algorithm (Baker 1986). The Hessian mode-following option incorporated into this algorithm is capable of locating transition states by walking uphill from the associated minima. By following the lowest Hessian mode, the EF algorithm can locate transition states, starting from any reasonable input geometry and Hessian.
An additional option available for minimization is GDIIS, which is based on the well known DIIS technique for accelerating SCF convergence (Pulay 1982).
The strategy adopted for constrained optimization depends on the starting geometry and the nature of the constraints. Constraints can be handled easily in internal coordinates, provided that (1) the constrained parameter (distance, angle, or dihedral) is a natural part of the coordinate set and (2) the constraint is rigorously satisfied in the starting structure. If both (1) and (2) hold for all desired constraints, then OPTIMIZE carries out the optimization in internal coordinates. Otherwise, the constrained optimization is performed in Cartesian coordinates.
Traditional wisdom has it that optimization in Cartesian coordinates is inefficient relative to internal coordinates; however, recent work (Baker and Hehre 1991) has clearly demonstrated that, if a reasonable estimate of the Hessian matrix is available (e.g., from a molecular mechanics forcefield) at the starting geometry, then optimization directly in Cartesian coordinates is as efficient as an internal coordinate optimization. In particular, constrained optimization can be handled in Cartesian coordinates as efficiently as with a Z-matrix, with the additional advantages that any distance, angle, or dihedral constraint between any atoms in the molecule can be dealt with (i.e., there is no formal connectivity requirement), and the desired constraint does not have to be satisfied in the starting structure.
OPTIMIZE incorporates a very accurate and efficient Lagrange multiplier algorithm for handling constraints in Cartesian coordinates, with a more robust (but less efficient) penalty function algorithm as a backup. Both algorithms are suitably modified versions of the basic EF algorithm (Baker 1992). The Lagrange multiplier code can locate constrained transition states, as well as minima.
Theory and implementation
The EF algorithm and mode following
Shifting the Newton-Raphson step to favor optimization along an eigenmode

Mode following is a powerful technique for geometry optimization. It involves modifying the standard Newton-Raphson step:

h = -H⁻¹ g   Eq. 58

by introducing a shift parameter λ, so that (Cerjan and Miller 1981):

h = -(H - λI)⁻¹ g   Eq. 59

In terms of a diagonal Hessian representation, this can be written:

h = -Σi Fi ui / (bi - λ)   Eq. 60

where the ui and bi are the eigenvectors and eigenvalues of the Hessian matrix H, and Fi = uiᵀ g is the component of g along the local eigenmode ui. Scaling the Newton-Raphson step in this way has the effect of directing the step to lie primarily (but not exclusively) along one of the local eigenmodes, depending on the value chosen for λ.
Various recipes for choosing a suitable shift parameter exist: the EF algorithm utilizes a rational function approximation to the energy, yielding an eigenvalue equation of the form (Banerjee et al. 1985):
Eq. 61
from which a suitable λ can be obtained. This RFO matrix equation has the following important properties:
Property 3--the separability of the positive and negative Hessian eigenvalues--allows two shift parameters, λp and λn, to be used: one for modes along which the energy is to be maximized and one for modes along which it is minimized. Specifically, for a transition state (a saddle point of order 1), in terms of the Hessian eigenmodes we have the two matrix equations:
Eq. 62
Eq. 63
where it is assumed that maximization is along the lowest Hessian mode b1. Note that λp is the highest eigenvalue of Eq. 62--it is always positive and approaches zero at convergence--while λn is the lowest eigenvalue of Eq. 63--it is always negative and again approaches zero at convergence.

Choosing these values of λ gives a step that attempts to maximize along the lowest Hessian mode and minimize along all the others. It does so regardless of the eigenvalue signature of the Hessian (unlike the standard Newton-Raphson step). The two shift parameters are then used in Eq. 61 to give a final step:

h = -F1 u1 / (b1 - λp) - Σi≠1 Fi ui / (bi - λn)   Eq. 64

This step may be further scaled down if it is considered too long. For minimization, only one shift parameter, λn, is used, and it acts on all modes. It is often possible to locate different transition states from the same starting structure by maximizing along a mode other than the lowest (hence "mode following").
Constrained optimization
Lagrange multipliers as constraints
The essential problem in constrained optimization is to minimize a function of, say, n variables F(x) subject to a series of m constraints of the form Ci(x) = 0 (i = 1 ... m). This can be handled by introducing the Lagrangian function (Fletcher 1981):

L(x, λ) = F(x) - Σi λi Ci(x)   Eq. 65

which replaces the function F(x) in the unconstrained case. Here, the λi are the so-called Lagrange multipliers, one for each constraint Ci(x). Taking the derivative of Eq. 65 with respect to x and λ gives:

∂L/∂xj = ∂F/∂xj - Σi λi ∂Ci/∂xj   Eq. 66

and:

∂L/∂λi = -Ci(x)   Eq. 67

At a stationary point of the Lagrangian function, we have ∇L = 0; that is, all ∂L/∂xj = 0 and all ∂L/∂λi = 0. This latter condition means that all Ci(x) = 0, and so all constraints are satisfied. Hence, finding a set of values (x, λ) for which ∇L = 0 gives a possible solution to the constrained optimization problem, in precisely the same way that finding an x for which g = ∇F = 0 gives a solution to the corresponding unconstrained problem.
We can implement mode following in constrained optimization by simply adopting Eq. 61, but with H replaced by ∇²L and g replaced by ∇L. However, it is important to realize that each constraint introduces an additional mode into the Lagrangian Hessian (∇²L), one that has negative curvature (a negative Hessian eigenvalue). Thus, when considering minimization with m constraints, you should look for a stationary point of the Lagrangian function whose Hessian has m negative eigenvalues, that is, for a saddle point of order m.

Insofar as mode following is concerned, then, assuming a diagonal Lagrangian Hessian representation, Eqs. 62 and 63 for an unconstrained system should be replaced by the following for a constrained system:
Eq. 68
Eq. 69
where now the bi are the eigenvalues of ∇²L with corresponding eigenvectors ui, and Fi = uiᵀ ∇L. Constrained transition-state searches can be carried out by selecting one extra mode to be maximized in addition to the m constraint modes, that is, by searching for a saddle point of the Lagrangian function of order m + 1.
GDIIS
In the GDIIS method, geometries xi generated in previous optimization steps are linearly combined to find the "best" geometry on the current cycle (Császár and Pulay 1984):

x′ = Σi Ci xi   Eq. 70
Finding appropriate coefficients for use in the GDIIS method

The problem here, of course, is to find appropriate values for the coefficients Ci. If both the conditions:
Eq. 71
and:
Eq. 72
are satisfied, then the relation:
Eq. 73
also holds.
The true error vectors ei are, of course, unknown. However, they can be approximated by:

ei = -H⁻¹ gi   Eq. 74
where gi is the gradient vector corresponding to the geometry xi. Minimization of the norm of the residuum vector (Eq. 71), together with the constraint equation (Eq. 72), leads to a system of m + 1 linear equations:
Eq. 75
where Bij = ⟨ei | ej⟩ is the scalar product of the error vectors ei and ej, and λ is a Lagrange multiplier.
The coefficients Ci determined from Eq. 75 are used to calculate an intermediate, interpolated geometry:

x* = Σi Ci xi   Eq. 76

and its corresponding interpolated gradient:

g* = Σi Ci gi   Eq. 77

Relaxing the intermediate geometry

A new, independent geometry is generated by relaxing the interpolated geometry according to:

xnew = x* - H⁻¹ g*   Eq. 78
Modifications of the original GDIIS algorithm
In the original GDIIS algorithm, the Hessian matrix is static, that is, the original starting Hessian remains unchanged during the entire optimization. However, updating the Hessian at each cycle generally results in more rapid convergence, and this is the default in OPTIMIZE. Other modifications to the original method include limiting the number of previous geometries used in Eq. 70 by neglecting earlier geometries and eliminating any geometries more than a certain distance (default = 0.3 au) from the current geometry.
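A minimal sketch of one GDIIS cycle as described by Eqs. 70-78, with a fixed model Hessian (the OPTIMIZE default of updating the Hessian each cycle is not reproduced here); the quadratic test surface is invented.

```python
import numpy as np

def gdiis_step(geometries, gradients, hessian):
    """One GDIIS cycle (a sketch of Eqs. 70-78): error vectors e_i = -H^-1 g_i,
    coefficients from the constrained least-squares system, then relaxation of the
    interpolated geometry with the interpolated gradient."""
    h_inv = np.linalg.inv(hessian)
    errors = [-h_inv @ g for g in gradients]                 # Eq. 74

    m = len(errors)
    B = np.empty((m + 1, m + 1))                             # system of Eq. 75
    B[:m, :m] = [[ei @ ej for ej in errors] for ei in errors]
    B[:m, m] = B[m, :m] = 1.0
    B[m, m] = 0.0
    rhs = np.zeros(m + 1); rhs[m] = 1.0
    c = np.linalg.solve(B, rhs)[:m]

    x_star = sum(ci * xi for ci, xi in zip(c, geometries))   # Eq. 76
    g_star = sum(ci * gi for ci, gi in zip(c, gradients))    # Eq. 77
    return x_star - h_inv @ g_star                           # Eq. 78

# Hypothetical data: three previous geometries/gradients on a 2D quadratic surface.
H = np.array([[1.0, 0.1], [0.1, 0.8]])
xs = [np.array([1.0, 1.0]), np.array([0.6, 0.8]), np.array([0.4, 0.5])]
gs = [H @ x for x in xs]             # gradients of the surface 0.5 x^T H x
print(gdiis_step(xs, gs, H))         # lands at the minimum at the origin
```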
The DMol module in release 4.0.0 of the Insight program includes some COSMO controls, which allow for the treatment of solvation effects.
COSMO--solvation effects

COSMO (COnductor-like Screening MOdel) is a continuum solvation model in which the solute molecule is embedded in a dielectric continuum that represents the solvent. The exact screening charges of a conductor are scaled by the factor f(ε) = (ε - 1) / (ε + 1/2) to obtain a rather good approximation for the screening charges in a dielectric medium of permittivity ε. The deviations from exact dielectric screening are small for strong dielectrics; for weak dielectrics with ε on the order of 2, they may reach 10% of the total screening effects. However, for weak dielectrics, screening effects are small, and the absolute error therefore amounts to less than a kilocalorie per mole.
If we write Q = ρ + Z for the total solute charges, that is, the electron density ρ and the nuclear charges Z, then the vector of potentials Vtot on the surface is Vtot = BQ + Aq = Vsol + Vpol, where BQ is the potential arising from the solute charges Q and Aq is the potential arising from the surface charges q. B and A are Coulomb matrices. For a conductor, the relation Vtot = 0 must hold, which defines the screening charges as:

q = -A⁻¹ B Q   Eq. 79
For further details of the COSMO theory see Klamt and Schüürmann (1993). COSMO provides the electrostatic contribution to the free energy of solvation. In addition, there are nonelectrostatic contributions to the total free energy of solvation, which describe the dispersion interactions and cavity formation effects.
DMol/COSMO
The COSMO electrostatic energy is analogous in form to the DMol electrostatic energy, with the Coulombic operator replaced by the dielectric operator D = B A⁻¹ B. From Eq. 40 we can write:
Eq. 80
where ρ̃ represents the auxiliary density, which is introduced to solve the Poisson equation for the electrostatic potential of the solute. This total energy is minimized, resulting in the Kohn-Sham equations for the molecular orbitals. The Kohn-Sham Hamiltonian now includes an electrostatic COSMO potential:
Eq. 81
This potential is present in every SCF cycle. This direct incorporation of the solvent effects within the SCF procedure is a major computational advantage of the COSMO scheme. Since the DMol/COSMO orbitals are obtained using the variational scheme, we can derive accurate analytic gradients with respect to the coordinates of the solute atoms. The complete theory is presented in Klamt and Schüürmann (1993) and Andzelm et al. (1995). The gradients include the forces between the solute charges Q and the screening charges q.
The DMol/COSMO method has been tested extensively (Klamt and Schüürmann 1993, Andzelm et al. 1995). The results depend mainly on the choice of the van der Waals radii used to evaluate the cavity surface. The other parameters defining the cavity surface (see below) are less important. Of course, solvation energies depend on the choice of DMol parameters, such as the type of DFT functional, the basis set, and integration grid. Results obtained so far (Andzelm et al. 1995) suggest that the DMol/COSMO model can predict solvation energies for neutral solutes with an accuracy of about 2 kcal mol⁻¹.
Determination of the cavity surface (or solvent-accessible surface)
The surface is obtained as a superimposition of spheres centered at the atoms, discarding all parts lying on the interior part of the surface (Klamt and Schüürmann 1993). The spheres are represented by a discrete set of points, the so-called basic points. Eliminating the parts of the spheres that lie within the interior part of the molecule thus amounts to eliminating the basic grid points that lie in the interior of the molecule.
Determination of nonelectrostatic contributions to the free energy of solvation
The free energy of solvation is calculated as:
Eq. 82
where Eo is the total DMol energy of the molecule in vacuo, E is the total DMol/COSMO energy of the molecule in solvent, and ΔGnonelectrostatic is the nonelectrostatic contribution due to dispersion and cavity formation effects. The nonelectrostatic contributions to the free energy of solvation are estimated from a linear interpolation of the free energies of hydration for linear-chain alkanes as a function of surface area:

ΔGnonelectrostatic = A_Constant + B_Constant × surface_area   Eq. 83
In the present implementation, the VWN potential, DNP basis set, and fine integration grid of DMol were used to calculate energies of methane and octane (C2h). The experimental values of the free energy of solvation are 1.9 and 2.9 kcal mol⁻¹ for methane and octane, respectively (Ben-Naim and Marcus 1984). The default (see next section) set of COSMO parameters was chosen. The calculated surface areas were 38.4 and 104.1 Å² for methane and octane, respectively. The following values of A_Constant and B_Constant were used:
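The sketch below shows how such a two-point linear fit of Eq. 83 can be set up. The surface areas are those quoted above; the ΔGnonelectrostatic values are placeholders, since the fit actually uses the experimental solvation energies minus the computed electrostatic contributions, which are not listed here.

```python
import numpy as np

# Two-point fit of Eq. 83: dG_nonelectrostatic = A_Constant + B_Constant * surface_area.
areas = np.array([38.4, 104.1])     # A^2, methane and octane surface areas from the text
dG_nonel = np.array([1.0, 2.0])     # kcal/mol, hypothetical nonelectrostatic values

# polyfit of degree 1 returns [slope, intercept] = [B_Constant, A_Constant].
B_Constant, A_Constant = np.polyfit(areas, dG_nonel, 1)
print(A_Constant, B_Constant)
```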
Derivation
The electric field gradient is defined as the gradient of the electric field and is therefore the second derivative (or Hessian) of the electrostatic potential at a given position:
Eq. 84
where R0 = position at which the derivative is evaluated, Q = electric field gradient, V = electrostatic potential, R = general position vector, and a, b = x, y, or z.
Eq. 85
The trace of Q defined as above yields (Poisson equation):
Eq. 86
with ρ = the charge density.
Eq. 87
In DMol, the tensor Q is computed by expanding the electrostatic potential in a Taylor series with respect to each of the nuclei (if symmetry is imposed, only the symmetry-unique nuclei are visited). The matrix Q then appears as the coefficient of the second-order term in the Taylor expansion:
Eq. 88
Uses
The energy of an electrostatic quadrupole moment embedded in an electrostatic potential is determined by the electric field gradient at the position of the quadrupole. This interaction contributes to the NMR line width (as further outlined below).
The electric field gradient at a quadrupolar nucleus (spin I > 1/2) within a molecule determines the NMR longitudinal relaxation rate 1/T1 or the line width W1/2 (this assumes that the quadrupolar relaxation mechanism dominates):
Eq. 89
Miyake et al. (1996) use a different formula.
In Eq. 89, I = nuclear spin quantum number (e.g., I = 1 for 14N, I = 5/2 for 17O, I = 3/2 for 33S, I = 7/2 for 45Sc), and χ = nuclear quadrupolar coupling constant, given by χ = e²Q qzz/h, where qzz = largest principal component of the EFG tensor q, η (asymmetry parameter) = |qxx - qyy|/qzz, and the x, y, z axes are chosen such that |η| < 1. (Since χ enters into the relaxation time as χ², the sign of χ is not important and may be dropped (Bagno & Scorrano 1996).)

τc = rotational correlation time, which can be estimated from the Debye-Stokes-Einstein formula as τc = Vm η/(kT), where Vm = hydrodynamic volume (= (4π/3) R³, where R = hydrodynamic radius) and η = solution viscosity.
Vm = M / (ρ Na)   Eq. 90

where M = molecular weight, ρ = solute density, and Na = Avogadro's number.
The EFG in atomic units can be converted to SI units with the conversion factor:
Q is the maximum expectation value of the zz traceless tensor element:
The ESP is generated in the space of a molecule and can be calculated from the positions of the atomic nuclei and the electronic charge density.
The values of the atomic multipoles are obtained by minimizing the following expression for the mean square deviation between the calculated ESP and the model potential due to the atomic multipoles:
The current release of DMol allows for fitting only charges centered on the nuclei. The total molecular charge is conserved, using the Lagrange multiplier technique.
The grid points in Eq. 92 are selected based on the following criteria:
Eq. 93
where the internal and external radii of the atomic shells depend on the atom type.
Eq. 94
where the partial weights are calculated with respect to all ESP centers in the system:
Eq. 95
Eq. 96
where ΔR is the "diffusion" width of the layer border. The weights change smoothly from 0 to 1 across the internal radii and from 1 to 0 across the external radii.
The measurement of optical absorption spectra yields information about the difference between energy levels and about the intensity of the transition between energy levels, which is related to the strength of the coupling with the external electric field inducing the transition.
Optical absorption spectra
If the initial state of lower energy is denoted as |I⟩, with energy Ei, and the final, excited state of higher energy is denoted as |F⟩, with energy Ef, an absorption band should be observed at the energy difference Ef - Ei.
The matrix element ⟨I| E-Field |F⟩ describes the strength of the coupling between the two states |I⟩ and |F⟩ mediated by the electric field E-Field. E-Field is a three-dimensional vector with components Ex, Ey, Ez. If the matrix element ⟨I| E-Field |F⟩ is different from zero, the transition between states |I⟩ and |F⟩ is called a dipole transition. Computation of the matrix element is equivalent to computing the transition moment between states |I⟩ and |F⟩, which is given as ⟨I| r |F⟩, where r is a general three-dimensional vector with components x, y, and z.
The dipole moment of a state |X⟩ is given by ⟨X| -er |X⟩, where e is the unit of charge. If the matrix element is zero, transitions may still be observable as multipole (quadrupole, etc.) transitions.
The second assumption is quite drastic in that it assumes that the excited state can be described by the Kohn-Sham orbitals of the ground state. This ignores both relaxation effects and more fundamental problems describing excited states within density functional theory (Slater 1972, Hedin 1965, Hybertsen & Louie 1986). (Slater transition state theory (Slater 1972) can be performed for particular transitions; however, it does not allow for rapid evaluation of the optical absorption spectrum.)
Implementation
The algorithm for the computation of optical absorption spectra within DMol is:
Loop over all occupied orbitals |i>
    Loop over all unoccupied orbitals |a>
        Compute energy difference dE = Ea - Ei
        Compute matrix elements <i|x,y,z|a>
    End Loop
End Loop

where the occupied Kohn-Sham orbitals are denoted |i> and the unoccupied (virtual) orbitals |a>.
This double loop is executed for both sets of orbitals for a calculation with unrestricted spin.
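A sketch of this double loop in NumPy, assuming the Kohn-Sham eigenvalues and the dipole matrix elements in the MO basis are already available (both are invented here):

```python
import numpy as np

def absorption_spectrum(eigenvalues, dipole_ints, n_occ):
    """For every occupied orbital i and unoccupied orbital a, record the excitation
    energy Ea - Ei and the transition moments <i|x,y,z|a>.

    eigenvalues : (n_orb,) Kohn-Sham eigenvalues
    dipole_ints : (3, n_orb, n_orb) matrix elements of x, y, z in the MO basis
    n_occ       : number of occupied orbitals
    """
    lines = []
    for i in range(n_occ):                          # loop over occupied orbitals |i>
        for a in range(n_occ, len(eigenvalues)):    # loop over unoccupied orbitals |a>
            de = eigenvalues[a] - eigenvalues[i]    # excitation energy
            moment = dipole_ints[:, i, a]           # transition moment <i|r|a>
            lines.append((de, np.dot(moment, moment)))  # energy, |moment|^2 as intensity proxy
    return lines

# Hypothetical eigenvalues (hartree) and random dipole integrals: 4 orbitals, 2 occupied.
rng = np.random.default_rng(1)
eps = np.array([-0.5, -0.3, 0.1, 0.4])
d = rng.normal(size=(3, 4, 4))
for energy, strength in absorption_spectrum(eps, d, n_occ=2):
    print(f"dE = {energy:.2f}  |<i|r|a>|^2 = {strength:.3f}")
```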
Transition moments and symmetry
The computation of transition moments can be made more efficient if the molecular system under consideration possesses symmetry. If it does, selection rules can be applied to avoid computation of matrix elements that are known to be zero on the basis of symmetry arguments alone. The transitions that are not allowed due to symmetry are called symmetry-forbidden. It is possible, of course, that a matrix element is symmetry-allowed but turns out to be zero nevertheless. This happens, for example, when the calculation utilizes only a subgroup of the actual point group of the molecular system.
For the matrix element ⟨i| x,y,z |a⟩ to be different from zero, the product i × (x,y,z) × a has to be invariant under any symmetry operation that transforms the nuclear framework of the molecule into itself. Since it is known how the Kohn-Sham orbitals i, a and the general three-dimensional position vector r = (x,y,z) transform when applying these operations, it is possible to tell whether the product is invariant. If the product is not invariant, the matrix element must be zero.
For example, if orbitals are labeled + or - according to their behavior under a symmetry operation for which x and y are symmetric (+) and z is antisymmetric (-), then matrix elements of the type ⟨+| x,y |+⟩ and ⟨+| z |-⟩ may be nonzero, while all combinations ⟨+| z |+⟩ and ⟨+| x,y |-⟩ are zero and need not be computed explicitly.
Partial density of states (PDOS), also called local density of states, can be used to study the contribution of a particular orbital or group of orbitals to the molecular orbital spectrum. These methods are based on different population analysis schemes such as the Mulliken and Loewdin methods.
Eq. 100
This does give a reasonable indication of the contribution of that AO to the MO, but a major disadvantage is that the values are not normalized: summing the partial DOS over all orbitals in the system does not give the total number of electrons, because of the non-orthogonality of the basis functions on different atoms.
Eq. 101
where Dij is the AO density matrix. This definition allows the Mulliken density of states to become negative. The Loewdin analysis does not have this drawback, since by its definition it always gives positive values:
Eq. 102
where Sif is the overlap between basis functions i and f, and S1/2 is the square root of the overlap matrix S. However, we should stress that these different population analysis methods are all somewhat arbitrary, owing to the partitioning of S, and care should be taken not to attribute too much physical significance to them.
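The sketch below shows the two weighting schemes for a toy two-orbital case: the Mulliken weight of an AO in an MO (which can be negative) and the Loewdin weight (always positive). The overlap and coefficient matrices are invented, and SciPy's matrix square root is used for S1/2.

```python
import numpy as np
from scipy.linalg import sqrtm

def mulliken_pdos_weights(C, S):
    """Mulliken weight of AO mu in MO i: C_mu,i * (S C)_mu,i  (can be negative)."""
    return C * (S @ C)

def loewdin_pdos_weights(C, S):
    """Loewdin weight of AO mu in MO i: |(S^{1/2} C)_mu,i|^2  (always positive)."""
    return np.abs(sqrtm(S) @ C) ** 2

# Hypothetical 2-AO, 2-MO example with overlapping basis functions.
S = np.array([[1.0, 0.4], [0.4, 1.0]])
C = np.array([[0.60, 0.85], [0.60, -0.85]])      # columns = MO coefficients
print(mulliken_pdos_weights(C, S).sum(axis=0))   # each column sums to the norm <i|i> of that MO
print(loewdin_pdos_weights(C, S).sum(axis=0))    # same column sums, but all weights are positive
```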
The concept of bond order and valence indices is well established in chemistry. It allows for interpretation and deeper understanding of the results of DMol calculations using ideas familiar to chemists.
Mulliken and Mayer bond orders

If φi is a molecular orbital and the Ciµ are the SCF expansion coefficients, then:

Pµν = Σi ni Ciµ Ciν   Eq. 103

The matrix Pµν and a set of atomic orbitals completely specify the charge density ρ (Eq. 42).
The trace of the product of the matrix P and the overlap matrix S is equal to the total number of electrons in the molecule:

N = Tr(PS)   Eq. 104

Summing (PS)µν contributions over all µ ∈ A, ν ∈ B, where A and B are centers, we can obtain PAB, which can be interpreted as the number of electrons associated with the bond A-B. This is the so-called Mulliken population analysis (Mulliken 1955). The net charge associated with the atom is then given by:
Eq. 105
where ZA is the charge on the atomic nucleus A.
The Mayer bond orders and valence indices have several useful properties:
All thermodynamic quantities can be derived from Q. For actual computations, several approximations have to be made:
The separation of electronic contributions is valid whenever the Born-Oppenheimer approximation is justified for calculating potential energy surfaces.
Q = Qtrans × Qrot × Qvib   Eq. 110
where Qtrans = translational contribution, Qrot = rotational contribution, and Qvib = vibrational contribution.
The translational partition function
The translational partition function is:
Qtrans = (2π m k T / h²)^(3/2) V   Eq. 111

where V = volume (for an ideal gas, related to pressure according to V = RT/p), m = molecular mass, k = Boltzmann's constant, T = absolute temperature, and h = Planck's constant.
The vibrational partition function is, within the harmonic oscillator approximation:
Eq. 112
which is a product over all vibrations i, with degeneracy di (di = 1 for nondegenerate vibrations) and frequency νi (in wavenumber units); c is the velocity of light, which couples the wavelength λ and the frequency ν as c = λν.

The characteristic vibrational temperature Θvib associated with a particular vibration j is defined as:

Θvib = h c νj / k   Eq. 113
where k is the Boltzmann constant.
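A small sketch of the harmonic-oscillator vibrational partition function and the characteristic vibrational temperature, using one common convention (energies measured from the zero-point level); the DMol convention for Eq. 112 may differ, and the frequencies below are invented.

```python
import numpy as np

H = 6.62607015e-34       # Planck constant, J s
C = 2.99792458e10        # speed of light, cm/s
K = 1.380649e-23         # Boltzmann constant, J/K

def q_vib(freqs_cm, degeneracies, T):
    """Harmonic-oscillator vibrational partition function, with energies measured
    from the zero-point level (one common convention)."""
    x = H * C * np.asarray(freqs_cm) / (K * T)           # h c nu / (k T) per mode
    return np.prod((1.0 / (1.0 - np.exp(-x))) ** np.asarray(degeneracies))

def theta_vib(freq_cm):
    """Characteristic vibrational temperature h c nu / k for one mode (Eq. 113)."""
    return H * C * freq_cm / K

# Hypothetical triatomic with three nondegenerate modes (wavenumbers in cm^-1).
freqs = [1600.0, 3650.0, 3750.0]
print(q_vib(freqs, [1, 1, 1], T=298.15), [round(theta_vib(f)) for f in freqs])
```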
The computation of the rotational partition function under the assumption of a rigid nuclear framework (rigid rotor, ignoring couplings between rotational and vibrational motion) requires determination of the moments of inertia. The 3 × 3 moment-of-inertia tensor M is defined as:
Eq. 114
where mk = mass of atom k; rk = position vector of atom k relative to the center of mass; and a,b = Cartesian components x,y,z.
Eq. 115
Depending on the equalities among the three moments of inertia IA, IB, and IC with respect to the principal axes, the molecule is classified as either:
For nonlinear molecules, we get:
This general expression covers the types of rotors described above.
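The sketch below builds the moment-of-inertia tensor of Eq. 114 about the center of mass and classifies the rotor type from its principal moments; the masses and coordinates are invented.

```python
import numpy as np

def inertia_tensor(masses, coords):
    """3 x 3 moment-of-inertia tensor about the center of mass (Eq. 114),
    M_ab = sum_k m_k (|r_k|^2 delta_ab - r_k,a r_k,b)."""
    masses = np.asarray(masses, dtype=float)
    r = np.asarray(coords, dtype=float)
    r = r - np.average(r, axis=0, weights=masses)        # shift to the center of mass
    M = np.zeros((3, 3))
    for m, rk in zip(masses, r):
        M += m * (np.dot(rk, rk) * np.eye(3) - np.outer(rk, rk))
    return M

def classify_rotor(M, tol=1e-6):
    """Classify the rotor type from the principal moments IA <= IB <= IC."""
    I = np.sort(np.linalg.eigvalsh(M))
    if I[0] < tol:
        return "linear"
    if abs(I[0] - I[2]) < tol:
        return "spherical top"
    if abs(I[0] - I[1]) < tol or abs(I[1] - I[2]) < tol:
        return "symmetric top"
    return "asymmetric top"

# Hypothetical linear CO2-like arrangement (masses in amu, coordinates in Angstrom).
masses = [16.0, 12.0, 16.0]
coords = [[0.0, 0.0, -1.16], [0.0, 0.0, 0.0], [0.0, 0.0, 1.16]]
print(classify_rotor(inertia_tensor(masses, coords)))
```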
The characteristic rotational temperature
The coupling between nuclear spin and molecular rotation follows the same principle as the coupling between the electron spin and the electronic wavefunction. It is a quantum effect, but can be explained by a classical argument, resulting in the assignment of the so-called symmetry number.
Eq. 119
where SYM, the additional factor, is the symmetry number.
Given that electronically excited states are neglected, the ground state does contribute to the partition function if it is degenerate and if a special convention for the zero of the total energy is adopted.
Thermodynamic quantities
Once the partition function Q is obtained, thermodynamic quantities can be derived. We have to emphasize again that these apply only to molecular gases (where intermolecular interactions are negligible) at temperatures that are neither too low (so that quantum effects are negligible) nor too high (so that the rigid rotor/harmonic oscillator approximation is applicable).
Eq. 120
Eq. 121
Eq. 122
Eq. 123
Eq. 124
Nonlinear polyatomic molecules
Eq. 125
Eq. 126
Eq. 127
Eq. 128
Eq. 129
where e = Euler's number (ln e = 1), N = Avogadro's number, k = Boltzmann's constant, and ΘP = characteristic rotational temperature (P = A, B, C).
Implementation
Thermodynamic quantities are calculated either as part of the quantum background job without additional user input (Turbomole, DMol) or as part of the analysis capabilities of the Insight Turbomole or DMol module, using the Analyze/Stat_Mech command.