Kabsch
Intro
The Kabsch algorithm uses linear and vector algebra to find the optimal rotation and translation of two sets of points in N-dimensional space as to minimize the RMSD between them. The following program is a Python implementation of the Kabsch algorithm.
This program when called will align the two selections, optimally, convert the proteins in the selection to ribbons and change the color of the selections to show the matched alignments.
WHAT THIS DOESN'T DO: This program does NOT provide a pairwise alignment of two structures from scratch. You have to tell it what the equivlanced items are. This is NOT DALI. (Yet.)
NOTE: This will undergo more edits as I continue this project.
NOTE: This has NOT been tested on any other machine than mine, by me. It works on all PyMols 0.97 and newer (haven't tested earlier versions) and I use Python 2.3's Numeric Version 23.3.
To use
- Save this script to Kabsch.py
- Open PyMol
- Load the alignment script: run Kabsch.py (The command optAlign is now defined in PyMol.)
- Load your proteins
- Align the proper segments (see below examples)
To align two equivalent sets of residues do:
optAglign SEL1 and n. CA and i. a-b, SEL2 and n. CA and c-d
where
- SEL1 is the first protein
- a-b is the range of residues to align in the first protein
- SEL2 is the second protein
- c-d is the range of residues to align in the second protein
Caveats
- Ensure that you're equivalencing N atoms to N atoms (run Count_Atoms over your two selections to ensure they are the same length)
- Sometimes PyMol doesn't seem to superimpose them right the first time. Hit the up-arrow and rerun the program if this happens. It always superimposes the correctly the second time. I think it has something to do with the orientation.
- The RMSD is only between the equivlanced atoms. Use PyMol's Rms_Cur if you want a full RMSD.
Examples
optAlign 1cll and n. CA and i. 4-20+30-60, 1ggz and n. CA and i. 4-20+30-60
optAlign 1kao and n. CA and i. 20-50, 1ctq and n. CA and i. 20-50
You already need a sequence alignment to effectively run this program.
The Code
#!python
##############################################################################
#
# @SUMMARY: -- Kabsch.py. A python implementation of the optimal superposition
# of two sets of vectors as proposed by Kabsch 1976 & 1978.
#
# @AUTHOR: Jason Vertrees
# @COPYRIGHT: Jason Vertrees (c), 2005.
# DATE : 2005-04-07
# REV : 1
#
#############################################################################
# math
import math
import Numeric
import LinearAlgebra
import Matrix
from array import *
import random
random.seed()
# system stuff
import os
import copy
# pretty printing
import pprint
import string
# for importing as a plugin into PyMol
#import tkSimpleDialog
#import tkMessageBox
from pymol import cmd
from pymol import stored
from pymol import selector
class kabsch:
"""
Kabsch alignment of two set of vectors to produce and optimal alignemnt.
Steps
=====
1. Calculate the center of mass for each protein. Then, move the protein
to its center of mass. We choose as a convention, to use the origin as
the center of mass of the first set of coordinates. This will allow us to
return one translation vector, instead of two.
Update: Since any rotation around a point that's not the origin, is in fact
an affine rotation. So, to beat that, we translate both to the center.
NAME: superpose(c1, c2)
POST: T is now defined.
2. Calculate the matrix, R, by (eq7, 1976). r_{i,j} = sum_n w_n * y_{ni} * x_{nj},
where y_{ni} is the ith component of the vector y_n.
NAME: calcR
POST: R is now defined
3. Calculate RtR (R-transpose * R).
NAME: calcRtR
POST: RtR is now defined
4. Calculate the corresponding eigenpairs for RtR, and sort them accordingly:
m1 >= m2 >= m3; set v3 = v1 x v2 to ensure a RHS
NAME: calcEigenPairs
POST: The eigen pairs are calculated, sorted such that m1 >= m2 >= m3 and
v3 = v1 x v2.
5. Calculate R*v_k and normalize the first two vectors to obtain b_1, b_2 and set
b_3 = b_1 x b_2. This also takes care of m1 > m2 = 0.
NAME: calcBVectors
POST: b-Vectors are defined and returned
6. Calculate U=(u_{ij})=(sum n b_{ki} * a_{kj}) to obtain the best rotation. Set
sigma_3 = -1 if b3(Ra3) < 0 else sigma_3 = +1.
NAME: calcU
POST: U is defined
7. Calculate the RMSD. The residual error is then
The E = E0 - sqrt(m1) - sqrt(m2) - sigma_3(sqrt(m3)).
NAME: calcRMSD
POST: RMSD is computed.
@note: This should be a static method that takes three parameters
1. The first protein's coordinates, f. This program will
accept coordinates in the form an array of 3D vectors/lists
2. The second protein's coordinates, g.
3. The array of integer pairs representing the pairs to align.
Coordinates should be formatted as as array of 2D vectors/lists.
"""
def __init__(self):
"""
Constructor. Doesn't do shit. @see kabsch.align.
"""
#
# Instance Variables: All three of these will be updated
# every time the user calls ~.align. Just to warn ya'.
#
# U, the rotation matrix
self.U = []
# T, the translation vector
self.T = []
# R, the RMSD
self.R = -1.0
#self.menuBar.addmenuitem('Plugin', 'command', 'Kabsch Align', label = "Align Selections to Optial RMSD", command = lamda s=self: fetchPDBDialog(s))
def align(self, c1, c2, pairs):
"""
Finds the best alignment of c1 and c2's pairs resulting in
the smallest possible RMSD.
@note:
- All weights in this first version are set to 1. Kabsch allows,
differential weighting. In the future, I may extend to this option,
and then pairs may become 3D (r1, r2, weight) or I may add another
parameter.
- Helper functions will soon be provided such that the user may
just use this package alone to compute the rotation.
@param c1: coordinats of the first vectors, as an array of 3D vectors.
@type c1: Python list
@param c2: coordinates of the second set of vectors, as an array of 3D vectors.
@type c2: Python list
@param pairs: the list of pairs as an array of 2D pairs.
@type pairs: Python list of 2D lists.
@return: U, the rotation matrix that gives the optimal rotation between the two proteins.
T, the translation vector given to align the two objects centers of mass.
R, the RMSD of the final rotation.
"""
#
# First we move the center of mass of one protein, to the
# center of mass of the other. This removes any translation
# between the two.
#
T1, T2, c1, c2 = self.superpose(c1, c2)
# Calculate the initial RMSD
E0 = self.calcE0(c1, c2)
# Calculate R via eq. 7.
R = self.calcR(c1, c2)
# Calculate R(transpose)*R
RtR = self.calcRtR(R)
# Determined the eigenpairs for the matrix RtR.
eValues, eVectors = self.calcEigenPairs(RtR)
# Determine the bVectors as required
bVectors = self.calcBVectors(R, eVectors)
# Calculate the roation matrix
U = self.calcU(eVectors, bVectors)
# Calculate the final RMSD using U.
RMSD = self.calcRMSD(E0, eValues, eVectors, bVectors, R)
return U, T1, T2, RMSD, c1, c2
def superpose(self, c1, c2 ):
"""
Calculate the center of mass for each protein. Then, move the protein
to its center of mass. We choose as a convention, to use the origin as
the center of mass of the first set of coordinates. This will allow us to
return one translation vector, instead of two.
(CORRECT)
@precondition: c1 and c2 are well defined lists of N-dimensional points with length > 0.
@postcondition: T is now defined.
@param c1: coordinats of the first vectors, as an array of 3D vectors.
@type c1: Python list
@param c2: coordinates of the second set of vectors, as an array of 3D vectors.
@type c2: Python list
@return: T the translation vector.
c2 one list of coordinates that of c2 translated to the COM of c1.
"""
# make sure we don't get bad data
assert(len(c1) == len(c2) != 0)
L = len(c1)
#
# Centers of Mass
#
c1COM = Numeric.zeros((3,1), Numeric.Float64)
c2COM = Numeric.zeros((3,1), Numeric.Float64)
# calculate the CsOM
for i in range(0, L):
for j in range(0,3):
c1COM[j] = c1COM[j] + c1[i][j]
c2COM[j] = c2COM[j] + c2[i][j]
T1 = - c1COM / L
T2 = - c2COM / L
# move everything back to the origin.
for i in range(0, L):
for j in range(0,3):
c1[i][j] = c1[i][j] + T1[j]
c2[i][j] = c2[i][j] + T2[j]
return T1, T2, c1, c2
def calcR( self, c1, c2 ):
"""
Calculate the matrix, R, by (eq7, 1976). M{r_{i,j} = sum_n w_n * y_{ni} * x_{nj}},
where M{y_{ni}} is the ith component of the vector M{y_n}.
(CORRECT)
@param c1: coordinats of the first vectors, as an array of 3D vectors.
@type c1: Python list
@param c2: coordinates of the second set of vectors, as an array of 3D vectors.
@type c2: Python list
@postcondition: R is now defined.
@return: R
@rtype : 3x3 matrix
"""
# Create the 3x3 matrix
R = Numeric.zeros((3,3), Numeric.Float64)
L = len(c1)
for k in range(0, L):
for i in range(0, 3):
for j in range(0, 3):
R[i][j] = R[i][j] + (c2[k][i] * c1[k][j])
# return R the 3x3 PSD Matrix.
return R
def calcRtR( self, R ):
"""
Calculate RtR (R-transpose * R).
(CORRECT)
@param R: Matrix
@type R: 3x3 Matrix
@precondition: R is a the well formed matrix as per Kabsch.
@postcondition: RtR is now defined
@return: M{R^tR}
@rtype : 3x3 matrix
"""
RtR = Numeric.matrixmultiply(Numeric.transpose(R), R)
return RtR
def calcEigenPairs( self, RtR ):
"""
Calculate the corresponding eigenpairs for RtR, and sort them accordingly:
M{m1 >= m2 >= m3}; set M{v3 = v1 x v2} to ensure a RHS
(CORRECT?)
@postcondition: The eigen pairs are calculated, sorted such that M{m1 >= m2 >= m3} and
M{v3 = v1 x v2}.
@param RtR: 3x3 Matrix of M{R^t * R}.
@type RtR: 3x3 Matrix
@return : Eigenpairs for the RtR matrix.
@rtype : List of stuff
"""
eVal, eVec = LinearAlgebra.eigenvectors(RtR)
# This is cool. We sort it using Numeric.sort(eVal)
# then we reverse it using nifty-crazy ass notation [::-1].
eVal2 = Numeric.sort(eVal)[::-1]
eVec2 = [[],[],[]] #Numeric.zeros((3,3), Numeric.Float64)
# Map the vectors to their appropriate owners
if ( eVal2[0] == eVal[0]):
eVec2[0] = eVec[0]
if ( eVal2[1] == eVal[1] ):
eVec2[1] = eVec[1]
eVec2[2] = eVec[2]
else:
eVec2[1] = eVec[2]
eVec2[2] = eVec[1]
elif( eVal2[0] == eVal[1]):
eVec2[0] = eVec[1]
if ( eVal2[1] == eVal[0] ):
eVec2[1] = eVec[0]
eVec2[2] = eVec[2]
else:
eVec2[1] = eVec[2]
eVec2[2] = eVec[0]
elif( eVal2[0] == eVal[2]):
eVec2[0] = eVec[2]
if ( eVal2[1] == eVal[1] ):
eVec2[1] = eVec[1]
eVec2[2] = eVec[0]
else:
eVec2[1] = eVec[0]
eVec2[2] = eVec[1]
# print "v3 = v1 x v2"
#print eVec2[2]
# eVec2[2] = a1 CROSS a2
eVec2[2][0] = eVec2[0][1]*eVec2[1][2] - eVec2[0][2]*eVec2[1][1]
eVec2[2][1] = eVec2[0][2]*eVec2[1][0] - eVec2[0][0]*eVec2[1][2]
eVec2[2][2] = eVec2[0][0]*eVec2[1][1] - eVec2[0][1]*eVec2[1][0]
# print eVec2[2]
return [eVal2, eVec2]
def calcBVectors( self, R, eVectors ):
"""
Calculate M{R*a_k} and normalize the first two vectors to obtain M{b_1, b_2} and set
M{b_3 = b_1 x b_2}. This also takes care of {m2 > m3 = 0}.
(CORRECT)
@postcondition: b-Vectors are defined and returned
@return: The three B-vectors
@rtype: List of 3D vectors (Python LOL).
"""
bVectors = Numeric.zeros((3,3), Numeric.Float64)
for i in range(0,3):
bVectors[i] = Numeric.matrixmultiply(R, eVectors[i])
# what if the bVector[0] == 0?
bVectors[0] = bVectors[0] / Numeric.sqrt(Numeric.add.reduce(bVectors[0]**2))
bVectors[1] = bVectors[1] / Numeric.sqrt(Numeric.add.reduce(bVectors[1]**2))
bVectors[2] = bVectors[2] / Numeric.sqrt(Numeric.add.reduce(bVectors[2]**2))
# b2 = b0 cross b1
# print "b3 = b1 x b2"
# print bVectors[2]
bVectors[2][0] = bVectors[0][1]*bVectors[1][2] - bVectors[0][2]*bVectors[1][1]
bVectors[2][1] = bVectors[0][2]*bVectors[1][0] - bVectors[0][0]*bVectors[1][2]
bVectors[2][2] = bVectors[0][0]*bVectors[1][1] - bVectors[0][1]*bVectors[1][0]
#bVectors[2] = bVectors[0] * bVectors[1]
# print bVectors[2]
return bVectors
def calcU( self, eVectors, bVectors ):
"""
Calculate M{U=(u_{ij})=(sum n b_{ki} * a_{kj})} to obtain the best rotation. Set
M{sigma_3 = -1 if b3(Ra3) < 0 else sigma_3 = +1}.
(CORRECT)
@postcondition: U is defined
@param eVectors: Eigenvectors for the system.
@type eVectors: Eigenvectors
@param bVectors: BVectors as described by Kabsch.
@type bVectors: BVectors
@return: U the rotation matrix.
@rtype :3x3 matrix.
"""
U = Numeric.zeros((3,3), Numeric.Float64)
for k in range(0,3):
for i in range(0,3):
for j in range(0,3):
U[i][j] = U[i][j] + Numeric.matrixmultiply(bVectors[k][i], eVectors[k][j])
return U
def calcE0( self, c1, c2 ):
"""
Calculates the initial RMSD, which Kacbsch called E0.
(CORRECT)
@param c1: coordinats of the first vectors, as an array of 3D vectors.
@type c1: Python list
@param c2: coordinates of the second set of vectors, as an array of 3D vectors.
@type c2: Python list
@return: E0 the initial RMSD.
@rtype : float.
"""
E0 = 0.0
L = len(c1)
for i in range( 0, L ):
for j in range(0, 3):
E0 = E0 + 0.5*( (c1[i][j]*c1[i][j])+(c2[i][j]*c2[i][j]))
return E0
def calcRMSD( self, E0, eValues, eVectors, bVectors, R):
"""
Calculate the RMSD. The residual error is then
The M{E = E0 - sqrt(m1) - sqrt(m2) - sigma_3(sqrt(m3))}.
@param E0: Initial RMSD as calculated in L{calcE0}.
@type E0: float.
@param eVectors: Eigenvectors as calculated from L{calcEigenPairs}
@type eVectors: vectors, dammit!
@param bVectors: B-vectors as calc. from L{calcBVectors}
@type bVectors: More vectors.
@param R: The matrix R, from L{calcR}.
@type R: 3x3 matrix.
@postcondition: RMSD is computed.
@return: The RMSD.
"""
simga3 = 0
if ( Numeric.matrixmultiply(bVectors[2], Numeric.matrixmultiply( R, eVectors[2])) < 0):
sigma3 = -1
else:
sigma3 = 1
E = E0 - math.sqrt(eValues[0]) - math.sqrt(eValues[1]) - simga3*(math.sqrt(eValues[2]))
return E
def calcSimpleRMSD( self, c1, c2 ):
"""
Calculates the usual concept of RMSD between two set of points. The CalcRMSD above
sticks to Kabsch's alignment method protocol and calculates something much different.
@see kabsch.calcRMSD
@param c1: List of points #1
@type c1: LOL
@param c2: List of points #2
@type c2: LOL
@return: RMSD between the two
"""
RMSD = 0.0
for i in range(0, len(c1)):
for j in range(0,3):
RMSD = RMSD + (c2[i][j]-c1[i][j])**2
RMSD = RMSD / len(c1)
RMSD = Numeric.sqrt(RMSD)
return RMSD
#
#
# Thus, a typical user session with Kabsch will be:
# 1. Read in the PDBs
# c1 = parsePDB(fileName)
# c2 = parsePDB(fileName2)
# pairs = getPairs(c1,c2)
# kasbch = new kabsch()
# U, T, R = kabsch.align( c1, c2, pairs )
#
# The user will be handed back a rotation matrix U,
# and translation vector T.
#
#
#####################################################################
#
# UTIL Functions
def rotatePoints(self, U, c2):
"""
Rotate all points in c2 based on the rotation matrix U.
@param U: 3x3 Rotation matrix
@type U: 3x3 matrix
@param c2: List of points to rotate
@type c2: List of 3D vectors
@return: List of rotated points
"""
L = len(c2)
for n in range(0,L):
c2[n][0] = c2[n][0] * U[0][0] + c2[n][1] * U[1][0] + c2[n][2] * U[2][0]
c2[n][1] = c2[n][0] * U[0][1] + c2[n][1] * U[1][1] + c2[n][2] * U[2][1]
c2[n][2] = c2[n][0] * U[0][2] + c2[n][1] * U[1][2] + c2[n][2] * U[2][2]
return c2
def writeU( self, U, fileName ):
"""
Convenience function. Writes U to disk.
"""
if ( len(fileName) == 0 ):
fileName = "./U"
outFile = open( fileName, "wb")
for i in range(0,3):
for j in range(0,3):
outFile.write( string.ljust(str(U[i][j]),20) )
outFile.write("\n")
outFile.close()
def optAlign( sel1, sel2 ):
"""
optAlign accurately aligns two selections to the lowest possible RMSD.
@note: The selections, sel1 and sel2 must be the same length (for now).
An example selection is: optAlign 1GGZ and i. 4-146 and n. CA, 1CLL and i. 4-146 and n. CA
Coming soon: Monte Carlo simulation to choose the best pairs to align across the molecules.
So, I'll soon be able to say, "optAlign 1GGZ, 1CLL". :)
@param sel1: First PyMol selection with N-atoms
@param sel2: Second PyMol selection with N-atoms
"""
cmd.reset()
# make the lists for holding coordinates
# partial lists
stored.sel1 = []
stored.sel2 = []
# full lists
stored.mol1 = []
stored.mol2 = []
# now put the coordinates into a list
# partials
cmd.iterate_state(1, selector.process(sel1), "stored.sel1.append([x,y,z])")
cmd.iterate_state(1, selector.process(sel2), "stored.sel2.append([x,y,z])")
# full molecule
mol1 = cmd.identify(sel1,1)[0][0]
mol2 = cmd.identify(sel2,1)[0][0]
cmd.iterate_state(1, mol1, "stored.mol1.append([x,y,z])")
cmd.iterate_state(1, mol2, "stored.mol2.append([x,y,z])")
K = kabsch()
U, T1, T2, RMSD, c1, c2 = K.align(stored.sel1, stored.sel2, [])
stored.mol2 = map(lambda \
v:[T2[0]+((v[0]*U[0][0])+(v[1]*U[1][0])+(v[2]*U[2][0])), \
T2[1]+((v[0]*U[0][1])+(v[1]*U[1][1])+(v[2]*U[2][1])), \
T2[2]+((v[0]*U[0][2])+(v[1]*U[1][2])+(v[2]*U[2][2]))], \
stored.mol2)
stored.mol1 = map(lambda v:[ v[0]+T1[0], v[1]+T1[1], v[2]+T1[2] ], stored.mol1)
cmd.alter_state(1,mol1,"(x,y,z)=stored.mol1.pop(0)")
cmd.alter_state(1,mol2,"(x,y,z)=stored.mol2.pop(0)")
print "RMSD=%f" % RMSD
cmd.hide('everything')
cmd.show('ribbon', sel1 + ' or ' + sel2)
cmd.color('magenta', mol1 )
cmd.color('yellow', mol2 )
cmd.color('red', 'visible')
cmd.show('ribbon', 'not visible')
cmd.center('visible')
cmd.orient()
cmd.zoom('visible')
cmd.extend("optAlign", optAlign)
References
[Kabsch, 1976] Kabsch, W. (1976). A solution for the best rotation to relate two sets of vectors. Acta. Crystal, 32A:922-923.
[Kabsch, 1978] Kabsch, W. (1978). A discussion of the solution for the best rotation to related two sets of vectors. Acta. Crystal, 34A:827-828.