Script Tutorial

From PyMOLWiki

Introduction

One of the more powerful features of PyMOL is that it supports Python scripting. That gives you the power of all the Python libraries and, especially, of the Python API, which lets you write programs in other languages and then send the results back into PyMOL (this is what cealign does).

The PyMOLWiki has a rather extensive script_library full of useful scripts (feel free to add your own).

Here I intend to provide the necessary details to get you coding PyMOL scripts as quickly as possible. I provide the basic information as well as code and links to the Python API.

General Scripts

Scripting PyMOL in Python follows a simple recipe.

To write them:

  1. Write the function, let's call it doSimpleThing, in a Python file, let's call the file pyProgram.py.
  2. Add the following command to the end of the pyProgram.py file. The first argument is the command name as a string, the second is the function itself:
    cmd.extend("doSimpleThing", doSimpleThing)
    

To use them:

  1. Simply load the script into PyMOL:
    run /home/userName/path/toscript/pyProgram.py
    
  2. Then, just type the name of the command: doSimpleThing and pass any needed arguments.

That's it. Your script can, through Python, import any modules you need and also modify objects in PyMOL.
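The recipe above can be sketched as a complete file. This is only a minimal illustration: the body of doSimpleThing, its arguments and its return value are made up for the example, and the try/except around the import is an assumption added so the file can also be imported outside PyMOL for testing.

```python
# pyProgram.py -- a hypothetical minimal script following the recipe above.
try:
    from pymol import cmd      # available when the file is run inside PyMOL
except ImportError:
    cmd = None                 # assumption: allow importing outside PyMOL


def doSimpleThing(selection="all", factor=2.0):
    """Hypothetical command: report the selection and a scaled number."""
    # Arguments typed at the PyMOL prompt arrive as strings, so convert them.
    factor = float(factor)
    message = "selection=%s scaled=%.1f" % (selection, factor * 10.0)
    print(message)
    return message


# Register the function as a PyMOL command (only possible inside PyMOL).
if cmd is not None:
    cmd.extend("doSimpleThing", doSimpleThing)
```

After `run pyProgram.py`, typing `doSimpleThing chain A, 3` at the PyMOL prompt would call the function with the two strings "chain A" and "3".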

Getting PyMOL Data into your Script

To get PyMOL data into your script you will need to somehow get access to the PyMOL objects and pull out the data. For example, if you want the atomic coordinates of a selection of alpha carbon atoms your Python function may do something like this (all PyMOL functions are referenced in the See Also section, below):

# Import PyMOL's cmd, selector and stored modules.  The stored module
# gives us a way to pull out the PyMOL data and modify it in our script.
# See below.
from pymol import cmd, selector, stored

def functionName( userSelection ):
    # this array will be used to hold the coordinates.  It
    # has access to PyMOL objects and we have access to it.
    stored.alphaCarbons = []

    # let's just get the alpha carbons, so make the
    # selection just for them
    userSelection = userSelection + " and n. CA"

    # iterate over state 1 of the userSelection -- this just means
    # for each item in the selection do what the next parameter says.
    # And, that is to append the (x,y,z) coordinates to the stored.alphaCarbons
    # array.
    cmd.iterate_state(1, selector.process(userSelection), "stored.alphaCarbons.append([x,y,z])")

    # stored.alphaCarbons now has the data you want.

    ... do something to your coordinates ...

Getting Data From your Script into PyMOL

Usually this step is easier. You typically get your data into PyMOL by modifying some object, for example by rotating a molecule. To do that, you can use the alter or alter_state commands. Let's say, for example, that we have translated the molecular coordinates from the last example by some vector (we moved the alpha carbons). Now we want to make the change and see it in PyMOL. To write the coordinates back we do:

# We need to know which PyMOL object to modify.  There could be many molecules and objects
# in the session, and we don't want to ruin them.  The following line gets the object
# name from PyMOL (userSelection is the alpha carbon selection from the last example).
objName = cmd.identify(userSelection,1)[0][0]

# Now we alter each (x,y,z) for the atoms we read, popping the values out of
# stored.alphaCarbons in order.  The selection must cover exactly the atoms that filled
# the list, otherwise pop(0) runs out of values.  PyMOL should now reflect the changed coordinates.
cmd.alter_state(1,objName + " and (" + userSelection + ")","(x,y,z)=stored.alphaCarbons.pop(0)")
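The translation step assumed in this section happens between reading the coordinates and writing them back, and needs nothing from PyMOL. A minimal sketch in plain Python follows; the translate helper and the sample coordinates are hypothetical stand-ins for whatever you do to stored.alphaCarbons.

```python
def translate(coords, vector):
    """Return a new list of [x, y, z] points shifted by vector."""
    dx, dy, dz = vector
    return [[x + dx, y + dy, z + dz] for x, y, z in coords]


# Stand-in for stored.alphaCarbons after cmd.iterate_state has filled it.
alphaCarbons = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]

# Move every point 10 units along x and -1 unit along z.
moved = translate(alphaCarbons, (10.0, 0.0, -1.0))
print(moved)   # [[11.0, 2.0, 2.0], [14.0, 5.0, 5.0]]
```

In a real script you would assign the result back to stored.alphaCarbons before calling alter_state, keeping the order of the points unchanged so that each atom receives its own new coordinates.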

Example

Here's a script I wrote for cealign. It takes two selections of equal length and computes the optimal overlap, and aligns them. See Kabsch for the original code. Because this tutorial is for scripting and not optimal superposition, the original comments have been removed.

import numpy
from pymol import cmd, selector, stored

def optAlign( sel1, sel2 ):
        """
        @param sel1: First PyMol selection with N-atoms
        @param sel2: Second PyMol selection with N-atoms
        """

        # make the lists for holding coordinates
        # partial lists
        stored.sel1 = []
        stored.sel2 = []
        # full lists
        stored.mol1 = []
        stored.mol2 = []

        # -- CUT HERE
        sel1 = sel1 + " and N. CA"
        sel2 = sel2 + " and N. CA"
        # -- CUT HERE

        # This gets the coordinates from the PyMOL objects
        cmd.iterate_state(1, selector.process(sel1), "stored.sel1.append([x,y,z])")
        cmd.iterate_state(1, selector.process(sel2), "stored.sel2.append([x,y,z])")

        # ...begin math that does stuff to the coordinates...
        mol1 = cmd.identify(sel1,1)[0][0]
        mol2 = cmd.identify(sel2,1)[0][0]
        cmd.iterate_state(1, mol1, "stored.mol1.append([x,y,z])")
        cmd.iterate_state(1, mol2, "stored.mol2.append([x,y,z])")
        assert( len(stored.sel1) == len(stored.sel2))
        L = len(stored.sel1)
        assert( L > 0 )
        COM1 = numpy.sum(stored.sel1,axis=0) / float(L)
        COM2 = numpy.sum(stored.sel2,axis=0) / float(L)
        stored.sel1 = stored.sel1 - COM1
        stored.sel2 = stored.sel2 - COM2
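        E0 = numpy.sum( numpy.sum(stored.sel1 * stored.sel1,axis=0),axis=0) + numpy.sum( numpy.sum(stored.sel2 * stored.sel2,axis=0),axis=0)
        # singular value decomposition of the correlation matrix; see Kabsch
        V, S, Wt = numpy.linalg.svd( numpy.dot( numpy.transpose(stored.sel2), stored.sel1))
        reflect = float(str(float(numpy.linalg.det(V) * numpy.linalg.det(Wt))))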
        if reflect == -1.0:
                S[-1] = -S[-1]
                V[:,-1] = -V[:,-1]
        RMSD = E0 - (2.0 * sum(S))
        RMSD = numpy.sqrt(abs(RMSD / L))
        U = numpy.dot(V, Wt)
        # ...end math that does stuff to the coordinates...

        # update the _array_ of coordinates; not yet the coords in the PyMOL object
        stored.sel2 = numpy.dot((stored.mol2 - COM2), U) + COM1
        stored.sel2 = stored.sel2.tolist()

        # This updates PyMOL.  It is removing the elements in 
        # stored.sel2 and putting them into the (x,y,z) coordinates
        # of mol2.
        cmd.alter_state(1,mol2,"(x,y,z)=stored.sel2.pop(0)")

        print("RMSD=%f" % RMSD)

        cmd.orient(sel1 + " and " + sel2)

# The extend command makes this runnable as a command, from PyMOL.
cmd.extend("optAlign", optAlign)
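Two of the quantities in optAlign, the center of mass and the RMSD, are easy to check by hand. The sketch below is not part of the cealign code. It is a plain-Python illustration with hypothetical helper names (centroid, rmsd) that computes the RMSD directly from two coordinate lists that are already superposed, rather than through the singular values. Applied to the alpha carbon coordinates after alignment, it should agree with the value optAlign prints, up to rounding.

```python
import math


def centroid(points):
    """Average position of a list of [x, y, z] points (the COM step)."""
    n = float(len(points))
    return [sum(p[i] for p in points) / n for i in range(3)]


def rmsd(a, b):
    """Root-mean-square deviation between two equal-length point lists."""
    assert len(a) == len(b) and len(a) > 0
    total = sum((p[i] - q[i]) ** 2 for p, q in zip(a, b) for i in range(3))
    return math.sqrt(total / len(a))


first = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
second = [[0.0, 0.0, 0.0], [2.0, 0.0, 2.0]]

print(centroid(first))                 # [1.0, 0.0, 0.0]
print(round(rmsd(first, second), 4))   # 1.4142
```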

Advanced Scripts

Let's consider a more complicated script. Python, while incredibly useful, is much slower at math than C/C++, FORTRAN, etc. For complicated problems it is faster to package your data, send it to C, do some math, and pass the results back to Python than to do everything in Python. The beauty of the Python API is that we can do just that.

Python is a great language, but sometimes we have libraries built in other languages, or Python's math is just too slow to be useful. (I tested a structure alignment problem, using equivalent code, in C and Python. The C code was about 10x faster.) So, we can export our PyMOL data to the other language, do the math/problem, and import the changes back into PyMOL. This is shown below using the Python API and C. (This example code comes from cealign.)

This is more advanced scripting, and requires some knowledge of the Python API.


Python, PyMOL and C

Here, I will show you how to write a C-module that plugs into Python and talks nicely with PyMOL. To follow this, you should have some programming experience in both Python and C. The example actually shows how to make a generic C-function and use it in Python.

First, let's assume that we want to call a function, let's call it funName. Let's assume funName will take a Python list of lists and return a list. I will also assume we have funName.h and funName.c for C code files. (This is a more complex example to show a real-world problem. If you were just sending an integer or float instead of packaged lists, the code is simpler.) So, to start, let's look at the Python code that will call the C-function:

#
# -- in someCode.py
#
# Call funName.  Pass it a tuple () of lists.  (sel1 and sel2 are lists.)
# Get the return value into rValFromC.
#
rValFromC = funName( (sel1, sel2) )

where sel1 and sel2 could be any list of atom coordinates, say, from PyMOL. (See above.)

Ok, this isn't hard. Now we need to see what the C code that receives this function call looks like. First, we need to let C know we're integrating with Python. So, in your header file, funName.h, we put:

// in funName.h
#include <Python.h>

Next, by default your C-function's name is funName_funName (and that needs to be set up; I'll show how later). So, let's define funName:

static PyObject*
funName_funName(PyObject* self, PyObject* args)
{
...more code...

This is the generic call. funName is taking two pointers to PyObjects. It also returns a PyObject. This is how you get the Python data into and out of C. It shows up in "args," and we then unpackage it into C. Then we tinker with the data, package it up using the Python API, and send it back to Python/PyMOL.

Let's unpack the data in args. Remember, args holds a Python tuple containing our two lists. So, to unpackage that we do the following inside of funName:

 1 static PyObject*
 2 funName_funName(PyObject* self, PyObject* args)
 3 {
 4       PyObject *listA, *listB;
 5
 6       if ( ! PyArg_ParseTuple(args, "(OO)", &listA, &listB) ) {
 7                printf("Could not unparse objects\n");
 8                return NULL;
 9        }
10
11        // let Python know we made two lists
12        Py_INCREF(listA);
13        Py_INCREF(listB);
14 ... more code ...

Line 4 creates the two C variables that we will unpackage the lists into. They are pointers to PyObjects. Line 6 is where the magic happens. We call PyArg_ParseTuple, passing it the args we got from Python. The (OO) is Python's code for "I'm expecting two Objects inside a tuple ()". Were it three objects, then (OOO). The first object will be put into listA and the second into listB (we pass their addresses, &listA and &listB). The exact argument building specifications are very useful. Next, we check for success. Unpacking could fail. If it does, complain and quit. Else, listA and listB now have data in them. To avoid memory problems we need to manually keep track of the PyObjects we're tooling around with. Python frees an object when its reference count drops to zero, and the references PyArg_ParseTuple hands us are borrowed: they belong to the caller. So, we let Python know we are holding on to each list with Py_INCREF(listA) and Py_INCREF(listB), and we must balance each of those with a Py_DECREF when we are done, or the lists will leak. This is reference counting.
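Reference counting can be watched from the Python side, which may make the Py_INCREF and Py_DECREF calls less mysterious. The sketch below is specific to CPython and uses sys.getrefcount. Binding a second name to an object plays the role of Py_INCREF, and deleting that name plays the role of Py_DECREF. The variable names are made up for the illustration.

```python
import sys

coords = [[1.0, 2.0, 3.0]]          # a fresh list object

# The absolute number is not important (it includes temporary references);
# only the change between measurements is.
before = sys.getrefcount(coords)

alias = coords                      # one more reference, like Py_INCREF
after_incref = sys.getrefcount(coords)

del alias                           # reference released, like Py_DECREF
after_decref = sys.getrefcount(coords)

print(after_incref - before, after_decref - before)   # 1 0
```

An unbalanced Py_INCREF in C corresponds to an alias that is never deleted: the count never returns to its starting value, so the object is never freed.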

Now, just for safety, let's check the lists to make sure they actually were passed something. A tricky user could have given us empty lists, looking to hose the program. So, we do:

     // handle empty selections (should probably do this in Python)
     const int lenA = PyList_Size(listA);
     if ( lenA < 1 ) {
             printf("CEALIGN ERROR: First selection didn't have any atoms.  Please check your selection.\n");
             // let Python remove the lists
             Py_DECREF(listA);
             Py_DECREF(listB);
             return NULL;
      }

We check the list size with PyList_Size and if it's 0, we quit. But, before quitting we release our references to the lists so Python can clean up after itself. We do that with Py_DECREF.

Now, we should have access to the data the user sent us, in listA and listB, and it should be there and be clean. But, not forgetting that listA and listB are lists of 3D coordinates, let's unpack them further into sets of coordinates. Because we know the length of the lists, we can do something like the following:

 1       // make space for the current coords
 2       pcePoint coords = (pcePoint) malloc(sizeof(cePoint)*length);
 3
 4       // loop through the arguments, pulling out the
 5       // XYZ coordinates.
 6       int i;
 7       for ( i = 0; i < length; i++ ) {
 8               PyObject* curCoord = PyList_GetItem(listA,i);
 9               Py_INCREF(curCoord);
10       
11               PyObject* curVal = PyList_GetItem(curCoord,0);
12               Py_INCREF(curVal);
13               coords[i].x = PyFloat_AsDouble(curVal);
14               Py_DECREF(curVal);
15
16               curVal = PyList_GetItem(curCoord,1);
17               Py_INCREF(curVal);
18               coords[i].y = PyFloat_AsDouble(curVal);
19               Py_DECREF(curVal);
20
21               curVal = PyList_GetItem(curCoord,2);
22               Py_INCREF(curVal);
23               coords[i].z = PyFloat_AsDouble(curVal);
24               Py_DECREF(curVal);
25
26               Py_DECREF(curCoord);
27        }
28
29 ... more code ...

Here, pcePoint is a pointer to a cePoint, which just holds three floating-point values (x, y and z). Line 2 gets some memory ready for the 3 x length list of coordinates. Then, for each item from 0 to length-1, we unpack the list into curCoord using PyList_GetItem. This then gets further unpacked into the x, y and z fields of coords[i].
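It can help to prototype the unpacking in Python before writing the C version. The sketch below is a plain-Python mirror of the empty-selection check and the loop above, not code from cealign. The function name unpack_coords is hypothetical, and float() stands in for PyFloat_AsDouble.

```python
def unpack_coords(list_a):
    """Mirror of the C loop: pull x, y, z out of each inner list."""
    # Same guard as the C code: refuse an empty selection.
    if len(list_a) < 1:
        raise ValueError("First selection didn't have any atoms.")

    coords = []
    for cur_coord in list_a:            # PyList_GetItem(listA, i)
        x = float(cur_coord[0])         # PyFloat_AsDouble on item 0
        y = float(cur_coord[1])         # ... item 1
        z = float(cur_coord[2])         # ... item 2
        coords.append((x, y, z))
    return coords


print(unpack_coords([[1, 2, 3], [4.5, 5.5, 6.5]]))
# [(1.0, 2.0, 3.0), (4.5, 5.5, 6.5)]
```

Running the same input through this function and through the C module is a cheap way to check that the C side reads the coordinates in the order you expect.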

... More later ...

Getting Your Data from C back into Python/PyMOL

Once you're done with your calculations and want to send your data back to PyMOL, you need to package it up into a Python object, using the Python API, and then return it. You should be aware of the expected return value and how you're packaging the results. If your user calls,

(results1,results2) = someCFunction(parameters1,parameters2)

then you need to package a list with two values. To build values for returning to PyMOL, use Py_BuildValue. Py_BuildValue takes a format string indicating the types, followed by the values themselves. Building values for return has been documented very well. Consider an example: if I want to package a list of two integers, the type specifier for Py_BuildValue is "[i,i]", so my call could be:

// Package the two ints into a Python list of two ints.
PyObject* thePair = Py_BuildValue( "[i,i]", int1, int2 );

// Py_BuildValue returns a new reference that we already own,
// so no extra Py_INCREF is needed here.

If you need to make a list of things to return, you iterate through a list and make a bunch of thePairs and add them to a Python list as follows:

// Make the python list; PyList_New returns a new reference that we own
PyObject* theList = PyList_New(0);

for ( int i = 0; i < someLim; i++ ) {
  PyObject* thePair = Py_BuildValue( "[i,i]", int1, int2 );
  // PyList_Append takes its own reference to thePair, so release ours
  PyList_Append(theList,thePair);
  Py_DECREF(thePair);
}

To add a list of lists, just make an outer list,

PyObject* outerList = PyList_New(0);

and iteratively add to it your inner lists:

PyObject* outerList = PyList_New(0);

for ( int i = 0; i < someLim; i++ ) {
  // make the inner list, called curList;
  PyObject* curList = PyList_New(0);

  // fill the inner list, using PyList_Append with some data, shown above
  ...

  // outerList now holds its own reference to curList, so release ours
  PyList_Append(outerList,curList);
  Py_DECREF(curList);
}
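Seen from the Python side, the objects built in this section are ordinary lists. The sketch below is a plain-Python picture of what the caller receives. The function names and the values stored in each pair are hypothetical, and some_c_function is only a stand-in for a compiled function, written in Python so the shape of the return value can be inspected.

```python
def build_pairs(some_lim):
    """What the C loop builds: a list of two-int lists."""
    the_list = []                      # PyList_New(0)
    for i in range(some_lim):
        the_pair = [i, i * i]          # Py_BuildValue("[i,i]", int1, int2)
        the_list.append(the_pair)      # PyList_Append(theList, thePair)
    return the_list


def some_c_function(n):
    """Stand-in for the C function: returns a list with two values."""
    return [build_pairs(n), n]


# The caller unpacks the two packaged values, as in the text above.
(results1, results2) = some_c_function(3)
print(results1, results2)   # [[0, 0], [1, 1], [2, 4]] 3
```

If the C side packaged only one value, or three, the unpacking on the last lines would raise a ValueError, which is why the text stresses knowing the expected return shape.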

Initialization

We need to discuss how our functions will be called from Python. First, we need to create a method table.

static PyMethodDef CEMethods[] = {
        {"ccealign", ccealign_ccealign, METH_VARARGS, "Align two proteins using the CE Algorithm."},
        {NULL, NULL, 0, NULL}
};

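Each module undergoes initialization. By default the module's init function is initNAME(). So, in our example above, it is initccealign(). During this initialization step, we need to call Py_InitModule. (This is the Python 2 C API; Python 3 names the function PyInit_NAME and builds the module with PyModule_Create.) For our above example, we'd have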

PyMODINIT_FUNC
initccealign(void)
{
    (void) Py_InitModule("ccealign", CEMethods);
}

Finally, the main function that starts the whole shebang should look something like:

int
main(int argc, char* argv[])
{
        Py_SetProgramName(argv[0]);
        Py_Initialize();
        initccealign();
        return(EXIT_SUCCESS);
}

This will be cleaned up later.

Installing Your Module

You can use Python's setuptools. For example, I have cealign set up to install as simply as:

python setup.py build cealign
python setup.py install cealign

and I'm set.

Notes

I'll finish this soon. Tree

  • Add the calling functions in C
  • discuss installation

Example

See the source code for cealign.


See Also

stored, iterate_state, identify.