Script Tutorial: Difference between revisions

From PyMOLWiki
(remove duplicated content)
 
= Introduction =
This page has been split into [[Simple Scripting]] and [[Advanced Scripting]].
One of the more powerful features of PyMOL is that it supports Python scripting.  That gives you the power of all the Python libraries and, through [http://docs.python.org/api/api.html the Python API], lets you write programs in other languages and then send the results back into PyMOL (this is what [[cealign]] does). 
 
The PyMOLWiki has a rather extensive [[script_library]] full of useful scripts (feel free to add your own).
 
Here I intend to provide the necessary details to get you coding PyMOL scripts as quickly as possible.  I provide the basic information as well as code and links to the Python API.
 
= General Scripts =
Scripting PyMOL in the Python language follows a simple recipe.
 
'''To write them''':
#Write the function, let's call it '''doSimpleThing''', in a Python file, let's call the file '''pyProgram.py'''.
#Add the following command to the end of the '''pyProgram.py''' file (the first argument is the command name as a string, the second is the function itself) <source lang="python">cmd.extend("doSimpleThing",doSimpleThing)</source>
 
'''To use them''':
# Simply run the script in PyMOL: <source lang="python">run /home/userName/path/toscript/pyProgram.py</source>
# Then, just type the name of the command: ''doSimpleThing'' and pass any needed arguments.
 
That's it.  Your script can, through Python, import any modules you need and also modify objects in PyMOL.
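The recipe above can be sketched as one complete file.  This is only a minimal sketch: the greeting that '''doSimpleThing''' builds is a made-up example, and the '''try/except''' around the import is an assumption added here so the file can also be imported outside PyMOL.

```python
# pyProgram.py -- a minimal sketch of a PyMOL script file.
try:
    from pymol import cmd    # available when the file is run inside PyMOL
except ImportError:
    cmd = None               # lets the file be imported outside PyMOL too


def doSimpleThing(name="world"):
    """Hypothetical command: build and print a greeting."""
    # Arguments typed at the PyMOL prompt arrive as strings.
    message = "Hello, %s" % name
    print(message)
    return message


# Register the function as a PyMOL command (only possible inside PyMOL).
if cmd is not None:
    cmd.extend("doSimpleThing", doSimpleThing)
```

After '''run'''ning the file in PyMOL, typing ''doSimpleThing PyMOL'' at the prompt should call the function with the string "PyMOL".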
 
== Getting PyMOL Data into your Script ==
To get PyMOL data into your script you will need to somehow get access to the PyMOL objects and pull out the data.  For example, if you want the atomic coordinates of a selection of alpha carbon atoms your Python function may do something like this (all PyMOL functions are referenced in the See Also section, below):
<source lang="python">
# Import the PyMOL modules we need.  cmd runs PyMOL commands and
# selector processes selection strings.  stored gives us a
# way to pull out the PyMOL data and modify it in our script.
# See below.
from pymol import cmd, selector, stored
 
def functionName( userSelection ):
    # this array will be used to hold the coordinates.  PyMOL
    # has access to it and so do we.
    stored.alphaCarbons = []
 
    # let's just get the alpha carbons, so make the
    # selection just for them
    userSelection = userSelection + " and n. CA"
 
    # iterate over state 1 of the userSelection -- this just means
    # for each item in the selection do what the next parameter says.
    # And, that is to append the (x,y,z) coordinates to the stored.alphaCarbons
    # array.
    cmd.iterate_state(1, selector.process(userSelection), "stored.alphaCarbons.append([x,y,z])")
 
    # stored.alphaCarbons now has the data you want.
 
    # ... do something to your coordinates ...
</source>
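As one possible "do something" step, here is a small sketch that translates the collected coordinates by a vector.  It uses plain Python lists in the same [x,y,z] layout as '''stored.alphaCarbons'''; the function name '''translate''' and the sample numbers are made up for illustration.

```python
def translate(coords, vector):
    """Return a new list of [x, y, z] lists, each shifted by vector."""
    return [[c + v for c, v in zip(atom, vector)] for atom in coords]


# Sample data standing in for stored.alphaCarbons.
alphaCarbons = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [1.5, 1.5, 0.0]]

# Shift every atom by (1, 2, 3); the input list is left untouched.
moved = translate(alphaCarbons, [1.0, 2.0, 3.0])
print(moved)
```

Returning a new list rather than editing in place keeps the original coordinates around in case you want to compare before and after.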
 
== Getting Data From your Script into PyMOL ==
Usually this step is easier.  You get your data into PyMOL by modifying some object: rotating a molecule, for example.  To do that, you can use the [[alter]] or [[alter_state]] commands.  Let's say, for example, that we have translated the molecular coordinates from the last example by some vector (we moved the alpha carbons).  Now we want to make the change and see it in PyMOL.  To write the coordinates back we do:
<source lang="python">
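# we need to know which PyMOL object to modify.  There could be many molecules and objects
# in the session, and we don't want to ruin them.  The following line gets the object
# name from PyMOL (userSelection is the selection from the last example)
objName = cmd.identify(userSelection,1)[0][0]
 
# Now, we alter each (x,y,z) array in the selection, by popping out the values
# in stored.alphaCarbons.  We alter userSelection rather than all of objName because
# stored.alphaCarbons holds one entry per selected alpha carbon, not one per atom in
# the object.  PyMOL should now reflect the changed coordinates.
cmd.alter_state(1,userSelection,"(x,y,z)=stored.alphaCarbons.pop(0)")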
</source>
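To see why the list must hold exactly one entry per altered atom, and in the same order, here is a plain-Python stand-in for what the '''pop(0)''' expression does.  No PyMOL is involved: the '''Stored''' class and the '''atoms''' list are hypothetical substitutes for '''pymol.stored''' and the atoms of a selection.

```python
class Stored(object):
    """Stand-in for pymol.stored: a plain namespace for our data."""
    pass


stored = Stored()
stored.alphaCarbons = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]

# Hypothetical atoms, standing in for the atoms of a PyMOL selection.
atoms = [{"x": 0.0, "y": 0.0, "z": 0.0},
         {"x": 9.0, "y": 9.0, "z": 9.0}]

# What "(x,y,z)=stored.alphaCarbons.pop(0)" does, once per atom:
# take the first remaining entry and assign it to that atom.
for atom in atoms:
    atom["x"], atom["y"], atom["z"] = stored.alphaCarbons.pop(0)

print(atoms)
print(len(stored.alphaCarbons))
```

If there were more atoms than entries, '''pop(0)''' would be called on an empty list and raise an error, which is why the selection used for writing back should match the one used for reading.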
 
== Example ==
Here's a script I wrote for [[cealign]].  It takes two selections '''of equal length''' and computes the optimal overlap, and aligns them.  See [[Kabsch]] for the original code.  Because this tutorial is for scripting and not optimal superposition, the original comments have been removed.
 
<source lang="python">
import numpy
from pymol import cmd, selector, stored

def optAlign( sel1, sel2 ):
        """
        @param sel1: First PyMol selection with N-atoms
        @param sel2: Second PyMol selection with N-atoms
        """
 
        # make the lists for holding coordinates
        # partial lists
        stored.sel1 = []
        stored.sel2 = []
        # full lists
        stored.mol1 = []
        stored.mol2 = []
 
        # -- CUT HERE
        sel1 = sel1 + " and N. CA"
        sel2 = sel2 + " and N. CA"
        # -- CUT HERE
 
        # This gets the coordinates from the PyMOL objects
        cmd.iterate_state(1, selector.process(sel1), "stored.sel1.append([x,y,z])")
        cmd.iterate_state(1, selector.process(sel2), "stored.sel2.append([x,y,z])")
 
        # ...begin math that does stuff to the coordinates...
        mol1 = cmd.identify(sel1,1)[0][0]
        mol2 = cmd.identify(sel2,1)[0][0]
        cmd.iterate_state(1, mol1, "stored.mol1.append([x,y,z])")
        cmd.iterate_state(1, mol2, "stored.mol2.append([x,y,z])")
        assert( len(stored.sel1) == len(stored.sel2))
        L = len(stored.sel1)
        assert( L > 0 )
        COM1 = numpy.sum(stored.sel1,axis=0) / float(L)
        COM2 = numpy.sum(stored.sel2,axis=0) / float(L)
        stored.sel1 = stored.sel1 - COM1
        stored.sel2 = stored.sel2 - COM2
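        E0 = numpy.sum( numpy.sum(stored.sel1 * stored.sel1,axis=0),axis=0) + numpy.sum( numpy.sum(stored.sel2 * stored.sel2,axis=0),axis=0)
        # V, S and Wt are used below but were never defined in this excerpt;
        # the singular value decomposition of the covariance matrix, as in
        # the Kabsch algorithm, supplies them.
        V, S, Wt = numpy.linalg.svd( numpy.dot( numpy.transpose(stored.sel2), stored.sel1))
        # a negative determinant means the best fit is a reflection; undo it
        reflect = numpy.linalg.det(V) * numpy.linalg.det(Wt)
        if reflect < 0.0: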
                S[-1] = -S[-1]
                V[:,-1] = -V[:,-1]
        RMSD = E0 - (2.0 * sum(S))
        RMSD = numpy.sqrt(abs(RMSD / L))
        U = numpy.dot(V, Wt)
        # ...end math that does stuff to the coordinates...
 
        # update the _array_ of coordinates; not the coords in the PyMOL object
        stored.sel2 = numpy.dot((stored.mol2 - COM2), U) + COM1
        stored.sel2 = stored.sel2.tolist()
 
        # This updates PyMOL.  It is removing the elements in
        # stored.sel2 and putting them into the (x,y,z) coordinates
        # of mol2.
        cmd.alter_state(1,mol2,"(x,y,z)=stored.sel2.pop(0)")
 
        print("RMSD=%f" % RMSD)
 
        cmd.orient(sel1 + " and " + sel2)
 
# The extend command makes this runnable as a command, from PyMOL.
cmd.extend("optAlign", optAlign)
</source>
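The centering step and the RMSD in the script above can be tried without PyMOL or numpy.  The sketch below covers only those two pieces, in plain Python with made-up coordinates; it does not do the rotation, so it is not the full optimal superposition.  The function names are chosen here for illustration.

```python
import math


def centroid(coords):
    """Mean x, y and z of a list of [x, y, z] coordinates."""
    n = float(len(coords))
    return [sum(atom[k] for atom in coords) / n for k in range(3)]


def center(coords):
    """Shift the coordinates so that their centroid sits at the origin."""
    com = centroid(coords)
    return [[atom[k] - com[k] for k in range(3)] for atom in coords]


def rmsd(a, b):
    """Root mean square deviation between two equal-length coordinate lists."""
    assert len(a) == len(b) and len(a) > 0
    total = sum((p[k] - q[k]) ** 2 for p, q in zip(a, b) for k in range(3))
    return math.sqrt(total / len(a))


# Two made-up selections that differ only by a translation.
sel1 = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
sel2 = [[5.0, 5.0, 5.0], [7.0, 5.0, 5.0]]

print(rmsd(sel1, sel2))                  # before centering
print(rmsd(center(sel1), center(sel2)))  # after centering
```

Because the two sample selections differ only by a shift, centering alone brings them into register; structures that also differ by a rotation need the SVD step from the script.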
 
= Advanced Scripts =
Let's consider a more complicated script.  Python, while incredibly useful, is much slower at math than C/C++, FORTRAN, etc.  For complicated problems it's faster to package your data, send it to C, do some math, and pass the results back to Python than to do everything in Python.  The beauty of the Python API is that we can do just that.
 
Python is a great language, but sometimes we have libraries built in other languages, or Python's math is just too slow to be useful.  (I tested a structure alignment problem, using equivalent code, in C and Python.  The C code was about 10x faster.)  So, we can export our PyMOL data to the other language, do the math/problem, and import the changes back into PyMOL.  This is shown below using the Python API and C.  (This example code comes from [[cealign]].)
 
This is more advanced scripting, and requires some knowledge of the [http://docs.python.org/api/api.html Python API].
 
 
=== Python, PyMOL and C ===
Here, I will show you how to write a C-module that plugs into Python and talks nicely with PyMOL.  To follow this, you should have some programming experience in both Python and C.  The example actually shows how to make a generic C-function and use it in Python.
 
First, let's assume that we want to call a function, let's call it '''funName'''.  Let's assume '''funName''' will take a Python list of lists and return a list.  I will also assume we have '''funName.h''' and '''funName.c''' for C code files.  (This is a more complex example to show a real-world problem.  If you were just sending an integer or float instead of packaged lists, the code would be simpler.)  So, to start, let's look at the Python code that will call the C-function:
<source lang="python">
#
# -- in someCode.py
#
# Call funName.  Pass it one argument: a tuple () holding two lists.
# (sel1 and sel2 are lists.)  Get the return value into rValFromC.
#
rValFromC = funName( (sel1, sel2) )
</source>
where '''sel1''' and '''sel2''' could be any list of atom coordinates, say, from PyMOL.  (See above.)
 
Ok, this isn't hard.  Now, we need to see what the code that receives this function call in C, looks like.  Well, first we need to let C know we're integrating with Python.  So, in your [http://docs.python.org/api/includes.html header file] of '''funName.h''' we put:
<source lang="c">
// in funName.h
#include <Python.h>
</source>
 
Next, by default your C-function's name is '''funName_funName''' (and that needs to be set up; I'll show how later).  So, let's define funName:
<source lang="c">
static PyObject*
funName_funName(PyObject* self, PyObject* args)
{
...more code...
</source>
This is the generic call.  '''funName''' is taking two pointers to PyObjects.  It also returns a PyObject.  This is how you get the Python data into and out of C.  It shows up in "args," and we then unpackage it into C.  Then we tinker with the data, package it up using the Python API, and send it back to Python/PyMOL.
 
Let's unpack the data in '''args'''.  Remember, '''args''' has a Python list of lists.  So, to unpackage that we do the following inside of funName:
<source lang="c" line="1">
static PyObject*
funName_funName(PyObject* self, PyObject* args)
{
        PyObject *listA, *listB;
 
        if ( ! PyArg_ParseTuple(args, "(OO)", &listA, &listB) ) {
                printf("Could not unparse objects\n");
                return NULL;
        }
 
        // let Python know we made two lists
        Py_INCREF(listA);
        Py_INCREF(listB);
... more code ...
</source>
Line 4 declares the two C pointers that we will unpackage the lists into.  They are pointers to PyObjects.
Line 6 is where the magic happens.  We call '''PyArg_ParseTuple''', passing it the args we got from Python.  The '''(OO)''' is Python's code for ''I'm expecting two <u>O</u>bjects inside a tuple <u>()</u>''.  Were it three objects, then '''(OOO)'''.  The first object will be put into '''listA''' and the second into '''listB''' (we pass their addresses, '''&listA''' and '''&listB''').  The exact [http://docs.python.org/api/arg-parsing.html argument building specifications] are very useful.
Next, we check for success.  Unpacking could fail.  If it does, complain and quit.  Else, '''listA''' and '''listB''' now have data in them.  To avoid memory problems we need to manually keep track of the PyObjects we're tooling around with.  The objects we just unpacked still belong to the caller, so if we want them to stay alive while we work on them, we have to tell Python we are using them too.  So, we let Python know about each list with '''Py_INCREF(listA)''' and '''Py_INCREF(listB)''', and we must release them again with '''Py_DECREF''' when we are done.  This is [http://docs.python.org/api/countingRefs.html reference counting].
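It can help to see, from the Python side, what shape '''args''' has when the C function is entered.  The following is only a rough Python analogue of the '''"(OO)"''' unpacking, not the real parser: '''parse_two_objects''' is a hypothetical helper, and this '''funName''' is a pure-Python stand-in that just returns the two lengths.

```python
def parse_two_objects(args):
    """Rough Python analogue of PyArg_ParseTuple(args, "(OO)", &listA, &listB)."""
    # args is the tuple of everything the caller passed to the function.
    if len(args) != 1:
        raise TypeError("expected exactly one argument")
    inner = args[0]
    # The parentheses in "(OO)" mean: that one argument holds two objects.
    if len(inner) != 2:
        raise TypeError("expected a sequence of two objects")
    listA, listB = inner
    return listA, listB


def funName(*args):
    """Pure-Python stand-in for the C function; returns the two lengths."""
    listA, listB = parse_two_objects(args)
    return len(listA), len(listB)


sel1 = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
sel2 = [[2.0, 2.0, 2.0]]
print(funName((sel1, sel2)))
```

Calling '''funName(sel1, sel2)''' without the extra parentheses passes two arguments instead of one tuple, which is the kind of mismatch the C code reports as "Could not unparse objects".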
 
Now, just for safety, let's check the lists to make sure they actually were passed something.  A tricky user could have given us empty lists, looking to hose the program.  So, we do:
<source lang="c">
    // handle empty selections (should probably do this in Python)
    const int lenA = PyList_Size(listA);
    if ( lenA < 1 ) {
            printf("CEALIGN ERROR: First selection didn't have any atoms.  Please check your selection.\n");
            // let Python remove the lists
            Py_DECREF(listA);
            Py_DECREF(listB);
            return NULL;
      }
 
</source>
We check the list size with, '''[http://docs.python.org/api/listObjects.html PyList_Size]''' and if it's 0 -- we quit.  But, before quitting we give control of the lists back to Python so it can clean up after itself.  We do that with '''Py_DECREF'''.
 
Now, we should have access to the data the user sent us, in '''listA''' and '''listB''', and it should be there and be clean.  But, not forgetting that '''listA''' and '''listB''' are lists of 3D coordinates, let's unpack them further into sets of coordinates.  Because we know the length of the lists, we can do something like the following:
<source lang="c" line="1">
      // make space for the current coords
      pcePoint coords = (pcePoint) malloc(sizeof(cePoint)*length);
 
      // loop through the arguments, pulling out the
      // XYZ coordinates.
      int i;
      for ( i = 0; i < length; i++ ) {
              PyObject* curCoord = PyList_GetItem(listA,i);
              Py_INCREF(curCoord);
     
              PyObject* curVal = PyList_GetItem(curCoord,0);
              Py_INCREF(curVal);
              coords[i].x = PyFloat_AsDouble(curVal);
              Py_DECREF(curVal);
 
              curVal = PyList_GetItem(curCoord,1);
              Py_INCREF(curVal);
              coords[i].y = PyFloat_AsDouble(curVal);
              Py_DECREF(curVal);
 
              curVal = PyList_GetItem(curCoord,2);
              Py_INCREF(curVal);
              coords[i].z = PyFloat_AsDouble(curVal);
              Py_DECREF(curVal);
 
              Py_DECREF(curCoord);
        }
 
... more code ...
</source>
Where '''pcePoint''' is just a pointer to a '''cePoint''', a struct holding the three coordinates x, y and z.  Line 2 just gets some memory ready for the list of '''length''' coordinates.  Then, for each item from 0 to length-1, we unpack the list using '''[http://docs.python.org/api/listObjects.html PyList_GetItem]''', into '''curCoord'''.  This then gets further unpacked into the x, y and z members of '''coords'''.
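For comparison, the same unpacking loop can be written in a few lines of Python.  This is a sketch of what the C loop computes, not a replacement for it; '''unpack_coords''' is a name made up here, and tuples stand in for the C struct.

```python
def unpack_coords(listA):
    """Python view of the C loop: turn a list of [x, y, z] into float triples."""
    coords = []
    for i in range(len(listA)):
        curCoord = listA[i]                 # like PyList_GetItem(listA, i)
        coords.append((float(curCoord[0]),  # like PyFloat_AsDouble on item 0
                       float(curCoord[1]),
                       float(curCoord[2])))
    return coords


print(unpack_coords([[1, 2, 3], [4.5, 5, 6]]))
```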
 
... More later ...
 
=== Getting Your Data from C back into Python/PyMOL ===
Once you're done with your calculations and want to send your data back to PyMOL, you need to package it up into a Python object, using the Python API, and then return it.  You should be aware of the expected return value and how you're packaging the results.  If your user calls,
<source lang="python">
(results1,results2) = someCFunction(parameters1,parameters2)
</source>
then you need to package a list with two values.  To build values for returning to PyMOL, use '''Py_BuildValue'''.  Py_BuildValue takes a format string indicating the types, and then the values themselves.  [http://docs.python.org/ext/buildValue.html Building values] for return has been documented very well.  Consider an example: if I want to package a list of two integers, the type specifier for Py_BuildValue is "[i,i]", so my call could be:
<source lang="c">
// Package the two ints into a Python list of two ints.
PyObject* thePair = Py_BuildValue( "[i,i]", int1, int2 );
 
// Py_BuildValue returns a new reference that we already own, so no
// extra Py_INCREF is needed; an extra one would leak the object.
</source>
 
If you need to make a list of things to return, you iterate through a list and make a bunch of '''thePairs''' and add them to a Python list as follows:
<source lang="c">
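// Make the python list.  PyList_New returns a new reference that
// we already own, so no extra Py_INCREF is needed.
PyObject* theList = PyList_New(0);
 
for ( int i = 0; i < someLim; i++ ) {
  PyObject* thePair = Py_BuildValue( "[i,i]", int1, int2 );
  // PyList_Append takes its own reference to thePair, so we
  // release ours once the pair is in the list.
  PyList_Append(theList,thePair);
  Py_DECREF(thePair);
}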
</source>
To add a list of lists, just make an outer list, <source lang="c">PyObject* outerList = PyList_New(0);</source> and iteratively add to it your inner lists:
<source lang="c">
// PyList_New returns a new reference that we already own.
PyObject* outerList = PyList_New(0);
 
for ( int i = 0; i < someLim; i++ ) {
  // make the inner list, called curList;
  PyObject* curList = PyList_New(0);
 
  // fill the inner list, using PyList_Append with some data, shown above
  ...
 
  // the outer list takes its own reference, so release ours
  PyList_Append(outerList,curList);
  Py_DECREF(curList);
}
</source>
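Seen from the calling Python code, the objects built above are ordinary lists.  The sketch below shows what the caller would receive; '''someCFunction''' and '''listOfPairs''' are pure-Python stand-ins with made-up contents, used only to show the shapes.

```python
def someCFunction(parameters1, parameters2):
    """Stand-in returning what Py_BuildValue("[i,i]", int1, int2) would build."""
    return [len(parameters1), len(parameters2)]


def listOfPairs(someLim):
    """Stand-in for the C loop that appends pairs to a Python list."""
    theList = []
    for i in range(someLim):
        thePair = [i, i * i]      # made-up values for int1 and int2
        theList.append(thePair)
    return theList


# A two-element list unpacks into two names just like a tuple does.
(results1, results2) = someCFunction([1, 2, 3], [4, 5])
print(results1, results2)
print(listOfPairs(3))
```

The unpacking on the caller's side only works if the C code packages exactly as many values as the caller expects.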
 
=== Initialization ===
We need to discuss how our functions will be called from Python.  First, we need to create a [http://docs.python.org/ext/methodTable.html method table].
<source lang="c">
static PyMethodDef CEMethods[] = {
        {"ccealign", ccealign_ccealign, METH_VARARGS, "Align two proteins using the CE Algorithm."},
        {NULL, NULL, 0, NULL}
};
</source>
 
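Each module undergoes initialization.  By default the module's init function is '''initNAME()'''.  So, in our example above, '''initccealign()'''.  During this initialization step, we need to call '''Py_InitModule'''.  (This is the Python 2 C API; Python 3 replaced it with '''PyModule_Create''' and an init function named '''PyInit_NAME'''.)  For our above example, we'd have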
<source lang="c">
PyMODINIT_FUNC
initccealign(void)
{
    (void) Py_InitModule("ccealign", CEMethods);
}
</source>
 
Finally, the main function that starts the whole shebang should look something like:
<source lang="c">
int
main(int argc, char* argv[])
{
        Py_SetProgramName(argv[0]);
        Py_Initialize();
        initccealign();
        return(EXIT_SUCCESS);
}
</source>
 
This will be cleaned up later.
 
=== Installing Your Module ===
 
You can use Python's setuptools.  For example, I have cealign set up to install as simply as:
<source lang="bash">
python setup.py build
python setup.py install
</source>
and I'm set.
 
==== Notes ====
I'll finish this soon.
[[User:Inchoate|Tree]]
 
* Add the calling functions in C
* discuss installation
 
== Example ==
See the source code for [[cealign]].
 


* [[Simple Scripting]]
* [[Advanced Scripting]]


== See Also ==
[[stored]], [[iterate_state]], [[identify]].


[[Category:Development|Script_Tutorial]]
* [[:Category:Scripting]]
* [[:Category:Development]]

Latest revision as of 10:29, 29 May 2011