Advanced Scripting

From PyMOLWiki
Revision as of 16:24, 6 March 2008 by Inchoate (talk | contribs)

On this page, we discuss more complex scripting. Python is great, but it is much slower at mathematics than C/C++/Java/FORTRAN. For that reason, you may find it more useful to export your data to another language, operate on it there and then import the results back into PyMOL. We discuss the Python API and the general operating procedure for successfully writing your own scripts.

Advanced Scripting

Python, while incredibly useful, is much slower at math than compiled, statically typed languages, and sometimes the libraries we need are written in other languages. For complicated problems, it's faster to package your data, send it to C, do the math there, and pass the results back to Python than to do everything in Python. The beauty of the Python API is that we can do just that.

This is more advanced scripting. It requires some knowledge of the Python/C API and of another language. The example shown here is in C; C++ extensions are very similar.


Python, PyMOL and C

Here, I will show you how to write a C-module that plugs into Python and talks nicely with PyMOL. The example actually shows how to make a generic C-function and use it in Python.

First, let's assume that we want to call a function, let's call it funName. Let's assume funName will take a Python list of lists and return a list. I will also assume we have funName.h and funName.c for C code files. (This is a more complex example to show a real-world problem. If you were just sending an integer or float instead of packaged lists, the code is simpler; if you understand unpacking the lists then you'll certainly understand unpacking a simple scalar.)

Calling the External Function

So, to start, let's look at the Python code that will call the C-function:

#
# -- in someCode.py
#
# Call funName.  Pass it a tuple () of lists.  (sel1 and sel2 are lists.)
# Get the return value into rValFromC.
#
rValFromC = funName( (sel1, sel2) )

where sel1 and sel2 could be any lists of atom coordinates, say, taken from PyMOL selections.
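A quick way to see what the C function will receive is a pure-Python stand-in. In this sketch, fake_funName is a hypothetical name, not part of PyMOL or cealign. Like a C function registered with METH_VARARGS, it receives the positional arguments as one tuple, args, so you can inspect exactly what the C side has to unpack.

```python
# Hypothetical pure-Python stand-in for the C function funName.
# A METH_VARARGS C function receives its positional arguments as one
# tuple (args); *args collects exactly the same tuple here.
def fake_funName(*args):
    return args

# Two tiny example "selections": lists of [x, y, z] coordinates.
sel1 = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]]
sel2 = [[0.0, 1.5, 0.0]]

args = fake_funName((sel1, sel2))

# args holds ONE item, the tuple (sel1, sel2), which is why the C code
# parses it with the format "(OO)" rather than "OO".
print(len(args))      # 1
print(len(args[0]))   # 2
```

Had the caller written funName(sel1, sel2) instead, args would contain two items, and the C format string would be "OO".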

Ok, this isn't hard. Now let's look at the C code that receives this function call. First, we need to let C know we're integrating with Python, so in the header file funName.h we put the following (Python.h should be included before any standard headers):

// in funName.h
#include <Python.h>

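Next, by convention the C function is named after the module and the function, here funName_funName. The name Python sees is mapped to this C function in a method table, which I'll show how to set up later. So, let's define funName_funName: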

static PyObject*
funName_funName(PyObject* self, PyObject* args)
{
...more code...

This is the generic signature. funName_funName takes two pointers to PyObjects and returns a pointer to a PyObject; this is how Python data gets into and out of C. The self argument is not used by plain module functions like this one, while args is a tuple holding the positional arguments the Python caller passed. We unpack args into C variables, do the work, package the results using the Python API, and return them to Python/PyMOL. Returning NULL instead tells Python that an exception was raised.

Unpacking the Data

Let's unpack the data in args. Remember, the Python caller passed a single tuple of two lists, so args is a tuple whose only element is (sel1, sel2). To unpack it, we do the following inside funName_funName:

static PyObject*
funName_funName(PyObject* self, PyObject* args)
{
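        PyObject *listA, *listB;

        if ( ! PyArg_ParseTuple(args, "(OO)", &listA, &listB) ) {
                // PyArg_ParseTuple has already set a Python exception
                return NULL;
        }

        // PyArg_ParseTuple gives us *borrowed* references. Taking our own
        // references is optional here, but once we do, every return path
        // below must release them with Py_DECREF.
        Py_INCREF(listA);
        Py_INCREF(listB);
 ... more code ...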

The first declaration creates the two C variables that we will unpack the lists into; they are pointers to PyObjects. The call to PyArg_ParseTuple is where the magic happens: we pass it the args we got from Python and a format string. "(OO)" means "I'm expecting one tuple containing two objects"; were it three objects, it would be "(OOO)". The first object is stored in listA and the second in listB. (If the Python code called funName(sel1, sel2) with two separate arguments, the format would be "OO", without the parentheses.) The full set of format codes is described in the Python/C API documentation under "Parsing arguments and building values".

Reference Counting

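Next, we check for success. Unpacking could fail; if it does, PyArg_ParseTuple has already set a Python exception (a TypeError, for example), so we just return NULL. Otherwise, listA and listB now point at the two lists. Python manages memory by reference counting: every PyObject carries a count of the references held to it, and it is freed when that count drops to zero. The "O" format gives us borrowed references, which means the caller still owns the lists and we must not release them. Calling Py_INCREF(listA) and Py_INCREF(listB) takes our own references. That is only strictly necessary if we keep the lists beyond this call, but once we have done it, every return path must balance it with Py_DECREF, or the lists will never be freed (a memory leak). Objects we create ourselves, for example with PyList_New or Py_BuildValue, come back as new references that we own from the start.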
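In CPython you can watch reference counts from Python itself with sys.getrefcount. The absolute numbers are an implementation detail (the call itself holds a temporary reference), so this sketch only compares differences. Binding a second name to the list plays the role of Py_INCREF, and deleting that name plays the role of Py_DECREF.

```python
import sys

coords = [[0.0, 0.0, 0.0]]

# sys.getrefcount counts its own temporary reference too, so only
# differences between calls are meaningful.
before = sys.getrefcount(coords)

alias = coords            # a second reference, like Py_INCREF in C
after_incref = sys.getrefcount(coords)

del alias                 # drop it again, like Py_DECREF in C
after_decref = sys.getrefcount(coords)

print(after_incref - before)   # 1
print(after_decref - before)   # 0
```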

Now, just for safety, let's check the lists to make sure they actually were passed something. A tricky user could have given us empty lists, looking to hose the program. So, we do:

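     // handle empty selections (should probably do this in Python, it's easier)
     const Py_ssize_t lenA = PyList_Size(listA);
     if ( lenA < 1 ) {
             // returning NULL requires a Python exception to be set
             PyErr_SetString(PyExc_ValueError,
                     "CEALIGN ERROR: First selection didn't have any atoms.  Please check your selection.");
             // release the references we took with Py_INCREF
             Py_DECREF(listA);
             Py_DECREF(listB);
             return NULL;
      }

We check the list size with PyList_Size, and if the list is empty, we quit. Returning NULL tells Python that an exception was raised, so we first set one with PyErr_SetString; returning NULL without an exception set makes Python raise a confusing SystemError instead. Before quitting, we also release the references we took earlier with Py_INCREF, using Py_DECREF, so the lists' reference counts are back where they started.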
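As the comment in the C code says, the empty-selection check is easier to do in Python before the call. This is a minimal sketch under that assumption. checked_call and its c_function parameter are hypothetical names, and any callable that accepts the (sel1, sel2) tuple can stand in for the compiled function.

```python
def checked_call(c_function, sel1, sel2):
    """Validate two coordinate lists, then call the C function.

    c_function is a hypothetical stand-in for the compiled funName;
    it receives the (sel1, sel2) tuple, just like the C code does.
    """
    for name, sel in (("First", sel1), ("Second", sel2)):
        if len(sel) < 1:
            raise ValueError(f"{name} selection didn't have any atoms. "
                             "Please check your selection.")
        for xyz in sel:
            if len(xyz) != 3:
                raise ValueError(f"{name} selection has a coordinate "
                                 f"without exactly 3 values: {xyz!r}")
    # Hand C plain lists of floats, so PyList_GetItem and
    # PyFloat_AsDouble see exactly the types they expect.
    clean1 = [[float(v) for v in xyz] for xyz in sel1]
    clean2 = [[float(v) for v in xyz] for xyz in sel2]
    return c_function((clean1, clean2))


# Usage with a stand-in that just counts the atoms in both selections.
n_atoms = checked_call(lambda pair: len(pair[0]) + len(pair[1]),
                       [[0, 0, 0]], [[1, 1, 1], [2, 2, 2]])
```

Raising ValueError in Python gives the user a normal traceback, and the C code's own check stays as a last line of defense.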

More Complex Unpacking

If you're dealing with simple scalars, then you might be able to skip this portion.

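Now we should have access to the data the user sent us in listA and listB, and it should be present and clean. Remember, though, that listA and listB are lists of 3D coordinates, so let's unpack them further into individual points. Because we know the length of the first list (lenA), we can do something like the following:

       // make space for the current coords
       pcePoint coords = (pcePoint) malloc(sizeof(cePoint) * lenA);
       if ( coords == NULL ) {
               Py_DECREF(listA);
               Py_DECREF(listB);
               return PyErr_NoMemory();
       }

       // loop through the list, pulling out the XYZ coordinates.
       // PyList_GetItem returns *borrowed* references, so no
       // Py_INCREF/Py_DECREF pairs are needed inside the loop.
       Py_ssize_t i;
       int k;
       for ( i = 0; i < lenA; i++ ) {
               PyObject* curCoord = PyList_GetItem(listA, i);
               if ( !PyList_Check(curCoord) || PyList_Size(curCoord) != 3 ) {
                       PyErr_SetString(PyExc_TypeError,
                               "each coordinate must be a list of 3 numbers");
                       break;
               }

               double xyz[3];
               for ( k = 0; k < 3; k++ ) {
                       // returns -1.0 and sets an exception if not a number
                       xyz[k] = PyFloat_AsDouble(PyList_GetItem(curCoord, k));
                       if ( xyz[k] == -1.0 && PyErr_Occurred() )
                               break;
               }
               if ( PyErr_Occurred() )
                       break;

               coords[i].x = xyz[0];
               coords[i].y = xyz[1];
               coords[i].z = xyz[2];
       }

       // on any error: free the buffer, release the lists, report failure
       if ( PyErr_Occurred() ) {
               free(coords);
               Py_DECREF(listA);
               Py_DECREF(listB);
               return NULL;
       }

 ... more code ...

Here cePoint is a small struct holding one point's x, y and z coordinates, and pcePoint is a pointer to a cePoint; both are defined in cealign's own C code. The malloc call reserves room for lenA points. Then, for each index i from 0 to lenA - 1, PyList_GetItem pulls the i-th coordinate list out of listA into curCoord, and its three elements are converted with PyFloat_AsDouble and stored in coords[i]. If a coordinate isn't a list of three numbers, an exception is set, so we free the buffer, release our references to the lists, and return NULL.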
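When testing the C code, it can help to have a Python model of the same unpacking to compare against. In this sketch, CePoint is a hypothetical Python mirror of the C cePoint struct and unpack_coords is a hypothetical helper. They follow the C loop step by step but are not part of cealign.

```python
from typing import List, NamedTuple


class CePoint(NamedTuple):
    """Hypothetical Python mirror of the C cePoint struct."""
    x: float
    y: float
    z: float


def unpack_coords(list_a: List[List[float]]) -> List[CePoint]:
    """Same steps as the C loop: for each index i, take item i of
    list_a, check it has 3 values, and read them as floats."""
    coords = []
    for cur_coord in list_a:
        if len(cur_coord) != 3:
            raise TypeError("each coordinate must be a list of 3 numbers")
        coords.append(CePoint(float(cur_coord[0]),
                              float(cur_coord[1]),
                              float(cur_coord[2])))
    return coords


points = unpack_coords([[1, 2, 3], [4.5, 5, 6]])
```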

... More later ...

Sending the Results back to Python/PyMOL

Once you're done with your calculations and want to send your data back to PyMOL, you need to package it into a Python object using the Python API and return it. Keep in mind what the caller expects to get back, and package the results to match. If your user calls,

(results1,results2) = someCFunction(parameters1,parameters2)

then you need to package a sequence with two values (a list or a tuple). To build values for returning to PyMOL, use Py_BuildValue. It takes a format string describing the types, followed by the values themselves; its format codes are documented alongside those of PyArg_ParseTuple. For example, to package two integers as a Python list, the format string is "[i,i]", so the call could be:

// Package the two ints into a Python list of two ints.
PyObject* thePair = Py_BuildValue( "[i,i]", int1, int2 );

// Py_BuildValue returns a new reference that we already own, so no
// Py_INCREF is needed; returning thePair hands it to the caller.
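On the Python side, the object built with the "[i,i]" format is an ordinary two-element list (the "(ii)" format would give a tuple instead), and either one can be unpacked into two names. In this sketch, fake_someCFunction is a hypothetical pure-Python stand-in that returns what such a C function would; the values it computes are arbitrary.

```python
def fake_someCFunction(parameters1, parameters2):
    # Hypothetical stand-in: returns what Py_BuildValue("[i,i]", int1, int2)
    # would hand back, i.e. a plain list of two ints.
    int1, int2 = len(parameters1), len(parameters2)
    return [int1, int2]


(results1, results2) = fake_someCFunction([1, 2, 3], [4])
print(results1, results2)   # 3 1
```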

If you need to return a list of things, build each item inside a loop and append it to a Python list as follows:

// Make the Python list (PyList_New returns a new reference we own)
PyObject* theList = PyList_New(0);

for ( int i = 0; i < someLim; i++ ) {
  PyObject* thePair = Py_BuildValue( "[i,i]", int1, int2 );
  // PyList_Append adds its own reference to thePair and does not
  // steal ours, so release ours to avoid a leak.
  PyList_Append(theList, thePair);
  Py_DECREF(thePair);
}

To return a list of lists, make an outer list and iteratively append your inner lists to it:

PyObject* outerList = PyList_New(0);

for ( int i = 0; i < someLim; i++ ) {
  // make the inner list, called curList (a new reference we own)
  PyObject* curList = PyList_New(0);

  // fill the inner list, using PyList_Append with some data, shown above
  ...

  // outerList takes its own reference to curList; release ours
  PyList_Append(outerList, curList);
  Py_DECREF(curList);
}

// returning outerList passes our reference on to the caller
return outerList;
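The C loops above produce an ordinary nested Python list. As a hedged sketch of the shape the caller receives, build_results is a hypothetical pure-Python equivalent; the pair [i, j] is placeholder data standing in for whatever Py_BuildValue("[i,i]", ...) packs in your own code.

```python
def build_results(some_lim, inner_len):
    """Hypothetical Python equivalent of the C list-of-lists loops."""
    outer_list = []                       # PyList_New(0)
    for i in range(some_lim):
        cur_list = []                     # PyList_New(0) for the inner list
        for j in range(inner_len):
            cur_list.append([i, j])       # Py_BuildValue("[i,i]", i, j)
        outer_list.append(cur_list)       # PyList_Append(outerList, curList)
    return outer_list


print(build_results(2, 2))   # [[[0, 0], [0, 1]], [[1, 0], [1, 1]]]
```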

Initialization

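We need to discuss how our functions will be called from Python. First, we need to create a method table. It maps each name visible from Python to the C function that implements it; METH_VARARGS tells Python to pass the arguments as a tuple in args, and the table ends with an entry of NULLs. From here on, the examples use cealign's names (ccealign and ccealign_ccealign) instead of funName.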

static PyMethodDef CEMethods[] = {
        {"ccealign", ccealign_ccealign, METH_VARARGS, "Align two proteins using the CE Algorithm."},
        {NULL, NULL, 0, NULL}
};

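Each module must be initialized when it is first imported. In Python 2, the initialization function must be named initNAME(), where NAME is the module name, so in our example it is initccealign(). This function calls Py_InitModule, which creates the module and registers the method table. (In Python 3, the function is instead named PyInit_NAME() and creates the module with PyModule_Create from a PyModuleDef structure.) For our example, we'd have: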

PyMODINIT_FUNC
initccealign(void)
{
    (void) Py_InitModule("ccealign", CEMethods);
}

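Finally, if you embed Python in a standalone C program, the main function that starts the whole shebang should look something like the following. An extension module loaded with import doesn't need a main function at all, because Python calls initccealign() itself.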

int
main(int argc, char* argv[])
{
        Py_SetProgramName(argv[0]);
        Py_Initialize();
        initccealign();
        return(EXIT_SUCCESS);
}

This will be cleaned up later.

Installing Your Module

You can use Python's setuptools. For example, I have cealign set up to build and install as simply as:

python setup.py build
python setup.py install

and I'm set.

I'll fill this out later.

Notes

I'll finish this soon. To do:

  • Add the calling functions in C
  • discuss installation
  • discuss the pains of debugging

Example

See the source code for cealign.


See Also

stored, iterate_state, identify.