2008

Getting RESTful with web.py

Django may be the Python web framework getting all the press recently, but web.py is definitely a nice, simple framework. One of its nice aspects is that you write one class method per HTTP method (GET, POST, PUT, DELETE, etc.), and web.py dispatches each client request to the matching method. This approach makes it amazingly easy to write a RESTful API.

web.py

import web

class Resource(object):
    def GET(self, name):
        # return the resource
        pass

    def POST(self, name):
        # update/create the resource
        pass

This approach is very similar to what Google App Engine does with its webapp framework.

Google App Engine

From the App Engine docs:

from google.appengine.ext import webapp

class Resource(webapp.RequestHandler):
    def get(self):
        # return the resource
        pass

    def post(self):
        # update/create the resource
        pass

You can still make a nice REST app with Django, but in each view you have to check the request method in the HttpRequest object. (If you are interested in creating a Django REST API, check out django-rest-interface. UPDATE: Simon Willison also has a nice way to get the web.py style dispatching in Django with RestView.)

Django

def resource(request):
    if request.method == 'GET':
        # return the resource
        pass
    elif request.method == 'POST':
        # update/create the resource
        pass
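RestView brings web.py-style dispatching to Django. The general technique can be sketched in framework-free Python: look up a method named after the HTTP verb and call it, or answer 405 Method Not Allowed if there is none. This is a hypothetical illustration, not RestView's actual code; `FakeRequest`, `MethodDispatchView`, `KeyValueResource`, and the `(status, body)` response tuples are made up for the example.

```python
from collections import namedtuple

# Hypothetical stand-in for a framework request object.
FakeRequest = namedtuple('FakeRequest', ['method', 'body'])


class MethodDispatchView(object):
    """Calls the method whose name matches the HTTP verb (GET, POST, ...).

    Responses are plain (status, body) tuples, not a real framework's
    response objects.
    """
    ALLOWED = ('GET', 'POST', 'PUT', 'DELETE')

    def __call__(self, request, *args):
        verb = request.method.upper()
        handler = getattr(self, verb, None) if verb in self.ALLOWED else None
        if handler is None:
            # 405 Method Not Allowed for verbs this view does not implement
            return (405, '')
        return handler(request, *args)


class KeyValueResource(MethodDispatchView):
    store = {}  # shared by all instances, like MemoryDB's dictionary

    def GET(self, request, name):
        if name in self.store:
            return (200, self.store[name])
        return (404, 'not found')

    def POST(self, request, name):
        self.store[name] = request.body
        return (200, name)


view = KeyValueResource()
print(view(FakeRequest('POST', 'hello'), '12345'))  # (200, '12345')
print(view(FakeRequest('GET', ''), '12345'))        # (200, 'hello')
print(view(FakeRequest('DELETE', ''), '12345'))     # (405, '')
```

The ALLOWED whitelist keeps a client from naming an arbitrary attribute (for example `__init__`) as its HTTP method.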

RESTify web.py

So web.py has a nice way of dealing with the HTTP methods; let's take a look at an example. I created a simple REST-based key-value database called docstore.py. Docstore stores whatever you send it under the key you specify. It has three storage implementations: one using an in-memory dictionary, one using plain files, and one using Python's shelve module. For this REST example, let's just use the dictionary storage engine (a warning: if you use the dictionary approach in a CGI environment, you will lose state after each request).

REST is a great fit for what we want to do with the docstore.py application. To obtain a copy of a resource from the server, the client (such as a browser) issues an HTTP GET request with the key as the resource name. To publish a document, we can either use the HTTP PUT method, which stores the document and returns a newly generated UUID as its key, or use the HTTP POST method to store the document under a key we choose. In both cases the response contains the key under which the value is stored. When we no longer want the document on the server, we send an HTTP DELETE request.
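In Python 3, the standard library's urllib.request can build all four kinds of request directly. A hypothetical client sketch: `BASE_URL` assumes the server runs locally on port 8080 under the /memory/ prefix, and the helper names are made up for illustration.

```python
import urllib.request

# Assumption: the docstore server is running locally on port 8080.
BASE_URL = 'http://localhost:8080/memory/'


def get_request(key):
    # GET /memory/<key> returns the stored document
    return urllib.request.Request(BASE_URL + key, method='GET')


def store_with_key_request(key, data):
    # POST /memory/<key> stores `data` (bytes) under a key we choose
    return urllib.request.Request(BASE_URL + key, data=data, method='POST')


def store_with_new_key_request(data):
    # PUT /memory/ stores `data`; the server replies with a generated UUID key
    return urllib.request.Request(BASE_URL, data=data, method='PUT')


def delete_request(key):
    # DELETE /memory/<key> removes the document
    return urllib.request.Request(BASE_URL + key, method='DELETE')


req = store_with_key_request('12345', b'hello')
print(req.get_method(), req.full_url)  # POST http://localhost:8080/memory/12345
# With the server running, urllib.request.urlopen(req).read() should
# return the key as the response body.
```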

First, we will set up the basics: the imports, URL mappings, and run statement:

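import web
import re
import uuid

# Map any path under /memory/ to the MemoryDB class; the captured
# group is passed to the handler methods as the resource name.
urls = ('/memory/(.*)', 'MemoryDB')

if __name__ == "__main__":
    # web.run() is the web.py 0.2-era entry point used throughout this post.
    web.run(urls, globals())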
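The urls tuple pairs a URL regular expression with a handler class name. A rough sketch of how the pattern turns a request path into the `name` argument, assuming the pattern has to match the entire path; `resource_name` is a hypothetical helper, not part of web.py.

```python
import re

# Assumption: the pattern is matched against the whole request path,
# and the captured group becomes the handler's `name` argument.
URL_PATTERN = re.compile(r'^/memory/(.*)$')


def resource_name(path):
    match = URL_PATTERN.match(path)
    return match.group(1) if match else None


print(repr(resource_name('/memory/12345')))     # '12345'
print(repr(resource_name('/memory/')))          # '' (GET lists all keys)
print(repr(resource_name('/memory/a/../b')))    # 'a/../b'
print(repr(resource_name('/other/12345')))      # None (no handler)
```

Two consequences for docstore: a request for /memory/ yields an empty name, which GET treats as "list all keys", and the captured name may contain slashes and dots, which is why keys are validated before use.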

Additionally, we will specify what a valid key is. We will use a regular expression and a decorator, which help protect against directory traversal attacks when using the filesystem implementation.

# Anchored at the end so the entire key must match, not just a prefix.
VALID_KEY = re.compile(r'[a-zA-Z0-9_-]{1,255}\Z')

def is_valid_key(key):
    """Checks to see if the parameter follows the allowed pattern of
    keys.
    """
    return VALID_KEY.match(key) is not None

def validate_key(fn):
    """Decorator for HTTP methods that validates if resource
    name is a valid database key. Used to protect against
    directory traversal.
    """
    def new(*args):
        # args[0] is self, args[1] is the resource name from the URL
        if not is_valid_key(args[1]):
            # Stop here: the wrapped method must not run on a bad key.
            return web.badrequest()
        return fn(*args)
    return new
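A subtlety when writing patterns like VALID_KEY: `re.match` only anchors at the start of the string, so without an end anchor the pattern accepts any key that merely begins with valid characters. A quick check in Python 3.4+ (which added `re.fullmatch`):

```python
import re

# Same character class as VALID_KEY, with and without an end anchor.
UNANCHORED = re.compile('[a-zA-Z0-9_-]{1,255}')
ANCHORED = re.compile(r'[a-zA-Z0-9_-]{1,255}\Z')
DOLLAR = re.compile('[a-zA-Z0-9_-]{1,255}$')

traversal = 'notes/../../etc/passwd'
print(UNANCHORED.match(traversal) is not None)      # True: only the prefix 'notes' is checked
print(ANCHORED.match(traversal) is not None)        # False: the whole key must match
print(UNANCHORED.fullmatch(traversal) is not None)  # False: fullmatch anchors both ends
print(DOLLAR.match('notes\n') is not None)          # True: '$' also matches before a trailing newline
print(ANCHORED.match('notes\n') is not None)        # False
```

`\Z` is the stricter choice because `$` also matches just before a trailing newline.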

Now we define an abstract class for the database, creating a common interface for the three implementations of the data store. This abstract class is where the REST goodness is.

We use four of the HTTP methods: GET, POST, PUT, and DELETE. When no key is specified, the GET method prints a list of all the keys in the database (in this version of web.py, whatever a handler prints becomes the response body). We decorate the methods that use the key to ensure that the key is safe. The PUT method generates a UUID and delegates to the POST method using that UUID as the key. In the POST method, we obtain the body of the HTTP request using "web.data()".

class AbstractDB(object):
    """Abstract database that handles the high-level HTTP primitives.
    """
    def GET(self, name):
        if len(name) <= 0:
            print '<html><body><b>Keys:</b><br />'
            for key in self.keys():
                print ''.join(['<a href="',str(key),'">',str(key),'</a><br />'])
            print '</body></html>'
        else:
            self.get_resource(name)

    @validate_key
    def POST(self, name):
        data = web.data()
        self.put_key(str(name), data)
        print str(name)

    @validate_key
    def DELETE(self, name):
        self.delete_key(str(name))

    def PUT(self, name=None):
        """Creates a new document with the request's data and
        generates a unique key for that document.
        """
        key = str(uuid.uuid4())
        self.POST(key)

    @validate_key
    def get_resource(self, name):
        result = self.get_key(str(name))
        if result is not None:
            print result
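Delegating PUT to the decorated POST works because a generated key always passes validation: `str(uuid.uuid4())` consists of 32 lowercase hex digits and four hyphens. A quick check, using an end-anchored copy of the key pattern:

```python
import re
import uuid

# End-anchored version of the key pattern used by validate_key.
VALID_KEY = re.compile(r'[a-zA-Z0-9_-]{1,255}\Z')

key = str(uuid.uuid4())  # a random key in the same format as the PUT example
print(len(key))                          # 36
print(VALID_KEY.match(key) is not None)  # True
```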

Finally, we create an implementation of AbstractDB, MemoryDB, that stores all the key-value pairs in a Python dictionary shared among all instances of MemoryDB (but lost between requests when run in CGI mode). If a requested key does not exist in the dictionary, we return a 404 Not Found error using "web.notfound()". web.py defines a few common HTTP errors in webapi.py, including:

  • web.badrequest() : 400 Bad Request error
  • web.notfound() : 404 Not Found error
  • web.gone() : 410 Gone error
  • web.internalerror() : 500 Internal Server Error

class MemoryDB(AbstractDB):
    """In memory storage engine.  Lacks persistence."""
    database = {}
    def get_key(self, key):
        try:
            return self.database[key]
        except KeyError:
            web.notfound()

    def put_key(self, key, data):
        self.database[key] = data

    def delete_key(self, key):
        try:
            del(self.database[key])
        except KeyError:
            web.notfound()

    def keys(self):
        return self.database.iterkeys()
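The shelve-backed engine in docstore.py is not shown here. As a rough idea of what such an engine involves, here is a hypothetical Python 3 sketch with the same four storage methods as MemoryDB. `ShelveStore` and its behavior on missing keys (returning None or False instead of calling `web.notfound()`) are assumptions for this sketch, not docstore's code.

```python
import os
import shelve
import tempfile


class ShelveStore(object):
    """Hypothetical shelve-backed store with MemoryDB's four methods.

    Framework-free: a missing key returns None / False so the caller
    decides which HTTP status to send.
    """

    def __init__(self, path):
        self.path = path

    def get_key(self, key):
        with shelve.open(self.path) as db:
            return db.get(key)

    def put_key(self, key, data):
        with shelve.open(self.path) as db:
            db[key] = data

    def delete_key(self, key):
        with shelve.open(self.path) as db:
            if key not in db:
                return False
            del db[key]
            return True

    def keys(self):
        with shelve.open(self.path) as db:
            return sorted(db.keys())


path = os.path.join(tempfile.mkdtemp(), 'docstore')
store = ShelveStore(path)
store.put_key('12345', b'hello')
print(store.get_key('12345'))     # b'hello'
print(store.keys())               # ['12345']
print(store.delete_key('12345'))  # True
print(store.get_key('12345'))     # None
```

Because the data lives on disk, it survives between requests, unlike MemoryDB under CGI. Note that shelve does not support concurrent read/write access, so a multi-process server would need locking.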

Testing it out

In one command window, run the server:

$ python docstore.py
http://0.0.0.0:8080/

Then, assuming you have httplib2 installed, open an interactive Python session:

$ python
Python 2.5.2 (r252:60911, Jul 31 2008, 17:28:52)
[GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import httplib2
>>> h = httplib2.Http()

Let's store a message with a key of '12345' and then get it back:

>>> h.request('http://localhost:8080/memory/12345','POST','hello')
({'transfer-encoding': 'chunked', 'date': 'Sun, 21 Sep 2008 00:17:08 GMT', 'status': '200', 'server': 'CherryPy/3.0.1'}, '12345\n')
>>> h.request('http://localhost:8080/memory/12345','GET')
({'transfer-encoding': 'chunked', 'date': 'Sun, 21 Sep 2008 00:17:38 GMT', 'status': '200', 'content-location': 'http://localhost:8080/memory/12345', 'server': 'CherryPy/3.0.1'}, 'hello\n')
(Screenshot of key 12345 in the browser: http://cdn.johnpaulett.com/upload/webpy-datastore12345.png)

Now let's delete the object at key 12345. If we then try to retrieve key 12345, we get a 404 error:

>>> h.request('http://localhost:8080/memory/12345','DELETE')
({'transfer-encoding': 'chunked', 'date': 'Sun, 21 Sep 2008 00:18:16 GMT', 'status': '200', 'server': 'CherryPy/3.0.1'}, '')
>>> h.request('http://localhost:8080/memory/12345','GET')
({'transfer-encoding': 'chunked', 'date': 'Sun, 21 Sep 2008 00:18:20 GMT', 'status': '404', 'content-type': 'text/html', 'server': 'CherryPy/3.0.1'}, 'not found')

PUT will generate a UUID:

>>> h.request('http://localhost:8080/memory/','PUT','a new message')
({'transfer-encoding': 'chunked', 'date': 'Sun, 21 Sep 2008 00:19:40 GMT', 'status': '200', 'server': 'CherryPy/3.0.1'}, '4dc6a4ca-ebeb-41ac-81b2-5c2764c0fba8\n')
>>> h.request('http://localhost:8080/memory/4dc6a4ca-ebeb-41ac-81b2-5c2764c0fba8','GET')
({'transfer-encoding': 'chunked', 'date': 'Sun, 21 Sep 2008 00:20:09 GMT', 'status': '200', 'content-location': 'http://localhost:8080/memory/4dc6a4ca-ebeb-41ac-81b2-5c2764c0fba8', 'server': 'CherryPy/3.0.1'}, 'a new message\n')

We can even store binary data in the database:

>>> f=open('exploits_of_a_mom.png','rb')
>>> h.request('http://localhost:8080/memory/johnnytables','POST',f.read())
...
>>> f.close()
(Screenshot of key johnnytables in the browser: http://cdn.johnpaulett.com/upload/webpy-johnnytables.png)

Let's look at the list of keys:

(Screenshot of the key listing: http://cdn.johnpaulett.com/upload/webpy-datastore.png)

Conclusion

As you have hopefully seen, web.py offers a very simple way to create a RESTful application. Take a look at the other AbstractDB implementations in docstore.py, too.

Also, check out RESTful Web Services by Leonard Richardson and Sam Ruby for a great description of building RESTful APIs.

New jsonpickle release

I just released jsonpickle 0.0.5, the Python-to-JSON object serializer. Check out the announcement.

Eclipse 3.4 (Ganymede) on Ubuntu

These instructions refer to an outdated version of Eclipse and Ubuntu. Please refer to the new instructions on installing Eclipse Galileo (3.5) on Ubuntu Jaunty (9.04).

Eclipse Ganymede (the successor to Europa) was released today. Ubuntu seems to have been stuck on Eclipse 3.2 since at least Feisty Fawn, so we are missing out on nice features (Mylyn, inline renames, etc.).

JDK

First things first: you need a JDK (Java SDK) in order to use Eclipse. I am a fan of OpenJDK, Sun's open source version of its JDK, which recently reached full Sun JDK compliance. Any JDK should work, though, as long as it is at least Java 5.

sudo apt-get install openjdk-6-jdk

Then update your ~/.bashrc file by appending a JAVA_HOME export (adjust the path if you use a different JDK):

export JAVA_HOME=/usr/lib/jvm/java-6-openjdk/

Get Ganymede

wget http://ftp.osuosl.org/pub/eclipse/technology/epp/downloads/release/ganymede/R/eclipse-java-ganymede-linux-gtk.tar.gz
tar xzvf eclipse-java-ganymede-linux-gtk.tar.gz
mv eclipse eclipse3.4

We should be ready to go:

eclipse3.4/eclipse

And your nice new Eclipse is up and running.

Suggested Plugins

Eclipse is great because it has so many plugins; I even use it as my default Python editor. If you go to Help > Software Updates, you will see a vastly improved update dialog (the previous one was painful). Here are my favorites:

Let me know what you think of the new Eclipse and if there are other plugins you just can not live without.

Update: As Scott points out in the comments, there is an open request on launchpad to include a more recent version of Eclipse in the Ubuntu repositories: https://bugs.edge.launchpad.net/ubuntu/+source/eclipse/+bug/123064

pymedia on Ubuntu Hardy Heron

Recently, I needed to use pymedia for some audio and video encoding. The problem, though, was that pymedia was nowhere to be found in the Ubuntu Hardy Heron package repository, and the only .deb installation candidate on the pymedia website was for an older version of pymedia and Python 2.4. Not wanting to run an old version, and needing Python 2.5, I had to compile the package myself, which, it turns out, is no easy task.

Step 1: Get pymedia

wget http://internap.dl.sourceforge.net/sourceforge/pymedia/pymedia-1.3.7.3.tar.gz
tar xzvf pymedia-1.3.7.3.tar.gz
cd pymedia-*

Step 2: Get the pymedia dependencies

As noted here.

sudo apt-get install python-dev libogg-dev libvorbis-dev liblame-dev libfaad-dev libasound2-dev python-pygame

Step 3: Get GCC 3.4

Note: pymedia will not compile with GCC 4.0.

sudo apt-get install gcc-3.4 g++-3.4
export CC=gcc-3.4

Step 4: Build/compile pymedia

python setup.py build

Step 5: Be a good Ubuntu user with checkinstall

Checkinstall is great because it installs the package as a .deb file.

sudo apt-get install checkinstall
sudo checkinstall python setup.py install

Note: If you want to be a bad Ubuntu user, you can run "sudo python setup.py install" instead of the checkinstall command.

Step 6: Try it out

python
>>> import pymedia

bminews.com launched

I am launching a social news site, bminews.com. It is a specialty site for practitioners in the fields of Bioinformatics and Medical Informatics. Join in! I hope to make it a site with great news on new developments, technologies, and opportunities in the field. Related topics (programming, consulting, career advice, math, scientific writing, conferences) are all encouraged.

jsonpickle

I have been working on an open source project, jsonpickle. The goal of the project is to serialize a Python object into standard JSON notation. Python can "pickle" objects into a special binary format, but sometimes it is nice to have a human-readable format, especially with projects like CouchDB that use a JSON-based API. jsonpickle is on its second release and can now officially handle Mark Pilgrim's Universal Feed Parser. Feel free to join in by finding bugs and working on the code! It is pretty easy to use:

>>> import feedparser, jsonpickle
>>> doc = feedparser.parse("http://feedparser.org/docs/examples/atom10.xml")
>>> pickled = jsonpickle.dumps(doc)
>>> unpickled = jsonpickle.loads(pickled)
>>> doc['feed']['title'] == unpickled['feed']['title']
True
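The basic idea, turning an object's attributes into a plain JSON object and back, can be illustrated with the standard library's json module. This is a hypothetical sketch, not jsonpickle's implementation or tag format: `Point`, the `__class__` tag, and `REGISTRY` are made up, and only classes listed in the registry are rebuilt.

```python
import json


class Point(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y


# Hypothetical registry of classes we are willing to rebuild from JSON.
REGISTRY = {'Point': Point}


def encode(obj):
    # Called by json.dumps for objects it cannot serialize natively.
    state = dict(obj.__dict__)
    state['__class__'] = type(obj).__name__
    return state


def decode(d):
    # Called by json.loads for every decoded JSON object (dict).
    cls = REGISTRY.get(d.get('__class__'))
    if cls is None:
        return d
    obj = cls.__new__(cls)
    obj.__dict__.update((k, v) for k, v in d.items() if k != '__class__')
    return obj


text = json.dumps(Point(1, 2), default=encode, sort_keys=True)
print(text)  # {"__class__": "Point", "x": 1, "y": 2}
restored = json.loads(text, object_hook=decode)
print(restored.x, restored.y)  # 1 2
```

Restricting reconstruction to a registry avoids creating arbitrary classes from untrusted JSON, which is one of the trade-offs any object serializer has to make.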

Building Python Packages from Source on Windows

I always forget how to build Python packages, such as psyco and simplejson, that include C/C++ code to be compiled. The usual error I get from running "python setup.py install" is:

error: Python was built with Visual Studio 2003; extensions must be
built with a compiler than can generate compatible binaries. Visual
Studio 2003 was not found on this system. If you have Cygwin
installed, you can try compiling with MingW32, by passing "-c
mingw32" to setup.py.

Now, I do not have Visual Studio 2003, but I do have mingw32. (Grab cygwin and, when selecting packages, make sure that mingw-runtime and gcc are selected.) Then, from the directory containing setup.py, execute:

python setup.py build_ext --compiler=mingw32 install

Hopefully that should solve any issues.

CPAN on Windows

To use Perl's CPAN (http://www.cpan.org) on Windows with cygwin, you need to install some additional programs in cygwin. Run cygwin's setup.exe (I like clicking the "View" button to change the listing to Full, so I get an alphabetical list of the packages) and make sure that you install the following packages:

  • perl (just in case you do not have it)
  • gzip
  • tar
  • unzip
  • make
  • lynx
  • wget
  • ncftp
  • gnupg

Open the Cygwin bash shell and enter:

perl -MCPAN -e shell

Accept the defaults, and you are good to go. Once in the CPAN shell, you can install modules with commands like:

install Date::Parse