Web-Based Annotation

December 19th, 2006

We intend to add annotation/commentary support to the open shakespeare web demo either in this release or the next. As a first step I’ve been looking to see what (open-source) web-based annotation systems are already out there. Below is a list of what I’ve been able to find so far (if you know of more please post a comment). After examining several of these in some detail, the one we’re going to try out properly is marginalia (if you’re interested, our current efforts to do this, including writing a python wsgi annotation service backend, can be found here in the subversion repository).

  1. stet: javascript annotation system used for gpl v3 comments system

  2. commentary: javascript based wsgi middleware developed by ian bicking

    • http://pythonpaste.org/commentary/
    • Rather hacked together (apparently he coded it in a week). Had problems getting it working locally and no documentation to help in adaptation. Seems to be unmaintained (demo site is currently down) which is perhaps not surprising given how many other projects Ian has on the go.
    • One nice feature is that you don’t seem to have to mess with the underlying web pages you want to add comments to (this only works if you are sitting on top of another wsgi application)
  3. marginalia: javascript library and spec for adding web annotation to pages

  4. annotea: W3C project based on RDF

    • http://www.w3.org/2001/Annotea/
    • Been around a long time and now seems to be inactive
    • Server and client support rather lacking. No simple interface based on, e.g., javascript — you have to write a special client yourself — which is a major drawback
    • That said the protocol is well-documented and so writing a client (or a server) shouldn’t be that hard (other than having to mess around with rdf in javascript …)
    • The Schema seems reasonable
    • xpointer based which according to the marginalia site is a problem

UPDATE (2008-06): a new version is available (v1.2): http://www.rufuspollock.org/2008/06/23/markdown2latex-mkdn2latex-12/

Over the last year I’ve written quite a few papers using markdown plus asciimathml. While this is great for web publication (and editing) and gives me lots of styling freedom via css, it doesn’t produce output that’s as nice as that produced by latex, especially in paginated form (latex’s mathematics support is also currently better than that of asciimathml or latexmathml).

Unable to find any python code that would do what I wanted, I played around for a couple of hours with the python-markdown script until I got something functional. After a few weeks of use, which has allowed me to iron out the bugs and make several improvements, I feel the script is now ready for public release. Hope people find it useful.

Download

Get it from: http://project.knowledgeforge.net/okftext/svn/trunk/python/mkdn2latex.py

(You can also check it out using subversion from the same url if you want)

For the script to function you will also need to install the python-markdown module v1.5 (make sure you install it under the name markdown.py).

Usage

The following will print the latex output to the console (standard out):

 $ mkdn2latex.py path-to-markdown-file.mkd

To convert a markdown file straight to a latex output file do:

 $ mkdn2latex.py path-to-markdown-file.mkd > path-to-output-file.ltx

NB: As provided the script expects mathematics in your markdown file to be delimited with ‘$\$’ (this should be dollar dollar — the slash is there to stop this being rendered as maths in the blog) as opposed to the standard asciimathml delimiters of ‘`’ or ‘$’.

Whenever I’ve had a few spare minutes over the last couple of months I’ve been hacking away on svnrepo, a pythonic API to local subversion repositories, and it is now robust enough to warrant a 0.1 release. svnrepo is (and was intended to be) very small, just a single module that wraps the python subversion bindings for repository access to make them simpler to use and more object-oriented. At present the module requires subversion >= 1.3 but I’m hoping to scale that dependency back in future releases.

Getting it

The module is Open Source software (MIT-licensed) and you can either:

  1. Download it directly from: http://www.rufuspollock.org/code/svnrepo/svnrepo.py

  2. Or get it from the python package index. If you are using setuptools just do:

    $ easy_install svnrepo

What it looks like

There are unit tests at:

http://www.rufuspollock.org/code/svnrepo/svnrepo_test.py

They are pretty good at demonstrating how to use the API, but for the sake of demonstration here is a short example. Assume that you have an existing subversion repository at REPOSPATH.


from svnrepo import *
REPOSPATH = ...
repos = Repository(REPOSPATH)

history = repos.history('/')
for revision in history:
    print revision

rev = repos.get_revision() # get the youngest revision
print rev.log # the log message of the revision
print rev.date

# get a node
rootdir = rev.get_node('/')
print rootdir.is_dir()
print rootdir.list_dir()

# create a new revision
newrev = repos.new_revision()
newrev.log = 'My new revision'
newrev.author = 'me'
fs = newrev.file_system
filepath = 'tmp.txt'
newfile = fs.make_file(filepath)

text = 'nothing ever exists entirely alone'
newfile.write(text)

propname = 'copyright'
propval = 'nemo'
newfile.set_property(propname, propval)

newrev.commit()

Having looked around for a while without success for something that would spit out csv files as ascii tables I decided to hack something together. The result is a small python script csv2ascii.py. It is currently fairly crude, for example it just truncates cell text which is too long, but I hope I’ll have some more time to improve it soon.

Example

Suppose you had the following in a file called example.csv:

"YEAR","PH","RPH","RPH_1","LN_RPH","LN_RPH_1","HH","LN_HH"
1971,7.8523,43.9168,42.9594,3.7822,3.7602,16185,9.691843
1972,10.5047,55.1134,43.9168370988587,4.0093,3.7822,16397,9.704855

Running:

 $ ./csv2ascii.py example.csv

Would result in:

+------+------+------+------+------+------+------+------+
| YEAR |  PH  | RPH  |RPH_1 |LN_RPH|LN_RPH|  HH  |LN_HH |
+------+------+------+------+------+------+------+------+
| 1971 |7.8523|43.916|42.959|3.7822|3.7602|16185 |9.6918|
+------+------+------+------+------+------+------+------+
| 1972 |10.504|55.113|43.916|4.0093|3.7822|16397 |9.7048|
+------+------+------+------+------+------+------+------+
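The truncate-and-centre behaviour shown above can be sketched in a few lines. This is only an illustration of the idea, not the code in csv2ascii.py: the function name csv_to_ascii and the width parameter are made up here, and it is written for a current Python 3 rather than the Python 2 of the original script.

```python
import csv
import io


def csv_to_ascii(text, width=6):
    """Render CSV text as an ASCII table with fixed-width cells.

    Hypothetical helper: cell text longer than `width` is simply
    truncated, shorter text is centred.
    """
    rows = list(csv.reader(io.StringIO(text)))
    ncols = max(len(row) for row in rows)
    # horizontal rule drawn above and below every row
    rule = "+" + "+".join("-" * width for _ in range(ncols)) + "+"
    lines = [rule]
    for row in rows:
        # pad short rows so every line has the same number of cells
        cells = row + [""] * (ncols - len(row))
        # strip stray whitespace, truncate, then centre each cell
        cooked = [cell.strip()[:width].center(width) for cell in cells]
        lines.append("|" + "|".join(cooked) + "|")
        lines.append(rule)
    return "\n".join(lines)


sample = '"YEAR","PH"\n1971,7.8523\n1972,10.5047\n'
table = csv_to_ascii(sample)
print(table)
```

A fixed width keeps the sketch short; a fuller version would probably size each column from its contents and wrap rather than truncate.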

The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is growing rapidly as the standard web protocol for making metadata, primarily bibliographic information, available online for programmatic access, and I’ve long meant to write something that would allow me to pull information down from remote repositories into my local bibliographic database automatically (it would save an awful lot of typing).

I’ve mentioned the oaipmh package provided by infrae.com before; however, the documentation they provide has become rather out of date and, though I’ve made a few attempts, I’ve never quite been able to get it to work. After a bit more effort recently with the newer v2.0+ of the package I’ve managed to get something basic working, which you can find at http://www.rufuspollock.org/code/oaipmh/demo.py.

I should note that my main interest, at least at present, is in the client-side, not the server-side of oaipmh so the code is oriented in that direction — as I mentioned above my aim is to automatically pull down article metadata into my local bibliographic system from sites such as repec (repec oai url).
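For anyone unfamiliar with the protocol, an OAI-PMH harvest is at bottom just an HTTP request with a verb parameter. The sketch below builds the URL for a ListRecords request using only the standard library. It does not use the infrae oaipmh package and is not taken from demo.py; the function name and the base url http://example.org/oai are placeholders, and it assumes a current Python 3.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build the URL for an OAI-PMH ListRecords request.

    Hypothetical helper: `base_url` is whatever endpoint the remote
    repository publishes. `oai_dc` is the Dublin Core format that
    every OAI-PMH repository is required to support.
    """
    params = [("verb", "ListRecords"), ("metadataPrefix", metadata_prefix)]
    if set_spec is not None:
        # optional: restrict the harvest to one set of the repository
        params.append(("set", set_spec))
    return base_url + "?" + urlencode(params)


# placeholder endpoint, not a real repository
url = list_records_url("http://example.org/oai")
print(url)
```

Fetching that url returns an XML document of records; a client library such as the oaipmh package takes care of parsing it and of following resumption tokens for large result sets.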

WSGI Middleware

September 28th, 2006

In a previous tutorial we wrote a basic ‘Hello World’ application in WSGI. At the end of it you might, rightly, have been wondering what the point of WSGI is: after all, you could have written that ‘Hello World’ app using plain CGI (or anything else for that matter). In this tutorial we are going to start answering that question by taking a look at WSGI middleware and writing a simple piece of middleware ourselves.

A Simple Example

Here is a simple piece of middleware that adds authentication based on the remote address of the client (this tutorial and its code are available in raw form at http://www.rufuspollock.org/code/wsgi/):


from wsgiref.simple_server import make_server, demo_app

class AuthenticationMiddleware:
    """A modified version of an original example at:
    http://isapi-wsgi.python-hosting.com/wiki/WSGI-Gateway-or-Glue
    """

    def __init__(self, app, allowed_addresses):
        """
        @param app: the WSGI app that comes after us
        @param allowed_addresses: list of remote addresses from which to allow
                                  access
        """
        self.app = app
        self.allowed_addresses = allowed_addresses

    def __call__(self, environ, start_response):
        """The standard WSGI interface"""
        addr = environ.get('REMOTE_ADDR','UNKNOWN') 

        if addr in self.allowed_addresses: # pass through to the next app
            return self.app(environ, start_response)
        else: # put up a response denied
            start_response(
                '403 Forbidden', [('Content-type', 'text/html')])
            return ['You are forbidden to view this resource']

addresses = [ '127.0.0.1' ]
simple_app_with_auth = AuthenticationMiddleware(demo_app, addresses)

if __name__ == '__main__': 

    httpd = make_server('', 8000, simple_app_with_auth)
    print "Serving HTTP on port 8000..."

    # Respond to requests until process is killed
    httpd.serve_forever()

The Basic Idea

As explained in [pep-333] the basic idea of middleware is of something that ‘plays both sides’:

Note that a single object may play the role of a server with respect to some application(s), while also acting as an application with respect to some server(s). Such “middleware” components can perform such functions as:

  • Routing a request to different application objects based on the target URL, after rewriting the environ accordingly.
  • Allowing multiple applications or frameworks to run side-by-side in the same process
  • Load balancing and remote processing, by forwarding requests and responses over a network
  • Perform content postprocessing, such as applying XSL stylesheets

A diagram helps:

             WSGI SERVER

               V   A
               V   A
               |   |
               |   |
      +---------------------+
      |        |   |        |
      |   +-------------+   |
      |   |    V   A    |   |
      |   |   +-----+   |   |
      |   |   | APP |   |   |
      |   |   +-----+   |   |
      |   | MIDDLEWARE1 |   |
      |   +-------------+   |
      |     MIDDLEWARE2     |
      +---------------------+

   The WSGI Application + Middleware 'Onion'

Basically middleware wraps an underlying wsgi application and then presents itself as the new wsgi application to external callers. In python code the above would look like:

core_app = SomeWsgiApplication()
# remember the middleware is itself a wsgi application
wrapped_once = Middleware1(core_app)
# wrap the new wsgi application!
wrapped_twice = Middleware2(wrapped_once)

# alternatively we could do it all in one
wrapped = Middleware2(Middleware1(core_app))
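To see the onion in action without starting a server, the wrapped application can be called by hand with a fake environ. The following is a minimal sketch for a current Python 3 (where response bodies are bytes). TagMiddleware, the 'demo.trail' environ key and the call helper are all invented for this illustration; only setup_testing_defaults comes from wsgiref.

```python
from wsgiref.util import setup_testing_defaults


def core_app(environ, start_response):
    """The innermost WSGI application."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"core"]


class TagMiddleware:
    """Hypothetical pre-processor: notes its name in environ, then passes on."""

    def __init__(self, app, name):
        self.app = app
        self.name = name

    def __call__(self, environ, start_response):
        environ.setdefault("demo.trail", []).append(self.name)
        return self.app(environ, start_response)


# two layers of the onion around the core application
wrapped = TagMiddleware(TagMiddleware(core_app, "inner"), "outer")


def call(app):
    """Invoke a WSGI app directly, playing the part of the server."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the keys WSGI requires
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], body, environ.get("demo.trail")


status, body, trail = call(wrapped)
print(status, body, trail)  # 200 OK b'core' ['outer', 'inner']
```

The trail shows the order of travel: the request reaches the outermost middleware first and works its way in to the core application.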

Remarks

Middleware is useful because it dramatically increases the possibilities for using standard web application plumbing — any piece of middleware can now be plugged together very easily with either other middleware or an application.

Middleware is usually one of three types:

  • pre-processors
  • post-processors
  • those that do both (rare)

Examples of pre-processors are:

  • Authenticators (including session management)
  • Dispatchers including proxies and controllers

Examples of post-processors:

In general, pre-processors are a little simpler because they don’t have to deal with the ‘chunking’ aspect of WSGI (a WSGI application returns an iterable rather than just a single buffer so as to allow ‘chunking’ of output; this is useful, for example, when streaming large files. See the ‘Buffering and Streaming’ section in PEP 333 for more information).
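To make the chunking point concrete, here is a sketch of a post-processor that transforms the output one chunk at a time instead of buffering the whole response. UppercaseMiddleware and chunked_app are made-up names for this illustration, and the code assumes a current Python 3, where chunks are bytes.

```python
from wsgiref.util import setup_testing_defaults


class UppercaseMiddleware:
    """Hypothetical post-processor: upper-cases each chunk as it streams by."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        result = self.app(environ, start_response)
        return self._transform(result)

    @staticmethod
    def _transform(result):
        try:
            for chunk in result:
                # ASCII upper-casing leaves the length unchanged, so any
                # Content-Length header set by the app stays valid
                yield chunk.upper()
        finally:
            # PEP 333 asks us to call close() on the wrapped iterable if it has one
            close = getattr(result, "close", None)
            if close is not None:
                close()


def chunked_app(environ, start_response):
    """A toy application that produces its body in two chunks."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    yield b"hello "
    yield b"world"


environ = {}
setup_testing_defaults(environ)
statuses = []


def start_response(status, headers, exc_info=None):
    statuses.append(status)


chunks = list(UppercaseMiddleware(chunked_app)(environ, start_response))
print(chunks)  # [b'HELLO ', b'WORLD']
```

A post-processor that changes the length of the body, or that needs to alter the headers, has more work to do because it must also intercept start_response; this sketch deliberately avoids that case.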

‘Hello World’ with WSGI

August 31st, 2006

I’ve been seeing a lot of talk about WSGI (Web Server Gateway Interface) and its benefits over the last six months or so and I’ve been meaning to take a look — not least because of the potential to use wsgi middleware to make a nice front-controller for KForge.

First Stop

A quick google takes me to: http://www.wsgi.org/wsgi. I’m looking to just write the proverbial ‘hello world’ app at this stage. Most of the references are a bit too high level (or complex) for me (though this one is an exception). So here I’m going to detail my experiences of familiarizing myself with wsgi by writing the classic ‘hello world’ app (if you are looking to do something more sophisticated with wsgi check out a toolkit such as paste, or pylons, the framework built on top of paste).

Hello World

1. Install wsgiref

wsgiref is the wsgi reference implementation that is now part of the python 2.5 standard library. If you are running a python version earlier than 2.5 you will want to do:

  $ sudo easy_install wsgiref

2. Get a web server

We’ll use the wsgiref simple server as detailed in the docs (if you want to use a ‘proper’ webserver see the section below on making your wsgi app available via fastcgi). Create a python module, simpletest.py say, and insert:

  from wsgiref.simple_server import make_server, demo_app

  httpd = make_server('', 8000, demo_app)
  print "Serving HTTP on port 8000..."

  # Respond to requests until process is killed
  httpd.serve_forever()

  # Alternative: serve one request, then exit
  ##httpd.handle_request()

3. Run it

Start the server:

  $ python simpletest.py

Then visit http://localhost:8000/

Bingo! We’ve got our first working wsgi app (demo_app should output ‘Hello world!’ followed by a list of variable values).

4. Make our own Hello World app

We haven’t yet written anything ourselves — we’re just using the demo_app bundled with wsgiref. So change simpletest.py to be:

  def simple_app(environ, start_response):
      """Simplest possible application object""" 
      status = '200 OK'
      response_headers = [('Content-type','text/plain')]
      start_response(status, response_headers)
      return ['My Own Hello World!\n']

  from wsgiref.simple_server import make_server

  httpd = make_server('', 8000, simple_app)
  print "Serving HTTP on port 8000..."

  # Respond to requests until process is killed
  httpd.serve_forever()

Run this and visit http://localhost:8000/ and you should see a blank page containing ‘My Own Hello World!’.

5. Using a Class

Finally for completeness here’s the same application but done as a class:

  class SimpleApp:
      """Produce the same output, but using a class
      """
      def __init__(self, environ, start_response):
          self.environ = environ
          self.start = start_response

      def __iter__(self):
          status = '200 OK'
          response_headers = [('Content-type','text/plain')]
          self.start(status, response_headers)
          yield 'My Own Hello World!\n'

  from wsgiref.simple_server import make_server

  # httpd = make_server('', 8000, simple_app)
  # the same but using a class
  httpd = make_server('', 8000, SimpleApp)

  print "Serving HTTP on port 8000..."

  # Respond to requests until process is killed
  httpd.serve_forever()
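It may not be obvious why passing the class itself (rather than an instance) works. The server calls the application as app(environ, start_response); with a class, that call is just instantiation, and the resulting instance is the iterable response. The sketch below shows this by playing the server's role by hand. It assumes a current Python 3 (so the body is bytes) and the recording start_response is made up for the demonstration.

```python
from wsgiref.util import setup_testing_defaults


class SimpleApp:
    """Class-based WSGI app: instantiating it is 'calling the application'."""

    def __init__(self, environ, start_response):
        self.environ = environ
        self.start = start_response

    def __iter__(self):
        # headers are only sent once the server starts iterating
        self.start("200 OK", [("Content-Type", "text/plain")])
        yield b"My Own Hello World!\n"


environ = {}
setup_testing_defaults(environ)  # fills in the keys WSGI requires
seen = []


def start_response(status, headers, exc_info=None):
    seen.append(status)


response = SimpleApp(environ, start_response)  # what the server does per request
seen_before_iteration = list(seen)             # nothing has been sent yet
body = b"".join(response)                      # iterating triggers start_response

print(seen_before_iteration, seen, body)
```

One consequence of this style is that a fresh object is created for every request, which gives a convenient place to keep per-request state.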

Serving a WSGI App via FastCGI

This section explains how to serve your WSGI app via FastCGI (other methods using scgi or even cgi take an almost identical approach).

1. Install a fastcgi interface to wsgi:

Use flup which provides a fastcgi and scgi interface to wsgi:

  $ sudo easy_install flup

2. Install a simple standalone fastcgi implementation:

  1. Download http://www.saddi.com/software/py-lib/py-lib/fcgi.py
  2. Install this somewhere you can import it as import fcgi

3. Attach your wsgi application to this fcgi server

Create a python file (server.fcgi) and paste in the following:

  #!/usr/bin/env python
  from myapplication import app # Assume app is your WSGI application object
  from fcgi import WSGIServer
  WSGIServer(app).run()

Now you can just point your webserver at this file (make sure you’ve configured it to handle .fcgi files using fastcgi) and your app is available via fastcgi.

References