
What Model-View-Controller really means

Model-View-Controller is an architectural pattern commonly used in software applications, which works like this:

  • The model provides a domain-specific representation of the data used by the application.
  • The view renders data to externally-usable formats, typically UI elements.
  • The controller accepts and handles foreign inputs (for example, user input), performs relevant operations on models and initiates a response.
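The three roles above can be sketched in a few lines of Python. This is only an illustration: the to-do list domain and the names TodoList, render_text and TodoController are made up for the example and do not come from any particular framework.

```python
from dataclasses import dataclass, field


@dataclass
class TodoList:
    """Model: a domain-specific representation of the data."""
    items: list = field(default_factory=list)

    def add(self, title: str) -> None:
        self.items.append(title)


def render_text(model: TodoList) -> str:
    """View: renders the model into an externally usable format (plain text)."""
    if not model.items:
        return "(nothing to do)"
    return "\n".join(f"{n}. {title}" for n, title in enumerate(model.items, start=1))


class TodoController:
    """Controller: accepts input, operates on the model, initiates a response."""

    def __init__(self, model: TodoList) -> None:
        self.model = model

    def handle(self, command: str) -> str:
        # Commands look like "add buy milk"; anything else just re-renders.
        verb, _, argument = command.partition(" ")
        if verb == "add" and argument.strip():
            self.model.add(argument.strip())
        return render_text(self.model)


if __name__ == "__main__":
    controller = TodoController(TodoList())
    controller.handle("add buy milk")
    print(controller.handle("add write blog post"))
```

Note that the view only reads the model and the controller is the only place that changes it, which is the kind of "rule" some frameworks enforce.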

If a recruiter asks you to explain MVC, you can recite these three points and ace the interview. But I think it kind of misses the whole point of MVC.

According to Trygve Reenskaug, who first described MVC in 1979: "The essential purpose of MVC is to bridge the gap between the human user's mental model and the digital model that exists in the computer...MVC was conceived as a general solution to the problem of users controlling a large and complex data set."

So MVC is as much about usability as it is about system architecture. Unlike other design patterns (such as those described by the Gang of Four), MVC is an 'outward looking' pattern that applies to an entire system. Its original purpose is to help users understand the workings of a system by providing a consistent mapping between the user's mental model and the domain or business logic.

It's hardly surprising then, that MVC has become such a popular pattern; it helps developers understand systems as much as it helps users.

There are about 1 million frameworks available that implement MVC along with related gadgetry: UI templating systems, object-relational mappers and so on. Some frameworks strictly enforce MVC's "rules", for example by prohibiting access to models from within the views. Whichever MVC system we use, we shouldn't lose sight of the original aim: to bridge the gap between the human user's mental model and the digital model. That is something no framework can do for you.

Update: at the behest of inn0 I encourage you to have a look at Trygve Reenskaug's article about the origins of MVC - it's a great read.


Data.gov.uk: Where's the Data?

The new data.gov.uk site has launched, opening up government data for reuse by companies and individuals. Sounds great!

  1. Government publishes data using "open standards, open source and open data"
  2. Geeks from across the whole country get to work analysing, cross-referencing and building cool applications
  3. Everyone wins

The problem is that it's still quite difficult to find really usable data. For example, take the first data set: 2008 Injury Road Traffic Collisions in Northern Ireland. What we actually get is a link to a landing page on the Police Service of Northern Ireland website, with links to everything from crime statistics to Workforce Composition Figures.

And it turns out they're all PDFs, a next-to-useless format for data processing.

Also, I have found many pages that are simply a placeholder for future data, such as the page on the Annual abstract of statistics which currently states "There is currently no text in this page."

Now I think it's a great start, and there are already some pretty cool apps available, but I think that data.gov.uk could do a better job of distinguishing between usable datasets and placeholders or PDF reports.


Blogging with PyBlosxom

This blog started a few weeks ago as a standard WordPress blog, and I quickly discovered I wanted something a bit more 'interesting' as a platform.

Sure, WordPress ticks all the boxes and is incredibly easy to use, but part of me wanted something a little more ... nerdy.

One of the things that I dislike about many blogging platforms is the very fact that they are web-based. It's not the web interface itself that annoys me so much as the workflow you are forced into. Rather than managing content through a WYSIWYG editor, I'd much rather edit my posts with my favourite text editor, on any computer, and manage everything remotely. I'd also like version control, so that I can see the history of all my posts and additions.

I first experimented with Blosxom, which is based on the simple idea of dropping text files into a directory and letting a single Perl script do the rest. However, I wanted something I could hack around with, and I knew that if I went the Perl route my blog would soon be abandoned. Wait! I'm not a Perl hater, I just wanted something a bit more ... pythonic :)

Enter PyBlosxom. It is practically the same, but written in Python. PyBlosxom doesn't seem to have as many users or plugins, but it does a good job all the same. I quickly ported the Carrington theme from WordPress and set about configuring it.

I wrote a little script called genblog.sh that generates the whole site statically, and manages all the css and other static files. I can edit my blog posts wherever I am and simply commit them into svn. A simple "deployment" script is all it takes to update the site:

Additionally, PyBlosxom allows you to use different configurations by passing a command-line option to the pyblosxom.py script. This way I am able to generate a 'preview' of new posts before I publish them, for example:

You'll notice that the whole site is plain vanilla HTML. To be honest, Disqus is much better than anything I could host locally myself, so there was no need to run PyBlosxom as a CGI script, but that is also an option.

I think it would be nice to see more development of things like PyBlosxom. Some people use GitHub as a blog for the same reasons - a weblog with a hacker-friendly workflow. What do you use?


Web-scraping and Geo-locating Ticket Restaurants

For a number of years I have been a user of various restaurant ticket schemes. These are 'cheques' given out by employers to allow people to buy lunch in a participating restaurant. These schemes are quite popular in Spain and there are three major systems in use (along with others): Sodexo Pass, Ticket Restaurant and Cheque Gourmet.

One annoying problem is the difficulty of finding restaurants that participate in any of these schemes. It would be nice to have the different restaurants marked on a map, but each service offers a fairly low-quality restaurant finder. In this web-scraping example, our aim is to scrape as much restaurant information as we can from different sources, and compile it all into a MySQL database. Useful information might include the name of the restaurant or bar, its address, phone number and geographical coordinates. Let's go!

NB: all the code for this exercise is available in my public SVN repository:
http://svn.happy.cat/public/restaurants/trunk/
You will need to install some dependencies: see install-deps.sh

The basic strategy involves 3 steps:

  1. Download raw data from public website
  2. Extract meaningful information
  3. Merge and load into usable data store

To do the downloading we use simple shell scripts and wget; for example:

Easy peasy. In the Mobiticket case (a WAP service offered by Ticket Restaurant) we need two steps: first we download an index, and then we download the individual restaurant pages one by one. To avoid clobbering these services, we put a delay between each HTTP request.
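The same idea of a polite, delayed download can be sketched in Python. To be clear, this is not the wget script from the repository, just an illustration of the technique: the function names, the page-NNNNN.html file naming and the 2-second default delay are all assumptions.

```python
import time
import urllib.request
from pathlib import Path


def fetch(url: str) -> bytes:
    """Download one URL and return the raw response body."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def download_all(urls, out_dir, delay_seconds=2.0, fetch=fetch, sleep=time.sleep):
    """Save each URL into out_dir, pausing after every request.

    Files that already exist are skipped, so an interrupted run can resume.
    The delay and the file naming scheme are arbitrary choices for this sketch.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    saved = []
    for index, url in enumerate(urls):
        target = out_path / f"page-{index:05d}.html"
        if not target.exists():
            target.write_bytes(fetch(url))
            sleep(delay_seconds)  # be polite: never hammer the remote service
        saved.append(target)
    return saved
```

For a two-step download like the Mobiticket one, you would call download_all once for the index, extract the restaurant URLs from it, and then call it again with that list.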

Now comes the tricky part. To extract meaningful data from the HTML or WML that we downloaded, we resort to Python and some simple regular expressions. We could probably have used BeautifulSoup, or some other kind of XML processing along with XPath. I throw my hands up - I have no defence. In fact, if you are really getting serious about scraping, have a look at Scrapy.
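To give a feel for the regular-expression approach, here is a minimal sketch. The markup in SAMPLE_HTML is hypothetical (the real pages are structured differently), and the pattern and function names are invented for the example rather than taken from the repository scripts.

```python
import re

# Hypothetical markup: the real pages are structured differently.
SAMPLE_HTML = """
<div class="restaurant"><h3>Bar Pepe</h3><p class="addr">Calle Mayor 1, Madrid</p><p class="tel">910 000 001</p></div>
<div class="restaurant"><h3>Casa Lola</h3><p class="addr">Gran Via 22, Madrid</p><p class="tel">910 000 002</p></div>
"""

RESTAURANT_RE = re.compile(
    r'<div class="restaurant">\s*'
    r'<h3>(?P<name>.*?)</h3>\s*'
    r'<p class="addr">(?P<address>.*?)</p>\s*'
    r'<p class="tel">(?P<phone>.*?)</p>',
    re.DOTALL,
)


def clean(value: str) -> str:
    """Collapse whitespace so a field can never contain a tab or newline."""
    return " ".join(value.split())


def extract_rows(html: str):
    """Yield (name, address, phone) tuples found in the page."""
    for match in RESTAURANT_RE.finditer(html):
        yield tuple(clean(match.group(key)) for key in ("name", "address", "phone"))


def to_tab(rows) -> str:
    """Format rows as tab-separated lines, one restaurant per line."""
    return "\n".join("\t".join(row) for row in rows)


if __name__ == "__main__":
    print(to_tab(extract_rows(SAMPLE_HTML)))
```

Regular expressions like this break as soon as the markup changes, which is exactly why a real parser is the more defensible choice.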

Anyway, the files *-extract-to-tab.py simply churn through the data and spit out tabular data. For example:

So we run our scripts like this:

And we should have three tabular files which we can import into our database.

At this stage we will need to set up a database table to store the data and create a config file. See the example config and set up the table with the restaurants.sql script.
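As a rough sketch of the loading step, the snippet below creates a table and imports a tab-separated file. Two loud assumptions: it uses Python's built-in sqlite3 as a stand-in for MySQL so that it runs anywhere, and the column layout is a guess (the real schema lives in restaurants.sql and may differ).

```python
import csv
import io
import sqlite3

# Assumed schema; the real one lives in restaurants.sql and may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS restaurants (
    id INTEGER PRIMARY KEY,
    source TEXT NOT NULL,
    name TEXT NOT NULL,
    address TEXT,
    phone TEXT,
    lat REAL,
    lng REAL
)
"""


def load_tab(conn, source, tab_file):
    """Insert every row of a tab-separated file, tagged with its source.

    Each non-empty line is expected to hold exactly name, address and phone.
    """
    reader = csv.reader(tab_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = [(source, *row) for row in reader if row]
    conn.executemany(
        "INSERT INTO restaurants (source, name, address, phone) VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    sample = io.StringIO("Bar Pepe\tCalle Mayor 1, Madrid\t910 000 001\n")
    print(load_tab(conn, "sodexo", sample), "row(s) loaded")
```

The lat and lng columns are left empty on import, so the restaurants that still need geo-locating are simply the rows where they are NULL.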

That should give us a nice big table of some 66,725 restaurants. Olé!

But wouldn't it be nice to be able to put these on a map? Fortunately, we've been able to pinpoint around 10,000 restaurants because Mobiticket included a map in their application, so we could scrape the latitude and longitude. But this accounts for less than 18% of the restaurants we know about. However, Google provides a geocoding service which we can use to look up the coordinates for a given street address.

To use this service you need an API key which you should put in your config.py. Then run:

and go and have a cup of tea. In fact you'll have to wait quite a while, because your database contains around 45,000 restaurants that need to be geo-located, and Google places a limit of 15,000 geocode requests in a 24-hour period from a single IP address. A day has 86,400 seconds, so staying under that limit means leaving at least 5.76 seconds between requests. For this reason we put a 6-second delay between each API request, which works out at a little over three days for the whole lot.
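The arithmetic behind the delay, and the throttled loop itself, can be sketched like this. The geocode argument is a hypothetical callable that takes an address and returns (lat, lng) or None; it stands in for whatever actually talks to the geocoding service, which is deliberately not shown here.

```python
import time

SECONDS_PER_DAY = 24 * 60 * 60


def min_delay(daily_limit: int) -> float:
    """Smallest pause between requests that stays within a daily quota."""
    return SECONDS_PER_DAY / daily_limit


def geocode_all(addresses, geocode, delay_seconds, sleep=time.sleep):
    """Call geocode(address) for each address, pausing between requests.

    geocode is a hypothetical callable returning (lat, lng) or None.
    """
    results = {}
    for index, address in enumerate(addresses):
        if index > 0:
            sleep(delay_seconds)  # no need to wait before the first request
        results[address] = geocode(address)
    return results


if __name__ == "__main__":
    # 15,000 requests per day allows roughly one request every 5.76 seconds.
    print(min_delay(15000))
    # At one request every 6 seconds, 45,000 addresses take about 3 days.
    print(45000 * 6 / SECONDS_PER_DAY)
```

In practice you would also want to write each result to the database as it arrives, so that a crash on day two doesn't cost you day one.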

Anyway, I hope you enjoyed this little detour into web scraping. As you can see, the basic pattern is:

  • download data from various sources
  • process and normalize
  • collate into a central store

Like I mentioned, there are frameworks like Scrapy that make it easy; personally I like to split the work into various smaller steps, so that I can introduce 'save points'. In Scrapy remember to do this in the item pipeline. There's a great tutorial to get you going.
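For what it's worth, here is one way a 'save point' could look in plain Python (this is not Scrapy code). The helper name save_point and the use of JSON files are assumptions made for the sketch: each stage writes its output to disk, and is skipped on the next run if that output already exists.

```python
import json
from pathlib import Path


def save_point(path, produce):
    """Return the data stored at path, computing and saving it only if missing."""
    path = Path(path)
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    data = produce()
    path.write_text(json.dumps(data), encoding="utf-8")
    return data


if __name__ == "__main__":
    # Stand-in stages; a real run would download and parse pages here.
    raw = save_point("raw.json", lambda: ["  Bar Pepe ", "Casa Lola  "])
    names = save_point("names.json", lambda: [name.strip() for name in raw])
    print(names)
```

Deleting one of the intermediate files forces just that stage (and anything you delete after it) to run again, which is handy when only the parsing step has changed.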