Skip to content

-->

Bft: Use MySQL to store Hierarchical Schemaless Data

Last year Brett Taylor wrote an interesting article about how FriendFeed uses MySQL to store schema-less data.

Our datastore stores schema-less bags of properties (e.g., JSON objects or Python dictionaries). The only required property of stored entities is id, a 16-byte UUID. The rest of the entity is opaque as far as the datastore is concerned. We can change the "schema" simply by storing new properties.

We index data in these entities by storing indexes in separate MySQL tables. If we want to index three properties in each entity, we will have three MySQL tables - one for each index. If we want to stop using an index, we stop writing to that table from our code and, optionally, drop the table from MySQL. If we want a new index, we make a new MySQL table for that index and run a process to asynchronously populate the index without disrupting our live service.

This seemed quite cool, so I've been working on a little weekend project that implements something similar. It's called Bft, as in "Big Fat Table". If you're interested, the code is in my public SVN repository:

http://svn.happy.cat/public/bft/Zend/lib/Bft

Features

  • Supports sharded databases through simple INI configuration.
  • Entities are stored as JSON encoded arrays of data along with a UUID.
  • Implemented as a Zend plugin using Zend_Db.
  • Supports ad-hoc indexes called 'collections'. Each collection can be related to a parent entity, and are therefore heirarchical.
  • Management of shards by implementing your own ConnectionManager.

Bonus: ReSTful API

Given that indexes can be created on-the-fly, I made a simple ReST API to create, modify, retrieve and delete entities and collections. See documentation below.

Usage

Create a new instance of Bft

Alternatively, you can use your own connection manager to control how entities and collections map to database shards.

Create a new entity

Print the UUID of the contact

Create a collection, and add the entity

List the entities in a collection:

List the IDs of entities in a collection

Check if a collection contains a particular entity

Using the API

Installation:

  1. Create a new host, and point the DocumentRoot at Api/public
  2. Create config file Api/config.ini (see Api/config.example.ini)
  3. That's it.

URLs contain collection names, starting with a dash ('-') or entity IDs (UUIDs). Here are some examples using Daniel Stenberg's curl utility.

  1. Create an entity within the collection '-contacts'
  2. Retrieve the contact
  3. Add a contact to the group "-contacts" AND the sub-group '-friends':
  4. List all contacts
  5. List only friends
  6. Update friend details
  7. Delete Robert from the friends list
  8. Note that Robert is still stored in the contacts list. To delete completely, you must delete him from the root list (contacts)

Tests do exist but the coverage is low at the moment and the code is slightly raw. Please send in bugs and patches to adam at happy dot cat!


blog comments powered by Disqus