Tuesday 18 June 2013

Lost in Database

We have a problem Houston!  I want to find a record inside my database, but I'm not exactly sure what that record is. The record I'm looking for might be plain text inside a table field, but it could also be in some other format eg. a spreadsheet. I'm afraid the best I can do to begin my search is to provide you with a two consecutive letters eg. "sa".  From here we will narrow the search iteratively.

Faced with a problem like this most database systems can't cope.  That's because all database types, regardless of their design, trade-off searchability vs maintainability.  Let's look at this question for the three main database types.

Navigational (hierarchical and network)

Navigational databases comprise linked lists written to disk.  They generally have a root entry point and from here searches are conducted down branches and a list of addresses is returned that meet the criteria.  Used with indexes, searching navigational databases is very fast.  Their major downfall is that they have no pre-defined structure, so cannot be searched using a computationally complete query language.  Instead a domain specific query language must be created for each database.

Object-oriented

Object-oriented databases are really navigational databases dressed in sheep's clothing. Their main point of difference from navigational databases, is that they can store any type of data because the instructions required to interpret that data are stored in the same object as the data. Consequently, object-oriented databases are very good at storing multi-media files.  However, just like navigational databases, object-oriented databases cannot be searched using a computationally complete query language. They also grow very large quickly as the instructions required to interpret their data is replicated unnecessarily many times over in each object instance.

Relational

Relational databases use a quite different approach to data storage than navigational and object-oriented databases.  Instead of linked lists, relational databases use a collection of tables. The tables are not connected together directly as a linked list is.  Instead, the linkage between tables is an indirect one, where the id field of one table is stored as a field in another table and referred to as a foreign key.  The advantage that relational databases have over the other databases is that the query language used to join these tables (relations) together is computationally complete. That's why it's called SQL (STRUCTURED query language) folks! SQL is not a domain specific. It can be used on any relational database, regardless of it's design. Unfortunately for us, the table structure found in relational databases is not suited to binary data such as our spreadsheet, since SQL is optimised to traverse tables with fixed length fields.

Summary

While navigational databases and object-oriented databases can return search results quickly, to do so they use domain specific query languages.  In other words they trade-off maintainability for searchability.  Conversely, relational databases are slower, but anyone can query them using SQL, their universal query language.  Having SQL is a real bonus for maintainability, but it comes at the cost of not being suitable for manipulating binary data.

Solution

How does any of this help me to find my lost record?  Neither navigational databases nor relational databases are very promising. At face value, object-oriented databases look like they'll do the job, providing they have the correct code available to search through the spreadsheets stored inside them. The truth is they don't. It can be done, but we are back to that maintainability problem again.

BUT, through a happy coincidence with Classmaker, we get the best of both worlds, because when I implemented Classmaker I chose to structure the data hierarchically inside a relational database.  Most planning packages use tags to aggregate similar records together. This produces a similar effect to a network database where there are multiple routes to the same record.  I chose instead, to arrange my records hierarchically with Long Term plans above Unit plans above Lesson plans and Binary file attachments. This hierarchical structure means that while Classmaker can't slice and dice it's database like one with tags could, it was not a hard job to EXPORT the entire database structure to disk, because the directory structure on disk is hierarchical too. There's a further problem with tags. If you don't attach the correct tags to your records, you're NEVER going to locate them!

Windows Search

Windows Search is an interesting facility.  It comes with the IFilter plug-in. This is an API that lets third party binary file formats expose themselves to Windows Search for indexing. Using Windows Search we can look for all spreadsheets, or in fact, any file that contains the phrase "sa".

Conclusion

Classmaker uses Windows Search indirectly to let you examine every record in it's database whether plain text  or binary format for the phrase "sa". Simply export all Classmaker's data to your local Documents directory.  Windows Search automatically indexes all files written there.

I have used Windows Search many times since I discovered it.  It's great knowing that no matter what I load into Classmaker, I will always be able to retrieve it again.  No more lost records.  No more calls to Houston!