Sunday 17 April 2022

Storing Your Code in a Database 2

Now that I've been working on my Web application for a while I've realised a few things. 

  • I've essentially created an Access database, but using a Web front-end. In Access the code is divided into collections e.g. forms, queries, reports etc. When you create an application you create the individual components and then stitch them all together. I'm doing the same thing, but have yet to determine the most effective collections to have.
  • I'm using my own traditional PC app (Classmaker) as my IDE. This is much faster to use than a Web IDE. It also means I can keep the same Web page up and simply refresh it to see my changes. I could code directly against a Cloud based code database using SSH tunneling as Classmaker uses Isectd to communicate with it's database. 
  • Because I'm using a code database via Classmaker, its relatively easy to move code components around and I'm doing that a lot at the moment. I think once the app is mature, I will have a number of collections each containing scores of components that I call from a few Web forms. The Web forms themselves probably won't have much code in them.
  • Rolling back changes to code is easy to do in a database. I just have a flag attached to the record. To make changes in a live environment I would duplicate the record, but with the flag set. If the flag is set that record is not selected for parsing.
  • I'm using FORTH as my application language. It's my own implementation written on top of Python and it's optimised for working with disconnected recordsets and strings. FORTH is a tricky language which requires you to be in the "zone" to be productive. Because I'm architecting my development environment as I go (eating my own dog food!), not a lot of progress towards the app is happening yet. I'm hopeful the gains on my investment will occur later.
  • FORTH does have several advantages though:
    • It has no syntax. This is nice. I can just write the code and format it any way I want to.
    • My FORTH doesn't have local variables (a discussion about the need for local variables deserves it's own blog post 😏), just the stack and global variables. This means it is a great language, maybe the best there is... for code refactoring. Combine this with a database and it's a simple task to bundle some repetitive code into a word and shift it to it's own record.
    • FORTH is fast. You wouldn't think so, being an interpreted language written on top of an interpreted language, but...
      •  The language core (both Python and FORTH) is compiled into pcode (assembly language for the Python virtual machine) and stored in memory, when the web server is started, so when you call a core compiled FORTH word you're calling a memory resident subroutine. The best analogy for what I'm doing is Apache's modPy, except I'm using Isectd to do it, not Apache and it's modFORTH rather than modPy. Apache thinks it's calling a CGI program, so I could use any CGI web server. The result would be the same.
      • In FORTH, code is compiled/executed in a single pass. The word list is examined and if a subroutine is to be compiled that happens and the compiled code branch memory address is inserted into the dictionary. Further down the word list if that word appears the dictionary is looked up and the compiled word is executed. Whenever I branch to a database record it's word list is inserted into the trunk word list at that point and execution continues, so the database code is parsed only as it is called. JIT compilation/execution? Whatever you call it, no code is prematurely compiled apart from the language core.
  • When using C++ or even Python it can be very difficult to work out just where an error has come from and what it is. In a database this can be narrowed down to a specific record. The smaller the record the less code to go through. With FORTH it's easier as you can locate in the word list the exact word the code failed on. This means that syntax errors can be immediately identified. Logic errors are more tricky, since they may originate earlier in the code base. I've found that most of the Python errors that are raised from a FORTH logic error are meaningless. It gets worse! If you've created a word that comprises many other words and that word should fail, you know it failed, but you don't know which subword caused the problem, because the subwords were compiled much earlier in the program. The lesson I'm learning is.. don't prematurely refactor. At least you can dump the contents of the stack (essentially all your local variables) at any point in your code to see what's going on. I tend to work back from the bug, inserting dump statements at random points in the code.

Monday 11 April 2022

Storing Your Code in a Database 1

I've always been concerned about how to handle different versions of the same code for different customers. I've seen several different approaches over the years. 

The one that seemed to work best involved code stored in a Pick database. At my first job, we used Advanced Revelation, a DOS version of Pick. The Pick system mixed code files written in Pick Basic and multi-value data files together in a hashed database. We used to take a global source code file and append an asterisk and the customers name to the end of it when a customisation was required e.g. accounts*customerA, accounts*customerB etc. Code management modules were written that compiled accounts*customerA into the global accounts executable and then bundled all the executable records together into an archive that the customer extracted onto their system. Another department at that same work place were developing a C++ app, storing the code in standard text files. They got into all sorts of trouble handling variations, mainly because, I think, when developing using text files, to avoid clutter you tend to let the text files grow too large and then it becomes very difficult to refactor them. 

I've seen a FoxPro app where most of the app is standard, but you were able to request minor modifications e.g. customised reports. How this was handled at the vendor end, I'm not sure, but I did notice fields in the database that my customer didn't use.

Another approach is to have a single code base, but include an ini file that let's different customers switch on and off the functionality they need. Ini files used like this can grow to be very large e.g. Apache Web Server. I think the ini file approach is going to become unworkable as the code base becomes increasingly convoluted trying to account for every possibility.

I have no experience doing this, but you could build a conventional app using text files and then store these inside a versioning database e.g. Fossil

Today we have hosted web apps. I don't think much has changed except that you no longer have the upgrade pain that came with distributed apps. I'm guessing the standard approach is to build an app in a CMS and to associate code files and database records with registered users. So you end up with one huge CMS app and one large relational database. The database is a concern, because over time it is going to end up with lots of redundant fields from customisations. You must be passing some kind of unique token between the server and the client to ensure that the client can't trespass onto someone else's data and if that token gets hacked, the hacker might be able to gain access to your entire customer base! To avoid that database engineers resort to using GUIDs to identify records, so it is virtually impossible to pull up a database record using a random key.

I'm working on a different approach.

  • Each customer has their own database for data. Consequently you can keep it simple with integer id's and integer foreign keys instead of a mess of GUIDs which as well as being confusing massively impact on performance as a index key. The database server manages multiple small databases.
  • Their code, though, resides in a SINGLE database for code. The code database is organised like a library with shelves, books and chapters (pages exist too but these are called indirectly). A shelf is an entire application which comprises many books, but the customer directly accesses just four books, a private book (administration), a public book (customer web pages that can be used by anyone), a protected book (customer web pages that can only be accessed by their registered users) and a global book. The customer books contain only the chapters which are customised for them. Every time a client requests a web page, the customer books are browsed first. If the book's chapter is missing then the chapter is sourced from the global book instead.
  • It's impossible for a customer to request another customer's book because to read their book you have to be standing right in front of it. This is pushing the analogy a bit far! I have just ONE very simple cgi file (all it does is collect cgi parameters and cookies, sends them to the code database and returns the reply to the web browser) that calls the code database, but there are MANY COPIES of that file in a shallow directory tree. Each copy has it's own ini file which specifies the name of the book it can access and which database. None of that information passes over the Internet. Access to the cgi files is password protected where necessary (using SSL encrypted basic authentication).