FORTH is a wonderfully simple and compact programming language. Take a look at Rosetta Code Language Comparison. Nearly every column on their table for FORTH is either... N/A, None or No! I laughed at the column about the language being standardised (ANSI, ISO/IEC 15145:1997). Nothing could be further from the truth. At the beginning, a vanilla FORTH doesn't even have variables! You make them yourself by coding the word var (or whatever you think a variable should be called) using
: var create 1 allot ; which is a compile time word
every time the word var is invoked at run-time, create looks at the next word in the word list, let's call it counter for this example, assigns it a dictionary key, saves the current heap slot along with the instruction to push the heap slot to the data stack as a dictionary value and finally, advances to a new slot on the heap. So next time you call the word counter (not var. It's done what it needed to do), FORTH looks up the dictionary, and pushes it's heap slot to the data stack. Now you can retrieve the contents of the heap slot, using the built-in word @ or write a new value to the heap slot using the built-in word !
You're probably lost already, but this isn't a lesson in how to use FORTH. I cut my teeth on FORTH written in Python and highly recommend it. Instead I want to posit my pearl of FORTH. Somebody said somewhere that the pearl of Forth is it's create does> word pair, which in brief, let's you pair a compile time action (create) with a run time action (does>). This word pairing let's a user create their own language on top of FORTH. In the past people have created Object Oriented FORTHS and even BASIC. However, this is just syntactic sugar. I've discovered quite early in my FORTH journey that it's better to stay as close to the machine (Python virtual machine in my case) as you can.
No... for me the pearl of FORTH is that anybody can write their own FORTH and consequently there are hundreds of them out there. Few of them follow the FORTH standard.
The reason you can write your own FORTH is due to it's very simple compile / interpret loop. FORTH doesn't have a compiler in the standard sense. Instead it has compiling words like var above. In FORTH there is no syntax, just a stream of white space separated tokens called words which are consumed in a single pass. The compile / interpret loop works as follows:
1. We begin in interpret mode. If the word presented is a : we enter compile mode. Otherwise consume the word and execute it immediately.
2. If we are in compile mode, consume each word, compile it, but don't execute it. Instead once ; is reached, store the compiled pcode in the dictionary for future execution, when the word we have just made (var above) is called again.
3. Go back to 1.
A number of implications flow from this simple pcode compiler:
Compiled words can be defined that contain words that haven't been defined yet, as long as all words have been defined by execution time.
Recursion is possible.
All of FORTH's built-in tokens including the compiler control tokens : and ; can be renamed by storing them as constants.
- the code base is very malleable (the refactoring everybody talks about) and it is easy to change things so that you can re-use code e.g. printing a table is commonplace so I have standard table printing code where I redefine what can happen before, in and after a table cell. Now it's easy to print out HTML, tab delimited format, CSV or whatever.
- no types. If you want a type you have to explicitly cast it. Using HTML its all strings, so most of the time I don't want to be fooling around with types anyway. The only ones I've implemented so far are str and int. The other day I wanted a currency type. I found the appropriate Javascript and dropped my number between it. A quick and dirty solution that I wouldn't want to use for a table with hundreds of values, but formatting variables by surrounding them with FORTH string constants is standard practice for me.
- avoid local variables because they make it difficult to refactor. A web app is a collection of independent pages, so the chances of global variables standing on each other is reduced. I have a number of predefined global variables which I use over and over. Then I came across the concept of shadowing. A FORTH variable is really an array of one (1 allot).
var counter
1 allot
You now have a variable with two slots. To refer to the first slot you simply call it by its name:
counter
counter 1 +
gets you the second slot.
This shadowing concept means you can store a backup of your variable in the same variable name.
The Weaknesses
- being able to have local variables would be nice occasionally 😀. Stack twiddling is not useful work. My solution has been to turn the stack into a deque and to use the bottom of the stack to store one or two local variables. Many FORTHS use the top of the control stack to store a local variable. I haven't tried that, preferring the deque concept instead because you don't have to monitor it so closely. If I'm iterating through a table there's going to be a lot of values on the stack, so there's no chance the bottom value will be accessed from above. I've added a bottom peek instruction which works well, but would be a disaster at the top, since FORTH words almost without exception pop from and push to the stack.
- it's very easy to write inscrutable code. To address this, I indent my code following the typical conventions e.g. nested if statements. Plastering the code with comments is standard practice in other languages. Instead I try to use words that make sense when being read from left to right. Done right you end up with statements that intuitively make sense e.g. empty counter ! means put a '' into the counter variable and store it on the heap. Traditional FORTHS have a lot of one and two character words like @ ! . , : ; etc. I've tried to avoid those except for the most common ones that everyone knows. If I'm reimplementing a Python function I use the same name that Python has. I don't eschew comments completely. A longer explanation of what the code does is placed in the first record, although I've found the need for commenting is not that critical because the unit name and database record name combination gives you plenty of clues as to what the code is doing. Despite these things, it is not easy to read FORTH code. You have to be in the "zone" which requires time. C code is much easier to understand when you first see it.
A FORTH written in C would remain a toy without many many man hours, but that's not what I'm doing here. I'm using FORTH as a way to simplify Python down to doing one thing... emitting scripts to std out or to file. Creating those scripts is faster than doing them directly in Python, because FORTH doesn't do any syntax checking (Python's tab/space syntax checking drives me mad!) and FORTH never crashes. It just emits the Python error and remains in it's REPL (Read Evaluate Print Loop). The scripts are typically SQL, HTML, CSS and Javascript, but I could target any tool that is scriptable. LaTEX and Python are likely to be targets in future.