Friday, 18 June 2021

Safe Python Eval

Many of us want a safe, simple scripting language to extend our Internet applications, and I have seen many discussions online about how to lock Python down so that its eval function can't be exploited. The problem is that nobody can be sure they've covered every scenario, because Python has grown complex over time, with multiple ways to achieve the same objective.

My scenario is a typical one. I have a web-based database application hosted on a remote server. Now I want its users to access its database to produce ad hoc reports, without compromising the database or anything else! I could do the typical thing and provide a screen where users can submit SELECT queries, but SQL by itself isn't enough to produce a useful report except in the simplest scenarios. To begin with, you can only work with one query result at a time, and reports frequently require multiple query results to be combined. Tables can of course be joined using SQL, but the result usually has lots of redundant fields that must be omitted. A declarative language like SQL just can't do it alone; it needs to be combined with a procedural language. After all, that's how my web application works: the procedural capability of Python is combined with SQL to produce web pages that make sense to users.

After I surveyed the scripting languages out there, Python came in at number one, BUT its eval function can access the file system or be used for other security exploits. Vanilla Python isn't a sensible choice as an extension scripting language.

I have looked at FORTH in the past; ATLAST and Ficl were the front runners, but they are traditional FORTHs with weak string handling. Strong string handling is what I need, because my database queries return their results as strings holding nested Python lists. I came up with that format years ago, before JSON became popular, using Python eval to parse the query result into a table that Python can iterate over using its index functions.
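Because the query results arrive as strings containing nested Python lists, they have to be parsed back into tables before a script can use them. The sketch below does that with the standard library's ast.literal_eval, which is swapped in here for plain eval and is not the eval-based approach described above: it only accepts literals, so a string such as __import__('os') is rejected. The function name parse_query_result, the shape check and the sample rows are hypothetical.

```python
import ast


def parse_query_result(text):
    """Turn a query result serialized as a nested Python list back into a table.

    ast.literal_eval only evaluates literals (strings, numbers, lists, tuples,
    dicts, sets, booleans, None); names and function calls raise ValueError.
    """
    table = ast.literal_eval(text)
    # Hypothetical shape check: a table is a list of rows, each row a list.
    if not isinstance(table, list) or not all(isinstance(row, list) for row in table):
        raise ValueError("query result must be a list of lists")
    return table


# Hypothetical result of a two-column query (id, name).
result = parse_query_result("[['1', 'Smith'], ['2', 'Jones']]")
for i in range(len(result)):  # iterate by index, as the reports do
    print(result[i][0], result[i][1])
```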

Then I came across a FORTH written in Python, a toy project for educational purposes. I realised it wasn't going to be fast, because it is an interpreted language written in an interpreted language! What it did have going for it was very strong string handling and the ability to cherry-pick the Python functions required, leaving everything else behind. Suddenly eval wasn't a problem, because I could wrap it in checks ensuring it would only ever evaluate a list. FORTH is a simple, typeless, stack-based language whose entire environment comprises two stacks and a heap (Python lists), a dictionary (a Python dictionary) and a heap pointer (a Python integer). FORTH has no syntax! The language is just a stream of whitespace-delimited tokens called words. Traditional FORTH has no error checking, and crashes are commonplace. But having Python underneath FORTH makes it uncrashable. Sure, your FORTH code will frequently bomb, but the FORTH REPL (Read Evaluate Print Loop) just displays the Python error and awaits your next command.
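To make the "two stacks, a heap, a dictionary and a heap pointer" idea concrete, here is a minimal sketch of such an environment in Python. It is not the toy project's code: the class name Forth, the handful of words and the ok> prompt are illustrative assumptions. It shows the dictionary mapping each word to a Python function, and a REPL that turns a stack underflow or an unknown word into a printed Python error instead of a crash.

```python
import operator


class Forth:
    """Toy FORTH environment: two stacks, a heap, a dictionary and a heap pointer."""

    def __init__(self):
        self.ds = []    # data stack
        self.rs = []    # return stack (not used by the words in this sketch)
        self.heap = []  # variable cells (not used by the words in this sketch)
        self.hp = 0     # heap pointer: index of the next free heap cell
        # The dictionary maps each word to the Python function that implements it.
        self.words = {
            "+": self._binary(operator.add),
            "-": self._binary(operator.sub),
            "*": self._binary(operator.mul),
            "dup": lambda: self.ds.append(self.ds[-1]),
            "drop": lambda: self.ds.pop(),
            "swap": self._swap,
            ".": lambda: print(self.ds.pop()),
        }

    def _binary(self, op):
        """Wrap a two-argument Python operator as a stack word."""
        def word():
            b, a = self.ds.pop(), self.ds.pop()  # top of stack is the right operand
            self.ds.append(op(a, b))
        return word

    def _swap(self):
        self.ds[-1], self.ds[-2] = self.ds[-2], self.ds[-1]

    def run(self, line):
        """No syntax: execute whitespace-delimited tokens left to right."""
        for token in line.split():
            if token in self.words:
                self.words[token]()
            else:
                self.ds.append(int(token))  # anything unknown must be an integer


def repl(forth):
    """Read-evaluate-print loop: a Python error is reported, never fatal."""
    while True:
        try:
            line = input("ok> ")
        except EOFError:
            break
        try:
            forth.run(line)
        except Exception as exc:
            print(f"error: {type(exc).__name__}: {exc}")
```

For example, Forth().run("2 3 + dup *") leaves 25 on the data stack, while typing drop at the prompt with an empty stack prints error: IndexError: pop from empty list and the loop carries on.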

I made some alterations to the Python code to get a FORTH that isn't quite FORTH anymore (as the saying goes, 'when you've seen one FORTH, you've seen one FORTH')! Everything became a string: both the stack and the heap can hold strings or lists of any size, as well as numbers. Actually Python is cleverer than that. Python lists can hold values of any type, so my database lists stay as lists (in CPython a list is a C array of pointers to objects), and they are passed by reference rather than by value. Numbers stay as numbers and dates as dates unless they are cast explicitly, something I've brought over from Python. Apart from that, the toy project has stayed largely untouched. All it can do is step, branch, loop, create variables, read variables and print to the console. But at just over 300 lines of Python it's Turing complete, and that is all you need to create a report. I have already written a complex report that uses two tables which cannot be joined; its output is styled HTML.
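The point about lists being stored by reference, and types surviving until an explicit cast, can be seen in a few lines of Python. This sketch models the heap as a Python list of cells with hypothetical allot, store (FORTH's !) and fetch (FORTH's @) helpers; it illustrates the behaviour rather than reproducing the modified interpreter.

```python
import datetime

heap = []  # each cell can hold any Python object: number, string, date or list


def allot(value=None):
    """Reserve a new heap cell and return its address (the old heap pointer)."""
    heap.append(value)
    return len(heap) - 1


def store(addr, value):
    """FORTH '!': put a value into a cell. Only a reference is stored, never a copy."""
    heap[addr] = value


def fetch(addr):
    """FORTH '@': read a cell. Lists come back as the very same object."""
    return heap[addr]


table = [["1", "Smith"], ["2", "Jones"]]  # hypothetical query result
t = allot()
store(t, table)
fetch(t).append(["3", "Brown"])  # mutating the fetched list mutates the original
print(len(table))  # 3

d = allot(datetime.date(2021, 6, 18))
print(type(fetch(d)).__name__)  # date: the type survives until cast explicitly
print(str(fetch(d)))            # 2021-06-18 after an explicit cast to a string
```

Because only references move between the heap and the stack, even a large query result costs nothing extra to pass around.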

Perhaps the most interesting thing I've done is to create a persistent library of FORTH definitions. The way I've done that is to launch Python with forth.py (the REPL) and forth.fr (the persistent library) as command line arguments, and then store the FORTH environment once forth.fr has loaded. The working FORTH code file is then loaded manually from the REPL. Each time it is loaded, the FORTH environment is first restored to the state it was in when forth.fr finished loading. This means you can run the working code file over and over without having to restart the REPL.
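Restoring the environment can be as simple as deep-copying the interpreter state once the library has loaded, then copying it back before each run of the working file. This is a sketch assuming the state is a plain dictionary of stacks, heap, heap pointer and word definitions; the field names, the stand-in library contents and the restore helper are hypothetical.

```python
import copy

# Hypothetical interpreter state: stacks, heap, heap pointer and word definitions.
env = {"ds": [], "rs": [], "heap": [], "hp": 0, "words": {}}

# Stand-in for loading forth.fr: the library defines one word and one variable.
env["words"]["square"] = ["dup", "*"]
env["heap"].append(0)
env["hp"] = 1

# Snapshot taken once, when forth.fr has finished loading.
library_state = copy.deepcopy(env)


def restore(env, saved):
    """Reset env in place to the saved state before each run of the working file.

    The snapshot is deep-copied again so that a run can never modify it.
    """
    env.clear()
    env.update(copy.deepcopy(saved))


# A run of the working file adds a word, changes a variable and leaves junk behind.
env["words"]["report"] = ["square", "."]
env["heap"][0] = 42
env["ds"].append("leftover")

restore(env, library_state)
print(sorted(env["words"]))  # ['square']
```

Restoring in place, rather than rebinding env to a new dictionary, means any code already holding a reference to the environment sees the reset state.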

But there's still a problem! FORTH over Python might be safe, but a user could write a function that never returns. Endless loops are easy to create inadvertently in procedural languages. Code needs to be added to Isectd to check how long each worker has been busy. If a worker has been busy for longer than 20 seconds and has no clients attached, Isectd sends a kill command to the process, which automatically restarts after it has been killed.
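The watchdog rule itself is simple enough to sketch. Isectd's own code is not shown here: the Worker record, its fields and the function names are assumptions, SIGTERM stands in for whatever kill command Isectd actually sends, and restarting the killed worker is assumed to be handled by whatever supervises the workers.

```python
import os
import signal
import time
from dataclasses import dataclass
from typing import Optional

BUSY_LIMIT = 20.0  # seconds, the threshold described above


@dataclass
class Worker:
    pid: int
    busy_since: Optional[float]  # time.monotonic() when the current job began; None if idle
    clients: int                 # number of clients currently attached


def overdue(workers, now, limit=BUSY_LIMIT):
    """Return the workers busy for longer than `limit` seconds with no client attached."""
    return [
        w for w in workers
        if w.busy_since is not None and now - w.busy_since > limit and w.clients == 0
    ]


def kill_overdue(workers, now=None):
    """Send SIGTERM to each overdue worker; a supervisor is assumed to restart it."""
    now = time.monotonic() if now is None else now
    for w in overdue(workers, now):
        os.kill(w.pid, signal.SIGTERM)
```

Keeping the decision in a pure function (overdue) separate from the side effect (kill_overdue) makes the 20-second rule easy to check without killing anything.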

The working FORTH code is included in the PocketClassmaker install file, available from the downloads area of my website.