Month: November 2013

Parsing SQL with pyparsing

November 1, 2013

Recently, I was working on a NoSQL database and wanted to expose a SQL interface to it so I can use it just like a RDBMS from my application. Not being much familiar with the python ecosystem libraries, I quickly searched and found this python library called pyparsing.

Now, if you know anything about parsing, you know regex and traditional lex parsers can get complicated very soon. But after playing with pyparsing for a few minutes, I quickly realized pyparsing makes it really easy to write and execute grammars. Pyparsing has a set of good APIs, handles spaces well, makes debugging easy and have a good documentation.

The code below doesn’t cover all the edge-cases and documented grammar of SQL, but it was a good excuse to learn pyparsing anyway; good enough for my usecase.

Install pypasing python module.

pip install pyparsing

Here is my parse_sql.py

from pyparsing import CaselessKeyword, delimitedList, Each, Forward, Group, \
        Optional, Word, alphas,alphanums, nums, oneOf, ZeroOrMore, quotedString, \
        Upcase

keywords = ["select", "from", "where", "group by", "order by", "and", "or"]
[select, _from, where, groupby, orderby, _and, _or] = [ CaselessKeyword(word)
        for word in keywords ]

table = column = Word(alphas)
columns = Group(delimitedList(column))
columnVal = (nums | quotedString)

whereCond = (column + oneOf("= != < > >= <=") + columnVal)
whereExpr = whereCond + ZeroOrMore((_and | _or) + whereCond)

selectStmt = Forward().setName("select statement")
selectStmt << (select +
        ('*' | columns).setResultsName("columns") +
        _from +
        table.setResultsName("table") +
        Optional(where + Group(whereExpr), '').setResultsName("where").setDebug(False) +
        Each([Optional(groupby + columns("groupby"),'').setDebug(False),
            Optional(orderby + columns("orderby"),'').setDebug(False)
            ])
        )

def log(sql, parsed):
    print "##################################################"
    print sql
    print parsed.table
    print parsed.columns
    print parsed.where
    print parsed.groupby
    print parsed.orderby

sqls = [
        """select * from users where username='johnabc'""",
        """SELECT * FROM users WHERE username='johnabc'""",
        """SELECT * FRom users""",
        """SELECT * FRom USERS""",
        """SELECT * FROM users WHERE username='johnabc' or email='johnabc@gmail.com'""",
        """SELECT id, username, email FROM users WHERE username='johnabc' order by email, id""",
        """SELECT id, username, email FROM users WHERE username='johnabc' group by school""",
        """SELECT id, username, email FROM users WHERE username='johnabc' group by city, school order by firstname, lastname"""
        ]

for sql in sqls:
    log(sql, selectStmt.parseString(sql))

To run the script


python parse_sql.py

As soon as I wrote my crappy little version and blogged about it, I actually found simpleSQL.py written by Paul McGuire, the author of pyparsing. Oh well!