Help me find an appropriate ruby/python parser generator

The first parser generator I've worked with was Parse::RecDescent, and the guides/tutorials available for it were great, but the most useful feature it has was it's debugging tools, specifically the tracing capabilities ( activated by setting $RD_TRACE to 1 ). I am looking for a parser generator that can help you debug it's rules.

The thing is, it has to be written in python or in ruby, and have a verbose mode/trace mode or very helpful debugging techniques.

Does anyone know such a parser generator ?

EDIT: when I said debugging, I wasn't referring to debugging python or ruby. I was referring to debugging the parser generator, see what it's doing at every step, see every char it's reading, rules it's trying to match. Hope you get the point.

BOUNTY EDIT: to win the bounty, please show a parser generator framework, and illustrate some of it's debugging features. I repeat, I'm not interested in pdb, but in parser's debugging framework. Also, please don't mention treetop. I'm not interested in it.


Asked by: Carlos808 | Posted: 30-11-2021






Answer 1

Python is a pretty easy language to debug. You can just do import pdb pdb.settrace().

However, these parser generators supposedly come with good debugging facilities.

http://www.antlr.org/

http://www.dabeaz.com/ply/

http://pyparsing.wikispaces.com/

In response to bounty

Here is PLY debugging in action.

Source Code

tokens = (
    'NAME','NUMBER',
    )

literals = ['=','+','-','*','/', '(',')']

# Tokens

t_NAME    = r'[a-zA-Z_][a-zA-Z0-9_]*'

def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

t_ignore = " \t"

def t_newline(t):
    r'\n+'
    t.lexer.lineno += t.value.count("\n")

def t_error(t):
    print("Illegal character '%s'" % t.value[0])
    t.lexer.skip(1)

# Build the lexer
import ply.lex as lex
lex.lex(debug=1)

# Parsing rules

precedence = (
    ('left','+','-'),
    ('left','*','/'),
    ('right','UMINUS'),
    )

# dictionary of names
names = { }

def p_statement_assign(p):
    'statement : NAME "=" expression'
    names[p[1]] = p[3]

def p_statement_expr(p):
    'statement : expression'
    print(p[1])

def p_expression_binop(p):
    '''expression : expression '+' expression
                  | expression '-' expression
                  | expression '*' expression
                  | expression '/' expression'''
    if p[2] == '+'  : p[0] = p[1] + p[3]
    elif p[2] == '-': p[0] = p[1] - p[3]
    elif p[2] == '*': p[0] = p[1] * p[3]
    elif p[2] == '/': p[0] = p[1] / p[3]

def p_expression_uminus(p):
    "expression : '-' expression %prec UMINUS"
    p[0] = -p[2]

def p_expression_group(p):
    "expression : '(' expression ')'"
    p[0] = p[2]

def p_expression_number(p):
    "expression : NUMBER"
    p[0] = p[1]

def p_expression_name(p):
    "expression : NAME"
    try:
        p[0] = names[p[1]]
    except LookupError:
        print("Undefined name '%s'" % p[1])
        p[0] = 0

def p_error(p):
    if p:
        print("Syntax error at '%s'" % p.value)
    else:
        print("Syntax error at EOF")

import ply.yacc as yacc
yacc.yacc()

import logging
logging.basicConfig(
    level=logging.INFO,
    filename="parselog.txt"
)

while 1:
    try:
        s = raw_input('calc > ')
    except EOFError:
        break
    if not s: continue
    yacc.parse(s, debug=1)

Output

lex: tokens   = ('NAME', 'NUMBER')
lex: literals = ['=', '+', '-', '*', '/', '(', ')']
lex: states   = {'INITIAL': 'inclusive'}
lex: Adding rule t_NUMBER -> '\d+' (state 'INITIAL')
lex: Adding rule t_newline -> '\n+' (state 'INITIAL')
lex: Adding rule t_NAME -> '[a-zA-Z_][a-zA-Z0-9_]*' (state 'INITIAL')
lex: ==== MASTER REGEXS FOLLOW ====
lex: state 'INITIAL' : regex[0] = '(?P<t_NUMBER>\d+)|(?P<t_newline>\n+)|(?P<t_NAME>[a-zA-Z
_][a-zA-Z0-9_]*)'
calc > 2+3
PLY: PARSE DEBUG START

State  : 0
Stack  : . LexToken(NUMBER,2,1,0)
Action : Shift and goto state 3

State  : 3
Stack  : NUMBER . LexToken(+,'+',1,1)
Action : Reduce rule [expression -> NUMBER] with [2] and goto state 9
Result : <int @ 0x1a1896c> (2)

State  : 6
Stack  : expression . LexToken(+,'+',1,1)
Action : Shift and goto state 12

State  : 12
Stack  : expression + . LexToken(NUMBER,3,1,2)
Action : Shift and goto state 3

State  : 3
Stack  : expression + NUMBER . $end
Action : Reduce rule [expression -> NUMBER] with [3] and goto state 9
Result : <int @ 0x1a18960> (3)

State  : 18
Stack  : expression + expression . $end
Action : Reduce rule [expression -> expression + expression] with [2,'+',3] and goto state
 3
Result : <int @ 0x1a18948> (5)

State  : 6
Stack  : expression . $end
Action : Reduce rule [statement -> expression] with [5] and goto state 2
5
Result : <NoneType @ 0x1e1ccef4> (None)

State  : 4
Stack  : statement . $end
Done   : Returning <NoneType @ 0x1e1ccef4> (None)
PLY: PARSE DEBUG END
calc >

Parse Table generated at parser.out

Created by PLY version 3.2 (http://www.dabeaz.com/ply)

Grammar

Rule 0     S' -> statement
Rule 1     statement -> NAME = expression
Rule 2     statement -> expression
Rule 3     expression -> expression + expression
Rule 4     expression -> expression - expression
Rule 5     expression -> expression * expression
Rule 6     expression -> expression / expression
Rule 7     expression -> - expression
Rule 8     expression -> ( expression )
Rule 9     expression -> NUMBER
Rule 10    expression -> NAME

Terminals, with rules where they appear

(                    : 8
)                    : 8
*                    : 5
+                    : 3
-                    : 4 7
/                    : 6
=                    : 1
NAME                 : 1 10
NUMBER               : 9
error                : 

Nonterminals, with rules where they appear

expression           : 1 2 3 3 4 4 5 5 6 6 7 8
statement            : 0

Parsing method: LALR

state 0

    (0) S' -> . statement
    (1) statement -> . NAME = expression
    (2) statement -> . expression
    (3) expression -> . expression + expression
    (4) expression -> . expression - expression
    (5) expression -> . expression * expression
    (6) expression -> . expression / expression
    (7) expression -> . - expression
    (8) expression -> . ( expression )
    (9) expression -> . NUMBER
    (10) expression -> . NAME

    NAME            shift and go to state 1
    -               shift and go to state 2
    (               shift and go to state 5
    NUMBER          shift and go to state 3

    expression                     shift and go to state 6
    statement                      shift and go to state 4

state 1

    (1) statement -> NAME . = expression
    (10) expression -> NAME .

    =               shift and go to state 7
    +               reduce using rule 10 (expression -> NAME .)
    -               reduce using rule 10 (expression -> NAME .)
    *               reduce using rule 10 (expression -> NAME .)
    /               reduce using rule 10 (expression -> NAME .)
    $end            reduce using rule 10 (expression -> NAME .)


state 2

    (7) expression -> - . expression
    (3) expression -> . expression + expression
    (4) expression -> . expression - expression
    (5) expression -> . expression * expression
    (6) expression -> . expression / expression
    (7) expression -> . - expression
    (8) expression -> . ( expression )
    (9) expression -> . NUMBER
    (10) expression -> . NAME

    -               shift and go to state 2
    (               shift and go to state 5
    NUMBER          shift and go to state 3
    NAME            shift and go to state 8

    expression                     shift and go to state 9

state 3

    (9) expression -> NUMBER .

    +               reduce using rule 9 (expression -> NUMBER .)
    -               reduce using rule 9 (expression -> NUMBER .)
    *               reduce using rule 9 (expression -> NUMBER .)
    /               reduce using rule 9 (expression -> NUMBER .)
    $end            reduce using rule 9 (expression -> NUMBER .)
    )               reduce using rule 9 (expression -> NUMBER .)


state 4

    (0) S' -> statement .



state 5

    (8) expression -> ( . expression )
    (3) expression -> . expression + expression
    (4) expression -> . expression - expression
    (5) expression -> . expression * expression
    (6) expression -> . expression / expression
    (7) expression -> . - expression
    (8) expression -> . ( expression )
    (9) expression -> . NUMBER
    (10) expression -> . NAME

    -               shift and go to state 2
    (               shift and go to state 5
    NUMBER          shift and go to state 3
    NAME            shift and go to state 8

    expression                     shift and go to state 10

state 6

    (2) statement -> expression .
    (3) expression -> expression . + expression
    (4) expression -> expression . - expression
    (5) expression -> expression . * expression
    (6) expression -> expression . / expression

    $end            reduce using rule 2 (statement -> expression .)
    +               shift and go to state 12
    -               shift and go to state 11
    *               shift and go to state 13
    /               shift and go to state 14


state 7

    (1) statement -> NAME = . expression
    (3) expression -> . expression + expression
    (4) expression -> . expression - expression
    (5) expression -> . expression * expression
    (6) expression -> . expression / expression
    (7) expression -> . - expression
    (8) expression -> . ( expression )
    (9) expression -> . NUMBER
    (10) expression -> . NAME

    -               shift and go to state 2
    (               shift and go to state 5
    NUMBER          shift and go to state 3
    NAME            shift and go to state 8

    expression                     shift and go to state 15

state 8

    (10) expression -> NAME .

    +               reduce using rule 10 (expression -> NAME .)
    -               reduce using rule 10 (expression -> NAME .)
    *               reduce using rule 10 (expression -> NAME .)
    /               reduce using rule 10 (expression -> NAME .)
    $end            reduce using rule 10 (expression -> NAME .)
    )               reduce using rule 10 (expression -> NAME .)


state 9

    (7) expression -> - expression .
    (3) expression -> expression . + expression
    (4) expression -> expression . - expression
    (5) expression -> expression . * expression
    (6) expression -> expression . / expression

    +               reduce using rule 7 (expression -> - expression .)
    -               reduce using rule 7 (expression -> - expression .)
    *               reduce using rule 7 (expression -> - expression .)
    /               reduce using rule 7 (expression -> - expression .)
    $end            reduce using rule 7 (expression -> - expression .)
    )               reduce using rule 7 (expression -> - expression .)

  ! +               [ shift and go to state 12 ]
  ! -               [ shift and go to state 11 ]
  ! *               [ shift and go to state 13 ]
  ! /               [ shift and go to state 14 ]


state 10

    (8) expression -> ( expression . )
    (3) expression -> expression . + expression
    (4) expression -> expression . - expression
    (5) expression -> expression . * expression
    (6) expression -> expression . / expression

    )               shift and go to state 16
    +               shift and go to state 12
    -               shift and go to state 11
    *               shift and go to state 13
    /               shift and go to state 14


state 11

    (4) expression -> expression - . expression
    (3) expression -> . expression + expression
    (4) expression -> . expression - expression
    (5) expression -> . expression * expression
    (6) expression -> . expression / expression
    (7) expression -> . - expression
    (8) expression -> . ( expression )
    (9) expression -> . NUMBER
    (10) expression -> . NAME

    -               shift and go to state 2
    (               shift and go to state 5
    NUMBER          shift and go to state 3
    NAME            shift and go to state 8

    expression                     shift and go to state 17

state 12

    (3) expression -> expression + . expression
    (3) expression -> . expression + expression
    (4) expression -> . expression - expression
    (5) expression -> . expression * expression
    (6) expression -> . expression / expression
    (7) expression -> . - expression
    (8) expression -> . ( expression )
    (9) expression -> . NUMBER
    (10) expression -> . NAME

    -               shift and go to state 2
    (               shift and go to state 5
    NUMBER          shift and go to state 3
    NAME            shift and go to state 8

    expression                     shift and go to state 18

state 13

    (5) expression -> expression * . expression
    (3) expression -> . expression + expression
    (4) expression -> . expression - expression
    (5) expression -> . expression * expression
    (6) expression -> . expression / expression
    (7) expression -> . - expression
    (8) expression -> . ( expression )
    (9) expression -> . NUMBER
    (10) expression -> . NAME

    -               shift and go to state 2
    (               shift and go to state 5
    NUMBER          shift and go to state 3
    NAME            shift and go to state 8

    expression                     shift and go to state 19

state 14

    (6) expression -> expression / . expression
    (3) expression -> . expression + expression
    (4) expression -> . expression - expression
    (5) expression -> . expression * expression
    (6) expression -> . expression / expression
    (7) expression -> . - expression
    (8) expression -> . ( expression )
    (9) expression -> . NUMBER
    (10) expression -> . NAME

    -               shift and go to state 2
    (               shift and go to state 5
    NUMBER          shift and go to state 3
    NAME            shift and go to state 8

    expression                     shift and go to state 20

state 15

    (1) statement -> NAME = expression .
    (3) expression -> expression . + expression
    (4) expression -> expression . - expression
    (5) expression -> expression . * expression
    (6) expression -> expression . / expression

    $end            reduce using rule 1 (statement -> NAME = expression .)
    +               shift and go to state 12
    -               shift and go to state 11
    *               shift and go to state 13
    /               shift and go to state 14


state 16

    (8) expression -> ( expression ) .

    +               reduce using rule 8 (expression -> ( expression ) .)
    -               reduce using rule 8 (expression -> ( expression ) .)
    *               reduce using rule 8 (expression -> ( expression ) .)
    /               reduce using rule 8 (expression -> ( expression ) .)
    $end            reduce using rule 8 (expression -> ( expression ) .)
    )               reduce using rule 8 (expression -> ( expression ) .)


state 17

    (4) expression -> expression - expression .
    (3) expression -> expression . + expression
    (4) expression -> expression . - expression
    (5) expression -> expression . * expression
    (6) expression -> expression . / expression

    +               reduce using rule 4 (expression -> expression - expression .)
    -               reduce using rule 4 (expression -> expression - expression .)
    $end            reduce using rule 4 (expression -> expression - expression .)
    )               reduce using rule 4 (expression -> expression - expression .)
    *               shift and go to state 13
    /               shift and go to state 14

  ! *               [ reduce using rule 4 (expression -> expression - expression .) ]
  ! /               [ reduce using rule 4 (expression -> expression - expression .) ]
  ! +               [ shift and go to state 12 ]
  ! -               [ shift and go to state 11 ]


state 18

    (3) expression -> expression + expression .
    (3) expression -> expression . + expression
    (4) expression -> expression . - expression
    (5) expression -> expression . * expression
    (6) expression -> expression . / expression

    +               reduce using rule 3 (expression -> expression + expression .)
    -               reduce using rule 3 (expression -> expression + expression .)
    $end            reduce using rule 3 (expression -> expression + expression .)
    )               reduce using rule 3 (expression -> expression + expression .)
    *               shift and go to state 13
    /               shift and go to state 14

  ! *               [ reduce using rule 3 (expression -> expression + expression .) ]
  ! /               [ reduce using rule 3 (expression -> expression + expression .) ]
  ! +               [ shift and go to state 12 ]
  ! -               [ shift and go to state 11 ]


state 19

    (5) expression -> expression * expression .
    (3) expression -> expression . + expression
    (4) expression -> expression . - expression
    (5) expression -> expression . * expression
    (6) expression -> expression . / expression

    +               reduce using rule 5 (expression -> expression * expression .)
    -               reduce using rule 5 (expression -> expression * expression .)
    *               reduce using rule 5 (expression -> expression * expression .)
    /               reduce using rule 5 (expression -> expression * expression .)
    $end            reduce using rule 5 (expression -> expression * expression .)
    )               reduce using rule 5 (expression -> expression * expression .)

  ! +               [ shift and go to state 12 ]
  ! -               [ shift and go to state 11 ]
  ! *               [ shift and go to state 13 ]
  ! /               [ shift and go to state 14 ]


state 20

    (6) expression -> expression / expression .
    (3) expression -> expression . + expression
    (4) expression -> expression . - expression
    (5) expression -> expression . * expression
    (6) expression -> expression . / expression

    +               reduce using rule 6 (expression -> expression / expression .)
    -               reduce using rule 6 (expression -> expression / expression .)
    *               reduce using rule 6 (expression -> expression / expression .)
    /               reduce using rule 6 (expression -> expression / expression .)
    $end            reduce using rule 6 (expression -> expression / expression .)
    )               reduce using rule 6 (expression -> expression / expression .)

  ! +               [ shift and go to state 12 ]
  ! -               [ shift and go to state 11 ]
  ! *               [ shift and go to state 13 ]
  ! /               [ shift and go to state 14 ]

Answered by: Melissa644 | Posted: 01-01-2022



Answer 2

I don't know anything about its debugging features, but I've heard good things about PyParsing.

http://pyparsing.wikispaces.com/

Answered by: Sienna996 | Posted: 01-01-2022



Answer 3

I know the bounty has already been claimed, but here is an equivalent parser written in pyparsing (plus support for function calls with zero or more comma-delimted arguments):

from pyparsing import *

LPAR, RPAR = map(Suppress,"()")
EQ = Literal("=")
name = Word(alphas, alphanums+"_").setName("name")
number = Word(nums).setName("number")

expr = Forward()
operand = Optional('-') + (Group(name + LPAR + 
                                  Group(Optional(delimitedList(expr))) +
                                  RPAR) |
                           name | 
                           number | 
                           Group(LPAR + expr + RPAR))
binop = oneOf("+ - * / **")
expr << (Group(operand + OneOrMore(binop + operand)) | operand)

assignment = name + EQ + expr
statement = assignment | expr

This test code runs the parser through its basic paces:

tests = """\
    sin(pi/2)
    y = mx+b
    E = mc ** 2
    F = m*a
    x = x0 + v*t +a*t*t/2
    1 - sqrt(sin(t)**2 + cos(t)**2)""".splitlines()

for t in tests:
    print t.strip()
    print statement.parseString(t).asList()
    print

Gives this output:

sin(pi/2)
[['sin', [['pi', '/', '2']]]]

y = mx+b
['y', '=', ['mx', '+', 'b']]

E = mc ** 2
['E', '=', ['mc', '**', '2']]

F = m*a
['F', '=', ['m', '*', 'a']]

x = x0 + v*t +a*t*t/2
['x', '=', ['x0', '+', 'v', '*', 't', '+', 'a', '*', 't', '*', 't', '/', '2']]

1 - sqrt(sin(t)**2 + cos(t)**2)
[['1', '-', ['sqrt', [[['sin', ['t']], '**', '2', '+', ['cos', ['t']], '**', '2']]]]]

For debugging, we add this code:

# enable debugging for name and number expressions
name.setDebug()
number.setDebug()

And now we reparse the first test (displaying the input string and a simple column ruler):

t = tests[0]
print ("1234567890"*10)[:len(t)]
print t
statement.parseString(t)
print

Giving this output:

1234567890123
    sin(pi/2)
Match name at loc 4(1,5)
Matched name -> ['sin']
Match name at loc 4(1,5)
Matched name -> ['sin']
Match name at loc 8(1,9)
Matched name -> ['pi']
Match name at loc 8(1,9)
Matched name -> ['pi']
Match name at loc 11(1,12)
Exception raised:Expected name (at char 11), (line:1, col:12)
Match name at loc 11(1,12)
Exception raised:Expected name (at char 11), (line:1, col:12)
Match number at loc 11(1,12)
Matched number -> ['2']
Match name at loc 4(1,5)
Matched name -> ['sin']
Match name at loc 8(1,9)
Matched name -> ['pi']
Match name at loc 8(1,9)
Matched name -> ['pi']
Match name at loc 11(1,12)
Exception raised:Expected name (at char 11), (line:1, col:12)
Match name at loc 11(1,12)
Exception raised:Expected name (at char 11), (line:1, col:12)
Match number at loc 11(1,12)
Matched number -> ['2']

Pyparsing also supports packrat parsing, a sort of parse-time memoization (read more about packratting here). Here is the same parsing sequence, but with packrat enabled:

same parse, but with packrat parsing enabled
1234567890123
    sin(pi/2)
Match name at loc 4(1,5)
Matched name -> ['sin']
Match name at loc 8(1,9)
Matched name -> ['pi']
Match name at loc 8(1,9)
Matched name -> ['pi']
Match name at loc 11(1,12)
Exception raised:Expected name (at char 11), (line:1, col:12)
Match name at loc 11(1,12)
Exception raised:Expected name (at char 11), (line:1, col:12)
Match number at loc 11(1,12)
Matched number -> ['2']

This was an interesting exercise, and helpful to me to see debugging features from other parser libraries.

Answered by: Lydia374 | Posted: 01-01-2022



Answer 4

ANTLR above has the advantage to generate human readable and understandable code, since it is (a very sophisticated and powerful) top-down parser, so you can step through it with a regular debugger and see what it really is doing.

That's why it is my parser generator of choice.

Bottom up parser generators like PLY have the disadvantage that for larger grammars it is almost impossible to understand what the debugging output really means and why the parsing table is like it is.

Answered by: Maddie678 | Posted: 01-01-2022



Answer 5

Python wiki has a list of Language Parsers written in Python.

Answered by: Luke852 | Posted: 01-01-2022



Similar questions

python - Which of these scripting languages is more appropriate for pen-testing?

Closed. This question is opinion-based. It is not c...


Would python be an appropriate choice for a video library for home use software

I am thinking of creating a video library software which keep track of all my videos and keep track of videos that I already haven't watched and stats like this. The stats will be specific to each user using the software. My question is, is python appropriate to create this software or do I need something like c++.


Is Python appropriate for algorithms focused on scientific computing?


python - How do I determine the appropriate check interval?

I'm just starting to work on a tornado application that is having some CPU issues. The CPU time will monotonically grow as time goes by, maxing out the CPU at 100%. The system is currently designed to not block the main thread. If it needs to do something that blocks and asynchronous drivers aren't available, it will spawn another thread to do the blocking operation. Thus we have the main thread being almost tot...


arrays - Most appropriate data structure (Python)

I'm new to Python and have what is probably a very basic question about the 'best' way to store data in my code. Any advice much appreciated! I have a long .csv file in the following format: Scenario,Year,Month,Value 1,1961,1,0.5 1,1961,2,0.7 1,1961,3,0.2 etc. My scenario values run from 1 to 100, year goes from 1961 to 1990 and month goes from 1 to 12. My file therefore has 100*29...


python - Numpy time based vector operations where state of preceding elements matters - are for loops appropriate?

What do numpy arrays provide when performing time based calculations where state matters. In other words, where what has occurred in earlier or later in a sequence is important. Consider the following time based vectors, TIME = np.array([0., 10., 20., 30., 40., 50., 60., 70., 80., 90.]) FLOW = np.array([100., 75., 60., 20.0, 60.0, 50.0, 20.0, 30.0, 20.0, 10.0]) TEMP = np.array([300., 310...


python extend or append a list when appropriate

Is there a simple way to append a list if X is a string, but extend it if X is a list? I know I can simply test if an object is a string or list, but I was wondering if there is a quicker way than this?


python - Finding appropriate cut-off values

I try to implement Hampel tanh estimators to normalize highly asymmetric data. In order to do this, I need to perform the following calculation: Given x - a sorted list of numbers and...


python - What is an appropriate way to datamine the total number of results of a keyword search?

newbie programmer and lurker here, hoping for some sensible advice. :) Using a combination of Python, BeautifulSoup, and the Bing API, I was able to find what I wanted with the following code: import urllib2 from BeautifulSoup import BeautifulStoneSoup Appid = #My Appid query = #My query soup = BeautifulStoneSoup(urllib2.urlopen("http://api.search.live.net/xml.aspx?Appid=" + Appid + "&amp;query=" ...


python - Defining appropriate number of processes

I have a python code treating a lot of apache logs (decompress, parse, crunching numbers, regexping etc). One parent process which takes a list of files (up to few millions), and sends a list of files to parse to workers, using multiprocess pool. I wonder, if there is any guidelines / benchmarks / advices which can help me to estimate ideal number of child process ? Ie. having one process per core...






Still can't find your answer? Check out these communities...



PySlackers | Full Stack Python | NHS Python | Pythonist Cafe | Hacker Earth | Discord Python



top