Help me find an appropriate ruby/python parser generator
The first parser generator I've worked with was Parse::RecDescent, and the guides/tutorials available for it were great, but the most useful feature it has was it's debugging tools, specifically the tracing capabilities ( activated by setting $RD_TRACE to 1 ). I am looking for a parser generator that can help you debug it's rules.
The thing is, it has to be written in python or in ruby, and have a verbose mode/trace mode or very helpful debugging techniques.
Does anyone know such a parser generator ?
EDIT: when I said debugging, I wasn't referring to debugging python or ruby. I was referring to debugging the parser generator, see what it's doing at every step, see every char it's reading, rules it's trying to match. Hope you get the point.
BOUNTY EDIT: to win the bounty, please show a parser generator framework, and illustrate some of it's debugging features. I repeat, I'm not interested in pdb, but in parser's debugging framework. Also, please don't mention treetop. I'm not interested in it.
Asked by: Carlos808 | Posted: 30-11-2021
Answer 1
Python is a pretty easy language to debug. You can just do import pdb pdb.settrace().
However, these parser generators supposedly come with good debugging facilities.
http://pyparsing.wikispaces.com/
In response to bounty
Here is PLY debugging in action.
Source Code
tokens = (
'NAME','NUMBER',
)
literals = ['=','+','-','*','/', '(',')']
# Tokens
t_NAME = r'[a-zA-Z_][a-zA-Z0-9_]*'
def t_NUMBER(t):
r'\d+'
t.value = int(t.value)
return t
t_ignore = " \t"
def t_newline(t):
r'\n+'
t.lexer.lineno += t.value.count("\n")
def t_error(t):
print("Illegal character '%s'" % t.value[0])
t.lexer.skip(1)
# Build the lexer
import ply.lex as lex
lex.lex(debug=1)
# Parsing rules
precedence = (
('left','+','-'),
('left','*','/'),
('right','UMINUS'),
)
# dictionary of names
names = { }
def p_statement_assign(p):
'statement : NAME "=" expression'
names[p[1]] = p[3]
def p_statement_expr(p):
'statement : expression'
print(p[1])
def p_expression_binop(p):
'''expression : expression '+' expression
| expression '-' expression
| expression '*' expression
| expression '/' expression'''
if p[2] == '+' : p[0] = p[1] + p[3]
elif p[2] == '-': p[0] = p[1] - p[3]
elif p[2] == '*': p[0] = p[1] * p[3]
elif p[2] == '/': p[0] = p[1] / p[3]
def p_expression_uminus(p):
"expression : '-' expression %prec UMINUS"
p[0] = -p[2]
def p_expression_group(p):
"expression : '(' expression ')'"
p[0] = p[2]
def p_expression_number(p):
"expression : NUMBER"
p[0] = p[1]
def p_expression_name(p):
"expression : NAME"
try:
p[0] = names[p[1]]
except LookupError:
print("Undefined name '%s'" % p[1])
p[0] = 0
def p_error(p):
if p:
print("Syntax error at '%s'" % p.value)
else:
print("Syntax error at EOF")
import ply.yacc as yacc
yacc.yacc()
import logging
logging.basicConfig(
level=logging.INFO,
filename="parselog.txt"
)
while 1:
try:
s = raw_input('calc > ')
except EOFError:
break
if not s: continue
yacc.parse(s, debug=1)
Output
lex: tokens = ('NAME', 'NUMBER')
lex: literals = ['=', '+', '-', '*', '/', '(', ')']
lex: states = {'INITIAL': 'inclusive'}
lex: Adding rule t_NUMBER -> '\d+' (state 'INITIAL')
lex: Adding rule t_newline -> '\n+' (state 'INITIAL')
lex: Adding rule t_NAME -> '[a-zA-Z_][a-zA-Z0-9_]*' (state 'INITIAL')
lex: ==== MASTER REGEXS FOLLOW ====
lex: state 'INITIAL' : regex[0] = '(?P<t_NUMBER>\d+)|(?P<t_newline>\n+)|(?P<t_NAME>[a-zA-Z
_][a-zA-Z0-9_]*)'
calc > 2+3
PLY: PARSE DEBUG START
State : 0
Stack : . LexToken(NUMBER,2,1,0)
Action : Shift and goto state 3
State : 3
Stack : NUMBER . LexToken(+,'+',1,1)
Action : Reduce rule [expression -> NUMBER] with [2] and goto state 9
Result : <int @ 0x1a1896c> (2)
State : 6
Stack : expression . LexToken(+,'+',1,1)
Action : Shift and goto state 12
State : 12
Stack : expression + . LexToken(NUMBER,3,1,2)
Action : Shift and goto state 3
State : 3
Stack : expression + NUMBER . $end
Action : Reduce rule [expression -> NUMBER] with [3] and goto state 9
Result : <int @ 0x1a18960> (3)
State : 18
Stack : expression + expression . $end
Action : Reduce rule [expression -> expression + expression] with [2,'+',3] and goto state
3
Result : <int @ 0x1a18948> (5)
State : 6
Stack : expression . $end
Action : Reduce rule [statement -> expression] with [5] and goto state 2
5
Result : <NoneType @ 0x1e1ccef4> (None)
State : 4
Stack : statement . $end
Done : Returning <NoneType @ 0x1e1ccef4> (None)
PLY: PARSE DEBUG END
calc >
Parse Table generated at parser.out
Created by PLY version 3.2 (http://www.dabeaz.com/ply)
Grammar
Rule 0 S' -> statement
Rule 1 statement -> NAME = expression
Rule 2 statement -> expression
Rule 3 expression -> expression + expression
Rule 4 expression -> expression - expression
Rule 5 expression -> expression * expression
Rule 6 expression -> expression / expression
Rule 7 expression -> - expression
Rule 8 expression -> ( expression )
Rule 9 expression -> NUMBER
Rule 10 expression -> NAME
Terminals, with rules where they appear
( : 8
) : 8
* : 5
+ : 3
- : 4 7
/ : 6
= : 1
NAME : 1 10
NUMBER : 9
error :
Nonterminals, with rules where they appear
expression : 1 2 3 3 4 4 5 5 6 6 7 8
statement : 0
Parsing method: LALR
state 0
(0) S' -> . statement
(1) statement -> . NAME = expression
(2) statement -> . expression
(3) expression -> . expression + expression
(4) expression -> . expression - expression
(5) expression -> . expression * expression
(6) expression -> . expression / expression
(7) expression -> . - expression
(8) expression -> . ( expression )
(9) expression -> . NUMBER
(10) expression -> . NAME
NAME shift and go to state 1
- shift and go to state 2
( shift and go to state 5
NUMBER shift and go to state 3
expression shift and go to state 6
statement shift and go to state 4
state 1
(1) statement -> NAME . = expression
(10) expression -> NAME .
= shift and go to state 7
+ reduce using rule 10 (expression -> NAME .)
- reduce using rule 10 (expression -> NAME .)
* reduce using rule 10 (expression -> NAME .)
/ reduce using rule 10 (expression -> NAME .)
$end reduce using rule 10 (expression -> NAME .)
state 2
(7) expression -> - . expression
(3) expression -> . expression + expression
(4) expression -> . expression - expression
(5) expression -> . expression * expression
(6) expression -> . expression / expression
(7) expression -> . - expression
(8) expression -> . ( expression )
(9) expression -> . NUMBER
(10) expression -> . NAME
- shift and go to state 2
( shift and go to state 5
NUMBER shift and go to state 3
NAME shift and go to state 8
expression shift and go to state 9
state 3
(9) expression -> NUMBER .
+ reduce using rule 9 (expression -> NUMBER .)
- reduce using rule 9 (expression -> NUMBER .)
* reduce using rule 9 (expression -> NUMBER .)
/ reduce using rule 9 (expression -> NUMBER .)
$end reduce using rule 9 (expression -> NUMBER .)
) reduce using rule 9 (expression -> NUMBER .)
state 4
(0) S' -> statement .
state 5
(8) expression -> ( . expression )
(3) expression -> . expression + expression
(4) expression -> . expression - expression
(5) expression -> . expression * expression
(6) expression -> . expression / expression
(7) expression -> . - expression
(8) expression -> . ( expression )
(9) expression -> . NUMBER
(10) expression -> . NAME
- shift and go to state 2
( shift and go to state 5
NUMBER shift and go to state 3
NAME shift and go to state 8
expression shift and go to state 10
state 6
(2) statement -> expression .
(3) expression -> expression . + expression
(4) expression -> expression . - expression
(5) expression -> expression . * expression
(6) expression -> expression . / expression
$end reduce using rule 2 (statement -> expression .)
+ shift and go to state 12
- shift and go to state 11
* shift and go to state 13
/ shift and go to state 14
state 7
(1) statement -> NAME = . expression
(3) expression -> . expression + expression
(4) expression -> . expression - expression
(5) expression -> . expression * expression
(6) expression -> . expression / expression
(7) expression -> . - expression
(8) expression -> . ( expression )
(9) expression -> . NUMBER
(10) expression -> . NAME
- shift and go to state 2
( shift and go to state 5
NUMBER shift and go to state 3
NAME shift and go to state 8
expression shift and go to state 15
state 8
(10) expression -> NAME .
+ reduce using rule 10 (expression -> NAME .)
- reduce using rule 10 (expression -> NAME .)
* reduce using rule 10 (expression -> NAME .)
/ reduce using rule 10 (expression -> NAME .)
$end reduce using rule 10 (expression -> NAME .)
) reduce using rule 10 (expression -> NAME .)
state 9
(7) expression -> - expression .
(3) expression -> expression . + expression
(4) expression -> expression . - expression
(5) expression -> expression . * expression
(6) expression -> expression . / expression
+ reduce using rule 7 (expression -> - expression .)
- reduce using rule 7 (expression -> - expression .)
* reduce using rule 7 (expression -> - expression .)
/ reduce using rule 7 (expression -> - expression .)
$end reduce using rule 7 (expression -> - expression .)
) reduce using rule 7 (expression -> - expression .)
! + [ shift and go to state 12 ]
! - [ shift and go to state 11 ]
! * [ shift and go to state 13 ]
! / [ shift and go to state 14 ]
state 10
(8) expression -> ( expression . )
(3) expression -> expression . + expression
(4) expression -> expression . - expression
(5) expression -> expression . * expression
(6) expression -> expression . / expression
) shift and go to state 16
+ shift and go to state 12
- shift and go to state 11
* shift and go to state 13
/ shift and go to state 14
state 11
(4) expression -> expression - . expression
(3) expression -> . expression + expression
(4) expression -> . expression - expression
(5) expression -> . expression * expression
(6) expression -> . expression / expression
(7) expression -> . - expression
(8) expression -> . ( expression )
(9) expression -> . NUMBER
(10) expression -> . NAME
- shift and go to state 2
( shift and go to state 5
NUMBER shift and go to state 3
NAME shift and go to state 8
expression shift and go to state 17
state 12
(3) expression -> expression + . expression
(3) expression -> . expression + expression
(4) expression -> . expression - expression
(5) expression -> . expression * expression
(6) expression -> . expression / expression
(7) expression -> . - expression
(8) expression -> . ( expression )
(9) expression -> . NUMBER
(10) expression -> . NAME
- shift and go to state 2
( shift and go to state 5
NUMBER shift and go to state 3
NAME shift and go to state 8
expression shift and go to state 18
state 13
(5) expression -> expression * . expression
(3) expression -> . expression + expression
(4) expression -> . expression - expression
(5) expression -> . expression * expression
(6) expression -> . expression / expression
(7) expression -> . - expression
(8) expression -> . ( expression )
(9) expression -> . NUMBER
(10) expression -> . NAME
- shift and go to state 2
( shift and go to state 5
NUMBER shift and go to state 3
NAME shift and go to state 8
expression shift and go to state 19
state 14
(6) expression -> expression / . expression
(3) expression -> . expression + expression
(4) expression -> . expression - expression
(5) expression -> . expression * expression
(6) expression -> . expression / expression
(7) expression -> . - expression
(8) expression -> . ( expression )
(9) expression -> . NUMBER
(10) expression -> . NAME
- shift and go to state 2
( shift and go to state 5
NUMBER shift and go to state 3
NAME shift and go to state 8
expression shift and go to state 20
state 15
(1) statement -> NAME = expression .
(3) expression -> expression . + expression
(4) expression -> expression . - expression
(5) expression -> expression . * expression
(6) expression -> expression . / expression
$end reduce using rule 1 (statement -> NAME = expression .)
+ shift and go to state 12
- shift and go to state 11
* shift and go to state 13
/ shift and go to state 14
state 16
(8) expression -> ( expression ) .
+ reduce using rule 8 (expression -> ( expression ) .)
- reduce using rule 8 (expression -> ( expression ) .)
* reduce using rule 8 (expression -> ( expression ) .)
/ reduce using rule 8 (expression -> ( expression ) .)
$end reduce using rule 8 (expression -> ( expression ) .)
) reduce using rule 8 (expression -> ( expression ) .)
state 17
(4) expression -> expression - expression .
(3) expression -> expression . + expression
(4) expression -> expression . - expression
(5) expression -> expression . * expression
(6) expression -> expression . / expression
+ reduce using rule 4 (expression -> expression - expression .)
- reduce using rule 4 (expression -> expression - expression .)
$end reduce using rule 4 (expression -> expression - expression .)
) reduce using rule 4 (expression -> expression - expression .)
* shift and go to state 13
/ shift and go to state 14
! * [ reduce using rule 4 (expression -> expression - expression .) ]
! / [ reduce using rule 4 (expression -> expression - expression .) ]
! + [ shift and go to state 12 ]
! - [ shift and go to state 11 ]
state 18
(3) expression -> expression + expression .
(3) expression -> expression . + expression
(4) expression -> expression . - expression
(5) expression -> expression . * expression
(6) expression -> expression . / expression
+ reduce using rule 3 (expression -> expression + expression .)
- reduce using rule 3 (expression -> expression + expression .)
$end reduce using rule 3 (expression -> expression + expression .)
) reduce using rule 3 (expression -> expression + expression .)
* shift and go to state 13
/ shift and go to state 14
! * [ reduce using rule 3 (expression -> expression + expression .) ]
! / [ reduce using rule 3 (expression -> expression + expression .) ]
! + [ shift and go to state 12 ]
! - [ shift and go to state 11 ]
state 19
(5) expression -> expression * expression .
(3) expression -> expression . + expression
(4) expression -> expression . - expression
(5) expression -> expression . * expression
(6) expression -> expression . / expression
+ reduce using rule 5 (expression -> expression * expression .)
- reduce using rule 5 (expression -> expression * expression .)
* reduce using rule 5 (expression -> expression * expression .)
/ reduce using rule 5 (expression -> expression * expression .)
$end reduce using rule 5 (expression -> expression * expression .)
) reduce using rule 5 (expression -> expression * expression .)
! + [ shift and go to state 12 ]
! - [ shift and go to state 11 ]
! * [ shift and go to state 13 ]
! / [ shift and go to state 14 ]
state 20
(6) expression -> expression / expression .
(3) expression -> expression . + expression
(4) expression -> expression . - expression
(5) expression -> expression . * expression
(6) expression -> expression . / expression
+ reduce using rule 6 (expression -> expression / expression .)
- reduce using rule 6 (expression -> expression / expression .)
* reduce using rule 6 (expression -> expression / expression .)
/ reduce using rule 6 (expression -> expression / expression .)
$end reduce using rule 6 (expression -> expression / expression .)
) reduce using rule 6 (expression -> expression / expression .)
! + [ shift and go to state 12 ]
! - [ shift and go to state 11 ]
! * [ shift and go to state 13 ]
! / [ shift and go to state 14 ]
Answered by: Melissa644 | Posted: 01-01-2022
Answer 2
I don't know anything about its debugging features, but I've heard good things about PyParsing.
http://pyparsing.wikispaces.com/
Answered by: Sienna996 | Posted: 01-01-2022Answer 3
I know the bounty has already been claimed, but here is an equivalent parser written in pyparsing (plus support for function calls with zero or more comma-delimted arguments):
from pyparsing import *
LPAR, RPAR = map(Suppress,"()")
EQ = Literal("=")
name = Word(alphas, alphanums+"_").setName("name")
number = Word(nums).setName("number")
expr = Forward()
operand = Optional('-') + (Group(name + LPAR +
Group(Optional(delimitedList(expr))) +
RPAR) |
name |
number |
Group(LPAR + expr + RPAR))
binop = oneOf("+ - * / **")
expr << (Group(operand + OneOrMore(binop + operand)) | operand)
assignment = name + EQ + expr
statement = assignment | expr
This test code runs the parser through its basic paces:
tests = """\
sin(pi/2)
y = mx+b
E = mc ** 2
F = m*a
x = x0 + v*t +a*t*t/2
1 - sqrt(sin(t)**2 + cos(t)**2)""".splitlines()
for t in tests:
print t.strip()
print statement.parseString(t).asList()
print
Gives this output:
sin(pi/2)
[['sin', [['pi', '/', '2']]]]
y = mx+b
['y', '=', ['mx', '+', 'b']]
E = mc ** 2
['E', '=', ['mc', '**', '2']]
F = m*a
['F', '=', ['m', '*', 'a']]
x = x0 + v*t +a*t*t/2
['x', '=', ['x0', '+', 'v', '*', 't', '+', 'a', '*', 't', '*', 't', '/', '2']]
1 - sqrt(sin(t)**2 + cos(t)**2)
[['1', '-', ['sqrt', [[['sin', ['t']], '**', '2', '+', ['cos', ['t']], '**', '2']]]]]
For debugging, we add this code:
# enable debugging for name and number expressions
name.setDebug()
number.setDebug()
And now we reparse the first test (displaying the input string and a simple column ruler):
t = tests[0]
print ("1234567890"*10)[:len(t)]
print t
statement.parseString(t)
print
Giving this output:
1234567890123
sin(pi/2)
Match name at loc 4(1,5)
Matched name -> ['sin']
Match name at loc 4(1,5)
Matched name -> ['sin']
Match name at loc 8(1,9)
Matched name -> ['pi']
Match name at loc 8(1,9)
Matched name -> ['pi']
Match name at loc 11(1,12)
Exception raised:Expected name (at char 11), (line:1, col:12)
Match name at loc 11(1,12)
Exception raised:Expected name (at char 11), (line:1, col:12)
Match number at loc 11(1,12)
Matched number -> ['2']
Match name at loc 4(1,5)
Matched name -> ['sin']
Match name at loc 8(1,9)
Matched name -> ['pi']
Match name at loc 8(1,9)
Matched name -> ['pi']
Match name at loc 11(1,12)
Exception raised:Expected name (at char 11), (line:1, col:12)
Match name at loc 11(1,12)
Exception raised:Expected name (at char 11), (line:1, col:12)
Match number at loc 11(1,12)
Matched number -> ['2']
Pyparsing also supports packrat parsing, a sort of parse-time memoization (read more about packratting here). Here is the same parsing sequence, but with packrat enabled:
same parse, but with packrat parsing enabled
1234567890123
sin(pi/2)
Match name at loc 4(1,5)
Matched name -> ['sin']
Match name at loc 8(1,9)
Matched name -> ['pi']
Match name at loc 8(1,9)
Matched name -> ['pi']
Match name at loc 11(1,12)
Exception raised:Expected name (at char 11), (line:1, col:12)
Match name at loc 11(1,12)
Exception raised:Expected name (at char 11), (line:1, col:12)
Match number at loc 11(1,12)
Matched number -> ['2']
This was an interesting exercise, and helpful to me to see debugging features from other parser libraries.
Answered by: Lydia374 | Posted: 01-01-2022Answer 4
ANTLR above has the advantage to generate human readable and understandable code, since it is (a very sophisticated and powerful) top-down parser, so you can step through it with a regular debugger and see what it really is doing.
That's why it is my parser generator of choice.
Bottom up parser generators like PLY have the disadvantage that for larger grammars it is almost impossible to understand what the debugging output really means and why the parsing table is like it is.
Answered by: Maddie678 | Posted: 01-01-2022Answer 5
Python wiki has a list of Language Parsers written in Python.
Answered by: Luke852 | Posted: 01-01-2022Similar questions
python - Which of these scripting languages is more appropriate for pen-testing?
Closed. This question is opinion-based. It is not c...
Would python be an appropriate choice for a video library for home use software
I am thinking of creating a video library software which keep track of all my videos and keep track of videos that I already haven't watched and stats like this. The stats will be specific to each user using the software.
My question is, is python appropriate to create this software or do I need something like c++.
Is Python appropriate for algorithms focused on scientific computing?
python - How do I determine the appropriate check interval?
I'm just starting to work on a tornado application that is having some CPU issues. The CPU time will monotonically grow as time goes by, maxing out the CPU at 100%. The system is currently designed to not block the main thread. If it needs to do something that blocks and asynchronous drivers aren't available, it will spawn another thread to do the blocking operation.
Thus we have the main thread being almost tot...
arrays - Most appropriate data structure (Python)
I'm new to Python and have what is probably a very basic question about the 'best' way to store data in my code. Any advice much appreciated!
I have a long .csv file in the following format:
Scenario,Year,Month,Value
1,1961,1,0.5
1,1961,2,0.7
1,1961,3,0.2
etc.
My scenario values run from 1 to 100, year goes from 1961 to 1990 and month goes from 1 to 12. My file therefore has 100*29...
python - Numpy time based vector operations where state of preceding elements matters - are for loops appropriate?
What do numpy arrays provide when performing time based calculations where state matters. In other words, where what has occurred in earlier or later in a sequence is important.
Consider the following time based vectors,
TIME = np.array([0., 10., 20., 30., 40., 50., 60., 70., 80., 90.])
FLOW = np.array([100., 75., 60., 20.0, 60.0, 50.0, 20.0, 30.0, 20.0, 10.0])
TEMP = np.array([300., 310...
python extend or append a list when appropriate
Is there a simple way to append a list if X is a string, but extend it if X is a list? I know I can simply test if an object is a string or list, but I was wondering if there is a quicker way than this?
python - Finding appropriate cut-off values
I try to implement Hampel tanh estimators to normalize highly asymmetric data. In order to do this, I need to perform the following calculation:
Given x - a sorted list of numbers and...
python - What is an appropriate way to datamine the total number of results of a keyword search?
newbie programmer and lurker here, hoping for some sensible advice. :)
Using a combination of Python, BeautifulSoup, and the Bing API, I was able to find what I wanted with the following code:
import urllib2
from BeautifulSoup import BeautifulStoneSoup
Appid = #My Appid
query = #My query
soup = BeautifulStoneSoup(urllib2.urlopen("http://api.search.live.net/xml.aspx?Appid=" + Appid + "&query=" ...
python - Defining appropriate number of processes
I have a python code treating a lot of apache logs (decompress, parse, crunching numbers, regexping etc). One parent process which takes a list of files (up to few millions), and sends a list of files to parse to workers, using multiprocess pool.
I wonder, if there is any guidelines / benchmarks / advices which can help me to estimate ideal number of child process ? Ie. having one process per core...
Still can't find your answer? Check out these communities...
PySlackers | Full Stack Python | NHS Python | Pythonist Cafe | Hacker Earth | Discord Python