Is there a library similar to pyparsing in Java? [closed]
I need to quickly build a parser for a very simplified version of an HTML-like markup language in Java. In Python, I would use the pyparsing library for this. Is there something similar for Java? Please don't suggest existing HTML parsing libraries: my application is a school assignment that demonstrates walking a tree of objects and serializing it to text using the visitor pattern, so I'm not thinking in real-world terms here. Basically, all I need is tags, attributes and text nodes.
Asked by: Walter650 | Posted: 28-01-2022
Answer 1
Another good parser generator is ANTLR; that might be what you're looking for.
Answered by: Carlos237 | Posted: 01-03-2022
Answer 2
May be overkill for your use, but JavaCC is an excellent industrial-strength parser generator. I've used it several times; it's reliable and worth learning, particularly if you are going to work with languages and compilers. Here's the description of the program from its website:
"Java Compiler Compiler [tm] (JavaCC [tm]) is the most popular parser generator for use with Java [tm] applications. A parser generator is a tool that reads a grammar specification and converts it to a Java program that can recognize matches to the grammar. In addition to the parser generator itself, JavaCC provides other standard capabilities related to parser generation such as tree building (via a tool called JJTree included with JavaCC), actions, debugging, etc."
Answered by: Darcy989 | Posted: 01-03-2022
Answer 3
A quick search for parser generators in Java yields JParsec. I've never used it, but it's inspired by a Haskell library, so by definition it must be good :-)
Answered by: Thomas475 | Posted: 01-03-2022
Answer 4
I like JParsec (which I just discovered thanks to Torsten) because it doesn't generate code... :-) Perhaps less efficient, but enough for small tasks.
I found a similar library, JTopas.
There is a good list of parsers (generators or not) at Java Source.
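If the goal is to avoid code generation altogether, a grammar limited to tags, attributes and text nodes is also small enough to parse by hand. The sketch below is a plain recursive-descent parser plus a visitor that serializes the tree back to text. It does not use JParsec or JTopas, and every class name in it (Visitor, Node, Text, Element, TinyMarkupParser, Serializer) is made up for illustration. It assumes double-quoted attribute values, alphanumeric names, no self-closing tags and no entity escaping.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// All type names below are hypothetical; they do not come from any library.
interface Visitor {
    void visitElement(Element e);
    void visitText(Text t);
}

interface Node { void accept(Visitor v); }

class Text implements Node {
    final String value;
    Text(String value) { this.value = value; }
    public void accept(Visitor v) { v.visitText(this); }
}

class Element implements Node {
    final String name;
    final Map<String, String> attributes = new LinkedHashMap<>();
    final List<Node> children = new ArrayList<>();
    Element(String name) { this.name = name; }
    public void accept(Visitor v) { v.visitElement(this); }
}

// Recursive-descent parser: one method per grammar rule, one cursor into the input.
class TinyMarkupParser {
    private final String src;
    private int pos = 0;
    private TinyMarkupParser(String src) { this.src = src; }

    // Parses exactly one root element and rejects trailing input.
    static Element parse(String src) {
        TinyMarkupParser p = new TinyMarkupParser(src);
        Element root = p.element();
        if (p.pos != src.length()) throw new IllegalArgumentException("trailing input at " + p.pos);
        return root;
    }

    // element := '<' name (name '=' '"' value '"')* '>' (element | text)* '</' name '>'
    private Element element() {
        expect('<');
        Element e = new Element(name());
        skipSpaces();
        while (peek() != '>') {
            String key = name();
            expect('=');
            expect('"');
            int end = src.indexOf('"', pos);
            if (end < 0) throw new IllegalArgumentException("unterminated attribute value");
            e.attributes.put(key, src.substring(pos, end));
            pos = end + 1;
            skipSpaces();
        }
        expect('>');
        while (!src.startsWith("</", pos)) {
            if (peek() == '<') e.children.add(element());
            else e.children.add(text());
        }
        pos += 2; // consume "</"
        String closing = name();
        if (!closing.equals(e.name)) throw new IllegalArgumentException("mismatched </" + closing + ">");
        expect('>');
        return e;
    }

    // text := every character up to the next '<'
    private Text text() {
        int start = pos;
        while (pos < src.length() && src.charAt(pos) != '<') pos++;
        return new Text(src.substring(start, pos));
    }
    private String name() {
        int start = pos;
        while (pos < src.length() && Character.isLetterOrDigit(src.charAt(pos))) pos++;
        if (start == pos) throw new IllegalArgumentException("name expected at " + pos);
        return src.substring(start, pos);
    }
    private char peek() {
        if (pos >= src.length()) throw new IllegalArgumentException("unexpected end of input");
        return src.charAt(pos);
    }
    private void expect(char c) {
        if (peek() != c) throw new IllegalArgumentException("expected '" + c + "' at " + pos);
        pos++;
    }
    private void skipSpaces() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
    }
}

// Visitor that writes the tree back out as markup text.
class Serializer implements Visitor {
    final StringBuilder out = new StringBuilder();
    public void visitText(Text t) { out.append(t.value); }
    public void visitElement(Element e) {
        out.append('<').append(e.name);
        e.attributes.forEach((k, v) -> out.append(' ').append(k).append("=\"").append(v).append('"'));
        out.append('>');
        for (Node child : e.children) child.accept(this);
        out.append("</").append(e.name).append('>');
    }
}
```

To try it, call TinyMarkupParser.parse(...) on a string, pass a new Serializer to the root's accept method, and read the result from the serializer's out field. For input that is already in this normalized form, the output should match the input.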
Answered by: Brooke949 | Posted: 01-03-2022
Answer 5
There are quite a number of choices for string handling in Java.
Maybe the very basic java.util.Scanner and java.util.StringTokenizer classes would be helpful for you?
Another good choice may be the org.apache.commons.lang.text package:
http://commons.apache.org/lang/apidocs/org/apache/commons/lang/text/package-summary.html
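As a rough illustration of the java.util.StringTokenizer suggestion, the sketch below splits a piece of markup on the angle brackets and keeps the brackets as tokens of their own, which is about as far as the class goes without extra logic. The class name MarkupTokens and the sample input are made up for the example. Attributes are not split out here and would need a second pass.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper class, not part of any library.
class MarkupTokens {
    // Splits markup into tokens, returning '<' and '>' as separate tokens.
    static List<String> tokenize(String markup) {
        List<String> tokens = new ArrayList<>();
        // The third argument (returnDelims = true) makes the tokenizer
        // hand back each delimiter character as a one-character token.
        StringTokenizer st = new StringTokenizer(markup, "<>", true);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints: [<, b class="x", >, hello, <, /b, >]
        System.out.println(tokenize("<b class=\"x\">hello</b>"));
    }
}
```

A parser would still have to interpret the token stream (for example, treating whatever sits between "<" and ">" as a tag), so this only covers the lexing half of the job.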
Similar questions
python - pyparsing - load ABNF?
Can pyparsing read ABNF from a file instead of having to define it in terms of Python objects?
If not, is there something that can do something similar (load an ABNF file into a parser object)?
python - How do I parse indents and dedents with pyparsing?
Here is a subset of the Python grammar:
single_input: NEWLINE | simple_stmt | compound_stmt NEWLINE
stmt: simple_stmt | compound_stmt
simple_stmt: small_stmt (';' small_stmt)* [';'] NEWLINE
small_stmt: pass_stmt
pass_stmt: 'pass'
compound_stmt: if_stmt
if_stmt: 'if' test ':' suite ('elif' test ':' suite)* ['else' ':' suite]
suite: simple_stmt | NEWLINE INDENT stmt+ DEDENT
(You can read...
python - Find following tag with pyparsing
I'm using pyparsing to parse HTML. I'm grabbing all embed tags, but in some cases there's an a tag directly following that I also want to grab if it's available.
example:
import pyparsing
target = pyparsing.makeHTMLTags("embed")[0]
target.setParseAction(pyparsing.withAttribute(src=pyparsing.withAttribute.ANY_VALUE))
target.ignore(pyparsing.htmlComment)
result = target.sear...
python - pyparsing question
This code works:
from pyparsing import *
# Raw strings keep the regex backslashes intact.
zipRE = r"\d{5}(?:[-\s]\d{4})?"
fooRE = r"^\!\s+.*"
zipcode = Regex( zipRE )
foo = Regex( fooRE )
query = ( zipcode | foo )
tests = [ "80517", "C6H5OH", "90001-3234", "! sfs" ]
for t in tests:
    try:
        results = query.parseString( t )
        print(t, "->", results)
    except ParseException as pe:
        print(pe)
I'm stuck on two issu...
python - Pyparsing CSV string with random quotes
I have a string like the following:
<118>date=2010-05-09,time=16:41:27,device_id=FE-2KA3F09000049,log_id=0400147717,log_part=00,type=statistics,subtype=n/a,pri=information,session_id=o49CedRc021772,from="prvs=4745cd07e1=example@example.org",mailer="mta",client_name="example.org,[194.177.17.24]",resolved=OK,to="example@example.org",direction="in",message_length=6832079,virus="",disposition="Accept",cla...
python - pyparsing ambiguity
I'm trying to parse some text using pyparsing. The problem is that I have names that can contain white spaces. So my input might look like this. First, a list of names:
Joe
bob
Jimmy X
grjiaer-rreaijgr Y
Then, things they do:
Joe A
bob B
Jimmy X C
The problem, of course, is that a thing they do can be the same as the end of the name:
Jimmy X X...
python - what next after pyparsing?
I have a huge grammar developed for pyparsing as part of a large, pure Python application.
I have reached the limit of performance tweaking and I'm at the point where the diminishing returns make me start to look elsewhere. Yes, I think I know most of the tips and tricks and I've profiled my grammar and my application to dust.
What next?
I hope to find a parser that gives me the same readability, usability...
python - Pyparsing problem with operators
I wrote a grammar with pyparsing, and I have a problem.
The grammar tries to parse a search query (with operator precedence, parentheses, etc.), and I need spaces to work like the and operator.
For example, this works fine:
(word and word) or word
But this fails:
(word word) or word
And I want the second query to work like the first one.
...
python - Matching nonempty lines with pyparsing
I am trying to make a small application which uses pyparsing to extract data from files produced by another program.
These files have the following format:
SOME_KEYWORD:
line 1
line 2
line 3
line 4
ANOTHER_KEYWORD:
line a
line b
line c
How can I construct a grammar which will help to extract line 1, line 2 ... line 4 and line a
python - PyParsing OR statement
This is going to end up being really simple, but I'm trying to match one of the two patterns:
"GET /ligonier-broadcast-media/mp3/rym20110421.mp3 HTTP/1.1"
or
-
I've tried something like this:
key = Word(alphas + nums + "/" + "-" + "_" + "." + "?" + "=" + "%" + "&")
uri = Or("-" | Group(
Suppress("\"") +
...