Is there a library similar to pyparsing in Java? [closed]

I need to quickly build a parser for a very simplified version of an HTML-like markup language in Java. In Python, I would use the pyparsing library to do this. Is there something similar for Java? Please don't suggest existing HTML parsing libraries: my application is a school assignment that will demonstrate walking a tree of objects and serializing it to text using the visitor pattern, so I'm not thinking in real-world terms here. Basically, all I need is tags, attributes and text nodes.

Asked by: Walter650 | Posted: 28-01-2022

Answer 1

Another good parser generator is ANTLR; that might be what you're looking for.

Answered by: Carlos237 | Posted: 01-03-2022

Answer 2

It may be overkill for your use, but JavaCC is an excellent industrial-strength parser generator. I've used it several times; it's reliable and worth learning, particularly if you are going to work with languages and compilers. Here's the description of the program from its website:

Java Compiler Compiler [tm] (JavaCC [tm]) is the most popular parser generator for use with Java [tm] applications. A parser generator is a tool that reads a grammar specification and converts it to a Java program that can recognize matches to the grammar. In addition to the parser generator itself, JavaCC provides other standard capabilities related to parser generation such as tree building (via a tool called JJTree included with JavaCC), actions, debugging, etc.

Answered by: Darcy989 | Posted: 01-03-2022

Answer 3

A quick search for parser generators in Java yields JParsec. I've never used it - but it's inspired by a Haskell library, so by definition it must be good:-)

Answered by: Thomas475 | Posted: 01-03-2022

Answer 4

I like JParsec (which I just discovered thanks to Torsten) because it doesn't generate code... :-) Perhaps less efficient, but enough for small tasks.
I found a similar library, JTopas.

There is a good list of parsers (generators or not) at Java Source.
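
To illustrate what "doesn't generate code" means in practice, here is a minimal hand-rolled sketch of the parser-combinator idea in plain Java. It does not use JParsec or reproduce its API; the names `MiniCombinators`, `Parser`, `Result`, `literal`, `word`, `between` and `many` are all made up for this example, and failure is simply signalled with `null`.

```java
import java.util.ArrayList;
import java.util.List;

public class MiniCombinators {
    /** A successful parse: the produced value and the next input position. */
    static final class Result<T> {
        final T value;
        final int next;
        Result(T value, int next) { this.value = value; this.next = next; }
    }

    /** A parser returns a Result on success or null on failure (hypothetical design). */
    interface Parser<T> {
        Result<T> parse(String in, int pos);
    }

    /** Matches the literal string s at the current position. */
    static Parser<String> literal(String s) {
        return (in, pos) -> in.startsWith(s, pos)
                ? new Result<String>(s, pos + s.length())
                : null;
    }

    /** Matches one or more letters or digits. */
    static Parser<String> word() {
        return (in, pos) -> {
            int end = pos;
            while (end < in.length() && Character.isLetterOrDigit(in.charAt(end))) {
                end++;
            }
            return end > pos ? new Result<String>(in.substring(pos, end), end) : null;
        };
    }

    /** Runs open, body, close in sequence and keeps only the body's value. */
    static <T> Parser<T> between(Parser<?> open, Parser<T> body, Parser<?> close) {
        return (in, pos) -> {
            Result<?> o = open.parse(in, pos);
            if (o == null) return null;
            Result<T> b = body.parse(in, o.next);
            if (b == null) return null;
            Result<?> c = close.parse(in, b.next);
            return c == null ? null : new Result<T>(b.value, c.next);
        };
    }

    /** Applies p zero or more times; p must consume input on each success. */
    static <T> Parser<List<T>> many(Parser<T> p) {
        return (in, pos) -> {
            List<T> values = new ArrayList<>();
            int cur = pos;
            Result<T> r;
            while ((r = p.parse(in, cur)) != null) {
                values.add(r.value);
                cur = r.next;
            }
            return new Result<List<T>>(values, cur);
        };
    }

    public static void main(String[] args) {
        // A "tag" here is just <name>; attributes and closing tags are left out.
        Parser<String> tag = between(literal("<"), word(), literal(">"));
        Result<List<String>> r = many(tag).parse("<a><b><c>", 0);
        System.out.println(r.value); // prints [a, b, c]
    }
}
```

The grammar is built from ordinary Java objects at runtime, much as pyparsing builds it from Python objects, so there is no separate generation step. A real combinator library adds error reporting, alternatives and backtracking, which this sketch omits.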

Answered by: Brooke949 | Posted: 01-03-2022

Answer 5

There are quite a number of choices for string handling in Java. Maybe the very basic java.util.Scanner and java.util.StringTokenizer classes would be helpful for you?

Another good choice may be the org.apache.commons.lang.text package.
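
As a rough sketch of how far java.util.StringTokenizer alone can go, the example below splits a simplified markup string into tag and text tokens. The class name `MarkupTokens`, the `tokenize` method and the `TAG:`/`TEXT:` prefixes are invented for illustration, and the sketch assumes that `<` and `>` never appear inside attribute values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class MarkupTokens {
    /** Splits input into "TAG:..." and "TEXT:..." entries (hypothetical format). */
    static List<String> tokenize(String input) {
        List<String> out = new ArrayList<>();
        // returnDelims = true, so '<' and '>' come back as tokens of their own
        StringTokenizer st = new StringTokenizer(input, "<>", true);
        boolean inTag = false;
        while (st.hasMoreTokens()) {
            String tok = st.nextToken();
            if (tok.equals("<")) {
                inTag = true;
            } else if (tok.equals(">")) {
                inTag = false;
            } else if (inTag) {
                // Tag name plus raw attribute text, e.g. p class="x" or /p
                out.add("TAG:" + tok.trim());
            } else if (!tok.trim().isEmpty()) {
                // Text node; whitespace-only runs between tags are skipped
                out.add("TEXT:" + tok);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (String t : tokenize("<p class=\"x\">Hello <b>world</b></p>")) {
            System.out.println(t);
        }
    }
}
```

This only produces a flat token list. Building the tree of tag, attribute and text nodes for the visitor would still need a small stack or a recursive-descent step on top of it.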

Answered by: Rafael901 | Posted: 01-03-2022
