Extracting a parenthesized Python expression from a string

I've been wondering about how hard it would be to write some Python code to search a string for the index of a substring of the form ${expr}, for example, where expr is meant to be a Python expression or something resembling one. Given such a thing, one could easily imagine going on to check the expression's syntax with compile(), evaluate it against a particular scope with eval(), and perhaps even substitute the result into the original string. People must do very similar things all the time.
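Here is a minimal sketch of the compile()/eval()/substitute pipeline described above. It assumes the naive rule that an expression runs from ${ to the first }. The helper name substitute and the pattern are illustrative and do not come from any library. eval() runs arbitrary code, so this is only suitable for trusted input.

```python
import re

# Illustrative pattern: '${', then everything up to the first '}'.
EXPR_RE = re.compile(r'\$\{([^}]+)\}')


def substitute(template, scope):
    """Hypothetical helper: replace each ${expr} with str(eval(expr))."""
    def replace(match):
        # 'eval' mode accepts a single expression only; bad syntax
        # raises SyntaxError here, before anything is evaluated.
        code = compile(match.group(1), '<template>', 'eval')
        # Evaluate in a copy of the caller's scope so that eval() cannot
        # add keys (such as __builtins__) to the caller's dict.
        return str(eval(code, dict(scope)))
    return EXPR_RE.sub(replace, template)


print(substitute('Total: ${price * qty} units', {'price': 3, 'qty': 4}))
# Total: 12 units
```

The up-front compile() keeps "is it syntactically valid?" separate from "what does it evaluate to?". That separation is what the question asks for.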

I could imagine solving such a problem using a third-party parser generator [oof], or by hand-coding some sort of state machine [eek], or perhaps by convincing Python's own parser to do the heavy lifting somehow [hmm]. Maybe there's a third-party templating library somewhere that can be made to do exactly this. Perhaps restricting the syntax of expr would be a worthwhile compromise in terms of simplicity, execution time, or cutting down on external dependencies; for example, maybe all I really need is something that matches any expr that has balanced curly braces.
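If balanced curly braces really are enough, the scan is short. This sketch uses a hypothetical helper named find_balanced. It assumes the opening delimiter ends in {, and it counts every brace, including braces inside string literals or comments. That is exactly the restriction on expr being proposed.

```python
def find_balanced(s, start=0, begin='${'):
    """Hypothetical helper: return (i0, i1) such that s[i0:i1] is the
    text between the first `begin` at or after `start` and the '}'
    that balances it.  Assumes `begin` ends with '{'.  Every brace is
    counted, even inside string literals or comments.  Raises
    ValueError if `begin` is missing or the braces never balance."""
    i0 = s.index(begin, start) + len(begin)
    depth = 1  # the '{' that ends `begin` is already open
    for i in range(i0, len(s)):
        if s[i] == '{':
            depth += 1
        elif s[i] == '}':
            depth -= 1
            if depth == 0:
                return i0, i
    raise ValueError('unbalanced braces after index %d' % i0)


s = 'foo ${{1: 2}[1]} bar'
i0, i1 = find_balanced(s)
print(s[i0:i1])  # {1: 2}[1]
```

An expression such as {'}': 1} would be mis-measured, because the } inside the string literal is counted.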

What's your sense?

Update:

Thanks very much for your responses so far! Looking back at what I wrote yesterday, I'm not sure I was sufficiently clear about what I'm asking. Template substitution is indeed an interesting problem, and probably much more useful to many more people than the expression extraction subproblem I'm wondering about, but I brought it up only as a simple example of how the answer to my question might be useful in real life. Some other potential applications might include passing the extracted expressions to a syntax highlighter; passing the result to a real Python parser and looking at or monkeying with the parse tree; or using the sequence of extracted expressions to build up a larger Python program, perhaps in conjunction with some information taken from the surrounding text.
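For the parse-tree use case, the standard library's ast module can accept an extracted expression directly. The sketch below lists the variable names an expression reads and rewrites them. The helper names are hypothetical, and ast.unparse assumes Python 3.9 or later.

```python
import ast


def names_in(expr):
    """Return the sorted variable names that expr reads."""
    tree = ast.parse(expr, mode='eval')  # SyntaxError if not an expression
    return sorted({node.id for node in ast.walk(tree)
                   if isinstance(node, ast.Name)})


class _Prefixer(ast.NodeTransformer):
    """Hypothetical rewrite: prepend a prefix to every variable name."""

    def __init__(self, prefix):
        self.prefix = prefix

    def visit_Name(self, node):
        return ast.copy_location(
            ast.Name(id=self.prefix + node.id, ctx=node.ctx), node)


def prefix_names(expr, prefix):
    tree = _Prefixer(prefix).visit(ast.parse(expr, mode='eval'))
    return ast.unparse(tree.body)  # ast.unparse needs Python 3.9+


print(names_in('a*b + c*d'))            # ['a', 'b', 'c', 'd']
print(prefix_names('a*b + c', 'ctx_'))  # ctx_a * ctx_b + ctx_c
```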

The ${expr} syntax I mentioned is also intended as an example. In fact, I wonder whether I should have used $(expr) as my example instead, because it makes the potential drawbacks of the obvious approach, along the lines of re.finditer(r'\$\{([^}]+)\}', s), a bit easier to see: Python expressions can (and often do) contain the ) (or }) character, and a pattern like that stops at the first one. It seems possible that handling such expressions properly might be much more trouble than it's worth, but I'm not convinced of that yet. Please feel free to try to make this case!
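This small demonstration shows the drawback. It uses the obvious pattern with its $ escaped; an unescaped $ is an end-of-string anchor, so the pattern would never match ${. Each match stops at the first closing delimiter, even when that delimiter belongs to the expression.

```python
import re

BRACES = re.compile(r'\$\{([^}]+)\}')
PARENS = re.compile(r'\$\(([^)]+)\)')

brace_hits = [m.group(1) for m in BRACES.finditer('a ${ {"k": 1}["k"] } b')]
paren_hits = [m.group(1) for m in PARENS.finditer('total: $(f(x) + 1)')]

print(brace_hits)  # [' {"k": 1']
print(paren_hits)  # ['f(x']
```

Both captured fragments are truncated, and neither compiles as Python.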

Prior to posting this question, I spent quite a bit of time looking at Python template engines hoping that one might expose the sort of low-level functionality I'm asking about -- namely, something that can find expressions in a variety of contexts and tell me where they are rather than being limited to finding expressions embedded using a single hard-coded syntax, always evaluating them, and always substituting the results back into the original string. I haven't figured out how to use any of them to solve my problem yet, but I do very much appreciate the suggestions regarding more to look at (can't believe I missed that wonderful list on the wiki!). The API documentation for these things tends to be pretty high-level, and I'm not too familiar with the internals of any of them, so I'm sure I could use help looking at those and figuring out how to get them to do this sort of thing.

Thanks for your patience!


Asked by: Sam104 | Posted: 28-01-2022






Answer 1

I think what you're asking about is being able to insert Python code into text files to be evaluated. There are several modules that already exist to provide this kind of functionality. You can check the Python.org Templating wiki page for a comprehensive list.

Some Google searching also turned up a few other modules you might be interested in:

If you really just want to write this yourself for whatever reason, you can also dig into this Python Cookbook recipe, Yet Another Python Templating Utility (YAPTU):

"Templating" (copying an input file to output, on the fly inserting Python expressions and statements) is a frequent need, and YAPTU is a small but complete Python module for that; expressions and statements are identified by arbitrary user-chosen regular-expressions.

EDIT: Just for the heck of it, I whipped up a severely simplistic code sample for this. I'm sure it has bugs, but it illustrates a simplified version of the concept at least:

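#!/usr/bin/env python

import re
import sys

# Read the whole file named on the command line.
with open(sys.argv[1]) as handle:
    fcontent = handle.read()

# Capture everything between '${' and the next '}'. This simple pattern
# cannot cope with embedded code that itself contains '}'.
for myexpr in re.finditer(r'\$\{([^}]+)\}', fcontent):
    text = myexpr.group(1)
    try:
        # exec (not eval) so that statements such as def work too.
        # The one-argument exec(...) and print(...) forms below are
        # valid in both Python 2 and Python 3.
        exec(text)
    except SyntaxError:
        print("ERROR: unable to compile expression '%s'" % text)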

Tested under Python 2 (the embedded snippets use print statements) against the following text:

This is some random text, with embedded python like 
${print "foo"} and some bogus python like

${any:thing}.

And a multiline statement, just for kicks: 

${
def multiline_stmt(foo):
  print foo

multiline_stmt("ahem")
}

More text here.

Output:

[user@host]$ ./exec_embedded_python.py test.txt
foo
ERROR: unable to compile expression 'any:thing'
ahem

Answered by: Joyce869 | Posted: 01-03-2022



Answer 2

I think your best bet is to match all curly-braced entries and then ask Python itself whether each one is valid Python. For that, the compiler machinery (the built-in compile(), or the compiler module in Python 2) would be helpful.
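This is a sketch of the match-then-check idea, using the built-in compile() in 'eval' mode, which rejects statements. The helper name classify is hypothetical. The third sample shows the cost of matching braces first: a valid expression containing } gets cut short and is reported as invalid.

```python
import re


def classify(s, pattern=r'\$\{([^}]+)\}'):
    """Yield (text, ok) for each match of pattern in s, where ok tells
    whether the captured text compiles as a single Python expression."""
    for match in re.finditer(pattern, s):
        text = match.group(1)
        try:
            compile(text, '<match>', 'eval')
        except SyntaxError:
            yield text, False
        else:
            yield text, True


for text, ok in classify('${a+b} ${any:thing} ${{1: 2}[1]}'):
    print(repr(text), 'valid' if ok else 'invalid')
```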

Answered by: Melissa696 | Posted: 01-03-2022



Answer 3

If you want to handle arbitrary expressions like {'{spam': 42}["spam}"], you can't get away without a full-blown parser.
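Short of a full parser, one middle ground is to let Python's own tokenizer find the matching brace, so that braces inside string literals and comments are skipped. This is an illustration, not what this answer proposes. match_brace is a hypothetical helper, and it assumes the text up to the closing brace tokenizes as Python.

```python
import io
import tokenize


def match_brace(s, start):
    """Return the index of the '}' that closes the '{' at s[start].

    Uses Python's own tokenizer, so braces inside string literals and
    comments are ignored.  Assumes the text from s[start] to the
    matching '}' tokenizes as Python; raises ValueError otherwise.
    """
    if s[start] != '{':
        raise ValueError('expected "{" at index %d' % start)
    text = s[start:]
    # Offsets at which each line of `text` begins, so that the
    # tokenizer's (row, col) positions can be mapped back to indices.
    line_starts = [0]
    for i, ch in enumerate(text):
        if ch == '\n':
            line_starts.append(i + 1)
    depth = 0
    try:
        for tok in tokenize.generate_tokens(io.StringIO(text).readline):
            if tok.type != tokenize.OP:
                continue
            if tok.string == '{':
                depth += 1
            elif tok.string == '}':
                depth -= 1
                if depth == 0:
                    row, col = tok.start
                    return start + line_starts[row - 1] + col
    except (tokenize.TokenError, SyntaxError) as exc:
        raise ValueError('text is not tokenizable: %s' % exc) from exc
    raise ValueError('no matching "}"')


expr = "{'{spam': 42}[\"spam}\"]"
s = 'x = ${' + expr + '} done'
start = s.index('${') + 1
end = match_brace(s, start)
print(s[start + 1:end] == expr)  # True
```

This still relies on the surrounding text not confusing the tokenizer before the closing brace, which is one reason a real parser is the more robust answer.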

Answered by: Adelaide601 | Posted: 01-03-2022



Answer 4

After posting this, reading the replies so far (thanks everyone!), and thinking about the problem for a while, here is the best approach I've been able to come up with:

  1. Find the first ${.
  2. Find the next } after that.
  3. Feed whatever's in between to compile(). If it works, stick a fork in it and we're done.
  4. Otherwise, keep extending the string to each subsequent occurrence of }. As soon as something compiles, return it.
  5. If we run out of } without being able to compile anything, use the results of the last compilation attempt to give information about where the problem lies.

Advantages of this approach:

  • The code is quite short and easy to understand.
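  • It's pretty efficient: optimal, even, in the case where the expression contains no }. In the worst case it makes one compile() call per } it has to skip, each on a longer string, so the total work grows roughly quadratically with the number of } inside the expression. For short expressions that doesn't seem too bad.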
  • It works on many expressions that contain ${ and/or }.
  • No external dependencies. No need to import anything, in fact. (This surprised me.)

Disadvantages:

  • Sometimes it grabs too much or too little. See below for an example of the latter. I could imagine a scary example where you have two expressions and the first one is subtly wrong and the algorithm ends up mistakenly grabbing the whole thing and everything in between and returning it as valid, though I haven't been able to demonstrate this. Perhaps things are not so bad as I fear. I don't think misunderstandings can be avoided in general -- the problem definition is kind of slippery -- but it seems like it ought to be possible to do better, especially if one were willing to trade simplicity or execution time.
  • I haven't done any benchmarks, but I could imagine there being faster alternatives, especially in cases that involve lots of } in the expression. That could be a big deal if one wanted to apply this technique to sizable blocks of Python code rather than just very short expressions.
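A rough way to see the worst case without a benchmark is to count the work. The sketch below is a hypothetical instrumented copy of the loop; count_attempts is not from the answer. It reports how many compile() calls the loop makes and how many characters those calls receive in total, for an expression whose string literal contains n } characters. It measures work, not wall-clock time.

```python
def count_attempts(s, begin='${', end='}'):
    """Hypothetical instrumentation of the grow-until-it-compiles loop.

    Returns (calls, chars): the number of compile() calls made and the
    total length of the strings passed to compile().
    """
    i0 = s.index(begin) + len(begin)
    i1 = s.index(end, i0)
    calls = chars = 0
    while i1 >= 0:
        expr = s[i0:i1]
        calls += 1
        chars += len(expr)
        try:
            compile(expr, '<string>', 'eval')
            break  # the first candidate that compiles wins
        except SyntaxError:
            i1 = s.find(end, i1 + 1)  # extend to the next end delimiter
    return calls, chars


for n in (10, 100, 1000):
    # A string literal holding n '}' characters: every '}' inside it
    # is a failed candidate before the real closing '}' is reached.
    print(n, count_attempts('${"' + '}' * n + '"}'))
```

For this input, calls grow linearly (n + 1), while total characters grow as n(n - 1)/2 + 2n + 2, which is the quadratic term.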

Here is my implementation.

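def findExpr(s, i0=0, begin='${', end='}', compArgs=('<string>', 'eval')):
    # Multi-line input would need line-number bookkeeping (see docstring).
    assert '\n' not in s, 'line numbers not implemented'
    # i0: index just past the opening delimiter; i1: first candidate end.
    i0 = s.index(begin, i0) + len(begin)
    i1 = s.index(end, i0)
    code = errMsg = None
    while code is None and errMsg is None:
        expr = s[i0:i1]
        try:
            code = compile(expr, *compArgs)
        except SyntaxError as e:
            # Not valid yet: extend the candidate to the next end delimiter.
            i1 = s.find(end, i1 + 1)
            if i1 < 0:
                # Out of delimiters: report the last compiler error.
                errMsg, i1 = e.msg, i0 + e.offset
    return i0, i1, code, errMsg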

And here's the docstring with some illustrations in doctest format, which I didn't insert into the middle of the function above only because it's long and I feel like the code is easier to read without it.

'''
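Search s for a (possibly invalid) Python expression bracketed by begin
and end, which default to '${' and '}'.  Return a 4-tuple
(i0, i1, code, errMsg).  The outputs below come from CPython 2; the
bytecode listing and the parse-error details may differ on other
versions.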

>>> s = 'foo ${a*b + c*d} bar'
>>> i0, i1, code, errMsg = findExpr(s)
>>> i0, i1, s[i0:i1], errMsg
(6, 15, 'a*b + c*d', None)
>>> ' '.join('%02x' % ord(byte) for byte in code.co_code)
'65 00 00 65 01 00 14 65 02 00 65 03 00 14 17 53'
>>> code.co_names
('a', 'b', 'c', 'd')
>>> eval(code, {'a': 1, 'b': 2, 'c': 3, 'd': 4})
14
>>> eval(code, {'a': 'a', 'b': 2, 'c': 'c', 'd': 4})
'aacccc'
>>> eval(code, {'a': None})
Traceback (most recent call last):
  ...
NameError: name 'b' is not defined

Expressions containing begin and/or end are allowed.

>>> s = '{foo ${{"}": "${"}["}"]} bar}'
>>> i0, i1, code, errMsg = findExpr(s)
>>> i0, i1, s[i0:i1], errMsg
(7, 23, '{"}": "${"}["}"]', None)

If the first match is syntactically invalid Python, i0 points to the
start of the match, i1 points to the parse error, code is None and
errMsg contains a message from the compiler.

>>> s = '{foo ${qwerty asdf zxcvbnm!!!} ${7} bar}'
>>> i0, i1, code, errMsg = findExpr(s)
>>> i0, i1, s[i0:i1], errMsg
(7, 18, 'qwerty asdf', 'invalid syntax')
>>> print(code)
None

If a second argument is given, start searching there.

>>> i0, i1, code, errMsg = findExpr(s, i1)
>>> i0, i1, s[i0:i1], errMsg
(33, 34, '7', None)

Raise ValueError if there are no further matches.

>>> i0, i1, code, errMsg = findExpr(s, i1)
Traceback (most recent call last):
  ...
ValueError: substring not found

In ambiguous cases, match the shortest valid expression.  This is not
always ideal behavior.

>>> s = '{foo ${x or {} # return {} instead of None} bar}'
>>> i0, i1, code, errMsg = findExpr(s)
>>> i0, i1, s[i0:i1], errMsg
(7, 25, 'x or {} # return {', None)

This implementation must not be used with multi-line strings.  It does
not adjust line number information in the returned code object, and it
does not take the line number into account when computing the offset
of a parse error.

'''
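The line-number limitation could be lifted along the following lines. This is a sketch using only the standard library, and error_index and compile_at are hypothetical names that are not part of findExpr. It converts the compiler's (lineno, offset) pair into an index, and it shifts the parsed tree's line numbers with ast.increment_lineno before compiling it.

```python
import ast


def error_index(expr, err):
    """Turn a SyntaxError's 1-based (lineno, offset) into a 0-based
    index into expr, counting every earlier line in full."""
    lineno = err.lineno or 1
    offset = err.offset or 1
    lines = expr.split('\n')
    # +1 for each '\n' removed by split().
    return sum(len(line) + 1 for line in lines[:lineno - 1]) + offset - 1


def compile_at(s, i0, i1, filename='<string>'):
    """Compile s[i0:i1] as an expression whose line numbers match its
    position in s (line 1 being the first line of s)."""
    tree = ast.parse(s[i0:i1], filename, mode='eval')
    # Shift every node down by the number of newlines before i0.
    ast.increment_lineno(tree, s.count('\n', 0, i0))
    return compile(tree, filename, 'eval')
```

Column offsets on the expression's first line are still relative to i0. A fuller version would adjust those too.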

Answered by: Elise758 | Posted: 01-03-2022


