Regular expressions but for writing in the match

When using regular expressions we generally, if not always use them to extract some kind of information. What I need is to replace the match value with some other value...

Right now I'm doing this...

def getExpandedText(pattern, text, replaceValue):
    """
        One liner... really ugly but it's only used in here.
    """

    return text.replace(text[text.find(re.findall(pattern, text)[0]):], replaceValue) + \
            text[text.find(re.findall(pattern, text)[0]) + len(replaceValue):]

so if I do sth like

>>> getExpandedText("aaa(...)bbb", "hola aaaiiibbb como estas?", "ooo")
'hola aaaooobbb como estas?'

It changes the (...) with 'ooo'.

Do you guys know whether with python regular expressions we can do this?

thanks a lot guys!!


Asked by: Sophia802 | Posted: 06-10-2021






Answer 1

sub (replacement, string[, count = 0])

sub returns the string obtained by replacing the leftmost non-overlapping occurrences of the RE in string by the replacement replacement. If the pattern isn't found, string is returned unchanged.

    p = re.compile( '(blue|white|red)')
    >>> p.sub( 'colour', 'blue socks and red shoes')
    'colour socks and colour shoes'
    >>> p.sub( 'colour', 'blue socks and red shoes', count=1)
    'colour socks and red shoes'

Answered by: Kimberly729 | Posted: 07-11-2021



Answer 2

You want to use re.sub:

>>> import re
>>> re.sub(r'aaa...bbb', 'aaaooobbb', "hola aaaiiibbb como estas?")
'hola aaaooobbb como estas?'

To re-use variable parts from the pattern, use \g<n> in the replacement string to access the n-th () group:

>>> re.sub( "(svcOrdNbr +)..", "\g<1>XX", "svcOrdNbr               IASZ0080")
'svcOrdNbr               XXSZ0080'

Answered by: Alina418 | Posted: 07-11-2021



Answer 3

Of course. See the 'sub' and 'subn' methods of compiled regular expressions, or the 're.sub' and 're.subn' functions. You can either make it replace the matches with a string argument you give, or you can pass a callable (such as a function) which will be called to supply the replacement. See https://docs.python.org/library/re.html

Answered by: Daniel966 | Posted: 07-11-2021



Answer 4

If you want to continue using the syntax you mentioned (replace the match value instead of replacing the part that didn't match), and considering you will only have one group, you could use the code below.

def getExpandedText(pattern, text, replaceValue):
    m = re.search(pattern, text)
    expandedText = text[:m.start(1)] + replaceValue + text[m.end(1):]
    return expandedText

Answered by: Lily316 | Posted: 07-11-2021



Answer 5

def getExpandedText(pattern,text,*group):
    r""" Searches for pattern in the text and replaces
    all captures with the values in group.

    Tag renaming:
    >>> html = '<div> abc <span id="x"> def </span> ghi </div>'
    >>> getExpandedText(r'</?(span\b)[^>]*>', html, 'div')
    '<div> abc <div id="x"> def </div> ghi </div>'

    Nested groups, capture-references:
    >>> getExpandedText(r'A(.*?Z(.*?))B', "abAcdZefBgh", r'<\2>')
    'abA<ef>Bgh'
    """
    pattern = re.compile(pattern)
    ret = []
    last = 0
    for m in pattern.finditer(text):
        for i in xrange(0,len(m.groups())):
            start,end = m.span(i+1)

            # nested or skipped group
            if start < last or group[i] is None:
                continue

            # text between the previous and current match
            if last < start:
                ret.append(text[last:start])

            last = end
            ret.append(m.expand(group[i]))

    ret.append(text[last:])
    return ''.join(ret)

Edit: Allow capture-references in the replacement strings.

Answered by: Ada571 | Posted: 07-11-2021



Similar questions

python - Regular Expressions in unicode strings

I have some unicode text that I want to clean up using regular expressions. For example I have cases where u'(2'. This exists because for formatting reasons the closing paren ends up in an adjacent html cell. My initial solution to this problem was to look ahead at the contents of the next cell and using a string function determine if it held the closing paren. I knew this was not a great solution but it worked. Now I...


regex - Python regular expressions with more than 100 groups?

Is there any way to beat the 100-group limit for regular expressions in Python? Also, could someone explain why there is a limit.


regex - Speed of many regular expressions in python

I'm writing a python program that deals with a fair amount of strings/files. My problem is that I'm going to be presented with a fairly short piece of text, and I'm going to need to search it for instances of a fairly broad range of words/phrases. I'm thinking I'll need to compile regular expressions as a way of matching these words/phrases in the text. My concern, however, is that this will take a lot of time....


c++ - How can you detect if two regular expressions overlap in the strings they can match?

I have a container of regular expressions. I'd like to analyze them to determine if it's possible to generate a string that matches more than 1 of them. Short of writing my own regex engine with this use case in mind, is there an easy way in C++ or Python to solve this problem?


scrapy - Need help with the regular expressions in Python

Help please to make from the string like: &lt;a href="http://testsite.com" class="className"&gt;link_text_part1 &lt;em&gt;another_text&lt;/em&gt; link_text_part2&lt;/a&gt; string like: link_text_part1 another_text link_text_part2 using regular expressions in Python !note testsite.com changes


python - How to wrap long lines in a text using Regular Expressions when you also need to indent the wrapped lines?

How can one change the following text The quick brown fox jumps over the lazy dog. to The quick brown fox + jumps over the + lazy dog. using regex? UPDATE1 A solution for Ruby is still missing... A simple one I came to so far is def textwrap text, width, indent="\n" return text.split("\n").collect do |line| l...


strip - How to remove tags from a string in python using regular expressions? (NOT in HTML)

I need to remove tags from a string in python. &lt;FNT name="Century Schoolbook" size="22"&gt;Title&lt;/FNT&gt; What is the most efficient way to remove the entire tag on both ends, leaving only "Title"? I've only seen ways to do this with HTML tags, and that hasn't worked for me in python. I'm using this particularly for ArcMap, a GIS program. It has it's own tags for its layout elements,...


python - How do I use regular expressions to parse HTML tags?

Was wondering how I would extrapolate the value of an html element using a regular expression (in python preferably). For example, &lt;a href="http://google.com"&gt; Hello World! &lt;/a&gt; What regex would I use to extract Hello World! from the above html?


regex - How to change file names using regular expressions in Python?

I have a directory which contains subdirectories which contain files. All the file names have a prefix which I want to eliminate. The prefix is not exactly the same among all the files, but I have a regular expression that represents exactly the language of these prefixes. I'm trying to write a script in Python to change the name of each file to its name without the prefix. I don't know yet how to "play" with files in Pyth...


python - Regular Expressions - testing if a String contains another String

Suppose you have some this String (one line) 10.254.254.28 - - [06/Aug/2007:00:12:20 -0700] "GET /keyser/22300/ HTTP/1.0" 302 528 "-" "Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.8.1.4) Gecko/20070515 Firefox/2.0.0.4" and you want to extract the part between the GET and HTTP (i.e., some url) but only if it contains the word 'puzzle'. How would you do that using...


python - How do I process a string such as this using regular expressions?

How can I create a regex for a string such as this: &lt;SERVER&gt; &lt;SERVERKEY&gt; &lt;COMMAND&gt; &lt;FOLDERPATH&gt; &lt;RETENTION&gt; &lt;TRANSFERMODE&gt; &lt;OUTPUTPATH&gt; &lt;LOGTO&gt; &lt;OPTIONAL-MAXSIZE&gt; &lt;OPTIONAL-OFFSET&gt; Most of these fields are just simple words, but some of them can be paths, such as FOLDERPATH, OUTPUTPATH, these paths can also be paths with a filenam...


python - Regular Expressions in unicode strings

I have some unicode text that I want to clean up using regular expressions. For example I have cases where u'(2'. This exists because for formatting reasons the closing paren ends up in an adjacent html cell. My initial solution to this problem was to look ahead at the contents of the next cell and using a string function determine if it held the closing paren. I knew this was not a great solution but it worked. Now I...


regex - Python regular expressions - how to capture multiple groups from a wildcard expression?

I have a Python regular expression that contains a group which can occur zero or many times - but when I retrieve the list of groups afterwards, only the last one is present. Example: re.search("(\w)*", "abcdefg").groups() this returns the list ('g',) I need it to return ('a','b','c','d','e','f','g',) Is that possible? How can I do it?


python - Matching a pair of comments in HTML using regular expressions

I have a mako template that looks something like this: % if staff: &lt;!-- begin staff --&gt; ... &lt;!-- end staff --&gt; % endif That way if I pass the staff variable as being True, those comments should appear. I'm trying to test this by using a regular expression that looks like this: re.search('&lt;!-- begin staff --&gt;.*&lt;!-- end staff --&gt;', text)


regex - Parsing template schema with Python and Regular Expressions

I'm working on a script for work to extract data from an old template engine schema: [%price%] { $54.99 } [%/price%] [%model%] { WRT54G } [%/model%] [%brand%]{ LINKSYS } [%/brand%] everything within the [% %] is the key, and everything in the { } is the value. Using Python and regex, I was able to get this far: (?&lt;=[%)(?P\w*?)(?=\%]) which returns ['price', 'model', 'brand'] ...


regex - Python regular expressions with more than 100 groups?

Is there any way to beat the 100-group limit for regular expressions in Python? Also, could someone explain why there is a limit.


python - Matching blank lines with regular expressions

I've got a string that I'm trying to split into chunks based on blank lines. Given a string s, I thought I could do this: re.split('(?m)^\s*$', s) This works in some cases: &gt;&gt;&gt; s = 'foo\nbar\n \nbaz' &gt;&gt;&gt; re.split('(?m)^\s*$', s) ['foo\nbar\n', '\nbaz'] But it doesn't work if the line is completely empty:


object - Evaluation of boolean expressions in Python

What truth value do objects evaluate to in Python? Related Questions Boolean Value of Objects in Python: Discussion about overriding the way it is evaluated


Nesting generator expressions in the argument list for a python function call

I like to use the following idiom for combining lists together, sometimes: &gt;&gt;&gt; list(itertools.chain(*[[(e, n) for e in l] for n, l in (('a', [1,2]),('b',[3,4]))])) [(1, 'a'), (2, 'a'), (3, 'b'), (4, 'b')] (I know there are easier ways to get this particular result, but it comes comes in handy when you want to iterate over the elements in lists of lists of lists, or something like ...


python - Renaming contents of text file using Regular Expressions

I have a text file with several lines in the following format: gatename #outputs #inputs list_of_inputs_separated_by_spaces * gate_id example: nand 3 2 10 11 * G0 (The two inputs to the nand gate are 10 and 11) or 2 1 10 * G1 (The only input to the or gate is gate 10) What I need to do is rename the contents such that I eliminate the #outputs column so that the end result is:






Still can't find your answer? Check out these communities...



PySlackers | Full Stack Python | NHS Python | Pythonist Cafe | Hacker Earth | Discord Python



top