Python and "re"

A tutorial I have on regex in Python explains how to use the re module. I wanted to grab the URL out of an A tag, so, knowing regex, I wrote the correct expression, tested it in my regex testing app of choice, and confirmed that it worked. When placed into Python, it failed.

After much head scratching I found the issue: it automatically expects your pattern to be at the start of the string. I have found a fix, but I would like to know how to change:

regex = ".*(a_regex_of_pure_awesomeness)"

into

regex = "a_regex_of_pure_awesomeness"

Okay, it's a standard URL regex, but I wanted to avoid any potential confusion about what I wanted to get rid of, and possibly pretend to be funny.


Asked by: Kevin622 | Posted: 27-01-2022






Answer 1

In Python, there's a distinction between "match" and "search"; match only looks for the pattern at the start of the string, and search looks for the pattern starting at any location within the string.

Python regex docs
Matching vs searching
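
As a rough illustration of the distinction above, here is a minimal sketch applied to the asker's case. The sample HTML and the simple href pattern are made up for the example (they are not the asker's actual regex), and a real page may need a proper HTML parser instead.

```python
import re

# Hypothetical sample input and a deliberately simple pattern, for illustration only.
html = 'Click <a href="http://example.com/page">here</a> for more.'
href_pattern = re.compile(r'href="([^"]+)"')

# match() only tries the pattern at position 0, so it finds nothing here.
at_start = href_pattern.match(html)

# search() scans forward and returns the first match anywhere in the string.
anywhere = href_pattern.search(html)

print(at_start)            # None
print(anywhere.group(1))   # http://example.com/page
```

With search(), the leading ".*(" and trailing ")" wrapper from the question should no longer be needed.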

Answered by: Aida919 | Posted: 28-02-2022



Answer 2

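from bs4 import BeautifulSoup  # BeautifulSoup 4; the old `BeautifulSoup` package is Python 2 only

soup = BeautifulSoup(your_html, "html.parser")  # your_html: the HTML text to parse
for a in soup.find_all('a', href=True):
    # do something with `a` w/ href attribute
    print(a['href'])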

Answered by: Andrew758 | Posted: 28-02-2022



Answer 3

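>>> import re
>>> pattern = re.compile("url")
>>> string = "   url"
>>> pattern.match(string)    # no output: match() returns None because "url" is not at position 0
>>> pattern.search(string)   # search() finds "url" after the leading spaces
<_sre.SRE_Match object at 0xb7f7a6e8>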

Answered by: Thomas787 | Posted: 28-02-2022



Answer 4

Are you using re.match() or re.search()? My understanding is that re.match() behaves as if there were a "^" at the beginning of your expression, so it only matches at the beginning of the text, while re.search() acts more like Perl regular expressions: it scans the whole text and is only tied to the beginning if you include a "^" yourself. Hope that helps.
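
One caveat to the "^" comparison, sketched below with a made-up two-line string: as far as I know, the two are not quite equivalent once re.MULTILINE is involved, because "^" then also matches at the start of each line while match() stays tied to the very start of the string.

```python
import re

# Hypothetical two-line input, for illustration only.
text = "first line\nurl on the second line"

# search() with an explicit "^" is anchored to the start of the text.
plain = re.search(r"^url", text)

# With re.MULTILINE, "^" also matches right after each newline.
multiline = re.search(r"^url", text, re.MULTILINE)

# match() only ever tries position 0, even with re.MULTILINE.
matched = re.match(r"url", text, re.MULTILINE)

print(plain)                   # None
print(multiline is not None)   # True
print(matched)                 # None
```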

Answered by: Kate412 | Posted: 28-02-2022


