How can I translate the following filename to a regular expression in Python?

I am battling regular expressions now as I type.

I would like to determine a pattern for the following example file: b410cv11_test.ext. I want to be able to search for files that match the pattern of the example file above. Where do I start (I am lost and confused), and what is the best way to arrive at a solution that matches this file pattern? Thanks in advance.

Further clarification of question:

I would like the pattern to be as follows: must start with 'b', followed by three digits, followed by 'cv', followed by two digits, then an underscore, followed by 'release', followed by '.ext'


Asked by: Ted196 | Posted: 06-12-2021






Answer 1

Now that you have a human-readable description of your file name, it's quite straightforward (at least in this case) to translate it into a regular expression ;)

must start with

The caret (^) anchors a regular expression to the beginning of what you want to match, so your re has to start with this symbol.
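In Python specifically, the caret matters most with re.search, which scans the whole string; re.match only tries at position 0, so it behaves as if the pattern started with ^. A minimal sketch, using a made-up file name:

```python
import re

name = "xb410cv11_release.ext"  # hypothetical name that does NOT start with "b"

# re.search scans the whole string, so without ^ it finds "b410" after the "x".
unanchored = re.search(r"b\d{3}", name)
# With ^ the match must begin at the start of the string, so it fails.
anchored = re.search(r"^b\d{3}", name)

print(unanchored is not None)  # True
print(anchored is None)        # True

# re.match only tries position 0, which acts like an implicit ^.
print(re.match(r"b\d{3}", name) is None)  # True
```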

'b',

Any non-special character in your re will match literally, so you just use "b" for this part: ^b.

followed by [...] digits,

This depends a bit on which flavor of re you use:

The most general way of expressing this is to use brackets ([]). Those mean "match any one of the characters listed within." [ASDF], for example, would match either A, S, D, or F; [0-9] would match any digit from 0 to 9.

Your re library probably has a shortcut for "any digit". In sed and awk you could use [[:digit:]] [sic!]; in Python and many other languages you can use \d.
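Python's re module does not support POSIX classes such as [[:digit:]], so \d or [0-9] is the way to go there. The two are not quite identical, though: for str patterns, \d also matches non-ASCII decimal digits unless the re.ASCII flag is given. A small sketch (the Arabic-Indic digit is just an illustrative input):

```python
import re

# For plain ASCII digits, \d and [0-9] agree.
print(bool(re.fullmatch(r"\d", "7")), bool(re.fullmatch(r"[0-9]", "7")))  # True True

# ARABIC-INDIC DIGIT THREE (U+0663) is a Unicode decimal digit.
arabic_three = "\u0663"
print(bool(re.fullmatch(r"\d", arabic_three)))            # True
print(bool(re.fullmatch(r"[0-9]", arabic_three)))         # False
# re.ASCII restricts \d to [0-9].
print(bool(re.fullmatch(r"\d", arabic_three, re.ASCII)))  # False
```

For file names like b410cv11_release.ext the difference rarely matters, but [0-9] or re.ASCII is the stricter choice if only ASCII digits should count.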

So now your re reads ^b\d.

followed by three [...]

The simplest way to express this would be to just repeat the atom three times, like this: \d\d\d.

Again, your language might provide a shortcut: braces ({}). Sometimes you have to escape them with a backslash (if you are using sed or awk, read about "extended regular expressions"). Braces also give you a way to say "at least x, but no more than y occurrences of the previous atom": {x,y}.
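In Python the braces need no escaping. A quick sketch comparing the repeated atom with the {3} form, and showing the {x,y} range form on some made-up strings:

```python
import re

# \d\d\d and \d{3} both mean "exactly three digits".
print(bool(re.fullmatch(r"b\d\d\d", "b410")), bool(re.fullmatch(r"b\d{3}", "b410")))  # True True

# {2,4}: at least two and at most four digits after the "b".
two_to_four = re.compile(r"b\d{2,4}")
for candidate in ["b4", "b41", "b410", "b4101", "b41012"]:
    print(candidate, bool(two_to_four.fullmatch(candidate)))
# b4 False
# b41 True
# b410 True
# b4101 True
# b41012 False
```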

Now you have: ^b\d{3}

followed by 'cv',

Literal matching again, now we have ^b\d{3}cv

followed by two digits,

We already covered this: ^b\d{3}cv\d{2}.

then an underscore, followed by 'release', followed by '.ext'

Again, this should all match literally, but the dot (.) is a special character. This means you have to escape it with a backslash: ^b\d{3}cv\d{2}_release\.ext

Leaving out the backslash would mean that a filename like "b410cv11_release_ext" would also match, which may or may not be a problem for you.
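A small sketch of that difference in Python. The r"" (raw string) prefix is used so the backslashes reach the re module unchanged; the file names are made up:

```python
import re

escaped = re.compile(r"^b\d{3}cv\d{2}_release\.ext$")   # \. matches only a literal dot
unescaped = re.compile(r"^b\d{3}cv\d{2}_release.ext$")  # . matches any character

for name in ["b410cv11_release.ext", "b410cv11_release_ext"]:
    print(name, bool(escaped.match(name)), bool(unescaped.match(name)))
# b410cv11_release.ext True True
# b410cv11_release_ext False True
```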

Finally, if you want to guarantee that nothing else follows ".ext", anchor the re to the end of the string with the dollar sign ($).
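A sketch of what $ changes, using hypothetical names. One Python-specific detail worth knowing: without re.MULTILINE, $ also matches just before a single trailing newline, whereas re.fullmatch (or \Z) requires the string to end exactly there:

```python
import re

no_end = re.compile(r"^b\d{3}cv\d{2}_release\.ext")
with_end = re.compile(r"^b\d{3}cv\d{2}_release\.ext$")

# Without $, trailing text such as ".bak" is simply ignored.
print(bool(no_end.match("b410cv11_release.ext.bak")))    # True
print(bool(with_end.match("b410cv11_release.ext.bak")))  # False

# $ still accepts one trailing newline...
print(bool(with_end.match("b410cv11_release.ext\n")))    # True
# ...while fullmatch insists on the whole string.
print(bool(re.fullmatch(r"b\d{3}cv\d{2}_release\.ext", "b410cv11_release.ext\n")))  # False
```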

Thus the complete regular expression for your specific problem would be:

^b\d{3}cv\d{2}_release\.ext$

Easy.
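To try the complete expression in Python, a minimal sketch against a list of made-up file names. Note that the example file in the question ends in "_test.ext" while the clarified description says "release"; swap the literal word if you need the other variant.

```python
import re

# The complete pattern; the r"" prefix keeps the backslashes intact.
PATTERN = re.compile(r"^b\d{3}cv\d{2}_release\.ext$")

# Hypothetical file names, only for exercising the pattern.
names = [
    "b410cv11_release.ext",   # matches
    "b999cv00_release.ext",   # matches
    "b410cv11_test.ext",      # wrong word after the underscore
    "b41cv11_release.ext",    # only two digits after "b"
    "xb410cv11_release.ext",  # does not start with "b"
]

matching = [n for n in names if PATTERN.match(n)]
print(matching)  # ['b410cv11_release.ext', 'b999cv00_release.ext']
```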

Whatever language or library you use, there has to be a reference somewhere in the documentation that will show you what the exact syntax in your case should be. Once you have learned to break down the problem into a suitable description, understanding the more advanced constructs will come to you step by step.

Answered by: Daisy935 | Posted: 07-01-2022



Answer 2

To avoid confusion, read the following, in order.

First, you have the glob module, which expands shell-style wildcard patterns (not regular expressions) into matching file names, much like the Windows and Unix shells do.

Second, you have the fnmatch module, which just does pattern matching on strings using the Unix shell rules.

Third, you have the re module, which is the complete set of regular expressions.
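A sketch putting the three modules side by side, assuming Python 3 and a throwaway directory with a few made-up files. The wildcard pattern below is one possible glob translation of the description, not the only one; glob and fnmatch share the same wildcard rules, while re allows the stricter "release" constraint:

```python
import fnmatch
import glob
import os
import re
import tempfile

WILDCARD = "b[0-9][0-9][0-9]cv[0-9][0-9]_*.ext"   # shell-style pattern
REGEX = re.compile(r"^b\d{3}cv\d{2}_release\.ext$")  # full regular expression

with tempfile.TemporaryDirectory() as folder:
    # Create a few hypothetical files to search.
    for name in ["b410cv11_release.ext", "b410cv11_test.ext", "notes.txt"]:
        open(os.path.join(folder, name), "w").close()

    # glob: wildcards matched against files on disk.
    # glob.escape guards against special characters in the temp folder path.
    glob_hits = sorted(
        os.path.basename(path)
        for path in glob.glob(os.path.join(glob.escape(folder), WILDCARD))
    )
    names = sorted(os.listdir(folder))

# fnmatch: the same wildcard rules applied to plain strings (no disk access).
fnmatch_hits = fnmatch.filter(names, WILDCARD)

# re: full regular expressions, here pinned to the "release" variant.
re_hits = [name for name in names if REGEX.match(name)]

print(glob_hits)     # ['b410cv11_release.ext', 'b410cv11_test.ext']
print(fnmatch_hits)  # ['b410cv11_release.ext', 'b410cv11_test.ext']
print(re_hits)       # ['b410cv11_release.ext']
```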

Then ask another, more specific question.

Answered by: Daryl666 | Posted: 07-01-2022



Answer 3

I would like the pattern to be as follows: must start with 'b', followed by three digits, followed by 'cv', followed by two digits, then an underscore, followed by 'release', followed by '.ext'

^b\d{3}cv\d{2}_release\.ext$
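If the one-liner is hard to read, Python's re.VERBOSE flag lets you lay the same expression out one clause per line, mirroring the description in the question. A sketch (whitespace and # comments inside the pattern are ignored in this mode):

```python
import re

PATTERN = re.compile(
    r"""
    ^           # must start with
    b           # 'b'
    \d{3}       # followed by three digits
    cv          # followed by 'cv'
    \d{2}       # followed by two digits
    _           # then an underscore
    release     # followed by 'release'
    \.ext       # followed by '.ext' (dot escaped)
    $           # and nothing after it
    """,
    re.VERBOSE,
)

print(bool(PATTERN.match("b410cv11_release.ext")))  # True
print(bool(PATTERN.match("b410cv11_test.ext")))     # False
```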

Answered by: Lenny213 | Posted: 07-01-2022



Answer 4

Your question is a bit unclear. You say you want a regular expression, but could it be that you want a glob-style pattern you can use with commands like ls? Glob expressions and regular expressions are similar in concept but different in practice (regular expressions are considerably more powerful, while glob-style patterns are easier for the most common cases of looking for files).

Also, what do you consider to be the pattern? Certainly, * (glob) or .* (regex) will match the pattern. So would *_test.ext (glob) or .*_test\.ext (regex), as would many other variations.
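In Python you can see the relationship between the two styles with fnmatch.translate, which converts a shell-style pattern into regex text: the glob "*" becomes ".*" and the glob "." is escaped to a literal dot. The exact string it produces differs between Python versions, so this sketch prints it rather than hard-coding it:

```python
import fnmatch
import re

glob_pattern = "*_test.ext"

# Convert the glob into a regular-expression string and compile it.
regex_text = fnmatch.translate(glob_pattern)
print(regex_text)

compiled = re.compile(regex_text)
print(bool(compiled.match("b410cv11_test.ext")))  # True
print(bool(compiled.match("b410cv11_test_ext")))  # False: the glob "." is literal
```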

Can you be more specific about the pattern? For example, you might describe it as "b, followed by digits, followed by cv, followed by digits ..."

Once you can precisely explain the pattern in your native language (and that must be your first step), it's usually a fairly straightforward task to translate that into a glob or regular expression pattern.

Answered by: Brooke392 | Posted: 07-01-2022



Answer 5

If the letters are unimportant, you could try \w\d\d\d\w\w\d\d_test\.ext, which would match the letter/number pattern (note that \w also matches digits and underscores), or b\d\d\dcv\d\d_test\.ext, or some mix of the two.
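Because \w matches letters, digits, and the underscore, the "letters unimportant" version is looser than it may look. A sketch contrasting it with an explicit [A-Za-z] class (an illustrative alternative, not part of the original answer), on made-up names:

```python
import re

loose = re.compile(r"\w\d\d\d\w\w\d\d_test\.ext")
strict = re.compile(r"[A-Za-z]\d\d\d[A-Za-z][A-Za-z]\d\d_test\.ext")

for name in ["b410cv11_test.ext", "x123ab45_test.ext", "9410_911_test.ext"]:
    print(name, bool(loose.fullmatch(name)), bool(strict.fullmatch(name)))
# b410cv11_test.ext True True
# x123ab45_test.ext True True
# 9410_911_test.ext True False
```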

Answered by: Emily958 | Posted: 07-01-2022



Answer 6

When working with regexes I find the Mochikit regex example to be a great help.

/^b\d\d\dcv\d\d_test\.ext$/

Then use Python's re (regex) module to do the match, leaving off the surrounding slashes (that is JavaScript regex-literal syntax). This of course assumes a regex is really what you need, and not glob as the others mentioned.
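A minimal sketch of that in Python, scanning one directory with pathlib. The helper name find_matching and the file names are hypothetical, chosen only for illustration:

```python
import re
import tempfile
from pathlib import Path

# Same pattern as above, without the JavaScript /.../ delimiters.
PATTERN = re.compile(r"^b\d\d\dcv\d\d_test\.ext$")

def find_matching(folder):
    """Hypothetical helper: names of regular files directly in folder that match PATTERN."""
    return sorted(
        p.name for p in Path(folder).iterdir()
        if p.is_file() and PATTERN.match(p.name)
    )

# Usage with a throwaway directory and made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["b410cv11_test.ext", "b410cv11_test.ext.bak", "readme.txt"]:
        (Path(tmp) / name).touch()
    found = find_matching(tmp)

print(found)  # ['b410cv11_test.ext']
```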

Answered by: Chester344 | Posted: 07-01-2022


