How would you implement ant-style patternsets in python to select groups of files?
Ant has a nice way to select groups of files, most handily using ** to indicate a directory tree. E.g.
**/CVS/* # All files immediately under a CVS directory.
mydir/mysubdir/** # All files recursively under mysubdir
More examples can be seen here:
http://ant.apache.org/manual/dirtasks.html
How would you implement this in python, so that you could do something like:
files = get_files("**/CVS/*")
for file in files:
    print(file)
=>
CVS/Repository
mydir/mysubdir/CVS/Entries
mydir/mysubdir/foo/bar/CVS/Entries
Asked by: Brad360 | Posted: 28-01-2022
Answer 1
Sorry, this is quite a long time after your OP. I have just released a Python package which does exactly this - it's called Formic and it's available at the PyPI Cheeseshop. With Formic, your problem is solved with:
import formic
fileset = formic.FileSet(include="**/CVS/*", default_excludes=False)
for file_name in fileset.qualified_files():
    print(file_name)
There is one slight complication: default_excludes. Formic, just like Ant, excludes CVS directories by default (for the most part, collecting files from them for a build is dangerous), so with the default settings the pattern in the question would match no files. Setting default_excludes=False disables this behaviour.
Answered by: Alberta460 | Posted: 01-03-2022
Answer 2
As soon as you come across a **, you're going to have to recurse through the whole directory structure, so I think at that point the easiest method is to iterate through the directory with os.walk, construct a path, and then check whether it matches the pattern. You can probably convert the pattern to a regex with something like:
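import os
import re

def glob_to_regex(pat, dirsep=os.sep):
    dirsep = re.escape(dirsep)
    regex = (re.escape(pat)
             .replace("\\*\\*" + dirsep, "(?:.*" + dirsep + ")?")  # "**/": zero or more directories
             .replace("\\*\\*", ".*")                               # "**" elsewhere: anything
             .replace("\\*", "[^%s]*" % dirsep)                     # "*": within one path component
             .replace("\\?", "[^%s]" % dirsep))                     # "?": one character in a component
    return re.compile("^" + regex + "$")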
(Note that this isn't fully featured: it doesn't support [a-z]-style glob patterns, for instance, though this could probably be added.) (The first **/ replacement covers cases like **/CVS matching ./CVS, and the bare ** replacement handles a ** at the tail.)
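One way the missing [a-z] support might be added is to stop doing blanket string replacements and instead scan the pattern token by token, passing bracket classes through to the regex. This is a simplified sketch: ant_glob_to_regex is a hypothetical name, it treats [!abc] as negation the way fnmatch does, and, unlike a fully path-aware matcher, a character class here can still match the directory separator.

```python
import re


def ant_glob_to_regex(pat, dirsep="/"):
    """Hypothetical variant of glob_to_regex that also passes [a-z]-style
    character classes through, by scanning the pattern token by token."""
    sep = re.escape(dirsep)
    out = []
    i = 0
    while i < len(pat):
        c = pat[i]
        if pat.startswith("**" + dirsep, i):
            out.append("(?:.*" + sep + ")?")  # "**/": zero or more directories
            i += 2 + len(dirsep)
        elif pat.startswith("**", i):
            out.append(".*")                  # "**" elsewhere: anything
            i += 2
        elif c == "*":
            out.append("[^%s]*" % sep)        # "*": within one path component
            i += 1
        elif c == "?":
            out.append("[^%s]" % sep)         # "?": one character in a component
            i += 1
        elif c == "[":
            j = pat.find("]", i + 1)
            raw = pat[i + 1:j] if j != -1 else ""
            if raw in ("", "!"):
                # No closing "]", or an empty class: treat "[" as a literal.
                out.append(re.escape(c))
                i += 1
            else:
                body = raw.replace("\\", "\\\\")
                if body.startswith("!"):
                    body = "^" + body[1:]     # glob "[!abc]" means "not a, b or c"
                elif body.startswith("^"):
                    body = "\\" + body        # a leading "^" is literal in glob
                out.append("[" + body + "]")
                i = j + 1
        else:
            out.append(re.escape(c))          # everything else is literal
            i += 1
    return re.compile("^" + "".join(out) + "$")
```

For example, ant_glob_to_regex("**/[A-Z]*.txt") matches docs/Readme.txt but not docs/readme.txt.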
However, you obviously don't want to recurse through everything below the current directory when you are not processing a ** pattern, so I think you'll need a two-phase approach. I haven't tried implementing the steps below, and there are probably a few corner cases, but I think it should work:
- Split the pattern on your directory separator, i.e.
  pat.split('/') -> ['**', 'CVS', '*']
- Recurse through the directories, and look at the relevant part of the pattern for this level, i.e.
  n levels deep -> look at pat[n].
- If pat[n] == '**', switch to the above strategy:
  - Reconstruct the pattern with dirsep.join(pat[n:])
  - Convert it to a regex with glob_to_regex()
  - Recursively os.walk through the current directory, building up the path relative to the level you started at. If the path matches the regex, yield it.
- If pat[n] doesn't match "**", and it is the last element in the pattern, then yield all files/dirs matching glob.glob(os.path.join(curpath, pat[n])).
- If pat[n] doesn't match "**", and it is NOT the last element in the pattern, then for each directory, check whether it matches pat[n] (with glob). If so, recurse down through it, incrementing the depth (so it will look at pat[n+1]).
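The two-phase steps above can be sketched as follows. This is only a minimal sketch under a few assumptions: patterns always use "/" as the separator, only files are yielded (as in an Ant fileset), results are "/"-separated paths relative to root, and fnmatch.fnmatchcase on os.listdir() entries stands in for the glob.glob call described in the steps. get_files is the name from the question; _walk_pattern and the root parameter are hypothetical.

```python
import fnmatch
import os
import re


def glob_to_regex(pat, dirsep="/"):
    """Translate a pattern containing ** into a compiled, fully anchored regex."""
    sep = re.escape(dirsep)
    regex = (re.escape(pat)
             .replace("\\*\\*" + sep, "(?:.*" + sep + ")?")  # "**/": zero or more dirs
             .replace("\\*\\*", ".*")                        # trailing "**": anything
             .replace("\\*", "[^%s]*" % sep)                 # "*": within one component
             .replace("\\?", "[^%s]" % sep))                 # "?": one char in a component
    return re.compile("^" + regex + "$")


def _walk_pattern(root, rel, parts, n):
    """Match parts[n:] below root/rel, yielding '/'-separated paths relative to root."""
    here = os.path.join(root, rel) if rel else root
    prefix = rel + "/" if rel else ""
    part = parts[n]
    if part == "**":
        # Phase 2: walk everything from here down and test the rest of the pattern.
        regex = glob_to_regex("/".join(parts[n:]))
        for dirpath, _dirnames, filenames in os.walk(here):
            sub = os.path.relpath(dirpath, here).replace(os.sep, "/")
            for name in filenames:
                candidate = name if sub == "." else sub + "/" + name
                if regex.match(candidate):
                    yield prefix + candidate
        return
    # Phase 1: match one path component at this level only.
    for name in sorted(os.listdir(here)):
        if not fnmatch.fnmatchcase(name, part):
            continue
        full = os.path.join(here, name)
        if n == len(parts) - 1:
            # Last element of the pattern: yield matching files.
            if os.path.isfile(full):
                yield prefix + name
        elif os.path.isdir(full):
            # Not the last element: descend and look at parts[n + 1].
            yield from _walk_pattern(root, prefix + name, parts, n + 1)


def get_files(pattern, root="."):
    """Yield files under root matching an Ant-style pattern such as '**/CVS/*'."""
    return _walk_pattern(root, "", pattern.split("/"), 0)
```

With this split, a pattern such as mydir/*.txt only lists mydir, while the full os.walk is reserved for the part of the tree below the first **.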
Answer 3
os.walk is your friend. Look at the example in the Python manual (https://docs.python.org/2/library/os.html#os.walk) and try to build something from that.
To match "**/CVS/*" against a file name you get, you can do something like this:
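import fnmatch

def match(pattern, filename):
    # fnmatch's "*" already crosses "/", so a leading "**" collapses to one "*".
    # Assumes filename comes from os.walk("."), i.e. it starts with "./".
    if pattern.startswith("**"):
        return fnmatch.fnmatch(filename, pattern[1:])
    else:
        return fnmatch.fnmatch(filename, pattern)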
In fnmatch.fnmatch, "*" matches anything (including slashes).
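A quick check of what "including slashes" means in practice for the collapsed pattern "*/CVS/*": it matches CVS files at any depth, but its trailing "*" also reaches into subdirectories of a CVS directory, and a path without a leading directory (no "./") does not match at all.

```python
import fnmatch

# fnmatch's "*" is not path-aware: it also matches "/".
nested = fnmatch.fnmatch("./mydir/CVS/Entries", "*/CVS/*")
too_deep = fnmatch.fnmatch("./CVS/sub/Entries", "*/CVS/*")  # trailing "*" spans "sub/"
no_dir = fnmatch.fnmatch("CVS/Repository", "*/CVS/*")       # needs a "/" before "CVS"
print(nested, too_deep, no_dir)  # True True False
```

If the "immediately under" part of **/CVS/* matters, a path-aware translation such as the regex approach in Answer 2 is needed.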
Answer 4
There's an implementation in the 'waf' build system source code: http://code.google.com/p/waf/source/browse/trunk/waflib/Node.py?r=10755#471 Maybe this should be wrapped up in a library of its own?
Answered by: Briony687 | Posted: 01-03-2022
Answer 5
Yup. Your best bet is, as has already been suggested, to work with 'os.walk'. Alternatively, perhaps write wrappers around the 'glob' and 'fnmatch' modules.
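On Python 3.5 and later, the glob module may already be enough: glob.glob() and glob.iglob() accept recursive=True, in which case ** matches any files and zero or more directories. The sketch below builds the question's get_files() on top of that; the root parameter and the files-only filter are assumptions, not part of the question.

```python
import glob
import os


def get_files(pattern, root="."):
    """Yield files under root matching an Ant-like pattern, via glob's recursive **."""
    # glob.escape protects any glob metacharacters in the root path itself.
    full_pattern = os.path.join(glob.escape(root), pattern)
    for path in glob.iglob(full_pattern, recursive=True):
        if os.path.isfile(path):  # Ant filesets select files, not directories
            yield os.path.relpath(path, root)
```

Usage is as in the question: for name in get_files("**/CVS/*"): print(name). Two differences from Ant remain: glob's * and ** skip names starting with a dot (unless include_hidden=True is passed, Python 3.11+), and there are no default excludes.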
Answered by: David223 | Posted: 01-03-2022
Answer 6
os.walk is your best bet for this. I did the example below with .svn because I had that handy, and it worked great:
import os
import re

for (dirpath, dirnames, filenames) in os.walk("."):
    # Only look at files sitting directly inside a .svn directory.
    if re.search(r'\.svn$', dirpath):
        for file in filenames:
            print(file)
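Adapted to the question's **/CVS/* example, the same os.walk idea might look like the sketch below; files_in_dirs_named is a hypothetical helper name. Comparing os.path.basename(dirpath) exactly, rather than searching with a regex such as \.svn$, avoids also picking up directories like foo.svn or notCVS.

```python
import os


def files_in_dirs_named(dirname, root="."):
    """Yield files directly inside any directory called `dirname` below root,
    roughly what the Ant pattern "**/<dirname>/*" selects."""
    for dirpath, _dirnames, filenames in os.walk(root):
        # Compare the last path component exactly instead of using a regex,
        # so directories like "foo.svn" or "notCVS" are not picked up.
        if os.path.basename(dirpath) == dirname:
            for name in filenames:
                yield os.path.relpath(os.path.join(dirpath, name), root)
```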
Answered by: Carlos525 | Posted: 01-03-2022