urllib2 file name

If I open a file using urllib2, like so:

remotefile = urllib2.urlopen('http://example.com/somefile.zip')

Is there an easy way to get the file name other than parsing the original URL?

EDIT: changed openfile to urlopen... not sure how that happened.

EDIT2: I ended up using:

filename = url.split('/')[-1].split('#')[0].split('?')[0]

Unless I'm mistaken, this should strip out all potential queries as well.
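For what it's worth, that chain can be sanity-checked against a URL carrying both a query and a fragment (Python 3 print syntax below, but the slicing is identical in Python 2):

```python
# Take the last path segment, then drop the fragment, then the query.
url = 'http://example.com/somedir/somefile.zip?foo=bar#section'
filename = url.split('/')[-1].split('#')[0].split('?')[0]
print(filename)  # somefile.zip
```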


Asked by: Rafael898 | Posted: 24-09-2021






Answer 1

Did you mean urllib2.urlopen?

You could potentially lift the intended filename from a Content-Disposition header, if the server sends one, by checking remotefile.info()['Content-Disposition']; but as it is, I think you'll just have to parse the URL.

You could use urlparse.urlsplit, but for any URL like the second example you'll still end up pulling the file name out of the path yourself anyway:

>>> urlparse.urlsplit('http://example.com/somefile.zip')
('http', 'example.com', '/somefile.zip', '', '')
>>> urlparse.urlsplit('http://example.com/somedir/somefile.zip')
('http', 'example.com', '/somedir/somefile.zip', '', '')

Might as well just do this:

>>> 'http://example.com/somefile.zip'.split('/')[-1]
'somefile.zip'
>>> 'http://example.com/somedir/somefile.zip'.split('/')[-1]
'somefile.zip'

Answered by: Elian280 | Posted: 25-10-2021



Answer 2

If you only want the file name itself, and you can assume there are no query variables at the end (like http://example.com/somedir/somefile.zip?foo=bar), then you can use os.path.basename for this:

[user@host]$ python
Python 2.5.1 (r251:54869, Apr 18 2007, 22:08:04) 
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.path.basename("http://example.com/somefile.zip")
'somefile.zip'
>>> os.path.basename("http://example.com/somedir/somefile.zip")
'somefile.zip'
>>> os.path.basename("http://example.com/somedir/somefile.zip?foo=bar")
'somefile.zip?foo=bar'

Some other posters mentioned using urlparse, which will work, but you'd still need to strip the leading directory from the file name. If you use os.path.basename() then you don't have to worry about that, since it returns only the final part of the URL or file path.
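If query strings are a concern, one sketch is to split the URL first and take the basename of just the path. This snippet assumes Python 3, where urlsplit lives in urllib.parse (it's urlparse.urlsplit in Python 2):

```python
import os
from urllib.parse import urlsplit  # urlparse.urlsplit in Python 2

url = 'http://example.com/somedir/somefile.zip?foo=bar'
# urlsplit separates the query, so basename only ever sees the path
filename = os.path.basename(urlsplit(url).path)
print(filename)  # somefile.zip
```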

Answered by: Brianna334 | Posted: 25-10-2021



Answer 3

I think "the file name" isn't a very well-defined concept when it comes to HTTP transfers. The server might (but is not required to) provide one in a Content-Disposition header; you can try to get that with remotefile.headers['Content-Disposition']. If that fails, you probably have to parse the URI yourself.
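Note that a raw Content-Disposition value still needs its filename parameter extracted. As a sketch, the stdlib email machinery can do that parsing (the header value below is a made-up example):

```python
from email.message import Message

# A hypothetical header value a server might send
header = 'attachment; filename="somefile.zip"'

msg = Message()
msg['Content-Disposition'] = header
print(msg.get_filename())  # somefile.zip
```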

Answered by: John848 | Posted: 25-10-2021



Answer 4

Just saw this. What I normally do:

filename = url.split("?")[0].split("/")[-1]

Answered by: Thomas925 | Posted: 25-10-2021



Answer 5

Using urlsplit is the safest option:

url = 'http://example.com/somefile.zip'
urlparse.urlsplit(url).path.split('/')[-1]
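A quick check (using the Python 3 location, urllib.parse) showing that queries and fragments are parsed out before the path is split:

```python
from urllib.parse import urlsplit  # urlparse.urlsplit in Python 2

for url in ['http://example.com/somefile.zip',
            'http://example.com/somefile.zip?foo=bar',
            'http://example.com/somedir/somefile.zip#frag']:
    # The query and fragment live in separate fields of the split result
    print(urlsplit(url).path.split('/')[-1])  # somefile.zip each time
```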

Answered by: Daisy824 | Posted: 25-10-2021



Answer 6

Do you mean urllib2.urlopen? There is no function called openfile in the urllib2 module.

Anyway, use the urllib2.urlparse functions:

>>> from urllib2 import urlparse
>>> print urlparse.urlsplit('http://example.com/somefile.zip')
('http', 'example.com', '/somefile.zip', '', '')

Voila.

Answered by: Luke828 | Posted: 25-10-2021



Answer 7

You could also combine the two best-rated answers: use urllib2.urlparse.urlsplit() to get the path part of the URL, then os.path.basename for the actual file name.

Full code would be:

>>> remotefile = urllib2.urlopen(url)
>>> try:
...     # Note: this is the raw header value ('attachment; filename="..."'),
...     # so the filename parameter still has to be extracted from it
...     filename = remotefile.info()['Content-Disposition']
... except KeyError:
...     filename = os.path.basename(urllib2.urlparse.urlsplit(url).path)

Answered by: Marcus219 | Posted: 25-10-2021



Answer 8

The os.path.basename function works not only for file paths but also for URLs, so you don't have to parse the URL by hand. It's also important to note that you should use result.url instead of the original url, in order to follow redirect responses:

import os
import urllib2
result = urllib2.urlopen(url)
real_url = urllib2.urlparse.urlparse(result.url)
filename = os.path.basename(real_url.path)

Answered by: Lenny961 | Posted: 25-10-2021



Answer 9

I guess it depends what you mean by parsing. There is no way to get the filename without parsing the URL; the remote server doesn't give you a filename. However, you don't have to do much yourself: there's the urlparse module:

In [9]: urlparse.urlparse('http://example.com/somefile.zip')
Out[9]: ('http', 'example.com', '/somefile.zip', '', '', '')

Answered by: Thomas816 | Posted: 25-10-2021



Answer 10

Not that I know of.

But you can parse it easily enough, like this:

url = 'http://example.com/somefile.zip'
print url.split('/')[-1]

Answered by: Cherry181 | Posted: 25-10-2021



Answer 11

This uses requests, but you can do it just as easily with urllib(2):

import requests
from urllib import unquote
from urlparse import urlparse

sample = requests.get(url)

if sample.status_code == 200:
    filename = None

    # Prefer the Content-Disposition header when the server sends one
    if 'content-disposition' in sample.headers:
        filename = sample.headers['content-disposition'].split('filename=')[-1].strip('";')
    else:
        # Fall back to the last path segment of the final (post-redirect) URL
        filename = unquote(urlparse(sample.url).path.split('/')[-1])

Answered by: Tara195 | Posted: 25-10-2021



Answer 12

You can probably use a simple regular expression here. Something like:

In [26]: import re
In [27]: pat = re.compile(r'.+[/?#=]([\w-]+\.[\w-]+(?:\.[\w-]+)?$)')
In [28]: test_set = [
   ....:     'http://www.google.com/a341.tar.gz',
   ....:     'http://www.google.com/a341.gz',
   ....:     'http://www.google.com/asdasd/aadssd.gz',
   ....:     'http://www.google.com/asdasd?aadssd.gz',
   ....:     'http://www.google.com/asdasd#blah.gz',
   ....:     'http://www.google.com/asdasd?filename=xxxbl.gz']

In [30]: for url in test_set:
   ....:     match = pat.match(url)
   ....:     if match and match.groups():
   ....:         print(match.groups()[0])
   ....:         

a341.tar.gz
a341.gz
aadssd.gz
aadssd.gz
blah.gz
xxxbl.gz

Answered by: Catherine807 | Posted: 25-10-2021



Answer 13

Using PurePosixPath, which is not operating-system dependent and handles URLs gracefully, is a pythonic solution:

>>> from pathlib import PurePosixPath
>>> path = PurePosixPath('http://example.com/somefile.zip')
>>> path.name
'somefile.zip'
>>> path = PurePosixPath('http://example.com/nested/somefile.zip')
>>> path.name
'somefile.zip'

Notice that there is no network traffic here (those URLs are never fetched); this just applies standard path-parsing rules.
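One caveat worth sketching: PurePosixPath knows nothing about URL syntax, so a query string leaks into .name; pre-splitting with urllib.parse avoids that:

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

url = 'http://example.com/somefile.zip?foo=bar'
# Naive use keeps the query string attached to the name
print(PurePosixPath(url).name)                 # somefile.zip?foo=bar
# Splitting the URL first isolates the path component
print(PurePosixPath(urlsplit(url).path).name)  # somefile.zip
```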

Answered by: Anna646 | Posted: 25-10-2021



Answer 14

import os,urllib2
resp = urllib2.urlopen('http://www.example.com/index.html')
my_url = resp.geturl()

os.path.split(my_url)[1]

# 'index.html'

This is not openfile, but maybe it still helps :)

Answered by: Melanie650 | Posted: 25-10-2021


