HTML parser in Python [closed]
Using the Python documentation, I found the HTML parser, but I have no idea which library to import to use it. How do I find this out, bearing in mind that the page doesn't say?
Asked by: Nicole511 | Posted: 28-01-2022
Answer 1
You probably really want BeautifulSoup; check the link for an example.
But in any case:
>>> import HTMLParser
>>> h = HTMLParser.HTMLParser()
>>> h.feed('<html></html>')
>>> h.get_starttag_text()
'<html>'
>>> h.close()
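The session above is Python 2. As a rough sketch of how the same parser is typically used on Python 3, assuming you want to pull something out of the markup rather than just feed it: you subclass `HTMLParser` from `html.parser` and override the `handle_*` hooks. The class name `LinkCollector` and the sample HTML are made up for illustration.

```python
# Python 3 sketch: the module is html.parser, and the useful hooks are
# the handle_* methods that you override in a subclass.
from html.parser import HTMLParser


class LinkCollector(HTMLParser):  # hypothetical name, for illustration only
    """Collect the href of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # tag arrives lowercased; attrs is a list of (name, value) pairs
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


parser = LinkCollector()
parser.feed('<html><body><a href="https://www.python.org/">Python</a>'
            '<a name="top">no href</a></body></html>')
parser.close()
print(parser.links)
```

Only the first anchor has an `href`, so only that URL ends up in `parser.links`.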
Answered by: Daniel586 | Posted: 01-03-2022
Answer 2
Try:
import HTMLParser
In Python 3.0, the HTMLParser module was renamed to html.parser, as noted in the module's documentation.
Python 3.0
import html.parser
Python 2.2 and above
import HTMLParser
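If the same script has to run on both versions, one common pattern is to try the Python 3 import first and fall back to the Python 2 name. This is only a sketch of that idea, reusing the `<html></html>` input from the first answer; the variable name `start_tag` is arbitrary.

```python
# Import the parser class under one name regardless of Python version.
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2

h = HTMLParser()
h.feed('<html></html>')
start_tag = h.get_starttag_text()  # text of the most recently opened start tag
h.close()
print(start_tag)
```

After the import, the rest of the code can use `HTMLParser` without caring which module it came from.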
Answered by: Rubie584 | Posted: 01-03-2022
Answer 3
I would recommend using the Beautiful Soup module instead; it has good documentation.
Answered by: Miller505 | Posted: 01-03-2022
Answer 4
You may be interested in lxml. It is a separate package with C components, but it is the fastest option. It also has a very nice API that lets you easily list the links or forms in an HTML document, sanitize HTML, and more. It can also parse HTML that is not well-formed (this is configurable).
Answered by: Miller560 | Posted: 01-03-2022
Answer 5
You should also look at html5lib for Python as it tries to parse HTML in a way that very much resembles what web browsers do, especially when dealing with invalid HTML (which is more than 90% of today's web).
Answered by: Agata672 | Posted: 01-03-2022
Answer 6
I don't recommend BeautifulSoup if you want speed. lxml is much, much faster, and you can fall back on lxml's BeautifulSoup-based soupparser if the default parser doesn't work.
Answered by: Ted863 | Posted: 01-03-2022
Answer 7
For real world HTML processing I'd recommend BeautifulSoup. It is great and takes away much of the pain. Installation is easy.
Answered by: Ada472 | Posted: 01-03-2022
Answer 8
There's a link to an example at the bottom of http://docs.python.org/2/library/htmlparser.html, but it doesn't work with plain python or python3. It has to be run with python2, as the page says at the top.
Answered by: Adelaide301 | Posted: 01-03-2022