HTML parser in Python [closed]

Using the Python documentation, I found the HTML parser, but I have no idea which module to import to use it. How do I find this out, bearing in mind that the page doesn't say?


Asked by: John870 | Posted: 24-09-2021






Answer 1

You probably really want BeautifulSoup; its documentation has examples.

But in any case:

>>> import HTMLParser
>>> h = HTMLParser.HTMLParser()
>>> h.feed('<html></html>')
>>> h.get_starttag_text()
'<html>'
>>> h.close()
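The session above is Python 2. A rough Python 3 equivalent, assuming only the standard-library html.parser module (the Python 3 name of HTMLParser), might look like this:

```python
from html.parser import HTMLParser  # Python 3 location of the HTMLParser class

h = HTMLParser()
h.feed('<html></html>')
# get_starttag_text() returns the raw text of the most recently opened start tag
print(h.get_starttag_text())  # prints "<html>"
h.close()
```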

Answered by: David215 | Posted: 25-10-2021



Answer 2

Try:

import HTMLParser

In Python 3.0, the HTMLParser module was renamed to html.parser; see the note at the top of the Python 2 HTMLParser documentation.

Python 3.0 and above:

import html.parser

Python 2.2 to 2.7:

import HTMLParser
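If the same code has to run on both versions, a common pattern (a sketch, not part of the original answer) is to try the Python 3 name first and fall back to the Python 2 one:

```python
try:
    # Python 3: the module is called html.parser
    from html.parser import HTMLParser
except ImportError:
    # Python 2.2 to 2.7: the module is called HTMLParser
    from HTMLParser import HTMLParser

# The class has the same name in both versions, so the rest of the code is unchanged
parser = HTMLParser()
parser.feed('<p>hello</p>')
parser.close()
```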

Answered by: Kellan380 | Posted: 25-10-2021



Answer 3

I would recommend using the Beautiful Soup module instead; it has good documentation.
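To give a rough idea of what Beautiful Soup code looks like, here is a small sketch. It assumes the third-party beautifulsoup4 package is installed (pip install beautifulsoup4) and uses Python's built-in html.parser as the backend; the sample markup is made up.

```python
from bs4 import BeautifulSoup

html = """
<html><body>
  <h1>Title</h1>
  <a href="https://example.com/a">First</a>
  <a href="https://example.com/b">Second</a>
</body></html>
"""

# "html.parser" selects the standard-library parser, so no extra parser package is needed
soup = BeautifulSoup(html, "html.parser")

# soup.h1 is the first <h1> tag; get_text() returns its text content
title = soup.h1.get_text()
# find_all("a") returns every <a> tag; tag["href"] reads an attribute
links = [a["href"] for a in soup.find_all("a")]
print(title, links)
```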

Answered by: Freddie801 | Posted: 25-10-2021



Answer 4

You may be interested in lxml. It is a separate package with C components, but it is the fastest. It also has a very nice API that lets you easily list the links in an HTML document, list forms, sanitize HTML, and more. It can also parse HTML that is not well-formed (this is configurable).
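A minimal sketch of listing links and forms with lxml.html, assuming the third-party lxml package is installed; the sample document is made up for illustration.

```python
import lxml.html

html = """<html><body>
<a href="/docs">Docs</a>
<form action="/search" method="get"><input name="q"></form>
</body></html>"""

# fromstring() returns the root <html> element for a full document
doc = lxml.html.fromstring(html)

# iterlinks() yields (element, attribute, link, pos) tuples for every link-like
# attribute it knows about (href, src, action, ...)
links = [link for _element, _attribute, link, _pos in doc.iterlinks()]

# .forms is the list of <form> elements; each has an .action property
form_actions = [form.action for form in doc.forms]
print(links, form_actions)
```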

Answered by: Chester160 | Posted: 25-10-2021



Answer 5

You should also look at html5lib for Python, as it implements the HTML5 parsing algorithm and so parses HTML the way web browsers do, especially when dealing with invalid HTML (which is more than 90% of today's web).
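For example, assuming the third-party html5lib package is installed, unclosed <p> tags are repaired the way a browser would repair them. The input string is a made-up example of sloppy markup.

```python
import html5lib

broken = "<p>one<p>two"  # neither <p> is closed, as is common on real pages

# parse() builds an xml.etree.ElementTree tree by default and returns its <html> element;
# namespaceHTMLElements=False keeps tag names plain ("p" rather than "{http://www.w3.org/1999/xhtml}p")
root = html5lib.parse(broken, namespaceHTMLElements=False)

# Like a browser, html5lib closes the first <p> when the second one opens
paragraphs = [p.text for p in root.iter("p")]
print(paragraphs)
```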

Answered by: Dexter395 | Posted: 25-10-2021



Answer 6

I don't recommend BeautifulSoup if you need speed. lxml is much, much faster, and you can fall back on lxml's BeautifulSoup-based soupparser if the default parser can't handle a document.
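One way to read that fallback idea, as a sketch: try lxml's own HTML parser first and only hand the markup to lxml.html.soupparser (which additionally needs beautifulsoup4 installed) if that fails. The helper name parse_html is hypothetical, and treating an exception as "doesn't work" is an assumption; you might also want to fall back when the resulting tree looks wrong.

```python
import lxml.etree
import lxml.html


def parse_html(markup):
    """Hypothetical helper: fast lxml parser first, BeautifulSoup-backed parser as fallback."""
    try:
        return lxml.html.fromstring(markup)
    except (lxml.etree.ParserError, ValueError):
        # ParserError covers e.g. "Document is empty"; ValueError covers e.g. str input
        # that carries an XML encoding declaration.
        # soupparser lets BeautifulSoup parse the markup, then builds an lxml tree from it.
        from lxml.html import soupparser
        return soupparser.fromstring(markup)


doc = parse_html("<p>Hello <b>world</b></p>")
print(doc.text_content())  # prints "Hello world"
```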

Answered by: Leonardo753 | Posted: 25-10-2021



Answer 7

For real-world HTML processing, I'd recommend BeautifulSoup. It is great and takes away much of the pain, and installation is easy (pip install beautifulsoup4).

Answered by: Walter302 | Posted: 25-10-2021



Answer 8

There's an example at the bottom of http://docs.python.org/2/library/htmlparser.html, but it doesn't work with Python 3. It has to be Python 2, as the note at the top of the page says.
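That example works by subclassing HTMLParser and overriding its handler methods. A minimal sketch of the same idea for Python 3, where the class lives in html.parser (the class name TagCollector and its events list are made up for illustration):

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Hypothetical example parser that records the events HTMLParser reports."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        self.events.append(("start", tag, attrs))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


parser = TagCollector()
parser.feed('<html><head><title>Test</title></head>'
            '<body><h1>Parse me!</h1></body></html>')
parser.close()
for event in parser.events:
    print(event)
```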

Answered by: Miller708 | Posted: 25-10-2021



Similar questions

How to import a python file in python script more than once

Is it possible to import a Python file more than once in a Python script? I loop back to my driver file from a function by using the import command, but it only works once. Thanks. Edit: resolved it myself, thanks.


Cannot import SQLite with Python 2.6

I'm running Python 2.6 on Unix, and when I run the interactive prompt (SQLite is supposed to be preinstalled) I get: [root@idev htdocs]# python Python 2.6 (r26:66714, Oct 23 2008, 16:25:34) [GCC 3.2.2 20030222 (Red Hat Linux 3.2.2-5)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import sqlite T...


Time taken by an import in Python

I want to know how much time an import takes for both built-in as well as user defined modules.


How to import python module in a shared folder?

I have some Python modules in a shared folder on a Windows machine. The file is \mtl12366150\test\mymodule.py, and os.path.exists tells me this path is valid. I appended the folder \mtl12366150\test to sys.path (and os.path.exists tells me this path is valid). When I try to import mymodule, I get an error saying the module doesn't exist. Is there a way to import modules that are located in...


import - Python module seeing a full list as empty in another module

I'm working on a pygame project and have the main engine laid out. The problem is that I hit a bug I just cannot seem to figure out. One module can't read a variable from another module. It's not that the variable can't be read; it just sees an empty list instead of what it really is. Instead of posting the entire source code, I reproduced the bug in two small snippets that hopefully a sk...


import media does not work in Python

I am reading a book about Python. There is a sample whose first line is: import media. I am trying to do the same, but I get the error below: Traceback (most recent call last): File "<pyshell#6>", line 1, in <module> import media ImportError: No module named media. I want to know: is media a default library? Best regards,


java - import os to j2me

I am trying to port this code to J2ME. Does anyone have any idea how to do this? Thanks! import os if os.path.isfile("c:\\python\\myfolder\\test.txt"):


Puppy Linux - import gtk throws error in Python

I am using Linux version 2.6.24.16. I believe it is using Puppy Linux 4.2. I am actually using Puppy Arcade, which is a specialized branch. Their help file hints that it is 4.2, however. I am using Python 2.6.4 which I installed through a puppy package released here: http://code.google...


How do I import a third party module in Python?

I've found a third-party module that I would like to use. How do I technically import that module? In particular, I want to use a module called context_manager. Obviously, I cannot just import garlicsim.general_misc.context_manager because it won't find


import - Refer to a Module within the Module in Python

I have a directory structure for a module like the following: - foo - __init__.py - gui.py I use the foo module from other places. Now I want to use something from the foo module in gui.py, but when I try to, I get this: jsternberg@aquila:~$ python foo/gui.py Traceback (most recent call last): File "foo/gui.py", line 3, in <module> import foo ImportError:...





