Can't retrieve link from webpage
I am using bs4 to run through a bunch of websites and grab a specific link off each page, but I am having an issue grabbing that link.
I have tried getting all the links using:
from bs4 import BeautifulSoup
soup = BeautifulSoup(browser.page_source, "lxml")
print(soup.find_all('a'))
I have tried many other ways, including telling it the exact address of one site, but every time it seems to return everything but the link I want.
For context, my code goes to pages of this site:
https://ce.naco.org/?find=true
These are two of the many pages I am searching for the link in:
https://ce.naco.org/?county_info=06019
https://ce.naco.org/?county_info=08045
Under "COUNTY CONTACT" there is a link on most of these pages, and that is the link I want to grab. I just can't find a way to make it return only that link; it seems to be invisible to bs4.
I think it has something to do with how the page loads data based on what the user clicks: since bs4 isn't interacting with the site, the data never loads. But this is just a guess.
Asked by: Brad616 | Posted: 30-11-2021
Answer 1
Instead of scraping the page, just use this endpoint to grab the data:
https://ce.naco.org/get/county?fips=06019
Here's how:
import requests
data = requests.get("https://ce.naco.org/get/county?fips=06019").json()
print(f'{data["county"]["Full_Address"]}\n{data["county"]["County_Website"]}')
Output:
2281 Tulare St<br>Hall Of Records<br>Fresno, CA 93721-2105
http://www.co.fresno.ca.us
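The Full_Address value in the output above still contains literal <br> tags. A minimal sketch of tidying that up, assuming the response keeps the shape shown in this answer (a "county" object with Full_Address and County_Website keys). format_county is a hypothetical helper name, and the sample dictionary simply reuses the values printed above instead of calling the site:

```python
def format_county(data):
    """Return (address, website) from a parsed county response.

    Assumes the layout shown above: data["county"] holds
    "Full_Address" and "County_Website". Missing values give "" / None.
    """
    county = data.get("county") or {}
    # The address uses <br> as a line separator; turn it into newlines.
    address = (county.get("Full_Address") or "").replace("<br>", "\n")
    # The question notes only most pages have a link, so avoid a KeyError.
    website = county.get("County_Website") or None
    return address, website


# Sample data copied from the output above (no network call).
sample = {
    "county": {
        "Full_Address": "2281 Tulare St<br>Hall Of Records<br>Fresno, CA 93721-2105",
        "County_Website": "http://www.co.fresno.ca.us",
    }
}

address, website = format_county(sample)
print(address)
print(website)
```

Using .get() instead of indexing means a county without a website yields None rather than raising an exception partway through a long run.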
This works for both county codes:
import requests
county_codes = ["06019", "08045"]
# Reuse one session so the connection is shared across requests
with requests.Session() as s:
    for county_code in county_codes:
        data = s.get(f"https://ce.naco.org/get/county?fips={county_code}").json()
        print(f'{data["county"]["Full_Address"]}\n{data["county"]["County_Website"]}')
Output:
2281 Tulare St<br>Hall Of Records<br>Fresno, CA 93721-2105
http://www.co.fresno.ca.us
108 8Th St<br>Glenwood Springs, CO 81601-3355
http://www.garfield-county.com/
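The page URLs in the question carry the county code in a county_info query parameter, and the endpoint takes the same value as fips. A small sketch of converting one into the other with the standard library. endpoint_for is a hypothetical helper name, and it assumes the code always appears as county_info=<code> in the page URL:

```python
from urllib.parse import urlsplit, parse_qs, urlencode

ENDPOINT = "https://ce.naco.org/get/county"  # endpoint used in this answer


def endpoint_for(page_url):
    """Build the data endpoint URL for a county page URL.

    Assumes the page URL looks like https://ce.naco.org/?county_info=06019.
    Returns None when no county_info parameter is present.
    """
    query = parse_qs(urlsplit(page_url).query)
    codes = query.get("county_info")
    if not codes:
        return None
    # Keep the code as a string so leading zeros (e.g. "06019") survive.
    query_string = urlencode({"fips": codes[0]})
    return f"{ENDPOINT}?{query_string}"


pages = [
    "https://ce.naco.org/?county_info=06019",
    "https://ce.naco.org/?county_info=08045",
]
for page in pages:
    print(endpoint_for(page))
```

The resulting URLs can be fed straight into the loop above in place of the hard-coded county_codes list.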
Answered by: Emily511 | Posted: 01-01-2022
Similar questions
html - How can I retrieve the page title of a webpage using Python?
How can I retrieve the page title of a webpage (title html tag) using Python?
python - Unable to retrieve code from webpage, because of query string?
python - How to retrieve an element from a set without removing it?
Suppose the following:
>>> s = set([1, 2, 3])
How do I get a value (any value) out of s without doing s.pop()? I want to leave the item in the set until I am sure I can remove it - something I can only be sure of after an asynchronous call to another host.
Quick and dirty:
>>> elem = s.pop()
>>> s.add(elem)
sql server - Python: Retrieve Image from MSSQL
I'm working on a Python project that retrieves an image from MSSQL. My code retrieves the images successfully, but only up to a fixed size of 63KB. If the image is larger than that, it returns just the first 63KB of the image!
The following is my code:
#!/usr/bin/python
import _mssql
mssql=_mssql.connect('<ServerIP>','<UserID>','<Password>')
mssql.select_db('<Database...
python - Best way to retrieve variable values from a text file?
Referring to this question, I have a similar, but not identical, problem.
I'll have some text files, structured like:
var_a: 'home'
var_b: 'car'
var_c: 15.5
I need Python to read the file and then create a variable named var_a with value 'home', and so on.
Example...
python - How to retrieve the selected text from the active window
I am trying to create a simple open source utility for windows using Python that can perform user-defined actions on the selected text of the currently active window. The utility should be activated using a pre-defined keyboard shortcut.
Usage is partially outlined in the following example:
The user selects some text using the mouse or the keyboard (in any application window)
python - How can I retrieve last x elements in Django
I am trying to retrieve the latest 5 posts (by post time).
In views.py, if I try blog_post_list = blogPosts.objects.all()[:5], it retrieves the first 5 elements of the blogPosts objects. How can I reverse this to retrieve the latest ones?
Cheers
python - Retrieve module object from stack frame
Given a frame object, I need to get the corresponding module object. In other words, implement callers_module so this works:
import sys
from some_other_module import callers_module
assert sys.modules[__name__] is callers_module()
(That would be equivalent because I can generate a stack trace in the function for this test case. The imports are there simply to make that example complete an...
How do I retrieve Hotmail contacts with python
How can I retrieve contacts from hotmail with python?
Is there any example?
linux - How to retrieve the process start time (or uptime) in python
How to retrieve the process start time (or uptime) in python in Linux?
I only know that I can call "ps -p my_process_id -f" and then parse the output, but that is not cool.
python - Retrieve the two highest items from a list containing 100,000 integers
How can I retrieve the two highest items from a list containing 100,000 integers without having to sort the entire list first?