How to iterate through HTML with BeautifulSoup

May 5, 2020
Day 3 of 101 Days of Python
A question posted on Reddit today involves iterating through HTML tags to find a specific set of information and build a dictionary in Python. Here is the question:
I have a basic weather website request as below:
import requests, bs4, lxml
url = ('https://forecast.weather.gov/MapClick.php?..')
page = requests.get(url)
soup = bs4.BeautifulSoup(page.content, 'lxml')
A large section of the page with a class and id holds the information I'm looking for. I've pulled that out with the below:
weather = soup.find(id='detailed-forecast-body')
I would like to get a dictionary that is like {Today: "a slight chance of ...", Tonight: "mostly clear, with a..."}
I can list all the weather elements from above using the below:
sections = weather.find_all(class_='col-sm-2 forecast-label')
<div class="col-sm-2 forecast-label"><b>Today</b></div>
<div class="col-sm-2 forecast-label"><b>Tonight</b></div>
forecasts = weather.find_all(class_='col-sm-10 forecast-text')
I'm struggling to understand how I can iterate through the weather object, to pull out just the text I want.
Any help is greatly appreciated.
There is great news here: the question already contains 99% of the solution. We just need to build a dictionary from what Beautiful Soup has already extracted.
First, we do need to change one thing. The URL in the question is elided (it ends in '..'), so it does not return the forecast page we need. I grabbed a working URL from forecast.weather.gov so that we get real HTML back.
Taking the code from the question and swapping in a real URL, we get something like this:
import requests, bs4, lxml

# This is the broken url...
#url = ('https://forecast.weather.gov/MapClick.php?..')
url = 'https://forecast.weather.gov/MapClick.php?x=194&y=139&site=gid&zmx=&zmy=&map_x=194&map_y=139#.XrDOw_l7mV4'
page = requests.get(url)
soup = bs4.BeautifulSoup(page.content, 'lxml')
weather = soup.find(id='detailed-forecast-body')
sections = weather.find_all(class_='col-sm-2 forecast-label')
forecasts = weather.find_all(class_='col-sm-10 forecast-text')
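The code above takes for granted that the page still contains an element with the id detailed-forecast-body. soup.find returns None when nothing matches, and the following find_all call would then fail with an AttributeError. Here is a minimal sketch of a guard. The helper name find_forecast_body and the tiny stand-in HTML string are assumptions made for illustration, and the sketch uses Python's built-in 'html.parser' so that it runs without lxml.

```python
import bs4


def find_forecast_body(html):
    """Return the detailed-forecast container, or raise if the page lacks it.

    Hypothetical helper for illustration; not part of the original question.
    """
    soup = bs4.BeautifulSoup(html, 'html.parser')
    weather = soup.find(id='detailed-forecast-body')
    if weather is None:
        # soup.find returns None when no element matches
        raise LookupError("no element with id='detailed-forecast-body'")
    return weather


# Stand-in markup modelled on the page structure shown in this post.
sample_html = """
<div id="detailed-forecast-body">
  <div class="col-sm-2 forecast-label"><b>Tonight</b></div>
  <div class="col-sm-10 forecast-text">Mostly clear, with a low around 43.</div>
</div>
"""

weather = find_forecast_body(sample_html)
labels = weather.find_all(class_='col-sm-2 forecast-label')
```

With a live request, calling page.raise_for_status() before parsing would surface HTTP errors in a similar way.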
Let's take a look at sections and forecasts. These two variables contain the HTML for the time periods and the forecast text that we are looking for.
>>> sections
[<div class="col-sm-2 forecast-label"><b>This Afternoon</b></div>,
 <div class="col-sm-2 forecast-label"><b>Tonight</b></div>,
 <div class="col-sm-2 forecast-label"><b>Wednesday</b></div>,
 <div class="col-sm-2 forecast-label"><b>Wednesday Night</b></div>,
 <div class="col-sm-2 forecast-label"><b>Thursday</b></div>,
 <div class="col-sm-2 forecast-label"><b>Thursday Night</b></div>,
 <div class="col-sm-2 forecast-label"><b>Friday</b></div>,
 <div class="col-sm-2 forecast-label"><b>Friday Night</b></div>,
 <div class="col-sm-2 forecast-label"><b>Saturday</b></div>,
 <div class="col-sm-2 forecast-label"><b>Saturday Night</b></div>,
 <div class="col-sm-2 forecast-label"><b>Sunday</b></div>,
 <div class="col-sm-2 forecast-label"><b>Sunday Night</b></div>,
 <div class="col-sm-2 forecast-label"><b>Monday</b></div>]
>>> forecasts
[<div class="col-sm-10 forecast-text">Sunny, with a high near 71. Breezy, with a northwest wind 15 to 20 mph, with gusts as high as 25 mph. </div>,
 <div class="col-sm-10 forecast-text">A chance of showers and thunderstorms before 11pm, then a slight chance of showers between 11pm and 2am. Mostly cloudy, then gradually becoming mostly clear, with a low around 42. Northwest wind 5 to 10 mph. Chance of precipitation is 30%. New precipitation amounts of less than a tenth of an inch, except higher amounts possible in thunderstorms. </div>,
 <div class="col-sm-10 forecast-text">Sunny, with a high near 64. North wind 10 to 15 mph, with gusts as high as 20 mph. </div>,
 <div class="col-sm-10 forecast-text">A 30 percent chance of showers after 1am. Increasing clouds, with a low around 43. North northeast wind 5 to 10 mph becoming southeast after midnight. New precipitation amounts of less than a tenth of an inch possible. </div>,
 <div class="col-sm-10 forecast-text">A chance of showers, with thunderstorms also possible after 1pm. Mostly cloudy, with a high near 62. South southeast wind 10 to 15 mph, with gusts as high as 20 mph. Chance of precipitation is 40%. New rainfall amounts of less than a tenth of an inch, except higher amounts possible in thunderstorms. </div>,
 <div class="col-sm-10 forecast-text">A 30 percent chance of showers and thunderstorms before 1am. Mostly cloudy, with a low around 37. New rainfall amounts of less than a tenth of an inch, except higher amounts possible in thunderstorms. </div>,
 <div class="col-sm-10 forecast-text">Sunny, with a high near 59.</div>,
 <div class="col-sm-10 forecast-text">Areas of frost after 5am. Otherwise, mostly clear, with a low around 36.</div>,
 <div class="col-sm-10 forecast-text">Areas of frost before 8am. Otherwise, mostly sunny, with a high near 69.</div>,
 <div class="col-sm-10 forecast-text">Mostly cloudy, with a low around 41.</div>,
 <div class="col-sm-10 forecast-text">Mostly sunny, with a high near 60.</div>,
 <div class="col-sm-10 forecast-text">Mostly clear, with a low around 36.</div>,
 <div class="col-sm-10 forecast-text">Isolated showers. Partly sunny, with a high near 60. Chance of precipitation is 20%.</div>]
So all we really need to do is the last 1% of the work, which is to build a dictionary where we get the keys from sections, and the values from forecasts.
Let's start by building a list of the strings we want from each. From sections, we build a list of strings to use as dictionary keys (named time_periods), and from forecasts we build a list of strings to use as dictionary values (named time_period_forecasts).
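# Each label div wraps a single <b> tag; that tag's contents hold the label string
time_periods = []
for section in sections:
    time_periods += section.contents[0].contents

# Each forecast div holds its forecast text directly
time_period_forecasts = []
for forecast in forecasts:
    time_period_forecasts += forecast.contents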
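The .contents approach works here because each label div holds a single <b> tag and each forecast div holds a single string. As an alternative sketch, Beautiful Soup's get_text() collects the text inside a tag regardless of nesting, and strip=True trims the trailing spaces that some of the forecasts carry. It also returns plain str values. The HTML below is a small stand-in modelled on the page, not the live response, and the built-in 'html.parser' is used so the example runs without lxml.

```python
import bs4

# Stand-in markup modelled on the page structure shown in this post.
sample_html = """
<div id="detailed-forecast-body">
  <div class="col-sm-2 forecast-label"><b>Tonight</b></div>
  <div class="col-sm-10 forecast-text">Mostly clear, with a low around 43. </div>
  <div class="col-sm-2 forecast-label"><b>Wednesday</b></div>
  <div class="col-sm-10 forecast-text">Sunny, with a high near 64. </div>
</div>
"""

soup = bs4.BeautifulSoup(sample_html, 'html.parser')
weather = soup.find(id='detailed-forecast-body')

# get_text(strip=True) returns the tag's text with surrounding whitespace removed
time_periods = [
    tag.get_text(strip=True)
    for tag in weather.find_all(class_='col-sm-2 forecast-label')
]
time_period_forecasts = [
    tag.get_text(strip=True)
    for tag in weather.find_all(class_='col-sm-10 forecast-text')
]
```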
Finally, we put all of this together into the dictionary. Here we iterate over the indices of time_periods and use each index to look up the matching entry in time_period_forecasts (this can be done many other ways). We could also improve this by checking that the two lists have the same length, but we will skip that for now.
# The Dictionary
d = {}
for i in range(len(time_periods)):
    d[time_periods[i]] = time_period_forecasts[i]
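The length check mentioned above, along with one of the "many other ways" to pair the lists, can be sketched with zip. The function name build_forecast_dict and the two-entry lists are made up for illustration; in the real script the lists come from the scraping code.

```python
def build_forecast_dict(time_periods, time_period_forecasts):
    """Pair each period label with its forecast, refusing mismatched lists."""
    if len(time_periods) != len(time_period_forecasts):
        raise ValueError(
            f"{len(time_periods)} labels but "
            f"{len(time_period_forecasts)} forecasts"
        )
    # zip walks both lists in step; dict() consumes the (key, value) pairs
    return dict(zip(time_periods, time_period_forecasts))


# Stand-in data; the real lists come from the scraping code.
d = build_forecast_dict(
    ['Tonight', 'Wednesday'],
    ['Mostly clear, with a low around 43.', 'Sunny, with a high near 64.'],
)
```

Without the explicit check, zip would silently stop at the shorter list, so a missing forecast would go unnoticed.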
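Let's see if that worked. This is a nice opportunity to use Python's pretty print module pprint, since some of our forecasts are quite long. (The forecast page is live, so the periods and text below differ from the listing above, which evidently came from a different request.)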
>>> import pprint
>>> pprint.pprint(d)
{'Friday': 'Mostly sunny, with a high near 61.',
 'Friday Night': 'Mostly clear, with a low around 37.',
 'Monday': 'A 20 percent chance of showers. Partly sunny, with a high near '
           '60.',
 'Saturday': 'A 20 percent chance of showers after 1pm. Mostly sunny, with a '
             'high near 64.',
 'Saturday Night': 'A 20 percent chance of showers and thunderstorms. Partly '
                   'cloudy, with a low around 41.',
 'Sunday': 'Partly sunny, with a high near 61.',
 'Sunday Night': 'Mostly clear, with a low around 36.',
 'Thursday': 'A 30 percent chance of showers. Cloudy, with a high near 59. '
             'New precipitation amounts of less than a tenth of an inch '
             'possible. ',
 'Thursday Night': 'A chance of showers and thunderstorms before 1am, then a '
                   'slight chance of showers. Mostly cloudy, with a low '
                   'around 38. Chance of precipitation is 30%.',
 'Tonight': 'Mostly clear, with a low around 43. Breezy, with a northwest wind '
            '15 to 20 mph, with gusts as high as 25 mph. ',
 'Tuesday': 'Sunny, with a high near 70. Breezy, with a north northwest wind '
            '15 to 20 mph, with gusts as high as 25 mph. ',
 'Tuesday Night': 'A chance of showers and thunderstorms before midnight, then '
                  'a slight chance of showers between midnight and 2am. '
                  'Partly cloudy, with a low around 42. Northwest wind 5 to 10 '
                  'mph. Chance of precipitation is 30%. New precipitation '
                  'amounts of less than a tenth of an inch, except higher '
                  'amounts possible in thunderstorms. ',
 'Wednesday': 'Sunny, with a high near 64. North wind 10 to 15 mph, with gusts '
              'as high as 25 mph. ',
 'Wednesday Night': 'A 20 percent chance of showers after 1am. Partly cloudy, '
                    'with a low around 42. North northeast wind 5 to 10 mph '
                    'becoming east southeast after midnight. '}
Nice!
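One detail in the printed output: the keys appear alphabetically ('Friday' comes before 'Tonight') because pprint sorts dictionary keys by default. The dictionary itself keeps insertion order on Python 3.7 and later. A small sketch, assuming Python 3.8 or later (where pprint accepts sort_dicts) and using a two-entry stand-in dictionary, shows how to print in forecast order instead:

```python
import pprint

# Stand-in data inserted in forecast order, not alphabetical order.
d = {
    'Tonight': 'Mostly clear, with a low around 43.',
    'Friday': 'Mostly sunny, with a high near 61.',
}

# sort_dicts=False (Python 3.8+) keeps insertion order instead of sorting keys
pprint.pprint(d, sort_dicts=False)

# pformat returns the same text as a string, which makes the two orders easy to compare
ordered = pprint.pformat(d, sort_dicts=False)
default = pprint.pformat(d)
```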