Nova Scotians are intensely interested in the weather, of which we have an abundance. “If you don’t like the weather, wait 5 minutes” is our mantra. This extreme variability is caused by the jet stream, which meanders over our heads on its way out over the Atlantic Ocean from North America. When it is south of us, we freeze in arctic air, and when it is north of us the weather moderates. Further, the jet stream drags the weather from much of North America to us, so we experience what others have been getting a few days after they do.
Wanting a quick summary of American weather, I wrote a script to present the conditions and forecasts for a number of cities in the USA. When I got it running, it occurred to me that it was a good exercise in parsing data from the source text of a web site: an educational subject for a tutorial here, with much broader implications than just grabbing the weather. Many of us return regularly to web sites to follow the market, get exchange rates, check the weather, check on flights, check on deals, etc., so this example is just a framework for doing that with an AppleScript.
The first step in choosing a site for yourself, however, is to make sure that all the information you want is available in the source text of the site. In the browser of your choice, view the source, and in that window search for some key words that you see on the page near what you want to know. Make sure that the data you want is actually visible in the source text, and is not filled in after the page loads (by a script running in the browser, for example). For this example, there was another requirement: I wanted a site from which I could get the weather in a lot of cities, always in the same format. I discovered that the Boston Globe is such a site and, even better, weather.boston.com had a Weather Finder section in which a simple code revealed the weather in all the cities I cared about. What’s more, it was “plain vanilla” CSS and HTML. Current conditions for all cities are in the same format as shown in the figure below for Boston, MA.
Figure 1: Typical Current Conditions Report at weather.boston.com.
Searching in the source text for that page, I found the section shown in Figure 2. Notice that the line for the div id “currentCond” would be a good place to start looking for the results, and a quick search (using your browser’s Find) will reveal that it’s unique. Further note that each piece of data we want for the current conditions always begins with “wNumbers”>, and if you check other locations (cities) you’ll see that this starting point and the order of the data are constant.
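The uniqueness check described above can also be done from a script once the page text is in a variable. What follows is only a minimal sketch: countOccurrences is a hypothetical helper name, and sampleHTML is a made-up fragment standing in for the real page source, whose markup may differ.

```applescript
-- Count how many times a marker occurs in some text by splitting on it.
-- countOccurrences is a hypothetical helper, not part of the original script.
on countOccurrences(marker, sourceText)
	set savedTIDs to AppleScript's text item delimiters -- remember the current setting
	set AppleScript's text item delimiters to marker
	set pieceCount to count of text items of sourceText
	set AppleScript's text item delimiters to savedTIDs -- always restore
	return pieceCount - 1 -- n occurrences split the text into n + 1 pieces
end countOccurrences

-- Made-up sample standing in for a downloaded page.
set sampleHTML to "<div id=\"header\"></div><div id=\"currentCond\"><span class=\"wNumbers\">41</span><span class=\"wNumbers\">12</span></div>"

set condHits to countOccurrences("div id=\"currentCond\"", sampleHTML)
set numberHits to countOccurrences("<span class=\"wNumbers\">", sampleHTML)
{condHits, numberHits}
```

For this sample the result is {1, 2}: the starting marker is unique, and the data marker repeats once per value. A count of 1 for the starting marker is what makes it safe to cut the page there.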
Figure 2: HTML corresponding to Current Conditions.
Further down the page for any city’s weather, a simple Five-Day Forecast is shown, starting with the forecast for “today”, and using both images and words to describe the day’s conditions and the temperature highs and lows. Figure 3 below is a sample.
Figure 3: Typical 5-Day forecast section at weather.boston.com.
With a little more poking around in the HTML for the site, we find two relevant sections: one that lists the forecast conditions (the text that appears just below the images for each day), and one that lists the temperature hi/lo values for each day. Figures 4 and 5 show them. The first set, in Figure 4, occurs shortly after a line in the code that reads: class="forecast"> (with the word forecast in quotes).
Figure 4: Typical Code for 5-Day Conditions (under weather icons).
Figure 5: Typical Code for 5-Day Hi/Lo Temperatures.
At this point, we are ready to use AppleScript’s Text Item Delimiters to parse the HTML for the parts we want. Parsing, for those not familiar with the term, is the process of analyzing a sequence of tokens in text to determine its grammatical structure with respect to a given grammar. A parser is the component of a compiler that carries out this task. Parsing transforms input text into a data structure; in AppleScript, that is a set of lists of the text items separated by the text item delimiters. If all of this is still fuzzy, stop here and read the article in MacScripter.net/unScripted: A Tutorial for Using AppleScript’s Text Item Delimiters.
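As a quick refresher, here is a small sketch of the technique on a made-up fragment. The markup in sampleText is an assumption for illustration only, not a copy of the real page. The text is split on one marker, then each piece is trimmed at the closing tag.

```applescript
-- Made-up fragment for illustration; the real page's markup may differ.
set sampleText to "<td class=\"forecast\">Sunny</td><td class=\"forecast\">Rain</td>"

set savedTIDs to AppleScript's text item delimiters -- remember the current setting
set AppleScript's text item delimiters to "class=\"forecast\">"
set pieces to text items of sampleText -- {"<td ", "Sunny</td><td ", "Rain</td>"}
set AppleScript's text item delimiters to "</td>"
set conditions to {}
repeat with n from 2 to (count pieces) -- piece 1 is whatever came before the first marker
	set end of conditions to first text item of (item n of pieces)
end repeat
set AppleScript's text item delimiters to savedTIDs -- always restore
conditions
```

The result is the list {"Sunny", "Rain"}. Starting the loop at piece 2 is the same trick the main script uses to skip everything ahead of the first marker.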
If you are comfortable with text item delimiters, then the best way to proceed is with a heavily commented AppleScript that parses the HTML for 10 cities in the United States, all reached through weather.boston.com so they will have exactly the same format. Rather than trying to read the text from the source view in a browser, we will use the Unix command-line tool curl to grab it. Curl is very powerful, and rather complex. You can see its man page (if you care to) by running this script.
do shell script "man -t curl | open -f -a /Applications/Preview.app"
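Before the full script, here is a minimal sketch of fetching one page with curl from AppleScript. The URL pattern and the code "BOS" are the ones used in this article; the variable names and the choice of options are my own assumptions, and the site may no longer respond at this address.

```applescript
-- pageURL follows the address pattern used in this article; "BOS" is the Boston code.
set pageURL to "http://weather.boston.com/?code=" & "BOS"
try
	-- -s hides the progress meter, -L follows redirects,
	-- --max-time caps the whole transfer at the given number of seconds.
	set pageSource to do shell script "curl -s -L --max-time 20 " & quoted form of pageURL
on error errMsg number errNum
	display dialog "curl failed (" & errNum & "): " & errMsg buttons {"OK"} default button 1
	return
end try
length of pageSource -- a rough check that something came back
```

Wrapping the URL in quoted form of keeps the shell from interpreting characters such as "?" and "&", and the try block keeps a network failure from stopping the script with a raw error.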
And now… Here’s the main event; a heavily commented script.
-- Begin --
set astid to AppleScript's text item delimiters -- for later reference by variable. It is always good practice to store these and finally set them back. AppleScript's text item delimiters are a global property of AppleScript so if you leave them in a strange setting, other scripts open at the same time will respond to them.
set WkDays to getWeekdays(current date, 5, true) -- a handler (at the bottom) to compute the days of the week for the forecast, rather than finding them from the site source.
set Weather to "American Cities Weather Summary" & return -- this is the string that will become the complete summary dialog text.
set Forecast to "5-Day Forecasts for American Cities" & return & "---" & return -- this is the string that will start the 5-day forecast dialog text -- but it's too long for 10 cities to fit on one dialog, so:
set FCPage2 to "5-Day Forecasts Continued," & return & "---" & return -- the dialog for forecasts for the last five cities.
set Cities to {"Boston", "New York City", "Wash. DC", "Miami", "Chicago", "Denver", "Houston", "San Francisco", "San Diego", "Seattle"} -- these are spread out all over the USA
-- City codes in the same order as the list of cities above -- modify to suit yourself. They appear in the URL to get the weather for each city.
set Codes to {"BOS", "NYC", "DCA", "MIA", "ORD", "DEN", "IAH", "SFO", "SAN", "SEA"}
-- Offer a choice of cities using the handler to get back the positions of the choices in the lists.
set ChosenCities to getChosen(Cities)
(*
You can look up more codes by going to http://weather.boston.com/
and using the search boxes on the right. When you choose one, the
code will appear at the end of the URL for the page. A more complete
list is at: http://stuff.mit.edu/doc/weather-cities.txt Searching for
codes by city in that list is a fast way to find lesser-known cities in
the USA. (Canadian codes are not included in either of these, but
Google is your friend if you want to try some of them. I didn't
explore codes for cities elsewhere in the world, if they exist.)
*)
-- Start of code to parse the weather.boston.com site.
-- set some text item delimiter definitions we'll need.
set cond_start to "div id=\"currentCond\"" -- chop off the top of the HTML
set Parts to "<span class=\"wNumbers\">" -- general Partlocator for data
set Fore to "id=\"fiveForecast\"" -- place to start looking for 5-day forecast.
set pages to 1 -- used for displaying forecasts when the number of choices is 5 or more.
-- Start of the main loop to cycle through the cities one by one.
set Ccount to count ChosenCities
-- Check that something was chosen, then extract the data
if Ccount is not 0 then
repeat with k from 1 to Ccount -- the main loop for dealing with chosen cities
set CityChoice to item k of ChosenCities
-- placeholders inside the loop so they'll be reset for each city.
set WC to {} -- a placeholder for weather conditions in a city.
set WF to {} -- a placeholder for forecast conditions in a city.
set HiLo to {} -- a placeholder for daily high/low temperatures
set CtyCode to item CityChoice of Codes
set City to item CityChoice of Cities
-- Use cURL to download the HTML for the page. This is the major time
-- consumer because we are looking up 10 cities in this example.
set T to (do shell script "curl " & quoted form of ("http://weather.boston.com/?code=" & CtyCode))
-- CURRENT CONDITIONS EXTRACTION for EACH PAGE
-- Now parse the data for each. The advantage of using a single site
-- for all of them becomes clear: the format is the same for all of them.
set AppleScript's text item delimiters to cond_start -- to chop off most of the front end; we'll keep everything after this.
set lastPart to text item 2 of T -- keep the last part, where the action is.
set AppleScript's text item delimiters to "alt=" -- look for the conditions image's alternative text
set Y to text item 2 of lastPart -- grab what's after alt=
set AppleScript's text item delimiters to " height" -- look for what follows the conditions
set Conds to text item 1 of Y -- the text for current conditions, eg: "Mostly Cloudy"
set AppleScript's text item delimiters to "class=\"degrees\">" -- look for temperature next
-- NOTE: when a text item delimiter contains quotes, they must be escaped as they are above.
set X to text item 2 of lastPart -- grab all of it
set AppleScript's text item delimiters to "&deg;" -- look for what follows the number
-- NOTE: the delimiter above must be the HTML code for a degree symbol: an ampersand, "&", followed by the letters "deg", and then by a semicolon, ";". A browser may display that code as a degree symbol, so if the script you download shows the symbol instead of the code, replace it.
set tTemp to text item 1 of X -- the first part is the temperature number we want.
set AppleScript's text item delimiters to Parts -- now move down to the numbers for conditions
set tItems to (text items 2 thru -1 of lastPart) -- five of them, ignore the rest -- we start with item 2 because there is "stuff" after the first one we don't want.
-- grab the pieces from the data set
repeat with anItem in tItems -- cycle through conditions and move data to our storage.
set end of WC to first word of contents of anItem -- individual figures
end repeat -- end of grabbing data for conditions.
set AppleScript's text item delimiters to astid
set {feel, tWind, Dir, tHum, Bar} to WC -- give the five items variable names for easy reference in building the dialog text (instead of item 1 of..., item 2 of ....)
-- Build the dialog display for current conditions (ASCII character 188 is "º" in the Mac OS Roman encoding; it stands in for a degree symbol)
-- NOTE: if the display is too tall for your screen, remove some cities and their codes
-- from the Cities list and Codes list definitions at the beginning of the script.
set Weather to Weather & "---" & return & City & ", " & Conds & " and " & tTemp & (ASCII character 188) & "F" & ", " & "Feels: " & feel & (ASCII character 188) & "F" & return & Dir & " Wind " & " @ " & tWind & " mph " & "with " & tHum & "% Rel. Humidity" & return & "Barometric Pressure " & Bar & " in." & return
if k = Ccount then set Weather to Weather & "---" -- to bound the last entry only; earlier entries are bounded by the next city's leading "---".
-- 5-DAY FORECAST EXTRACTION FROM EACH PAGE
-- Now build the forecast summary for the cities; refer to the forecast figure.
-- We want the words under the images, not the images themselves, and
-- because we know what day it is, we don't need the text for the weekdays.
-- We will calculate the days of the week required using a handler.
set AppleScript's text item delimiters to Fore -- a unique indicator in the site code.
set partTwo to text item 2 of lastPart -- forecast data is after our first cut above.
-- We could have started from the html itself (the variable "T"),
-- but moving down reduces the searching (not that TID searches are slow).
set AppleScript's text item delimiters to "class=\"forecast\">"
set Z to text items of partTwo
set AppleScript's text item delimiters to "</td>" -- the end of each forecast item
repeat with m from 2 to 6
set end of WF to first text item of item m of Z
end repeat -- end of extracting forecast data
set AppleScript's text item delimiters to "class=\"hightemp\">"
set Q to text items 2 thru 6 of partTwo
set AppleScript's text item delimiters to "class=\"lowtemp\">"
set R to text items 2 thru 6 of partTwo
set AppleScript's text item delimiters to "&deg;" -- follows the temperatures.
-- See the warning above: the delimiter must be the HTML code for a degree symbol (ampersand, "deg", semicolon), not the symbol itself.
repeat with n from 1 to 5 -- 5-day forecasts
set Hi to text item 1 of item n of Q
set Lo to text item 1 of item n of R
set HiLo's end to "" & Hi & "/" & Lo & space & (ASCII character 188) & "F"
end repeat -- end of repeat through forecast days
set AppleScript's text item delimiters to astid
-- build the Forecast text using items from WkDays, WF (the conditions), & HiLo
set Forecast to Forecast & "For: " & City & return -- heading for each city
-- add the five day conditions & hi/lo temperatures
repeat with m from 1 to 5 -- Pick out the data from lists
set WD to item m of WkDays
set dayFCst to item m of WF
set dayHiLo to item m of HiLo
set Forecast to Forecast & WD & ": " & dayFCst & " Hi/Lo = " & dayHiLo & return
end repeat -- end of building forecasts list
set Forecast to Forecast & "---" & return
if k = 5 and Ccount > 5 then -- more than 5 cities won't fit on one dialog box, so we'll split them into two "pages".
copy Forecast to FCPage1 -- first "page"
copy FCPage2 to Forecast -- second "page"
set pages to 2
end if
end repeat -- end of repeat through the chosen cities
-- display Weather with button choice for displaying forecast as well.
set B to button returned of (display dialog Weather buttons {"Done", "Forecast"} default button "Forecast")
if B is "Forecast" then -- show the Forecast data only if it was requested.
if pages = 2 then -- more than five cities, so there are two "pages".
if button returned of (display dialog FCPage1 buttons {"Cancel", "More"} default button "More") is "More" then display dialog Forecast -- now the second page from above.
else
display dialog Forecast buttons {"Done"} default button 1
end if
end if
end if -- end of check for something chosen.
-- Handler to list some weekdays from a date including the day of that date if startToday is true.
-- For this example, the forecast lists do start with today, so we set it to true.
to getWeekdays(aDate, howMany, startToday) -- startToday is true/false, false is tomorrow.
set startnum to 1 -- start list from tomorrow
if startToday then set startnum to 2 -- starts the list with today's weekday
-- Note that weekday names are AppleScript constants.
set WkDays to {Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday}
set WkDy to (weekday of aDate) as number
set WD to {}
-- note fiddle because there is no weekday numbered 0, they start with Sunday = 1
repeat with k from 1 to howMany
set WD's end to item (((WkDy + k - startnum) mod 7) + 1) of WkDays as text
end repeat
return WD
end getWeekdays
-- Handler to return the position of an item in a list of choices so an
-- item in a companion list can be substituted for the choice. In this example,
-- a City is chosen from a list of cities, but a City's code is also required.
-- The handler returns a list of numbers which is used to get both the
-- cities and the corresponding city codes for the weather.
to getChosen(aList)
set numList to {}
set chosen to {}
-- Make a numbered list in "numList".
repeat with k from 1 to count aList
set end of numList to (k as text) & ". " & item k of aList -- number the original list.
end repeat
-- Offer a choice from the numbered list.
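set choices to choose from list numList with prompt "Hold the Command key down to make multiple choices." with multiple selections allowed -- get the picks now including numbers.
if choices is false then return chosen -- "choose from list" returns false when the user cancels, so hand back the empty list.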
-- Extract the numbers only from the choices found in list "choices".
repeat with j from 1 to count choices
tell choices to set end of chosen to (text 1 thru ((my (offset of "." in item j)) - 1) of item j) as integer -- look for the period in the numbered list, and take the characters before it.
end repeat
-- Return a list of numbers for chosen positions in the given list.
return chosen
end getChosen
I’ve used the weather in 10 American cities as the example here, but if you’ve followed the process, you’ll be able to extract any data from a web page if the data you want appears in the source HTML for the page. Build your own notification and enjoy.
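As a starting point for your own pages, the cut-and-keep pattern used throughout the script can be wrapped in one small handler. This is only a sketch: textBetween is a hypothetical name, and the sample markup is made up. It returns the text after the first occurrence of a start marker, up to the end marker that follows it, and assumes the end marker appears before the start marker repeats.

```applescript
-- textBetween is a hypothetical helper, not part of the original script.
-- It returns missing value when startMark is absent from sourceText.
on textBetween(sourceText, startMark, endMark)
	set savedTIDs to AppleScript's text item delimiters -- remember the current setting
	try
		set AppleScript's text item delimiters to startMark
		if (count of text items of sourceText) < 2 then error "start marker not found"
		set afterStart to text item 2 of sourceText -- what follows the first startMark
		set AppleScript's text item delimiters to endMark
		set found to text item 1 of afterStart -- keep everything up to endMark
	on error
		set found to missing value
	end try
	set AppleScript's text item delimiters to savedTIDs -- always restore
	return found
end textBetween

-- Made-up sample; substitute the markers you find in your own page's source.
set sampleHTML to "<span class=\"rate\">1.0842</span>"
textBetween(sampleHTML, "class=\"rate\">", "</span>")
```

For the sample above the handler returns "1.0842". Checking the result against missing value before using it gives the script a chance to report that the site's layout has changed, instead of failing on a missing text item.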