Ah. Of course! Adam’s article. I read it when it came out, but wasn’t ready to think about databases at the time. I can see now where some of the techniques in your scripts came from.
Here’s my version of the batch script. It’s similar to yours, but uses some text manipulation appended to the page-fetch shell script to pick out the lines containing “Email Print”, keep only the bits after "Email Print ", and substitute “‘’” (two single quotes) for “'” (one). Another difference is that instead of using double quotes in the “sqlite3” commands, it uses the ‘quoted form’ of those bits. This leaves any double quotes in descriptions free to be kept as such within the single-quoted text.
If it weren’t for my page-fetch shell script reducing the page contents as they come in, I’d parse the first page separately for its stock entries instead of fetching it again after getting the number of pages from it.
on main()
-- How many pages will be downloaded at once? (Not too many. There's a limit to the length of text that can be used in a shell script. )
set batch to 50
set sqlite3Head to "sqlite3 " & quoted form of POSIX path of ((path to documents folder as text) & "Databases:Home Depot.db") & space
do shell script (sqlite3Head & "\"create table if not exists depot(id integer primary key, Date, Description, Model, Internet, 'Store Sku', 'Store SO Sku', Price); \"")
set astid to AppleScript's text item delimiters
set pageSource to (do shell script "curl [url=http://www.homedepot.com/buy/?page=1)]http://www.homedepot.com/buy/?page=1")[/url]
set AppleScript's text item delimiters to {"<a href=\"?page="}
set totalPages to (first word of text item -1 of pageSource) as integer
-- set totalPages to 100 -- Enable this to reduce the number of pages for testing.
script o
property lineList : missing value
property batchList : missing value
end script
repeat with startPage from 1 to totalPages by batch
set endPage to startPage + batch - 1
if (endPage comes after totalPages) then set endPage to totalPages
set pageRange to "[" & startPage & ("-" & endPage & "]")
set editedText to (do shell script "curl [url=http://www.homedepot.com/buy/?page=]http://www.homedepot.com/buy/?page="[/url] & pageRange & " | grep -o 'Email Print .*' | sed -e s/'Email Print '/''/g -e s/\"'\"/\"''\"/g")
set o's lineList to editedText's paragraphs
set o's batchList to {}
considering case
repeat with i from 1 to (count o's lineList)
set theRecord to {"Date, Description, Model, Price"}
set recordData to {}
-- Get only the part of the line we want.
set AppleScript's text item delimiters to " / "
set theItem to text 1 thru text item -3 of item i of o's lineList
-- Everything before the (last) " Model # " in the line is the description.
set AppleScript's text item delimiters to " Model # "
set theDescription to text 1 thru text item -2 of theItem
set otherStuff to text item -1 of theItem
-- The price is whatever comes after the last " $" in whatever we have left.
set AppleScript's text item delimiters to " $"
set thePrice to text item -1 of otherStuff
set otherStuff to text 1 thru text item -2 of otherStuff
-- The model number comes before the first " Internet # " or " Store S" (if either's present) .
set AppleScript's text item delimiters to {" Internet # ", " Store S"} -- Requires Snow Leopard or later.
set theModel to text item 1 of otherStuff
set end of recordData to theDescription
set end of recordData to theModel
set end of recordData to thePrice
-- Internet, Store SKU, and Store SO SKU numbers are blocks of digits and will be the first 'word' after their respective labels.
if (otherStuff contains "Internet #") then
set AppleScript's text item delimiters to "Internet # "
set end of theRecord to "Internet"
set end of recordData to word 1 of text item 2 of otherStuff
end if
if (otherStuff contains "Store SKU #") then
set AppleScript's text item delimiters to "Store SKU # "
set end of theRecord to "'Store SKU'"
set end of recordData to word 1 of text item 2 of otherStuff
end if
if (otherStuff contains "Store SO SKU #") then
set AppleScript's text item delimiters to "Store SO SKU # "
set end of theRecord to "'Store SO SKU'"
set end of recordData to word 1 of text item 2 of otherStuff
end if
set AppleScript's text item delimiters to ", "
set theRecord to theRecord as text
set AppleScript's text item delimiters to "', '"
set end of o's batchList to "insert into depot ( " & theRecord & ") values( datetime(), '" & recordData & "'); "
end repeat
end considering
set AppleScript's text item delimiters to ""
do shell script (sqlite3Head & quoted form of (o's batchList as text))
end repeat
set AppleScript's text item delimiters to astid
beep 2
end main
main()
Here’s an attempt at a version of that for sqlite3. I’m afraid my knowledge of text manipulation by shell script only goes so far.
- Get sqlite3 to return everything from the database. The default return format is one record per line, the fields in each line delimited with bar characters. The fields are in the same left-to-right order used when they were originally specified in the creation of the table. Field 2 is the date in ISO format; field 3 is the description; field 4 is the model number.
- Sort the records on the date fields. However, since the records seem to be returned in creation order (ie. date order), this step is probably unnecessary.
- Since at least “Eden Arbors Vienna Arbor” and “Eden Arbors The London Arbor” have the same model number, it’s necessary to take the description into account as well as the model. To this end, perform a stable sort on the description fields. This groups records with equal descriptions without changing their order with respect to each other (ie. sorted on date).
- Perform another stable sort on the model fields. Thiss group records with equal model numbers without changing their order with respect to each other (sorted on description, sub sorted on date).
- Using vanilla AppleScript, go through a list of the sorted lines. At the end of each run of a particular model number or description, compare the last two prices in the run. If the last one’s less than the one before it, append the relevant details to a results list.
- Return the results list.
on main()
script o
-- A list of the lines returned by sqlite3 for all the records in the database. Here they're assumed to have come off in date order, so they're just stable-sorted on field 3 (descriptions) and on field 4 (model numbers).
property recordLines : paragraphs of (do shell script ("sqlite3 " & quoted form of POSIX path of ((path to documents folder as text) & "Databases:Home Depot.db") & " 'select * from depot; ' | sort -st '|' -k 3 | sort -st '|' -k 4"))
property priceDrops : {}
end script
set astid to AppleScript's text item delimiters
set AppleScript's text item delimiters to "|"
set i to 1
set currentDescription to text item 3 of beginning of o's recordLines
set currentModel to text item 4 of beginning of o's recordLines
repeat with j from 2 to (count o's recordLines)
set thisDescription to text item 3 of item j of o's recordLines
set thisModel to text item 4 of item j of o's recordLines
if (not ((thisModel is currentModel) and (thisDescription is currentDescription))) then
-- A different model number or description. Compare the last two prices (if more than one) for the previous model and insert details of any price drop into o's priceDrops.
if (j - i > 1) then checkPrices(j - 1, o)
-- Reset for a run of the model number just reached.
set i to j
set currentDescription to thisDescription
set currentModel to thisModel
end if
end repeat
if (j > i) then checkPrices(j, o) -- Check the run in progress at the end of the list.
set AppleScript's text item delimiters to astid
return o's priceDrops
end main
-- Compare the price in the indexed list in o's recordLines with that in the list before. If it's less, append relevant details to o's priceDrop list.
on checkPrices(j, o)
set latestDetails to item j of o's recordLines
set previousDetails to item (j - 1) of o's recordLines
set latestPrice to text item -1 of latestDetails
set previousPrice to text item -1 of previousDetails
considering numeric strings
if (latestPrice < previousPrice) then set end of o's priceDrops to {|description|:text item 3 of previousDetails, model:text item 4 of previousDetails, |previous date|:text item 2 of previousDetails, |previous price|:previousPrice, |latest date|:text item 2 of latestDetails, |latest price|:latestPrice}
end considering
end checkPrices
main()