I could use a little help
I have a tab-delimited spreadsheet with fields that contain a varying number of pipe-delimited URLs.
I would like to download the URLs. So:
How do I count the number of URLs in each field?
How do I identify each URL for use in the script later?
How would I download all of the URLs in the string using something like:
do shell script "curl -o " & DownloadDestinationPath & "/" & last text item of URL & " -L " & FileDownload
Thank you for all your help!
Mark
If the only goal is to download all URLs from one tab-delimited text file, I recommend you use a regular expression to grab them from the file.
You can use:
- do shell script with one of many command line tools, for example awk, sed or grep
- AppleScriptObjC and the NSRegularExpression class (see the sketch after this list)
- a scripting addition like AppleScript Toolbox.osax or SatImage.osax, which both support regex
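If you want the AppleScriptObjC route, a minimal sketch could look like the following. The sample text and the simplified pattern are only assumptions to keep it self-contained; it returns a plain AppleScript list of the matched URLs.

use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

-- rough sketch only: extract URLs from a string with NSRegularExpression
set sampleText to "see http://www.macscripter.net|http://example.com/file.pdf" -- assumed sample input
set thePattern to "[a-z]{3,9}://[a-z][a-z0-9\\-\\.]+(/[a-z0-9+~%/._-]*)?"
set theString to current application's NSString's stringWithString:sampleText
set theRegex to current application's NSRegularExpression's regularExpressionWithPattern:thePattern options:1 |error|:(missing value) -- 1 = case insensitive
set theLength to theString's |length|()
set theMatches to (theRegex's matchesInString:theString options:0 range:{0, theLength}) as list
set foundURLs to {}
repeat with aMatch in theMatches
	set end of foundURLs to (theString's substringWithRange:(aMatch's range())) as text
end repeat
foundURLs --> {"http://www.macscripter.net", "http://example.com/file.pdf"}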
Using a do shell script seems just fine here. For the purpose of a working example the text is fetched from a MacScripter page with curl; in your case you would replace that step by reading the file into a variable (a sketch of that follows at the end). As you can see, relative paths are ignored and only complete URLs are grabbed from the page.
set stringContainingURLs to do shell script "curl http://www.macscripter.net/viewforum.php?id=2"
set regex to ""
-- build the regex in steps for better understanding
-- first grab the scheme part
set regex to regex & "[a-z]{3,9}://"
-- then pick the host part
-- the server may only begin with a letter
set regex to regex & "[a-z]"
-- the rest of the server address may contain letters, numbers, - and .
set regex to regex & "[a-z0-9\\-\\.]+"
-- optionally the URL may have a path
set regex to regex & "(/[a-z0-9+~%/._-]*)?"
-- optionally the URL may have a query
set regex to regex & "(\\?[+=&~%/.a-z0-9_-]*)?"
-- optionally the URL may have an anchor
set regex to regex & "(#[~%/.a-z0-9_-]*)?"
do shell script "egrep -io " & quoted form of regex & " <<<" & quoted form of stringContainingURLs