Tab Delimited with Pipe Delimited URLs

I could use a little help.

I have a spreadsheet that is tab delimited, with fields that contain a varying number of pipe-delimited URLs.

I would like to download the URLs, so:

How do I count the number of URLs in each field?
How do I identify each URL for use in the script later?
How would I download the correct number of URLs from each string, using something like:

do shell script "curl -o " & DownloadDestinationPath & "/" & last text item of URL & " -L " & FileDownload

Thank you for all your help!

Mark

If the only goal is to get all URLs out of one tab-delimited text file, I recommend using a regular expression to grab them from the file.

You can use:

  • do shell script with one of the following commands (there are many more): awk, sed, or grep
  • You could use AppleScriptObjC and the NSRegularExpression class
  • You could use a scripting addition like AppleScript Toolbox.osax or Satimage.osax, both of which support regex
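
For the AppleScriptObjC route, a minimal sketch might look like the following. The sample text, the example.com URLs, and the simplified pattern (a scheme followed by everything up to the next pipe or whitespace character) are assumptions made for illustration, not the pattern built further down.

```applescript
use AppleScript version "2.4" -- OS X 10.10 or later
use framework "Foundation"
use scripting additions

-- Hypothetical sample text; the example.com URLs are placeholders.
set sampleText to "Row 1" & tab & "http://example.com/a.jpg|http://example.com/b.jpg"

-- Simplified pattern (an assumption): a scheme, then everything up to
-- the next pipe or whitespace character.
set thePattern to "[a-z]{3,9}://[^|\\s]+"

set theString to current application's NSString's stringWithString:sampleText
set theRegex to current application's NSRegularExpression's regularExpressionWithPattern:thePattern options:(current application's NSRegularExpressionCaseInsensitive) |error|:(missing value)
set theMatches to theRegex's matchesInString:theString options:0 range:{0, theString's |length|()}

-- Convert each match range back into an AppleScript string.
set foundURLs to {}
repeat with aMatch in theMatches
	set end of foundURLs to (theString's substringWithRange:(aMatch's range())) as text
end repeat
foundURLs
```

The result is an ordinary AppleScript list, so count of foundURLs gives the number of URLs and each item can be used later in the script.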

Using do shell script seems just fine here. For the purposes of a working example, the script below downloads a web page from MacScripter; replace that step with reading your file into a variable. As you can see, relative paths are ignored and only complete URLs are grabbed from the page.

set stringContainingURLs to do shell script "curl http://www.macscripter.net/viewforum.php?id=2"

set regex to ""
-- build the regex piece by piece for better understanding
-- first grab the scheme part
set regex to regex & "[a-z]{3,9}://"

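-- then pick the host part
-- the server name may only begin with a letter
set regex to regex & "[a-z]"
-- the rest of the server address may contain letters, numbers, . and -
-- (the - goes last inside the brackets, where it is literal; a backslash
-- does not escape anything inside a POSIX bracket expression)
set regex to regex & "[a-z0-9.-]+"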

-- optionally the URL may have a path
set regex to regex & "(/[a-z0-9+~%/._-]*)?"

-- optionally the URL may have a query
set regex to regex & "(\\?[+=&~%/.a-z0-9_-]*)?"

-- optionally the URL may have an anchor
set regex to regex & "(#[~%/.a-z0-9_-]*)?"


do shell script "egrep -io " & quoted form of regex & " <<<" & quoted form of stringContainingURLs
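
To tie this back to the original three questions, here is a rough sketch of one way to count the URLs per field, keep each URL in a list, and build a curl command for each. Note that it swaps the regex for AppleScript's text item delimiters, which is enough when the separators (tab and pipe) are known in advance. The sample row, the example.com URLs, the Downloads folder destination, and the splitText handler are all assumptions for illustration. The download line is commented out so the sketch has no side effects.

```applescript
-- Hypothetical sample row; in practice read the file instead, e.g.
-- set fileText to read (choose file) as «class utf8»
set fileText to "http://example.com/a.jpg|http://example.com/b.jpg" & tab & "http://example.com/c.jpg"

-- Assumed destination; POSIX path of a folder ends with a slash.
set downloadFolder to POSIX path of (path to downloads folder)

set fieldCounts to {}
repeat with aLine in paragraphs of fileText
	repeat with aField in splitText(contents of aLine, tab)
		-- one list of URLs per field
		set theURLs to splitText(contents of aField, "|")
		set end of fieldCounts to (count of theURLs)
		repeat with aURL in theURLs
			-- use the last path component as the file name
			set fileName to last item of splitText(contents of aURL, "/")
			set shellCommand to "curl -L -o " & quoted form of (downloadFolder & fileName) & " " & quoted form of (contents of aURL)
			-- do shell script shellCommand -- uncomment to actually download
		end repeat
	end repeat
end repeat
fieldCounts -- {2, 1} for the sample row

-- Hypothetical helper: split theText on theDelimiter and restore the
-- previous text item delimiters afterwards.
on splitText(theText, theDelimiter)
	set savedDelimiters to AppleScript's text item delimiters
	set AppleScript's text item delimiters to theDelimiter
	set theItems to text items of theText
	set AppleScript's text item delimiters to savedDelimiters
	return theItems
end splitText
```

Using quoted form of for both the destination path and the URL keeps characters such as ? and & from being interpreted by the shell. A URL that ends in a slash would produce an empty file name, so real data may need an extra check there.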