sed, applescript, replace strings, regex

Hello All!

I have user-submitted data as a variable in a tab-delimited spreadsheet that I would like to pass verbatim to a .html document.

I am going to use sed and have found a sed command that will work and is spectacular!

Could anyone help me put this into AppleScript, please?

replace='Laurel & Hardy; PS\2' # sample input containing metachars.

replaceEscaped=$(sed 's/[&/\]/\\&/g' <<<"$replace") # escape it

sed -n "s/\(.*\) \(.*\)/$replaceEscaped/p" <<<"foo bar" # if ok, outputs $replace as is

I am very confused… I have tried to split the commands, but the text needs to be escaped again for AppleScript and the chopsticks start happening.

This came from the best sed reference that I have found: http://stackoverflow.com/questions/29613304/is-it-possible-to-escape-regex-metacharacters-reliably-with-sed

Even just a pointer on writing the sed commands so that a Mac can see them properly would help.

Thank you all so very much.

Model: trashcan
Browser: Safari 537.36
Operating System: Mac OS X (10.10)

Hi.

If I’ve correctly understood the object of the code:

do shell script "replace='Laurel & Hardy; PS\\2' # sample input containing metachars.
replaceEscaped=$(sed 's/[&/\\]/\\\\&/g' <<<\"$replace\") # escape it
sed -n \"s/\\(.*\\) \\(.*\\)/$replaceEscaped/p\" <<<\"foo bar\" # if ok, outputs $replace as is"

The reason the sed string in the last line is in double quotes rather than single quotes is to make the shell insert the contents of the variable replaceEscaped rather than the text “$replaceEscaped”. Alternatively, you could use single quotes, but interrupt them with a double-quoted section (a bit like inserting a variable into a string in AppleScript):

do shell script "replace='Laurel & Hardy; PS\\2' # sample input containing metachars.
replaceEscaped=$(sed 's/[&/\\]/\\\\&/g' <<<\"$replace\") # escape it
sed -n 's/\\(.*\\) \\(.*\\)/'\"$replaceEscaped\"'/p' <<<\"foo bar\" # if ok, outputs $replace as is"

There are various ways to reduce the “wicker fence” effect. For instance, when the Mac sed’s -E option’s used, “(”, “)”, and “|” are understood to be metacharacters and don’t need to be escaped. (It’s the literal equivalents which have to be escaped then.)

do shell script "replace='Laurel & Hardy; PS\\2' # sample input containing metachars.
replaceEscaped=$(sed 's/[&/\\]/\\\\&/g' <<<\"$replace\") # escape it
sed -En 's/(.*) (.*)/'\"$replaceEscaped\"'/p' <<<\"foo bar\" # if ok, outputs $replace as is"

… not to mention the fact that the parentheses in the last line aren’t needed anyway. :wink: Also, the delimiter in sed’s ‘s’ command doesn’t have to be a slash. Whatever character comes after the ‘s’ (with a few exceptions) is taken as the delimiter:

do shell script "replace='Laurel & Hardy; PS\\2' # sample input containing metachars.
replaceEscaped=$(sed 's:[&/\\]:\\\\&:g' <<<\"$replace\") # escape it
sed -n 's:.* .*:'\"$replaceEscaped\"':p' <<<\"foo bar\" # if ok, outputs $replace as is"

That’s not a particularly efficient regex in the last line, but it serves for the demo. :slight_smile:

Edit: A bit more about the AppleScript text:

The double quotes at either end signify that it’s text and aren’t in the text itself.

The text contains double-quote and backslash characters and each of these needs to be escaped with a backslash when the text is typed into Script Editor. The escaping backslashes are also not actually in the text but are just there to help it compile. If you replace ‘do shell script’ with ‘display dialog’ in the first script above, the text displayed is the actual shell code.

The shell code itself contains some escaping, also involving backslashes. Each of these backslashes must be represented by two in the AppleScript text (as seen in Script Editor). The most extreme example is the “\\\\&” in the second line. There, the “&” is a metacharacter representing the text matched by the “[&/\]” regex. The aim is to insert a backslash character in front of the matched text. This backslash has to be written as two backslashes in the sed command in order to get a single backslash character in the text being edited. Both sed backslashes have to be escaped when representing the code inside an AppleScript string in Script Editor. Hence four backslashes are needed to get two in sed to get one actual backslash in the text which eventually winds up as an escape character itself in replaceEscaped!
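For comparison, here is a minimal sketch of the same two sed steps as plain shell code, with no AppleScript string layer around it, so each backslash appears only as often as sed itself needs. The helper name escape_replacement is made up for illustration, and the sketch assumes single-line input and "/" as the delimiter of the s command.

```shell
#!/bin/sh
# Sketch only: escape_replacement is a hypothetical helper, not from the thread.
escape_replacement() {
  # Put a backslash in front of every &, / and \ in the argument,
  # so sed treats them literally in the replacement half of s///.
  printf '%s\n' "$1" | sed 's/[&/\]/\\&/g'
}

replace='Laurel & Hardy; PS\2'   # sample input containing metachars
replaceEscaped=$(escape_replacement "$replace")

# If the escaping worked, this prints the contents of $replace unchanged.
printf '%s\n' 'foo bar' | sed -n "s/.* .*/$replaceEscaped/p"
```

Wrapping this in do shell script then only requires doubling each backslash and escaping each double quote, as described above.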

Wow! Nigel!

Thank you, your time is truly appreciated! I am a Script Kitty, so to have such a wonderful explanation is wonderful!

Would you have any tips on using sed for the input string? As it is User submitted, I have no control over what is typed and would like to place it into html regardless of content and characters.

If that is not possible, is there a good way to sanitize the data from the spreadsheet that you could suggest?

I have been looking at perl \Q \E quotemeta

Once again thank you!

UPDATE!

I have spent 2 days so far exploring a reliable way to get VERBATIM user input into a .html doc via AppleScript,
using awk, sed, perl, python and gsed (GNU sed), and all of them crap out on one thing or another.

I have decided, although it is a bit messy, to use TextWrangler for all user-defined input fields as follows:

do shell script "open -n -a 'TextWrangler.app'"
tell application "TextWrangler"
open "/Users/xxx/xxxx/xxx/index.html"
replace "KEYWORD" using PREDEFINEDVARIABLE searching in text 1 of front document options {starting at top:false, wrap around:false, backwards:false, case sensitive:true, match words:true, extend selection:false}
save documents
close documents
quit
end tell

Obviously not fast - but fast enough. More importantly Reliable. It will take anything you throw at it as long as grep is off.

Any other suggestions for doing this or something similar on the command line or via do shell script or do script would be really appreciated.

It looks like you’re after a literal replace, in which case you can just use simple AppleScript:

on replace:theFind withString:theReplace inString:theString
	considering case
		set saveTID to AppleScript's text item delimiters
		set AppleScript's text item delimiters to {theFind}
		set theString to text items of theString
		set AppleScript's text item delimiters to {theReplace}
		set theString to theString as text
		set AppleScript's text item delimiters to saveTID
	end considering
	-- Return the edited text explicitly; otherwise the handler's result is that of the last "set" statement.
	return theString
end replace:withString:inString:
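For the command-line route asked about earlier, a literal (non-regex) replace can also be sketched in shell. The version below is only an illustration under a few assumptions: the function name literal_replace is made up, the text is read from standard input line by line, and the key does not span lines. It uses awk's index() and substr(), which do plain substring matching, and passes the strings through the environment so that awk does not interpret backslashes in them.

```shell
#!/bin/sh
# Sketch only: literal_replace is a hypothetical helper, not from the thread.
# Usage: literal_replace KEY VALUE < input > output
literal_replace() {
  KEY="$1" VALUE="$2" awk '
    BEGIN { key = ENVIRON["KEY"]; value = ENVIRON["VALUE"]; klen = length(key) }
    {
      rest = $0; out = ""
      # index() finds the key as a plain substring, so no character is special.
      while (klen > 0 && (i = index(rest, key)) > 0) {
        out = out substr(rest, 1, i - 1) value
        rest = substr(rest, i + klen)
      }
      print out rest
    }'
}

# Example: replace the placeholder with text containing sed metacharacters.
printf '%s\n' '<p>KEYWORD</p>' | literal_replace 'KEYWORD' 'Laurel & Hardy; PS\2'
```

Because nothing here is treated as a pattern, the replacement text needs no escaping at all, which is the same property the text item delimiters approach has.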

Hi snaplash.

I’ve been struggling to understand your descriptions of what you want to do.

By “pass … to a .html document”, do you mean there’s a field in the document where you want the text entered? Or do you want the text inserted into the HTML source code in a .html file?

If the latter, does “verbatim” mean you want the text inserted into the code literally as it is? Or do you want it HTML’d to ensure that it doesn’t interfere with the code and looks like itself when the document’s displayed?

What’s the criterion for whereabouts in the document/source code the text has to go?

I assume you already have getting the input from the user or spreadsheet under control.

Hello All!

Firstly, I just want you to know that everyone's thoughts on this are appreciated. You guys are helping me immeasurably.

I have fields in a spreadsheet that has been supplied by a Realtor. I would like to put these into an HTML document in predefined tags within <p></p>. So for example:

<p>PROPERTYOVERVIEW</p>

Obviously my issue is that I cannot control what the Realtor types in so yes, I would like it to replace literally.

At the moment, I am using sed to delete the <p> and </p> should they exist within the user's input, and doing the replace with TextWrangler. All other input seems to be fine.

I have a spreadsheet with about 200 fields that I want to pass to the .html document, including image links and video links.

My business is Automated Video Production for Real Estate, and the finished videos (mp4 and webm) get placed into a OnePage website along with a max of 100 images

The time it takes is not a huge worry for me, as this text replace process will be taking place during the render process, which takes 2-3 minutes. I just need it to be reliable.

Hope this clarifies!

Hi.

This seems to be the simplest approach:

(* Use your own code to get together the text from a spreadsheet field, the key it has to replace in the HTML (without the <p> and </p> tags), and the HTML text itself, then call prepareAndInsert() with the three strings as parameters. The result is the HTML code with a prepared version of the input text where the key was. Repeat with more fields and keys as required. Save the HTML when finished.
*)

-- Encode text for display in an HTML document and replace a key in the document's source code with it. Return the edited source code.
on prepareAndInsert(theInput, theKey, theHTML)
	set theInput to HTMLEncode(theInput)
	set theHTML to replace("<p>" & theKey & "</p>", "<p>" & theInput & "</p>", theHTML)
	
	return theHTML
end prepareAndInsert

-- Doctor some text for display in an HTML document. Return the edited text.
on HTMLEncode(theText)
	-- Delete any <p> and </p> tags.
	set theText to replace({"<p>", "</p>"}, "", theText)
	-- Replace any ampersands, angle brackets, or line endings with HTML equivalents.
	-- The concatenations are to stop the HTML entities from being interpreted when this script's posted!
	set theText to replace("&", "&" & "amp;", theText)
	set theText to replace("<", "&" & "lt;", theText)
	set theText to replace(">", "&" & "gt;", theText)
	set theText to replace({return & linefeed, linefeed, return}, "<br />", theText)
	
	-- A few other possible replacements, but they may not be necessary:
	-- set theText to replace(character id 160, "&" & "nbsp;", theText)
	-- set theText to replace("\"", "&" & "quot;", theText)
	-- set theText to replace("“", "&" & "ldquo;", theText)
	-- set theText to replace("”", "&" & "rdquo;", theText)
	-- set theText to replace("‘", "&" & "lsquo;", theText)
	-- set theText to replace("’", "&" & "rsquo;", theText)
	-- set theText to replace("«", "&" & "laquo;", theText)
	-- set theText to replace("»", "&" & "raquo;", theText)
	
	return theText
end HTMLEncode

-- Replace a substring or substrings in some text with another string. Return the edited text.
on replace(substrings, replacement, theText)
	set astid to AppleScript's text item delimiters
	set AppleScript's text item delimiters to substrings
	set textItems to theText's text items
	set AppleScript's text item delimiters to replacement
	set theText to textItems as text
	set AppleScript's text item delimiters to astid
	
	return theText
end replace
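If the encoding step ever has to happen inside do shell script instead, the tag, ampersand and angle-bracket replacements might be sketched with sed as below. This is an illustration only: the function name html_encode is made up, it reads standard input, and it leaves line endings alone rather than converting them to <br />. As in the AppleScript handler, the ampersand is replaced before the angle brackets so that the entities just inserted are not encoded a second time.

```shell
#!/bin/sh
# Sketch only: html_encode is a hypothetical helper, not from the thread.
html_encode() {
  # 1. Delete any <p> and </p> tags.
  # 2. Replace & first, then < and >. "\&" in the replacement is a literal ampersand.
  sed -e 's|<[pP]>||g' -e 's|</[pP]>||g' \
      -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

# Example with characters that would otherwise interfere with the HTML source.
printf '%s\n' 'Pool & spa <p>open</p> daily > 9am' | html_encode
```

The encoded text can then be inserted with any literal replace, since it no longer contains characters that the HTML would interpret.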

Nigel, Thank you!

That again is a thoughtful and well-explained solution (especially handling the wildcards).

Completely resolved! Wonderful!