Trouble with regular expressions in AppleScript

I am creating a droplet to clean up some XML files that I will be exporting from FMP. I started by working out some find-and-replace commands in TextWrangler, and I am now moving them into an AppleScript so that these actions are automated. The problematic grep string works in TextWrangler's Find window. However, AppleScript Editor will not compile the script with that same grep string. I have tried escaping some characters that I thought might be the problem, to no avail.

The grep string I am trying to use is:

(?<=BarCode>)[\S\s]*?(?=</BarCode)

Any ideas as to where I am going wrong?
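To see what the pattern itself matches, here is a minimal sketch that runs it outside TextWrangler. It assumes AppleScriptObjC is available in an ordinary script, which needs OS X 10.10 or later (the thread itself is on 10.8), and it uses Foundation's ICU regex engine through NSString's `NSRegularExpressionSearch` option as a stand-in for TextWrangler's grep, so the two flavours may differ in edge cases. The handler name `regexReplace` and the sample XML are hypothetical. `[\S\s]` matches any character including line breaks, and the lazy `*?` stops at the first `</BarCode` it reaches. The sample deliberately has two BarCode elements, because the lookbehind `(?<=BarCode>)` is also satisfied right after a closing `</BarCode>` tag; putting the `<` inside the lookbehind avoids that.

```applescript
use AppleScript version "2.4" -- AppleScriptObjC in an ordinary script needs OS X 10.10 or later
use framework "Foundation"

-- Hypothetical helper: replace every match of thePattern using Foundation's regex search
-- (ICU syntax, which is close to, but not guaranteed identical to, TextWrangler's grep).
on regexReplace(sourceText, thePattern, newText)
	set theString to current application's NSString's stringWithString:sourceText
	set theResult to theString's stringByReplacingOccurrencesOfString:thePattern withString:newText options:(current application's NSRegularExpressionSearch) range:{0, theString's |length|()}
	return theResult as text
end regexReplace

-- Hypothetical sample: two rows, each with a BarCode element.
set sampleXML to "<Row><BarCode>1</BarCode><Name>A</Name></Row><Row><BarCode>2</BarCode></Row>"

-- The pattern from the question; backslashes are doubled because this is an AppleScript string literal.
set originalPattern to "(?<=BarCode>)[\\S\\s]*?(?=</BarCode)"
-- Same idea, but the lookbehind must see the whole opening tag, "<" included.
set anchoredPattern to "(?<=<BarCode>)[\\S\\s]*?(?=</BarCode)"

regexReplace(sampleXML, originalPattern, "")
--> "<Row><BarCode></BarCode></BarCode></Row>" (everything after the first </BarCode> up to the next one is removed too)

regexReplace(sampleXML, anchoredPattern, "")
--> "<Row><BarCode></BarCode><Name>A</Name></Row><Row><BarCode></BarCode></Row>"
```

With only one BarCode element in a file, the two patterns give the same result.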

You can see my progress below. The line that won't compile is commented out.


on open theDroppedStuff
	set TextWranglerAppPath to path to application "TextWrangler"
	tell application "Finder" to open file theDroppedStuff using TextWranglerAppPath
	
	--Set reference to document that has been opened
	set myDocument to (name of document 1 of application "TextWrangler") as string
	
	--Clean up the XML in TextWrangler so it is human-readable
	tell document myDocument of application "TextWrangler"
		
		--bring all nodes to new line
		replace "<" using "\\r<" searching in text 1 options {search mode:literal, starting at top:true, wrap around:false, backwards:false, case sensitive:false, match words:false, extend selection:false}
		
		--bring all closing tags up a line
		replace "\\r</" using "</" searching in text 1 options {search mode:literal, starting at top:true, wrap around:false, backwards:false, case sensitive:false, match words:false, extend selection:false}
		
		--bring nested "DATA" nodes up a line
		replace "\\r<DATA>" using "<DATA>" searching in text 1 options {search mode:literal, starting at top:true, wrap around:false, backwards:false, case sensitive:false, match words:false, extend selection:false}
		
		
		--Can't compile - Syntax error - unknown token
		--remove specific node's content
		--replace "(?<=BarCode\>)[\S\s]*?(?=\<\/BarCode)" using "This is a test" searching in text 1 options {search mode:literal, starting at top:true, wrap around:false, backwards:false, case sensitive:false, match words:false, extend selection:false}
	end tell
end open

on run
	-- this handles a double-clicked icon
	display dialog "Use this script to clean an FMP XML file. Drop the XML file on this application icon."
end run
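The three literal replacements at the top of this script (tags onto new lines, closing tags back up, nested DATA tags back up) can be sketched in plain AppleScript with text item delimiters, which is handy for checking the steps without TextWrangler open. This is a swapped-in technique, not what the script does: `replaceAll` is a hypothetical helper, the sample XML is made up, and `return` (a carriage-return character) is used for the line break.

```applescript
-- Hypothetical helper: literal find/replace on a string using AppleScript's text item delimiters.
on replaceAll(theText, findText, newText)
	set savedDelimiters to AppleScript's text item delimiters
	set AppleScript's text item delimiters to findText
	set theItems to text items of theText -- split wherever findText occurs
	set AppleScript's text item delimiters to newText
	set theResult to theItems as text -- rejoin with newText in between
	set AppleScript's text item delimiters to savedDelimiters
	return theResult
end replaceAll

-- Hypothetical sample of FMP-style XML on a single line.
set xmlText to "<ROW><BarCode>123</BarCode><DESC><DATA>Widget</DATA></DESC></ROW>"

-- 1. Put every tag on its own line.
set xmlText to replaceAll(xmlText, "<", return & "<")
-- 2. Bring closing tags back up onto the line of their content.
set xmlText to replaceAll(xmlText, return & "</", "</")
-- 3. Bring nested <DATA> tags up onto their parent's line.
set xmlText to replaceAll(xmlText, return & "<DATA>", "<DATA>")

-- xmlText now holds four lines: an empty one, <ROW>, <BarCode>123</BarCode>,
-- and <DESC><DATA>Widget</DATA></DESC></ROW>
xmlText
```

The leading empty line is the kind of thing a later "remove blank lines" step takes care of.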

Model: iMac
AppleScript: 2.2.2
Browser: Safari 536.25
Operating System: Mac OS X (10.8)

try:


quoted form of searchstring

I am not sure how that would be written. I would think you mean something like the code below?

However, I still cannot compile. Is this what you are thinking?


set searchstring to "(?<=BarCode\>)[\S\s]*?(?=\<\/BarCode)"


replace quoted form of searchstring using "This is a test" searching in text 1 options {search mode:literal, starting at top:true, wrap around:false, backwards:false, case sensitive:false, match words:false, extend selection:false}
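`quoted form of` cannot help here, for two reasons. It is evaluated when the script runs, so the string literal still has to compile first, and the `\>` and `\S` sequences in it do not. And it is meant for building `do shell script` commands: it wraps the text in single quotes (escaping any single quotes inside), so TextWrangler's `replace` would presumably be handed the quote characters as part of what to search for. A small sketch in plain AppleScript, using an already-escaped literal so that it compiles:

```applescript
-- Backslashes are doubled so that this literal compiles.
set searchstring to "(?<=BarCode>)[\\S\\s]*?(?=</BarCode)"
set quotedSearchstring to quoted form of searchstring

-- quotedSearchstring is the same text wrapped in single quotes, ready for a shell command:
--   '(?<=BarCode>)[\S\s]*?(?=</BarCode)'
quotedSearchstring is ("'" & searchstring & "'") --> true
```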


Hi, Greg.

You probably just need to escape the backslashes so that AppleScript sees them as backslash characters and not as escape sequences that it needs to interpret itself:

"(?<=BarCode\\>)[\\S\\s]*?(?=\\<\\/BarCode)"

Or

set searchstring to quoted form of "(?<=BarCode\>)[\S\s]*?(?=\<\/BarCode)"

(But I guess that Nigel's way of doing it is best.)
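The compile error comes from AppleScript itself: its string literals recognise only a few backslash escapes of their own (`\"`, `\\`, `\n`, `\r` and `\t`), so a sequence such as `\S` is rejected before TextWrangler ever sees the string. A minimal sketch of the escaped form in plain AppleScript, with no TextWrangler involved. Only the `\S` and `\s` backslashes from the original TextWrangler pattern are strictly needed here, since `<`, `>` and `/` are not special in this pattern.

```applescript
-- Each "\\" inside an AppleScript string literal compiles to one backslash character,
-- so this literal holds exactly the pattern that was typed into TextWrangler's Find field:
--   (?<=BarCode>)[\S\s]*?(?=</BarCode)
set searchstring to "(?<=BarCode>)[\\S\\s]*?(?=</BarCode)"
count of characters of searchstring --> 34

-- "\\S" is two characters, a backslash followed by S ...
count of characters of "\\S" --> 2

-- ... whereas "\r" is one of AppleScript's own escapes and compiles to a single return character.
"\r" is return --> true
```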

Nigel escapes the backslashes, which need to be escaped in AppleScript string literals. Your code isn't allowed in AS.

Agreed!

Then I have learned something today! Thinking about it, it is natural that it doesn't work, since the string is illegal to begin with. I thought it failed when the search began.

Rereading the post, I see that the problem was getting the script to compile.

I had also tried escaping the different characters before I posted; I should have mentioned that. When I escape them, the code compiles, but it just doesn't do anything. I don't get any errors.

When I run the same pattern in TextWrangler's Find window, it replaces the content between the nodes.

When I use the escaped form in AS it does nothing.

Any other ideas?

Your ‘search mode’ option is ‘literal’. It probably needs to be ‘grep’ to work with a regular expression.
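The difference between the two search modes can be reproduced outside TextWrangler. This sketch assumes AppleScriptObjC in an ordinary script (OS X 10.10 or later), uses Foundation's NSString regex search (ICU) as a stand-in for TextWrangler's grep, and `replaceUsing` is a hypothetical helper. In literal mode the pattern is searched for character by character, nothing matches, and the text comes back unchanged without any error, which fits the "it does nothing" symptom above.

```applescript
use AppleScript version "2.4" -- AppleScriptObjC in an ordinary script needs OS X 10.10 or later
use framework "Foundation"

-- Hypothetical helper: replace every occurrence of findText, either literally or as a regular expression.
on replaceUsing(sourceText, findText, newText, useRegex)
	set theString to current application's NSString's stringWithString:sourceText
	if useRegex then
		set theOptions to current application's NSRegularExpressionSearch
	else
		set theOptions to 0 -- plain literal search
	end if
	set theResult to theString's stringByReplacingOccurrencesOfString:findText withString:newText options:theOptions range:{0, theString's |length|()}
	return theResult as text
end replaceUsing

-- Hypothetical sample whose BarCode content spans two lines.
set sampleXML to "<BarCode>123" & return & "456</BarCode>"
set barCodePattern to "(?<=BarCode>)[\\S\\s]*?(?=</BarCode)"

-- Literal mode looks for the pattern's own characters, finds none, and changes nothing.
replaceUsing(sampleXML, barCodePattern, "", false) -- returns sampleXML unchanged
-- Regex mode treats the same string as a pattern.
replaceUsing(sampleXML, barCodePattern, "", true) --> "<BarCode></BarCode>"
```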

That's the ticket! Thank you so much.

The special characters needed to be escaped, and the search mode needed to be grep.

Below is a working copy of my code, in case it is helpful to someone in the future.

Thank you very much guys!

May the Lord Jesus Christ bless you unto salvation through faith in his work on the cross!


--Code to run as a droplet
on open theDroppedStuff
	set TextWranglerAppPath to path to application "TextWrangler"
	tell application "Finder" to open file theDroppedStuff using TextWranglerAppPath
	
	--Set reference to document that has been opened
	set myDocument to (name of document 1 of application "TextWrangler") as string
	
	tell document myDocument of application "TextWrangler"
		
		--Put all nodes on separate lines
		replace "<" using "\\r<" searching in text 1 options {search mode:literal, starting at top:true, wrap around:false, backwards:false, case sensitive:false, match words:false, extend selection:false}
		
		--Bring all closing nodes up to the same line as their content
		replace "\\r</" using "</" searching in text 1 options {search mode:literal, starting at top:true, wrap around:false, backwards:false, case sensitive:false, match words:false, extend selection:false}
		
		--Bring all nested "Data" tags up a line
		replace "\\r<DATA>" using "<DATA>" searching in text 1 options {search mode:literal, starting at top:true, wrap around:false, backwards:false, case sensitive:false, match words:false, extend selection:false}
		
		
		--Remove content of specific XML node - My node is called "BarCode"
		--The "<" is included in the lookbehind so that a closing </BarCode> tag does not also satisfy it
		replace "(?<=\\<BarCode\\>)[\\S\\s]*?(?=\\<\\/BarCode)" using "" searching in text 1 options {search mode:grep, starting at top:true, wrap around:false, backwards:false, case sensitive:false, match words:false, extend selection:false}
		
		--Find and replace the following line - caused problems when importing into my next system
		--<?xml version="1.0" encoding="UTF-8" ?>
		replace "<?xml version=\"1.0\" encoding=\"UTF-8\" ?>" using "" searching in text 1 options {search mode:literal, starting at top:true, wrap around:false, backwards:false, case sensitive:false, match words:false, extend selection:false}
		
		--Find and replace the following line - caused problems when importing into my next system
		--<!-- This grammar has been deprecated - use FMPXMLRESULT instead -->
		replace "<!-- This grammar has been deprecated - use FMPXMLRESULT instead -->" using "" searching in text 1 options {search mode:literal, starting at top:true, wrap around:false, backwards:false, case sensitive:false, match words:false, extend selection:false}
		
		--Replace all bullet points with underscores
		--This could be expanded to loop through a list of bad chars
		replace "•" using "_" searching in text 1 options {search mode:literal, starting at top:true, wrap around:false, backwards:false, case sensitive:false, match words:false, extend selection:false}
		
		--Remove blank lines (a return at the very start of a line)
		replace "^\\r" using "" searching in text 1 options {search mode:grep, starting at top:true, wrap around:false, backwards:false, case sensitive:false, match words:false, extend selection:false}
		
		--Save the file
		save
	end tell
end open

--Code to run when clicked
on run
	-- this handles a double-clicked icon
	display dialog "Use this script to clean an FMP XML file. Drop the XML file on this application icon."
end run
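The comment about looping through a list of bad characters could look something like this. It is a sketch in plain AppleScript that works on a string with text item delimiters rather than calling TextWrangler's `replace`; `replaceAll`, `replaceBadChars` and the pilcrow in the list are hypothetical choices. Inside the droplet, the loop body would instead call `replace` once per character.

```applescript
-- Hypothetical helper: literal find/replace on a string using AppleScript's text item delimiters.
on replaceAll(theText, findText, newText)
	set savedDelimiters to AppleScript's text item delimiters
	set AppleScript's text item delimiters to findText
	set theItems to text items of theText
	set AppleScript's text item delimiters to newText
	set theResult to theItems as text
	set AppleScript's text item delimiters to savedDelimiters
	return theResult
end replaceAll

-- Hypothetical helper: replace each character listed in badChars with replacementChar.
on replaceBadChars(theText, badChars, replacementChar)
	repeat with badChar in badChars
		-- badChar is a reference to a list item, so take its contents before using it
		set theText to replaceAll(theText, contents of badChar, replacementChar)
	end repeat
	return theText
end replaceBadChars

-- character id 8226 is the bullet and 182 the pilcrow; ids avoid source-encoding surprises.
set badChars to {character id 8226, character id 182}
replaceBadChars("One" & (character id 8226) & "Two" & (character id 182), badChars, "_")
--> "One_Two_"
```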