Help with TextWrangler AppleScript

jimboreg · May 15, 2016, 11:13pm

I’m struggling to find a way to delete all lines in a text document that do NOT begin with an alphabetical character (upper OR lowercase, a to z only) in TextWrangler (v5).

I simply need to strip out any lines which begin with numerical characters or special characters like \

thanks

Av8TnTek · May 15, 2016, 11:51pm

Regular expressions are your friend for things like this. You will find a grep reference in TextWrangler’s Help menu. Something like:

^\d+.*

That should find all lines starting with numerals; simply replace them with an empty string. Special characters like “/” can be listed in a character class to do the same for them. Since you didn’t list the specific characters or provide a simple file to work with, I hesitate to give an example for them.
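
For anyone who wants to see what that replacement does outside the editor, here is a minimal sketch. It assumes Python's `re` module as a stand-in for TextWrangler's grep engine and `\n` as the line break, and the sample text is invented. The pattern stops before the line break, so each removed line leaves an empty line behind.

```python
import re

# Invented sample text; "\n" stands in for the editor's line breaks.
sample = "apple\n123 numbers\nBanana\n42\n"

# re.MULTILINE lets ^ match at the start of every line.
# The pattern does not consume the line break, so the break survives.
stripped = re.sub(r"^\d+.*", "", sample, flags=re.MULTILINE)

print(repr(stripped))
```

The leftover empty lines are the reason a pattern that also consumes the line break is usually more convenient for this job.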

jimboreg · May 16, 2016, 10:22am

Thanks for taking the time to reply. I’ll investigate the GREP help file, and post code if I have any success!

Cheers

Nigel_Garvey · May 16, 2016, 11:53am

Hi.

The most reliable regex I’ve been able to produce for this this morning is:

[format]^(\r|[^A-Za-z][^\r]*\r?)[/format]

It matches lines beginning with line breaks (ie. empty lines) or lines beginning with anything other than A-Z or a-z, up to and including the line breaks at the ends (if any).

You can type it into TextWrangler’s “Find” dialog, make sure the “Replace” field is empty, check the “Grep” checkbox, and click on “Replace All”. Unfortunately, it doesn’t delete any trailing line break at the end.

You can also do it by script with this, which does also remove any trailing line break:

tell application "TextWrangler"
	tell front text window
		-- Remove empty lines or lines not beginning with A-Z in either case.
		replace "^(\\r|[^A-Za-z][^\\r]*\\r?)" using "" options {search mode:grep, starting at top:true}
		-- Remove any trailing line break at the end.
		replace "\\r\\z" using "" options {search mode:grep, starting at top:true}
	end tell
end tell

TextWrangler’s searches are case-insensitive by default, so you could shorten [^A-Za-z] to either [^A-Z] or [^a-z]. If you also want to allow lines beginning with accented or non-English letters, change it to [^[:alpha:]].
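
The two-step replacement can also be tried outside the editor. This is only a sketch under several assumptions: Python's `re` module stands in for TextWrangler's grep, `\n` replaces `\r` as the line break, Python's `\Z` replaces `\z`, and `re.IGNORECASE` imitates the case-insensitive default. The function name `strip_non_alpha_lines` and the sample text are made up for illustration.

```python
import re

def strip_non_alpha_lines(text: str) -> str:
    """Remove empty lines and lines not starting with a-z (either case)."""
    # Step 1: delete empty lines, or lines starting with a non-letter,
    # together with their line breaks. re.IGNORECASE imitates the editor's
    # case-insensitive default, so [^a-z] also excludes A-Z.
    flags = re.MULTILINE | re.IGNORECASE
    text = re.sub(r"^(\n|[^a-z][^\n]*\n?)", "", text, flags=flags)
    # Step 2: delete a single trailing line break, if there is one.
    # Python spells the end-of-text anchor \Z where the editor uses \z.
    return re.sub(r"\n\Z", "", text)

sample = "apple\n123\n\n#x\nBanana\n"
print(repr(strip_non_alpha_lines(sample)))
```

Dropping the second `re.sub` leaves the final line break in place, which mirrors what the “Find” dialog does on its own.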

ccstone · May 16, 2016, 12:40pm

Hey Jim,

Nigel has covered the find/replace scripting perfectly well, but don’t forget about the “Process Lines Containing” command.

It can be operated manually or scripted:


tell application "TextWrangler"
	tell front text document
		process lines containing matching string "^(?:[^a-z]+|[[:blank:]]*$)" matching with grep true ¬
			output options {deleting matched lines:true}
	end tell
end tell

Like find/replace it is case-insensitive by default.

–
Chris

{ MacBookPro6,1 · 2.66 GHz Intel Core i7 · 8GB RAM · OSX 10.11.4 }
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯

jimboreg · May 16, 2016, 1:54pm

Nigel and Chris - amazingly helpful stuff!

thanks

Nigel_Garvey · May 16, 2016, 6:48pm

ccstone:


tell application "TextWrangler"
	tell front text document
		process lines containing matching string "^(?:[^a-z]+|[[:blank:]]*$)" matching with grep true ¬
			output options {deleting matched lines:true}
	end tell
end tell

Nice, Chris. It explicitly deletes lines, which is neater than my approach of replacing the lines’ contents and endings with “”, even though it’s essentially the same thing. Fooling around with it myself, I see that ‘process lines’ only has to match something in a line, not the whole line. This allows the regex to be simpler.

It still seems that the only way to delete a spare line break at the end of the document is to use a separate instance of ‘replace’. Annoyingly, ‘process lines’ only works in documents and ‘replace’ only in windows.

Edit: That last statement is wrong. Both commands do in fact work in either text documents or text windows, provided they’re addressed to the ‘text’ of those containers. I’ve replaced the original scriptwork below in light of Chris’s reply in post #13.


tell application "TextWrangler"
	tell text of front text document -- or: tell text of front text window
		process lines containing matching string "^(?=[^a-z]|$)" matching with grep true ¬
			output options {deleting matched lines:true}
		replace "\\r\\z" using "" options {search mode:grep, starting at top:true}
	end tell
end tell
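
The point that ‘process lines’ only needs a match somewhere in the line can be imitated with a small sketch. It assumes Python's `re` module in place of TextWrangler's grep and `\n` line breaks; `delete_matched_lines` is a hypothetical helper, not anything from the application's dictionary.

```python
import re

def delete_matched_lines(text: str, pattern: str) -> str:
    """Drop every line in which the pattern is found anywhere."""
    regex = re.compile(pattern, re.IGNORECASE)
    # keepends=True keeps each line's own line break attached to it,
    # so surviving lines are written back unchanged.
    lines = text.splitlines(keepends=True)
    return "".join(line for line in lines if not regex.search(line))

sample = "apple\n123\n\n#x\nBanana\n"
# The lookahead only identifies the line; it matches no characters itself.
print(repr(delete_matched_lines(sample, r"^(?=[^a-z]|$)")))
```

In this sketch the final line break survives, just as it does in the editor, so the separate trailing-break step is still needed.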

jimboreg · May 17, 2016, 12:57pm

Hi Nigel,

As you can see, grep is new to me, but it’s beginning to make (some) sense now.

re: “Annoyingly, ‘process lines’ only works in documents and ‘replace’ only in windows”

I found that this appears to work to strip out any blank lines:

[format]
tell application "TextWrangler"
	tell front text document
		process lines containing matching string "^[[:space:]]*$" matching with grep true ¬
			output options {deleting matched lines:true}
	end tell
end tell
[/format]

I still have one small problem - what I’m trying to delete is any line that does not start with a single space followed by an alphabetical character.

I tried adapting your search string:

[format]^(?=[^a-z]|$)[/format]

like this:

[format]^(?=[^[[:space:]][a-z]]|$)[/format]

but it doesn’t work, even though this does:

[format]
tell application "TextWrangler"
	tell front text document
		process lines containing matching string "^[[:space:]][a-z]" matching with grep true ¬
			output options {deleting matched lines:true}
		-- ^ means start of line
	end tell
end tell
[/format]

One final question - [format]grep -v[/format] inverts the search, but why can’t I type [format]matching with grep -v true[/format] ?

Marc_Anthony · May 17, 2016, 2:39pm

[b](?=)[/b] is a positive lookahead. Try the negative version, [b](?!)[/b]:

^(?![[:space:]]{1}[[:alpha:]])

jimboreg · May 17, 2016, 8:26pm

Thanks Marc - for the explanation of “?=” compared to “?!”

AND for the script - it worked perfectly.

The people on this site are GREAT!

cheers,
jamie

Nigel_Garvey · May 17, 2016, 9:29pm

Hi jamie.

Yes. I think all of the scripts above strip out empty lines within text. It’s just if there’s one left over at the end (ie. the text actually ends with a line break) that an extra step’s needed to remove it. I think the BBEdit people decided to leave it be normally in case people wanted to keep it.

The regex (grep) in your last script is correctly written to match a “white space” character (which could be either a space or a tab, or in some cases a line break) at the beginning of a line, followed by a letter between a and z. The script would delete any lines which do begin this way.

The regex you tried immediately above it isn’t correctly formed. There’s an additional layer of square brackets which shouldn’t be there. (Posix shortcuts like [:space:] can be a bit confusing!)

As Marc said, the form (?= … ) is what’s known as a positive lookahead. The regex it contains doesn’t form part of the final match but indicates what must come immediately after what is matched:

[format]^(?=[^a-z]|$)[/format]

… matches the beginning of a line in which the beginning is followed either by a character which isn’t a letter or by the end of the line.

[format]^([^a-z]|$)[/format]

… includes the non-letter or line end in the match. The difference is academic in the case of an empty line. It’s actually also academic with ‘process lines’, where the regex is only used to identify the line, not an exact piece of text.
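
The difference between the two forms shows up in what the match actually contains. A minimal sketch, assuming Python's `re` module in place of TextWrangler's grep and an invented sample line:

```python
import re

line = "123 abc"

lookahead = re.search(r"^(?=[^a-z]|$)", line, re.IGNORECASE)
consuming = re.search(r"^([^a-z]|$)", line, re.IGNORECASE)

# The lookahead version matches a zero-width position at the line start.
print(repr(lookahead.group(0)), lookahead.span())
# The plain group includes the non-letter character in the match.
print(repr(consuming.group(0)), consuming.span())

# On an empty line both forms match the empty string.
empty_lookahead = re.search(r"^(?=[^a-z]|$)", "")
empty_consuming = re.search(r"^([^a-z]|$)", "")
print(repr(empty_lookahead.group(0)), repr(empty_consuming.group(0)))
```

With a find-and-replace, that difference decides whether the first character is replaced too; when the regex only selects lines, either form picks out the same ones.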

A negative lookahead matches something not followed by whatever the lookahead matches, which is what you seem to need here:

[format]^(?![[:space:]][a-z])[/format]

Since an empty line is one of those not beginning with a space and an alphabetic character, it’s already covered. If you want the space to be literally a space character, not a tab, use a literal space in the regex:

tell application "TextWrangler"
	tell front text document
		-- Delete lines not beginning with a space and a letter.
		process lines containing matching string "^(?! [a-z])" matching with grep true ¬
			output options {deleting matched lines:true}
	end tell
end tell
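
To see which lines each variant would delete, here is a rough sketch. It assumes Python's `re` module in place of TextWrangler's grep, and it uses `\s` where the scripts above use [[:space:]], because Python's `re` has no POSIX classes. The sample lines are invented.

```python
import re

lines = [" apple", "\tbanana", "cherry", " 9 lives", ""]

# Literal space: only a real space character counts.
literal_space = re.compile(r"^(?! [a-z])", re.IGNORECASE)
# \s stands in for [[:space:]] here and also accepts a tab.
any_whitespace = re.compile(r"^(?!\s[a-z])", re.IGNORECASE)

# A line is deleted when the negative lookahead succeeds at its start.
deleted_literal = [ln for ln in lines if literal_space.search(ln)]
deleted_any = [ln for ln in lines if any_whitespace.search(ln)]

print(deleted_literal)
print(deleted_any)
```

The tab-indented line is deleted by the literal-space pattern and kept by the whitespace one; the empty line is deleted by both.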

grep -v is a command-line command which invokes the system’s ‘grep’ program with the -v option. It’s not the same as the ‘grep’ term used in TextWrangler’s AppleScript implementation.
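
The inversion that -v performs on the command line is what the negative lookahead supplies inside the regex itself. A small sketch of that equivalence, assuming Python's `re` module as a stand-in for both tools and using invented sample lines:

```python
import re

lines = [" apple", "cherry", " 9 lives", "", " Banana"]

positive = re.compile(r"^ [a-z]", re.IGNORECASE)
negative = re.compile(r"^(?! [a-z])", re.IGNORECASE)

# "grep -v" style: select the lines the positive pattern does NOT match.
inverted_selection = [ln for ln in lines if not positive.search(ln)]
# Lookahead style: select the lines the negative pattern DOES match.
lookahead_selection = [ln for ln in lines if negative.search(ln)]

print(inverted_selection == lookahead_selection)
```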

By the way, this forum has special tags for posting AppleScript code: [applescript] and [/applescript]. There’s a button for them on the posting page. Enclosing AppleScript code in them causes it to be displayed as above with a clickable link which opens it in the clicker’s default script editor.

ccstone · May 18, 2016, 12:53pm

Hey Jim,

When posting AppleScript please use the {AppleScript} code button and not the {Format} button.

{Format} is for NON-AppleScript code.

–
Best Regards,
Chris

ccstone · May 18, 2016, 2:45pm

Hey Nigel,

Not so.

Starting with this text in the front TextWrangler 5.0.1 document:

[format]
01 Now is the time for all good men to come to the aid of their country.
02 Now is the time for all good men to come to the aid of their country.
03 Now is the time for all good men to come to the aid of their country.
[/format]

All of these work.

You do have to reference the text object in a couple of places though.

(This seems a bit inconsistent to me, so I think I’ll report it to Bare Bones.)

Look for the lines marked “--> Note”.


-------------------------------------------------------------------------------------------

tell application "TextWrangler"
	tell text of front text document --> Note the use of text here.
		replace "Now" using "¢¢¢" options {search mode:grep, case sensitive:false, starting at top:true}
	end tell
end tell

tell application "TextWrangler"
	tell front text window
		replace "Now" using "¢¢¢" options {search mode:grep, case sensitive:false, starting at top:true}
	end tell
end tell

-------------------------------------------------------------------------------------------

tell application "TextWrangler"
	tell front text document
		process lines containing matching string "2" output options {deleting matched lines:true} ¬
			with matching with grep
	end tell
end tell

tell application "TextWrangler"
	tell text of front text window --> Note the use of text here.
		process lines containing matching string "5" output options {deleting matched lines:true} ¬
			with matching with grep
	end tell
end tell

-------------------------------------------------------------------------------------------

–
Take Care,
Chris

Nigel_Garvey · May 18, 2016, 5:12pm

Thanks, Chris! I’ve modified that post accordingly.