Working backwards from the end of a filename

stonemtn · April 13, 2006, 3:33pm

Hi

I’m writing a script to look for a space and a number at the end of a filename and convert the space to “_”. I want to only change the space at the end and only when there is a number.

Here’s the applicable code now. It works, but I know there must be a way to move backwards through a filename and find if it ends with a space and a number, and just hit those files. Obviously from the script, all the files are wmf files. This is a droplet and is functioning well, but it is conceivable I will have to continue to add numbers…you’ll see what I mean.

repeat with tFile in tFiles
	tell application "Finder"
		set tName to name of tFile
	end tell
	set tName to replaceText(tName, " 1.wmf", "_1.wmf")
	set tName to replaceText(tName, " 2.wmf", "_2.wmf")
	set tName to replaceText(tName, " 3.wmf", "_3.wmf")
	set tName to replaceText(tName, " 4.wmf", "_4.wmf")
	set tName to replaceText(tName, " 5.wmf", "_5.wmf")
	set tName to replaceText(tName, " 6.wmf", "_6.wmf")
	tell application "Finder"
		set name of tFile to tName
	end tell
end repeat

Thanks for your help.

Stonemtn

Mikey-San · April 13, 2006, 4:48pm

This will work on any file extension. You should test this thoroughly before using it somewhere important, as always.

set sourceFolder to (POSIX path of (choose folder))

do shell script "#!/bin/sh

grep=/usr/bin/grep
mv=/bin/mv
sed=/usr/bin/sed

IFS=$'\\n'

for each in `/bin/ls" & space & quoted form of sourceFolder & space & "| $grep ' [0-9]\\.[^.]*$'`
do
	fileSuffix=`echo \"$each\" | $grep -o ' [0-9]\\.[^.]*$' | $sed -e 's/^ /_/'`
	fileNameWithNewSuffix=`echo \"$each\" | $sed -e 's/ [0-9]\\.[^.]*$/'$fileSuffix'/'`
	$mv" & space & quoted form of sourceFolder & "\"$each\"" & space & quoted form of sourceFolder & "\"$fileNameWithNewSuffix\"
done"

Edit: Improved script in my explanation post below.

stonemtn · April 13, 2006, 5:01pm

Jacques:


repeat with tFile in tFiles
	tell application "Finder" to set tName to name of tFile
	set nbr to the offset of space in (reverse of characters of tName as string) --find last space
	if nbr > 0 then
		tell tName to set tName to text 1 thru -(nbr + 1) & "_" & text -(nbr - 1) thru -1
		tell application "Finder" to set name of tFile to tName
	end if
end repeat

Thanks Jacques

I want to see if I understand the script. Is nbr set to the number of characters the space is from the end of the filename? Then, we’re saying, if the amount is more than zero then set tName to all the text up to the space + 1 character and add _ and the last character. Then we’re telling Finder to set the name.

Couple of questions:

Not all of my files have " 1" or " 2" at the end. They are logo names, so we start with “IBM.wmf” and then go to “IBM 2.wmf” etc. So I only want to rename files that have space and a number at the end. Would this script change “Verizon Online” to “Verizon_O”? Is there a simple way to see if to the right of the last space is a single numeral and change the name of just that file?
Do I have to “end tell” those tells?

Stonemtn

Adam_Bell · April 13, 2006, 7:17pm

Test some of your files with this:

--repeat with tFile in tFiles
--	tell application "Finder" to set tName to name of tFile
set tName to "my File.wnf" -- or "myFile 123.wnf", or "my file 123.wnf"
set nbr to the offset of space in (reverse of characters of tName as string) --find last space
if nbr > 0 then
	try
		character -(nbr - 1) of tName as number -- errors if character after space is not a number.
		tell tName to set tName to text 1 thru -(nbr + 1) & "_" & text -(nbr - 1) thru -1
	end try
	--tell application "Finder" to set name of tFile to tName
end if
tName
--end repeat

stonemtn · April 13, 2006, 7:49pm

Hallelujah that’s it. Fantastic! I’m running the script (with some changes to make it more efficient from another post of mine) right now on about 26,000 files. We’ll see how it goes!

Stonemtn

Mikey-San · April 13, 2006, 9:15pm

26,000 files and you’re using the Finder?

Adam_Bell · April 13, 2006, 9:26pm

Mikey has a good point, stoneM. When the going gets really tough with any sort of file manipulation a shell script is often much faster. This applies to moving them, changing their names, sorting large lists, etc.

Having said that - I’m familiar with the problem. The shell script provided isn’t alterable by a mere mortal unless that mere mortal understands RegEx (regular expressions), whereas the much slower Finder script is.

Someday Mike, you’ve got to explain one of these in some detail.

Adam

Mikey-San · April 13, 2006, 11:24pm

I’ll whip up a good explanation for this one and post it. (Heading out for a bit now. I normally don’t do “placeholder” posts, but figured this one was warranted.)

I’ll do something similar to what I did in this thread:

http://bbs.applescript.net/viewtopic.php?id=15942

Adam_Bell · April 13, 2006, 11:57pm

Something like that would be much appreciated, Mike.

Adam

Mikey-San · April 14, 2006, 9:04am

I just realized that I made it more complex than it needed to be. I’ll leave that one up there, as it’ll be understandable after I’m done here, and may be somehow useful for study, but here is the improved script, which will be explained by this post.

set sourceFolder to (POSIX path of (choose folder))

do shell script "#!/bin/sh

grep=/usr/bin/grep
mv=/bin/mv
sed=/usr/bin/sed

IFS=$'\\n'

for each in `/bin/ls" & space & quoted form of sourceFolder & space & "| $grep ' [0-9]\\.[^.]*$'`
do
	fileNameWithNewSuffix=`echo \"$each\" | $sed -e 's/ \\([0-9]\\.[^.]*\\)$/_\\1/'`
	$mv" & space & quoted form of sourceFolder & "\"$each\"" & space & quoted form of sourceFolder & "\"$fileNameWithNewSuffix\"
done"

Here we go.

set sourceFolder to (POSIX path of (choose folder))

This does pretty much what it looks like: it generates a POSIX path to a folder of your choosing. Ideally, it contains files whose names we want to play with. Otherwise, there’s a slight chance we’re just burning electrons.

The do shell script portion itself, broken down into lines and methodology:

#!/bin/sh

Shell interpreter line.

grep=/usr/bin/grep
mv=/bin/mv
sed=/usr/bin/sed

Define the specific locations of the programs we’re going to use and save us some typing. They will be called like this later:

$grep
$mv
$sed

Moving on:

IFS=$'\n'

Set bash’s internal field separator to the newline character (done in a special way with $ that is out-of-scope for this post). Think “set AppleScript’s text item delimiters to return”, but without all of the noise.
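To see why the field separator matters, here is a minimal sketch, assuming two made-up file names. The names listing, count_items and count_by_line are hypothetical and not part of the script above. It sets IFS to a literal newline inside single quotes rather than using the $'\n' form, so that it also runs in shells that lack that syntax.

```shell
# Two file names, one per line, the way ls would print them.
listing='IBM 2.wmf
Verizon Online.wmf'

# Count how many items a for loop sees when it splits the text on IFS.
count_items() {
	n=0
	for each in $1   # unquoted on purpose, so the shell splits it
	do
		n=$((n + 1))
	done
	echo "$n"
}

# Same count, but with IFS temporarily set to a newline only.
count_by_line() {
	old_ifs=$IFS
	IFS='
'
	count_items "$1"
	IFS=$old_ifs
}

count_items "$listing"     # prints 4: the default IFS also splits at spaces
count_by_line "$listing"   # prints 2: one item per file name
```

With the default separator, a name like "IBM 2.wmf" arrives in the loop as two separate words, which is exactly what the script needs to avoid.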

In the AppleScript itself–and I’ll be saying this a couple more times–we have to escape the escape character:

IFS=$'\\n'

Next line.

for each in `/bin/ls" & space & quoted form of sourceFolder & space & "| $grep ' [0-9]\\.[^.]*$'`

Whoa. “For” is a loop control. Think “repeat with x in y”, but in C-like language. “Each” is actually any variable name we want to use. It could be “pomegranate” for all that matters.

We get a listing of the files in the folder we chose earlier, and then ask grep to show us only a certain set of them–we just want to see the files we want to change, and we know what the pattern is that will match them:

But to a regular expression engine, it’s semantically different:

Piece by piece, that’s:

[one whitespace character] - (You’re on your own to figure this one out.)

[one instance of [any digit in the range of 0 to 9]] - [0-9]

[a dot] - \.

Note that we escape this with a backslash. Otherwise, it means “match one instance of any single character”, which is not what we want at all.

[a character that isn’t a dot] - [^.]

Note that the dot doesn’t need to be escaped here. Inside a character class it is taken literally, and the caret at the start of the class is the negation operator.

Wait, why are we doing this? Well, we don’t want to match .foo.txt, we just want the last extension. But this won’t make sense until we build the whole pattern, so we’ll come back to it.

[everything] - Yup. All characters that follow.

[newline character] - Until we hit this.

All of this ends up being:

Put the regex together, and you have:

' [0-9]\.[^.]*$'

“Match just that pattern of characters exactly, nothing else.”

About that weird [^.] thing:

The asterisk is a shotgun operator: it repeats whatever comes right before it zero or more times, so given a loose enough target it’ll match anything. We need to corral it. We know that we don’t want to match multiple extensions, so we want to avoid finding any more dots between [dot] and [newline]. Telling the regex engine NOT to match the dot there will prevent that from happening.

If anyone has a question about that, since it’s a bit wacky when you’re learning regexes, Google for “character class” and “negator” together.
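A quick way to check the pattern against a few names is sketched below. The helper matches_suffix and the sample names are assumptions made for illustration. It calls plain grep with the -q option, which prints nothing and only reports through its exit status whether the pattern was found.

```shell
# Succeeds only when the name ends in a space, one digit, a dot,
# and an extension that contains no further dots.
matches_suffix() {
	printf '%s\n' "$1" | grep -q ' [0-9]\.[^.]*$'
}

for name in "IBM.wmf" "IBM 2.wmf" "Verizon Online.wmf" "IBM 2.foo.txt" "IBM 12.wmf"
do
	if matches_suffix "$name"
	then
		echo "match:    $name"
	else
		echo "no match: $name"
	fi
done
```

Only "IBM 2.wmf" matches. "IBM 2.foo.txt" is rejected because of the second dot, and "IBM 12.wmf" is rejected because the pattern allows exactly one digit between the space and the dot.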

So we’ve got our regex ready to fly. But since we’re using this in AppleScript, we need to escape the escape character, so it compiles (escape characters get read backwards, essentially):

' [0-9]\\.[^.]*$'

Let’s keep the ball rolling. We need to change that pattern now.

fileNameWithNewSuffix=`echo "$each" | $sed -e 's/ \([0-9]\.[^.]*\)$/_\1/'`

That says, “Store into the variable fileNameWithNewSuffix the result of the following command: I want you to echo the text in $each and run the following sed command on it.” We’ll get to that sed command in a moment.

We quote $each in double-quotes and not single-quotes because double-quotes will expand $variables, but single-quotes will not. For our purpose here, we need double-quotes and really, there’s rarely a reason not to apply one of the two forms of quoting to these sorts of things. It’s just good discipline in the shell.
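The difference between the two kinds of quotes can be seen in a small sketch like this one. The function names are hypothetical, and the variable each is given a made-up file name.

```shell
each="IBM 2.wmf"

double_quoted() { printf '%s\n' "$each"; }   # the variable is expanded
single_quoted() { printf '%s\n' '$each'; }   # the text stays literal
count_args() { echo "$#"; }                  # reports how many arguments it got

double_quoted        # prints: IBM 2.wmf
single_quoted        # prints: $each
count_args $each     # prints: 2 (unquoted, the name splits at the space)
count_args "$each"   # prints: 1
```

The last two lines show why leaving the quotes off is risky with file names that contain spaces: the command receives two arguments instead of one.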

We see a similar, but not identical, regex in the sed command:

' \([0-9]\.[^.]*\)$'

(Note that it doesn’t appear encapsulated in single quotes exactly in this manner in the sed command; the marks are here for clarity in this instance.)

The sed command structure itself is best explained like so:

s/pattern to find/replacement text/

So we’re telling sed to find the pattern and replace it with . . . what the hell is that? First, we need to explain the extra stuff in the expression this time.

When you encapsulate a portion of a regex in parentheses, you get what’s called a backreference. The regex engine stores the result of the encapsulated portion of the pattern match into a reference in memory so you can access it later. We can recall it by referencing it by its index number:

\1 returns the first backreference.

\2 returns the second backreference.

Et cetera.

In English:

We’re going to match a pattern, store a way to recall the text we found, and then recall it later so we don’t have to do any sort of complex string variable juggling.

Sed does backreferencing a little differently, but only in a superficial manner. Instead of ( and ) being the characters, you use escaped versions: \( and \) instead. Six of one, half-dozen of the other.
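To make the numbering concrete, here is a small sketch with two backreferences. It is unrelated to the renaming task, and swap_words is a hypothetical helper invented for the example. Each escaped pair of parentheses captures one lowercase word, and the replacement recalls them in reverse order.

```shell
# Capture two lowercase words separated by a space and swap them.
swap_words() {
	printf '%s\n' "$1" | sed -e 's/\([a-z]*\) \([a-z]*\)/\2 \1/'
}

swap_words "hello world"   # prints: world hello
```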

So, big deep breath:

s/ \([0-9]\.[^.]*\)$/_\1/

Which means:

“Match this pattern, store everything but the space into the first backreference, and then replace the matched pattern with an underscore and the value of the first backreference.”

If we echo “file 1.txt” to this sed command, we will end up with “file_1.txt”. The backreference value is “1.txt”, for those keeping track.
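That claim is easy to try outside AppleScript. The sketch below wraps the same sed expression in a hypothetical helper named rename_suffix and feeds it one name that should change and one that should not.

```shell
# Replace the space before a trailing "digit.extension" with an underscore.
rename_suffix() {
	printf '%s\n' "$1" | sed -e 's/ \([0-9]\.[^.]*\)$/_\1/'
}

rename_suffix "file 1.txt"           # prints: file_1.txt
rename_suffix "Verizon Online.wmf"   # prints: Verizon Online.wmf (unchanged)
```

When the pattern does not match, sed passes the text through untouched, which is why a name like "Verizon Online.wmf" comes out unchanged.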

As usual, since we’re doing this through AppleScript, we have to add extra escape characters. We need to escape all backslashes and double-quote characters for it to work. The entire line, including the echo, becomes:

fileNameWithNewSuffix=`echo \"$each\" | $sed -e 's/ \\([0-9]\\.[^.]*\\)$/_\\1/'`

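WE’RE ALMOST DONE, I SWEAR.

$mv" & space & quoted form of sourceFolder & "\"$each\"" & space & quoted form of sourceFolder & "\"$fileNameWithNewSuffix\"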

That should be pretty easy to read, if you’ve kept up so far:

Then, repeat with every other item that was matched in the control statement.
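For anyone who wants to try the whole loop outside AppleScript first, here is a sketch of the same steps as a standalone shell function. It is not the script above verbatim. The function name rename_in_dir and the scratch folder are assumptions, it uses $(...) where the original uses backticks, and it sets IFS with a literal newline. It runs against a throwaway folder made with mktemp, so nothing important gets renamed.

```shell
# Rename every file in a folder whose name ends in " <digit>.<extension>".
rename_in_dir() {
	dir=$1
	old_ifs=$IFS
	IFS='
'
	# List the folder, keep only the matching names, split them per line.
	for each in $(ls "$dir" | grep ' [0-9]\.[^.]*$')
	do
		new=$(printf '%s\n' "$each" | sed -e 's/ \([0-9]\.[^.]*\)$/_\1/')
		mv "$dir/$each" "$dir/$new"
	done
	IFS=$old_ifs
}

# Try it on a scratch folder with three made-up logo files.
demo=$(mktemp -d)
touch "$demo/IBM.wmf" "$demo/IBM 2.wmf" "$demo/Verizon Online.wmf"
rename_in_dir "$demo"
ls "$demo"
```

After the run, the scratch folder should hold IBM.wmf, IBM_2.wmf and Verizon Online.wmf. Only the name with a space and a single digit before the extension is touched.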

Soup’s done.

stonemtn · April 17, 2006, 4:25pm

Yeah, you caught me. I don’t know much about shell scripts, so the Finder was an easier way to go.

That said, Finder did choke when faced with all those files in one directory. I found processing about 3000 files at a time was about the most I could pull off. I do want to note that it was chewing through 3000 in about 90 seconds. That’s pretty good, I’d say.

However, I’m going to dig into this shell script tutorial to improve the next time I have to do something like this. Thanks a lot for the detailed post!!!

David