Special way to change filenames

Hey Joel,

Building on what Nigel did but removing the shell scripting from the rename loop makes this a bit faster.

I would have used the shell instead of the Finder to do the renaming, but I don’t have the time or the oomph at the moment.

There is rudimentary collision-management.

The script renames 58 test files on my system in ~ 0.5 seconds.

Note: I don’t recommend using AppleScript for elaborate mass file-renames, because it’s an operation fraught with possible problems. I own ‘A Better Finder Rename’ and ‘Name Mangler’ (amongst other utilities that can batch-rename), and I usually use one or the other of them for genuinely complex jobs. Their preview lets me see what I’m doing and catch collisions or other problems before they happen.
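The idea of catching collisions before anything is renamed can be sketched roughly as follows. This is only an illustration in Python, not what either utility does internally; `find_collisions`, `planned` and `existing` are hypothetical names, and the case-insensitive comparison assumes a default (case-insensitive) Mac volume.

```python
from collections import defaultdict

def find_collisions(planned, existing):
    """Return {folded new name: [old names]} for renames that would clash.

    planned  -- dict mapping each current name to its proposed new name
    existing -- iterable of every name currently in the folder
    """
    # Names that stay put: present in the folder but not scheduled for a rename.
    # Assumption: the volume is case-insensitive, so compare case-folded names.
    untouched = {name.casefold() for name in existing if name not in planned}

    # Group the old names by the new name they are asking for.
    claims = defaultdict(list)
    for old, new in planned.items():
        claims[new.casefold()].append(old)

    # A clash is two files wanting the same name, or a new name already taken.
    return {
        key: olds
        for key, olds in claims.items()
        if len(olds) > 1 or key in untouched
    }
```

Running a check like this over the full list of proposed names first means the rename loop itself never has to guess at a fallback name.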

That said - long ago I wrote my own AppleScript rename routine using the Satimage.osax for regex and Keyboard Maestro for a 2-field find/replace dialog (and to run the script). I use it all the time for relatively simple renaming.


set targetFolderPath to ((path to home folder as text) & "test_directory:Nigel_Garvey_Test:")
set posixPathtoTarget to quoted form of (POSIX path of targetFolderPath)

set shCMD to "fileList=$(ls -1 " & posixPathtoTarget & ");
outputNames=$(sed -E '/^[[:alnum:]]+/{
	# Replace all non-alphanumeric character groups with single spaces.
	s/[^[:alnum:]]+/ /g
	# Replace the last space with a full stop.
	s/ ([[:alnum:]]+)$/.\\1/
	# Convert the entire name to lower case.
	y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/
	# Store a copy in the hold space.
	h
	# Remove the first space and everything after it from the original.
	s/ .+//
	# Convert what is left (first three characters) to upper case.
	y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/
	# Append (a linefeed and) the stored copy to the result.
	G
	# Replace the linefeed and everything up to and including the first space of the copy with a dash.
	s/\\n[[:alnum:]]+ /-/
}' <<< \"$fileList\";);
echo \"$fileList%%%%%$outputNames\";
"

set {oldTIDS, AppleScript's text item delimiters} to {AppleScript's text item delimiters, "%%%%%"}
set {oldNameList, newNameList} to text items of (do shell script shCMD)
set AppleScript's text item delimiters to return
set oldNameList to paragraphs of oldNameList
set newNameList to paragraphs of newNameList
set AppleScript's text item delimiters to oldTIDS

tell application "Finder"
	repeat with i from 1 to length of oldNameList
		set _file to (targetFolderPath & item i of oldNameList) as alias
		try
			(targetFolderPath & item i of newNameList) as alias
			set name of _file to "temp.temp"
			set name of ((targetFolderPath & "temp.temp") as alias) to ("COPY " & item i of newNameList)
		on error
			try
				set name of _file to (item i of newNameList)
			on error
				set name of _file to "temp.temp"
				set name of ((targetFolderPath & "temp.temp") as alias) to ("COPY " & item i of newNameList)
			end try
		end try
	end repeat
end tell
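
For anyone not fluent in sed, the result the sed program is aiming for can be sketched in a few lines of Python. This is an approximation for illustration only, not a line-for-line port: `clean_name` is a hypothetical name, only ASCII letters and digits count as alphanumeric, and names with fewer than three alphanumeric blocks are simply returned unchanged.

```python
import re

WORD = re.compile(r"[A-Za-z0-9]+")

def clean_name(name):
    # Like the sed address, skip names that do not start with an ASCII alphanumeric.
    if not WORD.match(name):
        return name
    # Every run of non-alphanumerics acts as a separator between blocks.
    words = WORD.findall(name)
    # Assumption: a prefix, at least one middle block and an extension are present.
    if len(words) < 3:
        return name
    prefix, *middle, extension = words
    # Upper-case prefix, dash, lower-case middle joined by spaces, dot, lower-case extension.
    return f"{prefix.upper()}-{' '.join(middle).lower()}.{extension.lower()}"

print(clean_name("aCT_some file-name.JPG"))  # ACT-some file name.jpg
```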

For Nigel & Yvan,

As you probably know I’d do this job on my machine with the Satimage.osax, but Perl works.


do shell script "echo \"été...gaRÇON-ÉLÈVE,./,CitroËn....txt\" | perl \"-Mopen qw/:std :utf8/\" -pe '$_=lc'"

I don’t know how to write that in a standard Perl script just yet, so the command-line version will have to do.

If I feel like it later I’ll come back and rewrite the whole script in Perl just for fun.

For Nigel & Yvan,

Nigel. One thing you may not know is that the Satimage.osax can find/replace within lists, so you can do your text-massage in one fell swoop and make things quite a lot more efficient.

The commented-out line is a demo of a simple error-check, although I have once again implemented rudimentary collision-control later.


set srcFldrPath to ((path to home folder as text) & "test_directory:Nigel_Garvey_Test:")
tell application "Finder" to set oldNameList to name of every file of folder srcFldrPath
set newNameList to (change {"[^[:alnum:]]+", "^([[:alnum:]]+) (.+) ([[:alnum:]]+)$"} into {" ", "\\u\\1-\\l\\2.\\l\\3"} in oldNameList with regexp)
set oldNameList to change "^" into srcFldrPath in oldNameList with regexp
set AppleScript's text item delimiters to "."

# if length of (sortlist newNameList with remove duplicates) ≠ length of newNameList then error "Duplicate filenames found!"

tell application "Finder"
	repeat with i from 1 to length of oldNameList
		set _file to (item i of oldNameList) as alias
		set _name to item i of newNameList
		try
			set name of _file to _name
		on error e
			set name of _file to text item 1 of _name & ".COPY." & text item 2 of _name
		end try
	end repeat
end tell

The lowercase command also works with lists and is faster in general for mass case conversion. (I haven’t tested against really large texts though.)


set oldNameList to lowercase oldNameList

Thanks, Chris! That’s handy to know. I knew it worked with multiple lines in a text (that’s how I developed the regex, in fact), but not that it could also handle multiple texts in an AppleScript list, and multiple lines in multiple texts, it appears. A versatile beast! :slight_smile:

Hey Nigel and guys,

First of all, thanks, Nigel, for the correct script. It works like a charm.

I have a new request…

I need it to ignore files that start with an “_”.

I know the formula:

set x to "_abcde"
if first character of x is not "_" then
	return true
else
	return false
end if

Which works well…

But it won’t work when I apply it in this:


set x to (choose folder)

tell application "Finder" to set oldNames to name of every file of x

if first character of oldNames is not "_" then
	
	repeat with thename in oldNames
		set z to (do shell script "echo " & quoted form of thename & " | sed -E '
s/[^[:alnum:]]+/ /g
s/ ([[:alnum:]]+)$/.\\1/
y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/
h
s/ .+//
y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/
G
s/\\n[[:alnum:]]+ /-/'")
		tell application "Finder" to set name of file thename of x to z
	end repeat
	
end if

What am I doing wrong?
Can anyone help me, please?

Hi.

‘oldNames’ is a list containing the original names of the files. You need to apply the “_” test to the individual names in the list, not to the list itself:


set x to (choose folder)

tell application "Finder" to set oldNames to name of every file of x

repeat with thename in oldNames
	if (thename does not start with "_") then
		set z to (do shell script "echo " & quoted form of thename & " | sed -E '
s/[^[:alnum:]]+/ /g
s/ ([[:alnum:]]+)$/.\\1/
y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/
h
s/ .+//
y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/
G
s/\\n[[:alnum:]]+ /-/'")
		tell application "Finder" to set name of file thename of x to z
	end if
end repeat

You could instead tell the Finder to ‘set oldNames to name of every file of x whose name does not start with “_”’, but having the script do the filtering as above is faster.

Hey Guys :slight_smile:

Thumbs Up, with a big Thanks for the help…

New development,

I need to replace the French accented characters [é è ê …] with e.

But the script won’t work… It replaces them with e and a space, for example: Joël => Joe l
But when I tried replacing with X instead, I saw that it doesn’t even change them to X.
I don’t get it; it’s as if it skips my lines of code…

The first three sed lines are mine, the ones I’m trying to get working.

Sample of files input
aCT–élèctîons scôlaïres.JPG
aCT–élèctîons scôlaïres.txt

gives me “output”:
ACT-e le cti ons sco lai res.jpg
ACT-e le cti ons sco lai res.txt

Should give
ACT-elections scolaires.jpg
ACT-elections scolaires.txt


set x to (choose folder)

tell application "Finder" to set oldNames to name of every file of x

repeat with thename in oldNames
	if (thename does not start with "_") then
		set z to (do shell script "echo " & quoted form of thename & " | sed -E '
s/[éÉèÈêÊëË]/e/g
s/[áÁàÀâÂäÄ]/a/g
s/[íÍìÌîÎïÏ]/i/g
# Replace all non-alphanumeric character groups with single spaces.
s/[^[:alnum:]]+/ /g
# Replace the last space with a full stop.
s/ ([[:alnum:]]+)$/.\\1/
# Convert the entire name to lower case.
y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/
# Store a copy in the hold space.
h
# Remove the first space and everything after it from the original.
s/ .+//
# Convert what is left (first three characters) to upper case.
y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/
# Append (a linefeed and) the stored copy to the result.
G
# Replace the linefeed and everything up to and including the first space of the copy with a dash.
s/\\n[[:alnum:]]+ /-/'")
		
		tell application "Finder" to set name of file thename of x to z
	end if
end repeat

Hi Joe,

I think accented characters were actually two keystrokes. You might want to use the unicode numbers.

Edited: but the gurus will be here soon. :slight_smile:

gl,
kel

Here’s a version that requires Yosemite (or Mavericks with some changes), and uses AppleScriptObjC instead of sed. Cocoa uses ICU regex, which lacks case-conversion operators in replacement templates, so it ends up having to do some extra work. Still, in all but typing time it’s going to be faster than sed.

use AppleScript version "2.4"
use scripting additions
use framework "Foundation"

set x to (choose folder)

tell application "Finder" to set oldNames to name of every file of x

repeat with thename in oldNames
	if (thename does not start with "_") then
		set z to cleanFileName(thename)
		tell application "Finder" to set name of file thename of x to z
	end if
end repeat

on cleanFileName(thename)
	-- make an NSString
	set anNSString to current application's NSString's stringWithString:thename
	-- fold to lowercase, ASCII-only string; easiest way is to make ASCII data and convert back to NSString
	set theData to anNSString's dataUsingEncoding:(current application's NSASCIIStringEncoding) allowLossyConversion:true
	set anNSString to (current application's NSString's alloc()'s initWithData:theData encoding:(current application's NSASCIIStringEncoding))'s lowercaseString()
	-- replace non alphanumeric characters with single space
	set anNSString to anNSString's stringByReplacingOccurrencesOfString:"[^[a-z][A-Z][0-9]\\n\\r]+" withString:" " options:(current application's NSRegularExpressionSearch) range:(current application's NSMakeRange(0, anNSString's |length|()))
	-- put in first hyphen and the extension dot, and make it into a mutable string
	set anNSString to (anNSString's stringByReplacingOccurrencesOfString:"(?m)^(^[[a-z][A-Z][0-9]]+) (.+) ([[a-z][A-Z][0-9]]+)$" withString:"$1-$2.$3" options:(current application's NSRegularExpressionSearch) range:(current application's NSMakeRange(0, anNSString's |length|())))'s mutableCopy()
	-- search for first block of letters and uppercase them
	set theNSRegularExpression to current application's NSRegularExpression's regularExpressionWithPattern:"(?m)^(^[[a-z][A-Z][0-9]]+)" options:0 |error|:(missing value)
	set findsNSArray to (theNSRegularExpression's matchesInString:anNSString options:0 range:{location:0, |length|:anNSString's |length|()}) as list
	repeat with anNSTextCheckingResult in findsNSArray
		set theRange to (anNSTextCheckingResult's rangeAtIndex:1)
		(anNSString's replaceCharactersInRange:theRange withString:((anNSString's substringWithRange:theRange)'s uppercaseString()))
	end repeat
	-- coerce back to text
	return anNSString as text
end cleanFileName

And, well, just because, here’s a version of the clean-up handler that eschews regex in favor of a text scanner. It probably looks a little more complicated, but in this case it’s faster than regex, at least if you do each name individually.

use AppleScript version "2.4"
use scripting additions
use framework "Foundation"

cleanFileName("aCt_trân.flow–fréight.jpd")

on cleanFileName(theName)
	-- make an NSString
	set anNSString to current application's NSString's stringWithString:theName
	-- fold to lowercase, ASCII-only string; easiest way is to make ASCII data and convert back to NSString
	set theData to anNSString's dataUsingEncoding:(current application's NSASCIIStringEncoding) allowLossyConversion:true
	set anNSString to (current application's NSString's alloc()'s initWithData:theData encoding:(current application's NSASCIIStringEncoding))'s lowercaseString()
	-- make a set of chars we're interested in keeping
	set theNSCharacterSet to current application's NSCharacterSet's characterSetWithCharactersInString:"abcdefghijklmnopqrstuvwxyz0123456789"
	-- make a scanner from string
	set theNSScanner to current application's NSScanner's scannerWithString:anNSString
	-- scan the first block of alphanumeric characters, uppercase them, and add a hyphen
	set {theResult, thePart} to theNSScanner's scanCharactersFromSet:theNSCharacterSet intoString:(reference)
	set theFinalString to ((thePart's uppercaseString()) as text) & "-"
	set lastPart to "" -- will store each block as it's scanned
	-- start looping
	repeat
		-- scan past anything not in our set
		set theResult to theNSScanner's scanUpToCharactersFromSet:theNSCharacterSet intoString:(missing value)
		-- if it returns false, the last block of characters we scanned must be the extension, so add dot plus block and exit repeat
		if theResult as boolean = false then
			set theFinalString to theFinalString & "." & lastPart
			exit repeat
		end if
		-- must have found some unwanted text, so add the last block, inserting a space if we're past the initial block
		if theFinalString ends with "-" then
			set theFinalString to theFinalString & lastPart
		else
			set theFinalString to theFinalString & " " & lastPart
		end if
		-- scan the next block of characters and store it
		set {theResult, lastPart} to theNSScanner's scanCharactersFromSet:theNSCharacterSet intoString:(reference)
		if theResult as boolean = false then exit repeat -- shouldn't happen, but just in case
	end repeat
	return theFinalString
end cleanFileName

Sorry Shane, but your script doesn’t work on my Mac.
It tells me << Syntax Error Expected end of line, etc. but found “:” >>

Are you running Yosemite or Mavericks?

It’s at work, can’t update :frowning:
OS X Mountain Lion 10.8.5

Sorry – that’s why I said “Here’s a version that requires Yosemite (or Mavericks with some changes)”. Perhaps Nigel or Chris will have a non-ASObjC solution.

Maybe what you can do is use AppleScript’s text item delimiters and replace the old mac extended characters first. Then send to sed.

gl,
kel

The difficulty with sed here is that the Unicode convention used to represent accented characters in the filing system isn’t the same as that compiled into the sed code which looks for them!

For Latin characters with things added above or below them, it seems enough (on the testing I’ve done so far) simply to delete the codes for the added accents, that is, to make the top line of the sed program:

s/[^[:print:]]//g

However, for other characters, or for Latin characters modified in other ways, it does seem necessary to seek them out specifically. This should not be done using square brackets but with bars. For instance, in the current situation, sed would see both “ø” and “Ø” as groups of characters rather than as single characters. So s/[øØ]/o/g would be interpreted as: replace any of the component characters of “ø” or of “Ø” with “o”. But s/ø|Ø/o/g would mean: replace any of the character sequences which make up “ø” or “Ø” with “o”. This sort of check should take place before the deletion of unprintable characters I’ve described above.
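
The two cases described above (accents that can simply be deleted, and characters that have to be sought out one by one) can be seen outside sed with a small Python sketch. It is an illustration only, not the sed method: it relies on Unicode's decomposed form (NFD), which is close to what the filing system stores, and the `EXPLICIT` table and `strip_accents` name are hypothetical. The table is a sample, so make your own list.

```python
import unicodedata

# Characters with no "base letter + accent" decomposition need explicit replacements.
# Assumption: this small table is only a sample.
EXPLICIT = {"ø": "o", "Ø": "O", "ł": "l", "Ł": "L", "ß": "ss"}

def strip_accents(name):
    # Handle the characters that must be looked up specifically first.
    replaced = "".join(EXPLICIT.get(ch, ch) for ch in name)
    # Decompose: "é" becomes "e" followed by a separate combining accent.
    decomposed = unicodedata.normalize("NFD", replaced)
    # Delete the combining accents, leaving the base letters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(len(unicodedata.normalize("NFD", "é")))  # 2
print(strip_accents("élèctîons scôlaïres"))    # elections scolaires
```

The first print shows why a single accented letter can behave like a group of characters: in decomposed form it really is two.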

Although I haven’t compared the speeds, Shane’s claim that his ASObjC code is faster than sed is probably right. One contributory factor is that it takes a comparatively large amount of time to call a shell script, and in the scripts above, the shell script is called for each name processed. But almost the same sed code can process all the file names in one call if they’re presented to it as a linefeed-delimited text:


set x to (choose folder)

tell application "Finder" to set currentNames to name of every file of x

-- Also get the existing names as a single, linefeed-delimited text.
set astid to AppleScript's text item delimiters
set AppleScript's text item delimiters to linefeed
set currentNamesText to currentNames as text
set AppleScript's text item delimiters to astid

set newNames to paragraphs of (do shell script "echo " & quoted form of currentNamesText & " | sed -E '
/^_/ !{ # In each name which does not begin with an underscore (probably little advantage to this check).
	# Replace any likely non-accented exotic characters. (Make your own list.)
	s/ø|Ø/o/g
	s/ł|Ł/l/g
	s/ß/ss/g
	# Replace any en- or em- dashes with spaces.
	s/–|—/ /g
	# Delete what are hopefully just accent codes!
	s/[^[:print:]]//g
	# Replace all non-alphanumeric character groups with single spaces.
	s/[^[:alnum:]]+/ /g
	# Replace the last space with a full stop.
	s/ ([[:alnum:]]+)$/.\\1/
	# Convert the entire name to lower case.
	y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/
	# Store a copy in the hold space.
	h
	# Remove the first space and everything after it from the original.
	s/ .+//
	# Convert what is left (first three characters) to upper case.
	y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/
	# Append (a linefeed and) the stored copy to the result.
	G
	# Replace the linefeed and everything up to and including the first space of the copy with a dash.
	s/\\n[[:alnum:]]+ /-/
}'")

-- Rename files as appropriate:
repeat with i from 1 to (count currentNames)
	set thisCurrentName to item i of currentNames
	if (thisCurrentName does not start with "_") then
		set thisNewName to item i of newNames
		if (thisNewName is not thisCurrentName) then
			tell application "Finder" to set name of file thisCurrentName of x to thisNewName
		end if
	end if
end repeat

If there are hundreds or thousands of files in the folder, it’ll be necessary to use some other way to get their names into the shell script, which can’t be more than a certain (but fairly generous) length.
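
One such way is to feed the names to sed on its standard input instead of embedding them in the command text, so the command itself stays short however many names there are. Below is a minimal sketch of that idea in Python rather than AppleScript. `transform_names` is a hypothetical helper, and it assumes that no file name contains a linefeed. From AppleScript the equivalent would probably be to write the names to a temporary file and have sed read that.

```python
import subprocess

def transform_names(names, sed_script):
    # The names travel on standard input, one per line, so they never count
    # towards the length of the command itself.
    result = subprocess.run(
        ["sed", "-E", sed_script],
        input="\n".join(names) + "\n",
        capture_output=True,
        text=True,
        check=True,  # raise an error if sed reports a failure
    )
    # One output line per input name, in the same order.
    return result.stdout.splitlines()
```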

Edit: Incomplete edits made when posting corrected. A couple of variable names changed.

That’s right – both the methods I posted take much less time than the simplest of shell script calls. The scanner method, although the fastest, offers no chance for bulk processing, but the regex method could be reworked for bulk changes. But the extra work in case-conversion would probably slow it down on a very long list of names.

Hello

It seems that for you characters like Œ and Æ are « strange characters » which must be dropped.

I tested Shane’s two scripts using ASObjC and saw that when they are asked to execute
cleanFileName("aCt_trân.flow–fréight KŒNIG.jpd")
both return
"ACT-tran flow freight k nig.jpd"

I am a bit puzzled because, as far as I know, Œ and Æ are no stranger than â and é.
None are ASCII characters.

Moreover, if I ask them to execute
cleanFileName("aCt_trân.flow–fréight KÔNIG.jpd")
both scripts return
"ACT-tran flow freight konig.jpd"

I’m not unhappy, just puzzled :rolleyes:

Yvan KOENIG (VALLAURIS, France) samedi 25 octobre 2014 15:43:19

Well that’s a fly in the ointment. Æ is working fine here, but not Œ. I wonder why?

The only alternative I can think of is to use my ASObjCExtras.framework and its ICU transform – unfortunately ASObjC can’t get at ICU transforms directly. So it would be like this:

use AppleScript version "2.4"
use scripting additions
use framework "Foundation"
use framework "ASObjCExtras" -- for ICU transform

cleanFileName("aCt_trân.Æflow-cœfréightŒ.jpd")

on cleanFileName(theName)
	-- make an NSString
	set anNSString to current application's NSString's stringWithString:theName
	-- fold to lowercase, ASCII-only string; uses method from ASObjCExtras
	set anNSString to current application's SMSFord's stringFrom:anNSString ICUTransform:"Latin-ASCII; Lower" inverse:false
	-- replace non alphanumeric characters with single space
	set anNSString to anNSString's stringByReplacingOccurrencesOfString:"[^[a-z][A-Z][0-9]\\n\\r]+" withString:" " options:(current application's NSRegularExpressionSearch) range:(current application's NSMakeRange(0, anNSString's |length|()))
	-- put in first hyphen and the extension dot, and make it into a mutable string
	set anNSString to (anNSString's stringByReplacingOccurrencesOfString:"(?m)^(^[[a-z][A-Z][0-9]]+) (.+) ([[a-z][A-Z][0-9]]+)$" withString:"$1-$2.$3" options:(current application's NSRegularExpressionSearch) range:(current application's NSMakeRange(0, anNSString's |length|())))'s mutableCopy()
	-- search for first three letters and uppercase them
	set theNSRegularExpression to current application's NSRegularExpression's regularExpressionWithPattern:"(?m)^(^[[a-z][A-Z][0-9]]+)" options:0 |error|:(missing value)
	set findsNSArray to (theNSRegularExpression's matchesInString:anNSString options:0 range:{location:0, |length|:anNSString's |length|()}) as list
	repeat with anNSTextCheckingResult in findsNSArray
		set theRange to (anNSTextCheckingResult's rangeAtIndex:1)
		(anNSString's replaceCharactersInRange:theRange withString:((anNSString's substringWithRange:theRange)'s uppercaseString()))
	end repeat
	-- coerce back to text
	return anNSString as text
end cleanFileName

--> "ACT-tran aeflow coefreightoe.jpd"

The only consolation is that it’s also a whisker faster.

Thanks Shane.

I understood that in your original scripts you deliberately used an instruction dropping several characters.
I was puzzled by the fact that â and é weren’t dropped.
And more, as I didn’t really understand the other part of the scripts, I was wondering if the dropping of several chars was required by the original question.
Given your latest message, it seems that it was not really required and that your “old” answers were code interpreting the original question in a very severe way with a very selective filter.

Yvan KOENIG (VALLAURIS, France) dimanche 26 octobre 2014 09:57:26