Special way to change filenames

kel1 · October 25, 2014, 4:42am

Maybe what you can do is use AppleScript’s text item delimiters to replace the old Mac extended characters first, then send the result to sed.

gl,
kel

Nigel_Garvey · October 25, 2014, 11:29am

The difficulty with sed here is that the Unicode convention used to represent accented characters in the filing system isn’t the same as that compiled into the sed code which looks for them!

For Latin characters with things added above or below them, it seems enough (on the testing I’ve done so far) simply to delete the codes for the added accents, that is, to make the top line of the sed program:

s/[^[:print:]]//g
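To see why deleting non-printing bytes strips accents from decomposed names, here is a small shell sketch. It assumes sed runs in a byte-oriented (C) locale, which is forced here with LC_ALL=C; strip_accent_codes is a hypothetical helper name, not something from the scripts in this thread. The second call shows the limit of the approach: a precomposed "é" loses its base letter too.

```shell
# Hypothetical helper: delete every byte outside printable ASCII.
# LC_ALL=C is an assumption made so that sed works byte by byte.
strip_accent_codes() {
    LC_ALL=C sed 's/[^[:print:]]//g'
}

# Decomposed form: "e" followed by U+0301 (bytes 0xCC 0x81, octal 314 201).
# Only the accent bytes are deleted, so the "e" survives.
printf 'cafe\314\201.txt\n' | strip_accent_codes

# Precomposed form: U+00E9 (bytes 0xC3 0xA9, octal 303 251).
# Both bytes are deleted, so the whole letter disappears.
printf 'caf\303\251.txt\n' | strip_accent_codes
```

The filing system stores accented Latin letters in the decomposed form, which is why this one-line deletion is enough for them.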

However, for other characters, or for Latin characters modified in other ways, it does seem necessary to seek them out specifically. This should not be done using square brackets but with bars. For instance, in the current situation, sed would see both “ø” and “Ø” as groups of characters rather than as single characters. So s/[øØ]/o/g would be interpreted as: replace any of the component characters of “ø” or of “Ø” with “o”. But s/ø|Ø/o/g would mean: replace any of the character sequences which make up “ø” or “Ø” with “o”. This sort of check should take place before the deletion of unprintable characters I’ve described above.
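The difference between brackets and bars can be tried directly in a shell. This is a minimal sketch, assuming a byte-oriented (C) locale, forced here with LC_ALL=C, in which "ø" is the two-byte sequence 0xC3 0xB8. The function names bracket_version and alternation_version are made up for the illustration.

```shell
# Assumption: LC_ALL=C makes sed treat each byte as a separate character,
# which mimics the situation described above.

# Brackets: every byte belonging to "ø" or "Ø" is replaced on its own.
bracket_version() {
    printf '%s\n' "$1" | LC_ALL=C sed -E 's/[øØ]/o/g'
}

# Bars: each complete byte sequence is replaced as a unit.
alternation_version() {
    printf '%s\n' "$1" | LC_ALL=C sed -E 's/ø|Ø/o/g'
}

bracket_version 'søren'
alternation_version 'søren'
```

With the bracket form, the single "ø" should come out as two "o"s, one per byte; with the bars it comes out as one.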

Although I haven’t compared the speeds, Shane’s claim that his ASObjC code is faster than sed is probably right. One contributory factor is that it takes a comparatively large amount of time to call a shell script, and in the scripts above, the shell script is called for each name processed. But almost the same sed code can process all the file names in one call if they’re presented to it as a linefeed-delimited text:


set x to (choose folder)

tell application "Finder" to set currentNames to name of every file of x

-- Also get the existing names as a single, linefeed-delimited text.
set astid to AppleScript's text item delimiters
set AppleScript's text item delimiters to linefeed
set currentNamesText to currentNames as text
set AppleScript's text item delimiters to astid

set newNames to paragraphs of (do shell script "echo " & quoted form of currentNamesText & " | sed -E '
/^_/ !{ # In each name which does not begin with an underscore (probably little advantage to this check).
	# Replace any likely non-accented exotic characters. (Make your own list.)
	s/ø|Ø/o/g
	s/ł|Ł/l/g
	s/ß/ss/g
	# Replace any en or em dashes with spaces.
	s/–|—/ /g
	# Delete what are hopefully just accent codes!
	s/[^[:print:]]//g
	# Replace all non-alphanumeric character groups with single spaces.
	s/[^[:alnum:]]+/ /g
	# Replace the last space with a full stop.
	s/ ([[:alnum:]]+)$/.\\1/
	# Convert the entire name to lower case.
	y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/
	# Store a copy in the hold space.
	h
	# Remove the first space and everything after it from the original.
	s/ .+//
	# Convert what is left (first three characters) to upper case.
	y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/
	# Append (a linefeed and) the stored copy to the result.
	G
	# Replace the linefeed and everything up to and including the first space of the copy with a dash.
	s/\\n[[:alnum:]]+ /-/
}'")

-- Rename files as appropriate:
repeat with i from 1 to (count currentNames)
	set thisCurrentName to item i of currentNames
	if (thisCurrentName does not start with "_") then
		set thisNewName to item i of newNames
		if (thisNewName is not thisCurrentName) then
			tell application "Finder" to set name of file thisCurrentName of x to thisNewName
		end if
	end if
end repeat
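
For experimenting with the hold-space part of the sed program outside AppleScript, a minimal shell sketch follows. It assumes ASCII-only names, so the exotic-character and accent steps are left out, and it forces the C locale. The wrapper name rename_core is hypothetical. In a plain shell script the backslashes are single (\1, \n); they are doubled in the script above only because the program sits inside an AppleScript string.

```shell
# Hypothetical wrapper around the ASCII-only part of the sed program.
# Reads names on stdin, one per line, and writes the new names to stdout.
rename_core() {
    LC_ALL=C sed -E '
/^_/!{
    # Collapse non-alphanumeric runs to single spaces.
    s/[^[:alnum:]]+/ /g
    # Turn the last space back into the extension dot.
    s/ ([[:alnum:]]+)$/.\1/
    # Lower-case everything, then keep a copy in the hold space.
    y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/
    h
    # Keep only the first word and upper-case it.
    s/ .+//
    y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/
    # Append the stored copy, then swap its first word for a dash.
    G
    s/\n[[:alnum:]]+ /-/
}'
}

printf '%s\n' 'aCt_tran.flow freight.JPD' '_Skip Me.TXT' | rename_core
```

The first name should come back as "ACT-tran flow freight.jpd", and the second, which starts with an underscore, should pass through unchanged.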

If there are hundreds or thousands of files in the folder, it’ll be necessary to use some other way to get their names into the shell script, which can’t be more than a certain (but fairly generous) length.
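
One possible workaround, sketched here and not taken from any of the scripts in this thread, is to let the shell list the folder itself, so that the names never have to travel inside the command text. list_names and clean_names are hypothetical helpers, and clean_names is only a stand-in (lower-casing) for the full sed program. The sketch assumes that no file name contains a linefeed.

```shell
# Hypothetical helper: print the name of every regular file in folder $1,
# one per line, in the order the shell expands the glob.
list_names() {
    for f in "$1"/*; do
        if [ -f "$f" ]; then
            basename "$f"
        fi
    done
}

# Hypothetical stand-in for the full sed program: just lower-cases each name.
clean_names() {
    LC_ALL=C sed -E 'y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/'
}

# Usage (the folder path is an example only):
# list_names "$HOME/Desktop/test" | clean_names
```

The shell's ordering may not match the Finder's, so the AppleScript side would need to take the current names from the same listing, not from the Finder, to keep old and new names paired.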

Edit: Incomplete edits made when posting corrected. A couple of variable names changed.

Shane_Stanley · October 25, 2014, 11:46am

That’s right – both the methods I posted take much less time than the simplest of shell script calls. The scanner method, although the fastest, offers no chance for bulk processing, but the regex method could be reworked for bulk changes. But the extra work in case-conversion would probably slow it down on a very long list of names.

Yvan_Koenig · October 25, 2014, 1:44pm

Hello

It seems that for you characters like Œ and Æ are « strange characters » which must be dropped.

I tested Shane’s two ASObjC scripts and saw that when they are asked to execute
cleanFileName(“aCt_trân.flow–fréight KŒNIG.jpd”)
both return
“ACT-tran flow freight k nig.jpd”

I am a bit puzzled because, as far as I know, Œ and Æ are no stranger than â and é.
None are ASCII characters.

Moreover, if I ask them to execute:
cleanFileName(“aCt_trân.flow–fréight KÔNIG.jpd”)
both scripts return:
“ACT-tran flow freight konig.jpd”

I’m not unhappy, just puzzled :rolleyes:

Yvan KOENIG (VALLAURIS, France) Saturday 25 October 2014 15:43:19

Shane_Stanley · October 26, 2014, 1:11am

Well that’s a fly in the ointment. Æ is working fine here, but not Œ. I wonder why?

The only alternative I can think of is to use my ASObjCExtras.framework and its ICU transform – unfortunately ASObjC can’t get at ICU transforms directly. So it would be like this:

use AppleScript version "2.4"
use scripting additions
use framework "Foundation"
use framework "ASObjCExtras" -- for ICU transform

cleanFileName("aCt_trân.Æflow-cœfréightŒ.jpd")

on cleanFileName(theName)
	-- make an NSString
	set anNSString to current application's NSString's stringWithString:theName
	-- fold to lowercase, ASCII-only string; uses method from ASObjCExtras
	set anNSString to current application's SMSFord's stringFrom:anNSString ICUTransform:"Latin-ASCII; Lower" inverse:false
	-- replace non alphanumeric characters with single space
	set anNSString to anNSString's stringByReplacingOccurrencesOfString:"[^[a-z][A-Z][0-9]\\n\\r]+" withString:" " options:(current application's NSRegularExpressionSearch) range:(current application's NSMakeRange(0, anNSString's |length|()))
	-- put in first hyphen and the extension dot, and make it into a mutable string
	set anNSString to (anNSString's stringByReplacingOccurrencesOfString:"(?m)^(^[[a-z][A-Z][0-9]]+) (.+) ([[a-z][A-Z][0-9]]+)$" withString:"$1-$2.$3" options:(current application's NSRegularExpressionSearch) range:(current application's NSMakeRange(0, anNSString's |length|())))'s mutableCopy()
	-- search for first three letters and uppercase them
	set theNSRegularExpression to current application's NSRegularExpression's regularExpressionWithPattern:"(?m)^(^[[a-z][A-Z][0-9]]+)" options:0 |error|:(missing value)
	set findsNSArray to (theNSRegularExpression's matchesInString:anNSString options:0 range:{location:0, |length|:anNSString's |length|()}) as list
	repeat with anNSTextCheckingResult in findsNSArray
		set theRange to (anNSTextCheckingResult's rangeAtIndex:1)
		(anNSString's replaceCharactersInRange:theRange withString:((anNSString's substringWithRange:theRange)'s uppercaseString()))
	end repeat
	-- coerce back to text
	return anNSString as text
end cleanFileName

--> "ACT-tran aeflow coefreightoe.jpd"

The only consolation is that it’s also a whisker faster.

Yvan_Koenig · October 26, 2014, 8:57am

Thanks Shane.

I understood that in your original scripts you deliberately used an instruction which drops several characters.
I was puzzled by the fact that â and é weren’t dropped.
Moreover, as I didn’t really understand the other part of the scripts, I was wondering if the dropping of several characters was required by the original question.
Given your latest message, it seems that it was not really required and that your “old” answers were code interpreting the original question in a very severe way with a very selective filter.

Yvan KOENIG (VALLAURIS, France) Sunday 26 October 2014 09:57:26