Change Text Case

kai · July 24, 2005, 5:08am

Apologies for the delayed response, NovaScotian. I believe I wrote the comments that prompted your reply - but I take your point about the possible usefulness of such discussions. (Indeed, I’ve found the discussion of differences in script behaviour between OS versions quite illuminating.)

I’ve therefore added a further version (below) that sacrifices some speed & brevity to address several of the issues discussed previously. In exploring these, it’s clear that a comprehensive solution might be possible only with access to a substantial dictionary (proper nouns, acronyms, etc.) - which is perhaps beyond the scope of a relatively simple script such as this. (The script introduces various properties, partly to accelerate runtime execution, but also to accommodate any further adjustments that might be considered necessary.)

I might be able to clear up the confusion there, Nigel. At the time, I was working in Jaguar (not Panther) - which would explain the differences in behaviour. (However, for the record, the script below was written and tested in Tiger.)

The problem to which I was referring is a stack overflow error (errOSAStackOverflow: -2706), rather than a crash. In older versions of the Mac OS (including some versions of Mac OS X), the error can occur when the resulting number of string elements (text items, characters [items], words or paragraphs) exceeds about 4,060 (the precise figure can vary).

So, apart from getting a list of characters, the problem has very little to do with actual string length - since it depends primarily on the number of resulting string elements.

While such considerations may be of little concern to those using later versions of the Mac OS, they may still be worth noting for anyone interested in portability (for example, if a script is to be distributed generally - such as in a forum like this).

In later versions, the limit appears to have been removed. However, the algorithm to achieve this seems somewhat buggy - and, where there may be several thousand string elements involved, the dreaded, ever-spinning beach ball can appear. (The point at which this occurs appears to vary quite considerably, so it’s difficult to pin it down with any precision. On my machine, it’s very likely to occur above 300,000 or 400,000 items - but has sometimes struck at around 60,000 items.)

Silent hanging aside, there also appear to be efficiency issues when getting particularly long lists of string elements - which may therefore take a disproportionate time to evaluate.

I’ve now modified my original ‘textItems’ handler (see script below) in an effort to side-step all three issues. So far, it seems to have worked quite effectively - as demonstrated by the following results (all the usual caveats for interpreting execution times apply):

                 execution time (secs)

number of        with        without
text items       handler     handler

1000             0.01        0.01
10000            0.2         0.5
20000            0.4         3.8
30000            0.7         7.8
40000            1.1         11.7
50000            1.4         25.1
60000            1.9         40.8
70000            2.0         42.7
80000            2.7         68.5
90000            3.6         127.8
100000           4.7         138.2
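
The divide-and-conquer idea behind the modified handler can be sketched outside AppleScript. The Python below is only an illustration under stated assumptions: Python's own `str.split` has no such element limit, and the names `text_items`, `nth_index` and the 4000-item threshold `MAX_ITEMS` are hypothetical stand-ins chosen to mirror the handler rather than anything taken from the script itself.

```python
MAX_ITEMS = 4000  # hypothetical ceiling, mirroring the handler's threshold


def nth_index(text, delim, n):
    """Return the index of the n-th (1-based, non-overlapping) delimiter."""
    pos = -len(delim)
    for _ in range(n):
        pos = text.find(delim, pos + len(delim))
    return pos


def text_items(text, delim, limit=MAX_ITEMS):
    """Split text on a non-empty delim, never splitting more than
    `limit` items in a single step."""
    count = text.count(delim) + 1
    if count <= limit:
        return text.split(delim)
    half = count // 2
    cut = nth_index(text, delim, half)  # delimiter that follows item `half`
    left = text[:cut]                   # items 1 .. half
    right = text[cut + len(delim):]     # items half + 1 .. end
    return text_items(left, delim, limit) + text_items(right, delim, limit)
```

Each call either splits a manageable chunk directly or halves the text at a delimiter boundary and recurses, so no single split ever yields more than `limit` items.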

The following version introduces an additional option for case type: “mixed” - a variation of title-case that renders definite and indefinite articles, conjunctions and prepositions as lowercase (except where they start a sentence):

-- syntax : changeCase of someText to caseType
-- someText (string) : plain or encoded text
-- caseType (string) : the type of case required ("upper", "lower", "sentence", "title" or "mixed")

-- "upper" : all uppercase text (no exceptions)
-- "lower" : all lowercase text (no exceptions)
-- "sentence" : uppercase character at start of each sentence, other characters lowercase (apart from words in sentenceModList)
-- "title" : uppercase character at start of each word, other characters lowercase (no exceptions)
-- "mixed" : similar to title, except for definite and indefinite articles, conjunctions and prepositions (see mixedModList) that don't start a sentence

property lowerStr : "abcdefghijklmnopqrstuvwxyzáàâäãåæçéèêëíìîïñóòôöõøœúùûüÿ"
property upperStr : "ABCDEFGHIJKLMNOPQRSTUVWXYZÁÀÂÄÃÅÆÇÉÈÊËÍÌÎÏÑÓÒÔÖÕØŒÚÙÛÜŸ"
property alphaList : lowerStr's characters & reverse of upperStr's characters
property sentenceBreak : {".", "!", "?"}
property wordBreak : {space, ASCII character 202, tab}
property everyBreak : wordBreak & sentenceBreak
property whiteSpace : wordBreak & {return, ASCII character 10}
property currList : missing value
property sentenceModList : {"i", "i'm", "i'm", "i've", "i've", "I've", "I've", "I'm", "I'm", "I"} (* could be extended to include certain proper nouns, acronyms, etc. *)
property mixedModList : {"By Means Of", "In Front Of", "In Order That", "On Account Of", "Whether Or Not", "According To", "As To", "Aside From", "Because Of", "Even If", "Even Though", "In Case", "Inside Of", "Now That", "Only If", "Out Of", "Owing To", "Prior To", "Subsequent To", "A", "About", "Above", "Across", "After", "Against", "Along", "Although", "Among", "An", "And", "Around", "As", "At", "Because", "Before", "Behind", "Below", "Beneath", "Beside", "Between", "Beyond", "But", "By", "De", "Down", "During", "Except", "For", "From", "If", "In", "Inside", "Into", "Like", "Near", "Of", "Off", "On", "Onto", "Or", "Out", "Outside", "Over", "Past", "Since", "So", "The", "Though", "Through", "Throughout", "To", "Under", "Unless", "Until", "Up", "Upon", "When", "Whereas", "While", "With", "Within", "Without", "Ye", "ye", "without", "within", "with", "while", "whereas", "when", "upon", "up", "until", "unless", "under", "to", "throughout", "through", "though", "the", "so", "since", "past", "over", "outside", "out", "or", "onto", "on", "off", "of", "near", "like", "into", "inside", "in", "if", "from", "for", "except", "during", "down", "de", "by", "but", "beyond", "between", "beside", "beneath", "below", "behind", "before", "because", "at", "as", "around", "and", "an", "among", "although", "along", "against", "after", "across", "above", "about", "a", "subsequent to", "prior to", "owing to", "out of", "only if", "now that", "inside of", "in case", "even though", "even if", "because of", "aside from", "as to", "according to", "whether or not", "on account of", "in order that", "in front of", "by means of"}

on textItems from currTxt
	tell (count currTxt's text items) to if it > 4000 then tell it div 2 to return ¬
		my (textItems from (currTxt's text 1 thru text item it)) & ¬
		my (textItems from (currTxt's text from text item (it + 1) to -1))
	currTxt's text items
end textItems

on initialCap(currTxt)
	tell currTxt to if (count words) > 0 then tell word 1's character 1 to if it is in lowerStr then
		set AppleScript's text item delimiters to it
		tell my (textItems from currTxt) to return beginning & upperStr's character ((count lowerStr's text item 1) + 1) & rest
	end if
	currTxt
end initialCap

to capItems from currTxt against breakList
	repeat with currBreak in breakList
		set text item delimiters to currBreak
		if (count currTxt's text items) > 1 then
			set currList to my (textItems from currTxt)
			repeat with n from 2 to count currList
				set my currList's item n to initialCap(my currList's item n)
			end repeat
			set text item delimiters to currBreak's contents
			tell my currList to set currTxt to beginning & ({""} & rest)
		end if
	end repeat
	currTxt
end capItems

on modItems from currTxt against modList
	set currList to modList
	set currCount to (count modList) div 2
	repeat with currBreak in everyBreak
		set text item delimiters to currBreak
		if (count currTxt's text items) > 1 then repeat with n from 1 to currCount
			set text item delimiters to my currList's item n & currBreak
			if (count currTxt's text items) > 1 then
				set currTxt to textItems from currTxt
				set text item delimiters to my currList's item -n & currBreak
				tell currTxt to set currTxt to beginning & ({""} & rest)
			end if
		end repeat
	end repeat
	currTxt
end modItems

to changeCase of currTxt to caseType
	if (count currTxt's words) is 0 then return currTxt
	
	ignoring case
		tell caseType to set {upper_Case, lower_Case, sentence_Case, title_Case, mixed_Case} to {it is "upper", it is "lower", it is "sentence", it is "title", it is "mixed"}
	end ignoring
	
	if not (upper_Case or lower_Case or title_Case or sentence_Case or mixed_Case) then
		error "The term \"" & caseType & "\" is not a valid case type option. Please use \"upper\", \"lower\", \"sentence\", \"title\" or \"mixed\"."
	else if upper_Case then
		set n to 1
	else
		set n to -1
	end if
	
	considering case
		set tid to text item delimiters
		
		repeat with n from n to n * (count lowerStr) by n
			set text item delimiters to my alphaList's item n
			set currTxt to textItems from currTxt
			set text item delimiters to my alphaList's item -n
			tell currTxt to set currTxt to beginning & ({""} & rest)
		end repeat
		
		if sentence_Case then
			set currTxt to initialCap(modItems from (capItems from currTxt against sentenceBreak) against sentenceModList)
		else if title_Case or mixed_Case then
			set currTxt to initialCap(capItems from currTxt against whiteSpace)
			if mixed_Case then set currTxt to initialCap(capItems from (modItems from currTxt against mixedModList) against sentenceBreak)
		end if
		
		set text item delimiters to tid
	end considering
	currTxt
end changeCase

set someText to "How far you go in life depends on your being TENDER with the YOUNG, COMPASSIONATE with the AGED, SYMPATHETIC with the STRIVING and TOLERANT of the WEAK and STRONG. Because SOMEDAY in your life you will have been ALL of these." (* George Washington Carver. *)

changeCase of someText to "upper" (* "upper", "lower", "sentence", "title" or "mixed" *)

Script subsequently edited to insert underscore characters in certain variable labels (see discussion below)

Results:

upper:
HOW FAR YOU GO IN LIFE DEPENDS ON YOUR BEING TENDER WITH THE YOUNG, COMPASSIONATE WITH THE AGED, SYMPATHETIC WITH THE STRIVING AND TOLERANT OF THE WEAK AND STRONG. BECAUSE SOMEDAY IN YOUR LIFE YOU WILL HAVE BEEN ALL OF THESE.

lower:
how far you go in life depends on your being tender with the young, compassionate with the aged, sympathetic with the striving and tolerant of the weak and strong. because someday in your life you will have been all of these.

sentence:
How far you go in life depends on your being tender with the young, compassionate with the aged, sympathetic with the striving and tolerant of the weak and strong. Because someday in your life you will have been all of these.

title:
How Far You Go In Life Depends On Your Being Tender With The Young, Compassionate With The Aged, Sympathetic With The Striving And Tolerant Of The Weak And Strong. Because Someday In Your Life You Will Have Been All Of These.

mixed:
How Far You Go in Life Depends on Your Being Tender with the Young, Compassionate with the Aged, Sympathetic with the Striving and Tolerant of the Weak and Strong. Because Someday in Your Life You Will Have Been All of These.
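
For readers who find the delimiter switching hard to follow, the rule behind "mixed" can be restated as a rough Python sketch. This is not a translation of the script: the `SMALL_WORDS` set is an abbreviated, hypothetical subset of mixedModList, the function name `mixed_case` is made up for illustration, and multi-word entries such as "by means of" are not treated specially.

```python
import re

# Abbreviated, hypothetical subset of the words held in mixedModList.
SMALL_WORDS = {"a", "an", "and", "as", "at", "but", "by", "for", "from",
               "in", "of", "on", "or", "the", "to", "with"}


def mixed_case(text):
    """Title-case text, keeping small words lowercase unless they open a sentence."""
    out = []
    sentence_start = True
    for token in re.split(r"(\s+)", text):
        if not token or token.isspace():
            out.append(token)  # preserve the original whitespace
            continue
        word = token.lower()
        # Capitalise unless the word (minus trailing punctuation) is a small word
        # that does not start a sentence.
        if sentence_start or word.strip(".,;:!?") not in SMALL_WORDS:
            word = word[:1].upper() + word[1:]
        out.append(word)
        sentence_start = token[-1] in ".!?"
    return "".join(out)
```

A word that begins with a bracket or quote is left uncapitalised by this sketch, which is one of the cases the AppleScript version handles more gracefully.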

Apologies for the length of all this…

Nigel_Garvey · July 24, 2005, 1:34pm

Hi, Kai.

Thanks for your latest contribution to this thread. The script’s pretty remarkable as its “sentence” mode even capitalises correctly after brackets and quotes, which don’t (at first sight) seem to have been explicitly catered for! Maybe I’ll see how it works when I’ve had more time to study it in detail.

After you signed off last time, I worked out a script - less thorough than yours - that explicitly treated quotes and brackets as “whitish” space, but I didn’t post it because I began to feel that “sentence” mode itself was a mistake - at least in the context of discussing techniques on this forum.

“Lower”, “upper”, and “title” modes are easy to implement and have already been adequately covered. Philosophically, they do explicit and grammatically irrelevant things to the text.

The only real use for “sentence” mode, though, is as a tidier-upper of bad typing. As you’ve already noted, it’s far more complex to implement and involves context. The script needs to be versed in the proper nouns and acronyms of the language of the text. It also needs a thorough knowledge of that language’s other grammatical features, many of which can be very difficult to handle. For instance, quoted speech in English:

Or mixed contexts:

Unless “sentence” mode is given some specific, narrowly-defined purpose, it might be best to leave it to a fully-fledged application or to a ‘has’-sized library.

By the way, there’s a potential problem with your ‘uppercase’ and ‘lowercase’ variables. These words are commands in the Satimage OSAX. I wouldn’t care to tell you what they do…

kai · July 24, 2005, 5:06pm

As you’ve no doubt seen by now, Nigel, it simply capitalises the first character of the first word following a ‘sentenceBreak’ - and so effectively ignores any intervening characters. AppleScript’s magic - not mine, I’m afraid.

Agreed. I’d been veering towards this conclusion for a while, but my last attempt really clinched it for me. (I’m sure it’s no coincidence that AppleScript itself readily defines words and paragraphs - but steers well clear of the muddy water of sentences!)

I’d be happier to leave things as a short, quick fix - rather than attempting to go down the tortuous path of refining any further. Nevertheless, the discussion’s been a very interesting one.

Good point. Thanks for the reminder - I had a feeling they looked familiar! I’ve since edited the script in situ to insert underscores in the variable labels, just for safety’s sake.

Nothing quite like reinventing the wheel to while away a few spare moments, right? :lol:

RJamesW · September 14, 2006, 4:23pm

The above conversations are very interesting to me as a very novice scripter–I have learned a lot. I am looking for a way to carry out text modifications of the kind described above (uppercase, lowercase, title, etc) on a list rather than on a string. I think there must be a simple modification or addition to some of the scripts you’ve shared, but the solution has eluded me so far. Any help would be appreciated. Thanks!

Bruce_Phillips · September 14, 2006, 5:08pm

I need this myself. I would try something like this:

on changeCase of subject to |case|
	set returnList to true
	
	if |case| is not in {"lower", "upper", "title", "capitalize"} then ¬
		error "Invalid case for changeCase."
	
	-- Make one-item list if needed
	if class of subject is not list then ¬
		set {subject, returnList} to {{subject}, false}
	
	repeat with i from 1 to (count subject)
		do shell script "/usr/bin/python -c \"import sys; " & ¬
			"print unicode(sys.argv[1], 'utf8')." & ¬
			|case| & "().encode('utf8')\" " & ¬
			quoted form of (item i of subject)
		set item i of subject to result
	end repeat
	
	if not returnList then return first item of subject -- Unicode text
	return subject -- list of Unicode text
end changeCase

changeCase of "hELLO, wORlD!" to "capitalize"

-- Python should handle accented characters as well. Try out this line:
-- changeCase of "hELLo, wORlD! éèê äöü ñ" to "upper"

Thanks to Has for mentioning python before: Capitalize String
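
The "string or list in, same shape out" idea in the handler above can also be written directly in Python. This is a minimal sketch assuming Python 3, where `str` is already Unicode (the one-liner embedded in the handler targets the Python 2 of the time); `change_case` and `ALLOWED` are hypothetical names.

```python
# Only these str methods may be called, mirroring the handler's validation.
ALLOWED = {"lower", "upper", "title", "capitalize"}


def change_case(subject, case):
    """Apply a str case method to one string or to every string in a list."""
    if case not in ALLOWED:
        raise ValueError("Invalid case for change_case: " + repr(case))
    if isinstance(subject, str):
        return getattr(subject, case)()          # single string in, string out
    return [getattr(item, case)() for item in subject]  # list in, list out
```

Checking the method name against a fixed set before calling `getattr` keeps arbitrary attribute names from being invoked.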

RJamesW · September 15, 2006, 6:00pm

Thanks Bruce–works like a charm.

RJamesW · October 6, 2006, 12:10am

My next task should be to check against a list of exceptions for title case, so that conjunctions, most prepositions, articles etc. are not capitalized in my list. Like Kai’s version of changeCase that uses the mixedModList. Any hints on how to modify Kai’s script to accept lists? I get a bit lost in the code, especially with all the switching of text delimiters.

Vincent · October 6, 2006, 10:22am

I’d suggest building a case changer using perl and regular expressions. Perl supports those special characters.

Just to give you an example:
Make lowercase:

echo 'Têst' | perl -pe 'use encoding utf8;s/(\w)/\L$1/gi'

Make uppercase:

echo 'Têst' | perl -pe 'use encoding utf8;s/(\w)/\U$1/gi'

Bruce_Phillips · December 17, 2006, 12:51am

If you’d rather use perl instead, then you could just modify the python one I posted above:

on changeCase of subject to someCase
	set returnList to true
	
	if someCase is not in {"L", "U"} then error "Invalid case for changeCase()" number 1
	
	-- Make one-item list if needed
	if subject's class is not list then set {subject, returnList} to {{subject}, false}
	
	count subject
	
	repeat with i from 1 to result
		do shell script "echo " & quoted form of (subject's item i) & " | /usr/bin/perl -pe 'use encoding utf8; s/(\\w)/\\" & someCase & "$1/gi'"
		set subject's item i to result
	end repeat
	
	if not returnList then set subject to subject's first item
	return subject
end changeCase

changeCase of "hELLo, wORlD! éèê äöü ñ" to "U"

nerox · January 10, 2008, 1:00pm

Qwerty Denzel,
Great script, works very well for me.

jrj9 · May 21, 2008, 7:12pm

I know this thread is old but it solved my problem perfectly and I wanted to thank kai for his “mixed” case solution. [Edit: I reposted a new version of the code I think is more useful; just copy the text you want to convert, use this script, and then paste. The newly pasted text will have been converted while on the clipboard.] I post it here in the hopes that it may save someone else some effort:


-- syntax : changeCase of someText to caseType
-- someText (string) : plain or encoded text
-- caseType (string) : the type of case required ("upper", "lower", "sentence", "title" or "mixed")

-- "upper" : all uppercase text (no exceptions)
-- "lower" : all lowercase text (no exceptions)
-- "sentence" : uppercase character at start of each sentence, other characters lowercase (apart from words in sentenceModList)
-- "title" : uppercase character at start of each word, other characters lowercase (no exceptions)
-- "mixed" : similar to title, except for definite and indefinite articles, conjunctions and prepositions (see mixedModList) that don't start a sentence

property lowerStr : "abcdefghijklmnopqrstuvwxyzáàâäãåæçéèêëíìîïñóòôöõøœúùûüÿ"
property upperStr : "ABCDEFGHIJKLMNOPQRSTUVWXYZÁÀÂÄÃÅÆÇÉÈÊËÍÌÎÏÑÓÒÔÖÕØŒÚÙÛÜŸ"
property alphaList : lowerStr's characters & reverse of upperStr's characters
property sentenceBreak : {".", "!", "?", ":"}
property wordBreak : {space, ASCII character 202, tab}
property everyBreak : wordBreak & sentenceBreak
property whiteSpace : wordBreak & {return, ASCII character 10}
property currList : missing value
property sentenceModList : {"i", "i'm", "i'm", "i've", "i've", "I've", "I've", "I'm", "I'm", "I"} (* could be extended to include certain proper nouns, acronyms, etc. *)
property mixedModList : {"Be", "By Means Of", "In Front Of", "In Order That", "On Account Of", "Whether Or Not", "According To", "As To", "Aside From", "Because Of", "Even If", "Even Though", "In Case", "Inside Of", "Now That", "Only If", "Out Of", "Owing To", "Prior To", "Subsequent To", "A", "About", "Above", "Across", "After", "Against", "Along", "Although", "Among", "An", "And", "Around", "As", "At", "Because", "Before", "Behind", "Below", "Beneath", "Beside", "Between", "Beyond", "But", "By", "De", "Down", "During", "Except", "For", "From", "If", "In", "Inside", "Into", "Like", "Near", "Of", "Off", "On", "Onto", "Or", "Out", "Outside", "Over", "Past", "Since", "So", "The", "Though", "Through", "Throughout", "To", "Under", "Unless", "Until", "Upon", "When", "Whereas", "While", "With", "Within", "Without", "Ye", "ye", "without", "within", "with", "while", "whereas", "when", "upon", "until", "unless", "under", "to", "throughout", "through", "though", "the", "so", "since", "past", "over", "outside", "out", "or", "onto", "on", "off", "of", "near", "like", "into", "inside", "in", "if", "from", "for", "except", "during", "down", "de", "by", "but", "beyond", "between", "beside", "beneath", "below", "behind", "before", "because", "at", "as", "around", "and", "an", "among", "although", "along", "against", "after", "across", "above", "about", "a", "subsequent to", "prior to", "owing to", "out of", "only if", "now that", "inside of", "in case", "even though", "even if", "because of", "aside from", "as to", "according to", "whether or not", "on account of", "in order that", "in front of", "by means of", "be"}

on textItems from currTxt
	tell (count currTxt's text items) to if it > 4000 then tell it div 2 to return my (textItems from (currTxt's text 1 thru text item it)) & my (textItems from (currTxt's text from text item (it + 1) to -1))
	currTxt's text items
end textItems

on initialCap(currTxt)
	tell currTxt to if (count words) > 0 then tell word 1's character 1 to if it is in lowerStr then
		set AppleScript's text item delimiters to it
		tell my (textItems from currTxt) to return beginning & upperStr's character ((count lowerStr's text item 1) + 1) & rest
	end if
	currTxt
end initialCap

to capItems from currTxt against breakList
	repeat with currBreak in breakList
		set text item delimiters to currBreak
		if (count currTxt's text items) > 1 then
			set currList to my (textItems from currTxt)
			repeat with n from 2 to count currList
				set my currList's item n to initialCap(my currList's item n)
			end repeat
			set text item delimiters to currBreak's contents
			tell my currList to set currTxt to beginning & ({""} & rest)
		end if
	end repeat
	currTxt
end capItems

on modItems from currTxt against modList
	set currList to modList
	set currCount to (count modList) div 2
	repeat with currBreak in everyBreak
		set text item delimiters to currBreak
		if (count currTxt's text items) > 1 then repeat with n from 1 to currCount
			set text item delimiters to my currList's item n & currBreak
			if (count currTxt's text items) > 1 then
				set currTxt to textItems from currTxt
				set text item delimiters to my currList's item -n & currBreak
				tell currTxt to set currTxt to beginning & ({""} & rest)
			end if
		end repeat
	end repeat
	currTxt
end modItems

to changeCase of currTxt to caseType
	if (count currTxt's words) is 0 then return currTxt
	
	ignoring case
		tell caseType to set {upper_Case, lower_Case, sentence_Case, title_Case, mixed_Case} to {it is "upper", it is "lower", it is "sentence", it is "title", it is "mixed"}
	end ignoring
	
	if not (upper_Case or lower_Case or title_Case or sentence_Case or mixed_Case) then
		error "The term \"" & caseType & "\" is not a valid case type option. Please use \"upper\", \"lower\", \"sentence\", \"title\" or \"mixed\"."
	else if upper_Case then
		set n to 1
	else
		set n to -1
	end if
	
	considering case
		set tid to text item delimiters
		
		repeat with n from n to n * (count lowerStr) by n
			set text item delimiters to my alphaList's item n
			set currTxt to textItems from currTxt
			set text item delimiters to my alphaList's item -n
			tell currTxt to set currTxt to beginning & ({""} & rest)
		end repeat
		
		if sentence_Case then
			set currTxt to initialCap(modItems from (capItems from currTxt against sentenceBreak) against sentenceModList)
		else if title_Case or mixed_Case then
			set currTxt to initialCap(capItems from currTxt against whiteSpace)
			if mixed_Case then set currTxt to initialCap(capItems from (modItems from currTxt against mixedModList) against sentenceBreak)
		end if
		
		set text item delimiters to tid
	end considering
	currTxt
end changeCase

tell application "Finder"
	copy (the clipboard as list) to {text_returned}
end tell
set someText to text_returned
set cnvrtdText to (changeCase of someText to "mixed") (* "upper", "lower", "sentence", "title" or "mixed" *)
set the clipboard to cnvrtdText

yiam-jin-qui · January 11, 2011, 11:39pm

The basic conversions can be done a little more simply than a reader of this (occasionally rather baroque) old thread might have guessed:

on ucase(str)
	do shell script "echo " & quoted form of str & " | tr \"[:lower:]\" \"[:upper:]\""
end ucase

on lcase(str)
	do shell script "echo " & quoted form of str & " | tr \"[:upper:]\" \"[:lower:]\""
end lcase

yiam-jin-qui · January 12, 2011, 12:28pm

And there will be subtle cases, but I find that even the initial letters of words and sentences can often be shifted quite simply:

-- First letter of each word
on TitleCase(str)
	do shell script "echo " & quoted form of str & " | perl -ple 's/(\\w+)/\\u$1/g'"
end TitleCase

-- First letter of each sentence
on SentenceCase(str)
	set {strDelim, my text item delimiters} to {my text item delimiters, ". "}
	set lst to text items of str
	repeat with i from 1 to length of lst
		set item i of lst to do shell script ("echo " & quoted form of (item i of lst) & " | perl -nle 'print ucfirst lc'")
	end repeat
	set {strSentences, my text item delimiters} to {lst as text, strDelim}
	strSentences
end SentenceCase
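
The SentenceCase handler boils down to "split on full stop plus space, lowercase each piece, capitalise its first character, rejoin". A minimal Python sketch of that same logic, with the hypothetical name `sentence_case`, might look like this:

```python
def sentence_case(text, delim=". "):
    """Lowercase each delim-separated piece and capitalise its first character."""
    pieces = text.split(delim)
    return delim.join(p[:1].upper() + p[1:].lower() for p in pieces)
```

As with the handler, only ". " counts as a sentence break here, so text after "!" or "?" is not capitalised, and proper nouns and acronyms are lowercased along with everything else.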

Nigel_Garvey · January 12, 2011, 4:00pm

Hi, yiam-jin-qui. Welcome to MacScripter and thanks for adding to this thread.

The link in kai’s original post is broken now the site’s been revamped. It should be http://macscripter.net/viewtopic.php?id=12758. There’s another “tr” example there, along with ruby and python improvements.

Most of the discussion in this thread has been about vanilla AppleScript methods - mainly, I think, because it’s given AppleScripters a challenge into which they can get their teeth. It’s also inspired discussion and exploration of the necessary logic and of the possibilities and implications of AppleScript’s various quirks as they existed at the time.

While shell scripts are undoubtedly better and faster for this and many other purposes, they tend to be totally opaque to people not using them already and are usually posted without any comments to explain how they work. Their educational value on these fora has thus historically been slightly greater than nil, although they may occasionally have inspired people to look into the possibilities of certain commands or to learn the other scripting language(s) involved.

I’m not knocking your shell scripts, of course: just giving my take on the “occasionally rather baroque” spirit of this Code Exchange thread.

yiam-jin-qui · January 12, 2011, 5:36pm

The world would certainly be poorer without the baroque

At the same time, Google will often bring readers in search of a quick and practical fix - so useful, I think, to add a bauhaus postscript.

DJ_Bazzie_Wazzie · January 13, 2011, 12:10am

AppleScript is so high-level that the next example is completely useless, but the concept is useful for standard ASCII (codes 0 to 127). So if we’re talking about exploring AppleScript, as Nigel says, I think this example fits right in. Like I said, it is useless and only shows that there is another AppleScript-only way.

character_array_to_upper("hello world!") --result: "HELLO WORLD!"
character_array_to_lower("HELLO WORLD!") --result: "hello world!"

on character_array_to_upper(aString)
	set characterArray to every item of aString
	set newCharacterArray to {}
	repeat with aCharacter in characterArray
		set end of newCharacterArray to ASCII character to_upper(ASCII number aCharacter)
	end repeat
	return newCharacterArray as string
end character_array_to_upper

on character_array_to_lower(aString)
	set characterArray to every item of aString
	set newCharacterArray to {}
	repeat with aCharacter in characterArray
		set end of newCharacterArray to ASCII character to_lower(ASCII number aCharacter)
	end repeat
	return newCharacterArray as string
end character_array_to_lower

on to_upper(anInt)
	if anInt < 97 or anInt > 122 then
		return anInt
	end if
	return anInt + ((ASCII number "A") - (ASCII number "a"))
end to_upper

on to_lower(anInt)
	if anInt < 65 or anInt > 90 then
		return anInt
	end if
	return anInt + ((ASCII number "a") - (ASCII number "A"))
end to_lower

Nigel_Garvey · January 13, 2011, 10:23am

Hi, DJ Bazzie Wazzie.

Well now …

‘ASCII number’ and ‘ASCII character’ are deprecated in AppleScript 2.x (Leopard and later).
Your to_upper() and to_lower() handlers are written with the foreknowledge of the ASCII codes for “a”, “z”, “A”, and “Z”, but then use two ‘ASCII number’ calls each to work out the difference between the upper and lower case numbers.
When coercing a list to string, AppleScript’s text item delimiters should be set to "" or the default {""} beforehand as a precaution in case they’ve been changed elsewhere in the script or in the application running it.

An AppleScript 2.x version of your idea, again only considering standard ASCII characters, would be:

on to_upper(ASCIItext)
	set ASCIIcodes to ASCIItext's id as list -- List coercion in case there's only one character.	
	repeat with i from 1 to (count ASCIItext)
		set thisCode to item i of ASCIIcodes
		if ((thisCode > 96) and (thisCode < 123)) then set item i of ASCIIcodes to thisCode - 32
	end repeat
	
	return character id ASCIIcodes
end to_upper

on to_lower(ASCIItext)
	set ASCIIcodes to ASCIItext's id as list -- List coercion in case there's only one character.	
	repeat with i from 1 to (count ASCIItext)
		set thisCode to item i of ASCIIcodes
		if ((thisCode > 64) and (thisCode < 91)) then set item i of ASCIIcodes to thisCode + 32
	end repeat
	
	return character id ASCIIcodes
end to_lower

to_upper("Hello world!")
to_lower("HELLO WORLD!")
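
The same code-point arithmetic carries over to most languages. As a point of comparison only, here is a minimal Python sketch that, like the handlers above, deliberately touches nothing outside the 26 unaccented ASCII letters; the function names simply echo the AppleScript ones.

```python
def to_upper(ascii_text):
    """Shift codes 97-122 ('a'-'z') down by 32; leave everything else alone."""
    codes = [ord(ch) for ch in ascii_text]
    codes = [c - 32 if 97 <= c <= 122 else c for c in codes]
    return "".join(chr(c) for c in codes)


def to_lower(ascii_text):
    """Shift codes 65-90 ('A'-'Z') up by 32; leave everything else alone."""
    codes = [ord(ch) for ch in ascii_text]
    codes = [c + 32 if 65 <= c <= 90 else c for c in codes]
    return "".join(chr(c) for c in codes)
```

The fixed offset of 32 works only because the two ASCII letter ranges are laid out in the same order; accented characters pass through unchanged.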

DJ_Bazzie_Wazzie · January 13, 2011, 12:48pm

Thanks Nigel,

Your example 2.0 is much better for today’s AppleScript. Maybe I should have commented that my example was only for Tiger and earlier.

My example is almost directly translated from C. That’s why I pointed out that it is useless, because AppleScript is high-level. As you can see when running a script like this


--in Tiger and older
set characterArray to {}
repeat with x from 1 to 127
	set end of characterArray to ASCII character x
end repeat
return characterArray as string


--in Leopard and newer
set characterArray to {}
repeat with x from 1 to 127
	set end of characterArray to string id x
end repeat
return characterArray as string

that the lowercase characters and the uppercase characters come in the same sequence. So I don’t think foreknowledge is the issue here. I agree with you that computing ASCII number “a” - ASCII number “A” adds a lot of overhead to my script. But, like I said, it is a translation from C, where that problem doesn’t arise.

pneshati · March 5, 2011, 12:55am

Hi. You all have really complex methods of changing the cases, but I used a much simpler way of doing things.

--Capitalize 1.0.1
--Copyright © Panah Neshati, 2011.

on replaceText(find, replace, subject)
	set prevTIDs to text item delimiters of AppleScript
	set text item delimiters of AppleScript to find
	set subject to text items of subject
	
	set text item delimiters of AppleScript to replace
	set subject to "" & subject
	set text item delimiters of AppleScript to prevTIDs
	
	return subject
end replaceText

on capitalize(a)
	set a to (get replaceText("a", "A", a))
	set a to (get replaceText("b", "B", a))
	set a to (get replaceText("c", "C", a))
	set a to (get replaceText("d", "D", a))
	set a to (get replaceText("e", "E", a))
	set a to (get replaceText("f", "F", a))
	set a to (get replaceText("g", "G", a))
	set a to (get replaceText("h", "H", a))
	set a to (get replaceText("i", "I", a))
	set a to (get replaceText("j", "J", a))
	set a to (get replaceText("k", "K", a))
	set a to (get replaceText("l", "L", a))
	set a to (get replaceText("m", "M", a))
	set a to (get replaceText("n", "N", a))
	set a to (get replaceText("o", "O", a))
	set a to (get replaceText("p", "P", a))
	set a to (get replaceText("q", "Q", a))
	set a to (get replaceText("r", "R", a))
	set a to (get replaceText("s", "S", a))
	set a to (get replaceText("t", "T", a))
	set a to (get replaceText("u", "U", a))
	set a to (get replaceText("v", "V", a))
	set a to (get replaceText("w", "W", a))
	set a to (get replaceText("x", "X", a))
	set a to (get replaceText("y", "Y", a))
	set a to (get replaceText("z", "Z", a))
	return a
end capitalize

And, of course, the opposite,

--Lowercase 1.0.1
--Copyright © Panah Neshati, 2011.

on replaceText(find, replace, subject)
	set prevTIDs to text item delimiters of AppleScript
	set text item delimiters of AppleScript to find
	set subject to text items of subject
	
	set text item delimiters of AppleScript to replace
	set subject to "" & subject
	set text item delimiters of AppleScript to prevTIDs
	
	return subject
end replaceText

on lowercase(a)
	set a to (get replaceText("A", "a", a))
	set a to (get replaceText("B", "b", a))
	set a to (get replaceText("C", "c", a))
	set a to (get replaceText("D", "d", a))
	set a to (get replaceText("E", "e", a))
	set a to (get replaceText("F", "f", a))
	set a to (get replaceText("G", "g", a))
	set a to (get replaceText("H", "h", a))
	set a to (get replaceText("I", "i", a))
	set a to (get replaceText("J", "j", a))
	set a to (get replaceText("K", "k", a))
	set a to (get replaceText("L", "l", a))
	set a to (get replaceText("M", "m", a))
	set a to (get replaceText("N", "n", a))
	set a to (get replaceText("O", "o", a))
	set a to (get replaceText("P", "p", a))
	set a to (get replaceText("Q", "q", a))
	set a to (get replaceText("R", "r", a))
	set a to (get replaceText("S", "s", a))
	set a to (get replaceText("T", "t", a))
	set a to (get replaceText("U", "u", a))
	set a to (get replaceText("V", "v", a))
	set a to (get replaceText("W", "w", a))
	set a to (get replaceText("X", "x", a))
	set a to (get replaceText("Y", "y", a))
	set a to (get replaceText("Z", "z", a))
	return a
end lowercase

While it’s not exactly efficient coding, it certainly gets the job done. I’m working on making an improved version that gets the ASCII number of the input, adds 32, and gets the new ASCII character.

I’ll reply when it’s finished.

Nigel_Garvey · March 5, 2011, 11:33am

Hi, pneshati.

That’s exactly what my script in post #40 above does, except that mine uses the character ‘id’ functions introduced with AppleScript 2.0 (in Mac OS 10.5) rather than the now deprecated ‘ASCII character’ and ‘ASCII number’.

You’ll probably find that your “improved version” will be much slower than what you have already, since it will have to call both the ‘ASCII number’ and ‘ASCII character’ functions for every individual character in the text and will have to test each character individually to see if it’s a letter, whether it’s already in the required case, etc.
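
The find-and-replace approach in the Capitalize script above amounts to one split-and-join pass over the whole text per letter, and the 26 near-identical lines can be collapsed into a loop. A rough Python sketch of that structure follows, with `replace_text` and `capitalize` as hypothetical names echoing the handlers; it makes no claim about relative speed in AppleScript.

```python
import string


def replace_text(find, replace, subject):
    """Split on `find` and rejoin with `replace` (the delimiter trick)."""
    return replace.join(subject.split(find))


def capitalize(text):
    """Run one whole-string replacement per ASCII letter: 26 passes in total."""
    for lower, upper in zip(string.ascii_lowercase, string.ascii_uppercase):
        text = replace_text(lower, upper, text)
    return text
```

The number of passes is fixed at 26 however long the text is, whereas a per-character version makes at least one call for every character in the input.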