Change Text Case

Hi, Kai.

Thanks for your latest contribution to this thread. The script’s pretty remarkable as its “sentence” mode even capitalises correctly after brackets and quotes, which don’t (at first sight) seem to have been explicitly catered for! Maybe I’ll see how it works when I’ve had more time to study it in detail. :slight_smile:

After you signed off last time, I worked out a script - less thorough than yours - that explicitly treated quotes and brackets as “whitish” space, but I didn’t post it because I began to feel that “sentence” mode itself was a mistake - at least in the context of discussing techniques on this forum.

“Lower”, “upper”, and “title” modes are easy to implement and have already been adequately covered. Philosophically, they do explicit and grammatically irrelevant things to the text.

The only real use for “sentence” mode, though, is as a tidier-upper of bad typing. As you’ve already noted, it’s far more complex to implement and involves context. The script needs to be versed in the proper nouns and acronyms of the language of the text. It also needs a thorough knowledge of that language’s other grammatical features, many of which can be very difficult to handle. For instance, quoted speech in English:

Or mixed contexts:

Unless “sentence” mode is given some specific, narrowly-defined purpose, it might be best to leave it to a fully-fledged application or to a ‘has’-sized library.

By the way, there’s a potential problem with your ‘uppercase’ and ‘lowercase’ variables. These words are commands in the Satimage OSAX. I wouldn’t care to tell you what they do… :wink:

As you’ve no doubt seen by now, Nigel, it simply capitalises the first character of the first word following a ‘sentenceBreak’ - and so effectively ignores any intervening characters. AppleScript’s magic - not mine, I’m afraid. :wink:
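A small sketch of the behaviour described above, assuming a made-up fragment of text that follows a sentence break. AppleScript's own 'word' element skips the intervening space, bracket and quote, so as far as I know the first character it reaches should be the "h". The variable names here are illustrative and not taken from kai's script.

```applescript
-- Hypothetical fragment: the text that follows a sentence break.
set fragment to " (\"hello there\")"

-- 'word 1' is resolved by AppleScript itself, which passes over
-- the leading space, bracket and quote mark.
set firstWord to word 1 of fragment

-- The first character of that word is the one that would be capitalised.
character 1 of firstWord
```

The point is only that the script never has to name brackets or quotes explicitly; the word-finding does that work.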

Agreed. I’d been veering towards this conclusion for a while, but my last attempt really clinched it for me. (I’m sure it’s no coincidence that AppleScript itself readily defines words and paragraphs - but steers well clear of the muddy water of sentences!)

I’d be happier to leave things as a short, quick fix - rather than attempting to go down the tortuous path of refining any further. Nevertheless, the discussion’s been a very interesting one. :slight_smile:

Good point. Thanks for the reminder - I had a feeling they looked familiar! I’ve since edited the script in situ to insert underscores in the variable labels, just for safety’s sake.
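For what it's worth, a clash like this can also be sidestepped by wrapping the name in vertical bars, which makes AppleScript treat it as a plain user identifier. A minimal sketch, with variable names and values chosen purely for illustration:

```applescript
-- Bars force AppleScript to read these as user-defined identifiers,
-- even if a scripting addition defines commands with the same names.
set |uppercase| to "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
set |lowercase| to "abcdefghijklmnopqrstuvwxyz"

-- Find the position of a lower-case letter and fetch its upper-case partner.
set letterPosition to offset of "k" in |lowercase|
character letterPosition of |uppercase|
```

Underscores, as used in the edited script, work just as well; the bars are simply another option when the original name is worth keeping.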

Nothing quite like reinventing the wheel to while away a few spare moments, right? :lol:

The above conversations are very interesting to me as a very novice scripter–I have learned a lot. I am looking for a way to carry out text modifications of the kind described above (uppercase, lowercase, title, etc) on a list rather than on a string. I think there must be a simple modification or addition to some of the scripts you’ve shared, but the solution has eluded me so far. Any help would be appreciated. Thanks!

I need this myself. I would try something like this:

on changeCase of subject to |case|
	set returnList to true
	
	if |case| is not in {"lower", "upper", "title", "capitalize"} then ¬
		error "Invalid case for changeCase."
	
	-- Make one-item list if needed
	if class of subject is not list then ¬
		set {subject, returnList} to {{subject}, false}
	
	repeat with i from 1 to (count subject)
		do shell script "/usr/bin/python -c \"import sys; " & ¬
			"print unicode(sys.argv[1], 'utf8')." & ¬
			|case| & "().encode('utf8')\" " & ¬
			quoted form of (item i of subject)
		set item i of subject to result
	end repeat
	
	if not returnList then return first item of subject -- Unicode text
	return subject -- list of Unicode text
end changeCase

changeCase of "hELLO, wORlD!" to "capitalize"

-- Python should handle accented characters as well. Try out this line:
-- changeCase of "hELLo, wORlD! éèê äöü ñ" to "upper"

Thanks to Has for mentioning python before: Capitalize String

Thanks Bruce–works like a charm.

My next task should be to check against a list of exceptions for title case, so that conjunctions, most prepositions, articles etc. are not capitalized in my list. Like Kai’s version of changeCase that uses the mixedModList. Any hints on how to modify Kai’s script to accept lists? I get a bit lost in the code, especially with all the switching of text delimiters.
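One possible approach to the "accept lists" part is to leave the single-string handler alone and wrap it in a loop. The sketch below is only an illustration: convertOne and convertList are hypothetical names, and convertOne is a stand-in (a simple ASCII-only 'tr' call) for whichever single-string handler is actually wanted, such as kai's changeCase.

```applescript
-- Stand-in for any handler that converts a single string
-- (a call to kai's changeCase could go here instead).
on convertOne(someText)
	do shell script "printf '%s' " & quoted form of someText & " | tr '[:lower:]' '[:upper:]'"
end convertOne

-- Apply the single-string handler to each item and collect the results.
on convertList(theList)
	set convertedList to {}
	repeat with thisItem in theList
		-- 'contents of' resolves the loop reference to the actual text.
		set end of convertedList to convertOne(contents of thisItem)
	end repeat
	return convertedList
end convertList

convertList({"war and peace", "of mice and men"})
```

Because the wrapper builds a new list, the original list is left untouched, and the text item delimiter juggling inside the single-string handler never needs to be modified.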

I’d suggest building a case changer using perl and regular expressions. Perl supports those special characters.

Just to give you an example:
Make lowercase:

echo 'Têst' | perl -pe 'use encoding utf8;s/(\w)/\L$1/gi'

Make uppercase:

echo 'Têst' | perl -pe 'use encoding utf8;s/(\w)/\U$1/gi'

If you’d rather use perl instead, then you could just modify the python one I posted above:

on changeCase of subject to someCase
	set returnList to true
	
	if someCase is not in {"L", "U"} then error "Invalid case for changeCase()" number 1
	
	-- Make one-item list if needed
	if subject's class is not list then set {subject, returnList} to {{subject}, false}
	
	count subject
	
	repeat with i from 1 to result
		do shell script "echo " & quoted form of (subject's item i) & " | /usr/bin/perl -pe 'use encoding utf8; s/(\\w)/\\" & someCase & "$1/gi'"
		set subject's item i to result
	end repeat
	
	if not returnList then set subject to subject's first item
	return subject
end changeCase

changeCase of "hELLo, wORlD! éèê äöü ñ" to "U"

Qwerty Denzel,
Great script, works very well for me.

I know this thread is old but it solved my problem perfectly and I wanted to thank kai for his “mixed” case solution. [Edit: I reposted a new version of the code I think is more useful; just copy the text you want to convert, use this script, and then paste. The newly pasted text will have been converted while on the clipboard.] I post it here in the hopes that it may save someone else some effort:


-- syntax : changeCase of someText to caseType
-- someText (string) : plain or encoded text
-- caseType (string) : the type of case required ("upper", "lower", "sentence", "title" or "mixed")

-- "upper" : all uppercase text (no exceptions)
-- "lower" : all lowercase text (no exceptions)
-- "sentence" : uppercase character at start of each sentence, other characters lowercase (apart from words in sentenceModList)
-- "title" : uppercase character at start of each word, other characters lowercase (no exceptions)
-- "mixed" : similar to title, except for definite and indefinite articles, conjunctions and prepositions (see mixedModList) that don't start a sentence

property lowerStr : "abcdefghijklmnopqrstuvwxyzáàâäãåæçéèêëíìîïñóòôöõøœúùûüÿ"
property upperStr : "ABCDEFGHIJKLMNOPQRSTUVWXYZÁÀÂÄÃÅÆÇÉÈÊËÍÌÎÏÑÓÒÔÖÕØŒÚÙÛÜŸ"
property alphaList : lowerStr's characters & reverse of upperStr's characters
property sentenceBreak : {".", "!", "?", ":"}
property wordBreak : {space, ASCII character 202, tab}
property everyBreak : wordBreak & sentenceBreak
property whiteSpace : wordBreak & {return, ASCII character 10}
property currList : missing value
property sentenceModList : {"i", "i'm", "i'm", "i've", "i've", "I've", "I've", "I'm", "I'm", "I"} (* could be extended to include certain proper nouns, acronyms, etc. *)
property mixedModList : {"Be", "By Means Of", "In Front Of", "In Order That", "On Account Of", "Whether Or Not", "According To", "As To", "Aside From", "Because Of", "Even If", "Even Though", "In Case", "Inside Of", "Now That", "Only If", "Out Of", "Owing To", "Prior To", "Subsequent To", "A", "About", "Above", "Across", "After", "Against", "Along", "Although", "Among", "An", "And", "Around", "As", "At", "Because", "Before", "Behind", "Below", "Beneath", "Beside", "Between", "Beyond", "But", "By", "De", "Down", "During", "Except", "For", "From", "If", "In", "Inside", "Into", "Like", "Near", "Of", "Off", "On", "Onto", "Or", "Out", "Outside", "Over", "Past", "Since", "So", "The", "Though", "Through", "Throughout", "To", "Under", "Unless", "Until", "Upon", "When", "Whereas", "While", "With", "Within", "Without", "Ye", "ye", "without", "within", "with", "while", "whereas", "when", "upon", "until", "unless", "under", "to", "throughout", "through", "though", "the", "so", "since", "past", "over", "outside", "out", "or", "onto", "on", "off", "of", "near", "like", "into", "inside", "in", "if", "from", "for", "except", "during", "down", "de", "by", "but", "beyond", "between", "beside", "beneath", "below", "behind", "before", "because", "at", "as", "around", "and", "an", "among", "although", "along", "against", "after", "across", "above", "about", "a", "subsequent to", "prior to", "owing to", "out of", "only if", "now that", "inside of", "in case", "even though", "even if", "because of", "aside from", "as to", "according to", "whether or not", "on account of", "in order that", "in front of", "by means of", "be"}

on textItems from currTxt
	tell (count currTxt's text items) to if it > 4000 then tell it div 2 to return my (textItems from (currTxt's text 1 thru text item it)) & my (textItems from (currTxt's text from text item (it + 1) to -1))
	currTxt's text items
end textItems

on initialCap(currTxt)
	tell currTxt to if (count words) > 0 then tell word 1's character 1 to if it is in lowerStr then
		set AppleScript's text item delimiters to it
		tell my (textItems from currTxt) to return beginning & upperStr's character ((count lowerStr's text item 1) + 1) & rest
	end if
	currTxt
end initialCap

to capItems from currTxt against breakList
	repeat with currBreak in breakList
		set text item delimiters to currBreak
		if (count currTxt's text items) > 1 then
			set currList to my (textItems from currTxt)
			repeat with n from 2 to count currList
				set my currList's item n to initialCap(my currList's item n)
			end repeat
			set text item delimiters to currBreak's contents
			tell my currList to set currTxt to beginning & ({""} & rest)
		end if
	end repeat
	currTxt
end capItems

on modItems from currTxt against modList
	set currList to modList
	set currCount to (count modList) div 2
	repeat with currBreak in everyBreak
		set text item delimiters to currBreak
		if (count currTxt's text items) > 1 then repeat with n from 1 to currCount
			set text item delimiters to my currList's item n & currBreak
			if (count currTxt's text items) > 1 then
				set currTxt to textItems from currTxt
				set text item delimiters to my currList's item -n & currBreak
				tell currTxt to set currTxt to beginning & ({""} & rest)
			end if
		end repeat
	end repeat
	currTxt
end modItems

to changeCase of currTxt to caseType
	if (count currTxt's words) is 0 then return currTxt
	
	ignoring case
		tell caseType to set {upper_Case, lower_Case, sentence_Case, title_Case, mixed_Case} to {it is "upper", it is "lower", it is "sentence", it is "title", it is "mixed"}
	end ignoring
	
	if not (upper_Case or lower_Case or title_Case or sentence_Case or mixed_Case) then
		error "The term \"" & caseType & "\" is not a valid case type option. Please use \"upper\", \"lower\", \"sentence\", \"title\" or \"mixed\"."
	else if upper_Case then
		set n to 1
	else
		set n to -1
	end if
	
	considering case
		set tid to text item delimiters
		
		repeat with n from n to n * (count lowerStr) by n
			set text item delimiters to my alphaList's item n
			set currTxt to textItems from currTxt
			set text item delimiters to my alphaList's item -n
			tell currTxt to set currTxt to beginning & ({""} & rest)
		end repeat
		
		if sentence_Case then
			set currTxt to initialCap(modItems from (capItems from currTxt against sentenceBreak) against sentenceModList)
		else if title_Case or mixed_Case then
			set currTxt to initialCap(capItems from currTxt against whiteSpace)
			if mixed_Case then set currTxt to initialCap(capItems from (modItems from currTxt against mixedModList) against sentenceBreak)
		end if
		
		set text item delimiters to tid
	end considering
	currTxt
end changeCase

tell application "Finder"
	copy (the clipboard as list) to {text_returned}
end tell
set someText to text_returned
set cnvrtdText to (changeCase of someText to "mixed") (* "upper", "lower", "sentence", "title" or "mixed" *)
set the clipboard to cnvrtdText

The basic conversions can be done a little more simply than a reader of this (occasionally rather baroque) old thread might have guessed :wink:

on ucase(str)
	do shell script "echo " & quoted form of str & " | tr \"[:lower:]\" \"[:upper:]\""
end ucase

on lcase(str)
	do shell script "echo " & quoted form of str & " | tr \"[:upper:]\" \"[:lower:]\""
end lcase

And there will be subtle cases, but I find that even the initial letters of words and sentences can often be shifted quite simply:

-- First letter of each word
on TitleCase(str)
	do shell script "echo " & quoted form of str & " | perl -ple 's/(\\w+)/\\u$1/g'"
end TitleCase

-- First letter of each sentence
on SentenceCase(str)
	set {strDelim, my text item delimiters} to {my text item delimiters, ". "}
	set lst to text items of str
	repeat with i from 1 to length of lst
		set item i of lst to do shell script ("echo " & quoted form of (item i of lst) & " | perl -nle 'print ucfirst lc'")
	end repeat
	set {strSentences, my text item delimiters} to {lst as text, strDelim}
	strSentences
end SentenceCase

Hi, yiam-jin-quin. Welcome to MacScripter and thanks for adding to this thread.

The link in kai’s original post is broken now the site’s been revamped. It should be http://macscripter.net/viewtopic.php?id=12758. There’s another “tr” example there - and ruby and python improvements.

Most of the discussion in this thread has been about vanilla AppleScript methods - mainly, I think, because it’s given AppleScripters a challenge into which they can get their teeth. It’s also inspired discussion and exploration of the necessary logic and of the possibilities and implications of AppleScript’s various quirks as they existed at the time.

While shell scripts are undoubtedly better and faster for this and many other purposes, they tend to be totally opaque to people not using them already and are usually posted without any comments to explain how they work. Their educational value on these fora has thus historically been slightly greater than nil, although they may occasionally have inspired people to look into the possibilities of certain commands or to learn the other scripting language(s) involved.

I’m not knocking your shell scripts, of course: just giving my take on the “occasionally rather baroque” spirit of this Code Exchange thread. :slight_smile:

The world would certainly be poorer without the baroque :cool:

At the same time, Google will often bring readers in search of a quick and practical fix - so it’s useful, I think, to add a bauhaus postscript.

AppleScript is so high level that the next example is completely useless, but the concept is useful for standard ASCII (127 characters). So if we’re talking about exploring AppleScript, like Nigel says, I think this example will fit right into it. Like I said, it is useless and only meant to show you that there is another AppleScript-only way.

character_array_to_upper("hello world!") --result: "HELLO WORLD!"
character_array_to_lower("HELLO WORLD!") --result: "hello world!"

on character_array_to_upper(aString)
	set characterArray to every item of aString
	set newCharacterArray to {}
	repeat with aCharacter in characterArray
		set end of newCharacterArray to ASCII character to_upper(ASCII number aCharacter)
	end repeat
	return newCharacterArray as string
end character_array_to_upper

on character_array_to_lower(aString)
	set characterArray to every item of aString
	set newCharacterArray to {}
	repeat with aCharacter in characterArray
		set end of newCharacterArray to ASCII character to_lower(ASCII number aCharacter)
	end repeat
	return newCharacterArray as string
end character_array_to_lower

on to_upper(anInt)
	if anInt < 97 or anInt > 122 then
		return anInt
	end if
	return anInt + ((ASCII number "A") - (ASCII number "a"))
end to_upper

on to_lower(anInt)
	if anInt < 65 or anInt > 90 then
		return anInt
	end if
	return anInt + ((ASCII number "a") - (ASCII number "A"))
end to_lower

Hi, DJ Bazzie Wazzie.

Well now … :wink:

  1. ‘ASCII number’ and ‘ASCII character’ are deprecated in AppleScript 2.x (Leopard and later).

  2. Your to_upper() and to_lower() handlers are written with the foreknowledge of the ASCII codes for “a”, “z”, “A”, and “Z”, but then use two ‘ASCII number’ calls each to work out the difference between the upper and lower case numbers.

  3. When coercing a list to string, AppleScript’s text item delimiters should be set to “” or the default {“”} beforehand as a precaution in case they’ve been changed elsewhere in the script or in the application running it.
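Point 3 can be sketched as a small helper. This is only an illustration of the precaution, and joinList is a hypothetical name that doesn't appear in any of the scripts above:

```applescript
-- Join a list of strings without being affected by whatever
-- delimiters another part of the script may have left in place.
on joinList(theList)
	set savedTIDs to AppleScript's text item delimiters
	set AppleScript's text item delimiters to ""
	set joinedText to theList as text
	-- Put the previous delimiters back for the rest of the script.
	set AppleScript's text item delimiters to savedTIDs
	return joinedText
end joinList

joinList({"H", "E", "L", "L", "O"})
```

Saving and restoring the delimiters costs two extra lines, but it means the handler behaves the same wherever it's called from.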

An AppleScript 2.x version of your idea, again only considering standard ASCII characters, would be:

on to_upper(ASCIItext)
	set ASCIIcodes to ASCIItext's id as list -- List coercion in case there's only one character.	
	repeat with i from 1 to (count ASCIItext)
		set thisCode to item i of ASCIIcodes
		if ((thisCode > 96) and (thisCode < 123)) then set item i of ASCIIcodes to thisCode - 32
	end repeat
	
	return character id ASCIIcodes
end to_upper

on to_lower(ASCIItext)
	set ASCIIcodes to ASCIItext's id as list -- List coercion in case there's only one character.	
	repeat with i from 1 to (count ASCIItext)
		set thisCode to item i of ASCIIcodes
		if ((thisCode > 64) and (thisCode < 91)) then set item i of ASCIIcodes to thisCode + 32
	end repeat
	
	return character id ASCIIcodes
end to_lower

to_upper("Hello world!")
to_lower("HELLO WORLD!")

Thanks Nigel,

Your example 2.0 is much better for today’s Applescript. Maybe I should have commented that my example was only for Tiger and earlier.

Well my example is almost directly translated from C. That’s why I pointed out that it is useless because Applescript is high level. Well as you can see when running a script like this


--in Tiger and older
set characterArray to {}
repeat with x from 1 to 127
	set end of characterArray to ASCII character x
end repeat
return characterArray as string

--in Leopard and newer
set characterArray to {}
repeat with x from 1 to 127
	set end of characterArray to string id x
end repeat
return characterArray as string

that the sequence of the lower case characters and that of the upper case characters are the same. So I think foreknowledge is not the case here. I agree with you that working out ASCII number “a” - ASCII number “A” on every call is a lot of overhead I’ve created in my script. But like I said, it is a translation from C, and we don’t have that problem there.
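As a rough illustration of that constant distance in AppleScript 2.x terms, the offset can be worked out once from the 'id' of the two letters and then reused. The variable names are made up for this sketch, and it only applies to the unaccented ASCII letters:

```applescript
-- Distance between the lower-case and upper-case ranges in ASCII.
set caseOffset to (id of "a") - (id of "A")

-- Shift one lower-case letter up into the upper-case range.
set shiftedLetter to character id ((id of "h") - caseOffset)

{caseOffset, shiftedLetter}
```

Computing the offset once, outside any loop, avoids the repeated lookups mentioned above while still not hard-coding the number.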

Hi. You all have really complex methods of changing the cases, but I used a much simpler way of doing things.

--Capitalize 1.0.1
--Copyright © Panah Neshati, 2011.

on replaceText(find, replace, subject)
	set prevTIDs to text item delimiters of AppleScript
	set text item delimiters of AppleScript to find
	set subject to text items of subject
	
	set text item delimiters of AppleScript to replace
	set subject to "" & subject
	set text item delimiters of AppleScript to prevTIDs
	
	return subject
end replaceText

on capitalize(a)
	set a to (get replaceText("a", "A", a))
	set a to (get replaceText("b", "B", a))
	set a to (get replaceText("c", "C", a))
	set a to (get replaceText("d", "D", a))
	set a to (get replaceText("e", "E", a))
	set a to (get replaceText("f", "F", a))
	set a to (get replaceText("g", "G", a))
	set a to (get replaceText("h", "H", a))
	set a to (get replaceText("i", "I", a))
	set a to (get replaceText("j", "J", a))
	set a to (get replaceText("k", "K", a))
	set a to (get replaceText("l", "L", a))
	set a to (get replaceText("m", "M", a))
	set a to (get replaceText("n", "N", a))
	set a to (get replaceText("o", "O", a))
	set a to (get replaceText("p", "P", a))
	set a to (get replaceText("q", "Q", a))
	set a to (get replaceText("r", "R", a))
	set a to (get replaceText("s", "S", a))
	set a to (get replaceText("t", "T", a))
	set a to (get replaceText("u", "U", a))
	set a to (get replaceText("v", "V", a))
	set a to (get replaceText("w", "W", a))
	set a to (get replaceText("x", "X", a))
	set a to (get replaceText("y", "Y", a))
	set a to (get replaceText("z", "Z", a))
	return a
end capitalize

And, of course, the opposite,

--Lowercase 1.0.1
--Copyright © Panah Neshati, 2011.

on replaceText(find, replace, subject)
	set prevTIDs to text item delimiters of AppleScript
	set text item delimiters of AppleScript to find
	set subject to text items of subject
	
	set text item delimiters of AppleScript to replace
	set subject to "" & subject
	set text item delimiters of AppleScript to prevTIDs
	
	return subject
end replaceText

on lowercase(a)
	set a to (get replaceText("A", "a", a))
	set a to (get replaceText("B", "b", a))
	set a to (get replaceText("C", "c", a))
	set a to (get replaceText("D", "d", a))
	set a to (get replaceText("E", "e", a))
	set a to (get replaceText("F", "f", a))
	set a to (get replaceText("G", "g", a))
	set a to (get replaceText("H", "h", a))
	set a to (get replaceText("I", "i", a))
	set a to (get replaceText("J", "j", a))
	set a to (get replaceText("K", "k", a))
	set a to (get replaceText("L", "l", a))
	set a to (get replaceText("M", "m", a))
	set a to (get replaceText("N", "n", a))
	set a to (get replaceText("O", "o", a))
	set a to (get replaceText("P", "p", a))
	set a to (get replaceText("Q", "q", a))
	set a to (get replaceText("R", "r", a))
	set a to (get replaceText("S", "s", a))
	set a to (get replaceText("T", "t", a))
	set a to (get replaceText("U", "u", a))
	set a to (get replaceText("V", "v", a))
	set a to (get replaceText("W", "w", a))
	set a to (get replaceText("X", "x", a))
	set a to (get replaceText("Y", "y", a))
	set a to (get replaceText("Z", "z", a))
	return a
end lowercase

While it’s not exactly efficient coding, it certainly gets the job done. I’m working on making an improved version that gets the ASCII number of the input, adds 32, and gets the new ASCII character.

I’ll reply when it’s finished.

Hi, pneshati.

That’s exactly what my script in post #40 above does, except that mine uses the character ‘id’ functions introduced with AppleScript 2.0 (in Mac OS 10.5) rather than the now deprecated ‘ASCII character’ and ‘ASCII number’.

You’ll probably find that your “improved version” will be much slower than what you have already, since it will have to call both the ‘ASCII number’ and ‘ASCII character’ functions for every individual character in the text and will have to test each character individually to see if it’s a letter, whether it’s already in the required case, etc.

Thanks for the advice, Nigel! I didn’t know about the character id functions, I’ll certainly check them out.

:slight_smile: