Hi, Qwerty. Well spotted about the decimal point bug in my rewrite!
I haven’t had time this weekend to study your version of kai’s script, but I can confirm that both it and your latest version of your own seem to work well in both Jaguar and Tiger.
I see that in your rewrite of my rewrite, you’ve abandoned the “pre-tested this_case” optimisation. It might not make any noticeable difference here - unless this_string is very long - but as a general rule, it’s best not to test things inside a repeat when this can be done beforehand and the results aren’t going to change during the repeat.
For instance, the test ‘this_case is “Title”’ compares the individual characters of this_case with those of “Title”. We need a case-insensitive comparison here, which takes longer because allowances are made for the fact that individual characters might not be exactly the same. The comparison produces a ‘true’ or ‘false’ result, which becomes the parameter for the ‘if’ statement.
Inside the repeat, this test process (and possibly the one for “Sentence”), or else the one for “UPPER”, is performed with every character of this_string.
My version of the script tests each possible value of this_case just once, before the repeat, and simply feeds in the appropriate ‘trues’ and ‘falses’ during the repeat. This has an additional advantage in that it’s easier to arrange for punctuation and white space to be tested in the faster ‘considering case’ mode.
You could do a similar, once-only thing with ‘count of this_text’.
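To make the idea concrete, here's a minimal sketch of the "test once, before the repeat" approach. It isn't the script from this thread: the handler name changeCaseSketch and the flags doUpper and doLower are made up for illustration, it only handles the ASCII letters, and it assumes AppleScript 2.0 or later for the 'id' property of text.

```applescript
-- Illustrative sketch only: the handler and variable names are hypothetical.
on changeCaseSketch(this_text, this_case)
	-- Test each possible value of this_case just once, before the repeat.
	-- (String comparisons ignore case by default, so "upper" matches "UPPER".)
	set doUpper to (this_case is "UPPER")
	set doLower to (this_case is "lower")
	-- Count the text once too, and reuse the stored figure.
	set textLength to (count this_text)
	
	-- List coercion in case there's only one character.
	set codes to (id of this_text) as list
	repeat with i from 1 to textLength
		set thisCode to item i of codes
		-- Inside the repeat, only the ready-made booleans are consulted.
		if doUpper then
			if thisCode > 96 and thisCode < 123 then set item i of codes to thisCode - 32
		else if doLower then
			if thisCode > 64 and thisCode < 91 then set item i of codes to thisCode + 32
		end if
	end repeat
	return string id codes
end changeCaseSketch

changeCaseSketch("Hello world!", "upper") --> "HELLO WORLD!"
```

The string comparisons happen twice per call, however long this_text is. Inside the repeat there are only integer comparisons and boolean tests.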
Apologies if you were already familiar with these concepts.
No, I’m not really familiar with this. It seems so logical! Anyway, my rewrite of kai’s cannot handle Unicode text, and so is inferior. We also need to watch out for things like ‘I’, which doesn’t get capitalized in sentence case.
True, the thread is growing like crazy, but think of the really valuable education embedded in this thread for a new scripter. It really does flesh out all the issues involved in what, at first anyway, seems like a straightforward process. Perhaps when the experts agree that they’ve reached a “golden master” version, then that one would be flagged as “THE” version and the rest would be preserved as discussion.
Apologies for the delayed response, NovaScotian. I believe I wrote the comments that prompted your reply - but I take your point about the possible usefulness of such discussions. (Indeed, I’ve found the discussion of differences in script behaviour between OS versions quite illuminating.)
I’ve therefore added a further version (below) that sacrifices some speed & brevity to address several of the issues discussed previously. In exploring these, it’s clear that a comprehensive solution might be possible only with access to a substantial dictionary (proper nouns, acronyms, etc.) - which is perhaps beyond the scope of a relatively simple script such as this. (The script introduces various properties, partly to accelerate runtime execution, but also to accommodate any further adjustments that might be considered necessary.)
I might be able to clear up the confusion there, Nigel. At the time, I was working in Jaguar (not Panther) - which would explain the differences in behaviour. (However, for the record, the script below was written and tested in Tiger.)
The problem to which I was referring is a stack overflow error (errOSAStackOverflow: -2706), rather than a crash. In older versions of the Mac OS (including some versions of Mac OS X), the error can occur when the resulting number of string elements (text items, characters [items], words or paragraphs) exceeds about 4,060 (the precise figure can vary).
So, apart from getting a list of characters, the problem has very little to do with actual string length - since it depends primarily on the number of resulting string elements.
While such considerations may be of little concern to those using later versions of the Mac OS, they may still be worth noting for anyone interested in portability (for example, if a script is to be distributed generally - such as in a forum like this).
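One way to keep each request under such a limit is to fetch the elements in blocks and join the partial lists. The following is only a rough sketch under stated assumptions, not the handler referred to in this thread: the name charactersInChunks and the chunkSize parameter are made up, and it assumes that a block of a few thousand elements is small enough to be safe on the affected systems.

```applescript
-- Illustrative sketch only: handler name and chunk size are hypothetical.
on charactersInChunks(theText, chunkSize)
	set allChars to {}
	set textLength to (count theText)
	set startIndex to 1
	repeat while startIndex ≤ textLength
		set endIndex to startIndex + chunkSize - 1
		if endIndex > textLength then set endIndex to textLength
		-- Each request asks for chunkSize elements at most.
		set allChars to allChars & (characters startIndex thru endIndex of theText)
		set startIndex to endIndex + 1
	end repeat
	return allChars
end charactersInChunks

count charactersInChunks("The quick brown fox", 4000) --> 19
```

The same pattern should work for words, paragraphs or text items, as long as each individual request stays below the threshold.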
In later versions, the limit appears to have been removed. However, the algorithm to achieve this seems somewhat buggy - and, where there may be several thousand string elements involved, the dreaded, ever-spinning beach ball can appear. (The point at which this occurs appears to vary quite considerably, so it’s difficult to pin it down with any precision. OMM, it’s very likely to occur above 300,000 or 400,000 items - but has sometimes struck at around 60,000 items.)
Silent hanging aside, there also appear to be efficiency issues when getting particularly long lists of string elements - which may therefore take a disproportionate time to evaluate.
I’ve now modified my original ‘textItems’ handler (see script below) in an effort to side-step all three issues. So far, it seems to have worked quite effectively - as demonstrated by the following results (all the usual caveats for interpreting execution times apply):
The following version introduces an additional option for case type: “mixed” - a variation of title-case that renders definite and indefinite articles, conjunctions and prepositions as lowercase (except where they start a sentence):
Thanks for your latest contribution to this thread. The script’s pretty remarkable as its “sentence” mode even capitalises correctly after brackets and quotes, which don’t (at first sight) seem to have been explicitly catered for! Maybe I’ll see how it works when I’ve had more time to study it in detail.
After you signed off last time, I worked out a script - less thorough than yours - that explicitly treated quotes and brackets as “whitish” space, but I didn’t post it because I began to feel that “sentence” mode itself was a mistake - at least in the context of discussing techniques on this forum.
“Lower”, “upper”, and “title” modes are easy to implement and have already been adequately covered. Philosophically, they do explicit and grammatically irrelevant things to the text.
The only real use for “sentence” mode, though, is as a tidier-upper of bad typing. As you’ve already noted, it’s far more complex to implement and involves context. The script needs to be versed in the proper nouns and acronyms of the language of the text. It also needs a thorough knowledge of that language’s other grammatical features, many of which can be very difficult to handle. For instance, quoted speech in English:
Or mixed contexts:
Unless “sentence” mode is given some specific, narrowly-defined purpose, it might be best to leave it to a fully-fledged application or to a ‘has’-sized library.
By the way, there’s a potential problem with your ‘uppercase’ and ‘lowercase’ variables. These words are commands in the Satimage OSAX. I wouldn’t care to tell you what they do…
As you’ve no doubt seen by now, Nigel, it simply capitalises the first character of the first word following a ‘sentenceBreak’ - and so effectively ignores any intervening characters. AppleScript’s magic - not mine, I’m afraid.
Agreed. I’d been veering towards this conclusion for a while, but my last attempt really clinched it for me. (I’m sure it’s no coincidence that AppleScript itself readily defines words and paragraphs - but steers well clear of the muddy water of sentences!)
I’d be happier to leave things as a short, quick fix - rather than attempting to go down the tortuous path of refining any further. Nevertheless, the discussion’s been a very interesting one.
Good point. Thanks for the reminder - I had a feeling they looked familiar! I’ve since edited the script in situ to insert underscores in the variable labels, just for safety’s sake.
Nothing quite like reinventing the wheel to while away a few spare moments, right? :lol:
The above conversations are very interesting to me as a very novice scripter–I have learned a lot. I am looking for a way to carry out text modifications of the kind described above (uppercase, lowercase, title, etc) on a list rather than on a string. I think there must be a simple modification or addition to some of the scripts you’ve shared, but the solution has eluded me so far. Any help would be appreciated. Thanks!
My next task should be to check against a list of exceptions for title case, so that conjunctions, most prepositions, articles etc. are not capitalized in my list. Like Kai’s version of changeCase that uses the mixedModList. Any hints on how to modify Kai’s script to accept lists? I get a bit lost in the code, especially with all the switching of text delimiters.
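For what it's worth, the list case can be handled by wrapping a single-string handler in a repeat. This is a minimal sketch, not a modification of kai's script: upperCaseString is a made-up stand-in (it shells out to 'tr', so it's only reliable for plain ASCII text), and upperCaseList is likewise a hypothetical name. Any handler that takes one string and returns one string could be substituted.

```applescript
-- Stand-in for whichever single-string handler you prefer (hypothetical name).
on upperCaseString(str)
	do shell script "echo " & quoted form of str & " | tr '[:lower:]' '[:upper:]'"
end upperCaseString

-- Applies the single-string handler to every item and returns a new list.
on upperCaseList(theList)
	set newList to {}
	repeat with thisItem in theList
		-- 'contents of' resolves the repeat variable's reference to the item itself.
		set end of newList to upperCaseString(contents of thisItem)
	end repeat
	return newList
end upperCaseList

upperCaseList({"apple", "banana"}) --> {"APPLE", "BANANA"}
```

Building a new list leaves the original list untouched, which is usually the safer choice when the source list is used again later.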
I know this thread is old but it solved my problem perfectly and I wanted to thank kai for his “mixed” case solution. [Edit: I reposted a new version of the code I think is more useful; just copy the text you want to convert, use this script, and then paste. The newly pasted text will have been converted while on the clipboard.] I post it here in the hopes that it may save someone else some effort:
The basic conversions can be done a little more simply than a reader of this (occasionally rather baroque) old thread might have guessed:
on ucase(str)
	do shell script "echo " & quoted form of str & " | tr \"[:lower:]\" \"[:upper:]\""
end ucase

on lcase(str)
	do shell script "echo " & quoted form of str & " | tr \"[:upper:]\" \"[:lower:]\""
end lcase
And there will be subtle cases, but I find that even the initial letters of words and sentences can often be shifted quite simply:
-- First letter of each word
on TitleCase(str)
	do shell script "echo " & quoted form of str & " | perl -ple 's/(\\w+)/\\u$1/g'"
end TitleCase

-- First letter of each sentence
on SentenceCase(str)
	-- Save the current delimiters and split the text at ". "
	set {strDelim, my text item delimiters} to {my text item delimiters, ". "}
	set lst to text items of str
	repeat with i from 1 to length of lst
		set item i of lst to do shell script ("echo " & quoted form of (item i of lst) & " | perl -nle 'print ucfirst lc'")
	end repeat
	-- Rejoin with ". " and restore the saved delimiters
	set {strSentences, my text item delimiters} to {lst as text, strDelim}
	strSentences
end SentenceCase
Hi, yiam-jin-quin. Welcome to MacScripter and thanks for adding to this thread.
The link in kai’s original post is broken now the site’s been revamped. It should be http://macscripter.net/viewtopic.php?id=12758. There’s another “tr” example there - and ruby and python improvements.
Most of the discussion in this thread has been about vanilla AppleScript methods - mainly, I think, because it’s given AppleScripters a challenge into which they can get their teeth. It’s also inspired discussion and exploration of the necessary logic and of the possibilities and implications of AppleScript’s various quirks as they existed at the time.
While shell scripts are undoubtedly better and faster for this and many other purposes, they tend to be totally opaque to people not using them already and are usually posted without any comments to explain how they work. Their educational value on these fora has thus historically been slightly greater than nil, although they may occasionally have inspired people to look into the possibilities of certain commands or to learn the other scripting language(s) involved.
I’m not knocking your shell scripts, of course: just giving my take on the “occasionally rather baroque” spirit of this Code Exchange thread.
AppleScript is so high level that the next example is completely useless, but the concept is useful for standard ASCII (128 characters, codes 0 to 127). So if we’re talking about exploring AppleScript, as Nigel says, I think this example fits right in. As I said, it’s useless and only meant to show that there’s another AppleScript-only way.
character_array_to_upper("hello world!") --result: "HELLO WORLD!"
character_array_to_lower("HELLO WORLD!") --result: "hello world!"

on character_array_to_upper(aString)
	set characterArray to every item of aString
	set newCharacterArray to {}
	repeat with aCharacter in characterArray
		set end of newCharacterArray to ASCII character to_upper(ASCII number aCharacter)
	end repeat
	return newCharacterArray as string
end character_array_to_upper

on character_array_to_lower(aString)
	set characterArray to every item of aString
	set newCharacterArray to {}
	repeat with aCharacter in characterArray
		set end of newCharacterArray to ASCII character to_lower(ASCII number aCharacter)
	end repeat
	return newCharacterArray as string
end character_array_to_lower

on to_upper(anInt)
	if anInt < 97 or anInt > 122 then
		return anInt
	end if
	return anInt + ((ASCII number "A") - (ASCII number "a"))
end to_upper

on to_lower(anInt)
	if anInt < 65 or anInt > 90 then
		return anInt
	end if
	return anInt + ((ASCII number "a") - (ASCII number "A"))
end to_lower
‘ASCII number’ and ‘ASCII character’ are deprecated in AppleScript 2.x (Leopard and later).
Your to_upper() and to_lower() handlers are written with the foreknowledge of the ASCII codes for “a”, “z”, “A”, and “Z”, but then use two ‘ASCII number’ calls each to work out the difference between the upper and lower case numbers.
When coercing a list to string, AppleScript’s text item delimiters should be set to “” or the default {“”} beforehand as a precaution in case they’ve been changed elsewhere in the script or in the application running it.
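As a small illustration of that precaution, here's a sketch of a list-to-text coercion that saves, resets and restores the delimiters. The handler name listToText is made up for the example, and the deliberately "wrong" delimiter set before the call is only there to show that it has no effect on the result.

```applescript
-- Illustrative sketch only: the handler name is hypothetical.
on listToText(theList)
	-- Remember whatever delimiters are currently in force.
	set savedDelimiters to AppleScript's text item delimiters
	set AppleScript's text item delimiters to ""
	set theText to theList as text
	-- Put things back as they were found.
	set AppleScript's text item delimiters to savedDelimiters
	return theText
end listToText

-- Simulate delimiters having been changed elsewhere in the script.
set AppleScript's text item delimiters to ", "
set joined to listToText({"H", "i", "!"})
set AppleScript's text item delimiters to ""
joined --> "Hi!"
```

Without the reset inside the handler, the same coercion would have produced "H, i, !" here.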
An AppleScript 2.x version of your idea, again only considering standard ASCII characters, would be:
on to_upper(ASCIItext)
	set ASCIIcodes to ASCIItext's id as list -- List coercion in case there's only one character.
	repeat with i from 1 to (count ASCIItext)
		set thisCode to item i of ASCIIcodes
		if ((thisCode > 96) and (thisCode < 123)) then set item i of ASCIIcodes to thisCode - 32
	end repeat
	return character id ASCIIcodes
end to_upper

on to_lower(ASCIItext)
	set ASCIIcodes to ASCIItext's id as list -- List coercion in case there's only one character.
	repeat with i from 1 to (count ASCIItext)
		set thisCode to item i of ASCIIcodes
		if ((thisCode > 64) and (thisCode < 91)) then set item i of ASCIIcodes to thisCode + 32
	end repeat
	return character id ASCIIcodes
end to_lower

to_upper("Hello world!")
to_lower("HELLO WORLD!")