Apologies for the delayed response, NovaScotian. I believe I wrote the comments that prompted your reply - but I take your point about the possible usefulness of such discussions. (Indeed, I’ve found the discussion of differences in script behaviour between OS versions quite illuminating.)
I’ve therefore added a further version (below) that sacrifices some speed & brevity to address several of the issues discussed previously. In exploring these, it’s clear that a comprehensive solution might be possible only with access to a substantial dictionary (proper nouns, acronyms, etc.) - which is perhaps beyond the scope of a relatively simple script such as this. (The script introduces various properties, partly to accelerate runtime execution, but also to accommodate any further adjustments that might be considered necessary.)
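As a rough illustration of the speed point, here is a minimal sketch of the property-plus-`my` idiom that the script relies on for its list loops. The property name `numberList` and the handler `totalOf` are hypothetical and appear nowhere in the script itself; the only assumption is the usual one that reaching list items through a reference (`my ...`) is quicker on long lists than working on the list value directly.

```applescript
property numberList : missing value -- hypothetical property, for illustration only

on totalOf(someList)
	set numberList to someList
	set total to 0
	-- "my numberList" reaches the list by reference rather than by value
	repeat with n from 1 to count my numberList
		set total to total + (my numberList's item n)
	end repeat
	set numberList to missing value -- don't keep the list stored in the script
	return total
end totalOf

totalOf({1, 2, 3, 4, 5}) --> 15
```

On a five-item list the difference is of course invisible; it should only start to matter with lists of several thousand items.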
I might be able to clear up the confusion there, Nigel. At the time, I was working in Jaguar (not Panther) - which would explain the differences in behaviour. (However, for the record, the script below was written and tested in Tiger.)
The problem to which I was referring is a stack overflow error (errOSAStackOverflow: -2706), rather than a crash. In older versions of the Mac OS (including some versions of Mac OS X), the error can occur when the resulting number of string elements (text items, characters [items], words or paragraphs) exceeds about 4,060 (the precise figure can vary).
So, apart from getting a list of characters, the problem has very little to do with actual string length - since it depends primarily on the number of resulting string elements.
While such considerations may be of little concern to those using later versions of the Mac OS, they may still be worth noting for anyone interested in portability (for example, if a script is to be distributed generally - such as in a forum like this).
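For anyone who wants to check how a given system behaves, something along these lines might do as a test. It is only a sketch: the sample text and variable names are made up for the purpose, and the 8192 items are simply a figure comfortably above the approximate limit mentioned earlier. Where the limit applies, the error branch should be taken; elsewhere the list should come back normally.

```applescript
-- Hypothetical test text: short items, but a great many of them
set sampleText to "a"
repeat 13 times -- the item count doubles on each pass, ending at 8192
	set sampleText to sampleText & "," & sampleText
end repeat

set savedDelims to AppleScript's text item delimiters
set AppleScript's text item delimiters to ","
try
	-- Getting the list (not merely counting the items) is what may overflow
	set itemList to text items of sampleText
	set outcome to "got " & (count itemList) & " text items"
on error errMsg number errNum
	if errNum is -2706 then -- errOSAStackOverflow
		set outcome to "stack overflow: too many text items to get at once"
	else
		set outcome to "other error: " & errMsg
	end if
end try
set AppleScript's text item delimiters to savedDelims
outcome
```

Note that the text here is only about 16,000 characters long, which is the point: it's the number of elements, not the length of the string, that matters.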
In later versions, the limit appears to have been removed. However, the algorithm to achieve this seems somewhat buggy - and, where there may be several thousand string elements involved, the dreaded, ever-spinning beach ball can appear. (The point at which this occurs appears to vary quite considerably, so it’s difficult to pin it down with any precision. On my machine, it’s very likely to occur above 300,000 or 400,000 items - but has sometimes struck at around 60,000 items.)
Silent hanging aside, there also appear to be efficiency issues when getting particularly long lists of string elements - which may therefore take a disproportionate time to evaluate.
I’ve now modified my original 'textItems' handler (see script below) in an effort to side-step all three issues. So far, it seems to have worked quite effectively - as demonstrated by the following results (all the usual caveats for interpreting execution times apply):
                [u]execution time (secs)[/u]
number of       with        without
text items      handler     handler
1000            0.01        0.01
10000           0.2         0.5
20000           0.4         3.8
30000           0.7         7.8
40000           1.1         11.7
50000           1.4         25.1
60000           1.9         40.8
70000           2.0         42.7
80000           2.7         68.5
90000           3.6         127.8
100000          4.7         138.2
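The approach behind the handler is to avoid asking for a very long list in one go: if the text contains more than a set number of text items, it is halved at a text item boundary and each half is dealt with separately. A spelled-out sketch of that idea follows. The handler name `splitInChunks` and its `chunkLimit` parameter are hypothetical (the handler in the script is far more compact and uses a fixed threshold), and the caller is assumed to have set the text item delimiters beforehand.

```applescript
-- Hypothetical, spelled-out variant of the divide-and-rejoin idea.
-- Assumes AppleScript's text item delimiters are already set by the caller.
on splitInChunks(sourceText, chunkLimit)
	set itemCount to count text items of sourceText
	-- Few enough items: get them all at once
	if itemCount ≤ chunkLimit then return text items of sourceText
	-- Otherwise halve the text at a text item boundary and recurse on each half
	set midPoint to itemCount div 2
	set firstHalf to text 1 thru text item midPoint of sourceText
	set secondHalf to text from text item (midPoint + 1) to -1 of sourceText
	return splitInChunks(firstHalf, chunkLimit) & splitInChunks(secondHalf, chunkLimit)
end splitInChunks

set AppleScript's text item delimiters to ","
set parts to splitInChunks("a,b,c,d,e,f,g,h", 3)
set AppleScript's text item delimiters to ""
parts --> {"a", "b", "c", "d", "e", "f", "g", "h"}
```

The limit of 3 is there only to force some recursion on a short example; a practical value would be in the low thousands.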
The following version introduces an additional option for case type: “mixed” - a variation of title-case that renders definite and indefinite articles, conjunctions and prepositions as lowercase (except where they start a sentence):
-- syntax : changeCase of someText to caseType
-- someText (string) : plain or encoded text
-- caseType (string) : the type of case required ("upper", "lower", "sentence", "title" or "mixed")
-- "upper" : all uppercase text (no exceptions)
-- "lower" : all lowercase text (no exceptions)
-- "sentence" : uppercase character at start of each sentence, other characters lowercase (apart from words in sentenceModList)
-- "title" : uppercase character at start of each word, other characters lowercase (no exceptions)
-- "mixed" : similar to title, except for definite and indefinite articles, conjunctions and prepositions (see mixedModList) that don't start a sentence
property lowerStr : "abcdefghijklmnopqrstuvwxyzáàâäãåæçéèêëíìîïñóòôöõøœúùûüÿ"
property upperStr : "ABCDEFGHIJKLMNOPQRSTUVWXYZÁÀÂÄÃÅÆÇÉÈÊËÍÌÎÏÑÓÒÔÖÕØŒÚÙÛÜŸ"
property alphaList : lowerStr's characters & reverse of upperStr's characters
property sentenceBreak : {".", "!", "?"}
property wordBreak : {space, ASCII character 202, tab}
property everyBreak : wordBreak & sentenceBreak
property whiteSpace : wordBreak & {return, ASCII character 10}
property currList : missing value
property sentenceModList : {"i", "i'm", "i'm", "i've", "i've", "I've", "I've", "I'm", "I'm", "I"} (* could be extended to include certain proper nouns, acronyms, etc. *)
property mixedModList : {"By Means Of", "In Front Of", "In Order That", "On Account Of", "Whether Or Not", "According To", "As To", "Aside From", "Because Of", "Even If", "Even Though", "In Case", "Inside Of", "Now That", "Only If", "Out Of", "Owing To", "Prior To", "Subsequent To", "A", "About", "Above", "Across", "After", "Against", "Along", "Although", "Among", "An", "And", "Around", "As", "At", "Because", "Before", "Behind", "Below", "Beneath", "Beside", "Between", "Beyond", "But", "By", "De", "Down", "During", "Except", "For", "From", "If", "In", "Inside", "Into", "Like", "Near", "Of", "Off", "On", "Onto", "Or", "Out", "Outside", "Over", "Past", "Since", "So", "The", "Though", "Through", "Throughout", "To", "Under", "Unless", "Until", "Up", "Upon", "When", "Whereas", "While", "With", "Within", "Without", "Ye", "ye", "without", "within", "with", "while", "whereas", "when", "upon", "up", "until", "unless", "under", "to", "throughout", "through", "though", "the", "so", "since", "past", "over", "outside", "out", "or", "onto", "on", "off", "of", "near", "like", "into", "inside", "in", "if", "from", "for", "except", "during", "down", "de", "by", "but", "beyond", "between", "beside", "beneath", "below", "behind", "before", "because", "at", "as", "around", "and", "an", "among", "although", "along", "against", "after", "across", "above", "about", "a", "subsequent to", "prior to", "owing to", "out of", "only if", "now that", "inside of", "in case", "even though", "even if", "because of", "aside from", "as to", "according to", "whether or not", "on account of", "in order that", "in front of", "by means of"}
on textItems from currTxt
	tell (count currTxt's text items) to if it > 4000 then tell it div 2 to return ¬
		my (textItems from (currTxt's text 1 thru text item it)) & ¬
		my (textItems from (currTxt's text from text item (it + 1) to -1))
	currTxt's text items
end textItems
on initialCap(currTxt)
	tell currTxt to if (count words) > 0 then tell word 1's character 1 to if it is in lowerStr then
		set AppleScript's text item delimiters to it
		tell my (textItems from currTxt) to return beginning & upperStr's character ((count lowerStr's text item 1) + 1) & rest
	end if
	currTxt
end initialCap
to capItems from currTxt against breakList
	repeat with currBreak in breakList
		set text item delimiters to currBreak
		if (count currTxt's text items) > 1 then
			set currList to my (textItems from currTxt)
			repeat with n from 2 to count currList
				set my currList's item n to initialCap(my currList's item n)
			end repeat
			set text item delimiters to currBreak's contents
			tell my currList to set currTxt to beginning & ({""} & rest)
		end if
	end repeat
	currTxt
end capItems
on modItems from currTxt against modList
	set currList to modList
	set currCount to (count modList) div 2
	repeat with currBreak in everyBreak
		set text item delimiters to currBreak
		if (count currTxt's text items) > 1 then repeat with n from 1 to currCount
			set text item delimiters to my currList's item n & currBreak
			if (count currTxt's text items) > 1 then
				set currTxt to textItems from currTxt
				set text item delimiters to my currList's item -n & currBreak
				tell currTxt to set currTxt to beginning & ({""} & rest)
			end if
		end repeat
	end repeat
	currTxt
end modItems
to changeCase of currTxt to caseType
	if (count currTxt's words) is 0 then return currTxt
	ignoring case
		tell caseType to set {upper_Case, lower_Case, sentence_Case, title_Case, mixed_Case} to {it is "upper", it is "lower", it is "sentence", it is "title", it is "mixed"}
	end ignoring
	if not (upper_Case or lower_Case or title_Case or sentence_Case or mixed_Case) then
		error "The term \"" & caseType & "\" is not a valid case type option. Please use \"upper\", \"lower\", \"sentence\", \"title\" or \"mixed\"."
	else if upper_Case then
		set n to 1
	else
		set n to -1
	end if
	considering case
		set tid to text item delimiters
		repeat with n from n to n * (count lowerStr) by n
			set text item delimiters to my alphaList's item n
			set currTxt to textItems from currTxt
			set text item delimiters to my alphaList's item -n
			tell currTxt to set currTxt to beginning & ({""} & rest)
		end repeat
		if sentence_Case then
			set currTxt to initialCap(modItems from (capItems from currTxt against sentenceBreak) against sentenceModList)
		else if title_Case or mixed_Case then
			set currTxt to initialCap(capItems from currTxt against whiteSpace)
			if mixed_Case then set currTxt to initialCap(capItems from (modItems from currTxt against mixedModList) against sentenceBreak)
		end if
		set text item delimiters to tid
	end considering
	currTxt
end changeCase
set someText to "How far you go in life depends on your being TENDER with the YOUNG, COMPASSIONATE with the AGED, SYMPATHETIC with the STRIVING and TOLERANT of the WEAK and STRONG. Because SOMEDAY in your life you will have been ALL of these." (* George Washington Carver. *)
changeCase of someText to "upper" (* "upper", "lower", "sentence", "title" or "mixed" *)
Script subsequently edited to insert underscore characters in certain variable labels (see discussion below)
Results:
Apologies for the length of all this…