It works in Tiger, but not in Jaguar or (apparently) Panther.
In Tiger, "offset" now responds to "considering" and "ignoring" conditions. It's case-insensitive unless "considering case" is used, which means that this_char is always found in lower_alphabet if it's a letter and never if it's punctuation or white space. In Qwerty's script, the "else" section deals specifically with non-letters. It works, but only in Tiger. Kai's effort at the top of this thread works on both of my X systems, but I haven't checked out the tab thing yet. (Time for bed! :))
By the way, TIDs (text item delimiters) are also now subject to "considering" and "ignoring", but only when the main text is Unicode. With "strings", they still behave in the old way.
Thanks very much, Nigel. I'd noted your point previously about the behaviour of TIDs with Unicode text in Tiger - but hadn't yet assimilated the impact of changes affecting offset. Taking that script apart was driving me nuts! Just something else to watch out for - but that's progress I suppose.
And now that you've put my mind at rest, I can get some shuteye too. Thanks again, Mr. G. - and g'night, folks!
I've just realised (I think) what you were referring to, Qwerty - so let me try that again. I don't think there's a tabs issue with title case. However, having taken another look at the whole thing, I'd say there are definitely one or two general white space issues with sentence case. I don't really have time to address them right now, but I'll certainly take a look a little later (assuming that Nigel hasn't completely rewritten the script by then).
Gosh, what have I been missing out on! Sorry kai, I did mean "Sentence" case, not "Title", how stupid of me! Nigel, I'm using 10.3.9, so it's not only Tiger.
Does this work for you? (still slow):
(*
Evolved from Apple's 'Change Case of Item Names.scpt' - part of the Finder scripts.
You have the option of UPPER, lower, Title or Sentence cases.
Accepts non-alphabetic characters.
*)
property lower_alphabet : "abcdefghijklmnopqrstuvwxyz"
property upper_alphabet : "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
property white_space : {space, tab, return, ASCII character 10, ASCII character 13}
set this_string to "THIS STRING WILL COME BACK WITH THE FIRST LETTER OF EVERY WORD BEING A CAPITAL AND THE REST WILL BE LOWER CASE. Text containing punctuation: x-ray, don't."
get change_case(this_string, "Sentence")
on change_case(this_text, this_case)
if this_case is not in {"UPPER", "lower", "Title", "Sentence"} then
return "Error: Case must be UPPER, lower, Title or Sentence"
end if
set new_text to ""
if this_case is "lower" then
set use_capital to false
else
set use_capital to true
end if
repeat with i from 1 to count of this_text
set this_char to character i of this_text
ignoring case
set x to offset of this_char in lower_alphabet
end ignoring
if x is not 0 then
if use_capital then
set new_text to new_text & character x of upper_alphabet as string
if this_case is not "UPPER" then
set use_capital to false
end if
else
set new_text to new_text & character x of lower_alphabet as string
end if
else
if this_case is "Title" and this_char is in white_space then
set use_capital to true
else if this_case is "Sentence" and this_char is "." and ¬
i is not (count of this_text) and ¬
character (i + 1) of this_text is in white_space then
set use_capital to true
end if
set new_text to new_text & this_char as string
end if
end repeat
return new_text
end change_case
Here is kai's script rewritten. It should work the same, as far as I can see, except that the tab bug has been fixed.
I have tried to make it more readable, including logical variable names.
I don't really know where to put "considering case", because it works fine on my machine without it. (But you should be aware that putting it around your 'if c is not in {"upper", "lower", "title", "sentence"} then' line will only allow the case to be input in lowercase; an error will come up if you use 'changeCase of someText to "Lower"', for instance.)
There is a bug, though (in both this and the original). In 'This (part here)', when converted to title case, "part" will not get a capital letter.
property lower_alphabet : "abcdefghijklmnopqrstuvwxyz"
property upper_alphabet : "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
property white_space : {space, tab, return, ASCII character 10, ASCII character 13}
set this_string to "this TEXT will be RETURNED with CHARACTERS CAPITALISED as SPECIFIED. any OTHER CHARACTERS will be LOWER CASE. Text containing punctuation: x-ray, don't."
on change_case(this_text, this_case)
if this_case is not in {"UPPER", "lower", "Title", "Sentence"} then
error "Error: Case must be UPPER, lower, Title or Sentence"
else if this_case is "UPPER" then
set case_alphabet to {lower_alphabet, upper_alphabet}
else
set case_alphabet to {upper_alphabet, lower_alphabet}
end if
set old_delimiters to text item delimiters
repeat with i from 1 to 26
set text item delimiters to item i of (item 1 of case_alphabet as string)
set this_text to text items of this_text
set text item delimiters to item i of (item 2 of case_alphabet as string)
set this_text to text items of this_text as string
end repeat
if this_case is in {"Title", "Sentence"} then
if this_case is "Title" then
set this_space to space
else
set this_space to ". "
end if
set this_text to this_space & this_text
repeat with this_white in white_space
if this_case is "Sentence" then
set this_white to "." & this_white
end if
set text item delimiters to this_white
if (count of text items of this_text) > 1 then repeat with i from 1 to 26
set text item delimiters to this_white & item i of lower_alphabet
if (count of text items of this_text) > 1 then
set this_text to text items of this_text
set text item delimiters to this_white & item i of upper_alphabet
set this_text to text items of this_text as string
end if
end repeat
end repeat
set text item delimiters to ""
set this_text to text ((count this_space) + 1) thru -1 of this_text
end if
set text item delimiters to old_delimiters
return this_text
end change_case
change_case(this_string, "UPPER")
--> "THIS TEXT WILL BE RETURNED WITH CHARACTERS CAPITALISED AS SPECIFIED. ANY OTHER CHARACTERS WILL BE LOWER CASE. TEXT CONTAINING PUNCTUATION: X-RAY, DON'T."
change_case(this_string, "lower")
--> "this text will be returned with characters capitalised as specified. any other characters will be lower case. text containing punctuation: x-ray, don't."
change_case(this_string, "Title")
--> "This Text Will Be Returned With Characters Capitalised As Specified. Any Other Characters Will Be Lower Case. Text Containing Punctuation: X-ray, Don't."
change_case(this_string, "Sentence")
--> "This text will be returned with characters capitalised as specified. Any other characters will be lower case. Text containing punctuation: x-ray, don't."
Hi, Qwerty. Gosh! Nothing like a bit of confusion to start the day. If you and kai are both using Panther, I've no idea why your script works for you and not for him. However, what I said last night about the script working in Tiger but not in Jaguar still seems to be true.
I hadn't checked "Sentence" mode then. My results with that this morning are:
JAGUAR:
Both versions of your script capitalise the first letter of a sentence if it was lower case, but capitalise the second letter too if the first was already upper case. (And the third if the first two were already upper case, etc.)
TIGER:
Your first version capitalises the first letter of the first sentence, but actively lower-cases the first letter of subsequent sentences. The second version appears to work OK, but not, of course, if the previous sentence ends with a question mark, exclamation mark, or quote.
Putting the "offset" line in an "ignoring case" block has no effect. "Ignoring" is the default setting for case where "ignoring" and "considering" apply. In Jaguar (and Panther?), they don't apply: "offset" is exclusively case sensitive.
Here's a version of your approach that works properly in both Jaguar and Tiger. Obviously it'll need a longer alphabet if it's likely to encounter diacritical characters.
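For anyone wanting to check how "offset" behaves on their own system, a minimal test along these lines might help. It's only a sketch: the variable names strict_offset and loose_offset are made up for illustration. Going by the behaviour described above, Tiger should give 0 for the "considering case" result and 20 for the "ignoring case" one, while Jaguar would presumably give 0 for both.

```applescript
-- Sketch only: results depend on the OS version, as discussed in this thread.
set alphabet to "abcdefghijklmnopqrstuvwxyz"

considering case
	-- Case-sensitive search: "T" is not in the lower-case alphabet.
	set strict_offset to offset of "T" in alphabet
end considering

ignoring case
	-- Only case-insensitive where 'offset' honours this block.
	set loose_offset to offset of "T" in alphabet
end ignoring

{strict_offset, loose_offset}
```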
property alphabet : "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
property white_space : {space, tab, return, ASCII character 10, ASCII character 13}
property terminators : ".!?"
set this_string to "THIS STRING WILL COME BACK WITH THE FIRST LETTER OF EVERY WORD BEING A CAPITAL AND THE REST WILL BE LOWER CASE. Text containing punctuation: x-ray, don't." as Unicode text
get change_case(this_string, "title")
on change_case(this_text, this_case)
set new_text to {}
if this_case is not in {"UPPER", "lower", "Title", "Sentence"} then
return "Error: Case must be UPPER, lower, Title or Sentence"
end if
set is_upper_mode to (this_case is "UPPER")
set is_title_mode to (this_case is "Title")
set is_sentence_mode to (this_case is "Sentence")
set use_capital to (this_case is not "lower")
if (this_text's class is Unicode text) then
set alpha to alphabet as Unicode text
else if (this_text's class is string) then
set alpha to alphabet as string
else
display dialog "OH NO! WE'RE ALL GOING TO DIE!" buttons {"AAAGGHHH!"} default button 1 with icon caution
error number -128
end if
considering case -- for speed and to customise 'offset' in Tiger
repeat with this_char in this_text
set x to offset of this_char in alpha
if (x > 0) then
if (use_capital) then
set end of new_text to character ((x - 1) mod 26 + 27) of alpha
set use_capital to (is_upper_mode)
else
set end of new_text to character ((x - 1) mod 26 + 1) of alpha
end if
else
if (is_sentence_mode and this_char is in terminators) or (is_title_mode and this_char is in white_space) then
set use_capital to true
end if
set end of new_text to this_char's contents
end if
end repeat
end considering
set astid to AppleScript's text item delimiters
set AppleScript's text item delimiters to ""
tell new_text to set new_text to its beginning & its rest
set AppleScript's text item delimiters to astid
return new_text
end change_case
PS. This is nothing to do with your latest post, which I've only just seen. What do you mean by tab "bug"?
PPS. kai doesn't seem to take much sleep, does he?
The tab bug is where "gibberish gibberish" (gibberish«tab»gibberish), when converted to sentence case, is "Gibberish Gibberish" in kai's script. The second word should not be capitalized.
P.S. Yes, kai seems to be full time on this forum.
P.P.S. Good point about the different sentence terminators!?.!
It may be worth noting that, on long strings, Qwerty's rewrite will break due to a stack overflow error. OMM, it tripped over a string (based largely on those tested earlier) of 57,586 characters. (As some of you will know, the overflow threshold can vary considerably, depending on string content - and possibly on OS, too. In the most extreme cases, it might even be as low as 4000 - 5000 characters.) In addition, a few sentence case issues still remain (to which I may now have a solution, of sorts). These include white space at the beginning of a string and multiple white spaces following a full point.
However, I'm slightly concerned that this thread is kinda "growing like topsy" for a Code Exchange item. Since I feel somewhat responsible for much of the noise, now might be an appropriate time for me to bow out (to attend, anyway, to a rather pressing local issue) - and to leave the conclusion of this lively discussion to you good guys. I look forward to seeing the results with great interest.
:lol: That's just one of the burdens of a perpetual insomniac, Nigel. Can play havoc with one's social life, too!
Nigel, thanks for your rewrite of mine!
Unfortunately, it didn't work with periods that didn't have a space after them, like a decimal number. Hopefully, this one should. I have kept the alphabets split to make it easy to compare if you did want to add diacritical characters.
property lower_alphabet : "abcdefghijklmnopqrstuvwxyz"
property upper_alphabet : "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
property white_space : {space, tab, return, ASCII character 10, ASCII character 13}
property sentence_terminators : ".!?"
set this_string to "this TEXT will be RETURNED with CHARACTERS CAPITALISED as SPECIFIED. any OTHER CHARACTERS will be LOWER CASE. Text containing punctuation: x-ray, don't. This sentence contains a real number 5.3 with text following."
get change_case(this_string, "Sentence")
on change_case(this_text, this_case)
if this_case is not in {"UPPER", "lower", "Title", "Sentence"} then
error "Error: Case must be UPPER, lower, Title or Sentence"
end if
set new_text to {}
set use_capital to this_case is not "lower"
if class of this_text is Unicode text then
set case_alphabet to lower_alphabet & upper_alphabet as Unicode text
else if class of this_text is string then
set case_alphabet to lower_alphabet & upper_alphabet as string
else
display dialog "OH NO! WE'RE ALL GOING TO DIE!" buttons {"AAAGGHHH!"} default button 1 with icon caution
error number -128
end if
repeat with i from 1 to count of this_text
set this_char to character i of this_text
considering case -- for speed and to customise 'offset' in Tiger
set this_offset to offset of this_char in case_alphabet
end considering
if this_offset is not 0 then
if use_capital then
set end of new_text to character ((this_offset - 1) mod 26 + 27) of case_alphabet
set use_capital to this_case is "UPPER"
else
set end of new_text to character ((this_offset - 1) mod 26 + 1) of case_alphabet
end if
else
if (this_case is "Title" and this_char is in white_space) or (this_case is "Sentence" and this_char is in sentence_terminators and ¬
i is not (count of this_text) and character (i + 1) of this_text is in white_space) then
set use_capital to true
end if
set end of new_text to this_char
end if
end repeat
set astid to AppleScript's text item delimiters
set AppleScript's text item delimiters to ""
tell new_text to set new_text to its beginning & its rest
set AppleScript's text item delimiters to astid
return new_text
end change_case
Unfortunately kai, this script does handle spaces at the beginning and multiples after periods. If this script were written in, say, C or Objective-C, wouldn't it be as fast as yours?
Using my rewrite of kai's, I can use this code to generate the input string and it still does not crash for me (using his reply to me as the source text ;)). I don't know what the problem is.
set this_string to ""
repeat until (count of this_string) > 500000
set this_string to this_string & "It may be worth noting that, on long strings, Qwerty's rewrite will break due to a stack overflow error. OMM, it tripped over a string (based largely on those tested earlier) of 57,586 characters. (As some of you will know, the overflow threshold can vary considerably, depending on string content - and possibly on OS, too. In the most extreme cases, it might even be as low as 4000 - 5000 characters.) In addition, a few sentence case issues still remain (to which I may now have a solution, of sorts). These include white space at the beginning of a string and multiple white spaces following a full point.
However, I'm slightly concerned that this thread is kinda 'growing like topsy' for a Code Exchange item. Since I feel somewhat responsible for much of the noise, now might be an appropriate time for me to bow out (to attend, anyway, to a rather pressing local issue) - and to leave the conclusion of this lively discussion to you good guys. I look forward to seeing the results with great interest. "
end repeat
Hi, Qwerty. Well spotted about the decimal point bug in my rewrite!
I haven't had time this weekend to study your version of kai's script, but I can confirm that both it and your latest version of your own seem to work well in both Jaguar and Tiger.
I see that in your rewrite of my rewrite, you've abandoned the "pre-tested this_case" optimisation. It might not make any noticeable difference here - unless this_string is very long - but as a general rule, it's best not to test things inside a repeat when this can be done beforehand and the results aren't going to change during the repeat.
For instance, the test 'this_case is "Title"' compares the individual characters of this_case with those of "Title". We need a case-insensitive comparison here, which takes longer because allowances are made for the fact that individual characters might not be exactly the same. The comparison produces a "true" or "false" result, which becomes the parameter for the "if" statement.
Inside the repeat, this test process (and possibly the one for "Sentence"), or else the one for "UPPER", is performed with every character of this_string.
My version of the script tests each possible value of this_case just once, before the repeat, and simply feeds in the appropriate "trues" and "falses" during the repeat. This has an additional advantage in that it's easier to arrange for punctuation and white space to be tested in the faster "considering case" mode.
You could do a similar, once-only thing with "count of this_text".
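To illustrate the general idea in isolation, here's a stripped-down sketch. The handler name count_word_breaks and its variables are invented purely for this example; it isn't part of any of the scripts above. Both the mode test and the count are evaluated once, before the repeat, so that only a ready-made boolean and a simple character test are left inside it.

```applescript
-- Hypothetical handler, for illustration only.
on count_word_breaks(this_text, this_case)
	-- Tested once, before the repeat: neither result changes during the loop.
	set is_title_mode to (this_case is "Title")
	set text_length to (count this_text)
	set break_count to 0
	repeat with i from 1 to text_length
		-- Inside the repeat, only the stored boolean and one character are checked.
		if is_title_mode and (character i of this_text is space) then
			set break_count to break_count + 1
		end if
	end repeat
	return break_count
end count_word_breaks

count_word_breaks("one two three", "Title")
--> 2
```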
Apologies if you were already familiar with these concepts.
No, I'm not really familiar with this. It seems so logical! Anyway, my rewrite of kai's cannot handle Unicode text, and so is inferior. We also need to watch out for things like "I": in sentence case it is not capitalized.
True, the thread is growing like crazy, but think of the really valuable education embedded in this thread for a new scripter. It really does flesh out all the issues involved in what, at first anyway, seems like a straightforward process. Perhaps when the experts agree that they've reached a "golden master" version, then that one would be flagged as "THE" version and the rest would be preserved as discussion.
Apologies for the delayed response, NovaScotian. I believe I wrote the comments that prompted your reply - but I take your point about the possible usefulness of such discussions. (Indeed, I've found the discussion of differences in script behaviour between OS versions quite illuminating.)
I've therefore added a further version (below) that sacrifices some speed & brevity to address several of the issues discussed previously. In exploring these, it's clear that a comprehensive solution might be possible only with access to a substantial dictionary (proper nouns, acronyms, etc.) - which is perhaps beyond the scope of a relatively simple script such as this. (The script introduces various properties, partly to accelerate runtime execution, but also to accommodate any further adjustments that might be considered necessary.)
I might be able to clear up the confusion there, Nigel. At the time, I was working in Jaguar (not Panther) - which would explain the differences in behaviour. (However, for the record, the script below was written and tested in Tiger).
The problem to which I was referring is a stack overflow error (errOSAStackOverflow: -2706), rather than a crash. In older versions of the Mac OS (including some versions of Mac OS X), the error can occur when the resulting number of string elements (text items, characters [items], words or paragraphs) exceeds about 4,060 (the precise figure can vary).
So, apart from getting a list of characters, the problem has very little to do with actual string length - since it depends primarily on the number of resulting string elements.
While such considerations may be of little concern to those using later versions of the Mac OS, they may still be worth noting for anyone interested in portability (for example, if a script is to be distributed generally - such as in a forum like this).
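As a rough illustration of how the element-count limit might be side-stepped, the sketch below fetches text items in ranges rather than all at once. This is not the handler referred to in this thread: chunked_text_items is a hypothetical name, the chunk size of 4000 is simply assumed to sit below the threshold mentioned above, and it also assumes that counting text items doesn't itself trigger the error on the system in question.

```applescript
-- Hypothetical helper, for illustration only.
on chunked_text_items(this_text, this_delim)
	set astid to AppleScript's text item delimiters
	set AppleScript's text item delimiters to this_delim
	set item_count to (count text items of this_text)
	set item_list to {}
	set chunk_size to 4000 -- assumed safe value; the real threshold varies
	repeat with i from 1 to item_count by chunk_size
		set j to i + chunk_size - 1
		if j > item_count then set j to item_count
		-- Each request returns no more than chunk_size elements.
		set item_list to item_list & (text items i thru j of this_text)
	end repeat
	set AppleScript's text item delimiters to astid
	return item_list
end chunked_text_items

chunked_text_items("one two three", space)
```

Concatenating lists like this gets slower as the list grows, so it's a trade of speed for robustness rather than a free fix.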
In later versions, the limit appears to have been removed. However, the algorithm to achieve this seems somewhat buggy - and, where there may be several thousand string elements involved, the dreaded, ever-spinning beach ball can appear. (The point at which this occurs appears to vary quite considerably, so it's difficult to pin it down with any precision. OMM, it's very likely to occur above 300,000 or 400,000 items - but has sometimes struck at around 60,000 items.)
Silent hanging aside, there also appear to be efficiency issues when getting particularly long lists of string elements - which may therefore take a disproportionate time to evaluate.
I've now modified my original "textItems" handler (see script below) in an effort to side-step all three issues. So far, it seems to have worked quite effectively - as demonstrated by the following results (all the usual caveats for interpreting execution times apply):
The following version introduces an additional option for case type: "mixed" - a variation of title-case that renders definite and indefinite articles, conjunctions and prepositions as lowercase (except where they start a sentence):
Thanks for your latest contribution to this thread. The script's pretty remarkable as its "sentence" mode even capitalises correctly after brackets and quotes, which don't (at first sight) seem to have been explicitly catered for! Maybe I'll see how it works when I've had more time to study it in detail.
After you signed off last time, I worked out a script - less thorough than yours - that explicitly treated quotes and brackets as "whitish" space, but I didn't post it because I began to feel that "sentence" mode itself was a mistake - at least in the context of discussing techniques on this forum.
"Lower", "upper", and "title" modes are easy to implement and have already been adequately covered. Philosophically, they do explicit and grammatically irrelevant things to the text.
The only real use for "sentence" mode, though, is as a tidier-upper of bad typing. As you've already noted, it's far more complex to implement and involves context. The script needs to be versed in the proper nouns and acronyms of the language of the text. It also needs a thorough knowledge of that language's other grammatical features, many of which can be very difficult to handle. For instance, quoted speech in English:
Or mixed contexts:
Unless "sentence" mode is given some specific, narrowly-defined purpose, it might be best to leave it to a fully-fledged application or to a "has"-sized library.
By the way, there's a potential problem with your "uppercase" and "lowercase" variables. These words are commands in the Satimage OSAX. I wouldn't care to tell you what they do...
As you've no doubt seen by now, Nigel, it simply capitalises the first character of the first word following a "sentenceBreak" - and so effectively ignores any intervening characters. AppleScript's magic - not mine, I'm afraid.
Agreed. I'd been veering towards this conclusion for a while, but my last attempt really clinched it for me. (I'm sure it's no coincidence that AppleScript itself readily defines words and paragraphs - but steers well clear of the muddy water of sentences!)
I'd be happier to leave things as a short, quick fix - rather than attempting to go down the tortuous path of refining any further. Nevertheless, the discussion's been a very interesting one.
Good point. Thanks for the reminder - I had a feeling they looked familiar! I've since edited the script in situ to insert underscores in the variable labels, just for safety's sake.
Nothing quite like reinventing the wheel to while away a few spare moments, right? :lol:
The above conversations are very interesting to me as a very novice scripter - I have learned a lot. I am looking for a way to carry out text modifications of the kind described above (uppercase, lowercase, title, etc.) on a list rather than on a string. I think there must be a simple modification or addition to some of the scripts you've shared, but the solution has eluded me so far. Any help would be appreciated. Thanks!