The discussion at http://bbs.applescript.net/viewtopic.php?id=12758, about how to capitalize a string, demonstrated a few different approaches - to which I would have preferred to add this contribution. However, since that’s apparently no longer possible, here it is under a new subject…
While the following vanilla method isn’t exactly short, it’s pretty fast, should preserve non-alphabetic characters (together with any text encoding), is optimised to handle longer strings - and offers several case options: “upper”, “lower”, “title” & “sentence”.
property alphaList : "abcdefghijklmnopqrstuvwxyz"'s items & reverse of "ABCDEFGHIJKLMNOPQRSTUVWXYZ"'s items

on textItems from t
	try
		t's text items
	on error number -2706
		tell (count t's text items) div 2 to ¬
			my (textItems from (t's text 1 thru text item it)) & ¬
			my (textItems from (t's text from text item (it + 1) to -1))
	end try
end textItems

to changeCase of t to c
	if (count t) is 0 then return t
	considering case
		if c is not in {"upper", "lower", "title", "sentence"} then
			error "The word \"" & c & "\" is not a valid option. Please use \"upper\", \"lower\", \"title\" or \"sentence\"."
		else if c is "upper" then
			set n to 1
		else
			set n to -1
		end if
		set d to text item delimiters
		repeat with n from n to n * 26 by n
			set text item delimiters to my alphaList's item n
			set t to textItems from t
			set text item delimiters to my alphaList's item -n
			tell t to set t to beginning & ({""} & rest)
		end repeat
		if c is in {"title", "sentence"} then
			if c is "title" then
				set s to space
			else
				set s to ". "
			end if
			set t to (t's item 1 & s & t)'s text 2 thru -1
			repeat with i in {s, tab, return, ASCII character 10}
				set text item delimiters to i
				if (count t's text items) > 1 then repeat with n from 1 to 26
					set text item delimiters to i & my alphaList's item n
					if (count t's text items) > 1 then
						set t to textItems from t
						set text item delimiters to i & my alphaList's item -n
						tell t to set t to beginning & ({""} & rest)
					end if
				end repeat
			end repeat
			set t to t's text ((count s) + 1) thru -1
		end if
		set text item delimiters to d
	end considering
	t
end changeCase
set someText to "this TEXT will be RETURNED with CHARACTERS CAPITALISED as SPECIFIED. any OTHER CHARACTERS will be LOWER CASE."
changeCase of someText to "upper"
--> "THIS TEXT WILL BE RETURNED WITH CHARACTERS CAPITALISED AS SPECIFIED. ANY OTHER CHARACTERS WILL BE LOWER CASE."
changeCase of someText to "lower"
--> "this text will be returned with characters capitalised as specified. any other characters will be lower case."
changeCase of someText to "title"
--> "This Text Will Be Returned With Characters Capitalised As Specified. Any Other Characters Will Be Lower Case."
changeCase of someText to "sentence"
--> "This text will be returned with characters capitalised as specified. Any other characters will be lower case."
property lower_alphabet : "abcdefghijklmnopqrstuvwxyz"
property upper_alphabet : "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
property white_space : {space, tab, return, ASCII character 10, ASCII character 13}

set this_string to "THIS STRING WILL COME BACK WITH THE FIRST LETTER OF EVERY WORD BEING A CAPITAL AND THE REST WILL BE LOWER CASE. Text containing punctuation: x-ray, don't."
get change_case(this_string, "Title")

on change_case(this_text, this_case)
	set new_text to ""
	if this_case is not in {"UPPER", "lower", "Title", "Sentence"} then
		return "Error: Case must be UPPER, lower, Title or Sentence"
	end if
	if this_case is "lower" then
		set use_capital to false
	else
		set use_capital to true
	end if
	repeat with this_char in this_text
		set x to offset of this_char in lower_alphabet
		if x is not 0 then
			if use_capital then
				set new_text to new_text & character x of upper_alphabet as string
				if this_case is not "UPPER" then
					set use_capital to false
				end if
			else
				set new_text to new_text & character x of lower_alphabet as string
			end if
		else
			if this_case is "Title" and this_char is in white_space then
				set use_capital to true
			end if
			set new_text to new_text & this_char as string
		end if
	end repeat
	return new_text
end change_case
BTW, thank you Ray for re-enabling replies in Code Exchange!
Actually, it was partly your original script that prompted me to write mine, Qwerty.
When I tried your version, only uppercase conversions seemed to work effectively - so I thought I might have a crack at the problem myself. I see that you’ve since modified your script, although I’m afraid it still appears to behave in a similar way to your original (on my machine, at any rate).
It’s also worth bearing in mind that, while the old ‘loop through and check each character’ technique is fine for shorter strings, it can become a bit… ponderous when trying to handle longer gobs of text. In such situations, it might be worth considering an alternative, such as text item delimiters.
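As a rough illustration of the text item delimiters idea (a minimal sketch, not taken from either script in this thread; the handler name replaceText and its parameter names are made up for the example), a single find-and-replace pass might look something like this:

```applescript
-- Hypothetical helper (not from the thread): replace every occurrence of
-- searchText in sourceText with replacementText, using text item delimiters.
on replaceText(sourceText, searchText, replacementText)
	set savedDelimiters to AppleScript's text item delimiters
	considering case -- keep the split case-sensitive where delimiters obey 'considering'
		set AppleScript's text item delimiters to searchText
		set pieces to text items of sourceText -- split at each occurrence
	end considering
	set AppleScript's text item delimiters to replacementText
	set resultText to pieces as text -- rejoin with the replacement between the pieces
	set AppleScript's text item delimiters to savedDelimiters -- always restore
	return resultText
end replaceText

replaceText("banana", "a", "A")
--> "bAnAnA"
```

A case conversion built this way repeats the pass once per letter, so the number of AppleScript statements executed depends on the size of the alphabet (26 passes) rather than on the length of the string.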
To compare the performance of both ‘loop’ and ‘tid’ methods, I timed them on various string lengths, from 50 to 5,000 characters - on a machine that is neither particularly fast nor slow. (To put string length into perspective, your earlier test string contained 154 characters.) In each case, the mix between upper and lowercase was 50-50, and only conversions to uppercase were compared. Obviously, performance will vary from one machine to another, but the figures should offer some general indications:
string      times (ms)
length      loop     tid
    50        25       9
   100        53      10
   500       283      11
  1000       553      15
  5000      3250      40
To look at it another way, the loop-based handler could convert about 1,700 characters a second, while the tid-based handler could manage up to 100,000 characters in the same time. Of course, none of this is really that critical with the string examples we’ve been using here. But with longer strings, the differences might be worth considering…
UPPER, lower, Title works for me (in my script). Sentence really doesn’t work, how obvious!
Yes, yours is much faster. I find your script quite difficult to understand in some spots, especially towards the end. Is there any chance you could make a commented version (with English variable names)?
BTW kai, should “gibberish gibberish” (gibberish«tab»gibberish), when converted to title case, be “Gibberish Gibberish”, with the second word having a capital letter? Is that correct? (The tab could be any whitespace item). I am sure you probably did this on purpose, but why?
Hi, kai. On a point of pedantry, since t is definitely a string or Unicode text at this point, not a list, its ‘item 1’ is properly its ‘character 1’. Both work, but since there’s no speed advantage either way, my vote’s for clarity.
Ditto. And we forgive the off-topic stickies in the OS X forum.
I was aware of t’s class, thanks, Nigel - but take your point. Had it been a list, I’d have probably gone for ‘beginning’, rather than ‘item 1’ (which I suppose might be considered yet another form of distinction). I’m afraid that my vote sometimes goes with what’s quicker to type - but apologies if it causes any confusion.
To be honest, I’m having difficulty understanding how “lower” and “Title” can possibly work with uppercase input text - because I just can’t see a mechanism for switching to lowercase when required. Here’s my take on what happens…
At the start of your repeat loop, you have the following statements:
Since anything immediately below this relates to converting a lowercase character (and bearing in mind that we’re tracking an uppercase character), we can skip straight to the corresponding ‘else’ statement further below:
Within the else section, we start with another if/then block, which may (or may not) change the value of the variable ‘use_capital’. However, this can affect only what happens in the repeat loop’s subsequent iteration (influencing how the next character is treated - and not the current one).
The next statement, ‘set new_text to’, etc., which does not refer or react to the value of ‘use_capital’ (or indeed anything else), will evidently be executed regardless. This surely means that the value of ‘this_char’ (an uppercase character) is added to the end of ‘new_text’ - whether ‘this_case’ is “lower”, “Title”, “Sentence” - or whatever.
That’s the theory - but what of the practice? When I run your script, the returned values I get are, specifically:
Results:
These results are exactly what I’d have expected, given my understanding of your code - which is why I’m so puzzled as to why it should apparently work for you…
Can anyone please tell me what I’m missing in my analysis of Qwerty’s script - and why his script should work for him (at least in “lower” and “Title” modes) - and not for me?
In fact, the only way I can get this particular approach to work - is to do something like this:
property lower_alphabet : "abcdefghijklmnopqrstuvwxyz"
property upper_alphabet : "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
property white_space : {space, tab, return, ASCII character 10, ASCII character 13}

on change_case(this_text, this_case)
	if this_case is not in {"UPPER", "lower", "Title", "Sentence"} then return "Error: Case must be UPPER, lower, Title or Sentence" -- can't continue
	set new_text to "" -- initialise the new string
	set use_capital to this_case is not "lower" -- if this_case is "lower", set use_capital to false (otherwise true)
	repeat with this_char in this_text
		if use_capital then
			set currentOffset to offset of this_char in lower_alphabet -- get alphabetical index of lowercase character
			if currentOffset is 0 then -- this_char is not lowercase anyway, so:
				set new_text to new_text & this_char -- don't change it
			else -- this_char is lowercase, so:
				set new_text to new_text & upper_alphabet's item currentOffset -- change it to uppercase
			end if
			if this_case is in {"Title", "Sentence"} then set use_capital to false -- next character should not use_capital
		else -- don't use_capital
			set currentOffset to offset of this_char in upper_alphabet -- get alphabetical index of uppercase character
			if currentOffset is 0 then -- this_char is not uppercase anyway, so:
				set new_text to new_text & this_char -- don't change it
			else -- this_char is uppercase, so:
				set new_text to new_text & lower_alphabet's item currentOffset -- change it to lowercase
			end if
			if this_char is in white_space and this_case is not "lower" then -- should next character use_capital?
				ignoring white space -- in case new_text already ends with white space
					if this_case is "Title" or new_text ends with "." and this_case is "Sentence" then set use_capital to true -- next character should use_capital
				end ignoring
			end if
		end if
	end repeat
	return new_text
end change_case
Results:
Didn’t your modified script aim to treat characters following white space in a similar way, Qwerty?
Title case is generally assumed to mean that each word’s first character is uppercase, and that any remaining letters are lowercase. An alternative interpretation treats words in a broadly similar manner, apart from definite and indefinite articles (e.g. ‘the’ and ‘a’), conjunctions (e.g. ‘and’) and prepositions (e.g. ‘in’, ‘of’). These exceptions can consist of entirely lowercase characters.
For simplicity, I obviously based my script on the former definition. However, I don’t really understand why, when considering title case, you question capitalising in this way. If you don’t capitalise after white space, then surely most words would be lower case?
(I’m aware of some potential issues with the algorithm that I used, but I don’t think they include the more common forms of white space.)
It works in Tiger, but not in Jaguar or (apparently) Panther.
In Tiger, ‘offset’ now responds to ‘considering’ and ‘ignoring’ conditions. It’s case-insensitive unless ‘considering case’ is used, which means that this_char is always found in lower_alphabet if it’s a letter and never if it’s punctuation or white space. In Qwerty’s script, the ‘else’ section deals specifically with non-letters. It works, but only in Tiger. Kai’s effort at the top of this thread works on both of my X systems, but I haven’t checked out the tab thing yet. (Time for bed! :))
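The difference described above can be checked with a small test along these lines (a sketch only; the variable names are made up for the example, and the result shown assumes the Tiger-or-later behaviour just described):

```applescript
set lowerAlphabet to "abcdefghijklmnopqrstuvwxyz"

-- Default conditions: case is ignored, so an uppercase letter
-- is still "found" in a lowercase alphabet.
set looseOffset to offset of "Q" in lowerAlphabet

considering case
	-- Case-sensitive search: the uppercase letter is not found.
	set strictOffset to offset of "Q" in lowerAlphabet
end considering

{looseOffset, strictOffset}
--> {17, 0} where 'offset' obeys 'considering'/'ignoring' (Tiger or later, going by the description above)
```

On a system where ‘offset’ is always case-sensitive, both values would presumably be 0, which matches the behaviour reported for Jaguar.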
By the way, TIDs are also now subject to ‘considering’ and ‘ignoring’, but only when the main text is Unicode. With ‘strings’, they still behave in the old way.
Thanks very much, Nigel. I’d noted your point previously about the behaviour of tids with Unicode text in Tiger - but hadn’t yet assimilated the impact of changes affecting offset. Taking that script apart was driving me nuts! Just something else to watch out for - but that’s progress I suppose.
And now that you’ve put my mind at rest, I can get some shuteye too. Thanks again, Mr. G. - and g’night, folks!
I’ve just realised (I think) to what you were referring, Qwerty - so let me try that again. I don’t think there’s a tabs issue with title case. However, having taken another look at the whole thing, I’d say there are definitely one or two general white space issues with sentence case. I don’t really have time to address them right now, but I’ll certainly take a look a little later (assuming that Nigel hasn’t completely rewritten the script by then).
Gosh, what have I been missing out on! Sorry kai, I did mean “Sentence” case, not “Title”, how stupid of me! Nigel, I’m using 10.3.9, so it’s not only Tiger.
Does this work for you? (still slow):
(*
Evolved from Apple's 'Change Case of Item Names.scpt' - part of the Finder scripts.
You have the option of UPPER, lower, Title or Sentence cases.
Accepts non-alphabetic characters.
*)
property lower_alphabet : "abcdefghijklmnopqrstuvwxyz"
property upper_alphabet : "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
property white_space : {space, tab, return, ASCII character 10, ASCII character 13}

set this_string to "THIS STRING WILL COME BACK WITH THE FIRST LETTER OF EVERY WORD BEING A CAPITAL AND THE REST WILL BE LOWER CASE. Text containing punctuation: x-ray, don't."
get change_case(this_string, "Sentence")

on change_case(this_text, this_case)
	if this_case is not in {"UPPER", "lower", "Title", "Sentence"} then
		return "Error: Case must be UPPER, lower, Title or Sentence"
	end if
	set new_text to ""
	if this_case is "lower" then
		set use_capital to false
	else
		set use_capital to true
	end if
	repeat with i from 1 to count of this_text
		set this_char to character i of this_text
		ignoring case
			set x to offset of this_char in lower_alphabet
		end ignoring
		if x is not 0 then
			if use_capital then
				set new_text to new_text & character x of upper_alphabet as string
				if this_case is not "UPPER" then
					set use_capital to false
				end if
			else
				set new_text to new_text & character x of lower_alphabet as string
			end if
		else
			if this_case is "Title" and this_char is in white_space then
				set use_capital to true
			else if this_case is "Sentence" and this_char is "." and ¬
				i is not (count of this_text) and ¬
				character (i + 1) of this_text is in white_space then
				set use_capital to true
			end if
			set new_text to new_text & this_char as string
		end if
	end repeat
	return new_text
end change_case
Here is kai’s script rewritten. It should work the same, well as much as I can see, except the tab bug has been fixed.
I have tried to make it more readable, including logical variable names.
I don’t really know where to put ‘considering case’, because it works fine on my machine without it. (But be aware that putting it around your ‘if c is not in {"upper", "lower", "title", "sentence"} then’ line means the case option will only be accepted in lowercase: ‘changeCase of someText to "Lower"’, for instance, will raise an error.)
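The effect of ‘considering case’ on that validity check can be seen in isolation with something like the following (a minimal sketch; the variable names are made up for the example):

```applescript
set validCases to {"upper", "lower", "title", "sentence"}

-- Default conditions: string comparison ignores case,
-- so "Lower" matches the list item "lower".
set looseMatch to "Lower" is in validCases

considering case
	-- Case-sensitive comparison: "Lower" no longer matches "lower".
	set strictMatch to "Lower" is in validCases
end considering

{looseMatch, strictMatch}
--> {true, false}
```

So a check placed inside the ‘considering case’ block rejects "Lower", while the same check placed outside it accepts any capitalisation.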
There is a bug, though (in both this and the original): the text “This (part here)”, when converted to title case, will not get a capital letter on ‘part’.
property lower_alphabet : "abcdefghijklmnopqrstuvwxyz"
property upper_alphabet : "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
property white_space : {space, tab, return, ASCII character 10, ASCII character 13}

set this_string to "this TEXT will be RETURNED with CHARACTERS CAPITALISED as SPECIFIED. any OTHER CHARACTERS will be LOWER CASE. Text containing punctuation: x-ray, don't."

on change_case(this_text, this_case)
	if this_case is not in {"UPPER", "lower", "Title", "Sentence"} then
		error "Error: Case must be UPPER, lower, Title or Sentence"
	else if this_case is "UPPER" then
		set case_alphabet to {lower_alphabet, upper_alphabet}
	else
		set case_alphabet to {upper_alphabet, lower_alphabet}
	end if
	set old_delimiters to text item delimiters
	repeat with i from 1 to 26
		set text item delimiters to item i of (item 1 of case_alphabet as string)
		set this_text to text items of this_text
		set text item delimiters to item i of (item 2 of case_alphabet as string)
		set this_text to text items of this_text as string
	end repeat
	if this_case is in {"Title", "Sentence"} then
		if this_case is "Title" then
			set this_space to space
		else
			set this_space to ". "
		end if
		set this_text to this_space & this_text
		repeat with this_white in white_space
			if this_case is "Sentence" then
				set this_white to "." & this_white
			end if
			set text item delimiters to this_white
			if (count of text items of this_text) > 1 then repeat with i from 1 to 26
				set text item delimiters to this_white & item i of lower_alphabet
				if (count of text items of this_text) > 1 then
					set this_text to text items of this_text
					set text item delimiters to this_white & item i of upper_alphabet
					set this_text to text items of this_text as string
				end if
			end repeat
		end repeat
		set text item delimiters to ""
		set this_text to text ((count this_space) + 1) thru -1 of this_text
	end if
	set text item delimiters to old_delimiters
	return this_text
end change_case
change_case(this_string, "UPPER")
--> "THIS TEXT WILL BE RETURNED WITH CHARACTERS CAPITALISED AS SPECIFIED. ANY OTHER CHARACTERS WILL BE LOWER CASE. TEXT CONTAINING PUNCTUATION: X-RAY, DON'T."
change_case(this_string, "lower")
--> "this text will be returned with characters capitalised as specified. any other characters will be lower case. text containing punctuation: x-ray, don't."
change_case(this_string, "Title")
--> "This Text Will Be Returned With Characters Capitalised As Specified. Any Other Characters Will Be Lower Case. Text Containing Punctuation: X-ray, Don't."
change_case(this_string, "Sentence")
--> "This text will be returned with characters capitalised as specified. Any other characters will be lower case. Text containing punctuation: x-ray, don't."
Hi, Qwerty. Gosh! Nothing like a bit of confusion to start the day. If you and kai are both using Panther, I’ve no idea why your script works for you and not for him. However, what I said last night about the script working in Tiger but not in Jaguar still seems to be true.
I hadn’t checked “Sentence” mode then. My results with that this morning are:
JAGUAR:
Both versions of your script capitalise the first letter of a sentence if it was lower case, but capitalise the second letter too if the first was already upper case. (And the third if the first two were already upper case, etc.)
TIGER:
Your first version capitalises the first letter of the first sentence, but actively lower-cases the first letter of subsequent sentences. The second version appears to work OK, but not, of course, if the previous sentence ends with a question mark, exclamation mark, or quote.
Putting the ‘offset’ line in an ‘ignoring case’ block has no effect. ‘Ignoring’ is the default setting for case where ‘ignoring’ and ‘considering’ apply. In Jaguar (and Panther?), they don’t apply: ‘offset’ is exclusively case sensitive.
Here’s a version of your approach that works properly in both Jaguar and Tiger. Obviously it’ll need a longer alphabet if it’s likely to encounter diacritical characters.
property alphabet : "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
property white_space : {space, tab, return, ASCII character 10, ASCII character 13}
property terminators : ".!?"

set this_string to "THIS STRING WILL COME BACK WITH THE FIRST LETTER OF EVERY WORD BEING A CAPITAL AND THE REST WILL BE LOWER CASE. Text containing punctuation: x-ray, don't." as Unicode text
get change_case(this_string, "title")

on change_case(this_text, this_case)
	set new_text to {}
	if this_case is not in {"UPPER", "lower", "Title", "Sentence"} then
		return "Error: Case must be UPPER, lower, Title or Sentence"
	end if
	set is_upper_mode to (this_case is "UPPER")
	set is_title_mode to (this_case is "Title")
	set is_sentence_mode to (this_case is "Sentence")
	set use_capital to (this_case is not "lower")
	if (this_text's class is Unicode text) then
		set alpha to alphabet as Unicode text
	else if (this_text's class is string) then
		set alpha to alphabet as string
	else
		display dialog "OH NO! WE'RE ALL GOING TO DIE!" buttons {"AAAGGHHH!"} default button 1 with icon caution
		error number -128
	end if
	considering case -- for speed and to customise 'offset' in Tiger
		repeat with this_char in this_text
			set x to offset of this_char in alpha
			if (x > 0) then
				if (use_capital) then
					set end of new_text to character ((x - 1) mod 26 + 27) of alpha
					set use_capital to (is_upper_mode)
				else
					set end of new_text to character ((x - 1) mod 26 + 1) of alpha
				end if
			else
				if (is_sentence_mode and this_char is in terminators) or (is_title_mode and this_char is in white_space) then
					set use_capital to true
				end if
				set end of new_text to this_char's contents
			end if
		end repeat
	end considering
	set astid to AppleScript's text item delimiters
	set AppleScript's text item delimiters to ""
	tell new_text to set new_text to its beginning & its rest
	set AppleScript's text item delimiters to astid
	return new_text
end change_case
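For anyone puzzling over the index arithmetic in the script above, here is a small worked example (a sketch only; the variable names x and y are chosen for the illustration). With the 52-character alphabet, ‘(x - 1) mod 26’ reduces either case of a letter to the same position, after which adding 1 selects the lowercase half and adding 27 the uppercase half:

```applescript
set alphabet to "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

considering case
	set x to offset of "q" in alphabet -- 17 (lowercase half)
	set y to offset of "Q" in alphabet -- 43 (uppercase half: 26 + 17)
end considering

-- (17 - 1) mod 26 and (43 - 1) mod 26 both give 16.
{character ((x - 1) mod 26 + 27) of alphabet, ¬
	character ((y - 1) mod 26 + 27) of alphabet, ¬
	character ((x - 1) mod 26 + 1) of alphabet, ¬
	character ((y - 1) mod 26 + 1) of alphabet}
--> {"Q", "Q", "q", "q"}
```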
PS. This is nothing to do with your latest post, which I’ve only just seen. What do you mean by tab “bug”?
PPS. kai doesn’t seem to take much sleep, does he?
The tab bug is where “gibberish gibberish” (gibberish«tab»gibberish), when converted to sentence case, is “Gibberish Gibberish” in kai’s script. The second word should not be capitalized.
P.S. Yes, kai seems to be full time on this forum.
P.P.S. Good point about the different sentence terminators!?.!
It may be worth noting that, on long strings, Qwerty’s rewrite will break due to a stack overflow error. OMM, it tripped over a string (based largely on those tested earlier) of 57,586 characters. (As some of you will know, the overflow threshold can vary considerably, depending on string content - and possibly on OS, too. In the most extreme cases, it might even be as low as 4000 - 5000 characters.) In addition, a few sentence case issues still remain (to which I may now have a solution, of sorts). These include white space at the beginning of a string and multiple white spaces following a full point.
However, I’m slightly concerned that this thread is kinda ‘growing like topsy’ for a Code Exchange item. Since I feel somewhat responsible for much of the noise, now might be an appropriate time for me to bow out (to attend, anyway, to a rather pressing local issue) - and to leave the conclusion of this lively discussion to you good guys. I look forward to seeing the results with great interest.
:lol: That’s just one of the burdens of a perpetual insomniac, Nigel. Can play havoc with one’s social life, too!
Nigel, thanks for your rewrite of mine!
Unfortunately, it didn’t work with periods that didn’t have a space after them, like a decimal number. Hopefully, this one should. I have kept the alphabets split to make it easy to compare if you did want to add diacritical characters.
property lower_alphabet : "abcdefghijklmnopqrstuvwxyz"
property upper_alphabet : "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
property white_space : {space, tab, return, ASCII character 10, ASCII character 13}
property sentence_terminators : ".!?"

set this_string to "this TEXT will be RETURNED with CHARACTERS CAPITALISED as SPECIFIED. any OTHER CHARACTERS will be LOWER CASE. Text containing punctuation: x-ray, don't. This sentence contains a real number 5.3 with text following."
get change_case(this_string, "Sentence")

on change_case(this_text, this_case)
	if this_case is not in {"UPPER", "lower", "Title", "Sentence"} then
		error "Error: Case must be UPPER, lower, Title or Sentence"
	end if
	set new_text to {}
	set use_capital to this_case is not "lower"
	if class of this_text is Unicode text then
		set case_alphabet to lower_alphabet & upper_alphabet as Unicode text
	else if class of this_text is string then
		set case_alphabet to lower_alphabet & upper_alphabet as string
	else
		display dialog "OH NO! WE'RE ALL GOING TO DIE!" buttons {"AAAGGHHH!"} default button 1 with icon caution
		error number -128
	end if
	repeat with i from 1 to count of this_text
		set this_char to character i of this_text
		considering case -- for speed and to customise 'offset' in Tiger
			set this_offset to offset of this_char in case_alphabet
		end considering
		if this_offset is not 0 then
			if use_capital then
				set end of new_text to character ((this_offset - 1) mod 26 + 27) of case_alphabet
				set use_capital to this_case is "UPPER"
			else
				set end of new_text to character ((this_offset - 1) mod 26 + 1) of case_alphabet
			end if
		else
			if (this_case is "Title" and this_char is in white_space) or (this_case is "Sentence" and this_char is in sentence_terminators and ¬
				i is not (count of this_text) and character (i + 1) of this_text is in white_space) then
				set use_capital to true
			end if
			set end of new_text to this_char
		end if
	end repeat
	set astid to AppleScript's text item delimiters
	set AppleScript's text item delimiters to ""
	tell new_text to set new_text to its beginning & its rest
	set AppleScript's text item delimiters to astid
	return new_text
end change_case
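The rule that keeps decimal numbers such as 5.3 from starting a new sentence can be looked at on its own. The following is only a sketch of that test, pulled out into a hypothetical handler (isSentenceEnd is a made-up name, not part of the script above): a terminator counts as a sentence end only when white space follows it.

```applescript
-- Hypothetical helper: true if character i of theText is a terminator
-- (".", "!" or "?") that is immediately followed by white space.
on isSentenceEnd(theText, i)
	if character i of theText is not in ".!?" then return false
	if i is (count theText) then return false -- nothing follows the last character
	return (character (i + 1) of theText is in {space, tab, return, ASCII character 10})
end isSentenceEnd

set sample to "Pi is 3.14. Next"
-- Character 8 is the decimal point; character 11 is the full stop.
{isSentenceEnd(sample, 8), isSentenceEnd(sample, 11)}
--> {false, true}
```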
Unfortunately kai, this script does handle spaces at the beginning and multiples after periods. If this script were written in, say, C or Objective-C, wouldn’t it be as fast as yours?
Using my rewrite of kai’s, I can use this code to generate the input string and it still does not crash for me (using his reply to me as the source text ;)). I don’t know what the problem is?
set this_string to ""
repeat until (count of this_string) > 500000
set this_string to this_string & "It may be worth noting that, on long strings, Qwerty's rewrite will break due to a stack overflow error. OMM, it tripped over a string (based largely on those tested earlier) of 57,586 characters. (As some of you will know, the overflow threshold can vary considerably, depending on string content - and possibly on OS, too. In the most extreme cases, it might even be as low as 4000 - 5000 characters.) In addition, a few sentence case issues still remain (to which I may now have a solution, of sorts). These include white space at the beginning of a string and multiple white spaces following a full point.
However, I'm slightly concerned that this thread is kinda 'growing like topsy' for a Code Exchange item. Since I feel somewhat responsible for much of the noise, now might be an appropriate time for me to bow out (to attend, anyway, to a rather pressing local issue) - and to leave the conclusion of this lively discussion to you good guys. I look forward to seeing the results with great interest. "
end repeat