Wednesday, May 18, 2022

#1 2005-07-03 07:36:30 pm

kai
Member
From: Brighton, UK
Registered: 2005-05-28
Posts: 912

Change Text Case

The discussion at http://bbs.applescript.net/viewtopic.php?id=12758, about how to capitalize a string, demonstrated a few different approaches - to which I would have preferred to add this contribution. However, since that's apparently no longer possible, here it is under a new subject...

While the following vanilla method isn't exactly short, it's pretty fast, should preserve non-alphabetic characters (together with any text encoding), is optimised to handle longer strings - and offers several case options: "upper", "lower", "title" & "sentence".
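The core of the technique can be tried on its own. The sketch below is not kai's handler. It is a minimal illustration with a hypothetical handler name (swapLetter) that replaces every occurrence of one letter by splitting the text on it and rejoining the pieces with its other-case counterpart, using the same beginning & ({""} & rest) join that kai's script uses:

```applescript
-- Hypothetical helper (not from the thread) illustrating the
-- text item delimiters (TID) swap that kai's handler repeats per letter.
on swapLetter(theText, fromLetter, toLetter)
	if theText is "" then return theText
	set savedTIDs to AppleScript's text item delimiters
	considering case -- split case-sensitively (TIDs honour this for Unicode text)
		set AppleScript's text item delimiters to fromLetter
		set parts to text items of theText
	end considering
	-- text & list: the list is coerced to text, inserting the new delimiter
	-- between the pieces; {""} supplies the delimiter after the first piece.
	set AppleScript's text item delimiters to toLetter
	tell parts to set theText to beginning & ({""} & rest)
	set AppleScript's text item delimiters to savedTIDs
	return theText
end swapLetter

swapLetter("banana", "a", "A")
--> "bAnAnA"
```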

Applescript:

property alphaList : "abcdefghijklmnopqrstuvwxyz"'s items & reverse of "ABCDEFGHIJKLMNOPQRSTUVWXYZ"'s items -- item n is the nth lowercase letter, item -n the nth uppercase letter

on textItems from t
   try
       t's text items
   on error number -2706 -- stack overflow on very long text: split it in two and recurse
       tell (count t's text items) div 2 to ¬
           my (textItems from (t's text 1 thru text item it)) & ¬
           my (textItems from (t's text from text item (it + 1) to -1))
   end try
end textItems

to changeCase of t to c
   if (count t) is 0 then return t
   considering case
       if c is not in {"upper", "lower", "title", "sentence"} then
           error "The word \"" & c & "\" is not a valid option. Please use \"upper\", \"lower\", \"title\" or \"sentence\"."
       else if c is "upper" then
           set n to 1 -- upper: split on a-z, rejoin with A-Z
       else
           set n to -1 -- lower, title, sentence: split on A-Z, rejoin with a-z
       end if
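       set d to text item delimiters -- save the current delimiters
       repeat with n from n to n * 26 by n -- 1 to 26, or -1 to -26
           set text item delimiters to my alphaList's item n -- split on one letter...
           set t to textItems from t
           set text item delimiters to my alphaList's item -n -- ...then rejoin with its other-case counterpart
           tell t to set t to beginning & ({""} & rest) -- text & list: the list is coerced to text using the current delimiter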
       end repeat
       if c is in {"title", "sentence"} then
           if c is "title" then
               set s to space
           else
               set s to ". "
           end if
           set t to (t's item 1 & s & t)'s text 2 thru -1 -- prefix s so the first letter is treated like any other
           repeat with i in {s, tab, return, ASCII character 10}
               set text item delimiters to i
               if (count t's text items) > 1 then repeat with n from 1 to 26
                   set text item delimiters to i & my alphaList's item n
                   if (count t's text items) > 1 then
                       set t to textItems from t
                       set text item delimiters to i & my alphaList's item -n
                       tell t to set t to beginning & ({""} & rest)
                   end if
               end repeat
           end repeat
           set t to t's text ((count s) + 1) thru -1 -- strip the prefix added above
       end if
       set text item delimiters to d -- restore the saved delimiters
   end considering
   t
end changeCase

set someText to "this TEXT will be RETURNED with CHARACTERS CAPITALISED as SPECIFIED. any OTHER CHARACTERS will be LOWER CASE."

changeCase of someText to "upper"
--> "THIS TEXT WILL BE RETURNED WITH CHARACTERS CAPITALISED AS SPECIFIED. ANY OTHER CHARACTERS WILL BE LOWER CASE."

changeCase of someText to "lower"
--> "this text will be returned with characters capitalised as specified. any other characters will be lower case."

changeCase of someText to "title"
--> "This Text Will Be Returned With Characters Capitalised As Specified. Any Other Characters Will Be Lower Case."

changeCase of someText to "sentence"
--> "This text will be returned with characters capitalised as specified. Any other characters will be lower case."

Last edited by kai (2005-07-03 08:20:44 pm)


kai


Filed under: text, case


#2 2005-07-05 09:41:08 pm

Qwerty Denzel
Member
Registered: 2005-06-11
Posts: 337

Re: Change Text Case

I posted this script at http://bbs.applescript.net/viewtopic.php?id=12758, which also should handle lower, UPPER, Title and Sentence.

Applescript:

property lower_alphabet : "abcdefghijklmnopqrstuvwxyz"
property upper_alphabet : "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
property white_space : {space, tab, return, ASCII character 10, ASCII character 13}

set this_string to "THIS STRING WILL COME BACK WITH THE FIRST LETTER OF EVERY WORD BEING A CAPITAL AND THE REST WILL BE LOWER CASE. Text containing punctuation: x-ray, don't."

get change_case(this_string, "Title")

on change_case(this_text, this_case)
   set new_text to ""
   if this_case is not in {"UPPER", "lower", "Title", "Sentence"} then
       return "Error: Case must be UPPER, lower, Title or Sentence"
   end if
   if this_case is "lower" then
       set use_capital to false
   else
       set use_capital to true
   end if
   repeat with this_char in this_text
       set x to offset of this_char in lower_alphabet
       if x is not 0 then
           if use_capital then
               set new_text to new_text & character x of upper_alphabet as string
               if this_case is not "UPPER" then
                   set use_capital to false
               end if
           else
               set new_text to new_text & character x of lower_alphabet as string
           end if
       else
           if this_case is "Title" and this_char is in white_space then
               set use_capital to true
           end if
           set new_text to new_text & this_char as string
       end if
   end repeat
   return new_text
end change_case

BTW, thank you Ray for re-enabling replies in Code Exchange!


#3 2005-07-07 09:14:45 pm

kai
Member
From: Brighton, UK
Registered: 2005-05-28
Posts: 912

Re: Change Text Case

Qwerty Denzel wrote:

I posted this script at http://bbs.applescript.net/viewtopic.php?id=12758, which also should handle lower, UPPER, Title and Sentence.


Actually, it was partly your original script that prompted me to write mine, Qwerty.

When I tried your version, only uppercase conversions seemed to work effectively - so I thought I might have a crack at the problem myself. I see that you've since modified your script, although I'm afraid it still appears to behave in a similar way to your original (on my machine, at any rate).

It's also worth bearing in mind that, while the old 'loop through and check each character' technique is fine for shorter strings, it can become a bit... ponderous when trying to handle longer gobs of text. In such situations, it might be worth considering an alternative, such as text item delimiters.
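For comparison with a per-character loop, here is a minimal sketch of the delimiter approach applied to uppercase conversion only (toUpperTID is a hypothetical name, and only the 26 unaccented letters are handled). It makes 26 passes, one per letter, and each pass splits and rejoins the whole string at once instead of visiting each character:

```applescript
property lowerChars : "abcdefghijklmnopqrstuvwxyz"
property upperChars : "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

-- Hypothetical handler: uppercase the ASCII letters with 26 TID passes
-- rather than a per-character loop. Non-letters are left untouched.
on toUpperTID(theText)
	if theText is "" then return theText
	set savedTIDs to AppleScript's text item delimiters
	considering case
		repeat with i from 1 to 26
			set AppleScript's text item delimiters to character i of my lowerChars
			set parts to text items of theText
			set AppleScript's text item delimiters to character i of my upperChars
			set theText to parts as text
		end repeat
	end considering
	set AppleScript's text item delimiters to savedTIDs
	return theText
end toUpperTID

toUpperTID("x-ray, don't")
--> "X-RAY, DON'T"
```

Unlike kai's handler, this sketch joins with a plain 'as text' coercion, so it makes no attempt to preserve the text class on older systems, and it does not guard against the stack overflow that very long strings can trigger.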

To compare the performance of both 'loop' and 'tid' methods, I timed them on various string lengths, from 50 to 5,000 characters - on a machine that is neither particularly fast nor slow. (To put string length into perspective, your earlier test string contained 154 characters.) In each case, the mix between upper and lowercase was 50-50, and only conversions to uppercase were compared. Obviously, performance will vary from one machine to another, but the figures should offer some general indications:

string length    loop (ms)    tid (ms)

           50           25           9
          100           53          10
          500          283          11
        1,000          553          15
        5,000        3,250          40

To look at it another way, the loop-based handler could convert about 1,700 characters a second, while the tid-based handler could manage more than 100,000 characters in the same time (roughly 125,000 on the 5,000-character string). Of course, none of this is really that critical with the string examples we've been using here. But with longer strings, the differences might be worth considering...

Last edited by kai (2005-07-07 10:19:14 pm)


kai


#4 2005-07-08 01:05:50 am

Qwerty Denzel
Member
Registered: 2005-06-11
Posts: 337

Re: Change Text Case

UPPER, lower, Title works for me (in my script). Sentence really doesn't work, how obvious!
Yes, yours is much faster. I find your script quite difficult to understand in some spots, especially towards the end. Is there any chance you could make a commented version (with English variable names)?


#5 2005-07-08 03:35:27 am

Qwerty Denzel
Member
Registered: 2005-06-11
Posts: 337

Re: Change Text Case

BTW kai, should "gibberish    gibberish" (gibberish«tab»gibberish), when converted to title case, be "Gibberish    Gibberish", with the second word having a capital letter? Is that correct? (The tab could be any whitespace item). I am sure you probably did this on purpose, but why?


#6 2005-07-08 01:12:09 pm

Nigel Garvey
Moderator
From: Warwickshire, England
Registered: 2002-11-20
Posts: 5512

Re: Change Text Case

kai wrote:

set t to (t's item 1 & s & t)'s text 2 thru -1


Hi, kai. On a point of pedantry, since t is definitely a string or Unicode text at this point, not a list, its 'item 1' is properly its 'character 1'. Both work, but since there's no speed advantage either way, my vote's for clarity.

Qwerty Denzel wrote:

BTW, thank you Ray for re-enabling replies in Code Exchange!


Ditto. And we forgive the off-topic stickies in the OS X forum.


NG


#7 2005-07-08 04:46:58 pm

kai
Member
From:: Brighton, UK
Registered: 2005-05-28
Posts: 912

Re: Change Text Case

Nigel Garvey wrote:
kai wrote:

set t to (t's item 1 & s & t)'s text 2 thru -1


Hi, kai. On a point of pedantry, since t is definitely a string or Unicode text at this point, not a list, its 'item 1' is properly its 'character 1'. Both work, but since there's no speed advantage either way, my vote's for clarity.


I was aware of t's class, thanks, Nigel - but take your point. Had it been a list, I'd have probably gone for 'beginning', rather than 'item 1' (which I suppose might be considered yet another form of distinction). I'm afraid that my vote sometimes goes with what's quicker to type - but apologies if it causes any confusion.


kai


#8 2005-07-08 05:25:40 pm

kai
Member
From: Brighton, UK
Registered: 2005-05-28
Posts: 912

Re: Change Text Case

Qwerty Denzel wrote:

UPPER, lower, Title works for me (in my script). Sentence really doesn't work, how obvious!


To be honest, I'm having difficulty understanding how "lower" and "Title" can possibly work with uppercase input text - because I just can't see a mechanism for switching to lowercase when required. Here's my take on what happens...

At the start of your repeat loop, you have the following statements:

    repeat with this_char in this_text
        set x to offset of this_char in lower_alphabet
        if x is not 0 then -- character is lowercase


Since anything immediately below this relates to converting a lowercase character (and bearing in mind that we're tracking an uppercase character), we can skip straight to the corresponding 'else' statement further below:

        else -- character is either uppercase or non-alpha
            if this_case is "Title" and this_char is in white_space then
                set use_capital to true
            end if
            set new_text to new_text & this_char as string
        end if
    end repeat


Within the else section, we start with another if/then block, which may (or may not) change the value of the variable 'use_capital'. However, this can affect only what happens in the repeat loop's subsequent iteration (influencing how the next character is treated - and not the current one).

The next statement, 'set new_text to', etc., which does not refer or react to the value of 'use_capital' (or indeed anything else), will evidently be executed regardless. This surely means that the value of 'this_char' (an uppercase character) is added to the end of 'new_text' - whether 'this_case' is "lower", "Title", "Sentence" - or whatever.

That's the theory - but what of the practice? When I run your script, the returned values I get are, specifically:

Results:

set this_string to "THIS STRING WILL COME BACK WITH THE FIRST LETTER OF EVERY WORD BEING A CAPITAL AND THE REST WILL BE LOWER CASE. Text containing punctuation: x-ray, don't."

change_case(this_string, "UPPER")
--> "THIS STRING WILL COME BACK WITH THE FIRST LETTER OF EVERY WORD BEING A CAPITAL AND THE REST WILL BE LOWER CASE. TEXT CONTAINING PUNCTUATION: X-RAY, DON'T."

change_case(this_string, "lower")
--> "THIS STRING WILL COME BACK WITH THE FIRST LETTER OF EVERY WORD BEING A CAPITAL AND THE REST WILL BE LOWER CASE. Text containing punctuation: x-ray, don't."

change_case(this_string, "Title")
--> "THIS STRING WILL COME BACK WITH THE FIRST LETTER OF EVERY WORD BEING A CAPITAL AND THE REST WILL BE LOWER CASE. TExt Containing Punctuation: X-ray, Don't."

change_case(this_string, "Sentence")
--> "THIS STRING WILL COME BACK WITH THE FIRST LETTER OF EVERY WORD BEING A CAPITAL AND THE REST WILL BE LOWER CASE. TExt containing punctuation: x-ray, don't."


These results are exactly what I'd have expected, given my understanding of your code - which is why I'm so puzzled as to why it should apparently work for you...

Can anyone please tell me what I'm missing in my analysis of Qwerty's script - and why his script should work for him (at least in "lower" and "Title" modes) - and not for me?

In fact, the only way I can get this particular approach to work - is to do something like this:

Applescript:

property lower_alphabet : "abcdefghijklmnopqrstuvwxyz"
property upper_alphabet : "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
property white_space : {space, tab, return, ASCII character 10, ASCII character 13}

on change_case(this_text, this_case)
   
   if this_case is not in {"UPPER", "lower", "Title", "Sentence"} then return "Error: Case must be UPPER, lower, Title or Sentence" -- can't continue
   
   set new_text to "" -- initialise the new string
   
   set use_capital to this_case is not "lower" -- if this_case is "lower", set use_capital to false (otherwise true)
   
   repeat with this_char in this_text
       
       if use_capital then
           
           set currentOffset to offset of this_char in lower_alphabet -- get alphabetical index of lowercase character
           if currentOffset is 0 then -- this_char is not lowercase anyway, so:
               set new_text to new_text & this_char -- don't change it
           else -- this_char is lowercase, so:
               set new_text to new_text & upper_alphabet's item currentOffset -- change it to uppercase
           end if
           
           
           if this_case is in {"Title", "Sentence"} then set use_capital to false -- next character should not use_capital
           
       else -- don't use_capital
           
           set currentOffset to offset of this_char in upper_alphabet -- get alphabetical index of uppercase character
           if currentOffset is 0 then -- this_char is not uppercase anyway, so:
               set new_text to new_text & this_char -- don't change it
           else -- this_char is uppercase, so:
               set new_text to new_text & lower_alphabet's item currentOffset -- change it to lowercase
           end if
           
           if this_char is in white_space and this_case is not "lower" then -- should next character use_capital?
               ignoring white space -- in case new_text already ends with white space
                   if this_case is "Title" or new_text ends with "." and this_case is "Sentence" then set use_capital to true -- next character should use_capital
               end ignoring
           end if
       end if
       
   end repeat
   return new_text
end change_case

Results:

set this_string to "THIS STRING WILL COME BACK WITH THE FIRST LETTER OF EVERY WORD BEING A CAPITAL AND THE REST WILL BE LOWER CASE. Text containing punctuation: x-ray, don't."

change_case(this_string, "UPPER")
--> "THIS STRING WILL COME BACK WITH THE FIRST LETTER OF EVERY WORD BEING A CAPITAL AND THE REST WILL BE LOWER CASE. TEXT CONTAINING PUNCTUATION: X-RAY, DON'T."

change_case(this_string, "lower")
--> "this string will come back with the first letter of every word being a capital and the rest will be lower case. text containing punctuation: x-ray, don't."

change_case(this_string, "Title")
--> "This String Will Come Back With The First Letter Of Every Word Being A Capital And The Rest Will Be Lower Case. Text Containing Punctuation: X-ray, Don't."

change_case(this_string, "Sentence")
--> "This string will come back with the first letter of every word being a capital and the rest will be lower case. Text containing punctuation: x-ray, don't."


Qwerty Denzel wrote:

BTW kai, should "gibberish    gibberish" (gibberish«tab»gibberish), when converted to title case, be "Gibberish    Gibberish", with the second word having a capital letter? Is that correct? (The tab could be any whitespace item). I am sure you probably did this on purpose, but why?


Didn't your modified script aim to treat characters following white space in a similar way, Qwerty?

Title case is generally assumed to mean that each word's first character is uppercase, and that any remaining letters are lowercase. An alternative interpretation treats words in a broadly similar manner, apart from definite and indefinite articles (e.g. 'the' and 'a'), conjunctions (e.g. 'and') and prepositions (e.g. 'in', 'of'). These exceptions can consist of entirely lowercase characters.

For simplicity, I obviously based my script on the former definition. However, I don't really understand why, when considering title case, you question capitalising in this way. If you don't capitalise after white space, then surely most words would be lower case?
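As an illustration of the alternative definition, here is a minimal sketch, assuming lowercase input with words separated by single spaces, that capitalises every word except a short list of minor words (unless the word comes first). The handler name titleCaseExcept and the contents of minorWords are assumptions for illustration only; style guides disagree on exactly which words belong in such a list:

```applescript
property lowerChars : "abcdefghijklmnopqrstuvwxyz"
property upperChars : "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
-- Assumed, illustrative list of minor words.
property minorWords : {"a", "an", "the", "and", "but", "or", "in", "of", "on", "at", "to"}

-- Hypothetical handler: capitalise each space-separated word except minor
-- words; the first word is always capitalised. Assumes lowercase input.
on titleCaseExcept(theText)
	set savedTIDs to AppleScript's text item delimiters
	set AppleScript's text item delimiters to space
	set wordList to text items of theText
	repeat with i from 1 to count wordList
		set thisWord to item i of wordList
		if thisWord is not "" and (i is 1 or thisWord is not in my minorWords) then
			considering case -- keep 'offset' case-sensitive on every system
				set x to offset of (character 1 of thisWord) in my lowerChars
			end considering
			if x > 0 then -- first character is a lowercase letter
				set newWord to character x of my upperChars
				if (count thisWord) > 1 then set newWord to newWord & text 2 thru -1 of thisWord
				set item i of wordList to newWord
			end if
		end if
	end repeat
	set theText to wordList as text -- rejoin with spaces
	set AppleScript's text item delimiters to savedTIDs
	return theText
end titleCaseExcept

titleCaseExcept("the lord of the rings")
--> "The Lord of the Rings"
```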

(I'm aware of some potential issues with the algorithm that I used, but I don't think they include the more common forms of white space.)


kai


#9 2005-07-08 07:35:46 pm

Nigel Garvey
Moderator
From: Warwickshire, England
Registered: 2002-11-20
Posts: 5512

Re: Change Text Case

kai wrote:

Can anyone please tell me what I'm missing in my analysis of Qwerty's script - and why his script should work for him (at least in "lower" and "Title" modes) - and not for me?


It works in Tiger, but not in Jaguar or (apparently) Panther.

In Tiger, 'offset' now responds to 'considering' and 'ignoring' conditions. It's case-insensitive unless 'considering case' is used, which means that this_char is always found in lower_alphabet if it's a letter and never if it's punctuation or white space. In Qwerty's script, the 'else' section deals specifically with non-letters. It works, but only in Tiger. Kai's effort at the top of this thread works on both of my X systems, but I haven't checked out the tab thing yet. (Time for bed!)
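One way to see the difference, and to get the same answer on Jaguar and Tiger, is to make the case attribute explicit. In this minimal sketch (lowerIndex is a hypothetical helper), the 'considering case' block reproduces the old case-sensitive result on Tiger, while on Jaguar, where 'offset' is always case-sensitive, it changes nothing:

```applescript
property lowerChars : "abcdefghijklmnopqrstuvwxyz"

-- Hypothetical helper: alphabetical index of a lowercase letter, or 0.
on lowerIndex(theChar)
	considering case -- without this, Tiger would also find "C" at position 3
		return offset of theChar in my lowerChars
	end considering
end lowerIndex

lowerIndex("c") --> 3
lowerIndex("C") --> 0
```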

By the way, TIDs are also now subject to 'considering' and 'ignoring', but only when the main text is Unicode. With 'strings', they still behave in the old way.
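The same precaution applies to delimiter work. This minimal sketch (countOccurrences is a hypothetical helper) counts matches by splitting on them, the same count-the-text-items test kai's script uses to check whether a delimiter is present, and wraps the split in 'considering case' so that Unicode text behaves like a plain string:

```applescript
-- Hypothetical helper: count case-sensitive occurrences of searchText.
on countOccurrences(theText, searchText)
	set savedTIDs to AppleScript's text item delimiters
	set AppleScript's text item delimiters to searchText
	considering case -- affects TIDs for Unicode text in Tiger
		set n to (count text items of theText) - 1
	end considering
	set AppleScript's text item delimiters to savedTIDs
	return n
end countOccurrences

countOccurrences("Banana" as Unicode text, "b") --> 0
countOccurrences("Banana" as Unicode text, "a") --> 3
```

Without the considering block, the first call would match the capital B in Tiger (because the text is Unicode) and return 1.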


NG


#10 2005-07-08 09:49:52 pm

kai
Member
From:: Brighton, UK
Registered: 2005-05-28
Posts: 912

Re: Change Text Case

Nigel Garvey wrote:

In Tiger, 'offset' now responds to 'considering' and 'ignoring' conditions. It's case-insensitive unless 'considering case' is used, which means that this_char is always found in lower_alphabet if it's a letter and never if it's punctuation or white space. In Qwerty's script, the 'else' section deals specifically with non-letters. It works, but only in Tiger.


Thanks very much, Nigel. I'd noted your point previously about the behaviour of tids with Unicode text in Tiger - but hadn't yet assimilated the impact of changes affecting offset. Taking that script apart was driving me nuts! Just something else to watch out for - but that's progress I suppose.

Kai's effort at the top of this thread works on both of my X systems, but I haven't checked out the tab thing yet. (Time for bed!)


And now that you've put my mind at rest, I can get some shuteye too. Thanks again, Mr. G. - and g'night, folks!


kai


#11 2005-07-08 11:54:07 pm

kai
Member
From:: Brighton, UK
Registered: 2005-05-28
Posts: 912

Re: Change Text Case

Qwerty Denzel wrote:

BTW kai, should "gibberish    gibberish" (gibberish«tab»gibberish), when converted to title case, be "Gibberish    Gibberish", with the second word having a capital letter? Is that correct? (The tab could be any whitespace item). I am sure you probably did this on purpose, but why?


I've just realised (I think) what you were referring to, Qwerty - so let me try that again. I don't think there's a tabs issue with title case. However, having taken another look at the whole thing, I'd say there are definitely one or two general white space issues with sentence case. I don't really have time to address them right now, but I'll certainly take a look a little later (assuming that Nigel hasn't completely rewritten the script by then).

Last edited by kai (2005-07-09 12:25:03 am)


kai


#12 2005-07-09 03:16:11 am

Qwerty Denzel
Member
Registered: 2005-06-11
Posts: 337

Re: Change Text Case

Gosh, what have I been missing out on! Sorry kai, I did mean "Sentence" case, not "Title", how stupid of me! Nigel, I'm using 10.3.9, so it's not only Tiger.
Does this work for you? (still slow):

Applescript:

(*
Evolved from Apple's 'Change Case of Item Names.scpt' - part of the Finder scripts.
You have the option of UPPER, lower, Title or Sentence cases.
Accepts non-alphabetic characters.
*)


property lower_alphabet : "abcdefghijklmnopqrstuvwxyz"
property upper_alphabet : "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
property white_space : {space, tab, return, ASCII character 10, ASCII character 13}

set this_string to "THIS STRING WILL COME BACK WITH THE FIRST LETTER OF EVERY WORD BEING A CAPITAL AND THE REST WILL BE LOWER CASE. Text containing punctuation: x-ray, don't."

get change_case(this_string, "Sentence")

on change_case(this_text, this_case)
   if this_case is not in {"UPPER", "lower", "Title", "Sentence"} then
       return "Error: Case must be UPPER, lower, Title or Sentence"
   end if
   set new_text to ""
   if this_case is "lower" then
       set use_capital to false
   else
       set use_capital to true
   end if
   repeat with i from 1 to count of this_text
       set this_char to character i of this_text
       ignoring case
           set x to offset of this_char in lower_alphabet
       end ignoring
       if x is not 0 then
           if use_capital then
               set new_text to new_text & character x of upper_alphabet as string
               if this_case is not "UPPER" then
                   set use_capital to false
               end if
           else
               set new_text to new_text & character x of lower_alphabet as string
           end if
       else
           if this_case is "Title" and this_char is in white_space then
               set use_capital to true
           else if this_case is "Sentence" and this_char is "." and ¬
               i is not (count of this_text) and ¬
               character (i + 1) of this_text is in white_space then
               set use_capital to true
           end if
           
           set new_text to new_text & this_char as string
       end if
   end repeat
   return new_text
end change_case


#13 2005-07-09 05:13:16 am

Qwerty Denzel
Member
Registered: 2005-06-11
Posts: 337

Re: Change Text Case

Here is kai's script rewritten. As far as I can see, it should work the same, except that the tab bug has been fixed.
I have tried to make it more readable, including using logical variable names.
I don't really know where to put considering case, because it works fine on my machine without it. (But you should be aware that putting it around your 'if c is not in {"upper", "lower", "title", "sentence"} then' line will only allow input of case as lowercase (an error will come up if you use 'changeCase of someText to "Lower"', for instance)).
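The effect Qwerty describes can be checked directly: membership tests such as 'is in' follow the current case attribute. The snippet below is a small illustration (the variable names are hypothetical), and it suggests one option: validate the option name outside the 'considering case' block, and keep only the delimiter work inside it.

```applescript
set optionList to {"upper", "lower", "title", "sentence"}

-- Outside any 'considering case' block, case is ignored by default.
set acceptedByDefault to ("Lower" is in optionList)

-- Inside the block, "Lower" no longer matches "lower".
considering case
	set acceptedStrictly to ("Lower" is in optionList)
end considering

{acceptedByDefault, acceptedStrictly}
--> {true, false}
```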

There is a bug, though (in both this and the original): in title case, the word 'part' in "This (part here)" does not get a capital letter, because only a letter immediately following white space is capitalised.

Applescript:

property lower_alphabet : "abcdefghijklmnopqrstuvwxyz"
property upper_alphabet : "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
property white_space : {space, tab, return, ASCII character 10, ASCII character 13}

set this_string to "this TEXT will be RETURNED with CHARACTERS CAPITALISED as SPECIFIED. any OTHER CHARACTERS will be LOWER CASE. Text containing punctuation: x-ray, don't."

on change_case(this_text, this_case)
   if this_case is not in {"UPPER", "lower", "Title", "Sentence"} then
       error "Error: Case must be UPPER, lower, Title or Sentence"
   else if this_case is "UPPER" then
       set case_alphabet to {lower_alphabet, upper_alphabet}
   else
       set case_alphabet to {upper_alphabet, lower_alphabet}
   end if
   set old_delimiters to text item delimiters
   repeat with i from 1 to 26
       set text item delimiters to item i of (item 1 of case_alphabet as string)
       set this_text to text items of this_text
       set text item delimiters to item i of (item 2 of case_alphabet as string)
       set this_text to text items of this_text as string
   end repeat
   if this_case is in {"Title", "Sentence"} then
       if this_case is "Title" then
           set this_space to space
       else
           set this_space to ". "
       end if
       set this_text to this_space & this_text
       repeat with this_white in white_space
           if this_case is "Sentence" then
               set this_white to "." & this_white
           end if
           set text item delimiters to this_white
           if (count of text items of this_text) > 1 then repeat with i from 1 to 26
               set text item delimiters to this_white & item i of lower_alphabet
               if (count of text items of this_text) > 1 then
                   set this_text to text items of this_text
                   set text item delimiters to this_white & item i of upper_alphabet
                   set this_text to text items of this_text as string
               end if
           end repeat
       end repeat
       set text item delimiters to ""
       
       set this_text to text ((count this_space) + 1) thru -1 of this_text
   end if
   set text item delimiters to old_delimiters
   return this_text
end change_case

change_case(this_string, "UPPER")
--> "THIS TEXT WILL BE RETURNED WITH CHARACTERS CAPITALISED AS SPECIFIED. ANY OTHER CHARACTERS WILL BE LOWER CASE. TEXT CONTAINING PUNCTUATION: X-RAY, DON'T."

change_case(this_string, "lower")
--> "this text will be returned with characters capitalised as specified. any other characters will be lower case. text containing punctuation: x-ray, don't."

change_case(this_string, "Title")
--> "This Text Will Be Returned With Characters Capitalised As Specified. Any Other Characters Will Be Lower Case. Text Containing Punctuation: X-ray, Don't."

change_case(this_string, "Sentence")
--> "This text will be returned with characters capitalised as specified. Any other characters will be lower case. Text containing punctuation: x-ray, don't."

Last edited by Qwerty Denzel (2005-07-09 05:54:10 am)


#14 2005-07-09 07:52:31 am

Nigel Garvey
Moderator
From:: Warwickshire, England
Registered: 2002-11-20
Posts: 5512

Re: Change Text Case

Hi, Qwerty. Gosh! Nothing like a bit of confusion to start the day. If you and kai are both using Panther, I've no idea why your script works for you and not for him. However, what I said last night about the script working in Tiger but not in Jaguar still seems to be true.

I hadn't checked "Sentence" mode then. My results with that this morning are:

JAGUAR:
Both versions of your script capitalise the first letter of a sentence if it was lower case, but capitalise the second letter too if the first was already upper case. (And the third if the first two were already upper case, etc.)

TIGER:
Your first version capitalises the first letter of the first sentence, but actively lower-cases the first letter of subsequent sentences. The second version appears to work OK, but not, of course, if the previous sentence ends with a question mark, exclamation mark, or quote.

Putting the 'offset' line in an 'ignoring case' block has no effect. 'Ignoring' is the default setting for case where 'ignoring' and 'considering' apply. In Jaguar (and Panther?), they don't apply: 'offset' is exclusively case sensitive.

Here's a version of your approach that works properly in both Jaguar and Tiger. Obviously it'll need a longer alphabet if it's likely to encounter diacritical characters.

Applescript:

property alphabet : "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
property white_space : {space, tab, return, ASCII character 10, ASCII character 13}
property terminators : ".!?"

set this_string to "THIS STRING WILL COME BACK WITH THE FIRST LETTER OF EVERY WORD BEING A CAPITAL AND THE REST WILL BE LOWER CASE. Text containing punctuation: x-ray, don't." as Unicode text

get change_case(this_string, "title")

on change_case(this_text, this_case)
   set new_text to {}
   if this_case is not in {"UPPER", "lower", "Title", "Sentence"} then
       return "Error: Case must be UPPER, lower, Title or Sentence"
   end if
   
   set is_upper_mode to (this_case is "UPPER")
   set is_title_mode to (this_case is "Title")
   set is_sentence_mode to (this_case is "Sentence")
   set use_capital to (this_case is not "lower")
   if (this_text's class is Unicode text) then
       set alpha to alphabet as Unicode text
   else if (this_text's class is string) then
       set alpha to alphabet as string
   else
       display dialog "OH NO! WE'RE ALL GOING TO DIE!" buttons {"AAAGGHHH!"} default button 1 with icon caution
       error number -128
   end if
   
   considering case -- for speed and to customise 'offset' in Tiger
       repeat with this_char in this_text
           set x to offset of this_char in alpha
           if (x > 0) then
               if (use_capital) then
                   set end of new_text to character ((x - 1) mod 26 + 27) of alpha
                   set use_capital to (is_upper_mode)
               else
                   set end of new_text to character ((x - 1) mod 26 + 1) of alpha
               end if
           else
               if (is_sentence_mode and this_char is in terminators) or (is_title_mode and this_char is in white_space) then
                   set use_capital to true
               end if
               set end of new_text to this_char's contents
           end if
       end repeat
   end considering
   
   set astid to AppleScript's text item delimiters
   set AppleScript's text item delimiters to ""
   tell new_text to set new_text to its beginning & its rest
   set AppleScript's text item delimiters to astid
   return new_text
end change_case

PS. This is nothing to do with your latest post, which I've only just seen. What do you mean by tab "bug"?

PPS. kai doesn't seem to take much sleep, does he?


NG


#15 2005-07-09 08:21:23 am

Nigel Garvey
Moderator
From:: Warwickshire, England
Registered: 2002-11-20
Posts: 5512

Re: Change Text Case

I wrote:

What do you mean by tab "bug"?


OK. I've seen it now. Sorry.


NG

Online

 

#16 2005-07-09 08:27:00 am

Qwerty Denzel
Member
Registered: 2005-06-11
Posts: 337

Re: Change Text Case

The tab bug is where  "gibberish    gibberish" (gibberish«tab»gibberish), when converted to sentence case, is "Gibberish    Gibberish" in kai's script. The second word should not be capitalized.
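One way to avoid the tab bug is to capitalise a letter only when a sentence terminator and a white space character come immediately before it, in the spirit of the "." & white space delimiters in Qwerty's rewrite. The following is a minimal sketch under stated assumptions: the handler name is hypothetical, only the 26 unaccented letters are handled, and neither the first letter of the text nor runs of several white space characters are dealt with.

```applescript
property lowerChars : "abcdefghijklmnopqrstuvwxyz"
property upperChars : "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
property terminators : {".", "!", "?"}
property whiteSpace : {space, tab, return, ASCII character 10}

-- Hypothetical handler: capitalise a letter only when it directly follows
-- a terminator plus ONE white space character, so "word«tab»word" is left alone.
on capitaliseSentenceStarts(theText)
	set savedTIDs to AppleScript's text item delimiters
	considering case
		repeat with thisTerm in my terminators
			repeat with thisSpace in my whiteSpace
				set prefix to (contents of thisTerm) & (contents of thisSpace)
				repeat with i from 1 to 26
					set AppleScript's text item delimiters to prefix & character i of my lowerChars
					set parts to text items of theText
					if (count parts) > 1 then -- this terminator/space/letter combination occurs
						set AppleScript's text item delimiters to prefix & character i of my upperChars
						set theText to parts as text
					end if
				end repeat
			end repeat
		end repeat
	end considering
	set AppleScript's text item delimiters to savedTIDs
	return theText
end capitaliseSentenceStarts

capitaliseSentenceStarts("one. two" & tab & "three! four")
--> "one. Two" & tab & "three! Four"
```

Since a terminator must be followed by white space, a decimal such as 5.3 is also left alone.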
P.S. Yes, kai seems to be on this forum full time.

P.P.S. Good point about the different sentence terminators!?.!


#17 2005-07-09 08:28:45 am

Qwerty Denzel
Member
Registered: 2005-06-11
Posts: 337

Re: Change Text Case

Nigel Garvey wrote:
I wrote:

What do you mean by tab "bug"?


OK. I've seen it now. Sorry.


Sorry, I'm being a bit slow too!


#18 2005-07-09 10:10:30 am

kai
Member
From: Brighton, UK
Registered: 2005-05-28
Posts: 912

Re: Change Text Case

Qwerty Denzel wrote:

Here is kai's script rewritten.


It may be worth noting that, on long strings, Qwerty's rewrite will break due to a stack overflow error. On my machine, it tripped over a string (based largely on those tested earlier) of 57,586 characters. (As some of you will know, the overflow threshold can vary considerably, depending on string content - and possibly on OS, too. In the most extreme cases, it might even be as low as 4000 - 5000 characters.) In addition, a few sentence case issues still remain (to which I may now have a solution, of sorts). These include white space at the beginning of a string and multiple white spaces following a full point.

However, I'm slightly concerned that this thread is kinda 'growing like topsy' for a Code Exchange item. Since I feel somewhat responsible for much of the noise, now might be an appropriate time for me to bow out (to attend, anyway, to a rather pressing local issue) - and to leave the conclusion of this lively discussion to you good guys. I look forward to seeing the results with great interest. smile

Nigel Garvey wrote:

PPS. kai doesn't seem to take much sleep, does he?  smile


lol That's just one of the burdens of a perpetual insomniac, Nigel. Can play havoc with one's social life, too! wink

Last edited by kai (2005-07-09 10:11:17 am)


kai


 

#19 2005-07-10 03:00:51 am

Qwerty Denzel
Member
Registered: 2005-06-11
Posts: 337

Re: Change Text Case

Nigel, thanks for your rewrite of mine!
Unfortunately, it didn't work with periods that didn't have a space after them, like a decimal number. Hopefully, this one should. I have kept the alphabets split to make it easy to compare if you did want to add diacritical characters.

Applescript:

property lower_alphabet : "abcdefghijklmnopqrstuvwxyz"
property upper_alphabet : "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
property white_space : {space, tab, return, ASCII character 10, ASCII character 13}
property sentence_terminators : ".!?"

set this_string to "this TEXT will be RETURNED with CHARACTERS CAPITALISED as SPECIFIED. any OTHER CHARACTERS will be LOWER CASE. Text containing punctuation: x-ray, don't. This sentence contains a real number 5.3 with text following."

get change_case(this_string, "Sentence")

on change_case(this_text, this_case)
   if this_case is not in {"UPPER", "lower", "Title", "Sentence"} then
       error "Error: Case must be UPPER, lower, Title or Sentence"
   end if
   set new_text to {}
   set use_capital to this_case is not "lower"
   if class of this_text is Unicode text then
       set case_alphabet to lower_alphabet & upper_alphabet as Unicode text
   else if class of this_text is string then
       set case_alphabet to lower_alphabet & upper_alphabet as string
   else
       display dialog "OH NO! WE'RE ALL GOING TO DIE!" buttons {"AAAGGHHH!"} default button 1 with icon caution
       error number -128
   end if
   repeat with i from 1 to count of this_text
       set this_char to character i of this_text
       considering case -- for speed and to customise 'offset' in Tiger
           set this_offset to offset of this_char in case_alphabet
       end considering
       if this_offset is not 0 then
           if use_capital then
               set end of new_text to character ((this_offset - 1) mod 26 + 27) of case_alphabet
               set use_capital to this_case is "UPPER"
           else
               set end of new_text to character ((this_offset - 1) mod 26 + 1) of case_alphabet
           end if
       else
           if (this_case is "Title" and this_char is in white_space) or (this_case is "Sentence" and this_char is in sentence_terminators and ¬
               i is not (count of this_text) and character (i + 1) of this_text is in white_space) then
               set use_capital to true
           end if
           set end of new_text to this_char
       end if
   end repeat
   
   set astid to AppleScript's text item delimiters
   set AppleScript's text item delimiters to ""
   tell new_text to set new_text to its beginning & its rest
   set AppleScript's text item delimiters to astid
   return new_text
end change_case

Unfortunately, kai, this script does handle spaces at the beginning and multiple spaces after periods. If this script were written in, say, C or Objective-C, wouldn't it be as fast as yours?


 

#20 2005-07-11 12:55:13 am

Qwerty Denzel
Member
Registered: 2005-06-11
Posts: 337

Re: Change Text Case

Using my rewrite of kai's, I can use this code to generate the input string and it still does not crash for me (using his reply to me as the source text wink). I don't know what the problem is?

Applescript:

set this_string to ""
repeat until (count of this_string) > 500000
   set this_string to this_string & "It may be worth noting that, on long strings, Qwerty's rewrite will break due to a stack overflow error. OMM, it tripped over a string (based largely on those tested earlier) of 57,586 characters. (As some of you will know, the overflow threshold can vary considerably, depending on string content - and possibly on OS, too. In the most extreme cases, it might even be as low as 4000 - 5000 characters.) In addition, a few sentence case issues still remain (to which I may now have a solution, of sorts). These include white space at the beginning of a string and multiple white spaces following a full point.

However, I'm slightly concerned that this thread is kinda 'growing like topsy' for a Code Exchange item. Since I feel somewhat responsible for much of the noise, now might be an appropriate time for me to bow out (to attend, anyway, to a rather pressing local issue) - and to leave the conclusion of this lively discussion to you good guys. I look forward to seeing the results with great interest. "

end repeat


 

#21 2005-07-11 04:47:15 am

Nigel Garvey
Moderator
From:: Warwickshire, England
Registered: 2002-11-20
Posts: 5512

Re: Change Text Case

Hi, Qwerty. Well spotted about the decimal point bug in my rewrite!

I haven't had time this weekend to study your version of kai's script, but I can confirm that both it and your latest version of your own seem to work well in both Jaguar and Tiger.

I see that in your rewrite of my rewrite, you've abandoned the "pre-tested this_case" optimisation. It might not make any noticeable difference here - unless this_string is very long - but as a general rule, it's best not to test things inside a repeat when this can be done beforehand and the results aren't going to change during the repeat.

For instance, the test 'this_case is "Title"' compares the individual characters of this_case with those of "Title". We need a case-insensitive comparison here, which takes longer because allowances are made for the fact that individual characters might not be exactly the same. The comparison produces a 'true' or 'false' result, which becomes the parameter for the 'if' statement.

Inside the repeat, this test process (and possibly the one for "Sentence"), or else the one for "UPPER", is performed with every character of this_string.

My version of the script tests each possible value of this_case just once, before the repeat, and simply feeds in the appropriate 'trues' and 'falses' during the repeat. This has an additional advantage in that it's easier to arrange for punctuation and white space to be tested in the faster 'considering case' mode.

You could do a similar, once-only thing with 'count of this_text'.
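Nigel's point about testing once, before the repeat, can be illustrated with a cut-down version of Qwerty's character-by-character approach. This is a minimal sketch, not either posted script: the handler name 'changeCaseHoisted' and its properties are hypothetical, only the upper, lower and title options are supported, and only unaccented a-z letters are converted. The option name and the length of the text are each evaluated once per call, so the loop itself only reads stored booleans. It uses the 'linefeed' constant, which assumes AppleScript 2.0 or later (older systems would use ASCII character 10).

```applescript
property caseAlphabet : "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
property wordBreaks : {space, tab, return, linefeed}

-- Hypothetical sketch: every test that can't change during the repeat is moved out of it.
on changeCaseHoisted(theText, theCase)
	-- These string comparisons (case-insensitive by default) happen once per call.
	set isUpper to theCase is "upper"
	set isLower to theCase is "lower"
	set isTitle to theCase is "title"
	if not (isUpper or isLower or isTitle) then error "Case must be upper, lower or title."
	set charCount to count theText -- counted once, not on every pass
	set useCapital to not isLower
	set outChars to {}
	considering case -- case-sensitive comparisons inside the loop
		repeat with i from 1 to charCount
			set thisChar to character i of theText
			set thisOffset to offset of thisChar in caseAlphabet
			if thisOffset is not 0 then
				-- position of the letter within a-z (1 to 26)
				set letterIndex to (thisOffset - 1) mod 26 + 1
				if useCapital then
					set end of outChars to character (letterIndex + 26) of caseAlphabet
					set useCapital to isUpper -- stored boolean, no string comparison
				else
					set end of outChars to character letterIndex of caseAlphabet
				end if
			else
				-- isTitle was worked out before the loop
				if isTitle and thisChar is in wordBreaks then set useCapital to true
				set end of outChars to thisChar
			end if
		end repeat
	end considering
	-- join the characters with no separator, then restore the delimiters
	set savedTIDs to AppleScript's text item delimiters
	set AppleScript's text item delimiters to ""
	set theResult to outChars as text
	set AppleScript's text item delimiters to savedTIDs
	return theResult
end changeCaseHoisted

changeCaseHoisted("hello  WORLD", "title") -- "Hello  World"
```

The same change would apply to a sentence option: storing the result of the "Sentence" test before the repeat means each character costs only a boolean check.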

Apologies if you were already familiar with these concepts.  smile


NG


 

#22 2005-07-11 05:05:41 am

Qwerty Denzel
Member
Registered: 2005-06-11
Posts: 337

Re: Change Text Case

No, I'm not really familiar with this. It seems so logical! Anyway, my rewrite of kai's cannot handle Unicode text, and so is inferior. We also need to watch out for things like 'I': in sentence case, it is not capitalized.


 

#23 2005-07-11 07:25:56 am

Adam Bell
Administrator
From:: Nova Scotia, Canada
Registered: 2005-10-04
Posts: 4666

Re: Change Text Case

Qwerty Denzel wrote:

However, I'm slightly concerned that this thread is kinda 'growing like topsy' for a Code Exchange item. Since I feel somewhat responsible for much of the noise, now might be an appropriate time for me to bow out (to attend, anyway, to a rather pressing local issue) - and to leave the conclusion of this lively discussion to you good guys. I look forward to seeing the results with great interest.


True the thread is growing like crazy, but think of the really valuable education embedded in this thread for a new scripter. It really does flesh out all the issues involved in what, at first anyway, seems like a straight-forward process. Perhaps when the experts agree that they've reached a "golden master" version, then that one would be flagged as "THE" version and the rest would be preserved as discussion.


Mac mini running 10.14.6, 2011 27" iMac as display.


 

#24 2005-07-23 11:08:38 pm

kai
Member
From:: Brighton, UK
Registered: 2005-05-28
Posts: 912

Re: Change Text Case

NovaScotian wrote:

True the thread is growing like crazy, but think of the really valuable education embedded in this thread for a new scripter. It really does flesh out all the issues involved in what, at first anyway, seems like a straight-forward process.


Apologies for the delayed response, NovaScotian. I believe I wrote the comments that prompted your reply - but I take your point about the possible usefulness of such discussions. (Indeed, I've found the discussion of differences in script behaviour between OS versions quite illuminating.) smile

I’ve therefore added a further version (below) that sacrifices some speed & brevity to address several of the issues discussed previously. In exploring these, it's clear that a comprehensive solution might be possible only with access to a substantial dictionary (proper nouns, acronyms, etc.) - which is perhaps beyond the scope of a relatively simple script such as this. (The script introduces various properties, partly to accelerate runtime execution, but also to accommodate any further adjustments that might be considered necessary.)

Nigel Garvey wrote:

Gosh! Nothing like a bit of confusion to start the day. wink If you and kai are both using Panther, I've no idea why your script works for you and not for him.


I might be able to clear up the confusion there, Nigel. At the time, I was working in Jaguar (not Panther) - which would explain the differences in behaviour. (However, for the record, the script below was written and tested in Tiger).

Qwerty Denzel wrote:

Using my rewrite of kai's, I can use this code to generate the input string and it still does not crash for me (using his reply to me as the source text smile). I don't know what the problem is?


The problem to which I was referring is a stack overflow error (errOSAStackOverflow: -2706), rather than a crash. In older versions of the Mac OS (including some versions of Mac OS X), the error can occur when the resulting number of string elements (text items, characters [items], words or paragraphs) exceeds about 4,060 (the precise figure can vary).

So, apart from getting a list of characters, the problem has very little to do with actual string length - since it depends primarily on the number of resulting string elements.

While such considerations may be of little concern to those using later versions of the Mac OS, they may still be worth noting for anyone interested in portability (for example, if a script is to be distributed generally - such as in a forum like this).

In later versions, the limit appears to have been removed. However, the algorithm to achieve this seems somewhat buggy - and, where there may be several thousand string elements involved, the dreaded, ever-spinning beach ball can appear. (The point at which this occurs appears to vary quite considerably, so it’s difficult to pin it down with any precision. OMM, it’s very likely to occur above 300,000 or 400,000 items - but has sometimes struck at around 60,000 items.)

Silent hanging aside, there also appear to be efficiency issues when getting particularly long lists of string elements - which may therefore take a disproportionate time to evaluate.
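Since the trouble depends on how many items a single 'text items' call returns, another way round it (not kai's approach, which halves the text recursively) is to cut the text into slices of a fixed number of characters before applying the delimiters. When the delimiter is a single character, as it is for each letter swap in a case change, a slice of N characters can never yield more than N + 1 text items, and no match can straddle two slices. This is a minimal sketch, assuming that a slice size of 4,000 characters stays under the roughly 4,060-item limit described above (the precise figure can vary); the handler name 'replaceCharChunked' is hypothetical.

```applescript
-- Hypothetical handler: replaces every findChar (a single character) with newChar,
-- working through theText in slices of at most sliceSize characters. A slice can
-- yield no more than sliceSize + 1 text items, whatever its content.
on replaceCharChunked(theText, findChar, newChar, sliceSize)
	set savedTIDs to AppleScript's text item delimiters
	set resultParts to {}
	set textLength to count theText
	considering case
		repeat with startPos from 1 to textLength by sliceSize
			set endPos to startPos + sliceSize - 1
			if endPos > textLength then set endPos to textLength
			-- split this slice on the old character...
			set AppleScript's text item delimiters to findChar
			set sliceItems to text items of (text startPos thru endPos of theText)
			-- ...and rejoin it with the new one
			set AppleScript's text item delimiters to newChar
			set end of resultParts to sliceItems as text
		end repeat
	end considering
	-- join the converted slices back together with no separator
	set AppleScript's text item delimiters to ""
	set theResult to resultParts as text
	set AppleScript's text item delimiters to savedTIDs
	return theResult
end replaceCharChunked

replaceCharChunked("banana", "a", "A", 4000) -- "bAnAnA"
```

A case change would call this once per letter ("a" to "A", and so on through the alphabet). With a multi-character delimiter, a match could be cut in two at a slice boundary, so this shortcut holds only for single characters.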

I've now modified my original ‘textItems’ handler (see script below) in an effort to side-step all three issues. So far, it seems to have worked quite effectively - as demonstrated by the following results (all the usual caveats for interpreting execution times apply):

                   execution time (secs)
number of           with        without
text items         handler      handler

     1000            0.01         0.01
    10000            0.2          0.5
    20000            0.4          3.8
    30000            0.7          7.8
    40000            1.1         11.7
    50000            1.4         25.1
    60000            1.9         40.8
    70000            2.0         42.7
    80000            2.7         68.5
    90000            3.6        127.8
   100000            4.7        138.2

The following version introduces an additional option for case type: "mixed" - a variation of title-case that renders definite and indefinite articles, conjunctions and prepositions as lowercase (except where they start a sentence):

Applescript:

-- syntax : changeCase of someText to caseType
-- someText (string) : plain or encoded text
-- caseType (string) : the type of case required ("upper", "lower", "sentence", "title" or "mixed")

-- "upper" : all uppercase text (no exceptions)
-- "lower" : all lowercase text (no exceptions)
-- "sentence" : uppercase character at start of each sentence, other characters lowercase (apart from words in sentenceModList)
-- "title" : uppercase character at start of each word, other characters lowercase (no exceptions)
-- "mixed" : similar to title, except for definite and indefinite articles, conjunctions and prepositions (see mixedModList) that don't start a sentence

property lowerStr : "abcdefghijklmnopqrstuvwxyzáàâäãåæçéèêëíìîïñóòôöõøœúùûüÿ"
property upperStr : "ABCDEFGHIJKLMNOPQRSTUVWXYZÁÀÂÄÃÅÆÇÉÈÊËÍÌÎÏÑÓÒÔÖÕØŒÚÙÛÜŸ"
property alphaList : lowerStr's characters & reverse of upperStr's characters
property sentenceBreak : {".", "!", "?"}
property wordBreak : {space, ASCII character 202, tab}
property everyBreak : wordBreak & sentenceBreak
property whiteSpace : wordBreak & {return, ASCII character 10}
property currList : missing value
property sentenceModList : {"i", "i'm", "i’m", "i've", "i’ve", "I’ve", "I've", "I’m", "I'm", "I"} (* could be extended to include certain proper nouns, acronyms, etc. *)
property mixedModList : {"By Means Of", "In Front Of", "In Order That", "On Account Of", "Whether Or Not", "According To", "As To", "Aside From", "Because Of", "Even If", "Even Though", "In Case", "Inside Of", "Now That", "Only If", "Out Of", "Owing To", "Prior To", "Subsequent To", "A", "About", "Above", "Across", "After", "Against", "Along", "Although", "Among", "An", "And", "Around", "As", "At", "Because", "Before", "Behind", "Below", "Beneath", "Beside", "Between", "Beyond", "But", "By", "De", "Down", "During", "Except", "For", "From", "If", "In", "Inside", "Into", "Like", "Near", "Of", "Off", "On", "Onto", "Or", "Out", "Outside", "Over", "Past", "Since", "So", "The", "Though", "Through", "Throughout", "To", "Under", "Unless", "Until", "Up", "Upon", "When", "Whereas", "While", "With", "Within", "Without", "Ye", "ye", "without", "within", "with", "while", "whereas", "when", "upon", "up", "until", "unless", "under", "to", "throughout", "through", "though", "the", "so", "since", "past", "over", "outside", "out", "or", "onto", "on", "off", "of", "near", "like", "into", "inside", "in", "if", "from", "for", "except", "during", "down", "de", "by", "but", "beyond", "between", "beside", "beneath", "below", "behind", "before", "because", "at", "as", "around", "and", "an", "among", "although", "along", "against", "after", "across", "above", "about", "a", "subsequent to", "prior to", "owing to", "out of", "only if", "now that", "inside of", "in case", "even though", "even if", "because of", "aside from", "as to", "according to", "whether or not", "on account of", "in order that", "in front of", "by means of"}

on textItems from currTxt
   tell (count currTxt's text items) to if it > 4000 then tell it div 2 to return ¬
       my (textItems from (currTxt's text 1 thru text item it)) & ¬
       my (textItems from (currTxt's text from text item (it + 1) to -1))
   currTxt's text items
end textItems

on initialCap(currTxt)
   tell currTxt to if (count words) > 0 then tell word 1's character 1 to if it is in lowerStr then
       set AppleScript's text item delimiters to it
       tell my (textItems from currTxt) to return beginning & upperStr's character ((count lowerStr's text item 1) + 1) & rest
   end if
   currTxt
end initialCap

to capItems from currTxt against breakList
   repeat with currBreak in breakList
       set text item delimiters to currBreak
       if (count currTxt's text items) > 1 then
           set currList to my (textItems from currTxt)
           repeat with n from 2 to count currList
               set my currList's item n to initialCap(my currList's item n)
           end repeat
           set text item delimiters to currBreak's contents
           tell my currList to set currTxt to beginning & ({""} & rest)
       end if
   end repeat
   currTxt
end capItems

on modItems from currTxt against modList
   set currList to modList
   set currCount to (count modList) div 2
   repeat with currBreak in everyBreak
       set text item delimiters to currBreak
       if (count currTxt's text items) > 1 then repeat with n from 1 to currCount
           set text item delimiters to my currList's item n & currBreak
           if (count currTxt's text items) > 1 then
               set currTxt to textItems from currTxt
               set text item delimiters to my currList's item -n & currBreak
               tell currTxt to set currTxt to beginning & ({""} & rest)
           end if
       end repeat
   end repeat
   currTxt
end modItems

to changeCase of currTxt to caseType
   if (count currTxt's words) is 0 then return currTxt
   
   ignoring case
       tell caseType to set {upper_Case, lower_Case, sentence_Case, title_Case, mixed_Case} to {it is "upper", it is "lower", it is "sentence", it is "title", it is "mixed"}
   end ignoring
   
   if not (upper_Case or lower_Case or title_Case or sentence_Case or mixed_Case) then
       error "The term \"" & caseType & "\" is not a valid case type option. Please use \"upper\", \"lower\", \"sentence\", \"title\" or \"mixed\"."
   else if upper_Case then
       set n to 1
   else
       set n to -1
   end if
   
   considering case
       set tid to text item delimiters
       
       repeat with n from n to n * (count lowerStr) by n
           set text item delimiters to my alphaList's item n
           set currTxt to textItems from currTxt
           set text item delimiters to my alphaList's item -n
           tell currTxt to set currTxt to beginning & ({""} & rest)
       end repeat
       
       if sentence_Case then
           set currTxt to initialCap(modItems from (capItems from currTxt against sentenceBreak) against sentenceModList)
       else if title_Case or mixed_Case then
           set currTxt to initialCap(capItems from currTxt against whiteSpace)
           if mixed_Case then set currTxt to initialCap(capItems from (modItems from currTxt against mixedModList) against sentenceBreak)
       end if
       
       set text item delimiters to tid
   end considering
   currTxt
end changeCase

set someText to "How far you go in life depends on your being TENDER with the YOUNG, COMPASSIONATE with the AGED, SYMPATHETIC with the STRIVING and TOLERANT of the WEAK and STRONG. Because SOMEDAY in your life you will have been ALL of these." (* George Washington Carver. *)

changeCase of someText to "upper" (* "upper", "lower", "sentence", "title" or "mixed" *)

Script subsequently edited to insert underscore characters in certain variable labels (see discussion below)

Results:

upper:
HOW FAR YOU GO IN LIFE DEPENDS ON YOUR BEING TENDER WITH THE YOUNG, COMPASSIONATE WITH THE AGED, SYMPATHETIC WITH THE STRIVING AND TOLERANT OF THE WEAK AND STRONG. BECAUSE SOMEDAY IN YOUR LIFE YOU WILL HAVE BEEN ALL OF THESE.

lower:
how far you go in life depends on your being tender with the young, compassionate with the aged, sympathetic with the striving and tolerant of the weak and strong. because someday in your life you will have been all of these.

sentence:
How far you go in life depends on your being tender with the young, compassionate with the aged, sympathetic with the striving and tolerant of the weak and strong. Because someday in your life you will have been all of these.

title:
How Far You Go In Life Depends On Your Being Tender With The Young, Compassionate With The Aged, Sympathetic With The Striving And Tolerant Of The Weak And Strong. Because Someday In Your Life You Will Have Been All Of These.

mixed:
How Far You Go in Life Depends on Your Being Tender with the Young, Compassionate with the Aged, Sympathetic with the Striving and Tolerant of the Weak and Strong. Because Someday in Your Life You Will Have Been All of These.


Apologies for the length of all this... smile

Last edited by kai (2005-07-24 11:02:16 am)


kai


 

#25 2005-07-24 07:34:35 am

Nigel Garvey
Moderator
From:: Warwickshire, England
Registered: 2002-11-20
Posts: 5512

Re: Change Text Case

Hi, Kai.

Thanks for your latest contribution to this thread. The script's pretty remarkable as its "sentence" mode even capitalises correctly after brackets and quotes, which don't (at first sight) seem to have been explicitly catered for! Maybe I'll see how it works when I've had more time to study it in detail.  smile

After you signed off last time, I worked out a script — less thorough than yours — that explicitly treated quotes and brackets as "whitish" space, but I didn't post it because I began to feel that "sentence" mode itself was a mistake — at least in the context of discussing techniques on this forum.

"Lower", "upper", and "title" modes are easy to implement and have already been adequately covered. Philosophically, they do explicit and grammatically irrelevant things to the text.

The only real use for "sentence" mode, though, is as a tidier-upper of bad typing. As you've already noted, it's far more complex to implement and involves context. The script needs to be versed in the proper nouns and acronyms of the language of the text. It also needs a thorough knowledge of that language's other grammatical features, many of which can be very difficult to handle. For instance, quoted speech in English:

"Look out!" he yelled.

She turned to Charles and asked, "How many sugars do you take?"


Or mixed contexts:

... and I realise that i is a reference in this repeat. (It's also a Welsh word for "to" or "for" and is the inclusive "and" in Polish.)


Unless "sentence" mode is given some specific, narrowly-defined purpose, it might be best to leave it to a fully-fledged application or to a 'has'-sized library.
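To make the quoted-speech problem concrete, here is a minimal sketch of the "whitish space" idea mentioned above. It is not Nigel's unposted script: the handler name 'capitaliseSentences' and its character lists are hypothetical, and it assumes the text has already been lowercased and that only a-z need capitalising. A terminator triggers a capital only when white space, a quote or a bracket follows it, so "5.3" is left alone. It reads the text with 'character i' rather than building a list of 'characters', which avoids the item-count limit discussed earlier. The 'linefeed' constant assumes AppleScript 2.0 or later.

```applescript
property lowerLetters : "abcdefghijklmnopqrstuvwxyz"
property upperLetters : "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
property terminators : {".", "!", "?"}
-- "whitish" characters: white space plus quotes and brackets
property whitish : {space, tab, return, linefeed, quote, "'", "(", ")", "[", "]"}

-- Hypothetical sketch: assumes theText is already lowercase.
on capitaliseSentences(theText)
	set outChars to {}
	set capNext to true -- the first letter of the text starts a sentence
	set afterTerminator to false -- true immediately after ".", "!" or "?"
	set charCount to count theText
	considering case
		repeat with i from 1 to charCount
			set thisChar to character i of theText
			set letterPos to offset of thisChar in lowerLetters
			if letterPos > 0 then
				-- a letter: capitalise it if a sentence is due to start
				if capNext then set thisChar to character letterPos of upperLetters
				set capNext to false
				set afterTerminator to false
			else if thisChar is in terminators then
				-- don't capitalise yet: "5.3" must not count as a sentence break
				set capNext to false
				set afterTerminator to true
			else if thisChar is in whitish then
				-- whitish after a terminator opens a new sentence;
				-- otherwise it leaves any pending capital in place
				if afterTerminator then set capNext to true
				set afterTerminator to false
			else
				-- digits and other punctuation cancel any pending capital
				set capNext to false
				set afterTerminator to false
			end if
			set end of outChars to thisChar
		end repeat
	end considering
	set savedTIDs to AppleScript's text item delimiters
	set AppleScript's text item delimiters to ""
	set theResult to outChars as text
	set AppleScript's text item delimiters to savedTIDs
	return theResult
end capitaliseSentences

capitaliseSentences("the price rose 5.3 percent. it fell again! (nobody knew why.) \"look out!\" he yelled.")
-- "The price rose 5.3 percent. It fell again! (Nobody knew why.) \"Look out!\" He yelled."
```

On that sample the sketch capitalises "It" and "Nobody" as intended, but it also turns "he yelled" into "He yelled", which is exactly the kind of context problem described above.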

By the way, there's a potential problem with your 'uppercase' and 'lowercase' variables. These words are commands in the Satimage OSAX. I wouldn't care to tell you what they do....  wink
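Besides the underscores used in the edited script, a general way round such terminology clashes is to enclose the variable names in vertical bars. This tells the compiler to treat them as plain user identifiers even if an installed scripting addition defines the same words as commands. This is a minimal sketch; the handler 'caseFlags' is hypothetical.

```applescript
-- Hypothetical handler. The bars make |uppercase| and |lowercase| ordinary
-- variable names even if an installed scripting addition defines
-- 'uppercase' or 'lowercase' as commands.
on caseFlags(caseType)
	set |uppercase| to (caseType is "upper")
	set |lowercase| to (caseType is "lower")
	return {|uppercase|, |lowercase|}
end caseFlags

caseFlags("upper") -- {true, false}
```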


NG


 
