(InDesign) Find bad hyphens - Working but could use some optimization

Hi everyone,

I now have working code to find “bad” hyphens in InDesign CS3 (double hyphens, and hyphenation of words containing an upper-case letter). It should work as is in CS2 once the application name is changed.

The script works, but it’s pretty slow (10 minutes to go through a 360-page document), and I’m sure many of the things I’ve done can be done better.

So, I’d be grateful if anyone could give me hints on what to optimize.

Among the things I guess could be done better:

  • checking for the presence of a letter in a string (checkcaps handler)
  • detecting double hyphens directly one after the other: I convert the word to ASCII to check for the presence of two ASCII 45 in a row. The Unicode ref for the soft hyphen seems to be 2d00, but I have no idea how to check for it in a string…

Fabrice

Here’s the code:


tell application "Adobe InDesign CS3"
	set storynumber to count document 1 each story --number of stories in current document
	set theList to {}
	set lineNumber to {}
	repeat with m from 1 to storynumber
		set end of lineNumber to count story m of document 1 each line --for every story, add "number of lines in story" to the list linenumber
	end repeat
	tell document 1
		repeat with n from 1 to storynumber -- for every story in the document
			tell story n -- talk to selected story
				repeat with i from 1 to item n of lineNumber --for every line in the current story
					if length of line i is greater than 1 then --check that line is not empty
						tell line i
							set lastword to last word
							set asciicodes to my converttoascii(last word) --converts the last word to a list of its characters' ascii numbers 
							set verifBaseline to ((baseline of first character of last word) is not (baseline of last character of last word)) --if the baseline of the first character of the last word of the current line is different from the baseline of its last character then verifBaseline is true
							set verifHyphen to (characters of lastword contains "-") and (asciicodes does not contain {45, 45}) --if the last word of the current line contains a hyphen and the word is not rightfully hyphenated between words then verifHyphen becomes true
							set verifCaps to my checkcaps(lastword) -- verifCaps is true if the last word contains an uppercase letter
							if ((verifBaseline and verifHyphen) or (verifBaseline and verifCaps)) then --if either (the word contains a hyphen and is on two lines) OR (it has an upper-case letter and is on two lines)
								set selection of application "Adobe InDesign CS3" to last word
								select (last word)
								tell application "Adobe InDesign CS3"
									set thePage to (parent of parent text frames of selection) --move the view to the page containing the word to be corrected
									set active page of layout window 1 to thePage
									tell layout window 1
										set zoom percentage to 100
									end tell
								end tell
								set choix to ""
								repeat until choix is "Suivant" --user can unhyphen the word or move to the next or stop the script
									if hyphenation of last word is true then --if word is hyphenated, allow user to unhyphen it
										display dialog "Que faire avec " & lastword buttons {"Dé-césurer", "Suivant", "Stop"}
										set choix to button returned of result
										if choix is "Dé-césurer" then
											set hyphenation of last word to false
										end if
										if choix is "Stop" then
											return
										end if
									end if
									if hyphenation of last word is false then --if word is not hyphenated, allow user to rehyphen it
										display dialog "Que faire avec " & lastword buttons {"Re-césurer", "Suivant", "Stop"}
										set choix to button returned of result
										if choix is "Re-césurer" then
											set hyphenation of last word to true
										end if
										if choix is "Stop" then
											return
										end if
									end if
								end repeat --move to next word
								set end of theList to ({lastword, name of thePage}) --record the word and its initial page number in theList
							end if
						end tell
					end if
				end repeat
			end tell
		end repeat
	end tell
	set AppleScript's text item delimiters to (",")
	set theListText to theList as string
	display alert theListText
end tell
to checkcaps(passedword) --checks if each character of the passed word contains a letter from the capitalletters list 
	local capitalized, capitalletters
	set capitalized to false as boolean
	set capitalletters to {"A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z", ASCII character 131, ASCII character 174, ASCII character 128, ASCII character 129, ASCII character 130, ASCII character 132, ASCII character 133, ASCII character 134, ASCII character 203, ASCII character 204, ASCII character 205, ASCII character 206, ASCII character 229, ASCII character 230, ASCII character 231, ASCII character 232, ASCII character 233, ASCII character 234, ASCII character 235, ASCII character 236, ASCII character 237, ASCII character 238, ASCII character 239, ASCII character 241, ASCII character 242, ASCII character 243, ASCII character 244} as list
	repeat with zz from 1 to length of passedword
		considering case --required to force script to differentiate lower case from higher case
			if capitalized = false then
				set capitalized to (item zz of characters of passedword is in capitalletters) --becomes true when a capital letter is found in the passedword
			end if
		end considering
	end repeat
	return capitalized
end checkcaps
to converttoascii(passedword) --converts each character of the passed word to its ascii number and collects them in asciilist, which is returned
	set asciilist to {} as list
	repeat with zz from 1 to length of passedword
		set end of asciilist to ASCII number (item zz of characters of passedword)
	end repeat
	return asciilist
end converttoascii
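
For the two points raised at the top of the post (the checkcaps handler and the double-hyphen test), a lighter approach might look like the sketch below. It is only a sketch: the handler names hasUpperCase and hasSingleHyphen are made up for illustration, and the string of accented capitals is an assumption to adjust for your own texts. ASCII 45 is the ordinary hyphen-minus, so a plain string test for "--" should do the same job as converting the word to a list of ASCII numbers.

```applescript
-- Returns true if passedWord contains at least one upper-case letter.
-- Hypothetical replacement for checkcaps; stops at the first match.
on hasUpperCase(passedWord)
	-- assumption: extend this string with any other accented capitals you need
	set upperLetters to "ABCDEFGHIJKLMNOPQRSTUVWXYZÀÂÄÇÈÉÊËÎÏÔÖÙÛÜ"
	considering case
		repeat with aChar in characters of passedWord
			if upperLetters contains (contents of aChar) then return true
		end repeat
	end considering
	return false
end hasUpperCase

-- Returns true if passedWord contains a hyphen but no doubled hyphen.
-- Hypothetical replacement for the converttoascii / {45, 45} test.
on hasSingleHyphen(passedWord)
	return (passedWord contains "-") and (passedWord does not contain "--")
end hasSingleHyphen
```

Inside the tell block these would be called as my hasUpperCase(lastword) and my hasSingleHyphen(lastword). Returning as soon as a capital is found avoids walking the rest of the word, and the string test removes the per-character ASCII conversion.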

Hi Fabrice,

a few suggestions:

  • omit all tell application "Adobe InDesign CS3" statements (and also the "… of application …" references) except the main one at the beginning
  • it saves time to use one if - else - end if form instead of multiple if - end if statements when you check the same condition
  • your Re-césurer / Dé-césurer part can be simplified with the code below; it just toggles the boolean status until the “Suivant” button is pressed
  • (re)set AppleScript's text item delimiters to the default {""} before the display alert line to avoid unexpected behavior


...
if ((verifBaseline and verifHyphen) or (verifBaseline and verifCaps)) then --if either (the word contains a hyphen and is on two lines) OR (it has an upper-case letter and is on two lines)
	set selection to last word
	set thePage to (parent of parent text frames of selection) --move the view to the page containing the word to be corrected
	set active page of layout window 1 to thePage
	tell layout window 1
		set zoom percentage to 100
	end tell
	set choix to ""
	repeat until choix is "Suivant"
		set H to hyphenation of last word --if word is hyphenated, allow user to unhyphen it
		display dialog "Que faire avec " & lastword buttons {item ((H as integer) + 1) of {"Re-césurer", "Dé-césurer"}, "Suivant", "Stop"}
		set choix to button returned of result
		if choix contains "césurer" then
			set hyphenation of last word to not H
		else if choix is "Stop" then
			return
		end if
	end repeat
	set end of theList to ({lastword, name of thePage}) --record the word and its initial page number in theList
end if
...
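
The delimiter advice above can be sketched in plain AppleScript as follows. This is an illustration only: the flat list wordsAndPages and its contents are made-up sample data, not output from the script, and savedDelimiters is a hypothetical variable name.

```applescript
-- hypothetical sample data standing in for the collected words and page names
set wordsAndPages to {"auto-completion", "12", "l'Automobile", "48"}

-- remember the current delimiters, join the list, then put them back
set savedDelimiters to AppleScript's text item delimiters
set AppleScript's text item delimiters to ", "
set theListText to wordsAndPages as string
set AppleScript's text item delimiters to savedDelimiters

display alert theListText
```

Saving and restoring the previous value, rather than hard-coding {""}, keeps the snippet safe even if some other part of the script has already changed the delimiters.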

I may be misunderstanding the point of it, but I’m not sure your hyphen check is necessary. Are you aware that InDesign’s hyphenation controls have a built-in option to limit a word’s hyphens?

:slight_smile:

Yes I am, though:

  1. It will soft-hyphenate a word that already contains a hyphen (auto-completion, for example, may be hyphenated as auto-comple-tion)
  2. It will hyphenate a word containing an upper-case letter that is not its initial letter (l’Automobile will be hyphenated even though the option not to hyphenate capitalized words is on)

In French, we prefer to avoid both.

Fabrice

Hey StefanK,

Thanks a lot for the tips.

As I understand it, being inside a tell line block requires me to include the tell application: in your new code sample, for example, I get an error if I don’t reinstate a tell application.

I’ll rebuild the if blocks as if - else.

Indeed, your code for the dialog is much cleaner.

And I will reset the delimiters; I didn’t think of that…

Thanks a lot

Fabrice

Just FYI:
1.) The default behaves like that, but setting the Hyphen Limit preference to 1 will disallow further (auto) hyphenation of compound words.
2.) Depending on how many instances of the case differences you are dealing with, it would likely be faster to add a discretionary hyphen (“^-” as string) as the first character; this will prevent these words from breaking.

Hi Marc Anthony,

  1. Hyphen Limit: it limits the number of consecutive LINES that can end in a soft hyphen; it has no effect on the number of hyphens within a word.
  2. Usually only a couple in a 300-400 page book, but the challenge is to find them :slight_smile: Once they are found, is there an advantage to adding a discretionary hyphen to a word instead of setting its hyphenation property to false?

Fabrice

Hello again Fabrice,
1.) This is my mistake and I apologize for passing bogus info. I had tested this out on a word and it worked for me, but it appears that it was a combination of that word’s particular length, my hyphenation zone and accompanying rules - not the hyphen limit itself.
2.) Yes. Discretionary hyphens could be helpful for the French words with latent caps. Unless there has been a change in CS3 (I don’t have that version to check), hyphenation is a property of a paragraph, not a word; I suspect that when you’re instructing the last word to not hyphenate, that you are affecting the whole paragraph. The advantage to using discretionary hyphens is that they allow you to set custom break points, or deny them altogether on a per-word basis (with the caveat that, if you use them in conjunction with a standard hyphen (compound word), you can still set the desired break, but you lose the ability to deny further autohyphenation).

Ouch, that’s very valuable info Marc Anthony, I must say I didn’t bother to check if hyphenation was a word or paragraph property… So I guess I’ll resort to your method, thanks a lot !

I can’t seem to figure out how to add a discretionary hyphen.

Neither “^-” as string nor ASCII character 173 works… Any idea?

Fabrice

It might be easier to just add it from the menu to get this character initially: go to Type > Insert Special Character > Discretionary Hyphen. You can then do a find/replace.

I’m afraid it doesn’t work either… Strangely, adding “<00AD>” to last word works, whereas adding it to a variable set to last word doesn’t.

I’m not in love with the way InDesign/AS handle those special characters :frowning:

Fabrice

Let’s say the specific text, l’Automobile, is a problem for you. Firstly, tell the application to:

set find preferences to nothing
set change preferences to nothing

then tell the active doc to:

search for "l’Automobile" replacing with "^-l’Automobile"

That word is now an exception to autohyphenation.
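
The commands above use the older find/replace syntax. In CS3 the same idea would, as far as I know, be written with the find text preferences and change text preferences objects; the sketch below assumes that vocabulary and an open document, and has not been tested against every CS3 build.

```applescript
-- Sketch only: CS3-style find/change, assuming a document is open.
tell application "Adobe InDesign CS3"
	-- clear any settings left over from a previous search
	set find text preferences to nothing
	set change text preferences to nothing

	-- "^-" is the find/change metacharacter for a discretionary hyphen
	set find what of find text preferences to "l’Automobile"
	set change to of change text preferences to "^-l’Automobile"
	tell document 1 to change text

	-- clear the preferences again so later searches start clean
	set find text preferences to nothing
	set change text preferences to nothing
end tell
```

The apostrophe in the search string has to match the one actually used in the document (typographic or straight), otherwise nothing will be found.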

Ok so let me just chime in here for a minute, what you are all working on here is really cool :cool:

what I am looking for is that I have one word that contains a hyphen and will often get hyphenated

so what would be the best way to automate the correction of this problem

example:

“t–
strom”

mm