Repeat loop running slow in 10.5

GordonF · September 19, 2008, 4:10pm

Why would a repeat loop built in 10.4.11 on a 4-year-old PowerPC Mac tower complete almost instantly, while the same loop, run in 10.5 on brand-new Intel machines, takes minutes to complete?

repeat while x is 1
	repeat with n from 1 to TotalParagraphs
		if second text item of paragraph n of NameList is FindName then
			
			(*do some things*)
			
			set x to 0
			
			exit repeat
			
			(*If the name is not found after searching all the paragraphs, alert the user, and give them the option to quit the program, or retry typing in the name*)
			
		else if n is TotalParagraphs then
			display dialog ((FindName as string) & " Not Found!") buttons {"Re-Enter", "Cancel"} default button "Re-Enter"
			set SearchRetry to button returned of result
			if SearchRetry is "Re-Enter" then
				(*Prompt for the desired student / staff and store the answer*)
				display dialog "Enter full name:" default answer "Name Here"
				set FindName to text returned of result
				set x to 1
			end if
		end if
	end repeat
end repeat

The script as a whole works, but for some reason it gets super bogged down at the repeat loop. Once it finds the name, it burns through the rest of the script in seconds. But the repeat loop crawls. Did something change with repeat loops between 10.4.11 and 10.5?

Nigel_Garvey · September 19, 2008, 9:15pm

Unlikely. But the default text type changed: from string to Unicode text. Depending on how your NameList is derived, it might be that it’s a string in 10.4.11, in which case AppleScript will parse it more quickly than the Unicode text it definitely is in OS 10.5.

Without knowing the full details of what’s going on in the script, it’s difficult to make recommendations, but a couple of suggestions would be: 1) break NameList into a list of paragraphs and parse the ‘items’ of this list rather than the ‘paragraphs’ of the original; and 2) if the parsing doesn’t need to be case-insensitive, enclose the entire repeat in a ‘considering case’ block.

set theParagraphs to NameList's paragraphs

considering case
	repeat while x is 1
		repeat with n from 1 to TotalParagraphs
			if second text item of item n of theParagraphs is FindName then
				
				(*do some things*)
				
				-- Blah blah blah.
				
			end if
		end repeat
	end repeat
end considering

It’ll still be slower than in 10.4.11, but should be faster than what you’re getting in 10.5 at the moment. Otherwise, the devil may be in that (do some things)…

GordonF · September 21, 2008, 2:25pm

NameList is derived from a CSV file that I’m dumping to memory with the following:

	set fp to open for access NameListFile
	set NameList to read fp
	close access fp

NameListFile is the CSV file that has been created with Excel. Is it possible that something has changed with the way that 10.5 is reading the file?

Adam_Bell · September 21, 2008, 3:32pm

You don’t have to open for access to read a file – just read it.
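A minimal sketch of that shortcut, assuming NameListFile is an alias or file specifier as in the script above. The path here is a placeholder for illustration only, not one taken from the original script.

```applescript
-- Placeholder path for illustration only; substitute the real CSV file.
set NameListFile to POSIX file "/Users/Shared/names.csv"

-- 'read' accepts a file specifier directly, so no 'open for access' /
-- 'close access' pair is needed for a simple one-shot read.
set NameList to read NameListFile
```

Opening the file explicitly is still useful when writing, or when reading a file in several pieces.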

porkozone · September 22, 2008, 3:42pm

I’ve noticed this behavior as well. Several areas of my script that use repeat loops take an agonizingly long time under 10.5, although they were not an issue before that version of OS X.

Nigel_Garvey · September 24, 2008, 2:13pm

Hi, Gordon. Sorry for this late reply. My Internet machine’s non-functional at the moment, so I’ve had to join my local library and book a session on one of its Dells.

GordonF:

NameList is derived from a CSV file that I’m dumping to memory with the following:
	set fp to open for access NameListFile
	set NameList to read fp
	close access fp
NameListFile is the CSV file that has been created with Excel. Is it possible that something has changed with the way that 10.5 is reading the file?

Not with the way the file’s being read, but with what’s being returned to the script.

The AppleScript in OS 10.5 uses UTF16 Unicode text internally, whereas with OS 9.0 to 10.4, it used one-byte-per-character text (‘string’ or ‘text’) by default, or UTF16 Unicode (‘Unicode text’) if a coercion was done or if that’s what was returned by an application or OSAX. In AppleScript 2.0 (OS 10.5), ‘string’, ‘text’, and ‘Unicode text’ all mean UTF16 Unicode text.

The ‘read’ command still assumes, in the absence of an ‘as’ parameter, that what it’s reading is in the old ‘string’ format, but now returns it to the script as Unicode text instead of as string.

By default, the ‘write’ command still writes data to a file in the format in which it’s presented, so it now writes text to file as Unicode unless the ‘as string’ parameter’s used. Here, ‘string’ retains its old meaning and doesn’t mean UTF16 as it does in the core language.
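One way to sidestep the defaults is to state the format explicitly with the ‘as’ parameter on both commands. The following is only a sketch: both paths are placeholders, and it assumes the file really is UTF-8. If the file is in the old one-byte format, ‘as string’ would be the parameter to use instead.

```applescript
-- Placeholder paths for illustration only; substitute real files.
set NameListFile to POSIX file "/Users/Shared/names.csv"
set OutputFile to POSIX file "/Users/Shared/names-copy.txt"

-- Make the encoding explicit when reading (assumes the file is UTF-8).
set NameList to read NameListFile as «class utf8»

-- Make the encoding explicit when writing, too.
set fp to open for access OutputFile with write permission
try
	set eof fp to 0 -- truncate any existing contents
	write NameList to fp as «class utf8»
	close access fp
on error errMsg number errNum
	close access fp -- release the file before re-raising the error
	error errMsg number errNum
end try
```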

Hope that makes sense.

GordonF · September 25, 2008, 6:08pm

So, since it’s being returned as Unicode, and not a string, it’s going to take longer?

And it doesn’t sound like there is a way around this, right? Is there another way to access the file that won’t bog down the script?

porkozone · October 7, 2008, 7:23pm

That’s what I don’t get either - how is a difference in text encoding causing such a slowdown, and is there a way to fix/avoid it?

Nigel_Garvey · October 8, 2008, 12:36am

Well I don’t know the intimate details of how text is compared in AppleScript. I can only surmise from general principles.

The main thing that would make comparing Unicode texts take longer than comparing strings is that, whereas each character in a string is a single byte, a UTF16 character might be two bytes or more. It probably doesn’t take any longer to compare two-byte values than it does one-byte values, but it must take longer to analyse each two-byte value first to see if it needs to be combined with more two-byte values to make a four- or six-byte character. The text handling routines don’t know what’s in the text until it’s passed to them, so they have to do this analysis even if you yourself know that the text only contains two-byte characters.

I’d guess that, internally, AppleScript doesn’t need to worry whether the UTF16’s big-endian or little-endian.

In post #2 above, I suggested a couple of possibilities for regaining some of the lost text-handling speed in GordonF’s script.

1) If you’re going to be examining every paragraph in the text, break the text into a list of paragraphs first. Referencing the paragraphs directly from the text requires AppleScript to work through from the beginning each time, identifying the paragraph endings, until it’s located the text between paragraph ending (n - 1) and paragraph ending n. When building a list of paragraphs, it can simply go straight from the paragraph it’s at to the next. Locating each item in the list is (basically) just a matter of AppleScript doing some maths with the index and the list base address and doing a couple of pointer hops.
2) If you don’t need case-insensitivity, use ‘considering case’. When ignoring case (the default behaviour), AppleScript has to decide whether or not two characters which may be different are actually different cases of the same letter. When considering case, it can simply compare them directly.
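Both suggestions can be combined in one handler. This is a sketch under stated assumptions rather than GordonF’s actual script: the handler name indexOfName is made up for illustration, the fields are assumed to be separated by plain commas with no quoted fields, and an exact, case-sensitive match is assumed to be acceptable.

```applescript
-- Hypothetical handler: returns the index of the first paragraph whose second
-- comma-separated field is FindName, or 0 if no paragraph matches.
on indexOfName(NameList, FindName)
	-- Split the text into a list once, rather than asking for 'paragraph n' each time.
	set theParagraphs to paragraphs of NameList
	
	-- Assumption: fields are separated by commas (the source is a CSV file).
	set savedDelimiters to AppleScript's text item delimiters
	set AppleScript's text item delimiters to ","
	
	set foundIndex to 0
	considering case
		repeat with n from 1 to (count theParagraphs)
			set thisParagraph to item n of theParagraphs
			-- Skip blank or malformed lines that have no second field.
			if (count text items of thisParagraph) > 1 then
				if text item 2 of thisParagraph is FindName then
					set foundIndex to n
					exit repeat
				end if
			end if
		end repeat
	end considering
	
	-- Restore the delimiters so the rest of the script is unaffected.
	set AppleScript's text item delimiters to savedDelimiters
	return foundIndex
end indexOfName
```

Called as indexOfName(NameList, FindName), a result of 0 would be the cue to show the "Not Found!" dialog and ask for the name again.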

With large texts, it may be possible to get faster results using shell script commands, but someone else would have to tell you how to use those.
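For anyone who wants to try the shell route, here is one possible sketch using ‘do shell script’ with awk. It is untested against the original data and rests on several assumptions: the path and name are placeholders, the file uses Unix (linefeed) line endings, and the fields are plain comma-separated values with no quoted commas.

```applescript
-- Placeholder values for illustration only.
set NameListPath to "/Users/Shared/names.csv" -- POSIX path to the CSV file
set FindName to "Jane Doe"

-- awk splits each line on commas (-F,) and prints the number of the first
-- line whose second field equals the name, then stops reading.
set shellCommand to "awk -F, -v name=" & quoted form of FindName & ¬
	" '$2 == name { print NR; exit }' " & quoted form of NameListPath
set matchLine to do shell script shellCommand

if matchLine is "" then
	-- No line matched; this is where the retry prompt would go.
	display dialog (FindName & " Not Found!")
else
	set n to matchLine as integer -- line number of the match
end if
```

‘quoted form of’ is used so that spaces or apostrophes in the name or path don’t break the command. A file saved with carriage-return line endings would look like a single line to awk and would need converting first.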

don.grinker · December 20, 2008, 7:36pm

I have noticed that a repeat gets slower the longer the repeat is.

For instance, I have a script that copies one file at a time to a server from a list of many files. After it gets to about 10 files in during the repeat, it gets progressively slower. It has increased the total time of my process fourfold or more. In Tiger 10.4.11 it flew…

I hope there is an answer or solution soon.