text item delimiters out of memory error

Hi dear scripting friends

Today I ran into a really strange out of memory error while using text item delimiters. The script contains a well-known, common search-and-replace handler:

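on _SaR(_atext, _search, _replace)
	-- Remember the current delimiters so they can be restored afterwards
	set oldastid to AppleScript's text item delimiters
	-- Split the text at every occurrence of the search string
	set AppleScript's text item delimiters to _search
	set _atext to every text item of _atext
	-- Rejoin the pieces with the replacement string
	set AppleScript's text item delimiters to _replace
	set _atext to _atext as string
	set AppleScript's text item delimiters to oldastid
	return _atext
end _SaR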

The script reads a tab-delimited text file exported from a database, converts its text items to a list, and walks through the list items to perform a series of search-and-replace operations. It works well with some export files, but with one of them an out of memory error occurs.
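
For readers following along, the splitting step described above might look roughly like this. It is only a sketch: the handler name splitFields and the sample line are made up for illustration and are not taken from the actual script.

```applescript
-- Hypothetical helper (not part of the original script):
-- split one line of a tab-delimited export into a list of fields.
on splitFields(theLine)
	set oldTIDs to AppleScript's text item delimiters
	set AppleScript's text item delimiters to tab
	set theFields to text items of theLine
	-- Restore whatever delimiters were in force before
	set AppleScript's text item delimiters to oldTIDs
	return theFields
end splitFields

-- Made-up sample line with three fields
splitFields("MEV8230" & tab & "Berlin, Museum" & tab & "2008")
```

Restoring the previous delimiters inside the helper keeps later coercions in the main script from being affected by a leftover tab delimiter.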

All text files have the same format and differ only in their line count. The script runs without errors on export files containing more than 60 lines, but on some files, whether shorter or longer than 60 lines, the out of memory error occurs partway through, for example after the 36th line.

Script Debugger 4 shows that the error occurs within the SaR() handler, but I cannot work out what causes it. One thing I noticed is a variable value that shows up as “MEV…8230” in the SD result window, although the original export file contains only “MEV8230”.

Might this be related to diacritical marks that will not show up in BBEdit? The export files came from a Windows system.

The SaR handler converts spaces into “;” or “,”, or the other way round. There are a lot of SaR() calls in the script, and all of them work except in one particular situation that I have not yet identified.

I made sure that the handler does not use the same variable names as the script’s main body, but this made no difference.

Example code excerpt:


	
		set _stichw_o to stichwoerter of item _liste of _rec
		set _stichw to _SaR(_SaR(_ums(_stichw_o), ", ", " "), ",, ", ", ")
		set _temp to _SaR(_stichw_o & " " & _stichw, " ", ", ")
		set _temp to _SaR(_temp, ", ,", ",")
		set _temp to _SaR(_temp, ",", ";")
		
		set AppleScript's text item delimiters to "; "
		set _temp to text items of _temp
		set AppleScript's text item delimiters to ""
		
		set _temp to _remove_duplicates(_temp)
		set _temp to words of _temp
	
		set _tt to ""
		repeat with _word in _temp
			set _tt to _tt & (_word & "; ") as string
		end repeat
		--		set _tt to _tt as string
		set _temp to (characters 1 through -3 of _tt) as string
		set stichwoerter of item _liste of _rec to _temp
		set _new_dat to _new_dat & {item _liste of _rec}
		

It makes no difference whether the script runs on Tiger or Leopard. The out of memory error occurs on both in the same context.

The text processed follows this format:

I cannot understand why it breaks with some export files but not with others of the same format. Characters like “°” have been converted into the proper characters before the list is processed, by the way (see below).

The variable _atext is used only in the handler; it receives the text items from a list stored in a variable called _text in the main part of the script.

Really strange. Any ideas?

Best regards,
Thomas

(Chars replaced:)

property _suchumlaute : {"°", "¸", "ˆ", "·", "Æ’", "÷", "¹"}
property _umlaute : {"ä", "ü", "ö", "ß", "Ä", "Ö", "Ü"}

To track down the culprit, I would insert some log statements:


set _stichw_o to stichwoerter of item _liste of _rec
log "point 1"
set _stichw to _SaR(_SaR(_ums(_stichw_o), ", ", " "), ",, ", ", ")
(* Where is the handler _ums() *)
log "point 2"
set _temp to _SaR(_stichw_o & " " & _stichw, " ", ", ")
log "point 3"
set _temp to _SaR(_temp, ", ,", ",")
log "point 4"
set _temp to _SaR(_temp, ",", ";")
log "point 5"

set AppleScript's text item delimiters to "; "
set _temp to text items of _temp
set AppleScript's text item delimiters to ""
log "point 6"

set _temp to _remove_duplicates(_temp)
(* Where is the handler _remove_duplicates() *)
log "point 6b"
set _temp to words of _temp
log "point 7"

set _tt to ""
repeat with _word in _temp
	set _tt to _tt & (_word & "; ") as string
	log "loop >> " & _word
end repeat
--		set _tt to _tt as string
set _temp to (characters 1 through -3 of _tt) as string
log "point 8"
set stichwoerter of item _liste of _rec to _temp
log "point 9"
set _new_dat to _new_dat & {item _liste of _rec}
log "point 10"

on _SaR(_atext, _search, _replace)
	local _
	set AppleScript's text item delimiters to _search
	set _ to every text item of _atext
	set AppleScript's text item delimiters to _replace
	set _ to _ as string
	set AppleScript's text item delimiters to ""
	return _
end _SaR

Looking at the event log will then show which instruction triggers the error message.
That instruction may not be the true culprit, but it often helps to identify it.
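
As a complement to the log statements, a small wrapper can record which call failed and with what input. This is only a sketch: loggedSaR is a hypothetical name, and when memory is truly exhausted the logging itself might fail too.

```applescript
-- Copy of the search-and-replace handler above, so the sketch is self-contained
on _SaR(_atext, _search, _replace)
	local _
	set AppleScript's text item delimiters to _search
	set _ to every text item of _atext
	set AppleScript's text item delimiters to _replace
	set _ to _ as string
	set AppleScript's text item delimiters to ""
	return _
end _SaR

-- Hypothetical wrapper: log the failing step and its input, then re-raise the error
on loggedSaR(stepName, theText, searchText, replaceText)
	try
		return _SaR(theText, searchText, replaceText)
	on error errMsg number errNum
		log stepName & " failed on \"" & theText & "\": " & errMsg & " (" & errNum & ")"
		error errMsg number errNum
	end try
end loggedSaR

-- Made-up sample call
loggedSaR("point 2", "a, b, c", ", ", " ")
```

Because the wrapper re-raises the error, the script still stops where it did before; the event log simply gains one line naming the step and the text being processed.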

Yvan KOENIG (from FRANCE, Saturday 7 June 2008 17:08:02)

Thanks Yvan

I checked the log but could not find any entry that hinted at what causes the error. The handlers I did not post work fine in the former release 0.4.8. I added some SaR calls in 4.8.9, and these caused the out of memory error. I cannot see that any of the handlers used in 0.4.8 misbehaved or was faulty.

I am thinking about parsing the input files first to check which ASCII numbers they contain. Maybe that way I will find out why the error occurs.
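
Such a check could be sketched as follows. The handler name findOddCharacters and the choice of what counts as suspicious are assumptions made for illustration. The sketch relies on the id property of text, so it assumes AppleScript 2.0 (Leopard) or later; on Tiger, ASCII number would be needed instead.

```applescript
-- Hypothetical helper: report every code point outside printable ASCII.
-- Tab (9), linefeed (10) and return (13) are treated as expected.
on findOddCharacters(theText)
	set oddOnes to {}
	-- id of a one-character text is a single integer, so coerce to a list
	set theIDs to (id of theText) as list
	repeat with i from 1 to (count theIDs)
		set n to item i of theIDs
		if (n < 32 and (n is not in {9, 10, 13})) or (n > 126) then
			set end of oddOnes to {codePointIndex:i, codePoint:n}
		end if
	end repeat
	return oddOnes
end findOddCharacters

-- Made-up sample containing a bell character (code point 7)
findOddCharacters("ab" & (character id 7) & "c")
```

The reported index counts code points, which may differ from the character position if the text contains combining sequences.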

I suspect that the issue is related to some kind of control character in the input file. It is not certain that the export files all come from the same PC or the same user, so there might be a difference there.

I am thinking of something like this: if you have an Excel table, for example, containing columns A, B, and C, and in the layout you suppress B by moving C next to A and leave column B unformatted, what does Excel export for column B when the table is exported as tab-delimited text?

IMHO the problem lies not in the script itself but in the format of the input file. What I wanted to know is whether anyone has experienced the same behavior when processing tab-delimited text files from a PC database export with AppleScript, because some export files are processed properly while others cause the error.

I know that it is very difficult to find the cause. That’s why I am asking you and everyone in the forum. None of the tests I have made has given me a hint so far.

Thomas

Hi, Thomas.

I can’t tell from your post what’s causing the problem. It’s odd that you should get “out of memory” errors with shorter files rather than with longer ones! Maybe there’s something odd about your input file (as I see you’ve already conjectured) or a reference that isn’t always being properly resolved.

A part of your sample code that definitely uses much more memory than it needs to (though I don’t think it’s enough to cause your problem) is this:

set _temp to words of _temp

set _tt to ""
repeat with _word in _temp
	set _tt to _tt & (_word & "; ") as string
end repeat
set _temp to (characters 1 through -3 of _tt) as string

The repeat performs successive string concatenations, whose progressively longer results sit alongside each other in memory until such time as the system’s memory manager releases the memory holding the ones that no longer have variables pointing to them. So, if the original value of _temp is, say, “This is some text”, all of the following values will exist somewhere in memory by the end of the repeat and will stay there until the memory holding the “orphaned” ones is reassigned:

"This is some text"
{"This", "is", "some", "text"}
""
"; "
"This; "
"This; "
"; "
"is; "
"This; is; "
"; "
"some; "
"This; is; some; "
"; "
"text; "
"This; is; some; text; "

After the repeat, you’ve used the ‘(characters ... ) as string’ construction to drop the last "; ". You’ve previously set AppleScript’s text item delimiters to "", so the following two values are also created in memory:

{"T", "h", "i", "s", ";", " ", "i", "s", ";", " ", "s", "o", "m", "e", ";", " ", "t", "e", "x", "t"}
"This; is; some; text"

Obviously, the more words there are in the original string and the longer it is, the more and longer will be the intermediate values generated before the final result’s reached.

The ‘characters’ list can be avoided by using ‘text’ instead of ‘(characters ... ) as string’:

set _temp to text 1 through -3 of _tt

But even that isn’t necessary, since the whole process so far can be replaced with a simple coercion of the list of words directly to string, using "; " as the delimiter:

set _temp to words of _temp
  
set AppleScript's text item delimiters to "; "
set _temp to _temp as string

The equivalent items left in memory are then just:

"This is some text"
{"This", "is", "some", "text"}
"; "
"This; is; some; text"
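
Wrapped up as a reusable handler, the coercion approach might look like this. It is a minimal sketch: the name joinList is hypothetical, and restoring the previous delimiters is an addition to the snippet above.

```applescript
-- Hypothetical helper: join a list of strings with a delimiter,
-- restoring whatever delimiters were in force beforehand.
on joinList(theList, theDelimiter)
	set oldTIDs to AppleScript's text item delimiters
	set AppleScript's text item delimiters to theDelimiter
	set joined to theList as text
	set AppleScript's text item delimiters to oldTIDs
	return joined
end joinList

joinList(words of "This is some text", "; ") --> "This; is; some; text"
```

No trailing "; " is produced, so nothing has to be trimmed off afterwards.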

I presume that your variable _new_dat holds a list of records. Concatenating {item _liste of _rec} to it results in a new list that has one more item than the original. If instead you use:

set end of _new_dat to item _liste of _rec

... the original list is simply made one item longer. (Technically, a new list may be created if the current one outgrows its memory allocation, but officially it’s the same list.)
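
The difference between the two forms can be seen in a small experiment. The handler and variable names are made up for the demonstration.

```applescript
-- Hypothetical demonstration of appending in place versus concatenating
on compareAppendStyles()
	-- Appending in place: both variables still refer to the same list
	set listA to {1, 2}
	set sharedA to listA
	set end of listA to 3
	
	-- Concatenating: listB now refers to a new list, sharedB to the old one
	set listB to {1, 2}
	set sharedB to listB
	set listB to listB & {3}
	
	return {sharedA, sharedB}
end compareAppendStyles

compareAppendStyles() --> {{1, 2, 3}, {1, 2}}
```

The second variable sees the new item only in the first case, which shows that concatenation builds a separate list each time it is used.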

Hi Nigel

You are a hero! I am quite sure that your explanation pinpoints the issue! I will rewrite the script today following your advice and keep these things in mind for the future. I was never happy with (characters xy through z) as string, as it sometimes broke values. It’s really great to know how differently AppleScript handles memory internally depending on the phrasing used.

If all works well, as I expect, I will buy you a mug full of kudos! :cool:

Best regards,
Thomas

Hi Nigel

Your hints helped me get around the “out of memory” error. Thanks so far. But the script now fails when processing a particular German umlaut in a string: as I can see in the log, it’s the “ü” in the word “Sehenswürdigkeiten”. It seems to have something to do with “Ãœ” and “ü” and might be related to upper- and lowercase letters.

I will keep investigating this issue, but if you (or anyone else) have any hints, that would be seriously helpful.
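
One thing that may be worth ruling out is case handling. In AppleScript 2.0 (Leopard), text item delimiters respect considering and ignoring clauses and ignore case by default, whereas earlier versions were always case-sensitive. Whether that explains the failure here is only a guess. The sketch below forces a case-sensitive replacement; the handler name replaceCaseSensitive and the sample text are made up.

```applescript
-- Hypothetical variant of the search-and-replace handler that always matches case
on replaceCaseSensitive(theText, searchText, replaceText)
	set oldTIDs to AppleScript's text item delimiters
	considering case
		-- Split only where the search text matches exactly, including case
		set AppleScript's text item delimiters to searchText
		set theParts to text items of theText
	end considering
	set AppleScript's text item delimiters to replaceText
	set newText to theParts as text
	set AppleScript's text item delimiters to oldTIDs
	return newText
end replaceCaseSensitive

-- Only the lowercase umlaut should be replaced
replaceCaseSensitive("Über über", "ü", "ue")
```

If the result differs between a plain call and the considering case version, the upper- and lowercase suspicion would be confirmed.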

TIA!
Thomas