How to speed up AppleScript?

Thanks Adam

I’ll give it a bash tomorrow and let you know what happens.

Just out of curiosity, is the list not loaded into memory already by assigning it to a variable?

kind regards

EM

Without getting into gory details, it’s because a list or string in a script object exists as a separate entity, and a much faster algorithm is used for finding its parts than is used when you ask for “item j of Blah” or compare as you are doing. It can be five or six times faster with long lists.

Well it certainly is faster. Thanks for the tip on Script objects Adam. I’ll be using it a lot more.

However, my original problem still remains. It still slows down halfway through. It gets to the point where it does only 2 loops per second. With almost 15 000 left to do you can see where this is going.

Any ideas?

Kind regards

EM

I would suggest running a sed or awk shell script from within AppleScript. (sed and awk are both pattern matching “programs”.) I know those are more root programs and have the impression that they’re faster. Now, that being said, I don’t know too much about them myself. However, there is a lot available with a little googling.

A sed script to copy every line that contains “aa” from file1.txt to file2.txt would be:
(run in Terminal; the -n option stops sed from also printing every line it reads, so only the matching lines are written)
sed -n '/aa/p' file1.txt > file2.txt

(better sed scripters may correct me on this)

I expect that it’s slowing down because you’re using so much memory that it’s beginning to page it to disk. Disk swapping is extremely slow, so you’ll have to segment the task.

I agree with even_dana. I might write the data to a text file and use a grep shell script to check for the existence of the items in question.
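
A minimal sketch of that grep idea, assuming the verified paths have already been written to a text file, one per line. The temp file name, the sample paths and the handler name path_is_listed are made up for illustration. grep's -F option treats the pattern as a fixed string, -x requires a whole-line match and -q suppresses output; do shell script raises an error when grep exits with a non-zero status, so a try block turns that into a boolean.

```applescript
-- Returns true if the_path appears as a complete line in list_file.
-- Any grep failure (no match, or an unreadable file) is reported as false.
on path_is_listed(the_path, list_file)
	try
		do shell script "grep -Fxq -- " & quoted form of the_path & " " & quoted form of list_file
		return true
	on error
		return false
	end try
end path_is_listed

-- Hypothetical sample data: one verified path per line in a temp file.
set list_file to "/tmp/verified_paths_demo.txt"
do shell script "printf '%s\\n' 'Disk:Folder:a.mov' 'Disk:Folder:b.mov' > " & quoted form of list_file

set found_b to path_is_listed("Disk:Folder:b.mov", list_file)
set found_c to path_is_listed("Disk:Folder:c.mov", list_file)
```

Each do shell script call starts a new process, so calling this once per file has its own cost; whether it beats an in-memory list check would need to be timed on the real data.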

Adam

What would be the best approach in doing this. I suspect you are right about the disk paging.

I have tried creating temp files with 1000 paths in each saved file as list. (A sort of cache if you wish) Then using another loop I am reading each files contents and in theory looping through only 1000 paths at a time, yet the main “meat” loop still runs slow.

Is there a way to free the memory after it’s used?

regards

EM

This is odd. The machine is not nearly out of memory. My Mac has 3.5 GB of RAM. When my app runs it uses about 20 MB of RAM and between 420MB and 440MB of virtual memory.

I also run a few other apps at the same time.

The machine always has 1.5 GB RAM free.

What gives?

regards
EM

Hi EM

… are you running Tiger or Leopard ?

AppleScript Release Notes for Mac OS X version 10.5
http://developer.apple.com/releasenotes/AppleScript/RN-AppleScript/index.html

Bug Fixes
AppleScript
* The delay command uses less CPU. [3178086]
* Impossible object specifiers in math expressions, such as 1 + character 2 of “9”, produce an error instead of a random result. [4029175]
* The numerical limits for repeat loops accept real numbers; they will be rounded to integers. [4215670]
* Object specifiers other than application and date specifiers, in particular alias “…” and POSIX file “…”, are not evaluated at compile time, and will be left exactly as originally typed. [4444698]
→ * AppleScript no longer limits script memory usage to 32MB. [4511477]
* Counting the paragraphs of an empty string gives a result of zero. [4588706]
* Raw data literals («data …») are no longer limited to 127 bytes. [4986420]

Thanks for your reply clemhoff

I am on Tiger. If this is my problem, how can I clear the 32 MB to make space for the next round in the loop?

I tried to delete the variables (which I set at the start of the repeat loop) right at the end of the repeat block, just before it goes back for the next loop. Does it actually do anything when you delete the variable?

The slowdown still happens though. I have abandoned the cache file idea because the effects are the same. As the loop continues it still slows down. I am starting to think that it definitely has something to do with memory and, as you discovered, it might be an AppleScript limitation.

There MUST be a way around this. I will try to optimize the “main” handler and then post it here later for others to view.

I am sure there are faster ways of doing what I am trying to achieve.

cheers

EM

OK here is what I have so far. Thanks to whoever sits through all this !

on Begin_CopySecure() --current handler
	set START_TIME to current date
	set END_TIME to "Not Set"
	ControlProgressIndeterminate(true)
	DisableCopySecureButton()
	set theGraphicOn to my Working_In_Graphic("Capture")
	set theState to my Check_CaptureLocation_BackupLocation()
	if theState is true then
		my Info_Display("Gathering files to process ...")
		delay 0.5
		set theCaptureLocationFileList to my Get_Location_Files(CaptureLocation)
		--set Files_Buildt to my BuildCacheFiles(theCaptureLocationFileList)
		my Info_Display("Gathering Verified files from Log ...")
		set theVerifiedList_in_Capture_LOG to Get_Verified_CaptureList_From_SQL()
		--set UnVerifiedList to {}
		set MaxCount to count items in theCaptureLocationFileList
		my SETControlProgress(0, MaxCount)
		--my Info_Display("Finding Un-Verified files...")
		set x to 0
		
		my Info_Display("Allocating memory...") ---remove after testing
		
		script Verified_Items
			property VerCapInLog : missing value
			property CapLocList : missing value
		end script
		
		my Info_Display("Setup Verified list...") ---remove after testing
		set Verified_Items's VerCapInLog to theVerifiedList_in_Capture_LOG
		my Info_Display("Setup Capture list...") ---remove after testing
		set Verified_Items's CapLocList to theCaptureLocationFileList
		
		my Info_Display("Start process...")
		
		set x to 0
		set MaxCount to count items in theCaptureLocationFileList
		my SETControlProgress(0, MaxCount)
		repeat with theFile in Verified_Items's CapLocList
			if ABort_Copy is true then
				--	set ABort_Copy to false
				exit repeat
			end if
			if theFile is in Verified_Items's VerCapInLog then
				my Info_Display("Verified: " & (theFile as string))
				
			else --not verified
				
				set theFile to theFile as alias
				set FileinCaptureLog to GetFileinCaptureLog(theFile)
				
				if FileinCaptureLog = "" then
					set theGraphicOn to my Working_In_Graphic("Capture")
					set CaptureFile_MD5_Value to my Get_MD5_CHECKSUM_VALUE(theFile)
					my SaveFilein_Capture_LOG(theFile, CaptureFile_MD5_Value)
					set FileinCaptureLog to GetFileinCaptureLog(theFile)
					--set FileinCaptureLog to Set_File_to_ONLINE(FileinCaptureLog)
				else
					--set FileinCaptureLog to Set_File_to_ONLINE(FileinCaptureLog)
				end if
				--set FileinCaptureLog to GetFileinCaptureLog(theFile)
				try
					set HasVerified to item 6 of FileinCaptureLog
				on error
					set HasVerified to "NO"
				end try
				if HasVerified = "YES" then
					my Info_Display("Verified: " & (theFile as string))
					--Scan next file
				else
					set LoopCount to 0
					repeat
						if ABort_Copy is true then
							--	set ABort_Copy to false
							exit repeat
						end if
						set LoopCount to LoopCount + 1
						set theGraphicOn to my Working_In_Graphic("Backup")
						set FileinBackupLog to GetFileinBackupLog(theFile)
						--display dialog "File in Backup LOG: " & FileinBackupLog
						if FileinBackupLog = "" then -- file is not in log
							--display dialog "file not in log look for in file system"
							set theFileinBackup_FileSystem to my GetFileinBackup_FileSystem(theFile)
							
							--display dialog "found " & (item 2 of theFileinBackup_FileSystem)
							if item 2 of theFileinBackup_FileSystem = "File Path does not exist-error 1" then
								try
									set ParentDirinBackup to my GetParentDirinBackup_FileSystem(theFile)
								on error
									set ParentDirinBackup to my CreateParentDirinBackup_FileSystem(theFile)
								end try
								set theGraphicOn to my Working_In_Graphic("All_OFF")
								my progressWheel_Run(true)
								set File_Has_Copied_with_MD5Value_For_Capture_File to my Copy_File_from_Capture_to_Backup(theFile, ParentDirinBackup)
								my progressWheel_Run(false)
								--my SaveFilein_Backup_LOG(theFile, File_Has_Copied_with_MD5Value_For_Capture_File)
							else
								--try
								if item 1 of theFileinBackup_FileSystem is true then
									set theGraphicOn to my Working_In_Graphic("BackupGraphic")
									set Backup_File_MD5_Value to Get_MD5_CHECKSUM_VALUE(item 2 of theFileinBackup_FileSystem)
									--display dialog Backup_File_MD5_Value
									
									my SaveFilein_Backup_LOG(item 2 of theFileinBackup_FileSystem, Backup_File_MD5_Value)
									--display dialog "Saved in Backup Log"
								end if
								--end try
							end if
							
						else -- file is in log 
							set theGraphicOn to my Working_In_Graphic("BackupGraphic")
							set theFileinBackup_FileSystem to my GetFileinBackup_FileSystem(theFile)
							try
								if item 1 of theFileinBackup_FileSystem is true then
									set CaptureFile_MD5 to my GetFileinCaptureLog(theFile)
									set BackupFile_MD5 to my GetFileinBackupLog(item 2 of theFileinBackup_FileSystem)
									set AppleScript's text item delimiters to "|"
									set CaptureFile_MD5 to every text item in CaptureFile_MD5
									try
										set CaptureFile_MD5 to item 8 of CaptureFile_MD5
									on error
										set CaptureFile_MD5 to 1
									end try
									set BackupFile_MD5 to every text item in BackupFile_MD5
									try
										set BackupFile_MD5 to item 8 of BackupFile_MD5
									on error
										set BackupFile_MD5 to 0
									end try
									set AppleScript's text item delimiters to ""
									set theGraphicOn to my Working_In_Graphic("Compare")
									set MD5_Compares to my Compare_MD5_Checksum(CaptureFile_MD5, BackupFile_MD5)
									if MD5_Compares = true then
										my Save_File_as_VERIFIED(theFile, CaptureFile_MD5)
									else
										my DeleteFile_from_BackupLOG_AND_FileSYSTEM(item 2 of theFileinBackup_FileSystem)
									end if
								else -- file not in file system
									if item 2 of theFileinBackup_FileSystem = "File Path does not exist-error 1" then
										try
											set ParentDirinBackup to my GetParentDirinBackup_FileSystem(theFile)
										on error
											set ParentDirinBackup to my CreateParentDirinBackup_FileSystem(theFile)
										end try
										my progressWheel_Run(true)
										set theGraphicOn to my Working_In_Graphic("All_OFF")
										set File_Has_Copied to my Copy_File_from_Capture_to_Backup(theFile, ParentDirinBackup)
										my progressWheel_Run(false)
									end if
									
								end if
							end try
						end if
						
						if LoopCount is 3 then
							exit repeat
						end if
						set theGraphicOn to my Working_In_Graphic("All_OFF")
					end repeat
				end if
				
			end if --file verified
			---scan next file
			my IncrementControlProgress(1)
			set x to x + 1
			RemainingInfo("Remaining: " & (MaxCount - x))
		end repeat
		ControlProgressIndeterminate(true)
		delay 0.5
		if ABort_Copy is true then
			my Info_Display("Finishing last command... Waiting to Abort ")
		else
			my Info_Display("Get Files in Backup Location...")
			set theGraphicOn to my Working_In_Graphic("Backup")
			
			set theBackupLocationFileList to my Get_Location_Files(BackupLocation)
			---Is file in dest log? if it is generate md5
			my Info_Display("Get Files in Backup Log...")
			
			set Files_in_Backup_Log to my Get_Files_In_Backup_Log()
			
			script Backup_Items
				property FileinLog : missing value
				property BackLocList : missing value
			end script
			
			my Info_Display("Loading into memory...") ---remove after testing
			set Backup_Items's FileinLog to Files_in_Backup_Log
			set Backup_Items's BackLocList to theBackupLocationFileList
			
			set x to 0
			set MaxCount to count items in theBackupLocationFileList
			my SETControlProgress(0, MaxCount)
			
			repeat with someFile in Backup_Items's BackLocList
				if ABort_Copy is true then
					--set ABort_Copy to false
					exit repeat
				end if
				
				if someFile is in Backup_Items's FileinLog then
					--Set_File_in_Backup_to_ONLINE(someFile) 
					--Set_File_in_Backup_to_Verified(someFile)
					try
						my Info_Display("Files in Backup ONLINE: " & someFile)
					end try
				else
					try
						set BackupMD5 to my Get_MD5_CHECKSUM_VALUE(someFile)
					on error
						set BackupMD5 to "FAILED"
					end try
					my SaveFilein_Backup_LOG(someFile, BackupMD5)
					(*if BackupMD5 is "FAILED" then
					--
				else
					
					--Set_File_in_Backup_to_Verified(someFile)
					--Set_File_in_Backup_to_ONLINE(someFile)
				end if*)
				end if
				try
					my IncrementControlProgress(1)
					set x to x + 1
					my RemainingInfo("Remaining: " & (MaxCount - x))
				end try
			end repeat
		end if
		---SCAN SOURCE LOG FOR OFFLINE FILES
		
		--set DELETE_OFFLINE_Capture_Files_In_Log to my Delete_Offline_Files_in_Log("Capture")
		--my Set_ALL_BackupLogfiles_to_OFFLINE()
		---Read next file in dest dir
		(*if ABort_Copy is true then
			--
		else
			set DELETE_OFFLINE_Capture_Files_In_Log to my Delete_Offline_Files_in_Log("Backup")
		end if*)
	else
		my DisplayAlerttoUSER("Capture & destination error")
		
	end if
	
	my Info_Display("Run Complete !")
	EnableCopySecureButton()
	RemainingInfo("")
	set AppleScript's text item delimiters to ""
	set END_TIME to current date ---remove time stuff after testing start , remember current date at top of handler
	set FINAL_TIME_TAKEN to (END_TIME - START_TIME) / 60
	set FINAL_TIME_TAKEN to FINAL_TIME_TAKEN as string
	if (count of characters in FINAL_TIME_TAKEN) is greater than 4 then
		set FINAL_TIME_TAKEN to characters 1 thru 4 of FINAL_TIME_TAKEN as string
	end if
	my Info_Display("Run Started on " & (START_TIME as string))
	my Info_Display("Run Ended on " & (END_TIME as string))
	my Info_Display("Run Took " & (FINAL_TIME_TAKEN as string) & " minutes to complete.") ---remove time stuff after testing end
end Begin_CopySecure

So hopefully I got it right and the verified items should cause an early next repeat. This is the part that I hope to speed up and where applescript also gets slower and slower as I iterate through the list of items.

Thanks again to anyone who made it to this point

Kind regards

EM

Any ideas anyone?

Hello EM,

This might be worth a try. The entire operation of creating a list of 25,000 strings and then searching it for one specific string took less than one second on my machine.

icta



-- time the following operation:
set start_time to (time of (current date)) -- start timing

set my_list to {}
-- use the "a reference to" operator: 
set my_list_ref to a reference to my_list

-- create a list of 25,000 strings:
set number_of_items to 25000
repeat with x from 1 to number_of_items
	set this_string to "String_" & x
	copy this_string to the end of my_list_ref
end repeat

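-- look for a specific string:
set string_does_exit to false -- defined up front so the log line below works even when there is no match
if my_list_ref contains "String_24000" then
	set string_does_exit to true
end if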

set end_time to (time of (current date)) -- stop timing.
set elapsed_time to end_time - start_time

log "1. my_list is shown below:"
log my_list
log return
log "2. elapsed_time is " & elapsed_time & " seconds."
log return
log "3. string_does_exit is: " & string_does_exit & "."


Thanks for the reply, I’ll give it a go and report back.

What does the “reference to” operator do?

kind regards

EM

So I tried the suggested “a reference to” operator.

There is no significant speed increase and the process still slows down as the loop runs.

Thanks anyway.

What does this “a reference to” operator do ?

Adam suggested that I segment the task. How do I do that ?

regards

EM

Hello EM,

Sorry for the long delay.

The Applescript Language Guide PDF defines the “a reference to” operator on pages 203, 204 and 205. Here’s a link: http://developer.apple.com/documentation/applescript/conceptual/applescriptlangguide/AppleScriptLanguageGuide.pdf It says it has several uses, one of which is the one I mentioned about accessing items in a list efficiently. It also gives example scripts that you can run to see for yourself. I can assure you it does work, although loading the whole script as a script object, as Adam Bell suggested, may provide the same speed gain. (I will try it when I have time, to see how it compares).

However, since you did NOT see any difference in speed, I wonder if your delay might be coming from somewhere else in your script. As with most complex problems, I think your best approach would be to “divide and conquer.” Since you suspect the list-checking part of the script, why not make a copy of your script and then delete or comment out all but the list checking. Also, I couldn’t tell where the file name strings you were checking were coming from. If they are coming from another list, bear in mind that you might want to use the “a reference to” operator for both lists. Can you load the entire list of 24000 file name strings into a list and then check your other list of name strings without doing any logging, data checking or other work? If so, and it turns out to run without bogging, work outward from there.

Here are some wild guesses:

How are you building your lists? I’m pretty sure that 24,000 cycles of:

set my_list to my_list & this_string

will take a lot longer than:

copy this_string to the end of my_list

and it should be even faster with:

copy this_string to the end of my_list_ref (where my_list_ref is a reference to my_list)

How about the data checking? Could your script be comparing the data to every item in the list? Can you remove the data checking and then run the script to see how it compares?

How about timing every piece of work so you can really see where the slowdown occurs?
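
One way to do that timing, as a rough sketch: keep a running total of seconds per stage and log the totals at the end. The handler do_stage_one is a hypothetical stand-in for one piece of the real loop. Note that current date only has one-second resolution, so the figure for a single pass is of little use; only totals accumulated over many iterations tell you anything.

```applescript
-- Hypothetical stand-in for one stage of the real loop.
on do_stage_one()
	set total to 0
	repeat with i from 1 to 1000
		set total to total + i
	end repeat
	return total
end do_stage_one

set stage_one_seconds to 0

repeat with n from 1 to 50
	set t0 to current date
	do_stage_one()
	-- Subtracting two dates gives whole seconds.
	set stage_one_seconds to stage_one_seconds + ((current date) - t0)
end repeat

log "Stage one total: " & stage_one_seconds & " seconds"
```

With one accumulator per handler call in the main loop, the stage whose total keeps growing fastest is the one to look at first.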

Well, I’m sure you know all this. My guess is that you were hoping someone could spot the problem without you having to slog through a major rebuild. Sometimes the hard way is the only way. Be encouraged, I’m sure you can get it.

Wishing you complete success,
icta

Hi.

I haven’t studied the big script in the middle of this thread yet, so excuse me for that.

The “a reference to” technique works on the same principle as the script object idea mentioned near the top of the thread, except that the script object solution is wrongly implemented. The idea is to use a reference to the list variable while accessing the list’s items.

The a reference to operator sets up a reference in another variable:

set my_list to {lots of items} 
set my_list_ref to a reference to my_list
--> my_list of «script»

my_list_ref refers to my_list as something belonging to the script. Through a quirk of AppleScript, accessing the list’s items using the reference is very much faster than accessing them using my_list directly. The process can be made slightly faster still by simply putting my in front of the original variable:

item i of my my_list
-- as opposed to:
item i of my_list_ref

The reference is the same in both cases, but it has to be retrieved and interpreted from my_list_ref, whereas it’s written directly into the script with my my_list.

The script object technique is another way of implementing a “directly into the script” reference. It comes into its own within handlers, where variables are local and temporary and can’t be referenced. Assigning a list to a property of a script object assigns it to a variable that belongs to something, so a reference can be set up in that context:

local my_list
set my_list to {lots of items}

script V
	property l : missing value
end script

set V's l to my_list

my_list is a local variable. V’s l is a reference. The list is the same physical object in both cases.

To get the best out of these techniques, you have to use a numbered repeat index, not the “repeat with theFile in …” construction. And, ideally, commands like “is in”, which operate on the list itself rather than on specified items, should not use a reference.

Since you’re comparing string paths, and the cases of matching paths are likely to be the same, you could get a further speed increase by enclosing the process in a considering case block. This allows AppleScript simply to compare the strings to see if they’re the same, instead of checking each character to see if it has an equivalent in another case that has to be taken into account.

Since ABort_Copy has a boolean value, you could use it directly after if instead of comparing it with true first and using the boolean result of that. The difference is minimal, but real.

-- way up at the beginning of your script:
script V
	property CapLocList : missing value
end script
-- read the data in, then
set V's CapLocList to theCaptureLocationFileList
-- set things up as necessary, then

considering case
	repeat with i from 1 to (count theCaptureLocationFileList)
		set theFile to item i of V's CapLocList
		---at this point theFile is a string
		---at this point theCaptureLocationFileList is a list of strings
		---at this point theVerifiedList_in_Capture_LOG is a list of strings (about 25 000 items to process)
		if (ABort_Copy) then exit repeat
		if (theFile is in theVerifiedList_in_Capture_LOG) then
			my Info_Display("Verified: " & (theFile as string)) --- very slow.
		else --not verified
			set theFile to theFile as alias
			set FileinCaptureLog to GetFileinCaptureLog(theFile)
		end if
	end repeat
end considering

It also helps if the texts being compared are of the same class, i.e. both string or both Unicode text. And if theVerifiedList_in_Capture_LOG were a single, linefeed-delimited text rather than a list, that would be an advantage too. (This might not be desirable if you’re constantly updating it, though.)
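
A minimal sketch of the linefeed-delimited idea, using made-up sample paths in place of theVerifiedList_in_Capture_LOG. The list is joined into one text with a linefeed before and after every path, and the search term is wrapped in linefeeds too, so only whole paths match. ASCII character 10 is used for the linefeed here on the assumption that it is the form available on Tiger.

```applescript
-- Hypothetical sample data standing in for theVerifiedList_in_Capture_LOG.
set verified_list to {"Disk:Folder:a.mov", "Disk:Folder:b.mov"}
set LF to (ASCII character 10)

-- Join the list into a single text, with a linefeed at both ends
-- so that every path is bracketed by delimiters.
set old_delims to AppleScript's text item delimiters
set AppleScript's text item delimiters to LF
set verified_text to LF & (verified_list as string) & LF
set AppleScript's text item delimiters to old_delims

considering case
	-- A complete path matches.
	set found_b to (verified_text contains (LF & "Disk:Folder:b.mov" & LF))
	-- A partial path does not, because of the bracketing linefeeds.
	set found_partial to (verified_text contains (LF & "Disk:Folder:b" & LF))
end considering
-- found_b is true; found_partial is false
```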

That’s about all I can say, AppleScriptwise, on the given information. As if_confused_then_ask says, what goes on in your other handlers could be relevant. If one of them adds items to theVerifiedList_in_Capture_LOG, that could slow things down when the list gets very long. AppleScript can only append a certain number of items to a list before it has to replace it with a new list in a larger block of reserved memory. This process is invisible to the script, but takes longer the more items there are in the list already.

Hello EM,

I thought of one more thing that could slow your script down. Please do not be insulted if you have known this for eons. It seems like nearly everyone would know this but I’m new to the list and don’t know if it has been reported or even if it is widely known:

Are you watching Script Editor’s Event Log History window for feedback as your script runs? If you are, it will certainly slow your script down. I have included a short script to illustrate the difference. If you didn’t know about this behavior (or for anyone who might not), run the script once with the Event Log History window open and then run it again with the window closed and compare the times. I ran it on my machine and chose a folder containing 3251 items. Here are the results:

Event Log History window open: 317 seconds
Event Log History window closed: 64 seconds

Since your script is doing a lot more work, it could amount to a much greater difference.

With a list of 24,000 items you will surely want some feedback. You could insert a counter into your repeat loop and count up to, say, 100 and then report to a TextEdit file (report once every 100 files). Or, for the most minimal feedback, launch Activity Monitor and watch that just to make sure the script is still running.
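
A sketch of that kind of throttled feedback, using the mod operator instead of a separate counter. The interval and item count are made-up values, and log stands in for whatever reporting call the real script uses.

```applescript
set report_every to 100 -- interval suggested above
set total_items to 1050 -- hypothetical item count
set reports_made to 0

repeat with i from 1 to total_items
	-- ... the per-file work would go here ...
	
	-- Report only on every 100th item instead of on every pass.
	if i mod report_every is 0 then
		set reports_made to reports_made + 1
		log "Processed " & i & " of " & total_items
	end if
end repeat
-- reports_made is 10
```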

Best wishes,
icta


set my_files_list to {}
set item_count to 0

set start_time to (time of (current date))

tell application "Finder"
	
	set source_folder to choose folder with prompt "Select a folder containing LOTS of files"
	
	set entire_contents to entire contents of source_folder
	set item_count to count of entire_contents
	
	repeat with this_item in entire_contents
		if class of this_item is not folder then -- ignore folders
			set this_file_path_string to this_item as string
			copy this_file_path_string to end of my_files_list
		end if
	end repeat
	
	set end_time to (time of (current date))
	set elapsed_time to end_time - start_time
	display dialog "elapsed_time is: " & elapsed_time & " seconds." & return & return & "item_count is: " & item_count
	
end tell

Thanks for the replies folks.

if_confused_then_ask and Nigel

Thanks for the ideas. I am aware of the slowdown with Script Editor ;). Also, I implemented the “a reference to” operator after the script object idea. I never tried timing them exactly and as a result didn’t notice any significant increase in speed.

However, there was an increase when I implemented the script object (it was VERY obvious), and if the results are the same for both the “a reference to” operator and the script object then I’m sure it does work!

The slowdown occurs not when creating the list, since this all happens way at the top. The list is created by reading an sqlite db file. This process is pretty quick. I will however look at that section again, even if only to improve it by a second or two :stuck_out_tongue:

The script actually slows down as it loops through each item, after about 3000-4000 items to be more precise. I suppose it could have something to do with the memory limitation in applescript.

I will try the tips Nigel gave and report back.

Thanks again folks
EM

As I understand it, you are reading an SQLite table of 25000 records into an AppleScript variable, then looping through that list to test if something exists. Is that right? No wonder it’s slow :wink:

Why don’t you use the SQLite database for the searching, since it’s much better at it than AppleScript lists. Your pseudo code would be something like:

SQLSelect("select exists (select 1 from MyHugeTableOfFiles where FileName = '" & myFileNameToMatch & "');")

which will return 1 if it exists, 0 if it doesn’t.

If your search is more complex, then you are still much better off doing the actual search in SQL rather than transferring a pile of data into AppleScript and searching there.
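
A rough sketch of how that lookup could be run from AppleScript through the sqlite3 command-line tool. It assumes sqlite3 is on the shell path; the database path, the handler names sql_quote and file_is_logged, and the demo table contents are hypothetical, with the table and column names borrowed from the pseudo code above. Single quotes in the file name are doubled so the name is safe inside the SQL string literal.

```applescript
-- Wraps a value as an SQL string literal, doubling any single quotes.
on sql_quote(the_text)
	set old_delims to AppleScript's text item delimiters
	set AppleScript's text item delimiters to "'"
	set parts to text items of the_text
	set AppleScript's text item delimiters to "''"
	set escaped to parts as string
	set AppleScript's text item delimiters to old_delims
	return "'" & escaped & "'"
end sql_quote

-- Returns true if file_name exists in the (hypothetical) table.
on file_is_logged(db_path, file_name)
	set sql to "select exists (select 1 from MyHugeTableOfFiles where FileName = " & sql_quote(file_name) & ");"
	set the_result to do shell script "sqlite3 " & quoted form of db_path & " " & quoted form of sql
	return the_result is "1"
end file_is_logged

-- Build a small demo database so the sketch runs on its own.
set db_path to "/tmp/copysecure_demo.db"
do shell script "rm -f " & quoted form of db_path
set setup_sql to "create table MyHugeTableOfFiles (FileName text); create index FileNameIndex on MyHugeTableOfFiles (FileName); insert into MyHugeTableOfFiles values ('Disk:Folder:a.mov');"
do shell script "sqlite3 " & quoted form of db_path & " " & quoted form of setup_sql

set found_a to file_is_logged(db_path, "Disk:Folder:a.mov")
set found_b to file_is_logged(db_path, "Disk:Bob's:b.mov")
```

The index on FileName is what keeps each lookup cheap as the table grows. If the script already has its own way of sending SQL to the database, the same select statement should work through that instead of do shell script.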

Tom
BareFeet