Localised weekday?

Slow? LOL

AWK was written to outperform the shell by a wide margin, and it succeeded. The weird thing is that I would expect cut to be faster, but that doesn’t seem to hold on every machine: on my machine there is no difference, on some machines cut is faster and on others awk is. AWK is pretty awesome and insanely fast for an interpreter.

These are my results:

It is now an exact match.

For instance, when you look up ‘Tue’ you would normally get a ‘contains’ match (it would also match ‘Tuesday’). But when you wrap ^…$ around it (begins with and ends with), it is an exact match.
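A small shell sketch of the difference between a ‘contains’ match and an anchored exact match. The function names count_contains and count_exact and the sample lines are illustrative, not from the thread. It assumes the word contains no regex metacharacters; a word such as ‘févr.’ would need escaping, or grep -x -F.

```shell
#!/bin/sh
# Count lines of stdin that merely CONTAIN the word (no anchors).
count_contains() { grep -c -- "$1"; }

# Count lines of stdin that ARE the word: ^ and $ pin the match to the
# start and end of the line, so 'Tue' no longer matches 'Tuesday'.
count_exact() { grep -c -- "^$1\$"; }

# Illustrative input: an abbreviated and a full weekday name, one per line.
sample='Tue
Tuesday'

printf '%s\n' "$sample" | count_contains Tue   # 2
printf '%s\n' "$sample" | count_exact Tue      # 1
```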

Good! Thanks.

Hello

Here is a version using ASObjC Runner.


--weekday of (current date) as text
"Wed"
timeLocalStrings(result, "en_US", "fr_FR")

on timeLocalStrings(str, _from, _to)
	set vName to path to startup disk as text
	set fPath to vName & "usr:share:locale:" & _from & ".UTF-8:LC_TIME"
	set tPath to vName & "usr:share:locale:" & _to & ".UTF-8:LC_TIME" -- vName already ends with ":"
	tell application "System Events"
		{exists disk item fPath, exists disk item tPath}
	end tell
	if result contains false then return str
	paragraphs of (read file fPath)
	tell application "ASObjC Runner"
		set maybe to look in list result matching str
	end tell
	if maybe = {} then return str
	item 1 of maybe
	paragraph result of (read file tPath)
	return result
end timeLocalStrings

Yvan KOENIG (VALLAURIS, France) Wednesday 15 August 2012 18:10:15

I have not timed it against perl, and maybe it does as well as cut when there is just one line of input.

I have no doubt that cut will outperform awk on any OS X from Tiger onwards if, say, there are over 100 lines of text to be cut.

Most of the time I use sed when I can, to overcome the slowness of awk; that is, when I don’t need such a big scripting language to process the input. Sed is also an interpreter, though much smaller and faster.

How perl performs compared with awk would be interesting to see. My initial guess would be that it is faster, but I have no knowledge on the matter.
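To compare these tools fairly they have to do the same job. As a sketch, here is the locale-name extraction from handler #2 (awk -F. '{print $1}') written once each for cut, awk and sed; the function names and the sample input are hypothetical. Timing each one over a few thousand lines, for example with bash’s time keyword, would show how they rank on a given machine, and as noted above that ranking may differ from machine to machine.

```shell
#!/bin/sh
# Three equivalent ways to drop everything from the first '.' onward,
# e.g. "fr_FR.UTF-8" -> "fr_FR".
with_cut() { cut -d. -f1; }
with_awk() { awk -F. '{ print $1 }'; }
with_sed() { sed 's/\..*//'; }

# Illustrative input standing in for `ls /usr/share/locale | grep -i '.utf-8$'`.
names='en_US.UTF-8
fr_FR.UTF-8
nl_NL.UTF-8'

# Run the same input through each tool; all three print the same lines.
for f in with_cut with_awk with_sed; do
  printf '%s\n' "$names" | "$f"
done
```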

Hello

For fun, I compared three handlers, running each of them 1000 times.

With handler #1:


set beg to current date
repeat 1000 times
	--weekday of (current date) as text
	"Wed"
	timeLocalStrings(result, "en_US", "fr_FR")
end repeat
(current date) - beg
--> 460

on timeLocalStrings(str, _from, _to)
	tell application "System Events"
		set validLocales to name of every folder of folder "Macintosh HD:usr:share:locale:" whose name contains ".UTF-8"
	end tell
	{validLocales contains _from & ".UTF-8", validLocales contains _to & ".UTF-8"}
	
	if result contains false then return str
	set vName to path to startup disk as text
	set fPath to vName & "usr:share:locale:" & _from & ".UTF-8:LC_TIME"
	set tPath to vName & "usr:share:locale:" & _to & ".UTF-8:LC_TIME"
	paragraphs of (read file fPath)
	tell application "ASObjC Runner"
		set maybe to look in list result matching str
	end tell
	if maybe = {} then return str
	item 1 of maybe
	paragraph result of (read file tPath)
	return result
end timeLocalStrings

With handler #2 (DJ Bazzie Wazzie’s):



set beg to current date
repeat 1000 times
	--weekday of (current date) as text
	"Wed"
	timeLocalStrings(result, "en_US", "fr_FR")
end repeat
(current date) - beg
--> 37

on timeLocalStrings(str, _from, _to)
	set validLocales to every paragraph of (do shell script "ls /usr/share/locale | grep -i '.utf-8$' | awk -F. '{print $1}'")
	if _from is not in validLocales or _to is not in validLocales then return str
	set fPath to "/usr/share/locale/" & _from & ".UTF-8/LC_TIME"
	set tPath to "/usr/share/locale/" & _to & ".UTF-8/LC_TIME"
	set lineNumber to (do shell script "cat " & quoted form of fPath & " | sed -n '/^" & str & "$/ {=;q;}'") as integer
	if lineNumber = 0 then return str
	return do shell script "cat " & quoted form of tPath & " | sed -n -e '" & lineNumber & "," & lineNumber & "p'"
end timeLocalStrings
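The two sed calls in handler #2 can be tried on their own in the shell. A sketch with hypothetical helper names; sed reads the file directly, so the cat is not needed. As in the handler, the word goes into the regex unescaped, so it should not contain regex metacharacters.

```shell
#!/bin/sh
# line_of WORD FILE: print the number of the first line that is exactly WORD.
# sed's '=' prints the current line number and 'q' stops reading at once.
# Prints nothing when WORD is absent.
line_of() { sed -n "/^$1\$/{=;q;}" "$2"; }

# line_at N FILE: print line N of FILE (same effect as sed -n 'N,Np').
line_at() { sed -n "${1}p" "$2"; }

# translate WORD FROM TO: look WORD up in FROM and print the line with the
# same number in TO; fall back to WORD itself, like the handler's "return str".
translate() {
  n=$(line_of "$1" "$2")
  if [ -z "$n" ]; then
    printf '%s\n' "$1"
    return
  fi
  line_at "$n" "$3"
}
```

On OS X it would be called as `translate Wed /usr/share/locale/en_US.UTF-8/LC_TIME /usr/share/locale/fr_FR.UTF-8/LC_TIME`.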

With handler #3:



set beg to current date
repeat 1000 times
	--weekday of (current date) as text
	"Wed"
	timeLocalStrings(result, "en_US", "fr_FR")
end repeat
(current date) - beg
--> 9

on timeLocalStrings(str, _from, _to)
	set vName to path to startup disk as text
	set fPath to vName & "usr:share:locale:" & _from & ".UTF-8:LC_TIME"
	set tPath to vName & "usr:share:locale:" & _to & ".UTF-8:LC_TIME"
	tell application "System Events"
		{exists disk item fPath, exists disk item tPath}
	end tell
	if result contains false then return str
	paragraphs of (read file fPath)
	tell application "ASObjC Runner"
		set maybe to look in list result matching str
	end tell
	if maybe = {} then return str
	item 1 of maybe
	paragraph result of (read file tPath)
	return result
end timeLocalStrings


You read that right:

handler #1 → 460 seconds
handler #2 → 37 seconds
handler #3 → 9 seconds

Thank you Shane Stanley.

Yvan KOENIG (VALLAURIS, France) Wednesday 15 August 2012 21:35:36

First of all, there are different awks. The bytecode awk (MAWK) is the fastest, but unfortunately it is not distributed with Mac OS X. Unbelievable, but the bytecode version of AWK is faster than compiled code; I’m still amazed about that. Now we have to work with the ‘one and only true’ AWK (designed by Aho, Weinberger and Kernighan), and because Kernighan co-wrote the book on C with Dennis Ritchie, we don’t have to worry whether the C code is properly written :P.

No, sed is a good tool, but remember that sed was there first; AWK was designed to extend it, or at least to do things sed isn’t able to. For instance, AWK supports extended regular expressions, and it has C-style conditions and control flow, which sed doesn’t have. AWK also has a built-in field separator that ignores surrounding whitespace, which neither cut nor sed has.
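A quick illustration of the field-splitting and regex points, with hypothetical helper names and input: awk’s default splitting treats a run of blanks as one separator and ignores leading blanks, while cut -d' ' treats every single space as a delimiter; and awk patterns are extended regular expressions, so alternation with | works without any extra flag.

```shell
#!/bin/sh
# awk_field N: print field N using awk's default whitespace splitting.
awk_field() { awk -v n="$1" '{ print $n }'; }

# cut_field N: print field N, treating every single space as a delimiter.
cut_field() { cut -d' ' -f"$1"; }

# weekend_only: keep lines that are exactly Sat or Sun (ERE alternation).
weekend_only() { awk '/^(Sat|Sun)$/'; }

line='  Wed   mer.'
printf '%s\n' "$line" | awk_field 2   # "mer." (leading blanks ignored)
printf '%s\n' "$line" | cut_field 2   # empty: fields 1 and 2 are the two leading spaces
printf '%s\n' "$line" | cut_field 3   # "Wed"
printf 'Mon\nSat\nSun\n' | weekend_only
```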

Later, when the limits of AWK came up, Perl was designed to do things that can’t be done in AWK, such as making system calls directly (AWK can only run shell commands through system()). Perl is also extensible, which neither AWK nor sed is.

So, performance-wise, I’d expect AWK in the middle, perl the slowest and sed the fastest of the three.

When to use which?

  • sed for simple text processing
  • awk for more complex processing
  • perl for more complex processing when system calls are needed.

And then, of course, there’s handler #4 :)


set beg to current date
repeat 1000 times
	--weekday of (current date) as text
	"Wed"
	timeLocalStrings(result, "en_US", "fr_FR")
end repeat
(current date) - beg
--> 1

on timeLocalStrings(str, _from, _to)
	set fPath to "/usr/share/locale/" & _from & ".UTF-8/LC_TIME"
	set tPath to "/usr/share/locale/" & _to & ".UTF-8/LC_TIME"
	try
		set lookup1 to linefeed & (read fPath as «class utf8») & linefeed
		set astid to AppleScript's text item delimiters
		set AppleScript's text item delimiters to linefeed & str & linefeed
		if ((count lookup1's text items) is 1) then
			set AppleScript's text item delimiters to astid
			error
		end if
		set lineNumber to (count paragraphs of text item 1 of lookup1)
		if (lineNumber is 0) then set lineNumber to 1 -- Special-case the first line in the file.
		set AppleScript's text item delimiters to astid
		set outStr to paragraph lineNumber of (read tPath as «class utf8»)
	on error
		set outStr to str
	end try
	
	return outStr
end timeLocalStrings

Edit: Incorporated a fix by alastor933 for a problem he discovered some months later which occurs when the ‘str’ term is the first line in the ‘_from’ file.

Hello Nigel

With handler #5:


set beg to current date
repeat 10000 times
	--weekday of (current date) as text
	"Wed"
	timeLocalStrings(result, "en_US", "fr_FR")
end repeat
(current date) - beg
--> 20 -- Edited

on timeLocalStrings(str, _from, _to)
	set fPath to "/usr/share/locale/" & _from & ".UTF-8/LC_TIME"
	set tPath to "/usr/share/locale/" & _to & ".UTF-8/LC_TIME"
	try
		paragraphs of (read fPath as «class utf8») -- Edited
		tell application "ASObjC Runner"
			set maybe to look in list result matching str
		end tell
		--if maybe = {} then return str
		item 1 of maybe
		paragraph result of (read tPath as «class utf8») -- Edited: read as UTF-8, like fPath
		return result
	on error
		return str
	end try
end timeLocalStrings

I also ran handler #4 with 10,000 passes and got 2 seconds.

I think yours is the best answer to the original question, as the OP wanted a plain AppleScript solution.

Yvan KOENIG (VALLAURIS, France) Wednesday 15 August 2012 23:45:49

As Shane Stanley pointed out, there were two extraneous words « file » in the original code.
After removing them, the 10,000 passes take 20 seconds ;-(

I agree with your choice of tools, Bazzie Wazzie, and I didn’t even know there was a bytecode version of awk available. I wonder whether it is based on Java, on Microsoft’s bytecode, or on something else?

As for speed, I am not sure perl is really slower in general. Having said that, perl code is hard to write: it looks sexy when done, but I can’t understand it after a month away from it, so I prefer awk over perl for those reasons, though the seeming similarity of awk to C confuses me at times.

But if I wanted optimum speed, I’d actually test both of those tools to find the one that performs fastest in that case.

@ post 24: just amazing! Suddenly grep -n was implemented in AppleScript :D
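For comparison, the shell’s own grep -n finds the line number directly. A sketch with a hypothetical helper name: -x requires a whole-line match (the counterpart of wrapping the word in linefeeds as Nigel’s handler does), -F takes the word literally so ‘févr.’ is not read as a regex, and -m 1 stops at the first match (supported by BSD and GNU grep, though not part of POSIX). Because grep numbers lines from 1, a match on the first line needs no special case.

```shell
#!/bin/sh
# line_number WORD FILE: 1-based number of the first line equal to WORD,
# or nothing if there is none. grep -n prints "3:Wed"; cut keeps the "3".
line_number() { grep -n -x -F -m 1 -- "$1" "$2" | cut -d: -f1; }
```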

Yvan,

I think handler #5 is cheating ;)

The line:

       paragraphs of (read file fPath as «class utf8»)

actually errors. If you remove the word “file”, it works – but the time then drops it right down the rankings.

Hi Shane

Of course you are right.
It seems my eyes were wide shut.
After removing the two words « file », the script takes 20 seconds for the 10,000 passes ;-(

I will edit my original message.

Yvan KOENIG (VALLAURIS, France) Thursday 16 August 2012 11:44:28

Don’t push it: handlers like these will never be called 10,000 times. If they are, the whole approach is wrong; you should load the data once (to spare the file system) and store it in a property, like this:

property LC_TIMES : missing value

--initialize
set LC_TIMES to loadLocales()

--start the script
set _weekday to word 1 of ((current date) as text)
set _from to getLocale("nl_NL")
set _to to getLocale("en_US")

repeat 12500 times
	timeLocalString(_weekday, _from, _to)
end repeat

on timeLocalString(str, _from, _to)
	repeat with x from 1 to count _from's localeItems
		if item x of _from's localeItems = str then return item x of _to's localeItems
	end repeat
	return str
end timeLocalString

on getLocale(localeName)
	repeat with timeLocale in my LC_TIMES
		if name of timeLocale = localeName then return timeLocale --returns reference
	end repeat
end getLocale

on loadLocales()
	set localeNames to every paragraph of (do shell script "ls /usr/share/locale | grep -i '.utf-8$' | awk -F. '{print $1}'")
	set localeTimes to {}
	repeat with localeName in localeNames
		set end of localeTimes to {name:contents of localeName, localeItems:my loadTimeLocale(localeName)}
	end repeat
	return localeTimes
end loadLocales

on loadTimeLocale(localeName)
	return every paragraph of (do shell script "cat /usr/share/locale/" & localeName & ".UTF-8/LC_TIME")
end loadTimeLocale
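The same load-once idea works outside AppleScript too. A sketch with a hypothetical function name: a single awk process reads both LC_TIME-style files into arrays in its BEGIN block, then translates every line of standard input with one array lookup each, never touching the files again. Lines not found in the first file pass through unchanged.

```shell
#!/bin/sh
# translate_stream FROM TO: read FROM and TO once, then translate each
# line of stdin by line position; unknown lines are printed as-is.
translate_stream() {
  awk -v from="$1" -v to="$2" '
    BEGIN {
      # Load FROM: remember the first line number of each entry.
      n = 0
      while ((getline line < from) > 0) {
        n++
        if (!(line in idx)) idx[line] = n
      }
      close(from)
      # Load TO: keep every line, keyed by its line number.
      n = 0
      while ((getline line < to) > 0) out[++n] = line
      close(to)
    }
    {
      # One lookup per input line, no further file access.
      if (($0 in idx) && (idx[$0] in out)) print out[idx[$0]]
      else print
    }'
}
```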

OK:

tell application "ASObjC Runner" to set theWeekday to format date (current date) format "eeee"
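From the shell, date formats %A and %a through strftime, which takes weekday names from the LC_TIME category of the active locale; on OS X that category appears to be backed by the /usr/share/locale/…/LC_TIME files discussed above. A minimal sketch with a hypothetical function name, assuming the requested locale is installed; when it is not, date typically stays in the C locale and prints the English name. %A gives the full name, like "eeee" above; %a gives the abbreviation.

```shell
#!/bin/sh
# weekday_in LOCALE: today's full weekday name in LOCALE.
# LC_ALL overrides LC_TIME and LANG for this single date invocation.
weekday_in() { LC_ALL="$1" date +%A; }

weekday_in fr_FR.UTF-8
weekday_in en_US.UTF-8
```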

All:
Simple question.
Big, and amazing, response!
Quite a bit of (to me) new stuff came up, which is much appreciated.

Then I remembered there’s an ‘official’ way of doing this: use a script bundle, and add a localisation file.
Script reads localised strings from there with ‘localized string’.

Nigel’s variant is pretty fast, but that LC_TIME file was ‘just a bit of luck’, and its existence allowed a quite specific solution for a single localisation problem.
At the same time it exemplifies another method for localisation.

So, as a general method for localisation, how would it compare with the official method?
What factors would you consider to judge the usability and “quality” of both methods?
Relative speed is an obvious candidate, when the difference is big enough.
Then again, it could just be a matter of preference.

:lol: :facepalm:

I’d consider copying the files I need from /usr/share/locale and from iPhoto into that script bundle, and take it from there, with maybe a slightly reworked version of Nigel Garvey’s handler (so it copes with different localization files and different naming schemes) ;)

Maybe :


weekday of (current date) as text

my localiseur("CalInfo" & result, result)

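on localiseur(str, str2)
	log {str, str2}
	tell application "iPhoto" to set z to localized string str
	-- localized string hands back the key itself when no translation exists,
	-- so return the plain weekday in that case, otherwise the translation
	if z is str then return str2
	return z
end localiseur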

Yvan KOENIG (VALLAURIS, France) Thursday 16 August 2012 15:56:08

What I meant by the bytecode version of AWK (MAWK) is not that AWK itself is written in bytecode, but that MAWK compiles the scripts/instructions to bytecode, which is then run by the virtual machine (interpreter) inside MAWK. The difference is that the code can be executed noticeably faster. Remember when Java got a JIT compiler for its bytecode: suddenly Java wasn’t so slow anymore and became one of the world’s most popular languages. Also, LLVM is IMO one of the world’s best bytecode-based systems.

I see @ Bazzie Wazzie

@ Yvan: I see nothing wrong with my approach, nor with yours; your user has paid for iPhoto.app, and so have you, and so have I. But, at least up to Mountain Lion, a user may strip out locales, so theoretically the user may lack the localization file you seek. As for the files in /usr/share/locale: if the user deletes files there, he is really on his own :) But should it ever come to it (very unlikely, though) that the format or anything else changes along the way, for instance if you hard-code the line numbers to save time, then I’d also copy those files into the bundle, just to be sure, in case there are values there that can’t be found in iPhoto’s localization files.