You can scrap that one. A few years back I changed my command line utilities to support Unicode. Some of them do heavy text processing, and adding Unicode support slowed them down so little that it was unnoticeable. On the newer (read: faster) machine it should still be faster than on the old one, not two times slower as it is now.
You mean BRE and ERE (used by sed on Mac OS X) are 7 times faster than PCRE? First of all, that is incorrect performance-wise, and secondly, PCRE and Perl’s regex library are not the same. BRE and ERE can be 7 times faster than Perl’s regex library, but not 7 times faster than PCRE. Also, on many other distributions sed has the option to use the PCRE library instead of BRE and ERE.
This is all hypothetical, but say Unicode support were added by means of a library that uses PCRE …
As for the second part, I was absolutely sure that Perl used PCRE; sorry about that. Then the timings should be valid for Perl, not for PCRE. (I didn’t know PCRE was somewhat faster, but it can’t be much faster.)
I don’t know which sed it is that has its source on GitHub, the one close to the original or the GNU one, but I’ll certainly have a look at it at my leisure.
By the way:
BRE stands for Basic Regular Expression.
ERE stands for Extended Regular Expression.
PCRE stands for Perl Compatible Regular Expressions.
The first two use a much simpler algorithm than the latter, since they don’t provide look-behind abilities, which are typically implemented with recursive backtracking.
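To make the feature difference concrete, here is a minimal Python sketch. Python’s re module is, like Perl’s engine and PCRE, a backtracking engine and supports look-behind; the sample text and patterns are made up for illustration. The second pattern shows the usual ERE workaround, a capture group, and is also valid POSIX ERE syntax.

```python
import re

text = "cost $42, tax $7, qty 3"

# Look-behind: match digits only when preceded by "$", without
# including the "$" in the match. POSIX BRE/ERE have no such assertion.
with_lookbehind = re.findall(r"(?<=\$)[0-9]+", text)

# ERE-style workaround: consume the "$" and capture the digits in a group.
# findall() returns only the captured group when the pattern has one.
with_group = re.findall(r"\$([0-9]+)", text)

print(with_lookbehind)  # ['42', '7']
print(with_group)       # ['42', '7']
```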
There is really nothing wrong with using PCRE. If you need that feature set, it probably saves time compared to coding the parsing of the “problem-text” in a regular programming language, like C or something higher-level, while using a BRE/ERE library for the regular expressions.
But if you only need the feature set of BRE/ERE, then using BRE/ERE is of course much faster.
First of all, BRE/ERE is very basic compared to PCRE, but the speed of both depends heavily on the expression. We can bring RE2 and Oniguruma into this discussion as well. All these regex libraries can be considered roughly equally fast. Why? Because BRE is in some situations extremely slow (more than 100 times slower) compared to RE2, but in other situations PCRE is the fastest of all. So we can’t really say which one is the fastest. RE2 (Google’s regex library) is considered the fastest because it wins most of the tests, with PCRE generally considered second. Because BRE/ERE drops in performance in most of these tests, it is definitely not regarded as the fastest library today.
But that doesn’t make it bad. The beauty is that bash, grep, sed, (probably) awk, etc. all use the same regex library, so once I have optimized a regular expression I can reuse it with a lot of commands.
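As an illustration of the write-once idea, the following Python sketch passes one ERE unchanged to grep -E and sed -E. It assumes both tools are on the PATH; the date pattern, the sample text and the helper names are invented for illustration, and whether the two tools share one regex library underneath depends on the system.

```python
import subprocess

# One ERE, reused unchanged by two different tools.
ERE = r"[0-9]{4}-[0-9]{2}-[0-9]{2}"  # an ISO-style date such as 2013-05-17

sample = "released 2013-05-17\nno date here\nfixed 2014-01-02\n"


def grep_lines(pattern, text):
    """Lines matching the ERE, via grep -E (exit status 1 only means no match)."""
    result = subprocess.run(["grep", "-E", pattern], input=text,
                            capture_output=True, text=True)
    return result.stdout.splitlines()


def sed_replace(pattern, replacement, text):
    """Replace every match of the ERE, via sed -E (pattern must not contain '/')."""
    result = subprocess.run(["sed", "-E", f"s/{pattern}/{replacement}/g"],
                            input=text, capture_output=True, text=True,
                            check=True)
    return result.stdout.splitlines()


print(grep_lines(ERE, sample))
print(sed_replace(ERE, "<date>", sample))
```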
I haven’t seen the figures you have, but I guess that when the regexps start to get really hairy, the BRE/ERE package will probably drop in performance. But I dare say that for the kinds of regexps you see in this forum, the “bread and butter” regexps, the basic regexp libraries are unsurpassed in performance. I have actually read a paper, which I have referenced somewhere here, showing that for some regexps BRE can be a million times faster than Perl/PCRE.
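The kind of gap described there can be reproduced in miniature. A commonly cited worst case is the pattern a?^n a^n (for n = 3 that is a?a?a?aaa) matched against a string of n a’s: a greedy backtracking matcher, the strategy used by Perl and by PCRE’s default matcher, ends up trying all 2^n ways of taking or skipping the optional a’s, whereas simulating all matcher states in parallel, as an automaton-based engine can, stays polynomial. This Python sketch compares two hand-written toy matchers by counting steps; it does not benchmark any real library, and whether a given BRE/ERE implementation works this way is not something it establishes.

```python
def pathological(n):
    """Tokens for the regex a?^n a^n; for n = 3 that is a?a?a?aaa.

    Each token is a (character, optional) pair."""
    return [("a", True)] * n + [("a", False)] * n


def backtrack_match(tokens, text, counter):
    """Perl-style greedy backtracking: try 'take' first, then 'skip'."""
    def match(ti, si):
        counter[0] += 1                         # one step per attempted state
        if ti == len(tokens):
            return si == len(text)              # anchored: consume everything
        ch, optional = tokens[ti]
        if si < len(text) and text[si] == ch and match(ti + 1, si + 1):
            return True
        return optional and match(ti + 1, si)   # backtrack: skip an a?
    return match(0, 0)


def nfa_match(tokens, text, counter):
    """Thompson-style simulation: track the set of all live positions at once."""
    def closure(states):
        # Skipping an optional token is an epsilon move.
        result, stack = set(), list(states)
        while stack:
            ti = stack.pop()
            if ti in result:
                continue
            counter[0] += 1                     # one step per distinct state
            result.add(ti)
            if ti < len(tokens) and tokens[ti][1]:
                stack.append(ti + 1)
        return result

    current = closure({0})
    for ch in text:
        current = closure({ti + 1 for ti in current
                           if ti < len(tokens) and tokens[ti][0] == ch})
    return len(tokens) in current


def steps(matcher, n):
    """Return (matched, step count) for a?^n a^n against 'a' * n."""
    counter = [0]
    matched = matcher(pathological(n), "a" * n, counter)
    return matched, counter[0]


if __name__ == "__main__":
    for n in (4, 8, 12, 16):
        print(n, steps(backtrack_match, n), steps(nfa_match, n))
```

Both matchers give the same answer; only the amount of work differs. The backtracking count at least doubles with each increment of n, while the state-set count is bounded by (2n + 1)(n + 1).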
I agree with you about the beauty of write-once, use-everywhere regexps.
That may be so, but I’m sure you agree with me that such an operation should be as fast and efficient as possible.
By the way, one way to implement idle time in Cocoa would be to have each triggered event update a timestamp and reset a timer on a different thread. That way you won’t use unnecessary resources, and you only fire the idle event when there is something to fire it for, rather than having an idle timer run periodically whether something has happened or not.
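A minimal sketch of that reset-the-timer (debounce) approach, written in Python rather than Objective-C: threading.Timer stands in for a Cocoa timer, and the class and method names (IdleNotifier, touch, stop) are hypothetical. Every event handler calls touch(), which records a timestamp and restarts a one-shot timer on a background thread; the idle callback only runs once events have stopped for `delay` seconds.

```python
import threading
import time


class IdleNotifier:
    """Call on_idle once, `delay` seconds after the most recent touch()."""

    def __init__(self, delay, on_idle):
        self.delay = delay
        self.on_idle = on_idle
        self.last_activity = None       # time.monotonic() of the latest event
        self._timer = None
        self._lock = threading.Lock()

    def touch(self):
        """Record activity and restart the one-shot idle timer."""
        with self._lock:
            self.last_activity = time.monotonic()
            if self._timer is not None:
                # If the old timer already fired, the callback ran for the
                # previous quiet period; otherwise this cancels it.
                self._timer.cancel()
            self._timer = threading.Timer(self.delay, self.on_idle)
            self._timer.daemon = True   # don't keep the process alive
            self._timer.start()

    def stop(self):
        """Cancel any pending idle notification."""
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
                self._timer = None
```

Usage would be one IdleNotifier(30, handler) per application, with touch() called from every event callback. Because the timer is restarted rather than left running, nothing is scheduled after the idle callback has fired until the next event arrives.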
In Cocoa, NSEvent’s addLocalMonitorForEventsMatchingMask:handler: (for events sent to your own application) or addGlobalMonitorForEventsMatchingMask:handler: (for events sent to other applications), and/or CGEventSourceSecondsSinceLastEventType() from Quartz Event Services, in conjunction with a timer, might be the most efficient solution.
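The timer-plus-CGEventSourceSecondsSinceLastEventType() variant can be sketched independently of the Cocoa APIs. In this Python sketch the reading comes from a hypothetical zero-argument callable; on macOS it could wrap CGEventSourceSecondsSinceLastEventType(), typically called with kCGEventSourceStateCombinedSessionState and kCGAnyInputEventType, but that wiring is an assumption and is not shown. The poller reports something only when the idle state flips, so a repeating timer can call poll() cheaply.

```python
class IdlePoller:
    """Turn periodic 'seconds since last input' readings into idle events.

    seconds_since_input is a hypothetical zero-argument callable; on macOS
    it might wrap CGEventSourceSecondsSinceLastEventType() (assumption).
    """

    def __init__(self, threshold, seconds_since_input):
        self.threshold = threshold
        self.seconds_since_input = seconds_since_input
        self.is_idle = False

    def poll(self):
        """Call from a repeating timer. Returns 'idle', 'active' or None."""
        idle_for = self.seconds_since_input()
        if not self.is_idle and idle_for >= self.threshold:
            self.is_idle = True
            return "idle"        # crossed the threshold: report it once
        if self.is_idle and idle_for < self.threshold:
            self.is_idle = False
            return "active"      # input resumed after an idle period
        return None              # no state change since the last poll
```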
You can use the idle handler to pick up user input made in another application. For instance, if the user adds an event to Calendar, the idle handler can periodically check for it.
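Assuming a stay-open AppleScript applet, whose on idle handler is called repeatedly and whose return value is the number of seconds until the next call, each check boils down to comparing snapshots. Here is that comparison sketched in Python, with fetch_ids a hypothetical stand-in for whatever returns the current set of Calendar event identifiers, and NewItemWatcher likewise a made-up name:

```python
class NewItemWatcher:
    """Report identifiers that appeared since the previous check.

    fetch_ids is a hypothetical callable returning the current collection
    of item identifiers (for example, Calendar event identifiers).
    """

    def __init__(self, fetch_ids):
        self.fetch_ids = fetch_ids
        self.seen = set(fetch_ids())    # baseline: ignore items that exist now

    def check(self):
        """Call once per idle tick; returns newly added ids, sorted."""
        current = set(self.fetch_ids())
        added = current - self.seen
        self.seen = current             # deletions simply drop out
        return sorted(added)
```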