Hello!
Your sed skills are very impressive, Nigel!
It is of course much easier to come up with a solution now that the example input is better defined and on the table.
An AS solution may not be that fast, but it is not that slow either, and it scores on readability. It is also quick to script when you don't know sed that well, or have no intention of learning it.
The pass-by-reference construct and text item delimiters make it fairly fast to sift out numbers with AppleScript.
My solution works this way:
- First the text is split into paragraphs. If the input consists of more than 16384 paragraphs the solution is not going to work, but this is unlikely.
- Then I take a chunk of paragraphs, split it into a list with text item delimiters, and collect the numbers by adding them up. The handler sums those numbers and returns the partial sum, to be aggregated in the main handler.
- The main handler then gets the next chunk of paragraphs and continues to add to the partial sum until the text is processed.
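The chunking in the last step can be sketched on its own. This is only an illustration, not the script from this post: chunkRanges is a hypothetical handler name, and it merely computes which paragraph ranges would be processed, without summing anything.

```applescript
-- chunkRanges is a hypothetical helper used only for this sketch.
-- It returns a list of {start, end} index pairs covering 1 thru itemCount.
on chunkRanges(itemCount, chunkSize)
	set ranges to {}
	set startIndex to 1
	repeat while startIndex <= itemCount
		set endIndex to startIndex + chunkSize - 1
		-- The last chunk may be shorter than chunkSize.
		if endIndex > itemCount then set endIndex to itemCount
		set end of ranges to {startIndex, endIndex}
		set startIndex to endIndex + 1
	end repeat
	return ranges
end chunkRanges

chunkRanges(100, 40)
-- {{1, 40}, {41, 80}, {81, 100}}
```

Computing the ranges this way also covers the edge cases: an empty text gives no ranges at all, and a paragraph count that is an exact multiple of the chunk size gives no empty trailing chunk.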
Math behind the solution:
An AS list can contain 16384 = 2^14 items.
A conservative approach would say that no paragraph contains more than 400 words.
That gives us 16384/400 ≈ 40 paragraphs to process at a time.
As for file size, any figure is just a guess, since the number of words is so variable. I think the solution will hold for files up to 1.5 MB in size under all circumstances, given that they contain "normal" text. This guess is based on the maximum estimates above, which predict a maximum file size of 12.5 megabytes. I have then reduced the number of words in a paragraph to an average of 10, with an average word size of 5 characters, and done the math:
16383 paragraphs times 10 words times 5 characters times 2 bytes div (1024*1024) ≈ 1.5 MB
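For anyone who wants to check the arithmetic, here is a small sketch that redoes the two calculations in Script Editor. The variable names are my own and purely illustrative; the figures are the ones used in the estimate above.

```applescript
-- Figures taken from the estimate in this post; variable names are illustrative.
set listLimit to 16384 -- 2 ^ 14 items per list, as assumed above
set wordsPerParagraph to 400 -- conservative upper bound
set chunkEstimate to listLimit div wordsPerParagraph -- integer division gives 40

-- paragraphs * words per paragraph * characters per word * bytes per character
set totalBytes to 16383 * 10 * 5 * 2
set sizeInMB to totalBytes / (1024 * 1024) -- a little over 1.5

{chunkEstimate, sizeInMB}
```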
Speed:
Programmer's speed is of the greatest concern!
As for speed, it may take Nigel 20 minutes or so to come up with that sed script, whereas I could easily spend 5 hours getting it right.
On the other hand, this solution took me about 20 minutes, maybe an hour; I was distracted. So much for efficiency.
property chunkSize : 40

to max(a, b)
	if a > b then return a
	return b
end max

to min(a, b)
	if a < b then return a
	return b
end min

# Sums every number found in a list of paragraphs
to sumPars(L)
	script o
		property pars : L
	end script
	local tids
	set tids to AppleScript's text item delimiters
	# Join the paragraphs with returns, so the last word of one paragraph
	# does not run into the first word of the next
	set AppleScript's text item delimiters to return
	set o's pars to o's pars as text
	# Split the joined text on tabs, spaces and returns
	set AppleScript's text item delimiters to {tab, space, return}
	set o's pars to text items 1 thru -1 of o's pars
	set AppleScript's text item delimiters to tids
	local i, tSum
	set {i, tSum} to {1, 0}
	repeat (length of o's pars) times
		try
			# Items that cannot be coerced to a number raise an error and are skipped
			set tSum to tSum + (item i of o's pars as number)
		end try
		set i to i + 1
	end repeat
	return tSum
end sumPars

to summarizeText(theText)
	# Turns the text into a list of paragraphs
	set pList to paragraphs of theText
	# Number of paragraphs to process
	set pCount to length of pList
	# Paragraphs remaining after the first chunk
	set pRemains to max((pCount - chunkSize), 0)
	# Initial range of paragraphs to process, running sum, and count of chunks.
	# The chunk count is rounded up, so an empty text or an exact multiple of
	# chunkSize does not produce an out-of-range final chunk
	set {pStart, pEnd, iSum, chunkCount} to {1, min(chunkSize, pCount), 0, ((pCount + chunkSize - 1) div chunkSize)}
	repeat chunkCount times
		# Gets the sum for this chunk and adds it to the running sum
		set iSum to iSum + sumPars(items pStart thru pEnd of pList)
		# Advance to the next chunk of paragraphs
		set {pStart, pEnd, pRemains} to {(pStart + chunkSize), (pEnd + min(chunkSize, pRemains)), max((pRemains - chunkSize), 0)}
	end repeat
	return iSum
end summarizeText

tell application "TextWrangler" to set theTxt to text of front text window
set theSum to summarizeText(theTxt)
log theSum