I have a list of real estate sales that includes, for each property, the sale date, the city, the addresses, the buyer, the seller, and the selling price. I wish to sort this by ascending date, then ascending city, then descending price. The data is in tab-delimited files. A simple sample:
In this example, the second line should be sorted to the top, since Monterey comes before San Francisco in the sort.
Is there any way besides creating a dummy document with the columns reorganized, first for the ascending sorts and then for the descending one? I.e., any scripting additions that’ll handle this? I’m currently doing it manually in Excel . . .
set x to {{3, 6}, {1, 5}, {2, 4}}
sortListOfLists(x, 2) --> {{2, 4}, {1, 5}, {3, 6}}
to sortListOfLists(lst, fld)
script a
property l1 : lst
property l2 : {l1's item 1}
end script
repeat with i from 1 to count lst
set f to a's l1's item i's item fld
considering case
if f < a's l2's item 1's item fld then
set beginning of a's l2 to a's l1's item i
else if f > a's l2's item -1's item fld then
set end of a's l2 to a's l1's item i
else --> loop
repeat with x from 1 to count lst
try
if f > a's l2's item x's item fld and f < a's l2's item (x + 1)'s item fld then
set a's l2 to (a's l2's items 1 thru x) & {(a's l1's item i)} & (a's l2's items (x + 1) thru -1)
exit repeat
end if
end try
end repeat
end if
end considering
end repeat
return a's l2
end sortListOfLists
The first step would be transforming your tab-delimited data into a list of lists, so it can be sorted (including coercion of dates, and $ to numbers). To apply a descending sorting, just use the current ascending one, then read items from the tail (item -1, item -2…), or just use the statement “reverse of sortListOfLists(thelist, thefieldtosortby)”.
How do I generalize your example to my case of having, say, 5 columns in which I’m sorting on 3. In other words, let’s say I had the following data (with ‘|’ representing a tab)
A | 5/2/2006 | San Francisco | $300,000
B | 5/1/2006 | San Francisco | $100,000
C | 5/1/2006 | Monterey | $120,000
D | 5/1/2006 | Monterey | $150,000
E | 5/2/2006 | Monterey | $100,000
I want to sort ascending by date, then (for same date) ascending by city, then (for same date and city) descending by price, so I end up with:
D | 5/1/2006 | Monterey | $150,000
C | 5/1/2006 | Monterey | $120,000
B | 5/1/2006 | San Francisco | $100,000
E | 5/2/2006 | Monterey | $100,000
A | 5/2/2006 | San Francisco | $300,000
I don’t see how a simple binary sort can readily yield these results. Maybe I just don’t understand the “list of lists” concept. My thought was that I need to somehow record the initial record order, then generate a new set of records in which my three fields are somehow concatenated . . . but it seems I’d need to pad the field lenths to not have a numeral from the price field being matched against a letter from the city field.
The trickey part about your text is that the dates don’t have leading zeros. For instance, the date 5/2/2006 can be written 05/02/2006. With this form you can sort alphabetically for months. Note that this won’t work for different years. After you have the leading zeros, you need to place the fileds in order for sorting with date, city, amount, and label adding the padding. Then you could just sort the whole thing alphabetically.
The other method would be to sort each field by grouping where each group with greater then 1 item needs to be sorted. There might be an easy way to do this with recursion.
What you want isn’t exactly a trivial task! You need a sort routine, a subsort routine to control it, and a way to control the direction of the individual subsorts. You also need to convert the short dates to AppleScript dates and the currency values to numbers first, so that they sort by value rather than lexically. If your text is from a spreadsheet, it would be easier if that could be scripted to do the job.
Failing that, here’s a script that’s adapted from something I wrote for my own use. It’s long, but it works and is quite fast. I’ve used a convention of negative column numbers in the column list for ‘sortOnColumns()’ to specify reverse sorts. (Please don’t ask me to explain how it works!)
set originalText to "A 5/2/2006 San Francisco $300,000
B 5/1/2006 San Francisco $100,000
C 5/1/2006 Monterey $120,000
D 5/1/2006 Monterey $150,000
E 5/2/2006 Monterey $100,000"
-- Break up the text into a list of "rows".
set theRows to originalText's paragraphs
set astid to AppleScript's text item delimiters
set AppleScript's text item delimiters to tab
repeat with thisRow in theRows
-- Break up each row into list of "column" values.
set rowList to thisRow's text items
-- Replace each short date with an AppleScript date.
-- (Assumes the machine's short-date format matches the format supplied.)
set item 2 of rowList to date (item 2 of rowList)
-- Replace each currency value with an AppleScript number.
set item 4 of rowList to currencyToNumber(item 4 of rowList)
-- Append the "row" text to the list for a free ride during the sort.
set end of rowList to thisRow's contents
set thisRow's contents to rowList
end repeat
-- Sort successively on columns 2, 3, and 4, reverse-sorting on column 4.
sortOnColumns(theRows, {2, 3, -4})
-- Retrieve the orignal "row" strings from the sorted lists.
repeat with thisRow in theRows
set thisRow's contents to end of thisRow
end repeat
-- Rejoin them into a single text.
set AppleScript's text item delimiters to return -- or ASCII character 10
set newText to theRows as text
set AppleScript's text item delimiters to astid
return newText
on currencyToNumber(d)
set i to 1
considering case
repeat until character i of d is in "1234567890."
set i to i + 1
end repeat
end considering
set d to text i thru -1 of d
set astid to AppleScript's text item delimiters
set AppleScript's text item delimiters to ","
set d to d's text items
set AppleScript's text item delimiters to ""
set d to d as text
set AppleScript's text item delimiters to astid
return d as number
end currencyToNumber
on sortOnColumns(theRows, sortColumns)
-- theRows is a list of lists. The lists represent rows in a spreadsheet and their
-- items are the values from the intersections with the corresponding columns.
-- sortColumns is a list of the column numbers on which successive sorts and
-- subsorts are to be based. Negative column numbers indicate reverse sorting.
script o
property rows : theRows
property sortCols : sortColumns
property maxRecursionLevel : count sortColumns
script forwardSort
-- x, isLess() and isGreater() are for the CustomQsort comparisons
property x : missing value
on isLess(a, b)
script p
property c : a
property d : b
end script
(p's c's item x < p's d's item x)
end isLess
on isGreater(a, b)
script p
property c : a
property d : b
end script
(p's c's item x > p's d's item x)
end isGreater
end script
script reverseSort
property x : missing value
property isLess : forwardSort's isGreater
property isGreater : forwardSort's isLess
end script
on soc(top, bottom, recursionLevel)
-- Sort this batch of rows on this column
set sortColumn to my sortCols's item recursionLevel
-- If the supplied column number's positive, sort forwards; otherwise reverse sort.
if (sortColumn > -1) then
set forwardSort's x to sortColumn
CustomQsort(my rows, top, bottom, forwardSort)
else
set reverseSort's x to -sortColumn
CustomQsort(my rows, top, bottom, reverseSort)
end if
-- For every run of equal values in this section of this column
-- recurse to subsort on the next specified column
if (recursionLevel < maxRecursionLevel) then
set nextRecursionLevel to recursionLevel + 1
set i to top
repeat with j from top + 1 to bottom
if (my rows's item j's item sortColumn is my rows's item i's item sortColumn) then
else
if (j - i > 1) then soc(i, j - 1, nextRecursionLevel)
set i to j
end if
end repeat
if (j > i) then soc(i, j, nextRecursionLevel)
end if
end soc
end script
o's soc(1, count theRows, 1)
end sortOnColumns
-- A variation on the Garvey/Knapp qsort,
on CustomQsort(a, l, r, compObj) -- compObj must have handlers isLess(a,b) and isGreater(a,b)
script o
property cutoff : 10
property p : a
on qsrt(l, r)
set i to l
set j to r
set v to my p's item ((l + r) div 2)
repeat while (j > i)
set u to my p's item i
repeat while (compObj's isLess(u, v))
set i to i + 1
set u to my p's item i
end repeat
set w to my p's item j
repeat while (compObj's isGreater(w, v))
set j to j - 1
set w to my p's item j
end repeat
if (i > j) then
else
set my p's item i to w
set my p's item j to u
set i to i + 1
set j to j - 1
end if
end repeat
if (j - l < cutoff) then
else
qsrt(l, j)
end if
if (r - i < cutoff) then
else
qsrt(i, r)
end if
end qsrt
on isrt(l, r)
set x to l
set z to l + cutoff - 1
if (z > r) then set z to r
set v to my p's item x
repeat with y from (x + 1) to z
if (compObj's isLess(my p's item y, v)) then
set x to y
set v to my p's item y
end if
end repeat
tell my p's item l
set my p's item l to v
set my p's item x to it
end tell
set u to my p's item (l + 1)
repeat with i from (l + 2) to r
set v to my p's item i
if (compObj's isLess(v, u)) then
set my p's item i to u
repeat with j from (i - 2) to l by -1
if (compObj's isLess(v, my p's item j)) then
set my p's item (j + 1) to my p's item j
else
set my p's item (j + 1) to v
exit repeat
end if
end repeat
else
set u to v
end if
end repeat
end isrt
end script
if r - l < o's cutoff then
else
o's qsrt(l, r)
end if
o's isrt(l, r)
end CustomQsort
I use your sortOnColumns(theRows, sortColumns) scipt in post#5 for sorting a 800 x 10 list of lists and it is much faster than the quick sort handler (also get it from this forum) that I normally would use. Thank you for your 15 years-old work!
Thanks, ngan. It’s good to know the old stuff’s still useful.
The sortOnColumns(theRows, sortColumns) handler would be much shorter if written today because:
My custom sort handlers now only expect an isGreater(a, b) handler in the customising script objects. The effect of the old isLess(a, b) handler can be got simply by calling isGreater(a, b) with swapped parameters!
I generally find a single sort with compound comparisons to be more efficient than sorting and subsorting on individual criteria. It’s certainly less code.
The parameters for my custom sorts are specified slightly differently nowadays too, but I haven’t changed that aspect here:
set originalText to "A 5/2/2006 San Francisco $300,000
B 5/1/2006 San Francisco $100,000
C 5/1/2006 Monterey $120,000
D 5/1/2006 Monterey $150,000
E 5/2/2006 Monterey $100,000
F 5/2/2006 Monterey $150,000"
-- Break up the text into a list of "rows".
set theRows to originalText's paragraphs
set astid to AppleScript's text item delimiters
set AppleScript's text item delimiters to tab
repeat with thisRow in theRows
-- Break up each row into list of "column" values.
set rowList to thisRow's text items
-- Replace each short date with an AppleScript date.
-- (Assumes the machine's short-date format matches the format supplied.)
set item 2 of rowList to date (item 2 of rowList)
-- Replace each currency value with an AppleScript number.
set item 4 of rowList to currencyToNumber(item 4 of rowList)
-- Append the "row" text to the list for a free ride during the sort.
set end of rowList to thisRow's contents
set thisRow's contents to rowList
end repeat
-- Sort successively on columns 2, 3, and 4, reverse-sorting on column 4.
sortOnColumns(theRows, {2, 3, -4})
-- Retrieve the orignal "row" strings from the sorted lists.
repeat with thisRow in theRows
set thisRow's contents to end of thisRow
end repeat
-- Rejoin them into a single text.
set AppleScript's text item delimiters to return -- or ASCII character 10
set newText to theRows as text
set AppleScript's text item delimiters to astid
return newText
on currencyToNumber(d)
set i to 1
considering case
repeat until character i of d is in "1234567890."
set i to i + 1
end repeat
end considering
set d to text i thru -1 of d
set astid to AppleScript's text item delimiters
set AppleScript's text item delimiters to ","
set d to d's text items
set AppleScript's text item delimiters to ""
set d to d as text
set AppleScript's text item delimiters to astid
return d as number
end currencyToNumber
on sortOnColumns(theRows, sortColumns)
-- theRows is a list of lists. The lists represent rows in a spreadsheet and their
-- items are the values from the intersections with the corresponding columns.
-- sortColumns is a list of the column numbers on which successive sorts and
-- subsorts are to be based. Negative column numbers indicate reverse sorting.
script o
property sortCols : sortColumns
property sortColCount : (count sortCols)
on isGreater(a, b) -- Should list 'a' be moved to after list 'b'?.
script p
property c : a
property d : b
end script
-- With each pair of lists, compare the items from each sort column in turn until a decision's made.
repeat with i from 1 to sortColCount
-- Get the next sort column index.
set x to my sortCols's item i
-- If it's negative, descending order's required and the index needs to be made positive.
set descending to (x < 0)
if (descending) then set x to -x
-- Get the values from the two lists.
set va to p's c's item x
set vb to p's d's item x
-- If they're equal, no decision can be made based on this column.
-- Otherwise the first list has to go after the second if the first value's
-- relationship to the second doesn't match the required sort direction.
if (va = vb) then
else
return ((descending) = (vb > va))
end if
end repeat
-- If we get here, the values from all the sort columns are the same in both lists.
-- List 'a' doesn't have to go after list 'b'.
return false
end isGreater
end script
CustomQSort(theRows, 1, (count theRows), o)
end sortOnColumns
-- A variation on the Garvey/Knapp qsort,
on CustomQSort(a, l, r, compObj) -- compObj must have handler isGreater(a,b)
script o
property cutoff : 10
property p : a
on qsrt(l, r)
set i to l
set j to r
set v to my p's item ((l + r) div 2)
repeat while (j > i)
set u to my p's item i
repeat while (compObj's isGreater(v, u))
set i to i + 1
set u to my p's item i
end repeat
set w to my p's item j
repeat while (compObj's isGreater(w, v))
set j to j - 1
set w to my p's item j
end repeat
if (i > j) then
else
set my p's item i to w
set my p's item j to u
set i to i + 1
set j to j - 1
end if
end repeat
if (j - l < cutoff) then
else
qsrt(l, j)
end if
if (r - i < cutoff) then
else
qsrt(i, r)
end if
end qsrt
on isrt(l, r)
set x to l
set z to l + cutoff - 1
if (z > r) then set z to r
set v to my p's item x
repeat with y from (x + 1) to z
if (compObj's isGreater(v, my p's item y)) then
set x to y
set v to my p's item y
end if
end repeat
tell my p's item l
set my p's item l to v
set my p's item x to it
end tell
set u to my p's item (l + 1)
repeat with i from (l + 2) to r
set v to my p's item i
if (compObj's isGreater(u, v)) then
set my p's item i to u
repeat with j from (i - 2) to l by -1
if (compObj's isGreater(my p's item j, v)) then
set my p's item (j + 1) to my p's item j
else
set my p's item (j + 1) to v
exit repeat
end if
end repeat
else
set u to v
end if
end repeat
end isrt
end script
if r - l < o's cutoff then
else
o's qsrt(l, r)
end if
o's isrt(l, r)
end CustomQSort
I will invest time to understand the technique behind.
A brief test, without using Script Geek, on sorting a list of list (800 x 10) by the two versions of sortOnColumns(theRows, sortColumns) on the same process on SD shows that the newer version is even slightly faster!