I have a list containing a few hundred email addresses. There are a lot of “doubles” in there. I’d like to go through the list and remove any duplicates so there is only one instance of each item. Is there some kind of “replacing yes” function that relates to lists? Thanks for any info.
You can use this:
set x to {"a@a.e", "b@b.c", "d@a.h", "a@a.e"}
addr(x) --> {"a@a.e", "b@b.c", "d@a.h"}
to addr(l)
	script foo
		property foo2 : l
		property okAddresses : {}
	end script
	considering case
		repeat with i from 1 to count (foo's foo2)
			set x to ((foo's foo2)'s item i)
			if x is in foo's okAddresses then
			else
				set end of foo's okAddresses to x
			end if
		end repeat
	end considering
	foo's okAddresses
end addr
- Acceleration techniques learnt from Nigel Garvey; here it takes 1 second for a 1000-item list.
WHOA. That’s pretty insane! Works perfectly too.
Much thanks to you and Nigel. I learned a lot just from reading this script.
Very cool!
Now why do you use a script object:
script foo
	property foo2 : l
	property okAddresses : {}
end script
instead of ordinary lists?
How does this speed things up?
I just believe :lol:
If you are very interested, you can enable or disable the following speed techniques:
-Accessing list items in script objects.
-Using a “considering case” statement.
-Using “if x then [nothing] else [something]” instead of “if not x then [something]” (not very relevant here, I think, but just in case).
The “regular” version is 4 times slower in my tests:
to addr2(l)
	set okAddresses to {}
	repeat with i from 1 to count l
		set x to (l's item i)
		if x is not in okAddresses then set end of okAddresses to x
	end repeat
	okAddresses
end addr2
For some reason known only to the people at Apple, access to the items in an AppleScript list is dramatically faster if the list variable is “referenced” rather than simply being — er — used. A “reference” is a descriptive phrase like ‘my longList’ or ‘foo2 of foo’.
The only kinds of variable that can be referenced are globals and properties. Local variables can’t be referenced, so if you want the speed advantage inside a handler, you either have to use a global variable (generally considered to be bad practice) or set up a script object with its own list properties. It’s not the script object itself that speeds things up. It’s the fact that it enables the use of references in the handler.
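A minimal sketch of the same idea in another handler, using nothing beyond plain AppleScript. The handler name sumList and the script object name o are hypothetical. The list is held in a property of a script object, so each `item i of o's lst` goes through a reference instead of a plain local variable.

```applescript
-- Hypothetical handler: adds up a list of numbers.
-- The list is stored in a property of a local script object, so
-- "item i of o's lst" is accessed through a reference, which is
-- what gives the speed-up described above.
on sumList(theList)
	script o
		property lst : theList
	end script
	set total to 0
	repeat with i from 1 to (count o's lst)
		set total to total + (item i of o's lst)
	end repeat
	return total
end sumList

sumList({1, 2, 3, 4}) --> 10
```

The script object itself does no work; it only exists so the handler has a property it can refer to.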
By default, AppleScript ignores case when comparing strings, but considers everything else. ‘Ignoring’ involves more work than ‘considering’, because AppleScript has to recognise some characters as being the same when in fact they’re not. If you know beforehand that the strings will have the same case — or that one or both of them will have no case at all, as with numerals and punctuation — it makes sense to use ‘considering case’ to reduce the amount of background work. I personally wouldn’t use it when comparing e-mail addresses, but I don’t know the circumstances of the dupes in Muad’Dib’s list.
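A quick way to see what ‘considering case’ changes for e-mail addresses (the addresses are made-up sample data): by default the two strings below count as equal, so a duplicate check would keep only one of them; inside `considering case` they are different and both would be kept.

```applescript
-- Default behaviour: case is ignored, so these count as the same address.
set defaultResult to ({"bob@example.com"} is in {"Bob@Example.com"})
-- Inside "considering case" they are different strings.
considering case
	set caseResult to ({"bob@example.com"} is in {"Bob@Example.com"})
end considering
{defaultResult, caseResult} --> {true, false}
```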
This is one of my sillier ideas. I don’t know if I like it or not. The time it saves is only significant in intensely repetitive situations.
Very helpful.
In case anyone comes across the same problem I was having…
I have lists, where each entry of the list is a record:
set tabGoods to {{|style|:"g185", colorCode:"51", colorName:"black"}, {|style|:"g185", colorCode:"51", colorName:"black"}}
And these functions to remove duplicates did NOT work on it. I don’t know why, but they spit the list back out again with the duplicates still in place.
I’m not sure why these don’t work when what I ended up writing does, but just in case this is useful to anyone, this removes duplicates even when the list items are records:
on removeDuplicateRecords(inputList)
	set itemCount to count of items in inputList
	set outputList to {}
	repeat with anItem from 1 to itemCount
		set firstListItem to item anItem of inputList
		set occurrenceCount to 0
		repeat with anotherItem from 1 to count of items in outputList
			set secondListItem to item anotherItem of outputList
			if firstListItem is secondListItem then set occurrenceCount to occurrenceCount + 1
		end repeat
		if occurrenceCount = 0 then copy firstListItem to end of outputList
	end repeat
	return outputList
end removeDuplicateRecords
This could be painfully slow for large sets of records; I really don’t know. My lists have at most maybe 10-20 records, so it’s not significant. On the longest lists I ran it on, it took 127 milliseconds, so it’s not stressing me out, but I’m guessing from that time that it would not scale well to thousands of records… but at least it works for records.
- t.spoon
Hi t.spoon.
The problem with the original script (apart from the fact that it no longer opens correctly in Script Editor!) is this line:
if x is in foo's okAddresses then
It tends to get written this way because it works with simple objects like strings and numbers. But the correct formulation when using ‘is in’ or ‘contains’ with a list of items is:
if {x} is in foo's okAddresses then
Notice the braces round ‘x’. The reason for them is that the code’s notionally looking for a section of the list, not, as we think of it, for an item in the list.
set tabGoods to {{|style|:"g185", colorCode:"51", colorName:"black"}, {|style|:"g185", colorCode:"51", colorName:"black"}}
tabGoods contains {|style|:"g185", colorCode:"51", colorName:"black"}
--> false
tabGoods contains {{|style|:"g185", colorCode:"51", colorName:"black"}}
--> true
tabGoods contains {{|style|:"g185", colorCode:"51", colorName:"black"}, {|style|:"g185", colorCode:"51", colorName:"black"}}
--> true
The same’s notionally true with text:
"Hell" is in "Hello"
--> true: not because "Hello"'s a container containing "Hell", but because "Hell"'s a subsection of it.
I think the reason you can get away without the braces when checking for a text or number in the list is that in AppleScript, a single item is automatically coercible to a list containing that item, so we don’t have to think about it. But when the item’s already a list or a record, the coercion to list takes on a different meaning. In these cases, we have to be explicit with the braces. But using braces is actually correct in any case.
Hope this makes sense.
Edit: Yes. I was right about items being coerced to lists. Here’s a short demo:
set tabGoods to {"g185", "51", "black", "g185", "51", "black"} -- Now a list of texts.
"g185" is in tabGoods
--> true, because "g185" is automatically coerced to {"g185"} (a text to list coercion) for the check.
{|style|:"g185", colorCode:"51", colorName:"black"} is in tabGoods
--> true, because the record is coerced to {"g185", "51", "black"} (a record to list coercion) for the check.
--> This is just to demonstrate that the coercion takes place. A record to list coercion should never be relied upon to produce a list with items in a particular order.
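Putting the braces fix together with the script-object technique from the start of the thread gives a handler that should work for strings, numbers, lists and records alike. This is a sketch, not anyone's posted code: the handler name removeDuplicates is hypothetical, and it deliberately leaves out ‘considering case’, so addresses that differ only in case are treated as duplicates.

```applescript
-- Hypothetical handler: returns a new list with duplicates removed,
-- keeping the first occurrence of each item.
on removeDuplicates(theList)
	script o
		property src : theList
		property uniques : {}
	end script
	repeat with i from 1 to (count o's src)
		set thisItem to item i of o's src
		-- Braces round thisItem: ask whether the one-item list {thisItem}
		-- is a section of uniques, which also works for records and lists.
		if {thisItem} is not in o's uniques then set end of o's uniques to thisItem
	end repeat
	return o's uniques
end removeDuplicates

set tabGoods to {{|style|:"g185", colorCode:"51", colorName:"black"}, {|style|:"g185", colorCode:"51", colorName:"black"}}
removeDuplicates(tabGoods) --> {{|style|:"g185", colorCode:"51", colorName:"black"}}
removeDuplicates({"a@a.e", "b@b.c", "d@a.h", "a@a.e"}) --> {"a@a.e", "b@b.c", "d@a.h"}
```

Each ‘is in’ test still scans the list built so far, so the total time grows roughly with the square of the list length.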
Hello Nigel.
That was a brilliant explanation.
It made a lot of sense to me.
Thanks
Something to add to Nigel’s explanation:
{2, 3} is in {1, 2, 3, 4, 5} --> true
{2, 4} is in {1, 2, 3, 4, 5} --> false
The reason the first line returns true and the second returns false is that {2, 3} is a contiguous section of the list and {2, 4} is not, even though all of its values are in the list.
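If what you actually want is to check that all the values are present, in any order and not necessarily next to each other, you have to test each item separately. A sketch with a hypothetical helper named containsAllItems:

```applescript
-- Hypothetical helper: returns true if every item of someItems occurs
-- somewhere in theList, in any order and not necessarily adjacent.
on containsAllItems(theList, someItems)
	repeat with anItem in someItems
		-- anItem is a reference, so take its contents before wrapping it in braces.
		if {contents of anItem} is not in theList then return false
	end repeat
	return true
end containsAllItems

containsAllItems({1, 2, 3, 4, 5}, {2, 4}) --> true
containsAllItems({1, 2, 3, 4, 5}, {2, 6}) --> false
```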
I didn’t know whether records are coerced into lists before comparing, but I did know that records only compare values. A presumable reason for this is that a scripting addition, for instance, can mess up the comparison (read: a user-defined key turns into an enumerated key). Technically there is a difference between a record containing user-defined keys and one containing enumerated keys. A record with user-defined keys is actually a record containing one key (usrf) whose value is a list holding all the keys and values: the odd indexes are the keys, as normal AppleScript strings, each followed by its value. A record with enumerated keys is not stored this way. So when comparing, it’s better to compare only the values with their associated indexes, which results in the same behavior as coercing into a list first before comparing.
To make it easier to understand: a list like the one in Nigel’s example is actually stored as:
Then, if a scripting addition is installed or another script library is loaded into global scope that defines colorCode and colorName as the enumerated codes ccod and cnam respectively, the list would look like:
Both lists would be presented the same way in Script Editor, apart from some syntax highlighting. If the records were compared including their keys, they would not match. But when only values are compared, they do.
(Off-topic: this is also why it’s important to use pipes around keys in records when using AppleScriptObjC, so you don’t send an enumerated key by accident.)
I didn’t know that either. I did know, however, that I could coerce a record to a list on a one-by-one basis; what I didn’t know, or didn’t think of, was that I could wrap it in {} so I could use an “is in” expression.
Great in-depth discussion of lists!
The same principle applies if you’re obliged to concatenate something to a list:
set aList to {"a", "b", "c"}
aList & {|style|:"g185", colorCode:"51", colorName:"black"}
--> {"a", "b", "c", "g185", "51", "black"}
aList & {{|style|:"g185", colorCode:"51", colorName:"black"}}
--> {"a", "b", "c", {|style|:"g185", colorCode:"51", colorName:"black"}}
aList & {1, 2, 3}
--> {"a", "b", "c", 1, 2, 3}
aList & {{1, 2, 3}}
--> {"a", "b", "c", {1, 2, 3}}
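For comparison, ‘set end of’ always adds its value as one new item, whatever its class, so no extra braces are needed there. Unlike ‘&’, it changes the original list rather than returning a new one. A short sketch:

```applescript
set aList to {"a", "b", "c"}
-- A list is appended as a single item, not spliced in.
set end of aList to {1, 2, 3}
aList --> {"a", "b", "c", {1, 2, 3}}
-- A record is appended as a single item too, with no coercion to a list.
set end of aList to {|style|:"g185"}
aList --> {"a", "b", "c", {1, 2, 3}, {|style|:"g185"}}
```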
Hello.
The concatenation examples were interesting; there we go again with the record. The list example is somewhere I have been.
It is the need to make the thing list-compatible in order to search for elements and records (especially records) that has been an “aha” experience for me. But then again, looking at the difference between a list of characters and a string, it is quite natural that the thing you search for must have the same form as the object you are checking it against for containment.
set m to {{1, 2, 3, 4}, {5, 6, 7, 8}}
log ({1, 2, 3, 4} is in m) as text
-- false
log ({{1, 2, 3, 4}} is in m) as text
-- true
-- and this one shows it's a little smarter than text item delimiters after all: you can't cross "item boundaries"
log ({{3, 4, 5, 6}} is in m) as text
-- false
I see a lot of uses for this. Thanks a lot.
Indeed. So consider this script:
set x to display dialog "display" default answer "answer"
set x to {zz:"zz", a:"a"} & x & {z:"z", aa:"aa"}
x as list
--> {"zz", "a", "OK", "answer", "z", "aa"}
It seems that, somehow, the order that the items have been added is preserved in the order of the resulting list. Do you have any idea how? If I add:
set the clipboard to x
I see:
'Jons'\'pClp'{ '----':{ 'bhit':'utxt'("OK"), 'ttxt':'utxt'("answer"), 'usrf':[ 'utxt'("zz"), 'utxt'("zz"), 'utxt'("a"), 'utxt'("a"), 'utxt'("z"), 'utxt'("z"), 'utxt'("aa"), 'utxt'("aa") ] }, &'subj':null(), &'csig':65536 }
which is what I’d expect, but doesn’t explain the placement of the non-user items in the final list.
More relevant to understanding the use of ‘is in’ and ‘contains’ with records and lists:
set x to display dialog "display" default answer "answer"
set x to {zz:"zz", a:"a"} & x & {z:"z", aa:"aa"}
x contains {button returned:"OK", aa:"aa", z:"z", zz:"zz"}
--> true
Yes, and that makes sense if you assume there’s no order to record items. But the previous scripts suggest there is, at some level.
AppleScript is nothing if not entertaining…
There is some magic going on there. The reason behind it is that a record can only contain enumerated keys, not user-defined keys, and each key is allowed only once: you can’t have the same enumerated key twice in a record. The user-defined keys are therefore collected into one list and filed under a single keyword named ‘usrf’.
This means that a record differs between its presentation (AppleScript) and its actual data (AppleEvent, aka AERecord). It must be the AppleScript layer of the record that keeps track of the order of the items, while there is no such thing at the AppleEvent level. I can confirm from when I wrote scripting additions, as you have probably experienced yourself, that the order of an AppleScript record isn’t always the same as the order of an incoming AppleEvent record. That there is a difference was the only logical explanation I had back then, and it still applies to this weird behavior.
To confirm my point, I thought of running a mixed record through the AppleEvent Manager, returning the data and seeing what happens. I found that the items were rearranged inside the record, so once a mixed record leaves the AppleScript world, its order is lost. What if we do the same with your script:
set x to display dialog "display" default answer "answer"
set x to {zz:"zz", a:"a"} & x & {z:"z", aa:"aa"}
script scriptX
	on run argv
		return item 1 of argv
	end run
end script
run script scriptX with parameters {x} --> a rearranged record
As you can see, the order in which you set the items is lost and the items are rearranged.
This is the closest answer I can give to your question “Do you have any idea?”. I have no idea what really happens in the code, but based on AppleEvent’s transparency and testing AppleScript code, it is clear that an AppleScript record is not (entirely) the same as an AppleEvent record. An extra, irreversible coercion/transition is made when an AppleScript record enters the world of AppleEvents.
Is it a bug or a poor implementation? No: AppleScript and AppleEvents are both clear that the order of values in a record is not guaranteed. As with hash tables and associative arrays in other programming languages, neither the index nor the order in which the items are stored is significant.
Edit: Shane showed I’m wrong about a conclusion… maybe it’s time to go to bed.
That was my conclusion, too – something at a lower level.
But Nigel’s later example suggests that order is ignored when you use “is in” with records.
You were fast :) … I found out that ‘is in’ is still safe. So the order of the items is not important when comparing records, but it is when comparing lists of values. The comparison itself is more than just a record-to-list coercion:
set x to display dialog "display" default answer "answer"
set x to {zz:"zz", a:"a"} & x & {z:"z", aa:"aa"}
script scriptX
	on run argv
		return item 1 of argv
	end run
end script
set results to run script scriptX with parameters {x} --> a rearranged record
({text returned:"answer", z:"z", zz:"zz"} as list) is in (results as list) --false
{text returned:"answer", z:"z", zz:"zz"} is in results --true
Hello.
So a record in AppleScript is really, at the least, a simulated set of values, if not a set of values internally.
A set is an unordered collection of elements in which similar elements are not counted; a uniqueness constraint on the elements is also possible, and then it is called a set of unique values.
Attributes and properties of objects and records often work like this: the last attribute/property of a given kind is the one that is used.
It is a good thing that AppleScript treats the record as a record, not a list, and arranges its attributes in some order to make comparing them easier when testing for equality or containment.