Compare 2 lists and create a 3rd list with the unique values.

Gokussl · November 8, 2013, 6:02pm

I am importing two text files into two lists that I’m comparing. One text file has a list of updates to install and the other has the history of updates already installed; the entries in both are separated by returns. There will be some duplicates between the two lists, since some of the updates may already have been installed. I want to compare the two lists and create a third list with just the unique, non-duplicate values (the updates yet to be installed). This is what I have so far, but it’s not working.



set listofMtnLion to {{"osx1"},{"osx2"},{"osx3"}}
set listofMtnLionHistory to {{"osx1"},{"osx2"}}
set Uniquelist to {}

repeat with a from 1 to count of listOfMtnLion
	set Updatelist to (item a of listOfMtnLion)
	set Historylist to (item a of listofMtnLionHistory)
	if {Updatelist} is not equal to {Historylist} then
		set Uniquelist's end to {Updatelist}
	end if
end repeat

Nigel_Garvey · November 8, 2013, 9:15pm

Hi. Welcome to MacScripter.

I don’t think you need all those braces.


set listofMtnLion to {"osx1", "osx2", "osx3"}
set listofMtnLionHistory to {"osx1", "osx2"}
set Uniquelist to {}

repeat with a from 1 to (count listofMtnLion)
	set thisUpdate to item a of listofMtnLion
	if (thisUpdate is not in listofMtnLionHistory) then set end of Uniquelist to thisUpdate
end repeat

Uniquelist

Gokussl · November 8, 2013, 9:36pm

When I use your updated code with preset lists, it works. When I use the lists created automatically from my imported text documents, it lists all the entries instead of just the unique one.

I have a feeling I’m importing the data into the lists incorrectly.

This is what I have for importing from the text documents.



--// Import OSX Update file for 10.8

set listOfMtnLion to {}
set OSUpdates2 to paragraphs of (read POSIX file "/Scentsy/Updates/OSX10.8.txt")
repeat with nextLine2 in OSUpdates2
	if length of nextLine2 is greater than 0 then
		copy nextLine2 to the end of listOfMtnLion
	end if
end repeat

--// Create OSX History file 

tell application "Finder" to if exists "/Scentsy/Updates/UpdateHistory10.8.txt" as POSIX file then
else
	do shell script ("touch /Scentsy/Updates/UpdateHistory10.8.txt") user name (account of theCredentials) password (password of theCredentials) with administrator privileges
	do shell script ("chmod 777 /Scentsy/Updates/UpdateHistory10.8.txt") user name (account of theCredentials) password (password of theCredentials) with administrator privileges
end if

--// Tests OSX History file to make sure it's not empty with a try catch

set listofMtnLionHistory to {}
try
	set OSUpdatesHist2 to paragraphs of (read POSIX file "/Scentsy/Updates/UpdateHistory10.8.txt")
	set MtnLionFileNotEmpty to "yes"
on error Error1
	set MtnLionFileNotEmpty to "no"
end try

--// If History file not empty this imports it into a list

if MtnLionFileNotEmpty is equal to "yes" then
	set OSUpdatesHist2 to paragraphs of (read POSIX file "/Scentsy/Updates/UpdateHistory10.8.txt")
	repeat with nextLine2 in OSUpdatesHist2
		if length of nextLine2 is greater than 0 then
			copy nextLine2 to the end of listofMtnLionHistory
		end if
	end repeat
end if

Nigel_Garvey · November 8, 2013, 10:09pm

Hi.

Because of the kind of repeat you’re using to build the two lists, the items you’re putting in them aren’t the actual paragraph texts but references to them in the lists OSUpdates2 and OSUpdatesHist2, e.g. not the text “osx1” but the reference ‘item 1 of {“osx1”, “osx2”, “osx3”}’. It’s these references which are being compared rather than the values at the end of them. There are a couple of ways round this, the slightly easier being to change the two instances of:

copy nextLine2 to the end of listOfMtnLion  -- or of listofMtnLionHistory

to:

copy contents of nextLine2 to the end of listOfMtnLion  -- or of listofMtnLionHistory

Then the lists will contain actual text and the comparisons should work properly.
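The same fix can be wrapped in a small handler so both files are imported the same way. This is a sketch, not Gokussl’s final script: the handler name nonEmptyParagraphs is hypothetical, and literal text stands in for the result of "read POSIX file ..." so it runs on its own. It dereferences each loop item with "contents of" before storing it, then runs Nigel’s comparison loop.

```applescript
-- Hypothetical helper: the non-empty paragraphs of theText, as plain text rather than references.
on nonEmptyParagraphs(theText)
	set theLines to paragraphs of theText
	set resultList to {}
	repeat with nextLine in theLines
		-- nextLine is a reference such as 'item 1 of {...}'; contents of gets the text itself.
		set thisLine to contents of nextLine
		if (length of thisLine) > 0 then set end of resultList to thisLine
	end repeat
	return resultList
end nonEmptyParagraphs

-- Literal text stands in for the files here; with real files, pass e.g.
-- (read POSIX file "/Scentsy/Updates/OSX10.8.txt") instead.
set listOfMtnLion to nonEmptyParagraphs("osx1" & return & "osx2" & return & return & "osx3")
set listofMtnLionHistory to nonEmptyParagraphs("osx1" & return & "osx2" & return)

-- Nigel's loop: keep only the updates that aren't in the history.
set Uniquelist to {}
repeat with a from 1 to (count listOfMtnLion)
	set thisUpdate to item a of listOfMtnLion
	if (thisUpdate is not in listofMtnLionHistory) then set end of Uniquelist to thisUpdate
end repeat
Uniquelist
```

The other way round the reference problem is to loop by index ("repeat with i from 1 to (count theLines)" and "item i of theLines"), which yields the text values directly.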

Yvan_Koenig · November 9, 2013, 9:32am

Hello
If listofMtnLionHistory contains an item that isn’t in listofMtnLion, my understanding is that it must be added to Uniquelist too, which the script doesn’t do.

Yvan KOENIG (VALLAURIS, France) Saturday 9 November 2013 10:32:35

Nigel_Garvey · November 9, 2013, 10:51am

Hi Yvan.

The way I read it, Gokussl only wanted “updates yet to be installed” in the third list. I’m presuming that listOfMtnLionHistory is the list with the “history of updates already installed.”

McUsrII · November 9, 2013, 10:23pm

Hello.

I was thinking a little about this, and I wondered whether it could be solved using text item delimiters as a set-theoretic tool. I was a little disappointed, as I had expected the problem to involve three lists, not just two.

Now, to get the unique elements of two sets A and B, you would form the symmetric difference (A - B) + (B - A), that is, the union of the two differences. (By definition, a set can only contain unique elements.)
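As a baseline for comparison with a text-item-delimiter version, here is a minimal sketch of that formula using plain repeat loops and the "is not in" operator. The handler names differenceOf and symmetricDifference are hypothetical, and the items are assumed to be simple values such as text.

```applescript
-- Hypothetical helper: items of setA that are not in setB (plain loop, no TIDs).
on differenceOf(setA, setB)
	set resultItems to {}
	repeat with i from 1 to (count setA)
		set thisItem to item i of setA
		if (thisItem is not in setB) then set end of resultItems to thisItem
	end repeat
	return resultItems
end differenceOf

-- (A - B) + (B - A): the items that appear in exactly one of the two lists.
on symmetricDifference(setA, setB)
	return differenceOf(setA, setB) & differenceOf(setB, setA)
end symmetricDifference

symmetricDifference({"osx1", "osx2", "osx3"}, {"osx1", "osx2", "osx4"})
```

This is the reading Yvan suggested; for “updates yet to be installed” only the first differenceOf call is needed.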

This uses text item delimiters, but I don’t think it’s any faster; it would be fun to time it, though.


set listofMtnLion to {"osx1", "osx2", "osx3"}
set listofMtnLionHistory to {"osx1", "osx2"}

set tmp1 to difference(listofMtnLion, listofMtnLionHistory)
set tmp2 to difference(listofMtnLionHistory, listofMtnLion)
set uniqueValues to tmp1 & tmp2
-- > {"osx3"}

on difference(setA, setB)
	tell (a reference to AppleScript's text item delimiters)
		local tids
		set {tids, contents of it} to {contents of it, return}
		set setB to setB as text
		set contents of it to setA
		set setB to text items of setB
		set contents of it to return
		set setB to text items of (setB as text)
		set contents of it to missing value
		set setB to text items of (setB as text)
		set contents of it to tids
	end tell
	return setB
end difference

Nigel_Garvey · November 9, 2013, 11:34pm

Nice idea. But the handler needs to be a little more robust.


set listofMtnLion to {"osx1", "osx2", "osx3", "osx4"}
set listofMtnLionHistory to {"osx13", "osx4", "osx2"}

set Uniquelist to difference(listofMtnLion, listofMtnLionHistory)

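-- Return a list of the items in setA which aren't in setB.
-- Assumes no item contains a return or a linefeed.
on difference(setA, setB)
	-- Guard: with an empty setB the delimiter list would be {return & return},
	-- which would split setA between its items and glue them together.
	if (setA = {}) or (setB = {}) then return setA
	set astid to text item delimiters
	-- Box each item of setA with returns.
	set text item delimiters to return & return
	set setA to return & setA & return
	-- Box each item of setB the same way, with a linefeed between the boxes.
	set text item delimiters to return & linefeed & return
	set setB to return & setB & return
	-- Split on the linefeeds and use the boxed setB items as the delimiters.
	set text item delimiters to linefeed
	set text item delimiters to setB's text items
	-- Splitting setA on them cuts out every boxed item that is also in setB.
	set setA to setA's text items
	set text item delimiters to ""
	set setA to setA as text
	if ((count setA) > 0) then
		-- Strip the outer returns and split what's left back into a list.
		set text item delimiters to return & return
		set setA to text items of text 2 thru -2 of setA
	else
		set setA to {}
	end if
	set text item delimiters to astid
	
	return setA
end difference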

McUsrII · November 10, 2013, 12:09am

I’ll figure out why you do that!

I had assumed the list was as presented: no linefeeds or returns in the items and no empty elements. Given that, and the constraint below, I can’t see any reason for it not to work.

There is one major constraint (or deficiency) in this approach: no element can be a substring of another element.

Example:
If “osx” were an element in the list, everything would be ruined: we would probably end up with a list of just the numbers, which could be all right if we allowed for that. I think that is a valid constraint in set theory, but it is rather awkward in real life; too many constraints for my taste.
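A small demonstration of the substring problem, and of why boxing the delimiters with returns avoids it. The variable names are illustrative only; the original delimiters are saved and restored.

```applescript
set astid to AppleScript's text item delimiters

-- Naive: "osx" as a delimiter matches inside "osx1" and splits it.
set AppleScript's text item delimiters to "osx"
set naiveSplit to text items of "osx1"

-- Boxed: return & "osx" & return can only match a whole, return-bounded item.
set AppleScript's text item delimiters to return & "osx" & return
set boxedSplit to text items of (return & "osx1" & return)

set AppleScript's text item delimiters to astid
{naiveSplit, boxedSplit}
```

The naive split leaves {"", "1"}, while the boxed text comes back as a single, untouched item.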

I’m not sure this is going to be faster than anything, given all the constraints here. The one place it might be faster is when there are only two lists to be sifted for duplicates; three lists lead to 10 difference operations.

So, the traditional way of doing things still rules.

Edit
You removed the constraint by boxing every item with returns! Nice!!!

It is certainly more usable now, but I am still very unsure whether it is faster; the “in” operator is a fast one too…

DJ_Bazzie_Wazzie · November 10, 2013, 2:43am

The problem with your example doesn’t lie in linefeeds or empty items. It’s when one name overlaps another: when you have an update for iTunes and iTunes helper is already installed, your example fails. Nigel’s example fixes that.

McUsrII · November 10, 2013, 4:19am

Hello.
Yes, I figured out several hours ago that he boxes the text item delimiters with returns, so there is no way a delimiter can match a substring. Really nice. And even nicer how he uses the linefeed for safety in the process!

I figured it out a couple of hours before your post (see the timestamp).

The handler is also a great idea for set manipulation in general: getting the intersection of, and the difference between, sets.
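As a sketch of the intersection case, the difference handler can be applied twice, since A ∩ B = A - (A - B). The handler below is Nigel’s difference, repeated so the example runs on its own, with an empty-list guard added as an assumption; intersectionOf is a hypothetical name.

```applescript
-- Nigel's difference handler, plus an empty-list guard (an addition).
on difference(setA, setB)
	if (setA = {}) or (setB = {}) then return setA
	set astid to text item delimiters
	set text item delimiters to return & return
	set setA to return & setA & return
	set text item delimiters to return & linefeed & return
	set setB to return & setB & return
	set text item delimiters to linefeed
	set text item delimiters to setB's text items
	set setA to setA's text items
	set text item delimiters to ""
	set setA to setA as text
	if ((count setA) > 0) then
		set text item delimiters to return & return
		set setA to text items of text 2 thru -2 of setA
	else
		set setA to {}
	end if
	set text item delimiters to astid
	return setA
end difference

-- Hypothetical: items of setA that are also in setB, i.e. A - (A - B).
on intersectionOf(setA, setB)
	return difference(setA, difference(setA, setB))
end intersectionOf

intersectionOf({"osx1", "osx2", "osx3", "osx4"}, {"osx13", "osx4", "osx2"})
```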

Nigel_Garvey · November 10, 2013, 2:23pm

The techniques have been around for years and aren’t mine. I was only posting from experience. But if McUsrII was previously unaware of the idea of coercing a list of text to text and using TIDs, instead of repeating through the original list, his invention of it here for the current purpose shows some original thinking.

The linefeed isn’t for safety, but is a delimiter for separating the return-bounded sections of setB, which are then used as delimiters themselves.
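To make the linefeed’s role concrete, this sketch runs only the setB steps of Nigel’s handler on a two-item list and keeps the intermediate values (the variable names boxedB and delimiterList are illustrative).

```applescript
set setB to {"osx13", "osx4"}
set astid to AppleScript's text item delimiters

-- Join with return & linefeed & return, then add a return at each end:
-- return & "osx13" & return & linefeed & return & "osx4" & return
set AppleScript's text item delimiters to return & linefeed & return
set boxedB to return & setB & return

-- Splitting on the linefeeds yields the return-bounded items themselves,
-- which then serve as the list of text item delimiters.
set AppleScript's text item delimiters to linefeed
set delimiterList to text items of boxedB

set AppleScript's text item delimiters to astid
delimiterList
```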

McUsrII · November 10, 2013, 3:11pm

Hello.

Frankly, there was no original thinking on my part. I only took advantage of the fact that text item delimiters can nowadays be a list.

I have seen Nigel Garvey do “the robustness” before, with lines, I think. What I wasn’t aware of is that it also protects words from being broken up into pieces: you won’t find a match for return & osx & return (which is what the new text item delimiters look like), so they only match when the word boundaries are the same. That is quite smart. I don’t think I really contributed anything new here, except maybe using missing value as a text item delimiter, and Apple provided the lists of text item delimiters. Everything else, apart from the idea of looking at the uniqueness problem as a set operation (which can hardly be called original), is stolen as usual. Ranting along: whether an idea is original or not matters less than whether it works, not just technically but also as a solution to some problem, and that it feels good to use.

It is Nigel’s effort that made the handler usable, and I think it is a practical one for finding differences between sets.
There is really no problem with using the repeat loop, but this feels more correct, though the loop version may actually outperform it, at least when it uses a script object to hold the reference to the list.

I got the point about the linefeed, but this is mind-boggling altogether. (As usual with text item delimiters, which is what makes them so fascinating, apart from their speed.)

Gokussl · November 10, 2013, 11:41pm

Nigel,

That change ended up working great. I guess I didn’t fully understand what was happening when calling copy.

I appreciate the quick responses. Thanks!

McUsrII · November 16, 2013, 6:41pm

Hello.

I’m adding this complementary operation to this thread, since it’s the most relevant place, and it thereby completes the set handlers.

Add member, remove member, isMemberOf, and UnionSet should be fairly easy to implement.
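A minimal sketch of two of those, using plain loops rather than text item delimiters; unionSet and isMemberOf are hypothetical names, and items are assumed to be simple values such as text.

```applescript
-- Hypothetical: the union of two lists, keeping the first occurrence of each item.
on unionSet(setA, setB)
	set unionItems to {}
	set allItems to setA & setB
	repeat with i from 1 to (count allItems)
		set thisItem to item i of allItems
		if (thisItem is not in unionItems) then set end of unionItems to thisItem
	end repeat
	return unionItems
end unionSet

-- Hypothetical: true if anItem is an element of aSet.
on isMemberOf(anItem, aSet)
	return (anItem is in aSet)
end isMemberOf

unionSet({"osx1", "osx2"}, {"osx2", "osx3"})
```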

It should also be fairly easy to implement sets with a counter for how many times each element has been added, for some “histogrammic” purpose or whatnot.
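One way such a counted set might look, as a sketch only: a list of {value, count} pairs built with nested loops. The handler name countedSet and the pair layout are assumptions for illustration.

```applescript
-- Hypothetical: count how many times each value occurs, as {value, count} pairs.
on countedSet(aList)
	set pairs to {}
	repeat with i from 1 to (count aList)
		set thisItem to item i of aList
		set foundIt to false
		repeat with j from 1 to (count pairs)
			if (item 1 of item j of pairs) is thisItem then
				set item 2 of item j of pairs to (item 2 of item j of pairs) + 1
				set foundIt to true
				exit repeat
			end if
		end repeat
		if not foundIt then set end of pairs to {thisItem, 1}
	end repeat
	return pairs
end countedSet

countedSet({"osx1", "osx2", "osx1"})
```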


-- Return true if every item in aSet is also in Universe.
on isSubset of Universe for aSet
	-- Stolen from Nigel Garvey
	if aSet = {} then return true -- the empty set is a subset of every set.
	if Universe = {} then return false -- a non-empty set isn't a subset of the empty set.
	set astid to text item delimiters
	set text item delimiters to return & return
	set aSet to return & aSet & return
	set text item delimiters to return & linefeed & return
	set Universe to return & Universe & return
	set text item delimiters to linefeed
	set text item delimiters to Universe's text items
	set aSet to aSet's text items
	set text item delimiters to ""
	set aSet to aSet as text
	set text item delimiters to astid
	return ((length of aSet) = 0)
end isSubset

Edit

If you are a lazy typist, there is of course nothing stopping you from using the handler above to figure out whether two lists of simple values are equal as sets: check that each is a subset of the other.
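A sketch of that check, with McUsrII’s isSubset repeated so the example runs on its own (the empty-Universe guard is an addition) and a hypothetical setsEqual wrapper. Because these are set comparisons, order and duplicates are ignored.

```applescript
on isSubset of Universe for aSet
	if aSet = {} then return true
	if Universe = {} then return false -- added guard
	set astid to text item delimiters
	set text item delimiters to return & return
	set aSet to return & aSet & return
	set text item delimiters to return & linefeed & return
	set Universe to return & Universe & return
	set text item delimiters to linefeed
	set text item delimiters to Universe's text items
	set aSet to aSet's text items
	set text item delimiters to ""
	set aSet to aSet as text
	set text item delimiters to astid
	return ((length of aSet) = 0)
end isSubset

-- Hypothetical: two lists are equal as sets if each is a subset of the other.
on setsEqual(setA, setB)
	return (isSubset of setB for setA) and (isSubset of setA for setB)
end setsEqual

setsEqual({"osx1", "osx2"}, {"osx2", "osx1"})
```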