I have a script that merges multiple Excel files into one master file. After the merge, I’d like to delete all duplicate rows in that file so that only unique records remain. I’ve looked at the Excel Applescript Reference but can’t seem to figure this out. I’d appreciate some help.
Thanks.
Have you looked at Advanced Filter’s Unique only option?
When you say “duplicate rows” does that mean that each cell in the row is to be checked?
i.e.
Smith 1
Jones 2
Smith 3
is that OK
or
should one of the Smith’s go, if so which Smith?
OR (he hopes. he hopes) is there a column where a duplicate in that column implies that the other columns are the same. i.e. two Smiths implies that the number is the same and it doesn’t matter which row gets deleted?
Mike,
In your example above, the Smith entries are unique since not all cells in their respective rows are identical. If you had Smith 1 in row 3, then that would be duplicated. So, yes, each row’s data needs to be unique.
I looked at this function:
advanced filter range “Database” action filter in place ¬
criteria range range “Criteria”
but don’t really know how that works.
To use Advanced Filter, each column has to have a unique header and there can be no blank rows.
This will copy myMessyRange to myCleanRange without duplicates. The optional last line deletes myMessyRange, leaving the un-duplicated data.
tell application "Microsoft Excel"
set myMessySheet to get worksheet "Sheet1" of workbook "Workbook2.xls"
set myMessyRange to get current region of range "A1" of myMessySheet
set myCleanRange to get resize (get offset (myMessyRange) column offset ((count of columns of myMessyRange) + 1)) row size 1
advanced filter myMessyRange action filter copy copy to range myCleanRange with unique
-- optional deletes dirty range
delete range (entire column of (get resize myMessyRange column size ((count of columns of myMessyRange) + 1))) shift shift to left
end tell
As you clearly indicated, this will only work IF the header is unique. Therein lies the problem. Most of these files share the same top row header (and hence the need to get rid of them). I tried your script and sure enough, it will work if the header is not duplicated elsewhere.
I will try to poke around to see how I may modify your script so that I may de-dupe if the header is NOT unique. Actually, the script works when blank spaces were present.
Maybe a tall order but I have to get this to work. Thanks for you insight and direction so far!
This will insert a header row, fill it with unique headers and delete after Filter has done its work.
tell application "Microsoft Excel"
set myMessySheet to get worksheet "Sheet1" of workbook "Workbook1.xls"
set myMessyRange to get current region of range "A1" of myMessySheet
insert into range (entire row of (row 1 of myMessyRange)) shift shift down
set headerRow to get resize (row 1 of myMessyRange) row size 1 column size (count of columns of myMessyRange)
set formula of headerRow to "=\"header\"&column()"
set value of headerRow to get value of headerRow
set myCleanRange to get resize (get offset (myMessyRange) column offset ((count of columns of myMessyRange) + 1)) row size 1
advanced filter myMessyRange action filter copy copy to range myCleanRange with unique
delete range (entire row of headerRow) shift shift up
-- optional deletes dirty range
delete range (entire column of (get resize myMessyRange column size ((count of columns of myMessyRange) + 1))) shift shift to left
end tell
Aw man, I spent the last 2 hours trying do this exact workaround of inserting a dummy row header, filter, then deleting the dummy. Now, I feel like the dummy! Many thanks for the quick response by the way. Much appreciated. I am sure this is good to go.
Marlon
Mike,
I’ve tried to implement this script as part of a subroutine in my application. I have a popup button where if the user selects xls files to merge, a checkbox button “duplicate” becomes visible and the default is unchecked (0 state). During the merge of the xls files, if the state of the “duplicate” is 1, I need to run the delete duplicate rows subroutine deleteRows.
tell application "Microsoft Excel"
open text file filename output_ data type delimited text qualifier text qualifier double quote with comma
set display alerts to false
save active workbook in output_ as default save format
if state of button "duplicate" of window "merger" is 1 then
deleteRows()
end if
quit
on deleteRows()
set myMessyRange to get current region of range "A1"
insert into range (entire row of (row 1 of myMessyRange)) shift shift down
set headerRow to get resize (row 1 of myMessyRange) row size 1 column size (count of columns of myMessyRange)
set formula of headerRow to "=\"header\"&column()"
set value of headerRow to get value of headerRow
set myCleanRange to get resize (get offset (myMessyRange) column offset ((count of columns of myMessyRange) + 1)) row size 1
advanced filter myMessyRange action filter copy copy to range myCleanRange with unique
delete range (entire row of headerRow) shift shift up
-- optional deletes dirty range
delete range (entire column of (get resize myMessyRange column size ((count of columns of myMessyRange) + 1))) shift shift to left
end deleteRows
I am not certain the subroutine can be called in the middle of the script where it needs to look at the state of a button on a separate window. I get a syntax error where “region” is highlighted in the deleteRows sub. Your thoughts…
The script needs to be inside a Tell block
on deleteRows()
tell application "Microsoft Excel"
-- script
end tell
end deleteRows
You may also need to call the method with the my keyword
If ....
my deleteRows()
End If
Although the subroutine was being called from within the Tell block of Excel application, I still need to indicate the Tell block for deleteRows()? Yep, that worked…now, the code is being hung up on this section where the state of button is being checked:
...
tell application "Microsoft Excel"
open text file filename output_ data type delimited text qualifier text qualifier double quote with comma
set display alerts to false
save active workbook in output_ as default save format
if state of button "duplicate" of window "merger" is 1 then
my deleteRows() *******hung up here*************
end if
if state of button "duplicate" of window "merger" is 0 then
quit
end if
end tell
Am I setting the button correctly? This is driving me nuts! Thanks for all your help (as a sidebar, Stefan also pointed out the issue with the missing tell block)
I don’t know how you are setting the button.
Is “the button” a checkbox on a sheet (presumably the active sheet of that book) of the Excel workbook “merger”?
Edit: I did some investigation and got this script to probe the value of a checkbox (from the Forms menu) on a sheet.
tell application "Microsoft Excel"
if (value of checkbox "Check Box 2" of active sheet) = checkbox on then
display dialog "on"
else
display dialog "off"
end if
end tell
The checkbox is on the main NSWindow “merger” (the UI).
I’m not a computer guy. I’m an Excel guy. So I’m not sure what you mean by an NSWindow.
If the checkbox is not on an Excel sheet, getting its value should probably be done outside an Excel try block.
SOLVED!
My if-else syntax was incorrect. All’s well and thanks Mike.