Thursday, September 2, 2010

#1 2010-03-13 09:51:23 am

marlon4417
Member
From: NJ
Registered: 2007-11-27
Posts: 36

Delete Duplicate Rows in Excel

I have a script that merges multiple Excel files into one master file.  After the merge, I'd like to delete all duplicate rows in that file so that only unique records remain.  I've looked at the Excel Applescript Reference but can't seem to figure this out.  I'd appreciate some  help. 

Thanks.

Offline

 

#2 2010-03-13 12:37:19 pm

mikerickson
Member
Registered: 2008-10-26
Posts: 357

Re: Delete Duplicate Rows in Excel

Have you looked at Advanced Filter's Unique only option?

When you say "duplicate rows" does that mean that each cell in the row is to be checked?
i.e.

Smith 1
Jones 2
Smith 3

is that OK
or
should one of the Smith's go, if so which Smith?

OR (he hopes. he hopes) is there a column where a duplicate in that column implies that the other columns are the same. i.e. two Smiths implies that the number is the same and it doesn't matter which row gets deleted?

Offline

 

#3 2010-03-13 01:49:01 pm

marlon4417
Member
From: NJ
Registered: 2007-11-27
Posts: 36

Re: Delete Duplicate Rows in Excel

Mike,

In your example above, the Smith entries are unique since not all cells in their respective rows are identical.  If you had Smith 1 in row 3, then that would be duplicated.  So, yes, each row's data needs to be unique.

I looked at this function:

advanced filter range "Database" action filter in place ¬
criteria range range "Criteria"

but don't really know how that works.

Offline

 

#4 2010-03-13 02:14:25 pm

mikerickson
Member
Registered: 2008-10-26
Posts: 357

Re: Delete Duplicate Rows in Excel

To use Advanced Filter, each column has to have a unique header and there can be no blank rows.
This will copy myMessyRange to myCleanRange without duplicates. The optional last line deletes myMessyRange, leaving the un-duplicated data.

Applescript:

tell application "Microsoft Excel"
   set myMessySheet to get worksheet "Sheet1" of workbook "Workbook2.xls"
   set myMessyRange to get current region of range "A1" of myMessySheet
   
   set myCleanRange to get resize (get offset (myMessyRange) column offset ((count of columns of myMessyRange) + 1)) row size 1
   
   advanced filter myMessyRange action filter copy copy to range myCleanRange with unique
   
   -- optional deletes dirty range
   delete range (entire column of (get resize myMessyRange column size ((count of columns of myMessyRange) + 1))) shift shift to left
end tell

Offline

 

#5 2010-03-13 04:03:21 pm

marlon4417
Member
From: NJ
Registered: 2007-11-27
Posts: 36

Re: Delete Duplicate Rows in Excel

As you clearly indicated, this will only work IF the header is  unique.  Therein lies the problem.  Most of these files share the same top row header (and hence the need to get rid of them).  I tried your script and sure enough, it will work if the header is not duplicated elsewhere.

I will try to poke around to see how I may modify your script so that I may de-dupe if the  header is NOT unique.  Actually, the script works when blank spaces were present. wink

Maybe a tall order but I have to get this to work.  Thanks for you insight and direction so far!

Last edited by marlon4417 (2010-03-13 04:08:47 pm)

Offline

 

#6 2010-03-13 05:51:56 pm

mikerickson
Member
Registered: 2008-10-26
Posts: 357

Re: Delete Duplicate Rows in Excel

This will insert a header row, fill it with unique headers and delete after Filter has done its work.

Applescript:

tell application "Microsoft Excel"
   set myMessySheet to get worksheet "Sheet1" of workbook "Workbook1.xls"
   set myMessyRange to get current region of range "A1" of myMessySheet
   
   insert into range (entire row of (row 1 of myMessyRange)) shift shift down
   
   set headerRow to get resize (row 1 of myMessyRange) row size 1 column size (count of columns of myMessyRange)
   
   set formula of headerRow to "=\"header\"&column()"
   set value of headerRow to get value of headerRow
   
   set myCleanRange to get resize (get offset (myMessyRange) column offset ((count of columns of myMessyRange) + 1)) row size 1
   
   advanced filter myMessyRange action filter copy copy to range myCleanRange with unique
   delete range (entire row of headerRow) shift shift up
   -- optional deletes dirty range
   delete range (entire column of (get resize myMessyRange column size ((count of columns of myMessyRange) + 1))) shift shift to left
end tell

Offline

 

#7 2010-03-13 06:05:38 pm

marlon4417
Member
From: NJ
Registered: 2007-11-27
Posts: 36

Re: Delete Duplicate Rows in Excel

Aw man, I spent the last 2 hours trying do this exact workaround of inserting a dummy row header, filter, then deleting the dummy. Now, I feel like the dummy!  Many thanks for the quick response by the way.  Much appreciated.  I am sure this is good to go.

Marlon

Offline

 

#8 2010-03-15 10:36:31 am

marlon4417
Member
From: NJ
Registered: 2007-11-27
Posts: 36

Re: Delete Duplicate Rows in Excel

Mike,

I've tried to implement this script as part of a subroutine in my application. I have a popup button where if the user selects xls files to merge, a checkbox button "duplicate" becomes visible and the default is unchecked (0 state).  During the merge of the xls files,  if the state of the "duplicate" is 1, I need to run the delete duplicate rows subroutine deleteRows. 

Applescript:

tell application "Microsoft Excel"
       
       open text file filename output_ data type delimited text qualifier text qualifier double quote with comma
       set display alerts to false
       save active workbook in output_ as default save format
       if state of button "duplicate" of window "merger" is 1 then
           deleteRows()
       end if
       
       quit

Applescript:


on deleteRows()

set myMessyRange to get current region of range "A1"
   insert into range (entire row of (row 1 of myMessyRange)) shift shift down
   set headerRow to get resize (row 1 of myMessyRange) row size 1 column size (count of columns of myMessyRange)
   
   set formula of headerRow to "=\"header\"&column()"
   set value of headerRow to get value of headerRow
   
   set myCleanRange to get resize (get offset (myMessyRange) column offset ((count of columns of myMessyRange) + 1)) row size 1
   
   advanced filter myMessyRange action filter copy copy to range myCleanRange with unique
   delete range (entire row of headerRow) shift shift up
   -- optional deletes dirty range
   delete range (entire column of (get resize myMessyRange column size ((count of columns of myMessyRange) + 1))) shift shift to left
end deleteRows

I am not certain the subroutine can be called in the middle of the script where it needs to look at the state of a button on a separate window.  I get a syntax error where "region" is highlighted in the deleteRows sub.  Your thoughts...

Offline

 

#9 2010-03-15 05:21:06 pm

mikerickson
Member
Registered: 2008-10-26
Posts: 357

Re: Delete Duplicate Rows in Excel

The script needs to be inside a Tell block

Applescript:

on deleteRows()
tell application "Microsoft Excel"
-- script
end tell
end deleteRows

You may also need to call the method with the my keyword

Code:

If ....
my deleteRows()
End If

Last edited by mikerickson (2010-03-15 05:22:31 pm)

Offline

 

#10 2010-03-15 06:14:51 pm

marlon4417
Member
From: NJ
Registered: 2007-11-27
Posts: 36

Re: Delete Duplicate Rows in Excel

Although the subroutine was being called from within the Tell block of Excel application, I still need to indicate the Tell block for deleteRows()?  Yep, that worked...now, the code is being hung up on this section where the state of button is being checked:

Applescript:

...
tell application "Microsoft Excel"
       
       open text file filename output_ data type delimited text qualifier text qualifier double quote with comma
       set display alerts to false
       save active workbook in output_ as default save format
       if state of button "duplicate" of window "merger" is 1 then
           my deleteRows() *******hung up here*************
       end if
       if state of button "duplicate" of window "merger" is 0 then
           quit
       end if
       
   end tell

Am I setting the button correctly?  This is driving me nuts!  Thanks for all your help (as a sidebar, Stefan also pointed out the issue with the missing tell block)

Offline

 

#11 2010-03-15 08:03:14 pm

mikerickson
Member
Registered: 2008-10-26
Posts: 357

Re: Delete Duplicate Rows in Excel

I don't know how you are setting the button.

Is "the button" a checkbox on a sheet (presumably the active sheet of that book) of the Excel workbook "merger"?

Edit: I did some investigation and got this script to probe the value of a checkbox (from the Forms menu) on a sheet.

Applescript:

tell application "Microsoft Excel"
   if (value of checkbox "Check Box 2" of active sheet) = checkbox on then
       display dialog "on"
   else
       display dialog "off"
   end if
end tell

Last edited by mikerickson (2010-03-15 08:13:53 pm)

Offline

 

#12 2010-03-15 08:39:58 pm

marlon4417
Member
From: NJ
Registered: 2007-11-27
Posts: 36

Re: Delete Duplicate Rows in Excel

The checkbox is on the main NSWindow "merger" (the UI).

Offline

 

#13 2010-03-15 08:43:08 pm

mikerickson
Member
Registered: 2008-10-26
Posts: 357

Re: Delete Duplicate Rows in Excel

I'm not a computer guy. I'm an Excel guy. So I'm not sure what you mean by an NSWindow.

If the checkbox is not on an Excel sheet, getting its value should probably be done outside an Excel try block.

Last edited by mikerickson (2010-03-15 08:43:47 pm)

Offline

 

#14 2010-03-17 12:51:23 pm

marlon4417
Member
From: NJ
Registered: 2007-11-27
Posts: 36

Re: Delete Duplicate Rows in Excel

SOLVED!

My if-else syntax was incorrect. All's well and thanks Mike.

Offline

 

Board footer

Powered by FluxBB

[ Generated in 0.319 seconds, 8 queries executed ]

RSS (new topics) RSS (active topics)