Determine CSV with Automator Clear and reorder columns

madmax0410 · December 23, 2018, 7:45pm

Hello!

I would like to automatically edit a CSV file with the Automator. The CSV file has 8 columns (A-H)

And I would like to delete columns B, D, E, F, G, H, and then column C is first and then column A.

for example

Before.
A, B, C, D, E, F, G, H,

later
C, A,

What’s the best way to go?

CK11 · December 23, 2018, 10:09pm

I think this will require some form of scripting to do this, and since we’re on an AppleScript forum, using a Run AppleScript action would be appropriate for the job.

There are a few ways to do it. I’m going to go with the most obvious and use text item delimiters to split each row of values at the delimiter (which you haven’t explcitly stated, but I will go ahead assume is a comma). The benefit of this method is that it’s straightforward, easy to implement and read, and will be efficient. However, the obvious drawback is that if any of the values themselves contain commas, then the data won’t get parsed correctly.

Let’s assume there’s a preceding Automator action that supplies the Run AppleScript action with the CSV file as input. Then:

on run input
		set [[f]] to input
		
		read f as «class utf8» using delimiter linefeed
		set [CSVHeader, CSVRows] to [item 1, rest] of the result
		
		set my text item delimiters to ","
		
		repeat with row in CSVRows
				set the row's contents to the contents of row's ¬
					{text item 3, text item 1} as text
		end repeat
		
		set CSVHeader to the contents of CSVHeader's ¬
				{text item 3, text item 1} as text
		
		set my text item delimiters to linefeed
		set CSVData to {CSVHeader} & CSVRows as text
		
		set eof of f to 0
		write CSVData to f as «class utf8»
end run

Nigel_Garvey · December 26, 2018, 1:10pm

Here’s one which copes with possible instances of a field or record separator being part of a field value (ie. when the field’s in quotes). It’s in ASObjC in order to make use of the system’s built-in regex functions. It assumes that quoted fields are properly formed with no spaces outside the quotes. Also that the field separator is always the same, as in the original query, and that the script’s to be used in Automator and won’t be passed anything other than CSV files. It saves the edited CSV text to a separate file with a name derived from the original, in the same location. It can handle multiple CSV files if that’s what it’s given and returns a list of the new file(s).

use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

-- Set this property as required. Having the script work out the separator itself from the file contents is far from foolproof!
property fieldSeparator : ","

on run {input, parameters}
	set |⌘| to current application
	set regexSearch to |⌘|'s NSRegularExpressionSearch
	set output to |⌘|'s class "NSMutableArray"'s new()
	
	-- The input is assumed to contain only CSV file(s).
	repeat with CSVFile in input
		-- Read this file and note its probable text encoding.
		set CSVPath to (|⌘|'s class "NSString"'s stringWithString:(POSIX path of CSVFile))
		set {CSVString, usedEncoding} to (|⌘|'s class "NSString"'s stringWithContentsOfFile:(CSVPath) usedEncoding:(reference) |error|:(missing value))
		
		-- Ignoring any empty lines at the end, use a regex to match every field separator or line-beginning in the text (capture group 1) and the field which follows it (capture group 2).
		set fieldRegex to (|⌘|'s class "NSRegularExpression"'s regularExpressionWithPattern:("(" & fieldSeparator & "|\\R|\\A)(\"(?:[^\"]|\"\")*+\"|[^[:cntrl:]" & fieldSeparator & "]*+)") options:(0) |error|:(missing value))
		set CSVLength to CSVString's |length|()
		set junkRange to (CSVString's rangeOfString:("\\R++\\Z") options:(regexSearch) range:({0, CSVLength}))
		set fieldMatches to (fieldRegex's matchesInString:(CSVString) options:(0) range:({0, CSVLength - (junkRange's |length|)}))
		
		-- Work through the first few matches to find the first beginning with a line ending. This will tell us the record separator and the number of fields per record.
		set matchCount to (count fieldMatches)
		set recordSeparator to return & linefeed -- Initialise in case there's only one record. The value's academic.
		set fieldsPerRecord to matchCount -- Ditto.
		repeat with i from 4 to (matchCount) -- (In this script, the first three fields can be assumed to be in the first record.)
			set preFieldRange to ((item i of fieldMatches)'s rangeAtIndex:(1))
			if ((CSVString's rangeOfString:("\\R") options:(regexSearch) range:(preFieldRange))'s |length|() > 0) then
				set recordSeparator to (CSVString's substringWithRange:(preFieldRange))
				set fieldsPerRecord to i - 1
				exit repeat
			end if
		end repeat
		
		-- Work through the matches a record at a time, extracting and recombining the first and third field from each record.
		set newRecords to |⌘|'s class "NSMutableArray"'s new()
		set newRecordTemplate to (|⌘|'s class "NSString"'s stringWithString:("%@" & fieldSeparator & "%@"))
		repeat with i from 1 to (matchCount) by fieldsPerRecord
			-- Get the text of the match to the first field in this record.
			set fieldARange to ((item i of fieldMatches)'s rangeAtIndex:(2))
			set fieldAValue to (CSVString's substringWithRange:(fieldARange))
			-- Get the text of the match to the third field in this record.
			set fieldCRange to ((item (i + 2) of fieldMatches)'s rangeAtIndex:(2))
			set fieldCValue to (CSVString's substringWithRange:(fieldCRange))
			-- Recombine them into a new record and add it to the array of new records.
			set newRecord to |⌘|'s class "NSString"'s stringWithFormat_(newRecordTemplate, fieldCValue, fieldAValue)
			tell newRecords to addObject:(newRecord)
		end repeat
		-- Combine the new records into a single text, using the original record separator as a delimiter.
		set newCSVString to (newRecords's componentsJoinedByString:(recordSeparator))
		
		-- Save the doctored text to a new file with a name and location derived from the original.
		set newCSVPath to (CSVPath's stringByReplacingOccurrencesOfString:("(?i)(\\.csv)\\Z") withString:(" (edited)$1") options:(regexSearch) range:({0, CSVPath's |length|()}))
		set newCSVURL to (|⌘|'s class "NSURL"'s fileURLWithPath:(newCSVPath))
		tell newCSVString to writeToURL:(newCSVURL) atomically:(true) encoding:(usedEncoding) |error|:(missing value)
		tell output to addObject:(newCSVURL)
	end repeat
	
	return output as list
end run

Edit: The script now finds out the number of fields per record for itself, ensures that the line ending used is indeed the record separator and not just something from a field in the first record, and copes with empty lines at the end of the text.