Modify (open/save) CSV in Batch

I have 100’s of CSV files to import into a DB. The shell script to import the file requires 28 fields formatted as such:

HEADER,MTMP,ClientAccountID,AccountAlias,AssetClass,Symbol,UnderlyingSymbol,Strike,Expiry,Put/Call,ReportDate,PreviousCloseQuantity,PrevClosePrice,CloseQuantity,ClosePrice,TransactionMtmPnl,PriorOpenMtmPnl,Commissions,Other,Total,

The Original CSV file extracted from a 3rd party is delivered as follows:

“HEADER”,“MTMP”,“ClientAccountID”,“AccountAlias”,“AssetClass”,“Symbol”,“UnderlyingSymbol”,“Strike”,“Expiry”,“Put/Call”,“ReportDate”,“PreviousCloseQuantity”,“PrevClosePrice”,“CloseQuantity”,“ClosePrice”,“TransactionMtmPnl”,“PriorOpenMtmPnl”,“Commissions”,“Other”,“Total”

Manually, I can open the CSV file in Excel and then save as CSV and excel will automatically save with correct format.

I’ve been working various approaches and here is where I am at (which returns various errors)


---choose folder with source files, the another folder for the results

set Quellordner to choose folder with prompt "In which Folder are the source-files you want to process?"
set Zielordner to choose folder with prompt "In which Folder do you want me to store the results? (Do NOT choose the same folder!)"




set Quelldateien to list folder Quellordner without invisibles
set Zieldateien to list folder Zielordner without invisibles


tell application "Finder"
	delete (every item of folder Zielordner)
end tell




repeat with Datei in Quelldateien
	tell application "Finder"
		set Quelldatei to (Quellordner as string)
		set Zieldatei to (Zielordner as string) & "(M).csv"
		open document file Quelldatei
		
		tell application "Microsoft Excel"
			activate
			open Quelldatei
			set Zieldatei to get full name of active workbook
			save workbook as workbook Zieldatei filename Quelldatei file format CSV file format with overwrite
			close workbooks
			--quit
		end tell
		
	end tell
end repeat

If there is a way to cut out excel that’s even better.

Thanks for the help

DL

I running OS X 10.10 Yosemite, although it says 10.8 below

Model: Macbook Pro
Browser: Safari 537.76.4
Operating System: Mac OS X (10.8)

Hi,

try this, the script adds double quotes to each field.
It assumes
¢ the text of the source files is MacRoman or ASCII encoded
¢ the field separator is a comma
¢ none of the fields contains a double quote character


---choose folder with source files, the another folder for the results

set quellOrdner to (choose folder with prompt "In which Folder are the source-files you want to process?") as text
set zielOrdner to (choose folder with prompt "In which Folder do you want me to store the results? (Do NOT choose the same folder!)") as text

tell application "Finder"
	set quellDateien to name of every file whose name extension is "csv"
	--	delete (every item of zielordner)
end tell

repeat with datei in quellDateien
	
	set theSource to paragraphs of (read file (quellOrdner & datei))
	set TID to text item delimiters
	repeat with aParagraph in theSource
		set text item delimiters to ","
		set textItems to text items of aParagraph
		set text item delimiters to "\",\""
		set contents of aParagraph to (quote & (textItems as text) & quote)
	end repeat
	set text item delimiters to return
	set theSource to theSource as text
	set text item delimiters to TID
	
	set destinationFile to (zielOrdner & datei)
	try
		set fileRef to open for access file destinationFile with write permission
		write theSource to fileRef
		close access fileRef
	on error
		try
			close access file destinationFile
		end try
	end try
end repeat


Hi.

I may have misread the question, but I think it means the exisiting quotes need to be either stripped or escaped for the benefit of the shell script and that eight commas need to be added to the end of each line.

The commas are easy, but it would help to know if the quotes in the original are functional or optional (probably optional if Excel exports without them) and why the database script doesn’t like them.

Something like this?

set CSVContent to "\"HEADER\",\"MTMP\",\"ClientAccountID\",\"AccountAlias\",\"AssetClass\",\"Symbol\",\"UnderlyingSymbol\",\"Strike\",\"Expiry\",\"Put/Call\",\"ReportDate\",\"PreviousCloseQuantity\",\"PrevClosePrice\",\"CloseQuantity\",\"ClosePrice\",\"TransactionMtmPnl\",\"PriorOpenMtmPnl\",\"Commissions\",\"Other\",\"Total\"
\"value 1\",\"value 2\",\"\",\"value x\",\"value y\",\"value z\",\"\",\"\",\"\",\"\",\"\",\"\",\"\",\"\",\"value a\",\"value b\",\"value c\",\"\",\"\",\"\""

do shell script "(IFS=','
while read line
do
	#evil but it works to remove quotes
	eval line=($line) 

	while [ ${#line[@]} -lt 28 ]
	do
		line+=('')
	done

	echo \"${line[*]}\"
done)<<<" & quoted form of CSVContent

Hey thanks for the responses.

There are 2 components.

First, to strip the “”'s
Second, to pad each line with comma if it has less than 28 fields. Rows vary with # of columns in each row - In other words, all row must have 28 fields, but not all rows require 8 comma (some 0, some 4, some 8, etc)

This is why I used Excell, because it took the max column and padded all rows

No, I did, my fault

Isn’t that what my code does? Did I misunderstand the question? I’m using Bash’s eval to unquote the fields and using the input field separator (IFS; pretty much like text item delimiters) to read the line into a array. Then I use an loop to add empty field until we reach 28 values, to pad the line. Then at the end I use parameter expansion to print each element separated by a comma, including empty values.

Because you were talking about an shell scripting importing the data into a database I thought that maybe bash would be a better choice because you can implement parts of the example code directly into your shell script without needing an extra script avoiding an process of opening and closing multiple times the same csv file.

Bazzie

I was responding more directly to the other post.

Can you post that in applescript? There is only one call to the shell and it was written by 3rd party, I am trying to glue 2 3rd party pre defined engines

Here’s another solution, with which I’ve been tinkering over the weekend. It differs from DJ’s in that the input text can have either LF, CR, or CR/LF line endings and the output has CR/LF line endings.

set lineEnding to return & linefeed -- linefeed -- return

set CSVContent to "\"HEADER\",\"MTMP\",\"ClientAccountID\",\"AccountAlias\",\"AssetClass\",\"Symbol\",\"UnderlyingSymbol\",\"Strike\",\"Expiry\",\"Put/Call\",\"ReportDate\",\"PreviousCloseQuantity\",\"PrevClosePrice\",\"CloseQuantity\",\"ClosePrice\",\"TransactionMtmPnl\",\"PriorOpenMtmPnl\",\"Commissions\",\"Other\",\"Total\"" & lineEnding & "\"value 1\",\"value 2\",\"\",\"value x\",\"value y\",\"value z\",\"\",\"\",\"\",\"\",\"\",\"\",\"\",\"\",\"value a\",\"value b\",\"value c\",\"\",\"\",\"\"" & lineEnding & "\"value 1\",\"value 2\",\"value x\",\"value y\",\"value z\",\"\",\"\",\"\",\"\",\"\",\"\",\"\",\"\",\"value a\",\"value b\",\"value c\",\"\",\"\",\"\"" & lineEnding & "\"value 1\",\"value 2\",\"\",\"value x\",\"value y\",\"value z\",\"\",\"\",\"\",\"\",\"\",\"\",\"\",\"\",\"value a\",\"value b\",\"value c\",\"\",\"\",\"\"" & lineEnding & "\"value 1\",\"value 2\",\"value x\",\"value y\",\"value z\",\"\",\"\",\"\",\"\",\"\",\"value a\",\"value b\",\"value c\",\"\",\"\",\"\""

-- Strip quotes, pad to 28 fields per line, and output with MS-DOS line endings.
-- Assumptions: fields are comma-separated and quotes are non-functional.
do shell script ("sed -En '1 h			; # Collect the entire CSV text .
1 !H								; # . in the hold space.
$ {
	g							; # Transfer it to the pattern space.
	s/(\\n|'$'\\r'')+/'$'\\r''\\'$'\\n''/g		; # Convert all line endings to CR/LF.
	s/\\\"//g						; # Delete all quotes.
	s/[^'$'\\r'']*/&,,,,,,,,,,,,,,,,,,,,,,,,,,,,/g	; # Insert at least 28 commas at the end of every line.
	s/((,[^,]*){27}),[^'$'\\r'']*/\\1/g		; # Keep 1 field and 27 comma/field pairs from each line.
	p							; # Print.
}' <<<" & quoted form of CSVContent) without altering line endings

Edit: Or to modify Stefan’s script for the same output:


---choose folder with source files, the another folder for the results

set quellOrdner to (choose folder with prompt "In which Folder are the source-files you want to process?") as text
set zielOrdner to (choose folder with prompt "In which Folder do you want me to store the results? (Do NOT choose the same folder!)") as text

tell application "Finder"
	set quellDateien to name of every file of folder quellOrdner whose name extension is "csv"
end tell

repeat with datei in quellDateien
	
	set TID to text item delimiters
	set text item delimiters to quote
	set textItems to text items of (read file (quellOrdner & datei))
	set text item delimiters to ""
	set theSource to paragraphs of (textItems as text)
	set text item delimiters to ","
	repeat with aParagraph in theSource
		set contents of aParagraph to text 1 thru text item 28 of (aParagraph & ",,,,,,,,,,,,,,,,,,,,,,,,,,,")
	end repeat
	set text item delimiters to return & linefeed
	set theSource to theSource as text
	set text item delimiters to TID
	
	set destinationFile to (zielOrdner & datei)
	try
		set fileRef to open for access file destinationFile with write permission
		set eof fileRef to 0
		write theSource to fileRef
		close access fileRef
	on error
		try
			close access file destinationFile
		end try
	end try
end repeat

this is a vanilla AppleScript solution based on post #1, it assumes also non-functional quotes and that any field is wrapped in quotes.
the line ending string can be predefined like in Nigel’s solution


set lineEnding to return & linefeed -- linefeed -- return
set separator to ","

set quellOrdner to (choose folder with prompt "In which Folder are the source-files you want to process?") as text
set zielOrdner to (choose folder with prompt "In which Folder do you want me to store the results? (Do NOT choose the same folder!)") as text

tell application "Finder"
	set quellDateien to name of every file whose name extension is "csv"
	-- delete (every item of zielOrdner)
end tell

repeat with datei in quellDateien
	
	set theSource to paragraphs of (read file (quellOrdner & datei))
	set TID to text item delimiters
	set text item delimiters to separator
	repeat with aParagraph in theSource
		set numberOfFields to count (get text items of aParagraph)
		set trimmedParagraph to text 2 thru -2 of aParagraph -- cut leading and trailing quote
		set text item delimiters to quote & separator & quote
		set fields to text items of trimmedParagraph
		repeat while numberOfFields < 28
			set end of fields to ""
			set numberOfFields to numberOfFields + 1
		end repeat
		set text item delimiters to separator
		set contents of aParagraph to fields as text
	end repeat
	set text item delimiters to lineEnding
	set theSource to theSource as text
	set text item delimiters to TID
	
	set destinationFile to (zielOrdner & datei)
	try
		set fileRef to open for access file destinationFile with write permission
		write theSource to fileRef
		close access fileRef
	on error
		try
			close access file destinationFile
		end try
	end try
end repeat