Parsing a .csv file and AppleScript Text Item Delimiters

Hi All,

I’m attempting to parse a downloaded .csv file. When the file is opened in either Numbers or Excel it displays correctly within the rows and columns.

However, as a text file it displays quite differently. The columns are still separated with commas but if there are commas or line returns in a cell then my script see’s them as commas even though they are encased within quotations.

For example…These 3 rows of data with 10 columns…

Date,Workout,Result,Prescribed,Pukie,Work performed,Work time,Formatted Result,Notes,Description
2015-10-09,Karen,558000,true,false,59203,558000,9 mins 18 secs,“”,“150 Wall Balls, 20 lbs”
2015-10-20,Clean & Jerk 1-1-1-1-1-1-1-1-1-1,670.0,true,false,7757,“670 kg | 50 kg, 50 kg, 60 kg, 60 kg, 70 kg, 70 kg, 75 kg, 75 kg, 80 kg, and 80 kg”,“”,“Sets
1 Clean & Jerk | 50 kg
1 Clean & Jerk | 50 kg
1 Clean & Jerk | 60 kg
1 Clean & Jerk | 60 kg
1 Clean & Jerk | 70 kg
1 Clean & Jerk | 70 kg
1 Clean & Jerk | 75 kg
1 Clean & Jerk | 75 kg
1 Clean & Jerk | 80 kg
1 Clean & Jerk | 80 kg”

The first line parses correctly, the second line see’s the comma in the quoted “Description” column of “150 Wall Balls, 20 lbs” as a text item delimiter.

Also, the line returns in the third row description confuse my script.

Is there an easy way to make my script treat any text within quotes as a complete text item and to disregard the included commas and line returns?


set theSettingsScriptPath to path to desktop folder from user domain as string
set theSettingsScriptPath to theSettingsScriptPath & "Library:ScriptingAdditions:dataTest.csv"

set csvData to read file theSettingsScriptPath

set csvEntries to paragraphs of csvData

repeat with i from 1 to count csvEntries
	set {theDate, theWorkout, theResult, theRxd, thePukie, theWorkPerformed, theWorkTime, theFormattedResult, theNotes, theDescription} to parseCsvEntry(csvEntries's item i)
end repeat


to parseCsvEntry(csvEntry)
	set AppleScript's text item delimiters to ","
	set {theDate, theWorkout, theResult, theRxd, thePukie, theWorkPerformed, theWorkTime, theFormattedResult, theNotes, theDescription} to csvEntry's text items
	set AppleScript's text item delimiters to {""}
	return {theDate, theWorkout, theResult, theRxd, thePukie, theWorkPerformed, theWorkTime, theFormattedResult, theNotes, theDescription}
end parseCsvEntry

Hi,

Actually you can’t simply parse a CSV file with text item delimiters which contains double quotes to escape special characters. You have to parse it character by character remembering the double quote status as flag

Which you can do in Yosemite or later (or in Mavericks by putting this code in a script library) like this:

use scripting additions
use framework "Foundation"

set theString to "cust1,\"prod,1\",season 1,
cust1,prod1,season2,
cust2,prod1,event1,season1
cust2,prod3,event1,season 1"
its makeListsFromCSV:theString commaIs:","

on makeListsFromCSV:theString commaIs:theComma
	set theRows to {}
	set newLineCharSet to current application's NSCharacterSet's newlineCharacterSet()
	set importantCharSet to current application's NSMutableCharacterSet's characterSetWithCharactersInString:("\"" & theComma)
	importantCharSet's formUnionWithCharacterSet:newLineCharSet
	set theNSScanner to current application's NSScanner's scannerWithString:theString
	theNSScanner's setCharactersToBeSkipped:(missing value)
	repeat while (theNSScanner's isAtEnd() as integer = 0)
		set insideQuotes to false
		set finishedRow to false
		set theColumns to {}
		set currentColumn to ""
		repeat while not finishedRow
			set {theResult, tempString} to theNSScanner's scanUpToCharactersFromSet:importantCharSet intoString:(reference)
			if theResult as integer = 1 then set currentColumn to currentColumn & (tempString as text)
			if theNSScanner's isAtEnd() as integer = 1 then
				if currentColumn is not "" then set end of theColumns to currentColumn
				set finishedRow to true
			else
				set {theResult, tempString} to theNSScanner's scanCharactersFromSet:newLineCharSet intoString:(reference)
				if theResult as integer = 1 then
					if insideQuotes then
						set currentColumn to currentColumn & (tempString as text)
					else
						if currentColumn is not "" then set end of theColumns to currentColumn
						set finishedRow to true
					end if
				else
					set theResult to theNSScanner's scanString:"\"" intoString:(missing value)
					if theResult as integer = 1 then
						if insideQuotes then
							set theResult to theNSScanner's scanString:"\"" intoString:(missing value)
							if theResult as integer = 1 then
								set currentColumn to currentColumn & "\""
							else
								set insideQuotes to not insideQuotes
							end if
						else
							set insideQuotes to not insideQuotes
						end if
					else
						set theResult to theNSScanner's scanString:theComma intoString:(missing value)
						if theResult as integer = 1 then
							if insideQuotes then
								set currentColumn to currentColumn & theComma
							else
								set end of theColumns to currentColumn
								set currentColumn to ""
								theNSScanner's scanCharactersFromSet:(current application's NSCharacterSet's whitespaceCharacterSet()) intoString:(missing value)
							end if
						end if
					end if
				end if
			end if
		end repeat
		if (count of theColumns) > 0 then set end of theRows to theColumns
	end repeat
	return theRows
end makeListsFromCSV:commaIs:

Hello Shane

I tried to apply your code to the datas posted by the asker.
I started with the datas in a text file stored on the desktop.

Alas the used code failed :

use scripting additions
use framework "Foundation"

using terms from scripting additions
	set theString to read file ((path to desktop as text) & "fakeCSV.txt")
end using terms from

I got this events log :

tell current application
	path to desktop as text
	read current application
Résultat :
error "Impossible de convertir current application en type file." number -1700 from current application to file

So I decided to define the original datas from the clipboard after copying the datas from the text file.

use scripting additions
use framework "Foundation"

set theString to the clipboard as text
its makeListsFromCSV:theString commaIs:","
set the clipboard to result as text

This time the handler was completely executed and the result was :

DateWorkoutResultPrescribedPukieWork performedWork timeFormatted ResultNotesDescription2015-10-09Karen558000truefalse592035580009 mins 18 secs150 Wall Balls, 20 lbs2015-10-20Clean & Jerk 1-1-1-1-1-1-1-1-1-1670.0truefalse7757670 kg | 50 kg, 50 kg, 60 kg, 60 kg, 70 kg, 70 kg, 75 kg, 75 kg, 80 kg, and 80 kgSets
1 Clean & Jerk | 50 kg
1 Clean & Jerk | 50 kg
1 Clean & Jerk | 60 kg
1 Clean & Jerk | 60 kg
1 Clean & Jerk | 70 kg
1 Clean & Jerk | 70 kg
1 Clean & Jerk | 75 kg
1 Clean & Jerk | 75 kg
1 Clean & Jerk | 80 kg
1 Clean & Jerk | 80 kg

Which, according to my understanding of the original question is not what is wanted.
My understanding is that the values of each row must be separated by TAB chars.

Date⇥Workout⇥Result⇥Prescribed⇥Pukie⇥Work performed⇥Work time⇥Formatted Result⇥Notes⇥Description

2015-10-09⇥Karen⇥558000⇥true⇥false⇥59203⇥558000⇥9 mins 18 secs⇥150 Wall Balls, 20 lbs

2015-10-20⇥Clean & Jerk 1-1-1-1-1-1-1-1-1-1⇥670.0⇥true⇥false⇥7757⇥670 kg | 50 kg, 50 kg, 60 kg, 60 kg, 70 kg, 70 kg, 75 kg, 75 kg, 80 kg, and 80 kg⇥⇥Sets
1 Clean & Jerk | 50 kg¶1 Clean & Jerk | 50 kg¶1 Clean & Jerk | 60 kg¶1 Clean & Jerk | 60 kg¶1 Clean & Jerk | 70 kg¶1 Clean & Jerk | 70 kg¶1 Clean & Jerk | 75 kg¶1 Clean & Jerk | 75 kg¶1 Clean & Jerk | 80 kg¶1 Clean & Jerk | 80 kg

Here I used the symbol ⇥ to display the TAB characters and the symbol ¶ to, display the character which are supposed to force a line break inside a cell.
I inserted also an extraneous linefeed between rows.

Yvan KOENIG running El Capitan 10.11.1 in French (VALLAURIS, France) samedi 14 novembre 2015 14:54:03

Not per se. You can count double quotes because in an CSV it is not allowed to have odd number or double quotes per fields which means you can’t have odd double quotes per line. So you can use text item delimiters to create a CSV file. Here an example on how to work with CSV files. Only problems is that the double quote itself is removed (“""""”).

set CSVData to "Date,Workout,Result,Prescribed,Pukie,Work performed,Work time,Formatted Result,Notes,Description
2015-10-09,Karen,558000,true,false,59203,558000,9 mins 18 secs,\"\",\"150 Wall Balls, 20 lbs\"
2015-10-20,Clean & Jerk 1-1-1-1-1-1-1-1-1-1,670.0,true,false,7757,,\"670 kg | 50 kg, 50 kg, 60 kg, 60 kg, 70 kg, 70 kg, 75 kg, 75 kg, 80 kg, and 80 kg\",\"\",\"Sets
1 Clean & Jerk | 50 kg
1 Clean & Jerk | 50 kg
1 Clean & Jerk | 60 kg
1 Clean & Jerk | 60 kg
1 Clean & Jerk | 70 kg
1 Clean & Jerk | 70 kg
1 Clean & Jerk | 75 kg
1 Clean & Jerk | 75 kg
1 Clean & Jerk | 80 kg
1 Clean & Jerk | 80 kg\""

set theLines to {}
set buff to ""
set parts to split(CSVData, linefeed)

repeat with i from 1 to count parts
	if buff is not "" then set buff to buff & linefeed
	set buff to buff & item i of parts
	if (count of split(buff, "\"")) mod 2 = 1 then
		set end of theLines to buff
		set buff to ""
	end if
end repeat

repeat with i from 1 to count theLines
	set theLine to item i of theLines
	set fields to {}
	set buff to ""
	set parts to split(theLine, ",")
	repeat with j from 1 to count parts
		if buff is not "" then set buff to buff & ","
		set buff to buff & item j of parts
		if (count of split(buff, "\"")) mod 2 = 1 then
			set buff to stringReplace("\"", "", buff)
			set end of fields to buff
			set buff to ""
		end if
	end repeat
	set item i of theLines to fields
end repeat

return theLines

on stringReplace(find, needle, haystack)
	return join(split(haystack, find), needle)
end stringReplace

on split(str, sep)
	tell AppleScript
		set oldTID to text item delimiters
		set text item delimiters to sep
		set lst to text items of str
		set text item delimiters to oldTID
	end tell
	return lst
end split

on join(lst, sep)
	tell AppleScript
		set oldTID to text item delimiters
		set text item delimiters to sep
		set str to lst as string
		set text item delimiters to oldTID
	end tell
	return str
end join

:wink:

My mistake

Here is a sample script to replace commas with tabs in one line.

you can modify to work on whole file


set ctext to "2015-10-09,Karen,558000,true,false,59203,558000,9 mins 18 secs,\"\",\"150 Wall Balls, 20 lbs\""
set text item delimiters to {"\""}
set b to text items of ctext
set t to count of b
repeat with n from 1 to t
	if (n mod 2) = 1 then -- odd
		set text item delimiters to ","
		set bt to text items of item n of b
		set text item delimiters to tab
		set item n of b to bt as text
	end if
end repeat
set text item delimiters to ""
set ctext to b as text

Since The number of Quote characters in each line is even,
the odd items with commas should be converted to tabs.
The commas in the even items should be left alone.

You need either:

using terms from scripting additions
	tell current application to set theString to read file ((path to desktop as text) & "fakeCSV.txt")
end using terms from

or:

using terms from scripting additions
	set theString to read (((path to desktop as text) & "fakeCSV.txt") as «class furl»)
end using terms from

The script returns a list of sublists, with each sublist representing a record. The existing parseCsvEntry() handler returns a list for a record, so I’m not sure where tabs come in.

One other point: doing it like this isn’t particularly efficient – it’s much, much slower than, say, DJ’s code, although it does leave the quotes intact. It was just some code I happened to have, and I thought it might illustrate Stefan’s suggestion.

However, much the same code written in Objective-C is included in BridgePlus. So the script could be written in one line thus:

set listOfLists to (script "BridgePlus"'s arrayFromCSV:CSVData commaIs:",") as list

And doing it that way is actually about twice as fast as DJ’s native method.

Thanks Shane.

It appears that my memory failed.

I will store the two syntax proposals in my library text file so I will be able to retrieve them easily.

The comment about TABs are the result of my understanding of the original question. Maybe I’m right but maybe I’m wrong.
Would be fine to read the asker’s comments.

Yvan KOENIG running El Capitan 10.11.1 in French (VALLAURIS, France) dimanche 15 novembre 2015 10:56:20

Hi Everyone,

Sorry I’ve been off the grid for some time… refreshing. Thank you all so much for the suggestions.

DJ Bazzie Wazzie’s code is exactly what I was after.

I love this forum, the contributions from members is so helpful and ultimately instructional.

Merry Xmas all,

Kev