Ignore Delimiter Inside Quotes

For a project I’m working on, I’m reading lines from a CSV (Comma Separated Values) file, reordering the data, adding and removing some data, and eventually will output each line as a SQL insert statement. I have no control over the format of the incoming data as it is generated elsewhere.

The data looks something like this:

Fall 2006,BIOL,195J,6,1,INTRO TO BIOLOGY,XXXXX-XXXXX,"Last,First,Middle",LAST,FIRST,M,1,LA,,,910,B6,

As you can see, they include the student’s full name inside quotes but also separated by commas. Using comma as a text delimiter gives me:

{"Fall 2006", "BIOL", "195J" ,"6", "1", "INTRO TO BIOLOGY", "XXXXX-XXXXX", ""Last", "First", "Middle"", "LAST", "FIRST", "M", "1", "LA", "", "", "910","B6"}

This would be fine, except that SOME of the lines don’t include the Middle name so the number of items in the array is not constant. (Oy!)

What I need is some clever way to get Applescript to treat “Last,First,Middle” as one delimited item rather than three (or ignore it all together).

My first thought was to somehow use " as a delimiter to split the original data into three pieces, drop the middle, concatenate them back together, change the delimiter to ‘,’ and then proceed from there. My second thought was to count the number of items in the array since it should be 20 with a middle name and 19 without. But I’m loathe to trust that this will be true for all cases.

I’m hoping there’s a more elegant solution.

Here’s what I have for the moment (for reference).


repeat
    try
        set s_temp to read aFile until eol -- (eol = "\n")
        set {myTID, AppleScript's text item delimiters} to {AppleScript's text item delimiters, {","}}
        set temp to text items of s_temp
        set AppleScript's text item delimiters to myTID
                        
        set SQL_data to {item 7 of temp, item 2 of temp, item 3 of temp, item 4 of temp, "I AGREE", item 10 of temp, item 9 of temp, item 11 of temp, item 13 of temp, item 5 of temp, item 1 of temp, item 2 of temp, item 6 of temp, prof1, prof2} -- (prof1 and prof2 are set earlier in the code)

        on error
            display dialog "Error"
            exit repeat
    end try
end repeat

Thanks in advance!! :slight_smile:

Hi Steven,

my suggestion is your first thought.
Read the file into a list of paragraphs.
Split each paragraph into three parts, temp1, tempName, temp2.

set s_temp to read file aFile using delimiter eol --> list of paragraphs
repeat with i in s_temp
	-- try
	set {myTID, AppleScript's text item delimiters} to {AppleScript's text item delimiters, quote}
	set t1 to text item 1 of i
	set tempName to words of text item 2 of i -- {last, first, middle}
	set t3 to text item 3 of i
	set AppleScript's text item delimiters to {","}
	set temp1 to text items 1 thru -2 of t1 -- list of items before the name
	set temp2 to text items 2 thru -1 of t3 -- list of items after the name
	set AppleScript's text item delimiters to myTID
	
	(*
	 	set SQL_data to {item 7 of temp, item 2 of temp, item 3 of temp, item 4 of temp, "I AGREE", item 10 of temp, item 9 of temp, item 11 of temp, item 13 of temp, item 5 of temp, item 1 of temp, item 2 of temp, item 6 of temp, prof1, prof2} -- (prof1 and prof2 are set earlier in the code)
	 	on error
	 	display dialog "Error"
	 	exit repeat
	 	end try
	 *)
end repeat

If First, Last, Middle is the only quoted form inside the string, then this extracts it:


set aLine to "Fall 2006,BIOL,195J,6,1,INTRO TO BIOLOGY,XXXXX-XXXXX,\"Last,First,Middle\",LAST,FIRST,M,1,LA,,,910,B6"

set tid to AppleScript's text item delimiters
set AppleScript's text item delimiters to "\""
set N to text items of aLine
set AppleScript's text item delimiters to tid
N

→ {“Fall 2006,BIOL,195J,6,1,INTRO TO BIOLOGY,XXXXX-XXXXX,”, “Last,First,Middle”, “,LAST,FIRST,M,1,LA,910,B6”}

I don’t know what LAST, FIRST, M stand for, but if that is a repeat, you could just drop the middle term here

set new to {beginning, end} of N

→ {“Fall 2006,BIOL,195J,6,1,INTRO TO BIOLOGY,XXXXX-XXXXX,”, “,LAST,FIRST,M,1,LA,910,B6”}

“LAST, FIRST, M” is the Last name, First name, and Middle initial (Uppercased and white space removed) . Why they provide this and the full name (in quotes) is beyond me.

Unless someone has a better solution, I’ll probably go with this then…

Thanks!!

Steven, it looks like Adam’s solution should work for you. However, here is a reuseable handler (which I just made): explode_ignoring()

Hi,

That’s why they say it’s better to use tab as a delimiter. Here’s a script that switches comma delimited to tab delimited.


set t to "Fall 2006,BIOL,195J,6,1,INTRO TO BIOLOGY,XXXXX-XXXXX,\"Last,First,Middle\",LAST,FIRST,M,1,LA,,,91"
set user_tid to AppleScript's text item delimiters
-- group quoted items
set AppleScript's text item delimiters to {"\""}
set temp_l to text items of t
-- change non-quoted items to tab delimited strings
set c to count temp_l
repeat with i from 1 to c by 2
	set AppleScript's text item delimiters to {","}
	set l to text items of item i of temp_l
	set AppleScript's text item delimiters to {tab}
	set item i of temp_l to l as string
end repeat
-- requote the quoted items and get tab delimited string
set AppleScript's text item delimiters to {"\""}
set s to temp_l as string -- tab delimited string
-- get list of items from tab delimited string
set AppleScript's text item delimiters to {tab}
set item_list to text items of s
set AppleScript's text item delimiters to user_tid
return item_list

Edited: and if you don’t like the double quote, change this lines:

set AppleScript’s text item delimiters to {“"”}

{“"”} to {“”}, but I think Office uses double quotes if I remember right.

Edited: I just reread Bruce’s post and it looks the same almost. Sorry about that.

gl,

Thanks to everyone for your suggestions. Ultimately I went with Adam Bell’s example.

Final code below (now with extra exciting comments!)


on open theFiles
	
	set eol to "\n" as string
	
	repeat with aFile in theFiles
		
		set item_info to the info for aFile
		set ext to name extension of item_info
		
		ignoring case
			if ext is not "CSV" then
				exit repeat
			else
				open for access aFile
				set outName to (name of item_info & ".sql") 
				tell application "Finder" 
					make new file at desktop with properties {name:outName}
					set outFile to file outName of desktop --I probably don't have to put this in the tell block, but "Meh..."
				end tell
				open for access (outFile as alias) with write permission
				set eof of the (outFile as alias) to 0
				
				set prof1 to text returned of (display dialog "Primary Professor" default answer " " buttons {"OK"} default button "OK")
				set prof2 to text returned of (display dialog "Secondary Professor" default answer " " buttons {"OK"} default button "OK")
				
				set first_pass to true
				
				repeat
					try
						set s_temp to read aFile until eol
						if first_pass is false then --The first line of the CSV file contains the category headers. We need to skip this line.
							
							set {myTID, AppleScript's text item delimiters} to {AppleScript's text item delimiters, {"\""}}
							set temp to text items of s_temp
							set temp2 to ((item 1 of temp as string) & item 3 of temp as string)
							set AppleScript's text item delimiters to {","}
							set temp to text items of temp2
																				
							set SQL_temp to {item 7 of temp, item 12 of temp, item 3 of temp, item 4 of temp, "I AGREE", item 10 of temp, item 9 of temp, item 11 of temp, item 13 of temp, item 5 of temp, item 1 of temp, item 2 of temp, item 6 of temp, prof1, prof2}
							set SQL_statement to "insert into student_records values("
							set r_count to count of items in SQL_temp
							
							repeat with i from 1 to (r_count - 1)
								set SQL_statement to SQL_statement & quoted form of item i of SQL_temp & ","
							end repeat

							set SQL_statement to SQL_statement & quoted form of item r_count of SQL_temp & ")" & return
							write SQL_statement to (outFile as alias) starting at eof
							
							set AppleScript's text item delimiters to myTID
						else
							set first_pass to false
						end if
						
					on error ErrorText number ErrorNumber
						if ErrorNumber is not equal to -39 then --Error -39 is End Of File error
							display dialog "There was an error. " & return & return & ErrorNumber & " - " & ErrorText with icon 2
						else
							display dialog "Done!" buttons {"OK"} default button "OK" giving up after 2
						end if
						exit repeat
						
					end try
					
				end repeat
				close access aFile
				close access (outFile as alias)
			end if
		end ignoring
		
	end repeat
end open