Friday, September 3, 2010

#1 2006-11-30 08:52:59 am

steven_hunter
Member
Registered: 2005-12-07
Posts: 45

Ignore Delimiter Inside Quotes

For a project I'm working on, I'm reading lines from a CSV (Comma Separated Values) file, reordering the data, adding and removing some data, and eventually will output each line as a SQL insert statement. I have no control over the format of the incoming data as it is generated elsewhere.

The data looks something like this:

Code:

Fall 2006,BIOL,195J,6,1,INTRO TO BIOLOGY,XXXXX-XXXXX,"Last,First,Middle",LAST,FIRST,M,1,LA,,,910,B6,

As you can see, they include the student's full name inside quotes but also separated by commas. Using comma as a text delimiter gives me:

Code:

{"Fall 2006", "BIOL", "195J" ,"6", "1", "INTRO TO BIOLOGY", "XXXXX-XXXXX", ""Last", "First", "Middle"", "LAST", "FIRST", "M", "1", "LA", "", "", "910","B6"}

This would be fine, except that SOME of the lines don't include the Middle name so the number of items in the array is not constant. (Oy!)

What I need is some clever way to get Applescript to treat "Last,First,Middle" as one delimited item rather than three (or ignore it all together).

My first thought was to somehow use " as a delimiter to split the original data into three pieces, drop the middle, concatenate them back together, change the delimiter to ',' and then proceed from there. My second thought was to count the number of items in the array since it *should* be 20 with a middle name and 19 without. But I'm loathe to trust that this will be true for all cases.

I'm hoping there's a more elegant solution.

Here's what I have for the moment (for reference).

Applescript:


repeat
   try
       set s_temp to read aFile until eol -- (eol = "\n")
       set {myTID, AppleScript's text item delimiters} to {AppleScript's text item delimiters, {","}}
       set temp to text items of s_temp
       set AppleScript's text item delimiters to myTID
                       
       set SQL_data to {item 7 of temp, item 2 of temp, item 3 of temp, item 4 of temp, "I AGREE", item 10 of temp, item 9 of temp, item 11 of temp, item 13 of temp, item 5 of temp, item 1 of temp, item 2 of temp, item 6 of temp, prof1, prof2} -- (prof1 and prof2 are set earlier in the code)

       on error
           display dialog "Error"
           exit repeat
   end try
end repeat

Thanks in advance!! smile

Offline

 

#2 2006-11-30 09:57:03 am

StefanK
Member
From: Sankt Gallen, Switzerland
Registered: 2006-10-21
Posts: 7437
Website

Re: Ignore Delimiter Inside Quotes

Hi Steven,

my suggestion is your first thought.
Read the file into a list of paragraphs.
Split each paragraph into three parts, temp1, tempName, temp2.

Applescript:

set s_temp to read file aFile using delimiter eol --> list of paragraphs
repeat with i in s_temp
   -- try
   set {myTID, AppleScript's text item delimiters} to {AppleScript's text item delimiters, quote}
   set t1 to text item 1 of i
   set tempName to words of text item 2 of i -- {last, first, middle}
   set t3 to text item 3 of i
   set AppleScript's text item delimiters to {","}
   set temp1 to text items 1 thru -2 of t1 -- list of items before the name
   set temp2 to text items 2 thru -1 of t3 -- list of items after the name
   set AppleScript's text item delimiters to myTID
   
   (*
       set SQL_data to {item 7 of temp, item 2 of temp, item 3 of temp, item 4 of temp, "I AGREE", item 10 of temp, item 9 of temp, item 11 of temp, item 13 of temp, item 5 of temp, item 1 of temp, item 2 of temp, item 6 of temp, prof1, prof2} -- (prof1 and prof2 are set earlier in the code)
       on error
       display dialog "Error"
       exit repeat
       end try
    *)

end repeat


regards

Stefan

Offline

 

#3 2006-11-30 10:10:55 am

Adam Bell
Administrator
From: Nova Scotia, Canada
Registered: 2005-10-04
Posts: 4255

Re: Ignore Delimiter Inside Quotes

If First, Last, Middle is the only quoted form inside the string, then this extracts it:

Applescript:


set aLine to "Fall 2006,BIOL,195J,6,1,INTRO TO BIOLOGY,XXXXX-XXXXX,\"Last,First,Middle\",LAST,FIRST,M,1,LA,,,910,B6"

set tid to AppleScript's text item delimiters
set AppleScript's text item delimiters to "\""
set N to text items of aLine
set AppleScript's text item delimiters to tid
N

--> {"Fall 2006,BIOL,195J,6,1,INTRO TO BIOLOGY,XXXXX-XXXXX,", "Last,First,Middle", ",LAST,FIRST,M,1,LA,,,910,B6"}

I don't know what LAST, FIRST, M stand for, but if that is a repeat, you could just drop the middle term here

Applescript:

set new to {beginning, end} of N

--> {"Fall 2006,BIOL,195J,6,1,INTRO TO BIOLOGY,XXXXX-XXXXX,", ",LAST,FIRST,M,1,LA,,,910,B6"}


Scripts are tested on a PowerMac dual-core G5/2.3 running OS X 10.5.8 or MacBook Pro Intel Core 2 Duo running OS X 10.6.4

Offline

 

#4 2006-11-30 10:42:45 am

steven_hunter
Member
Registered: 2005-12-07
Posts: 45

Re: Ignore Delimiter Inside Quotes

Adam Bell wrote:

If First, Last, Middle is the only quoted form inside the string, then this extracts it:

Applescript:


set aLine to "Fall 2006,BIOL,195J,6,1,INTRO TO BIOLOGY,XXXXX-XXXXX,\"Last,First,Middle\",LAST,FIRST,M,1,LA,,,910,B6"

set tid to AppleScript's text item delimiters
set AppleScript's text item delimiters to "\""
set N to text items of aLine
set AppleScript's text item delimiters to tid
N

--> {"Fall 2006,BIOL,195J,6,1,INTRO TO BIOLOGY,XXXXX-XXXXX,", "Last,First,Middle", ",LAST,FIRST,M,1,LA,,,910,B6"}

I don't know what LAST, FIRST, M stand for, but if that is a repeat, you could just drop the middle term here

Applescript:

set new to {beginning, end} of N

--> {"Fall 2006,BIOL,195J,6,1,INTRO TO BIOLOGY,XXXXX-XXXXX,", ",LAST,FIRST,M,1,LA,,,910,B6"}

"LAST, FIRST, M" is the Last name, First name, and Middle initial (Uppercased and white space removed) . Why they provide this *and* the full name (in quotes) is beyond me.

Unless someone has a better solution, I'll probably go with this then...

Thanks!!

Last edited by steven_hunter (2006-11-30 10:43:50 am)

Offline

 

#5 2006-11-30 02:08:48 pm

Bruce Phillips
Administrator
Registered: 2004-07-15
Posts: 2647

Re: Ignore Delimiter Inside Quotes

Steven, it looks like Adam's solution should work for you. However, here is a reuseable handler (which I just made): explode_ignoring()

Offline

 

#6 2006-11-30 04:14:42 pm

kel
Member
Registered: 2002-11-19
Posts: 2031

Re: Ignore Delimiter Inside Quotes

Hi,

That's why they say it's better to use tab as a delimiter. Here's a script that switches comma delimited to tab delimited.

Applescript:


set t to "Fall 2006,BIOL,195J,6,1,INTRO TO BIOLOGY,XXXXX-XXXXX,\"Last,First,Middle\",LAST,FIRST,M,1,LA,,,91"
set user_tid to AppleScript's text item delimiters
-- group quoted items
set AppleScript's text item delimiters to {"\""}
set temp_l to text items of t
-- change non-quoted items to tab delimited strings
set c to count temp_l
repeat with i from 1 to c by 2
   set AppleScript's text item delimiters to {","}
   set l to text items of item i of temp_l
   set AppleScript's text item delimiters to {tab}
   set item i of temp_l to l as string
end repeat
-- requote the quoted items and get tab delimited string
set AppleScript's text item delimiters to {"\""}
set s to temp_l as string -- tab delimited string
-- get list of items from tab delimited string
set AppleScript's text item delimiters to {tab}
set item_list to text items of s
set AppleScript's text item delimiters to user_tid
return item_list

Edited: and if you don't like the double quote, change this lines:

set AppleScript's text item delimiters to {"\""}

{"\""} to {""}, but I think Office uses double quotes if I remember right.

Edited: I just reread Bruce's post and it looks the same almost. Sorry about that.

gl,

Last edited by kel (2006-11-30 04:31:44 pm)


Kel

Offline

 

#7 2006-12-01 09:15:18 am

steven_hunter
Member
Registered: 2005-12-07
Posts: 45

Re: Ignore Delimiter Inside Quotes

Thanks to everyone for your suggestions. Ultimately I went with Adam Bell's example.

Final code below (now with extra exciting comments!)

Applescript:


on open theFiles
   
   set eol to "\n" as string
   
   repeat with aFile in theFiles
       
       set item_info to the info for aFile
       set ext to name extension of item_info
       
       ignoring case
           if ext is not "CSV" then
               exit repeat
           else
               open for access aFile
               set outName to (name of item_info & ".sql")
               tell application "Finder"
                   make new file at desktop with properties {name:outName}
                   set outFile to file outName of desktop --I probably don't have to put this in the tell block, but "Meh..."
               end tell
               open for access (outFile as alias) with write permission
               set eof of the (outFile as alias) to 0
               
               set prof1 to text returned of (display dialog "Primary Professor" default answer " " buttons {"OK"} default button "OK")
               set prof2 to text returned of (display dialog "Secondary Professor" default answer " " buttons {"OK"} default button "OK")
               
               set first_pass to true
               
               repeat
                   try
                       set s_temp to read aFile until eol
                       if first_pass is false then --The first line of the CSV file contains the category headers. We need to skip this line.
                           
                           set {myTID, AppleScript's text item delimiters} to {AppleScript's text item delimiters, {"\""}}
                           set temp to text items of s_temp
                           set temp2 to ((item 1 of temp as string) & item 3 of temp as string)
                           set AppleScript's text item delimiters to {","}
                           set temp to text items of temp2
                                                                               
                           set SQL_temp to {item 7 of temp, item 12 of temp, item 3 of temp, item 4 of temp, "I AGREE", item 10 of temp, item 9 of temp, item 11 of temp, item 13 of temp, item 5 of temp, item 1 of temp, item 2 of temp, item 6 of temp, prof1, prof2}
                           set SQL_statement to "insert into student_records values("
                           set r_count to count of items in SQL_temp
                           
                           repeat with i from 1 to (r_count - 1)
                               set SQL_statement to SQL_statement & quoted form of item i of SQL_temp & ","
                           end repeat

                           set SQL_statement to SQL_statement & quoted form of item r_count of SQL_temp & ")" & return
                           write SQL_statement to (outFile as alias) starting at eof
                           
                           set AppleScript's text item delimiters to myTID
                       else
                           set first_pass to false
                       end if
                       
                   on error ErrorText number ErrorNumber
                       if ErrorNumber is not equal to -39 then --Error -39 is End Of File error
                           display dialog "There was an error. " & return & return & ErrorNumber & " - " & ErrorText with icon 2
                       else
                           display dialog "Done!" buttons {"OK"} default button "OK" giving up after 2
                       end if
                       exit repeat
                       
                   end try
                   
               end repeat
               close access aFile
               close access (outFile as alias)
           end if
       end ignoring
       
   end repeat
end open

Last edited by steven_hunter (2006-12-01 09:15:52 am)


Filed under: Finder

Offline

 

Board footer

Powered by FluxBB

[ Generated in 0.680 seconds, 8 queries executed ]

RSS (new topics) RSS (active topics)