Tuesday, October 25, 2016

#1 2010-03-11 05:43:36 pm

sam452
Member
From: Nashville
Registered: 2006-12-28
Posts: 96

CSV > sqlite3; clean up values before entering

I'm excited about using sqlite3 to solve some problems. I want to make a droplet that will take a CSV file and create a database.

I've used these handlers that were kindly uploaded and where I need some advice.

http://macscripter.net/viewtopic.php?id=27928

Here's a representative string of a row I want to insert into a table:

{"caiv2,2010BBDGRC,\"President, Board of Directors\""}

I'd like to end up with ('caiv2','2010BBDGRC',\"President, Board of Directors\") to insert.

You can see that I need help with how to handle the escaped field value.

But before that, I need to get the values ready to work. The handlers seem to work great except for the values. I can say that I'm able to create the db, tables and columns, but I'm breaking on the values. Reading the comments, this ought to work but it's breaking as below:

Applescript:


set a to {"caiv2,2010BBDGRC,\"President, Board of Directors\""}
set b to my tidStuff(quote, a) --> {"caiv2,2010BBDGRC,\"President, Board of Directors\""} Doesn't help me.
set c to my returnCommaSepQuotedString(a, quote) --> "\"caiv2,2010BBDGRC,\"\"President, Board of Directors\"\"\"" --as you can see it doesn't break up into single quotes
set d to my returnCommaSepQuotedStringEntities(a) --> "\"\"\"caiv2,2010BBDGRC,\"\"\"\"President, Board of Directors\"\"\"\"\"\"\"" --This was just to try it.

return c

-----

-- TURN STRING INTO LIST
on tidStuff(paramHere, textHere)
   set OLDtid to AppleScript's text item delimiters
   set AppleScript's text item delimiters to paramHere
   set theItems to text items of textHere
   set AppleScript's text item delimiters to OLDtid
   return theItems
end tidStuff

--for colums
on returnCommaSepQuotedStringEntities(the_array) -- use for column names (eg in create_db), table names, not values
   set quoteChar to quote -- AppleScript's reserved word for a double quote: "
   return returnCommaSepQuotedString(the_array, quoteChar)
end returnCommaSepQuotedStringEntities

on returnCommaSepQuotedStringValues(the_array) -- use for values in rows (eg in sql_insert), not for column names
   set quoteChar to "'" -- single quote
   returnCommaSepQuotedString(the_array, quoteChar)
end returnCommaSepQuotedStringValues

on returnCommaSepQuotedString(the_array, quoteChar)
   set oldDelimiters to AppleScript's text item delimiters
   -- Find any text values, quote them and escape any quote characters with quotes:
   repeat with item_ref in the_array
       set item_value to contents of item_ref
       if class of item_value is text then
           if item_value contains quoteChar then
               set AppleScript's text item delimiters to quoteChar
               set parsedList to text items in item_value
               set AppleScript's text item delimiters to (quoteChar & quoteChar)
               set item_value to parsedList as text
           end if
           set item_value to (quoteChar & item_value & quoteChar) as text
           set contents of item_ref to item_value
       end if
   end repeat
   -- Join all the values together as a string, separated by commas:
   set AppleScript's text item delimiters to ", "
   set return_string to the_array as text
   set AppleScript's text item delimiters to oldDelimiters
   return return_string
end returnCommaSepQuotedString


The handler's see the one string as one field so it's breaking down.

How do I use it, or something like to break down the one text string into values, while keeping the escaped value because I really need that comma as part of the customer's job title.

What am I overlooking? I've beaten myself up trying to see past this, thanx, sam


Filed under: SQLite3, unix, csv

Offline

 

#2 2010-03-11 08:11:53 pm

Craig Williams
Administrator
From: Ft. Smith, AR
Registered: 2006-12-06
Posts: 888

Re: CSV > sqlite3; clean up values before entering

You may find something useful here

Offline

 

#3 2010-03-12 01:31:01 pm

sam452
Member
From: Nashville
Registered: 2006-12-28
Posts: 96

Re: CSV > sqlite3; clean up values before entering

Thanks, Craig, but this was helpful, but not for this problem.

If I have a list of CSV as in the above, how would I extract out the actual values without using comma as a text item delimiter?

I cannot use a comma because it would rip this value:

\"President, Board of Directors\"

How should I parse out:

{"caiv2,2010BBDGRC,\"President, Board of Directors\""}

to get:

{'caiv2','2010BBDGRC',\"President, Board of Directors\"}

thanx, sam


Filed under: SQLite3, unix, csv

Offline

 

#4 2010-03-13 01:34:15 pm

Craig Williams
Administrator
From: Ft. Smith, AR
Registered: 2006-12-06
Posts: 888

Re: CSV > sqlite3; clean up values before entering

Get all of the quoted elements first and then get the comma-sep elements then put them back together.

Offline

 

#5 2010-03-13 02:50:16 pm

Adam Bell
Administrator
From: Nova Scotia, Canada
Registered: 2005-10-04
Posts: 4653

Re: CSV > sqlite3; clean up values before entering

If you're not to fussy about retaining commas in the third element:

Applescript:

set tStart to {"caiv2,2010BBDGRC,\"President, Board of Directors\""}
-- note that this is a list of one item; a string.
set tid to AppleScript's text item delimiters
set AppleScript's text item delimiters to ","
set tParts to text items of (tStart as text)
set AppleScript's text item delimiters to tid
-- assuming that items 1 and 2 never have commas
-- and that you don't mind losing the comma in the
-- title:
tell tParts
   set CSV_1 to item 1 --> "caiv2"
   set CSV_2 to item 2 --> "2010BBDGRC"
   set CSV_3 to items 3 thru -1 as text --> "\"President Board of Directors\""
end tell

To get fancier than that, you'd have to know more about that third part.


iMac running OS X 10.11.6

Offline

 

#6 2010-03-13 03:58:27 pm

Nigel Garvey
Moderator
From: Warwickshire, England
Registered: 2002-11-19
Posts: 4084

Re: CSV > sqlite3; clean up values before entering

I wrote a CSV-to-list converter a few years ago. I don't know if it's what you want.

Applescript:

(* Assumes that the CSV text adheres to the convention:
   Records are delimited by LFs or CRLFs (but CRs are also allowed here).
   The last record in the text may or may not be followed by an LF or CRLF (or CR).
   Fields in the same record are separated by commas (unless specified differently by parameter).
   The last field in a record must not be followed by a comma.
   Trailing or leading spaces in unquoted fields are not ignored (unless so specified by parameter).
   Fields containing quoted text are quoted in their entirety, any space outside them being ignored.
   Fields enclosed in double-quotes are to be taken verbatim, except for any included double-quote pairs, which are to be translated as double-quote characters.
       
   No other variations are currently supported. *)


on csvToList(csvText, implementation)
   -- The 'implementation' parameter must be a record. Leave it empty ({}) for the default assumptions: ie. comma separator, leading and trailing spaces in unquoted fields not to be trimmed. Otherwise it can have a 'separator' property with a text value (eg. {separator:tab}) and/or a 'trimming' property with a boolean value ({trimming:true}).
   set {separator:separator, trimming:trimming} to (implementation & {separator:",", trimming:false})
   
   script o -- Lists for fast access.
       property qdti : getTextItems(csvText, "\"")
       property currentRecord : {}
       property possibleFields : missing value
       property recordList : {}
   end script
   
   -- o's qdti is a list of the CSV's text items, as delimited by double-quotes.
   -- Assuming the convention mentioned above, the number of items is always odd.
   -- Even-numbered items (if any) are quoted field values and don't need parsing.
   -- Odd-numbered items are everything else. Empty strings in odd-numbered slots
   -- (except at the beginning and end) indicate escaped quotes in quoted fields.
   
   set astid to AppleScript's text item delimiters
   set qdtiCount to (count o's qdti)
   set quoteInProgress to false
   considering case
       repeat with i from 1 to qdtiCount by 2 -- Parse odd-numbered items only.
           set thisBit to item i of o's qdti
           if ((count thisBit) > 0) or (i is qdtiCount) then
               -- This is either a non-empty string or the last item in the list, so it doesn't
               -- represent a quoted quote. Check if we've just been dealing with any.
               if (quoteInProgress) then
                   -- All the parts of a quoted field containing quoted quotes have now been
                   -- passed over. Coerce them together using a quote delimiter.
                   set AppleScript's text item delimiters to "\""
                   set thisField to (items a thru (i - 1) of o's qdti) as string
                   -- Replace the reconstituted quoted quotes with literal quotes.
                   set AppleScript's text item delimiters to "\"\""
                   set thisField to thisField's text items
                   set AppleScript's text item delimiters to "\""
                   -- Store the field in the "current record" list and cancel the "quote in progress" flag.
                   set end of o's currentRecord to thisField as string
                   set quoteInProgress to false
               else if (i > 1) then
                   -- The preceding, even-numbered item is a complete quoted field. Store it.
                   set end of o's currentRecord to item (i - 1) of o's qdti
               end if
               
               -- Now parse this item's field-separator-delimited text items, which are either non-quoted fields or stumps from the removal of quoted fields. Any that contain line breaks must be further split to end one record and start another. These could include multiple single-field records without field separators.
               set o's possibleFields to getTextItems(thisBit, separator)
               set possibleFieldCount to (count o's possibleFields)
               repeat with j from 1 to possibleFieldCount
                   set thisField to item j of o's possibleFields
                   if ((count thisField each paragraph) > 1) then
                       -- This "field" contains one or more line endings. Split it at those points.
                       set theseFields to thisField's paragraphs
                       -- With each of these end-of-record fields except the last, complete the field list for the current record and initialise another. Omit the first "field" if it's just the stub from a preceding quoted field.
                       repeat with k from 1 to (count theseFields) - 1
                           set thisField to item k of theseFields
                           if ((k > 1) or (j > 1) or (i is 1) or ((count trim(thisField, true)) > 0)) then set end of o's currentRecord to trim(thisField, trimming)
                           set end of o's recordList to o's currentRecord
                           set o's currentRecord to {}
                       end repeat
                       -- With the last end-of-record "field", just complete the current field list if the field's not the stub from a following quoted field.
                       set thisField to end of theseFields
                       if ((j < possibleFieldCount) or ((count thisField) > 0)) then set end of o's currentRecord to trim(thisField, trimming)
                   else
                       -- This is a "field" not containing a line break. Insert it into the current field list if it's not just a stub from a preceding or following quoted field.
                       if (((j > 1) and ((j < possibleFieldCount) or (i is qdtiCount))) or ((j is 1) and (i is 1)) or ((count trim(thisField, true)) > 0)) then set end of o's currentRecord to trim(thisField, trimming)
                   end if
               end repeat
               
               -- Otherwise, this item IS an empty text representing a quoted quote.
           else if (quoteInProgress) then
               -- It's another quote in a field already identified as having one. Do nothing for now.
           else if (i > 1) then
               -- It's the first quoted quote in a quoted field. Note the index of the
               -- preceding even-numbered item (the first part of the field) and flag "quote in
               -- progress" so that the repeat idles past the remaining part(s) of the field.
               set a to i - 1
               set quoteInProgress to true
           end if
       end repeat
   end considering
   
   -- At the end of the repeat, store any remaining "current record".
   if (o's currentRecord is not {}) then set end of o's recordList to o's currentRecord
   set AppleScript's text item delimiters to astid
   
   return o's recordList
end csvToList

-- Get the possibly more than 4000 text items from a text.
on getTextItems(txt, delim)
   set astid to AppleScript's text item delimiters
   set AppleScript's text item delimiters to delim
   set tiCount to (count txt's text items)
   set textItems to {}
   repeat with i from 1 to tiCount by 4000
       set j to i + 3999
       if (j > tiCount) then set j to tiCount
       set textItems to textItems & text items i thru j of txt
   end repeat
   set AppleScript's text item delimiters to astid
   
   return textItems
end getTextItems

-- Trim any leading or trailing spaces from a string.
on trim(txt, trimming)
   if (trimming) then
       repeat with i from 1 to (count txt) - 1
           if (txt begins with space) then
               set txt to text 2 thru -1 of txt
           else
               exit repeat
           end if
       end repeat
       repeat with i from 1 to (count txt) - 1
           if (txt ends with space) then
               set txt to text 1 thru -2 of txt
           else
               exit repeat
           end if
       end repeat
       if (txt is space) then set txt to ""
   end if
   
   return txt
end trim

set csvText to "caiv2,2010BBDGRC,\"President, Board of Directors\"" & linefeed & "Another line, for demo purposes"
csvToList(csvText, {})
--> {{"caiv2", "2010BBDGRC", "President, Board of Directors"}, {"Another line", "for demo purposes"}}

Edit: It's been pointed out to me that the original script omitted empty fields. That's now corrected and the trim() hander's a trifle faster than it was. The script also now conforms to RFC 4180, but has options to trim leading and trailing spaces from unquoted fields — csvToList(csvText, {trimming:true}) — and/or to recognise a different field separator — eg. csvToList(csvText, {separator:tab, trimming:true}). An empty record gets the defaults: unquoted fields not trimmed; comma separator. Returns are now tolerated as record separators.

Last edited by Nigel Garvey (2012-03-29 11:51:52 am)


NG

Offline

 

#7 2010-03-15 10:45:02 am

sam452
Member
From: Nashville
Registered: 2006-12-28
Posts: 96

Re: CSV > sqlite3; clean up values before entering

Yes, yes, thank you for seeing what I was trying to communicate. This is an excellent gift to the community. I've learned a big deal going through it.


Filed under: parse, SQLite3, unix, csv

Offline

 

Board footer

Powered by FluxBB

[ Generated in 0.082 seconds, 10 queries executed ]

RSS (new topics) RSS (active topics)