sqllite3 - Absolute .db path & Working with Delimiters

Yvan_Koenig · December 8, 2016, 2:58pm

I gave the answer to the 2nd question in my late message.
It seems that your eyes weren’t fully open

Yvan KOENIG running Sierra 10.12.1 in French (VALLAURIS, France) jeudi 8 dÃ©cembre 2016 15:58:06

Paralysis · December 8, 2016, 4:16pm

My apologies, so you did!
I must have missed that email.

The script appears to be working as intended however it does fall over later as the variables are not pulled in like your script is indicated:

Please see below:

set pullData to "select printername, printersite, contactemail, ftpemail, ftppassword from printers where printercode = '" & printerCode & "'; "
set theAnswer to (do shell script head & pullData & quote) as string
display dialog theAnswer
--> theAnswer gives me this result - Including the spaces etc:
--> customer1    London       customer1@email.com  customer1@ftp.com  Password1
--> MacScripter appears to ignore the extra spaces inbetween the words - see below:
-- customer1  *4spaces*  London *7spaces*       customer1@email.com  *2spaces* customer1@ftp.com  *2spaces* Password   
set {printerName, printerSite, contactEmail, ftpEmail, ftpPassword} to my decoupe(theAnswer, ",")

Its almost there!
Thanks again for helping me

Paralysis · December 8, 2016, 4:31pm

I have found the issue!
The issue was in the head variable not the code.
As I had -column set in the head it wasnt pulling in correctly.
I removed that and the string was exported correctly but with “|” separating them.
I simply change the delimiter character in the code you provided.

Im all sorted now
Thanks again!

Yvan_Koenig · December 8, 2016, 4:45pm

I apologizes but I’m unable to guess that you post wrong infos.
In your original message I saw :

[format]-- Lets say I enter this info in: customer1, London, customer1@email.com, customer1@ftp.com, Password 1[/format]

So I used the comma as separator (in fact using ", " would have been better).

As items are separated by groups of spaces a complementary task is required.

set theAnswer to (do shell script head & pullData & quote)
set twoSpaces to space & space
repeat # replace every couple of space characters by a single one as long as there are such couples
	if theAnswer contains twoSpaces then
		set theAnswer to my remplace(theAnswer, twoSpaces, space)
	else
		exit repeat
	end if
end repeat
# Here theAnswer is : "customer1 London customer1@email.com customer1@ftp.com Password"
# So we may use a single space as separator
set {printerName, printerSite, contactEmail, ftpEmail, ftpPassword} to my decoupe(theAnswer, space)
--> {"customer1", " London", " customer1@email.com", " customer1@ftp.com", " Password 1"

# here continue your script

#=====

on decoupe(t, d)
	local oTIDs, l
	set {oTIDs, AppleScript's text item delimiters} to {AppleScript's text item delimiters, d}
	set l to text items of t
	set AppleScript's text item delimiters to oTIDs
	return l
end decoupe

#=====
(*
replaces every occurences of d1 by d2 in the text t
*)
on remplace(t, d1, d2)
	local oTIDs, l
	set {oTIDs, AppleScript's text item delimiters} to {AppleScript's text item delimiters, d1}
	set l to text items of t
	set AppleScript's text item delimiters to d2
	set t to l as text
	set AppleScript's text item delimiters to oTIDs
	return t
end remplace

#=====

Yvan KOENIG running Sierra 10.12.1 in French (VALLAURIS, France) jeudi 8 dÃ©cembre 2016 17:44:34

Yvan_Koenig · December 9, 2016, 9:35am

Here the script use an alternate scheme to extract the useful values.

set theAnswer to (do shell script head & pullData & quote)
--> "customer1        London.     customer1@email.com        customer1@ftp.com       Password"
set twoSpaces to space & space
# splitting theAnswer by space character will generate several emty strings
set longList to my decoupe(theAnswer, space)
--> {"customer1", "", "", "", "", "", "", "", "London.", "", "", "", "", "customer1@email.com", "", "", "", "", "", "", "", "customer1@ftp.com", "", "", "", "", "", "", "Password"}
set shortList to {}
# Now, drop these empty strings
repeat with txt in longList
	set txt to txt as text
	if txt â‰  "" then set end of shortList to txt
end repeat
set {printerName, printerSite, contactEmail, ftpEmail, ftpPassword} to shortList
--> {"customer1", "London.", "customer1@email.com", "customer1@ftp.com", "Password"}
# here continue your script

#=====

on decoupe(t, d)
	local oTIDs, l
	set {oTIDs, AppleScript's text item delimiters} to {AppleScript's text item delimiters, d}
	set l to text items of t
	set AppleScript's text item delimiters to oTIDs
	return l
end decoupe

#=====

Maybe someone will give an other scheme using Grep/Regex. My old brain fails to understand these tools.

Yvan KOENIG Running Sierra 10.12.1 In French (VALLAURIS. France) vendredi 9 dÃ©cembre 2016 10:35:15

Nigel_Garvey · December 9, 2016, 10:54am

Assuming the first line works (I’m not familiar with sqlite scripting and can’t see what the ‘quote’ is for), you could possibly use this:

set {printerName, printerSite, contactEmail, ftpEmail, ftpPassword} to paragraphs of (do shell script head & pullData & quote & " | sed -E 's/[[:space:]]+/\\'$'\\n''/g'")
# here continue your script

Yvan_Koenig · December 9, 2016, 1:52pm

Thanks Nigel.

I tested it with :

set theAnswer to "customer1        London.     customer1@email.com        customer1@ftp.com       Password"
-- "customer1--------London.-----customer1@email.com--------customer1@ftp.com-------Password" In the true string - must be replaced by a space character
set {printerName, printerSite, contactEmail, ftpEmail, ftpPassword} to paragraphs of (do shell script "sed -E 's/[[:space:]]+/\\'$'\\n''/g' <<<" & quoted form of theAnswer)

I’m wondering how the command must be changed so that it splits the string with the dashes.

Yvan KOENIG running Sierra 10.12.1 in French (VALLAURIS, France) vendredi 9 dÃ©cembre 2016 14:52:32

Nigel_Garvey · December 9, 2016, 5:18pm

Hi Yvan.

If you mean replace the sequences of dashes with linefeeds, it would be:

set theAnswer to "customer1--------London.-----customer1@email.com--------customer1@ftp.com-------Password"
set {printerName, printerSite, contactEmail, ftpEmail, ftpPassword} to paragraphs of (do shell script "sed -E 's/-+/\\'$'\\n''/g' <<<" & quoted form of theAnswer)

But it would also replace hyphens in words, which may not be desired. This next version only replaces runs of two or more dashes:

set theAnswer to "customer1--------London.-----customer1@email.com--------customer1@ftp.com-------Pass-word"
set {printerName, printerSite, contactEmail, ftpEmail, ftpPassword} to paragraphs of (do shell script "sed -E 's/--+/\\'$'\\n''/g' <<<" & quoted form of theAnswer)

To replace a run of anything between the words which isn’t an alphanumeric character, a dot, an “@”, or a hyphen between such characters:

set theAnswer to "customer1  London.,-----customer1@email.com,customer1@ftp.com#!?[]^(){}Pass-word"
set {printerName, printerSite, contactEmail, ftpEmail, ftpPassword} to paragraphs of (do shell script "sed -E 's/([^[:alnum:].@-][^[:alnum:].@]*|-[^[:alnum:].@]+)/\\'$'\\n''/g' <<<" & quoted form of theAnswer)

Yvan_Koenig · December 9, 2016, 5:46pm

Thanks Nigel

Yvan KOENIG running Sierra 10.12.1 in French (VALLAURIS, France) vendredi 9 dÃ©cembre 2016 18:46:13

Yvan_Koenig · December 9, 2016, 7:56pm

As I am curious, I decided to test the efficiency of three scripts.

I built a script treating the same string 1000 times.

The results are interesting.
Here is the complete script applying the first scheme which I posted.

use AppleScript version "2.4"
use scripting additions
use framework "Foundation"
set theString to "customer1        London.     customer1@email.com        customer1@ftp.com       Password"
set startDate to current application's NSDate's |date|()
repeat 1000 times
	copy theString to theAnswer
	--> "customer1        London.     customer1@email.com        customer1@ftp.com       Password"
	set twoSpaces to space & space
	repeat # replace every couple of space characters by a single one as long as there are such couples
		if theAnswer contains twoSpaces then
			set theAnswer to my remplace(theAnswer, twoSpaces, space)
		else
			exit repeat
		end if
	end repeat
	# Here theAnswer is : "customer1 London customer1@email.com customer1@ftp.com Password"
	# So we may use a single space as separator
	set {printerName, printerSite, contactEmail, ftpEmail, ftpPassword} to my decoupe(theAnswer, space)
end repeat
# here continue your script
set timeDiff to startDate's timeIntervalSinceNow()
display dialog "That took " & (-timeDiff as real) & " seconds."
--> "That took 0,08715903759 seconds."

#=====

on decoupe(t, d)
	local oTIDs, l
	set {oTIDs, AppleScript's text item delimiters} to {AppleScript's text item delimiters, d}
	set l to text items of t
	set AppleScript's text item delimiters to oTIDs
	return l
end decoupe

#=====
(*
replaces every occurences of d1 by d2 in the text t
*)
on remplace(t, d1, d2)
	local oTIDs, l
	set {oTIDs, AppleScript's text item delimiters} to {AppleScript's text item delimiters, d1}
	set l to text items of t
	set AppleScript's text item delimiters to d2
	set t to l as text
	set AppleScript's text item delimiters to oTIDs
	return t
end remplace

Then I replaced the main loop by :

repeat 1000 times
	copy theString to theAnswer
	--> "customer1        London.     customer1@email.com        customer1@ftp.com       Password"
	
	# splitting theAnswer by space character will generate several emty strings
	set longList to my decoupe(theAnswer, space)
	--> {"customer1", "", "", "", "", "", "", "", "London.", "", "", "", "", "customer1@email.com", "", "", "", "", "", "", "", "customer1@ftp.com", "", "", "", "", "", "", "Password"}
	set shortList to {}
	# Now, drop these empty strings
	repeat with txt in longList
		set txt to txt as text
		if txt â‰  "" then set end of shortList to txt
	end repeat
	set {printerName, printerSite, contactEmail, ftpEmail, ftpPassword} to shortList
	--> {"customer1", "London.", "customer1@email.com", "customer1@ftp.com", "Password"}
end repeat

This time, that took 0,096091032028 seconds.

Last not least I tested the sed version with this main loop:

repeat 1000 times
	copy theString to theAnswer
	--> "customer1        London.     customer1@email.com        customer1@ftp.com       Password"
	
	set {printerName, printerSite, contactEmail, ftpEmail, ftpPassword} to paragraphs of (do shell script "sed -E 's/[[:space:]]+/\\'$'\\n''/g' <<<" & quoted form of theAnswer)
end repeat

It took 20,967127978802 seconds.

Honestly, I didn’t imagined that the fact to trigger a shell script would add such extra duration.
It’s not really reflecting the true efficiency because in real life, the sed scheme would me used in a single call to shell script, the one posted by Nigel:

set {printerName, printerSite, contactEmail, ftpEmail, ftpPassword} to paragraphs of (do shell script head & pullData & quote & " | sed -E 's/[[:space:]]+/\\'$'\\n''/g'")

Alas, I am unable to test the complete task which build the string and split it.

Maybe the asker will be interested by the comparison and will test the three schemes.
I would be interested by the results.

Yvan KOENIG running Sierra 10.12.1 in French (VALLAURIS, France) vendredi 9 dÃ©cembre 2016 20:56:16

Shane_Stanley · December 9, 2016, 11:01pm

It’s a time-consuming process.

You can also use Nigel’s regular expressions (with slight modifications) under AppleScriptObjC:

	set theAnswer to current application's NSString's stringWithString:theString
	set newString to theAnswer's stringByReplacingOccurrencesOfString:"([^[:alnum:].@-][^[:alnum:].@]*|-[^[:alnum:]\\.@]+)" withString:linefeed options:(current application's NSRegularExpressionSearch) range:{0, theAnswer's |length|()}
	set {printerName, printerSite, contactEmail, ftpEmail, ftpPassword} to (newString's componentsSeparatedByString:linefeed) as list

That’s still going to be slower than your example, though only be a factor of about 3, rather than 100+.

Nigel_Garvey · December 10, 2016, 10:43am

Shane Stanley:

You can also use Nigel’s regular expressions (with slight modifications) under AppleScriptObjC:
	set theAnswer to current application's NSString's stringWithString:theString
	set newString to theAnswer's stringByReplacingOccurrencesOfString:"([^[:alnum:].@-][^[:alnum:].@]*|-[^[:alnum:]\\.@]+)" withString:linefeed options:(current application's NSRegularExpressionSearch) range:{0, theAnswer's |length|()}
	set {printerName, printerSite, contactEmail, ftpEmail, ftpPassword} to (newString's componentsSeparatedByString:linefeed) as list
That’s still going to be slower than your example, though only be a factor of about 3, rather than 100+.

This version’s a bit faster:

set theAnswer to current application's NSMutableString's stringWithString:theString
theAnswer's replaceOccurrencesOfString:"[^[:alnum:].@-][^[:alnum:].@]*+|-[^[:alnum:].@]++" withString:linefeed options:(current application's NSRegularExpressionSearch) range:{0, (count theString)}
set {printerName, printerSite, contactEmail, ftpEmail, ftpPassword} to paragraphs of (theAnswer as text)

But I think it’s safer to stick with theAnswer’s |length|() rather than (count theString) if there’s a possibility of certain Unicode combinations in string.

Yvan_Koenig · December 10, 2016, 11:05am

Thanks Shane.

Your code does the job as well if the values are separated by groups of spaces, by groups of hyphens or even by mixed groups of spaces, hyphens, underscores.

I tried to decipher it.
My understanding is that the strings to be replaced are:

strings starting with (a character which is not alphanumeric, dot, arobase or hyphen) plus 0 or several characters which aren’t alphanumeric, dot or arobase (this time the hyphen is not rejected)
OR strings starting with an hyphen, continuing with zero or several characters which aren’t alphanumeric, dot or arobase (this time the hyphen is not rejected)

I don’t understand what is the difference between the repetition star and the repetition plus.
I tried to use the plus twice, the star twice, the plus first and the star as 2nd one with no change in the final result.

I discovered Nigel message after submitting my message.
I’m now more puzzled by it’s use of plus and double plus.

I tested it with :
set theString to “customâ¼’â¼”er1- - __-- = ----London.-----customer1@email.com--------customer1@ftp.com-------Password”

The couple of Kangxi radicals was treated as a separator

Replacing (count theString) by theAnswer’s |length|() changed nothing.

Shane_Stanley · December 10, 2016, 11:49am

Suppose you have several strings:

abc abbc abbbc acd

If you search for ab*, you will get a match in them all: ab, abb, abbb and a. The match for b is essentially optional.

If you search for ab+, you will only get matches in the first three: ab, abb and abbb. This time there must be at least one b.

But it would if you had a Unicode character that cannot be represented as a single 16-bit value.

Nigel_Garvey · December 10, 2016, 12:43pm

Hi Yvan.

Almost. It’s:

sequences starting with (a character which is not alphanumeric, dot, arobase or hyphen) plus 0 or several characters which aren’t alphanumeric, dot or arobase (including hyphens)
OR sequences starting with a hyphen, continuing with one or several characters which aren’t alphanumeric, dot or arobase (including hyphens)

That is:
any run of one or more unwanted characters where the first isn’t a hyphen
OR any run of two or more unwanted characters where the first is a hyphen
So a single hyphen which isn’t part of a run of unwanted characters isn’t matched.

An asterisk means zero or more instances. A plus means one or more instances. In the version of regex supported by ObjC (but not in the version(s) used by sed), it’s possible to optimise the asterisk and plus actions (where appropriate) by putting another plus after these symbols. It basically tells the regex engine to throw away the run after matching it instead of storing it for possible re-interpretation later in the regex. In fact there’s no “later” in the regex here, so the difference (if any) will be negligible.

Yvan_Koenig · December 10, 2016, 2:48pm

Thanks Shane and Nigel.
I missed the fact that * was meaning 0 or some occurrences while + means 1 or some occurrences.

DJ_Bazzie_Wazzie · December 12, 2016, 2:09pm

Since you tried plus twice I think you need to use curly braces.

An atom followed by *' matches a sequence of 0 or more matches of the atom. An atom followed by +’ matches a sequence of 1 or more matches of the atom.
An atom followed by `?’ matches a sequence of 0 or 1 matches of the atom.
An atom followed by {n} matches a sequence exactly n times of the atom
An atom followed by {n,} matched a sequence n times or more of the atom
An atom followed by {n,m} matched a sequence n times or more and equal or less than m of the atom

Instead of ++ I think you want {2,} instead.

Yvan_Koenig · December 12, 2016, 2:19pm

Thanks but it’s not my script which used ++, it’s Nigel’s one

Nigel_Garvey · December 12, 2016, 3:00pm

Hi DJ.

My understanding is that the brace equivalent of “++” is “{1,}+” and of “*+” it’s “{0,}+”.

use framework "Foundation"

set theString to "customer1 London.,-----customer1@email.com,customer1@ftp.com#!?[]^(){}Pass-word" -- Deliberate mish-mash of irrelevant characters between fields.

set theAnswer to current application's NSMutableString's stringWithString:theString
theAnswer's replaceOccurrencesOfString:"[^[:alnum:].@-][^[:alnum:].@]{0,}+|-[^[:alnum:].@]{1,}+" withString:linefeed options:(current application's NSRegularExpressionSearch) range:{0, (count theString)}
set {printerName, printerSite, contactEmail, ftpEmail, ftpPassword} to paragraphs of (theAnswer as text)

DJ_Bazzie_Wazzie · December 12, 2016, 3:37pm

My mistake I completely overlooked the possessive quantifier.