Script to Extract Text from String with Delimiters

Hi All,

I have some data that I extract from a website using AppleScript, and I need to break it down and assign the parts to different variables. My understanding is that this can be done with delimiters, but I am not familiar with them.

Sample text below:

Address 1, Address 2, District
Region, SHANGHAI, 154789
China

So the delimiters in this case would be the “,” and also the “<br>”. The end result should be something like below:

Address Line 1: Address 1
Address Line 2: Address 2
District: District
City: City
Region: Region
City: SHANGHAI
Postal Code: 154789
Country: China

Anyone able to assist/shed some light?

Thanks much in advance!

"Address 1, Address 2, District<br>Region, SHANGHAI, 154789<br>China"

set splitted to my decoupe(result, {", ", "<br>"})

#=====

on decoupe(t, d)
	local oTIDs, l
	set {oTIDs, AppleScript's text item delimiters} to {AppleScript's text item delimiters, d}
	set l to text items of t
	set AppleScript's text item delimiters to oTIDs
	return l
end decoupe

#=====

May help.

Yvan KOENIG (VALLAURIS, France) Monday, 24 March 2014 11:37:54
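
For readers new to text item delimiters, here is a minimal sketch of what the decoupe handler above does, written inline without a handler. It assumes the scraped text really contains a literal "<br>" between lines; the variable names sampleText, savedDelimiters and addressParts are made up for illustration.

```applescript
-- Hypothetical sample text; "<br>" stands in for the line break in the scraped page.
set sampleText to "Address 1, Address 2, District<br>Region, SHANGHAI, 154789<br>China"

-- Save the current delimiters so they can be restored afterwards.
set savedDelimiters to AppleScript's text item delimiters

-- A list of delimiters: the text is split wherever any one of them occurs.
set AppleScript's text item delimiters to {", ", "<br>"}
set addressParts to text items of sampleText

-- Restore the previous delimiters so later code is not affected.
set AppleScript's text item delimiters to savedDelimiters

addressParts
--> {"Address 1", "Address 2", "District", "Region", "SHANGHAI", "154789", "China"}
```

Note that the first delimiter is ", " (comma plus space), so the items come back without a leading space.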

Hi Yvan,

Thanks for the help. Apologies for the delay in reply.

I tested your solution and it partially solves the problem, though the output I need is slightly different from what it currently returns.

Original text
25 Southland Road, Smith Street, Southlake
Region, SHANGHAI, 154789
China

The algorithm will be something like:

Read Original Text
On “,” or “<br>”
Set Address Line 1: to 25 Southland Road

Read Original Text
On “,” or “<br>”
Set Address Line 2: to Smith Street

Read Original Text
On “,” or “<br>”
Set District: to Southlake

etc

Hope that makes the requirements clearer?

Thanks

Edit: Nvm I found the solution. Thanks!

Something like this?

set OriginalText to "25 Southland Road, Smith Street, Southlake<br>Region, SHANGHAI, 154789<br>China"

set splitted to my decoupe(OriginalText, {", ", "<br>"})

#=====

on decoupe(t, d)
	local oTIDs, l
	set {oTIDs, AppleScript's text item delimiters} to {AppleScript's text item delimiters, d}
	set l to text items of t
	set AppleScript's text item delimiters to oTIDs
	set {AddressLine1, AddressLine2, District, Region, City, Zip, Country} to l
	return ("Address Line 1: " & AddressLine1 & return & "Address Line 2: " & AddressLine2 & return & "District: " & District & return & "Region: " & Region & return & "City: " & City & return & "Zip code: " & Zip & return & "Country: " & Country)
end decoupe

#=====

Isn’t that what my code returns?
I assume that <br> is a replacement for return OR linefeed.
As I don’t know which one, I kept the pseudo value.
As it seems that you didn’t understand that, here is an alternate version:

"25 Southland Road, Smith Street, Southlake" & return & "Region, SHANGHAI, 154789" & linefeed & "China"

set splitted to my decoupe(result, {", ", return, linefeed})
--> {"25 Southland Road", "Smith Street", "Southlake", "Region", "SHANGHAI", "154789", "China"}
#=====

on decoupe(t, d)
	local oTIDs, l
	set {oTIDs, AppleScript's text item delimiters} to {AppleScript's text item delimiters, d}
	set l to text items of t
	set AppleScript's text item delimiters to oTIDs
	return l
end decoupe

#=====

item1 = “25 Southland Road”
item2 = “Smith Street”
item3 = “Southlake”
item4 = “Region”
item5 = “SHANGHAI”
item6 = “154789”
item7 = “China”

Isn’t that what you want?

You may add the instruction:

set {Address_Line_1, Address_Line_2,District,Region,City,Postal_Code,Country} to splitted

But maybe I misunderstood and you want:

"25 Southland Road, Smith Street, Southlake" & return & "Region, SHANGHAI, 154789" & linefeed & "China"

set splitted to my decoupe(result, {", ", return, linefeed})

log splitted
(*25 Southland Road, Smith Street, Southlake, Region, SHANGHAI, 154789, China*)

set {Address_Line_1, Address_Line_2, District, Region, City, Postal_Code, Country} to splitted
log Address_Line_1 (*25 Southland Road*)
log Address_Line_2 (*Smith Street*)
log District (*Southlake*)
log Region (*Region*)
log City (*SHANGHAI*)
log Postal_Code (*154789*)
log Country (*China*)

set item 1 of splitted to "Address Line 1: " & item 1 of splitted
set item 2 of splitted to "Address Line 2: " & item 2 of splitted
set item 3 of splitted to "District: " & item 3 of splitted
--set item 4 of splitted to "City: " & item 4 of splitted
set item 4 of splitted to "Region: " & item 4 of splitted
set item 5 of splitted to "City: " & item 5 of splitted
set item 6 of splitted to "Postal Code: " & item 6 of splitted
set item 7 of splitted to "Country: " & item 7 of splitted
splitted

my recolle(splitted, return)
(*
"Address Line 1: 25 Southland Road
Address Line 2: Smith Street
District: Southlake
Region: Region
City: SHANGHAI
Postal Code: 154789
Country: China"
*)
#=====

on decoupe(t, d)
	local oTIDs, l
	set {oTIDs, AppleScript's text item delimiters} to {AppleScript's text item delimiters, d}
	set l to text items of t
	set AppleScript's text item delimiters to oTIDs
	return l
end decoupe

#=====

on recolle(l, d)
	local oTIDs, t
	set {oTIDs, AppleScript's text item delimiters} to {AppleScript's text item delimiters, d}
	set t to "" & l
	set AppleScript's text item delimiters to oTIDs
	return t
end recolle

#=====

As you can see, the given source contains 7 items, not the 8 you posted in your original message:
Address Line 1: Address 1
Address Line 2: Address 2
District: District
City: City
Region: Region
City: SHANGHAI
Postal Code: 154789
Country: China

Yvan KOENIG (VALLAURIS, France) Friday, 28 March 2014 10:02:28
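
Because the assignment to seven variables only works when the split really yields seven items (a shorter list raises an error at that line), a guard on the item count may be worth adding. The following is only a sketch under that assumption: the handler name parseAddress, its variable names and the record labels are hypothetical and not part of the scripts above.

```applescript
-- Hypothetical handler: split an address and check the item count before assigning.
on parseAddress(addressText)
	-- Same split-and-restore pattern as decoupe.
	set savedDelimiters to AppleScript's text item delimiters
	set AppleScript's text item delimiters to {", ", return, linefeed}
	set parts to text items of addressText
	set AppleScript's text item delimiters to savedDelimiters
	
	-- Stop early with a readable message if the source is not in the expected shape.
	if (count of parts) is not 7 then
		set msg to "Expected 7 address fields but found " & (count of parts)
		error msg
	end if
	
	set {line1, line2, districtName, regionName, cityName, postalValue, countryName} to parts
	return {addressLine1:line1, addressLine2:line2, district:districtName, region:regionName, city:cityName, postalCode:postalValue, country:countryName}
end parseAddress

set sampleText to "25 Southland Road, Smith Street, Southlake" & return & "Region, SHANGHAI, 154789" & linefeed & "China"
set addressRecord to parseAddress(sampleText)
set cityValue to city of addressRecord
```

Returning a record instead of a list lets the caller ask for a field by name, for example city of addressRecord, rather than by item number.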

Hi Yvan,

Yes, I realized that I could extract it based on the item number. I did mention this at the bottom of my post with an edit: “Edit: Nvm I found the solution. Thanks!”

Nonetheless, I learned a more efficient and tidier way to code it:

“set {Address_Line_1, Address_Line_2, District, Region, City, Postal_Code, Country} to splitted”

Appreciate the help, guys! :) Thanks!

Cheers