Parsing JSON files

Nice sed construct!

A little off topic, but my sense is that sed has one of the higher ratios of [hidden power]/[real-life usage] among tools in a coder’s toolbox. Fully tapping into that power, though, takes a great deal of study and experience.

I must be jaded. I get the biggest thrill from using code I know someone better than me has already debugged.

Hello, two questions:

– Can JSON encode Unicode characters?
– Can sed handle Unicode characters?

Yvan KOENIG running Sierra 10.12.6 in French (VALLAURIS, France) Monday 11 September 2017 16:58:13

Hi Yvan.

It seems so. (See the script below.)

sed on Mac OS can handle Unicode as sequences of bytes, but doesn’t recognise Unicode characters per se. For instance, you can’t use the command y/Д/x/ because sed sees “Д” and “x” as different numbers of characters. But the script below works:

use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

set |⌘| to current application

set unicodeText to "⌘řůД⦿"

set originalRecord to {aString:unicodeText, anArray:{unicodeText}}
set jsonData to |⌘|'s class "NSJSONSerialization"'s dataWithJSONObject:(originalRecord) options:(|⌘|'s NSJSONWritingPrettyPrinted) |error|:(missing value)
set jsonString to (|⌘|'s class "NSString"'s alloc()'s initWithData:(jsonData) encoding:(|⌘|'s NSUTF8StringEncoding)) as text

set reconstitutedRecord to run script (do shell script ("echo " & jsonString's quoted form & " | sed -En 's/\"([^\"]+)\"[[:space:]]*:/|\\1|:/g; H; $ {g; s/[[:cntrl:]]+//g; p;}'"))

{jsonString, reconstitutedRecord}
(* -->
{"{
  \"aString\" : \"⌘řůД⦿\",
  \"anArray\" : [
    \"⌘řůД⦿\"
  ]
}", {aString:"⌘řůД⦿", anArray:{"⌘řůД⦿"}}}
*)
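For anyone wondering about the two questions at the top, here's a small sketch in plain JavaScript (runnable with Node.js, for instance). It shows that JSON can carry non-ASCII characters either literally or as \uXXXX escapes. It also shows, assuming UTF-8 encoded text, why a byte-oriented sed objects to y/Д/x/: "Д" takes two bytes where "x" takes one.

```javascript
// Sketch: how JSON and UTF-8 treat the kind of characters used in the script above.

// JSON text may contain non-ASCII characters directly or as \uXXXX escapes;
// both parse to the same JavaScript string.
const direct = JSON.parse('"Д"');
const escaped = JSON.parse('"\\u0414"'); // the JSON text here is "\u0414"
console.log(direct === escaped); // true

// JSON.stringify leaves non-ASCII characters as they are rather than escaping them.
console.log(JSON.stringify({ aString: "⌘řůД⦿" })); // {"aString":"⌘řůД⦿"}

// A byte-oriented tool sees UTF-8 bytes, not characters, which is why
// y/Д/x/ fails: the two sides have different lengths in bytes.
const utf8 = new TextEncoder();
for (const ch of ["Д", "x", "⌘"]) {
  console.log(ch, "chars:", [...ch].length, "bytes:", utf8.encode(ch).length);
}
```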

Thanks a lot Nigel.

Yvan KOENIG running Sierra 10.12.6 in French (VALLAURIS, France) Monday 11 September 2017 19:42:43

Here’s a sed construct that accomplishes the same task and differs only in that it waits until all lines have been collected in the hold space before performing substitutions:


set applescriptValue to run script (do shell script ("echo " & jsonString's quoted form & " | sed -E 'H; $!d; g; s/\"([^\"]+)\"[[:space:]]*:/|\\1|:/g; s/[[:cntrl:]]+//g'"))

Any advantages or disadvantages to this approach, or just another way to skin the cat?

I’m not sure. Before posting my version, I ummed and ahhed over a similar approach:

This suppresses the normal line output globally and does an explicit output of the final result instead. Yours …

… suppresses the normal line output individually in all but the last line and lets the final result from there go through in the normal way. The same effect by different means.

I eventually went for performing the first edit before storing each line, merely on the whim of the moment. Since the first edit potentially shortens each line, that’s potentially fewer characters to copy from the pattern space and append to the hold space. On the other hand, that’s more times the first edit command has to be invoked! For one’s own sanity, it’s probably better not to worry too much about such things. :wink:
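A rough way to see the trade-off is to emulate the two orderings in JavaScript. This is a sketch, not sed: the hold space is just a string, H is modelled as appending a linefeed plus the line, and the character class [ \t\n\v\f\r] stands in for [[:space:]]. For the small sample used here, both orderings produce the same text, and the dequote-first version copies two fewer characters.

```javascript
// Sketch (not sed itself): "dequote each line, then append to the hold space"
// versus "append raw lines, then dequote the whole hold space".
const keyPattern = /"([^"]+)"[ \t\n\v\f\r]*:/g;

function dequoteFirst(lines) {
  let hold = "";
  let copied = 0; // characters appended to the emulated hold space
  for (const line of lines) {
    const edited = line.replace(keyPattern, "|$1|:");
    hold += "\n" + edited; // sed's H appends a linefeed plus the pattern space
    copied += edited.length;
  }
  return { text: hold, copied };
}

function holdFirst(lines) {
  let hold = "";
  let copied = 0;
  for (const line of lines) {
    hold += "\n" + line;
    copied += line.length;
  }
  return { text: hold.replace(keyPattern, "|$1|:"), copied };
}

const sample = ['{', '  "MenuID" : 5,', '  "MenuName" : "Lunch Menu"', '}'];
const a = dequoteFirst(sample);
const b = holdFirst(sample);
console.log(a.text === b.text); // true for this sample
console.log(a.copied, b.copied); // 42 44
```

One caveat the sketch makes visible: in the hold-first order the whitespace class can also match a linefeed, so a key and its colon on different lines would only be caught that way.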

The only characters sed recognises as line endings are linefeeds, so if the line endings happen to be returns, sed will treat the entire text as one line. In this case, the first ‘s’ command will perform the same global edit whether it’s done pre-hold-space or post-hold-space, but the number of characters transferred to and from the hold space will still be fewer if the edit’s done before then.
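To illustrate the line-ending point, here's a JavaScript sketch that splits text the way sed does, at linefeeds only. The helper sedStyleLines is hypothetical, written just for this illustration. The same text with return line endings comes out as a single line.

```javascript
// Sketch: sed splits input only at linefeeds, so text whose lines end in
// returns (CR, "\r") arrives as a single "line".
function sedStyleLines(text) {
  const pieces = text.split("\n");
  // A trailing linefeed ends the last line rather than starting a new one.
  if (pieces.length > 1 && pieces[pieces.length - 1] === "") pieces.pop();
  return pieces;
}

const lfText = '{\n"MenuID":5,\n"MenuVersion":1\n}\n';
const crText = lfText.replace(/\n/g, "\r");

console.log(sedStyleLines(lfText).length); // 4
console.log(sedStyleLines(crText).length); // 1
```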

I need another cup of coffee.

I replicated the sample JSON string posted previously (“{ "MenuID":5, "MenuVersion":1, … }”) to 10 times its original size, then ran the following timing tests in a shell:


TIMEFORMAT=%R; (time for i in {1..1000}; do
	echo "$largeJsonString" | sed -En 's/"([^"]+)"[[:space:]]*:/|\1|:/g; H; $ {g; s/[[:cntrl:]]+//g; p;}'
done) 2>&1 >/dev/null

# result = 1.582 seconds

TIMEFORMAT=%R; (time for i in {1..1000}; do
	echo "$largeJsonString" | sed -E 'H; $!d; g; s/"([^"]+)"[[:space:]]*:/|\1|:/g; s/[[:cntrl:]]+//g'
done) 2>&1 >/dev/null

# result = 1.544 seconds

The execution times are nearly identical, mine perhaps being ever so slightly faster.

Edit note:

  • I simplified the time commands by removing from my original post the processing of the results through the bc calculator, which was unnecessary.

In the context of do shell script, I’m finding that mine’s faster. :wink: It’s probably of limited relevance for determining whether one way’s generally better than the other, as the test script’s designed to massage a particular kind of data in a particular way and has only been tested with a particular set of data. Also, of course, it was established back in post #7 that sed isn’t the right tool for parsing JSON data. :slight_smile: But anyway, here’s my test script:


set jsonString to "{ 
    \"MenuID\":5, 
    \"MenuVersion\":1, 
    \"MenuName\":\"Lunch Menu\", 
    \"MenuItems\":[ 
       { 
            \"Name\":\"TUSCANI MEDITERRANEAN CON POLLO\", 
            \"Description\":\"Pasta\", 
            \"PKID\":2, 
            \"ParentID\":1, 
            \"Ingredients\":[ 
               { 
                    \"PKID\":123, 
                    \"IngName\":\"Cheese\", 
                    \"Included\":true, 
                    \"ExtraPrice\":0
               }, 
               { 
                    \"PKID\":124, 
                    \"IngName\":\"Sausage\", 
                    \"Included\":false, 
                    \"ExtraPrice\":0.99
               } 
           ], 
            \"ItemPricing\":[ 
               { 
                    \"PKID\":456, 
                    \"SizeName\":\"Large\", 
                    \"SizePrice\":12.99
               }, 
               { 
                    \"PKID\":678, 
                    \"SizeName\":\"Small\", 
                    \"SizePrice\":14.99
               } 
           ]
       } 
   ]
}"

set astid to AppleScript's text item delimiters
set AppleScript's text item delimiters to return
-- set jsonString to jsonString's paragraphs as text -- Uncomment to test with return line endings instead of linefeeds.
set AppleScript's text item delimiters to astid

set jsonString2 to jsonString & jsonString
set jsonString4 to jsonString2 & jsonString2
set jsonString10 to jsonString4 & jsonString4 & jsonString2

-- Four versions of the sed code.
set nDequoteFirst to "sed -En 's/\"([^\"]+)\"[[:space:]]*:/|\\1|:/g; H; $ {g; s/[[:cntrl:]]+//g; p;}'"
set nHoldFirst to "sed -En 'H; $ {g; s/\"([^\"]+)\"[[:space:]]*:/|\\1|:/g; s/[[:cntrl:]]+//g; p;}'"
set dDequoteFirst to "sed -E 's/\"([^\"]+)\"[[:space:]]*:/|\\1|:/g; H; $!d; g; s/[[:cntrl:]]+//g'"
set dHoldFirst to "sed -E 'H; $!d; g; s/\"([^\"]+)\"[[:space:]]*:/|\\1|:/g; s/[[:cntrl:]]+//g'"

-- Compare the times for 1000 iterations with any two.
compareTimes(jsonString10, nDequoteFirst, dHoldFirst)

on compareTimes(testString, sed1, sed2)
	do shell script "TIMEFORMAT=%R; (time for i in {1..1000}; do
	echo " & testString's quoted form & " | " & sed1 & "
done) 2>&1 >/dev/null

TIMEFORMAT=%R; (time for i in {1..1000}; do
	echo " & testString's quoted form & " | " & sed2 & "
done) 2>&1 >/dev/null"
	return result
end compareTimes

And here’s Shane’s ASObjC one-liner from post #11, unfolded and commented:

use AppleScript version "2.4" -- Mac OS 10.10 (Yosemite) or later.
use framework "Foundation"

-- set jsonString as in the previous scripts.

-- Get an ObjC version of the JSON text.
set jsonNSString to current application's NSString's stringWithString:jsonString
-- Get a data version of that.
set jsonData to jsonNSString's dataUsingEncoding:(current application's NSUTF8StringEncoding)
-- Derive the equivalent ObjC object from the JSON data.
set ASObjCValue to current application's NSJSONSerialization's JSONObjectWithData:jsonData options:0 |error|:(missing value)
-- Assuming the object's an NSDictionary, coerce it to an AppleScript record.
set applescriptValue to ASObjCValue as record
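The "Assuming the object's an NSDictionary" comment matters because a JSON document's top level can also be an array. As a sketch of the same check in plain JavaScript (the function name parseJSONObject is made up for this example), one might verify that the parsed value is a non-array object before treating it like a record:

```javascript
// Sketch: check what kind of value the JSON text produced before treating it
// as a dictionary (the analogue of coercing an NSDictionary to a record).
function parseJSONObject(jsonString) {
  const value = JSON.parse(jsonString); // throws SyntaxError on malformed JSON
  if (value === null || typeof value !== "object" || Array.isArray(value)) {
    throw new TypeError("Top-level JSON value is not an object");
  }
  return value;
}

const menu = parseJSONObject('{"MenuID":5,"MenuName":"Lunch Menu"}');
console.log(menu.MenuName); // Lunch Menu

try {
  parseJSONObject("[1, 2, 3]");
} catch (e) {
  console.log(e.message); // Top-level JSON value is not an object
}
```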

Virtually identical times on my end.

Agreed. I hope we haven’t exceeded Shane’s patience while we were having a little sed fun.

Oh, I’m entertained :slight_smile:

So this:

do shell script ("echo " & jsonString's quoted form & " | sed -E 'H; $!d; g; s/\"([^\"]+)\"[[:space:]]*:/|\\1|:/g; s/[[:cntrl:]]+//g'")

gives the same result with your test code as this:

set jsonString to current application's NSString's stringWithString:jsonString
set jsonString to jsonString's stringByReplacingOccurrencesOfString:"\\\"([^\"]+)\\\"[[:space:]]*:" withString:"|$1|:" options:(current application's NSRegularExpressionSearch) range:{0, jsonString's |length|()}
set jsonString to jsonString's stringByReplacingOccurrencesOfString:"[[:cntrl:]]+" withString:"" options:(current application's NSRegularExpressionSearch) range:{0, jsonString's |length|()}
jsonString as text

When I compare them in Script Geek, running 1000 times with Nigel’s jsonString10, the latter takes about 1.3 seconds (including the time to create jsonString10). The former, with the overhead of do shell script, takes 38+ seconds.
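The in-process approach carries over to any language with a regex engine. Here's a JavaScript sketch of the same two substitutions. The function name toRecordSource is hypothetical, and [\x00-\x1f\x7f] is used as the equivalent of [[:cntrl:]] for ASCII text. It has the same blind spots as the sed version, such as a quoted section followed by a colon inside a string value.

```javascript
// Sketch: the same two substitutions as the sed command, done in-process with
// JavaScript regular expressions instead of via a shell.
function toRecordSource(jsonString) {
  return jsonString
    .replace(/"([^"]+)"[ \t\n\v\f\r]*:/g, "|$1|:") // "key" : -> |key|:
    .replace(/[\x00-\x1f\x7f]+/g, "");             // strip control characters, like [[:cntrl:]]
}

const pretty = '{\n  "MenuID" : 5,\n  "MenuName" : "Lunch Menu"\n}';
console.log(toRecordSource(pretty));
// {  |MenuID|: 5,  |MenuName|: "Lunch Menu"}
```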

I think I need another cup of coffee.

I realize this is blasphemy in this forum, but if you deal a lot with JSON strings, have you considered using JavaScript for Automation (JXA) instead of AppleScript?

There are two main issues with using JSON strings in AppleScript:

  1. Parsing the JSON string into usable data (records usually)
  2. Ease of use of the results of #1.

Of course, all of this is built into core JavaScript, which JXA uses. And JXA gives you the same access to Apple Events as AppleScript.

Parsing a JSON string and converting the result back to a string are both simple, effective one-liners:


//--- Parse JSON String to Create Object ---
var oMenu = JSON.parse(jsonString);

//--- Make Some Changes to oMenu ---
oMenu.MenuName = "Dinner Menu";

//--- Now Create a New JSON String ---
var menuJSONStr = JSON.stringify(oMenu);

//--- LOG Pretty JSON String ---
console.log(JSON.stringify(oMenu, null, 4));


JavaScript’s objects and arrays are much more powerful and easier to use than AppleScript records and lists. We all know how slow and difficult AppleScript records are.
For more info on JavaScript objects, see: JavaScript Objects in Detail
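As a small illustration of that point, here's a sketch working with a cut-down copy of Nigel's menu data in plain JavaScript. It filters a nested array, lists an object's keys and tests whether a key exists. Plain AppleScript has no direct way to do the last two with a record.

```javascript
// Sketch: working with nested JSON data as plain JavaScript objects and arrays.
// The JSON below is a cut-down version of the menu data used in this thread.
const jsonString = `{
  "MenuName": "Lunch Menu",
  "MenuItems": [{
    "Name": "TUSCANI MEDITERRANEAN CON POLLO",
    "Ingredients": [
      { "PKID": 123, "IngName": "Cheese", "Included": true, "ExtraPrice": 0 },
      { "PKID": 124, "IngName": "Sausage", "Included": false, "ExtraPrice": 0.99 }
    ]
  }]
}`;

const oMenu = JSON.parse(jsonString);
const item = oMenu.MenuItems[0];

// Names of the ingredients included by default.
const included = item.Ingredients.filter((ing) => ing.Included).map((ing) => ing.IngName);
console.log(included.join(", ")); // Cheese

// Keys can be listed and tested for at run time.
console.log(Object.keys(item).join(", ")); // Name, Ingredients
console.log("Description" in item); // false
```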

While I am far from a JXA expert, I’ll be glad to help you, or anyone, write (or convert from AppleScript) a JXA script that uses JSON strings.

Here’s the full JXA script. Open it in Script Editor and be sure to set the language to “JavaScript”.
(I had to use the AppleScript code tags since this forum does not provide tags for JavaScript.)


//--- JSON String from Nigel's Post ---

var jsonString = `{ 
\"MenuID\":5, 
\"MenuVersion\":1, 
\"MenuName\":\"Lunch Menu\", 
\"MenuItems\":[ 
{ 
\"Name\":\"TUSCANI MEDITERRANEAN CON POLLO\", 
\"Description\":\"Pasta\", 
\"PKID\":2, 
\"ParentID\":1, 
\"Ingredients\":[ 
{ 
\"PKID\":123, 
\"IngName\":\"Cheese\", 
\"Included\":true, 
\"ExtraPrice\":0
}, 
{ 
\"PKID\":124, 
\"IngName\":\"Sausage\", 
\"Included\":false, 
\"ExtraPrice\":0.99
} 
], 
\"ItemPricing\":[ 
{ 
\"PKID\":456, 
\"SizeName\":\"Large\", 
\"SizePrice\":12.99
}, 
{ 
\"PKID\":678, 
\"SizeName\":\"Small\", 
\"SizePrice\":14.99
} 
]
} 
]
}`

//--- Parse JSON String into Complex Object ---
var oMenu = JSON.parse(jsonString);

//--- Make Some Changes to oMenu ---
oMenu.MenuName = "Dinner Menu";

//--- Now Create a New JSON String ---
var menuJSONStr = JSON.stringify(oMenu);

//--- LOG Pretty JSON String ---
console.log(JSON.stringify(oMenu, null, 4));

Well. A trifle on the heretical side, perhaps. :wink:

But seriously, although this thread’s specifically about getting JSON data with AppleScript, if anyone thinks the OP may find it more convenient to use JavaScript or anything else instead, I reckon it’s legitimate to mention it.

Actually, since JavaScript can be used with ‘osascript’, it suggests a way to make the shell scripts discussed above more reliable, which would be handy on systems not able to use ASObjC. Namely, use JavaScript’s JSON.parse() function to produce very nearly the required ‘run script’ string (hopefully taking care of all the interpretation pitfalls) and just leave sed the relatively uncontroversial task of replacing the quotes round keys with bars. It’s quite easy to extend the sed regex so that it doesn’t act when quotes are escaped in string “values”: the second sed command in the full script below only matches an opening quote that isn’t preceded by a backslash.

And since JSON.parse() takes care of the line endings, there’s no danger of sed deleting any line endings or other control codes in strings.
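Here's the difference the backslash guard makes, sketched with JavaScript regexes standing in for sed -E (\s replaces [[:space:]]; otherwise the patterns match the earlier and the extended sed expressions). The test line is a made-up, one-line version of the data with an escaped quoted section followed by a colon.

```javascript
// Sketch: the "key" regex with and without the guard against escaped quotes.
const naive = /"([^"]+)"\s*:/g;
const guarded = /([^\\])"([^"]+)"\s*:/g; // the char before the opening quote must not be a backslash

// A one-line JSON-style string whose value contains an escaped quoted section
// followed by a colon.
const line = String.raw`{"Name":"A \"B\":", "x":1}`;

console.log(line.replace(naive, "|$1|:"));     // {|Name|:"A \|B\|:", |x|:1}  (value mangled)
console.log(line.replace(guarded, "$1|$2|:")); // {|Name|:"A \"B\":", |x|:1}
```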

Unfortunately, the JSON string has to be double-escaped before being fed into the JavaScript code in the shell script. I’ve used sed for this, tick-enquoting the string at the same time; it’s the sed command inside the command substitution in the full script below.
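What that escaping step accomplishes can be sketched in JavaScript. The helper toTemplateLiteralSource is hypothetical, and new Function is used here only to evaluate the generated source. Doubling the backslashes means the template literal gives back the JSON text unchanged; without it, the literal consumes the JSON escapes itself. Like the sed version, the sketch doesn't deal with backticks or ${ in the data.

```javascript
// Sketch: what the sed step 's/\\/&&/g; 1s/^/`/; $s/$/`/' does, and why.
// Doubling each backslash and wrapping the text in backticks turns the JSON text
// into JavaScript source for a template literal whose value is the original text.
function toTemplateLiteralSource(text) {
  return "`" + text.replace(/\\/g, "\\\\") + "`";
}

const json = String.raw`{"Name":"TUSCANI \nMEDITERRANEAN \"CON  POLLO\":"}`;
const source = toTemplateLiteralSource(json);

// Evaluating the literal gives back the JSON text unchanged...
const roundTrip = new Function("return " + source)();
console.log(roundTrip === json); // true
// ...so JSON.parse sees the escapes it expects.
console.log(JSON.stringify(JSON.parse(roundTrip).Name)); // "TUSCANI \nMEDITERRANEAN \"CON  POLLO\":"

// Without the doubling, the template literal turns \n into a real linefeed and \" into ",
// and JSON.parse then rejects the result.
let failed = false;
try {
  JSON.parse(new Function("return `" + json + "`")());
} catch (e) {
  failed = e instanceof SyntaxError;
}
console.log(failed); // true
```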


Here’s the full script. The data differ slightly from those in previous posts in that the “Name” value contains a linefeed and a quoted section followed by a colon:


set jsonString to "{ 
    \"MenuID\":5, 
    \"MenuVersion\":1, 
    \"MenuName\":\"Lunch Menu\", 
    \"MenuItems\":[ 
       { 
            \"Name\":\"TUSCANI \\nMEDITERRANEAN \\\"CON  POLLO\\\":\", 
            \"Description\":\"Pasta\", 
            \"PKID\":2, 
            \"ParentID\":1, 
            \"Ingredients\":[ 
               { 
                    \"PKID\":123, 
                    \"IngName\":\"Cheese\", 
                    \"Included\":true, 
                    \"ExtraPrice\":0
               }, 
               { 
                    \"PKID\":124, 
                    \"IngName\":\"Sausage\", 
                    \"Included\":false, 
                    \"ExtraPrice\":0.99
               } 
           ], 
            \"ItemPricing\":[ 
               { 
                    \"PKID\":456, 
                    \"SizeName\":\"Large\", 
                    \"SizePrice\":12.99
               }, 
               { 
                    \"PKID\":678, 
                    \"SizeName\":\"Small\", 
                    \"SizePrice\":14.99
               } 
           ]
       } 
   ]
}"

do shell script ("osascript -l JavaScript -s s -e " & "'JSON.parse('\"$(sed 's/\\\\/&&/g; 1s/^/`/; $s/$/`/' <<<" & quoted form of jsonString & " )\"');' | sed -E 's/([^\\])\"([^\"]+)\"[[:space:]]*:/\\1|\\2|:/g;'")
run script result

Unless I’m misunderstanding you, the point is somewhat moot in that JXA was only introduced with macOS 10.10, by which time ASObjC was already available.

For the record, there are a couple of ASObjC ways to call JXA. The first is via OSAKit:

use AppleScript version "2.3.1"
use framework "Foundation"
use framework "OSAKit"
use scripting additions

set jsonString to "{ 
--
}"

set theLang to current application's OSALanguage's languageForName:"JavaScript"
set theScript to current application's OSAScript's alloc()'s initWithSource:("JSON.parse(`" & jsonString & "`)") language:theLang
set {theResult, theError} to theScript's executeAndReturnError:(reference)
theResult as record

You can also call individual functions using OSAKit.

The other way is a bit faster, using JavaScriptCore:

use scripting additions
use framework "Foundation"
use framework "JavaScriptCore"

set jsonString to "{ 
--
}"

set theContext to current application's JSContext's new()
-- get the JSON value as a record
set theJSValue to theContext's evaluateScript:("JSON.parse(`" & jsonString & "`)")
theJSValue's toObject() as record
-- or extract a value from the JSON
set theJSValue to theContext's evaluateScript:("JSON.parse(`" & jsonString & "`).MenuName")
theJSValue's toObject() as text

You can also get and set JXA properties. For example:

use scripting additions
use framework "Foundation"
use framework "JavaScriptCore"

set jsonString to "{ 
--
}"

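set theContext to current application's JSContext's new()
-- make jsonString available to the JavaScript context as a variable
theContext's setObject:jsonString forKeyedSubscript:"jsonString"
set theJSValue to theContext's evaluateScript:("JSON.parse(jsonString).MenuName")
theJSValue's toObject() as text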
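A likely reason the setObject:forKeyedSubscript: route is the sturdier one: splicing JSON text into a template literal, as in the JSON.parse(`…`) calls above, lets JavaScript interpret the JSON's own backslash escapes first. The sketch below shows this in plain JavaScript, with a Function parameter standing in (as an assumption, for illustration) for a variable set on the JSContext.

```javascript
// Sketch: splicing JSON text into script source versus handing it over as a value.
const jsonString = String.raw`{"MenuName":"Lunch \"Special\" Menu"}`;

// 1. Spliced into a template literal: \" becomes " before JSON.parse runs,
//    leaving unescaped quotes inside the string value.
let splicedError = null;
try {
  new Function("return JSON.parse(`" + jsonString + "`).MenuName")();
} catch (e) {
  splicedError = e.name;
}
console.log(splicedError); // SyntaxError

// 2. Passed in as a value (like a JSContext variable): the JSON text reaches JSON.parse untouched.
const getMenuName = new Function("jsonString", "return JSON.parse(jsonString).MenuName");
console.log(getMenuName(jsonString)); // Lunch "Special" Menu
```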

Ah. Sorry. I was assuming that since osascript’s -l option has been available since OS 10.4, and it’s been possible to choose the default language in Script Editor and to run JavaScripts in Safari since long before then, JavaScript must have been one of the languages available on the system. But it wasn’t. The script doesn’t work in either Tiger or Leopard on my G5, complaining that there’s ‘no such component: “JavaScript”.’ It must have been necessary back then to download the language from a third party.

That’s right. In theory anyone can write their own OSA component, but in practice very few have.

This background app also works and is pretty straightforward:

https://itunes.apple.com/us/app/json-helper-for-applescript/id453114608?mt=12

Hey Shane

The JSON I’m parsing has its fields inside a records field, so if there are no fields there is still a record and the code doesn’t error.
My solution is to count the fields of theJSON. Is there a faster way?

	set theJSON to (current application's NSJSONSerialization's JSONObjectWithData:((current application's NSString's stringWithString:jsonString)'s dataUsingEncoding:(current application's NSUTF8StringEncoding)) options:0 |error|:(missing value))
	log theJSON
	if (count of fields of |records| of theJSON) is 0 then
		display alert "Sku not found"
		error number -128
	end if

If I understand correctly, that’s reasonable. But the approach of asking for fields of |records| of… is forcing coercion of a dictionary to a record. If there’s not much info, that’s neither here nor there, but you can avoid it, probably something like this:

if ((theJSON's valueForKeyPath:"records.fields")'s |count|()) is 0 then

But it’s hard to be sure without seeing what the JSON value looks like.
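For comparison, here's the same "is it empty?" test sketched in JavaScript. The JSON shape is a guess (the real data wasn't posted), and hasFields is a hypothetical helper. The point is that you can look up the nested value and count its keys directly, which is roughly what valueForKeyPath: plus count does on the ObjC side.

```javascript
// Sketch: test whether a nested "fields" object is empty.
// The JSON shape used below is hypothetical.
function hasFields(data) {
  const fields = data?.records?.fields;
  // Treat a missing or non-object "fields" value as having no fields.
  if (fields === null || typeof fields !== "object") return false;
  return Object.keys(fields).length > 0;
}

console.log(hasFields(JSON.parse('{"records":{"fields":{"Sku":"A123"}}}'))); // true
console.log(hasFields(JSON.parse('{"records":{"fields":{}}}')));             // false
console.log(hasFields(JSON.parse('{"records":{}}')));                        // false
```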