Thursday, September 20, 2018

#1 2018-03-04 06:08:34 pm

bmose
Member
From: Massachusetts
Registered: 2006-01-03
Posts: 281

Native AppleScript decoder for dynamic Uniform Type Identifiers

NOTE: An excellent resource for decoding and interpreting dynamic Uniform Type Identifiers, along with Objective-C and Swift decoders based on that resource, has been published previously. This post describes a vanilla AppleScript version of the decoder, decodeDynUTI, which takes an alternative approach to the bitwise operations of the other decoders, demonstrates satisfactory execution speed (< 0.01 seconds per conversion), and is readily incorporated into native AppleScript and ASObjC code. (Tested in macOS 10.13.3)

Apple's Uniform Type Identifier (UTI) system provides an elegant way of typing file system items. A straightforward way to get an item's UTI in AppleScript is to query the item's type identifier property through the System Events application. For example:

Applescript:


tell application "System Events" to return type identifier of alias "...HFS path to jpeg image file..." --> "public.jpeg"

For common data types such as this one, the text of the UTI immediately identifies the item's type, for example, "public.jpeg" for a JPEG image file or "com.adobe.pdf" for an Adobe PDF file.

The operating system recognizes a large number of UTIs for common data types. However, as stated in Apple's documentation, one may occasionally encounter a file system item without an assigned UTI. This might be the case, for instance, with a file of a new or obscure type that the operating system does not recognize. In these cases, the operating system dynamically assigns a UTI that starts with the domain name "dyn.", followed (always, it seems) by the character "a", followed by a series of unintelligible characters (described as "opaque" in Apple's documentation).

Beneath the "opaque" character sequence, however, lies encoded information. The characters following "dyn.a" are encoded with a custom base-32 scheme in which the characters abcdefghkmnpqrstuvwxyz0123456789 represent the decimal values 0 through 31. A dynamic UTI can therefore be decoded in two steps: first, convert each character following "dyn.a" into its 5-bit value, left-padded with zeros, and concatenate the results into a single bit string; then, convert each successive 8-bit group, in left-to-right order, into its Unicode character equivalent.
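To make the two decoding steps concrete, here is a minimal worked sketch that decodes by hand the five characters "h62d4", which are the first five characters after "dyn.a" in the example UTI used later in this post. It is not the decodeDynUTI handler; the variable names (base32Chars, encodedRun, bitString, charCode, decodedText) are made up for illustration, and it computes each 5-bit value arithmetically rather than looking it up in a list.

```applescript
-- Illustrative sketch only; all names here are hypothetical and not part of decodeDynUTI
set base32Chars to "abcdefghkmnpqrstuvwxyz0123456789"
set encodedRun to "h62d4"

-- Step 1: map each character to its value 0...31 and write that value as 5 bits
set bitString to ""
considering case
	repeat with c in characters of encodedRun
		set v to (offset of (contents of c) in base32Chars) - 1
		if v < 0 then error "Invalid character: " & (contents of c)
		set fiveBits to ""
		repeat 5 times
			set fiveBits to ((v mod 2) as text) & fiveBits
			set v to v div 2
		end repeat
		set bitString to bitString & fiveBits
	end repeat
end considering
-- bitString is now "0011111100110000001111010" (h = 7, 6 = 28, 2 = 24, d = 3, 4 = 26)

-- Step 2: read each complete 8-bit group from the left as a character code
set decodedText to ""
repeat while (length of bitString) ≥ 8
	set charCode to 0
	repeat with b in characters 1 thru 8 of bitString
		set charCode to charCode * 2 + ((contents of b) as integer)
	end repeat
	set decodedText to decodedText & (character id charCode)
	if (length of bitString) > 8 then
		set bitString to text 9 thru -1 of bitString
	else
		set bitString to ""
	end if
end repeat
decodedText --> "?0="
```

The 25 bits split into 00111111 (63, "?"), 00110000 (48, "0"), and 00111101 (61, "="), with one bit left over that is consumed by the characters that follow in the full UTI.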

Taking examples from the aforementioned resource, here is the output of the decoder run on the dynamic UTI "dyn.ah62d4r34gq81k3p2su1zuppgsm10esvvhzxhe55c":

Applescript:


decodeDynUTI("dyn.ah62d4r34gq81k3p2su1zuppgsm10esvvhzxhe55c")
--> "?0=7:3=text/X-frob:1=frob"

And here is the output for the dynamic UTI "dyn.ah62d4r3qkk7dgtpyqz6hkp42fzxhe55cfvy042phqy1zuppgsm10esvvhzxhe55c":

Applescript:

decodeDynUTI("dyn.ah62d4r3qkk7dgtpyqz6hkp42fzxhe55cfvy042phqy1zuppgsm10esvvhzxhe55c")
--> "?0=7,B:3=text/X-frob,image/X-frob:1=frob"

What do the decoded text strings mean? Although undocumented by Apple, the interpretation of the decoded strings is described in the aforementioned resource. Here are a few important highlights from that resource:

1) The decoded string consists of colon-delimited expressions of the form: [UTI]=[value]
2) If an expression has multiple values (e.g., multiple UTIs to which the item conforms), the values are separated by commas:
    [UTI]=[value1,value2,...]
3) The hexadecimal digits 0 through F are used as abbreviations for the following common UTIs:
    ?0: UTTypeConformsTo    (the purpose of the "?" prefix is unexplained; perhaps it signifies that the value, UTTypeConformsTo, is not a UTI)
    1:  public.filename-extension
    2:  com.apple.ostype
    3:  public.mime-type
    4:  com.apple.nspboard-type
    5:  public.url-scheme
    6:  public.data
    7:  public.text
    8:  public.plain-text
    9:  public.utf16-plain-text
    A:  com.apple.traditional-mac-plain-text
    B:  public.image
    C:  public.video
    D:  public.audio
    E:  public.directory
    F:  public.folder
4) The following special characters must be escaped with a backslash if they appear as literal characters in a UTI or value:
    , : = \ NUL
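The rules above can be sketched as a small handler that expands a decoded string into readable expressions. This is only an illustration under stated assumptions: the handler names expandDecodedUTI and expandAbbreviation are hypothetical, backslash escapes (rule 4) are not handled, and hexadecimal abbreviations in values are expanded only when the key is UTTypeConformsTo, since values of other keys (a filename extension, a mime type) are not UTIs.

```applescript
-- Hypothetical helper handlers; not part of the original decoder
-- Assumption: the decoded string contains no backslash-escaped characters
on expandDecodedUTI(decodedString)
	-- Position n + 1 in this list holds the name abbreviated by hexadecimal digit n
	set utiNames to {"UTTypeConformsTo", "public.filename-extension", "com.apple.ostype", "public.mime-type", "com.apple.nspboard-type", "public.url-scheme", "public.data", "public.text", "public.plain-text", "public.utf16-plain-text", "com.apple.traditional-mac-plain-text", "public.image", "public.video", "public.audio", "public.directory", "public.folder"}
	set tid to AppleScript's text item delimiters
	set expandedExpressions to {}
	try
		-- Split the string into colon-delimited expressions
		set AppleScript's text item delimiters to ":"
		set expressionList to text items of decodedString
		repeat with expr in expressionList
			-- Split each expression into its key and its comma-delimited values
			set AppleScript's text item delimiters to "="
			set parts to text items of (contents of expr)
			if (count of parts) ≠ 2 then error "Unexpected expression: " & (contents of expr)
			set keyText to item 1 of parts
			-- Drop the "?" prefix seen on the UTTypeConformsTo key
			if (keyText starts with "?") and ((length of keyText) > 1) then set keyText to text 2 thru -1 of keyText
			set keyName to my expandAbbreviation(keyText, utiNames)
			set AppleScript's text item delimiters to ","
			set valueList to text items of (item 2 of parts)
			-- Assumption: only conformance values are abbreviated UTIs
			if keyName is "UTTypeConformsTo" then
				repeat with i from 1 to count of valueList
					set item i of valueList to my expandAbbreviation(item i of valueList, utiNames)
				end repeat
			end if
			-- The delimiter is still ",", so the values are rejoined with commas
			set end of expandedExpressions to keyName & "=" & (valueList as text)
		end repeat
	on error errMsg number errNum
		set AppleScript's text item delimiters to tid
		error errMsg number errNum
	end try
	set AppleScript's text item delimiters to tid
	return expandedExpressions
end expandDecodedUTI

on expandAbbreviation(abbrev, utiNames)
	-- Returns the full name for a single hexadecimal digit, or the input unchanged
	if (length of abbrev) ≠ 1 then return abbrev
	considering case
		set digitIndex to offset of abbrev in "0123456789ABCDEF"
	end considering
	if digitIndex = 0 then return abbrev
	return item digitIndex of utiNames
end expandAbbreviation

expandDecodedUTI("?0=7,B:3=text/X-frob,image/X-frob:1=frob")
--> {"UTTypeConformsTo=public.text,public.image", "public.mime-type=text/X-frob,image/X-frob", "public.filename-extension=frob"}
```

Returning a list of expanded expressions, rather than a single string, leaves it to the caller to decide how to present or further process each expression.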

With this information at hand, we can now interpret the decoded results from the examples.

?0=7:3=text/X-frob:1=frob means:

    ?0=7  ->  UTTypeConformsTo=public.text  ->  the item conforms to the UTI public.text
    3=text/X-frob  ->  public.mime-type=text/X-frob  ->  the item's mime type is text/X-frob
    1=frob  ->  public.filename-extension=frob  ->  the item's filename extension is frob

?0=7,B:3=text/X-frob,image/X-frob:1=frob means:

    ?0=7,B  ->  UTTypeConformsTo=public.text,public.image  ->  the item conforms to both the UTI public.text and the UTI public.image
    3=text/X-frob,image/X-frob  ->  public.mime-type=text/X-frob,image/X-frob  ->  the item's mime types are text/X-frob and image/X-frob  (note: the system only recognizes the first mime type, as discussed in the resource; this odd combination of mime types for a single item reflects the fact that the author of the resource used a contrived example purely for demonstration purposes)
    1=frob  ->  public.filename-extension=frob  ->  the item's filename extension is frob

Handler:

Applescript:


on decodeDynUTI(dynamicUTI)
   -- Decodes a custom base-32-encoded dynamic Uniform Type Identifier of the form "dyn.a...", and returns the decoded text string
   -- Note: The handler only recognizes dynamic UTIs whose first letter following the domain name "dyn." is "a"
   script util
       -- Custom base-32 encoding scheme characters and their 5-bit equivalent bitstring values
       property customBase32Chars : "abcdefghkmnpqrstuvwxyz0123456789"
       property bitstringValues : {"00000", "00001", "00010", "00011", "00100", "00101", "00110", "00111", "01000", "01001", "01010", "01011", "01100", "01101", "01110", "01111", "10000", "10001", "10010", "10011", "10100", "10101", "10110", "10111", "11000", "11001", "11010", "11011", "11100", "11101", "11110", "11111"}
       -- Main handler
       on run
           tell dynamicUTI
               -- Perform a preliminary validation of the input argument
               if (its class ≠ text) or (it does not start with "dyn.a") then error "The input argument is not a dynamic Uniform Type Identifier of the form \"dyn.a[...]\"."
               -- Handle the special case of a dynamic UTI without content
               if length = 5 then return ""
               -- Convert the relevant portion of the dynamic UTI to its decoded bitstring equivalent with the recursive handler dynUTIToBitstring
               set currBitstring to my dynUTIToBitstring(text 6 thru -1)
           end tell
           -- Convert the bitstring to a Unicode character string with the recursive handler bitstringToUnicodeString
           set decodedString to my bitstringToUnicodeString(currBitstring)
           -- Return the decoded string
           return decodedString
       end run
       -- Utility handlers
       on dynUTIToBitstring(currString)
           -- Converts a custom base-32-encoded dynamic UTI string to its equivalent bitstring by replacing each input character with a corresponding 5-bit substring
           tell currString
               -- Handle the special case of an empty input string
               if length = 0 then return ""
               -- Get the index position in the custom base-32 and bitstring lists of the input string's first character
               set tid to AppleScript's text item delimiters
               try
                   set AppleScript's text item delimiters to text 1
                   set currIndex to (my customBase32Chars's first text item's length) + 1
               end try
               set AppleScript's text item delimiters to tid
               -- Throw an error if the input string's first character doesn't match any entry in the custom base-32 list
               if currIndex > my customBase32Chars's length then error "The following character in the dynamic UTI is invalid: " & return & return & tab & (text 1)
               -- Get the 5-character bitstring equivalent of the first input character
               set currBitstring to my bitstringValues's item currIndex
               -- If the dynamic UTI consists of only one character, return its 5-bit equivalent value
               if length = 1 then return currBitstring
               -- Otherwise, return the first character's 5-bit equivalent value concatenated with the 5-bit equivalent values of the remaining characters obtained recursively through the current handler
               return currBitstring & my dynUTIToBitstring(text 2 thru -1)
           end tell
       end dynUTIToBitstring
       on bitstringToUnicodeString(currBitstring)
           -- Converts a bitstring to its equivalent Unicode string by replacing 8-bit substrings in left-to-right order with corresponding Unicode characters
           tell currBitstring
               -- If the input bitstring is empty (i.e., the dynamic UTI has been fully processed) or is < 8 bits long and consists only of extraneous "0"'s (as is sometimes encountered in valid dynamic UTIs), return the empty string
               if (length = 0) or (it is in "0000000") then return ""
               -- If the input bitstring is < 8 bits long and has non-zero content, throw an "extraneous bits" error
               if length < 8 then error "The input argument is invalid because " & ({"there is 1 leftover trailing non-zero bit", "there are " & length & " leftover trailing non-zero bits"}'s item (1 + ((length > 1) as integer))) & " after processing all 8-bit substrings."
               -- Get the Unicode character equivalent of the first 8 bits
               set currUnicodeChar to character id ((128 * (text 1) + 64 * (text 2) + 32 * (text 3) + 16 * (text 4) + 8 * (text 5) + 4 * (text 6) + 2 * (text 7) + (text 8)))
               -- If the input bitstring is only 8 bits long, return its Unicode character equivalent
               if length = 8 then return currUnicodeChar
               -- Otherwise, return the Unicode character equivalent of the first 8 bits concatenated with the Unicode character equivalents of the remaining 8-bit substrings obtained recursively through the current handler
               return currUnicodeChar & my bitstringToUnicodeString(text 9 thru -1)
           end tell
       end bitstringToUnicodeString
   end script
   -- Decode the input text string, and return the decoded string
   return (run util) as text
end decodeDynUTI

Note: A minor edit was made to the dynUTIToBitstring handler code without any material functional changes since the original submission.

Last edited by bmose (2018-03-04 10:44:17 pm)


Filed under: UTI, decoder, dynamic uti
