AppleScript to read MP3 metatags and sort files into folders

Thanks, Nigel.

Yvan has also found a couple of issues with accents. It looks like mdls is escaping non-ASCII characters – so “e\U0301” for “é”, for example. My guess is this is because the record is normally output in the old NeXT property list format, which was ASCII-only. This makes the mdls method a bit less attractive.

There also seems to be some other mangling of some accented characters going on, which looks like an underlying Unicode normalization issue.

That’s a pretty nice way to work around the asynchronous behavior of NSMetadataQuery.

Go to messages #53 & 54

Yvan KOENIG running Sierra 10.12.4 in French (VALLAURIS, France) Saturday 15 April 2017 17:00:43

Thanks, Yvan. I’ve corrected the script in post #21.

I’ve done some further testing on this and I don’t think my original guess is correct. What seems to be happening is that characters like é are returned fine in fields that store a string, such as kMDItemComment and kMDItemComposer, but mdls returns them as escaped when they are in fields that are stored as lists, such as the kMDItemAuthors field. It happens both with and without the -raw argument. My new guess is that the escaping is a side effect of how the data in such fields is converted to text.
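For anyone who wants to see the difference on their own files, here is a minimal sketch that queries one string field and one list field of the same file. It assumes the chosen file actually has accented text in both tags; the variable names are only illustrative.

```applescript
-- Compare how mdls reports a string attribute and a list attribute of the same file.
set theFile to choose file with prompt "Choose an audio file with accented metadata"
set quotedPath to quoted form of POSIX path of theFile

-- kMDItemComposer holds a single string.
set composerText to do shell script "mdls -raw -name kMDItemComposer " & quotedPath
-- kMDItemAuthors holds a list of strings.
set authorsText to do shell script "mdls -raw -name kMDItemAuthors " & quotedPath

return {composerText, authorsText}
```

If the observation above holds, any accented characters should come back intact in the first item and as “\U….” escapes in the second.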

It turns out this is a non-issue, in that the mangling in the sample file appears to have been done in the storing of the metadata.

The bottom line is that there appear to be issues with using mdls from AppleScript.
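As a rough illustration of reading the same attributes without going through mdls text output, here is a sketch using NSMetadataItem. This is not the ASObjC script referred to elsewhere in this thread, just a minimal example under the assumption that the file has been indexed by Spotlight; the variable names are made up.

```applescript
use AppleScript version "2.4" -- Yosemite or later
use scripting additions
use framework "Foundation"

set theFile to choose file with prompt "Choose an audio file"
set theURL to current application's |NSURL|'s fileURLWithPath:(POSIX path of theFile)

-- initWithURL: returns missing value if no metadata item can be created for the file.
set metadataItem to current application's NSMetadataItem's alloc()'s initWithURL:theURL
if metadataItem is missing value then error "No Spotlight metadata is available for this file."

-- valueForAttribute: returns missing value when the attribute is absent.
set theGenre to metadataItem's valueForAttribute:"kMDItemMusicalGenre"
set theAuthors to metadataItem's valueForAttribute:"kMDItemAuthors"

-- Coerce the Cocoa values to AppleScript values only when they exist.
if theGenre is not missing value then set theGenre to theGenre as text
if theAuthors is not missing value then set theAuthors to theAuthors as list

return {theGenre, theAuthors}
```

Because the values arrive as strings and lists rather than as formatted text, no offset arithmetic is needed to pick them apart.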

Thanks. It’s kind of obvious in a way, but I confess it felt like a bit of a Eureka moment when I thought of it. AppleScript run loops!

Hi Yvan,

You’re correct, your script from post #17 works. I will try not to use the copy/paste method.

I have still been refining it. If any of the genre, key, title or artist tags are not filled out (null), I want the script to prompt you (showing the file), so that you have to fix the tag manually if you want the script to move the file.

I also made a change so it prompts when the file already exists in the target folder. So if you’re processing MP3/AIFF files in bulk, it will show you the offending file that is already there. I have also allowed the script to accept .aiff files.

The next step I’m trying to work out is how to select a folder and then recursively pick up any mp3 or aiff files in it and its subfolders, as I have a few hard drives of mp3s sorted in a deep folder structure.

The script so far:


use AppleScript version "2.4" # requires at least Yosemite
use scripting additions
use framework "Foundation"
-- Creates a new folder. There is no error if the folder already exists, and it will also create intermediate folders if required
on createFolder:POSIXPath
	log "point 1"
	set |⌘| to current application
	log "point 2"
	set theFolderURL to |⌘|'s |NSURL|'s fileURLWithPath:POSIXPath
	log "point 3"
	set theFileManager to |⌘|'s NSFileManager's defaultManager()
	log "point 4"
	set {theResult, theError} to theFileManager's createDirectoryAtURL:theFolderURL withIntermediateDirectories:true attributes:(missing value) |error|:(reference)
	log "point 5"
	if not (theResult as boolean) then error (theError's |localizedDescription|() as text)
	
	log "point 6"
end createFolder:

set theFiles to choose file with prompt "Please select an MP3/AIFF file" of type {"mp3", "aiff"} with multiple selections allowed  #TRYING TO FIND HOW TO SELECT BULK FILES FROM FOLDER/SUBFOLDERS??????

repeat with aFile in theFiles
	my treatThisFile(aFile)
end repeat

on treatThisFile(theFile)
	set posixFilePath to the quoted form of (POSIX path of theFile)
	
	set genreTag to (do shell script "mdls " & posixFilePath & " -name kMDItemMusicalGenre")
	set artistTag to (do shell script "mdls " & posixFilePath & " -name kMDItemAuthors")
	set tempoTag to (do shell script "mdls " & posixFilePath & " -name kMDItemTempo")
	set keyTag to (do shell script "mdls " & posixFilePath & " -name kMDItemKeySignature")
	set titleTag to (do shell script "mdls " & posixFilePath & " -name kMDItemTitle")
	
	set theGenre to text ((offset of "= " in genreTag) + 3) through -2 of genreTag
	set theArtist to text ((offset of "= " in artistTag) + 9) through -4 of artistTag
	set theTempo to text ((offset of "= " in tempoTag) + 2) through -1 of tempoTag
	set theKey to text ((offset of "= " in keyTag) + 3) through -2 of keyTag
	set theTitle to text ((offset of "= " in titleTag) + 3) through -2 of titleTag
	
	#SOME MP3 TAGS ARE NOT PROPERLY FILLED OUT- THIS BELOW PART DOESNT WORK AS SCRIPT PROCEEDS??????.
	
	if {theKey, theArtist, theTempo} is "null" then
		display dialog "Error: " & theTitle & " has a blank field - Please Fix"
		
	end if
	
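	# For some files, theTitle is returned as "null", so it may generate duplicates # ADDED
	# I chose to use the original name stripped of its extension # ADDED
	
	if theTitle is "null" then # ADDED
		tell application "System Events" # ADDED
			set fileName to name of theFile # ADDED
			set fileExtension to name extension of theFile # works for ".mp3" and ".aiff" alike
		end tell # ADDED
		# Drop the extension and the dot before it
		set theTitle to text 1 thru -((count fileExtension) + 2) of fileName
	end if # ADDED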
	
	set renamedTitle to ("|" & theTempo & "-" & theKey & " |  " & theTitle)
	-- set name of theFile to renamedTitle # MISPLACED must be in a finder or a System Events block
	
	set DestFolder to ("/Volumes/Production/test/" & theGenre & "/" & theArtist)
	--set DestFolder to ("/Volumes/Macintosh HD/Users/Important/Test/" & theGenre & "/" & theArtist)
	
	my createFolder:DestFolder
	
	set DestFolder to POSIX file DestFolder
	
	tell application "Finder"
		set theExtension to name extension of theFile
		set name of theFile to renamedTitle & "." & theExtension  #AS MAYBE AIFF OR MP3
		try
			move theFile to DestFolder # EDITED
		on error errMsg
			display dialog renamedTitle & errMsg  #IF ALREADY EXISTS WILL PROMPT
		end try
		
	end tell
	
end treatThisFile
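On the open question in the script above (picking up files from a folder and all its subfolders), one possible approach is to let the shell’s find command do the recursion. This is only a sketch: it assumes no file name contains a line break, and the variable names are illustrative. Each resulting alias could then be passed to treatThisFile in place of the choose file result.

```applescript
-- Collect every .mp3/.aiff/.aif file below a chosen folder, however deeply nested.
set theFolder to choose folder with prompt "Select the folder to scan for MP3/AIFF files"
set folderPath to POSIX path of theFolder
-- Drop the trailing slash so find doesn't report paths containing "//".
if folderPath ends with "/" and folderPath is not "/" then set folderPath to text 1 thru -2 of folderPath

-- find does the recursion; -iname matches the extension case-insensitively,
-- and ! -name '.*' skips hidden files such as "._Track.mp3".
set findCommand to "find " & quoted form of folderPath & " -type f ! -name '.*' \\( -iname '*.mp3' -o -iname '*.aiff' -o -iname '*.aif' \\)"
set foundPaths to paragraphs of (do shell script findCommand)

-- Convert each POSIX path to an alias, ignoring any empty line.
set theFiles to {}
repeat with aPath in foundPaths
	set thisPath to contents of aPath
	if thisPath is not "" then set end of theFiles to ((POSIX file thisPath) as alias)
end repeat

return theFiles
```

Note that do shell script raises an error if find exits with a non-zero status, for instance when it meets a subfolder it isn’t allowed to read.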


Go to messages #53 & 54

Yvan KOENIG running Sierra 10.12.4 in French (VALLAURIS, France) Sunday 16 April 2017 11:56:48

I’ve been looking at this myself, although I don’t have any mp3s with accented characters in their metadata and have had to rely on scripts that Yvan’s sent me which contain substitution handlers to correct the results he gets.

The two representations of “é” that Yvan was getting from mdls, according to one of his scripts, were “A\U0303\U00a9” and “e\U0301”.

“e\U0301” represents the letter “e” followed by a combining acute accent. Together they produce the letter “é”, which character also exists in its own right with the Unicode value E9.

In UTF-8, this E9 character is represented by the sequence C3 A9. A9 is, tantalizingly, the code at the end of Yvan’s other result, “A\U0303\U00a9”. C3, if it’s taken as a Unicode value in its own right instead of part of a UTF-8 sequence, is the code for “Ô (capital A with a tilde). This character can also be produced by following an ordinary “A” with a combining tilde, whose Unicode value happens to be 0303. So “A\U0303\U00a9” is apparently a complete reinterpretation of the UTF-8 sequence for “é”, whereas “e\U0301” is simply a reinterpretation of the character itself.
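The chain of reinterpretations described above can be reproduced in ASObjC. The following is just an illustrative sketch with made-up variable names; it doesn’t touch any file metadata, it only shows the two forms of “é” and how misreading its UTF-8 bytes as ISO Latin-1 leads to “A”, a combining tilde and U+00A9.

```applescript
use AppleScript version "2.4" -- Yosemite or later
use scripting additions
use framework "Foundation"

set |⌘| to current application

-- "é" as one precomposed character (U+00E9) and as "e" + combining acute accent (U+0301).
set precomposed to |⌘|'s NSString's stringWithString:(character id 233)
set decomposed to |⌘|'s NSString's stringWithString:("e" & (character id 769))

-- Normalizing the decomposed form yields the precomposed one.
set sameAfterNormalizing to ((decomposed's precomposedStringWithCanonicalMapping())'s isEqualToString:precomposed) as boolean

-- Encode "é" as UTF-8 (bytes C3 A9), then misread those two bytes as ISO Latin-1 characters.
set utf8Data to precomposed's dataUsingEncoding:(|⌘|'s NSUTF8StringEncoding)
set misread to |⌘|'s NSString's alloc()'s initWithData:utf8Data encoding:(|⌘|'s NSISOLatin1StringEncoding)

-- Decomposing the misread text splits "Ã" into "A" + combining tilde (U+0303); "©" (U+00A9) is left alone.
set misreadDecomposed to misread's decomposedStringWithCanonicalMapping()

return {sameAfterNormalizing, (utf8Data's |length|()) as integer, (misread's |length|()) as integer, (misreadDecomposed's |length|()) as integer}
--> {true, 2, 2, 3}
```

The three lengths are the two UTF-8 bytes, the two misread characters, and the three code units “A”, U+0303 and U+00A9.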

Simple “\Uhhhh” sequences can be reconstituted easily enough using ASObjC (!):

use framework "Foundation"

set |⌘| to current application
set aResult to |⌘|'s class "NSString"'s stringWithString:("Le gar\\U00e7on est arrive\\U0301.")

set dataObj to aResult's dataUsingEncoding:(|⌘|'s NSASCIIStringEncoding)
set aResult to |⌘|'s class "NSString"'s alloc()'s initWithData:(dataObj) encoding:(|⌘|'s NSNonLossyASCIIStringEncoding)
--> (NSString) "Le garçon est arrivé."

This doesn’t work with UTF-8-derived sequences:

use framework "Foundation"

set |⌘| to current application
set aResult to |⌘|'s class "NSString"'s stringWithString:("Le garA\\U0303\\U00a7on est arriveA\\U0303\\U00a9.")

set dataObj to aResult's dataUsingEncoding:(|⌘|'s NSASCIIStringEncoding)
set aResult to |⌘|'s class "NSString"'s alloc()'s initWithData:(dataObj) encoding:(|⌘|'s NSNonLossyASCIIStringEncoding)
--> (NSString) "Le garÃ§on est arriveÃ©."

And the wrong characters in this result appear to be the ones Yvan’s getting from the NSMetadataItem method.

The only point I’d add to this is that in the example I saw that had a “A\U0303\U…” sequence (for a ç, but otherwise similar), the ASObjC code returned it as Ã…, and iTunes also displayed it that way. Given that iTunes is reading from the .mp3 file’s metadata, and not the Spotlight metadata, that makes me think that that part of the problem is unrelated to the code at hand, and may well have been encoded into the file incorrectly in the first instance. In other words, what I think is happening in the mdls process is that à is being decomposed to A\U0303, and the following character is then escaped to \U00a9 or whatever.

For the part below, what I wanted is this: if any of theGenre, theArtist, theTempo or theKey is null, the script should prompt saying which file it is, NOT move that file, and continue with the next file.


   #SOME MP3 TAGS ARE NOT PROPERLY FILLED OUT- THIS BELOW PART DOESNT WORK AS SCRIPT PROCEEDS??????. # problem solved below
   if (count theGenre) ≤ (count "(null)") and theGenre contains "null" then set theGenre to "(no Genre)" # ADDED
   if (count theArtist) ≤ (count "(null)") and theArtist contains "null" then set theArtist to "(no Artist)" # ADDED
   if (count theTempo) ≤ (count "(null)") and theTempo contains "null" then set theTempo to "(no Tempo)" # ADDED
   if (count theKey) ≤ (count "(null)") and theKey contains "null" then set theKey to "(no Key)" # ADDED

So it’s effectively a non-problem. I suspect the OP doesn’t have many French MP3s anyway, correctly coded or otherwise.

Since there’s been no feedback about that, and since the OP appears to have ignored the two scripts I posted, I don’t think there’s much else I can contribute to this thread. I’ve replaced the mainBusiness() handler in the ASObjC script with the working version of Shane’s and I’ve updated the “vanilla” script to handle null results in the same way.

Go to messages #53 & 54

Yvan KOENIG running Sierra 10.12.4 in French (VALLAURIS, France) Sunday 16 April 2017 18:10:33

NS¿¿¿¿¿¿StringEncoding should do it.

use AppleScript version "2.4" # requires at least Yosemite
use scripting additions
use framework "Foundation"

# Decode the string extracted by ASObjC from the metadata.

my decodeText("David Guetta, Giorgio Tuinfort, FrÃ©dÃ©ric Riesterer, Taio Cruz, Nick Van De Wall, Rico Love, Raymond Usher & Aviici")
-- my decodeText("David Guetta, Giorgio Tuinfort, Frédéric Riesterer, Taio Cruz, Nick Van De Wall, Rico Love, Raymond Usher & Aviici")

on decodeText(theText)
	set |⌘| to current application
	
	set theText to |⌘|'s class "NSString"'s stringWithString:(theText)
	-- If the string contains at least two consecutive 8+bit characters, assume it's a mangled result.
	if ((theText's rangeOfString:("[\\u0080-\\U0010ffff]{2}") options:(|⌘|'s NSRegularExpressionSearch))'s |length|() > 0) then
		set dataObj to (theText's dataUsingEncoding:(|⌘|'s NSISOLatin1StringEncoding))
		set theText to (|⌘|'s class "NSString"'s alloc()'s initWithData:(dataObj) encoding:(|⌘|'s NSUTF8StringEncoding))
	end if
	
	return theText -- as text
end decodeText

Thanks, Nigel.
I certainly wouldn’t have built that.

Yvan KOENIG running Sierra 10.12.4 in French (VALLAURIS, France) Sunday 16 April 2017 19:52:52

Hi Yvan.

I’ve just shortened it considerably!

Although they’ll still get bitten if they’re fans of Beyoncé…

But the version for use with mdls is slightly longer:

use AppleScript version "2.4" # requires at least Yosemite
use scripting additions
use framework "Foundation"

# Decode the string extracted by mdls from the metadata.

my decodeText("Le garA\\U0303\\U00a7on est arrivA\\U0303\\U00a9.")
my decodeText("Le gar\\U00e7on est arrive\\U0301.")

on decodeText(theText)
	set |⌘| to current application
	
	set theText to |⌘|'s class "NSString"'s stringWithString:(theText)
	-- Set up and use a regex to find Unicode substitute expressions of the garbled kind (misderived from UTF-8 sequences).
	set aHandyRegex to |⌘|'s class "NSRegularExpression"'s regularExpressionWithPattern:("([[:alpha:]]\\\\U[[:hex:]]{4})(\\\\U[[:hex:]]{4})") options:(0) |error|:(missing value)
	set regexMatches to aHandyRegex's matchesInString:(theText) options:(0) range:({0, theText's |length|()})
	set matchCount to (count regexMatches)
	-- If none are found, look for Unicode substitute expressions of the correct kind instead.
	set containsGarbledBlips to (matchCount > 0)
	if (not containsGarbledBlips) then
		set aHandyRegex to |⌘|'s class "NSRegularExpression"'s regularExpressionWithPattern:("(\\\\U[[:hex:]]{4})") options:(0) |error|:(missing value)
		set regexMatches to aHandyRegex's matchesInString:(theText) options:(0) range:({0, theText's |length|()})
		set matchCount to (count regexMatches)
	end if
	-- If either kind is found, reconstitute the original characters and substitute them for the matched expressions.
	if (matchCount > 0) then
		set theText to theText's mutableCopy()
		repeat with i from (count regexMatches) to 1 by -1 -- Reverse loop because theText's length will change.
			-- With each match, get the text matching the first capture group in the regex.
			set thisMatch to item i of regexMatches
			set subrange1 to (thisMatch's rangeAtIndex:(1))
			set component1 to (theText's substringWithRange:(subrange1))
			-- Convert it to data using 7-bit ASCII encoding.
			set dataObj to (component1's dataUsingEncoding:(|⌘|'s NSASCIIStringEncoding))
			-- Convert it back to a character using "non-lossy ASCII" encoding.
			set reconstitutedCharacter to (|⌘|'s class "NSString"'s alloc()'s initWithData:(dataObj) encoding:(|⌘|'s NSNonLossyASCIIStringEncoding))
			-- If dealing with misderived expressions, perform a further twiddle.
			if (containsGarbledBlips) then
				-- Get the text matching the other capture group in the regex.
				set subrange2 to (thisMatch's rangeAtIndex:(2))
				set component2 to (theText's substringWithRange:(subrange2))
				-- Convert that to data using 7-bit ASCII encoding.
				set dataObj to (component2's dataUsingEncoding:(|⌘|'s NSASCIIStringEncoding))
				-- Convert it back to a character using "non-lossy ASCII" encoding.
				set component2 to (|⌘|'s class "NSString"'s alloc()'s initWithData:(dataObj) encoding:(|⌘|'s NSNonLossyASCIIStringEncoding))
				-- Convert the first character back to data, now with ISO Latin-1 encoding.
				set dataObj to (reconstitutedCharacter's dataUsingEncoding:(|⌘|'s NSISOLatin1StringEncoding))'s mutableCopy()
				-- Convert the second character to data too and append it to the data from the first.
				tell dataObj to appendData:(component2's dataUsingEncoding:(|⌘|'s NSISOLatin1StringEncoding))
				-- Convert the combination back to a single character using UTF-8 encoding.
				set reconstitutedCharacter to (|⌘|'s class "NSString"'s alloc()'s initWithData:(dataObj) encoding:(|⌘|'s NSUTF8StringEncoding))
			end if
			-- Replace the whole of the matched expression in the text with the character derived from it.
			tell theText to replaceCharactersInRange:(thisMatch's range()) withString:(reconstitutedCharacter)
		end repeat
	end if
	
	return theText -- as text
end decodeText

Ah yes. Or of Blue Öyster Cult. Mmm. That ages me …

In case there’s anyone still hanging in here, I maintain that making these “corrections” is wrong. If the file says “FranÃ§ois”, even if the correct name is “François”, the code should return what the file (and iTunes) says, in this case “FranÃ§ois”.