CoreML analysis using ASOC

nowsnothetime · April 21, 2023, 11:14pm

I’ve created a CoreML model with data for identifying a bass drum, and I’m trying to write a script in ASOC that analyses an audio sample and detects whether or not it is a bass drum. This is the script I have so far. Any tips as to where I’m going wrong would be much appreciated.

Thanks in advance


use framework "Foundation"
use framework "CoreML"
use framework "AVFoundation"

-- Load the CoreML model
set modelPath to "/path/bassDrum.mlmodel"
set modelURL to current application's |NSURL|'s fileURLWithPath:modelPath
set mlModel to current application's |MLModel|'s modelWithContentsOfURL:modelURL error:(missing value)

-- Create a CoreML model prediction function
on predictBassDrum(input)
    set inputFeatures to current application's |MLAudioElectronicaFeatureProvider|'s new()
    inputFeatures's setInputAudio(input)
    set inputDict to inputFeatures's dictionaryBySettingValuesForKeys:(missing value)
    set outputDict to mlModel's predictionFromFeatures:inputDict error:(missing value)
    set outputProb to (outputDict's valueForKey:"classLabelProbs")'s valueForKey:"Bass Drum"
    return outputProb
end predictBassDrum

-- Test the model with an audio file
set audioPath to "~/Desktop/audiofile.wav"
set audioURL to current application's |NSURL|'s fileURLWithPath:audioPath
set {audioData, error} to current application's |NSData|'s dataWithContentsOfURL:audioURL options:(missing value) |error|:(reference)
if audioData = missing value then
    display dialog "Error reading audio file: " & (error's localizedDescription()) buttons {"OK"} default button "OK" with icon stop
else
    set audioFormatDescriptionRef to current application's |AVAudioFormat|'s alloc()'s initWithCommonFormat:(current application's |AVAudioPCMFormat|'s |AVAudioPCMFormatFloat32|) sampleRate:(48000) channels:(1) interleaved:(true)
    set audioBuffer to current application's |AVAudioPCMBuffer|'s alloc()'s initWithPCMFormat:audioFormatDescriptionRef frameCapacity:(audioData's length) / 4
    set mutableAudioBufferList to audioBuffer's mutableAudioBufferList()
    mutableAudioBufferList's mNumberBuffers = 1
    mutableAudioBufferList's mBuffers|mNumberChannels|'s intValue() = 1
    mutableAudioBufferList's mBuffers|mData|'s bytes() = (audioData's bytes())
    mutableAudioBufferList's mBuffers|mDataByteSize|'s intValue() = (audioData's length())
    set inputAudio to audioBuffer
    
    -- Call the prediction function
    set isBassDrum to predictBassDrum(inputAudio)
    
    -- Check if the sample is a bass drum
    if isBassDrum > 0.5 then
        display dialog "This sample is a bass drum."
    else
        display dialog "This sample is not a bass drum."
    end if
end if

Piyomaru · April 22, 2023, 6:03am

I wrote a CoreML image classifier script five years ago.

http://piyocast.com/as/archives/4853

An image classifier model is easier and simpler to use than your model.
And…nobody can give advice without your mlmodel file.

I read information from mlmodel files by using Netron.
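The input and output names that Netron displays can probably also be listed from ASOC itself. The following is only a minimal sketch, not a tested solution for this particular model. It assumes the placeholder path from the question, a raw .mlmodel file that still has to be compiled, and a macOS version where the synchronous compileModelAtURL:error: call is still available (Apple has deprecated it in favour of an asynchronous variant). The variable names are illustrative.

```applescript
use AppleScript version "2.5"
use framework "Foundation"
use framework "CoreML"
use scripting additions

-- Placeholder path taken from the question; replace with the real file location
set modelPath to "/path/bassDrum.mlmodel"
set modelURL to current application's NSURL's fileURLWithPath:modelPath

-- A raw .mlmodel must be compiled to .mlmodelc before MLModel can load it
set {compiledURL, compileError} to current application's MLModel's compileModelAtURL:modelURL |error|:(reference)
if compiledURL is missing value then error (compileError's localizedDescription() as text)

-- Load the compiled model
set {theModel, loadError} to current application's MLModel's modelWithContentsOfURL:compiledURL |error|:(reference)
if theModel is missing value then error (loadError's localizedDescription() as text)

-- Log the feature names and types the model expects and produces
set modelInfo to theModel's modelDescription()
log (modelInfo's inputDescriptionsByName()'s |description|() as text)
log (modelInfo's outputDescriptionsByName()'s |description|() as text)
```

The logged descriptions show which feature names and types a prediction call would have to supply. Note the |error|:(reference) form: error is a reserved word in AppleScript, so the parameter label needs the vertical bars, and passing (reference) returns the NSError alongside the result.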

And…there is no evidence of your mlmodel’s capability.
We don’t know your model’s test score in Apple’s Create ML.

This script seems to be an answer from ChatGPT.