[NOT FOR MERGING] Added iOS TFLAudioRecord and GMLAudio in Sound Classification Sample #385
Conversation
priankakariat commented on Apr 23, 2022
- Added TFLAudioRecord and GMLAudio, which will be used in the task library for audio input.
- Modified the sample to use these instead of AudioInputManager.
```objc
/**
 * Uses AVAudioConverter to achieve the desired sample rate, since the tap on the input node raises an exception if any format other than the input node's default format is specified.
 * A completion-handler structure is used as opposed to polling, since this is the pattern followed by all iOS APIs delivering continuous data on a background thread.
 * Even if we implemented a read method, it would have to return data in a callback, since installTapOnBus delivers data in a callback on a different thread. There would also be extra overhead to ensure thread safety so that reads and writes happen on the same thread, since the GMLAudio buffer is meant to be non-local.
 */
```
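For illustration, the tap-and-convert pattern this comment describes might look like the following sketch. kTargetSampleRate, the function name, and the ring-buffer handoff are assumptions for the example, not code from this PR:

```objc
#import <AVFoundation/AVFoundation.h>

// Assumed target rate for the example; the real value comes from the model.
static const double kTargetSampleRate = 16000.0;

static void StartTappingMicrophone(AVAudioEngine *engine) {
  AVAudioInputNode *inputNode = engine.inputNode;

  // The tap must use the input node's own format; any other format raises an
  // exception, hence the separate AVAudioConverter step.
  AVAudioFormat *inputFormat = [inputNode inputFormatForBus:0];
  AVAudioFormat *targetFormat =
      [[AVAudioFormat alloc] initWithCommonFormat:AVAudioPCMFormatFloat32
                                       sampleRate:kTargetSampleRate
                                         channels:1
                                      interleaved:YES];
  AVAudioConverter *converter =
      [[AVAudioConverter alloc] initFromFormat:inputFormat toFormat:targetFormat];

  [inputNode installTapOnBus:0
                  bufferSize:1024
                      format:inputFormat
                       block:^(AVAudioPCMBuffer *buffer, AVAudioTime *when) {
    // Delivered on a background thread; resample to the desired rate here.
    AVAudioFrameCount capacity = (AVAudioFrameCount)(
        buffer.frameLength * kTargetSampleRate / inputFormat.sampleRate);
    AVAudioPCMBuffer *converted =
        [[AVAudioPCMBuffer alloc] initWithPCMFormat:targetFormat
                                      frameCapacity:capacity];
    __block BOOL provided = NO;
    NSError *error = nil;
    [converter convertToBuffer:converted
                         error:&error
            withInputFromBlock:^AVAudioBuffer *(
                AVAudioPacketCount inNumberOfPackets,
                AVAudioConverterInputStatus *outStatus) {
      if (provided) {
        *outStatus = AVAudioConverterInputStatus_NoDataNow;
        return nil;
      }
      provided = YES;
      *outStatus = AVAudioConverterInputStatus_HaveData;
      return buffer;
    }];
    // ... hand converted.floatChannelData[0] to the completion handler ...
  }];

  [engine prepare];
  NSError *startError = nil;
  [engine startAndReturnError:&startError];
}
```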
@khanhlvg Please see my note on why a callback is used instead of polling as in Java. Java's AudioRecord read returns immediately, whereas iOS's installTapOnBus delivers results in a completion handler.
#import "GMLRingBuffer.h" | ||
#import "GMLUtils.h" | ||
|
||
#define SUPPORTED_CHANNEL_COUNT 1 |
We want to support multi-channel models. Both the Python and Android APIs support that. A multi-channel buffer can be stored in row-major (interleaved) order in a flat float* array.
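For illustration, row-major interleaved storage puts the sample for frame f on channel c at index f * channelCount + c; the helper below is a hypothetical example, not code from this PR:

```objc
// Hypothetical illustration: with interleaved row-major storage, a 2-channel
// buffer [L0 R0 L1 R1 L2 R2] flattens to {L0, R0, L1, R1, L2, R2}.
static float SampleAt(const float *interleaved, int channelCount, int frame,
                      int channel) {
  return interleaved[frame * channelCount + channel];
}
// Example: SampleAt(buffer, 2, 1, 1) returns frame 1's right-channel sample.
```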
The channels should be interleaved in a flat array. I think that's the way the task library expects the data, and the Java and Python implementations work this way.
Also, the supported channel count is only internal to GMLAudioRecord; GMLAudio still supports multiple channels. If the source is a file, we don't use the AudioRecord and it will work fine.
The problem is that currently the only format supported by AVAudioEngine is mono when we request audio samples. From iOS 14 onwards there may be a way to manipulate the AVAudioSession to connect to a stereo input, but I am not sure whether, after doing that, it will still let us use the audio engine to tap the stereo mic for samples.
We may have to build a full audio library on top of the low-level C APIs in iOS to do this.
> The channels should be interleaved in a flat array. I think that's the way the task library expects the data, and the Java and Python implementations work this way.

Yes, that's correct.

> The problem is that currently the only format supported by AVAudioEngine is mono when we request audio samples. From iOS 14 onwards there may be a way to manipulate the AVAudioSession to connect to a stereo input, but I am not sure whether, after doing that, it will still let us use the audio engine to tap the stereo mic for samples. We may have to build a full audio library on top of the low-level C APIs in iOS to do this.
Can you implement it this way to make it consistent with the Python and Android APIs?
- No need to support 3 or more channels. https://github.com/tensorflow/tflite-support/blob/d6829295cd04284f219e6b08594c699fc9abcf1a/tensorflow_lite_support/java/src/java/org/tensorflow/lite/task/audio/classifier/AudioClassifier.java#L447
- For iOS 13 and lower: record in mono and duplicate the same content across the two channels if the model requires 2 channels (see the sketch after this list).
- For iOS 14 and later: record in stereo.
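A minimal sketch of the mono-duplication fallback; the function name and buffer layout are assumptions for the example, not this PR's code:

```objc
// Duplicate a mono recording across two interleaved channels when the model
// expects stereo but only mono capture is available (iOS 13 and lower).
static void DuplicateMonoToStereo(const float *mono, float *stereo,
                                  int frameCount) {
  for (int f = 0; f < frameCount; f++) {
    stereo[2 * f] = mono[f];      // left channel
    stereo[2 * f + 1] = mono[f];  // right channel
  }
}
```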
```objc
 *
 * @return An instance of GMLAudioFormat
 */
- (BOOL)loadAudioRecordBuffer:(GMLFloatBuffer *)audioRecordBuffer
```
Please change this to loadFromAudioRecord:(TFLAudioRecord *)audioRecord withError:(NSError **)error to align with the Python and Android APIs. Please also see the comments in TFLAudioRecord.
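As a declaration, the suggestion reads roughly as follows; the BOOL return type is an assumption carried over from the existing method:

```objc
// Sketch of the suggested signature (naming per the review comment above).
- (BOOL)loadFromAudioRecord:(TFLAudioRecord *)audioRecord
                  withError:(NSError **)error;
```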
Fixed to match Android.
```objc
 * error failing to do so.
 *
 */
- (void)checkPermissionsAndStartTappingMicrophoneWithCompletionHandler:
```
Please rename to startRecordingWithError and stop to align with the Python and Android APIs. TFLAudioRecord should also have a buffer getter that returns the internal buffer, rather than returning the internal buffer only when recording starts. A sketch of the resulting interface follows.
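A sketch of the suggested surface; the return types and nullability are assumptions, not the merged API:

```objc
// Sketch of the suggested TFLAudioRecord interface (names follow the review
// suggestion above).
@interface TFLAudioRecord : NSObject

// Starts the audio engine tap; reports synchronous failures via *error.
- (BOOL)startRecordingWithError:(NSError **)error;

// Stops recording and removes the tap on the input node.
- (void)stop;

// Returns a snapshot of the internal ring buffer at the time of the call.
- (nullable GMLFloatBuffer *)buffer;

@end
```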
The getter will then have to take a completion handler, because iOS's AVAudioEngine returns data continuously in a callback, and audio callbacks in iOS are always delivered on a background thread. So we will have to ensure reads and writes happen on the same thread, since reading from a different thread will return out-of-sync data. For this a dispatch_barrier_async will have to be used, which won't return immediately.
So if we cut the completion handler for data delivery here, we will end up with a completion handler in read instead. We also won't be able to track errors in real time: if an error occurs we would have to populate the buffer with -1s or similar and leave it to users to figure out the issue. The read method can't return a status either, since the read happens in a callback.
I think what's important is not that reading and writing happen on the same thread, but that reading and writing don't happen at the same time. You can use a GCD serial queue to make sure that reads and writes to the AudioRecord's internal buffer don't overlap. In the getter, you can use GCD's sync method to wait for pending work on the internal queue to complete before returning a copy of the buffer values in the correct order to the caller.
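That pattern could look like the sketch below; bufferQueue, ringBuffer, write:, and copyBuffer are assumed placeholder names, not this PR's actual members:

```objc
// Inside a hypothetical @implementation TFLAudioRecord. In init:
// self.bufferQueue = dispatch_queue_create(
//     "org.tensorflow.lite.audioRecord.buffer", DISPATCH_QUEUE_SERIAL);

// Writer: the AVAudioEngine tap appends converted samples asynchronously.
- (void)appendSamples:(GMLFloatBuffer *)samples {
  dispatch_async(self.bufferQueue, ^{
    [self.ringBuffer write:samples];  // never overlaps with a read
  });
}

// Reader: the getter waits on the same serial queue for pending writes to
// finish, then returns a copy so the caller sees a consistent snapshot.
- (GMLFloatBuffer *)buffer {
  __block GMLFloatBuffer *snapshot = nil;
  dispatch_sync(self.bufferQueue, ^{
    snapshot = [self.ringBuffer copyBuffer];
  });
  return snapshot;
}
```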
I have fixed this according to your suggestion. Do have a look
```diff
-- (void)checkPermissionsAndStartTappingMicrophoneWithCompletionHandler:
-    (void (^)(GMLFloatBuffer *_Nullable buffer, NSError *_Nullable error))completionHandler {
+- (void)startRecordingWithCompletionHandler:
+    (void (^)(NSError *_Nullable error))completionHandler {
   [[AVAudioSession sharedInstance] requestRecordPermission:^(BOOL granted) {
```
As this call requires the user to add a permission description string to the Info.plist, I think we should let the app developer request the permission in their app, outside the scope of the Task Library. This is more aligned with the Android API (Python doesn't have the concept of a permission request).
Could you change the API as follows?
- Check if the app has recording permission.
- If it's denied, return a "permission denied" NSError.
- If it's undetermined, return an NSError with a message suggesting the user read Apple's requestRecordPermission documentation and request the permission.
- Change the startRecordingWithCompletionHandler method to startRecordingWithError, as we don't need an async interface anymore (see the sketch after this list).
https://developer.apple.com/documentation/avfaudio/avaudiosession/1616463-recordpermission
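That synchronous flow might look like the following sketch; the error domain and codes are illustrative assumptions:

```objc
// Sketch of the suggested synchronous permission check.
- (BOOL)startRecordingWithError:(NSError **)error {
  switch ([AVAudioSession sharedInstance].recordPermission) {
    case AVAudioSessionRecordPermissionGranted:
      break;  // permission already granted; proceed to install the tap
    case AVAudioSessionRecordPermissionDenied:
      if (error) {
        *error = [NSError errorWithDomain:@"org.tensorflow.lite.audio"
                                     code:1
                                 userInfo:@{NSLocalizedDescriptionKey :
                                              @"Record permission denied."}];
      }
      return NO;
    case AVAudioSessionRecordPermissionUndetermined:
      if (error) {
        *error = [NSError
            errorWithDomain:@"org.tensorflow.lite.audio"
                       code:2
                   userInfo:@{NSLocalizedDescriptionKey :
                                @"Record permission undetermined. See Apple's "
                                @"requestRecordPermission documentation and "
                                @"request permission before starting."}];
      }
      return NO;
  }
  // ... start the AVAudioEngine and install the tap ...
  return YES;
}
```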
This is fixed now.