Saturday, June 2, 2012

Audio mix and record in Android

Among its frameworks, iOS offers many interesting features that let you create audio tracks by simply mixing multiple tracks together. You can use Audio Unit and its methods, as described here:

But what if you need a similar result on Android? Android doesn't offer such a feature in its audio framework. So I spent a couple of days on Google Groups and Stack Overflow, reading unanswered questions from Android devs looking for similar functionality, either built into the Google mobile platform or developed and released by third-party contributors and external devs.
It appears there is nothing available.
So I studied the problem and the tools I had to solve it. First, let's see what possibilities the platform offers to play files.
The Android audio framework consists of these main classes for audio playback:

  • MediaPlayer: useful to play compressed sources (m4a, mp3...) and uncompressed but formatted ones (wav). A single instance can't play multiple sounds at the same time. [HIGH LEVEL methods]
  • SoundPool: can be used to play many raw sounds at the same time.
  • AudioTrack: can be used like SoundPool (raw sounds), but needs threads to play many sounds at the same time. [LOW LEVEL methods]
I've found that AudioTrack works fine for playing uncompressed raw data, and if you want to play multiple sounds at the same time, you can create different threads and start the playback asynchronously.
Unfortunately this is not always precise: sometimes there is a delay before a certain sound is played, and in such cases the final result is far from acceptable.

Another option is to mix the sounds before playing them. This option gives you a nice plus: you obtain a mixed sound that is ready to be stored in a file. If you mix sounds with SoundPool, for instance, you cannot grab the output at playback time and redirect it to a file descriptor instead of the audio hardware (headphones or speaker).
As mentioned at the beginning, there is no ready-made solution for this problem. But as we will see, the solution is actually rather trivial.

Before delving into the details of how two sounds can be mixed together, let's see how we can record a sound on Android. The main classes are:

  • MediaRecorder: sister class of MediaPlayer, can be used to record audio using different codecs (amr, aac). [HIGH LEVEL methods]
  • AudioRecord: sister class of AudioTrack. It records audio in PCM (Pulse Code Modulation) format, the uncompressed digital audio format used in CD Audio. A raw PCM stream is very similar to a .wav file, which is typically the same payload preceded by a 44-byte header. [LOW LEVEL methods]
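To make the 44-byte header mentioned in the list above more concrete, here is a minimal sketch in plain Java (no Android classes needed) that builds the canonical RIFF/WAVE header for uncompressed PCM. The class and method names (`WavHeader`, `build`) are made up for illustration; writing this header followed by the raw PCM bytes should give a file that ordinary players can open.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the Android SDK.
final class WavHeader {

    /** Builds the canonical 44-byte RIFF/WAVE header for uncompressed PCM data. */
    static byte[] build(int sampleRate, int channels, int bitsPerSample, int pcmByteCount) {
        int blockAlign = channels * bitsPerSample / 8; // bytes per frame (one sample per channel)
        int byteRate = sampleRate * blockAlign;        // bytes per second of audio

        // All multi-byte numeric fields in a WAV header are little-endian.
        ByteBuffer b = ByteBuffer.allocate(44).order(ByteOrder.LITTLE_ENDIAN);
        b.put("RIFF".getBytes(StandardCharsets.US_ASCII));
        b.putInt(36 + pcmByteCount);        // size of everything after this field
        b.put("WAVE".getBytes(StandardCharsets.US_ASCII));
        b.put("fmt ".getBytes(StandardCharsets.US_ASCII));
        b.putInt(16);                       // size of the fmt chunk for plain PCM
        b.putShort((short) 1);              // audio format 1 = uncompressed PCM
        b.putShort((short) channels);
        b.putInt(sampleRate);
        b.putInt(byteRate);
        b.putShort((short) blockAlign);
        b.putShort((short) bitsPerSample);
        b.put("data".getBytes(StandardCharsets.US_ASCII));
        b.putInt(pcmByteCount);             // size of the PCM payload that follows
        return b.array();
    }
}
```

For one second of 44.1 kHz mono 16-bit audio, the call would be `WavHeader.build(44100, 1, 16, 88200)`.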

AudioRecord offers all the features we want to be able to control: we can specify the sampling frequency, the number of channels (mono or stereo), and the number of bits per sample (8 or 16).

In the fragment of code posted above, there is a simple function that can be used to record a 44.1 kHz mono 16-bit PCM file on the external storage. The function is blocking, so it must be run on a secondary thread; it keeps recording until the boolean isRecording is set to false (for example when a timeout expires or when the user taps a button).
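The shape of such a blocking loop can be sketched in plain Java as follows. This is only an illustration under some assumptions: `ShortSource` is a hypothetical stand-in for `AudioRecord.read(short[], int, int)` so that the loop can run off-device, and the samples are written low byte first (little-endian, the order used by the .wav payload), which is my choice here and may differ from other implementations.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical sketch of a blocking record loop; names are illustrative only.
final class PcmRecorderSketch {

    /** Stand-in for AudioRecord.read(short[], int, int): returns samples read, or a negative error code. */
    interface ShortSource {
        int read(short[] buffer, int offset, int length);
    }

    // Set to false from another thread (button tap, timeout) to stop the loop.
    volatile boolean isRecording = true;

    /** Blocks until isRecording becomes false; returns the number of samples written. */
    long record(ShortSource source, OutputStream out, int bufferSize) throws IOException {
        short[] buffer = new short[bufferSize];
        long total = 0;
        try (OutputStream buffered = new BufferedOutputStream(out)) {
            while (isRecording) {
                int n = source.read(buffer, 0, buffer.length);
                if (n < 0) {
                    break; // negative values signal an error, so stop recording
                }
                for (int i = 0; i < n; i++) {
                    buffered.write(buffer[i] & 0xFF);        // low byte first (little-endian)
                    buffered.write((buffer[i] >> 8) & 0xFF); // then the high byte
                }
                total += n;
            }
        }
        return total;
    }
}
```

On a device, the `ShortSource` would simply delegate to an `AudioRecord` instance that has already been started, and `record` would be called from a background thread.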

And now comes the most interesting part: how do we mix two sounds together?

Two digital sounds can be mixed easily if the files have the same features (same number of channels, same bits per sample, same frequency). This is the simplest scenario and the only one I'm covering in this post.
Every sample in this case is a 16-bit number. In Java a short can be used to represent 16-bit numbers, and in fact AudioRecord and AudioTrack work with arrays of shorts, which simply constitute the samples of our sound.

This is the main function used to mix 3 sounds together:

There are some complementary methods I'm not posting here because this post is already too long :) but these are some small hints of what they do:

  • createMusicArray reads the stream and returns a list of Short objects (the samples)
  • completeStreams normalizes the streams by padding the shorter ones with zero-valued shorts at the end, so that all 3 streams have the same length
  • buildShortArray converts the list into an array of short numbers
  • saveToFile saves to file :)
The key point of the method is that we sum the corresponding samples of each stream. We normalize each short to a float in [-1, 1] so we don't run into under/overflow issues. At the end we reduce the volume a bit and save the result in the new array. That's it!
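The steps just described (normalize, sum, lower the volume, convert back to short) can be sketched like this in plain Java. It is a minimal illustration, not necessarily identical to the helper methods listed above: the names `PcmMixer` and `mix`, the `gain` parameter, and the final clamping step are assumptions of this sketch. Shorter tracks are treated as silence past their end, which has the same effect as padding them with zeros.

```java
// Hypothetical sketch of sample-by-sample mixing of 16-bit PCM tracks.
final class PcmMixer {

    /**
     * Mixes any number of tracks that share sample rate and channel count.
     * gain scales the summed signal (for example 0.8f to lower the volume a bit).
     */
    static short[] mix(float gain, short[]... tracks) {
        int length = 0;
        for (short[] track : tracks) {
            length = Math.max(length, track.length);
        }

        short[] mixed = new short[length];
        for (int i = 0; i < length; i++) {
            float sum = 0f;
            for (short[] track : tracks) {
                if (i < track.length) {            // past the end counts as silence
                    sum += track[i] / 32768f;      // normalize the short to [-1, 1)
                }
            }
            sum *= gain;                           // reduce the volume of the sum

            // Clamp so that a loud sum cannot wrap around when cast back to short.
            if (sum > 1f) sum = 1f;
            if (sum < -1f) sum = -1f;

            mixed[i] = (short) Math.round(sum * 32767f);
        }
        return mixed;
    }
}
```

Working in floats keeps the intermediate sum from overflowing a short; the clamp only kicks in when the scaled sum still leaves [-1, 1], in which case the output is clipped rather than wrapped.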

Of course this is the simplest scenario; if the sounds have different frequencies we need to do further computations. But I think that most of the time, when we want to mix sounds we also control how they are recorded, which reduces the complexity a lot.

Once we have a PCM mixed sound, it can be transformed into a .wav file so that every player can read it.

EDIT: since many people have asked me for some more help, below is the code snippet to build a short array from a file containing a raw stream.
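One possible way to build such an array, as a sketch only: the class name `RawPcmReader` is made up for illustration, and the code assumes the raw stream stores 16-bit samples in little-endian order (low byte first). If your file was written in the other byte order, switch to `ByteOrder.BIG_ENDIAN`.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper that turns a raw 16-bit PCM stream into a short array.
final class RawPcmReader {

    /** Reads the whole stream and decodes it as little-endian 16-bit samples. */
    static short[] readShorts(InputStream in) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            bytes.write(chunk, 0, n);
        }

        byte[] raw = bytes.toByteArray();
        short[] samples = new short[raw.length / 2]; // a trailing odd byte is ignored
        ByteBuffer.wrap(raw)
                .order(ByteOrder.LITTLE_ENDIAN)
                .asShortBuffer()
                .get(samples);
        return samples;
    }
}
```

With a file on disk, you would pass a `FileInputStream` (ideally inside a try-with-resources block) to `readShorts`. For a .wav file rather than a raw stream, the header bytes would need to be skipped first.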