ONJava.com -- The Independent Source for Enterprise Java

QTJ Audio

What Planet Is This?

Now that we understand the basics of playing audio with QuickTime, let's think about what else we'd need to provide a more complete player application to end users.



One of the most obvious needs for a modern player is the ability to present metadata about the current song: information such as the title, the artist's name, what album it's from, etc. Practically any player puts this information front and center in the GUI.

There are different schemes for different audio formats, since some were designed to contain metadata and others weren't. MP3s, for example, weren't designed with these needs in mind -- arguably the only "metadata" per se is a copyright bit in the MPEG frame header. However, the ID3 standard was cleverly developed as a means of attaching metadata to MP3 files by defining a format that could be placed inside of an MP3 file but outside of the individual media frames. Typically, this information is simply placed at the beginning of an MP3 file, before its first MPEG frame.

When we open an MP3 file in QuickTime, we're really importing it, changing it into a QuickTime movie in memory. In the course of doing this, the ID3 data is parsed and placed in the movie's structure. If you recall from an earlier article on the QuickTime file format, QuickTime movies are represented both in memory and on disk as a tree of "atoms." These atoms can either contain data or other atoms, but not both. Typically, the top level of a self-contained movie file will contain an mdat atom to hold the media samples and a moov atom, which defines the movie's structure. The moov contains multiple trak structures, and also a handy atom called udta, short for "user data."

When an MP3 is imported, the ID3 tags become part of this user data atom. An Apple Q&A describes how an application can get values out of the user data: we just look for atoms in the user data whose atom types match some constants reserved for metadata. For example, to get the name of a song, we look in the user data for an atom called ©nam, while the album name is in an atom called ©alb. A full set of these constants is defined in QuickTime's Movies.h file.

It's important to remember that those atom types are not Strings. They're QuickTime "four character codes," meaning they're 32-bit int representations of four 8-bit ASCII characters. So, if we represent things in hex (which is actually easiest in this case), ©alb is an int made from the characters A9, 61, 6C, and 62, and thus is 0xA9616C62.
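To make the byte packing concrete, here is a minimal sketch in plain Java that needs no QuickTime classes. The FourCC class and its method names are hypothetical helpers for illustration, not part of QTJ. It relies on the © character being byte 0xA9 in ISO-8859-1, which matches the A9 value given above.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of QTJ): packs and unpacks four-character codes.
class FourCC {
    /** Packs four 8-bit characters into one big-endian 32-bit int. */
    static int fromString(String code) {
        // ISO-8859-1 maps each char below 0x100 to the single byte of the same value.
        byte[] b = code.getBytes(StandardCharsets.ISO_8859_1);
        if (b.length != 4) {
            throw new IllegalArgumentException("need exactly 4 characters");
        }
        return ((b[0] & 0xFF) << 24)
             | ((b[1] & 0xFF) << 16)
             | ((b[2] & 0xFF) << 8)
             |  (b[3] & 0xFF);
    }

    /** Unpacks a 32-bit code back into its four characters. */
    static String toText(int code) {
        byte[] b = {
            (byte) (code >>> 24), (byte) (code >>> 16),
            (byte) (code >>> 8),  (byte) code
        };
        return new String(b, StandardCharsets.ISO_8859_1);
    }
}
```

With this helper, FourCC.fromString("\u00A9alb") yields 0xA9616C62, the same int worked out by hand above.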

Once we know the atom type as a four-character code, getting the atom's contents from the Movie is pretty straightforward. We get a UserData object with Movie.getUserData(), and then find our atom and retrieve its contents with UserData.getTextAsString(). This method takes three arguments: an int for the requested atom type; a one-based (not zero-based) index selecting which instance of that type we want, since multiple atoms of the same type are legal; and finally an "international region tag" that takes one of the lang... constants from quicktime.io.IOConstants (langUnspecified is a useful wildcard value here).

This article's sample application, QTBebop, contains a MetadataJTable with a setMovie method that retrieves all of the defined metadata entries and turns them into the model of a Swing JTable. It defines all of the constants from Movies.h in an array called TAG_NAMES and looks for matches in a UserData object like this:

ArrayList foundTags =
    new ArrayList (TAG_NAMES.length);
ArrayList foundValues =
    new ArrayList (TAG_NAMES.length);
for (int i=0; i<TAG_NAMES.length; i++) {
    try {
        int type =
            ((Integer) TAG_NAMES[i][0]).intValue();
        String value = 
            userData.getTextAsString (type,
                      1,
                      IOConstants.langUnspecified);
        if (value != null) {
            foundTags.add (TAG_NAMES[i][1]);
            foundValues.add (value);
        }
    } catch (QTException qte) {} // didn't have tag
} // for

After this section, the foundTags and foundValues are converted into a two-dimensional array and passed to a DefaultTableModel constructor.

Notice the squashed catch block. If a given type is not found, QuickTime throws a QTException. For our current purposes, we do nothing, because this exception simply means that one of the many possible metadata atom types wasn't found in the user data. Returning an error code may make sense in C, but in Java, using exceptions to control program flow is considered something of a worst practice because of the expense of building a stack trace that won't be used, since the exception isn't really signaling an error state. From a purely Java point of view, it would be nice if QTJ had something like a UserData.hasType(int) method, so we could check for an atom without the performance hit of building a throwaway stack-trace if it isn't there.

That said, the MetadataJTable does its job, and works fairly quickly. Figure 1 shows an example of the table, running against an MP3 I ripped from my CD collection:

Figure 1. Parsed ID3 tags

If you look closely at this figure, you might notice something missing: the artist! This points out a rather serious limitation of QuickTime's ID3 tag parsing. This file does have an artist tag, but it's in Unicode: "菅野よう子" (or, in Western characters, "Yoko Kanno," composer of the soundtracks for Cowboy Bebop, The Vision of Escaflowne, and other TV shows and movies). It seems, and has been confirmed on the quicktime-api mailing list, that QuickTime ignores any tag whose value isn't in plain old ASCII. Note that it doesn't help to supply a more appropriate language value to the getTextAsString() method -- there's no ©ART atom to call it on!

So how does iTunes support Unicode ID3 tags? Presumably, it has its own ID3 library, which makes sense, considering that it needs to both read and write ID3 data. So while QuickTime gives us easy ID3 tag parsing, the lack of support for international character sets might make you consider using another library for tag parsing, or rolling your own.

Bad Dog, No Biscuit

Since we know that QuickTime is used to play the AAC files supported by iTunes 4 and sold by the iTunes Music Store, we'd want and expect it to be able to handle metadata from those files, too.

In fact, since the M4A format for user-ripped AACs and the M4P for Apple-DRM'ed songs are both in the MPEG-4 file format, which itself was adapted from the QuickTime file format, we might reasonably expect that their metadata tags are already in the user-data atom, arranged in the same way that ID3 tags are parsed.

Yeah, we might expect that ... but we'd be wrong.

The metadata is still in the movie's user data, but in a much different and apparently undocumented format. So we have to examine it by hand. (Sigh ... This kind of thing is why I keep HexEdit on my dock.)

These iTunes-ripped files have an atom in the user data called meta. Its contents look like valid atoms, but aren't, since the first four bytes, which should be the size of the first child atom, are 0x00000000. Maybe that's meant to throw off QuickTime file parsers. Interestingly, a set of valid atoms begins after that, with four bytes of size and a four-byte type, just as we'd expect.
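The layout just described can be walked with a few lines of plain Java once the contents of meta are in a byte array. This is only a sketch under assumptions: the AtomWalker class and findChild method are hypothetical names, how the bytes are obtained from the user data is left out, and the starting offset of 4 simply skips the zero bytes observed above.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical helper (not part of QTJ): scans a run of sibling atoms.
class AtomWalker {
    /**
     * Returns the payload of the first child atom of the wanted type,
     * or null if none is found. Each atom is a 4-byte big-endian size
     * (which includes the 8 header bytes), a 4-byte type, then the payload.
     */
    static byte[] findChild(byte[] body, int offset, int wantedType) {
        ByteBuffer buf = ByteBuffer.wrap(body); // big-endian by default
        int pos = offset;
        while (pos >= 0 && pos + 8 <= body.length) {
            int size = buf.getInt(pos);
            int type = buf.getInt(pos + 4);
            // Stop on a malformed size rather than reading out of bounds.
            if (size < 8 || size > body.length - pos) {
                return null;
            }
            if (type == wantedType) {
                return Arrays.copyOfRange(body, pos + 8, pos + size);
            }
            pos += size;
        }
        return null;
    }
}
```

For a meta atom, a call such as AtomWalker.findChild(metaBytes, 4, 0x696C7374) would look for a child of type ilst after the four leading zero bytes; metaBytes is a placeholder for data the caller has already read.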

meta has a child called ilst, which in turn has children that use the tag-name constants that we saw before. We can't use getTextAsString() to get values from these atoms because we're now two levels below the user data, and besides, we're not through with undocumented oddities yet. In this AAC world, these atoms seem not to contain data, but rather a child atom called data, which contains eight junk bytes (perhaps flags) and then, finally, the data for the tag.

MetadataJTable also handles this kind of metadata. Its strategy in setMovie(), which kicks off a parse, is to look in the user data for the meta atom. If meta is absent, the table assumes the movie is an ID3-tagged MP3 and uses the previously described code. If it finds meta, then it looks for an ilst atom. If that succeeds, it starts looking for atoms named by TAG_NAMES. When one is found, it jumps ahead 24 bytes (to skip the size, type, size, "data," and 8 junk bytes) and reads the value.
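The 24-byte skip can be illustrated on a byte array holding the contents of ilst. This is a rough sketch, not the MetadataJTable code: the IlstSketch class and parseIlst method are hypothetical, the layout is only the one observed above, and decoding the value as UTF-8 is an assumption, since the format is undocumented.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch (not the QTBebop code): reads tag values out of ilst contents.
class IlstSketch {
    // size (4) + type (4) + size (4) + "data" (4) + 8 junk bytes
    private static final int HEADER_BYTES = 24;

    /** Maps each tag's four-character code to its text value. */
    static Map<Integer, String> parseIlst(byte[] ilstBody) {
        Map<Integer, String> tags = new LinkedHashMap<>();
        ByteBuffer buf = ByteBuffer.wrap(ilstBody); // big-endian by default
        int pos = 0;
        while (pos + HEADER_BYTES <= ilstBody.length) {
            int size = buf.getInt(pos);     // size of the whole tag atom
            int type = buf.getInt(pos + 4); // e.g. 0xA96E616D for the song name
            // Stop on a malformed size rather than reading out of bounds.
            if (size < HEADER_BYTES || size > ilstBody.length - pos) {
                break;
            }
            // Assumption: the value is text and UTF-8 decodes it acceptably.
            String value = new String(ilstBody, pos + HEADER_BYTES,
                                      size - HEADER_BYTES, StandardCharsets.UTF_8);
            tags.put(type, value);
            pos += size;
        }
        return tags;
    }
}
```

Returning a map keyed by the int code keeps the lookup symmetrical with the ID3 path, where the same constants are passed to getTextAsString().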

An example of parsing a song purchased from the iTunes Music Store is shown in Figure 2.

Figure 2. Parsed M4P metadata

You Make Me Cool

Surprisingly, everything we've done so far is in the main QuickTime API and is not strictly limited to audio content. Again, this speaks to QT's worldview that anything it reads in is a movie. Still, there are cool features that are specific to audio that we get at by retrieving a "handler" for the low-level audio data.

One thing we might want to provide for an audio player is a visual representation of the sound. On a home stereo or professional recording or mixing equipment, this would be represented as level meters that show the intensity of various frequency bands at an instant in time. In iTunes, these values are used to distort the visualizations and express the sound data in a visually pleasing way.

We can get these levels from QuickTime by first getting an AudioMediaHandler, which provides methods for getting and setting balance and for metering audio levels. It's interesting to note that AudioMediaHandler is an interface, implemented by SoundMediaHandler, StreamMediaHandler, and MPEGMediaHandler. The first is used for audio files and for sound tracks within normal QuickTime movies, and the second for streaming data. The third reflects the long-annoying fact that QuickTime sees multiplexed MPEG-1 files not as separate audio and video tracks but as a single opaque media type, which makes extracting sound and video from MPEG-1 quite difficult. Fortunately, MPEG-4 files read in as normal QuickTime movies, with separate video and audio tracks.

But how do we get an AudioMediaHandler? Again, it's helpful to state things in terms of QuickTime's view of the world:

  • Movies have Tracks.
  • Each Track has exactly one Media.
  • Each Media has an associated MediaHandler.

So getting the AudioMediaHandler consists of code like the following:

AudioMediaHandler audioMediaHandler = null;
for (int i=1; i<=movie.getTrackCount(); i++) {
    Track track = movie.getTrack(i);
    Media media = track.getMedia();
    MediaHandler handler = media.getHandler();
    if (handler instanceof AudioMediaHandler) {
        audioMediaHandler =
            (AudioMediaHandler) handler;
        break;
    }
}

Notice that once again a QuickTime get-by-index call, Movie.getTrack() in this case, uses indices that start at 1, not 0.

Now that we have the AudioMediaHandler, we can set balance, set bass and treble, and monitor sound levels. The first two are trivial. For the third, we need to pass in a structure representing which sets of frequencies, or "bands," we want to monitor. We do this with a MediaEQSpectrumBands object, which wraps the desired bands. For the QTBebop sample application, I've used the bands shown by iTunes' graphic equalizer, represented by the array EQ_LEVELS. So setting up for monitoring looks like this:

int[] EQ_LEVELS = {
    32,
    64,
    125,
    250,
    500,
    1000,
    2000,
    4000,
    8000,
    16000
};

...

MediaEQSpectrumBands bands =
    new MediaEQSpectrumBands (EQ_LEVELS.length);
for (int i=0; i<EQ_LEVELS.length; i++) {
    bands.setFrequency (i, EQ_LEVELS[i]);
}
audioMediaHandler.setSoundEqualizerBands (bands);
audioMediaHandler.setSoundLevelMeteringEnabled (true);

To get the levels, we call getSoundEqualizerBandLevels(), passing in the number of bands that we set up in the first place (e.g., EQ_LEVELS.length). This returns an int array, with values from 0 to 255. The QTBebop sample app uses a javax.swing.Timer to call this method every 100 milliseconds and redraw an offscreen java.awt.Graphics buffer with rectangles of a height proportional to the returned level values -- in other words, the rectangle gets 0 height if the level is zero, and is the height of the buffer when the level is 255.
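The scaling from level to rectangle height might look like the following sketch. It is not the QTBebop drawing code: LevelMeterSketch, barHeight, and paintBars are hypothetical names, the colors are arbitrary, and a BufferedImage stands in for the offscreen buffer. In the real application the levels array would come from getSoundEqualizerBandLevels() inside the Timer callback.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

// Hypothetical sketch (not the QTBebop code): draws one bar per band.
class LevelMeterSketch {
    /** Maps a level in 0..255 to a height in 0..bufferHeight. */
    static int barHeight(int level, int bufferHeight) {
        int clamped = Math.max(0, Math.min(255, level));
        return clamped * bufferHeight / 255;
    }

    /** Clears the buffer, then fills one bottom-aligned rectangle per band. */
    static void paintBars(BufferedImage buffer, int[] levels) {
        Graphics2D g = buffer.createGraphics();
        try {
            g.setColor(Color.BLACK);
            g.fillRect(0, 0, buffer.getWidth(), buffer.getHeight());
            if (levels.length == 0) {
                return; // nothing to draw
            }
            g.setColor(Color.GREEN);
            int barWidth = buffer.getWidth() / levels.length;
            for (int i = 0; i < levels.length; i++) {
                int h = barHeight(levels[i], buffer.getHeight());
                // y grows downward, so a bar of height h starts at (height - h).
                g.fillRect(i * barWidth, buffer.getHeight() - h, barWidth, h);
            }
        } finally {
            g.dispose();
        }
    }
}
```

Clamping the level before scaling is a defensive choice made for this sketch, so an out-of-range value cannot draw outside the buffer.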

The resulting application is shown in Figure 3.

Figure 3. The QTBebop application, with level meter

Author's Note: When run on Mac OS X with Java 1.4.1, the scrubber bar has repaint problems when a file is opened but is not yet playing. It does not have problems on OS X's Java 1.3.1 or on Windows, so this may be a version-specific bug, and has been filed appropriately. You can look in the sample code for the many workarounds I tried to get the scrubber repainted correctly.

See You, Space Cowboy

Obviously, our sample application could benefit from a graphical upgrade to make the bars more attractive -- perhaps spacing between bars, LED-like blocks of color, use of red and yellow regions in the upper part of each level, or a "sticky" line that represents the peak level of each band over the last second. Adding balance and bass/treble controls would also be an easy improvement.
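As one possible starting point for the "sticky" peak line, here is a small peak-hold sketch. It is not part of QTBebop: the PeakHold class and its method names are hypothetical. It is also a simplification, holding each band's peak until a higher level arrives or the hold time expires, rather than computing a true maximum over a sliding one-second window.

```java
// Hypothetical sketch (not part of QTBebop): remembers a recent peak per band.
class PeakHold {
    private final int[] peaks;
    private final long[] peakTimes;
    private final long holdMillis;

    PeakHold(int bandCount, long holdMillis) {
        this.peaks = new int[bandCount];
        this.peakTimes = new long[bandCount];
        this.holdMillis = holdMillis;
    }

    /**
     * Feeds in the latest levels and returns the peaks to draw.
     * The caller supplies the time, e.g. System.currentTimeMillis().
     */
    int[] update(int[] levels, long nowMillis) {
        int n = Math.min(levels.length, peaks.length);
        for (int i = 0; i < n; i++) {
            boolean higher = levels[i] >= peaks[i];
            boolean expired = nowMillis - peakTimes[i] > holdMillis;
            if (higher || expired) {
                peaks[i] = levels[i];
                peakTimes[i] = nowMillis;
            }
        }
        return peaks.clone(); // copy, so callers cannot alter our state
    }
}
```

A caller might create new PeakHold(EQ_LEVELS.length, 1000) once, then call update() from the same timer callback that redraws the bars.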

A more significant feature to add would be support for audio streams. As covered much earlier in this series, you can create a Movie from a URL by creating a DataRef from the URL string, which you then pass to the static Movie.fromDataRef() method. In terms of playable URLs, QuickTime can play RTSP-streamed content, of course, and can handle Shoutcast-style HTTP-streamed audio by changing the URL's http: protocol to the pseudo-protocol icy:, as detailed in the QuickTime 6 documentation.
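The protocol swap itself is a one-line string operation. The sketch below covers only that step, using hypothetical names (IcyUrlSketch, toIcy); the resulting string would then be wrapped in a DataRef and passed to Movie.fromDataRef() as described above, which is not shown here.

```java
// Hypothetical helper (not part of QTJ): rewrites http: URLs to icy:.
class IcyUrlSketch {
    /** Returns the URL with a leading "http:" replaced by "icy:", else unchanged. */
    static String toIcy(String url) {
        String prefix = "http:";
        // Case-insensitive match on the scheme only; "https:" does not match.
        if (url.regionMatches(true, 0, prefix, 0, prefix.length())) {
            return "icy:" + url.substring(prefix.length());
        }
        return url;
    }
}
```

For example, a made-up address such as http://radio.example.com:8000/stream becomes icy://radio.example.com:8000/stream, while an rtsp: URL passes through untouched.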

With its support for a huge number of formats and codecs, QuickTime Java offers a great engine for writing audio clients. Using the techniques in this article should get your application off to a strong start.

Example Code

Chris Adamson is an author, editor, and developer specializing in iPhone and Mac.

