Published on ONJava.com (http://www.onjava.com/)

Extend JavaSound to Play MP3, Ogg Vorbis, and More

by The JavaZOOM Team

The JavaSound API adds audio capabilities to the Java platform. It has been part of J2SE since version 1.3; it supports the WAV, AU, and AIFF audio formats and provides MIDI support. It doesn't support some other audio formats, such as MP3, but it provides a flexible plugin architecture that allows any third-party vendor to add custom audio format support through the JavaSound Service Provider Interfaces (SPIs). This article covers this plugin architecture and API: how to write and use a custom SPI implementation, how metadata such as title, artist, and copyright is exposed, and how multiple SPI implementations can be integrated into an application such as a player or a game.

Plugin Architecture

The JavaSound API provides a plugin architecture, allowing third parties to support new formats such as MP3, Ogg Vorbis, FLAC, Monkey's Audio, and more. This architecture allows the JVM to discover and load plugins at runtime. Each plugin must implement the service provider interfaces. One implementation is needed for each new audio format supported. That's the reason why you can find one SPI implementation for MP3, one for Monkey's Audio, and so on.

To be loaded, the SPI implementation must be available on the JVM's runtime classpath. To play audio, the JVM looks for javax.sound.sampled.spi.AudioFileReader and javax.sound.sampled.spi.FormatConversionProvider, two text files stored in the META-INF/services folder of the plugin's JAR. These files contain the concrete class names of the SPI implementation to instantiate; those classes handle loading and decoding audio data. When an application needs to play an audio file, JavaSound tries each SPI implementation in turn, throwing UnsupportedAudioFileException if none matches. Thus, for a JavaSound-based application (such as an audio player, game, or educational program), developers don't have to pay attention to audio-format support; the application just uses the JavaSound API. SPI classes are needed at runtime only, not at build time, so in addition to technical advantages, their use can have business advantages when integrating GNU-GPL-based solutions.
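As a concrete illustration, the MP3 SPI's JAR ships two provider-configuration files, each containing a single line that names a concrete implementation class (the file paths are shown here as comments; lines beginning with # are treated as comments in provider-configuration files):

```
# META-INF/services/javax.sound.sampled.spi.AudioFileReader
javazoom.spi.mpeg.sampled.file.MpegAudioFileReader

# META-INF/services/javax.sound.sampled.spi.FormatConversionProvider
javazoom.spi.mpeg.sampled.convert.MpegFormatConversionProvider
```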

From the SPI Provider Side: An MP3 Sample

The JavaZOOM team provides an open source MP3 SPI implementation. It focuses on MP3 playback only. It relies on JLayer, an open source Java library that decodes and converts MP3 (MPEG 1, 2, and 2.5; Layers 1, 2, and 3) frames to PCM, the standard for uncompressed audio data. The JavaSound service provider interfaces allow a caller to read, convert, and write audio data, but to play MP3, we only need the read and convert features. JavaZOOM's MP3 SPI does not allow MP3 encoding.

To add MP3 support, the JavaSound API requires us to implement the AudioFileReader and FormatConversionProvider abstract classes. First, let's focus on our MpegAudioFileReader, which extends AudioFileReader. Six methods must be implemented: three return an AudioFileFormat instance from an input (File, URL, or InputStream) and three return an AudioInputStream instance.

To avoid code duplication, we developed one more generic method for getAudioFileFormat.
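The delegation pattern can be sketched as follows. The class and method names here are illustrative, not the real MP3 SPI's: the idea is that one protected method takes an InputStream plus its known length, and the public overloads forward to it.

```java
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;
import javax.sound.sampled.spi.AudioFileReader;
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

// Hypothetical sketch: the File and URL overloads delegate to a single
// InputStream-based method, passing the known stream length along.
public abstract class DelegatingAudioFileReader extends AudioFileReader {

    // The one "real" parser: reads and checks the stream's first frame.
    protected abstract AudioFileFormat getAudioFileFormat(
            InputStream in, long totalLength)
            throws UnsupportedAudioFileException, IOException;

    @Override
    public AudioFileFormat getAudioFileFormat(InputStream in)
            throws UnsupportedAudioFileException, IOException {
        // Plain stream: length is unknown.
        return getAudioFileFormat(in, AudioSystem.NOT_SPECIFIED);
    }

    @Override
    public AudioFileFormat getAudioFileFormat(File file)
            throws UnsupportedAudioFileException, IOException {
        try (InputStream in =
                 new BufferedInputStream(new FileInputStream(file))) {
            return getAudioFileFormat(in, file.length());
        }
    }

    @Override
    public AudioFileFormat getAudioFileFormat(URL url)
            throws UnsupportedAudioFileException, IOException {
        URLConnection conn = url.openConnection();
        try (InputStream in =
                 new BufferedInputStream(conn.getInputStream())) {
            return getAudioFileFormat(in, conn.getContentLengthLong());
        }
    }
}
```

The File and URL overloads simply wrap their input in a buffered stream and forward the known length, so the parsing logic lives in one place.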

Indeed, File and URL could be seen as InputStreams with a known length. We also did the same for getAudioInputStream. The work of our getAudioFileFormat is to read and parse the first MP3 frame to:

  1. Check if InputStream is a valid MP3 stream (if not, then it throws an UnsupportedAudioFileException).
  2. Extract audio information such as MPEG version, layer version, VBR flag, bitrate (bps), frequency (Hz), framesize, framerate, etc.
  3. Extract metadata such as ID3 tags (artist, album, date, copyright, comments, etc.).
  4. Return an MpegAudioFileFormat instance with all of these audio properties.

MpegAudioFileFormat extends AudioFileFormat by adding MP3-specific, high-level audio properties such as metadata (ID3 tags). Its constructor needs a Type and an AudioFormat.
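As an illustration, a minimal subclass in this spirit could look like the following. The class name, constructor shape, and the MP3 Type constant are our assumptions, not the actual MpegAudioFileFormat API:

```java
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of an AudioFileFormat subclass carrying MP3 metadata.
public class Mp3FileFormat extends AudioFileFormat {

    // A custom Type constant identifying MP3 streams.
    public static final AudioFileFormat.Type MP3 =
            new AudioFileFormat.Type("MP3", "mp3");

    private final Map<String, Object> props;

    public Mp3FileFormat(AudioFormat format, int byteLength,
                         int frameLength, Map<String, Object> metadata) {
        // Uses AudioFileFormat's protected constructor that also
        // records the byte length of the file.
        super(MP3, byteLength, format, frameLength);
        this.props = new HashMap<>(metadata);
    }

    @Override
    public Map<String, Object> properties() {
        // J2SE 1.5 contract: properties() returns an unmodifiable map.
        return Collections.unmodifiableMap(props);
    }
}
```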

We also extended AudioFormat to MpegAudioFormat to add MP3-specific properties (VBR, CRC flag, padding, etc.). Unlike AudioFileFormat, AudioFormat includes low-level audio properties such as sampling rate, channels, framesize and AudioFormat.Encoding. We defined multiple AudioFormat.Encoding constants, one for each combination of MPEG version and layer:

public class MpegEncoding extends AudioFormat.Encoding
{
  public static final AudioFormat.Encoding MPEG1L1 =
      new MpegEncoding("MPEG1L1");
  public static final AudioFormat.Encoding MPEG1L2 =
      new MpegEncoding("MPEG1L2");
  public static final AudioFormat.Encoding MPEG1L3 =
      new MpegEncoding("MPEG1L3");
  public static final AudioFormat.Encoding MPEG2L1 =
      new MpegEncoding("MPEG2L1");
  public static final AudioFormat.Encoding MPEG2L2 =
      new MpegEncoding("MPEG2L2");
  public static final AudioFormat.Encoding MPEG2L3 =
      new MpegEncoding("MPEG2L3");
  public static final AudioFormat.Encoding MPEG2DOT5L1 =
      new MpegEncoding("MPEG2DOT5L1");
  public static final AudioFormat.Encoding MPEG2DOT5L2 =
      new MpegEncoding("MPEG2DOT5L2");
  public static final AudioFormat.Encoding MPEG2DOT5L3 =
      new MpegEncoding("MPEG2DOT5L3");

  public MpegEncoding(String strName)
  {
    super(strName);
  }
}

Now, let's focus on our MpegFormatConversionProvider, which extends FormatConversionProvider.

The getSourceEncodings and getTargetEncodings methods return the lists of source and target encodings supported by the conversion provider. For MP3, it's important that the returned AudioFormat.Encodings cover only the combinations of sampling rate, bitrate, and channels allowed by the MP3 header specification. The getAudioInputStream methods return an AudioInputStream with the specified format (or encoding) from the given source AudioInputStream. For instance, the MP3 SPI could return a decoded 44.1 kHz/16-bit/stereo PCM stream given a 44.1 kHz/128 kbps/joint stereo input MP3 stream. To save time, we used the low-level classes of Tritonus. They provide convenient methods for the format-conversion matrix and the circular buffer implementation (to store decoded PCM data) needed by most SPI implementations. This way, the main job of our MpegFormatConversionProvider is to call the JLayer API to synchronize on and fetch decoded frames.
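To make the provider contract concrete, here is a hedged skeleton of a FormatConversionProvider that advertises a single custom encoding decodable to 16-bit PCM. All names are illustrative (this is not the actual MP3 SPI code), and the decoding step itself is stubbed out:

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.spi.FormatConversionProvider;

// Hypothetical skeleton: decodes one custom encoding to PCM_SIGNED.
public class PcmDecoderProvider extends FormatConversionProvider {

    // The single compressed encoding this provider understands.
    public static final AudioFormat.Encoding CUSTOM =
            new AudioFormat.Encoding("CUSTOM");

    @Override
    public AudioFormat.Encoding[] getSourceEncodings() {
        return new AudioFormat.Encoding[] { CUSTOM };
    }

    @Override
    public AudioFormat.Encoding[] getTargetEncodings() {
        return new AudioFormat.Encoding[] {
            AudioFormat.Encoding.PCM_SIGNED };
    }

    @Override
    public AudioFormat.Encoding[] getTargetEncodings(AudioFormat src) {
        return CUSTOM.equals(src.getEncoding())
            ? getTargetEncodings()
            : new AudioFormat.Encoding[0];
    }

    @Override
    public AudioFormat[] getTargetFormats(AudioFormat.Encoding target,
                                          AudioFormat src) {
        if (AudioFormat.Encoding.PCM_SIGNED.equals(target)
                && CUSTOM.equals(src.getEncoding())) {
            // Decode to 16-bit little-endian PCM at the source rate.
            return new AudioFormat[] {
                new AudioFormat(src.getSampleRate(), 16,
                                src.getChannels(), true, false) };
        }
        return new AudioFormat[0];
    }

    @Override
    public AudioInputStream getAudioInputStream(
            AudioFormat.Encoding target, AudioInputStream src) {
        AudioFormat[] formats = getTargetFormats(target, src.getFormat());
        if (formats.length == 0)
            throw new IllegalArgumentException("conversion not supported");
        return getAudioInputStream(formats[0], src);
    }

    @Override
    public AudioInputStream getAudioInputStream(
            AudioFormat target, AudioInputStream src) {
        // A real provider wraps src in a decoding stream here
        // (e.g., calling the JLayer decoder frame by frame).
        throw new UnsupportedOperationException("decoder not shown");
    }
}
```

Note how getTargetFormats is where the provider enforces which sampling rate/channel combinations it can actually produce, which is exactly the constraint described above for MP3.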

From the SPI User Side

Thanks to the JavaSound plugin architecture, using the MP3 SPI in your application is as easy as using the JavaSound API itself.

To get MP3 information (such as channels, sampling rate, and other metadata), call the static AudioSystem.getAudioFileFormat(file) method. It will return an instance of MpegAudioFileFormat, from which you can get audio properties. Note that the AudioSystem class acts as the entry point to the sampled-audio system resources.

File file = new File("filename.mp3");
AudioFileFormat baseFileFormat = null;
AudioFormat baseFormat = null;
baseFileFormat = AudioSystem.getAudioFileFormat(file);
baseFormat = baseFileFormat.getFormat();
// Audio type such as MPEG1 Layer3, or Layer 2, or ...
AudioFileFormat.Type type = baseFileFormat.getType();
// Sample rate in Hz (e.g. 44100).
float frequency = baseFormat.getSampleRate();

To play MP3, you first need to call AudioSystem.getAudioInputStream(file) to get an AudioInputStream from an MP3 file, then select the target format (i.e., PCM) according to the input MP3's channels and sampling rate, and finally get an AudioInputStream in the target format. If JavaSound doesn't find a matching SPI implementation supporting the MP3-to-PCM conversion, it will throw an exception.

File file = new File("filename.mp3");
AudioInputStream in = AudioSystem.getAudioInputStream(file);
AudioInputStream din = null;
AudioFormat baseFormat = in.getFormat();
AudioFormat decodedFormat =
    new AudioFormat(AudioFormat.Encoding.PCM_SIGNED,
                    baseFormat.getSampleRate(),
                    16,
                    baseFormat.getChannels(),
                    baseFormat.getChannels() * 2,
                    baseFormat.getSampleRate(),
                    false);
din = AudioSystem.getAudioInputStream(decodedFormat, in);
// Play now.
rawplay(decodedFormat, din);

Next, you have to send the decoded PCM data to a SourceDataLine: load PCM data from the decoded AudioInputStream into the SourceDataLine buffer until the end of the file is reached. JavaSound sends this data to the sound card. Once the file is exhausted, the line's resources must be closed.

private void rawplay(AudioFormat targetFormat,
                     AudioInputStream din)
    throws IOException, LineUnavailableException
{
  byte[] data = new byte[4096];
  SourceDataLine line = getLine(targetFormat);
  if (line != null)
  {
    // Start
    line.start();
    int nBytesRead = 0, nBytesWritten = 0;
    while (nBytesRead != -1)
    {
      nBytesRead = din.read(data, 0, data.length);
      if (nBytesRead != -1)
        nBytesWritten = line.write(data, 0, nBytesRead);
    }
    // Stop
    line.drain();
    line.stop();
    line.close();
    din.close();
  }
}

private SourceDataLine getLine(AudioFormat audioFormat)
    throws LineUnavailableException
{
  SourceDataLine res = null;
  DataLine.Info info =
      new DataLine.Info(SourceDataLine.class, audioFormat);
  res = (SourceDataLine) AudioSystem.getLine(info);
  res.open(audioFormat);
  return res;
}

If you're familiar with the JavaSound API, you will notice that the source code for playing MP3 is similar to what you'd use to play a WAV file. The source code sample above has no dependencies on the MP3 SPI implementation. It's transparent to the developer.

Notice that if the file to play were stored on a web server, we would use:

URL url = new URL("http://www.myserver.com/filename.mp3");
AudioInputStream in = AudioSystem.getAudioInputStream(url);

instead of:

File file = new File("filename.mp3");
AudioInputStream in = AudioSystem.getAudioInputStream(file);


Metadata

Most audio formats include metadata such as title, album, comments, compression quality, encoding, and copyright. ID3 tags, used for MP3, are the best-known metadata format. Depending on the ID3 version (v1 or v2), they can be found either at the end or at the beginning of an MP3 file. They include information such as duration, title, album, artist, track number, date, genre, copyright, etc. They can even include lyrics and pictures. The famous (and free) SHOUTcast streaming MP3 server, from Nullsoft, uses a different scheme to provide additional metadata, such as stream titles, which allow a player to display the current song being played from an online radio stream.

All of these metadata items need to be parsed and exposed through the SPI implementation. As of J2SE 1.5, the JavaSound API standardizes the passing of metadata through an immutable java.util.Map:

File file = new File("filename.mp3");
AudioFileFormat baseFileFormat =
    AudioSystem.getAudioFileFormat(file);
Map properties = baseFileFormat.properties();
String key_author = "author";
String author = (String) properties.get(key_author);
String key_duration = "duration";
Long duration = (Long) properties.get(key_duration);

All metadata keys and types should be provided in the SPI documentation. Common properties include duration, title, author, album, date, and copyright.

Using Multiple SPIs in an Application

Adding MP3 audio capabilities to the Java platform means adding the JAR files containing the MP3 SPI implementation to the runtime CLASSPATH. Adding Ogg Vorbis, Speex, FLAC, or Monkey's Audio support would be similar, but could generate conflicts that make other SPI implementations fail. The following situation could occur:

  1. Your runtime application CLASSPATH includes both MP3 and Ogg Vorbis SPIs.
  2. Your application tries to play an MP3 file.
  3. JavaSound's AudioSystem tries Ogg Vorbis SPI first.
  4. The Ogg Vorbis SPI implementation doesn't detect that the incoming file isn't an Ogg-Vorbis-compliant stream, so it doesn't throw any exception.
  5. Your application tries to play an MP3 with the Ogg Vorbis SPI. At best, you will get a runtime exception (NullPointerException, ArrayIndexOutOfBoundsException); at worst, you will hear weird noises or hit a deadlock.

In the example above, the problem admittedly comes from the Ogg Vorbis SPI implementation, but it's not easy for an SPI provider to implement reliable checks (just think about streaming). Thus, each SPI provider has to pay attention to the others. That's the main practical drawback of the JavaSound plugin architecture. So don't be surprised if you have problems making multiple SPIs work together in your application.
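One application-side mitigation (our suggestion, not part of the SPI code discussed above) is to treat any runtime exception thrown while probing or opening a file as "format unsupported," so that one misbehaving SPI cannot crash playback of other formats:

```java
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;
import java.io.File;
import java.io.IOException;

// Defensive sketch: wrap AudioSystem probing so a buggy SPI that
// accepts a stream and then blows up is reported as an I/O failure
// instead of propagating a raw RuntimeException.
public class SafeOpen {
    public static AudioInputStream open(File file) throws IOException {
        try {
            return AudioSystem.getAudioInputStream(file);
        } catch (UnsupportedAudioFileException e) {
            throw new IOException("no SPI matched: " + file, e);
        } catch (RuntimeException e) {
            // A buggy SPI accepted the stream, then failed while parsing.
            throw new IOException("SPI failed on: " + file, e);
        }
    }
}
```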

Differences with JMF

JMF stands for Java Media Framework. It's an optional J2SE package that adds multimedia support to the Java platform. It includes audio (GSM, QuickTime, etc.), video (AVI, QuickTime, H.263, etc.), and RTP streaming features. JMF provides a plugin architecture, but it is not compatible with JavaSound's. In fact, MP3 support was previously included in JMF, but it was removed in 2002 because of licensing issues.


Conclusion

JavaSound rocks. It provides a plugin architecture allowing any third-party provider to add custom audio format support, such as for MP3 files. The API is flexible enough to plug most heterogeneous (lossy or lossless) audio formats, whatever their parameters and metadata, into the Java platform -- "Write once, play anywhere."

References and Resources

The JavaZOOM Team are the authors of the open source projects JLayer and jlGui.


Copyright © 2009 O'Reilly Media, Inc.