Top Ten New Things You Can Do with NIO
5: Memory-Mapped Files
The theme of wrapping ByteBuffer objects around
arbitrary memory spaces continues with MappedByteBuffer,
a specialized form of ByteBuffer. On most operating
systems, it's possible to memory map a file using the
mmap() system call (or something similar) on an open file
descriptor. Calling mmap() returns a pointer to a memory
segment, which actually represents the content of the file. Fetches
from memory locations within that memory area will return data from
the file at the corresponding offset. Modifications made to the memory
space are written to the file on disk.

Figure 7: User memory mapped to the filesystem.
There are two big advantages to memory-mapped files. First, the "memory" does not usually consume normal virtual memory space. Or, more correctly, the virtual memory space of a file mapping is backed by the file data on disk. That means it's not necessary to allocate regular paging space for mapped files; their paging area is the file itself. If you were to open the file conventionally and read it into memory, that would consume a corresponding amount of paging space, because you're copying the data into regular memory. Second, multiple mappings of the same file share the same physical pages of memory. Theoretically, 100 mappings can be established by 100 different processes to the same 500MB file; each will appear to have the entire 500MB of data in memory, but the system's paging space consumption won't change a bit. Pieces of the file will be brought into memory as references are made, which will compete for RAM, but no paging space will be consumed.
In Figure 7, additional processes running in user space would map to that same physical memory space, through the same filesystem cache and thence to the same file data on disk. Each of those processes would see changes made by any other. This can be exploited as a form of persistent, shared memory. Operating systems vary in the way their virtual memory subsystems behave, so your mileage may also vary.
MappedByteBuffer instances are created by invoking the
map() method on an open FileChannel object.
The MappedByteBuffer class has a couple of additional
methods for managing caching and flushing of updates to the underlying
file.
Prior to NIO, it wasn't possible to memory map files without resorting to platform-specific, non-portable native code. It's now possible for any pure Java program to take advantage of memory mapping, easily and portably.
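As a rough illustration of the map() call described above, here is a minimal sketch. The class and method names (MappedFileSketch, writeThroughMapping, readThroughMapping) are hypothetical, and it assumes Java 7 or later for FileChannel.open() and try-with-resources; on the original NIO release you would obtain the channel from a RandomAccessFile or stream instead. It also assumes the file already exists and is at least as long as the text being stored.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper class for illustration; not part of NIO.
final class MappedFileSketch {

    // Maps the whole (existing, non-empty) file read/write and stores
    // the ASCII bytes of text at offset 0 of the mapping.
    static void writeThroughMapping(Path file, String text) throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer mapped =
                    channel.map(FileChannel.MapMode.READ_WRITE, 0, channel.size());
            mapped.put(text.getBytes(StandardCharsets.US_ASCII));
            // force() asks the OS to write pending changes to the storage device.
            mapped.force();
        }
    }

    // Maps the first length bytes read-only and returns them as ASCII text.
    static String readThroughMapping(Path file, int length) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer mapped =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, length);
            byte[] bytes = new byte[length];
            mapped.get(bytes);
            return new String(bytes, StandardCharsets.US_ASCII);
        }
    }
}
```

Note that a mapping stays valid after its channel is closed; it is released only when the buffer is garbage collected, which is why the sketch can close the channel as soon as it is finished with the buffer.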
4: Scattering Reads and Gathering Writes
Here's a familiar bit of code:
byte [] byteArray = new byte [100];
...
int bytesRead = fileInputStream.read (byteArray);
This reads some data from a stream into an array of bytes.
Here's the equivalent read operation using ByteBuffer
and FileChannel objects (to move the examples into the
NIO realm):
ByteBuffer byteBuffer = ByteBuffer.allocate (100);
...
int bytesRead = fileChannel.read (byteBuffer);
And here's a common usage pattern:
ByteBuffer header = ByteBuffer.allocate (32);
ByteBuffer colorMap = ByteBuffer.allocate (256 * 3);
ByteBuffer imageBody = ByteBuffer.allocate (640 * 480);
fileChannel.read (header);
fileChannel.read (colorMap);
fileChannel.read (imageBody);
This performs three separate read() calls to load a
hypothetical image file. This works fine, but wouldn't it be great
if we could issue a single read request to the channel and tell it to
place the first 32 bytes into the header buffer, the next
768 bytes into the colorMap buffer, and the remainder into
imageBody?
No problem, can do, easy. Most NIO channel types support scatter/gather, also known as vectored I/O. A scattering read to the above buffers can be accomplished with this code:
ByteBuffer [] scatterBuffers = { header, colorMap, imageBody };
fileChannel.read (scatterBuffers);
Rather than pass a single buffer object to the channel, an array of buffers is passed in. The channel fills each buffer in turn until all are full or there are no more data to read. Gathering writes are done in a similar way; data are drained from each buffer in the list in turn and sent along the channel, exactly as if they had been written sequentially.
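To make the fill-in-order behavior concrete, here is a small sketch of a scattering read wrapped in a helper. The names ScatteringReadSketch and readInto are hypothetical, and the code assumes Java 7 or later for FileChannel.open(). A single read() call is not guaranteed to fill every buffer, so the sketch loops until the buffers are full or the channel reports end of file.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper class for illustration; not part of NIO.
final class ScatteringReadSketch {

    // Fills the buffers in array order from the start of the file, then
    // flips each one so it is ready to be drained. Returns the bytes read.
    static long readInto(Path file, ByteBuffer... buffers) throws IOException {
        long total = 0;
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            while (anyRemaining(buffers)) {
                long n = channel.read(buffers);   // scattering read
                if (n < 0) {
                    break;                        // end of file
                }
                total += n;
            }
        }
        for (ByteBuffer buffer : buffers) {
            buffer.flip();
        }
        return total;
    }

    // True while at least one buffer still has room.
    private static boolean anyRemaining(ByteBuffer[] buffers) {
        for (ByteBuffer buffer : buffers) {
            if (buffer.hasRemaining()) {
                return true;
            }
        }
        return false;
    }
}
```

With a 32-byte buffer, a 768-byte buffer, and a 640 * 480 byte buffer passed in that order, the first 32 bytes of the file land in the first buffer, the next 768 in the second, and the rest in the third.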
Scatter/gather can provide a real performance boost when reading or writing data that's partitioned into fixed-size, logically distinct segments. Passing a list of buffers means the entire transfer can be optimized (using multiple CPUs for example) and fewer overall system calls are needed.
Gathering writes can compose results from several buffers. For
example, an HTTP response could use a read-only buffer containing
static headers that are the same for every response, a dynamically-populated buffer for those headers unique to the response, and a
MappedByteBuffer object associated with a file, which is
to be the body of the response. A given buffer may even appear in more
than one gather list, or multiple views of the same buffer can be
used.
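The HTTP response idea can be sketched roughly as follows. Everything named here (GatheringResponseSketch, writeResponse, the header text) is a hypothetical illustration, not a real server API. The body is passed in as a plain ByteBuffer, which could just as well be a MappedByteBuffer. The static headers are shared through duplicate(), which gives each response its own position and limit over the same underlying bytes.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.GatheringByteChannel;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration; not part of NIO.
final class GatheringResponseSketch {

    // Headers that are identical for every response. Read-only, so no
    // caller can alter the shared content.
    private static final ByteBuffer STATIC_HEADERS = ByteBuffer.wrap(
            "HTTP/1.1 200 OK\r\nServer: sketch\r\n".getBytes(StandardCharsets.US_ASCII))
            .asReadOnlyBuffer();

    // Writes static headers, per-response headers, and the body with
    // gathering writes. Returns the total number of bytes written.
    static long writeResponse(GatheringByteChannel out, ByteBuffer body)
            throws IOException {
        ByteBuffer dynamicHeaders = StandardCharsets.US_ASCII.encode(
                "Content-Length: " + body.remaining() + "\r\n\r\n");
        ByteBuffer[] gather = {
            STATIC_HEADERS.duplicate(),  // a view: own position, shared bytes
            dynamicHeaders,
            body
        };
        long total = 0;
        // A single write() may drain only part of the list, so loop.
        while (anyRemaining(gather)) {
            total += out.write(gather);
        }
        return total;
    }

    // True while at least one buffer still holds unwritten bytes.
    private static boolean anyRemaining(ByteBuffer[] buffers) {
        for (ByteBuffer buffer : buffers) {
            if (buffer.hasRemaining()) {
                return true;
            }
        }
        return false;
    }
}
```

The loop is written with blocking channels in mind. A non-blocking socket channel may return zero from write(), in which case real code would wait on a selector rather than spin.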
3: Direct Channel Transfers
Did you ever notice that whenever you need to copy data to or from a file, you seem to write the same old copy loop over and over again? It's always the same story: you read a chunk of data into a buffer, then immediately write it back out again somewhere else. You're not doing anything with that data, so why is it necessary to pull it in just to shove it back out again? Why is it necessary to continually reinvent this wheel?
Here's a thought. Wouldn't it be great if you could just tell some class "Move the data from this file to that one" or "Write all the data that comes out of that socket to this file over there"? Well, thanks to the modern miracle of direct channel transfers, now you can.
public abstract class FileChannel
    extends AbstractInterruptibleChannel
    implements ByteChannel, GatheringByteChannel, ScatteringByteChannel
{
    // This is a partial API listing
    public abstract long transferTo (long position, long count,
        WritableByteChannel target) throws IOException;
    public abstract long transferFrom (ReadableByteChannel src,
        long position, long count) throws IOException;
}
A channel transfer lets you cross-connect two channels so that data
is transferred directly from one to the other without any further
intervention on your part. Because the transferTo() and
transferFrom() methods belong to the FileChannel
class, a FileChannel object must be the source or
destination of a channel transfer (you can't transfer from one
socket to another, for example). But the other end may be any
ReadableByteChannel or WritableByteChannel,
as appropriate.
On operating systems with appropriate support, channel transfers can be done entirely in kernel space. This not only relieves you of the chore of doing the copy, it bypasses the JVM entirely! One low-level system call and boom! Done. Even on those OS platforms without kernel support for transfers, making use of these methods still saves you the trouble of writing yet another copy loop. And the odds are good that the implementation will use native code or other optimizations to move the data as quickly as possible, faster than you could ever do it yourself in regular Java code. And the best part: code you never write never has bugs.
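Here is a minimal sketch of a file-to-file copy built on transferTo(). The names ChannelTransferSketch and copy are hypothetical, and the code assumes Java 7 or later for FileChannel.open(). One caveat: transferTo() returns the number of bytes actually moved, which may be fewer than requested, so a small loop around the call is still the safe pattern. The loop no longer touches the data itself.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper class for illustration; not part of NIO.
final class ChannelTransferSketch {

    // Copies source to target using channel transfers and returns the
    // number of bytes moved. No user-space buffer is allocated here.
    static long copy(Path source, Path target) throws IOException {
        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(target,
                     StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE,
                     StandardOpenOption.TRUNCATE_EXISTING)) {
            long size = in.size();
            long position = 0;
            while (position < size) {
                long moved = in.transferTo(position, size - position, out);
                if (moved <= 0) {
                    break;  // nothing more could be transferred
                }
                position += moved;
            }
            return position;
        }
    }
}
```

Because the target parameter is a WritableByteChannel, the same loop would work with a socket channel as the destination.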