ONJava.com -- The Independent Source for Enterprise Java

10 Reasons We Need Java 3.0

by Elliotte Rusty Harold
07/31/2002

Over the last few years, refactoring -- the process of gradually improving a code base by renaming methods and classes, extracting common functionality into new methods and classes, and generally cleaning up the mess inherent in most 1.0 systems -- has gained a lot of adherents. Integrated Development Environments (IDEs) like Eclipse and IDEA can now automatically refactor code.

But what if it's not just your code that needs refactoring? What if the language itself has inconsistencies, inefficiencies, and just plain idiocies that need to be corrected? When you get right down to it, the entirety of Java is really just like any other large code base. It has some brilliant parts, some functional parts, and some parts that make just about everyone scratch their heads and ask, "What the hell were they thinking?"

It's now a little more than 11 years after James Gosling began working on Oak, the language that would eventually become Java, and seven years since Sun posted the first public release of Java. The language, class library, and virtual machine collectively known as "Java" are all showing their age. There are many parts of Java that everyone agrees should be fixed but can't be, for reasons of backwards compatibility. Until now, revisions of Java have attempted to maintain "upwards compatibility"; that is, all earlier code should continue to run unchanged in later versions of Java. This has limited the changes that can be made to Java, and prevented Sun from fixing many obvious problems.

This article imagines a "Java 3" that jettisons the baggage of the last decade, and proposes numerous changes to the core language, virtual machine, and class libraries. The focus here is on those changes that many people (including the Java developers at Sun) would really like to make, but can't -- primarily for reasons of backwards compatibility.

I am specifically not focusing on new features that could be added to Java 2 today, useful as they might be. These can be addressed through the Java Community Process. Instead, I want to look at how we could do the same things Java does today, only better. For instance, while I'd love to see a complex number data type as a standard part of the Java language, this could be added to Java 1.5 without breaking existing code. On the other hand, changing the existing char type to use four bytes rather than two would be radically incompatible with most existing code.

Similarly, I am only looking at changes that will leave Java as the same language we know and love today. I want to talk about refactoring the language, not reinventing it. I am not interested in purely syntactic changes, such as eliminating the semicolons at the ends of lines or making indentation significant. These sorts of changes could readily be implemented as byte code compilers for other languages like Python and F. Indeed, such compilers already exist. The changes I want to address are much more fundamental, and often lie across the boundaries between language, library, and virtual machine. With that in mind, let's look at my top 10 list of possible refactorings for Java 3. (See Gosling's "Design Principles" slide for a justification for simplicity and lack of redundancy.)

10. Delete all deprecated methods, fields, classes, and interfaces.

This one's a no-brainer. Java 1.4.0 ships with 22 deprecated classes, 8 deprecated interfaces, 50 deprecated fields, and over 300 deprecated methods and constructors. Some, like List.preferredSize() and Date.parse(), are deprecated because there are now equivalent or better methods to do the same thing. Others, like Thread.stop() and Thread.resume(), are deprecated because they were a bad idea in the first place and could be actively dangerous. Whatever the reason a method has been deprecated, the fact is, we're not supposed to be using it.

Sun's official line is, "It is recommended that programs be modified to eliminate the use of deprecated methods and classes, though there are no current plans to remove such methods and classes entirely from the system." It's time to cut the umbilical cord. Ditch them all now. This can only make Java simpler, cleaner, and safer.

9. Fix incorrect naming conventions.

One of Java's contributions to code readability has been consistent naming conventions, even though they aren't enforced by the compiler. Class names are nouns that begin with capital letters. Fields, variables, and methods begin with lowercase letters. All use camel case. Named constants are written in all caps with underscores separating the words. I can pick up the code of any experienced Java programmer on the planet and expect that their naming conventions will match mine.

When Java 1.0 was being written, however, not all the programmers had internalized Java's naming conventions yet. There are numerous minor but annoying inconsistencies throughout the API. For instance, the color constants are Color.red, Color.blue, Color.green, etc., instead of Color.RED, Color.BLUE, Color.GREEN, etc. Java 1.4 finally added the capitalized versions, but still retains the incorrect lowercase versions, doubling the number of fields in this class. These inconsistencies should be cataloged and corrected.

Another beneficial coding convention Java thrust upon an occasionally resistant world was using full names with no abbreviations. However, some of the most basic Java methods are abbreviated. Why, for instance, do we type System.gc() instead of System.collectGarbage()? It's not as if this method is called so frequently that the time saved typing twelve fewer letters is important. Similarly, the InetAddress class should really be named InternetAddress.

Along the way, let's move JDBC into the javax packages. JDBC is important, but it's hardly a core language feature. The only reason it isn't already in javax is that the javax naming convention for standard extensions hadn't been invented when JDBC was first added to the JDK back in Java 1.1. Programmers working with JDBC can still use it. The rest of us can safely ignore it.

8. Eliminate primitive data types.

This will undoubtedly be my most controversial proposal, but bear with me. I am not talking about removing int, float, double, char, and other types completely. I simply want to make them full objects with classes, methods, inheritance, and so forth. This would make Java's type system much cleaner. We'd no longer need to use type-wrapper classes to add primitives to lists and hash tables. We could write methods that operated on all variables and data. All types would be classes and all classes would be types. Every variable, field, and argument would be an instance of Object. Java would finally become a pure object-oriented language.
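To make the wrapper problem concrete, here is a minimal sketch of what it takes to put a single int into a list and get it back out. The class and method names are invented for illustration, and every conversion is written out by hand so that each step is visible.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative class; the names here are hypothetical.
class WrapperDemo {
    // Stores an int in a list and reads it back, with every conversion
    // written out by hand.
    static int roundTrip(int value) {
        List<Object> values = new ArrayList<Object>();
        // The list holds object references, so the int goes in as an Integer.
        values.add(Integer.valueOf(value));
        // Coming back out, the element is cast to Integer and then unwrapped.
        Integer wrapped = (Integer) values.get(0);
        return wrapped.intValue();
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(23)); // prints 23
    }
}
```

If int were itself a class, the wrapping, the cast, and the unwrapping would all have nothing left to do.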

The reason Java used primitive data types in the first place was speed. The claim was that pure object-oriented languages like Smalltalk were too slow for production code. But after seven years of Moore's law, computers are a lot faster and have a lot more memory than they used to. Even more importantly, compiler technology has advanced to the point where it's really not so hard to replace object-based source code with primitive-based byte code where appropriate. Modern Eiffel, C#, and Smalltalk compilers already do this. In essence, a good compiler should be able to figure out when to use ints and when to use BigIntegers and transparently swap between the two.

The new byte, int, long, double, float, and char classes would still have the literal forms they have today. Just as the statement String s = "Hello" creates a new String object, so too would int i = 23 create a new int object. Similarly, the compiler would recognize all of the customary operators like +, -, and *, and map them to the appropriate methods in the classes. This is no more complicated than the compiler's native understanding of the plus sign for string concatenation today. Most existing arithmetic code would work exactly as it works today. The int/char/double/float/boolean objects would be immutable, so these objects would be thread-safe and could be interned to save memory. The classes would probably be final for reasons of both safety and performance.

I'd also like to consider whether Java's arithmetic rules are correct. The floating point operations are defined by IEEE 754 and, for compatibility with other languages and hardware, it's important to keep that. The integer types offer real room for improvement, however. It is mathematically incorrect for two billion plus two billion to equal -294,967,296, yet it does in Java today.
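The overflow is easy to reproduce. The following sketch (the class and method names are mine, chosen only for illustration) adds two billion to itself twice: once with int arithmetic, and once with java.math.BigInteger, which is not bounded in size.

```java
import java.math.BigInteger;

// Illustrative class; the names here are hypothetical.
class OverflowDemo {
    // 32-bit int addition silently wraps around on overflow.
    static int addInts(int a, int b) {
        return a + b;
    }

    // BigInteger addition is exact regardless of magnitude.
    static BigInteger addExact(long a, long b) {
        return BigInteger.valueOf(a).add(BigInteger.valueOf(b));
    }

    public static void main(String[] args) {
        int twoBillion = 2000000000;
        System.out.println(addInts(twoBillion, twoBillion));  // prints -294967296
        System.out.println(addExact(twoBillion, twoBillion)); // prints 4000000000
    }
}
```

No exception is thrown and no warning is issued in the int case; the wrong answer simply flows on into the rest of the program.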

There should be at least one integer type that is not bounded in size, and perhaps it should be the default type. If so, it could easily subsume the short, int, and long types. The byte type still seems necessary for I/O, and it could also remain for those rare cases like image filters where bitwise manipulation is really necessary; however, using bitwise operators like << and & on integers confuses implementation with interface and thus violates a fundamental principle of object orientation. The various bitwise constants, such as Font.BOLD and SelectionKey.OP_ACCEPT, used throughout the Java API should be replaced with type-safe enums and/or getter and setter methods.
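As a rough sketch of what replacing a bitwise constant might look like, compare a flag test on an int with the same test on a set of enumerated values. Everything here is an assumption made for illustration: the Style type is hypothetical, the enum keyword and EnumSet come from Java releases later than 1.4, and the int constants are copies of the Font.BOLD and Font.ITALIC values so the example does not depend on AWT.

```java
import java.util.EnumSet;
import java.util.Set;

// Illustrative class; Style is a hypothetical type, not part of the Java API.
class FontStyleDemo {
    // Same values as java.awt.Font.BOLD and java.awt.Font.ITALIC.
    static final int BOLD = 1;
    static final int ITALIC = 2;

    // Hypothetical type-safe replacement for the int constants.
    enum Style { BOLD, ITALIC }

    // Bit-flag version: the caller must know the styles are packed into bits.
    static boolean isBold(int style) {
        return (style & BOLD) != 0;
    }

    // Type-safe version: only real Style values can be passed.
    static boolean isBold(Set<Style> style) {
        return style.contains(Style.BOLD);
    }

    public static void main(String[] args) {
        System.out.println(isBold(BOLD | ITALIC));                        // prints true
        System.out.println(isBold(EnumSet.of(Style.BOLD, Style.ITALIC))); // prints true
        // This compiles even though 42 is not a meaningful style.
        System.out.println(isBold(42));
    }
}
```

The int version exposes the bit layout to every caller and accepts any number at all; the set-based version hides the representation and lets the compiler reject nonsense.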

The basic story would be that integers are for arithmetic and bytes are for memory manipulation. Thus, in reverse, we might choose to ban arithmetic operations like addition and subtraction on bytes. Even today, adding two bytes automatically promotes them to ints because the virtual machine doesn't support these operations on any type narrower than an int.
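The promotion rule is visible in even the smallest program. In this sketch (class and method names are illustrative only), the sum of two bytes is an int, and it takes an explicit cast to force the result back into a byte.

```java
// Illustrative class; the names here are hypothetical.
class BytePromotionDemo {
    // a + b is computed as an int; the cast keeps only the low 8 bits.
    static byte addBytes(byte a, byte b) {
        // byte sum = a + b;   // does not compile: a + b has type int
        return (byte) (a + b);
    }

    public static void main(String[] args) {
        byte a = 100;
        byte b = 100;
        System.out.println(a + b);          // prints 200, an int
        System.out.println(addBytes(a, b)); // prints -56, 200 wrapped into a byte
    }
}
```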

There's substantial evidence from other pure OO languages that this scheme can be implemented efficiently. Nonetheless, I anticipate resistance to these ideas from the performance-at-any-cost crowd. Naive implementations will require more memory than existing Java code (which is already not particularly stingy with the megabytes). This is likely to be a special problem in J2ME and smaller environments. J2ME might choose to take a different path than J2SE and J2EE.

J2ME can continue development based on Java 2 with its dichotomy between primitive and object types, its 2+2=-1 arithmetic, and all of the problems that entails. In this environment, the benefits of moving may not outweigh the cost. But Java is no longer a language just for cheap set-top boxes (and really it never was). The needs of the desktop and the server are not the same as the needs of the cell phone and the digital watch. Programmers in each environment need a language tailored for them. One size does not fit all.

7. Extend chars to four bytes.

Whether the char type is primitive or an object, the truth is that Unicode is not a two-byte character set. This was perhaps not so important in the last millennium when Unicode characters outside the basic multilingual plane were just a theoretical possibility. As of version 3.2, however, Unicode has about 30,000 more characters than can be squeezed into two bytes. Four-byte characters include many mathematical and most musical symbols. In the future they're also likely to encompass fictional scripts like Tolkien's Tengwar and extinct scripts like Linear B. Currently, Java tries to work around the problem by using surrogate pairs, but the acrobatics required to properly handle these are truly ugly, and are already causing major problems for systems like XML parsers that need to deal with this ugliness.

Whether Java promotes the char type to an object or not, it needs to adopt a model in which characters are a full four bytes. If Java does go to fully object-oriented types, it could still use UTF-16 or UTF-8 internally for chars and strings to save space. Externally, all characters should be created equal. Using one char to represent most characters but two chars to represent some is too confusing. You shouldn't have to be a Unicode expert just to include a little music or math in your strings.
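A small sketch shows the confusion. The musical G clef, U+1D11E, lies outside the basic multilingual plane, so a Java string has to hold it as the surrogate pair D834 DD1E. The class and method names below are invented for illustration, and codePointCount and codePointAt are assumptions in the sense that they come from a Java release later than 1.4.

```java
// Illustrative class; the names here are hypothetical.
class SurrogateDemo {
    // U+1D11E MUSICAL SYMBOL G CLEF, written as its UTF-16 surrogate pair.
    static final String G_CLEF = "\uD834\uDD1E";

    // Number of 16-bit char units in the string.
    static int charUnits(String s) {
        return s.length();
    }

    // Number of Unicode characters (code points) in the string.
    // codePointCount is not available in Java 1.4.
    static int characters(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        System.out.println(charUnits(G_CLEF));  // prints 2
        System.out.println(characters(G_CLEF)); // prints 1
        System.out.println(Integer.toHexString(G_CLEF.codePointAt(0))); // prints 1d11e
    }
}
```

One character, two chars: length(), charAt(), and substring() all count in units that can split the symbol in half.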
