International Calendars in Java
IBM Center for Java Technology
Over the last few years, many programmers have had a growing awareness of international issues, both in Java and in other languages. The software industry and the economy as a whole are becoming much more global, and there is an increasing need for applications that can function properly in more than one language and country. In addition, many programming toolkits such as the Java Class Libraries, the Win32 and Macintosh API’s, POSIX, etc., have fairly extensive international support built in, which makes writing an internationalized application much easier than it used to be.
While these API’s are all designed differently, at their core they provide a similar set of functionality. There are character converters that transform Unicode to legacy code pages, and vice-versa. There are sorting routines or collator objects that can be used for language-sensitive string comparison. There are facilities for word- and line-break detection in different languages. Finally, there are ways of formatting numbers, dates, times, and currencies for different languages and countries.
The date, time, and number formatters are necessary because most countries have different conventions for displaying this data. For example, an American English speaker would write the date 1/1/2000 AD (or 1/1/2000 CE) as "Saturday, January 1, 2000." A British English speaker might write "Saturday, 1 January 2000" instead. And a French speaker would write "samedi 1 janvier 2000". Any self-respecting international library can handle this for you. In Java, you’d use the class java.text.DateFormat to do the work.
Still, I suspect that even today a lot of programmers would look at the title of this article: "International Calendars", and wonder just what those two words have to do with each other. Dates, sure. But calendars? Though the topics might not seem linked at first, the connection is fairly obvious once you think about it.
Consider what should happen when a Hebrew speaker in Israel is using your program. The same date we discussed above, 1/1/2000 AD, would be displayed as "שבת 23 טבת 5760" or "Saturday 23 Tevet 5760". Not only are the strings different, the numbers are as well.
Though it comes as a surprise to many Americans, the official calendar in Israel is the Hebrew calendar, not the Gregorian one that we use in most of the Western world. The Hebrew calendar, as well as many others such as Hijri (Islamic), Hindu, Buddhist, and Japanese, all number the years differently. Many of them, including the Hebrew, also have a different system for calculating months, which leads to 1/1/2000 AD being "23 טבת", or "Tevet 23", rather than January 1.
All of this means that internationalization of dates and times requires more than just a different table of strings for each language. A program that naïvely assumes that the Gregorian calendar applies everywhere will be hopelessly wrong in countries that use a different calendar rather than just a different language.
In this article, I’ll discuss the Java Class Library facilities that allow you to manipulate and display dates and times. Next, I’ll show how you can extend the Java calendar classes to support calendars that are not built in to the JDK. Finally, I’ll discuss some free classes from IBM that support the Buddhist, Hebrew, Hijri, and Japanese Imperial calendars.
A brief history
First, let me jump back to JDK 1.0 for a moment. The first release of Java had relatively poor support for international dates and times, with java.util.Date and its toString method the only real tools at your disposal. The situation in the rest of Java was similar. It had the beginnings of international support, because a Java char is stored as a Unicode character. But that's all. You couldn't enter or display non-Latin characters, and there were no facilities for language-sensitive formatting, sorting, and so on.
The management of Sun and IBM found a way to fix this problem for JDK 1.1. Java was missing international support. But IBM’s Taligent subsidiary had great international technology, talented engineers -- including Dr. Mark Davis, president of the Unicode Consortium -- and a location about 100 yards away from Sun’s JavaSoft division in Cupertino, California. Thus a partnership was born. IBM arranged for Taligent’s Text and International group to contribute international classes into Sun’s JDK in order to make Java powerful enough for real-world business applications.
Taligent, in collaboration with Sun's internationalization engineers, provided the new java.text package, plus a number of new classes in java.util. This included the date- and time-related classes DateFormat, SimpleDateFormat, Calendar, GregorianCalendar, TimeZone, and SimpleTimeZone. I’ll discuss these classes in turn, starting with the old Date class.
Date
The java.util.Date class has been part of Java since JDK 1.0. Each instance of Date represents a particular instant in time, stored as a long number of milliseconds since January 1, 1970 AD, 00:00 GMT. To construct a Date, you would typically use the constructor:
Date(int year, int month, int date)
or one of its variants that takes additional arguments such as hours, minutes, seconds, etc. In addition, there is a constructor whose argument is a String such as "Sat Aug 12 1995 13:30:00 GMT".
In JDK 1.0, and even today, these methods all work as advertised. If you execute the following code:
Date d = new Date(99, 1, 1);
String s = d.toString();
the value of s will be "Mon Feb 1 00:00:00 PST 1999".
There are a few obvious problems here. The most blatant is that the year argument to the constructor is only two digits, with 1900 assumed to be the origin. This is a huge Y2K problem, because there’s no way to specify a date before 1900 or after 1999. There’s also no obvious way to fix this. Sun can’t change the meaning of the first parameter, because that would break existing code. Adding a Y2K-safe overload would be difficult too, since an overload must have different argument types.
The next problem is the toString method. The JDK 1.0 documentation stated that the string it returns was always of the form "Sat Aug 12 1995 13:30:00 GMT", with US English day and month names and time zone abbreviations. Again, there was no way to change this in a later release. Once the documentation, which is effectively the specification for the Java Class Libraries, guarantees a certain behavior, it can't be changed without the danger of breaking existing applications.
Finally, take another look at the Date constructor in the code snippet above:
Date d = new Date(99, 1, 1);
Notice that we passed in "1" for the month and day, but the resulting date was February 1st. Since arrays in Java and C are 0-based, and since month numbers are often used as an index into an array of strings, the original designers of Java decided to make the month numbers 0-based. So January is month 0, February is month 1, and so on. Unfortunately most people, even programmers, think of January as the 1st month of the year, not the 0th, so this choice has led to a great deal of confusion.
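Both quirks, the 1900-based year and the zero-based month, are easy to see by building a Date with the old constructor and reading the fields back through Calendar. The following is only a small sketch; the class name DateQuirks and the helper fieldsOf are made up for illustration and are not part of the JDK.

```java
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;

// Hypothetical demo class; only Date, Calendar and GregorianCalendar are JDK classes.
class DateQuirks {

    // Builds a Date with the deprecated JDK 1.0 constructor and reads the
    // result back as { full year, zero-based month, day of month }.
    @SuppressWarnings("deprecation")
    static int[] fieldsOf(int yearMinus1900, int zeroBasedMonth, int day) {
        Date d = new Date(yearMinus1900, zeroBasedMonth, day);
        Calendar c = new GregorianCalendar(); // default time zone, like the Date constructor
        c.setTime(d);
        return new int[] {
            c.get(Calendar.YEAR), c.get(Calendar.MONTH), c.get(Calendar.DAY_OF_MONTH)
        };
    }

    public static void main(String[] args) {
        int[] f = fieldsOf(99, 1, 1);
        // The year is an offset from 1900 and the month counts from 0,
        // so (99, 1, 1) means February 1, 1999.
        System.out.println(f[0] + "-" + (f[1] + 1) + "-" + f[2]); // prints 1999-2-1
    }
}
```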
When it was time to work on JDK 1.1, we were faced with a decision: What should we do about Date? Because of the problems I discussed above, IBM and Sun decided that Date was so broken that it couldn't be fixed and decided to replace it instead. But since Date was trying to do so many different things (date formatting, calendar calculations, and time zones), we decided to replace it with several different classes.
Date Formatting
The first of these new classes is DateFormat. As I mentioned above, Date's String constructor and its toString method had two problems: the strings were in a fixed format and they were always in English. DateFormat solves both of these problems.
The job of DateFormat and its concrete subclass SimpleDateFormat is to convert from a Date object to a String and vice-versa, and to do it properly for all of the locales that Java supports. To format a date and time for the current locale, the code is fairly simple:
Date d = new Date(99, 0, 1); // January 1, 1999 (year is offset from 1900)
DateFormat f = DateFormat.getDateTimeInstance(
    DateFormat.FULL, // Date style
    DateFormat.FULL); // Time style
String s = f.format(d);
So far, this just seems like an expensive way of spelling Date.toString. But you get something for the extra effort: internationalization. If you change the second line of code to this:
DateFormat f = DateFormat.getDateTimeInstance(DateFormat.FULL,
    DateFormat.FULL,
    Locale.FRANCE);
the date and time are formatted using French conventions, with French day and month names.
DateFormat also solves the fixed-format problem. The examples above use a "FULL" date/time formatter. If you want to be a bit more concise, you can use DateFormat.MEDIUM, which gives the result "Jan 1, 1999 12:00:00 AM" for English. Similarly, DateFormat.SHORT gives "1/1/99 12:00 AM."
If you want to see just the date in your output, not the time, the solution is also simple -- call getDateInstance instead of getDateTimeInstance:
Date d = new Date(99, 0, 1);
DateFormat f = DateFormat.getDateInstance(DateFormat.FULL);
String s = f.format(d);
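To get a feel for how styles and locales combine, it can help to print the same date through several formatters. The sketch below does that; FormatStyles and formatDate are hypothetical names chosen for this example. The exact strings depend on the locale data shipped with your Java runtime, so no particular output is promised here.

```java
import java.text.DateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

// Hypothetical demo class showing DateFormat styles across a few locales.
class FormatStyles {

    // Formats only the date portion of d using the given style and locale.
    static String formatDate(Date d, int style, Locale locale) {
        return DateFormat.getDateInstance(style, locale).format(d);
    }

    public static void main(String[] args) {
        Date d = new GregorianCalendar(1999, Calendar.JANUARY, 1).getTime();
        Locale[] locales = { Locale.US, Locale.UK, Locale.FRANCE };
        int[] styles = {
            DateFormat.FULL, DateFormat.LONG, DateFormat.MEDIUM, DateFormat.SHORT
        };
        String[] styleNames = { "FULL", "LONG", "MEDIUM", "SHORT" };

        // One line of output per locale/style combination.
        for (Locale locale : locales) {
            for (int i = 0; i < styles.length; i++) {
                System.out.println(locale + " " + styleNames[i] + ": "
                        + formatDate(d, styles[i], locale));
            }
        }
    }
}
```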
Now, remember the Date constructor that takes a String. That constructor had the same problems as toString, but in reverse: it required a fixed format and it assumed the string would be in English. DateFormat solves these problems too, because it doesn’t just format dates; it parses them. For example, consider the following code:
DateFormat f = DateFormat.getDateInstance(DateFormat.FULL,
    Locale.FRANCE);
Date d = f.parse("vendredi 1 janvier 1999");
The Date object, d, will end up referring to 1/1/1999. All of the other points I discussed above apply to parsing as well as to formatting: you can request a particular locale, choose shorter or longer formats, etc.
TimeZone
Many of the DateFormat examples I showed above included time zones in their output. In JDK 1.0, all of Java's time zone logic was baked into Date.toString. It assumed that you always wanted dates displayed using the current default time zone, and that you wanted the US English abbreviations for the time zone names. This was fixed in JDK 1.1 as well, with the addition of the new class java.util.TimeZone.
TimeZone and its concrete subclass SimpleTimeZone are relatively low-level classes that encapsulate the relationship between local clock time and Greenwich Mean Time. You can use them to convert from GMT to local time and back as well as to determine whether daylight saving time is in effect. The other classes use TimeZone in their time-related calculations, and many of them expose the time zone as a property that you can get and set.
Here's a simple example. Say that you want to display the same date we've been using in all of our examples, but that you want to force it to be displayed in GMT, regardless of the time zone you're running in. The code would look like this:
Date d = new Date(99, 0, 1);
DateFormat f = DateFormat.getDateTimeInstance(DateFormat.FULL, DateFormat.FULL);
f.setTimeZone(TimeZone.getTimeZone("GMT"));
String s = f.format(d);
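The effect of setTimeZone is easiest to see when one fixed instant is rendered in two zones. This sketch uses SimpleDateFormat with an explicit pattern so that the output does not depend on locale data; the class name ZoneDemo, the helper wallClock, and the choice of "America/Los_Angeles" as the second zone are assumptions made for the example.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical demo class: one instant, displayed in two time zones.
class ZoneDemo {

    // Renders the instant as local wall-clock time in the given zone.
    static String wallClock(Date instant, TimeZone zone) {
        SimpleDateFormat f = new SimpleDateFormat("yyyy-MM-dd HH:mm", Locale.US);
        f.setTimeZone(zone);
        return f.format(instant);
    }

    public static void main(String[] args) {
        // 915148800000 ms after the 1970 epoch is January 1, 1999, 00:00 GMT.
        Date instant = new Date(915148800000L);
        TimeZone gmt = TimeZone.getTimeZone("GMT");
        TimeZone pacific = TimeZone.getTimeZone("America/Los_Angeles");

        System.out.println(wallClock(instant, gmt));         // 1999-01-01 00:00
        System.out.println(wallClock(instant, pacific));     // 1998-12-31 16:00
        System.out.println(pacific.inDaylightTime(instant)); // false (winter)
    }
}
```

The Date itself never changes; only the formatter's time zone decides which wall-clock reading is shown.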
In JDK 1.1, there was one problem with the way that DateFormat used TimeZone. Let's jump back to this example for a moment:
Date d = new Date(99, 0, 1);
DateFormat f = DateFormat.getDateTimeInstance(DateFormat.FULL,
DateFormat.FULL,
Locale.FRANCE);
String s = f.format(d);
In this example, DateFormat was using the first time zone it could find for the locale you requested, in this case Locale.FRANCE, rather than the default time zone. This caused no end of confusion, since it was almost never what programmers expected.
Fortunately, there was a simple workaround for this problem. To force a DateFormat to use the default time zone, you just do this:
DateFormat f = . . . .;
f.setTimeZone(TimeZone.getDefault());
If you need a different time zone, call DateFormat.setTimeZone to request the one you want.
Calendar
Now that I've given a quick tour of the other date-related classes that were new to JDK 1.1, I can go on to the meat of this article: java.util.Calendar. As an introduction, let's revisit a code snippet from our discussion of Date:
Date d = new Date(99, 1, 1);
This constructor assumes that you are using the Gregorian calendar, and so do Date's accessor methods: getDay, getMonth, getYear, etc. But as I described in the introduction, some countries use different calendars: Hebrew, Hijri, or whatever. A fully-internationalized Java application needs to be able to support multiple calendar systems, not just the Gregorian one.
Since the Java Class Libraries are object-oriented, the obvious solution to this problem is to create an abstract class that represents a generic calendar, with concrete subclasses for specific calendar systems. And that's just what we did. JDK 1.1 included a new abstract class, java.util.Calendar, which provides a generic API for calendar operations. It also included one concrete subclass, which as you might guess is GregorianCalendar.
Calendar has a number of methods that parallel the old, deprecated get methods of Date. For example, imagine that you want to find out what year it is. With the old Date API, the code would look like this:
int year = new Date().getYear();
With Calendar, you would write this instead:
int year = Calendar.getInstance().get(Calendar.YEAR);
The call to getInstance creates a Calendar that is appropriate for the current locale, and the call to get returns the current value of the calendar's YEAR field. Calendar provides constants for about fifteen fields, including YEAR, MONTH, DAY_OF_MONTH, DAY_OF_WEEK, WEEK_OF_YEAR, and many others. These constants are all interpreted in terms of the calendar system that your Calendar object represents, so if you have a Hebrew calendar object, you'll get the Hebrew year, not the Gregorian one.
If you're wondering how this works, remember that Calendar is an abstract class. Each time that get is called, the calendar checks to see if the fields are up to date. If they are not, it calls the abstract, protected method computeFields. Each subclass overrides this method to perform the calculations appropriate for that calendar system. For example, GregorianCalendar has a computeFields method that performs the standard Gregorian calculations.
Calendar.get replaces the deprecated get methods on Date, but what about the constructor? That functionality is provided by constructors on the concrete Calendar subclasses. If you want to construct a Calendar set to January 1, 2000, you write:
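Calendar c = new GregorianCalendar(2000, Calendar.JANUARY, 1);
A subclass for another calendar system works the same way. With a HebrewCalendar class (which is not part of the JDK), the same day would be:
Calendar c = new HebrewCalendar(5760, HebrewCalendar.TEVET, 23);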
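As a quick illustration of the field constants, the sketch below constructs January 1, 2000 with GregorianCalendar, the one concrete subclass in the JDK, and reads back several fields with get. The class FieldDemo and its describe helper are invented for this example.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;

// Hypothetical demo class: reading calendar fields with Calendar.get.
class FieldDemo {

    // Summarizes a few fields; MONTH is shifted to 1-based for display only.
    static String describe(Calendar c) {
        return c.get(Calendar.YEAR) + "/"
                + (c.get(Calendar.MONTH) + 1) + "/"
                + c.get(Calendar.DAY_OF_MONTH)
                + " dayOfWeek=" + c.get(Calendar.DAY_OF_WEEK)
                + " dayOfYear=" + c.get(Calendar.DAY_OF_YEAR);
    }

    public static void main(String[] args) {
        Calendar c = new GregorianCalendar(2000, Calendar.JANUARY, 1);
        // Calendar.SATURDAY has the value 7, and January 1, 2000 was a Saturday.
        System.out.println(describe(c)); // 2000/1/1 dayOfWeek=7 dayOfYear=1
        System.out.println(c.get(Calendar.DAY_OF_WEEK) == Calendar.SATURDAY); // true
    }
}
```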
You would think that when we deprecated most of Date and added the new Calendar class, we would have fixed Date's biggest annoyance: the fact that January is month 0. We certainly should have, but unfortunately we didn't. We were afraid that programmers would be confused if Date used zero-based months and Calendar used one-based months. And a few programmers probably would have been. But in hindsight, the fact that Calendar is still zero-based has caused an enormous amount of confusion, and it was probably the biggest single mistake in the Java international API's.
When you're using Calendar or any of its subclasses, it's usually best not to use raw numbers in Calendar calls unless you just can't avoid it. Instead of writing code like this:
Calendar c = new GregorianCalendar(2000, 0, 1);
use the named constants, which make the intent obvious:
Calendar c = new GregorianCalendar(2000, Calendar.JANUARY, 1);
Add and Roll
One aspect of calendars that wasn't addressed at all in Date was calendar manipulation. For example, imagine that you want to determine what the date will be one month in the future. With the old API, you had to do an awful lot of work on your own: call getMonth, getYear, and getDate, add one to the month, see if it wrapped to a new year, make sure the day of the month is still in bounds (remembering those leap years!) and so on. Not only is this a hassle, it's not internationalized. Other calendar systems have a different number of days per month, different (and possibly variable) months per year, different leap year calculations, and so on.
Calendar and its subclasses solve this problem for you, with their add and roll methods. If you want to add one month to the current date, you only need two lines of code:
Calendar c = Calendar.getInstance();
c.add(Calendar.MONTH, 1);
The same approach works across year boundaries and leap years:
GregorianCalendar c = new GregorianCalendar(1999, Calendar.JULY, 29);
c.add(Calendar.MONTH, 7);
String s = DateFormat.getInstance().format(c.getTime());
The result is February 29, 2000. GregorianCalendar.add knows that when the month passes 11 (December) it should roll back to 0. It also knows that February 29, 2000 is a valid date, using the complicated rules that it is a leap year because it's divisible by four, except it isn't because it's divisible by 100, except it is because it's divisible by 400.
Closely related to add is the roll method. This method is handy when you want to implement a user interface that "rolls" from the end of a month back to the beginning of the same month, or to do the same thing for weeks or years. The usage is almost identical to add:
GregorianCalendar c = new GregorianCalendar(1999, Calendar.JULY, 29);
c.roll(Calendar.DAY_OF_MONTH, 6);
Here the day of the month wraps around within July, so the result is July 4, 1999 rather than August 4.
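The difference between add and roll shows up most clearly when both are applied to the same starting date. The following sketch does that for July 29, 1999; AddVersusRoll and its helper methods are hypothetical names used only here.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;

// Hypothetical demo class comparing Calendar.add with Calendar.roll.
class AddVersusRoll {

    // Formats a calendar as year-month-day with a 1-based month.
    static String ymd(Calendar c) {
        return c.get(Calendar.YEAR) + "-" + (c.get(Calendar.MONTH) + 1)
                + "-" + c.get(Calendar.DAY_OF_MONTH);
    }

    // Starts from July 29, 1999 and applies add, which carries into larger fields.
    static String afterAdd(int field, int amount) {
        Calendar c = new GregorianCalendar(1999, Calendar.JULY, 29);
        c.add(field, amount);
        return ymd(c);
    }

    // Starts from July 29, 1999 and applies roll, which leaves larger fields alone.
    static String afterRoll(int field, int amount) {
        Calendar c = new GregorianCalendar(1999, Calendar.JULY, 29);
        c.roll(field, amount);
        return ymd(c);
    }

    public static void main(String[] args) {
        System.out.println(afterAdd(Calendar.MONTH, 7));         // 2000-2-29
        System.out.println(afterAdd(Calendar.DAY_OF_MONTH, 6));  // 1999-8-4
        System.out.println(afterRoll(Calendar.DAY_OF_MONTH, 6)); // 1999-7-4
    }
}
```

Adding six days carries into August, while rolling six days stays inside July and wraps from the 31st back to the 1st.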
Locale-Specific Calendar Properties
If you thought we'd now solved all possible calendar internationalization problems, you'd be incorrect. Even within a single calendar system, such as Gregorian, there are a few properties that can differ from one country to the next. As an example, here are the US and French versions of the calendar for July, 1999:
United States
| Sun | Mon | Tue | Wed | Thu | Fri | Sat |
|-----|-----|-----|-----|-----|-----|-----|
|     |     |     |     | 1   | 2   | 3   |
| 4   | 5   | 6   | 7   | 8   | 9   | 10  |
| 11  | 12  | 13  | 14  | 15  | 16  | 17  |
| 18  | 19  | 20  | 21  | 22  | 23  | 24  |
| 25  | 26  | 27  | 28  | 29  | 30  | 31  |

France
| lun | mar | mer | jeu | ven | sam | dim |
|-----|-----|-----|-----|-----|-----|-----|
|     |     |     | 1   | 2   | 3   | 4   |
| 5   | 6   | 7   | 8   | 9   | 10  | 11  |
| 12  | 13  | 14  | 15  | 16  | 17  | 18  |
| 19  | 20  | 21  | 22  | 23  | 24  | 25  |
| 26  | 27  | 28  | 29  | 30  | 31  |     |
Notice that in France, the first day of the week is Monday (or lundi), while in the United States it is Sunday. If you're writing an application that displays calendars graphically, you need to take this into account. Java provides the method Calendar.getFirstDayOfWeek to handle this. When you create a calendar, you can specify the locale you're interested in:
Calendar c = Calendar.getInstance(Locale.FRANCE);
Then call getFirstDayOfWeek to find out how to draw it:
int d = c.getFirstDayOfWeek();
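One way to use this value when drawing a calendar grid is to derive the column order from it. The sketch below shows the idea under simple assumptions; WeekLayout and columnOrder are hypothetical names, not JDK APIs.

```java
import java.util.Arrays;
import java.util.Calendar;
import java.util.Locale;

// Hypothetical demo class: deriving calendar-grid columns from the locale.
class WeekLayout {

    // Returns the seven Calendar day-of-week constants in display order,
    // starting with the first day of the week for the given locale.
    static int[] columnOrder(Locale locale) {
        int first = Calendar.getInstance(locale).getFirstDayOfWeek();
        int[] order = new int[7];
        for (int i = 0; i < 7; i++) {
            // The constants run from SUNDAY (1) to SATURDAY (7), so wrap past 7.
            order[i] = (first - 1 + i) % 7 + 1;
        }
        return order;
    }

    public static void main(String[] args) {
        // 1 = Sunday, 2 = Monday, ... 7 = Saturday
        System.out.println(Arrays.toString(columnOrder(Locale.US)));     // [1, 2, 3, 4, 5, 6, 7]
        System.out.println(Arrays.toString(columnOrder(Locale.FRANCE))); // [2, 3, 4, 5, 6, 7, 1]
    }
}
```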
A related locale-specific property is available from getMinimalDaysInFirstWeek, which tells you how long a week has to be to qualify as the "first" week of the month. In the US calendar shown above, is the first week of July the week that starts on July 4, or the previous one that starts on June 27? According to Java's locale data it's the latter, because getMinimalDaysInFirstWeek returns 1.
Creating your own Calendars
All of the international calendar features I've talked about so far are great. However, there's a catch that limits the amount of calendar internationalization that you can actually do. Both JDK 1.1 and Java 2 only provide one concrete subclass of Calendar: GregorianCalendar. The traditional calendars used in other countries are not yet supported.
However, all is not lost. It is entirely possible to create your own subclasses of Calendar that support different calendar systems. I've written classes that support the Buddhist, Hebrew, Hijri, and Japanese Imperial calendars, and I want to share some of that knowledge here.
When you look at the Calendar class, you'll notice that it has 11 abstract methods: add, after, before, equals, getMinimum, getMaximum, getGreatestMinimum, getLeastMaximum, roll, computeTime, and computeFields. Implementing your own calendar subclass requires that you override all of these methods to provide an implementation that's specific to your calendar system. These methods can be divided into three basic groups.
The first group, the minimum and maximum functions, is the easiest, so these are usually the ones that I implement first. The first two, getMinimum and getMaximum, tell you the largest allowable range for each field, while getLeastMaximum and getGreatestMinimum tell you the smallest range for the field. For example, the DAY_OF_MONTH field of GregorianCalendar has a minimum and maximum of 1 and 31, but a greatest minimum and least maximum of 1 and 28. Implementing this is easy. Since the result is a constant, you can just store it in a table. The methods almost always end up looking like this:
public int getMinimum(int field) {
    return minMax[field][0];
}
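To see what values these four methods are expected to return, you can query the built-in GregorianCalendar. This is only an illustrative sketch; the class FieldRanges and its ranges helper are made up for the example.

```java
import java.util.Arrays;
import java.util.Calendar;
import java.util.GregorianCalendar;

// Hypothetical demo class: querying the range methods on GregorianCalendar.
class FieldRanges {

    // Returns { minimum, greatest minimum, least maximum, maximum } for a field.
    static int[] ranges(Calendar c, int field) {
        return new int[] {
            c.getMinimum(field),
            c.getGreatestMinimum(field),
            c.getLeastMaximum(field),
            c.getMaximum(field)
        };
    }

    public static void main(String[] args) {
        Calendar c = new GregorianCalendar();
        // Every month starts on day 1; the shortest month has 28 days, the longest 31.
        System.out.println(Arrays.toString(ranges(c, Calendar.DAY_OF_MONTH))); // [1, 1, 28, 31]
    }
}
```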
The second group consists of the computeFields and computeTime methods. The first, computeFields, calculates the values of all of the fields (year, month, day, etc.) from the absolute time, which is represented as the number of milliseconds since January 1, 1970. Conversely, computeTime uses the field values to calculate the absolute time.
These two methods are usually quite complicated, because they must implement the calendar system's rules very precisely. The details for a real calendar are way beyond the scope of this article, so I've invented a very simple calendar that we can experiment with. It has 360 days per year, divided into 12 months of 30 days each, with no leap years. There are seven days per week, just like our calendar, and the day 1/1/1 in this calendar was a Saturday. Based on this simplification, I can offer a few generalizations.
First, your calculations will usually be based on an "epoch" date on which the calendar started. Usually you'll want this to be the 0th day of your calendar, that is the day before the first day of year 1. You should define a constant that specifies the epoch in milliseconds since 1/1/1970 AD. I'll start our example calendar on the same day the Hebrew calendar started, just because I have the constant handy:
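private static final long EPOCH_MILLIS = -180799862400000L;
It also helps to define constants for the common units of time:
private static final long SECOND_MS = 1000;
private static final long MINUTE_MS = 60 * SECOND_MS;
private static final long HOUR_MS = 60 * MINUTE_MS;
private static final long DAY_MS = 24 * HOUR_MS;
In computeFields, the first step is to convert the absolute time into a count of days since the epoch:
long absDay = (time - EPOCH_MILLIS) / DAY_MS;
Because every month has 30 days and every year has 360, the year, month, and day then follow from simple division:
int year = (int)(absDay / 360) + 1;
int month = (int)((absDay / 30) % 12) + 1;
int day = (int)(absDay % 30) + 1;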
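The reverse direction, which is what computeTime has to do, can be sketched by inverting the same arithmetic. The code below is a standalone illustration, not a Calendar subclass: SimpleCalendarMath and its two methods are hypothetical, it uses 1-based months as in the snippet above, and it assumes times at or after the epoch. As in that snippet, a time exactly at EPOCH_MILLIS maps to 1/1/1; shift by one day if your epoch constant marks the day before.

```java
// Hypothetical helper for the invented 360-day calendar (12 months of 30 days).
class SimpleCalendarMath {

    static final long EPOCH_MILLIS = -180799862400000L; // epoch constant from the text
    static final long DAY_MS = 24L * 60 * 60 * 1000;     // milliseconds per day

    // computeFields direction: milliseconds since 1970 -> { year, month, day }.
    // Assumes time >= EPOCH_MILLIS; earlier times would need floor division.
    static int[] fieldsFromTime(long time) {
        long absDay = (time - EPOCH_MILLIS) / DAY_MS;
        int year = (int) (absDay / 360) + 1;
        int month = (int) ((absDay / 30) % 12) + 1;
        int day = (int) (absDay % 30) + 1;
        return new int[] { year, month, day };
    }

    // computeTime direction: { year, month, day } -> milliseconds at the start of that day.
    static long timeFromFields(int year, int month, int day) {
        long absDay = (year - 1) * 360L + (month - 1) * 30L + (day - 1);
        return EPOCH_MILLIS + absDay * DAY_MS;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        int[] f = fieldsFromTime(now);
        long startOfDay = timeFromFields(f[0], f[1], f[2]);
        System.out.println(f[0] + "/" + f[1] + "/" + f[2]);
        // The round trip lands on the start of the same day, less than a day earlier.
        System.out.println(now - startOfDay < DAY_MS); // true
    }
}
```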
Real Code
If you'd like to see some real Java classes that implement non-Gregorian calendars, pay a visit to http://www.alphaWorks.ibm.com/tech/calendars. The "International Calendars" package you'll find on that page supports the Buddhist, Hebrew, Hijri, and Japanese Imperial calendars. It includes Java ResourceBundle files containing translated strings for these calendars in a number of different languages, as well as some utility methods for formatting dates as strings using non-Gregorian calendars.
I hope this article has given you a good feel for the things that you can do with calendars in Java. Though it has its warts, Java's calendar framework is the most powerful one I've seen in any major operating system or application framework.
Acknowledgements
Alan Liu, the IBM engineer responsible for the time and date classes in the JDK, was very helpful while I was writing this paper.
References
Calendrical Calculations, by Nachum Dershowitz and Edward M. Reingold (Cambridge University Press, 1997) has excellent descriptions of calendar algorithms in general as well as detailed algorithms for all of the calendar systems in common use today.
The Java Class Libraries, 2nd Edition, vol. 1, by Chan, Lee, and Kramer (Addison-Wesley, 1998) has a nice description of Calendar and GregorianCalendar.
Making your Java/C++/C Applications Global, at www.ibm.com/java/education/international-unicode/unicode1.html is a good overview of some of the issues involved in writing global applications.