Wednesday, June 18, 2008

Java Unicode to Hexadecimal Character Codes Conversion

     When it comes to internationalization, resource encoding is a big challenge. Sun's JDK doesn't offer UTF-8 encoding for ResourceBundle property files. One workaround is to first convert all UTF-8 translations in the property files to hexadecimal character codes (\uXXXX escapes). Lots of websites offer online converters for this purpose, but using them still takes quite a few manual steps.
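     To make the problem concrete, here is a minimal sketch of how a byte-stream load treats escaped versus raw UTF-8 text. It relies on the documented behavior of java.util.Properties.load(InputStream), which reads ISO-8859-1 and decodes Unicode escapes. The class name EscapedPropertiesDemo, the key "k" and the sample value are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical demo class, not part of the original post.
class EscapedPropertiesDemo {

    /** Loads raw property-file bytes through the ISO-8859-1 based byte-stream loader. */
    static Properties load(byte[] fileBytes) {
        Properties props = new Properties();
        try {
            props.load(new ByteArrayInputStream(fileBytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // Escaped form of e-acute: pure ASCII on disk, decoded by the loader.
        Properties escaped = load("k=\\u00E9\n".getBytes(StandardCharsets.ISO_8859_1));
        System.out.println(escaped.getProperty("k").length()); // one character

        // The same character saved as raw UTF-8 is two bytes, read back as two Latin-1 characters.
        Properties raw = load("k=\u00E9\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(raw.getProperty("k").length()); // two characters, i.e. garbled
    }
}
```

     The escaped file survives the round trip because it contains only ASCII, which is why the conversion step is worth automating.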

     Recently, I started working on a resource repository that centralizes global resources within the company to reduce duplicate work. Users are supposed to be able to import and export resources in property file format. That got me thinking: what if I offered real-time conversion to hexadecimal character codes when exporting resources, to save users the trouble?
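     For an export like this, one option that may be enough in simple cases is the JDK's own java.util.Properties.store(OutputStream, String), which is documented to write characters below \u0020 and above \u007E as \uxxxx escapes. The sketch below assumes the resources are already held in a Properties object. The class name PropertiesExportSketch and the method export are hypothetical, and store adds a date comment line and does not preserve key order or existing comments, so it may not suit every repository.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical helper, not the repository's actual export code.
class PropertiesExportSketch {

    /** Returns the property-file text that Properties.store produces for the given resources. */
    static String export(Properties resources) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            // A null comment means only the default date comment line is written.
            resources.store(out, null);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // store(OutputStream, ...) writes ISO-8859-1, so decode with the same charset.
        return new String(out.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        Properties resources = new Properties();
        resources.setProperty("greeting", "\u4F60\u597D"); // sample value, two Chinese characters
        System.out.print(export(resources));
    }
}
```

     When the original file layout has to be kept line by line, a character-level converter like the test class in this post is the more flexible route.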

     After exploring some of the online converter sites mentioned earlier, I found that the actual conversion is fairly simple. Strangely, all of these websites use JavaScript, and my searches turned up nothing that does it the Java way.

     Anyway, here is my little Test class:


package net.katiewang.test;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

/**
 * @author KWang
 *
 */
public class JavaUnicodeTest {
    
    private final static String digits = "0123456789ABCDEF"; 

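    /**
     * Reads the UTF-8 file named by the first argument and prints each line
     * with its special and non-ASCII characters replaced by escape sequences.
     *
     * @param args args[0] is the path of the file to convert
     * @throws IOException if the file cannot be opened or read
     */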
    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            System.out.println("File path: " + args[0]);
            // try-with-resources (Java 7+) closes the stream and reader even if reading fails
            try (InputStream is = new FileInputStream(new File(args[0]));
                 BufferedReader reader = new BufferedReader(new InputStreamReader(is, "UTF-8"))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    // StringBuilder avoids building a new String for every character
                    StringBuilder newLine = new StringBuilder();
                    for (int i = 0; i < line.length(); i++) {
                        newLine.append(unicodeToJava(line.charAt(i)));
                    }
                    System.out.println(newLine);
                }
            }
        } else {
            System.err.println("No argument specified!");
        }
    }

    private static String unicodeToJava (char ch){
        // Characters that have a short escape form
        switch (ch) {
            case '\t':
                return "\\t";
            case '\n':
                return "\\n";
            case '\f':
                return "\\f";
            case '\r':
                return "\\r";
            case '"':
                return "\\\"";
            case '\\':
                return "\\\\";
            default:
                break;
        }
        // Anything outside printable ASCII (32 to 126) becomes a four-digit hex escape
        if (ch < 32 || ch > 126)
            return "\\u" + getHexCode(ch);
        return String.valueOf(ch);
    }
    
    private static String getHexCode(char ch){
        // Four hex digits, most significant nibble first
        return new String(new char[]{
            leastSignificantHexDigit(ch >>> 12),
            leastSignificantHexDigit(ch >>> 8),
            leastSignificantHexDigit(ch >>> 4),
            leastSignificantHexDigit(ch)});
    }
    
    private static char leastSignificantHexDigit(int ch){
        return digits.charAt(ch & 0x0f);
    }
}


     To run the test, simply execute the class with a file path as its argument. In unicodeToJava, I chose to leave printable ASCII characters as they are, which I recommend. If you want to convert everything instead, remove all the conditions in unicodeToJava and keep a single line: return "\\u" + getHexCode(ch);
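
     As a rough sketch of that convert-everything variant, the snippet below escapes every UTF-16 code unit of a string. It uses String.format in place of the hand-written getHexCode, which is a swap on my part rather than the code above, and the names EscapeAllSketch and escapeAll are hypothetical.

```java
// Hypothetical variant, shown only to illustrate the convert-everything option.
class EscapeAllSketch {

    /** Escapes every UTF-16 code unit of the input, printable ASCII included. */
    static String escapeAll(String text) {
        StringBuilder out = new StringBuilder(text.length() * 6);
        for (int i = 0; i < text.length(); i++) {
            // %04X pads to four upper-case hex digits, the same shape getHexCode produces
            out.append(String.format("\\u%04X", (int) text.charAt(i)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeAll("A\u00E9")); // an ASCII letter followed by e-acute
    }
}
```

     One caveat, as far as I can tell from how java.util.Properties parses a line: if a whole "key=value" line is escaped this way, the "=" separator is escaped too and is no longer recognized, so this variant is probably best applied to values only.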