Wednesday, September 5, 2007

Exporting .CSV and PDF file with i18n support

     I'm working on a international project that shows pages in both Chinese and English. One of the biggest complain from our QA is that exported files in .csv format cannot be directly opened in Excel but imported after saving. And another problem is that there is always a "java.lang. IllegalStateException: getOutputStream() has already been called for this response" exception message in console whenever I try to export .csv or PDF files that's not using jasper report.

     Originally, I wasn't working on file exporting. So I only did some basic research in helping the one who worked on it. However, lately I was able to take some time looking into the problem.

     The first thing I learned from online resources is that response.getOutputStream() doesn't work with response.getWriter(). And since I'm using Tomcat, the generated jsp servlet always have the following code in _jspService(HttpServletRequest request, HttpServletResponse response) which will call response.getWriter():

  ... ...
  finally{
      if(_jspxFactory != null)
          _jspxFactory.releasePageContext(_jspx_page_context);
  }

    So, the best way to avoid the exception is using response.getWriter() instead of response.getOutputStream(). Unfortunately, this is not an option for generating PDF files with iText. If I use response.getWriter(), it will just bring out a blank page.

    The easiest solution I find online is to add following code to the jsp page:

    out.clear();
    out = pageContext.pushBody();

    The issue for me is that I'm using a layout page which imports other pages in body. And the PDF file is done in an action class. If I put the code at the very beginning of the page. The whole page will be blank. But if I put the code at the end of the layout page, it will give even more exceptions since I'm trying to clear already flushed out content. To solve this issue, I wrapped the above code with content type check. The complete solution is:

In action class:

 PdfPTable table = new PdfPTable(columnCounts);
 FontSelector selector = new FontSelector();
 //Chinese Simplified font
 selector.addFont(FontFactory.getFont("STSong-Light", "UniGB-UCS2-H", BaseFont.NOT_EMBEDDED));
 //Chinese Traditional font
 selector.addFont(FontFactory.getFont("MSung-Light", "UniCNS-UCS2-H", BaseFont.NOT_EMBEDDED));
    ... ...
 //add content to table
 //for more detail on how to use iText, please go to iText's official <a href="http://www.lowagie.com/iText/tutorial/index.html">tutorial site</a>
 //if the PDF contains Asian Characters, make sure you also includes iTextAsian.jar in your project
    ... ...
 response.setContentType("application/pdf;charset=UTF-8");
 response.setHeader("Content-Disposition", "attachment;filename=" + exportfileName);
 ServletOutputStream out = response.getOutputStream();
 Document document = new Document(PageSize.A3, 20, 20, 20, 20);
 PdfWriter.getInstance(document, out);
 document.open();
 document.add(table);
 document.close();
 out.flush();
 out.close();

At the beginning of the layout jsp page:

  if(response.getContentType().trim().toLowerCase()
        .startsWith("application/pdf")){
    out.clear();
    out = pageContext.pushBody();
  }

    I hope there is a much cleaner way to solve the problem since I'm trying to eliminate any use of scriptlet. But for now, it will do.

    One problem gone, and one more to go. Comparing to PDF, the solution for the exception when exporting csv files is simpler because I can just use response.getWriter(). But the i18n problem is really tricky since the file is most likely going to be opened with Microsoft Excel and using UTF-8 is not good enough for multibyte data here.

    The interesting thing is that the saved file can be opened correctly with notepad. And if I save the file again in notepad with UTF-16 encoding, then the newly saved file can be opened directly and correctly with Excel as well. So what's the magic? One word keeps popping up from related posts is "BOM" - Byte Order Mark, the magical bytes that notepad uses to identify the correct encoding byte order. And here comes the winner:

    response.setContentType("application/ms-excel; charset=UTF-16LE");
    response.setHeader("Content-Disposition", "attachment;filename=" + exportfileName);
    PrintWriter pw = response.getWriter();
    pw.write("\uFEFF");
    ... ...
     //write the csv file
    //one thing to note on is that although csv stands for 
    //Comma Separator Values, 
    //the default separator is not necessary to be comma
    //in my case, I have to use tab instead
    ... ...
    pw.flush();
    pw.close();