Quick tips on UTF-8 encoding on a Mac running Java & Tomcat6

I ran into a problem today with UTF-8 encoding on my Mac (10.5.8), running a Java/Tomcat6 environment. It took a while to figure out, so I’m posting it here in the hope that it helps others in the future.

Mac OS uses its own variant of the Java SDK, and its default character encoding is MacRoman. (Read this page, scroll to “Character Encoding”.) This can cause problems like the one I hit: we consume a service that encodes its data as UTF-8, operate on or transform the data within our application, then output it again as UTF-8. Along the way, the original UTF-8 data was being converted to MacRoman and then back out to UTF-8. Corruption ensued.
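To make the corruption concrete, here is a minimal sketch of that round trip. It is not code from my application; the class and method names are made up for illustration, and it assumes the JVM knows MacRoman under the name x-MacRoman (it falls back to ISO-8859-1 if not, which mangles the text in a similar way).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any real application.
class MojibakeDemo {

    // Picks the charset that plays the role of the "wrong" platform default.
    // "x-MacRoman" is the JDK's name for MacRoman; it is an extended charset,
    // so fall back to ISO-8859-1 if this runtime does not ship it.
    static Charset wrongDefault() {
        return Charset.isSupported("x-MacRoman")
                ? Charset.forName("x-MacRoman")
                : StandardCharsets.ISO_8859_1;
    }

    // Decodes UTF-8 bytes with the wrong charset, then re-encodes the
    // misread text as UTF-8, which is the round trip described above.
    static byte[] roundTripThroughWrongCharset(byte[] utf8Bytes, Charset wrong) {
        String misread = new String(utf8Bytes, wrong);
        return misread.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9"; // "cafe" with an e-acute
        byte[] incoming = original.getBytes(StandardCharsets.UTF_8);
        byte[] outgoing = roundTripThroughWrongCharset(incoming, wrongDefault());
        String seenByClient = new String(outgoing, StandardCharsets.UTF_8);

        System.out.println("bytes in:  " + incoming.length);
        System.out.println("bytes out: " + outgoing.length);
        System.out.println("intact:    " + original.equals(seenByClient));
    }
}
```

The two UTF-8 bytes of the accented character are each read as a separate character, so the output is longer than the input and no longer matches it.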

I identified 2 general fixes; which one you need depends on exactly what your problem is. I only needed the first one, but I want to document the 2nd as well.

  1. Add URIEncoding to both the HTTP and AJP Connectors in Tomcat's server.xml
  2. Add -Dfile.encoding=UTF-8 to your JAVA_OPTS environment variable

Fix #1:

  1. Edit your server.xml file. This is typically in /apache-tomcat/conf/server.xml.
  2. Look for the HTTP Connector. Mine looks like this: <Connector port="8080" protocol="HTTP/1.1" connectionTimeout="20000" redirectPort="8443" useBodyEncodingForURI="true"/>
  3. Add the URIEncoding attribute to it (the name is case-sensitive): <Connector port="8080" protocol="HTTP/1.1" connectionTimeout="20000" redirectPort="8443" useBodyEncodingForURI="true" URIEncoding="UTF-8"/>
  4. Look for the AJP Connector: <Connector port="8009" protocol="AJP/1.3" redirectPort="8443"/>
  5. Add the URIEncoding attribute to it: <Connector port="8009" protocol="AJP/1.3" redirectPort="8443" URIEncoding="UTF-8"/>
  6. Restart Tomcat so the new Connector settings take effect.
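Why does the Connector setting matter? Without URIEncoding, Tomcat 6 decodes the percent-encoded bytes in a URI as ISO-8859-1. The sketch below imitates that decoding step with the JDK's URLDecoder. It is a stand-in for illustration only, not Tomcat's own code, and the class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class; URLDecoder stands in for the container's URI decoding.
class UriDecodingDemo {

    // Decodes a percent-encoded value using the named charset.
    static String decodeParam(String rawValue, String charsetName) {
        try {
            return URLDecoder.decode(rawValue, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // A browser sending "cafe" with an e-acute as UTF-8 puts this in the URL.
        String raw = "caf%C3%A9";

        String asLatin1 = decodeParam(raw, "ISO-8859-1"); // two junk characters replace the accent
        String asUtf8 = decodeParam(raw, "UTF-8");        // the accent survives as one character

        System.out.println("ISO-8859-1 length: " + asLatin1.length());
        System.out.println("UTF-8 length:      " + asUtf8.length());
        System.out.println("UTF-8 matches:     " + asUtf8.equals("caf\u00e9"));
    }
}
```

Decoded as ISO-8859-1, the value comes out one character too long because the two bytes of the accented letter are treated as two characters. Decoded as UTF-8, it comes out as the four characters the client actually sent.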

Fix #2:

  1. I keep my JAVA_OPTS and other environment variables in my ~/.bash_profile. I am assuming that you already have your dev environment set up and know where your JAVA_OPTS are defined. This is just my setup.
  2. From the terminal, run pico ~/.bash_profile and find the line with your JAVA_OPTS. Again, this is my config line and yours may be different.
  3. Add -Dfile.encoding=UTF-8 to it: export JAVA_OPTS="-Xmx768m -XX:MaxPermSize=256m -Djava.awt.headless=true -Dfile.encoding=UTF-8"
  4. Open a new terminal (or run source ~/.bash_profile) and restart Tomcat so it picks up the new value.
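A quick way to check whether the flag took effect is to ask the JVM what it thinks its default charset is. This is a rough sketch with hypothetical names, not something from my actual project. It also shows the other way around the problem: passing the charset explicitly whenever you turn bytes into text, so the platform default never comes into play.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for inspecting the JVM's encoding defaults.
class DefaultCharsetCheck {

    // Reports the file.encoding system property and the charset the JVM
    // falls back to when no charset is given explicitly.
    static String describeDefaults() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    // Decodes bytes as UTF-8 regardless of the platform default.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(describeDefaults());

        byte[] eAcute = {(byte) 0xC3, (byte) 0xA9}; // UTF-8 bytes for e-acute
        System.out.println("decoded correctly: " + decodeUtf8(eAcute).equals("\u00e9"));
    }
}
```

Run it once with and once without -Dfile.encoding=UTF-8 on the java command line and compare the first line of output.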

It took me a few hours of searching around, and these seem to be the settings that most people recommend. For me, it was setting URIEncoding on both Connectors that made it work. Your fix may be different depending on your setup.

For more info, visit these links:

Tomcat and UTF-8

Apple’s Java Docs