http://www.jroller.com/holy/date/20081002 Thursday October 02, 2008

Injecting better logging into a binary .class using Javassist

Have you ever been struck by a completely useless exception message thrown from somewhere deep inside a 3rd-party application or library you had to use in your code? Have you ever wished the bloody nameless programmer had written a truly informative and helpful error message, so that you wouldn't need to spend hours hunting down a problem that would have been easy to spot if only the message had included the context information available at the moment the exception occurred? Have you wondered how you could inject the necessary logging into that spot yourself? Read on to get the answer.

http://www.jroller.com/holy/date/20080926 Friday September 26, 2008

Add method tracing (params, result) to an existing application w/o modifying it

Have you ever needed to learn what's going on inside a 3rd-party Java application without debugging it and stepping through it? Did you wish you could see which methods get called, in what order, together with their actual parameters and return values? There is a "simple" solution: AspectWerkz.

http://www.jroller.com/holy/date/20071102 Friday November 02, 2007

Truncating a UTF-8 String to a given number of bytes while preserving its validity [for DB insert]

Often you need to insert a String from Java into a database column with a fixed length specified in bytes. Using

string.substring(0, DB_FIELD_LENGTH);

isn't enough, because it only limits the number of characters, while in UTF-8 a single character may take 1-4 bytes. But you cannot simply turn the string into an array of bytes and keep its first DB_FIELD_LENGTH elements either, because you could end up with an invalid UTF-8 character at the end (one that is encoded by 2+ bytes, of which only the leading byte(s) fit into the field). There are two ways to truncate the string so that it has at most DB_FIELD_LENGTH bytes and is still a valid UTF-8 string.
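To make the difference between characters and bytes concrete, here is a small sketch. The class `Utf8LengthDemo`, its methods `utf8Length` and `naiveCut`, and the sample string are made up for illustration (Java 7+ is assumed for `StandardCharsets`). It compares the character count with the UTF-8 byte count and shows what happens when the byte array is simply cut and decoded again.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class: characters vs. UTF-8 bytes
class Utf8LengthDemo {
    // Number of bytes the string takes when encoded as UTF-8
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Naive approach: keep the first maxBytes bytes, ignoring character boundaries
    static String naiveCut(String s, int maxBytes) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        byte[] head = Arrays.copyOf(bytes, Math.min(bytes.length, maxBytes));
        return new String(head, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String s = "na\u00efve \u20ac"; // "naïve €": 'ï' takes 2 bytes, '€' takes 3
        System.out.println(s.length());    // prints 7 (chars)
        System.out.println(utf8Length(s)); // prints 10 (bytes)
        // Cutting at 8 bytes keeps only the first byte of '€', which decodes to \uFFFD
        String cut = naiveCut(s, 8);
        System.out.println(cut.endsWith("\uFFFD")); // prints true
    }
}
```

Decoding the cut bytes does not rescue the naive approach either: re-encoded, `naiveCut(s, 8)` takes 10 bytes again, because the stray byte became a 3-byte \uFFFD.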

Approach 1: Replace the invalid trailing byte(s) with a 'rectangle'

This is as simple as:

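byte[] bytes = string.getBytes(StandardCharsets.UTF_8); // java.nio.charset.StandardCharsets
int maxLen = DB_FIELD_LENGTH - 2; // leave room for a 3-byte \uFFFD
if (bytes.length > DB_FIELD_LENGTH) // only cut strings that don't fit (also avoids an out-of-bounds length)
    string = new String(bytes, 0, maxLen, StandardCharsets.UTF_8);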

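The String constructor automatically replaces an invalid character (i.e. an incomplete UTF-8 sequence; we can only have one, at the very end) with the replacement character \uFFFD, which is usually displayed as an empty rectangle. This character takes 3 bytes in UTF-8, which is why we cut at DB_FIELD_LENGTH-2 bytes: if the cut falls on a character boundary, the result has exactly maxLen bytes; if it splits a character, the 1-3 orphaned bytes are replaced by a single \uFFFD (3 bytes), so the result has at most maxLen+2 = DB_FIELD_LENGTH bytes. Note that this replacement is only documented for the constructors taking a Charset; for the variants taking a charset name, the Javadoc says the behavior on invalid input is unspecified.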
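Approach 1 can be wrapped into a reusable helper. This is only a sketch: the class `ReplacingTruncator`, the `truncate(String, int)` signature and the sample string are hypothetical, and Java 7+ is assumed for `StandardCharsets`.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper implementing approach 1 (replace a split trailing character with \uFFFD)
class ReplacingTruncator {
    static String truncate(String s, int maxBytes) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        if (bytes.length <= maxBytes) {
            return s; // already fits, keep it unchanged
        }
        // Reserve 2 bytes: the 1-3 orphaned bytes of a split character become one 3-byte \uFFFD
        int cut = Math.max(0, maxBytes - 2);
        // The Charset overload is documented to replace malformed input with \uFFFD
        return new String(bytes, 0, cut, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String s = "na\u00efve \u20ac"; // "naïve €", 10 bytes in UTF-8
        String t = truncate(s, 5);       // cuts after "na" plus the first byte of 'ï'
        System.out.println(t.equals("na\uFFFD"));                      // prints true
        System.out.println(t.getBytes(StandardCharsets.UTF_8).length); // prints 5
    }
}
```

The price of this simplicity is that up to 2 bytes of the field may stay unused even when the cut happens to fall on a character boundary (e.g. `truncate(s, 8)` returns the 6-byte "naïve").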

Approach 2: Skip the invalid trailing byte(s) altogether

If you don't want the rectangle in place of a split multibyte character, you have to do yourself what the String constructor does internally, only slightly differently: decode the bytes with a CharsetDecoder that ignores, rather than replaces, the malformed tail:

import java.nio.*; import java.nio.charset.*;
CharsetDecoder cd = StandardCharsets.UTF_8.newDecoder();
byte[] sba = string.getBytes(StandardCharsets.UTF_8);
// Ensure truncation by limiting the byte buffer to DB_FIELD_LENGTH (or less, if the string is shorter)
ByteBuffer bb = ByteBuffer.wrap(sba, 0, Math.min(sba.length, DB_FIELD_LENGTH)); // len in [B]
CharBuffer cb = CharBuffer.allocate(DB_FIELD_LENGTH); // len in [char] <= # [B]
// Ignore an incomplete character at the end instead of replacing it
cd.onMalformedInput(CodingErrorAction.IGNORE);
cd.decode(bb, cb, true); // true = no more input follows
cd.flush(cb);
string = new String(cb.array(), 0, cb.position());

The resulting string ends with the last complete character that fits into the first DB_FIELD_LENGTH bytes.
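Packaged as a method, approach 2 might look like the sketch below. The class `SkippingTruncator` and its `truncate(String, int)` method are hypothetical names, Java 7+ is assumed for `StandardCharsets`, and `maxBytes` is assumed to be non-negative.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper implementing approach 2 (drop a split trailing character)
class SkippingTruncator {
    static String truncate(String s, int maxBytes) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        if (bytes.length <= maxBytes) {
            return s; // already fits, keep it unchanged
        }
        ByteBuffer in = ByteBuffer.wrap(bytes, 0, maxBytes);
        // UTF-8 never decodes to more chars than bytes, so maxBytes chars are enough
        CharBuffer out = CharBuffer.allocate(maxBytes);
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.IGNORE); // silently skip the incomplete tail
        decoder.decode(in, out, true); // true = no more input follows
        decoder.flush(out);
        return new String(out.array(), 0, out.position());
    }

    public static void main(String[] args) {
        String s = "na\u00efve \u20ac"; // "naïve €", 10 bytes in UTF-8
        // The lone first byte of '€' is dropped
        System.out.println(truncate(s, 8).equals("na\u00efve ")); // prints true
        // The lone first byte of 'ï' is dropped
        System.out.println(truncate(s, 3).equals("na"));          // prints true
    }
}
```

Compared to approach 1, this uses the field up to the last complete character and never introduces a visible replacement character. Since the bytes come from `getBytes` on a Java String, the only malformed input the decoder can see is the cut at the end, so `IGNORE` cannot silently drop anything in the middle.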

