mmuzahid February 2016

How to parse String as Binary and convert it to UTF-8 equivalent in Java?

I need to parse the content of a String as a binary sequence and convert it to its UTF-8 equivalent String.

For example, UTF-8 binary equivalents of B, A and R are as follows:
B = 01000010
A = 01000001
R = 01010010


Now I need to convert the string "010000100100000101010010" to the string "BAR".
That is, in the case above, the 24-character input string is divided into three equal parts (8 characters each), and each part is translated to its UTF-8 equivalent to form the resulting String.

Sample Code:

public static void main(String[] args) {
    String B = "01000010";
    String A = "01000001";
    String R = "01010010";
    String BAR = "010000100100000101010010";

    String utfEquiv = toUTF8(BAR);//expecting to get "BAR"
    System.out.println(utfEquiv);
}

private static String toUTF8(String str) {
    // TODO 
    return "";
}

What should the implementation of the method toUTF8(String str) be?

Answers


Jon Skeet February 2016

You should separate this into two problems:

  • Converting the string into a byte array by parsing the binary values
  • Converting the byte array back into a string using UTF-8

The latter is very straightforward, using new String(bytes, StandardCharsets.UTF_8).
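
As a minimal sketch of that second step, assuming the three bytes for B, A and R have already been parsed (the class name Utf8DecodeSketch and the helper decode are made up for illustration):

```java
import java.nio.charset.StandardCharsets;

class Utf8DecodeSketch {
    // Hypothetical helper: interprets the raw bytes as UTF-8 encoded text.
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The three bytes from the question, written as binary literals.
        byte[] bytes = {0b01000010, 0b01000001, 0b01010010};
        System.out.println(decode(bytes)); // prints "BAR"
    }
}
```

Passing the charset explicitly avoids depending on the platform default encoding.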

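For the first part, the tricky bit is that Byte.parseByte rejects any 8-bit chunk with a leading 1: such a chunk represents a value of 128 or more, which is outside the byte range of -128 to 127, so the call throws NumberFormatException. So I'd probably parse each 8-bit string into a short and then cast to byte: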

public static byte[] binaryToBytes(String input) {
    // TODO: Argument validation (nullity, length)
    byte[] ret = new byte[input.length() / 8];
    for (int i = 0; i < ret.length; i++) {
        String chunk = input.substring(i * 8, i * 8 + 8);
        ret[i] = (byte) Short.parseShort(chunk, 2);
    }
    return ret;
}
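
Putting the two steps together gives one possible implementation of toUTF8 from the question. This is only a sketch: the class name BinaryStringDecoder and the validation rules (reject null, reject lengths that are not a multiple of 8, reject characters other than '0' and '1') are assumptions filling in the TODO above, not requirements stated in the question.

```java
import java.nio.charset.StandardCharsets;

class BinaryStringDecoder {
    // Parses a string of '0'/'1' characters into bytes, 8 characters per byte.
    static byte[] binaryToBytes(String input) {
        if (input == null) {
            throw new IllegalArgumentException("input must not be null");
        }
        if (input.length() % 8 != 0) {
            throw new IllegalArgumentException("input length must be a multiple of 8");
        }
        // Assumed rule: also rejects '+' and '-', which parseShort would accept.
        if (!input.matches("[01]*")) {
            throw new IllegalArgumentException("input must contain only '0' and '1'");
        }
        byte[] ret = new byte[input.length() / 8];
        for (int i = 0; i < ret.length; i++) {
            String chunk = input.substring(i * 8, i * 8 + 8);
            // A short holds 0..255 without overflow; the cast keeps the low 8 bits.
            ret[i] = (byte) Short.parseShort(chunk, 2);
        }
        return ret;
    }

    // Decodes the parsed bytes as UTF-8 text.
    static String toUTF8(String str) {
        return new String(binaryToBytes(str), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(toUTF8("010000100100000101010010")); // prints "BAR"
    }
}
```

Note that this String constructor does not throw on byte sequences that are not valid UTF-8; it substitutes the replacement character instead, so stricter handling would need a CharsetDecoder.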
