Conversion between String and Byte Array

We will often need to convert a string to a byte array (encoding) or a byte array to a string (decoding). A string is a sequence of Unicode char values. A charset defines the mapping between characters and bytes, where each character is encoded as one or more bytes. The same charset is needed during both the encoding and the decoding process. In this tutorial, we will learn the conversion between a string and a byte array.

String to Byte Array(Encoding)

There are a few ways to obtain a byte array from a string. Let's learn how to do this.

Using getBytes() Method

The getBytes() method of the String class provides a convenient way to obtain a byte array. The String class contains three overloaded versions for the getBytes() method.

  • getBytes() - generates the byte array using the platform's default charset.
  • getBytes(String charsetName) - generates the byte array using the provided named charset.
  • getBytes(Charset charset) - generates the byte array using the Charset instance.

Let's use each one of these and see the output.

import java.util.Arrays;

public class Demo
{
	public static void main(String[] args)
	{
		String s = "demo!";
		byte[] byteArr = s.getBytes();
		System.out.print("String as Bytes: " + Arrays.toString(byteArr));
	}
}


String as Bytes: [100, 101, 109, 111, 33]

The getBytes() method, without any charset, uses the platform's default charset. This makes the byte array platform-dependent, and it may not decode correctly on some other system. We can view the default charset by using the Charset.defaultCharset() method.

import java.nio.charset.Charset;

public class Demo
{
	public static void main(String[] args)
	{
		System.out.print("Default Charset: " + Charset.defaultCharset());
	}
}


Default Charset: UTF-8
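To avoid depending on the platform's default charset, we can name the charset explicitly on both sides of the conversion. The following is a minimal sketch of that idea. The class name ExplicitCharsetDemo and the helper methods toUtf8() and fromUtf8() are made up for this illustration, and UTF-8 is only one possible choice.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ExplicitCharsetDemo
{
	// Encodes with UTF-8 regardless of the platform's default charset.
	static byte[] toUtf8(String s)
	{
		return s.getBytes(StandardCharsets.UTF_8);
	}

	// Decodes with the same charset that was used for encoding.
	static String fromUtf8(byte[] bytes)
	{
		return new String(bytes, StandardCharsets.UTF_8);
	}

	public static void main(String[] args)
	{
		byte[] byteArr = toUtf8("demo!");
		System.out.println("String as Bytes: " + Arrays.toString(byteArr));
		System.out.println("Decoded String: " + fromUtf8(byteArr));
	}
}
```

Because both helpers use the same charset, the round trip gives back the original string on any system.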

String getBytes(String charsetName) Method

import java.io.UnsupportedEncodingException;
import java.util.Arrays;

public class Demo
{
	public static void main(String[] args) throws UnsupportedEncodingException
	{
		String s = "demo!";
		String namedCharset = "UTF-16";
		byte[] byteArr = s.getBytes(namedCharset);
		System.out.print("String as Bytes: " + Arrays.toString(byteArr));
	}
}


String as Bytes: [-2, -1, 0, 100, 0, 101, 0, 109, 0, 111, 0, 33]

We need to make sure that a validly named charset is used. Otherwise, we will get an UnsupportedEncodingException.

import java.io.UnsupportedEncodingException;

public class Demo
{
	public static void main(String[] args) throws UnsupportedEncodingException
	{
		String s = "demo!";
		String namedCharset = "UTF-40";//UTF-40 is not a valid encoding
		byte[] byteArr = s.getBytes(namedCharset);
	}
}


Exception in thread "main" java.io.UnsupportedEncodingException: UTF-40
at java.base/java.lang.StringCoding.encode(StringCoding.java:440)
at java.base/java.lang.String.getBytes(String.java:959)
at Demo.main(Demo.java:10)
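If the charset name comes from user input or configuration, we can check it with Charset.isSupported() before encoding instead of relying on the exception. The sketch below assumes that falling back to UTF-8 is acceptable. The class CharsetLookupDemo and the method charsetOrUtf8() are hypothetical names used only for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class CharsetLookupDemo
{
	// Returns the named charset if this JVM supports it.
	// Falling back to UTF-8 is an assumption made for this sketch.
	static Charset charsetOrUtf8(String name)
	{
		return Charset.isSupported(name) ? Charset.forName(name) : StandardCharsets.UTF_8;
	}

	public static void main(String[] args)
	{
		Charset charset = charsetOrUtf8("UTF-40"); // UTF-40 is not a valid encoding
		byte[] byteArr = "demo!".getBytes(charset);
		System.out.println("Charset used: " + charset);
		System.out.println("String as Bytes: " + Arrays.toString(byteArr));
	}
}
```

Note that Charset.isSupported() still throws an IllegalCharsetNameException if the name contains characters that are not legal in a charset name.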

String getBytes(Charset charset) Method

import java.nio.charset.Charset;
import java.util.Arrays;

public class Demo
{
	public static void main(String[] args)
	{
		String s = "demo!";
		Charset charset = Charset.forName("UTF-16");

		byte[] byteArr = s.getBytes(charset);
		System.out.print("String as Bytes: " + Arrays.toString(byteArr));
	}
}


String as Bytes: [-2, -1, 0, 100, 0, 101, 0, 109, 0, 111, 0, 33]

We can also use the StandardCharsets class to obtain a Charset instance.

import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Demo
{
	public static void main(String[] args)
	{
		String s = "demo!";
		Charset charset = StandardCharsets.UTF_16;
		byte[] byteArr = s.getBytes(charset);
		System.out.print("String as Bytes: " + Arrays.toString(byteArr));
	}
}


String as Bytes: [-2, -1, 0, 100, 0, 101, 0, 109, 0, 111, 0, 33]
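The first two bytes in the UTF-16 outputs above, -2 and -1, are the byte order mark (0xFE 0xFF) that Java's UTF-16 encoder writes before the big-endian data. If we do not want the mark, StandardCharsets also offers UTF_16BE and UTF_16LE, which fix the byte order and omit it. The following sketch compares the two. The class ByteOrderMarkDemo and its helper methods are illustrative names, not part of any library.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ByteOrderMarkDemo
{
	// Encodes as big-endian UTF-16 without a byte order mark.
	static byte[] toUtf16BigEndian(String s)
	{
		return s.getBytes(StandardCharsets.UTF_16BE);
	}

	// Encodes as little-endian UTF-16 without a byte order mark.
	static byte[] toUtf16LittleEndian(String s)
	{
		return s.getBytes(StandardCharsets.UTF_16LE);
	}

	public static void main(String[] args)
	{
		String s = "demo!";
		System.out.println("UTF-16BE: " + Arrays.toString(toUtf16BigEndian(s)));
		System.out.println("UTF-16LE: " + Arrays.toString(toUtf16LittleEndian(s)));
	}
}
```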

The getBytes(Charset charset) overload always replaces malformed input and unmappable characters with the charset's default replacement byte array. The behavior of the other two overloads is unspecified when the string cannot be encoded in the chosen charset. As we can see in the output below, each emoji is replaced by the default replacement byte value of 63 (the ASCII code for ?).

import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Demo
{
	public static void main(String[] args)
	{
		String s = "demo\uD83D\uDE00\uD83D\uDE00\uD83D\uDE00";// "demo" followed by three emoji, written as Unicode escapes
		Charset charset = StandardCharsets.ISO_8859_1;
		byte[] byteArr = s.getBytes(charset);
		System.out.print("String as Bytes: " + Arrays.toString(byteArr));
	}
}


String as Bytes: [100, 101, 109, 111, 63, 63, 63]
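Since the replacement happens silently, it can be useful to check up front whether a string fits into the target charset. One way to do this is the canEncode() method of CharsetEncoder. The sketch below is only an illustration. The class CanEncodeDemo and the method fitsInLatin1() are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

class CanEncodeDemo
{
	// Returns true only if every character of s can be encoded in ISO-8859-1.
	static boolean fitsInLatin1(String s)
	{
		return StandardCharsets.ISO_8859_1.newEncoder().canEncode(s);
	}

	public static void main(String[] args)
	{
		String plain = "demo";
		String withEmoji = "demo\uD83D\uDE00"; // "demo" followed by one emoji
		System.out.println("Plain text fits: " + fitsInLatin1(plain));
		System.out.println("Text with emoji fits: " + fitsInLatin1(withEmoji));
	}
}
```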

Using Charset.encode() Method

Instead of using the getBytes(Charset charset) method, we can directly use the encode() method of the Charset class. This method also uses a default replacement byte for unsupported characters. It returns a ByteBuffer, and we use the array() method to obtain its backing byte array. The backing array can be larger than the encoded content, which is why the output below ends with extra zeros.

import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Demo{
    public static void main(String args[])
    {
      String s = "demo\uD83D\uDE00\uD83D\uDE00\uD83D\uDE00";// "demo" followed by three emoji, written as Unicode escapes
      Charset charset = StandardCharsets.ISO_8859_1;
	  byte[] byteArr = charset.encode(s).array();
	  System.out.print("String as Bytes: " + Arrays.toString(byteArr));
    }
}


String as Bytes: [100, 101, 109, 111, 63, 63, 63, 0, 0, 0]
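The trailing zeros in the output above come from the unused capacity of the buffer's backing array. To get only the encoded bytes, we can copy the bytes between the buffer's position and limit. The following sketch shows one way to do that. The class TrimmedEncodeDemo and the method encodeExact() are names invented for this example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class TrimmedEncodeDemo
{
	// Encodes s and returns exactly the encoded bytes, without unused buffer capacity.
	static byte[] encodeExact(String s, Charset charset)
	{
		ByteBuffer buffer = charset.encode(s);
		byte[] bytes = new byte[buffer.remaining()]; // remaining() is the number of encoded bytes
		buffer.get(bytes); // copies the bytes between position and limit
		return bytes;
	}

	public static void main(String[] args)
	{
		String s = "demo\uD83D\uDE00\uD83D\uDE00\uD83D\uDE00"; // "demo" followed by three emoji
		byte[] byteArr = encodeExact(s, StandardCharsets.ISO_8859_1);
		System.out.println("String as Bytes: " + Arrays.toString(byteArr));
	}
}
```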

Using CharsetEncoder

The CharsetEncoder provides us with a lot more control over the encoding process. We can define the encoding in the case of unsupported characters or malformed input(if the char sequence is not a valid Unicode sequence).

We will first create a CharsetEncoder and define the encoding behavior. We can use methods like onMalformedInput() or onUnmappableCharacter() to do this.

Next, we will use the encode() method to perform the encoding.

import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Demo {
    public static void main(String args[]) throws CharacterCodingException
    {
      String s = "demo\uD83D\uDE00\uD83D\uDE00\uD83D\uDE00";// "demo" followed by three emoji, written as Unicode escapes
      //Creating the encoder and defining the encoding behavior
      CharsetEncoder charsetEncoder = StandardCharsets.ISO_8859_1.newEncoder();
      charsetEncoder.onMalformedInput(CodingErrorAction.IGNORE);//Ignore malformed input
      charsetEncoder.onUnmappableCharacter(CodingErrorAction.REPLACE)
      				.replaceWith(new byte[] {-121});//Replace each unmappable character with the byte -121
      
      byte[] byteArr = charsetEncoder.encode(CharBuffer.wrap(s)).array();
      
      System.out.print("String as Bytes: " + Arrays.toString(byteArr));
    }
}


String as Bytes: [100, 101, 109, 111, -121, -121, -121, 0, 0, 0]

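In the code above, we have ignored the malformed input, and we are replacing the unmappable characters (emojis in our case) with -121. The trailing zeros in the output are unused capacity of the backing array returned by array(). We can also use REPORT instead of IGNORE or REPLACE. In that case, the encode(CharBuffer) method throws a CharacterCodingException, while the lower-level encode(CharBuffer, ByteBuffer, boolean) method returns a CoderResult object that describes the error.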
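The REPORT action is useful when silently losing characters is not acceptable. The sketch below shows one possible way to use it with ISO-8859-1. The class StrictEncodeDemo and the method encodeStrict() are hypothetical, and wrapping the checked exception in an IllegalArgumentException is just a choice made for this example.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class StrictEncodeDemo
{
	// Encodes s as ISO-8859-1 and fails instead of replacing characters.
	static byte[] encodeStrict(String s)
	{
		CharsetEncoder encoder = StandardCharsets.ISO_8859_1.newEncoder()
				.onMalformedInput(CodingErrorAction.REPORT)
				.onUnmappableCharacter(CodingErrorAction.REPORT);
		try
		{
			ByteBuffer buffer = encoder.encode(CharBuffer.wrap(s));
			byte[] bytes = new byte[buffer.remaining()];
			buffer.get(bytes); // copy only the encoded bytes
			return bytes;
		}
		catch (CharacterCodingException e)
		{
			// Wrapping in an unchecked exception is an assumption of this sketch.
			throw new IllegalArgumentException("Cannot encode as ISO-8859-1", e);
		}
	}

	public static void main(String[] args)
	{
		System.out.println("String as Bytes: " + Arrays.toString(encodeStrict("demo!")));
		try
		{
			encodeStrict("demo\uD83D\uDE00"); // the emoji is unmappable in ISO-8859-1
		}
		catch (IllegalArgumentException e)
		{
			System.out.println("Encoding failed: " + e.getCause());
		}
	}
}
```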

Byte Array to String

Just like encoding, decoding also requires a charset. Let's learn how to decode a byte array to get the underlying string.

Using String Constructor

The string constructor can take a byte array and initialize a string using the array. This approach is just the opposite of what the getBytes() method does.

We just need to pass the byte array to the constructor, and the string will be generated using the system's default charset. This approach is not recommended because the string may not have been encoded using the same default charset. In the following example, the original string was encoded using the UTF-16 charset, but the system's default decoding is UTF-8. Because of it, we will not get the expected results.

public class Demo
{
    public static void main(String args[])
    {
   		byte[] byteArr = {-2, -1, 0, 100, 0, 101, 0, 109, 0, 111, 0, 33};//the original string is "demo!"
    	String stringFromBytes = new String(byteArr);			
    	System.out.print("String from the byte array: " + stringFromBytes);
    }
}


String from the byte array: ?? d e m o !

Using Named Charset:

We can also pass a named charset to the constructor. An UnsupportedEncodingException is thrown if the named charset is not supported.

import java.io.UnsupportedEncodingException;

public class Demo
{
    public static void main(String args[]) throws UnsupportedEncodingException
    {
   		byte[] byteArr = {-2, -1, 0, 100, 0, 101, 0, 109, 0, 111, 0, 33};//the original string is "demo!"	
   		String stringFromBytes = new String(byteArr, "UTF-16");			
    	System.out.print("String from the byte array: " + stringFromBytes);
    }
}


String from the byte array: demo!

Using Charset Instance:

The string constructor can also take a Charset class instance for decoding.

import java.nio.charset.Charset;

public class Demo
{
    public static void main(String args[])
    {
   		byte[] byteArr = {-2, -1, 0, 100, 0, 101, 0, 109, 0, 111, 0, 33};//the original string is "demo!"	
   		Charset charset = Charset.forName("UTF-16");
   		String stringFromBytes = new String(byteArr, charset);			
    	System.out.print("String from the byte array: " + stringFromBytes);
    }
}


String from the byte array: demo!

We can also use a Standard Charset.

import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Demo
{
    public static void main(String args[])
    {
   		byte[] byteArr = {-2, -1, 0, 100, 0, 101, 0, 109, 0, 111, 0, 33};//the original string is "demo!"	
   		Charset charset = StandardCharsets.UTF_16;
   		String stringFromBytes = new String(byteArr, charset);			
    	System.out.print("String from the byte array: " + stringFromBytes);
    }
}


String from the byte array: demo!

Using Charset.decode() Method

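Like the encode() method, the Charset class provides a decode() method to decode bytes wrapped in a ByteBuffer. If the input contains malformed or unmappable byte sequences, they are replaced with the charset's default replacement string. In the example below, the extra trailing byte (-10) is not a complete UTF-16 code unit, so it is replaced.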

import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Demo
{
	public static void main(String[] args)
	{
		byte[] byteArr = {-2, -1, 0, 100, 0, 101, 0, 109, 0, 111, 0, 33, -10};
		Charset charset = StandardCharsets.UTF_16;
		String stringFromBytes = charset.decode(ByteBuffer.wrap(byteArr)).toString();
		
		System.out.print("String from byte array: " + stringFromBytes);
	}
}


String from byte array: demo!?

Using CharsetDecoder

The Charset.decode() method is a convenience wrapper around CharsetDecoder that always replaces malformed input and unmappable characters. Using a CharsetDecoder directly gives us more control over the decoding process. Just like the CharsetEncoder, we can IGNORE, REPLACE, or REPORT unmappable characters or malformed input. In the code below, we are replacing such input with an asterisk(*).

import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Demo
{
	public static void main(String[] args) throws CharacterCodingException
	{
		byte[] byteArr = {-2, -1, 0, 100, 0, 101, 0, 109, 0, 111, 0, 33, -10};
		CharsetDecoder charsetDecoder = StandardCharsets.UTF_16.newDecoder();

		charsetDecoder.onUnmappableCharacter(CodingErrorAction.REPLACE)
					  .onMalformedInput(CodingErrorAction.REPLACE)
					  .replaceWith("*");//Replace unmappable and malformed characters with *

	    String stringFromBytes = charsetDecoder.decode(ByteBuffer.wrap(byteArr)).toString();
	    System.out.print("String from Byte Array: " + stringFromBytes);
	}
}


String from Byte Array: demo!*
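With REPORT, the decoder can also be used to validate that a byte array is well-formed in a given charset. The following sketch assumes UTF-16 input. The class StrictDecodeDemo and the method decodeStrict() are names made up for this illustration, and converting the checked exception into an IllegalArgumentException is only one possible design.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictDecodeDemo
{
	// Decodes UTF-16 bytes and fails on malformed input instead of replacing it.
	static String decodeStrict(byte[] bytes)
	{
		CharsetDecoder decoder = StandardCharsets.UTF_16.newDecoder()
				.onMalformedInput(CodingErrorAction.REPORT)
				.onUnmappableCharacter(CodingErrorAction.REPORT);
		try
		{
			return decoder.decode(ByteBuffer.wrap(bytes)).toString();
		}
		catch (CharacterCodingException e)
		{
			// Wrapping in an unchecked exception is an assumption of this sketch.
			throw new IllegalArgumentException("Bytes are not valid UTF-16", e);
		}
	}

	public static void main(String[] args)
	{
		byte[] valid = {-2, -1, 0, 100, 0, 101, 0, 109, 0, 111, 0, 33};
		byte[] truncated = {-2, -1, 0, 100, 0, 101, 0, 109, 0, 111, 0, 33, -10};
		System.out.println("String from Byte Array: " + decodeStrict(valid));
		try
		{
			decodeStrict(truncated); // the last byte is not a complete code unit
		}
		catch (IllegalArgumentException e)
		{
			System.out.println("Decoding failed: " + e.getCause());
		}
	}
}
```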

Summary

There are several ways to convert a string to a byte array and back. The String class itself provides three overloaded getBytes() methods to create a byte array. We can get the string back from the byte array by using the String class constructor. In both directions, the same charset must be used, so it is safer to name it explicitly than to rely on the platform's default. It is recommended to use CharsetEncoder and CharsetDecoder when we need control over how malformed input and unmappable characters are handled, as they provide more freedom and control over the encoding and decoding process.



About the author:
I am a 3rd-year Computer Science Engineering student at Vellore Institute of Technology. I like to play around with new technologies and love to code.