Java bytecode obfuscation

Zelix Klassmaster is a very powerful obfuscator for making Java bytecode harder to reverse engineer. You can get it from here: http://www.zelix.com/klassmaster/

Normally using a Java decompiler (such as jad) would give us something very close to the original Java source code, but after having put a compiled Java program through Klassmaster with aggressive flow obfuscation and string encryption this is no longer possible. Code paths can no longer be expressed in Java so the decompiler will have to give you the raw JVM instructions and all string literals have been encrypted so that they can not be read directly.

But the encrypted strings are of course usable by the program without us having to provide encryption keys. This means that we too should be able to reverse the encryption and get back the original strings.

I have used the following Java code which is put through Klassmasters obfuscation:

public class Test {
    public static void main (String args[]) {
        System.out.println("Hello, World!");
        System.out.println("How are you doing, World?");
        System.out.println("Goodbye, World!");
    }
}

A trip through Klassmaster and Jad gives us this:

import java.io.PrintStream;

public class Test
{

    public Test()
    {
    }

    public static void main(String args[])
    {
        boolean flag = A;
        System.out.println(B[2]);
        System.out.println(B[0]);
        System.out.println(B[1]);
        if(flag)
            z = !z;
    }

    public static boolean z;
    public static boolean A;
    private static final String B[];

    static 
    {
        String as[] = new String[3];
        as;
        as;
        0;
        "+\034\006W \021\026Q\016.\026S\025\030(\r\024]W\026\f\001\035\023~";
        -1;
          goto _L1
_L5:
        JVM INSTR aastore ;
        JVM INSTR dup ;
        true;
        "$\034\036\023#\032\026]W\026\f\001\035\023`";
        false;
          goto _L1
_L6:
        JVM INSTR aastore ;
        JVM INSTR dup ;
        2;
        "+\026\035\033.OS&\0303\017\027P";
        true;
          goto _L1
_L7:
        JVM INSTR aastore ;
        B;
_L1:
        JVM INSTR swap ;
        toCharArray();
        JVM INSTR dup ;
        JVM INSTR arraylength .length;
        JVM INSTR swap ;
        int i = 0;
          goto _L2
_L4:
        JVM INSTR dup ;
        i;
        JVM INSTR dup2 ;
        JVM INSTR caload ;
        byte byte0;
        switch(i % 5)
        {
        case 0: // '\0'
            byte0 = 0x63;
            break;

        case 1: // '\001'
            byte0 = 115;
            break;

        case 2: // '\002'
            byte0 = 113;
            break;

        case 3: // '\003'
            byte0 = 119;
            break;

        default:
            byte0 = 65;
            break;
        }
        byte0;
        JVM INSTR ixor ;
        (char);
        JVM INSTR castore ;
        i++;
_L2:
        JVM INSTR swap ;
        JVM INSTR dup_x1 ;
        i;
        JVM INSTR icmpgt 50;
           goto _L3 _L4
_L3:
        JVM INSTR new #41  ;
        JVM INSTR dup_x1 ;
        JVM INSTR swap ;
        String();
        intern();
        JVM INSTR swap ;
        JVM INSTR pop ;
        JVM INSTR swap ;
        JVM INSTR tableswitch 0 1: default 13
    //                   0 22
    //                   1 31;
           goto _L5 _L6 _L7
    }
}

Nice. The static block does the decryption of the string literals. I don't know about you but I don't think that the intermixing of nearly Java and JVM instructions is very helpful so here is a JVM instruction version of the static block (I used 'jad -dis' to get this):

    static 
    {
    //    0    0:iconst_3        
    //    1    1:anewarray       String[]
    //    2    4:dup             
    //    3    5:iconst_0        
    //    4    6:ldc1            #5   
    //    5    8:bipush          -1
    //    6   10:goto            38
    //    7   13:aastore         
    //    8   14:dup             
    //    9   15:iconst_1        
    //   10   16:ldc1            #6   
    //   11   18:iconst_0        
    //   12   19:goto            38
    //   13   22:aastore         
    //   14   23:dup             
    //   15   24:iconst_2        
    //   16   25:ldc1            #3   
    //   17   27:iconst_1        
    //   18   28:goto            38
    //   19   31:aastore         
    //   20   32:putstatic       #58  
    //   21   35:goto            160
    //   22   38:swap            
    //   23   39:invokevirtual   #47  
    //   24   42:dup             
    //   25   43:arraylength     
    //   26   44:swap            
    //   27   45:iconst_0        
    //   28   46:istore_0        
    //   29   47:goto            116
    //   30   50:dup             
    //   31   51:iload_0         
    //   32   52:dup2            
    //   33   53:caload          
    //   34   54:iload_0         
    //   35   55:iconst_5        
    //   36   56:irem            
    //   37   57:tableswitch     0 3: default 108
    //                   0 88
    //                   1 93
    //                   2 98
    //                   3 103
    //   38   88:bipush          99
    //   39   90:goto            110
    //   40   93:bipush          115
    //   41   95:goto            110
    //   42   98:bipush          113
    //   43  100:goto            110
    //   44  103:bipush          119
    //   45  105:goto            110
    //   46  108:bipush          65
    //   47  110:ixor            
    //   48  111:int2char        
    //   49  112:castore         
    //   50  113:iinc            0  1
    //   51  116:swap            
    //   52  117:dup_x1          
    //   53  118:iload_0         
    //   54  119:icmpgt          50
    //   55  122:new             #41  
    //   56  125:dup_x1          
    //   57  126:swap            
    //   58  127:invokespecial   #51  
    //   59  130:invokevirtual   #56  
    //   60  133:swap            
    //   61  134:pop             
    //   62  135:swap            
    //   63  136:tableswitch     0 1: default 13
    //                   0 22
    //                   1 31
    //   64  160:return          
    }

If you don't read JVM assembly I can highly recommend the book "Principles of Computer Organization and Assembly Language" which teaches the language quite well.

The first number in the listing is the instruction number. It is incremented by one on each new instruction. The second number is the offset within the static block and it is incremented by the size of the instruction + its arguments.

The quick overview

Instruction number 0 and 1 creates a new String array with room for three elements. This is for containing the decrypted strings.

Instruction number 2 to 7 is a group for decrypting one encrypted string literal and storing it in the array, and this group of instructions is duplicated for each string that needs to be decrypted.

First (instruction 2) the array is duplicated on the stack, then (instruction 3) the index into the array is pushed onto the stack. The index is different for each string. Then (instruction 4) the encrypted string is pushed onto the stack. Then an identifier (at instruction 5) is put onto the stack and this is used by a subroutine like piece of code to determine where to jump to when the subroutine is finished. JVM has actual subroutines which could have been used instead but I guess this is more cryptic. The code to jump back is found at instruction 136. Finally at instruction 6, the decrypting subroutine is called and at instruction 7 the result (found on the stack) is stored in the string array.

This is copied for each string but the index into the array, the encrypted string and the identifier are of course different.

After the last string has been decrypted the string array is stored in a static variable (at instruction 20) and at instruction 21 we jump to address 160 which returns from the static block.

Decrypter subroutine

The JVM supports actual subroutines but the decrypter is not implemented as one...instead some tricks which kind of emulates this has been used.

The decrypter starts at instruction 22 and runs to instruction 63. It expects the top element on the stack to be an integer which identifies where to jump back when decryption is done. The next element on the stack is expected to be the encrypted string.

First (instruction 22) the two top elements are swapped and then toCharArray() is called (instruction 23) on the encrypted string which is now the top element. After this the top element is a reference to the char array which is duplicated (instruction 24).

The duplicate is replaced with the arrays length (instruction 25) and the char array and its length are swapped (instruction 26).

At instruction 27 and 28 local variable 0 is initialized to zero (a counter ?) and we jump to instruction 51 (at instruction 29).

From instruction 51 we first swap the two top elements and then the dup_x1 instruction duplicates the top element but adds it underneath the second element. After this the top of the stack contains <char array length>, <char array>, <char array length>. The two instructions combined means 'put a duplicate of the second element on the top'.

In instruction 53 and 54 local variable 0 and the top element (char array length) are compared and if the char array length is greater than the local variable we jump to instruction 30. After this the stack contains (from top to bottom and only values relevant to the decryption routing): <char array>, <char array length>, <jump back identifier>

From instruction 30 we first duplicate the char array, then load local variable 0 to the stack, and then duplicates the top two elements on the stack.

At instruction 32 we replace the two top elements with the character at index 'local var 0' in the char array. So variable 0 is an index into the char array.

From instruction 34 to 36 we calculate 'local variable 0 modulus 5' whose result is now on the stack.

This result is used to push one of five integers onto the stack. From instruction 37 to 46 we use the result to switch into putting either 99, 115, 113, 119 or 65 onto the stack (replaces the 'var 0 modulus 5' result). What are these values (I have a guess but read on).

At instruction 47 the XOR of the two top elements (one of the five values from before and the char from the char array) is computed and replaces the two top elements on the stack, and (in instruction 48) converted from an integer to a char, which in instruction 49 is then placed into the char array at the index held in variable 0.

In instruction 50 the local variable 0 is incremented by one (the stack is not changed).

Now we are back at instruction 51 ready for another round. This goes on until variable 0 has the same value as the array length. In other words this is an XOR decryption routine and the key length in this case is five and the key bytes are the ones that we switched over. The value five is probably chosen because the JVM has special instructions for pushing the values 0 to 5 onto the stack thus making the code smaller.

But the routine is not over yet. When the counter has reached the length of the array we continue on to instruction 55 where a new string object is first created, then duplicated (beneath the char array) and its constructor taking a char array as parameter is called. The duplicate is used for calling 'intern()' upon the string.

At 60 to 62 the char array is removed from the stack leaving the decrypted string and the jump back id. These are swapped and the id is used in the tableswitch at instruction 63 to jump back to where the decrypter was called from.

Am I right ?

So, if I am right the following java code should write out the strings:

public class Verify {
    private static String decrypt(String str) {
        char key[] = new char[] {99,115,113,119,65};
        char arr[] = str.toCharArray();
        for (int i = 0; i < arr.length; i++) {
            arr[i] ^= key[i % 5];
        }
        return new String(arr);
    }

    public static void main (String args[]) {
        System.out.println(decrypt("+\034\006W \021\026Q\016.\026S\025\030(\r\024]W\026\f\001\035\023~"));
        System.out.println(decrypt("$\034\036\023#\032\026]W\026\f\001\035\023`"));
        System.out.println(decrypt("+\026\035\033.OS&\0303\017\027P"));
    }
}

Let's try it out:

$ javac Verify.java 
$ java Verify
How are you doing, World?
Goodbye, World!
Hello, World!
$ 

Yay!

Checking out several decompiled files show that the key is always five characters long but the values change. This makes sense.

Also I noticed that sometimes the decrypter is an actual subroutine (and the jump is using the 'jsr' instruction). This probably also makes sense when having many strings. Otherwise the tableswitch would have to be quite large thus making the machine code large.

Now all I need to do is create a tool for patching a class file. That could be interesting.

To be continued.