Introduction to Android Bytecode Exploitation (Part 1)

Android is among the most popular operating systems for mobile devices, which also makes it among the most popular targets for exploitation. While Android is frequently updated to fix the latest CVEs, malicious actors continuously search for new vulnerabilities, as gaining control over millions of computationally powerful devices is very appealing. Market share figures underline that Android is by far the most lucrative platform for malicious actors targeting mobile devices.

[Figure: Mobile OS Market Shares]

Luckily, Android comes with various enforced security mechanisms. Depending on the layer considered, there is, for example, the higher-level Android permission system, which encourages adherence to the principle of least privilege, and the lower-level Address Space Layout Randomization (ASLR), which randomizes the layout of process images so adversaries cannot easily predict the locations of critical code or data. Furthermore, apps are isolated from each other, which prevents malicious apps from, e.g., manipulating a banking app’s internal storage. Overall, depending on their goals, adversaries must bypass plenty of security mechanisms to be “successful”.

The attack surface of Android is large and consists of, among other things, the intent system and the associated Binder mechanism, socket communication and app configurations. Adding to the pile, Android apps can use the Java Native Interface (JNI) to delegate parts of an application to the native layer, potentially increasing performance. However, native code, i.e. code written in memory-unsafe languages like C/C++ that runs directly on the CPU, brings in the problems of that specific language. Memory-unsafe languages are notorious for memory corruption vulnerabilities, which cannot arise in apps written purely in a memory-safe language such as Java. For a concrete example of a Write-What-Where (WWW) condition, consider the following sample code:

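#include <jni.h>
#include <cstdint>

// Deliberately vulnerable: the Java caller controls both the target
// address ("where") and the 64-bit value ("what") of the store.
extern "C"
JNIEXPORT void JNICALL
Java_<package_path>_<class_name>_<native_method_name>(JNIEnv *env,
        jclass clazz,
        jlong address, jlong value) {
    *(uint64_t*)address = (uint64_t)value;
}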
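To see the primitive in isolation, the following Python sketch performs the same 64-bit store with ctypes. It is only an analogue of the JNI function above, not Android code, and it writes into a buffer owned by the Python process itself so that nothing else is touched; write_what_where is an illustrative name.

```python
import ctypes

def write_what_where(address: int, value: int) -> None:
    """Analogue of *(uint64_t*)address = (uint64_t)value; from the JNI sample."""
    # Masking mimics the jlong -> uint64_t cast for negative Java longs.
    ctypes.c_uint64.from_address(address).value = value & 0xFFFFFFFFFFFFFFFF

# Demonstrate on memory we own: an array of eight 64-bit slots.
buf = (ctypes.c_uint64 * 8)()
target = ctypes.addressof(buf) + 3 * ctypes.sizeof(ctypes.c_uint64)  # "where"
write_what_where(target, 0x4141414141414141)                         # "what"
print(hex(buf[3]))  # 0x4141414141414141
```

Whoever controls both arguments can overwrite any writable 64-bit location in the process, which is exactly why such a primitive is a starting point for exploitation.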

A write-what-where condition like this could still occur, in a more subtle form, inside a legitimate app that uses JNI. Luckily, Android enables security mechanisms like stack canaries, NX, ASLR, FORTIFY, RELRO and the hardened heap allocator Scudo by default, to name a few. Furthermore, Android allows vendors to build entire components with ShadowCallStack and Control-Flow Integrity (CFI). For a concrete example, consider the output of checksec on libart.so:

$ ./checksec --version
checksec v2.7.1, Brian Davis, github.com/slimm609/checksec.sh, Dec 2015
Based off checksec v1.5, Tobias Klein, www.trapkit.de, November 2011
$ ./checksec --file=libart.so
RELRO           STACK CANARY      NX            PIE             RPATH      RUNPATH      Symbols         FORTIFY  Fortified  Fortifiable  FILE
Full RELRO      Canary found      NX enabled    DSO             No RPATH   No RUNPATH   36417 Symbols   N/A      0          0            libart.so
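checksec derives these columns from ELF metadata. As a rough sketch, assuming a little-endian ELF64 file such as an arm64 libart.so, the ELF header and program headers alone already reveal three properties: e_type ET_DYN (PIE or shared object), a PT_GNU_STACK segment without the execute flag (NX) and the presence of a PT_GNU_RELRO segment. Telling Full from Partial RELRO would additionally require looking for BIND_NOW in the dynamic section, which this sketch does not do; elf_hardening is a made-up name.

```python
import struct

ET_DYN = 3                 # shared object or PIE executable
PT_GNU_STACK = 0x6474E551  # segment describing stack permissions
PT_GNU_RELRO = 0x6474E552  # region made read-only after relocation
PF_X = 0x1                 # segment flag: executable

def elf_hardening(data: bytes) -> dict:
    """Partial checksec-style report from the ELF64 header and program headers."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("expected a little-endian ELF64 file")
    e_type = struct.unpack_from("<H", data, 16)[0]
    e_phoff = struct.unpack_from("<Q", data, 32)[0]
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 54)
    # Without a PT_GNU_STACK entry, NX is reported as not confirmed (False).
    report = {"pie_or_dso": e_type == ET_DYN, "nx": False, "relro_segment": False}
    for i in range(e_phnum):
        p_type, p_flags = struct.unpack_from("<II", data, e_phoff + i * e_phentsize)
        if p_type == PT_GNU_RELRO:
            report["relro_segment"] = True
        elif p_type == PT_GNU_STACK:
            report["nx"] = not (p_flags & PF_X)
    return report
```

Usage, after pulling the library from the device: `with open("libart.so", "rb") as f: print(elf_hardening(f.read()))`.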

With all these mechanisms in place, the question we want to answer in this series of blog posts is the following:

Assume that native-level exploitation techniques that operate directly on native code (.*ROP, JOP, COOP, LOOP, …) do not work. Which Android-specific exploitation techniques remain available to an attacker trying to exploit a native-level vulnerability?

There is plenty of research targeting the Java and APK layers as well as IPC mechanisms. However, to the best of our knowledge, there does not seem to be any research on bytecode-based exploitation at runtime. Of course, patching an APK file requires (static) bytecode injection, but this is of no use if an attacker wants to exploit a vulnerability that is only exposed at runtime. Furthermore, by understanding the offensive perspective, we may be able to derive security mechanisms that prevent a particular technique (memory-safe languages do not count for now!). Surprisingly, it turns out that the security of an Android app in the presence of a memory error is the minimum of the security of its native code and its bytecode! The figure below illustrates the idea.

[Figure: Android Security Gap]

In other words, there are two execution models on Android, namely native code and bytecode. Although each model is protected by its own mountain of security mechanisms, the overall security level of an app in the presence of a memory error is only as high as the smallest mountain. Thus, there is a difference between the expected and the actual security in the presence of a memory error, which I call the security gap. For example, an attacker facing CFI and ShadowCallStack is likely to fail at conquering the native mountain, but may succeed by conquering the bytecode mountain instead. Some security mechanisms, like ASLR and the permission system, affect both models. In addition to practical security mechanisms, the defensive research community has developed many strong, yet impractical solutions to prevent exploitation of memory errors.
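The “minimum of two mountains” argument can be made concrete with a toy model. The mapping below from techniques to the mitigations that stand in their way is a deliberately simplified, hypothetical illustration rather than an assessment of real Android mitigations; it shows that an attacker who has bypassed ASLR but cannot defeat CFI or ShadowCallStack still has one viable path.

```python
# Hypothetical, simplified mapping; not an exhaustive or authoritative list.
NATIVE_ONLY = {"CFI", "ShadowCallStack"}  # only guard the native mountain
SHARED = {"ASLR"}                         # affect both execution models

TECHNIQUE_BLOCKERS = {
    "native code reuse (ROP, JOP, ...)": NATIVE_ONLY | SHARED,
    "bytecode injection": SHARED,
}

def viable_techniques(bypassed: set) -> list:
    """Techniques whose every blocking mitigation has been bypassed."""
    return sorted(t for t, blockers in TECHNIQUE_BLOCKERS.items()
                  if blockers <= bypassed)

# Attacker with an ASLR bypass who cannot defeat CFI or ShadowCallStack:
print(viable_techniques({"ASLR"}))  # ['bytecode injection']
```

Adding more native-only mitigations raises the native mountain but leaves the result unchanged, which is the security gap in miniature.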

Now, one may question how practical this is. Are there even apps that use JNI and vulnerable native-level functions? According to Almanee et al., Android apps use a plethora of native libraries. Furthermore, it has been shown that updating vulnerable libraries takes a considerable amount of time, meaning that adversaries have a good chance to successfully exploit a bug in an app that uses a library with a known CVE. Also, Borzacchiello et al. show that vulnerable native functions can often be reached from the Java side. Combining both studies, we allow ourselves to focus solely on exploiting an assumed vulnerability, because vulnerabilities are known to be patched slowly and to be reachable. However, apps rarely state that they are affected by a vulnerability in one of their native libraries, which is why we skip the hunt for a real-world vulnerability for now. If we show that a native-level vulnerability (like a stack-buffer overflow) can be exploited using only bytecode, then the exploitability of all apps using vulnerable libraries must be reconsidered. This stems from the fact that bytecode-based exploitation is not expected to be mitigated by native-level exploit mitigations. For a concrete example, even though ROP may be (almost) impossible to use for exploiting a stack-buffer overflow without leaking a pointer to executable code, bytecode injection, being orthogonal to it, may very well be applicable.

Overall, the following topics are covered by individual blog posts:

  1. Introduction (this post): Gives an overview of the test environment, test device, resources used during research and some tips for (dynamic) Android app analysis.
  2. Android Process Fundamentals: Shows how bytecode execution is kicked off in Android 13, which common libraries reside in every app and how Java (JNI) methods are invoked. Also, some more advanced app analysis techniques are discussed.
  3. Bytecode Injection: The first exploitation technique exposes a major flaw in the design of (probably any) interpreter: data = code. A first PoC using bytecode injection in a setting where, e.g., ROP seems impossible is discussed in detail.
  4. Bytecode Reuse: The second exploitation technique shows that fixing the first flaw will not help, because (byte-)code reuse attacks are possible again. This is a very advanced and specific technique based on the ideas of COOP and insecure deserialization.

Of course, there is an associated GitHub repository that contains all PoCs, deliberately vulnerable apps and more.

Note that bytecode exploitation on Android is, at the time of writing, very tricky and requires knowledge that seems out of place when talking about native-level exploitation. The goal of every PoC will be arbitrary command execution in the context of the vulnerable app. However, I claim that an attacker can fully take over the target app, including its privileges. Also, some observations about Android may be sprinkled throughout the posts.

Before delving into the details, we give some information on the disclosure process and discuss the setup.

Responsible Disclosure

All research results, including working PoCs, have been submitted to Google’s bug bounty program to ensure that publishing these blog posts does not cause any severe security problems and to give Google time to investigate the findings and respond, if necessary. Of course, the submission contains no concrete vulnerabilities, but rather a new exploitation concept on Android. Also, I find it hard to estimate the practical impact of these blog posts, because many stars must align for bytecode injection and reuse to work, which is why I welcomed the feedback. Fortunately, Google decided that the results are not a security concern and gave permission to publish blog posts on the matter!

Test Environment

Everything discussed in this series of blog posts has been thoroughly tested on a rooted Google Pixel 7 with build number TQ3A.230901.001.C2, which corresponds to the tag android-13.0.0_r78. The device is rooted only because this simplifies using gdb and frida-server.

Furthermore, we build deliberately vulnerable apps that expose their vulnerabilities via a simple network interface based on ServerSocket and Socket. These apps are built in release mode using the default configuration of Android Studio (version: 2023.2.1 Patch 1 (Build #AI-232.10300.40.2321.11567975)), assuming that Android Studio thoroughly follows the principle of security by default.
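The wire format of these apps is not described in this post, so the following host-side client is only a hypothetical sketch: it assumes the app listens on TCP port 8080 (an illustrative value), that the port has been forwarded with adb forward tcp:8080 tcp:8080, and that the app replies and closes the connection once the client stops sending. send_payload is a made-up name.

```python
import socket

def send_payload(payload: bytes, host: str = "127.0.0.1", port: int = 8080,
                 timeout: float = 5.0) -> bytes:
    """Send raw bytes to a (hypothetical) vulnerable app and return its reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)
        sock.shutdown(socket.SHUT_WR)  # tell the app no more input is coming
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # the app closed its side of the connection
                break
            chunks.append(chunk)
    return b"".join(chunks)
```

Using raw bytes rather than text keeps the client usable for payloads containing addresses or other binary data.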

Analysis Setup

With a rooted Pixel 7, USB debugging is used to get a shell on the device via adb. Assuming the device is connected to a host, adb is set up and USB debugging is enabled, run:

(host)$ adb shell
(device)$ su
(device)# 

to obtain a root shell.

Furthermore, gdbserver or frida-server is used to attach to a target app for detailed dynamic analysis. Luckily, there is the /data/local/tmp directory, which is world-readable and -writeable. Running adb push <host file name> /data/local/tmp/<file name> moves, e.g., gdbserver to the device, which in turn enables debugging apps.

Note: Another interesting, world-readable file is /data/misc/shared_relro/libwebviewchromium64.relro. It contains actual virtual addresses used by apps that use WebView. At first glance, an unprivileged app does not gain anything from reading this file, because Android’s fork server architecture shares such information anyway.
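Pushing a tool to /data/local/tmp, making it executable and starting it as root can be scripted from the host. The helper below only builds the adb command lines; push_and_run is a hypothetical name, and the su -c "<command>" syntax is an assumption that holds for Magisk-style su binaries but not necessarily for others.

```python
import shlex
from typing import List

REMOTE_DIR = "/data/local/tmp"

def push_and_run(local_path: str, name: str, args: str = "") -> List[List[str]]:
    """adb command lines that push a tool, mark it executable and start it as root."""
    remote = f"{REMOTE_DIR}/{name}"
    command_line = f"{remote} {args}".strip()
    return [
        ["adb", "push", local_path, remote],
        ["adb", "shell", f"chmod 755 {shlex.quote(remote)}"],
        # Assumption: su accepts -c "<command>" (true for Magisk's su).
        ["adb", "shell", f"su -c {shlex.quote(command_line)}"],
    ]
```

Usage: `for cmd in push_and_run("gdbserver", "gdbserver", ":1337 --attach 11287"): subprocess.run(cmd, check=True)`. The last command blocks for as long as gdbserver runs.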

Debugging Android Apps

To gain a deep understanding of control flow at runtime, especially during the execution of an exploit, gdb is used. To be precise, gdb is used with the gef extension, but this is a matter of personal preference. gdb comes with gdbserver, which can attach to a running process, like an app, based on the process's pid. To figure out the pid of, e.g., "youtube", run the following commands:

(device)$ pm list packages youtube
package:com.google.android.youtube      # <--- probably this one
package:com.google.android.apps.youtube.music
(device)$ pidof com.google.android.youtube  # no output: app not running yet
(device)$ pidof com.google.android.youtube  # after starting the app
11287
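Because pidof prints nothing while the app is not running, a host-side script that wants to attach right after launching an app can poll until a pid shows up. A minimal sketch, assuming adb is on the PATH; wait_for_pid and parse_pidof are illustrative names:

```python
import subprocess
import time
from typing import Optional

def parse_pidof(output: str) -> Optional[int]:
    """pidof prints space-separated pids, or nothing if no process matches."""
    pids = output.split()
    return int(pids[0]) if pids else None  # take the first match

def wait_for_pid(package: str, timeout: float = 30.0,
                 interval: float = 0.5) -> Optional[int]:
    """Poll `adb shell pidof <package>` until it reports a pid or time runs out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = subprocess.run(["adb", "shell", "pidof", package],
                                capture_output=True, text=True)
        pid = parse_pidof(result.stdout)
        if pid is not None:
            return pid
        time.sleep(interval)
    return None
```

The return code of pidof is deliberately ignored, since an empty result while the app is still starting is expected rather than an error.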

Combining pidof with gdbserver yields the following command for attaching to the target app:

(device)# /data/local/tmp/gdbserver :1337 --attach $(pidof <name of app>)

To connect gdb to gdbserver running on the device, the debug port must be forwarded:

(host)$ adb forward tcp:1337 tcp:1337

Before we can start debugging, it may be useful to get some symbol information. To that end, identify which libraries are relevant for the current task, use adb pull to copy them into a directory, say dbgtmp, and run:

(host)$ gdb-multiarch -q
gef➤  set solib-search-path ./dbgtmp/
gef➤  set solib-absolute-prefix ./dbgtmp/

Now, gdb knows where to look for symbols while debugging.

Finally, gdb can connect to the server:

gef➤  target extended-remote :1337
gef➤  sharedlibrary

These may seem like a lot of steps, but it is worth it!
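The gdb side of these steps can also be collected in a command file. The sketch below generates exactly the gdb commands shown above; the helper name gdb_attach_script and the file name attach.gdb are illustrative.

```python
def gdb_attach_script(symbol_dir: str = "./dbgtmp/", port: int = 1337) -> str:
    """gdb commands mirroring the manual steps: symbol paths first, then connect."""
    commands = [
        f"set solib-search-path {symbol_dir}",
        f"set solib-absolute-prefix {symbol_dir}",
        f"target extended-remote :{port}",
        "sharedlibrary",
    ]
    return "\n".join(commands) + "\n"
```

Usage: with gdbserver running on the device and adb forward tcp:1337 tcp:1337 in place, write the text to a file, e.g. `pathlib.Path("attach.gdb").write_text(gdb_attach_script())`, and start `gdb-multiarch -q -x attach.gdb` on the host; gdb's -x option executes the commands from that file. Setting the symbol paths before connecting means gdb can resolve the pulled libraries as soon as it attaches.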

About lldb

Of course, it is also possible to debug an app with lldb. For example, one can use Android Studio’s integrated native debugging, which is based on lldb. lldb does a great job at resolving symbols, but there seems to be a lack of plugins. For example, I use gef for debugging Android apps, which comes with features like memory search and hexdumping. That being said, lldb is (hopefully) semantically equivalent to gdb and its plugins. Which debugger is used does not matter; I simply prefer gdb.

Instrumentation via Frida

While debugging can also be automated, it is somewhat inefficient and does not always provide the features needed for dynamic analysis. For example, finding the representation of the java.lang.Class object seems like a daunting task when only equipped with a debugger. This is where frida comes to the rescue. Unfortunately, frida is too complex for a tutorial to fit into this series of blog posts.

To get frida up and running, use frida-server:

(host)$ adb push frida-server /data/local/tmp/frida-server
(host)$ adb shell
(device)$ su
(device)# chmod 755 /data/local/tmp/frida-server
(device)# /data/local/tmp/frida-server
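Once frida-server is running, a quick sanity check from the host is to ask it for the installed apps. The sketch below relies on frida-python's Device.enumerate_applications(), whose entries carry an identifier and a pid (0 for apps that are not running); the helper name find_app_pid is made up for illustration. It takes the device as a parameter so it can be tried without a phone.

```python
from typing import Optional

def find_app_pid(device, package: str) -> Optional[int]:
    """Return the pid of a running app by package name, or None if not running."""
    for app in device.enumerate_applications():
        if app.identifier == package and app.pid != 0:
            return app.pid
    return None
```

Usage: `find_app_pid(frida.get_usb_device(), "com.google.android.youtube")`. If this call fails to connect, frida-server is most likely not running on the device.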

Now, to attach to a process based on its pid, use the following Python code:

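import sys

import frida

pid = int(sys.argv[1])  # pid of the target app, e.g. obtained via pidof

device = frida.get_usb_device()
session = device.attach(pid)
# f-string: {pid} is filled in by Python, {{ and }} become literal JS braces
script = session.create_script(f'''
    for (let i = 0; i < 10; i++) {{
        console.log("Hijacked {pid}");
    }}
''')

def on_message(message, data):
    # console.log output goes to frida's log handler; send() payloads land here
    if message['type'] == 'send':
        print(message['payload'])
    else:
        print(f'Something went wrong: {message["description"]}')
        print(message.get('stack'))

script.on('message', on_message)
script.load()
sys.stdin.read()  # keep the session alive until stdin is closed (Ctrl+D)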

It may be beneficial to use format strings for setting up the JavaScript code that runs inside the target app, e.g. to set a default block size for memory dumping.
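For example, a small template could fix the module, offset and block size before the script is sent to the target. The sketch below assumes Frida's JavaScript API (Process.getModuleByName, NativePointer.add, hexdump and send); build_dump_script and DUMP_TEMPLATE are illustrative names. The doubled braces are turned into literal JavaScript object braces by str.format, and json.dumps quotes the module name as a JavaScript string literal.

```python
import json

DUMP_TEMPLATE = """
const base = Process.getModuleByName({module}).base;
send(hexdump(base.add({offset:#x}), {{ length: {block_size}, ansi: false }}));
"""

def build_dump_script(module: str, offset: int = 0, block_size: int = 0x100) -> str:
    """Render Frida JavaScript that dumps block_size bytes at module base + offset."""
    return DUMP_TEMPLATE.format(module=json.dumps(module), offset=offset,
                                block_size=block_size)
```

Usage: `script = session.create_script(build_dump_script("libart.so", block_size=0x200))`. Since the dump is delivered via send(), the on_message handler from the attach example prints it.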

Resources

The main resource, the foundation of all this research, is the source code of the Android Runtime (ART). However, a lot of knowledge has been gained from simply testing hypotheses, days of debugging and drawing as many cross-references to already well-known topics as possible. For example, some knowledge of COOP attacks allowed me to come up with the technique for bytecode reuse. This is why some statements made throughout this series lack a concrete reference.

Conclusion

In this blog post, we laid some groundwork for understanding bytecode - based exploitation on Android. Most importantly, we discussed the execution environment and how to set up gdb and frida. The next post will be a deep dive into various Android components, especially the fork server architecture and its implications for Android apps!