Fundamentals for Bytecode Exploitation (Part 2)

Exploiting a vulnerability always requires a certain amount of knowledge about the operating system, including how processes are launched, what libraries are used and how control flow “works”. While the latter might be considered a property of the underlying architecture, this is not always the case on Android, because the Android Runtime (ART) provides ways to call bytecode methods and redirect bytecode control flow. Hence, ART dictates how bytecode control flow works, not directly the underlying CPU. Understanding these mechanisms is the minimal requirement for understanding bytecode-based exploitation. Based on that, more sophisticated analysis techniques can be built specifically for Android bytecode, making bytecode-based exploitation feasible.

In this blog post, we first dive into how Android bytecode methods are invoked, which entails a discussion about how Java classes, objects, methods and fields are handled in memory by ART. Then, we take a look at Android’s fork server architecture. Building on this, interesting memory regions common to all Android apps are discussed, which include .dex, .so, .art and .oat files. During all of this, I showcase analysis methods used to practically verify the discussed topics.

Android Runtime

Every Android app contains the libart.so library, which is the ART implementation. This is probably the most vital library for Android apps! Not only that, but from the perspective of an attacker trying to abuse a memory error in a JNI function, libart.so is gadget heaven. Among other things, libart.so is responsible for executing bytecode. Further, it dictates the memory layout of methods, fields, classes and thus objects. We start off with some basics on ART structures and Android bytecode, then discuss how Android bytecode is executed.

ART Structures

Bytecode operates on data representing methods, fields, classes, objects and more. Therefore, before looking into concrete bytecode implementations, these structures must be discussed. From now on, we assume we are looking at an app written in Java. Also, we do not consider inheritance for simplicity.

The most famous structure is probably the (Java) object, which is defined in mirror::Object. Surprisingly, an object consists of only two fields:

class Object {
    HeapReference<Class> klass_;    // 32-bit value
    uint32_t monitor_;              // Lock word (lock state, identity hash code)
}

Overall, this means an object is just a 64-bit value, which seems odd given the fact that in Java we can access all kinds of fields. Conceptually comparable to how Chromium’s JavaScript engine V8 handles dynamic objects using the “static” language C++, Android uses a mirror::Class to define where to find the fields of an object. So, although the C++ representation of an object does not explicitly account for all accessible fields, in memory an object is followed by the concrete values of its fields. The associated Class stored in object->klass_ contains an array of ArtField objects called Class::ifields_. Each ArtField represents a single field by specifying e.g. the field’s type, access flags (like private) and offset relative to the end of the C++ object in memory. Fields stored relative to an object/instance are called instance fields. Static fields are stored relative to the Class.

In other words, an object looks like this in memory:

0x00: object->klass_
0x04: object->monitor_
0x08: object->field0
0x0c: object->field1
...

Usually, fields are 32-bit values, but some primitive types like long and double require 64 bits. Because ArtFields only tell the runtime where to find a field inside an object’s memory and what type that field is, objects are very dynamic in terms of shape. By the way, speaking of V8, the Class is somewhat similar to the shape of an object. Consider the following visualization:

Bytecode Object Memory Layout

Although there can be many fields for a particular class instance, Android enforces a particular field order. Hence, unless e.g. field types change, the field order inside an object is constant and thus predictable.

Regarding method invocations, Android utilizes a structure called ArtMethod. The comments indicate what a field means for a bytecode method:

class ArtMethod {
    uint32_t access_flags_; // A lot of potential flags like "public"
    uint32_t dex_method_index_; // Index into .dex file (context)
    uint16_t method_index_; // Index into vtable (?)
    union {
        uint16_t hotness_count_;    // How often this method is called. Triggers JIT compilation on 0, else > 0.
        uint16_t imt_index_;    // Index into Interface Method Table (IMT)
    };
    struct PtrSizedFields {
        void *data_;    // Reference to code_item in .dex file
        void *entry_point_from_quick_compiled_code_;    // Reference to ExecuteNterpImpl
    } ptr_sized_fields_;
}

Of course, an ArtMethod does not necessarily represent a bytecode method. ART supports many kinds of methods, of which a bytecode method is just one.

There are various ways an ArtMethod can be linked into a Class, because e.g. Java provides many different types of methods and thus method invocations. For example, if a method is “normal”, like

class Me {
    public void run() {
        System.out.println("What am I reading??");
    }
}
...
new Me().run();

then the method run is invoked via the invoke-virtual bytecode. This means that (a pointer to) the ArtMethod is stored inside the embedded vtable of the class Me, which is located right behind the Class object in memory.

Obviously, things are never simple. To avoid method resolution every time a method is invoked, Android uses multiple caching layers, one of which is implemented by the DexCache class. In a nutshell, a DexCache is linked to a single .dex file and tries to prevent the runtime from repeatedly parsing the .dex file to e.g. get a type definition. Every Class instance is linked to a single DexCache instance. When invoking a method on an object, the DexCache associated with the class that declares the calling method is used to look up the ArtMethod before parsing the .dex file (if no lower cache layer contains the method). This means any method is bound to a single DexCache and thus to a single .dex file, which I define as the context of a bytecode method. Each method is restricted to the types, methods etc. defined in its context, i.e. the .dex file of the DexCache associated with the method's declaring class.
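
To make the lookup order more tangible, below is a minimal conceptual sketch in Python. It is not ART's actual implementation or API; the names (DexFile, parse_method_id, resolve_method) are made up purely to illustrate the fast path/slow path idea.

# Conceptual model of method resolution relative to a context (NOT real ART code).
class DexCache:
    def __init__(self, dex_file):
        self.dex_file = dex_file          # the single .dex file this cache belongs to
        self.resolved_methods = {}        # method index -> ArtMethod

    def resolve_method(self, method_idx):
        # Fast path: this method index was resolved before.
        if method_idx in self.resolved_methods:
            return self.resolved_methods[method_idx]
        # Slow path: parse the corresponding method_ids entry in the .dex file (the context).
        # Note that method_idx only has meaning relative to this particular .dex file.
        method = self.dex_file.parse_method_id(method_idx)
        self.resolved_methods[method_idx] = method
        return method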

Notice we are only scratching the surface! In a later post, we will investigate how interface method invocation works in detail. Also, observe that there are a lot of cross-references among the above structures. Objects reference their classes, which in turn reference methods, each pointing back to its declaring class. Fields have relative pointers into objects and are referenced by a class. This is what makes understanding and manipulating such structures highly complicated!

Android Bytecode

Android bytecode (actually called Dalvik bytecode, but whatever) is what is executed by Android’s interpreters. As we focus on Android 13, the main interpreter is called nterp and is mostly written in assembly. There is another interpreter called the switch interpreter, written in C++, but it seems to be rarely used. A single bytecode instruction consists of at least two bytes. Moreover, the number of bytes required for each bytecode instruction is divisible by two. This stems from the fact that bytecode works with so-called code units, where 1 code unit = 2 bytes.

Let's consider an example:

01 12x move vA, vB

Actually, the instruction itself is just move vA, vB: the value 0x01 is the opcode indicating that this instruction is a move, and 12x is the format of the instruction. Here, 12x means op vA, vB. A concrete instance of the above instruction would be

move v0, v1

where the value in virtual register v1 is moved/copied into vreg v0. On Intel x64, this is somewhat similar to mov eax, ebx.

A single vreg can hold a 32-bit value. Further, each vreg is associated with a virtual reference (vref), which also holds 32 bits. However, vrefs are never accessed directly, i.e. there will never be an instruction like move rA, rB. Instead, vrefs are used to handle object references in the background. For example, let r0 be the vref associated with vreg v0. If v0 holds an integer, i.e. a non-object value, then r0 = 0. However, if v0 holds an object, then r0 also holds the same object. Of course, holding an object means holding an object reference, which must be a valid 32-bit pointer. Depending on what the method containing the bytecode requested, more vregs may be accessible, like v1 or v65535. Vreg indices are always unsigned, and their usable range depends not only on how many vregs the method requested, but also on the instruction operating on the vreg.

In memory, the above example move v0, v1 looks like so:

01 10

As stated earlier, 01 is the opcode indicating a move. Furthermore, the format op vA, vB indicates that vreg indices are 4 bits each. Basically, each letter in a vreg description, like A in vA, represents another 4 bits available to encode a vreg index. In this case, vA and vB may range from v0 to v(2**4-1)=v15. When writing Java code, there is no way to manually specify the vreg indices. However, when writing bytecode shellcode, care must be taken not to exceed instruction-enforced ranges. Otherwise, indices may end up different from what the exploit needs, or throw off the decoding of subsequent bytecode instructions.

One peculiarity remains to be discussed: why is v0, v1 translated to 10 in the above example? This stems from the little-endian architecture of the test device. Basically, the format enforces a memory layout like so: 01 BA.
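
To make the encoding tangible, here is a small Python sketch that packs a format 12x instruction exactly as described above. The opcode value 0x01 for move is taken from the example; the helper name encode_12x is, of course, made up.

import struct

def encode_12x(opcode: int, va: int, vb: int) -> bytes:
    """Encode a format 12x instruction (op vA, vB) as a single code unit."""
    assert 0 <= va <= 0xf and 0 <= vb <= 0xf, "12x only allows v0..v15"
    # Code unit value: B|A|op. Stored little-endian, so the opcode byte comes first,
    # followed by the byte holding B in the high nibble and A in the low nibble.
    return struct.pack("<H", (((vb << 4) | va) << 8) | opcode)

# move v0, v1  ->  01 10
print(encode_12x(0x01, 0, 1).hex())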

Let's consider another example: a “normal” method invocation

6e 35c invoke-virtual {vC, vD, vE, vF, vG}, meth@BBBB

Similar to before, vC to vG are given 4 bits each to index the vregs that hold method arguments. Two things are odd though:

  1. Where is A in the above definition?
  2. What is meth@BBBB?

Regarding the first point, A is a 4-bit value that determines the actual number of arguments passed. Otherwise, any method invocation via invoke-virtual would be forced to handle exactly 5 arguments. Secondly, meth@BBBB is a 16-bit unsigned index into the method table of the .dex file of the calling/current method, i.e. the context. To make sure this is clear, consider the following example:

public void run() {
    ...
    invoke-virtual {v0, v1}, meth@1234
    ...
}

Here, the context is the .dex file associated with the ArtMethod representing run! Therefore, if we try to call a method that does not appear inside the context, like Runtime::exec(String command), then an error will be thrown or, worse, a method that happens to have the same method index will be called, most likely causing a crash!

The format of invoke-virtual dictates a memory layout like 6e AG BB BB DC FE. If A < 5, then any superfluous vreg indices are set to 0, although they are ignored anyway. Of course, BBBB is stored in little-endian order.
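
The same can be done for format 35c. The sketch below follows the layout just described; 0x6e is the invoke-virtual opcode from the example, and regs holds the vreg indices of the arguments (the first one usually being the this reference).

import struct

def encode_35c(opcode: int, method_idx: int, regs: list) -> bytes:
    """Encode a format 35c instruction: op {vC..vG}, meth@BBBB (3 code units)."""
    assert len(regs) <= 5 and all(0 <= r <= 0xf for r in regs)
    assert 0 <= method_idx <= 0xffff
    a = len(regs)                                # A: actual argument count
    c, d, e, f, g = (regs + [0] * 5)[:5]         # unused slots are encoded as 0
    unit0 = (a << 12) | (g << 8) | opcode        # A|G|op
    unit1 = method_idx                           # BBBB
    unit2 = (f << 12) | (e << 8) | (d << 4) | c  # F|E|D|C
    return struct.pack("<3H", unit0, unit1, unit2)

# invoke-virtual {v0, v1}, meth@0x1234  ->  6e 20 34 12 10 00
print(encode_35c(0x6e, 0x1234, [0, 1]).hex())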

Finally, let's consider one last example: populating arrays with data directly from bytecode

26 31t fill-array-data vAA, +BBBBBBBB 

This time, vAA may be any of v0 to v255. Furthermore, +BBBBBBBB is a 32-bit signed branch offset, relative to the address of this instruction, given in code units. Assuming dex_pc is the address of the fill-array-data instruction, the referenced address is computed as dex_pc + 2 * BBBBBBBB. Because the 32-bit offset is signed, the referenced data may be located above or below the instruction. Interestingly, the data structure referenced is a fill-array-data-payload:

Name             Format
ident            0x0300 (u16)
element_width    u16
size             u32
data             [u8]

Note: Java uses two bytes for a single char. So, to create a char[] directly from bytecode, we must ensure that element_width = 2. Bytecode is full of such caveats!
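
Putting the format and the payload together, the following Python sketch assembles a fill-array-data instruction (opcode 0x26 from above) and a matching fill-array-data-payload for a char[]. The payload offset is given in code units, and the helper names are made up for illustration.

import struct

def encode_fill_array_data(vaa: int, offset_code_units: int) -> bytes:
    """Encode fill-array-data vAA, +BBBBBBBB (format 31t, 3 code units)."""
    assert 0 <= vaa <= 0xff
    return struct.pack("<HI", (vaa << 8) | 0x26, offset_code_units & 0xffffffff)

def encode_char_array_payload(text: str) -> bytes:
    """Build a fill-array-data-payload for a Java char[] (element_width = 2)."""
    data = text.encode("utf-16-le")                        # Java chars are 2 bytes each
    payload = struct.pack("<HHI", 0x0300, 2, len(data) // 2) + data
    # Keep the total length a multiple of a code unit so following code stays aligned.
    return payload + (b"\x00" if len(payload) % 2 else b"")

# fill-array-data v0, +8 (payload placed 8 code units behind the instruction)
print(encode_fill_array_data(0, 8).hex(), encode_char_array_payload("hi").hex())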

Android Bytecode Execution

Execution of bytecode requires an interpreter, in this case nterp. Given an ArtMethod, the interpreter starts with ExecuteNterpImpl. It sets up the execution environment, which includes

  1. Spilling registers on the stack. Bytecode execution uses fixed hardware registers, just like native code.
  2. Allocating memory for vregs and vrefs. These are initialized to 0.
  3. Setting up arguments to be passed to the invoked method.

Below are the most important hardware registers and their descriptions:

Register    Description
x19         Thread pointer
x29         Interpreted frame pointer
x25         Base of vrefs
x22         Interpreted program counter (dex_pc)

Usually, native code execution involves managing stack frames, which seems daunting considering the complexity of ART. Luckily, the documentation of nterp, despite being spread over various files, is pretty good. Hence, the nterp stack layout is fully documented:

/*    ----------------
 *    |              |      All callee save registers of the platform
 *    | callee-save  |      (core and floating point).
 *    | registers    |      On x86 and x64 this includes the return address,
 *    |              |      already spilled on entry.
 *    ----------------
 *    |   x86 args   |      x86 only: registers used for argument passing.
 *    ----------------
 *    |  alignment   |      Stack aligment of kStackAlignment.
 *    ----------------
 *    |              |      Contains `registers_size` entries (of size 4) from
 *    |    dex       |      the code item information of the method.
 *    |  registers   |
 *    |              |
 *    ----------------
 *    |              |      A copy of the dex registers above, but only
 *    |  reference   |      containing references, used for GC.
 *    |  registers   |
 *    |              |
 *    ----------------
 *    |  caller fp   |      Frame pointer of caller. Stored below the reference
 *    ----------------      registers array for easy access from nterp when returning.
 *    |  dex_pc_ptr  |      Pointer to the dex instruction being executed.
 *    ----------------      Stored whenever nterp goes into the runtime.
 *    |  alignment   |      Pointer aligment for dex_pc_ptr and caller_fp.
 *    ----------------
 *    |              |      In case nterp calls compiled code, we reserve space
 *    |     out      |      for out registers. This space will be used for
 *    |   registers  |      arguments passed on stack.
 *    |              |
 *    ----------------
 *    |  ArtMethod*  |      The method being currently executed.
 *    ----------------  <----- STACK POINTER: qword [SP] = ArtMethod*
 */

Hence, vrefs and vregs are adjacent arrays of 32-bit values. Furthermore, way up the stack are the callee-save registers, which will include the dex_pc of the calling function if the calling function is also a bytecode method, i.e. there is a new kind of return address. Do you see where this is going? >:)

Bytecode Oddities

With the basics out of the way, let's quickly go over some weird things in bytecode. It seems like Android’s interpreter nterp is making the following assumption:

Bytecode is correct.

This is well-founded under another assumption, namely that bytecode can only originate from e.g. Java or Kotlin. However, in a setting where an attacker controls e.g. a stack-buffer overflow, so that the stored bytecode return address is overwritten to point to attacker-controlled memory, this assumption does not hold anymore. Overall, the assumption chain may look like (Bytecode can only originate from Java/Kotlin => Bytecode is correct) => No need for checks.

Consider the implementation of a previous sample bytecode instruction, move vA, vB:

%def op_move(is_object="0"):
    /* for move, move-object, long-to-int */
    /* op vA, vB */
    lsr     w1, wINST, #12              // x1<- B from 15:12
    ubfx    w0, wINST, #8, #4           // x0<- A from 11:8
    FETCH_ADVANCE_INST 1                // advance rPC, load wINST
    GET_VREG w2, w1                     // x2<- fp[B]
    GET_INST_OPCODE ip                  // ip<- opcode from wINST
    .if $is_object
    SET_VREG_OBJECT w2, w0              // fp[A]<- x2
    .else
    SET_VREG w2, w0                     // fp[A]<- x2
    .endif
    GOTO_OPCODE ip                      // execute next instruction

With a basic understanding of assembly, we cannot seem to spot any bounds checks on vreg indices… Surely they are located in macros like GET_VREG and SET_VREG(_OBJECT)!

.macro GET_VREG reg, vreg
    ldr     \reg, [xFP, \vreg, uxtw #2]
.endm
.macro GET_VREG_OBJECT reg, vreg
    ldr     \reg, [xREFS, \vreg, uxtw #2]
.endm
.macro SET_VREG reg, vreg
    str     \reg, [xFP, \vreg, uxtw #2]
    str     wzr, [xREFS, \vreg, uxtw #2]
.endm
.macro SET_VREG_OBJECT reg, vreg
    str     \reg, [xFP, \vreg, uxtw #2]
    str     \reg, [xREFS, \vreg, uxtw #2]
.endm

There are no checks! Looking back at the nterp stack frame allows estimating the impact: (almost) arbitrary stack index out-of-bounds access (i.e. read and write). Interestingly, all bytecode implementations I reviewed lack such index bounds checks.

What is more, if we claimed that e.g. v0 contains an object and called move-object v42, v0, then nterp would happily copy it over. Notice that move_object calls op_move(is_object="1"). For a concrete example of how to interpret an arbitrary value as an object reference, consider the case where the method is allocated enough space to hold four vregs, i.e. v0 to v3 are valid. Knowing that r0 to r3 precede v0 in memory, this also implies that r4 = v0, i.e. r4 and v0 overlap. Hence, whatever is written into v0 will be interpreted as an object reference whenever an instruction uses v4 as an object operand. Another, similar approach can be seen below:

const-wide/32 v0, <fake object address>
move-object v0, v0

This works because op_move v0, v0 moves the value in vreg v0 into both v0 and r0. I.e. when calling move-object vA, vB, it does not matter whether vB actually contains an object. After the move, whatever was in vB (not rB) will be interpreted as an object in vA. This is kind of like a fakeobj primitive in browser exploitation, i.e. a type confusion.
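
For reference, the two-instruction snippet above can also be assembled by hand. The sketch below assumes the standard Dalvik opcode assignments const-wide/32 = 0x17 (format 31i) and move-object = 0x07 (format 12x); the helper name is made up.

import struct

def fakeobj_bytecode(vreg: int, fake_object_address: int) -> bytes:
    """Assemble: const-wide/32 vA, #addr ; move-object vA, vA."""
    assert 0 <= vreg <= 0xf and 0 <= fake_object_address <= 0xffffffff
    const_wide_32 = struct.pack("<HI", (vreg << 8) | 0x17, fake_object_address)  # 31i
    move_object = struct.pack("<H", (((vreg << 4) | vreg) << 8) | 0x07)          # 12x
    return const_wide_32 + move_object

# const-wide/32 v0, 0x12345678 ; move-object v0, v0
print(fakeobj_bytecode(0, 0x12345678).hex())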

In a nutshell, Android’s interpreter nterp blindly trusts that the bytecode to execute is correct and does not try to “harm” the execution environment. This is what makes bytecode injection a nice intermediate stage to eventually run a JITROP.

Fork Server Architecture on Android

A fork server architecture refers to a multi-process mechanism that creates new processes from a base process using the fork syscall, often as a reaction to a certain event. Most importantly, whatever process invokes fork is duplicated, and control flow for both the parent (old) and the child (new) process continues right after the fork call. Usually, a C program calling fork looks like so:

pid_t pid = fork();
if (pid < 0) {
    // error
} else if (pid == 0) {
    // child process
} else {
    // parent process
}

Assuming the child process does not run execve or similar, child and parent share the same memory layout. This means e.g. that the base addresses of common libraries like libc.so.6 are identical in child and parent processes! Given a vulnerability in the parent process and an attacker who already controls the child process, information leaks are often not required anymore for successful exploitation. To take this even further, not just the layout is identical, but also the majority of the contents.

While duplicating processes seems like a bad idea from a security perspective, especially when memory-unsafe languages are involved, the fork server architecture can function as a form of optimization. Specifically, on mobile devices with significantly less computational power and memory than your average PC, the operating system can pre-initialize common parts of all apps in a particular base process. Then, when a user starts an app, the base process is forked and app-specific behaviour is loaded on top. There is no need to repeat the same initialization phase for each app.

Of course, Android does exactly that: it uses a base process called zygote64 to set up, among other things, the Java runtime. Also, common shared libraries are loaded, garbage collection is set up, some common objects are created and much more. Although this increases app performance, especially during startup, it also implies that every app knows the majority of the memory layout of all other apps. Notice that the process used for forking, i.e. zygote64, runs as root. Therefore, all “normal” apps know a lot about the memory layout and contents of a root process!

Now, let's figure out what and how many memory regions are actually shared!

Maps Diffing

A very simple heuristic to figure out the memory regions common to a set of running processes is to use the /proc/<pid>/maps file. Usually, on an operating system not using a fork server architecture, we would expect to find no matching memory regions in terms of name, virtual address, size and permissions. However, knowing that apps are forked from zygote64, we can simply take the intersection of the sets of memory regions of all processes in question to get an approximation of the set of duplicated memory regions.

To get this job done, we can reuse an existing maps parser. The most crucial code can be seen below:

MAPS_LINE_RE = re.compile(r"""
    (?P<addr_start>[0-9a-f]+)-(?P<addr_end>[0-9a-f]+)\s+  # Address
    (?P<perms>\S+)\s+                                     # Permissions
    (?P<offset>[0-9a-f]+)\s+                              # Map offset
    (?P<dev>\S+)\s+                                       # Device node
    (?P<inode>\d+)\s+                                     # Inode
    (?P<pathname>.*)\s+                                   # Pathname
""", re.VERBOSE)

@dataclass()
class Record:
    addr_start: int
    addr_end: int
    perms: str
    offset: int
    dev: str
    inode: int
    pathname: str
    ...

def parse(lines: List[str]) -> List[Record]:
    """Parses maps records from the list of all lines in a maps file.
    """
    return [
        Record(*m.groups())
        for line in lines
        if (m := MAPS_LINE_RE.match(line)) is not None
    ]

def diff_regions(lhs: List[Record], rhs: List[Record]) -> List[Record]:
    """Performs lhs = lhs <intersect> rhs in terms of set intersection.
    """
    return [
        lhs[lhs.index(r)]
        for r in rhs
        if r in lhs
    ]
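
Not shown above is the glue code that produces a list of memory regions per process and intersects them. A minimal driver might look like the sketch below; it reuses parse, diff_regions and Record from the snippet above, the PIDs are placeholders, and reading another process's maps file on a real device typically requires a root shell.

from functools import reduce

def read_maps(pid: int) -> List[Record]:
    """Read and parse /proc/<pid>/maps of a running process."""
    with open(f"/proc/{pid}/maps") as f:
        return parse(f.readlines())

# Intersect the regions of zygote64 and two (hypothetical) app processes.
pids = [752, 12345, 23456]
common = reduce(diff_regions, (read_maps(pid) for pid in pids))
print(f"Total: {len(common)}")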

Then, with a list of memory regions for each target app, taking the intersection gives the approximate set of duplicated memory regions. Of course, we could simply use zygote64 and any app to get the set of memory regions coming directly from the base process. For a sample comparison on the test device, consider the following output:

(host)$ python3 ./maps_differ.py <pid of zygote64> <pid of com.google.android.youtube>
...
[1754]: 0x78f19d7000 - 0x78f19d9000 (offset: 0x0): [anon:.bss] (rw-p)
[1755]: 0x7fdab74000 - 0x7fdab75000 (offset: 0x0):  (---p)
[1756]: 0x7fdab75000 - 0x7fdb374000 (offset: 0x0): [stack] (rw-p)
Total: 1757

Despite looking like a big number, without a reference we cannot make any observations on that number alone. So, consider the number of entries in the /proc/$(pidof zygote64)/maps:

(device)# wc -l /proc/$(pidof zygote64)/maps
1784 /proc/752/maps

This means that 1757 out of 1784 maps entries may have been duplicated during the fork, which is roughly 98.5%. Notice that the child process may have unmapped or remapped some regions of its parent. The above intersection is done on all attributes of the maps entries, meaning that remapping a memory region already causes our analysis to discard that region. However, in this case only memory regions that have not changed are relevant for exploitation! To stabilize this approach, one can repeatedly restart an app and compute the set of common memory regions over multiple app restarts.

Note: Doing this maps diffing over device reboots reveals two duplicated memory regions, namely [anon:dalvik-main space (region space)] (rw-p) and [anon:dalvik-Sentinel fault page] (---p). However, rebooting the device should also re-randomize the layout of zygote64! Therefore, these two memory regions must be deterministic! While the sentinel fault page does not have any permissions set (maybe look into segfault handlers for special logic for that region), the first region is readable and writable. Hence, even a remote attacker can predict the location of a readable and writable memory region without a concrete information leak, other than knowing the target OS version! However, notice that this region is garbage collected and is where (Java) objects are stored, so writing to it without understanding the GC will probably cause various crashes and be very unstable.

Observe that getting an idea of what memory regions are duplicated across all apps allows making reasonable assumptions about memory region positions in a local attacker scenario. If an attacker controls an unprivileged app, then knowing that e.g. libart.so is duplicated allows the attacker to assume that the location of libart.so in their own process is identical to the location in a victim app. Also, the only reason I spotted the deterministic memory regions was maps diffing. I never expected to find deterministic memory regions on modern devices and thus did not bother searching.

Memory Regions

With a set of duplicated memory regions at hand, we can now proceed with analysing what is actually shared. To save some time, the following entries are of interest:

[0001]: 0x6f3be000 - 0x6f64c000 (offset: 0x0): [anon:dalvik-/system/framework/boot.art] (rw-p)                               
[0008]: 0x6fa71000 - 0x6fa72000 (offset: 0x0): [anon:.bss] (rw-p)                                                            
[0425]: 0x763343d000 - 0x7633dbd000 (offset: 0x1a18000): /system/framework/framework.jar (r--p)
[0431]: 0x7635e00000 - 0x7635f54000 (offset: 0x0): /apex/com.android.art/lib64/libart.so (r--p)

There are plenty of resources discussing common file types like .so and .jar. However, .art is a bit trickier. Most importantly, .art files contain heap dumps of C++ objects. Because (Java) objects are based on C++ objects on Android, this means .art files may also contain (Java) objects (which they do, by the way).

So, what about the [anon:.bss] regions? To this day, I have no clue what they are used for, but they proved to have the following important properties:

  1. Some [anon:.bss] regions are in 32-bit memory.
  2. Readable and writable permissions.
  3. At least 0x1000 bytes in size.

The first property is very important for object references in e.g. Java, because these are restricted to 32-bit pointers.

Further, the framework.jar file seems to contain the entire Java part of the Android framework and probably parts of the standard library, like Runtime. This is important, because if an attacker is able to redirect bytecode control flow to a chosen location, framework.jar will be gadget heaven. Hence, we will take a quick look at the internals of framework.jar.

Framework JAR File Components

It turns out that framework.jar is just a zip archive. Unzipping it reveals the following (filtered) contents:

(host)$ unzip framework.jar
(host)$ ls
android  classes.dex  classes2.dex  classes3.dex  classes4.dex  com  META-INF  res

Everything except for the .dex files is a directory, probably describing the structure of the .jar file and what data is provided, i.e. metadata. The most interesting components are the .dex files, which contain the actual code provided by the .jar file, along with types, strings and more. .dex files are also part of .apk files, i.e. apps, and contain the application-specific and some framework code. Again, whatever .dex file is the current context, we will be restricted to the resources provided by that .dex file, if the goal is to stay at the bytecode level.

To analyse .dex files, a plethora of tools is available online. Mainly, we are interested in indices into certain tables, like the type and method tables. To that end, one can simply use dexlib2 or Topper.

Although I developed Topper for a different, now obsolete reason, its core engine still allows extracting the information needed. Basically, what can be done is the following:

> file --file <path to .dex file> --type DEX
classes.dex> list methods --regex "UiAutomation::executeShellCommand"
[Offset = 0x0]: classes.dex
  [Index = 0x39fc, Offset = 0x25e0f0, Num Regs = 0x10]: private android/os/ParcelFileDescriptor[] android/app/UiAutomation::executeShellCommandInternal(java/lang/String command, boolean includeStderr)
  [Index = 0x39fb, Offset = 0x25e05c, Num Regs = 0x8]: public android/os/ParcelFileDescriptor android/app/UiAutomation::executeShellCommand(java/lang/String command)
  [Index = 0x39fd, Offset = 0x25e1f0, Num Regs = 0x3]: public android/os/ParcelFileDescriptor[] android/app/UiAutomation::executeShellCommandRw(java/lang/String command)
  [Index = 0x39fe, Offset = 0x25e20c, Num Regs = 0x3]: public android/os/ParcelFileDescriptor[] android/app/UiAutomation::executeShellCommandRwe(java/lang/String command)
  [Index = 0x39fb]: android/os/ParcelFileDescriptor android/app/UiAutomation::executeShellCommand(Ljava/lang/String;)
  [Index = 0x39fc]: android/os/ParcelFileDescriptor[] android/app/UiAutomation::executeShellCommandInternal(Ljava/lang/String;, Z)
  [Index = 0x39fd]: android/os/ParcelFileDescriptor[] android/app/UiAutomation::executeShellCommandRw(Ljava/lang/String;)
  [Index = 0x39fe]: android/os/ParcelFileDescriptor[] android/app/UiAutomation::executeShellCommandRwe(Ljava/lang/String;)

Notice that loading classes.dex from framework.jar may take some time, because Topper tries to “decompile” all methods into their smali representations, all on a single thread (it was a research project, after all)! However, observe that methods are prefixed with information about their index, offset and number of used virtual registers. Some information is taken from the so-called code_item. Apparently, the .dex file contains duplicate definitions of methods, or, more likely, an abstract description and a concrete definition. When looking at methods, only consider the entries that have an offset and the number of registers set. With the above, it is apparent that framework.jar contains code that enables execution of shell commands.

Similar to method indices, type indices can be extracted like so:

classes.dex> list types --regex "android/app/UiAutomation"
[Offset = 0x0]: classes.dex
  [Index = 0x599]: android/app/UiAutomation$AccessibilityEventFilter
  [Index = 0x59a]: android/app/UiAutomation$ConnectionState
  [Index = 0x59b]: android/app/UiAutomation$IAccessibilityServiceClientImpl$1$$ExternalSyntheticLambda0
  [Index = 0x59c]: android/app/UiAutomation$IAccessibilityServiceClientImpl$1
  [Index = 0x59d]: android/app/UiAutomation$IAccessibilityServiceClientImpl
  [Index = 0x59e]: android/app/UiAutomation$OnAccessibilityEventListener
  [Index = 0x59f]: android/app/UiAutomation
  ...

So, one might ask why we use a tool to extract indices. In a practical setting, an attacker controlling an unprivileged app most likely does not have access to external tools like Topper. There are two perspectives to consider. First, these indices can be considered part of the manual component of an exploit, similar to what gadget offsets are in a simple, classical ROP chain. The other perspective is more similar to the idea of JITROP, where the app dynamically calculates the needed indices. Luckily, the latter is possible, because e.g. framework.jar is duplicated into every app process. If, however, an application-specific .dex file is required to make an exploit work, then only the first perspective seems feasible, unless an attacker can get a hold of the target .apk file.
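
To illustrate the second perspective, the following rough Python sketch computes a method index from a raw .dex file by walking its method_ids table. The header offsets follow the published .dex format; MUTF-8 subtleties and error handling are ignored, and the function names are made up.

import struct

def read_string(dex: bytes, string_idx: int) -> bytes:
    """Resolve a string_ids entry to its raw string data."""
    string_ids_off = struct.unpack_from("<I", dex, 0x3c)[0]
    data_off = struct.unpack_from("<I", dex, string_ids_off + 4 * string_idx)[0]
    # Skip the uleb128-encoded utf16 length preceding the string data.
    while dex[data_off] & 0x80:
        data_off += 1
    data_off += 1
    return dex[data_off:dex.index(b"\x00", data_off)]

def find_method_index(dex: bytes, class_descriptor: bytes, method_name: bytes) -> int:
    """Scan the method_ids table for a method with the given declaring class and name."""
    type_ids_off = struct.unpack_from("<I", dex, 0x44)[0]
    method_ids_size, method_ids_off = struct.unpack_from("<II", dex, 0x58)
    for idx in range(method_ids_size):
        class_idx, _proto_idx, name_idx = struct.unpack_from(
            "<HHI", dex, method_ids_off + 8 * idx)
        descriptor_idx = struct.unpack_from("<I", dex, type_ids_off + 4 * class_idx)[0]
        if (read_string(dex, name_idx) == method_name
                and read_string(dex, descriptor_idx) == class_descriptor):
            return idx
    return -1

with open("classes.dex", "rb") as f:
    dex = f.read()
print(hex(find_method_index(dex, b"Landroid/app/UiAutomation;", b"executeShellCommand")))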

Alternative: dexdump

Another tool that may reveal the required information is dexdump. It seems to be available on Android devices by default, i.e. it can be run via

(device)$ dexdump ./classes.dex | head
Processing './classes.dex'...
Opened './classes.dex', DEX version '039'
Class #0            -
  Class descriptor  : 'Landroid/Manifest$permission;'
  Access flags      : 0x0011 (PUBLIC FINAL)
  Superclass        : 'Ljava/lang/Object;'
  Interfaces        -
  Static fields     -
    #0              : (in Landroid/Manifest$permission;)
      name          : 'ACCEPT_HANDOVER'
...

Although no guarantees are made that dexdump provides the information required for exploitation, it may be easier to utilize than Topper!

Region Diffing

Instead of comparing the memory layout of two apps over multiple app restarts or even device reboots, we can also inspect how the actual data changes, in fancy terms: region diffing. Again, we use a naive heuristic that simply compares two instances of the same memory region, each coming from a different app process (potentially the same app, just restarted), byte by byte. Because many memory regions are large, comparing two or more memory regions is time consuming. Hence, this post only considers a special region: [anon:dalvik-/system/framework/boot.art] (rw-p).
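
The comparison step itself is simple. A minimal sketch of the byte-by-byte diffing that yields the matching “blobs” shown later could look like this, where dump_a and dump_b are assumed to be raw dumps of the same region taken from two process instances:

def matching_blobs(dump_a: bytes, dump_b: bytes, region_base: int) -> list:
    """Return (start, end) address pairs of byte ranges identical in both dumps."""
    blobs, start = [], None
    length = min(len(dump_a), len(dump_b))
    for i in range(length):
        if dump_a[i] == dump_b[i]:
            if start is None:
                start = i
        elif start is not None:
            blobs.append((region_base + start, region_base + i))
            start = None
    if start is not None:
        blobs.append((region_base + start, region_base + length))
    return blobs

# Example: compare two dumps of boot.art taken from two app starts.
# for lo, hi in matching_blobs(dump_a, dump_b, 0x6fe57000):
#     print(f"{lo:#x} - {hi:#x} (size: {hi - lo:#x})")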

There are several ways to hijack bytecode control flow on Android. One way revolves around replacing an existing, valid object with a fake object. boot.art contains heap dumps consisting of e.g. objects and turned out to be quite reliable when it comes to hijacking control flow. It is important to note that these dumped objects are created before the app process exists, so their layouts and locations are most likely predictable. With the deterministic dalvik-main space, an attacker could try to replace an object with a fake one, but the GC often moves objects within that region, so that does not seem like a consistent, stable approach. Thus, boot.art seems like the best bet for this task.

To perform region diffing, we can use Frida. Similar to maps diffing, we start an app, pull its boot.art from memory and repeat. Before that, we can also ensure that boot.art is used at all. One way to do that is to simply change the region’s permissions from rw- to ---, which should trigger a crash:

gef➤  vmmap boot.art
0x0000006fe57000 0x000000700e5000 0x00000000000000 rw- [anon:dalvik-/system/framework/boot.art]
gef➤  mprotect 0x0000006fe57000 0
gef➤  continue
Thread 15 "Profile Saver" received signal SIGSEGV, Segmentation fault.

Using a simple tool I wrote, we can obtain the following, interesting matching regions regarding boot.art:

$ python3 maps_differ.py
(Cmd) ...
(Cmd) search maps --regex ".*boot.art"
0x6fe57000 - 0x700e5000 (offset: 0x0): [anon:dalvik-/system/framework/boot.art] (rw-p)
...
(Cmd) diff region --count 4 --region-base 0x6fe57000 --names com.poc.poc_local
...
Found 4010 matching blob(s).
(Cmd) search blobs
0x6fe57000 - 0x6fe57e08 (size: 0xe08)
0x6fe57e09 - 0x6fe59bbc (size: 0x1db3)
0x6fe5700d - 0x6fe57012 (size: 0x5)
0x6fe57024 - 0x6fe57028 (size: 0x4)
0x6fe57034 - 0x6fe5703a (size: 0x6)
...
0x6fff9250 - 0x7003c656 (size: 0x43406)

From the above results, we can see that the first 0xe08 bytes of the region remain constant over multiple runs. Furthermore, a single byte at address <boot.art> + 0xe08 seems to change over multiple runs. After the single byte, another 0x1db3 bytes are constant. Also, towards the end of boot.art there seems to be a very large constant region. Therefore, and because boot.art contains dumped objects, boot.art is a suitable candidate for semantic analysis. Note that up to this point, we only checked whether resources change over app restarts via byte-by-byte comparison. By the way, some changing resources are also interesting. For example, an object located in boot.art may change because the GC moved a referenced object to another location.

Monitoring Memory Accesses to boot.art

The goal is to figure out what resources in boot.art are accessed by an app. Knowing that Android apps adhere to the application lifecycle, we are especially interested in resources that are accessed during inevitable events like onStop and onDestroy!

Again, Frida helps monitor memory accesses in the following way:

def region_to_js(region):
    return json.dumps({
        'addr_start': region.addr_start,
        'addr_end': region.addr_end
    })

def regions_to_js_array(regions):
    ar = '['
    for i, region in enumerate(regions):
        ar += region_to_js(region)
        if i < len(regions) - 1:
            ar += ','
    return ar + ']'
...
script = session.create_script(f'''
    const regions = {regions_to_js_array(regions)};
    console.log(JSON.stringify(regions, null, 2));

    for (let region of regions) {{
        const frida_region = {{base: ptr(region.addr_start), size: region.addr_end - region.addr_start}};
        MemoryAccessMonitor.enable(frida_region, {{
            onAccess: function(details) {{
                console.log('Accessed ' + details['address'] + ' from: ' + details['from'].toString(16));
            }}
        }});
    }}
''')

What the above snippet does is monitor memory accesses to all regions specified in the regions list. From previous analyses, it is known that only one region is of interest, namely boot.art. With the above script and some user interaction to trigger onStop and onDestroy manually, it is possible to identify objects that are only accessed during these late lifecycle events. In my case, I found boot.art + 0x215f0, which turned out to be GRANDFATHERED. One approach to figure out the identity of an object in memory is to set an object’s klass_ pointer to null and observe the stack trace in the crash dump via logcat. GRANDFATHERED is defined in LanguageTag like so:

class LanguageTag {
    ...
    private static final Map<String, String[]> GRANDFATHERED = new HashMap<>();
    ...
    public static LanguageTag parse(String languageTag, ParseStatus sts) {
        ...
        String[] gfmap = GRANDFATHERED.get(LocaleUtils.toLowerString(languageTag));
        ...
    }
}

Apparently, LanguageTag.parse is invoked somewhere inside a lifecycle method, most likely onStop, because the app does not need to be terminated to trigger this method invocation. To generalize this approach, if we are able to trigger an observable event when the object of interest is used, then we can provoke behaviour that specifically triggers execution of lifecycle methods. For example, onDestroy is expected to only be called on app termination. Thus, if the object is used when the app is moved to the background, then it may be that the object is used in onStop. Of course, we would have to come up with a sophisticated set of experiments to precisely determine the lifecycle method that uses an object of interest!

Unfortunately, this story must be put on hold until the post on bytecode reuse attacks, as we only want to cover the basics in this post ;)

Summary

In this blog post, we peeked into various Android fundamentals, always driven by the goal of bytecode-based exploitation. First, bytecode basics are discussed, which lay a foundation for a follow-up discussion on weird behaviour inside bytecode. Then, a more practical view of Android’s fork server architecture is given. All major memory analysis steps used throughout this blog series are discussed.

Next up is the first bytecode-based exploitation technique: bytecode injection!