Bytecode Reuse Attack (Part 4)

As the last blog post on bytecode-based exploitation on Android, this post discusses the step that follows bytecode injection: bytecode reuse.

To understand why an attacker needs bytecode reuse although bytecode injection already exists, remember the arms race in (binary) exploitation. In a nutshell, a new exploitation technique triggers a reaction in the form of at least one security mechanism that (partially) mitigates the new technique. If only bytecode injection were researched, the obvious response would be a new security mechanism that prevents nterp from executing arbitrary data. In other words, nterp would be restricted to executing legitimate bytecode. To be honest, every developer would respond with such a fix, myself included! However, bytecode injection is not the full potential of bytecode-based exploitation.

Therefore, the core idea is to provide enough information on bytecode-based exploitation to be able to understand its implications on security and maybe design fitting mitigations. In terms of the visualization below, instead of filling in the left mountain one by one, providing more research results on bytecode-based exploitation may enable the creation of a batch of security mechanisms. Notice that the illustration below shows a kind of security deception: the security level of an app in the presence of a memory error is the minimum of the security levels of all interpreters. As nterp is not protected at all except by side effects of e.g. ASLR, the presence of strong native-level security mechanisms may give a false sense of security.

Figure: Android Security Gap

Later on, after the bytecode reuse attack is somewhat understood, a few mitigation attempts are discussed. However, practical mitigations are yet to be found!

Before we delve into bytecode reuse attacks on Android, just a heads up:

Disclaimer: Bytecode reuse is the most complicated exploitation technique in this series of blog posts. To derive it, we draw from various fields in offensive security. Hence, this post is very technical and one of the harder posts to digest.

Assumptions

For simplicity, we assume an attacker is able to trick a victim into installing and running an unprivileged app. That is, in this blog post, next to bytecode reuse, the potential security impact of Android’s fork server architecture is investigated from the perspective of a local attacker. While a successful remote attacker has a much greater impact than a successful local attacker, according to other research the installation of some arbitrary, potentially malicious app is not an unrealistic assumption!

Again, for simplicity, the local attacker is represented by a “simple” Python script that emulates app interactions via socket communication. Therefore, when working with fork server-related information leaks, instead of writing an app that parses its own process image, the Python script is simply given addresses taken from gdb. This requires caution not to use any app-specific addresses. To ensure that only fork server-related leaks are used by the script, the attack must be successful over multiple app restarts!

For the same reasons discussed for bytecode injection, a vulnerable app is created. However, the app itself does not provide any information leaks, but only the ability to repeatedly invoke a write-what-where (WWW) condition. That is, an attacker is able to directly specify a value and an address to write the value to. The goal then is to derive a generic exploitation technique that reuses existing bytecode in the target app. Again, a WWW, while a very strong primitive, is only an example vulnerability to ease testing and demonstrating different attacks. Artificially constructing a complicated vulnerability would not yield any benefit! In fact, it would make the ~1000 LoC PoC for the WWW even longer and harder to read (although a Python guru would most likely be able to squeeze my 1000 LoC into 10).

Below is the vulnerable native function accessible to an attacker.

extern "C"
JNIEXPORT void JNICALL
Java_com_poc_poc_1local_MainActivity_www(
        JNIEnv *env,
        jclass clazz, 
        jlong address,
        jlong value) __attribute__ ((optnone)) {
    *(uint64_t*)address = (uint64_t)value;
}
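To make the primitive concrete, the following is a minimal Python sketch of what a single invocation of www does, run against the script's own memory instead of a target app. The helper name www and the buffer target are chosen for illustration only and are not part of the PoC.

```python
import ctypes
import sys

def www(address: int, value: int) -> None:
    """Write a 64-bit value to an absolute address, mirroring the native www()."""
    ctypes.c_uint64.from_address(address).value = value & 0xFFFFFFFFFFFFFFFF

# Hypothetical stand-in for memory inside the target process.
target = ctypes.create_string_buffer(16)
base = ctypes.addressof(target)

# "What" is the value, "where" is the address: both are attacker-chosen.
www(base + 8, 0x4141414142424242)

written = int.from_bytes(target.raw[8:16], sys.byteorder)
print(hex(written))
```

Repeating such writes is what allows fake objects and classes to be laid out in memory piece by piece.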

Next, as usual, a few basics must be discussed.

Necessary Groundwork

Luckily, the majority of basics is covered in a previous blog post. Therefore, the only mechanism left to understand is invoke-interface. Although we all love reading through tons of source code, I use a numbering scheme of the form [1], [2] etc. to mark points of interest in code. After the source code listings, these markings are summarized and discussed, so there is no need to fully read all source code snippets!

Invoking Interface Methods

The bytecode instruction invoke-interface is used in scenarios where polymorphism makes resolving the concrete method to call complicated. As the name suggests, this bytecode instruction can be utilized to invoke implementations of interface methods.

Because different types can implement the same interface, invoke-interface must use a type-agnostic mechanism to resolve the method to invoke. This motivates a closer look at the implementation of the bytecode instruction itself, to identify how invoke-interface accesses objects and associated classes during method resolution and invocation. Remember that an attacker is able to inject data into the target process, including fake objects and classes.

Generic Analysis

invoke-interface is fully defined by the assembly function invoke_interface. The code of invoke_interface reveals interesting properties about interface method resolution! Before delving into the details, let's build an example:

interface Logger {
	void log(String message);
}

class FileLogger implements Logger {
	@Override
	public void log(String message) {/*...*/}
}

class ConsoleLogger implements Logger {
	@Override
	public void log(String message) {/*...*/}
}

class Test {
	public static void main(String[] args) {
		Logger[] loggers = new Logger[] {
			new FileLogger(), // =: fl
			new ConsoleLogger() // =: cl
		};
		for (Logger logger : loggers) {
			logger.log("Test123");
		}
	}
}

In the following, fl and cl denote their respective instance of FileLogger and ConsoleLogger in the context of the above Java code example.

Caching Interface Method

First of all, invoke-interface starts off with a fast path for interface method resolution, i.e. for finding the ArtMethod* representing the abstract method declared inside an interface. The caching mechanism can be seen below.

%def fetch_from_thread_cache(dest_reg, miss_label):
    add      ip, xSELF, #THREAD_INTERPRETER_CACHE_OFFSET       // cache address
    ubfx     ip2, xPC, #2, #THREAD_INTERPRETER_CACHE_SIZE_LOG2  // entry index
    add      ip, ip, ip2, lsl #4            // entry address within the cache
    ldp      ip, ${dest_reg}, [ip]          // entry key (pc) and value (offset)
    cmp      ip, xPC
    b.ne     ${miss_label}

Notice that the cache uses the current dex program counter xPC to derive the cache set (of size 1). If the entry in that cache set matches the xPC, then the corresponding value is loaded, i.e. the ArtMethod*.
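The lookup above can be modelled as a direct-mapped cache. The following Python sketch mirrors the index computation of the assembly; the value 8 for THREAD_INTERPRETER_CACHE_SIZE_LOG2 is an assumption made for illustration, and the class InterpreterCache is a simplified stand-in, not ART's implementation.

```python
CACHE_SIZE_LOG2 = 8   # assumed value of THREAD_INTERPRETER_CACHE_SIZE_LOG2
ENTRY_SIZE = 16       # one entry is a (key, value) pair of 64-bit words (lsl #4)

def entry_index(dex_pc: int) -> int:
    # ubfx ip2, xPC, #2, #SIZE_LOG2: skip the two lowest bits, keep SIZE_LOG2 bits.
    return (dex_pc >> 2) & ((1 << CACHE_SIZE_LOG2) - 1)

class InterpreterCache:
    def __init__(self):
        self.entries = [(0, 0)] * (1 << CACHE_SIZE_LOG2)

    def get(self, dex_pc: int):
        key, value = self.entries[entry_index(dex_pc)]
        # cmp ip, xPC / b.ne miss_label
        return value if key == dex_pc else None

    def set(self, dex_pc: int, value: int) -> None:
        self.entries[entry_index(dex_pc)] = (dex_pc, value)

cache = InterpreterCache()
pc_a = 0x70001008
pc_b = pc_a + (1 << (CACHE_SIZE_LOG2 + 2))  # maps to the same entry as pc_a

cache.set(pc_a, 0xAAAA)
hit = cache.get(pc_a)        # same dex pc: hit
miss = cache.get(pc_b)       # same entry, different key: miss
cache.set(pc_b, 0xBBBB)      # evicts the entry of pc_a
evicted = cache.get(pc_a)
```

Since the key is the dex program counter only, the cached value cannot depend on the receiving object, which is the observation the next paragraph builds on.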

Therefore, the cached method cannot represent a concrete implementation of the interface method. Consider the above example code. Invoking fl.log initially causes a cache miss, which triggers method resolution via nterp_get_method, i.e. NterpGetMethod. If the concrete implementation FileLogger::log were cached instead of a generic representation of Logger::log, then the second iteration trying to run cl.log would wind up calling FileLogger::log again, because the current dex program counter causes a cache hit and thus triggers the fast path, fully avoiding another method resolution. All of this implies that whatever is returned by NterpGetMethod must represent the abstract method declared in the interface, e.g. in Logger.

Method Resolution Via NterpGetMethod

The first time a specific invoke-interface is executed during execution of an app will always cause a cache miss, unless the cache is initialized with some methods. NterpGetMethod is then used to find the corresponding ArtMethod* or an encoded version of the method index referencing the abstract method relative to a declaring .dex file.

Consider the stripped version of NterpGetMethod:

size_t NterpGetMethod(Thread *self, ArtMethod *caller, uint16_t *dex_pc_ptr)
    REQUIRES_SHARED(Locks::mutator_lock_)
{
	// [1]
    UpdateHotness(caller);
    const Instruction *inst = Instruction::At(dex_pc_ptr);
    InvokeType invoke_type = kStatic;
    uint16_t method_index = 0;
    switch (inst->Opcode())
    {
    // ...
    case Instruction::INVOKE_INTERFACE:
    {
        method_index = inst->VRegB_35c();
        invoke_type = kInterface;
        break;
    }
    // ...
    default:
        LOG(FATAL) << "Unknown instruction " << inst->Opcode();
    }
    ClassLinker *const class_linker = Runtime::Current()->GetClassLinker();
    /**
     * SkipAccessChecks() is a flag in the caller's `access_flags_` field.
     * Apparently access checks are usually done for native, i.e. not 
     * interpreted, code.
     *
     * class_linker->ResolveMethod either finds the ArtMethod* in the DexCache,
     * or performs a manual resolution using the underlying .dex file. ASSUMING
     * THE LATTER, BECAUSE THIS CLEARLY SHOWS WHAT METHOD IS RETURNED.
     */
    // [2]
    ArtMethod *resolved_method = caller->SkipAccessChecks()
        ? class_linker->ResolveMethod<ClassLinker::ResolveMode::kNoChecks>(
                                    self, method_index, caller, invoke_type)
        : class_linker->ResolveMethod<ClassLinker::ResolveMode::kCheckICCEAndIAE>(
                                    self, method_index, caller, invoke_type);
    if (resolved_method == nullptr)
    {
        DCHECK(self->IsExceptionPending());
        return 0;
    }
    if (invoke_type == kSuper)
    { /*...*/
    }
    if (invoke_type == kInterface)
    {
        size_t result = 0u;
        if (resolved_method->GetDeclaringClass()->IsObjectClass())
        {
            /**
             * If declaring class is java.lang.Object:
             */
            // Set the low bit to notify the interpreter it should do a vtable 
            // call.
            DCHECK_LT(resolved_method->GetMethodIndex(), 0x10000);
            result = (resolved_method->GetMethodIndex() << 16) | 1U;
        }
        else
        {
            DCHECK(resolved_method->GetDeclaringClass()->IsInterface());
            DCHECK(!resolved_method->IsCopied());
            if (!resolved_method->IsAbstract())
            {
                /**
                 * If the resolved method is not abstract, i.e. it is a
                 * default method defined in the interface:
                 */
                // Set the second bit to notify the interpreter this is a default
                // method.
                result = reinterpret_cast<size_t>(resolved_method) | 2U;
            }
            else
            {
                /**
                 * If the resolved method is abstract, i.e. the interface only
                 * declares it:
                 */
                 //[3]
                result = reinterpret_cast<size_t>(resolved_method);
            }
        }
        UpdateCache(self, dex_pc_ptr, result);
        return result;
    }
    else if (resolved_method->GetDeclaringClass()->IsStringClass() && !resolved_method->IsStatic() && resolved_method->IsConstructor()){ /*...*/}
    else if (invoke_type == kVirtual){ /*...*/}
    else{ /*...*/}
}

The above code is still complex, so consider the following explanation:

  1. General information is extracted from bytecode. Most importantly, this is where a method invocation is classified as e.g. kInterface, i.e. an invoke-interface. The method_index refers to the method_id_item inside the declaring .dex file.
  2. Based on the method_index, the ClassLinker is utilized to resolve the corresponding ArtMethod*, i.e. the ART representation of a Java method. Interestingly, the calling method caller dictates whether validation checks are performed or not via its caller->access_flags_ field. Invoking bytecode in a “benign way” should not trigger such checks.
  3. After a matching ArtMethod* has been found, its declaring_class is checked. This dictates whether the returned value is a plain ArtMethod* or an encoded variant of either an ArtMethod* or a method_index. Under the assumption that the resolved_method represents an abstract interface method, its declaring_class field should be the declaring interface, e.g. Logger, which is not java.lang.Object, and the method is not a default method. Hence, the ArtMethod* is returned as is, without any encoding.
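The three possible return values for kInterface can be summarized with a small Python sketch. The function names encode and decode and the string labels are hypothetical; only the bit layout is taken from the listing above (low bit for java.lang.Object methods, second bit for default methods, no tag for abstract interface methods).

```python
def encode(kind: str, art_method: int = 0, method_index: int = 0) -> int:
    """Mimic the value NterpGetMethod caches for an invoke-interface."""
    if kind == "object":
        # Declaring class is java.lang.Object: index in the upper bits, low bit set.
        assert method_index < 0x10000
        return (method_index << 16) | 1
    if kind == "default":
        # Non-abstract interface method: pointer with the second bit set.
        return art_method | 2
    # Abstract interface method: plain pointer, no tag bits.
    return art_method

def decode(result: int):
    """Classify a cached value the way the interpreter tests its two low bits."""
    if result & 1:
        return ("vtable", result >> 16)
    if result & 2:
        return ("default", result & ~3)
    return ("imt", result)

ART_METHOD = 0x70001000  # hypothetical, 4-byte aligned ArtMethod* value

as_object = decode(encode("object", method_index=5))
as_default = decode(encode("default", art_method=ART_METHOD))
as_abstract = decode(encode("abstract", art_method=ART_METHOD))
```

The tagging only works because an ArtMethod* is at least 4-byte aligned, so its two low bits are free to carry this information.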

Following class_linker->ResolveMethod gives insights into where ART searches for the ArtMethod*.

ClassLinker-based Method Resolution

Consider the following reduced implementation of class_linker->ResolveMethod:

inline ArtMethod *ClassLinker::ResolveMethod(Thread *self,
                                             uint32_t method_idx,
                                             ArtMethod *referrer,
                                             InvokeType type)
{
    // ...

    /**
     * Below comment implies that there exists an array of ArtMethods pointing
     * to native methods that are resolved at app startup.
     */
    // We do not need the read barrier for getting the DexCache for the initial 
    // resolved method
    // lookup as both from-space and to-space copies point to the same native 
    // resolved methods array.
    // [1]
    ArtMethod *resolved_method = 
	    referrer->GetDexCache<kWithoutReadBarrier>()->GetResolvedMethod(
	        method_idx);

    /**
     * Method resolution using fast path failed, so resolve method manually.
     */
     // [2]
    if (UNLIKELY(resolved_method == nullptr))
    {
        referrer = referrer->GetInterfaceMethodIfProxy(image_pointer_size_);
        ObjPtr<mirror::Class> declaring_class = referrer->GetDeclaringClass();
        StackHandleScope<2> hs(self);
        Handle<mirror::DexCache> h_dex_cache(
	        hs.NewHandle(referrer->GetDexCache()));
        Handle<mirror::ClassLoader> h_class_loader(
	        hs.NewHandle(declaring_class->GetClassLoader()));
        resolved_method = ResolveMethod<kResolveMode>(method_idx,
                                                      h_dex_cache,
                                                      h_class_loader,
                                                      referrer,
                                                      type);
    }
    // ...
    // Note: We cannot check here to see whether we added the method to the 
    // cache. It might be an erroneous class, which results in it being hidden 
    // from us.
    // [3]
    return resolved_method;
}

The above code does the following:

  1. Tries to find the ArtMethod* in an associated DexCache. Notice that upon app creation, Zygote already sets up a DexCache, which means that all apps on the device know the address of at least one DexCache instance. In this case, it is assumed that we get a cache miss to further investigate how method resolution works. In reality, this may already mark the end of method resolution in case of a cache hit.
  2. In case of a cache miss the method is resolved by method_idx, which was taken from the fixed parameter of invoke-interface opcode.
  3. Finally, the ArtMethod* is returned. However, the comment states that resolving a method manually may bypass validation checks. This will not be a problem, if the checks are skipped anyways.

Now comes the most interesting part of the method resolution, namely another ResolveMethod implementation:

inline ArtMethod *ClassLinker::ResolveMethod(uint32_t method_idx,
									 Handle<mirror::DexCache> dex_cache,
									 Handle<mirror::ClassLoader> class_loader,
									 ArtMethod *referrer,
									 InvokeType type)
{
    // Check for hit in the dex cache.
    ArtMethod *resolved = dex_cache->GetResolvedMethod(method_idx);
    bool valid_dex_cache_method = resolved != nullptr; // = false
    if (kResolveMode == ResolveMode::kNoChecks && valid_dex_cache_method)
    { /*...*/}

    /**
     * Uses .dex file to resolve the method_id of the method to be invoked. A 
     * method_id consists of a `class_idx`, `proto_idx` and `name_idx`. For an 
     * interface method, `class_idx` is expected to refer (within .dex file) to 
     * an interface.
     *
     * Interestingly, to fake an interface method invocation, the only thing that
     * needs to be genuine is the method_idx used to identify the method_id.
     * For example, to fake a call to UiAutomation::executeShellCommand, it
     * suffices to know its method_idx (and DexCache). From there, the type is
     * inferred by the method_id data to be UiAutomation. In other words, when
     * creating a fake ArtMethod object, its declaring class must reference a
     * correct DexCache (shared by Zygote) and its method_idx must describe the
     * correct method.
     */
    const DexFile &dex_file = *dex_cache->GetDexFile();
    const dex::MethodId &method_id = dex_file.GetMethodId(method_idx);
    ObjPtr<mirror::Class> klass = nullptr;
    if (valid_dex_cache_method) { /*...*/}
    else
    {
	    // [1]
        // The method was not in the DexCache, resolve the declaring class.
        klass = ResolveType(method_id.class_idx_, dex_cache, class_loader);
        if (klass == nullptr)
        {
            /*...*/
            return nullptr;
        }
    }
    /*...*/
    if (!valid_dex_cache_method)
    {
	    // [2]
        resolved = FindResolvedMethod(
				        klass, dex_cache.Get(), class_loader.Get(), method_idx);
    }
    /*...*/

    // If we found a method, check for incompatible class changes.
    // [3]
    if (LIKELY(resolved != nullptr) &&
        LIKELY(kResolveMode == ResolveMode::kNoChecks ||
               !resolved->CheckIncompatibleClassChange(type)))
    {
        return resolved;
    }
    else
    {
        // If we had a method, or if we can find one with another lookup type,
        // it's an incompatible-class-change error.
        /*...*/
        return nullptr;
    }
}

Again, consider the following explanations:

  1. If the method is not part of the DexCache, the class associated with the method referenced by method_idx will be resolved. This is where .dex is explicitly used to extract the method_id_item.
  2. Then, the ArtMethod* is resolved based on that class and method_idx.
  3. With a resolved method at hand, some mandatory validation checks are performed before the method is returned.

Skipping some intermediate methods, eventually FindInterfaceMethodWithSignature is invoked:

static inline ArtMethod *FindInterfaceMethodWithSignature(ObjPtr<Class> klass,
                                                          std::string_view name,
                                                          const SignatureType &signature,
                                                          PointerSize pointer_size)
    REQUIRES_SHARED(Locks::mutator_lock_)
{
    // If the current class is not an interface, skip the search of its declared 
    // methods; such lookup is used only to distinguish between 
    // IncompatibleClassChangeError and NoSuchMethodError and the caller has 
    // already tried to search methods in the class.
    // [1]
    if (LIKELY(klass->IsInterface()))
    {
        // Search declared methods, both direct and virtual.
        // (This lookup is used also for invoke-static on interface classes.)
        for (ArtMethod &method : klass->GetDeclaredMethodsSlice(pointer_size))
        {
            if (method.GetNameView() == name &&
                method.GetSignature() == signature)
            {
                return &method;
            }
        }
    }
    // TODO: If there is a unique maximally-specific non-abstract superinterface 
    // method, we should return it, otherwise an arbitrary one can be returned.

    /**
     * Check all interfaces specified in iftable of the class. This gives the 
     * ArtMethod of the interface method, not its concrete implementation! For 
     * an invoke-interface opcode, `klass` is currently a class that provides a 
     * concrete implementation.
     * Thus `klass` skips the above `klass->IsInterface()` check and its iftable 
     * is read.
     * THEREFORE, IFTABLE OF A CLASS SPECIFIES IMPLEMENTED INTERFACES.
     */
    // [2]
    ObjPtr<IfTable> iftable = klass->GetIfTable();
    for (int32_t i = 0, iftable_count = iftable->Count(); i < iftable_count; ++i)
    {
        ObjPtr<Class> iface = iftable->GetInterface(i);
        for (ArtMethod &method : iface->GetVirtualMethodsSlice(pointer_size))
        {
            if (method.GetNameView() == name && 
                method.GetSignature() == signature)
            {
                return &method;
            }
        }
    }
    /*...Check super classes and java.lang.Object, or fail and return nullptr*/
    return nullptr;
}

In a nutshell, method resolution boils down to iterating over the klass->iftable_ field and checking all methods of all implemented interfaces for a matching method signature. To that end:

  1. Initially, the klass is still a class with the concrete implementation of the interface method. For example, this could still be FileLogger or ConsoleLogger, but not Logger.
  2. Classes that implement an interface use an interface table that describes where to find the concrete implementations of an implemented interface. This table is enumerated and each interface is checked for whether it declares a method that matches the signature of the method to be invoked by invoke-interface. Whatever method matches (first; although signatures should be unique) is returned.
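The iftable walk can be sketched in Python using the Logger example from above. The classes Method and Klass are simplified, hypothetical stand-ins for ArtMethod and mirror::Class, and the search of super classes and java.lang.Object is left out.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)
class Method:
    name: str
    signature: str

@dataclass(eq=False)
class Klass:
    name: str
    is_interface: bool = False
    declared_methods: list = field(default_factory=list)
    iftable: list = field(default_factory=list)  # implemented interfaces

def find_interface_method(klass: Klass, name: str, signature: str):
    # [1] Only interfaces are searched for declared methods directly.
    if klass.is_interface:
        for method in klass.declared_methods:
            if method.name == name and method.signature == signature:
                return method
    # [2] Otherwise walk the iftable and return the interface's own method.
    for iface in klass.iftable:
        for method in iface.declared_methods:
            if method.name == name and method.signature == signature:
                return method
    return None

logger_log = Method("log", "(Ljava/lang/String;)V")
logger = Klass("Logger", is_interface=True, declared_methods=[logger_log])
file_log = Method("log", "(Ljava/lang/String;)V")
file_logger = Klass("FileLogger", declared_methods=[file_log], iftable=[logger])

resolved = find_interface_method(file_logger, "log", "(Ljava/lang/String;)V")
```

Here resolved is the Logger::log object and not FileLogger::log, which matches the earlier observation that the cached method must be the abstract one.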

This concludes the quick dive into method resolution. Overall, executing invoke-interface tries to find the abstract method declared in the interface based on the method index that is a fixed operand of the invoke-interface opcode. Eventually, whatever is returned from NterpGetMethod is cached, so that future executions that pass the dex program counter of the invoke-interface can take the fast path.

Interface Method Tables

Going back to the implementation of invoke-interface, the following code remains to be understood:

/**
    * At this point, x26 is either
    * - ArtMethod* describing the declared method to be invoked, or
    * - Encoded method index
    */
    // First argument is the 'this' pointer.
    /**
    * w1=index of this-register
    */
    FETCH w1, 2
    .if !$range
    and w1, w1, #0xf
    .endif
    /**
    * w1=this
    */
    GET_VREG w1, w1
    // Note: if w1 is null, this will be handled by our SIGSEGV handler.

    /**
    * w2=Class object of this
    */
    // ============[1]
    ldr w2, [x1, #MIRROR_OBJECT_CLASS_OFFSET]
    // Test the first two bits of the fetched ArtMethod:
    // - If the first bit is set, this is a method on j.l.Object
    // - If the second bit is set, this is a default method.
    /**
    * Implicit assumption that ArtMethod* are 4-byte aligned.
    */
    tst w26, #0x3
    b.ne 3f

    /**
    * Case: x26 is a plain (unencoded) ArtMethod*, i.e. an abstract interface
    * method that is not declared by java.lang.Object.
    * Query w3=imt_index_ from ArtMethod* of the interface method (abstract 
    * method).
    */
    // ============[2]
    ldrh w3, [x26, #ART_METHOD_IMT_INDEX_OFFSET]
2:
    /**
    * Use first entry of embedded vtable, i.e. Interface Method Table pointer
    * with imt_index_ to select the concrete implementation of the interface 
    * method.
    */
    // ============[3]
    ldr x2, [x2, #MIRROR_CLASS_IMT_PTR_OFFSET_64]
    ldr x0, [x2, w3, uxtw #3]

    /**
    * x0 holds concrete implementation, i.e. an ArtMethod*
    */
    .if $range
    b NterpCommonInvokeInterfaceRange
    .else
    // ============[4]
    b NterpCommonInvokeInterface
    .endif

Although all parts are relevant, the following main steps are taken:

  1. w1 is equal to this, i.e. the pointer to the object used for invocation. E.g. in fl.log("Test") that would be fl, not FileLogger, not Logger and also not log! Basically, w1 contains the receiving object. Then w2 contains the pointer to the class of w1, i.e. a mirror::Class*. Furthermore, the two least-significant bits of the resolved ArtMethod* (or encoded method index) are checked. These bits stem from NterpGetMethod, which is assumed to have simply returned an unencoded ArtMethod*. Hence, the branch is not taken.
  2. Next, the resolved_method->imt_index_ is extracted into w3. This selects the concrete implementation of the resolved method.
  3. Then, the actual ImTable* is read from the first entry of the embedded vtable. Using the imt_index_ scaled by the size of an ArtMethod*, i.e. 8 bytes, x0 is set to be the concrete implementation of the resolved abstract method.
  4. Finally, NterpCommonInvokeInterface is used to invoke the concrete implementation.
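The four steps can be replayed on a toy memory model. In the following Python sketch, memory is a dictionary from addresses to values, and all offsets and addresses are placeholders chosen for illustration; they are not the real values of MIRROR_OBJECT_CLASS_OFFSET, MIRROR_CLASS_IMT_PTR_OFFSET_64 or ART_METHOD_IMT_INDEX_OFFSET.

```python
CLASS_OFFSET = 0x0       # placeholder for MIRROR_OBJECT_CLASS_OFFSET
IMT_PTR_OFFSET = 0x80    # placeholder for MIRROR_CLASS_IMT_PTR_OFFSET_64
IMT_INDEX_OFFSET = 0x6   # placeholder for ART_METHOD_IMT_INDEX_OFFSET

def dispatch(mem: dict, this: int, resolved_method: int) -> int:
    klass = mem[this + CLASS_OFFSET]                     # [1] class of receiver
    imt_index = mem[resolved_method + IMT_INDEX_OFFSET]  # [2] imt_index_
    imt = mem[klass + IMT_PTR_OFFSET]                    # [3] ImTable*
    return mem[imt + 8 * imt_index]                      # [3] uxtw #3 scaling

def make_object(mem, obj, klass, imt, imt_index, target):
    """Lay out an object, its class and an ImTable with a single used slot."""
    mem[obj + CLASS_OFFSET] = klass
    mem[klass + IMT_PTR_OFFSET] = imt
    mem[imt + 8 * imt_index] = target

LOGGER_LOG = 0x1000        # abstract Logger::log, as returned by NterpGetMethod
FILE_LOGGER_LOG = 0x2000   # benign concrete implementation
ATTACKER_METHOD = 0x3000   # any existing method the attacker prefers

mem = {LOGGER_LOG + IMT_INDEX_OFFSET: 5}
make_object(mem, 0x10000, 0x20000, 0x30000, 5, FILE_LOGGER_LOG)   # genuine fl
make_object(mem, 0x40000, 0x50000, 0x60000, 5, ATTACKER_METHOD)   # fake object

benign = dispatch(mem, 0x10000, LOGGER_LOG)
hijacked = dispatch(mem, 0x40000, LOGGER_LOG)
```

The resolved abstract method is identical in both calls. Only the receiving object differs, and with it the table that selects the invoked method.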

Observe that the concrete method invocation is basically just a lookup in the ImTable*, which is similar to a vtable in C++ and stored inside the embedded vtable of the Class. A Class is referenced by an object. Therefore, if an attacker controls an object, then the attacker can also reference a fake Class and thus a fake ImTable*. Overall, hijacking an object gives an attacker control over what methods are invoked during invoke-interface!

This of course requires knowledge on some internal values, among which reside:

  • class_idx of the class providing the concrete implementation. It should not be possible to inject a custom class with an invalid class_idx, because type resolution goes through either the DexCache or the associated .dex file. If neither contains the type, an error is raised.
  • Valid iftable that states that a particular interface is implemented by the fake class. Technically, an iftable will not be needed if the fast path is taken or the method_index can be resolved by the DexCache.
  • Valid embedded_vtable, which references an ImTable in its first entry. It is possible to overlap the following vtable entries and the ImTable.

As a rule of thumb:

When creating a fake object and class, try to build as valid structures as feasible.

In other words, there is no need to be fancy with complex overlapping pointers etc., because the probability that some code inside the enormous ART code base validates or tries to work with the fake structures is pretty high (keyword: garbage collector). The exception to this rule of thumb is overlapping the ImTable with the embedded vtable, because it is an easy and almost foolproof way to save some space.

Goal

With the theory out of the way, we again settle for arbitrary command execution in the context of the vulnerable app. For simplicity, the goal is to eventually invoke Runtime.getRuntime().exec("<command>"). However, it is forbidden to inject bytecode into the target process. It is only allowed to inject data, i.e. objects, classes and more. Also, only existing bytecode may be reused. Trivially, using native techniques as an intermediate step to gain bytecode execution is also forbidden.

Core Idea

As mentioned before, bytecode reuse draws from various fields of offensive security. To be precise, we use

  • Counterfeit Object-Oriented Programming (COOP): An exploitation technique for memory errors in C++ programs that is based on fake object injection and vtable pointer manipulation.
  • Insecure Deserialization: A vulnerability that allows an attacker to determine the data to be deserialized by the target app. For this post, the idea of gadget chains is critical.

Without diving into all the rabbit holes I found myself in during research, the overall idea is to identify a good sequence of invoke-interface bytecode instructions. For example, the chain could look like this.objA.funcA(this.objB). It is important to note that the “surrounding” object represented by this is controlled by an attacker. Thus, an attacker also controls objA and objB. If an attacker controls objA, it may be possible to control what function is invoked. This is due to the fact that an attacker can choose the composition of an ImTable of a fake object.

COOP vs. Gadget Chains

For those interested in or familiar with COOP, the original approach using a main loop gadget does not work well with bytecode. The most limiting factor is passing arguments from one gadget to another. Consider the following setup:

interface Observer {
    void invoke(Object data);
}
...
Observer[] observers = ...;
...
for (Observer o : observers) {
    o.invoke(...);
}

Basically, the o.invoke invocation internally uses an invoke-interface bytecode instruction. This means an attacker can inject an array of fake objects that provide their own implementations of Observer::invoke through their fake class definitions. Now, being able to execute an arbitrary list of methods is only useful if the methods either do not need to cooperate or can use a shared object to pass (intermediate) results. Unfortunately, the approach of using spilled hardware registers or the stack to pass data between gadgets is not (easily) applicable to bytecode. Using vregs and vrefs does not work either, because these are cleared when nterp sets up the execution environment for a method. Therefore, at best, there is a global object referenced by the methods invoked via fake objects, or the object passed as parameter is usable in some way. Notice that both approaches drastically restrict the set of available gadgets. Being able to invoke a sequence of methods that is semantically equivalent to Runtime.getRuntime().exec("<command>") seems like a daunting, if not impossible, task.

This is the reason why we abstract away from the structure used in COOP attacks shown above, i.e. the for-loop over a fake object array. Observe that every piece of Java code that uses invoke-interface is a potential structure, including this.objA.funcA(this.objB), which could translate to this.shell.executeShellCommand(this.commandString). Again, notice the combination of COOP and gadget chains from insecure deserialization: invoke-interface uses the IMT of a class to determine the interface method implementation to invoke, and the structure gives the framework or layout to be adhered to.

High-Level Solution

To reach the goal, the structure this.objA.funcA(this.objB) is used. An attentive reader may realize that the structure does not match Runtime.getRuntime().exec("<command>"). In order to make them match, it would be required that objA = <Runtime instance>. Unfortunately, we cannot assume that the location of a Java object in memory remains the same across all apps forked from zygote64, due to garbage collection. Creating a fake runtime is also infeasible due to the complexity and relevance of that object. However, it may be possible to create a fake object that provides a method which eventually triggers execution of Runtime.getRuntime().exec("<command>"), where the command string is also controllable.

The time-consuming search for candidate gadgets is not shown here. It was supported by static analysis of the .dex files of framework.jar, which is available in every app, using a modified version of Topper. The classes of interest are VirtualKeyboard, UiAutomation and String.

Let's break down the overall approach. First of all, VirtualKeyboard::close provides the structure:

@Override
@RequiresPermission(android.Manifest.permission.CREATE_VIRTUAL_DEVICE)
public void close() {
    try {
        // this.objA.funcA(this.objB)
        mVirtualDevice.unregisterInputDevice(mToken);
    } catch (RemoteException e) {
        throw e.rethrowFromSystemServer();
    }
}

Mapping the structure to variable names yields:

  • objA = mVirtualDevice
  • funcA = unregisterInputDevice
  • objB = mToken

Now, one might argue that objects are strictly typed and thus cannot be changed to different types, even at runtime. To that end, consider the bytecode of VirtualKeyboard::close below.

[Index = 0xe8da, Offset = 0x4e85a8, Num Regs = 0x3]: public void VirtualKeyboard::close()
    0000: IGET_OBJECT v0, v2, FIELD:VirtualKeyboard;->mVirtualDevice:IVirtualDevice;
    0004: IGET_OBJECT v1, v2, FIELD:VirtualKeyboard;->mToken:IBinder;
    0008: INVOKE_INTERFACE {v0, v1}, METHOD:IVirtualDevice;->unregisterInputDevice(IBinder;)V
    000e: NOP 
    0010: RETURN_VOID 
    0012: MOVE_EXCEPTION v0
    0014: INVOKE_VIRTUAL {v0}, METHOD:RemoteException;->rethrowFromSystemServer()RuntimeException;
    001a: MOVE_RESULT_OBJECT v1
    001c: THROW v1

Note: The annotations Override and RequiresPermission do not seem to be enforced at runtime or impact method invocation in any way, which seems to align with the definition.

Unless VirtualKeyboard::close throws an exception, the method really consists of only 4 relevant instructions:

0000: IGET_OBJECT v0, v2, FIELD:VirtualKeyboard;->mVirtualDevice:IVirtualDevice;
0004: IGET_OBJECT v1, v2, FIELD:VirtualKeyboard;->mToken:IBinder;
0008: INVOKE_INTERFACE {v0, v1}, METHOD:IVirtualDevice;->unregisterInputDevice(IBinder;)V
0010: RETURN_VOID 

Notice that iget-object vA, vB, field@CCCC does exactly as the name suggests: it moves the field with index CCCC of the object referenced by vB into vreg vA. The last spark of hope is that iget-object checks the type of the fields it operates on. From the fundamentals we know that method resolution using invoke-interface does not really care about the type of the involved objects, but rather only looks at the ImTable of the class of the receiving object. Now, let's rip apart the illusion of type checks at runtime by considering the implementation of iget-object:

%def op_iget_object():
%  op_iget(load="ldr", volatile_load="ldar", maybe_extend="", wide="0", is_object="1")

%def op_iget(load="ldr", volatile_load="ldar", maybe_extend="", wide="0", is_object="0"):
%  slow_path = add_slow_path(op_iget_slow_path, volatile_load, maybe_extend, wide, is_object)
%  fetch_from_thread_cache("x0", miss_label=slow_path)
.L${opcode}_resume:
   lsr     w2, wINST, #12              // w2<- B
   GET_VREG w3, w2                     // w3<- object we're operating on
   ubfx    w2, wINST, #8, #4           // w2<- A
   cbz     w3, common_errNullObject    // object was null
   .if $wide
   $load   x0, [x3, x0]
   SET_VREG_WIDE x0, w2                // fp[A] <- value
   .elseif $is_object                  // ===================[1]
   $load   w0, [x3, x0]                // ===================[2]
   TEST_IF_MARKING .L${opcode}_read_barrier
.L${opcode}_resume_after_read_barrier:
   SET_VREG_OBJECT w0, w2              // fp[A] <- value
   .else
   $load   w0, [x3, x0]
   SET_VREG w0, w2                     // fp[A] <- value
   .endif
   FETCH_ADVANCE_INST 2                // ===================[3]
   GET_INST_OPCODE ip
   GOTO_OPCODE ip
   .if $is_object
.L${opcode}_read_barrier:
   bl      art_quick_read_barrier_mark_reg00
   b       .L${opcode}_resume_after_read_barrier
   .endif

While there is nothing more refreshing than reading ARM assembly mixed with custom macros, whose definitions are sprinkled over various files, below is the short version:

  1. After passing the caching mechanism, which sets x0 to the field offset in memory and is also used in invoke-interface, it is checked what kind of field is moved from the object in vB to vA. The code distinguishes between wide types like long and double, objects and the rest. As iget-object sets is_object = 1 when calling into op_iget, the object path is taken.
  2. Access the field at offset x0 relative to the base of object referenced by vB. This loads a 32-bit address into w0. Observe that x0 >= 8, because objects have predefined klass_ and monitor_ fields.
  3. Continue with the next instruction.
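
The three steps can be paraphrased in a few lines of Python. This is a toy model under stated assumptions, not the interpreter's code: memory is a flat hypothetical buffer, all addresses are made up, and the only check modelled is the null check visible in the assembly.

```python
import struct

# Toy model of the iget-object fast path: read a 32-bit reference at
# `field_offset` from the object's base address. No type information is
# consulted. The flat `memory` buffer and all addresses are hypothetical.

memory = bytearray(0x100)


def write_u32(addr, value):
    struct.pack_into("<I", memory, addr, value)


def iget_object(obj_addr, field_offset):
    if obj_addr == 0:
        raise ValueError("null object")  # mirrors the cbz null check
    (ref,) = struct.unpack_from("<I", memory, obj_addr + field_offset)
    return ref


# Fake object at 0x20: klass_ (offset 0), monitor_ (offset 4), then fields.
FAKE_OBJ = 0x20
write_u32(FAKE_OBJ + 0, 0x40)  # klass_ -> hypothetical fake class
write_u32(FAKE_OBJ + 4, 0)     # monitor_
write_u32(FAKE_OBJ + 8, 0x60)  # first instance field, an arbitrary reference

loaded = iget_object(FAKE_OBJ, 8)
print(hex(loaded))
```

Whatever 32-bit value sits at the field offset is handed back as an object reference, regardless of what it actually points to.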

The above code only tells half of the story, because a cache miss means the ArtField must be resolved using the slow path through nterp_get_instance_field_offset. Notice that even if the slow path performed type checks, any repeated execution of the iget-object instructions in VirtualKeyboard::close would use the fast path, unless their cache entries are evicted. Of course, it may be sufficient to only check the type in the slow path and then assume its correctness in the fast path.
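
The interplay of fast and slow path can be sketched as a plain lookup cache. The sketch below is an assumption-laden simplification: the cache is a Python dict keyed by an instruction address, resolve_field_offset_slow is a hypothetical stand-in for the real resolution, and the returned offset is made up.

```python
# Toy model of a cache in front of field resolution. Any check placed in the
# slow path runs only when the cache misses. All names are hypothetical.

cache = {}
slow_path_calls = 0


def resolve_field_offset_slow(instruction_addr):
    global slow_path_calls
    slow_path_calls += 1
    # A hypothetical type check could live here; it would run only on a miss.
    return 8  # made-up field offset


def field_offset_for(instruction_addr):
    if instruction_addr not in cache:  # miss: take the slow path once
        cache[instruction_addr] = resolve_field_offset_slow(instruction_addr)
    return cache[instruction_addr]     # hit: no further checks


offsets = [field_offset_for(0x1000) for _ in range(3)]
print(offsets, slow_path_calls)
```

After the first execution, the instruction keeps hitting the cache, so swapping the object afterwards would not pass through the slow path again unless the entry is evicted.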

Continuing with the high-level approach, the structure provided by VirtualKeyboard::close is very dynamic at runtime, allowing an attacker to replace not only the objects but also their classes and thus their invoked methods. Now, looking into UiAutomation reveals why VirtualKeyboard::close is a suitable candidate for shell invocation:

public ParcelFileDescriptor executeShellCommand(String command) {
    warnIfBetterCommand(command);
    ParcelFileDescriptor source = null;
    ParcelFileDescriptor sink = null;
    try {
        ParcelFileDescriptor[] pipe = ParcelFileDescriptor.createPipe();
        source = pipe[0];
        sink = pipe[1];
        // Calling out without a lock held.
        mUiAutomationConnection.executeShellCommand(command, sink, null);
    } catch (IOException ioe) {
        Log.e(LOG_TAG, "Error executing shell command!", ioe);
    } catch (RemoteException re) {
        Log.e(LOG_TAG, "Error executing shell command!", re);
    } finally {
        IoUtils.closeQuietly(sink);
    }
    return source;
}

Surprisingly, UiAutomation provides a convenience method for invoking shell commands. Moreover, it is a lot easier to create a fake UiAutomation object than it is to create a fake Runtime instance.

The key component in the above Java code is mUiAutomationConnection.executeShellCommand(command, sink, null). Without showing the entire call stack, eventually, the following code is called:

public void executeShellCommandWithStderr(final String command, final ParcelFileDescriptor sink,
        final ParcelFileDescriptor source, final ParcelFileDescriptor stderrSink)
        throws RemoteException {
    synchronized (mLock) {
        throwIfCalledByNotTrustedUidLocked();
        throwIfShutdownLocked();
        throwIfNotConnectedLocked();
    }
    final java.lang.Process process;
    try {
        process = Runtime.getRuntime().exec(command);
    } catch (IOException exc) {
        throw new RuntimeException("Error running shell command '" + command + "'", exc);
    }
    ...
}

Basically, if an attacker is able to pass the methods throwIfCalledByNotTrustedUidLocked, throwIfShutdownLocked and throwIfNotConnectedLocked without crashing the app, then an attacker-chosen command will be executed. For simplicity, we do not care what happens after command execution, i.e. crashing the app after successful command execution is enough to prove that bytecode reuse attacks are possible.

Without further ado, consider the critical methods:

private void throwIfShutdownLocked() {
    if (mIsShutdown) {
        throw new IllegalStateException("Connection shutdown!");
    }
}
private void throwIfNotConnectedLocked() {
    if (!isConnectedLocked()) { // Returns: this.mClient != null
        throw new IllegalStateException("Not connected!");
    }
}
private void throwIfCalledByNotTrustedUidLocked() {
    final int callingUid = Binder.getCallingUid();
    if (callingUid != mOwningUid && mOwningUid != Process.SYSTEM_UID
            && callingUid != 0 /*root*/) {
        throw new SecurityException("Calling from not trusted UID!");
    }
}

Bypassing these checks is trivial, because the mUiAutomationConnection object of UiAutomation is also attacker-controlled. Therefore, setting the fields appropriately allows passing the checks. For example, throwIfNotConnectedLocked tries to enforce that this.mClient != null before the command is executed. Internally, this simply compares the field value in the mUiAutomationConnection object with 0. This means that setting mClient = 1 bypasses the check, although this is not a valid reference.
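
As a sanity check, the three guards can be restated over raw field values. The sketch below is a reading of the Java code shown above, not the PoC: the dictionary layout and the function passes_checks are hypothetical, and choosing mOwningUid = SYSTEM_UID is just one assignment that makes the UID condition false.

```python
SYSTEM_UID = 1000  # value of android.os.Process.SYSTEM_UID


def passes_checks(fields, calling_uid):
    """Return True if none of the three guards would throw."""
    untrusted = (
        calling_uid != fields["mOwningUid"]
        and fields["mOwningUid"] != SYSTEM_UID
        and calling_uid != 0  # root
    )
    shut_down = fields["mIsShutdown"] != 0
    connected = fields["mClient"] != 0  # any non-zero value counts as non-null
    return not untrusted and not shut_down and connected


# Attacker-chosen field values of the fake connection object.
fake_connection = {"mOwningUid": SYSTEM_UID, "mIsShutdown": 0, "mClient": 1}

print(passes_checks(fake_connection, calling_uid=10123))
```

The calling UID 10123 is an arbitrary example of an unprivileged app UID. With the chosen field values, the outcome does not depend on it.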

Let's conclude with a visualization of the entire high-level approach. First of all, we start off with the correct structure. Method invocation works with the correct objects and classes.

Original structure and method invocation

However, after objects have been replaced with their fake counterparts, the invocation looks as shown below. Observe that the overall structure remains the same, only objects, classes and associated method implementations change.

Modified objects in original structure

As the latter image already shows, when looking for the method to invoke via invoke-interface, nterp uses the fake class and eventually the .dex file associated with that class. After the abstract method has been resolved, the abstract method’s imt_index_ field is used to choose the correct ImTable entry. Therefore, the ImTable may be shrunk to only account for the lookup of the entry at index imt_index_. Because nterp uses whatever method is found in the ImTable, invocation of executeShellCommand is inevitable.
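
Why a shrunk table suffices can be illustrated with a small model of the indexed lookup. Everything concrete here is assumed for illustration: the flat memory buffer, the 8-byte entry size for a 64-bit process, the index value and both addresses.

```python
import struct

POINTER_SIZE = 8  # assumption: 64-bit process, one pointer per table entry

memory = bytearray(0x200)  # hypothetical attacker-writable region


def write_u64(addr, value):
    struct.pack_into("<Q", memory, addr, value)


def read_u64(addr):
    (value,) = struct.unpack_from("<Q", memory, addr)
    return value


def imt_lookup(imt_base, imt_index):
    # Only the slot selected by imt_index is ever read.
    return read_u64(imt_base + imt_index * POINTER_SIZE)


IMT_INDEX = 5             # hypothetical imt_index_ of the resolved method
FAKE_ART_METHOD = 0x7A00  # hypothetical address of a fake ArtMethod
FAKE_IMT_BASE = 0x40      # hypothetical base of the shrunk table

# Populate exactly one slot; neighbouring bytes can hold unrelated data.
write_u64(FAKE_IMT_BASE + IMT_INDEX * POINTER_SIZE, FAKE_ART_METHOD)

target = imt_lookup(FAKE_IMT_BASE, IMT_INDEX)
print(hex(target))
```

Since the lookup touches a single slot, the remaining entries of the fake table never need to be valid, which saves space in the region written via the WWW condition.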

Long story short, not only is an attacker able to inject fake objects, classes, ArtMethods, ArtFields and tables, but it is also possible to force existing bytecode to operate on those fake structures. In a nutshell, an attacker can convert the WWW condition into a type confusion and trick the interpreter into working with custom objects, causing invocation of arbitrary methods at runtime (of course, method signatures should match).

Kicking Off Execution

Building on the blog posts covering Android basics for bytecode exploitation, the GRANDFATHERED map can be used to kick off execution of the chain that executes a shell command. To that end, reconsider the following code:

class LanguageTag {
    ...
    private static final Map<String, String[]> GRANDFATHERED = new HashMap<>();
    ...
    public static LanguageTag parse(String languageTag, ParseStatus sts) {
        ...
        String[] gfmap = GRANDFATHERED.get(LocaleUtils.toLowerString(languageTag));
        ...
    }
}

Observe that the invocation GRANDFATHERED.get uses invoke-interface again, because GRANDFATHERED is a HashMap, but ::get is declared in the Map interface. Therefore, using a similar approach as discussed above, GRANDFATHERED can be replaced with an instance of VirtualKeyboard, and the invocation of ::get can be redirected to VirtualKeyboard::close. Setting up the VirtualKeyboard instance to contain instances of UiAutomation and String for mVirtualDevice and mToken, respectively, allows kicking off command execution. What is more, LanguageTag::parse is most likely called inside a lifecycle method like onStop, which guarantees execution of the gadget chains.

Because GRANDFATHERED seems to be located inside the boot.art memory region, which again seems to be shared by all maps, it is a relatively stable target to abuse in the test environment.

Proof of Concept

The concrete PoC code is about 1000 LoC, because we need to respect structures like mirror::Class etc. Encoding these structures in Python bloats up the PoC. However, the quintessence is exactly what is discussed in the above sections. For a visual proof, consider the following PoC video.

It is important to note that the memory region, in which fake objects are built up using the WWW condition, must be in a 32-bit address range, because references to objects and classes must be 32-bit addresses. In case of the above video, that region is [anon:.bss]. The other memory regions are used to reference existing bytecode (framework.jar), reference the interpreter handler ExecuteNterpImpl to construct valid ArtMethod instances (libart.so), kick off gadget chain execution via GRANDFATHERED (boot.art) and reference a valid DexCache instance initialized by zygote64 (boot-framework.art). All of these memory regions have been confirmed to be duplicated upon fork in a previous blog post using maps diffing.
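
The 32-bit constraint shows up naturally when encoding fake objects. The helper functions below are a hypothetical sketch and not taken from the PoC. They assume a little-endian target and the 8-byte object header consisting of klass_ and monitor_.

```python
import struct


def pack_reference(addr):
    """Encode an object or class reference as a 32-bit little-endian value."""
    if not 0 <= addr <= 0xFFFFFFFF:
        raise ValueError(f"address {addr:#x} does not fit into a 32-bit reference")
    return struct.pack("<I", addr)


def pack_object_header(klass_addr, monitor=0):
    """Encode the two predefined fields: klass_ followed by monitor_."""
    return pack_reference(klass_addr) + struct.pack("<I", monitor)


header = pack_object_header(0x12345678)  # hypothetical fake class address
print(header.hex())
```

An address above the 32-bit range, as is typical for most 64-bit mappings, is rejected by pack_reference, which is why the fake structures have to live in a low region.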

Of course, a better PoC would be to construct a malicious Android app that attacks the victim app. However, creating parsers for e.g. boot.art to spot GRANDFATHERED dynamically is considered a lot of busywork and does not show more than the external Python script mimicking a local app.

Potential Solutions

From a security perspective, multiple mitigations come to mind:

  1. Enforce type checks at runtime.
  2. Use a kind of random token (like a CSRF token) sampled after the app is forked from zygote64. Then, each object holds that token as a field next to monitor_ and klass_. Upon usage of an object, the object’s token is compared to the original random token. If both tokens match, execution will continue. Otherwise, the app is aborted.
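
The token idea from the second item could look roughly like the following. This is a conceptual sketch only: TaggedObject, use and the 64-bit token width are assumptions made for illustration, and a real implementation would live inside the runtime rather than in Python. Raising an exception stands in for aborting the app.

```python
import secrets

# Sampled once per process, conceptually right after the fork from zygote64.
PROCESS_TOKEN = secrets.randbits(64)


class TaggedObject:
    """Hypothetical object layout carrying a token next to its payload."""

    def __init__(self, payload, token):
        self.payload = payload
        self.token = token


def use(obj):
    # Compare the object's token with the process-wide token on every use.
    if obj.token != PROCESS_TOKEN:
        raise RuntimeError("object token mismatch, aborting")
    return obj.payload


genuine = TaggedObject("legitimate object", PROCESS_TOKEN)
forged = TaggedObject("fake object", PROCESS_TOKEN ^ 1)  # attacker guessed wrong

print(use(genuine))
```

Note that such a scheme only helps as long as the attacker cannot read the token, e.g. through an additional information leak.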

Of course, these mitigations do not take into account the performance overhead introduced by all the checks. If every bytecode instruction validated a random token, performance would probably be a lot worse. On the other hand, one may argue that the interpreter started off too greedy with regard to performance, and such security checks are legitimate. This is a common tradeoff: security vs. performance. Luckily, the techniques discussed in this series of blog posts are fairly hard to pull off, which severely reduces practicality.

Responsible Disclosure

All research results, including working PoCs, have been submitted to Google’s bug bounty program to ensure that publishing these blog posts does not cause any severe security problems and to give Google time to investigate the findings and respond, if necessary. Of course, there are no concrete vulnerabilities, but rather a new exploitation concept on Android. Also, I find it hard to estimate the practical impact of these blog posts, because many stars must align for bytecode injection and reuse to work, which is why I welcomed the feedback. Fortunately, Google decided the results are not a security concern and gave permission to publish blog posts on that matter!

Summary

This concludes our journey through the land of bytecode-based exploitation on Android! Here, the more advanced bytecode reuse technique is discussed, along with fundamentals necessary to grasp all concepts described. Also, some security mechanisms that immediately come to mind are mentioned without taking into account performance impact.

Naturally, there is a lot more to be discovered about bytecode execution and exploitation on Android. The series of blog posts on Android bytecode is the result of about 1.5 years of part-time research, with some distractions along the way. Hence, the blog posts do not contain everything discovered or tested, but only the most interesting cherries! I stopped counting the rabbit holes I followed that did not provide any results or at best “funny” facts, e.g. throw-oriented programming.

Overall, I learned that security research on a well-known operating system like Android is similar to walking the corridor in Hilbert’s hotel: infinite options, so you really need to choose the doors you open wisely. Regardless, persistence is key to finding something that is interesting, so keep learning, researching and hacking! ;)