Bytecode Reuse Attack (Part 4)
As the last blog post on bytecode-based exploitation on Android, this post discusses the step that follows bytecode injection: bytecode reuse.
To answer why an attacker needs bytecode reuse when bytecode injection already exists, remember the arms race in (binary) exploitation. In a nutshell, a new exploitation technique triggers a reaction in the form of at least one security mechanism that (partially) mitigates the new technique. If only bytecode injection were researched, the best response would be a new security mechanism that prevents nterp from executing arbitrary data. In other words, nterp would be restricted to executable code, i.e. bytecode. To be honest, every developer would respond with such a fix, myself included! However, bytecode injection does not exhaust the potential of bytecode-based exploitation.
Therefore, the core idea is to provide enough information on bytecode-based exploitation to understand its security implications and maybe design fitting mitigations. In terms of the below visualization, instead of filling in the left mountain one by one, providing more research results on bytecode-based exploitation may enable the creation of a batch of security mechanisms. Notice that the below illustration shows a kind of security deception: the security level of an app in the presence of a memory error is the minimum of the security levels of all interpreters. As nterp is not protected at all except by side effects of e.g. ASLR, the presence of strong native-level security mechanisms may give a false sense of security.
Later on, after the bytecode reuse attack is somewhat understood, a few mitigation attempts are discussed. However, practical mitigations are yet to be found!
Before we delve into bytecode reuse attacks on Android, just a heads up:
Disclaimer: Bytecode reuse is the most complicated exploitation technique in this series of blog posts. To derive it, we draw from various fields in offensive security. Hence, this post is very technical and one of the harder posts to digest.
Assumptions
For simplicity, we assume an attacker is able to trick a victim into installing and running an unprivileged app. That is, in addition to bytecode reuse, this blog post investigates the potential security impact of Android’s fork server architecture from the perspective of a local attacker. While a successful remote attacker has a much greater impact than a successful local attacker, other research suggests that the installation of some arbitrary, potentially malicious app is not an unrealistic assumption!
Again, for simplicity, the local attacker is represented by a “simple” Python script that emulates app interactions via socket communication. Therefore, when working with fork server-related information leaks, instead of writing an app that parses its own process image, the Python script is simply given addresses taken from gdb. This requires caution not to use any app-specific addresses. To ensure that the script only uses fork server-related leaks, the attack must succeed across multiple app restarts!
For the same reasons discussed for bytecode injection, a vulnerable app is created. However, the app itself does not provide any information leaks, only the ability to repeatedly invoke a write-what-where (WWW) condition. That is, an attacker is able to directly specify a value and an address to write the value to. The goal then is to derive a generic exploitation technique that reuses existing bytecode in the target app. Again, a WWW, while a very strong primitive, is only an example vulnerability to ease testing and demonstrating different attacks. Artificially constructing a complicated vulnerability would not yield any benefit! In fact, it would make the ~1000 LoC PoC for the WWW even longer and harder to read (although a Python guru would most likely be able to squeeze my 1000 LoC into 10).
Below is the vulnerable native function accessible to an attacker.
extern "C"
JNIEXPORT void JNICALL
Java_com_poc_poc_1local_MainActivity_www(
        JNIEnv *env,
        jclass clazz,
        jlong address,
        jlong value) __attribute__ ((optnone)) {
    // Write-what-where: store the given value at the given address, unchecked.
    *(uint64_t*)address = (uint64_t)value;
}
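To make the primitive more tangible, here is a minimal Python sketch that emulates the WWW on a plain bytearray instead of a real process image. The names FakeMemory, www, read64 and write_bytes as well as the base address are hypothetical and chosen for illustration only. The socket communication of the actual PoC is not modelled.

```python
import struct


class FakeMemory:
    """Toy stand-in for a process image (hypothetical, for illustration only)."""

    def __init__(self, base: int, size: int) -> None:
        self.base = base
        self.data = bytearray(size)

    def www(self, address: int, value: int) -> None:
        # Mirrors `*(uint64_t*)address = (uint64_t)value;`:
        # an 8-byte little-endian store at an attacker-chosen address.
        offset = address - self.base
        if not 0 <= offset <= len(self.data) - 8:
            raise ValueError("address outside of the emulated region")
        # jlong is signed, so mask the value to its unsigned 64-bit representation.
        struct.pack_into("<Q", self.data, offset, value & 0xFFFFFFFFFFFFFFFF)

    def read64(self, address: int) -> int:
        return struct.unpack_from("<Q", self.data, address - self.base)[0]


def write_bytes(mem: FakeMemory, address: int, payload: bytes) -> None:
    # Repeated WWW invocations write data of arbitrary length, 8 bytes at a time.
    # The payload is zero-padded to a multiple of 8 bytes.
    padded = payload + b"\x00" * (-len(payload) % 8)
    for i in range(0, len(padded), 8):
        mem.www(address + i, struct.unpack_from("<Q", padded, i)[0])


mem = FakeMemory(base=0x7000_0000, size=0x100)
write_bytes(mem, 0x7000_0010, b"fake object bytes")
```

Because the primitive can be invoked repeatedly, placing fake objects and classes of any size only costs one invocation per 8 bytes.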
Next, as usual, a few basics must be discussed.
Necessary Groundwork
Luckily, the majority of basics is covered in a previous blog post. Therefore, the only mechanism left to understand is invoke-interface. Although we all love reading through tons of source code, I use a numbering scheme of the form [1], [2] etc. to mark points of interest in the code. After the source code listings, these markings are summarized and discussed, so there is no need to fully read all source code snippets!
Invoking Interface Methods
The bytecode instruction invoke-interface is used in scenarios where polymorphism makes resolving the concrete method to call complicated. As the name suggests, this bytecode instruction can be used to invoke implementations of interface methods.
Because different types can implement the same interface, invoke-interface must use a type-agnostic mechanism to resolve the method to invoke. This motivates a closer look at the implementation of the bytecode instruction itself, to identify how invoke-interface accesses objects and associated classes during method resolution and invocation. Remember that an attacker is able to inject data into the target process, including fake objects and classes.
Generic Analysis
invoke-interface is fully defined by the assembly function invoke_interface. The code of invoke_interface reveals interesting properties about interface method resolution! Before delving into the details, let's build an example:
interface Logger {
    void log(String message);
}

class FileLogger implements Logger {
    @Override
    public void log(String message) {/*...*/}
}

class ConsoleLogger implements Logger {
    @Override
    public void log(String message) {/*...*/}
}

class Test {
    public static void main(String[] args) {
        Logger[] loggers = new Logger[] {
            new FileLogger(),   // =: fl
            new ConsoleLogger() // =: cl
        };
        for (Logger logger : loggers) {
            logger.log("Test123");
        }
    }
}
In the following, fl and cl denote the respective instances of FileLogger and ConsoleLogger in the context of the above Java code example.
Caching Interface Method
First of all, invoke-interface starts off with a fast path for interface method resolution, i.e. for finding the ArtMethod* representing the abstract method declared inside an interface. The caching mechanism can be seen below.
%def fetch_from_thread_cache(dest_reg, miss_label):
    add ip, xSELF, #THREAD_INTERPRETER_CACHE_OFFSET // cache address
    ubfx ip2, xPC, #2, #THREAD_INTERPRETER_CACHE_SIZE_LOG2 // entry index
    add ip, ip, ip2, lsl #4 // entry address within the cache
    ldp ip, ${dest_reg}, [ip] // entry key (pc) and value (offset)
    cmp ip, xPC
    b.ne ${miss_label}
Notice that the cache uses the current dex program counter xPC to derive the cache set (of size 1), i.e. the cache is direct-mapped. If the key stored in that entry matches xPC, the corresponding value is loaded, i.e. the ArtMethod*.
Therefore, the cached method cannot represent a concrete implementation of the interface method. Consider the above example code. Invoking fl.log initially causes a cache miss, which triggers method resolution via nterp_get_method, i.e. NterpGetMethod. If the concrete implementation FileLogger::log were cached instead of a generic representation of Logger::log, then the second iteration trying to run cl.log would wind up calling FileLogger::log again, because the current dex program counter causes a cache hit and thus triggers the fast path, fully avoiding another method resolution. All of this implies that whatever NterpGetMethod returns must represent the abstract method declared in the interface, e.g. in Logger.
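The argument can be replayed with a small Python model of the cache. This is a sketch, not ART code: the value of CACHE_SIZE_LOG2 is an assumption, and InterpreterCache and run_loop are hypothetical names. The index computation follows the ubfx instruction shown above.

```python
CACHE_SIZE_LOG2 = 8  # assumed value of THREAD_INTERPRETER_CACHE_SIZE_LOG2


class InterpreterCache:
    """Direct-mapped cache keyed by the dex program counter (toy model)."""

    def __init__(self) -> None:
        self.entries = [(None, None)] * (1 << CACHE_SIZE_LOG2)

    @staticmethod
    def index(dex_pc: int) -> int:
        # ubfx ip2, xPC, #2, #LOG2: take LOG2 bits of the pc, starting at bit 2.
        return (dex_pc >> 2) & ((1 << CACHE_SIZE_LOG2) - 1)

    def get(self, dex_pc: int):
        key, value = self.entries[self.index(dex_pc)]
        return value if key == dex_pc else None  # None models the miss label

    def set(self, dex_pc: int, value) -> None:
        self.entries[self.index(dex_pc)] = (dex_pc, value)


def run_loop(cache_concrete: bool) -> list:
    """Replays the Logger loop; returns the methods that end up being called."""
    cache = InterpreterCache()
    dex_pc = 0x7F00_1234_5678  # the same invoke-interface in both iterations
    called = []
    for receiver in ("FileLogger", "ConsoleLogger"):
        cached = cache.get(dex_pc)
        if cached is None:  # slow path, stands in for NterpGetMethod
            cached = f"{receiver}::log" if cache_concrete else "Logger::log"
            cache.set(dex_pc, cached)
        if cache_concrete:
            called.append(cached)  # the cached value would be invoked directly
        else:
            called.append(f"{receiver}::log")  # dispatch via the receiver's class
    return called
```

With cache_concrete=True the second iteration calls FileLogger::log again, which is exactly the inconsistency described above. With cache_concrete=False the receiver decides which implementation runs.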
Method Resolution Via NterpGetMethod
The first time a specific invoke-interface is executed during the lifetime of an app always causes a cache miss, unless the cache is initialized with some methods. NterpGetMethod is then used to find the corresponding ArtMethod* or an encoded version of the method index referencing the abstract method relative to a declaring .dex file.
Consider the stripped version of NterpGetMethod:
size_t NterpGetMethod(Thread *self, ArtMethod *caller, uint16_t *dex_pc_ptr)
REQUIRES_SHARED(Locks::mutator_lock_)
{
// [1]
UpdateHotness(caller);
const Instruction *inst = Instruction::At(dex_pc_ptr);
InvokeType invoke_type = kStatic;
uint16_t method_index = 0;
switch (inst->Opcode())
{
// ...
case Instruction::INVOKE_INTERFACE:
{
method_index = inst->VRegB_35c();
invoke_type = kInterface;
break;
}
// ...
default:
LOG(FATAL) << "Unknown instruction " << inst->Opcode();
}
ClassLinker *const class_linker = Runtime::Current()->GetClassLinker();
/**
* SkipAccessChecks() is a flag in the caller's `access_flags_` field.
* Apparently access checks are usually done for native, i.e. not
* interpreted, code.
*
* class_linker->ResolveMethod either finds the ArtMethod* in the DexCache, or
* performs a manual resolution using the underlying .dex file. ASSUMING THE
* LATTER, BECAUSE THIS CLEARLY SHOWS WHAT METHOD IS RETURNED.
*/
// [2]
ArtMethod *resolved_method = caller->SkipAccessChecks()
? class_linker->ResolveMethod<ClassLinker::ResolveMode::kNoChecks>(
self, method_index, caller, invoke_type)
:class_linker->ResolveMethod<ClassLinker::ResolveMode::kCheckICCEAndIAE>(
self, method_index, caller, invoke_type);
if (resolved_method == nullptr)
{
DCHECK(self->IsExceptionPending());
return 0;
}
if (invoke_type == kSuper)
{ /*...*/
}
if (invoke_type == kInterface)
{
size_t result = 0u;
if (resolved_method->GetDeclaringClass()->IsObjectClass())
{
/**
* If declaring class is java.lang.Object:
*/
// Set the low bit to notify the interpreter it should do a vtable
// call.
DCHECK_LT(resolved_method->GetMethodIndex(), 0x10000);
result = (resolved_method->GetMethodIndex() << 16) | 1U;
}
else
{
DCHECK(resolved_method->GetDeclaringClass()->IsInterface());
DCHECK(!resolved_method->IsCopied());
if (!resolved_method->IsAbstract())
{
/**
* If the resolved interface method is not abstract, i.e. a default method:
*/
// Set the second bit to notify the interpreter this is a default
// method.
result = reinterpret_cast<size_t>(resolved_method) | 2U;
}
else
{
/**
* If the resolved method is an abstract interface method:
*/
//[3]
result = reinterpret_cast<size_t>(resolved_method);
}
}
UpdateCache(self, dex_pc_ptr, result);
return result;
}
else if (resolved_method->GetDeclaringClass()->IsStringClass() && !resolved_method->IsStatic() && resolved_method->IsConstructor()){ /*...*/}
else if (invoke_type == kVirtual){ /*...*/}
else{ /*...*/}
}
The above code is still complex, so consider the following explanation:
- [1] General information is extracted from the bytecode. Most importantly, this is where a method invocation is classified as e.g. kInterface, i.e. an invoke-interface. The method_index refers to the method_id_item inside the declaring .dex file.
- [2] Based on the method_index, the ClassLinker is used to resolve the corresponding ArtMethod*, i.e. the ART representation of a Java method. Interestingly, the calling method caller dictates via its caller->access_flags_ field whether validation checks are performed. Invoking bytecode in a “benign way” should not trigger such checks.
- [3] After a matching ArtMethod* has been found, its declaring_class is checked. This dictates whether the returned value is a plain ArtMethod* or an encoded variant of either an ArtMethod* or a method_index. Under the assumption that resolved_method represents an abstract interface method, its declaring_class field should be the declaring interface, e.g. Logger, which is neither java.lang.Object nor a concrete class. Hence, the ArtMethod* is returned as is, without any encoding.
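The three possible return values can be summarized in a few lines of Python. The encoding follows the NterpGetMethod listing above. The decoding side is an assumption that simply mirrors the encoding, and the function names and the kind labels are hypothetical.

```python
def encode_interface_result(kind: str, art_method_ptr: int = 0,
                            method_index: int = 0) -> int:
    """Mimics the kInterface branch of NterpGetMethod (illustrative only)."""
    if kind == "object_method":
        # Declared in java.lang.Object: low bit set, method index in the upper bits.
        if method_index >= 0x10000:
            raise ValueError("method index does not fit into 16 bits")
        return (method_index << 16) | 1
    if kind == "default_method":
        # Non-abstract interface method: second bit set on the pointer.
        return art_method_ptr | 2
    if kind == "abstract_method":
        # Abstract interface method: the pointer is returned unmodified.
        return art_method_ptr
    raise ValueError(f"unknown kind: {kind}")


def decode_interface_result(result: int):
    """Assumed inverse of the encoding; relies on pointers being 4-byte aligned."""
    if result & 1:
        return ("vtable_index", result >> 16)
    if result & 2:
        return ("default_method", result & ~0x3)
    return ("art_method", result)
```

The tagging only works because the two low bits of an aligned pointer are always zero, which is the same implicit assumption the interpreter relies on when it tests those bits.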
Following class_linker->ResolveMethod gives insights into where ART searches for the ArtMethod*.
ClassLinker-based Method Resolution
Consider the following reduced implementation of class_linker->ResolveMethod:
inline ArtMethod *ClassLinker::ResolveMethod(Thread *self,
uint32_t method_idx,
ArtMethod *referrer,
InvokeType type)
{
// ...
/**
* Below comment implies that there exists an array of ArtMethods pointing
* to native methods that are resolved at app startup.
*/
// We do not need the read barrier for getting the DexCache for the initial
// resolved method
// lookup as both from-space and to-space copies point to the same native
// resolved methods array.
// [1]
ArtMethod *resolved_method =
referrer->GetDexCache<kWithoutReadBarrier>()->GetResolvedMethod(
method_idx);
/**
* Method resolution using fast path failed, so resolve method manually.
*/
// [2]
if (UNLIKELY(resolved_method == nullptr))
{
referrer = referrer->GetInterfaceMethodIfProxy(image_pointer_size_);
ObjPtr<mirror::Class> declaring_class = referrer->GetDeclaringClass();
StackHandleScope<2> hs(self);
Handle<mirror::DexCache> h_dex_cache(
hs.NewHandle(referrer->GetDexCache()));
Handle<mirror::ClassLoader> h_class_loader(
hs.NewHandle(declaring_class->GetClassLoader()));
resolved_method = ResolveMethod<kResolveMode>(method_idx,
h_dex_cache,
h_class_loader,
referrer,
type);
}
// ...
// Note: We cannot check here to see whether we added the method to the
// cache. It might be an erroneous class, which results in it being hidden
// from us.
// [3]
return resolved_method;
}
The above code does the following:
- [1] Tries to find the ArtMethod* in an associated DexCache. Notice that upon app creation, Zygote already sets up a DexCache, which means that all apps on the device know the address of at least one DexCache instance. Here, a cache miss is assumed in order to further investigate how method resolution works. In reality, a cache hit may already mark the end of method resolution.
- [2] In case of a cache miss, the method is resolved by method_idx, which was taken from the fixed operand of the invoke-interface opcode.
- [3] Finally, the ArtMethod* is returned. However, the comment states that resolving a method manually may bypass validation checks. This is not a problem if the checks are skipped anyway.
Now comes the most interesting part of the method resolution, namely another ResolveMethod implementation:
inline ArtMethod *ClassLinker::ResolveMethod(uint32_t method_idx,
Handle<mirror::DexCache> dex_cache,
Handle<mirror::ClassLoader> class_loader,
ArtMethod *referrer,
InvokeType type)
{
// Check for hit in the dex cache.
ArtMethod *resolved = dex_cache->GetResolvedMethod(method_idx);
bool valid_dex_cache_method = resolved != nullptr; // = false
if (kResolveMode == ResolveMode::kNoChecks && valid_dex_cache_method)
{ /*...*/}
/**
* Uses .dex file to resolve the method_id of the method to be invoked. A
* method_id consists of a `class_idx`, `proto_idx` and `name_idx`. For an
* interface method, `class_idx` is expected to refer (within .dex file) to
* an interface.
*
* Interestingly, to fake an interface method invocation, the only thing that
* needs to be "not faked" is the method_idx used to identify the
* method_id.
* For example, to fake a call to UiAutomation::executeShellCommand, it
* suffices to know its method_idx (and DexCache). From there, the type is
* inferred by the method_id data to be UiAutomation. In other words, when
* creating a fake ArtMethod object, its declaring class must reference a
* correct DexCache (shared by Zygote) and its method_idx must describe the
* correct method.
*/
const DexFile &dex_file = *dex_cache->GetDexFile();
const dex::MethodId &method_id = dex_file.GetMethodId(method_idx);
ObjPtr<mirror::Class> klass = nullptr;
if (valid_dex_cache_method) { /*...*/}
else
{
// [1]
// The method was not in the DexCache, resolve the declaring class.
klass = ResolveType(method_id.class_idx_, dex_cache, class_loader);
if (klass == nullptr)
{
/*...*/
return nullptr;
}
}
/*...*/
if (!valid_dex_cache_method)
{
// [2]
resolved = FindResolvedMethod(
klass, dex_cache.Get(), class_loader.Get(), method_idx);
}
/*...*/
// If we found a method, check for incompatible class changes.
// [3]
if (LIKELY(resolved != nullptr) &&
LIKELY(kResolveMode == ResolveMode::kNoChecks ||
!resolved->CheckIncompatibleClassChange(type)))
{
return resolved;
}
else
{
// If we had a method, or if we can find one with another lookup type,
// it's an incompatible-class-change error.
/*...*/
return nullptr;
}
}
Again, consider the following explanations:
- [1] If the method is not part of the DexCache, the class associated with the method referenced by method_idx is resolved. This is where the .dex file is explicitly used to extract the method_id_item.
- [2] Then, the ArtMethod* is resolved based on that class and method_idx.
- [3] With a resolved method at hand, an incompatible-class-change check is performed before the method is returned, unless kResolveMode is kNoChecks.
Skipping some intermediate methods, eventually FindInterfaceMethodWithSignature is invoked:
static inline ArtMethod *FindInterfaceMethodWithSignature(ObjPtr<Class> klass,
std::string_view name,
const SignatureType &signature,
PointerSize pointer_size)
REQUIRES_SHARED(Locks::mutator_lock_)
{
// If the current class is not an interface, skip the search of its declared
// methods; such lookup is used only to distinguish between
// IncompatibleClassChangeError and NoSuchMethodError and the caller has
// already tried to search methods in the class.
// [1]
if (LIKELY(klass->IsInterface()))
{
// Search declared methods, both direct and virtual.
// (This lookup is used also for invoke-static on interface classes.)
for (ArtMethod &method : klass->GetDeclaredMethodsSlice(pointer_size))
{
if (method.GetNameView() == name &&
method.GetSignature() == signature)
{
return &method;
}
}
}
// TODO: If there is a unique maximally-specific non-abstract superinterface
// method, we should return it, otherwise an arbitrary one can be returned.
/**
* Check all interfaces specified in iftable of the class. This gives the
* ArtMethod of the interface method, not its concrete implementation! For
* an invoke-interface opcode, `klass` is currently a class that provides a
* concrete implementation.
* Thus `klass` skips the above `klass->IsInterface()` check and its iftable
* is read.
* THEREFORE, IFTABLE OF A CLASS SPECIFIES IMPLEMENTED INTERFACES.
*/
// [2]
ObjPtr<IfTable> iftable = klass->GetIfTable();
for (int32_t i = 0, iftable_count = iftable->Count(); i < iftable_count; ++i)
{
ObjPtr<Class> iface = iftable->GetInterface(i);
for (ArtMethod &method : iface->GetVirtualMethodsSlice(pointer_size))
{
if (method.GetNameView() == name &&
method.GetSignature() == signature)
{
return &method;
}
}
}
/*...Check super classes and java.lang.Object, or fail and return nullptr*/
return nullptr;
}
In a nutshell, method resolution boils down to iterating over the klass->iftable_ field and checking all methods of all implemented interfaces for a matching method signature. To that end:
- [1] Initially, klass is still a class with the concrete implementation of the interface method. For example, this could still be FileLogger or ConsoleLogger, but not Logger.
- [2] Classes that implement an interface use an interface table that describes where to find the concrete implementations of an implemented interface. This table is enumerated and each interface is checked for whether it declares a method that matches the signature of the method to be invoked by invoke-interface. Whatever method matches first is returned (although signatures should be unique).
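The lookup can be condensed into the following Python sketch. Method, Klass and find_interface_method are hypothetical stand-ins for ArtMethod, mirror::Class and FindInterfaceMethodWithSignature. For brevity, the sketch does not distinguish between declared and virtual methods, and the search through super classes and java.lang.Object is omitted.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Method:
    name: str
    signature: str
    declaring_class: str


@dataclass
class Klass:
    name: str
    is_interface: bool = False
    declared_methods: List[Method] = field(default_factory=list)
    iftable: List["Klass"] = field(default_factory=list)  # implemented interfaces


def find_interface_method(klass: Klass, name: str,
                          signature: str) -> Optional[Method]:
    # [1] Only interfaces have their own declared methods searched.
    if klass.is_interface:
        for method in klass.declared_methods:
            if method.name == name and method.signature == signature:
                return method
    # [2] Walk the interface table and search every implemented interface.
    for iface in klass.iftable:
        for method in iface.declared_methods:
            if method.name == name and method.signature == signature:
                return method
    return None  # super class and java.lang.Object lookups are not modelled


SIG = "(Ljava/lang/String;)V"
logger_log = Method("log", SIG, "Logger")
logger = Klass("Logger", is_interface=True, declared_methods=[logger_log])
file_logger = Klass(
    "FileLogger",
    declared_methods=[Method("log", SIG, "FileLogger")],
    iftable=[logger],
)
```

Whether the search starts at Logger or at FileLogger, the result is the method declared in the interface, never the concrete implementation in FileLogger.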
This concludes the quick dive into method resolution. Overall, executing invoke-interface tries to find the abstract method declared in the interface based on the method index that is a fixed operand of the invoke-interface opcode. Eventually, whatever NterpGetMethod returns is cached, so that future executions that pass the dex program counter of the invoke-interface can take the fast path.
Interface Method Tables
Going back to the implementation of invoke-interface, the following code remains to be understood:
/**
* At this point, x26 is either
* - ArtMethod* describing the declared method to be invoked, or
* - Encoded method index
*/
// First argument is the 'this' pointer.
/**
* w1=index of this-register
*/
FETCH w1, 2
.if !$range
and w1, w1, #0xf
.endif
/**
* w1=this
*/
GET_VREG w1, w1
// Note: if w1 is null, this will be handled by our SIGSEGV handler.
/**
* w1=Class object of this
*/
// ============[1]
ldr w2, [x1, #MIRROR_OBJECT_CLASS_OFFSET]
// Test the first two bits of the fetched ArtMethod:
// - If the first bit is set, this is a method on j.l.Object
// - If the second bit is set, this is a default method.
/**
* Implicit assumption that ArtMethod* are 4-byte aligned.
*/
tst w26, #0x3
b.ne 3f
/**
* Case: Plain (not encoded) ArtMethod*, i.e. an abstract interface method.
* Query w3=imt_index_ from ArtMethod* of the interface method (abstract
* method).
*/
// ============[2]
ldrh w3, [x26, #ART_METHOD_IMT_INDEX_OFFSET]
2:
/**
* Use first entry of embedded vtable, i.e. Interface Method Table pointer
* with imt_index_ to select the concrete implementation of the interface
* method.
*/
// ============[3]
ldr x2, [x2, #MIRROR_CLASS_IMT_PTR_OFFSET_64]
ldr x0, [x2, w3, uxtw #3]
/**
* x0 holds concrete implementation, i.e. an ArtMethod*
*/
.if $range
b NterpCommonInvokeInterfaceRange
.else
// ============[4]
b NterpCommonInvokeInterface
.endif
Although all parts are relevant, the following main steps are taken:
- [1] w1 is equal to this, i.e. the pointer to the object used for invocation. E.g. in fl.log("Test") that would be fl, not FileLogger, not Logger and also not log! Basically, w1 contains the receiving object. Then w2 contains the pointer to the class of w1, i.e. a mirror::Class*. Furthermore, the two least significant bits of the resolved ArtMethod* (or encoded method index) are checked. This stems from NterpGetMethod, which is assumed to have simply returned an unencoded ArtMethod*. Hence, the branch is not taken.
- [2] Next, resolved_method->imt_index_ is loaded into w3. This selects the concrete implementation of the resolved method.
- [3] Then, the actual ImTable* is read from the first entry of the embedded vtable. Using imt_index_ scaled by the size of an ArtMethod*, x0 is set to the concrete implementation of the resolved abstract method.
- [4] Finally, NterpCommonInvokeInterface is used to invoke the concrete implementation.
Observe that the concrete method invocation is basically just a lookup in the ImTable*, which is similar to a vtable in C++ and stored inside the embedded vtable of the Class. A Class is referenced by an object. Therefore, if an attacker controls an object, then the attacker can also reference a fake Class and thus a fake ImTable*. Overall, hijacking an object gives an attacker control over what methods are invoked during invoke-interface!
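The dispatch and its hijacking potential can be sketched in Python as follows. All class names, the IMT slot 7 and the table size are assumptions made for illustration. The comments map each step to the marked instructions of the assembly listing.

```python
from dataclasses import dataclass
from typing import List, Optional

IMT_SIZE = 43  # assumed number of IMT slots; the exact value does not matter here


@dataclass
class ArtMethod:
    name: str
    imt_index: int = 0  # only meaningful for the resolved interface method


@dataclass
class Class:
    name: str
    imt: List[Optional[ArtMethod]]


@dataclass
class Instance:
    klass: Class  # models the class reference every object starts with


def invoke_interface(resolved: ArtMethod, receiver: Instance) -> Optional[ArtMethod]:
    klass = receiver.klass           # [1] ldr w2, [x1, #MIRROR_OBJECT_CLASS_OFFSET]
    imt_index = resolved.imt_index   # [2] ldrh w3, [x26, #ART_METHOD_IMT_INDEX_OFFSET]
    concrete = klass.imt[imt_index]  # [3] ldr x0, [x2, w3, uxtw #3]
    return concrete                  # [4] handed to NterpCommonInvokeInterface


def make_class(name: str, slot: int, method: ArtMethod) -> Class:
    imt: List[Optional[ArtMethod]] = [None] * IMT_SIZE
    imt[slot] = method
    return Class(name, imt)


# Resolved abstract method, e.g. as cached after NterpGetMethod (slot is made up).
logger_log = ArtMethod("Logger::log", imt_index=7)

file_logger = make_class("FileLogger", 7, ArtMethod("FileLogger::log"))
fake_class = make_class("FakeClass", 7, ArtMethod("UiAutomation::executeShellCommand"))

benign = Instance(file_logger)
hijacked = Instance(fake_class)  # attacker-injected object referencing a fake class
```

Calling invoke_interface(logger_log, hijacked) selects the attacker-chosen method, although the resolved method and the invoking bytecode are unchanged. Only the receiver differs.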
This of course requires knowledge of some internal values, among which are:
- class_idx of the class providing the concrete implementation. It should not be possible to inject a custom class with an invalid class_idx, because type resolution goes through either the DexCache or the associated .dex file. If neither contains the type, an error is raised.
- A valid iftable stating that a particular interface is implemented by the fake class. Technically, an iftable is not needed if the fast path is taken or the method_index can be resolved by the DexCache.
- A valid embedded_vtable, which references an ImTable in its first entry. It is possible to overlap the following vtable entries and the ImTable.
As a rule of thumb:
When creating a fake object and class, try to build structures that are as valid as feasible.
In other words, there is no need to be fancy with complex overlapping pointers etc., because the probability that some code inside the enormous ART code base validates or tries to work with the fake structures is pretty high (keyword: garbage collector). The exception to this rule of thumb is overlapping the ImTable with the embedded vtable, because it is an easy and almost foolproof way to save some space.
Goal
With the theory out of the way, we again settle for arbitrary command execution in the context of the vulnerable app. For simplicity, the goal is to eventually invoke Runtime.getRuntime().exec("<command>"). However, it is forbidden to inject bytecode into the target process. It is only allowed to inject data, i.e. objects, classes and more. Also, only existing bytecode may be reused. Naturally, using native techniques as an intermediate step to gain bytecode execution is also forbidden.
Core Idea
As mentioned before, bytecode reuse draws from various fields of offensive security. To be precise, we use
- Counterfeit Object-Oriented Programming (COOP): An exploitation technique for memory errors in C++ programs that is based on fake object injection and vtable pointer manipulation.
- Insecure Deserialization: A vulnerability that allows an attacker to determine the data to be deserialized by the target app. For this post, the idea of gadget chains is critical.
Without diving into all the rabbit holes I found myself in during research, the overall idea is to identify a good sequence of invoke-interface bytecode instructions. For example, the chain could look like this.objA.funcA(this.objB). It is important to note that the “surrounding” object represented by this is controlled by an attacker. Thus, an attacker also controls objA and objB. If an attacker controls objA, it may be possible to control which function is invoked, because the attacker can choose the composition of the ImTable of a fake object's class.
COOP vs. Gadget Chains
For those interested in or familiar with COOP, the original approach using a main loop gadget does not work well with bytecode. The most limiting factor is passing arguments from one gadget to another. Consider the following setup:
interface Observer {
    void invoke(Object data);
}
...
Observer[] observers = ...;
...
for (Observer o : observers) {
    o.invoke(...);
}
Basically, the o.invoke invocation internally uses an invoke-interface bytecode instruction. This means an attacker can inject an array of fake objects that provide their own implementations of Observer::invoke through their fake class definitions. Now, being able to execute an arbitrary list of methods is only useful if the methods either do not need to cooperate or use a shared object to pass (intermediate) results. Unfortunately, the approach of using spilled hardware registers or the stack to pass data between gadgets is not (easily) applicable to bytecode. Using vregs and vrefs does not work either, because these are cleared when nterp sets up the execution environment for a method. Therefore, at best, there is a global object referenced by the methods invoked via fake objects, or the object passed as parameter is usable in some way. Notice that both approaches drastically restrict the set of available gadgets. Being able to invoke a sequence of methods that is semantically equivalent to Runtime.getRuntime().exec("<command>") seems like a daunting, if not impossible, task.
This is the reason why we abstract away from the structure used in COOP attacks shown above, i.e. the for-loop over a fake object array. Observe that every piece of Java code that uses invoke-interface is a potential structure, including this.objA.funcA(this.objB), which could translate to this.shell.executeShellCommand(this.commandString). Again, notice the combination of COOP and gadget chains from insecure deserialization: invoke-interface uses the IMT of a class to determine the interface method implementation to invoke, and the structure gives the framework or layout to be adhered to.
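The difference in data flow between the two approaches can be illustrated with a deliberately simple Python sketch. Recorder, Structure and main_loop are hypothetical names, and Python's duck typing merely stands in for the IMT-based dispatch of invoke-interface.

```python
class Recorder:
    """Stands in for a gadget whose interface method consumes its argument."""

    def __init__(self) -> None:
        self.received = []

    def func_a(self, arg) -> None:
        self.received.append(arg)


class Structure:
    """Models this.objA.funcA(this.objB); both fields are attacker-controlled."""

    def __init__(self, obj_a, obj_b) -> None:
        self.obj_a = obj_a
        self.obj_b = obj_b

    def run(self) -> None:
        # Receiver and argument both originate from the same controlled object.
        self.obj_a.func_a(self.obj_b)


def main_loop(observers, data) -> None:
    # COOP-style main loop gadget: every gadget receives the same argument and
    # return values are dropped, so gadgets can only cooperate through `data`.
    for observer in observers:
        observer.func_a(data)


shell = Recorder()
Structure(shell, "id").run()

first, second = Recorder(), Recorder()
main_loop([first, second], "shared")
```

In the structure, a single controlled object determines both the receiver and the argument, so no data needs to be passed between separate gadgets.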
High-Level Solution
To reach the goal, the structure this.objA.funcA(this.objB) is used. An attentive reader may realize that the structure does not match Runtime.getRuntime().exec("<command>"). To make them match, objA = <Runtime instance> would be required. Unfortunately, due to garbage collection, we cannot assume that the location of a Java object in memory remains the same across all apps forked from zygote64. Creating a fake runtime is also infeasible due to the complexity and relevance of that object. However, it may be possible to create a fake object that provides a method which eventually triggers execution of Runtime.getRuntime().exec("<command>"), where the command string is also controllable.
The time-consuming search for candidate gadgets is not shown here. It was supported by static analysis, using a modified version of Topper, of the .dex files of framework.jar, which is available in every app. The classes of interest are VirtualKeyboard, UiAutomation and String.
Let's break down the overall approach. First of all, VirtualKeyboard::close provides the structure:
@Override
@RequiresPermission(android.Manifest.permission.CREATE_VIRTUAL_DEVICE)
public void close() {
    try {
        // this.objA.funcA(this.objB)
        mVirtualDevice.unregisterInputDevice(mToken);
    } catch (RemoteException e) {
        throw e.rethrowFromSystemServer();
    }
}
Mapping the structure to variable names yields:
objA = mVirtualDevice
funcA = unregisterInputDevice
objB = mToken
Now, one might argue that objects are strictly typed and thus cannot be changed to different types, even at runtime. To that end, consider the bytecode of VirtualKeyboard::close below.
[Index = 0xe8da, Offset = 0x4e85a8, Num Regs = 0x3]: public void VirtualKeyboard::close()
0000: IGET_OBJECT v0, v2, FIELD:VirtualKeyboard;->mVirtualDevice:IVirtualDevice;
0004: IGET_OBJECT v1, v2, FIELD:VirtualKeyboard;->mToken:IBinder;
0008: INVOKE_INTERFACE {v0, v1}, METHOD:IVirtualDevice;->unregisterInputDevice(IBinder;)V
000e: NOP
0010: RETURN_VOID
0012: MOVE_EXCEPTION v0
0014: INVOKE_VIRTUAL {v0}, METHOD:RemoteException;->rethrowFromSystemServer()RuntimeException;
001a: MOVE_RESULT_OBJECT v1
001c: THROW v1
Note: The annotations Override and RequiresPermission do not seem to be enforced at runtime or impact method invocation in any way, which seems to align with their definition.
Unless VirtualKeyboard::close throws an exception, the method really consists of only 4 relevant instructions:
0000: IGET_OBJECT v0, v2, FIELD:VirtualKeyboard;->mVirtualDevice:IVirtualDevice;
0004: IGET_OBJECT v1, v2, FIELD:VirtualKeyboard;->mToken:IBinder;
0008: INVOKE_INTERFACE {v0, v1}, METHOD:IVirtualDevice;->unregisterInputDevice(IBinder;)V
0010: RETURN_VOID
Notice that iget-object vA, vB, field@CCCC does exactly as the name suggests: it loads the field with index CCCC of the object referenced by vB into vreg vA. The last spark of hope is that iget-object checks the type of the fields it operates on. From the fundamentals we know that method resolution using invoke-interface does not really care about the type of the involved objects, but rather only looks at the ImTable of the class of the receiving object. Now, let's rip apart the illusion of type checks at runtime by considering the implementation of iget-object:
%def op_iget_object():
% op_iget(load="ldr", volatile_load="ldar", maybe_extend="", wide="0", is_object="1")
%def op_iget(load="ldr", volatile_load="ldar", maybe_extend="", wide="0", is_object="0"):
% slow_path = add_slow_path(op_iget_slow_path, volatile_load, maybe_extend, wide, is_object)
% fetch_from_thread_cache("x0", miss_label=slow_path)
.L${opcode}_resume:
lsr w2, wINST, #12 // w2<- B
GET_VREG w3, w2 // w3<- object we're operating on
ubfx w2, wINST, #8, #4 // w2<- A
cbz w3, common_errNullObject // object was null
.if $wide
$load x0, [x3, x0]
SET_VREG_WIDE x0, w2 // fp[A] <- value
.elseif $is_object // ===================[1]
$load w0, [x3, x0] // ===================[2]
TEST_IF_MARKING .L${opcode}_read_barrier
.L${opcode}_resume_after_read_barrier:
SET_VREG_OBJECT w0, w2 // fp[A] <- value
.else
$load w0, [x3, x0]
SET_VREG w0, w2 // fp[A] <- value
.endif
FETCH_ADVANCE_INST 2 // ===================[3]
GET_INST_OPCODE ip
GOTO_OPCODE ip
.if $is_object
.L${opcode}_read_barrier:
bl art_quick_read_barrier_mark_reg00
b .L${opcode}_resume_after_read_barrier
.endif
While there is nothing more refreshing than reading ARM assembly mixed with custom macros whose definitions are sprinkled over various files, below is the short version:
- After passing the caching mechanism, which sets x0 to the field offset in memory and is also used in invoke-interface, it is checked what kind of field is moved from the object in vB to vA. The code distinguishes between wide types like long and double, objects, and the rest. As iget-object sets is_object = 1 when calling into op_iget, the object path is taken.
- Access the field at offset x0 relative to the base of the object referenced by vB. This loads a 32-bit address into w0. Observe that x0 >= 8, because objects have predefined klass_ and monitor_ fields.
- Continue with the next instruction.
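To make the missing type check tangible, the fast path can be modelled in a few lines of Python. This is only a sketch under assumptions: memory is a flat bytearray, the field offset is passed in directly instead of coming from the cache, and the fake object layout and all addresses are made up for illustration.

```python
import struct

IGET_OBJECT = 0x54  # Dalvik opcode for iget-object (format 22c)


def decode_22c(inst_word):
    """Split the first 16-bit code unit into (opcode, A, B)."""
    opcode = inst_word & 0xFF
    a = (inst_word >> 8) & 0xF   # mirrors: ubfx w2, wINST, #8, #4
    b = inst_word >> 12          # mirrors: lsr  w2, wINST, #12
    return opcode, a, b


def iget_object_fast_path(memory, vregs, inst_word, field_offset):
    """Model of the cached path: null check, 32-bit load, no type check."""
    opcode, a, b = decode_22c(inst_word)
    assert opcode == IGET_OBJECT
    obj = vregs[b]
    if obj == 0:
        raise ValueError("null object")  # stands in for common_errNullObject
    (value,) = struct.unpack_from("<I", memory, obj + field_offset)
    vregs[a] = value                     # fp[A] <- value
    return vregs


# Hypothetical fake object at address 0x10: klass_ (4 bytes), monitor_ (4 bytes),
# followed by an attacker-chosen 32-bit "reference" at offset 8.
memory = bytearray(64)
struct.pack_into("<III", memory, 0x10, 0xDEAD0000, 0, 0x41414140)

# iget-object v0, v2, field@CCCC  ->  B = 2, A = 0
inst_word = (2 << 12) | (0 << 8) | IGET_OBJECT
vregs = iget_object_fast_path(memory, [0, 0, 0x10], inst_word, field_offset=8)
```

Whatever 32-bit value sits at the given offset ends up in vA. Nothing in this path looks at the class of the object or at the declared type of the field.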
The above code only tells half of the story, because a cache miss means the ArtField
must be resolved using the slow path through nterp_get_instance_field_offset
. Notice that even if the slow path performed type checks, any repeated execution of the iget-object
instructions in VirtualKeyboard::close
would use the fast path, unless their cache entries are evicted. Of course, it may be sufficient to only check the type in the slow path and then assume its correctness in the fast path.
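The following toy model illustrates why a check that lives only in the slow path does not help once objects can be swapped. It assumes a cache keyed by the address of the instruction, and the type check in the resolver is purely hypothetical. Nothing here claims that the real slow path performs such a check.

```python
class FieldOffsetCache:
    """Toy stand-in for the cache that is consulted before the slow path."""

    def __init__(self):
        self._entries = {}
        self.slow_path_calls = 0

    def lookup(self, inst_addr, receiver_type, resolve):
        if inst_addr in self._entries:        # fast path: cache hit
            return self._entries[inst_addr]
        self.slow_path_calls += 1             # slow path: resolve the field
        offset = resolve(receiver_type)
        self._entries[inst_addr] = offset
        return offset


def resolve_with_type_check(receiver_type):
    # Hypothetical check, only present in this sketch.
    if receiver_type != "VirtualKeyboard":
        raise TypeError("unexpected receiver type")
    return 8  # illustrative field offset


cache = FieldOffsetCache()
# First execution with the genuine object populates the cache.
first = cache.lookup(0x1000, "VirtualKeyboard", resolve_with_type_check)
# Later execution of the same instruction with a swapped-in fake receiver.
second = cache.lookup(0x1000, "FakeObject", resolve_with_type_check)
```

The second lookup never reaches the resolver, so the fake receiver is served the cached offset without being inspected.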
Continuing with the high-level approach, the structure provided by VirtualKeyboard::close
is very dynamic at runtime, allowing an attacker to replace not only the objects but also their classes and thus the methods invoked on them. Looking into UiAutomation
reveals why VirtualKeyboard::close
is a suitable candidate for shell invocation:
public ParcelFileDescriptor executeShellCommand(String command) {
warnIfBetterCommand(command);
ParcelFileDescriptor source = null;
ParcelFileDescriptor sink = null;
try {
ParcelFileDescriptor[] pipe = ParcelFileDescriptor.createPipe();
source = pipe[0];
sink = pipe[1];
// Calling out without a lock held.
mUiAutomationConnection.executeShellCommand(command, sink, null);
} catch (IOException ioe) {
Log.e(LOG_TAG, "Error executing shell command!", ioe);
} catch (RemoteException re) {
Log.e(LOG_TAG, "Error executing shell command!", re);
} finally {
IoUtils.closeQuietly(sink);
}
return source;
}
Surprisingly, UiAutomation
provides a convenience method for invoking shell commands. Compared to calling Runtime.getRuntime().exec
directly, it is a lot easier to create a fake UiAutomation
object than it is to create a fake Runtime
instance.
The key component in the above Java code is mUiAutomationConnection.executeShellCommand(command, sink, null)
. Without showing the entire call stack, eventually, the following code
is called:
public void executeShellCommandWithStderr(final String command, final ParcelFileDescriptor sink,
final ParcelFileDescriptor source, final ParcelFileDescriptor stderrSink)
throws RemoteException {
synchronized (mLock) {
throwIfCalledByNotTrustedUidLocked();
throwIfShutdownLocked();
throwIfNotConnectedLocked();
}
final java.lang.Process process;
try {
process = Runtime.getRuntime().exec(command);
} catch (IOException exc) {
throw new RuntimeException("Error running shell command '" + command + "'", exc);
}
...
}
Basically, if an attacker is able to pass the methods throwIfCalledByNotTrustedUidLocked
, throwIfShutdownLocked
and throwIfNotConnectedLocked
without crashing the app, then an attacker-chosen command will be executed. For simplicity, we do not care what happens after command execution. That is, crashing the app after successful command execution is enough to prove that bytecode reuse attacks are possible.
Without further ado, consider the critical methods:
private void throwIfShutdownLocked() {
if (mIsShutdown) {
throw new IllegalStateException("Connection shutdown!");
}
}
private void throwIfNotConnectedLocked() {
if (!isConnectedLocked()) { // Returns: this.mClient != null
throw new IllegalStateException("Not connected!");
}
}
private void throwIfCalledByNotTrustedUidLocked() {
final int callingUid = Binder.getCallingUid();
if (callingUid != mOwningUid && mOwningUid != Process.SYSTEM_UID
&& callingUid != 0 /*root*/) {
throw new SecurityException("Calling from not trusted UID!");
}
}
Bypassing these checks is trivial, because the mUiAutomationConnection
object of UiAutomation
is also attacker-controlled. Therefore, setting the fields appropriately allows passing the checks. For example, throwIfNotConnectedLocked
tries to enforce that this.mClient != null
before the command is executed. Internally, this simply compares the field value in the mUiAutomationConnection
object with 0
. This means that setting mClient = 1
bypasses the check, although this is not a valid reference.
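The three checks can be replayed on a model of the fake connection object. This is a sketch: the field names are taken from the Java code above, whereas the concrete values (mOwningUid set to SYSTEM_UID, the calling UID 10123) are just one possible choice and are not taken from the PoC.

```python
SYSTEM_UID = 1000  # value of android.os.Process.SYSTEM_UID


class FakeConnection:
    """Attacker-chosen field values of the fake UiAutomationConnection."""

    def __init__(self, is_shutdown, client, owning_uid):
        self.mIsShutdown = is_shutdown
        self.mClient = client
        self.mOwningUid = owning_uid


def passes_checks(conn, calling_uid):
    """Return True if none of the three throwIf* checks would throw."""
    untrusted = (calling_uid != conn.mOwningUid
                 and conn.mOwningUid != SYSTEM_UID
                 and calling_uid != 0)
    if untrusted:
        return False           # throwIfCalledByNotTrustedUidLocked
    if conn.mIsShutdown:
        return False           # throwIfShutdownLocked
    if conn.mClient == 0:      # the reference is only compared against null (0)
        return False           # throwIfNotConnectedLocked
    return True


# mClient = 1 is not a valid reference, yet it is non-null.
fake = FakeConnection(is_shutdown=0, client=1, owning_uid=SYSTEM_UID)
ok = passes_checks(fake, calling_uid=10123)  # 10123: arbitrary app UID
zeroed = passes_checks(FakeConnection(0, 0, 0), calling_uid=10123)
```

In this model, an all-zero fake object fails the checks, while three deliberately chosen field values are enough to pass them.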
Let's conclude with a visualization of the entire high-level approach. First of all, we start off with the correct structure. Method invocation works with the correct objects and classes.
However, after the objects have been replaced with their fake counterparts, the invocation looks as shown below. Observe that the overall structure remains the same; only objects, classes and associated method implementations change.
As the latter image already shows, when looking for the method to invoke via invoke-interface
, nterp
uses the fake class and eventually the .dex
file associated with that class. After the abstract method has been resolved, the abstract method’s imt_index_
field is used to choose the correct ImTable
entry. Therefore, the ImTable
may be shrunk to only account for the lookup of the entry at index imt_index_
. Because nterp
uses whatever method is found in the ImTable
, invocation of executeShellCommand
is inevitable.
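The dispatch described above can be reduced to a toy model. This is a strong simplification: the classes below only mimic the roles of ArtMethod, the class object and its ImTable, the index value 7 and the class name VirtualDeviceImpl are illustrative, and resolving the abstract method through the .dex file is skipped entirely.

```python
class ArtMethod:
    def __init__(self, name, imt_index=None):
        self.name = name
        self.imt_index = imt_index


class Klass:
    def __init__(self, name, imt):
        self.name = name
        self.imt = imt  # maps an ImTable index to an ArtMethod


class Obj:
    def __init__(self, klass):
        self.klass = klass


def invoke_interface(receiver, interface_method):
    """Pick the target purely from the receiver's class, without type checks."""
    return receiver.klass.imt[interface_method.imt_index]


IMT_INDEX = 7  # illustrative value only
unregister = ArtMethod("IVirtualDevice.unregisterInputDevice", IMT_INDEX)

# Genuine receiver: the table entry points at the expected implementation.
real = Obj(Klass("VirtualDeviceImpl",
                 {IMT_INDEX: ArtMethod("unregisterInputDevice impl")}))
# Fake receiver: a table that only contains the single entry that is looked up.
fake = Obj(Klass("FakeUiAutomation",
                 {IMT_INDEX: ArtMethod("UiAutomation.executeShellCommand")}))

real_target = invoke_interface(real, unregister).name
fake_target = invoke_interface(fake, unregister).name
```

The call site is identical in both cases. Only the table reachable from the receiver decides which method runs, which is why a single valid entry at imt_index_ suffices in the fake class.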
Long story short, not only is an attacker able to inject fake objects, classes, ArtMethod
s, ArtField
s and tables, but it is also possible to force existing bytecode to operate on those fake structures. In a nutshell, an attacker can convert the WWW condition into a type confusion and trick the interpreter into working with custom objects, causing invocation of arbitrary methods at runtime (of course, method signatures should match).
Kicking Off Execution
Building on the blog posts covering Android basics for bytecode exploitation, the GRANDFATHERED
map can be used to kick off execution of the chain that executes a shell command. To that end, reconsider the following code:
class LanguageTag {
...
private static final Map<String, String[]> GRANDFATHERED = new HashMap<>();
...
public static LanguageTag parse(String languageTag, ParseStatus sts) {
...
String[] gfmap = GRANDFATHERED.get(LocaleUtils.toLowerString(languageTag));
...
}
}
Observe that the invocation GRANDFATHERED.get
uses invoke-interface
again, because GRANDFATHERED
is a HashMap
, but ::get
is declared in Map
interface. Therefore, using a similar approach as discussed above, GRANDFATHERED
can be replaced with an instance of VirtualKeyboard
, and the invocation of ::get
can be redirected to VirtualKeyboard::close
. Setting up the VirtualKeyboard
instance to contain instances of UiAutomation
and String
for mVirtualDevice
and mToken
, respectively, allows kicking off command execution. What is more, LanguageTag::parse
is most likely called inside a lifecycle
method like onStop
, which guarantees execution of the gadget chains.
Because GRANDFATHERED
seems to be located inside the boot.art
memory region, which again seems to be shared by all apps, it is a relatively stable target to abuse in the test environment.
Proof of Concept
The concrete PoC code is about 1000 LoC, because we need to respect structures like mirror::Class
etc. Encoding these structures in Python bloats up the PoC. However, the quintessence is exactly what is discussed in the above sections. For a visual proof, consider the following PoC
video.
It is important to note that the memory region in which fake objects are built up using the WWW condition must lie in a 32-bit address range, because references to objects and classes must be 32-bit addresses. In case of the above video, that region is [anon:.bss]
. The other memory regions are used to reference existing bytecode (framework.jar
), reference the interpreter handler ExecuteNterpImpl
to construct valid ArtMethod
instances (libart.so
), kick off gadget chain execution via GRANDFATHERED
(boot.art
) and reference a valid DexCache
instance initialized by zygote64
(boot-framework.art
). All of these memory regions have been confirmed to be duplicated upon fork in a previous blog post using maps diffing.
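The 32-bit constraint can be checked mechanically when picking a region from a maps listing. The following is a minimal sketch, not the PoC's code: the helper names are hypothetical and the two example lines are made up, only the line format is the usual one of /proc/<pid>/maps.

```python
import struct

MAX_REF = 0xFFFFFFFF  # largest address a 32-bit reference can hold


def parse_maps_line(line):
    """Parse 'start-end perms offset dev inode [path]' from /proc/<pid>/maps."""
    fields = line.split(maxsplit=5)
    start, end = (int(x, 16) for x in fields[0].split("-"))
    path = fields[5].strip() if len(fields) > 5 else ""
    return start, end, fields[1], path


def usable_for_fake_objects(line):
    """True for a writable region whose every address fits into 32 bits."""
    start, end, perms, _ = parse_maps_line(line)
    return "w" in perms and end - 1 <= MAX_REF


def encode_reference(addr):
    """Encode an address as a little-endian 32-bit reference."""
    if addr > MAX_REF:
        raise ValueError("address does not fit into a 32-bit reference")
    return struct.pack("<I", addr)


# Made-up example lines, not taken from the PoC.
low = "12c00000-12e00000 rw-p 00000000 00:00 0    [anon:.bss]"
high = "7b2f400000-7b2f600000 rw-p 00000000 00:00 0    [anon:example]"
```

A fake object placed in the second region could not be referenced at all, because its address cannot be expressed as a 32-bit value.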
Of course, a better PoC would be to construct a malicious Android app that attacks the victim app. However, creating parsers for e.g. boot.art
to spot GRANDFATHERED
dynamically is considered a lot of busywork and does not show more than the external Python script mimicking a local app.
Potential Solutions
From a security perspective, multiple mitigations come to mind:
- Enforce type checks at runtime.
- Use a kind of random token (like a CSRF token) sampled after the app is forked from zygote64. Then, each object holds that token as a field next to monitor_ and klass_. Upon usage of an object, the object’s token is compared to the original random token. If both tokens match, execution continues. Otherwise, the app is aborted.
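The token idea can be sketched as follows. This is a toy model under assumptions: the token width of 8 bytes, the class ObjectHeader and the function check_object are hypothetical, and a real implementation would live inside the runtime rather than in Python.

```python
import hmac
import secrets

# Sampled once per process, i.e. after the fork from zygote64 in this model.
PROCESS_TOKEN = secrets.token_bytes(8)


class ObjectHeader:
    """Toy object header: klass_, monitor_ and the additional token field."""

    def __init__(self, klass, token):
        self.klass = klass
        self.monitor = 0
        self.token = token


def check_object(obj):
    """Abort (here: raise) when the object's token is not the process token."""
    if not hmac.compare_digest(obj.token, PROCESS_TOKEN):
        raise RuntimeError("token mismatch: aborting")
    return obj


# Object created by the runtime itself carries the process token.
legit = ObjectHeader("VirtualKeyboard", PROCESS_TOKEN)
# Fake object built by an attacker who has to guess the token (here: zeros).
forged = ObjectHeader("FakeVirtualKeyboard", bytes(8))
```

In this model, a fake object is only accepted if the attacker knows or guesses the token, and the check has to run on every use of an object for it to be effective.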
Of course, these mitigations do not take into account the performance overhead introduced by all the checks. If every bytecode instruction validated a random token, performance would probably be a lot worse. On the other hand, one may argue that the interpreter started off too greedily with regard to performance, and that such security checks are legitimate. This is a common tradeoff: security vs. performance. Luckily, the techniques discussed in this series of blog posts are fairly hard to pull off, which severely reduces practicality.
Responsible Disclosure
All research results, including working PoCs, have been submitted to Google’s bug bounty program to ensure that publishing these blog posts does not cause any severe security problems and to give Google time to investigate the findings and respond, if necessary. Of course, there are no concrete vulnerabilities, but rather a new exploitation concept on Android. Also, I find it hard to estimate the practical impact of these blog posts, because many stars must align for bytecode injection and reuse to work, which is why I welcomed the feedback. Fortunately, Google decided the results are not a security concern and gave permission to publish blog posts on that matter!
Summary
This concludes our journey through the land of bytecode-based exploitation on Android! Here, the more advanced bytecode reuse technique is discussed, along with the fundamentals necessary to grasp all concepts described. Also, some security mechanisms that immediately come to mind are mentioned without taking their performance impact into account.
Naturally, there is a lot more to be discovered about bytecode execution and exploitation on Android. The series of blog posts on Android bytecode is the result of about 1.5 years of part-time research, with some distractions along the way. Hence, the blog posts do not contain everything discovered or tested, but only the most interesting cherries! I stopped counting the rabbit holes I followed that did not provide any results or at best “funny” facts, such as throw-oriented programming.
Overall, I learned that security research on a well-known operating system like Android is similar to walking the corridor in Hilbert’s hotel: infinite options, so you really need to choose the doors you open wisely. Regardless, persistence is key to finding something interesting, so keep learning, researching and hacking! ;)