Secure coding and more

Monday, January 2, 2023

The Segmented Memory Model and How It Works in Windows x64

I wrote this post as part of my journey to become more familiar with the Intel architecture. Segmentation is a very important topic in the Intel architecture, so here is my contribution. For my experiments I'll use an x64 Windows 10 system running in a VM with a kernel debugger attached.

Modes of Operation

The first step is to identify the processor's mode of operation. x64 supports various modes and memory models, so let's identify the current one. This information is stored in the CR0 control register ([1]), in the PE flag at bit position 0 (position 0 is the least significant bit (LSB), that is, the right-most bit). If this bit is set, we are running in protected mode; otherwise we are running in real-address mode. Let's use the kernel debugger to perform this check, as shown in Figure 1.

kd> .formats cr0
Evaluate expression:
  Hex:     00000000`80050031
  Decimal: 2147811377
  Decimal (unsigned) : 2147811377
  Octal:   0000000000020001200061
  Binary:  00000000 00000000 00000000 00000000 10000000 00000101 00000000 00110001
  Chars:   .......1
  Time:    ***** Invalid
  Float:   low -4.59246e-040 high 0
  Double:  1.06116e-314

Figure 1. Operation Mode Identification

The CR0.PE bit is set to 1, so we are running in protected mode using a segmented memory model (you might also notice that the CR0.PG bit, at position 31, is set, indicating that paging is enabled as well). We can also check the sub-mode of operation by reading the IA32_EFER Model-Specific Register (MSR) (0xC0000080) ([2]) and checking the LME (bit position 8) and LMA (bit position 10) flags. You can see the result in Figure 2.

kd> rdmsr 0xC0000080
msr[c0000080] = 00000000`00000d01
kd> .formats 00000000`00000d01
Evaluate expression:
  Hex:     00000000`00000d01
  Decimal: 3329
  Decimal (unsigned) : 3329
  Octal:   0000000000000000006401
  Binary:  00000000 00000000 00000000 00000000 00000000 00000000 00001101 00000001
  Chars:   ........
  Time:    Thu Jan  1 01:55:29 1970
  Float:   low 4.66492e-042 high 0
  Double:  1.64474e-320

Figure 2. Operation Sub-Mode Identification

The IA32_EFER.LMA and IA32_EFER.LME bits are set, so we are running in IA-32e sub-mode (64-bit). This information will be used later in the text.
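The two checks above can also be reproduced outside the debugger. The following is a minimal Python sketch, not part of the original experiment: the helper names bit and describe_mode are made up for illustration, and the bit positions and register values are the ones quoted in the text and in Figures 1 and 2.

```python
# Bit positions taken from the text above
# (CR0: PE = 0, PG = 31; IA32_EFER: LME = 8, LMA = 10).
CR0_PE = 0
CR0_PG = 31
EFER_LME = 8
EFER_LMA = 10


def bit(value: int, position: int) -> int:
    """Return the bit of `value` at `position` (0 is the least significant bit)."""
    return (value >> position) & 1


def describe_mode(cr0: int, efer: int) -> str:
    """Summarise the operating mode from raw CR0 and IA32_EFER values."""
    if not bit(cr0, CR0_PE):
        return "real-address mode"
    if bit(efer, EFER_LME) and bit(efer, EFER_LMA):
        return "protected mode, IA-32e sub-mode"
    return "protected mode"


cr0 = 0x80050031   # value shown in Figure 1
efer = 0x00000D01  # value shown in Figure 2

print("PE =", bit(cr0, CR0_PE), "PG =", bit(cr0, CR0_PG))
print("LME =", bit(efer, EFER_LME), "LMA =", bit(efer, EFER_LMA))
print(describe_mode(cr0, efer))
```

Reading single bits with a shift and a mask is the same operation that we perform by eye on the Binary line of the .formats output.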

Segmented Memory Model

The segmented memory model accesses memory through segments. A segment provides the information needed to translate a given address. Which segment is involved depends on the executed instruction (e.g. the code segment is used for a call instruction, while the stack segment is used for the push and pop instructions). The Intel architecture defines a total of six segment registers: CS, DS, ES, SS, FS, and GS. Let's see how this works with a practical example by considering the instruction in Figure 3.

00007FFD42C7D5C1 | E8 1A000000  | call kernelbase.7FFD42C7D5E0

Figure 3. How Segmentation Works

The call instruction encodes its target in the 4 bytes 1A 00 00 00 which, read as a little-endian value, give the displacement 0x1a. This displacement is relative to the address of the next instruction, which explains why the function address in the disassembly is 0x7FFD42C7D5E0 (0x7FFD42C7D5C1 (RIP) + 0x05 (instruction size) + 0x1a (displacement)). In addition to this address, the value of the CS segment is also used. The combination of CS with the function address is called the logical address. The segment is then used to translate the logical address into what is known as the virtual address (called the linear address in the Intel manuals; this process is described in the next section). Since our system is using paging, an additional translation step is performed to translate the virtual address into the physical address (this topic is not covered in this post). All the translation steps are represented in Figure 4.
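The target computation can be checked with a few lines of Python. This is only a sketch: the function name call_rel32_target is made up for illustration, and it handles only the 5-byte E8 form of the call instruction shown in Figure 3.

```python
def call_rel32_target(instruction_address: int, encoding: bytes) -> int:
    """Compute the target of a near `call rel32` (opcode E8) instruction."""
    if len(encoding) != 5 or encoding[0] != 0xE8:
        raise ValueError("expected a 5-byte E8 call instruction")
    # The 4 bytes after the opcode are a signed little-endian displacement.
    displacement = int.from_bytes(encoding[1:], "little", signed=True)
    # The displacement is relative to the address of the next instruction.
    next_instruction = instruction_address + len(encoding)
    # Keep the result within 64 bits.
    return (next_instruction + displacement) & 0xFFFFFFFFFFFFFFFF


target = call_rel32_target(0x00007FFD42C7D5C1, bytes.fromhex("E8 1A 00 00 00"))
print(hex(target))
```

Treating the displacement as signed matters: a call to a function located before the call site uses a negative displacement.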

Figure 4. Logical to Physical Address Translation

How Segmentation Works

The segment registers are 16-bit registers whose structure is reported in Figure 5.

Figure 5. Segment Selector Format

The Index field is used as an index into a table that contains information on all the available segments. The TI flag indicates which table must be used, and the Requested Privilege Level (RPL) field specifies the protection level of the code requesting access to a specific segment. The possible protection level values are 0, 1, 2 and 3, and they are often represented as protection rings, where ring 0 is the most privileged (where kernel-mode code is executed) and ring 3 is the least privileged (where user-mode code is executed).

The two tables that contain information on the segments are the Global Descriptor Table (GDT) and the Local Descriptor Table (LDT). The GDTR and LDTR registers contain the base address of the respective table. In the latest Windows versions the LDT is no longer used, so the TI flag will always be 0. The GDT is an array of segment descriptors, where each segment descriptor is typically represented by the 64-bit structure reported in Figure 6.

Figure 6. Segment Descriptor Format

Given the segment descriptor definition, we can now explain how the logical address to virtual address translation is performed. The Base field is added to the offset part of the logical address in order to obtain the virtual address. This process is described in Figure 7.

Figure 7. Segment Descriptor Usage in Address Translation

A very important field is DPL (Descriptor Privilege Level). It indicates the privilege level of the segment; for example, code running in a code segment with a DPL value of 0 can execute privileged instructions such as CLI. Another relevant field is L. For a code segment, this field indicates whether the code runs in long mode (if it is set to 1) or in compatibility mode (if it is set to 0). Figure 8 shows how to inspect the GDT and all the defined segments.

kd> rgdtr
gdtr=fffff804382f3fb0
kd> db fffff804382f3fb0 
fffff804`382f3fb0  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
fffff804`382f3fc0  00 00 00 00 00 9b 20 00-00 00 00 00 00 93 40 00  ...... .......@.
fffff804`382f3fd0  ff ff 00 00 00 fb cf 00-ff ff 00 00 00 f3 cf 00  ................
fffff804`382f3fe0  00 00 00 00 00 fb 20 00-00 00 00 00 00 00 00 00  ...... .........
fffff804`382f3ff0  67 00 00 20 2f 8b 00 38-04 f8 ff ff 00 00 00 00  g.. /..8........
fffff804`382f4000  00 3c 00 00 00 f3 40 00-00 00 00 00 00 00 00 00  .<....@.........
fffff804`382f4010  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
fffff804`382f4020  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
kd> dg 10 50
                                                    P Si Gr Pr Lo
Sel        Base              Limit          Type    l ze an es ng Flags
---- ----------------- ----------------- ---------- - -- -- -- -- --------
0010 00000000`00000000 00000000`00000000 Code RE Ac 0 Nb By P  Lo 0000029b
0018 00000000`00000000 00000000`00000000 Data RW Ac 0 Bg By P  Nl 00000493
0020 00000000`00000000 00000000`ffffffff Code RE Ac 3 Bg Pg P  Nl 00000cfb
0028 00000000`00000000 00000000`ffffffff Data RW Ac 3 Bg Pg P  Nl 00000cf3
0030 00000000`00000000 00000000`00000000 Code RE Ac 3 Nb By P  Lo 000002fb
0038 00000000`00000000 00000000`00000000  0 Nb By Np Nl 00000000
0040 00000000`382f2000 00000000`00000067 TSS32 Busy 0 Nb By P  Nl 0000008b
0048 00000000`0000ffff 00000000`0000f804  0 Nb By Np Nl 00000000
0050 00000000`00000000 00000000`00003c00 Data RW Ac 3 Bg By P  Nl 000004f3

Figure 8. Dumping All Segments

The first two commands read the GDT base address from the GDTR register and dump the memory at that address. The first non-null entry is at offset 0x10 from the GDT base address (the first entry in the GDT is always null). For a more readable view, we can use the dg command, which dumps the segments in the given selector range and shows the relevant information. There are various Code and Data segments, with privilege level 0 (kernel mode) or 3 (user mode).

In particular, there is a user-mode code segment that runs in 32-bit compatibility mode (Long=0); its segment selector is 0x20. Similarly, there is a user-mode code segment that runs in long mode (Long=1); its segment selector is 0x30.
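To see where the columns printed by dg come from, the raw 8 bytes of a descriptor can be decoded by hand. The sketch below is illustrative only: the class and function names are made up, the field positions follow the legacy 8-byte descriptor layout of Figure 6, and the input bytes are copied from the memory dump in Figure 8.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SegmentDescriptor:
    base: int
    limit: int          # raw 20-bit limit, before granularity scaling
    type: int
    s: int              # 1 = code/data segment, 0 = system segment
    dpl: int
    present: int
    long_mode: int      # the L flag
    default_big: int    # the D/B flag
    granularity: int    # the G flag


def decode_descriptor(raw: bytes) -> SegmentDescriptor:
    """Decode a legacy 8-byte segment descriptor."""
    if len(raw) != 8:
        raise ValueError("a legacy segment descriptor is 8 bytes long")
    q = int.from_bytes(raw, "little")
    return SegmentDescriptor(
        base=((q >> 16) & 0xFFFFFF) | (((q >> 56) & 0xFF) << 24),
        limit=(q & 0xFFFF) | (((q >> 48) & 0xF) << 16),
        type=(q >> 40) & 0xF,
        s=(q >> 44) & 1,
        dpl=(q >> 45) & 3,
        present=(q >> 47) & 1,
        long_mode=(q >> 53) & 1,
        default_big=(q >> 54) & 1,
        granularity=(q >> 55) & 1,
    )


def effective_limit(d: SegmentDescriptor) -> int:
    """Scale the limit to 4 KiB pages when the G flag is set."""
    return (d.limit << 12) | 0xFFF if d.granularity else d.limit


# Bytes copied from the dump in Figure 8, keyed by their offset in the GDT.
GDT_DUMP = {
    0x10: "00 00 00 00 00 9b 20 00",
    0x20: "ff ff 00 00 00 fb cf 00",
    0x30: "00 00 00 00 00 fb 20 00",
}

for offset, hex_bytes in GDT_DUMP.items():
    d = decode_descriptor(bytes.fromhex(hex_bytes))
    print(f"{offset:#06x}: base={d.base:#x} limit={effective_limit(d):#x} "
          f"DPL={d.dpl} L={d.long_mode}")
```

The TSS entry at offset 0x40 is a 16-byte system descriptor and is not handled by this sketch.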

Windows and the Flat Memory Model

You might have heard that Windows uses a flat memory model, yet we stated above that we are running in a segmented memory model. What does this mean? By now, you know how a segment descriptor is used to compute the virtual address, and we have dumped all the segment descriptors defined in the system. You might have noticed that all the Code and Data segments have a Base field of 0. This implies that Windows is not taking advantage of segmentation: with a Base that is always 0, the offset in the logical address is equal to the virtual address. In other words, we are using a segmented memory model without actually using segments. This configuration is known as the flat memory model. The official Intel documentation says the same:

In 64-bit mode, segmentation is generally (but not completely) disabled, creating a flat 64-bit linear-address space. The processor treats the segment base of CS, DS, ES, SS as zero, creating a linear address that is equal to the effective address. The FS and GS segments are exceptions. These segment registers (which hold the segment base) can be used as additional base registers in linear address calculations. They facilitate addressing local data and certain operating system data structures. Note that the processor does not perform segment limit checks at runtime in 64-bit mode.
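The effect of a zero base can be shown with a tiny Python sketch. The function name linear_address is made up for illustration, and the GS base used in the second call is a hypothetical value, not one read from the system under test.

```python
MASK64 = (1 << 64) - 1


def linear_address(segment_base: int, offset: int) -> int:
    """Add the segment base to the offset, wrapping at 64 bits."""
    return (segment_base + offset) & MASK64


offset = 0x00007FFD42C7D5E0

# CS, DS, ES and SS are treated as having base 0: the address is unchanged.
print(hex(linear_address(0, offset)))

# FS and GS keep a real base. The value below is hypothetical.
hypothetical_gs_base = 0x000000D5A3B2F000
print(hex(linear_address(hypothetical_gs_base, 0x28)))
```

With a non-zero base, the same small offset lands inside whatever structure the base points to, which is what makes GS-relative accesses convenient for per-thread data.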

Decoding a Segment Register

Let's try decoding the value stored in a segment register. Consider the CS register with value 0x33. In binary format this value is 00110011b. As described in Figure 5, bits 3-15 hold the index into the GDT, which in this case has decimal value 6 (110b). The remaining bits are TI = 0 (use the GDT) and RPL = 3 (11b). To obtain the offset of the segment descriptor inside the GDT, we have to multiply the index by the size of a segment descriptor, which is 8 bytes. Figure 9 shows this operation in the kernel debugger.

kd> .formats 0x33
Evaluate expression:
  Hex:     00000000`00000033
  Decimal: 51
  Decimal (unsigned) : 51
  Octal:   0000000000000000000063
  Binary:  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00110011
  Chars:   .......3
  Time:    Thu Jan  1 01:00:51 1970
  Float:   low 7.14662e-044 high 0
  Double:  2.51973e-322
kd> dq gdtr + (6 * 8) L1
fffff804`382f3fe0  0020fb00`00000000
Figure 9. Obtain the Segment Selector
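The same decoding can be written as a short Python sketch. The names Selector and decode_selector are made up for illustration; the field layout is the one from Figure 5 and the descriptor size is the 8 bytes mentioned above.

```python
from dataclasses import dataclass

DESCRIPTOR_SIZE = 8  # bytes per GDT entry, as stated in the text


@dataclass(frozen=True)
class Selector:
    index: int  # bits 3-15: index into the descriptor table
    ti: int     # bit 2: 0 = GDT, 1 = LDT
    rpl: int    # bits 0-1: requested privilege level

    @property
    def table_offset(self) -> int:
        """Byte offset of the descriptor inside the table."""
        return self.index * DESCRIPTOR_SIZE


def decode_selector(value: int) -> Selector:
    """Split a 16-bit segment selector into its three fields."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("a segment selector is a 16-bit value")
    return Selector(index=value >> 3, ti=(value >> 2) & 1, rpl=value & 3)


for raw in (0x10, 0x23, 0x33):
    sel = decode_selector(raw)
    print(f"{raw:#06x}: index={sel.index} TI={sel.ti} RPL={sel.rpl} "
          f"offset={sel.table_offset:#x}")
```

Since the TI and RPL fields occupy the three low bits, clearing them (value & 0xFFF8) gives the same offset as multiplying the index by 8.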

The segment descriptor value is 0020fb00`00000000. Now, let's use the dg and dt commands to display the segment descriptor associated with index 6, by using the operation 6 * 8 = 48 (0x30). The result is reported in Figure 10.

kd> dg 30
                                                    P Si Gr Pr Lo
Sel        Base              Limit          Type    l ze an es ng Flags
---- ----------------- ----------------- ---------- - -- -- -- -- --------
0030 00000000`00000000 00000000`00000000 Code RE Ac 3 Nb By P  Lo 000002fb
kd> dt nt!_KGDTENTRY64 fffff804`382f3fe0 -b
   +0x000 LimitLow         : 0
   +0x002 BaseLow          : 0
   +0x004 Bytes            : 
      +0x000 BaseMiddle       : 0 ''
      +0x001 Flags1           : 0xfb ''
      +0x002 Flags2           : 0x20 ' '
      +0x003 BaseHigh         : 0 ''
   +0x004 Bits             : 
      +0x000 BaseMiddle       : 0y00000000 (0)
      +0x000 Type             : 0y11011 (0x1b)
      +0x000 Dpl              : 0y11
      +0x000 Present          : 0y1
      +0x000 LimitHigh        : 0y0000
      +0x000 System           : 0y0
      +0x000 LongMode         : 0y1
      +0x000 DefaultBig       : 0y0
      +0x000 Granularity      : 0y0
      +0x000 BaseHigh         : 0y00000000 (0)
   +0x008 BaseUpper        : 0
   +0x00c MustBeZero       : 0
   +0x000 DataLow          : 0n9283176673312768
   +0x008 DataHigh         : 0n0

Figure 10. Dump of a Segment Descriptor

As you can see, the result is the same in both cases.

Experimenting With Kernel Mode and User Mode Code

Let's use windbg to inspect the segments of a piece of code running in kernel mode (Figure 11).

kd> r
rax=0000000000000003 rbx=fffff804382fde60 rcx=fffff804382fde60
rdx=fffff804382fde10 rsi=fffff80433b731a0 rdi=fffff80433b73190
rip=fffff80435414be5 rsp=fffff804382fdde8 rbp=0000000000000000
 r8=0000000000000003  r9=fffff804382fddf8 r10=0000000000000000
r11=fffff804382fddd0 r12=fffff80433b73100 r13=0000000000000000
r14=0000000000000100 r15=00000000ffffffff
iopl=0         nv up di ng nz na po nc
cs=0010  ss=0000  ds=002b  es=002b  fs=0053  gs=002b             efl=00040086
nt!DebugService2+0x5:
fffff804`35414be5 cc              int     3

Figure 11. 64-bit Kernel Mode Process Registers

As you can see, RIP points to a kernel address and the CS segment value is 0x10, which, according to the result from Figure 8, corresponds to a segment of type Code, with privilege 0 (the most privileged) and Long mode enabled. Now let's try the same experiment by analyzing a 64-bit user-mode process (Figure 12).

Figure 12. 64-bit User Mode Process Registers

The image shows a CS segment value of 0x33 (descriptor 0x30 with RPL 3), which corresponds to a segment of type Code, with privilege 3 (the lowest privilege) and Long mode enabled. Finally, let's see an example of a 32-bit user-mode process running on a 64-bit OS (Figure 13).

Figure 13. 32-bit User Mode Process Registers

The image shows a CS value of 0x23 (descriptor 0x20 with RPL 3), which corresponds to a segment of type Code, with privilege 3 and Long mode disabled. Since Long mode is disabled, the process is running in compatibility mode (32-bit).

Segment Transition and Syscall

We mentioned that code running in kernel mode has a different CS value, whose descriptor has a DPL value of 0. How is the segment transition performed? There are various ways to change the code segment. One way is to use specific instructions that change the CS register, such as retf, which reads the new CS value from the stack. However, such an instruction cannot transfer control to a more privileged segment (one with a numerically lower DPL), so we cannot use this mechanism to enter kernel mode.

An alternative method is to use a call gate segment descriptor ([3]). However, this mechanism is not used in modern Windows versions, which prefer the syscall instruction. Among the various actions performed by this instruction, there is the change of the segment selector. But how is the correct segment chosen? This information is obtained from the IA32_STAR (0xC0000081) MSR. Bits 32-47 are extracted and used as the value of the new segment selector (which is 0x10 in the case of a transition to kernel mode). Let's use WinDbg to verify this aspect (Figure 14).

kd> rdmsr 0xC0000081
msr[c0000081] = 00230010`00000000
kd> .formats 00230010`00000000
Evaluate expression:
  Hex:     00230010`00000000
  Decimal: 9851692904349696
  Decimal (unsigned) : 9851692904349696
  Octal:   0000430001000000000000
  Binary:  00000000 00100011 00000000 00010000 00000000 00000000 00000000 00000000
  Chars:   .#......
  Time:    Sun Mar 21 11:08:10.434 1632 (UTC + 1:00)
  Float:   low 0 high 3.21426e-039
  Double:  5.28462e-308
kd> .formats 0y0000000000010000
Evaluate expression:
  Hex:     00000000`00000010
  Decimal: 16
  Decimal (unsigned) : 16
  Octal:   0000000000000000000020
  Binary:  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00010000
  Chars:   ........
  Time:    Thu Jan  1 01:00:16 1970
  Float:   low 2.24208e-044 high 0
  Double:  7.90505e-323
kd> dg 10
                                                    P Si Gr Pr Lo
Sel        Base              Limit          Type    l ze an es ng Flags
---- ----------------- ----------------- ---------- - -- -- -- -- --------
0010 00000000`00000000 00000000`00000000 Code RE Ac 0 Nb By P  Lo 0000029b

Figure 14. Transition to DPL 0 Via Syscall Instruction

We first read the IA32_STAR MSR and extract the bits related to the new CS, whose value is 00000000 00010000. Converting this value to hex results in 0x10, which is exactly the same value that we obtained when we inspected the CS register in kernel mode in the previous section.
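The bit extraction can be sketched in Python as well. Only the kernel CS taken from bits 32-47 is discussed in this post; the other three values come from my reading of the SYSCALL and SYSRET descriptions in the Intel manuals and should be treated as assumptions to verify there. The function name decode_star is made up for illustration.

```python
def decode_star(star: int) -> dict:
    """Derive the selectors that SYSCALL and 64-bit SYSRET load from IA32_STAR."""
    syscall_base = (star >> 32) & 0xFFFF  # bits 32-47
    sysret_base = (star >> 48) & 0xFFFF   # bits 48-63
    return {
        # SYSCALL: CS is taken from bits 32-47 with the RPL bits cleared.
        "syscall_cs": syscall_base & 0xFFFC,
        # Assumption (Intel SDM): SS is the selector that follows CS.
        "syscall_ss": (syscall_base + 8) & 0xFFFF,
        # Assumption (Intel SDM): 64-bit SYSRET uses base + 16 for CS and
        # base + 8 for SS, both with RPL forced to 3.
        "sysret_cs64": ((sysret_base + 16) | 3) & 0xFFFF,
        "sysret_ss": ((sysret_base + 8) | 3) & 0xFFFF,
    }


star = 0x0023001000000000  # value shown in Figure 14
for name, selector in decode_star(star).items():
    print(f"{name} = {selector:#06x}")
```

Under these assumptions, the values derived for the return path match the user-mode CS seen in the 64-bit process (0x33) and the 0x2b selector visible in the register dump of Figure 11.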

Heaven's Gate Considerations

If you have reached this point, you have all the information needed to understand the concept behind the Heaven's Gate mechanism, which is used to transition between x64 and x86 code in order to run 32-bit binaries. Microsoft created a specific segment descriptor for this purpose, the one at GDT offset 0x20 (seen as the CS value 0x23 once the RPL is included). The two code segment descriptors have the same privilege level, so it is possible to perform the transition with one of the many instructions that load the CS register, such as retf or a far call. A lot has been written on this topic, and Microsoft refers to the underlying subsystem as Windows-on-Windows (WoW64).

Conclusion

Modern operating systems run in protected mode under a flat segmented memory model. In this post we analyzed how this model works and how it is involved in changing privilege levels. If you want to know more, I invite you to read the references.

References

[1] - Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 3 (3A): System Programming Guide - Chapter 2.5 CONTROL REGISTERS
[2] - Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 4: Model-Specific Registers - IA32_EFER
[3] - Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 3 (3A): System Programming Guide - Chapter 5.8.3 Call Gates
[4] - Call Gates' Ring Transitioning in IA-32 Mode
[5] - Bringing Call Gates Back
[6] - Windows Internals, Part 2, 7th Edition
[7] - Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 3A: System Programming Guide, Part 1
[8] - Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 1: Basic Architecture

Sunday, June 26, 2022

TheMatrix - A process inspection tool aimed at easing the malware analysis task

Twitter: @s4tan
Download: https://github.com/enkomio/thematrix

In this post I'll describe a project that I created to ease the malware analysis process. The goal of the project is to run a target binary in a controlled environment and log the Win32 function calls. I wanted to create something that is robust and easy to extend. I'm aware that other similar tools exist, but my intent was to have fun programming in assembly and to learn things that I had only reversed but never implemented :)

How it works

TheMatrix is a program mostly written in assembly (x86/x64) that implements the following features:

A PE loader (also referred to as an activator) that loads a user-supplied binary (also known as the target binary).
A multi-arch hook engine that monitors the Win32 API function calls.

Create an activator

The first task consists of creating an activator. This is a binary that, once executed, loads the embedded PE file (the target binary) and runs its entry point. The activator will be a DLL if the target binary is a DLL, or an EXE otherwise. The activator exports an additional function, DllRegisterServer. This function is commonly used by malware to start the main code.

Activator execution

When executed, the activator extracts the embedded binary and loads it in memory. Before the target binary's entry point is executed, various Win32 function hooks are placed. This ensures that the malware execution is monitored. By default, TheMatrix implements various Windows hooks that log the input data to the folder: ./Desktop/thematrix/<PID>/<API_name>.log. During the PE loading step, the PEB.Ldr field is updated to include the target binary. This field contains a doubly linked list of all the currently loaded DLLs and it is used by various Win32 APIs such as GetProcAddress. I still wonder why, of the many PE loader projects available online, none modifies the Ldr field.

TheMatrix Under the hood

The core of TheMatrix is implemented in assembly. This gave me the opportunity to improve my x64 assembly programming skills and, at the same time, to implement features that I had only reversed. The x86 and x64 versions have quite a few differences, which are detailed below.

x86 Version

The 32-bit version of TheMatrix uses the Microsoft hot patching mechanism to place the function hooks (see file x86_hook_engine.inc). The inserted JMP instruction jumps to a trampoline (a concept described later) that is placed in a code cave. The code cave is found by searching the DLL sections. At execution time, when the API function is called by the target binary, the trampoline executes and jumps to the user-defined hook function.

x64 Version

I started to implement the project in x86 assembly. As soon as I finished the initial version, the malware that I was interested in analysing switched to x64. This forced me to re-implement all the code in x64 assembly too (here is my reaction when I discovered this fact: https://twitter.com/s4tan/status/1516488723294298116).

When I decided to implement the x64 version too, I found myself in trouble, since the x64 Win32 APIs do not support hot patching in the same way as the x86 version. This forced me to choose a different approach to place my hooks. In the end, I decided to use Export Address Table (EAT) hooking. As in the x86 version, a trampoline is used to call the user-defined hook function (see file x64_hook_engine.inc).

An additional aspect that is often ignored when reversing binaries is that MS uses a different function calling convention for x64 code than for x86 code (see this doc for more details). In addition, the stack needs to be 16-byte aligned. In theory the concept is simple but, as often happens, the devil is in the details :) Luckily I found a useful 300 LOC file that helped me with this task (see https://twitter.com/s4tan/status/1522150733839273986).

Trampoline and hook function

The trampoline contains part of the magic that allowed me to create a clean design. Below you can see the x64 version of the trampoline code before being written to the identified code cave.

@trampoline_code_start:
	mov rax, 011223344aabbccddh ; store the address of the original function
	mov qword ptr gs:[28h], rax ; TIB.ArbitraryUserPointer, see: https://codemachine.com/articles/arbitraryuserpointer_usage.html
	mov rax, 011223344aabbccddh ; hook function address
	jmp rax

Two places need to be patched at runtime. One is the address of the user-defined hook function, and the other is the original address of the hooked function. The latter is necessary in order to easily call the original function, as shown in the section below. To store this value I chose the TIB.ArbitraryUserPointer field, which is part of the Thread Environment Block (more precisely, of the TIB at its start). This field is rarely used and is a good place to store our information. The only requirement is that the original function must be called in the same thread as the hook function.
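The runtime patching step can be illustrated with a small Python sketch. This is not how TheMatrix does it (the project performs the patching in assembly); the byte encodings below are my own hand-assembled version of the four instructions shown above, and the function name build_trampoline and the two example addresses are hypothetical.

```python
import struct

# 8-byte placeholder used twice in the trampoline template.
PLACEHOLDER = struct.pack("<Q", 0x11223344AABBCCDD)

# Hand-assembled encoding of the trampoline shown above (an assumption of
# this sketch, not bytes taken from the project).
TRAMPOLINE_TEMPLATE = (
    b"\x48\xB8" + PLACEHOLDER                   # mov rax, imm64 (original function)
    + b"\x65\x48\x89\x04\x25\x28\x00\x00\x00"   # mov qword ptr gs:[28h], rax
    + b"\x48\xB8" + PLACEHOLDER                 # mov rax, imm64 (hook function)
    + b"\xFF\xE0"                               # jmp rax
)


def build_trampoline(original_function: int, hook_function: int) -> bytes:
    """Return a copy of the template with both placeholders patched."""
    code = bytearray(TRAMPOLINE_TEMPLATE)
    first = code.find(PLACEHOLDER)
    second = code.find(PLACEHOLDER, first + len(PLACEHOLDER))
    if first < 0 or second < 0:
        raise ValueError("template does not contain two placeholders")
    code[first:first + 8] = struct.pack("<Q", original_function)
    code[second:second + 8] = struct.pack("<Q", hook_function)
    return bytes(code)


# Hypothetical addresses, used only to show the patched bytes.
trampoline = build_trampoline(0x00007FFD42C7D5E0, 0x0000000140001000)
print(trampoline.hex(" "))
```

Both placeholder offsets are located before any byte is changed, so the first patched address cannot be mistaken for the second placeholder.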

Usage

As mentioned, the first step is to create the activator. This is achieved by using the -add command and specifying the target binary. TheMatrix will create a copy of itself containing the target binary. If the target binary is a DLL, TheMatrix will modify the activator file so that it is a DLL and not an EXE file. Once the activator is created, it can be executed in the same way as the target binary.

One of the main goals of my project was to create something that is really easy to update. Adding a new function hook must be a dead easy operation. In the end I came up with a design where you can extend the project in a simple way; you just need a bit of Win32 API programming skill (you can implement your code in C, no assembly programming required ^^). To place a hook you just need to call the hook_add function, specifying the DLL name, the API function name and the user-defined hook function. An example call is the following one:

hook_add("Bcrypt.dll", "BCryptImportKeyPair", hook_BCryptImportKeyPair);

Then, you have to implement your hook function. To call the original function it is enough to use the call_original function, passing the input parameters of the original function. This kind of design is possible thanks to the freedom provided by programming in assembly. An example of usage is shown below.

LPVOID __stdcall hook_BCryptImportKeyPair(BCRYPT_ALG_HANDLE hAlgorithm, BCRYPT_KEY_HANDLE hImportKey, LPCWSTR pszBlobType, BCRYPT_KEY_HANDLE* phKey, PUCHAR pbInput, ULONG cbInput, ULONG dwFlags)
{
	// save imported key bytes
	char name[MAX_PATH] = { 0 };
	snprintf(name, sizeof(name), "BCryptImportKeyPair_%llx_%d", (uint64_t)pbInput, cbInput);
	log_data(cbInput, pbInput, name);

	LPVOID ret = call_original(
		hAlgorithm,
		hImportKey,
		pszBlobType,
		phKey,
		pbInput,
		cbInput,
		dwFlags
	);
	return ret;
}

In the example above, the hook function logs the imported key before calling the original function. The final step is to inform TheMatrix of the available hooks before running the target binary. This action is performed in the function hooks_init, whose definition is the following:

bool hooks_init(uint8_t* hMod)

The file hooks.c contains the function call, and can be customized by the user.

Demo

The following video shows an example of TheMatrix usage. The video shows the execution of a malware and demonstrates how TheMatrix is able to trace the execution of a new process and the extraction of relevant information. The malware is a famous one and it is not difficult to recognize it if you are into malware analysis ;)

Friday, May 20, 2022

Alan c2 Framework v7.0: Hyper-Pivoting

Twitter: @s4tan
Download: https://github.com/enkomio/AlanFramework/releases/latest
Documentation: https://github.com/enkomio/AlanFramework/tree/main/doc

A new Alan C2 Framework version was released, codename: Hyper-Pivoting. This new version includes some cool features, such as proxy support that allows the operator to easily pivot through networks.

SOCKS5 Proxy

Network pivoting is an essential part of every red-team activity and a must-have feature for every C2 framework. Alan v7.0 implements a proxy feature to ease network pivoting. By using the proxy command the operator can create a SOCKS5-compliant proxy on the machine where the agent is running, or interact with an already running proxy.

Proxy chaining is another useful feature that allows the operator to chain multiple proxies together. Creating a proxy chain is very simple, just use the command: proxy chain [proxy ID source] [proxy ID dest]. Some network segments can communicate only with specific addresses, which means that reaching the C2 server is not an easy task. By using a chain of proxies the agent can establish a path to the Alan server, making it possible to compromise highly segmented networks too.

The executed proxies are protected by a username and password. If the operator does not specify them, a randomly generated username and password are used (the operator can see the username and password by running the proxy command). As mentioned, the proxies are SOCKS5 proxies and can be used by any other program that accepts a SOCKS5 proxy.

One of the main Alan pillars is the in-memory execution of all its components, and the proxy is no exception. When a proxy is executed, its code runs inside the host process without touching the disk.

Misc features

Alan 7.0 includes other relevant features. The info command was improved by showing the Machine ID and if the agent is using a proxy. All Alan logs are now saved to the alan.log file. In addition, all the output generated by the Alan server and the commands inserted by the operator are saved to an evidence file. This allows the operator to include the evidence file as part of the red-team activity report.

Demo

The video below shows an example of proxy usage. After creating a proxy, the Alan agent is instructed to use it. The video demonstrates that the running proxies are compliant with the SOCKS5 specification by using one of the created proxies with the curl utility. Next, a proxy chain is created and the network traffic is displayed to show that the chain of proxies is traversed before reaching the Alan server.

Sunday, February 20, 2022

Alan c2 Framework v6.0: Alan + JavaScript = ♡

Twitter: @s4tan
Download: https://github.com/enkomio/AlanFramework/releases/latest
Documentation: https://github.com/enkomio/AlanFramework/tree/main/doc

Alan v6.0 was released with a cool new feature: JavaScript execution. The scripts are executed in memory and do not depend on any third-party program. The scripts' source code can be downloaded from the Alan GitHub repository.

Being able to extend the framework is a mandatory feature in today's red-team tools. Each team has its own methodology for performing a red-team activity, and being able to customize or extend the tool's capabilities is essential. One of the main goals of Alan was to provide a framework that can be easily adapted to various modi operandi. Alan v6.0 adds a new feature to support easy extension: it allows the operator to execute JavaScript files directly in memory. This feature is implemented inside an Alan core module and does not depend on any third-party program.

In other tools, this kind of feature requires the operator to compile C code by following a specific process. This might be overwhelming and unnecessarily complex. JavaScript is an easy language and even novices can become proficient in a short time.

However, being able to execute JavaScript code is not enough, since in most cases the operator needs to interact with native Windows functions to perform a given action. Alan provides an interface to call native Windows functions by using the handy JavaScript syntax. This blog post explores the details of this feature and how to use it to extend Alan's capabilities.

Getting Started

Executing a JavaScript file in Alan is extremely easy, just use the run command and specify a file with the .js extension. In order to call a Windows function, Alan implements the Win32 module, which exposes two methods: GetProcAddress and LoadLibrary. These are the basic methods needed to call virtually any Windows function. Let's try to write a simple file that prints the process ID.

import * as win32 from 'Win32';

var kernel32 = win32.LoadLibrary("kernel32.dll");
var GetCurrentProcessId = win32.GetProcAddress(kernel32, "GetCurrentProcessId");
var IsWow64Process = win32.GetProcAddress(kernel32, "IsWow64Process");
var GetCurrentProcess = win32.GetProcAddress(kernel32, "GetCurrentProcess");


var my_pid = GetCurrentProcessId();
var is_wow64 = new Array(4);
IsWow64Process(GetCurrentProcess(), is_wow64);

var msg = "Hello world from Javascript executed in process: " + my_pid;
if (is_wow64[0] == 1)
	msg += " - I'm running under Wow64 :)";
print(msg);

The script opens the Win32 module in order to load the Kernel32 DLL by calling the LoadLibrary function. Using the obtained handle, the GetCurrentProcessId function address is resolved by using the GetProcAddress function. The other functions are resolved in the same way. You can now use the resolved functions by calling them as standard JavaScript functions. As final step, the script prints a string showing a message containing information extracted from the Windows APIs.

A fundamental step of the entire process is being able to easily test the script during the development stage. In this new Alan version, a new folder named tools was added to the Alan package. It contains the files cqjsx86.exe and cqjsx64.exe. These files are JavaScript interpreters in x86 and x64 version. Let's try to run our script with both files to see what result is produced (the --file option is used to specify the file path).

C:\Alan.v6.0.511.24\tools>cqjsx64.exe --file test.js
Hello world from Javascript executed in process: 15532

C:\Alan.v6.0.511.24\tools>

If we use the cqjsx86.exe program, we obtain the following result (I'm running my test on an x64 OS).

C:\Alan.v6.0.511.24\tools>cqjsx86.exe --file test.js
Hello world from Javascript executed in process: 30844 - I'm running under Wow64 :)

C:\Alan.v6.0.511.24\tools>

As you can see, the result differs according to the version used.
Once the script works as expected, we can run it in the Alan agent by simply using the run command and specifying the full path of the script.

Windows API Data Structure Interoperation

The GetProcAddress and LoadLibrary functions should provide the basic functionality to call every Win32 API. However, interacting with a native API might require further information. A typical example is a parameter that is used as a buffer (either in input or in output). When this is the case, the following rules apply:

Each JavaScript Array is considered an array of bytes when passed to a Win32 function. Each element is cast to uint8_t (this truncates larger values and can corrupt the data). If the array contains other complex data types (such as a String), the value is converted to NULL.
Boolean values are converted to 1 if true and 0 if false.
Each number is converted to a 32-bit integer in an x86 process, and to a 64-bit integer in an x64 process.
Each JavaScript String is converted to an ASCII string when passed to a Win32 function.
You cannot call functions with more than 20 parameters.

The rules above imply that:

Each parameter passed by address to a Win32 function needs to be converted to an array (e.g. to pass an LPDWORD you have to create an Array(4) parameter if running in 32-bit or an Array(8) if running in 64-bit).
If a Win32 function accepts a structure, it needs to be converted to an Array too. For example, a PROCESSENTRY32 structure must be represented as an Array and then parsed by referencing the fields by their offset (an example using this structure is presented later, with some helper functions to simplify the job).

All these rules might be quite annoying when developing a non-trivial script. In the next section I'll show how to ease the development task by implementing an lsass process memory dumper.
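To make the offset-based approach more concrete, here is a minimal sketch that reads fields out of a byte array. The helper names (writeDword, readDword, readAscii) are hypothetical and not part of Alan. The buffer is filled by hand to simulate what a Win32 call would return, and the offsets assume the ANSI PROCESSENTRY32 layout of a 32-bit process (296 bytes, th32ProcessID at offset 8, szExeFile at offset 36).

```javascript
// Hypothetical helpers (not part of Alan): little-endian access on a byte array.
function writeDword(buf, offset, value) {
    for (var i = 0; i < 4; i++)
        buf[offset + i] = (value >>> (8 * i)) & 0xFF;
}

function readDword(buf, offset) {
    var value = 0;
    for (var i = 3; i >= 0; i--)
        value = (value * 256) + (buf[offset + i] & 0xFF);
    return value;
}

// Reads a NUL-terminated ASCII string of at most maxLen bytes.
function readAscii(buf, offset, maxLen) {
    var s = "";
    for (var i = 0; i < maxLen && buf[offset + i]; i++)
        s += String.fromCharCode(buf[offset + i]);
    return s;
}

// Simulated PROCESSENTRY32 buffer (assumed 32-bit ANSI layout, 296 bytes).
var entry = new Array(296).fill(0);
writeDword(entry, 0, entry.length);   // dwSize
writeDword(entry, 8, 1234);           // th32ProcessID (made-up value)
"lsass.exe".split("").forEach(function (c, i) {
    entry[36 + i] = c.charCodeAt(0);  // szExeFile
});

var pid = readDword(entry, 8);
var exeName = readAscii(entry, 36, 260);
```

In a 64-bit process the pointer-sized th32DefaultHeapID field shifts the later offsets, so the constants above would have to change.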

Implementing a simple lsass.exe process memory dumper

This is a perfect case to explore this new feature in more depth. Being able to dump the memory of the lsass process is very important in order to further compromise a host. There are various techniques to achieve this goal, but for the sake of simplicity I'll go for the simplest one: using the MiniDumpWriteDump function. I'll put the script on GitHub so you can have a look at its full source code.

Let's suppose that our agent is running as Administrator. The dumper then has to perform the following steps:

Enable SE_DEBUG_NAME privilege.
Scan all processes to identify the lsass.exe process.
Create a mini dump of the lsass.exe process.

As a first step we have to load all the needed functions. This is a trivial task, already demonstrated in the previous example. Enabling SE_DEBUG_NAME is the next step. To perform this action we have to use a TOKEN_PRIVILEGES structure. This structure is quite simple, so for this task we will just create an array of 0x10 bytes and reference sTP.PrivilegeCount, sTP.Privileges[0].Luid and sTP.Privileges[0].Attributes by their array offset. After calling the AdjustTokenPrivileges function we are ready to proceed with the next, and probably most complex, step.

We have to identify the lsass.exe process. To achieve this goal we use the CreateToolhelp32Snapshot function to obtain a snapshot and loop through all processes until we find one whose name is lsass.exe. This implies the usage of a PROCESSENTRY32 structure, which is not that simple. To ease the task I created various JavaScript helper functions that serialize an object to a JavaScript array. The serialization function inspects the prefix of each field name and performs a specific serialization action according to its value. For example, field names that start with dw_ are serialized as a DWORD. Field names that start with p_ are serialized to a four-byte or eight-byte array according to the value of a global variable that I defined at the start of the script (this step can be made more dynamic by using the IsWow64Process function). Thanks to these functions, working with structures is now a lot easier (see the script source code for full details).
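As an illustration of the offsets involved, the following sketch builds the 0x10-byte array for a TOKEN_PRIVILEGES structure holding a single privilege. The function names are hypothetical, and the LUID passed at the end is only a placeholder: in a real script it would be obtained from a call such as LookupPrivilegeValue.

```javascript
var SE_PRIVILEGE_ENABLED = 0x00000002; // value defined in winnt.h

// Hypothetical helper: stores a 32-bit value in little-endian order.
function writeDword(buf, offset, value) {
    for (var i = 0; i < 4; i++)
        buf[offset + i] = (value >>> (8 * i)) & 0xFF;
}

// Builds a TOKEN_PRIVILEGES byte array with one enabled privilege.
function buildTokenPrivileges(luidLow, luidHigh) {
    var tp = new Array(0x10).fill(0);
    writeDword(tp, 0x0, 1);                    // PrivilegeCount
    writeDword(tp, 0x4, luidLow);              // Privileges[0].Luid.LowPart
    writeDword(tp, 0x8, luidHigh);             // Privileges[0].Luid.HighPart
    writeDword(tp, 0xC, SE_PRIVILEGE_ENABLED); // Privileges[0].Attributes
    return tp;
}

// Placeholder LUID used for illustration only.
var sTP = buildTokenPrivileges(20, 0);
```

The resulting array is what would be passed as the NewState argument of AdjustTokenPrivileges.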
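The prefix-based idea can be sketched as follows. This is not the code from the actual script, just a minimal illustration: the names POINTER_SIZE, intToBytes and serialize, as well as the example field names, are assumptions of mine.

```javascript
// Assumed global: 4 when running in an x86 process, 8 in an x64 one.
var POINTER_SIZE = 8;

// Converts a non-negative integer (below 2^53) to `size` little-endian bytes.
function intToBytes(value, size) {
    var bytes = [];
    for (var i = 0; i < size; i++) {
        bytes.push(value % 256);
        value = Math.floor(value / 256);
    }
    return bytes;
}

// Serializes an object field by field, choosing the width from the name prefix.
function serialize(obj) {
    var out = [];
    Object.keys(obj).forEach(function (name) {
        if (name.indexOf("dw_") === 0)
            out = out.concat(intToBytes(obj[name], 4));            // DWORD
        else if (name.indexOf("p_") === 0)
            out = out.concat(intToBytes(obj[name], POINTER_SIZE)); // pointer
        else
            throw new Error("Unknown field prefix: " + name);
    });
    return out;
}

// Example with made-up field names and values.
var bytes = serialize({ dw_size: 0x128, p_heap: 0 });
```

Note that this sketch emits the fields back to back. A real structure might need padding bytes to respect the alignment of pointer-sized fields.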

The final step is to create a file and call the MiniDumpWriteDump function to create a file dump that you can now download to your machine for post-processing.

Demo

Now that we have created our script to dump the lsass.exe process memory, let's use it. The video below demonstrates how to dump the lsass.exe process memory by running our JavaScript script in the agent.

Thursday, 20 January 2022

Analyzing an IDA Pro anti-decompilation code

Twitter: @s4tan
GitHub: https://github.com/enkomio/

In this post I'll analyze a piece of code that induces IDA Pro to decompile the assembly in a wrong way. I'll propose a fix, but I'm open to more elegant solutions :)

The function that we want to decompile has the following assembly code (I'm using IDA Pro v7.6):

.text:1001BC95 56                  push    esi
.text:1001BC96 FF 74 24 10         push    [esp+4+arg_8]     
.text:1001BC9A 8B 74 24 10         mov     esi, [esp+8+arg_4] 
.text:1001BC9E 56                  push    esi
.text:1001BC9F FF 74 24 10         push    [esp+0Ch+arg_0]
.text:1001BCA3 52                  push    edx
.text:1001BCA4 51                  push    ecx
.text:1001BCA5 E8 57 20 FF FF      call    nullsub_1
.text:1001BCAA 8B 0A               mov     ecx, [edx]      
.text:1001BCAC 83 C4 14            add     esp, 14h
.text:1001BCAF 89 4E 0C            mov     [esi+0Ch], ecx
.text:1001BCB2 8B 42 04            mov     eax, [edx+4]
.text:1001BCB5 03 C1               add     eax, ecx
.text:1001BCB7 89 46 04            mov     [esi+4], eax
.text:1001BCBA 5E                  pop     esi
.text:1001BCBB C3                  retn

The function uses two arguments with an unconventional calling convention. If we decompile the code, we obtain:

int __cdecl sub_1001BC95(int a1, int a2)
{
  int *v2; // edx
  int v3; // ecx
  int result; // eax

  nullsub_1();
  v3 = *v2;
  *(a2 + 12) = *v2;
  result = v3 + v2[1];
  *(a2 + 4) = result;
  return result;
}

In IDA Pro the v2 variable (corresponding to the line at address 0x1001BCAA) is colored in red, since its value might be undefined.

A custom calling convention might cause some problems to the decompilation process (see this) but, in general, there is an easy fix: it is enough to inform IDA Pro that the function uses a custom calling convention. By editing the function, we can set the new type with the following definition:

int __usercall sub_1001BC95@<eax>(PUCHAR arg0@<edx>, int garbage, PUCHAR arg1)

With this new definition, the decompiled code now looks like the following:

int __usercall sub_1001BC95@<eax>(PUCHAR arg0@<edx>, int garbage, PUCHAR arg1)
{
  int *v1; // edx
  int v2; // ecx
  int result; // eax
  int v4; // [esp+Ch] [ebp+8h]

  nullsub_1();
  v2 = *v1;
  *(v4 + 12) = *v1;
  result = v2 + v1[1];
  *(v4 + 4) = result;
  return result;
}

We haven't made any progress at all. The only place we haven't checked yet is the nullsub_1 function, so the problem must be in its call. If we analyze this function, we notice that it has an empty body, as shown below.

.text:1000DD01 C3                  retn

Why is this function causing problems? The answer is in the software convention used by the compiler. During the compilation, the compiler considers some registers as volatile. This means that the value of these registers, after a function call, should not be considered preserved ([1]). Among the volatile registers, there is EDX, which is exactly one of the registers used to pass a function parameter in the custom calling convention.

This code causes problems for the decompiler, which (correctly) considers the EDX register to have an undefined value after the function call.

I'm not aware of any particular IDA Pro command to inform the decompiler not to consider EDX as volatile, so the simplest solution that I found is to just remove the call instruction (I patched the bytes E8 57 20 FF FF with 90 90 90 90 90). The result is a much cleaner decompiled code, as shown below.

int __usercall sub_1001BC95@<eax>(PUCHAR arg0@<edx>, int garbage, PUCHAR arg1)
{
  PUCHAR v3; // ecx
  int result; // eax
  
  v3 = *arg0;
  *(arg1 + 3) = *arg0;
  result = &arg0[1][v3];
  *(arg1 + 1) = result;
  return result;
}
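As a sanity check on the patch described above, the sketch below decodes the relative operand of the call instruction, to confirm that its bytes really point to nullsub_1 at 0x1000DD01, and builds the NOP replacement. It only works on the bytes copied from the listing: it does not talk to IDA Pro, and the variable and function names are mine.

```javascript
// Address and bytes of the call instruction, taken from the disassembly listing.
var callAddress = 0x1001BCA5;
var callBytes = [0xE8, 0x57, 0x20, 0xFF, 0xFF];

// Decodes the little-endian rel32 operand as a signed 32-bit value.
function decodeRel32(bytes) {
    return (bytes[1] | (bytes[2] << 8) | (bytes[3] << 16) | (bytes[4] << 24)) | 0;
}

// The displacement is relative to the address of the next instruction.
var target = (callAddress + callBytes.length + decodeRel32(callBytes)) >>> 0;

// Replacement bytes: one single-byte NOP (0x90) for each original byte.
var patch = callBytes.map(function () { return 0x90; });
```

Here target evaluates to 0x1000DD01, the address of the retn instruction of nullsub_1. Replacing the call with same-sized NOPs keeps every following instruction at its original address.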

Now that the decompiled code represents the real intent of the assembly code, we can proceed to improve it further (we can clearly see the usage of a struct in the code).

Update:

I received messages on Twitter and Reddit suggesting that I have a look at the __spoils keyword mentioned in this Igor’s tip of the week post [2] (shame on me for not having found it).

Its meaning is exactly what we need to solve the problem in a more elegant and generic way. It is enough to change the nullsub_1 function definition by adding the __spoils keyword, as shown below:

void __spoils<> nullsub_1(void)

The decompilation result of the function sub_1001BC95 is the same as before with the exception that the call to the nullsub_1 function is still there (it is not necessary to patch the bytes anymore).

Links:

[1] Register volatility and preservation
[2] Igor’s tip of the week #51: Custom calling conventions

Saturday, 18 December 2021

Alan c2 Framework v5.0 - All you can in-memory edition

Twitter: @s4tan
Download: https://github.com/enkomio/AlanFramework
Documentation: https://github.com/enkomio/AlanFramework/tree/main/doc

I just released version 5.0 of my C2 post-exploitation framework Alan. You can download the binaries and read the release notes at: https://github.com/enkomio/AlanFramework/releases/latest

My goal with the Alan project is to provide a post-exploitation framework that can help red-team operators to further compromise their targets. Typically, each team has its preferred tools to exploit the target; an example is the plethora of tools that can perform the memory dump of the lsass process. Alan does not enforce any particular tool. Instead, it provides the ground to run whatever tools the operator likes. All tools are executed in memory, in the address space of a pre-configured host process, or injected into another process.

This feature is achieved by the introduction of the new run command. This command accepts a file path on the operator machine and executes the file on the compromised host without touching the disk. It is possible to specify command-line arguments that are passed to the executed program (this feature is not so common in other C2 frameworks ;)). For this reason I decided to name this version "All you can in-memory" :)

Other commands were also implemented that allow the operator to execute a program on the compromised host. In particular the command exec was added to execute a new process and the shell command was modified to accept an argument that is the command to execute (if no argument is specified, a command shell is presented to the operator).

Find below the video that shows the following features:

Creation of an x64 PowerShell agent.
In-memory execution of the nanodump utility by using the configured host program (raserver.exe in this case) and passing a command-line argument. The Process Hacker window displays the execution of the raserver.exe process.
Execution of the program notepad.exe in the background.
In-memory execution of the dumper utility by injecting the binary into the newly created notepad process. In this case raserver.exe is not executed.

Sunday, 26 September 2021

Alan post-exploitation framework v4.0 released

Twitter: @s4tan
Download: GitHub
Documentation: https://github.com/enkomio/AlanFramework/tree/main/doc

I just released version 4.0 of my post-exploitation framework Alan. You can download the binaries and read the release notes at: https://github.com/enkomio/AlanFramework/releases/latest

I also made a video that shows the following features:

Creation of two agents, an x86 and an x64 version
Migration of agent x86 to a process with a different integrity level
Execution of a command-shell on the compromised host and the execution of the x64 agent directly from the command-shell
Migration of the x64 agent to another x64 process
Restart of the Alan server to show that the agents reconnect to the server after the restart (the agent session is not lost)