PE File Structure:
PE (Portable Executable) is the native Windows executable file format, introduced by Microsoft in Windows NT 3.1. 32-bit DLL, COM, OCX and NT kernel-mode drivers are all in PE file format as well. (The 64-bit version is called PE32+.)
The PE file starts off with a magic number, a short signature at the very beginning of the file (the 2 bytes "MZ" in the case of a PE file) that helps the operating system determine the file type and therefore how to execute the file.
Here are some magic number examples:
- 4D 5A ("MZ") – The Windows executable binary header, named after Mark Zbikowski, one of the early Microsoft architects.
- 25 50 44 46 ("%PDF") – The magic number of a PDF file.
- 50 4B 03 04 ("PK" followed by the bytes 03 04) – The magic number of a ZIP file.
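Checking a magic number is easy to try yourself. Here is a minimal Python sketch that compares the first bytes of a buffer against the three signatures above. The table and the function names (`identify_file_type`, `identify_path`) are my own for illustration, and the table is far from complete.

```python
# Leading byte signatures for a few well-known formats (not exhaustive).
MAGIC_NUMBERS = {
    b"MZ": "Windows executable (DOS/PE)",
    b"%PDF": "PDF document",
    b"PK\x03\x04": "ZIP archive",
}


def identify_file_type(data: bytes) -> str:
    """Return a label for the first known signature that `data` starts with."""
    for magic, label in MAGIC_NUMBERS.items():
        if data.startswith(magic):
            return label
    return "unknown"


def identify_path(path: str) -> str:
    """Read only the first few bytes of a file and classify them."""
    with open(path, "rb") as handle:
        return identify_file_type(handle.read(8))
```

A file extension can be changed by anyone, so looking at the leading bytes is usually a more reliable first check of what a sample really is.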
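The DOS header that begins with the magic number holds, at offset 0x3C, a pointer to the 4-byte PE signature ("PE\0\0"). Following that signature, there is a 2-byte field defining the machine architecture for which the executable was built, and another 2-byte field with the number of sections included in the file.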
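To make these fields concrete, here is a rough Python sketch that reads them with the standard `struct` module. It relies on the PE/COFF layout: the offset of the "PE\0\0" signature is stored at 0x3C, and the machine and section-count fields that follow it are each 2 bytes wide. The function name and the small `MACHINE_NAMES` table are mine, not part of any library.

```python
import struct

# A few common Machine values from the PE/COFF specification (not exhaustive).
MACHINE_NAMES = {0x014C: "x86", 0x8664: "x64", 0xAA64: "ARM64"}


def read_coff_header(data: bytes) -> dict:
    """Locate the PE signature and decode the first two COFF header fields."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ magic number")
    # e_lfanew: 4-byte little-endian offset of the PE signature, stored at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # Machine and NumberOfSections directly follow the 4-byte signature.
    machine, section_count = struct.unpack_from("<HH", data, pe_offset + 4)
    return {"pe_offset": pe_offset, "machine": machine, "sections": section_count}
```

Calling `read_coff_header(open(path, "rb").read())` on a sample and looking the result up in `MACHINE_NAMES` tells you quickly whether you are dealing with a 32-bit or a 64-bit binary.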
The PE Optional Header contains useful information for the malware analyst (and, despite its name, is not really optional for executables), like the subsystem the executable targets (GUI, console, native driver, etc.), how the executable should be loaded, etc. Some important fields from this header are the Entry Point Address, which points (as an offset relative to the image base) to the first instruction to be executed when the malware is loaded, and the Image Base, which defines the preferred address at which the executable is loaded in virtual memory.
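As a sketch of how those two fields can be pulled out by hand, the Python snippet below reads the optional header directly. The offsets come from the PE/COFF layout: the optional header begins 24 bytes after the start of the PE signature, the entry point is stored 16 bytes in as a relative address, and the image base sits at offset 28 (4 bytes) in PE32 or offset 24 (8 bytes) in PE32+. The function name and dictionary keys are my own choices.

```python
import struct


def read_optional_header(data: bytes) -> dict:
    """Decode the format, entry point and image base from the optional header."""
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    opt = pe_offset + 4 + 20  # skip the PE signature (4) and COFF header (20)
    (magic,) = struct.unpack_from("<H", data, opt)
    # AddressOfEntryPoint is relative to the image base, not an absolute address.
    (entry_rva,) = struct.unpack_from("<I", data, opt + 16)
    if magic == 0x10B:  # PE32: 4-byte ImageBase
        (image_base,) = struct.unpack_from("<I", data, opt + 28)
        fmt = "PE32"
    elif magic == 0x20B:  # PE32+: 8-byte ImageBase
        (image_base,) = struct.unpack_from("<Q", data, opt + 24)
        fmt = "PE32+"
    else:
        raise ValueError("unexpected optional header magic: %#x" % magic)
    return {
        "format": fmt,
        "entry_rva": entry_rva,
        "image_base": image_base,
        # Address of the first instruction if loaded at the preferred base.
        "entry_point_va": image_base + entry_rva,
    }
```

Keep in mind that the computed address only holds if the loader actually places the image at its preferred base.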
The headers are followed by the Section Table, and they also point to the IAT and EAT:
- IAT – The Import Address Table has information about functions that the program calls from DLL files. Those functions and DLLs expose some or all of the malware's functionality. For example, a malware sample that imports ws2_32.dll may have some network functionality.
- EAT – The Export Address Table is generally used in DLL files, and exports functions for other programs to call.
- Section Table – Describes the actual sections of the file, each of which contains useful information. Some common sections are:
  - .text – Contains the instructions that the CPU executes. All other sections store data and supporting information. Most of the time, this is the only section that can execute, and it should be the only section that includes code. (Other sections that include code might be a sign of packed malware.)
  - .rdata – Holds read-only data that is globally accessible within the program
  - .data – Stores global data accessed throughout the program
  - .idata – Sometimes present and stores the import function information; if this section is not present, the import function information is stored in the .rdata section
  - .edata – Sometimes present and stores the export function information; if this section is not present, the export function information is stored in the .rdata section
  - .pdata – Present only in 64-bit executables and stores exception-handling information
  - .rsrc – Stores resources needed by the executable
  - .reloc – Contains information for relocation of library files (loading library files at different memory addresses if the preferred addresses cannot be allocated for some reason)
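The section list above can be read straight out of the file. The following is a minimal Python sketch, assuming the PE/COFF layout: the section table starts right after the optional header, each entry is 40 bytes, the name takes the first 8 bytes and the flags are the last 4. The function name and the returned dictionary keys are my own; for real work a dedicated parser such as the pefile library is probably a more robust choice.

```python
import struct

IMAGE_SCN_MEM_EXECUTE = 0x20000000  # section can be executed as code
IMAGE_SCN_MEM_WRITE = 0x80000000    # section can be written to


def read_sections(data: bytes) -> list:
    """Return name and execute/write permissions for every section header."""
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    (count,) = struct.unpack_from("<H", data, pe_offset + 6)      # NumberOfSections
    (opt_size,) = struct.unpack_from("<H", data, pe_offset + 20)  # SizeOfOptionalHeader
    table = pe_offset + 24 + opt_size  # signature (4) + COFF header (20) + optional header
    sections = []
    for index in range(count):
        base = table + index * 40  # each section header is 40 bytes long
        raw_name = data[base:base + 8]
        (flags,) = struct.unpack_from("<I", data, base + 36)
        sections.append({
            "name": raw_name.rstrip(b"\x00").decode("ascii", errors="replace"),
            "executable": bool(flags & IMAGE_SCN_MEM_EXECUTE),
            "writable": bool(flags & IMAGE_SCN_MEM_WRITE),
        })
    return sections
```

In a typical unpacked executable you would expect only .text to come back with `executable` set.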
As I said in a previous post, static analysis is the process of studying a malware sample without executing it. We can look for suspicious strings (file paths, IP addresses, URLs, registry keys, etc.). We can also look at the IAT, EAT and section table, which may indicate a lot about the malware's expected behavior. In this example I will use a tool called MASTIFF, which is a static analysis automation framework:
In the first screenshot we can see the file sections, and observe that those are not standard section names, and that multiple sections are marked for execution, which is really suspicious. We also cannot ignore the section names: UPX0, UPX1 and UPX2. UPX is the name of a known packer, and we suspect that this file has been packed with it.
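The checks I just did by eye can be written down as a small heuristic. This Python sketch takes a list of (section name, is executable) pairs and reports the same three observations. The function name and the set of "standard" names (taken from the section list earlier in this post) are my own simplification, so treat the output as a hint and not as proof of packing.

```python
# Section names described earlier in this post; many legitimate tools use others.
STANDARD_NAMES = {".text", ".rdata", ".data", ".idata",
                  ".edata", ".pdata", ".rsrc", ".reloc"}


def packing_indicators(sections) -> list:
    """Return human-readable findings for a list of (name, is_executable) pairs."""
    sections = list(sections)
    names = [name for name, _ in sections]
    findings = []
    if any(name.upper().startswith("UPX") for name in names):
        findings.append("UPX section names present")
    unusual = [name for name in names if name not in STANDARD_NAMES]
    if unusual:
        findings.append("non-standard section names: " + ", ".join(unusual))
    executable = [name for name, is_exec in sections if is_exec]
    if len(executable) > 1:
        findings.append("multiple executable sections: " + ", ".join(executable))
    return findings
```

An empty result does not mean the sample is clean, since a packer can rename its sections to look perfectly ordinary.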
In this screenshot we can see various imports of another malware sample. For example, the import of the InternetGetConnectedState function from WININET.dll implies that the malware checks whether there is a connection to the Internet. Reference documentation for all the Microsoft library functions can be obtained from MSDN (Microsoft Developer Network).
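Once the imports are listed, a simple lookup table helps to triage them. Below is a minimal Python sketch of that idea. The two tables only contain the examples mentioned in this post plus a general note on WinINet, and both they and the function name are illustrative assumptions; a real table would be built up from the MSDN documentation.

```python
# Illustrative hints only; extend these tables from the MSDN documentation.
FUNCTION_HINTS = {
    ("wininet.dll", "InternetGetConnectedState"): "checks for an Internet connection",
}
DLL_HINTS = {
    "ws2_32.dll": "network functionality (Winsock)",
    "wininet.dll": "HTTP/FTP functionality (WinINet)",
}


def describe_imports(imports) -> list:
    """Annotate (dll_name, function_name) pairs that match a known hint."""
    notes = []
    for dll, function in imports:
        key = dll.lower()  # DLL names are case-insensitive on Windows
        # Prefer a function-specific hint, fall back to the DLL-level one.
        hint = FUNCTION_HINTS.get((key, function)) or DLL_HINTS.get(key)
        if hint:
            notes.append(f"{dll}!{function}: {hint}")
    return notes
```

Imports without a matching hint are simply skipped, so the output is a short list of leads to follow up on and not a verdict.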
This last screenshot shows the output of running strings against another malware sample, revealing some interesting file names, URLs, error codes and library calls.
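If you don't have the strings utility at hand, its basic behavior is easy to approximate. The Python sketch below pulls runs of printable ASCII characters out of a binary blob and then picks URLs and IPv4-looking values out of them. It is my own rough approximation (it ignores UTF-16 strings, for example), and the function names and regular expressions are assumptions chosen for illustration.

```python
import re

URL_RE = re.compile(r"https?://[^\s\"'<>]+")
# Loose pattern: also matches things like version numbers, so review the hits.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def extract_strings(data: bytes, min_length: int = 4) -> list:
    """Return runs of at least `min_length` printable ASCII characters."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [match.decode("ascii") for match in pattern.findall(data)]


def find_indicators(strings) -> dict:
    """Collect URLs and IPv4-looking values from a list of strings."""
    urls, ips = [], []
    for text in strings:
        urls.extend(URL_RE.findall(text))
        ips.extend(IPV4_RE.findall(text))
    return {"urls": urls, "ips": ips}
```

For a sample on disk, `find_indicators(extract_strings(open(path, "rb").read()))` gives a quick first list of things to look up.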