Steganography in malware, known as stegomalware or stegware, is growing in popularity as attackers look for new ways to keep their malicious code hidden from view. Malware authors continue to show versatility in devising new techniques, and reinventing existing ones, to conceal their malicious wares.

Malware writers are bringing the ancient practice of steganography up to date by masking malicious code in pictures, videos and other seemingly harmless types of files. Many of these file types are considered a low security risk and are often not analysed further. This gives would-be attackers an ideal opportunity to conceal malicious code.

In recent months, there has been a spike in the discovery of steganography techniques used maliciously through stegware. Security companies and researchers report on this worrying trend as they aim to raise awareness of the pitfalls and dangers of this threat. Meanwhile attackers continue to master the use of stegware in a growing number of applications as their techniques become ever more sophisticated.

Steganography in malware can be traced back many years, but it came to the fore in 2006 with one of the first large-scale uses in Operation Shady RAT. This led to attacks against numerous institutions worldwide and inflicted damage for months.

The main program responsible for this attack was Trojan.Downbot. This trojan created a back door and then downloaded files appearing as real HTML pages or JPEG images. The files were encoded with commands that allowed remote servers to gain access to local files on the infected host computer.

The proportion of known incidents involving stegware, compared to other types of threats, grew between 2011 and 2019, as depicted in Figure 1 below. This shows a significant increase in the discovery of stegware; however, it is likely that there is more, as yet undiscovered, stegware out there than these figures suggest. Security experts do not always correctly recognize and classify the techniques used, which makes accurate measurement of the threat harder. The amount of information-hiding-capable malware is most probably heavily underestimated.

Figure 1: Increase in information-hiding capable malware (CUING)

SIMARGL research in this area aims to improve the information available, to raise awareness of the different types of stegware, and to address the threats. Initial research by SIMARGL partners focuses on stegware definitions and types, from which three categories are defined as follows:

Table 1: SIMARGL definitions of stegware
Name      Definition
Group 1   Malware that embeds secret data by using digital media steganography
Group 2   Malware that embeds secret data by modifying a digital image file's structure
Group 3   Malware that injects secret data into network traffic
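
To make the Group 2 definition more concrete, here is a minimal sketch of one well-known way a file's structure can be abused: placing extra bytes after the final IEND chunk of a PNG, where most image viewers simply ignore them. This is only an illustration of the category, not a technique taken from the SIMARGL research, and the helper names (make_chunk, build_tiny_png, trailing_bytes_after_iend) are hypothetical.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def build_tiny_png() -> bytes:
    """Build a 1x1 greyscale PNG in memory so the sketch needs no files."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    idat = zlib.compress(b"\x00\x00")  # filter byte + one black pixel
    return (PNG_SIGNATURE
            + make_chunk(b"IHDR", ihdr)
            + make_chunk(b"IDAT", idat)
            + make_chunk(b"IEND", b""))


def trailing_bytes_after_iend(png: bytes) -> bytes:
    """Return any bytes that follow the IEND chunk (empty if there are none)."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    offset = len(PNG_SIGNATURE)
    while offset + 8 <= len(png):
        length, chunk_type = struct.unpack(">I4s", png[offset:offset + 8])
        offset += 8 + length + 4  # chunk header + data + CRC
        if chunk_type == b"IEND":
            return png[offset:]
    raise ValueError("no IEND chunk found")


clean = build_tiny_png()
suspicious = clean + b"hidden"  # stand-in for data appended by malware
print(len(trailing_bytes_after_iend(clean)), len(trailing_bytes_after_iend(suspicious)))
```

A check like this only covers one structural trick; data can also sit in unused chunks or metadata fields, which this sketch does not inspect.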

Early SIMARGL research details the distribution of information hiding discovered in real-life malware. Figure 2 highlights that in the samples analysed, stegware was most frequently found in digital images, followed by network traffic and file structures.

Figure 2: Distribution of information hiding-capable real-life malware (CUING)

The definitions outlined in Table 1 will be discussed in more detail in a follow-up blog along with the research carried out by SIMARGL partners on this subject. Work is currently being carried out on:

  • a dataset of 100,000 clean-stego image pairs
  • a dataset of 4,642 working scripts combined from Palo Alto and VirusTotal datasets

One specific focus is on the use and detection of the Invoke-PSImage tool, which embeds the bytes of a PowerShell script into the pixels of a PNG image and generates a one-liner to execute it from a file or from the web. Research centres on:

  • Detection of steganographic modifications of the digital images sent within the network
  • Estimating the size of an embedded PowerShell script – possibly to identify it from the set of PS scripts
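
The idea of hiding script bytes in pixel values can be sketched in a few lines. The example below is a simplified illustration, not the tool's actual code: it assumes each payload byte is split into two 4-bit halves stored in the low bits of the green and blue values of one pixel, and the channel assignment is our assumption. The function names (embed, extract, estimate_payload_length) are hypothetical, and the size estimate only works when a clean-stego pair is available, as in a labelled dataset.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each value in 0..255


def embed(pixels: List[Pixel], payload: bytes) -> List[Pixel]:
    """Hide one payload byte per pixel in the low 4 bits of G and B."""
    if len(payload) > len(pixels):
        raise ValueError("payload does not fit in the image")
    out = list(pixels)
    for i, byte in enumerate(payload):
        r, g, b = out[i]
        g = (g & 0xF0) | (byte & 0x0F)  # low nibble of the byte goes into G
        b = (b & 0xF0) | (byte >> 4)    # high nibble of the byte goes into B
        out[i] = (r, g, b)
    return out


def extract(pixels: List[Pixel], length: int) -> bytes:
    """Rebuild the first `length` payload bytes from the low nibbles."""
    return bytes(((b & 0x0F) << 4) | (g & 0x0F) for _, g, b in pixels[:length])


def estimate_payload_length(clean: List[Pixel], stego: List[Pixel]) -> int:
    """Lower-bound estimate: position of the last pixel that differs, plus one."""
    last = -1
    for i, (c, s) in enumerate(zip(clean, stego)):
        if c != s:
            last = i
    return last + 1


cover = [((10 * i) % 256, 200, 100) for i in range(16)]  # toy 16-pixel image
payload = b"example script"                              # stand-in for script bytes
stego = embed(cover, payload)
print(extract(stego, len(payload)), estimate_payload_length(cover, stego))
```

Each colour value changes by at most 15 out of 255, which is why the modification is hard to spot by eye. Without the clean image, detection has to rely on statistical properties of the low bits instead, which is the harder problem the research addresses.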

The results of the research will be used to develop a solution that can be integrated with the SIMARGL partners’ products.

This blog post is published as part of CUING's work on the SIMARGL project. Part 2 of this blog series has now been published.