Developers Hit By Compromised Software Packages
A typosquat campaign uses slight variations on well-known names to mislead a user into accessing a rogue asset rather than the genuine one. The technique is well known from email addresses and web sites, where a slight change in spelling leads to a scam or malware host. These locations are often short-lived, as domain name registrars shut them down, but they can stay live long enough to do significant harm.
The examples discussed below involve false development packages delivered through typosquat techniques. These have names that are similar to those of known solutions. If the hacker chooses to infect an open source target, they already have, by definition, a fully functional code template. They can then embed additional code to view information on the target machine or to remotely install malware.
Some sort of modular approach is almost essential for modern software development. Many common tasks have already been solved in code, and there is little incentive to ‘re-invent the wheel’ by coding such solutions again. A software package has (hopefully) documented endpoints to which any developer can send data and from which they can receive the required processing result. This frees time to create more advanced solutions, as the developer relies on tried and trusted solutions within the packages used. In many cases these segments of code are closed packages, but that is not the case in the open source world. By definition, open source packages can be changed, which allows wide testing and opportunities to constantly improve their performance, or to fork the project to provide a complementary yet slightly different solution.
It would be relatively hard to change the official version of a package because its distribution is controlled and well documented. That would not be the case for a forked product: the criminal can place malware within the new code and set up a distribution hub that mimics the genuine release site. Such a package could be hosted on a typosquat web page. More worryingly, it could be pulled directly into a developer’s project through text-based links to a compromised repository.
An example article on the educational site W3Schools explains how to download and install a Node.js package with simple command-line commands. Many developers work with a largely or wholly text-based editor, and a simple misspelling of a package name will install the rogue rather than the genuine code. The engines behind development systems usually include an element of code completion to ease the programmer’s workload. A rogue package could therefore be introduced by the automated editor picking the wrong package without the coder noticing. As a rogue package typically includes all the functionality of the original, any malware will not be immediately evident.
The most likely targets would be student developers. Their code might be a ‘work in progress’ and might never reach the level of a full release, but they could still be using infected packages, and their use, perhaps in testing or as an in-house corporate solution, could compromise business or academic networks.
In October 2024 Phylum Research detected an attack mimicking the Puppeteer package through the packages ‘pupeter’ and ‘pupetier’. The genuine Puppeteer package is installed through npm, the Node.js package manager, and is used to control the Chrome browser, for example to scrape data from web pages.
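Near-miss names such as ‘pupeter’ can, in principle, be caught before installation by comparing the requested name with a list of packages the team already trusts. The following is a minimal Python sketch of that idea, not a description of how any package manager or security vendor works. The `APPROVED` set, the `check_name` function and the 0.8 threshold are all assumptions made for illustration; the similarity score comes from the standard library's `difflib.SequenceMatcher`.

```python
from difflib import SequenceMatcher

# Hypothetical list of vetted package names (assumption, not a real policy).
APPROVED = {"puppeteer", "matplotlib", "selenium"}


def similarity(a, b):
    """Return a score between 0.0 (nothing shared) and 1.0 (identical)."""
    return SequenceMatcher(None, a, b).ratio()


def check_name(name, approved=APPROVED, threshold=0.8):
    """Classify a requested package name against the approved list.

    Returns ("ok", name) for an exact match, ("suspect", closest) when the
    name is close to, but not the same as, an approved name, and
    ("unknown", None) otherwise. The threshold is an illustrative guess.
    """
    candidate = name.strip().lower()
    if candidate in approved:
        return ("ok", candidate)
    # Find the approved name that most resembles the requested one.
    closest = max(approved, key=lambda known: similarity(candidate, known))
    if similarity(candidate, closest) >= threshold:
        return ("suspect", closest)
    return ("unknown", None)


if __name__ == "__main__":
    for requested in ("puppeteer", "pupeter", "pupetier"):
        print(requested, check_name(requested))
```

A ‘suspect’ result is only a prompt for a human to look again. A name that is not close to anything on the list is reported as ‘unknown’, which says nothing about whether it is safe.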
In March 2024 the Python Package Index (PyPI) repository briefly suspended new user sign-ups following the upload of more than 500 malicious typosquatting packages. These included variants of popular libraries such as PyTorch (26), Matplotlib (38), and Selenium (28). The attack was almost certainly automated. Most of the packages led to the installation of malware that steals files, Discord tokens, web browser data and cryptocurrency tokens. The malware also attempts to download a Python script (“hvnc.py”) to the Windows Startup folder for persistence.
In August 2022 it was claimed that many repositories on the popular GitHub platform were infected with malware. There are few restrictions on creating GitHub repositories and pushing code to them. The core purpose of the platform is to share code for development and distribution. The platform itself does give clues as to the reliability of any code, including documentation and a history of maintenance and testing. A lack of any of these is a bad sign.
These activities would be considered an attack on the software supply chain and can be mitigated by means similar to those used to protect a physical supply chain:
- Consider whether use of a package is essential for the product.
- Investigate the provenance and reliability of any package.
- Set up a ‘white list’ of approved repositories.
- Force access to repositories through a proxy server, which can allow or block each request according to the approved list.
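The last two measures can be pictured as a single check that a proxy applies to every outgoing request for a package. The sketch below is a simplified Python illustration under stated assumptions, not a real proxy configuration. The `APPROVED_HOSTS` set and the `is_allowed` function are hypothetical, and the hosts listed are examples rather than a recommendation.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist of repository hosts (illustrative only).
APPROVED_HOSTS = {"pypi.org", "files.pythonhosted.org", "registry.npmjs.org"}


def is_allowed(url, approved_hosts=APPROVED_HOSTS):
    """Return True only for HTTPS requests to an exactly matching host.

    The comparison uses the whole host name, so a look-alike such as
    "pypi.org.example.net" is rejected even though it starts with an
    approved name.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    return parts.scheme == "https" and host in approved_hosts


if __name__ == "__main__":
    for url in (
        "https://pypi.org/simple/matplotlib/",
        "https://pypi.org.example.net/simple/matplotlib/",
        "http://pypi.org/simple/matplotlib/",
    ):
        print(url, "allowed" if is_allowed(url) else "blocked")
```

Matching on the exact host name is a deliberate choice: a check based on prefixes or substrings would let through the same kind of look-alike name that typosquatting relies on.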