This is the artifact of "Real or Rogue? Detecting Malicious Miniapps with Deceptive Reporting Interface" (WWW'26).
WWW'26 Artifact Evaluation Information: instructions for checking this artifact are in Section 3.
This tool, "MIRAGE", identifies miniapps with deceptive reporting interfaces. In general, the tool consists of three components.
Component 0: Analyze the entire miniapp package and generate the data flow graph.
Component 1A: Reconstruct the displayable contents of the pages and resolve the texts on each page.
Component 1B: Reconstruct the data flow to resolve the domains associated with information-sending APIs.
Component 2: A similarity model to fit, embed, and infer the similarity between any given page and the official reporting interface.
Availability: The artifact makes the code, the dataset, and the accompanying analysis results of the malware behavior available. The code and analysis notes are included in this repository. The dataset is only partially included in this repository, to keep preliminary testing of the artifact lightweight. The full dataset can be requested by following the instructions at https://minimalware.github.io/.
As requested by the affected platforms that own the miniapp packages, we cannot release the dataset without verifying requesters' identities and ensuring the dataset is processed ethically.
Functionality: The released code is fully functional, i.e., it executes and produces output in the expected formats.
- The results of the module that analyzes and reconstructs the displayable contents and the domain list are written to scannedpages.csv and scannedurls.csv.
- The similarity scores w.r.t. the official interface used for malware identification, as well as the final identification results, are generated in the data/ folder.
Reproducibility: Due to the nature of large-scale analysis and the embedding model required for malware identification, the reproducibility of this tool is evaluated separately. For the data reconstruction output (1-run.py), the result is deterministic and stays the same across different combinations of miniapps to analyze. The results are reproducible, and a snippet containing the entries of the 5 miniapps from the original JSON file is attached in ground_truth for cross-checking. For the similarity-based identification (2-analyze.py), the result may vary across executions and will vary with different corpora of miniapps to analyze. Nevertheless, this repository retains the scripts for training the transformer model, setting the threshold, and identifying malware with it, so that future researchers can build on them. However, please bear in mind that since this repository provides only 5 miniapps to evaluate, the identification results will differ from those in the paper. For any further inquiries, please contact the first author at https://frostwing98.com/.
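For a quick cross-check of the deterministic phase-1 output against the ground_truth snippet, a sketch such as the following can be used after running 1-run.py. The filename ground_truth/snippet.json and the appid field are hypothetical placeholders; substitute whatever file and keys are actually shipped in ground_truth/.

```python
# Minimal cross-check sketch: confirm each appid in the ground-truth snippet
# also appears in the freshly generated scannedpages.csv.
# NOTE: "ground_truth/snippet.json" and the "appid" key are hypothetical;
# replace them with the actual file and field names in ground_truth/.
import json

csv_text = open("scannedpages.csv", encoding="utf-8").read()
with open("ground_truth/snippet.json", encoding="utf-8") as f:
    truth = json.load(f)          # assumed: a JSON list of per-miniapp records

for entry in truth:
    appid = entry.get("appid", "")
    status = "found" if appid and appid in csv_text else "NOT FOUND"
    print(f"{appid}: {status} in scannedpages.csv")
```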
To set up and use the tool, please follow the steps below.
chmod +x delete.sh
chmod +x decrypt.sh
chmod +x restart.sh
./restart.sh
python3 -m venv ./venv
source ./venv/bin/activate
Please bear in mind that this repo contains NodeJS code. To reduce potential issues, node_modules is included. Users are welcome to set up and install the required libraries on their own if needed.
For Python, please install the following libraries, depending on which function you want to perform (a small dependency sanity-check sketch is provided after the lists below).
python3 -m pip install {necessary_lib}
These libraries are required for executing the analysis tool, 1-run.py.
escodegen
graphviz
lxml
These libraries are required for identifying malware based on similarity scores with 2-analyze.py.
matplotlib
transformers
sentence_transformers
pandas
seaborn
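Before running the scripts, a quick sanity check such as the sketch below can confirm that the dependencies resolve inside the active virtual environment. It assumes each import name matches the pip package name, which may not hold for every dependency.

```python
# Minimal sanity-check sketch: verify the Python dependencies listed above are
# importable. Import names are assumed to equal the pip package names.
import importlib.util

required = {
    "1-run.py": ["escodegen", "graphviz", "lxml"],
    "2-analyze.py": ["matplotlib", "transformers", "sentence_transformers",
                     "pandas", "seaborn"],
}
for script, modules in required.items():
    missing = [m for m in modules if importlib.util.find_spec(m) is None]
    print(f"{script}: {'all dependencies found' if not missing else 'missing: ' + ', '.join(missing)}")
```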
To perform the check, run
python3 1-run.py
Two files should be generated in the root folder. Kindly check scannedpages.csv and scannedurls.csv.
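For a quick programmatic look instead of opening the files manually, a sketch like the one below prints the size and header of both outputs; it makes no assumptions about the column names.

```python
# Minimal sketch: summarize the two CSVs produced by 1-run.py.
import pandas as pd

for path in ("scannedpages.csv", "scannedurls.csv"):
    df = pd.read_csv(path)
    print(f"{path}: {len(df)} rows, columns: {list(df.columns)}")
    print(df.head().to_string(), "\n")
```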
Optionally, to check how the malware is identified, run
python3 2-analyze.py
This should generate miniapp_all_result1.csv and miniapp_all_result2.csv, which respectively report whether any miniapps contain reporting interfaces similar to the ones in Figure 1 and Figure 7 of the paper. Bear in mind that the threshold is set for the full dataset, not the 5-miniapp dataset in this repo.
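To illustrate the idea behind the similarity-based identification, the sketch below mirrors the gist with sentence_transformers: embed the reconstructed page texts and a reference description of the official reporting interface, then flag pages whose cosine similarity exceeds a threshold. The model name, reference text, page texts, and threshold are illustrative placeholders, not the values used in the paper; the actual pipeline lives in main_similarity.py, testsimilarity.py, and readcsv.py.

```python
# Minimal sketch of similarity-based identification with SentenceBERT embeddings.
# Model name, reference text, page texts, and threshold are placeholders.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")            # placeholder model

official_reference = "Report this miniapp for violations"  # placeholder text
page_texts = {
    "pages/fake_report/index": "Report a violation: enter your phone number and ID",
    "pages/shop/index": "Add items to your cart and check out",
}

ref_emb = model.encode(official_reference, convert_to_tensor=True)
threshold = 0.60                                           # placeholder threshold

for page, text in page_texts.items():
    score = util.cos_sim(model.encode(text, convert_to_tensor=True), ref_emb).item()
    verdict = "similar to official reporting interface" if score >= threshold else "not similar"
    print(f"{page}: similarity={score:.3f} -> {verdict}")
```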
This project includes two modules. The core MIRAGE module that detects the MIRROR malware is located in myanalyzer, and the supporting module, a basic miniapp analysis framework derived from CMRFScanner and TaintMini, is located in pdg_js.
data and interm are used to store detection results and intermediate static-analysis data, including the analysis graphs.
ground_truth, artifact, and miniapps are used to store a partial subset of the dataset for evaluating the functionality of the tool.
Due to an inherited issue of the miniapp package unpacker, the wuWxapkg series of scripts has to be placed in the root folder, which makes it look a bit messy.
The entire topology is as follows.
MIRAGE # root folder
├─ artifact/
│ └─ appids.csv # index list for appids to analyze
│
├─ miniapps/ # miniapp packages
│
├─ interm/ # directory for MIRAGE interm data
│
├─ data/ # folder storing final results
│
├─ ground_truth/ # info for validating reproducibility
│
├─ metainfo/ # detailed info for the dataset
│
├─ myanalyzer/ # CORE MIRAGE FUNCTIONALITIES
│ ├─ cfg.py # data flow analysis module
│ ├─ wxmlparser.py # WXML adaptive module
│ └─ main_thread.py # main entrance script
│
├─ pdg_js/ # supporting Miniapp Analyzer utility
│
├─ 1-run.py # script for phase 1 validation
├─ 2-analyze.py # script for phase 2 validation
│
├─ main_similarity.py # SentenceBERT embedding generation
├─ testsimilarity.py # similarity score generation
├─ readcsv.py # csv processing & malware detection
│
├─ 5-url-assigner.py # multicore batch script (server)
├─ 5-url-worker.py # multicore batch script (client)
│
├─ decrypt.sh # package unpacker shell
├─ wuWxapkg.js # miniapp package unpacker
├─ restart.sh # directory initialization
├─ delete.sh # unpacked miniapp remover
We host the dataset on Google Drive. However, given the sensitive nature of these packages, and as requested by the affected platform during our responsible disclosure, the dataset shall not be released to malicious parties. Hence, a mandatory identity and motivation check is enforced. To access the full dataset, please follow the instructions at https://minimalware.github.io/.
Please note that the dataset is shared on an "in dubio pro reo" basis, i.e., we trust that users who access the dataset use it for research purposes and will delete it after the research is finished. Therefore, we require self-identification and motivation statements.
However, we retain the right to withdraw access to the dataset should ethical concerns arise or a petition from the original owners of the dataset be received.
We thank Dr. Yue Zhang for his time and advice contributed to this research. We also thank Chao Wang for his effort in maintaining the cloud infrastructure that is fundamental to this research. We thank Tencent and the WeChat Security Team for providing initial examples and valuable insights that initiated this research. Finally, we thank Dr. Chaoshun Zuo and Dr. Yue Zhang for helping us obtain the dataset of miniapp packages.
In addition, Dr. Yuqing Yang particularly thanks his parents, Mr. Yakun Yang and Ms. Jing Feng, for helping him evaluate the generality of the research by testing whether there are geographic differences in overseas versions of Super Apps.
If this work benefits your research, kindly cite it with the following BibTeX entry.
@inproceedings{mirror,
author = {Yuqing Yang and Zhiqiang Lin},
title = {Real or Rogue? Detecting Malicious Miniapps with Deceptive Reporting Interface},
booktitle = {Proceedings of the {ACM} on Web Conference 2026, {WWW} 2026, Dubai, United Arab Emirates, April 13-17, 2026},
publisher = {{ACM}},
year = {2026},
url = {https://doi.org/10.1145/3774904.3792470},
doi = {10.1145/3774904.3792470},
}
Additional inquiries can be directed to the first author's email. Kindly refer to https://frostwing98.com/.
