This repository contains the reference implementation and evaluation of SITARA, a privacy-by-default, three-tier system designed to protect bystanders while preserving utility for consenting parties.
The system enforces on-device blurring at capture, storing only encrypted face packets and embeddings so raw facial pixels are never exposed. It supports landmark-driven synthetic face replacements on a companion phone for immediate wearer utility. When a bystander explicitly consents, the system uses a cryptographic, consent-mediated split-key protocol with a Trusted Third Party (TTP) to restore the original face.
The prototype runs in real-time on Raspberry Pi 4 hardware, and this repository includes the full pipeline, evaluation scripts, and a novel dataset.
- Privacy at Capture: Mandatory on-device blurring with per-face encrypted packets.
- Reversible & Consented Restoration: A TTP-mediated split-key protocol ensures restorations only occur with bystander signatures.
- Usability-Preserving Synthetic Replacement: Landmark-driven, mobile-optimized face replacement to maintain wearer experience without compromising privacy.
- Working Prototype: Full implementation on Raspberry Pi 4 + companion Android app.
- Dataset: 16,500 annotated frames collected with Ray-Ban Meta hardware (released with this repo).
- User Study: A comprehensive qualitative evaluation involving 9 camera-glass wearers and 9 bystanders.
The repository code and assets are mapped to the paper’s three-tier architecture:
| Component | Description |
|---|---|
| `main.py` | On-Device Pipeline: Handles face detection, landmark extraction, convex-hull blurring, per-stream AES key generation, and encryption of face packets/embeddings (see the sketch below the table). |
| `Synthetic Replacement/` | Companion Pipeline: Warping and MobileFaceSwap refinement for synthetic face generation. |
| `decryption/` | Restoration & TTP: Includes `ttp_code_cosine.py` (server-side matching of encrypted embeddings) and `restore.py` (companion-phone restoration). |
| `SITARA_eval.ipynb` | Evaluation: Full accuracy evaluation framework. |
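As a rough illustration of what the on-device pipeline stores, the sketch below encrypts one face packet (cropped pixels plus embedding) under a fresh AES-256-GCM key using the `cryptography` package. The packet fields and helper are assumptions for illustration, not the exact format produced by `main.py`.

```python
# Minimal sketch: encrypt one face packet under a fresh AES-256-GCM key.
# Field names and structure are illustrative, not main.py's actual format.
import json
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encrypt_face_packet(face_pixels: bytes, embedding: bytes):
    key = AESGCM.generate_key(bit_length=256)  # fresh key per stream/face
    nonce = os.urandom(12)
    blob = json.dumps({"face": face_pixels.hex(),
                       "embedding": embedding.hex()}).encode()
    ciphertext = AESGCM(key).encrypt(nonce, blob, None)
    # Only the encrypted packet leaves the device; the key itself is protected
    # by the consent-mediated split-key protocol with the TTP.
    return key, {"nonce": nonce, "ciphertext": ciphertext}
```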
We release a sampled subset (16,500 annotated frames) captured with Ray-Ban Meta-style glasses. 🔗 Download Dataset Here
The dataset includes:
- `video_frames_mapping.csv`: Mapping of video filenames to extracted frame numbers.
- `Annotated XMLs/`: Manually annotated XML files (`{VideoName}_Frame{FrameNumber}_output.xml`).
- `Annotated JSONs/`: Manually annotated JSON files compatible with COCO metrics.
- Categorized folders:
  - `Movement of Faces/`: Videos categorized by subject movement.
  - `Num of Faces/`: Videos categorized by face density (0–5 faces).
  - `Size of Faces/`: Videos categorized by face size (Close, Medium, Far).
The repository has been thoroughly tested on Windows 11 Pro with Python 3.12.1 and 3.12.3. Please ensure both the gcc and g++ compilers are installed on your system. You can check this by executing the following commands:

```bash
gcc --version
g++ --version
```

Clone the repository and install the dependencies:

```bash
git clone <repository-url>
cd SmartGlassesPrivacy
pip install -r requirements.txt
```

Create the necessary input/output directories and place your source video in `input/`:

```bash
mkdir input
mkdir output
# Place your video.mp4 inside input/
```

Edit the `Config` class in `main.py` to point to your specific video and adjust parameters:
```python
class Config:
    input_video_path = "./input/video.mp4"
    output_video_path = "./output/video.mp4"
    OVERLAY_DETECTOR_BOX = False  # Set True for debugging
    SAVE_OUTPUT = True
    # ...other config options...
```

Run the main pipeline to generate the blurred video, encrypted metadata, face embeddings, and landmark files:
```bash
python main.py
```

Troubleshooting VideoWriter: If the blurred video doesn't save, modify the codec in `src/utils:136`:

```python
fourcc = cv2.VideoWriter_fourcc(*'avc1')
```

Try alternatives like `*'mp4v'`, `*'XVID'`, or `*'MJPG'`. Note: XVID/MJPG require the `.avi` extension.
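For example, a fallback might look like the following (the resolution, frame rate, and path are illustrative; only the codec and file extension matter):

```python
import cv2

# Illustrative fallback: XVID and MJPG codecs need an .avi container.
fourcc = cv2.VideoWriter_fourcc(*"XVID")
writer = cv2.VideoWriter("./output/video.avi", fourcc, 30.0, (1280, 720))
```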
Note: main.py uses a sequential demo approach for easy prototyping. For the actual concurrency testing described in the paper (using the 3-queue model), please refer to encryption/performance_eval_rpi.py.
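For intuition, a 3-queue pipeline can be sketched with one queue per hand-off between stages, as below. This is a minimal illustration with stand-in stage bodies, not the implementation in `encryption/performance_eval_rpi.py`.

```python
# Minimal 3-queue pipeline sketch (detect -> blur -> encrypt); stage bodies
# are stand-ins, not the actual SITARA stages.
import queue
import threading

frames_q, faces_q, packets_q, results_q = (queue.Queue() for _ in range(4))
STOP = object()  # sentinel that shuts each stage down

def stage(in_q, out_q, work):
    while (item := in_q.get()) is not STOP:
        out_q.put(work(item))
    out_q.put(STOP)

detect = lambda frame: frame     # stand-in: face + landmark detection
blur = lambda faces: faces       # stand-in: convex-hull blurring + cropping
encrypt = lambda packet: packet  # stand-in: AES encryption of the face packet

threads = [
    threading.Thread(target=stage, args=(frames_q, faces_q, detect)),
    threading.Thread(target=stage, args=(faces_q, packets_q, blur)),
    threading.Thread(target=stage, args=(packets_q, results_q, encrypt)),
]
for t in threads:
    t.start()

for frame in range(100):  # stand-in frame source
    frames_q.put(frame)
frames_q.put(STOP)

for t in threads:
    t.join()
```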
Warp a video with a synthetic face and refine it using MobileFaceSwap.
Prerequisites:
- Ensure the MobileFaceSwap checkpoints are correctly placed inside the `Synthetic Replacement/MobileFaceSwap/` directory. The checkpoints can be downloaded via the drive link provided in the official MobileFaceSwap repository: https://github.com/Seanseattle/MobileFaceSwap
Run Command:
cd "Synthetic Replacement"
python main.py \
--video "/absolute/path/to/your_video.mp4" \
--landmarks_root "/absolute/path/to/video_landmarks" \
--after_swapOutput: The script generates the warped video and the final swapped result in the repo directory.
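For intuition, the sketch below shows the basic idea of landmark-driven warping with OpenCV: estimate a similarity transform between the synthetic face's landmarks and the detected landmarks, then warp and paste. It is a conceptual illustration only; the repository's warping and MobileFaceSwap refinement are more involved.

```python
# Conceptual sketch of landmark-driven face warping; not the repository's
# actual implementation.
import cv2
import numpy as np

def warp_synthetic_face(synthetic, frame, src_landmarks, dst_landmarks):
    """Align `synthetic` to the frame by mapping its landmarks onto the detected ones."""
    M, _ = cv2.estimateAffinePartial2D(np.float32(src_landmarks),
                                       np.float32(dst_landmarks))
    h, w = frame.shape[:2]
    warped = cv2.warpAffine(synthetic, M, (w, h))
    mask = cv2.warpAffine(np.full(synthetic.shape[:2], 255, np.uint8), M, (w, h))
    out = frame.copy()
    out[mask > 0] = warped[mask > 0]  # paste the warped face over the blurred region
    return out
```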
To demo the consent-based restoration, navigate to the decryption folder.
This script decrypts embeddings, matches them against a local "database" (images in the `output` folder), and generates keys for valid matches:

```bash
cd decryption
python ttp_code_cosine.py
```
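At its core, the matching step is a cosine-similarity search over face embeddings; a minimal sketch is shown below (the threshold and data layout are assumptions, not the script's exact logic).

```python
# Minimal cosine-similarity matching sketch; threshold and structure are illustrative.
import numpy as np

def cosine_match(query: np.ndarray, database: dict, threshold: float = 0.6):
    """Return the identity whose stored embedding is most similar to `query`,
    or None if no similarity clears the threshold."""
    best_id, best_sim = None, -1.0
    for identity, ref in database.items():
        sim = float(np.dot(query, ref) /
                    (np.linalg.norm(query) * np.linalg.norm(ref)))
        if sim > best_sim:
            best_id, best_sim = identity, sim
    return best_id if best_sim >= threshold else None
```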
Using the keys released by the TTP, restore the original faces:

```bash
python restore.py
```

Optional utility: run `decrypt_face_blobs_per_id.py` to inspect decrypted face regions as standalone JPEGs.
We provide a Jupyter Notebook to reproduce our COCO-style metrics (AP/AR):
- Ensure the ground-truth JSONs are in `Annotated JSONs/`
- Ensure the prediction JSONs are in `output/frame_json/`
- Run:

```bash
jupyter notebook SITARA_eval.ipynb
```
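The notebook reports standard COCO-style AP/AR; an equivalent minimal sketch with `pycocotools` is shown below (the JSON file names are placeholders for the ground-truth and prediction files).

```python
# Minimal COCO-style AP/AR evaluation sketch; JSON file names are placeholders.
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("Annotated JSONs/ground_truth.json")              # placeholder path
coco_dt = coco_gt.loadRes("output/frame_json/predictions.json")  # placeholder path

evaluator = COCOeval(coco_gt, coco_dt, iouType="bbox")
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()  # prints AP/AR at the standard IoU thresholds
```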
Values in bold indicate the best performance in the comparison.
| Category | Sub-Category | Our Pipeline (AP) | Our Pipeline (AR) | EgoBlur (AP) | EgoBlur (AR) |
|---|---|---|---|---|---|
| Number of Faces | One Face | 0.990 | 0.997 | 0.932 | 1.000 |
| | Two Face | 0.967 | 0.979 | 0.963 | 0.982 |
| | Three Face | 0.979 | 0.982 | 0.974 | 0.981 |
| | Four Face | 0.900 | 0.904 | 0.909 | 0.934 |
| | Five Face | 0.882 | 0.917 | 0.952 | 0.969 |
| Movement State | Rest | 0.990 | 0.998 | 0.971 | 0.999 |
| | Head | 0.920 | 0.947 | 0.925 | 0.980 |
| | Bystander | 0.959 | 0.968 | 0.961 | 0.983 |
| Face Size | Far | 1.000 | 1.000 | 0.930 | 1.000 |
| | Medium | 0.990 | 0.992 | 0.856 | 0.993 |
| | Close | 0.990 | 0.997 | 0.989 | 1.000 |
| Method | CCV2 (AP) | CCV2 (AR) | Custom Dataset (AP) | Custom Dataset (AR) |
|---|---|---|---|---|
| Our Pipeline | 0.98 | 0.99 | 0.9421 | 0.9531 |
| EgoBlur | 0.99 | 0.99 | 0.9354 | 0.9736 |
The following figure highlights the stability of Tier 1 blurring accuracy across detector confidence thresholds.
The `evaluation.py` file inside the `Synthetic Replacement` folder compares our pipeline with the baseline (MFS) across five different metrics. Modify the folder paths to point to the original videos, the MFS videos, and the videos processed by our pipeline.
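As a small illustration of the per-frame comparison, the sketch below computes SSIM and PSNR between an original and a processed video using scikit-image (paths are illustrative; the full metric suite, including FID, LPIPS, and landmark distance, lives in `evaluation.py`).

```python
# Per-frame SSIM/PSNR comparison sketch; video paths are illustrative.
import cv2
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

ref = cv2.VideoCapture("original.mp4")
test = cv2.VideoCapture("pipeline_output.mp4")
ssim_scores, psnr_scores = [], []

while True:
    ok_a, a = ref.read()
    ok_b, b = test.read()
    if not (ok_a and ok_b):
        break
    a = cv2.cvtColor(a, cv2.COLOR_BGR2GRAY)
    b = cv2.cvtColor(b, cv2.COLOR_BGR2GRAY)
    ssim_scores.append(structural_similarity(a, b))
    psnr_scores.append(peak_signal_noise_ratio(a, b))

if ssim_scores:
    print("SSIM:", sum(ssim_scores) / len(ssim_scores))
    print("PSNR:", sum(psnr_scores) / len(psnr_scores))
```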
The figure below shows a breakdown of synthetic face replacement metrics by category.
The following table shows a summarized version of the results.
| Metric | Baseline | Our Pipeline (Sitara) | Theoretical Range |
|---|---|---|---|
| FID ↓ | 31.00 ± 13.83 | 63.70 ± 27.78 | ≥ 0 |
| SSIM ↑ | 0.76 ± 0.07 | 0.61 ± 0.07 | [0, 1] |
| PSNR (dB) ↑ | 15.87 ± 2.77 | 12.85 ± 1.95 | [0, ∞) |
| LPIPS ↓ | 0.14 ± 0.06 | 0.27 ± 0.07 | [0, 1] |
| Landmark Dist. ↓ | 8.81 ± 3.99 | 15.94 ± 7.13 | [0, ∞) |
System latency and energy overheads are reported on live videos recorded using RPI Camera Module 1.3. The measurement workbench is shown here:
| Metric | Baseline | Average (Privacy Only) | Average (Privacy + Synthetic) |
|---|---|---|---|
| Storage | 56.16 MB | — | 69.33 MB |
| Energy | 40.00 J | 67.04 J | 112.05 J |
| Latency | 9.98 s | 13.69 s | 22.88 s |
| Scene type | Storage (MB) | Energy (J) | Latency (s) |
|---|---|---|---|
| Close | 88.20 | 91.28 | 18.45 |
| Medium | 71.90 | 83.84 | 17.31 |
| Far | 65.00 | 78.49 | 15.59 |
| Head | 62.10 | 84.60 | 16.99 |
| Bystander | 66.50 | 86.49 | 17.40 |
| Rest | 72.70 | 83.14 | 17.43 |
| Category | Storage (MB) | Energy (J) | Latency (s) |
|---|---|---|---|
| 0 Face | 55.00 | 49.66 | 10.02 |
| 1 Face | 70.30 | 85.24 | 17.89 |
| 2 Face | 67.90 | 117.44 | 24.04 |
| 3 Face | 69.60 | 149.28 | 30.89 |
| 4 Face | 69.90 | 192.27 | 39.60 |
| 5 Face | 72.80 | 242.81 | 48.88 |
The figure below shows Power Consumption traces for each category on RPi 4 B:
On-device face blurring and encryption process each frame through: Face Detector → Landmark Detector → Blurring + Encryption. The dominant computational cost is running full-frame face detection.
The figure below highlights the detector inference cost in terms of Power (W) and Time (ms) on RPi 4 Model B:
To reduce this computational bottleneck, we utilize a frame skip strategy with optical flow tracking between frames. The figures below show the Accuracy-Latency tradeoff across skip values:
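For reference, the skip-and-track loop can be sketched roughly as follows, assuming sparse Lucas-Kanade tracking of face-box centres; the detector call is a stand-in and the skip value is illustrative.

```python
# Skip-and-track sketch: detect every SKIP frames, track box centres in between.
# The detector is a stand-in; SKIP is an illustrative value.
import cv2
import numpy as np

SKIP = 3
detect_faces = lambda frame: [(100, 100, 80, 80)]  # stand-in: (x, y, w, h) boxes

cap = cv2.VideoCapture("input/video.mp4")
prev_gray, boxes, idx = None, [], 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    if idx % SKIP == 0 or not boxes or prev_gray is None:
        boxes = detect_faces(frame)  # full detection every SKIP frames
    else:
        # Propagate box centres with Lucas-Kanade optical flow on skipped frames
        pts = np.float32([[[x + w / 2, y + h / 2]] for x, y, w, h in boxes])
        new_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
        boxes = [
            (int(cx - w / 2), int(cy - h / 2), w, h)
            for (cx, cy), (_, _, w, h), ok_ in zip(new_pts.reshape(-1, 2),
                                                   boxes, status.ravel())
            if ok_
        ]
    # ...blur + encrypt `boxes` as in main.py...
    prev_gray, idx = gray, idx + 1
cap.release()
```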
Synthetic Replacement applies landmark-driven replacement on blurred inputs. We compare our pipeline with a baseline where synthetic replacements are applied on unblurred faces as a target upper-bound.
The figure below visually demonstrates Tier 2 synthetic replacement accuracy for our pipeline compared with baselines:
We implement the consent-based restoration flow through a simulated server architecture:
- `main.py` - Executes the code for the camera glasses, generating the blurred video and encrypted data.
- `decryption/transmit_data_to_phone.py` - Transmits the encrypted data to the phone and subsequently the encrypted keys and embeddings to the TTP.
- `decryption/ttp_code_cosine.py` - Performs the matching on the TTP server.
- `decryption/transmit_keys_to_phone.py` - Transmits the decrypted keys back to the companion phone (assuming consent is granted).
- Companion Phone Application - Listens for this data and uses the key(s) to decrypt the data and restore the decrypted regions back into the blurred video.
Prerequisites:
- The companion Android application should be running for both transmit files.
- Ensure all encrypted data has been generated from `main.py` before starting the restoration process.
Execution Order:
```bash
# Step 1: Generate encrypted data
python main.py

# Step 2: Transmit to phone and TTP
cd decryption
python transmit_data_to_phone.py

# Step 3: TTP performs matching
python ttp_code_cosine.py

# Step 4: Transmit keys back to phone
python transmit_keys_to_phone.py
```

The Android application implementation can be found at this drive link: https://drive.google.com/drive/u/4/folders/1MIjSEbBOurB1UHyRVYXc_2DuNivU9QtL
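For intuition about why neither party alone can restore a face, two-party key splitting can be illustrated with a generic XOR-based secret-sharing sketch. This is a textbook construction for illustration only, not the protocol implemented in this repository or specified in the paper.

```python
# Conceptual XOR-based two-share key splitting, for intuition only; the actual
# consent-mediated split-key protocol is implemented across main.py, the
# decryption scripts, and the companion app.
import os

def split_key(key: bytes):
    """Split `key` into two shares; neither share alone reveals the key."""
    share_a = os.urandom(len(key))                        # random mask (e.g., held by the TTP)
    share_b = bytes(a ^ k for a, k in zip(share_a, key))  # masked key (e.g., held by the phone)
    return share_a, share_b

def combine_shares(share_a: bytes, share_b: bytes) -> bytes:
    """Recombine the shares; restoration requires both parties to cooperate."""
    return bytes(a ^ b for a, b in zip(share_a, share_b))

key = os.urandom(32)
a, b = split_key(key)
assert combine_shares(a, b) == key
```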
We conducted a qualitative study (N=18) on wearers' and bystanders' perceptions of opt-in, privacy-by-default approaches for camera glasses. Participants interacted with the protocol interface and discussed their perceptions in semi-structured interviews. Our findings show that bystanders viewed the opt-in protocol as essential and advocated for even stronger anonymization. Wearers appreciated the protocol's safeguards but found it visually limiting, expressing a desire for a context-dependent version that can be enabled in relevant scenarios.
We recruited participants from our local university community. The following figure shows an overview of our recruitment and assignment workflow, including the number of participants at each stage.
The study employed a two-phase design with two participant groups: wearers and bystanders. Wearers were provided with the Meta Ray-Ban Stories glasses during a one-week onboarding period. Bystander participants proceeded directly to the in-person interview. Participants recorded short videos using the glasses, with footage processed through our protocol to generate blurred and AI-replaced versions. The following figure shows a screenshot of the protocol interface used by the participants.
We structure our exploration to examine the following research questions:
- RQ1 examines bystander privacy needs in opt-in approaches
- RQ2 investigates wearer usability requirements
Our findings reveal that consent mechanisms introduce complex social dynamics, obfuscation effectiveness depends heavily on context, and both stakeholder groups balance competing priorities: bystanders emphasize privacy protection while wearers prioritize usability and recording capability.
This divide goes beyond simple preference differences and represents distinct frameworks for understanding privacy in the age of ubiquitous recording.
Bystanders approached the protocol from a rights-based perspective, viewing its protections as essential safeguards rather than optional features.
Wearers evaluated the protocol through a social and practical lens. While they acknowledged the protocol's value in providing social license to record and reducing ethical burden, they found its default obfuscation visually limiting.
These opposing frameworks suggest that successful privacy-mediating technologies must somehow reconcile rights-based and pragmatic perspectives without simply defaulting to restrictive approaches, which would limit adoption, or permissive ones, which fail to address legitimate privacy concerns.
Context Dependent Application: Wearers seek contextual flexibility while bystanders require mandatory protection. Future systems should explore context-dependent ways of enabling or disabling the protocol; for example, the protocol could be relaxed in familiar private locations (such as a wearer's home) but remain mandatory in public spaces.
Mitigating Consent Fatigue: Meaningful consent inherently introduces fatigue for both wearers and bystanders. Future systems should allow bystanders to specify contextual constraints and preference settings.
Mitigating TTP-Associated Risks: Bystanders require context to make informed privacy decisions, yet excessive metadata increases third-party exposure. Progressive disclosure balances these needs by providing minimal initial info while allowing bystanders to request more context as required. Future work can also explore decentralized architectures.
Incentives for Manufacturers: Current camera glass manufacturers focus primarily on wearer usability while neglecting these social barriers that limit adoption. Our work points to an important value proposition for manufacturers: while the protocol introduces some operational overhead, it legitimizes recording practices and reduces the social stigma attached to camera glasses.
This work has been published at the IEEE International Conference on Pervasive Computing and Communications (PerCom 2026) and the ACM CHI Conference on Human Factors in Computing Systems (CHI 2026).
If you find our code, dataset, or qualitative exploration of opt-in privacy useful, please use the following citations:
@inproceedings{khawaja2026now,
title={{Now You See Me, Now You Don’t: Consent-Driven Privacy for Smart Glasses}},
author={Khawaja, Yahya and Nabeel, Eman and Humayun, Sana and Javed, Eruj and Krombholz, Katharina and Alizai, Hamad and Bhatti, Naveed},
booktitle={Proceedings of the IEEE International Conference on Pervasive Computing and Communications (PerCom)},
year={2026},
publisher = {IEEE},
}
@inproceedings{khawaja2026see,
title={{See Me If You Can: A Multi-Layer Protocol for Bystander Privacy with Consent-Based Restoration}},
author={Khawaja, Yahya and Rehman, Shirin and Ponticello, Alexander and Bhardwaj, Divyanshu and Krombholz, Katharina and Alizai, Hamad and Bhatti, Naveed},
booktitle={Proceedings of the Conference on Human Factors in Computing Systems (CHI)},
year={2026},
publisher = {ACM}
}