This repository contains the reference implementation and evaluation of SITARA, a privacy-by-default, three-tier system designed to protect bystanders while preserving utility for consenting parties.
The system enforces on-device blurring at capture, storing only encrypted face packets and embeddings so raw facial pixels are never exposed. It supports landmark-driven synthetic face replacements on a companion phone for immediate wearer utility. When a bystander explicitly consents, the system uses a cryptographic, consent-mediated split-key protocol with a Trusted Third Party (TTP) to restore the original face.
The prototype runs in real-time on Raspberry Pi 4 hardware, and this repository includes the full pipeline, evaluation scripts, and a novel dataset.
- Privacy at Capture: Mandatory on-device blurring with per-face encrypted packets.
- Reversible & Consented Restoration: A TTP-mediated split-key protocol ensures restorations only occur with bystander signatures.
- Usability-Preserving Synthetic Replacement: Landmark-driven, mobile-optimized face replacement to maintain wearer experience without compromising privacy.
- Working Prototype: Full implementation on Raspberry Pi 4 + companion Android app.
- Dataset: 16,500 annotated frames collected with Ray-Ban Meta hardware (released with this repo).
- User Study: A comprehensive qualitative evaluation involving 9 camera-glass wearers and 9 bystanders.
The repository code and assets are mapped to the paper’s three-tier architecture:
| Component | Description |
|---|---|
| `main.py` | On-Device Pipeline: Handles face detection, landmark extraction, convex-hull blurring, per-stream AES key generation, and encryption of face packets/embeddings (see the sketch below the table). |
| `Synthetic Replacement/` | Companion Pipeline: Warping and MobileFaceSwap refinement for synthetic face generation. |
| `decryption/` | Restoration & TTP: Includes `ttp_code_cosine.py` (server-side matching of encrypted embeddings) and `restore.py` (companion-phone restoration). |
| `SITARA_eval.ipynb` | Evaluation: Full accuracy evaluation framework. |
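As a rough illustration of what the on-device pipeline stores, the sketch below encrypts one face packet (cropped pixels plus embedding) under a fresh AES-256-GCM key using the `cryptography` package. The packet fields and helper are assumptions for illustration, not the exact format produced by `main.py`.

```python
# Minimal sketch: encrypt one face packet under a fresh AES-256-GCM key.
# Field names and structure are illustrative, not main.py's actual format.
import json
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encrypt_face_packet(face_pixels: bytes, embedding: bytes):
    key = AESGCM.generate_key(bit_length=256)  # fresh key per stream/face
    nonce = os.urandom(12)
    blob = json.dumps({"face": face_pixels.hex(),
                       "embedding": embedding.hex()}).encode()
    ciphertext = AESGCM(key).encrypt(nonce, blob, None)
    # Only the encrypted packet leaves the device; the key itself is protected
    # by the consent-mediated split-key protocol with the TTP.
    return key, {"nonce": nonce, "ciphertext": ciphertext}
```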
We release a sampled subset (16,500 annotated frames) captured with Ray-Ban Meta-style glasses. 🔗 Download Dataset Here
The dataset includes:
- `video_frames_mapping.csv`: Mapping of video filenames to extracted frame numbers.
- `Annotated XMLs/`: Manually annotated XML files (`{VideoName}_Frame{FrameNumber}_output.xml`).
- `Annotated JSONs/`: Manually annotated JSON files compatible with COCO metrics.
- Categorized folders:
  - `Movement of Faces/`: Videos categorized by subject movement.
  - `Num of Faces/`: Videos categorized by face density (0–5 faces).
  - `Size of Faces/`: Videos categorized by face size (Close, Medium, Far).
The repository has been thoroughly tested on Windows 11 Pro with Python 3.12.1 and 3.12.3. Please ensure both the gcc and g++ compilers are installed on your system. You can check this by executing the following commands:

```bash
gcc --version
g++ --version
```

Clone the repository and install the dependencies:

```bash
git clone <repository-url>
cd SmartGlassesPrivacy
pip install -r requirements.txt
```

Create the necessary input/output directories and place your source video in `input/`:

```bash
mkdir input
mkdir output
# Place your video.mp4 inside input/
```

Edit the `Config` class in `main.py` to point to your specific video and adjust parameters:
```python
class Config:
    input_video_path = "./input/video.mp4"
    output_video_path = "./output/video.mp4"
    OVERLAY_DETECTOR_BOX = False  # Set True for debugging
    SAVE_OUTPUT = True
    # ...other config options...
```

Run the main pipeline to generate the blurred video, encrypted metadata, face embeddings, and landmark files:
```bash
python main.py
```

Troubleshooting VideoWriter: If the blurred video doesn't save, modify the codec in `src/utils:136`:

```python
fourcc = cv2.VideoWriter_fourcc(*'avc1')
```

Try alternatives like `*'mp4v'`, `*'XVID'`, or `*'MJPG'`. Note: XVID/MJPG require the `.avi` extension.
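For example, a fallback might look like the following (the resolution, frame rate, and path are illustrative; only the codec and file extension matter):

```python
import cv2

# Illustrative fallback: XVID and MJPG codecs need an .avi container.
fourcc = cv2.VideoWriter_fourcc(*"XVID")
writer = cv2.VideoWriter("./output/video.avi", fourcc, 30.0, (1280, 720))
```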
Note: main.py uses a sequential demo approach for easy prototyping. For the actual concurrency testing described in the paper (using the 3-queue model), please refer to encryption/performance_eval_rpi.py.
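For intuition, a 3-queue pipeline can be sketched with one queue per hand-off between stages, as below. This is a minimal illustration with stand-in stage bodies, not the implementation in `encryption/performance_eval_rpi.py`.

```python
# Minimal 3-queue pipeline sketch (detect -> blur -> encrypt); stage bodies
# are stand-ins, not the actual SITARA stages.
import queue
import threading

frames_q, faces_q, packets_q, results_q = (queue.Queue() for _ in range(4))
STOP = object()  # sentinel that shuts each stage down

def stage(in_q, out_q, work):
    while (item := in_q.get()) is not STOP:
        out_q.put(work(item))
    out_q.put(STOP)

detect = lambda frame: frame     # stand-in: face + landmark detection
blur = lambda faces: faces       # stand-in: convex-hull blurring + cropping
encrypt = lambda packet: packet  # stand-in: AES encryption of the face packet

threads = [
    threading.Thread(target=stage, args=(frames_q, faces_q, detect)),
    threading.Thread(target=stage, args=(faces_q, packets_q, blur)),
    threading.Thread(target=stage, args=(packets_q, results_q, encrypt)),
]
for t in threads:
    t.start()

for frame in range(100):  # stand-in frame source
    frames_q.put(frame)
frames_q.put(STOP)

for t in threads:
    t.join()
```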
Warp a video with a synthetic face and refine it using MobileFaceSwap.
Prerequisites:
- Ensure the MobileFaceSwap checkpoints are correctly placed inside the `Synthetic Replacement/MobileFaceSwap/` directory. The checkpoints can be downloaded via the drive link provided in the official MobileFaceSwap repository: https://github.com/Seanseattle/MobileFaceSwap
Run Command:
cd "Synthetic Replacement"
python main.py \
--video "/absolute/path/to/your_video.mp4" \
--landmarks_root "/absolute/path/to/video_landmarks" \
--after_swapOutput: The script generates the warped video and the final swapped result in the repo directory.
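For intuition, the sketch below shows the basic idea of landmark-driven warping with OpenCV: estimate a similarity transform between the synthetic face's landmarks and the detected landmarks, then warp and paste. It is a conceptual illustration only; the repository's warping and MobileFaceSwap refinement are more involved.

```python
# Conceptual sketch of landmark-driven face warping; not the repository's
# actual implementation.
import cv2
import numpy as np

def warp_synthetic_face(synthetic, frame, src_landmarks, dst_landmarks):
    """Align `synthetic` to the frame by mapping its landmarks onto the detected ones."""
    M, _ = cv2.estimateAffinePartial2D(np.float32(src_landmarks),
                                       np.float32(dst_landmarks))
    h, w = frame.shape[:2]
    warped = cv2.warpAffine(synthetic, M, (w, h))
    mask = cv2.warpAffine(np.full(synthetic.shape[:2], 255, np.uint8), M, (w, h))
    out = frame.copy()
    out[mask > 0] = warped[mask > 0]  # paste the warped face over the blurred region
    return out
```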
To demo the consent-based restoration, navigate to the decryption folder.
This script decrypts embeddings, matches them against a local "database" (images in the `output` folder), and generates keys for valid matches:

```bash
cd decryption
python ttp_code_cosine.py
```
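At its core, the matching step is a cosine-similarity search over face embeddings; a minimal sketch is shown below (the threshold and data layout are assumptions, not the script's exact logic).

```python
# Minimal cosine-similarity matching sketch; threshold and structure are illustrative.
import numpy as np

def cosine_match(query: np.ndarray, database: dict, threshold: float = 0.6):
    """Return the identity whose stored embedding is most similar to `query`,
    or None if no similarity clears the threshold."""
    best_id, best_sim = None, -1.0
    for identity, ref in database.items():
        sim = float(np.dot(query, ref) /
                    (np.linalg.norm(query) * np.linalg.norm(ref)))
        if sim > best_sim:
            best_id, best_sim = identity, sim
    return best_id if best_sim >= threshold else None
```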
Using the keys released by the TTP, restore the original faces:

```bash
python restore.py
```

Optional utility: run `decrypt_face_blobs_per_id.py` to inspect decrypted face regions as standalone JPEGs.
We provide a Jupyter Notebook to reproduce our COCO-style metrics (AP/AR):
- Ensure the ground-truth JSONs are in `Annotated JSONs/`
- Ensure the prediction JSONs are in `output/frame_json/`
- Run:

```bash
jupyter notebook SITARA_eval.ipynb
```
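The notebook reports standard COCO-style AP/AR; an equivalent minimal sketch with `pycocotools` is shown below (the JSON file names are placeholders for the ground-truth and prediction files).

```python
# Minimal COCO-style AP/AR evaluation sketch; JSON file names are placeholders.
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("Annotated JSONs/ground_truth.json")              # placeholder path
coco_dt = coco_gt.loadRes("output/frame_json/predictions.json")  # placeholder path

evaluator = COCOeval(coco_gt, coco_dt, iouType="bbox")
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()  # prints AP/AR at the standard IoU thresholds
```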
Values in bold indicate the best performance in the comparison.
| Category | Sub-Category | Our Pipeline (AP) | Our Pipeline (AR) | EgoBlur (AP) | EgoBlur (AR) |
|---|---|---|---|---|---|
| Number of Faces | One Face | 0.990 | 0.997 | 0.932 | 1.000 |
| | Two Face | 0.967 | 0.979 | 0.963 | 0.982 |
| | Three Face | 0.979 | 0.982 | 0.974 | 0.981 |
| | Four Face | 0.900 | 0.904 | 0.909 | 0.934 |
| | Five Face | 0.882 | 0.917 | 0.952 | 0.969 |
| Movement State | Rest | 0.990 | 0.998 | 0.971 | 0.999 |
| | Head | 0.920 | 0.947 | 0.925 | 0.980 |
| | Bystander | 0.959 | 0.968 | 0.961 | 0.983 |
| Face Size | Far | 1.000 | 1.000 | 0.930 | 1.000 |
| | Medium | 0.990 | 0.992 | 0.856 | 0.993 |
| | Close | 0.990 | 0.997 | 0.989 | 1.000 |
| Method | CCV2 (AP) | CCV2 (AR) | Custom Dataset (AP) | Custom Dataset (AR) |
|---|---|---|---|---|
| Our Pipeline | 0.98 | 0.99 | 0.9421 | 0.9531 |
| EgoBlur | 0.99 | 0.99 | 0.9354 | 0.9736 |
The following figure highlights the stability of Tier 1 blurring accuracy across detector confidence thresholds.
The `evaluation.py` file inside the `Synthetic Replacement` folder compares our pipeline with the baseline (MFS) across five different metrics. Modify the folder paths to point to the original videos, the MFS videos, and the videos processed by our pipeline.
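As a small illustration of the per-frame comparison, the sketch below computes SSIM and PSNR between an original and a processed video using scikit-image (paths are illustrative; the full metric suite, including FID, LPIPS, and landmark distance, lives in `evaluation.py`).

```python
# Per-frame SSIM/PSNR comparison sketch; video paths are illustrative.
import cv2
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

ref = cv2.VideoCapture("original.mp4")
test = cv2.VideoCapture("pipeline_output.mp4")
ssim_scores, psnr_scores = [], []

while True:
    ok_a, a = ref.read()
    ok_b, b = test.read()
    if not (ok_a and ok_b):
        break
    a = cv2.cvtColor(a, cv2.COLOR_BGR2GRAY)
    b = cv2.cvtColor(b, cv2.COLOR_BGR2GRAY)
    ssim_scores.append(structural_similarity(a, b))
    psnr_scores.append(peak_signal_noise_ratio(a, b))

if ssim_scores:
    print("SSIM:", sum(ssim_scores) / len(ssim_scores))
    print("PSNR:", sum(psnr_scores) / len(psnr_scores))
```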
The figure below shows a breakdown of synthetic face replacement metrics by category.
The following table shows a summarized version of the results.
| Metric | Baseline | Our Pipeline (Sitara) | Theoretical Range |
|---|---|---|---|
| FID ↓ | 31.00 ± 13.83 | 63.70 ± 27.78 | ≥ 0 |
| SSIM ↑ | 0.76 ± 0.07 | 0.61 ± 0.07 | [0, 1] |
| PSNR (dB) ↑ | 15.87 ± 2.77 | 12.85 ± 1.95 | [0, ∞) |
| LPIPS ↓ | 0.14 ± 0.06 | 0.27 ± 0.07 | [0, 1] |
| Landmark Dist. ↓ | 8.81 ± 3.99 | 15.94 ± 7.13 | [0, ∞) |
System latency and energy overheads are reported on live videos recorded using RPI Camera Module 1.3. The measurement workbench is shown here:
| Metric | Baseline | Average (Privacy Only) | Average (Privacy + Synthetic) |
|---|---|---|---|
| Storage | 56.16 MB | — | 69.33 MB |
| Energy | 40.00 J | 67.04 J | 112.05 J |
| Latency | 9.98 s | 13.69 s | 22.88 s |
| Scene type | Storage (MB) | Energy (J) | Latency (s) |
|---|---|---|---|
| Close | 88.20 | 91.28 | 18.45 |
| Medium | 71.90 | 83.84 | 17.31 |
| Far | 65.00 | 78.49 | 15.59 |
| Head | 62.10 | 84.60 | 16.99 |
| Bystander | 66.50 | 86.49 | 17.40 |
| Rest | 72.70 | 83.14 | 17.43 |
| Category | Storage (MB) | Energy (J) | Latency (s) |
|---|---|---|---|
| 0 Face | 55.00 | 49.66 | 10.02 |
| 1 Face | 70.30 | 85.24 | 17.89 |
| 2 Face | 67.90 | 117.44 | 24.04 |
| 3 Face | 69.60 | 149.28 | 30.89 |
| 4 Face | 69.90 | 192.27 | 39.60 |
| 5 Face | 72.80 | 242.81 | 48.88 |
The figure below shows Power Consumption traces for each category on RPi 4 B:
On-device face blurring and encryption process each frame through: Face Detector → Landmark Detector → Blurring + Encryption. The dominant computational cost is running full-frame face detection.
The figure below highlights the detector inference cost in terms of Power (W) and Time (ms) on RPi 4 Model B:
To reduce this computational bottleneck, we utilize a frame skip strategy with optical flow tracking between frames. The figures below show the Accuracy-Latency tradeoff across skip values:
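For reference, the skip-and-track loop can be sketched roughly as follows, assuming sparse Lucas-Kanade tracking of face-box centres; the detector call is a stand-in and the skip value is illustrative.

```python
# Skip-and-track sketch: detect every SKIP frames, track box centres in between.
# The detector is a stand-in; SKIP is an illustrative value.
import cv2
import numpy as np

SKIP = 3
detect_faces = lambda frame: [(100, 100, 80, 80)]  # stand-in: (x, y, w, h) boxes

cap = cv2.VideoCapture("input/video.mp4")
prev_gray, boxes, idx = None, [], 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    if idx % SKIP == 0 or not boxes or prev_gray is None:
        boxes = detect_faces(frame)  # full detection every SKIP frames
    else:
        # Propagate box centres with Lucas-Kanade optical flow on skipped frames
        pts = np.float32([[[x + w / 2, y + h / 2]] for x, y, w, h in boxes])
        new_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
        boxes = [
            (int(cx - w / 2), int(cy - h / 2), w, h)
            for (cx, cy), (_, _, w, h), ok_ in zip(new_pts.reshape(-1, 2),
                                                   boxes, status.ravel())
            if ok_
        ]
    # ...blur + encrypt `boxes` as in main.py...
    prev_gray, idx = gray, idx + 1
cap.release()
```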
Synthetic Replacement applies landmark-driven replacement on blurred inputs. We compare our pipeline with a baseline where synthetic replacements are applied on unblurred faces as a target upper-bound.
The figure below visually demonstrates Tier 2 synthetic replacement accuracy for our pipeline compared with baselines:
We implement the consent-based restoration flow through a simulated server architecture:
- `main.py` - Executes the code for the camera glasses, generating the blurred video and encrypted data.
- `decryption/transmit_data_to_phone.py` - Transmits the encrypted data to the phone and subsequently the encrypted keys and embeddings to the TTP.
- `decryption/ttp_code_cosine.py` - Performs the matching on the TTP server.
- `decryption/transmit_keys_to_phone.py` - Transmits the decrypted keys back to the companion phone (assuming consent is granted).
- Companion Phone Application - Listens for this data and uses the key(s) to decrypt the data and restore the decrypted regions back into the blurred video.
Prerequisites:
- The companion Android application should be running for both transmit files.
- Ensure all encrypted data has been generated from `main.py` before starting the restoration process.
Execution Order:
```bash
# Step 1: Generate encrypted data
python main.py

# Step 2: Transmit to phone and TTP
cd decryption
python transmit_data_to_phone.py

# Step 3: TTP performs matching
python ttp_code_cosine.py

# Step 4: Transmit keys back to phone
python transmit_keys_to_phone.py
```

The Android application implementation can be found at this drive link: https://drive.google.com/drive/u/4/folders/1MIjSEbBOurB1UHyRVYXc_2DuNivU9QtL
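For intuition about why neither party alone can restore a face, two-party key splitting can be illustrated with a generic XOR-based secret-sharing sketch. This is a textbook construction for illustration only, not the protocol implemented in this repository or specified in the paper.

```python
# Conceptual XOR-based two-share key splitting, for intuition only; the actual
# consent-mediated split-key protocol is implemented across main.py, the
# decryption scripts, and the companion app.
import os

def split_key(key: bytes):
    """Split `key` into two shares; neither share alone reveals the key."""
    share_a = os.urandom(len(key))                        # random mask (e.g., held by the TTP)
    share_b = bytes(a ^ k for a, k in zip(share_a, key))  # masked key (e.g., held by the phone)
    return share_a, share_b

def combine_shares(share_a: bytes, share_b: bytes) -> bytes:
    """Recombine the shares; restoration requires both parties to cooperate."""
    return bytes(a ^ b for a, b in zip(share_a, share_b))

key = os.urandom(32)
a, b = split_key(key)
assert combine_shares(a, b) == key
```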
We conducted a qualitative study (N=18) on wearers' and bystanders' perceptions of opt-in, privacy-by-default approaches for camera glasses. Participants interacted with the protocol interface and discussed their perceptions in semi-structured interviews. Our findings show that bystanders viewed the opt-in protocol as essential and advocated for even stronger anonymization. Wearers appreciated the protocol's safeguards but found it visually limiting, expressing a desire for a context-dependent version that can be enabled in relevant scenarios.
We recruited participants from our local university community. The following figure shows an overview of our recruitment and assignment workflow, including the number of participants at each stage.
The study employed a two-phase design with two participant groups: wearers and bystanders. Wearers were provided with the Meta Ray-Ban Stories glasses during a one-week onboarding period. Bystander participants proceeded directly to the in-person interview. Participants recorded short videos using the glasses, with footage processed through our protocol to generate blurred and AI-replaced versions. The following figure shows a screenshot of the protocol interface used by the participants.
We structure our exploration to examine the following research questions:
- RQ1 examines bystander privacy needs in opt-in approaches
- RQ2 investigates wearer usability requirements
Our findings reveal that consent mechanisms introduce complex social dynamics, obfuscation effectiveness depends heavily on context, and both stakeholder groups balance competing priorities: bystanders emphasize privacy protection while wearers prioritize usability and recording capability.
This divide goes beyond simple preference differences and represents distinct frameworks for understanding privacy in the age of ubiquitous recording.
Bystanders approached the protocol from a rights-based perspective, viewing its protections as essential safeguards rather than optional features.
Wearers evaluated the protocol through a social and practical lens. While they acknowledged the protocol's value in providing social license to record and reducing ethical burden, they found its default obfuscation visually limiting.
These opposing frameworks suggest that successful privacy-mediating technologies must somehow reconcile rights-based and pragmatic perspectives without simply defaulting to restrictive approaches, which would limit adoption, or permissive ones, which fail to address legitimate privacy concerns.
Context Dependent Application: Wearers seek contextual flexibility while bystanders require mandatory protection. Future systems should explore context-dependent ways of enabling or disabling the protocol; for example, the protocol could be relaxed in familiar private locations (such as a wearer's home) but remain mandatory in public spaces.
Mitigating Consent Fatigue: Meaningful consent inherently introduces fatigue for both wearers and bystanders. Future systems should allow bystanders to specify contextual constraints and preference settings.
Mitigating TTP-Associated Risks: Bystanders require context to make informed privacy decisions, yet excessive metadata increases third-party exposure. Progressive disclosure balances these needs by providing minimal initial info while allowing bystanders to request more context as required. Future work can also explore decentralized architectures.
Incentives for Manufacturers: Current camera glass manufacturers focus primarily on wearer usability while neglecting these social barriers that limit adoption. Our work points to an important value proposition for manufacturers: while the protocol introduces some operational overhead, it legitimizes recording practices and reduces the social stigma attached to camera glasses.
This work has been published at the IEEE International Conference on Pervasive Computing and Communications (PerCom 2026) and the ACM CHI Conference on Human Factors in Computing Systems (CHI 2026).
If you find our code, dataset, or qualitative exploration of opt-in privacy useful, please use the following citations:
@inproceedings{khawaja2026now,
title={{Now You See Me, Now You Don’t: Consent-Driven Privacy for Smart Glasses}},
author={Khawaja, Yahya and Nabeel, Eman and Humayun, Sana and Javed, Eruj and Krombholz, Katharina and Alizai, Hamad and Bhatti, Naveed},
booktitle={Proceedings of the IEEE International Conference on Pervasive Computing and Communications (PerCom)},
year={2026},
publisher = {IEEE},
}
@inproceedings{khawaja2026see,
title={{See Me If You Can: A Multi-Layer Protocol for Bystander Privacy with Consent-Based Restoration}},
author={Khawaja, Yahya and Rehman, Shirin and Ponticello, Alexander and Bhardwaj, Divyanshu and Krombholz, Katharina and Alizai, Hamad and Bhatti, Naveed},
booktitle={Proceedings of the Conference on Human Factors in Computing Systems (CHI)},
year={2026},
publisher = {ACM}
}