Commit be76fed

Author: Alex J Lennon

Merge upstream main from sseanliu/VisionClaw

- Resolved .gitignore: keep fork build/QR/Python entries, add upstream Android entries
- Resolved README: keep fork Quick Start (Linux install, QR config, samples/ paths)

Made-with: Cursor

2 parents: 045d1e7 + 4472c8e

70 files changed: 6227 additions and 35 deletions


.gitignore

Lines changed: 7 additions & 0 deletions

@@ -11,3 +11,10 @@
 # Python
 __pycache__/
 *.pyc
+
+# Android (upstream)
+samples/CameraAccessAndroid/app/src/main/java/**/Secrets.kt
+samples/CameraAccessAndroid/local.properties
+samples/CameraAccessAndroid/.gradle/
+samples/CameraAccessAndroid/build/
+samples/CameraAccessAndroid/app/build/
README.md

Lines changed: 157 additions & 35 deletions

@@ -1,4 +1,4 @@
-# VisionClaw 🦞+😎
+# VisionClaw
 
 [![iOS Build](https://github.com/DynamicDevices/VisionClaw/actions/workflows/ios-build.yml/badge.svg)](https://github.com/DynamicDevices/VisionClaw/actions/workflows/ios-build.yml)
 [![License](https://img.shields.io/badge/license-Meta%20Developer%20Terms-blue.svg)](LICENSE)
@@ -11,7 +11,9 @@ A real-time AI assistant for Meta Ray-Ban smart glasses. See what you see, hear
 
 ![Cover](assets/cover.png)
 
-Built on [Meta Wearables DAT SDK](https://github.com/facebook/meta-wearables-dat-ios) + [Gemini Live API](https://ai.google.dev/gemini-api/docs/live) + [OpenClaw](https://github.com/nichochar/openclaw) (optional).
+Built on [Meta Wearables DAT SDK](https://github.com/facebook/meta-wearables-dat-ios) (iOS) / [DAT Android SDK](https://github.com/facebook/meta-wearables-dat-android) (Android) + [Gemini Live API](https://ai.google.dev/gemini-api/docs/live) + [OpenClaw](https://github.com/nichochar/openclaw) (optional).
+
+**Supported platforms:** iOS (iPhone) and Android (Pixel, Samsung, etc.)
 
 **📱 Quick Links:** [Development Guide](docs/DEVELOPMENT.md) | [Original Upstream](https://github.com/sseanliu/VisionClaw)
 

@@ -31,25 +33,25 @@ The glasses camera streams at ~1fps to Gemini for visual context, while audio fl
 ![How It Works](assets/how.png)
 
 ```
-Meta Ray-Ban Glasses (or iPhone camera)
+Meta Ray-Ban Glasses (or phone camera)
   |
   | video frames + mic audio
   v
-iOS App (this project)
+iOS / Android App (this project)
   |
   | JPEG frames (~1fps) + PCM audio (16kHz)
   v
 Gemini Live API (WebSocket)
   |
-  |-- Audio response (PCM 24kHz) --> iOS App --> Speaker
-  |-- Tool calls (execute) -------> iOS App --> OpenClaw Gateway
-  |                                    |
-  |                                    v
-  |                           56+ skills: web search,
-  |                           messaging, smart home,
-  |                           notes, reminders, etc.
-  |                                    |
-  |<---- Tool response (text) <----- iOS App <-------+
+  |-- Audio response (PCM 24kHz) --> App --> Speaker
+  |-- Tool calls (execute) -------> App --> OpenClaw Gateway
+  |                                  |
+  |                                  v
+  |                         56+ skills: web search,
+  |                         messaging, smart home,
+  |                         notes, reminders, etc.
+  |                                  |
+  |<---- Tool response (text) <----- App <-------+
   |
   v
 Gemini speaks the result
@@ -58,9 +60,12 @@ Gemini Live API (WebSocket)
 **Key pieces:**
 - **Gemini Live** -- real-time voice + vision AI over WebSocket (native audio, not STT-first)
 - **OpenClaw** (optional) -- local gateway that gives Gemini access to 56+ tools and all your connected apps
-- **iPhone mode** -- test the full pipeline using your iPhone camera instead of glasses
+- **Phone mode** -- test the full pipeline using your phone camera instead of glasses
+- **WebRTC streaming** -- share your glasses POV live to a browser viewer
+
+---
 
-## Quick Start
+## Quick Start (iOS)
 
 ### Installing on iPhone (Linux/No Mac Required) 🐧
 
@@ -121,7 +126,7 @@ Done! Your API key is now configured without rebuilding. See [QR Code Configurat
 
 **Option B: Build from Source (Requires Xcode)**
 
-Get a free API key at [Google AI Studio](https://aistudio.google.com/apikey).
+Copy the example file and fill in your values:
 
 Create `samples/CameraAccess/CameraAccess/Secrets.swift` from the template:
 
@@ -165,6 +170,65 @@ Then in VisionClaw:
 1. Tap **"Start Streaming"** in the app
 2. Tap the **AI button** for voice + vision conversation
 
+---
+
+## Quick Start (Android)
+
+### 1. Clone and open
+
+```bash
+git clone https://github.com/sseanliu/VisionClaw.git
+```
+
+Open `samples/CameraAccessAndroid/` in Android Studio.
+
+### 2. Configure GitHub Packages (DAT SDK)
+
+The Meta DAT Android SDK is distributed via GitHub Packages, so you need a GitHub Personal Access Token with the `read:packages` scope.
+
+1. Go to [GitHub > Settings > Developer Settings > Personal Access Tokens](https://github.com/settings/tokens) and create a **classic** token with the `read:packages` scope
+2. In `samples/CameraAccessAndroid/local.properties`, add:
+
+```properties
+github_token=YOUR_GITHUB_TOKEN
+```
+
+> **Tip:** If you have the `gh` CLI installed, you can run `gh auth token` to get a valid token. Make sure it has the `read:packages` scope -- if not, run `gh auth refresh -s read:packages`.
+>
+> **Note:** GitHub Packages requires authentication even for public repositories; a 401 error means your token is missing or invalid.
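For reference, the `github_token` from `local.properties` feeds a GitHub Packages Maven repository in the Gradle build, roughly like this sketch (the repository URL, property names, and placement are assumptions -- check the project's actual `settings.gradle.kts` / `build.gradle.kts`):

```kotlin
// Sketch only: how a github_token from local.properties is typically wired
// into a GitHub Packages Maven repository. The URL and property names here
// are illustrative, not necessarily this project's exact configuration.
import java.util.Properties

val localProps = Properties().apply {
    val f = file("local.properties")
    if (f.exists()) f.inputStream().use { load(it) }
}

repositories {
    google()
    mavenCentral()
    maven {
        url = uri("https://maven.pkg.github.com/facebook/meta-wearables-dat-android")
        credentials {
            // GitHub Packages accepts any username with a valid PAT password
            username = localProps.getProperty("github_user") ?: ""
            password = localProps.getProperty("github_token") ?: ""
        }
    }
}
```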
+
+### 3. Add your secrets
+
+```bash
+cd samples/CameraAccessAndroid/app/src/main/java/com/meta/wearable/dat/externalsampleapps/cameraaccess/
+cp Secrets.kt.example Secrets.kt
+```
+
+Edit `Secrets.kt` with your [Gemini API key](https://aistudio.google.com/apikey) (required) and optional OpenClaw/WebRTC config.
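After copying, `Secrets.kt` looks roughly like this (a sketch: the OpenClaw constant names match the configuration example later in this README, but `geminiApiKey` and the overall shape are assumptions -- follow `Secrets.kt.example`):

```kotlin
// Sketch of Secrets.kt -- `geminiApiKey` is an assumed name; the OpenClaw
// constants mirror the configuration example later in this README.
const val geminiApiKey = "YOUR_GEMINI_API_KEY"             // required
const val openClawHost = "http://Your-Mac.local"           // optional
const val openClawPort = 18789                             // optional
const val openClawGatewayToken = "your-gateway-token-here" // optional
```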
+
+### 4. Build and run
+
+1. Let Gradle sync in Android Studio (it will download the DAT SDK from GitHub Packages)
+2. Select your Android phone as the target device
+3. Click Run (Shift+F10)
+
+> **Wireless debugging:** You can also install via ADB wirelessly. Enable **Wireless debugging** in your phone's Developer Options, then pair with `adb pair <ip>:<port>`.
+
+### 5. Try it out
+
+**Without glasses (Phone mode):**
+1. Tap **"Start on Phone"** -- uses your phone's back camera
+2. Tap the **AI button** (sparkle icon) to start a Gemini Live session
+3. Talk to the AI -- it can see through your phone camera
+
+**With Meta Ray-Ban glasses:**
+
+Enable Developer Mode in the Meta AI app (same steps as iOS above), then:
+1. Tap **"Start Streaming"** in the app
+2. Tap the **AI button** for voice + vision conversation
+
+---
 
 ## Setup: OpenClaw (Optional)
 
 OpenClaw gives Gemini the ability to take real-world actions: send messages, search the web, manage lists, control smart home devices, and more. Without it, Gemini is voice + vision only.
@@ -194,22 +258,30 @@ In `~/.openclaw/openclaw.json`:
 ```
 
 Key settings:
-- `bind: "lan"` -- exposes the gateway on your local network so your iPhone can reach it
+- `bind: "lan"` -- exposes the gateway on your local network so your phone can reach it
 - `chatCompletions.enabled: true` -- enables the `/v1/chat/completions` endpoint (off by default)
-- `auth.token` -- the token your iOS app will use to authenticate
-
-### 2. Configure the iOS app
+- `auth.token` -- the token your app will use to authenticate
 
-In `GeminiConfig.swift`, update the OpenClaw settings:
+### 2. Configure the app
 
+**iOS** -- In `Secrets.swift`:
 ```swift
-static let openClawHost = "http://Your-Mac.local" // your Mac's Bonjour hostname
+static let openClawHost = "http://Your-Mac.local"
 static let openClawPort = 18789
-static let openClawGatewayToken = "your-gateway-token-here" // must match gateway.auth.token
+static let openClawGatewayToken = "your-gateway-token-here"
+```
+
+**Android** -- In `Secrets.kt`:
+```kotlin
+const val openClawHost = "http://Your-Mac.local"
+const val openClawPort = 18789
+const val openClawGatewayToken = "your-gateway-token-here"
 ```
 
 To find your Mac's Bonjour hostname: **System Settings > General > Sharing** -- it's shown at the top (e.g., `Johns-MacBook-Pro.local`).
 
+> Both iOS and Android also have an in-app Settings screen where you can change these values at runtime without editing source code.
+
 ### 3. Start the gateway
 
 ```bash
@@ -224,9 +296,11 @@ curl http://localhost:18789/health
 
 Now when you talk to the AI, it can execute tasks through OpenClaw.
 
+---
+
 ## Architecture
 
-### Key Files
+### Key Files (iOS)
 
 All source code is in `samples/CameraAccess/CameraAccess/`:
 
@@ -240,22 +314,43 @@ All source code is in `samples/CameraAccess/CameraAccess/`:
 | `OpenClaw/OpenClawBridge.swift` | HTTP client for OpenClaw gateway |
 | `OpenClaw/ToolCallRouter.swift` | Routes Gemini tool calls to OpenClaw |
 | `iPhone/IPhoneCameraManager.swift` | AVCaptureSession wrapper for iPhone camera mode |
+| `WebRTC/WebRTCClient.swift` | WebRTC peer connection + SDP negotiation |
+| `WebRTC/SignalingClient.swift` | WebSocket signaling for WebRTC rooms |
+
+### Key Files (Android)
+
+All source code is in `samples/CameraAccessAndroid/app/src/main/java/.../cameraaccess/`:
+
+| File | Purpose |
+|------|---------|
+| `gemini/GeminiConfig.kt` | API keys, model config, system prompt |
+| `gemini/GeminiLiveService.kt` | OkHttp WebSocket client for Gemini Live API |
+| `gemini/AudioManager.kt` | AudioRecord (16kHz) + AudioTrack (24kHz) |
+| `gemini/GeminiSessionViewModel.kt` | Session lifecycle, tool call wiring, UI state |
+| `openclaw/ToolCallModels.kt` | Tool declarations, data classes |
+| `openclaw/OpenClawBridge.kt` | OkHttp HTTP client for OpenClaw gateway |
+| `openclaw/ToolCallRouter.kt` | Routes Gemini tool calls to OpenClaw |
+| `phone/PhoneCameraManager.kt` | CameraX wrapper for phone camera mode |
+| `webrtc/WebRTCClient.kt` | WebRTC peer connection (stream-webrtc-android) |
+| `webrtc/SignalingClient.kt` | OkHttp WebSocket signaling for WebRTC rooms |
+| `settings/SettingsManager.kt` | SharedPreferences with Secrets.kt fallback |

 ### Audio Pipeline
 
-- **Input**: iPhone mic -> AudioManager (PCM Int16, 16kHz mono, 100ms chunks) -> Gemini WebSocket
-- **Output**: Gemini WebSocket -> AudioManager playback queue -> iPhone speaker
-- **iPhone mode**: Uses `.voiceChat` audio session for echo cancellation + mic gating during AI speech
-- **Glasses mode**: Uses `.videoChat` audio session (mic is on glasses, speaker is on phone -- no echo)
+- **Input**: Phone mic -> AudioManager (PCM Int16, 16kHz mono, 100ms chunks) -> Gemini WebSocket
+- **Output**: Gemini WebSocket -> AudioManager playback queue -> Phone speaker
+- **iOS iPhone mode**: Uses `.voiceChat` audio session for echo cancellation + mic gating during AI speech
+- **iOS Glasses mode**: Uses `.videoChat` audio session (mic is on glasses, speaker is on phone -- no echo)
+- **Android**: Uses `VOICE_COMMUNICATION` audio source for built-in acoustic echo cancellation
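The 16kHz/100ms chunking above implies a fixed buffer size per WebSocket message; a quick sketch of the arithmetic (illustrative only, not the app's code):

```kotlin
// Illustrative arithmetic for the audio pipeline above -- not the app's code.
// PCM Int16 mono: bytes per chunk = samples/sec * chunk duration * 2 bytes/sample.
fun chunkSizeBytes(sampleRateHz: Int, chunkMs: Int, bytesPerSample: Int = 2): Int =
    sampleRateHz / 1000 * chunkMs * bytesPerSample

// 16 kHz uplink, 100 ms chunks  -> 3200 bytes sent per message
// 24 kHz downlink, 100 ms worth -> 4800 bytes queued for playback
```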
 
 ### Video Pipeline
 
-- **Glasses**: DAT SDK `videoFramePublisher` (24fps) -> throttle to ~1fps -> JPEG (50% quality) -> Gemini
-- **iPhone**: `AVCaptureSession` back camera (30fps) -> throttle to ~1fps -> JPEG -> Gemini
+- **Glasses**: DAT SDK video stream (24fps) -> throttle to ~1fps -> JPEG (50% quality) -> Gemini
+- **Phone**: Camera capture (30fps) -> throttle to ~1fps -> JPEG -> Gemini
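The ~1fps throttle can be sketched as a simple timestamp gate (illustrative, not the app's actual implementation):

```kotlin
// Illustrative ~1 fps throttle -- not the app's actual implementation.
// Frames arrive at 24-30 fps; only forward one roughly every intervalMs.
class FrameThrottle(private val intervalMs: Long = 1000) {
    // Initialized so the very first frame is always sent.
    private var lastSentMs = -intervalMs

    /** Returns true if this frame should be forwarded to Gemini. */
    fun shouldSend(nowMs: Long): Boolean {
        if (nowMs - lastSentMs >= intervalMs) {
            lastSentMs = nowMs
            return true
        }
        return false
    }
}
```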
 
 ### Tool Calling
 
-Gemini Live supports function calling. This app declares a single `execute` tool that routes everything through OpenClaw:
+Gemini Live supports function calling. Both apps declare a single `execute` tool that routes everything through OpenClaw:
 
 1. User says "Add eggs to my shopping list"
 2. Gemini speaks "Sure, adding that now" (verbal acknowledgment before tool call)

@@ -265,25 +360,52 @@ Gemini Live supports function calling. This app declares a single `execute` tool
 6. Result returns to Gemini via `toolResponse`
 7. Gemini speaks the confirmation
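As a sketch, the routing step boils down to turning the tool call's command into a chat-completions request against the gateway (the real logic lives in `ToolCallRouter.swift` / `ToolCallRouter.kt`; the helper and payload shape below are illustrative assumptions):

```kotlin
// Illustrative routing of Gemini's `execute` tool call to the OpenClaw
// gateway's /v1/chat/completions endpoint. The payload shape is an assumption.
data class ToolCall(val name: String, val command: String)

fun buildGatewayRequest(host: String, port: Int, call: ToolCall): Pair<String, String> {
    require(call.name == "execute") { "only the `execute` tool is declared" }
    val url = "$host:$port/v1/chat/completions"
    // Minimal chat-completions body carrying the command as the user message.
    val body = """{"messages":[{"role":"user","content":"${call.command}"}]}"""
    return url to body
}
```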
 
+---
+
 ## Requirements
 
+### iOS
 - iOS 17.0+
 - Xcode 15.0+
 - Gemini API key ([get one free](https://aistudio.google.com/apikey))
 - Meta Ray-Ban glasses (optional -- use iPhone mode for testing)
 - OpenClaw on your Mac (optional -- for agentic actions)
 
+### Android
+- Android 14+ (API 34+)
+- Android Studio Ladybug or newer
+- GitHub account with a `read:packages` token (for the DAT SDK)
+- Gemini API key ([get one free](https://aistudio.google.com/apikey))
+- Meta Ray-Ban glasses (optional -- use Phone mode for testing)
+- OpenClaw on your Mac (optional -- for agentic actions)
+
+---
 
 ## Troubleshooting
 
-**"Gemini API key not configured"** -- Open `GeminiConfig.swift` and add your API key.
+### General
+
+**Gemini doesn't hear me** -- Check that microphone permission is granted. The app uses aggressive voice activity detection -- speak clearly and at normal volume.
 
-**OpenClaw connection timeout** -- Make sure your iPhone and Mac are on the same Wi-Fi network, the gateway is running (`openclaw gateway restart`), and the hostname in `GeminiConfig.swift` matches your Mac's Bonjour name.
+**OpenClaw connection timeout** -- Make sure your phone and Mac are on the same Wi-Fi network, the gateway is running (`openclaw gateway restart`), and the hostname matches your Mac's Bonjour name.
+
+**OpenClaw opens duplicate browser tabs** -- This is a known upstream issue in OpenClaw's CDP (Chrome DevTools Protocol) connection management ([#13851](https://github.com/nichochar/openclaw/issues/13851), [#12317](https://github.com/nichochar/openclaw/issues/12317)). Using `profile: "openclaw"` (managed Chrome) instead of the default extension relay may improve stability.
+
+### iOS-specific
+
+**"Gemini API key not configured"** -- Add your API key in `Secrets.swift` or in the in-app Settings.
 
 **Echo/feedback in iPhone mode** -- The app mutes the mic while the AI is speaking. If you still hear echo, try turning down the volume.
 
-**Gemini doesn't hear me** -- Check that microphone permission is granted. The app uses aggressive voice activity detection -- speak clearly and at normal volume.
+### Android-specific
+
+**Gradle sync fails with 401 Unauthorized** -- Your GitHub token is missing or doesn't have the `read:packages` scope. Check that `github_token` is set in `local.properties` (see the Android Quick Start above). Generate a new token at [github.com/settings/tokens](https://github.com/settings/tokens).
+
+**Gemini WebSocket times out** -- The Gemini Live API sends binary WebSocket frames. If you're building a custom client, make sure to handle both text and binary frame types.
+
+**Audio not working** -- Ensure the `RECORD_AUDIO` permission is granted. On Android 13+, you may need to grant this permission manually in Settings > Apps.
 
-**OpenClaw opens duplicate browser tabs** -- This is a known upstream issue in OpenClaw's CDP (Chrome DevTools Protocol) connection management ([#13851](https://github.com/nichochar/openclaw/issues/13851), [#12317](https://github.com/nichochar/openclaw/issues/12317)). The browser control service loses track of existing tabs after navigation, falling back to opening new ones. Using `profile: "openclaw"` (managed Chrome) instead of the default extension relay may improve stability.
+**Phone camera not starting** -- Ensure the `CAMERA` permission is granted. CameraX requires both the permission and a valid lifecycle owner.
 
 For DAT SDK issues, see the [developer documentation](https://wearables.developer.meta.com/docs/develop/) or the [discussions forum](https://github.com/facebook/meta-wearables-dat-ios/discussions).

Lines changed: 11 additions & 0 deletions

@@ -0,0 +1,11 @@
+*.iml
+.gradle
+.kotlin
+/local.properties
+/.idea
+.DS_Store
+/build
+/captures
+.externalNativeBuild
+.cxx
+local.properties
Lines changed: 47 additions & 0 deletions

@@ -0,0 +1,47 @@
+# Camera Access App
+
+A sample Android application demonstrating integration with the Meta Wearables Device Access Toolkit. This app showcases streaming video from Meta AI glasses, capturing photos, and managing connection states.
+
+## Features
+
+- Connect to Meta AI glasses
+- Stream camera feed from the device
+- Capture photos from glasses
+- Share captured photos
+
+## Prerequisites
+
+- Android Studio Arctic Fox (2021.3.1) or newer
+- JDK 11 or newer
+- Android SDK 31+ (Android 12.0+)
+- Meta Wearables Device Access Toolkit (included as a dependency)
+- A Meta AI glasses device for testing (optional for development)
+
+## Building the app
+
+### Using Android Studio
+
+1. Clone this repository
+1. Open the project in Android Studio
+1. Add your personal access token (classic) to the `local.properties` file (see [SDK for Android setup](https://wearables.developer.meta.com/docs/getting-started-toolkit/#sdk-for-android-setup))
+1. Click **File** > **Sync Project with Gradle Files**
+1. Click **Run** > **Run...** > **app**
+
+## Running the app
+
+1. Turn on Developer Mode in the Meta AI app.
+1. Launch the app.
+1. Press the "Connect" button to complete app registration.
+1. Once connected, the camera stream from the device will be displayed.
+1. Use the on-screen controls to:
+   - Capture photos
+   - View and save captured photos
+   - Disconnect from the device
+
+## Troubleshooting
+
+For issues related to the Meta Wearables Device Access Toolkit, please refer to the [developer documentation](https://wearables.developer.meta.com/docs/develop/) or visit our [discussions forum](https://github.com/facebook/meta-wearables-dat-android/discussions).
+
+## License
+
+This source code is licensed under the license found in the LICENSE file in the root directory of this source tree.
Lines changed: 1 addition & 0 deletions

@@ -0,0 +1 @@
+/build
