From b9f8063e976d637f5634e18b381227df33af9bd7 Mon Sep 17 00:00:00 2001
From: Romeo Valentin <romeov@stanford.edu>
Date: Wed, 13 Dec 2023 12:31:29 -0800
Subject: [PATCH 1/2] Add usage example.

Closes #28.
---
 README.md | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 5922938..8ee4e7c 100644
--- a/README.md
+++ b/README.md
@@ -2,7 +2,22 @@
 
 **GPU integrations for Dagger.jl**
 
-DaggerGPU.jl makes use of the `Dagger.Processor` infrastructure to dispatch Dagger kernels to NVIDIA, AMD, and Apple GPUs, via CUDA.jl, AMDGPU.jl, and Metal.jl respectively. Usage is simple: `add` or `dev` DaggerGPU.jl and CUDA.jl/AMDGPU.jl/Metal.jl appropriately, load it with `using DaggerGPU`, and add `DaggerGPU.CuArrayDeviceProc`/`DaggerGPU.ROCArrayProc`/`DaggerGPU.MtlArrayDeviceProc` to your scheduler or thunk options (see Dagger.jl documentation for details on how to do this).
+DaggerGPU.jl uses the `Dagger.Processor` infrastructure to dispatch Dagger kernels to NVIDIA, AMD, and Apple GPUs via CUDA.jl, AMDGPU.jl, and Metal.jl, respectively. Usage is simple: `add` or `dev` DaggerGPU.jl and CUDA.jl/AMDGPU.jl/Metal.jl as appropriate, load it with `using DaggerGPU`, and pass the appropriate GPU scope to your tasks via the `scope` option, for example:
+
+``` julia
+using CUDA, Dagger, DaggerGPU
+sc = Dagger.scope(cuda_gpu=1)
+
+# two large matrices
+A = rand(1000, 1000); B = rand(1000, 1000)
+# move them to gpu and multiply there
+A_gpu = Dagger.@spawn scope=sc CUDA.Matrix(A); B_gpu = Dagger.@spawn scope=sc CUDA.Matrix(B)
+C_gpu = Dagger.@spawn scope=sc A_gpu*B_gpu
+# move back to cpu to use there.
+C = Dagger.@spawn scope=sc Matrix(C_gpu)
+```
+
+Use `rocm_gpu` or `metal_gpu` in the same way for AMD or Apple GPUs.
 
 DaggerGPU.jl is still experimental, but we welcome GPU-owning users to try it out and report back on any issues or sharp edges that they encounter. When filing an issue about DaggerGPU.jl, please provide:
 - The complete error message and backtrace

From da3bd1b01b72afa55b444711689f0bf4ca82537f Mon Sep 17 00:00:00 2001
From: Romeo Valentin <romeov@stanford.edu>
Date: Wed, 13 Dec 2023 13:01:23 -0800
Subject: [PATCH 2/2] Fix: Matrix -> CuMatrix.

The previous example used `CUDA.Matrix`, which is not a GPU array type, so
the matrices were never actually moved to the GPU. Use `CUDA.CuMatrix`.
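
A quick way to see the difference is to compare the types directly. This is
only a sketch: it assumes CUDA.jl is installed and that CUDA.jl does not
define a `Matrix` of its own (which is what this fix implies), so the
qualified name `CUDA.Matrix` falls through to Base. The variable names are
illustrative.

``` julia
using CUDA  # assumes CUDA.jl is installed; comparing types should not need a GPU

# Assumption: CUDA.jl has no `Matrix` binding of its own, so this is Base's CPU type.
same_as_base = CUDA.Matrix === Base.Matrix

# `CuMatrix{T}` is CUDA.jl's alias for a two-dimensional `CuArray`.
is_gpu_alias = CUDA.CuMatrix{Float64} === CUDA.CuArray{Float64,2}

@show same_as_base is_gpu_alias
```

If both checks hold, `CUDA.Matrix(A)` only copies `A` on the CPU, while
`CUDA.CuMatrix(A)` allocates on the device.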
---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 8ee4e7c..497161b 100644
--- a/README.md
+++ b/README.md
@@ -11,7 +11,7 @@ sc = Dagger.scope(cuda_gpu=1)
 # two large matrices
 A = rand(1000, 1000); B = rand(1000, 1000)
 # move them to gpu and multiply there
-A_gpu = Dagger.@spawn scope=sc CUDA.Matrix(A); B_gpu = Dagger.@spawn scope=sc CUDA.Matrix(B)
+A_gpu = Dagger.@spawn scope=sc CUDA.CuMatrix(A); B_gpu = Dagger.@spawn scope=sc CUDA.CuMatrix(B)
 C_gpu = Dagger.@spawn scope=sc A_gpu*B_gpu
 # move back to cpu to use there.
 C = Dagger.@spawn scope=sc Matrix(C_gpu)