docs/rpc.md
# Building and Using the RPC Server with `stable-diffusion.cpp`
This guide covers how to build a version of [the RPC server from `llama.cpp`](https://github.com/ggml-org/llama.cpp/blob/master/tools/rpc/README.md) that is compatible with your version of `stable-diffusion.cpp` to manage multi-backend setups. RPC allows you to offload specific model components to a remote server.
> **Note on Model Location:** The model files (e.g., `.safetensors` or `.gguf`) remain on the **Client** machine. The client parses the file and transmits the necessary tensor data and computational graphs to the server. The server does not need to store the model files locally.
```bash
cmake .. # (configuration flags are elided in this diff; see the note below)
cmake --build . --config Release -j $(nproc)
```
> **Note:** Ensure you add the other flags you would normally use (e.g., `-DSD_VULKAN=ON`, `-DSD_CUDA=ON`, `-DSD_HIPBLAS=ON`, or `-DGGML_METAL=ON`). For more information about building `stable-diffusion.cpp` from source, please refer to the [docs/build.md](./build.md) documentation.
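As an illustration, a configure step for a Vulkan build might look like the following (a minimal sketch using one of the flags listed above; combine whichever flags apply to your hardware):

```bash
# Illustrative only: configure a Vulkan build of stable-diffusion.cpp
cmake .. -DSD_VULKAN=ON
cmake --build . --config Release -j $(nproc)
```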
To save on download time and storage, you can use a shallow clone to download only the target commit:

```bash
mkdir -p llama.cpp
cd llama.cpp
# The remaining steps are cut off in this diff; a shallow fetch of a pinned
# commit typically looks like this (replace <commit-sha> with the target commit):
git init
git remote add origin https://github.com/ggml-org/llama.cpp.git
git fetch --depth 1 origin <commit-sha>
git checkout FETCH_HEAD
```
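The detour through `git init` and `git fetch` is needed because `git clone --depth 1` can only target a branch or tag, not an arbitrary commit; fetching the commit by SHA sidesteps that limitation.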
The RPC server acts as the worker. You must explicitly enable the **backend** (the hardware interface, such as CUDA for NVIDIA, Metal for Apple Silicon, or Vulkan) when building; otherwise, the server will default to using only the CPU.
To find the correct flags, refer to the official [build documentation for the `llama.cpp` repository](https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md).
> **Crucial:** You must include the compiler flag required for API compatibility with `stable-diffusion.cpp` (`-DGGML_MAX_NAME=128`). Without this flag, `GGML_MAX_NAME` will default to `64` for the server, and data transfers between the client and server will fail. Of course, `-DGGML_RPC` must also be enabled.
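Taken together, a server build for an NVIDIA machine could look like the sketch below (illustrative; `-DGGML_CUDA=ON` is `llama.cpp`'s standard CUDA flag, but check the build documentation linked above for your backend and version):

```bash
# Illustrative llama.cpp configure: enable RPC, the CUDA backend, and the
# GGML_MAX_NAME value expected by stable-diffusion.cpp
cmake .. -DGGML_RPC=ON -DGGML_CUDA=ON -DGGML_MAX_NAME=128
cmake --build . --config Release -j $(nproc) --target rpc-server
```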
Example: A main machine (192.168.1.10) with 3 GPUs, with one GPU running CUDA and …
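The rest of this example is cut off in the diff, but a setup along those lines could be sketched as follows (hypothetical: the ports and binary paths are arbitrary, `-H`/`-p` are the `rpc-server` host/port options, and the client-side `--rpc` option is assumed to mirror `llama.cpp`'s):

```bash
# On the main machine (192.168.1.10): one rpc-server per backend/GPU,
# each listening on its own port. CUDA_VISIBLE_DEVICES pins the CUDA
# worker to GPU 0.
CUDA_VISIBLE_DEVICES=0 ./llama.cpp-cuda/build/bin/rpc-server -H 0.0.0.0 -p 50052
./llama.cpp-vulkan/build/bin/rpc-server -H 0.0.0.0 -p 50053

# On the client machine: point stable-diffusion.cpp at the remote workers.
./build/bin/sd -m model.safetensors -p "a lovely cat" \
  --rpc 192.168.1.10:50052,192.168.1.10:50053
```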