
Commit 86c10a5

Deploying to gh-pages from @ dstackai/dstack@bb02788 🚀
1 parent 775679f commit 86c10a5

File tree: 6 files changed (+282, -130 lines)


docs/concepts/services/index.html

Lines changed: 50 additions & 0 deletions
@@ -4748,6 +4748,56 @@ <h3 id="replicas-and-scaling">Replicas and scaling</h3>

The rendered "Replicas and scaling" section gains two `<details class="info">` blocks, "Replica groups" (containing the `service.dstack.yml` example with two replica groups) and "Disaggregated serving". This is the generated HTML counterpart of the Markdown changes to docs/concepts/services/index.md below.

docs/concepts/services/index.md

Lines changed: 51 additions & 0 deletions
@@ -164,6 +164,57 @@ Setting the minimum number of replicas to `0` allows the service to scale down t
 
 > The `scaling` property requires creating a [gateway](gateways.md).
 
+??? info "Replica groups"
+    A service can include multiple replica groups. Each group can define its own `commands`, `resources` requirements, and `scaling` rules.
+
+    <div editor-title="service.dstack.yml">
+
+    ```yaml
+    type: service
+    name: llama-8b-service
+
+    image: lmsysorg/sglang:latest
+    env:
+      - MODEL_ID=deepseek-ai/DeepSeek-R1-Distill-Llama-8B
+
+    replicas:
+      - count: 1..2
+        scaling:
+          metric: rps
+          target: 10
+        commands:
+          - |
+            python -m sglang.launch_server \
+              --model-path $MODEL_ID \
+              --port 8000 \
+              --trust-remote-code
+        resources:
+          gpu: 48GB
+
+      - count: 1..4
+        scaling:
+          metric: rps
+          target: 5
+        commands:
+          - |
+            python -m sglang.launch_server \
+              --model-path $MODEL_ID \
+              --port 8000 \
+              --trust-remote-code
+        resources:
+          gpu: 24GB
+
+    port: 8000
+    model: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
+    ```
+
+    </div>
+
+    > Properties such as `regions`, `port`, `image`, `env`, and some others cannot be configured per replica group. Support for this is coming soon.
+
+??? info "Disaggregated serving"
+    Native support for disaggregated prefill and decode, allowing both worker types to run within a single service, is coming soon.
+
 ### Model
 
 If the service is running a chat model with an OpenAI-compatible interface,
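For context, deploying the replica-group configuration above follows the usual `dstack` workflow; a minimal sketch, assuming the example is saved as `service.dstack.yml`:

```shell
# Provision the service: each replica group is placed on instances matching its
# own `resources` (48GB vs. 24GB GPU) and autoscaled against its own rps target.
dstack apply -f service.dstack.yml
```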

llms-full.txt

Lines changed: 51 additions & 0 deletions
@@ -3873,6 +3873,57 @@ Setting the minimum number of replicas to `0` allows the service to scale down t

The same "Replica groups" and "Disaggregated serving" block added to docs/concepts/services/index.md above is appended here verbatim.

search/search_index.json

Lines changed: 1 addition & 1 deletion
Large diffs are not rendered by default.

0 commit comments
