UPSTREAM PR #1255: feat: add -H --height and -W --width options #1248 #53

loci-dev · 2026-02-07T04:14:03Z

Note

Source pull request: leejet/stable-diffusion.cpp#1255

After falling on my face with the first PR, it seemed necessary to get up and try again with a different issue.

This PR makes the default -H (--height) and -W (--width) values settable from the sd-server command line.

There is nothing special to this. I manually added .default_width and .default_height to struct SDSvrParams and initialized both endpoints with them, instead of 512.
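For readers without the diff at hand, the shape of the change can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from the PR: only the field names `default_width` and `default_height` and the struct name `SDSvrParams` come from the description above, while `parse_dimension_args`, `ImageRequest`, and `apply_defaults` are hypothetical names invented for this sketch, and the real option parsing in `get_options` may look quite different.

```cpp
#include <cassert>
#include <climits>
#include <cstdlib>
#include <string>

// Stand-in for the server parameter struct. Only the two field names are
// taken from the PR description; everything else here is assumed.
struct SDSvrParams {
    int default_width  = 512;  // the value the endpoints used before
    int default_height = 512;
};

// Hypothetical parser (NOT the project's get_options): reads -W/--width and
// -H/--height and ignores every other argument.
bool parse_dimension_args(int argc, const char* const* argv, SDSvrParams& params) {
    for (int i = 1; i < argc; ++i) {
        const std::string arg = argv[i];
        int* target = nullptr;
        if (arg == "-W" || arg == "--width") {
            target = &params.default_width;
        } else if (arg == "-H" || arg == "--height") {
            target = &params.default_height;
        }
        if (target == nullptr) continue;   // not one of our options
        if (i + 1 >= argc) return false;   // option given without a value
        const char* text = argv[++i];
        char* end = nullptr;
        const long value = std::strtol(text, &end, 10);
        // Reject non-numeric, non-positive, or out-of-range values.
        if (end == text || *end != '\0' || value <= 0 || value > INT_MAX) return false;
        *target = static_cast<int>(value);
    }
    return true;
}

// Hypothetical request type: a non-positive dimension means "not specified".
struct ImageRequest {
    int width  = 0;
    int height = 0;
};

// Each endpoint falls back to the server-wide defaults instead of a literal 512.
void apply_defaults(ImageRequest& req, const SDSvrParams& params) {
    assert(params.default_width > 0 && params.default_height > 0);
    if (req.width  <= 0) req.width  = params.default_width;
    if (req.height <= 0) req.height = params.default_height;
}
```

With this sketch, starting the server with `-W 768 -H 1024` would make a request that omits its dimensions come out as 768x1024, while a request that sets its own width or height keeps that value.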

loci-review · 2026-02-07T05:14:00Z

Overview

Analysis of stable-diffusion.cpp compared 48,089 functions across two binaries following a single commit adding CLI options for image dimensions. Modified functions: 60 (0.12%), new: 2, removed: 1, unchanged: 48,026 (99.87%).

Power Consumption:

build.bin.sd-server: +0.044% (512,977 nJ → 513,205 nJ)
build.bin.sd-cli: 0.0% (479,167 nJ → 479,167 nJ, negligible)

Function Analysis

SDSvrParams::get_options (directly modified): Throughput +82ns (+9.29%), response +8,959ns (+11.58%). Added two CLI options for default image height/width. The 9μs overhead occurs once at startup, not affecting inference performance. Change is justified by the feature addition.

apply_binary_op<op_div, ggml_bf16_t> (GGML tensor operation): Throughput +79ns (+6.64%), response +93ns (+3.59%). Division operations on bfloat16 tensors used in normalization and attention scaling. Potentially called thousands of times per inference, cumulative impact ~593μs per image. Source in GGML submodule (not accessible); regression warrants investigation.
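As a rough consistency check on the figures above, and assuming the ~593μs estimate is simply the +79ns per-call throughput delta multiplied by a call count (the analysis does not state how it was derived), the implied number of calls per image is:

```latex
N \approx \frac{593\ \mu\text{s}}{79\ \text{ns}}
  = \frac{593{,}000\ \text{ns}}{79\ \text{ns}}
  \approx 7{,}500 \ \text{calls per image}
```

That is in line with "thousands of times per inference", but the actual call count depends on the model and the sampling settings.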

apply_unary_op<op_hardsigmoid, ggml_bf16_t>: Throughput -71ns (-9.11%), response -71ns (-3.47%). Improvement in hard sigmoid activation partially offsets division regression.

Standard library functions (std::less, std::vector, std::unordered_map operations): Mixed results with throughput changes ranging from -74ns to +45ns. Most are compiler/toolchain artifacts affecting initialization code, not inference paths. Vector copy constructor improved (-33.91%), comparison operator regressed (+68.69%), but net impact is minimal as these operate during model loading.

Other analyzed functions showed negligible changes in non-critical paths.

Additional Findings

The commit modified only CLI parsing code, so most performance variations stem from compiler/standard library differences between builds rather than from the change itself. ML inference impact is sub-millisecond (<1ms per image, <0.1% of total generation time). The division regression in GGML's bfloat16 handling is the only noteworthy concern for ML workloads, though its absolute impact remains small. Overall, performance is essentially unchanged, and the one-time startup cost is an acceptable trade-off for the added functionality.

🔎 Full breakdown: Loci Inspector.
💬 Questions? Tag @loci-dev.

feat: add -H --height and -W --width options #1248

3e6e560

loci-dev temporarily deployed to stable-diffusion-cpp-prod with GitHub Actions on February 7, 2026 04:14 (now inactive)

loci-dev force-pushed the main branch from 76645dd to 5bbc590 on February 7, 2026 04:37

loci-dev force-pushed the main branch 3 times, most recently from 342c73d to 8c51734 on February 10, 2026 04:52
