@DaMatrix
Member

Rather than simply performing the updates asynchronously in a different thread (which introduces potential race conditions and just offloads the work to another core), this PR makes the RenderChunk position updates themselves significantly faster in the most common cases.

We observe that under normal gameplay circumstances, the camera rarely moves more than one cube per frame. By detecting this common case, we can efficiently skip RenderChunks whose position hasn't changed: when the camera moves by one cube in a given direction, only one 2D slice (plane) of RenderChunks actually changes position.
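A minimal sketch of the idea, assuming a grid with 2r + 1 slots per axis in which slots are reused cyclically (slot i holds the world cube coordinate congruent to i modulo the slot count). All names here (`SliceFrustumSketch`, `update`, `worldCoord`, `matchesFullRecompute`) are hypothetical; this is not the PR's code or the real `ViewFrustum` API. It only illustrates why a one-cube move touches a single plane of slots, while any other move falls back to recomputing every slot.

```java
/**
 * Hypothetical sketch of a sliding grid of render slots; NOT the PR's actual code.
 * Each axis has n = 2r + 1 slots; slot i holds the unique world cube coordinate w
 * in [cam - r, cam + r] with w congruent to i (mod n), so slots are reused cyclically.
 */
final class SliceFrustumSketch {
    private final int rx, ry, rz;          // assumed radius in cubes along each axis
    private final int nx, ny, nz;          // slot count along each axis (2r + 1)
    private final int[] posX, posY, posZ;  // world cube position currently held by each slot
    private int camX, camY, camZ;
    private boolean initialized;
    private int lastUpdateCount;           // slots whose position changed in the last update

    SliceFrustumSketch(int rx, int ry, int rz) {
        this.rx = rx; this.ry = ry; this.rz = rz;
        this.nx = 2 * rx + 1; this.ny = 2 * ry + 1; this.nz = 2 * rz + 1;
        int total = nx * ny * nz;
        posX = new int[total]; posY = new int[total]; posZ = new int[total];
    }

    /** World coordinate held by slot i on an axis with camera coordinate cam, radius r, n slots. */
    static int worldCoord(int i, int cam, int r, int n) {
        return cam - r + Math.floorMod(i - (cam - r), n);
    }

    private int index(int i, int j, int k) { return (i * ny + j) * nz + k; }

    /** Recomputes one slot; a real implementation would move and re-dirty its RenderChunk here. */
    private void refresh(int i, int j, int k) {
        int idx = index(i, j, k);
        int x = worldCoord(i, camX, rx, nx);
        int y = worldCoord(j, camY, ry, ny);
        int z = worldCoord(k, camZ, rz, nz);
        if (!initialized || posX[idx] != x || posY[idx] != y || posZ[idx] != z) {
            posX[idx] = x; posY[idx] = y; posZ[idx] = z;
            lastUpdateCount++;
        }
    }

    void update(int cx, int cy, int cz) {
        int dx = cx - camX, dy = cy - camY, dz = cz - camZ;
        camX = cx; camY = cy; camZ = cz;
        lastUpdateCount = 0;
        if (initialized && Math.abs(dx) + Math.abs(dy) + Math.abs(dz) == 1) {
            // Common case: one cube along one axis. Only the plane of slots that wraps
            // around to the newly entered coordinate (cam + r or cam - r) changes.
            if (dx != 0) {
                int i = Math.floorMod(dx > 0 ? cx + rx : cx - rx, nx);
                for (int j = 0; j < ny; j++) for (int k = 0; k < nz; k++) refresh(i, j, k);
            } else if (dy != 0) {
                int j = Math.floorMod(dy > 0 ? cy + ry : cy - ry, ny);
                for (int i = 0; i < nx; i++) for (int k = 0; k < nz; k++) refresh(i, j, k);
            } else {
                int k = Math.floorMod(dz > 0 ? cz + rz : cz - rz, nz);
                for (int i = 0; i < nx; i++) for (int j = 0; j < ny; j++) refresh(i, j, k);
            }
        } else if (!initialized || dx != 0 || dy != 0 || dz != 0) {
            // Fallback (first call, diagonal moves, teleports): recompute every slot.
            for (int i = 0; i < nx; i++)
                for (int j = 0; j < ny; j++)
                    for (int k = 0; k < nz; k++) refresh(i, j, k);
        }
        initialized = true;
    }

    /** True if every slot matches what a full recomputation would produce. */
    boolean matchesFullRecompute() {
        for (int i = 0; i < nx; i++)
            for (int j = 0; j < ny; j++)
                for (int k = 0; k < nz; k++) {
                    int idx = index(i, j, k);
                    if (posX[idx] != worldCoord(i, camX, rx, nx)
                            || posY[idx] != worldCoord(j, camY, ry, ny)
                            || posZ[idx] != worldCoord(k, camZ, rz, nz)) return false;
                }
        return true;
    }

    int lastUpdateCount() { return lastUpdateCount; }

    public static void main(String[] args) {
        SliceFrustumSketch f = new SliceFrustumSketch(4, 2, 4); // 9 x 5 x 9 slots
        f.update(0, 0, 0);
        System.out.println(f.lastUpdateCount());      // 405: every slot is filled on the first call
        f.update(1, 0, 0);
        System.out.println(f.lastUpdateCount());      // 45: one 5 x 9 plane
        System.out.println(f.matchesFullRecompute()); // true
    }
}
```

Keeping the full recompute as a fallback means that teleports or multi-cube jumps still produce correct positions; only the common one-cube step takes the cheaper path.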

On my machine, with a horizontal render distance of 48 chunks and a vertical render distance of 16 cubes, and while flying around at maximum speed in spectator mode, this change reduces ViewFrustum#updateChunkPositions() from ~22% of the total client thread CPU time to ~2.4%, a nearly order-of-magnitude improvement.
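As a rough cross-check of these figures: the ratio of the two shares is 22% / 2.4% ≈ 9.2. Converting shares into the method's own cost needs an assumption that was not measured here: if the rest of the client thread's work per frame, $R$, stayed the same, and $f$ and $f'$ are the method's time per frame before and after the change, then

$$
\frac{f}{f+R} = 0.22 \;\Rightarrow\; f = \frac{0.22}{0.78}R \approx 0.282R,
\qquad
\frac{f'}{f'+R} = 0.024 \;\Rightarrow\; f' = \frac{0.024}{0.976}R \approx 0.0246R,
\qquad
\frac{f}{f'} \approx 11.5
$$

Under that assumption the per-frame cost of the method drops by roughly 9 to 11 times, consistent with an order-of-magnitude improvement.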

@Niko-sk2x
Member

Can you measure exact time in microseconds, with various render distance values, including max possible horizontal+vertical?

@DaMatrix
Member Author

Do you want the exact time for one update in a known direction, the time over a fixed sample duration with a known movement pattern, or the average time over many updates with a random movement pattern? This is hard to microbenchmark, since the update duration depends on the direction in which the player is moving and on where the player is relative to the origin point. The actual time is also affected by the number of RenderChunks that are actually built when their position changes, or worse: if a RenderChunk is currently being built, we may have to sleep while acquiring lockCompileTask.

@Niko-sk2x
Member

Average/minimum/maximum time per invocation when moving normally. This basically tells me how much stutter it's actually going to cause.

@DaMatrix
Member Author

I couldn't go higher than 20 vertical with 64 horizontal render distance; the client takes so long to allocate all the buffers that it gets timed out. All durations are in milliseconds.

  • Original:
    • 8 horizontal, 8 vertical: {count=510, sum=204.274381, min=0.055093, average=0.400538, max=1.630065}
    • 16 horizontal, 16 vertical: {count=596, sum=1686.617977, min=1.588502, average=2.829896, max=6.958128}
    • 48 horizontal, 16 vertical: {count=360, sum=9309.496855, min=17.222589, average=25.859713, max=419.388198}
    • 64 horizontal, 20 vertical: {count=317, sum=19034.859821, min=41.213958, average=60.046876, max=621.479877}
  • This PR:
    • 8 horizontal, 8 vertical: {count=425, sum=46.763603, min=0.001328, average=0.110032, max=2.314520}
    • 16 horizontal, 16 vertical: {count=543, sum=176.860643, min=0.000544, average=0.325710, max=12.860030}
    • 48 horizontal, 16 vertical: {count=429, sum=645.727444, min=0.001609, average=1.505192, max=33.976273}
    • 64 horizontal, 20 vertical: {count=606, sum=1800.563056, min=0.002199, average=2.971226, max=65.247747}
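The `{count=…, sum=…, min=…, average=…, max=…}` layout above resembles the output of `java.util.DoubleSummaryStatistics#toString()`. Assuming something similar was used, per-invocation timing could be collected roughly as follows; `InvocationTimer` is a hypothetical helper, not code from the PR, and the lambda is only a stand-in for the real call.

```java
import java.util.DoubleSummaryStatistics;

/** Hypothetical per-invocation timer; not taken from the PR. */
final class InvocationTimer {
    private final DoubleSummaryStatistics statsMillis = new DoubleSummaryStatistics();

    /** Runs the task once and records its wall-clock duration in milliseconds. */
    void time(Runnable task) {
        long start = System.nanoTime();
        task.run();
        statsMillis.accept((System.nanoTime() - start) / 1_000_000.0);
    }

    /** Running count, sum, min, average and max of all recorded durations. */
    DoubleSummaryStatistics stats() {
        return statsMillis;
    }

    public static void main(String[] args) {
        InvocationTimer timer = new InvocationTimer();
        for (int frame = 0; frame < 100; frame++) {
            // Stand-in for one call to ViewFrustum#updateChunkPositions() per frame.
            timer.time(() -> Math.sqrt(42.0));
        }
        System.out.println(timer.stats());
    }
}
```

Because every call is recorded individually, one-off costs such as JIT compilation or a GC pause show up in `max` instead of being averaged away, which is why `max` and `average` can tell quite different stories.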

This PR is clearly significantly faster than the original code at high render distances, and outperforms the original code on average at low render distances.

At low render distances, this PR tends to have longer spikes than the original code (a maximum of about 1.4x the original at 8/8 and about 1.8x at 16/16), despite having a lower average duration. My guess is that this is caused by the JVM occasionally having to deoptimize and recompile the code when one of the branches is taken for the first time.

@Niko-sk2x
Copy link
Member

I would like to figure out further improvements, but ~3 ms on average in the max render distance case looks mostly good enough.

Niko-sk2x merged commit 7165b37 into OpenCubicChunks:MC_1.12 on Mar 11, 2025
1 check passed
DaMatrix deleted the optimized-viewfrustum-position-updates branch on March 11, 2025 17:26