
Conversation

@JostMigenda
Collaborator

Previously, some code examples would run the setup step as part of the timed function, repeating it thousands of times. In particular, for the two examples in episodes/optimisation-data-structures-algorithms.md, the setup step took about 6 ms (an order of magnitude longer than the faster methods).
Running the setup just once shaves about 20 s off the runtime of these examples. It also makes the timing cleaner (because it is no longer necessary to subtract the setup time) and potentially more reliable.
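For context, the change has roughly this shape (a sketch; the dataset size, needle, and repeat count here are illustrative, not the exact episode code):

```python
import random
import timeit

# Before: the setup ran inside the timed function, so it was repeated on every call
def search_set_with_setup():
    data = set(random.sample(range(10_000_000), 25_000))  # ~6 ms of setup each time
    return 9_999_999 in data

# After: the setup runs exactly once; only the operation of interest is timed
data_set = set(random.sample(range(10_000_000), 25_000))

def search_set():
    return 9_999_999 in data_set

repeats = 1_000
t = timeit.timeit(search_set, number=repeats)
print(f"search_set: {t * 1000:.2f}ms total for {repeats} calls")
```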

There are two open questions before we merge this:

  • For the search example, I’m getting search_set: 0.6 ms, linear_search_list: 1500 ms and binary_search_list: 3.36 ms (see the sketch after this list for the shape of these functions). The latter two are about 1.5× faster than described in the text, which is reasonable; but the first one is an order of magnitude slower. Could you double-check, please? (I’m wondering whether subtracting the setup time might have affected the result there …)
  • The technical appendix on bytecode uses the manualSearch and operatorSearch examples, so it should probably be updated alongside these changes. However, I noticed that Python 3.13 produces a few different bytecodes, even for code that remained unchanged. I’m wondering how we should deal with this, especially w/r/t Python Version Upgrade Process/Documentation #75.
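For reference, the three search functions have roughly this shape (a sketch; N, the choice of needle, and the variable names are assumptions, not the exact episode code):

```python
import random
from bisect import bisect_left

N = 25_000
haystack_list = sorted(random.sample(range(10 * N), N))
haystack_set = set(haystack_list)
needle = haystack_list[-1]  # a value near the end: worst case for the linear search

def search_set():
    return needle in haystack_set          # O(1) on average: a hash lookup

def linear_search_list():
    return needle in haystack_list         # O(n): scans the list element by element

def binary_search_list():
    i = bisect_left(haystack_list, needle) # O(log n): bisect on the sorted list
    return i < len(haystack_list) and haystack_list[i] == needle
```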

@JostMigenda requested a review from Robadob, July 7, 2025 11:07
@github-actions

github-actions bot commented Jul 7, 2025

Thank you!

Thank you for your pull request 😃

🤖 This automated message can help you check the rendered files in your submission for clarity. If you have any questions, please feel free to open an issue in {sandpaper}.

If you have files that automatically render output (e.g. R Markdown), then you should check for the following:

  • 🎯 correct output
  • 🖼️ correct figures
  • ❓ new warnings
  • ‼️ new errors

Rendered Changes

🔍 Inspect the changes: https://github.com/RSE-Sheffield/pando-python/compare/md-outputs..md-outputs-PR-89

The following changes were observed in the rendered markdown documents:

 md5sum.txt                                 |  4 +--
 optimisation-data-structures-algorithms.md | 50 +++++++++++-------------------
 optimisation-using-python.md               | 13 +++-----
 3 files changed, 24 insertions(+), 43 deletions(-)
What does this mean?

If you have source files that require output and figures to be generated (e.g. R Markdown), then it is important to make sure the generated figures and output are reproducible.

This output provides a way for you to inspect the output in a diff-friendly manner so that it's easy to see the changes that occur due to new software versions or randomisation.

⏱️ Updated at 2025-07-09 14:32:50 +0000

github-actions bot pushed a commit that referenced this pull request Jul 7, 2025
Member

@Robadob left a comment


Good improvement; I guess they were written this way due to me not understanding/trusting Python’s scoping rules.

Might be worth removing the seeding of random; it feels redundant for the purposes of comparison if all the examples use the same dataset (and it makes the code a tiny bit smaller).
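i.e. something like this (illustrative; the actual seed value and dataset size in the episode may differ):

```python
import random

# random.seed(42)  # redundant now: every method is timed against the same pre-generated dataset
data = random.sample(range(10_000_000), 25_000)
```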

@Robadob
Member

Robadob commented Jul 9, 2025

Off-topic:

This warning (visible at the bottom of your PR’s diff) seems spurious; the anchor works on the live site. Possibly a bug due to the original link containing an apostrophe.

[screenshot: the spurious anchor warning shown in the PR diff]

Commit: “no longer necessary, now that input data is generated exactly once” (Co-authored-by: Robert Chisholm <Robadob@Robadob.org>)
@Robadob
Member

Robadob commented Jul 9, 2025

Ah, I missed your questions; will double-check now.

github-actions bot pushed a commit that referenced this pull request Jul 9, 2025
@JostMigenda
Collaborator Author

Thanks for the review; I’ve cleaned up the random seeds as suggested.

Before we merge this, do you have any thoughts on the two open questions in the original post?

@Robadob
Member

Robadob commented Jul 9, 2025

> For the search example, I’m getting search_set: 0.6 ms, linear_search_list: 1500 ms and binary_search_list: 3.36 ms. The latter two are about 1.5× faster than described in the text, which is reasonable; but the first one is an order of magnitude slower. Could you double-check, please? (I’m wondering whether subtracting the setup time might have affected the result there …)

I know the speedups can differ quite a bit between my laptop and desktop (and sometimes over repeated runs). I've never bothered to investigate why (e.g. whether it's purely hardware differences, or whether the Python version has a significant impact), as the hierarchy always remains the same. I'd be most concerned that all times for a particular example are self-consistent (i.e. from the same hardware), rather than trying to standardise across the site.

Python 3.10.12/Windows on my office desktop:

search_set: 1.13ms
# Still waiting for the linear search to complete, but it's being very slow; I'll catch it later

So it's even slower on this particular machine (no idea which machine I first ran it on).

(This took longer to run than I recall; I'd probably reduce the repeats to make it practical for people to mess around with :|)

> The technical appendix on bytecode uses the manualSearch and operatorSearch examples, so it should probably be updated alongside these changes. However, I noticed that Python 3.13 produces a few different bytecodes, even for code that remained unchanged. I’m wondering how we should deal with this, especially w/r/t #75.

It might just be worth adding a note in italics: "Different versions of Python may produce slightly different bytecode."

Feels worth standardising those code examples to match, though. One could argue against it, as the difference maybe helps make the point that called functions are not disassembled.
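Something along these lines could regenerate the appendix output (a sketch; the episode's actual manualSearch/operatorSearch definitions may differ):

```python
import dis

def manual_search(needle, haystack):
    # hand-written equivalent of `needle in haystack`
    for item in haystack:
        if item == needle:
            return True
    return False

def operator_search(needle, haystack):
    return needle in haystack

dis.dis(manual_search)    # output differs between e.g. Python 3.10 and 3.13
dis.dis(operator_search)  # the `in` operator compiles to a single CONTAINS_OP
```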

@Robadob
Member

Robadob commented Jul 9, 2025

search_set: 1.13ms
linear_search_list: 2735.54ms
binary_search_list: 7.69ms

Full results; everything is a bit slower.

@JostMigenda
Collaborator Author

Okay, that’s all about a factor of 2 slower than in my testing, which is within the typical amount of variation I’d expect. It looks like the order-of-magnitude difference for search_set is a clear aberration, then, so I should update that.

@Robadob
Member

Robadob commented Jul 9, 2025

Sounds good

@JostMigenda
Collaborator Author

I’m finally finding time to finish this now, after the usual summer and conference season. Since the main repo has since moved to the carpentries-incubator org, I’ll close the PR here and open a new one over there.
