Remove duplicate setup code in examples #14
Conversation
no longer necessary, now that input data is generated exactly once

Co-authored-by: Robert Chisholm <Robadob@Robadob.org>
Thank you for your pull request 😃

🤖 This automated message can help you check the rendered files in your submission for clarity. If you have any questions, please feel free to open an issue in {sandpaper}. If you have files that automatically render output (e.g. R Markdown), then you should check for the following:

Rendered Changes
🔍 Inspect the changes: https://github.com/carpentries-incubator/pando-python/compare/md-outputs..md-outputs-PR-14
The following changes were observed in the rendered markdown documents:

What does this mean? If you have source files that require output and figures to be generated (e.g. R Markdown), then it is important to make sure the generated figures and output are reproducible. This output provides a way for you to inspect the output in a diff-friendly manner so that it's easy to see the changes that occur due to new software versions or randomisation.

⏱️ Updated at 2025-10-30 13:19:17 +0000
Robadob left a comment
May want to reduce repeats to speed up the timing runs.
Can't decide whether to include the sort in the timing of binary search or not. (it is a cost, but you'd normally sort once and search many times)
Happy to merge as is, but those are my thoughts.
```python
        ls_out.append(i)
# ...
repeats = 1000
```
```diff
-repeats = 1000
+repeats = 100
```
As noted in the earlier PR, may want to cut this down given how slow uniqueList is, to make it easier for people to try it themselves.
Results from 1k repeats locally after these changes basically align with what's already there. If repeats is dropped, the times will scale down in proportion (see the sketch after the numbers below).
uniqueSet: 0.45ms
uniqueSetAdd: 0.98ms
uniqueList: 641.24ms
uniqueListSort: 3.03ms
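For context, a minimal sketch of how a reduced `repeats` value would feed into the timing call, assuming the benchmark passes it as `timeit`'s `number` argument (which is why the reported totals scale linearly with it); the function and input here are illustrative stand-ins, not the lesson's actual harness:

```python
import random
import timeit

# Hypothetical sizes and names, mirroring the timings above
N, M = 25000, 2
ls_in = [random.randint(0, int(N * M)) for _ in range(N)]

def uniqueSet(ls):
    """Deduplicate by converting to a set (one of the variants timed above)."""
    return set(ls)

repeats = 100  # reduced from 1000 so learners can run it quickly
t = timeit.timeit(lambda: uniqueSet(ls_in), number=repeats)
print(f"uniqueSet: {t * 1000:.2f}ms")  # total time scales linearly with repeats
```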
```python
return st, ls  # Return both set and list

st = set([random.randint(0, int(N*M)) for i in range(N)])
ls = list(st)
ls.sort()  # Sort required for binary search
```
Tbh, the cost of sorting might be worth including in the benchmark to make it "fair".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. However, I experimented with this a bit and it seems to be more subtle than I initially expected:
- If I use `st = set([random.randint(0, int(N*M)) for i in range(N)]); ls = list(st)`, then `ls` starts out as a "mostly sorted" list (e.g. something like `[32771, 32772, 5, 7, 9, 11, 14, 32785, 17, 19, …]`; I'm guessing due to implementation details of the `set` iteration and hashing implementations?). Including the sorting in the benchmark seemed to have little (<10%) impact on `binary_search_list`.
- However, if I switch it around (`ls = [random.randint(0, int(N*M)) for i in range(N)]; st = set(ls)`) so `ls` becomes truly random, then `binary_search_list` becomes about 50% slower.
The latter is probably the more realistic (no one's going to first create a set and then turn it into a list just to search it; they'll just start out with some list that they want to search). On the other hand, by that argument, it might be fairer to include the `st = set(ls)` when benchmarking `search_set`, which would make it slower by a similar 50-ish% …
Overall, since it wouldn’t affect the qualitative comparison either way, I’d err on the side of keeping it as is. (The section on NumPy arrays vs. lists had a brief discussion about the overhead of conversion, so it’s okay to skip over that here.)
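A minimal sketch of the two setup orderings discussed above, with illustrative sizes rather than the lesson's exact values; it times only the sort, which is the part that differs between the two cases:

```python
import random
import timeit

# Illustrative sizes; the lesson's values may differ
N, M = 25000, 2

# Variant 1: build the set first, then the list -- iterating the set of
# small ints yields a "mostly sorted" ls, so Timsort finishes it cheaply
st = set([random.randint(0, int(N * M)) for i in range(N)])
ls_mostly_sorted = list(st)

# Variant 2: build the list directly (the set would be created from it
# afterwards) -- ls keeps its random order, so sorting it costs more
ls_random = [random.randint(0, int(N * M)) for i in range(N)]

# This is why including the sort barely moves binary_search_list in the
# first variant but slows it down noticeably in the second
print("sort mostly sorted:", timeit.timeit(lambda: sorted(ls_mostly_sorted), number=100))
print("sort random:       ", timeit.timeit(lambda: sorted(ls_random), number=100))
```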
Bonus fun fact: With `ls = […]; st = set(ls)` I've found the `search_set` times vary between 0.55 and 0.69 ms for different random inputs. (But reproducible within ±0.02 ms for a fixed input.) However, for the "slow" inputs, running `st = set(set(ls))` instead would pretty reliably drop the time down to 0.57±0.02 ms. 🤯
I'm guessing the outer `set()` call is technically not a no-op in that case, because the order in which elements get added to the new set is different from the one in `st = set(ls)`? But I really can't afford to get nerd-sniped by that and study the implementation of sets right now …
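A hedged sketch for reproducing that observation; `search_set` here is a stand-in membership-counting loop rather than the lesson's function, and whether the gap actually shows up will depend on the particular random input, as noted above:

```python
import random
import timeit

N, M = 25000, 2
random.seed(12345)  # fix the input so each run is reproducible for that input
ls = [random.randint(0, int(N * M)) for _ in range(N)]

st_once = set(ls)        # elements inserted in the list's (random) order
st_twice = set(set(ls))  # elements re-inserted in the first set's iteration order

def search_set(st):
    # stand-in search: count how many probe values are present in the set
    return sum(1 for i in range(0, int(N * M)) if i in st)

for name, st in [("set(ls)", st_once), ("set(set(ls))", st_twice)]:
    t = timeit.timeit(lambda: search_set(st), number=100)
    print(f"{name}: {t * 1000 / 100:.2f}ms per call")
```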
Fair to assume that some inputs require more data reallocation, or produce less balanced hash maps (e.g. worse-performing lookups). Interesting to know it's visible in practice, though.
```python
        j += 1
# ...
repeats = 1000
```
```diff
-repeats = 1000
+repeats = 10
```
As above, maybe want to reduce this so it's quicker for people to run locally. This one is much slower.
It would be nice to use timeit's auto-ranging functionality, which just ensures the run takes longer than 0.2 s, but it's sadly a lot more verbose (see the sketch after the timings below).
My times for 1000×: curiously, linear search is only about 1.3× what's already there, whereas the other two are about 2×, but eh.
search_set: 1.03ms
linear_search_list: 2141.45ms
binary_search_list: 6.96ms
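For reference, a minimal sketch of that auto-ranging approach using `timeit.Timer.autorange()`; the benchmark body below is a stand-in, not the lesson's `linear_search_list`:

```python
import random
import timeit

# Stand-in benchmark body; the lesson's linear_search_list differs
N = 25000
ls = [random.randint(0, N * 2) for _ in range(N)]

def bench():
    return sum(1 for i in range(0, N * 2, 100) if i in ls)

# autorange() keeps increasing the loop count until the total run takes
# at least 0.2 s, then returns (number_of_loops, total_seconds)
number, total = timeit.Timer(bench).autorange()
print(f"linear search stand-in: {total / number * 1000:.2f}ms per call ({number} loops)")
```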
I’ll open an issue to check run times for all code samples; I vaguely remember there being another slow one 🤔
```python
    if i in ls:
        ct += 1
# ...
repeats = 1000
```
```diff
-repeats = 1000
+repeats = 100
```
My 1000× times: manual search is about 1.3× what's already there, the operator version much the same (a sketch of the comparison follows the numbers).
manualSearch: 200.51ms
operatorSearch: 27.39ms
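For context, a hedged sketch of the kind of comparison these names suggest, a hand-written loop versus the built-in `in` operator; the lesson's actual `manualSearch`/`operatorSearch` may differ in detail:

```python
import random
import timeit

# Illustrative size, kept small so the manual loop stays quick to run
N = 1000
ls = [random.randint(0, N * 2) for _ in range(N)]
targets = [random.randint(0, N * 2) for _ in range(N)]

def manualSearch():
    ct = 0
    for t in targets:
        for item in ls:       # element-by-element loop in Python bytecode
            if item == t:
                ct += 1
                break
    return ct

def operatorSearch():
    ct = 0
    for t in targets:
        if t in ls:           # same linear scan, but executed in C
            ct += 1
    return ct

repeats = 10
print("manualSearch:  ", timeit.timeit(manualSearch, number=repeats))
print("operatorSearch:", timeit.timeit(operatorSearch, number=repeats))
```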
Auto-generated via `{sandpaper}`
Source : c48867e
Branch : md-outputs
Author : GitHub Actions <actions@github.com>
Time : 2025-11-02 01:18:15 +0000
Message : markdown source builds
Auto-generated via `{sandpaper}`
Source : 1134124
Branch : main
Author : Jost Migenda <jost.migenda@kcl.ac.uk>
Time : 2025-11-02 01:17:13 +0000
Message : Merge pull request #14 from JostMigenda/examples_setup
Remove duplicate setup code in examples
Auto-generated via `{sandpaper}`
Source : 9ce30eb
Branch : md-outputs
Author : GitHub Actions <actions@github.com>
Time : 2025-11-05 09:15:29 +0000
Message : markdown source builds
See previous discussion and review of the first two commits in RSE-Sheffield#89. The two commits added today resolve both open questions discussed there (the discrepancy in relative timing for the list/set search comparison, and the bytecode in the appendix).
(The default output format of the `dis` module has changed a bit in Python 3.13; that's why the last commit looks a bit messy. I think it's nicer though, with named jump labels instead of instruction offsets.)
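For anyone checking that appendix locally, the bytecode comes from the standard `dis` module; a minimal placeholder example (the lesson disassembles its own snippets, not this function):

```python
import dis

def example(ls):
    # placeholder with a loop and a branch so jump targets appear
    ct = 0
    for i in ls:
        if i == 42:
            ct += 1
    return ct

# On Python 3.13 the default disassembly shows named jump labels (L1, L2, ...)
# rather than raw instruction offsets as jump targets.
dis.dis(example)
```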