
ENH: Add PDB reader benchmark to track parsing performance #5394

Draft
Dreamstick9 wants to merge 4 commits into MDAnalysis:develop from Dreamstick9:add-pdb-benchmark

Conversation

Contributor

@Dreamstick9 Dreamstick9 commented Jun 2, 2026

Changes made in this Pull Request:

  • Contributes to performance benchmarks #1023
  • Adds a PDBReaderBench class to benchmarks/benchmarks/traj_reader.py.
  • The class benchmarks PDB reading and parsing across different numbers of frames ([10, 100, 500]) using the standard PDB test file.

LLM / AI generated code disclosure

LLMs or other AI-powered tools (beyond simple IDE use cases) were used in this contribution: no

PR Checklist

  • Issue raised/referenced?
  • Tests updated/added?
  • Documentation updated/added?
  • package/CHANGELOG file updated?
  • Is your name in package/AUTHORS? (If it is not, add it!)
  • LLM/AI disclosure was updated.

Developers Certificate of Origin

I certify that I can submit this code contribution as described in the Developer Certificate of Origin, under the MDAnalysis LICENSE.


read-the-docs-community Bot commented Jun 2, 2026

Documentation build overview

📚 MDAnalysis | 🛠️ Build #32994886 | 📁 Comparing 625961e against latest (941d679)

  🔍 Preview build  

2 files changed
± index.html
± documentation_pages/analysis/wbridge_analysis.html


codecov Bot commented Jun 2, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.85%. Comparing base (b05a84b) to head (625961e).
⚠️ Report is 1 commit behind head on develop.

Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #5394   +/-   ##
========================================
  Coverage    93.85%   93.85%           
========================================
  Files          182      182           
  Lines        22509    22509           
  Branches      3202     3202           
========================================
  Hits         21125    21125           
  Misses         922      922           
  Partials       462      462           


Comment thread benchmarks/benchmarks/traj_reader.py Outdated
        self.u = mda.Universe(PDB)

    def time_iterate(self, n_frames):
        for _ in range(n_frames):
Member

Maybe I'm mistaken, but won't this nested loop do something weird?

  • for each integer in n_frames range
  • loop over the entire trajectory--all frames, since u.trajectory isn't sliced

Now, it may be the case that after the first outer loop iteration there's no rewind so you don't redo the full iteration each time, but I'm pretty sure the way this is expressed isn't quite right yet.

Even if you fix that, I'm not entirely certain this is benchmarking what we want on the "reading" side.
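The frame-count concern can be illustrated without MDAnalysis. The sketch below is not the PR's code: `FakeTrajectory`, `nested_iterate`, and `sliced_iterate` are hypothetical names, and the stand-in assumes that every new iteration restarts from the first frame, which may or may not match the real reader.

```python
class FakeTrajectory:
    """Hypothetical stand-in for a trajectory; counts every frame it yields."""

    def __init__(self, n_frames):
        self.n_frames = n_frames
        self.frames_read = 0

    def __len__(self):
        return self.n_frames

    def __iter__(self):
        # Assumption: each new iteration starts again from frame 0.
        return self._read(range(self.n_frames))

    def __getitem__(self, key):
        # Only slicing is supported in this sketch.
        if not isinstance(key, slice):
            raise TypeError("this sketch only supports slicing")
        return self._read(range(*key.indices(self.n_frames)))

    def _read(self, indices):
        for i in indices:
            self.frames_read += 1
            yield i


def nested_iterate(traj, n_frames):
    """Outer loop over range(n_frames), inner loop over the whole trajectory."""
    for _ in range(n_frames):
        for _frame in traj:
            pass
    return traj.frames_read


def sliced_iterate(traj, n_frames):
    """Single pass over at most n_frames frames."""
    for _frame in traj[:n_frames]:
        pass
    return traj.frames_read


if __name__ == "__main__":
    print(nested_iterate(FakeTrajectory(5), 10))  # 10 full passes over 5 frames
    print(sliced_iterate(FakeTrajectory(5), 10))  # one pass, capped by length
```

Under these assumptions the nested form reads `n_frames * len(traj)` frames, while the sliced form reads at most `min(n_frames, len(traj))`. With a one-frame file, the parameter would only change how many times the same frame is revisited.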

Contributor Author

The standard PDB test file contains only one frame. A better approach would be to use PDB_multiframe (an NMR structure): MDAnalysis treats each 3D model in that file as a separate frame, which removes the need for the outer loop entirely. @tylerjereddy, if this approach seems better, shall I go ahead?

Contributor Author

@Dreamstick9 Dreamstick9 Jun 4, 2026

I made the change, and the benchmark for the updated reader comes out at 5.13 ms.

Screenshot 2026-06-04 at 10 00 42 PM

Contributor Author

I also added time_read to benchmark the initial parse separately. Let me know if this is the right direction to move forward with.
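For reference, a rough, standard-library-only sketch of the parse/iterate split follows. It is not the code in this PR: `make_pdb_text`, `parse_models`, `PDBLikeReaderBench`, and `run` are hypothetical names, the toy parser only handles MODEL/ATOM/ENDMDL records, and it keeps all frames in memory, which may differ from how a real reader behaves. It relies on the asv convention that `setup` is untimed and each `time_*` method is timed per parameter value.

```python
import timeit


def make_pdb_text(n_models, n_atoms):
    """Build a toy multi-model PDB string (hypothetical test data)."""
    lines = []
    for m in range(1, n_models + 1):
        lines.append(f"MODEL {m:>8}")
        for a in range(1, n_atoms + 1):
            x, y, z = float(a), float(m), 0.0
            # Fixed-width ATOM record; coordinates occupy columns 31-54.
            lines.append(
                f"{'ATOM':<6}{a:>5} {' CA ':<4} {'ALA':>3} {'A'}{a:>4}{'':4}"
                f"{x:8.3f}{y:8.3f}{z:8.3f}"
            )
        lines.append("ENDMDL")
    lines.append("END")
    return "\n".join(lines)


def parse_models(text):
    """Return a list of frames; each frame is a list of (x, y, z) tuples."""
    frames, current = [], None
    for line in text.splitlines():
        record = line[:6].strip()
        if record == "MODEL":
            current = []
        elif record == "ATOM" and current is not None:
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            current.append(xyz)
        elif record == "ENDMDL" and current is not None:
            frames.append(current)
            current = None
    return frames


class PDBLikeReaderBench:
    """asv-style layout: setup() is untimed, each time_* method is timed."""

    params = [10, 100]
    param_names = ["n_frames"]

    def setup(self, n_frames):
        self.text = make_pdb_text(n_models=n_frames, n_atoms=50)
        self.frames = parse_models(self.text)

    def time_read(self, n_frames):
        # Cost of the initial parse only.
        parse_models(self.text)

    def time_iterate(self, n_frames):
        # One pass over the frames; no outer loop over range(n_frames).
        for frame in self.frames:
            for _xyz in frame:
                pass


def run(bench_cls):
    """Tiny stand-in for the asv runner: best-of-3 timing per method."""
    results = {}
    for n in bench_cls.params:
        bench = bench_cls()
        bench.setup(n)
        for name in ("time_read", "time_iterate"):
            method = getattr(bench, name)
            timings = timeit.repeat(lambda: method(n), number=5, repeat=3)
            results[(name, n)] = min(timings)
    return results


if __name__ == "__main__":
    for key, seconds in sorted(run(PDBLikeReaderBench).items()):
        print(key, f"{seconds:.6f} s")
```

Here the number of frames comes from the data itself (one frame per MODEL record), so the parameter changes the workload rather than the number of times a single frame is revisited.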

@Dreamstick9 Dreamstick9 requested a review from tylerjereddy June 3, 2026 04:57