
Conversation

@subCode321 (Contributor) commented Mar 6, 2025

Fixes #123

  1. I've used rq worker --with-scheduler to queue jobs and process them
  2. I've added the new modules to the requirements
  3. Also added visualization of CPU and RAM metrics

Video Recording

Streamlit.Scheduler.mp4

Summary by CodeRabbit

  • New Features

    • Integrated background processing by launching a Redis server and worker service to manage task scheduling.
    • Enhanced workflow management by queuing tasks for improved execution order.
  • Bug Fixes

    • Prevented potential cleanup errors by ensuring process identifier files are checked before removal.
  • Documentation

    • Updated installation instructions to clarify the multi-step startup process, including launching Redis and the worker.
  • Chores

    • Refined dependency configurations and ordering for a smoother environment setup.

@coderabbitai coderabbitai bot commented Mar 6, 2025

Walkthrough

The changes update dependency files by repositioning psutil==7.0.0 and adding redis==5.2.1 and rq==2.1.0. Modifications in Docker and GitHub Actions introduce commands to start a Redis server and an RQ worker from entrypoint scripts and batch files. The installation documentation now details these startup steps. In the Python workflow components, task execution is shifted from immediate execution to queuing via Redis and RQ with defined dependencies, while a safety check is added for PID file removal.
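
For illustration, a minimal sketch of the dependency chaining described above, using RQ's depends_on. The queue name comes from this PR; the task module and function names are hypothetical placeholders, not the PR's actual code:

from redis import Redis
from rq import Queue

# Hypothetical task module: RQ workers import tasks by dotted path, so the
# functions must live in an importable module, not the enqueueing script itself.
from workflow_tasks import feature_finder_metabo, feature_linker_unlabeled_kd, export_results

redis_conn = Redis()  # defaults to localhost:6379
queue = Queue("mzml_workflow_run", connection=redis_conn)

# Each job only starts once the job it depends on has finished.
ffm_job = queue.enqueue(feature_finder_metabo, "input.mzML")
linker_job = queue.enqueue(feature_linker_unlabeled_kd, depends_on=ffm_job)
export_job = queue.enqueue(export_results, depends_on=linker_job)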

Changes

Files | Change Summary
requirements.txt, environment.yml | Repositioned psutil==7.0.0 and added new dependencies redis==5.2.1 and rq==2.1.0.
Dockerfile, Dockerfile_simple, .github/workflows/build-windows-executable-app.yaml | Appended commands in entrypoint scripts and batch file to start the Redis server and RQ worker with scheduler before launching the Streamlit app.
docs/installation.md | Revised instructions to first start the Redis server and RQ worker, followed by the application launch.
src/Workflow.py, src/workflow/WorkflowManager.py | Transitioned task execution to use a Redis-backed job queue with dependency chaining between workflow tasks.
src/workflow/CommandExecutor.py | Added a conditional check to ensure the PID file exists before attempting to unlink it.
test_gui.py | Removed an extraneous blank line before a fixture decorator.

Sequence Diagram(s)

sequenceDiagram
    participant Container
    participant EntryPoint
    participant RedisServer
    participant RQWorker
    participant StreamlitApp

    Container->>EntryPoint: Start container
    EntryPoint->>RedisServer: Execute "redis-server"
    EntryPoint->>RQWorker: Execute "rq worker --with-scheduler"
    EntryPoint->>StreamlitApp: Launch "streamlit run app.py"
sequenceDiagram
    participant WM as WorkflowManager
    participant Q as Job Queue
    participant W as RQ Worker
    participant E as Task Executor

    WM->>Q: Enqueue FeatureFinderMetabo job
    Q->>W: Process FeatureFinderMetabo job
    W->>E: Execute FeatureFinderMetabo task
    E-->>Q: On completion, trigger FeatureLinkerUnlabeledKD job
    Q->>W: Process FeatureLinkerUnlabeledKD job
    W->>E: Execute FeatureLinkerUnlabeledKD task
    E-->>Q: On completion, trigger export and final tasks

Assessment against linked issues

Objective | Addressed | Explanation
Add a Scheduler for Workflows (#123)

Possibly related PRs

Suggested reviewers

  • t0mdavid-m
  • jcharkow

Poem

I'm a rabbit with a jubilant beat,
Hopping through code with nimble feet.
Redis and RQ now lead the queue,
Tasks line up in a hop, brand new.
Celebrate these tweaks in lines and code,
With whiskers twitching down the workflow road!


🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai resolve to resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (7)
requirements.txt (1)

10-12: Consider pinning streamlit-js-eval.

While redis and rq are pinned to specific versions, streamlit-js-eval is not. Pinning all packages ensures consistent builds and reduces the risk of unintended breakage or dependency conflicts.

content/run_example_workflow.py (3)

7-9: Make Redis connection configurable.

Redis is being initialized with default parameters, which might be fine for local development. However, for production or different deployment environments, consider externalizing the host/port or retrieving them from environment variables.
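
A minimal sketch of that suggestion, assuming environment variables named REDIS_HOST and REDIS_PORT (names chosen here for illustration, not taken from this PR):

import os
from redis import Redis

# Fall back to the local defaults when the variables are not set.
redis_conn = Redis(
    host=os.environ.get("REDIS_HOST", "localhost"),
    port=int(os.environ.get("REDIS_PORT", "6379")),
)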


53-55: Reuse the Redis connection or handle exceptions.

Creating a new Redis connection each time this function is called can lead to excessive overhead or connection issues. Consider reusing a single connection (e.g., a global instance) or adding error handling if the job ID is invalid or Redis is unavailable.
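
One possible shape for this, shown as a sketch: a single module-level connection plus a guarded job fetch. The helper name is illustrative, not the PR's:

from redis import Redis
from rq.job import Job
from rq.exceptions import NoSuchJobError

# Created once at import time and reused by every call below.
redis_conn = Redis()

def fetch_job(job_id: str):
    """Return the RQ job for job_id, or None if it no longer exists in Redis."""
    try:
        return Job.fetch(job_id, connection=redis_conn)
    except NoSuchJobError:
        return None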


60-80: Handle additional job states.

Currently, the loop checks if the job is finished but doesn't handle failures separately. Consider checking for job.is_failed to correctly update the status if a failure occurs.

 while not (job.is_finished or job.is_failed):
     logs = get_workflow_progress_logs(job.id)
     ...
 
 if job.is_failed:
     status.update(label="Workflow failed!", expanded=False, state='error')
 elif job.is_finished:
     status.update(label="Workflow complete!", expanded=False, state='complete')
src/mzmlfileworkflow.py (3)

9-10: Use environment-based Redis configuration.

For production, it’s often preferable to configure Redis connections (host, port, credentials) via environment variables or a separate configuration file rather than relying on defaults.


31-36: Handle cases when no job is available.

get_current_job() may return None if run outside an RQ worker context. Also, creating a new Redis connection every time can be costly. Consider reusing a single connection or shared connection pool for better performance.
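
A hedged sketch of the None check; the Redis key format used here is illustrative rather than the PR's:

from redis import Redis
from rq import get_current_job

redis_conn = Redis()

def log_progress(message: str) -> None:
    """Record a progress message for the current job, or print if no job context exists."""
    job = get_current_job()
    if job is None:
        # Called outside an RQ worker (e.g. directly in tests): avoid touching Redis.
        print(message)
        return
    redis_conn.rpush(f"workflow_logs:{job.id}", message)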


48-60: Add error handling for file load operations.

If an mzML file is missing or fails during loading, the loop continues without reporting the error. Wrap file loading in a try/except block to log relevant error details via Redis.

 try:
     num_spectra_val = mzML_file_get_num_spectra(
       str(Path(workspace_path, "mzML-files", file + ".mzML"))
     )
     num_spectra.append(num_spectra_val)
 except Exception as e:
     log_workflow_progress(f"Error reading mzML file {file}: {e}")
     continue
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d65c9c9 and 2eac24a.

📒 Files selected for processing (3)
  • content/run_example_workflow.py (2 hunks)
  • requirements.txt (1 hunks)
  • src/mzmlfileworkflow.py (3 hunks)
🔇 Additional comments (1)
src/mzmlfileworkflow.py (1)

38-38: Verify validity of workspace_path.

Ensure workspace_path exists and is accessible. Handling this scenario prevents unexpected runtime errors or missing file references.
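
An early check along these lines could look like the following sketch (the mzML-files subfolder name follows the path used elsewhere in this PR):

from pathlib import Path

def check_workspace(workspace_path) -> Path:
    """Fail fast if the workspace or its mzML folder is missing."""
    mzml_dir = Path(workspace_path, "mzML-files")
    if not mzml_dir.is_dir():
        raise FileNotFoundError(f"mzML directory not found: {mzml_dir}")
    return mzml_dir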

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (1)
app.py (1)

12-27: Platform-specific handling for RQ worker looks good.

The implementation correctly differentiates between Windows and Unix-based systems for launching the RQ worker. For Windows, the CREATE_NEW_PROCESS_GROUP flag allows sending control signals later, while Unix systems use a different approach with start_new_session=True.

Consider extracting the queue name "mzml_workflow_run" into a constant at the top of the file to avoid repetition and make future changes easier.
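
For example (a sketch only), the queue name could live in one module-level constant:

from redis import Redis
from rq import Queue

# Single source of truth for the queue name shared by the app and the worker command.
MZML_WORKFLOW_QUEUE = "mzml_workflow_run"

queue = Queue(MZML_WORKFLOW_QUEUE, connection=Redis())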

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 89a84d7 and 3f9233d.

📒 Files selected for processing (1)
  • app.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
app.py

10-10: pyopenms imported but unused

Remove unused import: pyopenms

(F401)


37-37: Do not use bare except

(E722)

⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: build-simple-app
  • GitHub Check: build-full-app
  • GitHub Check: build-openms
🔇 Additional comments (2)
app.py (2)

4-8: New imports correctly support the RQ worker functionality.

These imports are necessary for subprocess management, platform detection, and process cleanup.


10-10:


❓ Verification inconclusive

Verify necessity of the pyopenms import.

The static analysis flags pyopenms as unused, but there's a comment suggesting it's necessary for Windows. Please verify if this import is still needed or document more clearly why it's required.


🏁 Script executed:

#!/bin/bash
# Check if pyopenms is used in the codebase
rg -i "pyopenms" --type py

Length of output: 884


Action: Clarify Usage of pyopenms in app.py

It appears that the import of pyopenms on line 10 in app.py isn’t directly used in that file. However, given that other parts of the codebase import and use pyopenms (typically aliased as poms), there may be a side effect—perhaps related to Windows-specific initialization—that isn’t immediately obvious. Please verify if this import is intentionally retained for such side effects. If it is required for Windows functionality or any implicit initialization, add a comment in the code documenting this rationale. Otherwise, consider removing the unused import to clean up the code.
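
If the import does turn out to be needed for an import-time side effect on Windows, one way to keep it while documenting the intent and silencing Ruff is a noqa comment (a sketch, not the PR's code):

# Kept for a possible import-time side effect on Windows; not referenced directly here.
import pyopenms  # noqa: F401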

🧰 Tools
🪛 Ruff (0.8.2)

10-10: pyopenms imported but unused

Remove unused import: pyopenms

(F401)

@coderabbitai coderabbitai bot left a comment

Caution

Inline review comments failed to post. This is likely due to GitHub's limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.

Actionable comments posted: 2

🧹 Nitpick comments (3)
test_gui.py (3)

7-7: Remove unused import

MagicMock is imported but not used in the code. You should remove unused imports to keep the codebase clean.

- from unittest.mock import patch, MagicMock
+ from unittest.mock import patch
🧰 Tools
🪛 Ruff (0.8.2)

7-7: unittest.mock.MagicMock imported but unused

Remove unused import: unittest.mock.MagicMock

(F401)


138-146: Duplicate error handling logic

This is a duplicate of the logic at lines 112-120. Consider extracting this repeated logic into a helper function.

+ def create_example_files(mzml_dir):
+     """Create example mzML files in the given directory."""
+     example_dir = Path("example-data", "mzML")
+     if example_dir.exists():
+         for f in example_dir.glob("*.mzML"):
+             shutil.copy(f, mzml_dir)
+     else:
+         # Create empty example files if directory doesn't exist
+         for name in ['Blank.mzML', 'Treatment.mzML', 'Pool.mzML', 'Control.mzML']:
+             Path(mzml_dir, name).touch()

# Then use this function in both test cases

156-160: Simplify nested with statements

Similar to the previous suggestion, these nested with statements could be simplified for better readability.

- with patch('redis.Redis', return_value=MockRedis()):
-     with patch('rq.Queue', return_value=MockQueue()):
-         launch.button[1].click().run(timeout=60)
-         assert not launch.exception
+ with patch('redis.Redis', return_value=MockRedis()), \
+      patch('rq.Queue', return_value=MockQueue()):
+     launch.button[1].click().run(timeout=60)
+     assert not launch.exception
🧰 Tools
🪛 Ruff (0.8.2)

157-158: Use a single with statement with multiple contexts instead of nested with statements

(SIM117)

🛑 Comments failed to post (2)
test_gui.py (2)

39-39: 💡 Verification agent

🧩 Analysis chain

Verify that the module has the correct function name

The repeated pipeline failure on this line indicates that get_workflow_progress_logs is not defined in src.mzmlfileworkflow. This is a critical issue that needs to be fixed.


🏁 Script executed:

#!/bin/bash
# Check if the correct function exists in the mzmlfileworkflow module
echo "Looking for workflow logging functions in mzmlfileworkflow:"
rg "def get_workflow" src/mzmlfileworkflow.py

echo "\nChecking imports in this module:"
rg "import|from" src/mzmlfileworkflow.py

Length of output: 693


Attention: Incorrect function reference in patch

The patch in test_gui.py is targeting get_workflow_progress_logs in src/mzmlfileworkflow.py, but our search did not reveal any function starting with "get_workflow"—confirming that no such function is defined in the module. Please review if:

  • The intended function was misnamed in the patch.
  • The module is missing the expected get_workflow_progress_logs implementation.

Adjust the test or implement the missing function to resolve the pipeline failure.

🧰 Tools
🪛 GitHub Actions: Test workflow functions

[error] 39-39: AttributeError: <module 'src.mzmlfileworkflow' does not have the attribute 'get_workflow_progress_logs' (repeated 20 times across the CI matrix)


34-40: ⚠️ Potential issue

Fix the missing module attribute and improve code structure

The pipeline is failing because get_workflow_progress_logs is not defined in the src.mzmlfileworkflow module. Also, the nested with statements could be simplified for better readability.

Fix the missing attribute:

- with patch('src.mzmlfileworkflow.get_workflow_progress_logs', return_value=[b"Mock log entry"]):
+ with patch('src.mzmlfileworkflow.get_workflow_logs', return_value=[b"Mock log entry"]):

Or make sure the get_workflow_progress_logs function is properly defined in src.mzmlfileworkflow.

Also, you can simplify the nested with statements:

- with patch('redis.Redis', return_value=MockRedis()):
-     with patch('rq.Queue', return_value=MockQueue()):
-         with patch('src.mzmlfileworkflow.get_workflow_progress_logs', return_value=[b"Mock log entry"]):
-             yield
+ with patch('redis.Redis', return_value=MockRedis()), \
+      patch('rq.Queue', return_value=MockQueue()), \
+      patch('src.mzmlfileworkflow.get_workflow_progress_logs', return_value=[b"Mock log entry"]):
+     yield
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

@pytest.fixture(autouse=True)
def mock_redis_rq():
    """Mock Redis and RQ components for all tests"""
    with patch('redis.Redis', return_value=MockRedis()), \
         patch('rq.Queue', return_value=MockQueue()), \
         patch('src.mzmlfileworkflow.get_workflow_logs', return_value=[b"Mock log entry"]):
        yield
🧰 Tools
🪛 Ruff (0.8.2)

37-38: Use a single with statement with multiple contexts instead of nested with statements

(SIM117)

🪛 GitHub Actions: Test workflow functions

[error] 39-39: AttributeError: <module 'src.mzmlfileworkflow' does not have the attribute 'get_workflow_progress_logs' (repeated 20 times across the CI matrix)

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
Dockerfile (1)

144-146: Consider adding error handling for Redis and RQ worker

The current implementation doesn't include any error handling for Redis and RQ worker startup, which could lead to silent failures.

Consider adding basic error checking:

-    echo "redis-server" >> /app/entrypoint.sh && \
-    echo "rq worker --with-scheduler -q mzml_workflow_run" >> /app/entrypoint.sh && \
+    echo "redis-server & REDIS_PID=\$!" >> /app/entrypoint.sh && \
+    echo "sleep 2" >> /app/entrypoint.sh && \
+    echo "if ! kill -0 \$REDIS_PID 2>/dev/null; then echo 'Redis failed to start'; exit 1; fi" >> /app/entrypoint.sh && \
+    echo "rq worker --with-scheduler -q mzml_workflow_run & RQ_PID=\$!" >> /app/entrypoint.sh && \
+    echo "sleep 1" >> /app/entrypoint.sh && \
+    echo "if ! kill -0 \$RQ_PID 2>/dev/null; then echo 'RQ worker failed to start'; exit 1; fi" >> /app/entrypoint.sh && \
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2e71dca and 64a4397.

📒 Files selected for processing (3)
  • Dockerfile (1 hunks)
  • Dockerfile_simple (1 hunks)
  • environment.yml (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: build-openms
  • GitHub Check: build-full-app
  • GitHub Check: build-simple-app
🔇 Additional comments (2)
environment.yml (1)

18-19: Dependencies added for Redis and RQ integration

The addition of redis and rq packages supports the implementation of job queuing with the Redis Queue worker as mentioned in the PR objectives.

Dockerfile (1)

144-145: Consider running Redis and RQ worker in background

The Redis server and RQ worker are started sequentially in the entrypoint script without being backgrounded. This could prevent the Streamlit app from starting since each command will block until it completes.

Apply the same fix as suggested in the Dockerfile_simple review:

-    echo "redis-server" >> /app/entrypoint.sh && \
-    echo "rq worker --with-scheduler -q mzml_workflow_run" >> /app/entrypoint.sh && \
+    echo "redis-server &" >> /app/entrypoint.sh && \
+    echo "sleep 2 && rq worker --with-scheduler -q mzml_workflow_run &" >> /app/entrypoint.sh && \

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
.github/workflows/build-windows-executable-app.yaml (1)

262-264: Review of Batch File Command Setup for Redis, RQ Worker, and Streamlit
The new commands correctly implement the change required by the PR by manually starting the Redis server and the RQ worker (with scheduler) before launching the Streamlit app. However, please note the following points:

  • The third command (line 264) has an extra leading whitespace inside the quotes. This inconsistency might be unintentional and could be removed for uniformity.
  • Since the PR description mentions that these processes will eventually be automated within app.py, it would be beneficial to add a TODO comment indicating that this manual batch file startup should be revisited once the automation is implemented.

Proposed Diff to Remove the Leading Whitespace:

-          echo " start /min .\python-${{ env.PYTHON_VERSION }}\python -m streamlit run app.py local" >> ${{ env.APP_NAME }}.bat
+          echo "start /min .\python-${{ env.PYTHON_VERSION }}\python -m streamlit run app.py local" >> ${{ env.APP_NAME }}.bat
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 64a4397 and 1025979.

📒 Files selected for processing (4)
  • .github/workflows/build-windows-executable-app.yaml (1 hunks)
  • Dockerfile (1 hunks)
  • Dockerfile_simple (1 hunks)
  • test_gui.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • Dockerfile
  • Dockerfile_simple
🧰 Additional context used
🪛 GitHub Actions: Test workflow functions
test_gui.py

[error] 108-108: AssertionError: assert not ElementList(_list=[Exception(message='Error 111 connecting to localhost:6379. Connection refused.')]) (repeated 5 times across the CI matrix)

⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: build-openms
  • GitHub Check: build-simple-app
  • GitHub Check: build-full-app

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
content/run_example_workflow.py (1)

10-10: Remove unused import.

The threading module is imported but never used in this file.

-import threading
🧰 Tools
🪛 Ruff (0.8.2)

10-10: threading imported but unused

Remove unused import: threading

(F401)

src/workflow/mzmlfileworkflowstatus.py (1)

21-39: Job status monitoring with Streamlit UI updates.

This function effectively bridges RQ job execution with Streamlit UI updates through:

  1. Creating a status indicator in the Streamlit UI
  2. Continuously polling Redis for new log messages
  3. Updating the UI in real-time as new messages arrive
  4. Properly finalizing the status when the job completes or fails

Consider these improvements:

  1. Simplify the nested with statements
  2. Add error handling for Redis connection issues
  3. Implement a timeout mechanism for stalled jobs
-def monitor_mzml_workflow_job_status(job: job.Job, streamlit_status_placeholder: DeltaGenerator):
-    with streamlit_status_placeholder.container():
-        with st.status("Workflow in progress...", expanded=True) as status:
+def monitor_mzml_workflow_job_status(job: job.Job, streamlit_status_placeholder: DeltaGenerator, timeout_seconds: int = 3600):
+    with streamlit_status_placeholder.container(), st.status("Workflow in progress...", expanded=True) as status:
         last_len = 0
+        start_time = time.time()
         while not job.is_finished:
-            logs = get_mzml_workflow_progress_logs(job.id)
+            try:
+                logs = get_mzml_workflow_progress_logs(job.id)
+                
+                if len(logs) > last_len:
+                    # Read from the last read log and update the status message
+                    for log in logs[last_len:]:
+                        status.text(log.decode('utf-8'))
+                    last_len = len(logs)
-
-            if len(logs) > last_len:
-                # Read from the last read log and update the status message
-                for log in logs[last_len:]:
-                    status.text(log.decode('utf-8'))
-                last_len = len(logs)
+            except Exception as e:
+                status.text(f"Error retrieving logs: {str(e)}")
+                
+            # Check for timeout
+            if time.time() - start_time > timeout_seconds:
+                status.update(label="Workflow timed out!", expanded=False, state='error')
+                break

             time.sleep(1)

         if job.is_finished:
             status.update(label="Workflow complete!", expanded=False, state='complete')
         else:
             status.update(label="Workflow stopped!", expanded=False, state='error')
🧰 Tools
🪛 Ruff (0.8.2)

22-23: Use a single with statement with multiple contexts instead of nested with statements

(SIM117)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2992275 and 141acfe.

📒 Files selected for processing (3)
  • content/run_example_workflow.py (2 hunks)
  • src/mzmlfileworkflow.py (3 hunks)
  • src/workflow/mzmlfileworkflowstatus.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/mzmlfileworkflow.py
🧰 Additional context used
🪛 Ruff (0.8.2)
content/run_example_workflow.py

10-10: threading imported but unused

Remove unused import: threading

(F401)

src/workflow/mzmlfileworkflowstatus.py

22-23: Use a single with statement with multiple contexts instead of nested with statements

(SIM117)

⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: build-full-app
  • GitHub Check: build-openms
  • GitHub Check: build-simple-app
🔇 Additional comments (6)
content/run_example_workflow.py (3)

7-10: New imports for Redis queue implementation.

These imports are necessary to support the new job queuing functionality. The Redis connection, RQ queue, and monitoring function are essential for the asynchronous workflow execution.

🧰 Tools
🪛 Ruff (0.8.2)

10-10: threading imported but unused

Remove unused import: threading

(F401)


54-55: Status placeholder for workflow progress reporting.

Good addition - this empty placeholder will be used to display real-time progress updates from the queued workflow job.


60-62: Workflow execution now uses Redis queue instead of direct execution.

This change implements the core functionality of the PR by:

  1. Creating a Redis queue named 'mzml_workflow_run'
  2. Enqueueing the workflow job with necessary parameters
  3. Monitoring job progress and updating the UI

The implementation correctly passes st.session_state["workspace"] as a new parameter to the workflow function.
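
Pieced together from this summary, the enqueue-and-monitor flow might look roughly like the sketch below. Module paths follow the files listed in this review, and the parameter values are placeholders rather than the PR's exact code:

import streamlit as st
from redis import Redis
from rq import Queue

from src.mzmlfileworkflow import run_workflow
from src.workflow.mzmlfileworkflowstatus import monitor_mzml_workflow_job_status

params = {}             # workflow parameters gathered from the UI (placeholder)
result_dir = "results"  # placeholder result directory

queue = Queue("mzml_workflow_run", connection=Redis())

# Queue the workflow instead of running it inline and remember the job ID
# so the status can be re-attached after a Streamlit rerun.
job = queue.enqueue(run_workflow, params, result_dir, st.session_state["workspace"])
st.session_state["mzml_workflow_job_id"] = job.id

# Stream progress messages for this job into an empty placeholder.
status_placeholder = st.empty()
monitor_mzml_workflow_job_status(job, status_placeholder)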

src/workflow/mzmlfileworkflowstatus.py (3)

1-7: Appropriate imports for Redis integration and Streamlit components.

The imports cover all the requirements for this module:

  • Redis for storage
  • RQ job management
  • Time for sleep operations
  • Streamlit components for UI updates
  • Type annotations

8-11: Function to retrieve workflow progress logs from Redis.

This helper function correctly retrieves all logs for a specific job ID from Redis using a consistent key format. The function properly specifies a return type of List[bytes] for type safety.


13-18: Function to log workflow progress to Redis.

This function properly:

  1. Gets the current job context
  2. Constructs a consistent log key based on job ID
  3. Appends the message to the Redis list

This implementation allows for real-time progress tracking across distributed components.
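
Taken together with the previous comment, the helper pair might look roughly like this sketch; the Redis key format is illustrative, and only the behaviour described above is assumed:

from typing import List
from redis import Redis
from rq import get_current_job

redis_conn = Redis()

def log_workflow_progress(message: str) -> None:
    """Append a progress message to the Redis list keyed by the current job's ID."""
    job = get_current_job()
    if job is not None:
        redis_conn.rpush(f"workflow_logs:{job.id}", message)

def get_mzml_workflow_progress_logs(job_id: str) -> List[bytes]:
    """Return all progress messages recorded so far for the given job ID."""
    return redis_conn.lrange(f"workflow_logs:{job_id}", 0, -1)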

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🔭 Outside diff range comments (1)
src/common/common.py (1)

460-465: ⚠️ Potential issue

Add defensive check for mzml_workflow_job_id.

The added session state key parameter makes the download button unique per workflow job, which is good for preventing state conflicts. However, if mzml_workflow_job_id is not set in the session state (e.g., when running other workflows), this would raise a KeyError.

Consider adding a defensive check:

-            key=f"download_{st.session_state['mzml_workflow_job_id']}" 
+            key=f"download_{st.session_state.get('mzml_workflow_job_id', 'default')}"
🧹 Nitpick comments (7)
docs/installation.md (1)

32-34: Fix punctuation and clarify process requirements.

The installation instructions need punctuation fixes and additional context about why these processes are needed.

-1. Open a new terminal and run `redis-server` 
-2. On another terminal run `rq worker --with-scheduler -q mzml_workflow_run`
-3. Finally start the application `streamlit run app.py`
+1. Open a new terminal and run `redis-server` (provides the message queue backend).
+2. On another terminal run `rq worker --with-scheduler -q mzml_workflow_run` (processes workflow jobs asynchronously).
+3. Finally start the application with `streamlit run app.py`.
🧰 Tools
🪛 LanguageTool

[uncategorized] ~34-~34: A comma might be missing here.
Context: ...with-scheduler -q mzml_workflow_run` 3. Finally start the application `streamlit run ap...

(AI_EN_LECTOR_MISSING_PUNCTUATION_COMMA)


[uncategorized] ~34-~34: A period might be missing here.
Context: ...mzml_workflow_run` 3. Finally start the application `streamlit run app.py` ## Docker This...

(AI_EN_LECTOR_MISSING_PUNCTUATION_PERIOD)

src/mzmlfileworkflow.py (1)

30-30: Update function docstring to reflect the new parameter.

The function signature has been updated to include workspace_path, but the docstring hasn't been updated to document this parameter.

-def run_workflow(params, result_dir, workspace_path):
-    """Load each mzML file into pyOpenMS Experiment and get the number of spectra."""
+def run_workflow(params, result_dir, workspace_path):
+    """
+    Load each mzML file into pyOpenMS Experiment and get the number of spectra.
+    
+    Args:
+        params: Dictionary containing workflow parameters
+        result_dir: Directory to store results
+        workspace_path: Path to the workspace containing mzML files
+    """
content/run_example_workflow.py (3)

62-64: Consider handling connection errors to Redis.
If Redis is offline or misconfigured, queue.enqueue may fail. Handling exceptions would improve resilience.

 try:
     job = queue.enqueue(mzmlfileworkflow.run_workflow, params, result_dir, st.session_state["workspace"])
 except RedisError as e:
     st.error(f"Failed to enqueue job: {e}")

68-68: Use 'is None' or 'is not None' for None checks.
According to style guidelines (E711), using == None or != None is discouraged in favor of is None or is not None.

-if st.session_state['mzml_workflow_job_id'] != None:
+if st.session_state['mzml_workflow_job_id'] is not None:

-if st.session_state['mzml_workflow_job_id'] == None:
+if st.session_state['mzml_workflow_job_id'] is None:

-if st.session_state['mzml_workflow_job_id'] != None:
+if st.session_state['mzml_workflow_job_id'] is not None:

Also applies to: 75-75, 85-85

🧰 Tools
🪛 Ruff (0.8.2)

68-68: Comparison to None should be cond is not None

Replace with cond is not None

(E711)


72-73: Fix the spelling of “results_section_placholder”.
Renaming to results_section_placeholder improves consistency and clarity.

-results_section_placholder = st.empty()
-with results_section_placholder:
+results_section_placeholder = st.empty()
+with results_section_placeholder:
src/workflow/mzmlfileworkflowstatus.py (2)

49-49: Use 'is None' or 'is not None' when checking for None.
Following Python conventions (E711) improves readability and avoids subtle bugs.

-if st.session_state['mzml_workflow_job_id'] == None:
+if st.session_state['mzml_workflow_job_id'] is None:

-if log != None:
+if log is not None:

-if status_log != None:
+if status_log is not None:

Also applies to: 56-56, 60-60

🧰 Tools
🪛 Ruff (0.8.2)

49-49: Comparison to None should be cond is None

Replace with cond is None

(E711)


67-71: Include error details for failed jobs.
When the job is stopped, canceled, or fails, consider showing the error details (e.g., job.exc_info) to help debugging.

elif job.is_stopped or job.is_canceled or job.is_failed:
    workflow_status.update(label="Stopped!", expanded=False, state='error')
    with results_section_placholder:
        reset_workflow_state(result_dir)
+       if job.is_failed and job.exc_info:
+           st.error(f"Job failed with error:\n{job.exc_info}")
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 141acfe and b1a6f17.

📒 Files selected for processing (6)
  • content/run_example_workflow.py (2 hunks)
  • docs/installation.md (1 hunks)
  • src/common/common.py (1 hunks)
  • src/cpustats.py (1 hunks)
  • src/mzmlfileworkflow.py (3 hunks)
  • src/workflow/mzmlfileworkflowstatus.py (1 hunks)
🧰 Additional context used
🪛 GitHub Actions: Test workflow functions
src/cpustats.py

[error] 1-1: ModuleNotFoundError: No module named 'psutil'

🪛 LanguageTool
docs/installation.md

[uncategorized] ~34-~34: A comma might be missing here.
Context: ...with-scheduler -q mzml_workflow_run` 3. Finally start the application `streamlit run ap...

(AI_EN_LECTOR_MISSING_PUNCTUATION_COMMA)


[uncategorized] ~34-~34: A period might be missing here.
Context: ...mzml_workflow_run` 3. Finally start the application `streamlit run app.py` ## Docker This...

(AI_EN_LECTOR_MISSING_PUNCTUATION_PERIOD)

🪛 Ruff (0.8.2)
content/run_example_workflow.py

68-68: Comparison to None should be cond is not None

Replace with cond is not None

(E711)


75-75: Comparison to None should be cond is None

Replace with cond is None

(E711)


85-85: Comparison to None should be cond is not None

Replace with cond is not None

(E711)

src/workflow/mzmlfileworkflowstatus.py

49-49: Comparison to None should be cond is None

Replace with cond is None

(E711)


56-56: Comparison to None should be cond is not None

Replace with cond is not None

(E711)


60-60: Comparison to None should be cond is not None

Replace with cond is not None

(E711)

⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: build-simple-app
  • GitHub Check: build-openms
  • GitHub Check: build-full-app
🔇 Additional comments (4)
src/mzmlfileworkflow.py (2)

9-9: LGTM! Good separation of workflow status logging.

Importing dedicated logging functions from a separate module improves code organization and maintainability.


40-52:


✅ Verification successful

Verify backward compatibility with existing code.

The changes to use separate logging functions are good, but make sure this doesn't break existing code that might call this function directly (without going through the job queue).


🏁 Script executed:

#!/bin/bash
# Check for direct calls to run_workflow that might not provide the new workspace_path parameter
rg -A 2 -B 2 "run_workflow\(.*\)" --type py | grep -v "mzml_workflow_run" | grep -v "workspace_path"

Length of output: 763


Backward Compatibility Verified

Based on the search results, there are no direct calls to run_workflow that omit the new workspace_path parameter. The test file (test_gui.py) shows invocations of the workflow, and none of the searched contexts appear problematic. The updated logging functions are being used appropriately without evidence of breaking the direct API usage.

  • No occurrences of direct run_workflow calls lacking workspace_path were found.
  • Test invocations (e.g., in test_gui.py) seem to supply the required parameters.
  • The changes to separate the logging functions do not introduce backward compatibility issues.

Please continue monitoring tests to ensure that all direct API usages remain compatible.

content/run_example_workflow.py (2)

7-10: Good addition of RQ and Redis imports.
These new dependencies look correct and align with the PR objective of implementing a queuing mechanism.


54-58: Status placeholder looks appropriate.
This approach for showing workflow status and progress via a placeholder is clear and straightforward.

src/cpustats.py Outdated
Comment on lines 1 to 18
import psutil
import time
from streamlit.delta_generator import DeltaGenerator
import streamlit as st

def monitor_cpu_ram_stats(cpu_ram_stats_placeholder: DeltaGenerator):
    with cpu_ram_stats_placeholder:
        ram_progress = 1 - psutil.virtual_memory().available / psutil.virtual_memory().total
        cpu_progress = psutil.cpu_percent() / 100
        ram_stats_column, cpu_stats_column = st.columns(2)

        ram_stats_column.text(f"Ram ({int(ram_progress * 100)}%)")
        ram_stats_column.progress(ram_progress)

        cpu_stats_column.text(f"CPU ({int(cpu_progress * 100)}%)")
        cpu_stats_column.progress(cpu_progress)

        time.sleep(1)
@coderabbitai coderabbitai bot commented Mar 9, 2025

🛠️ Refactor suggestion

⚠️ Potential issue

Add missing dependency and improve CPU/RAM monitoring implementation.

The module requires psutil which is missing from the requirements (as indicated by pipeline failures). Additionally, the current implementation has some limitations:

  1. It uses time.sleep(1) which blocks the thread
  2. There's no mechanism to stop the monitoring

First, ensure psutil is added to your requirements:

#!/bin/bash
# Check if psutil is in requirements.txt or environment.yml
grep -E "psutil" requirements.txt environment.yml

Consider improving the implementation for better resource management:

 import psutil
 import time
 from streamlit.delta_generator import DeltaGenerator
 import streamlit as st
+from threading import Event, Thread

-def monitor_cpu_ram_stats(cpu_ram_stats_placeholder: DeltaGenerator):
+def monitor_cpu_ram_stats(cpu_ram_stats_placeholder: DeltaGenerator, stop_event: Event = None):
+    """
+    Monitor CPU and RAM usage and display in Streamlit.
+    
+    Args:
+        cpu_ram_stats_placeholder: Streamlit placeholder to display stats
+        stop_event: Optional event to signal stopping the monitoring
+    """
     with cpu_ram_stats_placeholder:
         ram_progress = 1 - psutil.virtual_memory().available / psutil.virtual_memory().total
         cpu_progress = psutil.cpu_percent() / 100
         ram_stats_column, cpu_stats_column = st.columns(2)

         ram_stats_column.text(f"Ram ({int(ram_progress * 100)}%)")
         ram_stats_column.progress(ram_progress)
         
         cpu_stats_column.text(f"CPU ({int(cpu_progress * 100)}%)")
         cpu_stats_column.progress(cpu_progress)
         
-        time.sleep(1)
+        # Only sleep if not being stopped
+        if stop_event is None or not stop_event.is_set():
+            time.sleep(1)

+def start_monitoring_thread(placeholder: DeltaGenerator):
+    """
+    Start CPU/RAM monitoring in a separate thread.
+    
+    Args:
+        placeholder: Streamlit placeholder to display stats
+        
+    Returns:
+        tuple: (thread, stop_event) - use stop_event.set() to stop monitoring
+    """
+    stop_event = Event()
+    
+    def monitor_loop():
+        while not stop_event.is_set():
+            monitor_cpu_ram_stats(placeholder, stop_event)
+    
+    thread = Thread(target=monitor_loop, daemon=True)
+    thread.start()
+    return thread, stop_event
🧰 Tools
🪛 GitHub Actions: Test workflow functions

[error] 1-1: ModuleNotFoundError: No module named 'psutil'

@subCode321 (Contributor, Author) replied:

But this won't allow access to the Streamlit placeholder in a separate thread context.

@coderabbitai coderabbitai bot replied:

@subCode321 You're absolutely right. Streamlit elements should only be accessed from the main thread. Threading would cause issues with the Streamlit execution model.

Here's a more Streamlit-friendly approach using session state to control monitoring:

import psutil
import streamlit as st
from streamlit.delta_generator import DeltaGenerator

def monitor_cpu_ram_stats(cpu_ram_stats_placeholder: DeltaGenerator):
    """
    Display current CPU and RAM stats in the provided placeholder.
    This function should be called repeatedly from the main Streamlit thread.
    """
    with cpu_ram_stats_placeholder:
        ram_progress = 1 - psutil.virtual_memory().available / psutil.virtual_memory().total
        cpu_progress = psutil.cpu_percent() / 100
        ram_stats_column, cpu_stats_column = st.columns(2)

        ram_stats_column.text(f"RAM ({int(ram_progress * 100)}%)")
        ram_stats_column.progress(ram_progress)
        
        cpu_stats_column.text(f"CPU ({int(cpu_progress * 100)}%)")
        cpu_stats_column.progress(cpu_progress)

Then in your main app, you can call it this way:

# In your main Streamlit app
if 'monitor_active' not in st.session_state:
    st.session_state.monitor_active = True

stats_placeholder = st.empty()

# This will be called on each rerun of your Streamlit app
if st.session_state.monitor_active:
    monitor_cpu_ram_stats(stats_placeholder)
    
# To control monitoring:
if st.button('Toggle Monitoring'):
    st.session_state.monitor_active = not st.session_state.monitor_active

This approach works with Streamlit's execution model rather than against it.


✏️ Learnings added
Learnt from: subCode321
PR: OpenMS/streamlit-template#160
File: src/cpustats.py:1-18
Timestamp: 2025-03-10T11:04:48.615Z
Learning: Streamlit elements and placeholders should only be accessed from the main thread. Using threading to manipulate Streamlit UI components will cause issues with Streamlit's execution model.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

@subCode321 (Contributor, Author) replied:

also added to requirements

@coderabbitai coderabbitai bot replied:

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if psutil is in requirements.txt and environment.yml
echo "Checking requirements.txt:"
grep -E "psutil" requirements.txt || echo "Not found in requirements.txt"

echo -e "\nChecking environment.yml:"
grep -E "psutil" environment.yml || echo "Not found in environment.yml"

Length of output: 234


@subCode321 Great! I've verified that psutil==7.0.0 is now properly included in both requirements.txt and environment.yml files.


✏️ Learnings added
Learnt from: subCode321
PR: OpenMS/streamlit-template#160
File: src/cpustats.py:1-18
Timestamp: 2025-03-10T11:08:48.576Z
Learning: psutil is required for CPU and RAM monitoring functionality in the streamlit-template project and needs to be included in both requirements.txt and environment.yml.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

Comment on lines 83 to 88
while True:
    # monitor if active job exists
    if st.session_state['mzml_workflow_job_id'] != None:
        monitor_mzml_workflow_job_status(mzml_workflow_status_placeholder, workflow_status, results_section_placholder, result_dir)
    monitor_cpu_ram_stats(cpu_ram_stats_placeholder)

@coderabbitai coderabbitai bot commented Mar 9, 2025

🛠️ Refactor suggestion

Avoid an infinite loop in a Streamlit app.
A while True: loop can block UI updates indefinitely and consume resources. Consider adding a break, a time-based check, or a different mechanism (e.g. st.experimental_singleton) so the user interface remains responsive.

 while True:
+    import time
     if st.session_state['mzml_workflow_job_id'] is not None:
         monitor_mzml_workflow_job_status(
            mzml_workflow_status_placeholder, workflow_status, results_section_placholder, result_dir
         )
     monitor_cpu_ram_stats(cpu_ram_stats_placeholder)
+    time.sleep(1)  # Sleep to prevent hogging CPU
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-while True:
-    # monitor if active job exists
-    if st.session_state['mzml_workflow_job_id'] != None:
-        monitor_mzml_workflow_job_status(mzml_workflow_status_placeholder, workflow_status, results_section_placholder, result_dir)
-    monitor_cpu_ram_stats(cpu_ram_stats_placeholder)
+while True:
+    import time
+    if st.session_state['mzml_workflow_job_id'] is not None:
+        monitor_mzml_workflow_job_status(
+            mzml_workflow_status_placeholder, workflow_status, results_section_placholder, result_dir
+        )
+    monitor_cpu_ram_stats(cpu_ram_stats_placeholder)
+    time.sleep(1)  # Sleep to prevent hogging CPU
🧰 Tools
🪛 Ruff (0.8.2)

85-85: Comparison to None should be cond is not None

Replace with cond is not None

(E711)

@subCode321 (Contributor, Author) replied:

I need this while loop to constantly monitor CPU stats. This is how Alphapept has done it as well.

@coderabbitai coderabbitai bot replied:

🧩 Analysis chain

Let me research more about how Streamlit handles this kind of monitoring approach:

Let me also look at the specific file that handles the monitoring functionality:

@subCode321 I understand you need continuous monitoring of CPU stats. While infinite loops in Streamlit can be tricky, there are several approaches that work for real-time monitoring:

  1. The current while True approach you're using is commonly used for real-time monitoring in Streamlit when combined with appropriate sleep intervals to prevent high CPU usage.

  2. Based on what I found, you may want to consider adding a small sleep interval (e.g., time.sleep(0.5)) in your loop to reduce CPU usage while still providing near real-time updates.

  3. If you want to follow the Streamlit recommended approach, you could use:

    • st.experimental_rerun() at the end of your script
    • The newer st.rerun() function (if using Streamlit 1.27.0+)
    • st.cache_resource with background threads for monitoring

That said, if your current approach works for your use case and is similar to what Alphapept uses, it's reasonable to continue with it. Just make sure the monitoring functions themselves incorporate appropriate delays to prevent excessive CPU usage.


🏁 Script executed:

#!/bin/bash
# Check if Alphapept is referenced anywhere in the codebase
rg -i "alphapept" --type-not binary

# Look for other instances of infinite loops in the codebase
rg "while\s+True" --type py

# Let's also look at how monitor_cpu_ram_stats is implemented
cat src/cpustats.py

Length of output: 920


🏁 Script executed:

#!/bin/bash
# Check the implementation of monitoring functions
cat src/workflow/mzmlfileworkflowstatus.py

Length of output: 2862


🌐 Web query:

How to implement real-time monitoring in Streamlit applications using infinite loops

💡 Result:

To implement real-time monitoring in Streamlit applications using infinite loops, developers typically use controlled refresh mechanisms combined with Streamlit's session state and UI components. Below are the key approaches and best practices:


Core Implementation Methods

  1. Controlled while Loop with st.rerun()
    Use time.sleep() and st.rerun() to simulate real-time updates without blocking the app:

    import time
    import streamlit as st
    
    def main():
        st.title("Real-Time Dashboard")
        while True:
            # Capture/process data (e.g., network packets, sensor readings)
            updated_data = fetch_live_data()  
            
            # Update visualizations
            st.line_chart(updated_data)
            time.sleep(5)  # Refresh every 5 seconds
            st.rerun()
    
    if __name__ == "__main__":
        main()

    This approach ensures the dashboard refreshes at fixed intervals[1][10].

  2. Streamlit Autorefresh Component
    For a cleaner solution, use the streamlit-autorefresh package to avoid manual loop management:

    from streamlit_autorefresh import st_autorefresh
    import streamlit as st
    
    # Auto-refresh every 2 seconds (2000 milliseconds)
    st_autorefresh(interval=2000, key="data_refresh")
    
    # Fetch and display live data
    st.metric("Active Users", get_live_user_count())
    st.line_chart(get_time_series_data())

    This method offloads refresh logic to a dedicated component[3].


Best Practices

  • Use st.empty() for Dynamic Updates
    Create placeholders to overwrite specific UI elements without rerunning the entire script:

    placeholder = st.empty()
    while True:
        with placeholder.container():
            st.metric("CPU Usage", get_cpu_usage())
            st.bar_chart(get_resource_stats())
        time.sleep(3)

    This minimizes performance overhead[10][16].

  • Leverage Session State
    Store persistent data (e.g., packet processors) in st.session_state:

    if 'processor' not in st.session_state:
        st.session_state.processor = initialize_data_processor()

    Ensures state retention across reruns[1].

  • Handle Privileges for Low-Level Operations
    For network/sensor data access, run Streamlit with administrative rights:

    sudo streamlit run dashboard.py  # Linux/macOS

Common Pitfalls & Solutions

Issue | Solution
Infinite loops blocking UI updates[2][5] | Use st_autorefresh instead of raw while True loops.
Memory leaks from unmanaged cache[11] | Clear unused cached data with @st.cache_resource(ttl=60).
Deployment failures due to infinite loops[8][16] | Test locally first; use Streamlit Cloud's resource limits.

Example: Network Traffic Dashboard

import time
import streamlit as st
from packet_sniffer import capture_packets  # Custom data source

def main():
    st.set_page_config(layout="wide")
    st.title("Live Network Traffic")
    
    # Initialize packet capture and remember when it started
    if 'sniffer' not in st.session_state:
        st.session_state.sniffer = capture_packets()
        st.session_state.start_time = time.time()  # needed for the duration metric below
    
    # Dashboard layout
    col1, col2 = st.columns(2)
    with col1:
        st.metric("Total Packets", len(st.session_state.sniffer.packets))
    with col2:
        st.metric("Capture Duration", f"{time.time() - st.session_state.start_time:.2f}s")
    
    # Auto-refresh logic
    time.sleep(2)
    st.rerun()

if __name__ == "__main__":
    main()

(Adapted from [1][10])


For production use cases, prioritize the streamlit-autorefresh component or Streamlit's native st.rerun() with controlled intervals. Avoid raw infinite loops unless absolutely necessary, as they can cause memory issues and deployment failures[5][8].


@subCode321 I see that you need the while True loop for continuous CPU stats monitoring. The implementation looks reasonable and I notice:

  1. Your monitor_cpu_ram_stats() function in src/cpustats.py already includes a time.sleep(1) at the end, which prevents excessive CPU usage.

  2. This approach of using infinite loops for monitoring appears elsewhere in your codebase (in src/run_subprocess.py).

  3. While Streamlit apps typically avoid infinite loops, they are appropriate for real-time monitoring when they include proper sleep intervals - which yours does.

This implementation is valid for your use case. Since you've mentioned Alphapept uses a similar approach, and your monitoring functions already include appropriate pauses, I agree with keeping the current implementation.


✏️ Learnings added
Learnt from: subCode321
PR: OpenMS/streamlit-template#160
File: content/run_example_workflow.py:83-88
Timestamp: 2025-03-10T11:09:56.311Z
Learning: In the OpenMS streamlit-template project, continuous real-time monitoring of CPU stats and workflow jobs is implemented using a while True loop, which is considered acceptable for this specific use case as confirmed by the maintainer.

Learnt from: subCode321
PR: OpenMS/streamlit-template#160
File: content/run_example_workflow.py:83-88
Timestamp: 2025-03-10T11:09:56.311Z
Learning: In the OpenMS streamlit-template project, continuous real-time monitoring of CPU stats and workflow jobs is implemented using a while True loop with a time.sleep(1) in the monitor_cpu_ram_stats function, which is considered acceptable for this specific use case as confirmed by the maintainer.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

@subCode321
Copy link
Contributor Author

@t0mdavid-m I've implemented this end to end. Can I get a review on it, please?

@t0mdavid-m t0mdavid-m self-requested a review March 11, 2025 14:28
@t0mdavid-m t0mdavid-m linked an issue Mar 11, 2025 that may be closed by this pull request
@subCode321
Copy link
Contributor Author

Just a heads-up: since there's a continuous while loop that runs to check the CPU stats, test_gui.py is facing timeouts because the code blocks forever.

Sharing video of working code:

Streamlit.Scheduler.mp4

@t0mdavid-m
Copy link
Member

Just a heads-up: since there's a continuous while loop that runs to check the CPU stats, test_gui.py is facing timeouts because the code blocks forever.

Sharing video of working code:
Streamlit.Scheduler.mp4

Thank you for the contribution! I will get back to you regarding the scheduler ASAP. For checking the CPU stats, please do not use a while loop. I think st.fragment() might be useful here, as fragments can be configured to refresh at regular intervals.
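
For reference, a minimal sketch of that suggestion, assuming a Streamlit version where st.fragment supports run_every (the function name and interval are illustrative):

    import psutil
    import streamlit as st

    @st.fragment(run_every=2)  # rerun only this fragment every 2 seconds
    def cpu_ram_stats() -> None:
        st.metric("CPU usage", f"{psutil.cpu_percent():.1f} %")
        st.metric("RAM usage", f"{psutil.virtual_memory().percent:.1f} %")

    cpu_ram_stats()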

@jcharkow
Copy link
Contributor

Please separate out the hardware-utilization and the scheduler issues, as these are two different concepts that are not related to one another, and mixing them makes the PR more confusing. The easiest thing to do will be to remove the hardware part from this PR and just keep the scheduler code, so that we can merge one of the features even if the other is not ready.

For the hardware utilization

  1. Please put it in an expandable tab in the sidebar (st.expander) so that it is visible on any page (similar to where the workspace is displayed).

@t0mdavid-m will get back to you on next steps for the scheduler.
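
A minimal sketch of the sidebar placement suggested above (assuming psutil; the label and layout are illustrative):

    import psutil
    import streamlit as st

    # Rendered from a shared layout helper so the panel shows up on every page,
    # similar to where the workspace selector is displayed.
    with st.sidebar:
        with st.expander("Hardware utilization"):
            st.metric("CPU usage", f"{psutil.cpu_percent():.1f} %")
            st.metric("RAM usage", f"{psutil.virtual_memory().percent:.1f} %")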

@subCode321
Copy link
Contributor Author

Please separate out the hardware-utilization and the scheduler issues, as these are two different concepts that are not related to one another, and mixing them makes the PR more confusing. The easiest thing to do will be to remove the hardware part from this PR and just keep the scheduler code, so that we can merge one of the features even if the other is not ready.

For the hardware utilization

  1. Please put it in an expandable tab in the sidebar (st.expander) so that it is visible on any page (similar to where the workspace is displayed).

@t0mdavid-m will get back to you on next steps for the scheduler.

Sure, I will do that.

@subCode321
Copy link
Contributor Author

Just a heads-up: since there's a continuous while loop that runs to check the CPU stats, test_gui.py is facing timeouts because the code blocks forever.
Sharing video of working code:
Streamlit.Scheduler.mp4

Thank you for the contribution! I will get back to you regarding the scheduler ASAP. For checking the CPU stats, please do not use a while loop. I think st.fragment() might be useful here, as fragments can be configured to refresh at regular intervals.

Thank you so much for the suggestion. I will do that!

@subCode321
Copy link
Contributor Author

@jcharkow I've split the CPU visualizer out into this PR --> #175

Copy link

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
docs/installation.md (1)

32-34: Clarify and Enhance Installation Steps Formatting

The added instructions clearly outline the new process to run the app with Redis and RQ. To improve readability and meet best practices, consider revising the punctuation. For instance, adding a comma after "Finally" and a period at the end of line 34 would address the static analysis hints and ensure consistency in the list items.

Example diff:

-3. Finally start the application `streamlit run app.py`
+3. Finally, start the application `streamlit run app.py`.
🧰 Tools
🪛 LanguageTool

[uncategorized] ~34-~34: A comma might be missing here.
Context: ...with-scheduler -q mzml_workflow_run3. Finally start the applicationstreamlit run ap...

(AI_EN_LECTOR_MISSING_PUNCTUATION_COMMA)


[uncategorized] ~34-~34: A period might be missing here.
Context: ...mzml_workflow_run3. Finally start the applicationstreamlit run app.py` ## Docker This...

(AI_EN_LECTOR_MISSING_PUNCTUATION_PERIOD)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8ebd5c5 and a5627fb.

📒 Files selected for processing (1)
  • docs/installation.md (1 hunks)
🧰 Additional context used
🪛 LanguageTool
docs/installation.md

[uncategorized] ~34-~34: A comma might be missing here.
Context: ...with-scheduler -q mzml_workflow_run3. Finally start the applicationstreamlit run ap...

(AI_EN_LECTOR_MISSING_PUNCTUATION_COMMA)


[uncategorized] ~34-~34: A period might be missing here.
Context: ...mzml_workflow_run3. Finally start the applicationstreamlit run app.py` ## Docker This...

(AI_EN_LECTOR_MISSING_PUNCTUATION_PERIOD)

⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: build-full-app
  • GitHub Check: build-openms
  • GitHub Check: build-simple-app

@subCode321
Copy link
Contributor Author

I've separated out both PRs.

Copy link
Member

@t0mdavid-m t0mdavid-m left a comment


Thank you for your contribution! Sorry for taking a while to get back to you but I had to think about the path forward a bit.

I think your solution is good, but at the moment it requires manually implementing the queuing in each workflow. While this will always be the case for simple pyOpenMS-based workflows (like your exemplary implementation), these are usually not very compute-intensive. It would be more important to implement the queue for workflows involving the WorkflowManager class.

Could you implement the queue at a higher level for the methods run_multiple_commands(), run_command(), run_topp(), and run_python()? This should enable the queuing to simply be merged into existing workflows and cover the most computationally intensive cases.

@subCode321
Copy link
Contributor Author

Thank you for your contribution! Sorry for taking a while to get back to you but I had to think about the path forward a bit.

I think your solution is good, but at the moment it requires manually implementing the queuing in each workflow. While this will always be the case for simple pyOpenMS-based workflows (like your exemplary implementation), these are usually not very compute-intensive. It would be more important to implement the queue for workflows involving the WorkflowManager class.

Could you implement the queue at a higher level for the methods run_multiple_commands(), run_command(), run_topp(), and run_python()? This should enable the queuing to simply be merged into existing workflows and cover the most computationally intensive cases.

Sure, I'll add this

@subCode321
Copy link
Contributor Author

subCode321 commented Mar 20, 2025

I've resumed work on this, but I'm unable to test the TOPP workflow.

image

@t0mdavid-m
Copy link
Member

The TOPP framework looks for the TOPP tools. Installing OpenMS from here should fix the issue.

@subCode321
Copy link
Contributor Author

The TOPP framework looks for the TOPP tools. Installing OpenMS from here should fix the issue.

Sure, thanks. Also, should I remove the current changes to the pyOpenMS workflow? And since the pyOpenMS workflow doesn't use run_command or run_multiple_commands, how do we enable the scheduler for this simple workflow?

@subCode321 subCode321 requested a review from t0mdavid-m March 20, 2025 19:36
Copy link

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
environment.yml (1)

18-20: Add new dependencies for job queue implementation.

The addition of redis and rq dependencies is necessary for implementing the job queue functionality. These packages will enable the application to queue jobs and process them asynchronously.

One minor issue:

 - rq==2.1.0
+

Add a newline at the end of the file to comply with standard file formatting conventions.

🧰 Tools
🪛 YAMLlint (1.35.1)

[error] 20-20: no new line character at the end of file

(new-line-at-end-of-file)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ccdf34a and 0fcb925.

📒 Files selected for processing (10)
  • .github/workflows/build-windows-executable-app.yaml (1 hunks)
  • Dockerfile (1 hunks)
  • Dockerfile_simple (1 hunks)
  • docs/installation.md (1 hunks)
  • environment.yml (1 hunks)
  • requirements.txt (1 hunks)
  • src/Workflow.py (2 hunks)
  • src/workflow/CommandExecutor.py (1 hunks)
  • src/workflow/WorkflowManager.py (2 hunks)
  • test_gui.py (0 hunks)
💤 Files with no reviewable changes (1)
  • test_gui.py
🚧 Files skipped from review as they are similar to previous changes (5)
  • requirements.txt
  • docs/installation.md
  • Dockerfile
  • Dockerfile_simple
  • .github/workflows/build-windows-executable-app.yaml
🧰 Additional context used
🧬 Code Definitions (1)
src/Workflow.py (1)
src/workflow/CommandExecutor.py (2)
  • run_topp (110-196)
  • run_python (214-269)
🪛 YAMLlint (1.35.1)
environment.yml

[error] 20-20: no new line character at the end of file

(new-line-at-end-of-file)

⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: build-simple-app
  • GitHub Check: build-full-app
  • GitHub Check: build-openms
🔇 Additional comments (6)
src/workflow/CommandExecutor.py (1)

93-94: Good defensive programming with file existence check.

Adding a check for file existence before attempting to unlink is a good defensive programming practice. This prevents potential exceptions that could occur if the PID file is already removed or doesn't exist when a job is processed from the queue.
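
For illustration, the guarded pattern looks roughly like this (a generic sketch with a hypothetical path, not the exact CommandExecutor code):

    from pathlib import Path

    pid_path = Path("pids") / "job.pid"  # hypothetical location

    # Guarded removal: only unlink when the file is still there.
    if pid_path.exists():
        pid_path.unlink()

    # On Python 3.8+ the same intent fits in one call:
    pid_path.unlink(missing_ok=True)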

src/Workflow.py (4)

72-76: Good implementation of job queuing for feature detection.

The code now uses a queue to schedule the FeatureFinderMetabo task instead of running it synchronously. This enables better resource management and parallelization.


86-91: Effective task dependency management with queuing.

Good implementation of task dependencies using the depends_on parameter. This ensures that the feature linking step only runs after feature detection is complete, maintaining the correct workflow sequence while leveraging asynchronous processing.
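
A condensed sketch of the chaining being described, assuming an rq Queue backed by a local Redis instance (the job functions are placeholders, not the actual workflow steps):

    from redis import Redis
    from rq import Queue

    def find_features(mzml_file: str) -> None:   # stand-in for the FeatureFinderMetabo step
        ...

    def link_features() -> None:                 # stand-in for the FeatureLinkerUnlabeledKD step
        ...

    queue = Queue(connection=Redis())

    ffm_job = queue.enqueue(find_features, "sample.mzML")
    # Runs only after feature detection has finished successfully.
    link_job = queue.enqueue(link_features, depends_on=ffm_job)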


93-98: Consistent approach to task queuing.

The export task is properly queued with dependency on the feature linking job, maintaining the logical sequence of the workflow. This implementation ensures that tasks are executed in the correct order.


102-102:

Details

✅ Verification successful

Missing dependency for optional Python script execution.

The optional Python script execution is queued, but it doesn't have any dependencies specified. If this script can run independently of the other tasks, this is fine. However, if there should be specific ordering, consider adding a dependency.


🏁 Script executed:

#!/bin/bash
# Check if the example Python script needs to depend on other tasks

# Examine the example.py script to see if it uses outputs from other tasks
cat src/python-tools/example.py

Length of output: 2039


Optional Python Script Execution Dependency:

After verifying the content of src/python-tools/example.py, it appears that the script is designed to run independently. It accepts parameters via a JSON file when provided, and there’s no inherent dependency on outputs from other tasks. The enqueued execution in src/Workflow.py (line 102) correctly supplies the "in" parameter (using in_mzML), which aligns with the expectations of the script.

As the script does not currently demand an ordering dependency, no immediate changes are required. However, if future updates introduce reliance on outputs from prior tasks, please consider explicitly defining those dependencies.

src/workflow/WorkflowManager.py (1)

11-12: Initialize Redis Queue for job scheduling.

Good implementation of Redis Queue for job scheduling. This enables asynchronous processing of workflow tasks and better resource management.

Also applies to: 18-18
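
A minimal sketch of what that initialization could look like, assuming the queue name matches the worker started with rq worker --with-scheduler -q mzml_workflow_run (connection details are illustrative):

    from redis import Redis
    from rq import Queue

    redis_conn = Redis()  # defaults to localhost:6379
    queue = Queue("mzml_workflow_run", connection=redis_conn)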

@subCode321
Copy link
Contributor Author

Thank you for your contribution! Sorry for taking a while to get back to you but I had to think about the path forward a bit.

I think your solution is good, but at the moment it requires manually implementing the queuing in each workflow. While this will always be the case for simple pyOpenMS-based workflows (like your exemplary implementation), these are usually not very compute-intensive. It would be more important to implement the queue for workflows involving the WorkflowManager class.

Could you implement the queue at a higher level for the methods run_multiple_commands(), run_command(), run_topp(), and run_python()? This should enable the queuing to simply be merged into existing workflows and cover the most computationally intensive cases.

@t0mdavid-m I've implemented this and the PR is now ready for review.

Comment on lines +261 to 262
echo '@echo off' > ${{ env.APP_NAME }}.bat
Copy link
Contributor Author


nit: remove extra space

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
.github/workflows/build-windows-executable-app.yaml (2)

261-261: Remove Trailing Whitespace on Line 261

There is extra trailing whitespace on this line. While it might seem insignificant, cleaning it up helps avoid static analysis warnings and maintains a clean file style.

🧰 Tools
🪛 YAMLlint (1.35.1)

[error] 261-261: trailing spaces

(trailing-spaces)


270-270: Remove Trailing Whitespace on Line 270

This line also contains trailing spaces. Removing them ensures the file adheres to clean coding standards.

🧰 Tools
🪛 YAMLlint (1.35.1)

[error] 270-270: trailing spaces

(trailing-spaces)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 05cfa3d and b4733c1.

📒 Files selected for processing (1)
  • .github/workflows/build-windows-executable-app.yaml (2 hunks)
🧰 Additional context used
🪛 YAMLlint (1.35.1)
.github/workflows/build-windows-executable-app.yaml

[error] 261-261: trailing spaces

(trailing-spaces)


[error] 270-270: trailing spaces

(trailing-spaces)

⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: build-simple-app
  • GitHub Check: build-full-app

Comment on lines +270 to 272
echo 'start /min .\python-${{ env.PYTHON_VERSION }}\python -m redis-server' > ${{ env.APP_NAME }}.bat
echo 'start /min .\python-${{ env.PYTHON_VERSION }}\python -m rq worker --with-scheduler' >> ${{ env.APP_NAME }}.bat
echo 'start /min .\python-${{ env.PYTHON_VERSION }}\python -m streamlit run app.py local' >> ${{ env.APP_NAME }}.bat
Copy link


⚠️ Potential issue

Prevent Accidental File Overwrite with Correct Redirection

In the updated batch file section, the command on line 270 uses the single redirection operator (>). This causes the file to be overwritten—erasing previously appended commands such as the initial setup instructions. To ensure that all commands remain in the resulting batch file, change the redirection operator from > to >>.

Below is the suggested fix:

-echo 'start /min .\python-${{ env.PYTHON_VERSION }}\python -m redis-server' > ${{ env.APP_NAME }}.bat 
+echo 'start /min .\python-${{ env.PYTHON_VERSION }}\python -m redis-server' >> ${{ env.APP_NAME }}.bat 
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
echo 'start /min .\python-${{ env.PYTHON_VERSION }}\python -m redis-server' > ${{ env.APP_NAME }}.bat
echo 'start /min .\python-${{ env.PYTHON_VERSION }}\python -m rq worker --with-scheduler' >> ${{ env.APP_NAME }}.bat
echo 'start /min .\python-${{ env.PYTHON_VERSION }}\python -m streamlit run app.py local' >> ${{ env.APP_NAME }}.bat
echo 'start /min .\python-${{ env.PYTHON_VERSION }}\python -m redis-server' >> ${{ env.APP_NAME }}.bat
echo 'start /min .\python-${{ env.PYTHON_VERSION }}\python -m rq worker --with-scheduler' >> ${{ env.APP_NAME }}.bat
echo 'start /min .\python-${{ env.PYTHON_VERSION }}\python -m streamlit run app.py local' >> ${{ env.APP_NAME }}.bat
🧰 Tools
🪛 YAMLlint (1.35.1)

[error] 270-270: trailing spaces

(trailing-spaces)

@subCode321
Copy link
Contributor Author

@t0mdavid-m does the change look good to you?

Copy link
Member

@t0mdavid-m t0mdavid-m left a comment


Thanks again for the changes - everything looks good in principle! One suggestion: could you encapsulate the queuing logic within the WorkflowManager class so that methods like run_topp() automatically handle adding the tool to the queue internally? This would simplify usage and keep the queuing mechanism abstracted away from the user.
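
One possible shape for that encapsulation, shown as a rough sketch (assuming an rq Queue and a module-level worker function; the class, method signatures, and queue name are illustrative, not the existing WorkflowManager API):

    import subprocess

    from redis import Redis
    from rq import Queue

    def execute_topp_tool(tool: str, args: list) -> int:
        # Worker-side stand-in: the queued job simply runs the TOPP tool as a subprocess.
        return subprocess.run([tool, *args], check=True).returncode

    class QueuedWorkflowManager:
        def __init__(self, queue_name: str = "mzml_workflow_run") -> None:
            self.queue = Queue(queue_name, connection=Redis())
            self.last_job = None  # remember the previous job so steps stay ordered

        def run_topp(self, tool: str, args: list):
            # Callers keep using run_topp() as before; enqueueing happens internally,
            # chained on the previous job so the workflow order is preserved.
            self.last_job = self.queue.enqueue(
                execute_topp_tool, args=(tool, args), depends_on=self.last_job
            )
            return self.last_job

With something along these lines, existing workflows could keep calling run_topp() unchanged while the queuing stays hidden behind the manager.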

@subCode321
Copy link
Contributor Author

Thanks again for the changes - everything looks good in principle! One suggestion: could you encapsulate the queuing logic within the WorkflowManager class so that methods like run_topp() automatically handle adding the tool to the queue internally? This would simplify usage and keep the queuing mechanism abstracted away from the user.

Sure

