Use rq worker to schedule workflow runs #160
Conversation
Walkthrough

The changes update dependency files.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Container
    participant EntryPoint
    participant RedisServer
    participant RQWorker
    participant StreamlitApp
    Container->>EntryPoint: Start container
    EntryPoint->>RedisServer: Execute "redis-server"
    EntryPoint->>RQWorker: Execute "rq worker --with-scheduler"
    EntryPoint->>StreamlitApp: Launch "streamlit run app.py"
```
```mermaid
sequenceDiagram
    participant WM as WorkflowManager
    participant Q as Job Queue
    participant W as RQ Worker
    participant E as Task Executor
    WM->>Q: Enqueue FeatureFinderMetabo job
    Q->>W: Process FeatureFinderMetabo job
    W->>E: Execute FeatureFinderMetabo task
    E-->>Q: On completion, trigger FeatureLinkerUnlabeledKD job
    Q->>W: Process FeatureLinkerUnlabeledKD job
    W->>E: Execute FeatureLinkerUnlabeledKD task
    E-->>Q: On completion, trigger export and final tasks
```
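In RQ terms, this chaining is expressed with the `depends_on` argument to `enqueue`. A minimal sketch, assuming placeholder task functions rather than the PR's actual code:

```python
from redis import Redis
from rq import Queue

# Placeholder tasks standing in for the workflow steps in the diagram.
def feature_finder_metabo(): ...
def feature_linker_unlabeled_kd(): ...
def export_results(): ...

queue = Queue("mzml_workflow_run", connection=Redis())

# Each job stays deferred until its dependency finishes successfully.
ffm_job = queue.enqueue(feature_finder_metabo)
linker_job = queue.enqueue(feature_linker_unlabeled_kd, depends_on=ffm_job)
export_job = queue.enqueue(export_results, depends_on=linker_job)
```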
Actionable comments posted: 0
🧹 Nitpick comments (7)
requirements.txt (1)
10-12: Consider pinning `streamlit-js-eval`.

While `redis` and `rq` are pinned to specific versions, `streamlit-js-eval` is not. Pinning all packages ensures consistent builds and reduces the risk of unintended breakage or dependency conflicts.

content/run_example_workflow.py (3)
7-9: Make Redis connection configurable.

Redis is being initialized with default parameters, which might be fine for local development. However, for production or different deployment environments, consider externalizing the host/port or retrieving them from environment variables.
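A minimal sketch of that externalization, with hypothetical helper and environment variable names:

```python
import os

from redis import Redis


def get_redis_connection() -> Redis:
    # Fall back to local defaults when the environment does not override them.
    return Redis(
        host=os.environ.get("REDIS_HOST", "localhost"),
        port=int(os.environ.get("REDIS_PORT", "6379")),
    )
```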
53-55: Reuse the Redis connection or handle exceptions.

Creating a new Redis connection each time this function is called can lead to excessive overhead or connection issues. Consider reusing a single connection (e.g., a global instance) or adding error handling if the job ID is invalid or Redis is unavailable.
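One possible way to get a single shared client without a module-level global, relying on redis-py's internal connection pooling (function name is illustrative):

```python
from functools import lru_cache

from redis import Redis


@lru_cache(maxsize=1)
def shared_redis() -> Redis:
    # Constructed once per process and reused on every subsequent call.
    return Redis()
```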
60-80: Handle additional job states.

Currently, the loop checks if the job is finished but doesn't handle failures separately. Consider checking for `job.is_failed` to correctly update the status if a failure occurs.

```python
while not (job.is_finished or job.is_failed):
    logs = get_workflow_progress_logs(job.id)
    ...
if job.is_failed:
    status.update(label="Workflow failed!", expanded=False, state='error')
elif job.is_finished:
    status.update(label="Workflow complete!", expanded=False, state='complete')
```

src/mzmlfileworkflow.py (3)
9-10: Use environment-based Redis configuration.

For production, it's often preferable to configure Redis connections (host, port, credentials) via environment variables or a separate configuration file rather than relying on defaults.
31-36: Handle cases when no job is available.

`get_current_job()` may return `None` if run outside an RQ worker context. Also, creating a new Redis connection every time can be costly. Consider reusing a single connection or shared connection pool for better performance.
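A guarded variant of the logging helper might look like this (the Redis key format is an assumption for illustration):

```python
import redis
from rq import get_current_job

r = redis.Redis()


def log_workflow_progress(message: str) -> None:
    job = get_current_job()
    if job is None:
        # Called outside an RQ worker (e.g., directly in tests): fall back to stdout.
        print(message)
        return
    r.rpush(f"workflow_logs:{job.id}", message)
```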
48-60: Add error handling for file load operations.

If an mzML file is missing or fails during loading, the loop continues without reporting the error. Wrap file loading in a try/except block to log relevant error details via Redis.

```python
try:
    num_spectra_val = mzML_file_get_num_spectra(
        str(Path(workspace_path, "mzML-files", file + ".mzML"))
    )
    num_spectra.append(num_spectra_val)
except Exception as e:
    log_workflow_progress(f"Error reading mzML file {file}: {e}")
    continue
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- content/run_example_workflow.py (2 hunks)
- requirements.txt (1 hunks)
- src/mzmlfileworkflow.py (3 hunks)
🔇 Additional comments (1)
src/mzmlfileworkflow.py (1)
38-38: Verify validity of `workspace_path`.

Ensure `workspace_path` exists and is accessible. Handling this scenario prevents unexpected runtime errors or missing file references.
Actionable comments posted: 3
🧹 Nitpick comments (1)
app.py (1)
12-27: Platform-specific handling for RQ worker looks good.

The implementation correctly differentiates between Windows and Unix-based systems for launching the RQ worker. For Windows, the `CREATE_NEW_PROCESS_GROUP` flag allows sending control signals later, while Unix systems use a different approach with `start_new_session=True`.

Consider extracting the queue name "mzml_workflow_run" into a constant at the top of the file to avoid repetition and make future changes easier.
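The pattern described above might look roughly like this sketch; the actual code in app.py may differ in details:

```python
import platform
import subprocess

QUEUE_NAME = "mzml_workflow_run"  # extracted into a constant, per the suggestion


def start_rq_worker() -> subprocess.Popen:
    cmd = ["rq", "worker", "--with-scheduler", "-q", QUEUE_NAME]
    if platform.system() == "Windows":
        # New process group so control signals can be sent to the worker later.
        return subprocess.Popen(cmd, creationflags=subprocess.CREATE_NEW_PROCESS_GROUP)
    # On Unix, detach the worker into its own session instead.
    return subprocess.Popen(cmd, start_new_session=True)
```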
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- app.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
app.py
10-10: pyopenms imported but unused
Remove unused import: pyopenms
(F401)
37-37: Do not use bare except
(E722)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: build-simple-app
- GitHub Check: build-full-app
- GitHub Check: build-openms
🔇 Additional comments (2)
app.py (2)
4-8: New imports correctly support the RQ worker functionality.

These imports are necessary for subprocess management, platform detection, and process cleanup.
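The comment presumably refers to imports along these lines (the exact set in app.py may differ):

```python
import atexit      # register cleanup handlers for spawned processes
import platform    # detect Windows vs. Unix for process-launch flags
import signal      # send termination signals on shutdown
import subprocess  # launch redis-server and the rq worker
```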
10-10: Details
❓ Verification inconclusive
Verify necessity of the pyopenms import.
The static analysis flags `pyopenms` as unused, but there's a comment suggesting it's necessary for Windows. Please verify if this import is still needed or document more clearly why it's required.
🏁 Script executed:

```bash
#!/bin/bash
# Check if pyopenms is used in the codebase
rg -i "pyopenms" --type py
```

Length of output: 884
Action: Clarify Usage of `pyopenms` in app.py

It appears that the import of `pyopenms` on line 10 in `app.py` isn't directly used in that file. However, given that other parts of the codebase import and use `pyopenms` (typically aliased as `poms`), there may be a side effect, perhaps related to Windows-specific initialization, that isn't immediately obvious. Please verify if this import is intentionally retained for such side effects. If it is required for Windows functionality or any implicit initialization, add a comment in the code documenting this rationale. Otherwise, consider removing the unused import to clean up the code.
🪛 Ruff (0.8.2)
10-10:
pyopenmsimported but unusedRemove unused import:
pyopenms(F401)
Force-pushed from 2e71dca to 89a84d7
Caution
Inline review comments failed to post. This is likely due to GitHub's limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.
Actionable comments posted: 2
🧹 Nitpick comments (3)
test_gui.py (3)
7-7: Remove unused import

`MagicMock` is imported but not used in the code. You should remove unused imports to keep the codebase clean.

```diff
-from unittest.mock import patch, MagicMock
+from unittest.mock import patch
```

🧰 Tools
🪛 Ruff (0.8.2)
7-7: `unittest.mock.MagicMock` imported but unused. Remove unused import: `unittest.mock.MagicMock` (F401)
138-146: Duplicate error handling logic

This is a duplicate of the logic at lines 112-120. Consider extracting this repeated logic into a helper function.

```diff
+def create_example_files(mzml_dir):
+    """Create example mzML files in the given directory."""
+    example_dir = Path("example-data", "mzML")
+    if example_dir.exists():
+        for f in example_dir.glob("*.mzML"):
+            shutil.copy(f, mzml_dir)
+    else:
+        # Create empty example files if directory doesn't exist
+        for name in ['Blank.mzML', 'Treatment.mzML', 'Pool.mzML', 'Control.mzML']:
+            Path(mzml_dir, name).touch()
```

Then use this function in both test cases.
156-160: Simplify nested with statements

Similar to the previous suggestion, these nested `with` statements could be simplified for better readability.

```diff
-    with patch('redis.Redis', return_value=MockRedis()):
-        with patch('rq.Queue', return_value=MockQueue()):
-            launch.button[1].click().run(timeout=60)
-            assert not launch.exception
+    with patch('redis.Redis', return_value=MockRedis()), \
+         patch('rq.Queue', return_value=MockQueue()):
+        launch.button[1].click().run(timeout=60)
+        assert not launch.exception
```

🧰 Tools
🪛 Ruff (0.8.2)
157-158: Use a single `with` statement with multiple contexts instead of nested `with` statements (SIM117)
🛑 Comments failed to post (2)
test_gui.py (2)
39-39: 💡 Verification agent
🧩 Analysis chain
Verify that the module has the correct function name
The repeated pipeline failure on this line indicates that `get_workflow_progress_logs` is not defined in `src.mzmlfileworkflow`. This is a critical issue that needs to be fixed.
🏁 Script executed:

```bash
#!/bin/bash
# Check if the correct function exists in the mzmlfileworkflow module
echo "Looking for workflow logging functions in mzmlfileworkflow:"
rg "def get_workflow" src/mzmlfileworkflow.py
echo "\nChecking imports in this module:"
rg "import|from" src/mzmlfileworkflow.py
```

Length of output: 693
Attention: Incorrect function reference in patch
The patch in `test_gui.py` is targeting `get_workflow_progress_logs` in `src/mzmlfileworkflow.py`, but our search did not reveal any function starting with "get_workflow", confirming that no such function is defined in the module. Please review whether:

- The intended function was misnamed in the patch.
- The module is missing the expected `get_workflow_progress_logs` implementation.

Adjust the test or implement the missing function to resolve the pipeline failure.
🧰 Tools
🪛 GitHub Actions: Test workflow functions
[error] 39-39: AttributeError: <module 'src.mzmlfileworkflow'> does not have the attribute 'get_workflow_progress_logs' (repeated for every test run)
34-40: ⚠️ Potential issue

Fix the missing module attribute and improve code structure

The pipeline is failing because `get_workflow_progress_logs` is not defined in the `src.mzmlfileworkflow` module. Also, the nested `with` statements could be simplified for better readability.

Fix the missing attribute:

```diff
-        with patch('src.mzmlfileworkflow.get_workflow_progress_logs', return_value=[b"Mock log entry"]):
+        with patch('src.mzmlfileworkflow.get_workflow_logs', return_value=[b"Mock log entry"]):
```

Or make sure the `get_workflow_progress_logs` function is properly defined in `src.mzmlfileworkflow`.

Also, you can simplify the nested `with` statements:

```diff
-    with patch('redis.Redis', return_value=MockRedis()):
-        with patch('rq.Queue', return_value=MockQueue()):
-            with patch('src.mzmlfileworkflow.get_workflow_progress_logs', return_value=[b"Mock log entry"]):
-                yield
+    with patch('redis.Redis', return_value=MockRedis()), \
+         patch('rq.Queue', return_value=MockQueue()), \
+         patch('src.mzmlfileworkflow.get_workflow_progress_logs', return_value=[b"Mock log entry"]):
+        yield
```

📝 Committable suggestion
‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
@pytest.fixture(autouse=True)
def mock_redis_rq():
    """Mock Redis and RQ components for all tests"""
    with patch('redis.Redis', return_value=MockRedis()), \
         patch('rq.Queue', return_value=MockQueue()), \
         patch('src.mzmlfileworkflow.get_workflow_logs', return_value=[b"Mock log entry"]):
        yield
```

🧰 Tools
🪛 Ruff (0.8.2)
37-38: Use a single `with` statement with multiple contexts instead of nested `with` statements (SIM117)
🪛 GitHub Actions: Test workflow functions
[error] 39-39: AttributeError: <module 'src.mzmlfileworkflow'> does not have the attribute 'get_workflow_progress_logs' (repeated for every test run)
Actionable comments posted: 2
🧹 Nitpick comments (1)
Dockerfile (1)
144-146: Consider adding error handling for Redis and RQ worker

The current implementation doesn't include any error handling for Redis and RQ worker startup, which could lead to silent failures.

Consider adding basic error checking:

```diff
-    echo "redis-server" >> /app/entrypoint.sh && \
-    echo "rq worker --with-scheduler -q mzml_workflow_run" >> /app/entrypoint.sh && \
+    echo "redis-server & REDIS_PID=\$!" >> /app/entrypoint.sh && \
+    echo "sleep 2" >> /app/entrypoint.sh && \
+    echo "if ! kill -0 \$REDIS_PID 2>/dev/null; then echo 'Redis failed to start'; exit 1; fi" >> /app/entrypoint.sh && \
+    echo "rq worker --with-scheduler -q mzml_workflow_run & RQ_PID=\$!" >> /app/entrypoint.sh && \
+    echo "sleep 1" >> /app/entrypoint.sh && \
+    echo "if ! kill -0 \$RQ_PID 2>/dev/null; then echo 'RQ worker failed to start'; exit 1; fi" >> /app/entrypoint.sh && \
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- Dockerfile (1 hunks)
- Dockerfile_simple (1 hunks)
- environment.yml (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: build-openms
- GitHub Check: build-full-app
- GitHub Check: build-simple-app
🔇 Additional comments (2)
environment.yml (1)
18-19: Dependencies added for Redis and RQ integration

The addition of `redis` and `rq` packages supports the implementation of job queuing with the Redis Queue worker as mentioned in the PR objectives.

Dockerfile (1)
144-145: Consider running Redis and RQ worker in background

The Redis server and RQ worker are started sequentially in the entrypoint script without being backgrounded. This could prevent the Streamlit app from starting since each command will block until it completes.

Apply the same fix as suggested in the Dockerfile_simple review:

```diff
-    echo "redis-server" >> /app/entrypoint.sh && \
-    echo "rq worker --with-scheduler -q mzml_workflow_run" >> /app/entrypoint.sh && \
+    echo "redis-server &" >> /app/entrypoint.sh && \
+    echo "sleep 2 && rq worker --with-scheduler -q mzml_workflow_run &" >> /app/entrypoint.sh && \
```
Actionable comments posted: 1
🧹 Nitpick comments (1)
.github/workflows/build-windows-executable-app.yaml (1)
262-264: Review of Batch File Command Setup for Redis, RQ Worker, and Streamlit

The new commands correctly implement the change required by the PR by manually starting the Redis server and the RQ worker (with scheduler) before launching the Streamlit app. However, please note the following points:

- The third command (line 264) has an extra leading whitespace inside the quotes. This inconsistency might be unintentional and could be removed for uniformity.
- Since the PR description mentions that these processes will eventually be automated within `app.py`, it would be beneficial to add a TODO comment indicating that this manual batch file startup should be revisited once the automation is implemented.

Proposed Diff to Remove the Leading Whitespace:

```diff
-echo " start /min .\python-${{ env.PYTHON_VERSION }}\python -m streamlit run app.py local" >> ${{ env.APP_NAME }}.bat
+echo "start /min .\python-${{ env.PYTHON_VERSION }}\python -m streamlit run app.py local" >> ${{ env.APP_NAME }}.bat
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- .github/workflows/build-windows-executable-app.yaml (1 hunks)
- Dockerfile (1 hunks)
- Dockerfile_simple (1 hunks)
- test_gui.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- Dockerfile
- Dockerfile_simple
🧰 Additional context used
🪛 GitHub Actions: Test workflow functions
test_gui.py
[error] 108-108: AssertionError: assert not ElementList(_list=[Exception(message='Error 111 connecting to localhost:6379. Connection refused.')]) (repeated across test runs)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: build-openms
- GitHub Check: build-simple-app
- GitHub Check: build-full-app
Actionable comments posted: 0
🧹 Nitpick comments (2)
content/run_example_workflow.py (1)
10-10: Remove unused import.The
threadingmodule is imported but never used in this file.-import threading🧰 Tools
🪛 Ruff (0.8.2)
10-10: `threading` imported but unused. Remove unused import: `threading` (F401)
src/workflow/mzmlfileworkflowstatus.py (1)
21-39: Job status monitoring with Streamlit UI updates.

This function effectively bridges RQ job execution with Streamlit UI updates through:

- Creating a status indicator in the Streamlit UI
- Continuously polling Redis for new log messages
- Updating the UI in real-time as new messages arrive
- Properly finalizing the status when the job completes or fails

Consider these improvements:

- Simplify the nested `with` statements
- Add error handling for Redis connection issues
- Implement a timeout mechanism for stalled jobs

```diff
-def monitor_mzml_workflow_job_status(job: job.Job, streamlit_status_placeholder: DeltaGenerator):
-    with streamlit_status_placeholder.container():
-        with st.status("Workflow in progress...", expanded=True) as status:
+def monitor_mzml_workflow_job_status(job: job.Job, streamlit_status_placeholder: DeltaGenerator, timeout_seconds: int = 3600):
+    with streamlit_status_placeholder.container(), st.status("Workflow in progress...", expanded=True) as status:
         last_len = 0
+        start_time = time.time()
         while not job.is_finished:
-            logs = get_mzml_workflow_progress_logs(job.id)
-
-            if len(logs) > last_len:
-                # Read from the last read log and update the status message
-                for log in logs[last_len:]:
-                    status.text(log.decode('utf-8'))
-                last_len = len(logs)
+            try:
+                logs = get_mzml_workflow_progress_logs(job.id)
+
+                if len(logs) > last_len:
+                    # Read from the last read log and update the status message
+                    for log in logs[last_len:]:
+                        status.text(log.decode('utf-8'))
+                    last_len = len(logs)
+            except Exception as e:
+                status.text(f"Error retrieving logs: {str(e)}")
+
+            # Check for timeout
+            if time.time() - start_time > timeout_seconds:
+                status.update(label="Workflow timed out!", expanded=False, state='error')
+                break
             time.sleep(1)
         if job.is_finished:
             status.update(label="Workflow complete!", expanded=False, state='complete')
         else:
             status.update(label="Workflow stopped!", expanded=False, state='error')
```

🧰 Tools
🪛 Ruff (0.8.2)
22-23: Use a single `with` statement with multiple contexts instead of nested `with` statements (SIM117)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- content/run_example_workflow.py (2 hunks)
- src/mzmlfileworkflow.py (3 hunks)
- src/workflow/mzmlfileworkflowstatus.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/mzmlfileworkflow.py
🧰 Additional context used
🪛 Ruff (0.8.2)
content/run_example_workflow.py
10-10: threading imported but unused
Remove unused import: threading
(F401)
src/workflow/mzmlfileworkflowstatus.py
22-23: Use a single with statement with multiple contexts instead of nested with statements
(SIM117)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: build-full-app
- GitHub Check: build-openms
- GitHub Check: build-simple-app
🔇 Additional comments (6)
content/run_example_workflow.py (3)
7-10: New imports for Redis queue implementation.

These imports are necessary to support the new job queuing functionality. The Redis connection, RQ queue, and monitoring function are essential for the asynchronous workflow execution.
🧰 Tools
🪛 Ruff (0.8.2)
10-10: `threading` imported but unused. Remove unused import: `threading` (F401)
54-55: Status placeholder for workflow progress reporting.

Good addition - this empty placeholder will be used to display real-time progress updates from the queued workflow job.
60-62: Workflow execution now uses Redis queue instead of direct execution.

This change implements the core functionality of the PR by:
- Creating a Redis queue named 'mzml_workflow_run'
- Enqueueing the workflow job with necessary parameters
- Monitoring job progress and updating the UI
The implementation correctly passes `st.session_state["workspace"]` as a new parameter to the workflow function.

src/workflow/mzmlfileworkflowstatus.py (3)
1-7: Appropriate imports for Redis integration and Streamlit components.

The imports cover all the requirements for this module:
- Redis for storage
- RQ job management
- Time for sleep operations
- Streamlit components for UI updates
- Type annotations
8-11: Function to retrieve workflow progress logs from Redis.

This helper function correctly retrieves all logs for a specific job ID from Redis using a consistent key format. The function properly specifies a return type of `List[bytes]` for type safety.
13-18: Function to log workflow progress to Redis.

This function properly:
- Gets the current job context
- Constructs a consistent log key based on job ID
- Appends the message to the Redis list
This implementation allows for real-time progress tracking across distributed components.
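The retrieval side described in the 8-11 comment plausibly pairs with the writer like this (key format assumed to match):

```python
from typing import List

import redis

r = redis.Redis()


def get_mzml_workflow_progress_logs(job_id: str) -> List[bytes]:
    # LRANGE over the whole list returns every stored message as raw bytes,
    # matching the List[bytes] return type noted above.
    return r.lrange(f"workflow_logs:{job_id}", 0, -1)
```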
Actionable comments posted: 2
🔭 Outside diff range comments (1)
src/common/common.py (1)
460-465: ⚠️ Potential issue

Add defensive check for mzml_workflow_job_id.

The added session state key parameter makes the download button unique per workflow job, which is good for preventing state conflicts. However, if `mzml_workflow_job_id` is not set in the session state (e.g., when running other workflows), this would raise a KeyError.

Consider adding a defensive check:

```diff
-    key=f"download_{st.session_state['mzml_workflow_job_id']}"
+    key=f"download_{st.session_state.get('mzml_workflow_job_id', 'default')}"
```
🧹 Nitpick comments (7)
docs/installation.md (1)
32-34: Fix punctuation and clarify process requirements.

The installation instructions need punctuation fixes and additional context about why these processes are needed.

```diff
-1. Open a new terminal and run `redis-server`
-2. On another terminal run `rq worker --with-scheduler -q mzml_workflow_run`
-3. Finally start the application `streamlit run app.py`
+1. Open a new terminal and run `redis-server` (provides the message queue backend).
+2. On another terminal run `rq worker --with-scheduler -q mzml_workflow_run` (processes workflow jobs asynchronously).
+3. Finally start the application with `streamlit run app.py`.
```

🧰 Tools
🪛 LanguageTool
[uncategorized] ~34-~34: A comma might be missing here. (AI_EN_LECTOR_MISSING_PUNCTUATION_COMMA)
[uncategorized] ~34-~34: A period might be missing here. (AI_EN_LECTOR_MISSING_PUNCTUATION_PERIOD)
src/mzmlfileworkflow.py (1)
30-30: Update function docstring to reflect the new parameter.

The function signature has been updated to include `workspace_path`, but the docstring hasn't been updated to document this parameter.

```diff
 def run_workflow(params, result_dir, workspace_path):
-    """Load each mzML file into pyOpenMS Experiment and get the number of spectra."""
+    """
+    Load each mzML file into pyOpenMS Experiment and get the number of spectra.
+
+    Args:
+        params: Dictionary containing workflow parameters
+        result_dir: Directory to store results
+        workspace_path: Path to the workspace containing mzML files
+    """
```

content/run_example_workflow.py (3)
62-64: Consider handling connection errors to Redis.

If Redis is offline or misconfigured, `queue.enqueue` may fail. Handling exceptions would improve resilience.

```python
try:
    job = queue.enqueue(mzmlfileworkflow.run_workflow, params, result_dir, st.session_state["workspace"])
except RedisError as e:
    st.error(f"Failed to enqueue job: {e}")
```
68-68: Use 'is None' or 'is not None' for None checks.

According to style guidelines (E711), using `== None` or `!= None` is discouraged in favor of `is None` or `is not None`.

```diff
-if st.session_state['mzml_workflow_job_id'] != None:
+if st.session_state['mzml_workflow_job_id'] is not None:

-if st.session_state['mzml_workflow_job_id'] == None:
+if st.session_state['mzml_workflow_job_id'] is None:

-if st.session_state['mzml_workflow_job_id'] != None:
+if st.session_state['mzml_workflow_job_id'] is not None:
```

Also applies to: 75-75, 85-85
🧰 Tools
🪛 Ruff (0.8.2)
68-68: Comparison to `None` should be `cond is not None`. Replace with `cond is not None` (E711)
72-73: Fix the spelling of "results_section_placholder".

Renaming to `results_section_placeholder` improves consistency and clarity.

```diff
-results_section_placholder = st.empty()
-with results_section_placholder:
+results_section_placeholder = st.empty()
+with results_section_placeholder:
```
src/workflow/mzmlfileworkflowstatus.py (2)

49-49: Use 'is None' or 'is not None' when checking for None.

Following Python conventions (E711) improves readability and avoids subtle bugs.

```diff
-if st.session_state['mzml_workflow_job_id'] == None:
+if st.session_state['mzml_workflow_job_id'] is None:

-if log != None:
+if log is not None:

-if status_log != None:
+if status_log is not None:
```

Also applies to: 56-56, 60-60
🧰 Tools
🪛 Ruff (0.8.2)
49-49: Comparison to `None` should be `cond is None`. Replace with `cond is None` (E711)
67-71: Include error details for failed jobs.

When the job is stopped, canceled, or fails, consider showing the error details (e.g., `job.exc_info`) to help debugging.

```diff
 elif job.is_stopped or job.is_canceled or job.is_failed:
     workflow_status.update(label="Stopped!", expanded=False, state='error')
     with results_section_placholder:
         reset_workflow_state(result_dir)
+    if job.is_failed and job.exc_info:
+        st.error(f"Job failed with error:\n{job.exc_info}")
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- content/run_example_workflow.py (2 hunks)
- docs/installation.md (1 hunks)
- src/common/common.py (1 hunks)
- src/cpustats.py (1 hunks)
- src/mzmlfileworkflow.py (3 hunks)
- src/workflow/mzmlfileworkflowstatus.py (1 hunks)
🧰 Additional context used
🪛 GitHub Actions: Test workflow functions
src/cpustats.py
[error] 1-1: ModuleNotFoundError: No module named 'psutil'
🪛 LanguageTool
docs/installation.md
[uncategorized] ~34-~34: A comma might be missing here. (AI_EN_LECTOR_MISSING_PUNCTUATION_COMMA)
[uncategorized] ~34-~34: A period might be missing here. (AI_EN_LECTOR_MISSING_PUNCTUATION_PERIOD)
🪛 Ruff (0.8.2)
content/run_example_workflow.py
68-68: Comparison to None should be cond is not None
Replace with cond is not None
(E711)
75-75: Comparison to None should be cond is None
Replace with cond is None
(E711)
85-85: Comparison to None should be cond is not None
Replace with cond is not None
(E711)
src/workflow/mzmlfileworkflowstatus.py
49-49: Comparison to None should be cond is None
Replace with cond is None
(E711)
56-56: Comparison to None should be cond is not None
Replace with cond is not None
(E711)
60-60: Comparison to None should be cond is not None
Replace with cond is not None
(E711)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: build-simple-app
- GitHub Check: build-openms
- GitHub Check: build-full-app
🔇 Additional comments (4)
src/mzmlfileworkflow.py (2)
9-9: LGTM! Good separation of workflow status logging.

Importing dedicated logging functions from a separate module improves code organization and maintainability.
40-52: Details
✅ Verification successful
Verify backward compatibility with existing code.
The changes to use separate logging functions are good, but make sure this doesn't break existing code that might call this function directly (without going through the job queue).
🏁 Script executed:

```bash
#!/bin/bash
# Check for direct calls to run_workflow that might not provide the new workspace_path parameter
rg -A 2 -B 2 "run_workflow\(.*\)" --type py | grep -v "mzml_workflow_run" | grep -v "workspace_path"
```

Length of output: 763
Backward Compatibility Verified
Based on the search results, there are no direct calls to `run_workflow` that omit the new `workspace_path` parameter. The test file (test_gui.py) shows invocations of the workflow, and none of the searched contexts appear problematic. The updated logging functions are being used appropriately without evidence of breaking the direct API usage.

- No occurrences of direct `run_workflow` calls lacking `workspace_path` were found.
- Test invocations (e.g., in `test_gui.py`) seem to supply the required parameters.
- The changes to separate the logging functions do not introduce backward compatibility issues.
Please continue monitoring tests to ensure that all direct API usages remain compatible.
content/run_example_workflow.py (2)
7-10: Good addition of RQ and Redis imports.
These new dependencies look correct and align with the PR objective of implementing a queuing mechanism.
54-58: Status placeholder looks appropriate.
This approach for showing workflow status and progress via a placeholder is clear and straightforward.
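Putting these pieces together, the enqueue-and-track flow on the page might look roughly like the following sketch; names are taken from the review, and `params` and `result_dir` are assumed to be prepared earlier in the script:

```python
import streamlit as st
from redis import Redis
from rq import Queue

from src import mzmlfileworkflow  # import path assumed

queue = Queue("mzml_workflow_run", connection=Redis())
job = queue.enqueue(
    mzmlfileworkflow.run_workflow, params, result_dir, st.session_state["workspace"]
)
# The monitoring loop later looks this id up to stream progress logs.
st.session_state["mzml_workflow_job_id"] = job.id
```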
src/cpustats.py (outdated)
```python
import psutil
import time
from streamlit.delta_generator import DeltaGenerator
import streamlit as st


def monitor_cpu_ram_stats(cpu_ram_stats_placeholder: DeltaGenerator):
    with cpu_ram_stats_placeholder:
        ram_progress = 1 - psutil.virtual_memory().available / psutil.virtual_memory().total
        cpu_progress = psutil.cpu_percent() / 100
        ram_stats_column, cpu_stats_column = st.columns(2)

        ram_stats_column.text(f"Ram ({int(ram_progress * 100)}%)")
        ram_stats_column.progress(ram_progress)

        cpu_stats_column.text(f"CPU ({int(cpu_progress * 100)}%)")
        cpu_stats_column.progress(cpu_progress)

        time.sleep(1)
```
🛠️ Refactor suggestion
Add missing dependency and improve CPU/RAM monitoring implementation.
The module requires psutil which is missing from the requirements (as indicated by pipeline failures). Additionally, the current implementation has some limitations:
- It uses `time.sleep(1)`, which blocks the thread
- There's no mechanism to stop the monitoring
First, ensure psutil is added to your requirements:
```bash
#!/bin/bash
# Check if psutil is in requirements.txt or environment.yml
grep -E "psutil" requirements.txt environment.yml
```

Consider improving the implementation for better resource management:
```diff
 import psutil
 import time
 from streamlit.delta_generator import DeltaGenerator
 import streamlit as st
+from threading import Event, Thread

-def monitor_cpu_ram_stats(cpu_ram_stats_placeholder: DeltaGenerator):
+def monitor_cpu_ram_stats(cpu_ram_stats_placeholder: DeltaGenerator, stop_event: Event = None):
+    """
+    Monitor CPU and RAM usage and display in Streamlit.
+
+    Args:
+        cpu_ram_stats_placeholder: Streamlit placeholder to display stats
+        stop_event: Optional event to signal stopping the monitoring
+    """
     with cpu_ram_stats_placeholder:
         ram_progress = 1 - psutil.virtual_memory().available / psutil.virtual_memory().total
         cpu_progress = psutil.cpu_percent() / 100
         ram_stats_column, cpu_stats_column = st.columns(2)
         ram_stats_column.text(f"Ram ({int(ram_progress * 100)}%)")
         ram_stats_column.progress(ram_progress)
         cpu_stats_column.text(f"CPU ({int(cpu_progress * 100)}%)")
         cpu_stats_column.progress(cpu_progress)
-        time.sleep(1)
+        # Only sleep if not being stopped
+        if stop_event is None or not stop_event.is_set():
+            time.sleep(1)

+def start_monitoring_thread(placeholder: DeltaGenerator):
+    """
+    Start CPU/RAM monitoring in a separate thread.
+
+    Args:
+        placeholder: Streamlit placeholder to display stats
+
+    Returns:
+        tuple: (thread, stop_event) - use stop_event.set() to stop monitoring
+    """
+    stop_event = Event()
+
+    def monitor_loop():
+        while not stop_event.is_set():
+            monitor_cpu_ram_stats(placeholder, stop_event)
+
+    thread = Thread(target=monitor_loop, daemon=True)
+    thread.start()
+    return thread, stop_event
```

🧰 Tools
🪛 GitHub Actions: Test workflow functions
[error] 1-1: ModuleNotFoundError: No module named 'psutil'
But this won't allow access to the Streamlit placeholder in a separate thread context.
@subCode321 You're absolutely right. Streamlit elements should only be accessed from the main thread. Threading would cause issues with the Streamlit execution model.
Here's a more Streamlit-friendly approach using session state to control monitoring:
```python
import psutil
import streamlit as st
from streamlit.delta_generator import DeltaGenerator


def monitor_cpu_ram_stats(cpu_ram_stats_placeholder: DeltaGenerator):
    """
    Display current CPU and RAM stats in the provided placeholder.
    This function should be called repeatedly from the main Streamlit thread.
    """
    with cpu_ram_stats_placeholder:
        ram_progress = 1 - psutil.virtual_memory().available / psutil.virtual_memory().total
        cpu_progress = psutil.cpu_percent() / 100
        ram_stats_column, cpu_stats_column = st.columns(2)
        ram_stats_column.text(f"RAM ({int(ram_progress * 100)}%)")
        ram_stats_column.progress(ram_progress)
        cpu_stats_column.text(f"CPU ({int(cpu_progress * 100)}%)")
        cpu_stats_column.progress(cpu_progress)
```

Then in your main app, you can call it this way:
```python
# In your main Streamlit app
if 'monitor_active' not in st.session_state:
    st.session_state.monitor_active = True

stats_placeholder = st.empty()

# This will be called on each rerun of your Streamlit app
if st.session_state.monitor_active:
    monitor_cpu_ram_stats(stats_placeholder)

# To control monitoring:
if st.button('Toggle Monitoring'):
    st.session_state.monitor_active = not st.session_state.monitor_active
```

This approach works with Streamlit's execution model rather than against it.
✏️ Learnings added
Learnt from: subCode321
PR: OpenMS/streamlit-template#160
File: src/cpustats.py:1-18
Timestamp: 2025-03-10T11:04:48.615Z
Learning: Streamlit elements and placeholders should only be accessed from the main thread. Using threading to manipulate Streamlit UI components will cause issues with Streamlit's execution model.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
Also added to requirements.
🧩 Analysis chain
🏁 Script executed:

```bash
#!/bin/bash
# Check if psutil is in requirements.txt and environment.yml
echo "Checking requirements.txt:"
grep -E "psutil" requirements.txt || echo "Not found in requirements.txt"
echo -e "\nChecking environment.yml:"
grep -E "psutil" environment.yml || echo "Not found in environment.yml"
```

Length of output: 234
@subCode321 Great! I've verified that psutil==7.0.0 is now properly included in both requirements.txt and environment.yml files.
✏️ Learnings added
Learnt from: subCode321
PR: OpenMS/streamlit-template#160
File: src/cpustats.py:1-18
Timestamp: 2025-03-10T11:08:48.576Z
Learning: psutil is required for CPU and RAM monitoring functionality in the streamlit-template project and needs to be included in both requirements.txt and environment.yml.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
content/run_example_workflow.py (outdated)

```python
while True:
    # monitor if active job exists
    if st.session_state['mzml_workflow_job_id'] != None:
        monitor_mzml_workflow_job_status(mzml_workflow_status_placeholder, workflow_status, results_section_placholder, result_dir)
    monitor_cpu_ram_stats(cpu_ram_stats_placeholder)
```
🛠️ Refactor suggestion
Avoid an infinite loop in a Streamlit app.
A `while True:` loop can block UI updates indefinitely and consume resources. Consider adding a break, a time-based check, or a different mechanism (e.g. `st.experimental_singleton`) so the user interface remains responsive.

```diff
 while True:
+    import time
     if st.session_state['mzml_workflow_job_id'] is not None:
         monitor_mzml_workflow_job_status(
             mzml_workflow_status_placeholder, workflow_status, results_section_placholder, result_dir
         )
     monitor_cpu_ram_stats(cpu_ram_stats_placeholder)
+    time.sleep(1)  # Sleep to prevent hogging CPU
```

📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
while True:
    import time
    if st.session_state['mzml_workflow_job_id'] is not None:
        monitor_mzml_workflow_job_status(
            mzml_workflow_status_placeholder, workflow_status, results_section_placholder, result_dir
        )
    monitor_cpu_ram_stats(cpu_ram_stats_placeholder)
    time.sleep(1)  # Sleep to prevent hogging CPU
```
🧰 Tools
🪛 Ruff (0.8.2)
85-85: Comparison to None should be cond is not None
Replace with cond is not None
(E711)
I need this while loop to constantly monitor CPU stats. This is how AlphaPept has done it as well.
🧩 Analysis chain
Let me research more about how Streamlit handles this kind of monitoring approach:
Let me also look at the specific file that handles the monitoring functionality:
@subCode321 I understand you need continuous monitoring of CPU stats. While infinite loops in Streamlit can be tricky, there are several approaches that work for real-time monitoring:
1. The current `while True` approach you're using is commonly used for real-time monitoring in Streamlit when combined with appropriate sleep intervals to prevent high CPU usage.

2. Based on what I found, you may want to consider adding a small sleep interval (e.g., `time.sleep(0.5)`) in your loop to reduce CPU usage while still providing near real-time updates.

3. If you want to follow the Streamlit recommended approach, you could use:
   - `st.experimental_rerun()` at the end of your script
   - The newer `st.rerun()` function (if using Streamlit 1.27.0+)
   - `st.cache_resource` with background threads for monitoring

That said, if your current approach works for your use case and is similar to what Alphapept uses, it's reasonable to continue with it. Just make sure the monitoring functions themselves incorporate appropriate delays to prevent excessive CPU usage.
🏁 Script executed:

```bash
#!/bin/bash
# Check if Alphapept is referenced anywhere in the codebase
rg -i "alphapept" --type-not binary

# Look for other instances of infinite loops in the codebase
rg "while\s+True" --type py

# Let's also look at how monitor_cpu_ram_stats is implemented
cat src/cpustats.py
```

Length of output: 920
🏁 Script executed:

```bash
#!/bin/bash
# Check the implementation of monitoring functions
cat src/workflow/mzmlfileworkflowstatus.py
```

Length of output: 2862
🌐 Web query:
How to implement real-time monitoring in Streamlit applications using infinite loops
💡 Result:
To implement real-time monitoring in Streamlit applications using infinite loops, developers typically use controlled refresh mechanisms combined with Streamlit's session state and UI components. Below are the key approaches and best practices:
Core Implementation Methods
1. Controlled `while` Loop with `st.rerun()`

Use `time.sleep()` and `st.rerun()` to simulate real-time updates without blocking the app:

```python
import time
import streamlit as st

def main():
    st.title("Real-Time Dashboard")
    while True:
        # Capture/process data (e.g., network packets, sensor readings)
        updated_data = fetch_live_data()
        # Update visualizations
        st.line_chart(updated_data)
        time.sleep(5)  # Refresh every 5 seconds
        st.rerun()

if __name__ == "__main__":
    main()
```
This approach ensures the dashboard refreshes at fixed intervals[1][10].
2. Streamlit Autorefresh Component

For a cleaner solution, use the `streamlit-autorefresh` package to avoid manual loop management:

```python
from streamlit_autorefresh import st_autorefresh
import streamlit as st

# Auto-refresh every 2 seconds (2000 milliseconds)
st_autorefresh(interval=2000, key="data_refresh")

# Fetch and display live data
st.metric("Active Users", get_live_user_count())
st.line_chart(get_time_series_data())
```
This method offloads refresh logic to a dedicated component[3].
Best Practices
1. Use `st.empty()` for Dynamic Updates

Create placeholders to overwrite specific UI elements without rerunning the entire script:

```python
placeholder = st.empty()
while True:
    with placeholder.container():
        st.metric("CPU Usage", get_cpu_usage())
        st.bar_chart(get_resource_stats())
    time.sleep(3)
```
This minimizes performance overhead[10][16].
2. Leverage Session State

Store persistent data (e.g., packet processors) in `st.session_state`:

```python
if 'processor' not in st.session_state:
    st.session_state.processor = initialize_data_processor()
```
Ensures state retention across reruns[1].
3. Handle Privileges for Low-Level Operations

For network/sensor data access, run Streamlit with administrative rights:

```bash
sudo streamlit run dashboard.py  # Linux/macOS
```
Common Pitfalls & Solutions
| Issue | Solution |
|---|---|
| Infinite loops blocking UI updates[2][5] | Use st_autorefresh instead of raw while True loops. |
| Memory leaks from unmanaged cache[11] | Clear unused cached data with @st.cache_resource(ttl=60). |
| Deployment failures due to infinite loops[8][16] | Test locally first; use Streamlit Cloud's resource limits. |
Example: Network Traffic Dashboard

```python
import time
import streamlit as st
from packet_sniffer import capture_packets  # Custom data source

def main():
    st.set_page_config(layout="wide")
    st.title("Live Network Traffic")

    # Initialize packet capture
    if 'sniffer' not in st.session_state:
        st.session_state.sniffer = capture_packets()

    # Dashboard layout
    col1, col2 = st.columns(2)
    with col1:
        st.metric("Total Packets", len(st.session_state.sniffer.packets))
    with col2:
        st.metric("Capture Duration", f"{time.time() - st.session_state.start_time:.2f}s")

    # Auto-refresh logic
    time.sleep(2)
    st.rerun()

if __name__ == "__main__":
    main()
```

(Adapted from [1][10])
For production use cases, prioritize the streamlit-autorefresh component or Streamlit's native st.rerun() with controlled intervals. Avoid raw infinite loops unless absolutely necessary, as they can cause memory issues and deployment failures[5][8].
Citations:
- 1: https://www.freecodecamp.org/news/build-a-real-time-network-traffic-dashboard-with-python-and-streamlit/
- 2: https://discuss.streamlit.io/t/streamlit-keeps-running-an-endless-while-loop/52534
- 3: https://discuss.streamlit.io/t/streamlit-autorefresh/14519
- 4: https://getlazy.ai/post/real-time-dashboard-python-streamlit
- 5: Infinite loop after breaking streamlit/streamlit#6494
- 6: https://discuss.streamlit.io/t/continuously-updating-dashboard/532
- 7: https://discuss.streamlit.io/t/real-time-clickstream-analytics-with-streamlit-glassflow/74440
- 8: https://discuss.streamlit.io/t/streamlit-app-stuck-in-an-infinite-loop-during-deployment-works-fine-locally-no-error-reported/92059
- 9: https://www.youtube.com/watch?v=OkodDZxsN1I
- 10: https://blog.streamlit.io/how-to-build-a-real-time-live-dashboard-with-streamlit/
- 11: https://discuss.streamlit.io/t/infinite-loop-keeps-creating-using-up-more-memory-which-eventually-lead-to-crashing/41446
- 12: https://dev.to/alexmercedcoder/deep-dive-into-data-apps-with-streamlit-3e40
- 13: https://www.redhat.com/en/blog/streamlit-monitor-infrastructure
- 14: https://discuss.streamlit.io/t/streamlit-unit-testing-infinite-loop-with-st-rerun/56268
- 15: https://discuss.streamlit.io/t/real-time-live-graph-metrics-for-tracking-data-acquisition-unit/20906
- 16: https://discuss.streamlit.io/t/starting-up-and-cloning-repository-on-infinite-loop/89030
- 17: https://discuss.streamlit.io/t/share-my-project-monitoring-live-dashboard/44224
- 18: https://discuss.streamlit.io/t/while-true-loop-in-streamlit-for-text-input/70105
- 19: https://discuss.streamlit.io/t/how-to-build-a-real-time-live-dashboard-with-streamlit/24437/11
- 20: https://discuss.streamlit.io/t/app-gets-stuck-in-an-endless-refresh-loop/27121
@subCode321 I see that you need the while True loop for continuous CPU stats monitoring. The implementation looks reasonable and I notice:

1. Your `monitor_cpu_ram_stats()` function in `src/cpustats.py` already includes a `time.sleep(1)` at the end, which prevents excessive CPU usage.
2. This approach of using infinite loops for monitoring appears elsewhere in your codebase (in `src/run_subprocess.py`).
3. While Streamlit apps typically avoid infinite loops, they are appropriate for real-time monitoring when they include proper sleep intervals - which yours does.

This implementation is valid for your use case. Since you've mentioned Alphapept uses a similar approach, and your monitoring functions already include appropriate pauses, I agree with keeping the current implementation.
✏️ Learnings added
Learnt from: subCode321
PR: OpenMS/streamlit-template#160
File: content/run_example_workflow.py:83-88
Timestamp: 2025-03-10T11:09:56.311Z
Learning: In the OpenMS streamlit-template project, continuous real-time monitoring of CPU stats and workflow jobs is implemented using a while True loop, which is considered acceptable for this specific use case as confirmed by the maintainer.
Learnt from: subCode321
PR: OpenMS/streamlit-template#160
File: content/run_example_workflow.py:83-88
Timestamp: 2025-03-10T11:09:56.311Z
Learning: In the OpenMS streamlit-template project, continuous real-time monitoring of CPU stats and workflow jobs is implemented using a while True loop with a time.sleep(1) in the monitor_cpu_ram_stats function, which is considered acceptable for this specific use case as confirmed by the maintainer.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
@t0mdavid-m I've implemented this end to end. Can I get a review on this, please?
Just a heads-up: since there's a continuous loop running, sharing a video of the working code: Streamlit.Scheduler.mp4

Thank you for the contribution! I will get back to you regarding the scheduler ASAP. For checking the CPU stats, please do not use a while loop. I think
Please separate out the hardware-utilization and the scheduler issues, as these are two different concepts that are not related to one another, and mixing them makes the PR more confusing. The easiest thing to do will be to remove the hardware part from this PR and just keep the scheduler code, so that we can merge one of the features even if the other is not ready. For the hardware utilization

@t0mdavid-m will get back to you on next steps for the scheduler.
Sure, I will do that.

Thank you so much for the suggestion. I will do that!
Force-pushed from 8ebd5c5 to a5627fb
Actionable comments posted: 0
🧹 Nitpick comments (1)
docs/installation.md (1)
32-34: Clarify and Enhance Installation Steps Formatting

The added instructions clearly outline the new process to run the app with Redis and RQ. To improve readability and meet best practices, consider revising the punctuation. For instance, adding a comma after "Finally" and a period at the end of line 34 would address the static analysis hints and ensure consistency in the list items.

Example diff:

```diff
-3. Finally start the application `streamlit run app.py`
+3. Finally, start the application `streamlit run app.py`.
```

🧰 Tools
🪛 LanguageTool
[uncategorized] ~34-~34: A comma might be missing here. (AI_EN_LECTOR_MISSING_PUNCTUATION_COMMA)
[uncategorized] ~34-~34: A period might be missing here. (AI_EN_LECTOR_MISSING_PUNCTUATION_PERIOD)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- docs/installation.md (1 hunks)
🧰 Additional context used
🪛 LanguageTool
docs/installation.md
[uncategorized] ~34-~34: A comma might be missing here. (AI_EN_LECTOR_MISSING_PUNCTUATION_COMMA)
[uncategorized] ~34-~34: A period might be missing here. (AI_EN_LECTOR_MISSING_PUNCTUATION_PERIOD)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: build-full-app
- GitHub Check: build-openms
- GitHub Check: build-simple-app
I've separated out both the PRs.
t0mdavid-m left a comment
Thank you for your contribution! Sorry for taking a while to get back to you, but I had to think about the path forward a bit.

I think your solution is good, but at the moment it requires manual implementation of the queuing into each workflow. While this will always be the case for simple pyOpenMS-based workflows (like your exemplary implementation), these usually are not very compute-intense. It would be more important to implement the queue for workflows involving the WorkflowManager class.

Could you implement the queue at a higher level for the methods run_multiple_commands(), run_command(), run_topp(), and run_python()? This should enable the queuing to be just merged into existing workflows and cover the most computationally intense cases.
Sure, I'll add this.
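A minimal sketch of what that higher-level queuing could look like (the method signatures and the `self.queue` / `self.executor` attributes are assumptions for illustration, not the PR's actual code; note that rq serializes the enqueued call, so the executor instance must be picklable and importable by the worker):

```python
from rq import Queue

class WorkflowManager:
    # Assumed to be set up in __init__ (see the initialization sketch
    # further down this thread): an rq Queue and a CommandExecutor.
    queue: Queue

    def run_topp(self, tool: str, input_output: dict, depends_on=None):
        # Enqueue the executor call instead of running it synchronously,
        # so the rq worker processes every TOPP invocation.
        return self.queue.enqueue(
            self.executor.run_topp, tool, input_output, depends_on=depends_on
        )

    def run_python(self, script: str, input_output: dict, depends_on=None):
        # Same pattern for custom Python scripts.
        return self.queue.enqueue(
            self.executor.run_python, script, input_output, depends_on=depends_on
        )
```

Returning the rq job object lets callers chain subsequent steps via `depends_on`, which is how the ordering between feature detection, linking, and export is preserved later in this review.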
The TOPP framework looks for the TOPP tools. Installing OpenMS from here should fix the issue.

Sure, thanks. Also, should I remove the current changes to the
Force-pushed from ccdf34a to 0fcb925 (Compare)
Actionable comments posted: 1
🧹 Nitpick comments (1)
environment.yml (1)
18-20: Add new dependencies for job queue implementation

The addition of the `redis` and `rq` dependencies is necessary for implementing the job queue functionality. These packages will enable the application to queue jobs and process them asynchronously.

One minor issue:

```diff
- rq==2.1.0
+ rq==2.1.0
+
```

Add a newline at the end of the file to comply with standard file formatting conventions.
🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 20-20: no new line character at the end of file
(new-line-at-end-of-file)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
- .github/workflows/build-windows-executable-app.yaml (1 hunks)
- Dockerfile (1 hunks)
- Dockerfile_simple (1 hunks)
- docs/installation.md (1 hunks)
- environment.yml (1 hunks)
- requirements.txt (1 hunks)
- src/Workflow.py (2 hunks)
- src/workflow/CommandExecutor.py (1 hunks)
- src/workflow/WorkflowManager.py (2 hunks)
- test_gui.py (0 hunks)
💤 Files with no reviewable changes (1)
- test_gui.py
🚧 Files skipped from review as they are similar to previous changes (5)
- requirements.txt
- docs/installation.md
- Dockerfile
- Dockerfile_simple
- .github/workflows/build-windows-executable-app.yaml
🧰 Additional context used
🧬 Code Definitions (1)
src/Workflow.py (1)
src/workflow/CommandExecutor.py (2)
- run_topp (110-196)
- run_python (214-269)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: build-simple-app
- GitHub Check: build-full-app
- GitHub Check: build-openms
🔇 Additional comments (6)
src/workflow/CommandExecutor.py (1)
93-94: Good defensive programming with file existence check

Adding a check for file existence before attempting to unlink is a good defensive programming practice. This prevents potential exceptions that could occur if the PID file is already removed or doesn't exist when a job is processed from the queue.
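The pattern being praised is roughly the following (a sketch, not the file's exact code; the helper name is hypothetical):

```python
from pathlib import Path

def remove_pid_file(pid_path: Path) -> None:
    # Guard against the file already being gone, e.g. when a queued
    # job runs after an earlier cleanup has removed the PID file.
    if pid_path.exists():
        pid_path.unlink()
```

On Python 3.8+, `pid_path.unlink(missing_ok=True)` is a one-line equivalent (and avoids the small race between the check and the unlink).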
src/Workflow.py (4)
72-76: Good implementation of job queuing for feature detection

The code now uses a queue to schedule the FeatureFinderMetabo task instead of running it synchronously. This enables better resource management and parallelization.

86-91: Effective task dependency management with queuing

Good implementation of task dependencies using the `depends_on` parameter. This ensures that the feature linking step only runs after feature detection is complete, maintaining the correct workflow sequence while leveraging asynchronous processing.

93-98: Consistent approach to task queuing

The export task is properly queued with a dependency on the feature linking job, maintaining the logical sequence of the workflow. This implementation ensures that tasks are executed in the correct order.
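Taken together, the chain described in these three comments can be sketched as follows (the stub functions are illustrative stand-ins; the PR enqueues the executor's methods, and rq requires the enqueued callables to be importable by the worker):

```python
from redis import Redis
from rq import Queue

def run_feature_finder_metabo(mzml: str) -> None: ...  # stand-in for FeatureFinderMetabo
def run_feature_linker() -> None: ...                  # stand-in for FeatureLinkerUnlabeledKD
def run_export() -> None: ...                          # stand-in for the export step

queue = Queue("mzml_workflow_run", connection=Redis())

# Feature detection is queued first.
ffm_job = queue.enqueue(run_feature_finder_metabo, "sample.mzML")

# Linking only starts once detection has finished.
link_job = queue.enqueue(run_feature_linker, depends_on=ffm_job)

# Export only starts once linking has finished.
export_job = queue.enqueue(run_export, depends_on=link_job)
```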
102-102: Details
✅ Verification successful
Missing dependency for optional Python script execution.
The optional Python script execution is queued, but it doesn't have any dependencies specified. If this script can run independently of the other tasks, this is fine. However, if there should be specific ordering, consider adding a dependency.
🏁 Script executed:
```bash
#!/bin/bash
# Check if the example Python script needs to depend on other tasks
# Examine the example.py script to see if it uses outputs from other tasks
cat src/python-tools/example.py
```

Length of output: 2039
Optional Python Script Execution Dependency:
After verifying the content of `src/python-tools/example.py`, it appears that the script is designed to run independently. It accepts parameters via a JSON file when provided, and there's no inherent dependency on outputs from other tasks. The enqueued execution in `src/Workflow.py` (line 102) correctly supplies the `"in"` parameter (using `in_mzML`), which aligns with the expectations of the script. As the script does not currently demand an ordering dependency, no immediate changes are required. However, if future updates introduce reliance on outputs from prior tasks, please consider explicitly defining those dependencies.
src/workflow/WorkflowManager.py (1)
11-12: Initialize Redis Queue for job scheduling

Good implementation of Redis Queue for job scheduling. This enables asynchronous processing of workflow tasks and better resource management.
Also applies to: 18-18
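The initialization being described is presumably along these lines (a sketch; the connection details are assumptions, using the redis-py defaults of localhost:6379, and the queue name is taken from the worker command elsewhere in this PR):

```python
from redis import Redis
from rq import Queue

class WorkflowManager:
    def __init__(self) -> None:
        # Single shared queue; its name matches what the worker listens on
        # (rq worker --with-scheduler -q mzml_workflow_run).
        self.queue = Queue("mzml_workflow_run", connection=Redis())
```

Keeping one Queue instance on the manager means every workflow method can enqueue jobs without re-establishing a Redis connection.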
@t0mdavid-m I've implemented this and the PR is now ready for review.

`echo '@echo off' > ${{ env.APP_NAME }}.bat`
nit: remove extra space
Actionable comments posted: 1
🧹 Nitpick comments (2)
.github/workflows/build-windows-executable-app.yaml (2)
261-261: Remove Trailing Whitespace on Line 261

There is extra trailing whitespace on this line. While it might seem insignificant, cleaning it up helps avoid static analysis warnings and maintains a clean file style.
🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 261-261: trailing spaces
(trailing-spaces)
270-270: Remove Trailing Whitespace on Line 270

This line also contains trailing spaces. Removing them ensures the file adheres to clean coding standards.
🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 270-270: trailing spaces
(trailing-spaces)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
.github/workflows/build-windows-executable-app.yaml (2 hunks)
🧰 Additional context used
🪛 YAMLlint (1.35.1)
.github/workflows/build-windows-executable-app.yaml
[error] 261-261: trailing spaces
(trailing-spaces)
[error] 270-270: trailing spaces
(trailing-spaces)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: build-simple-app
- GitHub Check: build-full-app
```
echo 'start /min .\python-${{ env.PYTHON_VERSION }}\python -m redis-server' > ${{ env.APP_NAME }}.bat
echo 'start /min .\python-${{ env.PYTHON_VERSION }}\python -m rq worker --with-scheduler' >> ${{ env.APP_NAME }}.bat
echo 'start /min .\python-${{ env.PYTHON_VERSION }}\python -m streamlit run app.py local' >> ${{ env.APP_NAME }}.bat
```
Prevent Accidental File Overwrite with Correct Redirection
In the updated batch file section, the command on line 270 uses the single redirection operator (>). This causes the file to be overwritten—erasing previously appended commands such as the initial setup instructions. To ensure that all commands remain in the resulting batch file, change the redirection operator from > to >>.
Below is the suggested fix:
```diff
-echo 'start /min .\python-${{ env.PYTHON_VERSION }}\python -m redis-server' > ${{ env.APP_NAME }}.bat
+echo 'start /min .\python-${{ env.PYTHON_VERSION }}\python -m redis-server' >> ${{ env.APP_NAME }}.bat
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Original:

```
echo 'start /min .\python-${{ env.PYTHON_VERSION }}\python -m redis-server' > ${{ env.APP_NAME }}.bat
echo 'start /min .\python-${{ env.PYTHON_VERSION }}\python -m rq worker --with-scheduler' >> ${{ env.APP_NAME }}.bat
echo 'start /min .\python-${{ env.PYTHON_VERSION }}\python -m streamlit run app.py local' >> ${{ env.APP_NAME }}.bat
```

Suggested:

```
echo 'start /min .\python-${{ env.PYTHON_VERSION }}\python -m redis-server' >> ${{ env.APP_NAME }}.bat
echo 'start /min .\python-${{ env.PYTHON_VERSION }}\python -m rq worker --with-scheduler' >> ${{ env.APP_NAME }}.bat
echo 'start /min .\python-${{ env.PYTHON_VERSION }}\python -m streamlit run app.py local' >> ${{ env.APP_NAME }}.bat
```
🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 270-270: trailing spaces
(trailing-spaces)
@t0mdavid-m does the change look good to you?
t0mdavid-m
left a comment
Thanks again for the changes - everything looks good in principle! One suggestion: could you encapsulate the queuing logic within the WorkflowManager class so that methods like run_topp() automatically handle adding the tool to the queue internally? This would simplify usage and keep the queuing mechanism abstracted away from the user.
Sure.
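An illustrative before/after of that encapsulation, assuming the wrapper signatures sketched earlier in this thread (`io` is a hypothetical input/output dict, and `self` is the WorkflowManager):

```python
# Before: workflow code talks to the queue explicitly.
job = self.queue.enqueue(self.executor.run_topp, "FeatureFinderMetabo", io)

# After: queuing is an internal detail of WorkflowManager.
job = self.run_topp("FeatureFinderMetabo", io)  # enqueues under the hood
```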

Fixes #123
Uses `rq worker --with-scheduler` to queue jobs and process them

Video Recording
Streamlit.Scheduler.mp4
Summary by CodeRabbit
New Features
Bug Fixes
Documentation
Chores