Conversation
Walkthrough

Introduces a new clustering API via `flow_system.transform.cluster()`.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant FlowSystem
    participant TransformAccessor
    participant ClusteringParameters
    participant Clustering
    participant ClusteringModel
    participant Optimizer
    User->>FlowSystem: transform.cluster(n_clusters, cluster_duration, ...)
    FlowSystem->>TransformAccessor: cluster(keyword args)
    TransformAccessor->>ClusteringParameters: Create from keywords
    ClusteringParameters-->>TransformAccessor: Instance
    alt Multi-dimensional clustering
        TransformAccessor->>TransformAccessor: _cluster_multi_dimensional()
        loop Per period/scenario
            TransformAccessor->>Clustering: Compute per-slice
            Clustering-->>TransformAccessor: Clustering results
        end
    else Single-dimension clustering
        TransformAccessor->>TransformAccessor: _cluster_simple()
        TransformAccessor->>Clustering: Compute for full dataset
        Clustering-->>TransformAccessor: Clustering results
    end
    TransformAccessor->>ClusteringModel: Apply constraints (multi-dim support)
    ClusteringModel->>ClusteringModel: do_modeling() with segments/flexibility
    ClusteringModel-->>TransformAccessor: Constraints added
    alt aggregate_data enabled
        TransformAccessor->>TransformAccessor: Aggregate time-series back
    end
    TransformAccessor-->>FlowSystem: Clustered FlowSystem
    FlowSystem-->>User: Return result
    User->>Optimizer: optimize()
```
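The call flow above can be sketched with a minimal accessor pattern in plain Python. All names (`FlowSystem`, `TransformAccessor`, `ClusteringParameters`) mirror the walkthrough's terminology, but this is a standalone illustration, not flixopt's actual implementation:

```python
from dataclasses import dataclass


@dataclass
class ClusteringParameters:
    """Bundle of keyword arguments, built by the accessor on each call."""
    n_clusters: int
    cluster_duration: str
    include_storage: bool = True


class TransformAccessor:
    """Lives on FlowSystem.transform; turns keywords into parameters."""

    def __init__(self, fs):
        self._fs = fs

    def cluster(self, n_clusters, cluster_duration, include_storage=True):
        params = ClusteringParameters(n_clusters, cluster_duration, include_storage)
        # Return a new system carrying the clustering configuration,
        # leaving the original untouched.
        clustered = FlowSystem(name=self._fs.name + '_clustered')
        clustered.clustering_params = params
        return clustered


class FlowSystem:
    def __init__(self, name):
        self.name = name
        self.clustering_params = None

    @property
    def transform(self):
        # A fresh accessor per access keeps the accessor stateless.
        return TransformAccessor(self)


fs = FlowSystem('district_heating')
clustered = fs.transform.cluster(n_clusters=8, cluster_duration='24h')
print(clustered.name, clustered.clustering_params.n_clusters)
# district_heating_clustered 8
```

The accessor keeps the clustering entry point discoverable (`fs.transform.<verb>`) without bloating the `FlowSystem` class itself.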
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

Areas requiring extra attention:
Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
@coderabbitai review

✅ Actions performed: Review triggered.
Actionable comments posted: 2
🧹 Nitpick comments (4)
docs/notebooks/data/generate_example_systems.py (1)
246-250: Add error handling for file loading operations.

Both `create_district_heating_system()` and `create_operational_system()` load CSV data without error handling. If the file is missing or the date range doesn't exist, users will see cryptic pandas errors rather than helpful messages. Consider adding validation:

```python
data_path = DATA_DIR / 'Zeitreihen2020.csv'
if not data_path.exists():
    raise FileNotFoundError(f"Required data file not found: {data_path}")
data = pd.read_csv(data_path, index_col=0, parse_dates=True).sort_index()
```

docs/notebooks/08a-aggregation.ipynb (1)
61-61: Consider handling the import error more gracefully.

The inline import `from data.generate_example_systems import create_district_heating_system` will raise an ImportError with a confusing message if the module doesn't exist. A more helpful error would guide users:

```python
if not data_file.exists():
    try:
        from data.generate_example_systems import create_district_heating_system
    except ImportError:
        raise ImportError(
            "Example data file not found and generator module unavailable. "
            "Please download the example data or ensure data/generate_example_systems.py exists."
        )
    # ... rest of generation code
```

flixopt/transform_accessor.py (2)
256-269: Consider extracting validation to a shared helper.

The validation logic (lines 256-269) is duplicated from `_cluster_simple()` (lines 186-198). Extracting this to a private method would reduce duplication and ensure consistency.

```python
def _validate_clustering_timesteps(self, params: ClusteringParameters) -> float:
    """Validate timesteps and return dt value."""
    dt_min = float(self._fs.hours_per_timestep.min().item())
    dt_max = float(self._fs.hours_per_timestep.max().item())
    if dt_min != dt_max:
        raise ValueError(...)
    ratio = params.cluster_duration_hours / dt_max
    if not np.isclose(ratio, round(ratio), atol=1e-9):
        raise ValueError(...)
    return dt_min
```
363-371: Redundant keys in `clustering_info` dict.

Both `'clustering'` and `'clustering_results'` store the same `clustering_results` dict. If `'clustering'` is required for backwards compatibility with `_add_clustering_constraints`, consider documenting this more explicitly or refactoring the consumer to use a single key.

```diff
 clustered_fs._clustering_info = {
     'parameters': params,
-    'clustering': clustering_results,  # Required by _add_clustering_constraints
-    'clustering_results': clustering_results,  # Dict of Clustering objects per dimension
+    'clustering': clustering_results,  # Dict of Clustering objects per (period, scenario)
     'components_to_clusterize': components_to_clusterize,
     'original_fs': self._fs,
     'has_periods': self._fs.periods is not None,
     'has_scenarios': self._fs.scenarios is not None,
 }
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
- CHANGELOG.md (1 hunks)
- docs/notebooks/08a-aggregation.ipynb (19 hunks)
- docs/notebooks/08b-rolling-horizon.ipynb (2 hunks)
- docs/notebooks/data/generate_example_systems.py (4 hunks)
- examples/03_Optimization_modes/example_optimization_modes.py (2 hunks)
- flixopt/clustering.py (5 hunks)
- flixopt/optimization.py (4 hunks)
- flixopt/transform_accessor.py (5 hunks)
- tests/deprecated/test_integration.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
flixopt/transform_accessor.py (2)
flixopt/flow_system.py (4)
- `FlowSystem` (49-2225)
- `to_dataset` (581-633)
- `copy` (856-881)
- `sel` (2014-2040)

flixopt/core.py (2)

- `drop_constant_arrays` (564-594)
- `to_dataarray` (370-484)
🪛 markdownlint-cli2 (0.18.1)
CHANGELOG.md
62-62: Code block style
Expected: indented; Actual: fenced
(MD046, code-block-style)
93-93: Code block style
Expected: indented; Actual: fenced
(MD046, code-block-style)
🔇 Additional comments (22)
docs/notebooks/data/generate_example_systems.py (2)
236-348: LGTM! Well-structured helper function.

The `create_district_heating_system()` function is well-documented and follows consistent patterns with existing helper functions. The docstring clearly explains the purpose and usage.

351-448: LGTM! Operational system helper is well-designed.

The `create_operational_system()` function provides a clean example for operational planning without investments. Good separation of concerns between investment and operational optimization examples.

CHANGELOG.md (1)
54-112: LGTM! Comprehensive migration documentation.

The changelog clearly documents the clustering API improvements with:
- Clear before/after migration examples
- Parameter mapping table
- Usage examples for the new API
This will help users migrate smoothly from the old to the new clustering API.
flixopt/optimization.py (4)
396-402: LGTM! Clear deprecation guidance with migration example.

The updated deprecation message provides a concrete example of the new API usage, making it easy for users to migrate from `ClusteredOptimization` to the new `flow_system.transform.cluster()` API.

444-449: LGTM! Consistent parameter naming.

The internal parameter name updates (`cluster_duration_hours` and error messages) align with the public API changes documented in the CHANGELOG.

462-463: LGTM! Clustering object initialization updated correctly.

The parameter names passed to the `Clustering` constructor have been updated to match the new API (`hours_per_period` and `nr_of_periods`), maintaining consistency with the parameter renaming.

472-472: LGTM! Simplified parameter naming.

The change from `aggregate_data_and_fix_non_binary_vars` to `aggregate_data` improves readability while maintaining the same functionality.

tests/deprecated/test_integration.py (1)
284-293: LGTM! Test correctly updated for new clustering API.

The `ClusteringParameters` initialization has been properly migrated to use the new parameter names:

- `n_clusters` (was `nr_of_periods`)
- `cluster_duration` (was `hours_per_period`)
- `include_storage` (was `fix_storage_flows`)
- `aggregate_data` (was `aggregate_data_and_fix_non_binary_vars`)
- `flexibility_percent` (was `percentage_of_period_freedom`)
- `flexibility_penalty` (was `penalty_of_period_freedom`)

This matches the migration guide in CHANGELOG.md exactly.
docs/notebooks/08b-rolling-horizon.ipynb (1)
54-60: LGTM! Clear narrative update.

The markdown cell effectively communicates the shift to loading a pre-built FlowSystem, aligning with the PR's objective to adopt a FlowSystem-centric workflow.
examples/03_Optimization_modes/example_optimization_modes.py (2)
36-41: LGTM! Clear clustering configuration.

The new parameter names (`n_clusters`, `cluster_duration`, `include_storage`) are more intuitive than the previous `hours_per_period` and `nr_of_periods`.

192-204: Clean migration to the new `transform.cluster()` API.

The new API usage is clear and correctly passes the clustering configuration. The conditional peak time series assignment based on `keep_extreme_periods` is properly implemented.

docs/notebooks/08a-aggregation.ipynb (2)
69-83: Visualization code looks appropriate for the notebook.

The hardcoded component names ('HeatDemand', 'GridBuy') and timestep slice (672 = 7 days × 96 at 15-min resolution) are reasonable for a demonstration notebook where the example system is controlled.

360-386: Well-organized summary with clear learning outcomes.

The summary effectively captures the key takeaways and appropriately defers clustering details to the dedicated notebook (08c-clustering.ipynb).
flixopt/transform_accessor.py (2)
55-172: Well-designed public API with clear documentation.

The explicit parameter signature is more user-friendly than requiring a separate `ClusteringParameters` object. The docstring provides comprehensive examples covering basic clustering, segmentation, and multi-period scenarios.

174-243: Solid implementation with proper validation.

The timestep consistency check and cluster duration divisibility validation are correctly implemented. The use of `np.isclose` with a tight tolerance for the ratio check is appropriate for floating-point comparison.
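That validation can be sketched as a standalone function; the name `validate_cluster_duration` and the error messages are illustrative, not flixopt's actual code:

```python
import numpy as np


def validate_cluster_duration(cluster_duration_hours: float,
                              hours_per_timestep: np.ndarray) -> float:
    """Check that timesteps are uniform and the cluster duration divides evenly."""
    dt_min = float(hours_per_timestep.min())
    dt_max = float(hours_per_timestep.max())
    if dt_min != dt_max:
        raise ValueError('Timesteps must have uniform duration for clustering.')
    ratio = cluster_duration_hours / dt_max
    # Tight tolerance: the ratio must be (numerically) a whole number.
    if not np.isclose(ratio, round(ratio), atol=1e-9):
        raise ValueError(
            f'cluster_duration ({cluster_duration_hours} h) must be a multiple '
            f'of the timestep ({dt_max} h).'
        )
    return dt_min


# 24 h clusters over 0.25 h (15-min) timesteps divide evenly (ratio 96):
print(validate_cluster_duration(24.0, np.full(96, 0.25)))  # 0.25
```

Comparing against `round(ratio)` rather than `int(ratio)` avoids false negatives when floating-point division lands just below the integer (e.g. 95.99999999).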
89-121: Clean integration of segmentation support.

The logic correctly handles the case when `nr_of_periods` is `None` (segmentation-only mode) by falling back to `total_periods`. The `tsam` configuration properly sets `segmentation` and `noSegments` based on the `n_segments` parameter.

285-325: Well-implemented segment equation index generation.

The early return on lines 300-301 when `n_segments is None` correctly guards against accessing `segmentDurationDict` when segmentation is disabled. The loop logic properly generates index pairs to equate all timesteps within a segment to the first timestep.

328-353: Clean utility function for duration parsing.

Using `pd.Timedelta` for parsing duration strings is appropriate and handles standard pandas duration formats correctly.
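A minimal sketch of such a parser, assuming the helper simply delegates to `pd.Timedelta` (the function name is illustrative, not flixopt's actual signature):

```python
import pandas as pd


def parse_duration_to_hours(duration):
    """Accept a number of hours or a pandas duration string like '24h' or '2D'."""
    if isinstance(duration, (int, float)):
        return float(duration)
    # pd.Timedelta understands 'h', 'min', 'D', and combined forms like '1D 6h'.
    return pd.Timedelta(duration).total_seconds() / 3600.0


print(parse_duration_to_hours('24h'))    # 24.0
print(parse_duration_to_hours('2D'))     # 48.0
print(parse_duration_to_hours('90min'))  # 1.5
```

Delegating to `pd.Timedelta` gets the full pandas grammar for free instead of hand-rolling a string parser.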
356-454: Well-documented parameter class with clear API.

The redesigned `ClusteringParameters` class provides a cleaner interface with intuitive parameter names. The convenience properties (`use_extreme_periods`, `use_segmentation`) improve code readability.

495-562: Efficient constraint generation with proper batching.

Batching continuous variables into a single constraint (lines 545-548) is a good optimization. Binary variables correctly get individual constraints to support the flexibility correction variables.

653-667: Correct penalty application for flexibility deviations.

The method properly sums correction variables over the appropriate dimension (`eq_idx` for equality constraints or `time` as fallback) and adds the penalty to the objective via `PENALTY_EFFECT_LABEL`.

623-628: The `self._model.variables.binaries` attribute is a valid linopy API feature and requires no changes.

The `.binaries` property is a documented feature of linopy's Variables class, so the membership check on line 625 is correct and aligns with linopy's standard usage patterns.
```diff
   }
  },
-  "source": "# Load time series data (15-min resolution)\ndata = pd.read_csv('../../examples/resources/Zeitreihen2020.csv', index_col=0, parse_dates=True).sort_index()\ndata = data['2020-01-01':'2020-01-14 23:45:00'] # Two weeks\ndata.index.name = 'time' # Rename index for consistency\n\ntimesteps = data.index\n\n# Extract profiles\nelectricity_demand = data['P_Netz/MW'].to_numpy()\nheat_demand = data['Q_Netz/MW'].to_numpy()\nelectricity_price = data['Strompr.€/MWh'].to_numpy()\ngas_price = data['Gaspr.€/MWh'].to_numpy()\n\nprint(f'Timesteps: {len(timesteps)} ({len(timesteps) / 96:.0f} days at 15-min resolution)')\nprint(f'Heat demand: {heat_demand.min():.1f} - {heat_demand.max():.1f} MW')\nprint(f'Electricity price: {electricity_price.min():.1f} - {electricity_price.max():.1f} €/MWh')",
+  "source": "from pathlib import Path\n\n# Generate example data if not present (for local development)\ndata_file = Path('data/operational_system.nc4')\nif not data_file.exists():\n    from data.generate_example_systems import create_operational_system\n\n    fs = create_operational_system()\n    fs.optimize(fx.solvers.HighsSolver(log_to_console=False))\n    fs.to_netcdf(data_file, overwrite=True)\n\n# Load the operational system (real data from Zeitreihen2020.csv, two weeks)\nflow_system = fx.FlowSystem.from_netcdf(data_file)\n\ntimesteps = flow_system.timesteps\nprint(f'Loaded FlowSystem: {len(timesteps)} timesteps ({len(timesteps) / 96:.0f} days at 15-min resolution)')\nprint(f'Components: {list(flow_system.components.keys())}')",
```
Ensure parent directory exists before writing NetCDF.
If the `data/` directory doesn't exist, `fs.to_netcdf(data_file, overwrite=True)` will fail with a file system error. Additionally, there's no error handling for file I/O operations or optimization failures, which could lead to confusing errors for users.
Apply this diff to create the directory and add basic error handling:
```diff
 from pathlib import Path

 # Generate example data if not present (for local development)
 data_file = Path('data/operational_system.nc4')
 if not data_file.exists():
+    # Ensure parent directory exists
+    data_file.parent.mkdir(parents=True, exist_ok=True)
+
     from data.generate_example_systems import create_operational_system

+    print("Generating example data (this may take a moment)...")
     fs = create_operational_system()
     fs.optimize(fx.solvers.HighsSolver(log_to_console=False))
     fs.to_netcdf(data_file, overwrite=True)
+    print(f"Example data saved to {data_file}")

-# Load the operational system (real data from Zeitreihen2020.csv, two weeks)
+# Load the operational system
 flow_system = fx.FlowSystem.from_netcdf(data_file)

 timesteps = flow_system.timesteps
 print(f'Loaded FlowSystem: {len(timesteps)} timesteps ({len(timesteps) / 96:.0f} days at 15-min resolution)')
 print(f'Components: {list(flow_system.components.keys())}')
```

Note: The comment mentioning "real data from Zeitreihen2020.csv" is misleading since the code may generate synthetic data if the file doesn't exist.
```python
class ClusteredResult:
    def __init__(self, name, fs):
        self.name = name
        self.flow_system = fs
        self.durations = {'total': 0}  # Placeholder


optimization = ClusteredResult('Clustered', clustered_fs)
optimizations.append(optimization)
```
The durations placeholder will show misleading 0 seconds in the comparison chart.
Line 250 uses `calc.durations` for the speed comparison. This placeholder returns `{'total': 0}`, making the clustered optimization appear to take no time.
Consider capturing the actual duration:
```diff
+import timeit

+start = timeit.default_timer()
 clustered_fs = flow_system.copy().transform.cluster(
     n_clusters=n_clusters,
     cluster_duration=cluster_duration,
     include_storage=include_storage,
     time_series_for_high_peaks=time_series_for_high_peaks,
     time_series_for_low_peaks=time_series_for_low_peaks,
 )
 clustered_fs.optimize(fx.solvers.HighsSolver(0.01 / 100, 60))
+elapsed = timeit.default_timer() - start

 # Wrap in a simple object for compatibility with comparison code
 class ClusteredResult:
     def __init__(self, name, fs):
         self.name = name
         self.flow_system = fs
-        self.durations = {'total': 0}  # Placeholder
+        self.durations = {'total': elapsed}

 optimization = ClusteredResult('Clustered', clustered_fs)
```

Committable suggestion skipped: line range outside the PR's diff.
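A self-contained sketch of that timing pattern; `ClusteredResult` is reproduced from the example script, while `run_clustered_optimization` is a hypothetical stand-in for the clustering-and-optimize step:

```python
import time


class ClusteredResult:
    """Result wrapper matching the example script's shape."""

    def __init__(self, name, flow_system, duration_s):
        self.name = name
        self.flow_system = flow_system
        self.durations = {'total': duration_s}  # Real elapsed time, not a placeholder


def run_clustered_optimization(flow_system):
    # Measure wall-clock time around the whole cluster-and-optimize step.
    start = time.perf_counter()
    # ... cluster and optimize here; simulated with a short sleep ...
    time.sleep(0.01)
    elapsed = time.perf_counter() - start
    return ClusteredResult('Clustered', flow_system, elapsed)


result = run_clustered_optimization(flow_system=None)
print(result.durations['total'] > 0)  # True
```

`time.perf_counter()` is preferred over `time.time()` for interval measurement because it is monotonic and has the highest available resolution.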
Close in favour of #540
Description
Brief description of the changes in this PR.
Type of Change
Related Issues
Closes #(issue number)
Testing
Checklist
Summary by CodeRabbit
New Features
Breaking Changes
`hours_per_period` → `cluster_duration`, `nr_of_periods` → `n_clusters`, `fix_storage_flows` → `include_storage`, and additional semantic updates.

Documentation