diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md index f61055dbb..ace900221 100644 --- a/.github/PULL_REQUEST_TEMPLATE.md +++ b/.github/PULL_REQUEST_TEMPLATE.md @@ -17,5 +17,5 @@ Please fill out the following template to help us review your pull request. ## Signed-off-by -Signed-off-by: *Your Name (email)* -Date: *YYYY-MM-DD* \ No newline at end of file +Signed-off-by: *Your Name (email)* +Date: *YYYY-MM-DD* diff --git a/CLA.md b/CLA.md index 6a87eb8c4..d321de5ea 100644 --- a/CLA.md +++ b/CLA.md @@ -18,5 +18,5 @@ By submitting a pull request, patch or code snippet, you agree that: your contribution and that Authors may use, sell or license the software containing your contribution at its sole discretion. -Signed-off-by: *Enrique Tomás Martínez Beltrán (enriquetomas@um.es)* -Date: *2025-06-25* \ No newline at end of file +Signed-off-by: *Enrique Tomás Martínez Beltrán (enriquetomas@um.es)* +Date: *2025-06-25* diff --git a/COMMERCIAL_INFO.md b/COMMERCIAL_INFO.md index c22826a6e..81bcec4a7 100644 --- a/COMMERCIAL_INFO.md +++ b/COMMERCIAL_INFO.md @@ -1,6 +1,6 @@ # NEBULA Enterprise License -This repository is published under **GNU AGPL v3.0**. +This repository is published under **GNU AGPL v3.0**. If you wish to embed NEBULA in closed-source products, offer it as a hosted service, or obtain an SLA, please e-mail **enriquetomas@um.es** and **alberto.huertas@um.es**. -A bespoke commercial agreement (OEM / subscription / SaaS) will be provided on request. \ No newline at end of file +A bespoke commercial agreement (OEM / subscription / SaaS) will be provided on request. diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index a123f7cc5..6a2779a85 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -6,7 +6,7 @@ Follow conventional-commit style. ## 2 • Sign the CLA When you open your first Pull Request, **CLA-assistant** will block the merge until you tick the box confirming you accept the -[ICLA](CLA.md). +[ICLA](CLA.md). 
Add a Developer-Certificate-of-Origin line in every commit: ``` @@ -23,4 +23,4 @@ The pull request will be reviewed by the maintainers. The maintainers will provide feedback on the pull request. ## 6 • Merge the Pull Request -The pull request will be merged by the maintainers. \ No newline at end of file +The pull request will be merged by the maintainers. diff --git a/README.md b/README.md index 5dfc69a53..725d78280 100755 --- a/README.md +++ b/README.md @@ -159,4 +159,33 @@ We would like to thank the following projects for their contributions which have - [FastAPI](https://github.com/tiangolo/fastapi) for the RESTful API - [Fedstellar](https://github.com/CyberDataLab/fedstellar) platform and [p2pfl](https://github.com/pguijas/p2pfl/) library - [Adversarial Robustness Toolbox (ART)](https://github.com/Trusted-AI/adversarial-robustness-toolbox) for the implementation of adversarial attacks +- [Opacus](https://github.com/meta-pytorch/opacus) for differential privacy training support +- [AI Fairness 360 (AIF360)](https://github.com/Trusted-AI/AIF360) for fairness metric definitions +- [HolisticAI](https://github.com/holistic-ai/holisticai) for trustworthiness and fairness metric definitions - [D3.js](https://github.com/d3/d3-force) for the network visualizations + +## Third-party Differential Privacy + +NEBULA uses Opacus for differential privacy training: + +- Yousefpour, A., Shilov, I., Sablayrolles, A., Testuggine, D., Prasad, K., Malek, M., Nguyen, J., Ghosh, S., Bharadwaj, A., Zhao, J., Cormode, G., & Mironov, I. (2021). Opacus: User-Friendly Differential Privacy Library in PyTorch. arXiv:2109.12298. Licensed under Apache License 2.0: https://github.com/meta-pytorch/opacus/blob/main/LICENSE + +## Third-party Trustworthiness Metrics + +NEBULA implements some trustworthiness and fairness metrics following definitions documented in external toolkits: + +- AI Fairness 360 (AIF360). AI Fairness 360 [Software]. https://github.com/Trusted-AI/AIF360. 
Licensed under Apache License 2.0: https://github.com/Trusted-AI/AIF360/blob/main/LICENSE + +- Holistic AI. HolisticAI [Software]. https://github.com/holistic-ai/holisticai. Licensed under Apache License 2.0: https://github.com/holistic-ai/holisticai/blob/main/LICENSE + +## Third-party Tabular Datasets + +NEBULA preprocesses these datasets for experiments, including splitting, scaling, encoding, label mapping, filtering, and/or sample limiting depending on the dataset. + +- Becker, B. & Kohavi, R. (1996). Adult [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5XW20. Licensed under CC BY 4.0: https://creativecommons.org/licenses/by/4.0/ + +- Blackard, J. (1998). Covertype [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C50K5N. Licensed under CC BY 4.0: https://creativecommons.org/licenses/by/4.0/ + +- Wolberg, W., Mangasarian, O., Street, N., & Street, W. (1993). Breast Cancer Wisconsin (Diagnostic) [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5DW2B. Licensed under CC BY 4.0: https://creativecommons.org/licenses/by/4.0/ + +- Stolfo, S., Fan, W., Lee, W., Prodromidis, A., & Chan, P. (1999). KDD Cup 1999 Data [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C51C7N. Licensed under CC BY 4.0: https://creativecommons.org/licenses/by/4.0/ diff --git a/app/deployer.py b/app/deployer.py index 968ef62da..fc21f23ef 100644 --- a/app/deployer.py +++ b/app/deployer.py @@ -289,17 +289,17 @@ def run_script(self, script): def kill_script_processes(self, pids_file): """ Forcefully terminates processes listed in a given PID file, including their child processes. - + Args: pids_file (str): Path to the file containing PIDs, one per line. - + Behavior: - Reads the PIDs from the file. - For each PID, checks if the process exists. - If it exists, kills all child processes recursively before killing the main process. - Handles and logs exceptions such as missing processes or invalid PID entries. 
- Logs warnings and errors appropriately. - + Typical use case: Used to clean up running processes related to a scenario or script that has been deleted or stopped. """ @@ -344,7 +344,7 @@ def run_observer(): """ Starts a watchdog observer to monitor the configuration directory for changes. - This function is typically used to execute additional scripts or trigger events + This function is typically used to execute additional scripts or trigger events during the execution of a federated learning session by monitoring file system changes. Main functionalities: @@ -357,7 +357,7 @@ def run_observer(): - Trigger specific actions during a federation lifecycle. Note: - The observer runs in a blocking mode and will keep the process alive + The observer runs in a blocking mode and will keep the process alive until manually stopped or interrupted. """ # Watchdog for running additional scripts in the host machine (i.e. during the execution of a federation) @@ -373,7 +373,7 @@ class Deployer: """ Handles the configuration and initialization of deployment parameters for the NEBULA system. - This class reads and stores various deployment-related settings such as port assignments, + This class reads and stores various deployment-related settings such as port assignments, environment paths, logging configuration, and system mode (production, development, or simulation). Main functionalities: @@ -438,7 +438,7 @@ def configure_logger(self): """ Configures the logging system for the deployment controller. - This method sets up both console and file logging with a consistent format and appropriate log levels. + This method sets up both console and file logging with a consistent format and appropriate log levels. It also ensures that Uvicorn loggers are properly configured to avoid duplicate log outputs. Main functionalities: @@ -452,7 +452,7 @@ def configure_logger(self): - Ensures clean and consistent logging output during deployment. 
Note: - This method does not set up file logging directly, but prepares the base configuration + This method does not set up file logging directly, but prepares the base configuration and Uvicorn logger behavior for further logging use. """ log_console_format = "[%(asctime)s] [%(name)s] [%(levelname)s] %(message)s" @@ -475,7 +475,7 @@ def ensure_directory_access(self, directory_path: str) -> str: """ Ensures that the specified directory exists and is writable. - This method attempts to create the directory if it does not exist and verifies + This method attempts to create the directory if it does not exist and verifies write access by writing and deleting a temporary metadata file. Args: @@ -521,8 +521,8 @@ def start(self): """ Starts the NEBULA deployment process and all associated services. - This method initializes the NEBULA platform by setting up the environment, - checking port availability, starting key services (controller, frontend, WAF), + This method initializes the NEBULA platform by setting up the environment, + checking port availability, starting key services (controller, frontend, WAF), and launching a filesystem observer to react to configuration changes. Main functionalities: @@ -539,7 +539,7 @@ def start(self): - Central entry point for managing NEBULA components during deployment. Note: - The method blocks indefinitely until manually interrupted, + The method blocks indefinitely until manually interrupted, and ensures graceful shutdown upon receiving SIGINT or SIGTERM. """ banner = """ @@ -616,8 +616,8 @@ def signal_handler(self, sig, frame): """ Handles system termination signals to ensure a clean shutdown. - This method is triggered when the application receives SIGTERM or SIGINT signals - (e.g., via Ctrl+C or `kill`). It logs the event, performs cleanup actions, and + This method is triggered when the application receives SIGTERM or SIGINT signals + (e.g., via Ctrl+C or `kill`). 
It logs the event, performs cleanup actions, and terminates the process gracefully. Args: @@ -749,7 +749,7 @@ def run_controller(self): ) network_name = f"{os.environ['USER']}_nebula-net-base" - + try: subprocess.check_call(["nvidia-smi"]) self.gpu_available = True diff --git a/app/windows/install.ps1 b/app/windows/install.ps1 index 88aa7f1ac..6d9dc7004 100644 --- a/app/windows/install.ps1 +++ b/app/windows/install.ps1 @@ -1,2 +1,2 @@ # Run make install -make install \ No newline at end of file +make install diff --git a/docs/_prebuilt/commercial-faq.md b/docs/_prebuilt/commercial-faq.md index 197ef81a9..274c0fcbd 100644 --- a/docs/_prebuilt/commercial-faq.md +++ b/docs/_prebuilt/commercial-faq.md @@ -1,13 +1,13 @@ # Commercial FAQ — NEBULA Enterprise -**Q 1. What does the commercial license cover?** +**Q 1. What does the commercial license cover?** To be determined. -**Q 2. Does the commercial edition include extra features?** +**Q 2. Does the commercial edition include extra features?** To be determined. -**Q 3. Pricing model?** +**Q 3. Pricing model?** To be determined. -**Q 4. Can we contribute back fixes?** -Absolutely; your patches remain under AGPL in the community edition, and you can keep proprietary extensions private under the commercial agreement. \ No newline at end of file +**Q 4. Can we contribute back fixes?** +Absolutely; your patches remain under AGPL in the community edition, and you can keep proprietary extensions private under the commercial agreement. 
diff --git a/nebula/addons/attacks/communications/floodingattack.py b/nebula/addons/attacks/communications/floodingattack.py index 146854fa3..73dc0394c 100644 --- a/nebula/addons/attacks/communications/floodingattack.py +++ b/nebula/addons/attacks/communications/floodingattack.py @@ -69,9 +69,9 @@ async def wrapper(*args, **kwargs): ) _, *new_args = args # Exclude self argument await func(*new_args, **kwargs) - _, *new_args = args + _, *new_args = args return await func(*new_args) - + return wrapper return decorator diff --git a/nebula/addons/attacks/dataset/datapoison.py b/nebula/addons/attacks/dataset/datapoison.py index 7b22d37d7..40dd7bdee 100755 --- a/nebula/addons/attacks/dataset/datapoison.py +++ b/nebula/addons/attacks/dataset/datapoison.py @@ -71,6 +71,59 @@ def _convert_to_tensor(self, data: torch.Tensor | Image.Image | tuple) -> torch. else: return torch.tensor(data) + def _restore_data_format(self, data, original): + if isinstance(data, torch.Tensor): + array_data = data.detach().cpu().numpy() + else: + array_data = np.asarray(data) + + original_shape = None + if isinstance(original, torch.Tensor): + original_shape = tuple(original.shape) + elif isinstance(original, Image.Image): + original_shape = np.array(original).shape + elif hasattr(original, "shape"): + original_shape = tuple(original.shape) + + if original_shape is not None and array_data.shape != original_shape and array_data.size == np.prod(original_shape): + array_data = array_data.reshape(original_shape) + + if isinstance(original, torch.Tensor): + restored = torch.as_tensor(array_data, device=original.device) + if original.dtype.is_floating_point: + original_max = original.detach().max() if original.numel() > 0 else torch.tensor(1.0, device=original.device) + if restored.numel() > 0 and original_max > 1 and restored.min() >= 0 and restored.max() <= 1: + restored = restored * original_max + return restored.to(dtype=original.dtype) + + if restored.numel() > 0 and restored.min() >= 0 and 
restored.max() <= 1: + restored = restored * torch.iinfo(original.dtype).max + return restored.clamp(torch.iinfo(original.dtype).min, torch.iinfo(original.dtype).max).to(dtype=original.dtype) + + if isinstance(original, Image.Image): + original_array = np.array(original) + restored = self._restore_array_dtype(array_data, original_array.dtype, original_array) + return Image.fromarray(restored, mode=original.mode) + + if isinstance(original, np.ndarray): + return self._restore_array_dtype(array_data, original.dtype, original) + + return data + + def _restore_array_dtype(self, data: np.ndarray, dtype: np.dtype, original: np.ndarray | None = None) -> np.ndarray: + dtype = np.dtype(dtype) + if np.issubdtype(dtype, np.integer): + if data.size > 0 and data.min() >= 0 and data.max() <= 1: + data = data * np.iinfo(dtype).max + return np.rint(np.clip(data, np.iinfo(dtype).min, np.iinfo(dtype).max)).astype(dtype) + + if original is not None and data.size > 0 and original.size > 0: + original_max = np.max(original) + if original_max > 1 and data.min() >= 0 and data.max() <= 1: + data = data * original_max + + return data.astype(dtype) + def _handle_single_point(self, tensor: torch.Tensor) -> tuple[torch.Tensor, bool]: """ Handle single point tensors by reshaping them. @@ -100,7 +153,7 @@ def __init__(self, noise_type: str): """ self.noise_type = noise_type.lower() - def apply_noise(self, t: torch.Tensor | Image.Image, poisoned_noise_percent: float) -> torch.Tensor: + def apply_noise(self, t: torch.Tensor | Image.Image, poisoned_noise_percent: float): """ Applies noise to a tensor based on the specified noise type and poisoning percentage. 
@@ -109,9 +162,10 @@ def apply_noise(self, t: torch.Tensor | Image.Image, poisoned_noise_percent: flo poisoned_noise_percent: The percentage of noise to be applied (0-100) Returns: - The tensor with noise applied + The poisoned data in the same format as the input """ - t = self._convert_to_tensor(t) + original = t[0] if isinstance(t, tuple) else t + t = self._convert_to_tensor(original) t, is_single_point = self._handle_single_point(t) arr = t.detach().cpu().numpy() @@ -122,21 +176,21 @@ def apply_noise(self, t: torch.Tensor | Image.Image, poisoned_noise_percent: flo ) if self.noise_type == "salt": - poisoned = torch.tensor(random_noise(arr, mode=self.noise_type, amount=poisoned_ratio)) + poisoned = random_noise(arr, mode=self.noise_type, amount=poisoned_ratio) elif self.noise_type == "gaussian": - poisoned = torch.tensor(random_noise(arr, mode=self.noise_type, mean=0, var=poisoned_ratio, clip=True)) + poisoned = random_noise(arr, mode=self.noise_type, mean=0, var=poisoned_ratio, clip=True) elif self.noise_type == "s&p": - poisoned = torch.tensor(random_noise(arr, mode=self.noise_type, amount=poisoned_ratio)) + poisoned = random_noise(arr, mode=self.noise_type, amount=poisoned_ratio) elif self.noise_type == "nlp_rawdata": poisoned = self.poison_to_nlp_rawdata(arr, poisoned_ratio) else: logging.info(f"ERROR: noise_type '{self.noise_type}' not supported in data poison attack.") - return t + return original if is_single_point: poisoned = poisoned[0] - return poisoned + return self._restore_data_format(poisoned, original) def poison_to_nlp_rawdata(self, text_data: list, poisoned_ratio: float) -> list: """ @@ -221,7 +275,7 @@ def __init__(self, target_label: int): """ self.target_label = target_label - def add_x_to_image(self, img: torch.Tensor | Image.Image) -> torch.Tensor: + def add_x_to_image(self, img: torch.Tensor | Image.Image): """ Adds a 10x10 pixel 'X' mark to the top-left corner of an image. 
@@ -229,10 +283,11 @@ def add_x_to_image(self, img: torch.Tensor | Image.Image) -> torch.Tensor: img: Input image tensor or PIL Image Returns: - Modified image with X pattern + Modified image in the same format as the input """ logging.info(f"[{self.__class__.__name__}] Adding X pattern to image") - img = self._convert_to_tensor(img) + original = img[0] if isinstance(img, tuple) else img + img = self._convert_to_tensor(original) img, is_single_point = self._handle_single_point(img) # Handle batch dimension if present @@ -267,7 +322,7 @@ def add_x_to_image(self, img: torch.Tensor | Image.Image) -> torch.Tensor: if is_single_point: img = img[0] - return img + return self._restore_data_format(img, original) def poison_data( self, diff --git a/nebula/addons/attacks/model/gllneuroninversion.py b/nebula/addons/attacks/model/gllneuroninversion.py index 64cb3d215..e52a5a930 100644 --- a/nebula/addons/attacks/model/gllneuroninversion.py +++ b/nebula/addons/attacks/model/gllneuroninversion.py @@ -66,4 +66,4 @@ def model_attack(self, received_weights): # Inject random noise of the same shape and type received_weights[target_key] = torch.empty_like(target_weights).uniform_(0, noise_scale) - return received_weights \ No newline at end of file + return received_weights diff --git a/nebula/addons/attacks/model/swappingweights.py b/nebula/addons/attacks/model/swappingweights.py index 95aa89208..36eb1e7a0 100644 --- a/nebula/addons/attacks/model/swappingweights.py +++ b/nebula/addons/attacks/model/swappingweights.py @@ -109,4 +109,4 @@ def model_attack(self, received_weights): if self.layer_idx + 2 < len(layer_keys): received_weights[layer_keys[self.layer_idx + 2]] = received_weights[layer_keys[self.layer_idx + 2]][:, perm] - return received_weights \ No newline at end of file + return received_weights diff --git a/nebula/addons/defenses/__init__.py b/nebula/addons/defenses/__init__.py new file mode 100644 index 000000000..5e1105d48 --- /dev/null +++ 
b/nebula/addons/defenses/__init__.py @@ -0,0 +1 @@ +"""Defense add-ons for Nebula.""" diff --git a/nebula/addons/defenses/adversarial_training/__init__.py b/nebula/addons/defenses/adversarial_training/__init__.py new file mode 100644 index 000000000..772ac6cda --- /dev/null +++ b/nebula/addons/defenses/adversarial_training/__init__.py @@ -0,0 +1,57 @@ +from nebula.addons.defenses.adversarial_training.defense import ( + ERR_ALPHA, + ERR_APPLY_PROBABILITY, + ERR_CANDIDATE_SELECTION, + ERR_EPSILON, + ERR_IMAGE_ATTACK, + ERR_LOSS_INCREASE, + ERR_MARGIN_WINDOW, + ERR_MODE, + ERR_STEPS, + ERR_TABULAR_ATTACK, + ERR_TABULAR_METADATA, + ERR_UNSUPPORTED_ATTACK, + IMAGE_ADVERSARIAL_ATTACKS, + IMAGE_DATASET_NORMALIZATION, + TABULAR_ADVERSARIAL_ATTACKS, + TABULAR_ADVERSARIAL_DATASETS, + AdversarialExampleGenerator, + AdversarialTrainingConfig, + AdversarialTrainingDefense, + ImageAdversarialExampleGenerator, + ImageFGSMGenerator, + ImagePGDGenerator, + TabularAdversarialExampleGenerator, + TabularConstrainedPGDGenerator, + TabularConstraintSet, + apply_adversarial_training_if_enabled, +) + +__all__ = [ + "ERR_ALPHA", + "ERR_APPLY_PROBABILITY", + "ERR_CANDIDATE_SELECTION", + "ERR_EPSILON", + "ERR_IMAGE_ATTACK", + "ERR_LOSS_INCREASE", + "ERR_MARGIN_WINDOW", + "ERR_MODE", + "ERR_STEPS", + "ERR_TABULAR_ATTACK", + "ERR_TABULAR_METADATA", + "ERR_UNSUPPORTED_ATTACK", + "IMAGE_ADVERSARIAL_ATTACKS", + "IMAGE_DATASET_NORMALIZATION", + "TABULAR_ADVERSARIAL_ATTACKS", + "TABULAR_ADVERSARIAL_DATASETS", + "AdversarialExampleGenerator", + "AdversarialTrainingConfig", + "AdversarialTrainingDefense", + "ImageAdversarialExampleGenerator", + "ImageFGSMGenerator", + "ImagePGDGenerator", + "TabularAdversarialExampleGenerator", + "TabularConstrainedPGDGenerator", + "TabularConstraintSet", + "apply_adversarial_training_if_enabled", +] diff --git a/nebula/addons/defenses/adversarial_training/base.py b/nebula/addons/defenses/adversarial_training/base.py new file mode 100644 index 000000000..3e3c1fc48 
--- /dev/null +++ b/nebula/addons/defenses/adversarial_training/base.py @@ -0,0 +1,30 @@ +from abc import ABC, abstractmethod + +import torch + + +class AdversarialExampleGenerator(ABC): + """Base interface for domain-specific adversarial example generators.""" + + last_epsilon: float | None = None + + @abstractmethod + def generate(self, model, x, y, criterion): + # Concrete generators must return an adversarial version of the input batch. + raise NotImplementedError + + def _sample_epsilon(self, device: torch.device) -> float: + # Sample the effective epsilon on the same device as the batch. + epsilon_max = float(self.config.epsilon) + if epsilon_max <= 0.0: + self.last_epsilon = 0.0 + return 0.0 + + # Use a different attack strength per batch, capped by the user epsilon. + epsilon_min = epsilon_max / 4.0 + epsilon_step = epsilon_max / 8.0 + num_values = max(round((epsilon_max - epsilon_min) / epsilon_step) + 1, 1) + index = int(torch.randint(num_values, (), device=device).item()) + epsilon = min(epsilon_min + index * epsilon_step, epsilon_max) + self.last_epsilon = epsilon + return epsilon diff --git a/nebula/addons/defenses/adversarial_training/config.py b/nebula/addons/defenses/adversarial_training/config.py new file mode 100644 index 000000000..48dd73e6e --- /dev/null +++ b/nebula/addons/defenses/adversarial_training/config.py @@ -0,0 +1,129 @@ +from dataclasses import dataclass +from typing import Any + +from nebula.core.datasets.image_metadata import IMAGE_DATASET_NORMALIZATION + +IMAGE_ADVERSARIAL_ATTACKS = {"fgsm", "pgd"} +TABULAR_ADVERSARIAL_ATTACKS = {"constrained_pgd"} +TABULAR_ADVERSARIAL_DATASETS = {"AdultCensus", "BreastCancer", "Covtype", "KDDCUP99"} + +ERR_IMAGE_ATTACK = "image adversarial_training.attack must be one of: fgsm, pgd" +ERR_TABULAR_ATTACK = "tabular adversarial_training.attack must be one of: constrained_pgd" +ERR_MODE = "adversarial_training.mode must be one of: adversarial, mixed" +ERR_EPSILON = "adversarial_training.epsilon must be 
>= 0" +ERR_ALPHA = "adversarial_training.alpha must be >= 0" +ERR_STEPS = "adversarial_training.steps must be >= 1" +ERR_APPLY_PROBABILITY = "adversarial_training.apply_probability must be in [0, 1]" +ERR_CANDIDATE_SELECTION = ( + "tabular adversarial_training.candidate_selection must be one of: none, loss_window, margin_window" +) +ERR_LOSS_INCREASE = "adversarial_training loss increase thresholds must be >= 0 and target <= max" +ERR_MARGIN_WINDOW = "adversarial_training margin thresholds must satisfy target_margin <= max_margin" +ERR_TABULAR_METADATA = "Tabular adversarial training requires tabular_metadata" +ERR_UNSUPPORTED_ATTACK = "Unsupported adversarial training attack: {attack}" + +@dataclass(frozen=True) +class AdversarialTrainingConfig: + enabled: bool = False + dataset_name: str | None = None + domain: str = "image" + attack: str = "fgsm" + epsilon: float = 8.0 / 255.0 + alpha: float | None = None + steps: int = 1 + mode: str = "mixed" + clean_weight: float = 0.5 + adversarial_weight: float = 0.5 + apply_probability: float = 0.3 + log_adversarial_metrics: bool = True + candidate_selection: str = "none" + target_loss_increase: float | None = None + max_loss_increase: float | None = None + target_margin: float | None = 0.0 + max_margin: float | None = 0.5 + + +def config_from_participant(participant_config: dict[str, Any]) -> AdversarialTrainingConfig | None: + # Read the raw participant config and normalize it into a typed defense config. 
+ raw = participant_config.get("defense_args", {}).get("adversarial_training", {}) + if not raw or not raw.get("enabled", False): + return None + + dataset_name = participant_config.get("data_args", {}).get("dataset") + domain = str(raw.get("domain", "image")).lower() + attack = str(raw.get("attack", "constrained_pgd" if domain == "tabular" else "fgsm")).lower() + + mode = str(raw.get("mode", "mixed")).lower() + clean_weight, adversarial_weight = _loss_weights_for_mode(mode) + + return AdversarialTrainingConfig( + enabled=True, + dataset_name=dataset_name, + domain=domain, + attack=attack, + epsilon=float(raw.get("epsilon", 8.0 / 255.0)), + alpha=float(raw["alpha"]) if raw.get("alpha") is not None else None, + steps=int(raw.get("steps", 1)), + mode=mode, + clean_weight=clean_weight, + adversarial_weight=adversarial_weight, + apply_probability=float(raw.get("apply_probability", 0.3)), + log_adversarial_metrics=True, + candidate_selection=str(raw.get("candidate_selection", "none")).lower(), + target_loss_increase=float(raw["target_loss_increase"]) + if raw.get("target_loss_increase") is not None + else None, + max_loss_increase=float(raw["max_loss_increase"]) + if raw.get("max_loss_increase") is not None + else None, + target_margin=float(raw["target_margin"]) + if raw.get("target_margin") is not None + else 0.0, + max_margin=float(raw["max_margin"]) + if raw.get("max_margin") is not None + else 0.5, + ) + + +def _loss_weights_for_mode(mode: str) -> tuple[float, float]: + if mode == "adversarial": + return 0.0, 1.0 + return 0.5, 0.5 + + +def validate_config(config: AdversarialTrainingConfig) -> None: + # Fail early when a frontend/backend config value cannot produce a valid attack. 
+ if config.mode not in {"adversarial", "mixed"}: + raise ValueError(ERR_MODE) + if config.domain == "image" and config.attack not in IMAGE_ADVERSARIAL_ATTACKS: + raise ValueError(ERR_IMAGE_ATTACK) + if config.domain == "tabular" and config.attack not in TABULAR_ADVERSARIAL_ATTACKS: + raise ValueError(ERR_TABULAR_ATTACK) + if config.domain == "tabular" and config.candidate_selection not in {"none", "loss_window", "margin_window"}: + raise ValueError(ERR_CANDIDATE_SELECTION) + if config.domain == "image" and config.candidate_selection != "none": + raise ValueError(ERR_CANDIDATE_SELECTION) + if config.epsilon < 0: + raise ValueError(ERR_EPSILON) + if config.alpha is not None and config.alpha < 0: + raise ValueError(ERR_ALPHA) + if config.steps < 1: + raise ValueError(ERR_STEPS) + if not 0.0 <= config.apply_probability <= 1.0: + raise ValueError(ERR_APPLY_PROBABILITY) + if config.target_loss_increase is not None and config.target_loss_increase < 0: + raise ValueError(ERR_LOSS_INCREASE) + if config.max_loss_increase is not None and config.max_loss_increase < 0: + raise ValueError(ERR_LOSS_INCREASE) + if ( + config.target_loss_increase is not None + and config.max_loss_increase is not None + and config.target_loss_increase > config.max_loss_increase + ): + raise ValueError(ERR_LOSS_INCREASE) + if ( + config.target_margin is not None + and config.max_margin is not None + and config.target_margin > config.max_margin + ): + raise ValueError(ERR_MARGIN_WINDOW) diff --git a/nebula/addons/defenses/adversarial_training/defense.py b/nebula/addons/defenses/adversarial_training/defense.py new file mode 100644 index 000000000..1e0cbc8de --- /dev/null +++ b/nebula/addons/defenses/adversarial_training/defense.py @@ -0,0 +1,270 @@ +import logging +from typing import Any + +import torch + +from nebula.addons.defenses.adversarial_training.base import AdversarialExampleGenerator +from nebula.addons.defenses.adversarial_training.config import ( + ERR_ALPHA, + ERR_APPLY_PROBABILITY, + 
ERR_CANDIDATE_SELECTION, + ERR_EPSILON, + ERR_IMAGE_ATTACK, + ERR_LOSS_INCREASE, + ERR_MARGIN_WINDOW, + ERR_MODE, + ERR_STEPS, + ERR_TABULAR_ATTACK, + ERR_TABULAR_METADATA, + ERR_UNSUPPORTED_ATTACK, + IMAGE_ADVERSARIAL_ATTACKS, + IMAGE_DATASET_NORMALIZATION, + TABULAR_ADVERSARIAL_ATTACKS, + TABULAR_ADVERSARIAL_DATASETS, + AdversarialTrainingConfig, + config_from_participant, + validate_config, +) +from nebula.core.datasets.image_metadata import get_image_normalization +from nebula.addons.defenses.adversarial_training.image import ( + ImageAdversarialExampleGenerator, + ImageFGSMGenerator, + ImagePGDGenerator, +) +from nebula.addons.defenses.adversarial_training.logging import AdversarialTrainingSampleLogger +from nebula.addons.defenses.adversarial_training.tabular import ( + TabularAdversarialExampleGenerator, + TabularConstrainedPGDGenerator, + TabularConstraintSet, +) +from nebula.core.datasets.tabular_metadata import CATEGORICAL, CONTINUOUS, INTEGER, TabularAdversarialMetadata + + +class AdversarialTrainingDefense: + """Batch-level adversarial training defense for Nebula models.""" + + LOGGED_SAMPLES_PER_ROUND = AdversarialTrainingSampleLogger.LOGGED_SAMPLES_PER_ROUND + + def __init__(self, config: AdversarialTrainingConfig, generator: AdversarialExampleGenerator): + # Keep the selected generator and logger together for each participant model. + self.config = config + self.generator = generator + self.sample_logger = AdversarialTrainingSampleLogger(config, generator) + self._logged_adversarial_samples_by_round = self.sample_logger._logged_samples_by_round + + @classmethod + def from_participant_config( + cls, + participant_config: dict[str, Any], + partition=None, + ) -> "AdversarialTrainingDefense | None": + # This is the only entry point used by Nebula's node setup. 
+ config = config_from_participant(participant_config) + if config is None: + return None + validate_config(config) + + if config.domain == "tabular": + metadata = cls._get_tabular_metadata(partition) + return cls(config=config, generator=TabularConstrainedPGDGenerator(config, metadata)) + + if config.domain == "image": + # Image attacks run in normalized model space, so each dataset must provide mean/std. + normalization = get_image_normalization(config.dataset_name) + if normalization is None: + logging.warning( + "[AdversarialTrainingDefense] Skipping adversarial training: dataset '%s' has no image bounds", + config.dataset_name, + ) + return None + + return cls(config=config, generator=cls._build_image_generator(config, normalization)) + + logging.warning( + "[AdversarialTrainingDefense] Skipping adversarial training: domain '%s' is not implemented yet", + config.domain, + ) + return None + + @staticmethod + def _build_image_generator(config, normalization): + # Choose the image attack implementation requested by the participant config. + mean, std = normalization + if config.attack == "fgsm": + return ImageFGSMGenerator(config, mean, std) + if config.attack == "pgd": + return ImagePGDGenerator(config, mean, std) + raise ValueError(ERR_UNSUPPORTED_ATTACK.format(attack=config.attack)) + + @staticmethod + def _get_tabular_metadata(partition) -> TabularAdversarialMetadata: + # Load the tabular constraints from the local training partition. + train_set = getattr(partition, "train_set", None) if partition is not None else None + metadata = getattr(train_set, "tabular_metadata", None) + if metadata is None: + raise ValueError(ERR_TABULAR_METADATA) + # Metadata can come from an in-memory dataset object or from a serialized config. 
+ if isinstance(metadata, TabularAdversarialMetadata): + tabular_metadata = metadata + else: + tabular_metadata = TabularAdversarialMetadata.from_dict(metadata) + + _log_tabular_metadata(tabular_metadata) + return tabular_metadata + + def should_apply(self, x: torch.Tensor) -> bool: + # Allows adversarial training to be applied to only a fraction of batches. + if self.config.apply_probability >= 1.0: + return True + if self.config.apply_probability <= 0.0: + return False + return bool(torch.rand((), device=x.device).item() < self.config.apply_probability) + + def compute_training_step(self, model, x, y, criterion): + if not self.should_apply(x): + logits = model(x) + loss = criterion(logits, y) + return loss, logits, {} + + # Generate x_adv once and reuse it for logging, adversarial loss and metrics. + x_adv = self.generator.generate(model, x, y, criterion) + self._log_adversarial_samples(model, x, x_adv, y) + adv_logits = model(x_adv) + adv_loss = criterion(adv_logits, y) + + # "adversarial" replaces the clean batch loss completely. + if self.config.mode == "adversarial": + return adv_loss, adv_logits, self._extra_metrics({ + "Adversarial Loss": adv_loss, + "Adversarial Accuracy": self._accuracy(adv_logits, y), + }) + + clean_logits = model(x) + clean_loss = criterion(clean_logits, y) + # "mixed" uses a fixed 50/50 clean/adversarial objective. + loss = self.config.clean_weight * clean_loss + self.config.adversarial_weight * adv_loss + + return loss, clean_logits, self._extra_metrics({ + "Clean Loss": clean_loss, + "Adversarial Loss": adv_loss, + "Adversarial Accuracy": self._accuracy(adv_logits, y), + }) + + def _log_adversarial_samples(self, model, x_clean: torch.Tensor, x_adv: torch.Tensor, y: torch.Tensor) -> None: + # Delegate logging so the training step stays focused on loss computation. + self.sample_logger.log(model, x_clean, x_adv, y) + + def _accuracy(self, logits, y): + # Compute batch accuracy from model logits. 
+ predictions = torch.argmax(logits, dim=1) + return torch.mean((predictions == y).float()) + + def _extra_metrics(self, metrics): + # Allow users to disable adversarial metrics without changing the training loss. + if not self.config.log_adversarial_metrics: + return {} + return metrics + + +def _log_tabular_metadata(tabular_metadata: TabularAdversarialMetadata) -> None: + # Log a compact metadata summary to make constrained PGD setup auditable. + integer_features = _feature_names_by_type(tabular_metadata, {INTEGER}) + continuous_features = _feature_names_by_type(tabular_metadata, {CONTINUOUS}) + categorical_features = _feature_names_by_type(tabular_metadata, {CATEGORICAL}) + non_perturbable_features = _feature_names_excluding_types( + tabular_metadata, + {CONTINUOUS, INTEGER, CATEGORICAL}, + ) + logging.info( + "[AdversarialTrainingDefense] Tabular feature mask loaded | integer=%s | continuous=%s | " + "categorical=%s | categorical_groups=%s | non_perturbable=%s | integer_features=%s | " + "continuous_features=%s | categorical_preview=%s | non_perturbable_preview=%s", + len(integer_features), + len(continuous_features), + len(categorical_features), + len(tabular_metadata.categorical_groups or []), + len(non_perturbable_features), + integer_features, + continuous_features, + categorical_features[:20], + non_perturbable_features[:20], + ) + + +def _feature_names_by_type(tabular_metadata: TabularAdversarialMetadata, feature_types: set[str]) -> list[str]: + # Return feature names whose metadata type is included in feature_types. + return [ + name + for name, feature_type in zip(tabular_metadata.feature_names, tabular_metadata.feature_types, strict=True) + if feature_type in feature_types + ] + + +def _feature_names_excluding_types(tabular_metadata: TabularAdversarialMetadata, feature_types: set[str]) -> list[str]: + # Return feature names whose metadata type is not included in feature_types. 
+ return [ + name + for name, feature_type in zip(tabular_metadata.feature_names, tabular_metadata.feature_types, strict=True) + if feature_type not in feature_types + ] + + +def apply_adversarial_training_if_enabled(model, participant_config: dict[str, Any], partition=None) -> None: + # Attach the defense to the model only when the participant config enables it. + defense = AdversarialTrainingDefense.from_participant_config(participant_config, partition=partition) + if defense is not None: + model.set_adversarial_training(defense) + logging.info( + "[AdversarialTrainingDefense] Enabled | dataset=%s | attack=%s | epsilon_max=%s | " + "epsilon_range=[%.6f, %.6f] | epsilon_step=%.6f | steps=%s | mode=%s | " + "clean_weight=%.2f | adversarial_weight=%.2f | apply_probability=%.2f | " + "candidate_selection=%s | target_loss_increase=%s | max_loss_increase=%s | " + "target_margin=%s | max_margin=%s | log_adversarial_metrics=%s", + defense.config.dataset_name, + defense.config.attack, + defense.config.epsilon, + defense.config.epsilon / 4.0, + defense.config.epsilon, + defense.config.epsilon / 8.0, + defense.config.steps, + defense.config.mode, + defense.config.clean_weight, + defense.config.adversarial_weight, + defense.config.apply_probability, + defense.config.candidate_selection, + defense.config.target_loss_increase, + defense.config.max_loss_increase, + defense.config.target_margin, + defense.config.max_margin, + defense.config.log_adversarial_metrics, + ) + + +__all__ = [ + "ERR_ALPHA", + "ERR_APPLY_PROBABILITY", + "ERR_CANDIDATE_SELECTION", + "ERR_EPSILON", + "ERR_IMAGE_ATTACK", + "ERR_LOSS_INCREASE", + "ERR_MARGIN_WINDOW", + "ERR_MODE", + "ERR_STEPS", + "ERR_TABULAR_ATTACK", + "ERR_TABULAR_METADATA", + "ERR_UNSUPPORTED_ATTACK", + "IMAGE_ADVERSARIAL_ATTACKS", + "IMAGE_DATASET_NORMALIZATION", + "TABULAR_ADVERSARIAL_ATTACKS", + "TABULAR_ADVERSARIAL_DATASETS", + "AdversarialExampleGenerator", + "AdversarialTrainingConfig", + "AdversarialTrainingDefense", + 
"ImageAdversarialExampleGenerator", + "ImageFGSMGenerator", + "ImagePGDGenerator", + "TabularAdversarialExampleGenerator", + "TabularConstrainedPGDGenerator", + "TabularConstraintSet", + "apply_adversarial_training_if_enabled", +] diff --git a/nebula/addons/defenses/adversarial_training/image.py b/nebula/addons/defenses/adversarial_training/image.py new file mode 100644 index 000000000..585231f32 --- /dev/null +++ b/nebula/addons/defenses/adversarial_training/image.py @@ -0,0 +1,87 @@ +import torch + +from nebula.addons.defenses.adversarial_training.base import AdversarialExampleGenerator +from nebula.addons.defenses.adversarial_training.config import AdversarialTrainingConfig + +IMAGE_CLIP_MIN = 0.0 +IMAGE_CLIP_MAX = 1.0 + + +class ImageAdversarialExampleGenerator(AdversarialExampleGenerator): + def __init__(self, config: AdversarialTrainingConfig, mean: tuple[float, ...], std: tuple[float, ...]): + # Store normalization values so attacks can move between pixel and model space. + self.config = config + self.mean = mean + self.std = std + + def _channel_tensor(self, values: tuple[float, ...], x: torch.Tensor) -> torch.Tensor: + # Reshape per-channel values so they broadcast over the whole image batch. + shape = [1, len(values)] + [1] * max(x.dim() - 2, 0) + return torch.tensor(values, dtype=x.dtype, device=x.device).view(*shape) + + def _epsilon(self, x: torch.Tensor, epsilon: float) -> torch.Tensor: + # Image batches are normalized, so pixel-space epsilon must be scaled by std. + std = self._channel_tensor(self.std, x) + return float(epsilon) / std + + def _alpha(self, x: torch.Tensor, epsilon: float) -> torch.Tensor: + # Use the configured step size, or split epsilon across PGD steps by default. 
+ alpha = self.config.alpha + if alpha is None: + alpha = epsilon / max(int(self.config.steps), 1) + std = self._channel_tensor(self.std, x) + return float(alpha) / std + + def _bounds(self, x: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]: + # Convert valid pixel bounds to the normalized space where the model operates. + mean = self._channel_tensor(self.mean, x) + std = self._channel_tensor(self.std, x) + lower = (IMAGE_CLIP_MIN - mean) / std + upper = (IMAGE_CLIP_MAX - mean) / std + return lower, upper + + def denormalize(self, x: torch.Tensor) -> torch.Tensor: + # Convert normalized tensors back to pixel scale for logging. + mean = self._channel_tensor(self.mean, x) + std = self._channel_tensor(self.std, x) + return (x * std + mean).clamp(IMAGE_CLIP_MIN, IMAGE_CLIP_MAX) + + def _project(self, x_adv: torch.Tensor, x_clean: torch.Tensor, epsilon: float) -> torch.Tensor: + # Keep the adversarial image inside both the epsilon ball and valid pixel bounds. + epsilon = self._epsilon(x_clean, epsilon) + lower, upper = self._bounds(x_clean) + x_adv = torch.max(torch.min(x_adv, x_clean + epsilon), x_clean - epsilon) + return torch.max(torch.min(x_adv, upper), lower) + + +class ImageFGSMGenerator(ImageAdversarialExampleGenerator): + def generate(self, model, x, y, criterion): + # Build one adversarial image batch with a single gradient step. + epsilon = self._sample_epsilon(x.device) + x_adv = x.detach().clone().requires_grad_(True) + logits = model(x_adv) + loss = criterion(logits, y) + grad = torch.autograd.grad(loss, x_adv, only_inputs=True)[0] + # FGSM takes one step in the sign of the loss gradient. + x_adv = x_adv + self._epsilon(x_adv, epsilon) * grad.sign() + return self._project(x_adv.detach(), x.detach(), epsilon) + + +class ImagePGDGenerator(ImageAdversarialExampleGenerator): + def generate(self, model, x, y, criterion): + # Build one adversarial image batch with iterative projected gradient steps. 
+ epsilon = self._sample_epsilon(x.device) + x_clean = x.detach() + x_adv = x_clean.clone() + steps = max(int(self.config.steps), 1) + + for _ in range(steps): + x_adv = x_adv.detach().requires_grad_(True) + logits = model(x_adv) + loss = criterion(logits, y) + grad = torch.autograd.grad(loss, x_adv, only_inputs=True)[0] + # PGD repeats smaller FGSM-like steps and projects after each step. + x_adv = x_adv + self._alpha(x_adv, epsilon) * grad.sign() + x_adv = self._project(x_adv.detach(), x_clean, epsilon) + + return x_adv.detach() diff --git a/nebula/addons/defenses/adversarial_training/logging.py b/nebula/addons/defenses/adversarial_training/logging.py new file mode 100644 index 000000000..013a398ac --- /dev/null +++ b/nebula/addons/defenses/adversarial_training/logging.py @@ -0,0 +1,225 @@ +import logging + +import torch + +from nebula.addons.defenses.adversarial_training.config import AdversarialTrainingConfig +from nebula.config.config import TRAINING_LOGGER + +logging_training = logging.getLogger(TRAINING_LOGGER) + + +class AdversarialTrainingSampleLogger: + """Logs representative clean/adversarial samples without affecting training tensors.""" + + LOGGED_SAMPLES_PER_ROUND = 3 + + def __init__(self, config: AdversarialTrainingConfig, generator): + # Keep logging state per defense instance and per federated round. + self.config = config + self.generator = generator + self._logged_samples_by_round: dict[int, int] = {} + + def log(self, model, x_clean: torch.Tensor, x_adv: torch.Tensor, y: torch.Tensor) -> None: + # Log only a few representative samples per round to avoid noisy training logs. 
+ if not self.config.log_adversarial_metrics: + return + + current_round = int(getattr(model, "round", 0)) + already_logged = self._logged_samples_by_round.get(current_round, 0) + remaining = self.LOGGED_SAMPLES_PER_ROUND - already_logged + if remaining <= 0: + return + + with torch.no_grad(): + # Predictions must use the same normalized tensors that the model saw during training. + model_clean = x_clean.detach() + model_adv = x_adv.detach() + clean_predictions = torch.argmax(model(model_clean), dim=1) + adversarial_predictions = torch.argmax(model(model_adv), dim=1) + + # Display values can be denormalized for images; tabular tensors are already in model space. + clean_view = model_clean + adv_view = model_adv + if hasattr(self.generator, "denormalize"): + clean_view = self.generator.denormalize(clean_view) + adv_view = self.generator.denormalize(adv_view) + + delta = adv_view - clean_view + samples_to_log = min(remaining, int(clean_view.size(0))) + for sample_idx in range(samples_to_log): + self._log_sample( + current_round=current_round, + sample_number=already_logged + sample_idx + 1, + clean=clean_view[sample_idx].detach().float().cpu(), + adversarial=adv_view[sample_idx].detach().float().cpu(), + delta=delta[sample_idx].detach().float().cpu(), + label=self._safe_scalar(y, sample_idx), + clean_prediction=self._safe_scalar(clean_predictions, sample_idx), + adversarial_prediction=self._safe_scalar(adversarial_predictions, sample_idx), + ) + + self._logged_samples_by_round[current_round] = already_logged + samples_to_log + + def _log_sample( + self, + current_round: int, + sample_number: int, + clean: torch.Tensor, + adversarial: torch.Tensor, + delta: torch.Tensor, + label: int | None, + clean_prediction: int | None, + adversarial_prediction: int | None, + ) -> None: + # Write the shared summary line before adding image/tabular-specific details. 
+ logging_training.info( + "[AdversarialTrainingDefense] Round %s | Sample %s/%s before/after distortion | " + "dataset=%s | attack=%s | epsilon_effective=%.6f | label=%s | " + "clean_pred=%s | adversarial_pred=%s | " + "clean[min=%.6f max=%.6f mean=%.6f] | " + "adv[min=%.6f max=%.6f mean=%.6f] | delta_linf=%.6f | delta_l2=%.6f", + current_round, + sample_number, + self.LOGGED_SAMPLES_PER_ROUND, + self.config.dataset_name, + self.config.attack, + float(getattr(self.generator, "last_epsilon", self.config.epsilon) or 0.0), + label, + clean_prediction, + adversarial_prediction, + clean.min().item(), + clean.max().item(), + clean.mean().item(), + adversarial.min().item(), + adversarial.max().item(), + adversarial.mean().item(), + delta.abs().max().item(), + delta.reshape(-1).norm(p=2).item(), + ) + if self.config.domain == "tabular": + self._log_tabular_sample(current_round, sample_number, clean, adversarial, delta) + else: + # Image logs stay compact: a 4x4 patch is enough to see that perturbations exist. + self._log_image_sample(current_round, sample_number, clean, adversarial, delta) + + def _log_tabular_sample( + self, + current_round: int, + sample_number: int, + clean: torch.Tensor, + adversarial: torch.Tensor, + delta: torch.Tensor, + ) -> None: + # For tabular data, log full vectors because each feature has semantic meaning. 
+ feature_names = getattr(getattr(self.generator, "metadata", None), "feature_names", None) + logging_training.info( + "[AdversarialTrainingDefense] Round %s | Clean tabular sample %s:\n%s", + current_round, + sample_number, + self._format_tabular_vector(clean, feature_names), + ) + logging_training.info( + "[AdversarialTrainingDefense] Round %s | Final adversarial tabular sample %s:\n%s", + current_round, + sample_number, + self._format_tabular_vector(adversarial, feature_names), + ) + logging_training.info( + "[AdversarialTrainingDefense] Round %s | Tabular perturbation delta sample %s:\n%s", + current_round, + sample_number, + self._format_tabular_vector(delta, feature_names), + ) + logging_training.info( + "[AdversarialTrainingDefense] Round %s | Changed tabular features sample %s:\n%s", + current_round, + sample_number, + self._format_tabular_changes(clean, adversarial, delta, feature_names), + ) + + def _log_image_sample( + self, + current_round: int, + sample_number: int, + clean: torch.Tensor, + adversarial: torch.Tensor, + delta: torch.Tensor, + ) -> None: + # For images, log a small patch instead of the full tensor. + logging_training.info( + "[AdversarialTrainingDefense] Round %s | Clean sample %s channel0 4x4:\n%s", + current_round, + sample_number, + self._format_patch(clean), + ) + logging_training.info( + "[AdversarialTrainingDefense] Round %s | Adversarial sample %s channel0 4x4:\n%s", + current_round, + sample_number, + self._format_patch(adversarial), + ) + logging_training.info( + "[AdversarialTrainingDefense] Round %s | Delta sample %s channel0 4x4:\n%s", + current_round, + sample_number, + self._format_patch(delta), + ) + + @staticmethod + def _safe_scalar(values: torch.Tensor, sample_idx: int) -> int | None: + # Read one scalar defensively in case a short tensor is passed to the logger. 
+ if values.numel() <= sample_idx: + return None + return int(values[sample_idx].detach().cpu().item()) + + @staticmethod + def _format_patch(sample: torch.Tensor, patch_size: int = 4) -> str: + # Format a small leading patch so image logs stay human-readable. + if sample.dim() >= 3: + patch = sample[0, :patch_size, :patch_size] + elif sample.dim() == 2: + patch = sample[:patch_size, :patch_size] + else: + patch = sample[:patch_size] + values = patch.tolist() + if sample.dim() < 2: + return str([round(float(value), 6) for value in values]) + return str([[round(float(value), 6) for value in row] for row in values]) + + @staticmethod + def _format_tabular_vector(sample: torch.Tensor, feature_names: list[str] | None = None) -> str: + # Format a tabular sample as a feature-name to value mapping. + values = sample.reshape(-1).tolist() + names = feature_names or [f"feature_{idx}" for idx in range(len(values))] + return str({str(name): round(float(value), 6) for name, value in zip(names, values, strict=False)}) + + @staticmethod + def _format_tabular_changes( + clean: torch.Tensor, + adversarial: torch.Tensor, + delta: torch.Tensor, + feature_names: list[str] | None = None, + tolerance: float = 1e-7, + ) -> str: + # Format only features whose perturbation is larger than numerical noise. + clean_values = clean.reshape(-1).tolist() + adversarial_values = adversarial.reshape(-1).tolist() + delta_values = delta.reshape(-1).tolist() + names = feature_names or [f"feature_{idx}" for idx in range(len(delta_values))] + # Keep the changed-features log focused; full vectors are logged just above. 
+ changes = { + str(name): { + "clean": round(float(clean_value), 6), + "adversarial": round(float(adversarial_value), 6), + "delta": round(float(delta_value), 6), + } + for name, clean_value, adversarial_value, delta_value in zip( + names, + clean_values, + adversarial_values, + delta_values, + strict=False, + ) + if abs(float(delta_value)) > tolerance + } + return str(changes) diff --git a/nebula/addons/defenses/adversarial_training/tabular.py b/nebula/addons/defenses/adversarial_training/tabular.py new file mode 100644 index 000000000..661556d9b --- /dev/null +++ b/nebula/addons/defenses/adversarial_training/tabular.py @@ -0,0 +1,275 @@ +import torch +import torch.nn.functional as F + +from nebula.addons.defenses.adversarial_training.base import AdversarialExampleGenerator +from nebula.addons.defenses.adversarial_training.config import AdversarialTrainingConfig +from nebula.core.datasets.tabular_metadata import CATEGORICAL, CONTINUOUS, INTEGER, TabularAdversarialMetadata + + +class TabularConstraintSet: + """Projects tabular attack candidates back to the valid feature domain.""" + + def __init__(self, metadata: TabularAdversarialMetadata): + # The metadata is dataset-level and immutable; derived tensors are cached per device/dtype. + self.metadata = metadata + self._tensor_cache: dict[tuple[torch.device, torch.dtype], dict[str, torch.Tensor]] = {} + + def tensors(self, x: torch.Tensor) -> dict[str, torch.Tensor]: + # Masks and bounds are reused in every constrained PGD step, so build them once per placement. + key = (x.device, x.dtype) + cached = self._tensor_cache.get(key) + if cached is not None: + return cached + + # Masks have shape (1, n_features), which broadcasts over the batch dimension. 
+ cached = { + "continuous": self._feature_type_mask(x, CONTINUOUS), + "integer": self._feature_type_mask(x, INTEGER), + "categorical": self._feature_type_mask(x, CATEGORICAL), + "min": torch.tensor(self.metadata.feature_min_norm, dtype=x.dtype, device=x.device).view(1, -1), + "max": torch.tensor(self.metadata.feature_max_norm, dtype=x.dtype, device=x.device).view(1, -1), + } + cached["numeric"] = cached["continuous"] | cached["integer"] + cached["perturbable"] = cached["numeric"] | cached["categorical"] + cached["integer_step"] = self._integer_steps(cached["min"]) + self._tensor_cache[key] = cached + return cached + + def perturbable_mask(self, x: torch.Tensor) -> torch.Tensor: + # Used by the attack step to avoid moving immutable features in the first place. + return self.tensors(x)["perturbable"] + + def project(self, x_candidate: torch.Tensor, x_clean: torch.Tensor, epsilon: float) -> torch.Tensor: + # Clamp numeric features, round integers, restore immutable features and fix one-hot groups. + tensors = self.tensors(x_clean) + lower, upper = self._bounds(x_clean, epsilon, tensors) + + # First force every value into its valid interval, then apply type-specific fixes. + x_projected = torch.max(torch.min(x_candidate, upper), lower) + x_projected = self._project_integer_features(x_projected, x_clean, lower, upper, tensors) + x_projected = self.project_categorical_groups(x_projected) + # Immutable features are copied back from the original clean sample as the final guardrail. + return torch.where(tensors["perturbable"], x_projected, x_clean) + + def categorical_gradient_step(self, x_candidate: torch.Tensor, grad: torch.Tensor) -> torch.Tensor: + if not self.metadata.categorical_groups: + return x_candidate + + # One-hot columns are discrete: instead of adding a fractional gradient, + # activate the category whose gradient most increases the adversarial loss. 
+ x_stepped = x_candidate.clone() + for group in self.metadata.categorical_groups: + group_tensor = torch.tensor(group, dtype=torch.long, device=x_candidate.device) + selected = grad.index_select(1, group_tensor).argmax(dim=1) + x_stepped[:, group_tensor] = F.one_hot(selected, num_classes=len(group)).to(dtype=x_candidate.dtype) + return x_stepped + + def project_categorical_groups(self, x_candidate: torch.Tensor) -> torch.Tensor: + if not self.metadata.categorical_groups: + return x_candidate + + # Projection must always leave each one-hot group with exactly one active feature. + x_projected = x_candidate.clone() + for group in self.metadata.categorical_groups: + group_tensor = torch.tensor(group, dtype=torch.long, device=x_candidate.device) + selected = x_candidate.index_select(1, group_tensor).argmax(dim=1) + x_projected[:, group_tensor] = F.one_hot(selected, num_classes=len(group)).to(dtype=x_candidate.dtype) + return x_projected + + def _feature_type_mask(self, x: torch.Tensor, feature_type: str) -> torch.Tensor: + return torch.tensor( + [value == feature_type for value in self.metadata.feature_types], + dtype=torch.bool, + device=x.device, + ).view(1, -1) + + def _bounds( + self, + x_clean: torch.Tensor, + epsilon: float, + tensors: dict[str, torch.Tensor], + ) -> tuple[torch.Tensor, torch.Tensor]: + # Numeric features are restricted both by dataset bounds and by the epsilon ball around x_clean. + numeric_lower = torch.maximum(tensors["min"], x_clean - float(epsilon)) + numeric_upper = torch.minimum(tensors["max"], x_clean + float(epsilon)) + # Categorical features are handled by one-hot projection, not by an epsilon ball. + lower = torch.where(tensors["categorical"], tensors["min"], numeric_lower) + upper = torch.where(tensors["categorical"], tensors["max"], numeric_upper) + return lower, upper + + def _integer_steps(self, minimum: torch.Tensor) -> torch.Tensor: + # Default step=1 is harmless for non-integer columns because the integer mask gates usage later. 
+ integer_steps = torch.ones_like(minimum) + for idx, step in (self.metadata.integer_step_norm or {}).items(): + integer_steps[0, int(idx)] = float(step) + return integer_steps + + def _project_integer_features( + self, + x_projected: torch.Tensor, + x_clean: torch.Tensor, + lower: torch.Tensor, + upper: torch.Tensor, + tensors: dict[str, torch.Tensor], + ) -> torch.Tensor: + integer_mask = tensors["integer"] + if not integer_mask.any(): + return x_projected + + # Integer features may be normalized, so the valid values form a shifted grid: + # min, min + step, min + 2*step, ... + step = torch.clamp(tensors["integer_step"], min=torch.finfo(x_projected.dtype).eps) + grid_lower = torch.ceil((lower - tensors["min"]) / step) * step + tensors["min"] + grid_upper = torch.floor((upper - tensors["min"]) / step) * step + tensors["min"] + rounded = torch.round((x_projected - tensors["min"]) / step) * step + tensors["min"] + rounded = torch.max(torch.min(rounded, grid_upper), grid_lower) + + # If epsilon is smaller than the normalized integer step, no valid integer move exists. + has_valid_grid = grid_lower <= grid_upper + rounded = torch.where(has_valid_grid, rounded, x_clean) + return torch.where(integer_mask, rounded, x_projected) + + +class TabularAdversarialExampleGenerator(AdversarialExampleGenerator): + """Base generator for constrained tabular adversarial examples.""" + + def __init__(self, config: AdversarialTrainingConfig, metadata: TabularAdversarialMetadata): + # Generators share the same constraint layer; only the search strategy should vary. + self.config = config + self.metadata = metadata + self.constraints = TabularConstraintSet(metadata) + + def _alpha(self, epsilon: float) -> float: + # By default, distribute the epsilon budget evenly across constrained PGD steps. 
+ if self.config.alpha is not None: + return float(self.config.alpha) + return float(epsilon) / max(int(self.config.steps), 1) + + def _margin(self, logits: torch.Tensor, y: torch.Tensor) -> torch.Tensor: + # Positive margin means some wrong class already beats the true class. + true_logits = logits.gather(1, y.view(-1, 1)).squeeze(1) + true_class_mask = F.one_hot(y, num_classes=logits.size(1)).bool() + other_logits = logits.masked_fill(true_class_mask, float("-inf")) + return other_logits.max(dim=1).values - true_logits + + def _per_sample_loss(self, logits: torch.Tensor, y: torch.Tensor) -> torch.Tensor: + # The attack needs per-sample scores so each row can stop once it is hard enough. + return F.cross_entropy(logits, y, reduction="none") + + +class TabularConstrainedPGDGenerator(TabularAdversarialExampleGenerator): + """Constrained PGD generator for tabular adversarial examples.""" + + def generate(self, model, x, y, criterion): + # Sample one attack strength for this batch, matching the image generator behavior. + epsilon = self._sample_epsilon(x.device) + x_clean = x.detach() + if epsilon <= 0.0: + return x_clean + + steps = max(int(self.config.steps), 1) + step_size = self._alpha(epsilon) + perturbable_mask = self.constraints.perturbable_mask(x_clean).to(dtype=x_clean.dtype) + + x_adv = x_clean.clone() + best_adv = x_adv.clone() + best_score = torch.full((x_clean.size(0),), float("-inf"), dtype=x_clean.dtype, device=x_clean.device) + best_distance = torch.full((x_clean.size(0),), float("inf"), dtype=x_clean.dtype, device=x_clean.device) + use_loss_window = self._use_loss_window() + use_margin_window = self._use_margin_window() + clean_loss = self._clean_loss(model, x_clean, y) if use_loss_window else None + + for _ in range(steps): + # PGD step: move in the sign of the loss gradient, but only on perturbable features. 
+ x_grad = x_adv.detach().requires_grad_(True) + logits = model(x_grad) + loss = criterion(logits, y) + grad = torch.autograd.grad(loss, x_grad, only_inputs=True)[0] + + candidate = x_adv.detach() + float(step_size) * grad.sign() * perturbable_mask + candidate = self.constraints.categorical_gradient_step(candidate, grad) + # This is the key tabular rule: never score or return an invalid candidate. + candidate = self.constraints.project(candidate, x_clean, epsilon) + + with torch.no_grad(): + # Keep the best candidate per sample, not just the last step. + candidate_logits = model(candidate) + if use_loss_window: + candidate_score = self._loss_increase(candidate_logits, y, clean_loss) + better = self._loss_window_better(candidate_score, best_score) + elif use_margin_window: + candidate_score = self._margin(candidate_logits, y) + candidate_distance = self._margin_window_distance(candidate_score) + better = self._margin_window_better(candidate_score, candidate_distance, best_score, best_distance) + best_distance = torch.where(better, candidate_distance, best_distance) + else: + candidate_score = self._margin(candidate_logits, y) + better = candidate_score > best_score + best_adv = torch.where(better.view(-1, 1), candidate, best_adv) + best_score = torch.where(better, candidate_score, best_score) + + if self._target_reached(best_score, best_distance): + break + + x_adv = candidate + + return best_adv.detach() + + def _use_loss_window(self) -> bool: + return self.config.candidate_selection == "loss_window" + + def _use_margin_window(self) -> bool: + return self.config.candidate_selection == "margin_window" + + def _clean_loss(self, model, x_clean: torch.Tensor, y: torch.Tensor) -> torch.Tensor: + # Baseline difficulty. Candidate scores become loss(candidate) - loss(clean). 
+ with torch.no_grad(): + return self._per_sample_loss(model(x_clean), y) + + def _loss_increase( + self, + candidate_logits: torch.Tensor, + y: torch.Tensor, + clean_loss: torch.Tensor, + ) -> torch.Tensor: + return self._per_sample_loss(candidate_logits, y) - clean_loss + + def _loss_window_better(self, candidate_score: torch.Tensor, best_score: torch.Tensor) -> torch.Tensor: + # A candidate must make the sample harder. If max_loss_increase is set, reject overshoots. + valid = candidate_score > 0.0 + if self.config.max_loss_increase is not None: + valid = valid & (candidate_score <= float(self.config.max_loss_increase)) + return valid & (candidate_score > best_score) + + def _margin_window_distance(self, margin: torch.Tensor) -> torch.Tensor: + # Distance is zero inside the window and positive outside. This gives a + # soft fallback when discrete tabular steps jump over the desired range. + distance = torch.zeros_like(margin) + if self.config.target_margin is not None: + target = torch.full_like(margin, float(self.config.target_margin)) + distance = torch.maximum(distance, target - margin) + if self.config.max_margin is not None: + maximum = torch.full_like(margin, float(self.config.max_margin)) + distance = torch.maximum(distance, margin - maximum) + return distance + + def _margin_window_better( + self, + candidate_score: torch.Tensor, + candidate_distance: torch.Tensor, + best_score: torch.Tensor, + best_distance: torch.Tensor, + ) -> torch.Tensor: + closer = candidate_distance < best_distance + same_distance = candidate_distance == best_distance + stronger = candidate_score > best_score + return closer | (same_distance & stronger) + + def _target_reached(self, best_score: torch.Tensor, best_distance: torch.Tensor) -> bool: + if self._use_loss_window(): + if self.config.target_loss_increase is None: + return False + return bool((best_score >= float(self.config.target_loss_increase)).all().item()) + if self._use_margin_window(): + return bool((best_distance <= 
torch.finfo(best_distance.dtype).eps).all().item()) + return False diff --git a/nebula/addons/defenses/feature_squeezing.py b/nebula/addons/defenses/feature_squeezing.py new file mode 100644 index 000000000..683cb9cce --- /dev/null +++ b/nebula/addons/defenses/feature_squeezing.py @@ -0,0 +1,224 @@ +import logging +from dataclasses import dataclass +from typing import Any + +import numpy as np +import torch +from PIL import Image + +PIL_IMAGE_MODES = {"1", "L", "P", "RGB", "RGBA", "CMYK", "YCbCr"} + + +# --------------------------------------------------------------------------- +# Configuration +# --------------------------------------------------------------------------- + + +@dataclass(frozen=True) +class FeatureSqueezingConfig: + enabled: bool = False + bit_depth: int = 8 + dataset_name: str | None = None + apply_to_train: bool = True + apply_to_test: bool = True + apply_to_local_test: bool = True + + +# --------------------------------------------------------------------------- +# Defense +# --------------------------------------------------------------------------- + + +class FeatureSqueezingDefense: + """Dataset-level feature squeezing for image Nebula datasets.""" + + def __init__(self, config: FeatureSqueezingConfig): + # Validate the number of quantization levels requested by the scenario. + if not isinstance(config.bit_depth, int) or not 1 <= config.bit_depth <= 64: + raise ValueError("feature_squeezing.bit_depth must be an integer in [1, 64]") + + self.config = config + self.levels = float((2**config.bit_depth) - 1) + + @classmethod + def from_participant_config(cls, participant_config: dict[str, Any]) -> "FeatureSqueezingDefense | None": + # Build the defense only when feature squeezing is enabled in the participant config. 
+ raw = participant_config.get("defense_args", {}).get("feature_squeezing", {}) + if not raw or not raw.get("enabled", False): + return None + + return cls( + FeatureSqueezingConfig( + enabled=True, + bit_depth=int(raw.get("bit_depth", raw.get("n", 8))), + dataset_name=participant_config.get("data_args", {}).get("dataset"), + apply_to_train=bool(raw.get("apply_to_train", True)), + apply_to_test=bool(raw.get("apply_to_test", True)), + apply_to_local_test=bool(raw.get("apply_to_local_test", True)), + ) + ) + + def apply_to_partition(self, partition) -> None: + # Apply the defense to each enabled split in the participant partition. + train_set = getattr(partition, "train_set", None) + if train_set is None: + logging.warning("[FeatureSqueezingDefense] No train set found; skipping defense") + return + + logging.info( + "[FeatureSqueezingDefense] Applying feature squeezing | dataset=%s | bit_depth=%s", + self.config.dataset_name, + self.config.bit_depth, + ) + + seen_data: set[int] = set() + for name, dataset, enabled in ( + ("train", train_set, self.config.apply_to_train), + ("test", getattr(partition, "test_set", None), self.config.apply_to_test), + ("local_test", getattr(partition, "local_test_set", None), self.config.apply_to_local_test), + ): + if enabled: + self._transform_dataset(dataset, name, seen_data) + + def _transform_dataset(self, dataset, name: str, seen_data: set[int]) -> None: + # Transform all samples in one dataset split, avoiding duplicated shared data. 
+ data = getattr(dataset, "data", None) + if dataset is None or data is None: + return + + if id(data) in seen_data: + logging.info("[FeatureSqueezingDefense] Dataset %s already transformed; skipping duplicate data", name) + self._log_check(data, name, status="already_transformed") + return + + before_sample = data[0] if len(data) else None + before = self._summary(before_sample) if before_sample is not None else None + for idx, sample in enumerate(data): + data[idx] = self._transform_sample(sample) + + seen_data.add(id(data)) + logging.info("[FeatureSqueezingDefense] Transformed %s samples in %s set", len(data), name) + self._log_check(data, name, status="transformed", before=before) + + def _transform_sample(self, sample): + # Transform only the input image and keep labels or metadata unchanged. + if isinstance(sample, tuple) and sample: + return (self._squeeze_image(sample[0]), *sample[1:]) + return self._squeeze_image(sample) + + # ------------------------------------------------------------------ + # Image squeezing + # ------------------------------------------------------------------ + + def _squeeze_image(self, value): + # Quantize PIL images, tensors, and arrays while preserving the original container type. + if isinstance(value, Image.Image): + image = value if value.mode in PIL_IMAGE_MODES else value.convert("RGB") + arr = np.asarray(image) + squeezed = np.rint(self._squeeze_image_array(arr)).clip(0, 255).astype(arr.dtype, copy=False) + return Image.fromarray(squeezed, mode=image.mode) + + squeezed = self._squeeze_image_array(self._as_numpy(value)) + return self._restore_type(value, squeezed) + + def _squeeze_image_array(self, arr: np.ndarray) -> np.ndarray: + # Normalize values to [0, 1], quantize them, and map them back to the original range. 
+ arr_float = arr.astype(np.float32, copy=False) + if np.issubdtype(arr.dtype, np.integer): + info = np.iinfo(arr.dtype) + low, high = float(info.min), float(info.max) + else: + low, high = float(np.nanmin(arr_float)), float(np.nanmax(arr_float)) + if low >= 0.0 and high <= 1.0: + low, high = 0.0, 1.0 + + value_range = high - low + if value_range == 0: + return arr.copy() + return self._quantize01((arr_float - low) / value_range) * value_range + low + + # ------------------------------------------------------------------ + # Helpers and diagnostics + # ------------------------------------------------------------------ + + def _quantize01(self, arr: np.ndarray) -> np.ndarray: + # Reduce normalized values to the discrete levels defined by bit_depth. + return np.rint(np.clip(arr, 0.0, 1.0) * self.levels) / self.levels + + def _log_check(self, data, name: str, status: str, before: str | None = None) -> None: + # Log a compact before/after summary to verify that squeezing was applied. + if not len(data): + logging.info("[FeatureSqueezingDefense] Verification %s | status=%s | empty dataset", name, status) + return + + expectation = f"expected_unique_values<={int(self.levels + 1)}" + + after = self._summary(data[0]) + if before is None: + logging.info( + "[FeatureSqueezingDefense] Verification %s | status=%s | %s | sample_after={%s}", + name, + status, + expectation, + after, + ) + return + + logging.info( + "[FeatureSqueezingDefense] Verification %s | status=%s | %s | sample_before={%s} | " + "sample_after={%s}", + name, + status, + expectation, + before, + after, + ) + + def _summary(self, sample) -> str: + # Create a short numeric summary of one sample for diagnostics. 
+ arr = self._as_numpy(self._unwrap(sample)) + if arr.size == 0: + return f"shape={arr.shape}, empty=True" + + flat = arr.reshape(-1) + unique = np.unique(flat) + preview = ", ".join(self._fmt(value) for value in unique[: min(12, len(unique))]) + return ( + f"shape={arr.shape}, dtype={arr.dtype}, min={self._fmt(np.nanmin(flat))}, " + f"max={self._fmt(np.nanmax(flat))}, unique_count={len(unique)}, unique_preview=[{preview}]" + ) + + def _as_numpy(self, value) -> np.ndarray: + # Convert supported image containers to numpy for quantization and logging. + if isinstance(value, torch.Tensor): + return value.detach().cpu().numpy() + if isinstance(value, Image.Image): + return np.asarray(value) + return np.asarray(value) + + def _restore_type(self, original, arr: np.ndarray): + # Return squeezed data with the same high-level type as the original sample. + if isinstance(original, torch.Tensor): + return torch.as_tensor(arr, dtype=original.dtype, device=original.device) + if isinstance(original, np.ndarray): + return arr.astype(original.dtype, copy=False) + return arr + + def _unwrap(self, sample): + # Extract the image from common dataset samples shaped as (image, label, ...). + return sample[0] if isinstance(sample, tuple) and sample else sample + + def _fmt(self, value) -> str: + # Format numbers in logs without unnecessary trailing decimals. + try: + number = float(value) + except (TypeError, ValueError): + return str(value) + return str(int(number)) if number.is_integer() else f"{number:.6g}" + + +def apply_feature_squeezing_if_enabled(partition, participant_config: dict[str, Any]) -> None: + # Public entrypoint used by the node startup flow. 
+ defense = FeatureSqueezingDefense.from_participant_config(participant_config) + if defense is not None: + defense.apply_to_partition(partition) diff --git a/nebula/addons/networksimulation/networksimulator.py b/nebula/addons/networksimulation/networksimulator.py index e296a1527..9dfd4853e 100644 --- a/nebula/addons/networksimulation/networksimulator.py +++ b/nebula/addons/networksimulation/networksimulator.py @@ -6,7 +6,7 @@ class NetworkSimulator(ABC): Abstract base class representing a network simulator interface. This interface defines the required methods for controlling and simulating network conditions between nodes. - A concrete implementation is expected to manage artificial delays, bandwidth restrictions, packet loss, + A concrete implementation is expected to manage artificial delays, bandwidth restrictions, packet loss, or other configurable conditions typically used in network emulation or testing. Required asynchronous methods: diff --git a/nebula/addons/reputation/reputation.py b/nebula/addons/reputation/reputation.py index 561199513..25ce5a770 100644 --- a/nebula/addons/reputation/reputation.py +++ b/nebula/addons/reputation/reputation.py @@ -1,3 +1,5 @@ +import asyncio +import json import logging import random import time @@ -8,7 +10,13 @@ from typing import TYPE_CHECKING from nebula.addons.functions import print_msg_box from nebula.core.eventmanager import EventManager -from nebula.core.nebulaevents import AggregationEvent, RoundStartEvent, UpdateReceivedEvent, DuplicatedMessageEvent +from nebula.core.nebulaevents import ( + AggregationEvent, + RoundEndEvent, + RoundStartEvent, + UpdateReceivedEvent, + DuplicatedMessageEvent, +) from nebula.core.utils.helper import ( cosine_metric, euclidean_metric, @@ -61,7 +69,7 @@ class Reputation: The class handles collection of metrics, calculation of static and dynamic reputation, updating history, and communication of reputation scores to neighbors. 
""" - + REPUTATION_THRESHOLD = 0.6 SIMILARITY_THRESHOLD = 0.6 INITIAL_ROUND_FOR_REPUTATION = 1 @@ -70,12 +78,12 @@ class Reputation: WEIGHTED_HISTORY_ROUNDS = 3 FRACTION_ANOMALY_MULTIPLIER = 1.20 THRESHOLD_ANOMALY_MULTIPLIER = 1.15 - + # Augmentation factors LATENCY_AUGMENT_FACTOR = 1.4 MESSAGE_AUGMENT_FACTOR_EARLY = 2.0 MESSAGE_AUGMENT_FACTOR_NORMAL = 1.1 - + # Penalty and decay factors HISTORICAL_PENALTY_THRESHOLD = 0.9 NEGATIVE_LATENCY_PENALTY = 0.3 @@ -104,7 +112,7 @@ def __init__(self, engine: "Engine", config: "Config"): self._addr = engine.addr self._log_dir = engine.log_dir self._idx = engine.idx - + self._initialize_data_structures() self._configure_constants() self._load_configuration() @@ -116,7 +124,7 @@ def _configure_constants(self): """Configure system constants from config or use defaults.""" reputation_config = self._config.participant.get("defense_args", {}).get("reputation", {}) constants_config = reputation_config.get("constants", {}) - + self.REPUTATION_THRESHOLD = constants_config.get("reputation_threshold", self.REPUTATION_THRESHOLD) self.SIMILARITY_THRESHOLD = constants_config.get("similarity_threshold", self.SIMILARITY_THRESHOLD) self.INITIAL_ROUND_FOR_REPUTATION = constants_config.get("initial_round_for_reputation", self.INITIAL_ROUND_FOR_REPUTATION) @@ -165,6 +173,16 @@ def _initialize_data_structures(self): self.previous_std_dev_number_message = {} self.previous_percentile_25_number_message = {} self.previous_percentile_85_number_message = {} + self._last_reputation_calculation_round = None + self._pending_sdfl_reputation_updates = {} + self._sdfl_training_finished_rounds = set() + self._sdfl_reputation_updates_expected = {} + self._sdfl_reputation_updates_received = {} + self._sdfl_reputation_updates_events = {} + self.reputation_tables = {} + self._reputation_tables_expected = {} + self._reputation_tables_events = {} + self._reputation_tables_wait_tasks = {} def _load_configuration(self): """Load and validate reputation 
configuration.""" @@ -188,7 +206,7 @@ def _configure_metric_weights(self): """Configure weights for different metrics based on weighting factor.""" default_weight = 0.25 metric_names = ["model_arrival_latency", "model_similarity", "num_messages", "fraction_parameters_changed"] - + if self._weighting_factor == "static": self._weight_model_arrival_latency = float( self._metrics.get("model_arrival_latency", {}).get("weight", default_weight) @@ -209,7 +227,7 @@ def _configure_metric_weights(self): elif not isinstance(self._metrics[metric_name], dict): self._metrics[metric_name] = {"enabled": bool(self._metrics[metric_name])} self._metrics[metric_name]["weight"] = default_weight - + self._weight_model_arrival_latency = default_weight self._weight_model_similarity = default_weight self._weight_num_messages = default_weight @@ -229,24 +247,24 @@ def engine(self): def _is_metric_enabled(self, metric_name: str, metrics_config: dict = None) -> bool: """ Check if a specific metric is enabled based on the provided configuration. - + Args: metric_name (str): The name of the metric to check. - metrics_config (dict, optional): The configuration dictionary for metrics. + metrics_config (dict, optional): The configuration dictionary for metrics. If None, uses the instance's _metrics. - + Returns: bool: True if the metric is enabled, False otherwise. """ config_to_use = metrics_config if metrics_config is not None else getattr(self, '_metrics', None) - + if not isinstance(config_to_use, dict): if metrics_config is not None: logging.warning(f"metrics_config is not a dictionary: {type(metrics_config)}") else: logging.warning("_metrics is not properly initialized") return False - + metric_config = config_to_use.get(metric_name) if metric_config is None: return False @@ -269,7 +287,7 @@ def save_data( ): """ Save data between nodes and aggregated models. 
- + Args: type_data: Type of data to save ('number_message', 'fraction_of_params_changed', 'model_arrival_latency') nei: Neighbor identifier @@ -285,12 +303,11 @@ def save_data( return if nei not in self.connection_metrics: - logging.warning(f"Neighbor {nei} not found in connection_metrics") - return + self.connection_metrics[nei] = Metrics() try: metrics_instance = self.connection_metrics[nei] - + if type_data == "number_message": message_data = {"time": time, "current_round": current_round} if not isinstance(metrics_instance.messages, list): @@ -320,17 +337,35 @@ async def setup(self): """Set up the reputation system by subscribing to relevant events.""" if self._enabled: await EventManager.get_instance().subscribe_node_event(RoundStartEvent, self.on_round_start) - await EventManager.get_instance().subscribe_node_event(AggregationEvent, self.calculate_reputation) - if self._is_metric_enabled("model_similarity"): - await EventManager.get_instance().subscribe_node_event(UpdateReceivedEvent, self.recollect_similarity) - if self._is_metric_enabled("fraction_parameters_changed"): - await EventManager.get_instance().subscribe_node_event( - UpdateReceivedEvent, self.recollect_fraction_of_parameters_changed - ) + federation = self._engine.config.participant["scenario_args"].get("federation") + if federation == "SDFL": + await EventManager.get_instance().subscribe_node_event(AggregationEvent, self.calculate_reputation) + await EventManager.get_instance().subscribe_node_event(RoundEndEvent, self.calculate_sdfl_reputation) + else: + await EventManager.get_instance().subscribe_node_event(AggregationEvent, self.calculate_reputation) + if federation == "SDFL": + if ( + self._is_metric_enabled("model_similarity") + or self._is_metric_enabled("fraction_parameters_changed") + ): + await EventManager.get_instance().subscribe_node_event( + UpdateReceivedEvent, self.recollect_or_buffer_sdfl_model_metrics + ) + else: + if self._is_metric_enabled("model_similarity"): + await 
EventManager.get_instance().subscribe_node_event(UpdateReceivedEvent, self.recollect_similarity) + if self._is_metric_enabled("fraction_parameters_changed"): + await EventManager.get_instance().subscribe_node_event( + UpdateReceivedEvent, self.recollect_fraction_of_parameters_changed + ) if self._is_metric_enabled("model_arrival_latency"): await EventManager.get_instance().subscribe_node_event( UpdateReceivedEvent, self.recollect_model_arrival_latency ) + if federation == "SDFL": + await EventManager.get_instance().subscribe_node_event( + UpdateReceivedEvent, self.mark_sdfl_reputation_update_received + ) if self._is_metric_enabled("num_messages"): await EventManager.get_instance().subscribe(("model", "update"), self.recollect_number_message) await EventManager.get_instance().subscribe(("model", "initialization"), self.recollect_number_message) @@ -338,26 +373,149 @@ async def setup(self): await EventManager.get_instance().subscribe( ("federation", "federation_models_included"), self.recollect_number_message ) - await EventManager.get_instance().subscribe_node_event(DuplicatedMessageEvent, self.recollect_duplicated_number_message) + if federation != "SDFL": + await EventManager.get_instance().subscribe_node_event( + DuplicatedMessageEvent, self.recollect_duplicated_number_message + ) + + async def _should_recollect_update_event(self, ure: UpdateReceivedEvent) -> bool: + """Return whether this update belongs to the reputation observation channel.""" + (_, _, source, _, _) = await ure.get_event_data() + + if source == self._addr: + return False + + federation = self._engine.config.participant["scenario_args"].get("federation") + if federation != "SDFL": + return not ure.is_reputation_update() + + if not ure.is_reputation_update(): + return False + + direct_neighbors = await self._engine.cm.get_addrs_current_connections(only_direct=True, myself=False) + return source in direct_neighbors + + async def recollect_or_buffer_sdfl_model_metrics(self, ure: 
UpdateReceivedEvent): + """Delay SDFL model-comparison metrics while the local node is still training.""" + if not await self._should_recollect_update_event(ure): + return + + (_, _, source, round_num, _) = await ure.get_event_data() + role = self._engine.rb.get_role_name(True) + local_training_pending = role == "trainer" and round_num not in self._sdfl_training_finished_rounds + if local_training_pending or await self._engine.trainning_in_progress_lock.locked_async(): + self._pending_sdfl_reputation_updates.setdefault(round_num, {}) + self._pending_sdfl_reputation_updates[round_num][source] = ure + logging.info( + f"SDFL reputation | Buffered model metrics from {source} for round {round_num}; " + "local training has not finished yet" + ) + return + + await self._process_sdfl_model_metrics(ure) + + async def process_pending_sdfl_reputation_updates(self, round_num: int = None): + """Process buffered SDFL reputation updates after local training has finished.""" + if round_num is None: + round_num = await self._engine.get_round() + + self._sdfl_training_finished_rounds.add(round_num) + pending_updates = self._pending_sdfl_reputation_updates.pop(round_num, {}) + if not pending_updates: + return + + logging.info( + f"SDFL reputation | Processing {len(pending_updates)} buffered model metrics for round {round_num}" + ) + for ure in pending_updates.values(): + await self._process_sdfl_model_metrics(ure) + + async def _process_sdfl_model_metrics(self, ure: UpdateReceivedEvent): + if self._is_metric_enabled("model_similarity"): + await self.recollect_similarity(ure) + if self._is_metric_enabled("fraction_parameters_changed"): + await self.recollect_fraction_of_parameters_changed(ure) + + async def mark_sdfl_reputation_update_received(self, ure: UpdateReceivedEvent): + """Mark a direct-neighbor SDFL reputation update as processed for this round.""" + if not await self._should_recollect_update_event(ure): + return + + (_, _, source, round_num, _) = await ure.get_event_data() + 
self._sdfl_reputation_updates_received.setdefault(round_num, set()).add(source) + + expected = self._sdfl_reputation_updates_expected.get(round_num) + event = self._sdfl_reputation_updates_events.get(round_num) + received = self._sdfl_reputation_updates_received.get(round_num, set()) + if expected and event and expected.issubset(received): + event.set() + + logging.info( + f"SDFL reputation | Reputation model/update processed for round {round_num} from {source}; " + f"received={len(received)}" + ) + + async def wait_sdfl_reputation_updates(self, expected_nodes, round_num: int = None, timeout: float = None): + """Wait until direct-neighbor SDFL reputation updates arrive or timeout expires.""" + if round_num is None: + round_num = await self._engine.get_round() + if timeout is None: + timeout = float( + self._config.participant["defense_args"] + .get("reputation", {}) + .get("model_update_timeout", + self._config.participant["defense_args"].get("reputation", {}).get("table_aggregation_timeout", 30)) + ) + + expected_nodes = set(expected_nodes) - {self._addr} + self._sdfl_reputation_updates_expected[round_num] = expected_nodes + event = self._sdfl_reputation_updates_events.setdefault(round_num, asyncio.Event()) + + received = self._sdfl_reputation_updates_received.setdefault(round_num, set()) + if expected_nodes.issubset(received): + event.set() + + if expected_nodes: + logging.info( + f"SDFL reputation | Waiting reputation model/update messages for round {round_num}; " + f"expected={sorted(expected_nodes)} already_received={sorted(received & expected_nodes)} " + f"timeout={timeout}" + ) + + try: + await asyncio.wait_for(event.wait(), timeout=timeout) + except asyncio.TimeoutError: + logging.info( + f"SDFL reputation | Timeout waiting reputation model/update messages for round {round_num}; " + f"missing={sorted(expected_nodes - received)}" + ) + + received = self._sdfl_reputation_updates_received.get(round_num, set()) + missing = expected_nodes - received + 
logging.info( + f"SDFL reputation | Reputation model/update wait finished for round {round_num}; " + f"received={sorted(received & expected_nodes)} missing={sorted(missing)}" + ) + return received & expected_nodes, missing async def init_reputation( self, federation_nodes=None, round_num=None, last_feedback_round=None, init_reputation=None ): """ Initialize the reputation system. - + Args: federation_nodes: List of federation node identifiers - round_num: Current round number + round_num: Current round number last_feedback_round: Last round that received feedback init_reputation: Initial reputation value to assign """ if not self._enabled: return - + if not self._validate_init_parameters(federation_nodes, round_num, init_reputation): return - + neighbors = self._validate_federation_nodes(federation_nodes) if not neighbors: logging.error("init_reputation | No valid neighbors found") @@ -370,13 +528,13 @@ def _validate_init_parameters(self, federation_nodes, round_num, init_reputation if not federation_nodes: logging.error("init_reputation | No federation nodes provided") return False - + if round_num is None: logging.warning("init_reputation | Round number not provided") - + if init_reputation is None: logging.warning("init_reputation | Initial reputation value not provided") - + return True async def _initialize_neighbor_reputations(self, neighbors: list, round_num: int, last_feedback_round: int, init_reputation: float): @@ -392,7 +550,7 @@ def _create_or_update_reputation_entry(self, nei: str, round_num: int, last_feed "round": round_num, "last_feedback_round": last_feedback_round, } - + if nei not in self.reputation: self.reputation[nei] = reputation_data elif self.reputation[nei].get("reputation") is None: @@ -401,21 +559,21 @@ def _create_or_update_reputation_entry(self, nei: str, round_num: int, last_feed def _validate_federation_nodes(self, federation_nodes) -> list: """ Validate and filter federation nodes. 
- + Args: federation_nodes: List of federation node identifiers - + Returns: list: List of valid node identifiers """ if not federation_nodes: return [] - + valid_nodes = [node for node in federation_nodes if node and str(node).strip()] - + if not valid_nodes: logging.warning("No valid federation nodes found after filtering") - + return valid_nodes async def _calculate_static_reputation( @@ -429,7 +587,7 @@ async def _calculate_static_reputation( Args: addr: The participant's address - nei: The neighbor's address + nei: The neighbor's address metric_values: Dictionary with metric values """ static_weights = { @@ -440,18 +598,28 @@ async def _calculate_static_reputation( } reputation_static = sum( - metric_values.get(metric_name, 0) * static_weights[metric_name] + metric_values.get(metric_name, 0) * static_weights[metric_name] for metric_name in static_weights ) - - logging.info(f"Static reputation for node {nei} at round {await self.engine.get_round()}: {reputation_static}") + + current_round = await self.engine.get_round() + logging.info( + f"Reputation debug | static raw calculation | round={current_round} node={nei} " + f"metrics={json.dumps(metric_values, sort_keys=True, default=str)} " + f"weights={json.dumps(static_weights, sort_keys=True, default=str)} " + f"raw_reputation={reputation_static}" + ) avg_reputation = await self.save_reputation_history_in_memory(self.engine.addr, nei, reputation_static) + logging.info( + f"Reputation debug | static smoothed result | round={current_round} node={nei} " + f"raw_reputation={reputation_static} smoothed_reputation={avg_reputation}" + ) metrics_data = { "addr": addr, "nei": nei, - "round": await self.engine.get_round(), + "round": current_round, "reputation_without_feedback": avg_reputation, **{f"average_{name}": weight for name, weight in static_weights.items()} } @@ -476,48 +644,48 @@ async def _calculate_dynamic_reputation(self, addr, neighbors): async def _calculate_average_weights(self): """Calculate average 
weights for all enabled metrics.""" average_weights = {} - + for metric_name in self.history_data.keys(): if self._is_metric_enabled(metric_name): average_weights[metric_name] = await self._get_metric_average_weight(metric_name) - + return average_weights - + async def _get_metric_average_weight(self, metric_name): """Get the average weight for a specific metric.""" if metric_name not in self.history_data or not self.history_data[metric_name]: logging.debug(f"No history data available for metric: {metric_name}") return 0 - + valid_entries = [ entry for entry in self.history_data[metric_name] - if (entry.get("round") is not None and - entry["round"] >= await self._engine.get_round() and + if (entry.get("round") is not None and + entry["round"] >= await self._engine.get_round() and entry.get("weight") not in [None, -1]) ] - + if not valid_entries: return 0 - + try: weights = [entry["weight"] for entry in valid_entries if entry.get("weight") is not None] return sum(weights) / len(weights) if weights else 0 except (TypeError, ZeroDivisionError) as e: logging.warning(f"Error calculating average weight for {metric_name}: {e}") return 0 - + async def _process_neighbors_reputation(self, addr, neighbors, average_weights): """Process reputation calculation for all neighbors.""" for nei in neighbors: metric_values = await self._get_neighbor_metric_values(nei) - + if all(metric_name in metric_values for metric_name in average_weights): await self._update_neighbor_reputation(addr, nei, metric_values, average_weights) - + async def _get_neighbor_metric_values(self, nei): """Get metric values for a specific neighbor in the current round.""" metric_values = {} - + for metric_name in self.history_data: if self._is_metric_enabled(metric_name): for entry in self.history_data.get(metric_name, []): @@ -526,26 +694,34 @@ async def _get_neighbor_metric_values(self, nei): entry.get("nei") == nei): metric_values[metric_name] = entry.get("metric_value", 0) break - + return metric_values - + 
async def _update_neighbor_reputation(self, addr, nei, metric_values, average_weights): """Update reputation for a specific neighbor.""" reputation_with_weights = sum( - metric_values.get(metric_name, 0) * average_weights[metric_name] + metric_values.get(metric_name, 0) * average_weights[metric_name] for metric_name in average_weights ) - + + current_round = await self._engine.get_round() logging.info( - f"Dynamic reputation with weights for {nei} at round {await self._engine.get_round()}: {reputation_with_weights}" + f"Reputation debug | dynamic raw calculation | round={current_round} node={nei} " + f"metrics={json.dumps(metric_values, sort_keys=True, default=str)} " + f"average_weights={json.dumps(average_weights, sort_keys=True, default=str)} " + f"raw_reputation={reputation_with_weights}" ) avg_reputation = await self.save_reputation_history_in_memory(self._engine.addr, nei, reputation_with_weights) + logging.info( + f"Reputation debug | dynamic smoothed result | round={current_round} node={nei} " + f"raw_reputation={reputation_with_weights} smoothed_reputation={avg_reputation}" + ) metrics_data = { "addr": addr, "nei": nei, - "round": await self._engine.get_round(), + "round": current_round, "reputation_without_feedback": avg_reputation, } @@ -564,7 +740,7 @@ async def _update_reputation_record(self, nei: str, reputation: float, data: dic data: Additional data to update (currently unused) """ current_round = await self._engine.get_round() - + if nei not in self.reputation: self.reputation[nei] = { "reputation": reputation, @@ -576,7 +752,7 @@ async def _update_reputation_record(self, nei: str, reputation: float, data: dic self.reputation[nei]["round"] = current_round logging.info(f"Reputation of node {nei}: {self.reputation[nei]['reputation']}") - + if self.reputation[nei]["reputation"] < self.REPUTATION_THRESHOLD and current_round > 0: self.rejected_nodes.add(nei) logging.info(f"Rejected node {nei} at round {current_round}") @@ -608,23 +784,28 @@ def 
calculate_weighted_values( reputation_metrics ) self._add_current_metrics_to_history(active_metrics, history_data, current_round, addr, nei) - + if current_round >= self.INITIAL_ROUND_FOR_REPUTATION and len(active_metrics) > 0: adjusted_weights = self._calculate_dynamic_weights(active_metrics, history_data) else: adjusted_weights = self._calculate_uniform_weights(active_metrics) - + self._update_history_with_weights(active_metrics, history_data, adjusted_weights, current_round, nei) + logging.info( + f"Reputation | metric values and weights | round={current_round} node={nei} " + f"active_metrics={json.dumps(active_metrics, sort_keys=True, default=str)} " + f"weights={json.dumps(adjusted_weights, sort_keys=True, default=str)}" + ) def _ensure_history_data_structure(self, history_data: dict): """Ensure all required keys exist in history data structure.""" required_keys = [ "num_messages", - "model_similarity", + "model_similarity", "fraction_parameters_changed", "model_arrival_latency", ] - + for key in required_keys: if key not in history_data: history_data[key] = [] @@ -644,7 +825,7 @@ def _get_active_metrics( "fraction_parameters_changed": fraction_score_asign, "model_arrival_latency": avg_model_arrival_latency, } - + return {k: v for k, v in all_metrics.items() if self._is_metric_enabled(k, reputation_metrics)} def _add_current_metrics_to_history(self, active_metrics: dict, history_data: dict, current_round: int, addr: str, nei: str): @@ -662,17 +843,25 @@ def _add_current_metrics_to_history(self, active_metrics: dict, history_data: di def _calculate_dynamic_weights(self, active_metrics: dict, history_data: dict) -> dict: """Calculate dynamic weights based on metric deviations.""" deviations = self._calculate_metric_deviations(active_metrics, history_data) - + if all(deviation == 0.0 for deviation in deviations.values()): - return self._generate_random_weights(active_metrics) + weights = self._generate_random_weights(active_metrics) else: normalized_weights = 
self._normalize_deviation_weights(deviations) - return self._adjust_weights_with_minimum(normalized_weights, deviations) + weights = self._adjust_weights_with_minimum(normalized_weights, deviations) + + logging.info( + "Reputation debug | dynamic weight calculation | " + f"active_metrics={json.dumps(active_metrics, sort_keys=True, default=str)} " + f"deviations={json.dumps(deviations, sort_keys=True, default=str)} " + f"weights={json.dumps(weights, sort_keys=True, default=str)}" + ) + return weights def _calculate_metric_deviations(self, active_metrics: dict, history_data: dict) -> dict: """Calculate deviations of current metrics from historical means.""" deviations = {} - + for metric_name, current_value in active_metrics.items(): historical_values = history_data[metric_name] metric_values = [ @@ -680,11 +869,11 @@ def _calculate_metric_deviations(self, active_metrics: dict, history_data: dict) for entry in historical_values if "metric_value" in entry and entry["metric_value"] != 0 ] - + mean_value = np.mean(metric_values) if metric_values else 0 deviation = abs(current_value - mean_value) deviations[metric_name] = deviation - + return deviations def _generate_random_weights(self, active_metrics: dict) -> dict: @@ -692,7 +881,7 @@ def _generate_random_weights(self, active_metrics: dict) -> dict: num_metrics = len(active_metrics) random_weights = [random.random() for _ in range(num_metrics)] total_random_weight = sum(random_weights) - + return { metric_name: weight / total_random_weight for metric_name, weight in zip(active_metrics, random_weights, strict=False) @@ -702,14 +891,14 @@ def _normalize_deviation_weights(self, deviations: dict) -> dict: """Normalize weights based on deviations.""" max_deviation = max(deviations.values()) if deviations else 1 normalized_weights = { - metric_name: (deviation / max_deviation) + metric_name: (deviation / max_deviation) for metric_name, deviation in deviations.items() } - + total_weight = sum(normalized_weights.values()) if 
total_weight > 0: return { - metric_name: weight / total_weight + metric_name: weight / total_weight for metric_name, weight in normalized_weights.items() } else: @@ -720,20 +909,20 @@ def _adjust_weights_with_minimum(self, normalized_weights: dict, deviations: dic """Apply minimum weight constraints and renormalize.""" mean_deviation = np.mean(list(deviations.values())) dynamic_min_weight = max(self.DYNAMIC_MIN_WEIGHT_THRESHOLD, mean_deviation / (mean_deviation + 1)) - + adjusted_weights = {} total_adjusted_weight = 0 - + for metric_name, weight in normalized_weights.items(): adjusted_weight = max(weight, dynamic_min_weight) adjusted_weights[metric_name] = adjusted_weight total_adjusted_weight += adjusted_weight - + # Renormalize if total weight exceeds 1 if total_adjusted_weight > 1: for metric_name in adjusted_weights: adjusted_weights[metric_name] /= total_adjusted_weight - + return adjusted_weights def _calculate_uniform_weights(self, active_metrics: dict) -> dict: @@ -748,8 +937,8 @@ def _update_history_with_weights(self, active_metrics: dict, history_data: dict, for metric_name in active_metrics: weight = weights.get(metric_name, -1) for entry in history_data[metric_name]: - if (entry["metric_name"] == metric_name and - entry["round"] == current_round and + if (entry["metric_name"] == metric_name and + entry["round"] == current_round and entry["nei"] == nei): entry["weight"] = weight @@ -765,7 +954,7 @@ async def calculate_value_metrics(self, addr, nei, metrics_active=None): try: current_round = await self._engine.get_round() metrics_instance = self.connection_metrics.get(nei) - + if not metrics_instance: logging.warning(f"No metrics found for neighbor {nei}") return self._get_default_metric_values() @@ -777,8 +966,16 @@ async def calculate_value_metrics(self, addr, nei, metrics_active=None): "similarity": self._process_model_similarity_metric(nei, current_round, metrics_active) } + logging.info( + f"Reputation debug | calculated metric results | 
round={current_round} node={nei} " + f"messages={json.dumps(metric_results['messages'], sort_keys=True, default=str)} " + f"similarity={metric_results['similarity']} " + f"fraction={metric_results['fraction']} " + f"latency={metric_results['latency']}" + ) + self._log_metrics_graphics(metric_results, addr, nei, current_round) - + return ( metric_results["messages"]["avg"], metric_results["similarity"], @@ -802,7 +999,7 @@ def _process_num_messages_metric(self, metrics_instance, addr: str, nei: str, cu filtered_messages = [ msg for msg in metrics_instance.messages if msg.get("current_round") == current_round ] - + for msg in filtered_messages: self.messages_number_message.append({ "number_message": msg.get("time"), @@ -813,12 +1010,16 @@ def _process_num_messages_metric(self, metrics_instance, addr: str, nei: str, cu normalized, count = self.manage_metric_number_message( self.messages_number_message, addr, nei, current_round, True ) - + avg = self.save_number_message_history(addr, nei, normalized, current_round) - + if avg is None and current_round > self.HISTORY_ROUNDS_LOOKBACK: avg = self.number_message_history[(addr, nei)][current_round - 1]["avg_number_message"] + logging.info( + f"Reputation debug | num_messages metric | round={current_round} node={nei} " + f"filtered_messages={len(filtered_messages)} normalized={normalized} count={count} avg={avg or 0}" + ) return {"normalized": normalized, "count": count, "avg": avg or 0} def _process_fraction_parameters_metric(self, metrics_instance, addr: str, nei: str, current_round: int, metrics_active) -> float: @@ -833,9 +1034,16 @@ def _process_fraction_parameters_metric(self, metrics_instance, addr: str, nei: score_fraction = self.analyze_anomalies(addr, nei, current_round, fraction_changed, threshold) if current_round >= self.INITIAL_ROUND_FOR_FRACTION: - return self._calculate_fraction_score_assignment(addr, nei, current_round, score_fraction) + final_fraction = self._calculate_fraction_score_assignment(addr, nei, 
current_round, score_fraction) else: - return 0 + final_fraction = 0 + + logging.info( + f"Reputation debug | fraction_parameters_changed metric | round={current_round} node={nei} " + f"raw_score={score_fraction} final_score={final_fraction} " + f"has_current_data={metrics_instance.fraction_of_params_changed.get('current_round') == current_round}" + ) + return final_fraction def _calculate_fraction_score_assignment(self, addr: str, nei: str, current_round: int, score_fraction: float) -> float: """Calculate the final fraction score assignment.""" @@ -900,14 +1108,24 @@ def _process_model_arrival_latency_metric(self, metrics_instance, addr: str, nei avg_latency = self.save_model_arrival_latency_history(nei, latency_normalized, current_round) if avg_latency is None and current_round > 1: avg_latency = self.model_arrival_latency_history[(addr, nei)][current_round - 1]["score"] + logging.info( + f"Reputation debug | model_arrival_latency metric | round={current_round} node={nei} " + f"latency_normalized={latency_normalized} avg_latency={avg_latency or 0} " + f"has_current_data={metrics_instance.model_arrival_latency.get('round_received') == current_round}" + ) return avg_latency or 0 - + return 0 def _process_model_similarity_metric(self, nei: str, current_round: int, metrics_active) -> float: """Process the model similarity metric.""" if current_round >= 1 and self._is_metric_enabled("model_similarity", metrics_active): - return self.calculate_similarity_from_metrics(nei, current_round) + similarity = self.calculate_similarity_from_metrics(nei, current_round) + logging.info( + f"Reputation debug | model_similarity metric | round={current_round} node={nei} " + f"similarity={similarity}" + ) + return similarity return 0 def _log_metrics_graphics(self, metric_results: dict, addr: str, nei: str, current_round: int): @@ -938,7 +1156,7 @@ def create_graphics_to_metrics( ): """ Create and log graphics for reputation metrics. 
- + Args: number_message_count: Count of messages for logging number_message_norm: Normalized message metric @@ -952,25 +1170,25 @@ def create_graphics_to_metrics( """ if current_round is None or current_round >= total_rounds: return - + self.engine.trainer._logger.log_data( - {f"R-Model_arrival_latency_reputation/{addr}": {nei: model_arrival_latency}}, + {f"R-Model_arrival_latency_reputation/{addr}": {nei: model_arrival_latency}}, step=current_round ) self.engine.trainer._logger.log_data( - {f"R-Count_messages_number_message_reputation/{addr}": {nei: number_message_count}}, + {f"R-Count_messages_number_message_reputation/{addr}": {nei: number_message_count}}, step=current_round ) self.engine.trainer._logger.log_data( - {f"R-number_message_reputation/{addr}": {nei: number_message_norm}}, + {f"R-number_message_reputation/{addr}": {nei: number_message_norm}}, step=current_round ) self.engine.trainer._logger.log_data( - {f"R-Similarity_reputation/{addr}": {nei: similarity}}, + {f"R-Similarity_reputation/{addr}": {nei: similarity}}, step=current_round ) self.engine.trainer._logger.log_data( - {f"R-Fraction_reputation/{addr}": {nei: fraction}}, + {f"R-Fraction_reputation/{addr}": {nei: fraction}}, step=current_round ) @@ -991,7 +1209,7 @@ def analyze_anomalies( try: key = (addr, nei, current_round) self._initialize_fraction_history_entry(key, fraction_changed, threshold) - + if current_round == 0: return self._handle_initial_round_anomalies(key, fraction_changed, threshold) else: @@ -1032,16 +1250,16 @@ def _handle_subsequent_round_anomalies( ) -> float: """Handle anomaly analysis for subsequent rounds.""" prev_stats = self._find_previous_valid_stats(addr, nei, current_round) - + if prev_stats is None: logging.warning(f"No valid previous stats found for {addr}, {nei}, round {current_round}") return 1.0 - + anomalies = self._detect_anomalies(fraction_changed, threshold, prev_stats) values = self._calculate_anomaly_values(fraction_changed, threshold, prev_stats, 
anomalies) fraction_score = self._calculate_combined_score(values) self._update_fraction_statistics(key, fraction_changed, threshold, prev_stats, anomalies, fraction_score) - + return max(fraction_score, 0) def _find_previous_valid_stats(self, addr: str, nei: str, current_round: int) -> dict: @@ -1049,18 +1267,18 @@ def _find_previous_valid_stats(self, addr: str, nei: str, current_round: int) -> for i in range(1, current_round + 1): candidate_key = (addr, nei, current_round - i) candidate_data = self.fraction_changed_history.get(candidate_key, {}) - + required_keys = ["mean_fraction", "std_dev_fraction", "mean_threshold", "std_dev_threshold"] if all(candidate_data.get(k) is not None for k in required_keys): return candidate_data - + return None def _detect_anomalies(self, current_fraction: float, current_threshold: float, prev_stats: dict) -> dict: """Detect if current values are anomalous compared to previous statistics.""" upper_mean_fraction = (prev_stats["mean_fraction"] + prev_stats["std_dev_fraction"]) * self.FRACTION_ANOMALY_MULTIPLIER upper_mean_threshold = (prev_stats["mean_threshold"] + prev_stats["std_dev_threshold"]) * self.THRESHOLD_ANOMALY_MULTIPLIER - + return { "fraction_anomaly": current_fraction > upper_mean_fraction, "threshold_anomaly": current_threshold > upper_mean_threshold, @@ -1074,19 +1292,19 @@ def _calculate_anomaly_values( """Calculate penalty values for fraction and threshold anomalies.""" fraction_value = 1.0 threshold_value = 1.0 - + if anomalies["fraction_anomaly"]: mean_fraction_prev = prev_stats["mean_fraction"] if mean_fraction_prev > 0: penalization_factor = abs(current_fraction - mean_fraction_prev) / mean_fraction_prev fraction_value = 1 - (1 / (1 + np.exp(-penalization_factor))) - + if anomalies["threshold_anomaly"]: mean_threshold_prev = prev_stats["mean_threshold"] if mean_threshold_prev > 0: penalization_factor = abs(current_threshold - mean_threshold_prev) / mean_threshold_prev threshold_value = 1 - (1 / (1 + 
np.exp(-penalization_factor))) - + return { "fraction_value": fraction_value, "threshold_value": threshold_value, @@ -1099,19 +1317,19 @@ def _calculate_combined_score(self, values: dict) -> float: return fraction_weight * values["fraction_value"] + threshold_weight * values["threshold_value"] def _update_fraction_statistics( - self, key: tuple, current_fraction: float, current_threshold: float, + self, key: tuple, current_fraction: float, current_threshold: float, prev_stats: dict, anomalies: dict, fraction_score: float ): """Update the fraction statistics for the current round.""" self.fraction_changed_history[key]["fraction_anomaly"] = anomalies["fraction_anomaly"] self.fraction_changed_history[key]["threshold_anomaly"] = anomalies["threshold_anomaly"] - + self.fraction_changed_history[key]["mean_fraction"] = (current_fraction + prev_stats["mean_fraction"]) / 2 self.fraction_changed_history[key]["mean_threshold"] = (current_threshold + prev_stats["mean_threshold"]) / 2 - + fraction_variance = ((current_fraction - prev_stats["mean_fraction"]) ** 2 + prev_stats["std_dev_fraction"] ** 2) / 2 threshold_variance = ((self.THRESHOLD_VARIANCE_MULTIPLIER * (current_threshold - prev_stats["mean_threshold"]) ** 2) + prev_stats["std_dev_threshold"] ** 2) / 2 - + self.fraction_changed_history[key]["std_dev_fraction"] = np.sqrt(fraction_variance) self.fraction_changed_history[key]["std_dev_threshold"] = np.sqrt(threshold_variance) self.fraction_changed_history[key]["fraction_score"] = fraction_score @@ -1132,9 +1350,9 @@ def manage_model_arrival_latency(self, addr, nei, latency, current_round, round_ """ try: current_key = nei - + self._initialize_latency_round_entry(current_round, current_key, latency) - + if current_round >= 1: score = self._calculate_latency_score(current_round, current_key, latency) self._update_latency_entry_with_score(current_round, current_key, score) @@ -1161,17 +1379,17 @@ def _calculate_latency_score(self, current_round: int, current_key: str, 
latency """Calculate the latency score based on historical data.""" target_round = self._get_target_round_for_latency(current_round) all_latencies = self._get_all_latencies_for_round(target_round) - + if not all_latencies: return 0.0 - + mean_latency = np.mean(all_latencies) augment_mean = mean_latency * self.LATENCY_AUGMENT_FACTOR - + if latency is None: logging.info(f"latency is None in round {current_round} for nei {current_key}") return -0.5 - + if latency <= augment_mean: return 1.0 else: @@ -1195,7 +1413,7 @@ def _update_latency_entry_with_score(self, current_round: int, current_key: str, target_round = self._get_target_round_for_latency(current_round) all_latencies = self._get_all_latencies_for_round(target_round) mean_latency = np.mean(all_latencies) if all_latencies else 0 - + self.model_arrival_latency_history[current_round][current_key].update({ "mean_latency": mean_latency, "score": score, @@ -1215,9 +1433,9 @@ def save_model_arrival_latency_history(self, nei, model_arrival_latency, round_n """ try: current_key = nei - + self._initialize_latency_history_entry(round_num, current_key, model_arrival_latency) - + if model_arrival_latency > 0 and round_num >= 1: avg_model_arrival_latency = self._calculate_latency_weighted_average_positive( round_num, current_key, model_arrival_latency @@ -1236,7 +1454,7 @@ def save_model_arrival_latency_history(self, nei, model_arrival_latency, round_n ) return avg_model_arrival_latency - + except Exception: logging.exception("Error saving model_arrival_latency history") @@ -1284,14 +1502,14 @@ def manage_metric_number_message( ) -> tuple[float, int]: """ Manage the number of messages metric for a specific neighbor. 
- + Args: messages_number_message: List of message data addr: Source address nei: Neighbor address current_round: Current round number metric_active: Whether the metric is active - + Returns: Tuple of (normalized_messages, messages_count) """ @@ -1301,13 +1519,13 @@ def manage_metric_number_message( messages_count = self._count_relevant_messages(messages_number_message, addr, nei, current_round) neighbor_stats = self._calculate_neighbor_statistics(messages_number_message, current_round) - + normalized_messages = self._calculate_normalized_messages(messages_count, neighbor_stats) - + normalized_messages = self._apply_historical_penalty( normalized_messages, addr, nei, current_round ) - + self._store_message_history(addr, nei, current_round, normalized_messages) normalized_messages = max(0.001, normalized_messages) @@ -1339,7 +1557,7 @@ def _calculate_neighbor_statistics(self, messages: list, current_round: int) -> neighbor_counts[key] = neighbor_counts.get(key, 0) + 1 counts_all_neighbors = list(neighbor_counts.values()) - + if not counts_all_neighbors: return { "percentile_reference": 0, @@ -1349,7 +1567,7 @@ def _calculate_neighbor_statistics(self, messages: list, current_round: int) -> } mean_messages = np.mean(counts_all_neighbors) - + return { "percentile_reference": np.percentile(counts_all_neighbors, 25), "std_dev": np.std(counts_all_neighbors), @@ -1361,10 +1579,10 @@ def _calculate_normalized_messages(self, messages_count: int, neighbor_stats: di """Calculate normalized message score with relative and extra penalties.""" normalized_messages = 1.0 penalties_applied = [] - + relative_increase = self._calculate_relative_increase(messages_count, neighbor_stats["percentile_reference"]) dynamic_margin = self._calculate_dynamic_margin(neighbor_stats) - + if relative_increase > dynamic_margin: penalty_ratio = self._calculate_penalty_ratio(relative_increase, dynamic_margin) normalized_messages *= np.exp(-(penalty_ratio**2)) @@ -1400,7 +1618,7 @@ def 
_calculate_penalty_ratio(self, relative_increase: float, dynamic_margin: flo def _should_apply_extra_penalty(self, messages_count: int, neighbor_stats: dict) -> bool: """Determine if extra penalty should be applied.""" - return (neighbor_stats["mean_messages"] > 0 and + return (neighbor_stats["mean_messages"] > 0 and messages_count > neighbor_stats["augment_mean"]) def _calculate_extra_penalty_factor(self, messages_count: int, neighbor_stats: dict) -> float: @@ -1408,7 +1626,7 @@ def _calculate_extra_penalty_factor(self, messages_count: int, neighbor_stats: d epsilon = 1e-6 mean_messages = neighbor_stats["mean_messages"] augment_mean = neighbor_stats["augment_mean"] - + extra_penalty = (messages_count - mean_messages) / (mean_messages + epsilon) amplification = 1 + (augment_mean / (mean_messages + epsilon)) return extra_penalty * amplification @@ -1417,27 +1635,27 @@ def _apply_historical_penalty(self, normalized_messages: float, addr: str, nei: """Apply historical penalty based on previous round's score.""" if current_round <= 1: return normalized_messages - + prev_data = ( self.number_message_history.get((addr, nei), {}) .get(current_round - 1, {}) ) - + prev_score = prev_data.get("normalized_messages") was_previously_penalized = prev_data.get("was_penalized", False) - + if prev_score is not None and prev_score < self.HISTORICAL_PENALTY_THRESHOLD: original_score = normalized_messages - + if was_previously_penalized: penalty_factor = self.HISTORICAL_PENALTY_THRESHOLD * 0.8 logging.debug(f"Repeated penalty applied to {nei}: stricter historical penalty") else: penalty_factor = self.HISTORICAL_PENALTY_THRESHOLD - + normalized_messages *= penalty_factor logging.debug(f"Historical penalty applied to {nei}: {original_score:.4f} -> {normalized_messages:.4f} (prev_score: {prev_score:.4f}, was_penalized: {was_previously_penalized})") - + return normalized_messages def _store_message_history(self, addr: str, nei: str, current_round: int, normalized_messages: float): @@ 
-1445,9 +1663,9 @@ def _store_message_history(self, addr: str, nei: str, current_round: int, normal key = (addr, nei) if key not in self.number_message_history: self.number_message_history[key] = {} - + was_penalized = normalized_messages < 1.0 - + self.number_message_history[key][current_round] = { "normalized_messages": normalized_messages, "was_penalized": was_penalized, @@ -1464,9 +1682,9 @@ def save_number_message_history(self, addr, nei, messages_number_message_normali """ try: key = (addr, nei) - + self._initialize_message_history_entry(key, current_round, messages_number_message_normalized) - + if messages_number_message_normalized > 0 and current_round >= 1: avg_number_message = self._calculate_weighted_average_positive(key, current_round, messages_number_message_normalized) elif messages_number_message_normalized == 0 and current_round >= 1: @@ -1478,7 +1696,7 @@ def save_number_message_history(self, addr, nei, messages_number_message_normali self.number_message_history[key][current_round]["avg_number_message"] = avg_number_message return avg_number_message - + except Exception: logging.exception("Error saving number_message history") return -1 @@ -1524,7 +1742,7 @@ async def save_reputation_history_in_memory(self, addr: str, nei: str, reputatio Args: addr: The node's identifier - nei: The neighboring node identifier + nei: The neighboring node identifier reputation: The reputation value to save Returns: @@ -1533,27 +1751,31 @@ async def save_reputation_history_in_memory(self, addr: str, nei: str, reputatio try: key = (addr, nei) current_round = await self._engine.get_round() - + if key not in self.reputation_history: self.reputation_history[key] = {} self.reputation_history[key][current_round] = reputation rounds = sorted(self.reputation_history[key].keys(), reverse=True)[:2] - + if len(rounds) >= 2: current_rep = self.reputation_history[key][rounds[0]] previous_rep = self.reputation_history[key][rounds[1]] - + current_weight = 
self.REPUTATION_CURRENT_WEIGHT previous_weight = self.REPUTATION_FEEDBACK_WEIGHT avg_reputation = (current_rep * current_weight) + (previous_rep * previous_weight) - - logging.info(f"Current reputation: {current_rep}, Previous reputation: {previous_rep}") - logging.info(f"Reputation ponderated: {avg_reputation}") + + logging.info( + f"Reputation debug | reputation smoothing | round={current_round} node={nei} " + f"current_raw={current_rep} previous_raw={previous_rep} " + f"current_weight={current_weight} previous_weight={previous_weight} " + f"smoothed={avg_reputation}" + ) else: avg_reputation = reputation - + return avg_reputation except Exception: @@ -1577,23 +1799,23 @@ def calculate_similarity_from_metrics(self, nei: str, current_round: int) -> flo return 0.0 relevant_metrics = [ - metric for metric in metrics_instance.similarity + metric for metric in metrics_instance.similarity if metric.get("nei") == nei and metric.get("current_round") == current_round ] - + if not relevant_metrics: relevant_metrics = [ - metric for metric in metrics_instance.similarity + metric for metric in metrics_instance.similarity if metric.get("nei") == nei ] - + if not relevant_metrics: return 0.0 neighbor_metric = relevant_metrics[-1] similarity_weights = { "cosine": 0.25, - "euclidean": 0.25, + "euclidean": 0.25, "manhattan": 0.25, "pearson_correlation": 0.25, } @@ -1604,7 +1826,7 @@ def calculate_similarity_from_metrics(self, nei: str, current_round: int) -> flo ) return max(0.0, min(1.0, similarity_value)) - + except Exception: return 0.0 @@ -1618,16 +1840,50 @@ async def calculate_reputation(self, ae: AggregationEvent): if not self._enabled: return + current_round = await self._engine.get_round() + if self._last_reputation_calculation_round == current_round: + logging.info(f"Reputation already calculated for round {current_round}; skipping") + return + (updates, _, _) = await ae.get_event_data() await self._log_reputation_calculation_start() - + neighbors = set(await 
self._engine._cm.get_addrs_current_connections(only_direct=True)) - + federation = self._engine.config.participant["scenario_args"].get("federation") + await self._process_neighbor_metrics(neighbors) await self._calculate_reputation_by_factor(neighbors) await self._handle_initial_reputation() - await self._process_feedback() + if federation != "CFL": + await self._process_feedback() await self._finalize_reputation_calculation(updates, neighbors) + self._last_reputation_calculation_round = current_round + + async def calculate_sdfl_reputation(self, _ree: RoundEndEvent): + """Calculate SDFL reputation at round end for trainers and aggregators.""" + # SDFL shares reputation tables instead of direct feedback messages at round end. + await self.calculate_and_send_sdfl_reputation_table() + + async def calculate_and_send_sdfl_reputation_table(self): + """Calculate local SDFL reputation and broadcast the table immediately.""" + if not self._enabled: + return + + current_round = await self._engine.get_round() + if self._last_reputation_calculation_round == current_round: + logging.info(f"Reputation already calculated for round {current_round}; skipping") + return + + await self._log_reputation_calculation_start() + + # Each node computes direct-neighbor reputation from locally observed metrics. 
+ neighbors = set(await self._engine._cm.get_addrs_current_connections(only_direct=True)) + await self._process_neighbor_metrics(neighbors) + await self._calculate_reputation_by_factor(neighbors) + await self._handle_initial_reputation() + await self._process_feedback() + await self._finalize_reputation_calculation({}, neighbors) + self._last_reputation_calculation_round = current_round async def _log_reputation_calculation_start(self): """Log the start of reputation calculation with relevant information.""" @@ -1644,7 +1900,7 @@ async def _process_neighbor_metrics(self, neighbors): metrics = await self.calculate_value_metrics( self._addr, nei, metrics_active=self._metrics ) - + if self._weighting_factor == "dynamic": await self._process_dynamic_metrics(nei, metrics) elif self._weighting_factor == "static" and await self._engine.get_round() >= 1: @@ -1653,7 +1909,7 @@ async def _process_neighbor_metrics(self, neighbors): async def _process_dynamic_metrics(self, nei, metrics): """Process metrics for dynamic weighting factor.""" (metric_messages_number, metric_similarity, metric_fraction, metric_model_arrival_latency) = metrics - + self.calculate_weighted_values( metric_messages_number, metric_similarity, @@ -1669,7 +1925,7 @@ async def _process_dynamic_metrics(self, nei, metrics): async def _process_static_metrics(self, nei, metrics): """Process metrics for static weighting factor.""" (metric_messages_number, metric_similarity, metric_fraction, metric_model_arrival_latency) = metrics - + metric_values_dict = { "num_messages": metric_messages_number, "model_similarity": metric_similarity, @@ -1698,7 +1954,7 @@ async def _process_feedback(self): """Process and include feedback in reputation.""" status = await self.include_feedback_in_reputation() current_round = await self._engine.get_round() - + if status: logging.info(f"Feedback included in reputation at round {current_round}") else: @@ -1709,7 +1965,191 @@ async def _finalize_reputation_calculation(self, updates, 
neighbors): if self.reputation is not None: self.create_graphic_reputation(self._addr, await self._engine.get_round()) await self.update_process_aggregation(updates) - await self.send_reputation_to_neighbors(neighbors) + federation = self._engine.config.participant["scenario_args"].get("federation") + if federation == "SDFL": + # SDFL forwards compact reputation tables so the aggregator can infer non-neighbor trust. + await self.send_reputation_table_to_neighbors(neighbors) + elif federation != "CFL": + await self.send_reputation_to_neighbors(neighbors) + + async def get_local_reputation_table(self, round_num: int = None): + """Return current-round reputation scores for direct neighbors only.""" + if round_num is None: + round_num = await self._engine.get_round() + + direct_neighbors = set(await self._engine.cm.get_addrs_current_connections(only_direct=True, myself=False)) + # Only export scores observed locally for this round; indirect scores are not re-shared. + return { + node_id: float(data["reputation"]) + for node_id, data in self.reputation.items() + if node_id in direct_neighbors + and data.get("round") == round_num + and data.get("reputation") is not None + } + + async def register_reputation_table(self, node_id: str, round_num: int, reputation_table: dict, received_from: str = None): + """Store a reputation table received for a round.""" + # Normalize table payloads at the boundary so aggregation uses numeric scores only. 
+ normalized_table = {} + for neighbor, score in reputation_table.items(): + try: + normalized_table[str(neighbor)] = float(score) + except (TypeError, ValueError): + logging.warning( + f"SDFL reputation | Ignoring invalid reputation score from table {node_id}: " + f"{neighbor}={score}" + ) + + self.reputation_tables.setdefault(round_num, {}) + self.reputation_tables[round_num][node_id] = normalized_table + + logging.info( + f"SDFL reputation | Stored reputation table from {node_id} for round {round_num} " + f"via {received_from}; tables={len(self.reputation_tables[round_num])}" + ) + + expected = self._reputation_tables_expected.get(round_num) + event = self._reputation_tables_events.get(round_num) + if expected and event and expected.issubset(self.reputation_tables[round_num].keys()): + # Wake any aggregator task blocked waiting for all expected reputation tables. + event.set() + + async def wait_reputation_tables(self, expected_nodes, round_num: int, timeout: float): + """Wait until all expected reputation tables arrive or the timeout expires.""" + expected_nodes = set(expected_nodes) + self._reputation_tables_expected[round_num] = expected_nodes + event = self._reputation_tables_events.setdefault(round_num, asyncio.Event()) + + # The table may have arrived before the wait was registered. 
+ if expected_nodes.issubset(self.reputation_tables.get(round_num, {}).keys()): + event.set() + + try: + await asyncio.wait_for(event.wait(), timeout=timeout) + except asyncio.TimeoutError: + missing = expected_nodes - set(self.reputation_tables.get(round_num, {}).keys()) + logging.info( + f"SDFL reputation | Timeout waiting reputation tables for round {round_num}; " + f"missing={sorted(missing)}" + ) + + tables = self.reputation_tables.get(round_num, {}) + missing = expected_nodes - set(tables.keys()) + return tables, missing + + def start_reputation_tables_collection(self, expected_nodes, round_num: int, timeout: float): + """Start a background wait for reputation tables of one SDFL round.""" + if round_num in self._reputation_tables_wait_tasks: + return + + # Keep collecting in the background so late tables are visible before aggregation. + async def _wait_and_log(): + tables, missing = await self.wait_reputation_tables(expected_nodes, round_num, timeout) + logging.info( + f"SDFL reputation | Reputation table collection snapshot for round {round_num}; " + f"received={len(tables)} missing={len(missing)} missing_nodes={sorted(missing)}" + ) + + self._reputation_tables_wait_tasks[round_num] = asyncio.create_task( + _wait_and_log(), + name=f"SDFL_reputation_tables_round_{round_num}", + ) + + async def calculate_indirect_reputation_for_non_neighbors( + self, + target_nodes, + expected_table_nodes, + round_num: int, + timeout: float, + ): + """Calculate indirect SDFL reputation for non-neighbor nodes from received tables.""" + direct_neighbors = set(await self._engine.cm.get_addrs_current_connections(only_direct=True, myself=False)) + # The aggregator already has direct scores for neighbors; tables fill the non-neighbor gap. 
+ target_nodes = set(target_nodes) - direct_neighbors - {self._addr} + expected_table_nodes = set(expected_table_nodes) + + if not target_nodes: + logging.info(f"SDFL reputation | No non-neighbor nodes require indirect reputation in round {round_num}") + return {} + + logging.info( + f"SDFL reputation | Waiting reputation tables before aggregation for round {round_num}; " + f"expected_tables={len(expected_table_nodes)} target_non_neighbors={sorted(target_nodes)}" + ) + tables, missing = await self.wait_reputation_tables(expected_table_nodes, round_num, timeout) + logging.info( + f"SDFL reputation | Reputation tables used before aggregation for round {round_num}; " + f"received={len(tables)} missing={len(missing)} missing_nodes={sorted(missing)}:\n" + f"{json.dumps(tables, sort_keys=True, indent=2)}" + ) + + indirect_reputations = {} + for node_id in target_nodes: + # Average all tables that contain the target node to estimate indirect reputation. + scores = [ + float(table[node_id]) + for table in tables.values() + if isinstance(table, dict) and node_id in table + ] + if not scores: + logging.info( + f"SDFL reputation | No received reputation table contains non-neighbor {node_id} " + f"for round {round_num}" + ) + continue + + reputation = float(sum(scores) / len(scores)) + self.reputation[node_id] = { + "reputation": reputation, + "round": round_num, + "last_feedback_round": self.reputation.get(node_id, {}).get("last_feedback_round", -1), + } + indirect_reputations[node_id] = reputation + + if reputation < self.REPUTATION_THRESHOLD and round_num > 0: + # Rejections based on indirect reputation affect aggregation weights for this round. 
+ self.rejected_nodes.add(node_id) + logging.info(f"SDFL reputation | Indirect reputation rejected node {node_id} at round {round_num}") + + logging.info( + f"SDFL reputation | Indirect reputations for non-neighbors before aggregation round {round_num}: " + f"{json.dumps(indirect_reputations, sort_keys=True)}; missing_tables={sorted(missing)}" + ) + return indirect_reputations + + async def send_reputation_table_to_neighbors(self, neighbors): + """Send the local SDFL reputation table through the forwarding channel.""" + round_num = await self._engine.get_round() + reputation_table = await self.get_local_reputation_table(round_num) + # Register our own table locally so local aggregation paths see the same state as receivers. + await self.register_reputation_table(self._addr, round_num, reputation_table, received_from=self._addr) + + if self._engine.rb.get_role_name(True) == "aggregator": + # Aggregators start waiting early because trainer tables may arrive before aggregation. + expected_nodes = self._engine.get_sdfl_expected_trainers() + timeout = float( + self._config.participant["defense_args"] + .get("reputation", {}) + .get("table_aggregation_timeout", 10) + ) + self.start_reputation_tables_collection(expected_nodes, round_num, timeout) + + message = self._engine.cm.create_message( + "reputationtable", + "table", + node_id=self._addr, + round=round_num, + reputation_table_json=json.dumps(reputation_table, sort_keys=True), + ) + + for neighbor in neighbors: + # Reputation tables are forwarded by the network layer in SDFL. + await self._engine.cm.send_message(neighbor, message) + + logging.info( + f"SDFL reputation | Sent reputation table for round {round_num} " + f"to {len(neighbors)} neighbors" + ) async def send_reputation_to_neighbors(self, neighbors): """ @@ -1735,7 +2175,7 @@ async def send_reputation_to_neighbors(self, neighbors): def create_graphic_reputation(self, addr: str, round_num: int): """ Log reputation data for visualization. 
- + Args: addr: The node address round_num: The round number for logging step @@ -1746,7 +2186,7 @@ def create_graphic_reputation(self, addr: str, round_num: int): for node_id, data in self.reputation.items() if data.get("reputation") is not None } - + if valid_reputations: reputation_data = {f"Reputation/{addr}": valid_reputations} self._engine.trainer._logger.log_data(reputation_data, step=round_num) @@ -1778,6 +2218,8 @@ async def update_process_aggregation(self, updates): logging.info(f"✅ Nei {nei} with reputation {rep:.4f}, scaled model with weight {weight:.4f}") else: logging.info(f"⛔ Nei {nei} with reputation {rep:.4f}, model rejected") + updates.pop(nei, None) + self.rejected_nodes.add(nei) logging.info(f"Updates after rejected nodes: {list(updates.keys())}") logging.info(f"Nodes rejected: {self.rejected_nodes}") @@ -1842,11 +2284,15 @@ async def on_round_start(self, rse: RoundStartEvent): if round_id not in self.round_timing_info: self.round_timing_info[round_id] = {} self.round_timing_info[round_id]["start_time"] = start_time + self._sdfl_training_finished_rounds.discard(round_id) expected_nodes.difference_update(self.rejected_nodes) expected_nodes = list(expected_nodes) self._recalculate_pending_latencies(round_id) async def recollect_model_arrival_latency(self, ure: UpdateReceivedEvent): + if not await self._should_recollect_update_event(ure): + return + (decoded_model, weight, source, round_num, local) = await ure.get_event_data() current_round = await self._engine.get_round() @@ -1954,26 +2400,29 @@ def _recalculate_pending_latencies(self, current_round): async def recollect_similarity(self, ure: UpdateReceivedEvent): """ Collect and analyze model similarity metrics. 
- + Args: ure: UpdateReceivedEvent containing model and metadata """ - (decoded_model, weight, nei, round_num, local) = await ure.get_event_data() - if not (self._enabled and self._is_metric_enabled("model_similarity")): return - + + if not await self._should_recollect_update_event(ure): + return + + (decoded_model, weight, nei, round_num, local) = await ure.get_event_data() + if not self._engine.config.participant["adaptive_args"]["model_similarity"]: return - + if nei == self._addr: return - + logging.info("🤖 handle_model_message | Checking model similarity") - + local_model = self._engine.trainer.get_model_parameters() similarity_values = self._calculate_all_similarity_metrics(local_model, decoded_model) - + similarity_metrics = { "timestamp": datetime.now(), "nei": nei, @@ -1996,7 +2445,7 @@ def _calculate_all_similarity_metrics(self, local_model: dict, received_model: d "jaccard": 0.0, "minkowski": 0.0, } - + similarity_functions = [ ("cosine", cosine_metric), ("euclidean", euclidean_metric), @@ -2004,29 +2453,29 @@ def _calculate_all_similarity_metrics(self, local_model: dict, received_model: d ("pearson_correlation", pearson_correlation_metric), ("jaccard", jaccard_metric), ] - + similarity_values = {} - + for name, metric_func in similarity_functions: try: similarity_values[name] = metric_func(local_model, received_model, similarity=True) except Exception: similarity_values[name] = 0.0 - + try: similarity_values["minkowski"] = minkowski_metric( local_model, received_model, p=2, similarity=True ) except Exception: similarity_values["minkowski"] = 0.0 - + return similarity_values def _store_similarity_metrics(self, nei: str, similarity_metrics: dict): """Store similarity metrics for the given neighbor.""" if nei not in self.connection_metrics: self.connection_metrics[nei] = Metrics() - + self.connection_metrics[nei].similarity.append(similarity_metrics) def _check_similarity_threshold(self, nei: str, cosine_value: float): @@ -2041,6 +2490,10 @@ async def 
recollect_number_message(self, source, message): async def recollect_duplicated_number_message(self, dme: DuplicatedMessageEvent): """Record a duplicated message event.""" + if self._engine.config.participant["scenario_args"].get("federation") == "SDFL": + # SDFL forwards model/table messages, so duplicates are not a reliable reputation signal. + return + event_data = await dme.get_event_data() if isinstance(event_data, tuple): source = event_data[0] @@ -2051,6 +2504,12 @@ async def recollect_duplicated_number_message(self, dme: DuplicatedMessageEvent) async def _record_message_data(self, source: str): """Record message data for the given source if it's not the current address.""" if source != self._addr: + if self._engine.config.participant["scenario_args"].get("federation") == "SDFL": + # In SDFL, message-count reputation is only meaningful for direct neighbors. + direct_neighbors = await self._engine.cm.get_addrs_current_connections(only_direct=True, myself=False) + if source not in direct_neighbors: + return + current_time = time.time() if current_time: self.save_data( @@ -2064,25 +2523,28 @@ async def _record_message_data(self, source: str): async def recollect_fraction_of_parameters_changed(self, ure: UpdateReceivedEvent): """ Collect and analyze the fraction of parameters that changed between models. 
- + Args: ure: UpdateReceivedEvent containing model and metadata """ + if not await self._should_recollect_update_event(ure): + return + (decoded_model, weight, source, round_num, local) = await ure.get_event_data() - + current_round = await self._engine.get_round() parameters_local = self._engine.trainer.get_model_parameters() - + prev_threshold = self._get_previous_threshold(source, current_round) differences = self._calculate_parameter_differences(parameters_local, decoded_model) current_threshold = self._calculate_threshold(differences, prev_threshold) - + changed_params, total_params, changes_record = self._count_changed_parameters( parameters_local, decoded_model, current_threshold ) - + fraction_changed = changed_params / total_params if total_params > 0 else 0.0 - + self._store_fraction_data(source, current_round, { "fraction_changed": fraction_changed, "total_params": total_params, @@ -2102,7 +2564,7 @@ async def recollect_fraction_of_parameters_changed(self, ure: UpdateReceivedEven def _get_previous_threshold(self, source: str, current_round: int) -> float: """Get the threshold from the previous round for the given source.""" - if (source in self.fraction_of_params_changed and + if (source in self.fraction_of_params_changed and current_round - 1 in self.fraction_of_params_changed[source]): return self.fraction_of_params_changed[source][current_round - 1][-1]["threshold"] return None @@ -2122,7 +2584,7 @@ def _calculate_threshold(self, differences: list, prev_threshold: float) -> floa """Calculate the threshold for determining parameter changes.""" if not differences: return 0 - + mean_threshold = torch.mean(torch.tensor(differences)).item() if prev_threshold is not None: return (prev_threshold + mean_threshold) / 2 @@ -2133,20 +2595,20 @@ def _count_changed_parameters(self, local_params: dict, received_params: dict, t total_params = 0 changed_params = 0 changes_record = {} - + for key in local_params.keys(): if key in received_params: local_tensor = 
local_params[key].cpu() received_tensor = received_params[key].cpu() diff = torch.abs(local_tensor - received_tensor) total_params += diff.numel() - + num_changed = torch.sum(diff > threshold).item() changed_params += num_changed - + if num_changed > 0: changes_record[key] = num_changed - + return changed_params, total_params, changes_record def _store_fraction_data(self, source: str, current_round: int, data: dict): @@ -2155,5 +2617,5 @@ def _store_fraction_data(self, source: str, current_round: int, data: dict): self.fraction_of_params_changed[source] = {} if current_round not in self.fraction_of_params_changed[source]: self.fraction_of_params_changed[source][current_round] = [] - - self.fraction_of_params_changed[source][current_round].append(data) \ No newline at end of file + + self.fraction_of_params_changed[source][current_round].append(data) diff --git a/nebula/addons/trustworthiness/calculation.py b/nebula/addons/trustworthiness/calculation.py deleted file mode 100755 index db3499f5d..000000000 --- a/nebula/addons/trustworthiness/calculation.py +++ /dev/null @@ -1,493 +0,0 @@ -import logging -import math -import numbers -import os.path -import statistics -from datetime import datetime -from math import e -from os.path import exists - -import numpy as np -import pandas as pd -import shap -import torch.nn -from art.estimators.classification import PyTorchClassifier -from art.metrics import clever_u -from codecarbon import EmissionsTracker -from scipy.stats import variation -from torch import nn, optim - -from nebula.addons.trustworthiness.utils import read_csv - -dirname = os.path.dirname(__file__) -logger = logging.getLogger(__name__) - -R_L1 = 40 -R_L2 = 2 -R_LI = 0.1 - - -def get_mapped_score(score_key, score_map): - """ - Finds the score by the score_key in the score_map. - - Args: - score_key (string): The key to look up in the score_map. - score_map (dict): The score map defined in the eval_metrics.json file. 
- - Returns: - float: The normalized score of [0, 1]. - """ - score = 0 - if score_map is None: - logger.warning("Score map is missing") - else: - keys = [key for key, value in score_map.items()] - scores = [value for key, value in score_map.items()] - normalized_scores = get_normalized_scores(scores) - normalized_score_map = dict(zip(keys, normalized_scores, strict=False)) - score = normalized_score_map.get(score_key, np.nan) - - return score - - -def get_normalized_scores(scores): - """ - Calculates the normalized scores of a list. - - Args: - scores (list): The values that will be normalized. - - Returns: - list: The normalized list. - """ - normalized = [(x - np.min(scores)) / (np.max(scores) - np.min(scores)) for x in scores] - return normalized - - -def get_range_score(value, ranges, direction="asc"): - """ - Maps the value to a range and gets the score by the range and direction. - - Args: - value (int): The input score. - ranges (list): The ranges defined. - direction (string): Asc means the higher the range the higher the score, desc means otherwise. - - Returns: - float: The normalized score of [0, 1]. - """ - - if not (type(value) == int or type(value) == float): - logger.warning("Input value is not a number") - logger.warning(f"{value}") - return 0 - else: - score = 0 - if ranges is None: - logger.warning("Score ranges are missing") - else: - total_bins = len(ranges) + 1 - bin = np.digitize(value, ranges, right=True) - score = 1 - (bin / total_bins) if direction == "desc" else bin / total_bins - return score - - -def get_map_value_score(score_key, score_map): - """ - Finds the score by the score_key in the score_map and returns the value. - - Args: - score_key (string): The key to look up in the score_map. - score_map (dict): The score map defined in the eval_metrics.json file. - - Returns: - float: The score obtained in the score_map. 
- """ - score = 0 - if score_map is None: - logger.warning("Score map is missing") - else: - score = score_map[score_key] - return score - - -def get_true_score(value, direction): - """ - Returns the negative of the value if direction is 'desc', otherwise returns value. - - Args: - value (int): The input score. - direction (string): Asc means the higher the range the higher the score, desc means otherwise. - - Returns: - float: The score obtained. - """ - - if value is True: - return 1 - elif value is False: - return 0 - else: - if not (type(value) == int or type(value) == float): - logger.warning("Input value is not a number") - logger.warning(f"{value}.") - return 0 - else: - if direction == "desc": - return 1 - value - else: - return value - - -def get_scaled_score(value, scale: list, direction: str): - """ - Maps a score of a specific scale into the scale between zero and one. - - Args: - value (int or float): The raw value of the metric. - scale (list): List containing the minimum and maximum value the value can fall in between. - - Returns: - float: The normalized score of [0, 1]. - """ - - score = 0 - try: - value_min, value_max = scale[0], scale[1] - except Exception: - logger.warning("Score minimum or score maximum is missing. The minimum has been set to 0 and the maximum to 1") - value_min, value_max = 0, 1 - if not value: - logger.warning("Score value is missing. Set value to zero") - else: - low, high = 0, 1 - if value >= value_max: - score = 1 - elif value <= value_min: - score = 0 - else: - diff = value_max - value_min - diffScale = high - low - score = (float(value) - value_min) * (float(diffScale) / diff) + low - if direction == "desc": - score = high - score - - return score - - -def get_value(value): - """ - Get the value of a metric. - - Args: - value (float): The value of the metric. - - Returns: - float: The value of the metric. - """ - - return value - - -def check_properties(*args): - """ - Check if all the arguments have values. 
- - Args: - args (list): All the arguments. - - Returns: - float: The mean of arguments that have values. - """ - - result = map(lambda x: x is not None and x != "", args) - return np.mean(list(result)) - - -def get_cv(list=None, std=None, mean=None): - """ - Get the coefficient of variation. - - Args: - list (list): List in which the coefficient of variation will be calculated. - std (float): Standard deviation of a list. - mean (float): Mean of a list. - - Returns: - float: The coefficient of variation calculated. - """ - if std is not None and mean is not None: - return std / mean - - if list is not None: - return np.std(list) / np.mean(list) - - return 0 - - -def get_global_privacy_risk(dp, epsilon, n): - """ - Calculates the global privacy risk by epsilon and the number of clients. - - Args: - dp (bool): Indicates if differential privacy is used or not. - epsilon (int): The epsilon value. - n (int): The number of clients in the scenario. - - Returns: - float: The global privacy risk. - """ - - if dp is True and isinstance(epsilon, numbers.Number): - return 1 / (1 + (n - 1) * math.pow(e, -epsilon)) - else: - return 1 - - -def get_elapsed_time(start_time, end_time): - """ - Calculates the elapsed time during the execution of the scenario. - - Args: - start_time (datetime): Start datetime. - end_time (datetime): End datetime. - - Returns: - float: The elapsed time. - """ - start_date = datetime.strptime(start_time, "%d/%m/%Y %H:%M:%S") - end_date = datetime.strptime(end_time, "%d/%m/%Y %H:%M:%S") - - elapsed_time = (end_date - start_date).total_seconds() / 60 - - return elapsed_time - - -def get_bytes_models(models_files): - """ - Calculates the mean bytes of the final models of the nodes. - - Args: - models_files (list): List of final models. - - Returns: - float: The mean bytes of the models. 
- """ - - total_models_size = 0 - number_models = len(models_files) - - for file in models_files: - model_size = os.path.getsize(file) - total_models_size += model_size - - avg_model_size = total_models_size / number_models - - return avg_model_size - - -def get_bytes_sent_recv(scenario_name): - """ - Calculates the mean bytes sent and received of the nodes. - - Args: - bytes_sent_files (list): Files that contain the bytes sent of the nodes. - bytes_recv_files (list): Files that contain the bytes received of the nodes. - - Returns: - 4-tupla: The total bytes sent, the total bytes received, the mean bytes sent and the mean bytes received of the nodes. - """ - total_upload_bytes = 0 - total_download_bytes = 0 - - data_file = os.path.join(os.environ.get('NEBULA_LOGS_DIR'), scenario_name, "trustworthiness", "data_results.csv") - - data = read_csv(data_file) - - number_files = len(data) - - total_upload_bytes = int(data["bytes_sent"].sum()) - total_download_bytes = int(data["bytes_recv"].sum()) - - avg_upload_bytes = total_upload_bytes / number_files - avg_download_bytes = total_download_bytes / number_files - - return total_upload_bytes, total_download_bytes, avg_upload_bytes, avg_download_bytes - - -def get_avg_loss_accuracy(scenario_name): - """ - Calculates the mean accuracy and loss models of the nodes. - - Args: - loss_files (list): Files that contain the loss of the models of the nodes. - accuracy_files (list): Files that contain the acurracies of the models of the nodes. - - Returns: - 3-tupla: The mean loss of the models, the mean accuracies of the models, the standard deviation of the accuracies of the models. 
- """ - total_accuracy = 0 - total_loss = 0 - - data_file = os.path.join(os.environ.get('NEBULA_LOGS_DIR'), scenario_name, "trustworthiness", "data_results.csv") - - data = read_csv(data_file) - - number_files = len(data) - - total_loss = data["loss"].sum() - total_accuracy = data["accuracy"].sum() - - avg_loss = total_loss / number_files - avg_accuracy = total_accuracy / number_files - std_accuracy = statistics.stdev(data["accuracy"]) - - return avg_loss, avg_accuracy, std_accuracy - -def get_feature_importance_cv(model, test_sample): - """ - Calculates the coefficient of variation of the feature importance. - - Args: - model (object): The model. - test_sample (object): One test sample to calculate the feature importance. - - Returns: - float: The coefficient of variation of the feature importance. - """ - - try: - cv = 0 - batch_size = 10 - device = "cpu" - - if isinstance(model, torch.nn.Module): - batched_data, _ = test_sample - - n = batch_size - m = math.floor(0.8 * n) - - background = batched_data[:m].to(device) - test_data = batched_data[m:n].to(device) - - e = shap.DeepExplainer(model, background) - shap_values = e.shap_values(test_data) - if shap_values is not None and len(shap_values) > 0: - sums = np.array([shap_values[i].sum() for i in range(len(shap_values))]) - abs_sums = np.absolute(sums) - cv = variation(abs_sums) - except Exception as e: - logger.warning("Could not compute feature importance CV with shap") - cv = 1 - if math.isnan(cv): - cv = 1 - return cv - - -def get_clever_score(model, test_sample, nb_classes, learning_rate): - """ - Calculates the CLEVER score. - - Args: - model (object): The model. - test_sample (object): One test sample to calculate the CLEVER score. - nb_classes (int): The nb_classes of the model. - learning_rate (float): The learning rate of the model. - - Returns: - float: The CLEVER score. 
- """ - - images, _ = test_sample - background = images[-1] - - criterion = nn.CrossEntropyLoss() - optimizer = optim.Adam(model.parameters(), learning_rate) - - # Create the ART classifier - classifier = PyTorchClassifier( - model=model, - loss=criterion, - optimizer=optimizer, - input_shape=(1, 28, 28), - nb_classes=nb_classes, - ) - - score_untargeted = clever_u( - classifier, - background.numpy(), - 10, - 5, - R_L2, - norm=2, - pool_factor=3, - verbose=False, - ) - return score_untargeted - - -def stop_emissions_tracking_and_save( - tracker: EmissionsTracker, - outdir: str, - emissions_file: str, - role: str, - workload: str, - sample_size: int = 0, -): - """ - Stops emissions tracking object from CodeCarbon and saves relevant information to emissions.csv file. - - Args: - tracker (object): The emissions tracker object holding information. - outdir (str): The path of the output directory of the experiment. - emissions_file (str): The path to the emissions file. - role (str): Either client or server depending on the role. - workload (str): Either aggregation or training depending on the workload. - sample_size (int): The number of samples used for training, if aggregation 0. 
- """ - - tracker.stop() - - emissions_file = os.path.join(outdir, emissions_file) - - if exists(emissions_file): - df = pd.read_csv(emissions_file) - else: - df = pd.DataFrame( - columns=[ - "role", - "energy_grid", - "emissions", - "workload", - "CPU_model", - "GPU_model", - ] - ) - try: - energy_grid = (tracker.final_emissions_data.emissions / tracker.final_emissions_data.energy_consumed) * 1000 - df = pd.concat( - [ - df, - pd.DataFrame({ - "role": role, - "energy_grid": [energy_grid], - "emissions": [tracker.final_emissions_data.emissions], - "workload": workload, - "CPU_model": tracker.final_emissions_data.cpu_model - if tracker.final_emissions_data.cpu_model - else "None", - "GPU_model": tracker.final_emissions_data.gpu_model - if tracker.final_emissions_data.gpu_model - else "None", - "CPU_used": True if tracker.final_emissions_data.cpu_energy else False, - "GPU_used": True if tracker.final_emissions_data.gpu_energy else False, - "energy_consumed": tracker.final_emissions_data.energy_consumed, - "sample_size": sample_size, - }), - ], - ignore_index=True, - ) - df.to_csv(emissions_file, encoding="utf-8", index=False) - except Exception as e: - logger.warning(e) diff --git a/nebula/addons/trustworthiness/cfl_factsheet.py b/nebula/addons/trustworthiness/cfl_factsheet.py new file mode 100755 index 000000000..1144571db --- /dev/null +++ b/nebula/addons/trustworthiness/cfl_factsheet.py @@ -0,0 +1,194 @@ +import logging +import os +from json import JSONDecodeError +import numpy as np +import pandas as pd + +from nebula.addons.trustworthiness.helpers.csv_io import read_csv +from nebula.addons.trustworthiness.helpers.data_distribution import ( + get_class_imbalance_score, + get_cv, +) +from nebula.addons.trustworthiness.helpers.factsheet_values import check_field_filled +from nebula.addons.trustworthiness.helpers.privacy import ( + get_global_privacy_risk, +) +from nebula.addons.trustworthiness.helpers.scenario_metrics import ( + get_avg_class_imbalance_model_size, 
+ get_avg_loss_accuracy, + get_bytes_sent_recv, + get_dp_global, + get_elapsed_time, + get_entropy_list, + get_underfitting_score, +) +from nebula.addons.trustworthiness.factsheet_common import ( + get_factsheet_path, + get_factsheet_template_name, + get_trustworthiness_dir, + load_or_create_factsheet, + populate_common_pre_train_sections, + populate_participation, + populate_reliability, + populate_reputation, + set_dp_configuration, + write_factsheet, +) +from nebula.addons.trustworthiness.factsheet_populators import populate_profile_metrics +# from nebula.core.models.syscall.mlp import SyscallModelMLP + +logger = logging.getLogger(__name__) + +class CflFactsheet: + def __init__(self): + # Manage the single CFL factsheet populated from server-side aggregation. + self.factsheet_file_nm = "factsheet.json" + self.factsheet_template_file_nm = "factsheet_template_cfl.json" + + def populate_factsheet_cfl( + self, + scenario_name, + data, + start_time, + end_time, + participant_idx, + model, + train_loader, + test_loader, + reputation_summary=None, + participation_summary=None, + reliability_summary=None, + ): + + # Resolve the output factsheet and template for federation/data type. + factsheet_file = get_factsheet_path(scenario_name, self.factsheet_file_nm) + factsheet_template_file_nm = get_factsheet_template_name( + data["federation"], + model, + self.factsheet_template_file_nm, + dataset_name=data["dataset"], + ) + + try: + factsheet_file, factsheet = load_or_create_factsheet( + scenario_name, + self.factsheet_file_nm, + factsheet_template_file_nm, + ) + + logging.info("FactSheet: Populating factsheet with pre training metrics") + + populate_common_pre_train_sections(factsheet, data, model) + + # CFL reads aggregate CSV artifacts from the scenario trust directory. + files_dir = get_trustworthiness_dir(scenario_name) + + emissions_file = os.path.join(files_dir, "emissions.csv") + + # Aggregate class imbalance, entropy and model size across participants. 
+ avg_class_imbalance, avg_model_size = get_avg_class_imbalance_model_size(scenario_name) + entropy_distribution = get_entropy_list (scenario_name) + + values = np.array(entropy_distribution) + + normalized_values = (values - np.min(values)) / (np.max(values) - np.min(values)) + + avg_entropy = np.mean(normalized_values) + + factsheet["data"]["avg_entropy"] = avg_entropy + + # Set global performance and fairness metrics from aggregate results. + result_avg_loss_accuracy = get_avg_loss_accuracy(scenario_name) + factsheet["performance"]["test_loss_avg"] = result_avg_loss_accuracy[0] + factsheet["performance"]["test_acc_avg"] = result_avg_loss_accuracy[1] + test_acc_cv = get_cv(std=result_avg_loss_accuracy[2], mean=result_avg_loss_accuracy[1]) + factsheet["fairness"]["test_acc_cv"] = 1 if test_acc_cv > 1 else test_acc_cv + factsheet["performance"]["test_macro_f1"] = result_avg_loss_accuracy[3] + factsheet["performance"]["train_accuracy"] = result_avg_loss_accuracy[4] + + # Compute CFL privacy risk from aggregate DP settings and client count. + dp_enabled, dp_epsilon = get_dp_global(scenario_name) + set_dp_configuration(factsheet, dp_enabled, dp_epsilon) + factsheet["privacy"]["privacy_risk"] = get_global_privacy_risk( + dp_enabled, + dp_epsilon, + factsheet["participants"]["client_num"], + ) + + # Populate system timing, model-size and communication totals. 
+ factsheet["system"]["avg_time_minutes"] = get_elapsed_time(start_time, end_time) + factsheet["system"]["avg_model_size"] = avg_model_size + + result_bytes_sent_recv = get_bytes_sent_recv(scenario_name) + factsheet["system"]["total_upload_bytes"] = result_bytes_sent_recv[0] + factsheet["system"]["total_download_bytes"] = result_bytes_sent_recv[1] + factsheet["system"]["avg_upload_bytes"] = result_bytes_sent_recv[2] + factsheet["system"]["avg_download_bytes"] = result_bytes_sent_recv[3] + populate_reliability(factsheet, reliability_summary) + populate_participation(factsheet, participation_summary) + + # Convert class imbalance and runtime summaries into factsheet fields. + class_imbalance_score = get_class_imbalance_score(avg_class_imbalance) + factsheet["fairness"]["class_imbalance"] = class_imbalance_score + populate_reputation(factsheet, reputation_summary) + + underfitting_score = get_underfitting_score(scenario_name, participant_idx) + + factsheet["fairness"]["underfitting"] = underfitting_score + # Add model/profile-specific metrics after base factsheet fields exist. + populate_profile_metrics( + factsheet, + data["federation"], + model, + train_loader, + test_loader, + factsheet["performance"]["test_acc_avg"], + ) + + # Enrich CodeCarbon emissions with CPU/GPU benchmark metadata. 
+ emissions = None if emissions_file is None else read_csv(emissions_file) + if emissions is not None: + logging.info("FactSheet: Populating emissions") + cpu_spez_df = pd.read_csv(os.path.join(os.path.dirname(__file__), "benchmarks", "CPU_benchmarks_v4.csv"), header=0) + emissions["CPU_model"] = emissions["CPU_model"].astype(str).str.replace(r"\([^)]*\)", "", regex=True) + emissions["CPU_model"] = emissions["CPU_model"].astype(str).str.replace(r" CPU", "", regex=True) + emissions["GPU_model"] = emissions["GPU_model"].astype(str).str.replace(r"[0-9] x ", "", regex=True) + emissions = pd.merge(emissions, cpu_spez_df[["cpuName", "powerPerf"]], left_on="CPU_model", right_on="cpuName", how="left") + gpu_spez_df = pd.read_csv(os.path.join(os.path.dirname(__file__), "benchmarks", "GPU_benchmarks_v7.csv"), header=0) + emissions = pd.merge(emissions, gpu_spez_df[["gpuName", "powerPerformance"]], left_on="GPU_model", right_on="gpuName", how="left") + + emissions.drop("cpuName", axis=1, inplace=True) + emissions.drop("gpuName", axis=1, inplace=True) + emissions["powerPerf"] = emissions["powerPerf"].astype(float) + emissions["powerPerformance"] = emissions["powerPerformance"].astype(float) + # Trainer rows represent client-side training cost. 
+ client_emissions = emissions.loc[emissions["role"] == "trainer"] + client_avg_carbon_intensity = round(client_emissions["energy_grid"].mean(), 2) + factsheet["sustainability"]["avg_carbon_intensity_clients"] = check_field_filled(factsheet, ["sustainability", "avg_carbon_intensity_clients"], client_avg_carbon_intensity, "") + factsheet["sustainability"]["emissions_training"] = check_field_filled(factsheet, ["sustainability", "emissions_training"], client_emissions["emissions"].sum(), "") + factsheet["participants"]["avg_dataset_size"] = check_field_filled(factsheet, ["participants", "avg_dataset_size"], client_emissions["sample_size"].mean(), "") + GPU_powerperf = (client_emissions.loc[client_emissions["GPU_used"] == True])["powerPerformance"] + CPU_powerperf = (client_emissions.loc[client_emissions["CPU_used"] == True])["powerPerf"] + clients_power_performance = round(pd.concat([GPU_powerperf, CPU_powerperf]).mean(), 2) + factsheet["sustainability"]["avg_power_performance_clients"] = check_field_filled(factsheet, ["sustainability", "avg_power_performance_clients"], clients_power_performance, "") + + # Server rows represent aggregation cost. 
+ server_emissions = emissions.loc[emissions["role"] == "server"] + server_avg_carbon_intensity = round(server_emissions["energy_grid"].mean(), 2) + factsheet["sustainability"]["avg_carbon_intensity_server"] = check_field_filled(factsheet, ["sustainability", "avg_carbon_intensity_server"], server_avg_carbon_intensity, "") + factsheet["sustainability"]["emissions_aggregation"] = check_field_filled(factsheet, ["sustainability", "emissions_aggregation"], server_emissions["emissions"].sum(), "") + GPU_powerperf = (server_emissions.loc[server_emissions["GPU_used"] == True])["powerPerformance"] + CPU_powerperf = (server_emissions.loc[server_emissions["CPU_used"] == True])["powerPerf"] + server_power_performance = round(pd.concat([GPU_powerperf, CPU_powerperf]).mean(), 2) + factsheet["sustainability"]["avg_power_performance_server"] = check_field_filled(factsheet, ["sustainability", "avg_power_performance_server"], server_power_performance, "") + + # Estimate communication emissions from byte counts and carbon intensity. + factsheet["sustainability"]["emissions_communication_uplink"] = check_field_filled(factsheet, ["sustainability", "emissions_communication_uplink"], factsheet["system"]["total_upload_bytes"] * 2.24e-10 * factsheet["sustainability"]["avg_carbon_intensity_clients"], "") + factsheet["sustainability"]["emissions_communication_downlink"] = check_field_filled(factsheet, ["sustainability", "emissions_communication_downlink"], factsheet["system"]["total_download_bytes"] * 2.24e-10 * factsheet["sustainability"]["avg_carbon_intensity_server"], "") + + write_factsheet(factsheet_file, factsheet) + + except JSONDecodeError as e: + # Keep corrupted factsheet failures explicit in logs. 
+ logging.info(f"{factsheet_file} is invalid") + logging.error(e) diff --git a/nebula/addons/trustworthiness/configs/eval_metrics_cfl.json b/nebula/addons/trustworthiness/configs/eval_metrics_cfl.json new file mode 100755 index 000000000..520e32ed6 --- /dev/null +++ b/nebula/addons/trustworthiness/configs/eval_metrics_cfl.json @@ -0,0 +1,1129 @@ +{ + "robustness": { + "resilience_to_attacks": { + "weight": 0.4, + "metrics": { + "certified_robustness": { + "inputs": [ + { + "source": "factsheet", + "field_path": "performance/test_clever_score" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Cross Lipschitz Extreme Value for network Robustness: attack-agnostic estimator of the lower bound βL", + "weight": 0.2 + }, + "inverse_loss_sensitivity": { + "inputs": [ + { + "source": "factsheet", + "field_path": "performance/inverse_test_loss_sensitivity" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Inverse loss sensitivity score; higher values indicate lower sensitivity of the loss to input perturbations.", + "weight": 0.2 + }, + "adversarial_accuracy": { + "inputs": [ + { + "source": "factsheet", + "field_path": "performance/test_adv_accuracy" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Adversarial accuracy; higher values indicate better predictive performance under adversarial perturbations.", + "weight": 0.2 + }, + "empirical_robustness_score": { + "inputs": [ + { + "source": "factsheet", + "field_path": "performance/test_empirical_robustness_score" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Empirical robustness score; higher values indicate stronger resistance to adversarial perturbations.", + "weight": 0.15 + }, + "confidence_score": { + "inputs": [ + { + "source": "factsheet", + "field_path": "performance/test_confidence_score" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Confidence score; higher 
values indicate more stable predictive confidence.", + "weight": 0.1 + }, + "inverse_attack_success_rate": { + "inputs": [ + { + "source": "factsheet", + "field_path": "performance/inverse_test_attack_success_rate" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Inverse attack success rate; higher values indicate a lower fraction of successful adversarial attacks.", + "weight": 0.15 + } + } + }, + "algorithm_robustness": { + "weight": 0.4, + "metrics": { + "performance": { + "inputs": [ + { + "source": "factsheet", + "field_path": "performance/test_acc_avg" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Average test accuracy of the global model on clients test data.", + "weight": 0.4 + }, + "macro_f1": { + "inputs": [ + { + "source": "factsheet", + "field_path": "performance/test_macro_f1" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Macro F1 score of the final model on test data.", + "weight": 0.4 + }, + "personalization": { + "inputs": [ + { + "source": "factsheet", + "field_path": "configuration/personalization" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "The use of personalized FL algorithm.", + "weight": 0.1 + }, + "reputation_enabled": { + "inputs": [ + { + "source": "factsheet", + "field_path": "configuration/reputation_enabled" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "The use of an active reputation-based defense mechanism.", + "weight": 0.1 + } + } + }, + "client_reliability": { + "weight": 0.2, + "metrics": { + "scale": { + "inputs": [ + { + "source": "factsheet", + "field_path": "participants/client_num" + } + ], + "operation": "get_value", + "type": "ranges", + "direction": "desc", + "ranges": [5, 10, 15, 20, 25, 30, 35, 40, 45, 50], + "description": "The number of clients in the model.", + "weight": 0.1 + }, + "average_neighbor_reputation": { + "inputs": [ + { + "source": "factsheet", 
+ "field_path": "participants/avg_neighbor_reputation" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Average reputation score of the neighbors associated with the node or federation.", + "weight": 0.3 + }, + "inverse_dropout_rate": { + "inputs": [ + { + "source": "factsheet", + "field_path": "system/dropout_rate" + } + ], + "operation": "get_value", + "type": "true_score", + "direction": "desc", + "description": "Fraction of expected client updates that were not received across rounds.", + "weight": 0.3 + }, + "inverse_timeout_rate": { + "inputs": [ + { + "source": "factsheet", + "field_path": "system/timeout_rate" + } + ], + "operation": "get_value", + "type": "true_score", + "direction": "desc", + "description": "Fraction of aggregation rounds that finished with missing expected client updates.", + "weight": 0.3 + } + } + } + }, + "privacy": { + "technique": { + "weight": 0.2, + "metrics": { + "differential_privacy": { + "inputs": [ + { + "source": "factsheet", + "field_path": "configuration/differential_privacy" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "The use of differential privacy.", + "weight": 1 + } + } + }, + "uncertainty": { + "weight": 0.6, + "metrics": { + "entropy": { + "inputs": [ + { + "source": "factsheet", + "field_path": "data/avg_entropy" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "The measure of uncertainty in identifying a client.", + "weight": 1 + } + } + }, + "indistinguishability": { + "weight": 0.2, + "metrics": { + "global_privacy_risk": { + "inputs": [ + { + "source": "factsheet", + "field_path": "configuration/differential_privacy" + }, + { + "source": "factsheet", + "field_path": "configuration/dp_epsilon" + }, + { + "source": "factsheet", + "field_path": "participants/client_num" + } + ], + "operation": "get_global_privacy_risk", + "type": "true_score", + "direction": "desc", + "description": "A worst-case approximation of the 
maximal risk for distinguishing two clients.", + "weight": 0.2 + }, + "epsilon_star": { + "inputs": [ + { + "source": "factsheet", + "field_path": "privacy/inverse_epsilon_star" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Empirical privacy leakage estimated from the separability of train and test loss distributions.", + "weight": 0.4 + }, + "mia_auc_score": { + "inputs": [ + { + "source": "factsheet", + "field_path": "privacy/mia_auc_score" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Trust-oriented score derived from the ROC-AUC of a loss-based membership inference attack.", + "weight": 0.4 + } + } + } + }, + "fairness": { + "selection_fairness": { + "weight": 0.25, + "metrics": { + "selection_variation": { + "inputs": [ + { + "source": "factsheet", + "field_path": "fairness/selection_cv" + } + ], + "operation": "get_value", + "type": "true_score", + "direction": "asc", + "description": "Variation in selection rate among the clients.", + "weight": 1 + } + } + }, + "performance_fairness": { + "weight": 0.25, + "metrics": { + "accuracy_variation": { + "inputs": [ + { + "source": "factsheet", + "field_path": "fairness/test_acc_cv" + } + ], + "operation": "get_value", + "type": "true_score", + "direction": "desc", + "description": "Variation of global model performance among the clients.", + "weight": 1 + } + } + }, + "class_distribution": { + "weight": 0.25, + "metrics": { + "class_imbalance": { + "inputs": [ + { + "source": "factsheet", + "field_path": "fairness/class_imbalance" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Variation of the sample size per class.", + "weight": 1 + } + } + }, + "outcome_fairness": { + "weight": 0.25, + "metrics": { + "underfitting": { + "inputs": [ + { + "source": "factsheet", + "field_path": "fairness/underfitting" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Held-out performance proxy used as an 
outcome-level fairness signal.", + "weight": 0.1 + }, + "inverse_overfitting": { + "inputs": [ + { + "source": "factsheet", + "field_path": "fairness/inverse_overfitting" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Generalization quality proxy transformed so higher is better.", + "weight": 0.15 + }, + "inverse_well_calibration_error": { + "inputs": [ + { + "source": "factsheet", + "field_path": "fairness/inverse_well_calibration_error" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Calibration quality of the predictive outputs represented as a trust-oriented score.", + "weight": 0.2 + }, + "inverse_generalized_entropy_index": { + "inputs": [ + { + "source": "factsheet", + "field_path": "fairness/inverse_generalized_entropy_index" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Outcome inequality score transformed so higher values indicate better fairness.", + "weight": 0.2 + }, + "inverse_theil_index": { + "inputs": [ + { + "source": "factsheet", + "field_path": "fairness/inverse_theil_index" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Theil-based outcome inequality score transformed so higher values indicate better fairness.", + "weight": 0.2 + }, + "inverse_coefficient_of_variation": { + "inputs": [ + { + "source": "factsheet", + "field_path": "fairness/inverse_coefficient_of_variation" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Coefficient-of-variation-based outcome fairness score.", + "weight": 0.15 + } + } + } + }, + "explainability": { + "interpretability": { + "weight": 0.4, + "metrics": { + "algorithmic_transparency": { + "inputs": [ + { + "source": "factsheet", + "field_path": "configuration/training_model" + } + ], + "operation": "get_value", + "type": "score_mapping", + "score_map": { + "RandomForestClassifier": 4, + "KNeighborsClassifier": 3, + "SVC": 2, + "GaussianProcessClassifier": 3, 
+ "DecisionTreeClassifier": 5, + "MLP": 1, + "AdaBoostClassifier": 3, + "GaussianNB": 3.5, + "QuadraticDiscriminantAnalysis": 3, + "LogisticRegression": 4, + "LinearRegression": 3.5, + "Sequential": 1, + "CNN": 1 + }, + "description": "Mapping of learning techniques to the level of explainability based on literature research and qualitative analysis of each learning technique.", + "weight": 0.6 + }, + "model_size": { + "inputs": [ + { + "source": "factsheet", + "field_path": "configuration/trainable_param_num" + } + ], + "operation": "get_value", + "type": "ranges", + "direction": "desc", + "ranges": [10e1, 10e2, 10e3, 10e4, 10e5, 10e6, 10e7, 10e8], + "description": "Ranges of how to map model size to a score from 1-5.", + "weight": 0.4 + } + } + }, + "post_hoc_methods": { + "weight": 0.6, + "metrics": { + "clipped_feature_importance": { + "inputs": [ + { + "source": "factsheet", + "field_path": "performance/clipped_test_feature_importance_cv" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Variation of feature importance scores of all the features.", + "weight": 0.2 + }, + "alpha_score": { + "inputs": [ + { + "source": "factsheet", + "field_path": "explainability/alpha_score" + } + ], + "operation": "get_value", + "type": "true_score", + "direction": "desc", + "description": "Fraction of features needed to explain most of the attribution mass; lower values indicate sparser and more focused explanations.", + "weight": 0.2 + }, + "spread_ratio": { + "inputs": [ + { + "source": "factsheet", + "field_path": "explainability/spread_ratio" + } + ], + "operation": "get_value", + "type": "true_score", + "direction": "desc", + "description": "Normalized entropy of the attribution distribution; lower values indicate explanations concentrated on fewer features.", + "weight": 0.2 + }, + "spread_divergence": { + "inputs": [ + { + "source": "factsheet", + "field_path": "explainability/spread_divergence" + } + ], + "operation": "get_value", +
"type": "true_score", + "description": "Jensen-Shannon divergence between the attribution distribution and a uniform distribution; higher values indicate more selective explanations.", + "weight": 0.2 + }, + "visualization": { + "inputs": [ + { + "source": "factsheet", + "field_path": "configuration/visualization" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "The use of graphical capabilities to show the explainability.", + "weight": 0.2 + } + } + } + }, + "accountability": { + "factsheet_completeness": { + "weight": 0.8, + "metrics": { + "project_specs": { + "inputs": [ + { + "source": "factsheet", + "field_path": "project/overview" + }, + { + "source": "factsheet", + "field_path": "project/purpose" + }, + { + "source": "factsheet", + "field_path": "project/background" + } + ], + "operation": "check_properties", + "type": "property_check", + "description": "Specifications of the project.", + "weight": 0.1 + }, + "participants": { + "inputs": [ + { + "source": "factsheet", + "field_path": "participants/client_num" + }, + { + "source": "factsheet", + "field_path": "participants/sample_client_rate" + }, + { + "source": "factsheet", + "field_path": "participants/client_selector" + }, + { + "source": "factsheet", + "field_path": "participants/avg_dataset_size" + }, + { + "source": "factsheet", + "field_path": "participants/avg_neighbor_reputation" + } + ], + "operation": "check_properties", + "type": "property_check", + "description": "Participants information.", + "weight": 0.1 + }, + "data": { + "inputs": [ + { + "source": "factsheet", + "field_path": "data/provenance" + }, + { + "source": "factsheet", + "field_path": "data/preprocessing" + }, + { + "source": "factsheet", + "field_path": "data/avg_entropy" + } + ], + "operation": "check_properties", + "type": "property_check", + "description": "Meta data about the data.", + "weight": 0.1 + }, + "configuration": { + "inputs": [ + { + "source": "factsheet", + "field_path": 
"configuration/aggregation_algorithm" + }, + { + "source": "factsheet", + "field_path": "configuration/training_model" + }, + { + "source": "factsheet", + "field_path": "configuration/personalization" + }, + { + "source": "factsheet", + "field_path": "configuration/visualization" + }, + { + "source": "factsheet", + "field_path": "configuration/monitoring" + }, + { + "source": "factsheet", + "field_path": "configuration/differential_privacy" + }, + { + "source": "factsheet", + "field_path": "configuration/dp_epsilon" + }, + { + "source": "factsheet", + "field_path": "configuration/trainable_param_num" + }, + { + "source": "factsheet", + "field_path": "configuration/total_round_num" + }, + { + "source": "factsheet", + "field_path": "configuration/learning_rate" + }, + { + "source": "factsheet", + "field_path": "configuration/local_update_steps" + } + ], + "operation": "check_properties", + "type": "property_check", + "description": "FL model configurations.", + "weight": 0.1 + }, + "performance": { + "inputs": [ + { + "source": "factsheet", + "field_path": "performance/test_loss_avg" + }, + { + "source": "factsheet", + "field_path": "performance/test_acc_avg" + }, + { + "source": "factsheet", + "field_path": "performance/test_macro_f1" + }, + { + "source": "factsheet", + "field_path": "performance/clipped_test_feature_importance_cv" + }, + { + "source": "factsheet", + "field_path": "performance/test_clever_score" + }, + { + "source": "factsheet", + "field_path": "performance/inverse_test_loss_sensitivity" + }, + { + "source": "factsheet", + "field_path": "performance/test_adv_accuracy" + }, + { + "source": "factsheet", + "field_path": "performance/test_empirical_robustness_score" + }, + { + "source": "factsheet", + "field_path": "performance/test_confidence_score" + }, + { + "source": "factsheet", + "field_path": "performance/inverse_test_attack_success_rate" + } + ], + "operation": "check_properties", + "type": "property_check", + "description": "Performance 
evaluation results.", + "weight": 0.1 + }, + "fairness": { + "inputs": [ + { + "source": "factsheet", + "field_path": "fairness/test_acc_cv" + }, + { + "source": "factsheet", + "field_path": "fairness/selection_cv" + }, + { + "source": "factsheet", + "field_path": "fairness/class_imbalance" + }, + { + "source": "factsheet", + "field_path": "fairness/underfitting" + }, + { + "source": "factsheet", + "field_path": "fairness/inverse_overfitting" + }, + { + "source": "factsheet", + "field_path": "fairness/inverse_well_calibration_error" + }, + { + "source": "factsheet", + "field_path": "fairness/inverse_generalized_entropy_index" + }, + { + "source": "factsheet", + "field_path": "fairness/inverse_theil_index" + }, + { + "source": "factsheet", + "field_path": "fairness/inverse_coefficient_of_variation" + } + ], + "operation": "check_properties", + "type": "property_check", + "description": "Fairness metrics results.", + "weight": 0.1 + }, + "system": { + "inputs": [ + { + "source": "factsheet", + "field_path": "system/avg_time_minutes" + }, + { + "source": "factsheet", + "field_path": "system/avg_model_size" + }, + { + "source": "factsheet", + "field_path": "system/total_upload_bytes" + }, + { + "source": "factsheet", + "field_path": "system/total_download_bytes" + }, + { + "source": "factsheet", + "field_path": "system/avg_upload_bytes" + }, + { + "source": "factsheet", + "field_path": "system/avg_download_bytes" + }, + { + "source": "factsheet", + "field_path": "system/dropout_rate" + }, + { + "source": "factsheet", + "field_path": "system/timeout_rate" + } + ], + "operation": "check_properties", + "type": "property_check", + "description": "System usage information.", + "weight": 0.1 + }, + "privacy": { + "inputs": [ + { + "source": "factsheet", + "field_path": "privacy/privacy_risk" + }, + { + "source": "factsheet", + "field_path": "privacy/epsilon_star" + }, + { + "source": "factsheet", + "field_path":
"privacy/inverse_epsilon_star" + }, + { + "source": "factsheet", + "field_path": "privacy/mia_auc" + }, + { + "source": "factsheet", + "field_path": "privacy/mia_auc_score" + } + ], + "operation": "check_properties", + "type": "property_check", + "description": "Explainability metrics documented in the factsheet.", + "weight": 0.1 + }, + "explainability": { + "inputs": [ + { + "source": "factsheet", + "field_path": "explainability/alpha_score" + }, + { + "source": "factsheet", + "field_path": "explainability/spread_ratio" + }, + { + "source": "factsheet", + "field_path": "explainability/spread_divergence" + } + ], + "operation": "check_properties", + "type": "property_check", + "description": "Sustainability and emissions metrics documented in the factsheet.", + "weight": 0.1 + }, + "sustainability": { + "inputs": [ + { + "source": "factsheet", + "field_path": "sustainability/avg_carbon_intensity_server" + }, + { + "source": "factsheet", + "field_path": "sustainability/avg_carbon_intensity_clients" + }, + { + "source": "factsheet", + "field_path": "sustainability/avg_power_performance_clients" + }, + { + "source": "factsheet", + "field_path": "sustainability/avg_power_performance_server" + }, + { + "source": "factsheet", + "field_path": "sustainability/emissions_training" + }, + { + "source": "factsheet", + "field_path": "sustainability/emissions_aggregation" + }, + { + "source": "factsheet", + "field_path": "sustainability/emissions_communication_uplink" + }, + { + "source": "factsheet", + "field_path": "sustainability/emissions_communication_downlink" + } + ], + "operation": "check_properties", + "type": "property_check", + "description": "System usage information.", + "weight": 0.1 + } + } + }, + "monitoring": { + "weight": 0.2, + "metrics": { + "logs_available": { + "inputs": [ + { + "source": "factsheet", + "field_path": "configuration/monitoring" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "The use of logs to show all the 
nodes.", + "weight": 1 + } + } + } + }, + "architectural_soundness": { + "client_management": { + "weight": 0.25, + "metrics": { + "client_selector": { + "inputs": [ + { + "source": "factsheet", + "field_path": "participants/client_selector" + } + ], + "operation": "get_value", + "type": "score_mapping", + "score_map": { + "Reputation Based": 1.0, + "Full Participation": 0.5 + }, + "description": "Mapping of client selection strategies to architectural soundness. Reputation-based selection is scored higher than full participation because it introduces an explicit selection mechanism.", + "weight": 1 + } + } + }, + "optimization": { + "weight": 0.5, + "metrics": { + "algorithm": { + "inputs": [ + { + "source": "factsheet", + "field_path": "configuration/aggregation_algorithm" + } + ], + "operation": "get_value", + "type": "score_map_value", + "score_map": { + "FedAvg": 0.9509, + "Krum": 0.9535, + "TrimmedMean": 0.9595, + "Median": 0.9461 + }, + "description": "The choice of a suitable aggregation algorithm.", + "weight": 1 + } + } + }, + "federation_management": { + "weight": 0.25, + "metrics": { + "topology_type": { + "inputs": [ + { + "source": "factsheet", + "field_path": "data/preprocessing" + } + ], + "operation": "get_value", + "type": "score_mapping", + "score_map": { + "Fully": 1.0, + "Star": 0.8, + "Ring": 0.6, + "Random": 0.2 + }, + "description": "Mapping of network topology types to architectural soundness, assuming fully connected topologies provide the strongest structural connectivity, followed by star, ring, and random topologies.", + "weight": 1 + } + } + } + }, + "sustainability": { + "energy_source": { + "weight": 0.5, + "metrics": { + "carbon_intensity_clients": { + "inputs": [ + { + "source": "factsheet", + "field_path": "sustainability/avg_carbon_intensity_clients" + } + ], + "operation": "get_value", + "type": "scaled_score", + "direction": "desc", + "scale": [20, 795], + "description": "Carbon intensity of energy grid used by clients", + 
"weight": 0.5 + }, + "carbon_intensity_server": { + "inputs": [ + { + "source": "factsheet", + "field_path": "sustainability/avg_carbon_intensity_server" + } + ], + "operation": "get_value", + "type": "scaled_score", + "direction": "desc", + "scale": [20, 795], + "description": "Carbon intensity of energy grid used by server", + "weight": 0.5 + } + } + }, + "hardware_efficiency": { + "weight": 0.25, + "metrics": { + "avg_power_performance_clients": { + "inputs": [ + { + "source": "factsheet", + "field_path": "sustainability/avg_power_performance_clients" + } + ], + "operation": "get_value", + "type": "scaled_score", + "direction": "asc", + "scale": [20, 1447], + "description": "Average Power Performanc of Client CPUs or GPUs", + "weight": 0.5 + }, + "avg_power_performance_server": { + "inputs": [ + { + "source": "factsheet", + "field_path": "sustainability/avg_power_performance_server" + } + ], + "operation": "get_value", + "type": "scaled_score", + "direction": "asc", + "scale": [20, 1447], + "description": "Power Performanc of Server CPU or GPU", + "weight": 0.5 + } + } + }, + "federation_complexity": { + "weight": 0.25, + "metrics": { + "communication_efficiency": { + "inputs": [ + { "source": "factsheet", "field_path": "system/total_upload_bytes" }, + { "source": "factsheet", "field_path": "system/total_download_bytes" }, + { "source": "factsheet", "field_path": "performance/test_acc_avg" } + ], + "operation": "comm_efficiency", + "type": "ranges", + "direction": "desc", + "ranges":[0.1, 10e2, 10e3,10e4, 10e5, 10e6,10e7,10e8,10e9,10e10,10e11], + "description": "Communication cost per unit of final test accuracy; lower values indicate more efficient federation communication.", + "weight": 0.3 + }, + "number_of_training_rounds": { + "inputs": [ + { + "source": "factsheet", + "field_path": "configuration/total_round_num" + } + ], + "operation": "get_value", + "type": "ranges", + "direction": "desc", + "ranges": [5, 10, 15, 20, 25, 30, 35, 40, 45, 50], + 
"description": "The total number of training rounds", + "weight": 0.15 + }, + "avg_model_size": { + "inputs": [ + { + "source": "factsheet", + "field_path": "configuration/trainable_param_num" + } + ], + "operation": "get_value", + "type": "ranges", + "direction": "desc", + "ranges":[10e4, 10e5, 10e6,10e7,10e8,10e9,10e10,10e11], + "description": "The size of the model", + "weight": 0.15 + }, + "client_selection_rate": { + "inputs": [ + { + "source": "factsheet", + "field_path": "participants/sample_client_rate" + } + ], + "operation": "get_value", + "type": "scaled_score", + "direction": "asc", + "scale": [ + 0.1,1 + ], + "description": "The selection rate of clients for each training round", + "weight": 0.1 + }, + "number_of_clients": { + "inputs": [ + { + "source": "factsheet", + "field_path": "participants/client_num" + } + ], + "operation": "get_value", + "type": "ranges", + "direction": "desc", + "ranges": [5, 10, 15, 20, 25, 30, 35, 40, 45, 50], + "description": "The number of clients in the federation.", + "weight": 0.1 + }, + "local_training_rounds": { + "inputs": [ + { + "source": "factsheet", + "field_path": "configuration/local_update_steps" + } + ], + "operation": "get_value", + "type": "scaled_score", + "direction": "desc", + "scale": [1, 100], + "description": "The number of local training rounds.", + "weight": 0.1 + }, + "avg_dataset_size": { + "inputs": [ + { + "source": "factsheet", + "field_path": "participants/avg_dataset_size" + } + ], + "operation": "get_value", + "type": "ranges", + "direction": "desc", + "ranges": [10e1, 10e2, 10e3, 10e4, 10e5], + "description": "The average number of training samples", + "weight": 0.1 + } + } + } + } + } diff --git a/nebula/addons/trustworthiness/configs/eval_metrics_cfl_images.json b/nebula/addons/trustworthiness/configs/eval_metrics_cfl_images.json new file mode 100755 index 000000000..520e32ed6 --- /dev/null +++ b/nebula/addons/trustworthiness/configs/eval_metrics_cfl_images.json @@ -0,0 +1,1129 @@ +{ + 
"robustness": { + "resilience_to_attacks": { + "weight": 0.4, + "metrics": { + "certified_robustness": { + "inputs": [ + { + "source": "factsheet", + "field_path": "performance/test_clever_score" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Cross Lipschitz Extreme Value for network Robustness: attack-agnostic estimator of the lower bound βL", + "weight": 0.2 + }, + "inverse_loss_sensitivity": { + "inputs": [ + { + "source": "factsheet", + "field_path": "performance/inverse_test_loss_sensitivity" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Inverse loss sensitivity score; higher values indicate lower sensitivity of the loss to input perturbations.", + "weight": 0.2 + }, + "adversarial_accuracy": { + "inputs": [ + { + "source": "factsheet", + "field_path": "performance/test_adv_accuracy" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Adversarial accuracy; higher values indicate better predictive performance under adversarial perturbations.", + "weight": 0.2 + }, + "empirical_robustness_score": { + "inputs": [ + { + "source": "factsheet", + "field_path": "performance/test_empirical_robustness_score" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Empirical robustness score; higher values indicate stronger resistance to adversarial perturbations.", + "weight": 0.15 + }, + "confidence_score": { + "inputs": [ + { + "source": "factsheet", + "field_path": "performance/test_confidence_score" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Confidence score; higher values indicate more stable predictive confidence.", + "weight": 0.1 + }, + "inverse_attack_success_rate": { + "inputs": [ + { + "source": "factsheet", + "field_path": "performance/inverse_test_attack_success_rate" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Inverse attack success rate; higher values indicate a lower 
fraction of successful adversarial attacks.", + "weight": 0.15 + } + } + }, + "algorithm_robustness": { + "weight": 0.4, + "metrics": { + "performance": { + "inputs": [ + { + "source": "factsheet", + "field_path": "performance/test_acc_avg" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Average test accuracy of the global model on clients test data.", + "weight": 0.4 + }, + "macro_f1": { + "inputs": [ + { + "source": "factsheet", + "field_path": "performance/test_macro_f1" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Macro F1 score of the final model on test data.", + "weight": 0.4 + }, + "personalization": { + "inputs": [ + { + "source": "factsheet", + "field_path": "configuration/personalization" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "The use of personalized FL algorithm.", + "weight": 0.1 + }, + "reputation_enabled": { + "inputs": [ + { + "source": "factsheet", + "field_path": "configuration/reputation_enabled" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "The use of an active reputation-based defense mechanism.", + "weight": 0.1 + } + } + }, + "client_reliability": { + "weight": 0.2, + "metrics": { + "scale": { + "inputs": [ + { + "source": "factsheet", + "field_path": "participants/client_num" + } + ], + "operation": "get_value", + "type": "ranges", + "direction": "desc", + "ranges": [5, 10, 15, 20, 25, 30, 35, 40, 45, 50], + "description": "The number of clients in the model.", + "weight": 0.1 + }, + "average_neighbor_reputation": { + "inputs": [ + { + "source": "factsheet", + "field_path": "participants/avg_neighbor_reputation" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Average reputation score of the neighbors associated with the node or federation.", + "weight": 0.3 + }, + "inverse_dropout_rate": { + "inputs": [ + { + "source": "factsheet", + "field_path": "system/dropout_rate" + } + 
], + "operation": "get_value", + "type": "true_score", + "direction": "desc", + "description": "Fraction of expected client updates that were not received across rounds.", + "weight": 0.3 + }, + "inverse_timeout_rate": { + "inputs": [ + { + "source": "factsheet", + "field_path": "system/timeout_rate" + } + ], + "operation": "get_value", + "type": "true_score", + "direction": "desc", + "description": "Fraction of aggregation rounds that finished with missing expected client updates.", + "weight": 0.3 + } + } + } + }, + "privacy": { + "technique": { + "weight": 0.2, + "metrics": { + "differential_privacy": { + "inputs": [ + { + "source": "factsheet", + "field_path": "configuration/differential_privacy" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "The use of differential privacy.", + "weight": 1 + } + } + }, + "uncertainty": { + "weight": 0.6, + "metrics": { + "entropy": { + "inputs": [ + { + "source": "factsheet", + "field_path": "data/avg_entropy" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "The measure of uncertainty in identifying a client.", + "weight": 1 + } + } + }, + "indistinguishability": { + "weight": 0.2, + "metrics": { + "global_privacy_risk": { + "inputs": [ + { + "source": "factsheet", + "field_path": "configuration/differential_privacy" + }, + { + "source": "factsheet", + "field_path": "configuration/dp_epsilon" + }, + { + "source": "factsheet", + "field_path": "participants/client_num" + } + ], + "operation": "get_global_privacy_risk", + "type": "true_score", + "direction": "desc", + "description": "A worst-case approximation of the maximal risk for distinguishing two clients.", + "weight": 0.2 + }, + "epsilon_star": { + "inputs": [ + { + "source": "factsheet", + "field_path": "privacy/inverse_epsilon_star" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Empirical privacy leakage estimated from the separability of train and test loss distributions.", + 
"weight": 0.4 + }, + "mia_auc_score": { + "inputs": [ + { + "source": "factsheet", + "field_path": "privacy/mia_auc_score" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Trust-oriented score derived from the ROC-AUC of a loss-based membership inference attack.", + "weight": 0.4 + } + } + } + }, + "fairness": { + "selection_fairness": { + "weight": 0.25, + "metrics": { + "selection_variation": { + "inputs": [ + { + "source": "factsheet", + "field_path": "fairness/selection_cv" + } + ], + "operation": "get_value", + "type": "true_score", + "direction": "asc", + "description": "Variation in selection rate among the clients.", + "weight": 1 + } + } + }, + "performance_fairness": { + "weight": 0.25, + "metrics": { + "accuracy_variation": { + "inputs": [ + { + "source": "factsheet", + "field_path": "fairness/test_acc_cv" + } + ], + "operation": "get_value", + "type": "true_score", + "direction": "desc", + "description": "Variation of global model performance among the clients.", + "weight": 1 + } + } + }, + "class_distribution": { + "weight": 0.25, + "metrics": { + "class_imbalance": { + "inputs": [ + { + "source": "factsheet", + "field_path": "fairness/class_imbalance" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Variation of the sample size per class.", + "weight": 1 + } + } + }, + "outcome_fairness": { + "weight": 0.25, + "metrics": { + "underfitting": { + "inputs": [ + { + "source": "factsheet", + "field_path": "fairness/underfitting" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Held-out performance proxy used as an outcome-level fairness signal.", + "weight": 0.1 + }, + "inverse_overfitting": { + "inputs": [ + { + "source": "factsheet", + "field_path": "fairness/inverse_overfitting" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Generalization quality proxy transformed so higher is better.", + "weight": 0.15 + }, + 
"inverse_well_calibration_error": { + "inputs": [ + { + "source": "factsheet", + "field_path": "fairness/inverse_well_calibration_error" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Calibration quality of the predictive outputs represented as a trust-oriented score.", + "weight": 0.2 + }, + "inverse_generalized_entropy_index": { + "inputs": [ + { + "source": "factsheet", + "field_path": "fairness/inverse_generalized_entropy_index" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Outcome inequality score transformed so higher values indicate better fairness.", + "weight": 0.2 + }, + "inverse_theil_index": { + "inputs": [ + { + "source": "factsheet", + "field_path": "fairness/inverse_theil_index" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Theil-based outcome inequality score transformed so higher values indicate better fairness.", + "weight": 0.2 + }, + "inverse_coefficient_of_variation": { + "inputs": [ + { + "source": "factsheet", + "field_path": "fairness/inverse_coefficient_of_variation" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Coefficient-of-variation-based outcome fairness score.", + "weight": 0.15 + } + } + } + }, + "explainability": { + "interpretability": { + "weight": 0.4, + "metrics": { + "algorithmic_transparency": { + "inputs": [ + { + "source": "factsheet", + "field_path": "configuration/training_model" + } + ], + "operation": "get_value", + "type": "score_mapping", + "score_map": { + "RandomForestClassifier": 4, + "KNeighborsClassifier": 3, + "SVC": 2, + "GaussianProcessClassifier": 3, + "DecisionTreeClassifier": 5, + "MLP": 1, + "AdaBoostClassifier": 3, + "GaussianNB": 3.5, + "QuadraticDiscriminantAnalysis": 3, + "LogisticRegression": 4, + "LinearRegression": 3.5, + "Sequential": 1, + "CNN": 1 + }, + "description": "Mapping of Learning techniques to the level of explainability based on on literature research and 
qualitative analysis of each learning technique.", + "weight": 0.6 + }, + "model_size": { + "inputs": [ + { + "source": "factsheet", + "field_path": "configuration/trainable_param_num" + } + ], + "operation": "get_value", + "type": "ranges", + "direction": "desc", + "ranges": [10e1, 10e2, 10e3, 10e4, 10e5, 10e6, 10e7, 10e8], + "description": "Ranges of how to map model size to a score from 1-5.", + "weight": 0.4 + } + } + }, + "post_hoc_methods": { + "weight": 0.6, + "metrics": { + "clipped_feature_importance": { + "inputs": [ + { + "source": "factsheet", + "field_path": "performance/clipped_test_feature_importance_cv" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Variation of feature importance scores of all the features.", + "weight": 0.2 + }, + "alpha_score": { + "inputs": [ + { + "source": "factsheet", + "field_path": "explainability/alpha_score" + } + ], + "operation": "get_value", + "type": "true_score", + "direction": "desc", + "description": "Fraction of features needed to explain most of the attribution mass; lower values indicate sparser and more focused explanations.", + "weight": 0.2 + }, + "spread_ratio": { + "inputs": [ + { + "source": "factsheet", + "field_path": "explainability/spread_ratio" + } + ], + "operation": "get_value", + "type": "true_score", + "direction": "desc", + "description": "Normalized entropy of the attribution distribution; lower values indicate explanations concentrated on fewer features.", + "weight": 0.2 + }, + "spread_divergence": { + "inputs": [ + { + "source": "factsheet", + "field_path": "explainability/spread_divergence" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Jensen-Shannon divergence between the attribution distribution and a uniform distribution; higher values indicate more selective explanations.", + "weight": 0.2 + }, + "visualization": { + "inputs": [ + { + "source": "factsheet", + "field_path": "configuration/visualization" + } + ], + 
"operation": "get_value", + "type": "true_score", + "description": "The use of graphical capabilities to show the explainability.", + "weight": 0.2 + } + } + } + }, + "accountability": { + "factsheet_completeness": { + "weight": 0.8, + "metrics": { + "project_specs": { + "inputs": [ + { + "source": "factsheet", + "field_path": "project/overview" + }, + { + "source": "factsheet", + "field_path": "project/purpose" + }, + { + "source": "factsheet", + "field_path": "project/background" + } + ], + "operation": "check_properties", + "type": "property_check", + "description": "Specifications of the project.", + "weight": 0.1 + }, + "participants": { + "inputs": [ + { + "source": "factsheet", + "field_path": "participants/client_num" + }, + { + "source": "factsheet", + "field_path": "participants/sample_client_rate" + }, + { + "source": "factsheet", + "field_path": "participants/client_selector" + }, + { + "source": "factsheet", + "field_path": "participants/avg_dataset_size" + }, + { + "source": "factsheet", + "field_path": "participants/avg_neighbor_reputation" + } + ], + "operation": "check_properties", + "type": "property_check", + "description": "Participants information.", + "weight": 0.1 + }, + "data": { + "inputs": [ + { + "source": "factsheet", + "field_path": "data/provenance" + }, + { + "source": "factsheet", + "field_path": "data/preprocessing" + }, + { + "source": "factsheet", + "field_path": "data/avg_entropy" + } + ], + "operation": "check_properties", + "type": "property_check", + "description": "Meta data about the data.", + "weight": 0.1 + }, + "configuration": { + "inputs": [ + { + "source": "factsheet", + "field_path": "configuration/aggregation_algorithm" + }, + { + "source": "factsheet", + "field_path": "configuration/training_model" + }, + { + "source": "factsheet", + "field_path": "configuration/personalization" + }, + { + "source": "factsheet", + "field_path": "configuration/visualization" + }, + { + "source": "factsheet", + "field_path": 
"configuration/monitoring" + }, + { + "source": "factsheet", + "field_path": "configuration/differential_privacy" + }, + { + "source": "factsheet", + "field_path": "configuration/dp_epsilon" + }, + { + "source": "factsheet", + "field_path": "configuration/trainable_param_num" + }, + { + "source": "factsheet", + "field_path": "configuration/total_round_num" + }, + { + "source": "factsheet", + "field_path": "configuration/learning_rate" + }, + { + "source": "factsheet", + "field_path": "configuration/local_update_steps" + } + ], + "operation": "check_properties", + "type": "property_check", + "description": "FL model configurations.", + "weight": 0.1 + }, + "performance": { + "inputs": [ + { + "source": "factsheet", + "field_path": "performance/test_loss_avg" + }, + { + "source": "factsheet", + "field_path": "performance/test_acc_avg" + }, + { + "source": "factsheet", + "field_path": "performance/test_macro_f1" + }, + { + "source": "factsheet", + "field_path": "performance/clipped_test_feature_importance_cv" + }, + { + "source": "factsheet", + "field_path": "performance/test_clever_score" + }, + { + "source": "factsheet", + "field_path": "performance/inverse_test_loss_sensitivity" + }, + { + "source": "factsheet", + "field_path": "performance/test_adv_accuracy" + }, + { + "source": "factsheet", + "field_path": "performance/test_empirical_robustness_score" + }, + { + "source": "factsheet", + "field_path": "performance/test_confidence_score" + }, + { + "source": "factsheet", + "field_path": "performance/inverse_test_attack_success_rate" + } + ], + "operation": "check_properties", + "type": "property_check", + "description": "Performance evaluation results.", + "weight": 0.1 + }, + "fairness": { + "inputs": [ + { + "source": "factsheet", + "field_path": "fairness/test_acc_cv" + }, + { + "source": "factsheet", + "field_path": "fairness/selection_cv" + }, + { + "source": "factsheet", + "field_path": "fairness/class_imbalance" + }, + { + "source": "factsheet", + 
"field_path": "fairness/underfitting" + }, + { + "source": "factsheet", + "field_path": "fairness/inverse_overfitting" + }, + { + "source": "factsheet", + "field_path": "fairness/inverse_well_calibration_error" + }, + { + "source": "factsheet", + "field_path": "fairness/inverse_generalized_entropy_index" + }, + { + "source": "factsheet", + "field_path": "fairness/inverse_theil_index" + }, + { + "source": "factsheet", + "field_path": "fairness/inverse_coefficient_of_variation" + } + ], + "operation": "check_properties", + "type": "property_check", + "description": "Fairness metrics results.", + "weight": 0.1 + }, + "system": { + "inputs": [ + { + "source": "factsheet", + "field_path": "system/avg_time_minutes" + }, + { + "source": "factsheet", + "field_path": "system/avg_model_size" + }, + { + "source": "factsheet", + "field_path": "system/total_upload_bytes" + }, + { + "source": "factsheet", + "field_path": "system/total_download_bytes" + }, + { + "source": "factsheet", + "field_path": "system/avg_upload_bytes" + }, + { + "source": "factsheet", + "field_path": "system/avg_download_bytes" + }, + { + "source": "factsheet", + "field_path": "system/dropout_rate" + }, + { + "source": "factsheet", + "field_path": "system/timeout_rate" + } + ], + "operation": "check_properties", + "type": "property_check", + "description": "System usage information.", + "weight": 0.1 + }, + "privacy": { + "inputs": [ + { + "source": "factsheet", + "field_path": "privacy/privacy_risk" + }, + { + "source": "factsheet", + "field_path": "privacy/epsilon_star" + }, + { + "source": "factsheet", + "field_path": "privacy/inverse_epsilon_star" + }, + { + "source": "factsheet", + "field_path": "privacy/mia_auc" + }, + { + "source": "factsheet", + "field_path": "privacy/mia_auc_score" + } + ], + "operation": "check_properties", + "type": "property_check", + "description": "Privacy metrics and risk estimates documented in the factsheet.", + "weight": 0.1 + }, + 
"explainability": { + "inputs": [ + { + "source": "factsheet", + "field_path": "explainability/alpha_score" + }, + { + "source": "factsheet", + "field_path": "explainability/spread_ratio" + }, + { + "source": "factsheet", + "field_path": "explainability/spread_divergence" + } + ], + "operation": "check_properties", + "type": "property_check", + "description": "Explainability metrics documented in the factsheet.", + "weight": 0.1 + }, + "sustainability": { + "inputs": [ + { + "source": "factsheet", + "field_path": "sustainability/avg_carbon_intensity_server" + }, + { + "source": "factsheet", + "field_path": "sustainability/avg_carbon_intensity_clients" + }, + { + "source": "factsheet", + "field_path": "sustainability/avg_power_performance_clients" + }, + { + "source": "factsheet", + "field_path": "sustainability/avg_power_performance_server" + }, + { + "source": "factsheet", + "field_path": "sustainability/emissions_training" + }, + { + "source": "factsheet", + "field_path": "sustainability/emissions_aggregation" + }, + { + "source": "factsheet", + "field_path": "sustainability/emissions_communication_uplink" + }, + { + "source": "factsheet", + "field_path": "sustainability/emissions_communication_downlink" + } + ], + "operation": "check_properties", + "type": "property_check", + "description": "Sustainability and emissions metrics documented in the factsheet.", + "weight": 0.1 + } + } + }, + "monitoring": { + "weight": 0.2, + "metrics": { + "logs_available": { + "inputs": [ + { + "source": "factsheet", + "field_path": "configuration/monitoring" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "The use of logs to show all the nodes.", + "weight": 1 + } + } + } + }, + "architectural_soundness": { + "client_management": { + "weight": 0.25, + "metrics": { + "client_selector": { + "inputs": [ + { + "source": "factsheet", + "field_path": "participants/client_selector" + } + ], + "operation": "get_value", + "type": "score_mapping", + "score_map": { + "Reputation
Based": 1.0, + "Full Participation": 0.5 + }, + "description": "Mapping of client selection strategies to architectural soundness. Reputation-based selection is scored higher than full participation because it introduces an explicit selection mechanism.", + "weight": 1 + } + } + }, + "optimization": { + "weight": 0.5, + "metrics": { + "algorithm": { + "inputs": [ + { + "source": "factsheet", + "field_path": "configuration/aggregation_algorithm" + } + ], + "operation": "get_value", + "type": "score_map_value", + "score_map": { + "FedAvg": 0.9509, + "Krum": 0.9535, + "TrimmedMean": 0.9595, + "Median": 0.9461 + }, + "description": "The choice of a suitable aggregation algorithm.", + "weight": 1 + } + } + }, + "federation_management": { + "weight": 0.25, + "metrics": { + "topology_type": { + "inputs": [ + { + "source": "factsheet", + "field_path": "data/preprocessing" + } + ], + "operation": "get_value", + "type": "score_mapping", + "score_map": { + "Fully": 1.0, + "Star": 0.8, + "Ring": 0.6, + "Random": 0.2 + }, + "description": "Mapping of network topology types to architectural soundness, assuming fully connected topologies provide the strongest structural connectivity, followed by star, ring, and random topologies.", + "weight": 1 + } + } + } + }, + "sustainability": { + "energy_source": { + "weight": 0.5, + "metrics": { + "carbon_intensity_clients": { + "inputs": [ + { + "source": "factsheet", + "field_path": "sustainability/avg_carbon_intensity_clients" + } + ], + "operation": "get_value", + "type": "scaled_score", + "direction": "desc", + "scale": [20, 795], + "description": "Carbon intensity of energy grid used by clients", + "weight": 0.5 + }, + "carbon_intensity_server": { + "inputs": [ + { + "source": "factsheet", + "field_path": "sustainability/avg_carbon_intensity_server" + } + ], + "operation": "get_value", + "type": "scaled_score", + "direction": "desc", + "scale": [20, 795], + "description": "Carbon intensity of energy grid used by server", + "weight": 
0.5 + } + } + }, + "hardware_efficiency": { + "weight": 0.25, + "metrics": { + "avg_power_performance_clients": { + "inputs": [ + { + "source": "factsheet", + "field_path": "sustainability/avg_power_performance_clients" + } + ], + "operation": "get_value", + "type": "scaled_score", + "direction": "asc", + "scale": [20, 1447], + "description": "Average Power Performance of Client CPUs or GPUs", + "weight": 0.5 + }, + "avg_power_performance_server": { + "inputs": [ + { + "source": "factsheet", + "field_path": "sustainability/avg_power_performance_server" + } + ], + "operation": "get_value", + "type": "scaled_score", + "direction": "asc", + "scale": [20, 1447], + "description": "Power Performance of Server CPU or GPU", + "weight": 0.5 + } + } + }, + "federation_complexity": { + "weight": 0.25, + "metrics": { + "communication_efficiency": { + "inputs": [ + { "source": "factsheet", "field_path": "system/total_upload_bytes" }, + { "source": "factsheet", "field_path": "system/total_download_bytes" }, + { "source": "factsheet", "field_path": "performance/test_acc_avg" } + ], + "operation": "comm_efficiency", + "type": "ranges", + "direction": "desc", + "ranges":[0.1, 10e2, 10e3,10e4, 10e5, 10e6,10e7,10e8,10e9,10e10,10e11], + "description": "Communication cost per unit of final test accuracy; lower values indicate more efficient federation communication.", + "weight": 0.3 + }, + "number_of_training_rounds": { + "inputs": [ + { + "source": "factsheet", + "field_path": "configuration/total_round_num" + } + ], + "operation": "get_value", + "type": "ranges", + "direction": "desc", + "ranges": [5, 10, 15, 20, 25, 30, 35, 40, 45, 50], + "description": "The total number of training rounds", + "weight": 0.15 + }, + "avg_model_size": { + "inputs": [ + { + "source": "factsheet", + "field_path": "configuration/trainable_param_num" + } + ], + "operation": "get_value", + "type": "ranges", + "direction": "desc", + "ranges":[10e4, 10e5, 10e6,10e7,10e8,10e9,10e10,10e11], + "description": 
"The size of the model", + "weight": 0.15 + }, + "client_selection_rate": { + "inputs": [ + { + "source": "factsheet", + "field_path": "participants/sample_client_rate" + } + ], + "operation": "get_value", + "type": "scaled_score", + "direction": "asc", + "scale": [ + 0.1,1 + ], + "description": "The selection rate of clients for each training round", + "weight": 0.1 + }, + "number_of_clients": { + "inputs": [ + { + "source": "factsheet", + "field_path": "participants/client_num" + } + ], + "operation": "get_value", + "type": "ranges", + "direction": "desc", + "ranges": [5, 10, 15, 20, 25, 30, 35, 40, 45, 50], + "description": "The number of clients in the federation.", + "weight": 0.1 + }, + "local_training_rounds": { + "inputs": [ + { + "source": "factsheet", + "field_path": "configuration/local_update_steps" + } + ], + "operation": "get_value", + "type": "scaled_score", + "direction": "desc", + "scale": [1, 100], + "description": "The number of local training rounds.", + "weight": 0.1 + }, + "avg_dataset_size": { + "inputs": [ + { + "source": "factsheet", + "field_path": "participants/avg_dataset_size" + } + ], + "operation": "get_value", + "type": "ranges", + "direction": "desc", + "ranges": [10e1, 10e2, 10e3, 10e4, 10e5], + "description": "The average number of training samples", + "weight": 0.1 + } + } + } + } + } diff --git a/nebula/addons/trustworthiness/configs/eval_metrics.json b/nebula/addons/trustworthiness/configs/eval_metrics_cfl_tabular.json similarity index 52% rename from nebula/addons/trustworthiness/configs/eval_metrics.json rename to nebula/addons/trustworthiness/configs/eval_metrics_cfl_tabular.json index 5ab1b3427..a75400052 100755 --- a/nebula/addons/trustworthiness/configs/eval_metrics.json +++ b/nebula/addons/trustworthiness/configs/eval_metrics_cfl_tabular.json @@ -3,18 +3,41 @@ "resilience_to_attacks": { "weight": 0.4, "metrics": { - "certified_robustness": { + "adversarial_accuracy": { "inputs": [ { "source": "factsheet", - "field_path": 
"performance/test_clever" + "field_path": "performance/test_adv_accuracy" } ], "operation": "get_value", - "score_function": "get_range_score", "type": "true_score", - "description": "Cross Lipschitz Extreme Value for network Robustness: attack-agnostic estimator of the lower bound βL", - "weight": 1 + "description": "Adversarial accuracy; higher values indicate better predictive performance under adversarial perturbations.", + "weight": 0.4444444444 + }, + "confidence_score": { + "inputs": [ + { + "source": "factsheet", + "field_path": "performance/test_confidence_score" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Confidence score; higher values indicate more stable predictive confidence.", + "weight": 0.2222222222 + }, + "inverse_attack_success_rate": { + "inputs": [ + { + "source": "factsheet", + "field_path": "performance/inverse_test_attack_success_rate" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Inverse attack success rate; higher values indicate a lower fraction of successful adversarial attacks.", + "weight": 0.3333333334 } } }, @@ -29,10 +52,21 @@ } ], "operation": "get_value", - "score_function": "get_true_score", "type": "true_score", "description": "Average test accuracy of the global model on clients test data.", - "weight": 0.5 + "weight": 0.4 + }, + "macro_f1": { + "inputs": [ + { + "source": "factsheet", + "field_path": "performance/test_macro_f1" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Macro F1 score of the final model on test data.", + "weight": 0.4 }, "personalization": { "inputs": [ @@ -44,7 +78,19 @@ "operation": "get_value", "type": "true_score", "description": "The use of personalized FL algorithm.", - "weight": 0.5 + "weight": 0.1 + }, + "reputation_enabled": { + "inputs": [ + { + "source": "factsheet", + "field_path": "configuration/reputation_enabled" + } + ], + "operation": "get_value", + "type": "true_score", + "description": 
"The use of an active reputation-based defense mechanism.", + "weight": 0.1 } } }, @@ -63,7 +109,45 @@ "direction": "desc", "ranges": [5, 10, 15, 20, 25, 30, 35, 40, 45, 50], "description": "The number of clients in the model.", - "weight": 1 + "weight": 0.1 + }, + "average_neighbor_reputation": { + "inputs": [ + { + "source": "factsheet", + "field_path": "participants/avg_neighbor_reputation" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Average reputation score of the neighbors associated with the node or federation.", + "weight": 0.3 + }, + "inverse_dropout_rate": { + "inputs": [ + { + "source": "factsheet", + "field_path": "system/dropout_rate" + } + ], + "operation": "get_value", + "type": "true_score", + "direction": "desc", + "description": "Fraction of expected client updates that were not received across rounds.", + "weight": 0.3 + }, + "inverse_timeout_rate": { + "inputs": [ + { + "source": "factsheet", + "field_path": "system/timeout_rate" + } + ], + "operation": "get_value", + "type": "true_score", + "direction": "desc", + "description": "Fraction of aggregation rounds that finished with missing expected client updates.", + "weight": 0.3 } } } @@ -125,14 +209,38 @@ "type": "true_score", "direction": "desc", "description": "A worst-case approximation of the maximal risk for distinguishing two clients.", - "weight": 1 + "weight": 0.2 + }, + "epsilon_star": { + "inputs": [ + { + "source": "factsheet", + "field_path": "privacy/inverse_epsilon_star" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Empirical privacy leakage estimated from the separability of train and test loss distributions.", + "weight": 0.4 + }, + "mia_auc_score": { + "inputs": [ + { + "source": "factsheet", + "field_path": "privacy/mia_auc_score" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Trust-oriented score derived from the ROC-AUC of a loss-based membership inference attack.", + "weight": 
0.4 } } } }, "fairness": { "selection_fairness": { - "weight": 0.3333, + "weight": 0.25, "metrics": { "selection_variation": { "inputs": [ @@ -150,7 +258,7 @@ } }, "performance_fairness": { - "weight": 0.3333, + "weight": 0.25, "metrics": { "accuracy_variation": { "inputs": [ @@ -168,7 +276,7 @@ } }, "class_distribution": { - "weight": 0.3333, + "weight": 0.25, "metrics": { "class_imbalance": { "inputs": [ @@ -179,11 +287,87 @@ ], "operation": "get_value", "type": "true_score", - "direction": "desc", "description": "Variation of the sample size per class.", "weight": 1 } } + }, + "outcome_fairness": { + "weight": 0.25, + "metrics": { + "underfitting": { + "inputs": [ + { + "source": "factsheet", + "field_path": "fairness/underfitting" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Held-out performance proxy used as an outcome-level fairness signal.", + "weight": 0.1 + }, + "inverse_overfitting": { + "inputs": [ + { + "source": "factsheet", + "field_path": "fairness/inverse_overfitting" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Generalization quality proxy transformed so higher is better.", + "weight": 0.15 + }, + "inverse_well_calibration_error": { + "inputs": [ + { + "source": "factsheet", + "field_path": "fairness/inverse_well_calibration_error" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Calibration quality of the predictive outputs represented as a trust-oriented score.", + "weight": 0.2 + }, + "inverse_generalized_entropy_index": { + "inputs": [ + { + "source": "factsheet", + "field_path": "fairness/inverse_generalized_entropy_index" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Outcome inequality score transformed so higher values indicate better fairness.", + "weight": 0.2 + }, + "inverse_theil_index": { + "inputs": [ + { + "source": "factsheet", + "field_path": "fairness/inverse_theil_index" + } + ], + "operation": 
"get_value", + "type": "true_score", + "description": "Theil-based outcome inequality score transformed so higher values indicate better fairness.", + "weight": 0.2 + }, + "inverse_coefficient_of_variation": { + "inputs": [ + { + "source": "factsheet", + "field_path": "fairness/inverse_coefficient_of_variation" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Coefficient-of-variation-based outcome fairness score.", + "weight": 0.15 + } + } } }, "explainability": { @@ -236,17 +420,55 @@ "post_hoc_methods": { "weight": 0.6, "metrics": { - "feature_importance": { + "clipped_feature_importance": { "inputs": [ { "source": "factsheet", - "field_path": "performance/test_feature_importance_cv" + "field_path": "performance/clipped_test_feature_importance_cv" } ], "operation": "get_value", "type": "true_score", "description": "Variation of feature importance scores of all the features.", - "weight": 0.5 + "weight": 0.2 + }, + "alpha_score": { + "inputs": [ + { + "source": "factsheet", + "field_path": "explainability/alpha_score" + } + ], + "operation": "get_value", + "type": "true_score", + "direction": "desc", + "description": "Fraction of features needed to explain most of the attribution mass; lower values indicate sparser and more focused explanations.", + "weight": 0.2 + }, + "spread_ratio": { + "inputs": [ + { + "source": "factsheet", + "field_path": "explainability/spread_ratio" + } + ], + "operation": "get_value", + "type": "true_score", + "direction": "desc", + "description": "Normalized entropy of the attribution distribution; lower values indicate explanations concentrated on fewer features.", + "weight": 0.2 + }, + "spread_divergence": { + "inputs": [ + { + "source": "factsheet", + "field_path": "explainability/spread_divergence" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Jensen-Shannon divergence between the attribution distribution and a uniform distribution; higher values indicate more selective 
explanations.", + "weight": 0.2 }, "visualization": { "inputs": [ @@ -258,14 +480,14 @@ "operation": "get_value", "type": "true_score", "description": "The use of graphical capabilities to show the explainability.", - "weight": 0.5 + "weight": 0.2 } } } }, "accountability": { "factsheet_completeness": { - "weight": 1, + "weight": 0.8, "metrics": { "project_specs": { "inputs": [ @@ -300,6 +522,14 @@ { "source": "factsheet", "field_path": "participants/client_selector" + }, + { + "source": "factsheet", + "field_path": "participants/avg_dataset_size" + }, + { + "source": "factsheet", + "field_path": "participants/avg_neighbor_reputation" } ], "operation": "check_properties", @@ -325,13 +555,13 @@ "operation": "check_properties", "type": "property_check", "description": "Meta data about the data.", - "weight": 0.2 + "weight": 0.1 }, "configuration": { "inputs": [ { "source": "factsheet", - "field_path": "configuration/optimization_algorithm" + "field_path": "configuration/aggregation_algorithm" }, { "source": "factsheet", @@ -341,6 +571,14 @@ "source": "factsheet", "field_path": "configuration/personalization" }, + { + "source": "factsheet", + "field_path": "configuration/visualization" + }, + { + "source": "factsheet", + "field_path": "configuration/monitoring" + }, { "source": "factsheet", "field_path": "configuration/differential_privacy" @@ -369,7 +607,7 @@ "operation": "check_properties", "type": "property_check", "description": "FL model configurations.", - "weight": 0.2 + "weight": 0.1 }, "performance": { "inputs": [ @@ -383,17 +621,29 @@ }, { "source": "factsheet", - "field_path": "performance/test_feature_importance_cv" + "field_path": "performance/test_macro_f1" }, { "source": "factsheet", - "field_path": "performance/test_clever" + "field_path": "performance/clipped_test_feature_importance_cv" + }, + { + "source": "factsheet", + "field_path": "performance/test_adv_accuracy" + }, + { + "source": "factsheet", + "field_path": "performance/test_confidence_score" 
+ }, + { + "source": "factsheet", + "field_path": "performance/inverse_test_attack_success_rate" } ], "operation": "check_properties", "type": "property_check", "description": "Performance evaluation results.", - "weight": 0.2 + "weight": 0.1 }, "fairness": { "inputs": [ @@ -408,6 +658,30 @@ { "source": "factsheet", "field_path": "fairness/class_imbalance" + }, + { + "source": "factsheet", + "field_path": "fairness/underfitting" + }, + { + "source": "factsheet", + "field_path": "fairness/inverse_overfitting" + }, + { + "source": "factsheet", + "field_path": "fairness/inverse_well_calibration_error" + }, + { + "source": "factsheet", + "field_path": "fairness/inverse_generalized_entropy_index" + }, + { + "source": "factsheet", + "field_path": "fairness/inverse_theil_index" + }, + { + "source": "factsheet", + "field_path": "fairness/inverse_coefficient_of_variation" } ], "operation": "check_properties", @@ -425,6 +699,14 @@ "source": "factsheet", "field_path": "system/avg_model_size" }, + { + "source": "factsheet", + "field_path": "system/total_upload_bytes" + }, + { + "source": "factsheet", + "field_path": "system/total_download_bytes" + }, { "source": "factsheet", "field_path": "system/avg_upload_bytes" @@ -432,6 +714,102 @@ { "source": "factsheet", "field_path": "system/avg_download_bytes" + }, + { + "source": "factsheet", + "field_path": "system/dropout_rate" + }, + { + "source": "factsheet", + "field_path": "system/timeout_rate" + } + ], + "operation": "check_properties", + "type": "property_check", + "description": "Privacy metrics and risk estimates documented in the factsheet.", + "weight": 0.1 + }, + "privacy": { + "inputs": [ + { + "source": "factsheet", + "field_path": "privacy/privacy_risk" + }, + { + "source": "factsheet", + "field_path": "privacy/epsilon_star" + }, + { + "source": "factsheet", + "field_path": "privacy/inverse_epsilon_star" + }, + { + "source": "factsheet", + "field_path": "privacy/mia_auc" + }, + { + "source": "factsheet", + 
"field_path": "privacy/mia_auc_score" + } + ], + "operation": "check_properties", + "type": "property_check", + "description": "Explainability metrics documented in the factsheet.", + "weight": 0.1 + }, + "explainability": { + "inputs": [ + { + "source": "factsheet", + "field_path": "explainability/alpha_score" + }, + { + "source": "factsheet", + "field_path": "explainability/spread_ratio" + }, + { + "source": "factsheet", + "field_path": "explainability/spread_divergence" + } + ], + "operation": "check_properties", + "type": "property_check", + "description": "Sustainability and emissions metrics documented in the factsheet.", + "weight": 0.1 + }, + "sustainability": { + "inputs": [ + { + "source": "factsheet", + "field_path": "sustainability/avg_carbon_intensity_server" + }, + { + "source": "factsheet", + "field_path": "sustainability/avg_carbon_intensity_clients" + }, + { + "source": "factsheet", + "field_path": "sustainability/avg_power_performance_clients" + }, + { + "source": "factsheet", + "field_path": "sustainability/avg_power_performance_server" + }, + { + "source": "factsheet", + "field_path": "sustainability/emissions_training" + }, + { + "source": "factsheet", + "field_path": "sustainability/emissions_aggregation" + }, + { + "source": "factsheet", + "field_path": "sustainability/emissions_communication_uplink" + }, + { + "source": "factsheet", + "field_path": "sustainability/emissions_communication_downlink" } ], "operation": "check_properties", @@ -440,11 +818,28 @@ "weight": 0.1 } } + }, + "monitoring": { + "weight": 0.2, + "metrics": { + "logs_available": { + "inputs": [ + { + "source": "factsheet", + "field_path": "configuration/monitoring" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "The use of logs to show all the nodes.", + "weight": 1 + } + } } }, "architectural_soundness": { "client_management": { - "weight": 0.5, + "weight": 0.25, "metrics": { "client_selector": { "inputs": [ @@ -453,9 +848,13 @@ 
"field_path": "participants/client_selector" } ], - "operation": "check_properties", - "type": "property_check", - "description": "The use of a client selector.", + "operation": "get_value", + "type": "score_mapping", + "score_map": { + "Reputation Based": 1.0, + "Full Participation": 0.5 + }, + "description": "Mapping of client selection strategies to architectural soundness. Reputation-based selection is scored higher than full participation because it introduces an explicit selection mechanism.", "weight": 1 } } @@ -482,6 +881,29 @@ "weight": 1 } } + }, + "federation_management": { + "weight": 0.25, + "metrics": { + "topology_type": { + "inputs": [ + { + "source": "factsheet", + "field_path": "data/preprocessing" + } + ], + "operation": "get_value", + "type": "score_mapping", + "score_map": { + "Fully": 1.0, + "Star": 0.8, + "Ring": 0.6, + "Random": 0.2 + }, + "description": "Mapping of network topology types to architectural soundness, assuming fully connected topologies provide the strongest structural connectivity, followed by star, ring, and random topologies.", + "weight": 1 + } + } } }, "sustainability": { @@ -554,6 +976,19 @@ "federation_complexity": { "weight": 0.25, "metrics": { + "communication_efficiency": { + "inputs": [ + { "source": "factsheet", "field_path": "system/total_upload_bytes" }, + { "source": "factsheet", "field_path": "system/total_download_bytes" }, + { "source": "factsheet", "field_path": "performance/test_acc_avg" } + ], + "operation": "comm_efficiency", + "type": "ranges", + "direction": "desc", + "ranges":[0.1, 10e2, 10e3,10e4, 10e5, 10e6,10e7,10e8,10e9,10e10,10e11], + "description": "Communication cost per unit of final test accuracy; lower values indicate more efficient federation communication.", + "weight": 0.3 + }, "number_of_training_rounds": { "inputs": [ { @@ -566,7 +1001,7 @@ "direction": "desc", "ranges": [5, 10, 15, 20, 25, 30, 35, 40, 45, 50], "description": "The total number of training rounds", - "weight": 0.16666666 
+ "weight": 0.15 }, "avg_model_size": { "inputs": [ @@ -580,7 +1015,7 @@ "direction": "desc", "ranges":[10e4, 10e5, 10e6,10e7,10e8,10e9,10e10,10e11], "description": "The size of the model", - "weight": 0.16666666 + "weight": 0.15 }, "client_selection_rate": { "inputs": [ @@ -596,7 +1031,7 @@ 0.1,1 ], "description": "The selection rate of clients for each training round", - "weight": 0.16666666 + "weight": 0.1 }, "number_of_clients": { "inputs": [ @@ -610,7 +1045,7 @@ "direction": "desc", "ranges": [5, 10, 15, 20, 25, 30, 35, 40, 45, 50], "description": "The number of clients in the federation.", - "weight": 0.16666666 + "weight": 0.1 }, "local_training_rounds": { "inputs": [ @@ -624,7 +1059,7 @@ "direction": "desc", "scale": [1, 100], "description": "The number of local training rounds.", - "weight": 0.16666666 + "weight": 0.1 }, "avg_dataset_size": { "inputs": [ @@ -638,7 +1073,7 @@ "direction": "desc", "ranges": [10e1, 10e2, 10e3, 10e4, 10e5], "description": "The average number of training samples", - "weight": 0.16666666 + "weight": 0.1 } } } diff --git a/nebula/addons/trustworthiness/configs/eval_metrics_dfl.json b/nebula/addons/trustworthiness/configs/eval_metrics_dfl.json new file mode 100755 index 000000000..b43295c1d --- /dev/null +++ b/nebula/addons/trustworthiness/configs/eval_metrics_dfl.json @@ -0,0 +1,1034 @@ +{ + "robustness": { + "resilience_to_attacks": { + "weight": 0.4, + "metrics": { + "certified_robustness": { + "inputs": [ + { + "source": "factsheet", + "field_path": "performance/test_clever_score" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Cross Lipschitz Extreme Value for network Robustness: attack-agnostic estimator of the lower bound βL", + "weight": 0.2 + }, + "inverse_loss_sensitivity": { + "inputs": [ + { + "source": "factsheet", + "field_path": "performance/inverse_test_loss_sensitivity" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Inverse loss sensitivity score; 
higher values indicate lower sensitivity of the loss to input perturbations.", + "weight": 0.2 + }, + "adversarial_accuracy": { + "inputs": [ + { + "source": "factsheet", + "field_path": "performance/test_adv_accuracy" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Adversarial accuracy; higher values indicate better predictive performance under adversarial perturbations.", + "weight": 0.2 + }, + "empirical_robustness_score": { + "inputs": [ + { + "source": "factsheet", + "field_path": "performance/test_empirical_robustness_score" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Empirical robustness score; higher values indicate stronger resistance to adversarial perturbations.", + "weight": 0.15 + }, + "confidence_score": { + "inputs": [ + { + "source": "factsheet", + "field_path": "performance/test_confidence_score" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Confidence score; higher values indicate more stable predictive confidence.", + "weight": 0.1 + }, + "inverse_attack_success_rate": { + "inputs": [ + { + "source": "factsheet", + "field_path": "performance/inverse_test_attack_success_rate" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Inverse attack success rate; higher values indicate a lower fraction of successful adversarial attacks.", + "weight": 0.15 + } + } + }, + "algorithm_robustness": { + "weight": 0.4, + "metrics": { + "performance": { + "inputs": [ + { + "source": "factsheet", + "field_path": "performance/test_acc" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Local clean test accuracy of the final model.", + "weight": 0.4 + }, + "macro_f1": { + "inputs": [ + { + "source": "factsheet", + "field_path": "performance/test_macro_f1" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Macro F1 score of the final local model on test data.", + "weight": 0.4 + }, + 
"personalization": { + "inputs": [ + { + "source": "factsheet", + "field_path": "configuration/personalization" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "The use of personalized FL algorithm.", + "weight": 0.1 + }, + "reputation_enabled": { + "inputs": [ + { + "source": "factsheet", + "field_path": "configuration/reputation_enabled" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "The use of an active reputation-based defense mechanism.", + "weight": 0.1 + } + } + }, + "client_reliability": { + "weight": 0.2, + "metrics": { + "scale": { + "inputs": [ + { + "source": "factsheet", + "field_path": "participants/client_num" + } + ], + "operation": "get_value", + "type": "ranges", + "direction": "desc", + "ranges": [5, 10, 15, 20, 25, 30, 35, 40, 45, 50], + "description": "The number of clients in the model.", + "weight": 0.1 + }, + "average_neighbor_reputation": { + "inputs": [ + { + "source": "factsheet", + "field_path": "participants/avg_neighbor_reputation" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Average reputation score of the neighbors associated with the node.", + "weight": 0.4 + }, + "inverse_dropout_rate": { + "inputs": [ + { + "source": "factsheet", + "field_path": "system/dropout_rate" + } + ], + "operation": "get_value", + "type": "true_score", + "direction": "desc", + "description": "Fraction of expected neighbor updates that were not received across rounds.", + "weight": 0.25 + }, + "inverse_timeout_rate": { + "inputs": [ + { + "source": "factsheet", + "field_path": "system/timeout_rate" + } + ], + "operation": "get_value", + "type": "true_score", + "direction": "desc", + "description": "Fraction of aggregation rounds that finished with missing expected neighbor updates.", + "weight": 0.25 + } + } + } + }, + "privacy": { + "technique": { + "weight": 0.2, + "metrics": { + "differential_privacy": { + "inputs": [ + { + "source": "factsheet", + 
"field_path": "configuration/differential_privacy" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "The use of differential privacy.", + "weight": 1 + } + } + }, + "uncertainty": { + "weight": 0.6, + "metrics": { + "entropy": { + "inputs": [ + { + "source": "factsheet", + "field_path": "data/entropy_local" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "The measure of uncertainty in identifying a client.", + "weight": 1 + } + } + }, + "indistinguishability": { + "weight": 0.2, + "metrics": { + "global_privacy_risk": { + "inputs": [ + { + "source": "factsheet", + "field_path": "configuration/differential_privacy" + }, + { + "source": "factsheet", + "field_path": "configuration/dp_epsilon" + }, + { + "source": "factsheet", + "field_path": "participants/neighbor_num" + } + ], + "operation": "get_global_privacy_risk_dfl", + "type": "true_score", + "direction": "desc", + "description": "A worst-case approximation of the maximal risk for distinguishing two clients.", + "weight": 0.2 + }, + "inverse_epsilon_star": { + "inputs": [ + { + "source": "factsheet", + "field_path": "privacy/inverse_epsilon_star" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Empirical privacy leakage estimated from the separability of train and test loss distributions.", + "weight": 0.4 + }, + "mia_auc_score": { + "inputs": [ + { + "source": "factsheet", + "field_path": "privacy/mia_auc_score" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Trust-oriented score derived from the ROC-AUC of a loss-based membership inference attack.", + "weight": 0.4 + } + } + } + }, + "fairness": { + "class_distribution": { + "weight": 0.5, + "metrics": { + "selection_variation": { + "inputs": [ + { + "source": "factsheet", + "field_path": "fairness/selection_cv" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Local variation in neighbor participation across 
rounds, transformed so higher values mean more stable participation.", + "weight": 0.5 + }, + "class_imbalance": { + "inputs": [ + { + "source": "factsheet", + "field_path": "fairness/class_imbalance" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Variation of the sample size per class.", + "weight": 0.5 + } + } + }, + "outcome_fairness": { + "weight": 0.5, + "metrics": { + "underfitting": { + "inputs": [ + { + "source": "factsheet", + "field_path": "fairness/underfitting" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Held-out performance proxy used as an outcome-level fairness signal.", + "weight": 0.1 + }, + "inverse_overfitting": { + "inputs": [ + { + "source": "factsheet", + "field_path": "fairness/inverse_overfitting" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Generalization quality proxy transformed so higher is better.", + "weight": 0.15 + }, + "inverse_well_calibration_error": { + "inputs": [ + { + "source": "factsheet", + "field_path": "fairness/inverse_well_calibration_error" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Calibration quality of the predictive outputs represented as a trust-oriented score.", + "weight": 0.2 + }, + "inverse_generalized_entropy_index": { + "inputs": [ + { + "source": "factsheet", + "field_path": "fairness/inverse_generalized_entropy_index" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Outcome inequality score transformed so higher values indicate better fairness.", + "weight": 0.2 + }, + "inverse_theil_index": { + "inputs": [ + { + "source": "factsheet", + "field_path": "fairness/inverse_theil_index" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Theil-based outcome inequality score transformed so higher values indicate better fairness.", + "weight": 0.2 + }, + "inverse_coefficient_of_variation": { + "inputs": [ + { + "source": 
"factsheet", + "field_path": "fairness/inverse_coefficient_of_variation" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Coefficient-of-variation-based outcome fairness score.", + "weight": 0.15 + } + } + } + }, + "explainability": { + "interpretability": { + "weight": 0.4, + "metrics": { + "algorithmic_transparency": { + "inputs": [ + { + "source": "factsheet", + "field_path": "configuration/training_model" + } + ], + "operation": "get_value", + "type": "score_mapping", + "score_map": { + "RandomForestClassifier": 4, + "KNeighborsClassifier": 3, + "SVC": 2, + "GaussianProcessClassifier": 3, + "DecisionTreeClassifier": 5, + "MLP": 1, + "AdaBoostClassifier": 3, + "GaussianNB": 3.5, + "QuadraticDiscriminantAnalysis": 3, + "LogisticRegression": 4, + "LinearRegression": 3.5, + "Sequential": 1, + "CNN": 1 + }, + "description": "Mapping of Learning techniques to the level of explainability based on literature research and qualitative analysis of each learning technique.", + "weight": 0.6 + }, + "model_size": { + "inputs": [ + { + "source": "factsheet", + "field_path": "configuration/trainable_param_num" + } + ], + "operation": "get_value", + "type": "ranges", + "direction": "desc", + "ranges": [10e1, 10e2, 10e3, 10e4, 10e5, 10e6, 10e7, 10e8], + "description": "Ranges of how to map model size to a score from 1-5.", + "weight": 0.4 + } + } + }, + "post_hoc_methods": { + "weight": 0.6, + "metrics": { + "clipped_feature_importance": { + "inputs": [ + { + "source": "factsheet", + "field_path": "performance/clipped_test_feature_importance_cv" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Variation of feature importance scores of all the features.", + "weight": 0.2 + }, + "alpha_score": { + "inputs": [ + { + "source": "factsheet", + "field_path": "explainability/alpha_score" + } + ], + "operation": "get_value", + "type": "true_score", + "direction": "desc", + "description": "Fraction of features needed to 
explain most of the attribution mass; lower values indicate sparser and more focused explanations.", + "weight": 0.2 + }, + "spread_ratio": { + "inputs": [ + { + "source": "factsheet", + "field_path": "explainability/spread_ratio" + } + ], + "operation": "get_value", + "type": "true_score", + "direction": "desc", + "description": "Normalized entropy of the attribution distribution; lower values indicate explanations concentrated on fewer features.", + "weight": 0.2 + }, + "spread_divergence": { + "inputs": [ + { + "source": "factsheet", + "field_path": "explainability/spread_divergence" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Jensen-Shannon divergence between the attribution distribution and a uniform distribution; higher values indicate more selective explanations.", + "weight": 0.2 + }, + "visualization": { + "inputs": [ + { + "source": "factsheet", + "field_path": "configuration/visualization" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "The use of graphical capabilities to show the explainability.", + "weight": 0.2 + } + } + } + }, + "accountability": { + "factsheet_completeness": { + "weight": 0.8, + "metrics": { + "project_specs": { + "inputs": [ + { + "source": "factsheet", + "field_path": "project/overview" + }, + { + "source": "factsheet", + "field_path": "project/purpose" + }, + { + "source": "factsheet", + "field_path": "project/background" + } + ], + "operation": "check_properties", + "type": "property_check", + "description": "Specifications of the project.", + "weight": 0.1 + }, + "participants": { + "inputs": [ + { + "source": "factsheet", + "field_path": "participants/client_num" + }, + { + "source": "factsheet", + "field_path": "participants/sample_client_rate" + }, + { + "source": "factsheet", + "field_path": "participants/client_selector" + }, + { + "source": "factsheet", + "field_path": "participants/local_dataset_size" + }, + { + "source": "factsheet", + "field_path": 
"participants/avg_neighbor_reputation" + } + ], + "operation": "check_properties", + "type": "property_check", + "description": "Participants information.", + "weight": 0.1 + }, + "data": { + "inputs": [ + { + "source": "factsheet", + "field_path": "data/provenance" + }, + { + "source": "factsheet", + "field_path": "data/preprocessing" + }, + { + "source": "factsheet", + "field_path": "data/entropy_local" + } + ], + "operation": "check_properties", + "type": "property_check", + "description": "Meta data about the data.", + "weight": 0.1 + }, + "configuration": { + "inputs": [ + { + "source": "factsheet", + "field_path": "configuration/aggregation_algorithm" + }, + { + "source": "factsheet", + "field_path": "configuration/training_model" + }, + { + "source": "factsheet", + "field_path": "configuration/personalization" + }, + { + "source": "factsheet", + "field_path": "configuration/reputation_enabled" + }, + { + "source": "factsheet", + "field_path": "configuration/visualization" + }, + { + "source": "factsheet", + "field_path": "configuration/monitoring" + }, + { + "source": "factsheet", + "field_path": "configuration/differential_privacy" + }, + { + "source": "factsheet", + "field_path": "configuration/dp_epsilon" + }, + { + "source": "factsheet", + "field_path": "configuration/trainable_param_num" + }, + { + "source": "factsheet", + "field_path": "configuration/total_round_num" + }, + { + "source": "factsheet", + "field_path": "configuration/learning_rate" + }, + { + "source": "factsheet", + "field_path": "configuration/local_update_steps" + } + ], + "operation": "check_properties", + "type": "property_check", + "description": "FL model configurations.", + "weight": 0.1 + }, + "performance": { + "inputs": [ + { + "source": "factsheet", + "field_path": "performance/test_loss" + }, + { + "source": "factsheet", + "field_path": "performance/test_acc" + }, + { + "source": "factsheet", + "field_path": "performance/test_macro_f1" + }, + { + "source": "factsheet", + 
"field_path": "performance/clipped_test_feature_importance_cv" + }, + { + "source": "factsheet", + "field_path": "performance/test_clever_score" + }, + { + "source": "factsheet", + "field_path": "performance/inverse_test_loss_sensitivity" + }, + { + "source": "factsheet", + "field_path": "performance/test_adv_accuracy" + }, + { + "source": "factsheet", + "field_path": "performance/test_empirical_robustness_score" + }, + { + "source": "factsheet", + "field_path": "performance/test_confidence_score" + }, + { + "source": "factsheet", + "field_path": "performance/inverse_test_attack_success_rate" + } + ], + "operation": "check_properties", + "type": "property_check", + "description": "Performance evaluation results.", + "weight": 0.1 + }, + "fairness": { + "inputs": [ + { + "source": "factsheet", + "field_path": "fairness/class_imbalance" + }, + { + "source": "factsheet", + "field_path": "fairness/selection_cv" + }, + { + "source": "factsheet", + "field_path": "fairness/underfitting" + }, + { + "source": "factsheet", + "field_path": "fairness/inverse_overfitting" + }, + { + "source": "factsheet", + "field_path": "fairness/inverse_well_calibration_error" + }, + { + "source": "factsheet", + "field_path": "fairness/inverse_generalized_entropy_index" + }, + { + "source": "factsheet", + "field_path": "fairness/inverse_theil_index" + }, + { + "source": "factsheet", + "field_path": "fairness/inverse_coefficient_of_variation" + } + ], + "operation": "check_properties", + "type": "property_check", + "description": "Fairness metrics results.", + "weight": 0.1 + }, + "system": { + "inputs": [ + { + "source": "factsheet", + "field_path": "system/time_minutes" + }, + { + "source": "factsheet", + "field_path": "system/model_size" + }, + { + "source": "factsheet", + "field_path": "system/upload_bytes" + }, + { + "source": "factsheet", + "field_path": "system/download_bytes" + }, + { + "source": "factsheet", + "field_path": "system/dropout_rate" + }, + { + "source": "factsheet", + 
"field_path": "system/timeout_rate" + } + ], + "operation": "check_properties", + "type": "property_check", + "description": "System usage information.", + "weight": 0.1 + }, + "privacy": { + "inputs": [ + { + "source": "factsheet", + "field_path": "privacy/privacy_risk" + }, + { + "source": "factsheet", + "field_path": "privacy/epsilon_star" + }, + { + "source": "factsheet", + "field_path": "privacy/inverse_epsilon_star" + }, + { + "source": "factsheet", + "field_path": "privacy/mia_auc" + }, + { + "source": "factsheet", + "field_path": "privacy/mia_auc_score" + } + ], + "operation": "check_properties", + "type": "property_check", + "description": "Privacy metrics and risk estimates documented in the factsheet.", + "weight": 0.1 + }, + "explainability": { + "inputs": [ + { + "source": "factsheet", + "field_path": "explainability/alpha_score" + }, + { + "source": "factsheet", + "field_path": "explainability/spread_ratio" + }, + { + "source": "factsheet", + "field_path": "explainability/spread_divergence" + } + ], + "operation": "check_properties", + "type": "property_check", + "description": "Explainability metrics documented in the factsheet.", + "weight": 0.1 + }, + "sustainability": { + "inputs": [ + { + "source": "factsheet", + "field_path": "sustainability/carbon_intensity_local" + }, + { + "source": "factsheet", + "field_path": "sustainability/emissions_training_local" + }, + { + "source": "factsheet", + "field_path": "sustainability/energy_consumed_local" + }, + { + "source": "factsheet", + "field_path": "sustainability/emissions_communication_local" + } + ], + "operation": "check_properties", + "type": "property_check", + "description": "Sustainability and emissions metrics documented in the factsheet.", + "weight": 0.1 + } + } + }, + "monitoring": { + "weight": 0.2, + "metrics": { + "logs_available": { + "inputs": [ + { + "source": "factsheet", + "field_path": "configuration/monitoring" + } + ], + "operation": "get_value", + "type": "true_score", + 
"description": "The use of logs to show all the nodes.", + "weight": 1 + } + } + } + }, + "architectural_soundness": { + "client_management": { + "weight": 0.25, + "metrics": { + "client_selector": { + "inputs": [ + { + "source": "factsheet", + "field_path": "participants/client_selector" + } + ], + "operation": "get_value", + "type": "score_mapping", + "score_map": { + "Reputation Based": 1.0, + "Full Participation": 0.5 + }, + "description": "Mapping of client selection strategies to architectural soundness. Reputation-based selection is scored higher than full participation because it introduces an explicit selection mechanism.", + "weight": 1 + } + } + }, + "optimization": { + "weight": 0.5, + "metrics": { + "algorithm": { + "inputs": [ + { + "source": "factsheet", + "field_path": "configuration/aggregation_algorithm" + } + ], + "operation": "get_value", + "type": "score_map_value", + "score_map": { + "FedAvg": 0.9509, + "Krum": 0.9535, + "TrimmedMean": 0.9595, + "Median": 0.9461 + }, + "description": "The choice of a suitable aggregation algorithm.", + "weight": 1 + } + } + }, + "federation_management": { + "weight": 0.25, + "metrics": { + "topology_type": { + "inputs": [ + { + "source": "factsheet", + "field_path": "data/preprocessing" + } + ], + "operation": "get_value", + "type": "score_mapping", + "score_map": { + "Fully": 1.0, + "Star": 0.8, + "Ring": 0.6, + "Random": 0.2 + }, + "description": "Mapping of network topology types to architectural soundness, assuming fully connected topologies provide the strongest structural connectivity, followed by star, ring, and random topologies.", + "weight": 1 + } + } + } + }, + "sustainability": { + "energy_source": { + "weight": 0.5, + "metrics": { + "carbon_intensity_clients": { + "inputs": [ + { + "source": "factsheet", + "field_path": "sustainability/carbon_intensity_local" + } + ], + "operation": "get_value", + "type": "scaled_score", + "direction": "desc", + "scale": [20, 795], + "description": "Carbon 
intensity of energy grid used by clients", + "weight": 1 + } + } + }, + "federation_complexity": { + "weight": 0.5, + "metrics": { + "communication_efficiency": { + "inputs": [ + { "source": "factsheet", "field_path": "system/upload_bytes" }, + { "source": "factsheet", "field_path": "system/download_bytes" }, + { "source": "factsheet", "field_path": "performance/test_acc" } + ], + "operation": "comm_efficiency", + "type": "ranges", + "direction": "desc", + "ranges":[0.1, 10e2, 10e3,10e4, 10e5, 10e6,10e7,10e8,10e9,10e10,10e11], + "description": "Communication cost per unit of local test accuracy; lower values indicate more efficient neighbor communication.", + "weight": 0.3 + }, + "number_of_training_rounds": { + "inputs": [ + { + "source": "factsheet", + "field_path": "configuration/total_round_num" + } + ], + "operation": "get_value", + "type": "ranges", + "direction": "desc", + "ranges": [5, 10, 15, 20, 25, 30, 35, 40, 45, 50], + "description": "The total number of training rounds", + "weight": 0.15 + }, + "avg_model_size": { + "inputs": [ + { + "source": "factsheet", + "field_path": "configuration/trainable_param_num" + } + ], + "operation": "get_value", + "type": "ranges", + "direction": "desc", + "ranges":[10e4, 10e5, 10e6,10e7,10e8,10e9,10e10,10e11], + "description": "The size of the model", + "weight": 0.15 + }, + "client_selection_rate": { + "inputs": [ + { + "source": "factsheet", + "field_path": "participants/sample_client_rate" + } + ], + "operation": "get_value", + "type": "scaled_score", + "direction": "asc", + "scale": [ + 0.1,1 + ], + "description": "The selection rate of clients for each training round", + "weight": 0.1 + }, + "number_of_clients": { + "inputs": [ + { + "source": "factsheet", + "field_path": "participants/client_num" + } + ], + "operation": "get_value", + "type": "ranges", + "direction": "desc", + "ranges": [5, 10, 15, 20, 25, 30, 35, 40, 45, 50], + "description": "The number of clients in the federation.", + "weight": 0.1 + }, + 
"local_training_rounds": { + "inputs": [ + { + "source": "factsheet", + "field_path": "configuration/local_update_steps" + } + ], + "operation": "get_value", + "type": "scaled_score", + "direction": "desc", + "scale": [1, 100], + "description": "The number of local training rounds.", + "weight": 0.1 + }, + "avg_dataset_size": { + "inputs": [ + { + "source": "factsheet", + "field_path": "participants/local_dataset_size" + } + ], + "operation": "get_value", + "type": "ranges", + "direction": "desc", + "ranges": [10e1, 10e2, 10e3, 10e4, 10e5], + "description": "The average number of training samples", + "weight": 0.1 + } + } + } + } + } diff --git a/nebula/addons/trustworthiness/configs/eval_metrics_dfl_images.json b/nebula/addons/trustworthiness/configs/eval_metrics_dfl_images.json new file mode 100755 index 000000000..b43295c1d --- /dev/null +++ b/nebula/addons/trustworthiness/configs/eval_metrics_dfl_images.json @@ -0,0 +1,1034 @@ +{ + "robustness": { + "resilience_to_attacks": { + "weight": 0.4, + "metrics": { + "certified_robustness": { + "inputs": [ + { + "source": "factsheet", + "field_path": "performance/test_clever_score" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Cross Lipschitz Extreme Value for network Robustness: attack-agnostic estimator of the lower bound βL", + "weight": 0.2 + }, + "inverse_loss_sensitivity": { + "inputs": [ + { + "source": "factsheet", + "field_path": "performance/inverse_test_loss_sensitivity" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Inverse loss sensitivity score; higher values indicate lower sensitivity of the loss to input perturbations.", + "weight": 0.2 + }, + "adversarial_accuracy": { + "inputs": [ + { + "source": "factsheet", + "field_path": "performance/test_adv_accuracy" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Adversarial accuracy; higher values indicate better predictive performance under adversarial 
perturbations.", + "weight": 0.2 + }, + "empirical_robustness_score": { + "inputs": [ + { + "source": "factsheet", + "field_path": "performance/test_empirical_robustness_score" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Empirical robustness score; higher values indicate stronger resistance to adversarial perturbations.", + "weight": 0.15 + }, + "confidence_score": { + "inputs": [ + { + "source": "factsheet", + "field_path": "performance/test_confidence_score" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Confidence score; higher values indicate more stable predictive confidence.", + "weight": 0.1 + }, + "inverse_attack_success_rate": { + "inputs": [ + { + "source": "factsheet", + "field_path": "performance/inverse_test_attack_success_rate" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Inverse attack success rate; higher values indicate a lower fraction of successful adversarial attacks.", + "weight": 0.15 + } + } + }, + "algorithm_robustness": { + "weight": 0.4, + "metrics": { + "performance": { + "inputs": [ + { + "source": "factsheet", + "field_path": "performance/test_acc" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Local clean test accuracy of the final model.", + "weight": 0.4 + }, + "macro_f1": { + "inputs": [ + { + "source": "factsheet", + "field_path": "performance/test_macro_f1" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Macro F1 score of the final local model on test data.", + "weight": 0.4 + }, + "personalization": { + "inputs": [ + { + "source": "factsheet", + "field_path": "configuration/personalization" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "The use of personalized FL algorithm.", + "weight": 0.1 + }, + "reputation_enabled": { + "inputs": [ + { + "source": "factsheet", + "field_path": "configuration/reputation_enabled" + } + ], + "operation": 
"get_value", + "type": "true_score", + "description": "The use of an active reputation-based defense mechanism.", + "weight": 0.1 + } + } + }, + "client_reliability": { + "weight": 0.2, + "metrics": { + "scale": { + "inputs": [ + { + "source": "factsheet", + "field_path": "participants/client_num" + } + ], + "operation": "get_value", + "type": "ranges", + "direction": "desc", + "ranges": [5, 10, 15, 20, 25, 30, 35, 40, 45, 50], + "description": "The number of clients in the model.", + "weight": 0.1 + }, + "average_neighbor_reputation": { + "inputs": [ + { + "source": "factsheet", + "field_path": "participants/avg_neighbor_reputation" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Average reputation score of the neighbors associated with the node.", + "weight": 0.4 + }, + "inverse_dropout_rate": { + "inputs": [ + { + "source": "factsheet", + "field_path": "system/dropout_rate" + } + ], + "operation": "get_value", + "type": "true_score", + "direction": "desc", + "description": "Fraction of expected neighbor updates that were not received across rounds.", + "weight": 0.25 + }, + "inverse_timeout_rate": { + "inputs": [ + { + "source": "factsheet", + "field_path": "system/timeout_rate" + } + ], + "operation": "get_value", + "type": "true_score", + "direction": "desc", + "description": "Fraction of aggregation rounds that finished with missing expected neighbor updates.", + "weight": 0.25 + } + } + } + }, + "privacy": { + "technique": { + "weight": 0.2, + "metrics": { + "differential_privacy": { + "inputs": [ + { + "source": "factsheet", + "field_path": "configuration/differential_privacy" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "The use of differential privacy.", + "weight": 1 + } + } + }, + "uncertainty": { + "weight": 0.6, + "metrics": { + "entropy": { + "inputs": [ + { + "source": "factsheet", + "field_path": "data/entropy_local" + } + ], + "operation": "get_value", + "type": "true_score", + 
"description": "The measure of uncertainty in identifying a client.", + "weight": 1 + } + } + }, + "indistinguishability": { + "weight": 0.2, + "metrics": { + "global_privacy_risk": { + "inputs": [ + { + "source": "factsheet", + "field_path": "configuration/differential_privacy" + }, + { + "source": "factsheet", + "field_path": "configuration/dp_epsilon" + }, + { + "source": "factsheet", + "field_path": "participants/neighbor_num" + } + ], + "operation": "get_global_privacy_risk_dfl", + "type": "true_score", + "direction": "desc", + "description": "A worst-case approximation of the maximal risk for distinguishing two clients.", + "weight": 0.2 + }, + "inverse_epsilon_star": { + "inputs": [ + { + "source": "factsheet", + "field_path": "privacy/inverse_epsilon_star" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Empirical privacy leakage estimated from the separability of train and test loss distributions.", + "weight": 0.4 + }, + "mia_auc_score": { + "inputs": [ + { + "source": "factsheet", + "field_path": "privacy/mia_auc_score" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Trust-oriented score derived from the ROC-AUC of a loss-based membership inference attack.", + "weight": 0.4 + } + } + } + }, + "fairness": { + "class_distribution": { + "weight": 0.5, + "metrics": { + "selection_variation": { + "inputs": [ + { + "source": "factsheet", + "field_path": "fairness/selection_cv" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Local variation in neighbor participation across rounds, transformed so higher values mean more stable participation.", + "weight": 0.5 + }, + "class_imbalance": { + "inputs": [ + { + "source": "factsheet", + "field_path": "fairness/class_imbalance" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Variation of the sample size per class.", + "weight": 0.5 + } + } + }, + "outcome_fairness": { + "weight": 0.5, + "metrics": 
{ + "underfitting": { + "inputs": [ + { + "source": "factsheet", + "field_path": "fairness/underfitting" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Held-out performance proxy used as an outcome-level fairness signal.", + "weight": 0.1 + }, + "inverse_overfitting": { + "inputs": [ + { + "source": "factsheet", + "field_path": "fairness/inverse_overfitting" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Generalization quality proxy transformed so higher is better.", + "weight": 0.15 + }, + "inverse_well_calibration_error": { + "inputs": [ + { + "source": "factsheet", + "field_path": "fairness/inverse_well_calibration_error" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Calibration quality of the predictive outputs represented as a trust-oriented score.", + "weight": 0.2 + }, + "inverse_generalized_entropy_index": { + "inputs": [ + { + "source": "factsheet", + "field_path": "fairness/inverse_generalized_entropy_index" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Outcome inequality score transformed so higher values indicate better fairness.", + "weight": 0.2 + }, + "inverse_theil_index": { + "inputs": [ + { + "source": "factsheet", + "field_path": "fairness/inverse_theil_index" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Theil-based outcome inequality score transformed so higher values indicate better fairness.", + "weight": 0.2 + }, + "inverse_coefficient_of_variation": { + "inputs": [ + { + "source": "factsheet", + "field_path": "fairness/inverse_coefficient_of_variation" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Coefficient-of-variation-based outcome fairness score.", + "weight": 0.15 + } + } + } + }, + "explainability": { + "interpretability": { + "weight": 0.4, + "metrics": { + "algorithmic_transparency": { + "inputs": [ + { + "source": "factsheet", + 
"field_path": "configuration/training_model" + } + ], + "operation": "get_value", + "type": "score_mapping", + "score_map": { + "RandomForestClassifier": 4, + "KNeighborsClassifier": 3, + "SVC": 2, + "GaussianProcessClassifier": 3, + "DecisionTreeClassifier": 5, + "MLP": 1, + "AdaBoostClassifier": 3, + "GaussianNB": 3.5, + "QuadraticDiscriminantAnalysis": 3, + "LogisticRegression": 4, + "LinearRegression": 3.5, + "Sequential": 1, + "CNN": 1 + }, + "description": "Mapping of Learning techniques to the level of explainability based on literature research and qualitative analysis of each learning technique.", + "weight": 0.6 + }, + "model_size": { + "inputs": [ + { + "source": "factsheet", + "field_path": "configuration/trainable_param_num" + } + ], + "operation": "get_value", + "type": "ranges", + "direction": "desc", + "ranges": [10e1, 10e2, 10e3, 10e4, 10e5, 10e6, 10e7, 10e8], + "description": "Ranges of how to map model size to a score from 1-5.", + "weight": 0.4 + } + } + }, + "post_hoc_methods": { + "weight": 0.6, + "metrics": { + "clipped_feature_importance": { + "inputs": [ + { + "source": "factsheet", + "field_path": "performance/clipped_test_feature_importance_cv" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Variation of feature importance scores of all the features.", + "weight": 0.2 + }, + "alpha_score": { + "inputs": [ + { + "source": "factsheet", + "field_path": "explainability/alpha_score" + } + ], + "operation": "get_value", + "type": "true_score", + "direction": "desc", + "description": "Fraction of features needed to explain most of the attribution mass; lower values indicate sparser and more focused explanations.", + "weight": 0.2 + }, + "spread_ratio": { + "inputs": [ + { + "source": "factsheet", + "field_path": "explainability/spread_ratio" + } + ], + "operation": "get_value", + "type": "true_score", + "direction": "desc", + "description": "Normalized entropy of the attribution distribution; lower values 
indicate explanations concentrated on fewer features.", + "weight": 0.2 + }, + "spread_divergence": { + "inputs": [ + { + "source": "factsheet", + "field_path": "explainability/spread_divergence" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Jensen-Shannon divergence between the attribution distribution and a uniform distribution; higher values indicate more selective explanations.", + "weight": 0.2 + }, + "visualization": { + "inputs": [ + { + "source": "factsheet", + "field_path": "configuration/visualization" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "The use of graphical capabilities to show the explainability.", + "weight": 0.2 + } + } + } + }, + "accountability": { + "factsheet_completeness": { + "weight": 0.8, + "metrics": { + "project_specs": { + "inputs": [ + { + "source": "factsheet", + "field_path": "project/overview" + }, + { + "source": "factsheet", + "field_path": "project/purpose" + }, + { + "source": "factsheet", + "field_path": "project/background" + } + ], + "operation": "check_properties", + "type": "property_check", + "description": "Specifications of the project.", + "weight": 0.1 + }, + "participants": { + "inputs": [ + { + "source": "factsheet", + "field_path": "participants/client_num" + }, + { + "source": "factsheet", + "field_path": "participants/sample_client_rate" + }, + { + "source": "factsheet", + "field_path": "participants/client_selector" + }, + { + "source": "factsheet", + "field_path": "participants/local_dataset_size" + }, + { + "source": "factsheet", + "field_path": "participants/avg_neighbor_reputation" + } + ], + "operation": "check_properties", + "type": "property_check", + "description": "Participants information.", + "weight": 0.1 + }, + "data": { + "inputs": [ + { + "source": "factsheet", + "field_path": "data/provenance" + }, + { + "source": "factsheet", + "field_path": "data/preprocessing" + }, + { + "source": "factsheet", + "field_path": 
"data/entropy_local" + } + ], + "operation": "check_properties", + "type": "property_check", + "description": "Meta data about the data.", + "weight": 0.1 + }, + "configuration": { + "inputs": [ + { + "source": "factsheet", + "field_path": "configuration/aggregation_algorithm" + }, + { + "source": "factsheet", + "field_path": "configuration/training_model" + }, + { + "source": "factsheet", + "field_path": "configuration/personalization" + }, + { + "source": "factsheet", + "field_path": "configuration/reputation_enabled" + }, + { + "source": "factsheet", + "field_path": "configuration/visualization" + }, + { + "source": "factsheet", + "field_path": "configuration/monitoring" + }, + { + "source": "factsheet", + "field_path": "configuration/differential_privacy" + }, + { + "source": "factsheet", + "field_path": "configuration/dp_epsilon" + }, + { + "source": "factsheet", + "field_path": "configuration/trainable_param_num" + }, + { + "source": "factsheet", + "field_path": "configuration/total_round_num" + }, + { + "source": "factsheet", + "field_path": "configuration/learning_rate" + }, + { + "source": "factsheet", + "field_path": "configuration/local_update_steps" + } + ], + "operation": "check_properties", + "type": "property_check", + "description": "FL model configurations.", + "weight": 0.1 + }, + "performance": { + "inputs": [ + { + "source": "factsheet", + "field_path": "performance/test_loss" + }, + { + "source": "factsheet", + "field_path": "performance/test_acc" + }, + { + "source": "factsheet", + "field_path": "performance/test_macro_f1" + }, + { + "source": "factsheet", + "field_path": "performance/clipped_test_feature_importance_cv" + }, + { + "source": "factsheet", + "field_path": "performance/test_clever_score" + }, + { + "source": "factsheet", + "field_path": "performance/inverse_test_loss_sensitivity" + }, + { + "source": "factsheet", + "field_path": "performance/test_adv_accuracy" + }, + { + "source": "factsheet", + "field_path": 
"performance/test_empirical_robustness_score" + }, + { + "source": "factsheet", + "field_path": "performance/test_confidence_score" + }, + { + "source": "factsheet", + "field_path": "performance/inverse_test_attack_success_rate" + } + ], + "operation": "check_properties", + "type": "property_check", + "description": "Performance evaluation results.", + "weight": 0.1 + }, + "fairness": { + "inputs": [ + { + "source": "factsheet", + "field_path": "fairness/class_imbalance" + }, + { + "source": "factsheet", + "field_path": "fairness/selection_cv" + }, + { + "source": "factsheet", + "field_path": "fairness/underfitting" + }, + { + "source": "factsheet", + "field_path": "fairness/inverse_overfitting" + }, + { + "source": "factsheet", + "field_path": "fairness/inverse_well_calibration_error" + }, + { + "source": "factsheet", + "field_path": "fairness/inverse_generalized_entropy_index" + }, + { + "source": "factsheet", + "field_path": "fairness/inverse_theil_index" + }, + { + "source": "factsheet", + "field_path": "fairness/inverse_coefficient_of_variation" + } + ], + "operation": "check_properties", + "type": "property_check", + "description": "Fairness metrics results.", + "weight": 0.1 + }, + "system": { + "inputs": [ + { + "source": "factsheet", + "field_path": "system/time_minutes" + }, + { + "source": "factsheet", + "field_path": "system/model_size" + }, + { + "source": "factsheet", + "field_path": "system/upload_bytes" + }, + { + "source": "factsheet", + "field_path": "system/download_bytes" + }, + { + "source": "factsheet", + "field_path": "system/dropout_rate" + }, + { + "source": "factsheet", + "field_path": "system/timeout_rate" + } + ], + "operation": "check_properties", + "type": "property_check", + "description": "System usage information.", + "weight": 0.1 + }, + "privacy": { + "inputs": [ + { + "source": "factsheet", + "field_path": "privacy/privacy_risk" + }, + { + "source": "factsheet", + "field_path": "privacy/epsilon_star" + }, + { + "source": 
"factsheet", + "field_path": "privacy/inverse_epsilon_star" + }, + { + "source": "factsheet", + "field_path": "privacy/mia_auc" + }, + { + "source": "factsheet", + "field_path": "privacy/mia_auc_score" + } + ], + "operation": "check_properties", + "type": "property_check", + "description": "Privacy metrics and risk estimates documented in the factsheet.", + "weight": 0.1 + }, + "explainability": { + "inputs": [ + { + "source": "factsheet", + "field_path": "explainability/alpha_score" + }, + { + "source": "factsheet", + "field_path": "explainability/spread_ratio" + }, + { + "source": "factsheet", + "field_path": "explainability/spread_divergence" + } + ], + "operation": "check_properties", + "type": "property_check", + "description": "Explainability metrics documented in the factsheet.", + "weight": 0.1 + }, + "sustainability": { + "inputs": [ + { + "source": "factsheet", + "field_path": "sustainability/carbon_intensity_local" + }, + { + "source": "factsheet", + "field_path": "sustainability/emissions_training_local" + }, + { + "source": "factsheet", + "field_path": "sustainability/energy_consumed_local" + }, + { + "source": "factsheet", + "field_path": "sustainability/emissions_communication_local" + } + ], + "operation": "check_properties", + "type": "property_check", + "description": "Sustainability and emissions metrics documented in the factsheet.", + "weight": 0.1 + } + } + }, + "monitoring": { + "weight": 0.2, + "metrics": { + "logs_available": { + "inputs": [ + { + "source": "factsheet", + "field_path": "configuration/monitoring" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "The use of logs to show all the nodes.", + "weight": 1 + } + } + } + }, + "architectural_soundness": { + "client_management": { + "weight": 0.25, + "metrics": { + "client_selector": { + "inputs": [ + { + "source": "factsheet", + "field_path": "participants/client_selector" + } + ], + "operation": "get_value", + "type": "score_mapping", + "score_map": { + 
"Reputation Based": 1.0, + "Full Participation": 0.5 + }, + "description": "Mapping of client selection strategies to architectural soundness. Reputation-based selection is scored higher than full participation because it introduces an explicit selection mechanism.", + "weight": 1 + } + } + }, + "optimization": { + "weight": 0.5, + "metrics": { + "algorithm": { + "inputs": [ + { + "source": "factsheet", + "field_path": "configuration/aggregation_algorithm" + } + ], + "operation": "get_value", + "type": "score_map_value", + "score_map": { + "FedAvg": 0.9509, + "Krum": 0.9535, + "TrimmedMean": 0.9595, + "Median": 0.9461 + }, + "description": "The choice of a suitable aggregation algorithm.", + "weight": 1 + } + } + }, + "federation_management": { + "weight": 0.25, + "metrics": { + "topology_type": { + "inputs": [ + { + "source": "factsheet", + "field_path": "data/preprocessing" + } + ], + "operation": "get_value", + "type": "score_mapping", + "score_map": { + "Fully": 1.0, + "Star": 0.8, + "Ring": 0.6, + "Random": 0.2 + }, + "description": "Mapping of network topology types to architectural soundness, assuming fully connected topologies provide the strongest structural connectivity, followed by star, ring, and random topologies.", + "weight": 1 + } + } + } + }, + "sustainability": { + "energy_source": { + "weight": 0.5, + "metrics": { + "carbon_intensity_clients": { + "inputs": [ + { + "source": "factsheet", + "field_path": "sustainability/carbon_intensity_local" + } + ], + "operation": "get_value", + "type": "scaled_score", + "direction": "desc", + "scale": [20, 795], + "description": "Carbon intensity of energy grid used by clients", + "weight": 1 + } + } + }, + "federation_complexity": { + "weight": 0.5, + "metrics": { + "communication_efficiency": { + "inputs": [ + { "source": "factsheet", "field_path": "system/upload_bytes" }, + { "source": "factsheet", "field_path": "system/download_bytes" }, + { "source": "factsheet", "field_path": "performance/test_acc" } + 
], + "operation": "comm_efficiency", + "type": "ranges", + "direction": "desc", + "ranges":[0.1, 10e2, 10e3,10e4, 10e5, 10e6,10e7,10e8,10e9,10e10,10e11], + "description": "Communication cost per unit of local test accuracy; lower values indicate more efficient neighbor communication.", + "weight": 0.3 + }, + "number_of_training_rounds": { + "inputs": [ + { + "source": "factsheet", + "field_path": "configuration/total_round_num" + } + ], + "operation": "get_value", + "type": "ranges", + "direction": "desc", + "ranges": [5, 10, 15, 20, 25, 30, 35, 40, 45, 50], + "description": "The total number of training rounds", + "weight": 0.15 + }, + "avg_model_size": { + "inputs": [ + { + "source": "factsheet", + "field_path": "configuration/trainable_param_num" + } + ], + "operation": "get_value", + "type": "ranges", + "direction": "desc", + "ranges":[10e4, 10e5, 10e6,10e7,10e8,10e9,10e10,10e11], + "description": "The size of the model", + "weight": 0.15 + }, + "client_selection_rate": { + "inputs": [ + { + "source": "factsheet", + "field_path": "participants/sample_client_rate" + } + ], + "operation": "get_value", + "type": "scaled_score", + "direction": "asc", + "scale": [ + 0.1,1 + ], + "description": "The selection rate of clients for each training round", + "weight": 0.1 + }, + "number_of_clients": { + "inputs": [ + { + "source": "factsheet", + "field_path": "participants/client_num" + } + ], + "operation": "get_value", + "type": "ranges", + "direction": "desc", + "ranges": [5, 10, 15, 20, 25, 30, 35, 40, 45, 50], + "description": "The number of clients in the federation.", + "weight": 0.1 + }, + "local_training_rounds": { + "inputs": [ + { + "source": "factsheet", + "field_path": "configuration/local_update_steps" + } + ], + "operation": "get_value", + "type": "scaled_score", + "direction": "desc", + "scale": [1, 100], + "description": "The number of local training rounds.", + "weight": 0.1 + }, + "avg_dataset_size": { + "inputs": [ + { + "source": "factsheet", + 
"field_path": "participants/local_dataset_size" + } + ], + "operation": "get_value", + "type": "ranges", + "direction": "desc", + "ranges": [10e1, 10e2, 10e3, 10e4, 10e5], + "description": "The average number of training samples", + "weight": 0.1 + } + } + } + } + } diff --git a/nebula/addons/trustworthiness/configs/eval_metrics_dfl_tabular.json b/nebula/addons/trustworthiness/configs/eval_metrics_dfl_tabular.json new file mode 100755 index 000000000..b7770033d --- /dev/null +++ b/nebula/addons/trustworthiness/configs/eval_metrics_dfl_tabular.json @@ -0,0 +1,986 @@ +{ + "robustness": { + "resilience_to_attacks": { + "weight": 0.4, + "metrics": { + "adversarial_accuracy": { + "inputs": [ + { + "source": "factsheet", + "field_path": "performance/test_adv_accuracy" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Adversarial accuracy; higher values indicate better predictive performance under adversarial perturbations.", + "weight": 0.4444444444 + }, + "confidence_score": { + "inputs": [ + { + "source": "factsheet", + "field_path": "performance/test_confidence_score" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Confidence score; higher values indicate more stable predictive confidence.", + "weight": 0.2222222222 + }, + "inverse_attack_success_rate": { + "inputs": [ + { + "source": "factsheet", + "field_path": "performance/inverse_test_attack_success_rate" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Inverse attack success rate; higher values indicate a lower fraction of successful adversarial attacks.", + "weight": 0.3333333334 + } + } + }, + "algorithm_robustness": { + "weight": 0.4, + "metrics": { + "performance": { + "inputs": [ + { + "source": "factsheet", + "field_path": "performance/test_acc" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Local clean test accuracy of the final model.", + "weight": 0.4 + }, + "macro_f1": { + "inputs": 
[ + { + "source": "factsheet", + "field_path": "performance/test_macro_f1" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Macro F1 score of the final local model on test data.", + "weight": 0.4 + }, + "personalization": { + "inputs": [ + { + "source": "factsheet", + "field_path": "configuration/personalization" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "The use of personalized FL algorithm.", + "weight": 0.1 + }, + "reputation_enabled": { + "inputs": [ + { + "source": "factsheet", + "field_path": "configuration/reputation_enabled" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "The use of an active reputation-based defense mechanism.", + "weight": 0.1 + } + } + }, + "client_reliability": { + "weight": 0.2, + "metrics": { + "scale": { + "inputs": [ + { + "source": "factsheet", + "field_path": "participants/client_num" + } + ], + "operation": "get_value", + "type": "ranges", + "direction": "desc", + "ranges": [5, 10, 15, 20, 25, 30, 35, 40, 45, 50], + "description": "The number of clients in the model.", + "weight": 0.1 + }, + "average_neighbor_reputation": { + "inputs": [ + { + "source": "factsheet", + "field_path": "participants/avg_neighbor_reputation" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Average reputation score of the neighbors associated with the node.", + "weight": 0.4 + }, + "inverse_dropout_rate": { + "inputs": [ + { + "source": "factsheet", + "field_path": "system/dropout_rate" + } + ], + "operation": "get_value", + "type": "true_score", + "direction": "desc", + "description": "Fraction of expected neighbor updates that were not received across rounds.", + "weight": 0.25 + }, + "inverse_timeout_rate": { + "inputs": [ + { + "source": "factsheet", + "field_path": "system/timeout_rate" + } + ], + "operation": "get_value", + "type": "true_score", + "direction": "desc", + "description": "Fraction of aggregation rounds that 
finished with missing expected neighbor updates.", + "weight": 0.25 + } + } + } + }, + "privacy": { + "technique": { + "weight": 0.2, + "metrics": { + "differential_privacy": { + "inputs": [ + { + "source": "factsheet", + "field_path": "configuration/differential_privacy" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "The use of differential privacy.", + "weight": 1 + } + } + }, + "uncertainty": { + "weight": 0.6, + "metrics": { + "entropy": { + "inputs": [ + { + "source": "factsheet", + "field_path": "data/entropy_local" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "The measure of uncertainty in identifying a client.", + "weight": 1 + } + } + }, + "indistinguishability": { + "weight": 0.2, + "metrics": { + "global_privacy_risk": { + "inputs": [ + { + "source": "factsheet", + "field_path": "configuration/differential_privacy" + }, + { + "source": "factsheet", + "field_path": "configuration/dp_epsilon" + }, + { + "source": "factsheet", + "field_path": "participants/neighbor_num" + } + ], + "operation": "get_global_privacy_risk_dfl", + "type": "true_score", + "direction": "desc", + "description": "A worst-case approximation of the maximal risk for distinguishing two clients.", + "weight": 0.2 + }, + "inverse_epsilon_star": { + "inputs": [ + { + "source": "factsheet", + "field_path": "privacy/inverse_epsilon_star" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Empirical privacy leakage estimated from the separability of train and test loss distributions.", + "weight": 0.4 + }, + "mia_auc_score": { + "inputs": [ + { + "source": "factsheet", + "field_path": "privacy/mia_auc_score" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Trust-oriented score derived from the ROC-AUC of a loss-based membership inference attack.", + "weight": 0.4 + } + } + } + }, + "fairness": { + "class_distribution": { + "weight": 0.5, + "metrics": { + 
"selection_variation": { + "inputs": [ + { + "source": "factsheet", + "field_path": "fairness/selection_cv" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Local variation in neighbor participation across rounds, transformed so higher values mean more stable participation.", + "weight": 0.5 + }, + "class_imbalance": { + "inputs": [ + { + "source": "factsheet", + "field_path": "fairness/class_imbalance" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Variation of the sample size per class.", + "weight": 0.5 + } + } + }, + "outcome_fairness": { + "weight": 0.5, + "metrics": { + "underfitting": { + "inputs": [ + { + "source": "factsheet", + "field_path": "fairness/underfitting" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Held-out performance proxy used as an outcome-level fairness signal.", + "weight": 0.1 + }, + "inverse_overfitting": { + "inputs": [ + { + "source": "factsheet", + "field_path": "fairness/inverse_overfitting" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Generalization quality proxy transformed so higher is better.", + "weight": 0.15 + }, + "inverse_well_calibration_error": { + "inputs": [ + { + "source": "factsheet", + "field_path": "fairness/inverse_well_calibration_error" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Calibration quality of the predictive outputs represented as a trust-oriented score.", + "weight": 0.2 + }, + "inverse_generalized_entropy_index": { + "inputs": [ + { + "source": "factsheet", + "field_path": "fairness/inverse_generalized_entropy_index" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Outcome inequality score transformed so higher values indicate better fairness.", + "weight": 0.2 + }, + "inverse_theil_index": { + "inputs": [ + { + "source": "factsheet", + "field_path": "fairness/inverse_theil_index" + } + ], + "operation": 
"get_value", + "type": "true_score", + "description": "Theil-based outcome inequality score transformed so higher values indicate better fairness.", + "weight": 0.2 + }, + "inverse_coefficient_of_variation": { + "inputs": [ + { + "source": "factsheet", + "field_path": "fairness/inverse_coefficient_of_variation" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Coefficient-of-variation-based outcome fairness score.", + "weight": 0.15 + } + } + } + }, + "explainability": { + "interpretability": { + "weight": 0.4, + "metrics": { + "algorithmic_transparency": { + "inputs": [ + { + "source": "factsheet", + "field_path": "configuration/training_model" + } + ], + "operation": "get_value", + "type": "score_mapping", + "score_map": { + "RandomForestClassifier": 4, + "KNeighborsClassifier": 3, + "SVC": 2, + "GaussianProcessClassifier": 3, + "DecisionTreeClassifier": 5, + "MLP": 1, + "AdaBoostClassifier": 3, + "GaussianNB": 3.5, + "QuadraticDiscriminantAnalysis": 3, + "LogisticRegression": 4, + "LinearRegression": 3.5, + "Sequential": 1, + "CNN": 1 + }, + "description": "Mapping of Learning techniques to the level of explainability based on on literature research and qualitative analysis of each learning technique.", + "weight": 0.6 + }, + "model_size": { + "inputs": [ + { + "source": "factsheet", + "field_path": "configuration/trainable_param_num" + } + ], + "operation": "get_value", + "type": "ranges", + "direction": "desc", + "ranges": [10e1, 10e2, 10e3, 10e4, 10e5, 10e6, 10e7, 10e8], + "description": "Ranges of how to map model size to a score from 1-5.", + "weight": 0.4 + } + } + }, + "post_hoc_methods": { + "weight": 0.6, + "metrics": { + "clipped_feature_importance": { + "inputs": [ + { + "source": "factsheet", + "field_path": "performance/clipped_test_feature_importance_cv" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Variation of feature importance scores of all the features.", + "weight": 0.2 + }, + 
"alpha_score": { + "inputs": [ + { + "source": "factsheet", + "field_path": "explainability/alpha_score" + } + ], + "operation": "get_value", + "type": "true_score", + "direction": "desc", + "description": "Fraction of features needed to explain most of the attribution mass; lower values indicate sparser and more focused explanations.", + "weight": 0.2 + }, + "spread_ratio": { + "inputs": [ + { + "source": "factsheet", + "field_path": "explainability/spread_ratio" + } + ], + "operation": "get_value", + "type": "true_score", + "direction": "desc", + "description": "Normalized entropy of the attribution distribution; lower values indicate explanations concentrated on fewer features.", + "weight": 0.2 + }, + "spread_divergence": { + "inputs": [ + { + "source": "factsheet", + "field_path": "explainability/spread_divergence" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "Jensen-Shannon divergence between the attribution distribution and a uniform distribution; higher values indicate more selective explanations.", + "weight": 0.2 + }, + "visualization": { + "inputs": [ + { + "source": "factsheet", + "field_path": "configuration/visualization" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "The use of graphical capabilities to show the explainability.", + "weight": 0.2 + } + } + } + }, + "accountability": { + "factsheet_completeness": { + "weight": 0.8, + "metrics": { + "project_specs": { + "inputs": [ + { + "source": "factsheet", + "field_path": "project/overview" + }, + { + "source": "factsheet", + "field_path": "project/purpose" + }, + { + "source": "factsheet", + "field_path": "project/background" + } + ], + "operation": "check_properties", + "type": "property_check", + "description": "Specifications of the project.", + "weight": 0.1 + }, + "participants": { + "inputs": [ + { + "source": "factsheet", + "field_path": "participants/client_num" + }, + { + "source": "factsheet", + "field_path": 
"participants/sample_client_rate" + }, + { + "source": "factsheet", + "field_path": "participants/client_selector" + }, + { + "source": "factsheet", + "field_path": "participants/local_dataset_size" + }, + { + "source": "factsheet", + "field_path": "participants/avg_neighbor_reputation" + } + ], + "operation": "check_properties", + "type": "property_check", + "description": "Participants information.", + "weight": 0.1 + }, + "data": { + "inputs": [ + { + "source": "factsheet", + "field_path": "data/provenance" + }, + { + "source": "factsheet", + "field_path": "data/preprocessing" + }, + { + "source": "factsheet", + "field_path": "data/entropy_local" + } + ], + "operation": "check_properties", + "type": "property_check", + "description": "Meta data about the data.", + "weight": 0.1 + }, + "configuration": { + "inputs": [ + { + "source": "factsheet", + "field_path": "configuration/aggregation_algorithm" + }, + { + "source": "factsheet", + "field_path": "configuration/training_model" + }, + { + "source": "factsheet", + "field_path": "configuration/personalization" + }, + { + "source": "factsheet", + "field_path": "configuration/reputation_enabled" + }, + { + "source": "factsheet", + "field_path": "configuration/visualization" + }, + { + "source": "factsheet", + "field_path": "configuration/monitoring" + }, + { + "source": "factsheet", + "field_path": "configuration/differential_privacy" + }, + { + "source": "factsheet", + "field_path": "configuration/dp_epsilon" + }, + { + "source": "factsheet", + "field_path": "configuration/trainable_param_num" + }, + { + "source": "factsheet", + "field_path": "configuration/total_round_num" + }, + { + "source": "factsheet", + "field_path": "configuration/learning_rate" + }, + { + "source": "factsheet", + "field_path": "configuration/local_update_steps" + } + ], + "operation": "check_properties", + "type": "property_check", + "description": "FL model configurations.", + "weight": 0.1 + }, + "performance": { + "inputs": [ + { + 
"source": "factsheet", + "field_path": "performance/test_loss" + }, + { + "source": "factsheet", + "field_path": "performance/test_acc" + }, + { + "source": "factsheet", + "field_path": "performance/test_macro_f1" + }, + { + "source": "factsheet", + "field_path": "performance/clipped_test_feature_importance_cv" + }, + { + "source": "factsheet", + "field_path": "performance/test_adv_accuracy" + }, + { + "source": "factsheet", + "field_path": "performance/test_confidence_score" + }, + { + "source": "factsheet", + "field_path": "performance/inverse_test_attack_success_rate" + } + ], + "operation": "check_properties", + "type": "property_check", + "description": "Performance evaluation results.", + "weight": 0.1 + }, + "fairness": { + "inputs": [ + { + "source": "factsheet", + "field_path": "fairness/class_imbalance" + }, + { + "source": "factsheet", + "field_path": "fairness/selection_cv" + }, + { + "source": "factsheet", + "field_path": "fairness/underfitting" + }, + { + "source": "factsheet", + "field_path": "fairness/inverse_overfitting" + }, + { + "source": "factsheet", + "field_path": "fairness/inverse_well_calibration_error" + }, + { + "source": "factsheet", + "field_path": "fairness/inverse_generalized_entropy_index" + }, + { + "source": "factsheet", + "field_path": "fairness/inverse_theil_index" + }, + { + "source": "factsheet", + "field_path": "fairness/inverse_coefficient_of_variation" + } + ], + "operation": "check_properties", + "type": "property_check", + "description": "Fairness metrics results.", + "weight": 0.1 + }, + "system": { + "inputs": [ + { + "source": "factsheet", + "field_path": "system/time_minutes" + }, + { + "source": "factsheet", + "field_path": "system/model_size" + }, + { + "source": "factsheet", + "field_path": "system/upload_bytes" + }, + { + "source": "factsheet", + "field_path": "system/download_bytes" + }, + { + "source": "factsheet", + "field_path": "system/dropout_rate" + }, + { + "source": "factsheet", + "field_path": 
"system/timeout_rate" + } + ], + "operation": "check_properties", + "type": "property_check", + "description": "System usage information.", + "weight": 0.1 + }, + "privacy": { + "inputs": [ + { + "source": "factsheet", + "field_path": "privacy/privacy_risk" + }, + { + "source": "factsheet", + "field_path": "privacy/epsilon_star" + }, + { + "source": "factsheet", + "field_path": "privacy/inverse_epsilon_star" + }, + { + "source": "factsheet", + "field_path": "privacy/mia_auc" + }, + { + "source": "factsheet", + "field_path": "privacy/mia_auc_score" + } + ], + "operation": "check_properties", + "type": "property_check", + "description": "Privacy metrics and risk estimates documented in the factsheet.", + "weight": 0.1 + }, + "explainability": { + "inputs": [ + { + "source": "factsheet", + "field_path": "explainability/alpha_score" + }, + { + "source": "factsheet", + "field_path": "explainability/spread_ratio" + }, + { + "source": "factsheet", + "field_path": "explainability/spread_divergence" + } + ], + "operation": "check_properties", + "type": "property_check", + "description": "Explainability metrics documented in the factsheet.", + "weight": 0.1 + }, + "sustainability": { + "inputs": [ + { + "source": "factsheet", + "field_path": "sustainability/carbon_intensity_local" + }, + { + "source": "factsheet", + "field_path": "sustainability/emissions_training_local" + }, + { + "source": "factsheet", + "field_path": "sustainability/energy_consumed_local" + }, + { + "source": "factsheet", + "field_path": "sustainability/emissions_communication_local" + } + ], + "operation": "check_properties", + "type": "property_check", + "description": "Sustainability and emissions metrics documented in the factsheet.", + "weight": 0.1 + } + } + }, + "monitoring": { + "weight": 0.2, + "metrics": { + "logs_available": { + "inputs": [ + { + "source": "factsheet", + "field_path": "configuration/monitoring" + } + ], + "operation": "get_value", + "type": "true_score", + "description": "The 
use of logs to show all the nodes.", + "weight": 1 + } + } + } + }, + "architectural_soundness": { + "client_management": { + "weight": 0.25, + "metrics": { + "client_selector": { + "inputs": [ + { + "source": "factsheet", + "field_path": "participants/client_selector" + } + ], + "operation": "get_value", + "type": "score_mapping", + "score_map": { + "Reputation Based": 1.0, + "Full Participation": 0.5 + }, + "description": "Mapping of client selection strategies to architectural soundness. Reputation-based selection is scored higher than full participation because it introduces an explicit selection mechanism.", + "weight": 1 + } + } + }, + "optimization": { + "weight": 0.5, + "metrics": { + "algorithm": { + "inputs": [ + { + "source": "factsheet", + "field_path": "configuration/aggregation_algorithm" + } + ], + "operation": "get_value", + "type": "score_map_value", + "score_map": { + "FedAvg": 0.9509, + "Krum": 0.9535, + "TrimmedMean": 0.9595, + "Median": 0.9461 + }, + "description": "The choice of a suitable aggregation algorithm.", + "weight": 1 + } + } + }, + "federation_management": { + "weight": 0.25, + "metrics": { + "topology_type": { + "inputs": [ + { + "source": "factsheet", + "field_path": "data/preprocessing" + } + ], + "operation": "get_value", + "type": "score_mapping", + "score_map": { + "Fully": 1.0, + "Star": 0.8, + "Ring": 0.6, + "Random": 0.2 + }, + "description": "Mapping of network topology types to architectural soundness, assuming fully connected topologies provide the strongest structural connectivity, followed by star, ring, and random topologies.", + "weight": 1 + } + } + } + }, + "sustainability": { + "energy_source": { + "weight": 0.5, + "metrics": { + "carbon_intensity_clients": { + "inputs": [ + { + "source": "factsheet", + "field_path": "sustainability/carbon_intensity_local" + } + ], + "operation": "get_value", + "type": "scaled_score", + "direction": "desc", + "scale": [20, 795], + "description": "Carbon intensity of energy grid 
used by clients", + "weight": 1 + } + } + }, + "federation_complexity": { + "weight": 0.5, + "metrics": { + "communication_efficiency": { + "inputs": [ + { "source": "factsheet", "field_path": "system/upload_bytes" }, + { "source": "factsheet", "field_path": "system/download_bytes" }, + { "source": "factsheet", "field_path": "performance/test_acc" } + ], + "operation": "comm_efficiency", + "type": "ranges", + "direction": "desc", + "ranges":[0.1, 10e2, 10e3,10e4, 10e5, 10e6,10e7,10e8,10e9,10e10,10e11], + "description": "Communication cost per unit of local test accuracy; lower values indicate more efficient neighbor communication.", + "weight": 0.3 + }, + "number_of_training_rounds": { + "inputs": [ + { + "source": "factsheet", + "field_path": "configuration/total_round_num" + } + ], + "operation": "get_value", + "type": "ranges", + "direction": "desc", + "ranges": [5, 10, 15, 20, 25, 30, 35, 40, 45, 50], + "description": "The total number of training rounds", + "weight": 0.15 + }, + "avg_model_size": { + "inputs": [ + { + "source": "factsheet", + "field_path": "configuration/trainable_param_num" + } + ], + "operation": "get_value", + "type": "ranges", + "direction": "desc", + "ranges":[10e4, 10e5, 10e6,10e7,10e8,10e9,10e10,10e11], + "description": "The size of the model", + "weight": 0.15 + }, + "client_selection_rate": { + "inputs": [ + { + "source": "factsheet", + "field_path": "participants/sample_client_rate" + } + ], + "operation": "get_value", + "type": "scaled_score", + "direction": "asc", + "scale": [ + 0.1,1 + ], + "description": "The selection rate of clients for each training round", + "weight": 0.1 + }, + "number_of_clients": { + "inputs": [ + { + "source": "factsheet", + "field_path": "participants/client_num" + } + ], + "operation": "get_value", + "type": "ranges", + "direction": "desc", + "ranges": [5, 10, 15, 20, 25, 30, 35, 40, 45, 50], + "description": "The number of clients in the federation.", + "weight": 0.1 + }, + "local_training_rounds": { + 
"inputs": [ + { + "source": "factsheet", + "field_path": "configuration/local_update_steps" + } + ], + "operation": "get_value", + "type": "scaled_score", + "direction": "desc", + "scale": [1, 100], + "description": "The number of local training rounds.", + "weight": 0.1 + }, + "avg_dataset_size": { + "inputs": [ + { + "source": "factsheet", + "field_path": "participants/local_dataset_size" + } + ], + "operation": "get_value", + "type": "ranges", + "direction": "desc", + "ranges": [10e1, 10e2, 10e3, 10e4, 10e5], + "description": "The average number of training samples", + "weight": 0.1 + } + } + } + } + } diff --git a/nebula/addons/trustworthiness/configs/factsheet_template_cfl.json b/nebula/addons/trustworthiness/configs/factsheet_template_cfl.json new file mode 100755 index 000000000..61a465fa0 --- /dev/null +++ b/nebula/addons/trustworthiness/configs/factsheet_template_cfl.json @@ -0,0 +1,88 @@ +{ + "project": { + "overview": "", + "purpose": "", + "background": "" + }, + "data": { + "provenance": "", + "preprocessing": "", + "avg_entropy": "" + }, + "participants": { + "client_num": "", + "sample_client_rate": "", + "client_selector": "", + "avg_neighbor_reputation": "", + "avg_dataset_size": "" + }, + "configuration": { + "aggregation_algorithm": "", + "training_model": "", + "reputation_enabled": "", + "personalization": "", + "visualization": "", + "monitoring": "", + "differential_privacy": "", + "dp_epsilon": "", + "trainable_param_num": "", + "total_round_num": "", + "learning_rate": "", + "local_update_steps": "" + }, + "privacy": { + "privacy_risk": "", + "epsilon_star": "", + "inverse_epsilon_star": "", + "mia_auc": "", + "mia_auc_score": "" + }, + "explainability": { + "alpha_score": "", + "spread_ratio": "", + "spread_divergence": "" + }, + "performance": { + "test_loss_avg": "", + "test_acc_avg": "", + "test_macro_f1": "", + "clipped_test_feature_importance_cv": "", + "test_clever_score": "", + "inverse_test_loss_sensitivity": "", + 
"test_adv_accuracy": "", + "test_empirical_robustness_score": "", + "test_confidence_score": "", + "inverse_test_attack_success_rate": "" + }, + "fairness": { + "test_acc_cv": "", + "selection_cv": "", + "class_imbalance": "", + "underfitting": "", + "inverse_overfitting": "", + "inverse_well_calibration_error": "", + "inverse_generalized_entropy_index": "", + "inverse_theil_index": "", + "inverse_coefficient_of_variation": "" + }, + "system": { + "avg_time_minutes": "", + "avg_model_size": "", + "total_upload_bytes": "", + "total_download_bytes":"", + "avg_upload_bytes": "", + "avg_download_bytes": "", + "dropout_rate": "", + "timeout_rate": "" + }, + "sustainability": { + "avg_carbon_intensity_server": "", + "avg_carbon_intensity_clients": "", + "avg_power_performance_clients": "", + "avg_power_performance_server": "", + "emissions_training": "", + "emissions_aggregation": "", + "emissions_communication_uplink": "", + "emissions_communication_downlink": "" + } +} diff --git a/nebula/addons/trustworthiness/configs/factsheet_template_cfl_images.json b/nebula/addons/trustworthiness/configs/factsheet_template_cfl_images.json new file mode 100755 index 000000000..61a465fa0 --- /dev/null +++ b/nebula/addons/trustworthiness/configs/factsheet_template_cfl_images.json @@ -0,0 +1,88 @@ +{ + "project": { + "overview": "", + "purpose": "", + "background": "" + }, + "data": { + "provenance": "", + "preprocessing": "", + "avg_entropy": "" + }, + "participants": { + "client_num": "", + "sample_client_rate": "", + "client_selector": "", + "avg_neighbor_reputation": "", + "avg_dataset_size": "" + }, + "configuration": { + "aggregation_algorithm": "", + "training_model": "", + "reputation_enabled": "", + "personalization": "", + "visualization": "", + "monitoring": "", + "differential_privacy": "", + "dp_epsilon": "", + "trainable_param_num": "", + "total_round_num": "", + "learning_rate": "", + "local_update_steps": "" + }, + "privacy": { + "privacy_risk": "", + "epsilon_star": 
"", + "inverse_epsilon_star": "", + "mia_auc": "", + "mia_auc_score": "" + }, + "explainability": { + "alpha_score": "", + "spread_ratio": "", + "spread_divergence": "" + }, + "performance": { + "test_loss_avg": "", + "test_acc_avg": "", + "test_macro_f1": "", + "clipped_test_feature_importance_cv": "", + "test_clever_score": "", + "inverse_test_loss_sensitivity": "", + "test_adv_accuracy": "", + "test_empirical_robustness_score": "", + "test_confidence_score": "", + "inverse_test_attack_success_rate": "" + }, + "fairness": { + "test_acc_cv": "", + "selection_cv": "", + "class_imbalance": "", + "underfitting": "", + "inverse_overfitting": "", + "inverse_well_calibration_error": "", + "inverse_generalized_entropy_index": "", + "inverse_theil_index": "", + "inverse_coefficient_of_variation": "" + }, + "system": { + "avg_time_minutes": "", + "avg_model_size": "", + "total_upload_bytes": "", + "total_download_bytes":"", + "avg_upload_bytes": "", + "avg_download_bytes": "", + "dropout_rate": "", + "timeout_rate": "" + }, + "sustainability": { + "avg_carbon_intensity_server": "", + "avg_carbon_intensity_clients": "", + "avg_power_performance_clients": "", + "avg_power_performance_server": "", + "emissions_training": "", + "emissions_aggregation": "", + "emissions_communication_uplink": "", + "emissions_communication_downlink": "" + } +} diff --git a/nebula/addons/trustworthiness/configs/factsheet_template.json b/nebula/addons/trustworthiness/configs/factsheet_template_cfl_tabular.json similarity index 59% rename from nebula/addons/trustworthiness/configs/factsheet_template.json rename to nebula/addons/trustworthiness/configs/factsheet_template_cfl_tabular.json index eeeaa7f67..75b539f1b 100755 --- a/nebula/addons/trustworthiness/configs/factsheet_template.json +++ b/nebula/addons/trustworthiness/configs/factsheet_template_cfl_tabular.json @@ -13,13 +13,16 @@ "client_num": "", "sample_client_rate": "", "client_selector": "", + "avg_neighbor_reputation": "", 
"avg_dataset_size": "" }, "configuration": { "aggregation_algorithm": "", "training_model": "", + "reputation_enabled": "", "personalization": "", "visualization": "", + "monitoring": "", "differential_privacy": "", "dp_epsilon": "", "trainable_param_num": "", @@ -27,16 +30,37 @@ "learning_rate": "", "local_update_steps": "" }, + "privacy": { + "privacy_risk": "", + "epsilon_star": "", + "inverse_epsilon_star": "", + "mia_auc": "", + "mia_auc_score": "" + }, + "explainability": { + "alpha_score": "", + "spread_ratio": "", + "spread_divergence": "" + }, "performance": { "test_loss_avg": "", "test_acc_avg": "", - "test_feature_importance_cv": "", - "test_clever": "" + "test_macro_f1": "", + "clipped_test_feature_importance_cv": "", + "test_adv_accuracy": "", + "test_confidence_score": "", + "inverse_test_attack_success_rate": "" }, "fairness": { "test_acc_cv": "", "selection_cv": "", - "class_imbalance": "" + "class_imbalance": "", + "underfitting": "", + "inverse_overfitting": "", + "inverse_well_calibration_error": "", + "inverse_generalized_entropy_index": "", + "inverse_theil_index": "", + "inverse_coefficient_of_variation": "" }, "system": { "avg_time_minutes": "", @@ -44,7 +68,9 @@ "total_upload_bytes": "", "total_download_bytes":"", "avg_upload_bytes": "", - "avg_download_bytes": "" + "avg_download_bytes": "", + "dropout_rate": "", + "timeout_rate": "" }, "sustainability": { "avg_carbon_intensity_server": "", diff --git a/nebula/addons/trustworthiness/configs/factsheet_template_dfl.json b/nebula/addons/trustworthiness/configs/factsheet_template_dfl.json new file mode 100755 index 000000000..c5ea5af60 --- /dev/null +++ b/nebula/addons/trustworthiness/configs/factsheet_template_dfl.json @@ -0,0 +1,82 @@ +{ + "project": { + "overview": "", + "purpose": "", + "background": "" + }, + "data": { + "provenance": "", + "preprocessing": "", + "entropy_local": "" + }, + "participants": { + "client_num": "", + "sample_client_rate": "", + "client_selector": "", + 
"local_dataset_size": "", + "neighbor_num": "", + "avg_neighbor_reputation": "" + }, + "configuration": { + "aggregation_algorithm": "", + "training_model": "", + "personalization": "", + "reputation_enabled": "", + "visualization": "", + "monitoring": "", + "differential_privacy": "", + "dp_epsilon": "", + "trainable_param_num": "", + "total_round_num": "", + "learning_rate": "", + "local_update_steps": "" + }, + "privacy": { + "privacy_risk": "", + "epsilon_star": "", + "inverse_epsilon_star": "", + "mia_auc": "", + "mia_auc_score": "" + }, + "explainability": { + "alpha_score": "", + "spread_ratio": "", + "spread_divergence": "" + }, + "performance": { + "test_loss": "", + "test_acc": "", + "test_macro_f1": "", + "clipped_test_feature_importance_cv": "", + "test_clever_score": "", + "inverse_test_loss_sensitivity": "", + "test_adv_accuracy": "", + "test_empirical_robustness_score": "", + "test_confidence_score": "", + "inverse_test_attack_success_rate": "" + }, + "fairness": { + "selection_cv": "", + "class_imbalance": "", + "underfitting": "", + "inverse_overfitting": "", + "inverse_well_calibration_error": "", + "inverse_generalized_entropy_index": "", + "inverse_theil_index": "", + "inverse_coefficient_of_variation": "" + }, + "system": { + "time_minutes": "", + "model_size": "", + "upload_bytes": "", + "download_bytes":"", + "dropout_rate": "", + "timeout_rate": "" + }, + "sustainability": { + "carbon_intensity_local": "", + "emissions_training_local": "", + "energy_consumed_local": "", + "emissions_communication_local": "" + } +} diff --git a/nebula/addons/trustworthiness/configs/factsheet_template_dfl_images.json b/nebula/addons/trustworthiness/configs/factsheet_template_dfl_images.json new file mode 100755 index 000000000..c5ea5af60 --- /dev/null +++ b/nebula/addons/trustworthiness/configs/factsheet_template_dfl_images.json @@ -0,0 +1,82 @@ +{ + "project": { + "overview": "", + "purpose": "", + "background": "" + }, + "data": { + "provenance": "", + 
"preprocessing": "", + "entropy_local": "" + }, + "participants": { + "client_num": "", + "sample_client_rate": "", + "client_selector": "", + "local_dataset_size": "", + "neighbor_num": "", + "avg_neighbor_reputation": "" + }, + "configuration": { + "aggregation_algorithm": "", + "training_model": "", + "personalization": "", + "reputation_enabled": "", + "visualization": "", + "monitoring": "", + "differential_privacy": "", + "dp_epsilon": "", + "trainable_param_num": "", + "total_round_num": "", + "learning_rate": "", + "local_update_steps": "" + }, + "privacy": { + "privacy_risk": "", + "epsilon_star": "", + "inverse_epsilon_star": "", + "mia_auc": "", + "mia_auc_score": "" + }, + "explainability": { + "alpha_score": "", + "spread_ratio": "", + "spread_divergence": "" + }, + "performance": { + "test_loss": "", + "test_acc": "", + "test_macro_f1": "", + "clipped_test_feature_importance_cv": "", + "test_clever_score": "", + "inverse_test_loss_sensitivity": "", + "test_adv_accuracy": "", + "test_empirical_robustness_score": "", + "test_confidence_score": "", + "inverse_test_attack_success_rate": "" + }, + "fairness": { + "selection_cv": "", + "class_imbalance": "", + "underfitting": "", + "inverse_overfitting": "", + "inverse_well_calibration_error": "", + "inverse_generalized_entropy_index": "", + "inverse_theil_index": "", + "inverse_coefficient_of_variation": "" + }, + "system": { + "time_minutes": "", + "model_size": "", + "upload_bytes": "", + "download_bytes":"", + "dropout_rate": "", + "timeout_rate": "" + }, + "sustainability": { + "carbon_intensity_local": "", + "emissions_training_local": "", + "energy_consumed_local": "", + "emissions_communication_local": "" + } +} diff --git a/nebula/addons/trustworthiness/configs/factsheet_template_dfl_tabular.json b/nebula/addons/trustworthiness/configs/factsheet_template_dfl_tabular.json new file mode 100755 index 000000000..5e2e841a8 --- /dev/null +++ 
b/nebula/addons/trustworthiness/configs/factsheet_template_dfl_tabular.json @@ -0,0 +1,79 @@ +{ + "project": { + "overview": "", + "purpose": "", + "background": "" + }, + "data": { + "provenance": "", + "preprocessing": "", + "entropy_local": "" + }, + "participants": { + "client_num": "", + "sample_client_rate": "", + "client_selector": "", + "local_dataset_size": "", + "neighbor_num": "", + "avg_neighbor_reputation": "" + }, + "configuration": { + "aggregation_algorithm": "", + "training_model": "", + "personalization": "", + "reputation_enabled": "", + "visualization": "", + "monitoring": "", + "differential_privacy": "", + "dp_epsilon": "", + "trainable_param_num": "", + "total_round_num": "", + "learning_rate": "", + "local_update_steps": "" + }, + "privacy": { + "privacy_risk": "", + "epsilon_star": "", + "inverse_epsilon_star": "", + "mia_auc": "", + "mia_auc_score": "" + }, + "explainability": { + "alpha_score": "", + "spread_ratio": "", + "spread_divergence": "" + }, + "performance": { + "test_loss": "", + "test_acc": "", + "test_macro_f1": "", + "clipped_test_feature_importance_cv": "", + "test_adv_accuracy": "", + "test_confidence_score": "", + "inverse_test_attack_success_rate": "" + }, + "fairness": { + "selection_cv": "", + "class_imbalance": "", + "underfitting": "", + "inverse_overfitting": "", + "inverse_well_calibration_error": "", + "inverse_generalized_entropy_index": "", + "inverse_theil_index": "", + "inverse_coefficient_of_variation": "" + }, + "system": { + "time_minutes": "", + "model_size": "", + "upload_bytes": "", + "download_bytes":"", + "dropout_rate": "", + "timeout_rate": "" + }, + "sustainability": { + "carbon_intensity_local": "", + "emissions_training_local": "", + "energy_consumed_local": "", + "emissions_communication_local": "" + } +} diff --git a/nebula/addons/trustworthiness/dfl_factsheet.py b/nebula/addons/trustworthiness/dfl_factsheet.py new file mode 100644 index 000000000..8cb8b752c --- /dev/null +++ 
b/nebula/addons/trustworthiness/dfl_factsheet.py @@ -0,0 +1,186 @@ +import logging +import os +import pandas as pd + +from nebula.addons.trustworthiness.helpers.csv_io import ( + load_data_results_participant, + load_emissions_participant, +) +from nebula.addons.trustworthiness.helpers.data_distribution import ( + get_all_data_entropy, + get_local_class_imbalance_score, + get_local_normalized_entropy, +) +from nebula.addons.trustworthiness.helpers.privacy import ( + get_global_privacy_risk_dfl, +) +from nebula.addons.trustworthiness.helpers.scenario_metrics import ( + get_bytes_model, + get_dp_local, + get_elapsed_time, + get_underfitting_score_local, +) +from nebula.addons.trustworthiness.factsheet_common import ( + get_factsheet_path, + get_factsheet_template_name, + get_trustworthiness_dir, + load_or_create_factsheet, + populate_common_pre_train_sections, + populate_participation, + populate_reliability, + populate_reputation, + set_dp_configuration, + write_factsheet, +) +from nebula.addons.trustworthiness.factsheet_populators import populate_profile_metrics + +logger = logging.getLogger(__name__) + +class DflFactsheet: + def __init__(self): + # Manage participant-specific DFL/SDFL factsheets. + self.factsheet_template_file_nm = "factsheet_template_dfl.json" + + def populate_factsheet_dfl( + self, + scenario_name, + participant_idx, + data, + start_time, + end_time, + model, + train_loader, + test_loader, + reputation_summary=None, + participation_summary=None, + reliability_summary=None, + ): + + # Resolve participant-specific output and data-type-aware template. 
+ self.factsheet_file_nm = f"factsheet_participant_{participant_idx}.json" + factsheet_template_file_nm = get_factsheet_template_name( + data["federation"], + model, + self.factsheet_template_file_nm, + dataset_name=data["dataset"], + ) + + factsheet_file = get_factsheet_path(scenario_name, self.factsheet_file_nm) + + factsheet_file, factsheet = load_or_create_factsheet( + scenario_name, + self.factsheet_file_nm, + factsheet_template_file_nm, + ) + + logger.info("DFL FactSheet: Populating factsheet") + + populate_common_pre_train_sections(factsheet, data, model) + + # DP configuration is stored per participant in decentralized runs. + dp_enabled, dp_epsilon = get_dp_local(scenario_name, participant_idx) + set_dp_configuration(factsheet, dp_enabled, dp_epsilon) + + files_dir = get_trustworthiness_dir(scenario_name) + + # Refresh entropy.json so participant-local entropy can be read consistently. + get_all_data_entropy(scenario_name) + + factsheet["data"]["entropy_local"] = get_local_normalized_entropy(scenario_name, participant_idx) + + # Use the final valid round metrics as participant test performance. + df = load_round_metrics(scenario_name, participant_idx) + acc = df["accuracy"].astype(float).to_numpy() + loss = df["loss"].astype(float).to_numpy() + + final_acc = float(acc[-1]) + final_loss = float(loss[-1]) + + factsheet["performance"]["test_loss"] = final_loss + factsheet["performance"]["test_acc"] = final_acc + + # Load local communication and privacy values reported by the participant.
+ bytes_sent, bytes_recv, _, _, _, macro_f1, train_accuracy, *_ = load_data_results_participant(scenario_name, participant_idx) + factsheet["performance"]["test_macro_f1"] = macro_f1 + factsheet["performance"]["train_accuracy"] = train_accuracy + + factsheet["system"]["model_size"] = get_bytes_model(model) + + factsheet["system"]["upload_bytes"] = int(bytes_sent) + factsheet["system"]["download_bytes"] = int(bytes_recv) + + populate_reliability(factsheet, reliability_summary) + + factsheet["system"]["time_minutes"] = get_elapsed_time(start_time, end_time) + + # Class imbalance can only be populated after local class-counts exist. + count_class_file = os.path.join(files_dir, f"{participant_idx}_class_count.json") + factsheet["fairness"]["class_imbalance"] = ( + get_local_class_imbalance_score(scenario_name, participant_idx) + if os.path.exists(count_class_file) + else factsheet["fairness"].get("class_imbalance", 0.0) + ) + + populate_participation(factsheet, participation_summary) + + # Local CodeCarbon output feeds participant sustainability fields. + ( + role, + carbon_intensity_local, + emissions_training_local, + workload, + cpu_model, + gpu_model, + cpu_used, + gpu_used, + energy_consumed_local, + sample_size, + ) = load_emissions_participant( + scenario_name, + participant_idx, + ) + + factsheet["sustainability"]["carbon_intensity_local"] = carbon_intensity_local + factsheet["sustainability"]["emissions_training_local"] = emissions_training_local + factsheet["sustainability"]["energy_consumed_local"] = energy_consumed_local + factsheet["participants"]["local_dataset_size"] = sample_size + + populate_reputation(factsheet, reputation_summary, include_neighbor_num=True) + # DFL privacy risk depends on local DP settings and neighbor count. 
+ factsheet["privacy"]["privacy_risk"] = get_global_privacy_risk_dfl( + dp_enabled, + dp_epsilon, + factsheet["participants"]["neighbor_num"], + ) + + # Communication emissions are estimated from local bytes and carbon intensity. + factsheet["sustainability"]["emissions_communication_local"] = ( + (bytes_sent * 2.24e-10 * carbon_intensity_local) + + (bytes_recv * 2.24e-10 * carbon_intensity_local) + ) + + # Populate model/profile metrics after final participant accuracy is known. + factsheet["fairness"]["underfitting"] = get_underfitting_score_local(scenario_name, participant_idx) + populate_profile_metrics( + factsheet, + data["federation"], + model, + train_loader, + test_loader, + factsheet["performance"]["test_acc"], + ) + + write_factsheet(factsheet_file, factsheet) + + +def load_round_metrics(scenario_name, participant_idx): + # Load participant per-round metrics and keep only rows with loss/accuracy. + files_dir = get_trustworthiness_dir(scenario_name) + path = os.path.join(files_dir, f"round_metrics_participant_{participant_idx}.csv") + df = pd.read_csv(path) + + if "round" in df.columns: + df = df.sort_values("round") + + df = df.dropna(subset=["loss", "accuracy"]) + return df diff --git a/nebula/addons/trustworthiness/factsheet.py b/nebula/addons/trustworthiness/factsheet.py deleted file mode 100755 index 3ffce970a..000000000 --- a/nebula/addons/trustworthiness/factsheet.py +++ /dev/null @@ -1,281 +0,0 @@ -import json -import logging -import os -import glob -import shutil -from json import JSONDecodeError -import pickle -import numpy as np -import pandas as pd - -# from nebula.core.models.cifar10.cnn import CIFAR10ModelCNN -from nebula.core.models.mnist.mlp import MNISTModelMLP -from nebula.core.models.mnist.cnn import MNISTModelCNN -from nebula.addons.trustworthiness.calculation import get_elapsed_time, get_bytes_models, get_bytes_sent_recv, get_avg_loss_accuracy, get_cv, get_clever_score, get_feature_importance_cv -from 
nebula.addons.trustworthiness.utils import count_all_class_samples, read_csv, check_field_filled, get_all_data_entropy -# from nebula.core.models.syscall.mlp import SyscallModelMLP - -dirname = os.path.dirname(__file__) - -class Factsheet: - def __init__(self): - """ - Manager class to populate the FactSheet - """ - self.factsheet_file_nm = "factsheet.json" - self.factsheet_template_file_nm = "factsheet_template.json" - - def populate_factsheet_pre_train(self, data, scenario_name): - """ - Populates the factsheet with values before the training. - - Args: - data (dict): Contains the data from the scenario. - scenario_name (string): The name of the scenario. - """ - - factsheet_file = os.path.join(os.environ.get('NEBULA_LOGS_DIR'), scenario_name, "trustworthiness", self.factsheet_file_nm) - - factsheet_template = os.path.join(dirname, "configs", self.factsheet_template_file_nm) - - if not os.path.exists(factsheet_file): - shutil.copyfile(factsheet_template, factsheet_file) - - with open(factsheet_file, "r+") as f: - factsheet = {} - - try: - factsheet = json.load(f) - - if data is not None: - logging.info("FactSheet: Populating factsheet with pre training metrics") - - federation = data["federation"] - n_nodes = int(data["n_nodes"]) - dataset = data["dataset"] - algorithm = data["model"] - aggregation_algorithm = data["agg_algorithm"] - n_rounds = int(data["rounds"]) - attack = data["attack_params"]["attacks"] - if attack != "No Attack": - poisoned_node_percent = int(data["attack_params"]["poisoned_node_percent"]) - poisoned_sample_percent = int(data["attack_params"]["poisoned_sample_percent"]) - poisoned_noise_percent = int(data["attack_params"]["poisoned_noise_percent"]) - else: - poisoned_node_percent = 0 - poisoned_sample_percent = 0 - poisoned_noise_percent = 0 - with_reputation = data["reputation"]["enabled"] - is_dynamic_topology = False # data["is_dynamic_topology"] - is_dynamic_aggregation = False # data["is_dynamic_aggregation"] - target_aggregation = 
False # data["target_aggregation"] - - if attack != "No Attack" and with_reputation == True and is_dynamic_aggregation == True: - background = f"For the project setup, the most important aspects are the following: The federation architecture is {federation}, involving {n_nodes} clients, the dataset used is {dataset}, the learning algorithm is {algorithm}, the aggregation algorithm is {aggregation_algorithm} and the number of rounds is {n_rounds}. In addition, the type of attack used against the clients is {attack}, where the percentage of attacked nodes is {poisoned_node_percent}, the percentage of attacked samples of each node is {poisoned_sample_percent}, and the percent of poisoned noise is {poisoned_noise_percent}. A reputation-based defence with a dynamic aggregation based on the aggregation algorithm {target_aggregation} is used, and the trustworthiness of the project is desired." - - elif attack != "No Attack" and with_reputation == True and is_dynamic_topology == True: - background = f"For the project setup, the most important aspects are the following: The federation architecture is {federation}, involving {n_nodes} clients, the dataset used is {dataset}, the learning algorithm is {algorithm}, the aggregation algorithm is {aggregation_algorithm} and the number of rounds is {n_rounds}. In addition, the type of attack used against the clients is {attack}, where the percentage of attacked nodes is {poisoned_node_percent}, the percentage of attacked samples of each node is {poisoned_sample_percent}, and the percent of poisoned noise is {poisoned_noise_percent}. A reputation-based defence with a dynamic topology is used, and the trustworthiness of the project is desired." 
- - elif attack != "No Attack" and with_reputation == False: - background = f"For the project setup, the most important aspects are the following: The federation architecture is {federation}, involving {n_nodes} clients, the dataset used is {dataset}, the learning algorithm is {algorithm}, the aggregation algorithm is {aggregation_algorithm} and the number of rounds is {n_rounds}. In addition, the type of attack used against the clients is {attack}, where the percentage of attacked nodes is {poisoned_node_percent}, the percentage of attacked samples of each node is {poisoned_sample_percent}, and the percent of poisoned noise is {poisoned_noise_percent}. No defence mechanism is used, and the trustworthiness of the project is desired." - - elif attack == "No Attack": - background = f"For the project setup, the most important aspects are the following: The federation architecture is {federation}, involving {n_nodes} clients, the dataset used is {dataset}, the learning algorithm is {algorithm}, the aggregation algorithm is {aggregation_algorithm} and the number of rounds is {n_rounds}. No attacks against clients are used, and the trustworthiness of the project is desired." 
- - # Set project specifications - factsheet["project"]["overview"] = data["scenario_title"] - factsheet["project"]["purpose"] = data["scenario_description"] - factsheet["project"]["background"] = background - - # Set data specifications - factsheet["data"]["provenance"] = data["dataset"] - factsheet["data"]["preprocessing"] = data["topology"] - - # Set participants - factsheet["participants"]["client_num"] = data["n_nodes"] or "" - factsheet["participants"]["sample_client_rate"] = 1 - factsheet["participants"]["client_selector"] = "" - - # Set configuration - factsheet["configuration"]["aggregation_algorithm"] = data["agg_algorithm"] or "" - factsheet["configuration"]["training_model"] = data["model"] or "" - factsheet["configuration"]["personalization"] = False - factsheet["configuration"]["visualization"] = True - factsheet["configuration"]["total_round_num"] = n_rounds - - if poisoned_noise_percent != 0: - factsheet["configuration"]["differential_privacy"] = True - factsheet["configuration"]["dp_epsilon"] = poisoned_noise_percent - else: - factsheet["configuration"]["differential_privacy"] = False - factsheet["configuration"]["dp_epsilon"] = "" - - if dataset == "MNIST" and algorithm == "MLP": - model = MNISTModelMLP() - elif dataset == "MNIST" and algorithm == "CNN": - model = MNISTModelCNN() - # elif dataset == "Syscall" and algorithm == "MLP": - # model = SyscallModelMLP() - # else: - # model = CIFAR10ModelCNN() - - factsheet["configuration"]["learning_rate"] = model.get_learning_rate() - factsheet["configuration"]["trainable_param_num"] = model.count_parameters() - factsheet["configuration"]["local_update_steps"] = 1 - - f.seek(0) - f.truncate() - json.dump(factsheet, f, indent=4) - - except JSONDecodeError as e: - logging.warning(f"{factsheet_file} is invalid") - logging.error(e) - - def populate_factsheet_post_train(self, scenario_name, start_time, end_time): - """ - Populates the factsheet with values after the training. 
- - Args: - scenario (object): The scenario object. - """ - factsheet_file = os.path.join(f"{os.environ.get('NEBULA_LOGS_DIR')}{scenario_name}/trustworthiness/{self.factsheet_file_nm}") - - logging.info("FactSheet: Populating factsheet with post training metrics") - - with open(factsheet_file, "r+") as f: - factsheet = {} - try: - factsheet = json.load(f) - - dataset = factsheet["data"]["provenance"] - model = factsheet["configuration"]["training_model"] - - files_dir = f"{os.environ.get('NEBULA_LOGS_DIR')}/{scenario_name}/trustworthiness" - - models_files = glob.glob(os.path.join(files_dir, "*final_model*")) - #dataloaders_files = glob.glob(os.path.join(files_dir, "*train_loader*")) - test_dataloader_file = f"{files_dir}/participant_1_test_loader.pk" - train_model_file = f"{files_dir}/participant_1_train_model.pk" - emissions_file = os.path.join(files_dir, "emissions.csv") - - # # Entropy - # i = 0 - # for file in dataloaders_files: - # with open(file, "rb") as file: - # dataloader = pickle.load(file) - # get_entropy(i, scenario_name, dataloader) - # i += 1 - - get_all_data_entropy(scenario_name) - - with open(f"{files_dir}/entropy.json", "r") as file: - entropy_distribution = json.load(file) - - values = np.array(list(entropy_distribution.values())) - - normalized_values = (values - np.min(values)) / (np.max(values) - np.min(values)) - - avg_entropy = np.mean(normalized_values) - - factsheet["data"]["avg_entropy"] = avg_entropy - - # Set performance data - result_avg_loss_accuracy = get_avg_loss_accuracy(scenario_name) - factsheet["performance"]["test_loss_avg"] = result_avg_loss_accuracy[0] - factsheet["performance"]["test_acc_avg"] = result_avg_loss_accuracy[1] - test_acc_cv = get_cv(std=result_avg_loss_accuracy[2], mean=result_avg_loss_accuracy[1]) - factsheet["fairness"]["test_acc_cv"] = 1 if test_acc_cv > 1 else test_acc_cv - - factsheet["system"]["avg_time_minutes"] = get_elapsed_time(start_time, end_time) - factsheet["system"]["avg_model_size"] = 
get_bytes_models(models_files) - - result_bytes_sent_recv = get_bytes_sent_recv(scenario_name) - factsheet["system"]["total_upload_bytes"] = result_bytes_sent_recv[0] - factsheet["system"]["total_download_bytes"] = result_bytes_sent_recv[1] - factsheet["system"]["avg_upload_bytes"] = result_bytes_sent_recv[2] - factsheet["system"]["avg_download_bytes"] = result_bytes_sent_recv[3] - - factsheet["fairness"]["selection_cv"] = 1 - - count_all_class_samples(scenario_name) - - with open(f"{files_dir}/count_class.json", "r") as file: - class_distribution = json.load(file) - - class_samples_sizes = [x for x in class_distribution.values()] - class_imbalance = get_cv(list=class_samples_sizes) - factsheet["fairness"]["class_imbalance"] = 1 if class_imbalance > 1 else class_imbalance - - with open(train_model_file, "rb") as file: - lightning_model = pickle.load(file) - - if dataset == "MNIST" and model == "MLP": - model = MNISTModelMLP() - elif dataset == "MNIST" and model == "CNN": - model = MNISTModelCNN() - # elif dataset == "Syscall" and model == "MLP": - # model = SyscallModelMLP() - # else: - # model = CIFAR10ModelCNN() - - model.load_state_dict(lightning_model.state_dict()) - - with open(test_dataloader_file, "rb") as file: - test_dataloader = pickle.load(file) - - test_sample = next(iter(test_dataloader)) - - lr = factsheet["configuration"]["learning_rate"] - value_clever = get_clever_score(model, test_sample, 10, lr) - - factsheet["performance"]["test_clever"] = 1 if value_clever > 1 else value_clever - - feature_importance = get_feature_importance_cv(model, test_sample) - - factsheet["performance"]["test_feature_importance_cv"] = 1 if feature_importance > 1 else feature_importance - - # Set emissions metrics - emissions = None if emissions_file is None else read_csv(emissions_file) - if emissions is not None: - logging.info("FactSheet: Populating emissions") - cpu_spez_df = pd.read_csv(os.path.join(os.path.dirname(__file__), "benchmarks", "CPU_benchmarks_v4.csv"), 
header=0) - emissions["CPU_model"] = emissions["CPU_model"].astype(str).str.replace(r"\([^)]*\)", "", regex=True) - emissions["CPU_model"] = emissions["CPU_model"].astype(str).str.replace(r" CPU", "", regex=True) - emissions["GPU_model"] = emissions["GPU_model"].astype(str).str.replace(r"[0-9] x ", "", regex=True) - emissions = pd.merge(emissions, cpu_spez_df[["cpuName", "powerPerf"]], left_on="CPU_model", right_on="cpuName", how="left") - gpu_spez_df = pd.read_csv(os.path.join(os.path.dirname(__file__), "benchmarks", "GPU_benchmarks_v7.csv"), header=0) - emissions = pd.merge(emissions, gpu_spez_df[["gpuName", "powerPerformance"]], left_on="GPU_model", right_on="gpuName", how="left") - - emissions.drop("cpuName", axis=1, inplace=True) - emissions.drop("gpuName", axis=1, inplace=True) - emissions["powerPerf"] = emissions["powerPerf"].astype(float) - emissions["powerPerformance"] = emissions["powerPerformance"].astype(float) - client_emissions = emissions.loc[emissions["role"] == "trainer"] - client_avg_carbon_intensity = round(client_emissions["energy_grid"].mean(), 2) - factsheet["sustainability"]["avg_carbon_intensity_clients"] = check_field_filled(factsheet, ["sustainability", "avg_carbon_intensity_clients"], client_avg_carbon_intensity, "") - factsheet["sustainability"]["emissions_training"] = check_field_filled(factsheet, ["sustainability", "emissions_training"], client_emissions["emissions"].sum(), "") - factsheet["participants"]["avg_dataset_size"] = check_field_filled(factsheet, ["participants", "avg_dataset_size"], client_emissions["sample_size"].mean(), "") - GPU_powerperf = (client_emissions.loc[client_emissions["GPU_used"] == True])["powerPerformance"] - CPU_powerperf = (client_emissions.loc[client_emissions["CPU_used"] == True])["powerPerf"] - clients_power_performance = round(pd.concat([GPU_powerperf, CPU_powerperf]).mean(), 2) - factsheet["sustainability"]["avg_power_performance_clients"] = check_field_filled(factsheet, ["sustainability", 
"avg_power_performance_clients"], clients_power_performance, "") - - server_emissions = emissions.loc[emissions["role"] == "server"] - server_avg_carbon_intensity = round(server_emissions["energy_grid"].mean(), 2) - factsheet["sustainability"]["avg_carbon_intensity_server"] = check_field_filled(factsheet, ["sustainability", "avg_carbon_intensity_server"], server_avg_carbon_intensity, "") - factsheet["sustainability"]["emissions_aggregation"] = check_field_filled(factsheet, ["sustainability", "emissions_aggregation"], server_emissions["emissions"].sum(), "") - GPU_powerperf = (server_emissions.loc[server_emissions["GPU_used"] == True])["powerPerformance"] - CPU_powerperf = (server_emissions.loc[server_emissions["CPU_used"] == True])["powerPerf"] - server_power_performance = round(pd.concat([GPU_powerperf, CPU_powerperf]).mean(), 2) - factsheet["sustainability"]["avg_power_performance_server"] = check_field_filled(factsheet, ["sustainability", "avg_power_performance_server"], server_power_performance, "") - - factsheet["sustainability"]["emissions_communication_uplink"] = check_field_filled(factsheet, ["sustainability", "emissions_communication_uplink"], factsheet["system"]["total_upload_bytes"] * 2.24e-10 * factsheet["sustainability"]["avg_carbon_intensity_clients"], "") - factsheet["sustainability"]["emissions_communication_downlink"] = check_field_filled(factsheet, ["sustainability", "emissions_communication_downlink"], factsheet["system"]["total_download_bytes"] * 2.24e-10 * factsheet["sustainability"]["avg_carbon_intensity_server"], "") - - f.seek(0) - f.truncate() - json.dump(factsheet, f, indent=4) - - except JSONDecodeError as e: - logging.info(f"{factsheet_file} is invalid") - logging.error(e) \ No newline at end of file diff --git a/nebula/addons/trustworthiness/factsheet_common.py b/nebula/addons/trustworthiness/factsheet_common.py new file mode 100644 index 000000000..2b6051e4a --- /dev/null +++ b/nebula/addons/trustworthiness/factsheet_common.py @@ -0,0 
+1,265 @@ +import json +import os +import shutil + + +dirname = os.path.dirname(__file__) + +# Shared helpers for trustworthiness factsheet generation. +DATA_TYPE_IMAGES = "images" +DATA_TYPE_TABULAR = "tabular" +DATASET_DATA_TYPES = { + "mnist": DATA_TYPE_IMAGES, + "fashionmnist": DATA_TYPE_IMAGES, + "emnist": DATA_TYPE_IMAGES, + "cifar10": DATA_TYPE_IMAGES, + "cifar100": DATA_TYPE_IMAGES, + "kddcup99": DATA_TYPE_TABULAR, + "adultcensus": DATA_TYPE_TABULAR, + "breastcancer": DATA_TYPE_TABULAR, + "covtype": DATA_TYPE_TABULAR, + "sentiment140": DATA_TYPE_TABULAR, +} + + +def get_dataset_data_type(dataset_name): + # Infer the data type from Nebula's built-in dataset names. + if dataset_name is None: + return "" + + normalized_name = str(dataset_name).strip().lower().replace("_", "").replace("-", "") + return DATASET_DATA_TYPES.get(normalized_name, "") + + +def get_model_data_type(model, dataset_name=None): + # Return the model-declared data type, falling back to the dataset name. + if not hasattr(model, "get_data_type"): + return get_dataset_data_type(dataset_name) + + try: + data_type = model.get_data_type() + except AttributeError: + return get_dataset_data_type(dataset_name) + + if data_type is None: + return get_dataset_data_type(dataset_name) + + data_type = str(data_type).strip() + return data_type or get_dataset_data_type(dataset_name) + + +def get_normalized_model_data_type(model, dataset_name=None): + # Normalize the model data type before matching templates or profiles. + return get_model_data_type(model, dataset_name=dataset_name).lower() + + +def get_factsheet_template_name(federation, model, default_template_name, dataset_name=None): + # Select a data-type-specific template when one exists for the federation. 
+ federation_prefix = "dfl" if str(federation).upper() in {"DFL", "SDFL"} else "cfl" + data_type = get_normalized_model_data_type(model, dataset_name=dataset_name) + + if data_type not in {DATA_TYPE_IMAGES, DATA_TYPE_TABULAR}: + return default_template_name + + template_name = f"factsheet_template_{federation_prefix}_{data_type}.json" + template_path = get_factsheet_template_path(template_name) + + return template_name if os.path.exists(template_path) else default_template_name + + +def get_trustworthiness_dir(scenario_name): + # Return the trustworthiness output directory for a scenario. + return os.path.join(os.environ.get("NEBULA_LOGS_DIR"), scenario_name, "trustworthiness") + + +def get_factsheet_path(scenario_name, factsheet_name): + # Return the path to a factsheet inside the scenario trustworthiness directory. + return os.path.join(get_trustworthiness_dir(scenario_name), factsheet_name) + + +def get_factsheet_template_path(template_name): + # Return the path to a factsheet template bundled with the addon. + return os.path.join(dirname, "configs", template_name) + + +def load_or_create_factsheet(scenario_name, factsheet_name, template_name): + # Load a factsheet, creating it from the selected template if needed. + trustworthiness_dir = get_trustworthiness_dir(scenario_name) + os.makedirs(trustworthiness_dir, exist_ok=True) + + factsheet_path = os.path.join(trustworthiness_dir, factsheet_name) + template_path = get_factsheet_template_path(template_name) + + if not os.path.exists(factsheet_path): + shutil.copyfile(template_path, factsheet_path) + + with open(factsheet_path, encoding="utf-8") as factsheet_file: + return factsheet_path, json.load(factsheet_file) + + +def write_factsheet(factsheet_path, factsheet): + # Write a factsheet using readable standard JSON formatting. 
+ with open(factsheet_path, "w", encoding="utf-8") as factsheet_file: + json.dump(factsheet, factsheet_file, indent=4) + + +def cap_score(value, maximum=1): + # Cap a score to the maximum value expected by the factsheet. + return maximum if value > maximum else value + + +def inverse_score(value): + # Convert an error or risk value into a bounded inverse score. + return 1 / (1 + value) + + +def inverse_bounded_score(value): + # Invert an error already bounded in [0, 1] while keeping the full score range. + return min(max(1 - float(value), 0.0), 1.0) + + +def get_enabled_defences(data): + # Return the active training-time defences declared in the scenario. + defences = [] + if data.get("reputation", {}).get("enabled", False): + defences.append("reputation-based defence") + if data.get("feature_squeezing", {}).get("enabled", False): + defences.append("feature squeezing") + if data.get("adversarial_training", {}).get("enabled", False): + defences.append(_format_adversarial_training_defence(data["adversarial_training"])) + return defences + + +def _format_adversarial_training_defence(adversarial_training): + attack = str(adversarial_training.get("attack", "")).upper() + domain = str(adversarial_training.get("domain", "")).lower() + if attack: + return f"adversarial training with {attack}" + if domain: + return f"adversarial training for {domain} data" + return "adversarial training" + + +def build_project_background(data): + # Build the natural-language scenario description used in factsheets. 
+ federation = data["federation"] + n_nodes = int(data["n_nodes"]) + dataset = data["dataset"] + algorithm = data["model"] + aggregation_algorithm = data["agg_algorithm"] + n_rounds = int(data["rounds"]) + attack = data["attack_params"]["attacks"] + enabled_defences = get_enabled_defences(data) + + base = ( + "For the project setup, the most important aspects are the following: " + f"The federation architecture is {federation}, involving {n_nodes} clients, " + f"the dataset used is {dataset}, the learning algorithm is {algorithm}, " + f"the aggregation algorithm is {aggregation_algorithm} and the number of rounds is {n_rounds}. " + ) + + if attack != "No Attack": + attack_text = f"In addition, the type of attack used is {attack}. " + else: + attack_text = "No attacks are used. " + + if enabled_defences: + defence_list = ", ".join(enabled_defences) + defence_text = f"The active defence mechanisms are: {defence_list}. The trustworthiness of the project is desired." + else: + defence_text = "No defence mechanism is used, and the trustworthiness of the project is desired." + + return base + attack_text + defence_text + + +def populate_common_pre_train_sections(factsheet, data, model): + # Populate project, data, participant and training configuration fields. 
+ with_reputation = data["reputation"]["enabled"] + + factsheet["project"]["overview"] = data["scenario_title"] + factsheet["project"]["purpose"] = data["scenario_description"] + factsheet["project"]["background"] = build_project_background(data) + + factsheet["data"]["provenance"] = data["dataset"] + factsheet["data"]["type"] = get_model_data_type(model, dataset_name=data["dataset"]) + factsheet["data"]["preprocessing"] = data["topology"] + + factsheet["participants"]["client_num"] = data["n_nodes"] or "" + factsheet["participants"]["sample_client_rate"] = 1 + factsheet["participants"]["client_selector"] = ( + "Reputation Based" if with_reputation else "Full Participation" + ) + + factsheet["configuration"]["aggregation_algorithm"] = data["agg_algorithm"] or "" + factsheet["configuration"]["training_model"] = data["model"] or "" + factsheet["configuration"]["personalization"] = False + factsheet["configuration"]["reputation_enabled"] = bool( + data.get("reputation", {}).get("enabled", False) + ) + adversarial_training = data.get("adversarial_training", {}) or {} + factsheet["configuration"]["adversarial_training"] = bool( + adversarial_training.get("enabled", False) + ) + factsheet["configuration"]["adversarial_training_domain"] = ( + adversarial_training.get("domain", "") if adversarial_training.get("enabled", False) else "" + ) + factsheet["configuration"]["adversarial_training_attack"] = ( + adversarial_training.get("attack", "") if adversarial_training.get("enabled", False) else "" + ) + factsheet["configuration"]["adversarial_training_mode"] = ( + adversarial_training.get("mode", "") if adversarial_training.get("enabled", False) else "" + ) + factsheet["configuration"]["visualization"] = True + factsheet["configuration"]["monitoring"] = True + factsheet["configuration"]["total_round_num"] = int(data["rounds"]) + factsheet["configuration"]["learning_rate"] = model.get_learning_rate() + factsheet["configuration"]["trainable_param_num"] = 
model.count_parameters() + factsheet["configuration"]["local_update_steps"] = data["epochs"] + + +def set_dp_configuration(factsheet, dp_enabled, dp_epsilon): + # Write differential privacy configuration using the factsheet schema. + factsheet["configuration"]["differential_privacy"] = bool(dp_enabled) + factsheet["configuration"]["dp_epsilon"] = dp_epsilon if dp_enabled else "" + + +def populate_reliability(factsheet, reliability_summary): + # Write dropout and timeout rates, defaulting to a fully reliable run. + factsheet["system"]["dropout_rate"] = ( + reliability_summary.get("dropout_rate", 0.0) + if reliability_summary is not None + else 0.0 + ) + factsheet["system"]["timeout_rate"] = ( + reliability_summary.get("timeout_rate", 0.0) + if reliability_summary is not None + else 0.0 + ) + + +def populate_participation(factsheet, participation_summary): + # Write participant selection dispersion, defaulting to full participation. + factsheet["fairness"]["selection_cv"] = ( + participation_summary.get("selection_cv", 1) + if participation_summary is not None + else 1 + ) + + +def populate_reputation(factsheet, reputation_summary, include_neighbor_num=False): + # Write reputation information for centralized or decentralized factsheets. 
+ if reputation_summary is not None: + factsheet["participants"]["avg_neighbor_reputation"] = reputation_summary.get( + "avg_neighbor_reputation", + "", + ) + if include_neighbor_num: + factsheet["participants"]["neighbor_num"] = reputation_summary.get( + "neighbor_num", + 0, + ) + return + + factsheet["participants"]["avg_neighbor_reputation"] = 0 + if include_neighbor_num: + factsheet["participants"]["neighbor_num"] = 0 diff --git a/nebula/addons/trustworthiness/factsheet_populators.py b/nebula/addons/trustworthiness/factsheet_populators.py new file mode 100644 index 000000000..bee3b1e22 --- /dev/null +++ b/nebula/addons/trustworthiness/factsheet_populators.py @@ -0,0 +1,213 @@ +"""Profile-specific factsheet metric population.""" + +import logging + +from nebula.addons.trustworthiness.helpers.explainability import ( + get_explainability_metrics_summary, +) +from nebula.addons.trustworthiness.helpers.model_quality import ( + get_coefficient_of_variation, + get_generalized_entropy_index, + get_theil_index, + get_well_calibration_error, +) +from nebula.addons.trustworthiness.helpers.privacy import ( + get_epsilon_star, + get_mia_auc, +) +from nebula.addons.trustworthiness.helpers.robustness import ( + attack_success_rate, + get_adversarial_accuracy, + get_clever_score, + get_confidence_score, + get_empirical_robustness_score, + get_loss_sensitivity_score, +) + +logger = logging.getLogger(__name__) +from nebula.addons.trustworthiness.factsheet_common import ( + DATA_TYPE_IMAGES, + DATA_TYPE_TABULAR, + cap_score, + get_normalized_model_data_type, + inverse_bounded_score, + inverse_score, +) + +FEDERATION_CFL = "cfl" +FEDERATION_DFL = "dfl" + + +def get_federation_profile(federation): + # Group SDFL with DFL because both use decentralized factsheet profiles. 
+ return FEDERATION_DFL if str(federation).upper() in {"DFL", "SDFL"} else FEDERATION_CFL + + +def populate_profile_metrics( + factsheet, + federation, + model, + train_loader, + test_loader, + test_accuracy, +): + # Select the profile-specific populator, falling back to the shared metric set. + federation_profile = get_federation_profile(federation) + data_type = str(factsheet.get("data", {}).get("type", "")).strip().lower() + if not data_type: + data_type = get_normalized_model_data_type(model) + populator = PROFILE_POPULATORS.get((federation_profile, data_type), populate_common_profile_metrics) + + populator( + factsheet=factsheet, + model=model, + train_loader=train_loader, + test_loader=test_loader, + test_accuracy=test_accuracy, + ) + + +def populate_cfl_images_metrics(factsheet, model, train_loader, test_loader, test_accuracy): + # Image factsheets include all image-compatible robustness metrics. + populate_common_profile_metrics(factsheet, model, train_loader, test_loader, test_accuracy) + populate_image_robustness_metrics(factsheet, model, test_loader) + + +def populate_cfl_tabular_metrics(factsheet, model, train_loader, test_loader, test_accuracy): + # Tabular factsheets use only metrics shared by valid tabular and image workflows. + populate_common_profile_metrics(factsheet, model, train_loader, test_loader, test_accuracy) + remove_image_only_robustness_metrics(factsheet) + + +def populate_dfl_images_metrics(factsheet, model, train_loader, test_loader, test_accuracy): + # Image factsheets include all image-compatible robustness metrics. + populate_common_profile_metrics(factsheet, model, train_loader, test_loader, test_accuracy) + populate_image_robustness_metrics(factsheet, model, test_loader) + + +def populate_dfl_tabular_metrics(factsheet, model, train_loader, test_loader, test_accuracy): + # Tabular factsheets use only metrics shared by valid tabular and image workflows. 
+ populate_common_profile_metrics(factsheet, model, train_loader, test_loader, test_accuracy) + remove_image_only_robustness_metrics(factsheet) + + +def populate_common_profile_metrics(factsheet, model, train_loader, test_loader, test_accuracy): + # Current shared metric set used by every factsheet profile. + # Reuse one test batch for sample-based metrics and compute summary explainability once. + test_sample = next(iter(test_loader)) + explainability_metrics = get_explainability_metrics_summary(model, test_loader) + + populate_common_model_quality_metrics( + factsheet, + model, + train_loader, + test_loader, + test_accuracy, + test_sample, + ) + populate_common_explainability_metrics(factsheet, explainability_metrics) + populate_common_robustness_metrics(factsheet, model, test_loader) + + +def populate_common_model_quality_metrics( + factsheet, + model, + train_loader, + test_loader, + test_accuracy, + test_sample, +): + # Populate model quality, privacy, and fairness metrics shared by all profiles. + + # Privacy metrics derived from train/test behavior. + factsheet["privacy"]["epsilon_star"] = get_epsilon_star(model, train_loader, test_loader) + factsheet["privacy"]["inverse_epsilon_star"] = inverse_score(factsheet["privacy"]["epsilon_star"]) + factsheet["privacy"]["mia_auc"] = get_mia_auc(model, train_loader, test_loader) + factsheet["privacy"]["mia_auc_score"] = 1 - 2 * abs(factsheet["privacy"]["mia_auc"] - 0.5) + + # Fairness and calibration metrics expressed as inverse scores. 
+ overfitting_value = max(0.0, float(factsheet["performance"]["train_accuracy"]) - float(test_accuracy)) + factsheet["fairness"]["inverse_overfitting"] = inverse_bounded_score(overfitting_value) + + well_calibration_error_value = get_well_calibration_error(model, test_loader) + factsheet["fairness"]["inverse_well_calibration_error"] = inverse_bounded_score(well_calibration_error_value) + + generalized_entropy_index_value = get_generalized_entropy_index(model, test_loader) + factsheet["fairness"]["inverse_generalized_entropy_index"] = inverse_score(generalized_entropy_index_value) + + theil_index_value = get_theil_index(model, test_loader) + factsheet["fairness"]["inverse_theil_index"] = inverse_score(theil_index_value) + + coefficient_of_variation_value = get_coefficient_of_variation(model, test_loader) + factsheet["fairness"]["inverse_coefficient_of_variation"] = inverse_score(coefficient_of_variation_value) + + # Confidence is already a probability-like score in [0, 1]. + value_confidence_score = get_confidence_score(model, test_sample) + factsheet["performance"]["test_confidence_score"] = value_confidence_score + + +def populate_common_explainability_metrics(factsheet, explainability_metrics): + # Copy explainability summary metrics into the factsheet schema. + factsheet["explainability"]["alpha_score"] = explainability_metrics["alpha_score"] + factsheet["explainability"]["spread_ratio"] = explainability_metrics["spread_ratio"] + factsheet["explainability"]["spread_divergence"] = explainability_metrics["spread_divergence"] + + feature_importance = explainability_metrics["feature_importance_cv"] + factsheet["performance"]["clipped_test_feature_importance_cv"] = cap_score(feature_importance) + + +def populate_common_robustness_metrics(factsheet, model, test_loader): + # Populate robustness metrics valid for both image and tabular datasets. 
+ lr = factsheet["configuration"]["learning_rate"] + num_classes = model.get_num_classes() + + # Loader-based adversarial accuracy. + value_adv_accuracy = get_adversarial_accuracy(model, test_loader, num_classes, lr) + factsheet["performance"]["test_adv_accuracy"] = value_adv_accuracy + + # Attack success is inverted so higher remains better in the factsheet. + value_attack_success_rate = attack_success_rate( + model, + test_loader, + ) + factsheet["performance"]["inverse_test_attack_success_rate"] = 1 - value_attack_success_rate + + +def populate_image_robustness_metrics(factsheet, model, test_loader): + # Populate image-only continuous-input robustness metrics. + lr = factsheet["configuration"]["learning_rate"] + num_classes = model.get_num_classes() + test_sample = next(iter(test_loader)) + + value_clever = get_clever_score(model, test_sample, num_classes, lr) + factsheet["performance"]["test_clever_score"] = value_clever + + value_loss_sensitivity = get_loss_sensitivity_score(model, test_sample, num_classes, lr) + factsheet["performance"]["inverse_test_loss_sensitivity"] = inverse_score(value_loss_sensitivity) + + value_empirical_robustness = get_empirical_robustness_score( + model, + test_sample, + num_classes, + lr, + ) + factsheet["performance"]["test_empirical_robustness_score"] = value_empirical_robustness + + +def remove_image_only_robustness_metrics(factsheet): + # Drop stale values when an existing factsheet was created before tabular metrics were split. 
+ performance = factsheet.get("performance", {}) + for field in ( + "test_clever_score", + "inverse_test_loss_sensitivity", + "test_empirical_robustness_score", + ): + performance.pop(field, None) + + +PROFILE_POPULATORS = { + (FEDERATION_CFL, DATA_TYPE_IMAGES): populate_cfl_images_metrics, + (FEDERATION_CFL, DATA_TYPE_TABULAR): populate_cfl_tabular_metrics, + (FEDERATION_DFL, DATA_TYPE_IMAGES): populate_dfl_images_metrics, + (FEDERATION_DFL, DATA_TYPE_TABULAR): populate_dfl_tabular_metrics, +} diff --git a/nebula/addons/trustworthiness/graphics.py b/nebula/addons/trustworthiness/graphics.py index 9233db756..13743680e 100644 --- a/nebula/addons/trustworthiness/graphics.py +++ b/nebula/addons/trustworthiness/graphics.py @@ -1,159 +1,220 @@ -from abc import ABC +import json import logging -import torch import os -import pickle -import lightning as pl -from torchmetrics.classification import MulticlassAccuracy, MulticlassRecall, MulticlassPrecision, MulticlassF1Score, MulticlassConfusionMatrix -from torchmetrics import MetricCollection -import seaborn as sns + import matplotlib.pyplot as plt -import json import pandas as pd +import seaborn as sns from nebula.core.utils.nebulalogger_tensorboard import NebulaTensorBoardLogger + logging.basicConfig(level=logging.INFO) -class Graphics(): +PILLAR_CONFIGS = [ + ("robustness", "#F8D3DF", -0.4, (10, 6), "Robustness"), + ("privacy", "#DA8D8B", -0.2, (10, 6), "Privacy"), + ("fairness", "#DDDDDD", -0.4, (10, 6), "Fairness"), + ("explainability", "#FCEFC3", -0.4, (10, 6), "Explainability"), + ("accountability", "#8FAADC", -0.3, (10, 6), "Accountability"), + ("architectural_soundness", "#DBB9FA", -0.3, (10, 6), "Architectural Soundness"), + ("sustainability", "#BBFDAF", -0.5, (12, 8), "Sustainability"), +] +TRUST_SCORE_COLOR = "#BF9000" + + +class Graphics: def __init__( self, scenario_start_time, - scenario_name + scenario_name, + participant_id=None, ): + # Configure the TensorBoard logger used to store trustworthiness figures. 
self.scenario_start_time = scenario_start_time self.scenario_name = scenario_name log_dir = os.path.join(os.environ["NEBULA_LOGS_DIR"], scenario_name) - self.nebulalogger = NebulaTensorBoardLogger(scenario_start_time, f"{log_dir}", name="metrics", version=f"trust", log_graph=True) - - def __log_figure(self, df, pillar, color, notion_y_pos = -0.4, figsize=(10,6)): - filtered_df = df[df['Pillar'] == pillar].copy() + version = "trust" if participant_id is None else f"trust_{participant_id}" + self.nebulalogger = NebulaTensorBoardLogger( + scenario_start_time, + f"{log_dir}", + name="metrics", + version=version, + log_graph=True, + ) - filtered_df.loc[:, 'Metric'] = filtered_df['Metric'].astype(str).str.replace('_', ' ') - filtered_df.loc[:, 'Metric'] = filtered_df['Metric'].apply(lambda x: str(x).title()) + def _trustworthiness_dir(self): + # Return the directory where trustworthiness JSON reports are stored. + return os.path.join(os.environ.get("NEBULA_LOGS_DIR"), self.scenario_name, "trustworthiness") - filtered_df.loc[:, 'Notion'] = filtered_df['Notion'].astype(str).str.replace('_', ' ') - filtered_df.loc[:, 'Notion'] = filtered_df['Notion'].apply(lambda x: str(x).title()) + def _trust_report_path(self, file_name): + # Build the absolute path for one trustworthiness report file. + return os.path.join(self._trustworthiness_dir(), file_name) - unique_notion_count = filtered_df['Notion'].nunique() - palette = [color] * unique_notion_count + def _load_trust_results(self, results_file): + # Load one trustworthiness JSON report from disk. + with open(results_file, "r") as f: + return json.load(f) - plt.figure(figsize=figsize) - ax = sns.barplot(data=filtered_df, x='Metric', y='Metric Score', hue='Notion', palette=palette, dodge=False) + def _log_report_from_file(self, results_file, tag_root, all_pillars_tag, label_suffix=""): + # Load a report and log all figures generated from it. 
+ results = self._load_trust_results(results_file) + self._log_trust_report(results, tag_root, all_pillars_tag, label_suffix=label_suffix) + + def _format_report_dataframe(self, df, pillar): + # Keep one pillar and format metric/notion names for plot labels. + filtered_df = df[df["Pillar"] == pillar].copy() + + filtered_df.loc[:, "Metric"] = filtered_df["Metric"].astype(str).str.replace("_", " ") + filtered_df.loc[:, "Metric"] = filtered_df["Metric"].apply(lambda x: str(x).title()) + filtered_df.loc[:, "Notion"] = filtered_df["Notion"].astype(str).str.replace("_", " ") + filtered_df.loc[:, "Notion"] = filtered_df["Notion"].apply(lambda x: str(x).title()) + return filtered_df + + def _notion_ranges(self, filtered_df): + # Compute the x-axis range occupied by each notion in a pillar plot. + ranges = [] x_positions = range(len(filtered_df)) + seen_notions = set() + + for i, notion in enumerate(filtered_df["Notion"]): + if notion in seen_notions: + continue + + metrics_for_notion = filtered_df[filtered_df["Notion"] == notion]["Metric"] + start_pos = x_positions[i] + end_pos = x_positions[i + len(metrics_for_notion) - 1] + notion_x_pos = (start_pos + end_pos) / 2 + + ranges.append((notion, start_pos, end_pos, notion_x_pos)) + seen_notions.add(notion) + + return ranges + + def _draw_notion_score_lines(self, ax, filtered_df): + # Draw dashed horizontal notion score lines over the metrics they group. 
+ x_count = len(filtered_df) + if x_count == 0: + return + + for notion, start_pos, end_pos, notion_x_pos in self._notion_ranges(filtered_df): + notion_score = filtered_df[filtered_df["Notion"] == notion]["Notion Score"].iloc[0] + ax.axhline( + notion_score, + ls="--", + color="black", + lw=0.5, + xmin=start_pos / x_count, + xmax=(end_pos + 1) / x_count, + ) + ax.text( + notion_x_pos, + notion_score + 0.01, + f"{notion_score:.2f}", + ha="center", + va="bottom", + fontsize=10, + color="black", + ) + + def _draw_notion_labels(self, ax, filtered_df, notion_y_pos): + # Add notion labels below the metric labels. + for notion, _, _, notion_x_pos in self._notion_ranges(filtered_df): + ax.text( + notion_x_pos, + notion_y_pos, + notion, + ha="center", + va="center", + fontsize=10, + color="black", + ) + + def _draw_metric_score_labels(self, ax, filtered_df): + # Add numeric metric scores above each bar. + for i, value in enumerate(filtered_df["Metric Score"]): + ax.text(i, value + 0.01, f"{value:.2f}", ha="center", va="bottom", fontsize=10, color="black") - notion_scores = {} - - for i in range(len(filtered_df)): - row = filtered_df.iloc[i] - notion = row['Notion'] - notion_score = row['Notion Score'] - metric_score = row['Metric Score'] - - if notion not in notion_scores: - metrics_for_notion = filtered_df[filtered_df['Notion'] == notion]['Metric'] - start_pos = x_positions[i] - end_pos = x_positions[i + len(metrics_for_notion) - 1] - - notion_x_pos = (start_pos + end_pos) / 2 - ax.axhline(notion_score, ls='--', color='black', lw=0.5, xmin=start_pos/len(x_positions), xmax=(end_pos+1)/len(x_positions)) - ax.text(notion_x_pos, notion_score + 0.01, f"{notion_score:.2f}", ha='center', va='bottom', fontsize=10, color='black') # Color negro - notion_scores[notion] = notion_score + def _log_pillar_figure(self, df, pillar, color, tag_root, notion_y_pos=-0.4, figsize=(10, 6)): + # Generate and log the metric/notion bar chart for one pillar. 
+ filtered_df = self._format_report_dataframe(df, pillar) + unique_notion_count = filtered_df["Notion"].nunique() + palette = [color] * unique_notion_count + plt.figure(figsize=figsize) + ax = sns.barplot(data=filtered_df, x="Metric", y="Metric Score", hue="Notion", palette=palette, dodge=False) + + x_positions = range(len(filtered_df)) ax.set_xticks(x_positions) - ax.set_xticklabels(filtered_df['Metric'], rotation=45, ha='right', fontsize=10) + ax.set_xticklabels(filtered_df["Metric"], rotation=45, ha="right", fontsize=10) - seen_notions = set() - for i, (metric, notion) in enumerate(zip(filtered_df['Metric'], filtered_df['Notion'])): - if notion not in seen_notions: - metrics_for_notion = filtered_df[filtered_df['Notion'] == notion]['Metric'] - start_pos = x_positions[i] - end_pos = x_positions[i + len(metrics_for_notion) - 1] - - notion_x_pos = (start_pos + end_pos) / 2 - - ax.text(notion_x_pos, notion_y_pos, notion, ha='center', va='center', fontsize=10, color='black') - - seen_notions.add(notion) - - for i, v in enumerate(filtered_df['Metric Score']): - ax.text(i, v + 0.01, f"{v:.2f}", ha='center', va='bottom', fontsize=10, color='black') - - plt.xlabel('Metrics and notions', labelpad=35) - plt.ylabel('Score') - plt.title(f'Metrics and notion scores for the {pillar} pillar') - - ax.legend_.remove() + self._draw_notion_score_lines(ax, filtered_df) + self._draw_notion_labels(ax, filtered_df, notion_y_pos) + self._draw_metric_score_labels(ax, filtered_df) + + plt.xlabel("Metrics and notions", labelpad=35) + plt.ylabel("Score") + plt.title(f"Metrics and notion scores for the {pillar} pillar") + + if ax.legend_ is not None: + ax.legend_.remove() plt.tight_layout() - - self.nebulalogger.log_figure(ax.get_figure(), 0, f"Trust/Pillar/{pillar}") + + self.nebulalogger.log_figure(ax.get_figure(), 0, f"{tag_root}/Pillar/{pillar}") plt.close() - def graphics(self): - results_file = os.path.join(os.environ.get("NEBULA_LOGS_DIR"), self.scenario_name, "trustworthiness", 
"nebula_trust_results.json") - with open(results_file, 'r') as f: - results = json.load(f) + def _trust_report_rows(self, results): + # Flatten the nested trust report into rows that pandas can plot. + rows = [] + for pillar in results["pillars"]: + for pillar_name, pillar_value in pillar.items(): + if "notions" not in pillar_value: + continue - pillars_list = [] - notion_names = [] - notion_scores = [] - metric_names = [] - metric_scores = [] + for notion in pillar_value["notions"]: + for notion_name, notion_value in notion.items(): + for metric in notion_value["metrics"]: + for metric_name, metric_value in metric.items(): + rows.append( + { + "Pillar": pillar_name, + "Notion": notion_name, + "Notion Score": notion_value["score"], + "Metric": metric_name, + "Metric Score": metric_value["score"], + } + ) + return rows - for pillar in results["pillars"]: - for key, value in pillar.items(): - pillar_name = key - if "notions" in value: - for notion in value["notions"]: - for notion_key, notion_value in notion.items(): - notion_name = notion_key - notion_score = notion_value["score"] - for metric in notion_value["metrics"]: - for metric_key, metric_value in metric.items(): - metric_name = metric_key - metric_score = metric_value["score"] - - pillars_list.append(pillar_name) - notion_names.append(notion_name) - notion_scores.append(notion_score) - metric_names.append(metric_name) - metric_scores.append(metric_score) - - df = pd.DataFrame({ - "Pillar": pillars_list, - "Notion": notion_names, - "Notion Score": notion_scores, - "Metric": metric_names, - "Metric Score": metric_scores - }) - - self.__log_figure(df, 'robustness', "#F8D3DF") - self.__log_figure(df, "privacy", "#DA8D8B", -0.2) - self.__log_figure(df, "fairness", "#DDDDDD") - self.__log_figure(df, "explainability", "#FCEFC3") - self.__log_figure(df, "accountability", "#8FAADC", -0.3) - self.__log_figure(df, "architectural_soundness", "#DBB9FA", -0.3) - self.__log_figure(df, "sustainability", "#BBFDAF", -0.5, 
figsize=(12,8)) - - categories = [ - "robustness", - "privacy", - "fairness", - "explainability", - "accountability", - "architectural_soundness", - "sustainability" - ] + def _build_trust_report_dataframe(self, results): + # Convert flattened report rows into a DataFrame for pillar plots. + return pd.DataFrame( + self._trust_report_rows(results), + columns=["Pillar", "Notion", "Notion Score", "Metric", "Metric Score"], + ) + def _pillar_scores(self, results): + # Read pillar scores in the same order used by the all-pillars chart. + categories = [config[0] for config in PILLAR_CONFIGS] scores = [results["pillars"][i][category]["score"] for i, category in enumerate(categories)] + return categories, scores + + def _pillar_labels(self, label_suffix): + # Build human-readable labels for the all-pillars chart. + labels = [config[4] for config in PILLAR_CONFIGS] + labels.append("Trust Score") + return [f"{label}{label_suffix}" for label in labels] - trust_score = results["trust_score"] + def _log_all_pillars_figure(self, results, all_pillars_tag, label_suffix=""): + # Generate and log the summary chart with every pillar and the final trust score. 
+ categories, scores = self._pillar_scores(results) categories.append("trust_score") - scores.append(trust_score) + scores.append(results["trust_score"]) - palette = ["#F8D3DF", "#DA8D8B", "#DDDDDD", "#FCEFC3", "#8FAADC", "#DBB9FA", "#BBFDAF", "#BF9000"] + palette = [config[1] for config in PILLAR_CONFIGS] + palette.append(TRUST_SCORE_COLOR) plt.figure(figsize=(10, 8)) ax = sns.barplot(x=categories, y=scores, palette=palette, hue=categories, legend=False) @@ -161,22 +222,55 @@ def graphics(self): ax.set_ylabel("Score") ax.set_title("Pillars and trust scores") - for i, v in enumerate(scores): - ax.text(i, v + 0.01, f"{v:.2f}", ha='center', va='bottom', fontsize=10) - - name_labels = [ - "Robustness", - "Privacy", - "Fairness", - "Explainability", - "Accountability", - "Architectural Soundness", - "Sustainability", - "Trust Score" - ] + for i, value in enumerate(scores): + ax.text(i, value + 0.01, f"{value:.2f}", ha="center", va="bottom", fontsize=10) ax.set_xticks(range(len(categories))) - ax.set_xticklabels(name_labels, rotation=45) + ax.set_xticklabels(self._pillar_labels(label_suffix), rotation=45) + + self.nebulalogger.log_figure(ax.get_figure(), 0, all_pillars_tag) + plt.close() + + def _log_trust_report(self, results, tag_root, all_pillars_tag, label_suffix=""): + # Log each pillar chart plus the all-pillars summary for a trust report. + df = self._build_trust_report_dataframe(results) + + for pillar, color, notion_y_pos, figsize, _ in PILLAR_CONFIGS: + self._log_pillar_figure(df, pillar, color, tag_root, notion_y_pos, figsize=figsize) + + self._log_all_pillars_figure(results, all_pillars_tag, label_suffix=label_suffix) + + def graphics(self): + # Log centralized/global trustworthiness graphics. + results_file = self._trust_report_path("nebula_trust_results.json") + self._log_report_from_file(results_file, "Trust", "Trust/AllPillars") + + def graphics_dfl(self, participant_id): + # Log local DFL graphics for one participant. 
+ results_file = self._trust_report_path(f"nebula_trust_results_{participant_id}.json") + self._log_report_from_file( + results_file, + "Trust", + f"Trust/AllPillars_{participant_id}", + label_suffix=f"_{participant_id}", + ) + + def graphics_dfl_global(self, participant_id): + # Log aggregated DFL global graphics for one participant. + results_file = self._trust_report_path(f"nebula_trust_results_{participant_id}_global.json") + self._log_report_from_file( + results_file, + "TrustGlobal", + f"TrustGlobal/AllPillars_{participant_id}", + label_suffix=f"_{participant_id}", + ) - self.nebulalogger.log_figure(ax.get_figure(), 0, f"Trust/AllPillars") - plt.close() \ No newline at end of file + def graphics_sdfl_global(self, participant_id): + # Log SDFL global graphics from the shared global report. + results_file = self._trust_report_path("nebula_trust_results.json") + self._log_report_from_file( + results_file, + "TrustGlobal", + f"TrustGlobal/AllPillars_{participant_id}", + label_suffix=f"_{participant_id}", + ) diff --git a/nebula/addons/trustworthiness/helpers/__init__.py b/nebula/addons/trustworthiness/helpers/__init__.py new file mode 100644 index 000000000..4ef3ae023 --- /dev/null +++ b/nebula/addons/trustworthiness/helpers/__init__.py @@ -0,0 +1 @@ +"""Small helper modules for trustworthiness calculations and persistence.""" diff --git a/nebula/addons/trustworthiness/helpers/csv_io.py b/nebula/addons/trustworthiness/helpers/csv_io.py new file mode 100644 index 000000000..c924fc887 --- /dev/null +++ b/nebula/addons/trustworthiness/helpers/csv_io.py @@ -0,0 +1,334 @@ +import csv +import json +import logging +import os + +import pandas as pd + +logger = logging.getLogger(__name__) + +# CSV schemas used by trustworthiness outputs. Keeping column order centralized +# avoids subtle differences between append writes and full report exports. 
+DATA_RESULTS_COLUMNS = [ + "id", + "bytes_sent", + "bytes_recv", + "accuracy", + "loss", + "val_accuracy", + "macro_f1", + "train_accuracy", + "dp_enabled", + "dp_epsilon", +] + +CFL_DATA_RESULTS_COLUMNS = [ + "id", + "bytes_sent", + "bytes_recv", + "accuracy", + "loss", + "class_imbalance", + "model_size", + "local_entropy", + "val_accuracy", + "macro_f1", + "train_accuracy", + "dp_enabled", + "dp_epsilon", +] + +EMISSIONS_COLUMNS = [ + "id", + "role", + "energy_grid", + "emissions", + "workload", + "CPU_model", + "GPU_model", + "CPU_used", + "GPU_used", + "energy_consumed", + "sample_size", +] + + +def _logs_dir(): + # Prefer the runtime logs directory; keep the historical app path as fallback. + return os.environ.get("NEBULA_LOGS_DIR") or os.path.join("nebula", "app", "logs") + + +def _trustworthiness_dir(scenario_name: str) -> str: + # Every scenario stores trustworthiness artifacts in this subdirectory. + return os.path.join(_logs_dir(), scenario_name, "trustworthiness") + + +def _trustworthiness_path(scenario_name: str, filename: str) -> str: + # Build a concrete artifact path for a scenario. + return os.path.join(_trustworthiness_dir(scenario_name), filename) + + +def _ensure_parent_dir(file_path: str) -> None: + # Ensure CSV/JSON writes work even when the trust folder was not created yet. + directory = os.path.dirname(file_path) + if directory: + os.makedirs(directory, exist_ok=True) + + +def _read_first_csv_row(file_path: str) -> dict: + # Per-participant summary CSVs are expected to contain one current row. + if not os.path.exists(file_path): + raise FileNotFoundError(f"File not found: {file_path}") + + with open(file_path, "r", newline="") as csv_file: + rows = list(csv.DictReader(csv_file)) + + if not rows: + raise ValueError(f"No rows found in {file_path}") + + return rows[0] + + +def _read_or_empty_dataframe(file_path: str, columns: list[str]) -> pd.DataFrame: + # Append flows start from the existing CSV or from an empty schema. 
+ if os.path.exists(file_path): + return pd.read_csv(file_path) + + return pd.DataFrame(columns=columns) + + +def _append_csv_row(file_path: str, columns: list[str], row: dict) -> None: + # Preserve the declared schema and ignore any unexpected keys in row. + _ensure_parent_dir(file_path) + df = _read_or_empty_dataframe(file_path, columns) + new_row = pd.DataFrame([{column: row.get(column) for column in columns}]) + pd.concat([df, new_row], ignore_index=True).to_csv(file_path, encoding="utf-8", index=False) + + +def _write_csv_rows(file_path: str, fieldnames: list[str], rows: list[dict]) -> None: + # Aggregate reports replace the previous CSV content in one write. + _ensure_parent_dir(file_path) + with open(file_path, "w", newline="") as csv_file: + writer = csv.DictWriter(csv_file, fieldnames=fieldnames) + writer.writeheader() + writer.writerows(rows) + + +def _to_bool(value) -> bool: + # DictReader returns strings, while some tests/builders may pass booleans. + return str(value).strip().lower() == "true" + + +def read_csv(filename): + # Missing optional CSVs are represented as None for existing callers. + if os.path.exists(filename): + return pd.read_csv(filename) + + return None + + +def write_results_json(out_file, data): + # Trust metric evaluation appends one result object per evaluation call. + _ensure_parent_dir(out_file) + with open(out_file, "a", encoding="utf-8") as file: + json.dump(data, file, indent=4) + + +def load_data_results_participant(experiment_name: str, participant_id: int | str): + # Load the DFL/SDFL participant training summary written by save_results_csv. 
+ row = _read_first_csv_row( + _trustworthiness_path(experiment_name, f"data_results_{participant_id}.csv") + ) + macro_f1 = row["macro_f1"] or 0.0 + train_accuracy = row["train_accuracy"] or 0.0 + + return ( + int(float(row["bytes_sent"])), + int(float(row["bytes_recv"])), + float(row["accuracy"]), + float(row["loss"]), + float(row["val_accuracy"]), + float(macro_f1), + float(train_accuracy), + _to_bool(row["dp_enabled"]), + float(row["dp_epsilon"]), + ) + + +def load_emissions_participant(experiment_name: str, participant_id: int | str): + # Load the DFL/SDFL participant CodeCarbon summary. + row = _read_first_csv_row( + _trustworthiness_path(experiment_name, f"emissions_{participant_id}.csv") + ) + + return ( + str(row["role"]), + float(row["energy_grid"]), + float(row["emissions"]), + str(row["workload"]), + str(row["CPU_model"]), + str(row["GPU_model"]), + _to_bool(row["CPU_used"]), + _to_bool(row["GPU_used"]), + float(row["energy_consumed"]), + int(float(row["sample_size"])), + ) + + +def save_trustworthiness_reports_csv( + reports: dict, + experiment_name: str, +) -> None: + # Server-side CFL flow exports one aggregate data CSV and one emissions CSV. 
+ sorted_reports = sorted(reports.values(), key=lambda report: int(report["node_id"])) + + data_rows = [ + { + "id": report["node_id"], + "bytes_sent": report["bytes_sent"], + "bytes_recv": report["bytes_recv"], + "accuracy": report["accuracy"], + "loss": report["loss"], + "class_imbalance": report["class_imbalance"], + "model_size": report["model_size"], + "local_entropy": report["local_entropy"], + "val_accuracy": report["val_accuracy"], + "macro_f1": report["macro_f1"], + "train_accuracy": report["train_accuracy"], + "dp_enabled": report["dp_enabled"], + "dp_epsilon": report["dp_epsilon"], + } + for report in sorted_reports + ] + emissions_rows = [ + { + "id": report["node_id"], + "role": report["role"], + "energy_grid": report["energy_grid"], + "emissions": report["emissions"], + "workload": report["workload"], + "CPU_model": report["cpu_model"], + "GPU_model": report["gpu_model"], + "CPU_used": report["cpu_used"], + "GPU_used": report["gpu_used"], + "energy_consumed": report["energy_consumed"], + "sample_size": report["sample_size"], + } + for report in sorted_reports + ] + + data_results_path = _trustworthiness_path(experiment_name, "data_results.csv") + emissions_path = _trustworthiness_path(experiment_name, "emissions.csv") + + _write_csv_rows(data_results_path, CFL_DATA_RESULTS_COLUMNS, data_rows) + _write_csv_rows(emissions_path, EMISSIONS_COLUMNS, emissions_rows) + + logger.info( + "[TW SERVER] CSV files written correctly: %s, %s", + data_results_path, + emissions_path, + ) + + +def save_results_csv_cfl( + scenario_name: str, + id: int, + bytes_sent: int, + bytes_recv: int, + accuracy: float, + loss: float, + class_imbalance: float, + model_size: int, + local_entropy: float, + val_accuracy: float, + macro_f1: float, + train_accuracy: float, + dp_enabled: bool, + dp_epsilon: float, +): + # Append one participant to the centralized data-results CSV. 
+ _append_csv_row( + _trustworthiness_path(scenario_name, "data_results.csv"), + CFL_DATA_RESULTS_COLUMNS, + { + "id": id, + "bytes_sent": bytes_sent, + "bytes_recv": bytes_recv, + "accuracy": accuracy, + "loss": loss, + "class_imbalance": class_imbalance, + "model_size": model_size, + "local_entropy": local_entropy, + "val_accuracy": val_accuracy, + "macro_f1": macro_f1, + "train_accuracy": train_accuracy, + "dp_enabled": dp_enabled, + "dp_epsilon": dp_epsilon, + }, + ) + + +def save_emissions_csv_cfl( + scenario_name: str, + id: int, + role: str, + energy_grid: float, + emissions: float, + workload: str, + cpu_model: str, + gpu_model: str, + cpu_used: bool, + gpu_used: bool, + energy_consumed: float, + sample_size: int, +): + # Append one participant to the centralized emissions CSV. + _append_csv_row( + _trustworthiness_path(scenario_name, "emissions.csv"), + EMISSIONS_COLUMNS, + { + "id": id, + "role": role, + "energy_grid": energy_grid, + "emissions": emissions, + "workload": workload, + "CPU_model": cpu_model, + "GPU_model": gpu_model, + "CPU_used": cpu_used, + "GPU_used": gpu_used, + "energy_consumed": energy_consumed, + "sample_size": sample_size, + }, + ) + + +def save_results_csv( + scenario_name: str, + id: int, + bytes_sent: int, + bytes_recv: int, + accuracy: float, + loss: float, + val_accuracy: float, + macro_f1: float, + train_accuracy: float, + dp_enabled: bool, + dp_epsilon: float, +): + # Local DFL/SDFL nodes persist their own data-results CSV before exchange. 
+ _append_csv_row( + _trustworthiness_path(scenario_name, f"data_results_{id}.csv"), + DATA_RESULTS_COLUMNS, + { + "id": id, + "bytes_sent": bytes_sent, + "bytes_recv": bytes_recv, + "accuracy": accuracy, + "loss": loss, + "val_accuracy": val_accuracy, + "macro_f1": macro_f1, + "train_accuracy": train_accuracy, + "dp_enabled": dp_enabled, + "dp_epsilon": dp_epsilon, + }, + ) diff --git a/nebula/addons/trustworthiness/helpers/data_distribution.py b/nebula/addons/trustworthiness/helpers/data_distribution.py new file mode 100644 index 000000000..6a118019b --- /dev/null +++ b/nebula/addons/trustworthiness/helpers/data_distribution.py @@ -0,0 +1,178 @@ +import json +import os +from collections import Counter + +import numpy as np +from hashids import Hashids +from scipy.stats import entropy + +hashids = Hashids() + + +def _logs_dir(): + # Return the base logs directory used to read and write trust artifacts. + return os.environ.get("NEBULA_LOGS_DIR") or os.path.join("nebula", "app", "logs") + + +def _trustworthiness_dir(scenario_name: str) -> str: + # Return the trustworthiness directory for a scenario. + return os.path.join(_logs_dir(), scenario_name, "trustworthiness") + + +def _trustworthiness_path(scenario_name: str, filename: str) -> str: + # Return the path of a trustworthiness artifact for a scenario. + return os.path.join(_trustworthiness_dir(scenario_name), filename) + + +def _ensure_trustworthiness_dir(scenario_name: str) -> None: + # Create the scenario trustworthiness directory if it does not exist. + os.makedirs(_trustworthiness_dir(scenario_name), exist_ok=True) + + +def _encode_class_id(class_id) -> str: + # Convert a numeric class ID into the hash used in persisted JSON files. + return hashids.encode(int(class_id)) + + +def _class_counts_from_counter(class_counter: Counter) -> dict: + # Return hashed class counts from an in-memory Counter. 
+ return { + _encode_class_id(class_id): int(count) + for class_id, count in class_counter.items() + } + + +def _write_json(scenario_name: str, filename: str, data: dict, indent=None) -> None: + # Write a JSON trust artifact inside the scenario trustworthiness directory. + _ensure_trustworthiness_dir(scenario_name) + with open(_trustworthiness_path(scenario_name, filename), "w") as file: + json.dump(data, file, indent=indent) + + +def _iter_participant_class_counts(experiment_name: str): + # Yield each consecutive participant ID and its saved class-count dictionary. + participant_id = 0 + while True: + file_path = get_class_count_file(experiment_name, participant_id) + if not os.path.exists(file_path): + break + + yield participant_id, load_class_counts(experiment_name, participant_id) + participant_id += 1 + + +def get_class_count_file(scenario_name, participant_id): + # Return the class-count JSON path for one participant. + return _trustworthiness_path(scenario_name, f"{str(participant_id)}_class_count.json") + + +def load_class_counts(scenario_name, participant_id): + # Load one participant's saved class-count dictionary. + with open(get_class_count_file(scenario_name, participant_id), "r") as file: + return json.load(file) + + +def get_class_imbalance_from_counts(class_counts): + # Calculate class imbalance as the coefficient of variation of class counts. + return get_cv(list=list(class_counts.values())) + + +def get_class_imbalance_score(class_imbalance): + # Convert class imbalance into a score where 1 means balanced classes. + return 1 / (1 + class_imbalance) + + +def get_class_imbalance_local(participant_id, experiment_name): + # Return the raw class-imbalance value for one participant. + return get_class_imbalance_from_counts(load_class_counts(experiment_name, participant_id)) + + +def get_local_class_imbalance_score(scenario_name, participant_id): + # Return the trust-oriented class-imbalance score for one participant. 
+ return get_class_imbalance_score(get_class_imbalance_local(participant_id, scenario_name)) + + +def get_entropy_from_class_counts(class_counts, normalize=False): + # Calculate entropy from a class-count dictionary, optionally normalized to [0, 1]. + counts = np.array(list(class_counts.values()), dtype=float) + total = counts.sum() + if total <= 0: + return 0.0 + + probabilities = counts / total + entropy_value = entropy(probabilities, base=2) + + if not normalize: + return round(float(entropy_value), 6) + + class_count = len(probabilities) + if class_count <= 1: + return 0.0 + + normalized_entropy = float(entropy_value / np.log2(class_count)) + return float(np.clip(normalized_entropy, 0.0, 1.0)) + + +def get_local_normalized_entropy(scenario_name, participant_id): + # Return normalized entropy for one participant's saved class counts. + return get_entropy_from_class_counts( + load_class_counts(scenario_name, participant_id), + normalize=True, + ) + + +def get_cv(list=None, std=None, mean=None): + # Return the coefficient of variation from either values or precomputed std/mean. + if std is not None and mean is not None: + return 0 if mean == 0 else std / mean + + if list is None: + return 0 + + values = np.asarray(list, dtype=float) + mean_value = float(np.mean(values)) if values.size else 0.0 + if mean_value == 0: + return 0 + + return float(np.std(values) / mean_value) + + +def get_participation_variation_score(participation_counts): + # Convert participation-count dispersion into a score where 1 means equal participation. + if not participation_counts: + return 1.0 + + counts = np.asarray(participation_counts, dtype=float) + mean_count = float(np.mean(counts)) + if mean_count <= 0: + return 0.0 + + cv = get_cv(list=counts) + if not np.isfinite(cv): + return 0.0 + + return float(1 / (1 + cv)) + + +def save_class_count_per_participant(experiment_name, class_counter: Counter, idx): + # Save one participant's class-count dictionary as {idx}_class_count.json.
+ _write_json( + experiment_name, + f"{str(idx)}_class_count.json", + _class_counts_from_counter(class_counter), + ) + + +def get_all_data_entropy(experiment_name): + # Compute entropy for every participant class-count file and write entropy.json. + entropy_per_participant = { + str(participant_id): round(get_entropy_from_class_counts(class_count), 6) + for participant_id, class_count in _iter_participant_class_counts(experiment_name) + } + + _write_json(experiment_name, "entropy.json", entropy_per_participant, indent=2) + + +def get_local_entropy(id, experiment_name): + # Return non-normalized entropy for one participant's saved class counts. + return get_entropy_from_class_counts(load_class_counts(experiment_name, id)) diff --git a/nebula/addons/trustworthiness/helpers/explainability.py b/nebula/addons/trustworthiness/helpers/explainability.py new file mode 100644 index 000000000..96b066a1f --- /dev/null +++ b/nebula/addons/trustworthiness/helpers/explainability.py @@ -0,0 +1,233 @@ +import logging +import math + +import numpy as np +import shap +import torch +from scipy.spatial.distance import jensenshannon +from scipy.stats import entropy, variation + +logger = logging.getLogger(__name__) + + +def _feature_importance_cv_from_values(vals): + # Higher CV means attributions differ more across features, i.e. a more selective explanation. + vals = np.asarray(vals, dtype=float).reshape(-1) + vals = np.nan_to_num(vals, nan=0.0, posinf=0.0, neginf=0.0) + vals = vals[vals > 0] + + if len(vals) <= 1: + return 0.0 + + cv = float(variation(vals)) + if math.isnan(cv) or math.isinf(cv): + return 1.0 + return max(0.0, cv) + + +def _get_feature_importances(model, test_sample): + # Computes global feature importances with a simple modality-aware policy: + # SHAP for tabular tensors and Integrated Gradients for image-like tensors. 
+ if not isinstance(model, torch.nn.Module): + logger.warning("Model is not a torch.nn.Module") + return np.array([]) + + if not isinstance(test_sample, (tuple, list)) or len(test_sample) < 1: + return np.array([]) + + inputs = test_sample[0] + if not torch.is_tensor(inputs) or inputs.ndim < 2 or inputs.size(0) == 0: + return np.array([]) + + try: + device = next(model.parameters()).device + except Exception: + device = torch.device("cpu") + + inputs = inputs.to(device) + if not torch.is_floating_point(inputs): + inputs = inputs.float() + + was_training = bool(getattr(model, "training", False)) + model.eval() + + try: + if inputs.ndim == 2: + logger.info("Computing tabular feature importances with SHAP, input_shape=%s", tuple(inputs.shape)) + importances = _get_shap_importances(model, inputs) + else: + logger.info("Computing image-like feature importances with Integrated Gradients, input_shape=%s", tuple(inputs.shape)) + importances = _get_integrated_gradients_importances(model, inputs) + + logger.info("Computed feature importances, n_features=%s, total_importance=%s", len(importances), float(np.sum(importances))) + return importances + except Exception as exc: + logger.warning("Could not compute feature importances") + logger.warning(exc) + return np.array([]) + finally: + if was_training: + model.train() + + +def _get_shap_importances(model, inputs): + # SHAP is a natural fit for tabular data: one attribution per input column. 
+ if inputs.size(0) < 2: + return np.array([]) + + background_size = min(16, inputs.size(0) - 1) + background = inputs[:background_size] + explained = inputs[background_size:] + + logger.info("SHAP background_size=%s, explained_size=%s", int(background.size(0)), int(explained.size(0))) + explainer = shap.GradientExplainer(model, background) + shap_values = explainer.shap_values(explained) + + if isinstance(shap_values, (list, tuple)): + arrays = [np.asarray(values, dtype=float) for values in shap_values if values is not None] + if not arrays: + return np.array([]) + shap_arr = np.stack(arrays, axis=0) + importances = np.mean(np.abs(shap_arr), axis=(0, 1)) + else: + shap_arr = np.asarray(shap_values, dtype=float) + if shap_arr.ndim == 3: + importances = np.mean(np.abs(shap_arr), axis=(0, 2)) + else: + importances = np.mean(np.abs(shap_arr), axis=0) + + return _clean_importances(importances) + + +def _get_integrated_gradients_importances(model, inputs, steps=16): + # Zero baseline is simple and works well for normalized image tensors. + logger.info("Integrated Gradients steps=%s", int(steps)) + baseline = torch.zeros_like(inputs) + total_gradients = torch.zeros_like(inputs) + + for alpha in torch.linspace(0.0, 1.0, steps, device=inputs.device): + scaled_inputs = (baseline + alpha * (inputs - baseline)).detach().requires_grad_(True) + model.zero_grad(set_to_none=True) + + outputs = model(scaled_inputs) + if isinstance(outputs, (tuple, list)): + outputs = outputs[0] + + # Explain the model's predicted class for each sample. + if outputs.ndim == 1: + score = outputs.sum() + else: + score = outputs.reshape(outputs.shape[0], -1).max(dim=1).values.sum() + + gradients = torch.autograd.grad(score, scaled_inputs)[0] + total_gradients += gradients.detach() + + attributions = (inputs - baseline) * total_gradients / float(steps) + importances = torch.abs(attributions).mean(dim=0) + + if importances.ndim == 3: + # For RGB images, keep one importance value per spatial position. 
+ importances = importances.mean(dim=0) + + return _clean_importances(importances.detach().cpu().numpy()) + + +def _clean_importances(importances): + importances = np.asarray(importances, dtype=float).reshape(-1) + importances = np.nan_to_num(importances, nan=0.0, posinf=0.0, neginf=0.0) + return np.maximum(importances, 0.0) + + +def _alpha_score_from_values(vals, alpha=0.8): + # Fraction of features needed to explain alpha of the attribution mass; lower is better. + vals = np.asarray(vals, dtype=float).reshape(-1) + vals = np.nan_to_num(vals, nan=0.0, posinf=0.0, neginf=0.0) + vals = np.maximum(vals, 0.0) + total_features = len(vals) + if total_features == 0 or np.sum(vals) <= 1e-12: + return 1.0 + + try: + alpha = float(alpha) + except Exception: + alpha = 0.8 + alpha = min(max(alpha, 0.0), 1.0) + + vals_sorted = np.sort(vals)[::-1] + cum_sum = np.cumsum(vals_sorted) + threshold = float(alpha) * np.sum(vals_sorted) + idx = np.searchsorted(cum_sum, threshold) + return float(min(total_features, idx + 1) / total_features) + + +def _spread_base_from_values(vals, divergence=True): + # Entropy ratio measures spread; JS divergence measures distance from uniform attribution. + vals = np.asarray(vals, dtype=float).reshape(-1) + tol = 1e-8 + + if len(vals) == 0 or np.sum(vals) < tol: + return 0.0 if divergence else 1.0 + if len(vals) == 1: + return 0.0 if divergence else 1.0 + + weights = vals / np.sum(vals) + equal_weights = np.ones(len(vals)) / len(vals) + + if divergence: + metric = jensenshannon(weights, equal_weights, base=2) + else: + denom = entropy(equal_weights) + metric = 0.0 if denom <= tol else entropy(weights) / denom + + if math.isnan(metric) or math.isinf(metric): + return 0.0 if divergence else 1.0 + return float(np.clip(metric, 0.0, 1.0)) + + +def get_explainability_metrics_summary(model, test_dataloader, max_batches=4): + # Computes explainability metrics over multiple test batches and returns their mean values. 
+ summary = { + "feature_importance_cv": 1.0, + "alpha_score": 1.0, + "spread_ratio": 1.0, + "spread_divergence": 0.0, + } + + if test_dataloader is None: + return summary + + try: + max_batches = max(1, int(max_batches)) + except Exception: + max_batches = 4 + + fi_values = [] + alpha_values = [] + spread_ratio_values = [] + spread_divergence_values = [] + + try: + for batch_idx, test_sample in enumerate(test_dataloader): + if batch_idx >= max_batches: + break + + # Compute attributions once per batch and derive all explainability metrics from them. + importances = _get_feature_importances(model, test_sample) + fi_values.append(float(_feature_importance_cv_from_values(importances))) + alpha_values.append(float(_alpha_score_from_values(importances))) + spread_ratio_values.append(float(_spread_base_from_values(importances, divergence=False))) + spread_divergence_values.append(float(_spread_base_from_values(importances, divergence=True))) + except Exception as exc: + logger.warning("Could not compute explainability metrics summary") + logger.warning(exc) + + if fi_values: + summary["feature_importance_cv"] = float(np.mean(fi_values)) + if alpha_values: + summary["alpha_score"] = float(np.mean(alpha_values)) + if spread_ratio_values: + summary["spread_ratio"] = float(np.mean(spread_ratio_values)) + if spread_divergence_values: + summary["spread_divergence"] = float(np.mean(spread_divergence_values)) + + return summary diff --git a/nebula/addons/trustworthiness/helpers/factsheet_values.py b/nebula/addons/trustworthiness/helpers/factsheet_values.py new file mode 100644 index 000000000..8faa2cf81 --- /dev/null +++ b/nebula/addons/trustworthiness/helpers/factsheet_values.py @@ -0,0 +1,83 @@ +import logging +import math + +from nebula.addons.trustworthiness.helpers.privacy import ( + get_global_privacy_risk, + get_global_privacy_risk_dfl, +) +from nebula.addons.trustworthiness.helpers.scenario_metrics import comm_efficiency +from 
nebula.addons.trustworthiness.helpers.scoring import ( + check_properties, + get_value, +) + +logger = logging.getLogger(__name__) + +# Operations available from the eval_metrics JSON files. +OPERATIONS = { + "check_properties": check_properties, + "comm_efficiency": comm_efficiency, + "get_global_privacy_risk": get_global_privacy_risk, + "get_global_privacy_risk_dfl": get_global_privacy_risk_dfl, + "get_value": get_value, +} + + +def check_field_filled(factsheet_dict, factsheet_path, value, empty=""): + # Keep an existing factsheet value; otherwise return a clean fallback for empty or NaN values. + current_value = factsheet_dict[factsheet_path[0]][factsheet_path[1]] + if current_value: + return current_value + + if _is_empty_value(value): + return empty + + if _is_nan_number(value): + return 0 + + return value + + +def _is_empty_value(value): + # Empty strings and the literal "nan" should not overwrite missing factsheet fields. + return value == "" or value == "nan" + + +def _is_nan_number(value): + # Only numeric values can be checked with math.isnan safely. + return isinstance(value, (int, float)) and not isinstance(value, bool) and math.isnan(value) + + +def get_input_value(input_docs, inputs, operation): + # Collect metric inputs from their configured paths and apply the configured operation. + args = [] + for input_config in inputs: + source = input_config.get("source", "") + field = input_config.get("field_path", "") + input_doc = input_docs.get(source) + if input_doc is None: + logger.warning(f"{source} is null") + continue + + args.append(get_value_from_path(input_doc, field)) + + try: + operation_fn = OPERATIONS[operation] + return operation_fn(*args) + except (KeyError, TypeError): + logger.warning(f"{operation} is not valid") + return None + + +def get_value_from_path(input_doc, path): + # Walk a slash-separated path through a nested dict and return the leaf value. 
+ current_value = input_doc + for nested_key in path.split("/"): + if not isinstance(current_value, dict): + return None + + current_value = current_value.get(nested_key) + if current_value is None: + return None + + return current_value diff --git a/nebula/addons/trustworthiness/helpers/model_quality.py b/nebula/addons/trustworthiness/helpers/model_quality.py new file mode 100644 index 000000000..979583607 --- /dev/null +++ b/nebula/addons/trustworthiness/helpers/model_quality.py @@ -0,0 +1,220 @@ +import logging +import math + +import numpy as np +import torch + +# AIF360: AI Fairness 360 [Software]. https://github.com/Trusted-AI/AIF360 +# Licensed under Apache License 2.0: https://github.com/Trusted-AI/AIF360/blob/main/LICENSE +# HolisticAI: open-source library to assess and improve AI trustworthiness. +# Licensed under Apache License 2.0: https://github.com/holistic-ai/holisticai/blob/main/LICENSE + +logger = logging.getLogger(__name__) + +def _extract_model_logits(model_output): + # Normalize the output returned by a model forward pass into a logits tensor. + return model_output[0] if isinstance(model_output, (tuple, list)) else model_output + + +def _prepare_class_targets(y): + # Convert different target representations into a flat class-index tensor. + if not torch.is_tensor(y): + y = torch.as_tensor(y) + + if y.ndim > 1: + if y.size(-1) > 1: + y = y.argmax(dim=-1) + else: + y = y.view(-1) + + return y.long().view(-1) + + +def _logits_to_probabilities(logits): + # Convert model outputs into a probability matrix of shape (N, C). 
+ if not torch.is_tensor(logits): + logits = torch.as_tensor(logits) + + if logits.ndim == 0: + logits = logits.view(1, 1) + elif logits.ndim == 1: + logits = logits.view(-1, 1) + elif logits.ndim > 2: + logits = logits.reshape(logits.shape[0], -1) + + if logits.size(1) == 1: + pos_prob = torch.sigmoid(logits[:, 0]) + probs = torch.stack([1.0 - pos_prob, pos_prob], dim=1) + else: + row_sums = logits.sum(dim=1) + looks_like_probs = ( + torch.all(logits >= 0) + and torch.all(logits <= 1.0 + 1e-6) + and torch.allclose(row_sums, torch.ones_like(row_sums), atol=1e-4, rtol=1e-4) + ) + probs = logits if looks_like_probs else torch.softmax(logits, dim=1) + + probs = torch.clamp(probs, min=0.0, max=1.0) + probs = probs / probs.sum(dim=1, keepdim=True).clamp_min(1e-12) + return probs + + +def _collect_classification_statistics(model, dataloader): + # Collect prediction statistics required by calibration and inequality metrics. + if not isinstance(model, torch.nn.Module): + logger.warning("Model is not a torch.nn.Module") + empty = np.array([], dtype=float) + return empty, empty, empty + + try: + device = next(model.parameters()).device + except Exception: + device = torch.device("cpu") + + confidences = [] + correct = [] + true_probs = [] + + model.eval() + with torch.no_grad(): + for batch in dataloader: + if not isinstance(batch, (tuple, list)) or len(batch) < 2: + continue + + x, y = batch[0], batch[1] + if not (torch.is_tensor(x) and torch.is_tensor(y)): + continue + + x = x.to(device) + y = _prepare_class_targets(y).to(device) + + # Metrics consume probabilities even when the model returns raw logits + # or wraps the classification output in a tuple/list. 
+ probs = _logits_to_probabilities(_extract_model_logits(model(x))) + + if probs.ndim != 2 or probs.size(0) == 0: + continue + + n = min(int(y.numel()), int(probs.size(0))) + if n == 0: + continue + y = y[:n] + probs = probs[:n] + + valid_mask = (y >= 0) & (y < probs.size(1)) + if not torch.any(valid_mask): + continue + + y = y[valid_mask] + probs = probs[valid_mask] + + # Confidence is the predicted-class probability. true_probs is the + # probability assigned to the actual class, used as a continuous benefit. + conf, preds = probs.max(dim=1) + confidences.append(conf.cpu()) + correct.append(preds.eq(y).float().cpu()) + true_probs.append(probs.gather(1, y.view(-1, 1)).squeeze(1).cpu()) + + if not confidences: + empty = np.array([], dtype=float) + return empty, empty, empty + + return ( + torch.cat(confidences).numpy(), + torch.cat(correct).numpy(), + torch.cat(true_probs).numpy(), + ) + + +def get_well_calibration_error(model, test_dataloader, n_bins=10): + # Calculates a well-calibration error style metric using prediction confidence. + if not isinstance(model, torch.nn.Module): + logger.warning("Model is not a torch.nn.Module") + return 1.0 + + try: + n_bins = max(2, int(n_bins)) + except Exception: + n_bins = 10 + + confidences, correct, _ = _collect_classification_statistics(model, test_dataloader) + + if len(confidences) == 0 or len(correct) == 0: + return 1.0 + + confidences = np.clip(np.asarray(confidences, dtype=float), 0.0, 1.0) + correct = np.clip(np.asarray(correct, dtype=float), 0.0, 1.0) + + bin_edges = np.linspace(0.0, 1.0, n_bins + 1) + ece = 0.0 + total = float(len(confidences)) + + # ECE compares empirical accuracy and average confidence within each bin. 
+ for idx in range(n_bins): + left = bin_edges[idx] + right = bin_edges[idx + 1] + if idx == n_bins - 1: + mask = (confidences >= left) & (confidences <= right) + else: + mask = (confidences >= left) & (confidences < right) + + if not np.any(mask): + continue + + bin_weight = float(mask.sum()) / total + bin_accuracy = float(correct[mask].mean()) + bin_confidence = float(confidences[mask].mean()) + ece += bin_weight * abs(bin_accuracy - bin_confidence) + + return float(np.clip(ece, 0.0, 1.0)) + + +def get_generalized_entropy_index(model, test_dataloader, alpha=2): + # Calculates generalized entropy index from model predictions. + try: + _, _, true_class_probs = _collect_classification_statistics(model, test_dataloader) + if len(true_class_probs) == 0: + return 0.0 + + eps = 1e-12 + b = np.clip(np.asarray(true_class_probs, dtype=float), eps, 1.0) + mu = float(np.mean(b)) + if mu <= 0: + return 0.0 + + # GEI measures dispersion around the mean benefit. Lower values mean the + # model gives more even true-class confidence across samples. + ratio = np.clip(b / mu, eps, None) + + if alpha == 0: + val = float(np.mean(-np.log(ratio))) + elif alpha == 1: + val = float(np.mean(ratio * np.log(ratio))) + elif alpha == 2: + val = float(np.mean((ratio - 1.0) ** 2) / 2.0) + else: + val = float(np.mean(ratio**alpha - 1.0) / (alpha * (alpha - 1.0))) + + if math.isnan(val) or math.isinf(val): + return 0.0 + return max(0.0, val) + except Exception as exc: + logger.warning("Could not compute generalized entropy index") + logger.warning(exc) + return 0.0 + + +def get_theil_index(model, test_dataloader): + # Convenience wrapper for generalized entropy index with alpha=1. + return get_generalized_entropy_index(model, test_dataloader, alpha=1) + + +def get_coefficient_of_variation(model, test_dataloader): + # Calculates coefficient of variation from GEI(alpha=2). 
+ try: + gei = get_generalized_entropy_index(model, test_dataloader, alpha=2) + return float(np.sqrt(2 * gei)) + except Exception as exc: + logger.warning("Could not compute coefficient of variation") + logger.warning(exc) + return 0.0 diff --git a/nebula/addons/trustworthiness/helpers/privacy.py b/nebula/addons/trustworthiness/helpers/privacy.py new file mode 100644 index 000000000..b33c062f2 --- /dev/null +++ b/nebula/addons/trustworthiness/helpers/privacy.py @@ -0,0 +1,161 @@ +import logging +import math +import numbers +from math import e + +import numpy as np +import torch +from sklearn.metrics import roc_auc_score, roc_curve +from torch import nn + +logger = logging.getLogger(__name__) + +def get_global_privacy_risk(dp, epsilon, n): + # Calculates the global privacy risk by epsilon and the number of clients. + + try: + epsilon = float(epsilon) + n = float(n) + except (TypeError, ValueError): + return 1 + + if dp is True and isinstance(epsilon, numbers.Number): + return 1 / (1 + (n - 1) * math.pow(e, -epsilon)) + else: + return 1 + + +def get_global_privacy_risk_dfl(dp, epsilon, n): + # Calculates the global privacy risk by epsilon and the number of clients for DFL. + + try: + epsilon = float(epsilon) + n = float(n) + except (TypeError, ValueError): + return 1 + + if dp is True and isinstance(epsilon, numbers.Number): + return 1 / (1 + (n + 1) * math.pow(e, -epsilon)) + else: + return 1 + + +def _collect_per_sample_losses(model, dataloader, max_samples=5000): + # Compute per-sample cross-entropy losses for a dataloader. 
+ if not isinstance(model, torch.nn.Module) or dataloader is None: + return np.array([]) + + try: + device = next(model.parameters()).device + except Exception: + device = torch.device("cpu") + + criterion = nn.CrossEntropyLoss(reduction="none") + losses = [] + collected = 0 + + model.eval() + with torch.no_grad(): + for batch in dataloader: + if not isinstance(batch, (tuple, list)) or len(batch) < 2: + continue + + samples, labels = batch[0], batch[1] + if not torch.is_tensor(samples) or not torch.is_tensor(labels): + continue + + remaining = max_samples - collected + if remaining <= 0: + break + + samples = samples[:remaining].to(device) + labels = labels[:remaining] + + if labels.ndim > 1: + labels = torch.argmax(labels, dim=1) + + labels = labels.long().to(device) + + outputs = model(samples) + logits = outputs[0] if isinstance(outputs, (tuple, list)) else outputs + batch_losses = criterion(logits, labels) + + batch_losses_np = batch_losses.detach().cpu().numpy() + batch_losses_np = batch_losses_np[np.isfinite(batch_losses_np)] + if batch_losses_np.size == 0: + continue + + losses.append(batch_losses_np) + collected += int(batch_losses.shape[0]) + + if not losses: + return np.array([]) + + return np.concatenate(losses, axis=0) + + +def get_epsilon_star(model, train_dataloader, test_dataloader, max_samples=5000, percentile=95): + # Compute empirical epsilon* from train/test loss distributions. 
+ try: + loss_train = _collect_per_sample_losses(model, train_dataloader, max_samples=max_samples) + loss_test = _collect_per_sample_losses(model, test_dataloader, max_samples=max_samples) + + if loss_train.size == 0 or loss_test.size == 0: + return 0.0 + + scores = np.concatenate([-loss_train, -loss_test]) + y_true = np.concatenate([np.ones(len(loss_train)), np.zeros(len(loss_test))]) + + fpr, tpr, _ = roc_curve(y_true, scores) + + fpr_floor = 1.0 / len(loss_test) + fnr_floor = 1.0 / len(loss_train) + + fpr = np.clip(fpr, fpr_floor, 1 - fpr_floor) + fnr = np.clip(1 - tpr, fnr_floor, 1 - fnr_floor) + + delta = 1.0 / len(loss_train) if len(loss_train) > 0 else 1e-5 + + m1 = (1 - delta - fnr) / fpr + m2 = (1 - delta - fpr) / fnr + m3 = (fnr - delta) / (1 - fpr) + m4 = (fpr - delta) / (1 - fnr) + + ratios = np.maximum.reduce([m1, m2, m3, m4, np.ones_like(m1)]) + ratios = ratios[np.isfinite(ratios)] + if ratios.size == 0: + return 0.0 + + epsilon_star_val = np.log(np.nanpercentile(ratios, percentile)) + + if np.isnan(epsilon_star_val) or np.isinf(epsilon_star_val): + return 0.0 + + return float(max(0.0, epsilon_star_val)) + except Exception as exc: + logger.warning("Could not compute epsilon_star") + logger.warning(exc) + return 0.0 + + +def get_mia_auc(model, train_dataloader, test_dataloader, max_samples=5000): + # Compute membership inference attack AUC using per-sample loss as the attack score. 
+ try: + loss_train = _collect_per_sample_losses(model, train_dataloader, max_samples=max_samples) + loss_test = _collect_per_sample_losses(model, test_dataloader, max_samples=max_samples) + + if loss_train.size == 0 or loss_test.size == 0: + return 0.5 + + scores = np.concatenate([-loss_train, -loss_test]) + y_true = np.concatenate([np.ones(len(loss_train)), np.zeros(len(loss_test))]) + mia_auc = roc_auc_score(y_true, scores) + + if np.isnan(mia_auc) or np.isinf(mia_auc): + return 0.5 + + return float(np.clip(mia_auc, 0.0, 1.0)) + except Exception as exc: + logger.warning("Could not compute mia_auc") + logger.warning(exc) + return 0.5 diff --git a/nebula/addons/trustworthiness/helpers/robustness.py b/nebula/addons/trustworthiness/helpers/robustness.py new file mode 100644 index 000000000..a64fc5001 --- /dev/null +++ b/nebula/addons/trustworthiness/helpers/robustness.py @@ -0,0 +1,622 @@ +import logging +import math + +import numpy as np +import torch +import torch.nn.functional as F +from art.estimators.classification import PyTorchClassifier +from art.metrics import clever_u, empirical_robustness, loss_sensitivity +from nebula.core.datasets.image_metadata import get_image_normalization +from torch import nn, optim + +logger = logging.getLogger(__name__) + +R_L2 = 2 +ROBUSTNESS_EPSILON = 0.03 +# ART CLEVER is an L2 lower-bound estimate; the attack radius maps to a full trust score. +CLEVER_REFERENCE = R_L2 +# ART empirical robustness is a relative perturbation distance; this maps 0.2 to a full trust score. +EMPIRICAL_ROBUSTNESS_REFERENCE = 0.2 +TABULAR_ATTACK_STEPS = 3 +ADVERSARIAL_LOG_SAMPLES = 2 +ADVERSARIAL_LOG_FEATURES = 12 + +def _build_art_classifier(model, input_shape, nb_classes, learning_rate): + # Wrap the PyTorch model with the ART classifier interface used by ART metrics. 
+ criterion = nn.CrossEntropyLoss() + optimizer = optim.Adam(model.parameters(), learning_rate) + + return PyTorchClassifier( + model=model, + loss=criterion, + optimizer=optimizer, + input_shape=tuple(input_shape), + nb_classes=nb_classes, + ) + + +def _validate_test_sample_tensors(test_sample): + # Shared guard for sample-based metrics that expect a non-empty (x, y) batch. + if not (isinstance(test_sample, (tuple, list)) and len(test_sample) >= 2): + raise ValueError("`test_sample` must contain samples and labels.") + + samples, labels = test_sample[0], test_sample[1] + if not (torch.is_tensor(samples) and torch.is_tensor(labels) and samples.shape[0] > 0): + raise ValueError("`test_sample` must contain non-empty tensors for samples and labels.") + + return samples, labels + + +def _coerce_max_samples(max_samples, default=8): + # Keep metric calls bounded even if configuration values are missing or invalid. + try: + return max(1, int(max_samples)) + except Exception: + return default + + +def _coerce_tabular_metadata(metadata): + # Accept both serialized dataset metadata and the typed metadata object. + if metadata is None: + return None + + # Keep tabular-only imports lazy so image workflows do not depend on them. + from nebula.core.datasets.tabular_metadata import TabularAdversarialMetadata + + if isinstance(metadata, TabularAdversarialMetadata): + return metadata + return TabularAdversarialMetadata.from_dict(metadata) + + +def _get_tabular_metadata_from_dataset(dataset): + # Dataloaders can wrap datasets; walk through wrappers until metadata is found. + if dataset is None: + return None + + metadata = getattr(dataset, "tabular_metadata", None) + if metadata is not None: + return _coerce_tabular_metadata(metadata) + + return _get_tabular_metadata_from_dataset(getattr(dataset, "dataset", None)) + + +def _get_tabular_metadata_from_loader(data_loader): + # Return None for image datasets, which keeps the adversarial path on FGSM. 
+ return _get_tabular_metadata_from_dataset(getattr(data_loader, "dataset", None)) + + +def _get_dataset_name_from_dataset(dataset): + # Dataset wrappers keep the real dataset in `.dataset`; walk through them. + if dataset is None: + return None + + dataset_name = getattr(dataset, "dataset_name", None) + if dataset_name is not None: + return dataset_name + + config = getattr(dataset, "config", None) + participant = getattr(config, "participant", None) + if isinstance(config, dict): + participant = config.get("participant", participant) + if isinstance(participant, dict): + dataset_name = participant.get("data_args", {}).get("dataset") + if dataset_name is not None: + return dataset_name + + return _get_dataset_name_from_dataset(getattr(dataset, "dataset", None)) + + +def _get_image_normalization_from_loader(data_loader): + # Resolve image mean/std from shared dataset metadata instead of inferring by channels. + dataset_name = _get_dataset_name_from_dataset(getattr(data_loader, "dataset", None)) + normalization = get_image_normalization(dataset_name) + if normalization is not None: + logger.info("[Robustness] Image normalization loaded | dataset=%s | mean/std=%s", dataset_name, normalization) + return normalization + + +def _build_fixed_epsilon_tabular_generator(epsilon, tabular_metadata): + # Reuse the tabular adversarial-training generator, but make evaluation deterministic. + from nebula.addons.defenses.adversarial_training.config import AdversarialTrainingConfig + from nebula.addons.defenses.adversarial_training.tabular import TabularConstrainedPGDGenerator + + class FixedEpsilonTabularConstrainedPGDGenerator(TabularConstrainedPGDGenerator): + def _sample_epsilon(self, device): + # Training samples epsilon; factsheet metrics should use the requested epsilon exactly. 
+ self.last_epsilon = float(self.config.epsilon) + return self.last_epsilon + + config = AdversarialTrainingConfig( + domain="tabular", + attack="constrained_pgd", + epsilon=float(epsilon), + steps=TABULAR_ATTACK_STEPS, + candidate_selection="none", + ) + return FixedEpsilonTabularConstrainedPGDGenerator(config, tabular_metadata) + + +def _build_tabular_generator(epsilon, tabular_metadata): + # A missing generator intentionally means "use the image/default FGSM path". + tabular_metadata = _coerce_tabular_metadata(tabular_metadata) + if tabular_metadata is None: + return None + + return _build_fixed_epsilon_tabular_generator(epsilon, tabular_metadata) + + +def _attack_name(tabular_generator): + # Keep log messages explicit about which adversarial path is active. + return "tabular_constrained_pgd" if tabular_generator is not None else "fgsm" + + +def _tensor_range(tensor): + # Compact numeric summary for batch-level logging. + if tensor.numel() == 0: + return "empty" + + tensor = tensor.detach().float().cpu() + return "min={:.6f}, max={:.6f}, mean={:.6f}".format( + tensor.min().item(), + tensor.max().item(), + tensor.mean().item(), + ) + + +def _format_preview_vector(vector, feature_names=None, max_features=ADVERSARIAL_LOG_FEATURES): + # Log only a small prefix of the flattened vector to keep factsheet logs readable. + values = vector.detach().flatten().float().cpu().tolist() + preview_values = values[:max_features] + + if feature_names: + names = list(feature_names)[:max_features] + items = [ + "{}={:.6f}".format(name, float(value)) + for name, value in zip(names, preview_values, strict=False) + ] + else: + items = ["{:.6f}".format(float(value)) for value in preview_values] + + suffix = ", ..." 
if len(values) > max_features else "" + return "[" + ", ".join(items) + suffix + "]" + + +def _log_adversarial_generation(metric_name, samples, labels, x_adv, epsilon, tabular_generator, batch_idx): + # Log one representative batch per metric invocation to inspect generated samples. + if batch_idx != 0: + return + + attack = _attack_name(tabular_generator) + clean = samples.detach().cpu() + adv = x_adv.detach().cpu() + delta = adv - clean + flat_delta = delta.reshape(delta.shape[0], -1).float() + feature_names = getattr(getattr(tabular_generator, "metadata", None), "feature_names", None) + + logger.info( + "[Robustness] %s adversarial generation | attack=%s | epsilon=%.6f | " + "clean_shape=%s | adv_shape=%s | clean=%s | adv=%s | " + "delta_linf=%.6f | delta_l2_mean=%.6f", + metric_name, + attack, + float(epsilon), + tuple(clean.shape), + tuple(adv.shape), + _tensor_range(clean), + _tensor_range(adv), + flat_delta.abs().max().item() if flat_delta.numel() else 0.0, + flat_delta.norm(p=2, dim=1).mean().item() if flat_delta.numel() else 0.0, + ) + + n_preview = min(int(clean.shape[0]), ADVERSARIAL_LOG_SAMPLES) + labels_cpu = labels.detach().cpu() if torch.is_tensor(labels) else labels + for sample_idx in range(n_preview): + label = labels_cpu[sample_idx].item() if torch.is_tensor(labels_cpu) else None + logger.info( + "[Robustness] %s adversarial sample %s | attack=%s | label=%s | " + "clean=%s | adversarial=%s | delta=%s", + metric_name, + sample_idx, + attack, + label, + _format_preview_vector(clean[sample_idx], feature_names), + _format_preview_vector(adv[sample_idx], feature_names), + _format_preview_vector(delta[sample_idx]), + ) + + +def _generate_adversarial_samples( + model, + samples, + labels, + epsilon=ROBUSTNESS_EPSILON, + tabular_generator=None, + image_normalization=None, +): + # Central switch: FGSM for images, constrained PGD for tabular datasets. 
+ if tabular_generator is None: + return fgsm_attack( + model, + samples, + labels, + epsilon=epsilon, + image_normalization=image_normalization, + ) + + return tabular_generator.generate(model, samples, labels, nn.CrossEntropyLoss()) + + +def get_clever_score(model, test_sample, nb_classes, learning_rate, max_samples=8): + # Calculates and scales ART CLEVER into a trust score. + + samples, _ = _validate_test_sample_tensors(test_sample) + + input_shape = tuple(samples.shape[1:]) if samples.dim() >= 2 else tuple(samples.shape) + + max_samples = _coerce_max_samples(max_samples) + n_samples = min(int(samples.shape[0]), max_samples) + + # Create the ART classifier once and reuse it for all selected samples. + classifier = _build_art_classifier(model, input_shape, nb_classes, learning_rate) + + clever_scores = [] + for idx in range(n_samples): + # ART CLEVER evaluates one input at a time without the batch dimension. + background = samples[idx].detach().cpu() + sample_np = background.numpy() + + try: + score_untargeted = clever_u( + classifier, + sample_np, + 10, + 5, + R_L2, + norm=2, + pool_factor=3, + verbose=False, + ) + if score_untargeted is not None and not math.isnan(float(score_untargeted)): + clever_scores.append(float(score_untargeted)) + except Exception as exc: + logger.warning("Could not compute CLEVER score for sample index %s", idx) + logger.warning(exc) + + if not clever_scores: + return 0.0 + + raw_score = float(np.mean(clever_scores)) + score = min(max(raw_score / CLEVER_REFERENCE, 0.0), 1.0) + logger.info( + "[Robustness] CLEVER | raw_l2=%.6f | reference=%.6f | score=%.6f", + raw_score, + CLEVER_REFERENCE, + score, + ) + return score + +def get_loss_sensitivity_score(model, test_sample, nb_classes, learning_rate, max_samples=8): + # Calculates the loss sensitivity score as the mean score over multiple samples. 
+ + samples, labels = _validate_test_sample_tensors(test_sample) + + max_samples = _coerce_max_samples(max_samples) + n_samples = min(int(samples.shape[0]), max_samples) + + # Create the ART classifier once and reuse it for all selected samples. + classifier = _build_art_classifier(model, samples.shape[1:], nb_classes, learning_rate) + + sensitivity_scores = [] + for idx in range(n_samples): + # ART loss_sensitivity expects a batch and one-hot labels. + sample = samples[idx].detach().cpu().unsqueeze(0) + label = labels[idx].detach().cpu().unsqueeze(0) + label = F.one_hot(label, num_classes=nb_classes).float() + + try: + score = loss_sensitivity( + classifier, + sample.numpy(), + label.numpy(), + ) + if score is not None and not math.isnan(float(score)): + sensitivity_scores.append(float(score)) + except Exception as exc: + logger.warning("Could not compute loss sensitivity for sample index %s", idx) + logger.warning(exc) + + if not sensitivity_scores: + return 0.0 + + return float(np.mean(sensitivity_scores)) + + +def get_adversarial_accuracy( + model, + test_loader, + nb_classes, + learning_rate, + epsilon=ROBUSTNESS_EPSILON +): + # Computes adversarial accuracy on generated adversarial samples. + + device = torch.device("cuda" if torch.cuda.is_available() else "cpu") + model.eval() + model.to(device) + # If metadata exists, adversarial examples preserve tabular feature constraints. 
+ tabular_generator = _build_tabular_generator( + epsilon, + _get_tabular_metadata_from_loader(test_loader), + ) + image_normalization = None if tabular_generator is not None else _get_image_normalization_from_loader(test_loader) + logger.info( + "[Robustness] adversarial accuracy | attack=%s | epsilon=%.6f", + _attack_name(tabular_generator), + float(epsilon), + ) + + correct = 0 + total = 0 + + for batch_idx, (samples, labels) in enumerate(test_loader): + samples = samples.to(device) + labels = labels.to(device) + + x_adv = _generate_adversarial_samples( + model, + samples, + labels, + epsilon=epsilon, + tabular_generator=tabular_generator, + image_normalization=image_normalization, + ) + _log_adversarial_generation( + "adversarial_accuracy", + samples, + labels, + x_adv, + epsilon, + tabular_generator, + batch_idx, + ) + + with torch.no_grad(): + outputs = model(x_adv) + logits = outputs[0] if isinstance(outputs, (tuple, list)) else outputs + preds = logits.argmax(dim=1) + + correct += (preds == labels).sum().item() + total += labels.size(0) + + return correct / total if total > 0 else 0.0 + + +def get_empirical_robustness_score( + model, + test_sample, + nb_classes, + learning_rate, + attack_name = "fgsm", + attack_params = None, + max_samples = 128, +): + # Calculates and scales ART empirical robustness into a trust score. 
+ + try: + samples, _ = _validate_test_sample_tensors(test_sample) + + batch_size: int = int(samples.shape[0]) + n: int = int(min(max_samples, batch_size)) + x = samples[:n].detach().cpu().numpy() + + classifier = _build_art_classifier(model, samples.shape[1:], nb_classes, learning_rate) + + raw_score = empirical_robustness( + classifier=classifier, + x=x, + attack_name=attack_name, + attack_params=attack_params, + ) + + if isinstance(raw_score, np.ndarray): + raw_score = float(np.mean(raw_score)) + + if raw_score is None or (isinstance(raw_score, float) and math.isnan(raw_score)): + return 0.0 + + score = min(max(float(raw_score) / EMPIRICAL_ROBUSTNESS_REFERENCE, 0.0), 1.0) + logger.info( + "[Robustness] empirical robustness | raw_distance=%.6f | reference=%.6f | score=%.6f", + float(raw_score), + EMPIRICAL_ROBUSTNESS_REFERENCE, + score, + ) + return score + + except Exception as exc: + logger.warning("Could not compute empirical robustness (ART). Returning 0.0") + logger.warning(exc) + return 0.0 + + +def _get_image_normalization_for_samples(samples, image_normalization=None): + # Image normalization must come from dataset metadata; do not infer it by channel count. + if image_normalization is not None: + return image_normalization + + if isinstance(samples, torch.Tensor) and samples.ndim >= 4: + logger.warning( + "[Robustness] Image normalization missing; FGSM will perturb without normalized-space clamping." + ) + return None + + +def _channel_tensor(values, samples): + # Broadcast channel statistics over the batch and spatial dimensions. + shape = [1, len(values)] + [1] * max(samples.dim() - 2, 0) + return torch.tensor(values, dtype=samples.dtype, device=samples.device).view(*shape) + + +def _fgsm_step_and_clamp(samples, grad, epsilon, image_normalization=None): + # Clamp image attacks in normalized space; leave non-image tensors unclamped here. 
+ normalization = _get_image_normalization_for_samples(samples, image_normalization=image_normalization) + if normalization is None: + return samples + epsilon * grad.sign() + + mean, std = normalization + mean = _channel_tensor(mean, samples) + std = _channel_tensor(std, samples) + + normalized_epsilon = float(epsilon) / std + lower = (0.0 - mean) / std + upper = (1.0 - mean) / std + + x_adv = samples + normalized_epsilon * grad.sign() + x_adv = torch.max(torch.min(x_adv, samples + normalized_epsilon), samples - normalized_epsilon) + return torch.max(torch.min(x_adv, upper), lower) + + +def fgsm_attack(model, samples, labels, epsilon=ROBUSTNESS_EPSILON, image_normalization=None): + # Performs an FGSM (Fast Gradient Sign Method) adversarial attack on a batch of samples. + + try: + device = next(model.parameters()).device + except Exception: + device = samples.device + + samples = samples.clone().detach().to(device) + labels = labels.to(device) + # Gradients are needed only with respect to the input batch. + samples.requires_grad = True + + outputs = model(samples) + logits = outputs[0] if isinstance(outputs, (tuple, list)) else outputs + loss = nn.CrossEntropyLoss()(logits, labels) + grad = torch.autograd.grad(loss, samples, only_inputs=True)[0] + x_adv = _fgsm_step_and_clamp(samples, grad, epsilon, image_normalization=image_normalization) + logger.debug( + "[Robustness] FGSM batch generated | epsilon=%.6f | samples_shape=%s | grad=%s | adv=%s", + float(epsilon), + tuple(samples.shape), + _tensor_range(grad), + _tensor_range(x_adv), + ) + + return x_adv.detach() + + +def get_confidence_score( + model, + test_sample, + max_samples = 128, + use_true_label = True, +): + # Calculates the confidence score. 
+ + try: + if not isinstance(model, torch.nn.Module): + logger.warning("Model is not a torch.nn.Module") + return 0.0 + + x, y = test_sample + + if isinstance(x, torch.Tensor): + x = x[:max_samples] + if isinstance(y, torch.Tensor): + y = y[:max_samples] + + try: + device = next(model.parameters()).device + except Exception: + device = torch.device("cpu") + + model.eval() + with torch.no_grad(): + x = x.to(device) if isinstance(x, torch.Tensor) else x + out = model(x) + + logits = out[0] if isinstance(out, (tuple, list)) else out + probs = torch.softmax(logits, dim=1) + + if use_true_label and isinstance(y, torch.Tensor): + # True-label confidence is used when labels are available. + if y.ndim > 1: + y_idx = torch.argmax(y, dim=1) + else: + y_idx = y + y_idx = y_idx.to(device) + + true_probs = probs.gather(1, y_idx.view(-1, 1)).squeeze(1) + return float(true_probs.mean().detach().cpu().item()) + + msp = probs.max(dim=1).values + return float(msp.mean().detach().cpu().item()) + + except Exception as e: + logger.warning("Could not compute confidence score") + logger.warning(e) + return 0.0 + + +def attack_success_rate(model, test_loader, epsilon=ROBUSTNESS_EPSILON): + # Computes ASR over originally correct predictions only. + device = torch.device("cuda" if torch.cuda.is_available() else "cpu") + model.eval() + model.to(device) + # Tabular datasets use constrained PGD; image datasets fall back to FGSM. 
+ tabular_generator = _build_tabular_generator( + epsilon, + _get_tabular_metadata_from_loader(test_loader), + ) + image_normalization = None if tabular_generator is not None else _get_image_normalization_from_loader(test_loader) + logger.info( + "[Robustness] attack success rate | attack=%s | epsilon=%.6f", + _attack_name(tabular_generator), + float(epsilon), + ) + + successful_attacks = 0 + num_correct = 0 + + for batch_idx, (samples, labels) in enumerate(test_loader): + samples = samples.to(device) + labels = labels.to(device) + + with torch.no_grad(): + outputs = model(samples) + logits = outputs[0] if isinstance(outputs, (tuple, list)) else outputs + preds = logits.argmax(dim=1) + + correct_mask = preds.eq(labels) + batch_correct = correct_mask.sum().item() + if batch_correct == 0: + # ASR is defined over clean-correct samples, so this batch contributes nothing. + continue + + x_adv = _generate_adversarial_samples( + model, + samples, + labels, + epsilon=epsilon, + tabular_generator=tabular_generator, + image_normalization=image_normalization, + ) + _log_adversarial_generation( + "attack_success_rate", + samples, + labels, + x_adv, + epsilon, + tabular_generator, + batch_idx, + ) + + with torch.no_grad(): + outputs_adv = model(x_adv) + logits_adv = outputs_adv[0] if isinstance(outputs_adv, (tuple, list)) else outputs_adv + preds_adv = logits_adv.argmax(dim=1) + + successful_attacks += (correct_mask & preds_adv.ne(labels)).sum().item() + num_correct += batch_correct + + return successful_attacks / num_correct if num_correct > 0 else 0.0 diff --git a/nebula/addons/trustworthiness/helpers/scenario_metrics.py b/nebula/addons/trustworthiness/helpers/scenario_metrics.py new file mode 100644 index 000000000..081c3db39 --- /dev/null +++ b/nebula/addons/trustworthiness/helpers/scenario_metrics.py @@ -0,0 +1,240 @@ +import io +import logging +import os +import statistics +from datetime import datetime + +import pandas as pd +import torch +from codecarbon import 
EmissionsTracker + +from nebula.addons.trustworthiness.helpers.csv_io import read_csv + +logger = logging.getLogger(__name__) + +DATETIME_FORMAT = "%d/%m/%Y %H:%M:%S" + + +def get_elapsed_time(start_time, end_time): + # Return scenario duration in minutes from the timestamps stored by the workload. + start_date = datetime.strptime(start_time, DATETIME_FORMAT) + end_date = datetime.strptime(end_time, DATETIME_FORMAT) + return (end_date - start_date).total_seconds() / 60 + + +def _trustworthiness_dir(scenario_name): + # All scenario metrics are stored under the scenario trustworthiness directory. + return os.path.join(os.environ.get("NEBULA_LOGS_DIR"), scenario_name, "trustworthiness") + + +def _global_data_results_path(scenario_name): + # CFL/global metrics are written in the shared data_results.csv file. + return os.path.join(_trustworthiness_dir(scenario_name), "data_results.csv") + + +def _participant_data_results_path(scenario_name, participant_id): + # DFL/SDFL participant metrics are written in participant-specific CSV files. + return os.path.join(_trustworthiness_dir(scenario_name), f"data_results_{participant_id}.csv") + + +def _read_global_results(scenario_name): + # Load the aggregate scenario metrics once and let callers pick the columns they need. + return read_csv(_global_data_results_path(scenario_name)) + + +def _read_participant_results(scenario_name, participant_id): + # Load local metrics for one participant. + return read_csv(_participant_data_results_path(scenario_name, participant_id)) + + +def _find_participant_row(data, participant_id, source_name): + # Match both string and integer IDs because CSV typing can vary between runs. 
+ row = data[data["id"] == participant_id] + if row.empty: + row = _find_participant_row_by_int_id(data, participant_id) + + if row.empty: + raise ValueError(f"Participant {participant_id} not found in {source_name}") + + return row.iloc[0] + + +def _find_participant_row_by_int_id(data, participant_id): + # Retry numeric participant IDs when pandas read the id column as integers. + try: + return data[data["id"] == int(participant_id)] + except (TypeError, ValueError): + return data.iloc[0:0] + + +def _client_count(data): + # Global CSVs include the server row, so client averages exclude one row. + return len(_client_rows(data)) + + +def _client_rows(data): + # CFL writes client reports first and appends the server row last. + return data.iloc[:-1] if len(data) > 1 else data + + +def _mean_client_column(data, column_name): + # Average a global metric across clients while keeping the historical server-row exclusion. + clients = _client_rows(data) + return clients[column_name].sum() / max(1, len(clients)) + + +def get_bytes_model(model): + # Serialize the model state_dict to measure the bytes that would be transmitted. + buffer = io.BytesIO() + torch.save(model.state_dict(), buffer) + return buffer.tell() + + +def get_bytes_sent_recv(scenario_name): + # Return total and average upload/download bytes from aggregate scenario results. + data = _read_global_results(scenario_name) + number_files = len(data) + + total_upload_bytes = int(data["bytes_sent"].sum()) + total_download_bytes = int(data["bytes_recv"].sum()) + + avg_upload_bytes = total_upload_bytes / number_files + avg_download_bytes = total_download_bytes / number_files + + return total_upload_bytes, total_download_bytes, avg_upload_bytes, avg_download_bytes + + +def get_avg_loss_accuracy(scenario_name): + # Return client-average test loss, accuracy, accuracy std, macro F1 and train accuracy. 
+ data = _read_global_results(scenario_name) + clients = _client_rows(data) + + avg_loss = _mean_client_column(data, "loss") + avg_accuracy = _mean_client_column(data, "accuracy") + std_accuracy = statistics.stdev(clients["accuracy"]) if len(clients) > 1 else 0.0 + avg_macro_f1 = _mean_client_column(data, "macro_f1") + avg_train_accuracy = _mean_client_column(data, "train_accuracy") + + return avg_loss, avg_accuracy, std_accuracy, avg_macro_f1, avg_train_accuracy + + +def get_underfitting_score(scenario_name, participant_id): + # CFL underfitting uses the average validation accuracy across client rows. + data = _read_global_results(scenario_name) + return _mean_client_column(data, "val_accuracy") + + +def get_participant_loss_accuracy(scenario_name, participant_id): + # Read one participant's final CFL loss and accuracy from the aggregate CSV. + data_file = _global_data_results_path(scenario_name) + row = _find_participant_row(read_csv(data_file), participant_id, data_file) + return float(row["loss"]), float(row["accuracy"]) + + +def get_underfitting_score_local(scenario_name, participant_id): + # DFL/SDFL underfitting uses the participant-local validation accuracy. + data = _read_participant_results(scenario_name, participant_id) + return float(data["val_accuracy"].iloc[0]) + + +def get_dp_local(scenario_name, participant_id): + # Return DP settings stored by a single DFL/SDFL participant. + data = _read_participant_results(scenario_name, participant_id) + return data["dp_enabled"].iloc[0], float(data["dp_epsilon"].iloc[0]) + + +def get_dp_global(scenario_name): + # Return CFL DP settings, averaging epsilon across client rows when DP is enabled. 
+ data = _read_global_results(scenario_name) + clients = _client_rows(data) + + if clients["dp_enabled"].iloc[0] == False: + return False, 0.0 + + return True, _mean_client_column(data, "dp_epsilon") + + +def get_avg_class_imbalance_model_size(scenario_name): + # Return average class imbalance and model size across client rows. + data = _read_global_results(scenario_name) + clients = _client_rows(data) + number_files = max(1, len(clients)) + + avg_class_imbalance = clients["class_imbalance"].sum() / number_files + avg_model_size = clients["model_size"].sum() / number_files + + return avg_class_imbalance, avg_model_size + + +def get_entropy_list(scenario_name): + # Return client entropy values so callers can normalize the distribution. + data = _read_global_results(scenario_name) + return _client_rows(data)["local_entropy"].tolist() + + +def stop_emissions_tracking_and_save( + tracker: EmissionsTracker, + outdir: str, + emissions_file: str, + role: str, + workload: str, + sample_size: int = 0, + participant_idx=None, +): + # Stop CodeCarbon tracking and append the final emissions row to emissions.csv. + tracker.stop() + + emissions_path = os.path.join(outdir, emissions_file) + df = _read_or_create_emissions_dataframe(emissions_path) + + try: + row = _build_emissions_row(tracker, role, workload, sample_size, participant_idx) + df = pd.concat([df, pd.DataFrame(row)], ignore_index=True) + df.to_csv(emissions_path, encoding="utf-8", index=False) + except Exception as e: + logger.warning(e) + + +def _read_or_create_emissions_dataframe(emissions_path): + # Reuse the existing file when present, otherwise create the expected columns. 
+ if os.path.exists(emissions_path): + return pd.read_csv(emissions_path) + + return pd.DataFrame( + columns=[ + "id", + "role", + "energy_grid", + "emissions", + "workload", + "CPU_model", + "GPU_model", + ] + ) + + +def _build_emissions_row(tracker, role, workload, sample_size, participant_idx): + # Convert CodeCarbon's final data object into the CSV row persisted by trustworthiness. + emissions_data = tracker.final_emissions_data + energy_grid = (emissions_data.emissions / emissions_data.energy_consumed) * 1000 + + return { + "id": participant_idx, + "role": role, + "energy_grid": [energy_grid], + "emissions": [emissions_data.emissions], + "workload": workload, + "CPU_model": emissions_data.cpu_model if emissions_data.cpu_model else "None", + "GPU_model": emissions_data.gpu_model if emissions_data.gpu_model else "None", + "CPU_used": bool(emissions_data.cpu_energy), + "GPU_used": bool(emissions_data.gpu_energy), + "energy_consumed": emissions_data.energy_consumed, + "sample_size": sample_size, + } + + +def comm_efficiency(bytes_up: int, bytes_down: int, test_acc_avg: float, eps: float = 1e-12) -> float: + # Communication efficiency is total transferred bytes divided by final accuracy. + total_bytes = float(bytes_up) + float(bytes_down) + accuracy = max(float(test_acc_avg), eps) + return total_bytes / accuracy diff --git a/nebula/addons/trustworthiness/helpers/scoring.py b/nebula/addons/trustworthiness/helpers/scoring.py new file mode 100644 index 000000000..955bf5421 --- /dev/null +++ b/nebula/addons/trustworthiness/helpers/scoring.py @@ -0,0 +1,125 @@ +import logging + +import numpy as np + +logger = logging.getLogger(__name__) + + +def _is_number(value): + # Score calculations expect real numeric values; booleans are handled explicitly. + return isinstance(value, (int, float, np.number)) and not isinstance(value, bool) + + +def _warn_not_number(value): + # Keep the warning format consistent across all numeric scoring functions. 
+ logger.warning("Input value is not a number") + logger.warning(f"{value}") + + +def get_mapped_score(score_key, score_map): + # Normalize the configured score map and return the normalized value for the input key. + if score_map is None: + logger.warning("Score map is missing") + return 0 + + normalized_scores = get_normalized_scores(list(score_map.values())) + normalized_score_map = dict(zip(score_map.keys(), normalized_scores, strict=False)) + return normalized_score_map.get(score_key, np.nan) + + +def get_normalized_scores(scores): + # Convert a list of raw configured scores to the [0, 1] range. + if scores is None or len(scores) == 0: + return [] + + min_score = np.min(scores) + max_score = np.max(scores) + if max_score == min_score: + return [1.0 for _ in scores] + + return [(score - min_score) / (max_score - min_score) for score in scores] + + +def get_range_score(value, ranges, direction="asc"): + # Place the value in one of the configured bins and normalize that bin index. + if not _is_number(value): + _warn_not_number(value) + return 0 + + if ranges is None: + logger.warning("Score ranges are missing") + return 0 + + total_bins = len(ranges) + 1 + bin_index = np.digitize(value, ranges, right=True) + score = bin_index / total_bins + return 1 - score if direction == "desc" else score + + +def get_map_value_score(score_key, score_map): + # Return the exact configured score for maps that already store normalized values. + if score_map is None: + logger.warning("Score map is missing") + return 0 + + return score_map[score_key] + + +def get_true_score(value, direction): + # Booleans are direct scores; numeric values can be inverted for descending metrics. 
+ if value is True: + return 1 + if value is False: + return 0 + + if not _is_number(value): + _warn_not_number(value) + return 0 + + return 1 - value if direction == "desc" else value + + +def get_scaled_score(value, scale: list, direction: str): + # Clamp a metric from its configured scale into the [0, 1] score range. + if value is None or value == "": + logger.warning("Score value is missing. Set value to zero") + return 0 + + if not _is_number(value): + _warn_not_number(value) + return 0 + + value_min, value_max = _get_scale_bounds(scale) + if value_max == value_min: + score = 1 + elif value >= value_max: + score = 1 + elif value <= value_min: + score = 0 + else: + score = (float(value) - value_min) / (value_max - value_min) + + return 1 - score if direction == "desc" else score + + +def _get_scale_bounds(scale): + # Fall back to the default [0, 1] scale when the config is incomplete. + try: + return scale[0], scale[1] + except (TypeError, IndexError): + logger.warning("Score minimum or score maximum is missing. The minimum has been set to 0 and the maximum to 1") + return 0, 1 + + +def get_value(value): + # Factsheet operations use this when a metric only needs the raw input value. + return value + + +def check_properties(*args): + # Return the fraction of required properties that are filled. + if not args: + return 0 + + filled = [value is not None and value != "" for value in args] + return np.mean(filled) diff --git a/nebula/addons/trustworthiness/helpers/trust_reports.py b/nebula/addons/trustworthiness/helpers/trust_reports.py new file mode 100644 index 000000000..11e09208d --- /dev/null +++ b/nebula/addons/trustworthiness/helpers/trust_reports.py @@ -0,0 +1,210 @@ +import copy +import json +import os + +SCORE_KEYS = {"trust_score", "score"} +NAMED_ENTRY_KEYS = {"score", "metrics", "notions", "pillars"} +NAMED_ENTRY_PATH_KEY = "__named_entry__" + + +def _logs_dir() -> str: + # Return the configured logs directory required by trust report exchange. 
+ logs_dir = os.environ.get("NEBULA_LOGS_DIR") + if not logs_dir: + raise ValueError("The NEBULA_LOGS_DIR environment variable is not defined.") + return logs_dir + + +def _trustworthiness_dir(scenario_name: str) -> str: + # Return the scenario trustworthiness directory used by report JSON files. + return os.path.join(_logs_dir(), scenario_name, "trustworthiness") + + +def _trust_report_path(scenario_name: str, participant_id: int | str) -> str: + # Return the local trust report path for one participant. + return os.path.join(_trustworthiness_dir(scenario_name), f"nebula_trust_results_{participant_id}.json") + + +def _read_json_file(file_path: str) -> dict: + # Load a JSON object and raise clear errors for missing or invalid files. + if not os.path.exists(file_path): + raise FileNotFoundError(f"The file does not exist: {file_path}") + + try: + with open(file_path, "r", encoding="utf-8") as file: + return json.load(file) + except json.JSONDecodeError as error: + raise ValueError(f"The file does not contain valid JSON: {file_path}") from error + + +def _write_json_file(file_path: str, data: dict) -> str: + # Write a formatted JSON object, creating the parent directory if needed. + directory = os.path.dirname(file_path) + if directory: + os.makedirs(directory, exist_ok=True) + + with open(file_path, "w", encoding="utf-8") as file: + json.dump(data, file, indent=4) + + return file_path + + +def _is_score_entry(key, value) -> bool: + # Trust report scores are numeric values stored under score-like keys. + return key in SCORE_KEYS and _is_numeric_score(value) + + +def load_trust_report_json_dumped(scenario_name: str, participant_id: int) -> str: + # Load one participant report and return it serialized for network messages. + return json.dumps(load_trust_report_json(scenario_name, participant_id)) + + +def load_trust_report_json(scenario_name: str, participant_id: int | str) -> dict: + # Load one participant trustworthiness report as a dictionary. 
+ return _read_json_file(_trust_report_path(scenario_name, participant_id)) + + +def create_local_trust_report_copy(scenario_name: str, participant_id: int | str, suffix: str = "global") -> tuple[dict, str]: + # Copy a participant report to a local aggregation output file. + trust_report = load_trust_report_json(scenario_name, participant_id) + file_path = os.path.join( + _trustworthiness_dir(scenario_name), + f"nebula_trust_results_{participant_id}_{suffix}.json", + ) + + return trust_report, _write_json_file(file_path, trust_report) + + +def save_trust_report_json(file_path: str, trust_report: dict) -> str: + # Save a trust report and return the written file path. + return _write_json_file(file_path, trust_report) + + +def accumulate_weighted_trustscores(report: dict, weight: float, score_accumulator: dict, weight_accumulator: dict): + # Add all score values from a report into weighted accumulators. + if weight <= 0: + raise ValueError("The aggregation weight must be greater than 0.") + + _accumulate_weighted_trustscores_recursive( + obj=report, + weight=float(weight), + path=(), + score_accumulator=score_accumulator, + weight_accumulator=weight_accumulator, + ) + + +def build_weighted_trustscores_report(template_report: dict, score_accumulator: dict, weight_accumulator: dict) -> dict: + # Return a deep-copied report with every score replaced by its weighted mean. + aggregated_report = copy.deepcopy(template_report) + _apply_weighted_trustscores_recursive( + obj=aggregated_report, + path=(), + score_accumulator=score_accumulator, + weight_accumulator=weight_accumulator, + ) + return aggregated_report + + +def _accumulate_weighted_trustscores_recursive(obj, weight: float, path: tuple, score_accumulator: dict, weight_accumulator: dict): + # Walk a trust report and accumulate weighted sums for every score path. 
+ if isinstance(obj, dict): + named_entry = _get_structural_named_entry(obj) + if named_entry is not None: + _, nested_value = named_entry + _accumulate_weighted_trustscores_recursive( + obj=nested_value, + weight=weight, + path=path + (NAMED_ENTRY_PATH_KEY,), + score_accumulator=score_accumulator, + weight_accumulator=weight_accumulator, + ) + return + + for key, value in obj.items(): + score_path = path + (key,) + if _is_score_entry(key, value): + score_accumulator[score_path] = score_accumulator.get(score_path, 0.0) + (float(value) * weight) + weight_accumulator[score_path] = weight_accumulator.get(score_path, 0.0) + weight + continue + + _accumulate_weighted_trustscores_recursive( + obj=value, + weight=weight, + path=score_path, + score_accumulator=score_accumulator, + weight_accumulator=weight_accumulator, + ) + return + + if isinstance(obj, list): + for index, item in enumerate(obj): + _accumulate_weighted_trustscores_recursive( + obj=item, + weight=weight, + path=path + (index,), + score_accumulator=score_accumulator, + weight_accumulator=weight_accumulator, + ) + + +def _apply_weighted_trustscores_recursive(obj, path: tuple, score_accumulator: dict, weight_accumulator: dict): + # Walk a report copy and replace score values with weighted averages. 
+ if isinstance(obj, dict): + named_entry = _get_structural_named_entry(obj) + if named_entry is not None: + entry_key, nested_value = named_entry + obj[entry_key] = _apply_weighted_trustscores_recursive( + obj=nested_value, + path=path + (NAMED_ENTRY_PATH_KEY,), + score_accumulator=score_accumulator, + weight_accumulator=weight_accumulator, + ) + return obj + + for key, value in obj.items(): + score_path = path + (key,) + if _is_score_entry(key, value): + total_weight = weight_accumulator.get(score_path) + if total_weight: + obj[key] = round(score_accumulator[score_path] / total_weight, 6) + continue + + obj[key] = _apply_weighted_trustscores_recursive( + obj=value, + path=score_path, + score_accumulator=score_accumulator, + weight_accumulator=weight_accumulator, + ) + return obj + + if isinstance(obj, list): + for index, item in enumerate(obj): + obj[index] = _apply_weighted_trustscores_recursive( + obj=item, + path=path + (index,), + score_accumulator=score_accumulator, + weight_accumulator=weight_accumulator, + ) + + return obj + + +def _get_structural_named_entry(obj: dict): + # Detect wrappers like {"Privacy": {"score": ..., "metrics": ...}}. + if len(obj) != 1: + return None + + entry_key, nested_value = next(iter(obj.items())) + if not isinstance(nested_value, dict): + return None + + if any(key in nested_value for key in NAMED_ENTRY_KEYS): + return entry_key, nested_value + + return None + + +def _is_numeric_score(value): + # Booleans are ints in Python, but they are not trust score values here. 
+ return isinstance(value, (int, float)) and not isinstance(value, bool) diff --git a/nebula/addons/trustworthiness/metric.py b/nebula/addons/trustworthiness/metric.py index 0952576b3..f1e453235 100755 --- a/nebula/addons/trustworthiness/metric.py +++ b/nebula/addons/trustworthiness/metric.py @@ -4,23 +4,43 @@ from nebula.addons.trustworthiness.graphics import Graphics from nebula.addons.trustworthiness.pillar import TrustPillar -from nebula.addons.trustworthiness.utils import write_results_json +from nebula.addons.trustworthiness.helpers.csv_io import write_results_json dirname = os.path.dirname(__file__) logger = logging.getLogger(__name__) +def get_eval_metrics_file(federation_prefix, factsheet, default_file_name): + data_type = str(factsheet.get("data", {}).get("type", "")).strip().lower() + + if data_type not in {"images", "tabular"}: + return os.path.join(dirname, "configs", default_file_name) + + metrics_file_name = f"eval_metrics_{federation_prefix}_{data_type}.json" + metrics_file = os.path.join(dirname, "configs", metrics_file_name) + + return metrics_file if os.path.exists(metrics_file) else os.path.join(dirname, "configs", default_file_name) + + class TrustMetricManager: """ Manager class to help store the output directory and handle calls from the FL framework. 
""" - def __init__(self, scenario_start_time): - self.factsheet_file_nm = "factsheet.json" - self.eval_metrics_file_nm = "eval_metrics.json" - self.nebula_trust_results_nm = "nebula_trust_results.json" - self.scenario_start_time = scenario_start_time + def __init__(self, scenario_start_time, federation, participant=None): + if federation == "DFL" or federation == "SDFL": + self.federation_prefix = "dfl" + self.factsheet_file_nm = f"factsheet_participant_{participant}.json" + self.eval_metrics_file_nm = "eval_metrics_dfl.json" + self.nebula_trust_results_nm = f"nebula_trust_results_{participant}.json" + self.scenario_start_time = scenario_start_time + else: + self.federation_prefix = "cfl" + self.factsheet_file_nm = "factsheet.json" + self.eval_metrics_file_nm = "eval_metrics_cfl.json" + self.nebula_trust_results_nm = "nebula_trust_results.json" + self.scenario_start_time = scenario_start_time def evaluate(self, experiment_name, weights, use_weights=False): """ @@ -34,19 +54,22 @@ def evaluate(self, experiment_name, weights, use_weights=False): # Get scenario name scenario_name = experiment_name factsheet_file = os.path.join(os.environ.get('NEBULA_LOGS_DIR'), scenario_name, "trustworthiness", self.factsheet_file_nm) - metrics_cfg_file = os.path.join(dirname, "configs", self.eval_metrics_file_nm) results_file = os.path.join(os.environ.get('NEBULA_LOGS_DIR'), scenario_name, "trustworthiness", self.nebula_trust_results_nm) if not os.path.exists(factsheet_file): logger.error(f"{factsheet_file} is missing! Please check documentation.") return + with open(factsheet_file, "r") as f: + factsheet = json.load(f) + + metrics_cfg_file = get_eval_metrics_file(self.federation_prefix, factsheet, self.eval_metrics_file_nm) + if not os.path.exists(metrics_cfg_file): logger.error(f"{metrics_cfg_file} is missing! 
Please check documentation.") return - with open(factsheet_file, "r") as f, open(metrics_cfg_file, "r") as m: - factsheet = json.load(f) + with open(metrics_cfg_file, "r") as m: metrics_cfg = json.load(m) metrics = metrics_cfg.items() input_docs = {"factsheet": factsheet} @@ -55,7 +78,7 @@ def evaluate(self, experiment_name, weights, use_weights=False): final_score = 0 result_print = [] for key, value in metrics: - pillar = TrustPillar(key, value, input_docs, use_weights) + pillar = TrustPillar(key, value, input_docs, use_weights, user_weights=weights) score, result = pillar.evaluate() weight = weights.get(key) / 100 final_score += weight * score @@ -64,6 +87,58 @@ def evaluate(self, experiment_name, weights, use_weights=False): final_score = round(final_score, 2) result_json["trust_score"] = final_score write_results_json(results_file, result_json) - + graphics = Graphics(self.scenario_start_time, scenario_name) graphics.graphics() + + def evaluate_participant(self, experiment_name, weights, participant_id, use_weights=False): + """ + Evaluates the trustworthiness score. + + Args: + scenario (object): The scenario in which the trustworthiness will be calculated. + weights (dict): The desired weights of the pillars. + use_weights (bool): True to turn on the weights in the metric config file, defaults to False. + """ + # Get scenario name + scenario_name = experiment_name + factsheet_file = os.path.join(os.environ.get('NEBULA_LOGS_DIR'), scenario_name, "trustworthiness", self.factsheet_file_nm) + results_file = os.path.join(os.environ.get('NEBULA_LOGS_DIR'), scenario_name, "trustworthiness", self.nebula_trust_results_nm) + + if not os.path.exists(factsheet_file): + logger.error(f"{factsheet_file} is missing! 
Please check documentation.") + return + + with open(factsheet_file, "r") as f: + factsheet = json.load(f) + + metrics_cfg_file = get_eval_metrics_file(self.federation_prefix, factsheet, self.eval_metrics_file_nm) + + if not os.path.exists(metrics_cfg_file): + logger.error(f"{metrics_cfg_file} is missing! Please check documentation.") + return + + with open(metrics_cfg_file, "r") as m: + raw_metrics_cfg: str = m.read() + raw_metrics_cfg = raw_metrics_cfg.replace("factsheet", f"factsheet_participant_{participant_id}") + metrics_cfg = json.loads(raw_metrics_cfg) + + metrics = metrics_cfg.items() + input_docs = {f"factsheet_participant_{participant_id}": factsheet} + + result_json = {"trust_score": 0, "pillars": []} + final_score = 0 + result_print = [] + for key, value in metrics: + pillar = TrustPillar(key, value, input_docs, use_weights, user_weights=weights) + score, result = pillar.evaluate() + weight = weights.get(key) / 100 + final_score += weight * score + result_print.append([key, score]) + result_json["pillars"].append(result) + final_score = round(final_score, 2) + result_json["trust_score"] = final_score + write_results_json(results_file, result_json) + + graphics = Graphics(self.scenario_start_time, scenario_name, participant_id) + graphics.graphics_dfl(participant_id) diff --git a/nebula/addons/trustworthiness/per_round_metrics.py b/nebula/addons/trustworthiness/per_round_metrics.py new file mode 100644 index 000000000..ea5a2ff3d --- /dev/null +++ b/nebula/addons/trustworthiness/per_round_metrics.py @@ -0,0 +1,89 @@ +# nebula/addons/trustworthiness/per_round_metrics.py +from __future__ import annotations + +import asyncio +import csv +import os +from dataclasses import dataclass, field +from typing import Optional + + +from nebula.addons.functions import print_msg_box + + +def _safe_get_round(engine) -> int: + trainer = getattr(engine, "trainer", None) + if trainer is None: + return -1 + + try: + return int(trainer.get_round()) + except Exception: + 
return int(getattr(trainer, "round", -1)) + + +@dataclass +class PerRoundTrustMetrics: + experiment_name: str + participant_idx: int + trust_dir: str + role_label: str + + enable_print: bool = True + enable_csv: bool = True + + _csv_path: str = field(init=False) + _prev_acc: Optional[float] = field(default=None, init=False) + _lock: asyncio.Lock = field(default_factory=asyncio.Lock, init=False) + + async def setup(self, engine) -> None: + os.makedirs(self.trust_dir, exist_ok=True) + self._csv_path = os.path.join( + self.trust_dir, f"round_metrics_participant_{self.participant_idx}.csv" + ) + + if self.enable_csv and not os.path.exists(self._csv_path): + with open(self._csv_path, "w", newline="") as f: + w = csv.writer(f) + w.writerow([ + "round", + "participant", + "role", + "loss", + "accuracy", + "tw_stability", + ]) + async def on_test_metrics(self, engine, loss: float, acc: float) -> None: + async with self._lock: + round_id = _safe_get_round(engine) + + if self._prev_acc is None: + tw_stability = 1.0 + else: + tw_stability = 1.0 - abs(acc - self._prev_acc) + tw_stability = max(0.0, min(1.0, tw_stability)) + self._prev_acc = acc + + if self.enable_csv: + with open(self._csv_path, "a", newline="") as f: + w = csv.writer(f) + w.writerow([ + round_id, + self.participant_idx, + self.role_label, + float(loss), + float(acc), + float(tw_stability), + ]) + + if self.enable_print: + print_msg_box( + msg=( + f"Round: {round_id}\n" + f"Loss: {loss:.4f}\n" + f"Accuracy: {acc:.4f}\n" + f"TW/Stability: {tw_stability:.4f}\n" + ), + indent=2, + title=f"Trustworthiness (per-round) | {self.role_label} | Participant: {self.participant_idx}", + ) diff --git a/nebula/addons/trustworthiness/pillar.py b/nebula/addons/trustworthiness/pillar.py index 1a780cc5b..ecd15cf7e 100755 --- a/nebula/addons/trustworthiness/pillar.py +++ b/nebula/addons/trustworthiness/pillar.py @@ -1,7 +1,13 @@ import logging -from nebula.addons.trustworthiness import calculation -from 
nebula.addons.trustworthiness.utils import get_input_value +from nebula.addons.trustworthiness.helpers.factsheet_values import get_input_value +from nebula.addons.trustworthiness.helpers.scoring import ( + get_map_value_score, + get_mapped_score, + get_range_score, + get_scaled_score, + get_true_score, +) logger = logging.getLogger(__name__) @@ -18,12 +24,13 @@ class TrustPillar: """ - def __init__(self, name, metrics, input_docs, use_weights=False): + def __init__(self, name, metrics, input_docs, use_weights=False, user_weights=None): self.name = name self.input_docs = input_docs self.metrics = metrics self.result = [] self.use_weights = use_weights + self.user_weights = user_weights or {} def evaluate(self): """ @@ -35,11 +42,22 @@ def evaluate(self): score = 0 avg_weight = 1 / len(self.metrics) for key, value in self.metrics.items(): - weight = value.get("weight", avg_weight) if self.use_weights else avg_weight + weight = self._get_notion_weight(key, value, avg_weight) if self.use_weights else avg_weight score += weight * self.get_notion_score(key, value.get("metrics")) score = round(score, 2) return score, {self.name: {"score": score, "notions": self.result}} + def _get_notion_weight(self, notion_name, notion_config, avg_weight): + """ + Resolve the weight for a notion. + + Scenario-defined notion weights are stored as percentages in scenario.json. + When present, they must override the defaults from the metrics config. + """ + if notion_name in self.user_weights: + return float(self.user_weights[notion_name]) / 100 + return notion_config.get("weight", avg_weight) + def get_notion_score(self, name, metrics): """ Evaluate the trust score for the notion. 
@@ -84,15 +102,15 @@ def get_metric_score(self, result, name, metric): logger.warning(f"{name} input value is null") else: if score_type == "true_score": - score = calculation.get_true_score(input_value, metric.get("direction")) + score = get_true_score(input_value, metric.get("direction")) elif score_type == "score_mapping": - score = calculation.get_mapped_score(input_value, metric.get("score_map")) + score = get_mapped_score(input_value, metric.get("score_map")) elif score_type == "ranges": - score = calculation.get_range_score(input_value, metric.get("ranges"), metric.get("direction")) + score = get_range_score(input_value, metric.get("ranges"), metric.get("direction")) elif score_type == "score_map_value": - score = calculation.get_map_value_score(input_value, metric.get("score_map")) + score = get_map_value_score(input_value, metric.get("score_map")) elif score_type == "scaled_score": - score = calculation.get_scaled_score(input_value, metric.get("scale"), metric.get("direction")) + score = get_scaled_score(input_value, metric.get("scale"), metric.get("direction")) elif score_type == "property_check": score = 0 if input_value is None else input_value diff --git a/nebula/addons/trustworthiness/trustworthiness.py b/nebula/addons/trustworthiness/trustworthiness.py index 1eaa17c6a..39778c22c 100644 --- a/nebula/addons/trustworthiness/trustworthiness.py +++ b/nebula/addons/trustworthiness/trustworthiness.py @@ -1,15 +1,48 @@ import logging +import asyncio from nebula.addons.functions import print_msg_box -from nebula.core.nebulaevents import ExperimentFinishEvent, RoundEndEvent, TestMetricsEvent +from nebula.core.nebulaevents import AggregationEvent, ExperimentFinishEvent, RoundStartEvent, TestMetricsEvent, ValidationMetricsEvent from nebula.core.eventmanager import EventManager from nebula.core.noderole import Role, ServerRoleBehavior from abc import ABC, abstractmethod from nebula.config.config import Config from nebula.core.engine import Engine -import pickle 
-from nebula.addons.trustworthiness.calculation import stop_emissions_tracking_and_save -from nebula.addons.trustworthiness.utils import save_results_csv +from nebula.addons.trustworthiness.helpers.csv_io import ( + load_data_results_participant, + load_emissions_participant, + save_emissions_csv_cfl, + save_results_csv, + save_results_csv_cfl, + save_trustworthiness_reports_csv, +) +from nebula.addons.trustworthiness.helpers.data_distribution import ( + get_class_imbalance_local, + get_local_entropy, + get_participation_variation_score, + save_class_count_per_participant, +) +from nebula.addons.trustworthiness.helpers.scenario_metrics import ( + get_bytes_model, + stop_emissions_tracking_and_save, +) +from nebula.addons.trustworthiness.helpers.trust_reports import ( + accumulate_weighted_trustscores, + build_weighted_trustscores_report, + create_local_trust_report_copy, + load_trust_report_json_dumped, + save_trust_report_json, +) from codecarbon import EmissionsTracker +from nebula.addons.trustworthiness.per_round_metrics import PerRoundTrustMetrics +from datetime import datetime +from nebula.addons.trustworthiness.cfl_factsheet import CflFactsheet +from nebula.addons.trustworthiness.metric import TrustMetricManager +from nebula.addons.trustworthiness.dfl_factsheet import DflFactsheet +from nebula.addons.trustworthiness.graphics import Graphics +from nebula.addons.trustworthiness.weights import load_trust_weights +import json +import os +from nebula.core.network.communications import CommunicationsManager """ ############################## # TRUST WORKLOADS # @@ -22,191 +55,889 @@ class TrustWorkloadException(Exception): class TrustWorkload(ABC): @abstractmethod async def init(self, experiment_name): + # Initialize workload resources and event subscriptions. raise NotImplementedError - + @abstractmethod def get_workload(self) -> str: + # Return the workload label persisted in trustworthiness outputs. 
raise NotImplementedError - + @abstractmethod def get_sample_size(self) -> float: + # Return the local sample size used by the workload. raise NotImplementedError - - abstractmethod - def get_metrics(self) -> tuple[float, float]: + + @abstractmethod + def get_metrics(self) -> tuple[float, float, float]: + # Return the latest test loss, accuracy and macro F1. raise NotImplementedError - + @abstractmethod async def finish_experiment_role_pre_actions(self): + # Run role-specific work before final metrics are persisted. raise NotImplementedError - + @abstractmethod async def finish_experiment_role_post_actions(self, trust_config, experiment_name): + # Run role-specific work after final metrics are persisted. raise NotImplementedError -class TrustWorkloadTrainer(TrustWorkload): - def __init__(self, engine, idx, trust_files_route): +class BaseTrustWorkload(TrustWorkload): + def __init__(self, engine: Engine, idx, trust_files_route, workload: str, role_label: str, sample_size=None, start_time=None): + # Store shared workload state used by trainers and servers. 
self._engine: Engine = engine - self._workload = 'training' + self._workload = workload self._idx = idx self._trust_files_route = trust_files_route - self._train_loader_file = f'{self._trust_files_route}/participant_{self._idx}_train_loader.pk' - self._sample_size = None + self._sample_size = sample_size self._current_loss = None self._current_accuracy = None + self._current_macro_f1 = None + self._current_val_loss = None + self._current_val_accuracy = None + self._current_train_accuracy = None self._experiment_name = "" - + self._per_round = None + self._role_label = role_label + self._start_time = start_time or datetime.now().strftime("%d/%m/%Y %H:%M:%S") + self._end_time = None + self._round_participation_counts = {} + self._dropout_expected_total = 0 + self._dropout_missing_total = 0 + self._aggregation_rounds_total = 0 + self._timed_out_rounds_total = 0 + async def init(self, experiment_name): + # Subscribe to the events needed to build final trust summaries. self._experiment_name = experiment_name - await EventManager.get_instance().subscribe_node_event(RoundEndEvent, self._process_round_end_event) + await EventManager.get_instance().subscribe_node_event(AggregationEvent, self._process_aggregation_event) + await EventManager.get_instance().subscribe_node_event(RoundStartEvent, self._process_round_start_event) await EventManager.get_instance().subscribe_addonevent(TestMetricsEvent, self._process_test_metrics_event) - await EventManager.get_instance().subscribe_node_event(ExperimentFinishEvent, self._process_experiment_finished_event) - await self._create_pk_files(experiment_name) - - async def _create_pk_files(self, experiment_name): - # Save data to local files to calculate the trustworthyness - train_loader_filename = f"/nebula/app/logs/{experiment_name}/trustworthiness/participant_{self._idx}_train_loader.pk" - test_loader_filename = f"/nebula/app/logs/{experiment_name}/trustworthiness/participant_{self._idx}_test_loader.pk" - 
self._engine.trainer.datamodule.setup(stage="fit") - train_loader = self._engine.trainer.datamodule.train_dataloader() - self._engine.trainer.datamodule.setup(stage="test") - test_loader = self._engine.trainer.datamodule.test_dataloader()[0] - - with open(train_loader_filename, 'wb') as f: - pickle.dump(train_loader, f) - f.close() - with open(test_loader_filename, 'wb') as f: - pickle.dump(test_loader, f) - f.close() - + await EventManager.get_instance().subscribe_addonevent(ValidationMetricsEvent, self._process_validation_metrics_event) + + self._per_round = PerRoundTrustMetrics( + experiment_name=experiment_name, + participant_idx=self._idx, + trust_dir=self._trust_files_route, + role_label=self._role_label, + enable_print=True, + enable_csv=True, + ) + await self._per_round.setup(self._engine) + def get_workload(self): + # Return the workload name associated with this node role. return self._workload - + def get_sample_size(self): + # Return the sample size captured by the role pre-actions. return self._sample_size - + def get_metrics(self): - return (self._current_loss, self._current_accuracy) - + # Return the latest test metrics observed through events. + return (self._current_loss, self._current_accuracy, self._current_macro_f1) + + def get_validation_metrics(self): + # Return the latest validation metrics and train accuracy observed through events. + return (self._current_val_loss, self._current_val_accuracy, self._current_train_accuracy) + + def _is_reputation_enabled(self) -> bool: + # Read the reputation toggle from the participant defense config. + defense_args = self._engine.config.participant.get("defense_args", {}) + reputation_config = defense_args.get("reputation", {}) + return bool(reputation_config.get("enabled", False)) + + def _get_reputation_system(self): + # Return the reputation system attached to the engine, when present. 
+ return getattr(self._engine, "_reputation", None) + + def _get_reputation_trust_summary(self) -> dict: + # Build the reputation fields added to the trust factsheet. + summary = { + "reputation_enabled": self._is_reputation_enabled(), + "avg_neighbor_reputation": 0.0, + } + if hasattr(self, "_expected_trustscores_sources"): + summary["neighbor_num"] = len(self._expected_trustscores_sources) + + if not summary["reputation_enabled"]: + return summary + + reputation_system = self._get_reputation_system() + reputation_values = [] + if reputation_system is not None: + for addr, data in reputation_system.reputation.items(): + if addr == self._engine.addr: + continue + + reputation_value = data.get("reputation") + if reputation_value is not None: + reputation_values.append(float(reputation_value)) + + if reputation_values: + summary["avg_neighbor_reputation"] = sum(reputation_values) / len(reputation_values) + else: + reputation_config = self._engine.config.participant.get("defense_args", {}).get("reputation", {}) + summary["avg_neighbor_reputation"] = float(reputation_config.get("initial_reputation", 0.0) or 0.0) + + return summary + + def _get_participation_trust_summary(self) -> dict: + # Build the participation variability fields added to the trust factsheet. + total_clients = int(self._engine.config.participant["scenario_args"]["n_nodes"]) - 1 + counts = list(self._round_participation_counts.values()) + + if len(counts) < total_clients: + counts.extend([0] * (total_clients - len(counts))) + + return { + "selection_cv": get_participation_variation_score(counts), + } + + def _get_system_reliability_summary(self) -> dict: + # Build dropout and timeout rates from aggregation events. 
+ dropout_rate = 0.0 + if self._dropout_expected_total > 0: + dropout_rate = self._dropout_missing_total / self._dropout_expected_total + + timeout_rate = 0.0 + if self._aggregation_rounds_total > 0: + timeout_rate = self._timed_out_rounds_total / self._aggregation_rounds_total + + return { + "dropout_rate": float(dropout_rate), + "timeout_rate": float(timeout_rate), + } + + async def _process_round_start_event(self, rse: RoundStartEvent): + # Track how often each peer is expected to participate. + _, _, expected_nodes = await rse.get_event_data() + for node_addr in expected_nodes: + self._round_participation_counts[node_addr] = self._round_participation_counts.get(node_addr, 0) + 1 + + async def _process_aggregation_event(self, age: AggregationEvent): + # Track missing peers and timed-out aggregation rounds. + _, expected_nodes, missing_nodes = await age.get_event_data() + self_addr = self._engine.addr + + expected_without_self = {node for node in expected_nodes if node != self_addr} + missing_without_self = {node for node in missing_nodes if node != self_addr} + + self._aggregation_rounds_total += 1 + self._dropout_expected_total += len(expected_without_self) + self._dropout_missing_total += len(missing_without_self) + if missing_without_self: + self._timed_out_rounds_total += 1 + + async def _process_test_metrics_event(self, tme: TestMetricsEvent): + # Cache final test metrics and forward them to per-round trust metrics. + cur_loss, cur_acc, cur_macro_f1 = await tme.get_event_data() + if cur_loss is not None and cur_acc is not None: + self._current_loss, self._current_accuracy = cur_loss, cur_acc + self._current_macro_f1 = cur_macro_f1 + + if self._per_round is not None: + await self._per_round.on_test_metrics(self._engine, float(cur_loss), float(cur_acc)) + + async def _process_validation_metrics_event(self, vme: ValidationMetricsEvent): + # Cache final validation metrics for final trustworthiness outputs. 
+ cur_loss, cur_acc, train_acc = await vme.get_event_data() + if cur_loss is not None and cur_acc is not None: + self._current_val_loss, self._current_val_accuracy = cur_loss, cur_acc + self._current_train_accuracy = train_acc + + +class TrustWorkloadTrainer(BaseTrustWorkload): + TRUSTSCORES_WAIT_TIMEOUT_SECONDS = 20 + TRUSTSCORES_FORWARDING_GRACE_SECONDS = 1.0 + TRUSTSCORES_FORWARDING_GRACE_MARGIN_SECONDS = 1.0 + + def __init__(self, engine, idx, trust_files_route): + # Initialize trainer-side state for CFL reports and DFL/SDFL trustscores. + super().__init__(engine, idx, trust_files_route, workload="training", role_label="TRAINER") + self._expected_trustscores_sources = set() + self._expected_trustscores_reports = int(self._engine.config.participant["scenario_args"]["n_nodes"]) - 1 + self._received_trustscores_node_ids = set() + self._trustscores_wait_event = None + self._trustscores_score_accumulator = {} + self._trustscores_weight_accumulator = {} + self._trustscores_template_report = None + self._trustscores_local_copy_path = None + self._trustscores_local_report_initialized = False + + async def init(self, experiment_name): + # Reset exchange state before subscribing to shared workload events. + self._reset_trustscores_exchange_state() + self._trustscores_wait_event = asyncio.Event() + await super().init(experiment_name) + async def finish_experiment_role_pre_actions(self): - with open(self._train_loader_file, 'rb') as file: - train_loader = pickle.load(file) + # Capture the training sample size before final trust outputs are written. 
+ self._engine.trainer.datamodule.setup(stage="fit") + train_loader = self._engine.trainer.datamodule.train_dataloader() self._sample_size = len(train_loader) - + async def finish_experiment_role_post_actions(self, trust_config, experiment_name): - pass - - async def _process_round_end_event(self, ree: RoundEndEvent): - scenario_name = self._engine.config.participant["scenario_args"]["name"] - train_model = f"/nebula/app/logs/{scenario_name}/trustworthiness/participant_{self._idx}_train_model.pk" - # Save the train model in trustworthy dir - with open(train_model, 'wb') as f: - pickle.dump(self._engine.trainer.model, f) - - async def _process_test_metrics_event(self, tme: TestMetricsEvent): - cur_loss, cur_acc = await tme.get_event_data() - if cur_loss and cur_acc: - self._current_loss, self._current_accuracy = cur_loss, cur_acc - - async def _process_experiment_finished_event(self, efe:ExperimentFinishEvent): - model_file = f"/nebula/app/logs/{self._experiment_name}/trustworthiness/participant_{self._engine.idx}_final_model.pk" - - # Save model in trustworthy dir - with open(model_file, 'wb') as f: - pickle.dump(self._engine.trainer.model, f) - - - -class TrustWorkloadServer(TrustWorkload): - + # Finish with the report flow required by the selected federation type. + federation = trust_config.get("federation") + + if self._uses_trustscores_exchange(federation): + await self._finish_trustscores_exchange(federation, trust_config, experiment_name) + return + + await self._send_cfl_trustworthiness_report(experiment_name) + + def _uses_trustscores_exchange(self, federation: str | None) -> bool: + # DFL and SDFL share trust reports directly between participants. + return federation in {"DFL", "SDFL"} + + async def _send_cfl_trustworthiness_report(self, experiment_name: str): + # Send the participant trustworthiness report to the CFL server. 
+ cm = CommunicationsManager.get_instance() + server_addr = str(self._engine.config.participant["network_args"]["neighbors"]).strip() + report = self._build_cfl_trustworthiness_report(experiment_name) + + message = cm.create_message( + "trustworthiness", + action="report", + node_id=str(self._idx), + **report, + ) + + self._log_cfl_trustworthiness_report(server_addr, report) + + await cm.send_message( + server_addr, + message, + message_type="trustworthiness", + allow_after_learning_finished=True, + ) + + def _build_cfl_trustworthiness_report(self, experiment_name: str) -> dict: + # Load local metrics and shape them as a trustworthiness message payload. + bytes_sent, bytes_recv, accuracy, loss, val_accuracy, macro_f1, train_accuracy, dp_enabled, dp_epsilon = load_data_results_participant( + experiment_name, + self._idx, + ) + role, energy_grid, emissions, workload, cpu_model, gpu_model, cpu_used, gpu_used, energy_consumed, sample_size = load_emissions_participant( + experiment_name, + self._idx, + ) + + return { + "bytes_sent": bytes_sent, + "bytes_recv": bytes_recv, + "accuracy": accuracy, + "loss": loss, + "role": role, + "energy_grid": energy_grid, + "emissions": emissions, + "workload": workload, + "cpu_model": cpu_model, + "gpu_model": gpu_model, + "cpu_used": cpu_used, + "gpu_used": gpu_used, + "energy_consumed": energy_consumed, + "sample_size": sample_size, + "class_imbalance": get_class_imbalance_local(self._idx, experiment_name), + "model_size": get_bytes_model(self._engine.trainer.model), + "local_entropy": get_local_entropy(self._idx, experiment_name), + "val_accuracy": val_accuracy, + "macro_f1": macro_f1, + "train_accuracy": train_accuracy, + "dp_enabled": dp_enabled, + "dp_epsilon": dp_epsilon, + } + + def _log_cfl_trustworthiness_report(self, server_addr: str, report: dict): + # Log the CFL report with the same fields sent over the network. 
+ logging.info( + "[TW SEND] dest=%s node_id=%s bytes_sent=%s bytes_recv=%s " + "accuracy=%s loss=%s role=%s energy_grid=%s emissions=%s workload=%s " + "cpu_model=%s gpu_model=%s cpu_used=%s gpu_used=%s energy_consumed=%s sample_size=%s class_imbalance=%s model_size=%s local_entropy=%s val_accuracy=%s dp_enabled=%s dp_epsilon=%s macro_f1=%s train_accuracy=%s", + server_addr, + str(self._idx), + report["bytes_sent"], + report["bytes_recv"], + report["accuracy"], + report["loss"], + report["role"], + report["energy_grid"], + report["emissions"], + report["workload"], + report["cpu_model"], + report["gpu_model"], + report["cpu_used"], + report["gpu_used"], + report["energy_consumed"], + report["sample_size"], + report["class_imbalance"], + report["model_size"], + report["local_entropy"], + report["val_accuracy"], + report["dp_enabled"], + report["dp_epsilon"], + report["macro_f1"], + report["train_accuracy"], + ) + + async def _finish_trustscores_exchange(self, federation, trust_config, experiment_name): + # Compute, share, wait for, and optionally aggregate DFL/SDFL trustscores. 
+ self._end_time = datetime.now().strftime("%d/%m/%Y %H:%M:%S") + await self._prepare_trustscores_exchange(federation) + + weights = self._load_local_trustscores_weights(experiment_name) + local_trust_report_json = await asyncio.to_thread( + self._compute_local_trustscores_report, + experiment_name, + trust_config, + weights, + federation, + ) + logging.info("[TW %s] local trustscores report computed", federation) + + if federation == "DFL": + self._initialize_local_trustscores_aggregation(experiment_name) + elif self._is_sdfl_aggregator_node(): + self._initialize_sdfl_global_trustscores_aggregation(experiment_name) + + await self._share_trustscores_report(local_trust_report_json, federation) + await self._wait_for_trustscores_reports(federation) + await self._wait_for_trustscores_forwarding_drain(federation) + + if federation == "DFL": + self._finalize_local_trustscores_aggregation() + elif self._is_sdfl_aggregator_node(): + self._finalize_sdfl_global_trustscores_aggregation() + + def _compute_local_trustscores_report(self, experiment_name, trust_config, weights, federation) -> str: + # Build the local DFL/SDFL factsheet and return its JSON report. 
+ factsheet = DflFactsheet() + self._engine.trainer.datamodule.setup(stage="fit") + train_loader = self._engine.trainer.datamodule.train_dataloader() + self._engine.trainer.datamodule.setup(stage="test") + test_loader = self._engine.trainer.datamodule.test_dataloader()[0] + factsheet.populate_factsheet_dfl( + experiment_name, + self._idx, + trust_config, + self._start_time, + self._end_time, + self._engine.trainer.model, + train_loader, + test_loader, + reputation_summary=self._get_reputation_trust_summary(), + participation_summary=self._get_participation_trust_summary(), + reliability_summary=self._get_system_reliability_summary(), + ) + + trust_metric_manager = TrustMetricManager(self._start_time, federation, self._idx) + trust_metric_manager.evaluate_participant(experiment_name, weights, self._idx, use_weights=True) + + return load_trust_report_json_dumped(experiment_name, self._idx) + + def _load_local_trustscores_weights(self, experiment_name: str) -> dict: + # Load trust metric weights for the active federation. + federation = self._engine.config.participant["trust_args"]["scenario"].get("federation") + return load_trust_weights(experiment_name, federation) + + def _reset_trustscores_exchange_state(self): + # Clear mutable state from any previous trustscores exchange. + self._expected_trustscores_sources = set() + self._received_trustscores_node_ids = set() + self._trustscores_score_accumulator = {} + self._trustscores_weight_accumulator = {} + self._trustscores_template_report = None + self._trustscores_local_copy_path = None + self._trustscores_local_report_initialized = False + + def _get_trustscores_weight_for_source(self, source: str, node_id: int | str) -> float: + # Resolve the aggregation weight for a remote trust report. 
+ if not self._is_reputation_enabled(): + return 0.5 + + reputation_system = self._get_reputation_system() + if reputation_system is None: + logging.warning( + "[TW DFL] Reputation is enabled but the reputation system is not available. Using fallback weight=0.5 for node_id=%s source=%s", + node_id, + source, + ) + return 0.5 + + reputation_entry = reputation_system.reputation.get(source) + if reputation_entry is None or reputation_entry.get("reputation") is None: + logging.warning( + "[TW DFL] No reputation value available for node_id=%s source=%s. Using fallback weight=0.5", + node_id, + source, + ) + return 0.5 + + return float(reputation_entry["reputation"]) + + def _get_trustscores_peer_weights_from_reputation(self) -> dict: + # Extract peer trustscores weights from the reputation system. + if not self._is_reputation_enabled(): + return {} + + reputation_system = self._get_reputation_system() + if reputation_system is None: + return {} + + peer_weights = {} + for addr, data in reputation_system.reputation.items(): + reputation_value = data.get("reputation") + if addr == self._engine.addr or reputation_value is None: + continue + peer_weights[addr] = float(reputation_value) + return peer_weights + + def _get_trustscores_self_weight(self) -> float: + # Keep local reports fully trusted in the weighted aggregation. + return 1.0 + + def _log_trustscores_node_weights(self, federation: str): + # Log the weights that will be used by trustscores aggregation. + if not self._is_reputation_enabled(): + logging.info( + "[TW %s] Reputation system disabled. trustscores weights fallback to 0.5 for all nodes", + federation, + ) + return + + peer_weight_map = self._get_trustscores_peer_weights_from_reputation() + if not peer_weight_map: + logging.info( + "[TW %s] Reputation system enabled, but no peer reputation weights are available yet. 
Falling back to 0.5 when needed", + federation, + ) + return + + logging.info( + "[TW %s] Trustscores weights from reputation | self_node_id=%s self_weight=%s peer_weights_by_addr=%s", + federation, + self._idx, + self._get_trustscores_self_weight(), + peer_weight_map, + ) + + for addr, weight in sorted(peer_weight_map.items()): + logging.info( + "[TW %s] Trustscores weight from reputation | self_node_id=%s target_addr=%s weight=%s", + federation, + self._idx, + addr, + weight, + ) + + def _initialize_local_trustscores_aggregation(self, experiment_name: str): + # Initialize a DFL local aggregation copy with this node's own report. + if self._trustscores_local_report_initialized: + return + + trust_report_template, copy_path = create_local_trust_report_copy(experiment_name, self._idx) + self._initialize_trustscores_accumulator(trust_report_template, copy_path, self._get_trustscores_self_weight()) + logging.info( + "[TW DFL] Local trustscores copy created at %s and accumulator initialized with local weight=%s", + copy_path, + self._get_trustscores_self_weight(), + ) + + async def _prepare_trustscores_exchange(self, federation: str): + # Discover direct neighbors and prepare the wait event for incoming reports. + cm = CommunicationsManager.get_instance() + self._expected_trustscores_sources = await cm.get_all_addrs_current_connections(only_direct=True) + + if self._trustscores_wait_event is None: + self._trustscores_wait_event = asyncio.Event() + self._trustscores_wait_event.clear() + + if len(self._received_trustscores_node_ids) >= self._expected_trustscores_reports: + self._trustscores_wait_event.set() + + if self._expected_trustscores_reports <= 0: + self._trustscores_wait_event.set() + logging.info("[TW %s] No remote trustscores reports expected", federation) + return + + logging.info( + "[TW %s] Expecting %s trustscores reports. 
Initial neighbors=%s aggregator_mode=%s", + federation, + self._expected_trustscores_reports, + sorted(self._expected_trustscores_sources), + self._is_sdfl_aggregator_node() if federation == "SDFL" else False, + ) + if federation == "DFL" or self._is_sdfl_aggregator_node(): + self._log_trustscores_node_weights(federation) + + async def _share_trustscores_report(self, trust_report_json: str, federation: str): + # Broadcast the local trustscores report to direct neighbors. + cm = CommunicationsManager.get_instance() + neighbors = self._expected_trustscores_sources.copy() + + if not neighbors: + logging.info("[TW %s] No direct neighbors available to share trustscores", federation) + return + + message = cm.create_message( + "trustscores", + action="share", + node_id=str(self._idx), + trust_report_json=trust_report_json, + ) + + logging.info("[TW %s] Sharing trustscores report with neighbors=%s", federation, sorted(neighbors)) + for neighbor in neighbors: + await cm.send_message( + neighbor, + message, + message_type="trustscores", + allow_after_learning_finished=True, + ) + + async def _wait_for_trustscores_reports(self, federation: str): + # Wait until every expected report arrives or the exchange times out. + if self._trustscores_wait_event is None: + return + + try: + await asyncio.wait_for( + self._trustscores_wait_event.wait(), + timeout=self.TRUSTSCORES_WAIT_TIMEOUT_SECONDS, + ) + logging.info( + "[TW %s] Trustscores exchange complete (%s/%s)", + federation, + len(self._received_trustscores_node_ids), + self._expected_trustscores_reports, + ) + except asyncio.TimeoutError: + logging.warning( + "[TW %s] Timeout waiting trustscores reports. 
Received=%s/%s missing=%s", + federation, + len(self._received_trustscores_node_ids), + self._expected_trustscores_reports, + self._expected_trustscores_reports - len(self._received_trustscores_node_ids), + ) + + async def _wait_for_trustscores_forwarding_drain(self, federation: str): + # Give the forwarder a short grace period before shutdown. + if not self._expected_trustscores_sources: + return + + cm = CommunicationsManager.get_instance() + forwarder = getattr(cm, "forwarder", None) + forwarder_interval = getattr(forwarder, "interval", 0) + messages_interval = getattr(forwarder, "messages_interval", 0) + forwarding_grace = max( + self.TRUSTSCORES_FORWARDING_GRACE_SECONDS, + float(forwarder_interval) + float(messages_interval) + self.TRUSTSCORES_FORWARDING_GRACE_MARGIN_SECONDS, + ) + + logging.info( + "[TW %s] Waiting %.2fs to drain forwarded trustscores messages before shutdown", + federation, + forwarding_grace, + ) + await asyncio.sleep(forwarding_grace) + + def _build_weighted_trustscores_report(self) -> dict | None: + # Build the weighted report when the aggregation template is available. + if self._trustscores_template_report is None or self._trustscores_local_copy_path is None: + return None + + return build_weighted_trustscores_report( + template_report=self._trustscores_template_report, + score_accumulator=self._trustscores_score_accumulator, + weight_accumulator=self._trustscores_weight_accumulator, + ) + + def _finalize_local_trustscores_aggregation(self): + # Write the weighted DFL report and generate DFL graphics. 
+ aggregated_report = self._build_weighted_trustscores_report() + if aggregated_report is None: + logging.warning("[TW DFL] Skipping weighted trustscores write because local copy/template is not available") + return + + save_trust_report_json(self._trustscores_local_copy_path, aggregated_report) + logging.info( + "[TW DFL] Weighted trustscores written to local copy=%s", + self._trustscores_local_copy_path, + ) + + graphics = Graphics(self._start_time, self._experiment_name, self._idx) + graphics.graphics_dfl_global(self._idx) + + def _finalize_sdfl_global_trustscores_aggregation(self): + # Write the weighted SDFL global report and generate SDFL graphics. + aggregated_report = self._build_weighted_trustscores_report() + if aggregated_report is None: + logging.warning("[TW SDFL] Skipping global trustscores write because the template/output is not available") + return + + save_trust_report_json(self._trustscores_local_copy_path, aggregated_report) + logging.info( + "[TW SDFL] Global weighted trustscores written to %s", + self._trustscores_local_copy_path, + ) + + graphics = Graphics(self._start_time, self._experiment_name, self._idx) + graphics.graphics_sdfl_global(self._idx) + + def _is_sdfl_aggregator_node(self) -> bool: + # Check whether this node should aggregate global SDFL trustscores. + effective_role = self._engine.rb.get_role_name(True) + return effective_role in {Role.AGGREGATOR.value, Role.TRAINER_AGGREGATOR.value} + + def _initialize_sdfl_global_trustscores_aggregation(self, experiment_name: str): + # Initialize the SDFL global aggregation output with this node's own report. 
+ if self._trustscores_local_report_initialized: + return + + trust_report_template = json.loads(load_trust_report_json_dumped(experiment_name, self._idx)) + logs_dir = os.environ.get("NEBULA_LOGS_DIR", os.path.join("nebula", "app", "logs")) + output_path = os.path.join( + logs_dir, + experiment_name, + "trustworthiness", + "nebula_trust_results.json", + ) + save_trust_report_json(output_path, trust_report_template) + + self._initialize_trustscores_accumulator(trust_report_template, output_path, self._get_trustscores_self_weight()) + logging.info( + "[TW SDFL] Global trustscores accumulator initialized at %s with local weight=1.0", + output_path, + ) + + def _initialize_trustscores_accumulator(self, trust_report_template: dict, output_path: str, local_weight: float): + # Store the aggregation template and seed accumulators with the local report. + self._trustscores_template_report = trust_report_template + self._trustscores_local_copy_path = output_path + accumulate_weighted_trustscores( + report=trust_report_template, + weight=local_weight, + score_accumulator=self._trustscores_score_accumulator, + weight_accumulator=self._trustscores_weight_accumulator, + ) + self._trustscores_local_report_initialized = True + + async def register_trustscores_report(self, source, message): + # Register a remote trustscores message using the active federation. + federation = self._engine.config.participant["trust_args"]["scenario"].get("federation") + await self._register_trustscores_report(source, message, federation) + + async def _register_trustscores_report(self, source, message, federation: str): + # Deduplicate, optionally accumulate, and mark remote trustscores as received. 
+ if str(message.node_id) == str(self._idx): + logging.info("[TW %s] Ignoring own trustscores report from %s", federation, source) + return + + if str(message.node_id) in self._received_trustscores_node_ids: + logging.info( + "[TW %s] Ignoring duplicated trustscores report from node_id=%s source=%s", + federation, + message.node_id, + source, + ) + return + + should_accumulate = federation == "DFL" or self._is_sdfl_aggregator_node() + if should_accumulate: + trust_report = json.loads(message.trust_report_json) + remote_weight = self._get_trustscores_weight_for_source(source, message.node_id) + accumulate_weighted_trustscores( + report=trust_report, + weight=remote_weight, + score_accumulator=self._trustscores_score_accumulator, + weight_accumulator=self._trustscores_weight_accumulator, + ) + logging.info( + "[TW %s] Trustscores report received from node_id=%s source=%s accumulated_with_weight=%s", + federation, + message.node_id, + source, + remote_weight, + ) + else: + logging.info( + "[TW %s] Trustscores report received from node_id=%s source=%s forwarding_only=True", + federation, + message.node_id, + source, + ) + + self._received_trustscores_node_ids.add(str(message.node_id)) + logging.info( + "[TW %s] Trustscores progress %s/%s", + federation, + len(self._received_trustscores_node_ids), + self._expected_trustscores_reports, + ) + if len(self._received_trustscores_node_ids) >= self._expected_trustscores_reports: + self._trustscores_wait_event.set() + +class TrustWorkloadServer(BaseTrustWorkload): + REPORTS_WAIT_TIMEOUT_SECONDS = 60 + def __init__(self, engine: Engine, idx, trust_files_route): - self._workload = 'aggregation' - self._sample_size = 0 - self._current_loss = None - self._current_accuracy = None + # Initialize server-side state for collecting participant reports. 
server_start_time: ServerRoleBehavior = engine.rb - self._start_time = server_start_time._start_time - self._engine: Engine = engine - self._end_time = None - self._experiment_name = "" - + super().__init__( + engine, + idx, + trust_files_route, + workload="aggregation", + role_label="SERVER", + sample_size=0, + start_time=server_start_time._start_time, + ) + self._trustworthiness_reports = {} + self._expected_reports = int(self._engine.config.participant["scenario_args"]["n_nodes"])-1 + self._trust_config = None + self._csv_completed = False + self._reports_wait_event = asyncio.Event() + if self._expected_reports <= 0: + self._reports_wait_event.set() + async def init(self, experiment_name): - self._experiment_name = experiment_name - await EventManager.get_instance().subscribe_addonevent(TestMetricsEvent, self._process_test_metrics_event) - await EventManager.get_instance().subscribe_node_event(ExperimentFinishEvent, self._process_experiment_finished_event) - - def get_workload(self): - return self._workload - - def get_sample_size(self): - return self._sample_size - - def get_metrics(self): - return (self._current_loss, self._current_accuracy) - + # Reuse the shared workload event subscriptions. + await super().init(experiment_name) + async def finish_experiment_role_pre_actions(self): + # Server has no pre-save work because aggregation sample size is zero. pass - + async def finish_experiment_role_post_actions(self, trust_config, experiment_name): - from datetime import datetime + # Wait for participant reports, save CSV data, and generate the CFL factsheet. 
self._end_time = datetime.now().strftime("%d/%m/%Y %H:%M:%S") + self._trust_config = trust_config + self._experiment_name = experiment_name + + if self._csv_completed: + logging.info("[TW SERVER] finish_experiment_role_post_actions called, trustworthiness reports OK, starting generate_factsheet") + await self._save_local_server_report_and_generate_factsheet(trust_config, experiment_name) + return + + logging.info("[TW SERVER] finish_experiment_role_post_actions called, waiting for trustworthiness reports") + await self._wait_for_trustworthiness_reports() + self._save_trustworthiness_reports_once() + await self._save_local_server_report_and_generate_factsheet(trust_config, experiment_name) + + async def _wait_for_trustworthiness_reports(self): + # Wait until reports arrive or the server-side timeout expires. + try: + await asyncio.wait_for( + self._reports_wait_event.wait(), + timeout=self.REPORTS_WAIT_TIMEOUT_SECONDS, + ) + except asyncio.TimeoutError: + logging.warning( + "[TW SERVER] Timeout waiting trustworthiness reports. Received=%s/%s", + len(self._trustworthiness_reports), + self._expected_reports, + ) + + def _save_trustworthiness_reports_once(self): + # Persist received participant reports only once. + if self._trustworthiness_reports is not None and not self._csv_completed: + save_trustworthiness_reports_csv(self._trustworthiness_reports, self._experiment_name) + self._csv_completed = True + + async def _save_local_server_report_and_generate_factsheet(self, trust_config, experiment_name): + # Add the server's own local report and generate final trust artifacts. 
+ bytes_sent, bytes_recv, _, _, val_accuracy, _, _, dp_enabled, dp_epsilon = load_data_results_participant( + self._experiment_name, + self._idx, + ) + + role, energy_grid, emissions, workload, cpu_model, gpu_model, cpu_used, gpu_used, energy_consumed, sample_size = load_emissions_participant( + self._experiment_name, + self._idx, + ) + + logging.info( + "[TW SERVER] local server report added for node_id=%s", + str(self._idx), + ) + + class_imbalance = get_class_imbalance_local(self._idx, experiment_name) + model_size = get_bytes_model(self._engine.trainer.model) + local_entropy = get_local_entropy(self._idx, experiment_name) + + save_results_csv_cfl(self._experiment_name, self._idx, bytes_sent, bytes_recv, 0, 0, class_imbalance, model_size, local_entropy, val_accuracy, 0, 0, dp_enabled, dp_epsilon) + save_emissions_csv_cfl(self._experiment_name, self._idx, role, energy_grid, emissions, workload, cpu_model, gpu_model, cpu_used, gpu_used, energy_consumed, sample_size) await self._generate_factsheet(trust_config, experiment_name) - + + async def register_trustworthiness_report(self, source, message): + # Store one participant trustworthiness report received by the server. 
+ self._trustworthiness_reports[message.node_id] = { + "source": source, + "node_id": message.node_id, + "bytes_sent": message.bytes_sent, + "bytes_recv": message.bytes_recv, + "accuracy": message.accuracy, + "loss": message.loss, + "role": message.role, + "energy_grid": message.energy_grid, + "emissions": message.emissions, + "workload": message.workload, + "cpu_model": message.cpu_model, + "gpu_model": message.gpu_model, + "cpu_used": message.cpu_used, + "gpu_used": message.gpu_used, + "energy_consumed": message.energy_consumed, + "sample_size": message.sample_size, + "class_imbalance": message.class_imbalance, + "model_size": message.model_size, + "local_entropy": message.local_entropy, + "val_accuracy": message.val_accuracy, + "dp_enabled": message.dp_enabled, + "dp_epsilon": message.dp_epsilon, + "macro_f1": message.macro_f1, + "train_accuracy": message.train_accuracy, + } + + logging.info( + "[TW SERVER] received report from node_id=%s total=%s", + message.node_id, + len(self._trustworthiness_reports), + ) + + if (len(self._trustworthiness_reports) >= self._expected_reports): + logging.info("[TW SERVER] all reports received, generating csv") + self._save_trustworthiness_reports_once() + self._reports_wait_event.set() + logging.info(f"[TW SERVER] all reports received, waiting for finish post, csv_completed {self._csv_completed}") + async def _generate_factsheet(self, trust_config, experiment_name): - from nebula.addons.trustworthiness.factsheet import Factsheet - from nebula.addons.trustworthiness.metric import TrustMetricManager - import json - import os - - factsheet = Factsheet() - factsheet.populate_factsheet_pre_train(trust_config, experiment_name) - factsheet.populate_factsheet_post_train(experiment_name, self._start_time, self._end_time) - - data_file_path = os.path.join(os.environ.get('NEBULA_CONFIG_DIR'), experiment_name, "scenario.json") - with open(data_file_path, 'r') as data_file: - data = json.load(data_file) - - weights = { - "robustness": 
float(data["robustness_pillar"]), - "resilience_to_attacks": float(data["resilience_to_attacks"]), - "algorithm_robustness": float(data["algorithm_robustness"]), - "client_reliability": float(data["client_reliability"]), - "privacy": float(data["privacy_pillar"]), - "technique": float(data["technique"]), - "uncertainty": float(data["uncertainty"]), - "indistinguishability": float(data["indistinguishability"]), - "fairness": float(data["fairness_pillar"]), - "selection_fairness": float(data["selection_fairness"]), - "performance_fairness": float(data["performance_fairness"]), - "class_distribution": float(data["class_distribution"]), - "explainability": float(data["explainability_pillar"]), - "interpretability": float(data["interpretability"]), - "post_hoc_methods": float(data["post_hoc_methods"]), - "accountability": float(data["accountability_pillar"]), - "factsheet_completeness": float(data["factsheet_completeness"]), - "architectural_soundness": float(data["architectural_soundness_pillar"]), - "client_management": float(data["client_management"]), - "optimization": float(data["optimization"]), - "sustainability": float(data["sustainability_pillar"]), - "energy_source": float(data["energy_source"]), - "hardware_efficiency": float(data["hardware_efficiency"]), - "federation_complexity": float(data["federation_complexity"]) - } - - trust_metric_manager = TrustMetricManager(self._start_time) - trust_metric_manager.evaluate(experiment_name, weights, use_weights=True) - - async def _process_test_metrics_event(self, tme: TestMetricsEvent): - cur_loss, cur_acc = await tme.get_event_data() - if cur_loss and cur_acc: - self._current_loss, self._current_accuracy = cur_loss, cur_acc + # Generate the CFL factsheet and evaluate final trust metrics. 
+ factsheet = CflFactsheet() + self._engine.trainer.datamodule.setup(stage="fit") + train_loader = self._engine.trainer.datamodule.train_dataloader() + self._engine.trainer.datamodule.setup(stage="test") + test_loader = self._engine.trainer.datamodule.test_dataloader()[0] + factsheet.populate_factsheet_cfl( + experiment_name, + trust_config, + self._start_time, + self._end_time, + self._idx, + self._engine.trainer.model, + train_loader, + test_loader, + reputation_summary=self._get_reputation_trust_summary(), + participation_summary=self._get_participation_trust_summary(), + reliability_summary=self._get_system_reliability_summary(), + ) - async def _process_experiment_finished_event(self, efe:ExperimentFinishEvent): - model_file = f"/nebula/app/logs/{self._experiment_name}/trustworthiness/participant_{self._engine.idx}_final_model.pk" - - # Save model in trustworthy dir - with open(model_file, 'wb') as f: - pickle.dump(self._engine.trainer.model, f) + federation = trust_config.get("federation") + weights = load_trust_weights(experiment_name, federation) + + trust_metric_manager = TrustMetricManager(self._start_time, federation) + trust_metric_manager.evaluate(experiment_name, weights, use_weights=True) """ ############################## # TRUSTWORTHINESS # @@ -215,6 +946,7 @@ async def _process_experiment_finished_event(self, efe:ExperimentFinishEvent): class Trustworthiness(): def __init__(self, engine: Engine, config: Config): + # Select the workload implementation for this node and start emissions tracking. 
config.reset_logging_configuration() print_msg_box( msg=f"Name Trustworthiness Module\nRole: {engine.rb.get_role_name()}", @@ -224,59 +956,93 @@ def __init__(self, engine: Engine, config: Config): self._config = config self._trust_config = self._config.participant["trust_args"]["scenario"] self._experiment_name = self._config.participant["scenario_args"]["name"] - self._trust_dir_files = f"/nebula/app/logs/{self._experiment_name}/trustworthiness" + logs_dir = os.environ.get("NEBULA_LOGS_DIR", os.path.join("nebula", "app", "logs")) + self._trust_dir_files = os.path.join(logs_dir, self._experiment_name, "trustworthiness") self._emissions_file = 'emissions.csv' self._role: Role = engine.rb.get_role() self._idx = self._config.participant["device_args"]["idx"] - self._trust_workload: TrustWorkload = self._factory_trust_workload(self._role, self._engine, self._idx, self._trust_dir_files) - - # EmissionsTracker from codecarbon to measure the emissions during the aggregation step in the server + self._trust_workload: TrustWorkload = self._factory_trust_workload(self._role, self._engine, self._idx, self._trust_dir_files) + + self._engine.trustworthiness = self + + # EmissionsTracker from CodeCarbon to measure emissions during the server aggregation step self._tracker= EmissionsTracker(tracking_mode='process', log_level='error', save_to_file=False) - + @property def tw(self): - """TrustWorkload depending on the node Role""" + """TrustWorkload implementation chosen according to the node role.""" + # Expose the role-specific trust workload. return self._trust_workload - + async def start(self): + # Prepare output directories, subscribe to finish events, and start tracking emissions. 
await self._create_trustworthiness_directory() await self.tw.init(self._experiment_name) await EventManager.get_instance().subscribe_node_event(ExperimentFinishEvent, self._process_experiment_finish_event) self._tracker.start() - + async def _create_trustworthiness_directory(self): - import os - trust_dir = os.path.join(os.environ.get("NEBULA_LOGS_DIR"), self._experiment_name, "trustworthiness") - # Create a directory to save files to calcutate trust + # Ensure the experiment trustworthiness directory exists. + logs_dir = os.environ.get("NEBULA_LOGS_DIR", os.path.join("nebula", "app", "logs")) + trust_dir = os.path.join(logs_dir, self._experiment_name, "trustworthiness") + # Create a directory to store files used to compute trust os.makedirs(trust_dir, exist_ok=True) - os.chmod(trust_dir, 0o777) - + os.chmod(trust_dir, 0o755) + async def _process_experiment_finish_event(self, efe: ExperimentFinishEvent): - from nebula.addons.trustworthiness.utils import save_class_count_per_participant + # Persist final local metrics and delegate role-specific finalization. class_counter = self._engine.trainer.datamodule.get_samples_per_label() + save_class_count_per_participant(self._experiment_name, class_counter, self._idx) - + await self.tw.finish_experiment_role_pre_actions() - - last_loss, last_accuracy = self.tw.get_metrics() - - # Get bytes send/received from reporter + + last_loss, last_accuracy, last_macro_f1 = self.tw.get_metrics() + _, last_val_accuracy, last_train_accuracy = self.tw.get_validation_metrics() + if last_val_accuracy is None: + last_val_accuracy = 0.0 + + # Get sent/received bytes from the reporter bytes_sent = self._engine.reporter.acc_bytes_sent bytes_recv = self._engine.reporter.acc_bytes_recv - - # Get TrustWorkload info + + # Persist the trainer-reported DP budget so factsheets can score privacy. 
+ privacy_metrics = self._engine.trainer.get_privacy_metrics() + dp_enabled = bool(privacy_metrics.get("dp_enabled", False)) + dp_epsilon = privacy_metrics.get("dp_epsilon") + if dp_epsilon is None: + dp_epsilon = 0 + + # Get TrustWorkload information workload = self.tw.get_workload() sample_size = self.tw.get_sample_size() - - # Last operations - save_results_csv(self._experiment_name, self._idx, bytes_sent, bytes_recv, last_loss, last_accuracy) - stop_emissions_tracking_and_save(self._tracker, self._trust_dir_files, self._emissions_file, self._role.value, workload, sample_size) - + + # Final operations + save_results_csv( + self._experiment_name, + self._idx, + bytes_sent, + bytes_recv, + last_accuracy, + last_loss, + last_val_accuracy, + last_macro_f1, + last_train_accuracy, + dp_enabled, + dp_epsilon, + ) + stop_emissions_tracking_and_save(self._tracker, self._trust_dir_files, f'emissions_{self._idx}.csv', self._role.value, workload, sample_size, self._idx) await self.tw.finish_experiment_role_post_actions(self._trust_config, self._experiment_name) - - def _factory_trust_workload(self, role: Role, engine: Engine, idx, trust_files_route) -> TrustWorkload: + + def _factory_trust_workload(self, role: Role, engine: Engine, idx, trust_files_route) -> TrustWorkload: + # Create the workload implementation associated with the node role.
trust_workloads = { - Role.TRAINER: TrustWorkloadTrainer, + Role.TRAINER: TrustWorkloadTrainer, + Role.AGGREGATOR: TrustWorkloadTrainer, + Role.PROXY: TrustWorkloadTrainer, + Role.IDLE: TrustWorkloadTrainer, + Role.TRAINER_AGGREGATOR: TrustWorkloadTrainer, + Role.MALICIOUS: TrustWorkloadTrainer, Role.SERVER: TrustWorkloadServer } trust_workload = trust_workloads.get(role) @@ -284,5 +1050,3 @@ def _factory_trust_workload(self, role: Role, engine: Engine, idx, trust_files_r return trust_workload(engine, idx, trust_files_route) else: raise TrustWorkloadException(f"Trustworthiness workload for role {role} not defined") - - \ No newline at end of file diff --git a/nebula/addons/trustworthiness/utils.py b/nebula/addons/trustworthiness/utils.py deleted file mode 100755 index e081fcafd..000000000 --- a/nebula/addons/trustworthiness/utils.py +++ /dev/null @@ -1,293 +0,0 @@ -import json -import logging -import math -import os -import pickle -from os.path import exists - -import pandas as pd -from hashids import Hashids -from scipy.stats import entropy - -from nebula.addons.trustworthiness import calculation -from collections import Counter - -hashids = Hashids() -logger = logging.getLogger(__name__) -dirname = os.path.dirname(__file__) - - -def save_class_count_per_participant(experiment_name, class_counter: Counter, idx): - class_count = os.path.join(os.environ.get('NEBULA_LOGS_DIR'), experiment_name, "trustworthiness", f"{str(idx)}_class_count.json") - result = {hashids.encode(int(class_id)): count for class_id, count in class_counter.items()} - with open(class_count, "w") as f: - json.dump(result, f) - -def count_all_class_samples(experiment_name): - participant_id = 0 - global_class_count = {} - - while True: - data_class_count_file = os.path.join(os.environ.get('NEBULA_LOGS_DIR'), experiment_name, "trustworthiness", f"{str(participant_id)}_class_count.json") - - if not os.path.exists(data_class_count_file): - break - - with open(data_class_count_file, "r") as f: - 
class_count = json.load(f) - - for class_hash, count in class_count.items(): - global_class_count[class_hash] = global_class_count.get(class_hash, 0) + count - - participant_id += 1 - - # Guardar conteo total en class_count.json - output_file = os.path.join(os.environ.get('NEBULA_LOGS_DIR'),experiment_name, "trustworthiness", "count_class.json") - - with open(output_file, "w") as f: - json.dump(global_class_count, f, indent=2) - -def count_class_samples(scenario_name, dataloaders_files, class_counter: Counter = None): - """ - Counts the number of samples by class. - - Args: - scenario_name (string): Name of the scenario. - dataloaders_files (list): Files that contain the dataloaders. - - """ - - result = {} - dataloaders = [] - - if class_counter: - result = {hashids.encode(int(class_id)): count for class_id, count in class_counter.items()} - else: - for file in dataloaders_files: - with open(file, "rb") as f: - dataloader = pickle.load(f) - dataloaders.append(dataloader) - - for dataloader in dataloaders: - for batch, labels in dataloader: - for b, label in zip(batch, labels): - l = hashids.encode(label.item()) - if l in result: - result[l] += 1 - else: - result[l] = 1 - - try: - name_file = os.path.join(os.environ.get('NEBULA_LOGS_DIR'), scenario_name, "trustworthiness", "count_class.json") - except: - name_file = os.path.join("nebula", "app", "logs", scenario_name, "trustworthiness", "count_class.json") - - with open(name_file, "w") as f: - json.dump(result, f) - - -def get_all_data_entropy(experiment_name): - participant_id = 0 - data_class_count_file = os.path.join(os.environ.get('NEBULA_LOGS_DIR'), experiment_name, "trustworthiness", f"{str(participant_id)}_class_count.json") - entropy_per_participant = {} - - while True: - data_class_count_file = os.path.join(os.environ.get('NEBULA_LOGS_DIR'), experiment_name, "trustworthiness", f"{str(participant_id)}_class_count.json") - - if not os.path.exists(data_class_count_file): - break - - with 
open(data_class_count_file, "r") as f: - class_count = json.load(f) - - total = sum(class_count.values()) - if total == 0: - entropy_value = 0.0 - else: - probabilities = [count / total for count in class_count.values()] - entropy_value = entropy(probabilities, base=2) - - entropy_per_participant[str(participant_id)] = round(entropy_value, 6) - participant_id += 1 - - name_file = os.path.join(os.environ.get('NEBULA_LOGS_DIR'),experiment_name, "trustworthiness", "entropy.json") - - with open(name_file, "w") as f: - json.dump(entropy_per_participant, f, indent=2) - -def get_entropy(client_id, scenario_name, dataloader): - """ - Get the entropy of each client in the scenario. - - Args: - client_id (int): The client id. - scenario_name (string): Name of the scenario. - dataloaders_files (list): Files that contain the dataloaders. - - """ - result = {} - client_entropy = {} - - name_file = os.path.join(os.environ.get('NEBULA_LOGS_DIR'), scenario_name, "trustworthiness", "entropy.json") - - if os.path.exists(name_file): - logging.info(f"entropy fiel already exists.. loading.") - with open(name_file, "r") as f: - client_entropy = json.load(f) - - client_id_hash = hashids.encode(client_id) - - for batch, labels in dataloader: - for b, label in zip(batch, labels): - l = hashids.encode(label.item()) - if l in result: - result[l] += 1 - else: - result[l] = 1 - - n = len(dataloader) - entropy_value = entropy([x / n for x in result.values()], base=2) - client_entropy[client_id_hash] = entropy_value - with open(name_file, "w") as f: - json.dump(client_entropy, f) - - -def read_csv(filename): - """ - Read a CSV file. - - Args: - filename (string): Name of the file. - - Returns: - object: The CSV readed. - - """ - if exists(filename): - return pd.read_csv(filename) - - -def check_field_filled(factsheet_dict, factsheet_path, value, empty=""): - """ - Check if the field in the factsheet file is filled or not. - - Args: - factsheet_dict (dict): The factshett dict. 
- factsheet_path (list): The factsheet field to check. - value (float): The value to add in the field. - empty (string): If the value could not be appended, the empty string is returned. - - Returns: - float: The value added in the factsheet or empty if the value could not be appened - - """ - if factsheet_dict[factsheet_path[0]][factsheet_path[1]]: - return factsheet_dict[factsheet_path[0]][factsheet_path[1]] - elif value != "" and value != "nan": - if type(value) != str and type(value) != list: - if math.isnan(value): - return 0 - else: - return value - else: - return value - else: - return empty - - -def get_input_value(input_docs, inputs, operation): - """ - Gets the input value from input document and apply the metric operation on the value. - - Args: - inputs_docs (map): The input document map. - inputs (list): All the inputs. - operation (string): The metric operation. - - Returns: - float: The metric value - - """ - - input_value = None - args = [] - for i in inputs: - source = i.get("source", "") - field = i.get("field_path", "") - input_doc = input_docs.get(source, None) - if input_doc is None: - logger.warning(f"{source} is null") - else: - input = get_value_from_path(input_doc, field) - args.append(input) - try: - operationFn = getattr(calculation, operation) - input_value = operationFn(*args) - except TypeError: - logger.warning(f"{operation} is not valid") - - return input_value - - -def get_value_from_path(input_doc, path): - """ - Gets the input value from input document by path. - - Args: - inputs_doc (map): The input document map. - path (string): The field name of the input value of interest. - - Returns: - float: The input value from the input document - - """ - - d = input_doc - for nested_key in path.split("/"): - temp = d.get(nested_key) - if isinstance(temp, dict): - d = d.get(nested_key) - else: - return temp - return None - - -def write_results_json(out_file, dict): - """ - Writes the result to JSON. 
- - Args: - out_file (string): The output file. - dict (dict): The object to be witten into JSON. - - Returns: - float: The input value from the input document - - """ - - with open(out_file, "a") as f: - json.dump(dict, f, indent=4) - - -def save_results_csv(scenario_name: str, id: int, bytes_sent: int, bytes_recv: int, accuracy: float, loss: float): - try: - data_results_file = os.path.join(os.environ.get('NEBULA_LOGS_DIR'), scenario_name, "trustworthiness", "data_results.csv") - except: - data_results_file = os.path.join("nebula", "app", "logs", scenario_name, "trustworthiness", "data_results.csv") - - if exists(data_results_file): - df = pd.read_csv(data_results_file) - else: - df = pd.DataFrame(columns=["id", "bytes_sent", "bytes_recv", "accuracy", "loss"]) - - try: - # Add new entry to DataFrame - new_data = pd.DataFrame({'id': [id], 'bytes_sent': [bytes_sent], - 'bytes_recv': [bytes_recv], 'accuracy': [accuracy], - 'loss': [loss]}) - df = pd.concat([df, new_data], ignore_index=True) - - df.to_csv(data_results_file, encoding='utf-8', index=False) - - except Exception as e: - logger.warning(e) diff --git a/nebula/addons/trustworthiness/weights.py b/nebula/addons/trustworthiness/weights.py new file mode 100644 index 000000000..8df8bbc44 --- /dev/null +++ b/nebula/addons/trustworthiness/weights.py @@ -0,0 +1,75 @@ +import json +import os + + +COMMON_TRUST_WEIGHT_FIELDS = { + "robustness": "robustness_pillar", + "resilience_to_attacks": "resilience_to_attacks", + "algorithm_robustness": "algorithm_robustness", + "client_reliability": "client_reliability", + "privacy": "privacy_pillar", + "technique": "technique", + "uncertainty": "uncertainty", + "indistinguishability": "indistinguishability", + "fairness": "fairness_pillar", + "class_distribution": "class_distribution", + "outcome_fairness": "outcome_fairness", + "explainability": "explainability_pillar", + "interpretability": "interpretability", + "post_hoc_methods": "post_hoc_methods", + "accountability": 
"accountability_pillar", + "factsheet_completeness": "factsheet_completeness", + "monitoring": "monitoring", + "architectural_soundness": "architectural_soundness_pillar", + "client_management": "client_management", + "optimization": "optimization", + "federation_management": "federation_management", + "sustainability": "sustainability_pillar", + "energy_source": "energy_source", + "federation_complexity": "federation_complexity", +} + +CFL_TRUST_WEIGHT_FIELDS = { + **COMMON_TRUST_WEIGHT_FIELDS, + "selection_fairness": "selection_fairness", + "performance_fairness": "performance_fairness", + "hardware_efficiency": "hardware_efficiency", +} + +DFL_TRUST_WEIGHT_FIELDS = COMMON_TRUST_WEIGHT_FIELDS + +TRUST_WEIGHT_FIELDS_BY_FEDERATION = { + "CFL": CFL_TRUST_WEIGHT_FIELDS, + "DFL": DFL_TRUST_WEIGHT_FIELDS, + "SDFL": DFL_TRUST_WEIGHT_FIELDS, +} + + +def load_trust_weights(experiment_name: str, federation: str) -> dict[str, float]: + config_dir = os.environ.get("NEBULA_CONFIG_DIR") + if not config_dir: + raise RuntimeError("NEBULA_CONFIG_DIR is not configured") + + federation_key = (federation or "CFL").upper() + weight_fields = TRUST_WEIGHT_FIELDS_BY_FEDERATION.get(federation_key) + if weight_fields is None: + raise ValueError(f"Unsupported trustworthiness federation: {federation}") + + scenario_path = os.path.join(config_dir, experiment_name, "scenario.json") + with open(scenario_path, "r") as data_file: + data = json.load(data_file) + + weights = {} + missing_fields = [] + for weight_name, scenario_field in weight_fields.items(): + if scenario_field not in data: + missing_fields.append(scenario_field) + continue + weights[weight_name] = float(data[scenario_field]) + + if missing_fields: + raise KeyError( + f"Missing {federation_key} trustworthiness weight fields in {scenario_path}: {', '.join(sorted(missing_fields))}" + ) + + return weights diff --git a/nebula/config/config.py b/nebula/config/config.py index 5ef336e3a..cae3cf7f8 100755 --- a/nebula/config/config.py +++ 
b/nebula/config/config.py @@ -55,7 +55,7 @@ def reset_logging_configuration(self): self.__set_default_logging(mode="a") self.__set_training_logging(mode="a") - + def shutdown_logging(self): """ Properly shuts down all loggers and their handlers in the system. diff --git a/nebula/controller/controller.py b/nebula/controller/controller.py index a00d142d1..0d7142dbe 100755 --- a/nebula/controller/controller.py +++ b/nebula/controller/controller.py @@ -264,24 +264,24 @@ async def get_available_gpu(): def validate_physical_fields(data: dict): if data.get("deployment") != "physical": - return - + return + ips = data.get("physical_ips") if not ips: raise HTTPException( status_code=400, detail="physical deployment requires 'physical_ips'" ) - + if len(ips) != data.get("n_nodes"): raise HTTPException( status_code=400, detail="'physical_ips' must have the same length as 'n_nodes'" ) - + try: for ip in ips: - ipaddress.ip_address(ip) + ipaddress.ip_address(ip) print(ip) except ValueError as e: raise HTTPException(status_code=400, detail=str(e)) @@ -347,21 +347,21 @@ async def stop_scenario( ): """ Stops the execution of a federated learning scenario and performs cleanup operations. - + This endpoint: - Stops all participant containers associated with the specified scenario. - Removes Docker containers and network resources tied to the scenario and user. - Sets the scenario's status to "finished" in the database. - Optionally finalizes all active scenarios if the 'all' flag is set. - + Args: scenario_name (str): Name of the scenario to stop. username (str): User who initiated the stop operation. all (bool): Whether to stop all running scenarios instead of just one (default: False). - + Raises: HTTPException: Returns a 500 status code if any step fails. - + Note: This function does not currently trigger statistics generation. 
""" @@ -847,27 +847,27 @@ async def discover_vpn(): stdout=asyncio.subprocess.PIPE, stderr=asyncio.subprocess.PIPE, ) - + # 2) Wait for it to finish and capture stdout/stderr out, err = await proc.communicate() if proc.returncode != 0: # If the CLI returned an error, raise to be caught below raise RuntimeError(err.decode()) - + # 3) Parse the JSON output data = json.loads(out.decode()) - + # 4) Collect only the IPv4 addresses from each peer ips = [] for peer in data.get("Peer", {}).values(): for ip in peer.get("TailscaleIPs", []): - if ":" not in ip: + if ":" not in ip: # Skip IPv6 entries (they contain colons) ips.append(ip) - + # 5) Return the list of IPv4s return {"ips": ips} - + except Exception as e: # 6) Log any failure and respond with HTTP 500 logging.error(f"Error discovering VPN devices: {e}") @@ -877,14 +877,14 @@ async def discover_vpn(): @app.get("/physical/run/{ip}", tags=["physical"]) async def physical_run(ip: str): status, data = await remote_get(ip, "/run/") - + if status == 200: return data if status is None: raise HTTPException(status_code=502, detail=f"Node unreachable: {data}") raise HTTPException(status_code=status, detail=data) - - + + @app.get("/physical/stop/{ip}", tags=["physical"]) async def physical_stop(ip: str): status, data = await remote_get(ip, "/stop/") @@ -893,8 +893,8 @@ async def physical_stop(ip: str): if status is None: raise HTTPException(status_code=502, detail=f"Node unreachable: {data}") raise HTTPException(status_code=status, detail=data) - - + + @app.put("/physical/setup/{ip}", tags=["physical"], status_code=status.HTTP_201_CREATED) async def physical_setup( @@ -903,7 +903,7 @@ async def physical_setup( global_test: UploadFile = File(..., description="Global Dataset*.h5*"), train_set: UploadFile = File(..., description="Training dataset*.h5*"), ): - + form = aiohttp.FormData() await config.seek(0) form.add_field("config", config.file, @@ -914,17 +914,17 @@ async def physical_setup( await train_set.seek(0) 
form.add_field("train_set", train_set.file, filename=train_set.filename, content_type="application/octet-stream") - + status_code, data = await remote_post_form( ip, "/setup/", form, method="PUT" ) - + if status_code == 201: return data if status_code is None: raise HTTPException(status_code=502, detail=f"Node unreachable: {data}") raise HTTPException(status_code=status_code, detail=data) - + # ────────────────────────────────────────────────────────────── # Physical · single-node state # ────────────────────────────────────────────────────────────── @@ -932,22 +932,22 @@ async def physical_setup( async def get_physical_node_state(ip: str): """ Query a single Raspberry Pi (or other node) for its training state. - + Parameters ---------- ip : str IP address or hostname of the node. - + Returns ------- dict - • running (bool) – True if a training process is active. + • running (bool) – True if a training process is active. • error (str) – Optional error message when the node is unreachable or returns a non-200 HTTP status. """ # Short global timeout so a dead node doesn't block the whole request timeout = aiohttp.ClientTimeout(total=3) # seconds - + try: async with aiohttp.ClientSession(timeout=timeout) as session: async with session.get(f"http://{ip}/state/") as resp: @@ -960,8 +960,8 @@ async def get_physical_node_state(ip: str): except Exception as exc: # Network errors, timeouts, DNS failures, … return {"running": False, "error": str(exc)} - - + + # ────────────────────────────────────────────────────────────── # Physical · aggregate state for an entire scenario # ────────────────────────────────────────────────────────────── @@ -969,12 +969,12 @@ async def get_physical_node_state(ip: str): async def get_physical_scenario_state(scenario_name: str): """ Check the training state of *every* physical node assigned to a scenario. - + Parameters ---------- scenario_name : str Scenario identifier. 
- + Returns ------- dict @@ -989,16 +989,16 @@ async def get_physical_scenario_state(scenario_name: str): scenario = await get_scenario_by_name(scenario_name) if not scenario: raise HTTPException(status_code=404, detail="Scenario not found") - + nodes = await list_nodes_by_scenario_name(scenario_name) if not nodes: raise HTTPException(status_code=404, detail="No nodes found for scenario") - + # 2) Probe all nodes concurrently ips = [n["ip"] for n in nodes] tasks = [get_physical_node_state(ip) for ip in ips] states = await asyncio.gather(*tasks) # parallel HTTP calls - + # 3) Aggregate results nodes_state = dict(zip(ips, states)) any_running = any(s.get("running") for s in states) @@ -1007,7 +1007,7 @@ async def get_physical_scenario_state(scenario_name: str): all_available = all( (not s.get("running")) and (not s.get("error")) for s in states ) - + return { "running": any_running, "nodes_state": nodes_state, diff --git a/nebula/controller/http_helpers.py b/nebula/controller/http_helpers.py index ed60f44e5..886cc57e7 100644 --- a/nebula/controller/http_helpers.py +++ b/nebula/controller/http_helpers.py @@ -1,13 +1,13 @@ from __future__ import annotations - + import logging from typing import Optional, Union - + import aiohttp from aiohttp import FormData - + _TIMEOUT = aiohttp.ClientTimeout(total=15) - + async def _request_json( method: str, host: str, @@ -27,12 +27,12 @@ async def _request_json( except Exception as exc: logging.error("[%s] %s%s – %s", method.upper(), host, endpoint, exc) return None, str(exc) - - + + async def remote_get(host: str, endpoint: str): return await _request_json("GET", host, endpoint) - - + + async def remote_post_form( host: str, endpoint: str, @@ -40,4 +40,4 @@ async def remote_post_form( *, method: str = "POST", ): - return await _request_json(method, host, endpoint, data=form) \ No newline at end of file + return await _request_json(method, host, endpoint, data=form) diff --git a/nebula/controller/scenarios.py 
b/nebula/controller/scenarios.py index bbfa8996c..7325e650a 100644 --- a/nebula/controller/scenarios.py +++ b/nebula/controller/scenarios.py @@ -23,6 +23,10 @@ from nebula.core.datasets.cifar100.cifar100 import CIFAR100Dataset from nebula.core.datasets.emnist.emnist import EMNISTDataset from nebula.core.datasets.fashionmnist.fashionmnist import FashionMNISTDataset +from nebula.core.datasets.kddcup99.kddcup99 import KDDCUP99Dataset +from nebula.core.datasets.covtype.covtype import CovtypeDataset +from nebula.core.datasets.adultcensus.adultcensus import AdultCensusDataset +from nebula.core.datasets.breast_cancer.breast_cancer import BreastCancerDataset from nebula.core.datasets.mnist.mnist import MNISTDataset from nebula.core.utils.certificate import generate_ca_certificate, generate_certificate from nebula.utils import DockerUtils, FileUtils @@ -87,14 +91,17 @@ def __init__( selection_fairness, performance_fairness, class_distribution, + outcome_fairness, explainability_pillar, interpretability, post_hoc_methods, accountability_pillar, factsheet_completeness, + monitoring, architectural_soundness_pillar, client_management, optimization, + federation_management, sustainability_pillar, energy_source, hardware_efficiency, @@ -108,6 +115,9 @@ def __init__( sar_neighbor_policy, sar_training, sar_training_policy, + dp=None, + feature_squeezing=None, + adversarial_training=None, physical_ips=None, ): """ @@ -185,6 +195,9 @@ def __init__( self.network_subnet = network_subnet self.network_gateway = network_gateway self.epochs = epochs + self.dp = dp + self.feature_squeezing = feature_squeezing + self.adversarial_training = adversarial_training self.attack_params = attack_params self.reputation = reputation self.random_geo = random_geo @@ -211,14 +224,17 @@ def __init__( self.selection_fairness = selection_fairness, self.performance_fairness = performance_fairness, self.class_distribution = class_distribution, + self.outcome_fairness = outcome_fairness, 
self.explainability_pillar = explainability_pillar, self.interpretability = interpretability, self.post_hoc_methods = post_hoc_methods, self.accountability_pillar = accountability_pillar, self.factsheet_completeness = factsheet_completeness, + self.monitoring = monitoring, self.architectural_soundness_pillar = architectural_soundness_pillar, self.client_management = client_management, self.optimization = optimization, + self.federation_management = federation_management, self.sustainability_pillar = sustainability_pillar, self.energy_source = energy_source, self.hardware_efficiency = hardware_efficiency, @@ -690,6 +706,64 @@ def __init__(self, scenario, user=None): participant_config["data_args"]["partition_parameter"] = self.scenario.partition_parameter participant_config["model_args"]["model"] = self.scenario.model participant_config["training_args"]["epochs"] = int(self.scenario.epochs) + if isinstance(self.scenario.dp, dict): + participant_config.setdefault("training_args", {}) + participant_config["training_args"].setdefault("dp", {}) + if "enabled" in self.scenario.dp: + participant_config["training_args"]["dp"]["enabled"] = bool(self.scenario.dp["enabled"]) + if "noise_multiplier" in self.scenario.dp: + participant_config["training_args"]["dp"]["noise_multiplier"] = float( + self.scenario.dp["noise_multiplier"] + ) + if "max_grad_norm" in self.scenario.dp: + participant_config["training_args"]["dp"]["max_grad_norm"] = float( + self.scenario.dp["max_grad_norm"] + ) + feature_squeezing = ( + self.scenario.feature_squeezing if isinstance(self.scenario.feature_squeezing, dict) else {} + ) + participant_config.setdefault("defense_args", {}) + participant_config["defense_args"].setdefault("feature_squeezing", {}) + participant_config["defense_args"]["feature_squeezing"]["enabled"] = bool( + feature_squeezing.get("enabled", False) + ) + bit_depth = feature_squeezing.get("bit_depth", feature_squeezing.get("n")) + if bit_depth is not None: + 
participant_config["defense_args"]["feature_squeezing"]["bit_depth"] = int(bit_depth) + adversarial_training = ( + self.scenario.adversarial_training if isinstance(self.scenario.adversarial_training, dict) else {} + ) + participant_config["defense_args"].setdefault("adversarial_training", {}) + participant_config["defense_args"]["adversarial_training"]["enabled"] = bool( + adversarial_training.get("enabled", False) + ) + if "domain" in adversarial_training: + participant_config["defense_args"]["adversarial_training"]["domain"] = str( + adversarial_training["domain"] + ) + if "attack" in adversarial_training: + participant_config["defense_args"]["adversarial_training"]["attack"] = str( + adversarial_training["attack"] + ) + for key in ( + "epsilon", + "alpha", + "apply_probability", + "target_loss_increase", + "max_loss_increase", + ): + if key in adversarial_training and adversarial_training[key] is not None: + participant_config["defense_args"]["adversarial_training"][key] = float( + adversarial_training[key] + ) + if "steps" in adversarial_training: + participant_config["defense_args"]["adversarial_training"]["steps"] = int( + adversarial_training["steps"] + ) + if "mode" in adversarial_training: + participant_config["defense_args"]["adversarial_training"]["mode"] = str( + adversarial_training["mode"] + ) participant_config["device_args"]["accelerator"] = self.scenario.accelerator participant_config["device_args"]["gpu_id"] = self.scenario.gpu_id participant_config["device_args"]["logging"] = self.scenario.logginglevel @@ -743,14 +817,17 @@ def __init__(self, scenario, user=None): "selection_fairness": self.scenario.selection_fairness, "performance_fairness": self.scenario.performance_fairness, "class_distribution": self.scenario.class_distribution, + "outcome_fairness": self.scenario.outcome_fairness, "explainability_pillar": self.scenario.explainability_pillar, "interpretability": self.scenario.interpretability, "post_hoc_methods": 
self.scenario.post_hoc_methods, "accountability_pillar": self.scenario.accountability_pillar, "factsheet_completeness": self.scenario.factsheet_completeness, + "monitoring": self.scenario.monitoring, "architectural_soundness_pillar": self.scenario.architectural_soundness_pillar, "client_management": self.scenario.client_management, "optimization": self.scenario.optimization, + "federation_management": self.scenario.federation_management, "sustainability_pillar": self.scenario.sustainability_pillar, "energy_source": self.scenario.energy_source, "hardware_efficiency": self.scenario.hardware_efficiency, @@ -953,7 +1030,7 @@ async def load_configurations_and_start_nodes( logging.info(f"Configuration | additional nodes | participant: {self.n_nodes + i + 1}") last_ip = participant_config["network_args"]["ip"] - logging.info(f"Valores de la ultima ip: ({last_ip})") + logging.info(f"Last ip values: ({last_ip})") participant_config["scenario_args"]["n_nodes"] = self.n_nodes + additional_nodes # self.n_nodes + i + 1 participant_config["device_args"]["idx"] = last_participant_index + i participant_config["network_args"]["neighbors"] = "" @@ -988,9 +1065,12 @@ async def load_configurations_and_start_nodes( if additional_participants: self.n_nodes += len(additional_participants) + + # Splitting dataset dataset_name = self.scenario.dataset dataset = None + if dataset_name == "MNIST": dataset = MNISTDataset( num_classes=10, @@ -1011,6 +1091,46 @@ async def load_configurations_and_start_nodes( seed=42, config_dir=self.config_dir, ) + elif dataset_name == "Covtype": + dataset = CovtypeDataset( + num_classes=7, + partitions_number=self.n_nodes, + iid=self.scenario.iid, + partition=self.scenario.partition_selection, + partition_parameter=self.scenario.partition_parameter, + seed=42, + config_dir=self.config_dir, + ) + elif dataset_name == "KDDCUP99": + dataset = KDDCUP99Dataset( + num_classes=2, + partitions_number=self.n_nodes, + iid=self.scenario.iid, + 
partition=self.scenario.partition_selection, + partition_parameter=self.scenario.partition_parameter, + seed=42, + config_dir=self.config_dir, + ) + elif dataset_name == "AdultCensus": + dataset = AdultCensusDataset( + num_classes=2, + partitions_number=self.n_nodes, + iid=self.scenario.iid, + partition=self.scenario.partition_selection, + partition_parameter=self.scenario.partition_parameter, + seed=42, + config_dir=self.config_dir, + ) + elif dataset_name == "BreastCancer": + dataset = BreastCancerDataset( + num_classes=2, + partitions_number=self.n_nodes, + iid=self.scenario.iid, + partition=self.scenario.partition_selection, + partition_parameter=self.scenario.partition_parameter, + seed=42, + config_dir=self.config_dir, + ) elif dataset_name == "EMNIST": dataset = EMNISTDataset( num_classes=47, @@ -1046,6 +1166,7 @@ async def load_configurations_and_start_nodes( logging.info(f"Splitting {dataset_name} dataset...") dataset.initialize_dataset() + logging.info(f"Splitting {dataset_name} dataset... Done") if self.scenario.deployment in ["docker", "process", "physical"]: diff --git a/nebula/core/aggregation/aggregator.py b/nebula/core/aggregation/aggregator.py index ff88668de..b9ab0d2fd 100755 --- a/nebula/core/aggregation/aggregator.py +++ b/nebula/core/aggregation/aggregator.py @@ -54,13 +54,13 @@ async def update_federation_nodes(self, federation_nodes: set): """ Updates the current set of nodes expected to participate in the upcoming aggregation round. - This method informs the update handler (`us`) about the new set of federation nodes, - clears any pending models, and attempts to acquire the aggregation lock to prepare + This method informs the update handler (`us`) about the new set of federation nodes, + clears any pending models, and attempts to acquire the aggregation lock to prepare for model aggregation. If the aggregation process is already running, it releases the lock and tries again to ensure proper cleanup between rounds. 
Args: - federation_nodes (set): A set of addresses representing the nodes expected to contribute + federation_nodes (set): A set of addresses representing the nodes expected to contribute updates for the next aggregation round. Raises: @@ -108,7 +108,10 @@ async def get_aggregation(self): TimeoutError: If the aggregation lock is not acquired within the defined timeout. asyncio.CancelledError: If the aggregation lock acquisition is cancelled. Exception: For any other unexpected errors during the aggregation process. - """ + """ + lock_acquired = False + lock_task = None + skip_task = None try: timeout = self.config.participant["aggregator_args"]["aggregation_timeout"] logging.info(f"Aggregation timeout: {timeout} starts...") @@ -119,24 +122,38 @@ async def get_aggregation(self): [lock_task, skip_task], return_when=asyncio.FIRST_COMPLETED, ) - lock_acquired = lock_task in done + if skip_task in done: logging.info("Skipping aggregation timeout, updates received before grace time") self._aggregation_waiting_skip.clear() - if not lock_acquired: + if not lock_task.done(): lock_task.cancel() + + if lock_task in done: try: - await lock_task # Clean cancel + await lock_task + lock_acquired = True + except TimeoutError: + logging.info("🔄 get_aggregation | Timeout reached; aggregating received updates") except asyncio.CancelledError: - pass + logging.info("🔄 get_aggregation | Lock acquisition was cancelled") - except TimeoutError: - logging.exception("🔄 get_aggregation | Timeout reached for aggregation") except asyncio.CancelledError: - logging.exception("🔄 get_aggregation | Lock acquisition was cancelled") + logging.exception("🔄 get_aggregation | Aggregation wait was cancelled") except Exception as e: logging.exception(f"🔄 get_aggregation | Error acquiring lock: {e}") finally: + for task in (lock_task, skip_task): + if task is None: + continue + if not task.done(): + task.cancel() + try: + await task + except asyncio.CancelledError: + pass + except TimeoutError: + pass if 
lock_acquired or self._aggregation_done_lock.locked(): await self._aggregation_done_lock.release_async() @@ -145,13 +162,15 @@ async def get_aggregation(self): if not updates: logging.info(f"🔄 get_aggregation | No updates has been received..resolving conflict to continue...") updates = {self._addr: await self.engine.resolve_missing_updates()} - + missing_nodes = await self.us.get_round_missing_nodes() if missing_nodes: logging.info(f"🔄 get_aggregation | Aggregation incomplete, missing models from: {missing_nodes}") else: logging.info("🔄 get_aggregation | All models accounted for, proceeding with aggregation.") + await self.us.before_aggregation(updates, self._federation_nodes) + agg_event = AggregationEvent(updates, self._federation_nodes, missing_nodes) await EventManager.get_instance().publish_node_event(agg_event) aggregated_result = self.run_aggregation(updates) diff --git a/nebula/core/aggregation/fedavg.py b/nebula/core/aggregation/fedavg.py index 2ae036a9f..42e82f14a 100755 --- a/nebula/core/aggregation/fedavg.py +++ b/nebula/core/aggregation/fedavg.py @@ -1,6 +1,7 @@ import gc import torch +import logging from nebula.core.aggregation.aggregator import Aggregator @@ -18,12 +19,17 @@ def __init__(self, config=None, **kwargs): def run_aggregation(self, models): super().run_aggregation(models) + if not models: + logging.warning("FedAvg received an empty update set.") + return None + models = list(models.values()) total_samples = float(sum(weight for _, weight in models)) if total_samples == 0: - raise ValueError("Total number of samples must be greater than zero.") + logging.warning("Total number of samples must be greater than zero.") + return None last_model_params = models[-1][0] accum = {layer: torch.zeros_like(param, dtype=torch.float32) for layer, param in last_model_params.items()} diff --git a/nebula/core/aggregation/krum.py b/nebula/core/aggregation/krum.py index 902b33fd5..1b6f0b8dd 100755 --- a/nebula/core/aggregation/krum.py +++ 
b/nebula/core/aggregation/krum.py @@ -1,5 +1,6 @@ import numpy import torch +import logging from nebula.core.aggregation.aggregator import Aggregator @@ -18,6 +19,10 @@ def __init__(self, config=None, **kwargs): def run_aggregation(self, models): super().run_aggregation(models) + if not models: + logging.warning("Krum received an empty update set.") + return None + models = list(models.values()) accum = {layer: torch.zeros_like(param).float() for layer, param in models[-1][0].items()} diff --git a/nebula/core/aggregation/median.py b/nebula/core/aggregation/median.py index a455ff77d..86608da97 100755 --- a/nebula/core/aggregation/median.py +++ b/nebula/core/aggregation/median.py @@ -1,5 +1,6 @@ import numpy as np import torch +import logging from nebula.core.aggregation.aggregator import Aggregator @@ -40,6 +41,10 @@ def get_median(self, weights): def run_aggregation(self, models): super().run_aggregation(models) + if not models: + logging.warning("Median received an empty update set.") + return None + models = list(models.values()) models_params = [m for m, _ in models] diff --git a/nebula/core/aggregation/trimmedmean.py b/nebula/core/aggregation/trimmedmean.py index f9af238db..bee62699f 100755 --- a/nebula/core/aggregation/trimmedmean.py +++ b/nebula/core/aggregation/trimmedmean.py @@ -1,5 +1,6 @@ import numpy as np import torch +import logging from nebula.core.aggregation.aggregator import Aggregator @@ -44,6 +45,10 @@ def get_trimmedmean(self, weights): def run_aggregation(self, models): super().run_aggregation(models) + if not models: + logging.warning("TrimmedMean received an empty update set.") + return None + models = list(models.values()) models_params = [m for m, _ in models] diff --git a/nebula/core/aggregation/updatehandlers/cflupdatehandler.py b/nebula/core/aggregation/updatehandlers/cflupdatehandler.py index 6e66203cb..d3ccace29 100644 --- a/nebula/core/aggregation/updatehandlers/cflupdatehandler.py +++ 
b/nebula/core/aggregation/updatehandlers/cflupdatehandler.py @@ -15,7 +15,7 @@ class Update: """ Represents a model update received from a node in a specific training round. - + Attributes: model (object): The model object or weights received. weight (float): The weight or importance of the update. @@ -55,7 +55,7 @@ class CFLUpdateHandler(UpdateHandler): _missing_ones (set): Tracks nodes whose updates are missing. _role (str): Role of this node (e.g., trainer or server). """ - + def __init__(self, aggregator, addr, buffersize=MAX_UPDATE_BUFFER_SIZE): self._addr = addr self._aggregator: Aggregator = aggregator @@ -130,6 +130,10 @@ async def storage_update(self, updt_received_event: UpdateReceivedEvent): Args: updt_received_event (UpdateReceivedEvent): The event containing the update. """ + if updt_received_event.is_reputation_update(): + logging.debug("Discard reputation-only update in aggregation storage") + return + time_received = time.time() (model, weight, source, round, _) = await updt_received_event.get_event_data() diff --git a/nebula/core/aggregation/updatehandlers/dflupdatehandler.py b/nebula/core/aggregation/updatehandlers/dflupdatehandler.py index b98cbaf98..c8b5a16d8 100644 --- a/nebula/core/aggregation/updatehandlers/dflupdatehandler.py +++ b/nebula/core/aggregation/updatehandlers/dflupdatehandler.py @@ -15,7 +15,7 @@ class Update: """ Represents a model update received from a node in a specific training round. - + Attributes: model (object): The model object or weights received. weight (float): The weight or importance of the update. @@ -47,7 +47,7 @@ class DFLUpdateHandler(UpdateHandler): This handler manages the reception, storage, and tracking of model updates from federation nodes during asynchronous rounds. It supports partial updates, late arrivals, and maintains update history. """ - + def __init__(self, aggregator, addr, buffersize=MAX_UPDATE_BUFFER_SIZE): """ Initialize the update handler with required locks and storage. 
@@ -149,6 +149,10 @@ async def storage_update(self, updt_received_event: UpdateReceivedEvent): Args: updt_received_event (UpdateReceivedEvent): Event with model update data. """ + if updt_received_event.is_reputation_update(): + logging.debug("Discard reputation-only update in aggregation storage") + return + time_received = time.time() (model, weight, source, round, _) = await updt_received_event.get_event_data() if source in self._sources_expected: diff --git a/nebula/core/aggregation/updatehandlers/sdflupdatehandler.py b/nebula/core/aggregation/updatehandlers/sdflupdatehandler.py index ec214f4cb..b91e82ef4 100644 --- a/nebula/core/aggregation/updatehandlers/sdflupdatehandler.py +++ b/nebula/core/aggregation/updatehandlers/sdflupdatehandler.py @@ -54,8 +54,10 @@ def __init__(self, aggregator, addr, buffersize=MAX_UPDATE_BUFFER_SIZE): self._addr = addr self._aggregator: Aggregator = aggregator self._buffersize = buffersize + # Store the last used update plus a short history per source to tolerate late/missing updates. self._updates_storage: dict[str, tuple[Update, deque[Update]]] = {} self._updates_storage_lock = Locker(name="updates_storage_lock", async_lock=True) + # SDFL aggregation waits for a dynamic set of trainer sources each round. self._sources_expected = set() self._sources_received = set() self._round_updates_lock = Locker(name="round_updates_lock", async_lock=True) @@ -91,6 +93,7 @@ async def round_expected_updates(self, federation_nodes: set): """ await self._update_federation_lock.acquire_async() await self._updates_storage_lock.acquire_async() + # Reset per-round reception state while preserving per-node history buffers. self._sources_expected = federation_nodes.copy() self._sources_received.clear() @@ -143,6 +146,11 @@ async def storage_update(self, updt_received_event: UpdateReceivedEvent): Args: updt_received_event (UpdateReceivedEvent): Event with model update data. 
""" + if updt_received_event.is_reputation_update(): + # Reputation model updates are consumed by the reputation addon, not by aggregation. + logging.debug("Discard reputation-only update in SDFL aggregation storage") + return + time_received = time.time() (model, weight, source, round, _) = await updt_received_event.get_event_data() if source in self._sources_expected: @@ -164,6 +172,7 @@ async def storage_update(self, updt_received_event: UpdateReceivedEvent): f"Updates received ({len(self._sources_received)}/{len(self._sources_expected)}) | Missing nodes: {updates_left}" ) if self._round_updates_lock.locked() and not updates_left: + # Release aggregation as soon as the last expected trainer update arrives. all_rec = await self._all_updates_received() if all_rec: await self._notify() @@ -190,6 +199,7 @@ async def get_round_updates(self): self._nodes_using_historic.clear() updates = {} for sr in self._sources_received: + # Use the newest update unless it was already consumed in a previous aggregation. source_historic = self.us[sr][1] last_updt_received = self.us[sr][0] updt: Update = None @@ -205,6 +215,32 @@ async def get_round_updates(self): await self._updates_storage_lock.release_async() return updates + async def before_aggregation(self, updates: dict[str, tuple[object, float]], federation_nodes: set): + """ + Calculate indirect SDFL reputation before aggregating trainer updates. + """ + engine = self.agg.engine + if not hasattr(engine, "_reputation") or engine._reputation is None: + return + + # The aggregator may receive updates from non-neighbor trainers through forwarding. + # Their reputation is inferred from reputation tables shared by expected trainers. 
+ round_num = await engine.get_round() + expected_table_nodes = engine.get_sdfl_expected_trainers() + target_nodes = set(federation_nodes) | set(updates.keys()) + timeout = float( + self.agg.config.participant["defense_args"] + .get("reputation", {}) + .get("table_aggregation_timeout", 10) + ) + + await engine._reputation.calculate_indirect_reputation_for_non_neighbors( + target_nodes=target_nodes, + expected_table_nodes=expected_table_nodes, + round_num=round_num, + timeout=timeout, + ) + async def notify_federation_update(self, updt_nei_event: UpdateNeighborEvent): """ Handle federation node join/leave events. @@ -257,6 +293,7 @@ async def notify_if_all_updates_received(self): Set a notification trigger and notify aggregator if all updates are already received. """ logging.info("Set notification when all expected updates received") + # Hold this lock while the caller is waiting; _notify releases it once ready. await self._round_updates_lock.acquire_async() await self._updates_storage_lock.acquire_async() all_received = await self._all_updates_received() @@ -278,6 +315,7 @@ async def _notify(self): """ await self._notification_sent_lock.acquire_async() if self._notification: + # Multiple updates can race to complete the round; notify the aggregator once. await self._notification_sent_lock.release_async() return self._notification = True diff --git a/nebula/core/aggregation/updatehandlers/updatehandler.py b/nebula/core/aggregation/updatehandlers/updatehandler.py index f34849237..d6ac8367b 100644 --- a/nebula/core/aggregation/updatehandlers/updatehandler.py +++ b/nebula/core/aggregation/updatehandlers/updatehandler.py @@ -105,6 +105,15 @@ async def stop_notifying_updates(self): """ raise NotImplementedError + async def before_aggregation(self, updates: dict[str, tuple[object, float]], federation_nodes: set): + """ + Hook for federation-specific processing just before aggregation. + + DFL/CFL do not need extra work here. 
Federation-specific handlers can override this + without making the base aggregator know about a concrete federation type. + """ + return None + def factory_update_handler(updt_handler, aggregator, addr) -> UpdateHandler: from nebula.core.aggregation.updatehandlers.cflupdatehandler import CFLUpdateHandler diff --git a/nebula/core/datasets/adultcensus/__init__.py b/nebula/core/datasets/adultcensus/__init__.py new file mode 100755 index 000000000..e69de29bb diff --git a/nebula/core/datasets/adultcensus/adultcensus.py b/nebula/core/datasets/adultcensus/adultcensus.py new file mode 100644 index 000000000..6618ccad9 --- /dev/null +++ b/nebula/core/datasets/adultcensus/adultcensus.py @@ -0,0 +1,394 @@ +# nebula/core/datasets/adultcensus/adultcensus.py +# Becker, B. & Kohavi, R. (1996). Adult [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5XW20. +# Licensed under CC BY 4.0: https://creativecommons.org/licenses/by/4.0/ + +import logging +import os +from typing import Any, ClassVar + +import numpy as np +import torch +from torch.utils.data import Dataset + +from nebula.core.datasets.nebuladataset import NebulaDataset, NebulaPartitionHandler +from nebula.core.datasets.tabular_metadata import ( + build_tabular_adversarial_metadata, +) + +logger = logging.getLogger(__name__) + + +class AdultCensusTorchDataset(Dataset): + """ + Torch Dataset wrapper for Adult Census Income dataset (tabular, already numeric). 
+ x: float32 tensor (n_features,) + y: long scalar {0,1} where 1 means >50K + """ + def __init__( + self, + x: np.ndarray, + y: np.ndarray, + feature_names: list[str] | None = None, + continuous_features: list[int] | None = None, + integer_features: list[int] | None = None, + categorical_features: list[int] | None = None, + non_perturbable_features: list[int] | None = None, + categorical_groups: list[list[int]] | None = None, + tabular_metadata: dict | None = None, + ): + if not isinstance(x, np.ndarray) or not isinstance(y, np.ndarray): + raise ValueError("x and y must be numpy arrays") + + if x.ndim != 2: + raise ValueError(f"x must be 2D (n_samples, n_features). Got shape={x.shape}") + + y_arr: np.ndarray = np.asarray(y).reshape(-1) + if x.shape[0] != y_arr.shape[0]: + raise ValueError(f"x and y must have same number of samples. Got {x.shape[0]} != {y_arr.shape[0]}") + + self.x: np.ndarray = x.astype(np.float32, copy=False) + self.y: np.ndarray = y_arr.astype(np.int64, copy=False) + + # Nebula dataset conventions used by partitioning, logging and model setup. 
+ self.data: np.ndarray = self.x + self.targets: np.ndarray = self.y + self.classes: list[str] = ["<=50K", ">50K"] + self.feature_names = feature_names or [f"feature_{i}" for i in range(self.x.shape[1])] + self.continuous_features = continuous_features or [] + self.integer_features = integer_features or [] + self.categorical_features = categorical_features or [] + self.non_perturbable_features = non_perturbable_features or [] + self.categorical_groups = categorical_groups or [] + self.tabular_metadata = tabular_metadata + self.input_dim = int(self.x.shape[1]) + + def __len__(self) -> int: + return int(self.y.shape[0]) + + def __getitem__(self, idx: int) -> tuple[torch.Tensor, torch.Tensor]: + x_i: torch.Tensor = torch.from_numpy(self.x[idx]) + y_i: torch.Tensor = torch.tensor(int(self.y[idx]), dtype=torch.long) + return x_i, y_i + + +class AdultCensusPartitionHandler(NebulaPartitionHandler): + """ + Partition handler for tabular data. + """ + def __init__(self, file_path: str, prefix: str, config: Any, empty: bool = False): + super().__init__(file_path, prefix, config, empty) + self.transform = None # no torchvision transforms for tabular + + def __getitem__(self, idx: int) -> tuple[torch.Tensor, torch.Tensor]: + data, target = super().__getitem__(idx) + + # Some Nebula handlers may wrap data in tuples + if isinstance(data, tuple): + data = data[0] + + if isinstance(data, torch.Tensor): + x: torch.Tensor = data.to(dtype=torch.float32) + else: + x = torch.tensor(np.asarray(data), dtype=torch.float32) + + if isinstance(target, torch.Tensor): + y: torch.Tensor = target.to(dtype=torch.long) + else: + y = torch.tensor(int(target), dtype=torch.long) + + if self.target_transform is not None: + y = self.target_transform(y) + + return x, y + + +class AdultCensusDataset(NebulaDataset): + """ + Adult Census Income dataset integration for Nebula. 
+ + - 2 classes: <=50K vs >50K + - mixed tabular data -> numeric model input via preprocessing + - deterministic stratified train/test split + """ + CONTINUOUS_COLUMNS: ClassVar[list[str]] = [] + INTEGER_COLUMNS: ClassVar[list[str]] = [ + "age", + "fnlwgt", + "education-num", + "capital-gain", + "capital-loss", + "hours-per-week", + ] + CATEGORICAL_COLUMNS: ClassVar[list[str]] = [ + "workclass", + "education", + "marital-status", + "occupation", + "relationship", + "race", + "sex", + "native-country", + ] + # Experimental wide attack surface for testing constrained PGD thoroughly. + # This intentionally allows broad changes, including categorical flips. + PERTURBABLE_INTEGER_COLUMNS: ClassVar[list[str]] = list(INTEGER_COLUMNS) + PERTURBABLE_CATEGORICAL_COLUMNS: ClassVar[list[str]] = list(CATEGORICAL_COLUMNS) + + def __init__( + self, + num_classes: int = 2, + partitions_number: int = 1, + batch_size: int = 32, + num_workers: int = 4, + iid: bool = True, + partition: str = "dirichlet", + partition_parameter: float = 0.5, + seed: int = 42, + config_dir: str | None = None, + test_size: float = 0.2, + ): + super().__init__( + num_classes=num_classes, + partitions_number=partitions_number, + batch_size=batch_size, + num_workers=num_workers, + iid=iid, + partition=partition, + partition_parameter=partition_parameter, + seed=seed, + config_dir=config_dir, + ) + self.test_size: float = float(test_size) + + def initialize_dataset(self) -> None: + if self.train_set is None or self.test_set is None: + self.train_set, self.test_set = self.load_adult_census_dataset() + + self.data_partitioning(plot=True) + + @staticmethod + def _make_ohe_dense(): + """ + scikit-learn compatibility: + - older: OneHotEncoder(..., sparse=False) + - newer: OneHotEncoder(..., sparse_output=False) + """ + from sklearn.preprocessing import OneHotEncoder + + try: + return OneHotEncoder(handle_unknown="ignore", sparse_output=False) + except TypeError: + return OneHotEncoder(handle_unknown="ignore", 
sparse=False) + + @classmethod + def _validate_manual_schema(cls, columns) -> None: + continuous_columns = set(cls.CONTINUOUS_COLUMNS) + integer_columns = set(cls.INTEGER_COLUMNS) + categorical_columns = set(cls.CATEGORICAL_COLUMNS) + overlapping_columns = sorted( + (continuous_columns & integer_columns) + | (continuous_columns & categorical_columns) + | (integer_columns & categorical_columns) + ) + if overlapping_columns: + raise ValueError(f"AdultCensusDataset columns configured twice: {overlapping_columns}") + + configured_columns = continuous_columns | integer_columns | categorical_columns + dataset_columns = set(columns) + missing_columns = sorted(configured_columns - dataset_columns) + if missing_columns: + raise ValueError(f"AdultCensusDataset is missing configured columns: {missing_columns}") + unconfigured_columns = sorted(dataset_columns - configured_columns) + if unconfigured_columns: + raise ValueError(f"AdultCensusDataset has unconfigured columns: {unconfigured_columns}") + + def load_adult_census_dataset(self) -> tuple[AdultCensusTorchDataset, AdultCensusTorchDataset]: + """ + Loads Adult dataset from OpenML and preprocesses to all-numeric features. + + Steps: + 1) fetch_openml(data_id=1590, as_frame=True) + 2) y = (target == '>50K').astype(int) + 3) replace '?' 
with NA for missing values + 4) ColumnTransformer: + - continuous: median impute + StandardScaler + - integer: median impute + StandardScaler + - categorical: most_frequent impute + OneHotEncoder(dense) + 5) train/test split (stratified), fit preprocessing only on train (avoid leakage) + """ + data_dir: str = os.path.join(os.path.dirname(os.path.abspath(__file__)), "data") + os.makedirs(data_dir, exist_ok=True) + + try: + import pandas as pd + from sklearn.compose import ColumnTransformer + from sklearn.datasets import fetch_openml + from sklearn.impute import SimpleImputer + from sklearn.model_selection import train_test_split + from sklearn.pipeline import Pipeline + from sklearn.preprocessing import StandardScaler + except Exception as e: + raise ImportError( + "AdultCensusDataset requires pandas + scikit-learn. Install them (e.g., pip install pandas scikit-learn)." + ) from e + + # Raw Adult Census uses mixed pandas columns; the model receives the + # numeric matrix produced later by the ColumnTransformer. + bunch = fetch_openml(data_id=1590, as_frame=True, data_home=data_dir) + X_df = bunch.data.copy() + y_raw = bunch.target + + # Normalize target labels to {0, 1}; 1 means income >50K. + y_str = y_raw.astype(str).str.strip() + y: np.ndarray = (y_str == ">50K").astype(np.int64).to_numpy() + + # Adult encodes missing values as '?'. Drop incomplete rows so the + # adversarial metadata is based on real observed feature ranges. 
+ X_df = X_df.replace(r"^\s*\?\s*$", np.nan, regex=True) + self._validate_manual_schema(X_df.columns) + + numeric_columns = self.CONTINUOUS_COLUMNS + self.INTEGER_COLUMNS + for column in numeric_columns: + X_df[column] = pd.to_numeric(X_df[column], errors="coerce") + for column in self.CATEGORICAL_COLUMNS: + X_df[column] = X_df[column].astype(object) + + configured_columns = numeric_columns + self.CATEGORICAL_COLUMNS + valid_rows = ~X_df[configured_columns].isna().any(axis=1) + removed_rows = int((~valid_rows).sum()) + if removed_rows: + logger.info("[AdultCensus] Dropping %s rows with NA values", removed_rows) + X_df = X_df.loc[valid_rows].copy() + y = y[valid_rows.to_numpy()] + + # Numeric columns are standardized; categorical columns become one-hot + # columns. Constrained PGD metadata is built after this, in model input space. + numeric_transformer = Pipeline( + steps=[ + ("impute", SimpleImputer(strategy="median")), + ("scaler", StandardScaler(with_mean=True, with_std=True)), + ] + ) + + categorical_transformer = Pipeline( + steps=[ + ("impute", SimpleImputer(strategy="most_frequent")), + ("ohe", self._make_ohe_dense()), + ] + ) + + transformers = [] + if self.CONTINUOUS_COLUMNS: + transformers.append(("continuous", numeric_transformer, self.CONTINUOUS_COLUMNS)) + if self.INTEGER_COLUMNS: + transformers.append(("integer", numeric_transformer, self.INTEGER_COLUMNS)) + if self.CATEGORICAL_COLUMNS: + transformers.append(("categorical", categorical_transformer, self.CATEGORICAL_COLUMNS)) + + preprocessor = ColumnTransformer(transformers=transformers, remainder="drop") + + # Fit preprocessing only on train to avoid leaking test statistics. 
+ X_train_df, X_test_df, y_train, y_test = train_test_split( + X_df, + y, + test_size=self.test_size, + random_state=self.seed, + shuffle=True, + stratify=y, + ) + + X_train = preprocessor.fit_transform(X_train_df) + X_test = preprocessor.transform(X_test_df) + feature_names = self._feature_names(preprocessor, X_train.shape[1]) + + # In case some sklearn path returns sparse matrices, densify safely + if hasattr(X_train, "toarray"): + X_train = X_train.toarray() + if hasattr(X_test, "toarray"): + X_test = X_test.toarray() + + X_train_np = np.asarray(X_train, dtype=np.float32) + X_test_np: np.ndarray = np.asarray(X_test, dtype=np.float32) + metadata = self._build_adversarial_metadata(feature_names, X_train_np, preprocessor) + logger.info("[AdultCensus] X_train shape = %s", X_train_np.shape) + logger.info("[AdultCensus] INPUT_DIM (post-OHE) = %s", int(X_train_np.shape[1])) + self._log_adversarial_metadata(metadata, feature_names) + + train_ds = self._make_dataset(X_train_np, y_train, feature_names, metadata) + test_ds = self._make_dataset(X_test_np, y_test, feature_names, metadata) + + return train_ds, test_ds + + @staticmethod + def _feature_names(preprocessor, n_features: int) -> list[str]: + try: + return [str(name) for name in preprocessor.get_feature_names_out()] + except Exception: + return [f"feature_{idx}" for idx in range(n_features)] + + @staticmethod + def _make_dataset(x, y, feature_names, metadata) -> AdultCensusTorchDataset: + return AdultCensusTorchDataset( + x, + np.asarray(y, dtype=np.int64), + feature_names=feature_names, + continuous_features=[], + integer_features=metadata["integer_features"], + categorical_features=metadata["categorical_features"], + non_perturbable_features=metadata["non_perturbable_features"], + categorical_groups=metadata["categorical_groups"], + tabular_metadata=metadata["tabular_metadata"], + ) + + @classmethod + def _build_adversarial_metadata(cls, feature_names, x_train, preprocessor) -> dict[str, Any]: + # Dataset 
responsibility ends here: declare which raw columns are perturbable. + # The shared metadata builder maps those declarations to transformed model features. + integer_scaler = preprocessor.named_transformers_["integer"].named_steps["scaler"] + integer_step_by_column = { + column: float(1.0 / scale) + for column, scale in zip(cls.INTEGER_COLUMNS, integer_scaler.scale_, strict=False) + } + return build_tabular_adversarial_metadata( + feature_names=feature_names, + x_train=x_train, + continuous_columns=cls.CONTINUOUS_COLUMNS, + integer_columns=cls.INTEGER_COLUMNS, + categorical_columns=cls.CATEGORICAL_COLUMNS, + perturbable_integer_columns=cls.PERTURBABLE_INTEGER_COLUMNS, + perturbable_categorical_columns=cls.PERTURBABLE_CATEGORICAL_COLUMNS, + integer_step_by_column=integer_step_by_column, + ) + + @staticmethod + def _log_adversarial_metadata(metadata: dict[str, Any], feature_names: list[str]) -> None: + integer_features = metadata["integer_features"] + categorical_features = metadata["categorical_features"] + non_perturbable_features = metadata["non_perturbable_features"] + logger.info( + "[AdultCensus] Tabular adversarial feature mask | integer=%s | categorical=%s | " + "categorical_groups=%s | non_perturbable=%s | integer_features=%s | " + "categorical_preview=%s | non_perturbable_preview=%s | integer_step_norm=%s", + len(integer_features), + len(categorical_features), + len(metadata["categorical_groups"]), + len(non_perturbable_features), + [feature_names[idx] for idx in integer_features], + [feature_names[idx] for idx in categorical_features[:20]], + [feature_names[idx] for idx in non_perturbable_features[:20]], + metadata["integer_step_norm"], + ) + + def generate_non_iid_map(self, dataset, partition: str = "dirichlet", partition_parameter: float = 0.5): + if partition == "dirichlet": + return self.dirichlet_partition(dataset, alpha=partition_parameter) + if partition == "percent": + return self.percentage_partition(dataset, percentage=partition_parameter) + 
raise ValueError(f"Partition {partition} is not supported for Non-IID map") + + def generate_iid_map(self, dataset, partition: str = "balancediid", partition_parameter: float = 2): + if partition == "balancediid": + return self.balanced_iid_partition(dataset) + if partition == "unbalancediid": + return self.unbalanced_iid_partition(dataset, imbalance_factor=partition_parameter) + raise ValueError(f"Partition {partition} is not supported for IID map") diff --git a/nebula/core/datasets/breast_cancer/__init__.py b/nebula/core/datasets/breast_cancer/__init__.py new file mode 100755 index 000000000..e69de29bb diff --git a/nebula/core/datasets/breast_cancer/breast_cancer.py b/nebula/core/datasets/breast_cancer/breast_cancer.py new file mode 100644 index 000000000..04fbcf9ae --- /dev/null +++ b/nebula/core/datasets/breast_cancer/breast_cancer.py @@ -0,0 +1,287 @@ +# Wolberg, W., Mangasarian, O., Street, N., & Street, W. (1993). Breast Cancer Wisconsin (Diagnostic) [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5DW2B. +# Licensed under CC BY 4.0: https://creativecommons.org/licenses/by/4.0/ + +import logging +import os +from typing import Any + +import numpy as np +import torch +from torch.utils.data import Dataset + +from nebula.core.datasets.nebuladataset import NebulaDataset, NebulaPartitionHandler +from nebula.core.datasets.tabular_metadata import build_tabular_adversarial_metadata + +logger = logging.getLogger(__name__) + + +class BreastCancerTorchDataset(Dataset): + """ + Torch Dataset wrapper for sklearn breast cancer dataset (tabular). 
+ x: float32 tensor (n_features,) + y: long scalar {0,1} + """ + def __init__( + self, + x: np.ndarray, + y: np.ndarray, + feature_names: list[str] | None = None, + continuous_features: list[int] | None = None, + integer_features: list[int] | None = None, + non_perturbable_features: list[int] | None = None, + tabular_metadata: dict | None = None, + ): + if not isinstance(x, np.ndarray) or not isinstance(y, np.ndarray): + raise ValueError("x and y must be numpy arrays") + + if x.ndim != 2: + raise ValueError(f"x must be 2D (n_samples, n_features). Got shape={x.shape}") + + y = np.asarray(y).reshape(-1) + if x.shape[0] != y.shape[0]: + raise ValueError(f"x and y must have same number of samples. Got {x.shape[0]} != {y.shape[0]}") + + self.x = x.astype(np.float32, copy=False) + self.y = y.astype(np.int64, copy=False) + + # Nebula dataset conventions used by partitioning, logging and model setup. + self.data = self.x + self.targets = self.y + self.classes = ["0", "1"] + self.feature_names = feature_names or [f"feature_{i}" for i in range(self.x.shape[1])] + self.continuous_features = list(range(self.x.shape[1])) if continuous_features is None else continuous_features + self.integer_features = [] if integer_features is None else integer_features + self.non_perturbable_features = [] if non_perturbable_features is None else non_perturbable_features + self.binary_features = [] + self.tabular_metadata = tabular_metadata + self.input_dim = int(self.x.shape[1]) + + def __len__(self) -> int: + return int(self.y.shape[0]) + + def __getitem__(self, idx: int) -> tuple[torch.Tensor, torch.Tensor]: + x_i = torch.from_numpy(self.x[idx]) + y_i = torch.tensor(self.y[idx], dtype=torch.long) + return x_i, y_i + + +class BreastCancerPartitionHandler(NebulaPartitionHandler): + """ + Partition handler for tabular data. 
+ """ + def __init__(self, file_path: str, prefix: str, config: Any, empty: bool = False): + super().__init__(file_path, prefix, config, empty) + self.transform = None # no torchvision transforms for tabular + + def __getitem__(self, idx: int): + data, target = super().__getitem__(idx) + + if isinstance(data, tuple): + data = data[0] + + if isinstance(data, torch.Tensor): + x = data.to(dtype=torch.float32) + else: + x = torch.tensor(np.asarray(data), dtype=torch.float32) + + if isinstance(target, torch.Tensor): + y = target.to(dtype=torch.long) + else: + y = torch.tensor(int(target), dtype=torch.long) + + if self.target_transform is not None: + y = self.target_transform(y) + + return x, y + + +class BreastCancerDataset(NebulaDataset): + """ + Breast Cancer Wisconsin (Diagnostic) dataset integration for Nebula. + + - 2 classes + - tabular features (30) + - deterministic stratified train/test split + """ + # Raw sklearn feature names. These names are also the schema used to decide + # which variables adversarial training may perturb. + FEATURE_COLUMNS = [ + "mean radius", + "mean texture", + "mean perimeter", + "mean area", + "mean smoothness", + "mean compactness", + "mean concavity", + "mean concave points", + "mean symmetry", + "mean fractal dimension", + "radius error", + "texture error", + "perimeter error", + "area error", + "smoothness error", + "compactness error", + "concavity error", + "concave points error", + "symmetry error", + "fractal dimension error", + "worst radius", + "worst texture", + "worst perimeter", + "worst area", + "worst smoothness", + "worst compactness", + "worst concavity", + "worst concave points", + "worst symmetry", + "worst fractal dimension", + ] + # Breast Cancer has only continuous medical measurements. Keeping this as a + # list makes perturbability a dataset-level decision: remove a column here + # and the shared metadata builder will mark it as non-perturbable. 
+ PERTURBABLE_CONTINUOUS_COLUMNS = list(FEATURE_COLUMNS) + PERTURBABLE_INTEGER_COLUMNS = [] + + def __init__( + self, + num_classes: int = 2, + partitions_number: int = 1, + batch_size: int = 32, + num_workers: int = 4, + iid: bool = True, + partition: str = "dirichlet", + partition_parameter: float = 0.5, + seed: int = 42, + config_dir: str | None = None, + test_size: float = 0.2, + ): + super().__init__( + num_classes=num_classes, + partitions_number=partitions_number, + batch_size=batch_size, + num_workers=num_workers, + iid=iid, + partition=partition, + partition_parameter=partition_parameter, + seed=seed, + config_dir=config_dir, + ) + self.test_size = float(test_size) + + def initialize_dataset(self): + if self.train_set is None or self.test_set is None: + self.train_set, self.test_set = self.load_breast_cancer_dataset() + + self.data_partitioning(plot=True) + + def load_breast_cancer_dataset(self): + data_dir = os.path.join(os.path.dirname(os.path.abspath(__file__)), "data") + os.makedirs(data_dir, exist_ok=True) + + try: + from sklearn.datasets import load_breast_cancer + from sklearn.model_selection import train_test_split + from sklearn.preprocessing import StandardScaler + except Exception as e: + raise ImportError( + "BreastCancerDataset requires scikit-learn. Install it (e.g., pip install scikit-learn)." + ) from e + + ds = load_breast_cancer() + x = np.asarray(ds.data) + y = np.asarray(ds.target).reshape(-1) # already 0/1 + feature_names = [str(name) for name in ds.feature_names] + self._validate_manual_schema(feature_names) + + x_train, x_test, y_train, y_test = train_test_split( + x, + y, + test_size=self.test_size, + random_state=self.seed, + shuffle=True, + stratify=y, + ) + + scaler = StandardScaler() + x_train = scaler.fit_transform(x_train) + x_test = scaler.transform(x_test) + + # Constrained PGD receives standardized tensors, so metadata bounds must also be + # computed in this transformed model-input space. 
+ x_train_np = np.asarray(x_train, dtype=np.float32) + x_test_np = np.asarray(x_test, dtype=np.float32) + metadata = self._build_adversarial_metadata(feature_names, x_train_np) + self._log_adversarial_metadata(metadata, feature_names) + + return ( + self._make_dataset(x_train_np, y_train, feature_names, metadata), + self._make_dataset(x_test_np, y_test, feature_names, metadata), + ) + + @classmethod + def _validate_manual_schema(cls, columns) -> None: + dataset_columns = set(columns) + expected_columns = set(cls.FEATURE_COLUMNS) + missing_columns = sorted(expected_columns - dataset_columns) + extra_columns = sorted(dataset_columns - expected_columns) + if missing_columns or extra_columns: + raise ValueError( + "BreastCancerDataset schema mismatch: " + f"missing={missing_columns}, extra={extra_columns}" + ) + + @classmethod + def _build_adversarial_metadata(cls, feature_names, x_train): + # The dataset only declares perturbable columns. The shared builder + # turns that declaration into feature types, bounds and masks for constrained PGD. + return build_tabular_adversarial_metadata( + feature_names=feature_names, + x_train=x_train, + continuous_columns=cls.FEATURE_COLUMNS, + integer_columns=[], + categorical_columns=[], + perturbable_continuous_columns=cls.PERTURBABLE_CONTINUOUS_COLUMNS, + perturbable_integer_columns=cls.PERTURBABLE_INTEGER_COLUMNS, + ) + + @staticmethod + def _make_dataset(x, y, feature_names, metadata) -> BreastCancerTorchDataset: + # Store the same metadata on train and test. Training uses it to create + # adversarial examples; evaluation can inspect it for robustness reports. 
+ return BreastCancerTorchDataset( + x, + y, + feature_names=feature_names, + continuous_features=metadata["continuous_features"], + integer_features=metadata["integer_features"], + non_perturbable_features=metadata["non_perturbable_features"], + tabular_metadata=metadata["tabular_metadata"], + ) + + @staticmethod + def _log_adversarial_metadata(metadata: dict[str, Any], feature_names: list[str]) -> None: + continuous_features = metadata["continuous_features"] + non_perturbable_features = metadata["non_perturbable_features"] + logger.info( + "[BreastCancer] Tabular adversarial feature mask | continuous=%s | " + "non_perturbable=%s | continuous_features=%s | non_perturbable_preview=%s", + len(continuous_features), + len(non_perturbable_features), + [feature_names[idx] for idx in continuous_features], + [feature_names[idx] for idx in non_perturbable_features[:20]], + ) + + def generate_non_iid_map(self, dataset, partition: str = "dirichlet", partition_parameter: float = 0.5): + if partition == "dirichlet": + return self.dirichlet_partition(dataset, alpha=partition_parameter) + if partition == "percent": + return self.percentage_partition(dataset, percentage=partition_parameter) + raise ValueError(f"Partition {partition} is not supported for Non-IID map") + + def generate_iid_map(self, dataset, partition: str = "balancediid", partition_parameter: float = 2): + if partition == "balancediid": + return self.balanced_iid_partition(dataset) + if partition == "unbalancediid": + return self.unbalanced_iid_partition(dataset, imbalance_factor=partition_parameter) + raise ValueError(f"Partition {partition} is not supported for IID map") diff --git a/nebula/core/datasets/covtype/__init__.py b/nebula/core/datasets/covtype/__init__.py new file mode 100755 index 000000000..e69de29bb diff --git a/nebula/core/datasets/covtype/covtype.py b/nebula/core/datasets/covtype/covtype.py new file mode 100644 index 000000000..2ef0a360c --- /dev/null +++ b/nebula/core/datasets/covtype/covtype.py 
@@ -0,0 +1,415 @@ +# nebula/core/datasets/covtype/covtype.py +# Blackard, J. (1998). Covertype [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C50K5N. +# Licensed under CC BY 4.0: https://creativecommons.org/licenses/by/4.0/ + +import logging +import os +from typing import Any + +import numpy as np +import torch +from torch.utils.data import Dataset + +from nebula.core.datasets.nebuladataset import NebulaDataset, NebulaPartitionHandler +from nebula.core.datasets.tabular_metadata import build_tabular_adversarial_metadata + +logger = logging.getLogger(__name__) + + +class CovtypeTorchDataset(Dataset): + """ + Torch Dataset wrapper for tabular Covtype data. + + Returns: + x: torch.float32 tensor of shape (n_features,) + y: torch.long scalar in [0, num_classes-1] + """ + def __init__( + self, + x: np.ndarray, + y: np.ndarray, + feature_names: list[str] | None = None, + continuous_features: list[int] | None = None, + integer_features: list[int] | None = None, + non_perturbable_features: list[int] | None = None, + binary_features: list[int] | None = None, + tabular_metadata: dict | None = None, + ): + if not isinstance(x, np.ndarray) or not isinstance(y, np.ndarray): + raise ValueError("x and y must be numpy arrays") + + if x.ndim != 2: + raise ValueError(f"x must be 2D (n_samples, n_features). Got shape={x.shape}") + if y.ndim != 1: + y = y.reshape(-1) + + if x.shape[0] != y.shape[0]: + raise ValueError(f"x and y must have same number of samples. Got {x.shape[0]} != {y.shape[0]}") + + self.x = x.astype(np.float32, copy=False) + self.y = y.astype(np.int64, copy=False) + + # Nebula dataset conventions used by partitioning, logging and model setup. 
+ self.data = self.x + self.targets = self.y + + n_classes = int(np.max(self.targets)) + 1 + self.classes = [str(i) for i in range(n_classes)] + self.feature_names = feature_names or [f"feature_{i}" for i in range(self.x.shape[1])] + self.continuous_features = continuous_features or [] + self.integer_features = integer_features or [] + self.non_perturbable_features = non_perturbable_features or [] + self.binary_features = binary_features or [] + self.tabular_metadata = tabular_metadata + self.input_dim = int(self.x.shape[1]) + + def __len__(self) -> int: + return int(self.y.shape[0]) + + def __getitem__(self, idx: int) -> tuple[torch.Tensor, torch.Tensor]: + x_i = torch.from_numpy(self.x[idx]) + y_i = torch.tensor(self.y[idx], dtype=torch.long) + return x_i, y_i + + +class CovtypePartitionHandler(NebulaPartitionHandler): + """ + Partition handler for tabular datasets. + + NebulaPartitionHandler provides (data, target) from the partition storage. + For images, we usually convert to PIL and apply torchvision transforms. + Here we convert features to float32 torch tensors and targets to long. + """ + def __init__(self, file_path: str, prefix: str, config: Any, empty: bool = False): + super().__init__(file_path, prefix, config, empty) + + # Tabular features are already preprocessed before partitioning, so no + # torchvision-style transform is applied here. + self.transform = None + + def __getitem__(self, idx: int): + data, target = super().__getitem__(idx) + + # Partition storage can return lists, numpy arrays or tensors. The model + # expects a 1D float32 tensor for each tabular sample. + if isinstance(data, tuple): + data = data[0] + + if isinstance(data, torch.Tensor): + x = data.to(dtype=torch.float32) + else: + x = torch.tensor(np.asarray(data), dtype=torch.float32) + + # Targets are stored as class indices and consumed by CrossEntropyLoss. 
+ if isinstance(target, torch.Tensor): + y = target.to(dtype=torch.long) + else: + y = torch.tensor(int(target), dtype=torch.long) + + if self.target_transform is not None: + y = self.target_transform(y) + + return x, y + + +class CovtypeDataset(NebulaDataset): + """ + Covtype (Forest CoverType) dataset integration for Nebula. + + Notes: + - Covtype has 7 classes. + - Features are tabular (54 features in the classic version). + - Deterministic stratified train/test split. + + Requirements: + - scikit-learn must be installed (for fetch_covtype + train_test_split). + """ + CONTINUOUS_COLUMNS = [ + "Elevation", + "Aspect", + "Slope", + "Horizontal_Distance_To_Hydrology", + "Vertical_Distance_To_Hydrology", + "Horizontal_Distance_To_Roadways", + "Hillshade_9am", + "Hillshade_Noon", + "Hillshade_3pm", + "Horizontal_Distance_To_Fire_Points", + ] + BINARY_COLUMNS = [ + "Wilderness_Area_0", + "Wilderness_Area_1", + "Wilderness_Area_2", + "Wilderness_Area_3", + "Soil_Type_0", + "Soil_Type_1", + "Soil_Type_2", + "Soil_Type_3", + "Soil_Type_4", + "Soil_Type_5", + "Soil_Type_6", + "Soil_Type_7", + "Soil_Type_8", + "Soil_Type_9", + "Soil_Type_10", + "Soil_Type_11", + "Soil_Type_12", + "Soil_Type_13", + "Soil_Type_14", + "Soil_Type_15", + "Soil_Type_16", + "Soil_Type_17", + "Soil_Type_18", + "Soil_Type_19", + "Soil_Type_20", + "Soil_Type_21", + "Soil_Type_22", + "Soil_Type_23", + "Soil_Type_24", + "Soil_Type_25", + "Soil_Type_26", + "Soil_Type_27", + "Soil_Type_28", + "Soil_Type_29", + "Soil_Type_30", + "Soil_Type_31", + "Soil_Type_32", + "Soil_Type_33", + "Soil_Type_34", + "Soil_Type_35", + "Soil_Type_36", + "Soil_Type_37", + "Soil_Type_38", + "Soil_Type_39", + ] + # Covtype has two kinds of inputs: + # - terrain measurements, which constrained PGD may perturb; + # - binary wilderness/soil indicators, which are already one-hot-like. + # + # The binary groups are immutable in the current metadata. 
This avoids + # invalid wilderness/soil combinations while still exercising constrained + # PGD on the numeric part of the dataset. + PERTURBABLE_CONTINUOUS_COLUMNS = list(CONTINUOUS_COLUMNS) + PERTURBABLE_INTEGER_COLUMNS = [] + NON_PERTURBABLE_COLUMNS = list(BINARY_COLUMNS) + + def __init__( + self, + num_classes: int = 7, + partitions_number: int = 1, + batch_size: int = 32, + num_workers: int = 4, + iid: bool = True, + partition: str = "dirichlet", + partition_parameter: float = 0.5, + seed: int = 42, + config_dir: str | None = None, + test_size: float = 0.2, + train_limit: int | None = None, + test_limit: int | None = None, + ): + super().__init__( + num_classes=num_classes, + partitions_number=partitions_number, + batch_size=batch_size, + num_workers=num_workers, + iid=iid, + partition=partition, + partition_parameter=partition_parameter, + seed=seed, + config_dir=config_dir, + ) + self.test_size = float(test_size) + self.train_limit = train_limit + self.test_limit = test_limit + + def initialize_dataset(self): + if self.train_set is None or self.test_set is None: + self.train_set, self.test_set = self.load_covtype_dataset() + + self.data_partitioning(plot=True) + + @classmethod + def _default_feature_names(cls, n_features: int) -> list[str]: + configured_columns = cls.CONTINUOUS_COLUMNS + cls.BINARY_COLUMNS + if n_features == len(configured_columns): + return configured_columns + return [f"feature_{i}" for i in range(n_features)] + + @classmethod + def _validate_manual_schema(cls, columns) -> None: + continuous_columns = set(cls.CONTINUOUS_COLUMNS) + integer_columns = set(cls.PERTURBABLE_INTEGER_COLUMNS) + non_perturbable_columns = set(cls.NON_PERTURBABLE_COLUMNS) + overlapping_columns = sorted( + (continuous_columns & integer_columns) + | (continuous_columns & non_perturbable_columns) + | (integer_columns & non_perturbable_columns) + ) + if overlapping_columns: + raise ValueError(f"CovtypeDataset columns configured twice: {overlapping_columns}") + + 
configured_columns = continuous_columns | integer_columns | non_perturbable_columns + dataset_columns = set(columns) + missing_columns = sorted(configured_columns - dataset_columns) + if missing_columns: + raise ValueError(f"CovtypeDataset is missing configured columns: {missing_columns}") + unconfigured_columns = sorted(dataset_columns - configured_columns) + if unconfigured_columns: + raise ValueError(f"CovtypeDataset has unconfigured columns: {unconfigured_columns}") + + def load_covtype_dataset(self): + """ + Loads Covtype via sklearn, performs a deterministic train/test split, + and wraps into torch Datasets. + """ + data_dir = os.path.join(os.path.dirname(os.path.abspath(__file__)), "data") + os.makedirs(data_dir, exist_ok=True) + + try: + from sklearn.datasets import fetch_covtype + from sklearn.model_selection import train_test_split + from sklearn.preprocessing import StandardScaler + except Exception as e: + raise ImportError( + "CovtypeDataset requires scikit-learn. Install it (e.g., pip install scikit-learn)." + ) from e + + cov = fetch_covtype(data_home=data_dir, download_if_missing=True) + + x = cov.data + y = cov.target # commonly 1..7 in sklearn + feature_names = getattr(cov, "feature_names", None) + if feature_names is None: + feature_names = self._default_feature_names(x.shape[1]) + feature_names = [str(name) for name in feature_names] + try: + self._validate_manual_schema(feature_names) + except ValueError: + if x.shape[1] != len(self.CONTINUOUS_COLUMNS) + len(self.BINARY_COLUMNS): + raise + logger.info( + "[Covtype] Replacing sklearn feature names with canonical Covtype names for adversarial metadata" + ) + feature_names = self._default_feature_names(x.shape[1]) + self._validate_manual_schema(feature_names) + + # sklearn usually returns labels in 1..7. CrossEntropyLoss expects + # zero-based class indices, so map them to 0..6 when needed. 
+ y = np.asarray(y).reshape(-1) + if y.min() == 1: + y = y - 1 + + # Build a deterministic stratified train/test split. + x_train, x_test, y_train, y_test = train_test_split( + x, y, + test_size=self.test_size, + random_state=self.seed, + shuffle=True, + stratify=y, + ) + + # Optional stratified limits keep experiments manageable without + # changing the class distribution unnecessarily. + if self.train_limit is not None and len(y_train) > self.train_limit: + x_train, _, y_train, _ = train_test_split( + x_train, y_train, + train_size=self.train_limit, + random_state=self.seed, + shuffle=True, + stratify=y_train, + ) + + if self.test_limit is not None and len(y_test) > self.test_limit: + x_test, _, y_test, _ = train_test_split( + x_test, y_test, + train_size=self.test_limit, + random_state=self.seed, + shuffle=True, + stratify=y_test, + ) + + # Scale only the terrain measurements. The binary columns must remain + # exact 0/1 values because they encode wilderness and soil indicators. + scaler = StandardScaler() + x_train = np.asarray(x_train, dtype=np.float32).copy() + x_test = np.asarray(x_test, dtype=np.float32).copy() + continuous_features = [ + idx for idx, name in enumerate(feature_names) + if name in self.CONTINUOUS_COLUMNS + ] + x_train[:, continuous_features] = scaler.fit_transform(x_train[:, continuous_features]) + x_test[:, continuous_features] = scaler.transform(x_test[:, continuous_features]) + metadata = self._build_adversarial_metadata(feature_names, x_train) + self._log_adversarial_metadata(metadata, feature_names) + + return ( + self._make_dataset(x_train, y_train, feature_names, metadata), + self._make_dataset(x_test, y_test, feature_names, metadata), + ) + + @staticmethod + def _make_dataset( + x: np.ndarray, + y: np.ndarray, + feature_names: list[str], + metadata: dict[str, Any], + ) -> CovtypeTorchDataset: + return CovtypeTorchDataset( + x, + y, + feature_names=feature_names, + continuous_features=metadata["continuous_features"], + 
integer_features=metadata["integer_features"], + non_perturbable_features=metadata["non_perturbable_features"], + binary_features=metadata["non_perturbable_features"], + tabular_metadata=metadata["tabular_metadata"], + ) + + @classmethod + def _build_adversarial_metadata(cls, feature_names, x_train): + # Dataset responsibility: declare which variables are perturbable. The + # shared builder marks every other feature, including binary indicators, + # as non-perturbable and creates the masks consumed by constrained PGD. + return build_tabular_adversarial_metadata( + feature_names=feature_names, + x_train=x_train, + continuous_columns=cls.CONTINUOUS_COLUMNS, + integer_columns=[], + categorical_columns=[], + perturbable_continuous_columns=cls.PERTURBABLE_CONTINUOUS_COLUMNS, + perturbable_integer_columns=cls.PERTURBABLE_INTEGER_COLUMNS, + ) + + @staticmethod + def _log_adversarial_metadata(metadata: dict[str, Any], feature_names: list[str]) -> None: + continuous_features = metadata["continuous_features"] + non_perturbable_features = metadata["non_perturbable_features"] + logger.info( + "[Covtype] Tabular adversarial feature mask | continuous=%s | binary_non_perturbable=%s | " + "continuous_features=%s | non_perturbable_preview=%s", + len(continuous_features), + len(non_perturbable_features), + [feature_names[idx] for idx in continuous_features], + [feature_names[idx] for idx in non_perturbable_features[:20]], + ) + + def generate_non_iid_map(self, dataset, partition: str = "dirichlet", partition_parameter: float = 0.5): + if partition == "dirichlet": + partitions_map = self.dirichlet_partition(dataset, alpha=partition_parameter) + elif partition == "percent": + partitions_map = self.percentage_partition(dataset, percentage=partition_parameter) + else: + raise ValueError(f"Partition {partition} is not supported for Non-IID map") + + return partitions_map + + def generate_iid_map(self, dataset, partition: str = "balancediid", partition_parameter: float = 2): + if 
partition == "balancediid": + partitions_map = self.balanced_iid_partition(dataset) + elif partition == "unbalancediid": + partitions_map = self.unbalanced_iid_partition(dataset, imbalance_factor=partition_parameter) + else: + raise ValueError(f"Partition {partition} is not supported for IID map") + + return partitions_map diff --git a/nebula/core/datasets/datamodule.py b/nebula/core/datasets/datamodule.py index 04413f35a..aae9bf820 100755 --- a/nebula/core/datasets/datamodule.py +++ b/nebula/core/datasets/datamodule.py @@ -46,7 +46,7 @@ def __init__( self.data_val = None self.global_te_subset = None self.local_te_subset = None - + def get_samples_per_label(self): return self._samples_per_label diff --git a/nebula/core/datasets/image_metadata.py b/nebula/core/datasets/image_metadata.py new file mode 100644 index 000000000..0b206fbf8 --- /dev/null +++ b/nebula/core/datasets/image_metadata.py @@ -0,0 +1,14 @@ +IMAGE_DATASET_NORMALIZATION = { + "MNIST": ((0.5,), (0.5,)), + "FashionMNIST": ((0.5,), (0.5,)), + "EMNIST": ((0.5,), (0.5,)), + "CIFAR10": ((0.4914, 0.4822, 0.4465), (0.2471, 0.2435, 0.2616)), + "CIFAR100": ((0.4914, 0.4822, 0.4465), (0.2471, 0.2435, 0.2616)), +} + + +def get_image_normalization(dataset_name): + # Shared source of image mean/std values used by attacks in normalized model space. + if dataset_name is None: + return None + return IMAGE_DATASET_NORMALIZATION.get(str(dataset_name)) diff --git a/nebula/core/datasets/kddcup99/__init__.py b/nebula/core/datasets/kddcup99/__init__.py new file mode 100755 index 000000000..e69de29bb diff --git a/nebula/core/datasets/kddcup99/kddcup99.py b/nebula/core/datasets/kddcup99/kddcup99.py new file mode 100644 index 000000000..494265bbe --- /dev/null +++ b/nebula/core/datasets/kddcup99/kddcup99.py @@ -0,0 +1,502 @@ +# Stolfo, S., Fan, W., Lee, W., Prodromidis, A., & Chan, P. (1999). KDD Cup 1999 Data [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C51C7N. 
+# Licensed under CC BY 4.0: https://creativecommons.org/licenses/by/4.0/ + +import logging +import os +from typing import Any + +import numpy as np +import torch +from torch.utils.data import Dataset + +from nebula.core.datasets.nebuladataset import NebulaDataset, NebulaPartitionHandler +from nebula.core.datasets.tabular_metadata import build_tabular_adversarial_metadata + +logger = logging.getLogger(__name__) + + +class KDDCUP99TorchDataset(Dataset): + """ + Torch Dataset wrapper for tabular KDDCUP99 data. + + Returns: + x: torch.float32 tensor of shape (n_features,) + y: torch.long scalar in [0, num_classes-1] + """ + def __init__( + self, + x: np.ndarray, + y: np.ndarray, + feature_names: list[str] | None = None, + continuous_features: list[int] | None = None, + integer_features: list[int] | None = None, + non_perturbable_features: list[int] | None = None, + binary_features: list[int] | None = None, + tabular_metadata: dict | None = None, + ): + if not isinstance(x, np.ndarray) or not isinstance(y, np.ndarray): + raise ValueError("x and y must be numpy arrays") + + if x.ndim != 2: + raise ValueError(f"x must be 2D (n_samples, n_features). Got shape={x.shape}") + if y.ndim != 1: + y = y.reshape(-1) + + if x.shape[0] != y.shape[0]: + raise ValueError(f"x and y must have same number of samples. 
Got {x.shape[0]} != {y.shape[0]}") + + self.x = x.astype(np.float32, copy=False) + self.y = y.astype(np.int64, copy=False) + + self.data = self.x + self.targets = self.y + + n_classes = int(np.max(self.targets)) + 1 + self.classes = [str(i) for i in range(n_classes)] + self.feature_names = feature_names or [f"feature_{i}" for i in range(self.x.shape[1])] + self.continuous_features = continuous_features or [] + self.integer_features = integer_features or [] + self.non_perturbable_features = non_perturbable_features or [] + self.binary_features = binary_features or [] + self.tabular_metadata = tabular_metadata + self.input_dim = int(self.x.shape[1]) + + def __len__(self) -> int: + return int(self.y.shape[0]) + + def __getitem__(self, idx: int) -> tuple[torch.Tensor, torch.Tensor]: + x_i = torch.from_numpy(self.x[idx]) + y_i = torch.tensor(self.y[idx], dtype=torch.long) + return x_i, y_i + + +class KDDCUP99PartitionHandler(NebulaPartitionHandler): + """ + Partition handler for tabular datasets. + + NebulaPartitionHandler provides (data, target) from the partition storage. + For images, we usually convert to PIL and apply torchvision transforms. + Here we convert features to float32 torch tensors and targets to long. + """ + def __init__(self, file_path: str, prefix: str, config: Any, empty: bool = False): + super().__init__(file_path, prefix, config, empty) + + # Tabular features are already preprocessed before partitioning, so no + # torchvision-style transform is applied here. + self.transform = None + + def __getitem__(self, idx: int): + data, target = super().__getitem__(idx) + + # Partition storage can return lists, numpy arrays or tensors. The model + # expects a 1D float32 tensor for each tabular sample. 
+ if isinstance(data, tuple): + data = data[0] + + if isinstance(data, torch.Tensor): + x = data.to(dtype=torch.float32) + else: + x = torch.tensor(np.asarray(data), dtype=torch.float32) + + # Targets are stored as class indices and consumed by CrossEntropyLoss. + if isinstance(target, torch.Tensor): + y = target.to(dtype=torch.long) + else: + y = torch.tensor(int(target), dtype=torch.long) + + if self.target_transform is not None: + y = self.target_transform(y) + + return x, y + + +class KDDCUP99Dataset(NebulaDataset): + """ + KDDCUP99 dataset integration for Nebula. + + Notes: + - KDDCUP99 is a tabular intrusion-detection dataset. + - sklearn fetch_kddcup99 exposes 41 features. + - Targets are mapped to a binary task: normal vs attack. + - Categorical string columns are one-hot encoded. + - Targets may come as bytes/strings, so we decode before mapping labels. + + Requirements: + - scikit-learn must be installed + - pandas must be installed + """ + RAW_FEATURE_COLUMNS = [ + "duration", + "protocol_type", + "service", + "flag", + "src_bytes", + "dst_bytes", + "land", + "wrong_fragment", + "urgent", + "hot", + "num_failed_logins", + "logged_in", + "num_compromised", + "root_shell", + "su_attempted", + "num_root", + "num_file_creations", + "num_shells", + "num_access_files", + "num_outbound_cmds", + "is_host_login", + "is_guest_login", + "count", + "srv_count", + "serror_rate", + "srv_serror_rate", + "rerror_rate", + "srv_rerror_rate", + "same_srv_rate", + "diff_srv_rate", + "srv_diff_host_rate", + "dst_host_count", + "dst_host_srv_count", + "dst_host_same_srv_rate", + "dst_host_diff_srv_rate", + "dst_host_same_src_port_rate", + "dst_host_srv_diff_host_rate", + "dst_host_serror_rate", + "dst_host_srv_serror_rate", + "dst_host_rerror_rate", + "dst_host_srv_rerror_rate", + ] + CONTINUOUS_COLUMNS = [ + "serror_rate", + "srv_serror_rate", + "rerror_rate", + "srv_rerror_rate", + "same_srv_rate", + "diff_srv_rate", + "srv_diff_host_rate", + "dst_host_same_srv_rate", + 
"dst_host_diff_srv_rate", + "dst_host_same_src_port_rate", + "dst_host_srv_diff_host_rate", + "dst_host_serror_rate", + "dst_host_srv_serror_rate", + "dst_host_rerror_rate", + "dst_host_srv_rerror_rate", + ] + INTEGER_COLUMNS = [ + "duration", + "src_bytes", + "dst_bytes", + "wrong_fragment", + "urgent", + "hot", + "num_failed_logins", + "num_compromised", + "num_root", + "num_file_creations", + "num_shells", + "num_access_files", + "num_outbound_cmds", + "count", + "srv_count", + "dst_host_count", + "dst_host_srv_count", + ] + CATEGORICAL_COLUMNS = [ + "protocol_type", + "service", + "flag", + ] + NON_PERTURBABLE_COLUMNS = [ + "land", + "logged_in", + "root_shell", + "su_attempted", + "is_host_login", + "is_guest_login", + ] + # KDDCUP99 exposes mixed network-traffic features. For the first supported + # adversarial-training version, constrained PGD may perturb numeric traffic + # measurements and counters. Protocol/service/flag one-hot columns and + # binary login/status flags stay immutable to avoid invalid records. 
+ PERTURBABLE_CONTINUOUS_COLUMNS = list(CONTINUOUS_COLUMNS) + PERTURBABLE_INTEGER_COLUMNS = list(INTEGER_COLUMNS) + + def __init__( + self, + num_classes: int = 2, + partitions_number: int = 1, + batch_size: int = 32, + num_workers: int = 4, + iid: bool = True, + partition: str = "dirichlet", + partition_parameter: float = 0.5, + seed: int = 42, + config_dir: str | None = None, + test_size: float = 0.2, + train_limit: int | None = 20000, + test_limit: int | None = 4000, + subset: str | None = None, + percent10: bool = True, + ): + super().__init__( + num_classes=num_classes, + partitions_number=partitions_number, + batch_size=batch_size, + num_workers=num_workers, + iid=iid, + partition=partition, + partition_parameter=partition_parameter, + seed=seed, + config_dir=config_dir, + ) + self.test_size = float(test_size) + self.train_limit = train_limit + self.test_limit = test_limit + self.subset = subset + self.percent10 = percent10 + + def initialize_dataset(self): + if self.train_set is None or self.test_set is None: + self.train_set, self.test_set = self.load_kddcup99_dataset() + + self.data_partitioning(plot=True) + + @classmethod + def _ensure_raw_feature_names(cls, x): + if list(x.columns) == list(range(len(cls.RAW_FEATURE_COLUMNS))): + x = x.copy() + x.columns = cls.RAW_FEATURE_COLUMNS + return x + + @classmethod + def _validate_manual_schema(cls, columns) -> None: + continuous_columns = set(cls.CONTINUOUS_COLUMNS) + integer_columns = set(cls.INTEGER_COLUMNS) + categorical_columns = set(cls.CATEGORICAL_COLUMNS) + non_perturbable_columns = set(cls.NON_PERTURBABLE_COLUMNS) + overlapping_columns = sorted( + (continuous_columns & integer_columns) + | (continuous_columns & categorical_columns) + | (continuous_columns & non_perturbable_columns) + | (integer_columns & categorical_columns) + | (integer_columns & non_perturbable_columns) + | (categorical_columns & non_perturbable_columns) + ) + if overlapping_columns: + raise ValueError(f"KDDCUP99Dataset columns 
configured twice: {overlapping_columns}") + + configured_columns = continuous_columns | integer_columns | categorical_columns | non_perturbable_columns + dataset_columns = set(columns) + missing_columns = sorted(configured_columns - dataset_columns) + if missing_columns: + raise ValueError(f"KDDCUP99Dataset is missing configured columns: {missing_columns}") + unconfigured_columns = sorted(dataset_columns - configured_columns) + if unconfigured_columns: + raise ValueError(f"KDDCUP99Dataset has unconfigured columns: {unconfigured_columns}") + + def load_kddcup99_dataset(self): + """ + Loads KDDCUP99 via sklearn, performs deterministic preprocessing + and train/test split, and wraps into torch Datasets. + """ + data_dir = os.path.join(os.path.dirname(os.path.abspath(__file__)), "data") + os.makedirs(data_dir, exist_ok=True) + + try: + import pandas as pd + from sklearn.datasets import fetch_kddcup99 + from sklearn.model_selection import train_test_split + from sklearn.preprocessing import StandardScaler + except Exception as e: + raise ImportError( + "KDDCUP99Dataset requires scikit-learn and pandas. Install them (e.g., pip install scikit-learn pandas)." + ) from e + + kdd = fetch_kddcup99( + subset=self.subset, + data_home=data_dir, + shuffle=True, + random_state=self.seed, + percent10=self.percent10, + download_if_missing=True, + as_frame=True, + ) + + x = kdd.data + y = kdd.target + + # fetch_kddcup99 can return numpy arrays depending on sklearn version. + # The preprocessing below expects a pandas DataFrame and a pandas Series. + if not hasattr(x, "columns"): + x = pd.DataFrame(x) + if not hasattr(y, "map"): + y = pd.Series(y) + x = self._ensure_raw_feature_names(x) + self._validate_manual_schema(x.columns) + + def _decode_if_bytes(v): + if isinstance(v, (bytes, bytearray)): + return v.decode("utf-8", errors="ignore") + return v + + # Decode bytes before one-hot encoding categorical columns and mapping labels. 
+ for col in x.columns: + if x[col].dtype == object: + x[col] = x[col].map(_decode_if_bytes) + + y = y.map(_decode_if_bytes) + + # One-hot encode protocol/service/flag and keep numeric columns as-is. + x = pd.get_dummies(x, drop_first=False) + feature_names = [str(col) for col in x.columns] + logger.info("[KDDCUP99] Encoded feature dimension: %s", len(feature_names)) + + # Map labels to a binary task: 0 = normal, 1 = attack. + y = pd.Series(y).astype(str) + y = y.str.strip() + y = (y != "normal.").astype(np.int64).to_numpy(copy=False) + self.num_classes = 2 + + # Build a deterministic stratified train/test split. + x_train, x_test, y_train, y_test = train_test_split( + x, y, + test_size=self.test_size, + random_state=self.seed, + shuffle=True, + stratify=y, + ) + + # Optional stratified limits keep experiments manageable without + # changing the class distribution unnecessarily. + if self.train_limit is not None and len(y_train) > self.train_limit: + x_train, _, y_train, _ = train_test_split( + x_train, y_train, + train_size=self.train_limit, + random_state=self.seed, + shuffle=True, + stratify=y_train, + ) + logger.info("[KDDCUP99] Limited train split to %s samples", len(y_train)) + + if self.test_limit is not None and len(y_test) > self.test_limit: + x_test, _, y_test, _ = train_test_split( + x_test, y_test, + train_size=self.test_limit, + random_state=self.seed, + shuffle=True, + stratify=y_test, + ) + logger.info("[KDDCUP99] Limited test split to %s samples", len(y_test)) + + x_train_np = x_train.astype(np.float32).to_numpy(copy=True) + x_test_np = x_test.astype(np.float32).to_numpy(copy=True) + + # Scale perturbable numeric columns after splitting. One-hot categorical + # columns and binary flags remain exact 0/1 values. 
+ continuous_features = self._column_indices(x_train.columns, self.CONTINUOUS_COLUMNS) + integer_features = self._column_indices(x_train.columns, self.INTEGER_COLUMNS) + scaled_features = continuous_features + integer_features + integer_step_by_column = {} + if scaled_features: + scaler = StandardScaler() + x_train_np[:, scaled_features] = scaler.fit_transform(x_train_np[:, scaled_features]) + x_test_np[:, scaled_features] = scaler.transform(x_test_np[:, scaled_features]) + integer_scales = scaler.scale_[len(continuous_features):] + integer_step_by_column = { + column: float(1.0 / scale) + for column, scale in zip(self.INTEGER_COLUMNS, integer_scales, strict=False) + } + + metadata = self._build_adversarial_metadata(feature_names, x_train_np, integer_step_by_column) + self._log_adversarial_metadata(metadata, feature_names) + + return ( + self._make_dataset(x_train_np, y_train, feature_names, metadata), + self._make_dataset(x_test_np, y_test, feature_names, metadata), + ) + + @staticmethod + def _column_indices(columns, names: list[str]) -> list[int]: + return [columns.get_loc(name) for name in names if name in columns] + + @staticmethod + def _make_dataset( + x: np.ndarray, + y: np.ndarray, + feature_names: list[str], + metadata: dict[str, Any], + ) -> KDDCUP99TorchDataset: + dataset = KDDCUP99TorchDataset( + x, + y, + feature_names=feature_names, + continuous_features=metadata["continuous_features"], + integer_features=metadata["integer_features"], + non_perturbable_features=metadata["non_perturbable_features"], + binary_features=metadata["non_perturbable_features"], + tabular_metadata=metadata["tabular_metadata"], + ) + dataset.classes = ["normal", "attack"] + return dataset + + @classmethod + def _build_adversarial_metadata( + cls, + feature_names: list[str], + x_train: np.ndarray, + integer_step_by_column: dict[str, float], + ) -> dict[str, Any]: + # Dataset responsibility: declare which raw variables are perturbable. 
+ # The shared builder maps that declaration to transformed feature masks, + # bounds and integer steps in model-input space. + return build_tabular_adversarial_metadata( + feature_names=feature_names, + x_train=x_train, + continuous_columns=cls.CONTINUOUS_COLUMNS, + integer_columns=cls.INTEGER_COLUMNS, + categorical_columns=cls.CATEGORICAL_COLUMNS, + perturbable_continuous_columns=cls.PERTURBABLE_CONTINUOUS_COLUMNS, + perturbable_integer_columns=cls.PERTURBABLE_INTEGER_COLUMNS, + integer_step_by_column=integer_step_by_column, + ) + + @staticmethod + def _log_adversarial_metadata(metadata: dict[str, Any], feature_names: list[str]) -> None: + continuous_features = metadata["continuous_features"] + integer_features = metadata["integer_features"] + non_perturbable_features = metadata["non_perturbable_features"] + logger.info( + "[KDDCUP99] Tabular adversarial feature mask | continuous=%s | integer=%s | " + "non_perturbable=%s | continuous_features=%s | integer_features=%s | " + "non_perturbable_preview=%s | integer_step_norm=%s", + len(continuous_features), + len(integer_features), + len(non_perturbable_features), + [feature_names[idx] for idx in continuous_features], + [feature_names[idx] for idx in integer_features], + [feature_names[idx] for idx in non_perturbable_features[:20]], + metadata["integer_step_norm"], + ) + + def generate_non_iid_map(self, dataset, partition: str = "dirichlet", partition_parameter: float = 0.5): + if partition == "dirichlet": + partitions_map = self.dirichlet_partition(dataset, alpha=partition_parameter) + elif partition == "percent": + partitions_map = self.percentage_partition(dataset, percentage=partition_parameter) + else: + raise ValueError(f"Partition {partition} is not supported for Non-IID map") + + return partitions_map + + def generate_iid_map(self, dataset, partition: str = "balancediid", partition_parameter: float = 2): + if partition == "balancediid": + partitions_map = self.balanced_iid_partition(dataset) + elif partition 
== "unbalancediid": + partitions_map = self.unbalanced_iid_partition(dataset, imbalance_factor=partition_parameter) + else: + raise ValueError(f"Partition {partition} is not supported for IID map") + + return partitions_map diff --git a/nebula/core/datasets/nebuladataset.py b/nebula/core/datasets/nebuladataset.py index 0c2e03d8a..4e5e6c903 100755 --- a/nebula/core/datasets/nebuladataset.py +++ b/nebula/core/datasets/nebuladataset.py @@ -1,4 +1,5 @@ import copy +import json import os import pickle from abc import ABC, abstractmethod @@ -74,6 +75,11 @@ def load_data(self): self.data = self.load_partition(f, f"{prefix}_data") self.targets = np.array(f[f"{prefix}_targets"]) self.num_classes = f[f"{prefix}_data"].attrs.get("num_classes", 0) + raw_tabular_metadata = f[f"{prefix}_data"].attrs.get("tabular_metadata", None) + if raw_tabular_metadata is not None: + if isinstance(raw_tabular_metadata, bytes): + raw_tabular_metadata = raw_tabular_metadata.decode("utf-8") + self.tabular_metadata = json.loads(raw_tabular_metadata) self.length = len(self.data) logging_training.info( f"[NebulaPartitionHandler] [{self.prefix}] Loaded {self.length} samples from {self.file_path} and {self.num_classes} classes." 
@@ -156,6 +162,9 @@ def load_partition(self, file, name): elif typ == "pickle_bytes": logging_training.info(f"Loading compressed pickled bytes object from {name}") return pickle.loads(item[()]) + elif typ == "array": + logging_training.info(f"Loading array object from {name}") + return item[()] else: logging_training.warning(f"[NebulaPartitionHandler] Unknown type encountered: {typ} for item {name}") return item[()] @@ -289,6 +298,8 @@ def load_partition(self): self.local_test_set = self.handler(test_partition_file, "local_test", config=self.config, empty=True) self.local_test_set.set_data(self.test_set.data, self.test_set.targets) + if hasattr(self.test_set, "tabular_metadata"): + self.local_test_set.tabular_metadata = self.test_set.tabular_metadata self.local_test_indices = self.set_local_test_indices() logging_training.info(f"Successfully loaded partition data for participant {p}.") @@ -458,6 +469,18 @@ def save_partition(self, obj, file, name): logging.exception(f"Error saving object to HDF5: {e}") raise + def save_dataset_partition(self, dataset, indices, file, name): + if hasattr(dataset, "x") and isinstance(dataset.x, np.ndarray): + logging.info(f"Saving array partition {name} with {len(indices)} samples") + data = dataset.x[indices].astype(np.float32, copy=False) + ds = file.create_dataset(name, data=data, compression="lzf", shuffle=True) + ds.attrs["__type__"] = "array" + logging.info(f"Saved array partition {name} with shape {data.shape}") + return + + partition_data = [dataset[i] for i in indices] + self.save_partition(partition_data, file, name) + def save_partitions(self): """ Save each partition data (train, test, and local test) to separate pickle files. 
@@ -481,9 +504,9 @@ def save_partitions(self): file_name = os.path.join(path, "global_test.h5") with h5py.File(file_name, "w") as f: indices = list(range(len(self.test_set))) - test_data = [self.test_set[i] for i in indices] - self.save_partition(test_data, f, "test_data") + self.save_dataset_partition(self.test_set, indices, f, "test_data") f["test_data"].attrs["num_classes"] = self.num_classes + self._save_tabular_metadata_attr(self.test_set, f["test_data"]) test_targets = np.array(self.test_set.targets) f.create_dataset("test_targets", data=test_targets, compression="gzip") @@ -492,9 +515,9 @@ def save_partitions(self): with h5py.File(file_name, "w") as f: logging.info(f"Saving training data for participant {participant} in {file_name}") indices = self.train_indices_map[participant] - train_data = [self.train_set[i] for i in indices] - self.save_partition(train_data, f, "train_data") + self.save_dataset_partition(self.train_set, indices, f, "train_data") f["train_data"].attrs["num_classes"] = self.num_classes + self._save_tabular_metadata_attr(self.train_set, f["train_data"]) train_targets = np.array([self.train_set.targets[i] for i in indices]) f.create_dataset("train_targets", data=train_targets, compression="gzip") logging.info(f"Partition saved for participant {participant}.") @@ -508,6 +531,14 @@ def save_partitions(self): self.clear() logging.info("Cleared dataset after saving partitions.") + def _save_tabular_metadata_attr(self, dataset, h5_dataset): + metadata = getattr(dataset, "tabular_metadata", None) + if metadata is None: + return + if hasattr(metadata, "to_dict"): + metadata = metadata.to_dict() + h5_dataset.attrs["tabular_metadata"] = json.dumps(metadata) + @abstractmethod def generate_non_iid_map(self, dataset, partition="dirichlet", plot=False): """ @@ -1285,11 +1316,19 @@ def factory_nebuladataset(dataset, **config) -> NebulaDataset: from nebula.core.datasets.cifar100.cifar100 import CIFAR100Dataset from nebula.core.datasets.emnist.emnist 
import EMNISTDataset from nebula.core.datasets.fashionmnist.fashionmnist import FashionMNISTDataset + from nebula.core.datasets.covtype.covtype import CovtypeDataset + from nebula.core.datasets.kddcup99.kddcup99 import KDDCUP99Dataset + from nebula.core.datasets.adultcensus.adultcensus import AdultCensusDataset + from nebula.core.datasets.breast_cancer.breast_cancer import BreastCancerDataset from nebula.core.datasets.mnist.mnist import MNISTDataset options = { "MNIST": MNISTDataset, "FashionMNIST": FashionMNISTDataset, + "Covtype": CovtypeDataset, + "KDDCUP99": KDDCUP99Dataset, + "AdultCensus": AdultCensusDataset, + "BreastCancer": BreastCancerDataset, "EMNIST": EMNISTDataset, "CIFAR10": CIFAR10Dataset, "CIFAR100": CIFAR100Dataset, diff --git a/nebula/core/datasets/tabular_metadata.py b/nebula/core/datasets/tabular_metadata.py new file mode 100644 index 000000000..85d240099 --- /dev/null +++ b/nebula/core/datasets/tabular_metadata.py @@ -0,0 +1,281 @@ +from __future__ import annotations + +from dataclasses import asdict, dataclass +from typing import Any + +CONTINUOUS = "continuous" +INTEGER = "integer" +CATEGORICAL = "categorical" +NON_PERTURBABLE = "non_perturbable" + +ERR_FEATURE_TYPES_LENGTH = "feature_types length must match feature_names length" +ERR_FEATURE_MIN_LENGTH = "feature_min_norm length must match feature_names length" +ERR_FEATURE_MAX_LENGTH = "feature_max_norm length must match feature_names length" +ERR_UNSUPPORTED_FEATURE_TYPES = "Unsupported tabular feature types: {feature_types}" +ERR_FEATURE_BOUNDS = "feature_min_norm must be <= feature_max_norm for every feature" +ERR_INTEGER_STEP_INDEX = "integer_step_norm contains invalid feature indices: {indices}" +ERR_INTEGER_STEP_VALUE = "integer_step_norm values must be > 0" +ERR_INTEGER_STEP_TYPE = "integer_step_norm contains non-integer feature indices: {indices}" +ERR_CATEGORICAL_GROUP_SIZE = "categorical_groups entries must contain at least two feature indices" +ERR_CATEGORICAL_GROUP_INDEX = 
"categorical_groups contains invalid feature indices: {indices}" +ERR_CATEGORICAL_GROUP_TYPE = "categorical_groups contains non-categorical feature indices: {indices}" +ERR_CATEGORICAL_GROUP_OVERLAP = "categorical_groups contains duplicated feature indices: {indices}" +ERR_CATEGORICAL_GROUP_COVERAGE = "categorical feature indices missing from categorical_groups: {indices}" + + +@dataclass(frozen=True) +class TabularAdversarialMetadata: + """Minimal metadata for tabular adversarial training.""" + + # These fields describe the exact vector received by the model after preprocessing. + # Bounds and steps must use the same normalized space as the training tensors. + feature_names: list[str] + feature_types: list[str] + feature_min_norm: list[float] + feature_max_norm: list[float] + integer_step_norm: dict[int, float] | None = None + categorical_groups: list[list[int]] | None = None + + def __post_init__(self): + # Fail early if a dataset exposes incomplete metadata. The attack relies on + # these arrays lining up feature-by-feature. + n_features = len(self.feature_names) + if len(self.feature_types) != n_features: + raise ValueError(ERR_FEATURE_TYPES_LENGTH) + if len(self.feature_min_norm) != n_features: + raise ValueError(ERR_FEATURE_MIN_LENGTH) + if len(self.feature_max_norm) != n_features: + raise ValueError(ERR_FEATURE_MAX_LENGTH) + + # Every feature needs a valid normalized interval so projection can clamp safely. + invalid_bounds = [ + idx + for idx, (min_value, max_value) in enumerate( + zip(self.feature_min_norm, self.feature_max_norm, strict=True) + ) + if min_value > max_value + ] + if invalid_bounds: + raise ValueError(ERR_FEATURE_BOUNDS) + invalid_types = set(self.feature_types) - {CONTINUOUS, INTEGER, CATEGORICAL, NON_PERTURBABLE} + if invalid_types: + raise ValueError(ERR_UNSUPPORTED_FEATURE_TYPES.format(feature_types=sorted(invalid_types))) + + # Integer steps represent the normalized distance between consecutive integer values. 
+ # They only make sense for features marked as INTEGER. + invalid_step_indices = [ + idx + for idx in (self.integer_step_norm or {}) + if int(idx) < 0 or int(idx) >= n_features + ] + if invalid_step_indices: + raise ValueError(ERR_INTEGER_STEP_INDEX.format(indices=invalid_step_indices)) + non_integer_step_indices = [ + idx + for idx in (self.integer_step_norm or {}) + if self.feature_types[int(idx)] != INTEGER + ] + if non_integer_step_indices: + raise ValueError(ERR_INTEGER_STEP_TYPE.format(indices=non_integer_step_indices)) + if any(step <= 0 for step in (self.integer_step_norm or {}).values()): + raise ValueError(ERR_INTEGER_STEP_VALUE) + + # Categorical groups represent one original categorical column after one-hot encoding. + # Each group must be disjoint so projection can activate exactly one value per group. + grouped_counts: dict[int, int] = {} + for group in self.categorical_groups or []: + if len(group) < 2: + raise ValueError(ERR_CATEGORICAL_GROUP_SIZE) + invalid_indices = [idx for idx in group if idx < 0 or idx >= n_features] + if invalid_indices: + raise ValueError(ERR_CATEGORICAL_GROUP_INDEX.format(indices=invalid_indices)) + non_categorical_indices = [idx for idx in group if self.feature_types[idx] != CATEGORICAL] + if non_categorical_indices: + raise ValueError(ERR_CATEGORICAL_GROUP_TYPE.format(indices=non_categorical_indices)) + for idx in group: + grouped_counts[idx] = grouped_counts.get(idx, 0) + 1 + + duplicated_group_indices = sorted(idx for idx, count in grouped_counts.items() if count > 1) + if duplicated_group_indices: + raise ValueError(ERR_CATEGORICAL_GROUP_OVERLAP.format(indices=duplicated_group_indices)) + + # A categorical feature without a group cannot be projected back to a valid one-hot state. 
+ grouped_categorical_indices = { + idx + for group in self.categorical_groups or [] + for idx in group + } + categorical_indices = { + idx + for idx, feature_type in enumerate(self.feature_types) + if feature_type == CATEGORICAL + } + missing_categorical_indices = sorted(categorical_indices - grouped_categorical_indices) + if missing_categorical_indices: + raise ValueError(ERR_CATEGORICAL_GROUP_COVERAGE.format(indices=missing_categorical_indices)) + + def to_dict(self) -> dict[str, Any]: + # Partitions persist metadata as JSON-like dictionaries in HDF5 attributes. + return asdict(self) + + @classmethod + def from_dict(cls, data: dict[str, Any]) -> TabularAdversarialMetadata: + # HDF5/JSON round-trips can turn integer keys into strings; normalize them here. + return cls( + feature_names=[str(value) for value in data["feature_names"]], + feature_types=[str(value) for value in data["feature_types"]], + feature_min_norm=[float(value) for value in data["feature_min_norm"]], + feature_max_norm=[float(value) for value in data["feature_max_norm"]], + integer_step_norm={int(k): float(v) for k, v in (data.get("integer_step_norm") or {}).items()}, + categorical_groups=[ + [int(idx) for idx in group] + for group in data.get("categorical_groups") or [] + ], + ) + + +def build_tabular_adversarial_metadata( + *, + feature_names: list[str], + x_train, + continuous_columns: list[str] | tuple[str, ...] = (), + integer_columns: list[str] | tuple[str, ...] = (), + categorical_columns: list[str] | tuple[str, ...] = (), + perturbable_continuous_columns: list[str] | tuple[str, ...] = (), + perturbable_integer_columns: list[str] | tuple[str, ...] = (), + perturbable_categorical_columns: list[str] | tuple[str, ...] = (), + integer_step_by_column: dict[str, float] | None = None, +) -> dict[str, Any]: + """Build tabular adversarial metadata from dataset-level perturbability lists.""" + # Datasets should only decide which raw columns are perturbable. 
This helper + # maps that decision to the transformed feature vector consumed by the model. + _validate_perturbable_columns( + continuous_columns=continuous_columns, + integer_columns=integer_columns, + categorical_columns=categorical_columns, + perturbable_continuous_columns=perturbable_continuous_columns, + perturbable_integer_columns=perturbable_integer_columns, + perturbable_categorical_columns=perturbable_categorical_columns, + ) + + perturbable_continuous = set(perturbable_continuous_columns) + perturbable_integer = set(perturbable_integer_columns) + perturbable_categorical = set(perturbable_categorical_columns) + + # Continuous/integer transformed features usually keep their raw column name + # after an optional transformer prefix, for example "integer__age". + continuous_features = [ + idx + for idx, name in enumerate(feature_names) + if _raw_feature_name(name) in perturbable_continuous + ] + integer_features = [ + idx + for idx, name in enumerate(feature_names) + if _raw_feature_name(name) in perturbable_integer + ] + # One raw categorical column becomes several one-hot features, for example + # "categorical__sex_Female" and "categorical__sex_Male". + categorical_features = [ + idx + for idx, name in enumerate(feature_names) + if _categorical_column_name(name, categorical_columns) in perturbable_categorical + ] + + continuous_feature_set = set(continuous_features) + integer_feature_set = set(integer_features) + categorical_feature_set = set(categorical_features) + perturbable_feature_set = continuous_feature_set | integer_feature_set | categorical_feature_set + non_perturbable_features = [ + idx + for idx in range(len(feature_names)) + if idx not in perturbable_feature_set + ] + + categorical_groups = _categorical_groups(feature_names, perturbable_categorical) + integer_step_norm = _integer_step_norm(feature_names, integer_features, integer_step_by_column or {}) + # The attack consumes only TabularAdversarialMetadata. 
The extra lists are + # returned so dataset wrappers and logs can expose the same mask clearly. + tabular_metadata = TabularAdversarialMetadata( + feature_names=feature_names, + feature_types=[ + CONTINUOUS if idx in continuous_feature_set + else INTEGER if idx in integer_feature_set + else CATEGORICAL if idx in categorical_feature_set + else NON_PERTURBABLE + for idx in range(len(feature_names)) + ], + feature_min_norm=[float(value) for value in x_train.min(axis=0)], + feature_max_norm=[float(value) for value in x_train.max(axis=0)], + integer_step_norm=integer_step_norm, + categorical_groups=categorical_groups, + ).to_dict() + + return { + "continuous_features": continuous_features, + "integer_features": integer_features, + "categorical_features": categorical_features, + "non_perturbable_features": non_perturbable_features, + "categorical_groups": categorical_groups, + "integer_step_norm": integer_step_norm, + "tabular_metadata": tabular_metadata, + } + + +def _validate_perturbable_columns( + *, + continuous_columns, + integer_columns, + categorical_columns, + perturbable_continuous_columns, + perturbable_integer_columns, + perturbable_categorical_columns, +) -> None: + invalid_continuous = sorted(set(perturbable_continuous_columns) - set(continuous_columns)) + invalid_integer = sorted(set(perturbable_integer_columns) - set(integer_columns)) + invalid_categorical = sorted(set(perturbable_categorical_columns) - set(categorical_columns)) + if invalid_continuous or invalid_integer or invalid_categorical: + raise ValueError( + "Perturbable columns must exist in the dataset schema: " + f"continuous={invalid_continuous}, integer={invalid_integer}, categorical={invalid_categorical}" + ) + + +def _raw_feature_name(feature_name: str) -> str: + # Strip sklearn ColumnTransformer prefixes such as "integer__" or + # "categorical__" while leaving plain feature names untouched. 
+ return feature_name.split("__", maxsplit=1)[1] if "__" in feature_name else feature_name + + +def _categorical_column_name(feature_name: str, categorical_columns) -> str | None: + # Recover the raw categorical column name from a one-hot feature name. + raw_name = _raw_feature_name(feature_name) + for column in categorical_columns: + if raw_name.startswith(f"{column}_"): + return column + return None + + +def _categorical_groups(feature_names: list[str], perturbable_categorical_columns: set[str]) -> list[list[int]]: + # Constrained PGD projects each group back to exactly one active one-hot value. + groups = [] + for column in sorted(perturbable_categorical_columns): + prefix = f"{column}_" + group = [idx for idx, name in enumerate(feature_names) if _raw_feature_name(name).startswith(prefix)] + if group: + groups.append(group) + return groups + + +def _integer_step_norm( + feature_names: list[str], + integer_features: list[int], + integer_step_by_column: dict[str, float], +) -> dict[int, float]: + # Integer columns may be scaled. The step tells constrained PGD what "+1 raw unit" + # means in the normalized model-input space.
+ return { + idx: float(integer_step_by_column[_raw_feature_name(feature_names[idx])]) + for idx in integer_features + if _raw_feature_name(feature_names[idx]) in integer_step_by_column + } diff --git a/nebula/core/engine.py b/nebula/core/engine.py index f43625ed2..831abf549 100644 --- a/nebula/core/engine.py +++ b/nebula/core/engine.py @@ -1,4 +1,5 @@ import asyncio +import json import logging import os import random @@ -94,7 +95,7 @@ def __init__( self.ip = config.participant["network_args"]["ip"] self.port = config.participant["network_args"]["port"] self.addr = config.participant["network_args"]["addr"] - + self.name = config.participant["device_args"]["name"] self.client = docker.from_env() @@ -117,6 +118,8 @@ def __init__( self._secure_neighbors = [] self._is_malicious = self.config.participant["adversarial_args"]["attack_params"]["attacks"] != "No Attack" + role = config.participant["device_args"]["role"] + msg = f"Trainer: {self._trainer.__class__.__name__}" msg += f"\nDataset: {self.config.participant['data_args']['dataset']}" msg += f"\nIID: {self.config.participant['data_args']['iid']}" @@ -139,7 +142,6 @@ def __init__( self._cm = CommunicationsManager(engine=self) - role = config.participant["device_args"]["role"] self._role_behavior: RoleBehavior = factory_role_behavior(role, self, config) self._role_behavior_performance_lock = Locker("role_behavior_performance_lock", async_lock=True) @@ -155,6 +157,12 @@ def __init__( self.sinchronized_status_lock = Locker(name="sinchronized_status_lock") self.trainning_in_progress_lock = Locker(name="trainning_in_progress_lock", async_lock=True) + self._global_model_received = asyncio.Event() + self._global_model_source = None + self._leadership_transfer_lock = Locker("leadership_transfer_lock", async_lock=True) + self._leadership_transfer_pending = None + self._leadership_transfer_ack = asyncio.Event() + self._leadership_transfer_counts = {} event_manager = EventManager.get_instance(verbose=False) 
self._addon_manager = AddondManager(self, self.config) @@ -165,7 +173,13 @@ def __init__( else: self._situational_awareness = None - if self.config.participant["defense_args"]["reputation"]["enabled"]: + self._reputation = None + + role = self.config.participant["device_args"]["role"] + federation = self.config.participant["scenario_args"].get("federation") + reputation_enabled = self.config.participant["defense_args"]["reputation"]["enabled"] + + if reputation_enabled and (role == "server" or federation!="CFL"): self._reputation = Reputation(engine=self, config=self.config) @property @@ -187,7 +201,7 @@ def aggregator(self): def trainer(self): """Trainer""" return self._trainer - + @property def rb(self): """Role Behavior""" @@ -215,6 +229,114 @@ async def update_federation_nodes(self, federation_nodes): async with self._federation_nodes_lock: self.federation_nodes = federation_nodes + async def mark_leadership_transfer_pending(self, successor: str): + async with self._leadership_transfer_lock: + self._leadership_transfer_pending = successor + self._leadership_transfer_ack.clear() + logging.info(f"SDFL leadership | Waiting ACK from successor {successor}") + + async def confirm_leadership_transfer_ack(self, source: str) -> bool: + async with self._leadership_transfer_lock: + if self._leadership_transfer_pending is None: + return False + if self._leadership_transfer_pending != source: + logging.info( + f"SDFL leadership | Ignoring ACK from {source}; " + f"pending successor is {self._leadership_transfer_pending}" + ) + return False + + logging.info(f"SDFL leadership | ACK received from successor {source}") + self._leadership_transfer_ack.set() + return True + + async def wait_pending_leadership_ack(self): + async with self._leadership_transfer_lock: + successor = self._leadership_transfer_pending + + if successor is None: + return + + timeout = float(self.config.participant.get("misc_args", {}).get("leadership_ack_timeout", 20)) + logging.info(f"SDFL leadership | 
Waiting up to {timeout}s for ACK from {successor}") + + ack_received = False + try: + await asyncio.wait_for(self._leadership_transfer_ack.wait(), timeout=timeout) + ack_received = True + except TimeoutError: + logging.warning( + f"SDFL leadership | ACK from {successor} not received before next round; " + "keeping aggregator role until ACK arrives" + ) + + async with self._leadership_transfer_lock: + if self._leadership_transfer_pending != successor: + return + + if self._leadership_transfer_ack.is_set(): + ack_received = True + + if not ack_received: + return + + self._leadership_transfer_pending = None + self._leadership_transfer_ack.clear() + + await self.rb.set_next_role(Role.TRAINER) + + async def select_leadership_successor(self, candidates) -> str | None: + candidates = sorted(set(candidates)) + if not candidates: + return None + + async with self._leadership_transfer_lock: + candidate_counts = { + candidate: self._leadership_transfer_counts.get(candidate, 0) + for candidate in candidates + } + + min_count = min(candidate_counts.values()) + least_used_candidates = [ + candidate + for candidate, count in candidate_counts.items() + if count == min_count + ] + successor = random.choice(least_used_candidates) + logging.info( + f"Leadership transfer candidate counts: {candidate_counts} | " + f"selected={successor}" + ) + return successor + + async def register_leadership_transfer(self, node: str): + async with self._leadership_transfer_lock: + self._leadership_transfer_counts[node] = ( + self._leadership_transfer_counts.get(node, 0) + 1 + ) + logging.info( + f"Leadership transfer count updated | node={node} | " + f"count={self._leadership_transfer_counts[node]}" + ) + + def get_sdfl_expected_trainers(self) -> set[str]: + nodes = self.config.participant.get("trust_args", {}).get("scenario", {}).get("nodes", {}) + expected_nodes = set() + roles_to_include = {"trainer", "aggregator", "trainer_aggregator", "malicious"} + + for node in nodes.values(): + role = 
node.get("role") + ip = node.get("ip") + port = node.get("port") + if role not in roles_to_include or ip is None or port is None: + continue + + addr = f"{ip}:{port}" + if addr != self.addr: + expected_nodes.add(addr) + + return expected_nodes + def get_initialization_status(self): return self.initialized @@ -272,6 +394,23 @@ async def model_update_callback(self, source, message): if not self.get_federation_ready_lock().locked() and len(await self.get_federation_nodes()) == 0: logging.info("🤖 handle_model_message | There are no defined federation nodes") return + if self.config.participant["scenario_args"].get("federation") == "SDFL": + direct_neighbors = await self.cm.get_addrs_current_connections(only_direct=True, myself=False) + if source not in direct_neighbors: + logging.info(f"SDFL reputation | Ignoring model/update from non-neighbor source={source}") + return + + decoded_model = self.trainer.deserialize_model(message.parameters) + updt_received_event = UpdateReceivedEvent( + decoded_model, + message.weight, + source, + message.round, + update_type=UpdateReceivedEvent.REPUTATION_UPDATE, + ) + await EventManager.get_instance().publish_node_event(updt_received_event) + logging.info(f"SDFL reputation | Published reputation UpdateReceivedEvent from {source}") + return decoded_model = self.trainer.deserialize_model(message.parameters) updt_received_event = UpdateReceivedEvent(decoded_model, message.weight, source, message.round) await EventManager.get_instance().publish_node_event(updt_received_event) @@ -317,7 +456,8 @@ async def _control_alive_callback(self, source, message): async def _control_leadership_transfer_callback(self, source, message): logging.info(f"🔧 handle_control_message | Trigger | Received leadership transfer message from {source}") - + await self.register_leadership_transfer(source) + if await self._round_in_process_lock.locked_async(): logging.info("Learning cycle is executing, role behavior will be modified next round") await 
self.rb.set_next_role(Role.AGGREGATOR, source_to_notificate=source) @@ -337,6 +477,9 @@ async def _control_leadership_transfer_callback(self, source, message): async def _control_leadership_transfer_ack_callback(self, source, message): logging.info(f"🔧 handle_control_message | Trigger | Received leadership transfer ack message from {source}") # No concurrence of difference ack received treated, be aware of that. + if await self.confirm_leadership_transfer_ack(source): + return + if await self._round_in_process_lock.locked_async(): logging.info("Learning cycle is executing, role behavior will be modified next round") await self.rb.set_next_role(Role.TRAINER) @@ -354,7 +497,7 @@ async def _control_leadership_transfer_ack_callback(self, source, message): except TimeoutError: logging.info("Learning cycle is locked, role behavior will be modified next round") await self.rb.set_next_role(Role.TRAINER) - + async def _connection_connect_callback(self, source, message): logging.info(f"🔗 handle_connection_message | Trigger | Received connection message from {source}") @@ -414,6 +557,190 @@ async def _reputation_share_callback(self, source, message): except Exception as e: logging.exception(f"Error handling reputation message: {e}") + async def _reputationtable_table_callback(self, source, message): + try: + # Reputation tables are an SDFL-only control plane for indirect reputation. 
+ if self.config.participant["scenario_args"].get("federation") != "SDFL": + return + if self.rb.get_role_name(True) != "aggregator": + return + if not hasattr(self, "_reputation") or self._reputation is None: + return + + reputation_table = json.loads(message.reputation_table_json or "{}") + if not isinstance(reputation_table, dict): + logging.warning( + f"SDFL reputation | Ignoring reputation table from {message.node_id}; " + f"invalid payload type: {type(reputation_table)}" + ) + return + + await self._reputation.register_reputation_table( + message.node_id, + message.round, + reputation_table, + received_from=source, + ) + # Start or refresh the async collection window for this SDFL round. + expected_nodes = self.get_sdfl_expected_trainers() + timeout = float( + self.config.participant["defense_args"] + .get("reputation", {}) + .get("table_aggregation_timeout", 10) + ) + self._reputation.start_reputation_tables_collection(expected_nodes, message.round, timeout) + except json.JSONDecodeError as e: + logging.warning(f"SDFL reputation | Could not decode reputation table from {source}: {e}") + except Exception as e: + logging.exception(f"Error handling reputation table message: {e}") + + async def _trustworthiness_report_callback(self, source, message): + try: + report = { + "source": source, + "node_id": message.node_id, + "bytes_sent": message.bytes_sent, + "bytes_recv": message.bytes_recv, + "accuracy": message.accuracy, + "loss": message.loss, + "role": message.role, + "energy_grid": message.energy_grid, + "emissions": message.emissions, + "workload": message.workload, + "cpu_model": message.cpu_model, + "gpu_model": message.gpu_model, + "cpu_used": message.cpu_used, + "gpu_used": message.gpu_used, + "energy_consumed": message.energy_consumed, + "sample_size": message.sample_size, + "class_imbalance": message.class_imbalance, + "model_size": message.model_size, + "local_entropy": message.local_entropy, + "val_accuracy": message.val_accuracy, + "dp_enabled": 
message.dp_enabled, + "dp_epsilon": message.dp_epsilon, + "macro_f1": message.macro_f1, + "train_accuracy": message.train_accuracy, + } + + logging.info(f"handle_trustworthiness_message | Trigger | {report}") + + if hasattr(self, "trustworthiness") and self.trustworthiness is not None: + if hasattr(self.trustworthiness, "tw") and self.trustworthiness.tw is not None: + if hasattr(self.trustworthiness.tw, "register_trustworthiness_report"): + await self.trustworthiness.tw.register_trustworthiness_report(source, message) + + + except Exception as e: + logging.exception(f"Error handling trustworthiness message: {e}") + + async def _trustscores_share_callback(self, source, message): + try: + report = { + "source": source, + "node_id": message.node_id, + "trust_report_json": message.trust_report_json, + } + + logging.info(f"handle_trustscores_message | Trigger | {report}") + + trust_handler = getattr(self, "trustworthiness", None) + if trust_handler is None: + trust_handler = getattr(self, "trustscores", None) + + if trust_handler is not None: + if hasattr(trust_handler, "tw") and trust_handler.tw is not None: + if hasattr(trust_handler.tw, "register_trustscores_report"): + await trust_handler.tw.register_trustscores_report(source, message) + + + except Exception as e: + logging.exception(f"Error handling trustscores message: {e}") + + async def _sdflmodel_trainer_update_callback(self, source, message): + try: + logging.info( + f"SDFL | TRAINER_UPDATE callback triggered | " + f"source={source} | node_id={message.node_id} | " + f"target={message.target} | round={message.round} | " + f"local_round={self.round} | role={self.rb.get_role_name(True)}" + ) + + federation = self.config.participant["scenario_args"]["federation"] + + if federation != "SDFL": + logging.info("SDFL | Ignoring TRAINER_UPDATE because federation is not SDFL") + return + + role = self.rb.get_role_name(True) + + if role != "aggregator": + logging.info(f"SDFL | Ignoring TRAINER_UPDATE because role={role}") 
+ return + + if message.target != "aggregator": + logging.info(f"SDFL | Ignoring TRAINER_UPDATE because target={message.target}") + return + + if message.round != self.round: + logging.info( + f"SDFL | Ignoring TRAINER_UPDATE from round={message.round}; " + f"current round={self.round}" + ) + return + + # Valid trainer updates are converted into the normal aggregation event stream. + decoded_model = self.trainer.deserialize_model(message.parameters) + + event = UpdateReceivedEvent( + decoded_model, + message.weight, + message.node_id, + message.round, + ) + + await EventManager.get_instance().publish_node_event(event) + + logging.info( + f"SDFL aggregator | Published UpdateReceivedEvent | " + f"trainer={message.node_id} | round={message.round} | weight={message.weight}" + ) + + except Exception as e: + logging.exception(f"Error handling SDFL TRAINER_UPDATE message: {e}") + + async def _sdflmodel_global_model_callback(self, source, message): + role = self.rb.get_role_name(True) + logging.info( + f"SDFL | GLOBAL_MODEL callback triggered | " + f"source={source} | node_id={message.node_id} | " + f"target={message.target} | round={message.round} | " + f"local_round={self.round} | role={role}" + ) + + if self.config.participant["scenario_args"].get("federation") == "SDFL": + if role != "trainer": + logging.info(f"SDFL | Ignoring GLOBAL_MODEL because role={role}") + return + + if message.target != "trainer": + logging.info(f"SDFL | Ignoring GLOBAL_MODEL because target={message.target}") + return + + if message.round != self.round: + logging.info( + f"SDFL | Ignoring GLOBAL_MODEL from round={message.round}; " + f"current round={self.round}" + ) + return + + # Trainers apply the aggregator's global model and unblock their SDFL round wait. 
+ decoded_model = self.trainer.deserialize_model(message.parameters) + self.trainer.set_model_parameters(decoded_model) + + self._global_model_source = message.node_id + self._global_model_received.set() + """ ############################## # REGISTERING CALLBACKS # ############################## @@ -621,8 +948,8 @@ async def deploy_components(self): await self.aggregator.init() if "situational_awareness" in self.config.participant: await self.sa.init() - if self.config.participant["defense_args"]["reputation"]["enabled"]: - await self._reputation.setup() + if self._reputation is not None: + await self._reputation.setup() await self._reporter.start() await self._addon_manager.deploy_additional_services() @@ -710,10 +1037,10 @@ async def _start_learning(self): await self.get_federation_ready_lock().acquire_async() if self.config.participant["device_args"]["start"]: logging.info("Propagate initial model updates.") - + mpe = ModelPropagationEvent(await self.cm.get_addrs_current_connections(only_direct=True, myself=False), "initialization") await EventManager.get_instance().publish_node_event(mpe) - + await self.get_federation_ready_lock().release_async() self.trainer.set_epochs(epochs) @@ -764,7 +1091,8 @@ async def learning_cycle_finished(self): return False else: return current_round >= self.total_rounds - + #return False + async def resolve_missing_updates(self): """ Delegates the resolution strategy for missing updates to the current role behavior. @@ -778,7 +1106,7 @@ async def resolve_missing_updates(self): """ logging.info(f"Using Role behavior: {self.rb.get_role_name()} conflict resolve strategy") return await self.rb.resolve_missing_updates() - + async def update_self_role(self): """ Checks whether a role update is required and performs the transition if necessary. 
@@ -806,7 +1134,7 @@ async def update_self_role(self): logging.info(f"Sending role modification ACK to transferer: {source_to_notificate}") message = self.cm.create_message("control", "leadership_transfer_ack") asyncio.create_task(self.cm.send_message(source_to_notificate, message)) - + async def _learning_cycle(self): """ Main asynchronous loop for executing the Federated Learning process across multiple rounds. @@ -837,9 +1165,10 @@ async def _learning_cycle(self): indent=2, title="Round information", ) - + + await self.rb.before_round_start() await self.update_self_role() - + logging.info(f"Federation nodes: {self.federation_nodes}") await self.update_federation_nodes( await self.cm.get_addrs_current_connections(only_direct=True, myself=True) @@ -851,10 +1180,10 @@ async def _learning_cycle(self): logging.info(f"Expected nodes: {expected_nodes}") direct_connections = await self.cm.get_addrs_current_connections(only_direct=True) undirected_connections = await self.cm.get_addrs_current_connections(only_undirected=True) - + logging.info(f"Direct connections: {direct_connections} | Undirected connections: {undirected_connections}") logging.info(f"[Role {self.rb.get_role_name()}] Starting learning cycle...") - + await self.aggregator.update_federation_nodes(expected_nodes) async with self._role_behavior_performance_lock: await self.rb.extended_learning_cycle() @@ -882,13 +1211,13 @@ async def _learning_cycle(self): self.trainer.on_learning_cycle_end() await self.trainer.test() - + # Shutdown protocol await self._shutdown_protocol() - + async def _shutdown_protocol(self): logging.info("Starting graceful shutdown process...") - + # 1.- Publish Experiment Finish Event to the last update on modules logging.info("Publishing Experiment Finish Event...") efe = ExperimentFinishEvent() diff --git a/nebula/core/models/adultcensus/__init__.py b/nebula/core/models/adultcensus/__init__.py new file mode 100755 index 000000000..e69de29bb diff --git 
a/nebula/core/models/adultcensus/mlp.py b/nebula/core/models/adultcensus/mlp.py new file mode 100644 index 000000000..3a7c2595c --- /dev/null +++ b/nebula/core/models/adultcensus/mlp.py @@ -0,0 +1,79 @@ +# nebula/core/models/adultcensus/mlp.py + +import torch + +from nebula.core.models.nebulamodel import NebulaModel + + +class AdultCensusModelMLP(NebulaModel): + """ + Simple MLP for Adult Census (tabular). + - input_dim MUST match the number of features after preprocessing (OneHot + scaling). + - num_classes = 2 (<=50K vs >50K) + """ + def __init__( + self, + input_dim: int = 104, + num_classes: int = 2, + learning_rate: float = 1e-3, + metrics=None, + confusion_matrix=None, + seed=None, + hidden1: int = 256, + hidden2: int = 128, + dropout: float = 0.0, + data_type="Tabular", + ): + # NebulaModel expects something like input_channels first; for tabular we pass input_dim there. + super().__init__(input_dim, num_classes, learning_rate, metrics, confusion_matrix, seed) + self.data_type = data_type + + self.config = {"beta1": 0.9, "beta2": 0.999, "amsgrad": True} + + self.example_input_array = torch.rand(1, int(input_dim)) + self.learning_rate = float(learning_rate) + self.criterion = torch.nn.CrossEntropyLoss() + + self.l1 = torch.nn.Linear(int(input_dim), int(hidden1)) + self.l2 = torch.nn.Linear(int(hidden1), int(hidden2)) + self.l3 = torch.nn.Linear(int(hidden2), int(num_classes)) + + self.dropout = torch.nn.Dropout(float(dropout)) if float(dropout) > 0.0 else None + + def forward(self, x: torch.Tensor) -> torch.Tensor: + # Expected: (batch, input_dim). 
Sometimes: (batch, 1, input_dim) + if x.dim() == 3 and x.size(1) == 1: + x = x.squeeze(1) + + x = self.l1(x) + x = torch.relu(x) + if self.dropout is not None: + x = self.dropout(x) + + x = self.l2(x) + x = torch.relu(x) + if self.dropout is not None: + x = self.dropout(x) + + x = self.l3(x) + return x + + def configure_optimizers(self): + optimizer_override = self.get_optimizer_override() + if optimizer_override is not None: + return optimizer_override + + optimizer = torch.optim.Adam(self.parameters(), lr=self.learning_rate) + return optimizer + + def get_learning_rate(self) -> float: + return float(self.learning_rate) + + def count_parameters(self) -> int: + return int(sum(p.numel() for p in self.parameters() if p.requires_grad)) + + def get_num_classes(self): + return self.num_classes + + def get_data_type(self): + return self.data_type diff --git a/nebula/core/models/breast_cancer/__init__.py b/nebula/core/models/breast_cancer/__init__.py new file mode 100755 index 000000000..e69de29bb diff --git a/nebula/core/models/breast_cancer/mlp.py b/nebula/core/models/breast_cancer/mlp.py new file mode 100644 index 000000000..27c6a51ba --- /dev/null +++ b/nebula/core/models/breast_cancer/mlp.py @@ -0,0 +1,61 @@ +# nebula/core/models/breast_cancer/mlp.py + +import torch + +from nebula.core.models.nebulamodel import NebulaModel + + +class BreastCancerModelMLP(NebulaModel): + def __init__( + self, + input_dim=30, + num_classes=2, + learning_rate=1e-3, + metrics=None, + confusion_matrix=None, + seed=None, + data_type="Tabular", + ): + super().__init__(input_dim, num_classes, learning_rate, metrics, confusion_matrix, seed) + self.data_type = data_type + + self.config = {"beta1": 0.9, "beta2": 0.999, "amsgrad": True} + + self.example_input_array = torch.rand(1, input_dim) + self.learning_rate = learning_rate + self.criterion = torch.nn.CrossEntropyLoss() + + self.l1 = torch.nn.Linear(input_dim, 256) + self.l2 = torch.nn.Linear(256, 128) + self.l3 = torch.nn.Linear(128, num_classes)
+ + def forward(self, x): + if x.dim() == 3 and x.size(1) == 1: + x = x.squeeze(1) + + x = self.l1(x) + x = torch.relu(x) + x = self.l2(x) + x = torch.relu(x) + x = self.l3(x) + return x + + def configure_optimizers(self): + optimizer_override = self.get_optimizer_override() + if optimizer_override is not None: + return optimizer_override + + optimizer = torch.optim.Adam(self.parameters(), lr=self.learning_rate) + return optimizer + + def get_learning_rate(self) -> float: + return float(self.learning_rate) + + def count_parameters(self) -> int: + return int(sum(p.numel() for p in self.parameters() if p.requires_grad)) + + def get_num_classes(self): + return self.num_classes + + def get_data_type(self): + return self.data_type diff --git a/nebula/core/models/cifar10/cnn.py b/nebula/core/models/cifar10/cnn.py index 473ff3b93..cdd70ddcf 100755 --- a/nebula/core/models/cifar10/cnn.py +++ b/nebula/core/models/cifar10/cnn.py @@ -12,8 +12,10 @@ def __init__( metrics=None, confusion_matrix=None, seed=None, + data_type="Images", ): super().__init__(input_channels, num_classes, learning_rate, metrics, confusion_matrix, seed) + self.data_type = data_type self.config = {"beta1": 0.851436, "beta2": 0.999689, "amsgrad": True} @@ -37,6 +39,10 @@ def forward(self, x): return x def configure_optimizers(self): + optimizer_override = self.get_optimizer_override() + if optimizer_override is not None: + return optimizer_override + optimizer = torch.optim.Adam( self.parameters(), lr=self.learning_rate, @@ -45,3 +51,15 @@ def configure_optimizers(self): ) self._optimizer = optimizer return optimizer + + def get_learning_rate(self): + return self.learning_rate + + def count_parameters(self): + return sum(p.numel() for p in self.parameters() if p.requires_grad) + + def get_num_classes(self): + return self.num_classes + + def get_data_type(self): + return self.data_type diff --git a/nebula/core/models/cifar10/cnnV2.py b/nebula/core/models/cifar10/cnnV2.py index d10a81996..f5bcb5c6f 100755 
--- a/nebula/core/models/cifar10/cnnV2.py +++ b/nebula/core/models/cifar10/cnnV2.py @@ -12,8 +12,10 @@ def __init__( metrics=None, confusion_matrix=None, seed=None, + data_type="Images", ): super().__init__(input_channels, num_classes, learning_rate, metrics, confusion_matrix, seed) + self.data_type = data_type self.config = {"beta1": 0.851436, "beta2": 0.999689, "amsgrad": True} @@ -42,6 +44,10 @@ def forward(self, x): return x def configure_optimizers(self): + optimizer_override = self.get_optimizer_override() + if optimizer_override is not None: + return optimizer_override + optimizer = torch.optim.Adam( self.parameters(), lr=self.learning_rate, @@ -49,3 +55,15 @@ def configure_optimizers(self): amsgrad=self.config["amsgrad"], ) return optimizer + + def get_learning_rate(self): + return self.learning_rate + + def count_parameters(self): + return sum(p.numel() for p in self.parameters() if p.requires_grad) + + def get_num_classes(self): + return self.num_classes + + def get_data_type(self): + return self.data_type diff --git a/nebula/core/models/cifar10/cnnV3.py b/nebula/core/models/cifar10/cnnV3.py index 94389385c..2aff83dd0 100755 --- a/nebula/core/models/cifar10/cnnV3.py +++ b/nebula/core/models/cifar10/cnnV3.py @@ -12,8 +12,10 @@ def __init__( metrics=None, confusion_matrix=None, seed=None, + data_type="Images", ): super().__init__(input_channels, num_classes, learning_rate, metrics, confusion_matrix, seed) + self.data_type = data_type self.config = {"beta1": 0.851436, "beta2": 0.999689, "amsgrad": True} @@ -69,6 +71,10 @@ def forward(self, x): return x def configure_optimizers(self): + optimizer_override = self.get_optimizer_override() + if optimizer_override is not None: + return optimizer_override + optimizer = torch.optim.Adam( self.parameters(), lr=self.learning_rate, @@ -76,3 +82,15 @@ def configure_optimizers(self): amsgrad=self.config["amsgrad"], ) return optimizer + + def get_learning_rate(self): + return self.learning_rate + + def 
count_parameters(self): + return sum(p.numel() for p in self.parameters() if p.requires_grad) + + def get_num_classes(self): + return self.num_classes + + def get_data_type(self): + return self.data_type diff --git a/nebula/core/models/cifar10/fastermobilenet.py b/nebula/core/models/cifar10/fastermobilenet.py index 185587a6c..20ec7704a 100755 --- a/nebula/core/models/cifar10/fastermobilenet.py +++ b/nebula/core/models/cifar10/fastermobilenet.py @@ -13,8 +13,10 @@ def __init__( metrics=None, confusion_matrix=None, seed=None, + data_type="Images", ): super().__init__(input_channels, num_classes, learning_rate, metrics, confusion_matrix, seed) + self.data_type = data_type self.config = {"beta1": 0.851436, "beta2": 0.999689, "amsgrad": True} @@ -58,6 +60,10 @@ def forward(self, x): return x def configure_optimizers(self): + optimizer_override = self.get_optimizer_override() + if optimizer_override is not None: + return optimizer_override + optimizer = torch.optim.Adam( self.parameters(), lr=self.learning_rate, @@ -65,3 +71,15 @@ def configure_optimizers(self): amsgrad=self.config["amsgrad"], ) return optimizer + + def get_learning_rate(self): + return self.learning_rate + + def count_parameters(self): + return sum(p.numel() for p in self.parameters() if p.requires_grad) + + def get_num_classes(self): + return self.num_classes + + def get_data_type(self): + return self.data_type diff --git a/nebula/core/models/cifar10/resnet.py b/nebula/core/models/cifar10/resnet.py index 98ff9cf9f..255191511 100755 --- a/nebula/core/models/cifar10/resnet.py +++ b/nebula/core/models/cifar10/resnet.py @@ -39,14 +39,16 @@ def __init__( seed=None, implementation="scratch", classifier="resnet9", + data_type="Images", ): super().__init__() + self.data_type = data_type if metrics is None: metrics = MetricCollection([ MulticlassAccuracy(num_classes=num_classes), MulticlassPrecision(num_classes=num_classes), MulticlassRecall(num_classes=num_classes), - 
MulticlassF1Score(num_classes=num_classes), + MulticlassF1Score(num_classes=num_classes, average="macro"), ]) self.train_metrics = metrics.clone(prefix="Train/") self.val_metrics = metrics.clone(prefix="Validation/") @@ -141,6 +143,10 @@ def forward(self, x): raise NotImplementedError() def configure_optimizers(self): + optimizer_override = self.get_optimizer_override() + if optimizer_override is not None: + return optimizer_override + if self.implementation == "scratch" and self.classifier == "resnet9": params = [] for key, module in self.model.items(): @@ -149,3 +155,15 @@ def configure_optimizers(self): else: optimizer = torch.optim.Adam(self.parameters(), lr=self.learning_rate, weight_decay=1e-4) return optimizer + + def get_learning_rate(self): + return self.learning_rate + + def count_parameters(self): + return sum(p.numel() for p in self.parameters() if p.requires_grad) + + def get_num_classes(self): + return self.num_classes + + def get_data_type(self): + return self.data_type diff --git a/nebula/core/models/cifar10/simplemobilenet.py b/nebula/core/models/cifar10/simplemobilenet.py index d4643a79e..b394a101d 100755 --- a/nebula/core/models/cifar10/simplemobilenet.py +++ b/nebula/core/models/cifar10/simplemobilenet.py @@ -18,8 +18,10 @@ def __init__( metrics=None, confusion_matrix=None, seed=None, + data_type="Images", ): super().__init__(input_channels, num_classes, learning_rate, metrics, confusion_matrix, seed) + self.data_type = data_type self.config = {"beta1": 0.851436, "beta2": 0.999689, "amsgrad": True} @@ -65,5 +67,21 @@ def forward(self, x): return x def configure_optimizers(self): + optimizer_override = self.get_optimizer_override() + if optimizer_override is not None: + return optimizer_override + optimizer = torch.optim.Adam(self.parameters(), lr=self.learning_rate) return optimizer + + def get_learning_rate(self): + return self.learning_rate + + def count_parameters(self): + return sum(p.numel() for p in self.parameters() if p.requires_grad) + 
+ def get_num_classes(self): + return self.num_classes + + def get_data_type(self): + return self.data_type diff --git a/nebula/core/models/cifar100/cnn.py b/nebula/core/models/cifar100/cnn.py index fef6a4375..6c2de6b41 100755 --- a/nebula/core/models/cifar100/cnn.py +++ b/nebula/core/models/cifar100/cnn.py @@ -12,8 +12,10 @@ def __init__( metrics=None, confusion_matrix=None, seed=None, + data_type="Images", ): super().__init__(input_channels, num_classes, learning_rate, metrics, confusion_matrix, seed) + self.data_type = data_type self.config = { "lr": 8.0505e-05, @@ -94,9 +96,25 @@ def forward(self, x): return x def configure_optimizers(self): + optimizer_override = self.get_optimizer_override() + if optimizer_override is not None: + return optimizer_override + return torch.optim.Adam( self.parameters(), lr=self.config["lr"], betas=(self.config["beta1"], self.config["beta2"]), amsgrad=self.config["amsgrad"], ) + + def get_learning_rate(self): + return self.learning_rate + + def count_parameters(self): + return sum(p.numel() for p in self.parameters() if p.requires_grad) + + def get_num_classes(self): + return self.num_classes + + def get_data_type(self): + return self.data_type diff --git a/nebula/core/models/covtype/__init__.py b/nebula/core/models/covtype/__init__.py new file mode 100755 index 000000000..e69de29bb diff --git a/nebula/core/models/covtype/mlp.py b/nebula/core/models/covtype/mlp.py new file mode 100644 index 000000000..bb93fbc97 --- /dev/null +++ b/nebula/core/models/covtype/mlp.py @@ -0,0 +1,61 @@ +# nebula/core/models/covtype/mlp.py + +import torch + +from nebula.core.models.nebulamodel import NebulaModel + + +class CovtypeModelMLP(NebulaModel): + def __init__( + self, + input_dim=54, + num_classes=7, + learning_rate=1e-3, + metrics=None, + confusion_matrix=None, + seed=None, + data_type="Tabular", + ): + super().__init__(input_dim, num_classes, learning_rate, metrics, confusion_matrix, seed) + self.data_type = data_type + + self.config = 
{"beta1": 0.9, "beta2": 0.999, "amsgrad": True} + + self.example_input_array = torch.rand(1, input_dim) + self.learning_rate = learning_rate + self.criterion = torch.nn.CrossEntropyLoss() + + self.l1 = torch.nn.Linear(input_dim, 256) + self.l2 = torch.nn.Linear(256, 128) + self.l3 = torch.nn.Linear(128, num_classes) + + def forward(self, x): + if x.dim() == 3 and x.size(1) == 1: + x = x.squeeze(1) + + x = self.l1(x) + x = torch.relu(x) + x = self.l2(x) + x = torch.relu(x) + x = self.l3(x) + return x + + def configure_optimizers(self): + optimizer_override = self.get_optimizer_override() + if optimizer_override is not None: + return optimizer_override + + optimizer = torch.optim.Adam(self.parameters(), lr=self.learning_rate) + return optimizer + + def get_learning_rate(self) -> float: + return float(self.learning_rate) + + def count_parameters(self) -> int: + return int(sum(p.numel() for p in self.parameters() if p.requires_grad)) + + def get_num_classes(self): + return self.num_classes + + def get_data_type(self): + return self.data_type diff --git a/nebula/core/models/emnist/cnn.py b/nebula/core/models/emnist/cnn.py index ea4277acb..22bd80a2e 100755 --- a/nebula/core/models/emnist/cnn.py +++ b/nebula/core/models/emnist/cnn.py @@ -12,8 +12,10 @@ def __init__( metrics=None, confusion_matrix=None, seed=None, + data_type="Images", ): super().__init__(input_channels, num_classes, learning_rate, metrics, confusion_matrix, seed) + self.data_type = data_type self.config = {"beta1": 0.851436, "beta2": 0.999689, "amsgrad": True} @@ -49,6 +51,10 @@ def forward(self, x): return logits def configure_optimizers(self): + optimizer_override = self.get_optimizer_override() + if optimizer_override is not None: + return optimizer_override + optimizer = torch.optim.Adam( self.parameters(), lr=self.learning_rate, @@ -56,3 +62,15 @@ def configure_optimizers(self): amsgrad=self.config["amsgrad"], ) return optimizer + + def get_learning_rate(self): + return self.learning_rate + + def 
count_parameters(self): + return sum(p.numel() for p in self.parameters() if p.requires_grad) + + def get_num_classes(self): + return self.num_classes + + def get_data_type(self): + return self.data_type diff --git a/nebula/core/models/emnist/mlp.py b/nebula/core/models/emnist/mlp.py index b5f93f56a..4887165fc 100755 --- a/nebula/core/models/emnist/mlp.py +++ b/nebula/core/models/emnist/mlp.py @@ -12,8 +12,10 @@ def __init__( metrics=None, confusion_matrix=None, seed=None, + data_type="Images", ): super().__init__(input_channels, num_classes, learning_rate, metrics, confusion_matrix, seed) + self.data_type = data_type self.config = {"beta1": 0.851436, "beta2": 0.999689, "amsgrad": True} @@ -35,6 +37,22 @@ def forward(self, x): x = self.l3(x) return x + def get_learning_rate(self): + return self.learning_rate + def configure_optimizers(self): + optimizer_override = self.get_optimizer_override() + if optimizer_override is not None: + return optimizer_override + optimizer = torch.optim.Adam(self.parameters(), lr=self.learning_rate) return optimizer + + def count_parameters(self): + return sum(p.numel() for p in self.parameters() if p.requires_grad) + + def get_num_classes(self): + return self.num_classes + + def get_data_type(self): + return self.data_type diff --git a/nebula/core/models/fashionmnist/cnn.py b/nebula/core/models/fashionmnist/cnn.py index 5e1471f93..bef3d1eca 100755 --- a/nebula/core/models/fashionmnist/cnn.py +++ b/nebula/core/models/fashionmnist/cnn.py @@ -12,8 +12,10 @@ def __init__( metrics=None, confusion_matrix=None, seed=None, + data_type="Images", ): super().__init__(input_channels, num_classes, learning_rate, metrics, confusion_matrix, seed) + self.data_type = data_type self.config = {"beta1": 0.851436, "beta2": 0.999689, "amsgrad": True} @@ -49,6 +51,10 @@ def forward(self, x): return logits def configure_optimizers(self): + optimizer_override = self.get_optimizer_override() + if optimizer_override is not None: + return optimizer_override + 
optimizer = torch.optim.Adam( self.parameters(), lr=self.learning_rate, @@ -56,3 +62,15 @@ def configure_optimizers(self): amsgrad=self.config["amsgrad"], ) return optimizer + + def get_learning_rate(self): + return self.learning_rate + + def count_parameters(self): + return sum(p.numel() for p in self.parameters() if p.requires_grad) + + def get_num_classes(self): + return self.num_classes + + def get_data_type(self): + return self.data_type diff --git a/nebula/core/models/fashionmnist/mlp.py b/nebula/core/models/fashionmnist/mlp.py index bd4159b03..ac289c7d5 100755 --- a/nebula/core/models/fashionmnist/mlp.py +++ b/nebula/core/models/fashionmnist/mlp.py @@ -12,8 +12,10 @@ def __init__( metrics=None, confusion_matrix=None, seed=None, + data_type="Images", ): super().__init__(input_channels, num_classes, learning_rate, metrics, confusion_matrix, seed) + self.data_type = data_type self.config = {"beta1": 0.851436, "beta2": 0.999689, "amsgrad": True} @@ -35,6 +37,22 @@ def forward(self, x): x = self.l3(x) return x + def get_learning_rate(self): + return self.learning_rate + + def count_parameters(self): + return sum(p.numel() for p in self.parameters() if p.requires_grad) + def configure_optimizers(self): + optimizer_override = self.get_optimizer_override() + if optimizer_override is not None: + return optimizer_override + optimizer = torch.optim.Adam(self.parameters(), lr=self.learning_rate) return optimizer + + def get_num_classes(self): + return self.num_classes + + def get_data_type(self): + return self.data_type diff --git a/nebula/core/models/kddcup99/__init__.py b/nebula/core/models/kddcup99/__init__.py new file mode 100755 index 000000000..e69de29bb diff --git a/nebula/core/models/kddcup99/mlp.py b/nebula/core/models/kddcup99/mlp.py new file mode 100644 index 000000000..2de38af46 --- /dev/null +++ b/nebula/core/models/kddcup99/mlp.py @@ -0,0 +1,61 @@ +import torch + +from nebula.core.models.nebulamodel import NebulaModel + + +class 
KDDCUP99ModelMLP(NebulaModel): + def __init__( + self, + input_channels=1, + num_classes=2, + learning_rate=1e-3, + metrics=None, + confusion_matrix=None, + seed=None, + input_size=118, + data_type="Tabular", + ): + super().__init__(input_channels, num_classes, learning_rate, metrics, confusion_matrix, seed) + self.data_type = data_type + + self.input_size = input_size + self.example_input_array = torch.zeros(1, self.input_size) + self.learning_rate = learning_rate + self.criterion = torch.nn.CrossEntropyLoss() + + self.l1 = torch.nn.Linear(self.input_size, 256) + self.l2 = torch.nn.Linear(256, 128) + self.l3 = torch.nn.Linear(128, num_classes) + + def forward(self, x): + if x.dim() == 1: + x = x.unsqueeze(0) + + x = x.view(x.size(0), -1) + x = self.l1(x) + x = torch.relu(x) + x = self.l2(x) + x = torch.relu(x) + x = self.l3(x) + return x + + def configure_optimizers(self): + optimizer_override = self.get_optimizer_override() + if optimizer_override is not None: + return optimizer_override + + optimizer = torch.optim.Adam(self.parameters(), lr=self.learning_rate) + self._optimizer = optimizer + return optimizer + + def get_learning_rate(self): + return self.learning_rate + + def count_parameters(self): + return sum(p.numel() for p in self.parameters() if p.requires_grad) + + def get_num_classes(self): + return self.num_classes + + def get_data_type(self): + return self.data_type diff --git a/nebula/core/models/mnist/cnn.py b/nebula/core/models/mnist/cnn.py index 7cec6b6c3..94bdcbdc5 100755 --- a/nebula/core/models/mnist/cnn.py +++ b/nebula/core/models/mnist/cnn.py @@ -12,8 +12,10 @@ def __init__( metrics=None, confusion_matrix=None, seed=None, + data_type="Images", ): super().__init__(input_channels, num_classes, learning_rate, metrics, confusion_matrix, seed) + self.data_type = data_type self.config = {"beta1": 0.851436, "beta2": 0.999689, "amsgrad": True} @@ -46,6 +48,10 @@ def forward(self, x): return logits def configure_optimizers(self): + optimizer_override = 
self.get_optimizer_override() + if optimizer_override is not None: + return optimizer_override + optimizer = torch.optim.Adam( self.parameters(), lr=self.learning_rate, @@ -54,3 +60,15 @@ def configure_optimizers(self): ) self._optimizer = optimizer return optimizer + + def count_parameters(self): + return sum(p.numel() for p in self.parameters() if p.requires_grad) + + def get_learning_rate(self): + return self.learning_rate + + def get_num_classes(self): + return self.num_classes + + def get_data_type(self): + return self.data_type diff --git a/nebula/core/models/mnist/mlp.py b/nebula/core/models/mnist/mlp.py index 64a0b1da9..f316dc110 100755 --- a/nebula/core/models/mnist/mlp.py +++ b/nebula/core/models/mnist/mlp.py @@ -12,8 +12,10 @@ def __init__( metrics=None, confusion_matrix=None, seed=None, + data_type="Images", ): super().__init__(input_channels, num_classes, learning_rate, metrics, confusion_matrix, seed) + self.data_type = data_type self.example_input_array = torch.zeros(1, 1, 28, 28) self.learning_rate = learning_rate @@ -33,12 +35,22 @@ def forward(self, x): return x def configure_optimizers(self): + optimizer_override = self.get_optimizer_override() + if optimizer_override is not None: + return optimizer_override + optimizer = torch.optim.Adam(self.parameters(), lr=self.learning_rate) self._optimizer = optimizer return optimizer - + def get_learning_rate(self): return self.learning_rate def count_parameters(self): - return sum(p.numel() for p in self.parameters() if p.requires_grad) \ No newline at end of file + return sum(p.numel() for p in self.parameters() if p.requires_grad) + + def get_num_classes(self): + return self.num_classes + + def get_data_type(self): + return self.data_type diff --git a/nebula/core/models/nebulamodel.py b/nebula/core/models/nebulamodel.py index 66aac2db5..5973f1518 100755 --- a/nebula/core/models/nebulamodel.py +++ b/nebula/core/models/nebulamodel.py @@ -83,6 +83,26 @@ def log_metrics_end(self, phase): 
f"{phase}/{key.replace('Multiclass', '').split('/')[-1]}": value.detach() for key, value in output.items() } + output_values = { + key: float(value.detach().cpu().item()) for key, value in output.items() + } + + if phase == "Train": + self._latest_train_metrics = output_values + + if phase == "Validation": + self._latest_validation_metrics = output_values + + if phase in {"Test", "Test (Local)"}: + self._latest_test_metrics = output_values + + if phase == "Train" and self._train_extra_metrics: + output.update({ + f"{phase}/{key}": torch.tensor(value["sum"] / value["count"], device=self.device) + for key, value in self._train_extra_metrics.items() + if value["count"] > 0 + }) + self.logger.log_data(output, step=self.global_number[phase]) metrics_str = "" @@ -140,7 +160,6 @@ def generate_confusion_matrix(self, phase, print_cm=False, plot_cm=False): del cm_numpy, classes, fig, ax - # Restablecer la matriz de confusión if phase == "Test (Local)": self.cm.reset() else: @@ -168,7 +187,7 @@ def __init__( MulticlassAccuracy(num_classes=num_classes), MulticlassPrecision(num_classes=num_classes), MulticlassRecall(num_classes=num_classes), - MulticlassF1Score(num_classes=num_classes), + MulticlassF1Score(num_classes=num_classes, average="macro"), ]) self.train_metrics = metrics.clone(prefix="Train/") self.val_metrics = metrics.clone(prefix="Validation/") @@ -199,6 +218,33 @@ def __init__( self._current_loss = -1 self._optimizer = None + self._optimizer_override = None + self._latest_train_metrics = {} + self._latest_validation_metrics = {} + self._latest_test_metrics = {} + self._train_extra_metrics = {} + + # DP trainers update these fields after querying the Opacus accountant. 
+ self.dp_enabled = False + self.dp_epsilon = None + self.dp_delta = None + self.adversarial_training = None + + def set_optimizer_override(self, optimizer): + self._optimizer_override = optimizer + self._optimizer = optimizer + + def clear_optimizer_override(self): + self._optimizer_override = None + + def get_optimizer_override(self): + return self._optimizer_override + + def set_adversarial_training(self, adversarial_training): + self.adversarial_training = adversarial_training + + def clear_adversarial_training(self): + self.adversarial_training = None def set_communication_manager(self, communication_manager): self.communication_manager = communication_manager @@ -221,16 +267,53 @@ def configure_optimizers(self): def step(self, batch, batch_idx, phase): """Training/validation/test step.""" x, y = batch - y_pred = self.forward(x) - loss = self.criterion(y_pred, y) + extra_metrics = {} + if phase == "Train" and self.adversarial_training is not None: + loss, y_pred, extra_metrics = self.adversarial_training.compute_training_step( + self, + x, + y, + self.criterion, + ) + else: + y_pred = self.forward(x) + loss = self.criterion(y_pred, y) + self.process_metrics(phase, y_pred, y, loss) + if phase == "Train" and extra_metrics: + self._log_training_extra_metrics(extra_metrics) self._current_loss = loss return loss + def _log_training_extra_metrics(self, metrics): + if self.logger is None: + return + detached_metrics = {key: value.detach() for key, value in metrics.items()} + for key, value in detached_metrics.items(): + metric = self._train_extra_metrics.setdefault(key, {"sum": 0.0, "count": 0}) + metric["sum"] += float(value.cpu().item()) + metric["count"] += 1 + self.logger.log_data({f"Train/{key}": value for key, value in detached_metrics.items()}) + def get_loss(self): return self._current_loss + def get_latest_validation_metrics(self): + return self._latest_validation_metrics + + def get_latest_train_metrics(self): + return self._latest_train_metrics + + def 
get_latest_test_metrics(self): + return self._latest_test_metrics + + def get_latest_train_accuracy(self): + return self._latest_train_metrics.get("Train/Accuracy") + + def get_latest_test_macro_f1(self): + return self._latest_test_metrics.get("Test (Local)/F1Score") + def modify_learning_rate(self, new_lr): logging.info(f"Modifiying | learning rate, new value: {new_lr}") self.learning_rate = new_lr @@ -270,6 +353,7 @@ def on_train_end(self): def on_train_epoch_end(self): self.log_metrics_end("Train") self.train_metrics.reset() + self._train_extra_metrics = {} self.global_number["Train"] += 1 def validation_step(self, batch, batch_idx): @@ -306,7 +390,7 @@ def test_step(self, batch, batch_idx, dataloader_idx=None): loss = self.criterion(y_pred, y) y_pred_classes = torch.argmax(y_pred, dim=1) accuracy = torch.mean((y_pred_classes == y).float()) - + if dataloader_idx == 0: self.log(f"val_loss", loss, on_epoch=True, prog_bar=False) self.log(f"val_accuracy", accuracy, on_epoch=True, prog_bar=False) @@ -346,6 +430,7 @@ def on_train_end(self): def on_train_epoch_end(self): self.log_metrics_end("Train") self.train_metrics.reset() + self._train_extra_metrics = {} # NebulaModel registers training rounds # NebulaModelStandalone register the global number of epochs instead of rounds self.global_number["Train"] += 1 diff --git a/nebula/core/models/sentiment140/cnn.py b/nebula/core/models/sentiment140/cnn.py index 87541aa05..f5c2d9d46 100755 --- a/nebula/core/models/sentiment140/cnn.py +++ b/nebula/core/models/sentiment140/cnn.py @@ -14,8 +14,10 @@ def __init__( metrics=None, confusion_matrix=None, seed=None, + data_type="Tabular", ): super().__init__(input_channels, num_classes, learning_rate, metrics, confusion_matrix, seed) + self.data_type = data_type self.config = {"beta1": 0.851436, "beta2": 0.999689, "amsgrad": True} self.example_input_array = torch.zeros(1, 1, 28, 28) @@ -47,6 +49,10 @@ def forward(self, x): return out def configure_optimizers(self): + 
optimizer_override = self.get_optimizer_override() + if optimizer_override is not None: + return optimizer_override + optimizer = torch.optim.Adam( self.parameters(), lr=self.learning_rate, @@ -54,3 +60,9 @@ def configure_optimizers(self): amsgrad=self.config["amsgrad"], ) return optimizer + + def get_num_classes(self): + return self.num_classes + + def get_data_type(self): + return self.data_type diff --git a/nebula/core/models/sentiment140/rnn.py b/nebula/core/models/sentiment140/rnn.py index cfbea66cf..d02b1e76e 100755 --- a/nebula/core/models/sentiment140/rnn.py +++ b/nebula/core/models/sentiment140/rnn.py @@ -12,8 +12,10 @@ def __init__( metrics=None, confusion_matrix=None, seed=None, + data_type="Tabular", ): super().__init__(input_channels, num_classes, learning_rate, metrics, confusion_matrix, seed) + self.data_type = data_type self.config = {"beta1": 0.851436, "beta2": 0.999689, "amsgrad": True} @@ -53,5 +55,15 @@ def forward(self, x): return out def configure_optimizers(self): + optimizer_override = self.get_optimizer_override() + if optimizer_override is not None: + return optimizer_override + optimizer = torch.optim.Adam(self.parameters(), lr=self.learning_rate) return optimizer + + def get_num_classes(self): + return self.num_classes + + def get_data_type(self): + return self.data_type diff --git a/nebula/core/nebulaevents.py b/nebula/core/nebulaevents.py index f2ec08835..583f8facd 100644 --- a/nebula/core/nebulaevents.py +++ b/nebula/core/nebulaevents.py @@ -5,7 +5,7 @@ class AddonEvent(ABC): """ Abstract base class for all addon-related events in the system. """ - + @abstractmethod async def get_event_data(self): """ @@ -21,7 +21,7 @@ class NodeEvent(ABC): """ Abstract base class for all node-related events in the system. """ - + @abstractmethod async def get_event_data(self): """ @@ -52,7 +52,7 @@ class MessageEvent: source (str): Address or identifier of the message sender. message (Any): The actual message payload. 
""" - + def __init__(self, message_type, source, message): """ Initializes a MessageEvent instance. @@ -264,7 +264,7 @@ async def get_event_data(self) -> tuple[str, bool]: async def is_concurrent(self) -> bool: return True - + class ModelPropagationEvent(NodeEvent): def __init__(self, eligible_neighbors, strategy): """Event triggered when model propagation is ready. @@ -275,7 +275,7 @@ def __init__(self, eligible_neighbors, strategy): """ self.eligible_neighbors = eligible_neighbors self._strategy = strategy - + def __str__(self): return f"Model propagation event, strategy: {self._strategy}" @@ -291,12 +291,15 @@ async def get_event_data(self) -> tuple[set, str]: return (self.eligible_neighbors, self._strategy) async def is_concurrent(self) -> bool: - return False - + return False + class UpdateReceivedEvent(NodeEvent): - def __init__(self, decoded_model, weight, source, round, local=False): + FEDERATION_UPDATE = "federation" + REPUTATION_UPDATE = "reputation" + + def __init__(self, decoded_model, weight, source, round, local=False, update_type=FEDERATION_UPDATE): """ Initializes an UpdateReceivedEvent. @@ -306,12 +309,15 @@ def __init__(self, decoded_model, weight, source, round, local=False): source (str): The identifier or address of the node that sent the update. round (int): The round number in which the update was received. local (bool): Local update + update_type (str): Semantic channel for this update. Federation updates feed aggregation; + reputation updates only feed reputation metrics. 
""" self._source = source self._round = round self._model = decoded_model self._weight = weight self._local = local + self._update_type = update_type def __str__(self): return f"Update received from source: {self._source}, round: {self._round}" @@ -330,6 +336,12 @@ async def get_event_data(self) -> tuple[object, int, str, int, bool]: """ return (self._model, self._weight, self._source, self._round, self._local) + async def get_update_type(self) -> str: + return self._update_type + + def is_reputation_update(self) -> bool: + return self._update_type == self.REPUTATION_UPDATE + async def is_concurrent(self) -> bool: return False @@ -362,7 +374,7 @@ async def get_event_data(self) -> tuple[str, tuple[float, float]]: async def is_concurrent(self) -> bool: return True - + class DuplicatedMessageEvent(NodeEvent): """ Event triggered when a message is received that has already been processed. @@ -370,7 +382,7 @@ class DuplicatedMessageEvent(NodeEvent): Attributes: source (str): The address of the node that sent the duplicated message. """ - + def __init__(self, source: str, message_type: str): self.source = source @@ -396,7 +408,7 @@ class GPSEvent(AddonEvent): Attributes: distances (dict): A dictionary mapping node addresses to their respective distances. """ - + def __init__(self, distances: dict): """ Initializes a GPSEvent. @@ -427,7 +439,7 @@ class ChangeLocationEvent(AddonEvent): latitude (float): New latitude of the node. longitude (float): New longitude of the node. """ - + def __init__(self, latitude, longitude): """ Initializes a ChangeLocationEvent. @@ -450,14 +462,28 @@ async def get_event_data(self): tuple: A tuple containing latitude and longitude. 
""" return (self.latitude, self.longitude) - + class TestMetricsEvent(AddonEvent): - def __init__(self, loss, accuracy): + def __init__(self, loss, accuracy, macro_f1=None): self._loss = loss self._accuracy = accuracy + self._macro_f1 = macro_f1 def __str__(self): return "TestMetricsEvent" async def get_event_data(self): - return (self._loss, self._accuracy) + return (self._loss, self._accuracy, self._macro_f1) + + +class ValidationMetricsEvent(AddonEvent): + def __init__(self, loss, accuracy, train_accuracy=None): + self._loss = loss + self._accuracy = accuracy + self._train_accuracy = train_accuracy + + def __str__(self): + return "ValidationMetricsEvent" + + async def get_event_data(self): + return (self._loss, self._accuracy, self._train_accuracy) diff --git a/nebula/core/network/actions.py b/nebula/core/network/actions.py index 77e1997c5..98d8c93f8 100644 --- a/nebula/core/network/actions.py +++ b/nebula/core/network/actions.py @@ -83,6 +83,35 @@ class ReputationAction(Enum): SHARE = nebula_pb2.ReputationMessage.Action.SHARE +class ReputationtableAction(Enum): + """ + Enum for full reputation table exchange messages in SDFL. + """ + + TABLE = nebula_pb2.ReputationtableMessage.Action.TABLE + +class TrustworthinessAction(Enum): + """ + Enum for trustworthiness exchange messages in the federation. + """ + + REPORT = nebula_pb2.TrustworthinessMessage.Action.REPORT + +class TrustscoresAction(Enum): + """ + Enum for trustworthiness scores exchange messages in the federation. + """ + + SHARE = nebula_pb2.TrustscoresMessage.Action.SHARE + +class SdflmodelAction(Enum): + """ + Enum for SDFL model messages exchanged through broadcast/forwarding. 
+ """ + + TRAINER_UPDATE = nebula_pb2.SdflmodelMessage.Action.TRAINER_UPDATE + GLOBAL_MODEL = nebula_pb2.SdflmodelMessage.Action.GLOBAL_MODEL + # Mapping between message type strings and their corresponding Enum classes ACTION_CLASSES = { @@ -94,6 +123,10 @@ class ReputationAction(Enum): "offer": OfferAction, "link": LinkAction, "reputation": ReputationAction, + "reputationtable": ReputationtableAction, + "trustworthiness": TrustworthinessAction, + "trustscores": TrustscoresAction, + "sdflmodel": SdflmodelAction, } diff --git a/nebula/core/network/communications.py b/nebula/core/network/communications.py index e0b1c17a5..a6cd861a0 100755 --- a/nebula/core/network/communications.py +++ b/nebula/core/network/communications.py @@ -21,7 +21,7 @@ BLACKLIST_EXPIRATION_TIME = 60 -_COMPRESSED_MESSAGES = ["model", "offer_model"] +_COMPRESSED_MESSAGES = ["model", "offer_model", "sdflmodel"] class CommunicationsManager: @@ -854,7 +854,7 @@ async def send_message_to_neighbors(self, message, neighbors=None, interval=0): if interval > 0: await asyncio.sleep(interval) - async def send_message(self, dest_addr, message, message_type=""): + async def send_message(self, dest_addr, message, message_type="", allow_after_learning_finished = False,): """ Sends a message to a specific destination address, with optional compression for large messages. @@ -868,7 +868,7 @@ async def send_message(self, dest_addr, message, message_type=""): try: if dest_addr in self.connections: conn = self.connections[dest_addr] - await conn.send(data=message) + await conn.send(data=message, allow_after_learning_finished=allow_after_learning_finished) except Exception as e: logging.exception(f"❗️ Cannot send message {message} to {dest_addr}. 
Error: {e!s}") await self.disconnect(dest_addr, mutual_disconnection=False) @@ -879,7 +879,7 @@ async def send_message(self, dest_addr, message, message_type=""): if conn is None: logging.info(f"❗️ Connection with {dest_addr} not found") return - await conn.send(data=message, is_compressed=True) + await conn.send(data=message, is_compressed=True, allow_after_learning_finished=allow_after_learning_finished) except Exception as e: logging.exception(f"❗️ Cannot send model to {dest_addr}: {e!s}") await self.disconnect(dest_addr, mutual_disconnection=False) diff --git a/nebula/core/network/connection.py b/nebula/core/network/connection.py index 6ba60749b..578907572 100755 --- a/nebula/core/network/connection.py +++ b/nebula/core/network/connection.py @@ -338,6 +338,7 @@ async def send( pb: bool = True, encoding_type: str = "utf-8", is_compressed: bool = False, + allow_after_learning_finished: bool = False, ) -> None: """ Sends data over the active connection. @@ -359,10 +360,13 @@ async def send( return # Check if learning cycle has finished - don't send messages - if await self.cm.learning_finished(): + if not allow_after_learning_finished and await self.cm.learning_finished(): logging.info(f"Not sending message to {self.addr} because learning cycle has finished") return + if await self.cm.learning_finished() and allow_after_learning_finished: + logging.info(f"Sending message to {self.addr} after learning cycle finished (allowed)") + try: message_id = uuid.uuid4().bytes data_prefix, encoded_data = self._prepare_data(data, pb, encoding_type) diff --git a/nebula/core/network/forwarder.py b/nebula/core/network/forwarder.py index 86ce75536..9eccc15fe 100755 --- a/nebula/core/network/forwarder.py +++ b/nebula/core/network/forwarder.py @@ -3,6 +3,7 @@ import time from nebula.addons.functions import print_msg_box +from nebula.core.pb import nebula_pb2 from nebula.core.utils.locker import Locker @@ -114,12 +115,17 @@ async def process_pending_messages(self, messages_left): """ 
while messages_left > 0 and not self.pending_messages.empty(): msg, neighbors = await self.pending_messages.get() + allow_after_learning_finished = self._allow_forward_after_learning_finished(msg) for neighbor in neighbors[:messages_left]: if neighbor not in self.cm.connections: continue try: logging.debug(f"🔁 Sending message (forwarding) --> to {neighbor}") - await self.cm.send_message(neighbor, msg) + await self.cm.send_message( + neighbor, + msg, + allow_after_learning_finished=allow_after_learning_finished, + ) except Exception as e: logging.exception(f"🔁 Error forwarding message to {neighbor}. Error: {e!s}") pass @@ -129,6 +135,24 @@ async def process_pending_messages(self, messages_left): logging.debug("🔁 Putting message back in queue for forwarding to the remaining neighbors") await self.pending_messages.put((msg, neighbors[messages_left:])) + def _allow_forward_after_learning_finished(self, msg: bytes) -> bool: + try: + message_wrapper = nebula_pb2.Wrapper() + message_wrapper.ParseFromString(msg) + message_type = message_wrapper.WhichOneof("message") + if message_type == "trustscores_message": + return True + if message_type == "sdflmodel_message": + # Trainers may finish their local cycle before the forwarded global model arrives. + return message_wrapper.sdflmodel_message.action == nebula_pb2.SdflmodelMessage.Action.GLOBAL_MODEL + if message_type == "reputationtable_message": + # SDFL reputation tables can be forwarded while the aggregator is waiting. + return True + return False + except Exception as e: + logging.warning(f"🔁 Could not inspect forwarded message type: {e!s}") + return False + async def forward(self, msg, addr_from): """ Enqueue a received message for forwarding to all other direct neighbors. 
diff --git a/nebula/core/network/messages.py b/nebula/core/network/messages.py index 7870acddf..29e7088bc 100644 --- a/nebula/core/network/messages.py +++ b/nebula/core/network/messages.py @@ -86,14 +86,68 @@ def _define_message_templates(self): "weight": 1, }, }, + "sdflmodel": { + # SDFL uses a dedicated model channel for forwarded trainer/global updates. + "parameters": ["action", "target", "parameters", "weight", "round", "node_id"], + "defaults": { + "weight": 1, + "node_id": self.addr, + }, + }, "reputation": { "parameters": ["node_id", "score", "round", "action"], "defaults": { "round": None, }, }, + "reputationtable": { + # Reputation tables carry one-hop trust scores for SDFL indirect reputation. + "parameters": ["action", "node_id", "round", "reputation_table_json"], + "defaults": { + "node_id": self.addr, + "round": None, + "reputation_table_json": "{}", + }, + }, "discover": {"parameters": ["action"], "defaults": {}}, "link": {"parameters": ["action", "addrs"], "defaults": {}}, + "trustworthiness": { + "parameters": [ + "action", + "node_id", + "bytes_sent", + "bytes_recv", + "accuracy", + "loss", + "role", + "energy_grid", + "emissions", + "workload", + "cpu_model", + "gpu_model", + "cpu_used", + "gpu_used", + "energy_consumed", + "sample_size", + "class_imbalance", + "model_size", + "local_entropy", + "val_accuracy", + "dp_enabled", + "dp_epsilon", + "macro_f1", + "train_accuracy" + ], + "defaults": {}, + }, + "trustscores": { + "parameters": [ + "action", + "node_id", + "trust_report_json" + ], + "defaults": {}, + } # Add additional message types here } @@ -122,7 +176,14 @@ async def process_message(self, data, addr_from): addr_from (str): Address from which the message was received. 
""" not_processing_messages = {"control_message", "connection_message"} - special_processing_messages = {"discovery_message", "federation_message", "model_message"} + special_processing_messages = { + "discovery_message", + "federation_message", + "model_message", + "trustscores_message", + "sdflmodel_message", + "reputationtable_message", + } try: message_wrapper = nebula_pb2.Wrapper() @@ -201,6 +262,18 @@ def _should_forward_message(self, message_type, message_wrapper): == nebula_pb2.FederationMessage.Action.Value("FEDERATION_START") ): return True + if message_type == "trustscores_message": + return True + + if self.cm.config.participant["scenario_args"]["federation"] == "SDFL" and message_type == "sdflmodel_message": + # SDFL model messages must still flow after the generic learning-finished gate. + return True + if ( + self.cm.config.participant["scenario_args"]["federation"] == "SDFL" + and message_type == "reputationtable_message" + ): + # Reputation tables can arrive late while aggregation is waiting for trust evidence. + return True def create_message(self, message_type: str, action: str = "", *args, **kwargs): """ diff --git a/nebula/core/network/propagator.py b/nebula/core/network/propagator.py index 717ea5f94..b3fa18de6 100755 --- a/nebula/core/network/propagator.py +++ b/nebula/core/network/propagator.py @@ -308,7 +308,7 @@ async def _propagate(self, mpe: ModelPropagationEvent): bool: True if propagation occurred (payload sent), False if halted early. 
""" eligible_neighbors, strategy_id = await mpe.get_event_data() - + self.reset_status_history() if strategy_id not in self.strategies: logging.info(f"Strategy {strategy_id} not found.") @@ -344,6 +344,7 @@ async def _propagate(self, mpe: ModelPropagationEvent): current_round = await self.get_round() round_number = -1 if strategy_id == "initialization" else current_round + await asyncio.sleep(10) parameters = serialized_model message = self.cm.create_message("model", "", round_number, parameters, weight) for neighbor_addr in eligible_neighbors: diff --git a/nebula/core/node.py b/nebula/core/node.py index 86a73cc2a..772c9d832 100755 --- a/nebula/core/node.py +++ b/nebula/core/node.py @@ -19,12 +19,18 @@ import logging from collections import Counter +from nebula.addons.defenses.adversarial_training import apply_adversarial_training_if_enabled +from nebula.addons.defenses.feature_squeezing import apply_feature_squeezing_if_enabled from nebula.config.config import Config from nebula.core.datasets.cifar10.cifar10 import CIFAR10PartitionHandler from nebula.core.datasets.cifar100.cifar100 import CIFAR100PartitionHandler from nebula.core.datasets.datamodule import DataModule from nebula.core.datasets.emnist.emnist import EMNISTPartitionHandler from nebula.core.datasets.fashionmnist.fashionmnist import FashionMNISTPartitionHandler +from nebula.core.datasets.covtype.covtype import CovtypePartitionHandler +from nebula.core.datasets.kddcup99.kddcup99 import KDDCUP99PartitionHandler +from nebula.core.datasets.adultcensus.adultcensus import AdultCensusPartitionHandler +from nebula.core.datasets.breast_cancer.breast_cancer import BreastCancerPartitionHandler from nebula.core.datasets.mnist.mnist import MNISTPartitionHandler from nebula.core.datasets.nebuladataset import NebulaPartition from nebula.core.models.cifar10.cnn import CIFAR10ModelCNN @@ -38,10 +44,15 @@ from nebula.core.models.emnist.mlp import EMNISTModelMLP from nebula.core.models.fashionmnist.cnn import 
FashionMNISTModelCNN from nebula.core.models.fashionmnist.mlp import FashionMNISTModelMLP +from nebula.core.models.covtype.mlp import CovtypeModelMLP +from nebula.core.models.kddcup99.mlp import KDDCUP99ModelMLP +from nebula.core.models.adultcensus.mlp import AdultCensusModelMLP +from nebula.core.models.breast_cancer.mlp import BreastCancerModelMLP from nebula.core.models.mnist.cnn import MNISTModelCNN from nebula.core.models.mnist.mlp import MNISTModelMLP from nebula.core.engine import Engine from nebula.core.training.lightning import Lightning +from nebula.core.training.lightning_dp import LightningDP from nebula.core.training.siamese import Siamese # os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1" @@ -112,6 +123,34 @@ async def main(config: Config): model = FashionMNISTModelCNN() else: raise ValueError(f"Model {model} not supported for dataset {dataset_name}") + elif dataset_name == "Covtype": + batch_size = 32 + handler = CovtypePartitionHandler + if model_name == "MLP": + model = CovtypeModelMLP() + else: + raise ValueError(f"Model {model_name} not supported for dataset {dataset_name}") + elif dataset_name == "KDDCUP99": + batch_size = 32 + handler = KDDCUP99PartitionHandler + if model_name == "MLP": + model = KDDCUP99ModelMLP() + else: + raise ValueError(f"Model {model_name} not supported for dataset {dataset_name}") + elif dataset_name == "AdultCensus": + batch_size = 32 + handler = AdultCensusPartitionHandler + if model_name == "MLP": + model = AdultCensusModelMLP() + else: + raise ValueError(f"Model {model_name} not supported for dataset {dataset_name}") + elif dataset_name == "BreastCancer": + batch_size = 32 + handler = BreastCancerPartitionHandler + if model_name == "MLP": + model = BreastCancerModelMLP() + else: + raise ValueError(f"Model {model_name} not supported for dataset {dataset_name}") elif dataset_name == "EMNIST": batch_size = 32 handler = EMNISTPartitionHandler @@ -150,6 +189,8 @@ async def main(config: Config): dataset = NebulaPartition(handler=handler, 
config=config) dataset.load_partition() + apply_feature_squeezing_if_enabled(dataset, config.participant) + apply_adversarial_training_if_enabled(model, config.participant, dataset) dataset.log_partition() samples_per_label = Counter(dataset.get_train_labels()) @@ -167,8 +208,13 @@ async def main(config: Config): trainer = None trainer_str = config.participant["training_args"]["trainer"] + dp_enabled = config.participant["training_args"]["dp"]["enabled"] if trainer_str == "lightning": - trainer = Lightning + # DP is implemented as a Lightning-specific trainer wrapper around Opacus. + if dp_enabled: + trainer = LightningDP + else: + trainer = Lightning elif trainer_str == "scikit": raise NotImplementedError elif trainer_str == "siamese": diff --git a/nebula/core/noderole.py b/nebula/core/noderole.py index 9bd258fef..3d7f68456 100644 --- a/nebula/core/noderole.py +++ b/nebula/core/noderole.py @@ -7,7 +7,6 @@ from nebula.core.utils.locker import Locker from nebula.core.eventmanager import EventManager from nebula.core.nebulaevents import UpdateReceivedEvent, ModelPropagationEvent -import random from enum import Enum from abc import ABC, abstractmethod from typing import TYPE_CHECKING @@ -32,7 +31,7 @@ class Role(Enum): IDLE = "idle" SERVER = "server" MALICIOUS = "malicious" - + def factory_node_role(role: str) -> Role: if role == "trainer": return Role.TRAINER @@ -68,27 +67,27 @@ def __init__(self): self._next_role: Role = None self._next_role_locker = Locker("next_role_locker", async_lock=True) self._source_to_notificate = None - + @abstractmethod def get_role(self): """ Returns the Role enum value representing the current role of the node. """ raise NotImplementedError - + @abstractmethod def get_role_name(self, effective=False): """ Returns a string representation of the current role. - + Args: effective (bool): Whether to return the name of the current effective role when going as malicious. - + Returns: str: Name of the role. 
""" raise NotImplementedError - + @abstractmethod async def extended_learning_cycle(self): """ @@ -98,19 +97,19 @@ async def extended_learning_cycle(self): including training, aggregating updates, and coordinating with neighbors. """ raise NotImplementedError - + @abstractmethod async def select_nodes_to_wait(self): """ Determines which neighbors the node should wait for during the current cycle. This logic varies depending on whether the node is an aggregator, trainer, or other role. - + Returns: Set[Any]: A set of neighbor node identifiers to wait for. """ raise NotImplementedError - + @abstractmethod async def resolve_missing_updates(self): """ @@ -118,16 +117,16 @@ async def resolve_missing_updates(self): For example, an aggregator might default to a fresh model, while a trainer might proceed with its own local model. - + Returns: Any: The resolution outcome depending on the role's specific logic. """ raise NotImplementedError - + async def set_next_role(self, role: Role, source_to_notificate = None): """ Schedules a role change and optionally stores the source to notify upon completion. - + Args: role (Role): The new role to transition to. source_to_notificate (Optional[Any]): Identifier of the node that triggered the change. @@ -135,7 +134,7 @@ async def set_next_role(self, role: Role, source_to_notificate = None): async with self._next_role_locker: self._next_role = role self._source_to_notificate = source_to_notificate - + async def get_next_role(self) -> Role: """ Retrieves and clears the next role value. @@ -147,7 +146,7 @@ async def get_next_role(self) -> Role: next_role = self._next_role self._next_role = None return next_role - + async def get_source_to_notificate(self): """ Retrieves and clears the stored source to notify after a role change. 
@@ -159,7 +158,7 @@ async def get_source_to_notificate(self): source_to_notificate = self._source_to_notificate self._source_to_notificate = None return source_to_notificate - + async def update_role_needed(self): """ Checks whether a role update is scheduled. @@ -170,12 +169,16 @@ async def update_role_needed(self): async with self._next_role_locker: updt_needed = self._next_role != None return updt_needed - + + async def before_round_start(self): + """Hook for role-specific work before a round starts.""" + return None + """ ############################## # MALICIOUS BEHAVIOR # ############################## """ - + class MaliciousRoleBehavior(RoleBehavior): def __init__(self, engine: Engine, config: Config): super().__init__() @@ -193,28 +196,31 @@ def __init__(self, engine: Engine, config: Config): benign_role = self._config.participant["adversarial_args"]["fake_behavior"] self._fake_role_behavior = factory_role_behavior(benign_role, self._engine, self._config) self._role = factory_node_role("malicious") - + def get_role(self): return self._role - + def get_role_name(self, effective=False): if effective: return self._fake_role_behavior.get_role_name() return f"{self._role.value} as {self._fake_role_behavior.get_role_name()}" - - async def extended_learning_cycle(self): + + async def extended_learning_cycle(self): try: await self.attack.attack() except Exception: attack_name = self._config.participant["adversarial_args"]["attacks"] logging.exception(f"Attack {attack_name} failed") - + await self._fake_role_behavior.extended_learning_cycle() - + + async def before_round_start(self): + await self._fake_role_behavior.before_round_start() + async def select_nodes_to_wait(self): nodes = await self._fake_role_behavior.select_nodes_to_wait() return nodes - + async def resolve_missing_updates(self): return await self._fake_role_behavior.resolve_missing_updates() @@ -222,20 +228,20 @@ async def resolve_missing_updates(self): # TRAINER AGGREGATOR BEHAVIOR # 
############################### """ - + class TrainerAggregatorRoleBehavior(RoleBehavior): def __init__(self, engine: Engine, config: Config): super().__init__() self._engine = engine self._config = config self._role = factory_node_role("trainer_aggregator") - + def get_role(self): - return self._role - + return self._role + def get_role_name(self, effective=False): return self._role.value - + async def extended_learning_cycle(self): await self._engine.trainer.test() await self._engine.trainning_in_progress_lock.acquire_async() @@ -249,13 +255,13 @@ async def extended_learning_cycle(self): mpe = ModelPropagationEvent(await self._engine.cm.get_addrs_current_connections(only_direct=True, myself=False), "stable") await EventManager.get_instance().publish_node_event(mpe) - + await self._engine._waiting_model_updates() - + async def select_nodes_to_wait(self): nodes = await self._engine.cm.get_addrs_current_connections(only_direct=True, myself=True) return nodes - + async def resolve_missing_updates(self): return {} @@ -263,7 +269,7 @@ async def resolve_missing_updates(self): # AGGREGATOR BEHAVIOR # ############################## """ - + class AggregatorRoleBehavior(RoleBehavior): def __init__(self, engine: Engine, config: Config): super().__init__() @@ -271,70 +277,185 @@ def __init__(self, engine: Engine, config: Config): self._config = config self._role = factory_node_role("aggregator") self._transfer_send = False - + def get_role(self): - return self._role - + return self._role + def get_role_name(self, effective=False): return self._role.value - + async def extended_learning_cycle(self): await self._engine.trainer.test() - + await self._engine._waiting_model_updates() - + mpe = ModelPropagationEvent(await self._engine.cm.get_addrs_current_connections(only_direct=True, myself=False), "stable") await EventManager.get_instance().publish_node_event(mpe) - - # Transfer leadership + + await self._transfer_leadership() + + async def _transfer_leadership(self): + if 
self._engine.round >= self._engine.total_rounds - 1: + logging.info( + f"Skipping leadership transfer in final round {self._engine.round} " + f"of {self._engine.total_rounds - 1}" + ) + return + neighbors = await self._engine.cm.get_addrs_current_connections(myself=False) if len(neighbors) and not self._transfer_send: - random_neighbor = random.choice(list(neighbors)) + successor = await self._engine.select_leadership_successor(neighbors) + if successor is None: + return lt_message = self._engine.cm.create_message("control", "leadership_transfer") - logging.info(f"Sending transfer leadership to: {random_neighbor}") - asyncio.create_task(self._engine.cm.send_message(random_neighbor, lt_message)) + logging.info(f"Sending transfer leadership to: {successor}") + await self._before_leadership_transfer(successor) + asyncio.create_task(self._engine.cm.send_message(successor, lt_message)) + await self._engine.register_leadership_transfer(successor) self._transfer_send = True - + + async def _before_leadership_transfer(self, successor): + return None + async def select_nodes_to_wait(self): nodes = await self._engine.cm.get_addrs_current_connections(only_direct=True, myself=False) return nodes - + async def resolve_missing_updates(self): return (self._engine.trainer.get_model_parameters(), self._engine.trainer.BYPASS_MODEL_WEIGHT) - + + +class SDFLRoleMixin: + async def _send_reputation_model_update(self): + # SDFL reputation evaluates direct neighbors from the latest local model update. 
+ model_params = self._engine.trainer.get_model_parameters() + serialized_model = ( + model_params + if isinstance(model_params, bytes) + else self._engine.trainer.serialize_model(model_params) + ) + + message = self._engine.cm.create_message( + "model", + round=self._engine.round, + parameters=serialized_model, + weight=self._engine.trainer.get_model_weight(), + ) + + neighbors = await self._engine.cm.get_addrs_current_connections(only_direct=True, myself=False) + if not neighbors: + logging.info("SDFL reputation | No direct neighbors to send model/update") + return + + # Reputation model updates use the regular model channel and stay one-hop local. + logging.info(f"SDFL reputation | Broadcasting model/update to direct neighbors: {neighbors}") + await asyncio.gather( + *[ + asyncio.create_task(self._engine.cm.send_message(neighbor, message, "model")) + for neighbor in neighbors + ] + ) + + +class SDFLAggregatorRoleBehavior(SDFLRoleMixin, AggregatorRoleBehavior): + async def before_round_start(self): + # Leadership transfer must be acknowledged before the new aggregator starts a round. + await self._engine.wait_pending_leadership_ack() + + async def extended_learning_cycle(self): + # SDFL aggregators collect trainer updates, publish the global model, then rotate leadership. + await self._engine.trainer.test() + await self._send_reputation_model_update() + await self._engine._waiting_model_updates() + await self._send_global_model() + await self._transfer_leadership() + + async def _before_leadership_transfer(self, successor): + await self._engine.mark_leadership_transfer_pending(successor) + + async def select_nodes_to_wait(self): + # The aggregator waits for all expected trainers, not just currently direct neighbors. + nodes = self._engine.get_sdfl_expected_trainers() + if nodes: + return nodes + return await super().select_nodes_to_wait() + + async def _send_global_model(self) -> None: + # Send the aggregated model through the SDFL forwarding channel. 
+ model_params = self._engine.trainer.get_model_parameters() + serialized_model = ( + model_params + if isinstance(model_params, bytes) + else self._engine.trainer.serialize_model(model_params) + ) + + message = self._engine.cm.create_message( + "sdflmodel", + "global_model", + target="trainer", + parameters=serialized_model, + weight=self._engine.trainer.get_model_weight(), + round=self._engine.round, + node_id=self._engine.addr, + ) + + neighbors = await self._engine.cm.get_addrs_current_connections( + only_direct=True, + myself=False, + ) + + logging.info(f"SDFL aggregator | Broadcasting GLOBAL_MODEL to neighbors: {neighbors}") + + tasks = [ + asyncio.create_task( + self._engine.cm.send_message( + neighbor, + message, + "sdflmodel", + allow_after_learning_finished=True, + ) + ) + for neighbor in neighbors + ] + + if tasks: + await asyncio.gather(*tasks) + else: + logging.warning("SDFL aggregator | No neighbors available to send GLOBAL_MODEL") + """ ############################## # SERVER BEHAVIOR # ############################## """ - + class ServerRoleBehavior(RoleBehavior): from datetime import datetime - + def __init__(self, engine: Engine, config: Config): super().__init__() self._engine = engine self._config = config self._start_time = ServerRoleBehavior.datetime.now().strftime("%d/%m/%Y %H:%M:%S") self._role = factory_node_role("server") - + def get_role(self): - return self._role - + return self._role + def get_role_name(self, effective=False): return self._role.value - + async def extended_learning_cycle(self): await self._engine.trainer.test() await self._engine._waiting_model_updates() - + mpe = ModelPropagationEvent(await self._engine.cm.get_addrs_current_connections(only_direct=True, myself=False), "stable") await EventManager.get_instance().publish_node_event(mpe) - + async def select_nodes_to_wait(self): nodes = await self._engine.cm.get_addrs_current_connections(only_direct=True, myself=False) - return nodes - + return nodes + async def 
resolve_missing_updates(self): return (self._engine.trainer.get_model_parameters(), self._engine.trainer.BYPASS_MODEL_WEIGHT) @@ -342,67 +463,180 @@ async def resolve_missing_updates(self): # TRAINER BEHAVIOR # ############################## """ - + class TrainerRoleBehavior(RoleBehavior): def __init__(self, engine: Engine, config: Config): super().__init__() self._engine = engine self._config = config self._role = factory_node_role("trainer") - + def get_role(self): - return self._role - + return self._role + def get_role_name(self, effective=False): return self._role.value - + async def extended_learning_cycle(self): logging.info("Waiting global update | Assign _waiting_global_update = True") await self._engine.trainer.test() - await self._engine.trainer.train() + await self._engine.trainning_in_progress_lock.acquire_async() + try: + await self._engine.trainer.train() + finally: + await self._engine.trainning_in_progress_lock.release_async() mpe = ModelPropagationEvent(await self._engine.cm.get_addrs_current_connections(only_direct=True, myself=False), "stable") await EventManager.get_instance().publish_node_event(mpe) - + await self._engine._waiting_model_updates() - + async def select_nodes_to_wait(self): nodes = await self._engine.cm.get_addrs_current_connections(only_direct=True, myself=False) return nodes - + async def resolve_missing_updates(self): return (self._engine.trainer.get_model_parameters(), self._engine.trainer.get_model_weight()) + +class SDFLTrainerRoleBehavior(SDFLRoleMixin, TrainerRoleBehavior): + async def extended_learning_cycle(self): + logging.info("Waiting global update | Assign _waiting_global_update = True") + + await self._engine.trainer.test() + self._prepare_waiting_global_model() + # Trainers train locally, exchange reputation evidence, send their update, then wait for aggregation. 
+ await self._engine.trainning_in_progress_lock.acquire_async() + try: + await self._engine.trainer.train() + finally: + await self._engine.trainning_in_progress_lock.release_async() + + if self._engine._reputation is not None: + # Process reputation model updates that arrived before the local table is computed. + await self._engine._reputation.process_pending_sdfl_reputation_updates(self._engine.round) + + await self._send_reputation_model_update() + await self._calculate_and_send_reputation_table() + await self._send_trainer_update() + await self._waiting_global_model() + + def _prepare_waiting_global_model(self): + # Reset the per-round event used by trainers to block until a GLOBAL_MODEL arrives. + self._engine._global_model_source = None + self._engine._global_model_received.clear() + + async def _calculate_and_send_reputation_table(self): + # Trainers publish direct-neighbor reputation tables for the aggregator to combine. + if self._engine._reputation is None: + return + + expected_reputation_neighbors = await self._engine.cm.get_addrs_current_connections( + only_direct=True, + myself=False, + ) + reputation_timeout = float( + self._config.participant["defense_args"] + .get("reputation", {}) + .get( + "model_update_timeout", + self._config.participant["defense_args"] + .get("reputation", {}) + .get("table_aggregation_timeout", 30), + ) + ) + await self._engine._reputation.wait_sdfl_reputation_updates( + expected_reputation_neighbors, + self._engine.round, + reputation_timeout, + ) + await self._engine._reputation.calculate_and_send_sdfl_reputation_table() + + async def _send_trainer_update(self): + # Broadcast the local trainer update; forwarding delivers it to the current aggregator. 
+ model_params = self._engine.trainer.get_model_parameters() + serialized_model = ( + model_params + if isinstance(model_params, bytes) + else self._engine.trainer.serialize_model(model_params) + ) + + message = self._engine.cm.create_message( + "sdflmodel", + "trainer_update", + target="aggregator", + parameters=serialized_model, + weight=self._engine.trainer.get_model_weight(), + round=self._engine.round, + node_id=self._engine.addr, + ) + + neighbors = await self._engine.cm.get_addrs_current_connections( + only_direct=True, + myself=False, + ) + + logging.info(f"SDFL trainer | Broadcasting TRAINER_UPDATE to neighbors: {neighbors}") + + tasks = [ + asyncio.create_task( + self._engine.cm.send_message( + neighbor, + message, + "sdflmodel", + ) + ) + for neighbor in neighbors + ] + + if tasks: + await asyncio.gather(*tasks) + else: + logging.warning("SDFL trainer | No neighbors available to send TRAINER_UPDATE") + + async def _waiting_global_model(self): + # A trainer continues only after the aggregator's GLOBAL_MODEL is received or times out. 
+ timeout = self._config.participant["aggregator_args"]["aggregation_timeout"] + logging.info(f"💤 Waiting global SDFL model in round {self._engine.round}.") + try: + await asyncio.wait_for(self._engine._global_model_received.wait(), timeout=timeout) + logging.info( + f"🤖 SDFL trainer | Global model received from " + f"{self._engine._global_model_source} in round {self._engine.round}" + ) + except TimeoutError: + logging.error(f"🤖 SDFL trainer | Timeout waiting global model in round {self._engine.round}") + """ ############################## # IDLE BEHAVIOR # ############################## """ - + class IdleRoleBehavior(RoleBehavior): def __init__(self, engine: Engine, config: Config): super().__init__() self._engine = engine self._config = config self._role = factory_node_role("idle") - + def get_role(self): - return self._role - + return self._role + def get_role_name(self, effective=False): return self._role.value - + async def extended_learning_cycle(self): logging.info("Waiting global update | Assign _waiting_global_update = True") await self._engine._waiting_model_updates() - + async def select_nodes_to_wait(self): nodes = await self._engine.cm.get_addrs_current_connections(only_direct=True, myself=False) return nodes - + async def resolve_missing_updates(self): raise NotImplementedError - + """ ############################## # PROXY BEHAVIOR # ############################## @@ -414,21 +648,21 @@ def __init__(self, engine: Engine, config: Config): self._engine = engine self._config = config self._role = factory_node_role("proxy") - + def get_role(self): - return self._role - + return self._role + def get_role_name(self, effective=False): return self._role.value - + async def extended_learning_cycle(self): logging.info("Waiting global update | Assign _waiting_global_update = True") await self._engine._waiting_model_updates() - + async def select_nodes_to_wait(self): nodes = await self._engine.cm.get_addrs_current_connections(only_direct=True, myself=False) - 
return nodes - + return nodes + async def resolve_missing_updates(self): raise NotImplementedError @@ -436,12 +670,21 @@ async def resolve_missing_updates(self): # UTILS ROLE BEHAVIORS # ############################## """ - + class roleBehaviorException(Exception): pass -def factory_role_behavior(role: str, engine: Engine, config: Config) -> RoleBehavior | None: - +def factory_role_behavior(role: str, engine: Engine, config: Config) -> RoleBehavior | None: + federation = config.participant["scenario_args"].get("federation") + if federation == "SDFL": + sdfl_role_behaviors = { + "trainer": SDFLTrainerRoleBehavior, + "aggregator": SDFLAggregatorRoleBehavior, + } + node_role = sdfl_role_behaviors.get(role) + if node_role: + return node_role(engine, config) + role_behaviors = { "malicious": MaliciousRoleBehavior, "trainer": TrainerRoleBehavior, @@ -451,14 +694,14 @@ def factory_role_behavior(role: str, engine: Engine, config: Config) -> RoleBeha "proxy": ProxyRoleBehavior, "idle": IdleRoleBehavior, } - + node_role = role_behaviors.get(role, None) if node_role: return node_role(engine, config) else: raise roleBehaviorException(f"Node Role Behavior {role} not found") - + def change_role_behavior(old_role: RoleBehavior, new_role: Role, *parameters) -> RoleBehavior: engine, config = parameters if not isinstance(old_role, MaliciousRoleBehavior): @@ -466,8 +709,4 @@ def change_role_behavior(old_role: RoleBehavior, new_role: Role, *parameters) -> else: fake_behavior = factory_role_behavior(new_role.value, engine, config) old_role._fake_role_behavior = fake_behavior - return old_role - - - - + return old_role diff --git a/nebula/core/pb/nebula.proto b/nebula/core/pb/nebula.proto index 3360196ed..13b0fa74d 100755 --- a/nebula/core/pb/nebula.proto +++ b/nebula/core/pb/nebula.proto @@ -26,6 +26,10 @@ message Wrapper { DiscoverMessage discover_message = 9; OfferMessage offer_message = 10; LinkMessage link_message = 11; + TrustworthinessMessage trustworthiness_message = 12; + 
TrustscoresMessage trustscores_message = 13; + SdflmodelMessage sdflmodel_message = 14; + ReputationtableMessage reputationtable_message = 15; } } @@ -73,6 +77,19 @@ message ModelMessage { int32 round = 3; // Identifies the communication round, particularly useful in iterative processes. } +message SdflmodelMessage { + enum Action { + TRAINER_UPDATE = 0; + GLOBAL_MODEL = 1; + } + Action action = 1; + string target = 2; // Target role: "aggregator" or "trainer". + bytes parameters = 3; // Serialized form of the model parameters. + int64 weight = 4; // Significance or weighting factor of this model update. + int32 round = 5; // Identifies the communication round. + string node_id = 6; // Logical producer of the update/model, preserved during forwarding. +} + message ConnectionMessage { enum Action { CONNECT = 0; @@ -126,7 +143,58 @@ message ReputationMessage { Action action = 4; // Action type (default: SHARE) } +message ReputationtableMessage { + enum Action { + TABLE = 0; + } + string node_id = 1; // Logical source node of the reputation table. + int32 round = 2; // Round to which the reputation table belongs. + string reputation_table_json = 3; // JSON encoded reputation table. + Action action = 4; // Action type (default: TABLE) +} + // Response transmits the outcome of a requested operation, including any errors. message ResponseMessage { string response = 1; // Outcome of the requested operation. 
} + +message TrustworthinessMessage { + enum Action { + REPORT = 0; + } + + Action action = 1; + string node_id = 2; + int64 bytes_sent = 3; + int64 bytes_recv = 4; + double accuracy = 5; + double loss = 6; + string role = 7; + double energy_grid = 8; + double emissions = 9; + string workload = 10; + string cpu_model = 11; + string gpu_model = 12; + bool cpu_used = 13; + bool gpu_used = 14; + double energy_consumed = 15; + int32 sample_size = 16; + float class_imbalance = 17; + int64 model_size = 18; + float local_entropy = 19; + float val_accuracy = 20; + bool dp_enabled = 21; + float dp_epsilon = 22; + double macro_f1 = 23; + double train_accuracy = 24; +} + +message TrustscoresMessage { + enum Action { + SHARE = 0; + } + + Action action = 1; + string node_id = 2; + string trust_report_json = 3; +} diff --git a/nebula/core/pb/nebula_pb2.py b/nebula/core/pb/nebula_pb2.py index 448675b31..bc06160a2 100644 --- a/nebula/core/pb/nebula_pb2.py +++ b/nebula/core/pb/nebula_pb2.py @@ -1,12 +1,11 @@ # -*- coding: utf-8 -*- # Generated by the protocol buffer compiler. DO NOT EDIT! 
# source: nebula.proto -# Protobuf Python Version: 4.25.3 """Generated protocol buffer code.""" +from google.protobuf.internal import builder as _builder from google.protobuf import descriptor as _descriptor from google.protobuf import descriptor_pool as _descriptor_pool from google.protobuf import symbol_database as _symbol_database -from google.protobuf.internal import builder as _builder # @@protoc_insertion_point(imports) _sym_db = _symbol_database.Default() @@ -14,49 +13,65 @@ -DESCRIPTOR = _descriptor_pool.Default().AddSerializedFile(b'\n\x0cnebula.proto\x12\x06nebula\"\xae\x04\n\x07Wrapper\x12\x0e\n\x06source\x18\x01 \x01(\t\x12\x35\n\x11\x64iscovery_message\x18\x02 \x01(\x0b\x32\x18.nebula.DiscoveryMessageH\x00\x12\x31\n\x0f\x63ontrol_message\x18\x03 \x01(\x0b\x32\x16.nebula.ControlMessageH\x00\x12\x37\n\x12\x66\x65\x64\x65ration_message\x18\x04 \x01(\x0b\x32\x19.nebula.FederationMessageH\x00\x12-\n\rmodel_message\x18\x05 \x01(\x0b\x32\x14.nebula.ModelMessageH\x00\x12\x37\n\x12\x63onnection_message\x18\x06 \x01(\x0b\x32\x19.nebula.ConnectionMessageH\x00\x12\x33\n\x10response_message\x18\x07 \x01(\x0b\x32\x17.nebula.ResponseMessageH\x00\x12\x37\n\x12reputation_message\x18\x08 \x01(\x0b\x32\x19.nebula.ReputationMessageH\x00\x12\x33\n\x10\x64iscover_message\x18\t \x01(\x0b\x32\x17.nebula.DiscoverMessageH\x00\x12-\n\roffer_message\x18\n \x01(\x0b\x32\x14.nebula.OfferMessageH\x00\x12+\n\x0clink_message\x18\x0b \x01(\x0b\x32\x13.nebula.LinkMessageH\x00\x42\t\n\x07message\"\x9e\x01\n\x10\x44iscoveryMessage\x12/\n\x06\x61\x63tion\x18\x01 \x01(\x0e\x32\x1f.nebula.DiscoveryMessage.Action\x12\x10\n\x08latitude\x18\x02 \x01(\x02\x12\x11\n\tlongitude\x18\x03 \x01(\x02\"4\n\x06\x41\x63tion\x12\x0c\n\x08\x44ISCOVER\x10\x00\x12\x0c\n\x08REGISTER\x10\x01\x12\x0e\n\nDEREGISTER\x10\x02\"\xd1\x01\n\x0e\x43ontrolMessage\x12-\n\x06\x61\x63tion\x18\x01 \x01(\x0e\x32\x1d.nebula.ControlMessage.Action\x12\x0b\n\x03log\x18\x02 
\x01(\t\"\x82\x01\n\x06\x41\x63tion\x12\t\n\x05\x41LIVE\x10\x00\x12\x0c\n\x08OVERHEAD\x10\x01\x12\x0c\n\x08MOBILITY\x10\x02\x12\x0c\n\x08RECOVERY\x10\x03\x12\r\n\tWEAK_LINK\x10\x04\x12\x17\n\x13LEADERSHIP_TRANSFER\x10\x05\x12\x1b\n\x17LEADERSHIP_TRANSFER_ACK\x10\x06\"\xcd\x01\n\x11\x46\x65\x64\x65rationMessage\x12\x30\n\x06\x61\x63tion\x18\x01 \x01(\x0e\x32 .nebula.FederationMessage.Action\x12\x11\n\targuments\x18\x02 \x03(\t\x12\r\n\x05round\x18\x03 \x01(\x05\"d\n\x06\x41\x63tion\x12\x14\n\x10\x46\x45\x44\x45RATION_START\x10\x00\x12\x0e\n\nREPUTATION\x10\x01\x12\x1e\n\x1a\x46\x45\x44\x45RATION_MODELS_INCLUDED\x10\x02\x12\x14\n\x10\x46\x45\x44\x45RATION_READY\x10\x03\"A\n\x0cModelMessage\x12\x12\n\nparameters\x18\x01 \x01(\x0c\x12\x0e\n\x06weight\x18\x02 \x01(\x03\x12\r\n\x05round\x18\x03 \x01(\x05\"\x8f\x01\n\x11\x43onnectionMessage\x12\x30\n\x06\x61\x63tion\x18\x01 \x01(\x0e\x32 .nebula.ConnectionMessage.Action\"H\n\x06\x41\x63tion\x12\x0b\n\x07\x43ONNECT\x10\x00\x12\x0e\n\nDISCONNECT\x10\x01\x12\x10\n\x0cLATE_CONNECT\x10\x02\x12\x0f\n\x0bRESTRUCTURE\x10\x03\"\x95\x01\n\x0f\x44iscoverMessage\x12.\n\x06\x61\x63tion\x18\x01 \x01(\x0e\x32\x1e.nebula.DiscoverMessage.Action\"R\n\x06\x41\x63tion\x12\x11\n\rDISCOVER_JOIN\x10\x00\x12\x12\n\x0e\x44ISCOVER_NODES\x10\x01\x12\x10\n\x0cLATE_CONNECT\x10\x02\x12\x0f\n\x0bRESTRUCTURE\x10\x03\"\xce\x01\n\x0cOfferMessage\x12+\n\x06\x61\x63tion\x18\x01 \x01(\x0e\x32\x1b.nebula.OfferMessage.Action\x12\x13\n\x0bn_neighbors\x18\x02 \x01(\x02\x12\x0c\n\x04loss\x18\x03 \x01(\x02\x12\x12\n\nparameters\x18\x04 \x01(\x0c\x12\x0e\n\x06rounds\x18\x05 \x01(\x05\x12\r\n\x05round\x18\x06 \x01(\x05\x12\x0e\n\x06\x65pochs\x18\x07 \x01(\x05\"+\n\x06\x41\x63tion\x12\x0f\n\x0bOFFER_MODEL\x10\x00\x12\x10\n\x0cOFFER_METRIC\x10\x01\"w\n\x0bLinkMessage\x12*\n\x06\x61\x63tion\x18\x01 \x01(\x0e\x32\x1a.nebula.LinkMessage.Action\x12\r\n\x05\x61\x64\x64rs\x18\x02 
\x01(\t\"-\n\x06\x41\x63tion\x12\x0e\n\nCONNECT_TO\x10\x00\x12\x13\n\x0f\x44ISCONNECT_FROM\x10\x01\"\x89\x01\n\x11ReputationMessage\x12\x0f\n\x07node_id\x18\x01 \x01(\t\x12\r\n\x05score\x18\x02 \x01(\x02\x12\r\n\x05round\x18\x03 \x01(\x05\x12\x30\n\x06\x61\x63tion\x18\x04 \x01(\x0e\x32 .nebula.ReputationMessage.Action\"\x13\n\x06\x41\x63tion\x12\t\n\x05SHARE\x10\x00\"#\n\x0fResponseMessage\x12\x10\n\x08response\x18\x01 \x01(\tb\x06proto3') +DESCRIPTOR = _descriptor_pool.Default().AddSerializedFile(b'\n\x0cnebula.proto\x12\x06nebula\"\xa6\x06\n\x07Wrapper\x12\x0e\n\x06source\x18\x01 \x01(\t\x12\x35\n\x11\x64iscovery_message\x18\x02 \x01(\x0b\x32\x18.nebula.DiscoveryMessageH\x00\x12\x31\n\x0f\x63ontrol_message\x18\x03 \x01(\x0b\x32\x16.nebula.ControlMessageH\x00\x12\x37\n\x12\x66\x65\x64\x65ration_message\x18\x04 \x01(\x0b\x32\x19.nebula.FederationMessageH\x00\x12-\n\rmodel_message\x18\x05 \x01(\x0b\x32\x14.nebula.ModelMessageH\x00\x12\x37\n\x12\x63onnection_message\x18\x06 \x01(\x0b\x32\x19.nebula.ConnectionMessageH\x00\x12\x33\n\x10response_message\x18\x07 \x01(\x0b\x32\x17.nebula.ResponseMessageH\x00\x12\x37\n\x12reputation_message\x18\x08 \x01(\x0b\x32\x19.nebula.ReputationMessageH\x00\x12\x33\n\x10\x64iscover_message\x18\t \x01(\x0b\x32\x17.nebula.DiscoverMessageH\x00\x12-\n\roffer_message\x18\n \x01(\x0b\x32\x14.nebula.OfferMessageH\x00\x12+\n\x0clink_message\x18\x0b \x01(\x0b\x32\x13.nebula.LinkMessageH\x00\x12\x41\n\x17trustworthiness_message\x18\x0c \x01(\x0b\x32\x1e.nebula.TrustworthinessMessageH\x00\x12\x39\n\x13trustscores_message\x18\r \x01(\x0b\x32\x1a.nebula.TrustscoresMessageH\x00\x12\x35\n\x11sdflmodel_message\x18\x0e \x01(\x0b\x32\x18.nebula.SdflmodelMessageH\x00\x12\x41\n\x17reputationtable_message\x18\x0f \x01(\x0b\x32\x1e.nebula.ReputationtableMessageH\x00\x42\t\n\x07message\"\x9e\x01\n\x10\x44iscoveryMessage\x12/\n\x06\x61\x63tion\x18\x01 \x01(\x0e\x32\x1f.nebula.DiscoveryMessage.Action\x12\x10\n\x08latitude\x18\x02 
\x01(\x02\x12\x11\n\tlongitude\x18\x03 \x01(\x02\"4\n\x06\x41\x63tion\x12\x0c\n\x08\x44ISCOVER\x10\x00\x12\x0c\n\x08REGISTER\x10\x01\x12\x0e\n\nDEREGISTER\x10\x02\"\xd1\x01\n\x0e\x43ontrolMessage\x12-\n\x06\x61\x63tion\x18\x01 \x01(\x0e\x32\x1d.nebula.ControlMessage.Action\x12\x0b\n\x03log\x18\x02 \x01(\t\"\x82\x01\n\x06\x41\x63tion\x12\t\n\x05\x41LIVE\x10\x00\x12\x0c\n\x08OVERHEAD\x10\x01\x12\x0c\n\x08MOBILITY\x10\x02\x12\x0c\n\x08RECOVERY\x10\x03\x12\r\n\tWEAK_LINK\x10\x04\x12\x17\n\x13LEADERSHIP_TRANSFER\x10\x05\x12\x1b\n\x17LEADERSHIP_TRANSFER_ACK\x10\x06\"\xcd\x01\n\x11\x46\x65\x64\x65rationMessage\x12\x30\n\x06\x61\x63tion\x18\x01 \x01(\x0e\x32 .nebula.FederationMessage.Action\x12\x11\n\targuments\x18\x02 \x03(\t\x12\r\n\x05round\x18\x03 \x01(\x05\"d\n\x06\x41\x63tion\x12\x14\n\x10\x46\x45\x44\x45RATION_START\x10\x00\x12\x0e\n\nREPUTATION\x10\x01\x12\x1e\n\x1a\x46\x45\x44\x45RATION_MODELS_INCLUDED\x10\x02\x12\x14\n\x10\x46\x45\x44\x45RATION_READY\x10\x03\"A\n\x0cModelMessage\x12\x12\n\nparameters\x18\x01 \x01(\x0c\x12\x0e\n\x06weight\x18\x02 \x01(\x03\x12\r\n\x05round\x18\x03 \x01(\x05\"\xc7\x01\n\x10SdflmodelMessage\x12/\n\x06\x61\x63tion\x18\x01 \x01(\x0e\x32\x1f.nebula.SdflmodelMessage.Action\x12\x0e\n\x06target\x18\x02 \x01(\t\x12\x12\n\nparameters\x18\x03 \x01(\x0c\x12\x0e\n\x06weight\x18\x04 \x01(\x03\x12\r\n\x05round\x18\x05 \x01(\x05\x12\x0f\n\x07node_id\x18\x06 \x01(\t\".\n\x06\x41\x63tion\x12\x12\n\x0eTRAINER_UPDATE\x10\x00\x12\x10\n\x0cGLOBAL_MODEL\x10\x01\"\x8f\x01\n\x11\x43onnectionMessage\x12\x30\n\x06\x61\x63tion\x18\x01 \x01(\x0e\x32 .nebula.ConnectionMessage.Action\"H\n\x06\x41\x63tion\x12\x0b\n\x07\x43ONNECT\x10\x00\x12\x0e\n\nDISCONNECT\x10\x01\x12\x10\n\x0cLATE_CONNECT\x10\x02\x12\x0f\n\x0bRESTRUCTURE\x10\x03\"\x95\x01\n\x0f\x44iscoverMessage\x12.\n\x06\x61\x63tion\x18\x01 
\x01(\x0e\x32\x1e.nebula.DiscoverMessage.Action\"R\n\x06\x41\x63tion\x12\x11\n\rDISCOVER_JOIN\x10\x00\x12\x12\n\x0e\x44ISCOVER_NODES\x10\x01\x12\x10\n\x0cLATE_CONNECT\x10\x02\x12\x0f\n\x0bRESTRUCTURE\x10\x03\"\xce\x01\n\x0cOfferMessage\x12+\n\x06\x61\x63tion\x18\x01 \x01(\x0e\x32\x1b.nebula.OfferMessage.Action\x12\x13\n\x0bn_neighbors\x18\x02 \x01(\x02\x12\x0c\n\x04loss\x18\x03 \x01(\x02\x12\x12\n\nparameters\x18\x04 \x01(\x0c\x12\x0e\n\x06rounds\x18\x05 \x01(\x05\x12\r\n\x05round\x18\x06 \x01(\x05\x12\x0e\n\x06\x65pochs\x18\x07 \x01(\x05\"+\n\x06\x41\x63tion\x12\x0f\n\x0bOFFER_MODEL\x10\x00\x12\x10\n\x0cOFFER_METRIC\x10\x01\"w\n\x0bLinkMessage\x12*\n\x06\x61\x63tion\x18\x01 \x01(\x0e\x32\x1a.nebula.LinkMessage.Action\x12\r\n\x05\x61\x64\x64rs\x18\x02 \x01(\t\"-\n\x06\x41\x63tion\x12\x0e\n\nCONNECT_TO\x10\x00\x12\x13\n\x0f\x44ISCONNECT_FROM\x10\x01\"\x89\x01\n\x11ReputationMessage\x12\x0f\n\x07node_id\x18\x01 \x01(\t\x12\r\n\x05score\x18\x02 \x01(\x02\x12\r\n\x05round\x18\x03 \x01(\x05\x12\x30\n\x06\x61\x63tion\x18\x04 \x01(\x0e\x32 .nebula.ReputationMessage.Action\"\x13\n\x06\x41\x63tion\x12\t\n\x05SHARE\x10\x00\"\xa3\x01\n\x16ReputationtableMessage\x12\x0f\n\x07node_id\x18\x01 \x01(\t\x12\r\n\x05round\x18\x02 \x01(\x05\x12\x1d\n\x15reputation_table_json\x18\x03 \x01(\t\x12\x35\n\x06\x61\x63tion\x18\x04 \x01(\x0e\x32%.nebula.ReputationtableMessage.Action\"\x13\n\x06\x41\x63tion\x12\t\n\x05TABLE\x10\x00\"#\n\x0fResponseMessage\x12\x10\n\x08response\x18\x01 \x01(\t\"\xaa\x04\n\x16TrustworthinessMessage\x12\x35\n\x06\x61\x63tion\x18\x01 \x01(\x0e\x32%.nebula.TrustworthinessMessage.Action\x12\x0f\n\x07node_id\x18\x02 \x01(\t\x12\x12\n\nbytes_sent\x18\x03 \x01(\x03\x12\x12\n\nbytes_recv\x18\x04 \x01(\x03\x12\x10\n\x08\x61\x63\x63uracy\x18\x05 \x01(\x01\x12\x0c\n\x04loss\x18\x06 \x01(\x01\x12\x0c\n\x04role\x18\x07 \x01(\t\x12\x13\n\x0b\x65nergy_grid\x18\x08 \x01(\x01\x12\x11\n\temissions\x18\t \x01(\x01\x12\x10\n\x08workload\x18\n \x01(\t\x12\x11\n\tcpu_model\x18\x0b 
\x01(\t\x12\x11\n\tgpu_model\x18\x0c \x01(\t\x12\x10\n\x08\x63pu_used\x18\r \x01(\x08\x12\x10\n\x08gpu_used\x18\x0e \x01(\x08\x12\x17\n\x0f\x65nergy_consumed\x18\x0f \x01(\x01\x12\x13\n\x0bsample_size\x18\x10 \x01(\x05\x12\x17\n\x0f\x63lass_imbalance\x18\x11 \x01(\x02\x12\x12\n\nmodel_size\x18\x12 \x01(\x03\x12\x15\n\rlocal_entropy\x18\x13 \x01(\x02\x12\x14\n\x0cval_accuracy\x18\x14 \x01(\x02\x12\x12\n\ndp_enabled\x18\x15 \x01(\x08\x12\x12\n\ndp_epsilon\x18\x16 \x01(\x02\x12\x10\n\x08macro_f1\x18\x17 \x01(\x01\x12\x16\n\x0etrain_accuracy\x18\x18 \x01(\x01\"\x14\n\x06\x41\x63tion\x12\n\n\x06REPORT\x10\x00\"\x88\x01\n\x12TrustscoresMessage\x12\x31\n\x06\x61\x63tion\x18\x01 \x01(\x0e\x32!.nebula.TrustscoresMessage.Action\x12\x0f\n\x07node_id\x18\x02 \x01(\t\x12\x19\n\x11trust_report_json\x18\x03 \x01(\t\"\x13\n\x06\x41\x63tion\x12\t\n\x05SHARE\x10\x00\x62\x06proto3') -_globals = globals() -_builder.BuildMessageAndEnumDescriptors(DESCRIPTOR, _globals) -_builder.BuildTopDescriptorsAndMessages(DESCRIPTOR, 'nebula_pb2', _globals) +_builder.BuildMessageAndEnumDescriptors(DESCRIPTOR, globals()) +_builder.BuildTopDescriptorsAndMessages(DESCRIPTOR, 'nebula_pb2', globals()) if _descriptor._USE_C_DESCRIPTORS == False: + DESCRIPTOR._options = None - _globals['_WRAPPER']._serialized_start=25 - _globals['_WRAPPER']._serialized_end=583 - _globals['_DISCOVERYMESSAGE']._serialized_start=586 - _globals['_DISCOVERYMESSAGE']._serialized_end=744 - _globals['_DISCOVERYMESSAGE_ACTION']._serialized_start=692 - _globals['_DISCOVERYMESSAGE_ACTION']._serialized_end=744 - _globals['_CONTROLMESSAGE']._serialized_start=747 - _globals['_CONTROLMESSAGE']._serialized_end=956 - _globals['_CONTROLMESSAGE_ACTION']._serialized_start=826 - _globals['_CONTROLMESSAGE_ACTION']._serialized_end=956 - _globals['_FEDERATIONMESSAGE']._serialized_start=959 - _globals['_FEDERATIONMESSAGE']._serialized_end=1164 - _globals['_FEDERATIONMESSAGE_ACTION']._serialized_start=1064 - 
_globals['_FEDERATIONMESSAGE_ACTION']._serialized_end=1164 - _globals['_MODELMESSAGE']._serialized_start=1166 - _globals['_MODELMESSAGE']._serialized_end=1231 - _globals['_CONNECTIONMESSAGE']._serialized_start=1234 - _globals['_CONNECTIONMESSAGE']._serialized_end=1377 - _globals['_CONNECTIONMESSAGE_ACTION']._serialized_start=1305 - _globals['_CONNECTIONMESSAGE_ACTION']._serialized_end=1377 - _globals['_DISCOVERMESSAGE']._serialized_start=1380 - _globals['_DISCOVERMESSAGE']._serialized_end=1529 - _globals['_DISCOVERMESSAGE_ACTION']._serialized_start=1447 - _globals['_DISCOVERMESSAGE_ACTION']._serialized_end=1529 - _globals['_OFFERMESSAGE']._serialized_start=1532 - _globals['_OFFERMESSAGE']._serialized_end=1738 - _globals['_OFFERMESSAGE_ACTION']._serialized_start=1695 - _globals['_OFFERMESSAGE_ACTION']._serialized_end=1738 - _globals['_LINKMESSAGE']._serialized_start=1740 - _globals['_LINKMESSAGE']._serialized_end=1859 - _globals['_LINKMESSAGE_ACTION']._serialized_start=1814 - _globals['_LINKMESSAGE_ACTION']._serialized_end=1859 - _globals['_REPUTATIONMESSAGE']._serialized_start=1862 - _globals['_REPUTATIONMESSAGE']._serialized_end=1999 - _globals['_REPUTATIONMESSAGE_ACTION']._serialized_start=1980 - _globals['_REPUTATIONMESSAGE_ACTION']._serialized_end=1999 - _globals['_RESPONSEMESSAGE']._serialized_start=2001 - _globals['_RESPONSEMESSAGE']._serialized_end=2036 + _WRAPPER._serialized_start=25 + _WRAPPER._serialized_end=831 + _DISCOVERYMESSAGE._serialized_start=834 + _DISCOVERYMESSAGE._serialized_end=992 + _DISCOVERYMESSAGE_ACTION._serialized_start=940 + _DISCOVERYMESSAGE_ACTION._serialized_end=992 + _CONTROLMESSAGE._serialized_start=995 + _CONTROLMESSAGE._serialized_end=1204 + _CONTROLMESSAGE_ACTION._serialized_start=1074 + _CONTROLMESSAGE_ACTION._serialized_end=1204 + _FEDERATIONMESSAGE._serialized_start=1207 + _FEDERATIONMESSAGE._serialized_end=1412 + _FEDERATIONMESSAGE_ACTION._serialized_start=1312 + _FEDERATIONMESSAGE_ACTION._serialized_end=1412 + 
_MODELMESSAGE._serialized_start=1414 + _MODELMESSAGE._serialized_end=1479 + _SDFLMODELMESSAGE._serialized_start=1482 + _SDFLMODELMESSAGE._serialized_end=1681 + _SDFLMODELMESSAGE_ACTION._serialized_start=1635 + _SDFLMODELMESSAGE_ACTION._serialized_end=1681 + _CONNECTIONMESSAGE._serialized_start=1684 + _CONNECTIONMESSAGE._serialized_end=1827 + _CONNECTIONMESSAGE_ACTION._serialized_start=1755 + _CONNECTIONMESSAGE_ACTION._serialized_end=1827 + _DISCOVERMESSAGE._serialized_start=1830 + _DISCOVERMESSAGE._serialized_end=1979 + _DISCOVERMESSAGE_ACTION._serialized_start=1897 + _DISCOVERMESSAGE_ACTION._serialized_end=1979 + _OFFERMESSAGE._serialized_start=1982 + _OFFERMESSAGE._serialized_end=2188 + _OFFERMESSAGE_ACTION._serialized_start=2145 + _OFFERMESSAGE_ACTION._serialized_end=2188 + _LINKMESSAGE._serialized_start=2190 + _LINKMESSAGE._serialized_end=2309 + _LINKMESSAGE_ACTION._serialized_start=2264 + _LINKMESSAGE_ACTION._serialized_end=2309 + _REPUTATIONMESSAGE._serialized_start=2312 + _REPUTATIONMESSAGE._serialized_end=2449 + _REPUTATIONMESSAGE_ACTION._serialized_start=2430 + _REPUTATIONMESSAGE_ACTION._serialized_end=2449 + _REPUTATIONTABLEMESSAGE._serialized_start=2452 + _REPUTATIONTABLEMESSAGE._serialized_end=2615 + _REPUTATIONTABLEMESSAGE_ACTION._serialized_start=2596 + _REPUTATIONTABLEMESSAGE_ACTION._serialized_end=2615 + _RESPONSEMESSAGE._serialized_start=2617 + _RESPONSEMESSAGE._serialized_end=2652 + _TRUSTWORTHINESSMESSAGE._serialized_start=2655 + _TRUSTWORTHINESSMESSAGE._serialized_end=3209 + _TRUSTWORTHINESSMESSAGE_ACTION._serialized_start=3189 + _TRUSTWORTHINESSMESSAGE_ACTION._serialized_end=3209 + _TRUSTSCORESMESSAGE._serialized_start=3212 + _TRUSTSCORESMESSAGE._serialized_end=3348 + _TRUSTSCORESMESSAGE_ACTION._serialized_start=3329 + _TRUSTSCORESMESSAGE_ACTION._serialized_end=3348 # @@protoc_insertion_point(module_scope) diff --git a/nebula/core/role.py b/nebula/core/role.py index 6bc4343f8..dc5281983 100755 --- a/nebula/core/role.py +++ b/nebula/core/role.py
@@ -10,7 +10,7 @@ class Role(Enum): PROXY = "proxy" IDLE = "idle" SERVER = "server" - + def factory_node_role(role: str) -> Role: if role == "trainer": return Role.TRAINER diff --git a/nebula/core/situationalawareness/awareness/arbitrationpolicies/staticarbitrationpolicy.py b/nebula/core/situationalawareness/awareness/arbitrationpolicies/staticarbitrationpolicy.py index e0dedab5c..dd17b5c35 100644 --- a/nebula/core/situationalawareness/awareness/arbitrationpolicies/staticarbitrationpolicy.py +++ b/nebula/core/situationalawareness/awareness/arbitrationpolicies/staticarbitrationpolicy.py @@ -8,11 +8,11 @@ class SAP(ArbitrationPolicy): # Static Arbitatrion Policy """ Static Arbitration Policy for the Reasoner module. - This class implements a fixed priority arbitration mechanism for - SA (Situational Awareness) components. Each SA component category + This class implements a fixed priority arbitration mechanism for + SA (Situational Awareness) components. Each SA component category is assigned a static weight representing its priority level. - In case of conflicting SA commands, the policy selects the command + In case of conflicting SA commands, the policy selects the command whose originating component has the highest priority weight. Attributes: @@ -21,7 +21,7 @@ class SAP(ArbitrationPolicy): # Static Arbitatrion Policy Methods: init(config): Placeholder for initialization with external configuration. - tie_break(sac1, sac2): Resolves conflicts between two SA commands by + tie_break(sac1, sac2): Resolves conflicts between two SA commands by comparing their category weights, returning True if sac1 wins. 
""" def __init__(self, verbose): diff --git a/nebula/core/situationalawareness/awareness/sanetwork/neighborpolicies/distanceneighborpolicy.py b/nebula/core/situationalawareness/awareness/sanetwork/neighborpolicies/distanceneighborpolicy.py index a1d29c675..421cb8e37 100644 --- a/nebula/core/situationalawareness/awareness/sanetwork/neighborpolicies/distanceneighborpolicy.py +++ b/nebula/core/situationalawareness/awareness/sanetwork/neighborpolicies/distanceneighborpolicy.py @@ -16,7 +16,7 @@ class DistanceNeighborPolicy(NeighborPolicy): - When to discard or replace existing neighbors. - Keeping track of current neighbors and known nodes with their distances. - The policy operates under the assumption that physical proximity + The policy operates under the assumption that physical proximity can be beneficial for performance and robustness in the network. Attributes: @@ -26,7 +26,7 @@ class DistanceNeighborPolicy(NeighborPolicy): addr (str | None): The address of this node (used for self-identification). neighbors_lock (Locker): Async lock for safe access to `neighbors`. nodes_known_lock (Locker): Async lock for safe access to `nodes_known`. - nodes_distances (dict[str, tuple[float, tuple[float, float]]] | None): + nodes_distances (dict[str, tuple[float, tuple[float, float]]] | None): Mapping from node IDs to a tuple containing (distance, (latitude, longitude)). nodes_distances_lock (Locker): Async lock for safe access to `nodes_distances`. _verbose (bool): Whether to enable verbose logging for debugging purposes. 
diff --git a/nebula/core/situationalawareness/awareness/sanetwork/neighborpolicies/fcneighborpolicy.py b/nebula/core/situationalawareness/awareness/sanetwork/neighborpolicies/fcneighborpolicy.py index e395a199a..443282f65 100644 --- a/nebula/core/situationalawareness/awareness/sanetwork/neighborpolicies/fcneighborpolicy.py +++ b/nebula/core/situationalawareness/awareness/sanetwork/neighborpolicies/fcneighborpolicy.py @@ -8,8 +8,8 @@ class FCNeighborPolicy(NeighborPolicy): """ Neighbor policy for fully-connected (FC) structured topologies. - This policy assumes a fully-connected topology where every node should attempt - to connect to all known nodes. It always accepts incoming neighbor connections + This policy assumes a fully-connected topology where every node should attempt + to connect to all known nodes. It always accepts incoming neighbor connections and considers the neighbor list incomplete if there are known nodes that are not yet connected. The goal is to maintain full connectivity across all known nodes in the federation. @@ -23,7 +23,7 @@ class FCNeighborPolicy(NeighborPolicy): nodes_known_lock (Locker): Async lock for safe access to `nodes_known`. _verbose (bool): Whether to enable verbose logging for debugging purposes. """ - + def __init__(self): self.max_neighbors = None self.nodes_known = set() diff --git a/nebula/core/situationalawareness/awareness/sanetwork/neighborpolicies/idleneighborpolicy.py b/nebula/core/situationalawareness/awareness/sanetwork/neighborpolicies/idleneighborpolicy.py index 648c8605e..d1d7d5025 100644 --- a/nebula/core/situationalawareness/awareness/sanetwork/neighborpolicies/idleneighborpolicy.py +++ b/nebula/core/situationalawareness/awareness/sanetwork/neighborpolicies/idleneighborpolicy.py @@ -8,11 +8,11 @@ class IDLENeighborPolicy(NeighborPolicy): """ Neighbor policy for minimal connectivity scenarios. 
- This policy only attempts to discover or establish new neighbor connections - if the node is currently isolated (i.e., has no neighbors). All incoming + This policy only attempts to discover or establish new neighbor connections + if the node is currently isolated (i.e., has no neighbors). All incoming connection requests are accepted regardless of the current neighbor state. - This policy is suitable for scenarios where minimal intervention is preferred, + This policy is suitable for scenarios where minimal intervention is preferred, and connections are formed opportunistically rather than proactively. Attributes: @@ -24,7 +24,7 @@ class IDLENeighborPolicy(NeighborPolicy): nodes_known_lock (Locker): Async lock for thread-safe access to `nodes_known`. _verbose (bool): Enables verbose logging for debugging and traceability. """ - + def __init__(self): self.max_neighbors = None self.nodes_known = set() diff --git a/nebula/core/situationalawareness/awareness/sanetwork/neighborpolicies/ringneighborpolicy.py b/nebula/core/situationalawareness/awareness/sanetwork/neighborpolicies/ringneighborpolicy.py index afd9b1d59..e6933b5b7 100644 --- a/nebula/core/situationalawareness/awareness/sanetwork/neighborpolicies/ringneighborpolicy.py +++ b/nebula/core/situationalawareness/awareness/sanetwork/neighborpolicies/ringneighborpolicy.py @@ -10,9 +10,9 @@ class RINGNeighborPolicy(NeighborPolicy): """ Neighbor policy for ring topologies. - This policy maintains a strict limit on the number of neighbors per node, - enforcing a ring-like structure. Each node connects to a fixed number of - neighbors (by default 2), and excess connections are detected and marked + This policy maintains a strict limit on the number of neighbors per node, + enforcing a ring-like structure. Each node connects to a fixed number of + neighbors (by default 2), and excess connections are detected and marked for removal. 
The policy ensures: @@ -34,7 +34,7 @@ class RINGNeighborPolicy(NeighborPolicy): _excess_neighbors_removed_lock (Locker): Lock for accessing the removal tracking set. _verbose (bool): Enables verbose logging. """ - + RECENTLY_REMOVED_BAN_TIME = 20 def __init__(self): diff --git a/nebula/core/situationalawareness/awareness/satraining/satraining.py b/nebula/core/situationalawareness/awareness/satraining/satraining.py index 94c18f40c..813cc11fe 100644 --- a/nebula/core/situationalawareness/awareness/satraining/satraining.py +++ b/nebula/core/situationalawareness/awareness/satraining/satraining.py @@ -6,9 +6,9 @@ from nebula.addons.functions import print_msg_box from nebula.core.situationalawareness.awareness.sareasoner import SAReasoner, SAMComponent from nebula.core.eventmanager import EventManager - -RESTRUCTURE_COOLDOWN = 5 - + +RESTRUCTURE_COOLDOWN = 5 + class SATraining(SAMComponent): """ SATraining is a Situational Awareness (SA) component responsible for enhancing @@ -24,7 +24,7 @@ class SATraining(SAMComponent): _sar (SAReasoner): Reference to the shared situational reasoner. _trainning_policy: Instantiated training policy strategy. """ - + def __init__(self, config): """ Initialize the SATraining component with a given configuration. @@ -61,7 +61,7 @@ def tp(self): """ Returns the currently active training policy instance. """ - return self._trainning_policy + return self._trainning_policy async def init(self): """ @@ -69,7 +69,7 @@ async def init(self): This setup enables the policy to make informed decisions based on local topology. 
""" config = {} - config["nodes"] = set(await self.sar.get_nodes_known(neighbors_only=True)) + config["nodes"] = set(await self.sar.get_nodes_known(neighbors_only=True)) await self.tp.init(config) async def sa_component_actions(self): @@ -79,4 +79,3 @@ async def sa_component_actions(self): """ logging.info("SA Trainng evaluating current scenario") asyncio.create_task(self.tp.get_evaluation_results()) - diff --git a/nebula/core/situationalawareness/awareness/satraining/trainingpolicy/bpstrainingpolicy.py b/nebula/core/situationalawareness/awareness/satraining/trainingpolicy/bpstrainingpolicy.py index 0353a8020..32874c6ae 100644 --- a/nebula/core/situationalawareness/awareness/satraining/trainingpolicy/bpstrainingpolicy.py +++ b/nebula/core/situationalawareness/awareness/satraining/trainingpolicy/bpstrainingpolicy.py @@ -4,18 +4,18 @@ from nebula.core.nebulaevents import RoundEndEvent class BPSTrainingPolicy(TrainingPolicy): - + def __init__(self, config=None): pass - + async def init(self, config): - await self.register_sa_agent() + await self.register_sa_agent() async def get_evaluation_results(self): sac = factory_sa_command( "connectivity", SACommandAction.MAINTAIN_CONNECTIONS, - self, + self, "", SACommandPRIO.LOW, False, @@ -24,15 +24,15 @@ async def get_evaluation_results(self): ) await self.suggest_action(sac) await self.notify_all_suggestions_done(RoundEndEvent) - + async def get_agent(self) -> str: return "SATraining_BPSTP" async def register_sa_agent(self): await SuggestionBuffer.get_instance().register_event_agents(RoundEndEvent, self) - + async def suggest_action(self, sac : SACommand): await SuggestionBuffer.get_instance().register_suggestion(RoundEndEvent, self, sac) - + async def notify_all_suggestions_done(self, event_type): - await SuggestionBuffer.get_instance().notify_all_suggestions_done_for_agent(self, event_type) \ No newline at end of file + await SuggestionBuffer.get_instance().notify_all_suggestions_done_for_agent(self, event_type) diff 
--git a/nebula/core/situationalawareness/awareness/satraining/trainingpolicy/fastreboot.py b/nebula/core/situationalawareness/awareness/satraining/trainingpolicy/fastreboot.py index dd8fc438d..39a0791c4 100644 --- a/nebula/core/situationalawareness/awareness/satraining/trainingpolicy/fastreboot.py +++ b/nebula/core/situationalawareness/awareness/satraining/trainingpolicy/fastreboot.py @@ -24,13 +24,13 @@ def __init__( self._upgrade_lr = FR_LEARNING_RATE # Increased value for learning rate self._current_lr = VANILLA_LEARNING_RATE self._verbose = config["verbose"] - + self._learning_rate_lock = Locker(name="learning_rate_lock", async_lock=True) self._weight_modifier = {} self._weight_modifier_lock = Locker(name="weight_modifier_lock", async_lock=True) self._fr_in_progress = False - + async def init(self, config): #await EventManager.get_instance().subscribe_node_event(UpdateNeighborEvent) #await EventManager.get_instance().subscribe_node_event(AggregationEvent) diff --git a/nebula/core/situationalawareness/awareness/satraining/trainingpolicy/htstrainingpolicy.py b/nebula/core/situationalawareness/awareness/satraining/trainingpolicy/htstrainingpolicy.py index e37209ece..29b0524ba 100644 --- a/nebula/core/situationalawareness/awareness/satraining/trainingpolicy/htstrainingpolicy.py +++ b/nebula/core/situationalawareness/awareness/satraining/trainingpolicy/htstrainingpolicy.py @@ -6,19 +6,19 @@ # "Hybrid Training Strategy" (HTS) class HTSTrainingPolicy(TrainingPolicy): """ - Implements a Hybrid Training Strategy (HTS) that combines multiple training policies - (e.g., QDS, FRTS) to collaboratively decide on the evaluation and potential pruning + Implements a Hybrid Training Strategy (HTS) that combines multiple training policies + (e.g., QDS, FRTS) to collaboratively decide on the evaluation and potential pruning of neighbors in a decentralized federated learning scenario. - + Attributes: TRAINING_POLICY (set): Names of training policy classes to instantiate and manage. 
""" - + TRAINING_POLICY = { "Quality-Driven Selection", "Fast Reboot Training Strategy", } - + def __init__(self, config): """ Initializes the HTS policy with the node's address and verbosity level. @@ -33,34 +33,34 @@ def __init__(self, config): self._verbose = config["verbose"] self._training_policies : set[TrainingPolicy] = set() self._training_policies.add([factory_training_policy(x, config) for x in self.TRAINING_POLICY]) - + def __str__(self): - return "HTS" - + return "HTS" + @property def tps(self): - return self._training_policies + return self._training_policies async def init(self, config): for tp in self.tps: - await tp.init(config) + await tp.init(config) async def update_neighbors(self, node, remove=False): pass - + async def get_evaluation_results(self): """ Asynchronously calls the `get_evaluation_results` of each policy, and logs the nodes each policy would remove. - + Returns: None (future version may merge all evaluations). """ nodes_to_remove = dict() for tp in self.tps: nodes_to_remove[tp] = await tp.get_evaluation_results() - + for tp, nodes in nodes_to_remove.items(): logging.info(f"Training Policy: {tp}, nodes to remove: {nodes}") - - return None \ No newline at end of file + + return None diff --git a/nebula/core/situationalawareness/awareness/satraining/trainingpolicy/qdstrainingpolicy.py b/nebula/core/situationalawareness/awareness/satraining/trainingpolicy/qdstrainingpolicy.py index f067f7e84..535097b45 100644 --- a/nebula/core/situationalawareness/awareness/satraining/trainingpolicy/qdstrainingpolicy.py +++ b/nebula/core/situationalawareness/awareness/satraining/trainingpolicy/qdstrainingpolicy.py @@ -15,13 +15,13 @@ class QDSTrainingPolicy(TrainingPolicy): """ Implements a Quality-Driven Selection (QDS) strategy for training in DFL. - + This policy tracks the cosine similarity of neighbor model updates over time, and detects nodes that are inactive or provide redundant updates. 
Based on these evaluations, the policy suggests disconnecting such nodes to promote better model convergence and network efficiency. """ - + MAX_HISTORIC_SIZE = 10 SIMILARITY_THRESHOLD = 0.73 INACTIVE_THRESHOLD = 3 @@ -31,7 +31,7 @@ class QDSTrainingPolicy(TrainingPolicy): def __init__(self, config : dict): """ Initializes the QDS training policy. - + Args: config (dict): Configuration dictionary with keys: - "addr": Local node address. @@ -46,7 +46,7 @@ def __init__(self, config : dict): self._last_check = 0 self._check_done = False self._evaluation_results = set() - + def __str__(self): return "QDS" @@ -94,21 +94,21 @@ async def _process_aggregation_event(self, agg_ev : AggregationEvent): for addr, updt in updates.items(): if addr == self._addr: continue if not addr in self._nodes.keys(): continue - + deque_history, missed_count = self._nodes[addr] if addr in missing_nodes: if self._verbose: logging.info(f"Node inactivity counter increased for: {addr}") self._nodes[addr] = (deque_history, missed_count + 1) # Inactive rounds counter +1 else: self._nodes[addr] = (deque_history, 0) # Reset inactive counter - - #TODO Do it for the ones not using the last update received cause they are missing this round + + #TODO Do it for the ones not using the last update received cause they are missing this round (model,_) = updt - (self_model, _) = self_updt + (self_model, _) = self_updt cos_sim = cosine_metric(self_model, model, similarity=True) self._nodes[addr][0].append(cos_sim) self._evaluation_results = await self.evaluate() - + async def _get_nodes(self): """ Safely returns a copy of the current node tracking dictionary. @@ -118,8 +118,8 @@ async def _get_nodes(self): """ async with self._nodes_lock: nodes = self._nodes.copy() - return nodes - + return nodes + async def evaluate(self): """ Evaluates the current neighbor set to determine inactive or redundant nodes. 
@@ -131,10 +131,10 @@ async def evaluate(self): self._grace_rounds -= 1 if self._verbose: logging.info("Grace time hasnt finished...") return None - + if self._verbose: logging.info("Evaluation in process") - - result = set() + + result = set() if self._last_check == 0: self._check_done = True nodes = await self._get_nodes() @@ -149,18 +149,18 @@ async def evaluate(self): if self._verbose: logging.info(f"Node: {node} hadn't participated in any of the last {self.INACTIVE_THRESHOLD} rounds") else: if self._verbose: logging.info(f"Node: {node} inactivity counter: {inactivity_counter}") - + if node not in self._round_missing_nodes: if last_sim < self.SIMILARITY_THRESHOLD: if self._verbose: logging.info(f"Node: {node} got a similarity value of: {last_sim} under threshold: {self.SIMILARITY_THRESHOLD}") else: if self._verbose: logging.info(f"Node: {node} got a redundant model, cossine simmilarity: {last_sim} over threshold: {self.SIMILARITY_THRESHOLD}") redundant_nodes.add((node, last_sim)) - + if self._verbose: logging.info(f"Inactive nodes on aggregations: {inactive_nodes}") if self._verbose: logging.info(f"Redundant nodes on aggregations: {redundant_nodes}") if inactive_nodes: - result = result.union(inactive_nodes) + result = result.union(inactive_nodes) if len(redundant_nodes): sorted_redundant_nodes = sorted(redundant_nodes, key=lambda x: x[1]) n_discarded = math.ceil((len(redundant_nodes)/2)) @@ -171,11 +171,11 @@ async def evaluate(self): else: if self._verbose: logging.info(f"Evaluation is on cooldown... | {self.CHECK_COOLDOWN - self._last_check} rounds remaining") self._check_done = False - + self._last_check = (self._last_check + 1) % self.CHECK_COOLDOWN - + return result - + async def get_evaluation_results(self): """ Triggers suggested actions based on last evaluation results. 
@@ -186,14 +186,14 @@ async def get_evaluation_results(self): for node_discarded in self._evaluation_results: args = (node_discarded, False, True) sac = factory_sa_command( - "connectivity", + "connectivity", SACommandAction.DISCONNECT, - self, - node_discarded, - SACommandPRIO.MEDIUM, - False, - CommunicationsManager.get_instance().disconnect, - *args + self, + node_discarded, + SACommandPRIO.MEDIUM, + False, + CommunicationsManager.get_instance().disconnect, + *args ) await self.suggest_action(sac) await self.notify_all_suggestions_done(RoundEndEvent) @@ -203,9 +203,9 @@ async def get_agent(self) -> str: async def register_sa_agent(self): await SuggestionBuffer.get_instance().register_event_agents(RoundEndEvent, self) - + async def suggest_action(self, sac : SACommand): await SuggestionBuffer.get_instance().register_suggestion(RoundEndEvent, self, sac) - + async def notify_all_suggestions_done(self, event_type): - await SuggestionBuffer.get_instance().notify_all_suggestions_done_for_agent(self, event_type) \ No newline at end of file + await SuggestionBuffer.get_instance().notify_all_suggestions_done_for_agent(self, event_type) diff --git a/nebula/core/situationalawareness/awareness/satraining/trainingpolicy/trainingpolicy.py b/nebula/core/situationalawareness/awareness/satraining/trainingpolicy/trainingpolicy.py index cd9dae7c1..74d1b426f 100644 --- a/nebula/core/situationalawareness/awareness/satraining/trainingpolicy/trainingpolicy.py +++ b/nebula/core/situationalawareness/awareness/satraining/trainingpolicy/trainingpolicy.py @@ -2,7 +2,7 @@ from nebula.core.situationalawareness.awareness.sautils.samoduleagent import SAModuleAgent class TrainingPolicy(SAModuleAgent): - + @abstractmethod async def init(self, config): pass @@ -10,20 +10,20 @@ async def init(self, config): @abstractmethod async def get_evaluation_results(self): pass - - + + def factory_training_policy(training_policy, config) -> TrainingPolicy: from 
nebula.core.situationalawareness.awareness.satraining.trainingpolicy.bpstrainingpolicy import BPSTrainingPolicy from nebula.core.situationalawareness.awareness.satraining.trainingpolicy.qdstrainingpolicy import QDSTrainingPolicy from nebula.core.situationalawareness.awareness.satraining.trainingpolicy.htstrainingpolicy import HTSTrainingPolicy from nebula.core.situationalawareness.awareness.satraining.trainingpolicy.fastreboot import FastReboot - + options = { "Broad-Propagation Strategy": BPSTrainingPolicy, # "Broad-Propagation Strategy" (BPS) -- default value "Quality-Driven Selection": QDSTrainingPolicy, # "Quality-Driven Selection" (QDS) "Hybrid Training Strategy": HTSTrainingPolicy, # "Hybrid Training Strategy" (HTS) "Fast Reboot Training Strategy": FastReboot, # "Fast Reboot Training Strategy" (FRTS) - } - + } + cs = options.get(training_policy, BPSTrainingPolicy) - return cs(config) \ No newline at end of file + return cs(config) diff --git a/nebula/core/situationalawareness/discovery/candidateselection/distcandidateselector.py b/nebula/core/situationalawareness/discovery/candidateselection/distcandidateselector.py index d389f8bbb..fec0c1b09 100644 --- a/nebula/core/situationalawareness/discovery/candidateselection/distcandidateselector.py +++ b/nebula/core/situationalawareness/discovery/candidateselection/distcandidateselector.py @@ -10,17 +10,17 @@ class DistanceCandidateSelector(CandidateSelector): """ Selects candidate nodes based on their physical proximity. - This selector uses geolocation data to filter candidates within a - maximum distance threshold. It listens for GPS updates and maintains + This selector uses geolocation data to filter candidates within a + maximum distance threshold. It listens for GPS updates and maintains a mapping of node identifiers to their distances and coordinates. 
Attributes: - MAX_DISTANCE_THRESHOLD (int): Maximum distance (in meters) allowed + MAX_DISTANCE_THRESHOLD (int): Maximum distance (in meters) allowed for a node to be considered a valid candidate. candidates (list): List of candidate nodes to be evaluated. - candidates_lock (Locker): Async lock for managing concurrent access + candidates_lock (Locker): Async lock for managing concurrent access to the candidate list. - nodes_distances (dict): Maps node IDs to a tuple containing the + nodes_distances (dict): Maps node IDs to a tuple containing the distance and GPS coordinates. nodes_distances_lock (Locker): Async lock for the distance mapping. _verbose (bool): Flag to enable verbose logging for debugging. diff --git a/nebula/core/situationalawareness/discovery/candidateselection/fccandidateselector.py b/nebula/core/situationalawareness/discovery/candidateselection/fccandidateselector.py index 5e82db6b8..b0840b804 100644 --- a/nebula/core/situationalawareness/discovery/candidateselection/fccandidateselector.py +++ b/nebula/core/situationalawareness/discovery/candidateselection/fccandidateselector.py @@ -24,7 +24,7 @@ class FCCandidateSelector(CandidateSelector): Inherits from: CandidateSelector: Base class interface for candidate selection logic. """ - + def __init__(self): self.candidates = [] self.candidates_lock = Locker(name="candidates_lock") diff --git a/nebula/core/situationalawareness/discovery/candidateselection/ringcandidateselector.py b/nebula/core/situationalawareness/discovery/candidateselection/ringcandidateselector.py index 5a90df6c5..d1b3bb33c 100644 --- a/nebula/core/situationalawareness/discovery/candidateselection/ringcandidateselector.py +++ b/nebula/core/situationalawareness/discovery/candidateselection/ringcandidateselector.py @@ -27,7 +27,7 @@ class RINGCandidateSelector(CandidateSelector): Inherits from: CandidateSelector: Base interface for candidate selection strategies. 
""" - + def __init__(self): self._candidates = [] self._rejected_candidates = [] diff --git a/nebula/core/situationalawareness/discovery/candidateselection/stdcandidateselector.py b/nebula/core/situationalawareness/discovery/candidateselection/stdcandidateselector.py index fb20b59a7..bbb5fd7db 100644 --- a/nebula/core/situationalawareness/discovery/candidateselection/stdcandidateselector.py +++ b/nebula/core/situationalawareness/discovery/candidateselection/stdcandidateselector.py @@ -9,8 +9,8 @@ class STDandidateSelector(CandidateSelector): Candidate selector for scenarios without a predefined structural topology. In cases where the federation topology is not explicitly structured, - this selector chooses candidates based on the average number of neighbors - indicated in their offers. It selects approximately as many candidates as the + this selector chooses candidates based on the average number of neighbors + indicated in their offers. It selects approximately as many candidates as the average neighbor count, aiming to balance connectivity dynamically. Attributes: @@ -27,7 +27,7 @@ class STDandidateSelector(CandidateSelector): Inherits from: CandidateSelector: Base interface for candidate selection strategies. """ - + def __init__(self): self.candidates = [] self.candidates_lock = Locker(name="candidates_lock") diff --git a/nebula/core/situationalawareness/discovery/modelhandlers/defaultmodelhandler.py b/nebula/core/situationalawareness/discovery/modelhandlers/defaultmodelhandler.py index fa8aec8d4..bd16bae8b 100644 --- a/nebula/core/situationalawareness/discovery/modelhandlers/defaultmodelhandler.py +++ b/nebula/core/situationalawareness/discovery/modelhandlers/defaultmodelhandler.py @@ -7,14 +7,14 @@ class DefaultModelHandler(ModelHandler): """ Provides the initial default model. 
- This handler returns the baseline model with default weights, - typically used at the start of the federation or when no suitable + This handler returns the baseline model with default weights, + typically used at the start of the federation or when no suitable model offers have been received from peers. Inherits from: ModelHandler: Provides the base interface for model operations. """ - + def __init__(self): self.model = None self.rounds = 0 diff --git a/nebula/core/situationalawareness/discovery/modelhandlers/stdmodelhandler.py b/nebula/core/situationalawareness/discovery/modelhandlers/stdmodelhandler.py index 15975dee1..83506249e 100644 --- a/nebula/core/situationalawareness/discovery/modelhandlers/stdmodelhandler.py +++ b/nebula/core/situationalawareness/discovery/modelhandlers/stdmodelhandler.py @@ -4,7 +4,7 @@ class STDModelHandler(ModelHandler): """ - Handles the selection and acquisition of the most up-to-date model + Handles the selection and acquisition of the most up-to-date model during the discovery phase of the federation process. This handler choose the first model received. @@ -13,10 +13,10 @@ class STDModelHandler(ModelHandler): ModelHandler: Provides the base interface for model operations. Intended Use: - Used during the initial, when a node discovers others and must + Used during the initial, when a node discovers others and must align itself with the most recent global model state. """ - + def __init__(self): self.model = None self.rounds = 0 diff --git a/nebula/core/training/dp.py b/nebula/core/training/dp.py new file mode 100644 index 000000000..446511409 --- /dev/null +++ b/nebula/core/training/dp.py @@ -0,0 +1,101 @@ +# Opacus: User-Friendly Differential Privacy Library in PyTorch. Yousefpour et al. (2021). arXiv:2109.12298. +# Licensed under Apache License 2.0: https://github.com/meta-pytorch/opacus/blob/main/LICENSE + +class SimpleDPState: + # Minimal mutable state used to pass Opacus-wrapped objects between hooks. 
+ def __init__(self): + self.extras = {} + + +class DifferentialPrivacyPlugin: + name = "differential_privacy" + + def __init__( + self, + *, + noise_multiplier=1.0, + max_grad_norm=1.0, + target_delta=1e-5, + accountant="prv", + secure_mode=False, + poisson_sampling=True, + clipping="flat", + ): + # Fixed DP-SGD controls. Epsilon is not configured here; it is computed + # from the accountant as the consumed privacy budget after training. + self.noise_multiplier = float(noise_multiplier) + self.max_grad_norm = float(max_grad_norm) + self.target_delta = target_delta + self.accountant = accountant + self.secure_mode = bool(secure_mode) + self.poisson_sampling = bool(poisson_sampling) + self.clipping = clipping + self._privacy_engine = None + + def on_train_start(self, model, optimizer, state): + # Import Opacus lazily so non-DP trainers do not need to load it. + from opacus import PrivacyEngine + + dataloader = state.extras["dataloader"] + model.train() + + # Keep one PrivacyEngine per plugin instance so the accountant composes + # privacy loss across Nebula rounds instead of resetting every round. + if self._privacy_engine is None: + self._privacy_engine = PrivacyEngine( + accountant=self.accountant, + secure_mode=self.secure_mode, + ) + privacy_engine = self._privacy_engine + + private_model, private_optimizer, private_dataloader = privacy_engine.make_private( + module=model, + optimizer=optimizer, + data_loader=dataloader, + noise_multiplier=self.noise_multiplier, + max_grad_norm=self.max_grad_norm, + poisson_sampling=self.poisson_sampling, + clipping=self.clipping, + ) + + # Replace the training components with DP-aware versions used by LightningDP. + state.extras["privacy_engine"] = privacy_engine + state.extras["model"] = private_model + state.extras["optimizer"] = private_optimizer + state.extras["dataloader"] = private_dataloader + + def on_train_end(self, state): + # Query the accumulated epsilon for the configured delta after this round. 
+ privacy_engine = state.extras.get("privacy_engine") + private_model = state.extras.get("model") + + if privacy_engine is not None and self.target_delta is not None: + try: + epsilon = privacy_engine.get_epsilon(delta=self.target_delta) + state.extras["dp_epsilon"] = float(epsilon) + state.extras["dp_delta"] = float(self.target_delta) + except Exception: + pass + + if private_model is not None: + # Clean Opacus hook state so the same model can continue through later + # Nebula phases without stale per-sample gradient hooks. + try: + private_model.zero_grad(set_to_none=True) + except Exception: + pass + + try: + private_model.forbid_grad_accumulation() + except Exception: + pass + + try: + private_model.disable_hooks() + except Exception: + pass + + try: + private_model.remove_hooks() + except Exception: + pass diff --git a/nebula/core/training/lightning.py b/nebula/core/training/lightning.py index d83975147..b7551ea5d 100755 --- a/nebula/core/training/lightning.py +++ b/nebula/core/training/lightning.py @@ -19,7 +19,7 @@ from nebula.config.config import TRAINING_LOGGER from nebula.core.utils.deterministic import enable_deterministic from nebula.core.utils.nebulalogger_tensorboard import NebulaTensorBoardLogger -from nebula.core.nebulaevents import TestMetricsEvent +from nebula.core.nebulaevents import TestMetricsEvent, ValidationMetricsEvent from nebula.core.eventmanager import EventManager logging_training = logging.getLogger(TRAINING_LOGGER) @@ -295,8 +295,10 @@ async def train(self): try: self.create_trainer() logging.info(f"{'=' * 10} [Training] Started (check training logs for progress) {'=' * 10}") - await asyncio.to_thread(self._train_sync) + val_loss, val_accuracy, train_accuracy = await asyncio.to_thread(self._train_sync) logging.info(f"{'=' * 10} [Training] Finished (check training logs for progress) {'=' * 10}") + vme = ValidationMetricsEvent(val_loss, val_accuracy, train_accuracy) + await EventManager.get_instance().publish_addonevent(vme) except 
Exception as e: logging_training.error(f"Error training model: {e}") logging_training.error(traceback.format_exc()) @@ -304,37 +306,62 @@ async def train(self): def _train_sync(self): try: self._trainer.fit(self.model, self.datamodule) + validation_metrics = {} + if hasattr(self.model, "get_latest_validation_metrics"): + validation_metrics = self.model.get_latest_validation_metrics() or {} + + loss = None + model_loss = getattr(self.model, "get_loss", None) + if callable(model_loss): + raw_loss = model_loss() + loss = raw_loss.item() if hasattr(raw_loss, "item") else raw_loss + + accuracy = validation_metrics.get("Validation/Accuracy") + train_accuracy = None + get_train_accuracy = getattr(self.model, "get_latest_train_accuracy", None) + if callable(get_train_accuracy): + train_accuracy = get_train_accuracy() + + return loss, accuracy, train_accuracy except Exception as e: logging_training.error(f"Error in _train_sync: {e}") tb = traceback.format_exc() logging_training.error(f"Traceback: {tb}") # If "raise", the exception will be managed by the main thread + return None, None, None async def test(self): try: self.create_trainer() logging.info(f"{'=' * 10} [Testing] Started (check training logs for progress) {'=' * 10}") - loss, accuracy = await asyncio.to_thread(self._test_sync) + loss, accuracy, macro_f1 = await asyncio.to_thread(self._test_sync) logging.info(f"{'=' * 10} [Testing] Finished (check training logs for progress) {'=' * 10}") - tme = TestMetricsEvent(loss, accuracy) + tme = TestMetricsEvent(loss, accuracy, macro_f1) await EventManager.get_instance().publish_addonevent(tme) except Exception as e: logging_training.error(f"Error testing model: {e}") logging_training.error(traceback.format_exc()) + def _metric_value(self, value): + return value.item() if hasattr(value, "item") else value + def _test_sync(self): try: self._trainer.test(self.model, self.datamodule, verbose=True) metrics = self._trainer.callback_metrics - loss = 
metrics.get('val_loss/dataloader_idx_0', None).item() - accuracy = metrics.get('val_accuracy/dataloader_idx_0', None).item() - return loss, accuracy + loss = self._metric_value(metrics.get('val_loss/dataloader_idx_0')) + accuracy = self._metric_value(metrics.get('val_accuracy/dataloader_idx_0')) + macro_f1 = None + get_macro_f1 = getattr(self.model, "get_latest_test_macro_f1", None) + if callable(get_macro_f1): + macro_f1 = get_macro_f1() + + return loss, accuracy, macro_f1 except Exception as e: logging_training.error(f"Error in _test_sync: {e}") tb = traceback.format_exc() logging_training.error(f"Traceback: {tb}") - # If "raise", the exception will be managed by the main thread - return None, None + return None, None, None def cleanup(self): if self._trainer is not None: @@ -373,3 +400,11 @@ def update_model_learning_rate(self, new_lr): def show_current_learning_rate(self): self.model.show_current_learning_rate() + + def get_privacy_metrics(self): + # Non-DP trainers expose the same metrics contract with neutral values. + return { + "dp_enabled": False, + "dp_epsilon": 0, + "dp_delta": 0, + } diff --git a/nebula/core/training/lightning_dp.py b/nebula/core/training/lightning_dp.py new file mode 100644 index 000000000..6fde329f4 --- /dev/null +++ b/nebula/core/training/lightning_dp.py @@ -0,0 +1,178 @@ +import logging +import traceback + +import torch + +from nebula.config.config import TRAINING_LOGGER +from nebula.core.training.lightning import Lightning +from nebula.core.training.dp import DifferentialPrivacyPlugin, SimpleDPState + +logging_training = logging.getLogger(TRAINING_LOGGER) + + +class LightningDP(Lightning): + """ + Lightning-based trainer with Differential Privacy support. + + This class inherits the standard Nebula Lightning trainer. + """ + + def __init__(self, model, datamodule, config=None): + super().__init__(model, datamodule, config) + # The DP plugin owns the Opacus PrivacyEngine and its cumulative accountant. 
+ self._dp_plugin = self.create_dp_plugin() + self.dp_epsilon = None + self.dp_delta = None + + def create_dp_plugin(self): + # Translate Nebula participant config into the fixed DP-SGD controls used by Opacus. + dp_config = self.config.participant["training_args"].get("dp") + + if dp_config is None or not dp_config.get("enabled", False): + raise ValueError("LightningDP was selected, but Differential Privacy is not enabled in the configuration.") + + return DifferentialPrivacyPlugin( + noise_multiplier=dp_config["noise_multiplier"], + max_grad_norm=dp_config["max_grad_norm"], + target_delta=dp_config["target_delta"], + accountant=dp_config["accountant"], + secure_mode=dp_config["secure_mode"], + poisson_sampling=dp_config["poisson_sampling"], + clipping=dp_config["clipping"], + ) + + def _train_sync(self): + # Keep the public Lightning trainer contract: train once and return validation and training metrics. + try: + self._fit_with_dp() + + validation_metrics = {} + if hasattr(self.model, "get_latest_validation_metrics"): + validation_metrics = self.model.get_latest_validation_metrics() or {} + + loss = None + model_loss = getattr(self.model, "get_loss", None) + if callable(model_loss): + raw_loss = model_loss() + loss = raw_loss.item() if hasattr(raw_loss, "item") else raw_loss + + accuracy = validation_metrics.get("Validation/Accuracy") + train_accuracy = None + get_train_accuracy = getattr(self.model, "get_latest_train_accuracy", None) + if callable(get_train_accuracy): + train_accuracy = get_train_accuracy() + + return loss, accuracy, train_accuracy + + except Exception as e: + logging_training.error(f"Error in _train_sync with Differential Privacy: {e}") + tb = traceback.format_exc() + logging_training.error(f"Traceback: {tb}") + raise + + def _get_training_device(self): + # Resolve the effective device for any manual DP path that needs it. 
+ if ( + self.config.participant["device_args"]["accelerator"] == "gpu" + and torch.cuda.is_available() + and self.config.participant["device_args"]["gpu_id"] + ): + return torch.device(f"cuda:{self.config.participant['device_args']['gpu_id'][0]}") + + return torch.device("cpu") + + def _log_manual_metrics(self, phase, metrics): + # Log manually computed metrics using the same naming scheme as Lightning. + output = metrics.compute() + output = { + f"{phase}/{key.replace('Multiclass', '').split('/')[-1]}": value.detach() + for key, value in output.items() + } + + if phase == "Validation": + self.model._latest_validation_metrics = { + key: float(value.detach().cpu().item()) + for key, value in output.items() + } + + self._logger.log_data(output, step=self.model.global_number[phase]) + + def _fit_with_dp(self): + # Bridge Nebula's Lightning trainer with Opacus' private optimizer/dataloader. + state = SimpleDPState() + + if hasattr(self.model, "clear_optimizer_override"): + # Start from a clean optimizer so a previous round cannot leak into this fit. + self.model.clear_optimizer_override() + + try: + self.model.train() + self.datamodule.setup("fit") + train_dataloader = self.datamodule.train_dataloader() + val_dataloader = self.datamodule.val_dataloader() + + optimizer = self.model.configure_optimizers() + state.extras["dataloader"] = train_dataloader + + # Opacus wraps the model, optimizer and dataloader, and updates the accountant. + self._dp_plugin.on_train_start(self.model, optimizer, state) + + private_optimizer = state.extras["optimizer"] + private_dataloader = state.extras["dataloader"] + + if not hasattr(self.model, "set_optimizer_override"): + raise ValueError("DP training requires the model to support optimizer overrides.") + + # Opacus keeps the grad-sample hooks on self.model, while Lightning gets + # the original LightningModule and a DPOptimizer through configure_optimizers. 
+ self.model.dp_enabled = True + self.model.set_optimizer_override(private_optimizer) + # Lightning still drives the training loop; the injected optimizer/dataloader + # make the loop perform DP-SGD instead of standard SGD. + self._trainer.fit( + self.model, + train_dataloaders=private_dataloader, + val_dataloaders=val_dataloader, + ) + + self.model.train() + + finally: + # Always restore the model/trainer state, even if Lightning raises. + self.model.dp_enabled = False + if hasattr(self.model, "clear_optimizer_override"): + self.model.clear_optimizer_override() + self._dp_plugin.on_train_end(state) + self.datamodule.teardown("fit") + + dp_epsilon = state.extras.get("dp_epsilon") + + if dp_epsilon is not None: + # Store the accumulated privacy budget for logging and trustworthiness reports. + dp_delta = state.extras["dp_delta"] + + self.dp_epsilon = float(dp_epsilon) + self.dp_delta = float(dp_delta) + + self.model.dp_epsilon = self.dp_epsilon + self.model.dp_delta = self.dp_delta + + if self._logger is not None: + self._logger.log_data( + { + "DP/Epsilon": dp_epsilon, + "DP/Delta": dp_delta, + } + ) + + logging_training.info( + f"DP privacy budget | epsilon={dp_epsilon:.4f} | delta={dp_delta}" + ) + + def get_privacy_metrics(self): + # Trustworthiness consumes these values at experiment finish. + return { + "dp_enabled": True, + "dp_epsilon": self.dp_epsilon, + "dp_delta": self.dp_delta, + } diff --git a/nebula/core/utils/locker.py b/nebula/core/utils/locker.py index 160897bdc..2bd69d8f4 100755 --- a/nebula/core/utils/locker.py +++ b/nebula/core/utils/locker.py @@ -92,6 +92,7 @@ async def locked_async(self): result = self._lock.locked() if self._verbose: logging.debug(f"🔐 Async lock [{self._name}] is locked? 
{result}") + return result async def __aenter__(self): logging.debug(f"🔒 Acquiring async lock [{self._name}] using [async with] statement") diff --git a/nebula/frontend/config/participant.json.example b/nebula/frontend/config/participant.json.example index ca1d1cfd6..9d9552fa3 100755 --- a/nebula/frontend/config/participant.json.example +++ b/nebula/frontend/config/participant.json.example @@ -84,14 +84,45 @@ }, "training_args": { "trainer": "lightning", - "epochs": 3 + "epochs": 3, + "dp": { + "enabled": false, + "noise_multiplier": 1.0, + "max_grad_norm": 1.0, + "target_delta": 1e-5, + "accountant": "prv", + "secure_mode": false, + "poisson_sampling": true, + "clipping": "flat" + } }, "aggregator_args": { "algorithm": "FedAvg", - "aggregation_timeout": 60, + "aggregation_timeout": 240, "aggregation_push": "slow" }, "defense_args": { + "feature_squeezing": { + "enabled": false, + "bit_depth": 4, + "apply_to_train": true, + "apply_to_test": true, + "apply_to_local_test": true + }, + "adversarial_training": { + "enabled": false, + "domain": "image", + "attack": "fgsm", + "epsilon": 0.03, + "steps": 1, + "mode": "mixed", + "apply_probability": 0.3, + "candidate_selection": "none", + "target_loss_increase": null, + "max_loss_increase": null, + "target_margin": 0, + "max_margin": 0.5 + }, "reputation": { "enabled": false, "metrics": {}, @@ -148,6 +179,7 @@ }, "misc_args": { "grace_time_connection": 10, - "grace_time_start_federation": 10 + "grace_time_start_federation": 10, + "leadership_ack_timeout": 20 } } diff --git a/nebula/frontend/static/css/deployment.css b/nebula/frontend/static/css/deployment.css index 03fa0ab30..2fecffe9a 100644 --- a/nebula/frontend/static/css/deployment.css +++ b/nebula/frontend/static/css/deployment.css @@ -234,4 +234,4 @@ button[title]:hover::after { #predefined-topology-nodes:disabled{ background:#e9ecef; cursor:not-allowed; -} \ No newline at end of file +} diff --git a/nebula/frontend/static/js/deployment/adversarial-training.js 
b/nebula/frontend/static/js/deployment/adversarial-training.js new file mode 100644 index 000000000..be1f3aca5 --- /dev/null +++ b/nebula/frontend/static/js/deployment/adversarial-training.js @@ -0,0 +1,364 @@ +// Adversarial Training Module +const AdversarialTrainingManager = (function() { + const DEFAULT_ADVERSARIAL_TRAINING_CONFIG = { + enabled: false, + domain: "image", + attack: "fgsm", + epsilon: 0.03, + alpha: null, + steps: 1, + mode: "mixed", + apply_probability: 0.3, + log_adversarial_metrics: true, + candidate_selection: "none", + target_loss_increase: null, + max_loss_increase: null, + target_margin: 0, + max_margin: 0.5 + }; + + const IMAGE_DATASETS = new Set(["MNIST", "FashionMNIST", "EMNIST", "CIFAR10", "CIFAR100"]); + const TABULAR_ADVERSARIAL_DATASETS = new Set(["AdultCensus", "BreastCancer", "Covtype", "KDDCUP99"]); + const IMAGE_ATTACK_OPTIONS = [ + {value: "fgsm", label: "FGSM"}, + {value: "pgd", label: "PGD"} + ]; + const TABULAR_ATTACK_OPTIONS = [ + {value: "constrained_pgd", label: "Constrained PGD"} + ]; + + function initializeAdversarialTraining() { + setupAdversarialTrainingSwitch(); + setupAttackSelector(); + setupCandidateSelectionSelector(); + setupDatasetAwareness(); + setAdversarialTrainingConfig(DEFAULT_ADVERSARIAL_TRAINING_CONFIG); + } + + function setupAdversarialTrainingSwitch() { + const adversarialTrainingSwitch = document.getElementById("adversarialTrainingSwitch"); + if (!adversarialTrainingSwitch) return; + + adversarialTrainingSwitch.addEventListener("change", function() { + if (this.checked && window.DpManager) { + window.DpManager.setDpConfig({enabled: false}); + } + toggleAdversarialTrainingSettings(this.checked); + }); + } + + function setupAttackSelector() { + const attackSelect = document.getElementById("adversarialTrainingAttack"); + if (!attackSelect) return; + + attackSelect.addEventListener("change", function() { + toggleAttackSettings(this.value); + }); + } + + function setupCandidateSelectionSelector() { + const 
candidateSelectionSelect = document.getElementById("adversarialTrainingCandidateSelection"); + if (!candidateSelectionSelect) return; + + candidateSelectionSelect.addEventListener("change", function() { + toggleCandidateSelectionSettings(this.value); + }); + } + + function setupDatasetAwareness() { + const datasetSelect = document.getElementById("datasetSelect"); + if (!datasetSelect) return; + + datasetSelect.addEventListener("change", updateDatasetAvailability); + updateDatasetAvailability(); + } + + function toggleAdversarialTrainingSettings(enabled) { + const settings = document.getElementById("adversarial-training-settings"); + if (!settings) return; + + settings.style.display = enabled ? "block" : "none"; + toggleAttackSettings(document.getElementById("adversarialTrainingAttack")?.value || "fgsm"); + } + + function toggleAttackSettings(attack) { + const pgdSettings = document.getElementById("adversarial-training-pgd-settings"); + const stepsTitle = document.getElementById("adversarialTrainingStepsTitle"); + const candidateSelectionSettings = document.getElementById("adversarial-training-candidate-selection-settings"); + const lossWindowSettings = document.getElementById("adversarial-training-loss-window-settings"); + const marginWindowSettings = document.getElementById("adversarial-training-margin-window-settings"); + const domain = document.getElementById("adversarialTrainingDomain")?.value || DEFAULT_ADVERSARIAL_TRAINING_CONFIG.domain; + if (!pgdSettings) return; + + pgdSettings.style.display = ["pgd", "constrained_pgd"].includes(attack) ? "block" : "none"; + if (candidateSelectionSettings) { + candidateSelectionSettings.style.display = domain === "tabular" ? "block" : "none"; + } + if (stepsTitle) { + stepsTitle.textContent = domain === "tabular" ? 
"Constrained PGD steps" : "PGD steps"; + } + if (domain !== "tabular") { + if (lossWindowSettings) lossWindowSettings.style.display = "none"; + if (marginWindowSettings) marginWindowSettings.style.display = "none"; + return; + } + toggleCandidateSelectionSettings( + document.getElementById("adversarialTrainingCandidateSelection")?.value + || DEFAULT_ADVERSARIAL_TRAINING_CONFIG.candidate_selection + ); + } + + function toggleCandidateSelectionSettings(candidateSelection) { + const lossWindowSettings = document.getElementById("adversarial-training-loss-window-settings"); + const marginWindowSettings = document.getElementById("adversarial-training-margin-window-settings"); + if (lossWindowSettings) { + lossWindowSettings.style.display = candidateSelection === "loss_window" ? "block" : "none"; + } + if (marginWindowSettings) { + marginWindowSettings.style.display = candidateSelection === "margin_window" ? "block" : "none"; + } + } + + function updateDatasetAvailability() { + const dataset = document.getElementById("datasetSelect")?.value; + const domain = getDatasetDomain(dataset); + const adversarialTrainingSwitch = document.getElementById("adversarialTrainingSwitch"); + const datasetNote = document.getElementById("adversarial-training-dataset-note"); + const domainInput = document.getElementById("adversarialTrainingDomain"); + const settings = document.getElementById("adversarial-training-settings"); + + if (datasetNote) { + datasetNote.style.display = domain === "unsupported" ? "block" : "none"; + datasetNote.textContent = "Adversarial Training for tabular datasets currently supports AdultCensus, BreastCancer, Covtype, and KDDCUP99 with constrained PGD."; + } + if (domainInput) { + domainInput.value = domain === "unsupported" ? 
"tabular" : domain; + } + + if (!adversarialTrainingSwitch) return; + adversarialTrainingSwitch.disabled = domain === "unsupported"; + if (domain === "unsupported") { + adversarialTrainingSwitch.checked = false; + if (settings) { + settings.style.display = "none"; + } + return; + } + + adversarialTrainingSwitch.disabled = false; + refreshAttackOptions(domain); + toggleAdversarialTrainingSettings(adversarialTrainingSwitch.checked); + } + + function getDatasetDomain(dataset) { + if (IMAGE_DATASETS.has(dataset)) { + return "image"; + } + if (TABULAR_ADVERSARIAL_DATASETS.has(dataset)) { + return "tabular"; + } + return "unsupported"; + } + + function refreshAttackOptions(domain, preferredAttack = null) { + const attackSelect = document.getElementById("adversarialTrainingAttack"); + if (!attackSelect) return; + + // Tabular datasets intentionally expose only constrained PGD; image datasets expose FGSM/PGD. + const options = domain === "tabular" ? TABULAR_ATTACK_OPTIONS : IMAGE_ATTACK_OPTIONS; + const currentAttack = preferredAttack || attackSelect.value; + attackSelect.innerHTML = ""; + options.forEach(({value, label}) => { + const option = document.createElement("option"); + option.value = value; + option.textContent = label; + attackSelect.appendChild(option); + }); + + const validAttack = options.some(option => option.value === currentAttack) + ? currentAttack + : options[0].value; + attackSelect.value = validAttack; + attackSelect.disabled = domain === "tabular"; + toggleAttackSettings(validAttack); + } + + function numberValue(id, fallback) { + const value = parseFloat(document.getElementById(id)?.value); + return Number.isFinite(value) ? value : fallback; + } + + function integerValue(id, fallback) { + const value = parseInt(document.getElementById(id)?.value, 10); + return Number.isFinite(value) ? 
value : fallback; + } + + function optionalNumberValue(id, fallback) { + const rawValue = document.getElementById(id)?.value; + if (rawValue === undefined || rawValue === null || rawValue === "") { + return fallback; + } + const value = parseFloat(rawValue); + return Number.isFinite(value) ? value : fallback; + } + + function getAdversarialTrainingConfig() { + const domain = document.getElementById("adversarialTrainingDomain")?.value || DEFAULT_ADVERSARIAL_TRAINING_CONFIG.domain; + const attack = domain === "tabular" + ? "constrained_pgd" + : (document.getElementById("adversarialTrainingAttack")?.value || DEFAULT_ADVERSARIAL_TRAINING_CONFIG.attack); + const config = { + enabled: Boolean(document.getElementById("adversarialTrainingSwitch")?.checked), + domain, + attack, + epsilon: numberValue("adversarialTrainingEpsilon", DEFAULT_ADVERSARIAL_TRAINING_CONFIG.epsilon), + alpha: optionalNumberValue("adversarialTrainingAlpha", DEFAULT_ADVERSARIAL_TRAINING_CONFIG.alpha), + steps: integerValue("adversarialTrainingSteps", DEFAULT_ADVERSARIAL_TRAINING_CONFIG.steps), + mode: document.getElementById("adversarialTrainingMode")?.value || DEFAULT_ADVERSARIAL_TRAINING_CONFIG.mode, + apply_probability: numberValue("adversarialTrainingApplyProbability", DEFAULT_ADVERSARIAL_TRAINING_CONFIG.apply_probability), + candidate_selection: document.getElementById("adversarialTrainingCandidateSelection")?.value + || DEFAULT_ADVERSARIAL_TRAINING_CONFIG.candidate_selection, + target_loss_increase: optionalNumberValue( + "adversarialTrainingTargetLossIncrease", + DEFAULT_ADVERSARIAL_TRAINING_CONFIG.target_loss_increase + ), + max_loss_increase: optionalNumberValue( + "adversarialTrainingMaxLossIncrease", + DEFAULT_ADVERSARIAL_TRAINING_CONFIG.max_loss_increase + ), + target_margin: optionalNumberValue( + "adversarialTrainingTargetMargin", + DEFAULT_ADVERSARIAL_TRAINING_CONFIG.target_margin + ), + max_margin: optionalNumberValue( + "adversarialTrainingMaxMargin", + 
DEFAULT_ADVERSARIAL_TRAINING_CONFIG.max_margin + ), + log_adversarial_metrics: true + }; + + if (config.alpha === null || config.attack !== "pgd") { + delete config.alpha; + } + if (config.domain !== "tabular") { + delete config.candidate_selection; + } + if (config.candidate_selection !== "loss_window" || config.target_loss_increase === null) { + delete config.target_loss_increase; + } + if (config.candidate_selection !== "loss_window" || config.max_loss_increase === null) { + delete config.max_loss_increase; + } + if (config.candidate_selection !== "margin_window" || config.target_margin === null) { + delete config.target_margin; + } + if (config.candidate_selection !== "margin_window" || config.max_margin === null) { + delete config.max_margin; + } + return config; + } + + function setAdversarialTrainingConfig(config = DEFAULT_ADVERSARIAL_TRAINING_CONFIG) { + const adversarialTrainingConfig = { + ...DEFAULT_ADVERSARIAL_TRAINING_CONFIG, + ...(config || {}) + }; + + const adversarialTrainingSwitch = document.getElementById("adversarialTrainingSwitch"); + if (!adversarialTrainingSwitch) return; + + adversarialTrainingSwitch.checked = Boolean(adversarialTrainingConfig.enabled); + setValue("adversarialTrainingEpsilon", adversarialTrainingConfig.epsilon); + setValue("adversarialTrainingAlpha", adversarialTrainingConfig.alpha ?? ""); + setValue("adversarialTrainingSteps", adversarialTrainingConfig.steps); + setValue( + "adversarialTrainingMode", + ["mixed", "adversarial"].includes(adversarialTrainingConfig.mode) + ? adversarialTrainingConfig.mode + : DEFAULT_ADVERSARIAL_TRAINING_CONFIG.mode + ); + setValue("adversarialTrainingApplyProbability", adversarialTrainingConfig.apply_probability); + setValue( + "adversarialTrainingCandidateSelection", + ["none", "loss_window", "margin_window"].includes(adversarialTrainingConfig.candidate_selection) + ? 
adversarialTrainingConfig.candidate_selection + : DEFAULT_ADVERSARIAL_TRAINING_CONFIG.candidate_selection + ); + setValue("adversarialTrainingTargetLossIncrease", adversarialTrainingConfig.target_loss_increase ?? ""); + setValue("adversarialTrainingMaxLossIncrease", adversarialTrainingConfig.max_loss_increase ?? ""); + setValue("adversarialTrainingTargetMargin", adversarialTrainingConfig.target_margin ?? 0); + setValue("adversarialTrainingMaxMargin", adversarialTrainingConfig.max_margin ?? 0.5); + + updateDatasetAvailability(); + const domain = document.getElementById("adversarialTrainingDomain")?.value || adversarialTrainingConfig.domain; + refreshAttackOptions(domain, adversarialTrainingConfig.attack); + toggleAdversarialTrainingSettings(adversarialTrainingSwitch.checked); + } + + function setValue(id, value) { + const element = document.getElementById(id); + if (element) { + element.value = value; + } + } + + function resetAdversarialTrainingConfig() { + setAdversarialTrainingConfig(DEFAULT_ADVERSARIAL_TRAINING_CONFIG); + } + + function validateConfig() { + const config = getAdversarialTrainingConfig(); + if (!config.enabled) { + return null; + } + if (config.epsilon < 0) { + return "[Adversarial Training] Epsilon must be greater than or equal to 0."; + } + if (["pgd", "constrained_pgd"].includes(config.attack) && config.steps < 1) { + return "[Adversarial Training] Search steps must be at least 1."; + } + if (!["mixed", "adversarial"].includes(config.mode)) { + return "[Adversarial Training] Training mode must be Clean + adversarial or Adversarial only."; + } + if (config.apply_probability < 0 || config.apply_probability > 1) { + return "[Adversarial Training] Apply probability must be between 0 and 1."; + } + if ( + config.candidate_selection !== undefined + && !["none", "loss_window", "margin_window"].includes(config.candidate_selection) + ) { + return "[Adversarial Training] Candidate selection must be None, Loss window, or Margin window."; + } + if 
(config.target_loss_increase !== undefined && config.target_loss_increase < 0) { + return "[Adversarial Training] Target loss increase must be greater than or equal to 0."; + } + if (config.max_loss_increase !== undefined && config.max_loss_increase < 0) { + return "[Adversarial Training] Max loss increase must be greater than or equal to 0."; + } + if ( + config.target_loss_increase !== undefined + && config.max_loss_increase !== undefined + && config.target_loss_increase > config.max_loss_increase + ) { + return "[Adversarial Training] Target loss increase must be smaller than or equal to max loss increase."; + } + if ( + config.target_margin !== undefined + && config.max_margin !== undefined + && config.target_margin > config.max_margin + ) { + return "[Adversarial Training] Target margin must be smaller than or equal to max margin."; + } + return null; + } + + return { + initializeAdversarialTraining, + getAdversarialTrainingConfig, + setAdversarialTrainingConfig, + resetAdversarialTrainingConfig, + validateConfig + }; +})(); + +export default AdversarialTrainingManager; diff --git a/nebula/frontend/static/js/deployment/attack.js b/nebula/frontend/static/js/deployment/attack.js index ea04e1309..77de2f957 100644 --- a/nebula/frontend/static/js/deployment/attack.js +++ b/nebula/frontend/static/js/deployment/attack.js @@ -1,5 +1,6 @@ // Attack Configuration Module const AttackManager = (function() { + const IMAGE_DATASETS = new Set(["MNIST", "FashionMNIST", "EMNIST", "CIFAR10", "CIFAR100"]); const ATTACK_TYPES = { NO_ATTACK: 'No Attack', LABEL_FLIPPING: 'Label Flipping', @@ -86,13 +87,19 @@ const AttackManager = (function() { updateAttackUI(this.value); }); + const datasetSelect = document.getElementById("datasetSelect"); + if (datasetSelect) { + datasetSelect.addEventListener("change", updateDatasetAvailability); + updateDatasetAvailability(); + } + document.getElementById("targeted").addEventListener("change", function() { const attackType = 
document.getElementById("poisoning-attack-select").value; const elements = { targetLabel: {title: document.getElementById("target_label-title"), container: document.getElementById("target_label-container")}, targetChangedLabel: {title: document.getElementById("target_changed_label-title"), container: document.getElementById("target_changed_label-container")} }; - + if (this.checked && attackType === ATTACK_TYPES.LABEL_FLIPPING) { showElements(elements, ['targetLabel', 'targetChangedLabel']); } else if (this.checked && attackType === ATTACK_TYPES.SAMPLE_POISONING) { @@ -116,9 +123,47 @@ const AttackManager = (function() { }); } + function updateDatasetAvailability() { + const dataset = document.getElementById("datasetSelect")?.value; + const enabledForDataset = IMAGE_DATASETS.has(dataset); + const attackSelect = document.getElementById("poisoning-attack-select"); + const samplePoisoningOption = Array.from(attackSelect?.options || []) + .find(option => option.value === ATTACK_TYPES.SAMPLE_POISONING || option.textContent === ATTACK_TYPES.SAMPLE_POISONING); + const datasetNote = document.getElementById("sample-poisoning-dataset-note"); + + if (samplePoisoningOption) { + samplePoisoningOption.disabled = !enabledForDataset; + samplePoisoningOption.title = enabledForDataset ? "" : "Sample Poisoning is currently available only for image datasets."; + } + + if (datasetNote) { + datasetNote.style.display = enabledForDataset ? 
"none" : "block"; + } + + if (attackSelect?.value === ATTACK_TYPES.SAMPLE_POISONING && !enabledForDataset) { + attackSelect.value = ATTACK_TYPES.NO_ATTACK; + updateAttackUI(ATTACK_TYPES.NO_ATTACK); + } + } + + function validateConfig() { + const dataset = document.getElementById("datasetSelect")?.value; + const attackType = document.getElementById("poisoning-attack-select")?.value; + + if (attackType === ATTACK_TYPES.SAMPLE_POISONING && !IMAGE_DATASETS.has(dataset)) { + return "Sample Poisoning is currently available only for image datasets."; + } + + return null; + } + function getAttackConfig() { const attackType = document.getElementById("poisoning-attack-select").value; - + const validationMessage = validateConfig(); + if (validationMessage) { + throw new Error(validationMessage); + } + // Validate numeric inputs function validateNumericInput(id, min = 0, max = 100) { const value = parseFloat(document.getElementById(id).value); @@ -185,10 +230,14 @@ const AttackManager = (function() { function setAttackConfig(config) { if (!config) return; + const attackType = Array.isArray(config.attacks) + ? 
config.attacks[0] + : (config.type || config.attacks || ATTACK_TYPES.NO_ATTACK); // Set attack type and update UI - document.getElementById("poisoning-attack-select").value = config.type; - updateAttackUI(config.type); + document.getElementById("poisoning-attack-select").value = attackType; + updateAttackUI(attackType); + updateDatasetAvailability(); // Set common fields document.getElementById("poisoned-node-percent").value = config.poisoned_node_percent || 0; @@ -197,7 +246,7 @@ const AttackManager = (function() { document.getElementById("attack-interval").value = config.attack_interval || 1; // Set attack-specific fields - switch(config.type) { + switch(attackType) { case ATTACK_TYPES.LABEL_FLIPPING: document.getElementById("poisoned-sample-percent").value = config.poisoned_sample_percent || 0; document.getElementById("targeted").checked = config.targeted || false; @@ -243,12 +292,15 @@ const AttackManager = (function() { function resetAttackConfig() { document.getElementById("poisoning-attack-select").value = ATTACK_TYPES.NO_ATTACK; updateAttackUI(ATTACK_TYPES.NO_ATTACK); + updateDatasetAvailability(); } return { ATTACK_TYPES, initializeEventListeners, updateAttackUI, + updateDatasetAvailability, + validateConfig, getAttackConfig, setAttackConfig, resetAttackConfig diff --git a/nebula/frontend/static/js/deployment/dp.js b/nebula/frontend/static/js/deployment/dp.js new file mode 100644 index 000000000..9087152ad --- /dev/null +++ b/nebula/frontend/static/js/deployment/dp.js @@ -0,0 +1,98 @@ +// Differential Privacy Module +const DpManager = (function() { + const DEFAULT_DP_CONFIG = { + enabled: false, + noise_multiplier: 1.0, + max_grad_norm: 1.0 + }; + + function initializeDifferentialPrivacy() { + setupDpSwitch(); + setDpConfig(DEFAULT_DP_CONFIG); + } + + function setupDpSwitch() { + const dpSwitch = document.getElementById("dpSwitch"); + if (!dpSwitch) return; + + dpSwitch.addEventListener("change", function() { + if (this.checked) { + 
disableAdversarialTraining(); + } + toggleDpSettings(this.checked); + }); + } + + function disableAdversarialTraining() { + if (window.AdversarialTrainingManager) { + window.AdversarialTrainingManager.setAdversarialTrainingConfig({enabled: false}); + } + + const adversarialTrainingSwitch = document.getElementById("adversarialTrainingSwitch"); + const adversarialTrainingSettings = document.getElementById("adversarial-training-settings"); + if (adversarialTrainingSwitch) { + adversarialTrainingSwitch.checked = false; + } + if (adversarialTrainingSettings) { + adversarialTrainingSettings.style.display = "none"; + } + } + + function toggleDpSettings(enabled) { + const dpSettings = document.getElementById("dp-settings"); + if (!dpSettings) return; + + dpSettings.style.display = enabled ? "block" : "none"; + } + + function getDpConfig() { + const noiseMultiplierInput = document.getElementById("dpNoiseMultiplier"); + const noiseMultiplier = parseFloat(noiseMultiplierInput?.value); + const maxGradNormInput = document.getElementById("dpMaxGradNorm"); + const maxGradNorm = parseFloat(maxGradNormInput?.value); + + return { + enabled: Boolean(document.getElementById("dpSwitch")?.checked), + noise_multiplier: Number.isFinite(noiseMultiplier) + ? noiseMultiplier + : DEFAULT_DP_CONFIG.noise_multiplier, + max_grad_norm: Number.isFinite(maxGradNorm) + ? 
maxGradNorm + : DEFAULT_DP_CONFIG.max_grad_norm + }; + } + + function setDpConfig(config = DEFAULT_DP_CONFIG) { + const dpConfig = { + ...DEFAULT_DP_CONFIG, + ...(config || {}) + }; + + const dpSwitch = document.getElementById("dpSwitch"); + if (!dpSwitch) return; + + dpSwitch.checked = Boolean(dpConfig.enabled); + const noiseMultiplierInput = document.getElementById("dpNoiseMultiplier"); + if (noiseMultiplierInput) { + noiseMultiplierInput.value = dpConfig.noise_multiplier; + } + const maxGradNormInput = document.getElementById("dpMaxGradNorm"); + if (maxGradNormInput) { + maxGradNormInput.value = dpConfig.max_grad_norm; + } + toggleDpSettings(dpSwitch.checked); + } + + function resetDpConfig() { + setDpConfig(DEFAULT_DP_CONFIG); + } + + return { + initializeDifferentialPrivacy, + getDpConfig, + setDpConfig, + resetDpConfig + }; +})(); + +export default DpManager; diff --git a/nebula/frontend/static/js/deployment/feature-squeezing.js b/nebula/frontend/static/js/deployment/feature-squeezing.js new file mode 100644 index 000000000..cb205371d --- /dev/null +++ b/nebula/frontend/static/js/deployment/feature-squeezing.js @@ -0,0 +1,107 @@ +// Feature Squeezing Module +const FeatureSqueezingManager = (function() { + const DEFAULT_FEATURE_SQUEEZING_CONFIG = { + enabled: false, + bit_depth: 4 + }; + const ALLOWED_BIT_DEPTHS = [1, 2, 4, 8, 16, 32, 64]; + const IMAGE_DATASETS = new Set(["MNIST", "FashionMNIST", "EMNIST", "CIFAR10", "CIFAR100"]); + + function initializeFeatureSqueezing() { + setupFeatureSqueezingSwitch(); + setupDatasetAwareness(); + setFeatureSqueezingConfig(DEFAULT_FEATURE_SQUEEZING_CONFIG); + } + + function setupFeatureSqueezingSwitch() { + const featureSqueezingSwitch = document.getElementById("featureSqueezingSwitch"); + if (!featureSqueezingSwitch) return; + + featureSqueezingSwitch.addEventListener("change", function() { + toggleFeatureSqueezingSettings(this.checked); + }); + } + + function setupDatasetAwareness() { + const datasetSelect = 
document.getElementById("datasetSelect"); + if (!datasetSelect) return; + + datasetSelect.addEventListener("change", updateDatasetAvailability); + updateDatasetAvailability(); + } + + function toggleFeatureSqueezingSettings(enabled) { + const featureSqueezingSettings = document.getElementById("feature-squeezing-settings"); + if (!featureSqueezingSettings) return; + + featureSqueezingSettings.style.display = enabled ? "block" : "none"; + } + + function updateDatasetAvailability() { + const dataset = document.getElementById("datasetSelect")?.value; + const enabledForDataset = IMAGE_DATASETS.has(dataset); + const featureSqueezingSwitch = document.getElementById("featureSqueezingSwitch"); + const datasetNote = document.getElementById("feature-squeezing-dataset-note"); + + if (datasetNote) { + datasetNote.style.display = enabledForDataset ? "none" : "block"; + } + + if (!featureSqueezingSwitch) return; + featureSqueezingSwitch.disabled = !enabledForDataset; + if (!enabledForDataset) { + featureSqueezingSwitch.checked = false; + toggleFeatureSqueezingSettings(false); + } + } + + function getFeatureSqueezingConfig() { + const nInput = document.getElementById("featureSqueezingN"); + const bitDepth = parseInt(nInput?.value, 10); + + return { + enabled: Boolean(document.getElementById("featureSqueezingSwitch")?.checked), + bit_depth: normalizeBitDepth(bitDepth) + }; + } + + function setFeatureSqueezingConfig(config = DEFAULT_FEATURE_SQUEEZING_CONFIG) { + const featureSqueezingConfig = { + ...DEFAULT_FEATURE_SQUEEZING_CONFIG, + ...(config || {}) + }; + const bitDepth = featureSqueezingConfig.bit_depth ?? 
featureSqueezingConfig.n; + + const featureSqueezingSwitch = document.getElementById("featureSqueezingSwitch"); + if (!featureSqueezingSwitch) return; + + featureSqueezingSwitch.checked = Boolean(featureSqueezingConfig.enabled); + const nInput = document.getElementById("featureSqueezingN"); + if (nInput) { + nInput.value = normalizeBitDepth(bitDepth); + } + toggleFeatureSqueezingSettings(featureSqueezingSwitch.checked); + updateDatasetAvailability(); + } + + function normalizeBitDepth(value) { + const bitDepth = parseInt(value, 10); + if (ALLOWED_BIT_DEPTHS.includes(bitDepth)) { + return bitDepth; + } + return DEFAULT_FEATURE_SQUEEZING_CONFIG.bit_depth; + } + + function resetFeatureSqueezingConfig() { + setFeatureSqueezingConfig(DEFAULT_FEATURE_SQUEEZING_CONFIG); + } + + return { + initializeFeatureSqueezing, + getFeatureSqueezingConfig, + setFeatureSqueezingConfig, + resetFeatureSqueezingConfig + }; +})(); + +export default FeatureSqueezingManager; diff --git a/nebula/frontend/static/js/deployment/help-content.js b/nebula/frontend/static/js/deployment/help-content.js index 673cae881..6e9b1f1ff 100644 --- a/nebula/frontend/static/js/deployment/help-content.js +++ b/nebula/frontend/static/js/deployment/help-content.js @@ -61,6 +61,10 @@ const HelpContent = (function() {
  • MNIST: The MNIST dataset
  • FashionMNIST: The FashionMNIST dataset
  • CIFAR10: The CIFAR10 dataset
  • + Covtype: The Covtype dataset
  • + KDDCUP99: The KDDCUP99 dataset
  • + AdultCensus: The AdultCensus dataset
  • + BreastCancer: The BreastCancer dataset
  • `; diff --git a/nebula/frontend/static/js/deployment/main.js b/nebula/frontend/static/js/deployment/main.js index 3ec18a8ba..de5b3d0e5 100644 --- a/nebula/frontend/static/js/deployment/main.js +++ b/nebula/frontend/static/js/deployment/main.js @@ -8,6 +8,9 @@ import SaManager from './situational-awareness.js'; import GraphSettings from './graph-settings.js'; import Utils from './utils.js'; import TrustworthinessManager from './trustworthiness.js'; +import DpManager from './dp.js'; +import FeatureSqueezingManager from './feature-squeezing.js'; +import AdversarialTrainingManager from './adversarial-training.js'; const DeploymentManager = (function() { function initialize() { @@ -31,6 +34,9 @@ const DeploymentManager = (function() { ReputationManager.initializeReputationSystem(); SaManager.initializeSa(); TrustworthinessManager.initializeTrustworthinessSystem(); + DpManager.initializeDifferentialPrivacy(); + FeatureSqueezingManager.initializeFeatureSqueezing(); + AdversarialTrainingManager.initializeAdversarialTraining(); GraphSettings.initializeDistanceControls(); // Make modules globally available @@ -41,6 +47,9 @@ const DeploymentManager = (function() { window.ReputationManager = ReputationManager; window.SaManager = SaManager; window.TrustworthinessManager = TrustworthinessManager; + window.DpManager = DpManager; + window.FeatureSqueezingManager = FeatureSqueezingManager; + window.AdversarialTrainingManager = AdversarialTrainingManager; window.GraphSettings = GraphSettings; window.DeploymentManager = DeploymentManager; window.Utils = Utils; @@ -111,9 +120,82 @@ const DeploymentManager = (function() { return false; } + const trustWeightsValidationMessage = validateTrustworthinessWeights(); + if (trustWeightsValidationMessage) { + Utils.showAlert('error', trustWeightsValidationMessage); + return false; + } + + const adversarialTrainingValidationMessage = validateAdversarialTraining(); + if (adversarialTrainingValidationMessage) { + Utils.showAlert('error', 
adversarialTrainingValidationMessage); + return false; + } + + const attackValidationMessage = validateAttack(); + if (attackValidationMessage) { + Utils.showAlert('error', attackValidationMessage); + return false; + } + return true; } + function validateAttack() { + const manager = window.AttackManager || AttackManager; + if (manager && typeof manager.validateConfig === "function") { + return manager.validateConfig(); + } + return null; + } + + function validateAdversarialTraining() { + const manager = window.AdversarialTrainingManager || AdversarialTrainingManager; + if (manager && typeof manager.validateConfig === "function") { + return manager.validateConfig(); + } + return null; + } + + function validateTrustworthinessWeights() { + const manager = window.TrustworthinessManager || TrustworthinessManager; + if (manager && typeof manager.validateWeights === "function") { + return manager.validateWeights(); + } + + if (!manager || typeof manager.getTrustworthinessConfig !== "function") { + return null; + } + + const config = manager.getTrustworthinessConfig(); + if (!config?.enabled) { + return null; + } + + const sumValues = (values) => values.reduce((sum, value) => sum + (parseFloat(value) || 0), 0); + const getWeightValidationMessage = (groupLabel, total) => { + if (total > 100) { + return `[Trustworthiness] ${groupLabel} weights exceed 100%. Please review the configuration.`; + } + + if (total < 100) { + return `[Trustworthiness] ${groupLabel} weights are below 100%. 
Please review the configuration.`; + } + + return null; + }; + + return ( + getWeightValidationMessage("Pillars", sumValues(Object.values(config.pillars || {}))) || + Object.entries(config.notions || {}).reduce((message, [groupName, weights]) => { + if (message) return message; + + const label = `${groupName.charAt(0).toUpperCase()}${groupName.slice(1)} notions`; + return getWeightValidationMessage(label, sumValues(weights || [])); + }, null) + ); + } + function setupDatasetListeners() { const datasetSelect = document.getElementById("datasetSelect"); if (datasetSelect) { @@ -210,7 +292,7 @@ const DeploymentManager = (function() { datasetSelect.innerHTML = ""; // Add dataset options - const datasets = ['MNIST', 'FashionMNIST', 'EMNIST', 'CIFAR10', 'CIFAR100']; + const datasets = ['MNIST', 'FashionMNIST', 'EMNIST', 'CIFAR10', 'CIFAR100', 'Covtype', 'KDDCUP99', 'AdultCensus', 'BreastCancer']; datasets.forEach(dataset => { const option = document.createElement("option"); option.value = dataset; @@ -251,6 +333,14 @@ const DeploymentManager = (function() { return ['CNN', 'ResNet9', 'fastermobilenet', 'simplemobilenet', 'CNNv2', 'CNNv3']; case 'cifar100': return ['CNN']; + case 'covtype': + return ['MLP']; + case 'kddcup99': + return ['MLP']; + case 'adultcensus': + return ['MLP']; + case 'breast_cancer': + return ['MLP']; default: return ['MLP', 'CNN']; } diff --git a/nebula/frontend/static/js/deployment/scenario.js b/nebula/frontend/static/js/deployment/scenario.js index feb5c978b..890804430 100644 --- a/nebula/frontend/static/js/deployment/scenario.js +++ b/nebula/frontend/static/js/deployment/scenario.js @@ -74,6 +74,9 @@ const ScenarioManager = (function () { logginglevel: document.getElementById("loggingLevel").value === "true", report_status_data_queue: document.getElementById("reportingSwitch").checked, epochs: parseInt(document.getElementById("epochs").value), + dp: window.DpManager.getDpConfig(), + feature_squeezing: 
window.FeatureSqueezingManager.getFeatureSqueezingConfig(), + adversarial_training: window.AdversarialTrainingManager.getAdversarialTrainingConfig(), attack_params: attackConfig, reputation: { enabled: window.ReputationManager.getReputationConfig().enabled || false, @@ -100,31 +103,121 @@ const ScenarioManager = (function () { sar_training: window.SaManager.getSaConfig().sar_training || false, sar_training_policy: window.SaManager.getSaConfig().sar_training_policy || "Broad-Propagation Strategy", random_topology_probability: document.getElementById("random-probability").value || 0.5, + // --- Trustworthiness (CFL/DFL) --- with_trustworthiness: document.getElementById("TrustworthinessSwitch").checked ? true : false, - robustness_pillar: document.getElementById("robustness-pillar").value, - resilience_to_attacks: document.getElementById("robustness-notion-1").value, - algorithm_robustness: document.getElementById("robustness-notion-2").value, - client_reliability: document.getElementById("robustness-notion-3").value, - privacy_pillar: document.getElementById("privacy-pillar").value, - technique: document.getElementById("privacy-notion-1").value, - uncertainty: document.getElementById("privacy-notion-2").value, - indistinguishability: document.getElementById("privacy-notion-3").value, - fairness_pillar: document.getElementById("fairness-pillar").value, - selection_fairness: document.getElementById("fairness-notion-1").value, - performance_fairness: document.getElementById("fairness-notion-2").value, - class_distribution: document.getElementById("fairness-notion-3").value, - explainability_pillar: document.getElementById("explainability-pillar").value, - interpretability: document.getElementById("explainability-notion-1").value, - post_hoc_methods: document.getElementById("explainability-notion-2").value, - accountability_pillar: document.getElementById("accountability-pillar").value, - factsheet_completeness: document.getElementById("accountability-notion-1").value, - 
architectural_soundness_pillar: document.getElementById("architectural-soundness-pillar").value, - client_management: document.getElementById("architectural-soundness-notion-1").value, - optimization: document.getElementById("architectural-soundness-notion-2").value, - sustainability_pillar: document.getElementById("sustainability-pillar").value, - energy_source: document.getElementById("sustainability-notion-1").value, - hardware_efficiency: document.getElementById("sustainability-notion-2").value, - federation_complexity: document.getElementById("sustainability-notion-3").value, + + ...(document.getElementById("TrustworthinessSwitch").checked + ? (() => { + const federationType = document.getElementById("federationArchitecture").value; + const useDFL = (federationType === "DFL" || federationType === "SDFL"); + + if (useDFL) { + return { + robustness_pillar: document.getElementById("dfl-robustness-pillar")?.value || "0", + resilience_to_attacks: document.getElementById("dfl-robustness-notion-1")?.value || "0", + algorithm_robustness: document.getElementById("dfl-robustness-notion-2")?.value || "0", + client_reliability: document.getElementById("dfl-robustness-notion-3")?.value || "0", + + privacy_pillar: document.getElementById("dfl-privacy-pillar")?.value || "0", + technique: document.getElementById("dfl-privacy-notion-1")?.value || "0", + uncertainty: document.getElementById("dfl-privacy-notion-2")?.value || "0", + indistinguishability: document.getElementById("dfl-privacy-notion-3")?.value || "0", + + fairness_pillar: document.getElementById("dfl-fairness-pillar")?.value || "0", + + selection_fairness: "0", + performance_fairness: "0", + class_distribution: document.getElementById("dfl-fairness-notion-3")?.value || "0", + outcome_fairness: document.getElementById("dfl-fairness-notion-4")?.value || "0", + + explainability_pillar: document.getElementById("dfl-explainability-pillar")?.value || "0", + interpretability: 
document.getElementById("dfl-explainability-notion-1")?.value || "0", + post_hoc_methods: document.getElementById("dfl-explainability-notion-2")?.value || "0", + + accountability_pillar: document.getElementById("dfl-accountability-pillar")?.value || "0", + factsheet_completeness: document.getElementById("dfl-accountability-notion-1")?.value || "0", + monitoring: document.getElementById("dfl-accountability-notion-2")?.value || "0", + + architectural_soundness_pillar: document.getElementById("dfl-architectural-soundness-pillar")?.value || "0", + client_management: document.getElementById("dfl-architectural-soundness-notion-1")?.value || "0", + optimization: document.getElementById("dfl-architectural-soundness-notion-2")?.value || "0", + federation_management: document.getElementById("dfl-architectural-soundness-notion-3")?.value || "0", + + sustainability_pillar: document.getElementById("dfl-sustainability-pillar")?.value || "0", + energy_source: document.getElementById("dfl-sustainability-notion-1")?.value || "0", + hardware_efficiency: "0", + federation_complexity: document.getElementById("dfl-sustainability-notion-3")?.value || "0", + }; + } + + // CFL + return { + robustness_pillar: document.getElementById("cfl-robustness-pillar")?.value || "0", + resilience_to_attacks: document.getElementById("cfl-robustness-notion-1")?.value || "0", + algorithm_robustness: document.getElementById("cfl-robustness-notion-2")?.value || "0", + client_reliability: document.getElementById("cfl-robustness-notion-3")?.value || "0", + + privacy_pillar: document.getElementById("cfl-privacy-pillar")?.value || "0", + technique: document.getElementById("cfl-privacy-notion-1")?.value || "0", + uncertainty: document.getElementById("cfl-privacy-notion-2")?.value || "0", + indistinguishability: document.getElementById("cfl-privacy-notion-3")?.value || "0", + + fairness_pillar: document.getElementById("cfl-fairness-pillar")?.value || "0", + selection_fairness: 
document.getElementById("cfl-fairness-notion-1")?.value || "0", + performance_fairness: document.getElementById("cfl-fairness-notion-2")?.value || "0", + class_distribution: document.getElementById("cfl-fairness-notion-3")?.value || "0", + outcome_fairness: document.getElementById("cfl-fairness-notion-4")?.value || "0", + + explainability_pillar: document.getElementById("cfl-explainability-pillar")?.value || "0", + interpretability: document.getElementById("cfl-explainability-notion-1")?.value || "0", + post_hoc_methods: document.getElementById("cfl-explainability-notion-2")?.value || "0", + + accountability_pillar: document.getElementById("cfl-accountability-pillar")?.value || "0", + factsheet_completeness: document.getElementById("cfl-accountability-notion-1")?.value || "0", + monitoring: document.getElementById("cfl-accountability-notion-2")?.value || "0", + + + architectural_soundness_pillar: document.getElementById("cfl-architectural-soundness-pillar")?.value || "0", + client_management: document.getElementById("cfl-architectural-soundness-notion-1")?.value || "0", + optimization: document.getElementById("cfl-architectural-soundness-notion-2")?.value || "0", + federation_management: document.getElementById("cfl-architectural-soundness-notion-3")?.value || "0", + + sustainability_pillar: document.getElementById("cfl-sustainability-pillar")?.value || "0", + energy_source: document.getElementById("cfl-sustainability-notion-1")?.value || "0", + hardware_efficiency: document.getElementById("cfl-sustainability-notion-2")?.value || "0", + federation_complexity: document.getElementById("cfl-sustainability-notion-3")?.value || "0", + }; + })() + : { + robustness_pillar: "0", + resilience_to_attacks: "0", + algorithm_robustness: "0", + client_reliability: "0", + privacy_pillar: "0", + technique: "0", + uncertainty: "0", + indistinguishability: "0", + fairness_pillar: "0", + selection_fairness: "0", + performance_fairness: "0", + class_distribution: "0", + 
outcome_fairness: "0", + explainability_pillar: "0", + interpretability: "0", + post_hoc_methods: "0", + accountability_pillar: "0", + factsheet_completeness: "0", + monitoring: "0", + architectural_soundness_pillar: "0", + client_management: "0", + optimization: "0", + federation_management: "0", + sustainability_pillar: "0", + energy_source: "0", + hardware_efficiency: "0", + federation_complexity: "0", + }), + // --- /Trustworthiness --- network_subnet: "172.20.0.0/16", network_gateway: "172.20.0.1", additional_participants: window.MobilityManager.getMobilityConfig().additionalParticipants || [], @@ -173,6 +266,15 @@ const ScenarioManager = (function () { document.getElementById("loggingLevel").value = scenario.logginglevel ? "true" : "false"; document.getElementById("reportingSwitch").checked = scenario.report_status_data_queue; document.getElementById("epochs").value = scenario.epochs; + if (window.DpManager) { + window.DpManager.setDpConfig(scenario.dp); + } + if (window.FeatureSqueezingManager) { + window.FeatureSqueezingManager.setFeatureSqueezingConfig(scenario.feature_squeezing); + } + if (window.AdversarialTrainingManager) { + window.AdversarialTrainingManager.setAdversarialTrainingConfig(scenario.adversarial_training); + } // Load module configurations if (scenario.attacks && scenario.attacks.length > 0) { @@ -346,6 +448,15 @@ const ScenarioManager = (function () { if (window.SaManager) { window.SaManager.resetSaConfig(); } + if (window.DpManager) { + window.DpManager.resetDpConfig(); + } + if (window.FeatureSqueezingManager) { + window.FeatureSqueezingManager.resetFeatureSqueezingConfig(); + } + if (window.AdversarialTrainingManager) { + window.AdversarialTrainingManager.resetAdversarialTrainingConfig(); + } // Trigger necessary events document.getElementById("federationArchitecture").dispatchEvent(new Event('change')); diff --git a/nebula/frontend/static/js/deployment/topology.js b/nebula/frontend/static/js/deployment/topology.js index 
28d5ec0ad..5b6a99ce9 100644 --- a/nebula/frontend/static/js/deployment/topology.js +++ b/nebula/frontend/static/js/deployment/topology.js @@ -533,7 +533,7 @@ const TopologyManager = (function() { function updateIPsAndPorts() { const isPhysical = document.getElementById("physical-devices-radio").checked; - + /* ⬅︎ if physical deployment get default IPs */ if (isPhysical) { gData.nodes.forEach((node, idx) => { @@ -541,11 +541,11 @@ const TopologyManager = (function() { }); return; } - + /* Docker or Process → generate sintetic IPs */ const isProcess = document.getElementById("process-radio").checked; const baseIP = "192.168.50"; - + gData.nodes.forEach((node, idx) => { node.ip = isProcess ? "127.0.0.1" : `${baseIP}.${idx + 2}`; node.port = (45001 + idx).toString(); @@ -572,23 +572,23 @@ const TopologyManager = (function() { function setPhysicalIPs(ipList = []) { if (!ipList.length) return; - + /* 1. Update input for the user */ const nodesInput = document.getElementById('predefined-topology-nodes'); if (nodesInput) { nodesInput.value = ipList.length; - nodesInput.disabled = true; - nodesInput.classList.add('disabled'); + nodesInput.disabled = true; + nodesInput.classList.add('disabled'); } - + /* 2. Regenerate topology */ generatePredefinedTopology(); // ← create Nodes and Links - + /* 3. 
Assign IPs */ gData.nodes.forEach((n, idx) => { n.ip = ipList[idx] || n.ip; // if more nodes than IPs }); - + updateGraph(); // redraw } @@ -660,7 +660,7 @@ const TopologyManager = (function() { generatePredefinedTopology(); return; } - + // Ensure each node has the required properties data.nodes = data.nodes.map(node => ({ id: node.id, @@ -670,13 +670,13 @@ const TopologyManager = (function() { neighbors: node.neighbors || [], links: node.links || [] })); - + // Ensure each link has the required properties data.links = data.links.map(link => ({ source: link.source, target: link.target })); - + gData = data; updateGraph(); }, @@ -690,7 +690,7 @@ const TopologyManager = (function() { nodes: [], links: [] }; - // Update graph + // Update graph if (Graph) { Graph.graphData(gData); } diff --git a/nebula/frontend/static/js/deployment/trustworthiness.js b/nebula/frontend/static/js/deployment/trustworthiness.js index 7ba4d3f43..25fe3a20f 100644 --- a/nebula/frontend/static/js/deployment/trustworthiness.js +++ b/nebula/frontend/static/js/deployment/trustworthiness.js @@ -1,275 +1,558 @@ // Trustworthiness System Module const TrustworthinessManager = (function() { + function isTrustworthinessEnabled() { + const sw = document.getElementById("TrustworthinessSwitch"); + return Boolean(sw?.checked); + } + function initializeTrustworthinessSystem() { setupTrustworthinessSwitch(); + setupTrustworthinessFederationSwitch(); setupWeightValidation(); } - + + function isDFL() { + const ft = document.getElementById("federationArchitecture")?.value || "CFL"; + return (ft === "DFL" || ft === "SDFL"); + } + + function showTrustworthinessWeightsBlock() { + const cflBlock = document.getElementById("tw-cfl"); + const dflBlock = document.getElementById("tw-dfl"); + if (!cflBlock || !dflBlock) return; + + const use = isDFL(); + cflBlock.style.display = use ? "none" : "block"; + dflBlock.style.display = use ? 
"block" : "none"; + } + function setupTrustworthinessSwitch() { - document.getElementById("TrustworthinessSwitch").addEventListener("change", function() { + const sw = document.getElementById("TrustworthinessSwitch"); + if (!sw) return; + + sw.addEventListener("change", function() { const trustworthinessOptionsDiv = document.getElementById("trustworthiness-options"); - - if(this.checked){ - document.getElementById("federationArchitecture").value = "CFL"; - document.getElementById("federationArchitecture").dispatchEvent(new Event('change')); - document.getElementById("federationArchitecture").disabled = true; - trustworthinessOptionsDiv.style.display = "block" + if (!trustworthinessOptionsDiv) return; + + if (this.checked) { + trustworthinessOptionsDiv.style.display = "block"; + showTrustworthinessWeightsBlock(); + validateWeights(); } else { - document.getElementById("federationArchitecture").disabled = false; trustworthinessOptionsDiv.style.display = "none"; } }); } - + + function setupTrustworthinessFederationSwitch() { + const fed = document.getElementById("federationArchitecture"); + if (!fed) return; + + fed.addEventListener("change", function() { + const trustworthinessOptionsDiv = document.getElementById("trustworthiness-options"); + if (trustworthinessOptionsDiv?.style.display === "block") { + showTrustworthinessWeightsBlock(); + validateWeights(); + } + }); + } + function setupWeightValidation() { - const pillarIds = [ - "robustness-pillar", - "privacy-pillar", - "fairness-pillar", - "explainability-pillar", - "accountability-pillar", - "architectural-soundness-pillar", - "sustainability-pillar" + // IDs CFL + const cflPillarIds = [ + "cfl-robustness-pillar", + "cfl-privacy-pillar", + "cfl-fairness-pillar", + "cfl-explainability-pillar", + "cfl-accountability-pillar", + "cfl-architectural-soundness-pillar", + "cfl-sustainability-pillar" + ]; + const cflNotionIds = [ + "cfl-robustness-notion-1", + "cfl-robustness-notion-2", + "cfl-robustness-notion-3", + 
"cfl-privacy-notion-1", + "cfl-privacy-notion-2", + "cfl-privacy-notion-3", + "cfl-fairness-notion-1", + "cfl-fairness-notion-2", + "cfl-fairness-notion-3", + "cfl-fairness-notion-4", + "cfl-explainability-notion-1", + "cfl-explainability-notion-2", + "cfl-accountability-notion-1", + "cfl-accountability-notion-2", + "cfl-architectural-soundness-notion-1", + "cfl-architectural-soundness-notion-2", + "cfl-architectural-soundness-notion-3", + "cfl-sustainability-notion-1", + "cfl-sustainability-notion-2", + "cfl-sustainability-notion-3" ]; - const notionIds = [ - "robustness-notion-1", - "robustness-notion-2", - "robustness-notion-3", - "privacy-notion-1", - "privacy-notion-2", - "privacy-notion-3", - "fairness-notion-1", - "fairness-notion-2", - "fairness-notion-3", - "explainability-notion-1", - "explainability-notion-2", - "architectural-soundness-notion-1", - "architectural-soundness-notion-2", - "sustainability-notion-1", - "sustainability-notion-2", - "sustainability-notion-3" + + const dflPillarIds = [ + "dfl-robustness-pillar", + "dfl-privacy-pillar", + "dfl-fairness-pillar", + "dfl-explainability-pillar", + "dfl-accountability-pillar", + "dfl-architectural-soundness-pillar", + "dfl-sustainability-pillar" ]; - - pillarIds.concat(notionIds).forEach(id => { + const dflNotionIds = [ + "dfl-robustness-notion-1", + "dfl-robustness-notion-2", + "dfl-robustness-notion-3", + "dfl-privacy-notion-1", + "dfl-privacy-notion-2", + "dfl-privacy-notion-3", + "dfl-fairness-notion-3", + "dfl-fairness-notion-4", + "dfl-explainability-notion-1", + "dfl-explainability-notion-2", + "dfl-accountability-notion-1", + "dfl-accountability-notion-2", + "dfl-architectural-soundness-notion-1", + "dfl-architectural-soundness-notion-2", + "dfl-architectural-soundness-notion-3", + "dfl-sustainability-notion-1", + "dfl-sustainability-notion-3" + ]; + + cflPillarIds.concat(cflNotionIds, dflPillarIds, dflNotionIds).forEach(id => { const input = document.getElementById(id); - if (input) { - 
input.addEventListener("input", validateWeights); - } + if (input) input.addEventListener("input", validateWeights); }); } - + function validateWeights() { - const robustnessPercent = parseFloat(document.getElementById("robustness-pillar").value) || 0; - const privacyPercent = parseFloat(document.getElementById("privacy-pillar").value) || 0; - const fairnessPercent = parseFloat(document.getElementById("fairness-pillar").value) || 0; - const explainabilityPercent = parseFloat(document.getElementById("explainability-pillar").value) || 0; - const accountabilityPercent = parseFloat(document.getElementById("accountability-pillar").value) || 0; - const architecturalSoundnessPercent = parseFloat(document.getElementById("architectural-soundness-pillar").value) || 0; - const sustainabilityPercent = parseFloat(document.getElementById("sustainability-pillar").value) || 0; - - const robustnessNotion1 = parseFloat(document.getElementById("robustness-notion-1").value) || 0; - const robustnessNotion2 = parseFloat(document.getElementById("robustness-notion-2").value) || 0; - const robustnessNotion3 = parseFloat(document.getElementById("robustness-notion-3").value) || 0; - const privacyNotion1 = parseFloat(document.getElementById("privacy-notion-1").value) || 0; - const privacyNotion2 = parseFloat(document.getElementById("privacy-notion-2").value) || 0; - const privacyNotion3 = parseFloat(document.getElementById("privacy-notion-3").value) || 0; - const fairnessNotion1 = parseFloat(document.getElementById("fairness-notion-1").value) || 0; - const fairnessNotion2 = parseFloat(document.getElementById("fairness-notion-2").value) || 0; - const fairnessNotion3 = parseFloat(document.getElementById("fairness-notion-3").value) || 0; - const explainabilityNotion1 = parseFloat(document.getElementById("explainability-notion-1").value) || 0; - const explainabilityNotion2 = parseFloat(document.getElementById("explainability-notion-2").value) || 0; - const architecturalSoundnessNotion1 = 
parseFloat(document.getElementById("architectural-soundness-notion-1").value) || 0; - const architecturalSoundnessNotion2 = parseFloat(document.getElementById("architectural-soundness-notion-2").value) || 0; - const sustainabilityNotion1 = parseFloat(document.getElementById("sustainability-notion-1").value) || 0; - const sustainabilityNotion2 = parseFloat(document.getElementById("sustainability-notion-2").value) || 0; - const sustainabilityNotion3 = parseFloat(document.getElementById("sustainability-notion-3").value) || 0; - + if (!isTrustworthinessEnabled()) { + return null; + } + + if (isDFL()) { + return validateWeightsDFL(); + } + return validateWeightsCFL(); + } + + function getWeightValidationMessage(groupLabel, total) { + if (total > 100) { + return `[Trustworthiness] ${groupLabel} weights exceed 100%. Please review the configuration.`; + } + + if (total < 100) { + return `[Trustworthiness] ${groupLabel} weights are below 100%. Please review the configuration.`; + } + + return null; + } + + function validateWeightsCFL() { + const robustnessPercent = parseFloat(document.getElementById("cfl-robustness-pillar").value) || 0; + const privacyPercent = parseFloat(document.getElementById("cfl-privacy-pillar").value) || 0; + const fairnessPercent = parseFloat(document.getElementById("cfl-fairness-pillar").value) || 0; + const explainabilityPercent = parseFloat(document.getElementById("cfl-explainability-pillar").value) || 0; + const accountabilityPercent = parseFloat(document.getElementById("cfl-accountability-pillar").value) || 0; + const architecturalSoundnessPercent = parseFloat(document.getElementById("cfl-architectural-soundness-pillar").value) || 0; + const sustainabilityPercent = parseFloat(document.getElementById("cfl-sustainability-pillar").value) || 0; + + const robustnessNotion1 = parseFloat(document.getElementById("cfl-robustness-notion-1").value) || 0; + const robustnessNotion2 = parseFloat(document.getElementById("cfl-robustness-notion-2").value) || 0; 
+ const robustnessNotion3 = parseFloat(document.getElementById("cfl-robustness-notion-3").value) || 0; + + const privacyNotion1 = parseFloat(document.getElementById("cfl-privacy-notion-1").value) || 0; + const privacyNotion2 = parseFloat(document.getElementById("cfl-privacy-notion-2").value) || 0; + const privacyNotion3 = parseFloat(document.getElementById("cfl-privacy-notion-3").value) || 0; + + const fairnessNotion1 = parseFloat(document.getElementById("cfl-fairness-notion-1").value) || 0; + const fairnessNotion2 = parseFloat(document.getElementById("cfl-fairness-notion-2").value) || 0; + const fairnessNotion3 = parseFloat(document.getElementById("cfl-fairness-notion-3").value) || 0; + const fairnessNotion4 = parseFloat(document.getElementById("cfl-fairness-notion-4").value) || 0; + + const explainabilityNotion1 = parseFloat(document.getElementById("cfl-explainability-notion-1").value) || 0; + const explainabilityNotion2 = parseFloat(document.getElementById("cfl-explainability-notion-2").value) || 0; + + const architecturalSoundnessNotion1 = parseFloat(document.getElementById("cfl-architectural-soundness-notion-1").value) || 0; + const architecturalSoundnessNotion2 = parseFloat(document.getElementById("cfl-architectural-soundness-notion-2").value) || 0; + const architecturalSoundnessNotion3 = parseFloat(document.getElementById("cfl-architectural-soundness-notion-3").value) || 0; + + const sustainabilityNotion1 = parseFloat(document.getElementById("cfl-sustainability-notion-1").value) || 0; + const sustainabilityNotion2 = parseFloat(document.getElementById("cfl-sustainability-notion-2").value) || 0; + const sustainabilityNotion3 = parseFloat(document.getElementById("cfl-sustainability-notion-3").value) || 0; + + const accountabilityNotion1 = parseFloat(document.getElementById("cfl-accountability-notion-1").value) || 0; + const accountabilityNotion2 = parseFloat(document.getElementById("cfl-accountability-notion-2").value) || 0; + const totalPillar = - 
robustnessPercent + - privacyPercent + - fairnessPercent + - explainabilityPercent + - accountabilityPercent + - architecturalSoundnessPercent + - sustainabilityPercent; - + robustnessPercent + privacyPercent + fairnessPercent + explainabilityPercent + + accountabilityPercent + architecturalSoundnessPercent + sustainabilityPercent; + const totalRobustnessNotion = robustnessNotion1 + robustnessNotion2 + robustnessNotion3; const totalPrivacyNotion = privacyNotion1 + privacyNotion2 + privacyNotion3; - const totalFairnessNotion = fairnessNotion1 + fairnessNotion2 + fairnessNotion3; + const totalFairnessNotion = fairnessNotion1 + fairnessNotion2 + fairnessNotion3 + fairnessNotion4; const totalExplainabilityNotion = explainabilityNotion1 + explainabilityNotion2; - const totalArchitecturalSoundnessNotion = architecturalSoundnessNotion1 + architecturalSoundnessNotion2; + const totalArchitecturalSoundnessNotion = architecturalSoundnessNotion1 + architecturalSoundnessNotion2 + architecturalSoundnessNotion3; const totalSustainabilityNotion = sustainabilityNotion1 + sustainabilityNotion2 + sustainabilityNotion3; - - if (totalPillar !== 100) { - return "[Trustworthiness] Check pillars weights"; - } - if (totalRobustnessNotion !== 100) { - return "[Trustworthiness] Check robustness notions weights"; - } - if (totalPrivacyNotion !== 100) { - return "[Trustworthiness] Check privacy notions weights"; - } - if (totalFairnessNotion !== 100) { - return "[Trustworthiness] Check fairness notions weights"; - } - if (totalExplainabilityNotion !== 100) { - return "[Trustworthiness] Check explainability notions weights"; - } - if (totalArchitecturalSoundnessNotion !== 100) { - return "[Trustworthiness] Check architectural soundness notions weights"; - } - if (totalSustainabilityNotion !== 100) { - return "[Trustworthiness] Check sustainability notions weights"; - } + const totalAccountabilityNotion = accountabilityNotion1 + accountabilityNotion2; + + + return ( + 
getWeightValidationMessage("Pillars", totalPillar) || + getWeightValidationMessage("Robustness notions", totalRobustnessNotion) || + getWeightValidationMessage("Privacy notions", totalPrivacyNotion) || + getWeightValidationMessage("Fairness notions", totalFairnessNotion) || + getWeightValidationMessage("Explainability notions", totalExplainabilityNotion) || + getWeightValidationMessage("Architectural soundness notions", totalArchitecturalSoundnessNotion) || + getWeightValidationMessage("Sustainability notions", totalSustainabilityNotion) || + getWeightValidationMessage("Accountability notions", totalAccountabilityNotion) + ); } - + + function validateWeightsDFL() { + const robustnessPercent = parseFloat(document.getElementById("dfl-robustness-pillar").value) || 0; + const privacyPercent = parseFloat(document.getElementById("dfl-privacy-pillar").value) || 0; + const fairnessPercent = parseFloat(document.getElementById("dfl-fairness-pillar").value) || 0; + const explainabilityPercent = parseFloat(document.getElementById("dfl-explainability-pillar").value) || 0; + const accountabilityPercent = parseFloat(document.getElementById("dfl-accountability-pillar").value) || 0; + const architecturalSoundnessPercent = parseFloat(document.getElementById("dfl-architectural-soundness-pillar").value) || 0; + const sustainabilityPercent = parseFloat(document.getElementById("dfl-sustainability-pillar").value) || 0; + + const robustnessNotion1 = parseFloat(document.getElementById("dfl-robustness-notion-1").value) || 0; + const robustnessNotion2 = parseFloat(document.getElementById("dfl-robustness-notion-2").value) || 0; + const robustnessNotion3 = parseFloat(document.getElementById("dfl-robustness-notion-3").value) || 0; + + const privacyNotion1 = parseFloat(document.getElementById("dfl-privacy-notion-1").value) || 0; + const privacyNotion2 = parseFloat(document.getElementById("dfl-privacy-notion-2").value) || 0; + const privacyNotion3 = 
parseFloat(document.getElementById("dfl-privacy-notion-3").value) || 0; + + const fairnessNotion3 = parseFloat(document.getElementById("dfl-fairness-notion-3").value) || 0; + const fairnessNotion4 = parseFloat(document.getElementById("dfl-fairness-notion-4").value) || 0; + + const explainabilityNotion1 = parseFloat(document.getElementById("dfl-explainability-notion-1").value) || 0; + const explainabilityNotion2 = parseFloat(document.getElementById("dfl-explainability-notion-2").value) || 0; + + const architecturalSoundnessNotion1 = parseFloat(document.getElementById("dfl-architectural-soundness-notion-1").value) || 0; + const architecturalSoundnessNotion2 = parseFloat(document.getElementById("dfl-architectural-soundness-notion-2").value) || 0; + const architecturalSoundnessNotion3 = parseFloat(document.getElementById("dfl-architectural-soundness-notion-3").value) || 0; + + const sustainabilityNotion1 = parseFloat(document.getElementById("dfl-sustainability-notion-1").value) || 0; + const sustainabilityNotion3 = parseFloat(document.getElementById("dfl-sustainability-notion-3").value) || 0; + + const accountabilityNotion1 = parseFloat(document.getElementById("dfl-accountability-notion-1")?.value) || 0; + const accountabilityNotion2 = parseFloat(document.getElementById("dfl-accountability-notion-2")?.value) || 0; + + const totalPillar = + robustnessPercent + privacyPercent + fairnessPercent + explainabilityPercent + + accountabilityPercent + architecturalSoundnessPercent + sustainabilityPercent; + + const totalRobustnessNotion = robustnessNotion1 + robustnessNotion2 + robustnessNotion3; + const totalPrivacyNotion = privacyNotion1 + privacyNotion2 + privacyNotion3; + const totalFairnessNotion = fairnessNotion3 + fairnessNotion4; + const totalExplainabilityNotion = explainabilityNotion1 + explainabilityNotion2; + const totalArchitecturalSoundnessNotion = architecturalSoundnessNotion1 + architecturalSoundnessNotion2 + architecturalSoundnessNotion3; + const
totalSustainabilityNotion = sustainabilityNotion1 + sustainabilityNotion3; + const totalAccountabilityNotion = accountabilityNotion1 + accountabilityNotion2; + + return ( + getWeightValidationMessage("Pillars", totalPillar) || + getWeightValidationMessage("Robustness notions", totalRobustnessNotion) || + getWeightValidationMessage("Privacy notions", totalPrivacyNotion) || + getWeightValidationMessage("Fairness notions", totalFairnessNotion) || + getWeightValidationMessage("Explainability notions", totalExplainabilityNotion) || + getWeightValidationMessage("Architectural soundness notions", totalArchitecturalSoundnessNotion) || + getWeightValidationMessage("Sustainability notions", totalSustainabilityNotion) || + getWeightValidationMessage("Accountability notions", totalAccountabilityNotion) + ); + } + function getTrustworthinessConfig() { const enabled = document.getElementById("trustworthiness-options").style.display === "block"; const federationArchitecture = document.getElementById("federationArchitecture").value; - + + if (isDFL()) return getTrustworthinessConfigDFL(enabled, federationArchitecture); + return getTrustworthinessConfigCFL(enabled, federationArchitecture); + } + + function getTrustworthinessConfigCFL(enabled, federationArchitecture) { const pillars = { - robustness: parseFloat(document.getElementById("robustness-pillar").value) || 0, - privacy: parseFloat(document.getElementById("privacy-pillar").value) || 0, - fairness: parseFloat(document.getElementById("fairness-pillar").value) || 0, - explainability: parseFloat(document.getElementById("explainability-pillar").value) || 0, - accountability: parseFloat(document.getElementById("accountability-pillar").value) || 0, - architecturalSoundness: parseFloat(document.getElementById("architectural-soundness-pillar").value) || 0, - sustainability: parseFloat(document.getElementById("sustainability-pillar").value) || 0 + robustness: parseFloat(document.getElementById("cfl-robustness-pillar").value) || 0, + 
privacy: parseFloat(document.getElementById("cfl-privacy-pillar").value) || 0, + fairness: parseFloat(document.getElementById("cfl-fairness-pillar").value) || 0, + explainability: parseFloat(document.getElementById("cfl-explainability-pillar").value) || 0, + accountability: parseFloat(document.getElementById("cfl-accountability-pillar").value) || 0, + architecturalSoundness: parseFloat(document.getElementById("cfl-architectural-soundness-pillar").value) || 0, + sustainability: parseFloat(document.getElementById("cfl-sustainability-pillar").value) || 0 }; - + const notions = { robustness: [ - parseFloat(document.getElementById("robustness-notion-1").value) || 0, - parseFloat(document.getElementById("robustness-notion-2").value) || 0, - parseFloat(document.getElementById("robustness-notion-3").value) || 0 + parseFloat(document.getElementById("cfl-robustness-notion-1").value) || 0, + parseFloat(document.getElementById("cfl-robustness-notion-2").value) || 0, + parseFloat(document.getElementById("cfl-robustness-notion-3").value) || 0 ], privacy: [ - parseFloat(document.getElementById("privacy-notion-1").value) || 0, - parseFloat(document.getElementById("privacy-notion-2").value) || 0, - parseFloat(document.getElementById("privacy-notion-3").value) || 0 + parseFloat(document.getElementById("cfl-privacy-notion-1").value) || 0, + parseFloat(document.getElementById("cfl-privacy-notion-2").value) || 0, + parseFloat(document.getElementById("cfl-privacy-notion-3").value) || 0 ], fairness: [ - parseFloat(document.getElementById("fairness-notion-1").value) || 0, - parseFloat(document.getElementById("fairness-notion-2").value) || 0, - parseFloat(document.getElementById("fairness-notion-3").value) || 0 + parseFloat(document.getElementById("cfl-fairness-notion-1").value) || 0, + parseFloat(document.getElementById("cfl-fairness-notion-2").value) || 0, + parseFloat(document.getElementById("cfl-fairness-notion-3").value) || 0, + 
parseFloat(document.getElementById("cfl-fairness-notion-4").value) || 0 ], explainability: [ - parseFloat(document.getElementById("explainability-notion-1").value) || 0, - parseFloat(document.getElementById("explainability-notion-2").value) || 0 + parseFloat(document.getElementById("cfl-explainability-notion-1").value) || 0, + parseFloat(document.getElementById("cfl-explainability-notion-2").value) || 0 + ], + accountability: [ + parseFloat(document.getElementById("cfl-accountability-notion-1")?.value) || 0, + parseFloat(document.getElementById("cfl-accountability-notion-2")?.value) || 0 ], architecturalSoundness: [ - parseFloat(document.getElementById("architectural-soundness-notion-1").value) || 0, - parseFloat(document.getElementById("architectural-soundness-notion-2").value) || 0 + parseFloat(document.getElementById("cfl-architectural-soundness-notion-1").value) || 0, + parseFloat(document.getElementById("cfl-architectural-soundness-notion-2").value) || 0, + parseFloat(document.getElementById("cfl-architectural-soundness-notion-3").value) || 0 ], sustainability: [ - parseFloat(document.getElementById("sustainability-notion-1").value) || 0, - parseFloat(document.getElementById("sustainability-notion-2").value) || 0, - parseFloat(document.getElementById("sustainability-notion-3").value) || 0 + parseFloat(document.getElementById("cfl-sustainability-notion-1").value) || 0, + parseFloat(document.getElementById("cfl-sustainability-notion-2").value) || 0, + parseFloat(document.getElementById("cfl-sustainability-notion-3").value) || 0 ] }; - - return { - enabled, - federationArchitecture, - pillars, - notions + + return { enabled, federationArchitecture, pillars, notions }; + } + + function getTrustworthinessConfigDFL(enabled, federationArchitecture) { + const pillars = { + robustness: parseFloat(document.getElementById("dfl-robustness-pillar").value) || 0, + privacy: parseFloat(document.getElementById("dfl-privacy-pillar").value) || 0, + fairness: 
parseFloat(document.getElementById("dfl-fairness-pillar").value) || 0, + explainability: parseFloat(document.getElementById("dfl-explainability-pillar").value) || 0, + accountability: parseFloat(document.getElementById("dfl-accountability-pillar").value) || 0, + architecturalSoundness: parseFloat(document.getElementById("dfl-architectural-soundness-pillar").value) || 0, + sustainability: parseFloat(document.getElementById("dfl-sustainability-pillar").value) || 0 + }; + + const notions = { + robustness: [ + parseFloat(document.getElementById("dfl-robustness-notion-1").value) || 0, + parseFloat(document.getElementById("dfl-robustness-notion-2").value) || 0, + parseFloat(document.getElementById("dfl-robustness-notion-3").value) || 0 + ], + privacy: [ + parseFloat(document.getElementById("dfl-privacy-notion-1").value) || 0, + parseFloat(document.getElementById("dfl-privacy-notion-2").value) || 0, + parseFloat(document.getElementById("dfl-privacy-notion-3").value) || 0 + ], + fairness: [ + parseFloat(document.getElementById("dfl-fairness-notion-3").value) || 0, + parseFloat(document.getElementById("dfl-fairness-notion-4").value) || 0 + ], + explainability: [ + parseFloat(document.getElementById("dfl-explainability-notion-1").value) || 0, + parseFloat(document.getElementById("dfl-explainability-notion-2").value) || 0 + ], + accountability: [ + parseFloat(document.getElementById("dfl-accountability-notion-1")?.value) || 0, + parseFloat(document.getElementById("dfl-accountability-notion-2")?.value) || 0 + ], + architecturalSoundness: [ + parseFloat(document.getElementById("dfl-architectural-soundness-notion-1").value) || 0, + parseFloat(document.getElementById("dfl-architectural-soundness-notion-2").value) || 0, + parseFloat(document.getElementById("dfl-architectural-soundness-notion-3").value) || 0 + ], + sustainability: [ + parseFloat(document.getElementById("dfl-sustainability-notion-1").value) || 0, + 
parseFloat(document.getElementById("dfl-sustainability-notion-3").value) || 0 + ] }; + + return { enabled, federationArchitecture, pillars, notions }; } - + function setTrustworthinessConfig(config) { if (!config) return; - - // Set pillar weights + + if (isDFL()) setTrustworthinessConfigDFL(config); + else setTrustworthinessConfigCFL(config); + + validateWeights(); + } + + function setTrustworthinessConfigCFL(config) { if (config.pillars) { - document.getElementById("robustness-pillar").value = config.pillars.robustness || 0; - document.getElementById("privacy-pillar").value = config.pillars.privacy || 0; - document.getElementById("fairness-pillar").value = config.pillars.fairness || 0; - document.getElementById("explainability-pillar").value = config.pillars.explainability || 0; - document.getElementById("accountability-pillar").value = config.pillars.accountability || 0; - document.getElementById("architectural-soundness-pillar").value = config.pillars.architecturalSoundness || 0; - document.getElementById("sustainability-pillar").value = config.pillars.sustainability || 0; + document.getElementById("cfl-robustness-pillar").value = config.pillars.robustness || 0; + document.getElementById("cfl-privacy-pillar").value = config.pillars.privacy || 0; + document.getElementById("cfl-fairness-pillar").value = config.pillars.fairness || 0; + document.getElementById("cfl-explainability-pillar").value = config.pillars.explainability || 0; + document.getElementById("cfl-accountability-pillar").value = config.pillars.accountability || 0; + document.getElementById("cfl-architectural-soundness-pillar").value = config.pillars.architecturalSoundness || 0; + document.getElementById("cfl-sustainability-pillar").value = config.pillars.sustainability || 0; } - - // Set notion weights + if (config.notions) { - const rNotions = config.notions.robustness || [0, 0, 0]; - document.getElementById("robustness-notion-1").value = rNotions[0]; - 
document.getElementById("robustness-notion-2").value = rNotions[1]; - document.getElementById("robustness-notion-3").value = rNotions[2]; - - const pNotions = config.notions.privacy || [0, 0, 0]; - document.getElementById("privacy-notion-1").value = pNotions[0]; - document.getElementById("privacy-notion-2").value = pNotions[1]; - document.getElementById("privacy-notion-3").value = pNotions[2]; - - const fNotions = config.notions.fairness || [0, 0, 0]; - document.getElementById("fairness-notion-1").value = fNotions[0]; - document.getElementById("fairness-notion-2").value = fNotions[1]; - document.getElementById("fairness-notion-3").value = fNotions[2]; - - const eNotions = config.notions.explainability || [0, 0]; - document.getElementById("explainability-notion-1").value = eNotions[0]; - document.getElementById("explainability-notion-2").value = eNotions[1]; - - const aNotions = config.notions.architecturalSoundness || [0, 0]; - document.getElementById("architectural-soundness-notion-1").value = aNotions[0]; - document.getElementById("architectural-soundness-notion-2").value = aNotions[1]; - - const sNotions = config.notions.sustainability || [0, 0, 0]; - document.getElementById("sustainability-notion-1").value = sNotions[0]; - document.getElementById("sustainability-notion-2").value = sNotions[1]; - document.getElementById("sustainability-notion-3").value = sNotions[2]; + const r = config.notions.robustness || [0, 0, 0]; + document.getElementById("cfl-robustness-notion-1").value = r[0]; + document.getElementById("cfl-robustness-notion-2").value = r[1]; + document.getElementById("cfl-robustness-notion-3").value = r[2]; + + const p = config.notions.privacy || [0, 0, 0]; + document.getElementById("cfl-privacy-notion-1").value = p[0]; + document.getElementById("cfl-privacy-notion-2").value = p[1]; + document.getElementById("cfl-privacy-notion-3").value = p[2]; + + const f = config.notions.fairness || [0, 0, 0, 0]; + 
document.getElementById("cfl-fairness-notion-1").value = f[0]; + document.getElementById("cfl-fairness-notion-2").value = f[1]; + document.getElementById("cfl-fairness-notion-3").value = f[2]; + document.getElementById("cfl-fairness-notion-4").value = f[3]; + + const e = config.notions.explainability || [0, 0]; + document.getElementById("cfl-explainability-notion-1").value = e[0]; + document.getElementById("cfl-explainability-notion-2").value = e[1]; + + const a = config.notions.architecturalSoundness || [0, 0, 0]; + document.getElementById("cfl-architectural-soundness-notion-1").value = a[0]; + document.getElementById("cfl-architectural-soundness-notion-2").value = a[1]; + document.getElementById("cfl-architectural-soundness-notion-3").value = a[2]; + + const s = config.notions.sustainability || [0, 0, 0]; + document.getElementById("cfl-sustainability-notion-1").value = s[0]; + document.getElementById("cfl-sustainability-notion-2").value = s[1]; + document.getElementById("cfl-sustainability-notion-3").value = s[2]; } - - // Perform a weight validation check to update any warnings if needed - validateWeights(); } - + + function setTrustworthinessConfigDFL(config) { + if (config.pillars) { + document.getElementById("dfl-robustness-pillar").value = config.pillars.robustness || 0; + document.getElementById("dfl-privacy-pillar").value = config.pillars.privacy || 0; + document.getElementById("dfl-fairness-pillar").value = config.pillars.fairness || 0; + document.getElementById("dfl-explainability-pillar").value = config.pillars.explainability || 0; + document.getElementById("dfl-accountability-pillar").value = config.pillars.accountability || 0; + document.getElementById("dfl-architectural-soundness-pillar").value = config.pillars.architecturalSoundness || 0; + document.getElementById("dfl-sustainability-pillar").value = config.pillars.sustainability || 0; + } + + if (config.notions) { + const r = config.notions.robustness || [0, 0, 0]; + 
document.getElementById("dfl-robustness-notion-1").value = r[0]; + document.getElementById("dfl-robustness-notion-2").value = r[1]; + document.getElementById("dfl-robustness-notion-3").value = r[2]; + + const p = config.notions.privacy || [0, 0, 0]; + document.getElementById("dfl-privacy-notion-1").value = p[0]; + document.getElementById("dfl-privacy-notion-2").value = p[1]; + document.getElementById("dfl-privacy-notion-3").value = p[2]; + + const f = config.notions.fairness || [0, 0]; + document.getElementById("dfl-fairness-notion-3").value = f[0]; + document.getElementById("dfl-fairness-notion-4").value = f[1]; + + const e = config.notions.explainability || [0, 0]; + document.getElementById("dfl-explainability-notion-1").value = e[0]; + document.getElementById("dfl-explainability-notion-2").value = e[1]; + + const a = config.notions.architecturalSoundness || [0, 0, 0]; + document.getElementById("dfl-architectural-soundness-notion-1").value = a[0]; + document.getElementById("dfl-architectural-soundness-notion-2").value = a[1]; + document.getElementById("dfl-architectural-soundness-notion-3").value = a[2]; + + const s = config.notions.sustainability || [0, 0]; + document.getElementById("dfl-sustainability-notion-1").value = s[0]; + document.getElementById("dfl-sustainability-notion-3").value = s[1]; + } + } + function resetTrustworthinessConfig() { const trustworthinessOptionsDiv = document.getElementById("trustworthiness-options"); const fedArchElement = document.getElementById("federationArchitecture"); - - // Hide options and re-enable federationArchitecture + trustworthinessOptionsDiv.style.display = "none"; fedArchElement.disabled = false; - - // Reset pillars to 0 - document.getElementById("robustness-pillar").value = "0"; - document.getElementById("privacy-pillar").value = "0"; - document.getElementById("fairness-pillar").value = "0"; - document.getElementById("explainability-pillar").value = "0"; - document.getElementById("accountability-pillar").value = 
"0"; - document.getElementById("architectural-soundness-pillar").value = "0"; - document.getElementById("sustainability-pillar").value = "0"; - - // Reset notions to 0 - document.getElementById("robustness-notion-1").value = "0"; - document.getElementById("robustness-notion-2").value = "0"; - document.getElementById("robustness-notion-3").value = "0"; - document.getElementById("privacy-notion-1").value = "0"; - document.getElementById("privacy-notion-2").value = "0"; - document.getElementById("privacy-notion-3").value = "0"; - document.getElementById("fairness-notion-1").value = "0"; - document.getElementById("fairness-notion-2").value = "0"; - document.getElementById("fairness-notion-3").value = "0"; - document.getElementById("explainability-notion-1").value = "0"; - document.getElementById("explainability-notion-2").value = "0"; - document.getElementById("architectural-soundness-notion-1").value = "0"; - document.getElementById("architectural-soundness-notion-2").value = "0"; - document.getElementById("sustainability-notion-1").value = "0"; - document.getElementById("sustainability-notion-2").value = "0"; - document.getElementById("sustainability-notion-3").value = "0"; - - // Re-validate weights after reset + + if (isDFL()) resetTrustworthinessConfigDFL(); + else resetTrustworthinessConfigCFL(); + validateWeights(); } - + + function resetTrustworthinessConfigCFL() { + document.getElementById("cfl-robustness-pillar").value = "0"; + document.getElementById("cfl-privacy-pillar").value = "0"; + document.getElementById("cfl-fairness-pillar").value = "0"; + document.getElementById("cfl-explainability-pillar").value = "0"; + document.getElementById("cfl-accountability-pillar").value = "0"; + document.getElementById("cfl-architectural-soundness-pillar").value = "0"; + document.getElementById("cfl-sustainability-pillar").value = "0"; + + document.getElementById("cfl-robustness-notion-1").value = "0"; + document.getElementById("cfl-robustness-notion-2").value = "0"; + 
document.getElementById("cfl-robustness-notion-3").value = "0"; + + document.getElementById("cfl-privacy-notion-1").value = "0"; + document.getElementById("cfl-privacy-notion-2").value = "0"; + document.getElementById("cfl-privacy-notion-3").value = "0"; + + document.getElementById("cfl-fairness-notion-1").value = "0"; + document.getElementById("cfl-fairness-notion-2").value = "0"; + document.getElementById("cfl-fairness-notion-3").value = "0"; + document.getElementById("cfl-fairness-notion-4").value = "0"; + + document.getElementById("cfl-explainability-notion-1").value = "0"; + document.getElementById("cfl-explainability-notion-2").value = "0"; + + document.getElementById("cfl-architectural-soundness-notion-1").value = "0"; + document.getElementById("cfl-architectural-soundness-notion-2").value = "0"; + document.getElementById("cfl-architectural-soundness-notion-3").value = "0"; + + document.getElementById("cfl-sustainability-notion-1").value = "0"; + document.getElementById("cfl-sustainability-notion-2").value = "0"; + document.getElementById("cfl-sustainability-notion-3").value = "0"; + } + + function resetTrustworthinessConfigDFL() { + document.getElementById("dfl-robustness-pillar").value = "0"; + document.getElementById("dfl-privacy-pillar").value = "0"; + document.getElementById("dfl-fairness-pillar").value = "0"; + document.getElementById("dfl-explainability-pillar").value = "0"; + document.getElementById("dfl-accountability-pillar").value = "0"; + document.getElementById("dfl-architectural-soundness-pillar").value = "0"; + document.getElementById("dfl-sustainability-pillar").value = "0"; + + document.getElementById("dfl-robustness-notion-1").value = "0"; + document.getElementById("dfl-robustness-notion-2").value = "0"; + document.getElementById("dfl-robustness-notion-3").value = "0"; + + document.getElementById("dfl-privacy-notion-1").value = "0"; + document.getElementById("dfl-privacy-notion-2").value = "0"; + 
document.getElementById("dfl-privacy-notion-3").value = "0"; + + document.getElementById("dfl-fairness-notion-3").value = "0"; + document.getElementById("dfl-fairness-notion-4").value = "0"; + + document.getElementById("dfl-explainability-notion-1").value = "0"; + document.getElementById("dfl-explainability-notion-2").value = "0"; + + document.getElementById("dfl-architectural-soundness-notion-1").value = "0"; + document.getElementById("dfl-architectural-soundness-notion-2").value = "0"; + document.getElementById("dfl-architectural-soundness-notion-3").value = "0"; + + document.getElementById("dfl-sustainability-notion-1").value = "0"; + document.getElementById("dfl-sustainability-notion-3").value = "0"; + } + return { initializeTrustworthinessSystem, getTrustworthinessConfig, setTrustworthinessConfig, - resetTrustworthinessConfig + resetTrustworthinessConfig, + validateWeights }; })(); - -export default TrustworthinessManager; \ No newline at end of file + +export default TrustworthinessManager; diff --git a/nebula/frontend/static/js/deployment/ui-controls.js b/nebula/frontend/static/js/deployment/ui-controls.js index ed02efa76..97a76dce9 100644 --- a/nebula/frontend/static/js/deployment/ui-controls.js +++ b/nebula/frontend/static/js/deployment/ui-controls.js @@ -11,11 +11,11 @@ const UIControls = (function() { /* === control Physical + Predefined => block input === */ document.querySelectorAll('input[name="deploymentRadioOptions"]') .forEach(r => r.addEventListener('change', togglePredefinedNodesInput)); - + ['custom-topology-btn', 'predefined-topology-btn'] .forEach(id => document.getElementById(id) .addEventListener('change', togglePredefinedNodesInput)); - + togglePredefinedNodesInput(); setupVpnDiscover(); setupParticipantDisplay(); @@ -650,33 +650,33 @@ const UIControls = (function() { const radios = document.querySelectorAll('input[name="deploymentRadioOptions"]'); const discoverBtn = document.getElementById('discoverDevicesBtn'); if (!discoverBtn || 
!radios.length) return; - + const toggle = () => { const sel = document.querySelector('input[name="deploymentRadioOptions"]:checked'); discoverBtn.disabled = sel.value !== 'physical'; }; - + radios.forEach(r => r.addEventListener('change', toggle)); toggle(); } - + function setupVpnDiscover() { const discoverBtn = document.getElementById('discoverDevicesBtn'); if (!discoverBtn) return; - + discoverBtn.addEventListener('click', async () => { try { const res = await fetch('/platform/api/discover-vpn'); if (!res.ok) throw new Error(res.statusText); - + const { ips } = await res.json(); - + const form = document.getElementById('vpn-form'); form.innerHTML = ''; - + const currentScenario = window.ScenarioManager.getScenariosList()[window.ScenarioManager.getActualScenario()]; const selectedIPs = currentScenario?.physical_ips || []; - + ips.forEach(ip => { const wrapper = document.createElement('div'); wrapper.classList.add('form-check'); @@ -687,18 +687,18 @@ const UIControls = (function() { `; form.appendChild(wrapper); }); - + const modal = new bootstrap.Modal(document.getElementById('vpnModal')); modal.show(); - + document.getElementById('vpn-accept-btn').onclick = () => { const selected = Array.from(form.querySelectorAll('input:checked')) .map(i => i.value); - + window.ScenarioManager.setPhysicalIPs(selected); - + window.TopologyManager.setPhysicalIPs(selected); - + modal.hide(); }; } catch (err) { @@ -707,19 +707,19 @@ const UIControls = (function() { } }); } - + function togglePredefinedNodesInput() { const deployment = document.querySelector('input[name="deploymentRadioOptions"]:checked')?.value; const isPredefined = document.getElementById('predefined-topology-btn').checked; const nodesInput = document.getElementById('predefined-topology-nodes'); - + if (!nodesInput) return; - + const disable = deployment === 'physical' && isPredefined; nodesInput.disabled = disable; nodesInput.classList.toggle('disabled', disable); } - + function setupDeploymentRadios() { const 
radios = document.querySelectorAll('input[name="deploymentRadioOptions"]'); radios.forEach(radio => { diff --git a/nebula/frontend/templates/deployment.html b/nebula/frontend/templates/deployment.html index 8572403ff..f5171f14e 100755 --- a/nebula/frontend/templates/deployment.html +++ b/nebula/frontend/templates/deployment.html @@ -143,6 +143,22 @@
    Dataset FashionMNIST + + + +