diff --git a/.github/workflows/update-plugin-docs.yml b/.github/workflows/update-plugin-docs.yml
index a1961ea3..2778efb1 100644
--- a/.github/workflows/update-plugin-docs.yml
+++ b/.github/workflows/update-plugin-docs.yml
@@ -1,9 +1,10 @@
-# Workflow to run plugin documentation generation then commit the updated changes
+# Workflow to run plugin documentation generation then create a PR with the updated changes
 
 name: Plugin Documentation Generator
 
 permissions:
   contents: write
+  pull-requests: write
 
 on:
   workflow_dispatch:
@@ -15,8 +16,15 @@ jobs:
     runs-on: [ self-hosted ]
     # To disable this workflow, set DISABLE_AUTO_DOCS to 'true' in repository variables
     if: vars.DISABLE_AUTO_DOCS != 'true'
+    env:
+      HOME: /tmp/github-actions-home
 
     steps:
+      - name: Setup HOME directory
+        run: |
+          mkdir -p /tmp/github-actions-home
+          export HOME=/tmp/github-actions-home
+
       - name: Checkout repository
         uses: actions/checkout@v4
         with:
@@ -37,10 +45,18 @@ jobs:
           source venv/bin/activate
           pre-commit run --files docs/PLUGIN_DOC.md || true
 
-      - name: Commit and push changes
-        run: |
-          git config user.name "github-actions[bot]"
-          git config user.email "github-actions[bot]@users.noreply.github.com"
-          git add docs/PLUGIN_DOC.md
-          git diff --staged --quiet || git commit --no-verify -m "docs: Update plugin documentation [automated]"
-          git push
+      - name: Create Pull Request
+        uses: peter-evans/create-pull-request@v6
+        with:
+          token: ${{ secrets.GITHUB_TOKEN }}
+          commit-message: "docs: Update plugin documentation [automated]"
+          committer: "github-actions[bot] "
+          author: "github-actions[bot] "
+          branch: automated-plugin-docs-update
+          delete-branch: true
+          title: "docs: Update plugin documentation [automated]"
+          body: |
+            Automated plugin documentation update generated by workflow.
+
+            This PR was automatically created by the Plugin Documentation Generator workflow.
+          labels: documentation,automated
diff --git a/docs/PLUGIN_DOC.md b/docs/PLUGIN_DOC.md
index 801adde8..bda43772 100644
--- a/docs/PLUGIN_DOC.md
+++ b/docs/PLUGIN_DOC.md
@@ -4,7 +4,7 @@
 
 | Plugin | Collection | Analysis | DataModel | Collector | Analyzer |
 | --- | --- | --- | --- | --- | --- |
-| AmdSmiPlugin | amd-smi firmware --json<br>amd-smi list --json<br>amd-smi partition --json<br>amd-smi process --json<br>amd-smi static -g all --json<br>amd-smi version --json | **Analyzer Args:**<br>- `check_static_data`: bool<br>- `expected_gpu_processes`: Optional[int]<br>- `expected_max_power`: Optional[int]<br>- `expected_driver_version`: Optional[str]<br>- `expected_memory_partition_mode`: Optional[str]<br>- `expected_compute_partition_mode`: Optional[str]<br>- `expected_pldm_version`: Optional[str]<br>- `l0_to_recovery_count_error_threshold`: Optional[int]<br>- `l0_to_recovery_count_warning_threshold`: Optional[int]<br>- `vendorid_ep`: Optional[str]<br>- `vendorid_ep_vf`: Optional[str]<br>- `devid_ep`: Optional[str]<br>- `devid_ep_vf`: Optional[str]<br>- `sku_name`: Optional[str] | [AmdSmiDataModel](#AmdSmiDataModel-Model) | [AmdSmiCollector](#Collector-Class-AmdSmiCollector) | [AmdSmiAnalyzer](#Data-Analyzer-Class-AmdSmiAnalyzer) |
+| AmdSmiPlugin | firmware --json<br>list --json<br>partition --json<br>process --json<br>ras --cper --folder={folder}<br>static -g all --json<br>static -g {gpu_id} --json<br>version --json | **Analyzer Args:**<br>- `check_static_data`: bool<br>- `expected_gpu_processes`: Optional[int]<br>- `expected_max_power`: Optional[int]<br>- `expected_driver_version`: Optional[str]<br>- `expected_memory_partition_mode`: Optional[str]<br>- `expected_compute_partition_mode`: Optional[str]<br>- `expected_pldm_version`: Optional[str]<br>- `l0_to_recovery_count_error_threshold`: Optional[int]<br>- `l0_to_recovery_count_warning_threshold`: Optional[int]<br>- `vendorid_ep`: Optional[str]<br>- `vendorid_ep_vf`: Optional[str]<br>- `devid_ep`: Optional[str]<br>- `devid_ep_vf`: Optional[str]<br>- `sku_name`: Optional[str]<br>- `expected_xgmi_speed`: Optional[list[float]]<br>- `analysis_range_start`: Optional[datetime.datetime]<br>- `analysis_range_end`: Optional[datetime.datetime] | [AmdSmiDataModel](#AmdSmiDataModel-Model) | [AmdSmiCollector](#Collector-Class-AmdSmiCollector) | [AmdSmiAnalyzer](#Data-Analyzer-Class-AmdSmiAnalyzer) |
 | BiosPlugin | sh -c 'cat /sys/devices/virtual/dmi/id/bios_version'<br>wmic bios get SMBIOSBIOSVersion /Value | **Analyzer Args:**<br>- `exp_bios_version`: list[str]<br>- `regex_match`: bool | [BiosDataModel](#BiosDataModel-Model) | [BiosCollector](#Collector-Class-BiosCollector) | [BiosAnalyzer](#Data-Analyzer-Class-BiosAnalyzer) |
 | CmdlinePlugin | cat /proc/cmdline | **Analyzer Args:**<br>- `required_cmdline`: Union[str, list]<br>- `banned_cmdline`: Union[str, list] | [CmdlineDataModel](#CmdlineDataModel-Model) | [CmdlineCollector](#Collector-Class-CmdlineCollector) | [CmdlineAnalyzer](#Data-Analyzer-Class-CmdlineAnalyzer) |
 | DeviceEnumerationPlugin | lscpu \| grep Socket \| awk '{ print $2 }'<br>powershell -Command "(Get-WmiObject -Class Win32_Processor \| Measure-Object).Count"<br>lspci -d {vendorid_ep}: \| grep -i 'VGA\\|Display\\|3D' \| wc -l<br>powershell -Command "(wmic path win32_VideoController get name \| findstr AMD \| Measure-Object).Count"<br>lspci -d {vendorid_ep}: \| grep -i 'Virtual Function' \| wc -l<br>powershell -Command "(Get-VMHostPartitionableGpu \| Measure-Object).Count" | **Analyzer Args:**<br>- `cpu_count`: Optional[list[int]]<br>- `gpu_count`: Optional[list[int]]<br>- `vf_count`: Optional[list[int]] | [DeviceEnumerationDataModel](#DeviceEnumerationDataModel-Model) | [DeviceEnumerationCollector](#Collector-Class-DeviceEnumerationCollector) | [DeviceEnumerationAnalyzer](#Data-Analyzer-Class-DeviceEnumerationAnalyzer) |
@@ -14,7 +14,7 @@
 | JournalPlugin | journalctl --no-pager --system --output=short-iso | - | [JournalData](#JournalData-Model) | [JournalCollector](#Collector-Class-JournalCollector) | - |
 | KernelPlugin | sh -c 'uname -a'<br>wmic os get Version /Value | **Analyzer Args:**<br>- `exp_kernel`: Union[str, list]<br>- `regex_match`: bool | [KernelDataModel](#KernelDataModel-Model) | [KernelCollector](#Collector-Class-KernelCollector) | [KernelAnalyzer](#Data-Analyzer-Class-KernelAnalyzer) |
 | KernelModulePlugin | cat /proc/modules<br>wmic os get Version /Value | **Analyzer Args:**<br>- `kernel_modules`: dict[str, dict]<br>- `regex_filter`: list[str] | [KernelModuleDataModel](#KernelModuleDataModel-Model) | [KernelModuleCollector](#Collector-Class-KernelModuleCollector) | [KernelModuleAnalyzer](#Data-Analyzer-Class-KernelModuleAnalyzer) |
-| MemoryPlugin | free -b<br>wmic OS get FreePhysicalMemory /Value; wmic ComputerSystem get TotalPhysicalMemory /Value | - | [MemoryDataModel](#MemoryDataModel-Model) | [MemoryCollector](#Collector-Class-MemoryCollector) | [MemoryAnalyzer](#Data-Analyzer-Class-MemoryAnalyzer) |
+| MemoryPlugin | free -b<br>/usr/bin/lsmem<br>wmic OS get FreePhysicalMemory /Value; wmic ComputerSystem get TotalPhysicalMemory /Value | - | [MemoryDataModel](#MemoryDataModel-Model) | [MemoryCollector](#Collector-Class-MemoryCollector) | [MemoryAnalyzer](#Data-Analyzer-Class-MemoryAnalyzer) |
 | NvmePlugin | nvme smart-log {dev}<br>nvme error-log {dev} --log-entries=256<br>nvme id-ctrl {dev}<br>nvme id-ns {dev}{ns}<br>nvme fw-log {dev}<br>nvme self-test-log {dev}<br>nvme get-log {dev} --log-id=6 --log-len=512<br>nvme telemetry-log {dev} --output-file={dev}_{f_name} | - | [NvmeDataModel](#NvmeDataModel-Model) | [NvmeCollector](#Collector-Class-NvmeCollector) | - |
 | OsPlugin | sh -c '( lsb_release -ds \|\| (cat /etc/*release \| grep PRETTY_NAME) \|\| uname -om ) 2>/dev/null \| head -n1'<br>cat /etc/*release \| grep VERSION_ID<br>wmic os get Version /value<br>wmic os get Caption /Value | **Analyzer Args:**<br>- `exp_os`: Union[str, list]<br>- `exact_match`: bool | [OsDataModel](#OsDataModel-Model) | [OsCollector](#Collector-Class-OsCollector) | [OsAnalyzer](#Data-Analyzer-Class-OsAnalyzer) |
 | PackagePlugin | dnf list --installed<br>dpkg-query -W<br>pacman -Q<br>cat /etc/*release<br>wmic product get name,version | **Analyzer Args:**<br>- `exp_package_ver`: Dict[str, Optional[str]]<br>- `regex_match`: bool | [PackageDataModel](#PackageDataModel-Model) | [PackageCollector](#Collector-Class-PackageCollector) | [PackageAnalyzer](#Data-Analyzer-Class-PackageAnalyzer) |
@@ -42,12 +42,14 @@ Class for collection of inband tool amd-smi data.
 
 - **AMD_SMI_EXE**: `amd-smi`
 - **SUPPORTED_OS_FAMILY**: `{}`
-- **CMD_VERSION**: `amd-smi version --json`
-- **CMD_LIST**: `amd-smi list --json`
-- **CMD_PROCESS**: `amd-smi process --json`
-- **CMD_PARTITION**: `amd-smi partition --json`
-- **CMD_FIRMWARE**: `amd-smi firmware --json`
-- **CMD_STATIC**: `amd-smi static -g all --json`
+- **CMD_VERSION**: `version --json`
+- **CMD_LIST**: `list --json`
+- **CMD_PROCESS**: `process --json`
+- **CMD_PARTITION**: `partition --json`
+- **CMD_FIRMWARE**: `firmware --json`
+- **CMD_STATIC**: `static -g all --json`
+- **CMD_STATIC_GPU**: `static -g {gpu_id} --json`
+- **CMD_RAS**: `ras --cper --folder={folder}`
 
 ### Provides Data
 
@@ -55,12 +57,14 @@ AmdSmiDataModel
 
 ### Commands
 
-- amd-smi firmware --json
-- amd-smi list --json
-- amd-smi partition --json
-- amd-smi process --json
-- amd-smi static -g all --json
-- amd-smi version --json
+- firmware --json
+- list --json
+- partition --json
+- process --json
+- ras --cper --folder={folder}
+- static -g all --json
+- static -g {gpu_id} --json
+- version --json
 
 ## Collector Class BiosCollector
 
@@ -300,6 +304,7 @@ Collect memory usage details
 
 - **CMD_WINDOWS**: `wmic OS get FreePhysicalMemory /Value; wmic ComputerSystem get TotalPhysicalMemory /Value`
 - **CMD**: `free -b`
+- **CMD_LSMEM**: `/usr/bin/lsmem`
 
 ### Provides Data
 
@@ -308,6 +313,7 @@ MemoryDataModel
 ### Commands
 
 - free -b
+- /usr/bin/lsmem
 - wmic OS get FreePhysicalMemory /Value; wmic ComputerSystem get TotalPhysicalMemory /Value
 
 ## Collector Class NvmeCollector
@@ -646,10 +652,15 @@ Data model for amd-smi data.
 - **gpu_list**: `Optional[list[nodescraper.plugins.inband.amdsmi.amdsmidata.AmdSmiListItem]]`
 - **partition**: `Optional[nodescraper.plugins.inband.amdsmi.amdsmidata.Partition]`
 - **process**: `Optional[list[nodescraper.plugins.inband.amdsmi.amdsmidata.Processes]]`
+- **topology**: `Optional[list[nodescraper.plugins.inband.amdsmi.amdsmidata.Topo]]`
 - **firmware**: `Optional[list[nodescraper.plugins.inband.amdsmi.amdsmidata.Fw]]`
 - **bad_pages**: `Optional[list[nodescraper.plugins.inband.amdsmi.amdsmidata.BadPages]]`
 - **static**: `Optional[list[nodescraper.plugins.inband.amdsmi.amdsmidata.AmdSmiStatic]]`
 - **metric**: `Optional[list[nodescraper.plugins.inband.amdsmi.amdsmidata.AmdSmiMetric]]`
+- **xgmi_metric**: `Optional[list[nodescraper.plugins.inband.amdsmi.amdsmidata.XgmiMetrics]]`
+- **xgmi_link**: `Optional[list[nodescraper.plugins.inband.amdsmi.amdsmidata.XgmiLinks]]`
+- **cper_data**: `Optional[list[nodescraper.models.datamodel.FileModel]]`
+- **amdsmitst_data**: `nodescraper.plugins.inband.amdsmi.amdsmidata.AmdSmiTstData`
 
 ## BiosDataModel Model
 
@@ -763,6 +774,7 @@ Data model for journal logs
 
 - **mem_free**: `str`
 - **mem_total**: `str`
+- **lsmem_output**: `Optional[dict]`
 
 ## NvmeDataModel Model
 
@@ -915,7 +927,11 @@ Data model for in band syslog logs
 
 ## Data Analyzer Class AmdSmiAnalyzer
 
-**Bases**: ['DataAnalyzer']
+### Description
+
+Check AMD SMI Application data for PCIe, ECC errors, CPER data, and analyze amdsmitst metrics
+
+**Bases**: ['CperAnalysisTaskMixin', 'DataAnalyzer']
 
 **Link to code**: [amdsmi_analyzer.py](https://github.com/amd/node-scraper/blob/HEAD/nodescraper/plugins/inband/amdsmi/amdsmi_analyzer.py)
 
@@ -1213,6 +1229,9 @@ Check sysctl matches expected sysctl details
 - **devid_ep**: `Optional[str]`
 - **devid_ep_vf**: `Optional[str]`
 - **sku_name**: `Optional[str]`
+- **expected_xgmi_speed**: `Optional[list[float]]`
+- **analysis_range_start**: `Optional[datetime.datetime]`
+- **analysis_range_end**: `Optional[datetime.datetime]`
 
 ## Analyzer Args Class BiosAnalyzerArgs