README.md (+67 -55): 67 additions & 55 deletions
@@ -3,7 +3,7 @@
 [](https://www.codacy.com/gh/SC-SGS/PLSSVM/dashboard?utm_source=github.com&utm_medium=referral&utm_content=SC-SGS/PLSSVM&utm_campaign=Badge_Grade) [](https://sc-sgs.github.io/PLSSVM/) [](https://simsgs.informatik.uni-stuttgart.de/jenkins/view/PLSSVM/job/PLSSVM/job/Multibranch-Github/job/main/) [](https://github.com/SC-SGS/PLSSVM/actions/workflows/msvc_windows.yml)
 
 A [Support Vector Machine (SVM)](https://en.wikipedia.org/wiki/Support-vector_machine) is a supervised machine learning model.
-In their basic form, SVMs are used for binary classification tasks.
+In their basic form, SVMs are used for binary classification tasks.
 Their fundamental idea is to learn a hyperplane which separates the two classes best, i.e., where the widest possible margin around its decision boundary is free of data.
 This is also the reason why SVMs are also called "large margin classifiers".
 To predict to which class a new, unseen data point belongs, the SVM simply has to calculate on which side of the previously calculated hyperplane the data point lies.
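
As a rough illustration of the prediction step described above, the following Python sketch assumes a linear SVM whose hyperplane is given by a weight vector `w` and a bias `b`. Both names and the example values are illustrative assumptions and are not taken from the PLSSVM code base. The predicted class is the sign of `w·x + b`.

```python
def predict(w, b, x):
    """Return +1 or -1 depending on which side of the hyperplane w.x + b = 0 the point x lies."""
    # signed score of the point relative to the hyperplane
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


# hypothetical 2D hyperplane x0 + x1 - 1 = 0
w, b = [1.0, 1.0], -1.0
print(predict(w, b, [2.0, 2.0]))  # point on the positive side
print(predict(w, b, [0.0, 0.0]))  # point on the negative side
```

Points that lie exactly on the hyperplane are assigned to the positive class here; that tie-breaking rule is an arbitrary choice of this sketch.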
@@ -28,40 +28,46 @@ We decided to use the [Conjugate Gradient (CG)](https://en.wikipedia.org/wiki/Co
 Since one of our main goals was performance, we parallelized the implicit matrix-vector multiplication inside the CG algorithm.
 To do so, we use several frameworks to be able to target a broad variety of hardware platforms.
 The currently available frameworks (also called backends in our PLSSVM implementation) are:
-- [OpenMP](https://www.openmp.org/)
-- [CUDA](https://developer.nvidia.com/cuda-zone)
-- [OpenCL](https://www.khronos.org/opencl/)
-- [SYCL](https://www.khronos.org/sycl/) (tested implementations are [DPC++](https://github.com/intel/llvm) and [hipSYCL](https://github.com/illuhad/hipSYCL))
+
+- [OpenMP](https://www.openmp.org/)
+- [CUDA](https://developer.nvidia.com/cuda-zone)
+- [OpenCL](https://www.khronos.org/opencl/)
+- [SYCL](https://www.khronos.org/sycl/) (tested implementations are [DPC++](https://github.com/intel/llvm) and [hipSYCL](https://github.com/illuhad/hipSYCL))
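
To make the phrase "implicit matrix-vector multiplication inside the CG algorithm" more concrete, here is a minimal textbook CG sketch in plain Python. It is not the PLSSVM implementation: the system matrix `K + I/cost` with a linear kernel, and the names `make_implicit_matvec`, `points` and `cost`, are assumptions chosen only for illustration. The matrix is never stored; each entry is recomputed when needed, and the outer loop over rows is the kind of independent work a backend could run in parallel.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def make_implicit_matvec(points, cost):
    """Return a function computing (K + I/cost) * v without storing K (linear kernel assumed)."""
    def matvec(v):
        out = []
        for i, p_i in enumerate(points):  # rows are independent of each other
            acc = v[i] / cost
            for j, p_j in enumerate(points):
                acc += dot(p_i, p_j) * v[j]  # kernel entry computed on the fly
            out.append(acc)
        return out
    return matvec


def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite A given only through matvec."""
    x = [0.0] * len(b)
    r = list(b)  # residual b - A x for the initial guess x = 0
    d = list(r)  # search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old <= tol * tol:  # stop once ||r|| <= tol
            break
        ad = matvec(d)
        alpha = rs_old / dot(d, ad)
        x = [xi + alpha * di for xi, di in zip(x, d)]
        r = [ri - alpha * adi for ri, adi in zip(r, ad)]
        rs_new = dot(r, r)
        d = [ri + (rs_new / rs_old) * di for ri, di in zip(r, d)]
        rs_old = rs_new
    return x


matvec = make_implicit_matvec([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], cost=1.0)
rhs = [1.0, 2.0, 3.0]
solution = conjugate_gradient(matvec, rhs)
print(solution)
```

Because CG only ever touches the matrix through `matvec`, swapping in a parallel or GPU-based product would leave the solver loop unchanged.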
 
 ## Getting Started
 
 ### Dependencies
 
 General dependencies:
-- a C++17 capable compiler (e.g. [`gcc`](https://gcc.gnu.org/) or [`clang`](https://clang.llvm.org/))
-- [CMake](https://cmake.org/) 3.18 or newer
-- [cxxopts](https://github.com/jarro2783/cxxopts), [fast_float](https://github.com/fastfloat/fast_float) and [{fmt}](https://github.com/fmtlib/fmt) (all three are automatically built during the CMake configuration if they couldn't be found using the respective `find_package` call)
-- [GoogleTest](https://github.com/google/googletest) if testing is enabled (automatically built during the CMake configuration if `find_package(GTest)` wasn't successful)
-- [doxygen](https://www.doxygen.nl/index.html) if documentation generation is enabled
-- [OpenMP](https://www.openmp.org/) 4.0 or newer (optional) to speed up file parsing
-- multiple Python modules used in the utility scripts; <br>to install all modules use `pip install --user -r install/python_requirements.txt`
+
+- a C++17 capable compiler (e.g. [`gcc`](https://gcc.gnu.org/) or [`clang`](https://clang.llvm.org/))
+- [CMake](https://cmake.org/) 3.18 or newer
+- [cxxopts](https://github.com/jarro2783/cxxopts), [fast_float](https://github.com/fastfloat/fast_float) and [{fmt}](https://github.com/fmtlib/fmt) (all three are automatically built during the CMake configuration if they couldn't be found using the respective `find_package` call)
+- [GoogleTest](https://github.com/google/googletest) if testing is enabled (automatically built during the CMake configuration if `find_package(GTest)` wasn't successful)
+- [doxygen](https://www.doxygen.nl/index.html) if documentation generation is enabled
+- [OpenMP](https://www.openmp.org/) 4.0 or newer (optional) to speed up file parsing
+- multiple Python modules used in the utility scripts, to install all modules use `pip install --user -r install/python_requirements.txt`
 
 Additional dependencies for the OpenMP backend:
-- compiler with OpenMP support
+
+- compiler with OpenMP support
 
 Additional dependencies for the CUDA backend:
-- CUDA SDK
-- either NVIDIA [`nvcc`](https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html) or [`clang` with CUDA support enabled](https://llvm.org/docs/CompileCudaWithLLVM.html)
+
+- CUDA SDK
+- either NVIDIA [`nvcc`](https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html) or [`clang` with CUDA support enabled](https://llvm.org/docs/CompileCudaWithLLVM.html)
 
 Additional dependencies for the OpenCL backend:
-- OpenCL runtime and header files
+
+- OpenCL runtime and header files
 
 Additional dependencies for the SYCL backend:
-- the code must be compiled with a SYCL capable compiler; currently tested with [DPC++](https://github.com/intel/llvm) and [hipSYCL](https://github.com/illuhad/hipSYCL)
+
+- the code must be compiled with a SYCL capable compiler; currently tested with [DPC++](https://github.com/intel/llvm) and [hipSYCL](https://github.com/illuhad/hipSYCL)
 
 Additional dependencies if `PLSSVM_ENABLE_TESTING` and `PLSSVM_GENERATE_TEST_FILE` are both set to `ON`:
-- [Python3](https://www.python.org/) with the [`argparse`](https://docs.python.org/3/library/argparse.html), [`timeit`](https://docs.python.org/3/library/timeit.html) and [`sklearn`](https://scikit-learn.org/stable/) modules
 
+- [Python3](https://www.python.org/) with the [`argparse`](https://docs.python.org/3/library/argparse.html), [`timeit`](https://docs.python.org/3/library/timeit.html) and [`sklearn`](https://scikit-learn.org/stable/) modules
 
 ### Building
 
@@ -79,17 +85,18 @@ cmake --build .
 
 The **required** CMake option `PLSSVM_TARGET_PLATFORMS` is used to determine for which targets the backends should be compiled.
 Valid targets are:
-- `cpu`: compile for the CPU; an **optional** architectural specification is allowed but only used when compiling with DPC++, e.g., `cpu:avx2`
-- `nvidia`: compile for NVIDIA GPUs; **at least one** architectural specification is necessary, e.g., `nvidia:sm_86,sm_70`
-- `amd`: compile for AMD GPUs; **at least one** architectural specification is necessary, e.g., `amd:gfx906`
-- `intel`: compile for Intel GPUs; **at least one** architectural specification is necessary, e.g., `intel:skl`
+
+- `cpu`: compile for the CPU; an **optional** architectural specification is allowed but only used when compiling with DPC++, e.g., `cpu:avx2`
+- `nvidia`: compile for NVIDIA GPUs; **at least one** architectural specification is necessary, e.g., `nvidia:sm_86,sm_70`
+- `amd`: compile for AMD GPUs; **at least one** architectural specification is necessary, e.g., `amd:gfx906`
+- `intel`: compile for Intel GPUs; **at least one** architectural specification is necessary, e.g., `intel:skl`
 
 At least one of the above targets must be present.
 
 Note that when using DPC++ only a single architectural specification for `cpu` or `amd` is allowed.
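
The rules above for `PLSSVM_TARGET_PLATFORMS` can be summarized in a small validation sketch. The helper `parse_target_platforms` is hypothetical and is not part of PLSSVM or its CMake scripts. It assumes that targets are separated by `;` and architectures by `,`, as in the examples shown in this README, and it does not model the DPC++ restriction to a single architecture.

```python
VALID_TARGETS = {"cpu", "nvidia", "amd", "intel"}


def parse_target_platforms(spec):
    """Parse a string such as 'cpu;nvidia:sm_86,sm_70' into {target: [architectures]}."""
    platforms = {}
    for entry in spec.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        target, _, arch_part = entry.partition(":")
        if target not in VALID_TARGETS:
            raise ValueError(f"unknown target '{target}'")
        archs = [a for a in arch_part.split(",") if a]
        # only the cpu target may omit the architectural specification
        if target != "cpu" and not archs:
            raise ValueError(f"target '{target}' needs at least one architectural specification")
        platforms[target] = archs
    if not platforms:
        raise ValueError("at least one target must be present")
    return platforms


print(parse_target_platforms("cpu;nvidia:sm_86,sm_70"))
```

Checking the string before invoking CMake is only a convenience; the authoritative validation remains the project's own CMake configuration.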
 
 To retrieve the architectural specifications of the current system, a simple Python3 script `utility/plssvm_target_platforms.py` is provided
 [`GPUtil`](https://pypi.org/project/GPUtil/), [`pyamdgpuinfo`](https://pypi.org/project/pyamdgpuinfo/), and
 [`pylspci`](https://pypi.org/project/pylspci/))
@@ -120,46 +127,50 @@ cpu:avx512;intel:dg1
 ```
 
 If the architectural information for the requested GPU could not be retrieved, one option would be to have a look at:
-- for NVIDIA GPUs: [Your GPU Compute Capability](https://developer.nvidia.com/cuda-gpus)
-- for AMD GPUs: [clang AMDGPU backend usage](https://llvm.org/docs/AMDGPUUsage.html)
-- for Intel GPUs and CPUs: [Ahead of Time Compilation](https://www.intel.com/content/www/us/en/develop/documentation/oneapi-dpcpp-cpp-compiler-dev-guide-and-reference/top/compilation/ahead-of-time-compilation.html) and [Intel graphics processor table](https://dgpu-docs.intel.com/devices/hardware-table.html)
 
+- for NVIDIA GPUs: [Your GPU Compute Capability](https://developer.nvidia.com/cuda-gpus)
+- for AMD GPUs: [clang AMDGPU backend usage](https://llvm.org/docs/AMDGPUUsage.html)
+- for Intel GPUs and CPUs: [Ahead of Time Compilation](https://www.intel.com/content/www/us/en/develop/documentation/oneapi-dpcpp-cpp-compiler-dev-guide-and-reference/top/compilation/ahead-of-time-compilation.html) and [Intel graphics processor table](https://dgpu-docs.intel.com/devices/hardware-table.html)
 
 #### Optional CMake Options
 
 The `[optional_options]` can be one or multiple of:
+- `ON`: check for the SYCL backend and fail if not available
+- `AUTO`: check for the SYCL backend but **do not** fail if not available
+- `OFF`: do not check for the SYCL backend
 
 **Attention:** at least one backend must be enabled and available!
 
-- `PLSSVM_ENABLE_ASSERTS=ON|OFF` (default: `OFF`): enables custom assertions regardless of whether the `DEBUG` macro is defined or not
-- `PLSSVM_THREAD_BLOCK_SIZE` (default: `16`): set a specific thread block size used in the GPU kernels (for fine-tuning optimizations)
-- `PLSSVM_INTERNAL_BLOCK_SIZE` (default: `6`): set a specific internal block size used in the GPU kernels (for fine-tuning optimizations)
-- `PLSSVM_EXECUTABLES_USE_SINGLE_PRECISION` (default: `OFF`): enables single precision calculations instead of double precision for the `svm-train` and `svm-predict` executables
-- `PLSSVM_ENABLE_LTO=ON|OFF` (default: `ON`): enable interprocedural optimization (IPO/LTO) if supported by the compiler
-- `PLSSVM_ENABLE_DOCUMENTATION=ON|OFF` (default: `OFF`): enable the `doc` target using doxygen
-- `PLSSVM_ENABLE_TESTING=ON|OFF` (default: `ON`): enable testing using GoogleTest and ctest
-- `PLSSVM_GENERATE_TIMING_SCRIPT=ON|OFF` (default: `OFF`): configure a timing script usable for performance measurement
+- `PLSSVM_ENABLE_ASSERTS=ON|OFF` (default: `OFF`): enables custom assertions regardless of whether the `DEBUG` macro is defined or not
+- `PLSSVM_THREAD_BLOCK_SIZE` (default: `16`): set a specific thread block size used in the GPU kernels (for fine-tuning optimizations)
+- `PLSSVM_INTERNAL_BLOCK_SIZE` (default: `6`): set a specific internal block size used in the GPU kernels (for fine-tuning optimizations)
+- `PLSSVM_EXECUTABLES_USE_SINGLE_PRECISION` (default: `OFF`): enables single precision calculations instead of double precision for the `svm-train` and `svm-predict` executables
+- `PLSSVM_ENABLE_LTO=ON|OFF` (default: `ON`): enable interprocedural optimization (IPO/LTO) if supported by the compiler
+- `PLSSVM_ENABLE_DOCUMENTATION=ON|OFF` (default: `OFF`): enable the `doc` target using doxygen
+- `PLSSVM_ENABLE_TESTING=ON|OFF` (default: `ON`): enable testing using GoogleTest and ctest
+- `PLSSVM_GENERATE_TIMING_SCRIPT=ON|OFF` (default: `OFF`): configure a timing script usable for performance measurement
 
 If `PLSSVM_ENABLE_TESTING` is set to `ON`, the following options can also be set:
-- `PLSSVM_GENERATE_TEST_FILE=ON|OFF` (default: `ON`): automatically generate test files
-- `PLSSVM_TEST_FILE_NUM_DATA_POINTS` (default: `5000`): the number of data points in the test file
+
+- `PLSSVM_GENERATE_TEST_FILE=ON|OFF` (default: `ON`): automatically generate test files
+- `PLSSVM_TEST_FILE_NUM_DATA_POINTS` (default: `5000`): the number of data points in the test file
 
 If the SYCL backend is available and DPC++ is used, the option `PLSSVM_SYCL_DPCPP_USE_LEVEL_ZERO` can be used to select Level-Zero as the
 DPC++ backend instead of OpenCL.
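
Since the options listed above are ordinary CMake cache variables, they are passed on the command line in the form `-D<NAME>=<VALUE>`. The following sketch assembles such a configure command from a dictionary. The helper `cmake_configure_args` and the chosen option values are illustrative assumptions, not a script shipped with PLSSVM.

```python
import shlex


def cmake_configure_args(source_dir, options):
    """Build the argument list for a CMake configure call from a {name: value} mapping."""
    args = ["cmake"]
    for name, value in options.items():
        if isinstance(value, bool):  # map Python booleans to CMake's ON/OFF
            value = "ON" if value else "OFF"
        args.append(f"-D{name}={value}")
    args.append(source_dir)
    return args


args = cmake_configure_args("..", {
    "PLSSVM_TARGET_PLATFORMS": "cpu;nvidia:sm_86",  # required option
    "PLSSVM_ENABLE_TESTING": False,
    "PLSSVM_THREAD_BLOCK_SIZE": 16,
})
# shlex.join quotes the ';' so the line can be pasted into a POSIX shell
print(shlex.join(args))
```

Passing the list directly to `subprocess.run(args)` avoids shell quoting altogether, which matters here because the target platform string contains a semicolon.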
@@ -190,9 +201,11 @@ The resulting `html` coverage report is located in the `coverage` folder in the
 ### Creating the documentation
 
 If doxygen is installed and `PLSSVM_ENABLE_DOCUMENTATION` is set to `ON`, the documentation can be built using
+
 ```bash
 make doc
 ```
+
 The documentation of the current state of the main branch can be found [here](https://sc-sgs.github.io/PLSSVM/).
 
 ## Installing
@@ -211,8 +224,8 @@ The repository comes with a Python3 script (in the `utility_scripts/` directory)
 
 In order to use all functionality, the following Python3 modules must be installed: