Commit 4122d89
Author: Ian
Parent: 36ed57f

    version bump.
    Updated libraries. Updated README. Added documentation to functions.

9 files changed: 548 additions & 112 deletions
Cargo.lock

Lines changed: 9 additions & 9 deletions
Some generated files are not rendered by default.

Cargo.toml

Lines changed: 5 additions & 4 deletions
```diff
@@ -1,11 +1,12 @@
 [package]
 name = "single_algebra"
-version = "0.8.6"
+version = "0.8.7"
 edition = "2021"
 license-file = "LICENSE.md"
 description = "A linear algebra convenience library for the single-rust library. Can be used externally as well."
 categories = ["science"]
 repository = "https://github.com/SingleRust/single-algebra"
+homepage = "https://singlerust.com"

 [[bench]]
 name = "csc_matrix_benchmark"
@@ -28,10 +29,10 @@ nalgebra-sparse = "0.10"
 ndarray = { version = "0.16", features = ["rayon"] }
 nshare = { version = "0.10.0", features = ["ndarray", "nalgebra"] }
 num-traits = "0.2.19"
-rayon = "1.10.0"
-simba = { version = "0.9.0", optional = true }
+rayon = "1.11.0"
+simba = { version = "0.9.1", optional = true }
 rand = "0.9.2"
-single-utilities = "0.8.0"
+single-utilities = "0.8.5"
 linfa-tsne = {version = "0.7.1"}
 linfa = "0.7.1"
 single-svdlib = "1.0.6"
```

README.md

Lines changed: 103 additions & 98 deletions
````diff
@@ -1,99 +1,98 @@
 # single-algebra 🧮

-A powerful linear algebra and machine learning utilities library for Rust, providing efficient matrix operations, dimensionality reduction, and statistical analysis tools.
+A high-performance linear algebra library optimized for sparse matrices and dimensionality reduction algorithms. Designed for machine learning, data analysis, and scientific computing applications where efficiency with sparse data is crucial.

 ## Features 🚀

-- **Efficient Matrix Operations**: Support for both dense and sparse matrices (CSR/CSC formats)
-- **Dimensionality Reduction**: PCA implementations for both dense and sparse matrices
-- **SVD Implementations**: Multiple SVD backends including LAPACK and Faer
-- **Statistical Analysis**: Comprehensive statistical operations with batch processing support
-- **Similarity Measures**: Collection of distance/similarity metrics for high-dimensional data
-- **Masking Support**: Selective data processing with boolean masks
-- **Parallel Processing**: Efficient multi-threaded implementations using Rayon
-- **Feature-Rich**: Configurable through feature flags for specific needs
+- **Sparse Matrix Operations**: Efficient CSR/CSC matrix implementations with comprehensive operations
+- **Advanced PCA**: Multiple PCA variants including standard and masked sparse PCA
+- **Flexible SVD**: Support for both Lanczos and randomized SVD algorithms
+- **Feature Masking**: Selective analysis of feature subsets for targeted dimensionality reduction
+- **Parallel Processing**: Multi-threaded operations using Rayon for large datasets
+- **Memory Efficient**: Optimized for large, sparse datasets that don't fit in memory
+- **Type Generic**: Supports both `f32` and `f64` numeric types
+- **Utilities**: Data preprocessing with normalization and logarithmic transformations

-## Matrix Operations 📊
+## Core Modules 📊

-- **SVD Decomposition**: Choose between parallel, LAPACK, or Faer implementations
-- **Sparse Matrix Support**: Comprehensive operations for CSR and CSC sparse matrix formats
-- **Masked Operations**: Selective data processing with boolean masks
-- **Batch Processing**: Statistical operations grouped by batch identifiers
-- **Normalization**: Row and column normalization with customizable targets
-
-## Dimensionality Reduction ⬇️
-
-- **PCA Framework**: Flexible implementation with customizable SVD backends
-- **Dense Matrix PCA**: Optimized implementation for dense matrices
-- **Sparse Matrix PCA**: Memory-efficient PCA for sparse matrices
-- **Masked Sparse PCA**: Apply PCA on selected features only
-- **Incremental Processing**: Support for large datasets that don't fit in memory
-
-## Similarity Measures 📏
-
-- **Cosine Similarity**: Measure similarity based on the cosine of the angle between vectors
-- **Euclidean Similarity**: Similarity based on Euclidean distance
-- **Pearson Similarity**: Measure linear correlation between vectors
-- **Manhattan Similarity**: Similarity based on Manhattan distance
-- **Jaccard Similarity**: Measure similarity as intersection over union
-
-## Statistical Analysis 📈
-
-- **Basic Statistics**: Mean, variance, sum, min/max operations
-- **Batch Statistics**: Compute statistics grouped by batch identifiers
-- **Matrix Variance**: Efficient variance calculations for matrices
-- **Nonzero Counting**: Count non-zero elements in sparse matrices
-- **Masked Statistics**: Compute statistics on selected rows/columns only
+### Sparse Matrix Operations
+- **CSR/CSC Formats**: Comprehensive sparse matrix support with efficient storage
+- **Matrix Arithmetic**: Sum operations, column statistics, and element-wise operations
+- **Memory Optimization**: Designed for large, high-dimensional sparse datasets
+
+### Dimensionality Reduction ⬇️
+- **Sparse PCA**: Principal Component Analysis optimized for sparse CSR matrices
+- **Masked Sparse PCA**: PCA with feature masking for selective analysis
+- **SVD Algorithms**: Choice between Lanczos (exact) and randomized (fast) SVD methods
+- **Variance Analysis**: Explained variance ratios and cumulative variance calculations
+- **Feature Importance**: Component loading analysis for feature interpretation
+
+### Data Preprocessing
+- **Normalization**: Row and column normalization utilities
+- **Log Transformations**: Log1P transformations for numerical stability
+- **Centering**: Optional data centering for PCA and other algorithms

 ## Installation

 Add this to your `Cargo.toml`:

 ```toml
 [dependencies]
-single-algebra = "0.5.0"
+single-algebra = "0.8.6"
 ```

-### Feature Flags
+## Usage Examples

-Enable optional features based on your needs:
+### Sparse PCA with Builder Pattern

-```toml
-[dependencies]
-single-algebra = { version = "0.5.0", features = ["lapack", "faer"] }
-```
+```rust
+use nalgebra_sparse::CsrMatrix;
+use single_algebra::dimred::pca::{SparsePCABuilder, SVDMethod};
+use single_algebra::dimred::pca::sparse::PowerIterationNormalizer;

-Available features:
-- `smartcore`: Enable integration with the SmartCore machine learning library
-- `lapack`: Use the LAPACK backend for linear algebra operations
-- `faer`: Use the Faer backend for linear algebra operations
-- `simba`: Enable SIMD optimizations via simba
+// Create or load your sparse matrix (samples × features)
+let sparse_matrix: CsrMatrix<f64> = create_your_sparse_matrix();

-## Usage Examples
+// Build PCA with customized parameters
+let mut pca = SparsePCABuilder::new()
+    .n_components(50)
+    .center(true)
+    .verbose(true)
+    .svd_method(SVDMethod::Random {
+        n_oversamples: 10,
+        n_power_iterations: 7,
+        normalizer: PowerIterationNormalizer::QR,
+    })
+    .build();
+
+// Fit and transform data
+let transformed = pca.fit_transform(&sparse_matrix).unwrap();
+
+// Analyze results
+let explained_variance_ratio = pca.explained_variance_ratio().unwrap();
+let cumulative_variance = pca.cumulative_explained_variance_ratio().unwrap();
+let feature_importance = pca.feature_importances().unwrap();
+```

-### Basic PCA with LAPACK Backend
+### Masked Sparse PCA for Feature Subset Analysis

 ```rust
-use ndarray::{Array2, ArrayView2};
-use single_algebra::dimred::pca::dense::{PCABuilder, LapackSVD};
+use single_algebra::dimred::pca::{MaskedSparsePCABuilder, SVDMethod};

-// Create a sample matrix
-let data = array![[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]];
+// Create a feature mask (true = include, false = exclude)
+let feature_mask = vec![true, false, true, true, false, true]; // Include features 0, 2, 3, 5

-// Build PCA with LAPACK backend
-let mut pca = PCABuilder::new(LapackSVD)
-    .n_components(2)
+// Build masked PCA
+let mut masked_pca = MaskedSparsePCABuilder::new()
+    .n_components(10)
+    .mask(feature_mask)
     .center(true)
-    .scale(false)
+    .verbose(true)
+    .svd_method(SVDMethod::Lanczos)
     .build();

-// Fit and transform data
-pca.fit(data.view()).unwrap();
-let transformed = pca.transform(data.view()).unwrap();
-
-// Access results
-let components = pca.components().unwrap();
-let explained_variance = pca.explained_variance_ratio().unwrap();
+// Perform PCA on selected features only
+let transformed = masked_pca.fit_transform(&sparse_matrix).unwrap();
 ```

 ### Sparse Matrix Operations
@@ -103,52 +102,58 @@ use nalgebra_sparse::{CooMatrix, CsrMatrix};
 use single_algebra::sparse::MatrixSum;

 // Create a sparse matrix
-let mut coo = CooMatrix::new(3, 3);
-coo.push(0, 0, 1.0);
-coo.push(1, 1, 2.0);
-coo.push(2, 2, 3.0);
+let mut coo = CooMatrix::new(1000, 5000);
+// ... populate with data ...
 let csr: CsrMatrix<f64> = (&coo).into();

-// Calculate column sums
+// Efficient column operations
 let col_sums: Vec<f64> = csr.sum_col().unwrap();
+let col_squared_sums: Vec<f64> = csr.sum_col_squared().unwrap();
 ```

-### Batch Processing
+### Data Preprocessing

 ```rust
-use nalgebra_sparse::CsrMatrix;
-use single_algebra::sparse::BatchMatrixMean;
+use single_algebra::{Normalize, Log1P};

-// Sample data with batch identifiers
-let matrix = create_sparse_matrix();
-let batches = vec!["batch1", "batch1", "batch2", "batch2", "batch3"];
+// Apply preprocessing transformations
+let normalized_data = your_data.normalize()?;
+let log_transformed = your_data.log1p()?;
+```

-// Calculate mean per batch
-let batch_means = matrix.mean_batch_col(&batches).unwrap();
+## Algorithm Selection Guide

-// Access results for a specific batch
-let batch1_means = batch_means.get("batch1").unwrap();
-```
+### When to Use Each PCA Variant

-### Similarity Measures
+- **SparsePCA**: For standard dimensionality reduction on sparse matrices
+- **MaskedSparsePCA**: When you need to analyze specific feature subsets or handle missing data patterns

-```rust
-use ndarray::Array1;
-use single_algebra::similarity::{SimilarityMeasure, CosineSimilarity};
+### SVD Method Selection

-let a = Array1::from_vec(vec![1.0, 2.0, 3.0]);
-let b = Array1::from_vec(vec![4.0, 5.0, 6.0]);
+- **Lanczos**: More accurate, deterministic results. Best for smaller problems or when precision is critical
+- **Randomized**: Faster computation, especially for large matrices. Configurable accuracy vs. speed trade-off

-let cosine = CosineSimilarity;
-let similarity = cosine.calculate(a.view(), b.view());
-```
+### Performance Optimization
+
+- Use sparse matrices (CSR format) for datasets with >90% zero values
+- Enable verbose mode to monitor performance and convergence
+- For very large datasets, consider using randomized SVD with appropriate oversampling
+- Parallel processing is automatically utilized for transformation operations
+
+## Planned Features 🚧
+
+- **t-SNE**: t-Distributed Stochastic Neighbor Embedding for non-linear visualization
+- **UMAP**: Uniform Manifold Approximation and Projection for manifold learning
+- **Additional similarity measures**: More distance metrics and similarity functions
+- **Batch processing**: Enhanced support for processing data in chunks

-## Performance Considerations
+## Performance Focus

-- For large matrices, consider using sparse representations (CSR/CSC)
-- Enable the appropriate backend (`lapack` or `faer`) based on your needs
-- Use masked operations when working with subsets of data
-- Batch processing can significantly improve performance for grouped operations
+This library is specifically optimized for:
+- **Large sparse datasets** (text analysis, genomics, recommendation systems)
+- **Memory-constrained environments**
+- **High-dimensional data** requiring dimensionality reduction
+- **Scientific computing** workflows requiring numerical precision

 ## Contributing
````
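The updated README swaps the dense LAPACK example for the sparse builder API. To make its SVD selection guide concrete, here is a minimal sketch contrasting the two back ends on a toy matrix. It reuses the `SparsePCABuilder`, `SVDMethod`, and `PowerIterationNormalizer` names exactly as they appear in the README examples above; treat the API as assumed from those examples rather than verified against the crate.

```rust
use nalgebra_sparse::{CooMatrix, CsrMatrix};
use single_algebra::dimred::pca::sparse::PowerIterationNormalizer;
use single_algebra::dimred::pca::{SparsePCABuilder, SVDMethod};

fn main() {
    // Toy samples × features matrix; real inputs would be far larger and sparser.
    let mut coo = CooMatrix::new(4, 3);
    coo.push(0, 0, 1.0);
    coo.push(1, 1, 2.0);
    coo.push(2, 2, 3.0);
    coo.push(3, 0, 4.0);
    let matrix: CsrMatrix<f64> = (&coo).into();

    // Lanczos: exact and deterministic; per the README, the choice when
    // precision matters or the problem is small.
    let mut exact = SparsePCABuilder::new()
        .n_components(2)
        .center(true)
        .svd_method(SVDMethod::Lanczos)
        .build();
    let _exact_scores = exact.fit_transform(&matrix).unwrap();

    // Randomized: faster on large matrices; oversampling and power iterations
    // trade accuracy against speed.
    let mut fast = SparsePCABuilder::new()
        .n_components(2)
        .center(true)
        .svd_method(SVDMethod::Random {
            n_oversamples: 10,
            n_power_iterations: 7,
            normalizer: PowerIterationNormalizer::QR,
        })
        .build();
    let _fast_scores = fast.fit_transform(&matrix).unwrap();
}
```

The only difference between the two runs is the `svd_method` setting, which is exactly the knob the README's selection guide describes.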

src/dimred/mod.rs

Lines changed: 19 additions & 0 deletions
```diff
@@ -1,2 +1,21 @@
+//! # Dimensionality Reduction
+//!
+//! This module provides algorithms for reducing the dimensionality of high-dimensional data
+//! while preserving important structural properties. These techniques are essential for
+//! data visualization, noise reduction, and computational efficiency improvements.
+//!
+//! ## Currently Available
+//! - **PCA** ([`pca`]): Principal Component Analysis for linear dimensionality reduction
+//!
+//! ## Planned Implementations
+//! - **t-SNE**: t-Distributed Stochastic Neighbor Embedding for non-linear visualization
+//! - **UMAP**: Uniform Manifold Approximation and Projection for manifold learning
+//! - Additional manifold learning techniques
+//!
+//! ## Algorithm Selection Guide
+//! - Use **PCA** for linear relationships, feature analysis, and when interpretability is important
+//! - Use **t-SNE** (when available) for non-linear visualization of clusters and local structure
+//! - Use **UMAP** (when available) for preserving both local and global structure in embeddings
+
 pub mod pca;
 //pub mod tsne;
```
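The new module docs name the `pca` submodule as the only implemented entry point so far. As a companion illustration, here is a minimal sketch of that path using the masked variant from the updated README; the `MaskedSparsePCABuilder`, `mask`, and `SVDMethod::Lanczos` names are assumed from that README example rather than checked against the crate source.

```rust
use nalgebra_sparse::{CooMatrix, CsrMatrix};
use single_algebra::dimred::pca::{MaskedSparsePCABuilder, SVDMethod};

fn main() {
    // Tiny 4 × 6 matrix standing in for real sparse data.
    let mut coo = CooMatrix::new(4, 6);
    coo.push(0, 0, 1.0);
    coo.push(1, 2, 2.0);
    coo.push(2, 3, 3.0);
    coo.push(3, 5, 4.0);
    let matrix: CsrMatrix<f64> = (&coo).into();

    // Only features 0, 2, 3, and 5 participate in the decomposition.
    let mask = vec![true, false, true, true, false, true];

    let mut pca = MaskedSparsePCABuilder::new()
        .n_components(2)
        .mask(mask)
        .center(true)
        .svd_method(SVDMethod::Lanczos)
        .build();

    // Fit on the masked feature subset and project the samples.
    let _embedding = pca.fit_transform(&matrix).unwrap();
}
```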
