3. Performance Benchmarks

The eqsp package includes a suite of performance benchmarks to measure the efficiency of core mathematical operations and identify potential bottlenecks.

3.1. Running Benchmarks

3.1.1. All-in-One Runner

To run the entire benchmark suite with default settings:

python3 benchmarks/run_benchmarks.py

3.1.2. Symmetric (Even-Collar) Benchmarks

To run the benchmark suite specifically for symmetric partitions (forcing an even number of collars):

python3 benchmarks/run_benchmarks.py --even-collars

3.1.3. Logarithmic Sampling & Scaling

The Python benchmark suite now matches the MATLAB EQSP Toolbox by using 1-2-5 logarithmic sampling (e.g., 10, 20, 50, 100, 200, 500…). This allows for more precise verification of asymptotic behavior across many orders of magnitude.

Additionally, each script now performs a Scaling Analysis by calculating the best-fitting power \(x\) in \(O(N^x)\) using a log-log regression. This automatically verifies that the implementation follows its theoretical complexity (e.g., \(O(N \log N)\) for spatial lookups).

3.1.4. Configurable Runs

The top-level runner (run_benchmarks.py) supports a subset of common configuration flags:

Flag	Description	Default
`--results-dir`	Directory to save log files.	`benchmarks/results`
`--n-max`	Scales the problem size for all benchmarks.	Varied (see below)
`--regions`	Number of regions in the partition for histogram lookups.	1000
`--s`	Exponent for the Riesz energy calculation.	`dim - 1`

Note: To run benchmarks for specific dimensions or other advanced parameters, use the individual scripts in benchmarks/src/ directly.

# Example: High-dimensional benchmark for recursive partitioning
python3 benchmarks/src/benchmark_eq_regions.py --dim 4 --n-max 1000000

# Example: Symmetric min-distance benchmark on S^2
python3 benchmarks/src/benchmark_mindist.py --dim 2 --even-collars

3.1.5. Default Scales and Dimensions (`--n-max` overrides)

The benchmark suite uses different default dimensions and problem sizes depending on the mathematical property being tested:

eq_area_error: n-max=100,000,000, dim=2 (\(S^2\))
point_set_energy_dist: n-max=50,000, dim=2 (\(S^2\))
sradius_of_cap: n-max=10,000,000, dim=3 (\(S^3\)) (standard runner only)
eq_regions: n-max=100,000,000, dim=2 (\(S^2\))
eq_min_dist: n-max=10,000,000, dim=2 (\(S^2\))
eq_find_s2_region: n-max=10,000,000, dim=2 (\(S^2\))

3.2. Benchmark Categories

eq_area_error: Measures the time to calculate area errors for a range of partition sizes. This captures \(O(N)\) recurrence overhead.
point_set_energy_dist: Measures energy and distance calculations. This captures the optimized block-based summation performance.
sradius_of_cap (standard runner only): Benchmarks the root-finding logic used for spherical cap calculations. Defaults to \(S^3\) to exercise higher-dimensional root finding.
eq_regions: Measures the overhead of the Python loop used in recursive partitioning.
eq_min_dist: Measures the performance of the structure-aware min-distance calculation.
eq_find_s2_region: Measures the performance of the vectorized histogram-based region lookup on \(S^2\).

3.3. Thesis Scaling Benchmark (Section 3.10.2)

For formal verification of the partitioning algorithm’s scaling, use the dedicated thesis benchmark:

# Run the formal d=[1..8], N=2^k sweep
python3 benchmarks/src/benchmark_eq_regions.py --show-progress

# Compare with even collars
python3 benchmarks/src/benchmark_eq_regions.py --show-progress --even-collars

This script generates high-fidelity timing data to verify the \(O(\mathcal{N}^{0.6})\) scaling theory described in Chapter 3 of the thesis. The symmetric partition method (even_collars=True) generally follows the same scaling but may be slightly faster for certain \(N\) due to the forced even number of collars simplifying the recurrence.

3.4. Interpreting Results

Benchmarks are printed to the console and also saved as individual log files in benchmarks/results/.

3.4.1. Log Naming Convention

Standard Runs (run_benchmarks.py without flags): Individual logs are saved as benchmark_[name].log; the summary log is run_benchmarks.log.
Symmetric Runs (run_benchmarks.py --even-collars): Individual logs are saved with an _even suffix (e.g., benchmark_eq_regions_even.log); the summary log is run_benchmarks_even.log.

3.4.2. Warm-up Phases

To minimize variability from JIT/caching effects, each benchmark task includes an initial un-timed warm-up call. This ensures that subsequent measurements represent steady-state performance.

This allows for side-by-side performance comparison of the different partitioning strategies.

\(O(N^2)\) scaling: Common in distance matrix calculations.
Python Loop Overhead: Evident in the recursive partitioning logic.
Cache Misses: May occur when \(N\) exceeds specific size thresholds.

For a detailed analysis of known bottlenecks and optimization opportunities, see the performance highlights in Algorithmic Implementation & Optimizations.

Note

Beta Feedback Wanted We are specifically looking for performance data and algorithm scaling results across different Operating Systems and CPU architectures (e.g., Apple Silicon, Windows 11, different Linux distros). If you run these benchmarks, please consider sharing your results in our GitHub Discussions.

3.5. Future Optimization Opportunities

3.5.1. Persistent Caching

The current implementation of eq_regions use a local, ephemeral cache to exploit hemispherical symmetry within a single function call.

An opportunity exists to add a persistent caching layer (e.g., using functools.lru_cache or a disk-backed cache) for common partition sizes (\(N\)). This would provide significant speedups for analyses that repeatedly iterate through ranges of \(N\), common in property convergence studies.

3.5.2. Energy Calculation Optimization

The calculation of the full Riesz energy sum (\(s \neq 0\)) is inherently \(O(N^2)\), as it requires visiting every pair of points.

To address the \(O(N^2)\) memory bottleneck, eqsp uses a block-based processing (tiling) approach that limits peak memory usage to \(O(N \times \text{block\_size})\). Additionally, the implementation exploits symmetry (\(d_{ij} = d_{ji}\)) to reduce the computational work by half.

3.5.3. Accelerated Loops

The recursive structure of the partitioning algorithm and the coordinate search logic are prime candidates for acceleration using Numba or Cython to remove Python interpreter overhead in hot loops.

For a comprehensive view of proposed enhancements currently under review, see the Release Roadmap.