3. Performance Benchmarks

The eqsp package includes a suite of performance benchmarks to measure the efficiency of core mathematical operations and identify potential bottlenecks.

3.1. Running Benchmarks

3.1.1. All-in-One Runner

To run the entire benchmark suite with default settings:

python3 benchmarks/run_benchmarks.py

3.1.2. Symmetric (Even-Collar) Benchmarks

To run the benchmark suite specifically for symmetric partitions (forcing an even number of collars):

python3 benchmarks/run_benchmarks.py --even-collars

3.1.3. Logarithmic Sampling & Scaling

The Python benchmark suite now matches the MATLAB EQSP Toolbox by using 1-2-5 logarithmic sampling (e.g., 10, 20, 50, 100, 200, 500…). This allows for more precise verification of asymptotic behavior across many orders of magnitude.

Additionally, each script now performs a Scaling Analysis by calculating the best-fitting power \(x\) in \(O(N^x)\) using a log-log regression. This automatically verifies that the implementation follows its theoretical complexity (e.g., \(O(N \log N)\) for spatial lookups).

3.1.4. Configurable Runs

The top-level runner (run_benchmarks.py) supports a subset of common configuration flags:

Flag

Description

Default

--results-dir

Directory to save log files.

benchmarks/results

--n-max

Scales the problem size for all benchmarks.

Varied (see below)

--regions

Number of regions in the partition for histogram lookups.

1000

--s

Exponent for the Riesz energy calculation.

dim - 1

Note: To run benchmarks for specific dimensions or other advanced parameters, use the individual scripts in benchmarks/src/ directly.

# Example: High-dimensional benchmark for recursive partitioning
python3 benchmarks/src/benchmark_eq_regions.py --dim 4 --n-max 1000000

# Example: Symmetric min-distance benchmark on S^2
python3 benchmarks/src/benchmark_mindist.py --dim 2 --even-collars

3.1.5. Default Scales and Dimensions (--n-max overrides)

The benchmark suite uses different default dimensions and problem sizes depending on the mathematical property being tested:

  • eq_area_error: n-max=100,000,000, dim=2 (\(S^2\))

  • point_set_energy_dist: n-max=50,000, dim=2 (\(S^2\))

  • sradius_of_cap: n-max=10,000,000, dim=3 (\(S^3\)) (standard runner only)

  • eq_regions: n-max=100,000,000, dim=2 (\(S^2\))

  • eq_min_dist: n-max=10,000,000, dim=2 (\(S^2\))

  • eq_find_s2_region: n-max=10,000,000, dim=2 (\(S^2\))

3.2. Benchmark Categories

  1. eq_area_error: Measures the time to calculate area errors for a range of partition sizes. This captures \(O(N)\) recurrence overhead.

  2. point_set_energy_dist: Measures energy and distance calculations. This captures the optimized block-based summation performance.

  3. sradius_of_cap (standard runner only): Benchmarks the root-finding logic used for spherical cap calculations. Defaults to \(S^3\) to exercise higher-dimensional root finding.

  4. eq_regions: Measures the overhead of the Python loop used in recursive partitioning.

  5. eq_min_dist: Measures the performance of the structure-aware min-distance calculation.

  6. eq_find_s2_region: Measures the performance of the vectorized histogram-based region lookup on \(S^2\).

3.3. Thesis Scaling Benchmark (Section 3.10.2)

For formal verification of the partitioning algorithm’s scaling, use the dedicated thesis benchmark:

# Run the formal d=[1..8], N=2^k sweep
python3 benchmarks/src/benchmark_eq_regions.py --show-progress

# Compare with even collars
python3 benchmarks/src/benchmark_eq_regions.py --show-progress --even-collars

This script generates high-fidelity timing data to verify the \(O(\mathcal{N}^{0.6})\) scaling theory described in Chapter 3 of the thesis. The symmetric partition method (even_collars=True) generally follows the same scaling but may be slightly faster for certain \(N\) due to the forced even number of collars simplifying the recurrence.

3.4. Interpreting Results

Benchmarks are printed to the console and also saved as individual log files in benchmarks/results/.

3.4.1. Log Naming Convention

  • Standard Runs (run_benchmarks.py without flags): Individual logs are saved as benchmark_[name].log; the summary log is run_benchmarks.log.

  • Symmetric Runs (run_benchmarks.py --even-collars): Individual logs are saved with an _even suffix (e.g., benchmark_eq_regions_even.log); the summary log is run_benchmarks_even.log.

3.4.2. Warm-up Phases

To minimize variability from JIT/caching effects, each benchmark task includes an initial un-timed warm-up call. This ensures that subsequent measurements represent steady-state performance.

This allows for side-by-side performance comparison of the different partitioning strategies.

  • \(O(N^2)\) scaling: Common in distance matrix calculations.

  • Python Loop Overhead: Evident in the recursive partitioning logic.

  • Cache Misses: May occur when \(N\) exceeds specific size thresholds.

For a detailed analysis of known bottlenecks and optimization opportunities, see the performance highlights in Algorithmic Implementation & Optimizations.

Note

Beta Feedback Wanted We are specifically looking for performance data and algorithm scaling results across different Operating Systems and CPU architectures (e.g., Apple Silicon, Windows 11, different Linux distros). If you run these benchmarks, please consider sharing your results in our GitHub Discussions.

3.5. Future Optimization Opportunities

3.5.1. Persistent Caching

The current implementation of eq_regions use a local, ephemeral cache to exploit hemispherical symmetry within a single function call.

An opportunity exists to add a persistent caching layer (e.g., using functools.lru_cache or a disk-backed cache) for common partition sizes (\(N\)). This would provide significant speedups for analyses that repeatedly iterate through ranges of \(N\), common in property convergence studies.

3.5.2. Energy Calculation Optimization

The calculation of the full Riesz energy sum (\(s \neq 0\)) is inherently \(O(N^2)\), as it requires visiting every pair of points.

To address the \(O(N^2)\) memory bottleneck, eqsp uses a block-based processing (tiling) approach that limits peak memory usage to \(O(N \times \text{block\_size})\). Additionally, the implementation exploits symmetry (\(d_{ij} = d_{ji}\)) to reduce the computational work by half.

3.5.3. Accelerated Loops

The recursive structure of the partitioning algorithm and the coordinate search logic are prime candidates for acceleration using Numba or Cython to remove Python interpreter overhead in hot loops.


For a comprehensive view of proposed enhancements currently under review, see the Release Roadmap.