# Appendix F: Algorithmic & Performance Details This document provides a technical summary of the key algorithmic optimizations implemented in the **PyEQSP** Python package to ensure scalability and efficiency for large-scale sphere partitioning analysis. ## Algorithmic Improvements ### Min-Distance Optimization **Status:** Improved from $O(N^2)$ to $O(N \log N)$. - **Previous Bottleneck:** Pairwise distance matrices created using `scipy.spatial.distance.pdist` or NumPy broadcasting consumed $O(N^2)$ memory and time. - **Approach:** We leverage **KDTrees** (`scipy.spatial.KDTree`) for $S^d$ ($d \le 4$) to perform localized neighbour searches. For higher dimensions or specific partition types, we use **structure-aware searches** that exploit the recursive nature of the EQ algorithm to bound the search space. - **Result:** Calculating the min-distance for $N=100,000$ points now takes seconds rather than minutes, and memory usage remains linear. ### Recursive Partitioning Scaling **Status:** Verified $O(\mathcal{N}^{0.6})$ scaling ({ref}`[Leo07] `, Section 3.10.2). - **Approach:** The `eq_regions` function implements the Recursive Zonal Equal Area Partitioning algorithm. We performed a high-fidelity verification sweep for $d$ up to 11 and $\mathcal{N}$ up to $2^{22}$ ($\approx 4.2 \times 10^6$). - **Result:** The performance follows the theoretical $O(\mathcal{N}^{0.6})$ scaling for $S^2$, ensuring that even millions of regions can be calculated in minutes. ### Riesz Energy Calculations **Status:** Memory-efficient $O(N^2)$ with Symmetry Exploitation ($0.5 \times$ work). - **Previous Bottleneck:** Creating a full $N \times N$ distance matrix to sum $d_{ij}^{-s}$ lead to $O(N^2)$ memory exhaustion (e.g., 3.2 GB for $N=20,000$). - **Approach:** - **Block-based Summation (Tiling):** Calculations are performed in blocks of size $M \times M$, keeping the peak memory footprint at $O(N \times M)$ instead of $O(N^2)$. - **Symmetry Exploitation:** Since $d_{ij} = d_{ji}$, we only compute the upper triangle of the interaction matrix, effectively doubling the performance. - **Result:** $N=50,000$ energy calculations complete in under 15 seconds on standard 2026 hardware (e.g. Ryzen 7). ### Histogram-Based Region Lookup ($S^2$) **Status**: Fully Vectorized (Stable with "Index Rotation"). - **Previous Bottleneck**: Assigning points to regions on $S^2$ used a recursive Python loop. The first vectorized iteration faced accuracy failures at the $0/2\pi$ longitude boundary due to non-monotonic search tables in wrapping collars. - **Improved Approach**: We implement **Domain Translation (Index Rotation)** within each collar: a. **Relative Mapping**: Points and region boundaries are shifted by the collar's first boundary ($\phi_0$) using a modulo $2\pi$ operation: `(v - phi0) % 2pi`. b. **Monotonic Conversion**: The final boundary (which would wrap back to $0$) is explicitly set to $2\pi$ to ensure the table is strictly increasing. c. **Direct Lookup**: A single `np.searchsorted` call handles the entire collar in one pass, without the overhead of the legacy `lookup_table` utility (which has been retired to achieve 100% test coverage). d. **Precision Rounding**: Latitude (band) lookups include $10^{-12}$ rounding to prevent points on a boundary from jumping bands due to float variance. - **Result**: Millions of sample points can be binned with mathematical fidelity in milliseconds. This provides a $1000\times$ speedup over the previous recursive loop. ### Symmetric Partition Performance (`even_collars`) **Status:** Vectorized support in all properties functions. - **Approach:** The symmetric partitioning logic (`even_collars=True`) ensures an even number of collars. All property calculation functions (`eq_area_error`, `eq_min_dist`, `eq_energy_dist`, `eq_diam_coeff`) are fully vectorized to support this parameter. - **Result:** Symmetry calculation adds negligible overhead compared to standard partitions. In some cases, the simplified collar recurrence in symmetric mode can lead to slight performance improvements for high $N$. ## Optimized NumPy & SciPy Patterns During development, many "hot" paths were refactored to use more efficient library patterns: ### Vectorized Root Finding In `sradius_of_cap`, we replaced a Python loop over `scipy.optimize.root_scalar` with a vectorized Newton-Raphson implementation using `scipy.optimize.newton` on arrays. This provides a 10–50x speedup for high-dimensional spherical cap radius calculations. ### Efficient Coordinate Transforms Functions in `utilities.py` (like `cart2polar2`) were refactored to use: - `np.atleast_2d` to handle both single points and large arrays uniformly. - Direct array operations instead of `for` loops. - `np.arctan2` and `np.arccos` for numerically stable conversions. ### Avoiding Massive Intermediate Arrays Common patterns like `np.linalg.norm(a[:, None] - b, axis=2)` were replaced with more memory-conscious implementations or block-based loops where the intermediate broadcasting would exceed available RAM. ### Parallel Dimension Calculations ({ref}`[Leo07] `, Figure 3.7) `fig_3_7_max_diam_multi_dim.py` calculates max diameter coefficients for EQ partitions across dimensions $d=2$ to $d=8$. The dimension-8 calculation alone accounts for approximately **81%** of the total CPU time, making it the dominant bottleneck. - **Approach:** Uses `concurrent.futures.ProcessPoolExecutor` with `max_workers=2`. Dimensions are dispatched in **decreasing order** so that `dim=8` begins immediately, while the second worker handles `dim=7` and then the remaining smaller dimensions sequentially. - **Result:** For a full-fidelity thesis run ($N=2^{20}$), wall-clock time is reduced (e.g., from ~1h 45m to ~1h 24m). This matches the theoretical max improvement for a 2-worker strategy given the Amdahl's Law limit imposed by the serial `dim=8` task. --- For more details on the performance roadmap and items currently under review for future releases, see the [Release Roadmap](release_roadmap.md). For detailed benchmarks and instructions on running performance tests yourself, see the [Performance Benchmarks](benchmarks.md).