v1.0.1 · MIT License · Python ≥ 3.9

Ultra-optimized heap operations for Python,
written in C.

A single-file C extension delivering SIMD acceleration, GIL-releasing computation, and a multi-tier dispatch system that selects the optimal algorithm at runtime.

$ pip install heapx
✓ Copied to clipboard
$ conda install mukherjee08::heapx
✓ Copied to clipboard
Drop-in. Zero dependencies.
Six functions. One import. Immediate performance.
import heapx

# Min-heap — Floyd's bottom-up O(n) construction
data = [5, 2, 8, 1, 9, 3, 7]
heapx.heapify(data)

# Max-heap — native support, no negation hacks
heapx.heapify(data, max_heap=True)

# Push, pop, remove, replace, merge
heapx.push(data, 6, max_heap=True)
largest = heapx.pop(data, max_heap=True)
top3 = heapx.pop(data, n=3, max_heap=True)

# GIL-releasing for multi-threaded workloads
floats = [float(x) for x in range(1_000_000)]
heapx.heapify(floats, nogil=True)

# Custom keys — computed once in O(n), not O(n log n)
tasks = [{"pri": 10}, {"pri": 1}, {"pri": 5}]
heapx.heapify(tasks, cmp=lambda x: x["pri"])
Engineered at every layer.
9451 lines of C. An 11-priority dispatch table. Zero compromises.
🔁

Drop-In heapq Replacement

Six functions (heapify, push, pop, remove, replace, merge) with native max-heap support, bulk operations, and custom key functions. No negation hacks. No wrappers.

Floyd's Bottom-Up Heapify

Wegener 1993 variant descends to a leaf comparing only children, then bubbles up, reducing comparisons by ~25% versus standard sift-down. O(n) construction.

🧠

Runtime Algorithm Selection

An 11-priority dispatch table selects the optimal code path at runtime based on data type, heap size, arity, key function presence, and element homogeneity.

🔬

SIMD Acceleration

AVX2, SSE2, and ARM NEON intrinsics for type pointer scanning (4 ptrs/cycle) and child selection in quaternary/octonary heaps. Scalar fallbacks on all platforms.

🏎️

Fast Comparison Paths

Type-specialized C comparisons for int, float, bytes, str, bool, and tuple, bypassing PyObject_RichCompareBool entirely. Recursive fast-compare for nested tuples.

🧵

GIL-Releasing Computation

For homogeneous int/float arrays: extracts raw C values, releases the GIL, performs pure-C heapify, then rearranges Python objects via cycle-following permutation.

🌳

Multi-Arity Heaps

Binary, ternary, quaternary, and octonary heaps with bit-shift parent/child calculations. Arity 1–64 supported. Cache-friendly access patterns for large n.

🔑

Precomputed Key Caching

Keys computed in a single O(n) pass via Python 3.8+ vectorcall protocol, eliminating one tuple allocation per element. Stack-buffered for n ≤ 128.

🎯

Zero-Overhead Fast Paths

push(heap, item) and pop(heap) bypass all argument parsing for the common case, saving ~20ns per call. Bulk push delegates to O(n+k) heapify when k ≥ n.

🛡️

Memory Safety

Py_INCREF/DECREF on all objects across comparisons. List-size validation after every Python callback. NaN-aware ordering across all float paths. No undefined behavior.

📦

Stack-First Allocation

Key arrays ≤128 and value arrays ≤2,048 elements use stack buffers with architecture-aligned hints, avoiding malloc/free overhead for the common case.

🌍

Universal Compatibility

Python 3.9–3.14. macOS, Linux, Windows. x86-64, ARM64, RISC-V, POWER. GCC, Clang, MSVC. Works on any mutable sequence, not just lists. Zero dependencies.

Six functions. Complete control. README.md
Every function shares consistent parameters: max_heap, cmp, arity, nogil.
heapify
heapx.heapify(heap, max_heap=False, cmp=None, arity=2, nogil=False)

Transforms a mutable sequence into a valid heap in-place using Floyd's bottom-up algorithm. Two-phase dispatch: homogeneous type-specialized paths first, then generic arity-selected paths.

O(n)
push
heapx.push(heap, items, max_heap=False, cmp=None, arity=2, nogil=False)

Inserts one or more items. Lists trigger batch mode. Zero-overhead fast path for single binary min-heap push saves ~20ns per call. Bulk push delegates to heapify when k ≥ n.

O(log n) single · O(n+k) bulk
pop
heapx.pop(heap, n=1, max_heap=False, cmp=None, arity=2, nogil=False)

Removes and returns top element(s). n=1 returns single item, n>1 returns list. Type-specialized sift-down for bulk pop with GIL-releasing path for homogeneous arrays.

O(log n) single · O(k log n) bulk
remove
heapx.remove(heap, indices=None, object=None, predicate=None, ...)

Removes items by index, object identity, or predicate. Single-index hot path achieves O(log n) via inline sift-up/sift-down without full re-heapify.

O(log n) single · O(k + n) batch
replace
heapx.replace(heap, values, indices=None, object=None, predicate=None, ...)

Replaces items by index, identity, or predicate while maintaining heap property. Single-index path writes value and performs O(log n) sift.

O(log n) single · O(k + n) batch
merge
heapx.merge(*heaps, max_heap=False, cmp=None, arity=2, nogil=False)

Merges two or more sequences into a single valid heap. O(N) concatenation followed by Floyd's heapify with full dispatch including SIMD/nogil paths.

O(N) total
Real-world applications.
Explore how heapx powers production workloads across domains.
Runs everywhere Python runs.
Native SIMD on x86-64 and ARM64. Scalar fallbacks on all other architectures.
macOS · ARM64 (Apple Silicon) · NEON
macOS · x86-64 · AVX2 / SSE2
Linux · x86-64 · AVX2 / SSE2
Linux · ARM64 · NEON / SVE
Windows · x86-64 · AVX2 / SSE2
RISC-V · RV64 · Scalar
IBM POWER · PPC64 · Scalar
Compilers: GCC ≥ 4.9 · Clang ≥ 3.6 · MSVC ≥ 2015. Pip builds with -O3 -march=native -flto. Conda uses portable baselines.