TimeAtlas

The TimeAtlas is a purpose-built data structure for storing, querying, and analyzing genome-wide pairwise TMRCA predictions at scale.

Motivation

When running fastcxt on thousands of sample pairs across entire chromosomes, the output is a large collection of per-block, per-pair (mean, variance) vectors. The TimeAtlas organizes these into a queryable structure indexed by chromosome arm, sample pair, and genomic position.

Creating an atlas

from fastcxt.atlas import TimeAtlas
import numpy as np

atlas = TimeAtlas()

# Add results for each chromosome arm
atlas.add_arm(
    "2L",
    means=means_2L,           # (n_pairs, n_windows)
    variances=variances_2L,   # (n_pairs, n_windows)
    pairs=pairs_array,        # (n_pairs, 2)
    window_size=2000,
    mutation_rate=3.5e-9,
)

Querying pairs

# Get TMRCA profile for one pair across a chromosome arm
m, v = atlas.query_pair("2L", sample_a=0, sample_b=42)

# m: (n_windows,) log-TMRCA means
# v: (n_windows,) log-TMRCA variances

Querying positions

# Get all pairwise TMRCAs at a specific genomic position
pairs, means_at_pos, vars_at_pos = atlas.query_window("2L", position_bp=5_000_000)

# Get TMRCAs across a region
pairs, means_region, vars_region = atlas.query_region("2L", 5_000_000, 6_000_000)

Finding extreme pairs

# Which pairs have the deepest coalescence at a position?
deep = atlas.deepest_pairs("2L", position_bp=5_000_000, k=10)

# Which pairs are most closely related?
shallow = atlas.shallowest_pairs("2L", position_bp=5_000_000, k=10)

Summary statistics

print(atlas.summary())
# {
#   "n_arms": 5,
#   "total_pairs": 4950,
#   "total_windows": 115000,
#   "per_arm": {
#     "2L": {"n_pairs": 4950, "n_windows": 24682, ...},
#     ...
#   }
# }

# Per-pair genome-wide mean TMRCA
mean_tmrca_per_pair = atlas.mean_tmrca("2L")

Serialization

# Save
atlas.save("my_atlas/")

# Load
atlas = TimeAtlas.load("my_atlas/")

Storage format:

my_atlas/
├── manifest.json       # metadata, arm list, parameters
├── 2L.npz              # means, variances, pairs, window_starts
├── 2R.npz
├── 3L.npz
├── 3R.npz
└── X.npz

Iterating over pairs

for sample_a, sample_b, means, variances in atlas.iter_pairs("2L"):
    # Process each pair's TMRCA profile
    avg_tmrca = np.exp(means).mean()
    print(f"Pair ({sample_a}, {sample_b}): mean TMRCA = {avg_tmrca:.0f}")

Visualization

The TimeAtlas integrates directly with the showcase visualization script to produce publication-quality figures. See Visualization for the full gallery and usage guide.

# Generate all figures from simulated data
python scripts/plot_atlas_showcase.py --outdir figures/

The script generates geographic maps of collection sites, TMRCA landscapes across chromosome arms, population heatmaps, selective sweep panels, dense raster heatmaps, and a composite dashboard — all powered by the TimeAtlas query API.

Genome-wide TMRCA landscape

Genome-wide TMRCA landscape across 5 chromosome arms with per-population ribbons and a selective sweep dip visible on chr2L.