Stealth mode — Selective conversations by introduction — contact@cathedralcompute.com
Open-weight inference, engineered at the site.
A structural 40–60% discount to hyperscaler marginal cost —
anchored by contract, captured by operation.
A 1 MW IT load cluster at PUE 1.2 draws ~1.2 MW from the grid. Market-rate data-centre power against stranded-power sites is a multiple-x cost differential that compounds over every hour the facility operates. At site-level power below $0.03/kWh, the operating cost of a megawatt of compute is a fraction of what a market-rate competitor pays — and the gap does not close with scale, silicon generations, or model releases. Real. Structural. Permanent.
Cheap power alone does not guarantee a competitive token price. Commodity GPU throughput at stranded-power cost still produces an electricity cost too high to compete at open-weight market rates. The throughput lever is not exotic silicon; it is used V100 and A100 GPUs acquired from hyperscaler refresh cycles — depreciated-but-capable data-centre silicon moving off cloud balance sheets at a meaningful discount to new-GPU cost — run under vLLM and TensorRT-LLM with quantisation and continuous batching. That stack achieves target throughput for the open-weight 7B–70B execution tier at an electricity cost of roughly $0.07 per million output tokens. Used fleets from neocloud wind-downs and cloud-operator refresh auctions are a structural supply channel, not a one-off opportunity.
The first two variables produce a cost floor. The third variable determines whether that floor is captured as revenue. A pure merchant posture at commodity open-weight rates underperforms at target site scale. A pure capacity commitment to a single counterparty is bankable but forecloses the upside. The operative structure is hybrid: approximately half of each site's capacity committed to a frontier-lab anchor counterparty on take-or-pay terms for dedicated open-weight serving, with the balance run as merchant open-weight token supply at market clearing rates. The anchor makes the site bankable at commissioning; the merchant overlay captures the spread between Cathedral's cost floor and the open-weight clearing price. This is the structure the work converges on.
The served model class is the premium open-weight band — Llama 70B-class, DeepSeek, Qwen, GLM, gpt-oss — closing the quality gap to proprietary frontier APIs for the majority of agentic execution workloads. At target throughput and site-level power cost, the hybrid structure produces a 40–60% structural discount to hyperscaler marginal cost for equivalent workloads. Site-level economics clear institutional return hurdles on the anchor slice alone; the merchant overlay is pure spread capture. These are site-level economics; the captured return accrues to Cathedral as operator.
The AI inference market is a routing stack. Per-call model selection is already standard practice: high-stakes orchestration runs on proprietary frontier models; cost-sensitive agentic sub-tasks, batch classification, document synthesis, and retrieval route to open-weight models where quality is sufficient and cost is the constraint.
Cathedral's commercial reference point is the open-weight inference clouds serving that cost-routed layer — Together AI, Fireworks, Replicate class — pricing 70B-class and larger open-weight models on market-rate electricity and new-GPU fleets. Cathedral's sites target structurally lower power costs on fleets acquired meaningfully below new-silicon replacement cost — a compounded cost advantage that stacks multiplicatively. Applied to the open-weight executor tier, that advantage produces a token cost floor at the site level.
Stranded power + used GPUs + open-weight models = a token cost floor that is structural, not cyclical. That is the asset.
The $5–25/M token pricing of proprietary frontier APIs is not Cathedral's competitive set. It is Cathedral's demand driver — the pressure that forces frontier labs and enterprise AI builders to route cost-tolerant workloads to cheaper infrastructure, creating and sustaining the market Cathedral's sites are built to serve.
The electricity advantage does not erode with chip generations or model releases. A new chip available to Cathedral is equally available to any competitor — and the used-fleet supply channel Cathedral is built on grows as the industry refreshes into each new generation. The stranded-power condition — local demand absent, transmission to market uneconomical — is not solved by a model release or a faster chip. It is a function of geography, physics, and the economics of power transmission. It accrues to whoever holds the site.
The first chapter of AI was about training. Enormous GPU clusters, hundreds of megawatts, billions of dollars, months of computation — all to produce a model weight file. That chapter is closing. The frontier models now exist. They are trained. Many are open. The capital and the engineering attention of the global AI industry is rotating, decisively, toward inference — the act of running those models at production scale, billions of times per day.
By 2025, inference already accounted for 80–90% of all AI compute cycles. By 2026, inference spending is projected to represent two-thirds of total AI infrastructure investment. Within that aggregate, the fastest-growing slice by token volume is the agentic executor tier — cost-sensitive, latency-tolerant, open-weight-model-dominant — which already consumes the majority of tokens routed through multi-agent systems. The inference market is segmenting, and Cathedral is built for the largest, fastest-growing, most price-sensitive segment.
Massive GPU clusters building foundation models. Dominated by Nvidia H100/B200. Requires ultra-high bandwidth interconnects, precision-critical arithmetic, enormous power budgets. The largest frontier labs continue to train, but open-weight releases have fundamentally altered the economics — frontier-class capability is now freely deployable by anyone. The capital intensity of training does not reduce the inference opportunity; it amplifies it. Every new open-weight release becomes a new revenue-generating workload for Cathedral.
Real-time chat, voice assistants, coding copilots — applications where a human is waiting for a response and sub-200ms time-to-first-token matters. Latency is the product. Expensive real estate, premium connectivity, proximity to population centres. A competitive market with established players. Requires proximity to end users. Not Cathedral's primary target.
Autonomous agents, multi-step reasoning, batch synthesis, background processing — workloads where no human waits for the next token, and the primary constraint is cost per million tokens at sustained throughput. Crucially, this is not a purely open-weight market. Frontier labs themselves route a growing share of their cost-sensitive, latency-tolerant workloads through open-weight models on cheaper infrastructure, blending proprietary and open-weight inference across a single agentic pipeline depending on the task. The orchestrator plans; the executor serves. Cathedral's infrastructure is built for this executor tier of the stack — wherever open-weight models are the right tool and token price is the governing constraint. This segment is growing faster than consumer-facing inference by an order of magnitude and is structurally uncaptured by existing high-cost infrastructure.
Autonomous agent hierarchies are replacing the human-at-keyboard model of AI interaction. An orchestrator agent decomposes a goal and delegates to sub-agents — each of which calls specialist models for reasoning, coding, retrieval, verification, and summarisation. A single user intent can propagate to hundreds or thousands of model invocations before the goal is resolved. Agentic pipelines are not monolithic: high-stakes orchestration may use a proprietary frontier model, but the bulk of sub-agent calls — document parsing, code generation, classification, summarisation, retrieval synthesis — are well-served by open-weight models running at a fraction of the cost. The intelligence hierarchy in an agentic system is a cost-routing problem as much as a capability problem.
IDC projects a thousandfold increase in token loads by 2027. Deloitte found enterprise AI inference bills growing 340% quarter-on-quarter from Q4 2025 to Q1 2026. The pressure to find cheaper inference is the defining operational constraint of the agentic era — and it compounds as pipeline complexity grows. Ultra-low latency is not a requirement for the executor tier, which is precisely what makes stranded-power geography viable. Cathedral's target jurisdictions all sit within 160ms round-trip latency of the EU, US, and Middle East demand centres they serve. A sub-agent synthesising a research brief, processing a document batch, or running a background classification task is indifferent to whether the model responds in 40ms or 140ms. Token cost at sustained throughput is the governing variable.
The training era was built on new GPUs. The inference era is being subsidised by them. Every hyperscaler refresh into H100, B200, and B300 creates a structural supply of depreciated V100 and A100 fleets moving off cloud balance sheets at a meaningful discount to new-GPU cost basis. These older GPUs remain highly capable for the open-weight 7B–70B executor-tier workloads Cathedral is built to serve; the inefficiency was in pricing them at cloud retail, not in the silicon itself.
Cathedral acquires this fleet through hyperscaler refresh auctions, neocloud wind-downs, end-of-lease channels, and direct broker relationships — then runs it under vLLM, TensorRT-LLM, and SGLang with quantisation and continuous batching to achieve ~150,000 output tok/s/MW on the target open-weight workload. No inference-native ASIC is required to clear the economics. The combination of used silicon, mature open-source serving stacks, and site-level power below $0.03/kWh is what produces the structural cost floor.
The workhorse for the smallest executor-tier workloads: document parsing, classification, retrieval synthesis, short-form agentic sub-calls. V100s acquired through the structural refresh supply channel pay back at conservative utilisation on a rapid cycle. The premise that training-class GPUs are obsolete for inference work under 13B parameters is mistaken; at Cathedral's cost basis they are structurally competitive with anything new in their workload class.
The core of Cathedral's fleet. Four A100 80GBs deliver a Llama 3.3 70B serving endpoint at competitive token rates; two A100 80GBs handle DeepSeek V3-class distilled models. The 80GB memory envelope is the critical spec — it is what distinguishes A100 80GB from A100 40GB for modern open-weight serving, and what makes end-of-lease supply meaningful rather than incremental. These are the units the commercial model is built on.
The throughput advantage does not come from new silicon; it comes from the open-source serving stack that has matured through 2024–2026. Continuous batching materially lifts used-GPU effective throughput over batch-one serving. Quantisation to 4–8 bit preserves quality on the target open-weight models while expanding tokens-per-dollar. Prefill caching under SGLang lowers the marginal cost of long-context agentic workloads. This stack is what makes a used-GPU fleet competitive with new silicon on tokens-per-kWh; it is as much a part of the cost moat as the power contract.
Why not new H100s: the cost basis is wrong for this workload class. H100s are built for frontier training and inference-on-frontier-models; the executor tier Cathedral serves does not need them and cannot amortise them. The economics favour volume of capable-and-cheap silicon over the performance ceiling of new-and-expensive silicon.
Why not inference-native ASICs (SN50, WSE-3, Rebel100): the software ecosystem is not yet mature at Cathedral's scale. Open-source serving stacks (vLLM, TensorRT-LLM, SGLang) are tightly bound to Nvidia CUDA; porting to alternative silicon is an engineering tax that the cost-per-token advantage does not yet cover at realistic deployment sizes. The ASIC case will clear as those ecosystems mature, and Cathedral's PropCo/OpCo structure accommodates hardware-class evolution at the compute-overlay level without re-underwriting the site. For now, the used-GPU arbitrage is the decisive variable.
Cathedral is a vertically integrated power, real-estate, and inference operator. Each site is a physical asset — long-tenure power contract, purpose-built modular shell, used-GPU compute fleet running open-weight models — held in a per-site SPV under a PropCo / OpCo structure, with capital matched layer-by-layer to the useful life of each component. Power and shell are infrastructure-grade and sit in PropCo; the compute fleet and commercial contracts sit in OpCo, where Cathedral operates the site and captures the operating margin. The physical and operational advantage compounds at the site.
Each site sits in a dedicated per-site SPV. Physical assets — power contracts, land, shell, cooling — are ring-fenced in PropCo. Operating activity — the GPU fleet, anchor and merchant contracts, the inference serving stack, commercial relationships — sits in OpCo. Failure at any single site is non-recourse to the holdco. Capital stacks and commercial structures can vary site by site without disturbing the portfolio.
The foundation asset and the moat. Long-dated offtake on stranded hydro, associated gas, or structurally underpriced grid. Contracted duration defines the outer horizon of every downstream decision. Financed against PPA quality with infrastructure-fund and insurance-company debt. Capital matched to the useful life of the power asset, not to chip generations. This is the reason Cathedral exists.
Modular, rapidly-deployable, purpose-built for air-cooled and immersion-cooled GPU racks at 20–30 kW/rack density, compatible with used V100 and A100 SXM fleets. Standardised design compresses lead time from site origination to first MW energised. Rebuild cadence tied to cooling and power topology, not GPU generation. Anchor take-or-pay contracts unlock CRE-style debt at investment-grade coupons once site-level credit is established.
Cathedral as operator: used V100 and A100 80GB fleet acquired from hyperscaler refresh cycles, running open-weight models under vLLM and TensorRT-LLM. Revenue split between an anchor take-or-pay slice and a merchant token-serving overlay — the hybrid posture detailed in Section 007. Selection of specific hybrid mix is made per site against anchor credit, merchant demand absorption, and risk-adjusted return. The operating margin belongs to Cathedral.
Cathedral captures this advantage as operator — anchor + merchant, per site, per contract.
Cathedral deploys wherever electricity is structurally underpriced — where generation capacity exists but local demand is insufficient to absorb it, transmission to higher-demand markets is uneconomical, or associated gas is flared at the wellhead for lack of monetisation. That may mean grid power from a gas-fired system in a jurisdiction with structurally low tariffs, associated gas at a remote production site with no pipeline access, or a purpose-built SEZ with allocated industrial capacity. Grid-connected or off-grid, the selection criteria are identical: kWh price and supply reliability.
Two distinct site channels operate in parallel. The first is purpose-built digital infrastructure capacity inside Salalah Free Zone — 400 MW+ planned, 30-yr corporate tax holiday, 100% foreign ownership, power delivered from the Dhofar Power System at $0.035/kWh under the Cost Reflective Tariff for large industrial users. Sohar Industrial Port offers a second grid-connected SEZ site with similar economics. The second channel is on-pad and near-pad deployment alongside Oman's large upstream operators — PDO, Oxy, BP, OQ — whose interior production blocks generate associated gas at remote sites with no economic pipeline route to market. Oman flared ~1.0–1.2 bcm of associated gas across 111 flare sites in 2022 (World Bank GGFR) — equivalent to ~500–575 MW of continuous power potential at standard generator efficiency, with additional upside from gas currently reinjected at non-EOR-critical fields. PDO endorsed Zero Routine Flaring by 2030, creating active commercial pull for offtakers. These are bilaterally negotiable stranded-power opportunities at gas-cost economics, sized to the field. Direct access to AAE-1, PEACE, OEG, SMW5 subsea cables from the coast. Equinix SN1 (opened Nov 2024) provides cross-connect proximity. USD/OMR peg since 1986 — zero FX risk.
Algeria has 25.1 GW of installed gas-fired generation capacity (2024) against a peak demand of 19.1 GW — a nominal surplus of ~6 GW, with ~3–4 GW realistically available as industrial offtake outside peak periods. Separately, Algeria flared ~8.6 bcm of associated gas in 2022 (World Bank) — equivalent to ~4,100 MW continuous power potential at standard generator efficiency across 209 remote Saharan flare sites. Sonatrach is under hard regulatory pressure to eliminate routine flaring by 2030. Domestic industrial electricity: $0.02–0.04/kWh subsidised. Mediterranean subsea cables provide connectivity.
Congo-Brazzaville flared ~64 Bcf (~1.81 bcm) of associated gas in 2022 (World Bank GGFR) — equivalent to ~850 MW of continuous power potential at standard generator efficiency. As of 2024, ~33% of gross gas production was still flared and ~17% reinjected (Cedigaz), against a domestic installed capacity of just 830 MW serving a country with ~50% transmission losses. The FLNG export project (live since Feb 2024) monetises only a portion of associated gas; remaining flare volumes at upstream sites operated by Perenco, TotalEnergies, and Eni remain stranded at the wellhead. Government has set 2030 flare-elimination targets and is actively recruiting industrial offtakers. Domestic gas-fired generation (CEC Pointe-Noire, 450 MW) demonstrates the gas-to-power infrastructure baseline. WACS subsea cable lands at Pointe-Noire. CFA franc pegged to EUR — structural FX stability.
Cathedral is an operator at every site. The company owns the hardware, runs the compute, and captures the operating margin. The commercial question at each site is how to combine a bankable anchor counterparty with merchant token supply to produce returns that clear both a debt test and an equity-return test.
The operative structure, validated in the site-level financial model, is hybrid anchor + merchant: approximately half of each 10 MW site's capacity committed to a frontier-lab anchor counterparty on take-or-pay terms, with the balance run as merchant open-weight token supply. The anchor makes the site bankable at commissioning; the merchant overlay captures the spread between Cathedral's cost floor and the open-weight market clearing price. Two alternative postures exist as fallback and as evolution; neither is the target.
The served model class across all postures is the premium open-weight band — Llama 70B-class, DeepSeek, Qwen, GLM, gpt-oss — where quality meets cost and proprietary-API pricing creates structural routing pressure from the agentic executor tier. The commodity 70B floor is not the strategy.