The dominant assumption in AI is that you win by building more: more data centres, more GPUs, more gigawatts. It is largely an American bet, underwritten by capital Europe has never matched, which is why the continent’s standard refrain is that it has already lost the compute race for lack of funding. Ora Computing, a seed-stage company in Vienna, has just raised €3.5 million on a sharper reading of that same constraint: if you cannot out-build the hyperscalers, out-shrink them by making the model smaller rather than the cluster bigger. The figure that makes the bet legible is not the raise but the demonstration. Ora compressed a 70-billion-parameter model in a matter of hours for a compute cost of under $1,000, against an industry norm it puts in the hundreds of thousands of dollars for comparable work.

The company was founded in 2025 by Stefan Sack and Raimel Medina, two researchers from the Serbyn group at the Institute of Science and Technology Austria, the well-funded research campus outside Vienna that likes to compare itself to a European MIT. They were working in quantum computing, a field where the payoff is famously always a decade out, and left it for a problem with a bill attached today: the models work, but running them is ruinously expensive. That is a revealing place for two physicists to aim. The glamour in AI is in training the next frontier model. The cost, increasingly, is in serving the last one.

The expensive part is the part nobody photographs

Inference is the unglamorous half of AI: not training a model, but running it, over and over, every time someone asks it a question. At scale it is also the faster-growing half. Ora cites deployments where compute alone runs to tens of millions of euros a month, and the bill compounds as models keep getting larger. The market underneath this is not small. One industry estimate puts the global AI inference market at roughly $255 billion by 2030, growing close to 20% a year. The second problem is physical rather than financial: a frontier model is often simply too big to fit on the hardware where you actually want it (a car, a factory machine, a phone), so it has to call out to a data centre, which adds cost and latency and rules out anything that needs to run offline.

Ora’s software compresses those models, cutting memory use by up to 80% and running them as much as four times faster, while keeping accuracy loss between zero and 5%. The design choice that matters is what it does not require. Where rival tools force a choice among a handful of preset compression levels, Ora says its algorithm continuously maps the full trade-off between size and accuracy, so a customer can dial in exactly the point that suits their hardware and budget. And it drops into standard inference frameworks without custom software layers, infrastructure changes, or expensive retraining. The company has already run the approach past customers in automotive and edge silicon, the two sectors where “run the model on the device, not in the cloud” is a hard requirement rather than a preference.
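To make the memory figures concrete, here is a back-of-envelope sketch in Python. It is not Ora’s method. It assumes, which Ora has not stated, that the uncompressed baseline stores weights as 16-bit floats (2 bytes per parameter), and it counts weights only, ignoring activations and caches. The 48 GB device budget is a hypothetical example, not a customer figure, and the function names are illustrative.

```python
# Back-of-envelope weight-memory arithmetic for a 70B-parameter model.
# Assumptions (not stated by Ora): the uncompressed baseline stores weights
# as 16-bit floats (2 bytes per parameter); activations and caches are ignored.

PARAMS = 70e9            # 70 billion parameters
BYTES_PER_PARAM = 2      # assumed FP16/BF16 baseline
MAX_REDUCTION = 0.80     # "up to 80%" memory reduction, as reported


def weight_memory_gb(params: float, bytes_per_param: float, reduction: float = 0.0) -> float:
    """Weight memory in gigabytes (10**9 bytes) after a fractional reduction."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return params * bytes_per_param * (1.0 - reduction) / 1e9


def min_reduction_to_fit(params: float, bytes_per_param: float, budget_gb: float) -> float:
    """Smallest fractional reduction that lets the weights fit in budget_gb."""
    if budget_gb <= 0:
        raise ValueError("budget_gb must be positive")
    baseline = weight_memory_gb(params, bytes_per_param)
    return max(0.0, 1.0 - budget_gb / baseline)


if __name__ == "__main__":
    baseline = weight_memory_gb(PARAMS, BYTES_PER_PARAM)
    compressed = weight_memory_gb(PARAMS, BYTES_PER_PARAM, MAX_REDUCTION)
    print(f"baseline weights: {baseline:.1f} GB")
    print(f"at 80% reduction: {compressed:.1f} GB")

    budget = 48.0  # hypothetical device memory budget in GB
    needed = min_reduction_to_fit(PARAMS, BYTES_PER_PARAM, budget)
    within_claim = needed <= MAX_REDUCTION
    print(f"needed for {budget:.0f} GB: {needed:.1%} (within reported range: {within_claim})")
```

Under those assumptions the weights need about 140 GB, and an 80% cut brings them to about 28 GB. The hypothetical 48 GB budget needs roughly a 66% reduction, a number unlikely to coincide with any preset level; that is the practical argument for a continuous trade-off curve, since a tool with fixed steps may force more compression, and more accuracy loss, than the hardware actually requires.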

A crowded room with a narrow door

The reasons to doubt are real, and they have names. Model compression is a busy field. Neural Magic raised $50 million from Andreessen Horowitz before Red Hat bought it in late 2024. Qualcomm ships its own efficiency toolkit, and Intel maintains a free compression library. Both are good, and both work best on their maker’s own chips, which is the catch. The case for funding Ora anyway is that hardware-agnostic, vendor-neutral compression has a genuine reason to exist precisely because the incumbents’ tools quietly lock you in, and inference is the line item every AI budget is trying to bend down. A tool that lowers the bill without marrying you to one silicon vendor is selling the one thing the chip companies will never sell.

The round was led by Constructor Capital, a Swiss deep-tech investor, and Greencode Ventures, a Helsinki firm that backs applied AI in energy and industry, with founding backer XISTA Science Ventures, part of the ISTA innovation network, returning. It is Ora’s first outside money, and the €3.5 million goes to hiring, extending the compression to the largest frontier models, and shipping a commercial product aimed at cloud inference providers and edge deployments. There is a climate line in the pitch too, the kind that helps with a Helsinki energy fund: Ora estimates that at 1% market penetration its technology would save more than 50,000 tonnes of CO2 a year, because a smaller model burns less power to answer the same question.

It is worth seeing where the rest of the efficiency money is going, because it frames the bet. Most capital chasing “more efficient AI” is buying hardware: custom chips, better cooling, bigger and denser data centres. Ora is one of the few companies wagering that the cheapest way to make AI cheaper is to make the model itself smaller in software. That is the rarer position, and it is unproven at commercial scale. Ora is a €3.5 million answer to a tens-of-millions-a-month question, and the people selling shovels would much prefer you not notice you can dig with a smaller one. The economics, not the pitch, will decide whether the answer holds.