AI Breakthroughs: Math, Reasoning, and New AI Methods
DeepMind solves math problems for little money, agents save compute on reasoning, and new papers show where AI research is heading right now.
Today brings several examples of AI research advancing on two fronts at once: more capability at lower cost. What’s especially exciting is that the gains come not just from bigger models, but from clever methods that make better use of compute, verification, and search. In short: less “just add more GPUs,” more “think more efficiently, machine.”
🔬 DeepMind solves math problems for a few hundred dollars
Google DeepMind’s new system AlphaProof Nexus has autonomously solved nine open Erdős problems, including two questions that had remained open for 56 years. What’s impressive is not just the result, but the path to it: every proof step is machine-checked in the Lean proof assistant. So instead of “trust me, bro,” there’s a formal proof that the computer actually accepts.
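To give a feel for what “machine-checked” means, here is a minimal Lean 4 sketch. It is a toy statement chosen for illustration, not one of the Erdős problems and not output from AlphaProof Nexus. The point is only that Lean accepts the file if and only if every step type-checks.

```lean
-- Toy example (an assumption for illustration, unrelated to the Erdős proofs).
-- Lean accepts this theorem only because the proof term really has the stated type.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- A concrete fact that Lean closes by evaluating both sides.
example : 2 + 2 = 4 := rfl
```

If the right-hand side of the second example were changed to `5`, Lean would reject it. That rejection is what makes a rarely successful search useful: wrong proofs cannot slip through.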
Why does that matter? Because several trends come together here: math, formal verification, LLMs, and agentic search. According to the report, inference costs are only a few hundred dollars per problem, though with a success rate of around 2.5 percent. That may sound modest at first, but in research the asymmetry is exactly the point: a system that rarely succeeds can still be a huge lever if every success is formally verified, because a wrong attempt costs little and a correct one can be checked mechanically rather than taken on trust. For the AI and math community, this is a strong signal that automated theorem proving is no longer just demo material.
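A quick back-of-the-envelope calculation shows why a low success rate is workable when failures are cheap to discard. The sketch below assumes, purely for illustration, that the reported 2.5 percent can be read as an independent per-attempt success probability. The report does not say that, and the function names are made up for this example.

```python
import math

def expected_attempts(p: float) -> float:
    """Mean number of independent attempts until the first success."""
    return 1.0 / p

def prob_at_least_one_success(p: float, n: int) -> float:
    """Chance that at least one of n independent attempts succeeds."""
    return 1.0 - (1.0 - p) ** n

def attempts_for_confidence(p: float, target: float) -> int:
    """Smallest n whose chance of at least one success reaches `target`."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))

p = 0.025  # assumption: 2.5 percent treated as a per-attempt probability
print(expected_attempts(p))
print(prob_at_least_one_success(p, 40))
print(attempts_for_confidence(p, 0.95))
```

Under that assumption, roughly 40 attempts are needed on average, and the formal check filters out every failed one automatically.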
Source: The Decoder
🤖 AutoTTS saves around 70 percent compute on reasoning
A research team from UMD, Google, Meta, and other institutions has developed AutoTTS, an AI agent that learns on its own when a language model should keep thinking and when it can stop. The key idea: instead of rigidly using the same reasoning mode for every task, the agent optimizes the control policy itself. The result is said to require around 70 percent less compute than the standard Self-Consistency method at comparable accuracy.
That matters quite a bit for everyday LLM use. Much of the cost comes not from the model itself, but from it thinking unnecessarily hard on tasks that do not need it. If an agent can learn when a model needs an extra reasoning step and when it does not, reasoning becomes much more efficient, and therefore cheaper. According to the report, the search for the algorithm took only 40 US dollars and 160 minutes. In AI terms, that’s roughly in the category of “the coffee was more expensive than the experiment.”
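To make the contrast concrete, here is a minimal sketch of standard Self-Consistency (sample a fixed number of answers, take the majority) next to a simple early-stopping variant. The stopping rule is hand-written and purely an assumption for illustration. AutoTTS reportedly learns its control policy, and this is not that policy. The `sample` callable stands in for one reasoning pass of a model.

```python
import random
from collections import Counter
from typing import Callable, Tuple

def self_consistency(sample: Callable[[], str], n: int) -> Tuple[str, int]:
    """Fixed budget: always draw n answers and return the majority answer."""
    votes = Counter(sample() for _ in range(n))
    return votes.most_common(1)[0][0], n

def early_stop_vote(sample: Callable[[], str], max_n: int,
                    min_n: int = 3, margin: int = 3) -> Tuple[str, int]:
    """Adaptive budget: stop once the leading answer is `margin` votes ahead."""
    votes: Counter = Counter()
    for used in range(1, max_n + 1):
        votes[sample()] += 1
        if used >= min_n:
            ranked = votes.most_common(2)
            runner_up = ranked[1][1] if len(ranked) > 1 else 0
            if ranked[0][1] - runner_up >= margin:
                return ranked[0][0], used
    return votes.most_common(1)[0][0], max_n

# Toy stand-in for a model that answers "42" about 80 percent of the time.
rng = random.Random(0)
def toy_model() -> str:
    return "42" if rng.random() < 0.8 else "41"

print(self_consistency(toy_model, 16))
print(early_stop_vote(toy_model, 16))
```

On easy questions where the samples agree quickly, the adaptive variant returns after a handful of passes, while the fixed-budget version always pays for all of them.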
Source: The Decoder
🧭 SeedER: search in knowledge graphs gets smarter
SeedER (Seed-and-Expand Retrieval from Knowledge Graphs) is a new approach that aims to make retrieval over knowledge graphs more efficient. The problem is familiar: knowledge graphs are strong for relational facts, but their structure is irregular. Classic neighborhood expansion quickly blows up as the number of hops grows, and dense embeddings often struggle with multi-hop, compositional queries.
SeedER tackles exactly that by first starting the search from promising seeds and then expanding in a targeted way. For LLM-based systems, this is exciting because retrieval is not just about “finding a few matching documents,” but is often the difference between a useful answer and elegantly packaged nonsense. Especially in agent setups, scientific search, or enterprise RAG, an approach like this can help reduce costs and improve hit quality. If knowledge graphs sometimes felt like a well-organized but overloaded office archive, SeedER seems to be trying to find the right folder faster.
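The general seed-then-expand pattern can be sketched in a few lines. This is a generic illustration under loud assumptions, not SeedER’s actual algorithm: the word-overlap scoring, the `pick_seeds` and `expand` helpers, and the toy graph are all invented here. The idea shown is only that a budgeted, query-guided expansion avoids visiting every neighbor.

```python
import heapq
from typing import Dict, List, Tuple

Graph = Dict[str, List[Tuple[str, str]]]  # node -> [(relation, neighbor)]

def overlap(text: str, query: str) -> int:
    """Crude relevance score: number of shared lowercase words (an assumption)."""
    return len(set(text.lower().split()) & set(query.lower().split()))

def pick_seeds(graph: Graph, query: str, k: int = 1) -> List[str]:
    """Choose up to k nodes whose names overlap most with the query."""
    ranked = sorted(graph, key=lambda node: overlap(node, query), reverse=True)
    return [node for node in ranked[:k] if overlap(node, query) > 0]

def expand(graph: Graph, seeds: List[str], query: str,
           budget: int = 3) -> List[Tuple[str, str, str]]:
    """Best-first expansion: follow the most query-relevant edges, up to `budget`."""
    visited = set(seeds)
    heap: List[Tuple[int, int, str, str, str]] = []
    counter = 0  # tie-breaker so the heap never compares beyond the score

    def push_edges(node: str) -> None:
        nonlocal counter
        for relation, neighbor in graph.get(node, []):
            heapq.heappush(heap, (-overlap(relation, query), counter,
                                  node, relation, neighbor))
            counter += 1

    for seed in seeds:
        push_edges(seed)
    facts: List[Tuple[str, str, str]] = []
    while heap and len(facts) < budget:
        _, _, src, relation, dst = heapq.heappop(heap)
        facts.append((src, relation, dst))
        if dst not in visited:
            visited.add(dst)
            push_edges(dst)
    return facts

toy_graph: Graph = {
    "Marie Curie": [("born in", "Warsaw"), ("field", "physics")],
    "Warsaw": [("capital of", "Poland"), ("river", "Vistula")],
    "Poland": [("continent", "Europe")],
}
toy_query = "Curie born in city capital of which country"
seeds = pick_seeds(toy_graph, toy_query)
print(expand(toy_graph, seeds, toy_query, budget=2))
```

With a budget of two edges, the expansion follows “born in” and then “capital of” and never touches the irrelevant “field” or “river” edges. That is the kind of saving a targeted expansion is after.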
Source: arXiv
🧠 Learnability-Informed Fine-Tuning for Diffusion Language Models
Another paper looks at how to improve the reasoning abilities of Diffusion Language Models (DLMs). Standard supervised fine-tuning (SFT) often works well for autoregressive models, but it can actually hurt DLMs. The authors argue that classic SFT pays too little attention to what a model is able to learn and when it should learn it.
The interesting point here is less “yet another fine-tuning recipe” and more the underlying idea: training data and training order should be guided by a model’s learnability. This fits a broader trend in current AI research: not every model automatically benefits from the old familiar recipes. Especially with new architectures, post-training strategy often has to be rethought from scratch. For ambitious newcomers, that means the era in which “just slap on SFT and you’re done” reliably works is over for good. Welcome to the engine room.
Source: arXiv
📣 The Pope and AI: It’s really about power
TechCrunch reports on the first encyclical of Pope Leo XIV, and the takeaway is striking: the text is not really about AI as a technology, but uses AI as a lens for older, bigger problems, namely concentration of power, loss of democracy, and a tech elite shaping the world according to its own interests. That is almost uncomfortably timeless.
For the AI news landscape, this matters because AI is increasingly serving as a catalyst for social debate. The discussion is long past model sizes, benchmarks, or benchmarks about benchmarks. It is now about the distribution of influence, regulation, and the question of who controls the infrastructure of digital everyday life. That a papal text strikes this tone shows one thing above all: the AI debate has firmly arrived at the center of society. And it will stay there for now.
Source: TechCrunch
🛠️ Tool tip of the day
If you’re experimenting with reasoning, verification, or agentic workflows, it’s worth taking a look at tools around Lean, RAG pipelines, or benchmarking frameworks for LLMs. Projects like AlphaProof in particular show how important formal validation becomes once AI is expected not just to generate text, but to deliver real proofs, make decisions, or run critical workflows.