AI Blog
· daily-digest · 5 min read

AI Fake Sources, Math Comeback, and the Price of Validity

AI fabricates biomedical sources, DeepMind cracks math problems cheaply, and new studies show that validity in LLMs comes at a price.

Today's news shows how quickly AI is moving from “impressive” to “please take a closer look.” What stands out: as language models are used ever more widely in science, law, and structured-data tasks, the risks of errors, fabrications, and systemic side effects are growing too.

In short: more performance does not automatically mean more reliability. And that is exactly where things are getting interesting for research, regulation, and practice right now.

🧬 Thousands of Fake Sources in Biomedical Papers

An audit by Columbia University and other institutions covering 2.5 million biomedical papers reveals an ugly pattern: since 2023, the rate of fabricated literature references has risen more than twelvefold. The suspicious citations look remarkably convincing: correctly formatted, topically relevant, and unremarkable at first glance. That is exactly what makes the problem so tricky. When AI “invents” sources, the damage goes beyond a minor citation mistake; it can potentially reach clinical guidelines and, through them, real patient care. According to the analysis, this pattern was evident in 98 percent of the affected papers. This is no longer a fringe phenomenon but a clear signal that scientific quality control needs tightening. For you, that means: especially with medical studies, verify references even more consistently from now on. AI can help with writing, but apparently also with very convincing cheating.
Source: The Decoder
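Why a format check alone misses this problem can be shown with a small sketch. It assumes each reference carries a DOI that can be looked up somewhere; the `registry` dictionary below is a hypothetical stand-in for a real resolver or bibliographic database, and all DOIs and titles are made up. A fabricated reference sails through the format check and only fails once its existence and title are checked.

```python
import re
from dataclasses import dataclass

# Loose DOI shape: "10." + 4-9 digit registrant code + "/" + any non-space suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


@dataclass(frozen=True)
class Reference:
    title: str
    doi: str


def normalize(title: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace so formatting noise is ignored."""
    return " ".join(re.sub(r"[^\w\s]", " ", title.lower()).split())


def check_reference(ref: Reference, registry: dict[str, str]) -> str:
    """Return 'malformed', 'unknown', 'title_mismatch', or 'ok' for one reference.

    `registry` is a hypothetical stand-in for a real DOI lookup: it maps a
    lowercase DOI to the title of the record registered under it.
    """
    if not DOI_PATTERN.match(ref.doi):
        return "malformed"        # fails even the format check
    registered_title = registry.get(ref.doi.lower())
    if registered_title is None:
        return "unknown"          # well-formed, but no such record exists
    if normalize(registered_title) != normalize(ref.title):
        return "title_mismatch"   # real DOI attached to a different paper
    return "ok"


# Made-up registry and references, for illustration only.
registry = {"10.1000/real.123": "Effects of X on Y in a randomized trial"}
references = [
    Reference("Effects of X on Y in a Randomized Trial.", "10.1000/real.123"),
    Reference("Long-term outcomes of X in adults", "10.1000/fake.999"),
    Reference("Effects of X on Y", "10.1000/REAL.123"),
    Reference("Some title", "doi:10.1000/real.123"),
]
for ref in references:
    print(check_reference(ref, registry))
# prints: ok, unknown, title_mismatch, malformed (one per line)
```

The second reference is the dangerous case described above: perfectly formatted, plausible title, but nothing behind it.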

➗ AlphaProof Nexus Solves Old Math Problems — for Little Money

Google DeepMind has autonomously solved nine open Erdős problems with AlphaProof Nexus, including two that had remained unsolved for 56 years. What stands out is not only the quality of the results but also the efficiency: inference costs were only a few hundred dollars per problem. The catch: the success rate is low, at around 2.5 percent. So the system tries a lot, fails often, and occasionally hits something spectacular. The real advance lies in verification: unlike purely language-based approaches, AlphaProof checks every proof step mechanically in Lean. That is an important difference, because mathematics is not won by “sounds plausible” but by verifiable correctness. For AI research, this is a strong signal: in domains with hard rules, verifiable systems could be far more robust than generative “all-rounders.”
Source: The Decoder
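To illustrate what “checked mechanically in Lean” means, here is a toy example unrelated to the Erdős problems (a sketch, not AlphaProof output): Lean accepts each theorem only because its proof type-checks against the stated claim. A false claim such as `2 + 2 = 5` is rejected, however plausible the surrounding prose sounds.

```lean
-- A concrete fact: `decide` evaluates the proposition, and Lean's kernel re-checks the proof.
theorem two_add_two : 2 + 2 = 4 := by
  decide

-- A general fact, proved by citing the core library lemma `Nat.add_comm`.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- Changing `4` to `5` in the first theorem makes `decide` fail, so Lean rejects it.
```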

✅ Small LLMs Become Schema-Compliant, but Not Necessarily Smarter

A new study on schema constraints in small LLMs shows a classic AI dilemma: forcing models more strictly into structured outputs increases formal validity, but content correctness can suffer. Put differently: the JSON looks cleaner, but it is not automatically true. This matters especially for smaller models, which companies like to use for extraction tasks, agent workflows, and internal tools. Strict schemas help with parsing, databases, and automation, but they can also push the model to squeeze in “something that fits” instead of reasoning carefully. For production use, the lesson is clear: structured output is no substitute for verification. You get a neater package, just not guaranteed the right contents.
Source: TechCrunch
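The gap between formal validity and correctness is easy to reproduce in miniature. This sketch assumes a hypothetical invoice-extraction task with two fields and hard-coded model outputs: both responses pass the schema check, but only one is supported by the source text. The grounding check is deliberately simple (does the value literally appear in the document?) and is an illustration, not the method used in the study.

```python
import json
import re

# Hypothetical extraction schema: field name -> required Python type after json.loads.
SCHEMA = {"invoice_id": str, "total_eur": float}


def is_schema_valid(raw: str) -> bool:
    """Formal check only: a JSON object with exactly the expected keys and types."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return (
        isinstance(data, dict)
        and set(data) == set(SCHEMA)
        and all(isinstance(data[key], typ) for key, typ in SCHEMA.items())
    )


def amounts_in(text: str) -> set[float]:
    """Collect numbers such as '1,250.00' or '17' that occur in the source text."""
    return {float(m.replace(",", "")) for m in re.findall(r"\d[\d,]*(?:\.\d+)?", text)}


def is_grounded(raw: str, source: str) -> bool:
    """Content check: both extracted values must literally occur in the source."""
    data = json.loads(raw)
    return data["invoice_id"] in source and data["total_eur"] in amounts_in(source)


source = "Invoice A-17: consulting services, total due 1,250.00 EUR."
# Pretend both strings came from a small model forced into the schema above.
correct = '{"invoice_id": "A-17", "total_eur": 1250.0}'
wrong = '{"invoice_id": "A-17", "total_eur": 1200.0}'

print(is_schema_valid(correct), is_schema_valid(wrong))          # True True
print(is_grounded(correct, source), is_grounded(wrong, source))  # True False
```

The two checks are kept separate on purpose: the schema check answers “can downstream code parse this?”, the grounding check answers “is it backed by the input?”, and only the second catches the wrong total.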

⚖️ AI Lawsuits Are Putting Pressure on the US Justice System

A new study by MIT and the University of Southern California finds that the number of lawsuits filed by self-represented litigants in US federal courts has nearly doubled since ChatGPT became widespread. Meanwhile, one in five complaints now contains AI-generated text. At first glance that looks like greater access to justice, and that is exactly the dilemma. On the one hand, people who cannot afford legal counsel can now bring claims at all. On the other, the volume of poorly checked, sometimes erroneous filings is growing so fast that judges are resorting to drastic measures to keep the courts running. For legal tech, this is a real stress test: AI democratizes access but also creates new friction in the system. In short: more filings, less calm, and the judiciary suddenly has to learn prompt management too.
Source: The Decoder

🏛️ EU Pressure on Google Over the DMA

In the EU, Google remains under regulatory scrutiny. According to reports, the company faces a record fine over possible violations of the Digital Markets Act (DMA). This matters for the AI world because platform regulation and AI infrastructure are increasingly intertwined: search engines, app stores, advertising markets, and cloud ecosystems are exactly the layers on which many AI products are built. If the EU cracks down harder, that affects not only Google itself but also the startups and teams that depend on these platforms for distribution, reach, or data access. The core question remains: how much power should a gatekeeper retain when more and more digital value creation runs through these very platforms?
Source: heise online

🚗 China’s Talent Policy and the Fight for AI Experts

China is apparently tightening exit requirements for AI talent at private companies in order to keep skilled workers in the country. This is more than a personnel issue: in the global AI industry, talent is increasingly treated as a strategic resource, much like chips, energy, or cloud infrastructure. If highly qualified engineers and researchers can no longer move internationally as easily, that changes not only individual careers but also knowledge transfer between ecosystems. For Western companies, competition for top talent is becoming even more political. For China, it is an attempt to keep key expertise within its borders. A bit like “brain export controls,” just without the nice label.
Source: heise online

🛠️ Tool Tip of the Day

If you work with LLMs in production workflows, a tool for structured outputs, validation, and schema checks is worth having. Especially with small models, it can make the difference between “works most of the time” and “production-ready.” Choose a setup that checks responses against a JSON Schema, expected types, and business rules before they are processed further.
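A minimal sketch of such a setup, assuming a hypothetical extraction task that should return a `name` and an `age`: each response is parsed, checked against a small hand-written subset of JSON Schema (required fields and primitive types), then against a domain rule, and only clean output is passed on. Failed attempts are retried with the error list appended to the prompt. `fake_model` is a stand-in for a real LLM client; in practice a full validator such as the `jsonschema` package would replace the hand-written type checks.

```python
import json
from typing import Any, Callable

# Hypothetical contract, written as a tiny subset of JSON Schema
# (required properties plus primitive types only).
SCHEMA: dict[str, Any] = {
    "required": ["name", "age"],
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
}
TYPE_CHECKS: dict[str, Callable[[Any], bool]] = {
    "string": lambda v: isinstance(v, str),
    # Python's bool subclasses int, but JSON Schema does not count true/false as integers.
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
}


def validation_errors(data: Any) -> list[str]:
    """List every problem: structure and types first, then domain rules."""
    if not isinstance(data, dict):
        return ["top-level value must be a JSON object"]
    errors = [f"missing field '{key}'" for key in SCHEMA["required"] if key not in data]
    for key, spec in SCHEMA["properties"].items():
        if key in data and not TYPE_CHECKS[spec["type"]](data[key]):
            errors.append(f"field '{key}' must be of type {spec['type']}")
    # Illustrative domain rule that the type alone cannot express.
    if not errors and not 0 <= data["age"] <= 130:
        errors.append("field 'age' must be between 0 and 130")
    return errors


def validated_call(generate: Callable[[str], str], prompt: str, max_attempts: int = 3) -> dict[str, Any]:
    """Pass on only output that clears every check; otherwise retry with the errors as feedback."""
    feedback = ""
    errors: list[str] = []
    for _ in range(max_attempts):
        raw = generate(prompt + feedback)
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            errors = [f"invalid JSON: {exc.msg}"]
        else:
            errors = validation_errors(data)
            if not errors:
                return data
        feedback = "\nFix these problems: " + "; ".join(errors)
    raise ValueError(f"no valid output after {max_attempts} attempts: {errors}")


def fake_model(replies: list[str]) -> Callable[[str], str]:
    """Stand-in for a real LLM client: returns canned replies in order, ignoring the prompt."""
    it = iter(replies)
    return lambda prompt: next(it)


model = fake_model(['{"name": "Ada"', '{"name": "Ada", "age": 250}', '{"name": "Ada", "age": 36}'])
print(validated_call(model, "Extract name and age as JSON."))  # {'name': 'Ada', 'age': 36}
```

With these canned replies, the first attempt fails to parse, the second violates the age rule, and only the third reaches downstream code.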

