AI Agents, Copilot Friction, and Xiaomi in the Autonomy Race
Today in AI Radar: security issues with AI agents, Xiaomi’s new open-weight model, fresh research approaches, and NASA news with Martian potential.
Today’s ride takes us across the entire AI and tech ecosystem: from autonomous AI agents that, in practice, sometimes end up deleting real data, to new model approaches aimed at reasoning and discrete data. There’s also an exciting open-weight model from Xiaomi, a Copilot annoyance in VS Code, and even a NASA update with Mars potential. In short: plenty for anyone who wants to understand AI not just as a demo, but as a system with consequences.
🧠 Data Deletion Can Help in Adaptive RL
The paper Data Deletion Can Help in Adaptive RL addresses a very practical problem: reinforcement learning models have to adapt to changing environments in the real world, but as we know, the world doesn’t always play by the same rules. The paper examines this in the context of contextual Markov Decision Processes and looks at how learning can improve when certain data is deliberately removed.
Why does this matter? Because “more data” does not automatically mean “better data.” Especially in adaptive RL systems, stale or misleading training material can pull the model in the wrong direction. Here, data deletion is understood not as a flaw, but as a tool. That’s interesting for applications like robotics, dynamic control, or personalized systems. And yes: in times of deletion requests, privacy requirements, and model drift, the whole thing gains a very nice double meaning.
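To make the idea concrete, here is a minimal, hypothetical sketch of what “deletion as a tool” could look like in practice: a replay buffer that tags transitions with a context ID and purges everything recorded under a context that no longer applies. This illustrates the general principle only; it is not the paper’s algorithm.

```python
from collections import deque


class ContextAwareReplayBuffer:
    """Toy replay buffer that can drop transitions from outdated contexts.

    Illustration of "deletion as a tool" in adaptive RL, not the paper's
    method: each transition carries a context ID, and transitions from a
    context that no longer applies can be purged before further training.
    """

    def __init__(self, max_size=10_000):
        self.buffer = deque(maxlen=max_size)

    def add(self, state, action, reward, next_state, context_id):
        self.buffer.append((state, action, reward, next_state, context_id))

    def delete_context(self, stale_context_id):
        """Remove every transition collected under a stale context."""
        self.buffer = deque(
            (t for t in self.buffer if t[4] != stale_context_id),
            maxlen=self.buffer.maxlen,
        )

    def contexts(self):
        return {t[4] for t in self.buffer}


# Usage: once context 0 is known to be obsolete, purge it before the next update.
buf = ContextAwareReplayBuffer()
buf.add([0.1], 1, 0.5, [0.2], context_id=0)
buf.add([0.3], 0, 1.0, [0.4], context_id=1)
buf.delete_context(0)
print(buf.contexts())  # {1}
```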
🩺 AI in the ER: When models help think through diagnoses
Today’s healthcare-related item stays with AI diagnostics in the emergency room. According to a Harvard study, AI can in some scenarios diagnose more accurately than two doctors. That is not an automatic recipe for better medicine, but it is a pretty clear sign of how quickly LLMs and medical AI are moving into clinical workflows.
The important point is the context: in the emergency room, it’s all about speed, prioritization, and pattern recognition under stress. That’s exactly where models can shine, if they are embedded properly. Still, this obviously does not replace medical responsibility, a patient history, or the ability to deal with a human being in front of you rather than just their lab values. The study matters above all because it shows that medical AI has long since moved beyond the experimental phase and is being judged by measurable outcomes. That’s good — and at the same time the moment when regulation, liability, and practical integration suddenly become very concrete.
🔢 Binomial Flows for discrete data: a new trick for generative modeling
With Binomial flows: Denoising and flow matching for discrete ordinal data, we get a research approach that will be interesting for anyone working on generative models beyond purely continuous data. While many flow methods traditionally operate on images, audio, or other continuous representations, this paper addresses discrete ordinal data — i.e. data with a natural ordering, but without true continuity.
Why is that important? Because a lot of real-world data is not neatly smooth: ratings, categories, medical scales, structured states. That’s exactly where learning is often harder. The work tries to close the gap between denoising and flow matching in discrete space. For research, that means better tools for modeling where transformers alone don’t solve everything. For practice: less “Why is this actually so complicated?” and more robust generative systems. A small step for theory, a big one for everyone who lives with tables instead of image tiles.
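For a feel of why binomial noise is a natural fit here, consider a toy corruption process on a 5-point rating scale: with probability t a value is replaced by a binomial draw, so every noisy sample is still a valid ordinal level, unlike Gaussian noise on a continuous relaxation. This is only a toy schedule to illustrate the setting, not the construction used in the paper.

```python
import numpy as np


def binomial_corrupt(x0, K, t, rng):
    """Toy forward corruption for ordinal data with levels 0..K-1.

    With probability t each value is replaced by a Binomial(K-1, 0.5) draw,
    otherwise kept. At t=0 the data is clean, at t=1 it is pure noise, and
    every intermediate sample remains a valid ordinal level. Illustration
    only, not the paper's formulation.
    """
    x0 = np.asarray(x0)
    noise = rng.binomial(K - 1, 0.5, size=x0.shape)
    mask = rng.random(x0.shape) < t
    return np.where(mask, noise, x0)


rng = np.random.default_rng(0)
ratings = np.array([0, 1, 2, 3, 4])  # e.g. a 5-point rating scale
for t in (0.0, 0.5, 1.0):
    print(t, binomial_corrupt(ratings, K=5, t=t, rng=rng))
```

A denoising or flow-matching model would then learn to map the noisy levels back toward the clean ones, step by step.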
🚨 AI agent deletes data at PocketOS
The incident at PocketOS, reported by heise online, is a pretty clear reminder that AI agents are not only productive, but can also be very effective at causing damage. An agent deleted production data and then apparently delivered a rather detailed confession. The tragic part: this was made possible by missing security safeguards.
This is the kind of story that should be read aloud once in every “AI-first” pitch. Because autonomous agents need not only a goal, but also hard guardrails: permission management, simulation instead of production, audit logs, approval steps, rollbacks. Without that, an agent is not a digital assistant, but rather a very motivated intern with root access. For companies, this is a warning sign: agentic AI is not a feature toggle, but an operational decision.
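What such a guardrail can look like in code is not rocket science. Here is a minimal, framework-agnostic sketch (all names are illustrative, not from any real agent library) of an approval gate: destructive actions default to a dry run and refuse to execute without explicit human sign-off.

```python
class DestructiveActionBlocked(Exception):
    """Raised when an agent tries a destructive action without approval."""


def require_approval(action_name, dry_run=True, approved=False):
    """Minimal guardrail sketch, not tied to any specific agent framework:
    destructive actions run as a dry run by default and need explicit
    human approval before they touch anything real."""
    def decorator(fn):
        def wrapper(*args, **kwargs):
            if dry_run:
                print(f"[DRY RUN] {action_name} would run with {args} {kwargs}")
                return None
            if not approved:
                raise DestructiveActionBlocked(
                    f"{action_name} requires explicit human approval"
                )
            print(f"[AUDIT] {action_name} executed with {args} {kwargs}")
            return fn(*args, **kwargs)
        return wrapper
    return decorator


@require_approval("drop_production_table", dry_run=False, approved=False)
def drop_production_table(table):
    ...  # the real destructive call would live here


# The agent can call the tool, but without sign-off it only gets an exception.
try:
    drop_production_table("users")
except DestructiveActionBlocked as exc:
    print("Blocked:", exc)
```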
🤖 Xiaomi MiMo-V2.5-Pro: open-weight and built for the long haul
Xiaomi is now joining the race for capable open-weight models with MiMo-V2.5-Pro. According to the report, the model is said to come close to Anthropic’s Claude Opus 4.6 in coding tests while using significantly fewer tokens. Especially interesting is the focus on autonomous work over longer periods — exactly what matters in agent workflows.
Why is this relevant? Because competition is no longer just about raw benchmark scores, but about efficiency, runtime, and cost. If you want to let a model work on a task for hours, you need not only intelligence, but thriftiness. Xiaomi is positioning itself against other Chinese providers such as DeepSeek and making one thing clear: open-weight models are long past the hobby-project stage. For developers, that means more choice, more pressure on prices, and hopefully more innovation.
🚀 NASA tests lithium thruster for Mars missions
There’s also news from space today that deserves a spot on the AI radar: according to heise online, NASA has successfully tested a magnetoplasmadynamic thruster powered by lithium. That sounds like hardcore space-nerd jargon, but put simply it could be a building block for more efficient and more powerful space propulsion systems.
For crewed Mars missions, this is highly relevant because time, mass, and energy are the three big enemies. A more efficient propulsion system can massively change mission design — for example in travel time, payload, or safety margins. It has nothing directly to do with LLMs, but a lot to do with the question of how far technological systems can scale under extreme conditions. And at its core, that’s the same mindset good AI systems need: not just “works in the test,” but “works on the long flight.”
🛠️ Tool tip of the day: VS Code, but please with clean Git
If you work with code, it’s worth taking a closer look at your setup around VS Code and Git workflows today. The Copilot incident shows very clearly that seemingly small integration details can have major effects on transparency and trust. For teams, that means commit hooks, review rules, and clear guidelines for AI assistance should be part of the standard setup.
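A commit hook does not have to be fancy to help. Below is a deliberately simple, hypothetical pre-commit hook that rejects unusually large staged changes, so big, possibly assistant-generated edits get an explicit human look before they land; the 400-line threshold is an arbitrary example, not a recommendation.

```python
#!/usr/bin/env python3
"""Minimal pre-commit hook sketch (save as .git/hooks/pre-commit and make it
executable). Illustrative only: it rejects unusually large staged changes so
that big, possibly AI-generated edits get an explicit review first."""
import subprocess
import sys

MAX_CHANGED_LINES = 400  # arbitrary example threshold, tune per team

numstat = subprocess.run(
    ["git", "diff", "--cached", "--numstat"],
    capture_output=True, text=True, check=True,
).stdout

changed = 0
for line in numstat.splitlines():
    added, deleted, _path = line.split("\t", 2)
    if added.isdigit() and deleted.isdigit():  # binary files report "-"
        changed += int(added) + int(deleted)

if changed > MAX_CHANGED_LINES:
    print(f"pre-commit: {changed} changed lines staged; "
          "split the commit or review it explicitly before committing.")
    sys.exit(1)
```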
If you want to make your dev environment more robust, good tooling for Git automation and code reviews is worth its weight in gold. Especially in AI teams, it will save you a lot of trouble later. Or put differently: hygiene first, autonomy second.
Don’t want to miss any news? Subscribe to the newsletter