AI is moving quickly from simple chat to complex agentic execution. Today's news tracks the shift from chatbots to "work engines" across math, infrastructure, security, and physical-world operations.
AI Co-Mathematician Cracks Unsolved Problems
DeepMind's new agentic system, built on Gemini 3.1, is setting new highs on research-level mathematics benchmarks and has already helped resolve at least one open problem.
Here's everything you need to know:
- DeepMind modeled the tool after AI coding environments like Claude Code.
- A coordinator agent breaks research into parallel workstreams.
- Sub-agents are tasked with writing code, searching literature, and attempting proofs.
- Oxford's Marc Lackenby resolved a long-standing open problem in the Kourovka Notebook using a strategy suggested by the system.
- On Epoch AI's FrontierMath Tier 4 benchmark, the system scored 48%.
- That is about 2.5 times the 19% raw score of Gemini 3.1 Pro.
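For builders curious about the coordinator pattern described above, here is a minimal sketch in Python. It is not DeepMind's implementation: the workstream names are lifted from the bullets, and `run_subagent`, `coordinate`, and `run` are hypothetical names with placeholder bodies. It only shows the fan-out and gather shape of one coordinator dispatching parallel sub-agents.

```python
import asyncio
from typing import Dict, List

# Workstream names taken from the bullets above; everything else is assumed.
WORKSTREAMS = ("write_code", "search_literature", "attempt_proof")


async def run_subagent(task: str, problem: str) -> Dict[str, str]:
    """Placeholder sub-agent; a real one would call a model or a tool here."""
    await asyncio.sleep(0)  # yield to the event loop, standing in for I/O
    return {"task": task, "problem": problem, "status": "done"}


async def coordinate(problem: str) -> List[Dict[str, str]]:
    """Fan one problem out to every workstream and collect the results."""
    jobs = [run_subagent(task, problem) for task in WORKSTREAMS]
    # gather runs the jobs concurrently and returns results in input order
    return await asyncio.gather(*jobs)


def run(problem: str) -> List[Dict[str, str]]:
    """Synchronous entry point for scripts and tests."""
    return asyncio.run(coordinate(problem))
```

Calling `run("some open problem")` returns one result dict per workstream, in the order the workstreams are listed. A production coordinator would presumably also handle retries, timeouts, and merging partial findings, none of which is shown here.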
This is a clear signal that AI is evolving from a research assistant into an active participant in scientific discovery. For founders building in the research space, this changes the game: the barrier to entry for complex scientific modeling is falling fast.
Chip Rivals Back Shared Inference Layer
NVIDIA, AMD, and Intel have joined the same cap table, investing $100 million in RadixArk to standardize inference across hardware.
Here's everything you need to know:
- RadixArk commercializes SGLang, a widely used open-source inference engine.
- SGLang is currently deployed across 400,000+ GPUs.
- The engine is used by Google, Microsoft, xAI, Oracle, NVIDIA, and AMD.
- The round values RadixArk at $400 million.
- The engine speeds up inference by reusing already-processed context across requests and managing GPU memory more efficiently.
- It allows models to run faster and cheaper without hardware vendor lock-in.
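To make "reusing context" concrete, here is a toy sketch in Python of prefix caching: when a new request shares its leading tokens with an earlier one, only the new suffix needs processing. This is an illustration under stated assumptions, not SGLang's code. `PrefixCache` and `tokens_to_compute` are hypothetical names, and the cache stores only token IDs, whereas a real engine would cache attention state and evict entries under memory pressure.

```python
class PrefixCache:
    """Toy token-prefix cache: a trie keyed by token ID (hypothetical)."""

    def __init__(self):
        self.root = {}

    def match_len(self, tokens):
        """Return how many leading tokens are already in the cache."""
        node, matched = self.root, 0
        for token in tokens:
            if token not in node:
                break
            node = node[token]
            matched += 1
        return matched

    def insert(self, tokens):
        """Record a token sequence so later requests can reuse its prefix."""
        node = self.root
        for token in tokens:
            node = node.setdefault(token, {})


def tokens_to_compute(cache, tokens):
    """Count tokens that still need processing after reusing the cached prefix."""
    reused = cache.match_len(tokens)
    cache.insert(tokens)
    return len(tokens) - reused
```

With a shared four-token system prompt, the first request pays for every token, while a second request that differs only in its final token pays for just that one. That is the intuition behind the "faster and cheaper" claim.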
This investment signals a shift in AI infrastructure: the value is moving from pure hardware capability to software-driven compute control. Builders should note that vendor lock-in is becoming a legacy strategy.
AI Finds 100+ New Exoplanets
Astronomers at the University of Warwick have confirmed over 100 new exoplanets using the RAVEN AI system to scan NASA TESS data.
Here's everything you need to know:
- The system scanned four years of NASA TESS data covering 2.2 million stars.
- RAVEN handles detection, vetting, and confirmation in a single pass.
- The discovery includes 31 previously unknown exoplanets.
- It identified worlds in the "Neptunian Desert," a zone close to host stars where Neptune-sized planets were thought unlikely to survive the heat.
- The system operates at 10x the precision of previous methods.
- The improvement comes from smarter AI algorithms rather than new hardware.
This demonstrates that AI is not just for software tasks; it is fundamentally accelerating discovery in science by processing massive, noisy datasets that were previously beyond human capacity.
Anthropic's Mythos Finds Thousands of Zero-Days
Anthropic's unreleased Claude Mythos Preview has demonstrated capabilities that are alarming security experts and government officials alike.
Here's everything you need to know:
- The system found 271 flaws in Firefox in a single run.
- It identified a 27-year-old OpenBSD vulnerability.
- It uncovered a 17-year-old FreeBSD Remote Code Execution (RCE) flaw.
- The scale of vulnerability discovery has prompted direct calls from the Fed Chair and Treasury Secretary to bank CEOs.
- This is not a standard benchmark but a demonstration of industrialized vulnerability discovery.
For founders, this is a massive wake-up call. Security is no longer just about patching; it is about defending against an adversary that can scan your entire stack for decades-old flaws at machine speed.
⚡ Quick Hits
- NVIDIA: Jensen Huang warns agentic AI could demand 1,000% more compute than standard GenAI, shifting infrastructure needs from answers to execution.
- Verdify: The company runs a real-world greenhouse in Colorado with an OpenClaw agent, demonstrating physical-world agent loops.
- OpenAI: GPT-Realtime-2 models are shipping, bringing GPT-5-level reasoning to live speech with 96.6% accuracy on audio benchmarks.
- Span: The company is partnering with NVIDIA to mount mini-servers on homes, tapping residential grid capacity for localized AI compute.
- RadixArk: The new inference layer is becoming the neutral ground for AI infrastructure, backed by all major chipmakers.
Techlook — AI & tech signal for founders and builders.