The gap between AI's hype and AI's proof is closing faster than expected. Today's stories together paint a picture of a technology that is no longer just generating plausible-sounding text: it is producing verifiable results, real security findings, and market-shaping financials, all in the same 24 hours. Here's what matters for builders and founders.
AlphaProof Nexus Solved Nine Open Math Problems for Under $1,000 Each
Google DeepMind's latest system, AlphaProof Nexus, combines a large language model with the Lean proof assistant, a formal verification environment where every step must mathematically check out. The result is a system that doesn't hallucinate proofs; it either produces a verified one or fails outright.
Here's everything you need to know:
- AlphaProof Nexus solved nine open Erdős problems, including two that had been unsolved for 56 years
- Cost per solved problem: approximately $100–900 in compute
- The system also proved 44 open conjectures from the On-Line Encyclopedia of Integer Sequences (OEIS)
- This comes one week after OpenAI's model disproved an 80-year-old mathematical conjecture
- The cost trajectory is striking: a 56-year-old problem solved for under $1,000 in compute
This is what "AI for science" actually looks like when it works. Formal verification forces the model to show its work, so a confabulated step cannot slip through the checker. No plausible-sounding nonsense: either the proof holds in the formal system or it doesn't.
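To make "holds in the formal system or it doesn't" concrete, here is a minimal Lean 4 sketch. It is a toy illustration only, not output from AlphaProof Nexus, and the theorem name `sum_commutes` is made up for this example. It relies only on Lean's core library.

```lean
-- Lean 4, core library only (no Mathlib required).
-- The checker accepts this theorem only because the supplied term
-- really is a proof of the stated proposition.
theorem sum_commutes (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- A concrete fact that holds by direct computation.
example : 2 + 2 = 4 := rfl

-- A false claim is rejected at check time rather than "sounding right".
-- Uncommenting the next line makes the file fail to compile:
-- example : 2 + 2 = 5 := rfl
```

The point of the sketch is the failure mode: a wrong proof produces an error, not a confident paragraph.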
For builders, the implication is concrete: formal AI systems are now solving previously intractable problems at essentially software prices. The question is whether the formalization overhead — translating a real-world problem into a formal proof environment — will remain a bottleneck, or whether that too will be automated.
Whether this generalizes beyond pure math to domains like code verification, drug interaction proofs, or distributed system correctness is the real open question.
Anthropic's Security Tool Found 10,000+ Vulnerabilities in One Month
Anthropic's Claude Mythos, part of a program called Project Glasswing with roughly 50 partners, has produced the most concrete red-team results seen from an AI security tool to date. In a single month, Mythos found over 10,000 high and critical vulnerabilities across partner systems.
Here's everything you need to know:
- Cloudflare: Mythos found 2,000 bugs with a false positive rate lower than that of human testers
- Mozilla: 271 vulnerabilities found and fixed in Firefox 150
- One partner bank caught and blocked a $1.5 million fraudulent wire transfer in real time
- Anthropic says no company — including itself — has safeguards strong enough for full public release of Mythos-class capabilities
- Expansion to U.S. and allied governments is planned, with a general release of Mythos-class models to follow
This is the most substantive AI security result in recent memory. Not a demo, not a CTF challenge — real production bugs, in real systems, at scale.
For founders, this is a preview of what your security audits will look like in 18 months: automated, continuous, and catching things humans miss. The lower false positive rate versus human testers is the critical detail — it means AI security tools are already past the "useful but noisy" phase for some use cases.
The uncomfortable corollary: Anthropic itself says no one — including Anthropic — is ready to release these capabilities publicly. The gap between what AI can find and what AI should be allowed to find is widening fast, and disclosure norms haven't caught up.
Nvidia Is Now the First $5 Trillion Company. Here's the Number That Matters.
Nvidia reported $81.6 billion in Q1 revenue, up 85% year-over-year, and the stock has gained roughly 60% over the past year. The company now accounts for approximately 8% of the entire S&P 500 — the largest single-stock weight in the index's history.
Here's everything you need to know:
- Q1 revenue: $81.6 billion, up 85% year-over-year, beating Wall Street expectations
- Disclosed order backlog through 2026: $500 billion (this excludes OpenAI's separate $100 billion commitment)
- Nvidia is now roughly 8% of the S&P 500, the largest single-stock weight ever
- CEO Jensen Huang: "The world is rebuilding computing for agentic and robotic physical AI — and Nvidia sits at the center of these transitions"
- U.S. approved 10 Chinese firms to purchase Nvidia H200 chips, though Beijing is steering companies toward domestic alternatives
$81.6B in a single quarter. At that scale, Nvidia isn't a chip company anymore — it's the electricity grid of the AI era.
For builders, this is a supply chain reality check. If you're building AI products, your margins are partially determined by a company that has this kind of pricing power and demand signal. The $500B backlog through 2026 tells you: the compute demand isn't theoretical, it's contracted. The infrastructure layer is consolidating faster than most people realize.
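For a sense of scale, the reported figures can be turned into two derived numbers: the year-ago quarterly revenue implied by 85% growth, and how many quarters of revenue at the current pace the disclosed backlog represents. This is a back-of-the-envelope sketch using only the numbers quoted above; the function and variable names are illustrative.

```python
# Figures as reported above, in billions of US dollars.
Q1_REVENUE_B = 81.6
YOY_GROWTH = 0.85      # 85% year-over-year
BACKLOG_B = 500.0      # disclosed backlog through 2026


def implied_prior_year_revenue(current: float, growth: float) -> float:
    """Revenue one year earlier implied by a year-over-year growth rate."""
    return current / (1.0 + growth)


prior_year_q1_b = implied_prior_year_revenue(Q1_REVENUE_B, YOY_GROWTH)
backlog_in_quarters = BACKLOG_B / Q1_REVENUE_B

print(f"Implied year-ago Q1 revenue: ${prior_year_q1_b:.1f}B")
print(f"Backlog equals {backlog_in_quarters:.1f} quarters of revenue at the Q1 pace")
```

On these inputs the implied year-ago quarter is about $44.1 billion, and the backlog is worth roughly 6.1 quarters of revenue at the current run rate.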
The China subplot matters too. Every builder making infrastructure choices needs to factor in geopolitical supply risk alongside technical performance.
OpenAI Filed to Go Public at a $1 Trillion Valuation. The Fine Print Is Brutal.
OpenAI confidentially filed an S-1 with the SEC on May 22, targeting a Q4 2026 public listing led by Goldman Sachs and Morgan Stanley. The valuation range: $852 billion to $1 trillion. On the surface, this is the biggest IPO in AI history. Under the hood, the numbers tell a more complicated story.
Here's everything you need to know:
- Confidential S-1 filed May 22; prospectus stays private until roughly 15 days before the roadshow
- Valuation range: $852 billion to $1 trillion
- Monthly revenue: approximately $2 billion ($25 billion annualized run rate)
- Burn rate: loses $1.22 for every $1 of revenue it takes in
- User base: 50 million consumer subscribers, 9 million business users
- At a $1 trillion valuation, the company would trade at roughly 40x revenue — for a business years from cash-flow breakeven
$2 billion a month in revenue, still losing $1.22 on every dollar. That math only works if you believe the trajectory matters more than the current P&L, which, to be fair, is often the right bet for platform companies in hypergrowth. But "years from cash-flow breakeven" at 40x revenue carries a different risk profile now than it did when interest rates were near zero.
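The multiples are easy to check by hand. The sketch below uses only the figures reported above. It assumes "loses $1.22 for every $1 of revenue" means a net loss equal to 1.22 times revenue, which is one reading of that phrase and not something the filing details confirm. All names are illustrative.

```python
# Figures as reported above (approximate), in US dollars.
MONTHLY_REVENUE = 2e9
ANNUALIZED_RUN_RATE = 25e9
VALUATIONS = {"low": 852e9, "high": 1e12}

# ASSUMPTION: net loss = 1.22 x revenue. The source phrase is ambiguous.
LOSS_PER_REVENUE_DOLLAR = 1.22


def revenue_multiple(valuation: float, annual_revenue: float) -> float:
    """Valuation expressed as a multiple of annualized revenue."""
    return valuation / annual_revenue


multiples = {
    name: revenue_multiple(value, ANNUALIZED_RUN_RATE)
    for name, value in VALUATIONS.items()
}
implied_monthly_loss = LOSS_PER_REVENUE_DOLLAR * MONTHLY_REVENUE
implied_monthly_spend = MONTHLY_REVENUE + implied_monthly_loss

for name, multiple in multiples.items():
    print(f"{name} valuation: {multiple:.1f}x annualized revenue")
print(f"Implied monthly loss: ${implied_monthly_loss / 1e9:.2f}B")
print(f"Implied monthly spend: ${implied_monthly_spend / 1e9:.2f}B")
```

The range works out to about 34.1x at $852 billion and 40x at $1 trillion. Note that twelve months at exactly $2 billion would be $24 billion, so the $25 billion run rate implies monthly revenue slightly above the rounded figure.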
For founders, OpenAI's IPO will set the valuation benchmark for the entire AI ecosystem. Whatever multiple it prices at will flow through to every AI startup raising capital, acquiring talent, or competing for the same investor checks. Watch the burn multiple closely — it's the number that will determine whether this is a defining tech IPO or a peak-era Facebook moment.
DeepSeek Cuts V4-Pro Pricing by 75% — Confirmed Across Multiple Sources
DeepSeek has cut pricing on its V4-Pro model by roughly 75%, bringing input tokens to $0.435 per million and output tokens to $0.87 per million. The cuts are confirmed across multiple sources and appear to be a sustained pricing position, not a promotional rate.
Here's everything you need to know:
- New input pricing: $0.435 per million tokens
- New output pricing: $0.87 per million tokens
- Far below comparable closed-source frontier models
- The cuts appear permanent, not a limited-time promotion
- Anthropic has previously accused DeepSeek of extracting Claude capabilities to train its own models
This is the pricing floor moving decisively downward. At these price points, the economics of AI-powered products change significantly — lower inference costs flow directly into better unit economics for any product that uses language model inference as a core component.
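To see what those rates mean for unit economics, here is a small cost sketch. The two prices come from the figures above; the workload (2,000 input and 500 output tokens per request, one million requests a month) is hypothetical, and the pre-cut prices are only what an exact 75% reduction would imply, not reported numbers.

```python
# Reported post-cut prices, in US dollars per million tokens.
INPUT_USD_PER_MILLION = 0.435
OUTPUT_USD_PER_MILLION = 0.87


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost of a single request at the reported per-million-token rates."""
    return (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    ) / 1_000_000


# HYPOTHETICAL workload, chosen only for illustration.
REQUESTS_PER_MONTH = 1_000_000
per_request = request_cost_usd(input_tokens=2_000, output_tokens=500)
monthly_cost = REQUESTS_PER_MONTH * per_request

# ASSUMPTION: the cut is exactly 75%. The source says "roughly 75%".
implied_old_input = INPUT_USD_PER_MILLION / (1 - 0.75)
implied_old_output = OUTPUT_USD_PER_MILLION / (1 - 0.75)

print(f"Per request: ${per_request:.6f}")
print(f"Per month:   ${monthly_cost:,.2f}")
print(f"Implied pre-cut rates: ${implied_old_input:.2f} in, ${implied_old_output:.2f} out")
```

For this hypothetical workload the bill comes to about $1,305 a month. Swapping in your own token counts is the quickest way to see whether inference is still a meaningful line item at these prices.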
For builders, this is an inflection point. DeepSeek isn't subsidized by venture losses the way some competitors might be — this appears to be a real cost structure. Every AI company that isn't DeepSeek now faces genuine pricing pressure or needs to demonstrate clear differentiation that justifies a premium.
The Anthropic tension — the accusation that DeepSeek extracted Claude outputs to train its own models — adds a layer of strategic risk for any builder going all-in on the DeepSeek stack.
⚡ Quick Hits
- Gemini Omni: Google unveiled a model that synthesizes images, audio, and text into 10-second video clips with synchronized audio. Early, but the multimodal coherence is a step forward for prototyping workflows.
- Figma AI Design Agent: Figma launched an AI agent, powered by Anthropic and OpenAI, that generates, edits, and automates design tasks from natural language prompts. Designers become art directors; the execution layer gets automated.
- Google "Disregard" Bug: Queries for words like "disregard," "stop," and "ignore" in Google's AI Search box triggered chatbot responses instead of word definitions. Google acknowledged the issue; fix incoming. A reminder that consumer AI integration creates unpredictable edge-case failure modes.
- Perplexity Bumblebee: Perplexity open-sourced Bumblebee, a supply-chain security scanner for macOS and Linux that flags risky packages, extensions, and AI tool configurations. Supply-chain attacks are one of the most underappreciated AI-specific risks right now.
- Polsia: An AI platform claiming to run 8,000+ businesses with zero employees closed a $30M round at a $250M valuation, nearing $10M ARR. The announcement went viral, reaching 5M views. A note of skepticism: the company name is "AI Slop" spelled backwards.
Techlook — AI & tech signal for founders and builders.