/ what friday looks like

browse archive →
AI Week Radar <hi@aiweekradar.com>
AI Week Radar
#42
Fri, May 22

A quiet week on the model-release side, a noisy one on the agent-tooling side. Two papers worth a coffee, one open-source release that quietly does what closed labs charge for, and a sponsor we genuinely use. Let's go.

— Jarek, editor

Featured

Anthropic ships Claude 4.7 with a 1M-token context window — Mike Krieger (Anthropic)

The 1M context tier is the headline, but the more interesting line in the post is the new prompt-cache TTL — caches now stick for an hour instead of five minutes, which makes long-running agent loops noticeably cheaper. Benchmarks on SWE-bench and Terminal-Bench are competitive, but the practical win is for anyone who can fit a whole codebase into a single prompt.
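
If you want to try it: a minimal sketch against the Anthropic Python SDK, assuming the model ID below (a placeholder) and that the one-hour cache reuses the previously documented "ttl" field and beta header; neither is confirmed by the post.

    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    codebase_dump = open("repo_flattened.txt").read()  # the big, reusable prefix

    response = client.messages.create(
        model="claude-4-7",  # PLACEHOLDER -- check the release notes for the real ID
        max_tokens=1024,
        # ASSUMPTION: the extended-TTL beta header from Anthropic's earlier docs
        extra_headers={"anthropic-beta": "extended-cache-ttl-2025-04-11"},
        system=[{
            "type": "text",
            "text": codebase_dump,
            # cache the whole prefix for an hour instead of the default five minutes
            "cache_control": {"type": "ephemeral", "ttl": "1h"},
        }],
        messages=[{"role": "user", "content": "Where is the retry logic implemented?"}],
    )
    print(response.content[0].text)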

Why your agents fail: a forensic look at 1,200 production runs — Sahar Mor (Stripe)

Sahar collected traces from a year of production agent runs and found that ~70% of failures come from three places: tool-schema drift, silent truncation in retrievers, and prompt-cache invalidation no one notices. The analyses are the kind of thing you'd normally only see in an internal postmortem channel — generously shared.

Articles & Videos

A practical guide to evaluating RAG pipelines without lying to yourself — Jerry Liu (LlamaIndex)

Skips the academic eval suites and walks through a budget-friendly setup using Ragas, a handful of synthetic-but-realistic questions, and a small reviewer model. The honest framing is the best part: "if your eval set was hand-picked by the same person who built the retriever, you don't have an eval set."
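
For the curious, roughly what that setup looks like with Ragas' classic evaluate() interface. The column names and metrics are from Ragas' docs, the rows are stand-ins for your synthetic questions, and newer Ragas versions wrap the same idea in an EvaluationDataset:

    from datasets import Dataset
    from ragas import evaluate
    from ragas.metrics import answer_relevancy, context_precision, faithfulness

    # A toy eval set -- in practice, generate synthetic-but-realistic rows,
    # picked by someone other than the person who built the retriever.
    eval_set = Dataset.from_dict({
        "question": ["How do I rotate API keys?"],
        "answer": ["Call POST /keys/rotate."],                        # pipeline output
        "contexts": [["Rotation is handled by POST /keys/rotate."]],  # retrieved chunks
        "ground_truth": ["POST /keys/rotate; old keys expire in 24h."],
    })

    # Each metric is scored by a reviewer LLM; point Ragas at a small, cheap one.
    scores = evaluate(eval_set, metrics=[faithfulness, answer_relevancy, context_precision])
    print(scores)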

How OpenAI Realtime v2 actually handles barge-in (annotated) — Linh Tran (Plivo)

30-minute teardown of the new WebRTC-based realtime API: how interruption events are sequenced, why the audio buffer is now server-authoritative, and where you still need client-side VAD. Useful even if you're on a different vendor.

Releases

Mistral Large 3 (open weights, 22B) — Mistral AI · model release

A surprise mid-week drop. Apache 2.0, multilingual out of the box (Polish, Czech, and Japanese at competitive levels), and benchmarks within striking distance of Llama 4 70B at a fraction of the VRAM. Runs at usable speeds on a single H100.

LangGraph 0.4 — graph-native checkpointing — LangChain · framework release

Long-running agents can now resume from any step without re-running upstream nodes. The DX is closer to a workflow engine than a chat loop, which is probably where this whole space is heading.
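
A minimal sketch of the resume flow, using LangGraph's documented checkpointer API. The graph and node names are ours, and we're assuming 0.4 keeps this interface rather than replacing it:

    from typing import TypedDict

    from langgraph.checkpoint.memory import MemorySaver
    from langgraph.graph import END, START, StateGraph

    class State(TypedDict):
        log: str

    def fetch(state: State) -> State:
        return {"log": state["log"] + " ->fetch"}

    def enrich(state: State) -> State:
        return {"log": state["log"] + " ->enrich"}

    builder = StateGraph(State)
    builder.add_node("fetch", fetch)
    builder.add_node("enrich", enrich)
    builder.add_edge(START, "fetch")
    builder.add_edge("fetch", "enrich")
    builder.add_edge("enrich", END)

    # The checkpointer persists state after every node, keyed by thread_id.
    graph = builder.compile(checkpointer=MemorySaver(), interrupt_before=["enrich"])
    config = {"configurable": {"thread_id": "issue-42"}}

    graph.invoke({"log": "start"}, config)  # runs fetch, then pauses before enrich
    graph.invoke(None, config)              # resumes at enrich; fetch is not re-run
    print(graph.get_state(config).values)   # {'log': 'start ->fetch ->enrich'}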

Tools & Papers

PaperScout — paper-of-the-week digest tailored to your reading history — Cara Sun · CLI tool

An open-source CLI that hooks into your Zotero library and your arXiv RSS feeds, then ranks each week's drops by how close they are to what you've read recently. Quietly excellent at surfacing the one paper you'd otherwise scroll past.

Last but not least

An LLM that can't draw a clock is still better than most product managers at Jira — Andrej Karpathy

The annual "draw a clock" benchmark resurfaces. Comforting in the way only AI Twitter can be.

You're receiving this because you subscribed at aiweekradar.com. Unsubscribe in one click

/ how we pick

200+ sources in. 8 stories out.

Each week we read so you don't have to. The funnel:

200+ scanned

arXiv, GitHub trending, vendor changelogs, niche Discord/Slack communities, 50+ other newsletters, conference talks.

~60 shortlisted

Anything that looks like more than a press release or a tweet thread gets opened, read, and triaged.

~20 read end-to-end

Papers read for the method, not just the abstract. Tools tried, not just bookmarked. Releases checked against the actual changelog.

8 shipped

Each with one sentence of editor's context. If a week's quiet, you get fewer. We don't pad to hit a number.

/ the deal

What's in — what's out.

/ you'll get
  • + 8 stories per issue, max — including at most one clearly-marked sponsor.
  • + Editor's note on every story — one sentence on why it matters.
  • + Quick "in brief" bullets for the week's smaller news.
  • + Release-notes round-up for tools you already use.
  • + A reply-able human at the other end.
/ you won't get
  • Daily emails or LinkedIn-thinkfluencer hot takes.
  • "Top 27 AI tools for productivity!!!" listicles.
  • Affiliate-link traps disguised as recommendations.
  • Tracking pixels, third-party cookies, address-list resales.
  • Three-step "are you sure?" unsubscribe flows.

/ faq

The usual questions.

Yet another AI newsletter — why this one?

Most AI newsletters either rehash Twitter or pad a list with everything that moved. We do the opposite: 8 stories per issue, max, each with one sentence of editor's context on why it matters. If a week is quiet, the email is shorter.

How is this different from TLDR AI, Ben's Bites, or The Rundown?

Those are aggregators — broad, daily, optimized for not missing anything. We're weekly, opinionated, and built for people who ship: a developer's perspective on what's actually production-ready vs. demo-ware. Subscribe to both if you want — we won't be offended.

Can I read a sample issue before subscribing?

Yes — the archive will open with issue #001. Until then, the preview on this page is a faithful mock of the format and editorial voice.

Is it really free?

Yes. Costs are covered by at most one clearly-labelled sponsor slot per issue. Never disguised as content. Never tracked across the web.

How do I unsubscribe?

One click — either from the link at the bottom of every issue, or via the one-click unsubscribe button Gmail/Apple Mail now show natively.

Something not here? Email hi@aiweekradar.com.