A loss curve is reassurance, not analysis — pull a hundred failures and read every one
Experiments throw off far more information than you consume — transcripts, failure cases, the strange tail — and most of it dies unread. Most ML bugs live in the data and fail silently; Ng's move is to pull 100 failures, sort them into piles, and attack the biggest pile
@itsreallyvivek (vivek) — how to be good at research · · 6 connections
Connected Insights
References (5)
→ LLM-as-judge must be calibrated against human judgment — uncalibrated judges are worse than no judges → Observability is the missing discipline for agent systems — you can't improve what you can't measure → Revealed preferences trump stated preferences — track what users do, not what they say → Similarity is not relevance — relevance requires reasoning → Traces replace code as the source of truth for agent systems — debugging shifts from 'show me the code' to 'send me the trace'