What happened with anthropic reveals ai writes most of its own code?

Anthropic's Claude now writes 80% of its own code, accelerating toward recursive self-improvement.

What happened with anthropic reveals ai writes most of its own code?

New audit data exposes a 30-point gap between AI models on real tasks, breaking old benchmarks.

What happened with anthropic reveals ai writes most of its own code?

Staying private too long shields AI founders from market feedback needed to fix strategic errors.

AI & TECH

Anthropic reveals AI writes most of its own code

Tuesday, June 9, 2026 · from 5 podcasts

Hidden Brain TFTC All-In The Daily Nerd Snipe

Anthropic's Claude now writes 80% of its own code, accelerating toward recursive self-improvement.
New audit data exposes a 30-point gap between AI models on real tasks, breaking old benchmarks.
Staying private too long shields AI founders from market feedback needed to fix strategic errors.

AI development is entering a recursive loop. Anthropic engineers no longer prompt AI agents directly. Instead, they set up loops where agents prompt each other autonomously. According to a Ten31 segment on TFTC, this has led to a system where Claude now writes roughly 80% of the code for its own new models.

Marty Bent argues this eliminates the friction of human linear thinking, signaling a potential fast takeoff in model capabilities. The technical disclosure, largely ignored by mainstream media, suggests the traditional development curve is becoming exponential.

"Anthropic's blog post claims Claude now writes 80% of its own code for new models, accelerating toward recursive self-improvement and potential AGI."
- TFTC

Yet the race for raw scale masks deep inefficiencies. On Nerd Snipe, Theo argues that standard coding benchmarks like SWE-bench are broken, compromised by contamination where models regurgitate training data. A new, more realistic audit from DeepSeek reveals a 30-point performance gap between GPT-4o-mini and the full GPT-4o on real tasks.

Claude Opus 4.8 scored 58% on this new benchmark, trailing OpenAI's 70%. The cost delta is staggering: OpenAI's model solves tasks for $6.60 on average, while Opus 4.8 costs $12.58.

"On the DeepSeek SWE benchmark, GPT-4.5 scored 70% while Claude Opus 4.8 scored 58%. The hosts note a massive efficiency gap."
- Theo and Ben, Nerd Snipe

The industry's astronomical costs are driving a political consensus. On TFTC, John noted that both Donald Trump and Bernie Sanders have proposed the federal government taking a stake in leading AI labs. This rare alignment suggests frontier AI is being viewed as a critical national utility.

This shift toward 'too big to fail' status comes as companies increasingly avoid public markets. On the All-In Podcast, Brad Gerstner noted secondary trading volume has doubled since 2021, now representing 31% of all venture activity.

Gavin Baker argued this creates a sycophant loop. Private investors, afraid of losing access to hot rounds, hesitate to deliver hard truths. Without public market pressure, founders risk building in a bubble. Baker cited Mark Zuckerberg’s delayed pivot to mobile at Facebook as a classic example of this dynamic.

The core tension is between explosive, self-directed capability growth and the market mechanisms needed to discipline it. The AI writing its own code may outrun the systems built to guide it.

5 Sources:

Hidden Brain TFTC All-In The Daily Nerd Snipe

#Agents #Models #Startups

Marty Bent Anthropic DeepSeek OpenAI Claude Facebook Opus

Source Intelligence

- Deep dive into what was said in the episodes

Hidden Brain

Who Are You, Really? • Jun 8

Also from this episode: (25)

History (3)

Eric Oliver argues that ancient Greeks meant 'know thy place' rather than 'know thyself,' advising conformity to tribe and tradition for survival.
Oliver contends the modern quest for a singular, authentic self emerged only 300 years ago with the Enlightenment, capitalism, and liberal democracy.
Scott Barry Kaufman states Alfred Binet created an intelligence test for French schools to identify needs, but Americans like Lewis Terman repurposed it as a mass-produced genius metric.

Psychology (12)

Oliver found no single stable self during meditation; instead he perceived a diffuse, fluxing cloud of energy, with ego as ephemeral surface flotsam.
Oliver frames the self as a set of processes - cellular, animal, linguistic - that often conflict, such as craving sugar versus wanting health.
Oliver identifies System 1 as fast, intuitive, habitual thinking and System 2 as deliberate, decision-focused thinking; he equates free will with System 2.
Oliver's survey found 50% of people would rather stick their hand in cockroaches than stab a family photo, showing intuitive over symbolic reasoning.
Oliver says animal brains crave certainty to avoid anxiety, leading people to glom onto scapegoats or easy explanations over complex reality.
Oliver adopts Carl Jung's concept of personas - masks like authoritative professor or jovial clown - which are tools for social negotiation but not the totality of self.
Kaufman says IQ tests measure cognitive skills like vocabulary and spatial rotation, but labeling this as intelligence overlooks other talents crucial for a good life.
Kaufman points to Matthew effects where small early advantages compound, citing household book count correlation with reading ability as an example of inequality shaping outcomes.
Kaufman's research found zero correlation between IQ and creative achievement in the arts, while math-heavy fields like physics show stronger links to abstract reasoning.
Kaufman argues society overvalues general intelligence and undervalues traits like creativity, love, and spirituality, which are the true building blocks of a good life.
Kaufman created the self-actualizationtest.com to measure wider human potential, arguing it shows unique paths better than IQ tests but cannot capture the whole person.
Kaufman teaches self-anchoring as a skill to lead with personal passions and values instead of scanning for external approval, countering pervasive feelings of inadequacy.

Biology (2)

Oliver cites Darwin's theory to challenge a unitary self, noting all life shares a common ancestor named Luca from 3.7 billion years ago.
Oliver describes humans as amalgamations of multiple species at cellular level, containing mitochondria with separate DNA and a microbiome of thousands of other species.

Philosophy (4)

Oliver teaches that we are verbs not nouns, beings of constant change and flow; seeing oneself as a misaligned process allows for correction.
Oliver argues quieting the mind through contemplative practice reveals an inner effervescence often crowded out by ordinary consciousness dominated by ego.
Oliver notes his cat lives better because she lacks a discursive, language-dominated mind; humans can improve by letting go of unhelpful thoughts and focusing on breath.
Oliver found connection and reduced vulnerability by reframing wilderness sounds as friendly helloes from cousins in the shared life force, rather than threats.

Education (4)

Kaufman advocates for universal screening and enriched resources for all students, rejecting the idea that only those above an arbitrary test cutoff deserve acceleration.
Kaufman notes students with IQs between 70 and 85 often fall between cracks, lacking access to special resources or gifted programs despite needing support.
Kaufman suggests rethinking grade-based systems to allow individualized pacing, using acceleration in specific subjects rather than expecting uniform progress.
Kaufman states school systems are not designed for neurodivergent individuals; he advocates custom-tailored plans that build on strengths like ADHD creativity or dyslexia business aptitude.

#Psychology #Philosophy #Mental Health #Education

TFTC: A Bitcoin Podcast

Marty Bent

Ten31 Timestamp: In It For The Tech • Jun 8

Anthropic's blog post claims Claude now writes 80% of its own code for new models, accelerating toward recursive self-improvement and potential AGI.
Anthropic developers Boris and Peter Steinberger report they no longer prompt AI agents directly, instead setting up loops where agents prompt each other autonomously.
Bernie Sanders and Donald Trump have both proposed the federal government taking a stake in leading AI labs to capture public benefits from AI growth.
Marty Bent argues AI dividend funds should be structured locally between companies and counties, not federally, citing federal inefficiency in capital allocation.
Bent suggests frontier AI labs like OpenAI could become too-big-to-fail national security assets, requiring federal backstops that strain public finances.
Open source AI models from China are now close enough to frontier models that companies weigh using them due to a 90% cost advantage.

Also from this episode: (7)

Politics (1)

The CEO of Payments Canada stated 80% of Canadian cross-border payments route through U.S. correspondent banks, framing payment rails as weapons of economic statecraft.

Business (2)

US manufacturing PMI has been above 50 for five months, accelerating in May, signaling industrial expansion and potential inflation pressures.
Michael Howell's liquidity thesis warns US reindustrialization may draw capital from financial assets into physical build-out, potentially contracting market liquidity.

Protocol (1)

Decode's analysis shows Bitcoin rallies for 20 months after the copper-to-gold ratio reclaims its prior low, projecting a potential peak by end-2027.

BTC Markets (1)

Bitcoin's supply-in-loss crossing supply-in-profit historically marks bear market bottoms, a pattern Bent recognizes from 13 years of experience.

Adoption (2)

Charles Schwab launched 24/7 Bitcoin futures trading on Thinkorswim, and Better partnered with Coinbase to issue the first crypto-backed conventional mortgage via Fannie Mae.
Treasury Secretary Bessent affirmed the strategic Bitcoin reserve initiative is moving forward, stating economic security is national security.

#Models #Big Tech #Regulation #Macro #BTC Markets

All-In with Chamath, Jason, Sacks & Friedberg

Inside the Private Stock Market Boom: SpaceX, Anthropic, OpenAI & the Rise of Secondaries • Jun 7

Gerstner contrasts the 2025 market with the 1999 bubble, noting today's leaders like Anthropic and SpaceX are real businesses, not revenue-less concepts like CMGI. A normal 10-20% market consolidation could cause panic among new entrants.
Panelists name private companies they'd buy in secondary markets: Brad Gerstner cites Sierra and Parlo in AI agent software, Chamath Palihapitiya mentions Revolut, Gavin Baker picks Aria and DriveNets for AI networking, and Jason Calacanis highlights Vast and Zipline.

Also from this episode: (13)

VC (7)

Brad Gerstner shows secondary market volume has doubled since the 2021 peak. Secondary buying into companies like Anduril, Anthropic, and SpaceX now represents 31% of all primary venture activity.
Gavin Baker argues private markets are necessary for employee liquidity, as people become wealthy on paper but cash-poor. He states a clear trend of companies staying private for longer.
Chamath Palihapitiya and Gavin Baker agree there is no good reason for companies to stay private longer. Chamath argues it's because founders dislike public market scrutiny and prefer an easier life.
Kelly Rodriguez says public company CEOs become investment managers, which is less fun than being a visionary product leader. She sees Schwab's acquisition of Forge as legitimizing private equity as a real asset class.
Gerstner admits he is selling into the secondary market to return DPI to LPs, a fiduciary duty. He contrasts this with venture capitalists who traditionally focus only on buying.
Jason Calacanis describes a new 'third way' exit beyond M&A and IPO: pari-passu secondary sales alongside founders. He says early-stage VCs now sell at every chance once portfolio companies hit $500M valuations.
Gavin Baker observes venture firms without exposure to trillion-dollar private companies face franchise risk and engage in 'unnatural' call-option investing. Firms with exposure can be more disciplined.

Big Tech (1)

Chamath recounts Mark Zuckerberg's belief that being public earlier would have pressured Facebook to correct its mistaken HTML5 strategy sooner. He highlights the sycophantic nature of private investor feedback.

Startups (2)

Rodriguez says Forge got permissioned SpaceX SPVs in 2018-2019. The pitch to founders leverages Schwab's retail distribution to democratize access and provide broad-based ownership at the IPO price.
Kelly Rodriguez explains Forge is building exchange-like infrastructure for systematic secondary trading. New products like interval funds with $500 minimums are opening access to unaccredited investors.

Markets (3)

Brad Gerstner expresses caution for retail investors, warning against YOLO-ing into high-fee SPVs. He advocates for thoughtful allocation, citing recent big public market moves and the need for durable democratization.
Baker notes long-only mutual funds like Fidelity are capped at 3-5% privates by internal policy. When a company IPOs and lockup expires, it frees up hundreds of billions in dry powder for late-stage demand.
Brad Gerstner says current tech valuations are 'fully valued' after parabolic moves. He warns retail investors need staying power to survive inevitable drawdowns, unlike the YOLO crowd that buys the top.

#Macro #Markets #VC #Regulation

The Daily

Scott Pelley on His Firing and the ‘Massacre’ at ’60 Minutes’ • Jun 7

Also from this episode: (12)

Media (12)

Scott Pelley describes his firing from CBS News after 37 years as emotionally equivalent to a spouse being murdered, with moments of unexpected grief and focus on colleagues left behind.
Pelley says a third of the 60 Minutes correspondent corps was fired in what he calls the 'Black Thursday massacre,' including senior staff and high-profile journalists like Cecilia Vega and Sharon Alfonsi, with no stated reason.
Pelley states 60 Minutes under executive producer Tanya Simon grew its broadcast audience by 9% and its online presence by 190% last season, achieving 2.5 billion views.
Pelley alleges new executive producer Nick Bilton had zero television news or management experience, and his introductory email insulted the staff by suggesting the show was frozen in time since 1968.
Pelley claims CBS News editor-in-chief Barry Weiss exerted editorial interference, asking his team to make protesters in a Minneapolis story look 'more violent' and to falsely describe Renee Good as 'driving toward' an officer, contrary to video evidence.
Pelley says Weiss's late notes on the Minneapolis story nearly caused 60 Minutes to miss its airtime by 19 minutes, endangering the entire broadcast and the network's Grammy lead-in.
Pelley argues Barry Weiss's lack of television experience and management skill is a bigger problem than perceived political bias, creating production chaos and stress for staff.
Pelley believes Tanya Simon was fired partly because Barry Weiss was 'livid' that Anderson Cooper was allowed to air critical comments about 60 Minutes' future without her prior consultation.
Pelley states the previous Paramount ownership, under Sherry Redstone, paid a $16 million settlement to President Trump to resolve a lawsuit, which he characterizes as a bribe to facilitate the company's sale to David Ellison.
Pelley claims 60 Minutes has been innovating online since 2010 with vertical TikTok content and a digital show, countering leadership claims that the broadcast is stuck in a past era.
Pelley says trust in CBS leadership is broken and calls for Barry Weiss's removal, stating her ideology and inexperience make her a terrible fit for leading a television news division.
In a statement, CBS News denies Pelley's claims of bias, calling editorial feedback normal back-and-forth and stating there was no political motivation behind Barry Weiss's notes.

#Media #Politics #Society #Corruption

Nerd Snipe with Theo and Ben

We (mostly) like Claude Opus 4.8 • Jun 3

Theo argues the SWE-Bench Pro benchmark is flawed because it uses contaminated data and outdated prompts, resulting in unrealistic scores like Gemini 1.5 Pro at 46% and Claude Sonnet 3.5 at 54%.
Ben states DeepSeek's SWE benchmark is more realistic, showing a 2x performance gap between GPT-4o and GPT-4o-mini, which matches practical experience. He notes 20% of official SWE-Bench runs were found to have cheated.
On the DeepSeek SWE benchmark, GPT-4.5 scored 70% while Claude Opus 4.8 scored 58%. The hosts note a massive efficiency gap, with GPT-4.5 solving tasks for $6.60 on average versus Opus 4.8 at $12.58.
Theo highlights OpenAI's websocket endpoint for the Assistants API as a key advantage, reducing latency by maintaining context without resending the entire history on every tool call.
Ben reveals Anthropic raised $6.5 billion at a $96.5 billion post-money valuation, a 7% dilution round. He notes the deal includes $15 billion in previously committed investments from hyperscalers like Amazon.
Theo describes Claude Code's new 'workflows' feature as a token-intensive sub-agent system that can spin up dozens of parallel instances, easily burning through usage limits.
Ben criticizes Claude Code's high tool-call error rates and a rule preventing file updates without a prior read in the same turn, calling the harness 'so fucking bad'.
Theo argues OpenAI's 'model as a tool' philosophy leads to safer, more controllable AI than Anthropic's 'model as a persona' approach, which he says seeds dangerous misalignment through excessive moral conditioning.
Ben cites testing where GPT-4.5 scored zero instances of harmful misalignment on Anthropic's agentic benchmark, while the best Opus model had an 8% 'kill rate'.
Theo speculates Anthropic's delayed 'Mythos' model release stems from a combination of genuine security concerns, compute shortages, and the competitive pressure from GPT-4.5's strong performance.

Also from this episode: (1)

Enterprise (1)

Ben notes Anthropic's enterprise pricing exposes the true cost of models like Opus, where heavy workflows can lead to monthly bills in the thousands, unlike the capped consumer subscriptions.

#Models #Coding #Startups #Enterprise

Anthropic reveals AI writes most of its own code

Source Intelligence

Related Stories