Anthropic Claude 4 Review: Creative Genius Trapped by Old Limitations


San Francisco-based Anthropic just dropped the fourth generation of its Claude AI models, and the results are… complicated. While Google pushes context windows past a million tokens and OpenAI builds multimodal systems that see, hear, and speak, Anthropic stuck with the same 200,000-token limit and text-only approach. It’s now the odd one out among major AI companies.

The timing feels deliberate: Google announced new Gemini models this week too, and OpenAI unveiled a new coding agent based on its proprietary Codex model. Claude’s answer? Hybrid models that shift between reasoning and non-reasoning modes depending on what you throw at them, delivering now what OpenAI has promised for whenever it releases GPT-5.
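In practice, the “hybrid” part is something you opt into per request: the same model can answer directly or spend a reasoning budget first. Here is a minimal sketch of what that toggle looks like against Anthropic’s Messages API; the model ID, token budgets, and prompts are assumptions based on Anthropic’s public documentation, not anything taken from this review.

```typescript
import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

async function compare() {
  // Default request: the model answers directly, with no reasoning pass.
  const quick = await anthropic.messages.create({
    model: "claude-sonnet-4-20250514", // assumed Claude Sonnet 4 model ID
    max_tokens: 1024,
    messages: [{ role: "user", content: "Summarize the Claude 4 launch in two sentences." }],
  });

  // Same model with extended thinking: it reasons for up to budget_tokens
  // before writing its visible answer. budget_tokens must stay below max_tokens.
  const deliberate = await anthropic.messages.create({
    model: "claude-sonnet-4-20250514",
    max_tokens: 4096,
    thinking: { type: "enabled", budget_tokens: 2048 },
    messages: [{ role: "user", content: "Walk through the trade-offs of a 200k-token context window." }],
  });

  console.log(quick.content, deliberate.content);
}

compare();
```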

But here’s something for API users to seriously consider: Anthropic is charging premium prices for that upgrade.


Image: t3.gg

The chatbot app, however, remains priced at $20 a month, with Claude Max at $200 a month for 20x higher usage limits.

We put the new models through their paces across creative writing, coding, math, and reasoning tasks. The results tell an interesting story: marginal improvements in some areas, surprising gains in others, and a clear shift in Anthropic’s priorities away from general use and toward developer-focused features.

Here is how both Claude Sonnet 4 and Claude Opus 4 performed in our different tests. (You can check them out, including our prompts and results, in our GitHub repository.)

Creative writing

Creative writing capabilities determine whether AI models can produce engaging narratives, maintain consistent tone, and integrate factual elements naturally. These skills matter for content creators, marketers, and anyone needing AI assistance with storytelling or persuasive writing.

As of now, no model beats Claude in this subjective test (LongWriter aside, of course), so it makes little sense to compare it against third-party options. For this task, we put Sonnet and Opus face to face.

We asked the models to write a short story about a person who travels back in time to prevent a catastrophe, only to realize that their actions in the past were themselves part of the chain of events that produced that future. The prompt added some details to consider while giving the models enough liberty to set up the story as they saw fit.


Claude Sonnet 4 produced vivid prose with the best atmospheric detail and psychological nuance. The model crafted immersive descriptions and delivered a compelling story; the ending was not exactly what we asked for, but it fit the narrative and the expected result.

Overall, Sonnet’s narrative construction balanced action, introspection, and philosophical insights about historical inevitability.

Score: 9/10—definitely better than Claude 3.7 Sonnet

Claude Opus 4 grounded its speculative fiction in credible historical contexts, referencing indigenous worldviews and pre-colonial Tupi society with careful attention to cultural context. The model integrated source material naturally and produced a longer story than Sonnet, though, sadly, it could not match Sonnet’s poetic flair.

It also revealed an interesting pattern: the narrative started more vividly and immersively than Sonnet’s, but somewhere around the middle it rushed the plot twist, leaving the result flat and predictable.

Score: 8/10

Sonnet 4 is the winner for creative writing, though the margin is narrow. Writers, beware: unlike with previous releases, Anthropic doesn’t appear to have prioritized creative writing improvements this time, focusing its development efforts elsewhere.

All the stories are available here.

Coding

Coding evaluation measures whether AI can generate functional, maintainable software that follows best practices. This capability affects developers using AI for code generation, debugging, and architectural decisions.

Gemini 2.5 Pro is considered the king of AI-powered coding, so we tested it against Claude Opus 4 with extended thinking.

We zero-shotted our instructions for a game (a robot must avoid journalists on its way to merging with a computer and achieving AGI) and used one additional iteration to fix bugs and clarify different aspects of the game.


Claude Opus created a top-down stealth game with sophisticated mechanics, including dynamic sound waves, investigative AI states, and vision cone occlusion. The implementation featured rich gameplay elements: journalists responded to sounds through heardSound flags, obstacles blocked line-of-sight calculations, and procedural generation created unique levels each playthrough.
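Claude’s actual code isn’t reproduced here, but a minimal sketch of how those mechanics typically fit together (a heardSound flag flipped by a sound-radius check, an investigative state, and an occluded vision cone) looks something like this; every name besides heardSound is hypothetical.

```typescript
// Illustrative sketch only, not Claude's generated code. Only the heardSound
// flag is named in the review; everything else here is hypothetical.
type Vec = { x: number; y: number };

interface Wall {
  blocks(from: Vec, to: Vec): boolean; // true if the segment from->to crosses this wall
}

class Journalist {
  state: "patrol" | "investigate" = "patrol";
  heardSound = false;            // set when a sound wave reaches this journalist
  lastSoundAt: Vec | null = null;

  constructor(public pos: Vec, public facing: number) {}

  hear(source: Vec, radius: number) {
    // "Dynamic sound waves" reduce to a radius check from the noise source.
    if (Math.hypot(source.x - this.pos.x, source.y - this.pos.y) <= radius) {
      this.heardSound = true;
      this.lastSoundAt = source;
      this.state = "investigate";  // switch into the investigative AI state
    }
  }

  canSee(target: Vec, walls: Wall[]): boolean {
    const angleTo = Math.atan2(target.y - this.pos.y, target.x - this.pos.x);
    // Wrap the angle difference so the cone works across the -PI/PI boundary.
    const diff = Math.atan2(Math.sin(angleTo - this.facing), Math.cos(angleTo - this.facing));
    const withinCone = Math.abs(diff) < Math.PI / 4; // 90-degree vision cone
    // Occlusion: any wall crossing the line of sight hides the target.
    return withinCone && !walls.some((w) => w.blocks(this.pos, target));
  }
}
```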

Score: 8/10

Google’s Gemini produced a side-scrolling platformer with cleaner architecture using ES6 classes and named constants.

The game was not functional after two iterations, but the implementation separated concerns effectively: level.init() handled terrain generation, the Journalist class encapsulated patrol logic, and constants like PLAYER_JUMP_POWER enabled easy tuning. While the gameplay remained simpler than Claude’s version, the clean structure and consistent coding standards earned particularly high marks for readability and maintainability.
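By comparison, the kind of structure that earned Gemini those readability marks looks roughly like the sketch below. PLAYER_JUMP_POWER, level.init(), and the Journalist class come from the description above; the remaining identifiers and values are illustrative.

```typescript
// Illustrative sketch of the structure described above, not Gemini's output.
// PLAYER_JUMP_POWER, Level.init(), and Journalist come from the review; the
// rest is hypothetical.
const PLAYER_JUMP_POWER = 14;   // named constants make tuning a one-line change
const GRAVITY = 0.6;
const JOURNALIST_SPEED = 2;

class Level {
  platforms: { x: number; y: number; width: number }[] = [];

  init(seed: number) {
    // Terrain generation is isolated here, away from entity and input logic.
    for (let i = 0; i < 10; i++) {
      this.platforms.push({ x: i * 120, y: 300 + 40 * Math.sin(seed + i), width: 100 });
    }
  }
}

class Journalist {
  private direction = 1;

  constructor(public x: number, private patrolStart: number, private patrolEnd: number) {}

  update() {
    // Patrol logic is fully encapsulated: walk back and forth between bounds.
    this.x += this.direction * JOURNALIST_SPEED;
    if (this.x >= this.patrolEnd || this.x <= this.patrolStart) this.direction *= -1;
  }
}
```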


Verdict: Claude won. It delivered superior gameplay functionality that players would prefer.

However, developers might prefer Gemini despite all this, as it created cleaner code that can be improved more easily.

Our prompt and code are available here, and you can click here to play the game Claude generated.

Mathematical reasoning

Mathematical problem-solving tests AI models’ ability to handle complex calculations, show reasoning steps, and arrive at correct answers. This matters for educational applications, scientific research, and any domain requiring precise computational thinking.

We compared Claude and OpenAI’s latest reasoning model, o3, asking the models to solve a problem that appeared on the FrontierMath benchmark—designed specifically to be hard for models to solve:

“Construct a degree 19 polynomial p(x) ∈ ℂ[x] such that X := {p(x) = p(y)} ⊂ ℙ¹ × ℙ¹ has at least 3 (but not all linear) irreducible components over ℂ. Choose p(x) to be odd, monic, have real coefficients and linear coefficient −19, and calculate p(19).”

Claude Opus 4 displayed its complete reasoning process when tackling the challenge. That transparency allowed evaluators to trace its logic and identify where the calculations went wrong. Despite showing all its work, however, the model did not reach the correct answer.


OpenAI’s o3 model achieved perfect accuracy on the identical tasks, the first time any model in our testing solved them completely. However, o3 truncated its reasoning display, showing only final answers without intermediate steps. That prevented error analysis and made it impossible for users to verify the logic or learn from the solution process.


Verdict: OpenAI o3 won the mathematical reasoning category outright, though Claude’s transparent approach offered educational advantages. For example, researchers can catch failures more easily by analyzing the full Chain of Thought, instead of having to either fully trust the model or solve the problem by hand to corroborate results.

You can check Claude 4’s Chain of Thought here.

Non-mathematical reasoning and communication

For this evaluation, we wanted to test the models’ ability to understand complex situations, craft nuanced messages, and balance competing interests. These skills are essential for business strategy, public relations, and any scenario requiring sophisticated human communication.

We provided Claude, Grok, and ChatGPT instructions to craft a single communication strategy that simultaneously addresses five different stakeholder groups about a critical situation at a large medical center. Each group has vastly different perspectives, emotional states, information needs, and communication preferences.

Claude demonstrated exceptional strategic thinking through a three-pillar messaging framework for a hospital ransomware crisis: Patient Safety First, Active Response, and Stronger Future. The response included specific resource allocations of $2.3 million in emergency funding, detailed timelines for each stakeholder group, and culturally sensitive adaptations for multilingual populations. Individual board member concerns received tailored attention while maintaining message consistency. The model also provided a solid set of opening statements to give a sense of how to approach each audience.


ChatGPT handled the task well, but not with the same level of detail and practicality. While it provided solid frameworks with clear core principles, GPT-4.1 relied more on tone variation than on substantive content adaptation. Its responses were extensive and detailed, anticipating questions, moods, and how the proposed actions might affect each audience. However, it lacked the specific resource allocations, concrete deliverables, and other details that Claude provided.


Verdict: Claude wins

You can check the results and Chain of Thought for each model here.

Needle in the haystack

Context retrieval capabilities determine how effectively AI models can locate specific information within lengthy documents or conversations. This skill proves critical for legal research, document analysis, academic literature reviews, and any scenario requiring precise information extraction from large text volumes.

We tested Claude’s ability to identify specific information buried within progressively larger context windows using the standard “needle in a haystack” methodology. This evaluation involved placing a targeted piece of information at various positions within documents of different lengths and measuring retrieval accuracy.
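For readers unfamiliar with the setup, a harness for this kind of test can be as simple as the sketch below; the filler text, needle string, sizes, and positions are illustrative stand-ins, not our exact test materials.

```typescript
// Rough sketch of a needle-in-a-haystack harness. The filler, needle, sizes,
// and positions are illustrative, not the exact materials used in our test.
const NEEDLE = "The archivist's passphrase is 7-ALPHA-3.";
const FILLER = "Decentralized networks settle transactions without intermediaries. ";

function buildHaystack(approxTokens: number, position: number): string {
  const chars = approxTokens * 4; // rough heuristic: ~4 characters per token
  const body = FILLER.repeat(Math.ceil(chars / FILLER.length)).slice(0, chars);
  const cut = Math.floor(body.length * position); // 0 = start, 0.5 = middle, 1 = end
  return body.slice(0, cut) + "\n" + NEEDLE + "\n" + body.slice(cut);
}

// One prompt per (size, position) pair; the model passes if its answer
// contains the passphrase. 85k fit in Claude's window; 200k plus prompt did not.
for (const tokens of [10_000, 85_000, 200_000]) {
  for (const position of [0, 0.5, 1]) {
    const prompt = buildHaystack(tokens, position) + "\n\nWhat is the archivist's passphrase?";
    console.log(`size=${tokens} position=${position} promptChars=${prompt.length}`);
    // send `prompt` to the model under test and grade the reply here
  }
}
```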

Claude Sonnet 4 and Opus 4 successfully identified the needle when embedded within an 85,000 token haystack. The models demonstrated reliable retrieval capabilities across different placement positions within this context range, maintaining accuracy whether the target information appeared at the beginning, middle, or end of the document. Response quality remained consistent, with the model providing precise citations and relevant context around the retrieved information.


However, the models hit a hard limit when attempting the 200,000-token haystack test. They could not complete this evaluation because the haystack plus the prompt exceeded their maximum context window of 200,000 tokens. This is a significant constraint compared to competitors like Google’s Gemini, which handles context windows exceeding one million tokens, and OpenAI’s models with substantially larger processing capabilities.

This limitation has practical implications for users working with extensive documentation. Legal professionals analyzing lengthy contracts, researchers processing comprehensive academic papers, or analysts reviewing detailed financial reports may find Claude’s context restrictions problematic. The inability to process the full 200,000 token test suggests that real-world documents approaching this size could trigger truncation or require manual segmentation.

Verdict: Gemini is the better model for long context tasks

You can check out both the needle and the haystack here.



Conclusion

Claude 4 is great, and better than ever—but it’s not for everyone.

Power users who need its creativity and coding capabilities will be very pleased. Its understanding of human dynamics also makes it ideal for business strategists, communications professionals, and anyone needing sophisticated analysis of multi-stakeholder scenarios. The model’s transparent reasoning process also benefits educators and researchers who need to understand AI decision-making paths.

However, novice users wanting the full AI experience may find the chatbot a little lackluster. It doesn’t generate video, you cannot talk to it, and the interface is less polished than what you can find in Gemini or ChatGPT.

The 200,000-token context window limits Claude users processing lengthy documents or maintaining extended conversations, and Anthropic also enforces a strict usage quota that may frustrate users expecting long sessions.

In our opinion, it is a solid “yes” for creative writers and vibe coders. Other users should weigh the pros and cons against the alternatives before committing.

Edited by Andrew Hayward

Source
