{"id":77637,"date":"2024-03-04T18:19:23","date_gmt":"2024-03-04T18:19:23","guid":{"rendered":"https:\/\/neclink.com\/index.php\/2024\/03\/04\/anthropic-claims-its-new-models-beat-gpt-4\/"},"modified":"2024-03-04T18:19:23","modified_gmt":"2024-03-04T18:19:23","slug":"anthropic-claims-its-new-models-beat-gpt-4","status":"publish","type":"post","link":"https:\/\/neclink.com\/index.php\/2024\/03\/04\/anthropic-claims-its-new-models-beat-gpt-4\/","title":{"rendered":"Anthropic claims its new models beat GPT-4"},"content":{"rendered":"<div>\n<p id=\"speakable-summary\">AI startup Anthropic, backed by <a href=\"https:\/\/techcrunch.com\/2023\/05\/23\/anthropic-raises-350m-to-build-next-gen-ai-assistants\/\">hundreds of millions<\/a> in venture capital (and perhaps soon <a href=\"https:\/\/news.crunchbase.com\/ai\/unicorn-anthropic-funding-menlo-goog-amzn\/\">hundreds of millions more<\/a>), today <a href=\"https:\/\/www.anthropic.com\/news\/claude-3-family\">announced<\/a> the latest version of its GenAI tech, Claude. And the company claims that it rivals OpenAI\u2019s <a href=\"https:\/\/techcrunch.com\/tag\/gpt-4\/\">GPT-4<\/a> in terms of performance.<\/p>\n<p>Claude 3, as Anthropic\u2019s new GenAI is called, is a family of models \u2014 Claude 3 Haiku, Claude 3 Sonnet, and Claude 3 Opus, Opus being the most powerful. 
All show \u201cincreased capabilities\u201d in analysis and forecasting, Anthropic claims, as well as enhanced performance on specific benchmarks versus models like GPT-4 (but not <a href=\"https:\/\/techcrunch.com\/2023\/11\/06\/openai-launches-gpt-4-turbo-and-launches-fine-tuning-program-for-gpt-4\/\">GPT-4 Turbo<\/a>) and Google\u2019s <a href=\"https:\/\/techcrunch.com\/2024\/02\/08\/google-goes-all-in-on-gemini-and-launches-20-paid-tier-for-gemini-ultra\/\">Gemini 1.0 Ultra<\/a> (but not <a href=\"https:\/\/techcrunch.com\/2024\/02\/15\/googles-new-gemini-model-can-analyze-an-hour-long-video-but-few-people-can-use-it\/\">Gemini 1.5 Pro<\/a>).<\/p>\n<p>Notably, Claude 3 is Anthropic\u2019s first multimodal GenAI, meaning that it can analyze text as well as images \u2014 similar to <a href=\"https:\/\/techcrunch.com\/2023\/11\/06\/openai-gpt-4-with-vision-release-research-flaws\/\">some flavors<\/a> of GPT-4 and <a href=\"https:\/\/techcrunch.com\/2023\/12\/13\/google-brings-gemini-pro-to-vertex-ai\/\">Gemini<\/a>. Claude 3 can process photos, charts, graphs and technical diagrams, drawing from PDFs, slideshows and other document types.<\/p>\n<p>Going a step beyond some GenAI rivals, Claude 3 can analyze multiple images in a single request (up to a maximum of 20). This allows it to compare and contrast images, notes Anthropic.<\/p>\n<p>But there are limits to Claude 3\u2019s image processing.<\/p>\n<p>Anthropic has disabled the models from identifying people \u2014 no doubt wary of the ethical and legal implications. And the company admits that Claude 3 is prone to making mistakes with \u201clow-quality\u201d images (under 200 pixels) and struggles with tasks involving spatial reasoning (e.g. 
reading an analog clock face) and object counting (Claude 3 can\u2019t give exact counts of objects in images).<\/p>\n<div id=\"attachment_2674016\" style=\"width: 1034px\" class=\"wp-caption aligncenter\"><img fetchpriority=\"high\" decoding=\"async\" aria-describedby=\"caption-attachment-2674016\" class=\"size-full wp-image-2674016\" src=\"https:\/\/techcrunch.com\/wp-content\/uploads\/2024\/03\/5d20371eeb8d045465bb22cacfd269b5958b004d-2200x1174-1.webp\" alt=\"Anthropic Claude 3\" width=\"1024\" height=\"546\" srcset=\"https:\/\/techcrunch.com\/wp-content\/uploads\/2024\/03\/5d20371eeb8d045465bb22cacfd269b5958b004d-2200x1174-1.webp 2200w, https:\/\/techcrunch.com\/wp-content\/uploads\/2024\/03\/5d20371eeb8d045465bb22cacfd269b5958b004d-2200x1174-1.webp?resize=150,80 150w, https:\/\/techcrunch.com\/wp-content\/uploads\/2024\/03\/5d20371eeb8d045465bb22cacfd269b5958b004d-2200x1174-1.webp?resize=300,160 300w, https:\/\/techcrunch.com\/wp-content\/uploads\/2024\/03\/5d20371eeb8d045465bb22cacfd269b5958b004d-2200x1174-1.webp?resize=768,410 768w, https:\/\/techcrunch.com\/wp-content\/uploads\/2024\/03\/5d20371eeb8d045465bb22cacfd269b5958b004d-2200x1174-1.webp?resize=680,363 680w, https:\/\/techcrunch.com\/wp-content\/uploads\/2024\/03\/5d20371eeb8d045465bb22cacfd269b5958b004d-2200x1174-1.webp?resize=1536,820 1536w, https:\/\/techcrunch.com\/wp-content\/uploads\/2024\/03\/5d20371eeb8d045465bb22cacfd269b5958b004d-2200x1174-1.webp?resize=2048,1093 2048w, https:\/\/techcrunch.com\/wp-content\/uploads\/2024\/03\/5d20371eeb8d045465bb22cacfd269b5958b004d-2200x1174-1.webp?resize=1200,640 1200w, https:\/\/techcrunch.com\/wp-content\/uploads\/2024\/03\/5d20371eeb8d045465bb22cacfd269b5958b004d-2200x1174-1.webp?resize=50,27 50w\" sizes=\"(max-width: 1024px) 100vw, 1024px\"\/>\n<p id=\"caption-attachment-2674016\" class=\"wp-caption-text\"><strong>Image Credits:<\/strong> Anthropic<\/p>\n<\/div>\n<p>Claude 3 also won\u2019t <em>generate<\/em> artwork. 
The models are strictly image-analyzing \u2014 at least for now.<\/p>\n<p>Whether fielding text or images, Anthropic says that customers can generally expect Claude 3 to better follow multi-step instructions, produce structured output in formats like <a href=\"https:\/\/en.wikipedia.org\/wiki\/JSON\">JSON<\/a> and converse in languages other than English compared to its predecessors. Claude 3 should also refuse to answer questions less often thanks to a \u201cmore nuanced understanding of requests,\u201d Anthropic says. And soon, the models will cite the source of their answers to questions so users can verify them.<\/p>\n<p>\u201cClaude 3 tends to generate more expressive and engaging responses,\u201d Anthropic writes in a support article. \u201c[It\u2019s] easier to prompt and steer compared to our legacy models. Users should find that they can achieve the desired results with shorter and more concise prompts.\u201d<\/p>\n<p>Some of those improvements stem from Claude 3\u2019s expanded context.<\/p>\n<p>A model\u2019s context, or context window, refers to input data (e.g. text) that the model considers before generating output. Models with small context windows tend to \u201cforget\u201d the content of even very recent conversations, leading them to veer off topic \u2014 often in problematic ways. As an added upside, large-context models can better grasp the narrative flow of data they take in and generate more contextually rich responses (hypothetically, at least).<\/p>\n<p>Anthropic says that Claude 3 will initially support a 200,000-token context window, equivalent to about 150,000 words, with select customers getting up to a 1-million-token context window (~700,000 words). 
That\u2019s on par with Google\u2019s newest GenAI model, the above-mentioned Gemini 1.5 Pro, which also offers up to a million-token context window.<\/p>\n<p>Now, just because Claude 3 is an upgrade over what came before it doesn\u2019t mean it\u2019s perfect.<\/p>\n<p>In a technical <a href=\"https:\/\/www-cdn.anthropic.com\/de8ba9b01c9ab7cbabf5c33b80b7bbc618857627\/Model_Card_Claude_3.pdf\">whitepaper<\/a>, Anthropic admits that Claude 3 isn\u2019t immune from the issues plaguing other GenAI models, namely bias and <a href=\"https:\/\/techcrunch.com\/2023\/09\/04\/are-language-models-doomed-to-always-hallucinate\/\">hallucinations<\/a> (i.e. making stuff up). Unlike some GenAI models, Claude 3 can\u2019t search the web; the models can only answer questions using data from before August 2023. And while Claude is multilingual, it\u2019s not as fluent in certain \u201clow-resource\u201d languages as it is in English.<\/p>\n<p>But Anthropic is promising frequent updates to Claude 3 in the months to come.<\/p>\n<p>\u201cWe don\u2019t believe that model intelligence is anywhere near its limits, and we plan to release [enhancements] to the Claude 3 model family over the next few months,\u201d the company writes in a <a href=\"https:\/\/www.anthropic.com\/news\/claude-3-family\">blog post<\/a>.<\/p>\n<p>Opus and Sonnet are available now on the web and via Anthropic\u2019s dev console and API, Amazon\u2019s <a href=\"https:\/\/techcrunch.com\/tag\/bedrock\/\">Bedrock<\/a> platform and Google\u2019s <a href=\"https:\/\/techcrunch.com\/2023\/08\/29\/google-upgrades-vertex-ai-to-keep-pace-with-the-generative-ai-boom\/\">Vertex AI<\/a>. 
Haiku will follow later this year.<\/p>\n<p>Here\u2019s the pricing breakdown:<\/p>\n<ul>\n<li>Opus: $15 per million input tokens, $75 per million output tokens<\/li>\n<li>Sonnet: $3 per million input tokens, $15 per million output tokens<\/li>\n<li>Haiku: $0.25 per million input tokens, $1.25 per million output tokens<\/li>\n<\/ul>\n<p>So that\u2019s Claude 3. But what\u2019s the 30,000-foot view of all this?<\/p>\n<p>Well, as we\u2019ve <a href=\"https:\/\/techcrunch.com\/2023\/04\/06\/anthropics-5b-4-year-plan-to-take-on-openai\/\">reported<\/a> previously, Anthropic\u2019s ambition is to create a next-gen algorithm for \u201cAI self-teaching.\u201d Such an algorithm could be used to build virtual assistants that can answer emails, perform research and generate art, books and more \u2014 some of which we\u2019ve already gotten a taste of with the likes of GPT-4 and other large language models.<\/p>\n<p>Anthropic hints at this in the aforementioned blog post, saying that it plans to add features to Claude 3 that enhance its out-of-the-gate capabilities by allowing Claude to interact with other systems, code \u201cinteractively\u201d and deliver \u201cadvanced agentic capabilities.\u201d<\/p>\n<p>That last bit calls to mind OpenAI\u2019s <a href=\"https:\/\/www.theinformation.com\/articles\/openai-shifts-ai-battleground-to-software-that-operates-devices-automates-tasks\">reported<\/a> ambitions to build a software agent to automate complex tasks, like transferring data from a document to a spreadsheet or automatically filling out expense reports and entering them in accounting software. 
OpenAI already <a href=\"https:\/\/techcrunch.com\/2023\/11\/06\/openai-launches-api-that-lets-developers-build-assistants-into-their-apps\/\">offers<\/a> an API that allows developers to build \u201cagent-like experiences\u201d into their apps, and Anthropic, it seems, is intent on delivering functionality that\u2019s comparable.<\/p>\n<p>Could we see an image generator from Anthropic next? It\u2019d surprise me, frankly. Image generators are the subject of much controversy these days, mainly for copyright- and bias-related reasons. Google was recently forced to <a href=\"https:\/\/techcrunch.com\/2024\/02\/23\/embarrassing-and-wrong-google-admits-it-lost-control-of-image-generating-ai\/\">disable<\/a> its image generator after it injected diversity into pictures with a farcical disregard for historical context. And a number of image generator vendors are in <a href=\"https:\/\/techcrunch.com\/2023\/01\/27\/the-current-legal-cases-against-generative-ai-are-just-the-beginning\/\">legal battles<\/a> with artists who accuse them of profiting off of their work by training GenAI on that work without providing compensation or even credit.<\/p>\n<p>I\u2019m curious to see the evolution of Anthropic\u2019s technique for training GenAI, \u201c<a href=\"https:\/\/techcrunch.com\/2023\/05\/09\/anthropic-thinks-constitutional-ai-is-the-best-way-to-train-models\/\">constitutional AI<\/a>,\u201d which the company claims makes the behavior of its GenAI easier to understand, more predictable and simpler to adjust as needed. 
Constitutional AI aims to provide a way to <a href=\"https:\/\/www.anthropic.com\/index\/core-views-on-ai-safety\" target=\"_blank\" rel=\"noopener\">align AI with human intentions<\/a>, having models respond to questions and perform tasks using a simple set of guiding principles. For example, for Claude 3, Anthropic said that it added a principle \u2014 informed by crowdsourced feedback \u2014 that instructs the models to be understanding of and accessible to people with disabilities.<\/p>\n<p>Whatever Anthropic\u2019s endgame, it\u2019s in it for the long haul. <a href=\"https:\/\/techcrunch.com\/2023\/04\/06\/anthropics-5b-4-year-plan-to-take-on-openai\/\">According<\/a> to a pitch deck leaked in May of last year, the company aims to raise as much as $5 billion over the next 12 months or so \u2014 which might just be the baseline it needs to remain competitive with OpenAI. (Training models isn\u2019t cheap, after all.) 
It\u2019s well on its way, with $2 billion and $4 billion in committed capital and pledges from Google and Amazon, respectively, and well over a billion combined from other backers.<\/p>\n<\/div>\n<p><a href=\"https:\/\/techcrunch.com\/2024\/03\/04\/anthropic-claims-its-new-models-beat-gpt-4\/\">Source link<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>AI startup Anthropic, backed by hundreds of millions in venture capital (and perhaps soon hundreds of millions more), today announced the latest version of its<\/p>\n","protected":false},"author":1,"featured_media":77638,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[178],"tags":[],"class_list":["post-77637","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-tech"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/neclink.com\/index.php\/wp-json\/wp\/v2\/posts\/77637","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/neclink.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/neclink.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/neclink.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/neclink.com\/index.php\/wp-json\/wp\/v2\/comments?post=77637"}],"version-history":[{"count":0,"href":"https:\/\/neclink.com\/index.php\/wp-json\/wp\/v2\/posts\/77637\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/neclink.com\/index.php\/wp-json\/wp\/v2\/media\/77638"}],"wp:attachment":[{"href":"https:\/\/neclink.com\/index.php\/wp-json\/wp\/v2\/media?parent=77637"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/neclink.com\/index.php\/wp-json\/wp\/v2\/categories?post=77637"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/neclink.com\/index.php\/wp-json\/wp\/v2\/tags?post=77637"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}