{"id":93317,"date":"2025-04-12T03:34:47","date_gmt":"2025-04-12T03:34:47","guid":{"rendered":"https:\/\/neclink.com\/index.php\/2025\/04\/12\/metas-vanilla-maverick-ai-model-ranks-below-rivals-on-a-popular-chat-benchmark\/"},"modified":"2025-04-12T03:34:47","modified_gmt":"2025-04-12T03:34:47","slug":"metas-vanilla-maverick-ai-model-ranks-below-rivals-on-a-popular-chat-benchmark","status":"publish","type":"post","link":"https:\/\/neclink.com\/index.php\/2025\/04\/12\/metas-vanilla-maverick-ai-model-ranks-below-rivals-on-a-popular-chat-benchmark\/","title":{"rendered":"Meta&#8217;s vanilla Maverick AI model ranks below rivals on a popular chat benchmark"},"content":{"rendered":"<p> <br \/>\n<\/p>\n<div>\n<p id=\"speakable-summary\" class=\"wp-block-paragraph\">Earlier this week, Meta <a href=\"https:\/\/techcrunch.com\/2025\/04\/06\/metas-benchmarks-for-its-new-ai-models-are-a-bit-misleading\/\">landed in hot water<\/a> for using an experimental, unreleased version of its Llama 4 Maverick model to achieve a high score on a crowdsourced benchmark, LM Arena. The incident <a href=\"https:\/\/www.theverge.com\/meta\/645012\/meta-llama-4-maverick-benchmarks-gaming\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">prompted the maintainers of LM Arena to apologize<\/a>, change their policies, and score the unmodified, vanilla Maverick.<\/p>\n<p class=\"wp-block-paragraph\">Turns out, it\u2019s not very competitive.<\/p>\n<p class=\"wp-block-paragraph\">The unmodified Maverick, \u201cLlama-4-Maverick-17B-128E-Instruct,\u201d <a href=\"https:\/\/lmarena.ai\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">was ranked below models<\/a> including OpenAI\u2019s GPT-4o, Anthropic\u2019s Claude 3.5 Sonnet, and Google\u2019s Gemini 1.5 Pro as of Friday. Many of these models are months old.<\/p>\n<blockquote class=\"wp-block-quote twitter-tweet is-layout-flow wp-block-quote-is-layout-flow\">\n<p class=\"wp-block-paragraph\">The release version of Llama 4 has been added to LMArena after it was found out they cheated, but you probably didn\u2019t see it because you have to scroll down to 32nd place which is where is ranks <a href=\"https:\/\/t.co\/A0Bxkdx4LX\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">pic.twitter.com\/A0Bxkdx4LX<\/a><\/p>\n<p class=\"wp-block-paragraph\">\u2014 \u03c1:\u0261e\u03c3n (@pigeon__s) <a href=\"https:\/\/twitter.com\/pigeon__s\/status\/1910705956486336586?ref_src=twsrc%5Etfw\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">April 11, 2025<\/a><\/p>\n<\/blockquote>\n<p class=\"wp-block-paragraph\">Why the poor performance? Meta\u2019s experimental Maverick, Llama-4-Maverick-03-26-Experimental, was \u201coptimized for conversationality,\u201d the company explained in a <a href=\"https:\/\/ai.meta.com\/blog\/llama-4-multimodal-intelligence\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">chart published<\/a> last Saturday. Those optimizations evidently played well to LM Arena, which has human raters compare the outputs of models and choose which they prefer.<\/p>\n<p class=\"wp-block-paragraph\"><a href=\"https:\/\/techcrunch.com\/2024\/09\/05\/the-ai-industry-is-obsessed-with-chatbot-arena-but-it-might-not-be-the-best-benchmark\/\">As we\u2019ve written about before<\/a>, for various reasons, LM Arena has never been the most reliable measure of an AI model\u2019s performance. Still, tailoring a model to a benchmark \u2014 besides being misleading \u2014 makes it challenging for developers to predict exactly how well the model will perform in different contexts.<\/p>\n<p class=\"wp-block-paragraph\">In a statement, a Meta spokesperson told TechCrunch that Meta experiments with \u201call types of custom variants.\u201d <\/p>\n<p class=\"wp-block-paragraph\">\u201c\u2018Llama-4-Maverick-03-26-Experimental\u2019 is a chat optimized version we experimented with that also performs well on LMArena,\u201d the spokesperson said. \u201cWe have now released our open source version and will see how developers customize Llama 4 for their own use cases. We\u2019re excited to see what they will build and look forward to their ongoing feedback.\u201d<\/p>\n<\/div>\n<p><script async src=\"\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script><br \/>\n<br \/><br \/>\n<br \/><a href=\"https:\/\/techcrunch.com\/2025\/04\/11\/metas-vanilla-maverick-ai-model-ranks-below-rivals-on-a-popular-chat-benchmark\/\">Source link <\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Earlier this week, Meta landed in hot water for using an experimental, unreleased version of its Llama 4 Maverick model to achieve a high score<\/p>\n","protected":false},"author":1,"featured_media":93318,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[149],"tags":[],"class_list":["post-93317","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-business"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/neclink.com\/index.php\/wp-json\/wp\/v2\/posts\/93317","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/neclink.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/neclink.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/neclink.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/neclink.com\/index.php\/wp-json\/wp\/v2\/comments?post=93317"}],"version-history":[{"count":0,"href":"https:\/\/neclink.com\/index.php\/wp-json\/wp\/v2\/posts\/93317\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/neclink.com\/index.php\/wp-json\/wp\/v2\/media\/93318"}],"wp:attachment":[{"href":"https:\/\/neclink.com\/index.php\/wp-json\/wp\/v2\/media?parent=93317"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/neclink.com\/index.php\/wp-json\/wp\/v2\/categories?post=93317"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/neclink.com\/index.php\/wp-json\/wp\/v2\/tags?post=93317"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}