{"id":89919,"date":"2025-01-20T01:45:01","date_gmt":"2025-01-20T01:45:01","guid":{"rendered":"https:\/\/neclink.com\/index.php\/2025\/01\/20\/ai-benchmarking-organization-criticized-for-waiting-to-disclose-funding-from-openai\/"},"modified":"2025-01-20T01:45:01","modified_gmt":"2025-01-20T01:45:01","slug":"ai-benchmarking-organization-criticized-for-waiting-to-disclose-funding-from-openai","status":"publish","type":"post","link":"https:\/\/neclink.com\/index.php\/2025\/01\/20\/ai-benchmarking-organization-criticized-for-waiting-to-disclose-funding-from-openai\/","title":{"rendered":"AI benchmarking organization criticized for waiting to disclose funding from OpenAI"},"content":{"rendered":"<p> <br \/>\n<\/p>\n<div>\n<p id=\"speakable-summary\" class=\"wp-block-paragraph\">An organization developing math benchmarks for AI didn\u2019t disclose that it had received funding from OpenAI until relatively recently, drawing allegations of impropriety from some in the AI community.<\/p>\n<p class=\"wp-block-paragraph\">Epoch AI, a nonprofit primarily funded by Open Philanthropy, a research and grantmaking foundation, revealed on December 20 that OpenAI had supported the creation of FrontierMath. FrontierMath, a test with expert-level problems designed to measure an AI\u2019s mathematical skills, was one of the benchmarks OpenAI used to demo its upcoming flagship AI, <a href=\"https:\/\/techcrunch.com\/2024\/12\/20\/openai-announces-new-o3-model\/\">o3<\/a>.<\/p>\n<p class=\"wp-block-paragraph\">In a <a rel=\"nofollow\" href=\"https:\/\/www.lesswrong.com\/posts\/cu2E8wgmbdZbqeWqb\/meemi-s-shortform\">post<\/a> on the forum LessWrong, a contractor for Epoch AI going by the username \u201cMeemi\u201d says that many contributors to the FrontierMath benchmark weren\u2019t informed of OpenAI\u2019s involvement until it was made public. <\/p>\n<p class=\"wp-block-paragraph\">\u201cThe communication about this has been non-transparent,\u201d Meemi wrote. \u201cIn my view Epoch AI should have disclosed OpenAI funding, and contractors should have transparent information about the potential of their work being used for capabilities, when choosing whether to work on a benchmark.\u201d<\/p>\n<p class=\"wp-block-paragraph\">On social media, <a rel=\"nofollow\" href=\"https:\/\/www.reddit.com\/r\/singularity\/comments\/1i4n0r5\/this_is_so_disappointing_epoch_ai_the_startup\/\">some<\/a> <a rel=\"nofollow\" href=\"https:\/\/x.com\/Mihonarium\/status\/1880944026603376865\">users<\/a> raised concerns that the secrecy could erode FrontierMath\u2019s reputation as an objective benchmark. In addition to backing FrontierMath, OpenAI had visibility into many of the problems and solutions in the benchmark \u2014 a fact that Epoch AI didn\u2019t divulge prior to December 20, when o3 was announced.<\/p>\n<p class=\"wp-block-paragraph\">In a <a rel=\"nofollow\" href=\"https:\/\/x.com\/CarinaLHong\/status\/1880763406179066048\">post<\/a> on X, Stanford PhD mathematics student Carina Hong also alleged that OpenAI has privileged access to FrontierMath thanks to its arrangement with Epoch AI, and that this isn\u2019t sitting well with some contributors. <\/p>\n<p class=\"wp-block-paragraph\">\u201cSix mathematicians who significantly contributed to the FrontierMath benchmark confirmed [to me] \u2026 that they are unaware that OpenAI will have exclusive access to this benchmark (and others won\u2019t),\u201d Hong said. \u201cMost express they are not sure they would have contributed had they known.\u201d<\/p>\n<p class=\"wp-block-paragraph\">In a reply to Meemi\u2019s post, Tamay Besiroglu, associate director of Epoch AI and one of the organization\u2019s co-founders, asserted that the integrity of FrontierMath hadn\u2019t been compromised, but admitted that Epoch AI \u201cmade a mistake\u201d in not being more transparent. <\/p>\n<p class=\"wp-block-paragraph\">\u201cWe were restricted from disclosing the partnership until around the time o3 launched, and in hindsight we should have negotiated harder for the ability to be transparent to the benchmark contributors as soon as possible,\u201d Besiroglu wrote. \u201cOur mathematicians deserved to know who might have access to their work. Even though we were contractually limited in what we could say, we should have made transparency with our contributors a non-negotiable part of our agreement with OpenAI.\u201d<\/p>\n<p class=\"wp-block-paragraph\">Besiroglu added that while OpenAI has access to FrontierMath, it has a \u201cverbal agreement\u201d with Epoch AI not to use FrontierMath\u2019s problem set to train its AI. (Training an AI on FrontierMath would be akin to <a rel=\"nofollow\" href=\"https:\/\/en.wikipedia.org\/wiki\/Teaching_to_the_test\">teaching to the test<\/a>.) Epoch AI also has a \u201cseparate holdout set\u201d that serves as an additional safeguard for independent verification of FrontierMath benchmark results, Besiroglu said.<\/p>\n<p class=\"wp-block-paragraph\">\u201cOpenAI has \u2026 been fully supportive of our decision to maintain a separate, unseen holdout set,\u201d Besiroglu wrote. <\/p>\n<p class=\"wp-block-paragraph\">However, muddying the waters, Epoch AI lead mathematician Ellot Glazer <a rel=\"nofollow\" href=\"https:\/\/www.reddit.com\/r\/singularity\/comments\/1i4n0r5\/this_is_so_disappointing_epoch_ai_the_startup\/\">noted in a post on Reddit<\/a> that Epoch AI hasn\u2019t be able to independently verify OpenAI\u2019s FrontierMath o3 results.<\/p>\n<p class=\"wp-block-paragraph\">\u201cMy personal opinion is that [OpenAI\u2019s] score is legit (i.e., they didn\u2019t train on the dataset), and that they have no incentive to lie about internal benchmarking performances,\u201d Glazer said. \u201cHowever, we can\u2019t vouch for them until our independent evaluation is complete.\u201d<\/p>\n<p class=\"wp-block-paragraph\">The saga is <a href=\"https:\/\/techcrunch.com\/2024\/09\/05\/the-ai-industry-is-obsessed-with-chatbot-arena-but-it-might-not-be-the-best-benchmark\/\">yet<\/a> <a href=\"https:\/\/techcrunch.com\/2024\/03\/07\/heres-why-most-ai-benchmarks-tell-us-so-little\/\">another<\/a> <a href=\"https:\/\/techcrunch.com\/2024\/11\/05\/people-are-using-games-like-pictionary-to-benchmark-ai-now\/\">example<\/a> of the challenge of developing empirical benchmarks to evaluate AI \u2014 and securing the necessary resources for benchmark development without creating the perception of conflicts of\u00a0interest.<\/p>\n<\/div>\n<p><br \/>\n<br \/><a href=\"https:\/\/techcrunch.com\/2025\/01\/19\/ai-benchmarking-organization-criticized-for-waiting-to-disclose-funding-from-openai\/\">Source link <\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>An organization developing math benchmarks for AI didn\u2019t disclose that it had received funding from OpenAI until relatively recently, drawing allegations of impropriety from some<\/p>\n","protected":false},"author":1,"featured_media":89920,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[149],"tags":[],"class_list":["post-89919","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-business"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/neclink.com\/index.php\/wp-json\/wp\/v2\/posts\/89919","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/neclink.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/neclink.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/neclink.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/neclink.com\/index.php\/wp-json\/wp\/v2\/comments?post=89919"}],"version-history":[{"count":0,"href":"https:\/\/neclink.com\/index.php\/wp-json\/wp\/v2\/posts\/89919\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/neclink.com\/index.php\/wp-json\/wp\/v2\/media\/89920"}],"wp:attachment":[{"href":"https:\/\/neclink.com\/index.php\/wp-json\/wp\/v2\/media?parent=89919"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/neclink.com\/index.php\/wp-json\/wp\/v2\/categories?post=89919"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/neclink.com\/index.php\/wp-json\/wp\/v2\/tags?post=89919"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}