A company developing math benchmarks for AI didn't disclose that it had received funding from OpenAI until relatively recently, drawing allegations of impropriety from some in the AI community.
Epoch AI, a nonprofit primarily funded by Open Philanthropy, a research and grantmaking foundation, revealed on December 20 that OpenAI had supported the creation of FrontierMath. FrontierMath, a test with expert-level problems designed to measure an AI's mathematical skills, was one of the benchmarks OpenAI used to demo its upcoming flagship AI, o3.
In a post on the forum LessWrong, a contractor for Epoch AI going by the username "Meemi" says that many contributors to the FrontierMath benchmark weren't informed of OpenAI's involvement until it was made public.
"The communication about this has been non-transparent," Meemi wrote. "In my view, Epoch AI should have disclosed OpenAI funding, and contractors should have transparent information about the potential of their work being used for capabilities, when choosing whether to work on a benchmark."
On social media, some users raised concerns that the secrecy could erode FrontierMath's reputation as an objective benchmark. In addition to backing FrontierMath, OpenAI had visibility into many of the problems and solutions in the benchmark, a fact that Epoch AI didn't disclose prior to December 20, when o3 was announced.
In a post on X, Stanford mathematics PhD student Carina Hong also alleged that OpenAI has privileged access to FrontierMath thanks to its arrangement with Epoch AI, and that this isn't sitting well with some contributors.
"Six mathematicians who significantly contributed to the FrontierMath benchmark confirmed [to me] … that they are unaware that OpenAI could have exclusive access to this benchmark (and others won't)," Hong said. "Most express they are not sure they would have contributed had they known."
In a reply to Meemi's post, Tamay Besiroglu, associate director of Epoch AI and one of the organization's co-founders, asserted that the integrity of FrontierMath hadn't been compromised, but admitted that Epoch AI "made a mistake" in not being more transparent.
"We were restricted from disclosing the partnership until around the time o3 launched, and in hindsight we should have negotiated harder for the ability to be transparent to the benchmark contributors as soon as possible," Besiroglu wrote. "Our mathematicians deserved to know who might have access to their work. Even though we were contractually limited in what we could say, we should have made transparency with our contributors a non-negotiable part of our agreement with OpenAI."
Besiroglu added that while OpenAI has access to FrontierMath, it has a "verbal agreement" with Epoch AI not to use FrontierMath's problem set to train its AI. (Training an AI on FrontierMath would be akin to teaching to the test.) Epoch AI also maintains a "separate holdout set" that serves as an additional safeguard for independent verification of FrontierMath benchmark results, Besiroglu said.
"OpenAI has … been fully supportive of our decision to maintain a separate, unseen holdout set," Besiroglu wrote.
Still, muddying the waters somewhat, Epoch AI lead mathematician Elliot Glazer noted in a post on Reddit that Epoch AI hasn't been able to independently verify OpenAI's FrontierMath o3 results.
"My personal opinion is that [OpenAI's] score is legit (i.e., they didn't train on the dataset), and that they have no incentive to lie about internal benchmarking performances," Glazer said. "However, we can't vouch for them until our independent evaluation is complete."
The saga is yet another example of the challenge of developing empirical benchmarks to evaluate AI, and of securing the necessary resources for benchmark development without creating the perception of conflicts of interest.