Today we're launching SWE-Lancer—a new, more realistic benchmark to evaluate the coding performance of AI models. SWE-Lancer includes over 1,400 freelance software engineering tasks from Upwork ...