Oral
Fri Apr 25 12:30 AM -- 12:42 AM (PDT) @ Garnet 213-215
From Exploration to Mastery: Enabling LLMs to Master Tools via Self-Driven Interactions
[OpenReview]
Oral
Fri Apr 25 12:42 AM -- 12:54 AM (PDT) @ Garnet 213-215
Spider 2.0: Evaluating Language Models on Real-World Enterprise Text-to-SQL Workflows
[Slides] [OpenReview]
Oral
Fri Apr 25 12:54 AM -- 01:06 AM (PDT) @ Garnet 213-215
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions
[OpenReview]
Oral
Fri Apr 25 01:06 AM -- 01:18 AM (PDT) @ Garnet 213-215
LLM-SR: Scientific Equation Discovery via Programming with Large Language Models
[OpenReview]
Oral
Fri Apr 25 01:18 AM -- 01:30 AM (PDT) @ Garnet 213-215
Cybench: A Framework for Evaluating Cybersecurity Capabilities and Risks of Language Models
[OpenReview]
Oral
Fri Apr 25 01:30 AM -- 01:42 AM (PDT) @ Garnet 213-215
AFlow: Automating Agentic Workflow Generation
[OpenReview]