Poster in Workshop: AI4MAT-ICLR-2025: AI for Accelerated Materials Design

nanoMINER: Multimodal Information Extraction for Nanomaterials

Roman Odobesku · Karina Romanova · Sabina Mirzaeva · Oleg Zagorulko · Roman Sim · Rustem Khakimullin · Julia Razlivina · Andrei Dmitrenko · Vladimir Vinogradov

Keywords: [ Automatic knowledge extraction ] [ Multimodal data extraction ] [ Large Language Models ] [ Nanozymes ] [ Nanomaterials ] [ Multi-agent systems ]


Abstract:

Automating structured data extraction from scientific literature is a critical challenge with broad implications across domains. We present nanoMINER, a multi-agent system that integrates large language models and multimodal analysis to extract scientific data on nanomaterials. At its core, a ReAct agent orchestrates specialized agents to ensure comprehensive data extraction. We demonstrate its efficacy by automating the assembly of nanomaterial and nanozyme datasets that were previously compiled manually by domain experts. While we achieve near-perfect extraction precision (0.98) for specific numerical parameters and excellent extraction quality for textual parameters, significant challenges remain in multimodal integration, visual data interpretation, and cross-format generalization. This paper explores the engineering complexities behind scientific data extraction systems and highlights open challenges that must be addressed to fully automate the knowledge extraction pipeline. We discuss how solving these challenges could dramatically accelerate materials discovery by eliminating manual data extraction bottlenecks and enabling truly data-driven research approaches.
