Literature Mining with Large Language Models to Assist the Development of Sustainable Building Materials
Abstract
Concrete industry, as one of the significant sources of carbon emissions, drives the urgency for its decarbonization that requires a shift to alternative materials. However, the absence of systematic knowledge summary remains a challenge for further development of sustainable building materials. This work offers a cost-efficient strategy for information extraction tasks in complex terminology settings using small (2.8B) large language models (LLMs) with well-designed instruction-completion schemes and fine-tuning strategies, introducing a dataset cataloging civil engineering applications of alternative materials. The Multiple Choice instruction scheme significantly improves model accuracies in entity inference from non-Noun-Phrase sources, with supervised fine-tuning benefiting from straightforward tokenized representations of choices. We also demonstrate the utility of the dataset by extracting valuable insights into promising applications of alternative materials from knowledge graph representations.