A great number of scientific papers are published every year in the field of battery research, forming a huge textual data source. However, it is difficult to explore and retrieve useful information efficiently from these large, unstructured sets of text. The Bidirectional Encoder Representations from Transformers (BERT) model, trained on a large data set in an unsupervised way, provides a route to processing scientific text automatically with minimal human effort. To this end, we realized six battery-related BERT models, namely BatteryBERT, BatteryOnlyBERT, and BatterySciBERT, each of which comes in cased and uncased variants. They have been trained specifically on a corpus of battery research papers. The pretrained BatteryBERT models were then fine-tuned on downstream tasks, including battery paper classification and extractive question-answering for battery device component classification that distinguishes anode, cathode, and electrolyte materials. Our BatteryBERT models were found to outperform the original BERT models on these battery-specific tasks. The fine-tuned BatteryBERT was then used to perform battery database enhancement. We also provide a website application for its interactive use and visualization.

The number of scientific publications in the area of battery research has been growing exponentially in recent years, but these huge textual resources are not being explored thoroughly by scientists. Yet advancements in Artificial Intelligence and Natural Language Processing (NLP) are enabling NLP-based text-mining techniques to provide a way for rapid, large-scale information retrieval and data extraction from the materials-science literature with minimal human intervention. (1−6) Language modeling is one of the NLP applications that has undergone considerable development in the past decade, and it lays the foundation for many other NLP tasks. The first-generation language models (Word2Vec, (7) GloVe, (8) and FastText (9)) have proven abilities in capturing the semantic meanings of words. Meanwhile, the latest models are able to capture more complex concepts with minimal human labeling, especially the bidirectional long short-term memory (LSTM)-based models (10) and the transformer-based GPT-3 (11) and BERT. However, such large language models remain scarce in scientific research: while BERT models have been used extensively in biomedical areas, they are rarely found in the domain of chemistry and materials science, especially in the field of battery research, in which millions of research papers have been published. A battery database was recently autogenerated from large-scale data extraction of scientific papers using the "chemistry-aware" NLP tool ChemDataExtractor. (21−23) However, its construction relied on tedious manual labeling and training.
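As an illustration of how a fine-tuned extractive question-answering model of the kind described above could be queried for device-component extraction, here is a minimal sketch using the Hugging Face Transformers pipeline API. The checkpoint name "batterydata/batterybert-cased-squad-v1" is an assumption based on the authors' published models; substitute whichever fine-tuned checkpoint you actually use.

```python
# Minimal sketch: extractive QA with a fine-tuned BatteryBERT-style model.
# Assumes the checkpoint below is available on the Hugging Face Hub
# (an assumption; replace with your own fine-tuned model if it is not).
from transformers import pipeline

qa = pipeline(
    "question-answering",
    model="batterydata/batterybert-cased-squad-v1",  # assumed checkpoint name
)

context = (
    "The cell consists of a LiFePO4 cathode, a graphite anode, "
    "and a LiPF6-based electrolyte."
)

# Extractive QA returns a span of the context as the answer, so the model
# distinguishes anode, cathode, and electrolyte by pointing at the text.
for question in ("What is the cathode?",
                 "What is the anode?",
                 "What is the electrolyte?"):
    result = qa(question=question, context=context)
    print(f"{question} -> {result['answer']} (score={result['score']:.3f})")
```

Framing component classification as span extraction, as the paper does, sidesteps the need for a fixed label vocabulary: any material name that appears in the sentence can be returned as an answer.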