Knowledge Extraction for Industry Classification in Large Language Models using Diverse Templates and Synthetic Data [in Japanese]

Kazuki YANO, Masanori HIRANO, Kentaro IMAJO

The 34th meeting of Special Interest Group on Financial Informatics of Japanese Society for Artificial Intelligence, pp. 132-137, Mar. 2, 2025

Conference

The 34th meeting of Special Interest Group on Financial Informatics of Japanese Society for Artificial Intelligence (SIG-FIN)

Abstract

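As applications of large language models (LLMs) in finance attract growing attention, LLMs need to draw on knowledge specific to financial markets at inference time. In particular, industry classification, a kind of knowledge specific to the Japanese financial market, is an important indicator for investment decisions. However, even finance-specialized LLMs currently fail to extract and use this knowledge effectively. In this study, to improve LLMs' ability to extract industry-classification knowledge, we fine-tune models on datasets synthesized from question-answering templates by both rule-based and LLM-based methods, and evaluate the effect. The experimental results confirm that using LLM-based synthetic data and diversifying the templates significantly improve the models' ability to extract industry-classification knowledge. We further show that the perplexity of the synthetic data correlates with accuracy on industry classification, which offers guidance for designing effective datasets.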
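The abstract does not spell out the templates or the generation procedure. As a rough illustration only, the Python sketch below shows one way rule-based synthesis with diverse question-answer templates could work: each (company, industry) pair is rendered through several distinct templates. The template wording, the `Record` type, the `build_rule_based_dataset` function, and the company names are hypothetical assumptions, not the authors' implementation, and the LLM-based synthesis mentioned in the abstract is not modeled here.

```python
import random
from dataclasses import dataclass

# Hypothetical QA templates (assumption: the paper's actual templates are not given).
TEMPLATES = [
    "Q: What is the industry classification of {company}?\nA: {industry}",
    "Q: Which industry does {company} belong to?\nA: {industry}",
    "Q: {company} is classified under which industry?\nA: {industry}",
    "Q: Name the industry sector of {company}.\nA: {industry}",
]


@dataclass(frozen=True)
class Record:
    """A hypothetical (company, industry) pair used as the source fact."""
    company: str
    industry: str


def build_rule_based_dataset(records, templates, per_record, seed=0):
    """Render each record through `per_record` distinct, randomly chosen templates."""
    if per_record > len(templates):
        raise ValueError("per_record cannot exceed the number of templates")
    rng = random.Random(seed)  # fixed seed keeps the synthetic dataset reproducible
    samples = []
    for rec in records:
        # Sampling without replacement gives each company distinct phrasings.
        for tpl in rng.sample(templates, per_record):
            samples.append(tpl.format(company=rec.company, industry=rec.industry))
    return samples


if __name__ == "__main__":
    records = [
        Record("Hypothetical Corp A", "Transportation Equipment"),
        Record("Hypothetical Corp B", "Banks"),
    ]
    for sample in build_rule_based_dataset(records, TEMPLATES, per_record=2):
        print(sample, end="\n\n")
```

Raising `per_record` increases template diversity per fact; whether more phrasings help, and by how much, is exactly the kind of question the study evaluates empirically.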

Keywords

Large Language Model; Fine-tuning; Synthetic Data

doi

10.11517/jsaisigtwo.2025.FIN-034_132

bibtex

@inproceedings{Yano2025-sigfin34,
  title={{Knowledge Extraction for Industry Classification in Large Language Models using Diverse Templates and Synthetic Data [in Japanese]}},
  author={Kazuki YANO and Masanori HIRANO and Kentaro IMAJO},
  booktitle={The 34th meeting of Special Interest Group on Financial Informatics of Japanese Society for Artificial Intelligence},
  pages={132--137},
  doi={10.11517/jsaisigtwo.2025.FIN-034_132},
  year={2025}
}