Construction of a Japanese Financial Benchmark for Large Language Model Evaluation in the Financial Domain and the Performances of Models [in Japanese]

Masanori HIRANO

The 32nd meeting of Special Interest Group on Financial Informatics of Japanese Society for Artificial Intelligence, pp. 28-35, Mar. 2, 2024

Conference

The 32nd meeting of Special Interest Group on Financial Informatics of Japanese Society for Artificial Intelligence (SIG-FIN)

Abstract

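With the development of large language models (LLMs), the need to build language models specialized for particular domains and languages has been increasingly discussed. In this context, there is a growing need for benchmarks that evaluate, on a domain-specific basis, how well current large language models perform. In this study, we therefore constructed a benchmark consisting of multiple tasks specialized for the Japanese language and the financial domain, and ran it against major models. The results confirmed that GPT-4 currently stands out and that the constructed benchmark functions effectively. At the same time, the performance of other models is also improving, and we report on the performance trends of those models as well.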

Keywords

LLM; Benchmark; Finance; Japanese

doi

10.11517/jsaisigtwo.2023.FIN-032_28

bibtex

@inproceedings{Hirano2024-sigfin32,
  title={{Construction of a Japanese Financial Benchmark for Large Language Model Evaluation in the Financial Domain and the Performances of Models [in Japanese]}},
  author={Masanori HIRANO},
  booktitle={The 32nd meeting of Special Interest Group on Financial Informatics of Japanese Society for Artificial Intelligence},
  pages={28--35},
  doi={10.11517/jsaisigtwo.2023.FIN-032_28},
  year={2024}
}