[Preprint] Dec. 8, 2023
With the recent development of large language models (LLMs), the need for models specialized in particular domains and languages has been discussed. There is also a growing need for benchmarks that evaluate how current LLMs perform in each domain. In this study, we therefore constructed a benchmark consisting of multiple tasks specific to Japanese and to the financial domain, and used it to evaluate several major models. The results confirmed that GPT-4 currently outperforms the other models and that the constructed benchmark functions effectively.
Large Language Model; Benchmark; Finance; Japanese
@preprint{Hirano2023-pre-finllm,
  title={{Construction of a Japanese Financial Benchmark for Large Language Model Evaluation in the Financial Domain [in Japanese]}},
  author={Masanori HIRANO},
  doi={10.51094/jxiv.564},
  year={2023}
}