Prof. Dr. jur. Günter Reiner (HSU/UniBw H) & Prof. Dr. rer. nat. Daniel Braun (University of Marburg)
The number of published court decisions has increased dramatically in recent decades, and it is becoming increasingly difficult to process the information they contain. Indexing is one of the oldest tools for accessing written textual information, and it has proven its worth even in the age of full-text search. However, manual indexing is expensive and time-consuming. The HPC project serves as a pre-study for a DFG project proposal on the efficient, automatic, structured indexing and automatic comparison of German-language court decisions using unsupervised open-source Natural Language Processing (NLP) methods. One of the approaches to be investigated is the use of large language models (LLMs). Datasets from a cooperation with a large legal publishing house are available as a gold standard for evaluation.
Preliminary studies have so far examined the unstructured keywording of French- and German-language court decisions using classical, less computationally intensive unsupervised approaches (e.g., topic modelling). The results were of mixed quality: relevant keywords all too often appeared alongside irrelevant ones that could not easily be excluded with a stop-word list, as well as alongside misleading or simply wrong ones. These methods must now be adapted for use with large language models. So far, the literature on using LLMs for keyword extraction is limited.
The aim of the HPC project, which is scheduled to take one month, is to develop the technical foundation for a highly simplified first prototype that runs on the HPC infrastructure and extracts keywords from court decisions using the open-weights LLM Llama 3. This prototype will serve as the starting point for the larger three-year research project proposed in the DFG application.