Large Language Models (LLMs) have revolutionized the field of AI, opening up new opportunities for research and industry applications. However, LLMs demonstrate impressive capabilities mainly in high-resource languages, English in particular, while their performance varies substantially across other languages. For low-resource languages such as Greek, existing open-source LLMs underperform due to a lack of training data.
Recently, there have been efforts to extend the capabilities of open-source LLMs to other languages (e.g., LeoLM for German, Aguila for Spanish). This shift provides local communities with alternatives to commercial, siloed solutions, and with control over developing safe, application-optimized models.
To address these challenges, we are thrilled to introduce Meltemi, the first Greek LLM, trained by the Institute for Language and Speech Processing of Athena Research Center. Meltemi is a bilingual model: it remains highly proficient in English while being extended to understand and generate fluent text in Modern Greek. Built on top of Mistral-7B through continual pretraining, Meltemi is trained on a corpus of 28.5 billion tokens that includes high-quality Greek texts.
We release two models trained with an 8k context length under the Apache 2.0 license: Meltemi-7B-v1 and Meltemi-Instruct-7B-v1, an instruction-tuned variant suited to chatbot applications. The performance of the released models has been assessed on an LLM evaluation suite created by ILSP, showing an average improvement of 14.9% over Mistral-7B.
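For readers who want to try the instruction-tuned model, the sketch below shows one way to load it and run a short Greek chat turn with Hugging Face transformers. The repository id ilsp/Meltemi-7B-Instruct-v1, the use of a chat template, and the generation settings are assumptions for illustration, not details stated in this announcement.

```python
# Minimal sketch, assuming the instruction-tuned model is published on the
# Hugging Face Hub under a repo id like "ilsp/Meltemi-7B-Instruct-v1" and
# ships with a chat template. Adjust the id to the actual release.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ilsp/Meltemi-7B-Instruct-v1"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision so a 7B model fits on one GPU
    device_map="auto",
)

# A Greek prompt; the bilingual model is expected to answer fluently in Greek.
messages = [
    {"role": "user", "content": "Ποιο είναι το ψηλότερο βουνό της Ελλάδας;"}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```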
The model was trained on AWS infrastructure made available by GRNET.