freedom

Language corpora and the Text Encoding Initiative (TEI)

Open standards for documented linguistic knowledge

Language corpora have become a foundational infrastructure for linguistics, natural language processing, and contemporary artificial intelligence. The term corpus does not merely denote a collection of texts but implies deliberate selection, structuring, and documentation according to explicit design criteria. Within this context, the Text Encoding Initiative Guidelines provide a …

Synthetic Data, Real Risks: Why AI Must Be Trained on High-Quality Open Data

A seductive solution with hidden dangers

Synthetic data is often presented as a clever fix for three persistent challenges in machine learning: data scarcity, unfair training distributions and privacy restrictions. At the same time, some argue it could democratise AI development by reducing dependence on large proprietary datasets held by a few dominant companies. But …

Apertus AI: A Fully Open Multilingual LLM for Local and Customized Deployment

Apertus AI is one of the most transparent and technically mature efforts to build a fully open-source large language model. Developed in Switzerland and released together with its source code, training documentation and model weights, it offers an unprecedented level of reproducibility and independence from closed ecosystems. This makes it ideal for researchers, public-sector institutions …

Building a Fully Open Greek LLM: A Three-Millennia Language Model Powered by Open Data Infrastructure

The rapid evolution of fully open large language models represents a transformative moment for countries that possess rich linguistic and cultural heritage. Over the past two years, the global AI community has shown that high-performance LLMs can be built openly, with transparent pipelines, published datasets and weights, and licenses that support both research and commercial …

Meltemi: The First Open-Source Large Language Model for Greek

Large Language Models (LLMs) have revolutionized the field of AI, opening up new opportunities for research and industry applications. However, LLMs demonstrate impressive capabilities only in high-resource languages, in particular in English, while their performance varies substantially across different languages. Especially in the case of low-resourced languages, such as Greek, existing open-source LLMs are underperforming …