Key points
- The State Archival Service of Ukraine has provided 10 TB of data to train the national AI model Siaivo.
- The dataset includes documents, manuscripts, laws, court decisions, and media materials.
- The goal is to develop a Ukrainian language model and strengthen AI sovereignty.
- The project is being implemented by the Ministry of Digital Transformation and the state enterprise Diia.
- This is the first time archival data has been provided for the development of digital services.
The State Archival Service of Ukraine has provided 10 terabytes of data for training the national language model Siaivo. This is a large collection of historical materials, documents, and academic texts, equivalent in size to approximately 70,000 books.
The dataset includes:
- manuscripts;
- archival documents;
- legislative acts;
- court decisions;
- media materials;
- dictionaries.
These resources will help create a Ukrainian AI system that better understands national context and works with Ukrainian-language content without losing meaning.
Most global AI assistants generate responses in English and then translate them into Ukrainian, often losing context in the process. To ensure that Siaivo becomes a reliable source of information for people and businesses, we are training it on Ukrainian data.
– Ministry of Digital Transformation of Ukraine.
The Head of the State Archival Service of Ukraine, Anatolii Khromov, noted that this is the first time archival materials have been transferred for the development of digital services.
According to him, by 2026 the number of digital copies in state archives is expected to increase from 150 million to over 200 million.
This is one of the highest rates of digitisation of archival heritage in the world.
– Anatolii Khromov.
More than 50 partners have already joined the initiative, including media organisations, universities, and libraries.
The project is being implemented with the involvement of the Ministry of Digital Transformation of Ukraine and the state enterprise Diia.