Data is becoming the gold, oil and silicon of the future, the great source of wealth anticipated for the world to come. Data in industrial quantities is what has been flowing for four years between the National Library of Spain (BNE) and the Barcelona Supercomputing Center (BSC), the national supercomputing center, within the marIA program. The spelling is not a typo: the I and A of the name are capitalized as a play on IA, the Spanish acronym for artificial intelligence. The marIA program draws on this data to teach artificial intelligence to speak Spanish, for uses ranging from the Word spell checker to any company's automated telephone service.
Yesterday was an important day for the people behind marIA. The BSC engineers traveled to Madrid and publicly presented their work together with the BNE librarians. The Secretary of State for Digitization and Artificial Intelligence, Carme Artigas, presided over the event and announced a state investment of 30 million euros for the Natural Language Plan, which covers research by several universities and by the Royal Spanish Academy and, in a prominent place, marIA. Sources at the Ministry of Economy have explained that the program, financed by the EU, does not yet have an implementation timeline.
A lot of money for exactly what? “MarIA is a set of resources, essentially language models and the data to train those models, that serve as basic infrastructure so that Spanish can be incorporated into any AI application that involves language: Siri, Alexa, machine translation programs, text transcription... We have generated a basic resource for researchers to use in artificial intelligence applications,” explains Marta Villegas, head of the project in Barcelona.
Her job, then, is to build a network of millions of relationships between words that, once processed computationally, allows machines to speak Spanish and to imitate it. Artificial intelligence, like many humans, learns languages by listening and reading, by forming its own connections, by imitating what it hears.
“There are two difficulties in a project like this. The first is finding enough data. These models are trained with deep neural networks that are fed with big data. The second is having the computational resources, enough computing capacity,” explains Villegas. And that is where the National Library comes into the project, as the great provider of the information used to feed the BSC's computers.
“The National Library has looked after the written heritage of the Spanish language since its founding. In 2009 we began doing the same with written Spanish on the Internet, because we realized that otherwise there would be a digital dark age, with no sources,” explains Mar Pérez Morillo, director of the BNE's Digital Processes and Services Division. “Our work is the same as always; it has not changed because of marIA. The only thing we do is send the data we generate to the BSC so that their machines can train with it.”
That data includes advertisements, first communion cards, memes... any source that captures how a language is used at a given moment. “As a professional, I find it an impressive and very promising project. Suddenly we see that the great heritage we have built can be used to generate research and knowledge,” says Pérez Morillo.
What about practical applications? “marIA will be used in any artificial intelligence application that involves language: machine translation, transcription and classification of texts, proofreading, voice applications, conversational systems, summarization tools... These are applications we use every day without even realizing it,” explains Marta Villegas. “Academic use, for example, is very interesting. We can now improve the interpretation of large amounts of natural language,” adds Pérez Morillo.
From there, the imagination takes over. When cars drive themselves, will we be able to tell them "Let's go to my mother's house, but take the long way, there's no rush and it's prettier"? Will we be able to say it in Spanish? In Catalan, Galician or Basque? It is reasonable to think so. The marIA project covers all of Spain's official languages and provides for each phase of the work to be made publicly and freely available, so that researchers can use it in their applications. At the moment, only English and Mandarin are further ahead than Spanish in this field.