Large Acoustic Models: another challenge for the ecological AI world

Where: Sala conferenze (3rd floor)
When: January 31, 2024, 3pm

Francesco Cutugno is an associate professor of Natural Language Processing and Interaction Design (within the frame of the Software Engineering courses) at the University of Naples Federico II. From 2013 to 2018 he was president of the Italian Association of Speech Sciences. He is currently a member of the board of the Italian Association of Computational Linguistics. He directs Urban/Eco, an interdepartmental research center devoted to the study of applications of Artificial Intelligence to conversational agents, architecture, cultural heritage, and language.

Abstract:
These days, many nations, researchers at public institutions, and research organizations are attempting to build public, freely accessible Large (Generative) Language Models (LLMs). This enterprise also requires collecting the huge amount of training data that such models need. Similarly, another fundamental goal of public research should be to provide the scientific community with AI-based Automatic Speech Recognition systems that, like LLMs, require massive computational load, infrastructure, and thousands of hours of audio data, both labeled and unlabeled. By analogy with LLMs, we call these systems Large Acoustic Models (LAMs). My talk will cover the leading technologies in this field, describe the data profile required, and propose alternatives to current approaches that aim to partly reduce the complexity of transcribing speech. Some concluding remarks will address aspects of explainability hidden in the DNN approach to the problem.