
"TAI AAI #10 - AI x Speech" will be held

TAI AAI #10 – AI x Speech will be held on June 5.

Join us for an evening at the intersection of AI and speech technology, hosted by the Tokyo AI (TAI) Advanced AI (AAI) group. This session brings together experts pushing the boundaries of ASR, speech disentanglement, and speech-to-speech translation. From real-world applications to cutting-edge research, you'll explore how AI is transforming how we understand, process, and communicate through human speech.

Speakers

  • Qi Chen
    CEO and co-founder of Paraparas
    Title: From Words to Wisdom: Japanese ASR as the Entry Point to Knowledge Transformation
    Abstract: This talk provides a practical guide to building Japanese ASR models based on my experience developing ReazonSpeech-k2-v2, an open-source model. I'll walk through the complete pipeline from dataset preparation to deployment optimization, addressing the unique challenges of Japanese speech recognition, including accent variations and real-time performance requirements. While my current work at Paraparas focuses on knowledge transformation through our Paralogue platform (offering world-leading ASR capabilities with our partner Gladia), this session offers insights for researchers, engineers, and entrepreneurs looking to bridge the gap between academic speech research and real-world applications.
    Bio: Qi Chen is the CEO and co-founder of Paraparas, developing Paralogue, a platform that transforms dialogue and monologue into personalized knowledge. He completed his PhD in Cognitive Science under Douglas Hofstadter at Indiana University Bloomington, focusing on computational models of analogical thinking. His interdisciplinary background spans cognitive science, computational linguistics, and software engineering, with a focus on creating technology that enhances rather than replaces human capabilities.
  • Nathania Nah
    PhD student at Science Tokyo
    Title: Exploring Disentanglement in Speech
    Abstract: Disentanglement is a method that aims to identify and separate distinctive generative factors in the data, thus removing the sensitivity of the representation to variations in data that are uninformative to the task. Traditionally, speech disentanglement has been used in speaker-relevant classification and generation tasks, such as speaker verification, voice conversion, and speech synthesis. We will discuss our aims to disentangle features in pretrained speech representations to better identify how they are used in downstream tasks, as well as to improve the understanding of the features captured by self-supervised methods, with the ultimate goal of producing models with better explainability.
    Bio: Nathania is a PhD student at Science Tokyo (formerly Tokyo Tech) studying machine learning in speech. Her work primarily consists of multimodal recognition of personality and emotion, and she is currently focusing on affective computing in speech at Shinoda Lab. She is passionate about ways to improve personal and mental well-being with new technologies.
  • Meishu Song
    Title: Speech-to-Speech Technology: Recent Advances and Challenges
    Abstract: This talk will outline the core architecture and technical principles of speech-to-speech systems. The presentation will examine breakthrough advancements enabled by large language models, with a focus on innovations in low-resource languages and emotion preservation.
    Bio: With a Ph.D. in Affective Computing AI from the University of Tokyo, Meishu Song currently serves as a researcher at the same institution while leading an innovative startup focused on emotional companionship products.
    Meishu possesses extensive expertise in emotion recognition and deep learning, with applications spanning diverse sectors including education, mental health, and the automotive industry. Her work bridges cutting-edge technology with human-centered design to create AI systems that better understand and respond to human emotions.

Organizers

  • Kai Arulkumaran
    Previously, he completed his PhD in Bioengineering at Imperial College London and has work experience at DeepMind, FAIR, Microsoft Research, Twitter Cortex, and NNAISENSE. His research areas are deep learning, reinforcement learning, evolutionary computation, and computational neuroscience.
  • Craig Sherstan
    His current research is on the application of RL to create AI opponents for the video game Gran Turismo. Previously, he completed his PhD in Reinforcement Learning at the University of Alberta, Canada, as part of the Bionic Limbs for Improved Natural Control Lab. He has past experience working with human-computer interfaces, robotics, and various software industries.
  • Ilya Kulyatin
    Fintech and AI entrepreneur with work and academic experience in the US, Netherlands, Singapore, UK, and Japan, with an MSc in Machine Learning from UCL.

Time Table

18:00 - 18:30  Doors open
18:30 - 18:40  Introduction
18:40 - 19:10  From Words to Wisdom: Japanese ASR as the Entry Point to Knowledge Transformation (Qi Chen)
19:10 - 19:40  Exploring Disentanglement in Speech (Nathania Nah)
19:40 - 20:10  Speech-to-Speech Technology: Recent Advances and Challenges (Meishu Song)
20:10 - 21:00  Networking
