
TAI AAI #10 - AI x Speech has been successfully held.

The event was held on June 5.

 

This Tokyo AI (TAI) Advanced AI (AAI) group session featured three speakers working on AI for speech.

 

Tokyo AI (TAI) is a community composed of people based in Tokyo and working with, studying, or investing in AI. We are engineers, product managers, entrepreneurs, academics, and investors intending to build a strong "AI core" in Tokyo.

Find more in our overview: https://bit.ly/tai_overview

Speakers

  • Qi Chen, CEO and co-founder of Paraparas
    Title: From Words to Wisdom: Japanese ASR as the Entry Point to Knowledge Transformation
    Abstract: This talk provides a practical guide to building Japanese ASR models based on my experience developing ReazonSpeech-k2-v2, an open-source Japanese ASR model. I'll walk through the complete pipeline from dataset preparation to deployment optimization, addressing the unique challenges of Japanese speech recognition, including accent variations and real-time performance requirements. While my current work at Paraparas focuses on knowledge transformation through our Paralogue platform (offering world-leading ASR capabilities with our partner Gladia), this session offers insights for researchers, engineers, and entrepreneurs looking to bridge the gap between academic speech research and real-world applications. (A minimal inference sketch follows the speaker list.)
    Bio: Qi Chen is the CEO and co-founder of Paraparas, developing Paralogue, a platform that transforms dialogue and monologue into personalized knowledge. He completed his PhD in Cognitive Science under Douglas Hofstadter at Indiana University Bloomington, focusing on computational models of analogical thinking. His interdisciplinary background spans cognitive science, computational linguistics, and software engineering, with a focus on creating technology that enhances rather than replaces human capabilities.
  • Nathania Nah, PhD student at Science Tokyo
    Title: Exploring Disentanglement in Speech
    Abstract: Disentanglement is a method that aims to identify and separate distinctive generative factors in the data, thus removing the sensitivity of the representation to variations in data that are uninformative to the task. Traditionally, speech disentanglement has been used in speaker-relevant classification and generation tasks, such as speaker verification, voice conversion, and speech synthesis. We will discuss our aims to disentangle features in pretrained speech representations to better identify how they are used in downstream tasks, as well as to improve the understanding of the features captured in self-supervised methods, with the ultimate goal of generating models with better explainability. (A small probing sketch follows the speaker list.)
    Bio: Nathania is a PhD student at Science Tokyo (formerly Tokyo Tech) studying machine learning in speech. Her work primarily consists of multimodal recognition of personality and emotion, and she is currently focusing on affective computing in speech at Shinoda Lab. She is passionate about ways to improve personal and mental well-being with new technologies.
  • Meishu Song
    Title: Speech-to-Speech Technology: Recent Advances and Challenges
    Abstract: This talk will outline the core architecture and technical principles of speech-to-speech systems and explore their evolution. The presentation will examine breakthrough advancements enabled by large language models, with a focus on innovations in low-resource languages and emotion preservation. (A cascaded-pipeline skeleton follows the speaker list.)
    Bio: With a Ph.D. in Affective Computing AI from the University of Tokyo, Meishu Song currently serves as a researcher at the same institution while leading an innovative startup focused on emotional companionship products. Meishu possesses extensive expertise in emotion recognition and deep learning, with applications spanning diverse sectors including education, mental health, and the automotive industry. Her work bridges cutting-edge technology with human-centered design to create AI systems that better understand and respond to human emotions.
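
Code sketches

The short sketches below are rough illustrations of the topics above, not material from the talks.

For the Japanese ASR talk, a minimal inference sketch in Python. It assumes a model that works with the Hugging Face transformers pipeline; ReazonSpeech-k2-v2 itself is a k2/icefall model with its own tooling, so the model name below is only a placeholder.

    # Minimal sketch: transcribing Japanese audio with a pretrained ASR model.
    # The model name is a placeholder assumption; ReazonSpeech-k2-v2 is loaded
    # through k2/icefall tooling rather than this interface.
    from transformers import pipeline

    asr = pipeline(
        "automatic-speech-recognition",
        model="openai/whisper-small",  # placeholder; swap in a Japanese-tuned model
        chunk_length_s=30,             # chunk long recordings for near-real-time use
    )

    result = asr("meeting_ja.wav")     # expects a mono audio file, e.g. 16 kHz WAV
    print(result["text"])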
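
For the disentanglement talk, a small probing sketch: extract features from a pretrained self-supervised encoder and attach a linear probe for one factor (for example, speaker or emotion). Comparing probe accuracy across factors and layers is a common way to inspect how entangled a representation is; the encoder and the ten-class probe here are assumptions for illustration, not the setup used in the talk.

    # Minimal sketch: probing a pretrained speech encoder as a first step
    # toward analysing disentanglement. Encoder choice and the 10-class
    # probe are illustrative assumptions only.
    import torch
    from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

    extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
    encoder = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")

    waveform = torch.randn(16000)  # 1 s of dummy 16 kHz audio; use real speech in practice
    inputs = extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")

    with torch.no_grad():
        frames = encoder(**inputs).last_hidden_state  # shape: (1, num_frames, 768)

    utterance = frames.mean(dim=1)     # simple time-pooled utterance embedding
    probe = torch.nn.Linear(768, 10)   # linear probe for a single factor
    logits = probe(utterance)          # train this probe with labels for that factor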
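
For the speech-to-speech talk, a skeleton of the classical cascaded design (ASR, then a text LLM, then TTS). All three stage functions are hypothetical placeholders; the structure shows the text bottleneck where prosody and emotion are lost, which is one of the challenges behind the emotion-preservation theme in the abstract.

    # Minimal sketch of a cascaded speech-to-speech pipeline. Every stage is a
    # hypothetical placeholder, not a real API. Only text passes between stages,
    # so prosody and emotion are dropped at that bottleneck.
    from dataclasses import dataclass


    @dataclass
    class SpeechTurn:
        audio: bytes   # raw input speech
        language: str  # e.g. "ja"


    def transcribe(turn: SpeechTurn) -> str:
        """ASR stage: speech -> text (placeholder)."""
        raise NotImplementedError


    def respond(text: str) -> str:
        """LLM stage: text -> text reply (placeholder)."""
        raise NotImplementedError


    def synthesize(text: str, language: str) -> bytes:
        """TTS stage: text -> speech (placeholder)."""
        raise NotImplementedError


    def speech_to_speech(turn: SpeechTurn) -> bytes:
        """Run the cascade; an end-to-end model would replace all three stages."""
        return synthesize(respond(transcribe(turn)), turn.language)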

Organizers

  • Kai Arulkumaran
Previously, he completed his PhD in Bioengineering at Imperial College London and had work experience at DeepMind, FAIR, Microsoft Research, Twitter Cortex, and NNAISENSE. His research areas are deep learning, reinforcement learning, evolutionary computation, and computational neuroscience.
  • Craig Sherstan
His current research is on the application of RL to create AI opponents for the video game Gran Turismo. Previously, he completed his PhD in Reinforcement Learning at the University of Alberta, Canada, as part of the Bionic Limbs for Improved Natural Control Lab. He has past experience working with human-computer interfaces, robotics, and various software industries.
  • Ilya Kulyatin
Fintech and AI entrepreneur with work and academic experience in the US, the Netherlands, Singapore, the UK, and Japan, holding an MSc in Machine Learning from UCL.

Timetable

18:00 - 18:30

Doors open

18:30 - 18:40

Introduction

18:40 - 19:10

From Words to Wisdom: Japanese ASR as the Entry Point to Knowledge Transformation (Qi Chen)

19:10 - 19:40

Exploring Disentanglement in Speech (Nathania Nah)

19:40 - 20:10

Speech-to-Speech Technology: Recent Advances and Challenges (Meishu Song)

20:10 - 21:00

Networking
