Accurate Online Speaker Diarization with Supervised Learning

November 14, 2018


Speaker diarization, the process of partitioning an audio stream containing multiple speakers into homogeneous segments, each associated with one individual, is an important part of speech recognition systems. By solving the problem of “who spoke when”, speaker diarization has applications in many important scenarios, such as understanding medical conversations, video captioning and more. However, training these systems with supervised learning methods is challenging: unlike standard supervised classification tasks, a robust diarization model must be able to associate distinct speech segments with new individuals who were not part of the training data.

Importantly, this limits the quality of both online and offline diarization systems. Online systems usually suffer more, since they require diarization results in real time.
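To make the online constraint concrete, here is a minimal sketch of a diarizer that labels each speech segment as it arrives and opens a new speaker whenever a segment does not resemble anyone heard so far. It is not the method described in the source article: it swaps in a simple greedy cosine-similarity clustering for illustration. The names `OnlineDiarizer` and `diarize_stream`, the 0.8 threshold, and the toy two-dimensional embeddings are all assumptions; a real system would obtain embeddings from a trained speaker encoder.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


class OnlineDiarizer:
    """Hypothetical greedy online clustering of segment embeddings.

    Each segment is labeled immediately, using only the segments seen
    so far, which mirrors the real-time constraint of online systems.
    """

    def __init__(self, threshold=0.8):
        self.threshold = threshold  # assumed similarity cut-off
        self.centroids = []         # running mean embedding per speaker
        self.counts = []            # segments assigned to each speaker

    def assign(self, embedding):
        # Find the most similar known speaker, if any.
        best, best_sim = None, -1.0
        for i, centroid in enumerate(self.centroids):
            sim = cosine(embedding, centroid)
            if sim > best_sim:
                best, best_sim = i, sim

        # No sufficiently similar speaker: treat this as a new individual.
        if best is None or best_sim < self.threshold:
            self.centroids.append(list(embedding))
            self.counts.append(1)
            return len(self.centroids) - 1

        # Otherwise update that speaker's running mean and reuse the label.
        n = self.counts[best]
        self.centroids[best] = [
            (c * n + e) / (n + 1)
            for c, e in zip(self.centroids[best], embedding)
        ]
        self.counts[best] = n + 1
        return best


def diarize_stream(embeddings, threshold=0.8):
    """Label a sequence of segment embeddings one at a time."""
    diarizer = OnlineDiarizer(threshold)
    return [diarizer.assign(e) for e in embeddings]


if __name__ == "__main__":
    # Toy embeddings: two speakers pointing in roughly orthogonal directions.
    stream = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [1.0, 0.05]]
    print(diarize_stream(stream))
```

Because each label is fixed as soon as its segment arrives, an early mistake cannot be corrected later, which is one reason online systems tend to be less accurate than offline ones that can see the whole recording before deciding.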

Source: googleblog.com


20 Best YouTube channels for AI and machine learning

What are the most interesting and informative YouTube channels about artificial intelligence (AI) and machine learning? Subscribe to these 20 high-quality channels today to stay up to date with the latest AI and machine learning breakthroughs. Siraj Raval:

New Theory of Intelligence May Disrupt AI and Neuroscience

Recent advances in artificial intelligence, particularly in deep learning, have borrowed concepts from the human brain. The architecture of most deep learning models is based on layers of processing: an artificial neural network inspired by the neurons of the biological brain. Yet neuroscientists do not agree on exactly what intelligence is or how it is formed in the human brain. It is a phenomenon that remains unexplained.

A Google Brain engineer’s guide to entering AI

Note that this guide was written in November 2018 to complement an in-depth conversation on the 80,000 Hours Podcast with Catherine Olsson and Daniel Ziegler on how to transition from computer science and software engineering in general into ML engineering, with a focus on alignment and safety. If you like this guide, we’d strongly encourage you to check out the podcast episode where we discuss some of the instructions here, and other relevant advice. Technical AI safety is a multifaceted area of research, with many sub-questions in areas such as reward learning, robustness, and interpretability.
