New Google AI Makes It Easier to Record Business Presentations in Loud Environments


Story by Alison DeNisco Rayome, TechRepublic

Google’s deep learning audio-visual model can enhance the voice of a single person in a video and mute other background noise.

Building a slide deck, pitch, or presentation? Here are the big takeaways:

  • Google researchers unveiled a deep learning audio-visual model for isolating a single speech signal from a mix of sounds, including other voices and background noise.
  • The model has potential applications in speech enhancement and recognition in videos and in video conferencing.

When you find yourself in a noisy conference hall or networking event, it’s usually pretty easy to focus your attention on the particular person you’re talking to, while mentally “muting” the other voices and sounds in the area. This capability—known as the cocktail party effect—comes naturally to humans, but automatically separating an audio signal into its individual speech sources has remained a challenge for computers.

At least, until now: Google researchers have developed a deep learning audio-visual model for isolating a single speech signal from a mix of sounds, including other voices and background noise. As detailed in a new paper, the researchers were able to computationally produce videos in which a specific person’s voice is enhanced while all other sounds are suppressed.
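To give a sense of how this kind of audio-visual separation can work, here is a minimal sketch of a mask-based approach: a network takes the mixture’s spectrogram together with per-frame embeddings of the target speaker’s face and predicts a time-frequency mask that keeps that speaker’s voice and attenuates everything else. This is not Google’s actual architecture; the layer sizes, the `AudioVisualSeparator` class, and the face-embedding dimension below are illustrative assumptions only.

```python
# Conceptual sketch of mask-based audio-visual speech separation.
# All names and dimensions are illustrative, not the published model.
import torch
import torch.nn as nn


class AudioVisualSeparator(nn.Module):
    """Predicts a time-frequency mask for one target speaker, conditioned on
    the mixture spectrogram and that speaker's per-frame face embeddings."""

    def __init__(self, freq_bins=257, face_dim=512, hidden=400):
        super().__init__()
        # Fuse audio (spectrogram frame) and visual (face embedding) features.
        self.fuse = nn.Linear(freq_bins + face_dim, hidden)
        self.rnn = nn.LSTM(hidden, hidden, batch_first=True, bidirectional=True)
        # One mask value per frequency bin, squashed into [0, 1].
        self.mask_head = nn.Sequential(nn.Linear(2 * hidden, freq_bins), nn.Sigmoid())

    def forward(self, mix_spec, face_emb):
        # mix_spec: (batch, time, freq_bins) magnitude spectrogram of the mixture
        # face_emb: (batch, time, face_dim) embeddings of the target speaker's face
        x = torch.cat([mix_spec, face_emb], dim=-1)
        x = torch.relu(self.fuse(x))
        x, _ = self.rnn(x)
        mask = self.mask_head(x)
        # Element-wise masking keeps the target voice, attenuates other sounds.
        return mask * mix_spec


# Toy usage: one clip, 100 spectrogram frames.
model = AudioVisualSeparator()
mixture = torch.rand(1, 100, 257)
face = torch.rand(1, 100, 512)
enhanced = model(mixture, face)
print(enhanced.shape)  # torch.Size([1, 100, 257])
```

The visual stream is what makes the separation speaker-specific: because the mask is conditioned on one person’s face, the same mixture can be “re-listened to” for a different speaker simply by swapping in that speaker’s face embeddings.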



