Artificial Intelligence and Machine Learning's Rising Potential to Transform Transcription and Translation for Media Editing


Event Time

Originally Aired - Sunday, April 14   |   10:00 AM - 10:20 AM PT

Pass Required: Core Education Collection Pass


Recent developments in artificial intelligence (AI) and machine learning (ML) make it possible to generate high-quality transcripts during the editing process. Capabilities built on these transcriptions include rich search, captioning, automatic paper cuts, and timeline editing driven by text selection.  
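Text-driven timeline editing can be illustrated with a minimal sketch, assuming the transcription engine emits per-word timings (the `Word` record and `span_to_timeline` helper here are hypothetical, not any specific product's API):

```python
from dataclasses import dataclass

@dataclass
class Word:
    text: str
    start: float  # seconds from clip start
    end: float

# Hypothetical per-word timings, as an ASR engine might emit them.
transcript = [
    Word("welcome", 0.0, 0.4),
    Word("to", 0.5, 0.6),
    Word("the", 0.6, 0.7),
    Word("show", 0.8, 1.2),
]

def span_to_timeline(words, first, last):
    """Map a selected word range (inclusive) to timeline in/out points."""
    return words[first].start, words[last].end

in_pt, out_pt = span_to_timeline(transcript, 1, 3)  # selection: "to the show"
```

Selecting text in the transcript thus resolves directly to media in/out points, which is what enables paper cuts and text-based trimming.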

Separately, AI/ML advances have improved the accuracy of text-to-text translation, capturing the nuances of both the source and target languages. Editing operations that benefit include captioning, with the ability to view the translation composited over the image, and search, where an editor searches in their native tongue for a word or phrase and finds the matching clip and time offset in the original language.  
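That cross-language search can be sketched as a lookup over translated transcript segments, where each hit carries an offset back into the original-language media (the index structure and `search_translated` function are illustrative assumptions):

```python
def search_translated(index, query):
    """Return (clip_id, offset_seconds) pairs whose translated text contains query.

    index maps clip_id -> list of (translated_text, offset_seconds) segments;
    offsets point back into the original-language media.
    """
    query = query.lower()
    return [
        (clip_id, offset)
        for clip_id, segments in index.items()
        for text, offset in segments
        if query in text.lower()
    ]

# Mandarin source clips indexed by their English translations (illustrative data).
index = {
    "clip_001": [("Welcome to the studio", 0.0), ("Roll camera", 12.4)],
    "clip_002": [("The studio lights are ready", 3.1)],
}

hits = search_translated(index, "studio")
```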

A third recent development in the AI/ML arena is the ability to create what is known as a voice print, which uniquely identifies a given person’s voice. It is represented in a compact mathematical form derived from analysis of the input audio. The same voice print also enables the converse, known as re-voicing: given the voice print, word timings, and transcript, it is now possible to generate speech with that person’s voice characteristics.  
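Because a voice print is a compact vector, identifying a speaker reduces to comparing vectors. A minimal sketch, assuming fixed-length embeddings and cosine similarity (the enrolled data is invented; real embeddings have hundreds of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Similarity of two fixed-length voice-print vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

# Hypothetical enrolled voice prints.
enrolled = {"actor_a": [0.9, 0.1, 0.3], "actor_b": [0.1, 0.8, 0.5]}
probe = [0.85, 0.15, 0.35]  # voice print extracted from new audio

best_match = max(enrolled, key=lambda name: cosine_similarity(enrolled[name], probe))
```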

Editing content that is not in one's native tongue poses obvious challenges. As mentioned, editors can use captioning information to assist the editing process, reading translations on clips that are not in their native language. However, novel solutions are on the horizon.   

Combining the above AI/ML approaches of transcript generation, translation, and re-voicing, it is now feasible to automatically re-voice the audio, so a given performance can be edited in a familiar language. As an example, consider a clip with dialogue in Mandarin. A native French speaker who is not familiar with Mandarin could edit that clip in French, because the re-voicing process generates a French-language version using the original actor’s voice. When editing is complete, the editor relinks to the original clips to finalize the program in Mandarin.  
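The workflow above chains the three capabilities in sequence. A minimal orchestration sketch, in which every function is a hypothetical stand-in for a real AI/ML service (the stub return values are invented; only the pipeline shape is the point):

```python
def transcribe(clip):
    """ASR stub: audio -> text, detected language, and word timings."""
    return {"text": "你好", "lang": "zh", "timings": [(0.0, 0.6)]}

def translate(text, src_lang, dst_lang):
    """Text-to-text machine translation stub."""
    return "bonjour"

def revoice(text, voice_print, timings):
    """Speech-synthesis stub conditioned on the original actor's voice print."""
    return f"audio<{text}>"

def editing_proxy(clip, editor_lang, voice_print):
    """Build a re-voiced proxy the editor can cut in their own language."""
    asr = transcribe(clip)
    translated = translate(asr["text"], asr["lang"], editor_lang)
    return revoice(translated, voice_print, asr["timings"])

proxy = editing_proxy("scene01.mov", "fr", voice_print=[0.9, 0.1, 0.3])
# The editor cuts against the proxy, then relinks to the original-language clips.
```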

There are ethical concerns about the use of AI/ML in the re-voicing process which must be understood and addressed before capabilities of this sort are widely exposed. The impact extends beyond editing to talent, rights, and fair compensation. All of these will be explored in the discussion. 


Presented as part of:

Generative AI for Media


Speakers

Randy Fayan
Sr. Director of Engineering
Avid Technology