How to Use Speech to Text for Events: Best Practices

Aug 27, 2021 | Blog, How-to

As virtual spaces and services become more common, so does speech-to-text software. Business professionals, content creators, and countless others use AI-powered tools like Siri, Alexa, and Zoom for shopping services, automatic captions, automatic translations, and more.

Unfortunately, sometimes these engines have trouble understanding requests from their users. This may be only a minor inconvenience in something like a text message, but during meetings and longer broadcasts it can become a serious problem. Thankfully, speakers can cooperate with AI engines and improve their audience’s experience with a few simple techniques.

Let’s consider how to use speech to text better by keeping three key concepts in mind: Environment, Articulation, and Familiarity.

Environment

Speech to text software relies on clear audio and can’t differentiate between a speaker’s voice and background noise as well as humans can. To give the AI engine cleaner audio to work with, consider the following:

  • Turn off music and reduce background noise as much as possible. This goes both for live speakers and virtual meeting hosts; make sure there’s no extra noise coming through from screen shares, videos, or unmuted participants.

In addition, consider these points for virtual events:

  • Use headphones if available. This reduces the risk of computer audio re-entering the microphone and creating a loop.
  • Use a microphone other than the one built into your computer or device. Even ear-bud microphones are often clearer and reduce echo better than built-in laptop, tablet, and phone mics. For ongoing events, consider investing in a desk microphone or dedicated headset.

Articulation

Speech to text engines are very good at recognizing speech, but still have more trouble with mumbling or rapid speech than a human might. Errors in same-language captions range from minor inconveniences to major barriers to understanding. Furthermore, those errors can create disastrous problems if you pair speech to text software with live translation, because the translation is built on the mistaken text.

There are many great articles on pacing your speech during presentations, and we recommend looking some up and practicing. Even if your meeting is casual, speaking this way will help the speech to text engine (and your audience) understand you best. Consider the following basics (and find a handy printable reference sheet here):

  • Speak slowly. This is difficult even for practiced public speakers, but it is the single best thing you can do to help speech to text understand you. A simple rule of thumb is, “If it feels like you’re speaking too slowly, it’s probably about right.”
  • Prepare key terms and ideas. Make sure that you know how to pronounce any key terms or names you reference during your content, especially if you plan to incorporate quotes or textual references you did not write yourself.
  • Practice enunciation and clear speech. When words and syllables blur together, speech to text engines have a harder time understanding them. Consider words in your language and area that often get shortened; some common English examples are “gonna” for “going to,” “wanna” for “want to,” and similar phrases. The next set of points can also help with this.

Familiarity

Hands-on practice with speech to text software for your specific situation doesn’t take long and is incredibly beneficial.

  • Compile a list of any relevant names and highly specialized terms a few days before the event. Speech to text engines usually have extremely large vocabularies, but they might not know how to correctly spell names, acronyms, or highly specific industry terms. Luckily, some have options to pre-load key terms. In addition, some have auto-replacement features similar to a customizable Spell Check. You can use these to mark terms the AI engine often gets wrong and supply the correct term for it to use instead.
  • Spend 15-20 minutes speaking into your specific speech to text engine and seeing how accurately it hears you. This will help you identify any speech patterns, phrases, or other nuances that confuse the software. Practice adjusting your speech to see if you can get better results, and take notes on the changes you made so you have them handy during your event.
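
If your engine has no built-in auto-replacement feature, the idea can be approximated on a saved transcript. The following is only a minimal sketch in Python, not any particular product's feature: the function name, the correction list, and the sample terms are all hypothetical, and it assumes you already have the transcript as plain text.

```python
import re

# Hypothetical correction list: what the engine tends to write -> what you meant.
# Build your own from the notes you take while practicing.
REPLACEMENTS = {
    "my sequel": "MySQL",
    "sequel": "SQL",
}

def apply_replacements(transcript, replacements):
    """Swap known misrecognitions for the correct terms, ignoring case."""
    if not replacements:
        return transcript
    # Look up matches by their lowercase form so "My Sequel" and "my sequel" both hit.
    lookup = {wrong.lower(): right for wrong, right in replacements.items()}
    # Try longer phrases first so "my sequel" wins over the shorter "sequel".
    alternatives = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(a) for a in alternatives) + r")\b",
        re.IGNORECASE,
    )
    # A single pass over the text, so corrected output is never corrected again.
    return pattern.sub(lambda match: lookup[match.group(0).lower()], transcript)

print(apply_replacements("We store it in My Sequel and query with sequel.", REPLACEMENTS))
```

Matching whole words only keeps a short entry from rewriting part of a longer, unrelated word.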

As you practice, keep in mind that AI will always have some margin of error. Focus on making sure your key ideas come through, rather than each specific word. If you cannot achieve the quality you hope for, share this with your event coordinator. They may be able to coach you further or discuss options for supplementing your presentation to help ensure a great audience experience.
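
One rough way to check whether your key ideas came through in a practice session is to compare the transcript against your list of key terms. The sketch below is written in Python under a few assumptions: the transcript is available as plain text, and the function names and sample terms are made up for illustration. It reports which terms appear and which are missing.

```python
import re

def normalize(text):
    """Lowercase the text and drop punctuation so matching is forgiving."""
    return " ".join(re.findall(r"[a-z0-9']+", text.lower()))

def key_term_report(transcript, key_terms):
    """Return (found, missing) lists of key terms, in the order given."""
    # Pad with spaces so only whole words or whole phrases count as a match.
    haystack = f" {normalize(transcript)} "
    found, missing = [], []
    for term in key_terms:
        needle = f" {normalize(term)} "
        (found if needle in haystack else missing).append(term)
    return found, missing

# Hypothetical practice transcript and key terms.
transcript = "Welcome to the quarterly review. Revenue grew, and the road map is on track."
found, missing = key_term_report(transcript, ["quarterly review", "revenue", "roadmap"])
print("Came through:", found)
print("Needs attention:", missing)
```

Here "roadmap" lands in the missing list because the sample transcript wrote it as two words, which is the kind of term worth practicing or adding to a replacement list.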

Ready for the next step?

Find out how spf.io can help you translate and caption your next event.
