Sunday, 24 July 2022

Adventures in Speech Recognition

Let me have a try at dictating my blog directly from my phone - I have never done this before, and I want to see if this technology is going to be useful... 

But first, I have to install Google's Gboard keyboard on my Samsung phone. Should have done this a long time ago - Gboard can handle spellcheck in two languages at the same time, something that Samsung's Android keyboard seems reluctant to do. I open the phone, tap the microphone icon in the Google bar, and speak the magic words "Open Blogger". This opens the Chrome browser showing me my Blogger accounts; I have to tap on the one I wish to open; it opens - then I place the cursor in the text box, and tap the microphone icon on Gboard. It works first time.

I must say that the ability to dictate straight to the phone has the potential to be a life-changing technology that boosts productivity and assists creativity [although obviously editing is needed - the software doesn't punctuate.]

And so - off we go... the following paragraph is taken directly from words dictated to my phone.

"I normally set off for my walks with a notepad and pen in my pockets in case I have any thoughts that I want to jot down [manually inserted semi-colon] now I can just open my phone and just start dictating directly into it [comma] and it captures my speech with a great degree of accuracy [full stop new paragraph.] I am just passing now a small farm on the left and in a ditch from the undergrowth the smell of carrion [dash inserted] a dead mammal might be a hair [correction - hare] or it might be a cat or it might be someone's dog that they buried there [dash] the smell is unmistakable [full stop] Anyway I'm heading down towards the railway line for today's walk [full stop] It's Sunday [comma] overcast [comma] but generally the weather is okay [full stop] The wonderful thing about this speech-to-text [hyphens added] software it is nearly effortless compared to typing [comma] especially typing on the tiny phone keyboard [comma] so I'm looking ahead [comma] I think that I'll be doing a lot of this in the future [full stop] It seems to be working very well punctuation is a little bit so of Hawaii [can't remember what I said, and I can't unscramble what 'so of Hawaii' could possibly have been] it doesn't seem to be able to put in full stops or commas where needed [full stop, end of trial.]"

Well, there we go - it's come a long way since my first attempts - over 25 years ago, it must be said, back in the UK. Optical character recognition had taken off very quickly - when I installed desktop publishing in our editorial office in 1990, we had a scanner that would reproduce with 99% accuracy type-written texts (this was pre-word processing, pre-email, so contributions would come typed on A4 paper). Rarely did anyone need to check with the original because an obvious mistake made by the OCR software couldn't be worked out. However, my first attempts with speech-to-text were only about 60-70% accurate, mainly because the software had been trained on US English accents, not on my upper-lower-middle-class Estuarine English accent. But - a quarter of a century later - with the exception of 'hare/hair' and 'so of Hawaii', and of course the lack of punctuation, it passes the test.

Speech recognition can only improve with time as AI gets smarter, having worked over more data. Placing punctuation where needed requires an understanding of context; often, the placing of a punctuation mark will be retrospective - once the sentence has reached its end. Will speech-recognition software be able to go back into a sentence it's just typed out to insert the necessary comma, semi-colon, dash, or question mark where needed? Working on a vast corpus of written text, AI neural networks will slowly make their first attempts at punctuating dictated speech. All those proponents of an imminent AI singularity - the moment when AI overtakes human intelligence - should watch this space. If AI is ever to be considered 'sentient', the least it can do is to learn how to punctuate.

But having tried this, I can be sure that there will be more dictated blog posts coming soon. What that means for the quality and quantity of blogging remains to be seen!

This time two years ago:
A Short Pilgrimage to Bid Farewell to the Day

This time six years ago:
Thoughts, trains set in motion

This time eight years ago:

This time nine years ago:
Up that old, familiar mountain

This time ten years ago:
More from Penrhos
