Polyfill integration #79
-
Hi James, great write-up.
I definitely think aligning to the spec is the way to go. My only hesitancy is that we're making the assumption that the spec will ever make it out of draft. I'm glad that Edge has implemented it too now, but given that was likely trivial for Microsoft thanks to Azure, I don't know how much of an indication it is that we'll see support from, for example, Firefox or Chromium derivatives any time soon.

I don't want to come across as overly negative/pessimistic, but audio in the browser is a notoriously complex and slow-moving area. Case in point: createScriptProcessor was deprecated over 5 years ago, but its successor is still in the working-draft phase with very poor browser support, leaving no reliable option for developers.

Given the above, if it turns out there will be a large amount of effort involved in maintaining spec-consistent APIs, we may want to look at the cost-benefit - assuming others share my low expectations of anything like full support in the next few years at least (and assuming the spec doesn't change in the process). For now, I can't think of a better way to standardise, so I guess we just cross our fingers :)
I'm a bit hesitant about patching global objects - I don't think it's too uncommon for polyfills to be exposed via a scoped/different package name (e.g. bluebird promises). Assuming we were spec-compliant, the benefits you're talking about around the implementation not needing to change would still pretty much apply; you'd just import/reference a different class.
I think the issue is that I've called my library a polyfill whereas it's more of a general-purpose library. I wanted to create a simple API that makes speech recognition easy to use. In a way, it's essentially the same as this library, except not React-based. I think the most sensible thing might be to split my repo up and move the "friendly" API downstream, so that we can maintain a general-purpose, spec-compliant polyfill that anyone can use to patch their speech recognition implementations. In the same vein, I think it's fair to assume that other polyfills will also attempt to mirror the spec, so hopefully adapters won't be required.
I think this option works better personally - although I get that it's probably more work from your point of view. I just can't think of any elegant way to implicitly fetch config etc. without polluting the global namespace or introducing needless complexity. It's yet another concession in the "make the implementations identical" objective, but again, it's probably only going to be one line at the top of a file, so hopefully not a huge issue.

In terms of next steps, are you happy for me to go away and refactor my repo to conform to the parts of the spec listed above? I think it will make it easier to consume for not just this library but any similar ones that might pop up in the future.
-
I agree that the draft spec can't be completely relied upon. That said, I think it's the best option we have for now - it's a well-documented API out of the box and has the potential to become standard, though I'm not sure how that could happen until the big cloud providers open up access to their speech recognition services, or perhaps a company like Mozilla builds their own and shares it with other browser vendors. The W3C group did briefly toy with the idea of using streaming APIs (rather like what you're doing in your polyfill) in SpeechRecognition, as well as making the service URI configurable (this was apparently supported in Chrome for a while before being dropped). They seemed to abandon the idea to avoid drastically rewriting the spec (their discussion here). If they ever return to this idea, there's a good chance the spec would change a fair bit.
Makes sense to me. I think it's reasonable to support the spec in the polyfill and build more user-friendly APIs around it elsewhere.
Either approach works for me and would present an equally simple setup for consumers. In terms of writing an MVP polyfill that we can test with this library early on, some of the properties could be simplified or constrained:
I'll leave it up to you to decide how to handle consumers setting unsupported values for these properties. If the given values are invalid, you could fail gracefully and fall back to sensible supported values, or you could fail loudly and throw an error to force the consumer to provide a supported value. I'm not too fussed either way - any property constraints could be documented in your polyfill's README.
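For example, the graceful option might look something like this sketch (the function and supported values are made up for illustration):

```ts
// Sketch of the "fail gracefully" option (supported values are illustrative)
const SUPPORTED_LANGS = ['en-US', 'en-GB']
const DEFAULT_LANG = 'en-US'

const resolveLang = (requested: string): string => {
  if (SUPPORTED_LANGS.includes(requested)) {
    return requested
  }
  // Warn the consumer, then fall back to a sensible supported value
  console.warn(`lang "${requested}" is not supported; falling back to ${DEFAULT_LANG}`)
  return DEFAULT_LANG
}
```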
Yes, that would be great. While you do that, I can make the SpeechRecognition object configurable in this library so your polyfill can be passed in.
-
Really interesting, thanks
Veering towards the former, but we'll see. I think realistically it'll be January before I have a first pass ready to share. It's exciting to be working on this - Gartner predicts a huge proliferation of browser-based voice-enabled apps in 2021, as voice search overtook text search for the first time this year. The web is absolutely not ready for that right now, and being part of laying the foundations in preparation for wider adoption is a great place to be. Maybe you'd be up for looking at speech synthesis afterwards too? :) Anyway, I'll get started and let you know on here when I have something to show. Cheers.
-
How's the AWS Transcribe polyfill coming along, @ceuk? I've just made a release with polyfill support - we've currently got one polyfill working (more or less) for Azure Cognitive Services.
-
Right, apologies for the delay. It's almost at a good first version, I think - just need to write some tests. Thought it would be good to get some eyes on it while I do.

The polyfill now behaves more or less identically to the native implementation, with the main caveats being that you have to import and use the polyfill directly, and that when you instantiate it you have to provide some AWS config (region and identity pool ID). As you advised in your previous post, I've been unable to support some of the properties and have chosen not to support others in this version. You can see the full support table here. Most should be self-explanatory, though there was one property that confused me. In terms of future versions, there are a few properties that look like the best candidates for implementing next.

I've also re-written it in TypeScript, so we have definitions out of the box now too, which is nice. Anyway, let me know your thoughts. Cheers
-
Yes, exactly.
Aha, totally missed that. That's a shame.
Off the top of my head, a couple of ways I'd try doing this:
-
Hi James,

That all makes sense. I already have an [isSupported](https://github.com/ceuk/speech-recognition-aws-polyfill/blob/master/src/recognizers/aws.ts#L24) prop but happy to rename. The language and isomorphism stuff should both be pretty trivial, so we may as well stick them in the next version in addition to the Safari issue.

With regards to the quality of the recognition: I might have messed around with the transcoding stuff since you last tested. I did a comparison again when I was doing the continuous stuff and it seems a lot better now, so it might be worth seeing what you think.

I'll try to get all the above done as soon as I can, but it probably won't be for a couple of weeks. I just need a day free on the weekend at some point and should be able to crack through it. I'll let you know when it's ready anyway.
-
Problem
`react-speech-recognition` depends on the SpeechRecognition part of the Web Speech API (W3C spec) to collect audio from the microphone and transcribe it. Unfortunately, this is an experimental API that is almost exclusively implemented by Google browsers, which make calls to a Google speech recognition service. For browsers that aren't owned by giant tech companies that can afford to use their own speech recognition services, this isn't an option without money being exchanged behind closed doors. The frustration this has caused the developers of Chromium-based browsers in particular is nicely captured in this post. This limited support has two outcomes:

Solution
Ideally, `react-speech-recognition` could be utilised on any major browser to enable voice-driven web experiences everywhere and encourage more developers to experiment with this technology. Furthermore, any audio data produced by the users of such experiences should be processed by the owners of the web apps, rather than Google.

One solution is to polyfill SpeechRecognition with implementations that use popular cloud services to perform the audio processing. In other words, fill in that missing feature on browsers that don't support SpeechRecognition. While this means that web developers will need to pay for and deploy their own speech recognition services, it will free them from the existing constraints of the API. At a high level, this will entail the following:

- polyfills that replicate the SpeechRecognition interface on top of cloud speech services
- integration with `react-speech-recognition` so that the transcription can be injected into the developer's React app

A starting point
A first pass at an AWS Transcribe polyfill has been created here. This handles all the interactions with the AWS SDK and the WebSockets-based audio streaming, presenting it in a simple API.
Given that AWS is the cloud provider of choice for most developers, it seems reasonable for this polyfill to be the first to be integrated with `react-speech-recognition`.

How to integrate polyfills with React Speech Recognition?
This is where this discussion comes in. My thoughts on this are as follows...
Reusing the W3C spec
One decision to make is what interface should be used for communication between this library and a SpeechRecognition polyfill. I'm of the opinion that, if possible, we should utilise the one that's already been established and well documented by Mozilla. `react-speech-recognition` is already tightly coupled to this API and its own interface reflects this (e.g. the options for configuring the processing of "interim results" and "continuous mode"). If other polyfill authors come along, they will have a well-defined standard from the W3C (albeit a draft that is heavily influenced by Google) to base their implementations on.

If a polyfill (a) implemented the existing spec, (b) patched the implementation into the `window` object, and (c) had sensible fallbacks or warnings for the parts that were not implemented yet, then the integration with `react-speech-recognition` could be very simple: the interface would not change, the implementation would be available in the same place, and the two libraries would not need to know anything about each other. In an ideal world, an example usage might look like this:
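Something like the following sketch - the polyfill package name and the way it patches `window` on import are assumptions for illustration:

```tsx
// Hypothetical polyfill package: importing it patches window.SpeechRecognition
// when no native implementation exists (the package name is illustrative).
import 'speech-recognition-aws-polyfill'
import React from 'react'
import { useSpeechRecognition } from 'react-speech-recognition'

const Dictaphone = () => {
  // react-speech-recognition finds the (polyfilled) implementation on window as usual
  const { transcript, listening } = useSpeechRecognition()
  return <p>{listening ? 'Listening: ' : ''}{transcript}</p>
}

export default Dictaphone
```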
Configuring the client for the cloud provider

One challenge would be configuring the polyfill with the credentials needed for the given cloud provider. A couple of options come to mind:

- The polyfill takes the credentials itself and applies the patch to `window`, either unconditionally (`fallbackOnly: false`) or just as a fallback when the native browser implementation is not available (`fallbackOnly: true`). The logic in the polyfill could be something like the first sketch below.
- The polyfill is passed into `react-speech-recognition` as an override for SpeechRecognition, as in the second sketch after that.
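A sketch of that fallback logic, under assumed names (`initRecognitionPolyfill`, `createAwsSpeechRecognition` and the config shape are all hypothetical, not real exports):

```ts
// Sketch of a setup function the polyfill might expose (names are hypothetical)
interface PolyfillConfig {
  region: string
  identityPoolId: string
  fallbackOnly?: boolean
}

// Hypothetical factory implemented elsewhere in the polyfill
declare function createAwsSpeechRecognition(config: PolyfillConfig): unknown

const initRecognitionPolyfill = (config: PolyfillConfig) => {
  const globals = window as any
  const native = globals.SpeechRecognition || globals.webkitSpeechRecognition
  if (config.fallbackOnly && native) {
    return // a native implementation exists - leave it untouched
  }
  // Otherwise patch the cloud-backed implementation into the global scope
  globals.SpeechRecognition = createAwsSpeechRecognition(config)
}
```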
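And a sketch of the override option - the `applyPolyfill` method on `react-speech-recognition` is hypothetical here, since no such API exists yet:

```ts
// Hypothetical override API (method and package export names are illustrative)
import AwsSpeechRecognition from 'speech-recognition-aws-polyfill'
import SpeechRecognition from 'react-speech-recognition'

// Tell react-speech-recognition to use the polyfill instead of window.SpeechRecognition
SpeechRecognition.applyPolyfill(AwsSpeechRecognition)
```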
How much of the spec needs to be implemented?

Definitely not the whole thing. `react-speech-recognition` only uses a subset, which consists of the following:

- `continuous` (property)
- `lang` (property)
- `interimResults` (property)
- `onresult` (property). On the events received, the following properties are used:
  - `event.resultIndex`
  - `event.results[i].isFinal`
  - `event.results[i][0].transcript`
  - `event.results[i][0].confidence`
- `onend` (property)
- `start` (method)
- `stop` (method)
- `abort` (method)

Even amongst these, some could be skipped in a basic polyfill as long as the missing pieces were documented. For example, if the values for `continuous` and `lang` were limited, warnings or even errors could be thrown if a user tried to set them to an unsupported value. The concept of an "interim result" could also be optional.
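For illustration, the throwing variant could be as simple as this sketch (the class name and supported values are made up):

```ts
// Sketch: a polyfill property that throws on unsupported values (illustrative)
const SUPPORTED_LANGS = ['en-US', 'en-GB']

class PolyfillSpeechRecognition {
  private _lang = 'en-US'

  get lang(): string {
    return this._lang
  }

  set lang(value: string) {
    if (!SUPPORTED_LANGS.includes(value)) {
      // Fail loudly so the consumer knows to provide a supported value
      throw new Error(`lang "${value}" is not supported by this polyfill`)
    }
    this._lang = value
  }
}
```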
Standardising polyfills across cloud providers

AWS Transcribe is not the only service that could be used for this purpose - there are others (Azure, GCP, maybe IBM?). AWS may not be the best choice for a company that has gone all-in on Azure services, for example. So each of these could have their own polyfill to suit the needs of each consumer.

If the polyfills were based on the W3C spec, the spec could potentially be represented by a stub class with all properties and methods provided with "not implemented" warnings. Then polyfills could extend this class, fill in whatever parts they can, and add their own warnings to parts for which they have limited support (e.g. maybe only a limited set of languages can be supported by a polyfill via the `lang` property).
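A rough sketch of what that stub could look like (everything here is illustrative):

```ts
// Sketch of a shared stub base class (names and warning style are illustrative)
class SpeechRecognitionStub {
  continuous = false
  lang = 'en-US'
  interimResults = false
  onresult: ((event: unknown) => void) | null = null
  onend: (() => void) | null = null

  start(): void {
    this.warnNotImplemented('start')
  }

  stop(): void {
    this.warnNotImplemented('stop')
  }

  abort(): void {
    this.warnNotImplemented('abort')
  }

  protected warnNotImplemented(name: string): void {
    console.warn(`SpeechRecognition.${name} is not implemented by this polyfill`)
  }
}

// A concrete polyfill overrides what it supports and inherits warnings for the rest
class AwsTranscribeRecognition extends SpeechRecognitionStub {
  start(): void {
    // ...begin streaming microphone audio to AWS Transcribe
  }
}
```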
One suggestion I have is for all these polyfills to eventually live in the same repo and share this stub class. This would help ensure they share a standardised interface and have consistent behaviour. Then all consumers of these polyfills (not just `react-speech-recognition`) could safely swap one for another whenever they change their cloud provider. This might take the form of a small Lerna monorepo, with each polyfill being published as a separate package but sharing the base class under the hood. If that base changed, all the polyfills could be updated simultaneously.
Alternative: Adapters

The polyfill authors may choose to design their own APIs and diverge from the W3C spec. In that case, `react-speech-recognition` would need to maintain Adapters for each polyfill. While this does give the polyfill authors freedom to create APIs that better suit the features offered by their cloud providers (maybe AWS Transcribe can do things that the SpeechRecognition API has no method for), it does create a few challenges in this repo, such as bloating `react-speech-recognition` with cloud provider SDKs in Webpack bundles.

Thinking about this more: if the Adapters simply converted each polyfill's API to the W3C interface (which `react-speech-recognition` currently consumes), they could potentially be owned by the polyfill authors. This would make more sense given the polyfill authors would know better than anyone how to write an Adapter for their polyfills.
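For what it's worth, an Adapter along those lines might be shaped like this sketch - the polyfill interface it wraps is imagined purely for illustration:

```ts
// Sketch of an Adapter mapping an imagined polyfill API onto the W3C shape
interface HypotheticalPolyfill {
  start(options: { language: string }): void
  stop(): void
  onTranscript: ((text: string, isFinal: boolean) => void) | null
}

class PolyfillAdapter {
  lang = 'en-US'
  onresult: ((event: { resultIndex: number; results: unknown[] }) => void) | null = null
  onend: (() => void) | null = null

  constructor(private inner: HypotheticalPolyfill) {}

  start(): void {
    this.inner.onTranscript = (transcript, isFinal) => {
      // Re-shape the polyfill's callback into a W3C-style result event
      const result = Object.assign([{ transcript, confidence: 1 }], { isFinal })
      this.onresult?.({ resultIndex: 0, results: [result] })
    }
    this.inner.start({ language: this.lang })
  }

  stop(): void {
    this.inner.stop()
    this.onend?.()
  }
}
```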
TL;DR

I'm in favour of this AWS speech recognition polyfill being modified to implement enough of this spec to enable the most basic functionality in `react-speech-recognition`. I would be happy to collaborate on this.

And of course, I'm interested to hear others' opinions and alternative designs. Polyfill integration is an exciting proposition, as it unlocks speech recognition experiences for more of the web and gives developers a means of building these experiences cross-platform.