Voice search

The widget offers voice search as an alternative to typing. Transcription happens in the user's browser via the Web Speech API (webkitSpeechRecognition / SpeechRecognition); the widget then sends the resulting text to /api/widget/search. The component is VoiceSearchModal.

How it works

  1. The user clicks the "voice" button next to the input. If features.voiceSearch = false or the browser does not support the Web Speech API, the button is not rendered (widget/src/core/voiceSearch.ts:96-99 checks 'SpeechRecognition' in window).
  2. The modal opens and the browser requests microphone permission (first time).
  3. The browser transcribes in real time. The modal applies:
    • MAX_RECORDING_SECONDS = 60 — cuts off after one minute.
    • Silence detection: 1.5 s of silence after at least 2 s recorded → cut and process (VoiceSearchModal.tsx:32-35).
  4. When transcription ends, the widget calls POST /api/widget/search with query = transcript (useModalSearchHandlers.handleVoiceSearch:120-156). No audio is uploaded: STT lives entirely on the client.
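The detection and cut-off rules above can be sketched as pure functions. This is a hedged illustration, not the widget's actual code: the two timing constants and the `'SpeechRecognition' in window` check come from the docs, while the helper names are hypothetical.

```typescript
const MAX_RECORDING_SECONDS = 60;   // hard cap on a recording
const MIN_RECORDING_MS = 2_000;     // silence only counts after this much audio
const SILENCE_CUTOFF_MS = 1_500;    // stop after this much uninterrupted silence

// Feature detection mirroring the check in widget/src/core/voiceSearch.ts:
// the button renders only when one of the two constructors exists on `window`.
function isVoiceSearchSupported(w: Record<string, unknown>): boolean {
  return 'SpeechRecognition' in w || 'webkitSpeechRecognition' in w;
}

// Decide whether an in-progress recording should stop, given total elapsed
// time and the time since the last interim transcription result.
function shouldStopRecording(elapsedMs: number, silentForMs: number): boolean {
  if (elapsedMs >= MAX_RECORDING_SECONDS * 1000) return true; // one-minute cap
  return elapsedMs >= MIN_RECORDING_MS && silentForMs >= SILENCE_CUTOFF_MS;
}
```

Note that the silence rule only fires after at least two seconds of audio, so a slow start does not immediately cancel the recording.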

Design decision: the Web Speech API gives us zero network latency for STT and zero server-side cost, at the price of limited support (Chromium-based + Safari). There is no server-side fallback — if the browser does not support Web Speech API, the feature is hidden.

Browser support

| Browser | Support |
| --- | --- |
| Chrome / Edge (≥ 90) | yes |
| Safari (≥ 14.1) | yes |
| iOS Safari (≥ 14.5) | yes — explicit permission required |
| Android Chrome | yes |
| Firefox | no — Firefox implements neither webkitSpeechRecognition nor SpeechRecognition; the button is hidden |
| Others | graceful fallback: the button is hidden if the API is missing |

Modal states

The modal cycles through these states (i18n keys under voice.*):

  • tapToStart — initial prompt.
  • recording — recording and streaming transcription.
  • processing — confirming transcript and dispatching the search.
  • success — transcript ready.
  • Errors: microphonePermissionDenied, microphoneAccessFailed, microphoneBlockedByPolicy, noAudioDetected, noVoiceDetected, audioProcessError, recordingError, browserNotSupported.
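The happy-path cycle can be modeled as a small state machine. A minimal sketch, assuming the state names map one-to-one onto the i18n keys above; the `nextState` helper is hypothetical, not part of the widget:

```typescript
// Happy-path states; error states are terminal and omitted here.
type VoiceSearchState = 'tapToStart' | 'recording' | 'processing' | 'success';

// Advance the modal one step along the happy path.
function nextState(s: VoiceSearchState): VoiceSearchState {
  switch (s) {
    case 'tapToStart': return 'recording';   // user tapped the mic button
    case 'recording':  return 'processing';  // silence/timeout cut the recording
    case 'processing': return 'success';     // transcript confirmed, search sent
    case 'success':    return 'tapToStart';  // ready for another query
  }
}

// i18n keys live under the voice.* namespace.
const i18nKey = (s: VoiceSearchState): string => `voice.${s}`;
```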

Browser permissions

  • getUserMedia() requires a secure context (HTTPS). localhost is exempt, so plain HTTP works there during development.
  • If your host sets a Permissions-Policy header, make sure it does not block the microphone (e.g. microphone=()). Recommended: Permissions-Policy: microphone=(self "https://cdn.neuroon.ai").
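For example, on an nginx-fronted host the recommended header could be set like this (an illustrative config fragment, not a required setup):

```nginx
# Allow the embedding page (self) and the widget CDN origin to use the
# microphone, instead of blocking it site-wide.
add_header Permissions-Policy 'microphone=(self "https://cdn.neuroon.ai")' always;
```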

A server-side endpoint exists (but the widget does not use it)

There is a POST /api/widget/search/audio endpoint (WidgetSearchController.java:185) that accepts multipart/form-data with an audio field. It is available if you want to build a custom client with server-side STT, but the official widget does not use it today.

```shell
curl -X POST https://api.neuroon.ai/api/widget/search/audio \
  -H "X-Widget-Token: $WIDGET_TOKEN" \
  -F "audio=@recording.webm" \
  -F "locale=en"
```

Returns AudioSearchResponseDTO with both the transcript and the results. Useful for server-side integrations or browsers without Web Speech API.
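A custom client could build the same request in TypeScript. The endpoint path, `X-Widget-Token` header, and `audio`/`locale` fields come from the docs; the helper below is an illustrative sketch, not an official SDK:

```typescript
// Build the pieces of a multipart request to the server-side STT endpoint.
// Requires an environment with FormData and Blob (browsers, Node 18+).
function buildAudioSearchRequest(
  baseUrl: string,
  widgetToken: string,
  audio: Blob,
  locale: string,
): { url: string; method: string; headers: Record<string, string>; body: FormData } {
  const body = new FormData();
  body.append('audio', audio, 'recording.webm'); // the recorded clip
  body.append('locale', locale);                 // transcription language hint
  return {
    url: `${baseUrl}/api/widget/search/audio`,
    method: 'POST',
    headers: { 'X-Widget-Token': widgetToken },
    body,
  };
}

// Usage:
//   const req = buildAudioSearchRequest('https://api.neuroon.ai', token, clip, 'en');
//   const res = await fetch(req.url, { method: req.method, headers: req.headers, body: req.body });
```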

Further reading