Details of Additions/Differences from Sun's JSAPI Specification

All of Sun's 1.0 specification has been implemented.

There are also several additions provided mainly by classes external to the JSAPI specification.


Additions:

Inside javax.speech:

Some javax.speech classes have been slightly modified from their original specification. Strictly, this should not be done, but the modifications were minor and not easily made outside of the javax.speech package. These classes/methods are:
RecognizerAdapter.recognizerListening added (for an example, see examples.recognition.TestResultListener)
A "recognizerListening" method has been added to the RecognizerAdapter class since it seemed to be missing, and a RECOGNIZER_LISTENING field to the ResultEvent class. A RECOGNIZER_LISTENING event is fired after a result has been processed, so over the course of several speech commands the recognizer will cycle through the RECOGNIZER_LISTENING and RECOGNIZER_PROCESSING states, firing events as it goes. My apologies to Sun if I have missed what may be a very good reason why they omitted the RECOGNIZER_LISTENING event. Its use is demonstrated in the examples.TestEngineListener class, which is used in all the recognition examples.
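As a sketch, a listener using the added callback might look like this (the null engine-selection argument and the handler bodies are illustrative only, and the callback is assumed to take a RecognizerEvent like the other RecognizerAdapter methods; the examples.TestEngineListener class is the authoritative sample):

```java
import javax.speech.Central;
import javax.speech.recognition.Recognizer;
import javax.speech.recognition.RecognizerAdapter;
import javax.speech.recognition.RecognizerEvent;

public class ListeningDemo {
    public static void main(String[] args) throws Exception {
        // null selects the default recognizer
        Recognizer rec = Central.createRecognizer(null);
        rec.allocate();
        rec.addEngineListener(new RecognizerAdapter() {
            // Standard JSAPI state callback
            public void recognizerProcessing(RecognizerEvent e) {
                System.out.println("Processing a result...");
            }
            // Non-spec callback added by this implementation: fired
            // after each result has been processed
            public void recognizerListening(RecognizerEvent e) {
                System.out.println("Back to listening");
            }
        });
    }
}
```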
Word Constructor added (for an example, see examples.vocab.Pronunciations)
The constructor Word(String writtenForm, String spokenForm, String[] pronunciations, long categories) has been added. This is a non-JSAPI-spec method so that new words or new pronunciations can be added to the VocabManager. (It was not clear otherwise how to add new words without such a constructor). For details on how to construct a pronunciation from phonetic symbols, see the documentation bundled with Microsoft's Speech API.
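A minimal sketch of adding a word with the new constructor (the word, spoken form and phonetic string below are invented; see examples.vocab.Pronunciations for the authoritative sample, and the Microsoft Speech API documentation for the real phonetic symbol set):

```java
import javax.speech.Central;
import javax.speech.VocabManager;
import javax.speech.Word;
import javax.speech.recognition.Recognizer;

public class AddWordDemo {
    public static void main(String[] args) throws Exception {
        Recognizer rec = Central.createRecognizer(null);
        rec.allocate();
        VocabManager vocab = rec.getVocabManager();
        // Non-spec constructor: written form, spoken form,
        // pronunciations (engine-specific phonetic symbols), categories.
        // The phonetic string here is purely illustrative.
        Word word = new Word("JSAPI", "jay sappy",
                new String[] { "jh ey s ae p iy" }, Word.NOUN);
        vocab.addWord(word);
    }
}
```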

In com.cloudgarden packages:

The com.cloudgarden.audio package has been added to provide extra IO capabilities, including integration with the JMF via the getDataSource and getDataSink methods of the CGAudioManager class. The com.cloudgarden.speech package exposes some public classes which implement some of the javax.speech classes and provide additional methods, and the com.cloudgarden.speech.userinterface package provides a few additional user-interface components.
 
Control Components The method getControlComponent(int type) has been added to CGEngineProperties to allow all of the Microsoft user-interface windows to be displayed.
Detection of SAPI4/SAPI5 engine In virtually all situations, the engine type (i.e. SAPI4 or SAPI5) is transparent to the implementation, but when needed the engine type can be discovered and the situation handled accordingly by calling the CGEngineProperties.getSapiVersion method. Differences between SAPI4 and SAPI5 engines are detailed below.
Flexible Audio Input/Output classes and methods The com.cloudgarden.audio package provides numerous classes to enable audio speech data IO with Files, Lines and remote clients. The package allows data to be read from and written to standard file types (WAV, MP3, AIFF, QT, GSM etc) and transmitted across a network in uncompressed and compressed (GSM) formats.

In addition, the getDataSource and getDataSink methods of the CGAudioManager class provide DataSource and DataSink implementations which enable the Java Media Framework to be used to pass audio data to a Recognizer and retrieve it from a Synthesizer. This allows all the benefits of the JMF to be used, such as compressed audio formats.

Feedback on recognition confidence Recognition results are given with values representing the confidence with which they have been recognized by the speech engine, allowing a certain degree of feedback to be provided in language-training applications.
Recognized utterances can be saved to WAVE files The AudioClip returned from the FinalResult.getAudio method can be cast into a com.cloudgarden.speech.CGResultAudioClip, whose saveToFile(String fileName) method can be called to save the audio data to a WAVE file.
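A sketch of saving a recognized utterance (the class and file name are invented; note that in standard JSAPI, result audio is only retained if RecognizerProperties.setResultAudioProvided(true) has been called beforehand):

```java
import javax.speech.recognition.FinalResult;

import com.cloudgarden.speech.CGResultAudioClip;

public class SaveUtterance {
    // Call from a ResultListener once the result has been accepted
    static void save(FinalResult result) {
        // getAudio() returns an AudioClip; the implementation class
        // adds a saveToFile method for writing out a WAVE file
        CGResultAudioClip clip = (CGResultAudioClip) result.getAudio();
        clip.saveToFile("utterance.wav"); // illustrative file name
    }
}
```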
Guessing of current speaker Only applies to SAPI4 engines - the CGEngineProperties.allowGuessingOfSpeaker method can be used to allow the Recognizer to change the current SpeakerProfile based on who it thinks is speaking.
Lip-Sync events for speech synthesis Lip-sync events are now detected from Synthesizers, and CGSpeakableEvents are broadcast to CGSpeakableListeners, which can then display the current shape of a mouth using the com.cloudgarden.speech.userinterface.Mouth Component.
Graphical User Interface additions Classes in the com.cloudgarden.speech.userinterface and com.cloudgarden.speech packages allow customizable Dialogs to be displayed which list all available engines, profiles and voices and allow the user to test and select them. One such dialog is the SpeechEngineChooser.

Lip-sync events are also captured from synthesizers and can be displayed using a com.cloudgarden.speech.userinterface.Mouth component, which extends the java.awt.Component class and so can be superimposed on other Components.

New Rules Two rules, <WILDCARD> and <DICTATION>, have been added from Microsoft's grammar definition, which are used in a grammar in place of any single word. <DICTATION> returns the recognized word while <WILDCARD> returns "...". For an example of their usage, see examples/grammars/helloWorld.gram (used in the examples.recognition.LoadJSGFFromURL example).
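For illustration, a minimal JSGF fragment using the two rules might look like this (the grammar name and phrase are invented; examples/grammars/helloWorld.gram bundled with the SDK is the authoritative sample):

```
#JSGF V1.0;
grammar helloDemo;
// Matches e.g. "call Fred now" - <DICTATION> reports the word
// actually spoken, while <WILDCARD> reports "..."
public <command> = call <DICTATION> <WILDCARD>;
```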
Spelling grammar There is a "spelling" grammar as well as a "dictation" grammar - load it by calling Recognizer's getDictationGrammar("spelling") method.
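A short sketch of enabling the spelling grammar (engine selection and allocation are illustrative boilerplate):

```java
import javax.speech.Central;
import javax.speech.recognition.DictationGrammar;
import javax.speech.recognition.Recognizer;

public class SpellingDemo {
    public static void main(String[] args) throws Exception {
        Recognizer rec = Central.createRecognizer(null);
        rec.allocate();
        // "spelling" selects the spelling grammar instead of
        // the default dictation grammar
        DictationGrammar spelling = rec.getDictationGrammar("spelling");
        spelling.setEnabled(true);
        rec.commitChanges();
    }
}
```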
Selecting a voice with the JSML <ENGINE> tag Not really an addition, but the JSML <ENGINE> tag can be used to select a voice, for example <ENGINE ENGID="Gender=Male;Name=Microsoft Mike"> or <ENGINE ENGID="Gender=Female"> (see the examples/testJSML.xml file). Possible values are
  • Age=Child, Teen, Adult or Senior (though all Microsoft voices are Adult)
  • Name=Microsoft Mike, Microsoft Sam or Microsoft Mary (these are the only 3 real voices supplied with the Microsoft SDK)
  • Gender=Male or Female
  • Language=409 (this is for American English - 809 is British English; I'm not sure about the other Language IDs)
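Putting the tag together with the attribute values above, a JSML fragment might look like this (the sentences and the exact combination of attributes are invented; see the examples/testJSML.xml file for the bundled sample):

```xml
<ENGINE ENGID="Gender=Male;Name=Microsoft Mike">
  This sentence is spoken by Mike.
</ENGINE>
<ENGINE ENGID="Gender=Female;Language=409">
  And this one by an American English female voice.
</ENGINE>
```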


Differences between SAPI4 and SAPI5 engines

SpeakerManager - SpeakerProfile methods

Pronunciation of words added using VocabManager.addWord or the <SAYAS> tag.

Certain peculiarities of SAPI4 speech engines and the SAPI5 specification require that a certain amount of care and experimentation be exercised when using the VocabManager.addWord method or the JSML <SAYAS> tag.

For SAPI4 engines:

For SAPI5 engines:



Synchronization with AWT EventQueue:

By default, AWT synchronization is turned on in this implementation, since this is part of Sun's specification. However, synchronization can be turned off by calling the method com.cloudgarden.speech.CGEngineCentral.setAWTSynchronization(false). This allows the JSAPI to be used from Applets - see this section.
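A sketch of turning the synchronization off (the call is assumed here to be static, as the fully-qualified form above suggests, and should be made before any engines are created):

```java
import com.cloudgarden.speech.CGEngineCentral;

public class NoAWTSync {
    public static void main(String[] args) throws Exception {
        // Disable AWT EventQueue synchronization, e.g. for use
        // from Applets or in headless environments
        CGEngineCentral.setAWTSynchronization(false);
        // ... create Synthesizer/Recognizer as usual ...
    }
}
```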

With AWT synchronization turned on, all JSAPI events are synchronized with the AWT EventQueue (the sending of events is inside a block synchronized with the EventQueue).