Esqube Communication Solutions Pvt.Ltd.
Speech
Adaptable vocabulary speech recognition
With speech recognition now widely used in applications such as voice dialing and voice command/query, recognition systems must perform well with user-selected vocabularies and under varying noise conditions and complexity. We have developed a scalable isolated word recognizer (IWR) that can easily adapt to new vocabulary (new words) and new speakers. The approach accommodates even multilingual words and different vocabularies for different speakers.
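To illustrate what an adaptable vocabulary means in practice, here is a minimal sketch of a template-based isolated word recognizer. It uses dynamic time warping (DTW), a classical technique swapped in purely for illustration; the page does not say which method the Esqube IWR uses. The class name `TemplateRecognizer`, the one-dimensional feature sequences, and the example words are all assumptions.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # d[i][j] = cost of the best alignment of a[:i] with b[:j]
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


class TemplateRecognizer:
    """Hypothetical recognizer: one or more stored templates per word."""

    def __init__(self):
        self.templates = {}  # word -> list of feature sequences

    def add_word(self, word, features):
        # Adding a word (or another speaker's rendition) needs no retraining.
        self.templates.setdefault(word, []).append(list(features))

    def recognize(self, features):
        # Return the word whose closest template has the smallest DTW distance.
        best_word, best_dist = None, float("inf")
        for word, sequences in self.templates.items():
            for seq in sequences:
                dist = dtw_distance(features, seq)
                if dist < best_dist:
                    best_word, best_dist = word, dist
        return best_word


rec = TemplateRecognizer()
rec.add_word("call", [1, 2, 3, 2, 1])
rec.add_word("stop", [5, 5, 1, 1])
print(rec.recognize([1, 1, 2, 3, 3, 2, 1]))  # time-stretched version of "call"
```

In this sketch the vocabulary grows by calling `add_word` again, and per-speaker vocabularies could be modeled by keeping one recognizer instance per speaker. A real system would use multi-dimensional acoustic features rather than the toy numbers shown here.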
Speaker Authentication
There is currently considerable interest in person authentication through biometrics. One non-invasive, non-line-of-sight, and widely accepted measure is a person's voice. We have developed a new scheme for person identification through parametric modeling of the text-independent voice signal. Our new algorithm, called "mixture-PCA", captures a particular speaker's feature space uniquely, providing authentication even with a short utterance of about 1 second.
Automatic Language Identification
For multilingual voice response systems, it is important to recognise the language of the voice signal. (The signal can be elicited by a short query in each of the languages the system supports.) We have developed a new method of identifying the language from the acoustic signal alone, without requiring any transcription of a foreign language, unlike other approaches in the literature. The performance of this system is comparable to results reported in the literature, and we have also demonstrated it on a group of Indian languages.
Multi-rate Speech coder
We have developed a proprietary speech compression algorithm (a variant of CELP) for VoIP applications that can cope with network congestion. The approach is to use a multi-rate coder that adapts the source bitrate to suit the network condition. This provides a more reliable way of obtaining good-quality reconstructed speech than simple packet-loss recovery schemes. It is shown that the 5 kb/s speech coder operating at a 25% reduced bitrate can provide better speech quality than a coder subjected to even 5% packet loss. Similarly, with a 40% reduced bitrate, the quality is better than with even 5% packet loss at 4 kb/s. The PESQ-MOS score of the 5 kb/s coder is close to 3.5.
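The rate-adaptation idea can be sketched as follows. This is a minimal illustration, not the proprietary coder: the rate table, the function names, and the assumption that the sender has an estimate of available channel capacity are all hypothetical. The reading of "25% reduced bitrate" as 5 kb/s scaled down to 3.75 kb/s is also an interpretation of the text above.

```python
# Hypothetical set of coder modes in kb/s; the source does not list the real ones.
CODER_RATES_KBPS = (5.0, 3.75, 3.0)


def reduced_rate(nominal_kbps, reduction_fraction):
    """Bitrate after reducing a nominal rate by the given fraction (0.25 = 25%)."""
    return nominal_kbps * (1.0 - reduction_fraction)


def select_rate(available_kbps, rates=CODER_RATES_KBPS):
    """Pick the highest coder rate that fits the estimated channel capacity.

    Falls back to the lowest rate when nothing fits, so the sender lowers
    its source rate instead of letting the network drop packets.
    """
    fitting = [r for r in rates if r <= available_kbps]
    return max(fitting) if fitting else min(rates)


print(reduced_rate(5.0, 0.25))  # 5 kb/s coder at a 25% reduced bitrate
print(select_rate(4.0))         # congested channel: drop to a lower mode
print(select_rate(10.0))        # uncongested channel: use the full rate
```

The design choice this reflects is the one stated in the text: under congestion, sending every frame at a lower source rate is preferred over sending at full rate and concealing the frames that are lost.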
Speech/Music Coding
Current signal compression methods, such as MPEG for audio and CELP for speech, work well for their respective signals but not very well for the other. We have developed a new model for parametrically representing both speech and a variety of music signals, such as vocals and many different instruments. The approach is based on a non-stationary signal model, unlike the stationary models used in CELP or sinusoidal coders. We have integrated the psychoacoustic masking threshold in the frequency domain with the new parameter estimation, so as to represent the signal with the minimum number of parameters per second. A useful feature of the technique is that the number of parameters in the time domain can be traded off against the number in the frequency domain, and this trade-off can be adapted from frame to frame, so that the representation is good for a wide variety of signals. The technique can be used for developing new standards for speech/audio distribution over the Internet, digital audio broadcasting, time-scale or pitch-scale modification in audio dubbing, and other audio effects.
Very low bitrate speech coding
For secrecy applications (civilian and military) and very low bitrate wireless channels such as underwater communication, we need very low bitrate speech coding, below 1 kb/s, even if the result is of synthetic quality. (This may also be useful for speech archival.) We have developed a very low bitrate speech coding technique, at around 500 b/s, based on a segment vocoding principle. The coder is found to be effective across speech in different languages, extending what is reported in the literature.
Enhancement of noisy speech
In many practical speech applications, the speech is corrupted by random noise, and it is important to improve the quality and intelligibility of such speech for both human listening and machine processing. This is often the case when recovering speech from cockpit voice recorders, factory shop floors, automobile environments, etc. We have developed a novel method of enhancing noisy speech, even at 0 dB SNR.
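To make the 0 dB SNR condition concrete: it means the noise power equals the speech power. The sketch below shows one common way to construct such a test signal by scaling a noise sequence to a target SNR before adding it to clean speech. It illustrates the test condition only, not the enhancement method, which the page does not describe; the function names and the synthetic signals are assumptions.

```python
import math
import random


def power(x):
    """Mean power of a sequence of samples."""
    return sum(v * v for v in x) / len(x)


def snr_db(clean, noise):
    """Signal-to-noise ratio in decibels."""
    return 10.0 * math.log10(power(clean) / power(noise))


def mix_at_snr(clean, noise, target_db):
    """Scale the noise so that clean/noise has the target SNR, then add them.

    Returns (noisy_signal, scaled_noise).
    """
    gain = math.sqrt(power(clean) / (power(noise) * 10.0 ** (target_db / 10.0)))
    scaled = [gain * n for n in noise]
    noisy = [c + n for c, n in zip(clean, scaled)]
    return noisy, scaled


random.seed(0)
clean = [math.sin(0.05 * i) for i in range(2000)]        # stand-in for speech
noise = [random.gauss(0.0, 1.0) for _ in range(2000)]    # random noise
noisy, scaled = mix_at_snr(clean, noise, 0.0)
print(round(snr_db(clean, scaled), 6))
```

At 0 dB the scaled noise carries as much power as the speech itself, which is why enhancement at this level is considered a demanding case.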
Audio
Q-Stereo surround sound for the PC based multimedia system
Q-Stereo surround sound is a novel surround sound format for hi-fi audio listening. It has been developed at IISc and has been psychoacoustically tested to provide a more enjoyable listening experience than several other surround sound formats, such as stereo surround. In particular, it can convert even a monophonic signal to surround format, providing a virtual immersive audio experience. This solution can be combined with other hi-fi audio effects, such as equalization and ambience synthesis. Any PC with multi-channel audio output, such as an AC97 audio chip, can be enabled with this new feature.
Simulation of multi-channel audio surround format using only two loudspeakers (or headphone)
To reduce the number of loudspeakers a user has to invest in, we can provide pseudo-multichannel formats, such as surround sound and Q-Stereo surround sound, through DSP techniques.
Enhancement to Audio Steno
Audio Steno has become a common application, either as an independent device or as part of PC application software. Usually, standard speech compression techniques are used, such as Mu-law PCM (.wav files) or ADPCM, which provide 64 kb/s and 32 kb/s representations of speech, respectively. Esqube can provide proprietary speech compression software (which does not carry further royalty costs, unlike standard coders such as G.723 and G.728) that can be scaled gracefully to the available memory capacity. For example, the proprietary switchable-rate speech coder can provide bit rates of 2 kb/s, 4 kb/s, 6 kb/s or 8 kb/s, selected by the user for the specific speech quality requirement; this increases the stored speech duration by 8-fold to 32-fold.
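The 8-fold to 32-fold figure follows from comparing the coder rates with the 64 kb/s Mu-law PCM baseline, as this small worked example shows. The function names and the 1 MB memory size in the usage lines are assumptions for illustration.

```python
BASELINE_KBPS = 64  # Mu-law PCM rate quoted in the text


def duration_gain(coder_kbps, baseline_kbps=BASELINE_KBPS):
    """How many times longer a recording fits in the same memory."""
    return baseline_kbps / coder_kbps


def recording_minutes(memory_megabytes, coder_kbps):
    """Minutes of speech that fit in the given memory (1 MB = 10**6 bytes)."""
    total_bits = memory_megabytes * 1_000_000 * 8
    seconds = total_bits / (coder_kbps * 1000)
    return seconds / 60


for rate in (8, 6, 4, 2):
    print(rate, "kb/s ->", round(duration_gain(rate), 2), "x,",
          round(recording_minutes(1, rate), 1), "min per MB")
```

The 8 kb/s mode gives 64 / 8 = 8 times the baseline duration and the 2 kb/s mode gives 64 / 2 = 32 times, which matches the range stated above.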
Automatic Singer identification of MP3 audio files
The collection and distribution of large numbers of music files has created a need for automatically generated music lists based on singer, genre, instrument, etc. Esqube has a proprietary and very effective voice identification algorithm/software to identify the singer of a polyphonic music file. There is a large amount of old and current music that is not indexed. Such collections can now be indexed automatically by a user for ease of access and sharing. The technology is also very useful for real-time display of the singer's identity on an MP3 audio/video player.
For more information, please contact hari@esqube and raj@esqube.com.