I just wanted to add a link to Qualcomm’s latest HD Voice video. It has a good description of the voice improvements going into their chipsets. This is probably a strong reason not to jump off their platform, but there are other ways to create these same improvements.
HD Voice is starting to get some attention with the recent launches by Orange and Sprint. As the hypeometer’s needle climbs, there will be a lot of attention focused in this area, so I just wanted to put a few facts out there to keep it all straight. These operators actually have different technologies behind their HD Voice launches, which eventually merge at VoLTE. I saw some silliness about the HD Voice launches in AnandTech and other places so let’s get started…
First, a brief history of the universe, starting with the current voice technologies used with 3G networks.
Narrowband voice coding has been used in digital cellular systems since the beginning. Today’s smartphones typically employ EVRC on CDMA2000/3GPP2 based networks (a fraction of them use the more advanced EVRC-B algorithm) and AMR on UMTS/3GPP networks. EVRC and AMR are CODECs that transform voice into digitized speech using low amounts of bandwidth/throughput; a primary technique is limiting the input frequency range.
Voice quality is measured by sampling a population of listeners, who rate the quality of spoken sentences after they have been coded and decoded by an algorithm. The averaged result is the Mean Opinion Score (MOS). Listeners are asked to (subjectively) rate the recordings they heard against a reference standard, such as (A) a direct recording of the voices or (B) Pulse Code Modulation (PCM) at 64 kbit/s, known in standards as G.711. Here is an example of the rating instructions:
This is an experiment to determine the perceived quality of speech over the telephone. You will be listening to a number of recorded speech samples, spoken by several different talkers, and you will be rating how good you think they sound.
Use the single headphone on the ear you normally use for the telephone. On each trial a two-sentence sample will be played. After you have listened to the sample, determine the category from the list below which best describes the overall quality of the sample. Press the numeric key on your keyboard corresponding to your rating for how good or bad that particular passage sounded.
Select the category which best describes the sample you just heard for purposes of everyday speech communication.
The OVERALL SPEECH SAMPLE was:
5 – EXCELLENT
4 – GOOD
3 – FAIR
2 – POOR
1 – BAD
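To make the scoring concrete, here is a minimal Python sketch of how ratings on this five-point scale collapse into a single MOS number. The function name and the listener ratings are made up for illustration; real listening tests involve far more listeners, talkers and conditions than this.

```python
from statistics import mean

# The five categories from the listener instructions above.
SCALE = {5: "EXCELLENT", 4: "GOOD", 3: "FAIR", 2: "POOR", 1: "BAD"}


def mean_opinion_score(ratings):
    """Average a list of 1-5 listener ratings into a single score."""
    for r in ratings:
        if r not in SCALE:
            raise ValueError(f"rating {r!r} is not on the 1-5 scale")
    return mean(ratings)


# Hypothetical ratings from ten listeners for one coded sample (invented data).
example_ratings = [4, 3, 4, 3, 3, 4, 2, 3, 4, 3]

if __name__ == "__main__":
    print(round(mean_opinion_score(example_ratings), 2))
```

A score in the low 3s, like the MOS figures quoted later in this post, means the average listener landed a little above FAIR.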
EVRC compresses each 20 milliseconds of 16-bit sampled speech input (300-3200 Hz) into output frames of one of three sizes: full rate of 171 bits (8.55 kbit/s), half rate of 80 bits (4.0 kbit/s), or eighth rate of 16 bits (0.8 kbit/s). EVRC therefore has a peak bitrate of 8.55 kbit/s, a minimum of 0.8 kbit/s, and a ‘conversational’ planning rate of 6 kbit/s.
3GPP2 EVRC Standards: 3GPP2 C.S0014-D
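The bitrates quoted for EVRC fall straight out of the frame sizes: bits per frame divided by the 20 ms frame duration. Here is a small Python sketch of that arithmetic. The frame mix at the bottom is purely an assumption to show how an average can land below the peak; it is not taken from the standard or from any operator's planning figures.

```python
FRAME_MS = 20  # EVRC frame duration in milliseconds

# Frame sizes in bits, as listed above.
EVRC_FRAME_BITS = {"full": 171, "half": 80, "eighth": 16}


def bitrate_kbps(frame_bits, frame_ms=FRAME_MS):
    """Bits per frame over frame duration; bits per millisecond equals kbit/s."""
    return frame_bits / frame_ms


def average_kbps(mix):
    """Weighted average rate; mix maps a rate name to its share of frames."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("frame shares must sum to 1")
    return sum(share * bitrate_kbps(EVRC_FRAME_BITS[name])
               for name, share in mix.items())


# ASSUMPTION: an invented mix of frame types for one conversation.
HYPOTHETICAL_MIX = {"full": 0.6, "half": 0.1, "eighth": 0.3}

if __name__ == "__main__":
    for name, bits in EVRC_FRAME_BITS.items():
        print(f"{name}: {bitrate_kbps(bits):.2f} kbit/s")
    print(f"average for the assumed mix: {average_kbps(HYPOTHETICAL_MIX):.2f} kbit/s")
```

The more time a talker spends silent (eighth rate frames), the further the average drops below the 8.55 kbit/s peak.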
The AMR (Adaptive Multi-Rate) codec encodes narrowband (200-3400 Hz) signals, sampled at 8000 Hz, in 20 millisecond frames at variable bit rates ranging from 4.75 to 12.2 kbit/s, with toll quality speech starting at 7.4 kbit/s. AMR has a peak bitrate of 12.2 kbit/s, a minimum of 4.75 kbit/s, and a ‘typical’ conversational rate of 4 kbit/s (an average below the lowest mode only works out if silent periods are counted, since very little is sent during silence).
3GPP AMR Standard: TS 26.071
The goal of these narrowband VOCODERs is to reduce bandwidth during a conversation while delivering acceptable call quality. Even in perfect network conditions, you get near-ideal narrowband speech quality but not a fully lifelike sound.
If you are reading this, you likely have firsthand experience with the voice coders used in 3G networks. Moving forward …
Qualcomm (the main commercial influence for EVRC) has developed a newer, more advanced line of CODECs they call 4GV, which includes EVRC-B and EVRC-WB (wide band). Alternatively, there is a small consortium of companies that drive patents for AMR, including VoiceAge, Nokia, Ericsson, and France Telecom, and they have evolved their narrowband AMR into AMR-WB (you guessed it, wide band). Lastly, there is SILK, propelled by Skype.
EVRC-WB is based on a split band coding paradigm: the low frequency (LF) band (0-4 kHz) and the high frequency (HF) band (3.5-7 kHz) are sampled independently and coded with two different models.
MOS: 3.24 (Street Noise, 15 dB SNR)
AMR-WB provides improved speech quality due to a wider speech bandwidth of 50–7000 Hz.
- Configuration A (Config-WB-Code 0): 6.6, 8.85, and 12.65 kbit/s (Mandatory multi-rate configuration)
- Configuration B (Config-WB-Code 2): 6.6, 8.85, 12.65, and 15.85 kbit/s
- Configuration C (Config-WB-Code 4): 6.6, 8.85, 12.65, and 23.85 kbit/s
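As a rough illustration of what a multi-rate configuration buys you, here is a Python sketch that picks the highest AMR-WB rate in a configuration that fits a given bandwidth budget. The selection rule and the function name are assumptions for illustration only; real rate adaptation is driven by the network and radio conditions, not by a simple lookup like this.

```python
# Rate sets in kbit/s, copied from the three configurations listed above.
AMR_WB_CONFIGS = {
    "A": (6.6, 8.85, 12.65),
    "B": (6.6, 8.85, 12.65, 15.85),
    "C": (6.6, 8.85, 12.65, 23.85),
}


def best_mode(config, budget_kbps):
    """Return the highest rate in the configuration that fits the budget.

    Falls back to the lowest rate when nothing fits (hypothetical policy).
    """
    rates = AMR_WB_CONFIGS[config]
    usable = [r for r in rates if r <= budget_kbps]
    return max(usable) if usable else min(rates)


if __name__ == "__main__":
    for cfg in AMR_WB_CONFIGS:
        print(cfg, best_mode(cfg, budget_kbps=20.0))
```

With a 20 kbit/s budget, only Configuration B gets above 12.65 kbit/s, because the top rate in Configuration C (23.85 kbit/s) does not fit.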
SILK negotiates one of four modes during call setup:
- Narrowband (NB): 8 kHz sampling rate
- Mediumband (MB): 8 or 12 kHz sampling rate
- Wideband (WB): 8, 12 or 16 kHz sampling rate
- Super Wideband (SWB): 8, 12, 16 or 24 kHz sampling rate
The purpose of these modes is to allow the decoder to limit the highest sampling rate used by the encoder.
MOS: 3.22 (Office Noise, 15 dB SNR)
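The SILK modes above amount to a ceiling on the encoder's sampling rate. Here is a minimal Python sketch of that idea, assuming the encoder simply takes the highest allowed rate at or below what it would prefer. The function name and the clamping rule are mine, not part of the SILK specification.

```python
# Allowed encoder sampling rates in kHz for each negotiated mode (from above).
SILK_MODES = {
    "NB": (8,),
    "MB": (8, 12),
    "WB": (8, 12, 16),
    "SWB": (8, 12, 16, 24),
}


def encoder_rate(mode, preferred_khz):
    """Clamp the encoder's preferred sampling rate to what the mode allows."""
    allowed = SILK_MODES[mode]
    candidates = [r for r in allowed if r <= preferred_khz]
    # Hypothetical fallback: use the lowest allowed rate if none fits.
    return max(candidates) if candidates else min(allowed)


if __name__ == "__main__":
    for mode in SILK_MODES:
        print(mode, encoder_rate(mode, preferred_khz=24))
```

So a decoder that can only handle wideband negotiates WB, and an encoder that would prefer 24 kHz is held to 16 kHz.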
Nokia has a paper comparing SILK and AMR-WB. (Note that they are a patent holder for AMR-WB and the paper does slant that way.)
HD Voice is a broad term marketed by operators that seems to refer to the voice coding, more specifically the use of wide band CODECs like AMR-WB and EVRC-WB. Under typical conditions, the additional audio bandwidth provides a more lifelike sound between the caller and the called party.
Orange in the U.K. began marketing HD Voice in September of 2010. They have a 3GPP based UMTS network, so they are using the AMR-WB vocoder. They list 7 handsets on their website as supporting the AMR-WB vocoder.
Sprint recently announced the launch of HD Voice with their launch of the HTC EVO 4G LTE. Apparently they are using Transcoder Free Operation (TrFO) to support this feature. The basic requirement is that both end points (caller and called) must support EVRC-WB to be able to enjoy the additional sound quality. (It also means the network accepts Service Option 73 requests…)
3G phones have the VOCODERs built into the device and they only work with the connected 3G network infrastructure for voice calling. VoLTE uses an IP Multimedia Subsystem (IMS) architecture, which is essentially an application that runs over the LTE channel. The devices (UE) have an IMS client that uses Session Initiation Protocol (SIP) signaling to place calls. The IMS is functionally equivalent to its 3G counterpart but slightly more flexible, as you can have various architectures such as distributed, localized, centralized, etc.
Some interesting flexibility exists in the IMS client: it is possible for the client to carry several VOCODERs, and the IMS has a flexible architecture that will allow support for various VOCODERs. This probably means you can upgrade/downgrade to/from HD voice while mobile, and operators will likely support (at free/incremental cost) wide band coding when on high rate connections such as WiFi, femtocells, etc. This makes life more interesting.
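The common thread in both the TrFO case and the IMS case is that a call only gets wide band audio when both ends share a wide band codec. Here is a toy Python sketch of that selection, assuming each client simply lists its codecs in preference order. It is not a SIP or SDP implementation, and the function names and codec lists are hypothetical.

```python
# Wide band codecs mentioned in this post.
WIDEBAND = {"AMR-WB", "EVRC-WB", "SILK"}


def negotiate(caller_codecs, callee_codecs):
    """Pick the first codec in the caller's preference order the callee supports.

    Returns None when the two ends have no codec in common.
    """
    callee_supported = set(callee_codecs)
    for codec in caller_codecs:
        if codec in callee_supported:
            return codec
    return None


def is_hd(codec):
    """True when the chosen codec is one of the wide band codecs."""
    return codec in WIDEBAND


if __name__ == "__main__":
    # Hypothetical handsets: the caller prefers wide band, the callee lacks it.
    chosen = negotiate(["AMR-WB", "AMR"], ["AMR"])
    print(chosen, "HD" if is_hd(chosen) else "narrowband")
```

If either end lacks the wide band codec, the call quietly falls back to narrowband, which is why handset support matters as much as network support.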
On the flip side, the only official VOCODER supported by 3GPP for LTE networks right now is AMR. Some of you need to push SILK and EVRC-B into the 3GPP standards. Mobile calling could be so much more interesting than it is today.
OK, that was a huge windup for a little paragraph. The point is that HD Voice is available on a few operators over 3G today and will likely be available almost everywhere with VoLTE, using mostly wide band VOCODERs that provide higher MOS scores but also use slightly more bandwidth than 3G voice calls. It will be interesting to see how OTT providers like Skype fit in, as they can easily integrate into the IMS/3GPP/VoLTE architecture and may have more to offer in some cases.