Sound and music

Measurement of speech intelligibility: subjective methods

What is it?

We talk on the phone; we listen to presentations and speeches in concert halls. Some of us try to eavesdrop on other people's conversations, while others try to make wiretapping impossible. In all of these cases, background noise can interfere so badly that words become simply unintelligible. To avoid such problems, speech communication channels undergo an acoustic examination before they are put into operation.

A channel, or speech communication channel (or voice data transmission channel), is the physical medium that sound travels through from its starting point to its end point. It may be an air, electroacoustic, vibrational, parametric, or optical-electronic channel, but we will not consider these types here, because our goal is to measure the most important quality criterion of a sound channel: speech intelligibility.

The methods of measuring speech intelligibility can be summarized in a list:

- Purely subjective method;
- Objectified;
- Tonal;

- Formant:
- - AI (articulation index);
- - SII (speech intelligibility index);

- Modulation:
- - STI (Speech transmission index);
- - RASTI (Rapid STI);
- - STIPA (STI for sound reinforcement systems);
- - STITEL (STI for telecommunication systems);
- %ALcons (percentage articulation loss of consonants);

Of course, there are also the Soviet methods of Pokrovsky, Bykov, and Sapozhkov, but the methods listed above give the best results.

We cannot cover everything at once, so let us begin with how the objective methods differ from the subjective ones, starting with the subjective methods.

Pure subjectivism

In the purely subjective method, speech intelligibility is rated by a pair: an announcer and a tester. It is convenient to look at their work using the example of radio station testing according to the recommendations of the CCIR (International Radio Consultative Committee). The announcer reads a text at the transmitting end of the radio channel, while the tester at the receiving end rates its quality on a five-point scale (or any other scale). The obvious disadvantage of this approach is that the result inevitably depends on the peculiarities of the announcer's speech and of the tester's hearing.

The solution to this problem is as obvious as the problem itself.


The most common objectified method is the articulation method. Before the measurements, normal acoustic conditions (noise levels) are created in the channel under test. Several testers take part, and the announcer reads specially compiled tables of syllables (articulation tables) instead of plain text. The testers write down what they hear and, at the end of the transmission, check their tables against the announcer's tables. The ratio of correctly received syllables to the total number of syllables is the speech intelligibility rating, expressed as a percentage or as a fraction of one.
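The scoring step described above can be sketched in a few lines of Python. This is only an illustration: the function names and the sample syllables are hypothetical, and the sketch assumes the tester's table has one entry per dictated syllable, in the same order.

```python
def articulation_score(dictated, recorded):
    """Fraction of syllables the tester wrote down correctly.

    Assumes both tables have the same length and order (an assumption of
    this sketch, not something the method description specifies).
    """
    if not dictated or len(dictated) != len(recorded):
        raise ValueError("tables must be non-empty and of equal length")
    correct = sum(1 for d, r in zip(dictated, recorded) if d == r)
    return correct / len(dictated)


def mean_score(scores):
    """Average the scores of several testers (or several sessions)."""
    if not scores:
        raise ValueError("at least one score is required")
    return sum(scores) / len(scores)


# Hypothetical articulation table and two hypothetical tester records.
announcer = ["ba", "ko", "ti", "mu"]
tester_1 = ["ba", "ko", "di", "mu"]   # one syllable misheard
tester_2 = ["ba", "go", "di", "mu"]   # two syllables misheard

scores = [articulation_score(announcer, t) for t in (tester_1, tester_2)]
print(scores, mean_score(scores))
```

Multiplying the result by 100 gives the same rating as a percentage.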

Note that the more syllables are dictated and received, the better the influence of individual factors is averaged out. The averaging improves further when different announcers and testers take part in the testing. This is the objectified articulation method. Reading meaningless sound combinations is what makes the results objective: when receiving real words or phrases, a tester can think them through and recover a corrupted element.

As for the testers, it is believed that they should be a specially trained team.


Advantages:
• Universality (the method is applicable to any type of sound channel);
• Simplicity (the method does not require technical knowledge from the operators)


Disadvantages:
• Inconvenient measurement procedure (it consumes time, material, and human resources);
• The need to create articulation tables (each new type of table gives different measurement results);
• Dependence of the results on the professional skills of the operators;
• Impossibility of automating the process;
• Human factor (the effect of speech and hearing peculiarities on the result)

Objectify. Part 2

Let us consider one more objectified subjective method, the tonal method, in which the announcer is replaced by a generator of pure tones. This artificial voice is a conventional speaker without a cone; it generates signals so that the sound pressure levels produced at different frequencies correspond to the formant spectral curve. The testers do not disappear. Their task now is to determine whether a signal is audible at a given frequency or not.
The frequencies at which the measurements are performed (Hz):
250, 500, 650, 800, 990, 1125, 1300, 1500, 1700, 1875,
2050, 2225, 2425, 2725, 3100, 3500, 3850, 4550, 6150, 8600

The sensation level of a formant is measured by increasing the input attenuation (fading) until the sound becomes inaudible, and then decreasing the attenuation until the sound reappears. The two attenuation values are averaged, and this average is the result of the measurement.
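The up-and-down procedure for one frequency can be sketched as follows. This is a simplified model, not a description of real equipment: the `is_audible` callback stands in for the tester's answer, and the 1 dB step and 80 dB limit are assumptions chosen for the example.

```python
def measure_sensation_level(is_audible, step_db=1.0, max_db=80.0):
    """Average of the attenuation where the tone disappears and reappears.

    is_audible(attenuation_db) -> bool models the tester's answer; the step
    size and the upper limit are assumptions of this sketch.
    """
    attenuation = 0.0
    # Increase attenuation until the tester no longer hears the tone.
    while attenuation <= max_db and is_audible(attenuation):
        attenuation += step_db
    if attenuation > max_db:
        raise ValueError("tone is still audible at the maximum attenuation")
    disappeared_at = attenuation

    # Decrease attenuation until the tone is heard again.
    while attenuation > 0.0 and not is_audible(attenuation):
        attenuation -= step_db
    reappeared_at = attenuation

    return (disappeared_at + reappeared_at) / 2


# Hypothetical tester who hears the tone below 30.5 dB of attenuation.
print(measure_sensation_level(lambda att: att < 30.5))
```

Repeating this for each of the twenty frequencies gives the set of sensation levels used in the tables that follow.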

Formant speech intelligibility is defined by the table below:
dB - sensation level of tone; % - formant speech intelligibility

dB %    | dB %    | dB %    | dB %    | dB %    | dB %
1  0.04 | 10 0.65 | 19 1.92 | 28 3.22 | 37 4.28 | 46 4.75
2  0.09 | 11 0.76 | 20 2.07 | 29 3.37 | 38 4.37 | 47 4.78
3  0.14 | 12 0.89 | 21 2.2  | 30 3.51 | 39 4.46 | 48 4.8
4  0.19 | 13 1.03 | 22 2.36 | 31 3.64 | 40 4.52 | 49 4.82
5  0.24 | 14 1.18 | 23 2.5  | 32 3.75 | 41 4.57 | 50 4.85
6  0.3  | 15 1.32 | 24 2.65 | 33 3.87 | 42 4.62 | 51 4.88
7  0.37 | 16 1.47 | 25 2.79 | 34 3.97 | 43 4.66 | 52 4.95
8  0.46 | 17 1.62 | 26 2.93 | 35 4.08 | 44 4.69 |
9  0.55 | 18 1.77 | 27 3.08 | 36 4.18 | 45 4.72 |

Total formant speech intelligibility is defined as the sum of the formant intelligibility values obtained from the table for each of the measurement frequencies.


To complete the measurement of speech intelligibility, it is sufficient to determine the syllable intelligibility from the formant intelligibility:

A - formant speech intelligibility, %; S - syllable speech intelligibility, %

A  S  | A  S    | A  S    | A  S    | A   S
5  5  | 25 46.2 | 45 75   | 65 90   | 85  98
10 15 | 30 55   | 50 80   | 70 92.5 | 90  99
15 26 | 35 62.5 | 55 81   | 75 95.2 | 95  99.5
20 36 | 40 69   | 60 87.2 | 80 96.2 | 100 100
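The two table lookups can be chained in a short Python sketch. The table values are copied from the article; everything else is an assumption made for illustration: rounding the sensation level to a whole decibel, treating levels below 1 dB as zero and levels above 52 dB as 52 dB, interpolating linearly between the tabulated A values, and reading the A = 20 row (whose first number is garbled in the source) from the 5-step pattern of the table.

```python
from bisect import bisect_right

# Formant intelligibility (%) per frequency for sensation levels 1..52 dB.
BAND_PERCENT = [
    0.04, 0.09, 0.14, 0.19, 0.24, 0.30, 0.37, 0.46, 0.55,  # 1..9 dB
    0.65, 0.76, 0.89, 1.03, 1.18, 1.32, 1.47, 1.62, 1.77,  # 10..18 dB
    1.92, 2.07, 2.20, 2.36, 2.50, 2.65, 2.79, 2.93, 3.08,  # 19..27 dB
    3.22, 3.37, 3.51, 3.64, 3.75, 3.87, 3.97, 4.08, 4.18,  # 28..36 dB
    4.28, 4.37, 4.46, 4.52, 4.57, 4.62, 4.66, 4.69, 4.72,  # 37..45 dB
    4.75, 4.78, 4.80, 4.82, 4.85, 4.88, 4.95,              # 46..52 dB
]

# (A, S) pairs in percent; the A = 20 entry is inferred from the pattern.
A_TO_S = [
    (5, 5), (10, 15), (15, 26), (20, 36), (25, 46.2), (30, 55), (35, 62.5),
    (40, 69), (45, 75), (50, 80), (55, 81), (60, 87.2), (65, 90), (70, 92.5),
    (75, 95.2), (80, 96.2), (85, 98), (90, 99), (95, 99.5), (100, 100),
]


def band_intelligibility(level_db):
    """Look up one frequency; rounding and clamping are assumptions."""
    level = round(level_db)
    if level < 1:
        return 0.0
    return BAND_PERCENT[min(level, len(BAND_PERCENT)) - 1]


def formant_intelligibility(levels_db):
    """Sum the per-frequency values over all measured frequencies."""
    return sum(band_intelligibility(level) for level in levels_db)


def syllable_intelligibility(a_percent):
    """Convert A to S by linear interpolation (an assumption of this sketch)."""
    xs = [a for a, _ in A_TO_S]
    if not xs[0] <= a_percent <= xs[-1]:
        raise ValueError("A is outside the tabulated range 5..100 %")
    i = bisect_right(xs, a_percent) - 1
    if i == len(xs) - 1:
        return float(A_TO_S[-1][1])
    (x0, y0), (x1, y1) = A_TO_S[i], A_TO_S[i + 1]
    return y0 + (y1 - y0) * (a_percent - x0) / (x1 - x0)


# Hypothetical measurement: 20 dB sensation level at all 20 frequencies.
a = formant_intelligibility([20] * 20)
print(a, syllable_intelligibility(a))
```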


Advantages:
• It does not require a team of announcers;
• It significantly reduces the measurement time;
• There is no need for articulation tables


Disadvantages:
• Higher requirements for the technical training of the measuring staff;
• Impossibility of automating the process;
• Human factor

But what are...

... the differences between the objective and the subjective methods? It all comes down to the human factor, or rather to its absence: the measurements are made with an artificial voice, mouth, and ear.

Let us consider the simplest objective method.

First, a noise level corresponding to the working conditions is created at the receiving end of the sound channel under test. Next, the noise level is measured at the output of the artificial ear in a critical band of hearing whose center frequency equals the frequency of the measuring tone. This noise level should be recorded, as it will be needed later. After that, the noise is replaced by a tone signal fed to the input of the sound channel. The sound intensity level at the microphone is set so that, at the conventional zero of the attenuator (fading adjuster), the distribution of sound pressures corresponds to the formant spectral curve. Then the attenuator is adjusted until the level of the tone signal at the output of the sound channel equals the noise level. The attenuator reading is the result of the measurement.
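As a rough numerical illustration of the last two steps, the attenuator reading can be modeled as the difference between the tone level at zero attenuation and the recorded noise level. This sketch assumes an ideal attenuator that lowers the output level by exactly the number of decibels set, and all levels in the example are hypothetical.

```python
def attenuator_reading(tone_at_zero_db, noise_db):
    """Attenuation (dB) at which the tone level equals the noise level.

    Assumes an ideal attenuator: output level = level at zero - attenuation.
    """
    return tone_at_zero_db - noise_db


def attenuator_readings(tone_levels_db, noise_levels_db):
    """Readings for several measuring frequencies, one pair per frequency."""
    if len(tone_levels_db) != len(noise_levels_db):
        raise ValueError("one noise level is needed for each tone level")
    return [attenuator_reading(t, n)
            for t, n in zip(tone_levels_db, noise_levels_db)]


# Hypothetical output levels (dB) at two measuring frequencies.
tone_levels = [70.0, 65.0]    # tone at the channel output, zero attenuation
noise_levels = [40.0, 50.0]   # noise in the matching critical bands
print(attenuator_readings(tone_levels, noise_levels))
```

The readings play the same role as the sensation levels in the tonal method, so they can be passed to the same table lookups.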

The formant and syllable intelligibility are then determined in the same way as in the tonal method.


Advantages:
• Accuracy and speed;
• There is no need for announcers or testers;
• The measurement procedure can be fully automated


Disadvantages:
• Higher requirements for the technical training of the measuring staff

The End

Thank you for your attention!
ZimerMan 17 october 2011, 15:45