Corpus Pinyin Distribution Analysis — Tonal
Mandarin Chinese has five tones, numbered 1 to 5: four lexical tones and a neutral tone. The first tone (flat or high-level tone) is written with a macron (ˉ) over the pinyin vowel. The second tone (rising or high-rising tone) is written with an acute accent (ˊ). The third tone (falling-rising or low tone) is written with a caron, or háček (ˇ); this is not the rounded breve (˘), although a breve is sometimes substituted because of font limitations. The fourth tone (falling or high-falling tone) is written with a grave accent (ˋ). The fifth tone (neutral tone) is written as a plain vowel with no accent mark. In this corpus, the numbers 1 to 5 are used to represent the five tones. The purpose of this analysis is to understand how the five tones are distributed across the entire corpus, and thereby to better understand the tonal error rate of beginner learners. The following table shows the tone distribution of the corpus.
Tone | Occurrence | Percentage
1 | 105754 | 22.15%
2 | 94353 | 19.76%
3 | 86822 | 18.18%
4 | 154937 | 32.45%
5 | 35627 | 7.46%
Table 13 Tonal Distribution Analysis
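The percentages in Table 13 are each tone's occurrence count divided by the total number of tone occurrences. The following Python sketch recomputes them from the counts reported in the table; the variable names are illustrative and are not taken from the original analysis.

```python
# Tone occurrence counts as reported in Table 13.
tone_counts = {1: 105754, 2: 94353, 3: 86822, 4: 154937, 5: 35627}

# Total number of tone occurrences across the five tones.
total = sum(tone_counts.values())

# Share of each tone as a percentage, rounded to two decimal places.
tone_percentages = {
    tone: round(100 * count / total, 2) for tone, count in tone_counts.items()
}

for tone, count in tone_counts.items():
    print(f"{tone} | {count} | {tone_percentages[tone]:.2f}%")
```

The five counts sum to 477493, and the recomputed percentages agree with the values listed in the table.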
As the table above shows, tone 4 occurs most frequently, whereas tone 5 occurs least frequently. This is consistent with Mandarin Chinese in general, in which relatively few characters carry tone 5.
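A tally of this kind could be produced from numbered pinyin roughly as sketched below. This is a minimal illustration, not the procedure used for this corpus. It assumes each syllable is written as letters followed by a single tone digit from 1 to 5 (for example "ni3 hao3"); the corpus file format is not described here. The sample sentence and the names `TONE_SYLLABLE` and `count_tones` are hypothetical.

```python
import re
from collections import Counter

# Assumption: a syllable is a run of letters followed by one tone digit (1-5).
TONE_SYLLABLE = re.compile(r"[a-zü]+([1-5])", re.IGNORECASE)


def count_tones(text):
    """Return a Counter mapping each tone number to its occurrence count."""
    return Counter(int(m.group(1)) for m in TONE_SYLLABLE.finditer(text))


# Hypothetical sample text, not drawn from the corpus.
sample = "wo3 men5 shi4 xue2 sheng5 ma1 ma5 hen3 mang2"
counts = count_tones(sample)
total = sum(counts.values())

# Print tone, occurrence, and percentage in the same layout as the table.
for tone in range(1, 6):
    print(f"{tone} | {counts[tone]} | {100 * counts[tone] / total:.2f}%")
```

Applied to a full corpus, the same counter would yield the occurrence column of the table, from which the percentage column follows directly.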
Reference:
http://en.wikipedia.org/wiki/Pinyin#Tones