May 26 – 30 2014

Corpus Pinyin Distribution Analysis — Tonal

Mandarin Chinese has five tones, numbered 1 to 5 (four main tones plus a neutral tone). The first tone (flat or high-level tone) is marked by a macron (ˉ) over the pinyin vowel. The second tone (rising or high-rising tone) is marked by an acute accent (ˊ). The third tone (falling-rising or low tone) is marked by a caron, also called a háček (ˇ), not the rounded breve (˘), although a breve is sometimes substituted because of font limitations. The fourth tone (falling or high-falling tone) is marked by a grave accent (ˋ). The fifth tone (neutral tone) is written as a plain vowel with no accent mark. In this corpus, the digits 1 to 5 are used to represent the five tones. The purpose of this analysis is to understand how the five tones are distributed across the entire corpus, in order to better understand the tonal error rate of beginner learners. The following table shows the tone distribution of this corpus.

Tone    Occurrence    Percentage
1       105754        22.15%
2       94353         19.76%
3       86822         18.18%
4       154937        32.45%
5       35627         7.46%

Table 13 Tonal Distribution Analysis
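
The percentages in Table 13 can be recomputed from the occurrence counts as a quick consistency check. The short Python sketch below is illustrative only; the counts are copied from the table, and the variable names are assumptions introduced for this example rather than part of the original analysis.

```python
# Occurrence counts copied from Table 13 (tone number -> occurrences).
counts = {1: 105754, 2: 94353, 3: 86822, 4: 154937, 5: 35627}

# Total number of tone-marked syllables in the corpus.
total = sum(counts.values())

# Share of each tone as a percentage, rounded to two decimals.
percentages = {tone: round(100 * n / total, 2) for tone, n in counts.items()}

for tone, n in counts.items():
    print(f"tone {tone}: {n:>7}  {percentages[tone]:5.2f}%")
```

The recomputed values agree with the Percentage column to two decimal places, with a total of 477493 occurrences.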

As the table above shows, tone 4 occurs most frequently, whereas tone 5 occurs least frequently. This is consistent with Mandarin Chinese in general, where relatively few characters carry tone 5.
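
A tally of this kind could be produced with a few lines of Python. The sketch below is a minimal illustration, not the script used for this analysis. It assumes the corpus stores each syllable as letters followed by a single tone digit (for example `ni3 hao3`); the function name `tone_distribution` and the sample sentence are hypothetical.

```python
import re
from collections import Counter

# Assumed syllable format: one or more letters followed by a tone digit 1-5.
SYLLABLE = re.compile(r"[a-zü]+([1-5])", re.IGNORECASE)


def tone_distribution(text):
    """Return {tone: (occurrences, percentage)} for tones 1 to 5."""
    counts = Counter(int(digit) for digit in SYLLABLE.findall(text))
    total = sum(counts.values())
    return {
        tone: (counts[tone], 100 * counts[tone] / total if total else 0.0)
        for tone in range(1, 6)
    }


# Hypothetical sample sentence in numbered pinyin.
sample = "ni3 hao3 ma5 wo3 shi4 xue2 sheng1"
for tone, (n, pct) in tone_distribution(sample).items():
    print(f"tone {tone}: {n} ({pct:.2f}%)")
```

Running the same function over the full corpus text would yield counts comparable to those in the table, provided the corpus follows the assumed numbered-pinyin format.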

Reference:

http://en.wikipedia.org/wiki/Pinyin#Tones