June 23 – 27, 2014

Corpus Pinyin Distribution Analysis — Phonetic (Phoneme)

A phoneme is the basic sound unit that makes up a syllable in Hanyu Pinyin. Compared with last week's work, analyzing the distribution of phonemes across the entire corpus is more challenging, because a single syllable may consist of several phonemes. For example, “ca” consists of the phoneme “tsh” and the phoneme “a”. This analysis makes it possible to check whether the distribution of phonemes in the corpus is similar to the frequency statistics of phonemes in modern everyday spoken Chinese. The following table shows the results.
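The decomposition step described above can be sketched in Python as follows. This is only a minimal illustration, not the script used for the analysis: the lookup tables cover just a handful of initials and finals, the function names are hypothetical, and dividing by the total number of phoneme occurrences is an assumption about how the percentages are normalized.

```python
from collections import Counter

# Hypothetical, partial lookup tables (a few initials and finals only).
# A full analysis would need an entry for every Pinyin initial and final.
INITIAL_TO_PHONEMES = {"c": ["tsh"], "b": ["p"], "m": ["m"]}
FINAL_TO_PHONEMES = {"a": ["a"], "an": ["a", "n"], "ai": ["a", "i"]}


def split_syllable(syllable):
    """Split a toneless Pinyin syllable into (initial, final)."""
    # Try longer initials first so that e.g. "zh" would win over "z".
    for initial in sorted(INITIAL_TO_PHONEMES, key=len, reverse=True):
        rest = syllable[len(initial):]
        if syllable.startswith(initial) and rest in FINAL_TO_PHONEMES:
            return initial, rest
    if syllable in FINAL_TO_PHONEMES:  # syllable without an initial
        return "", syllable
    raise ValueError(f"cannot split syllable: {syllable!r}")


def phonemes_of(syllable):
    """Return the list of phonemes that make up one syllable."""
    initial, final = split_syllable(syllable)
    return INITIAL_TO_PHONEMES.get(initial, []) + FINAL_TO_PHONEMES[final]


def phoneme_distribution(syllables):
    """Count phonemes over all syllables and return their relative shares."""
    counts = Counter()
    for syllable in syllables:
        counts.update(phonemes_of(syllable))
    total = sum(counts.values())  # assumed denominator: all phoneme occurrences
    return {phoneme: count / total for phoneme, count in counts.items()}


distribution = phoneme_distribution(["ca", "ma", "ban", "bai"])
```

For the four sample syllables, “ca” contributes “tsh” and “a”, so the phoneme “a” is counted once per syllable while each initial is counted only for the syllables that contain it.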

| Phoneme | Initials/Finals | Percentage (Corpus) | Frequency (Resource) | Percentage (Resource) | Difference |
|---|---|---|---|---|---|
| ŋ | ang, ong, iong, ing, eng, iang, uang, yang, yong | 8.16% | 621 | 6.38% | 0.02 |
| a | ang, ai, uan, ao, an, uai, ia, a, iao, ian, van, iang, uang, ua, yang, yuan, yao, yan, ya, yvan | 17.89% | 1279 | 13.13% | 0.05 |
| f | f | 1.36% | 119 | 1.22% | 0.00 |
| i | ei, ai, iu, uai, iong, vn, in, ia, ing, ie, iao, ian, iang, i, ui, ya, yan, yang, yao, ye, yi, yin, ying, yong, you | 24.45% | 1422 | 14.60% | 0.10 |
| k | g | 2.45% | 141 | 1.45% | 0.01 |
| kh | k | 1.09% | 93 | 0.95% | 0.00 |
| l | l | 3.03% | 223 | 2.29% | 0.01 |
| m | m | 2.10% | 143 | 1.47% | 0.01 |
| n | n, uan, an, vn, in, ian, van, un, yvn, yan, yin | 8.41% | 800 | 8.21% | 0.00 |
| p | b | 2.53% | 159 | 1.63% | 0.01 |
| ph | p | 0.76% | 118 | 1.21% | 0.00 |
| r | r, er | 1.41% | 58 | 0.60% | 0.01 |
| s | s | 0.71% | 305 | 3.13% | 0.02 |
| t | d | 4.22% | 165 | 1.69% | 0.03 |
| th | t | 2.17% | 144 | 1.48% | 0.01 |
| ts | j, z | 5.17% | 351 | 3.60% | 0.02 |
| tsh | c, q | 2.47% | 223 | 2.29% | 0.00 |
| u | uan, iu, ong, ao, uai, iong, iao, uo, un, u, ui, ou, uang, ua, w, yong, yao | 17.02% | 1339 | 13.75% | 0.03 |
| x | h, s | 3.13% | 168 | 1.73% | 0.01 |
| y | van, v, yv, yvan, yve | 2.46% | 187 | 1.92% | 0.01 |
| ʂ | sh | 3.58% | 189 | 1.94% | 0.02 |
| ə | ei, ve, iu, en, e, ing, ie, er, eng, o, uo, un, ui, ou, yve, ye, ying | 18.29% | 1130 | 11.60% | 0.07 |
| ʈʂ | zh | 2.68% | 218 | 2.24% | 0.00 |
| ʈʂh | ch | 1.54% | 144 | 1.48% | 0.00 |

The table above shows that the distribution of phonemes in our corpus is broadly similar to the phoneme frequency statistics for modern Chinese: the difference is 0.03 or less for 21 of the 24 phonemes and never exceeds 0.10. The corpus is therefore expected to be useful in helping beginners learn Mandarin Chinese.
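The comparison in the table can be recomputed with a short Python sketch. It is an illustration under stated assumptions rather than the original procedure: the phoneme labels are written in IPA (ŋ, ʂ, ə, ʈʂ), which is an assumption about the symbols intended in the table, the function name `compare` is hypothetical, and the Difference column is assumed to be the absolute gap between the two percentages expressed as a fraction and rounded to two decimals.

```python
# (phoneme, Percentage (Corpus), Frequency (Resource)) copied from the table.
ROWS = [
    ("ŋ", 8.16, 621), ("a", 17.89, 1279), ("f", 1.36, 119),
    ("i", 24.45, 1422), ("k", 2.45, 141), ("kh", 1.09, 93),
    ("l", 3.03, 223), ("m", 2.10, 143), ("n", 8.41, 800),
    ("p", 2.53, 159), ("ph", 0.76, 118), ("r", 1.41, 58),
    ("s", 0.71, 305), ("t", 4.22, 165), ("th", 2.17, 144),
    ("ts", 5.17, 351), ("tsh", 2.47, 223), ("u", 17.02, 1339),
    ("x", 3.13, 168), ("y", 2.46, 187), ("ʂ", 3.58, 189),
    ("ə", 18.29, 1130), ("ʈʂ", 2.68, 218), ("ʈʂh", 1.54, 144),
]


def compare(rows):
    """Return {phoneme: (resource percentage, difference)} for each row."""
    total = sum(freq for _, _, freq in rows)  # all phoneme occurrences in the resource
    result = {}
    for phoneme, corpus_pct, freq in rows:
        resource_pct = 100 * freq / total
        # Assumed definition: absolute gap, as a fraction rather than a percentage.
        difference = abs(corpus_pct - resource_pct) / 100
        result[phoneme] = (round(resource_pct, 2), round(difference, 2))
    return result


comparison = compare(ROWS)
largest_gap = max(comparison, key=lambda phoneme: comparison[phoneme][1])
```

With the table's figures, the resource frequencies sum to 9739, so the first row gives 621 / 9739 ≈ 6.38% and a difference of 0.02, and the largest gap is found for the phoneme “i”.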

Resources:

http://lingua.mtsu.edu/chinese-computing/phonology/phoneme3500.php

http://lingua.mtsu.edu/chinese-computing/phonology2004/py2phoneme.php