Week 2 (January 20 – 24, 2014) | Computer-Assisted Language Learning

Research and Perform Correlation Test

In this project, human raters evaluated all the utterances. Although the scoring criteria are made known to the evaluators before they conduct the test, the grading may still be inconsistent because the perception of pronunciation correctness is subjective. Hence, it is important to perform the correlation test periodically to ensure the consistency of the score ratings.

In statistics, inter-rater reliability is a measure of the degree to which different judges or raters agree in their assessment decisions ^[1]. It is especially useful in this project because judging a fluency score is relatively subjective, so human evaluators will not necessarily interpret an answer the same way.

Pearson Product Moment Correlation (PPMC) is applied to determine how well the fluency scores of two raters are related. It measures the linear relationship between two sets of data. The computation of PPMC ^[3] is shown in Figure 2.

Figure 2 PPMC Computation Formula
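
For reference, the standard Pearson product-moment formula, which is also the form given in the CORREL documentation ^[3], can be written as below. The notation is an assumption made here for illustration: x_i and y_i are the fluency scores that two raters gave to utterance i, \bar{x} and \bar{y} are each rater's mean score, and n is the number of utterances in the set.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

The value of r always lies between -1 and 1, and it is undefined when either rater gives the same score to every utterance, because the denominator is then zero.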

The guideline for interpreting the PPMC value is shown in Table 1 ^[2].

Table 1 PPMC Guideline

Correlation Value	Interpretation
0.70 or higher	Very strong positive relationship
0.40 to 0.69	Strong positive relationship
0.30 to 0.39	Moderate positive relationship
0.20 to 0.29	Weak positive relationship
0.01 to 0.19	No or negligible relationship
-0.01 to -0.19	No or negligible relationship
-0.20 to -0.29	Weak negative relationship
-0.30 to -0.39	Moderate negative relationship
-0.40 to -0.69	Strong negative relationship
-0.70 or lower	Very strong negative relationship
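
The guideline in Table 1 can be expressed as a small lookup, sketched below. The function name `interpret_ppmc` is hypothetical, and two details are assumptions rather than part of the guideline: values that fall between the listed two-decimal bands (for example 0.195) are assigned to the weaker band, and a value of exactly 0 is treated as negligible.

```python
def interpret_ppmc(r):
    """Map a correlation coefficient to the wording used in Table 1."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")

    magnitude = abs(r)
    if magnitude >= 0.70:
        strength = "Very strong"
    elif magnitude >= 0.40:
        strength = "Strong"
    elif magnitude >= 0.30:
        strength = "Moderate"
    elif magnitude >= 0.20:
        strength = "Weak"
    else:
        # Assumption: everything below 0.20 in magnitude, including 0,
        # is reported as negligible regardless of sign.
        return "No or negligible relationship"

    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction} relationship"


print(interpret_ppmc(0.86))   # Very strong positive relationship
print(interpret_ppmc(-0.35))  # Moderate negative relationship
```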

In this project, three raters evaluated five different sets of utterances. Each set consists of 100 unique utterances, with no noisy or unreliable data entries. A Python script was written to compute PPMC based on the formula shown in Figure 2.
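
The original script is not reproduced here. The following is only a minimal sketch of how such a computation could look, assuming the standard Pearson formula and a plain dictionary that maps each rater to a list of scores. The function names `pearson` and `pairwise_ppmc` and the sample scores are hypothetical and are not the project's data.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("both raters need the same number (>= 2) of scores")

    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]  # deviations from rater x's mean
    dy = [b - mean_y for b in y]  # deviations from rater y's mean

    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denominator == 0:
        raise ValueError("undefined when a rater gives constant scores")
    return numerator / denominator


def pairwise_ppmc(scores_by_rater):
    """Return {(rater_a, rater_b): r} for every pair of raters."""
    return {
        (a, b): pearson(scores_by_rater[a], scores_by_rater[b])
        for a, b in combinations(sorted(scores_by_rater), 2)
    }


# Hypothetical fluency scores for six utterances (illustration only).
scores = {
    "Rater 1": [3, 4, 2, 5, 4, 1],
    "Rater 2": [3, 5, 2, 4, 4, 2],
    "Rater 3": [2, 4, 3, 5, 3, 1],
}

for pair, r in pairwise_ppmc(scores).items():
    print(pair, round(r, 3))
```

Because the coefficient is symmetric, only one value per pair of raters needs to be computed; the result tables simply mirror it across the diagonal.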

The PPMC results for these five sets of utterances are shown in Tables 2, 3, 4, 5 and 6, respectively.

Table 2 PPMC Result of Set 1 Consistency Test

	Rater 1	Rater 2	Rater 3
Rater 1	–	0.860	0.842
Rater 2	0.860	–	0.877
Rater 3	0.842	0.877	–

Table 3 PPMC Result of Set 2 Consistency Test

	Rater 1	Rater 2	Rater 3
Rater 1	–	0.872	0.858
Rater 2	0.872	–	0.811
Rater 3	0.858	0.811	–

Table 4 PPMC Result of Set 3 Consistency Test

	Rater 1	Rater 2	Rater 3
Rater 1	–	0.831	0.822
Rater 2	0.831	–	0.838
Rater 3	0.822	0.838	–

Table 5 PPMC Result of Set 4 Consistency Test

	Rater 1	Rater 2	Rater 3
Rater 1	–	0.884	0.866
Rater 2	0.884	–	0.820
Rater 3	0.866	0.820	–

Table 6 PPMC Result of Set 5 Consistency Test

	Rater 1	Rater 2	Rater 3
Rater 1	–	0.821	0.865
Rater 2	0.821	–	0.826
Rater 3	0.865	0.826	–

The PPMC results for all five sets of utterances show that the fluency scores of every pair of raters have a very strong positive relationship, since every coefficient is above the 0.70 threshold in Table 1. Thus, it can be concluded that the fluency ratings of the three raters were consistent.
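
As a quick check of this conclusion, the reported coefficients from Tables 2 to 6 can be compared against the 0.70 boundary from Table 1. The snippet below is only a sketch; the variable names are hypothetical, and each list holds the three pairwise values (Rater 1 and 2, Rater 1 and 3, Rater 2 and 3) copied from the corresponding table.

```python
# Pairwise PPMC values as reported in Tables 2 to 6.
reported = {
    "Set 1": [0.86, 0.842, 0.877],
    "Set 2": [0.872, 0.858, 0.811],
    "Set 3": [0.831, 0.822, 0.838],
    "Set 4": [0.884, 0.866, 0.82],
    "Set 5": [0.821, 0.865, 0.826],
}

VERY_STRONG = 0.70  # lower bound of the top band in Table 1

all_values = [r for values in reported.values() for r in values]
lowest = min(all_values)
highest = max(all_values)
all_very_strong = all(r >= VERY_STRONG for r in all_values)

print(lowest, highest, all_very_strong)  # 0.811 0.884 True
```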

References:

[1] http://www.uni.edu/chfasoa/reliabilityandvalidity.htm

[2] http://faculty.quinnipiac.edu/libarts/polsci/statistics.html

[3] http://office.microsoft.com/en-001/excel-help/correl-HP005209023.aspx