Week 2 (January 20 – 24, 2014)

 Research and Perform Correlation Test

In this project, human raters evaluated all the utterances. Although the scoring criteria are made known to the evaluators before they conduct the test, the grading may still be inconsistent because the perception of pronunciation correctness is subjective. Hence, it is important to perform the correlation test periodically to check the consistency of the score ratings.

In statistics, inter-rater reliability is a measure of the degree to which different judges or raters agree in their assessment decisions [1]. It is especially useful in this project because the judgment of a fluency score is relatively subjective, so human evaluators will not necessarily interpret the same answer in the same way.

The Pearson Product Moment Correlation (PPMC) is used to determine how well the fluency scores given by two raters are related. It measures the strength and direction of the linear relationship between two sets of data. The computation of PPMC [3] is shown in Figure 2.

Figure 2 PPMC Computation Formula
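
For reference, the Pearson coefficient is commonly written in the form below, which is assumed here to be the form shown in Figure 2 and given in [3]. The symbols are illustrative assumptions: x_i and y_i are the scores that two raters gave to utterance i, n is the number of utterances, and x̄ and ȳ are each rater's mean score.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

The value of r always lies between -1 and 1, and it is undefined when either rater gives the same score to every utterance, because the denominator is then zero.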

 

The guideline for interpreting the PPMC value is shown in Table 1 [2].

Table 1 PPMC Guideline

Correlation Value | Interpretation
0.70 or higher    | Very strong positive relationship
0.40 to 0.69      | Strong positive relationship
0.30 to 0.39      | Moderate positive relationship
0.20 to 0.29      | Weak positive relationship
0.01 to 0.19      | No or negligible relationship
-0.01 to -0.19    | No or negligible relationship
-0.20 to -0.29    | Weak negative relationship
-0.30 to -0.39    | Moderate negative relationship
-0.40 to -0.69    | Strong negative relationship
-0.70 or lower    | Very strong negative relationship
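
The bands in Table 1 can be expressed as a small lookup. The following is a minimal sketch rather than part of the project script: the function name `interpret_ppmc` is hypothetical, and it assumes that the bands are contiguous (so a value such as 0.195 falls into the lower band) and that a coefficient of exactly 0 counts as negligible.

```python
def interpret_ppmc(r):
    """Map a PPMC value to the wording used in Table 1.

    Assumption: bands are treated as contiguous, so values that fall
    between two listed ranges (e.g. 0.195) go to the lower band.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")

    magnitude = abs(r)
    if magnitude >= 0.70:
        strength = "Very strong"
    elif magnitude >= 0.40:
        strength = "Strong"
    elif magnitude >= 0.30:
        strength = "Moderate"
    elif magnitude >= 0.20:
        strength = "Weak"
    else:
        # Covers both the 0.01 to 0.19 and -0.01 to -0.19 rows, plus 0.
        return "No or negligible relationship"

    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction} relationship"


print(interpret_ppmc(0.86))
print(interpret_ppmc(-0.45))
```

Taking the absolute value first keeps the positive and negative halves of the table in one set of thresholds, so the two halves cannot drift apart.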

 

In this project, three raters evaluated five different sets of utterances. Each set consists of 100 unique utterances, with no noisy or unreliable data entries. A Python script was written to compute the PPMC based on the formula shown in Figure 2.
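
The original script is not reproduced in this report. The sketch below shows one way such a computation could look, assuming each rater's scores for a set are held in a list in the same utterance order. The names `pearson`, `pairwise_ppmc` and `example_scores` are hypothetical, and the example scores are made up for illustration, not taken from the project data.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")

    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)

    # Deviations of each score from its rater's mean.
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]

    numerator = sum(a * b for a, b in zip(dx, dy))
    sum_sq_x = sum(a * a for a in dx)
    sum_sq_y = sum(b * b for b in dy)

    if sum_sq_x == 0 or sum_sq_y == 0:
        raise ValueError("undefined when a rater gives constant scores")

    return numerator / math.sqrt(sum_sq_x * sum_sq_y)


def pairwise_ppmc(scores_by_rater):
    """Return {(rater_a, rater_b): r} for every unordered pair of raters."""
    return {
        (a, b): pearson(scores_by_rater[a], scores_by_rater[b])
        for a, b in combinations(sorted(scores_by_rater), 2)
    }


# Hypothetical fluency scores for five utterances (illustrative only).
example_scores = {
    "Rater 1": [3, 4, 2, 5, 4],
    "Rater 2": [3, 5, 2, 4, 4],
    "Rater 3": [2, 4, 3, 5, 5],
}

for pair, r in pairwise_ppmc(example_scores).items():
    print(pair, round(r, 3))
```

With three raters there are three unordered pairs, which is why each result table has three distinct coefficients mirrored across the diagonal.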

 

The PPMC results for these five sets of utterances are shown in Tables 2 to 6, respectively.

 

Table 2 PPMC Result of Set 1 Consistency Test

PPMC    | Rater 1 | Rater 2 | Rater 3
Rater 1 | -       | 0.86    | 0.842
Rater 2 | 0.86    | -       | 0.877
Rater 3 | 0.842   | 0.877   | -

 

Table 3 PPMC Result of Set 2 Consistency Test

PPMC    | Rater 1 | Rater 2 | Rater 3
Rater 1 | -       | 0.872   | 0.858
Rater 2 | 0.872   | -       | 0.811
Rater 3 | 0.858   | 0.811   | -

 

Table 4 PPMC Result of Set 3 Consistency Test

PPMC    | Rater 1 | Rater 2 | Rater 3
Rater 1 | -       | 0.831   | 0.822
Rater 2 | 0.831   | -       | 0.838
Rater 3 | 0.822   | 0.838   | -

 

Table 5 PPMC Result of Set 4 Consistency Test

PPMC    | Rater 1 | Rater 2 | Rater 3
Rater 1 | -       | 0.884   | 0.866
Rater 2 | 0.884   | -       | 0.82
Rater 3 | 0.866   | 0.82    | -

 

Table 6 PPMC Result of Set 5 Consistency Test

PPMC    | Rater 1 | Rater 2 | Rater 3
Rater 1 | -       | 0.821   | 0.865
Rater 2 | 0.821   | -       | 0.826
Rater 3 | 0.865   | 0.826   | -

 

The PPMC results for all five sets of utterances show that the fluency scores of every pair of raters have a very strong positive relationship: all coefficients lie between 0.811 and 0.884, above the 0.70 threshold in Table 1. Thus, it can be concluded that the fluency ratings of the three raters were consistent.
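
The conclusion can be checked directly against the reported coefficients. The following is a small sketch that copies the values from Tables 2 to 6 and compares them with the 0.70 boundary from Table 1. The variable names are hypothetical, and each list holds the Rater 1/2, Rater 1/3 and Rater 2/3 coefficients in that order.

```python
# Coefficients copied from Tables 2 to 6 (one entry per pair of raters).
reported = {
    "Set 1": [0.86, 0.842, 0.877],
    "Set 2": [0.872, 0.858, 0.811],
    "Set 3": [0.831, 0.822, 0.838],
    "Set 4": [0.884, 0.866, 0.82],
    "Set 5": [0.821, 0.865, 0.826],
}

THRESHOLD = 0.70  # lower bound of the highest positive band in Table 1

all_values = [r for values in reported.values() for r in values]
lowest = min(all_values)
highest = max(all_values)
all_above_threshold = all(r >= THRESHOLD for r in all_values)

print(f"lowest = {lowest}, highest = {highest}")
print(f"all pairs at or above {THRESHOLD}: {all_above_threshold}")
```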

 

References:

[1] http://www.uni.edu/chfasoa/reliabilityandvalidity.htm

[2] http://faculty.quinnipiac.edu/libarts/polsci/statistics.html

[3] http://office.microsoft.com/en-001/excel-help/correl-HP005209023.aspx