Research and Perform Cohen’s Kappa Test
In addition to the PPMC, Cohen's Kappa is another important statistic used in this project to assess inter-rater reliability. The computation of Cohen's Kappa is shown in Figure 3 [1].

κ = (n_a − n_ε) / (n − n_ε)

where n = number of subjects, n_a = number of agreements and n_ε = number of agreements expected by chance.
Figure 3 Cohen's Kappa Statistic Computation
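As an illustration with hypothetical numbers: if two raters score n = 50 utterances, agree on n_a = 45 of them, and n_ε = 30 agreements would be expected by chance, then κ = (45 − 30) / (50 − 30) = 0.75.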
The guideline for interpreting the Cohen's Kappa statistic is shown in Table 7 [2].
Table 7 Cohen's Kappa Statistic Guideline
| Cohen's Kappa Statistic | Agreement Interpretation |
| < 0 | Less than chance agreement |
| 0.01 to 0.20 | Slight agreement |
| 0.21 to 0.40 | Fair agreement |
| 0.41 to 0.60 | Moderate agreement |
| 0.61 to 0.80 | Substantial agreement |
| 0.81 to 0.99 | Almost perfect agreement |
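For illustration only (this helper is not part of the original analysis), the guideline in Table 7 can be expressed as a small Python function; the name interpret_kappa is assumed here for clarity:

```python
def interpret_kappa(kappa):
    """Map a Cohen's Kappa value to the agreement level listed in Table 7.

    Boundary values between bands are placed in the lower band.
    """
    if kappa < 0:
        return "Less than chance agreement"
    if kappa <= 0.20:
        return "Slight agreement"
    if kappa <= 0.40:
        return "Fair agreement"
    if kappa <= 0.60:
        return "Moderate agreement"
    if kappa <= 0.80:
        return "Substantial agreement"
    return "Almost perfect agreement"
```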
In this project, a Python script was written to compute Cohen’s Kappa based on the formula shown in Figure 3.
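The exact script is not reproduced here; the following is a minimal sketch of such a computation, assuming each rater's fluency scores are supplied as a Python list (the function name cohens_kappa and the sample scores are illustrative only):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Compute Cohen's Kappa for two raters following the Figure 3 formula:
    kappa = (n_a - n_e) / (n - n_e), where n is the number of subjects,
    n_a the observed agreements and n_e the agreements expected by chance."""
    assert len(labels_a) == len(labels_b), "both raters must rate the same subjects"
    n = len(labels_a)

    # n_a: number of subjects on which the two raters agree
    n_a = sum(1 for a, b in zip(labels_a, labels_b) if a == b)

    # n_e: chance agreement, computed from each rater's marginal label counts
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    categories = set(counts_a) | set(counts_b)
    n_e = sum(counts_a[c] * counts_b[c] for c in categories) / n

    return (n_a - n_e) / (n - n_e)

if __name__ == "__main__":
    # Hypothetical fluency scores (1-5) from two raters for ten utterances
    rater1 = [3, 4, 4, 2, 5, 3, 4, 2, 3, 5]
    rater2 = [3, 4, 3, 2, 5, 3, 4, 2, 4, 5]
    print(round(cohens_kappa(rater1, rater2), 2))  # prints 0.73
```

Applying such a function to every pair of raters within each set gives the pairwise values reported in Tables 8 to 12.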
The Cohen's Kappa statistics of the five sets of utterances are shown in Tables 8, 9, 10, 11 and 12 respectively.
Table 8 Cohen's Kappa Statistic of Set 1
| | Rater 1 | Rater 2 | Rater 3 |
| Rater 1 | – | 0.81 | 0.71 |
| Rater 2 | 0.81 | – | 0.73 |
| Rater 3 | 0.71 | 0.73 | – |
Table 9 Cohen's Kappa Statistic of Set 2
| | Rater 1 | Rater 2 | Rater 3 |
| Rater 1 | – | 0.74 | 0.66 |
| Rater 2 | 0.74 | – | 0.56 |
| Rater 3 | 0.66 | 0.56 | – |
Table 10 Cohen's Kappa Statistic of Set 3
| | Rater 1 | Rater 2 | Rater 3 |
| Rater 1 | – | 0.71 | 0.58 |
| Rater 2 | 0.71 | – | 0.63 |
| Rater 3 | 0.58 | 0.63 | – |
Table 11 Cohen's Kappa Statistic of Set 4
| | Rater 1 | Rater 2 | Rater 3 |
| Rater 1 | – | 0.83 | 0.62 |
| Rater 2 | 0.83 | – | 0.57 |
| Rater 3 | 0.62 | 0.57 | – |
Table 12 Cohen's Kappa Statistic of Set 5
| | Rater 1 | Rater 2 | Rater 3 |
| Rater 1 | – | 0.64 | 0.67 |
| Rater 2 | 0.64 | – | 0.56 |
| Rater 3 | 0.67 | 0.56 | – |
According to the guideline, the Cohen's Kappa statistics of the five sets of utterances show that the agreement on fluency score ratings between each pair of raters ranges from moderate to substantial, with two pairs (0.81 in Set 1 and 0.83 in Set 4) reaching almost perfect agreement.
References:
[1] http://www.real-statistics.com/reliability/cohens-kappa/
[2] http://www1.cs.columbia.edu/~julia/courses/CS6998/Interrater_agreement.Kappa_statistic.pdf