Tuesday, August 27, 2013

Part II: Measures of Central Position

Measure of Central Position (grouped data)

To describe a set of quantitative data, it is necessary to define numerical measures that summarize its essential characteristics. Any measure that indicates the center of a set of data arranged in order of magnitude is called a measure of central position, or measure of central tendency. The most commonly used measures of central position are the mean, the median, and the mode.

Mean
Observe the following achievement scores of pupils in mathematics: 18, 19, 20, 21, 22, 23, 24, 25, 26 and 75. If you add all the scores (273) and divide by the number of pupils (10), the mean is 27.3. This figure is not a representative value, since every score is below 27.3 except that of the pupil who obtained 75. The example illustrates one property of the mean: "the mean is strongly influenced by extreme values."
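
To see how much the single score of 75 pulls the mean, the short Python sketch below computes the mean with and without it. The variable names are illustrative only and do not come from the original lesson.

```python
from statistics import mean

# Achievement scores from the example; 75 is the extreme value.
scores = [18, 19, 20, 21, 22, 23, 24, 25, 26, 75]

with_outlier = mean(scores)          # 273 / 10
without_outlier = mean(scores[:-1])  # drop the last score (75): 198 / 9

print(with_outlier)     # 27.3
print(without_outlier)  # 22
```

Removing one pupil moves the mean from 27.3 to 22, which is much closer to where most of the scores lie.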

For grouped data, the mean can be computed using either the long method or the coded deviation method (short method). For this discussion we will focus on getting the mean using the coded deviation method.

In the coded deviation method, the original observations are converted to coded deviations (d'). First, choose an assumed mean. Any reasonable value in the distribution will do, but generally the midpoint of the class with the highest frequency is taken. We then use this equation:

$$\bar{x} = \bar{x}_a + \left(\frac{\sum fd'}{N}\right) i$$

where $\bar{x}_a$ is the assumed mean, $\sum fd'$ is the sum of the products of each frequency and its coded deviation, N is the total number of observations, and i is the width of the class interval. To illustrate this, see the example below:

Class Interval       x       f      d'     fd'
118 – 126          122       3      -3      -9
127 – 135          131       5      -2     -10
136 – 144          140       9      -1      -9
145 – 153          149      12       0       0
154 – 162          158       5      +1      +5
163 – 171          167       4      +2      +8
172 – 180          176       2      +3      +6

                          N = 40         Σfd' = -9
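
The arithmetic for this table can be checked with a short Python sketch. It assumes the assumed mean is 149 (the midpoint of the class with the highest frequency) and a class width of 9, both read from the table; the variable names are illustrative. The long method, the sum of f times x divided by N, is included only as a cross-check.

```python
# Class midpoints (x) and frequencies (f) from the table.
midpoints = [122, 131, 140, 149, 158, 167, 176]
freqs = [3, 5, 9, 12, 5, 4, 2]

width = 9            # class width i: 118 to 126 covers 9 values
assumed_mean = 149   # assumption: midpoint of the highest-frequency class

# Coded deviation d' = (x - assumed mean) / i, counted in whole class steps.
coded = [(x - assumed_mean) // width for x in midpoints]

n = sum(freqs)                                      # N = 40
sum_fd = sum(f * d for f, d in zip(freqs, coded))   # sum of f * d' = -9

# Coded deviation (short) method.
mean_coded = assumed_mean + (sum_fd / n) * width

# Long method, for comparison only.
mean_long = sum(f * x for f, x in zip(freqs, midpoints)) / n

print(round(mean_coded, 3), round(mean_long, 3))  # 146.975 146.975
```

Both methods give about 146.975, which suggests the coded deviations in the table are consistent with the midpoints.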


Median 

Another measure of central position is the median. Observe the following distributions:

     a). 2, 3, 8, 10, 16, 17, 18
     b). 2, 3, 8, 10, 16, 17 

What is the middle value of a and of b? If your answer is 10 for a and 9 for b, you are correct. The values 10 and 9 are the medians of sets a and b, respectively. This example leads to the definition: the median is the middle value in a set of measurements arranged in increasing or decreasing order. For set a, the median is easily identified because the number of items is odd. Set b has an even number of items, so there are two middle values (8 and 10). To get the median of set b, add these two values and divide the sum by 2. Thus, (8+10)÷2=9.

Moreover, the median is a positional measure. The magnitudes of the extreme items in the distribution do not affect it. For example, in the distribution 2, 4, 5, 6, 7, 15, 37, the median is 6 despite the two deviant values (15 and 37). This means that the median is not affected by extreme values. Because of this, the median is an appropriate measure when you don't want extreme values to influence the average.
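
The positional rule described above (take the middle item for an odd count, average the two middle items for an even count) can be sketched in Python as follows. The function name `middle_value` is hypothetical, chosen only for this illustration.

```python
from statistics import mean

def middle_value(values):
    """Return the median: the middle item of the sorted values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: a single middle item exists.
        return ordered[mid]
    # Even count: average the two middle items.
    return (ordered[mid - 1] + ordered[mid]) / 2

set_a = [2, 3, 8, 10, 16, 17, 18]
set_b = [2, 3, 8, 10, 16, 17]
skewed = [2, 4, 5, 6, 7, 15, 37]

print(middle_value(set_a))   # 10
print(middle_value(set_b))   # 9.0
print(middle_value(skewed))  # 6
print(round(mean(skewed), 2))  # 10.86, pulled upward by 15 and 37
```

For the last distribution the median stays at 6 while the mean rises to about 10.86, which is the contrast the paragraph describes.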


For grouped data, that is, when data are given in frequency distribution form, we first determine the class interval that contains the N/2th case. In other words, we look for the value that divides the distribution into two equal parts. The table below presents the frequency distribution of 38 scores, so half of the scores (N/2 = 38/2 = 19) lie above the median and half lie below it. Because the class intervals are listed in ascending order, the higher scores appear lower in the table and the lower scores appear higher in the table. The 19th case falls in the interval 60-64 whether we count from the lowest score (the <cf column reaches 23 there) or from the highest score (the >cf column reaches 25 there).



SCORES       F      <cf     >cf
40-44        1        1      38
45-49        1        2      37
50-54        4        6      36
55-59        7       13      32
60-64       10       23      25
65-69        9       32      15
70-74        3       35       6
75-79        1       36       3
80-84        2       38       2

          N=38
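
As a minimal sketch, the two cumulative frequency columns can be rebuilt from the F column alone, and the class containing the N/2th case can then be located. The list names here are illustrative and not part of the original lesson.

```python
from itertools import accumulate

classes = ["40-44", "45-49", "50-54", "55-59", "60-64",
           "65-69", "70-74", "75-79", "80-84"]
freqs = [1, 1, 4, 7, 10, 9, 3, 1, 2]

# <cf: running total from the lowest class downward through the table.
less_cf = list(accumulate(freqs))

# >cf: running total from the highest class upward through the table.
greater_cf = list(accumulate(reversed(freqs)))[::-1]

n = sum(freqs)
half = n / 2  # N/2 = 19

# The median class is the first class whose <cf reaches N/2.
median_index = next(k for k, cf in enumerate(less_cf) if cf >= half)

print(less_cf)                # [1, 2, 6, 13, 23, 32, 35, 36, 38]
print(greater_cf)             # [38, 37, 36, 32, 25, 15, 6, 3, 2]
print(classes[median_index])  # 60-64
```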



 
If the median is taken from above, that is, using the >cf column, the N/2th case (the 19th) falls in the interval 60-64. This is found by counting the frequencies upward from the bottom of the table until the 19th item is reached. The interval 60-64 has boundaries 59.5 and 64.5, so the following equation can be used:

$$Md = U - \left(\frac{N/2 - F_{ub}}{f_m}\right) i$$

where U refers to the upper boundary of the class where the median lies; N/2 is half of the total number of observations; $F_{ub}$ is the sum of all frequencies above the upper boundary; $f_m$ is the frequency of the median class; and i is the length of the interval. Thus, the median is:

$$Md = 64.5 - \left(\frac{19 - 15}{10}\right) 5 = 64.5 - 2 = 62.5$$

If we take the median from below, we use the <cf column, that is, we count the frequencies from the top of the table downward. We follow the same procedure as for the median from above, only we use this equation:

$$Md = L + \left(\frac{N/2 - F_{lb}}{f_m}\right) i$$

where L is the lower boundary of the median class; N/2 is half of the total number of observations; $F_{lb}$ is the sum of all frequencies below the lower boundary; $f_m$ is the frequency of the median class; and i is the length of the interval. Thus the median from below is:

$$Md = 59.5 + \left(\frac{19 - 13}{10}\right) 5 = 59.5 + 3 = 62.5$$

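
The two routes to the grouped median can be compared in a short Python sketch. It assumes the usual interpolation formulas, Md = L + ((N/2 - Flb) / fm) i from below and Md = U - ((N/2 - Fub) / fm) i from above, with boundaries taken 0.5 below and above the class limits. Variable names are illustrative.

```python
freqs = [1, 1, 4, 7, 10, 9, 3, 1, 2]
lower_limits = [40, 45, 50, 55, 60, 65, 70, 75, 80]
width = 5  # length of each class interval (i)

n = sum(freqs)
half = n / 2  # N/2 = 19

# Walk down the table until the running total would reach N/2.
running = 0
for k, f in enumerate(freqs):
    if running + f >= half:
        break
    running += f

lower = lower_limits[k] - 0.5  # L = 59.5
upper = lower + width          # U = 64.5
f_m = freqs[k]                 # frequency of the median class
f_lb = running                 # frequencies below the lower boundary
f_ub = n - running - f_m       # frequencies above the upper boundary

median_from_below = lower + (half - f_lb) * width / f_m
median_from_above = upper - (half - f_ub) * width / f_m

print(f_lb, f_ub)                            # 13 15
print(median_from_below, median_from_above)  # 62.5 62.5
```

Both directions land on 62.5, as expected, since they interpolate to the same point inside the class 59.5 to 64.5.
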
Mode

The mode, on the other hand, is the simplest measure of central position, simplest in the sense that it can be easily identified. In ungrouped data, the item that occurs most often is the mode. Which is the mode in this set of scores: 17, 18, 18, 20, 21? The score occurring most often in the set is 18, so 18 is the mode. A distribution with one mode is said to be unimodal, while a distribution with two or more modes is said to be multimodal. Below are samples showing unimodal and multimodal distributions:


Sample 1        Sample 2        Sample 3
   21              21              16
   20              21              14
   19              19              16
   19              19              15
   19              17              14
   17              16              15
   15              15              17

Sample 1 is unimodal: 19 is the item occurring most often in the distribution. Samples 2 and 3 are multimodal: more than one item occurs with the highest frequency. What are the modes in samples 2 and 3? For sample 2 the modes are 21 and 19, while for sample 3 the modes are 16, 15 and 14.
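
The modes of the three samples can be checked with Python's standard library. This sketch uses `statistics.multimode` (available from Python 3.8), which returns every value tied for the highest count, in the order each first appears.

```python
from statistics import multimode

sample_1 = [21, 20, 19, 19, 19, 17, 15]
sample_2 = [21, 21, 19, 19, 17, 16, 15]
sample_3 = [16, 14, 16, 15, 14, 15, 17]

modes_1 = multimode(sample_1)
modes_2 = multimode(sample_2)
modes_3 = multimode(sample_3)

print(modes_1)  # [19]
print(modes_2)  # [21, 19]
print(modes_3)  # [16, 14, 15]

# A sample is unimodal when exactly one value has the highest count.
print(len(modes_1) == 1)  # True
```
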
When data are grouped, the mode is defined as the midpoint of the interval containing the largest number of cases. A more precise modal value can also be computed from grouped data using this equation:

$$Mo = L_{mo} + \left(\frac{f_{mo} - f_1}{2f_{mo} - f_1 - f_2}\right) i$$

where $L_{mo}$ refers to the lower boundary of the modal class (the class interval with the highest frequency); $f_{mo}$ is the frequency of the modal class; $f_1$ is the frequency of the class in the row above the modal class in the table (the next lower interval); $f_2$ is the frequency of the class in the row below the modal class (the next higher interval); and i refers to the length of the class interval.

Let's find the mode using the same data.

SCORES       F      Class Boundary
40-44        1      39.5-44.5
45-49        1      44.5-49.5
50-54        4      49.5-54.5
55-59        7      54.5-59.5
60-64       10      59.5-64.5
65-69        9      64.5-69.5
70-74        3      69.5-74.5
75-79        1      74.5-79.5
80-84        2      79.5-84.5

          N=38
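
A possible computation for this table is sketched below in Python. It assumes the common interpolation formula for the grouped mode, Mo = Lmo + ((fmo - f1) / (2fmo - f1 - f2)) i, and further assumes that f1 is the frequency of the class just before the modal class (55-59) and f2 that of the class just after it (65-69). The names `f_prev` and `f_next` are illustrative stand-ins for f1 and f2.

```python
freqs = [1, 1, 4, 7, 10, 9, 3, 1, 2]
lower_boundaries = [39.5, 44.5, 49.5, 54.5, 59.5, 64.5, 69.5, 74.5, 79.5]
width = 5  # length of each class interval (i)

# Modal class: the interval with the highest frequency.
k = freqs.index(max(freqs))
l_mo = lower_boundaries[k]  # 59.5
f_mo = freqs[k]             # 10

# Neighbouring frequencies (assumed roles of f1 and f2); 0 at the table edges.
f_prev = freqs[k - 1] if k > 0 else 0               # 7
f_next = freqs[k + 1] if k + 1 < len(freqs) else 0  # 9

# Crude mode: midpoint of the modal class.
crude_mode = l_mo + width / 2

# Interpolated mode under the assumed formula.
mode = l_mo + (f_mo - f_prev) / (2 * f_mo - f_prev - f_next) * width

print(crude_mode)  # 62.0
print(mode)        # 63.25
```

Under these assumptions the interpolated mode (63.25) sits above the midpoint (62) because the class after the modal class has a higher frequency than the class before it.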