Sunday, September 1, 2013

Part I: FDT, Measures of Central Position (Mean) and Measures of Variability (SD)

Frequency Distribution Table

There are many ways of summarizing the data you have gathered. One in particular is the Frequency Distribution Table (FDT). The frequency distribution is a useful summary of most kinds of data. It sorts observations into categories and describes how often observations fall into each category. In simple terms, a frequency distribution is the tabular arrangement of data by classes or categories together with their corresponding class frequencies. Data presented in the form of a frequency distribution are called grouped data.



In constructing a frequency distribution table, consideration must be given to the number of classes to be used and the width of the class intervals to be employed. Below is a technique for constructing a frequency distribution.

Step 1: Get the range (R) by subtracting the lowest score (LS) from the highest score (HS)


  • example: HS=119; LS=35 ---> R=HS-LS --->R=119-35=84 --->R=84
Step 2: Determine the number of classes (k) using Sturges’ approximation, which is given by:
           k = 1 + 3.322 log n (n is the number of observations; log is the base-10 logarithm), which is rounded off to the next higher integer

  • example: n=50
  • k=1+3.322 log n
  • k=1+3.322 log 50
  • k=1+3.322 (1.70)
  • k=1+5.6474
  • k=6.6474 ≈ 7
Step 3: Find the width of the class interval (c) using this formula:
           c = R / k

  • example: c=84/7=12
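Steps 1 to 3 can be checked with a short Python sketch. The function name `class_layout` and its parameter names are only illustrative, and rounding up is done with `math.ceil`, which is one reading of "rounded off to the next higher integer".

```python
import math

def class_layout(highest, lowest, n):
    """Return (range, number of classes, class width) following Steps 1 to 3."""
    r = highest - lowest                       # Step 1: range R = HS - LS
    k = math.ceil(1 + 3.322 * math.log10(n))   # Step 2: Sturges' approximation, rounded up
    c = r / k                                  # Step 3: class width c = R / k
    return r, k, c

r, k, c = class_layout(highest=119, lowest=35, n=50)
print(r, k, c)
```

With the values from the examples (HS=119, LS=35, n=50) this gives R=84, k=7 and c=12, the same figures as in the worked steps.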
After computing the number of classes (k) and the width of the class interval (c), you can tally the frequencies (f) for each class interval (Ci) and compute the class marks (x) and class boundaries (Cb).


CLASS FREQUENCY. This refers to the number of observations belonging to a class interval, or the number of items within a category (Pagoso, 1986). To illustrate, consider the following scores of ten pupils in a competitive test: 15, 15, 15, 18, 18, 19, 22, 22, 24, 24.

                                  Scores                 frequency
                                     15                        3
                                     18                        2
                                     19                        1
                                     22                        2
                                     24                        2
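The tally above can be reproduced in Python. This is a minimal sketch using `collections.Counter` from the standard library; the variable names are illustrative.

```python
from collections import Counter

# Scores of the ten pupils from the example.
scores = [15, 15, 15, 18, 18, 19, 22, 22, 24, 24]

# Counter maps each distinct score to the number of times it occurs.
frequency = Counter(scores)

for score in sorted(frequency):
    print(score, frequency[score])
```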



            Class frequency can be arranged in different forms. This can be done using cumulative frequency and relative frequency.
  • CUMULATIVE FREQUENCY. This is a tabular arrangement of data by class intervals whose frequencies are cumulated. There are two kinds of cumulative frequency (cf). The first is the “less than” cumulative frequency (<cf): for each class interval, it is the number of observations that are less than the upper class boundary (Cb) of that interval.



Example:               Ci                 f          <cf           Cb
                         15-16              3             3       14.5-16.5
                         17-18              2             5       16.5-18.5
                         19-20              1             6       18.5-20.5
                         21-22              2             8       20.5-22.5
                         23-24              2             10      22.5-24.5

Each number in the <cf column is interpreted as follows: 3 items are less than 16.5, 5 items are less than 18.5, and so on.
On the other hand, the “greater than” cumulative frequency (>cf) is, for each class interval, the number of observations that are greater than the lower class boundary of that interval.

Example:               Ci                 f          >cf           Cb
                         15-16              3             10      14.5-16.5
                         17-18              2             7       16.5-18.5
                         19-20              1             5       18.5-20.5
                         21-22              2             4       20.5-22.5
                         23-24              2             2       22.5-24.5

Each number in the >cf column is interpreted as follows: 10 items are greater than 14.5, 7 items are greater than 16.5, and so on.
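Both cumulative columns can be derived from the class frequencies alone. The sketch below uses `itertools.accumulate` from the standard library; the variable names are illustrative.

```python
from itertools import accumulate

# Class frequencies for the intervals 15-16, 17-18, 19-20, 21-22, 23-24.
f = [3, 2, 1, 2, 2]

# "Less than" cf: running total from the first class downward.
less_than_cf = list(accumulate(f))

# "Greater than" cf: running total from the last class upward,
# then reversed back into the original class order.
greater_than_cf = list(accumulate(reversed(f)))[::-1]

print(less_than_cf)
print(greater_than_cf)
```

The first list matches the <cf column (3, 5, 6, 8, 10) and the second matches the >cf column (10, 7, 5, 4, 2).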

  • RELATIVE FREQUENCY. This is a tabular arrangement of the data showing the proportion in percent of each frequency to the total frequency. This can be obtained by dividing the class frequency by the total frequency.



Example:               Ci                 f          rf (%)
                         15-16              3            30
                         17-18              2            20
                         19-20              1            10
                         21-22              2            20
                         23-24              2            20

            Thus, if we have a class frequency of 3, the relative frequency is 3/10 or 30%
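The rf (%) column follows directly from the definition. A minimal Python sketch, with illustrative variable names:

```python
# Class frequencies for the intervals 15-16, 17-18, 19-20, 21-22, 23-24.
f = [3, 2, 1, 2, 2]
total = sum(f)  # total frequency: 10

# Relative frequency in percent: class frequency divided by total frequency.
rf_percent = [100 * fi / total for fi in f]

print(rf_percent)
```

The relative frequencies always add up to 100%, which is a quick check on the tally.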




CLASS MARK. This is the midpoint of a class interval, obtained by adding the lower limit and the upper limit and dividing the sum by 2. For example, in the interval 75-79 the lower limit is 75 and the upper limit is 79, which gives a class mark of 77: x = (lower limit + upper limit) / 2 = (75+79)/2 = 154/2 = 77


CLASS BOUNDARY. This refers to the true limits of a class interval. The lower class boundary [Li] is computed by subtracting ½ unit from the lower class limit, while the upper class boundary [Ui] is obtained by adding ½ unit to the upper class limit. To show this concept, let’s use the interval 75-79 again. In this interval the lower limit is 75 and the upper limit is 79. To get the lower and upper boundaries, we simply compute:
  • Lower boundary = 75-0.5=74.5
  • Upper boundary = 79+0.5=79.5
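The class mark and class boundaries of an interval can be computed as follows. This is a sketch only: the function names are illustrative, and the `unit` parameter (the measurement unit of the scores, 1 for whole-number scores) is an assumption added for generality.

```python
def class_mark(lower, upper):
    """Midpoint of a class interval."""
    return (lower + upper) / 2

def class_boundaries(lower, upper, unit=1):
    """True limits: half a unit below the lower limit and above the upper limit.

    `unit` is an assumed parameter; the text uses whole-number scores (unit = 1).
    """
    return lower - unit / 2, upper + unit / 2

print(class_mark(75, 79))
print(class_boundaries(75, 79))
```

For the interval 75-79 this gives a class mark of 77 and boundaries of 74.5 and 79.5.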
Below is an example of a frequency distribution showing frequencies, class marks (midpoints), class boundaries, and cumulative frequencies:

Class Interval      f      x (class mark)      Class Boundaries      <cf      >cf
75-79               2      77                  74.5-79.5             2        30
80-84               14     82                  79.5-84.5             16       28
85-89               14     87                  84.5-89.5             30       14

Measure of Central Position: Mean (grouped data)

To describe a set of quantitative data, it is necessary to define numerical measures that capture essential characteristics of the data. Any measure indicating the center of a set of data arranged in order of magnitude is a measure of central position, or measure of central tendency. The most commonly used measures of central position are the mean, median, and mode.

Mean
Observe the following achievement scores of pupils in mathematics: 18, 19, 20, 21, 22, 23, 24, 25, 26 and 75. If you add all the scores and divide by the number of pupils, the mean is 27.3 (273/10). This figure is not a representative value, since every score is less than 27.3 except that of the pupil who obtained 75. The example illustrates one property of the mean: "the mean is strongly influenced by extreme values."
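The effect of the extreme score can be seen by computing the mean with and without it. A minimal Python sketch, with illustrative variable names:

```python
scores = [18, 19, 20, 21, 22, 23, 24, 25, 26, 75]

# Mean of all ten scores.
mean_all = sum(scores) / len(scores)

# Mean after setting aside the extreme score of 75.
without_extreme = [s for s in scores if s != 75]
mean_without = sum(without_extreme) / len(without_extreme)

print(mean_all, mean_without)
```

Dropping the single score of 75 moves the mean from 27.3 down to 22, which sits in the middle of the remaining scores.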

For grouped data, the mean can be computed using the long method and the coded deviation method (short method).  
For the long method, we use this equation:
           Mean = Σfx / N
where f is the frequency, x is the class mark, and N is the total frequency (the total number of observations or cases).
Example: 


Class Interval      f       x       fx
118 – 126           3       122     366
127 – 135           5       131     655
136 – 144           9       140     1260
145 – 153           12      149     1788
154 – 162           5       158     790
163 – 171           4       167     668
172 – 180           2       176     352

N = 40              Σfx = 5,879

Mean = Σfx / N = 5,879 / 40 = 146.975
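The long method can be verified with a short Python sketch. The class limits and frequencies are taken from the table; the variable names are illustrative.

```python
# Class limits and frequencies from the example table.
intervals = [(118, 126), (127, 135), (136, 144), (145, 153),
             (154, 162), (163, 171), (172, 180)]
f = [3, 5, 9, 12, 5, 4, 2]

# Class mark x of each interval: (lower limit + upper limit) / 2.
x = [(lower + upper) / 2 for lower, upper in intervals]

# Product f * x for each class, then the two totals.
fx = [fi * xi for fi, xi in zip(f, x)]
n = sum(f)
sum_fx = sum(fx)

grouped_mean = sum_fx / n
print(n, sum_fx, grouped_mean)
```

This reproduces N = 40 and Σfx = 5,879, giving a grouped mean of about 146.98.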

For the coded deviation method, the original observations are converted to coded deviations (d’). Here, you choose an assumed mean. Any reasonable class mark in the distribution will do, but generally the class mark of the class with the highest frequency is taken. We use this equation:
           Mean = assumed mean + (Σfd’ / N) × i
where Σfd’ is the sum of the products of each frequency and its unit coded deviation, N is the total number of observations, and i is the width of the class interval. The unit coded deviation d’ of a class is its class mark minus the assumed mean, divided by i. Below is an example:

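Class Interval      x       f       d’      fd’
118 – 126           122     3       -3      -9
127 – 135           131     5       -2      -10
136 – 144           140     9       -1      -9
145 – 153           149     12      0       0
154 – 162           158     5       +1      +5
163 – 171           167     4       +2      +8
172 – 180           176     2       +3      +6

N = 40              Σfd’ = -9

With an assumed mean of 149 (the class mark of the class with the highest frequency) and i = 9:
Mean = 149 + (-9 / 40) × 9 = 149 - 2.025 = 146.975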
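The coded deviation method can be checked the same way. In this sketch the assumed mean is taken as the class mark of the class with the highest frequency, and the class width is taken as the difference between consecutive class marks; the variable names are illustrative.

```python
# Class marks and frequencies from the example table.
x = [122, 131, 140, 149, 158, 167, 176]
f = [3, 5, 9, 12, 5, 4, 2]

i = x[1] - x[0]                     # class width: 9
assumed_mean = x[f.index(max(f))]   # class mark of the most frequent class: 149

# Unit coded deviation of each class. Integer division is exact here
# because the class marks are equally spaced by i.
d_prime = [(xi - assumed_mean) // i for xi in x]
fd_prime = [fi * di for fi, di in zip(f, d_prime)]

n = sum(f)
coded_mean = assumed_mean + sum(fd_prime) / n * i
print(d_prime, sum(fd_prime), coded_mean)
```

The result, about 146.98, agrees with the mean obtained from the same data by the long method, as it should: the two methods differ only in the amount of arithmetic.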

