Collected and classified data are presented in the form of a frequency distribution. A frequency distribution is simply a table in which the data are grouped into classes on the basis of common characteristics and the number of cases that fall in each class is recorded. It shows the frequency of occurrence of different values of a single variable. A frequency distribution is constructed to satisfy three objectives:
(i) To facilitate the analysis of data,
(ii) To estimate frequencies of the unknown population distribution from the distribution of sample data, and
(iii) To facilitate the computation of various statistical measures.
Frequency distribution can be of two types:
1. Univariate Frequency Distribution.
2. Bivariate Frequency Distribution.
A univariate frequency distribution incorporates different values of one variable only, whereas a bivariate frequency distribution incorporates the values of two variables. The univariate frequency distribution is further classified into three categories:
(i) Series of individual observations,
(ii) Discrete frequency distribution, and
(iii) Continuous frequency distribution.
Series of Individual Observations: A series of individual observations is a simple listing of the items, one entry per observation.
Discrete Frequency Distribution: In a discrete series, the data are presented in such a way that exact measurements of units are indicated. In a discrete frequency distribution, we count the number of times each value of the variable occurs in the given data. This counting is facilitated by the technique of tally bars.
Continuous Frequency Distribution: If neither the identity of the units about which information is collected nor the order in which the observations occur is relevant, the first step of condensation is to classify the data: the entire range of values of the variable is divided into a suitable number of groups (classes), and the number of observations falling in each group is recorded.
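The counting step described above can be sketched in a few lines of Python. This is only an illustration: the data set (number of children in ten families) and the function name discrete_frequency_table are made up for the example and do not come from the text.

```python
from collections import Counter

def discrete_frequency_table(values):
    """Return (value, frequency) pairs sorted by value."""
    # Counter adds one to a value's count each time it is seen,
    # much like adding one tally bar per observation.
    counts = Counter(values)
    return sorted(counts.items())

# Hypothetical data: number of children in 10 families
children = [2, 0, 1, 2, 3, 1, 2, 0, 2, 1]

for value, freq in discrete_frequency_table(children):
    # Print the value, a row of tally marks, and the frequency
    print(f"{value}\t{'|' * freq}\t{freq}")
```

The frequencies always add up to the total number of observations, which is a quick check that no observation was missed or counted twice.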
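As a rough sketch of this grouping step, the Python snippet below sorts observations into equal-width classes of the exclusive type, where each class includes its lower limit and excludes its upper limit. The marks data, the starting limit, and the function name grouped_frequency_table are assumptions chosen for illustration only.

```python
def grouped_frequency_table(values, lower, width, n_classes):
    """Count observations in n_classes equal-width classes [lo, hi) starting at `lower`."""
    freqs = [0] * n_classes
    for v in values:
        # Index of the class whose interval [lo, hi) contains v
        idx = int((v - lower) // width)
        if not 0 <= idx < n_classes:
            raise ValueError(f"{v} falls outside the classes")
        freqs[idx] += 1
    # One (lower limit, upper limit, frequency) row per class
    return [(lower + i * width, lower + (i + 1) * width, f)
            for i, f in enumerate(freqs)]

# Hypothetical data: marks obtained by 12 students
marks = [12, 25, 37, 41, 18, 29, 33, 45, 22, 38, 27, 31]

for lo, hi, freq in grouped_frequency_table(marks, lower=10, width=10, n_classes=4):
    print(f"{lo}-{hi}\t{freq}")
```

Raising an error for out-of-range values, rather than silently dropping them, reflects the requirement that the classes be exhaustive.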
Principles for Constructing Frequency Distributions
In spite of the great importance of classification in statistical analysis, no hard and fast rules are laid down for it. A statistician relies on discretion, sound experience, and skill to arrive at an appropriate classification of the data. However, the following guidelines must be considered when constructing a frequency distribution:
- Type of classes: The classes should be clearly defined and should not lead to any ambiguity.
They should be exhaustive and mutually exclusive, so that any value of the variable corresponds to one and only one class.
- Number of classes: The choice of the number of classes into which a given frequency distribution should be divided depends upon the following:
(i) The total frequency which means the total number of observations in the distribution.
(ii) The nature of the data which means the size or magnitude of the values of the variable.
(iii) The desired accuracy.
(iv) The convenience regarding computation of the various descriptive measures of the frequency distribution, such as the mean, variance, etc.
The number of classes should not be too small or too large. If the classes are few, the classification becomes very broad and rough which might obscure some important features and characteristics of the data.
The accuracy of the results decreases as the number of classes becomes smaller. On the other hand, too many classes will leave only a few observations in each class. This gives an irregular pattern of frequencies across the classes and thus makes the frequency distribution irregular. Moreover, a large number of classes renders the distribution too unwieldy to handle: the computational work for further processing of the data becomes tedious and time-consuming without any proportionate gain in the accuracy of the results.
- Size of Class Intervals: Since the size of the class interval is inversely proportional to the number of classes in a given distribution, choosing one effectively fixes the other. The choice of the size of the class interval therefore also depends upon the sound subjective judgment of the statistician.
- Class Boundaries: If, in a grouped frequency distribution, there are gaps between the upper limit of any class and the lower limit of the succeeding class (as in the inclusive type of classification), the data need to be converted into a continuous distribution by applying a correction factor for continuity, which yields new classes of the exclusive type. The lower and upper class limits of these new exclusive-type classes are called class boundaries.
- Mid-value or Class Mark: The mid-value or class mark is the value of the variable which lies exactly at the middle of the class. The mid-value of any class is obtained by dividing the sum of the upper and lower class limits by 2.
Mid-value of a class = (Lower class limit + Upper class limit) / 2
The class limits should be selected in such a manner that the observations in any class are evenly distributed throughout the class interval so that the actual average of the observations in any class is very close to the mid-value of the class.
- Open End Classes: The classification is termed open end classification if the lower limit of the first class, the upper limit of the last class, or both are not specified. Classes in which one of the limits is missing are called open end classes.
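The guidelines above on class size, class boundaries, and mid-values can be illustrated with a short Python sketch. It rests on assumptions that are not part of the text: the class limits are made-up inclusive classes with the same gap between every pair of consecutive classes, the correction factor is taken as half that gap (the usual convention), and approximate_width applies the common rule of thumb "range divided by number of classes". The function names are hypothetical.

```python
def approximate_width(smallest, largest, n_classes):
    """Rule-of-thumb class width: range of the data divided by the number of classes."""
    return (largest - smallest) / n_classes

def describe_classes(limits):
    """Return one dict per class with its boundaries, width, and mid-value.

    limits: list of (lower, upper) inclusive class limits, assumed to have
    the same gap between every pair of consecutive classes.
    """
    # Correction factor: half the gap between the upper limit of one class
    # and the lower limit of the next class.
    correction = (limits[1][0] - limits[0][1]) / 2
    rows = []
    for lower, upper in limits:
        lower_boundary = lower - correction
        upper_boundary = upper + correction
        rows.append({
            "limits": (lower, upper),
            "boundaries": (lower_boundary, upper_boundary),
            "width": upper_boundary - lower_boundary,
            "mid_value": (lower + upper) / 2,
        })
    return rows

# Doubling the number of classes halves the approximate width
print(approximate_width(10, 50, 4))   # 10.0
print(approximate_width(10, 50, 8))   # 5.0

# Hypothetical inclusive classes 10-19, 20-29, 30-39
for row in describe_classes([(10, 19), (20, 29), (30, 39)]):
    print(row)
```

For the class 10-19 the sketch gives boundaries 9.5 and 19.5, a width of 10, and a mid-value of 14.5. The mid-value is the same whether it is computed from the class limits or from the class boundaries, because the correction is subtracted at one end and added at the other.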