Data Processing: Editing, Coding, Tabulating
Once data have been collected, the raw data must be converted into meaningful statements. This involves data processing, data analysis, and data interpretation and presentation.
Data reduction or processing mainly involves various manipulations necessary for preparing the data for analysis. The process (of manipulation) could be manual or electronic. It involves editing, categorizing the open-ended questions, coding, computerization and preparation of tables and diagrams.
Information gathered during data collection may lack uniformity. Example: Data collected through questionnaires and schedules may have answers that are not ticked at the proper places, or some questions may be left unanswered. Sometimes information is given in a form that must be recast into a category designed for analysis, e.g., converting daily/monthly income into annual income. The researcher has to decide how to edit it.
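The income example can be sketched in a few lines of Python. This is only an illustration: the function name, the sample responses, and the conversion factors (365 days and 12 months per year) are assumptions, and a real study would have to decide, for instance, how many working days a daily wage should be multiplied by.

```python
# Assumed conversion factors; a study may prefer working days instead of 365.
PERIODS_PER_YEAR = {"daily": 365, "monthly": 12, "annual": 1}


def to_annual_income(amount, period):
    """Convert an income reported per day, month, or year to an annual figure."""
    if period not in PERIODS_PER_YEAR:
        raise ValueError(f"unknown reporting period: {period!r}")
    return amount * PERIODS_PER_YEAR[period]


# Hypothetical responses recorded in different units.
responses = [(500, "daily"), (12000, "monthly"), (150000, "annual")]
annual = [to_annual_income(amount, period) for amount, period in responses]
print(annual)  # [182500, 144000, 150000]
```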
Editing also requires that the data be relevant and appropriate and that errors be corrected. Occasionally, the investigator makes a mistake and records an impossible answer. To the question “How much red chili do you use in a month?” the answer is written as “4 kilos”. Can a family of three members use four kilos of chilies in a month? The correct answer could be “0.4 kilo”.
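A plausibility check of this kind can be sketched as a simple range test. The field names, the sample households, and the upper limit of 2 kilos are assumptions made for illustration. The sketch only flags suspect records for the editor to review; it does not correct them automatically, since the right value (such as 0.4 kilo) is a judgment the researcher must make.

```python
def flag_implausible(records, field, low, high):
    """Return the records whose value for `field` lies outside [low, high]."""
    return [r for r in records if not (low <= r[field] <= high)]


# Hypothetical household records.
households = [
    {"id": 1, "members": 3, "chili_kg_per_month": 4.0},
    {"id": 2, "members": 5, "chili_kg_per_month": 0.5},
    {"id": 3, "members": 4, "chili_kg_per_month": 0.3},
]

# The 2.0 kg upper limit is an assumed threshold, not a figure from the text.
suspect = flag_implausible(households, "chili_kg_per_month", 0.0, 2.0)
print([r["id"] for r in suspect])  # [1]
```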
Care should be taken in editing (re-arranging) answers to open-ended questions. Example: Sometimes a “don’t know” answer is edited as “no response”. This is wrong. “Don’t know” means that the respondent is not sure and is in two minds about his reaction, or considers the question personal and does not want to answer it. “No response” means that the respondent is not familiar with the situation/object/event/individual about which he is asked.
Coding of data:
Coding is translating answers into numerical values, or assigning numbers to the various categories of a variable, for use in data analysis. It is done using a code book, a code sheet, and a computer card, following the instructions given in the code book. The code book gives a numerical code for each variable.
Nowadays, codes are assigned before going to the field, while the questionnaire/schedule is being constructed. After data collection, the pre-coded items are fed to the computer for processing and analysis. For open-ended questions, however, post-coding is necessary. In such cases, all answers to open-ended questions are placed in categories and each category is assigned a code.
Manual processing is employed when qualitative methods are used, when a quantitative study uses a small sample, when the questionnaire/schedule has a large number of open-ended questions, or when access to computers is difficult or inappropriate. However, coding is also done in manual processing.
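The difference between pre-coding and post-coding can be sketched as follows. Everything in this example is hypothetical: the variables, the numeric codes, and the keyword lists are invented for illustration. Matching keywords is only a crude stand-in for the human judgment that post-coding normally requires. Note that “don't know” and “no response” keep separate codes.

```python
# Hypothetical code book for pre-coded (closed) questions.
CODEBOOK = {
    "gender": {"male": 1, "female": 2},
    "owns_tv": {"yes": 1, "no": 2, "don't know": 8, "no response": 9},
}

# Hypothetical categories for post-coding one open-ended question.
POST_CODE_CATEGORIES = {
    1: ("price", "cost", "expensive"),
    2: ("quality", "durable"),
}
OTHER = 99  # assumed catch-all code for answers that fit no category


def code_answer(variable, answer):
    """Look up the numeric code of a pre-coded answer in the code book."""
    return CODEBOOK[variable][answer.strip().lower()]


def post_code(text):
    """Assign an open-ended answer to the first category whose keyword it contains."""
    lowered = text.lower()
    for code, keywords in POST_CODE_CATEGORIES.items():
        if any(keyword in lowered for keyword in keywords):
            return code
    return OTHER


print(code_answer("owns_tv", "Don't know"))  # 8
print(post_code("Too expensive for us"))     # 1
print(post_code("It lasts long"))            # 99
```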
Sarantakos (1998: 343) defines distribution of data as a form of classification of scores obtained for the various categories of a particular variable. There are four types of distributions:
- Frequency distribution
- Percentage distribution
- Cumulative distribution
- Statistical distributions
1. Frequency distribution:
In social science research, frequency distribution is very common. It presents the frequency of occurrences of certain categories. This distribution appears in two forms:
Ungrouped: Here, the scores are not collapsed into categories. For example, in a distribution of the ages of the students of a BJ (MC) class, each age value (e.g., 18, 19, 20, and so on) is presented separately.
Grouped: Here, the scores are collapsed into categories, so that 2 or 3 scores are presented together as a group. For example, in the above age distribution, groups like 18-20, 21-22, etc., can be formed.
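Both forms can be produced with Python's standard `collections.Counter`. The list of ages below is made-up sample data, and the class intervals follow the 18-20 and 21-22 groups mentioned in the text; the helper name `group_label` is an assumption.

```python
from collections import Counter

# Hypothetical ages of ten students.
ages = [18, 19, 19, 20, 21, 21, 21, 22, 18, 20]

# Ungrouped: every distinct age value gets its own frequency.
ungrouped = dict(sorted(Counter(ages).items()))
print(ungrouped)  # {18: 2, 19: 2, 20: 2, 21: 3, 22: 1}


def group_label(age, bins):
    """Return the label of the class interval (inclusive bounds) containing `age`."""
    for low, high in bins:
        if low <= age <= high:
            return f"{low}-{high}"
    raise ValueError(f"age {age} falls outside all class intervals")


# Grouped: age values are collapsed into class intervals.
bins = [(18, 20), (21, 22)]
grouped = dict(Counter(group_label(age, bins) for age in ages))
print(grouped)  # {'18-20': 6, '21-22': 4}
```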
2. Percentage distribution:
It is also possible to give frequencies not in absolute numbers but in percentages. For instance, instead of saying that 200 respondents out of a total of 2000 had a monthly income of less than Rs. 500, we can say that 10% of the respondents had a monthly income of less than Rs. 500.
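The conversion from absolute frequencies to percentages is a single division per category. In the sketch below, the first count reproduces the 200-out-of-2000 example; the second category, holding the remaining 1800 respondents, is an assumption added so that the counts sum to the total.

```python
def percentage_distribution(frequencies):
    """Convert absolute frequencies into percentages of the total."""
    total = sum(frequencies.values())
    if total == 0:
        raise ValueError("cannot compute percentages for an empty distribution")
    return {category: 100 * count / total for category, count in frequencies.items()}


# 200 of 2000 respondents earn less than Rs. 500 a month (from the text);
# the remainder is lumped into one assumed category.
income_counts = {"below Rs. 500": 200, "Rs. 500 and above": 1800}
percentages = percentage_distribution(income_counts)
print(percentages)  # {'below Rs. 500': 10.0, 'Rs. 500 and above': 90.0}
```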
3. Cumulative distribution:
It tells how often the value of the random variable is less than or equal to a particular reference value.
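A cumulative frequency is a running total of the ordinary frequencies, taken in ascending order of the values. The sketch below uses made-up ages and a hypothetical function name; each output pair gives a value and the number of cases less than or equal to it.

```python
from collections import Counter
from itertools import accumulate


def cumulative_distribution(values):
    """Return (value, number of cases <= value) pairs in ascending order."""
    counts = sorted(Counter(values).items())
    running_totals = accumulate(count for _, count in counts)
    return [(value, total) for (value, _), total in zip(counts, running_totals)]


# Hypothetical ages of ten students.
ages = [18, 19, 19, 20, 21, 21, 21, 22, 18, 20]
cumulative = cumulative_distribution(ages)
print(cumulative)  # [(18, 2), (19, 4), (20, 6), (21, 9), (22, 10)]
```

Dividing each running total by the number of cases (10 here) gives the cumulative proportion instead of the cumulative count.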
4. Statistical data distribution:
In this type of data distribution, some measure of average is computed for a sample of respondents. Several kinds of averages are available (mean, median, mode), and the researcher must decide which is most suitable to his purpose. Once the average has been calculated, the question arises: how representative a figure is it, i.e., how closely are the answers bunched around it? Are most of them very close to it, or is there a wide range of variation?
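Python's standard `statistics` module covers the three averages as well as common measures of spread. The scores below are made-up sample data. Using the range and the population standard deviation to judge how closely the answers bunch around the average is one possible choice, not the only one.

```python
import statistics

# Hypothetical scores from eight respondents.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(scores)      # arithmetic average
median = statistics.median(scores)  # middle value of the sorted scores
mode = statistics.mode(scores)      # most frequent value

# Spread: how closely the scores are bunched around the average.
value_range = max(scores) - min(scores)
std_dev = statistics.pstdev(scores)  # population standard deviation

print(mean, median, mode)    # 5 4.5 4
print(value_range, std_dev)  # 7 2.0
```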
Tabulation of data:
After editing, which ensures that the information on the schedule is accurate and categorized in a suitable form, the data are put together in some kind of table and may also undergo some other forms of statistical analysis.
Tables can be prepared manually and/or by computer. For a small study of 100 to 200 persons, there may be little point in tabulating by computer, since this necessitates putting the data on punched cards. But for a survey analysis involving a large number of respondents and requiring cross-tabulation of more than two variables, hand tabulation will be inappropriate and time-consuming.
Usefulness of tables:
Tables are useful to the researchers and the readers in three ways:
- They present an overall view of findings in a simpler way.
- They identify trends.
- They display relationships in a comparable way between parts of the findings.
By convention, the dependent variable is presented in the rows and the independent variable in the columns.
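A two-variable cross-tabulation that follows this convention can be sketched with the standard library alone. The records, the variable names (`education` as the independent variable, `reads_newspaper` as the dependent one), and the function `cross_tabulate` are all hypothetical.

```python
from collections import Counter


def cross_tabulate(records, row_var, col_var):
    """Count records for every (row category, column category) pair."""
    cells = Counter((r[row_var], r[col_var]) for r in records)
    rows = sorted({row for row, _ in cells})
    cols = sorted({col for _, col in cells})
    return {row: {col: cells[(row, col)] for col in cols} for row in rows}


# Hypothetical survey records.
records = [
    {"education": "graduate", "reads_newspaper": "yes"},
    {"education": "graduate", "reads_newspaper": "yes"},
    {"education": "graduate", "reads_newspaper": "no"},
    {"education": "school", "reads_newspaper": "no"},
    {"education": "school", "reads_newspaper": "no"},
    {"education": "school", "reads_newspaper": "yes"},
]

# Dependent variable in the rows, independent variable in the columns.
table = cross_tabulate(records, row_var="reads_newspaper", col_var="education")
for row, counts in table.items():
    print(row, counts)
```

Each row of the result corresponds to one category of the dependent variable, so reading across a row compares the categories of the independent variable.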