QT/U1 Topic 2 Collection of Data and Formation of Frequency Distribution
Collection of data is a statistical requirement. Statistics is a set or series of numerical data that facilitates policy-making. In other words, numerical data forms the basis of statistics. Numerical data undergoes processing and manipulation before it aids decision-making. Hence, numerical data is the raw material of statistics. This raw material can originate from various sources, and statisticians and analysts collect it by different methods.
TYPES OF DATA
There are two types of data, discussed below.
- Primary Data: refers to data that the investigator collects for the very first time. This type of data has not been collected before, either by this or by any other investigator. Primary data provides the investigator with the most reliable first-hand information about the respondents. The investigator has a clear idea of the terminology used, the statistical units employed, the research methodology and the size of the sample. Primary data may be either internal or external to the organization.
- Secondary Data: refers to data that the investigator collects from another source. Past investigators or agents collected this data for their own studies, so the investigator is not the first researcher or statistician to collect it. Moreover, the investigator does not have a clear idea about the intricacies of the data. There may be ambiguity about the sample size and the sampling technique, and the accuracy of the data may be unreliable.
PRIMARY DATA – METHODS OF COLLECTION OF DATA
(I) Direct Personal Investigation
Consists of the collection of data by the investigator in a direct manner. The investigator (or researcher) is responsible for personally approaching a respondent and gathering the appropriate information. In other words, the researcher personally enters the field and solicits the data required to take the research forward. Thus, this method of data collection ensures first-hand information. The data is all the more reliable for intensive research, but for extensive research it is inadequate and proves unreliable. This method is also time-consuming, so it is handicapped when time is short. However, its greatest demerit is that it is very subjective in nature and is not suitable for objective, extensive research.
(II) Indirect Oral Interview
Consists of the collection of data by the investigator in an indirect manner. The investigator (or enumerator) approaches, for example through telephonic interviews, an indirect respondent who possesses the appropriate information for the research. Thus, this method of data collection ensures first-hand information, because the interviewers can cross-question the respondent to obtain the right and appropriate information.
(III) Mailed Questionnaire
Consists of mailing a set or series of questions related to the research. The respondent marks his/her responses on the questionnaire and sends it back to the investigator. This method of collecting data has proven to be time-saving. It is also a very cost-efficient way of collecting the required data. An investigator who has access to the internet and an email account can undertake this method of data collection. However, the researcher can only investigate those respondents who also have access to the internet and an email account. This remains the only major restriction of this method.
(IV) Schedules
Scheduling involves a face-to-face situation with the respondents. In this method of collecting data, the interviewer questions the respondent according to the questions listed in a form. This form is known as a schedule. It is different from a questionnaire. A questionnaire is filled in personally by the respondents, and the interviewer may or may not be physically present, whereas a schedule is filled in by the enumerator or interviewer after asking the respondent for his/her answer to each question. In the scheduling method, the interviewer or enumerator is always physically present.
(V) Local agencies
In this method, the information is not collected, directly or indirectly, by either the interviewer or the enumerator. Instead, the interviewer hires or employs a local agency to work for him/her and help in gathering the appropriate information. These local agents are often known as correspondents. Correspondents are responsible only for gathering accurate and reliable information. They work according to their own preferences and adopt different methods to do so.
SECONDARY DATA – SOURCES OF DATA
(I) Published Sources
There are many national organizations, international agencies and official publications that collect various statistical data. They collect data related to business, commerce, trade, prices, economy, productions, services, industries, currency and foreign affairs. They also collect information related to various (internal and external) socio-economic phenomena and publish them. These publications contain statistical reports of various kinds. Central Government Official Publication, Publications of Research Institutions, Committee Reports and International Publications are some published sources of secondary data.
(II) Unpublished Sources
Some statistical data are not part of any publication. Such data are stored by institutions or private firms. Researchers often make use of these unpublished data in order to make their research all the more original.
FREQUENCY DISTRIBUTION
A frequency distribution is a representation, in either graphical or tabular format, that displays the number of observations within a given interval. The intervals must be mutually exclusive and exhaustive, and the interval size depends on the data being analyzed and the goals of the analyst. Frequency distributions are typically used within a statistical context.
As a statistical tool, a frequency distribution provides a visual representation of the distribution of a particular variable. Analysts often use it to show or illustrate the data collected in a sample. For example, the height of children can be split into several different categories or ranges. In measuring the height of 50 children, some are tall and some are short, but the highest frequency or concentration is likely to be in the middle range. The most important requirements are that the intervals used must be non-overlapping and must contain all of the possible observations.
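The two requirements above (non-overlapping intervals that cover every observation) can be sketched in a few lines of Python. This is only an illustrative sketch: the function name frequency_table, the interval width of 10 cm and the list of heights are assumptions made for the example, not data from any real survey. Each interval is half-open, so a value such as 120 belongs to 120-130 and not to 110-120.

```python
def frequency_table(values, start, width, n_bins):
    """Count observations in n_bins half-open intervals [lo, hi) of equal width."""
    counts = [0] * n_bins
    for v in values:
        index = int((v - start) // width)
        # Exhaustiveness check: every observation must fall in some interval.
        if not 0 <= index < n_bins:
            raise ValueError(f"{v} falls outside the intervals")
        counts[index] += 1
    edges = [(start + i * width, start + (i + 1) * width) for i in range(n_bins)]
    return list(zip(edges, counts))


# Hypothetical heights in centimetres (illustrative only, not real data).
heights = [112, 118, 121, 124, 125, 127, 128, 131, 133, 139, 142, 147]

table = frequency_table(heights, start=110, width=10, n_bins=4)
for (lo, hi), count in table:
    print(f"{lo}-{hi}: {count}")
```

Because each value is assigned to exactly one interval, the counts always add up to the number of observations, which is a quick check that the intervals are mutually exclusive and exhaustive.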
Frequency distributions can be presented as a frequency table, a histogram or a bar chart. Both histograms and bar charts provide a visual display using columns, with the y-axis representing the frequency count and the x-axis representing the variable being measured. In this example, the y-axis is the number of children and the x-axis is the height. For data such as heights, the chart will often resemble a normal distribution, which means that the majority of occurrences, in this case children of a certain height, fall in the middle columns. In a histogram, the height of each column represents the frequency of that interval, while its width represents the range of values the interval covers.
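As a rough illustration of how a frequency table becomes a chart, the sketch below prints a plain-text histogram. The chart is turned on its side, so the length of each bar plays the role of the column height and represents the frequency. The function name text_histogram and the table of heights are hypothetical and chosen only for this example.

```python
def text_histogram(table):
    """Return one line per interval; the bar length equals the frequency."""
    lines = []
    for label, count in table:
        lines.append(f"{label:>9} | {'#' * count} ({count})")
    return lines


# Hypothetical frequency table of heights in cm (illustrative only).
height_table = [("110-119", 2), ("120-129", 5), ("130-139", 3), ("140-149", 2)]

for line in text_histogram(height_table):
    print(line)
```

With these illustrative figures the longest bar sits in one of the middle intervals, which is the concentration in the middle range described above.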
Frequency Distributions Used In Trading
Frequency distributions are not commonly used in the world of investments. However, traders who follow Richard D. Wyckoff, a pioneering trader in the early 20th century, use an approach to trading based on frequency distribution. Investment houses still use the approach, which requires considerable practice, to teach traders. The frequency chart is referred to as a point-and-figure chart and was created out of a need for floor traders to take note of price action and to identify trends. The y-axis is the variable measured, and the x-axis is the frequency count. Each change in price action is denoted in X’s and O’s. Traders interpret it as an uptrend when three X’s emerge; in this case, demand has overcome supply. In the reverse situation, when the chart shows three O’s, it indicates that supply has overcome demand.
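The X and O notation can be sketched in a highly simplified form. The sketch below marks each rise in price with an X and each fall with an O, then reports a trend when the latest three marks are identical. It is an assumption-laden illustration only: real point-and-figure charts also use a box size and a reversal amount, which are omitted here, and the function names and the price series are hypothetical.

```python
def price_moves(prices):
    """Mark each rise with 'X' and each fall with 'O'; unchanged prices are skipped."""
    marks = []
    for previous, current in zip(prices, prices[1:]):
        if current > previous:
            marks.append("X")
        elif current < previous:
            marks.append("O")
    return "".join(marks)


def latest_signal(marks, run=3):
    """Return 'uptrend' or 'downtrend' if the last `run` marks are identical, else None."""
    if marks.endswith("X" * run):
        return "uptrend"    # demand has overcome supply
    if marks.endswith("O" * run):
        return "downtrend"  # supply has overcome demand
    return None


# Hypothetical closing prices (illustrative only).
prices = [50, 51, 50, 52, 53, 55]

marks = price_moves(prices)
print(marks, latest_signal(marks))
```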