Collection of Data Notes

Collection of Data

Introduction

  • Statistics are numerical data, originating from counting (enumeration) or measurement.

  • A statistical enquiry involves collecting facts and figures about a particular phenomenon.

  • The investigator conducts the enquiry by counting or measuring characteristics.

  • Ideally, the investigator is a trained statistician.

  • Respondents are the people from whom information is collected.

  • Statistical units are the items on which measurements are taken.

  • The collection of statistical data is the process of counting, enumerating, or measuring, along with systematic recording.

  • Systematic data collection is the foundation of statistical analysis.

Preliminaries to Data Collection

Before data collection, the following points should be examined:

  1. Objectives and scope of the enquiry:

    • Define the goals of the study clearly.

  2. Statistical units to be used:

    • Determine the specific units for measurement.

  3. Sources of information (data):

    • Identify where the data will come from.

  4. Method of data collection:

    • Decide how the data will be gathered.

  5. Degree of accuracy aimed at in the final results:

    • Set the desired level of precision.

  6. Type of enquiry:

    • Determine the nature of the investigation.

Objectives and Scope of the Enquiry

  • Clearly define the objectives of the statistical enquiry.

  • This definition helps determine the nature of data to be collected and the statistical techniques to be used.

  • Explicit objectives prevent the collection of irrelevant information and highlight the potential uses of the data.

  • The scope of the enquiry significantly impacts the data collected and the methods used for collection and analysis.

  • Scope includes coverage regarding the type of information, subject matter, and geographical area.

    • For example, a cost of living index study should specify whether it pertains to a city, state or all of India, and which class of people it is intended for.

  • Large-scale investigations may require sample methods, while smaller studies may allow for 100% enumeration (census method).

  • The objectives and scope should align with available resources (money, manpower, time).

Statistical Units to be Used

  • A statistical unit is a well-defined and identifiable object or group associated with measurements or counts.

  • Examples include an individual person, family, household, or a block of locality in a socio-economic survey.

  • Clearly define the statistical units before data collection to avoid errors and fallacious conclusions.

  • Units can be conventionally fixed (e.g., metres, kilograms, hours) or arbitrary (especially in socio-economic studies).

Requisites of a Statistical Unit
  1. Unambiguous:

    • The unit should be rigidly defined to avoid ambiguity.

    • It should cover the entire population and be distinct, ensuring each element belongs to only one unit.

  2. Specific:

    • The statistical unit must be precise and specific, leaving no room for interpretation by investigators.

    • Distinguish between conventional and arbitrary definitions of characteristics/variables.

      • Conventional definitions are commonly used and remain the same.

      • Arbitrary definitions are specific to the enquiry and may change.

      • Example: 'Wages' may mean weekly or monthly wages, wages of skilled labor only, wages including bonuses, and so on, so the meaning adopted must be stated for the enquiry.

  3. Stable:

    • The unit should be stable over time and across different places.

    • Significant fluctuations can make data incomparable and reduce utility.

    • Example: Fluctuations in the value of money (inflation) or weight measurements (altitude) can render comparisons useless.

  4. Appropriate to the enquiry:

    • The unit selected must be relevant to the given enquiry.

    • Example: Wholesale prices are suitable for studying changes in the general price level, while retail prices are appropriate for cost of living indices.

  5. Uniform:

    • The unit adopted should be homogeneous throughout the investigation for comparable measurements.

    • Example: Using both yards and metres in measuring length would lead to confusion.
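
The uniformity requirement can be illustrated with a minimal sketch that converts every recorded length to a single unit before any comparison is made. The record layout and the function name to_metres are hypothetical; the yard factor (1 yard = 0.9144 metre) is the standard definition.

```python
# Conversion factors to metres. The yard factor is exact by definition;
# the list of recordings below is a made-up illustration.
TO_METRES = {"metre": 1.0, "yard": 0.9144}

def to_metres(value, unit):
    """Convert a length recorded in `unit` to metres."""
    if unit not in TO_METRES:
        raise ValueError(f"unknown unit: {unit!r}")
    return value * TO_METRES[unit]

# Hypothetical recordings made in mixed units by different investigators
recordings = [(10.0, "metre"), (10.0, "yard"), (2.5, "metre")]

# After conversion, every figure is in the same unit and can be compared
lengths_m = [to_metres(value, unit) for value, unit in recordings]
print(lengths_m)
```

Rejecting unknown units outright, rather than guessing, keeps an undefined unit from silently entering the data.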

Types of Statistical Units
  1. Units of Collection:

    • Sub-divided into:

      • Units of Enumeration: The basic unit on which observations are made, determined in advance based on the enquiry's objectives.

        • Examples: person, household, family, farm, shop, livestock, firm.

        • Should be clearly defined (shape, size, etc.).

        • Example: For a cost of living index, "household" must be clearly defined: whether it means blood relations only, persons sharing a common kitchen, all residents, or ration card holders.

      • Units of Recording: The units in which data are recorded (quantification).

        • Examples: Weight (kilograms), length (metres), price (rupees).

        • Units of measurement can be simple or composite.

          • Simple Units: Represent one condition without qualification (e.g., metre, rupee).

          • Composite Units: Simple units with qualifying words.

            • Compound Units: Simple unit with one qualifying word (e.g., skilled worker, ton-kilometre).

            • Complex Units: Simple unit with two or more qualifying words (e.g., production per machine hour).

            • Composite units are restrictive in scope and require clear definitions.

  2. Units of Analysis and Interpretation:

    • Units in which statistical data are ultimately analyzed and interpreted.

    • Determine whether results will be expressed in absolute or relative figures.

    • Facilitate comparisons between data sets.

    • Generally, units are rates, ratios, percentages, and coefficients.

      • Rates: Comparison between two heterogeneous quantities (numerator and denominator are of different kinds) (e.g., mortality rates).

        • Usually expressed per thousand; a coefficient is the rate per unit.

      • Ratios and Percentages: Compare homogeneous quantities (numerator and denominator are of the same kind) (e.g., ratio of smokers to non-smokers).

    • The unit of analysis often provides relative figures independent of units of measurement.

    • Example: Coefficient of Variation for comparing variability, Coefficient of Skewness for comparing symmetry.
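
The units of analysis above can be sketched in a few lines. All figures are invented for illustration, and the coefficient of variation is taken here as 100 × (population standard deviation) / mean, which is one common convention rather than the only one.

```python
from statistics import mean, pstdev

def rate_per_thousand(events, population):
    """Rate: events (one kind of quantity) per 1000 of population (another kind)."""
    return 1000 * events / population

def coefficient_of_variation(values):
    """Relative variability in percent; independent of the unit of measurement."""
    return 100 * pstdev(values) / mean(values)

# Hypothetical figures, for illustration only
deaths, population = 180, 24000
death_rate = rate_per_thousand(deaths, population)   # rate per thousand
death_coefficient = deaths / population              # coefficient: rate per unit

smokers, non_smokers = 30, 90
smoker_ratio = smokers / non_smokers                 # ratio of like quantities

heights_cm = [150, 160, 170, 180, 190]
weights_kg = [50, 60, 70, 80, 90]
cv_heights = coefficient_of_variation(heights_cm)
cv_weights = coefficient_of_variation(weights_kg)

print(death_rate, death_coefficient, smoker_ratio)
print(round(cv_heights, 2), round(cv_weights, 2))
```

Because the coefficient of variation is a pure number, the variability of heights in centimetres can be compared directly with that of weights in kilograms.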

Sources of Information (Data)

  • Decide on data sources after determining objectives, scope, and statistical units.

  • Data can be collected first hand (primary data) or from published sources (secondary data).

    • Primary data: Collected originally by the investigator.

    • Secondary data: Used from previously collected data.

      • Example: Vital rates prepared by the Registrar-General of India are primary data but become secondary when reproduced in the U.N. Statistical Abstract.

  • Primary data collection requires defining terms and statistical units, considering the enquiry's objectives and scope.

  • Secondary data requires careful editing and scrutiny for reliability, suitability, and adequacy.

  • The use of primary, secondary, or both types of data depends on the enquiry's purpose and scope.

Method of Data Collection

  • This is only a consideration if primary data is to be used.

  • A decision is required on whether to use (i) the census method or (ii) a sample technique.

    • Census Method: 100% inspection of the population.

    • Sample Technique: Inspecting only a representative subset of the population.

      • For infinite or very large populations, the census method is impractical.

  • Census method isn't viable for destructive testing (e.g., breaking strength of chalk, life of bulbs, testing explosives).

  • Choice depends on survey objectives, scope, resource limitations, and desired accuracy.

  • For the sample method, the sample size and the sampling technique (e.g., simple random sampling, stratified random sampling) must be decided.
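
The two sampling techniques named above can be sketched with Python's standard random module. The household list, the urban/rural strata and the 10% sampling fraction are all hypothetical, and proportional allocation is only one of several ways to allocate a stratified sample.

```python
import random
from collections import defaultdict

def simple_random_sample(population, n, seed=None):
    """Draw n units without replacement, each unit equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def stratified_sample(population, stratum_of, fraction, seed=None):
    """Proportional allocation: sample the same fraction from every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in population:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for units in strata.values():
        # At least one unit per stratum, never more than the stratum holds
        k = min(len(units), max(1, round(fraction * len(units))))
        sample.extend(rng.sample(units, k))
    return sample

# Hypothetical population: 200 households, every fourth one urban
households = [
    {"id": i, "area": "urban" if i % 4 == 0 else "rural"} for i in range(200)
]

srs = simple_random_sample(households, 20, seed=1)
strat = stratified_sample(households, lambda h: h["area"], 0.10, seed=1)

urban_in_strat = sum(1 for h in strat if h["area"] == "urban")
print(len(srs), len(strat), urban_in_strat)
```

The simple random sample may over- or under-represent urban households by chance, while the stratified sample fixes each stratum's share in advance.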

Degree of Accuracy Aimed at in the Final Results

  • Decide the desired degree of accuracy or precision before starting the enquiry.

  • The precision level influences data collection methods and sample size.

  • Information from previous studies can guide precision, assuming no fundamental changes.

  • Perfect accuracy is practically impossible due to errors in measurement, collection, analysis, and interpretation.

  • Reasonable precision is sufficient for valid inferences.

  • The degree of accuracy required depends on the objectives and scope of the enquiry.

    • Example: Centimetre differences matter in measuring cloth, but metre differences may be immaterial in measuring distances between cities.

  • "The necessary degree of accuracy in counting or measuring depends upon the practical value of accuracy in relation to its cost."
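
The link between desired precision and sample size can be illustrated with a standard textbook formula that is not part of these notes: for estimating a mean by simple random sampling from a large population, n = (z × σ / E)², where σ is an assumed standard deviation (for example from a previous study), E is the tolerated margin of error, and z is the normal quantile for the chosen confidence level. The function name and the figures below are hypothetical.

```python
import math
from statistics import NormalDist

def required_sample_size(sigma, margin, confidence=0.95):
    """Smallest n with z * sigma / sqrt(n) <= margin (large-population case)."""
    # Two-sided normal quantile, about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil((z * sigma / margin) ** 2)

# Hypothetical planning figures: sigma taken from an earlier study
n_coarse = required_sample_size(sigma=10, margin=2)
n_fine = required_sample_size(sigma=10, margin=1)
print(n_coarse, n_fine)
```

Halving the margin of error roughly quadruples the required sample, which is the cost side of the accuracy trade-off described in the quotation above.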

Types of Enquiry

  • Deciding on the type of enquiry is important before data collection.

  • Types include:

    • Official, Semi-official, or Un-official.

    • Initial or Repetitive.

    • Confidential or Non-confidential.

    • Direct or Indirect.

    • Regular or Ad-hoc.

    • Census or Sample.

    • Primary or Secondary.

Official, Semi-official, or Un-official Enquiry
  • Important factor: Sponsoring agency.

    • Official Enquiry: Conducted by government (central, state, local).

    • Semi-official Enquiry: Conducted by organizations with government patronage (e.g., I.C.A.R., I.A.S.R.I., I.S.I.).

    • Un-official Enquiry: Sponsored by private institutions (e.g., F.I.C.C.I.), trade unions, universities, or individuals.

  • Facilities differ significantly for each type.

  • Official enquiries can compel information furnishing and may obtain it at respondents' own cost.

  • Unofficial enquiries rely on persuasion and request, with no legal compulsion for respondents.

  • Financial positions also vary, affecting the scope and depth of enquiries.

Initial or Repetitive Enquiry
  • Initial Enquiry: Conducted for the first time.

  • Repetitive Enquiry: Continuation of a previously conducted enquiry.

  • Initial enquiry requires formulating the entire plan, while repetitive enquiry modifies the original based on past experiences.

  • Valid conclusions in repetitive enquiries require ensuring no material changes in the definitions of terms.

Confidential or Non-confidential Enquiry
  • Confidential Enquiry: Information and results are kept secret, meant only for the sponsoring agency.

    • Common for enquiries by private organizations like trade unions.

  • Non-confidential Enquiry: Results are published and made known to the public.

    • Most enquiries conducted by state, private bodies, or individuals are of this type.

Direct or Indirect Enquiry
  • Direct Enquiry: The phenomenon under study can be quantitatively measured (e.g., age, weight, income).

  • Indirect Enquiry: The phenomenon is qualitative and not measurable (e.g., honesty, beauty, intelligence).

  • Indirect enquiries convert qualitative characteristics into quantitative phenomena using standard scales (e.g., Intelligence Quotient (I.Q.) score).

Regular or Ad-hoc Enquiry
  • Regular Enquiry: Conducted periodically at equal intervals (e.g., census every 10 years).

    • C.S.O. conducts and publishes results periodically (e.g., Monthly Abstract of Statistics).

  • Ad-hoc Enquiry: Conducted as needed without regularity.

    • C.S.O. and N.S.S.O. conduct a number of ad-hoc enquiries.

Primary and Secondary Data

  • The most essential factor in statistical enquiry is that the original collection of data is correct and proper.

  • Inadequacies at the source render even the most advanced techniques invalid.

  • Data can be from two sources:

    1. The investigator conducts the enquiry originally.

    2. The agency obtains the necessary data from other sources that have already collected the data on that subject.

    • Primary Data: Collected for the first time by the investigator or agency and used by them in the statistical analysis.

    • Secondary Data: Published or unpublished data that have already been collected and processed by some other agency or person.

      • If a second agency publishes and files such data, it becomes the secondary source for anyone who later uses these data.

Choice Between Primary and Secondary Data
  • The choice of data depends on the nature, objective, and scope of the enquiry; the time and money at disposal; the degree of precision aimed at; and the status of the agency.

Remarks:

  1. When using secondary data, obtain the data from the primary source to save yourself from errors of transcription.

  2. Generally, secondary data is used because fairly reliable published data is available in publications by governments, organizations and institutions.

Internal and External Data
  • Internal Data: Collected by the organization from its own internal operations (e.g., production, sales, profits).

  • External Data: Obtained from publications of other agencies (e.g., governments), for use by the given organization.

Methods of Collecting Primary Data

  • The methods are:

    1. Direct personal investigation.

    2. Indirect oral interviews.

    3. Information received through local agencies.

    4. Mailed questionnaire method.

    5. Schedules sent through enumerators.

Direct Personal Investigation
  • Collection of data personally by the investigator from the sources concerned.

  • This method restricts the scope of the enquiry. By its nature, it is suitable only if the enquiry is intensive rather than extensive.

  • It is not suitable if the field of investigation is too wide.

Merits:
  1. First hand information is more reliable and accurate.

  2. The data is generally reliable if the type of enquiry is intensive.

  3. The response is more encouraging when the investigator approaches the respondents personally.

  4. The investigator can extract proper information by talking to them at their level and using local connotations.

Demerits:
  1. Suited only for intensive studies and not for extensive enquiries.

  2. Handicapped due to lack of time, money and manpower.

  3. The greatest drawback is that the method is subjective: its success largely depends upon the intelligence and diplomacy of the investigator.

  4. Results may not be reliable if the investigator is not intelligent or tactful enough to understand the psychology of the people being interviewed.

Indirect Oral Investigation
  • Used when direct personal investigation is not practicable, for example because people are unwilling to furnish the information.

  • Data on different problems are collected by interviewing persons who are directly or indirectly concerned with the subject matter of the enquiry.

  • A small list of questions is prepared and put to the witnesses, and their replies are recorded.

  • Such procedure is adopted by the Enquiry Committees or Commissions appointed by the government.

Merits:
  1. The enumerators can use their intelligence, skill and tact to extract correct information by cross examination of the informants.

  2. Less expensive and requires less time for investigation.

  3. Expert views and suggestions are obtained in order to conduct the enquiry more effectively.

Demerits:
  1. Due to lack of supervision the investigator has to rely on the enumerators.

  2. The accuracy of the data depends on the nature and quality of the witnesses.

  3. A wrong choice of witnesses will give biased results, which would adversely affect the findings of the enquiry.

Information Received Through Local Agencies
  • Information is not collected formally by the investigator.

  • It is collected by local agents appointed by the investigator in different parts of the field of enquiry.

  • They submit their reports periodically to the head office to be processed for final analysis.

  • Usually employed by newspapers and by economic or sports agencies to obtain data.

  • This is useful for obtaining estimates of agricultural crops which may be submitted to the government by the village school teachers.

Merits:
  • Works out to be cheaper and more economical.

  • Required information is obtained expeditiously.

Demerits:
  • Different agents have different styles of reporting, so the results can be biased and not very reliable.

  • Where the information depends on the registration of events, many people neglect to register, causing under-estimation. There should be legal compulsion to register events, backed by sanctions to enforce the obligation.

Mailed Questionnaire Method
  • A questionnaire, consisting of a list of questions, is mailed to the respondents with a request for a quick response within a specified time.

  • The important factor is the skill, efficiency, care and the wisdom with which the questionnaire is framed.

Merits:
  1. Of all methods, this one is the most economical in terms of time, money and manpower.

  2. This can be used with extensive enquiries to cover a wide area.

  3. Errors introduced by intermediaries (investigators or enumerators) are avoided, as the information is supplied directly by the person concerned.

Demerits:
  1. Can be used only with educated respondents, since they must understand the questions and reply in their own handwriting.

  2. People might suppress correct information and furnish wrong replies.

  3. Cannot verify the accuracy and reliability of the information received.

  4. No scope of asking supplementary questions for cross checking the information supplied by them.

Schedules Sent Through Enumerators
  • Questionnaire is a list of questions answered by the respondent.

  • Schedule is a device of obtaining answers by field agents in a face to face situation with the respondent.

  • Enumerators go to the respondents personally with the schedule (list of questions), ask them the questions therein, and record their replies.

Merits:
  1. Enumerators explain objectives of enquiry to the informants and impress upon them the need of furnishing correct information.

  2. The technique is useful and yields fairly reliable results. Non-response arises only when respondents cannot be contacted even after many calls, or when a respondent declines to furnish the requested information.

  3. This can still be used with illiterate respondents.

  4. Enumerators can handle the situation effectively and check the accuracy of the information supplied through intelligent cross-questioning and supplementary questions.

Demerits:
  1. It is quite expensive as enumerators have to be trained and they are paid.

  2. It is also more time-consuming than the mailed questionnaire method.

  3. Success depends upon the efficiency and skill of the enumerators.

  4. The information recorded can differ from one enumerator to another; this variation should be minimized.

  5. Success also depends on the manner in which the schedule is prepared and drafted, since a well-drafted schedule makes it possible to obtain complete and desired information from the respondents.

Remarks:

  1. It is desirable to scrutinise the duly filled-in questionnaires or schedules to detect any inconsistency.

  2. A second set of enumerators can be used to check the honesty and integrity of the first. However, this can be counterproductive, as respondents may be irritated at being approached a second time.
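
Scrutiny for inconsistency, as recommended in the first remark, can be partly mechanised by comparing connected answers within each filled-in schedule. The sketch below is only an illustration: the field names (age, birth_year, household_members, earning_members) and the two checks are hypothetical, not taken from any actual schedule.

```python
def find_inconsistencies(schedule, survey_year):
    """Return a list of problems found by comparing connected answers."""
    problems = []

    # Connected questions: stated age versus year of birth (1 year tolerance)
    age = schedule.get("age")
    birth_year = schedule.get("birth_year")
    if age is not None and birth_year is not None:
        if abs((survey_year - birth_year) - age) > 1:
            problems.append("age does not match year of birth")

    # Connected questions: earners cannot outnumber household members
    members = schedule.get("household_members")
    earners = schedule.get("earning_members")
    if members is not None and earners is not None and earners > members:
        problems.append("more earning members than household members")

    return problems

# Hypothetical filled-in schedules
schedules = [
    {"id": 1, "age": 40, "birth_year": 1984,
     "household_members": 5, "earning_members": 2},
    {"id": 2, "age": 25, "birth_year": 1984,
     "household_members": 3, "earning_members": 4},
]

flagged = {s["id"]: find_inconsistencies(s, survey_year=2024) for s in schedules}
print(flagged)
```

Schedules with a non-empty list of problems would be returned for editing or a follow-up call rather than corrected by guesswork.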

Drafting or Framing the Questionnaire

  • The questionnaire is designed with utmost care to collect relevant data without confusion.

  • Drafting a good questionnaire is a highly specialized job.

General Points to bear in mind

  • The size of the questionnaire should be as small as possible (roughly 15 to 25 questions).

  • The questions should be arranged in a natural logical sequence.

  • The usage of vague and 'multiple meaning' words should be avoided.

  • Questions should be readily comprehensible and easy to answer for the respondents.

Types of Questions (in a Questionnaire)

  (a) Shut Questions: The possible answers are suggested, and the respondent ticks one of them. The types are:

    (i) Simple Alternate Questions: The respondent chooses between two clear alternatives, such as 'Yes' or 'No'.

    (ii) Multiple Choice Questions: Used when a two-way alternative is not adequate; additional answers between 'Yes' and 'No', such as 'Do not know', 'No opinion', 'Occasionally', 'Casually', 'Seldom', are offered.

  (b) Open Questions: No alternative answers are suggested, and the respondents express their own opinions.

  • 'Leading' questions should be avoided.

  • The questionnaire should be designed to provide internal checks on the accuracy of the information supplied by the respondents, by including some connected questions.

  • From a practical point of view, it is desirable to try out the questionnaire on a small scale first.

  • A covering letter from the organisers of the enquiry should be enclosed with the questionnaire, so that respondents know what they are agreeing to.

Sources of Secondary data

Published Sources
  • A number of national organisations (government, semi-government and private) and international agencies collect statistical data relating to business, economics and related fields, and publish their results in statistical reports, either regularly (monthly, quarterly, annually) or ad hoc. The main published sources are:

    • Official Publications of Central Government.

    • Publications of Semi-Government Statistical Organisations.

    • Publications of Research Institutions.

    • Publications of Commercial and Financial Institutions.

Reports of Various Committees and Commissions appointed by the Government
  • The reports of survey and enquiry commissions and committees appointed by the Central and State Governments to obtain expert views on important economic and social matters (wages, dearness allowance, prices, national income, taxation, land, education, etc.) are invaluable sources of secondary information. Examples:

  • Simon-Kuznet Committee report on National Income in India

  • Wanchoo Commission report on Taxation

  • Kothari Commission report on Educational Reforms

  • Pay Commissions Reports

  • Land Reforms Committee report

  • Gupta Commission report on Maruti Affairs

Newspapers and Periodicals
  • Newspapers and periodicals publish statistical material on socio-economic problems.

  • Examples: Eastern Economist, Economic Times, The Financial Express, Indian Journal of Economics, Commerce, Capital, Transport, Statesman's Year Book, and The Times of India Year Book.

International Publications
  • United Nations Organisation (U.N.O.)

  • International Monetary Fund (I.M.F.)

  • World Bank

  • Economic and Social Commission for Asia and Pacific (ESCAP)

  • International Finance Corporation (I.F.C.)

  • International Statistical Education Institute

Unpublished Sources
  • Data need not always be published. Unpublished records are maintained by private firms and enterprises, and by departments and offices of the central and state governments.

  • Remarks: In some cases the information gathered is confidential in nature.

Precautions in the use of Secondary Data

  • Secondary data should be used with extra caution. The investigator must be satisfied as to the reliability, accuracy, adequacy, and suitability of the data for the problem being investigated.

  • Proper care should be taken to edit the data.

Important points when using secondary data

The Reliability of Data. Some elements to consider:

  • the reliability, integrity and experience of the collecting organization

  • the reliability of source of information and

  • the methods used for the collection and analysis of the data

The Suitability of Data. Important points to note:
  • observe and compare the objectives, nature and scope of the given enquiry with those of the original investigation

  • take into account the difference in the timing of collection and the homogeneity of conditions between the original enquiry and the investigation in hand

The Adequacy of Data:

  • adequacy should also be ensured; adequate data help overcome the limitations and inaccuracies involved in secondary data