Graphical representation is a good way to present summarised data. However, graphs provide only an overview and thus cannot be used for further analysis. Hence, we use summary statistics, such as averages, to analyse the data. Mass data, which is collected, classified, tabulated and presented systematically, is analysed further to reduce it to a single representative figure.
This single figure is a measure found in the central part of the range of all values. It represents the entire data set and is hence called the measure of central tendency. In other words, the tendency of data to cluster around a centrally located figure is known as central tendency. A measure of central tendency, or average of the first order, describes the concentration of a large number of values around a particular value. It is a single value which represents all units.
a. Statistical Averages: The commonly used statistical averages are arithmetic mean, geometric mean and harmonic mean.
b. Arithmetic Mean: Arithmetic mean is defined as the sum of all values divided by the number of values and is represented by X̄ (X-bar).
c. Median: Median of a set of values is the middle-most value when they are arranged in ascending order of magnitude. Median is denoted by ‘M’.
d. Mode: Mode is the value which has the highest frequency and is denoted by Z. The modal value is most useful for business people. For example, shoe and readymade garment manufacturers will like to know the modal size of the people to plan their operations. For discrete data, with or without frequency, it is the value corresponding to the highest frequency.
Appropriate Situations for the use of Various Averages
1. Arithmetic mean is used when:
a. An in-depth study of the variable is needed
b. The variable is continuous and additive in nature
c. The data are on the interval or ratio scale
d. The distribution is symmetrical
2. Median is used when:
a. The variable is discrete
b. There exist abnormal values
c. The distribution is skewed
d. The extreme values are missing
e. The characteristics studied are qualitative
f. The data are on the ordinal scale
3. Mode is used when:
a. The variable is discrete
b. There exist abnormal values
c. The distribution is skewed
d. The extreme values are missing
e. The characteristics studied are qualitative
4. Geometric mean is used when:
a. The rate of growth, ratios and percentages are to be studied
b. The variable is of multiplicative nature
5. Harmonic mean is used when:
a. The study is related to speed or time
b. The average of rates which produce equal effects has to be found
Positional Averages
Median is the mid-value of a series of data. It divides the distribution into two equal portions. Similarly, we can divide a given distribution into four, ten, a hundred or any other number of equal portions.
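The averages and positional values described above can be sketched with Python's standard `statistics` module. This is only an illustration: the sample values, growth factors and speeds are hypothetical and do not come from the text.

```python
import statistics as st

# Hypothetical sample, chosen only for illustration.
values = [2, 4, 4, 5, 7, 9, 12]

arithmetic_mean = st.mean(values)   # sum of values / number of values
median = st.median(values)          # middle value after sorting
mode = st.mode(values)              # value with the highest frequency

# Geometric mean suits multiplicative quantities such as growth factors.
growth_factors = [1.10, 1.20]       # assumed: 10% growth, then 20% growth
geometric_growth = st.geometric_mean(growth_factors)

# Harmonic mean suits rates, e.g. average speed over two equal distances.
speeds_kmph = [40, 60]              # assumed speeds
harmonic_speed = st.harmonic_mean(speeds_kmph)

# Positional averages: cut points dividing the sorted data into equal portions.
quartiles = st.quantiles(values, n=4)        # 3 cut points, 4 portions
deciles = st.quantiles(values, n=10)         # 9 cut points, 10 portions
percentiles = st.quantiles(values, n=100)    # 99 cut points, 100 portions

print("mean:", arithmetic_mean)
print("median:", median, "mode:", mode)
print("geometric mean of growth factors:", geometric_growth)
print("harmonic mean of speeds:", harmonic_speed)
print("quartiles:", quartiles)
```

The second quartile coincides with the median, which is why the median is treated as the simplest positional average.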
2) EXPLAIN THE PURPOSE OF TABULAR PRESENTATION OF STATISTICAL DATA. DRAFT A FORM OF TABULATION TO SHOW THE DISTRIBUTION OF POPULATION ACCORDING TO i) Community by age, ii) Literacy, iii) Sex, and iv) Marital Status?
Tabulation follows classification. It is a logical or systematic listing of related data in rows and columns. A row of a table represents the horizontal arrangement of data and a column represents the vertical arrangement of data. The presentation of data in tables should be simple, systematic and unambiguous. The following are the objectives and purpose of tabular presentation of statistical data.
a) It simplifies complex data
b) It highlights important characteristics
c) It presents data in minimum space
d) It facilitates comparison
e) It brings out trends and tendencies
f) It facilitates further analysis
The main parts of the table are discussed as hereunder:
i) TABLE NUMBER: The table number identifies the table for reference. When there are many tables in an analysis, table numbers are helpful in identifying the tables.
ii) TITLE: It indicates the scope and the nature of the contents in concise form. In other words, the title of a table gives information about the data contained in the body of the table. It should not be lengthy.
iii) CAPTIONS: Captions are the headings and subheadings describing the data present in the columns.
iv) STUBS: These are the headings and subheadings of rows.
v) BODY OF THE TABLE: It contains the numerical information.
vi) RULING AND SPACING: Ruling and spacing separate columns and rows. Totals are separated from the main body by thick lines.
vii) HEAD NOTE: It is given below the title of the table to indicate the units of measurement of the data and is enclosed in brackets.
viii) SOURCE NOTE: It indicates the source from which the data is taken. The source note related to the table is placed at the bottom, in the left-hand corner.
TABLE 1 – % OF DISTRIBUTION OF POPULATION
|Marital Status |Age/Sex |LITERATE |ILLITERATE |

The merits and demerits of Arithmetic Mean are as under:
|MERITS |DEMERITS |
|It is simple to calculate and easy to understand |It is affected by extreme values |
|It is based on all values |It cannot be determined for distributions with open-end class intervals |
|It is rigidly defined |It cannot be graphically located |
|It is more stable |Sometimes it is a value which is not in the series |
|It is capable of further algebraic treatment | |

Median of a set of values is the middle-most value when the values are arranged in ascending order of magnitude. The merits and demerits of Median are as under:
|MERITS |DEMERITS |
|It can be easily understood and computed |It is not based on all values |
|It is not affected by extreme values |It is not capable of further algebraic treatment |
|It can be determined graphically (Ogives) | |
|It can be used for qualitative data | |
|It can be calculated for distributions with open-end classes | |

Mode is the value which has the highest frequency. The merits and demerits of Mode are as under:
|MERITS |DEMERITS |
|In many cases it can be found by inspection |It is not based on all values |
|It is not affected by extreme values |It is not capable of further mathematical treatment |
|It can be calculated for distributions with open-end classes |It is much affected by sampling fluctuations |
|It can be located graphically | |
|It can be used for qualitative data | |

The best measure of central tendency is the arithmetic mean. It is defined as the value obtained by dividing the sum of all the observations by their number, that is, mean = (Sum of all the observations) / (Number of observations). Arithmetic mean is used because it is simple to understand and easy to interpret. It is quickly and easily calculated. It is amenable to mathematical treatment. It is relatively stable in repeated sampling experiments.
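Two of the points listed above, that the mean is affected by extreme values while the median is not, and that the mean is capable of further algebraic treatment, can be illustrated with a minimal sketch. The data and the helper `combined_mean` are hypothetical and introduced only for this illustration.

```python
import statistics as st

# Hypothetical observations; 150 is an assumed extreme value.
weights = [48, 50, 51, 52, 54]
with_extreme = weights + [150]

mean_before = st.mean(weights)
mean_after = st.mean(with_extreme)        # pulled upwards by the extreme value
median_before = st.median(weights)
median_after = st.median(with_extreme)    # barely moves

def combined_mean(mean_a, n_a, mean_b, n_b):
    """Mean of two groups taken together, from their means and sizes only."""
    return (n_a * mean_a + n_b * mean_b) / (n_a + n_b)

# Algebraic treatment: the combined mean needs no access to the raw values.
group_b = [60, 62]                        # second hypothetical group
pooled = combined_mean(mean_before, len(weights), st.mean(group_b), len(group_b))

print("mean:", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
print("combined mean:", pooled)
```

No comparable shortcut exists for the median or the mode, which is what the tables mean by "not capable of further algebraic treatment".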
4) Machines are used to pack sugar into packets supposedly containing 1.20 kg each. On testing a large number of packets over a long period of time, it was found that the mean weight of the packets was 1.24 kg and the standard deviation was 0.04 kg. A particular machine is selected to check the total weight of each of the 25 packets filled consecutively by the machine. Calculate the limits within which the weight of the packets should lie, assuming that the machine has not been classified as faulty.
Since the sample size is 25, which is less than 30, it is a case of a small sample. The t-distribution is used to calculate the confidence interval.
Given,
Sample size, $n = 25$
Standard deviation, $S = 0.04$
Degrees of freedom, $df = n - 1 = 25 - 1 = 24$
Mean weight, $\bar{x} = 1.24$
True mean weight $= \mu$
$\alpha = 5\% = 0.05$
$t_{\alpha/2} = t_{0.05/2} = t_{0.025} = 2.064$ at 95% confidence and $df = 24$
The limits are
$\bar{x} \pm t_{\alpha/2}\, S/\sqrt{n} = 1.24 \pm 2.064\,(0.04/\sqrt{25}) = 1.24 \pm 2.064\,(0.04/5) = 1.24 \pm 0.016512$
$\bar{x} - t_{\alpha/2}\, S/\sqrt{n} \le \mu \le \bar{x} + t_{\alpha/2}\, S/\sqrt{n}$
$1.24 - 0.016512 \le \mu \le 1.24 + 0.016512$
$1.223488 \le \mu \le 1.256512$
5) A packaging device is set to fill detergent powder packets with a mean weight of 5 kg.
The standard deviation is known to be 0.01 kg. The weights are known to drift upwards over a period of time due to machine fault, which is not tolerable. A random sample of 100 packets is taken and weighed. This sample has a mean weight of 5.03 kg and a standard deviation of 0.21 kg. Can we conclude that the mean weight produced by the machine has increased? Use 5% level of significance.
Since the sample size is 100, it is a case of a large sample, so the Z-test statistic will be used for hypothesis testing. Let the null hypothesis H0 be that the mean weight has not increased, and the alternative hypothesis H1 (also written HA) be that the mean weight has increased.
$H_0 : \mu = 5$
$H_1 : \mu > 5$ (right-tailed test)
Given,
Sample size, $n = 100$
Mean weight, $\bar{x} = 5.03$ kg
Standard deviation, $S = 0.21$ kg
Level of significance, $\alpha = 5\%$
$Z = (\bar{x} - \mu) / (S/\sqrt{n}) = (5.03 - 5) / (0.21/\sqrt{100}) = 0.03 / 0.021$
$Z_{\text{calculated}} = 1.428$
From the table, at the 5% level, $Z_{\text{critical}} = Z_{\alpha} = Z_{0.05} = 1.645$ (for a one-tailed test).
Since the calculated value $Z_{\text{calculated}} = 1.428$ is less than the critical value $Z_{\alpha} = 1.645$, H0 is accepted (not rejected). Hence we conclude that, at the 5% level of significance, the sample does not provide sufficient evidence that the mean weight produced by the machine has increased.
6) Find the probability that at most 5 defective bolts will be found in a box of 200 bolts if it is known that 2 per cent of such bolts are expected to be defective. (You may take the distribution to be Poisson; $e^{-4} = 0.0183$.)
Given,
Total number of bolts, $n = 200$
P(defective bolt) $= p = 2\% = 0.02$
Therefore, $m = np = 200 \times 0.02 = 4$
P(zero defective bolts):
$P(X = 0) = e^{-m} m^{0} / 0! = e^{-4}\, 4^{0} / 1 = (0.0183)(1) / 1 = 0.0183$
P(at most 5 defective bolts):
$P(X \le 5) = P(X=0) + P(X=1) + P(X=2) + P(X=3) + P(X=4) + P(X=5)$
$= e^{-m} m^{0}/0! + e^{-m} m^{1}/1! + e^{-m} m^{2}/2! + e^{-m} m^{3}/3! + e^{-m} m^{4}/4! + e^{-m} m^{5}/5!$
$= e^{-m} \left[ 1 + m^{1}/1! + m^{2}/2! + m^{3}/3! + m^{4}/4! + m^{5}/5! \right]$
$= e^{-4} \left[ 1 + 4/1 + 16/2 + 64/6 + 256/24 + 1024/120 \right]$
$= 0.0183 \left[ 1 + 4 + 8 + 10.67 + 10.67 + 8.53 \right]$
$= 0.0183 \times 42.87$
$= 0.784521$

STATISTICS FOR MANAGEMENT – SET (2)
1) WHAT DO YOU MEAN BY STATISTICAL SURVEY? DIFFERENTIATE BETWEEN “QUESTIONNAIRE” AND “SCHEDULE”?
A statistical survey is a scientific process of collection and analysis of numerical data.
Statistical surveys are used to collect numerical information about units in a population. Surveys involve asking questions of individuals. Surveys of human populations are common in government, health, social science and marketing sectors. The point-wise differences between a questionnaire and a schedule are as under:
|QUESTIONNAIRE |SCHEDULE |
|Questionnaires are filled with questions pertaining to the investigation. |Information can be collected through schedules filled by investigators through personal contact. |
|The questionnaire method of collection of data depends mainly on proper drafting of the questionnaire. |A schedule is suitable for an extensive area of investigation through the investigator’s personal contact. |
|The respondent fills in the questionnaire. |The information in the schedule is filled in by the investigator himself through surveying. |
|Different types of questions are used in the questionnaire, viz. contingency questions, matrix questions, closed-ended questions and open-ended questions. |No such types are used in schedules. |
|This method is used to cover large areas of investigation. |This method is suitable for an extensive area. |
|This method results in many non-response situations. |The problem of non-response is minimized. |
|This form can be understood only by literate persons. |Even illiterate persons can understand it through personal contact. |
|There is no direct contact between the investigator and the respondent. |The respondent and the investigator come into direct contact. |