“While the individual man is an insolvable puzzle, in the aggregate he becomes a mathematical certainty. You can, for example, never foretell what any one man will do, but you can say with precision what an average number will be up to.”
—Arthur Conan Doyle,
The Sign of Four
Sherlock Holmes spoke these words to his colleague Dr. Watson as the two were unravelling
a mystery. The detective was implying that if a single member is drawn at
random from a population, we cannot predict exactly what
that member will look like. However, there are some “average” features of the
entire population that an individual is likely to possess. The degree of
certainty with which we would expect to observe such average features in any
individual depends on our knowledge of the variation among individuals in the
population. Sherlock Holmes has led us to two of the most important statistical
concepts: average and variation.
Statistics surround us every day – we’re constantly bombarded with
them in the form of polls, tests, ratings and so on. Understanding those
statistics is important, but unfortunately most people have never
been taught what statistics really mean,
how they’re computed, or how to distinguish between statistics
used properly and statistics misused to deceive.
The most basic concept in statistics is the idea of an average. An average is a single number that
represents a typical value.
Three different numbers can serve as an average, and it’s important to know which one is being used and whether it
is appropriate. The three are the mean, the median, and the mode.
MEAN
The mean is what most people are taught as the average in middle school math. Given
a set of values, the mean is what you get by adding up all of the values and
dividing that sum by the number of values.
The mean is a very useful number – it summarizes the properties of the group. It’s
important to understand that the mean does not represent
an individual – in fact, there may be no individual whose value matches the
mean.
MEAN = sum of all data values / number of data values
or, more formally,
DEFINITION: Mean
The mean is the sum of a set of values, divided by the number of values in
the set. The notation for the mean of a set of values is a horizontal bar over
the variable used to represent the set. The formula for the mean of a data
set {x1, x2, . . . , xn} is
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{x_1 + x_2 + \cdots + x_n}{n}$$
How to calculate the mean?
QUESTION
What is the mean of the data set
{10; 20; 30; 40; 50}?
SOLUTION
Step 1 : Calculate the sum of the
data
10 + 20 + 30 + 40 + 50 = 150
Step 2 : Divide by the number of values in the data set to get the mean
Since there are 5 values in the data set, the mean is
Mean = 150 / 5 = 30
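The two steps above can be sketched in a few lines of Python. This is only an illustrative sketch; the function name `mean` is our own choice and does not come from the sources used for this post.

```python
def mean(values):
    """Return the arithmetic mean: the sum of the values divided by their count."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    # Step 1: add up the values. Step 2: divide by how many there are.
    return sum(values) / len(values)


data = [10, 20, 30, 40, 50]
result = mean(data)
print(result)  # 30.0
```

Note that the mean need not be one of the data values: `mean([1, 2])` gives 1.5, a value no individual in the set has.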
MEDIAN
DEFINITION: Median
The median of a data set is the value in the central
position, when the data
set has been arranged from the lowest to the highest
value.
Note that at most half of the values in the data
set are less than the median and at most half are greater than the median.
To calculate the median of a quantitative data set,
first sort the data from the smallest to the largest value and then find the
value in the middle. If there is an odd number of values, the median will be
equal to one of the values in the data set. If there is an even number of
values, the median will lie halfway between the two middle values in the data set.
Example 4: Median for an odd number of values
QUESTION
What is the median of {10; 14; 86; 2; 68; 99; 1}?
SOLUTION
Step 1 : Sort the values.
The values in the data set, arranged from the
smallest to the largest,
are 1; 2; 10; 14; 68; 86; 99.
Step 2 : Find the number in the middle
There are 7 values in the data set. Since there are
an odd number of
values, the median will be equal to the value in the
middle, namely,
in the 4th position. Therefore the median of the
data set is 14.
Example 5: Median for an even number of values
QUESTION
What is the median of {11; 10; 14; 86; 2; 68; 99;
1}?
SOLUTION
Step 1 : Sort the values
The values in the data set, arranged from the
smallest to the largest,
are
1; 2; 10; 11; 14; 68; 86; 99
Step 2 : Find the number in the middle
There are 8 values in the data set. Since there are
an even number
of values, the median will be halfway between the
two values in the
middle, namely, between the 4th and 5th positions.
The value in the
4th position is 11 and the value in the 5th position
is 14. The median
lies halfway between these two values and is
therefore
Median = (11 + 14) / 2 = 12,5
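Examples 4 and 5 follow the same recipe, which can be sketched in Python as below. The function name `median` is our own; this is a minimal sketch of the sort-and-pick-the-middle procedure, not a reference implementation.

```python
def median(values):
    """Return the middle value of the sorted data (or the midpoint of the two middle values)."""
    ordered = sorted(values)  # Step 1: sort from smallest to largest
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd number of values: the median is the single middle value.
        return ordered[mid]
    # Even number of values: halfway between the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_example = median([10, 14, 86, 2, 68, 99, 1])
even_example = median([11, 10, 14, 86, 2, 68, 99, 1])
print(odd_example)   # 14
print(even_example)  # 12.5
```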
MODE
DEFINITION: Mode
The mode of a data set is the value that occurs most
often in the set. The
mode can also be described as the most frequent or
most common value in
the data set.
To calculate the mode, we simply count the number of
times that each value appears in the data set and then find the value that
appears most often.
A data set can have more than one mode if there is
more than one value with the highest count. For example, both 2 and 3 are modes
in the data set {1; 2; 2; 3; 3}. If all points in a data set occur with equal
frequency, it is equally accurate to describe the data set as having many modes
or no mode.
Example 6: Finding the mode
QUESTION
Find the mode of the data set {2; 2; 3; 4; 4; 4; 6;
6; 7; 8; 8; 10; 10}.
SOLUTION
Step 1 : Count the number of times that each value
appears in the data set
Value | Count
2 | 2
3 | 1
4 | 3
6 | 2
7 | 1
8 | 2
10 | 2
Step 2 : Find the value that appears most often
From the table above we can see that 4 is the only
value that appears
3 times, and all the other values appear less often.
Therefore the
mode of the data set is 4.
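The counting procedure can be sketched in Python with `collections.Counter` from the standard library. The helper name `modes` is our own, and returning a list (so that several modes can be reported) is a design choice of this sketch.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs with the highest count, in ascending order."""
    counts = Counter(values)  # Step 1: count how often each value appears
    if not counts:
        return []
    highest = max(counts.values())
    # Step 2: keep the value(s) whose count equals the highest count
    return sorted(value for value, count in counts.items() if count == highest)


single = modes([2, 2, 3, 4, 4, 4, 6, 6, 7, 8, 8, 10, 10])
double = modes([1, 2, 2, 3, 3])
print(single)  # [4]
print(double)  # [2, 3]
```

When every value occurs equally often, this sketch returns all of them, which corresponds to the "many modes" reading mentioned earlier.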
One problem with using the mode as a measure of
central tendency is that we usually cannot compute a meaningful mode for a continuous
data set. Since continuous values can lie anywhere on the real line, any
particular value will almost never repeat. This means that the frequency of
each value in the data set will typically be 1 and that there will be no mode. We will
look at one way of addressing this problem in the section on grouping data.
Example: Comparison of measures of central tendency
in a real-life situation
QUESTION
There are regulations in South Africa related to
bread production to protect consumers. By law, if a loaf of bread is not
labelled, it must weigh 800 g, with a leeway of 5 per cent under or 10 per cent
over. Vishnu is interested in how a well-known national retailer measures up
to this standard. He visited his local branch of the supplier and recorded the
masses of 10 different loaves of bread each day for one week. The results, in grams, are
given below:
Monday | Tuesday | Wednesday | Thursday | Friday | Saturday | Sunday
802,4 | 787,8 | 815,7 | 807,4 | 801,5 | 786,6 | 799,0
796,8 | 798,9 | 809,7 | 798,7 | 818,3 | 789,1 | 806,0
802,5 | 793,6 | 785,4 | 809,3 | 787,7 | 801,5 | 799,4
819,6 | 812,6 | 809,1 | 791,1 | 805,3 | 817,8 | 801,0
801,2 | 795,9 | 795,2 | 820,4 | 806,6 | 819,5 | 796,7
789,0 | 796,3 | 787,9 | 799,8 | 789,5 | 802,1 | 802,2
789,0 | 797,7 | 776,7 | 790,7 | 803,2 | 801,2 | 807,3
808,8 | 780,4 | 812,6 | 801,8 | 784,7 | 792,2 | 809,8
802,4 | 790,8 | 792,4 | 789,2 | 815,6 | 799,4 | 791,2
796,2 | 817,6 | 799,1 | 826,0 | 807,9 | 806,7 | 780,2
1. Is this data set qualitative or quantitative?
Explain your answer.
2. Determine the mean, median and mode of the mass
of a loaf of bread for
each day of the week. Give your answer correct to 1
decimal place.
3. Based on the data, do you think that this
supplier is providing bread within
the South African regulations?
SOLUTION
Step 1 : Qualitative or quantitative?
Since each mass can be represented by a number, the
data set is
quantitative. Furthermore, since a mass can be any
real number, the
data are continuous.
Step 2 : Calculate the mean
In each column (for each day of the week), we add up
the measurements
and divide by the number of measurements, 10. For
Monday,
the sum of the measured values is 8007,9 and so the
mean for Monday
is
8007,9 / 10 = 800,79 ≈ 800,8 g
In the same way, we can compute the mean for each
day of the week.
See the table below for the results.
Step 3 : Calculate the median
In each column we sort the numbers from lowest to
highest and find the value in the middle. Since there is an even number of measurements
(10), the median is halfway between the two numbers in the middle. For Monday,
the sorted list of numbers is
789,0; 789,0; 796,2; 796,8; 801,2;
802,4; 802,4; 802,5; 808,8; 819,6
The two numbers in the middle are 801,2 and 802,4
and so the median is
(801,2 + 802,4) / 2 = 801,8 g
In the same way, we can compute the median for each
day of the week:
Day | Mean (g) | Median (g)
Monday | 800,8 | 801,8
Tuesday | 797,2 | 796,1
Wednesday | 798,4 | 797,2
Thursday | 803,4 | 800,8
Friday | 802,0 | 804,3
Saturday | 801,6 | 801,4
Sunday | 799,3 | 800,2
From the above calculations we can see that the
means and medians
are close to one another, but not quite equal. In
the next worked
example we will see that the mean and median are not
always close
to each other.
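The per-day means and medians can be recomputed directly from the masses listed in the question. The sketch below is one way to do it in Python; it uses `Decimal` with round-half-up so that values such as 804,25 round to one decimal place the way they would by hand. The names `RAW`, `round1` and `summary` are our own, and the masses are typed with decimal points because Python does not accept decimal commas.

```python
from decimal import Decimal, ROUND_HALF_UP
from statistics import median

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# One row per loaf, one column per day, copied from the question.
RAW = """
802.4 787.8 815.7 807.4 801.5 786.6 799.0
796.8 798.9 809.7 798.7 818.3 789.1 806.0
802.5 793.6 785.4 809.3 787.7 801.5 799.4
819.6 812.6 809.1 791.1 805.3 817.8 801.0
801.2 795.9 795.2 820.4 806.6 819.5 796.7
789.0 796.3 787.9 799.8 789.5 802.1 802.2
789.0 797.7 776.7 790.7 803.2 801.2 807.3
808.8 780.4 812.6 801.8 784.7 792.2 809.8
802.4 790.8 792.4 789.2 815.6 799.4 791.2
796.2 817.6 799.1 826.0 807.9 806.7 780.2
"""

rows = [[Decimal(cell) for cell in line.split()] for line in RAW.strip().splitlines()]
columns = list(zip(*rows))  # transpose: one tuple of 10 masses per day


def round1(value):
    """Round a Decimal to 1 decimal place, with halves rounded up."""
    return value.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


summary = {}
for day, masses in zip(DAYS, columns):
    day_mean = sum(masses) / len(masses)
    summary[day] = (round1(day_mean), round1(median(masses)))

for day, (day_mean, day_median) in summary.items():
    print(f"{day:<10} mean = {day_mean} g   median = {day_median} g")
```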
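Step 4 : Determine the mode
Since the data are continuous, the mode is not a
meaningful measure here: the few repeated values
(such as 789,0 g on Monday) are coincidences of
rounding to one decimal place. In the
next section we will see how we can group data in
order to make it
possible to compute an approximation for the mode.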
Step 5 : Conclusion: Is the supplier reliable?
From the question, the requirements are that the
mass of a loaf
of bread be between 800 g minus 5%, which is 760 g,
and plus 10%,
which is 880 g. Since every one of the measurements
made by Vishnu
lies within this range and since the means and
medians are all close
to 800 g, we can conclude that the supplier is
reliable.
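The range check in this step can also be sketched in Python. The sketch assumes the 800 g nominal mass and the 5% under / 10% over leeway stated in the question; the names `NOMINAL_G`, `lower`, `upper` and `out_of_range` are our own.

```python
NOMINAL_G = 800
lower = NOMINAL_G * 95 / 100   # 5% under the nominal mass: 760.0 g
upper = NOMINAL_G * 110 / 100  # 10% over the nominal mass: 880.0 g

# All 70 masses from the question, read row by row (decimal commas typed as points).
masses = [
    802.4, 787.8, 815.7, 807.4, 801.5, 786.6, 799.0,
    796.8, 798.9, 809.7, 798.7, 818.3, 789.1, 806.0,
    802.5, 793.6, 785.4, 809.3, 787.7, 801.5, 799.4,
    819.6, 812.6, 809.1, 791.1, 805.3, 817.8, 801.0,
    801.2, 795.9, 795.2, 820.4, 806.6, 819.5, 796.7,
    789.0, 796.3, 787.9, 799.8, 789.5, 802.1, 802.2,
    789.0, 797.7, 776.7, 790.7, 803.2, 801.2, 807.3,
    808.8, 780.4, 812.6, 801.8, 784.7, 792.2, 809.8,
    802.4, 790.8, 792.4, 789.2, 815.6, 799.4, 791.2,
    796.2, 817.6, 799.1, 826.0, 807.9, 806.7, 780.2,
]

# Collect any loaf whose mass falls outside the permitted range.
out_of_range = [m for m in masses if not lower <= m <= upper]

print(f"allowed range: {lower} g to {upper} g")
print(f"lightest loaf: {min(masses)} g, heaviest loaf: {max(masses)} g")
print(f"loaves outside the range: {len(out_of_range)}")
```

The lightest loaf (776,7 g) and the heaviest (826,0 g) both fall inside the permitted range, which is what the conclusion relies on.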