Statistics is the branch of mathematics that deals with data handling: the collection, interpretation, analysis, manipulation and presentation of data. Let’s begin by understanding the concept of data.
Data is a collection of facts or information. The data initially collected about the object of a study is called raw data or ungrouped data. To condense raw data, we group the values into classes of suitable size and record the frequency (the number of observations) in each class. Data arranged this way is called grouped data.
So, before moving on to the types of statistics, we need to start from the basics.
Fundamental Terms Associated with Statistics
There are a few key terms that we should be aware of:
- Observations: The things we collect data about are called observations. An observation can be a person, a business, a movie, or anything else we are interested in.
- Variables: Observations are described by variables, which record the measurements we are interested in. For each observation, we record a value for each variable.
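The first column holds the observation (a best friend); the remaining columns are the variables recorded for it.

| Observation (best friend) | Favorite chocolate | Favorite colour | Favorite movie | Favorite sport |
|---|---|---|---|---|
| Arya | Frrrozen Haute | Grey | Bahubali | Basketball |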
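This observation/variable layout maps naturally onto a record type in code. The Python sketch below stores each observation as a dataclass instance whose fields are the variables. The class name `BestFriend` and its field names are hypothetical and were chosen only to mirror the table.

```python
from dataclasses import dataclass


@dataclass
class BestFriend:
    """One observation: `name` identifies it, the other fields are its variables."""
    name: str
    favorite_chocolate: str
    favorite_colour: str
    favorite_movie: str
    favorite_sport: str


# A data set is a collection of observations.
observations = [
    BestFriend("Arya", "Frrrozen Haute", "Grey", "Bahubali", "Basketball"),
]

# Reading one variable across all observations gives a column of values.
favorite_sports = [friend.favorite_sport for friend in observations]
print(favorite_sports)  # ['Basketball']
```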
Now that we have a basic understanding of data, let’s move on to statistics.
Kinds of Statistics
There are many statistical methods that can be broadly categorised as:
- Descriptive
- Inferential
- Analytical
- Applied
Descriptive Statistics:
The purpose of descriptive statistics is to summarise the data and bring out its salient features, which makes patterns in the data easier to see. The data is organised and summarised using numbers and graphs such as bar graphs, pie charts, tables and histograms. Numerical summaries include measures of centre (the mean, median and mode) and measures of shape (skewness and kurtosis). Descriptive statistics only describe the objects or the data set under a specific study. The results cannot be generalised to any other population or group.
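Most of these numerical summaries can be computed with Python's standard `statistics` module. That module has no skewness or kurtosis function. The sketch below therefore computes them from central moments, using the population (moment-based) definitions. Other software may apply small-sample corrections and report slightly different values. The data set and the helper names `central_moment` and `describe` are hypothetical.

```python
import statistics


def central_moment(data, k):
    """k-th central moment: the average of (x - mean) ** k."""
    m = statistics.fmean(data)
    return sum((x - m) ** k for x in data) / len(data)


def describe(data):
    """Summarise a list of numbers with measures of centre and shape."""
    m2 = central_moment(data, 2)
    return {
        "mean": statistics.fmean(data),
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        # Moment-based skewness: m3 / m2^(3/2); positive means a longer right tail.
        "skewness": central_moment(data, 3) / m2 ** 1.5,
        # Excess kurtosis: m4 / m2^2 - 3, which is 0 for a normal distribution.
        "excess_kurtosis": central_moment(data, 4) / m2 ** 2 - 3,
    }


summary = describe([2, 4, 4, 4, 5, 5, 7, 9])
print(summary)
```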
There are two main types of descriptive measures:
- Measures of central tendency, such as the mean, median and mode, capture the typical or central value of the data.
- Measures of spread describe how the values are dispersed, that is, how far they lie from one another and from the centre. They include:
- The range
- The frequency distribution
- Mean absolute deviation
- Variance
- Standard deviation
Measures of spread help in understanding the trends within the data and are often visually represented in tables, histograms & pie and bar charts.
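The measures of spread listed above can be sketched in a few lines of Python with the standard library. The data set is hypothetical. `pvariance` and `pstdev` give the population versions, which divide by n. The sample versions `statistics.variance` and `statistics.stdev` divide by n − 1 instead.

```python
from collections import Counter
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]
mean = statistics.fmean(data)

data_range = max(data) - min(data)                   # largest value minus smallest value
frequencies = Counter(data)                          # value -> how often it occurs
mad = sum(abs(x - mean) for x in data) / len(data)   # mean absolute deviation about the mean
variance = statistics.pvariance(data)                # population variance
std_dev = statistics.pstdev(data)                    # population standard deviation

print("range:", data_range)
print("frequency distribution:", dict(sorted(frequencies.items())))
print("mean absolute deviation:", mad)
print("variance:", variance, "standard deviation:", std_dev)
```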
Inferential Statistics:
Inferential statistics uses mathematical methods to draw inferences about the characteristics of a larger population from a sample taken from it. It is usually impossible to examine every member of the population individually. Instead, we select a smaller group that represents the larger population; this subset is called a statistical sample. Analysing the sample lets us say something about the larger population it came from. Because only part of the population is examined, these conclusions carry some margin of error.
There are two main parts of inferential statistics:
- A confidence interval: A range of values, computed from the statistical sample, that is expected to contain the true population value. The investigator chooses the confidence level of the interval (for example, 90% or 95%).
- Hypothesis testing: Tests of significance or hypothesis testing are used to make claims about the larger population by analysing a statistical sample. There is some uncertainty in this process. This is expressed in terms of a level of significance.
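To make hypothesis testing concrete, the sketch below runs a two-sided one-proportion z-test in Python using only the standard library. The numbers are hypothetical: 60 of 100 sampled people prefer some option, and the null hypothesis is that exactly half of the population does. The function name `one_proportion_z_test` is our own. The test relies on the normal approximation, which is reasonable only when n·p0 and n·(1 − p0) are not too small.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0):
    """Two-sided z-test of H0: population proportion == p0 (normal approximation)."""
    p_hat = successes / n                    # sample proportion
    se = sqrt(p0 * (1 - p0) / n)             # standard error assuming H0 is true
    z = (p_hat - p0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


z, p = one_proportion_z_test(successes=60, n=100, p0=0.5)
alpha = 0.05                                  # chosen level of significance
print(f"z = {z:.2f}, p-value = {p:.4f}, reject H0: {p < alpha}")
# z = 2.00, p-value = 0.0455, reject H0: True
```

Here the p-value is below the chosen 5% level of significance, so the null hypothesis would be rejected at that level.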
When conducting research using inferential statistics, a significance test is performed to determine whether the results can be generalised to a larger population. For example, suppose a car company wants to open a branch in India and wants to know which type of car is most popular in the country. It is practically impossible to ask every adult in the country individually. Instead, the company surveys groups of people (a sample) drawn from different parts of the country.
Now suppose that 60% of the sample preferred electric cars. Based on this data, at a confidence level of, say, 90%, the company might infer that 60% of the adult population of India prefers electric cars, with a margin of error of ±4%. Put simply, it can say: “We are 90% confident that between 56% and 64% of adults in India prefer electric cars.” For a fixed confidence level, a larger sample gives a smaller margin of error, because the margin shrinks in proportion to 1/√n. For a fixed sample size, a higher confidence level gives a wider margin of error.
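The survey example does not state a sample size, so the Python sketch below works backwards. It asks how many respondents would give a ±4% margin at 90% confidence when the sample proportion is 60%. It uses the normal-approximation (Wald) interval for a proportion. That is an assumed choice; other interval methods exist and give somewhat different numbers. The function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def z_value(confidence):
    """Two-sided critical value, e.g. about 1.645 for 90% confidence."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def margin_of_error(p_hat, n, confidence):
    """Half-width of the Wald confidence interval for a proportion."""
    return z_value(confidence) * sqrt(p_hat * (1 - p_hat) / n)


def required_sample_size(p_hat, margin, confidence):
    """Smallest n whose margin of error is at most `margin`."""
    z = z_value(confidence)
    return ceil(z ** 2 * p_hat * (1 - p_hat) / margin ** 2)


print(required_sample_size(0.60, 0.04, 0.90))
for size in (100, 400, 1600):
    # Each fourfold increase in n halves the margin of error.
    print(size, round(margin_of_error(0.60, size, 0.90), 3))
```

With these inputs, the required sample size comes out at 406.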
Analytical Statistics:
It consists of all the methods that help analyse and compare any two or more variables. This includes the methods of correlation, regression analysis, and the like.
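As an illustration of correlation and regression, the Python sketch below computes the Pearson correlation coefficient and the ordinary least-squares line from their textbook definitions. The data (hours studied against test marks) and the function names are hypothetical. Python 3.10 and later also provide `statistics.correlation` and `statistics.linear_regression` for the same purpose.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def least_squares_line(xs, ys):
    """Intercept a and slope b of the least-squares line y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b


hours = [1, 2, 3, 4, 5]          # hypothetical hours studied
marks = [52, 55, 61, 64, 68]     # hypothetical test marks
print(pearson_r(hours, marks), least_squares_line(hours, marks))
```

A value of r close to +1 indicates a strong positive linear relationship. The fitted line can then be used to predict marks from hours.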
Applied Statistics:
It consists of methods applied to real-life problems. This includes statistical quality control, linear programming, sample surveys, inventory control, and the like.
Some Important Terms and Formulae
Class limit:
Suppose the ages of teachers are grouped into the class intervals 25 – 35, 35 – 45, and so on.
In the class interval 35 – 45, 35 is the lower class limit and 45 is the upper class limit.
Class size:
The difference between the upper class limit and the lower class limit.
Class mark:
It is the mid-value of each class interval.
$$\text{class mark} = \frac{\text{lower limit} + \text{upper limit}}{2}$$
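The class size and class mark definitions are simple arithmetic, sketched below in Python for the teacher-age intervals. The interval 45 – 55 is an assumed continuation of the “and so on”, and the helper names are hypothetical.

```python
def class_size(lower, upper):
    """Upper class limit minus lower class limit."""
    return upper - lower


def class_mark(lower, upper):
    """Mid-value of the class interval."""
    return (lower + upper) / 2


intervals = [(25, 35), (35, 45), (45, 55)]   # 45-55 is an assumed extra interval
for lower, upper in intervals:
    print(f"{lower} - {upper}: size {class_size(lower, upper)}, mark {class_mark(lower, upper)}")
```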
Mean
For Ungrouped Data:
For n observations $x_1, x_2, x_3, \ldots, x_n$ in ungrouped data, the mean is:
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$$
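For Grouped Data:
Direct method:
$$\bar{x} = \frac{\sum f_i x_i}{\sum f_i}$$
where $f_i$ = frequency corresponding to the class mark $x_i$.
Assumed mean method:
$$\bar{x} = A + \frac{\sum f_i d_i}{\sum f_i}$$
where $A$ = assumed mean and $d_i = x_i - A$.
Step deviation method:
$$\bar{x} = A + \left(\frac{\sum f_i u_i}{\sum f_i}\right) \times h$$
where $u_i = \dfrac{x_i - A}{h}$ and $h$ = class size.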
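The three grouped-data methods for the mean should always agree, because the assumed-mean and step-deviation methods are only shortcuts for the arithmetic. The Python sketch below checks this on a hypothetical frequency table: teacher ages in the classes 25 – 35, 35 – 45, 45 – 55 and 55 – 65, with frequencies 3, 8, 6 and 3. The function names are our own.

```python
def grouped_mean_direct(marks, freqs):
    """Mean = sum(f_i * x_i) / sum(f_i)."""
    return sum(f * x for x, f in zip(marks, freqs)) / sum(freqs)


def grouped_mean_assumed(marks, freqs, a):
    """Mean = A + sum(f_i * d_i) / sum(f_i), with d_i = x_i - A."""
    d = [x - a for x in marks]
    return a + sum(f * di for di, f in zip(d, freqs)) / sum(freqs)


def grouped_mean_step(marks, freqs, a, h):
    """Mean = A + h * sum(f_i * u_i) / sum(f_i), with u_i = (x_i - A) / h."""
    u = [(x - a) / h for x in marks]
    return a + h * sum(f * ui for ui, f in zip(u, freqs)) / sum(freqs)


marks = [30, 40, 50, 60]   # class marks of 25-35, 35-45, 45-55, 55-65
freqs = [3, 8, 6, 3]       # hypothetical frequencies
print(grouped_mean_direct(marks, freqs),
      grouped_mean_assumed(marks, freqs, a=50),
      grouped_mean_step(marks, freqs, a=50, h=10))
```

Choosing $A$ near the middle of the class marks keeps the deviations small. That is the point of the assumed-mean and step-deviation methods when the calculation is done by hand.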
Mode
For Ungrouped Data:
The mode is an observation with maximum frequency.
For Grouped Data:
$$\text{Mode} = l + \left(\frac{f_1 - f_0}{2f_1 - f_0 - f_2}\right) \times h$$
where
l = lower limit of the modal class.
f0 = frequency of the class preceding the modal class.
f1 = frequency of the modal class.
f2 = frequency of the class succeeding the modal class.
h = size of the class interval.
The class with the highest frequency is called the modal class.
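Both definitions of the mode are sketched below in Python. In `grouped_mode`, a modal class at either end of the table has no preceding or succeeding class. The sketch assumes a frequency of 0 for the missing neighbour, which is a common convention. The data and function names are hypothetical. The grouped example uses classes starting at 25, 35, 45 and 55 with frequencies 3, 8, 6 and 3.

```python
from collections import Counter


def ungrouped_mode(data):
    """All values that share the highest frequency."""
    counts = Counter(data)
    top = max(counts.values())
    return [value for value, c in counts.items() if c == top]


def grouped_mode(lowers, freqs, h):
    """Mode = l + (f1 - f0) / (2*f1 - f0 - f2) * h for the modal class."""
    i = max(range(len(freqs)), key=freqs.__getitem__)   # modal class (first one if tied)
    f1 = freqs[i]
    f0 = freqs[i - 1] if i > 0 else 0                    # assumed 0 if no preceding class
    f2 = freqs[i + 1] if i + 1 < len(freqs) else 0       # assumed 0 if no succeeding class
    return lowers[i] + (f1 - f0) / (2 * f1 - f0 - f2) * h


print(ungrouped_mode([2, 4, 4, 4, 5, 5, 7, 9]))
print(grouped_mode([25, 35, 45, 55], [3, 8, 6, 3], h=10))
```

Here the modal class is 35 – 45, so the mode is $35 + \frac{8 - 3}{16 - 3 - 6} \times 10 \approx 42.14$.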
Median
The median is the middle observation when the data are arranged in order.
For Grouped Data:
$$\text{Median} = l + \left(\frac{\frac{n}{2} - cf}{f}\right) \times h$$
where
l = lower limit of the median class.
n = number of observations.
cf = cumulative frequency of the class preceding the median class.
f = frequency of the median class.
h = class size.
The class interval in which the median lies is called the median class.
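A Python sketch of the grouped median formula follows. It takes the median class to be the first class whose cumulative frequency reaches n/2. The frequency table (classes starting at 25, 35, 45 and 55, frequencies 3, 8, 6 and 3) and the function name are hypothetical.

```python
from itertools import accumulate


def grouped_median(lowers, freqs, h):
    """Median = l + (n/2 - cf) / f * h, using the class in which n/2 is reached."""
    n = sum(freqs)
    cum = list(accumulate(freqs))                          # cumulative frequencies
    i = next(k for k, c in enumerate(cum) if c >= n / 2)   # index of the median class
    cf = cum[i - 1] if i > 0 else 0                        # cumulative frequency before it
    return lowers[i] + (n / 2 - cf) / freqs[i] * h


print(grouped_median([25, 35, 45, 55], [3, 8, 6, 3], h=10))
```

With n = 20, the cumulative frequencies are 3, 11, 17 and 20, so the median class is 35 – 45. The median is $35 + \frac{10 - 3}{8} \times 10 = 43.75$.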
For Ungrouped Data:
Arrange the data in ascending order.
If the number of observations $n$ is odd, the median is the $\left(\frac{n+1}{2}\right)$th observation.
If $n$ is even, the median is the average of the $\left(\frac{n}{2}\right)$th and $\left(\frac{n}{2}+1\right)$th observations.
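The odd and even rules for ungrouped data translate directly into Python. The only adjustment is subtracting 1 from each position, because Python lists are indexed from 0. The function name is hypothetical; the standard library's `statistics.median` implements the same rule.

```python
def ungrouped_median(data):
    """Median of ungrouped data using the (n+1)/2 and n/2, n/2 + 1 rules."""
    values = sorted(data)                       # step 1: arrange in ascending order
    n = len(values)
    if n % 2 == 1:
        return values[(n + 1) // 2 - 1]         # ((n+1)/2)th observation, 0-based index
    # average of the (n/2)th and (n/2 + 1)th observations
    return (values[n // 2 - 1] + values[n // 2]) / 2


print(ungrouped_median([7, 1, 3]), ungrouped_median([2, 4, 4, 4, 5, 5, 7, 9]))
```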
For a moderately skewed distribution, the mean, mode and median are approximately related by the empirical formula:
$$3\,\text{Median} = \text{Mode} + 2\,\text{Mean}$$
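As a quick check, take a hypothetical grouped distribution with class intervals 25 – 35, 35 – 45, 45 – 55 and 55 – 65 and frequencies 3, 8, 6 and 3. The grouped-data formulas give mean = 44.5, median = 43.75 and mode ≈ 42.14, so:

$$3 \times 43.75 = 131.25 \qquad \text{and} \qquad 42.14 + 2 \times 44.5 = 131.14$$

The two sides agree closely but not exactly, which is why the relation is used to estimate one measure from the other two. For example, $\text{Mode} \approx 3(43.75) - 2(44.5) = 42.25$.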
Conclusion
With these concepts, including the mean, median and mode and the different kinds of statistics, you can become well-versed in using statistics to analyse data sets.