Variable Types in Statistical Analysis for Data Science

Swarupa P
Jul 2, 2021
Photo by Carlos Muza on Unsplash

In statistics, when we analyze a dataset for a particular data science problem, the values are recorded in the form of variables.

A variable is a recorded piece of information or a characteristic of a person, object, or event. Variables are broadly classified into 2 types.

  1. Categorical Variables or Qualitative Variables
  2. Numerical Variables or Quantitative Variables

1. Categorical or Qualitative Variables: A categorical/qualitative variable is a variable whose values are labels rather than measured quantities. It describes data that fits into groups or categories. These variables are summarized with counts or percentages, such as the number or share of males and females in a class.

For example, we can group people by state. The variable is State, and its values include New York, Florida, and Washington.
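
As a small illustration of summarizing a categorical variable, here is a minimal sketch in plain Python. The list of states is made-up sample data, and the names `states`, `counts`, and `percentages` are only assumptions for this example.

```python
from collections import Counter

# Hypothetical sample: the state recorded for each of eight people.
states = ["New York", "Florida", "Washington", "Florida",
          "New York", "Florida", "Washington", "Florida"]

counts = Counter(states)  # how many people fall into each category
total = len(states)

# Share of each category as a percentage of all observations.
percentages = {state: 100 * n / total for state, n in counts.items()}

for state, pct in percentages.items():
    print(f"{state}: {counts[state]} people ({pct:.1f}%)")
```

Counting and taking shares is about all we can do with the labels themselves, since there is no arithmetic on "Florida".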

Categorical variables are divided into 2 types.

1.1 Nominal Variables and

1.2 Ordinal Variables.

1.1 Nominal Variables: Nominal variables are categorical variables whose values have no natural order. The values are only names or labels, so they cannot be ranked or measured on a scale. In Nominal variables, the data is simply assigned to a group/category.

For example, we can take the State column, since there is no meaningful way to rank or scale values like New York, Florida, and Washington.

1.2 Ordinal Variables: These variables are still categorical, but we can give an order/rank to their values. The gaps between ranks are not necessarily equal.

For example, we can give a specific order to an AgeGroup column (child, teen, adult, senior).
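
To see what "having an order" means in practice, here is a minimal sketch using a hypothetical ratings column. The levels `low`, `medium`, and `high` and the `RANK` mapping are assumptions made up for this example.

```python
# Hypothetical ordinal column: ratings with a natural order.
RANK = {"low": 0, "medium": 1, "high": 2}

ratings = ["medium", "high", "low", "high", "medium"]

# Sorting by rank respects the intended order of the categories.
by_rank = sorted(ratings, key=RANK.get)

# Plain alphabetical sorting ignores that order.
alphabetical = sorted(ratings)

# With an odd number of values, the middle one is the median category.
median_rating = by_rank[len(by_rank) // 2]

print(by_rank)
print(alphabetical)
print(median_rating)
```

A nominal variable has no such mapping, so sorting it or asking for its median category would not mean anything.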

2. Numerical Variables or Quantitative Variables: These variables are measured or counted, and the result is a number. For example, we can take the Income column as a numerical variable. These variables can be summarized using measures of central tendency (mean, median). They are divided into 2 types.
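
As a quick sketch of summarizing a numerical variable, the snippet below computes the mean and median of a made-up income column using Python's standard `statistics` module. The income values are hypothetical.

```python
from statistics import mean, median

# Hypothetical yearly incomes; the last one is an outlier.
incomes = [32000, 41000, 45000, 52000, 250000]

mean_income = mean(incomes)      # pulled upward by the outlier
median_income = median(incomes)  # the middle value, less sensitive to it

print(mean_income, median_income)
```

The two measures can disagree quite a lot on skewed data like income, which is one reason to look at both.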

2.1 Discrete variables and

2.2 Continuous variables

2.1 Discrete variables: These variables take only countable values, typically whole numbers. The values are not continuous, so values in between, such as decimals, are not possible.

For Example, the number of people in a class or the number of births in a day.

2.2 Continuous variables: These variables can take any value within a range, so the set of possible values is infinite. A continuous variable describes data that is measured rather than counted. If your data deals with measuring height, weight, or time, then you have a continuous variable.

Continuous variables can further be subdivided by scale of measurement into ratio scale variables and interval scale variables.
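
To make the contrast between discrete and continuous values concrete, here is a rough sketch. The data and the helper `looks_discrete` are hypothetical, and the check is only a heuristic: a continuous measurement that happens to be rounded to whole numbers would pass it too.

```python
# Hypothetical data: a count (discrete) and a measurement (continuous).
births_per_day = [3, 0, 5, 2]
heights_cm = [162.4, 175.0, 180.25]


def looks_discrete(values):
    """Heuristic only: True when every value is a whole number."""
    return all(float(v).is_integer() for v in values)


print(looks_discrete(births_per_day))
print(looks_discrete(heights_cm))
```

In practice, whether a variable is discrete or continuous comes from how it was collected (counted or measured), not just from how the numbers look.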

Ratio Variable: A ratio variable has all the properties of an interval variable and also has a true zero. When the variable equals 0.0, there is none of that quantity. This is what makes ratios meaningful: a weight of 10 kg is twice a weight of 5 kg.

For example, age, weight, and length.

Interval Variable: An interval variable is a measurement variable whose values are measured along a scale, with each point placed at an equal distance from the next.

In brief, an interval scale is one where there is order and the difference between two values is meaningful, but there is no true zero.

For example, test scores like the SAT are usually treated as interval data: the numbers serve as reference points on a scale rather than as amounts measured from zero.
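
The difference between the two scales shows up when we change units. The sketch below is an illustration under my own assumptions: it uses temperature in Celsius as an interval variable (an example that is not from the text above) and weight as a ratio variable, with made-up values.

```python
import math

# Interval scale: Celsius has an arbitrary zero point.
c1, c2 = 10.0, 20.0
k1, k2 = c1 + 273.15, c2 + 273.15  # the same temperatures in kelvin

diff_c = c2 - c1   # differences survive the change of scale
diff_k = k2 - k1
ratio_c = c2 / c1  # 2.0, but 20 degrees C is not "twice as hot"
ratio_k = k2 / k1  # about 1.035 on the kelvin scale

# Ratio scale: weight has a true zero, so ratios survive a unit change.
kg1, kg2 = 5.0, 10.0
g1, g2 = kg1 * 1000, kg2 * 1000
ratio_kg = kg2 / kg1
ratio_g = g2 / g1

print(math.isclose(diff_c, diff_k))     # differences agree
print(math.isclose(ratio_c, ratio_k))   # ratios do not
print(math.isclose(ratio_kg, ratio_g))  # ratios agree
```

So on an interval scale only differences can be compared, while on a ratio scale statements like "twice as much" also make sense.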

General Rule: If we can meaningfully apply arithmetic to the data, it is a Quantitative Variable. If we cannot, as with strings, where something like blue + green has no meaning, it is a Categorical Variable. Numbers are sometimes assigned to the values of a qualitative variable for data analysis, but the variable is still considered qualitative. For example, we can assign 1 for “cat” and 2 for “dog”.
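
Here is a minimal sketch of why those number codes stay qualitative. The `codes` mapping follows the cat/dog example above, and the list of pets is made-up sample data.

```python
import math
from statistics import mean, mode

# Codes assigned to a qualitative variable, as in the example above.
codes = {"cat": 1, "dog": 2}

# Hypothetical observations.
pets = ["cat", "dog", "dog", "cat", "dog"]
encoded = [codes[p] for p in pets]

# Python will happily average the codes, but 1.6 is not a kind of pet.
avg_code = mean(encoded)

# The most frequent category is a summary that does make sense.
most_common = mode(pets)

print(encoded, avg_code, most_common)
```

The codes are just labels written as digits, so summaries like counts, percentages, and the mode still apply, while the mean does not.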

I hope you liked this article! If you enjoyed reading it, it would mean a lot if you could give it some claps. Feel free to share it with your friends as well!

