Categorical data (or Qualitative data)
Categorical data is grouped by defining characteristics, such as gender, social class, ethnicity, hometown, the industry you work in, or a variety of other labels. This data type is non-numerical, meaning you cannot add the values together or average them, and many categories (the nominal ones described below) have no natural order. Categorical data is well suited to grouping individuals or ideas that share similar attributes, which can simplify the analysis a machine learning model has to perform.
Categorical data can be further classified as nominal or ordinal.
Nominal data:
The nominal level represents categories that cannot be put in any order. Each value is only a name or label for a category; it describes a quality, not a quantity. With a nominal scale we can tell whether two individuals differ on the variable, but not the direction or size of that difference. The values have no specific order and can be listed in any sequence. Examples of such data are:
{Male, Female}
{North Zone, South Zone, East Zone, West Zone}
{Maruti, Tata Motors, Mahindra, Toyota, Ford}
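As a minimal sketch of what "no order" means in practice, the Python snippet below treats the zones as plain labels. The variable names and the two customers are illustrative assumptions. Equality checks are meaningful, and listing the categories in a different order changes nothing.

```python
# Nominal categories are just labels; the names below are illustrative.
zones_a = {"North Zone", "South Zone", "East Zone", "West Zone"}
zones_b = {"West Zone", "East Zone", "South Zone", "North Zone"}

# Writing the categories in a different order gives the same set.
same_categories = zones_a == zones_b

# The only meaningful comparison between two nominal values is
# "same category or not"; there is no "greater" or "smaller" zone.
customer_1 = "North Zone"
customer_2 = "South Zone"
are_different = customer_1 != customer_2

print(same_categories)  # True
print(are_different)    # True
```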
Ordinal data:
Ordinal data is also categorical, but its values follow a meaningful order. With this scale we can determine the direction of the difference between two values, but not its size. For example, Very Good ranks higher than Good, yet the data remains categorical because we do not know how much better Very Good is than Good.
Examples:
Course Rating:
Poor
Fair
Good
Very Good
Excellent
Student Grade:
E
D
C
B
A
Preference:
First Choice
Second Choice
Third Choice
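The course rating scale above can be sketched in Python to show why the direction of a difference is meaningful while its size is not. The two numeric encodings below are arbitrary, hypothetical choices made only for illustration; any encoding that respects the order would serve equally well.

```python
# Course ratings listed from lowest to highest.
RATING_ORDER = ["Poor", "Fair", "Good", "Very Good", "Excellent"]

# Two arbitrary, hypothetical encodings that both respect the order.
encoding_a = {"Poor": 1, "Fair": 2, "Good": 3, "Very Good": 4, "Excellent": 5}
encoding_b = {"Poor": 0, "Fair": 10, "Good": 11, "Very Good": 50, "Excellent": 51}

# Sorting by either encoding reproduces the same ranking,
# so the direction of any difference is preserved.
order_a = sorted(RATING_ORDER, key=encoding_a.get)
order_b = sorted(RATING_ORDER, key=encoding_b.get)
print(order_a == order_b == RATING_ORDER)  # True

# The size of the gap depends entirely on the encoding chosen,
# so it carries no information about the ratings themselves.
gap_a = encoding_a["Very Good"] - encoding_a["Good"]
gap_b = encoding_b["Very Good"] - encoding_b["Good"]
print(gap_a, gap_b)  # 1 39
```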
We know the order of the values in ordinal data, for example:
Excellent > Good
That is, the order is maintained no matter what values we assign to the categories.
Categorical data can be summarized in a table that lists the individual categories and their respective frequency counts, known as a frequency distribution.
One can also use a relative frequency distribution, which lists each category and the proportion with which it occurs.
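A frequency distribution and a relative frequency distribution can be computed with Python's standard `collections.Counter`. This is a minimal sketch; the survey responses are made-up data used only for illustration.

```python
from collections import Counter

# Hypothetical survey responses (made-up data for illustration).
responses = ["Good", "Excellent", "Good", "Fair",
             "Good", "Excellent", "Poor", "Good"]

# Frequency distribution: category -> number of occurrences.
frequency = Counter(responses)

# Relative frequency distribution: category -> proportion of the total.
total = sum(frequency.values())
relative_frequency = {cat: count / total for cat, count in frequency.items()}

# Print one row per category, most frequent first.
for category, count in frequency.most_common():
    print(f"{category:<10} {count:>3} {relative_frequency[category]:.3f}")
```

The proportions always sum to 1, which makes relative frequencies convenient for comparing samples of different sizes.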
A frequency distribution is commonly visualized as a bar chart, and a relative frequency distribution as a pie chart.
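As a rough, text-only stand-in for such charts, the sketch below prints one bar per category, with the bar length showing the frequency and the percentage showing the share a pie slice would represent. The list of brands is made-up data; a real chart would normally be drawn with a plotting library instead.

```python
from collections import Counter

# Hypothetical vehicle brands (made-up data for illustration).
brands = ["Tata Motors", "Maruti", "Maruti", "Toyota", "Maruti", "Tata Motors"]

counts = Counter(brands)
total = sum(counts.values())

# One row per category: bar length is the frequency,
# the percentage is the relative frequency.
lines = []
for brand, count in counts.most_common():
    share = 100 * count / total
    lines.append(f"{brand:<12} {'#' * count} {share:.0f}%")

print("\n".join(lines))
```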