Introduction to nominal data
A categorical variable assigns each observation in a data set to a qualitative category. Categorical variables are measured on one of two scales, ordinal or nominal. An ordinal scale has a natural ordering: its levels can be ranked, for example by quality or degree. Variables measured on such a scale are known as ordinal variables. In comparison, variables whose categories have no inherent order are nominal variables.
A nominal variable, or nominal group, is a collection of objects or ideas grouped by a shared qualitative characteristic. Nominal variables have no natural order, so a statistical analysis of them produces the same results regardless of the order in which the categories or the observations are presented.
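The order-invariance property can be illustrated with a minimal sketch. The eye-colour sample and the helper names `frequency_table` and `mode` are assumptions made for illustration, not part of any particular library; the point is only that counts and the mode do not change when the observations are shuffled.

```python
from collections import Counter
import random

# Hypothetical sample of a nominal variable (eye colour).
observations = ["brown", "blue", "green", "brown", "blue", "brown"]


def frequency_table(values):
    """Map each category to its count; a valid summary for nominal data."""
    return dict(Counter(values))


def mode(values):
    """Return the set of most frequent categories (ties are possible)."""
    counts = Counter(values)
    top = max(counts.values())
    return {category for category, n in counts.items() if n == top}


# Present the same observations in a different order.
shuffled = observations[:]
random.Random(0).shuffle(shuffled)

# Both summaries ignore presentation order.
same_counts = frequency_table(observations) == frequency_table(shuffled)
same_mode = mode(observations) == mode(shuffled)
print(same_counts, same_mode)
```

A median, by contrast, would require sorting the categories, which is exactly the operation a nominal scale does not support.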
Although statistical methods designed for ordinal variables cannot be applied to nominal variables, methods designed for nominal variables can be applied to both types of categorical data. Treating ordinal data as nominal, however, discards the ordering information, so any further analysis of the data set can yield only nominal-level results.
Valid operations on nominal data
Because each data point is identified only as a member or non-member of a nominal group, an individual data point carries no significance beyond its group identification. How the data are identified also determines whether new nominal groups need to be formed from the information available. Because nominal categories cannot be numerically organized or ranked, the members of a nominal group cannot be placed on an ordinal or ratio scale.
Nominal data is often analyzed alongside ordinal and ratio data to determine whether group membership influences a quantitative outcome. For example, the effect of race (nominal) on income (ratio) could be investigated by regressing income on one or more dummy variables that specify race. When nominal variables are used in these contexts, the valid operations are limited. Arithmetic operations and measures of central tendency that rely on magnitude or order, such as the mean and the median, cannot be applied to nominal categories. Valid operations include comparing frequencies and frequency distributions, determining the mode, creating pivot tables, applying chi-square goodness-of-fit and independence tests, coding and recoding, and fitting logistic or probit regressions.
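As a hedged illustration of two of the operations listed above, the sketch below builds a frequency distribution and computes the chi-square goodness-of-fit statistic, the sum over categories of (observed − expected)² / expected. The brand labels, the counts, and the function name `chi_square_goodness_of_fit` are hypothetical; the code computes only the test statistic, not a p-value.

```python
from collections import Counter


def chi_square_goodness_of_fit(values, expected_proportions):
    """Chi-square statistic comparing observed category counts with the
    counts expected under the given proportions (which should sum to 1)."""
    counts = Counter(values)
    n = len(values)
    statistic = 0.0
    for category, proportion in expected_proportions.items():
        expected = n * proportion
        observed = counts.get(category, 0)
        statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical survey: computer brand owned by 60 respondents.
brands = ["A"] * 30 + ["B"] * 20 + ["C"] * 10

# Frequency distribution: the basic summary of nominal data.
frequencies = Counter(brands)

# Null hypothesis (an assumption for this example): all brands equally common.
equal_shares = {"A": 1 / 3, "B": 1 / 3, "C": 1 / 3}
statistic = chi_square_goodness_of_fit(brands, equal_shares)
print(dict(frequencies), statistic)
```

With k categories, the statistic would ordinarily be compared against a chi-square distribution with k − 1 degrees of freedom (here 2) to judge whether the observed frequencies depart from the expected ones.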
Examples and logical analysis of nominal data
As the term ‘nominal’ suggests, a nominal group is defined by the name of the category its data fall under. For example, citizenship is a nominal group: a person either is a citizen of a given country or is not. One citizen of Canada does not have “more citizenship” than another citizen of Canada; therefore, citizenship cannot be ordered by any mathematical logic.
Another example of categorization by name is the group of "words that start with the letter 'a'". Thousands of words start with the letter 'a', but none has "more" of this nominal quality than another. What matters is whether a word belongs to the group, not any quantity by which its members could be ranked as they would be on an ordinal scale.
Correlating two nominal variables is difficult because some apparent relationships are spurious, meaning that two or more variables are incorrectly assumed to be related. A comparison across categories may also simply be uninformative. For example, determining whether proportionally more Canadians than non-Canadians have first names starting with the letter 'a' would be a fairly arbitrary exercise. By contrast, using a frequency distribution to relate gender and political affiliation is more informative, because the number of voters affiliated with each party can be compared across the male and female voters in the data set.
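The gender and party comparison can be sketched as a contingency table with a chi-square independence statistic. The voter counts, the party labels, and the function name `chi_square_independence` are invented for illustration; the expected count for each cell is taken as (row total × column total) / n, the standard value under independence.

```python
from collections import Counter


def chi_square_independence(pairs):
    """Chi-square statistic for independence of two nominal variables,
    given a list of (row_category, column_category) observations."""
    n = len(pairs)
    cell_counts = Counter(pairs)
    row_totals = Counter(row for row, _ in pairs)
    col_totals = Counter(col for _, col in pairs)
    statistic = 0.0
    for row, row_total in row_totals.items():
        for col, col_total in col_totals.items():
            expected = row_total * col_total / n
            observed = cell_counts.get((row, col), 0)
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical data set of 100 voters: (gender, party affiliation).
voters = (
    [("female", "Party X")] * 30
    + [("female", "Party Y")] * 20
    + [("male", "Party X")] * 20
    + [("male", "Party Y")] * 30
)

# The contingency table is the joint frequency distribution.
table = Counter(voters)
statistic = chi_square_independence(voters)
print(dict(table), statistic)
```

A large statistic indicates only that the two classifications are associated in the sample; as the paragraph above notes, it does not by itself rule out a spurious relationship.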
From a quantitative analysis perspective, one of the most common operations on nominal data is dummy variable assignment, the method mentioned earlier. For example, if a nominal variable has three categories (A, B, and C), two dummy variables are created (for A and B), and C becomes the reference category, the category that serves as the baseline for comparison. Each dummy variable is an indicator variable that takes the value 0 or 1 for every data point: one if the observation belongs to the corresponding category and zero if it does not. This numerical coding allows more flexibility in nominal data analysis, because it captures differences between distinct nominal groups as well as differences among observations within a data set, and it allows interactions between nominal variables and other variables to be examined systematically.
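The three-category example can be written out as a minimal sketch. The function name `dummy_code` and the `is_` column prefix are hypothetical choices made here; statistical libraries provide their own encoders with different interfaces.

```python
def dummy_code(values, reference):
    """Encode a nominal variable as 0/1 indicator columns, one per
    category except the reference category, which is coded as all zeros."""
    if reference not in values:
        raise ValueError("reference category does not occur in the data")
    categories = sorted(set(values) - {reference})
    rows = [
        {f"is_{category}": int(value == category) for category in categories}
        for value in values
    ]
    return categories, rows


# Nominal variable with categories A, B and C; C is the reference.
observations = ["A", "B", "C", "A"]
categories, encoded = dummy_code(observations, reference="C")

for value, row in zip(observations, encoded):
    print(value, row)
```

Omitting the reference category keeps the indicator columns from being perfectly collinear with an intercept term. In an ordinary least squares regression with an intercept, each dummy coefficient is then the difference between that category's mean outcome and the mean outcome of the reference category.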