Descriptive Statistics: On the Art of Exploring and Other Issues, Part III

In the last two posts we talked about the types of variables (quantitative vs. qualitative) and the levels of measurement (nominal, ordinal, interval and ratio), and we left you a practical exercise. Today we present the solution and start talking about the most commonly used descriptive measures for qualitative variables.

| Name | Type of variable | Level of measurement |
| --- | --- | --- |
| Supplier's name | Qualitative | Nominal |
| Corporate name | Qualitative | Nominal |
| Company's NIT | Qualitative | Nominal |
| Date on which the business relationship began | Quantitative | Interval |
| Mobile phone number | Qualitative | Nominal |
| Location address | Qualitative | Nominal |
| Means of payment | Qualitative | Nominal |
| Supplier status | Qualitative | Nominal |
| Number of purchase transactions | Quantitative | Ratio |
| Amount of each transaction (pesos) | Quantitative | Ratio |
| Amount paid in each transaction (pesos) | Quantitative | Ratio |
| Amount owed in each transaction (pesos) | Quantitative | Ratio |
| Invoice number | Qualitative | Ordinal |
| Discount (percentage) | Quantitative | Ratio |
| VAT (pesos) | Quantitative | Ratio |
| Rating of the supplier's service | Qualitative | Ordinal |

Descriptive Measures for Qualitative Variables

To show you the kinds of tables and graphs we use with qualitative variables, I am going to rely on an example that uses a database with information on credit card customers. These data were used in Yeh, I.C., & Lien, C.H. (2009). The comparisons of data mining techniques for the predictive accuracy of the probability of default of credit card clients. Expert Systems with Applications, 36(2), 2473-2480. The data set has 30,000 observations and 25 variables:

  1. Credit value (dollars): the amount of credit granted; it includes the individual credit and the credit given to the family.
  2. Gender: (1 = male; 2 = female).
  3. Educational level (1 = primary, 2 = high school, 3 = university, 4 = none).
  4. Marital status: (1 = married, 2 = single, 3 = other).
  5. Age: years
  6. Variables 6 to 11 correspond to the history of past payments, as follows: X6 = the repayment status in September, 2005; X7 = the repayment status in August, 2005; . . .; X11 = the repayment status in April, 2005. The measurement scale for the repayment status is: -1 = pay duly; 1 = payment delay for one month; 2 = payment delay for two months; . . .; 8 = payment delay for eight months; 9 = payment delay for nine months and above.
  7. Variables 12 to 17 correspond to the amount of the bill statement (dollars): X12 = amount of bill statement in September, 2005; X13 = amount of bill statement in August, 2005; . . .; X17 = amount of bill statement in April, 2005.
  8. Variables 18 to 23 correspond to the amount of the previous payment (dollars): X18 = amount paid in September, 2005; X19 = amount paid in August, 2005; . . .; X23 = amount paid in April, 2005.

Classify the variables above by the types of variables and the levels of measurement we have seen; the answer is in the next paragraph.

We will work with the variables educational level (ordinal), marital status (nominal), payment status of the invoice (ordinal) and gender (nominal), since today's topic is qualitative variables. The descriptive statistics we most frequently use with this type of variable are frequency tables. These show the categories of the variable of interest and how many observations fall in each category, which is what we call the absolute frequency. We can also calculate the relative frequency, which indicates the proportion of the total number of observations that corresponds to each category. If we look at the gender frequency table, we find that 60% of the credit card customers are women and the rest (40%) are men.
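To make the difference between absolute and relative frequency concrete, here is a minimal sketch in Python of the frequency table described above. The ten gender codes are a hypothetical toy sample invented for illustration (they are not rows from the credit card data set), and `frequency_table` is just an illustrative helper name.

```python
from collections import Counter

# Hypothetical toy sample of gender codes (1 = male, 2 = female).
# These ten values are invented for illustration, not taken from the data set.
gender_codes = [2, 1, 2, 2, 1, 2, 1, 2, 2, 1]
labels = {1: "male", 2: "female"}


def frequency_table(values):
    """Return {category: (absolute frequency, relative frequency)}."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: (n, n / total) for category, n in counts.items()}


table = frequency_table(gender_codes)

# Print one row per category: label, absolute frequency, relative frequency.
for code, (absolute, relative) in sorted(table.items()):
    print(f"{labels[code]:<8}{absolute:>4}{relative:>8.0%}")
```

The absolute frequency is a plain count per category; the relative frequency divides that count by the total number of observations, so the relative frequencies always add up to 1 (or 100%).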

The results of the frequency table for marital status can be interpreted as follows: “We found that 5 out of 10 clients are single, 4 out of 10 clients are married, and 1 out of 10 chose another category”. Why does the “did not respond” category not appear? Because I am expressing the proportions on a scale of 10, and its share of about 0.02 is imperceptible.
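The arithmetic behind the vanishing category can be checked in a couple of lines. This sketch assumes the figures quoted in this post: 54 clients did not report their marital status, out of 30,000 observations. The helper name `out_of_ten` is only illustrative.

```python
def out_of_ten(count, total):
    """Express a category's share of the total on a scale of 10."""
    return 10 * count / total


# Figures quoted in the post: 54 non-respondents among 30,000 clients.
share = out_of_ten(54, 30_000)

print(round(share, 2))  # 0.02
print(round(share))     # 0
```

Rounded to whole clients out of 10, the category becomes 0, which is why it drops out of the "out of 10" reading.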

Something more interesting may be to build a cross-tab that lets us see how the gender variable relates to the marital status variable. We are going to present the cross-tab in three ways to show you the difference. The first table contains only the frequencies in each category. For example, 14 men did not respond to marital status, and 9,411 women are single. The second table shows the percentage of men and women in each response category of the marital status variable, which is why the total of each row is 100%. For example, we can say that 26% of the people who did not respond to the marital status variable (54) are men and the rest (74%) are women. The third table shows the percentage in each marital status within each category of the gender variable. For example, in the case of women, we can observe that 47% are married, 52% are single, and 1% have another marital status.
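The three versions of the cross-tab differ only in what each cell is divided by. The sketch below builds all three in Python from a hypothetical toy sample; the eight (gender, marital status) records are invented for illustration and are not rows from the credit card data set, and the variable names are illustrative as well.

```python
from collections import Counter

# Hypothetical (gender, marital status) records, invented for illustration only.
records = [
    ("female", "single"), ("female", "married"), ("female", "single"),
    ("male", "married"), ("male", "single"), ("female", "married"),
    ("male", "married"), ("female", "single"),
]

genders = sorted({g for g, _ in records})
statuses = sorted({s for _, s in records})

# Table 1: absolute frequency of each (gender, status) cell.
counts = Counter(records)

# Totals used as denominators for the two percentage tables.
status_totals = Counter(s for _, s in records)
gender_totals = Counter(g for g, _ in records)

# Table 2: within each marital status, the share of each gender
# (the percentages for one status add up to 100%).
pct_within_status = {
    (g, s): 100 * counts[(g, s)] / status_totals[s]
    for g in genders for s in statuses
}

# Table 3: within each gender, the share of each marital status
# (the percentages for one gender add up to 100%).
pct_within_gender = {
    (g, s): 100 * counts[(g, s)] / gender_totals[g]
    for g in genders for s in statuses
}

for s in statuses:
    for g in genders:
        print(f"{s:<8} {g:<7} n={counts[(g, s)]} "
              f"of status={pct_within_status[(g, s)]:.0f}% "
              f"of gender={pct_within_gender[(g, s)]:.0f}%")
```

Reading the same cell three ways makes the distinction clear: in this toy sample, single women are 3 records, 75% of all single people, and 60% of all women.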

Tables are an excellent way to summarize information, but we cannot ignore graphics. In fact, the visualizations we make of the data are vital. According to Olivares (2013), 90% of the information absorbed by our brain is visual, our brain processes visualizations 60,000 times faster than text, and 2/3 of the electrical impulses of our brain originate in response to visual information. This topic deserves its own post, so we will continue with it next week and then return to descriptive statistics for quantitative variables.

References

  1. Olivares, Ernesto (2013). We are 90% visual beings. Available at https://ernestoolivares.com/we-are-90-visuals-beings
