
Recoding variables

Recoding variables is sometimes necessary, as shown in chapter 2 for building configurations. For example, the 6-point stress variable (STRE) can easily be recoded into a 3-point and a 2-point variable.

The trichotomisation of the variable (named STRE3) follows this scheme:


Full scale                 Trichotomized scale (STRE3)
$ 1 \leq STRE \leq 2$      1
$ 3 \leq STRE \leq 4$      2
$ 5 \leq STRE \leq 6$      3

The dichotomisation into BSTRE follows this scheme:


Full scale                 Dichotomized scale (BSTRE)
$ 1 \leq STRE \leq 3$      0
$ 4 \leq STRE \leq 6$      1
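
The two recoding schemes above can be sketched in a few lines of Python. This is only an illustration: the function names `trichotomise` and `dichotomise` are hypothetical and do not come from the original analysis, and the sample scores are made up for the demonstration.

```python
def trichotomise(stre: int) -> int:
    """Map a 6-point STRE score (1..6) onto the 3-point STRE3 scale (1..3)."""
    if not 1 <= stre <= 6:
        raise ValueError(f"STRE must lie between 1 and 6, got {stre}")
    # 1,2 -> 1   3,4 -> 2   5,6 -> 3
    return (stre + 1) // 2


def dichotomise(stre: int) -> int:
    """Map a 6-point STRE score (1..6) onto the binary BSTRE scale (0/1)."""
    if not 1 <= stre <= 6:
        raise ValueError(f"STRE must lie between 1 and 6, got {stre}")
    # 1,2,3 -> 0 (low stress)   4,5,6 -> 1 (high stress)
    return 0 if stre <= 3 else 1


# Made-up sample of STRE scores, for illustration only.
sample = [1, 2, 3, 4, 5, 6]
stre3 = [trichotomise(s) for s in sample]
bstre = [dichotomise(s) for s in sample]
```

Applied to every observed STRE value, these mappings would yield the STRE3 and BSTRE variables described here.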

The distributions of these variables are given in figure 6.3. Levels 1 and 2 of the trichotomous stress variable were encountered 120 and 119 times (a probability of about 0.46 each), while level 3 was encountered only 22 times (a probability of 0.08). For the dichotomous stress variable, the low stress level BSTRE=0 was seen 204 times, accounting for 78% of all states and leaving 22% for the higher stress level.

Figure 6.3: Distribution of trichotomous and dichotomous STRESS variables
[Figure: two panels, fa7-stre3-distr.eps (STRE3) and fa7-bstre-distr.eps (BSTRE)]

From these frequencies and probabilities one can readily compute the entropy of the new variables: $ H(STRE3) = 1.33$ bits and $ H(BSTRE) = 0.76$ bits; table 6.2 summarizes the results. To compare the entropies of variables measured on different scales, it is preferable to use the standardized entropy, which is obtained by dividing the entropy by its maximum value, $ \log_2(m)$, where $ m$ is the number of categories. The maximum entropy for a 2-, 3- and 6-point variable is $ \log_2(2)=1$, $ \log_2(3)=1.58$ and $ \log_2(6)=2.58$ bits respectively. It is no coincidence that $ \log_2(6) = \log_2(2) + \log_2(3)$, since $ \log(a \cdot b) = \log(a) + \log(b)$. Standardized entropy ranges from 0 to 1 (or 0% to 100%).
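
As a rough check of these figures, the entropy and standardized entropy can be recomputed from the reported frequencies. The sketch below makes some assumptions: the function names `entropy` and `standardized_entropy` are hypothetical, and the count of 57 for BSTRE=1 is not stated in the text but inferred from the total of 261 states (120 + 119 + 22), assuming both recoded variables cover the same states.

```python
from math import log2


def entropy(counts):
    """Shannon entropy in bits, computed from category frequencies."""
    total = sum(counts)
    # Empty categories contribute nothing to the sum (0 * log 0 is taken as 0).
    probs = [c / total for c in counts if c > 0]
    return -sum(p * log2(p) for p in probs)


def standardized_entropy(counts):
    """Entropy divided by its maximum, log2(m), for m categories."""
    m = len(counts)
    return entropy(counts) / log2(m)


stre3_counts = [120, 119, 22]  # levels 1, 2, 3 as reported in the text
bstre_counts = [204, 57]       # 57 is inferred as 261 - 204 (an assumption)

h_stre3 = entropy(stre3_counts)               # close to 1.33 bits
h_bstre = entropy(bstre_counts)               # close to 0.76 bits
s_stre3 = standardized_entropy(stre3_counts)  # close to 0.84
s_bstre = standardized_entropy(bstre_counts)  # equals h_bstre, since log2(2) = 1
```

Under these assumptions the values agree, to two decimals, with the entropies quoted for STRE3 and BSTRE.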


Table 6.2: Entropy for three STRESS variables
Variable                 Entropy (bits)    Standardized entropy
STRE (full scale)        2.20              0.85
STRE3 (trichotomous)     1.33              0.84
BSTRE (dichotomous)      0.76              0.76


Consequently, the standardized entropy of the stress variable on the 2-, 3- and 6-point scales is 0.76 (0.76/1), 0.84 (1.33/1.58) and 0.85 (2.20/2.58) respectively. Although the 3-point scale uses half as many categories as the original scale, its relative amount of disorder (or uncertainty) is almost identical. This suggests that some categories of the 6-point scale are infrequent, and that recoding into a 3-point scale would not drastically reduce the amount of information. This avenue is explored numerically in section 6.2.


Philippe Lemay
1999-09-14