This program recodes one or more existing numeric variables into a new SDA variable.
If a variable of the same name already exists in the main dataset for the study, the variable cannot be created. Choose another name for your variable.
Variable names:
For example, if you want to recode the variables 'age' and 'sex', you enter the names of those variables into the text box, separated by spaces and/or a comma:
Then you specify 'age' as the input variable:
And then you can specify the recoding rules as:
Value | Label | age |
---|---|---|
1 | Younger | 0-34 |
2 | Middle | 35-64 |
3 | Older | 65-99 |
The "Value" column specifies the category values of the new variable. Each category must be a single numeric value.
The "Label" column specifies the label for each category. This is the text that will be displayed in a table using the new variable. The label is optional, but it is helpful in most cases, especially when there is no obvious ordering of the categories. The input box for labels only displays 16 characters at a time, but you can actually enter up to 250 characters for each label. If you enter a very long description for a category, you may also want to specify an abbreviated label for the category, to be used when running tables. Such optional labels are specified in brackets, after the longer text. For example:
    Respondent agrees with all the major policies listed [Agrees with all]
The next column(s) specify which values of the input variable(s) should be included in each category of the new variable. The specified codes of the input variable (here 'age') can consist of single code values, ranges, or a combination of many values and/or ranges (separated by commas). The input box for entering these code values and ranges only displays 8 characters at a time, but you can actually enter up to 80 characters in each box. (You can use the arrow keys to scroll back and forth in the box.)
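The way a box of code values and ranges maps onto input values can be sketched in Python. This is only an illustration, not SDA's actual parser: the function names `parse_codes` and `code_matches` are hypothetical, and the sketch assumes non-negative numeric codes, with `-` used only as the range separator.

```python
def parse_codes(spec):
    """Turn a spec such as '1,3-5,7' into a list of inclusive (low, high) pairs."""
    pairs = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            # A range such as '3-5' (assumes no negative codes)
            low, high = part.split("-", 1)
            pairs.append((float(low), float(high)))
        else:
            # A single code value is treated as a one-value range
            pairs.append((float(part), float(part)))
    return pairs


def code_matches(spec, value):
    """Return True if value falls in any single code or range listed in spec."""
    return any(low <= value <= high for low, high in parse_codes(spec))


print(code_matches("0-34", 20))      # True
print(code_matches("1,3-5,7", 6))    # False
```

Treating a single code as a range whose ends are equal keeps the matching logic to one comparison.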
In addition to specifying numeric code values and ranges, you can use a few other special symbols that are useful in applying recode rules:
Symbol | Meaning |
---|---|
* | Matches any VALID value. If used in a range, matches the lowest or highest VALID value. |
** | Matches ANY value, including missing data (both user-defined and system-missing). If used in a range, matches the lowest or highest value, including missing data. |
$. | Matches the system-missing code. Note the period after the dollar sign. |
Note that an asterisk or double asterisk, when NOT used in a range, cannot be combined with other specifications in a recode rule. (All other specifications can be combined in a recode rule, but must be separated by commas.)
For example, to combine 'age' and 'sex' into a new variable named 'agesex' with four categories numbered 1-4, you first specify 'agesex' as the variable to be created:
Then you would specify both 'age' and 'sex' as input variables:
Then (assuming 'sex' is coded 1=Male, 2=Female) you can specify the recoding rule as:
Value | Label | age | sex |
---|---|---|---|
1 | Younger Men | 0-40 | 1 |
2 | Younger Women | 0-40 | 2 |
3 | Older Men | 41-99 | 1 |
4 | Older Women | 41-99 | 2 |
These recoding rules are easily extended to handle more than two input variables. You can also add more rows for recoding rules by clicking on the button labeled 'Add empty row to table'.
Value | Label | age |
---|---|---|
1 | Younger | *-34 |
2 | Middle | 35-64 |
3 | Older | 65-* |
Using this method, all valid age values up through 34 would go into the first recoded group. And all valid age values of 65 or older would go into the third group. All remaining cases with missing data values would be automatically assigned the system missing value in the new variable. If you want to recode the missing data codes (0 and 99) into a numeric category in the new variable, then use the following recode rules:
Value | Label | age |
---|---|---|
1 | Younger | *-34 |
2 | Middle | 35-64 |
3 | Older | 65-* |
9 | Missing data | 0,99 |
Note that this value of 9 on the new variable will be a valid code unless you set 9 to be a missing data code or out of the valid range. (See the help on these options.)
1. Mention the input missing data code explicitly
A character string that has been defined as character missing data on a numeric input variable can be assigned to one of the categories of the recoded variable by specifying that character missing data code in the recoding rules. Similarly, the system missing data code can be recoded by referring to it as `$.' in a recoding rule. (Note the period after the dollar sign.)
In this example, the characters `D' and `R' have been defined as missing data values for the variable `age', to indicate "Don't know" and "Refused." Also, some cases had a blank input field for 'age' in the original data file and were assigned the system missing data code. Those missing data codes in the original variable can be recoded, respectively, into the NUMERIC codes 7, 8, and 9 as follows:
Value | Label | age |
---|---|---|
1 | Younger | *-34 |
2 | Middle | 35-64 |
3 | Older | 65-* |
7 | Don't know | D |
8 | Refused | R |
9 | No data | $. |
2. Use double asterisks (**)
Double asterisks match ANY code, including all missing data codes. They can be used to assign numeric missing data codes, character missing data codes, and the system missing data code to a numeric value on the new variable.
For example, to recode ALL the rest of the codes of the variable `age' (not previously mentioned in a recoding rule) into the category `9' on the new variable, you would specify:
Value | Label | age |
---|---|---|
1 | Younger | *-34 |
2 | Middle | 35-64 |
3 | Older | 65-* |
9 | All missing data | ** |
If a case matches more than one recode rule, the first rule encountered will apply. In this example the last recode rule has '**' for the input variables -- which matches any value. Any cases not covered by a rule higher up in the recode rules will receive the value 9.
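The first-match behavior, together with the difference between `*` (valid values only) and `**` (any value), can be sketched in Python. This is an illustration under stated assumptions rather than SDA's implementation: `None` stands in for the system-missing code, 0 and 99 are taken as the missing data codes for 'age' as in the earlier example, and the names `is_valid`, `RULES`, and `recode` are hypothetical.

```python
MISSING_CODES = {0, 99}   # assumed user-defined missing data codes for 'age'


def is_valid(value):
    """Valid means neither system-missing (None) nor a defined missing code."""
    return value is not None and value not in MISSING_CODES


# Each rule is (output value, test). Rules are tried from top to bottom.
RULES = [
    (1, lambda v: is_valid(v) and v <= 34),           # *-34
    (2, lambda v: v is not None and 35 <= v <= 64),   # 35-64
    (3, lambda v: is_valid(v) and v >= 65),           # 65-*
    (9, lambda v: True),                              # ** matches any value
]


def recode(value):
    """Return the output value of the first rule that matches."""
    for output, test in RULES:
        if test(value):
            return output
    return None  # no rule matched: system-missing on the new variable


print([recode(v) for v in (20, 50, 70, 0, 99, None)])  # [1, 2, 3, 9, 9, 9]
```

Because the `**` rule comes last, it only picks up cases that no earlier rule claimed.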
Value | Label | var1 | var2 |
---|---|---|---|
1 | Group 1 | 1,3-5,7 | 1-10 |
2 | Group 2 | 8-10,12 | 100 |
2 | | 41,45,55 | 51-90 |
9 | Unassigned | ** | ** |
Here the output code 2 has two rules -- which are listed individually because they cannot be combined into one rule. Note that a label only has to be specified once for an output category, even if that category has multiple recode rules.
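The rules in this table can be sketched in Python as a list that is scanned from top to bottom, where a rule applies only if every input variable matches its specification. This is a hedged illustration, not SDA's code: the names `ANY`, `RULES`, `spec_matches`, and `recode` are hypothetical, and `None` stands in for system-missing data.

```python
ANY = "**"  # stands in for the double-asterisk wildcard

# (output value, var1 spec, var2 spec); a spec is a list of inclusive ranges
RULES = [
    (1, [(1, 1), (3, 5), (7, 7)],       [(1, 10)]),
    (2, [(8, 10), (12, 12)],            [(100, 100)]),
    (2, [(41, 41), (45, 45), (55, 55)], [(51, 90)]),
    (9, ANY, ANY),
]


def spec_matches(spec, value):
    """True if value satisfies the spec; the wildcard matches everything."""
    if spec == ANY:
        return True
    return value is not None and any(lo <= value <= hi for lo, hi in spec)


def recode(var1, var2):
    """Return the output value of the first rule whose specs all match."""
    for output, spec1, spec2 in RULES:
        if spec_matches(spec1, var1) and spec_matches(spec2, var2):
            return output
    return None


print(recode(4, 7))      # 1
print(recode(12, 100))   # 2
print(recode(45, 60))    # 2
print(recode(4, 100))    # 9
```

The two rules for output code 2 stay separate because merging them would also match unintended pairs, such as var1 = 8 with var2 = 60.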
This program creates a new SDA variable as a result of a computation based on one or more existing numeric variables.
If a variable of the same name already exists in the location for new variables, it will NOT be replaced, unless the option to replace it is selected. If a variable of the same name already exists in the main dataset for the study, the variable cannot be created. Choose another name for your variable.
Variable names:
Only one new variable can be created at a time. And the name of a new variable can only appear once on the left side of an equal sign (except in IF/ELSE IF/ELSE statements).
Basic expressions on one line look like this:
    newvar = spend + spend2 + spend3
(or)
    newvar = sum(spend, spend2, spend3)
Note that the two examples above are NOT equivalent. The 'sum' function treats missing data codes differently than just using '+'. Using '+' will usually generate a missing data code on the new variable for a case unless ALL of the input variables have valid codes. But the 'sum' function can skip over variables with missing data codes and just add up the valid codes on the specified variables.
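The difference between the two forms can be sketched in Python, using `None` to stand in for a missing data code. The function names `plus` and `sda_sum` are hypothetical and only mimic the behavior described here; they are not part of SDA.

```python
def plus(*values):
    """'+' behavior: any missing input (None) makes the result missing."""
    if any(v is None for v in values):
        return None
    return sum(values)


def sda_sum(*values):
    """sum() behavior: skip missing inputs and add up the valid ones."""
    valid = [v for v in values if v is not None]
    # With no valid input at all, the result is missing
    return sum(valid) if valid else None


print(plus(10, None, 5))      # None
print(sda_sum(10, None, 5))   # 15
```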
Descriptions of all of the operators, functions, and options that can be used with the COMPUTE program are given below.
If ANY input variable in a basic expression has a missing data code for a particular case, the output variable being created will generally be assigned a missing data code. By default the case will be assigned the system missing data code. However, if the user has designated some specific value as the missing data code for the output variable, the case will be assigned that value.
This automatic assignment of an output missing data code does not hold if the user does one of the following:
See the documentation on each of those functions or options, to see how they treat missing data in the input variables.
A basic compute expression applies to every case. If you want to treat different cases in different ways, then you can use IF / ELSE IF / ELSE expressions. In these expressions, there is a "condition" (in parentheses) which determines whether the computation should be carried out on a given case. Here is a simple example where the new, computed variable 'newvar' is assigned the value of the variable 'var3', the value of variable 'var4' or the value -1 depending on whether the condition is true for the current case.
    IF (var1 eq 1)
        newvar = var3
    ELSE IF (var1 eq 2)      [A space after `ELSE' is optional]
        newvar = var4
    ELSE
        newvar = -1
In IF / ELSE IF / ELSE statements:
The condition part of the expression must use one of the following logical operators (which can be in upper or lower case):
Operator | Meaning | Example |
---|---|---|
EQ | equal to | if (x eq y) newvar = 1 |
NE | not equal to | if (x ne y) newvar = 1 |
GT | greater than | if (x gt y) newvar = 1 |
GE | greater or equal | if (x ge y) newvar = 1 |
LT | less than | if (x lt y) newvar = 1 |
LE | less or equal | if (x le y) newvar = 1 |
AND | both are true | if (x gt y AND x gt z) newvar = 1 |
OR | either is true | if (x gt y OR x gt z) newvar = 1 |
The 'AND' and 'OR' operators can be used to combine the simple logical operators into 'compound' condition expressions. Here's a fairly complex example:
    if ( (MISSING(var1,var2) eq 0) AND (var1 gt 0 OR var2 eq 10))
IF-statements can be nested. In such cases, however, it is necessary to use `ENDIF' to eliminate ambiguity. The following example illustrates how this can be done:
    IF ( oldvar1 eq 1 )
        IF ( oldvar2 lt 100 )
            newvar = 1
        ELSEIF ( oldvar3 eq 2 )
            newvar = 2
        ENDIF
    ELSE
        IF ( oldvar4 gt 10 )
            newvar = 3
        ENDIF
    ENDIF
There can only be one IF-statement at the top level of the nested expression. The example above has more than one IF-statement, but all except one are nested within the top-level IF/ELSE expression.
Notice how the use of `ENDIF' removes ambiguity about what part goes with what. It is required that `ENDIF' be used for the nested portion of complex IF-statements. The very last `ENDIF', however, could have been omitted.
It is important to understand how missing data values in the 'condition' part of an IF / ELSE IF expression are treated. These expressions are evaluated in order. When one of these expressions returns a missing data value for a particular case, no further IF / ELSE IF expressions are evaluated, and the output variable for that case is assigned the missing data code. For example:
    IF ( var1 EQ 1 )
        newvar = oldvar * 2
    ELSE
        newvar = 0
If 'var1' is a missing data value, then you might expect the IF expression's condition would be evaluated as false and the evaluation would continue to the ELSE expression where newvar would be assigned '0'. However, this is NOT what happens. The condition is evaluated as neither true nor false but 'missing'. And, as stated above, evaluation stops as soon as an expression's condition is evaluated as missing. The output variable 'newvar' is immediately assigned a missing value and evaluation stops.
It is often useful to use the MISSING() function to make the handling of missing data values explicit. For example, if the intent of the compute statement above is to assign 'oldvar * 2' to newvar if var1 equals 1, but assign '0' for any other value (including missing data), then the statement could be re-written:
    IF ( MISSING(var1) EQ 1 )
        newvar = 0
    ELSE IF ( var1 EQ 1 )
        newvar = oldvar * 2
    ELSE
        newvar = 0
See the Other summaries section for more information about the MISSING function.
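The contrast between the original statement and the version guarded by MISSING() can be sketched in Python. This is a hedged illustration of the behavior described above, not SDA's evaluator: `None` stands in for a missing data value, and the function names `compute_original` and `compute_guarded` are hypothetical.

```python
def compute_original(var1, oldvar):
    """IF (var1 EQ 1) newvar = oldvar * 2 ELSE newvar = 0"""
    if var1 is None:
        # The condition is 'missing': evaluation stops, newvar is missing
        return None
    if var1 == 1:
        # A missing 'oldvar' makes the basic expression missing as well
        return None if oldvar is None else oldvar * 2
    return 0


def compute_guarded(var1, oldvar):
    """Same statement, with an explicit MISSING(var1) test placed first."""
    if var1 is None:          # MISSING(var1) EQ 1
        return 0
    if var1 == 1:
        return None if oldvar is None else oldvar * 2
    return 0


print(compute_original(None, 7))   # None
print(compute_guarded(None, 7))    # 0
print(compute_guarded(1, 7))       # 14
```

Both versions agree whenever 'var1' is valid; they differ only for cases where 'var1' is missing.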
Now let's look at compound condition expressions with the AND or OR operators.
The 'truth table' for an AND expression can be summarized as follows:
For example:
    IF ( var1 EQ 1 AND var2 EQ 2 )
        newvar = oldvar * 2
    ELSE
        newvar = 0
Now let's look at the OR operator:
The 'truth table' for an OR expression can be summarized as follows:
For example:
    IF ( var1 EQ 1 OR var2 EQ 2 )
        newvar = oldvar * 2
    ELSE
        newvar = 0
Complicated expressions can be specified in steps using temporary variables -- variables with names that begin with '$'. These variables only exist while COMPUTE is running.
Each expression using a temporary variable must be on a separate line, before the final line that gives the name of the new variable to be saved. For example:
    $temp1 = var1 + var2
    $temp2 = var3 / var4
    newvar = $temp1 / $temp2
A temporary variable cannot be used in the test portion of an IF-expression. For example, it is NOT legal to use:
    if ($temp1 eq 1)      (NOT legal)
However, a temporary variable CAN appear on the left hand side of an equal sign within an IF-statement. For example, the following is legal:
    if (age lt 40) $temp1 = 1
Operator | Meaning |
---|---|
+ - * / | Addition, subtraction, multiplication, division |
^ | Power -- for example: var1^2 (var1 squared) |
-var1 | Negative of var1 (unary -) |
( ) | Parentheses are used to alter (or clarify) the usual order of evaluation. |
Order in which the various operators are applied:
The functions listed below are recognized in compute expressions. The name of each function can be given in either upper or lower case. The arguments `a' or `b' stand for a specific constant (2 or 4.5, for example). The arguments `x' or `y' stand for either an existing SDA variable, a temporary variable, a constant, or another expression.
Function | Description |
---|---|
ABS(x) | Absolute value of x |
EXP(x) | Exponential function (antilog), e^x |
LOG(x) or LN(x) | Natural logarithm |
LG10(x) or LOG10(x) | Logarithm - base 10 |
MOD(x,a) | Modulus (remainder) of `x' divided by `a' (e.g., mod(5,2) equals 1) |
RND(x) or ROUND(x) | Round off |
SQRT(x) | Square root |
TRUNC(x) | Truncate; the integer part of x |
Function | Description |
---|---|
MEAN.n(x,y,...) | Mean of the given variables |
SUM.n(x,y,...) | Sum of the given variables |
MIN.n(x,y,...) | Minimum value of the given variables |
MAX.n(x,y,...) | Maximum value of the given variables |
Note that the `.n' part of the function name is optional. If used, it tells the function that at least `n' of the given variables must have valid data for a case; otherwise the function returns the missing data code. The default value for `n' is 1.
For example, `mean(var1,var2,var3)' will generate the mean of the three variables, even if only one of the three has a valid code. The invalid codes are simply ignored. On the other hand `mean.2(var1,var2,var3)' will generate a mean for a specific case only if at least two variables have valid codes on that case.
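The effect of the `.n' suffix can be sketched in Python. This is an illustration of the rule described above, not SDA's code: `None` stands in for a missing data code, and the function name `mean_n` is hypothetical.

```python
def mean_n(values, n=1):
    """Mean of the valid (non-None) values; missing unless at least n are valid."""
    valid = [v for v in values if v is not None]
    if len(valid) < n:
        return None
    return sum(valid) / len(valid)


print(mean_n([4, None, None]))        # 4.0   like mean(var1,var2,var3)
print(mean_n([4, None, None], n=2))   # None  like mean.2(var1,var2,var3)
print(mean_n([4, 6, None], n=2))      # 5.0
```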
Function | Description |
---|---|
COUNT(x,y(a-b)) | Number of variables with values between a and b (can specify different ranges for each var; missing data or out-of-range codes are not counted unless include-MD option is selected) |
CUM(x) | Cumulate the value of `x' from one case to the next (`x' can be a variable or a constant; if `x' is a missing data value, the cumulation from the previous case is carried over) |
MISSING(x,y,...) | Number of variables with missing data or out-of-range codes |
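The carry-over behavior described for CUM can be sketched in Python. This is a hedged illustration, not SDA's implementation: `None` stands in for a missing data value, the function name `cum` is hypothetical, and starting the running total at 0 is an assumption.

```python
def cum(values):
    """Running total over cases; a missing value (None) leaves the total unchanged."""
    total = 0          # assumed starting point for the cumulation
    results = []
    for value in values:
        if value is not None:
            total += value
        # For a missing value, the previous total is carried over as is
        results.append(total)
    return results


print(cum([2, 3, None, 5]))   # [2, 5, 5, 10]
```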
Function | Description |
---|---|
UNIFORM(x,y) | Uniform distribution between x and y (x and y can be constants or variables) |
DUNIFORM(x,y) | Discrete uniform distribution between x and y (result is always a whole number) |
NORMAL(x,y) | Normal distribution with mean=x, sd=y |
Function | Description |
---|---|
SIN(x), COS(x) | Sine and cosine (x is in radians) |
ARSIN(y) or ARCSIN(y) | Arcsine |
ARTAN(y) or ARCTAN(y) | Arctangent |
If you select this option to include missing data values in computations, the program will consider numeric missing data values as valid, for purposes of generating the new variable.
For example, the expression 'newvar = 2 * age' would ordinarily result in a missing data value for the new variable if the variable 'age' had the value '99' which was designated as a missing data code (to indicate a refusal). If this option is selected, the new variable in this case would receive a valid value of (2 * 99 =) 198.
Note that this option will not override character missing data values (such as 'D' or 'R'), nor will it override the system missing data code. Such missing data values do not have any numeric value that could be used in a computation.
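The effect of this option on the expression 'newvar = 2 * age' can be sketched in Python. This is an illustration under stated assumptions, not SDA's code: `None` stands in for the system missing data code, strings such as 'D' and 'R' stand in for character missing data values, and the names `NUMERIC_MISSING` and `double_age` are hypothetical.

```python
NUMERIC_MISSING = {99}   # numeric missing data code for 'age' (a refusal)


def double_age(age, include_missing=False):
    """newvar = 2 * age, optionally treating numeric missing codes as valid."""
    if age is None or isinstance(age, str):
        # System-missing or character missing data has no numeric value to use
        return None
    if age in NUMERIC_MISSING and not include_missing:
        return None
    return 2 * age


print(double_age(99))                          # None
print(double_age(99, include_missing=True))    # 198
print(double_age("D", include_missing=True))   # None
```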
If you prefer to assign your own missing data code to such cases, select this option, AND ALSO list one or more values as missing data values in the optional specifications for the new variable. Then the cases with no valid output value will be assigned the first missing data code you specified for the new variable.
Intermediate results are never rounded. All calculations are carried out using double-precision numbers. If rounding is requested, only the final result is rounded to the specified number of decimal places.
Value | Label |
---|---|
1 | Lowest |
5 | Middle of the range |
10 | Highest |
If the text for a category is long, you can also assign an abbreviated version that will be used as the category label in crosstabulations and other similar output. Put the desired abbreviation in brackets before or after the long text for a category. For example:
Value | Label |
---|---|
1 | Minimum value expected from the computation [Lowest] |
10 | Highest value expected from the computation [Highest] |
(Note that the text box will only show about 20 characters at a time and will scroll, unlike this example, which shows a larger box for purposes of clarity.)
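How a label with a bracketed abbreviation splits into its long text and its short form can be sketched in Python. This is only an illustration of the convention described above, not SDA's parser; the function name `split_label` is hypothetical, and the sketch assumes at most one bracketed abbreviation per label.

```python
import re


def split_label(text):
    """Return (long_text, short_label); short_label is None without brackets."""
    match = re.search(r"\[([^\]]*)\]", text)
    if match is None:
        return text.strip(), None
    # Remove the bracketed part, whether it comes before or after the long text
    long_text = (text[:match.start()] + text[match.end():]).strip()
    return long_text, match.group(1).strip()


print(split_label("Highest value expected from the computation [Highest]"))
# ('Highest value expected from the computation', 'Highest')
print(split_label("[Lowest] Minimum value expected from the computation"))
# ('Minimum value expected from the computation', 'Lowest')
```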
One reason to specify the seed might be to generate the same series of random numbers on repeated runs for diagnostic or instructional purposes. Note, however, that the same seed might not generate the same random numbers on different platforms (for example, on Linux versus Windows).
If the name of the new variable to be created matches the name of a variable that already exists (in the directory for new variables), that variable can be replaced by the new one, provided that the option to replace that variable is in effect. If the option NOT to replace the variable is in effect, the program will send a message that the variable already exists and that you should select the 'Replace' option if you want to overwrite it. If you are creating variables in a public workspace (shared with other users) please be kind: replace a variable only if you created it.
For example: 8, 9, 91-99
If you specify one or more missing data codes, the first such code specified can be used to assign a value on the new variable for those cases which do not have a valid outcome code. Cases having a missing data code on the new variable are ordinarily excluded from analyses involving that variable.
The rules used by RECODE or COMPUTE to create the new variable are also included in the descriptive text for the variable.
The variable list includes the following features: