This program recodes one or more existing numeric variables into a new SDA variable.
For example, if you want to recode the variables 'age' and 'sex', you enter the names of those variables into the first two text boxes:
| Var 1 | Var 2 | Var 3 | Var 4 | Var 5 | Var 6 | ||
|---|---|---|---|---|---|---|---|
| OUTPUT Variable | VALUES of the INPUT Variables | ||||||
|---|---|---|---|---|---|---|---|
| Value | Label | Var 1 | Var 2 | Var 3 | Var 4 | Var 5 | Var 6 |
The specified codes on the input variable (here 'age') can consist of single code values, ranges, or a combination of many values and/or ranges (separated by commas). The input box for entering these code values and ranges only displays 8 characters at a time, but you can actually enter up to 80 characters in each box. (You can use the arrow keys to scroll back and forth in the box.)
The recoding rules ignore the missing-data status of NUMERIC codes on the input variable, if they are mentioned explicitly or in a range. For instance, if the value 90 for 'age' were flagged as a missing-data code, but included in the range 65-95 as in the example above, it would be recoded into the value 3 on the new variable. (There is additional help on the treatment of numeric missing-data codes.)
Any categories of the input variable not included in the recoding rules will generally become missing-data on the new variable, and they will ordinarily be excluded from analyses of the new variable. For example, if some cases in the recode above had codes of 96 or 97, they would be recoded into a missing-data category on the new variable. You can specify what that missing-data category should be. (See the help on that topic.)
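As a rough illustration of how such rules behave, the following Python sketch (not SDA code) parses a code list like the ones entered in the input boxes and applies a set of rules to a single value. The function names `parse_codes` and `recode`, the first two rules, and the use of `None` to stand for the missing-data category are assumptions made for this example; only the range 65-95 recoded into 3 comes from the text above. Negative codes are not handled.

```python
def parse_codes(spec):
    """Turn a code list such as '1-5, 8' into (low, high) pairs."""
    pairs = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-", 1)
            pairs.append((float(low), float(high)))
        else:
            # A single code is treated as a range of width zero.
            pairs.append((float(part), float(part)))
    return pairs


def recode(value, rules, missing=None):
    """Return the output value of the first rule whose codes include value."""
    for output, spec in rules:
        if any(low <= value <= high for low, high in parse_codes(spec)):
            return output
    return missing  # codes not covered by any rule become missing data


# Hypothetical rules; only "65-95 -> 3" is taken from the text.
rules = [(1, "18-34"), (2, "35-64"), (3, "65-95")]

print(recode(90, rules))  # matched by the range 65-95
print(recode(96, rules))  # not covered by any rule
```

Note that the sketch never checks whether a numeric code was flagged as missing data, which mirrors the behavior described above for codes that are mentioned explicitly or fall inside a range.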
For example, to combine 'age' and 'sex' into four categories numbered 1-4, you would first specify 'age' as input variable #1, and 'sex' as input variable #2:
| Var 1 | Var 2 | Var 3 | Var 4 | Var 5 | Var 6 | ||
|---|---|---|---|---|---|---|---|
Keeping in mind that 'sex' is coded 1 for men and 2 for women, you could then specify the recoding rules as:
| OUTPUT Variable | VALUES of the INPUT Variables | ||||||
|---|---|---|---|---|---|---|---|
| Value | Label | Var 1 | Var 2 | Var 3 | Var 4 | Var 5 | Var 6 |
If you enter a very long description for a category, you may also want to specify an abbreviated label for the category, to be used when running tables. Such optional labels are specified in brackets, after the longer text. For example:
Respondent agrees with all the major policies listed [Agrees with all]
| OUTPUT Variable | VALUES of the INPUT Variables | ||||||
|---|---|---|---|---|---|---|---|
| Value | Label | Var 1 | Var 2 | Var 3 | Var 4 | Var 5 | Var 6 |
Using this method, all valid age values up through 34 would go into the first recoded group. And all valid age values of 65 or older would go into the third group. (Recoding missing-data values is discussed next.)
1. Mention the input missing-data code explicitly as a single value.
For example, if the original 'age' value of 99 was defined as a missing-data code, it can be assigned to a new category of 9 on the new variable as follows:
| OUTPUT Variable | VALUES of the INPUT Variables | ||||||
|---|---|---|---|---|---|---|---|
| Value | Label | Var 1 | Var 2 | Var 3 | Var 4 | Var 5 | Var 6 |
2. Include the input missing-data code as part of a range.
For example, the cases with the original value of 99 on age (whether or not 99 was defined as a missing-data value) can be recoded into category 3 of the new variable by specifying a range that includes that value:
| OUTPUT Variable | VALUES of the INPUT Variables | ||||||
|---|---|---|---|---|---|---|---|
| Value | Label | Var 1 | Var 2 | Var 3 | Var 4 | Var 5 | Var 6 |
3. Use an open range with TWO asterisks (**) instead of one.
For example, the following specification will recode all numeric codes 65 and over into category 3 of the new variable (whether or not the codes had been defined as missing-data codes).
| OUTPUT Variable | VALUES of the INPUT Variables | ||||||
|---|---|---|---|---|---|---|---|
| Value | Label | Var 1 | Var 2 | Var 3 | Var 4 | Var 5 | Var 6 |
1. Mention the input missing-data code explicitly
A character code that has been defined as missing data on an original NUMERIC variable can be assigned to one of the categories of the recoded variable by specifying that character in the recoding rules. Similarly, the system missing-data code can be recoded by referring to it as `$.' in a recoding rule. (Note the period after the dollar sign.)
For example, the characters `D' and `R' may have been defined as missing-data values for the variable `age', to indicate "Don't know" and "Refused." Also, some cases may have had a blank input field for 'age' in the original data file and were assigned the system missing-data code. Those missing-data codes in the original variable can be recoded, respectively, into the NUMERIC codes 7, 8, and 9 as follows:
| OUTPUT Variable | VALUES of the INPUT Variables | ||||||
|---|---|---|---|---|---|---|---|
| Value | Label | Var 1 | Var 2 | Var 3 | Var 4 | Var 5 | Var 6 |
2. Use double asterisks (**)
Double asterisks match ANY code, including all missing-data codes. They can be used to assign character missing-data codes and the system missing-data code to a numeric value on the new variable.
For example, to recode ALL the rest of the codes of the variable `age' (not previously mentioned in a recoding rule) into the category `9' on the new variable, you would specify:
| OUTPUT Variable | VALUES of the INPUT Variables | ||||||
|---|---|---|---|---|---|---|---|
| Value | Label | Var 1 | Var 2 | Var 3 | Var 4 | Var 5 | Var 6 |
Note that it is only possible to recode character missing-data codes into a numeric code. It is not possible to recode anything INTO a character value. Also, it is not currently possible to recode a CHARACTER variable (which is different from a NUMERIC variable with one or more character values defined as missing-data codes).
However, it is possible to recode anything into the system missing-data code. Any value that does not match a recode rule will be converted into the system missing-data code, unless a user-specified missing-data value was supplied.
For example, in the following specification age 35 will be recoded into the first category, and not the second, because the first match is the one that counts. Similarly age 65 will be recoded into the second category, and not the third.
| OUTPUT Variable | VALUES of the INPUT Variables | ||||||
|---|---|---|---|---|---|---|---|
| Value | Label | Var 1 | Var 2 | Var 3 | Var 4 | Var 5 | Var 6 |
Notice that order is important with overlapping ranges. The following specification will NOT have the same effect as the preceding one:
| OUTPUT Variable | VALUES of the INPUT Variables | ||||||
|---|---|---|---|---|---|---|---|
| Value | Label | Var 1 | Var 2 | Var 3 | Var 4 | Var 5 | Var 6 |
In this example, age 65 will be assigned a value of 3 on the new variable (instead of a value of 2 as in the previous example), and age 35 will be assigned a value of 2 (instead of 1).
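The effect of rule order can be sketched in Python (this is an illustration, not SDA code). The function name `first_match` and the outer bounds 18 and 95 are assumptions; the overlap at 35 and 65 follows the two examples described above.

```python
def first_match(value, rules):
    """Return the output of the first rule whose range contains value."""
    for output, low, high in rules:
        if low <= value <= high:
            return output  # later rules are never consulted
    return None  # no rule matched


# Overlapping ranges: 35 and 65 each appear in two rules.
ascending = [(1, 18, 35), (2, 35, 65), (3, 65, 95)]
descending = list(reversed(ascending))

for age in (35, 65):
    print(age, first_match(age, ascending), first_match(age, descending))
```

With the rules in ascending order, 35 falls into category 1 and 65 into category 2; with the same rules reversed, 35 falls into category 2 and 65 into category 3.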
For example, to have age recoded into two categories, with category 1 including everyone EXCEPT those aged 35-64, you could use the following recoding rules:
| OUTPUT Variable | VALUES of the INPUT Variables | ||||||
|---|---|---|---|---|---|---|---|
| Value | Label | Var 1 | Var 2 | Var 3 | Var 4 | Var 5 | Var 6 |
This program creates a new SDA variable as a result of a computation based on one or more existing numeric variables.
Steps to take
A simple example: newvar = 2 * oldvar
EXPRESSION to define the new variable
The expression is of the general form:
newvar = expression

The name of one (and only one) new variable must appear on the left-hand side of the equal sign. If a variable of the same name already exists (in the location for new variables), it will NOT be replaced, unless the option to replace it is selected.
Only one new variable can be created at a time. And the name of a new variable can only appear once on the left side of an equal sign (except in an IF-statement).

newvar = spend + spend2 + spend3
(or)
newvar = sum(spend, spend2, spend3)
Note that the two examples above are NOT equivalent. The 'sum' function treats missing-data codes differently than just using '+'. Using '+' will usually generate a missing-data code on the new variable for a case unless ALL of the input variables have valid codes. But the 'sum' function can skip over variables with missing-data codes and just add up the valid codes on the specified variables.
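The difference can be sketched in Python (not SDA code), with `None` standing in for a missing-data code. The function names `plus` and `sum_valid` are made up for this illustration, and the sketch covers only the usual case described above.

```python
def plus(*values):
    """Mimic '+': the result is missing if any input is missing."""
    if any(v is None for v in values):
        return None
    return sum(values)


def sum_valid(*values):
    """Mimic 'sum': skip missing inputs and add up the valid ones."""
    valid = [v for v in values if v is not None]
    if not valid:
        return None  # nothing valid to add
    return sum(valid)


spend, spend2, spend3 = 10, None, 5   # spend2 is missing for this case
print(plus(spend, spend2, spend3))       # missing
print(sum_valid(spend, spend2, spend3))  # adds the two valid values
```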
Descriptions of all of the operators, functions, and options that can be used with the COMPUTE program are given next.
This automatic assignment of an output missing-data code does not hold if the user does one of the following:
See the documentation on each of those functions or options, to see how they treat missing-data in the input variables.
IF-statements, ELSEIF-statements, and ELSE-statements are evaluated in order. When one of those statements returns a missing-data value for a particular case, no further IF/ELSEIF/ELSE statements are evaluated, and the output variable for that case is assigned the missing-data code.
if (var1 eq 1)
    newvar = var3
else if (var1 eq 2)      [A space after `else' is optional]
    newvar = var4
else
    newvar = -1
The `ELSE IF' part can be repeated; `ELSE' can be used only once; both parts are optional.
The expressions `IF', `ELSE IF', and `ELSE' should begin on a new line. Note that either upper or lower case can be used for `IF' and `ELSE'.
If no `ELSE' part is used, it is possible that some cases will not meet any of the conditions; the new variable will then be set to the specified missing data code for those cases.
There is an implied `ENDIF' at the end of the expression. The use of `ENDIF' is optional unless there are nested IF-statements (as shown below).
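For readers more familiar with a general-purpose language, the IF / ELSE IF / ELSE example above corresponds roughly to the following Python sketch. The function names are hypothetical, `None` stands in for the missing-data code, and all inputs are assumed to hold valid values.

```python
def newvar_with_else(var1, var3, var4):
    """Equivalent of the full IF / ELSE IF / ELSE example."""
    if var1 == 1:
        return var3
    elif var1 == 2:
        return var4
    else:
        return -1


def newvar_without_else(var1, var3, var4, missing=None):
    """Same conditions with the ELSE part left out."""
    if var1 == 1:
        return var3
    elif var1 == 2:
        return var4
    return missing  # no condition met: the case gets the missing-data code


print(newvar_with_else(3, 10, 20))
print(newvar_without_else(3, 10, 20))
```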
OPERATOR                  EXAMPLES
EQ   equal to             if (x eq y) newvar = 1
NE   not equal to         if (x ne y) newvar = 1
GT   greater than         if (x gt y) newvar = 1
GE   greater or equal     if (x ge y) newvar = 1
LT   less than            if (x lt y) newvar = 1
LE   less or equal        if (x le y) newvar = 1
AND  both are true        if (x gt y AND x gt z) newvar = 1
OR   either is true       if (x gt y OR x gt z) newvar = 1

These operators can be in upper or lower case.
IF ( oldvar1 eq 1 )
    IF ( oldvar2 lt 100 )
        newvar = 1
    ELSEIF ( oldvar3 eq 2 )
        newvar = 2
    ENDIF
ELSE
    IF ( oldvar4 gt 10 )
        newvar = 3
    ENDIF
ENDIF
There can only be one IF-statement at the top level of the
nested expression.
The example above has more than one IF-statement, but
all except one are nested within the top-level
IF/ELSE expression.
Notice how the use of `ENDIF' removes ambiguity about what part goes with what. It is required that `ENDIF' be used for the nested portion of complex IF-statements. The very last `ENDIF', however, could have been omitted.
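One way to read the pairing of `IF' and `ENDIF' in the nested example is to translate it into Python, where indentation plays the role of `ENDIF'. This is only a sketch: the function name `nested_newvar` is hypothetical, `None` stands in for the missing-data code, and all four input variables are assumed to hold valid values.

```python
def nested_newvar(oldvar1, oldvar2, oldvar3, oldvar4, missing=None):
    """Python reading of the nested IF example."""
    if oldvar1 == 1:
        # Inner IF / ELSEIF, closed by the first ENDIF.
        if oldvar2 < 100:
            return 1
        elif oldvar3 == 2:
            return 2
    else:
        # Inner IF in the ELSE branch, closed by the second ENDIF.
        if oldvar4 > 10:
            return 3
    # No branch assigned a value, so the case is left as missing data.
    return missing


print(nested_newvar(1, 50, 0, 0))
print(nested_newvar(1, 150, 2, 0))
print(nested_newvar(2, 50, 2, 20))
```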
$temp1 = var1 + var2
$temp2 = var3 / var4
newvar = $temp1 / $temp2
Variables with names that begin with `$' only exist while COMPUTE is running. They are not available for analysis after COMPUTE is finished creating the new variable.
A temporary variable cannot be used in the test portion of an IF-expression. For example, it is NOT legal to use:
if ($temp1 eq 1)      (NOT legal)

However, a temporary variable CAN appear on the left-hand side of an equal sign within an IF-statement. For example, the following is legal:
if (age lt 40) $temp1 = 1
+ - * / Addition, subtraction, multiplication, division
^ Power -- for example: var1^2 (var1 squared)
-var1 Negative of var1 (unary -)
( ) Parentheses are used to alter (or clarify) the
usual order of evaluation.
Order in which the various operators are applied:
ABS(x) Absolute value of x
EXP(x) Exponential function (antilog), e^x
LOG(x) or LN(x) Natural logarithm
LG10(x) or LOG10(x) Logarithm - base 10
MOD(x,a) Modulus (remainder) of `x' divided by `a'
(e.g., mod(5,2) equals 1)
RND(x) or ROUND(x) Round off
SQRT(x) Square root
TRUNC(x) Truncate; the integer part of x
MEAN.n(x,y,...)   Mean of the given variables
SUM.n(x,y,...)    Sum of the given variables
MIN.n(x,y,...)    Minimum value of the given variables
MAX.n(x,y,...)    Maximum value of the given variables
Note that the `.n' part of the function name is optional. If used, it tells the function that at least `n' of the given variables must have valid data for a case; otherwise the function returns the missing data code. The default value for `n' is 1.
For example, `mean(var1,var2,var3)' will generate the mean of the three variables, even if only one of the three has a valid code. On the other hand `mean.2(var1,var2,var3)' will generate a mean for a specific case only if at least two variables have valid codes on that case.
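The role of `n' can be sketched in Python (not SDA code). The function name `mean_n` is hypothetical, and `None` stands in for a missing-data code.

```python
def mean_n(values, n=1):
    """Mean of the valid values, or None if fewer than n are valid."""
    valid = [v for v in values if v is not None]
    if len(valid) < n:
        return None  # too few valid inputs: return missing data
    return sum(valid) / len(valid)


case = [4, None, None]       # only var1 is valid for this case
print(mean_n(case))          # like mean(var1,var2,var3)
print(mean_n(case, n=2))     # like mean.2(var1,var2,var3)
```

The same threshold logic would apply to the sum, minimum, and maximum variants, with the final line of the function swapped for the corresponding operation.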
COUNT(x,y(a-b)) Number of variables with values between a and b
(can specify different ranges for each var;
missing data or out-of-range codes are not
counted unless include-MD option is selected)
CUM(x) Cumulate the value of `x' from one case to
the next (`x' can be a variable or a constant;
if `x' is a missing-data value, the cumulation
from the previous case is carried over)
MISSING (x,y,...) Number of variables with missing data or
out-of-range codes
UNIFORM(x,y) Uniform distribution between x and y
(x and y can be constants or variables)
DUNIFORM(x,y) Discrete uniform distribution between x and y
(result is always a whole number)
NORMAL(x,y) Normal distribution with mean=x, sd=y
SIN(x), COS(x)          Sine and cosine (x is in radians)
ARSIN(y) or ARCSIN(y)   Arcsine
ARTAN(y) or ARCTAN(y)   Arctangent
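Of the functions listed above, CUM is the only one whose result depends on the preceding cases. A minimal Python sketch of the carry-over rule follows; the function name `cum` is hypothetical, `None` stands in for a missing-data value, and starting the running total at zero (so that a missing first case yields 0) is an assumption not stated in the description.

```python
def cum(values):
    """Running total over cases; a missing value carries the total over."""
    total = 0  # assumed starting point
    out = []
    for v in values:
        if v is not None:
            total += v
        out.append(total)  # unchanged when v is missing
    return out


print(cum([1, 2, None, 4]))
```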
If you select this option to include missing-data values in computations, the program will consider numeric missing-data values as valid, for purposes of generating the new variable.
For example, the expression 'newvar = 2 * age' would ordinarily result in a missing-data value for the new variable if the variable 'age' had the value '99' which was designated as a missing-data code (to indicate a refusal). If this option is selected, the new variable in this case would receive a valid value of (2 * 99 =) 198.
Note that this option will not override character missing-data values (such as 'D' or 'R'), nor will it override the system missing-data code. Such missing-data values do not have any numeric value that could be used in a computation.
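The following Python sketch (not SDA code) illustrates the option for the expression 'newvar = 2 * age'. The function name `double_age` and its parameters are hypothetical; character missing-data codes are represented as strings and the system missing-data code as `None`.

```python
def double_age(age, numeric_md_codes=frozenset({99}), include_md=False):
    """Compute 2 * age, honoring the include-missing-data option."""
    if not isinstance(age, (int, float)):
        # Character codes such as 'D' or 'R', and the system missing-data
        # code (None here), have no numeric value to compute with.
        return None
    if age in numeric_md_codes and not include_md:
        return None  # numeric missing-data code, option not selected
    return 2 * age


print(double_age(99))                   # missing
print(double_age(99, include_md=True))  # 99 treated as a valid value
print(double_age("D", include_md=True)) # still missing
```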
If you prefer to assign your own missing-data code to such cases, select this option, AND ALSO list one or more values as missing-data values in the optional specifications for the new variable. Then the cases with no valid output value will be assigned the first missing-data code you specified for the new variable.
Intermediate results are never rounded. All calculations are carried out using double-precision numbers. If rounding is requested, only the final result is rounded to the specified number of decimal places.
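To see why this matters, the Python sketch below compares rounding only the final result with rounding an intermediate result as well. It uses Python's own double-precision arithmetic and built-in `round`, which may not round ties the same way SDA does; the function names are made up for the illustration.

```python
def final_only(x, y, decimals=2):
    """Round once, after the whole computation (the behavior described)."""
    return round(x / y * 3, decimals)


def rounded_early(x, y, decimals=2):
    """Round the intermediate quotient too (NOT what is described)."""
    return round(round(x / y, decimals) * 3, decimals)


print(final_only(1, 3))     # 1/3 is kept at full precision until the end
print(rounded_early(1, 3))  # 1/3 is cut to 0.33 before multiplying
```

Rounding the quotient early loses a hundredth in this example, which is the kind of drift that rounding only the final result avoids.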
1 Lowest
5 Middle of the range
10 Highest
If the text for a category is long, you can also assign an abbreviated version that will be used as the category label in crosstabulations and other similar output. Put the desired abbreviation in brackets before or after the long text for a category. For example:
1 Minimum value expected from the computation [Lowest]
10 Maximum value expected from the computation [Highest]
One reason to specify the seed might be to generate the same series of random numbers on repeated runs for diagnostic or instructional purposes.
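The general idea of a seed can be demonstrated with Python's standard `random` module. This is Python's generator, not the one used by SDA, and the function name `draw` is hypothetical; the point is only that a fixed seed reproduces the same series.

```python
import random


def draw(seed, n=3):
    """Return n uniform random numbers from a generator with a fixed seed."""
    rng = random.Random(seed)  # independent generator, seeded explicitly
    return [rng.uniform(0, 1) for _ in range(n)]


first_run = draw(12345)
second_run = draw(12345)
print(first_run == second_run)  # same seed, same series
```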
Additional Details for Archivists
The location for new variables is specified either in the hypertext archive
specification file (HARC file)
or as an argument to the HSDA program.
If new variables are to be placed in
the SAME directory as the original variables,
the creation of a new variable named 'age', for example,
would overwrite the original variable of the same name.
On the other hand, if the HARC file specifies that new variables
are to be placed in a DIFFERENT directory
(the usual recommended procedure),
the creation of a new variable named 'myrecode', for instance,
would NOT overwrite a variable of the same name in the original
study directory;
however, it would overwrite a variable of the same name in the
directory specified for new variables.
Those programs search for variables in the study directories
specified in the HARC file for each study
in the order given in the HARC file.
If a new variable (located in a separate directory)
has the same name as an original variable,
the analysis programs will locate only the version contained
in the first study directory given in the HARC file.
If the HARC file lists the original study directory first,
the programs would only find the first version of the variable
and not bother to look further.
This means that if you intend to create new versions of original
variables, and retain the same names,
make sure that the HARC file is set up in such a way that the
analysis programs look for variables in the SDA dataset directory
for new variables before looking in the SDA dataset directory for
the original variables.
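The search order described above can be sketched in Python. This is only an illustration of the first-directory-wins rule: the function name `find_variable`, the directory names, and the variable lists are all hypothetical and do not reflect actual HARC file syntax.

```python
def find_variable(name, directories):
    """Return the first directory, in listed order, that holds the variable."""
    for dir_name, variables in directories:
        if name in variables:
            return dir_name  # later directories are not searched
    return None


original = ("original_dir", {"age", "sex"})
new_vars = ("newvars_dir", {"age"})   # a new version of 'age'

# Original directory listed first: the new 'age' is never found.
print(find_variable("age", [original, new_vars]))
# New-variable directory listed first: the new 'age' is found.
print(find_variable("age", [new_vars, original]))
```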
Note that this option for creating and accessing new variables with
the same names as variables in the main archive dataset is available
only if the location of the SDA dataset for new variables is specified
in the HARC file (using the 'OUTSTUDY=' specification).
If the location for new variables is specified as an argument to
the HSDA program, the dataset for new variables will be searched last,
after searching the main archive dataset.
Consequently, the RECODE and COMPUTE programs will not create new
variables with the same names as variables in the main archive dataset,
since they would be inaccessible.
For example: 8, 9, 91-99
If you specify one or more missing-data codes, the first such code specified can be used to assign a value on the new variable for those cases which do not have a valid outcome code. Cases having a missing-data code on the new variable are ordinarily excluded from analyses involving that variable.
The rules used by RECODE or COMPUTE to create the new variable are also included in the descriptive text for the variable.