SDA 3.4 Documentation for COMPUTE


NAME

compute - compute a new variable

USAGE

compute -b filename

DESCRIPTION

COMPUTE creates a new SDA variable by performing calculations on existing variables or by generating random distributions. Users who run this program interactively should see the online help document.

A change to note in version 2.1 is that the user no longer needs to specify the number of decimals to store in the new variable. If the ‘decimals=’ keyword is missing, the new variable is stored with as many decimals as necessary.

To run this program in batch mode it is necessary to prepare a command file, which specifies how the new variable is to be created and the options to use. The name of this batch command file is specified to the program after the ‘-b’ option flag. This document explains how to prepare such a file.

BATCH FILE LAYOUT

The batch file is laid out in separate parts. The algebraic expression that defines the new variable must be the first part, followed by an asterisk (*) on a line by itself. The elements of the other parts can be given in any order. The category labels and the descriptive text can have varying numbers of lines, and each of those parts ends with an asterisk (*) on a line by itself.

The general layout is as follows:

     (Algebraic expression)

     *

     (Input and output specifications)

     CATEGORIES=      [optional]
     (Category text and labels)
     *

     TEXT=            [optional]
     (Descriptive text for the new variable)
     *
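
As a rough illustration of this layout, the Python sketch below assembles a small batch file and builds the ‘compute -b filename’ command line shown under USAGE. The variable names, the label, the file name ‘newvar.txt’, and the use of Python to prepare the file are all assumptions made for the example; the sketch does not run COMPUTE itself.

```python
import os
import tempfile

# Hypothetical batch commands, in the order described above.
batch_lines = [
    "newvar = var1 + var2",        # algebraic expression comes first
    "*",                           # asterisk on a line by itself
    "label=Sum of var1 and var2",  # input and output specifications
    "catlabels=",                  # optional category part
    " 2 Lowest",
    "10 Highest",
    "*",
    "text=",                       # optional descriptive text part
    "Sum of two hypothetical variables.",
    "*",
]

# Write the commands to a file in a scratch directory.
batch_path = os.path.join(tempfile.mkdtemp(), "newvar.txt")
with open(batch_path, "w") as handle:
    handle.write("\n".join(batch_lines) + "\n")

# Command line in the form given under USAGE: compute -b filename
command = ["compute", "-b", batch_path]
print(" ".join(command))
```

The expression and each multi-line part end with their own asterisk line, which is why three asterisk lines appear in this file.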

THE EXPRESSION

The name and the numeric values of the new variable are defined by an algebraic expression. It is important to understand that a new variable is defined by appearing on the left side of an equal sign, and a new variable can appear only once in that position (except in ‘if’ statements).

A complete list of operators and functions that can be used in expressions is given later in this document. A few examples of expressions follow; many more can be found in the section on ‘expressions’ in the online help document.

Simple expressions on one line

     newvar = var1 + var2
     newvar = sqrt(var1)
     newvar = mean.2(var1,var2,var3)
     if (var1 eq 1) newvar = var2

Expressions with If / Else if / Else

     if (var1 eq 1)
          newvar = var3
     else if (var1 eq 2)     [A space after ‘else’ is optional]
          newvar = var4
     else
          newvar = -1
     endif

The ‘ELSE IF’ part can be repeated; ‘ELSE’ can be used only once; both parts are optional.

The words ‘IF’, ‘ELSE IF’, ‘ELSE’, and ‘ENDIF’ should begin on a new line. Note that they can be either in upper or in lower case.

If no ‘ELSE’ part is used, it is possible that some cases will not meet any of the conditions; the new variable will then be set to the specified missing data code for those cases.

There is an implied ‘ENDIF’ at the end of the entire expression. Therefore, the use of ‘ENDIF’ is optional unless there are nested IF-statements.
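
The fall-through rules can be mirrored in a short Python sketch. It follows the If / Else if / Else example above for a single case; the missing-data code 9 and the ‘has_else’ switch are assumptions made for illustration and are not part of COMPUTE.

```python
MD_CODE = 9  # hypothetical missing-data code specified for the new variable


def compute_newvar(var1, var3, var4, has_else=True):
    """Mirror the If / Else if / Else example for one case."""
    if var1 == 1:
        return var3
    elif var1 == 2:
        return var4
    elif has_else:
        return -1
    # No ELSE part and no condition met: use the missing-data code.
    return MD_CODE


print(compute_newvar(1, 10, 20))                  # first condition met
print(compute_newvar(5, 10, 20))                  # ELSE part applies
print(compute_newvar(5, 10, 20, has_else=False))  # no condition met
```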

Expressions with Temporary Variables

Complicated expressions can be specified in steps using temporary variables -- variables with names that begin with ‘$’. These variables only exist while COMPUTE is running.

Each expression using a temporary variable must be on a separate line, before the final line that gives the name of the new variable to be saved. Temporary variables can only be used in assignment statements. They cannot be used in the test portion of an IF-statement.

The following is an example of the use of temporary variables.

     $temp1 = var1 + var2
     $temp2 = var1 * var2
     newvar = $temp1 / $temp2


KEYWORDS FOR COMPUTE SPECIFICATIONS

The specifications other than the expression are given in the form "keyword = something" with one keyword per line. Keywords may be given in any order, either in upper or in lower case. The valid keywords are as follows (with significant characters shown in capital letters):

Keywords Defining Input Variables and Computations (all optional)


Keyword       Possible Specification          Default (if no keyword)
_____________________________________________________________________


STudies=      path of source dataset(s)       Look for input variables
                                                only in current directory

MISSing=      Valid                           Exclude input missing data,
                                                or out of range values

SEED=         seed for random numbers         Use system clock and
                                                process ID.

Keywords Defining the New Variable (all are optional)


Keyword       Possible Specification          Default (if no keyword)
_____________________________________________________________________


OUTSTudy=     path of study for new variable  Current directory

DECimals=     maximum number of decimals      Indefinite number of
                to store                        decimal places

CATlabels=    (precedes lines of category     No category text
                labels - see details below)     or labels

LABEL=        long label for new variable     No long label

OVERwrite=    Yes                             Do not overwrite new var
                                                if it already exists

TEXT=         (precedes lines of descriptive  No descriptive text
                text -- see details below)

MD=           list of invalid codes, ranges   No defined MD codes
              (also used for output value
               if input vars have MD
               -- see below)

MIN=          minimum valid value             No defined minimum

MAX=          maximum valid value             No defined maximum

Other options


Keyword       Possible Specification          Default (if no keyword)
_____________________________________________________________________

DIAGnostics=  yes                             No diagnostic summary of
                                                the new variable

COLorcoding=  yes                             No colored headings in the
                                                diagnostic output

GVARCase=     LOWER or UPPER                  Do not convert all variable
                                                names to lower/upper case

LAnguagefile= Name of file with non-English   English labels on
                labels and messages             output

SAVebatch=    name of directory               No file preserved with batch
                                                commands to create new var
                                                (for interactive version)
                                                The batch file name is the
                                                name of the new variable,
                                                with the suffix ‘.cmp’


ABBREVIATIONS AND REPETITIONS

Most keywords can be abbreviated. Usually only two or three characters are required. Either upper or lower case may be used. The keyword for the category text for the new variable, for instance, can be given as ‘catlabels=’ or ‘CATegories=’ or ‘cat=’. If keywords are repeated, the second specification will override the first.
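
One way to picture abbreviation matching is the Python sketch below. It assumes that only the significant (capitalized) characters are compared and that anything typed after them is ignored, which the acceptance of ‘CATegories=’ for ‘CATlabels=’ suggests but the text does not state as a rule. Only a handful of keywords are included, and the function names are hypothetical.

```python
# A few keywords from the tables above; capitals mark the significant characters.
KEYWORDS = ["STudies", "MISSing", "DECimals", "CATlabels", "OVERwrite"]


def significant_prefix(keyword):
    """Return the leading run of capital letters, in lower case."""
    prefix = ""
    for char in keyword:
        if not char.isupper():
            break
        prefix += char
    return prefix.lower()


def match_keyword(text):
    """Match input such as 'cat=' against the known keywords."""
    name = text.split("=", 1)[0].strip().lower()
    for keyword in KEYWORDS:
        if name.startswith(significant_prefix(keyword)):
            return keyword
    return None


print(match_keyword("cat="))
print(match_keyword("CATegories="))
print(match_keyword("study = /archive/nes96"))
```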

COMMENTS

Anything on a line beginning with "#" is ignored by the batch processor and can therefore be used for comments. Blank lines are also ignored, except as a part of descriptive text.

CATEGORY TEXT AND LABELS

Category text and labels for one or more codes of the new variable can be supplied. First put the ‘CATlabels=’ keyword on a line by itself; then, on a separate line for each code, give the code, followed by one or more spaces or tabs, then the category text and, if desired, a short label in square brackets. (Programs such as TABLES and MEANS will use the short label for a category, if one is available.) Put an asterisk (*) on a line by itself after the last label. For example:

     catlabels=
      0 Lowest value [Low]
      5 Medium
     10 Highest value [High]
     *
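
To make the line format concrete, here is a minimal Python sketch that reads lines like those above into (code, text, short label) triples. The regular expression and the function name are assumptions for illustration; COMPUTE's own parsing rules may differ in details such as how brackets inside the category text are treated.

```python
import re

# code, whitespace, category text, optional short label in square brackets
LABEL_LINE = re.compile(r"^\s*(\S+)\s+(.*?)\s*(?:\[(.*)\])?\s*$")


def parse_category_lines(lines):
    """Collect (code, text, short_label) until the closing asterisk."""
    categories = []
    for line in lines:
        if line.strip() == "*":
            break
        match = LABEL_LINE.match(line)
        if match:
            categories.append(match.groups())
    return categories


example = [
    " 0 Lowest value [Low]",
    " 5 Medium",
    "10 Highest value [High]",
    "*",
]
for code, text, short_label in parse_category_lines(example):
    print(code, text, short_label)
```

A category without a short label, such as code 5 here, comes back with None in the third position.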

DESCRIPTIVE TEXT

Descriptive text may be stored with the new variable. This text can then be displayed when the variable is used in analysis programs or in a codebook. First put the ‘TEXT=’ keyword on a line by itself; then write as many lines of text as you wish to store with the new variable. Put an asterisk (*) on a line by itself after the last line of text.

MISSING DATA ON THE NEW VARIABLE

If the value of the new variable cannot be computed for a case (usually because the input variables have missing data or because there is no ‘else’ after an ‘if’ statement), the output variable will take on one of two values, depending on the options that were specified:

MULTIPLE COMPUTES

COMPUTE commands to create more than one variable can be included in the same batch file. After the first set of commands (expression plus other specifications), put a line beginning with two asterisks (**); then the commands for another new variable can follow. The values of the ‘STudies=’ and ‘OUTSTudy=’ keywords are carried over from the previous set of commands, unless they are respecified.
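
The carry-over rule can be sketched in Python as follows. The sketch splits a batch file at lines beginning with ‘**’ and tracks the study settings from one set of commands to the next. The function names are hypothetical, keywords are matched only by the prefixes ‘st’ and ‘outst’, and descriptive text blocks are not treated specially, so this is a simplification rather than a description of how COMPUTE reads the file.

```python
def split_computes(batch_text):
    """Split a batch file into command sets at lines beginning with '**'."""
    sets, current = [], []
    for line in batch_text.splitlines():
        if line.startswith("**"):
            sets.append(current)
            current = []
        else:
            current.append(line.strip())
    if any(current):
        sets.append(current)
    return sets


def carry_over(sets):
    """Resolve the studies/outstudy settings for each command set."""
    carried = {"studies": None, "outstudy": None}
    resolved = []
    for lines in sets:
        # Keyword lines come after the asterisk that ends the expression.
        keyword_lines = lines[lines.index("*") + 1:] if "*" in lines else []
        for line in keyword_lines:
            key, sep, value = line.partition("=")
            key = key.strip().lower()
            if not sep:
                continue
            if key.startswith("outst"):
                carried["outstudy"] = value.strip()
            elif key.startswith("st"):
                carried["studies"] = value.strip()
        resolved.append(dict(carried))
    return resolved


batch = "\n".join([
    "newvar1 = var1 + var2",
    "*",
    "study = /sda/demostudy",
    "**",
    "newvar2 = var1 * var2",
    "*",
    "label=Product of var1 and var2",
])
settings = carry_over(split_computes(batch))
print(settings[1]["studies"])
```

The second set of commands does not name a study, so it inherits the path given in the first set.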

BACKWARD COMPATIBILITY

COMPUTE can read most older CSA compute commands. The following keywords are still recognized and are equivalent to the new keywords shown in parentheses:

OPERATORS USED IN THE EXPRESSION

Arithmetic operators

+ - * /
Addition, subtraction, multiplication, division

^
Power
for example: newvar = var1^2
(‘newvar’ is the square of ‘var1’)

-
Unary ‘-’ (negative of a variable or expression)
for example: newvar = -var1
(‘newvar’ is ‘var1’ with the opposite sign)

()
Parentheses are used to alter (or clarify) the usual order of evaluation.
Precedence of operators, from highest to lowest: functions, unary -, ^, * and /, + and -. Operators at the same level are evaluated from left to right.
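
The listed order places unary ‘-’ above ‘^’, which differs from some programming languages. The Python sketch below contrasts the two possible readings of an expression such as ‘-var1^2’. It only illustrates the arithmetic and assumes the listed order is applied literally, which has not been checked against COMPUTE itself; the value 3.0 is arbitrary.

```python
var1 = 3.0

# Reading with unary minus applied before the power (the order listed above).
minus_first = (-var1) ** 2

# Reading with the power applied first (Python's own rule for -var1 ** 2).
power_first = -(var1 ** 2)

print(minus_first, power_first)
```

When the intended reading matters, explicit parentheses in the expression avoid any dependence on the precedence rules.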

Logical operators to use with If / Else if

The arguments ‘x’ or ‘y’ stand for either an existing SDA variable, a constant, or another expression.

Operator / Meaning / Example

EQ
equal to
if (x eq y) newvar = 1

NE
not equal to
if (x ne y) newvar = 1

GT
greater than
if (x gt y) newvar = 1

GE
greater than or equal to
if (x ge y) newvar = 1

LT
less than
if (x lt y) newvar = 1

LE
less than or equal to
if (x le y) newvar = 1

AND
logical AND
if (x lt 2 AND y lt 2) newvar = 1

OR
logical OR
if (x lt 2 OR y lt 2) newvar = 1

These operators can be in upper or lower case.

FUNCTIONS USED IN THE EXPRESSION

The functions listed below are recognized in expressions by the COMPUTE program. The name of each function can be given in either upper or lower case.

The arguments ‘a’ or ‘b’ stand for a specific constant (2 or 4.5, for example). The arguments ‘x’ or ‘y’ stand for either an existing SDA variable, a temporary variable, a constant, or another expression.

Arithmetic Functions

ABS(x)
Absolute value

EXP(x)
Exponential function (antilog), e^x

LOG(x) or LN(x)
Natural logarithm

LOG10(x) or LG10(x)
Logarithm - base 10

MOD(x,a)
Modulus (remainder) of ‘x’ divided by ‘a’ (e.g., mod(5,2) equals 1)

ROUND(x) or RND(x)
Round off (e.g., round(2.5) equals 3)

SQRT(x)
Square root

TRUNC(x)
Truncate (e.g., trunc(2.5) equals 2)
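
Readers who check results in another language should note that rounding conventions vary. The Python sketch below reproduces the three worked values given above (mod, round, trunc). The helper ‘round_half_up’ is a hypothetical name, and it is only meant for non-negative values, since the text does not say how negative halves are rounded.

```python
import math


def round_half_up(x):
    """Round so that 2.5 becomes 3, as in the ROUND example (non-negative x)."""
    return math.floor(x + 0.5)


print(5 % 2)               # remainder, as in mod(5,2)
print(round_half_up(2.5))  # matches the ROUND example
print(round(2.5))          # Python's built-in rounds halves to the even neighbor
print(math.trunc(2.5))     # matches the TRUNC example
```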

Summaries of Variables

MEAN.n (x,y,...)
Mean of the given variables

SUM.n (x,y,...)
Sum of the given variables

MIN.n (x,y,...)
Minimum value of the given variables

MAX.n (x,y,...)
Maximum value of the given variables

Note: the ‘.n’ part of the function name is optional. If used, it tells the function that at least ‘n’ of the given variables must have valid data for a case; otherwise the function returns the missing-data code. The default value for ‘n’ is 1.

For example, ‘mean(var1,var2,var3)’ will generate the mean of the three variables, even if only one of the three has a valid code. On the other hand ‘mean.2(var1,var2,var3)’ will generate a mean for a specific case only if at least two of the variables have valid codes.
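
The effect of ‘.n’ can be sketched in Python for a single case. Here None stands in for a missing-data code, and the helper names ‘summarize’ and ‘mean’ are hypothetical; the sketch illustrates the rule described above and is not COMPUTE's own code.

```python
def summarize(values, func, n=1):
    """Apply func to the valid values if at least n of them are valid."""
    valid = [v for v in values if v is not None]
    if len(valid) < n:
        return None  # stands in for the missing-data code
    return func(valid)


def mean(valid):
    return sum(valid) / len(valid)


case = [4, None, None]            # only one of three variables is valid
print(summarize(case, mean))      # like mean(var1,var2,var3)
print(summarize(case, mean, 2))   # like mean.2(var1,var2,var3)
print(summarize([4, 6, None], sum, 2))
```

The same helper covers SUM.n, MIN.n, and MAX.n by passing ‘sum’, ‘min’, or ‘max’ as the function.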


Other Summaries

COUNT(x,y(a-b))
Number of variables with values between ‘a’ and ‘b’. (You can specify a different value or a different range for each variable; for example:
count(var1(1), var2(1-3), var3, var4(5-7)).
In the above example, the range ‘5-7’ applies to var3 as well as to var4; the last variable in the list MUST have a specified value or range. Missing-data or out-of-range codes are not counted unless the keyword ‘missing=valid’ has been specified.)
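
The way a range applies to the variables listed before it can be sketched in Python. The data structures and the function name are assumptions for illustration: each variable is paired with its (low, high) range, or with None when no range follows it, and None as a data value stands for a missing-data code. Range endpoints are treated as inclusive, and the default behavior (without ‘missing=valid’) is shown.

```python
def count_matches(case, specs):
    """Count the variables whose value falls within the applicable range."""
    # Walk backwards so each variable without a range takes the next one given.
    resolved = []
    pending = None
    for name, value_range in reversed(specs):
        if value_range is not None:
            pending = value_range
        if pending is None:
            raise ValueError("the last variable must have a value or range")
        resolved.append((name, pending))

    total = 0
    for name, (low, high) in resolved:
        value = case[name]
        if value is not None and low <= value <= high:
            total += 1
    return total


# count(var1(1), var2(1-3), var3, var4(5-7))
specs = [("var1", (1, 1)), ("var2", (1, 3)), ("var3", None), ("var4", (5, 7))]
case = {"var1": 1, "var2": 4, "var3": 6, "var4": None}
print(count_matches(case, specs))
```

In this hypothetical case, var1 and var3 are counted, var2 falls outside its range, and var4 is missing.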

CUM(x)
Cumulate the value of ‘x’ from one case to the next. (The first case is just the value of ‘x’ for that case; subsequent cases keep adding the value of ‘x’. If ‘x’ for a case is a missing-data code, and if the keyword ‘missing=valid’ has NOT been specified, the cumulative value is the same as for the preceding case; cumulation resumes with the next case.)
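
A minimal Python sketch of this running total follows, with None standing in for a missing-data code and the default behavior (without ‘missing=valid’) shown. The function name is hypothetical. The text does not say what happens when the very first case is missing; the sketch reports 0 there, which is an assumption.

```python
def cumulate(values):
    """Running total; a missing value (None) repeats the previous total."""
    totals = []
    running = 0
    for value in values:
        if value is not None:
            running += value
        totals.append(running)
    return totals


print(cumulate([2, 3, None, 5]))
```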

MISSING (x,y,...)
Number of variables with missing-data or out-of-range values.

Random Distribution Functions

UNIFORM(x,y)
Uniform distribution between ‘x’ and ‘y’

DUNIFORM(x,y)
Discrete uniform distribution between ‘x’ and ‘y’.
(The result is a whole number.)

NORMAL(x,y)
Normal distribution with mean ‘x’ and standard deviation ‘y’
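
For readers who want a feel for these three distributions, the Python sketch below draws one value from each using the standard ‘random’ module with a fixed seed. The argument values are arbitrary, and treating the endpoints of the discrete uniform range as inclusive is an assumption. Python's generator is unrelated to the one used by COMPUTE, so the same seed will not reproduce COMPUTE's numbers.

```python
import random

rng = random.Random(12121)  # fixed seed so repeated runs give the same draws

u = rng.uniform(0, 1)   # continuous value between 0 and 1
d = rng.randint(1, 6)   # whole number from 1 to 6 (endpoints assumed inclusive)
z = rng.gauss(0, 10)    # normal with mean 0 and standard deviation 10

print(u, d, z)
```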

Trigonometric Functions

SIN(x), COS(x)
Sine and cosine (‘x’ is in radians)

ARCSIN(y) or ARSIN(y)
Arcsine

ARCTAN(y) or ARTAN(y)
Arctangent


EXAMPLES OF BATCH FILES

Basic example

Basic example: compute the sum of 2 variables
(with each variable coded 1-5)

     newvar = spend + spend2
     *
     study = c:\archive\nes96     (PC syntax)
     study = /archive/nes96       (UNIX syntax)
     label=Sum of spend and spend2
     catlabels=
      2 Lowest
      6 Medium
     10 Highest
     *

Multiple complex computes in the same file

(with each set of compute commands separated by ‘**’)

     # 1. Count the number of occurrences of ‘1’

     newvar1 = count(spend, spend2, spend3, spend4 (1) )
     *
     study = /sda/demostudy
     label=Number of ‘spend too much’ in spend - spend4
     overwrite=yes
     text =
     This variable counts the number of times that the code ‘1’
     (for govt. spends too much on this) is recorded.
     *
     **

     # 2. Create a second new variable in this run.
     # Compute the mean of 3 variables; at least 2 must have valid codes.

     newvar2 = mean.2(spend, spend2, spend3)
     *
     label=Average of spend, spend2, and spend3
     md=9
     **

     # 3. Also create a random variable with a normal distribution
     # random variable will have mean=0, standard deviation=10.

     newvar3=normal(0,10)
     *
     label=Random numbers with mean 0,sd 10
     seed= 12121


CSM, UC Berkeley
January 25, 2010