SDA 4.0 Documentation for REGRESS


NAME

regress - multiple regression

USAGE

regress -b batchfile

DESCRIPTION

REGRESS carries out a conventional regression analysis, using ordinary least squares, for specified input variables. A weight variable can be used to give different weights to each case, and filter variables may be used to exclude some of the cases. If a case has missing data on ANY of the specified variables, it is excluded from all the calculations.
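
The exclusion rule described above is ordinary listwise (casewise) deletion. A minimal Python sketch of the idea follows, assuming hypothetical case records in which None marks missing data. It illustrates the rule only and is not the REGRESS implementation.

```python
def listwise_delete(cases, variables):
    """Keep only the cases that have no missing value on any listed variable."""
    return [c for c in cases if all(c.get(v) is not None for v in variables)]


# Hypothetical data: four cases, two of which have a missing value.
cases = [
    {"spend": 120.0, "age": 34, "educ": 12},
    {"spend": None, "age": 51, "educ": 16},    # missing dependent variable
    {"spend": 80.0, "age": 27, "educ": None},  # missing one independent variable
    {"spend": 95.0, "age": 45, "educ": 14},
]

# Only the first and last cases would enter the calculations.
complete = listwise_delete(cases, ["spend", "age", "educ"])
```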

Recodes, dummy variables, and product terms can be generated temporarily within the program itself, so that the user will not have to create such variables before running a regression.

Ordinarily this program is invoked by the Web interface for the SDA programs, and the user does not have to deal with the keywords given in this document. Output from the program is in HTML, which can be viewed with a Web browser.

It is also possible to run the program directly by preparing a batch command file, which specifies the variables to be analyzed and the options to use. This document explains how to prepare such a file. The name of this batch command file is specified to the program after the `-b' option flag.
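
As a rough sketch of this workflow, the Python below writes a small batch command file and builds the matching command line. The file name regress_batch.txt and the helper names are assumptions made for illustration; only the `regress -b batchfile` form comes from this document.

```python
import tempfile
from pathlib import Path


def write_batch_file(directory, specs):
    """Write one 'keyword = value' line per entry and return the file path."""
    path = Path(directory) / "regress_batch.txt"  # hypothetical file name
    lines = [f"{keyword} = {value}" for keyword, value in specs]
    path.write_text("\n".join(lines) + "\n")
    return path


def build_command(batch_path):
    """Return the argument list for 'regress -b batchfile'."""
    return ["regress", "-b", str(batch_path)]


specs = [
    ("dep", "spend"),
    ("indep", "age educ gender"),
    ("savefile", "myregress.htm"),
]

with tempfile.TemporaryDirectory() as tmp:
    batch_path = write_batch_file(tmp, specs)
    batch_text = batch_path.read_text()
    command = build_command(batch_path)
```

If the regress program is installed and on the search path, the resulting list could be handed to `subprocess.run(command)`; the sketch stops short of that so it can run without SDA present.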


KEYWORDS

The batch file contains specifications for the analysis. These specifications are given in the form "keyword = something" with one keyword per line. Keywords may be given in any order, either in upper or in lower case. The valid keywords are as follows (with significant characters shown in capital letters):

Basic Keywords


Keyword       Possible Specification          Default (if no keyword)
_____________________________________________________________________


STUdy=        path(s) of dataset(s)           Look for variables in the
                                                current directory only

DEP=          name of dependent variable      REQUIRED

INDep=        names of independent vars       REQUIRED
              (separated by spaces/commas)

Weight=       name of weight variable         No weighting

Filter=       name(s) and codes of filter     No filter
                variable(s)

STRatum=      name of variable giving         No stratification for
                sample stratum                  computing standard errors
              $1: Force one stratum

CLuster=      name of variable giving         No cluster variable for
                sample cluster                  computing standard errors

NDECimals=    number of decimals for main     3 decimal places
                results (coefficients, SE's)

SAvefile=     filename to receive output      Output sent to screen
                (overwrite existing file)       (standard output)

DUMMYgenmax=  A number between 1 and 100      Max of 25 dummy vars can be
                (max dummy vars)                generated by the "m:" syntax
                                                for a single categorical var

GVARCase=     LOWER or UPPER                  No force to lower/upper case


Display Options


Keyword       Possible Specification          Default (if no keyword)
_____________________________________________________________________

COLORcoding=  Yes                             No color coding of cells
                                                or colored headings

LAnguagefile= Name of file with non-English   English labels on
                labels and messages             output

RUNtitle=     Title or comments for run       No title or comments

SHORTlist=    Yes (omit list of               Output list of all
                indep vars at top)              independent variables

TExt=         Yes                             No text for variables


Other Statistics

In addition to the main results, one or more of the following optional statistics can be displayed. You can specify the desired number of decimal places in parentheses if the defaults, listed below, are not satisfactory.

Note that the `otherstats=' keyword can be repeated on subsequent lines if necessary.


Keyword       Possible Specification          Default (if no keyword)
_____________________________________________________________________


OTHERstats=
              TTests (ndec)                   No T-tests
              FTest (ndec)                    No Global F-test
              UNIvariate (ndec)               No univariate stats
              BPRODuct (ndec)                 No B*coefficient stats
              CORel (ndec)                    No correlation matrix
              COVar (ndec)                    No covariance matrix
              COEFF (ndec)                    No covar of coefficients matrix
              CONF (90, 95, or 99)            No confidence intervals
                                               ('CONF' alone gives 95% CI)


ADDITIONAL INFORMATION

DECIMAL PLACES

Each statistic has a default number of decimal places with which it will be printed. The default for the main results (regression coefficients, standard errors, and confidence intervals) is 3 decimal places. The `NDECimals=' keyword changes the number of decimals output for those statistics.

For the other (optional) statistics, the default numbers of decimals are as follows:

To change the number of decimals for these statistics, put the desired number in parentheses after the name of the statistic. Note that requesting the BPRODUCT statistics forces the output of the univariate statistics as well, and any specification of decimal places for the BPRODUCT statistics overrides the one given for the univariate statistics.
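
To make the parenthesis syntax concrete, here is a small Python sketch that splits an `otherstats=' value into statistic names and optional numbers. The function name and the regular expression are illustrative assumptions, not part of SDA.

```python
import re

# A statistic name, optionally followed by a number in parentheses.
_STAT = re.compile(r"([A-Za-z]+)\s*(?:\(\s*(\d+)\s*\))?")


def parse_otherstats(value):
    """Return (name, number) pairs; number is None when no parentheses are given.

    For most statistics the number is a count of decimal places; for CONF it
    is the confidence level (90, 95, or 99).
    """
    return [
        (m.group(1).lower(), int(m.group(2)) if m.group(2) else None)
        for m in _STAT.finditer(value)
    ]


# Value taken from the last example batch file in this document.
parsed = parse_otherstats("ttests ftest(4) univar(3) correl(3)")
```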

ABBREVIATIONS

Keywords can usually be abbreviated down to the number of characters required to differentiate them from other keywords. The keyword for the name of the dependent variable, for instance, can be given as 'dependent=' or 'dep='. Either upper or lower case may be used. In the list of keywords given above, the minimum set of characters for each keyword is capitalized.
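
One way to picture the abbreviation rule is the Python sketch below. It assumes that a typed keyword is accepted whenever it begins with the capitalized minimum abbreviation, which is consistent with 'dependent=' and 'dep=' both being valid; the actual matching logic inside REGRESS may differ.

```python
# Keyword spellings copied from the keyword tables in this document;
# the leading capitals mark the minimum abbreviation.
KEYWORDS = [
    "STUdy", "DEP", "INDep", "Weight", "Filter", "STRatum", "CLuster",
    "NDECimals", "SAvefile", "DUMMYgenmax", "GVARCase", "COLORcoding",
    "LAnguagefile", "RUNtitle", "SHORTlist", "TExt", "OTHERstats",
]


def minimum_abbreviation(spelling):
    """Return the leading run of capital letters, lowercased."""
    prefix = ""
    for ch in spelling:
        if not ch.isupper():
            break
        prefix += ch
    return prefix.lower()


def resolve_keyword(typed):
    """Map a typed keyword to its lowercased table spelling, or raise ValueError."""
    typed = typed.strip().lower()
    matches = [
        k.lower() for k in KEYWORDS
        if typed.startswith(minimum_abbreviation(k))
    ]
    if len(matches) != 1:
        raise ValueError(f"unknown or ambiguous keyword: {typed!r}")
    return matches[0]
```

Under this assumption, resolve_keyword("dependent") and resolve_keyword("DEP") both resolve to the same keyword.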

COMMENTS

Anything on a line beginning with "#" is ignored by the batch processor and can therefore be used for comments. Blank lines are also ignored.

MENTION OF KEYWORD SUFFICIENT

The form 'keyword=yes' may be shortened to 'keyword'. That is, the '=yes' may be omitted for those options which require no further specification. For example, 'text=yes' can be shortened to 'text'.
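
The comment, blank-line, and bare-keyword rules can be summarized in a short Python sketch. The function is hypothetical, and whether REGRESS tolerates leading spaces before "#" is an assumption made here.

```python
def parse_line(line):
    """Return (keyword, value), or None for blank lines and '#' comments."""
    stripped = line.strip()
    # Assumption: leading whitespace before '#' still counts as a comment line.
    if not stripped or stripped.startswith("#"):
        return None
    keyword, sep, value = stripped.partition("=")
    # A bare keyword such as 'text' is treated like 'text = yes'.
    return keyword.strip().lower(), (value.strip() if sep else "yes")


results = [
    parse_line(text)
    for text in ["# Basic example", "", "     dep = spend", "text"]
]
```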

REPETITION OF KEYWORDS

If there is not enough room on a line to list all of the desired variables, the keyword can be repeated on a new line, and more variables can be listed. In such a case the second list is appended to the first list.

This appending feature applies to the keywords for specifying the independent variables, the filter variables, and the 'otherstats=' keyword. It also applies to the 'study=' keyword, for specifying the locations of the SDA dataset directories. If other keywords are repeated, the program will print an error message and stop.
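
The repetition rule can be sketched in Python as follows. The keyword names in APPENDABLE are the four listed above; the function itself is a hypothetical illustration and assumes keywords have already been reduced to a single lowercase spelling.

```python
# Keywords whose repeated values are appended rather than rejected.
APPENDABLE = {"indep", "filter", "otherstats", "study"}


def collect(pairs):
    """Merge (keyword, value) pairs into one dictionary of specifications.

    Appendable keywords accumulate a list of values in the order given;
    repeating any other keyword raises ValueError.
    """
    specs = {}
    for keyword, value in pairs:
        if keyword in APPENDABLE:
            specs.setdefault(keyword, []).append(value)
        elif keyword in specs:
            raise ValueError(f"keyword repeated: {keyword}")
        else:
            specs[keyword] = value
    return specs


specs = collect([
    ("study", "/sa/testdata"),
    ("study", "/sa/testdata/newvars"),
    ("dep", "spend"),
    ("indep", "age educ"),
    ("indep", "gender"),
])
```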


EXAMPLES OF BATCH FILES

# Basic example
     dep = spend
     indep = age, educ gender

     savefile = myregress.htm
-----------------------------------
# Redefine some ranges, use weight and filter variables,
# and request descriptive text for the variables.

     dep = spend
     indep = age(18-30) educ  gender

     weight= wtvar
     filters= var21(1-3) var30(1)
     text = yes

     savefile = myregress.htm
-----------------------------------
# Specify stratum and cluster variables,
# for calculating complex standard errors.
     dep = spend
     indep = age, educ gender

     stratum = stratvar
     cluster = psuvar

     savefile = myregress.htm
-----------------------------------
# Specify the location of the SDA study datasets
# (necessary if not the current directory).
# Also request some optional statistics, most with a specified number of decimals.
     study = /sa/testdata
     study = /sa/testdata/newvars

     dep = spend
     indep = age educ gender recodedvar

     otherstats = ttests ftest(4) univar(3) correl(3)

     savefile = myregress.htm


CSM, UC Berkeley/ISA
January 30, 2017