SDA 4.0 Documentation for LOGIT

NAME

LOGIT - Logit (Logistic) and Probit Regression

USAGE

logit -b batchfile

DESCRIPTION

LOGIT carries out a logistic or probit regression analysis, using the method of maximum likelihood, for specified input variables. A weight variable can be used to give different weights to each case, and filter variables may be used to exclude some of the cases. If a case has missing data on ANY of the specified variables, it is excluded from all the calculations.

Recodes, dummy variables, and product terms can be generated temporarily within the program itself, so that the user will not have to create such variables before running a regression.

One numeric variable is specified as the dependent variable (the variable to be predicted). To be used as a dependent variable in logit or probit regression, it must be coded to have exactly two categories: 0 and 1. If the variable you want to use is not already coded as a simple 0/1 variable, you can create a dummy variable, or you can recode the variable temporarily. If the dependent variable is left as anything other than a simple 0/1 variable, the program recodes it automatically: the lowest valid score is recoded to '0', and all other scores are recoded to '1'.

Ordinarily this program is invoked by the Web interface for the SDA programs, and the user does not have to deal with the keywords given in this document. Output from the program is generally in HTML, which can be viewed with a Web browser.

It is also possible to run the program directly by preparing a command file, which specifies the variables to be analyzed and the options to use. This document explains how to prepare such a file. The name of this batch command file is specified to the program after the `-b' option flag.
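
As a minimal sketch, a command file needs only the two keywords marked REQUIRED under KEYWORDS below. The variable names are borrowed from the examples at the end of this document and stand in for variables in your own dataset.

     dep = spend(d:1-2)
     indep = age educ gender

If these lines were saved in a file named, say, mylogit.txt (a hypothetical name), the program would be run as `logit -b mylogit.txt'. With no `savefile=' line, the output goes to the screen (standard output).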

KEYWORDS

The batch file contains specifications for the analysis. These specifications are given in the form "keyword = something" with one keyword per line. Keywords may be given in any order, either in upper or in lower case. The valid keywords are as follows (with significant characters shown in capital letters):

Basic Keywords


Keyword       Possible Specification          Default (if no keyword)
_____________________________________________________________________

COefficients= PROBIT                          Calculate LOGIT regression
                                                coefficients and results

STUdy=        path(s) of dataset(s)           Look for variables in
                                                current directory only

SAvefile=     filename to receive output      Output sent to screen
               (overwrites existing file)      (standard output)

DEP=          name of dependent variable      REQUIRED

INDep=        names of independent vars       REQUIRED
              (separated by spaces/commas)

Weight=       name of weight variable         No weighting

Filter=       name(s) and codes of filter     No filter
                variable(s)

STRatum=      name of variable giving         No stratification for
                sample stratum                  computing standard errors
              $1: Force one stratum

CLuster=      name of variable giving         No cluster variable for
                sample cluster                  computing standard errors

GVARCase=     LOWER or UPPER                  No forcing to lower/upper case


DUMMYgenmax=  A number between 1 and 100      Max of 25 dummy vars can be
                (max dummy vars)                generated by the "m:" syntax
                                                for a single categorical var

NDEcimals=    number of decimals for main     3 decimal places
               results (coefficients, SE's)

Display Options


Keyword       Possible Specification          Default (if no keyword)
_____________________________________________________________________

COLORcoding=  Yes                             No color coding of
                                                coefficients or headings

LAnguagefile= Name of file with non-English   English labels on
                labels and messages             output

RUNtitle=     Title or comments for run       No title or comments

SHORTlist=    Yes (omit list of               Output list of all
                indep vars at top)              independent variables

TExt=         Yes                             No text for variables
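
The display options might be combined with a basic run as in the following sketch. The title text and variable names are only illustrative, and this document does not say whether a long title needs any special quoting.

     dep = spend(d:1-2)
     indep = age educ gender

     runtitle = Logit model of spending preferences
     colorcoding = yes
     shortlist = yes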

Other Statistics

In addition to the main results, one or more of the following optional statistics can be displayed. If the product of B times the mean (BPRODuct) is requested, univariate statistics are also included automatically. The 'OTHERstats' keyword can be repeated.

You can specify the desired number of decimal places in parentheses for univariate statistics and 'BPRODuct' if the default, listed below, is not satisfactory. Note, however, that the number of decimals specified for 'BPRODuct' will override the number specified for 'UNIvariate'.


Keyword       Possible Specification          Default (if no keyword)
_____________________________________________________________________

OTHERstats=
              TTests (ndec)                   No T-tests
              EXPB                            No exp(B) for logit
              FTest (ndec)                    No Global F-test
              UNIvariate (ndec)               No univariate statistics
              BPRODuct (ndec)                 No B*Mean statistics
              COEFF (ndec)                    No covariance matrix of coefficients
              CONF (90, 95, or 99)            No confidence intervals
                                               ('CONF' alone gives 95% CI)
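
For instance, the following sketch requests exp(B) and confidence intervals on one line, and t-tests and univariate statistics on a second line. Because `conf' is given without a level, the table above implies 95% intervals. The variable names are illustrative.

     dep = spend(d:1-2)
     indep = age educ gender

     otherstats = expb conf
     otherstats = ttests(2) univariate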

Chart Options

There are several chart options, which take effect only if the chart generation servlet is running on the server computer. Two of the specifications (TBLProperties and CH_URL) are required in order to produce charts.

The statistic charted is each regression coefficient and its confidence interval.


Keyword       Possible Specification          Default (if no keyword)
_____________________________________________________________________

TBLProperties= PATHNAME for chart properties   REQUIRED for charts
                file
               Required location for SDA 4 is:
               SDAROOT/tmpdir/xxx.cht
               where 'SDAROOT' is the pathname
                of the SDA installation on
                your server, and
               where 'xxx' is any name.
                (See the last example below)

               (This is a temporary filename,
                to be passed on to the charting
                servlet.)

CH_URL=         URL of chart-generation        REQUIRED for charts
                 servlet on the server.
                Required URL for SDA 4 is:
                http://SDAURL/sdaweb/charts
                 where 'SDAURL' is the
                 hostname of the SDAWEB
                 application on your server.
                 (See the last example below)

CH_COEFF=       Coefficient to chart            none
                 (B or EXPB or PROBUNIT
                  or PROBSD or NONE)

CH_INDEPVARSmax=Number of independent vars      all
                 to include in the chart
                 (first N variables, where
                 N is an integer)

CH_RANGEOPT=    Set the range of the chart      auto
                 either to:
                 AUTO (set by the program) or
                 CUSTOM (use specified low/high)

   If CH_RANGEOPT=CUSTOM, the following two options
     can be used:

CH_RANGELOW=    Lower bound of the range
                 (can have decimals)

CH_RANGEHIGH=   Upper bound of the range
                 (can have decimals)

CH_FONT=        Font to use in the chart        SansSerif

CH_WIdth=       Width of chart in pixels        600

CH_HEight=      Height of chart in pixels       400
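
A sketch of a chart request with a custom range and a larger chart size is shown here. `SDAROOT' and `SDAURL' are the placeholders explained in the table above and must be replaced with the values for your own server. The range bounds, chart size, file name, and variable names are arbitrary illustrative values.

     dep = spend(d:1-2)
     indep = age educ gender

     tblproperties = SDAROOT/tmpdir/custom.cht
     ch_url = http://SDAURL/sdaweb/charts

     ch_coeff = b
     ch_rangeopt = custom
     ch_rangelow = -1.5
     ch_rangehigh = 1.5
     ch_width = 800
     ch_height = 500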

Technical Options

There are some other options for the maximum likelihood estimation and for ASCII output of results. These options are only available in batch mode and are primarily for testing. They are not accessible from the standard SDA Web interface.


Keyword       Possible Specification          Default (if no keyword)
_____________________________________________________________________

ASCiifile=    Name of file for ascii output   Only HTML output
               (for diagnostic purposes)

MAXIter=      Maximum number of iterations    15

NOVerbose=    Yes                             Report results of each
                                                iteration in the
                                                ASCII output file
                                                (if 'ASCiifile='
                                                 is specified)

TOLerance=    Tolerance for convergence       .0001
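
As an illustration, a test run that allows more iterations than the default, tightens the convergence tolerance, and writes the iteration report to an ASCII file might look like the sketch below. The specific values and file names are hypothetical and are not recommendations.

     dep = spend(d:1-2)
     indep = age educ gender

     maxiter = 30
     tolerance = .00001
     asciifile = mylogit.txt

     savefile = mylogit.htm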

ADDITIONAL INFORMATION

ABBREVIATIONS FOR KEYWORDS

Keywords can usually be abbreviated down to the number of characters required to differentiate them from other keywords. The keyword for the name of the dependent variable, for instance, can be given as `dependent=' or `dep='. Either upper or lower case may be used. In the list of keywords given above, the minimum set of characters for each keyword is capitalized.
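
Going by the capitalized characters in the keyword tables, the following abbreviated lines should be read the same way as their full forms (`dependent=', `independent=', `coefficients=', and `savefile='). This is a sketch with illustrative variable and file names.

     dep = spend(d:1-2)
     ind = age educ gender
     coef = probit
     sa = mylogit.htm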

MENTION OF KEYWORD SUFFICIENT

The form `keyword=yes' may be shortened to `keyword'. That is, the `=yes' may be omitted for those options which require no further specification. For example, `text=yes' can be shortened to `text'.
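
In a batch file, the shortened form is simply the keyword on a line by itself. In this sketch (with illustrative variable names) the last two lines are equivalent to `text = yes' and `shortlist = yes'.

     dep = spend(d:1-2)
     indep = age educ gender

     text
     shortlist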

COMMENTS

Anything on a line beginning with "#" is ignored by the batch processor and can therefore be used for comments. Blank lines are also ignored.
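
A commented batch file might look like the following sketch. The comment wording, dataset path, and variable names are illustrative.

     # Logit run for the spending item
     study = /sa/testdata

     # Model specification
     dep = spend(d:1-2)
     indep = age educ gender

     # Send the HTML output to a file instead of the screen
     savefile = mylogit.htm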

DECIMAL PLACES

The default number of decimal places for all the statistics is 3, except for the variance/covariance matrix of coefficients, which has a default of 6 decimal places. The `NDEcimals=' keyword changes the number of decimals output for the main results (regression coefficients and their standard errors and confidence intervals).

To change the number of decimals for the other (optional) statistics, put the desired number of decimals in parentheses after the name of the statistic. Note that requesting the BPRODuct statistics forces the output of the univariate statistics as well, and a specification of decimal places for BPRODuct overrides any specification of decimal places for the univariate statistics.
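
The following sketch sets 4 decimals for the main results and gives different decimal settings for two optional statistics. Under the override rule described above, the univariate statistics would be expected to follow the 2 decimals given for `bproduct' rather than the 1 decimal given for `univariate'. The variable names and decimal values are illustrative.

     dep = spend(d:1-2)
     indep = age educ gender

     ndecimals = 4
     otherstats = univariate(1) bproduct(2)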

REPETITION OF KEYWORDS

If there is not enough room on a line to list all of the desired variables, the keyword can be repeated on a new line, and more variables can be listed. In such a case the second list is appended to the first list for purposes of the analysis.

This appending feature applies to the keywords for specifying the independent variables, the filter variables, and the 'otherstats=' keyword. It also applies to the 'study=' keyword, for specifying the locations of the SDA dataset directories. If other keywords are repeated, the program will print an error message and stop.
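
For example, in the sketch below the two `indep=' lines together specify three independent variables, and the two `filters=' lines together specify two filter variables. The variable names are illustrative. Repeating a keyword outside this group, such as `dep=', would instead stop the run with an error message.

     dep = spend(d:1-2)

     indep = age educ
     indep = gender

     filters = var21(1-3)
     filters = var30(1)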

EXAMPLES OF BATCH FILES

Basic logistic regression

Specify the dependent variable as a dummy variable.

     study = /sa/testdata

     dep = spend(d:1-2)
     indep = age, educ gender

     savefile = mylogit.htm

Run a probit regression, with t-tests and univariate statistics.

Redefine some ranges; use weight and filter variables; and request descriptive text for the variables.

     dep = spend(d:1-2)
     indep = age(18-30) educ  gender

     coefficients = probit
     otherstats = ttests
     otherstats = univariate

     weight= wtvar
     filters= var21(1-3) var30(1)
     text = yes

     savefile = mylogit.htm

Specify stratum and cluster variables, for calculating complex standard errors.

     dep = spend(d:1-2)
     indep = age, educ gender

     stratum = stratvar
     cluster = psuvar

     savefile = mylogit.htm

Specify the location of the SDA study datasets (necessary if not the current directory).

Also get 90% confidence intervals, and request some optional statistics, most with a specified number of decimals.

     study = /sa/testdata
     study = /sa/testdata/newvars

     dep = spend(d:1-2)
     indep = age educ gender recodedvar

     otherstats = conf(90)
     otherstats = ttests ftest(4) coeff(8) bproduct(2)

     savefile = mylogit.htm

Specifying some chart options

In addition to the two required chart specifications, request that the exponentiated logit coefficients (EXPB) be charted. Also limit the chart to the first 4 independent variables.

     study = /sa/sdatest
     dep = vardep
     indep = spend1 spend2 spend3 spend4 age educ gender


     tblproperties = /var/www/sda/tmpdir/testing.cht
     ch_url=http://sda.berkeley.edu/sdaweb/charts

     ch_coeff = expb
     ch_indepvarsmax = 4

     savefile = mylogit.htm

CSM, UC Berkeley/ISA
June 4, 2018