SDA 4.0 Documentation for MEANS


NAME

means - run tables of means in batch mode

USAGE

means -b filename

DESCRIPTION

MEANS displays the mean value of a dependent variable in a crosstabular format. The means are calculated and displayed within categories defined by the row, column, and control variables. (Only a row variable is necessary.)

Ordinarily this program is invoked by the Web interface for the SDA programs, and the user does not have to deal with the keywords given in this document. Output from the program is in HTML, which can be viewed with a Web browser.

It is also possible to run the program in batch mode by preparing a command file, which specifies the variables to be analyzed and the options to use. This document explains how to prepare such a file. The name of this batch command file is specified to the program after the `-b' option flag.


KEYWORDS


The batch file contains specifications for the analysis. The specifications are given in the form "keyword = something" with one keyword per line. Keywords may be given in any order, either in upper or in lower case. The valid keywords are as follows (with significant characters shown in capital letters):

Basic Specifications for the Tables


Keyword       Possible Specification          Default (if no keyword)
_____________________________________________________________________


STUdy=        path of dataset directory       Look for variables in
                                                current directory only
SAvefile=     filename to receive output      Output sent to screen
                (overwrite existing file)       (standard output)


Variable Specifications

DEPendent=    variable name(s)                REQUIRED
               (separated by spaces/commas)
ROWvar=       variable name(s)                REQUIRED
               (separated by spaces/commas)
COLUMNvar=    variable name(s)                No column variable

CONtrolvar=   variable name(s)                No control variable

Weight=       name of weight variable         No weighting

Filter=       name(s) and codes of filter     No filter
                variable(s)

GVARCase=     LOWER or UPPER                  No force to lower/upper case

STRatum=      name of variable giving         No stratification for
                sample stratum                  computing standard errors
              $1: Force one stratum

CLuster=      name of variable giving         No cluster variable for
                sample cluster                  computing standard errors


General Options

COLORcoding=  Yes                             No color coding of cells
                                                or colored headings

LAnguagefile= pathname of file with           English labels on
                non-English labels              output

NOTABle=      Yes (to suppress tables of      Display the tables
                means, confidence intervals,
                and diagnostic information
                but still get other info)

TExt=         Yes                             No text for variables

RUNtitle=     title or comments for run       No title or comments

Statistics in Each Cell

Main statistic to display in each cell

The main statistic to display in each cell of the table can be one of five options: the means, the totals (which are the numerators of the means), or the transformation of a 0/1 dependent variable into a logit, a probit, or a logit scaled as a probit. The default main statistics to display are the means.

Instead of displaying the main statistic directly, it is possible to display the DIFFERENCE from something else, by adding the `difference=' keyword. The difference for each cell can be the difference between the cell mean and either the overall mean, the mean in the same column of a specified row, or the mean in the same row of a specified column. If a row or column difference is requested, you must also specify the BASE CATEGORY to use for the comparison.

For differences from a specified row or column, it is possible to obtain the average of the differences instead of the difference in the marginal column or row. This option is set in the Global Specifications section for the dataset in the SDA Manager (or in the general section of the HARC file by setting XMEANS=YES).

For each statistic the user can specify the number of desired decimal places (in parentheses, after the name of the statistic). See below for the default number of decimals for each statistic.



Keyword       Possible Specification          Default (if no keyword)
_____________________________________________________________________

MAINstat=     MEANs (ndec)                    Display means, with
              TOTALs (ndec)                     two decimal places
              LOgit  (ndec)
              PRobit (ndec)
              LP (ndec)

DIFference=   Overall (ndec)                  Display main statistic
              Row     (ndec)
              Column  (ndec)

BASEcat=      code for comparison row/column  REQUIRED for row/column
                                                differences

AVGDiffs=     Yes                             No average differences
                                                from a row or column
                                                are displayed
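
As a minimal sketch of how these keywords combine, the fragment below requests means with three decimal places and displays each cell as its difference from the row coded 1. The variable names (vardep, var1, var3) are hypothetical placeholders, and the base category code 1 is assumed to be a valid code of the row variable.

```text
dep = vardep
row = var1
column = var3

# main statistic: means, with 3 decimal places instead of the default 2
# (for a 0/1 dependent variable, logit(3), probit(3), or lp(3) could be
#  requested here instead)
mainstat = means(3)

# display each cell as its difference from the same column of the base row
difference = row(3)
# required for row/column differences: code of the comparison row
basecat = 1
```

If `difference = overall' were used instead, no `basecat=' line would be needed, since the comparison is then with the overall mean.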


Other statistics in each cell

In addition to the main statistic, one or more of the following optional statistics can be displayed in each cell (with the desired number of decimal places in parentheses if the defaults, listed below, are not satisfactory). Note that the 'otherstats=' keyword can be repeated on subsequent lines if necessary.

Keyword       Possible Specification          Default (if no keyword)
_____________________________________________________________________


OTHERSTats=
              MEDIAN (ndec)                   No Median of dep variable

              SER (ndec)                      No standard errors for
                                                simple random sample

              ZSTATistic (ndec)               No Z- or T-statistics

              P (ndec)                        No p-value
               (only for differences
                from a row or a col)

              SD (ndec)                       No standard deviations

              Ncases                          No unweighted N's
              WNcases (ndec)                  No weighted N's


             (for complex samples only)

              SEC (ndec)                      No standard errors for
                                                complex sample design
              DEFT (ndec)                     No design effect


             (for cluster samples only)


              RHO  (ndec)                     No cluster coefficient
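
The following fragment is a sketch of a request for several optional statistics, split across two `otherstats=' lines. The particular selection and the decimal places are chosen only for illustration.

```text
# SRS standard errors with 4 decimals, Z-statistics with the default
# number of decimals, and unweighted N's
otherstats = ser(4) zstat ncases

# a second otherstats line is appended to the first:
# weighted N's with 1 decimal place
otherstats = wncases(1)
```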


Optional tables of statistics

Additional tables of statistics can be generated, if desired.

An ANOVA table can be produced. For simple random samples the ANOVA table and an F-test are produced. For complex samples the F-test is omitted and the only output is the eta-squared statistics, which show descriptively the proportion of the variance of the dependent variable that is explained by the row and column variables and their interaction.

A table with the upper and lower bounds of the confidence interval for the mean (or total or difference) in each cell can be produced. The default level of confidence is the 95 percent level, but the 90 or 99 percent levels can also be specified (in parentheses). The number of decimal places displayed will be the same as requested for the means. If both complex and SRS standard errors have been requested, only the complex standard errors are used for the confidence intervals.

For complex samples, a table with diagnostic information in each cell can also be produced.

A multiple classification analysis (MCA) can be carried out. The default number of decimals is 3, but another number of decimal places can be specified.


Keyword       Possible Specification          Default (if no keyword)
_____________________________________________________________________

ANova=        Yes                             No anova table

OTHERTABles=
              CONFidence(level)               No table with confidence
                (level can be 90,95,or 99)      intervals

              DIAGnostics                     No table with diagnostics

              MCA (ndec)                      No Multiple Classification
                                                Analysis
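
For illustration, the lines below sketch a request for an ANOVA table, confidence intervals at the 99 percent level, and an MCA with two decimal places. The choice of level and decimals is arbitrary.

```text
anova = yes

# 99 percent confidence intervals (the default level would be 95),
# plus a multiple classification analysis with 2 decimals instead of 3
othertables = confidence(99) mca(2)
```

The `diagnostics' table is left out of this sketch because it applies to complex samples only.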


Chart Options

Several chart options are available, provided that the chart generation servlet is running on the server computer. Two of the specifications, TBLProperties= and CH_URL=, are required in order to produce charts.

Keyword       Possible Specification          Default (if no keyword)
_____________________________________________________________________


TBLProperties= PATHNAME for chart properties   REQUIRED for charts
               file
               (This is a temporary file, to
                be passed on to the charting
                servlet.  The MEANS program
                will generate multiple files
                from the given filename, if
                multiple charts are generated
                because a control variable
                was specified or because
                multiple row or column
                variables were specified.)

CH_HIDEpath=    Yes (suppress path to chart    Display the path
                  properties file)

CH_URL=         URL of chart-generation        REQUIRED for charts
                 servlet on the server.

CH_MAXCHarts=   Maximum number of charts to     25
                 create on this run (1-100)

CH_TYPe=        Type of chart to create         bar
                (bar or line)

CH_ORientation= Orientation of BAR charts       vertical
                (vertical or horizontal)

CH_EFfects=     Visual effects for BAR charts   use2D
                (use2D - 2 dimensional;
                 use3D - 3 dimensional)

CH_SHOWMeans=   Yes (put means on the chart)    No means

CH_FONT=        Font to use in charts           SansSerif

CH_COLor=       Yes (create charts in color)    Greyscale charts

CH_BARcolors=   Path for custom palette file    Standard colors
                  for bar charts

CH_LINEcolors=  Path for custom palette file    Standard colors
                  for line charts

CH_WIdth=       Width of chart in pixels        600

CH_HEight=      Height of chart in pixels       400
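
The fragment below sketches how several chart options might be combined. The properties pathname and the servlet URL are placeholders borrowed from the chart example at the end of this document, and the sizes and chart limit are arbitrary values chosen for illustration.

```text
# the two required chart specifications (placeholder path and URL)
tblproperties = /sa/charts/mychartspecs
ch_url = http://sda.berkeley.edu/chartgen

# horizontal 3-dimensional bar charts
ch_type = bar
ch_orientation = horizontal
ch_effects = use3D

# 800 x 500 pixels, and no more than 10 charts on this run
ch_width = 800
ch_height = 500
ch_maxcharts = 10
```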


ADDITIONAL INFORMATION


ABBREVIATIONS

Keywords can usually be abbreviated down to the number of characters required to differentiate them from other keywords. Sometimes only one character is required. The keyword for the weight variable, for instance, can be given as "weight=" or "wei=" or even "w=". Either upper or lower case may be used. In the list of keywords above, the minimum string of characters required for each specification is shown in capital letters.
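
The abbreviation rule can be illustrated with the sketch below, in which the variable names (wtvar, vardep, var1) are hypothetical. The alternative spellings of the weight keyword are shown as comments, because a repeated `weight=' keyword would be treated as an error.

```text
# full keyword
weight = wtvar
# equivalent shorter forms (only one weight line may appear in a file):
#   wei = wtvar
#   w = wtvar

# minimum abbreviations of the two required keywords,
# DEPendent= and ROWvar=
dep = vardep
row = var1
```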

MENTION OF KEYWORD SUFFICIENT

The form `keyword=yes' may be shortened to `keyword'. That is, the `=yes' may be omitted for those options which require no further specification. For example, `text=yes' can be shortened to `text'.

DECIMAL PLACES

Each statistic has a default number of decimal places with which it will be printed. To change the default, put the desired number of decimals in parentheses after specifying the statistic. The statistics affected and their defaults are as follows.

For the main statistics: means(2), totals(0), logit(2), probit(2), logit-scaled-as-probit(2). If differences are displayed, the default number of decimal places is the same as for the main statistic.

For the optional statistics: medians(0), ser(3), zstat or tstat(2), p-value(2), sd(3), wncases(0), sec(3), deft(2), rho(2).

For MCA statistics, the default is 3 decimals.

It is not necessary to request the `mean' statistic unless you want to change the number of decimal places for the mean; unless otherwise specified, the mean is the main statistic that will be displayed.

REPETITION OF KEYWORDS

If there is not enough room on a line to list all of the desired variables, the keyword can be repeated on a new line, and more variables can be listed. In such a case the second list is appended to the first list, for purposes of generating tables. This appending feature applies to the keywords for specifying the dependent, row, column, control, and filter variables, and also to the `otherstats' and the `othertables' keywords. If other keywords are repeated, the program will print an error message and stop.
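
A short sketch of the appending behavior described above, using hypothetical variable names:

```text
row = var1 var2
row = var3 var4
# the two lines above are treated as:  row = var1 var2 var3 var4

otherstats = ser
otherstats = ncases
# likewise treated as:  otherstats = ser ncases
```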

ORDER OF PROCESSING LISTS

When more than one variable is given for the dependent, row, column, or control variable specifications, the tables are produced in the following order: Tables for EACH of the control variables are produced with the FIRST column variable and the FIRST row variable and the FIRST dependent variable. Then the whole list of control variables is processed again for the SECOND column variable and the FIRST row variable and the FIRST dependent variable; and so on until the whole set of column variables has been processed. Then the whole series is repeated for the SECOND row variable; and so on until all the row variables have been used. Finally, the whole series is repeated for each succeeding dependent variable.

Briefly, the variables will cycle in the following order: control, column, row, dependent. All of the tables will be produced using the same weight, filters, and other options.
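
To make the cycling order concrete, the sketch below uses hypothetical variable names (d1, r1, r2, c1, c2, k1, k2) and lists, as comments, the order in which the eight resulting tables would be produced according to the rule just described.

```text
dep = d1
row = r1 r2
column = c1 c2
control = k1 k2

# Order of the tables (control cycles fastest, then column, then row):
#
#   table   dependent   row   column   control
#     1        d1       r1      c1       k1
#     2        d1       r1      c1       k2
#     3        d1       r1      c2       k1
#     4        d1       r1      c2       k2
#     5        d1       r2      c1       k1
#     6        d1       r2      c1       k2
#     7        d1       r2      c2       k1
#     8        d1       r2      c2       k2
```

With a second dependent variable, the same eight tables would then be repeated for that variable.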

COMMENTS

Anything on a line beginning with "#" is ignored by the batch processor and can therefore be used for comments. Blank lines are also ignored.

BACKWARD COMPATIBILITY

Versions prior to SDA 1.2b used 'vertical' and 'horizontal' to specify the 'rowvar' and 'columnvar' variables in the batch command files. Although the older terminology has been superseded, those keywords are still recognized for now as synonymous with the newer 'rowvar' and 'columnvar' specifications.


EXAMPLES OF BATCH FILES


Basic example


     study = /archive/nes84
     dep = vardep
     row = var1
     column = var3

     otherstats = ncases
     anova = yes
     savefile = mymeans.htm

Multiple variables

Specify multiple dependent, row, and column variables, which will generate a table for each combination of the variables. Also redefine some ranges, and use weight and filter variables.

     study = /archive/nes84
     dep = vardep1 vardep2
     row = var1(1-9) var2 var3(0-9)
     column = var3, var4

     weight= wtvar
     filters= var21(1-3) var30(1)

     otherstats = se, ncases
     anova
     savefile = mymeans.htm

Differences from means in a specified column

Calculate the differences (with 3 decimal places) from column 1, together with the standard error and statistical significance of each difference, and request some text options.

     study = /archive/nes94
     dep = vote
     row = party
     column = sex

     diffs = col(3)
     basecat = 1

     otherstats =  se p ncases
     anova

     text
     runtitle= Test run to demonstrate batch mode

     savefile= mymeans.htm

Complex standard errors

Specify stratum and cluster variables for complex standard errors; also request tables of confidence intervals and diagnostics.

     study = /archive/nes94
     dep = vote
     row = party
     column = sex

     stratum = stratvar
     cluster = psuvar
     otherstats =  sec ser deft rho ncases
     othertables = confidence diagnostics

     savefile= mymeans.htm

Specifying some chart options

In addition to the two required chart specifications, request charts in color (instead of grayscale) with means printed on the bars.

     study = /sa/sdatest
     dep = vardep
     row = var1
     column = var3

     savefile = mymeans.htm

     tblproperties = /sa/charts/mychartspecs
     ch_url=http://sda.berkeley.edu/chartgen
     ch_color = yes
     ch_showmeans= yes


CSM, UC Berkeley/ISA
February 13, 2017