Several optional statistics such as medians, percentiles, and standard errors can also be calculated and displayed in each cell of the output table. Note that the standard error option refers only to the mean, not to the median or percentile. Each statistic can be displayed with a specified number of decimal places.
Ordinarily this program is invoked by the Web interface for the SDA programs, and the user does not have to deal with the keywords given in this document. Output from the program is in HTML, which can be viewed with a Web browser.
It is also possible to run the program in batch mode by preparing a command file, which specifies the variables to be analyzed and the options to use. This document explains how to prepare such a file. The name of this batch command file is specified to the program after the `-b' option flag.
Keyword         Possible Specification            Default (if no keyword)
_____________________________________________________________________
STUdy=          path of dataset directory         Look for variables in
                                                  current directory only
SAvefile=       filename to receive output        Output sent to screen
                (overwrite existing file)         (standard output)

Variable Specifications

DEPendent=      variable name(s)                  REQUIRED
                (separated by spaces/commas)
ROWvar=         variable name(s)                  REQUIRED
                (separated by spaces/commas)
COLUMNvar=      variable name(s)                  No column variable
CONtrolvar=     variable name(s)                  No control variable
Weight=         name of weight variable           No weighting
Filter=         name(s) and codes of filter       No filter
                variable(s)
GVARCase=       LOWER or UPPER                    No force to lower/upper case
STRatum=        name of variable giving           No stratification for
                sample stratum                    computing standard errors
                $1: Force one stratum
CLuster=        name of variable giving           No cluster variable for
                sample cluster                    computing standard errors

General Options

COLORcoding=    Yes                               No color coding of cells
                                                  or colored headings
LAnguagefile=   pathname of file with             English labels on
                non-English labels                output
NOTABle=        Yes (to suppress tables of        Display the tables
                means, confidence intervals,
                and diagnostic information
                but still get other info)
TExt=           Yes                               No text for variables
RUNtitle=       title or comments for run         No title or comments
Instead of displaying the main statistic directly, it is possible to display the DIFFERENCE from something else, by adding the `difference=' keyword. The difference for each cell can be the difference between the cell mean and either the overall mean, the mean in the same column of a specified row, or the mean in the same row of a specified column. If a row or column difference is requested, you must also specify the BASE CATEGORY to use for the comparison.
For differences from a specified row or column, it is possible to obtain the average of the differences, instead of the difference in the marginal column or row. This option is set in the Global Specifications section for the dataset in the SDA Manager (or in the general section of the HARC file by setting XMEANS=YES).
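The three kinds of difference described above can be sketched in Python. This is only an illustration, not SDA's own code: the function name `cell_differences`, the nested-dict table layout, and the sign convention (cell mean minus reference mean) are assumptions. The overall mean is passed in by the caller, because computing it requires the cell N's.

```python
def cell_differences(means, kind, base=None, overall=None):
    """Return a table of differences for a table of cell means.

    means[row][col] holds the cell mean (hypothetical layout).
    kind is "overall", "row", or "column".
    base is the base category code, needed for "row" and "column".
    overall is the overall mean, needed for "overall".
    """
    if kind in ("row", "column") and base is None:
        raise ValueError("a base category is required for row/column differences")
    if kind == "overall" and overall is None:
        raise ValueError("the overall mean is required for overall differences")

    result = {}
    for r, row in means.items():
        result[r] = {}
        for c, cell_mean in row.items():
            if kind == "overall":
                reference = overall
            elif kind == "row":
                reference = means[base][c]   # same column, base row
            elif kind == "column":
                reference = means[r][base]   # same row, base column
            else:
                raise ValueError("kind must be overall, row, or column")
            # Assumed sign convention: cell mean minus reference mean.
            result[r][c] = cell_mean - reference
    return result


# Made-up cell means: rows 1-2, columns 1-2.
means = {1: {1: 2.0, 2: 3.0}, 2: {1: 2.5, 2: 4.5}}
by_row = cell_differences(means, "row", base=1)
by_col = cell_differences(means, "column", base=1)
```

With row 1 as the base category, every cell in row 1 becomes zero and each cell in row 2 shows how far its mean lies from the row 1 mean in the same column.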
For each statistic the user can specify the desired number of decimal places (in parentheses, after the name of the statistic). See below for the default number of decimals for each statistic.
Keyword         Possible Specification            Default (if no keyword)
_____________________________________________________________________
MAINstat=       MEANs (ndec)                      Display means, with
                TOTALs (ndec)                     two decimal places
                LOgit (ndec)
                PRobit (ndec)
                LP (ndec)
DIFference=     Overall (ndec)                    Display main statistic
                Row (ndec)
                Column (ndec)
BASEcat=        code for comparison row/column    REQUIRED for row/column
                                                  differences
AVGDiffs=       Yes                               No average differences from
                                                  a row or column are displayed
Keyword         Possible Specification            Default (if no keyword)
_____________________________________________________________________
OTHERSTats=     (EITHER medians OR percentiles can be specified, but not both)
                MEDIAN (ndec)                     No Median of dep variable
                PERCENTile (nth, ndec)            No nth percentile
                MINimum (ndec)                    No minimum value
                MAXimum (ndec)                    No maximum value
                Ncases                            No unweighted N's
                WNcases (ndec)                    No weighted N's

                (statistics for means only)
                SER (ndec)                        No standard errors for
                                                  simple random sample
                ZSTATistic (ndec)                 No Z- or T-statistics
                P (ndec)                          No p-value
                (only for differences
                from a row or a col)
                SD (ndec)                         No standard deviations

                (for complex samples only)
                SEC (ndec)                        No standard errors for
                                                  complex sample design
                DEFT (ndec)                       No design effect

                (for cluster samples only)
                RHO (ndec)                        No cluster coefficient
REMEDIAN=       ASNEEDED or ALWAYS                NEVER: No remedian estimates
                                                  for medians or percentiles
                (see additional information below)
An ANOVA table can be produced. For simple random samples, the ANOVA table and an F-test are produced. For complex samples the F-test is omitted and the only output is the eta-squared statistics, which show descriptively the proportion of the variance of the dependent variable that is explained by the row and column variables and their interaction.
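To make the eta-squared statistic concrete, here is a minimal Python sketch for the simplest case: one grouping variable and unweighted data. Eta-squared is the between-group sum of squares divided by the total sum of squares. The function name `eta_squared` and the data are made up for illustration; the sketch does not cover weights, a second variable, or interactions, and it is not SDA's implementation.

```python
def eta_squared(groups):
    """Proportion of variance explained by one grouping variable.

    groups is a list of lists: the dependent-variable values for each
    category of the grouping variable (unweighted, one-way case only).
    """
    values = [v for group in groups for v in group]
    grand_mean = sum(values) / len(values)

    # Total sum of squares: spread of all values around the grand mean.
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    if ss_total == 0:
        raise ValueError("the dependent variable has no variance")

    # Between-group sum of squares: spread of group means, weighted by group size.
    ss_between = sum(
        len(group) * (sum(group) / len(group) - grand_mean) ** 2
        for group in groups
        if group
    )
    return ss_between / ss_total


# Two made-up categories with three cases each.
eta2 = eta_squared([[1, 2, 3], [4, 5, 6]])
```

For these made-up values the between-group sum of squares is 13.5 and the total is 17.5, so about 77 percent of the variance is associated with the grouping variable.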
A table with the upper and lower bounds of the confidence interval for the mean (or total or difference) in each cell can be produced. The default level of confidence is the 95 percent level, but the 90 or 99 percent levels can also be specified (in parentheses). The number of decimal places displayed will be the same as requested for the means. If both complex and SRS standard errors have been requested, only the complex standard errors are used for the confidence intervals.
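As a rough illustration of how such bounds relate to the mean and its standard error, the following Python sketch uses the normal approximation. That is an assumption: this document does not say whether SDA uses normal or t critical values, and the function name `confidence_interval` is made up.

```python
# Two-sided critical values of the standard normal distribution.
# Assumption: a normal approximation; the program itself may use t-values.
CRITICAL_VALUES = {90: 1.645, 95: 1.960, 99: 2.576}


def confidence_interval(mean, std_error, level=95, ndec=2):
    """Return (lower, upper) bounds, rounded to ndec decimal places."""
    if level not in CRITICAL_VALUES:
        raise ValueError("level must be 90, 95, or 99")
    half_width = CRITICAL_VALUES[level] * std_error
    return round(mean - half_width, ndec), round(mean + half_width, ndec)


# Made-up cell: mean 10.0 with standard error 0.5.
lower, upper = confidence_interval(10.0, 0.5, level=95, ndec=2)
```

Rounding the bounds to `ndec` places mirrors the rule that the interval is displayed with the same number of decimals as the means.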
For complex samples, a table with diagnostic information in each cell can also be produced.
A multiple classification analysis (MCA) can be carried out. The default number of decimals is 3, but another number of decimal places can be specified.
Keyword         Possible Specification            Default (if no keyword)
_____________________________________________________________________
ANova=          Yes                               No anova table
OTHERTABles=    CONFidence(level)                 No table with confidence
                (level can be 90,95,or 99)        intervals
                DIAGnostics                       No table with diagnostics
                MCA (ndec)                        No Multiple Classification
                                                  Analysis
The statistic charted is the statistic specified with the 'MAINSTAT=' keyword (default is MEANS). However, if MEDIANS or PERCENTILES are specified with the 'OTHERSTats=' keyword, the chart can be based on the MEDIAN or PERCENTILE (whichever was specified).
Keyword         Possible Specification            Default (if no keyword)
_____________________________________________________________________
CHARTtype=      PERCENTile                        Chart the 'MAINSTAT' instead
                                                  of MEDIANS or PERCENTILES
TBLProperties=  PATHNAME for chart properties     REQUIRED for charts
                file
                (This is a temporary file, to be passed on to the charting
                servlet. The TABLES program will generate multiple files
                from the given filename, if multiple charts are generated
                because a control variable was specified or because
                multiple row or column variables were specified.)
CH_HIDEpath=    Yes (suppress path to chart       display the path
                properties file)
CH_URL=         URL of chart-generation           REQUIRED for charts
                servlet on the server.
CH_MAXCHarts=   Maximum number of charts to       25
                create on this run (1-100)
CH_TYPe=        Type of chart to create           bar
                (bar or line)
CH_ORientation= Orientation of BAR charts         vertical
                (vertical or horizontal)
CH_EFfects=     Visual effects for BAR charts     use2D
                (use2D - 2 dimensional;
                use3D - 3 dimensional)
CH_SHOWMeans=   Yes (put means on the chart)      No means
CH_FONT=        Font to use in charts             SansSerif
CH_COLor=       Yes (create charts in color)      Greyscale charts
CH_BARcolors=   Path for custom palette file      Standard colors
                for bar charts
CH_LINEcolors=  Path for custom palette file      Standard colors
                for line charts
CH_WIdth=       Width of chart in pixels          600
CH_HEight=      Height of chart in pixels         400
For further information on this method of estimating the median
or percentile, see Peter J. Rousseeuw and Gilbert W. Bassett,
Jr., "The Remedian: A Robust Averaging Method for Large Data
Sets." Journal of the American Statistical Association,
March 1990, vol. 85, pp. 97-104. Note that SDA uses a base
of 101 to calculate the remedian.
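
The general idea of the remedian can be sketched in Python as follows. The data are read once into a buffer of `base` values; when the buffer fills, its median is passed up to the next buffer, and so on. This is a sketch of the published method, not SDA's code: the function name `remedian` is made up, and the handling of leftover values (a weighted median of whatever remains in the buffers, taking the lower value on ties) is an assumption about details this document does not describe.

```python
import statistics


def remedian(values, base=101):
    """Approximate the median of values in one pass with small buffers."""
    buffers = [[]]  # buffers[k] holds medians of base**k original values
    for x in values:
        buffers[0].append(x)
        level = 0
        # When a buffer is full, replace it by its median one level up.
        while len(buffers[level]) == base:
            med = statistics.median(buffers[level])
            buffers[level] = []
            if level + 1 == len(buffers):
                buffers.append([])
            buffers[level + 1].append(med)
            level += 1

    # Leftovers: weight each stored value by how many cases it stands for.
    weighted = sorted(
        (v, base ** k) for k, buf in enumerate(buffers) for v in buf
    )
    if not weighted:
        raise ValueError("no data")
    total = sum(w for _, w in weighted)
    cumulative = 0
    for v, w in weighted:
        cumulative += w
        if 2 * cumulative >= total:  # lower weighted median on ties
            return v


# 101 * 101 made-up values: the result is the median of 101 buffer medians.
estimate = remedian(range(101 * 101))
```

With an odd base such as 101, every buffer median is an actual data value, so no averaging of two middle values is ever needed at the buffer stage.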
Briefly, the variables will cycle in the following order: control, column, row, dependent. All of the tables will be produced using the same weight, filters, and other options.
study = /archive/nes84
dep = vardep
row = var1
column = var3
otherstats = ncases
anova = yes
savefile = mymeans.htm
study = /archive/nes84
dep = vardep1 vardep2
row = var1(1-9) var2 var3(0-9)
column = var3, var4
weight= wtvar
filters= var21(1-3) var30(1)
otherstats = se, ncases
anova
savefile = mymeans.htm
study = /archive/nes94
dep = vote
row = party
column = sex
diffs = col(3)
basecat = 1
otherstats = se p ncases
anova
text
runtitle= Test run to demonstrate batch mode
savefile= mymeans.htm
study = /archive/nes94
dep = vote
row = party
column = sex
stratum = stratvar
cluster = psuvar
otherstats = sec ser deft rho ncases
othertables = confidence diagnostics
savefile= mymeans.htm
study = /sa/sdatest
dep = vardep
row = var1
column = var3
savefile = mymeans.htm
tblproperties = /sa/charts/mychartspecs
ch_url=http://sda.berkeley.edu/chartgen
ch_color = yes
ch_showmeans= yes
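
Command files like the examples above can also be generated by a script. The following Python sketch assumes that each keyword goes on its own line in the form `keyword = value`; the helper name `format_command_file` is made up, and the keywords and values are taken from the first example. How the resulting file is passed to the program (the `-b' option flag) is not shown here.

```python
def format_command_file(options):
    """Render a mapping of keyword -> value as 'keyword = value' lines.

    Assumption: one keyword per line. List values are joined with spaces,
    which this document allows for variable names.
    """
    lines = []
    for keyword, value in options.items():
        if isinstance(value, (list, tuple)):
            value = " ".join(str(v) for v in value)
        lines.append(f"{keyword} = {value}")
    return "\n".join(lines) + "\n"


# Keywords and values copied from the first example in this document.
text = format_command_file({
    "study": "/archive/nes84",
    "dep": "vardep",
    "row": "var1",
    "column": "var3",
    "otherstats": ["ncases"],
    "anova": "yes",
    "savefile": "mymeans.htm",
})
```

The returned text can then be written to a file of your choice with ordinary file output.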