SDA 4.1 Documentation for CORRTAB
NAME
corrtab - crosstabular breakdown of correlations
USAGE
corrtab -b batchfile
DESCRIPTION
CORRTAB displays the correlations between two variables (the X-
variable and the Y-variable) in a crosstabular format. The
correlations are calculated and displayed within categories
defined by the row, column, and control variables. (Only a row
variable is necessary.)
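As a rough illustration of what each cell contains, the following
Python sketch computes a Pearson correlation between X and Y
separately within each category of a single row variable. It is not
CORRTAB's own code: the function names and the sample data are made
up for the example, and weighting, filters, and column or control
variables are left out. A category in which X or Y does not vary
would cause a division by zero here.

```python
import math
from collections import defaultdict

def pearson(xs, ys):
    """Plain (unweighted) Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def corr_by_row(rows, xs, ys):
    """Return {row category: correlation of x and y within that category}."""
    groups = defaultdict(lambda: ([], []))
    for r, x, y in zip(rows, xs, ys):
        groups[r][0].append(x)
        groups[r][1].append(y)
    return {r: pearson(gx, gy) for r, (gx, gy) in groups.items()}

# Hypothetical data: two row categories with opposite relationships.
rows = ["low", "low", "low", "high", "high", "high"]
xs = [1, 2, 3, 1, 2, 3]
ys = [2, 4, 6, 6, 4, 2]
cells = corr_by_row(rows, xs, ys)
```

With these made-up values the "low" category yields a perfect
positive correlation and the "high" category a perfect negative one,
which is the kind of contrast the crosstabular layout is meant to
expose.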
Ordinarily this program is invoked by the Web interface for the
SDA programs, and the user does not have to deal with the
keywords given in this document. Output from the program is in
HTML, which can be viewed with a Web browser.
It is also possible to run the program directly by preparing a
command file, which specifies the variables to be analyzed and
the options to use. This document explains how to prepare such a
file. The name of this batch command file is specified to the
program after the '-b' option flag.
KEYWORDS
The batch file contains specifications for the analysis. These
specifications are given in the form "keyword = something" with
one keyword per line. Keywords may be given in any order, either
in upper or in lower case. The valid keywords are as follows
(with significant characters shown in capital letters):
Keyword         Possible Specification          Default (if no keyword)
_____________________________________________________________________
STUdy=          path of dataset directory       Look for variables in
                                                current directory only
XVAR=           name(s) of 1st variable         REQUIRED
                (separated by spaces/commas)
YVAR=           name(s) of 2nd variable         REQUIRED
ROWvar=         variable name(s)                REQUIRED
                (separated by spaces/commas)
COLUMNvar=      variable name(s)                No column variable
CONtrolvar=     variable name(s)                No control variable
Weight=         name of weight variable         No weighting
Filter=         name(s) and codes of filter     No filter
                variable(s)
COLORcoding=    Yes                             No color coding of
                                                coefficients or headings
GVARCase=       LOWER or UPPER                  No force to lower/upper case
LAnguagefile=   Name of file with non-English   English labels on
                labels and messages             output
RUNtitle=       Title or comments for run       No title or comments
                (1 line only)
SAvefile=       filename to receive output      Output sent to screen
                (overwrite existing file)       (standard output)
TExt=           Yes                             No text for variables
Main Statistic to display
The main statistic to display in each cell of the table can be
one of two options: the Pearson correlation coefficient, or the
log of the odds ratio. The default main statistic is the Pearson
correlation coefficient.
Instead of displaying the main statistic directly, it is possible
to display the DIFFERENCE from something else, by adding the
`difference=' keyword.
For each statistic the user can specify the number of desired
decimal places (in parentheses, after the name of the statistic).
See below for the default number of decimals for each statistic.
Keyword         Possible Specification          Default (if no keyword)
_____________________________________________________________________
MAINstat=       CORR (ndec)                     Display correlations,
                LOGodds (ndec)                  with default number
                                                of decimal places
DIFference=     Overall (ndec)                  Display main statistic
                (diff from overall correlation)
Optional statistics
In addition to the main statistic, one or more of the following
optional statistics can be displayed in each cell (with the
desired number of decimal places in parentheses, if the defaults
listed below are not satisfactory). Note that the 'statistics='
keyword can be repeated on subsequent lines if necessary.
Keyword         Possible Specification          Default (if no keyword)
_____________________________________________________________________
STAtistics=
                SE (ndec)                       No standard errors
                TSTATistic (ndec)               No t-statistic in cells
                Ncases                          No unweighted N's
                WNcases (ndec)                  No weighted N's
ADDITIONAL INFORMATION
ABBREVIATIONS FOR KEYWORDS
Keywords can usually be abbreviated down to the number of
characters required to differentiate them from other keywords.
Sometimes only one character is required. The keyword for the
weight variable, for instance, can be given as "weight=" or
"wei=" or even "w=". Either upper or lower case may be used. In
the list of keywords above, the minimum string of characters
required for each specification is shown in capital letters.
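The abbreviation rule can be sketched in Python as follows. The
minimum abbreviations are copied from the capital letters in the
keyword tables; the matching rule itself (a keyword matches if it
starts with the minimum abbreviation, in either case) is an
assumption made for this example, since the program's actual
matching logic is not documented here. The function name is
hypothetical.

```python
# Minimum abbreviations copied from the capital letters in the keyword tables.
KEYWORDS = {
    "stu": "study", "xvar": "xvar", "yvar": "yvar", "row": "rowvar",
    "column": "columnvar", "con": "controlvar", "w": "weight",
    "f": "filter", "color": "colorcoding", "gvarc": "gvarcase",
    "la": "languagefile", "run": "runtitle", "sa": "savefile",
    "te": "text", "main": "mainstat", "dif": "difference",
    "sta": "statistics",
}

def resolve_keyword(token):
    """Map a possibly abbreviated keyword to its full name, or None."""
    t = token.strip().rstrip("=").strip().lower()
    for minimum, full in KEYWORDS.items():
        # Assumed rule: the token must start with the minimum abbreviation.
        if t.startswith(minimum):
            return full
    return None
```

Under this assumed rule, "w=", "wei=" and "WEIGHT=" all resolve to
the weight keyword, while "st" resolves to nothing because it does
not distinguish STUdy from STAtistics.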
Mention of Keyword Sufficient
The form `keyword=yes' may be shortened to `keyword'. That is,
the `=yes' may be omitted for those options which require no
further specification. For example, `text=yes' can be shortened
to `text'.
CALCULATION OF STANDARD ERRORS
If standard errors are requested, they are computed with the
standard formulas for each statistic or its transformation
(assuming simple random sampling). Note that the confidence
interval for the Pearson correlation coefficient is not
symmetric; therefore, there is no single standard error that
applies in both directions. CORRTAB outputs the average distance
of the upward and the downward confidence band for one standard
error (based on the retransformation of Fisher's Z), since that
number is ordinarily a useful approximation. However, if cell
sizes are small or the correlations of interest are close to zero
or one, this average may not be good enough to make statistical
inferences. In such a case (or when in doubt) use Fisher's
transformation and its associated standard error to carry out
statistical tests on the corresponding Pearson correlations.
Note that the calculation of the standard error of the
correlation coefficient in each cell is based by default on the
UNWEIGHTED number of cases, even if a weight variable has been
used for calculating the correlation coefficient. Ordinarily
this procedure will generate a more appropriate statistical test
than one based on the weighted N in each cell.
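The averaging described above can be sketched in Python. The
sketch assumes the usual textbook standard error of Fisher's Z,
1/sqrt(n - 3), with n taken as the unweighted number of cases in
the cell; whether CORRTAB uses exactly this expression is an
assumption, and the function name is made up for the example.

```python
import math

def approx_se_of_r(r, n):
    """Average half-width of the one-standard-error band around r.

    r: Pearson correlation in the cell (must satisfy -1 < r < 1)
    n: unweighted number of cases in the cell (must exceed 3)
    """
    z = math.atanh(r)                 # Fisher's Z transformation
    se_z = 1.0 / math.sqrt(n - 3)     # textbook standard error of Z
    upper = math.tanh(z + se_z)       # band limits transformed back to the r scale
    lower = math.tanh(z - se_z)
    # Average of the upward and the downward distance from r.
    return ((upper - r) + (r - lower)) / 2.0
```

For r = 0.5 and n = 103 this gives roughly 0.075. As r approaches
1 or -1 the upward and downward distances differ more and more, so
the average hides an increasingly lopsided band.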
COMMENTS
Anything on a line beginning with "#" is ignored by the batch
processor and can therefore be used for comments. Blank lines
are also ignored.
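A minimal Python sketch of how one line of a batch file might be
read is given here. It skips comments and blank lines and treats a
bare keyword as `keyword=yes'. The function name is hypothetical,
and ignoring leading blanks before the "#" is an assumption.

```python
def parse_line(line):
    """Return (keyword, value) for a specification line, or None to skip it."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None                              # blank line or comment
    keyword, sep, value = stripped.partition("=")
    if not sep:
        return keyword.strip().lower(), "yes"    # bare keyword means keyword=yes
    return keyword.strip().lower(), value.strip()
```

For example, parse_line("text") and parse_line("text = yes") both
return ("text", "yes"), and a line starting with "#" returns None.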
DECIMAL PLACES
Each statistic has a default number of decimal places with which
it will be printed. To change the default, put the desired
number of decimals in parentheses after specifying the statistic.
The default numbers of decimal places are as follows:
- Main statistics: 2 decimals (correlations, logs of odds
ratios, and their differences)
- Standard errors: 3 decimals
- T-statistic: 2 decimals
- Weighted N of cases: no decimals
It is not necessary to request the `correlation' main
statistic unless you want to change the number of decimal places.
Unless otherwise specified, the Pearson correlation coefficient
is the statistic that will be displayed.
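The "name (ndec)" form can be sketched in Python as follows. The
defaults are taken from the list above; the function name is
hypothetical, and for simplicity the sketch looks up defaults by
the full statistic name only, without handling abbreviations.

```python
import re

# Defaults taken from the list above; Ncases takes no decimals specification.
DEFAULT_DECIMALS = {"corr": 2, "logodds": 2, "overall": 2,
                    "se": 3, "tstatistic": 2, "wncases": 0}

def parse_stat(spec):
    """Split a specification such as 'SE(4)' into (name, decimals)."""
    m = re.fullmatch(r"\s*([A-Za-z]+)\s*(?:\(\s*(\d+)\s*\))?\s*", spec)
    if m is None:
        raise ValueError(f"cannot parse statistic specification: {spec!r}")
    name = m.group(1).lower()
    # Use the explicit number if given, otherwise fall back to the default.
    ndec = int(m.group(2)) if m.group(2) else DEFAULT_DECIMALS.get(name)
    return name, ndec
```

For example, parse_stat("SE (4)") returns ("se", 4), while
parse_stat("corr") falls back to the default of 2 decimals.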
DICHOTOMIZING VARIABLES
The optional calculation of odds ratios (instead of Pearson
correlation coefficients) assumes that the X-variables and the Y-
variables have only two categories each. If the variables you
want to use as X- and Y- variables are not already coded as
dichotomous variables, you can create dummy variables or you can
recode the variables temporarily. Otherwise CORRTAB will recode
those variables automatically (but only temporarily). The lowest
valid score will be recoded to the value '0', and all other
scores will be recoded to the value '1'.
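The automatic recoding and the resulting statistic can be sketched
in Python. The recoding rule follows the description above; the
use of None for missing data, the function names, and the lack of
any special handling for empty cells are assumptions of this
example, not a description of CORRTAB's internals.

```python
import math

def dichotomize(values):
    """Recode the lowest valid score to 0 and every other valid score to 1."""
    valid = [v for v in values if v is not None]   # None stands in for missing data
    lowest = min(valid)
    return [None if v is None else (0 if v == lowest else 1) for v in values]

def log_odds_ratio(xs, ys):
    """Natural log of the odds ratio for two 0/1 variables."""
    counts = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for x, y in zip(xs, ys):
        if x is not None and y is not None:
            counts[(x, y)] += 1
    # An empty cell makes the ratio undefined; this sketch simply lets the
    # resulting ZeroDivisionError or ValueError propagate.
    return math.log((counts[(0, 0)] * counts[(1, 1)]) /
                    (counts[(0, 1)] * counts[(1, 0)]))
```

For example, dichotomize([1, 3, 2, None, 1]) returns
[0, 1, 1, None, 0]: only the lowest valid score, 1, is recoded to 0.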
ORDER OF PROCESSING LISTS OF VARIABLES
When more than one variable is given for the x, y, row, column,
or control variable specifications, the tables are produced in
the following order: Tables for EACH of the control variables
are produced with the FIRST column variable and the FIRST row
variable and the FIRST pair of x and y variables. Then the whole
list of control variables is processed again for the SECOND
column variable and the FIRST row variable and the FIRST pair of
x and y variables; and so on until the whole set of column
variables has been processed. Then the whole series is repeated
for the SECOND row variable; and so on until all the row
variables have been used. Then the whole series is repeated for
the SECOND Y-variable; and so on until all the Y-variables have
been used. Finally, the whole series is repeated for each
succeeding X-variable.
Briefly, the variables will cycle in the following order:
control, column, row, Yvar, Xvar. All of the tables will be
produced using the same weight, filters, and other options.
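The cycling order amounts to a set of nested loops, with the
X-variable outermost and the control variable innermost. A minimal
Python sketch (the function name is hypothetical, and using None
to stand for an absent column or control variable is an
assumption):

```python
from itertools import product

def table_order(xvars, yvars, rowvars, colvars=(), ctrlvars=()):
    """Yield (x, y, row, column, control) in the order the tables are produced."""
    # A missing column or control list still contributes one (empty) slot.
    cols = list(colvars) or [None]
    ctrls = list(ctrlvars) or [None]
    # product() varies its LAST argument fastest, so control cycles first
    # and the X-variable cycles last.
    yield from product(xvars, yvars, rowvars, cols, ctrls)
```

With two X-variables, one Y-variable, two row variables, one column
variable and two control variables, this yields eight tables, and
the first two differ only in the control variable.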
REPETITION OF KEYWORDS
If there is not enough room on a line to list all of the desired
variables, the keyword can be repeated on a new line, and more
variables can be listed. In such a case the second list is
appended to the first list, for purposes of generating tables.
This appending feature applies to the keywords for specifying the
x and y variables, the row, column, control, and filter
variables, and the `statistics=' keyword. If other keywords are
repeated, the program will print an error message and stop.
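The appending rule can be sketched in Python as follows. The
sketch assumes that keywords have already been expanded to their
full names, and the function and set names are made up for the
example; CORRTAB itself prints an error message and stops, which
the sketch stands in for with an exception.

```python
# Keywords whose repeated lists are appended, per the paragraph above.
APPENDABLE = {"xvar", "yvar", "rowvar", "columnvar", "controlvar",
              "filter", "statistics"}

def collect(pairs):
    """Merge (keyword, value) pairs, appending where repetition is allowed."""
    spec = {}
    for keyword, value in pairs:
        if keyword not in spec:
            spec[keyword] = value
        elif keyword in APPENDABLE:
            spec[keyword] += " " + value     # second list appended to the first
        else:
            raise ValueError(f"keyword may not be repeated: {keyword}")
    return spec
```

For example, two "rowvar" lines listing "var1 var2" and "var3" are
merged into the single list "var1 var2 var3", whereas a second
"weight" line raises an error.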
EXAMPLES OF BATCH FILES
Basic example
study = /sa/nes84
xvar = spend
yvar = spend2
row = education
column = gender
savefile = mytables.htm
Using more options
Specify multiple sets of variables, redefine some ranges,
and use weight and filter variables.
xvar = spend spend2 spend3
yvar = age educ
row = var1(1-9) var2 var3(0-9)
column = var3, var4
weight= wtvar
filters= var21(1-3) var30(1)
savefile = mytables.htm
Differences and other options
Put differences instead of original correlations in each cell,
and request some text options.
xvar = spend
yvar = spend2
row = var1 var2
column = var4 var5
# Display differences (with 3 decimal places) from the overall
# correlation coefficient
differences = overall(3)
# Request that full text of the variables be printed,
# and put a run title or comment on the top of each page
text= yes
runtitle= Test run to demonstrate program
savefile= mytables.htm
CSM, UC Berkeley/ISA
March 18, 2019