If a case has missing data on ANY of the specified variables, by default it is excluded from all the calculations. However, there is an option to exclude cases pairwise -- that is, to calculate each correlation coefficient using all cases having valid data on that PAIR of variables.
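The difference between the two exclusion rules can be sketched in a few lines of Python. This is only an illustration of the idea, not the program's own code: the data, the variable names, the use of `None` to mark missing data, and the helper functions are all hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def correlate(cases, a, b, pairwise=False):
    """Correlate variables a and b; None marks missing data (an assumption).

    Default: a case must have valid data on EVERY variable.
    Pairwise: a case only needs valid data on a and b.
    Returns (r, number of cases used).
    """
    needed = (a, b) if pairwise else tuple(cases[0].keys())
    kept = [c for c in cases if all(c[v] is not None for v in needed)]
    r = pearson([c[a] for c in kept], [c[b] for c in kept])
    return r, len(kept)

# Hypothetical data: each dict is one case.
cases = [
    {"x": 1, "y": 2, "z": 5},
    {"x": 2, "y": 1, "z": None},   # missing on z only
    {"x": 3, "y": 4, "z": 1},
    {"x": 4, "y": 3, "z": 2},
    {"x": 5, "y": None, "z": 4},   # missing on y only
]

r_list, n_list = correlate(cases, "x", "y")
r_pair, n_pair = correlate(cases, "x", "y", pairwise=True)
print(n_list, n_pair)   # cases used under each rule
```

With pairwise exclusion the x-y correlation keeps the case that is missing only on z, so each cell of the matrix can rest on a different number of cases.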
Ordinarily this program is invoked by the Web interface for the SDA programs, and the user does not have to deal with the keywords given in this document. Output from the program is in HTML, which can be viewed with a Web browser.
It is also possible to run the program directly by preparing a batch command file, which specifies the variables to be analyzed and the options to use. This document explains how to prepare such a file. The name of this batch command file is specified to the program after the `-b' option flag.
Keyword         Possible Specification          Default (if no keyword)
_____________________________________________________________________

STUdy=          path(s) of dataset(s)           Look for variables in
                                                current directory only

Vars=           names of vars to correlate      REQUIRED
                (separated by spaces/commas)

Weight=         name of weight variable         No weighting

Filter=         name(s) and codes of filter     No filter
                variable(s)

GVARCase=       LOWER or UPPER                  No force to lower/upper case

MD=             Pairwise                        Cases with any MD are excluded

SAvefile=       filename to receive output      Output sent to screen
                (overwrite existing file)       (standard output)

TExt=           Yes                             No text for variables

LAnguagefile=   Name of file with non-English   English labels on
                labels and messages             output

RUNtitle=       Title or comments for run       No title or comments
For each statistic the user can specify the desired number of decimal places (in parentheses, after the name of the statistic). See below for the default number of decimals for each statistic. Since the default main statistic is the Pearson correlation coefficient, it is not necessary to specify that statistic unless you want to change the number of decimal places displayed.
It is possible to reverse the sign of one or more of the variables. This may be desirable, for example, in order to have all of the expected correlations positive. Then a negative correlation will stand out as being unexpected. If you want to reverse the sign of a variable, give its index position after the 'reverse=' keyword. A variable's index position is its relative position after the 'vars=' keyword.
Keyword         Possible Specification          Default (if no keyword)
_____________________________________________________________________

MAINstat=       CORR (ndec)                     Display correlations,
                LOGodds (ndec)                  with default number of
                                                decimal places

REVerse=        list                            Do not reverse the signs
                (see example below)             of variables
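The effect of reversing signs on a correlation matrix can be sketched as follows. The sketch assumes that reversing a variable is equivalent to multiplying it by -1, so that a correlation changes sign only when exactly one of its two variables is reversed. The function name and the matrix values are hypothetical, and this is not taken from the program itself.

```python
def apply_reversal(matrix, reverse):
    """Flip signs in a square correlation matrix for reversed variables.

    `reverse` holds 1-based index positions, as given after 'reverse='.
    A cell changes sign when exactly one of its two variables is reversed
    (assumption: reversal means multiplying the variable by -1).
    """
    size = len(matrix)
    flip = [-1 if (i + 1) in reverse else 1 for i in range(size)]
    return [[flip[i] * flip[j] * matrix[i][j] for j in range(size)]
            for i in range(size)]

# Hypothetical correlations among three variables.
m = [[1.0, -0.4, 0.3],
     [-0.4, 1.0, -0.2],
     [0.3, -0.2, 1.0]]

out = apply_reversal(m, {2})   # reverse the 2nd variable only
for row in out:
    print(row)
```

After the reversal, every correlation involving the 2nd variable is positive, and the diagonal is unchanged.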
Keyword         Possible Specification          Default (if no keyword)
_____________________________________________________________________

OTHERstats=     SECOR (ndec)                    No standard errors of the
                                                correlations

                (Univariate statistics)
                MEANs (ndec)                    No means
                SD (ndec)                       No standard deviations
                SEVAR (ndec)                    No standard errors
                Ncases                          No unweighted N's
                WNcases (ndec)                  No weighted N's

                (Paired statistics)
                PMEANs (ndec)                   No paired means
                PSD (ndec)                      No paired std devs
                PSEVAR (ndec)                   No paired std errs
                PNcases                         No paired N's
                PWNcases (ndec)                 No paired weighted N's

PSQ=            list1 ; list2 (ndec)            No P-square statistics
                (see below)

Note that the 'otherstats=' keyword can be repeated on subsequent lines if necessary.
The calculation of the standard error of the correlation coefficient in each cell is based by default on the UNWEIGHTED number of cases, even if a weight variable has been used for calculating the correlation coefficient. Ordinarily this procedure will generate a more appropriate statistical test than one based on the weighted N in each cell.
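To see why the choice of N matters, consider the following sketch. It uses a common textbook large-sample formula for the standard error of a Pearson correlation, sqrt((1 - r^2) / (n - 2)). That formula is an assumption made for illustration only; this document does not state which formula the program uses. The sample sizes are hypothetical.

```python
from math import sqrt

def se_corr(r, n):
    """Textbook large-sample standard error of a Pearson r.

    Assumption: illustrative formula, not confirmed to be the one
    the program applies.
    """
    return sqrt((1 - r * r) / (n - 2))

r = 0.30
unweighted_n = 400       # hypothetical number of actual cases
weighted_n = 400_000     # hypothetical weighted N (weights inflate to population size)

se_unweighted = se_corr(r, unweighted_n)
se_weighted = se_corr(r, weighted_n)
print(se_unweighted, se_weighted)
```

If the weights inflate the sample to population size, a standard error based on the weighted N becomes far smaller than the amount of data actually collected can justify, which is why the unweighted N is usually the safer basis for a test.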
For example, if every correlation coefficient in one row is exactly double the corresponding coefficient in another row, the two rows are in constant proportion, and the index will be 1.0. Usually this statistic is used to examine the consistency of the relationships of several items (defining the rows of the matrix) with respect to a number of criterion variables (defining the columns of the matrix). For a discussion of the use of this statistic for creating scales, see Thomas Piazza, "The Analysis of Attitude Items," American Journal of Sociology, vol. 86 (1980) pp. 584-603.
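The idea of constant proportionality between two rows can be illustrated with a squared cosine between the rows, which equals 1.0 exactly when one row is a constant multiple of the other. This measure is swapped in here purely for illustration. It is not claimed to be the formula the program uses for P-square, and the correlation values are hypothetical.

```python
def squared_cosine(row_a, row_b):
    """Squared cosine between two rows of correlations.

    Equals 1.0 when one row is a constant multiple of the other.
    Illustrative stand-in only; not necessarily the P-square formula.
    """
    dot = sum(a * b for a, b in zip(row_a, row_b))
    norm_a = sum(a * a for a in row_a)
    norm_b = sum(b * b for b in row_b)
    return dot * dot / (norm_a * norm_b)

# Hypothetical correlations of three items with three criterion variables.
item1 = [0.10, 0.20, 0.30]
item2 = [0.20, 0.40, 0.60]    # exactly double item1
item3 = [0.30, 0.10, -0.20]   # a different pattern

proportional = squared_cosine(item1, item2)
different = squared_cosine(item1, item3)
print(proportional, different)
```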
The `PSQ=' keyword allows you to specify which items should be used for the rows (list1), and which items should be used as the criterion variables (list2). Each list is a set of numbers, referring to the order in which the variables were specified after the `Vars=' keyword. Each list can consist of single numbers or ranges, separated by commas or blanks. The two lists are separated by a semicolon. An example is given below.
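The way the two lists map back onto variable names can be sketched in Python. The parser below is hypothetical and handles only the forms described above (single numbers, ranges such as 1-4, commas or blanks as separators, a semicolon between the lists, and an optional number of decimals in parentheses at the end). It is not the program's own parser.

```python
import re

def parse_index_list(text):
    """Expand a list such as '1-4' or '1, 3 5-7' into 1-based positions."""
    positions = []
    for token in re.split(r"[,\s]+", text.strip()):
        if not token:
            continue
        if "-" in token:
            low, high = token.split("-")
            positions.extend(range(int(low), int(high) + 1))
        else:
            positions.append(int(token))
    return positions

def parse_psq(spec):
    """Split 'list1 ; list2 (ndec)' into (rows, criteria, ndec or None)."""
    ndec = None
    match = re.search(r"\((\d+)\)\s*$", spec)
    if match:
        ndec = int(match.group(1))
        spec = spec[:match.start()]
    list1, list2 = spec.split(";")
    return parse_index_list(list1), parse_index_list(list2), ndec

# Variables in the order given after 'vars=' (hypothetical names).
variables = "spend spend2 spend3 spend4 age educ sex".split()

rows, criteria, ndec = parse_psq("1-4; 5-7 (3)")
row_names = [variables[i - 1] for i in rows]
criterion_names = [variables[i - 1] for i in criteria]
print(row_names, criterion_names, ndec)
```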
This appending feature applies to the keywords for specifying the variables to be correlated, the filter variables, and the `otherstats=' keyword. It also applies to the 'study=' keyword, for specifying the locations of the SDA dataset directories. If other keywords are repeated, the program will print an error message and stop.
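The rule for repeated keywords can be sketched as a small reader for 'keyword = value' lines. This is a hypothetical illustration only: it expects full lowercase keyword names, does not handle abbreviations, and raises an exception where the program would print an error message and stop.

```python
# Keywords whose repeated occurrences are appended (per the text above).
APPENDABLE = {"vars", "filters", "otherstats", "study"}

def read_commands(text):
    """Collect 'keyword = value' lines into a dict.

    A repeated appendable keyword extends the earlier value;
    any other repeated keyword is treated as an error.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue   # skip blank lines and comments
        key, _, value = line.partition("=")
        key, value = key.strip().lower(), value.strip()
        if key not in settings:
            settings[key] = value
        elif key in APPENDABLE:
            settings[key] += " " + value
        else:
            raise ValueError(f"keyword repeated: {key}")
    return settings

commands = """
# vars is split over two lines
vars = spend spend2
vars = spend3 spend4
savefile = mymatrix.htm
"""
settings = read_commands(commands)
print(settings["vars"])
```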
# Basic example
study = /sa/testdata
vars = spend spend2 spend3 spend4
savefile = mymatrix.htm
-----------------------------------
# Use weight and filter variables, and request some
# univariate statistics and descriptive text for the variables.
vars = spend spend2 spend3 spend4
otherstats = means, ncases
weight= wtvar
filters= age(18-50) gender(1)
text = yes
savefile = mymatrix.htm
-----------------------------------
# Generate a P-square matrix of the four spend variables,
# using age, educ, and sex as the criterion variables.
# Also request 3 decimal places.
vars = spend spend2 spend3 spend4 age educ sex
psq = 1-4; 5-7 (3)
runtitle= Test run to demonstrate P-square stats
savefile= mypsq.htm
-----------------------------------
# Reverse the sign of the correlations involving two of
# the four spending variables -- the 2nd and 4th mentioned
# after the 'vars=' keyword.
vars = spend spend2 spend3 spend4
reverse = 2 4
text
runtitle= Test run to demonstrate reversing signs
savefile= mytest.htm