SDA 4.1 Documentation for XTABLES


xtables - Generate n-way tables


xtables -b filename [-t]


XTABLES is a program that reads SDA variables and generates multi-way tables of frequencies or means. The program can produce up to 10-way tables. (The main interactive SDA TABLES program can only produce up to 3-way tables.)

Output from the program is a text file. The first part of the output gives the results for each cell of the table. The second part of the output (unless suppressed by the NOMETADATA option) gives the metadata, in the form of DDL, for the variables used in the tables.

One line is output for each cell of the table. By default, cells with no actual cases are not output. With the ALLCELLS option, however, all cells of the table are output, provided that each category that defines the cell was actually encountered in the dataset. Categories that fall within the valid range of a variable but never occur in the data are therefore still not output.
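The difference between the default and ALLCELLS behavior can be illustrated with a short Python sketch. This is not the XTABLES implementation; the function name `tabulate` and its `allcells` parameter are hypothetical, and the sketch only mirrors the rule described above: zero-count cells appear only for combinations of categories that were each observed at least once.

```python
from collections import Counter
from itertools import product


def tabulate(rows, allcells=False):
    """Count cases per cell. `rows` is a list of tuples of category codes.

    Hypothetical helper that mimics the documented ALLCELLS rule.
    """
    counts = Counter(rows)
    if not allcells:
        # Default: only cells that contain at least one case.
        return dict(sorted(counts.items()))
    # Categories actually observed for each variable (one list per column).
    observed = [sorted(set(column)) for column in zip(*rows)]
    # Every combination of observed categories, including empty cells.
    return {cell: counts.get(cell, 0) for cell in product(*observed)}


rows = [(1, 1), (1, 3), (3, 1), (3, 1)]
print(tabulate(rows))                 # three non-empty cells
print(tabulate(rows, allcells=True))  # adds the empty cell (3, 3)
```

A code such as 5 that never occurs in `rows` produces no cells in either mode, even if it is a valid code for the variable.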

Examples of program input and output are given below.

This program can be run only in batch mode. It is not invoked through the regular SDA user interface.

Meaning of the option flags:

-b filename
The name of the batch command file is given after the '-b' option flag.

-t
Show timing statistics for the program execution (sent to the "standard error" output device -- usually the screen).


The batch command file contains specifications for the analysis. These specifications are given in the form "keyword = something" with one keyword per line. Keywords may be given in any order, either in upper or in lower case. The valid keywords are as follows (with significant characters shown in capital letters):

Keyword       Possible Specification          Default (if no keyword)

STUdy=        path of dataset directory       Look for variables only in
              (can be repeated)               current directory

INVARs=       name of var(s) to crosstab      REQUIRED
              (can be repeated)

MEANvar=      name of variable for means      Generate frequencies

Weight=       name of weight variable         No weighting

Filter=       name(s) and codes of filter     No filter

SAvefile=     filename to receive output      Output sent to screen
              (overwrite existing file)       (standard output)

ALLCELls=     yes                             Exclude cells with zero cases

MDInclude=    yes                             Exclude missing data codes

NOMETAdata=   yes                             Output metadata (DDL specs) for
                                              the variables in the table

Note that it is enough to mention the keywords ALLCELLS, MDINCLUDE, and NOMETADATA. They will be activated whether or not they are followed by '= yes'.
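As an illustration of the keyword rules above, here is a minimal Python sketch of a reader for such a batch command file. It is not part of SDA. The names `resolve` and `parse_batch` are hypothetical, and two behaviors are assumptions rather than documented facts: that a keyword may be abbreviated to any prefix containing at least its capitalized characters, and that lines starting with '#' are comments (as in the examples in this document).

```python
# Keywords as listed in the table; capitals mark the significant characters.
KEYWORDS = ["STUdy", "INVARs", "MEANvar", "Weight", "Filter",
            "SAvefile", "ALLCELls", "MDInclude", "NOMETAdata"]
FLAGS = {"ALLCELLS", "MDINCLUDE", "NOMETADATA"}   # active when merely mentioned
REPEATABLE = {"STUDY", "INVARS"}                  # "can be repeated"


def resolve(word):
    """Map a possibly abbreviated keyword to its full upper-case name.

    Assumption: any prefix that includes the capitalized part is accepted.
    """
    w = word.upper()
    for kw in KEYWORDS:
        minimum = "".join(c for c in kw if c.isupper())
        if w.startswith(minimum) and kw.upper().startswith(w):
            return kw.upper()
    raise ValueError(f"unknown keyword: {word!r}")


def parse_batch(text):
    """Return a dict of specifications from 'keyword = something' lines."""
    spec = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):   # assumed comment syntax
            continue
        key, _, value = line.partition("=")
        name = resolve(key.strip())
        value = value.strip()
        if name in FLAGS:
            spec[name] = True                  # '= yes' is optional
        elif name in REPEATABLE:
            spec.setdefault(name, []).append(value)
        else:
            spec[name] = value
    return spec


print(parse_batch("invar = ideo\ninvar = spend\nallcells\n"))
```

Keywords are matched without regard to case and may appear in any order, which matches the description of the batch file format.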


Example 1 -- 2-way crosstabulation

This example is for a 2-way crosstabulation of the variables 'ideo' by 'spend'. The SDA dataset is in the directory '/sa/sdatest'. A weight variable 'casewt' is used. The table is limited to cases with a code of 1 (male) on 'gender'. The results will be stored in the file 'out1.txt' and will overwrite that file if it already exists.

Input commands

# Location of SDA study (relative or absolute path)
study = /sa/sdatest

# Input variables - from 1 to 10 can be specified.
invar = ideo
invar = spend

# Weight variable name
weight = casewt

# Filter specification. Multiple filters with multiple
# codes and/or ranges can be specified
filter = gender(1)

# Filename for output (default is "standard output")
savefile = out1.txt

Output from the preceding batch file (example 1)

CasesTotal = 1113
CasesValid = 473
InvarCount = 2
Invar = ideo
Invar = spend
Weight = casewt
Filter = gender(1)

# Contents of each cell:
# Input variable values, N, weighted N
CellsStart
1 1 : 117 115.823
1 3 : 41 41.369
1 5 : 4 3.206
3 1 : 58 66.546
3 3 : 66 81.190
3 5 : 14 12.505
5 1 : 96 94.551
5 3 : 70 73.224
5 5 : 7 5.772
CellsEnd
CellsTotal = 9

DDLStart
*
name = ideo
label = Political ideology in general
type = numeric
record = 1
column = 64
width = 1
max = 5
md = 8,9
catlabels =
1 Liberal
3 Conservative [Conserv]
5 Moderate
7 Never think of myself in those terms [Not labl]
8 Don't know [DK]
9 Refused
text =
In general, when it comes to politics, do you usually think of
yourself as a liberal, a conservative, a moderate, or what?
*
name = spend
label = Military spending
type = numeric
record = 1
column = 40
width = 1
md = 8,9
catlabels =
1 Too much
3 About right [Abt ok]
5 Too little [Too litl]
8 Don't know [DK]
9 Refused
text =
This country faces many problems, none of which can be solved
easily or inexpensively. I'm going to name some of these problems.
For each one, please tell me whether you think we're spending too
much money on them, too little money, or about the right amount.
First, how about spending on the military, armaments, and defense?
*
name = casewt
label = Overall sampling weight
type = numeric
record = 1
column = 11
width = 6
decimals = 3
text =
(Overall sampling weight. This weight adjusts for sampling stratum,
number of adults in the selected household, and the number of
telephone lines into the selected household. The weight is scaled
so that the total number of weighted cases equals the number of
unweighted cases -- 1113.)
*
name = gender
label = Gender of respondent
type = numeric
record = 1
column = 28
width = 1
catlabels =
1 Male
2 Female
text =
CODE OR ASK AS NEEDED: What sex are you?
*
DDLEnd

Note in the output example above that the lines for the individual cells start with the 'invar' (input variable) code values for the cell. The value in the first column is for the first 'invar' ('ideo'). The second column is for the second 'invar' ('spend'), and so on if there are more than two input variables.

Then a colon is inserted to separate the 'invar' code values from the cell counts.

The first number after the colon is the unweighted N. The second number is the weighted N. (If no weight variable is specified, the weighted N will just be the same as the unweighted N.)

In the output example shown above, the first cell has a value of 1 on the variable 'ideo' and a value of 1 on 'spend'. There are 117 male cases with that combination of codes, and the weighted N of cases with that combination (using the weight variable 'casewt') is 115.823.
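The cell-line layout described above (code values, a colon, then the unweighted and weighted N) is straightforward to read back into another program. The following Python sketch shows one way to do it. The function name `parse_cells` is hypothetical, and the sketch assumes that category codes are integers, as they are in this example, and that the cell lines sit between the 'CellsStart' and 'CellsEnd' markers, one cell per line.

```python
def parse_cells(output_text):
    """Return a list of (codes, n, weighted_n) tuples from XTABLES output.

    Hypothetical helper; assumes integer category codes.
    """
    cells = []
    inside = False
    for line in output_text.splitlines():
        line = line.strip()
        if line == "CellsStart":
            inside = True
        elif line == "CellsEnd":
            break
        elif inside and line:
            # Left of the colon: one code per 'invar'. Right: the counts.
            codes, _, counts = line.partition(":")
            n, weighted_n = counts.split()[:2]
            cells.append((tuple(int(c) for c in codes.split()),
                          int(n), float(weighted_n)))
    return cells


# Excerpt of the cell section from example 1.
sample = """CellsStart
1 1 : 117 115.823
1 3 : 41 41.369
CellsEnd"""

for codes, n, weighted_n in parse_cells(sample):
    print(codes, n, weighted_n)
```

Because only the first two numbers after the colon are read, the same function also accepts the longer cell lines produced when a 'meanvar' is specified.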

Example 2 -- Generate means, and temporarily recode an input variable

If the 'meanvar' keyword is specified in the batch file, the output format is somewhat different. Here is an example with a 'meanvar' and also with the 'age' invar temporarily recoded into 3 categories:

Input commands

study = /sa/sdatest

# If a 'meanvar' is specified, the output for each cell includes
# not only counts, but also the mean for that variable in each cell.

meanvar = spend

invar = ideo
# Age is temporarily recoded into 3 categories

invar = age(r:1=*-30;2=31-50;3=51-*)

weight = casewt
filter = gender(1)

# Suppress metadata (DDL specifications) in the output.
nometadata = yes
savefile = out2.txt

Output from the preceding batch file (example 2)

CasesTotal = 1113
CasesValid = 472
InvarCount = 2
Invar = ideo
Invar = age(r: 1 = *-30; 2 = 31-50; 3 = 51-*)
Meanvar = spend
Weight = casewt
Filter = gender(1)

# Contents of each cell:
# Input variable values, N, weighted N, Numerator, Mean
CellsStart
1 1 : 55 60.771 104.063 1.7123792598
1 2 : 78 72.049 99.307 1.3783258616
1 3 : 29 27.578 52.590 1.9069548191
3 1 : 45 56.227 147.513 2.6235260640
3 2 : 49 49.975 109.837 2.1978389195
3 3 : 43 52.756 111.442 2.1124042763
5 1 : 49 55.425 103.743 1.8717726658
5 2 : 85 81.884 156.926 1.9164427727
5 3 : 39 36.238 82.414 2.2742425079
CellsEnd
CellsTotal = 9

Note that each line documenting a cell's contents now has two extra pieces of information: the numerator used to compute the mean of the specified 'meanvar' ('spend'), and the mean itself (calculated by dividing the numerator by the weighted N in each cell).
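The relationship between the last three numbers on each cell line can be checked with a few lines of Python. This is only a sketch: the function name `check_mean` and its `tolerance` parameter are hypothetical, and a small tolerance is needed because the weighted N and the numerator are printed with only three decimals.

```python
def check_mean(cell_line, tolerance=1e-3):
    """Return True if mean == numerator / weighted N, within a tolerance.

    Hypothetical helper for cell lines produced with a 'meanvar'.
    """
    _, _, stats = cell_line.partition(":")
    n, weighted_n, numerator, mean = (float(x) for x in stats.split())
    return abs(numerator / weighted_n - mean) <= tolerance


# Two cell lines taken from the example 2 output.
print(check_mean("1 1 : 55 60.771 104.063 1.7123792598"))
print(check_mean("1 3 : 29 27.578 52.590 1.9069548191"))
```

For the first cell, 104.063 divided by 60.771 is approximately 1.7124, which agrees with the printed mean.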


DDL       Data Description Language used by SDA programs
tables    Main SDA crosstabulation program

CSM, UC Berkeley/ISA
March 18, 2019