SDA 3.5 Documentation for SUBSET


subset - Make a subset of an SDA dataset


subset -b filename


The subset program can create a data file that contains only a subset of the variables and/or cases in an SDA dataset. The program also generates a matching DDL file.

The output data file is an ASCII fixed-column file with one record per case, with an optional delimiter (blank or comma) between variables on the same record. If the selected output format is comma-separated values (CSV), a header record with the variable names is also output. For more information on the various output data formats, see the appropriate section in the online help file for the subset program.
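
Because the CSV format carries a header record with the variable names, the output can be read by name with a standard CSV reader. The following is only a sketch in Python, not part of SDA; the sample records and the helper name read_subset_csv are made up for illustration.

```python
import csv
import io

# Hypothetical CSV output: these variable names and values are invented.
sample = "CASEID,age,educ\n1,34,12\n2,51,16\n"

def read_subset_csv(stream):
    """Return the cases as a list of dicts keyed by variable name."""
    # DictReader uses the first record (the header) as the field names.
    return list(csv.DictReader(stream))

rows = read_subset_csv(io.StringIO(sample))
```

All values come back as strings, so numeric variables would still need to be converted by the caller.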

Ordinarily this program is invoked by the Web interface for the SDA programs, and the user does not have to deal with the keywords given in this document. However, it is possible to run this procedure directly in batch mode, by preparing a command file which specifies the variables to be included in the subset and the options to use. This document explains how to prepare such a file.

Meaning of the ‘-b’ flag

-b filename
The name of the batch command file is specified to the program after the ‘-b’ flag.


The batch command file contains specifications for the subset. These specifications are given in the form "keyword = something" with one keyword per line. Keywords may be given in any order, either in upper or in lower case. The valid keywords are as follows (with significant characters shown in capital letters):


Keyword       Possible Specification          Default (if no keyword)

Specify ONE of the following two data sources

STUdy=        path of dataset directory       Required for subset from
                (can be repeated)               SDA dataset

INDATa=       path of ASCII data file         No subset from ASCII file
INDDL=        path of DDL file                No subset from ASCII file
                (specify both data and DDL)

Other specifications

Filter=       name(s) and codes of filter     No filter

VARList=      path of file with list of       REQUIRED
                variables to output

TYPE=         type of data file to produce    TEXT
               -TEXT: text file, no blanks
               -TEXTBL: text file with
                 blanks between vars
               -CSV: comma separated values

OUTDATa=      filename for output data        outdata.txt
               (overwrite existing file)

OUTDDL=       filename for output DDL         outddl.txt
               (overwrite existing file)

WEBMSGfile=   filename to capture output      No record of user messages
                displayed to Web users of
                the subset procedure
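
To illustrate the "keyword = something" format and the significant characters listed in the table above, here is a rough Python sketch of how such a file could be read. It is not the parser used by the subset program. The matching rule (the typed keyword must start with the capitalized characters and must not run past the full keyword) is an assumption, as are the function names. Since the table capitalizes only the first letter of Filter, the sketch conservatively requires the full word.

```python
# (significant prefix, full keyword) pairs taken from the table above.
# Assumption: FILTER must be spelled out in full.
KEYWORDS = [
    ("STU", "STUDY"), ("INDAT", "INDATA"), ("INDDL", "INDDL"),
    ("FILTER", "FILTER"), ("VARL", "VARLIST"), ("TYPE", "TYPE"),
    ("OUTDAT", "OUTDATA"), ("OUTDDL", "OUTDDL"), ("WEBMSG", "WEBMSGFILE"),
]

def resolve_keyword(word):
    """Map a keyword as typed (any case) to its full name, or None."""
    typed = word.strip().upper()
    for prefix, full in KEYWORDS:
        # Assumed rule: at least the significant prefix, at most the full name.
        if typed.startswith(prefix) and full.startswith(typed):
            return full
    return None

def parse_command_file(text):
    """Return (keyword, value) pairs in file order, keeping repeats."""
    specs = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, sep, value = line.partition("=")
        keyword = resolve_keyword(name)
        if not sep or keyword is None:
            raise ValueError(f"unrecognized line: {line!r}")
        specs.append((keyword, value.strip()))
    return specs
```

A list of pairs is used instead of a dictionary so that a repeated keyword such as STUdy= is kept rather than overwritten.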



Simple example of a batch command file

The SDA dataset is in the directory ‘/sa/sdatest’. If the output data and DDL files already exist, they will be overwritten.

     study = /sa/sdatest
     varlist = mylist.txt
     outdata = mydata.txt
     outddl = myddl.txt
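
A command file like the one above can also be generated by a script. The Python sketch below builds the same four lines and the matching command line. The helper names and the file name subset.cmd are hypothetical, and the sketch assumes the subset program is on the PATH.

```python
def build_command_file(specs):
    """Render (keyword, value) pairs as 'keyword = value' lines."""
    return "".join(f"{keyword} = {value}\n" for keyword, value in specs)

def subset_argv(command_path):
    """Command line for a batch run; 'subset' is assumed to be on the PATH."""
    return ["subset", "-b", command_path]

specs = [
    ("study", "/sa/sdatest"),
    ("varlist", "mylist.txt"),
    ("outdata", "mydata.txt"),
    ("outddl", "myddl.txt"),
]
text = build_command_file(specs)

# To run the subset: write `text` to a file (subset.cmd is a made-up name),
# then call subprocess.run(subset_argv("subset.cmd"), check=True).
```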

Example of a file containing a variable list

(Variable names are separated by spaces, commas, or newlines.)
     CASEID age educ gender
     spend, spend2, spend3 spend4
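
The separator rule can be checked with a few lines of Python. This is only a sketch of the rule as described here, not the program's own reader, and the function name is hypothetical.

```python
import re

def parse_variable_list(text):
    """Split on any run of whitespace and/or commas, dropping empty items."""
    return [name for name in re.split(r"[\s,]+", text) if name]

example = "     CASEID age educ gender\n     spend, spend2, spend3 spend4\n"
names = parse_variable_list(example)
# names lists the eight variables in order, from CASEID to spend4
```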

CSM, UC Berkeley
April 12, 2011