This file contains the online help available from inside the SDA subset program.
The values of each variable are placed into adjacent columns in the data record. By default, there are no extra blanks or commas between adjacent variables. As an option, the user may specify that adjacent variables be separated either by a blank space or by a comma.
Having a blank or a comma between variables may facilitate reading of the data file by database or spreadsheet programs (like Excel). Note that delimiters do not matter for SAS, SPSS, Stata, SDA, or other statistical packages. They can read a data file in fixed-column format either with or without delimiters between variables.
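The point about fixed-column format can be illustrated with a minimal Python sketch. The layout, variable names, and sample records below are hypothetical and are not taken from any actual subset; the sketch only shows that a program reading by column position gets the same values whether or not a one-character delimiter is present, as long as the column positions are shifted to match.

```python
# Hypothetical layout: (name, start column, width); columns are 0-based here.
LAYOUT_PLAIN = [("caseid", 0, 4), ("age", 4, 2), ("gender", 6, 1)]


def parse_fixed(record, layout):
    """Slice each variable out of a fixed-column record by position."""
    return {name: record[start:start + width].strip()
            for name, start, width in layout}


def shift_for_delimiter(layout):
    """With a one-character delimiter, variable i starts i columns later."""
    return [(name, start + i, width)
            for i, (name, start, width) in enumerate(layout)]


plain = "0001342"      # no delimiters (the default)
spaced = "0001 34 2"   # blank between variables
comma = "0001,34,2"    # comma between variables

layout_delimited = shift_for_delimiter(LAYOUT_PLAIN)
print(parse_fixed(plain, LAYOUT_PLAIN))
print(parse_fixed(spaced, layout_delimited))
print(parse_fixed(comma, layout_delimited))
```

All three calls return the same values, because the delimiter columns are simply skipped when reading by position.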
The codebook is a text file formatted for printing. It should be printed with a fixed width (non-proportional) font such as Courier. Since the codebooks generally use between 70 and 80 characters per line, a 10-point font is a good choice.
The codebook file has a new-page character (form-feed) every 60 lines. Note, however, that not all printers will recognize that character and skip to a new page. In such cases you may have to hand-edit the file to produce the page breaks you want.
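If hand-editing is impractical, the page breaks can also be adjusted with a short script. The following Python sketch is only an illustration: the function names are hypothetical, and it assumes the codebook text has already been read into a string and that the new-page character is the ASCII form feed.

```python
FORM_FEED = "\f"  # ASCII form-feed character


def split_pages(codebook_text):
    """Split the codebook into pages at each form-feed character."""
    return [page for page in codebook_text.split(FORM_FEED) if page.strip()]


def repaginate(codebook_text, lines_per_page=60):
    """Remove the existing form feeds and insert a new one every
    lines_per_page lines. If a form feed sat on a line of its own,
    that line is kept as a blank line."""
    lines = codebook_text.replace(FORM_FEED, "").splitlines()
    chunks = [lines[i:i + lines_per_page]
              for i in range(0, len(lines), lines_per_page)]
    return FORM_FEED.join("\n".join(chunk) + "\n" for chunk in chunks)
```

For a printer that ignores form feeds, each element returned by split_pages could be written to a separate file, or repaginate could be called with a page length that suits the printer.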
The SAS data definitions include the 'DATA IN' command, with location information, and the 'LABEL' section with variable labels. There is also a 'PROC FORMAT' section with category labels; the corresponding 'FORMAT' command associates each label with a variable. There is also a section of 'IF' statements, which set missing-data and out-of-range values of each variable to the SAS missing-data code '.' (a period). A 'PROC DATASETS' command is added after the data definitions; this will generate a list of variables created by SAS.
If you will be running SPSS under Windows, there is a simple way to use the file of SPSS syntax commands: First, change the extension of this file from '.txt' (with which it was downloaded) to '.sps'. Then in Windows, simply double-clicking on the '.sps' file will open SPSS and put this file in a syntax window. You can edit this file so that it contains the name of the data file that you saved. Note that you should include the entire pathname for the data file (for example: C:\mywork\mydata.txt) so that SPSS can find it. Then highlight the whole syntax file and click 'run'. This generates the system file. Switch to the data window to view it. If you get a warning about an obsolete specifier or 'set' command, just ignore it. You can then proceed to analyze the data.
The SPSS data definitions generated by the subset procedure include the 'DATA LIST' command, with location information, the 'VARIABLE LABEL' section with variable labels, and the 'VALUE LABELS' section with category labels. There is also a section with 'MISSING VALUES' statements, which may define certain values of each variable as missing-data values. There may also be some 'IF' statements which set out-of-range values of each variable to the SPSS system-missing code.
To create a system file, it will be necessary to specify the name to assign to the system file; that name is put in the 'SAVE' command in the indicated place near the end of the file. Replace 'y' with the file name you want. The 'SAVE' command at the end of the file includes the 'MAP' option, in order to generate a list of variables created by SPSS. The 'COMPRESSED' option is also specified for the system file, since many users prefer to save disk space; that line can be removed, if you prefer to save the file as an uncompressed file.
Note that if you are running SPSS under Windows, you can create a '.sav' file interactively. However, you may find that a file named 'y' has been saved on your disk (probably in the 'C:\Program_Files\SPSS' directory), unless you delete the 'SAVE' command from the SPSS syntax file before you click on 'run'.
The 'do file' contains category labels and missing-data codes.
The 'dictionary file' contains, for each variable, the type, input format, and label.
For more information on the DDI, see the main DDI Web site.
A DDL file is produced automatically by the subset procedure, even if you do not request that it be sent to you. The DDL file is the basic source of documentation for the subset. The codebook and the data definitions for SAS, SPSS, Stata, and DDI are all derived from the DDL file.
For more information on the content and format of a DDI file, see the SDA manual page for DDI.
Multiple ranges and codes may be specified.
For example: age(1-17, 25, 95-100)
Multiple filter variables
If you specify more than one filter variable, a case must satisfy ALL of the conditions in order to be included in the table.
For example: gender(1), age(30-50)
Open-ended Ranges using '*' and '**'
A single asterisk, '*', can be used to specify that all cases with VALID codes for a variable will pass the filter.
For example: age(*) includes all cases with valid data on the variable 'age'.
In a range, the '*' can be used to signify the lowest or highest VALID value. For example: age(*-25,75-*). This filter would include all VALID values less than or equal to 25 and all VALID values greater than or equal to 75. However, any missing-data values within those ranges would still be excluded.
In a range, two asterisks '**' can be used to signify the lowest or highest numeric value, regardless of whether or not the codes are defined as missing data. For example: age(50-**) would include ALL numeric values greater than or equal to 50, including data values like 98 or 99, even if they had been defined as missing-data codes. However, any character missing-data values would still be excluded. Note that '**' cannot be used alone in a filter variable. It can only be used as part of a range.
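The filter rules for numeric variables can be summarized in a short Python sketch. This is not the SDA implementation; the function names and the representation of missing-data codes as a set are hypothetical. The sketch assumes non-negative numeric codes, and it assumes that only a range containing '**' admits missing-data codes (the text above states the exclusion explicitly only for '*' ranges).

```python
import math


def item_matches(value, item, missing_codes):
    """Test one item of a filter list: a code, '*', or a range."""
    item = item.strip()
    if item == "*":
        return value not in missing_codes       # any VALID code passes
    if item == "**":
        raise ValueError("'**' can only be used as part of a range")
    lo_text, sep, hi_text = item.partition("-")
    if not sep:
        hi_text = lo_text                       # single code, e.g. "25"
    lo_text, hi_text = lo_text.strip(), hi_text.strip()
    # Assumption: only '**' lets missing-data codes through.
    allow_missing = "**" in (lo_text, hi_text)
    lo = -math.inf if lo_text in ("*", "**") else float(lo_text)
    hi = math.inf if hi_text in ("*", "**") else float(hi_text)
    if not lo <= value <= hi:
        return False
    return allow_missing or value not in missing_codes


def passes(value, spec, missing_codes=frozenset()):
    """A value passes if it matches ANY item in the comma-separated list."""
    return any(item_matches(value, item, missing_codes)
               for item in spec.split(","))


def case_passes(case, filters, missing=None):
    """A case must satisfy ALL filter variables to be included."""
    missing = missing or {}
    return all(passes(case[var], spec, missing.get(var, frozenset()))
               for var, spec in filters.items())
```

For example, with 98 and 99 defined as missing-data codes for 'age', passes(99, "75-*", {98, 99}) is False while passes(99, "50-**", {98, 99}) is True.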
Multiple filter values can be specified, separated by spaces or commas:
city( Chicago,Atlanta Seattle)
Character variable filters are case-insensitive.
For example, the following filters are functionally identical:
city( Atlanta )
city( ATLANTA )
city( AtLAnta )
If a filter value contains internal spaces or commas, it must be enclosed in matching quotation marks (either single or double):
city( "New York" )
A filter value containing a single quote (apostrophe) can be specified by enclosing it in double quotes:
city( "Knot's Landing" )
A filter value containing double quotes can be specified by enclosing it in single quotes:
name( 'William "Bill" Smith' )
Leading and trailing spaces, and multiple internal spaces, are NOT significant. The following filters are all functionally identical:
city( "New York " )
city( "New York" )
city( " New York " )
Ranges, which are legal for numeric variables, are not allowed for character variables.
The following syntax is NOT legal: city( Atlanta-Seattle)
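The rules for character variable filters (values separated by spaces or commas, quoting, and the treatment of case and spaces) can be sketched in Python as follows. This is an illustration only, not the SDA parser; the function names and the regular expression are hypothetical.

```python
import re

# A value is a double-quoted string, a single-quoted string,
# or a run of characters containing no spaces or commas.
_TOKEN = re.compile(r'"([^"]*)"|\'([^\']*)\'|([^\s,]+)')


def normalize(text):
    """Ignore case, leading/trailing spaces, and repeated internal spaces."""
    return " ".join(text.split()).casefold()


def parse_values(spec):
    """Return the normalized filter values found inside the parentheses."""
    values = []
    for match in _TOKEN.finditer(spec):
        token = next(g for g in match.groups() if g is not None)
        values.append(normalize(token))
    return values


def matches(value, spec):
    """A data value passes if it equals any one of the filter values."""
    return normalize(value) in parse_values(spec)
```

For example, parse_values(" Chicago,Atlanta Seattle") yields three separate values, while parse_values('"New York"') yields one. The sketch makes no attempt to detect or reject range syntax; an unquoted value such as Atlanta-Seattle is simply treated as one literal value.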
To select more than a few variables, you will probably want to use the group selection procedure. However, the group selection procedure is available only for the original variables set up with the dataset, and is not available for the variables created by RECODE or COMPUTE. Note that both individually specified variables and groups of variables can be combined together in the same subset.
Variables named individually will be output onto the data file immediately after the CASEID variable (which is always the first variable put into the subset data file). Then the variables specified by group follow, in the order they are found in the codebook.
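The resulting column order can be sketched in Python. The function and the variable names in the example are hypothetical. The sketch also assumes that individually named variables keep the order in which they were typed and that a variable selected both ways appears only once; the text above does not state either point.

```python
def output_order(individual, group_selected, codebook_order):
    """Return the order of variables in the subset data file.

    individual     -- variables named one by one (assumed: order as typed)
    group_selected -- set of variables picked through group selection
    codebook_order -- all original variables, in codebook order
    """
    ordered = ["CASEID"]                 # always the first variable
    for name in individual:
        if name not in ordered:
            ordered.append(name)
    for name in codebook_order:          # group variables follow the codebook
        if name in group_selected and name not in ordered:
            ordered.append(name)
    return ordered


print(output_order(["income", "age"],
                   {"educ", "sex", "age"},
                   ["sex", "age", "educ", "income"]))
```

Here 'income' and 'age' come right after CASEID because they were named individually, and 'sex' and 'educ' follow in codebook order.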
TO SEE what is in a group
In the variable selection tree there is a little arrow to the left of the name of each group and each subgroup of variables. Click on the arrow to display the contents of each group and subgroup of variables.
TO SELECT variables from a group