Beginning with version 4.0, the DDL file is generally imported and processed by the SDA Manager to produce the SDA dataset.
A DDL file can be created with a text editor or with one of various converter programs. DDL files can be modified or merged with the DDLMOD program.
Default values for many of the keywords of individual variables may also be specified in this section. (Examples would be for the width of the field, the minimum valid code, or the default missing-data code.) In that case, the corresponding characteristic will be set to this specified default, unless overridden in the individual variable specification.
The description of each variable MUST include its name and its location in the data file (beginning column number). The description must also include the following specifications, IF they are different from the default values: the width (number of columns), the record number, and the number of implied decimal places (if there are implied decimal places in the input field).
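As a rough illustration of how these location keywords fit together, the following Python sketch pulls one field out of a case stored as fixed-width records. The function name `extract_field` and its arguments are hypothetical and are not part of SDA. The sketch assumes that columns and record numbers are counted from 1, and it treats positions past the end of a short line as blanks.

```python
def extract_field(records, column, width=1, record=1):
    """Return the raw text of one field from a single case.

    records: list of strings, one per record of the case
    column:  1-based position of the left-most character
    width:   number of columns occupied by the field
    record:  1-based number of the record that holds the field
    """
    line = records[record - 1]
    start = column - 1
    # Pad the line so that a field beyond its end reads as blanks.
    return line.ljust(start + width)[start:start + width]


# A two-record case: CASEID in columns 1-4 and a one-column item in
# column 11 of record 1; a two-column item in columns 20-21 of record 2.
case = ["0001" + " " * 6 + "1", " " * 19 + "42"]
caseid = extract_field(case, column=1, width=4)
age = extract_field(case, column=20, width=2, record=2)
```

Only `column` is mandatory in the call, mirroring the rule that the width, record number, and decimals need to be given only when they differ from the defaults.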
Each variable description MAY also include a long label, descriptive text (such as questionnaire wording), and category labels for the code categories. If some of the code values represent invalid response codes, they may be flagged for exclusion from analysis; a minimum and a maximum valid code can also be specified (default values for these specifications can also be set).
The first variable description MUST be for a variable named ‘CASEID’, if the DDL file is to be input to the program MAKESDA in order to create or to add variables to an SDA dataset. If variables are added to an existing SDA dataset, MAKESDA checks the contents of CASEID to make sure that the value for each case matches the value stored previously.
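The consistency check described above might look roughly like the following Python sketch. The function `find_caseid_mismatches` is hypothetical; how MAKESDA actually performs the comparison, and what it does when a mismatch is found, is not specified here.

```python
def find_caseid_mismatches(stored_ids, incoming_ids):
    """Return the 0-based case positions at which the CASEID values differ.

    stored_ids:   CASEID values already in the existing dataset
    incoming_ids: CASEID values read from the new data file
    """
    if len(stored_ids) != len(incoming_ids):
        raise ValueError("the two files do not contain the same number of cases")
    return [
        position
        for position, (old, new) in enumerate(zip(stored_ids, incoming_ids))
        if old != new
    ]
```

An empty list means that every case in the new file lines up with the case stored previously, so the new variables can safely be attached.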
description of the dataset as a whole
*
description of the CASEID variable
*
description of a variable
*
description of another variable
Keyword       Possible Specification            Default (if no keyword)
_____________________________________________________________________

(Descriptions of the data file)

title=        remainder of the line             REQUIRED
records/case= number of records per case        1
reclen=       number of characters per record   80
ncases=       number of cases                   No checking for a
                                                specific N of cases
path=         directory for new dataset         Current directory
charset=      character set                     US-ASCII
              (used to specify an alternate
              encoding of text; see below)
lang=         language                          No language enforcing
              (code to pass to browsers for
              display purposes; see the
              internationalization document)

(DEFAULT VALUES for individual variable specifications)

blank=        a number into which an            No default conversion
              all-blank field will be           for blanks
              converted
blank_c=      blank conversion for              No default conversion
              character variables
other=        a number into which a field       No default conversion
              with other non-numeric            for other characters
              characters will be converted
              (numeric type only)
case_c=       default case conversion           No default case
              for character variables           conversion
min=          default minimum valid code        No default min
max=          default maximum valid code        No default max
md=           default missing-data code(s)      No default md
              for numeric variables
md_c=         default MD code(s) for            No default md
              character variables
sysmdlabel=   default label for system          (No Data)
              missing-data value
record=       default record number for         1
              location of variables
decimals=     default number of implied         0
              decimal places
type=         default variable type:            numeric
              numeric or character
width=        default number of columns         1
              for each variable
If default values for variable specifications have been set as part of the general dataset characteristics, those defaults (or global values) can be overridden for a particular variable by simply re-specifying the keyword as part of the definition of that variable.
Those default values can be nullified for a particular variable by setting the keyword equal to a blank or by specifying ‘noglobal’. For example, ‘min= ’ or ‘min=noglobal’ will nullify the default ‘min’ for the variable currently being defined (because a minimum valid value does not need to be defined for that variable).
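The override and nullify rules can be summarized in a few lines of Python. This is only a sketch of the logic as described here, not SDA code: the function `effective_value` and the use of dictionaries to hold the keyword values are assumptions made for illustration.

```python
def effective_value(keyword, variable_spec, dataset_defaults):
    """Resolve one keyword for one variable; None means no value applies.

    variable_spec:    keywords given in the variable's own description
    dataset_defaults: default values set in the dataset description
    """
    if keyword in variable_spec:
        value = variable_spec[keyword].strip()
        if value == "" or value == "noglobal":
            return None      # the dataset default is nullified
        return value         # the dataset default is overridden
    return dataset_defaults.get(keyword)


defaults = {"min": "1", "width": "2"}
inherited = effective_value("min", {}, defaults)
overridden = effective_value("min", {"min": "0"}, defaults)
nullified = effective_value("min", {"min": "noglobal"}, defaults)
```

The three calls correspond to the three cases: the variable says nothing and inherits the default, the variable respecifies the keyword, and the variable removes the default.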
Keyword       Possible Specification            Default (if no keyword)
_____________________________________________________________________

name=         a single string of 1-32           REQUIRED
              ASCII characters
iname=        name for this item in the         No instrument name
              instrument or questionnaire
record=       number of the record              Use dataset default
              containing this variable          (usually 1)
column=       column location of the            REQUIRED
              left-most character
width=        number of columns used by         Use dataset default
              this variable                     (usually 1)
decimals=     number of implied decimal         Use dataset default
              places                            (usually 0)
type=         numeric                           numeric, unless another
                                                type has been set as
                                                dataset default
label=        remainder of the line             No long variable label
catlabels=    category labels and text          No category labels
              (see discussion below)
md=           list of invalid codes and/or      No md codes
              ranges of codes (separated by
              blanks or commas)
              See discussion below.
min=          minimum valid code                No defined minimum
max=          maximum valid code                No defined maximum
blank=        code into which a field           System missing-data code
              containing only blanks
              will be converted
other=        code into which a field           Unless a non-numeric
              containing non-numeric            character is defined as MD,
              characters will be                non-numeric fields will
              converted                         become system missing-data
sysmdlabel=   label for system missing-data     (No Data)
              value (from a blank input field)
text=         descriptive text of any length    No text stored for this
              (until next keyword)              variable
Keyword       Possible Specification            Default (if no keyword)
_____________________________________________________________________

name=         a single string of 1-32           REQUIRED
              ASCII characters
iname=        name for this item in the         No instrument name
              instrument or questionnaire
record=       number of the record              Use dataset default
              containing this variable          (usually 1)
column=       column location of the            REQUIRED
              left-most character
width=        number of columns used by         Use dataset default
              this variable                     (usually 1)
type=         character                         REQUIRED, unless default
                                                type is character
label=        remainder of the line             No long variable label
catlabels=    category values and labels        No category labels
              (see discussion below)
md_c=         list of character codes to be     No missing data codes
              treated as invalid or MD
              (multiples separated by blanks
              or commas; blanks can be
              specified as MD by using
              empty quotes -- "")
              See discussion below.
blank_c=      character field into which an     No conversion of blanks
              all-blank field will be
              converted (including quotes
              if you give them)
case_c=       upper or lower                    Mixed case preserved
              (convert all the characters
              into upper or lower case)
text=         descriptive text of any length    No text stored for this
              (until next keyword)              variable
Note in particular that embedded blanks or quotes must be enclosed in single or double quotation marks.
Some examples are as follows:
catlabels=
  1 Yes
  2 No

catlabels=
  Y Yes
  N No

The syntax rules for specifying the category codes of character variables are the same as for specifying missing-data codes, as described in the section immediately above. In particular, see that section if the category codes for a character variable include blanks or quotation marks.
Long Category Text
If the text corresponding to a category is long, some analysis programs (outside of SDA) will create a shorter category label; this shorter label would be more appropriate for printing the results of an analysis such as crosstabulation.
Depending on the analysis program, the category label might be created by truncating the text to the first 16 or 20 characters. If the label created by truncating the text would be unclear or ambiguous, it is useful to provide your own abbreviated category label. This is done by enclosing the short label in square brackets after the category text. Programs that read the DDL file can then differentiate between the (long) text of a category and the (short) label corresponding to the same category.
catlabels=
  1 Definitely will vote in the next election [Definitely vote]
  2 Probably will vote in the next election [Probably vote]
  3 Probably will not vote in the next election [Prob not vote]
  4 Definitely will not vote in the next election [Def not vote]
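A program that reads category text in this form has to separate the long text from the bracketed short label, and fall back on truncation when no short label is given. The Python sketch below shows one way to do that. The function `split_category_text` is hypothetical, and the 16-character fallback is just one of the truncation lengths mentioned above.

```python
import re

# Text, optional whitespace, then a short label in square brackets
# at the very end of the category text.
_BRACKETED = re.compile(r"^(.*?)\s*\[([^\]]*)\]\s*$")


def split_category_text(text, max_len=16):
    """Return (long_text, short_label) for the text of one category."""
    match = _BRACKETED.match(text)
    if match:
        return match.group(1), match.group(2)
    # No short label supplied: truncate the text, as some programs do.
    return text, text[:max_len]


with_label = split_category_text(
    "Definitely will vote in the next election [Definitely vote]"
)
without_label = split_category_text("Probably will not vote in the next election")
```

The second call shows why supplying a label is worthwhile: plain truncation cuts the text off in the middle of a word.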
Category text can extend over more than one line, provided that a backslash (‘\’) is the last character of every line except the last line:
catlabels=
  1 Definitely will vote\
    in the next election [Def vote]
  2 Probably will vote\
    in the next election [Prob vote]
  3 Probably will not\
    vote in the next election [Prob not vote]
  4 Definitely will not vote\
    in the next election [Def not vote]
  8 Don’t know
  9 Refused
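The continuation rule can be handled by a small preprocessing step that glues each backslash-terminated line to the line that follows it. The Python sketch below is illustrative only: `join_continued_lines` is a hypothetical helper, and joining the pieces with a single space is an assumption rather than documented SDA behavior.

```python
def join_continued_lines(lines):
    """Merge every line that ends in a backslash with the following line."""
    joined = []
    pending = ""
    for line in lines:
        line = line.rstrip("\n")
        if line.endswith("\\"):
            # Drop the backslash and keep collecting this category's text.
            pending += line[:-1].strip() + " "
        else:
            joined.append(pending + line.strip())
            pending = ""
    if pending:
        # A trailing backslash on the very last line: keep what was read.
        joined.append(pending.rstrip())
    return joined


categories = join_continued_lines([
    "1 Definitely will vote\\",
    "  in the next election [Def vote]",
    "8 Don't know",
])
```

After this step each list element holds one complete category, which can then be split into its code, text, and optional short label.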
If decimals=2, for example, the input value ‘1234’ would be
stored as ‘12.34’.
(This is the same as in previous versions of SDA.)
If decimals=2, for example, the input value ‘1.237’ will retain
all of its decimals and will be stored as ‘1.237’ in versions 2.1
and later of SDA. (In previous versions of SDA that input value
would have been rounded to 2 decimal places and would have been
stored as ‘1.24’.)
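The two cases can be expressed compactly in Python. This is a sketch of the rule as stated above for versions 2.1 and later, not SDA source code; the function name `apply_implied_decimals` is made up for the example.

```python
from decimal import Decimal


def apply_implied_decimals(field, decimals=0):
    """Convert the text of a numeric input field to the value to be stored."""
    text = field.strip()
    if "." in text:
        # An explicit decimal point is kept as is, with all of its digits.
        return Decimal(text)
    # No decimal point in the input: shift by the implied decimal places.
    return Decimal(text).scaleb(-decimals)


implied = apply_implied_decimals("1234", decimals=2)
explicit = apply_implied_decimals("1.237", decimals=2)
```

Here `implied` holds 12.34 and `explicit` holds 1.237. `Decimal` is used instead of `float` so that the stored digits are exactly those in the input field.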
In SDA, blank input fields will be set to the system missing-data value, unless the DDL specification for that variable (or for all variables, globally) includes the ‘blank=’ keyword, to specify what number those fields are to be converted into. (For example, one could specify ‘blank=-1’, to convert all blank numeric input fields to ‘-1’ in the SDA dataset.) This conversion does NOT affect the original ASCII data file.
Non-numeric characters such as ’D’ and ’R’ are valid for a numeric variable in SDA, and those characters will be stored as such in the dataset, provided that those characters have been defined as missing-data codes. (If those non-numeric characters have not been defined as missing-data codes, they will be treated as invalid codes.)
A period (‘.’) by itself in a field, or an input field containing
other non-numeric characters that have not been defined as
missing-data codes, will ordinarily be converted to the system
missing-data value in SDA. However, if the DDL specification for
that variable (or for all variables, globally) includes the
‘other=’ keyword, the non-numeric fields will be converted by SDA
to the value specified after ‘other=’. That value will then be
examined like any other input value, to see whether it is a valid
value or has been defined as missing-data or out-of-range.
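The handling of blank and non-numeric input for a numeric variable can be summarized as a short decision procedure. The Python sketch below follows the rules described in this section. The function `convert_numeric_field`, the use of `None` to stand for the system missing-data value, and the exact pattern used to recognize a number are all assumptions made for illustration.

```python
import re

SYSMISS = None  # stand-in for the system missing-data value

# Digits with an optional sign and an optional decimal point.
_NUMBER = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)")


def convert_numeric_field(field, md_codes=(), blank=None, other=None):
    """Map one raw input field to the value that would be stored.

    md_codes: non-numeric characters defined as missing-data codes
    blank:    value given by 'blank=' (None if the keyword is absent)
    other:    value given by 'other=' (None if the keyword is absent)
    """
    text = field.strip()
    if text == "":
        return SYSMISS if blank is None else blank
    if _NUMBER.fullmatch(text):
        return float(text)
    if text in md_codes:
        return text              # stored as such, flagged as missing data
    # A lone period or any other undefined non-numeric content.
    return SYSMISS if other is None else other
```

For instance, `convert_numeric_field("  ", blank=-1)` gives `-1`, while `convert_numeric_field(".")` gives the system missing-data stand-in. The sketch stops at the conversion itself; checking the result against the md, min, and max specifications would be a separate step.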
For information on using other character sets, see the document
This copy feature is invoked by putting the word ‘copy’ on the asterisked line preceding the variable’s specifications. The variable whose attributes can be copied is either the previous variable (if no specific name is given) or some specific variable defined earlier in the same DDL file. The general layout is as follows:
*
description of v101
* copy
description of v102,
   using all variable definitions of the PREVIOUS variable (v101)
   that are not specifically redefined in this new variable
   description.
* copy v75
description of v103,
   using all variable definitions for v75 that are not specifically
   redefined in this new variable description (assuming that v75
   has already been defined).
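In effect, the copy feature starts from the attributes of the source variable and lets the new description replace whatever it redefines. A minimal Python sketch of that merge, using a hypothetical `apply_copy` function and plain dictionaries for the variable descriptions:

```python
def apply_copy(new_spec, source_spec):
    """Combine a copied variable's attributes with the new description.

    Attributes given in new_spec win; everything else is taken over
    from source_spec unchanged.
    """
    merged = dict(source_spec)
    merged.update(new_spec)
    return merged


v75 = {"name": "v75", "record": "1", "column": "11", "md": "8,9"}
v76 = apply_copy({"name": "v76", "column": "12"}, v75)
```

In this example `v76` keeps the record number and missing-data codes of `v75` but has its own name and column. The source dictionary is left untouched, so the same variable can be copied again later in the file.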
The following keywords are still recognized and are equivalent to the new keywords shown after the equal sign:
labels = catlabels
lrecl = reclen
noglob = noglobal
scale = decimals
The older missing-data keywords ‘MD1=mdvalue1’ and ‘MD2=mdvalue2’ are also recognized and are equivalent to the new form:
MD= mdvalue1, mdvalue2
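A tool that reads older DDL files could normalize these forms before doing anything else. The Python sketch below shows the idea. The function `modernize` is hypothetical, keywords are lower-cased as a simplifying assumption, and ‘noglob’ is handled as a value (as in ‘min=noglob’) because that is how ‘noglobal’ is used elsewhere in this document.

```python
# Old keyword name -> current keyword name
LEGACY_KEYWORDS = {"labels": "catlabels", "lrecl": "reclen", "scale": "decimals"}


def modernize(spec):
    """Return a copy of spec that uses only the current keyword forms."""
    updated = {}
    md_values = []
    for key, value in spec.items():
        key = key.lower()
        if key in ("md1", "md2"):
            md_values.append(value.strip())   # folded into 'md' below
            continue
        if value.strip() == "noglob":
            value = "noglobal"
        updated[LEGACY_KEYWORDS.get(key, key)] = value
    if md_values:
        updated["md"] = ", ".join(md_values)
    return updated


old = {"lrecl": "80", "MD1": "8", "MD2": "9", "scale": "2", "min": "noglob"}
new = modernize(old)
```

After the call, `new` contains ‘reclen’, ‘decimals’, ‘min=noglobal’, and a single ‘md’ entry listing both missing-data values.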
title= Some Election Study
records/case=2
reclen= 80
path= /mysda/election
*
name= CASEID
label= Case ID of Respondent
record= 1
column= 1
width= 4
*
name= v75
label= R’s Interest in Campaign
record= 1
column= 11
md= 8,9
catlabels=
  1 Very Interested
  2 Somewhat Interested
  5 Not Interested
  8 Don’t know, can’t answer [DK]
  9 Refused to answer [Ref]
text= Some people don’t pay much attention to political campaigns
so far this year. How about you, are you very interested,
somewhat interested, or not interested at all?
* copy v75
# Copy the category labels and MD definitions from the variable ’v75’
# (Other specifications are redefined for ’v76’)
name= v76
label= R’s Interest in Primary Election Results
column= 12
text= How about the results of primary elections. How interested
in those results are you? Are you very interested, somewhat
interested, or not interested at all?
*
name= age
label= Age of respondent
record= 2
column= 20
width= 2
md= 97-*
catlabels =
  97 Age 97 or over
  98 Don’t know
  99 Refused
*
name= region
label= Character code for each region
record= 2
column= 24
width = 2
type = character
md_c = X
catlabels=
  NE Northeastern states
  NC North Central states
  S Southern states
  W Western states
  X (Not available)
text = Region of the country - coded from the state codes
*
name= weight
label= Weight variable
record= 2
column= 50
width= 6
decimals= 4
md= 0
text= Weight variable with 4 implied decimal places.
_____________________________________________________________________

(For a more extended example, see the DDL file for the SDA test data which is distributed with the SDA programs.)
ddireader             DDI to DDL conversion service
ddlmod                Modify or merge DDL files
internationalization  Using non-English languages
makeddl.sps           Convert an SPSS file into DDL and data files
makesda               Make SDA variables out of DDL and an ASCII data file
xconvert              Convert SAS, SPSS, or Stata data definitions into DDL