Beginning with version 4.0, the DDL file is generally imported and processed by the SDA Manager to produce the SDA dataset. Note also that the current version of the SDA Manager can import SPSS and Stata system files directly into SDA, with no need to create a DDL file at all.
Beginning with SDA version 4.1, the description may also include the filetype of the data file. The default filetype is still a fixed-column text file. However, the data file can now be a CSV file (comma-separated values) or a TSV file (tab-separated values). In a CSV file or a TSV file, the first row must contain the names of the variables.
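The first-row rule for CSV and TSV files can be illustrated with a minimal Python sketch. This is not SDA code: the function name `read_delimited` is hypothetical, and the handling of quoting is simply what Python's `csv` module does by default, which may differ from SDA's own reader.

```python
import csv
import io

def read_delimited(text, filetype="CSV"):
    """Split delimited text into variable names and a list of cases.

    Illustration only: the first row is taken as the variable names,
    as the DDL documentation requires for CSV and TSV files.
    """
    delimiter = "," if filetype.upper() == "CSV" else "\t"
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    names = next(reader)  # first row holds the variable names
    cases = [dict(zip(names, row)) for row in reader]
    return names, cases

names, cases = read_delimited("CASEID,age\n1,34\n2,51\n")
print(names)
print(cases[1]["age"])
```

All values are kept as text here; converting them to numbers is a separate step governed by the variable specifications.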
Default values for many of the keywords of individual variables may also be specified in this section. (Examples are the minimum valid code and the default missing-data code.) In that case, the corresponding characteristic will be set to the specified default, unless overridden in the individual variable specification.
The description of each variable MUST include at least the name of the variable. (See the Rules for variable names.)
If the filetype is fixed, the description must also include the beginning column number for the variable. If they differ from the default values, the variable description must also include the record number and the width (number of columns).
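As a rough illustration of how record, column, and width locate a variable in a fixed-column file, the following Python sketch slices one field out of a case. The function name `extract_field` and the sample data are hypothetical; the defaults of record 1 and width 1 follow the default values listed in this document.

```python
def extract_field(records, column, width=1, record=1):
    """Return the raw text of one variable for one case.

    records: list of strings, one per physical record of the case.
    column and record are 1-based, like the DDL keywords of the same name.
    """
    line = records[record - 1]
    start = column - 1  # convert the 1-based column to a 0-based index
    return line[start:start + width]

# A made-up case with two records.
case = [
    "0001" + " " * 6 + "15",               # CASEID in columns 1-4, codes in 11-12
    " " * 19 + "42" + " " * 2 + "NE",      # age in columns 20-21, region in 24-25
]
print(extract_field(case, column=1, width=4))
print(extract_field(case, column=20, width=2, record=2))
```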
Each variable description MAY also include a long label, descriptive text (such as questionnaire wording), and category labels for the code categories. If some of the code values represent invalid response codes, they may be flagged for exclusion from analysis; a minimum and a maximum valid code can also be specified (default values for these specifications can also be set).
If the dataset has a case identifier that you want to define as the CASEID variable, that variable definition must be the first one in the DDL file. It is useful to have a CASEID variable when you add or modify variables. If new variables are added to an existing SDA dataset, or if a new version of the data is used to modify existing variables, MAKESDA will compare each value of CASEID in the data file with the value of CASEID for the same case in the SDA dataset. If the values do not match, an error message is generated. Note that the values of CASEID do not have to be unique for each case. The only thing that matters is that the new and old values be the same for each case when MAKESDA is run on a pre-existing SDA dataset.
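The case-by-case CASEID comparison described above can be sketched in Python as follows. This only illustrates the idea and is not how MAKESDA is implemented; the function name `check_caseids` is hypothetical, and what happens when the two files have different numbers of cases is not covered here.

```python
def check_caseids(existing_ids, incoming_ids):
    """Compare CASEID values position by position.

    Returns a list of (case_number, old_value, new_value) for every case
    whose CASEID differs. Case numbers are 1-based. Values need not be
    unique; only the old/new agreement for each case matters.
    """
    mismatches = []
    for n, (old, new) in enumerate(zip(existing_ids, incoming_ids), start=1):
        if old != new:
            mismatches.append((n, old, new))
    return mismatches

# Duplicate IDs are fine as long as old and new agree for each case.
print(check_caseids([1, 1, 2], [1, 1, 2]))
print(check_caseids([1, 2, 3], [1, 3, 3]))
```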
description of the dataset as a whole
*
description of the CASEID variable (if there is one)
*
description of another variable
*
description of another variable
Keyword       Possible Specification           Default (if no keyword)
_____________________________________________________________________

(Descriptions of the data file)

title=        remainder of the line            REQUIRED

filetype=     CSV or TSV                       fixed

ncases=       number of cases                  No checking for a specific
                                               N of cases

path=         directory for new dataset        Current directory

charset=      character set                    UTF-8
              (used to specify an alternate
              encoding of text; see below)

lang=         language                         No language enforcing
              (code to pass to browsers for
              display purposes; see the
              internationalization document)

(if filetype = fixed)

records/case= number of records per case       1

reclen=       number of characters per record  80

(RESET DEFAULT VALUES for individual variable specifications)

blank=        a number into which an           No default conversion
              all-blank field will be          for blanks
              converted

blank_c=      blank conversion for             No default conversion
              character variables

other=        a number into which a field      No default conversion
              with other non-numeric           for other characters
              characters will be converted
              (numeric type only)

case_c=       upper or lower                   No default
              (default case conversion         case conversion
              for character variables
              in ASCII only)

min=          default minimum valid code       No default min

max=          default maximum valid code       No default max

md=           default missing-data code(s)     No default md
              for numeric variables

md_c=         default MD code(s) for           No default md
              character variables

sysmdlabel=   default label for system         (No Data)
              missing-data value

type=         default variable type:           numeric
              numeric or character

decimals=     default number of implied        0
              decimal places

(if filetype=fixed):

record=       default record number for        1
              location of variables

width=        default number of columns        1
              for each variable
If default values for variable specifications have been set as part of the general dataset characteristics, those defaults (or global values) can be overridden for a particular variable by simply re-specifying the keyword as part of the definition of that variable.
Those default values can be nullified for a particular variable by setting the keyword equal to a blank or by specifying 'noglobal'. For example, `min= ' or `min=noglobal' will nullify the default `min' for the current variable being defined (because a minimum valid value does not need to be defined for that variable).
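The interplay of dataset-wide defaults, per-variable overrides, and nullification can be sketched in Python. This is an illustration under stated assumptions: specifications are modeled as plain dictionaries of keyword to string value, and the function name `effective_spec` is hypothetical rather than part of SDA.

```python
def effective_spec(global_defaults, variable_spec):
    """Combine dataset-wide defaults with one variable's own keywords.

    A keyword re-specified for the variable overrides the default.
    A keyword set to blank or to 'noglobal' removes the default entirely.
    """
    merged = dict(global_defaults)
    for key, value in variable_spec.items():
        if value.strip() in ("", "noglobal"):
            merged.pop(key, None)   # nullify the global default
        else:
            merged[key] = value     # override the global default
    return merged

defaults = {"min": "0", "max": "9"}
print(effective_spec(defaults, {"max": "99"}))
print(effective_spec(defaults, {"min": "noglobal"}))
```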
Keyword       Possible Specification           Default (if no keyword)
_____________________________________________________________________

name=         a single string of 1-32          REQUIRED (see rules)
              ASCII characters

iname=        name for this item in the        No instrument name
              instrument or questionnaire

decimals=     number of implied decimal        Use dataset default
              places                           (usually 0)

type=         numeric                          numeric, unless another
                                               type has been set as
                                               dataset default

label=        remainder of the line            No long variable label

catlabels=    category labels and text         No category labels
              (see discussion below)

md=           list of invalid codes and/or     No md codes
              ranges of codes (separated
              by blanks or commas)
              See discussion below.

min=          minimum valid code               No defined minimum

max=          maximum valid code               No defined maximum

blank=        code into which a field          System missing-data code
              containing only blanks
              will be converted

other=        code into which a field          Unless a non-numeric
              containing non-numeric           character is defined as MD,
              characters will be               non-numeric fields will
              converted                        become system missing-data

sysmdlabel=   label for system missing-data    (No Data)
              value (from a blank input
              field)

text=         descriptive text of any length   No text stored for this
              (until next keyword)             variable

(if filetype=fixed):

record=       number of the record             Use dataset default
              containing this variable         (usually 1)

column=       column location of the           REQUIRED
              left-most character

width=        number of columns used by        Use dataset default
              this variable                    (usually 1)
Keyword       Possible Specification           Default (if no keyword)
_____________________________________________________________________

name=         a single string of 1-32          REQUIRED (see rules)
              ASCII characters

iname=        name for this item in the        No instrument name
              instrument or questionnaire

type=         character                        REQUIRED, unless default
                                               type is character

label=        remainder of the line            No long variable label

catlabels=    category values and labels       No category labels
              (see discussion below)

md_c=         list of character codes to be    No missing data codes
              treated as invalid or MD
              (multiples separated by blanks
              or commas; blanks can be
              specified as MD by using
              empty quotes -- "")
              See discussion below.

blank_c=      character field into which an    No conversion of blanks
              all-blank field will be
              converted (including quotes
              if you give them)

case_c=       upper or lower                   Mixed case preserved
              (Convert all the characters
              into upper or lower case.
              Note that this only works if
              the characters are in
              US-ASCII.)

text=         descriptive text of any length   No text stored for this
              (until next keyword)             variable

(if filetype=fixed):

record=       number of the record             Use dataset default
              containing this variable         (usually 1)

column=       column location of the           REQUIRED
              left-most character

width=        number of columns used by        Use dataset default
              this variable                    (usually 1)
Note in particular that embedded blanks or quotes must be enclosed in single or double quotation marks.
Some examples are as follows:
catlabels=
1 Yes
2 No

catlabels=
Y Yes
N No
The syntax rules for specifying the category codes of character variables are the same as for specifying missing-data codes, as described in the section immediately above. In particular see that section if the category codes for a character variable include blanks or quotation marks.
Long Category Text
If the text corresponding to a category is long, some analysis programs (outside of SDA) will create a shorter category label; this shorter label would be more appropriate for printing the results of an analysis such as crosstabulation.
Depending on the analysis program, the category label might be created by truncating the text to the first 16 or 20 characters. If the label created by truncating the text would be unclear or ambiguous, it is useful to provide your own abbreviated category label. This is done by enclosing the short label in square brackets after the category text. Programs that read the DDL file can then differentiate between the (long) text of a category and the (short) label corresponding to the same category.
catlabels=
1 Definitely will vote in the next election [Definitely vote]
2 Probably will vote in the next election [Probably vote]
3 Probably will not vote in the next election [Prob not vote]
4 Definitely will not vote in the next election [Def not vote]
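How a program reading a DDL file might separate the long category text from the bracketed short label can be sketched in Python. This is an illustration only: the function name `split_category` is hypothetical, the 16-character fallback is just one of the truncation lengths mentioned above, and quoted character codes are not handled.

```python
import re

# Matches "<text> [<label>]" with the bracketed label at the end of the line.
_BRACKETED = re.compile(r"^(.*?)\s*\[([^\]]*)\]\s*$")

def split_category(line, max_label=16):
    """Split one category line into (code, text, short_label).

    If no [short label] is given, fall back to truncating the text,
    as some analysis programs are described as doing.
    """
    parts = line.split(None, 1)
    code = parts[0]
    rest = parts[1] if len(parts) > 1 else ""
    match = _BRACKETED.match(rest)
    if match:
        return code, match.group(1), match.group(2)
    return code, rest, rest[:max_label]

print(split_category("1 Definitely will vote in the next election [Definitely vote]"))
print(split_category("2 Probably will vote in the next election"))
```

The second call shows why an explicit short label helps: the truncated fallback cuts the text off in the middle of a word.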
Category text can extend over more than one line, provided that a backslash (`\') is the last character of every line except the last line:
catlabels=
1 Definitely will vote\
in the next election [Def vote]
2 Probably will vote\
in the next election [Prob vote]
3 Probably will not\
vote in the next election [Prob not vote]
4 Definitely will not vote\
in the next election [Def not vote]
8 Don't know
9 Refused
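The backslash continuation rule can be sketched in Python as a step that joins physical lines into logical category lines. The function name `join_continued` is hypothetical, and joining the pieces with a single space is an assumption made for readability; the source does not say what SDA puts at the join.

```python
def join_continued(lines):
    """Join lines that end with a backslash to the line that follows.

    Assumption: the backslash is dropped and the pieces are joined
    with one space.
    """
    joined = []
    pending = ""
    for line in lines:
        if line.endswith("\\"):
            pending += line[:-1] + " "   # keep collecting this category
        else:
            joined.append(pending + line)
            pending = ""
    if pending:
        # A trailing backslash on the final line: keep what was collected.
        joined.append(pending.rstrip())
    return joined

physical = [
    "1 Definitely will vote\\",
    "in the next election [Def vote]",
    "8 Don't know",
]
for logical in join_continued(physical):
    print(logical)
```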
If decimals=2, for example, the input value `1234' would be stored as `12.34'. (This is the same as in previous versions of SDA.)
If decimals=2, for example, the input value `1.237' will retain all of its decimals and will be stored as `1.237' in versions 2.1 and later of SDA. (In previous versions of SDA that input value would have been rounded to 2 decimal places and would have been stored as `1.24'.)
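The two decimal rules above can be sketched in Python using the standard `decimal` module. This reflects the behavior described for SDA 2.1 and later; the function name `apply_implied_decimals` is hypothetical, and the assumption that the presence of an explicit decimal point is what switches off the implied decimals is an inference from the two examples.

```python
from decimal import Decimal

def apply_implied_decimals(field, decimals):
    """Convert a numeric input field to its stored value.

    No explicit decimal point: shift by the implied decimal places.
    Explicit decimal point: keep the value exactly as written.
    """
    text = field.strip()
    if "." in text:
        return Decimal(text)
    return Decimal(text).scaleb(-decimals)  # e.g. 1234 -> 12.34 for decimals=2

print(apply_implied_decimals("1234", 2))
print(apply_implied_decimals("1.237", 2))
```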
In SDA, blank input fields will be set to the system missing-data value, unless the DDL specification for that variable (or for all variables, globally) includes the `blank=' keyword, to specify what number those fields are to be converted into. (For example, one could specify `blank=-1', to convert all blank numeric input fields to `-1' in the SDA dataset.) This conversion does NOT affect the original ASCII data file.
Non-numeric characters such as 'D' and 'R' are valid for a numeric variable in SDA, and those characters will be stored as such in the dataset, provided that those characters have been defined as missing-data codes. (If those non-numeric characters have not been defined as missing-data codes, they will be treated as invalid codes.)
A period ('.') by itself in a field, or an input field containing other non-numeric characters that have not been defined as missing-data codes, will ordinarily be converted to the system missing-data value in SDA. However, if the DDL specification for that variable (or for all variables, globally) includes the `other=' keyword, the non-numeric fields will be converted by SDA to the value specified after `other='. That value will then be examined like any other input value, to see whether it is a valid value or has been defined as missing-data or out-of-range.
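The conversion rules for blank and non-numeric input fields of a numeric variable can be summarized in a Python sketch. It is an illustration, not SDA's implementation: `convert_numeric_field`, `SYSMIS`, and `md_chars` are hypothetical names, `None` stands in for the system missing-data value, and the later checks against md, min, and max are not shown.

```python
import re

SYSMIS = None  # stand-in for the system missing-data value
_NUMERIC = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)")

def convert_numeric_field(field, md_chars=frozenset(), blank=None, other=None):
    """Convert one raw input field of a numeric variable.

    md_chars: non-numeric codes (such as 'D' or 'R') defined as missing data.
    blank, other: values of the `blank=' and `other=' keywords, if given.
    """
    text = field.strip()
    if text == "":
        return SYSMIS if blank is None else blank
    if _NUMERIC.fullmatch(text):
        return float(text)
    if text in md_chars:
        return text  # stored as such, because it is a defined MD code
    # A lone period or any other undefined non-numeric content.
    return SYSMIS if other is None else other

print(convert_numeric_field("   ", blank=-1))
print(convert_numeric_field(" D ", md_chars={"D", "R"}))
print(convert_numeric_field(".", other=99))
```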
For information on using other character sets, see the document on Internationalization.
This copy feature is invoked by putting the word `copy' on the asterisked line preceding the variable's specifications. The variable whose attributes can be copied is either the previous variable (if no specific name is given) or some specific variable defined earlier in the same DDL file. The general layout is as follows:
description of v101
* copy
description of v102,
using all variable definitions of the PREVIOUS variable (v101) that are not specifically redefined in this new variable description.
* copy v75
description of v103,
using all variable definitions for v75 that are not specifically redefined in this new variable description (assuming that v75 has already been defined).
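The effect of the copy feature can be sketched in Python as a merge of two sets of specifications. This is only a model of the behavior described above: variable definitions are represented as dictionaries, and the function name `resolve_copy` is hypothetical.

```python
def resolve_copy(defined, new_spec, copy_from=None):
    """Build a variable's full specification using the copy feature.

    defined: dict of name -> specification, in the order the variables
             were defined earlier in the same DDL file.
    copy_from: name given after `copy', or None to copy the previous variable.
    """
    if copy_from is None:
        source = list(defined.values())[-1]   # the previous variable
    else:
        source = defined[copy_from]           # must already be defined
    merged = {**source, **new_spec}           # redefined keywords win
    defined[new_spec["name"]] = merged
    return merged

defined = {"v75": {"name": "v75", "column": "11", "md": "8,9"}}
print(resolve_copy(defined, {"name": "v76", "column": "12"}, copy_from="v75"))
print(resolve_copy(defined, {"name": "v77", "column": "13"}))
```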
The following keywords are still recognized and are equivalent to the new keywords shown after the equal sign:
labels = catlabels
lrecl = reclen
noglob = noglobal
scale = decimals
The older missing-data keywords `MD1=mdvalue1' and `MD2=mdvalue2' are also recognized and are equivalent to the new form:
MD= mdvalue1, mdvalue2
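The equivalences between older and newer keywords can be sketched in Python as a normalization step. Several details are assumptions made for illustration: specifications are modeled as dictionaries, keywords are matched without regard to case, `noglob` is treated as a value (as in `min=noglobal'), and the function name `modernize` is hypothetical.

```python
# Older keyword -> newer keyword, as listed above.
ALIASES = {"labels": "catlabels", "lrecl": "reclen", "scale": "decimals"}

def modernize(spec):
    """Rewrite older DDL keywords into their newer equivalents."""
    out = {}
    md_parts = []
    for key, value in spec.items():
        k = key.lower()  # assumption: keywords are case-insensitive
        if k in ("md1", "md2"):
            md_parts.append((k, value))
            continue
        if value.strip().lower() == "noglob":
            value = "noglobal"
        out[ALIASES.get(k, k)] = value
    if md_parts:
        # MD1 and MD2 collapse into a single md= list.
        out["md"] = ", ".join(v for _, v in sorted(md_parts))
    return out

print(modernize({"labels": "1 Yes", "lrecl": "80", "MD1": "8", "MD2": "9"}))
```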
title= Some Election Study
records/case=2
reclen= 80
path= /mysda/election
*
name= CASEID
label= Case ID of Respondent
record= 1
column= 1
width= 4
*
name= v75
label= R's Interest in Campaign
record= 1
column= 11
md= 8,9
catlabels=
1 Very Interested
2 Somewhat Interested
5 Not Interested
8 Don't know, can't answer [DK]
9 Refused to answer [Ref]
text= Some people don't pay much attention to political campaigns
so far this year. How about you, are you very interested,
somewhat interested, or not interested at all?
* copy v75
# Copy the category labels and MD definitions from the variable 'v75'
# (Other specifications are redefined for 'v76')
name= v76
label= R's Interest in Primary Election Results
column= 12
text= How about the results of primary elections. How interested
in those results are you? Are you very interested, somewhat
interested, or not interested at all?
*
name= age
label= Age of respondent
record= 2
column= 20
width= 2
md= 97-*
catlabels =
97 Age 97 or over
98 Don't know
99 Refused
*
name= region
label= Character code for each region
record= 2
column= 24
width = 2
type = character
md_c = X
catlabels=
NE Northeastern states
NC North Central states
S Southern states
W Western states
X (Not available)
text = Region of the country - coded from the state codes
*
name= weight
label= Weight variable
record= 2
column= 50
width= 6
decimals= 4
md= 0
text= Weight variable with 4 implied decimal places.
_____________________________________________________________________
(For a more extended example, see the DDL file for the SDA test data which is distributed with the SDA programs.)
DDIreader             Online service to convert DDI files to DDL
ddlmod                Modify or merge DDL files
internationalization  Using non-English languages
makesda               Make SDA variables out of DDL and an ASCII data file
xconvert              Convert SAS, SPSS, or Stata data definitions into DDL