SDA 3.4 Documentation for SDAtoXML


SDAtoXML - Read an SDA dataset and create variable definitions in XML


sdatoxml -s SDA_study [-option]


SDAtoXML reads an SDA dataset and outputs variable definitions in XML, using the conventions of the Data Documentation Initiative (DDI-Version 2).

The XML file produced by SDAtoXML is a valid DDI file, but it has only minimal study-level information. Besides the variable definitions, the file contains only the study title, the number of cases, and (if a DDL file is referenced) the dimensions of the original ASCII data file. The current date is also stored as the date on which the DDI file was produced.


Location of SDA study -- REQUIRED

-s SDA_study
Path of the SDA study (either absolute or relative path)

Variables to Process -- May Specify ONE of the following files

These files are used to indicate which variables to document, and in which order. If neither of the following files is specified, the program will output variable definitions for all variables in the SDA dataset, in alphabetical order (except that CASEID will be the first variable defined).
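
The default ordering described above can be illustrated with a short Python sketch. It is not the program's own code: the function name `default_order` is hypothetical, and a plain case-insensitive sort is assumed because this document does not specify the exact collation.

```python
def default_order(names):
    """Return variable names alphabetically, with CASEID (if present) first.

    Assumption: a case-insensitive sort; the real program's collation
    rules are not described in this document.
    """
    first = [n for n in names if n == "CASEID"]
    rest = sorted((n for n in names if n != "CASEID"), key=str.upper)
    return first + rest


if __name__ == "__main__":
    print(default_order(["sex", "CASEID", "age", "educ"]))
```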

-d DDL_file
Name of the DDL file for this SDA dataset.

If a DDL file is specified, some study-level information is taken from the DDL file and output to the XML file. This includes the dimensions of the original ASCII data file, as given in the ‘reclen=’ and ‘records/case=’ specifications.

All of the specifications for each variable, however, are taken only from the SDA dataset, not from the DDL file. Only the names of the variables, given after the ‘name=’ specifications in the DDL file, are used by the program (as a variable list).

-w fname
Write variable descriptions only for the variables listed in the file ‘fname’.

Variable names may be listed one per line or several per line, separated by spaces, tabs, or commas. Blank lines in this file are ignored, as is everything to the right of a pound sign (#).
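
The file format accepted by ‘-w’ can be sketched in a few lines of Python. This is only an illustration of the rules stated above, not the parser used by SDAtoXML; the function name `parse_variable_list` is hypothetical.

```python
import re


def parse_variable_list(text):
    """Extract variable names from the contents of a '-w' style list file.

    Rules taken from the description above:
      - everything to the right of a pound sign (#) is ignored
      - names are separated by spaces, tabs, or commas
      - blank lines are ignored
    """
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0]  # drop the comment part, if any
        names.extend(tok for tok in re.split(r"[ \t,]+", line) if tok)
    return names


if __name__ == "__main__":
    sample = "age, sex\n\n# demographics\neduc\tincome  # trailing comment\n"
    print(parse_variable_list(sample))
```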


-o fname
Write the XML output onto the file ‘fname’ (instead of to the standard output).

-m max_categories
Maximum number of categories in a variable (numeric or character) for which category frequencies and percentages will be output.
(Default is 40 categories. Maximum is 5000.)

If a variable has more than the specified number of categories, no frequencies or percentages for individual categories are output. However, any category labels that have been defined for a variable will always be output, regardless of the number of categories.

If category percentages are output, they are calculated on the basis of ALL cases, both valid and invalid. The system-missing category, if present, will be output with a ‘.’ (period or dot) as the category value.
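
The percentage rule can be made concrete with a small Python sketch. It is an illustration under stated assumptions, not the program's implementation: the function name `category_percentages` is hypothetical, system-missing values are represented here by `None`, and counting the system-missing category toward the ‘-m’ limit is an assumption.

```python
from collections import Counter


def category_percentages(values, max_categories=40):
    """Return {category: (frequency, percent)} or None if over the limit.

    Percentages use ALL cases as the denominator, valid and invalid alike.
    System-missing values (None here, an assumption) are reported under
    the category value '.'.
    """
    counts = Counter("." if v is None else v for v in values)
    if len(counts) > max_categories:
        return None  # too many categories: no frequencies or percentages
    total = len(values)
    return {code: (n, 100.0 * n / total) for code, n in counts.items()}


if __name__ == "__main__":
    print(category_percentages([1, 1, 2, None]))
```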

-n max_characters
Maximum number of characters to output as a short category label.
(Default is 60)
(See discussion on maximum number of characters below.)

-c
Convert all variable names to capital letters.

-l
Convert all variable names (except CASEID) to lowercase letters.
(This option flag is a lowercase ‘L’.)

Generate summary statistics for EVERY NUMERIC variable. Some of these summary statistics may not make sense for variables with unordered categories, but they are output all the same. All statistics are based on the VALID cases -- excluding missing-data codes and out-of-range values.

The statistics generated are the following: mean, median, mode, standard deviation, number of valid cases, number of invalid cases, minimum valid category value, and maximum valid category value. If a variable has a very large number of distinct categories, the median and mode may not be computed, but the other statistics will be output for all variables.
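
A minimal Python sketch of these statistics follows, computed over the valid cases only. It is not the program's code: the function name `summary_statistics` and the `is_valid` callback are hypothetical, the sample (n-1) standard deviation is an assumption because this document does not say which form is used, and the cutoff beyond which the median and mode are skipped is not modeled.

```python
import statistics


def summary_statistics(values, is_valid):
    """Summarize a numeric variable using only its valid cases.

    is_valid is a caller-supplied test (hypothetical) that rejects
    missing-data codes and out-of-range values.
    """
    valid = [v for v in values if is_valid(v)]
    stats = {"n_valid": len(valid), "n_invalid": len(values) - len(valid)}
    if valid:
        stats["mean"] = statistics.mean(valid)
        stats["median"] = statistics.median(valid)
        stats["mode"] = statistics.mode(valid)
        stats["min"] = min(valid)
        stats["max"] = max(valid)
    if len(valid) >= 2:
        # Assumption: sample standard deviation (n - 1 denominator).
        stats["stdev"] = statistics.stdev(valid)
    return stats


if __name__ == "__main__":
    # Treat 9 as a missing-data code for this example.
    print(summary_statistics([1, 2, 2, 3, 9], lambda v: v != 9))
```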

-v fname
Write the list of variables processed onto the file ‘fname’, instead of the file ‘SDATOXML.LST’.
(See discussion on renaming list of variables below.)

Display short program help and available options.
(The program will not do anything else. Same effect as executing the program with no options specified.)

Maximum Number of Characters in Short Category Labels (-n)

There are two DDI specifications for the labels of categories: the ‘labl’ element and the ‘txt’ element. In general, the ‘labl’ element is intended to be used as a shorter label for statistical analysis programs, whereas the ‘txt’ element is intended to be used as a longer explanation of the meaning of a particular category.

The ‘-n’ option allows the user to define what is meant by ‘shorter’ or ‘longer’ labels. If the length of the category label in SDA is less than or equal to the specified limit (default=60), the category label (if any) will be output using the ‘labl’ element. If the label is longer than the specified limit, it will be output using the ‘txt’ element.

In SDA it is possible to define category labels that have both a long and a short version. The long category label would generally be used in a codebook, whereas the short version would be used in a table. In defining such labels for SDA, the short version of a category label is put between square brackets. An example of such a label would be:

    Definitely will vote in the next election [Definitely vote]

When converting such labels into XML definitions, the short label will be output using the ‘labl’ element, and the long label will be output using the ‘txt’ element.
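
The label rules in this section can be summarized in a short Python sketch. It is an illustration only, not the program's implementation: the function name `split_label` is hypothetical, and the case of a bracketed short label that itself exceeds the ‘-n’ limit is not covered by this document, so the sketch does not apply the limit to bracketed labels.

```python
import re


def split_label(label, max_chars=60):
    """Return (labl, txt) for one SDA category label.

    - 'Long text [Short text]' -> labl is the bracketed short version,
      txt is the long version
    - otherwise, a label of at most max_chars characters goes to labl
    - otherwise, the label goes to txt
    """
    m = re.match(r"^(.*?)\s*\[(.*)\]\s*$", label)
    if m:
        return m.group(2), m.group(1)
    if len(label) <= max_chars:
        return label, None
    return None, label


if __name__ == "__main__":
    print(split_label("Definitely will vote in the next election [Definitely vote]"))
```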

Rename the File Containing the List of Variables Produced (-v)

SDAtoXML will always produce a list of the names of all the variables that were written to the XML DDI output file. If you select this option, and give a filename after the ‘-v’ flag, the variable list will be written onto that file instead of onto the default file ‘SDATOXML.LST’. The variable list is written with one variable name per line, in the order that they are written into the XML file.

If the variable names have been changed to upper- or lower-case letters (by the ‘-c’ or ‘-l’ options), those changes will be reflected in the names of the variables written to this file.


sdatoxml -s /mysda -o myddi.xml

Convert SDA variable information in the SDA dataset located in ‘/mysda’ into XML and write the XML into the file ‘myddi.xml’.

sdatoxml -s . -o myddi.xml -c -m 100

Convert the SDA dataset in the current directory (‘.’) into XML. Turn all the names of variables into capital letters, and document the categories of all variables containing up to 100 distinct categories per variable.


DDI       Data Documentation Initiative - Version 2
DDL       Data Description Language used for some SDA programs
xconvert  Convert SAS, SPSS, or Stata definitions into XML (DDI)

CSM, UC Berkeley
January 25, 2010