SDA 3.5 Documentation for XCONVERT
xconvert - Convert SAS, SPSS, or Stata definitions into XML (DDI)
xconvert [-option] -i input_file
XCONVERT converts data definition statements for SAS, SPSS, or
Stata into descriptions of variables either in XML, using the
conventions of the Data Documentation Initiative
(DDI - version 2), or in the Data Description Language (DDL)
used by SDA programs. (XCONVERT supersedes and replaces the
older XTODDL program.) When converting data definitions into a
DDL file, note that variable names longer than 32 characters
cannot currently be used by the
program to create SDA variables.
XCONVERT obtains the information about each variable from a file
that contains data definitions of the form used to create a
system file for SAS, SPSS, or Stata. The program can also read a
dictionary file for an SPSS system file. This dictionary file
should match the ASCII file and table created by the SPSS ‘write’
command (in versions of SPSS before Version 15).
Each variable defined for one of those systems will produce a DDI
or a DDL segment that includes the variable name, the variable
label, the category labels, missing-data information, and the
record and column location in the ASCII data file.
The locations of the variables must be specified as a set of
fixed columns in the SPSS, SAS, or Stata data definition files.
Freefield entry or format statements are not recognized.
Note that the default output of this program is a valid DDI file,
but with no text for survey questions and no study-level
information except a title. You will have to merge question text
and additional study-level information into the file, to make it
a complete DDI file.
WARNING: Conversions from SPSS have been tested with a wider
variety of formats than conversions from SAS or Stata. Not all possible
forms of data definitions are recognized, but the program is
general enough to be useful. Contact CSM support if you have
MEANING OF THE OPTIONS
- -x input_type
- Type of input file
(The default is ‘spss’, an SPSS data definition or syntax file.
Other input types must be specified, in either upper or lower
case.)
- spss
- SPSS data definitions.
Note that the definitions in an SPSS syntax file must have a
period at the end of each command series.
See the discussion of
SPSS interactive-style definitions
below.
- spssbatch (or, equivalently, ‘oldspss’)
- Older SPSS data definitions.
In this format, column 1 in each line is reserved for major
keywords. A period at the end of each command series is
optional.
See the discussion of
SPSS batch-style definitions
below.
- spssdict
- SPSS dictionary file plus the table produced by using the
SPSS ‘write’ command on an SPSS system file (a file with the
‘.sav’ suffix).
This procedure only works with SPSS versions before Version 15.
For later versions of SPSS, see the document on the
See also the discussion of
SPSS system files
- sas
- SAS data definitions.
- stata
- Stata data definitions.
See the discussion of
Stata definitions
below.
- -i fname
- Take input from the file ‘fname’. (This specification is
required.)
- -y output_type
- Type of output file: DDI or DDL (Default is DDI).
- DDI
- An XML file following the conventions of the Data
Documentation Initiative.
- DDL
- A Data Description Language file
used by SDA programs.
- -o fname
- Write the output onto the file ‘fname’ (instead of to the
Convert all variable names to capital letters.
- -l
- Convert all variable names (except CASEID for DDL output) to
lowercase letters.
(The option flag is a lowercase ‘L’.)
- -n xx
- Maximum number of characters to output as a short category
label. (Default is 60.)
(See the discussion of
short category labels
below. This option applies only to XML output for the DDI.)
- -s fname
- Take overall study definitions from the file ‘fname’.
(See discussion on
overall dataset definitions
below.) This option applies only for DDL output for SDA
- -v fname
- Write list of variables processed onto the file ‘fname’,
instead of the file ‘XCONVERT.LST’.
(See discussion on
renaming the variable list
below.)
- -w fname
- Write variable descriptions only for the variables listed in
the file ‘fname’.
(See discussion on
writing variable descriptions
below.)
Display short program help and available options.
(The program will not do anything else. Same effect as executing
the program with no options.)
FURTHER DISCUSSION OF INPUT TYPES
SPSS interactive-style definitions (-x spss)
The current SPSS interactive-style data definitions do not require
column 1 of a line to be reserved for the beginning of major
keywords (like ‘data list’). Commands can continue on subsequent
lines without regard for columns.
The end of a command series MUST be indicated by a period either
at the end of a line OR the beginning of a line.
SPSS batch-style definitions (-x spssbatch or -x oldspss)
The program can read SPSS data definitions written in the older
SPSS batch format. In this format all commands (like ‘DATA LIST’
or ‘VALUE LABELS’) must begin in column 1. Continuation lines
for specifications cannot use column 1.
If you do not want to begin a command in column 1 (in order to
make the command file more readable by indenting some of the
commands), you can put a plus sign or a minus sign in the first
column and then begin the command in some other column.
Putting a period at the end of each command series is optional.
Converting an SPSS system file (-x spssdict)
An SPSS system file (with a ‘.sav’ suffix) contains both the data
and the metadata. There are various possibilities available to
convert such files, depending on the version of SPSS you are
using.
- SPSS Versions 15 and later
For newer versions of SPSS, a special script named ‘makeddl.sps’
can be run from within SPSS. Under the direction of this script,
SPSS will output a text data file and a matching DDL file. See
If you want to produce a DDI file, use the
program to convert the DDL file to a DDI file.
- Older Versions of SPSS
After you start SPSS and load the system file, you can execute
two commands to generate the required files -- the ‘write’
command and the ‘display’ command.
- An ASCII data file is produced by using the SPSS ‘write’
command. Although you will need the data file later to set up
the study in SDA or for some other purpose, the output needed by
XCONVERT is the table of the column locations for each variable.
In order to get this table, you must specify the option ‘table’
for the ‘write’ command.
- The metadata (labels and missing-data specifications) are
produced by using the ‘display dictionary’ command.
The two commands together would look like this:
write outfile=mydata.txt table / all.
display dictionary.
Copy the table of column locations (produced by running the
‘write’ command) onto either the top or the bottom of the output
produced by the ‘display dictionary’ command, and save the result
as a new text file.
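The copy step described above can also be done from a command line once the two pieces of SPSS output have been saved as separate text files. The following is a minimal sketch under that assumption; the helper name and all file names are hypothetical, and the table is placed at the top of the combined file.

```shell
# Hypothetical helper: write the table of column locations ($1) followed
# by the 'display dictionary' listing ($2) into one combined file ($3).
make_spssdict() {
    cat "$1" "$2" > "$3"
}
```

For instance, `make_spssdict write_table.txt dictionary.txt myspssdict.txt` would create ‘myspssdict.txt’, which could then be converted with `xconvert -x spssdict -y DDL -i myspssdict.txt -o myddl.txt`.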
This combined file is the required ‘spssdict’ input for the
XCONVERT program.
Stata definitions (-x stata)
Data definitions for Stata are usually divided into a ‘do-file’
(a command file with the suffix ‘.do’ at the end of its name) and
a dictionary file (with the suffix ‘.dct’). For purposes of
conversion to DDI or DDL, combine those two files into a single
file (in either order), and use that single file as input to
XCONVERT.
FURTHER DISCUSSION OF OTHER OPTIONS
Maximum Number of Characters in Short Category Labels -- DDI Output Only (-n xx)
There are two DDI specifications for the labels of categories --
the ‘labl’ element, and the ‘txt’ element. In general, the
‘labl’ element is intended to be used as a shorter label for
statistical analysis programs, whereas the ‘txt’ element is
intended to be used as a longer explanation of the meaning of a
category.
The ‘-n’ option allows the user to define what is meant by
‘shorter’ or ‘longer’ labels. Put the desired number of
characters after the ‘-n’. If the length of the category label
is less than or equal to the specified limit (default=60), the
category label (if any) will be output using the ‘labl’ element.
If the label is longer than the specified limit, it will be
output using the ‘txt’ element.
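The length rule can be pictured with a few lines of shell. This is only an illustration of the comparison described above, not XCONVERT's own code; the function name is hypothetical, and the length is counted as the shell counts it, which matches character counts for plain ASCII labels.

```shell
# Illustrative only: mimics the documented rule, not XCONVERT itself.
# Prints the DDI element name that a category label of this length
# would be written to.
ddi_label_element() {
    label=$1
    limit=${2:-60}    # 60 is the documented default for -n
    if [ "${#label}" -le "$limit" ]; then
        echo labl
    else
        echo txt
    fi
}

ddi_label_element "Strongly agree"       # 14 characters, limit 60: labl
ddi_label_element "Strongly agree" 10    # 14 characters, limit 10: txt
```

The second call corresponds to running XCONVERT with `-n 10`.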
File with overall dataset definitions -- DDL output only (-s)
(This option applies only if DDL output for SDA has been
requested.) Data definition files for SAS, SPSS, and Stata do
not contain all of the overall dataset information required for a
valid DDL file. One way to supply this information is to prepare
a separate file containing the appropriate information, and to
give the name of that file after the ‘-s’ option flag.
This file should include at least two sets of definitions -- the
overall study specifications (study title, directory location,
and default values for some of the specifications for individual
variables) and the specifications for the ‘CASEID’ variable.
for explanations and examples of these specifications.)
XCONVERT will put the contents of this study definition file at
the beginning of its DDL output file. The dataset definition
file, consequently, could also contain previously created DDL
specifications for variables not defined in the data definition
file -- variables created by transforming other variables, for
example.
If this file is not found, XCONVERT will generate whatever
dataset definitions it can obtain from the source files; other
necessary dataset definitions will have blanks after the equal
sign, and you will have to edit the resulting DDL file manually.
If you really want to generate DDL without overall dataset
definitions at the beginning of the file -- in order to append
the output to a DDL file that already has that information, for
example -- simply create a file with nothing in it, and give the
name of that file after the ‘-s’ flag.
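A minimal sketch of the empty-file approach follows. All file names are placeholders, and the XCONVERT command is skipped when the program is not on the search path.

```shell
# Create a zero-length file to give after the -s flag.
: > nostudy.txt

# Placeholder file names; the option flags are the ones documented above.
if command -v xconvert >/dev/null 2>&1; then
    xconvert -x sas -y DDL -i mysas.txt -o morevars.txt -s nostudy.txt
fi
```

The resulting ‘morevars.txt’ would then contain only variable definitions, ready to be appended to an existing DDL file.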
Rename the Variable List File (-v)
XCONVERT will always produce a list of the names of all the
variables that were written to the DDI or DDL output file. If
you select this option, and give a filename after the ‘-v’ flag,
the variable list will be written onto that file instead of onto
the default file ‘XCONVERT.LST’. The variable list is written
with one variable name per line, in the order that they are found
in the source file.
This list of variables is particularly useful as input to the SDA XCODEBK
program. XCODEBK uses a variable list to control the order of
variables in a codebook. Without a list of variables, XCODEBK
will output the variable descriptions of an SDA dataset in
alphabetical order by variable name, which is unlikely to be the
same order in which questions were asked during the interview.
The variable list can also be edited to insert headings for
groups of variables and instructions about which output template
to apply to which variables. XCODEBK will then generate a
codebook using those headings and the specified templates.
Write Variable Descriptions Only for Listed Variables (-w)
If you want the program to generate variable descriptions for
only a subset of variables, prepare a file containing the names
of the variables you want. Variable names may be listed one per
line or several per line, separated by spaces, tabs, or commas.
Blank lines in this file are ignored, as is everything to the
right of a pound sign (#).
Note that data descriptions are generated for variables in the
order in which those variables are defined in the source file. A
variable list file does not affect that order.
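As a sketch of the file format described above, the following creates a small variable list and passes it to XCONVERT. The variable names and file names are hypothetical, and the XCONVERT command is skipped when the program is not on the search path.

```shell
# Hypothetical variable names; replace them with names from your study.
cat > keepvars.txt <<'EOF'
# Demographics (everything after a pound sign is ignored)
AGE SEX, EDUC

INCOME    # one name per line also works
EOF

# Placeholder input and output file names.
if command -v xconvert >/dev/null 2>&1; then
    xconvert -i myspss.txt -o myddi.xml -w keepvars.txt
fi
```

This list names four variables: AGE, SEX, EDUC, and INCOME. The blank line and the comments are ignored.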
EXAMPLES
- xconvert -i myspss.txt -o myddi.xml -l
Convert the SPSS data definition file ‘myspss.txt’ into DDI and
write the DDI onto the file ‘myddi.xml’. Also, make all variable
names lowercase.
- xconvert -x spssdict -y DDL -i myspssdict.txt -o myddl.txt
Convert the SPSS data dictionary and table of column locations
contained in the file ‘myspssdict.txt’ into the DDL file
‘myddl.txt’.
- xconvert -x sas -y DDL -i mysas.txt -o myddl.txt
Convert the SAS data definition file ‘mysas.txt’ into the DDL
file ‘myddl.txt’.
|| Data Documentation Initiative
|| Data Description Language used by SDA Programs
|| Examples of SPSS/SAS/Stata files
|| Convert an SPSS file into DDL and data files
|| SDA program to convert DDL files to SPSS/SAS/Stata/DDI
CSM, UC Berkeley
April 12, 2011