SDA 4.1 Documentation for XCONVERT
NAME
xconvert - Convert SAS, SPSS, or Stata definitions into XML (DDI)
or DDL
USAGE
xconvert [-option] -i input_file
DESCRIPTION
XCONVERT converts data definition statements for SAS, SPSS, or
Stata into descriptions of variables either in XML, using the
conventions of the
Data Documentation Initiative
(DDI-Codebook) or in the
DDL format
used by SDA programs.
Note that the current version of SDA can import SPSS and Stata
system files directly, so there is no longer any need to create a
DDL file just to create an SDA dataset.
When converting data definitions into a DDL file, note that
variable names longer than 32 characters cannot currently be used
by the
MAKESDA
program to create SDA variables.
OVERVIEW
XCONVERT obtains the information about each variable from a file
that contains data definitions of the form used to create a
system file for SAS, SPSS, or Stata. (This type of file is
sometimes referred to as a "syntax file.") The program can also
read a dictionary file for an SPSS system file. This dictionary
file should match the ASCII file and table created by the SPSS
`write' command (in versions of SPSS before Version 15).
The locations of the variables must be specified as a set of
fixed columns in the SPSS, SAS, or Stata data definition files.
Freefield entry or format statements are not recognized.
Each variable defined for one of those systems will produce a DDI
or a DDL segment that includes the variable name, the variable
label, the category labels, missing-data information, and the
record and column location in the ASCII data file.
Note that the default output of this program is a valid DDI file,
but with no text for survey questions and no study-level
information except a title. You will have to merge question text
and additional study-level information into the file, to make it
a complete DDI file.
WARNING: Conversions from SPSS have been tested with a wider
variety of formats than conversions from SAS or Stata. Not all
possible forms of data definitions are recognized, but the
program is general enough to be useful. Contact SDA support
(sda@berkeley.edu) if you have problems.
MEANING OF THE OPTIONS
File Names for Input and Output
- -i fname
- Take input from the file `fname'. (This specification is
REQUIRED.)
- -o fname
- Write the output onto the file `fname' (instead of to the
standard output).
Input Type
- -x input_type
- Type of input file
(The default is `spss', an SPSS data definition or syntax file.
Other input types must be specified, in either upper or lower
case.)
- spss
- SPSS data definitions.
Note that the definitions in an SPSS syntax file must have a
period at the end of each command series.
See the discussion of
SPSS definitions
below.
- spssbatch (or, equivalently, `oldspss')
- Older SPSS data definitions.
In this format, column 1 in each line is reserved for major
keywords. A period at the end of each command series is
optional.
See the discussion of
SPSS definitions
below.
- spssdict
- SPSS dictionary file plus the table produced by using the
SPSS `write' command on an SPSS system file (a file with the
`.sav' suffix).
This procedure only works with SPSS versions before Version 15.
See also the discussion of
SPSS system files
below.
- sas
- SAS data definitions.
- stata
- Stata data definitions.
See the discussion of
Stata definitions
below.
Output Type
- -y output_type
- Type of output file: DDI or DDL (Default is DDI).
- DDI
- An XML file following the conventions of the Data
Documentation Initiative
(DDI-Codebook).
- DDL
- A Data Description Language file
(DDL file)
used by SDA programs.
Other Options
- -c
- Convert all variable names to capital letters.
- -l
- Convert all variable names (except CASEID for DDL output) to
lowercase letters.
(The option flag is a lowercase `L'.)
- -n max_characters
- Maximum number of characters to output as a short category
label (default is 60).
(See discussion on
maximum characters
below. This option applies only to XML output for the DDI.)
- -s fname
- Take overall study definitions from the file `fname'.
(See discussion on
overall dataset definitions
below. This option applies only to DDL output for SDA
programs.)
- -v fname
- Write list of variables processed onto the file `fname',
instead of the file `XCONVERT.LST'.
(See discussion on
renaming the variable list
below.)
- -w fname
- Write variable descriptions only for the variables listed in
the file `fname'.
(See discussion on
writing variable descriptions
below.)
- -h
- Display short program help and available options.
(The program will not do anything else. Same effect as executing
the program with no options.)
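The flags described above can be assembled programmatically. The following is a minimal sketch, not part of SDA: the helper name `build_xconvert_command' and its parameter names are hypothetical, and only a subset of the documented flags is covered. It builds the argument list without running the program.

```python
from typing import List, Optional


def build_xconvert_command(
    input_file: str,
    output_file: Optional[str] = None,
    input_type: Optional[str] = None,   # spss, spssbatch, oldspss, spssdict, sas, stata
    output_type: Optional[str] = None,  # DDI or DDL
    uppercase: bool = False,            # -c
    lowercase: bool = False,            # -l
) -> List[str]:
    """Assemble an xconvert argument list from the documented flags.

    Hypothetical helper: it only builds the list, it does not run xconvert.
    """
    cmd = ["xconvert"]
    if input_type is not None:
        cmd += ["-x", input_type]
    if output_type is not None:
        cmd += ["-y", output_type]
    cmd += ["-i", input_file]           # -i is the only required option
    if output_file is not None:
        cmd += ["-o", output_file]
    if uppercase:
        cmd.append("-c")
    if lowercase:
        cmd.append("-l")
    return cmd


# Prints: xconvert -x sas -y DDL -i mysas.txt -o myddl.txt
print(" ".join(build_xconvert_command("mysas.txt", "myddl.txt", "sas", "DDL")))
```

The resulting list could be passed to `subprocess.run' on a system where xconvert is installed and on the search path.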
FURTHER DISCUSSION OF INPUT TYPES
SPSS interactive-style definitions (-x spss)
The current SPSS interactive-style data definitions do not
require column 1 of a line to be reserved for the beginning of
major keywords (like 'data list'). Commands can continue on
subsequent lines without regard for columns.
The end of a command series MUST be indicated by a period either
at the end of a line OR at the beginning of a line.
SPSS batch-style definitions (-x spssbatch or -x oldspss)
The program can read SPSS data definitions written in the older
SPSS batch format. In this format all commands (like `DATA LIST'
or `VALUE LABELS') must begin in column 1. Continuation lines
for specifications cannot use column 1.
If you do not want to begin a command in column 1 (in order to
make the command file more readable by indenting some of the
commands), you can put a plus sign or a minus sign in the first
column and then begin the command in some other column.
Putting a period at the end of each command series is optional.
Converting an SPSS system file (-x spssdict)
An SPSS system file (with a `.sav' suffix) contains both the data
and the metadata. There are various possibilities available to
convert such files, depending on the version of SPSS you are
running.
- SPSS Versions 15 and later
For newer versions of SPSS, the simplest procedure is to import
the SPSS system file into SDA. Then you can use the
SDAtoXML
program to generate a DDI file.
- Older Versions of SPSS
After you start SPSS and load the system file, you can execute
two commands to generate the required files -- the `write'
command and the `display' command.
- An ASCII data file is produced by using the SPSS `write'
command. Although you will need the data file later to set up
the study in SDA or for some other purpose, the output needed by
XCONVERT is the table of the column locations for each variable.
In order to get this table, you must specify the option `table'
for the `write' command.
- The metadata (labels and missing-data specifications) are
produced by using the `display dictionary'
command.
The two commands together would look like this:
write outfile=mydata.txt table / all.
display dictionary.
execute.
Copy the table of column locations (produced by running the
`write' command) onto either the top or the bottom of the output
produced by the `display dictionary' command, and save the result
as a new text file.
This combined file is the required `spssdict' input for the
XCONVERT program.
Stata definitions (-x stata)
Data definitions for Stata are usually divided into a `do-file'
(a command file with the suffix `.do' at the end of its name) and
a dictionary file (with the suffix `.dct'). For purposes of
conversion to DDI or DDL, combine those two files into a single
file (in either order), and use that single file as input to
XCONVERT.
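Combining the two Stata files amounts to a plain concatenation. The sketch below shows one way to do it; the function name `combine_files' and the file names in the comments are hypothetical. It copies bytes unchanged, so it makes no assumption about the character encoding of the source files.

```python
from pathlib import Path


def combine_files(parts, combined):
    """Concatenate files, in the given order, into a single file."""
    with open(combined, "wb") as out:
        for part in parts:
            data = Path(part).read_bytes()
            out.write(data)
            if data and not data.endswith(b"\n"):
                out.write(b"\n")  # keep the next file's first line on its own line
    return combined


# Typical use (hypothetical file names):
#   combine_files(["mystudy.do", "mystudy.dct"], "mystata.txt")
# and then, on the command line:
#   xconvert -x stata -i mystata.txt -o myddi.xml
```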
FURTHER DISCUSSION OF OTHER OPTIONS
Maximum Number of Characters in Short Category Labels -- DDI Output Only (-n xx)
There are two DDI specifications for the labels of categories --
the 'labl' element, and the 'txt' element. In general, the
'labl' element is intended to be used as a shorter label for
statistical analysis programs, whereas the 'txt' element is
intended to be used as a longer explanation of the meaning of a
particular category.
The '-n' option allows the user to define what is meant by
'shorter' or 'longer' labels. Put the desired number of
characters after the '-n'. If the length of the category label
is less than or equal to the specified limit (default=60), the
category label (if any) will be output using the 'labl' element.
If the label is longer than the specified limit, it will be
output using the 'txt' element.
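The rule above can be written as a one-line comparison. This is only an illustration of the documented threshold, not XCONVERT's own code; the function name is hypothetical, and it assumes that length is counted in characters as Python counts them.

```python
DEFAULT_MAX_CHARACTERS = 60  # documented default for -n


def category_label_element(label: str, max_characters: int = DEFAULT_MAX_CHARACTERS) -> str:
    """Return the DDI element name that would carry this category label."""
    return "labl" if len(label) <= max_characters else "txt"


print(category_label_element("Strongly agree"))      # labl
print(category_label_element("x" * 61))              # txt (one over the default)
print(category_label_element("Strongly agree", 10))  # txt (as if run with -n 10)
```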
File with overall dataset definitions -- DDL output only (-s)
(This option applies only if DDL output for SDA has been
requested.) Data definition files for SAS, SPSS, and Stata do
not contain all of the overall dataset information required for a
valid DDL file. One way to supply this information is to prepare
a separate file containing the appropriate information, and to
give the name of that file after the `-s' option flag.
This file should include at least two sets of definitions -- the
overall study specifications (study title, directory location,
and default values for some of the specifications for individual
variables) and the specifications for the `CASEID' variable.
(See the
DDL document
for explanations and examples of these specifications.)
XCONVERT will put the contents of this study definition file at
the beginning of its DDL output file. The dataset definition
file, consequently, could also contain previously created DDL
specifications for variables not defined in the data definition
file -- variables created by transforming other variables, for
instance.
If this file is not found, XCONVERT will generate whatever
dataset definitions it can obtain from the source files; other
necessary dataset definitions will have blanks after the equal
sign, and you will have to edit the resulting DDL file manually.
If you really want to generate DDL without overall dataset
definitions at the beginning of the file -- in order to append
the output to a DDL file that already has that information, for
example -- simply create a file with nothing in it, and give the
name of that file after the `-s' flag.
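As a minimal sketch of the empty-file approach just described, the following creates a zero-length file and builds a matching argument list. The file names `empty_study.txt' and `myvars_ddl.txt' are hypothetical, and the command is only printed, not executed.

```python
from pathlib import Path

empty_study = Path("empty_study.txt")  # hypothetical file name
empty_study.write_text("")             # create (or truncate to) a zero-length file

# Argument list for a DDL run that takes its (empty) study definitions
# from that file, so the output starts directly with the variables.
cmd = ["xconvert", "-y", "DDL", "-s", str(empty_study),
       "-i", "myspss.txt", "-o", "myvars_ddl.txt"]
print(" ".join(cmd))
```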
Rename the Variable List File (-v)
XCONVERT will always produce a list of the names of all the
variables that were written to the DDI or DDL output file. If
you select this option, and give a filename after the `-v' flag,
the variable list will be written onto that file instead of onto
the default file `XCONVERT.LST'. The variable list is written
with one variable name per line, in the order that they are found
in the source file.
This list of variables is particularly useful as input to the SDA
XCODEBK
program. XCODEBK uses a variable list to control the order of
variables in a codebook. Without a list of variables, XCODEBK
will output the variable descriptions of an SDA dataset in
alphabetical order by variable name, which is unlikely to be the
same order in which questions were asked during the interview.
The variable list can also be edited to insert headings for
groups of variables and instructions about which output template
to apply to which variables. XCODEBK will then generate a
codebook using those headings and the specified templates.
Write Variable Descriptions Only for Listed Variables (-w)
If you want the program to generate variable descriptions for
only a subset of variables, prepare a file containing the names
of the variables you want. Variable names may be listed one per
line or several per line, separated by spaces, tabs, or commas.
Blank lines in this file are ignored, as is everything to the
right of a pound sign (#).
Note that data descriptions are generated for variables in the
order in which those variables are defined in the source file. A
variable list file does not affect that order.
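The file format and the ordering rule described above can be sketched as follows. This is an illustration, not XCONVERT's parser: both function names are hypothetical, and names are assumed to match exactly, including case.

```python
import re


def read_variable_list(text: str) -> list:
    """Parse the documented -w file format into a list of variable names."""
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0]  # ignore everything right of a pound sign
        # Names are separated by spaces, tabs, or commas; blank lines yield nothing.
        names.extend(n for n in re.split(r"[ \t,]+", line) if n)
    return names


def select_in_source_order(source_names, wanted):
    """Keep only the wanted variables, in the order the source file defines them."""
    wanted_set = set(wanted)
    return [name for name in source_names if name in wanted_set]


listing = """\
# demographics
age, sex   educ
region\t# added later

income
"""
wanted = read_variable_list(listing)
print(wanted)
print(select_in_source_order(["sex", "age", "income", "marital", "educ", "region"], wanted))
```

The first line printed is the list as written in the file (age, sex, educ, region, income); the second shows the same variables rearranged into source-file order (sex, age, income, educ, region), which is the order in which descriptions would be generated.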
EXAMPLES
- xconvert -i myspss.txt -o myddi.xml -l
-
Convert the SPSS data definition file `myspss.txt' into DDI and
write the DDI onto the file `myddi.xml'. Also, make all variable
names lowercase.
- xconvert -x spssdict -y DDL -i myspssdict.txt -o myddl.txt
-
Convert the SPSS data dictionary and table of column locations
contained in the file `myspssdict.txt' into the DDL file
`myddl.txt'.
- xconvert -x sas -y DDL -i mysas.txt -o myddl.txt
-
Convert the SAS data definition file `mysas.txt' into the DDL
file `myddl.txt'.
SEE ALSO
DDI: Data Documentation Initiative
DDL: Data Description Language used by SDA Programs
xconverte: Examples of SPSS/SAS/Stata files
ddltox: SDA program to convert DDL files to SPSS/SAS/Stata/DDI
CSM, UC Berkeley/ISA
September 10, 2020