SDA 4.1 Documentation for XCONVERT


NAME

xconvert - Convert SAS, SPSS, or Stata definitions into XML (DDI) or DDL

USAGE

xconvert [-option] -i input_file

DESCRIPTION

XCONVERT converts data definition statements for SAS, SPSS, or Stata into descriptions of variables, either in XML using the conventions of the Data Documentation Initiative (DDI-Codebook) or in the DDL format used by SDA programs.

Note that the current version of SDA can import SPSS and Stata system files directly, so there is no longer a need to create a DDL file just to create an SDA dataset.

When converting data definitions into a DDL file, note that variable names longer than 32 characters cannot currently be used by the MAKESDA program to create SDA variables.


OVERVIEW

XCONVERT obtains the information about each variable from a file that contains data definitions of the form used to create a system file for SAS, SPSS, or Stata. (This type of file is sometimes referred to as a "syntax file.") The program can also read a dictionary file for an SPSS system file. This dictionary file should match the ASCII file and table created by the SPSS `write' command (in versions of SPSS before Version 15).

The locations of the variables must be specified as a set of fixed columns in the SPSS, SAS, or Stata data definition files. Freefield entry or format statements are not recognized.

Each variable defined for one of those systems will produce a DDI or a DDL segment that includes the variable name, the variable label, the category labels, missing-data information, and the record and column location in the ASCII data file.

Note that the default output of this program is a valid DDI file, but with no text for survey questions and no study-level information except a title. To make it a complete DDI file, you will have to merge question text and additional study-level information into it.

WARNING: Conversions from SPSS have been tested on a wider variety of formats than conversions from SAS or Stata. Not all possible forms of data definitions are recognized, but the program is general enough to be useful. Contact SDA support if you have problems (sda@berkeley.edu).


MEANING OF THE OPTIONS

File Names for Input and Output

-i fname
Take input from the file `fname'. (This specification is REQUIRED.)

-o fname
Write the output onto the file `fname' (instead of to the standard output).

Input Type

-x input_type
Type of input file
(The default is `spss', an SPSS data definition or syntax file. Other input types must be specified, in either upper or lower case.)

spss
SPSS data definitions.
Note that the definitions in an SPSS syntax file must have a period at the end of each command series.
See the discussion of SPSS definitions below.

spssbatch (or, equivalently, `oldspss')
Older SPSS data definitions.
In this format, column 1 in each line is reserved for major keywords. A period at the end of each command series is optional.
See the discussion of SPSS definitions below.

spssdict
SPSS dictionary file plus the table produced by using the SPSS `write' command on an SPSS system file (a file with the `.sav' suffix).
This procedure only works with SPSS versions before Version 15. See also the discussion of SPSS system files below.

sas
SAS data definitions.

stata
Stata data definitions.
See the discussion of Stata definitions below.

Output Type

-y output_type
Type of output file: DDI or DDL (Default is DDI).

DDI
An XML file following the conventions of the Data Documentation Initiative (DDI-Codebook).

DDL
A Data Description Language file (DDL file) used by SDA programs.

Other Options

-c
Convert all variable names to capital letters.

-l
Convert all variable names (except CASEID for DDL output) to lowercase letters.
(The option flag is a lowercase `L'.)

-n max_characters
Maximum number of characters to output as a short category label. (Default is 60.)
(See the discussion of maximum characters below. This option applies only to XML output for the DDI.)

-s fname
Take overall study definitions from the file `fname'.
(See the discussion of overall dataset definitions below. This option applies only to DDL output for SDA programs.)

-v fname
Write list of variables processed onto the file `fname', instead of the file `XCONVERT.LST'.
(See discussion on renaming the variable list below.)

-w fname
Write variable descriptions only for the variables listed in the file `fname'.
(See discussion on writing variable descriptions below.)

-h
Display short program help and available options.
(The program will not do anything else. Same effect as executing the program with no options.)

FURTHER DISCUSSION OF INPUT TYPES

SPSS interactive-style definitions (-x spss)

The current SPSS interactive-style data definitions do not require column 1 of a line to be reserved for the beginning of major keywords (like 'data list'). Commands can continue on subsequent lines without regard for columns.

The end of a command series MUST be indicated by a period either at the end of a line OR at the beginning of a line.

SPSS batch-style definitions (-x spssbatch or -x oldspss)

The program can read SPSS data definitions written in the older SPSS batch format. In this format all commands (like `DATA LIST' or `VALUE LABELS') must begin in column 1. Continuation lines for specifications cannot use column 1.

If you do not want to begin a command in column 1 (in order to make the command file more readable by indenting some of the commands), you can put a plus sign or a minus sign in the first column and then begin the command in some other column.

Putting a period at the end of each command series is optional.
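
The column-1 rule described above can be illustrated with a short Python sketch. This is not XCONVERT's own parser; it is a minimal illustration that assumes a line starts a command when its first column holds a non-blank character, and that a plus or minus sign in column 1 marks an indented command. The function names and the sample lines are hypothetical.

```python
def starts_command(line: str) -> bool:
    """True if the line begins a new command under the batch-style rule."""
    if not line.strip():
        return False                      # blank lines start nothing
    first = line[0]
    if first in "+-":
        return bool(line[1:].strip())     # sign in column 1, command follows
    return not first.isspace()            # anything else in column 1


def group_commands(lines):
    """Join each command line with its continuation lines."""
    commands = []
    for line in lines:
        if not line.strip():
            continue
        if starts_command(line):
            text = line[1:] if line[0] in "+-" else line
            commands.append(text.strip())
        elif commands:
            commands[-1] += " " + line.strip()
    return commands


# Made-up fragment: the second command is indented behind a plus sign.
sample = [
    "DATA LIST /",
    "     AGE 1-2 SEX 3",
    "+    VALUE LABELS SEX 1 'Male'",
    "     2 'Female'",
]

for command in group_commands(sample):
    print(command)
# DATA LIST / AGE 1-2 SEX 3
# VALUE LABELS SEX 1 'Male' 2 'Female'
```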

Converting an SPSS system file (-x spssdict)

An SPSS system file (with a `.sav' suffix) contains both the data and the metadata. There are various possibilities available to convert such files, depending on the version of SPSS you are running.

Stata definitions (-x stata)

Data definitions for Stata are usually divided into a `do-file' (a command file with the suffix `.do' at the end of its name) and a dictionary file (with the suffix `.dct'). For purposes of conversion to DDI or DDL, combine those two files into a single file (in either order), and use that single file as input to XCONVERT.
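
Any method of concatenating the two files will do. As one possibility, the following Python sketch joins a do-file and a dictionary file into a single file. The function name and all file names are hypothetical; the only assumption beyond the text above is that a line break is added between the two parts if the first one lacks a final newline.

```python
from pathlib import Path


def combine_stata_definitions(do_file, dct_file, out_file):
    """Write the do-file followed by the dictionary file to out_file."""
    chunks = []
    for name in (do_file, dct_file):
        data = Path(name).read_bytes()
        # Keep the two parts on separate lines (an assumption of this sketch).
        if data and not data.endswith(b"\n"):
            data += b"\n"
        chunks.append(data)
    out_path = Path(out_file)
    out_path.write_bytes(b"".join(chunks))
    return out_path
```

For example, after calling combine_stata_definitions("mydata.do", "mydata.dct", "mystata.txt"), the combined file could be converted with `xconvert -x stata -i mystata.txt -o myddi.xml'.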

FURTHER DISCUSSION OF OTHER OPTIONS

Maximum Number of Characters in Short Category Labels -- DDI Output Only (-n xx)

There are two DDI specifications for the labels of categories -- the 'labl' element, and the 'txt' element. In general, the 'labl' element is intended to be used as a shorter label for statistical analysis programs, whereas the 'txt' element is intended to be used as a longer explanation of the meaning of a particular category.

The '-n' option allows the user to define what is meant by 'shorter' or 'longer' labels. Put the desired number of characters after the '-n'. If the length of the category label is less than or equal to the specified limit (default=60), the category label (if any) will be output using the 'labl' element. If the label is longer than the specified limit, it will be output using the 'txt' element.
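
The rule can be written out as a small Python sketch. It only restates the comparison described above and is not taken from the XCONVERT source; the function name is hypothetical, and counting Python characters (rather than bytes) is an assumption.

```python
from typing import Optional


def category_element(label: str, max_chars: int = 60) -> Optional[str]:
    """Return the DDI element a category label would be written to."""
    if not label:
        return None                       # no label, nothing to output
    return "labl" if len(label) <= max_chars else "txt"


print(category_element("Strongly agree"))                # labl
print(category_element("x" * 61))                        # txt
print(category_element("Strongly agree", max_chars=10))  # txt
```

With the default limit, a label of exactly 60 characters still goes to 'labl'; lowering the limit with '-n 10' would move the 14-character label above to 'txt'.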

File with overall dataset definitions -- DDL output only (-s)

(This option applies only if DDL output for SDA has been requested.) Data definition files for SAS, SPSS, and Stata do not contain all of the overall dataset information required for a valid DDL file. One way to supply this information is to prepare a separate file containing the appropriate information, and to give the name of that file after the `-s' option flag.

This file should include at least two sets of definitions -- the overall study specifications (study title, directory location, and default values for some of the specifications for individual variables) and the specifications for the `CASEID' variable. (See the DDL document for explanations and examples of these specifications.)

XCONVERT will put the contents of this study definition file at the beginning of its DDL output file. The dataset definition file, consequently, could also contain previously created DDL specifications for variables not defined in the data definition file -- variables created by transforming other variables, for instance.

If this file is not found, XCONVERT will generate whatever dataset definitions it can obtain from the source files; other necessary dataset definitions will have blanks after the equal sign, and you will have to edit the resulting DDL file manually.

If you really want to generate DDL without overall dataset definitions at the beginning of the file -- in order to append the output to a DDL file that already has that information, for example -- simply create a file with nothing in it, and give the name of that file after the `-s' flag.

Rename the Variable List File (-v)

XCONVERT will always produce a list of the names of all the variables that were written to the DDI or DDL output file. If you select this option and give a filename after the `-v' flag, the variable list will be written onto that file instead of onto the default file `XCONVERT.LST'. The variable list is written with one variable name per line, in the order in which the variables are found in the source file.

This list of variables is particularly useful as input to the SDA XCODEBK program. XCODEBK uses a variable list to control the order of variables in a codebook. Without a list of variables, XCODEBK will output the variable descriptions of an SDA dataset in alphabetical order by variable name, which is unlikely to be the same order in which questions were asked during the interview. The variable list can also be edited to insert headings for groups of variables and instructions about which output template to apply to which variables. XCODEBK will then generate a codebook using those headings and the specified templates.

Write Variable Descriptions Only for Listed Variables (-w)

If you want the program to generate variable descriptions for only a subset of variables, prepare a file containing the names of the variables you want. Variable names may be listed one per line or several per line, separated by spaces, tabs, or commas. Blank lines in this file are ignored, as is everything to the right of a pound sign (#).

Note that data descriptions are generated for variables in the order in which those variables are defined in the source file. A variable list file does not affect that order.
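
The file format and the ordering rule described above can be sketched in Python as follows. This is an illustration of the stated rules, not XCONVERT's own code; the function name and the sample variable names are hypothetical, and matching names exactly (including case) is an assumption.

```python
import re


def read_variable_list(text: str) -> list[str]:
    """Collect variable names from the text of a `-w' list file."""
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0]      # ignore everything after '#'
        # Names may be separated by spaces, tabs, or commas.
        names.extend(tok for tok in re.split(r"[ \t,]+", line) if tok)
    return names


sample = """\
# demographic items
age sex, race

educ\tincome   # socioeconomic items
"""

wanted = read_variable_list(sample)
print(wanted)    # ['age', 'sex', 'race', 'educ', 'income']

# Output follows the order of the source file, not the order of the list.
source_order = ["income", "age", "region", "sex"]
selected = [name for name in source_order if name in set(wanted)]
print(selected)  # ['income', 'age', 'sex']
```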


EXAMPLES

xconvert -i myspss.txt -o myddi.xml -l

Convert the SPSS data definition file `myspss.txt' into DDI and write the DDI onto the file `myddi.xml'. Also, make all variable names lowercase.

xconvert -x spssdict -y DDL -i myspssdict.txt -o myddl.txt

Convert the SPSS data dictionary and table of column locations contained in the file `myspssdict.txt' into the DDL file `myddl.txt'.

xconvert -x sas -y DDL -i mysas.txt -o myddl.txt

Convert the SAS data definition file `mysas.txt' into the DDL file `myddl.txt'.

SEE ALSO

DDI Data Documentation Initiative
DDL Data Description Language used by SDA Programs
xconverte Examples of SPSS/SAS/Stata files
ddltox SDA program to convert DDL files to SPSS/SAS/Stata/DDI


CSM, UC Berkeley/ISA
September 10, 2020