SDA 4.1 Documentation for MAKESDA


NAME

makesda - generate SDA variables out of DDL and a data file

USAGE

makesda [-option] -l DDLfile -d datafile

DESCRIPTION

MAKESDA creates or modifies an SDA dataset. Prior to SDA version 4.0, the MAKESDA program had to be run as a command-line program. Starting with version 4.0, the SDA Manager generally handles the creation of SDA datasets (by itself running MAKESDA). However, MAKESDA can still be used directly by the archivist to create or modify SDA datasets, provided that the location of the resulting dataset is communicated to the SDA database by updating the configuration information for the dataset in the SDA Manager.


OVERVIEW

MAKESDA reads a data file (specified after '-d') and a DDL file (specified after '-l' -- a lower-case L), which contains the metadata describing the content of the data file. It then stores the defined variables in a special format in the SDA dataset.

An SDA dataset consists of a 'study directory' and two subdirectories named 'VARS' and 'STUDYINF'.
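
As a rough illustration of that layout, the following Python sketch reports which of the two subdirectories named above are missing from a directory. The function name and the example path are hypothetical; the check is not part of SDA, and it says nothing about whether the contents of the subdirectories are valid.

```python
from pathlib import Path

# Subdirectory names taken from the description above.
REQUIRED_SUBDIRS = ("VARS", "STUDYINF")


def missing_subdirs(study_dir):
    """Return the required subdirectory names that are absent from study_dir."""
    root = Path(study_dir)
    return [name for name in REQUIRED_SUBDIRS if not (root / name).is_dir()]


# '/path/to/study' is a placeholder, not a real SDA location.
print(missing_subdirs("/path/to/study"))
```

An empty list means that both subdirectories are present.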

MAKESDA can create an entirely new dataset, or add new variables to an existing dataset, or modify (overwrite) existing variables.

A list of the variables defined in the DDL file is written onto the file 'MAKESDA.LST' whenever MAKESDA is run. If that file already exists (from a previous MAKESDA run), it is overwritten. This list of variables can be very useful for creating a variable list file for the XCODEBK program. Note that variable names longer than 32 characters cannot currently be used by the MAKESDA program to create variables in SDA.


MEANING OF THE OPTIONS

The meaning of the options is as follows:

-c
Check the syntax of the DDL file, but do not create any SDA variables. It is not necessary to specify the name of a datafile on the command line if this option is requested.

-m
Modify existing variables in an SDA dataset. Without this option, only new variables defined in the DDL file will be created, and pre-existing variables will NOT be overwritten. If you use this option, and if you have defined a CASEID variable, the content of the CASEID variable for each case in the existing SDA dataset must match the value in the data file specified after the '-d' flag.

-z
Remove (zap) the specified SDA dataset before creating a new one. This option will remove all of the SDA variables in the 'VARS' subdirectory of the SDA dataset, including the CASEID variable. If you are adding cases to an SDA dataset, you must use this option, which basically removes the previous SDA dataset to make room for the new one. Nevertheless, this option does not remove the contents of the 'STUDYINF' subdirectory such as the 'SEARCH' directory used by the SDA search procedures or the 'disclosure.txt' file, if they exist. Note, however, that the 'studyinf' file (located in the 'STUDYINF' subdirectory) is overwritten every time the program 'makesda' is run.

-x filename
Write an expanded version of the DDL file to the file named 'filename'. If the DDL file was created with 'copy' commands (to avoid repeating identical specifications for many variables), this expansion procedure eliminates those commands and produces a full data description for each variable. Also, keywords that have been set globally (in the first segment of a DDL file) are repeated in each variable definition.

-h
Display short program help and available options. (The program will not do anything else.)

INPUT FILES

The data file used as input to MAKESDA must be a plain text file (not a binary file). The data file may be formatted as a CSV file (comma-separated values), a TSV file (tab-separated values), or a fixed-column ASCII data file. If the format is "fixed," each variable must be in a fixed set of columns, and the file must have a fixed number of records for each case. If a record is shorter than the number of characters defined by the 'reclen=' or 'lrecl=' keyword, it is padded at the end with blanks.
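
The padding rule for fixed-column records can be pictured with a short Python sketch. This is only an illustration of the behavior described above, not MAKESDA's own code; the function names, the record length of 10, and the column positions are all hypothetical.

```python
def pad_record(line, reclen):
    """Strip the line ending, then pad with blanks up to reclen characters."""
    return line.rstrip("\r\n").ljust(reclen)


def field(record, first_col, last_col):
    """Extract columns first_col..last_col (1-based, inclusive)."""
    return record[first_col - 1:last_col]


# A 5-character record padded to a hypothetical record length of 10.
record = pad_record("0001M\n", 10)
print(repr(record))                # '0001M     '
print(repr(field(record, 5, 5)))   # 'M'
print(repr(field(record, 8, 10)))  # '   ' (columns beyond the data are blank)
```

Because of the padding, a variable defined in columns past the end of a short record is read as a blank field.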

The data description file must be written in the Data Description Language (DDL). The file with DDL can be created with a text editor or with a converter program like XCONVERT. MAKESDA can also read older DDL files in the format used by the CSA programs.

The DDL file must describe the characteristics of both the overall data file and the individual variables to be converted into SDA variables. If there is a CASEID variable, the first variable description MUST be for that variable.

If variables are added to an existing SDA dataset, MAKESDA checks the contents of the CASEID variable (if one exists) to make sure that the CASEID value for each case matches the value stored previously in the SDA dataset. It also checks the contents of CASEID if variables are being modified. If you anticipate adding or modifying variables, it is a good idea to have a CASEID variable, to enable this checking.
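
The idea behind the CASEID check can be sketched in Python as a case-by-case comparison. This is a simplified illustration under stated assumptions, not MAKESDA's actual procedure: the function name is hypothetical, and the CASEID values are assumed to be already available as two lists in case order.

```python
def first_caseid_mismatch(stored_ids, incoming_ids):
    """Return the 1-based number of the first case whose CASEID differs, or None."""
    pairs = zip(stored_ids, incoming_ids)
    for case_number, (old, new) in enumerate(pairs, start=1):
        if old != new:
            return case_number
    if len(stored_ids) != len(incoming_ids):
        # One source has extra cases after the common part.
        return min(len(stored_ids), len(incoming_ids)) + 1
    return None


print(first_caseid_mismatch(["1", "2", "3"], ["1", "2", "3"]))  # None
print(first_caseid_mismatch(["1", "2", "3"], ["1", "9", "3"]))  # 2
```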


CHARACTER VARIABLES

Beginning with version 2.0, SDA expanded the treatment of character variables. This section provides important information on how MAKESDA reads character data from the input data file and stores the data as character variables.

Blanks in a character field

When MAKESDA processes character variable values, spaces are automatically "normalized" before being stored as SDA variables. This means:

  1. Leading and trailing blanks are NOT considered significant for character values.
  2. Multiple INTERNAL spaces are replaced with a single space.

For example, "   New    York    " is stored as "New York".
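
The two normalization rules can be reproduced with a few lines of Python. This is a minimal sketch of the rules as stated above, not MAKESDA's implementation; the function name is hypothetical, and only the space character is treated as a blank.

```python
import re


def normalize_blanks(value):
    """Strip leading/trailing spaces and collapse runs of internal spaces."""
    stripped = value.strip(" ")
    return re.sub(r" {2,}", " ", stripped)


print(repr(normalize_blanks("   New    York    ")))  # 'New York'
```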

All-blank fields

There are various things you can do with an input field that is completely blank:

Forcing Upper or Lower Case

By default, the case of a character input field is left as is, and it is stored in SDA as a case-sensitive character variable.

However, in the DDL file specifications for a character variable, you can specify that the input string be converted entirely to upper case or to lower case. This is done by specifying either 'case_c=upper' or 'case_c=lower' for a particular variable (or this can also be done globally for all character variables defined in that DDL file).

Be aware that this conversion only works if the character input field is written in US-ASCII. If the field contains non-ASCII characters in UTF-8 (which is legitimate), the conversion will not be carried out.

Unless the case of a character variable really matters, it is often a good idea to force the characters to be all the same case. For example, if you have a character variable for gender, and if the contents are 'M' or 'm' for male, and 'F' or 'f' for female, you would probably want to make all of the values either upper or lower case. Otherwise, when you use that variable in a table, you will get four rows or columns for gender instead of two.
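
The effect of forcing the case can be illustrated with a short Python sketch. It is not MAKESDA's code: the function name is hypothetical, and it assumes that a field containing any non-ASCII character is left entirely unchanged, which is one reading of the US-ASCII restriction described above.

```python
def force_case(value, mode):
    """Convert value to upper or lower case, but only if it is pure ASCII."""
    if not value.isascii():
        # Assumption: fields with non-ASCII characters are left as they are.
        return value
    if mode == "upper":
        return value.upper()
    if mode == "lower":
        return value.lower()
    raise ValueError("mode must be 'upper' or 'lower'")


genders = ["M", "m", "F", "f", "M"]

# Mixed case gives four distinct categories.
print(sorted(set(genders)))  # ['F', 'M', 'f', 'm']

# Forcing upper case leaves only two.
print(sorted(set(force_case(g, "upper") for g in genders)))  # ['F', 'M']
```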

If you use the 'case_c=' specification, there are some ramifications:

Selection filter variables:

Character variables can be used as selection filters in the same way that numeric variables are used. Note, however, that the values of a selection filter variable are NOT case sensitive. Also, leading and trailing blanks are stripped from character codes specified as filter variables, and multiple internal blanks are reduced to a single blank. This is the same as happens to character values before they are stored as SDA variables, so the filter values should match the stored character values unless there is a substantive difference in the codes.

For example, the following filter specifications all have the same effect, regardless of whether the values of the character variable 'state' have been forced to upper case or to lower case, or have been left as mixed case:

state("New York")
state(" NEW YORK ")
state("New York")
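
The matching rule for character filter values can be sketched in Python as follows. This is an illustration of the behavior described above, not the SDA filter code; the function names are hypothetical, and upper-casing is used here as one simple way to make the comparison ignore case.

```python
import re


def filter_key(value):
    """Normalize blanks and fold case so that comparisons ignore both."""
    collapsed = re.sub(r" {2,}", " ", value.strip(" "))
    return collapsed.upper()


def filter_matches(filter_value, stored_value):
    """Return True if a filter code selects the stored character value."""
    return filter_key(filter_value) == filter_key(stored_value)


# The stored value may be mixed case or may have been forced to one case.
for stored in ("New York", "NEW YORK", "new york"):
    print(stored, filter_matches(" NEW YORK ", stored))
```

Each line printed ends in True, since the comparison ignores both case and extra blanks.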

DIAGNOSTIC MESSAGES

Diagnostic and error messages are appended to the file 'MAKESDA.MSG'. Messages about progress in the number of variables processed are displayed on the screen.


EXAMPLES

makesda -c -l myddl
Check the DDL file named 'myddl'

makesda -l myddl -d mydata
Create an SDA dataset out of the files 'myddl' and 'mydata', but do NOT modify any existing SDA variables in the study specified in the 'path=' keyword in the top section of the DDL file 'myddl'.

makesda -m -l myddl -d mydata
Create an SDA dataset out of the files 'myddl' and 'mydata', and MODIFY any existing SDA variables that are included in 'myddl'.

SEE ALSO

DDL - Summary of the Data Description Language


CSM, UC Berkeley/ISA
September 25, 2020