SDA 4.0 Documentation for MAKESDA
NAME
makesda - generate SDA variables out of DDL and a data file
USAGE
makesda [-option] -l DDLfile -d datafile
DESCRIPTION
MAKESDA creates or modifies an SDA dataset. Prior to SDA version
4.0, the MAKESDA program had to be run as a command-line program.
Starting with version 4.0, the SDA Manager generally handles the
creation of SDA datasets (by itself running
MAKESDA). However, MAKESDA can still be used directly by the
archivist to create or modify SDA datasets, provided that the
location of the resulting dataset is communicated to the SDA
database by updating the configuration information for the
dataset in the SDA Manager.
OVERVIEW
MAKESDA reads a plain text data file (specified after '-d') and a
DDL file (specified after '-l' -- a lower-case L) which contains
the metadata describing the content of the data file. It then
stores the defined variables in a special format in the SDA
dataset.
An SDA dataset consists of a 'study directory' and two
subdirectories named 'VARS' and 'STUDYINF'.
- The 'VARS' subdirectory contains one file for each SDA
variable. Each variable file contains a binary version of the
values of that variable for each case. It also contains the
metadata for that variable, as specified in the DDL file.
- The 'STUDYINF' subdirectory contains study-level information
for that dataset. There is always a file named 'studyinf' which
contains the title for the study. There may also be information
used for searching or for imposing certain disclosure rules.
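The layout described above can be inspected with a short script.
The following Python sketch is illustrative only and is not part
of SDA. The function name describe_sda_dataset is hypothetical;
only the names 'VARS', 'STUDYINF', and 'studyinf' come from this
document.

```python
from pathlib import Path


def describe_sda_dataset(study_dir):
    """Report which parts of the documented layout exist under study_dir."""
    root = Path(study_dir)
    vars_dir = root / "VARS"
    studyinf_dir = root / "STUDYINF"

    # The document says VARS holds one file per SDA variable.
    if vars_dir.is_dir():
        variable_files = sum(1 for p in vars_dir.iterdir() if p.is_file())
    else:
        variable_files = 0

    return {
        "has_vars_dir": vars_dir.is_dir(),
        "has_studyinf_dir": studyinf_dir.is_dir(),
        "has_studyinf_file": (studyinf_dir / "studyinf").is_file(),
        "variable_file_count": variable_files,
    }
```

A study directory that reports a missing 'VARS' or 'STUDYINF'
entry is probably not a complete SDA dataset.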
MAKESDA can create an entirely new dataset, or add new variables
to an existing dataset, or modify (overwrite) existing variables.
A list of the variables defined in the DDL file is written to the
file 'MAKESDA.LST' whenever MAKESDA is run. If that file already
exists (from a previous MAKESDA run), it is overwritten. This list
of variables can be very useful for creating a variable list file
for the XCODEBK program.
Note that variable names longer than 32 characters cannot
currently be used by the MAKESDA program to create variables in
SDA.
MEANING OF THE OPTIONS
The meaning of the options is as follows:
- -c
- Check the syntax of the DDL file, but do not create any SDA
variables. It is not necessary to specify the name of a datafile
on the command line if this option is requested.
- -m
- Modify existing variables in an SDA dataset. Without this
option, only new variables defined in the DDL file will be
created, and pre-existing variables will NOT be overwritten. If
you use this option, the content of the CASEID variable for each
case in the existing SDA dataset must match the value in the data
file specified after the '-d' flag. This 'modify' option,
therefore, cannot be used if you are adding new cases to an
existing SDA dataset.
- -z
- Remove (zap) the specified SDA dataset before creating a new
one. This option will remove all of the SDA variables in the
'VARS' subdirectory of the SDA dataset, including the CASEID
variable. If you are adding cases to an SDA dataset, you must
use this option. However, this option does not remove the
contents of the 'STUDYINF' subdirectory, such as the 'SEARCH'
directory used by the SDA search procedures or the
'disclosure.txt' file, if they exist. Note that the 'studyinf'
file (located in the 'STUDYINF' subdirectory) is nevertheless
overwritten every time the program 'makesda' is run.
- -x filename
- Write an expanded version of the DDL file to the file
named `filename'. If the DDL file has been created with `copy'
commands (to avoid repeating identical specifications for many
variables), this expansion procedure will eliminate those
commands and produce a full data description for each variable.
Also, keywords that have been set globally (in the first segment
of a DDL file) are repeated in each variable definition.
- -h
- Display short program help and available options. (The
program will not do anything else.)
INPUT FILES
The data file used as input to MAKESDA must be
a plain text file, having a fixed number of records for each
case. If a record is shorter than the number of characters
defined by the `reclen=' or `lrecl=' keyword, it is padded at the
end with blanks.
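The padding rule can be sketched in a few lines of Python. This
is an illustration, not MAKESDA's own code: the function name
pad_record is hypothetical, and the treatment of records longer
than the record length (left unchanged here) is an assumption,
since this document does not describe that case.

```python
def pad_record(line, reclen):
    """Pad a short record with trailing blanks up to reclen characters."""
    # Drop only the line terminator; blanks inside the record are data.
    record = line.rstrip("\r\n")
    # ljust adds blanks on the right; longer records are returned as-is
    # (an assumption of this sketch).
    return record.ljust(reclen)
```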
The data description file must be written in the Data Description
Language (DDL). The DDL file can be created with a text editor or
with various converter programs. MAKESDA can also read older DDL
files in the format used by the CSA programs.
The DDL file must describe the characteristics of both the
overall data file and the individual variables to be converted
into SDA variables. The first variable description MUST be for a
variable named `CASEID'.
If variables are added to an existing SDA dataset, MAKESDA checks
the contents of the CASEID variable to make sure that the CASEID
value for each case matches the value stored previously in the
SDA dataset. It also checks the contents of CASEID if variables
are being modified.
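The kind of case-by-case comparison described above can be
sketched as follows. This is only an illustration under stated
assumptions: the function name first_caseid_mismatch is
hypothetical, the CASEID values are assumed to be already read
into two lists in file order, and a differing number of cases is
treated as a mismatch. How MAKESDA itself performs and reports
the check is not described in this document.

```python
def first_caseid_mismatch(stored_ids, incoming_ids):
    """Return the 0-based index of the first non-matching case, or None."""
    # Compare the CASEID values position by position.
    for index, (stored, incoming) in enumerate(zip(stored_ids, incoming_ids)):
        if stored != incoming:
            return index
    # All shared positions match; a length difference means one side
    # has extra cases, starting at the shorter length.
    if len(stored_ids) != len(incoming_ids):
        return min(len(stored_ids), len(incoming_ids))
    return None
```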
CHARACTER VARIABLES
Beginning with version 2.0, SDA expanded the treatment of
character variables. This section provides important information
on how MAKESDA reads character data from the input data file and
stores the data as character variables.
Blanks in a character field
When MAKESDA processes character variable values, spaces are
automatically "normalized" before being stored as SDA variables.
This means:
- Leading and trailing blanks are NOT considered significant
for character values.
- Multiple INTERNAL spaces are replaced with a single space.
For example, "  New    York  " is stored as "New York".
All-blank fields
There are various things you can do with an input field that is
completely blank:
- Leave it as a valid code:
An input field that is completely blank can be stored as such and
can be used as a filter variable by specifying the content as two
quotation marks with nothing in between: ""
- Define it as missing-data:
An all-blank field can be defined as a missing-data field by
using the following DDL specification:
md_c = ""
- Convert it to other characters
An all-blank input field can be converted to some other character
value before being stored in an SDA variable file
by using the following DDL specification:
blank_c = New Content
If the "New Content" you specify has more characters than are
defined in the 'width=' specification for this variable, the full
"New Content" is still stored in the SDA dataset.
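The blank-handling rules in this section can be summarized in a
small Python sketch. It is not MAKESDA's implementation: the
function name stored_value is hypothetical, and its blank_c
parameter simply mirrors the DDL keyword of the same name.

```python
def stored_value(raw_field, blank_c=None):
    """Return the character value that would be stored for raw_field."""
    # Strip leading/trailing blanks and collapse internal runs of blanks.
    normalized = " ".join(part for part in raw_field.split(" ") if part)
    if normalized == "" and blank_c is not None:
        # An all-blank field is replaced by the blank_c content, which is
        # kept whole even if it is longer than the input field.
        return blank_c
    return normalized
```

With no blank_c value, an all-blank field is stored as the empty
string, which corresponds to the "" used in filters and in the
md_c specification.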
Forcing Upper or Lower Case
By default, the case of a character input field is left as is,
and it is stored in SDA as a case-sensitive character variable.
However, in the DDL file specifications for a character variable,
you can specify that the input string be converted entirely to
upper case or to lower case. This is done by specifying either
'case_c=upper' or 'case_c=lower' for a particular variable (or
this can also be done globally for all character variables
defined in that DDL file).
Unless the case of a character variable really matters, it is
often a good idea to force the characters to be all the same
case. For example, if you have a character variable for gender,
and if the contents are 'M' or 'm' for male, and 'F' or 'f' for
female, you would probably want to make all of the values either
upper or lower case. Otherwise, when you use that variable in a
table, you will get four rows or columns for gender instead of
two.
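The effect on a table can be seen in a small Python sketch. The
function name tabulate is hypothetical, and its case_c parameter
only mimics the DDL keyword; this is not how SDA builds tables.

```python
from collections import Counter


def tabulate(values, case_c=None):
    """Count distinct codes, optionally forcing upper or lower case first."""
    if case_c == "upper":
        values = [v.upper() for v in values]
    elif case_c == "lower":
        values = [v.lower() for v in values]
    return Counter(values)


genders = ["M", "m", "F", "f", "M"]
print(len(tabulate(genders)))           # four distinct codes
print(len(tabulate(genders, "upper")))  # two distinct codes
```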
If you use the 'case_c=' specification, there are some
ramifications:
- Missing data definitions:
The case conversion is applied to the character code defined as a
missing-data code.
For example, if you specify that the input string should be
converted to upper case, then the following specifications all
have the same meaning:
md_c= REFUSED
md_c= Refused
md_c= refused
- Category labels:
The case conversion is applied to the character code for which a
label is defined.
For example, if you specify that the input string should be
converted to upper case, then the following specifications all
have the same meaning:
catlabels=
DK Don't know
dk Don't know
Dk Don't know
Note, however, that the case conversion applies only to the
category code -- and NOT to the category label. In the above
example, the label "Don't know" remains in mixed upper and lower
case, regardless of what happens to the category code itself.
Selection filter variables:
Character variables can be used as selection filters in the same
way that numeric variables are used. Note, however, that the
values of a selection filter variable are NOT case sensitive.
Also, leading and trailing blanks are stripped from character
codes specified as filter variables, and multiple internal blanks
are reduced to a single blank. The same normalization is applied
to character values before they are stored as SDA variables, so
the filter values should match the stored character values unless
there is a substantive difference in the codes.
For example, the following filter specifications all have the
same effect, regardless of whether the values of the character
variable 'state' have been forced to upper case or to lower case,
or have been left as mixed case:
state("New York")
state(" NEW YORK ")
state("New    York")
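The matching rule for character filters can be sketched in
Python as follows. The function names normalize and
filter_matches are hypothetical, and the sketch only restates
the rules given above (blank normalization plus a comparison
that ignores case); it is not SDA's filter code.

```python
def normalize(value):
    """Strip outer blanks and reduce internal runs of blanks to one."""
    return " ".join(part for part in value.split(" ") if part)


def filter_matches(stored_value, filter_value):
    """Compare a stored character value with a filter value, ignoring case."""
    return normalize(stored_value).casefold() == normalize(filter_value).casefold()
```

Under these rules "New York", "  NEW YORK  ", and "New    York"
all select the same cases, whereas "NewYork" does not, because
removing the internal blank is a substantive difference in the
code.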
DIAGNOSTIC MESSAGES
Diagnostic and error messages are appended to the file
`MAKESDA.MSG'. Messages about progress in the number of
variables processed are displayed on the screen.
EXAMPLES
- makesda -c -l myddl
- Check the DDL file named 'myddl'
- makesda -l myddl -d mydata
- Create an SDA dataset out of the files 'myddl' and 'mydata',
but do NOT modify any existing SDA variables in the study
specified in the 'path=' keyword in the top section of the DDL
file 'myddl'.
- makesda -m -l myddl -d mydata
- Create an SDA dataset out of the files 'myddl' and 'mydata',
and MODIFY any existing SDA variables that are included in
'myddl'.
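When MAKESDA is driven from a script, the command lines above
can be assembled programmatically. The Python sketch below uses
only the '-c', '-m', '-l', and '-d' flags documented here; the
function name build_makesda_args and its parameters are
hypothetical, and the sketch only builds the argument list
without running the program.

```python
def build_makesda_args(ddl_file, data_file=None, check_only=False, modify=False):
    """Build a makesda argument list from the documented flags."""
    args = ["makesda"]
    if check_only:
        args.append("-c")  # syntax check only; no data file is needed
    if modify:
        args.append("-m")  # overwrite existing variables
    args += ["-l", ddl_file]
    if data_file is not None:
        args += ["-d", data_file]
    elif not check_only:
        raise ValueError("a data file is required unless check_only is set")
    return args


print(" ".join(build_makesda_args("myddl", check_only=True)))
print(" ".join(build_makesda_args("myddl", "mydata", modify=True)))
```

The resulting list could be passed to subprocess.run on a system
where 'makesda' is on the search path.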
SEE ALSO
DDL |
Summary of the Data Description Language
CSM, UC Berkeley/ISA
December 13, 2016