The SDA Archiver procedure facilitates the installation of SDA datasets and codebooks in an SDA data archive.
The process has seven steps, which are described in order below.
The ID for the study provides a way to identify and to locate a particular dataset. In the SDA data archive the ID is used as the name of a directory for the study.
The ID for the study must be one word containing only letters and numbers (with no spaces or special characters like punctuation marks).
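The naming rule above can be checked before an ID is submitted. The following is a minimal sketch, not part of the SDA software: the function name is hypothetical, and it assumes that "letters and numbers" means ASCII letters and digits.

```python
import re

# Assumption: "letters and numbers" is read as ASCII letters and digits only.
_STUDY_ID_PATTERN = re.compile(r"[A-Za-z0-9]+")


def is_valid_study_id(study_id: str) -> bool:
    """Return True if study_id is one word made only of letters and digits."""
    return _STUDY_ID_PATTERN.fullmatch(study_id) is not None
```

This check covers only the character rule. Whether an ID has already been used is still determined by the Archiver itself.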
When you enter a study ID, the system will check to see if that ID has already been used.
To avoid further modification to a particular study, you can freeze or lock the study, provided that you have access to the archive file system on the server. To lock a study, create a file named 'locked' in the directory that was created with the name of the study ID. This will prevent any changes from being made to this study through the use of the SDA Archiver procedure. When and if you want to allow further changes, just remove the 'locked' file from that directory. (Note that the content of that file is irrelevant. The only thing that matters is whether or not there is a file named 'locked' in that directory.)
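If you prefer to script the locking step, the following sketch shows one way to do it, assuming you have file-system access to the study directory. The function names are hypothetical. Only the file name 'locked' comes from the description above.

```python
from pathlib import Path

LOCK_FILE_NAME = "locked"  # the file name the Archiver looks for


def lock_study(study_dir: Path) -> None:
    """Create an empty 'locked' file; its content is irrelevant."""
    (study_dir / LOCK_FILE_NAME).touch()


def unlock_study(study_dir: Path) -> None:
    """Remove the 'locked' file, if there is one."""
    (study_dir / LOCK_FILE_NAME).unlink(missing_ok=True)


def is_locked(study_dir: Path) -> bool:
    """Report whether a 'locked' file is present in the study directory."""
    return (study_dir / LOCK_FILE_NAME).exists()
```

The `missing_ok` argument requires Python 3.8 or later.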
The data file must be a plain text file (an ASCII file) with each variable in a fixed set of columns.
Specify the data file's location on your local machine by using the "Browse" button. When the data file has been located, click the "Upload File" button.
The data file can be named anything on your own computer and can be located anywhere on your disks. On the server computer, however, it will be stored in the /DATA subdirectory of the study and will be given the name 'data.txt'.
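To illustrate what "each variable in a fixed set of columns" means, here is a small sketch that slices one record into variables. The layout shown (variable names and column positions) is entirely hypothetical. In practice the layout is whatever your DDL file declares.

```python
# Hypothetical layout: (variable name, first column, last column),
# with columns counted from 1 and both ends inclusive.
LAYOUT = [
    ("CASEID", 1, 4),
    ("age", 5, 6),
    ("gender", 7, 7),
]


def parse_record(line, layout=LAYOUT):
    """Slice one fixed-column record into a dict of string values."""
    return {name: line[start - 1:end].strip() for name, start, end in layout}


record = parse_record("0001342")
```

Reading a few records this way before uploading is a quick check that the columns in the data file line up with the metadata.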
The metadata file must be an SDA DDL file. This DDL file describes the content and layout of the data file you have already uploaded. For information on the syntax and content of a DDL file, see the DDL document in the online SDA manual.
If you have a metadata file in a different format, there are some tools to help you convert that metadata file into a DDL file:
Once you have generated a DDL file that matches the data file, you need to upload it to the server. Specify the DDL file's location on your local machine by using the "Browse" button. When the DDL file has been located, click the "Upload File" button.
The DDL file on your local computer can be named anything. But note that it will be stored on the server in the /METADATA/DDL subdirectory of the study and will be given the name 'ddl.txt'.
After the DDL file has been uploaded to the server, it will be checked for syntax errors. If errors are detected, fix your DDL file and then upload it again.
After the DDL file has been checked for syntax errors, the SDA DDLTOX program is automatically executed to produce metadata files in other formats: SPSS, SAS, Stata, and XML (DDI-version 2). These other metadata formats will then be available for download (unless this option is turned off when finalizing the archive).
Once you have uploaded a data file and a matching DDL (metadata) file, you are ready to create the SDA dataset for this study.
Click the button on this screen to run the SDA MAKESDA program. The SDA dataset is then stored in the archive directory that has the name of your study ID.
(If you only have a few variables in your dataset and do not need headings or other options, check the box next to "Use the basic variable list without headings" and go on to the next step to create the codebook.)
The variables will appear in the codebook and tree menu in the order in which they are given in the variable list (which is usually the same as the order in the DDL file). If you want to limit the codebook and tree menu to a subset of variables, or have them appear in a different order, you must modify the variable list accordingly.
Headings appear in the variable lists of both the codebook and the tree menu. They make it much easier for users to find variables of interest in the dataset. (A simple tree menu constructed from user-supplied headings is shown below.)
Some variables may have too many categories to describe concisely in a codebook. It is possible to specify recoding rules for a variable for purposes of generating the codebook. (The syntax for constructing recode rules is discussed in the next section.) Those recoding rules do not change anything in the dataset. They are implemented only for the codebook.
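The effect of such rules can be pictured with a short sketch. It is an illustration only, not the SDA implementation. The education ranges are taken from the sample variable list in this document, while the function names and the handling of values outside every range are assumptions.

```python
import math

# Rules in the spirit of educ(r:0-8 "Elementary" ; 9-12 "HS" ; 13-* "College"),
# written as (low, high, label); math.inf stands in for the open-ended "*".
EDUC_RULES = [
    (0, 8, "Elementary"),
    (9, 12, "HS"),
    (13, math.inf, "College"),
]


def recode(value, rules):
    """Return the label of the first range containing value, else None."""
    for low, high, label in rules:
        if low <= value <= high:
            return label
    return None  # assumption: values outside every range get no label


def collapse(value, width, start):
    """Return the (low, high) bounds of the fixed-width interval holding value.

    Assumed reading of a collapse such as age(c:10,1): intervals of the
    given width, the first one starting at `start`.
    """
    index = (value - start) // width
    low = start + index * width
    return low, low + width - 1
```

Under this reading, an age of 34 collapsed with width 10 and start 1 falls into the category 31-40. As noted above, the dataset itself is left unchanged; only the codebook display would use the grouped categories.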
You may want summary statistics (mean, median, standard deviation, variance, and range) for some variables, whereas only frequency distributions make sense for other variables. In the variable list, you can use templates to indicate which type of variable description should apply to specific variables. When a template is specified in the variable list, it remains in effect for the variables that follow it on the list, until you specify another template. (The syntax for constructing templates is discussed in the next section.)
The list of variables should include headings, inserted into the listing of variable names. Headings must appear on separate lines.
It is often useful to have one template for variables with just a few categories, and to have another template for variables with many categories. The default template is designed for typical survey variables with a few nominal or ordinal categories and does not include summary statistics like means and standard deviations.
A line beginning with '@name' indicates that subsequent variables in the list should be processed using the template identified as name. A line beginning with '@@' will return processing to the default template.
The names of the templates defined for the Archiver procedure are the following:
CASEID spend spend2 spend3 spend4 ideo party age educ gender race marital casewt
# (Primary headings start with ** and secondary headings start with 2** )
** CASE IDENTIFICATION
CASEID
** SURVEY VARIABLES
2** Government Spending
spend spend2 spend3 spend4
2** Political Ideology and Party
ideo party
2** Background Variables
age educ gender race marital
** WEIGHT VARIABLE
casewt
# (Template references start with @ )
# (Recode commands are appended to the variable names.)
** CASE IDENTIFICATION
CASEID
** SURVEY VARIABLES
2** Government Spending
spend spend2 spend3 spend4
2** Political Ideology and Party
ideo party
2** Background Variables
# Get category frequencies plus summary statistics
# for the following variables
@cstats
# Collapse 'age' into 10-year categories, starting with 1
age(c:10,1)
# Return to category frequencies without statistics
# (which is also the default template)
@categ
# Show both the original version of "education"
# and a recoded version of the variable
educ
educ(r:0-8 "Elementary" ; 9-12 "HS" ; 13-* "College")
gender race marital
** WEIGHT VARIABLE
# Get summary statistics alone, without category frequencies
@stats
casewt
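To make the line types in a variable list concrete, the following sketch classifies each line and tracks which template is in effect. It is not the SDA parser. It assumes that comments, headings, and template references each occupy their own line, that several variable names may share a line, and that a recode specification contains no nested parentheses. The label "default" is only a placeholder for the default template.

```python
import re

DEFAULT_TEMPLATE = "default"  # placeholder label, not an SDA template name


def split_entries(line):
    """Split on whitespace while keeping a parenthesized recode intact."""
    return re.findall(r"[^\s(]+(?:\([^)]*\))?", line)


def parse_variable_list(text):
    """Return (kind, value, template) tuples for headings and variables."""
    items = []
    template = DEFAULT_TEMPLATE
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        if line.startswith("@@"):
            template = DEFAULT_TEMPLATE  # back to the default template
        elif line.startswith("@"):
            name = line[1:].split()
            template = name[0] if name else DEFAULT_TEMPLATE
        elif line.startswith("2**"):
            items.append(("heading2", line[3:].strip(), None))
        elif line.startswith("**"):
            items.append(("heading1", line[2:].strip(), None))
        else:
            # A template stays in effect until another one is specified.
            for entry in split_entries(line):
                items.append(("variable", entry, template))
    return items
```

Running a draft variable list through a checker of this kind before uploading it can reveal a misplaced template reference or a heading that was accidentally joined to a variable line.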
Once you have created an SDA dataset and have uploaded a variable list with headings (and possibly with other specifications such as recodes, collapses, and templates), you can create a codebook to document the dataset.
Clicking on the "Create Codebook" button will automatically create both an HTML codebook and a "tagged" format codebook. The "tagged" format codebook can be used to create a Microsoft Word codebook, as explained below. (Click on the 'Show/Hide Other Options' button to download the tagged file for Word.)
The online HTML codebook is a set of linked HTML files that describe each variable. The online codebook contains a sequential index to the variables (with the headings you provide) and an alphabetical index to the variables.
The SDA codebook program creates a special "tagged" file that can be downloaded and then input into Microsoft Word on your local computer. When you apply a special Word macro to the tagged file, the result is an SDA codebook in Word format that can be printed and/or made available in the SDA archive for downloading by users. The Word codebook has a table of contents which lists all of the variables in sequential order (using the headings you provide).
Instructions for installing the necessary Word macro and for creating the Word codebook are available in the WordMacro document in the online SDA manual.
You can add supplementary documentation to your codebook by uploading files. These supplementary files can be in HTML, PDF, Word, or plain text format. (Be sure that the file names have appropriate extensions for their file types: ".htm" or ".html", ".pdf", ".doc", ".txt", etc.) When the HTML codebook is created, a "Supplementary Documentation" page will be written that includes a hyperlink to each of the uploaded files.
As files are uploaded, they will appear in a table that lists the file name, the label, and the order in which the links will appear in the codebook. Adding a label for a file is optional, but highly recommended. The "Order" column will, by default, match the order in which the files are uploaded, starting with "1" (one). If you want to change the order in which the links will be listed in the codebook, just edit the number in the "Order" column. If you don't want to include an uploaded file in the codebook, set its order to "0" (zero). The numbers you enter in the "Order" column can be any integer from 0 to 999. Each time you upload a new file or create a new codebook, the entries in the table will be rearranged to match the "Order" column.
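The ordering rule can be summarized in a short sketch. This is an illustration of the behavior described above, not the Archiver's own code. The function name and the tuple layout are hypothetical, and keeping upload order for files with equal numbers is an assumption.

```python
def links_in_codebook_order(files):
    """Order supplementary files for the codebook.

    files: list of (file_name, label, order) tuples in upload order.
    Files with order 0 are left out; the rest are sorted by order.
    """
    for name, _label, order in files:
        if not (isinstance(order, int) and 0 <= order <= 999):
            raise ValueError(f"Order for {name!r} must be an integer from 0 to 999")
    included = [entry for entry in files if entry[2] != 0]
    # sorted() is stable, so files with equal numbers keep their upload order.
    return sorted(included, key=lambda entry: entry[2])
```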
The frequencies, percentages, and statistics for each variable can be weighted by one of the variables in the dataset. You enter the name of the weight variable in the designated box.
The frequencies, percentages, and statistics for each variable can be based on a subset of the cases in the dataset. This is accomplished by specifying one or more variables as selection filters.
For each selection filter variable, you specify the codes to be included, following the same syntax as for the SDA analysis programs.
The frequencies and percentages for each variable can be shown separately within categories of another variable in the dataset. For example, the distributions of each variable can be shown separately for males and females by specifying the variable 'gender' (or whatever that variable is named) as the stratifying variable.
If the stratifying variable is set to define the columns, there will be multiple columns for each variable, using a crosstab layout with the stratifying variable as the column variable. This is the default layout.
Alternatively, the stratifying variable can be set as the row variable in a crosstab layout. If the stratifying variable has a large number of categories (like 'year' in a multi-year study), the row stratification option is usually more appropriate.
(Note that a stratifying variable only works for the online HTML codebook. The "tagged" file produced for Word is always an unstratified codebook.)
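The combination of a weight variable and a stratifying variable can be pictured as a weighted crosstab. The sketch below is a simplified illustration and does not reproduce the codebook program's output. The function name is hypothetical, and cases are assumed to be dictionaries keyed by variable name.

```python
from collections import defaultdict


def stratified_frequencies(cases, variable, stratifier, weight=None):
    """Return {stratum: {category: count}}, weighted if a weight is named."""
    table = defaultdict(lambda: defaultdict(float))
    for case in cases:
        w = case[weight] if weight else 1.0  # unweighted cases count as 1
        table[case[stratifier]][case[variable]] += w
    return {stratum: dict(counts) for stratum, counts in table.items()}
```

Each key of the outer dictionary corresponds to one category of the stratifying variable, that is, to one column (or row) of the crosstab layout described above.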
If you want to create a Word codebook, you must download the tagged file and input it into Microsoft Word on your computer. Then you can upload the Word file (with the suffix '.doc') back into the archive, so that users can view it and download it.
After you have created an SDA dataset and the codebooks, you need to specify the options that will be available for users to access the data online. Then you need to incorporate the study's URL into your archive page.
If your dataset includes weight variables, you should generally provide users with a drop-down list of weights for the analysis screens. Otherwise, the option screens for the analysis programs will simply contain an input text box to enter the name of a weight variable. Since it is difficult to remember the names of possible weight variables, this can be quite confusing.
To create a drop-down list of weight variable selections, enter the variable name and a label for each weight variable you want to appear in the list. They will appear in the drop-down list in the order you enter them here.
Usually a "No Weight" option in the drop-down list will be added automatically after the list of weight variables. If you want to change the label for that option, there is a box for you to enter the desired label.
If you do NOT want to allow analyses to be run without using one of the specified weight variables, check the appropriate box. In that case, the SDA analysis programs will force the use of a weight variable.
By default, ALL of the SDA analysis programs and procedures are set up for the study. If you want to prevent the use of a particular program, check the box corresponding to that program. For example, if you do NOT want users to be able to list the values of variables for individual cases, check the corresponding box.
You can also prevent users from creating recoded and computed variables, downloading the data and metadata files, and creating customized subsets of the data file. Check the appropriate boxes to restrict those procedures.
The last restriction is to disallow 'debug mode' when SDA programs are being run. Debug mode is very useful for detecting the source of problems, if there are any SDA procedures that do not seem to be working as expected. However, it is possible for unauthorized users to learn a great deal about the content of your file system when debug mode is turned on. So it is generally a good idea to disallow debug mode once everything seems to be working well. If you later need to turn debug mode back on, it is easy enough to return to this screen and uncheck the relevant box.
When you click on the button to 'Finish Archiving Study', the options you have decided to allow (or disallow) are incorporated into a configuration file, and that file is added to the SDA archive setup. (An SDA HARC file named 'harc.txt' is added to the study directory.)
At this point the SDA dataset should be set up and ready for use. However, you must preserve the linkage information so that users can find the dataset.
After you click on the 'Finish Archiving Study' button, a study link should appear on the screen. Click on that link, start up SDA for the study, and run a few tables and/or other procedures. If you are satisfied that everything is functioning as it should, copy the given HTML code into a file that users can access on the Web. Those lines of HTML code can be added to any appropriate Web page in your data archive to access this study.
The default description for the link is "Start up SDA for '(study ID)'". You can edit that portion of the HTML code. The remainder of the HTML code, however, must be preserved exactly so that users can access this SDA dataset and its codebooks online.
My SDA Archive My SDA Archive
Select a Study
- (Replace this line by the HTML code generated by the SDA Archiver.)
(The line to copy is the one that appears after you click the 'Finish Archiving Study' button. The line before it says, "Copy this HTML code to your Web page.")
(Answer 'Yes' to the query about changing the file name extension.)
(You can change the name of this file and the text contained in the file, but be sure NOT to change the URL that links to the study.)
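If you maintain the archive page with a script, a sketch like the following could assemble it. The page structure (a title, a main heading, a "Select a Study" subheading, and a list of study links) is an assumption based on the example above, and the function name is hypothetical. The HTML code generated by the Archiver is inserted verbatim and is never escaped or altered.

```python
from html import escape


def build_archive_page(title, study_links):
    """Build a simple archive page.

    study_links: HTML snippets copied exactly as generated by the
    SDA Archiver; they are inserted without modification.
    """
    items = "\n".join(f"    <li>{link}</li>" for link in study_links)
    return (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n"
        f"  <title>{escape(title)}</title>\n"
        "</head>\n<body>\n"
        f"  <h1>{escape(title)}</h1>\n"
        "  <h2>Select a Study</h2>\n"
        "  <ul>\n"
        f"{items}\n"
        "  </ul>\n"
        "</body>\n</html>\n"
    )


# Placeholder only: replace it with the HTML code generated by the Archiver.
PLACEHOLDER = "<!-- paste the HTML code generated by the SDA Archiver here -->"
page = build_archive_page("My SDA Archive", [PLACEHOLDER])
```

Write the returned text to a file with an ".html" or ".htm" extension in a location that users can reach on the Web.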