This document describes the HARC file specifications that are still relevant for importing into Version 4. Other specifications (keywords) are ignored. A separate cross-reference document maps the specifications in the SDA Manager to the corresponding elements of a HARC file.
Stratum and cluster variables can be specified for a study in the HARC file, to enable the analysis programs to calculate complex standard errors. Weight variables for a study can be specified, so that users can select a weight from a drop-down menu, instead of having to enter the name of a weight variable on the option screen.
There were six possible section headings in the HARC file in SDA version 3. Three of those sections are ignored by the SDA Manager (LABELS, HEADER, and FOOTER). The three sections that are converted are [GENERAL], [PROGRAMS], and [DATASETS].
The general layout is as follows:
[GENERAL]
keyword = something
keyword = something

[PROGRAMS]
keyword = something
keyword = something

[DATASETS]
keyword = something
keyword = something
*
keyword = something
keyword = something
The names of sections and the keywords can be given in either upper or lower case, but they may not be abbreviated. The first section should be the [GENERAL] section. The other sections can be given in any order. However, it is a good idea to put the [DATASETS] section last, to facilitate adding datasets to the HARC file.
Keywords within a section can be given in any order. However,
within the [DATASETS] section the keywords applicable to a
specific study must be grouped together and be separated by an
asterisk from the specifications for another study.
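The layout rules above can be illustrated with a small reader. This is only a sketch, not the SDA Manager's import code: the function name `parse_harc` is hypothetical, and treating lines that begin with `#` as comments is an assumption based on the examples in this document. Keywords are kept as ordered pairs because some of them may be repeated within a study.

```python
def parse_harc(text):
    """Split HARC text into sections, each a list of keyword blocks.

    Hypothetical helper: only [DATASETS] is split into several blocks
    (one per study, separated by a line containing a single asterisk).
    """
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        # Assumption: blank lines and lines starting with '#' carry no keywords.
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            # Section names may be upper or lower case, so normalize them.
            current = line[1:-1].upper()
            sections[current] = [[]]
            continue
        if current is None:
            continue  # ignore anything before the first section heading
        if line == "*":
            if current == "DATASETS":
                sections[current].append([])  # start the next study
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a 'keyword = something' line
        sections[current][-1].append((key.strip().upper(), value.strip()))
    return sections
```

Each study in `parse_harc(text)["DATASETS"]` is then a list of `(KEYWORD, value)` pairs in file order, which preserves the grouping that the asterisk separator expresses.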
If the specification is a PATH, it must be a full pathname on the server computer such as: /bravo2/bravo3/sda
If the specification is a URL, it must be a complete one such as:
In principle, a URL can refer to a location on any World Wide Web server. However, the checking for valid URLs done by the SDA debugger will only work within the same domain as the local server.
A slash (/) at the end of a PATH or a URL can be used if the referenced location is a directory (and not a specific file). However, this use of a final slash is optional.
Possible GENERAL keywords are grouped into the following sections:
Keyword        Possible Specification         Default (if no keyword)
_____________________________________________________________________
MAXLISTCASE=   Maximum number of CASES        Limit lists to 500
               to list (for ’listcase’)       cases

MAXLISTVARS=   Maximum number of VARIABLES    Limit lists to 500
               to list in an OUTSTUDY         variables
               before a warning

DUMMYGENMAX=   A number between 1 and 100     Max of 25 dummy vars can be
               (max dummy vars for REGRESS    generated by the "m:" syntax
               and LOGIT)                     for a single categorical var

XMEANS=        YES (to get special output--   No special output in
               average differences)           MEANS program

BATCHSAVEDIR=  PATH of directory into which   No batch command files
               to copy the batch command      saved
               files for the analysis
               programs before they are
               deleted

LANGUAGE=      PATH to directory with         Use built-in English
               alternate language files       messages and menus
               for analysis output
               (File named ’langan.txt’
               will be imported.)
Keyword        Possible Specification         Default (if no keyword)
_____________________________________________________________________
SUBMAXVARS=    Maximum number of variables    Limit subsets to 1000
               to allow in a subset           variables
Keyword        Possible Specification         Default (if no keyword)
_____________________________________________________________________
MAXCHARTS=     Maximum number of charts per   Maximum of 25 charts
               tables or means request
               (A number between 1 and 100)

CHARTFONT=     Name of font to use in charts  SanSerif
The ’SDAPROGS=’ keyword indicates which SDA analysis programs are to be made available for the datasets specified in this HARC file. The currently available programs are: tables, means, correl, corrtab, regress, logit, listcase, recode, compute, listvars or listvars(delete).
Note the difference between specifying ’listvars’ or ’listvars(delete)’. If you only specify ’listvars’, the user will be able to list the newly created variables but will not be able to delete them. This will protect the created variables from being deleted, but it will also prevent users from deleting variables that were created erroneously.
Since the ’listcase’ program provides access to individual-level data, this program may not be appropriate for sensitive datasets. If the use of this program is suppressed by the use of a disclosure file, it is best to use global options for sensitive datasets that do not include the ’listcase’ program in the list of available SDA programs. Otherwise, an attempt to use ’listcase’ will generate an error message.
For information on the interactive use of each program, see the online help file for analysis programs or the online help file for creating new variables. For information on the batch command files for each program, see the index to the SDA Manual pages.
Keyword        Possible Specification         Default (if no keyword)
_____________________________________________________________________
SDAPROGS=      Analysis programs to provide   REQUIRED
               (tables, means, etc.)
To enable users to create new variables in a dataset, it is necessary, but not sufficient, to mention ’recode’ and ’compute’ in the list of programs. Each dataset for which new variables can be created must also specify where the new variables are to be stored, using the ’OUTSTUDY=’ keyword described in the dataset keywords.
The availability of the ’subset’ procedure for a particular dataset does not depend on this list of SDA programs. Rather, that availability is assumed, unless you specify ’SUBSET=NO’ for a particular dataset in the datasets section of the HARC file.
Possible keywords for each dataset are grouped into the following sections:
Keyword        Possible Specification         Default (if no keyword)
_____________________________________________________________________
DATASET=       ID or name of the study (one   REQUIRED
               word, only letters or numbers)

DATALABEL=     Label of study to appear on    REQUIRED
               menus (one line)

CODEBOOK=      URL of homepage for HTML       REQUIRED
               codebook or documentation
               (may be repeated)

SDADATA=       PATH of SDA dataset            REQUIRED
               directory (may be repeated)

OUTSTUDY=      PATH of SDA dataset for        No recodes or computed
               newly created variables        variables can be stored
                                              (but see below)

The ’VARCASE=’ specification is ignored on Windows servers

VARCASE=       LOWER or UPPER                 Variable names entered on
               (names of variables entered    option screens must match
               on option screens will be      the case of the variables
               converted automatically to     stored in the dataset
               the specified case)
Notes on Basic Keywords
If the ‘OUTSTUDY’ option is given as a study-level option, all users who access the specified study will store recodes and computed variables in the same SDA dataset directory. Note that in SDA Version 4, it is no longer necessary to provide an SDADATA specification for the location of the OUTSTUDY dataset.
If there are many users who store new variables in the same OUTSTUDY dataset, there may be some conflicts in naming the new variables. One user will be able to overwrite a variable by creating a new one with the same name.
An alternative method of enabling the creation of new variables in Version 4 of SDA is to set up private workspaces for individual users or groups. Different users can then share the main dataset for a study but store new variables in their own accounts. Permissions for individual users are set up in the ’Configure Permissions’ tab of the SDA Manager.
On Linux servers the ‘VARCASE’ option is designed to simplify the entry of variable names on the option screens, by converting the names of variables automatically to the correct case. Since it is easier to enter variable names in lower case, this option is designed primarily to allow users to enter lower-case names for variables originally set up with upper-case names.
Use of this option assumes that the variable names in the SDA dataset (as defined in the DDL file) are all either in upper or in lower case (except for the CASEID variable, which is always in upper case). The VARCASE specification also applies to the names of variables generated by RECODE and COMPUTE; those programs will only generate new variables in the appropriate case.
This ‘VARCASE’ specification will apply to all of the SDA datasets listed for a study, if it is given as a study-level option. If you have variables in more than one SDA dataset for the same study, you can set this option separately for each one by using the form:
SDADATA = PATH(varcase=upper).
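The interaction between a study-level VARCASE setting and the per-dataset `PATH(varcase=...)` form can be sketched as follows. This is an illustration under stated assumptions, not SDA's implementation: the helper names `split_sdadata` and `normalize_name` are hypothetical, and the sketch assumes that a per-dataset setting simply takes precedence over the study-level one. It also ignores the fact that VARCASE has no effect on Windows servers.

```python
import re

# Path, optionally followed by '(varcase=upper)' or '(varcase=lower)'.
_SDADATA_RE = re.compile(
    r"^(.*?)\s*(?:\(\s*varcase\s*=\s*(upper|lower)\s*\))?$", re.IGNORECASE
)

def split_sdadata(value, study_varcase=None):
    """Return (path, varcase) for one SDADATA specification.

    Assumption: a per-dataset varcase overrides the study-level VARCASE.
    """
    match = _SDADATA_RE.match(value.strip())
    path, varcase = match.group(1), match.group(2)
    return path, (varcase.upper() if varcase else study_varcase)

def normalize_name(name, varcase):
    """Convert a variable name typed on an option screen to the stored case."""
    if name.upper() == "CASEID":
        return "CASEID"  # the CASEID variable is always in upper case
    if varcase == "UPPER":
        return name.upper()
    if varcase == "LOWER":
        return name.lower()
    return name  # no VARCASE given: the name must already match
```

For instance, `split_sdadata("/bravo3/GSS/sda(varcase=upper)")` yields the path together with `"UPPER"`, and `normalize_name("age", "UPPER")` then gives `"AGE"`.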
If there is only one HTML codebook for a study, use the basic ‘CODEBOOK=’ keyword described above.
However, SDA allows each study to have multiple HTML codebooks. For example, a codebook stratified by year or region could be set up, in addition to the basic unstratified codebook. The user can select one of the codebooks to view at a time.
For each codebook, provide the URL of the main codebook HTML file, together with an appropriate label for the codebook (in parentheses). There can be as many codebooks as you wish. Each codebook should be created in a separate directory, in order to avoid filename conflicts.
If no label is provided for a codebook, the label for the first codebook will be ‘Default’, and the label for the others will be ‘Alternative’. Those labels are not very helpful, so it is much better to include more descriptive labels.
See an example of multiple codebook definitions in example 2.
Keyword        Possible Specification         Default (if no keyword)
_____________________________________________________________________
CODEBOOK=      URL for Codebook #1 (label)    Required for codebook
CODEBOOK=      URL for Codebook #2 (label)    Required for codebook
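The labeling rule for multiple codebooks can be sketched in a few lines. The function name `parse_codebooks` is hypothetical, and the sketch assumes that a URL contains no whitespace or opening parenthesis, so that everything inside the trailing parentheses is the label.

```python
import re

# URL, optionally followed by a label in parentheses.
_CODEBOOK_RE = re.compile(r"^([^\s(]+)\s*(?:\((.*)\))?$")

def parse_codebooks(values):
    """Turn a study's CODEBOOK values into (url, label) pairs, in file order."""
    codebooks = []
    for index, value in enumerate(values):
        match = _CODEBOOK_RE.match(value.strip())
        if not match:
            raise ValueError("unrecognized CODEBOOK specification: %r" % value)
        url, label = match.group(1), match.group(2)
        if not label:
            # Unlabeled codebooks: 'Default' for the first, 'Alternative' after.
            label = "Default" if index == 0 else "Alternative"
        codebooks.append((url, label))
    return codebooks
```

Because the fallback labels carry little information, a checker built on this sketch could also warn whenever more than one codebook ends up labeled 'Alternative'.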
Appropriate weight variables, and a label for each one, can be specified for a study. If no weights are specified, the user can still enter the name of a weight variable on the option screen.
If more than one weight variable is specified, they are presented to the user as a drop-down list on the option screen for each analysis program. The first one listed in the HARC file is the default weight; but the user may select one of the other available weights from the drop-down list.
One of the weight options listed in the HARC file can be the option NOT to use a weight. This is specified as ‘##none’. An optional label can be given for this option; for example ‘##none(Do not use a weight)’. The default label is ‘(No weight)’.
A set of weight variables is specified in the dataset definition in example 1.
Note that the user can be forced to use a specific weight on every analysis run. If only one weight variable is specified in the HARC file for a study, and if the ‘##none’ option is not provided, the specified weight is used automatically on every analysis run.
Keyword        Possible Specification         Default (if no keyword)
_____________________________________________________________________
WEIGHT=        wtvar1 (label) wtvar2 (label)  Required for drop-down weights
WEIGHT=        ##none (label)
Multiple weight variables and labels can be defined on a single line. Alternatively the ‘WEIGHT=’ keyword can be repeated for additional specifications of weight variables and labels.
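As a rough illustration of the weight rules, the sketch below collects all WEIGHT values of a study into one ordered list and checks whether a single weight is forced. The names `parse_weights` and `forced_weight` are hypothetical, and representing the '##none' option by `None` is a choice made for this example only.

```python
import re

# A weight name, optionally followed by a label in parentheses.
_WEIGHT_RE = re.compile(r"([^\s()]+)\s*(?:\(([^)]*)\))?")

def parse_weights(values):
    """Return (name, label) pairs from one or more WEIGHT values.

    The '##none' option is returned with name None. A weight variable
    without a label gets label None, since no default is documented here.
    """
    weights = []
    for value in values:
        for name, label in _WEIGHT_RE.findall(value):
            if name == "##none":
                weights.append((None, label or "(No weight)"))
            else:
                weights.append((name, label or None))
    return weights

def forced_weight(weights):
    """Return the weight used on every run, or None if the user has a choice."""
    if len(weights) == 1 and weights[0][0] is not None:
        return weights[0][0]
    return None
```

The first entry of the returned list corresponds to the default selection in the drop-down list, because the order of the HARC file is preserved.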
If calculations of complex standard errors are to be enabled for a study, the stratum and/or cluster variables must be specified. This is done with a ‘design=’ keyword. The method of calculating standard errors depends on whether a stratum variable only, a cluster variable only, or both variables are specified.
Keyword        Possible Specification         Default (if no keyword)
_____________________________________________________________________
DESIGN=        STRATUM(var1) CLUSTER(var2)    Both variables are defined
DESIGN=        CLUSTER(var2)                  Clusters will be paired into strata
DESIGN=        STRATUM($1) CLUSTER(var2)      Clusters all in one stratum
DESIGN=        STRATUM(var1)                  Only strata, no clusters
DESIGN=        STRATUM(var1) XREGRESSION      Default is SRS for REGRESS and LOGIT
If only a cluster variable is defined, the default procedure is
to combine pairs of consecutive clusters (by cluster number) into
strata, for purposes of calculating standard errors.
(See example 2.)
Alternatively, you can force all the clusters to remain in a single stratum by specifying the name of the stratum variable as ’$1’.
See the document on calculating standard errors for more details.
Complex standard errors are computed by default for each analysis using the TABLES, MEANS, REGRESS, and LOGIT programs if a stratum and/or a cluster variable is defined for the dataset. The user, however, may force the calculation of SRS standard errors, effectively assuming that the sample is a simple random sample (SRS), by selecting that option on each program option page.
The calculation of complex standard errors can require a
substantial amount of computer time when analyzing a large
dataset using REGRESS and especially LOGIT.
Therefore, the archive can override the usual default for those
programs and make SRS the default for REGRESS and LOGIT.
To do that, add ’XREGRESSION’ to the specifications after
’DESIGN=’ in the HARC file. Note that users will still be able
to request complex standard errors if they wish, but they should
not be surprised by delays in receiving results if they do so.
(See example 3.)
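The DESIGN variants described in this section can be summarized in a short classifier. This is a sketch only: the function name `parse_design` and the returned dictionary keys are hypothetical, matching is made case-insensitive because the examples in this document use both cases, and the '$1' stratum is only recognized together with a cluster variable, which is the only combination documented here.

```python
import re

def parse_design(value):
    """Classify a DESIGN specification according to the table of variants."""
    def find(keyword):
        match = re.search(keyword + r"\(([^)]*)\)", value, re.IGNORECASE)
        return match.group(1).strip() if match else None

    stratum = find("stratum")
    cluster = find("cluster")
    if stratum == "$1" and cluster:
        mode = "clusters all in one stratum"
    elif stratum and cluster:
        mode = "both variables are defined"
    elif cluster:
        mode = "clusters will be paired into strata"
    elif stratum:
        mode = "only strata, no clusters"
    else:
        mode = "no design variables"
    return {
        "stratum": stratum,
        "cluster": cluster,
        "mode": mode,
        # XREGRESSION makes SRS the default for REGRESS and LOGIT only.
        "srs_default_for_regression": "xregression" in value.lower().split(),
    }
```

For the specification `stratum(stratcode) cluster(psucode) xregression`, the sketch reports both variables and sets `srs_default_for_regression` to true, while TABLES and MEANS would still default to complex standard errors.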
The next keywords are used to enhance or to suppress the customized subsetting of variables and/or cases. The file with information on groups of variables (which is required in SDA Version 4) enables the user to select entire groups of variables, instead of having to specify all desired variables one by one. That group-information file is generated automatically whenever the XCODEBK program produces HTML codebook files. It has the name ‘Xsub.txt’, where ‘X’ is the root name of the HTML codebook files. The default name of that file is ’hcbksub.txt’.
The ’SUBSET=NO’ specification, on the other hand, will suppress the option for creating a customized subset for this particular dataset.
Keyword        Possible Specification         Default (if no keyword)
_____________________________________________________________________
SUBGRPINFO=    PATH of file with info on      REQUIRED
               groups of variables
               (from codebook program)

SUBSET=        NO (to suppress ’subset’ from  Allow subsetting
               the user interface for this
               dataset)

If you want to be absolutely sure that the subset procedure is not available for a particular dataset, you should set up a disclosure file for that dataset and include the specification "subset=no" in that disclosure file. Otherwise, even though the subset option may not be presented to the user, it remains possible to run the subset procedure in batch mode.
The next two keywords are used to specify data and documentation files that have been created ahead of time (that is, they are not custom-made on the fly by the ’subset’ procedure) and are available for downloading. These keywords are usually used in pairs -- a heading, followed by the full Pathname (not a URL) of a file available for downloading. The Pathname itself can also be followed by a label in parentheses; that label will appear on the selection screen next to the file name. (See example 3.)
If a ’DLHEADING’ specification immediately precedes a ’DLFILE’ specification, that heading will be imported to SDA version 4.0 as a label for the downloadable file. If BOTH a ’DLHEADING’ and a file label in parentheses are given, both will be imported as the file label.
Note that the file given as a Pathname should preferably have a suffix of ’.txt’, if it is a text file and if users are to be able to view the file in a browser as well as to save it.
Keyword        Possible Specification         Default (if no keyword)
_____________________________________________________________________
DLHEADING=     Heading or label for a file    No heading

DLFILE=        Pathname of file to download   No file available
               (any kind of file)

DLFILE=        Pathname (optional label)      No file available
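The pairing of headings and files during import can be sketched as below. The function name `import_downloads` is hypothetical. Because this document does not say how a heading and a parenthesized label are combined into one file label, the sketch keeps them as separate parts; it also assumes that a heading not immediately followed by a DLFILE is not attached to any file.

```python
import re

# Pathname, optionally followed by a label in parentheses.
_DLFILE_RE = re.compile(r"^([^\s(]+)\s*(?:\((.*)\))?$")

def import_downloads(pairs):
    """Map ordered (keyword, value) pairs to (pathname, label_parts) entries."""
    files = []
    pending_heading = None
    for keyword, value in pairs:
        keyword = keyword.upper()
        if keyword == "DLHEADING":
            # A later heading replaces an unused earlier one.
            pending_heading = value.strip()
        elif keyword == "DLFILE":
            match = _DLFILE_RE.match(value.strip())
            if match:
                path, label = match.group(1), match.group(2)
                parts = [part for part in (pending_heading, label) if part]
                files.append((path, parts))
            pending_heading = None  # a heading labels only the next file
        else:
            pending_heading = None  # any other keyword breaks the pairing
    return files
```

With this reading, a heading followed by two DLFILE lines labels only the first file, and the second file keeps just its own parenthesized label.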
[GENERAL]

[PROGRAMS]
SDAPROGS = tables means correl corrtab regress logit listcase

[DATASETS]
DATASET = nes92c
DATALABEL = NES 1952-1992 Cumulative Datafile
# For the standard SDA interface to function, the codebook must have
# been created with version 3 (or later) of the ’xcodebk’ program.
CODEBOOK = http://socrates.berkeley.edu/sdadocs/NES92C/n92c.htm
SDADATA = /bravo3/NES/nes52-92.cum/
WEIGHT = sampwt(Sampling wt)
WEIGHT = pswt(Post-stratification wt)
WEIGHT = ##none
DESIGN = STRATUM(stratvar) CLUSTER(psuvar)
*
DATASET = capums1
DATALABEL = 1990 Census - California 1% Sample
SDADATA = /bravo3/capums1/
WEIGHT = houswgt(Household weight) pwgt1(Person weight) ##none(No weight)
[GENERAL]
# Limit subsets to 100 variables
SUBMAXVARS = 100

[PROGRAMS]
SDAPROGS = tables means correl corrtab regress logit listcase

[DATASETS]
# If customized subsetting is to be enabled, include the following:
#
# SUBGRPINFO= FULL PATHNAME of file with info on groups of variables
#             (produced by codebook program, if there are headings;
#             use is required in SDA 4.0)
#
# For multiple codebooks, include a URL and (label) for each:
#
# CODEBOOK= URL for codebook #1 (label for codebook #1)
# CODEBOOK= URL for codebook #2 (label for codebook #2)
#
DATASET = gss
DATALABEL = GSS 1972-2004 Cumulative Datafile
SDADATA = /bravo3/GSS/sda
# Define a cluster variable for this dataset, to calculate
# complex standard errors
DESIGN = cluster(sampcode)
CODEBOOK = http://sda.berkeley.edu/GSS/Doc/GSS.htm (Standard Codebook)
CODEBOOK = http://sda.berkeley.edu/GSS/Docyr/GSYR.htm (Codebook by Year)
SUBGRPINFO= /bravo3/GSS/Doc/GSSsub.txt
*
DATASET = multi
DATALABEL = 1994 Multi-Investigator Study
CODEBOOK = http://socrates.berkeley.edu/Multi/Doc/mult.htm
SDADATA = /bravo3/Multi/sda
SUBGRPINFO= /bravo3/Multi/Doc/multsub.txt
[GENERAL]

[PROGRAMS]
SDAPROGS = tables means correl corrtab regress logit listcase

[DATASETS]
# If files are to be available for downloading, include the following:
#
# DLHEADING= Heading for a file
#
# DLFILE= Pathname of file to download, followed by an optional
#         label (given in parentheses)
#         Note that the Pathname should best have a suffix
#         of ’.txt’, if users are to be able to view
#         the file as well as to save it.
#
# (Many of these headings and URLs can be given.)
DATASET = nes2004c
DATALABEL = NES 1952-2004 Cumulative Datafile
CODEBOOK = http://sda.berkeley.edu/NES2004C/n04c.htm
SDADATA = /bravo3/NES2004C/nes52-04.cum/
# Define stratum and cluster variables for this dataset, but SRS is the
# default for the regression and logit/probit programs
DESIGN = stratum(stratcode) cluster(psucode) xregression
# The following keywords specify files available for downloading.
# Notice the optional labels in parentheses after some URLs.
DLHEADING = DATA FILES
DLFILE = /socrates.berkeley.edu/DL/NESdat.txt (Plain ASCII file)
DLFILE = /socrates.berkeley.edu/DL/NESdat.zip (Zipped file for PC’s)
DLHEADING = SAS definition file
DLFILE = /socrates.berkeley.edu/DL/NESsas.txt
DLHEADING = SPSS definition file
DLFILE = /socrates.berkeley.edu/DL/NESspss.txt
DLHEADING = DDL file
DLFILE = /socrates.berkeley.edu/DL/NESddl.txt (Plain ASCII file)
DLHEADING = Microsoft Word Codebook ready to be printed
DLFILE = /socrates.berkeley.edu/DL/NEScdbk.doc
DLHEADING = Set of HTML codebook files
DLFILE = /socrates.berkeley.edu/DL/NEShtml.zip (Zip file)
[GENERAL]

[PROGRAMS]
SDAPROGS = tables means correl corrtab regress logit listcase
SDAPROGS = recode, compute, listvars(delete)
# To allow variables to be created but not deleted, specify:
# SDAPROGS = recode, compute, listvars

[DATASETS]
DATASET = nes92c
DATALABEL = NES 1952-1992 Cumulative Datafile
CODEBOOK = http://socrates.berkeley.edu/sdadocs/NES92C/n92.htm
WEIGHT = sampwt(Sampling weight) finalwt(Final weight) ##none(No weight)
DESIGN = STRATUM(stratvar) CLUSTER(psuvar)
SDADATA = /bravo3/NES/nes52-92.cum/
OUTSTUDY = /bravo3/NES/nes52-92.cum/newvars
*
DATASET = capums1
DATALABEL = 1990 Census - California 1% Sample
CODEBOOK = http://socrates.berkeley.edu/sdadocs/CENSUS/pums.htm
WEIGHT = houswgt(Household weight) pwgt1(Person weight) ##none(No weight)
SDADATA = /bravo3/capums1/
OUTSTUDY = /bravo3/capums1/newvars
[GENERAL]

[PROGRAMS]
SDAPROGS = tables means correl corrtab regress logit listcase
SDAPROGS = recode compute listvars(delete)

[DATASETS]
# For this dataset, enable browsing of the codebook, online analysis,
# and creation of new variables.
# No subsetting or downloading is allowed.
DATASET = gss04
DATALABEL = GSS 1972-2004 Cumulative Datafile
CODEBOOK = http://socrates.berkeley.edu/GSS/HTMLBOOK/gss.htm
SDADATA = /bravo3/docs/GSS
OUTSTUDY = /bravo3/docs/GSS/newvars
*
# For this dataset, allow browsing of the codebook,
# online analysis and downloading.
# No subsetting is allowed, because the
# ’SUBSET=NO’ keyword is included.
DATASET = multi
DATALABEL = 1994 Multi-Investigator Study
CODEBOOK = http://socrates.berkeley.edu/Multi/Doc/mult.htm
SDADATA = /bravo3/docs/GSS
SUBSET = NO
DLHEADING = ALL OF THE FOLLOWING FILES ARE PLAIN ASCII FILES
DLHEADING = Data file (616 K)
DLFILE= /socrates.berkeley.edu/Multi/DL/multidat.txt
DLHEADING = SAS definition file
DLFILE= /socrates.berkeley.edu/Multi/DL/multisas.txt
DLHEADING = SPSS definition file
DLFILE= /socrates.berkeley.edu/Multi/DL/multisps.txt
DLHEADING = DDL definition file
DLFILE= /socrates.berkeley.edu/Multi/DL/multiddl.txt
*
# For this dataset, enable all options:
# codebook, online analysis with complex standard errors,
# creation of new variables, pre-defined weights,
# customized subsetting, and downloading of pre-existing files.
DATASET = natlrace
DATALABEL = 1991 Race and Politics Survey
CODEBOOK = http://socrates.berkeley.edu/Natlrace/Doc/race.htm
SDADATA = /bravo3/docs/Natlrace
DESIGN = stratum(stratvar) cluster(psunum)
OUTSTUDY = /bravo3/docs/Natlrace/newvars
WEIGHT = sampwt(Sampling wt)
WEIGHT = pswt(Post-stratification wt)
WEIGHT = ##none
SUBGRPINFO= /bravo3/docs/Natlrace/Doc/racesub.txt
DLHEADING = ALL OF THE FOLLOWING FILES ARE PLAIN ASCII FILES
DLHEADING = Data file (936 K)
DLFILE= /socrates.berkeley.edu/Natlrace/DL/racedat.txt
DLHEADING = SAS definition file
DLFILE= /socrates.berkeley.edu/Natlrace/DL/racesas.txt
DLHEADING = SPSS definition file
DLFILE= /socrates.berkeley.edu/Natlrace/DL/racespss.txt
DLHEADING = DDL definition file
DLFILE= /socrates.berkeley.edu/Natlrace/DL/raceddl.txt
DDL                    Data Description Language
HARCimport             Import HARC file into Version 4 SDA database
internationalization   Using Non-English languages in SDA
sdalog                 Generate a Report of SDA Usage