If multiple directory pathnames are specified for this dataset (in the SDA Manager or in the 'SDADATA=' specifications in the HARC file), only one of them (usually the main dataset directory) should have a 'disclosure.txt' file. The other SDA dataset directories (usually created to hold recoded and computed variables) will have the same disclosure rules applied to them automatically in SDA version 4.
This document describes the possible disclosure rules that may be specified. Additional specifications can be added to the 'disclosure.txt' file, in order to suppress results from TABLES and MEANS that are considered too imprecise to display. Those extra parameters are discussed in a separate document on precision.
Note that Quick Tables and SDA version 3.5 require subsidiary datasets (e.g., for recoded and computed variables) to have a file named 'disc-id.txt' in their STUDYINF subdirectory. This 'disc-id.txt' file contains a single ID keyword with the format 'ID=abc', where 'abc' is the same ID or name used for this study in the 'disclosure.txt' file in the main SDA dataset. That file is no longer necessary in SDA version 4. However, if a dataset created using the SDA Manager is later set up to be accessed by Quick Tables or by SDA 3.5 analysis programs, you should put that 'disc-id.txt' file (manually) in the STUDYINF subdirectory so that Quick Tables and/or the 3.5 programs will know about the 'disclosure.txt' file in the main study directory. Otherwise the disclosure rules will not be applied to analysis runs that do not use variables in the main study dataset. See the version 3.5 manual page for details on the 'disc-id.txt' file.
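Placing the 'disc-id.txt' file by hand can also be scripted. The following Python sketch is only an illustration and is not part of SDA: the function name write_disc_id and the directory used in the demonstration are hypothetical. It assumes that the STUDYINF subdirectory already exists in the subsidiary dataset and that the file consists of the single line 'ID=abc' described above.

```python
import tempfile
from pathlib import Path


def write_disc_id(dataset_dir, study_id):
    """Write STUDYINF/disc-id.txt containing the single line 'ID=<study_id>'.

    Hypothetical helper, not an SDA program. The ID must be one word made
    up only of letters or numbers, matching the ID= rule for 'disclosure.txt'.
    """
    if not (study_id.isascii() and study_id.isalnum()):
        raise ValueError("ID must be one word, only letters or numbers")
    studyinf = Path(dataset_dir) / "STUDYINF"
    if not studyinf.is_dir():
        raise FileNotFoundError(f"no STUDYINF subdirectory in {dataset_dir}")
    target = studyinf / "disc-id.txt"
    target.write_text(f"ID={study_id}\n")
    return target


# Demonstration in a throwaway directory standing in for a subsidiary dataset.
with tempfile.TemporaryDirectory() as dataset_dir:
    (Path(dataset_dir) / "STUDYINF").mkdir()
    created = write_disc_id(dataset_dir, "survey25")
    print(created.read_text().strip())  # prints: ID=survey25
```

The ID passed to the helper must match the ID= value in the 'disclosure.txt' file of the main dataset; the sketch does not check that correspondence.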
The valid keywords for the 'disclosure.txt' file are as follows:
Keyword         Possible Specification            Default (if no keyword)
______________________________________________________________________________
DISCLOSURE ID FOR THE STUDY
ID=             a unique identifier for the       REQUIRED
                dataset with disclosure rules
                (one word, only letters or
                numbers)
PREVENT AN ANALYSIS FROM BEING RUN
VAREXCLUDE=     name(s) of variables that         All variables allowed
                cannot be used in analysis
COMBEXCLUDE=    pairs of variables that           All combinations allowed
                cannot be used together in
                the same analysis run
                and cannot be used at all
                to recode or compute new
                variables
                (see notes below)
MAXFILTERS=     maximum number of selection       Any number of filters OK
                filter variables that can
                be used in a single run
CONTROLVAR=     no, if control variables          A control variable is OK
                cannot be used in tables
LISTCASE=       no, if the 'listcase' program     Listcase run is OK
                is not allowed to run
SUBSET=         no, if the 'subset' program       Subset run is OK
                is not allowed to run
SUPPRESS THE OUTPUT AFTER RUNNING AN ANALYSIS
MINCELLN=       minimum number of cases in a      No required minimum cell N
                table cell to allow a table
                to be displayed
                (see notes below)
MINCELLWN=      minimum number of WEIGHTED        No required weighted minimum
                cases in a table cell to          cell N
                allow a table to be displayed
AVGCELLMIN=     minimum average cell size to      No required average cell N
                allow a table to be
                displayed (checks both the
                mean and the median cell size,
                excluding cells with no cases)
AVGCELLWMIN=    minimum WEIGHTED average cell     No required weighted average
                size                              cell N
MINCASEBYIVAR=  for regressions, minimum ratio    No limit on the number of
                of valid observations to the      independent vars
                number of independent vars
MONITORVAR=     varname, (min_values)             No special monitored vars
                (see notes below)
SUPPRESS UNWEIGHTED NUMBER OF CASES IN OUTPUT
UNWEIGHTEDN=    no                                Show unweighted N's
For example, you may not want to release analysis results based on cases that are all from the same institution (such as from the same prison). Assuming that there is a variable named 'prison', you could specify that variable as one to be monitored.
By default the cases must come from at least two distinct categories of the monitored variable(s). However, you can specify a higher required number of categories by giving the desired number of categories in parentheses after the variable name. See the example below.
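The monitored-variable rule can be pictured with a short Python sketch. This is not SDA's implementation: the function names, the representation of cases as dictionaries, and the treatment of None as a missing value are assumptions made only for illustration. The specification string follows the 'varname(min_values)' form, with a default of two distinct categories when no number is given.

```python
import re


def parse_monitorvar(spec, default_min=2):
    """Turn a spec such as 'INSTITUTION CBSA(3)' into {'INSTITUTION': 2, 'CBSA': 3}."""
    required = {}
    for name, count in re.findall(r"(\w+)\s*(?:\((\d+)\))?", spec):
        required[name] = int(count) if count else default_min
    return required


def monitor_failures(cases, required):
    """Return the monitored variables that have too few distinct values.

    cases: list of dicts mapping variable name -> value (None = missing,
    an assumption of this sketch). An empty result means no rule is violated.
    """
    failures = {}
    for var, minimum in required.items():
        distinct = {case[var] for case in cases if case.get(var) is not None}
        if len(distinct) < minimum:
            failures[var] = minimum
    return failures


required = parse_monitorvar("INSTITUTION CBSA(3)")
cases = [
    {"INSTITUTION": "A", "CBSA": 10},
    {"INSTITUTION": "B", "CBSA": 10},
    {"INSTITUTION": "A", "CBSA": 20},
]
print(required)                           # {'INSTITUTION': 2, 'CBSA': 3}
print(monitor_failures(cases, required))  # {'CBSA': 3}
```

In this illustration the cases span two institutions, which satisfies the default, but only two distinct CBSA values, which is fewer than the three required, so the results would be suppressed.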
The default messages are listed below, each preceded by the keyword that would be used for it in a language file. One or more variable names, or a number, will sometimes be output after the message. Those names or numbers are the values specified with the keywords described above.
DIS_VAREXCLUDE = To preserve confidentiality, analyses are not permitted using the following variable(s):
DIS_COMBEXCLUDE = To preserve confidentiality, analyses are not permitted using the following combination(s) of variables:
DIS_VAREXCLUDE_RECODE = To preserve confidentiality, RECODE and COMPUTE are not permitted using the following variable(s):
DIS_MAXFILTERS = To preserve confidentiality, the number of filter variables cannot be greater than:
DIS_CONTROLVAR = To preserve confidentiality, tables cannot be run with control variables.
DIS_LISTCASE = To preserve confidentiality, the LISTCASE program cannot be used with this dataset.
DIS_SUBSET = To preserve confidentiality, the SUBSET program cannot be used with this dataset.
DIS_AVGCELLMIN = To preserve confidentiality, tables cannot be displayed unless the average number of observations in each cell is at least:
DIS_AVGCELLWMIN = To preserve confidentiality, tables cannot be displayed unless the average weighted number of observations in each cell is at least:
DIS_MINCELLN = To preserve confidentiality, tables cannot be displayed unless the number of observations in each cell is at least:
DIS_MINCELLWN = To preserve confidentiality, tables cannot be displayed unless the weighted number of observations in each cell is at least:
DIS_MINCASEBYIVAR = To preserve confidentiality, regression analyses cannot be shown unless the ratio of valid observations to the number of independent variables is at least:
DIS_MONITORVAR = To preserve confidentiality, analysis results cannot be displayed for any set of observations that has only a very small number of values on certain sensitive variables. In this case the sensitive variable(s) (and the minimum required number of valid values) was:
DIS_UNWEIGHTEDN = To preserve confidentiality, only weighted N's can be shown.
# DISCLOSURE SPECIFICATIONS FOR DATA FILE

# ID FOR THIS DATASET
ID = survey25

# A. PREVENTS AN ANALYSIS FROM BEING RUN

# Completely exclude these vars from analysis and recoding/computing
VAREXCLUDE = CASEID, LOCATIONID

# Exclude these combinations of vars (separated by ';') from analysis
# Also exclude the individual vars from being used by the 'recode'
# and 'compute' programs
COMBEXCLUDE = RACE, GENDER; AGE, RACE

# Maximum number of selection filters allowed in an analysis run
MAXFILTERS = 2

# No tables with a control variable if set equal to 'no'
CONTROLVAR = no

# The LISTCASE program cannot be run if set equal to 'no'
LISTCASE = no

# The SUBSET program cannot be run if set equal to 'no'
SUBSET = no

# B. SUPPRESS ANALYSIS OUTPUT AFTER RUNNING A PROGRAM

# Required average (mean and median) cell sizes - unweighted and weighted
AVGCELLMIN = 10
AVGCELLWMIN = 200

# Required size of smallest cell - unweighted and weighted
MINCELLN = 5
MINCELLWN = 100

# Ratio of cases to number of independent vars in regression
MINCASEBYIVAR = 100

# Check for at least 2 distinct values on the variable 'INSTITUTION'
# and at least 3 distinct values on 'CBSA'.
MONITORVAR = INSTITUTION CBSA(3)

# Suppress all unweighted N's if set equal to 'no'
UNWEIGHTEDN = no
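To make the effect of the unweighted cell-size rules in the example file concrete, here is a minimal Python sketch. It is not how SDA itself reads or applies the file: the function names are hypothetical, and the parsing simply assumes 'KEYWORD = value' lines with '#' comments, as in the example above. The AVGCELLMIN check follows the keyword table (both the mean and the median, excluding cells with no cases). Ignoring empty cells for MINCELLN, and allowing a table with no cases at all, are assumptions of this sketch.

```python
from statistics import mean, median


def parse_disclosure(text):
    """Collect 'KEYWORD = value' pairs, skipping blank lines and '#' comments."""
    rules = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        keyword, _, value = line.partition("=")
        rules[keyword.strip().upper()] = value.strip()
    return rules


def table_allowed(cell_counts, rules):
    """Return (allowed, failed_keyword) for a table's unweighted cell counts."""
    nonempty = [n for n in cell_counts if n > 0]
    if not nonempty:
        return True, None  # assumption: nothing to check in an empty table
    # Assumption: empty cells do not count against MINCELLN.
    if "MINCELLN" in rules and min(nonempty) < float(rules["MINCELLN"]):
        return False, "MINCELLN"
    if "AVGCELLMIN" in rules:
        limit = float(rules["AVGCELLMIN"])
        # Both the mean and the median must reach the limit.
        if mean(nonempty) < limit or median(nonempty) < limit:
            return False, "AVGCELLMIN"
    return True, None


example = """
# Required average (mean and median) cell sizes
AVGCELLMIN = 10
# Required size of smallest cell
MINCELLN = 5
"""
rules = parse_disclosure(example)
print(table_allowed([12, 30, 0, 8], rules))  # (True, None)
print(table_allowed([12, 30, 3, 8], rules))  # (False, 'MINCELLN')
print(table_allowed([6, 7, 9, 30], rules))   # (False, 'AVGCELLMIN')
```

The third table shows why both averages matter: its mean cell size is 13, but its median is 8, which is below the AVGCELLMIN value of 10.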
| internationalization | Using non-English languages in SDA |
| precision | Precision specifications |
| QuickTables | Quick Tables documentation |