SDA 4.0 Documentation for DISCLOSURE


NAME

disclosure - Specify disclosure rules to protect confidentiality

DESCRIPTION

All of the analysis programs, including RECODE and COMPUTE, check whether there is a file named ’disclosure.txt’ in the STUDYINF subdirectory of the SDA dataset directory. If they find such a file, they enforce the disclosure specifications it contains.

If multiple directory pathnames are specified for this dataset (in the SDA Manager or in the ’SDADATA=’ specifications in the HARC file), only one of them (usually the main dataset directory) should have a ’disclosure.txt’ file. The other SDA dataset directories (usually created to hold recoded and computed variables) will have the same disclosure rules applied to them automatically in SDA version 4.

This document describes the disclosure rules that may be specified. Additional specifications can be added to the ’disclosure.txt’ file to suppress results from TABLES and MEANS that are considered too imprecise to display. Those extra parameters are discussed in a separate document on precision.

Note that Quick Tables and SDA version 3.5 require subsidiary datasets (e.g., for recoded and computed variables) to have a file named ’disc-id.txt’ in their STUDYINF subdirectory. This ’disc-id.txt’ file contains a single ID keyword in the format ’ID=abc’, where ’abc’ is the same ID or name used for this study in the ’disclosure.txt’ file of the main SDA dataset. That file is no longer necessary in SDA version 4. However, if a dataset created using the SDA Manager is later set up to be accessed by Quick Tables or by SDA 3.5 analysis programs, you should manually put a ’disc-id.txt’ file in the STUDYINF subdirectory of each subsidiary dataset so that Quick Tables and/or the 3.5 programs will know about the ’disclosure.txt’ file in the main study directory. Otherwise, the disclosure rules will not be applied to analysis runs that do not use variables in the main study dataset. See the version 3.5 manual page for details on the ’disc-id.txt’ file.
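
For Quick Tables or SDA 3.5 compatibility, the one-line ’disc-id.txt’ file can be created by hand or with a small script. The following Python sketch assumes you know the subsidiary dataset directory and the study's disclosure ID; `write_disc_id` is a hypothetical helper, not part of SDA.

```python
from pathlib import Path


def write_disc_id(subsidiary_dir, study_id):
    """Write STUDYINF/disc-id.txt for a subsidiary SDA dataset directory.

    Hypothetical helper (not part of SDA). It only creates the one-line
    'ID=abc' file; study_id must match the ID= line in the main
    dataset's disclosure.txt.
    """
    studyinf = Path(subsidiary_dir) / "STUDYINF"
    studyinf.mkdir(parents=True, exist_ok=True)  # normally already exists
    path = studyinf / "disc-id.txt"
    path.write_text(f"ID={study_id}\n")  # single ID keyword, nothing else
    return path
```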


KEYWORDS

The ’disclosure.txt’ file contains the disclosure specifications for the dataset. These specifications are given in the form "keyword = something", with one keyword per line. Keywords may be given in any order, in either upper or lower case. All keywords except the ID specification are optional.
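
As an illustration of this format, the following Python sketch reads the text of a ’disclosure.txt’ file into a dictionary. It is not SDA's own parser: it assumes that blank lines and lines beginning with ’#’ are comments (as described for the example file below), and it leaves every value as an unparsed string. The name `parse_disclosure` is hypothetical.

```python
import re


def parse_disclosure(text):
    """Parse disclosure.txt text into {UPPERCASE keyword: raw value string}.

    Sketch only: one 'keyword = value' per line, keywords in any case,
    blank lines and '#' lines ignored. SDA's real parser may differ.
    """
    specs = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank line or comment
        keyword, sep, value = stripped.partition("=")
        if not sep:
            raise ValueError(f"expected 'keyword = value': {line!r}")
        specs[keyword.strip().upper()] = value.strip()
    # ID is the only required keyword: one word of letters or numbers.
    if not re.fullmatch(r"[A-Za-z0-9]+", specs.get("ID", "")):
        raise ValueError("ID= is required and must contain only letters or numbers")
    return specs
```

Keyword names are upper-cased so that ’MaxFilters’ and ’MAXFILTERS’ are treated alike; variable names inside values are left untouched.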

The valid keywords are as follows:


Keyword       Possible Specification          Default (if no keyword)
_____________________________________________________________________


DISCLOSURE ID FOR THE STUDY

ID=           a unique identifier for the     REQUIRED
                dataset with disclosure rules
                (one word, only letters or
                 numbers)

PREVENT AN ANALYSIS FROM BEING RUN

VAREXCLUDE=   name(s) of variables that       All variables allowed
                cannot be used in analysis

COMBEXCLUDE=  pairs of variables that         All combinations allowed
                cannot be used together in
                the same analysis run
                and cannot be used at all
                to recode or compute new
                variables
                (see notes below)

MAXFILTERS=   maximum number of selection     Any number of filters OK
                filter variables that can
                be used in a single run

CONTROLVAR=   no, if control variables        A control variable is OK
                cannot be used in tables

LISTCASE=     no, if the ’listcase’ program   Listcase run is OK
                is not allowed to run

SUBSET=       no, if the ’subset’ program     Subset run is OK
                is not allowed to run



SUPPRESS THE OUTPUT AFTER RUNNING AN ANALYSIS


MINCELLN=     minimum number of cases in a    No required minimum cell N
                table cell to allow a table
                to be displayed
                (see notes below)

MINCELLWN=    minimum number of WEIGHTED      No required weighted minimum
                cases in a table cell to        cell N
                allow a table to be displayed

AVGCELLMIN=   minimum average cell size to    No required average cell N
                allow a table to be
                displayed (checks both the
                mean and the median cell size,
                excluding cells with no cases)

AVGCELLWMIN=  minimum WEIGHTED average cell   No required weighted average
                size                            cell N

MINCASEBYIVAR= for regressions, minimum       No limit on the number of
                ratio of valid observations   independent vars
                to the number of independent
                vars

MONITORVAR=   varname(s), each optionally     No special monitored vars
                followed by (min_values)
                (see notes below)



SUPPRESS UNWEIGHTED NUMBER OF CASES IN OUTPUT


UNWEIGHTEDN=  no                              Show unweighted N’s
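
The keywords in the first group are checked before a program runs. The Python sketch below shows one way such a pre-run check could work. `check_request` is a hypothetical function; it assumes that variable names are case-insensitive, that selection filter and control variables count as "used" in the run, and that lists follow the comma and semicolon syntax of the example file below.

```python
import re


def _names(value):
    """Split a comma/space-separated list of variable names (assumed syntax)."""
    return {n.upper() for n in re.split(r"[,\s]+", value.strip()) if n}


def check_request(specs, variables, filters=(), control_var=None):
    """Return the violated rules for a proposed run; an empty list means allowed.

    Hypothetical sketch of VAREXCLUDE, COMBEXCLUDE, MAXFILTERS and
    CONTROLVAR. 'specs' maps UPPERCASE keywords to raw string values.
    """
    used = {v.upper() for v in [*variables, *filters]}
    if control_var:
        used.add(control_var.upper())
    problems = []
    excluded = used & _names(specs.get("VAREXCLUDE", ""))
    if excluded:
        problems.append(("VAREXCLUDE", sorted(excluded)))
    # COMBEXCLUDE groups are separated by ';' (as in the example file).
    for group in specs.get("COMBEXCLUDE", "").split(";"):
        combo = _names(group)
        if combo and combo <= used:  # every member of the group is in the run
            problems.append(("COMBEXCLUDE", sorted(combo)))
    if "MAXFILTERS" in specs and len(filters) > int(specs["MAXFILTERS"]):
        problems.append(("MAXFILTERS", int(specs["MAXFILTERS"])))
    if control_var and specs.get("CONTROLVAR", "").lower() == "no":
        problems.append(("CONTROLVAR", control_var))
    return problems
```

Returning a list of (keyword, value) pairs mirrors the way each refusal message is followed by the variable names or number given with the corresponding keyword.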


NOTES ON THE OPTIONS

Disclosure ID for the Study

The disclosure ID for a study is a one-word name, consisting of letters or numbers. It can be the same as the dataset name given in the SDA Manager, or it can be different. This disclosure ID is a mechanism to ensure that the main SDA data file for a study, and its other associated data files (for recoded or computed variables, for example), all observe the same disclosure rules. One (and only one) of the SDA dataset directories (referred to as the "main" SDA dataset) must have a ’disclosure.txt’ file in its STUDYINF subdirectory.

Variables Excluded from Recoding and Computing

Any variable named in the ’COMBEXCLUDE=’ or the ’VAREXCLUDE=’ specifications cannot be used in the RECODE or COMPUTE programs. This restriction prevents a restricted variable from being copied into a new variable and then analyzed under the new name, which would bypass the exclusion.
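
A sketch of the set of variables that RECODE and COMPUTE would refuse, assuming the same list syntax as the example file below (`recode_blocked` is a hypothetical name):

```python
import re


def recode_blocked(specs):
    """Return the variable names that RECODE and COMPUTE must refuse.

    Sketch only: every VAREXCLUDE name plus every variable appearing
    anywhere in COMBEXCLUDE. Separators (commas or spaces within a
    group, ';' between groups) are assumed from the example file.
    """
    text = specs.get("VAREXCLUDE", "") + ";" + specs.get("COMBEXCLUDE", "")
    return {n.upper() for n in re.split(r"[,;\s]+", text) if n}
```

With the example file below, AGE could still be used on its own in an analysis (only the AGE, RACE combination is excluded), but it could not appear in a RECODE or COMPUTE run.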

LISTCASE and/or SUBSET not allowed

Since the LISTCASE program and the SUBSET program provide access to individual-level data, those programs may not be appropriate for sensitive datasets. If the disclosure file disallows LISTCASE, it is best not to apply global options to sensitive datasets when those options include LISTCASE in the list of available SDA programs. Similarly, the subset option can be disallowed for such a dataset. Otherwise, an attempt to use LISTCASE or SUBSET will generate an error message. Note that the disclosure file specifications override any other permissions that may have been set up for a specific dataset or group of datasets.

Minimum Cell Sizes

The cells examined are the individual table cells produced by the TABLES, MEANS, or CORRTAB programs. The number of cases in each cell is compared with the required minimum cell size, either unweighted (’MINCELLN=’) or weighted (’MINCELLWN=’). Cells with no cases at all are not included in the evaluation of the minimum cell size, or the average cell size, in a table.
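
A minimal Python sketch of the minimum-cell-size test, assuming the table's cell counts are already available as a list of rows (`min_cell_ok` is a hypothetical name, not SDA code):

```python
def min_cell_ok(table, min_n):
    """True if every non-empty cell has at least min_n cases.

    'table' is a list of rows of cell counts: unweighted counts for
    MINCELLN, weighted counts for MINCELLWN. Empty cells are skipped.
    """
    nonempty = [n for row in table for n in row if n > 0]
    return all(n >= min_n for n in nonempty)
```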

Average Cell Size in a Table

The cells examined are the individual table cells produced by the TABLES, MEANS, or CORRTAB programs. If a control variable is used, the cells are examined for each separate category of the control variable. The mean and the median number of cases in the cells of a table are each compared with the required minimum average cell size, either unweighted (’AVGCELLMIN=’) or weighted (’AVGCELLWMIN=’). Cells with no cases at all are not included in the evaluation of the average cell size in a table.
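
The following Python sketch applies both tests (mean and median, empty cells excluded) to each control-variable category. It is an illustration, not SDA's implementation; `avg_cell_ok` is a hypothetical name, and because this page does not say how a table whose cells are all empty is handled, the sketch simply skips such a table.

```python
from statistics import mean, median


def avg_cell_ok(tables_by_control, min_avg):
    """True if, in every control category, both the mean and the median
    of the non-empty cell counts are at least min_avg.

    'tables_by_control' maps each control-variable category to a list
    of rows of cell counts; use a one-entry dict if there is no control
    variable.
    """
    for table in tables_by_control.values():
        counts = [n for row in table for n in row if n > 0]
        if counts and (mean(counts) < min_avg or median(counts) < min_avg):
            return False
    return True
```

The median test matters for skewed tables: cell counts of 2, 3, and 40 have a mean of 15 but a median of 3, so with AVGCELLMIN = 10 that table would be suppressed.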

Monitored Variables

The ’MONITORVAR=’ option suppresses analysis results if those results are based on cases or observations that all have the same value (or, if a higher minimum is specified, too few distinct values) on one or more sensitive variables. These sensitive variables need not be included in the current analysis run, but their distribution is monitored nevertheless.

For example, you may not want to release analysis results based on cases that are all from the same institution (such as from the same prison). Assuming that there is a variable named ’prison’, you could specify that variable as one to be monitored.

By default the cases must come from at least two distinct categories of the monitored variable(s). However, you can specify a higher required number of categories by giving the desired number of categories in parentheses after the variable name. See the example below.
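
A Python sketch of this check, assuming the ’MONITORVAR=’ syntax shown in the example file below (names separated by spaces or commas, each optionally followed by a count in parentheses) and assuming that missing values do not count as a distinct category. Both function names are hypothetical.

```python
import re


def parse_monitorvar(value):
    """Parse e.g. 'INSTITUTION CBSA(3)' into {'INSTITUTION': 2, 'CBSA': 3}.

    Assumed syntax: names separated by spaces or commas, each optionally
    followed by '(n)'. The default minimum is 2 distinct categories.
    """
    rules = {}
    for name, n in re.findall(r"([A-Za-z_][A-Za-z0-9_]*)\s*(?:\((\d+)\))?", value):
        rules[name.upper()] = int(n) if n else 2
    return rules


def monitor_ok(cases, rules):
    """True if the cases behind a result span enough distinct values.

    'cases' is a list of dicts, one per observation used in the result.
    None (missing) is not counted as a category, which is an assumption.
    """
    for var, min_values in rules.items():
        distinct = {c.get(var) for c in cases} - {None}
        if len(distinct) < min_values:
            return False
    return True
```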


MESSAGES TO DISPLAY IF AN ANALYSIS IS NOT ALLOWED

If a requested analysis is not run or if analysis output is suppressed, the user receives an explanatory message. The default messages are given below, but they can be modified by inserting revised messages in an SDA internationalization file for analysis output (and then specifying the pathname of that modified language file within the SDA Manager). It is possible to insert an HTML link in a message, if you want the user to be able to link to some document that explains in more detail what the disclosure rules are and why they have been implemented.

The default messages, following the keyword that would be used in a language file, are as follows. Notice that one or more variable names or a number will sometimes be output after the given message. Those names or numbers are the values specified with the keywords described above.

DIS_VAREXCLUDE = To preserve confidentiality, analyses are not permitted using the following variable(s):

DIS_COMBEXCLUDE = To preserve confidentiality, analyses are not permitted using the following combination(s) of variables:

DIS_VAREXCLUDE_RECODE = To preserve confidentiality, RECODE and COMPUTE are not permitted using the following variable(s):

DIS_MAXFILTERS = To preserve confidentiality, the number of filter variables cannot be greater than:

DIS_CONTROLVAR = To preserve confidentiality, tables cannot be run with control variables.

DIS_LISTCASE = To preserve confidentiality, the LISTCASE program cannot be used with this dataset.

DIS_SUBSET = To preserve confidentiality, the SUBSET program cannot be used with this dataset.

DIS_AVGCELLMIN = To preserve confidentiality, tables cannot be displayed unless the average number of observations in each cell is at least:

DIS_AVGCELLWMIN = To preserve confidentiality, tables cannot be displayed unless the average weighted number of observations in each cell is at least:

DIS_MINCELLN = To preserve confidentiality, tables cannot be displayed unless the number of observations in each cell is at least:

DIS_MINCELLWN = To preserve confidentiality, tables cannot be displayed unless the weighted number of observations in each cell is at least:

DIS_MINCASEBYIVAR = To preserve confidentiality, regression analyses cannot be shown unless the ratio of valid observations to the number of independent variables is at least:

DIS_MONITORVAR = To preserve confidentiality, analysis results cannot be displayed for any set of observations that has only a very small number of values on certain sensitive variables. In this case the sensitive variable(s) (and the minimum required number of valid values) was:

DIS_UNWEIGHTEDN = To preserve confidentiality, only weighted N’s can be shown.


EXAMPLE OF A DISCLOSURE FILE

In the following example, note that blank lines and lines beginning with ’#’ are treated as comments, and they are ignored by the SDA programs.

# DISCLOSURE SPECIFICATIONS FOR DATA FILE

# ID FOR THIS DATASET
ID = survey25

# A. PREVENT AN ANALYSIS FROM BEING RUN

# Completely exclude these vars from analysis and recoding/computing
VAREXCLUDE = CASEID, LOCATIONID

# Exclude these combinations of vars (separated by ’;’) from analysis
# Also exclude the individual vars from being used by the ’recode’
#  and ’compute’ programs
COMBEXCLUDE = RACE, GENDER; AGE, RACE

# Maximum number of selection filters allowed in an analysis run
MAXFILTERS = 2

# No tables with a control variable if set equal to ’no’
CONTROLVAR = no

# The LISTCASE program cannot be run if set equal to ’no’
LISTCASE = no

# The SUBSET program cannot be run if set equal to ’no’
SUBSET = no


# B. SUPPRESS ANALYSIS OUTPUT AFTER RUNNING A PROGRAM

# Required average (mean and median) cell sizes - unweighted and weighted
AVGCELLMIN = 10
AVGCELLWMIN = 200

# Required size of smallest cell - unweighted and weighted
MINCELLN = 5
MINCELLWN = 100

# Ratio of cases to number of independent vars in regression
MINCASEBYIVAR = 100

# Check for at least 2 distinct values on the variable ’INSTITUTION’
#  and at least 3 distinct values on ’CBSA’.
MONITORVAR = INSTITUTION CBSA(3)

# Suppress all unweighted N’s if set equal to ’no’
UNWEIGHTEDN = no


EXAMPLE OF A LANGUAGE FILE WITH EMBEDDED LINKS

The words "preserve confidentiality" are set up to link to a file that could explain further the disclosure rules and the reasons for setting them up.
DIS_AVGCELLMIN = To preserve confidentiality, tables cannot be displayed unless the average number of observations in each cell is at least:
DIS_AVGCELLWMIN = To preserve confidentiality, tables cannot be displayed unless the average weighted number of observations in each cell is at least:
DIS_COMBEXCLUDE = To preserve confidentiality, analyses are not permitted using the following combination(s) of variables:
DIS_CONTROLVAR = To preserve confidentiality, tables cannot be run with control variables.
DIS_LISTCASE = To preserve confidentiality, the LISTCASE program cannot be used with this dataset.
DIS_MAXFILTERS = To preserve confidentiality, the number of filter variables cannot be greater than:
DIS_MINCASEBYIVAR = To preserve confidentiality, regression analyses cannot be shown unless the ratio of valid observations to the number of independent variables is at least:
DIS_MINCELLN = To preserve confidentiality, tables cannot be displayed unless the number of observations in each cell is at least:
DIS_MINCELLWN = To preserve confidentiality, tables cannot be displayed unless the weighted number of observations in each cell is at least:
DIS_MONITORVAR = To preserve confidentiality, analysis results cannot be displayed for any set of observations that has only a very small number of values on certain sensitive variables. In this case the sensitive variable(s) (and the minimum required number of valid values) was:
DIS_SUBSET = To preserve confidentiality, the SUBSET program cannot be used with this dataset.
DIS_UNWEIGHTEDN = To preserve confidentiality, only weighted N’s can be shown.
DIS_VAREXCLUDE = To preserve confidentiality, analyses are not permitted using the following variable(s):
DIS_VAREXCLUDE_RECODE = To preserve confidentiality, RECODE and COMPUTE are not permitted using the following variable(s):

SEE ALSO

internationalization - Using non-English languages in SDA
precision - Precision specifications
QuickTables - Quick Tables documentation


CSM, UC Berkeley/ISA
September 22, 2015