SDA 3.4 Documentation for RECODE


recode - recode variables


recode -b filename


RECODE uses one or more existing variables as input to create a new SDA variable.

Ordinarily this program is invoked by the Web interface for the SDA programs, and the user does not have to deal with the keywords given in this document. Output from the program is in HTML, which can be viewed with a Web browser. Users who run this program interactively should see the online help document.

It is also possible to run the program directly by preparing a command file, which specifies the variables to be analyzed and the options to use. This document explains how to prepare such a file. The name of this batch command file is specified to the program after the ‘-b’ option flag.


The batch file is laid out in separate parts, separated by asterisks (*). The parts can be given in any order. Since the "map," category labels, and descriptive text can have varying numbers of lines, each of those parts ends with an asterisk (*) on a line by itself. The general layout is as follows:

     (Input and output definitions)

     MAP=
     (Recode map)
     *

     CATLABELS=       [optional]
     (Category text and labels)
     *

     TEXT=            [optional]
     (Descriptive text)
     *


The specifications are given in the form "keyword = something" with one keyword per line. Keywords may be given in any order, and the valid keywords are as follows (with significant characters shown in capital letters):

Defining Input Variables

Keyword       Possible Specification          Default (if no keyword)

STudies=      path of source dataset(s)       Look for input variables
                                               only in current directory

INvars=       name(s) of input var(s)         REQUIRED

Defining the New Variable

Keyword       Possible Specification          Default (if no keyword)

OUTSTudy=     path of study for new variable  Current directory

OUTVar=       name of new variable            REQUIRED

LABEL=        long label for new variable     No long label

CATlabels=    (precedes lines of category     No category text
                text - see details below)      or labels

MAP=          (precedes lines with recode     REQUIRED
               map or rules - see below)

MD=           list of invalid codes, ranges   No defined MD codes
              (also used for output value
               if input has missing data
               -- see below)

MIN=          minimum valid code              No defined minimum

MAX=          maximum valid code              No defined maximum

OVERwrite=    yes                             Do not overwrite new var
                                                if it already exists

OTHercases=   name of the input variable      Set to MD code
               from which to take the value    (or system-missing)
               for cases that do not match
               a pattern in the MAP

TEXT=         (precedes lines of descriptive  No item text
                text - see details below)

Other options

Keyword       Possible Specification          Default (if no keyword)

DIAGnostics=  yes                             No diagnostic summary of
                                                the new variable

COLorcoding=  yes                             No colored headings in the
                                                diagnostic output

GVARCase=     LOWER or UPPER                  Do not convert all variable
                                                names to lower/upper case

LAnguagefile= Name of file with non-English   English labels on
                labels and messages             output

SAVebatch=    name of directory               No file preserved with batch
                                                commands to create new var
                                                (for interactive version).
                                                The batch file name is the
                                                name of the new variable,
                                                with the suffix ‘.rec’


Most keywords can be abbreviated. Usually only two or three characters are required. The keyword for the category text for the new variable, for instance, can be given as "catlabels=" or "catlab=" or even "cat=". Either upper or lower case may be used. If keywords are repeated, the second specification will override the first.


Anything on a line beginning with "#" is ignored by the batch processor and can therefore be used for comments. Blank lines are also ignored.


The rules for combining the values of one or more input variables into a value on the output variable are contained in the recode map. First put the MAP keyword on a line by itself; then put each recode rule on a separate line. The general format is as follows:

     New value: values on var 1 [; values on var 2; ... ]

The recode rules for different input variables are separated by a semicolon (;). After the last rule, put an asterisk (*) on a line by itself. For example, to recode age and gender into 4 categories (younger male, younger female, older male, older female), one could construct the following recode map:

     1: 18-49; 1
     2: 18-49; 2
     3: 50-97; 1
     4: 50-97; 2

Each recode rule can include more than one value or range for each input variable. A single asterisk (*) in a recode rule matches any VALID value of the corresponding input variable. Two asterisks (**) match ANY value, including missing-data (both user-defined and system-missing) and out-of-range values. It is possible to have more than one rule for a given output value -- notice that the output code 4 has three rules in the example given below.

     1: 1,3-5,7 ; 1-10
     2: 1,3-5,7 ; 11-50
     3: 1,3-5,7 ; 51-90
     4: 8-10,12 ; *
     4: 41,45,55; 11-90
     4: 61-90   ; *
     9:    **   ; **

If a case matches more than one recode rule, the first rule encountered will apply. Notice in this example that the recode rule ‘**; **’ matches all values of the two input variables; any cases not covered by a rule higher up in the map will receive the value 9.
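
The matching rules described above can be modeled in a few lines of Python. This is only an illustrative sketch, not the code RECODE itself uses. The function names are hypothetical, the caller is assumed to know whether each input value is valid (not missing and not out of range), and explicit values and ranges are assumed to be non-negative numbers that match by simple comparison.

```python
def parse_spec(spec):
    """Parse one variable's part of a rule, e.g. '1,3-5,7', '*' or '**'."""
    spec = spec.strip()
    if spec in ("*", "**"):
        return spec
    ranges = []
    for item in spec.split(","):
        low, _, high = item.strip().partition("-")
        ranges.append((float(low), float(high or low)))
    return ranges


def parse_map(lines):
    """Turn lines like '4: 8-10,12 ; *' into (new_value, [spec, ...]) rules."""
    rules = []
    for line in lines:
        new_value, _, rest = line.partition(":")
        rules.append((float(new_value), [parse_spec(s) for s in rest.split(";")]))
    return rules


def spec_matches(spec, value, is_valid):
    if spec == "**":          # two asterisks: any value at all
        return True
    if spec == "*":           # one asterisk: any VALID value
        return is_valid
    # Explicit values and ranges (assumption: plain numeric comparison).
    return value is not None and any(lo <= value <= hi for lo, hi in spec)


def recode(rules, values, valid_flags):
    """Return the new value from the first matching rule, or None if none match."""
    for new_value, specs in rules:
        if all(spec_matches(s, v, ok)
               for s, v, ok in zip(specs, values, valid_flags)):
            return new_value
    return None


rules = parse_map([
    "1: 1,3-5,7 ; 1-10",
    "2: 1,3-5,7 ; 11-50",
    "3: 1,3-5,7 ; 51-90",
    "4: 8-10,12 ; *",
    "4: 41,45,55; 11-90",
    "4: 61-90   ; *",
    "9:    **   ; **",
])
print(recode(rules, [3, 20], [True, True]))      # 2.0
print(recode(rules, [9, None], [True, False]))   # 9.0 (only '**; **' matches)
```

In the second call the first variable matches ‘8-10’, but the second variable is missing, so the single asterisk does not match and the case falls through to the final ‘**; **’ rule.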


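If a case does not match any of the recode rules, the value of the output variable depends on the options that were specified. If the ‘OTHercases=’ keyword names an input variable, the case takes its value from that variable. Otherwise the output variable is set to the MD code, if one has been defined with ‘MD=’, or to system-missing.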


Category text and labels for one or more codes of the new variable can be supplied. First put the ‘CATlabels=’ keyword on a line by itself; then specify on a separate line each code, followed by one or more spaces or tabs, then the category text [and short label, if desired]. (Programs such as TABLES and MEANS will use the short label for a category, if one is available.) Put an asterisk (*) on a line by itself after the last label. For example:

     1 Professional and technical [Prf,Tech]
     2 Managers
     3 Blue collar workers [Blue Col]
     4 Other
     9 Missing
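
As an illustration of how such lines break down into a code, the category text, and an optional short label, here is a small Python sketch. It is not part of SDA. The function name is hypothetical, and it assumes that the short label is the bracketed text at the end of the line, as in the example above.

```python
import re

# code, whitespace, category text, then an optional [short label] at the end
LABEL_LINE = re.compile(r"^\s*(\S+)\s+(.*?)\s*(?:\[([^\]]*)\])?\s*$")


def parse_catlabel(line):
    """Split one category line into (code, text, short_label or None)."""
    match = LABEL_LINE.match(line)
    if match is None:
        raise ValueError("not a category label line: %r" % line)
    return match.groups()


print(parse_catlabel("1 Professional and technical [Prf,Tech]"))
print(parse_catlabel("2 Managers"))
```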


RECODE works only with NUMERIC variables, but it can handle character values that have been defined as missing-data codes (such as ‘D’ or ‘R’). One of the examples below illustrates this application.


Descriptive text may be stored with the new variable. This text can then be displayed when the variable is used in analysis programs or in a codebook. First put the ‘TEXT=’ keyword on a line by itself; then write as many lines of text as you wish to store with the new variable. Put an asterisk (*) on a line by itself after the last line of text.


RECODE commands for more than one variable can be included in the same batch file. After the first set of commands, put a line beginning with two asterisks (**); then the commands for another new variable can follow. The value of the ‘STudies=’ keyword is carried over from the previous set of commands, unless it is respecified.
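
The following Python sketch models how a batch file with several sets of commands divides up, and how the ‘STudies=’ value carries over. It is a simplified illustration, not SDA's own parser. The function names are hypothetical, a separator line is assumed to start with ‘**’ after any leading spaces, and a keyword is treated as ‘STudies=’ when it begins with "st". Lines inside a TEXT block are not treated specially.

```python
def split_command_sets(text):
    """Split batch-file text into command sets at lines beginning with '**'."""
    sets, current = [], []
    for line in text.splitlines():
        if line.strip().startswith("**"):
            sets.append(current)
            current = []
        else:
            current.append(line)
    if any(line.strip() for line in current):
        sets.append(current)
    return sets


def studies_for_each(sets):
    """Return the STUDIES value in effect for each set, carrying it forward."""
    in_effect, result = None, []
    for lines in sets:
        for line in lines:
            key, sep, value = line.partition("=")
            if sep and key.strip().lower().startswith("st"):
                in_effect = value.strip()
        result.append(in_effect)
    return result


batch = """study = /sda/testdata
invar = age
outvar = age3
map=
1: 18-29
*
**
invars = age gender
outvar = agesex
map=
1: 18-49; 1
*
"""
sets = split_command_sets(batch)
print(len(sets))               # 2
print(studies_for_each(sets))  # ['/sda/testdata', '/sda/testdata']
```

The second set of commands gives no ‘study =’ line, so the value from the first set stays in effect.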


RECODE can read most older CSA recode commands. The following keywords are still recognized and are equivalent to the new keywords shown in parentheses: The missing-data keywords ‘md1=value1’ and ‘md2=value2’ are also recognized and are equivalent to the new form: ‘md= value1, value2’.

Note, however, that in the CSA recode rules, a single asterisk (*) matches ALL values of an input variable. SDA distinguishes between a single asterisk, which matches only the VALID values of an input variable; and two asterisks, which match ALL values.


1. Collapse age into 3 categories

     study = /sda/testdata
     invar = age
     outvar = age3
     label = Collapsed age - 3 categories
     md = 9

     map=
     1: 18-29
     2: 30-49
     3: 50-97
     *

     catlabels=
     1 <30
     2 30-49
     3 50+
     9 missing
     *


2. Recode age and gender into 4 categories

     invars = age gender
     outvar = agesex
     label = Age-gender typology
     overwrite = yes
     md = 9

     map=
     1: 18-49; 1
     2: 18-49; 2
     3: 50-97; 1
     4: 50-97; 2
     *

     catlabels=
     1 Yng Male
     2 Yng Feml
     3 Old Male
     4 Old Feml
     9 Missing
     *

     text=
     This variable is a four-category typology of age and gender
     *
     **

3. Collapse highest and lowest values of age

     study = /sda/testdata
     invar = age
     outvar = age2070
     label = Collapsed age - 20-70

     # Note the use of the ‘othercases=’ option;
     # only the codes given in the map are changed.
     othercases = age

     # We want the previous MD codes of 99 to stay as MD
     md = 99

     map=
     20: 1-20
     70: 70-97
     *

     catlabels=
     20 20 or younger
     70 70 or older
     *


4. Convert character missing data codes to numbers

     invar = spend
     outvar = numspend
     label = Recoded spend variable
     md = 8,9

     map=
     1: 1-2
     2: 3
     8: D
     9: R
     *

     catlabels=
     1 A lot
     2 Not enough
     8 Don’t know
     9 Refused
     *


CSM, UC Berkeley
January 25, 2010