SDA 4.1 Documentation for XCODEBK


NAME

xcodebk - Produce a codebook (in batch mode) for an SDA dataset or a DDL file

USAGE

xcodebk -b command_file [-t language_file] [-v filename] [-n]

Meaning of the option flags:

-b filename
A file of commands for XCODEBK must be prepared; the name of this file is given on the command line after the '-b' flag.

-n
Check the syntax of the command file (but do not execute the commands)

-v filename
Write a list of variables contained in the codebook onto the file `filename'. The names of the variables are written without headings, template commands, or recoding specifications.

-t filename
Write the strings used by XCODEBK into the specified file, so that they can be replaced with strings in another language. XCODEBK will then use the revised strings file, if its name is specified in the SDA Manager. See the internationalization document for details.

Every time XCODEBK is run, a message is appended to the file named `XCODEBK.MSG'. If warnings or error messages are generated by the program, they are put in that file, and a message to that effect appears on the user's screen.
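
As an illustration of the option flags above, the following Python sketch builds the argument list for a syntax check (`-n') or for a full run. It is not part of SDA: the function names and the file name `codebook.cmd' are hypothetical, and the sketch assumes that the `xcodebk' executable can be found on the search path.

```python
import subprocess

def build_xcodebk_argv(command_file, check_only=False, strings_file=None):
    """Return an argument list that uses only the flags documented above."""
    argv = ["xcodebk", "-b", command_file]
    if strings_file is not None:
        # -t writes the program's strings to a file for translation.
        argv += ["-t", strings_file]
    if check_only:
        # -n checks the command file syntax without executing the commands.
        argv.append("-n")
    return argv

def run_syntax_check(command_file):
    """Run xcodebk with -n. Assumes xcodebk is installed and on the PATH."""
    result = subprocess.run(build_xcodebk_argv(command_file, check_only=True),
                            check=False)
    # Warnings and errors are appended to XCODEBK.MSG; inspect that file
    # after the run rather than relying only on the return code.
    return result.returncode
```

A call such as `run_syntax_check("codebook.cmd")' before the real run is one way to catch command file errors early.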


OVERVIEW

Although a satisfactory codebook with many available options can be generated by the version 4 SDA Manager, the full set of codebook options can only be accessed by running the XCODEBK program in batch mode. The present document describes how to prepare the necessary files to do this.

XCODEBK produces documentation based either on an SDA dataset or on a DDL file (any version).

There are three kinds of output from the XCODEBK program:

  1. A set of HTML files, for creating an online codebook. An HTML codebook can display stratified distributions for each variable.

  2. A file tagged with formatting markers, intended to be input into the Microsoft Word program (as described in a separate section below).

  3. A plain text file. Although this file can be viewed or printed, this option has largely been superseded by the other two output options.

This document describes the meaning and use of the various options. It also provides instructions for preparing the necessary input files. Since the three output options require somewhat different preparation, the special instructions for each output mode are given in separate sections below.


FILES TO PREPARE

The program will use the contents of certain files that you can prepare and specify.


Command File

The only required file is the command file, which is specified after the '-b' option flag. Keywords and examples are given below for each type of codebook output.

The other files described next are optional, and the name of each such file, if used, is specified in the command file. (The keywords shown in parentheses are the corresponding commands to be inserted in the command file.)



SDA Dataset or DDL File (STudy=directory or DDL=filename)

By default the codebook program looks for an SDA dataset in the same directory in which the codebook program is being run. If the SDA dataset is located in another directory, it is necessary to specify the pathname (relative or absolute) of that directory. The pathname is given in the command file, with the 'STudy=' keyword.

More than one SDA dataset can be specified, provided that they all have the same number of cases and that the cases are in the same order in each dataset. The clearest way to specify multiple SDA datasets is by repeating the 'STudy=' keyword on multiple lines in the command file. It is also possible to list multiple datasets after one 'STudy=' keyword, by separating the dataset names with a comma or a space.

Even if you specify an explicit study pathname, the program will continue to look for an SDA study in the current directory, as it searches for the variables to include in the codebook. If a variable specified in a Variable List cannot be found in any study directory, an error message will be placed in the 'XCODEBK.MSG' file.

You can also generate a codebook using only a DDL file, instead of a complete SDA dataset. Such a codebook, of course, cannot include frequencies, percentages, or summary statistics; but the basic documentation for a dataset can be produced in this way. The pathname of the DDL file is given in the command file, with the `DDL=' keyword.

Only one of these two keywords can be used for a single codebook procedure.
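
The rules above (one `STudy=' line per dataset, and never both keywords in one run) can be sketched in Python as a small helper that emits the corresponding command file lines. The helper name and the pathnames in the usage note are hypothetical; only the two keywords come from this document.

```python
def dataset_lines(studies=(), ddl=None):
    """Return command-file lines naming the codebook input.

    studies: directory pathnames of SDA datasets (empty means that only
             the current directory is searched).
    ddl:     pathname of a DDL file, or None.
    """
    if studies and ddl is not None:
        raise ValueError("use either STudy= or DDL= in one codebook run, not both")
    if ddl is not None:
        return [f"DDL={ddl}"]
    # One STudy= line per dataset, the form described above as the clearest.
    return [f"STudy={path}" for path in studies]
```

For example, `dataset_lines(["/data/gss", "/data/gss_extra"])' yields two `STudy=' lines, while passing both `studies' and `ddl' raises an error.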


List of Variables (varlist=filename)

By default, the codebook will include descriptions of all the variables in the specified SDA dataset or DDL file. The variables will appear in the codebook either in alphabetical order (for SDA datasets) or in the order in which they are defined (in a DDL file).

If you want to limit the codebook to a subset of variables or output them in a different order, you must prepare a file with a list of variable names, given in the order in which they should appear in the codebook. The variables can be listed with one or more names on each line, separated by spaces, tabs, or commas.

The name of each variable can be listed with temporary commands for recoding or collapsing the variable, just as with temporary transformations in the analysis programs.

Formatting Commands in the Variable List

The list of variables may include certain formatting commands, in addition to the names of variables. Those commands must appear on separate lines. The available commands are as follows:

* and **
A line in this file beginning with an asterisk (*) will be printed (minus the asterisk) in the table of contents as a subheading; this is useful for providing headings for groups of variables. A line beginning with TWO asterisks (**) will, in addition, insert the remainder of the line in the body of the codebook as a section heading, preceded and followed by the defined "divider" string.

In general, it is better to use headings with two asterisks (**). Note that with Word codebooks, headings specified with a single asterisk will not appear at all (because the table of contents is generated by scanning the body of the codebook).

2* and 2**
A line in the file beginning with `2*' or `2**' will be output as a second level heading. The distinction between one and two asterisks is the same as described above for the main first level headings. In general, it is better to use headings with two asterisks (**). For HTML codebooks, the second level headings will appear in a smaller font than the first level headings.

@ and @@
A line beginning with `@name' will begin processing all subsequent variables using the template named `name'. The use of templates is explained below. A line beginning with `@@' will return processing to the default template. If no template is named in the variable list, the variable descriptions will be formatted using the first defined template; if no templates have been defined, the default template will be used.

It is often useful to have one template for variables with just a few categories, and to have another template for variables with many categories. The default template is designed primarily for variables with just a few categories and does not include summary statistics like means and standard deviations.

[
A line beginning with `[filename' will insert into the codebook, before the next variable on the list, the contents of the file named `filename' in the `CBTEXT' subdirectory. See the discussion of the use of supplementary text files in the next section.

+
A line beginning with `+' is used for creating HTML codebooks. It indicates that you want to begin a new file before the next variable on the list. HTML codebooks are divided into multiple files so that it will not be necessary to load very large files just to view one part of a codebook. The codebook program itself will divide the output into separate files of approximately 50,000 - 100,000 bytes each, and it will try to use the headings in the variable list (indicated by `*') as a guide. But you can force a new file to begin at any point by putting `+' in the variable list. This capability is especially useful for creating good subsets of variables if there are no headings in the variable list. (The issue of the size of individual files is discussed below under 'Size of Files'.)

#
A line beginning with `#' is interpreted as a comment and is ignored.

EXAMPLE of a Variable List

An example of a variable list with the various commands that can be included in the file is given next. These command functions are also summarized in the corresponding online document.

     # Start out using the default template.
     # For plain text codebooks, note that this will NOT force the use of the default
     # header, footer, and divider if they have been redefined in the template file.
     @@

     # The following headings will appear BOTH in the table of contents
     # AND in the body of the codebook (2 asterisks)
     ** Case identification
     CASEID charid

     ** Attitudes about government spending
     spend,spend2, spend3 spend4

     # For HTML codebooks, force a new file to begin here
     +
     ** Experiment on equal opportunity
     # Put the explanation of the experiment here - file `exnote' in CBTEXT
     [exnote
     eqopp eqrandom

     ** Political ideology and party
     ideo party

     # Use a template with statistics for the following variables
     @cstats
     ** Background variables
     age income educ

     # Return to the default template
     @@
     employed gender marital race

     # Use another template for the weight variable
     @stats
     ** Weight variable
     casewt
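
To make the line prefixes concrete, here is a minimal Python sketch that classifies one line of a variable list according to the formatting commands described above. It is an illustration only, not the parser used by XCODEBK: the function name and the `kind' labels are hypothetical, the treatment of blank lines is an assumption, and temporary recoding specifications attached to variable names are not handled.

```python
import re

def classify_varlist_line(line):
    """Classify one line of a variable list as a (kind, payload) pair."""
    s = line.strip()
    if not s:
        return ("blank", None)  # assumption: blank lines carry no meaning
    if s.startswith("#"):
        return ("comment", s[1:].strip())
    # Test longer prefixes first so that `2**' is not mistaken for `2*',
    # and `**' is not mistaken for `*'.
    for prefix, kind in (("2**", "heading2_toc_and_body"),
                         ("2*", "heading2_toc_only"),
                         ("**", "heading1_toc_and_body"),
                         ("*", "heading1_toc_only")):
        if s.startswith(prefix):
            return (kind, s[len(prefix):].strip())
    if s.startswith("@@"):
        return ("default_template", None)
    if s.startswith("@"):
        return ("use_template", s[1:].strip())
    if s.startswith("["):
        return ("insert_cbtext_file", s[1:].strip())
    if s.startswith("+"):
        return ("new_html_file", None)
    # Anything else: variable names separated by spaces, tabs, or commas.
    return ("variables", [name for name in re.split(r"[,\s]+", s) if name])
```

Applied to the line `spend,spend2, spend3 spend4', the sketch returns the four variable names regardless of which separator was used.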


Supplementary Material (doclink=filename)

Supplementary material of any kind can be inserted into the online HTML codebook by specifying its filename. The file can be of any type that a browser can display (HTML, PDF, PNG, etc.). Up to 100 of these supplementary files may be specified.

The specified file is copied to the output codebook directory, maintaining the same name. A link to the (copied) file will be added to the codebook under the heading "Supplementary Documentation". If an optional (but recommended) label is supplied it will be displayed next to the file link. Otherwise, only the filename link will be displayed.

Some examples of supplementary information are the following:

     doclink = /mycomputer/mywork/samplinginfo.pdf(Sampling Document)
     doclink = /mycomputer/mywork/networthchart.png(Chart of Networth by Age)

Note that this supplementary file option allows for more types of files than the title, intro, and appendix file options, discussed next, which are basically limited to plain text files -- such as the files created by Notepad or vi or Word using the "Save as Plain Text (.txt)" option. It is also the only one of these options supported by the SDA Manager. To use the title, intro, and appendix file options, you must run the XCODEBK program in batch mode.

(The only virtue of the title, intro, and appendix file options is that they are carried over to the tagged files for Microsoft Word. However, it is usually easier to use Word directly for adding this material rather than using these options. Furthermore, by adding them directly to a Word file, you can use more file types.)


Title Page File (title=filename)

Codebooks have a title page at the beginning. The default title page contains the SDA study title, the number of cases, and information about weight or filter variables used in running the codebook. Codebooks formatted for HTML also include the date the codebook was created; in addition, the study title becomes the HTML title specification for the set of codebook files that are generated.

If you want a special title page, prepare a plain text file to use, and specify the name of the file to the program. Codebooks will use that file's contents as the title page.

If a title file is specified for an HTML codebook, the contents will be inserted after a '<pre>' tag, which turns off the usual HTML processing. Then after the contents have been inserted, a '</pre>' tag will be added, to turn HTML processing back on.


Introductory Material (intro=filename)

An introduction or various types of study-level information can be inserted into the codebook before the descriptions of variables. This material should be placed in one or more plain text files, spaced out just as you want it to appear. The names of these files are then specified to the codebook program as "intro" files. Up to 100 of these intro files may be specified.

Each introductory file should be given a heading. The heading serves two purposes: (1) the heading will be inserted in the table of contents (or the HTML index); and (2) the heading can (optionally) be centered at the top of the codebook page, before inserting the contents of the specified file; prefix the heading with `**' to insert the heading into the codebook itself.

An example of a set of study-level introductory documents is the following:

     intro = general(**GENERAL DESCRIPTION OF THE STUDY)
     intro = sampinfo( DESCRIPTION OF SAMPLING PROCEDURES)
     intro = convent1( CONVENTIONS USED IN THE DATA FILE AND CODEBOOK)

Note that the heading for the first file (minus the two asterisks) will appear BOTH in the table of contents AND in the codebook centered at the top of the page, preceded and followed by a divider, before the contents of the file named `general' (the filename can be a full pathname and need not be in the current directory). The headings for the other two files will appear only in the table of contents; presumably the files themselves contain the same or similar headings. If the heading for a file is omitted, the word "Introduction" will appear in the table of contents (or HTML index).
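
The `filename(heading)' form, the optional `**' prefix, and the default heading can be summarized in a short Python sketch. It is a reading aid under stated assumptions, not the program's own parser: the function name and the returned dictionary keys are hypothetical, and the sketch assumes that filenames contain no spaces or parentheses. It also accepts `appendix' lines, which use the same form with the default heading "Appendix".

```python
import re

DEFAULT_HEADINGS = {"intro": "Introduction", "appendix": "Appendix"}

def parse_text_file_spec(line):
    """Parse `keyword = filename(heading)' into a small dictionary."""
    keyword, _, rest = line.partition("=")
    keyword = keyword.strip().lower()
    # Filename, then an optional heading in parentheses.
    match = re.fullmatch(r"\s*([^()\s]+)\s*(?:\((.*)\))?\s*", rest)
    if keyword not in DEFAULT_HEADINGS or match is None:
        raise ValueError(f"not an intro/appendix line: {line!r}")
    filename, heading = match.group(1), match.group(2)
    in_body = False
    if heading is None or not heading.strip():
        # Omitted heading: a default word goes into the table of contents.
        heading = DEFAULT_HEADINGS[keyword]
    else:
        heading = heading.strip()
        if heading.startswith("**"):
            # `**' also places the heading in the body of the codebook.
            in_body = True
            heading = heading[2:].strip()
    return {"keyword": keyword, "file": filename,
            "heading": heading, "in_body": in_body}
```

For the first line of the example above, the sketch reports the file `general', the heading without its asterisks, and `in_body' set to true.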

If an introductory file is specified for an HTML codebook, the contents will be inserted after a '<pre>' tag, which turns off the usual HTML processing. Then after the contents have been inserted, a '</pre>' tag will be added, to turn HTML processing back on.

Each introductory file will begin on a new page and may extend over as many pages as you wish. For plain text codebooks, if the automatic page break comes at an inappropriate point in your file, you can force a skip to a new page by including a line containing only `%P' in the first two columns. If you want to force a skip to a new ODD-NUMBERED page, use `%OP'. If you want to force a skip to a new EVEN-NUMBERED page, use `%EP'. The first intro file will always begin on an odd-numbered page, unless one-sided printing has been requested. Placing `%OP' at the beginning of a subsequent intro file will ensure that the corresponding introductory section will also begin on an odd-numbered page. (These page commands are ignored for HTML and for tagged output.)


Appendices and Notes to the Codebook (appendix=filename)

Appendices, notes, or other additional material can be added to the codebook, after the descriptions of the variables, by specifying certain files as appendix files. This material should be placed in one or more plain text files, spaced out just as you want it to appear.

If an appendix file is specified for an HTML codebook, the contents will be inserted after a '<pre>' tag, which turns off the usual HTML processing. Then after the contents have been inserted, a '</pre>' tag will be added, to turn HTML processing back on.

For plain text codebooks, the `%P', `%OP', and `%EP' commands can be inserted in the files to force a new page, as for introductory files. The contents of each file will begin on a new page. The first appendix will always begin on an odd-numbered page, unless one-sided printing has been requested. Placing `%OP' at the beginning of a subsequent appendix file will ensure that the corresponding appendix will also begin on an odd-numbered page. (These page commands are ignored for HTML and for tagged output.)

The file names and headings are specified in the same way as for introductory files, as follows:

     appendix = filen1(Note 1: State and Country Codes)
     appendix = filen2(Note 2: Codes for Ethnic Groups)
     appendix = filen3(Note 3: Codes for Religious Groups)
     appendix = file1(**Appendix A: Description of Weighting Procedures)
     appendix = filex(**Appendix B: Outcome of Fieldwork)
     appendix = filesp(**Appendix C: Text for 'other specify' Responses)

In the above example, the six note and appendix files will be output after the descriptions of individual variables. The heading for each file will be listed in the table of contents after the list(s) of variables. In addition, the headings for the last three files will be centered, preceded and followed by a divider, and inserted into the codebook before the contents of each of the three appendix files. Each note or appendix will begin on a new page.

Up to 100 of these appendix or note files may be specified. If the heading for a file is omitted, the word "Appendix" will appear in the table of contents (or HTML index).


Additional Blocks of Text for the Codebook (cbtext=directory)

It is frequently useful to insert material into the codebook that either supplements the description of a particular variable or is independent of any specific variable but is relevant to a series of variables.

To use this feature, first create a subdirectory named `CBTEXT'. If the codebook is being generated from an SDA dataset, the CBTEXT directory should ordinarily be located at the same level in the SDA study directory as the VARS and STUDYINF subdirectories. If the codebook is being generated from a DDL file, the CBTEXT directory should ordinarily be a subdirectory of the current directory, where the codebook program will be run. If you want to put the CBTEXT directory in some other location, you must use the `CBTEXT=' keyword to indicate where to find the directory containing the CBTEXT directory.

After you have created the CBTEXT directory, write each block of supplementary text into a separate plain text file in that directory. If you want the contents of a file to be included WITHIN the description of a specific variable, the name of the file should be the SAME as the name of that variable. If, on the other hand, the text is to be included in the codebook BETWEEN the descriptions of two variables, the name of the file should be DIFFERENT from any variable name.
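
The naming rule can be checked before running the codebook with a sketch like the following, which sorts the files of a CBTEXT directory into those that belong WITHIN a variable description and those meant to go BETWEEN variables. The function name is hypothetical, and the sketch assumes an exact match between file name and variable name; this document does not say whether the comparison is case-sensitive.

```python
def split_cbtext_files(filenames, variable_names):
    """Separate CBTEXT file names into (within, between) lists.

    within:  files named after a variable (picked up by the CBTEXT keyword
             in a template).
    between: all other files (to be referenced with `[' in the variable list).
    """
    variables = set(variable_names)
    within = sorted(f for f in filenames if f in variables)
    between = sorted(f for f in filenames if f not in variables)
    return within, between
```

With files `eqopp', `exnote', and `xtext1' and a dataset that contains the variable `eqopp', only `eqopp' is classified as within-variable text.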

To place each block of text into the codebook, follow one of two procedures, depending on where you want the text to go:

WITHIN variable descriptions: To insert extra text WITHIN the description of a particular variable, put the keyword `CBTEXT' into the template used by that variable. The extra text (if any) will be placed into the variable description beginning at the location indicated in the template. (See the discussion below on 'Template Construction' for more information.) Not all variables need to have a file in the CBTEXT directory. But if there is such a file, and if the `CBTEXT' keyword is included in the template, the codebook program will retrieve the contents of the file and insert the text into the codebook at that point. Note that the default and built-in templates do not include the CBTEXT keyword. To use this option, you must create your own template and run the XCODEBK program in batch mode.

BETWEEN variable descriptions: To insert extra text into the codebook BETWEEN the descriptions of two variables, put the name of the file (the file that is located in the CBTEXT directory) into the variable list in the appropriate place, preceded by a left square bracket ([). For example, the following segment of a variable list would skip to a new page and then insert the contents of the file `xtext1' into the codebook after the variable description of var2 and before the description of var3:

     var1 var2
     [xtext1
     var3
     

The block of text will be preceded and followed by whatever the current `divider' is (usually a horizontal line extending the width of the page). If the filename in the varlist file immediately follows a heading indicated by two asterisks (which also forces a skip to a new page, plus the output of a divider), the text reference does not force yet another skip to a new page.

In older CSA versions of the codebook program, supplementary text WITHIN a variable description was placed in a directory named `TEXT2', and supplementary text BETWEEN variables was placed in a directory named `NOTES'. For purposes of compatibility with previous versions of the codebook program, XCODEBK will still look for files in those directories IF no directory named `CBTEXT' is found. One difference should be noted, however: The older codebook program put a divider after, but not before, the supplementary text taken from the NOTES directory. XCODEBK, on the other hand, puts a divider both before AND after such text when it is inserted into the body of the codebook.


Template File (template=filename)

There is a default layout for the description of each variable, for the header and footer on each page, and for the divider between variables. You can create your own codebook layout by putting one or more templates into a template file. For details see the 'Template Construction' section below.

Additional Hypertext Links File (hlink=filename)

If you are creating an HTML codebook, the codebook program automatically creates many hyperlinks that ease navigation between the various pages of the codebook. Besides these automatically generated hyperlinks, you can request the creation of additional hyperlinks that link a variable description in your codebook to any URL on the Web. You specify the variable name, label, and target URL of these additional links by creating an "hlink" file. For each variable for which a link is desired, add a line in the hlink file of the form: [varname] = [URL]. (The URL can be either fully-qualified or relative.) For example:

      myvar = mypage.html

By default, the text label for the hyperlink will be: "Link to additional information." (Alternatively, you can globally override this default by supplying a different label in the command file specification.) However, if a variable-specific link label is desired, just add the text in parentheses after the URL. For example:

      myvar = mypage.html (myvar item in questionnaire)

The position of the hyperlink, relative to other elements in the variable description, is determined by the location of the "HLINK" keyword in the template used with the variable. (For details on templates see the section on 'Template Construction' below.) Finally, to add these links, you must specify the location of the hlink specifications file in the command file, using the "HLINK" keyword. (For details, see the section on 'Special Options for HTML Output' below.)
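
The form of an hlink line, including the optional label and the default label text, can be illustrated with the Python sketch below. It is not the program's own parser: the function name is hypothetical, and the sketch assumes that the label is separated from the URL by at least one space, as in the example in this section.

```python
import re

DEFAULT_LINK_LABEL = "Link to additional information."

def parse_hlink_line(line, default_label=DEFAULT_LINK_LABEL):
    """Parse `varname = URL (label)'; the label part is optional."""
    varname, sep, rest = line.partition("=")
    if not sep or not varname.strip() or not rest.strip():
        raise ValueError(f"not an hlink line: {line!r}")
    url = rest.strip()
    label = default_label
    # A trailing parenthesized text, after whitespace, is taken as the label.
    match = re.fullmatch(r"(\S+)\s+\((.*)\)", url)
    if match:
        url, label = match.group(1), match.group(2).strip()
    return {"var": varname.strip(), "url": url, "label": label}
```

Passing a different `default_label' mimics the global override mentioned above; a label given on the line itself still takes precedence.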


BASIC CODEBOOK SPECIFICATIONS


CATEGORY CODES, FREQUENCIES, AND PERCENTAGES

If a template includes the `CATEGORIES' keyword, the codebook program will attempt to output category codes and labels for variables which use that template. (The default template includes this keyword.) Such variables can be either numeric variables or character variables. The 'CATEGORIES' keyword can be used with an optional "format" argument that controls which labeled categories are displayed when there are no cases for that category. See the following section on 'Template Construction' for more about this option. In addition, the following command file options are also applicable to the display of category codes and labels:

Maximum Number of Categories to Display (maxcats=n)
The program counts the number of separate categories in each variable that would be shown in the codebook and compares that number with this limit. If the number of separate categories (not just the range) would exceed the limit, the display of category codes and labels is suppressed for that variable.

For codebooks with input taken from an SDA study, the default limit is 40 categories (after applying any selection filters), which is the approximate number that will fit on one page. For codebooks with input taken from a DDL file, the default limit is whatever number of categories are given labels in the DDL file for a variable. The user can reset the limit to any number between 1 and 1000 categories per variable.
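
The phrase "number of separate categories (not just the range)" can be made concrete with a small Python sketch. The function name is hypothetical; the default of 40 and the permitted range of 1 to 1000 are taken from the text above and apply to input from an SDA study.

```python
def categories_displayed(values, maxcats=40):
    """Return True if category codes and labels would be shown.

    The count is of DISTINCT values, so a variable with the codes
    1, 5, and 1000 has three categories, not one thousand.
    """
    if not 1 <= maxcats <= 1000:
        raise ValueError("maxcats must be between 1 and 1000")
    return len(set(values)) <= maxcats
```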

Suppress Frequencies (pct=nonum)
If category codes and labels are being displayed, the frequency distribution is usually displayed as well. However, it is possible to request that no frequencies be displayed. This option is useful if you want to prepare a codebook that is applicable to data files other than the current one. (If input for the codebook is being taken from a DDL file, instead of from an SDA dataset, frequencies and percentages must necessarily be suppressed.)

Choice of Percentages (pct=include or both or nopct)
If frequencies are displayed, percentages are usually displayed as well. The default is to compute percentages based only on valid cases, EXCLUDING categories flagged as missing data or outside of the valid range. However, the user can specify that percentages be based on all the cases, INCLUDING missing data. It is also possible to request that BOTH types of percentages be displayed side by side, or that NEITHER type of percentage be displayed.
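
The difference between the two kinds of percentages is easiest to see in a worked example. The Python sketch below computes both from a frequency table; it is an illustration of the arithmetic only. The function name is hypothetical, codes outside the valid range are assumed to be passed in together with the missing-data codes, and showing no valid-case percentage for a missing-data category is likewise an assumption.

```python
def category_percentages(counts, missing_codes):
    """Return {code: (pct_of_valid, pct_of_all)} from a frequency table."""
    total_all = sum(counts.values())
    total_valid = sum(n for code, n in counts.items()
                      if code not in missing_codes)
    result = {}
    for code, n in counts.items():
        pct_all = 100.0 * n / total_all if total_all else None
        if code in missing_codes or not total_valid:
            pct_valid = None  # assumption: no valid percent for missing data
        else:
            pct_valid = 100.0 * n / total_valid
        result[code] = (pct_valid, pct_all)
    return result
```

With 30 cases coded 1, 50 coded 2, and 20 coded 9 (missing), category 1 is 37.5 percent of the 80 valid cases but 30 percent of all 100 cases.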

Documenting Blank or Invalid Fields (sysmdlabel=label)
When an SDA dataset is created, input fields (for numeric variables) that are completely blank or that contain invalid characters (which have not been defined as missing-data codes) are converted to the system missing-data value. If a particular variable has any cases with system missing-data, the codebook will include them in a category with the value `.' (a period).

The default label for the system missing-data category is `(No Data)'. You may provide your own label by putting it in the xcodebk command file after the keyword `sysmdlabel=' (e.g., `sysmdlabel= Does not apply').

Notice, however, that if blank fields or fields with invalid characters have been converted to some numeric value by using the `blanks=' or `other=' keyword in the DDL file when the SDA dataset was created, those cases will be reported in the codebook under the value into which the blanks or other characters were converted.



SUMMARY STATISTICS

If a template includes the `STATISTICS' keyword, the codebook program will generate a set of summary statistics for each numeric variable which uses that template, provided that codebook input is taken from an SDA study and not from a DDL file. (The default template does NOT include this keyword.)

The statistics produced are the following: minimum valid value (excluding missing-data codes), maximum valid value, mean, median, standard deviation, and variance. The mean, standard deviation, and variance are displayed with three decimal places by default. The user can specify between 1 and 6 decimal places by appending the desired number in parentheses to the `STATISTICS' keyword in the template. For example, `STATISTICS(4)' will generate statistics with four decimal places.
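
As a rough illustration of the six statistics and of the decimal-places setting, the following Python sketch uses the standard `statistics' module. It is not the program's algorithm: the function name is hypothetical, weights are ignored, and the use of the sample (n-1) formulas for the standard deviation and variance is an assumption, since this document does not state which formulas XCODEBK uses.

```python
import statistics

def summary_statistics(values, missing_codes=(), ndecimals=3):
    """Compute the six summary statistics over the valid values."""
    if not 1 <= ndecimals <= 6:
        raise ValueError("ndecimals must be between 1 and 6")
    # Missing-data codes are excluded before anything is computed.
    valid = [v for v in values if v not in missing_codes]
    fmt = f"{{:.{ndecimals}f}}"
    return {
        "min": min(valid),
        "max": max(valid),
        "mean": fmt.format(statistics.mean(valid)),
        "median": statistics.median(valid),
        # Assumption: sample (n-1) formulas; needs at least two valid values.
        "stddev": fmt.format(statistics.stdev(valid)),
        "variance": fmt.format(statistics.variance(valid)),
    }
```

Calling the function with `ndecimals=4' corresponds to writing `STATISTICS(4)' in a template.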


WEIGHTS AND SELECTION FILTERS

Weights (weight=varname)
If a weight variable is specified, all frequencies, percentages, and other statistics are based on the weighted number of cases in each category. (This option is ignored if codebook input is taken from a DDL file.)

Selection Filters (filter=filtervar(range))
The frequencies, percentages, and other statistics can be limited to a subset of the cases by specifying one or more filter variables. For each filter variable, a list of individual codes and ranges must be given. Only those cases which match the specified values of ALL of the filter variables will be included in the calculation of frequencies, percentages, and other statistics. (This option is ignored if codebook input is taken from a DDL file.)
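
The combined effect of a weight variable and selection filters can be sketched in Python as follows. The function names, the data layout (one dictionary per case), and the representation of a filter as a list of (low, high) ranges are all hypothetical; the logic shown is only that a case must match ALL filter variables and that each included case contributes its weight rather than a count of one.

```python
def passes_filters(case, filters):
    """True if the case matches ALL filter variables.

    filters maps a variable name to a list of (low, high) ranges;
    a single code c is written as the range (c, c).
    """
    return all(
        any(low <= case[var] <= high for low, high in ranges)
        for var, ranges in filters.items()
    )

def weighted_frequencies(cases, var, weight_var=None, filters=None):
    """Return {code: frequency}, weighted if weight_var is given."""
    freqs = {}
    for case in cases:
        if filters and not passes_filters(case, filters):
            continue
        weight = case[weight_var] if weight_var else 1
        freqs[case[var]] = freqs.get(case[var], 0) + weight
    return freqs
```

A filter on two variables, for example an age range together with a single gender code, keeps only the cases that satisfy both conditions.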



TEMPLATE CONSTRUCTION FOR VARIABLE DESCRIPTIONS (template=filename)

A description of a variable comprises several elements. Some elements such as labels for variables and categories have been previously stored in the SDA dataset. Other elements such as summary statistics are produced by processing each variable's data file. All of these elements are available for inclusion in the codebook, and they can be arranged on the codebook page in various ways. The codebook program has a default layout for the elements that describe a variable. In order to customize this layout, prepare a file containing one or more templates. A template shows the codebook program where on the page to put each element of a variable description. Examples of templates are given in a later section, but the elements or components are described here.

Elements of variable descriptions are specified after a line beginning with `*variable(name)', where `name' is the user-specified name given to the template. If only one template is defined for a codebook, it is not necessary to assign a name, and the form `*variable' is sufficient. However, to use more than one template in a codebook, the templates need to be named so that the variable list can specify which variables should use each template. As described above for the variable list file, all variables in that list that follow the line `@x' will use the template named `x' until another template is specified in that list. Note that the default template is always available for use, together with any user-defined templates.

If you are using the SDA Manager to create SDA datasets and generate codebooks, you should be aware that the SDA Manager does not allow users to upload their own template files. However, there are four built-in templates that users can include in a variable list for a dataset. If those templates are insufficient, you can always run XCODEBK in batch mode and supply your own templates.


LIST OF KEYWORDS FOR TEMPLATE FILES

In a template file each element or component to be included in the codebook is referred to by one of the following keywords (which may be given either in upper or lower case but may not be abbreviated). Not all keywords need to be used; if a keyword is omitted, the corresponding element will not appear in the codebook.

VARNAME or VAR

VARNAME is a line containing the variable name and the long label. If the long label is not desired, use VAR, which will retrieve only the variable name.

TEXT

The text segment that was stored when a variable was created. (Usually this is the text of a question.) If a variable was not created with a text segment, the corresponding space in the codebook is left blank.

CBTEXT

Everything in the CBTEXT file, if one exists for a variable. These files are discussed above under 'Additional Blocks of Text'. The keyword `TEXT2' is the equivalent of `CBTEXT', to maintain compatibility with older versions of the codebook program.

CATEGORIES or CATEGORIES(ALL) or CATEGORIES(CASES)

Category codes and labels; also percentages, if requested and input is from an SDA dataset. If percentages are requested, they are placed to the left of the category codes.

The `ALL' or `CASES' specification (in parentheses) is optional. It can be used to control which categories with labels are displayed if the codebook input is taken from an SDA dataset. (If input is from a DDL file, all defined category labels are displayed.)

CATEGORIESF or CATEGORIESF(ALL) or CATEGORIESF(CASES)

Same as CATEGORIES, except that the bracketed short labels, if any, are also output.

STATISTICS or STATISTICS(ndecimals)

Three lines with six summary statistics: minimum valid code, maximum valid code, mean, median, standard deviation, and variance. The mean, standard deviation, and variance are displayed with three decimal places by default; the form `STATISTICS(2)' would display those statistics with only two decimal places. The user may request between 1 and 6 decimal places. If the `categories' keyword is not used in the same template, the number of cases on which the statistics are based is also output on a separate line; the output includes the total N and also the valid N (if different from the total N). (If codebook input is taken from a DDL file, summary statistics cannot be produced, and this keyword is simply ignored.)

PROPERTIES

A set of lines (for numeric variables) which give information on missing data codes, codes specified as the valid minimum or maximum values, plus the data type (`numeric') and number of decimal places. For character variables, the only information provided by this element is the data type (`character') and the width of the character field. The keyword `PARAMETERS' is the equivalent of `PROPERTIES', to maintain compatibility with older versions of the codebook program.

SOURCE

A line telling where the variable came from in the original data file -- the record number and the column(s). The keyword `SOURCEF' is the equivalent of `SOURCE', to maintain compatibility with older versions of the codebook program.

GROUP

A set of lines that combines the information in the PROPERTIES and SOURCE elements. Those two tags can be used separately, but the GROUP tag provides a more compact format.

HLINK

A hypertext link to supplementary information, if an HLINK file was specified and an HTML codebook is being generated. See the discussion of 'Additional Hypertext Links' above.

DATE

A line with the date of creation of the SDA variable. (This keyword is ignored if codebook input is taken from a DDL file.)

TITLE

A line with the study title.
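
As an illustration of the `STATISTICS' element described above, the following Python sketch computes the six summary statistics and applies the `ndecimals' setting to the mean, standard deviation, and variance. It is not the XCODEBK implementation. The function name `summary_statistics' is hypothetical, and the use of the sample (n-1) formulas for variance and standard deviation is an assumption, since this document does not say which formula the program uses.

```python
import statistics

def summary_statistics(values, ndecimals=3):
    """Return the six summary statistics for a list of valid codes.

    Assumption: sample (n-1) variance and standard deviation are used;
    the documentation does not state which formula XCODEBK applies.
    """
    if not 1 <= ndecimals <= 6:
        raise ValueError("ndecimals must be between 1 and 6")
    fmt = "{:." + str(ndecimals) + "f}"
    return {
        "min": min(values),
        "max": max(values),
        # Only these three statistics are affected by ndecimals.
        "mean": fmt.format(statistics.mean(values)),
        "median": statistics.median(values),
        "std_dev": fmt.format(statistics.stdev(values)),
        "variance": fmt.format(statistics.variance(values)),
    }

# Equivalent in spirit to STATISTICS(2) for one variable.
valid_codes = [1, 2, 2, 3, 4, 4, 5]
print(summary_statistics(valid_codes, ndecimals=2))
```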

TEMPLATE EXAMPLES

Below are some examples of templates that illustrate the use of the various elements listed above. (The default template is shown in Example 5.)

It is important to note that a variable template layout specifies several things: 1) which elements will be displayed; 2) the order of these elements; 3) the indent of each element (relative to the margin); and 4) the number of blank lines between the elements.

For tagged output and for plain text codebooks ALL of these characteristics make a difference in how the codebook is displayed. For HTML codebooks, however, the indents and blank lines in a template are ignored. For HTML codebooks it only matters WHICH elements are specified in the template and IN WHAT ORDER.

1. Simple example

This example shows a template named `simple'. The template defines only the layout for the description of variables; it retains the default header, footer, and divider.

IF this template is used to generate tagged output or a plain text codebook file, note how the keywords are indented and how blank lines are placed after each keyword. In the resulting codebook the element corresponding to each keyword will be printed beginning in the same column in which the keyword in the template file begins (not counting the left margin). One or more blank lines placed after a keyword in the template file will generate the same number of blank lines after the corresponding element in the resulting codebook.

IF this template is used for HTML, the only information that counts is which keywords are mentioned and what order they are in.

*variable(simple)
VARNAME

        TEXT

              CATEGORIES

        SOURCE
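
The way a template's indents and blank lines carry over into tagged or plain text output, and are ignored for HTML, can be sketched in Python as follows. This is only an illustration of the rule described above, not the program's own code. The function names, the sample template lines, and the sample content are all hypothetical; the left margin of 6 spaces is the default given for plain text output.

```python
def parse_variable_template(lines):
    """Return (keyword, indent, blank_lines_after) for each keyword line."""
    elements = []
    for line in lines:
        if not line.strip():
            # A blank line is credited to the keyword that precedes it.
            if elements:
                keyword, indent, blanks = elements[-1]
                elements[-1] = (keyword, indent, blanks + 1)
            continue
        indent = len(line) - len(line.lstrip(" "))
        elements.append((line.strip(), indent, 0))
    return elements

def render(elements, content, margin=6, html=False):
    """Lay out one variable description according to the parsed template."""
    out = []
    for keyword, indent, blanks in elements:
        text = content.get(keyword, "")
        if html:
            # For HTML only the choice and order of elements matter.
            out.append(text)
        else:
            out.append(" " * (margin + indent) + text)
            out.extend([""] * blanks)
    return out

template_lines = ["VARNAME", "", "        TEXT", "", "        SOURCE"]
content = {  # hypothetical content for one variable
    "VARNAME": "AGE",
    "TEXT": "Age of respondent",
    "SOURCE": "Record 1, columns 10-11",
}
for line in render(parse_variable_template(template_lines), content):
    print(line)
```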


2. More complex example

This example shows a more complex template named `full'. The template for variable descriptions includes both the regular text for each variable and the extra textual information found in the supplementary CBTEXT subdirectory; it also includes all the other basic elements defined for each variable.

This template file also contains templates for the divider, the header, and the footer for a plain text codebook file. (These specifications will be ignored if the template is used for HTML or for tagged output.) The divider will be an unbroken series of `='s with a blank line on either side. The header for odd-numbered pages will have the study title on the left and the page number on the right. For even-numbered pages those two fields are reversed -- the page number is on the left, and the title is on the right. The footer will have the date centered in the middle of the line. (These are the same as the default headers and footers for a plain text codebook file.)


*variable(full)
VARNAME

        TEXT

        CBTEXT

        HLINK

              CATEGORIES

        PROPERTIES

        DATE
        TITLE
        SOURCE
*divider

========================================================================

*oheader
%t | | Page %p
*eheader
Page %p | | %t
*footer
| %d

3. Example with statistics

This example shows a template named `stats'. The `*variable' section is modified here by dropping the `CATEGORIES' keyword and adding `STATISTICS'. Variables with too many categories to display can often be summarized conveniently with the set of summary statistics, provided that the categories are ordered in approximately a linear manner. Note, however, that if codebook input is taken from a DDL file, the `STATISTICS' keyword (and the blank line after it in the template) are ignored.

Note also that the PROPERTIES and SOURCE keywords have been replaced by the single GROUP keyword.


*variable(stats)
VARNAME

        TEXT

        STATISTICS

        GROUP


4. Example with both categories and statistics

This example shows a template named `cstats', which includes both the `CATEGORIES' and the `STATISTICS' keywords. For some variables both kinds of information will be useful. And if a variable has too many categories to display, at least the summary statistics will be output. The `STATISTICS(4)' specification means that statistics will be output with four decimal places, instead of the default three decimal places. Note once again, however, that if codebook input is taken from a DDL file, the `STATISTICS' keyword (and the blank line after it in the template) are ignored.

The template named `stats' from the preceding example is also included here, to illustrate the point that more than one template definition can be included in the same template file.


*variable(cstats)
VARNAME

        TEXT

        CATEGORIES

        STATISTICS(4)

        GROUP
*variable(stats)
VARNAME

        TEXT

        STATISTICS

        GROUP

5. The default template

The default template is shown here, for reference purposes. Note that the length of the default divider (for a plain text codebook file) depends on the specified line length (the default length is 72 characters).

Each type of codebook output ignores any specifications that do not apply. HTML and tagged output ignore the header, footer, and divider template information. And the `HLINK' keyword is ignored for tagged and for plain text codebooks. Since each type of codebook output simply ignores specifications that do not apply, the same template can be used for more than one type of output.

*VARIABLE
VARNAME

     HLINK

     TEXT

     CATEGORIES

     GROUP
*DIVIDER
________________________________________________________________________

*HEADER
 %t | | Page %p
*EHEADER
 Page %p | | %t
*FOOTER
 | %d |
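
The rule that each type of output ignores the specifications that do not apply to it can be sketched in Python as follows. This is a hedged illustration, not the program's logic: the function names are hypothetical, the output types are reduced to the labels `html', `tagged', and `print', and the handling of the blank line after a dropped `HLINK' keyword is left open because this section does not describe it.

```python
import re

# Sections that matter only for plain text output, per the text above.
PLAIN_TEXT_ONLY = {"header", "oheader", "eheader", "footer", "divider"}

def split_sections(text):
    """Map each '*section' or '*section(name)' line to the lines after it."""
    sections, current = {}, None
    for line in text.splitlines():
        m = re.match(r"\*(\w+)(?:\((\w+)\))?\s*$", line)
        if m:
            current = (m.group(1).lower(), m.group(2))
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return sections

def effective(sections, output_type):
    """Keep only what applies to 'html', 'tagged', or 'print' output."""
    result = {}
    for (kind, name), body in sections.items():
        if kind in PLAIN_TEXT_ONLY and output_type != "print":
            continue  # header, footer, divider ignored for HTML and tagged
        if kind == "variable" and output_type != "html":
            # HLINK is ignored for tagged and plain text codebooks.
            # Assumption: only the keyword line itself is dropped here.
            body = [ln for ln in body if ln.strip().upper() != "HLINK"]
        result[(kind, name)] = body
    return result

sample = "*variable(demo)\nVARNAME\n\n     HLINK\n\n     TEXT\n*divider\n________\n*footer\n | %d |\n"
for output_type in ("html", "tagged", "print"):
    print(output_type, sorted(k[0] for k in effective(split_sections(sample), output_type)))
```

Because the filtering happens when the template is applied, the same template file can be passed unchanged to runs of all three types.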


VERSION 4 BUILT-IN TEMPLATES

The SDA Manager does not allow users to upload their own template files. However, there are four built-in templates that users can include in a variable list for a dataset. The four built-in templates are the following:
  1. categ -- Category Distribution (only)
    *variable(categ)
    VARNAME
    
    TEXT
    
         CATEGORIES
    
         GROUP
    
    
  2. cstats -- Category Distribution plus Summary Statistics
    *variable(cstats)
    VARNAME
    
    TEXT
    
         CATEGORIES
    
         STATISTICS
    
         GROUP
    
    
  3. stats -- Summary Statistics Only (no category distribution)
    *variable(stats)
    VARNAME
    
    TEXT
    
         STATISTICS
    
         GROUP
    
    
  4. simple -- Simple Variable Identification
    *variable(simple)
    VARNAME
    
    TEXT
    
         GROUP
    
    
If a variable list does not begin with a template specification, the first template ('categ') will apply to all the variables until some other template is specified. The 'categ' template provides information on the category distribution of each variable (up to 40 categories), including percentages, but no summary statistics.

Although the version 4 SDA Manager does not permit users to upload and apply their own templates, user templates can be created in a text editor, and the XCODEBK program can be run in batch mode. Then the codebook URL can be linked to the dataset in the 'Configure Datasets' tab of the SDA Manager.



AVAILABLE OUTPUT FORMATS


CREATING AN HTML CODEBOOK (type=html)

An HTML codebook consists of a set of HTML files with links between the files. In this section the various options are explained.

Indexes

The HTML version of a codebook includes more than one index, if a variable list has been provided. (If no variable list has been provided, only an alphabetical index can be generated.)

In an HTML codebook, the sequential index, the alphabetical index, and the group indexes contain hypertext links to the location of each variable description in the codebook files. The index of sequential headings, on the other hand, contains links back to the same headings in the sequential variable index. The sequential index and the alphabetical index are usually divided into multiple files, with links between the succeeding files.

Name of File with Definitions of Groups of Variables (groups=filename)

In addition to the sequentially defined groups of variables based on the varlist, it is possible to define other groups of variables. The variables in these groups need not be next to each other in the codebook, and variables can belong to multiple groups. These groups can also include other groups of variables as subgroups.

The variable groups file can contain an unlimited number of variable group definitions. Each definition is of the form:

id = [group identifier -- used in other group definitions]
type = [group type]
label = [group label -- used in codebook output]
vars = [list of variables in group -- "vars" can be repeated]
groups = [list of variable sub-groups -- "groups" can be repeated]
*
Note that group definitions in a file are separated by an asterisk (*) as the first non-blank character on a line. Blank lines and lines beginning with '#' are ignored.

An example would be as follows:

id = demo1
type = topical
label = Basic Background Variables
vars = age, education, gender, marital, income
*
id = demo2
type = topical
label = Additional Background Variables
vars = employed, occup, industry
*
id = demographics
type = topical
label = Background Variables
groups = demo1, demo2
*

In this example, note that the group "id" is used as a way to include a group as a subgroup in a more general group. The "type=" specification can be omitted. If it is used, it is intended to be a description of the type of grouping that is being done. (At present, nothing is done with that information.)
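
To make the file format concrete, here is a minimal Python sketch that reads group definitions of the form shown above and expands a group, including its subgroups, into a flat list of variables. It is an illustration only. The function names are hypothetical, and the assumption that variable and group names are separated by commas is taken from the example rather than from a formal syntax description.

```python
def parse_groups(text):
    """Parse group definitions separated by lines that start with '*'."""
    groups, current = {}, {}
    # The extra '*' closes a final definition that lacks its own separator.
    for raw in text.splitlines() + ["*"]:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comment lines are ignored
        if line.startswith("*"):
            if "id" in current:
                groups[current["id"]] = current
            current = {}
            continue
        key, _, value = line.partition("=")
        key, value = key.strip().lower(), value.strip()
        if key in ("vars", "groups"):
            # "vars" and "groups" may be repeated, so values accumulate.
            names = [v.strip() for v in value.split(",") if v.strip()]
            current.setdefault(key, []).extend(names)
        else:
            current[key] = value
    return groups

def expand(groups, group_id, seen=None):
    """List all variables in a group and its subgroups, without repeats."""
    seen = set() if seen is None else seen
    if group_id in seen:  # guard against groups that include each other
        return []
    seen.add(group_id)
    group = groups[group_id]
    names = list(group.get("vars", []))
    for sub in group.get("groups", []):
        names.extend(expand(groups, sub, seen))
    return list(dict.fromkeys(names))

example = """
id = demo1
label = Basic Background Variables
vars = age, education, gender, marital, income
*
id = demo2
label = Additional Background Variables
vars = employed, occup, industry
*
id = demographics
label = Background Variables
groups = demo1, demo2
*
"""
print(expand(parse_groups(example), "demographics"))
```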

Names of HTML Codebook Files (savefile=rootname)

A codebook in HTML format is divided into many small files, to facilitate quick reading by the browser. If the `savefile=' keyword is omitted, the names of the codebook files will begin with `hcbk', and they are created in the current directory (where you are running the XCODEBK program). The "home" HTML codebook page is named `hcbk.htm' and the XCODEBK program adds various suffixes and numbers to generate the other filenames.

It is possible to have XCODEBK create files with a prefix other than `hcbk' by specifying the desired prefix with the `savefile=' keyword. The specified prefix can have up to four characters; if a longer prefix is given, only the first 4 characters will be used. All files for the same codebook therefore have names that begin with `hcbk' or with the user-specified prefix, with two exceptions:

  1. The file `tree_items.js', which contains the JavaScript specifications for the variable tree for the SDA interface, does not have a prefix.
  2. The file 'hcbksub.txt', which contains the variable tree for the subset procedure, is always created, even if another version with a user-supplied stem name is also created.
In order to avoid name conflicts, you should only place one HTML codebook in a directory.

If you want to save the codebook files in a directory other than the current directory, specify the (relative or absolute) path for that directory, plus the desired prefix, after the `savefile=' keyword. That directory must already exist before you run the XCODEBK program. Also, you must specify the prefix, even if it is the same as the default `hcbk'. For example, if you specify `savefile=Codebook/hcbk', the subdirectory `Codebook' must already exist. The codebook files will then be placed in the subdirectory named `Codebook', and the files will have the root name `hcbk'. The files `tree_items.js' and 'hcbksub.txt' will also be placed in that subdirectory.
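
The naming rules for the `savefile=' specification can be summarized in a short Python sketch. It only restates the rules given above (default prefix `hcbk', at most four characters, prefix required when a directory is given); the function names are hypothetical, and the check that the directory already exists is left to the caller.

```python
import os

def resolve_savefile(savefile=None, max_prefix=4):
    """Split a savefile specification into (directory, prefix)."""
    if not savefile:
        return ".", "hcbk"  # default: current directory, prefix 'hcbk'
    directory, prefix = os.path.split(savefile)
    if not prefix:
        raise ValueError("a prefix is required after the directory path")
    # Only the first four characters of a longer prefix are used.
    return directory or ".", prefix[:max_prefix]

def home_page(savefile=None):
    """Name of the "home" HTML codebook page for a given specification."""
    directory, prefix = resolve_savefile(savefile)
    return os.path.join(directory, prefix + ".htm")

print(resolve_savefile("Codebook/hcbk"))
print(resolve_savefile("racecodes"))
```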

Size of Files (hsize=n)

HTML codebooks are divided into many small files, to facilitate rapid transfer to the browser of just the portion requested by the user. Each introductory file, appendix file, and index is stored as a separate file, referenced from the main codebook page.

The body of the HTML codebook, containing the descriptions of variables, is divided into separate files in the following manner: The maximum file size (default = 100,000 bytes) is divided in half, to produce a minimum file size. After a codebook file has reached the minimum size, the program will start a new file when it encounters a heading in the variable list (indicated by `*' or `**' in the varlist file). If the maximum file size is reached before a heading is encountered, the next variable description will start a new file anyway.

The maximum file size can be specified at the time the codebook is generated. If the codebook files are to be accessed mostly through slow modem links, the maximum might be set as low as 20,000 bytes. On the other hand, fast ethernet connections can easily support maximum file sizes of 200,000 bytes or more. The default size of 100,000 bytes is intended as a reasonable starting point for many applications. Notice that there is no particular reason to set the maximum file size to a very large number. Having many small files is generally better than having a few large files.
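
The splitting rule described in this section can be sketched in Python as follows. This is a simplified model, not the program's code: the function name and the item tuples are hypothetical, headings are treated as having no size of their own, and each variable description is given a known size in bytes.

```python
def split_into_files(items, max_size=100_000):
    """Group variable descriptions into files.

    items: sequence of (kind, name, nbytes), kind is 'heading' or 'variable'.
    """
    min_size = max_size // 2  # the maximum divided in half
    files, current, size = [], [], 0
    for kind, name, nbytes in items:
        at_heading = kind == "heading" and size >= min_size
        over_max = kind == "variable" and size >= max_size
        if current and (at_heading or over_max):
            # Start a new file at a heading once the minimum is reached,
            # or at the next variable once the maximum is reached.
            files.append(current)
            current, size = [], 0
        if kind == "variable":
            current.append(name)
            size += nbytes
    if current:
        files.append(current)
    return files

items = [
    ("variable", "a", 30), ("variable", "b", 30),
    ("heading", "Section 2", 0),
    ("variable", "c", 60), ("variable", "d", 50), ("variable", "e", 10),
]
print(split_into_files(items, max_size=100))
```

With a maximum of 100 bytes, the heading closes the first file after `a' and `b' (60 bytes, above the minimum of 50), and `e' starts a third file because the second has already passed the maximum.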


KEYWORDS FOR COMMAND FILES FOR HTML OUTPUT

The command file contains specifications for the codebook options. These specifications are given in the form "keyword = something" in upper or lower case, with one keyword per line. Keywords may be given in any order.

The valid keywords are as follows (with significant characters shown in capital letters):


TYPE OF CODEBOOK TO CREATE


Keyword       Possible Specification          Default (if no keyword)
_____________________________________________________________________

TYPE=          HTML [create HTML files]         (Required)


SOURCE OF THE DATA


Keyword       Possible Specification          Default (if no keyword)
_____________________________________________________________________


(Only one of the following two keywords can be used in the same run.)

STudy=         path(s) of dataset directories   Look for variables only
                (can be repeated)                 in current directory
                                                  (or in a DDL file)

DDL=           name of DDL file                 Look for an SDA dataset


VARIABLE LIST AND TEMPLATES


Keyword       Possible Specification          Default (if no keyword)
_____________________________________________________________________

VARlist=       filename of list of variables    All variables in
                                                  alphabetical order
                                                  (for an SDA study)
                                                  or in the order found
                                                  (in a DDL file)

GROUPsfile=    name of file with group names    No extra variable groups
                and variables in each group

TEmplate=      filename containing              Default template used
                 template(s)


OUTPUT LOCATIONS


Keyword       Possible Specification          Default (if no keyword)
_____________________________________________________________________

SAvefile=      directory/prefix for output      hcbk
                (see note for HTML filenames)    (in current directory)


Errorfile=     filename to receive messages     XCODEBK.MSG
                 about errors and warnings

SOURCE OF SUPPLEMENTARY, INTRODUCTORY, AND APPENDIX TEXT


Keyword       Possible Specification          Default (if no keyword)
_____________________________________________________________________

DOCLink=       filename(heading)                No supplementary
                 (can be repeated)               documents

INTro=         filename(heading)                No introduction
                 for intro material
                 (can be repeated)

Appendix=      filename(heading)                No appendix
                 for appendix
                 (can be repeated)

CBTEXT=        directory in which to find       SDA study directory
                 the /CBTEXT subdirectory         (for SDA input)
                 (for supplementary text)        or current directory
                                                  (for DDL input)

TItle=         filename of title page           Default title page

FILTER AND WEIGHT VARIABLES


Keyword       Possible Specification          Default (if no keyword)
_____________________________________________________________________

FILTer=        names and codes of filter        No selection filter
                 variables; for example:
                 filter=gender(1),age(18-50)
                 (can be repeated)
                 (ignored for DDL input)

Weight=        name of weight variable          No weighting for
                 (ignored for DDL input)          frequencies or stats

VARIABLE FOR A STRATIFIED CODEBOOK


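Keyword       Possible Specification          Default (if no keyword)
_____________________________________________________________________


(Only one of the following two keywords can be used in the same run.)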

ROWVAR=        name of stratifying variable     No row stratifier

COLUMNVAR=     name of stratifying variable     No column stratifier

(If 'all' is given after the name of the stratifying variable, then
all categories of the stratifying variable are shown.  Otherwise,
the categories with missing data are excluded.
For example: rowvar = year all)
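
As a sketch of how these two keywords might be interpreted, the following Python function enforces the rule that only one stratifier can be given and recognizes the optional `all' after the variable name. The function name and the returned dictionary layout are hypothetical.

```python
def parse_stratifier(commands):
    """Interpret 'rowvar' or 'columnvar' from a dict of command keywords."""
    given = [key for key in ("rowvar", "columnvar") if key in commands]
    if len(given) > 1:
        raise ValueError("only one of rowvar= and columnvar= can be used")
    if not given:
        return None  # no stratifier requested
    parts = commands[given[0]].split()
    return {
        "axis": "row" if given[0] == "rowvar" else "column",
        "variable": parts[0],
        # 'all' asks for every category, including missing-data ones.
        "all_categories": len(parts) > 1 and parts[1].lower() == "all",
    }

print(parse_stratifier({"rowvar": "year all"}))
print(parse_stratifier({"columnvar": "gender"}))
```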
 

PROCESSING INSTRUCTIONS


Keyword       Possible Specification          Default (if no keyword)
_____________________________________________________________________

LANGfile=      name of file with non-English    English labels on
                 labels and messages              output

MAXCat=        max number of categories         SDA input: 40 rows
                 to display                       or 20 columns
                                                DDL input:
                                                  all with labels

PCt=           EXCLUDE [missing data in pcts]   EXCLUDE [missing data]
                  INCLUDE [missing data]
                  BOTH [kinds of percents]
                  NOPCT [no percentages]
                  NONUM [no frequencies
                    or percentages]
                 (all ignored for DDL input)

SYSMDlabel=    Label for system missing-data    (No Data)
                 (ignored for DDL input)
                 (overrides the "SYSMIS_LABEL"
                  string in the Language File)
 

SPECIAL OPTIONS FOR HTML OUTPUT


Keyword       Possible Specification          Default (if no keyword)
_____________________________________________________________________

HLINK=         filename with external links

HSIZE=         maximum file size                100
                 (in 1000's of bytes)

HEADER=        name of file for header          No logo, text, or links
                 text or for logo                 put at TOP of title page
                                                  and main index pages

FOOTER=        name of file for footer          No logo, text, or links
                 text or for logo                 put at BOTTOM of title page
                                                  and main index pages

FRAMes=        YES (Create codebook             Codebook created without
                   using HTML frames)             HTML frames

Repetition of Keywords

The `intro=' and `appendix=' keywords can be repeated, up to a total of 100 times each; multiple study-level introductory and appendix sections are included in this way.

Multiple filter variables can be specified either by including more than one on the line after `filter=' or by using multiple `filter=' lines. (Filter specifications are ignored for DDL input.)

If a keyword that is not marked "(can be repeated)" in the tables above is repeated, an error message will result.

Comments

Anything on a line beginning with `#' is ignored by the command processor and can therefore be used for comments. Blank lines are also ignored.
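
The command file conventions described above (one "keyword = something" per line, upper or lower case, `#' comments, blank lines ignored, and only certain keywords repeatable) can be illustrated with a small Python reader. This is a sketch under stated assumptions, not XCODEBK's parser: the function name is hypothetical, keywords must be spelled out in full, and the set of repeatable keywords is taken from the entries marked "(can be repeated)" in the tables.

```python
# Keywords marked "(can be repeated)" in the keyword tables for HTML output.
REPEATABLE = {"study", "doclink", "intro", "appendix", "filter"}

def parse_command_file(text):
    """Read 'keyword = something' lines into a dictionary."""
    commands = {}
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # comments and blank lines are ignored
        if "=" not in line:
            raise ValueError(f"line {number}: expected 'keyword = something'")
        key, value = (part.strip() for part in line.split("=", 1))
        key = key.lower()  # keywords may be in upper or lower case
        if key in REPEATABLE:
            commands.setdefault(key, []).append(value)
        elif key in commands:
            raise ValueError(f"line {number}: keyword '{key}' is repeated")
        else:
            commands[key] = value
    return commands

example = """
type = html
# where the data are
study = natlrace
intro = general(General Introduction to the Study)
intro = sponsors(Organization and Funding of the Study)
"""
print(parse_command_file(example))
```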

EXAMPLES OF COMMAND FILES FOR HTML OUTPUT

1. Command file with multiple study documents


type = html

study = natlrace
varlist = rvarlist
savefile = race
template = template.cb

# Show codes as `.' that were blank in the original data file;
#   also label them "Blank - Does Not Apply" unless another
#   label is available for the relevant category.
blankconvert = Blank - Does Not Apply

title = titlefile

doclink = /mycomputer/proposal.pdf (Proposal for the Study)
doclink = /mycomputer/location.jpg (Photo of the Study Location)

intro = general(General Introduction to the Study)
intro = sponsors(Organization and Funding of the Study)

appendix = note1(Codes for States and Countries)
appendix = note2(Codes for Religious Denominations)

# The following headings go at the top of each appendix
#   file (because of the '**'), as well as in the index.
appendix = appA(**Sample Description)
appendix = appB(**Description of Weighting Procedures)
appendix = appC(**Outcome of Fieldwork)


2. Command file with weight and filter variables


type = html

study = natlrace
varlist = rvarlist
savefile = rcodebk.txt
template = template.cb


weight = casewt
filter = race(1-5) gender(1)


title = titlefile
intro = introfile

appendix = appA(Sample Description)
appendix = appB(Description of Weighting Procedures)
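
The `filter=' line in this example names each filter variable followed by its codes in parentheses. A hedged Python sketch of how such a specification could be read and applied to one case is shown below. The function names are hypothetical, and the sketch assumes that codes are non-negative integers given either singly or as a low-high range; the full filter syntax is not described in this section.

```python
import re

def parse_filter(spec):
    """Turn 'race(1-5) gender(1)' into {'race': [(1, 5)], 'gender': [(1, 1)]}."""
    filters = {}
    # Variables may be separated by spaces or commas; both forms appear above.
    for name, codes in re.findall(r"(\w+)\s*\(([^)]*)\)", spec):
        for part in codes.split(","):
            m = re.fullmatch(r"\s*(\d+)\s*(?:-\s*(\d+))?\s*", part)
            if not m:
                raise ValueError(f"cannot read codes {part!r} for {name}")
            low = int(m.group(1))
            high = int(m.group(2)) if m.group(2) else low
            filters.setdefault(name, []).append((low, high))
    return filters

def passes(case, filters):
    """A case is kept only if every filter variable has an accepted code."""
    return all(
        any(low <= case[name] <= high for low, high in ranges)
        for name, ranges in filters.items()
    )

filters = parse_filter("race(1-5) gender(1)")
print(filters)
print(passes({"race": 3, "gender": 1}, filters))
```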


3. Command file for an HTML codebook stratified by 'gender'


type = html

study = natlrace
varlist = rvarlist
savefile = rcodebk.txt
template = template.cb


columnvar = gender


title = titlefile
intro = introfile

appendix = appA(Sample Description)
appendix = appB(Description of Weighting Procedures)



CREATING A TAGGED OUTPUT FILE FOR WORD (type = tagged1side or tagged2side)

The XCODEBK program can generate a file with style tags that can be input into Microsoft Word. The Word file can then be formatted, viewed, or printed. It can also be converted to a PDF file, if the appropriate software is available.

In order to allow Word to read and format the tagged codebook file correctly, it is necessary to install a special macro on the PC that will be reading the tagged file. There is a separate document that describes how to install the macro and use it.


Formatting for One- or Two-Sided Printing

Since codebooks tend to be quite large, they will generally be printed or copied onto both sides of a page. However, the user can specify that either one-sided or two-sided formatting is desired.

There are two differences between one-sided and two-sided output. For two-sided output, each major section of the codebook will begin on an odd-numbered page -- the table of contents, the first introductory section, the first variable description, and the first appendix section. To ensure that this happens, blank pages will be added to the end of the preceding sections when necessary. For one-sided output, no extra blank pages are generated, and each major section can begin either on an odd-numbered or an even-numbered page.

The second difference is the header. For two-sided output, there are separate headers for odd-numbered and even-numbered pages, so that the page number will always appear on the outer part of a page, and the study title on the inner part of the page. For one-sided output, there is only one header, and the page number is at the right side of the header line.

For tagged output, the user cannot instruct XCODEBK to modify the contents of the header or footer (as is possible for plain text output). However, after the tagged file has been read into Word, the headers and footers can easily be changed, by using ordinary Word formatting commands.
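
The two differences between one-sided and two-sided formatting can be worked through in a short Python sketch. It is an illustration of the page arithmetic only; the function names and the section lengths are hypothetical.

```python
def start_pages(section_lengths, two_sided=True):
    """Return (start_page, blank_pages_added_before) per major section."""
    page, result = 1, []
    for length in section_lengths:
        # Two-sided output pads with a blank page so that each major
        # section begins on an odd-numbered page.
        pad = 1 if two_sided and page % 2 == 0 else 0
        page += pad
        result.append((page, pad))
        page += length
    return result

def page_number_side(page, two_sided=True):
    """Side of the header line on which the page number appears."""
    if two_sided and page % 2 == 0:
        return "left"   # outer edge of an even-numbered page
    return "right"      # odd-numbered pages, and all one-sided pages

# Hypothetical sections of 3, 4, and 10 pages.
print(start_pages([3, 4, 10], two_sided=True))
print(start_pages([3, 4, 10], two_sided=False))
```

In the two-sided case the second section would otherwise begin on page 4, so one blank page is added and it begins on page 5.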


KEYWORDS FOR COMMAND FILES FOR TAGGED OUTPUT

The command file contains specifications for the codebook options. These specifications are given in the form "keyword = something" in upper or lower case, with one keyword per line. Keywords may be given in any order.

The valid keywords are as follows (with significant characters shown in capital letters):


TYPE OF CODEBOOK TO CREATE


Keyword       Possible Specification          Default (if no keyword)
_____________________________________________________________________

TYPE=          TAGGED1SIDE [for MS Word,        (Required)
                 1-sided printing]
               TAGGED2SIDE [for MS Word,
                 2-sided printing]


SOURCE OF THE DATA


Keyword       Possible Specification          Default (if no keyword)
_____________________________________________________________________


(Only one of the following two keywords can be used in the same run.)

STudy=         path(s) of dataset directories   Look for variables only
                (can be repeated)                 in current directory
                                                  (or in a DDL file)

DDL=           name of DDL file                 Look for an SDA dataset

VARIABLE LIST AND TEMPLATES


Keyword       Possible Specification          Default (if no keyword)
_____________________________________________________________________

VARlist=       filename of list of variables    All variables in
                                                  alphabetical order
                                                  (for an SDA study)
                                                  or in the order found
                                                  (in a DDL file)

TEmplate=      filename containing              Default template used
                 template(s)


OUTPUT LOCATIONS


Keyword       Possible Specification          Default (if no keyword)
_____________________________________________________________________

SAvefile=      filename to receive output       CODEBOOK.TXT
                (see note for HTML filenames)

Errorfile=     filename to receive messages     XCODEBK.MSG
                 about errors and warnings

SOURCE OF INTRODUCTORY AND APPENDIX TEXT


Keyword       Possible Specification          Default (if no keyword)
_____________________________________________________________________

INTro=         filename(heading)                No introduction
                 for intro material
                 (can be repeated)

Appendix=      filename(heading)                No appendix
                 for appendix
                 (can be repeated)

CBTEXT=        directory in which to find       SDA study directory
                 the /CBTEXT subdirectory         (for SDA input)
                 (for supplementary text)        or current directory
                                                  (for DDL input)

TItle=         filename of title page           Default title page

FILTER AND WEIGHT VARIABLES


Keyword       Possible Specification          Default (if no keyword)
_____________________________________________________________________

FILTer=        names and codes of filter        No selection filter
                 variables; for example:
                 filter=gender(1),age(18-50)
                 (can be repeated)
                 (ignored for DDL input)

Weight=        name of weight variable          No weighting for
                 (ignored for DDL input)          frequencies or stats

PROCESSING INSTRUCTIONS


Keyword       Possible Specification          Default (if no keyword)
_____________________________________________________________________

LANGfile=      name of file with non-English    English labels on
                 labels and messages              output

MAXCat=        max number of categories         SDA input: 40 rows
                 to display                       or 20 columns
                                                DDL input:
                                                  all with labels

PCt=           EXCLUDE [missing data in pcts]   EXCLUDE [missing data]
                  INCLUDE [missing data]
                  BOTH [kinds of percents]
                  NOPCT [no percentages]
                  NONUM [no frequencies
                    or percentages]
                 (all ignored for DDL input)

SYSMDlabel=    Label for system missing-data    (No Data)
                 (ignored for DDL input)

Repetition of Keywords

The `intro=' and `appendix=' keywords can be repeated, up to a total of 100 times each; multiple study-level introductory and appendix sections are included in this way.

Multiple filter variables can be specified either by including more than one on the line after `filter=' or by using multiple `filter=' lines. (Filter specifications are ignored for DDL input.)

If a keyword that is not marked "(can be repeated)" in the tables above is repeated, an error message will result.

Comments

Anything on a line beginning with `#' is ignored by the command processor and can therefore be used for comments. Blank lines are also ignored.
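
The keyword tables above show the significant characters of each keyword in capital letters. One reading of that convention, sketched here in Python, is that the capitalized leading characters are the shortest accepted abbreviation and that any longer spelling up to the full keyword is also accepted. This interpretation is an assumption, and the function name `match_keyword' is hypothetical.

```python
from itertools import takewhile

# Keywords as printed in the tables for tagged output.
KEYWORDS = [
    "TYPE", "STudy", "DDL", "VARlist", "TEmplate", "SAvefile",
    "Errorfile", "INTro", "Appendix", "CBTEXT", "TItle", "FILTer",
    "Weight", "LANGfile", "MAXCat", "PCt", "SYSMDlabel",
]

def match_keyword(word):
    """Return the table entry that a typed keyword abbreviates, or None."""
    typed = word.strip().lower()
    for spec in KEYWORDS:
        # The leading capitals are taken as the required characters.
        required = "".join(takewhile(str.isupper, spec)).lower()
        if typed.startswith(required) and spec.lower().startswith(typed):
            return spec
    return None

for word in ("st", "VARLIST", "templ", "t"):
    print(word, "->", match_keyword(word))
```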


EXAMPLE OF A COMMAND FILE FOR TAGGED OUTPUT

Formatted for 2-sided printing


type = tagged2side

study = natlrace
varlist = rvarlist
savefile = rcodebk.txt
template = template.cb

title = titlefile

intro = general(General Introduction to the Study)
intro = sponsors(Organization and Funding of the Study)

# The following headings go in the body of the codebook
#   as well as in the index (because of the '**')
appendix = appA(**Sample Description)
appendix = appB(**Description of Weighting Procedures)
appendix = appC(**Outcome of Fieldwork)




CREATING A PLAIN TEXT CODEBOOK FILE FOR VIEWING OR PRINTING (type=print1side or print2side)



PAGE DIMENSIONS (for plain text output)

The following page dimension specifications apply only to plain text codebooks. They do not apply to HTML codebooks or to tagged output.

Left Margin or Indent (margin=xx)

The default is 6 spaces. If you plan to bind the codebook, you may want to increase the left margin.

Page Length (in lines) (pagelength=xx)

The default is 60 lines, which corresponds to paper 11 inches long, with 6 lines printed per inch, allowing 3 extra lines at the top and the bottom of the page. Some printers add these extra blank lines by themselves; if your printer does not add blank lines, you could increase the page length to 66 lines.
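
The arithmetic behind the default can be written out as a small Python function, which may help when choosing a value for other paper sizes or fonts. The function name and parameter names are hypothetical.

```python
def printable_lines(paper_inches=11, lines_per_inch=6, top_blank=3, bottom_blank=3):
    """Lines available for text on one page."""
    total = int(paper_inches * lines_per_inch)  # 11 x 6 = 66 lines
    return total - top_blank - bottom_blank     # 66 - 3 - 3 = 60 lines

print(printable_lines())                              # default page length
print(printable_lines(top_blank=0, bottom_blank=0))   # printer adds no blank lines
```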

If you want to print the codebook on longer or shorter paper, or to print it sideways, or to use a font with a different number of lines per inch, you will probably need to specify another number as the maximum page length. It may be necessary to experiment a bit in order to discover the proper number of lines per page to be output for a given printer and font.

Line Length (excluding the left margin) (linelength=xx)

The default is 72 characters. The valid range is 60-132 characters.

The line length is used to create headers and footers for each page. A centered heading, for instance, is centered within the space defined by the line length. In setting the line length, keep in mind how long the lines are for the text of each variable and for category labels. The program checks the length of each printed line; if a line is longer than the defined line length, a warning message to that effect is placed in the `XCODEBK.MSG' file, but the program will continue processing the codebook.


Formatting for One- or Two-Sided Printing (type=print1side or print2side)

Since codebooks tend to be quite large, they will generally be printed or copied onto both sides of a page. However, the user can specify that either one-sided or two-sided formatting is desired.

There are two differences between one-sided and two-sided output. For two-sided output, each major section of the codebook will begin on an odd-numbered page -- the table of contents, the first introductory section, the first variable description, and the first appendix section. To ensure that this happens, blank pages will be added to the end of the preceding sections when necessary. For one-sided output, no extra blank pages are generated, and each major section can begin either on an odd-numbered or an even-numbered page.

The second difference is the header. For one-sided output, there is only one page header, and the page number is put on the right side of the page. For two-sided output, there are separate headers for odd-numbered and even-numbered pages, so that the page number will always appear on the outer part of the page, and the study title on the inner part of the page.
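
The two rules described above can be summarized in a few lines of Python. This is an illustrative sketch only, not XCODEBK's implementation; the function names are hypothetical, and the left/right placement for two-sided output assumes that odd-numbered pages are right-hand pages.

```python
def blank_pages_needed(last_page_of_previous_section, two_sided):
    """Blank pages to append so the next major section starts on an odd page."""
    if not two_sided:
        return 0                      # one-sided: never pad
    next_page = last_page_of_previous_section + 1
    return 0 if next_page % 2 == 1 else 1


def page_number_side(page, two_sided):
    """Side of the header on which the page number appears."""
    if not two_sided or page % 2 == 1:
        return "right"                # one-sided output, or an odd page
    return "left"                     # even page in two-sided output


# A section ending on page 5 needs one blank page in two-sided output.
print(blank_pages_needed(5, two_sided=True))
print(page_number_side(6, two_sided=True))
```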


HEADER, FOOTER, AND DIVIDER TEMPLATES (for plain text output)

The codebook program outputs a header line at the top of each page, a footer line at the bottom, and a divider string in between variable descriptions on the same page. If you want to modify the default header, footer, or divider, put the revised definitions in the template file before or after any of the templates for variable descriptions. (This can only be done for plain text output. Headers, footers, and dividers for tagged output cannot be modified until the file has been imported into Word.)

Since the templates of the header, footer, and divider remain constant throughout the entire codebook, only one of each of those templates can be specified. Note that it is possible to use the default template for variable descriptions and only include templates for the header and/or footer and/or divider in the template file. By the same token, it is possible to use the default templates for header, footer, and divider and only include one or more templates for variable descriptions in the template file.

Examples of the use of header, footer, and divider templates are given above in the section 'Template Examples'.

Header and Footer

The default header for odd-numbered pages has the study title beginning at the left margin and the page number right-justified at the right margin. The default header for even-numbered pages has these two fields reversed -- that is, the page number is on the left and the study title is on the right. If one-sided printing has been selected, the header is the same for all pages and is the same as the header for odd-numbered pages. The default footer for plain text output has the date of printing centered in the line.

In order to replace the standard header, put the line you want to appear at the top of each page into the template file after a line beginning with `*header'. Similarly, put your footer line in the template file after a line beginning with `*footer'. In creating your own header and footer you can use the following codes, which the codebook program will replace with the appropriate content:

     %d  current date (month, day, and year)
     %p  page number
     %t  title of the study
     

To center or right-justify a field within a header or footer line, separate the parts of the line with the "pipe" character (|). For example, the following header would put the date on the left, center the study title, and put the page number over at the right margin:

     *header
     %d | %t | Page %p
     

If nothing is to be centered, leave that segment blank. Since a codebook will usually be printed or copied onto both sides of a page, it is often desirable to have one header (or footer) for odd-numbered pages and another for even-numbered pages. (For two-sided printer format, there are separate default headers for odd-numbered pages and even-numbered pages.) Note the following keywords:

     *oheader    Header for ODD-numbered pages
     *eheader    Header for EVEN-numbered pages
     *header     Header for pages not otherwise specified

     *ofooter    Footer for ODD-numbered pages
     *efooter    Footer for EVEN-numbered pages
     *footer     Footer for pages not otherwise specified
     

The header or footer template is given on a separate line after one of the above keywords. See the template example for the default template.
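
To make the template rules concrete, here is a rough Python sketch of how a header or footer line could be selected and laid out. It is an approximation for illustration, not the program's actual code. The function names, the date string, and the study title are hypothetical, and treating a template without pipes as a left-justified line is an assumption.

```python
import re


def choose_template(page, templates):
    """Use *oheader or *eheader if given for this page, else *header."""
    specific = "oheader" if page % 2 == 1 else "eheader"
    # None means no template was supplied, so the program default applies.
    return templates.get(specific, templates.get("header"))


def render_line(template, linelength, date, page, title):
    """Expand %d, %p, %t and lay out the left | center | right segments."""
    values = {"d": date, "p": str(page), "t": title}
    # Split first, so a '|' inside the title cannot create extra segments.
    segments = [s.strip() for s in template.split("|")]
    segments = [re.sub(r"%([dpt])", lambda m: values[m.group(1)], s)
                for s in segments]
    # Assumption: missing segments are treated as blank.
    left, center, right = (segments + ["", ""])[:3]
    line = [" "] * linelength

    def place(text, start):
        for offset, ch in enumerate(text):
            if 0 <= start + offset < linelength:
                line[start + offset] = ch

    place(center, (linelength - len(center)) // 2)
    place(left, 0)
    place(right, linelength - len(right))
    return "".join(line).rstrip()


# Hypothetical templates with a blank center segment.
templates = {"oheader": "%t || Page %p", "eheader": "Page %p || %t"}
for page in (1, 2):
    print(render_line(choose_template(page, templates), 60,
                      "Jan 1, 2000", page, "My Study"))
```

In this sketch the page number lands on the right for page 1 and on the left for page 2, which mirrors the default two-sided headers described above.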

Divider

The program prints a divider between variable descriptions, if there is more than one on a page. The divider is also used before and after headings placed into the codebook because they were preceded by two asterisks (**) either in the variable list or in the specification of headings for intro and appendix files. It also separates extra text (taken from the CBTEXT directory) from the variable description which follows (for the type of extra text that goes between, rather than within, variable descriptions).

The default divider is a solid underscore that begins after the left indent and continues for 70 columns (or whatever the maximum line length is set to); it is followed by a blank line. If you want a different divider between variables, put the line(s) you want to serve as the divider into the template file after a line beginning with `*divider'. A maximum of five lines can be specified. Blank lines are significant here -- a blank line appearing in the divider template will generate blank lines in the codebook. See example 2 below.


KEYWORDS FOR COMMAND FILES FOR PLAIN TEXT OUTPUT

The command file contains specifications for the codebook options. These specifications are given in the form "keyword = something" in upper or lower case, with one keyword per line.

In general, keywords may be given in any order, except that a varlist specification is assumed to refer only to the preceding specification of the SDA study or DDL file. (See the section below on `Study/DDL and Varlist Repetition' for a discussion of this issue.)

The valid keywords are as follows (with significant characters shown in capital letters):


TYPE OF CODEBOOK TO CREATE


Keyword       Possible Specification          Default (if no keyword)
_____________________________________________________________________

TYPE=          PRINT1SIDE [Plain text for       print2side
                 1-sided printing]
               PRINT2SIDE [Plain text for
                 2-sided printing]


SOURCE OF THE DATA


Keyword       Possible Specification          Default (if no keyword)
_____________________________________________________________________


(Only one of the following two keywords can be used in the same run.)

STudy=         path(s) of dataset directories   Look for variables only
                (can be repeated)                 in current directory
                                                  (or in a DDL file)

DDL=           name of DDL file                 Look for an SDA dataset

VARIABLE LIST AND TEMPLATES


Keyword       Possible Specification          Default (if no keyword)
_____________________________________________________________________

VARlist=       filename of list of variables    All variables in
                                                  alphabetical order
                                                  (for an SDA study)
                                                  or in the order found
                                                  (in a DDL file)

TEmplate=      filename containing              Default template used
                 template(s)


OUTPUT LOCATIONS


Keyword       Possible Specification          Default (if no keyword)
_____________________________________________________________________

SAvefile=      filename to receive output       `CODEBOOK.TXT' if
                (see note for HTML filenames)     printer format;
                                                 `hcbk' if HTML


Errorfile=     filename to receive messages     XCODEBK.MSG
                 about errors and warnings

SOURCE OF INTRODUCTORY AND APPENDIX TEXT


Keyword       Possible Specification          Default (if no keyword)
_____________________________________________________________________

INTro=         filename(heading)                No introduction
                 for intro material
                 (can be repeated)

Appendix=      filename(heading)                No appendix
                 for appendix
                 (can be repeated)

CBTEXT=        directory in which to find       SDA study directory
                 the /CBTEXT subdirectory         (for SDA input)
                 (for supplementary text)        or current directory
                                                  (for DDL input)

TItle=         filename of title page           Default title page

FILTER AND WEIGHT VARIABLES


Keyword       Possible Specification          Default (if no keyword)
_____________________________________________________________________

FILTer=        names and codes of filter        No selection filter
                 variables; for example:
                 filter=gender(1),age(18-50)
                 (can be repeated)
                 (ignored for DDL input)

Weight=        name of weight variable          No weighting for
                 (ignored for DDL input)          frequencies or stats

PROCESSING INSTRUCTIONS


Keyword       Possible Specification          Default (if no keyword)
_____________________________________________________________________

LANGfile=      name of file with non-English    English labels on
                 labels and messages              output

MAXCat=        max number of categories         SDA input: 40 rows
                 to display                       or 20 columns
                                                DDL input:
                                                  all with labels

PCt=           EXCLUDE [missing data in pcts]   EXCLUDE [missing data]
               INCLUDE [missing data]
               BOTH [kinds of percents]
               NOPCT [no percentages]
               NONUM [no frequencies
                 or percentages]
               (all ignored for DDL input)

SYSMDlabel=    Label for system missing-data    (No Data)
                 (ignored for DDL input)

FORMATTING FOR PLAIN TEXT FILE


Keyword       Possible Specification          Default (if no keyword)
_____________________________________________________________________

LINelength=    number of characters in a line         72
                 (not counting left margin)

MARgin=        number of spaces for                    6
                 left margin

PAgelength=    maximum page length (lines)            60

Study/DDL and Varlist Repetition

Please note: For HTML and for tagged codebooks for Word, multiple input datasets (or multiple DDL files) are NOT currently allowed; each dataset must be documented individually.

For codebooks in plain text format, up to five datasets (or DDL files) can be specified for inclusion in a single codebook. However, if weight or filter variables are used for the codebook, all of the datasets must have the same number of cases. (Weight and filter specifications are ignored if DDL files are specified instead of SDA study datasets.)

If multiple input datasets (or DDL files) are specified, the `study=' (or `ddl=' or `idl=') and `varlist=' keywords may each be repeated. The order in which they appear is important: the specification of the varlist for a dataset (or DDL file) must FOLLOW the specification of that dataset (or DDL file) and precede the specification of any other dataset (or DDL file).

For example, in the following command file the varlist for the "faculty" dataset will be "flist" and the varlist for the "students" dataset will be "slist". The name for each dataset is the pathname of the dataset directory, and the name for each list is the pathname of the file containing the list of variables. In the form given in this example, both studies would have to be subdirectories of the current directory, and both lists would have to be located in the current directory:

study = faculty
varlist = flist
study = students
varlist = slist

If an SDA study is specified without a matching varlist, all variables in the study will be included in the codebook, in alphabetical order. If a DDL file is specified without a matching varlist, all variables in the DDL file will be included in the codebook, in the order in which they are found in the DDL file.

If a varlist is specified without a preceding `study=' or `ddl=' or `idl=' specification, the list is assumed to apply to an SDA dataset located in the user's current directory.
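
The ordering rule can be pictured as follows. This Python sketch pairs each `varlist=` with the dataset specification that precedes it; it is an illustration under stated assumptions, not XCODEBK's code. The function name `pair_sources` is hypothetical, "." stands for the current directory, and rejecting a second varlist for the same dataset is an assumption about behavior that this document does not spell out.

```python
MAX_SOURCES = 5   # documented limit for plain text codebooks


def pair_sources(commands):
    """Pair each varlist with the preceding study/ddl/idl specification.

    `commands` is a list of (keyword, value) tuples in command-file order.
    A varlist of None means that all variables are included.
    """
    sources = []
    for keyword, value in commands:
        key = keyword.lower()
        if key in ("study", "ddl", "idl"):
            sources.append({"kind": key, "path": value, "varlist": None})
        elif key == "varlist":
            if not sources:
                # No preceding dataset: SDA dataset in the current directory.
                sources.append({"kind": "study", "path": ".", "varlist": value})
            elif sources[-1]["varlist"] is None:
                sources[-1]["varlist"] = value
            else:
                # Assumption: two varlists for one dataset are an error.
                raise ValueError("varlist does not follow a new dataset")
    if len(sources) > MAX_SOURCES:
        raise ValueError("at most five datasets or DDL files are allowed")
    return sources


for source in pair_sources([("study", "faculty"), ("varlist", "flist"),
                            ("study", "students"), ("varlist", "slist")]):
    print(source)
```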

Repetition of Keywords

The `intro=' and `appendix=' keywords can be repeated, up to a total of 100 times each; multiple study-level introductory and appendix sections are included in this way.

Repetition of the `study=', `ddl=', `idl=' and `varlist=' keywords is also allowed, as described above; however, `study=', `ddl=', and `idl=' specifications cannot be mixed for the same codebook.

Multiple filter variables can be specified either by including more than one on the line after `filter=' or by using multiple `filter=' lines. (Filter specifications are ignored for DDL input.)

If other keywords are repeated, an error message will result.

Comments

Anything on a line beginning with `#' is ignored by the command processor and can therefore be used for comments. Blank lines are also ignored.
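
A command file can be checked against these rules before it is submitted. The following Python sketch reads "keyword = something" lines, skips comments and blank lines, and applies the repetition rules given above. It is a simplified illustration with a hypothetical function name: it accepts only fully spelled keywords (no abbreviations), and ignoring leading whitespace before `#' is an assumption. The `-n' flag of XCODEBK remains the authoritative syntax check.

```python
REPEATABLE = {"study", "ddl", "idl", "varlist", "intro", "appendix", "filter"}


def parse_command_file(text):
    """Return a list of (keyword, value) pairs, or raise ValueError."""
    commands = []
    seen = set()
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                      # blank line or comment
        if "=" not in line:
            raise ValueError(f"line {lineno}: expected 'keyword = something'")
        keyword, value = (part.strip() for part in line.split("=", 1))
        key = keyword.lower()             # keywords may be upper or lower case
        if key in seen and key not in REPEATABLE:
            raise ValueError(f"line {lineno}: '{key}' cannot be repeated")
        seen.add(key)
        commands.append((key, value))
    # study=, ddl=, and idl= cannot be mixed in the same codebook.
    if len(seen & {"study", "ddl", "idl"}) > 1:
        raise ValueError("study=, ddl=, and idl= cannot be mixed")
    # intro= and appendix= may each appear at most 100 times.
    for name in ("intro", "appendix"):
        if sum(1 for key, _ in commands if key == name) > 100:
            raise ValueError(f"'{name}' may be repeated at most 100 times")
    return commands


# Hypothetical command file contents.
example = """\
# one-sided plain text codebook
type = print1side

study = natlrace
margin = 8
"""
for key, value in parse_command_file(example):
    print(key, "=", value)
```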


EXAMPLES OF COMMAND FILES FOR PLAIN TEXT OUTPUT

1. One-sided format, with an intro file and two appendix files


type = print1side

study = natlrace
varlist = rvarlist
savefile = rcodebk.txt
template = template.cb
margin = 8

title = titlefile
intro = introfile

# Both appendix headings will go into the table of contents
# The second heading will also go into the body of the codebook
#  as a heading for that appendix.
appendix = appA(Sample Description)
appendix = appB(**Description of Weighting Procedures)
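
The `filename(heading)' form used by `intro=' and `appendix=' in the example above can be taken apart as shown in this Python sketch. The function name is hypothetical, and the sketch assumes that filenames contain no spaces or parentheses. It returns the filename, the heading (or None if no heading was given), and a flag showing whether the leading two asterisks asked for the heading to be placed in the body of the codebook as well as in the table of contents.

```python
import re


def parse_section_spec(spec):
    """Split 'filename(heading)' into (filename, heading, in_body)."""
    match = re.fullmatch(r"\s*([^()\s]+)\s*(?:\((.*)\))?\s*", spec)
    if match is None:
        raise ValueError(f"cannot parse specification: {spec!r}")
    filename, heading = match.group(1), match.group(2)
    in_body = False
    if heading is not None:
        heading = heading.strip()
        if heading.startswith("**"):
            # Two asterisks: heading also goes into the codebook body.
            in_body = True
            heading = heading[2:].strip()
    return filename, heading, in_body


for spec in ("introfile",
             "appA(Sample Description)",
             "appB(**Description of Weighting Procedures)"):
    print(parse_section_spec(spec))
```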


2. Same as Example #1, except that input is from a DDL file

type = print1side

DDL = nrace.ddl

varlist = rvarlist
savefile = rcodebk.txt

# Note that any template references to `statistics' or
#  `date' (date of creation of SDA variable) will be ignored.
template = template.cb

margin = 8
title = titlefile

intro = introfile

appendix = appA(Sample Description)
appendix = appB(**Description of Weighting Procedures)



SEE ALSO

Word macro             Installing Word macro for codebooks
xcodebk formatting     Summary of formatting instructions
xcodebk keywords       Summary of keywords for command files
internationalization   Modifying the SDA user interface language files


CSM, UC Berkeley/ISA
July 6, 2021