SDA 3.5 Documentation for QEXTRACT


NAME

qextract - Extract item definitions from CASES 4.3 instruments

USAGE

qextract -b command_file_name

DESCRIPTION

QEXTRACT extracts item definitions and logical flow information from CASES version 4.3 Q-language instrument and layout files; it then writes the parsed information to a file in the Instrument Documentation Language (IDL).

Commands for QEXTRACT are placed in a batch command file, the name of which is supplied to the program after the ‘-b’ flag. Most commands are optional, but some commands must be used if the CASES instrument was translated in certain ways. Examples of command files are given below. Simplified instructions are also provided in the summary document on Instrument Documentation Procedures.

OVERVIEW

QEXTRACT reads the CASES Q-language files and the layout file produced by the CASES ‘layout’ program, and it produces an IDL file. The IDL file can then be processed by the XCODEBK program, to create an Instrument Document (IDOC).

A file of commands must be prepared for the QEXTRACT program; the name of this file is given on the command line after the ‘-b’ flag. This document explains how to prepare such a file.

Every time QEXTRACT is run, a message is appended to the file named ‘QEXTRACT.MSG’. If warnings or error messages are generated by the program, they are put in that file, and a message to that effect appears on the user’s screen.

QEXTRACT obtains the information about each variable or item from a CASES Q-language instrument designed for computer-assisted interviewing, data entry, or coding. It is important to understand that some of the key elements used in IDL to describe an item (such as a descriptive label and the designation of certain codes as invalid or missing data) are not necessarily included in a Q-language instrument. QEXTRACT will use whatever information it finds in the Q-language files, but some specifications may have to be added.

These additional specifications for items can either be added to the IDL file produced by QEXTRACT, or they can be included in the Q-language instrument itself in a way that will not affect the execution of the CASES programs. The advantage of including these additional specifications in the Q-language instrument itself is that it consolidates in one place all of the relevant specifications for an item; instructions for doing this are given below under the heading Optional Element Tags in Q-language Instruments.

DESCRIBING THE CASES INSTRUMENT FILES

The descriptive information about each item comes from the CASES Q-language instrument files. Those files must be available to QEXTRACT. Furthermore, QEXTRACT must know if certain options were used in translating the instrument with the CASES ‘qt’ program.

Macros Used in Instruments (macros= yes)

All instrument files to be processed usually have names ending with ‘.q’. However, if the instrument was translated with the ‘-m’ option for the CASES ‘qt’ program, all the macro-expanded instrument files will have names ending with ‘.m’. If such is the case, the ‘macros=yes’ option must be specified for QEXTRACT; otherwise the program will only look for the unexpanded ‘.q’ files.

No Case Distinction in Item Names (nocase= yes)

If the instrument was translated with the ‘case insensitive’ option specified in the CASES ‘STUDYDEF’ file, there is no distinction between upper and lower case item names. For example, a command to ‘goto itemx’ means the same as ‘goto ITEMX’. The QEXTRACT program needs to know whether or not the instrument was designed to operate in case-insensitive mode, in order to figure out the proper logical paths to and from each item. If the instrument was translated with that CASES option, the ‘nocase=yes’ option must also be specified for QEXTRACT.
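
As a rough illustration of why this setting matters for tracing paths, the following Python sketch resolves a ‘goto’ target against a list of item names, with and without case folding. The function name ‘resolve_target’ and the item list are hypothetical; this is not QEXTRACT's own code.

```python
def resolve_target(target, item_names, nocase=False):
    """Return the item name that a 'goto' target refers to, or None.

    Hypothetical helper: with nocase=True, names are compared without
    regard to case, mirroring the 'case insensitive' STUDYDEF option.
    """
    if nocase:
        folded = {name.lower(): name for name in item_names}
        return folded.get(target.lower())
    return target if target in item_names else None


items = ["ITEMX", "itemy"]
print(resolve_target("itemx", items))               # None: case is significant
print(resolve_target("itemx", items, nocase=True))  # ITEMX
```

Without case folding, ‘goto itemx’ finds no matching item, so the path to ‘ITEMX’ would be missed.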

Location of the Instrument Files (cdir= directory)

If the Q-language instrument files are not in the directory in which QEXTRACT is being run, the ‘CDir=’ option must be specified. (See the keywords for command files below.)

LAYOUT FILE TO PREPARE (Layout= filename)

The layout file produced by the CASES ‘layout’ program contains information on the location and type of each item in the instrument, and also on the logical path from one item to the next. It also contains the list of the Q-language files.

Note that the layout file MUST be generated by running the CASES ‘layout’ program using the ‘-qx’ flag, in order to be used as input to QEXTRACT. Also, be sure to redirect the output of the CASES ‘layout’ program to a file. The name of this file must be supplied to the QEXTRACT program (unless you use the default name ‘LAYOUT’). For example:

layout -qx > LAYOUT

If you redirect the output of the CASES ‘layout’ program to a file named anything other than ‘LAYOUT’, use the QEXTRACT ‘layout=’ command to tell the program the name of the file. (See the keywords for command files below.)
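
The redirection step can also be scripted. The sketch below is a minimal Python wrapper, assuming the CASES ‘layout’ program is installed and on the search path. The function name ‘write_layout’ is hypothetical; only the ‘-qx’ flag comes from this document.

```python
import subprocess


def write_layout(outfile="LAYOUT", command=("layout", "-qx")):
    """Run the layout command and redirect its standard output to outfile.

    Assumes the CASES 'layout' program is installed and on the PATH.
    Raises CalledProcessError if the command exits with a nonzero status.
    """
    with open(outfile, "w") as out:
        subprocess.run(list(command), stdout=out, check=True)
    return outfile
```

Calling write_layout("qxlayout") would correspond to ‘layout -qx > qxlayout’, in which case ‘layout = qxlayout’ must also be given in the QEXTRACT command file.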

FILES PRODUCED BY QEXTRACT

IDL File (Output= filename)

The main output of the program is the IDL file, which contains the information necessary to document the instrument. The default name for this output file is ‘IDOC.IDL’, but another name can be specified as an option using the ‘output=’ specification. (See the keywords for command files below.) If the named file already exists, it will be overwritten.

Diagnostic Messages

Diagnostic and error messages are saved in a file named ‘QEXTRACT.MSG’. That file should always be viewed after running QEXTRACT. Note that diagnostic messages are appended to that file, so it can contain the record of many runs. The file can be deleted at any time.
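
Because the message file grows with every run, it can be convenient to look only at what the latest run added. The following Python sketch records the file size before a run and returns the text appended afterwards. The helper names are hypothetical, the command file name in the comment is only an example, and the invocation assumes ‘qextract’ is on the search path.

```python
import os


def message_size(path="QEXTRACT.MSG"):
    """Return the current size of the message file in bytes (0 if absent)."""
    return os.path.getsize(path) if os.path.exists(path) else 0


def new_messages(offset, path="QEXTRACT.MSG"):
    """Return the text appended to the message file after byte offset."""
    if not os.path.exists(path):
        return ""
    with open(path, "rb") as f:
        f.seek(offset)
        # latin-1 decodes any byte sequence; the real encoding is an assumption.
        return f.read().decode("latin-1")


# Possible use around a run (fish.cmd is a made-up command file name):
#   before = message_size()
#   subprocess.run(["qextract", "-b", "fish.cmd"])
#   print(new_messages(before))
```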

Inventory File

QEXTRACT will save a list or inventory of all the items processed during the QEXTRACT job. This list is written to the file ‘QEXTRACT.IN1’ with one item name per line, in the order that the items are found in the Q-language file(s).

List of Orphan Items

QEXTRACT figures out the path to each item in the instrument. Items with no direct path to them are listed in the file ‘QEXTRACT.ORF’. For each item, the Q-language file in which it is defined is also given.

Some items are intended to be reached only through a deliberate skip on the part of the interviewer, such as a skip to an item that sets up a callback. Those items will appear in the list of orphan items, but are not instrument problems.

In the current version of QEXTRACT some items that are reached only via references on the same form or screen will also appear (improperly) on the list of orphan items, even though they are not instrument errors.

As a result, this list can contain a certain number of "false positive" results. Nevertheless, the list should be checked carefully, since items with no path to them can cause serious instrument problems.

OPTIONAL ELEMENT TAGS IN Q-LANGUAGE INSTRUMENTS

Q-language instruments do not usually include all of the possible elements available in IDL files to describe an item. As a result, IDL files that are produced by QEXTRACT may have to be edited, in order to provide more complete documentation for the instrument. An alternative to editing IDL files is to insert the additional specifications or element tags into the Q-language file itself in a way that will not affect the execution of the CASES programs, by using the comment mechanism.

The tags for a specific item or variable may be placed either in the template area corresponding to that item (the part of the template BEFORE the field marker ‘@’) or in the part of the post-template area corresponding to that item (the part of the post-template AFTER the field marker ‘[@’ for that item).

If elements applicable to single items are specified in the pre-template area of a multi-item form or screen, those elements will apply to ALL of the items in the same form, unless overridden by another specification of that element for a particular item.

In the list that follows, a distinction is made between (1) element tags that are relevant both for documenting instruments and for defining data files for current statistical software and (2) element tags currently relevant only for documenting instruments.

1. Element tags for both instrument and data file documentation

[##label= Label for this item]
One-line label for the content of an item; overrides the default behavior of QEXTRACT to construct a label out of the first line or two of item text, if any

[##md= Missing-data codes or ranges]
Missing-data codes, in addition to those defined in the instrument

[##min= Minimum valid code]
Minimum value to consider as a valid code, for data analysis purposes

[##max= Maximum valid code]
Maximum value to consider as a valid code, for data analysis purposes

[##blank= Number]
Number into which an all-blank field should be converted

[##other= Number]
Number into which a non-numeric field should be converted

[##type= Variable type]
Variable type (numeric or character); overrides information in the LAYOUT file

[##decimals= Decimal places]
Number of decimal places (for numeric variables); overrides information in the LAYOUT file

[##dname= Dataset name]
Name to give to this item for data analysis purposes (when generating definitions for SAS or SPSS)

[##[Short label ] ]
Short label for a code value -- up to 16 chars (this bracketed label can be located either in the template or in the post-template, after the relevant code value in ‘<>’ and before the next code value; this label is in addition to any plain text for a category found in the item template.)

2. Element tags relevant only for documenting instruments

[##universelabel= Description of item universe]
Description of how you get to this item in the instrument

[##flowlabel= Description of forward flow]
Description of where you go next in the instrument

[##analysisunit= Unit of analysis]
Description of who or what the data from this item applies to

[##responseunit= Response source]
Description of who is answering this question

[##keywords= 1st phrase; 2nd phrase; ...]
Keywords that will be used for creating a keyword index of items; separate the individual key words or phrases by semicolons

[##formlabel= Label for this form]
One-line label for this multi-item form

[##filelabel= Label for this file]
One-line label for this Q-file or instrument module (should be placed near the beginning of the Q-file).

[##sectionlabel= Label for the current section]
One-line label for this section or group of items (should be placed near the beginning of the section).

[##rosterlabel(xyz) = Label for roster ‘xyz’]
One-line label for this roster (can be placed anywhere in the instrument).

[##cyclelabel= Label for this roster cycle]
One-line label for this cycle through a roster (should be placed soon after the ‘[roster begin]’ command).

Element names/tags can be in upper or lower case, and they can be abbreviated down to the first three letters (except for the short label for a code value, which just has brackets); the equal sign is optional. For instance, [##label=myvariable] can be abbreviated as [##lab myvariable]. If descriptions extend over more than one line, repeat the element tag at the beginning of each subsequent line.
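
The naming rules above (upper or lower case, abbreviation down to three letters, optional equal sign) can be sketched in Python as follows. This is only an illustration of the rules as described here, not QEXTRACT's own parser; the names ‘TAGS’, ‘resolve_tag’ and ‘find_tags’ are hypothetical, and the bracket-only short label for a code value is deliberately not handled.

```python
import re

# Tag names listed in this document (the bracket-only short label is excluded).
TAGS = [
    "label", "md", "min", "max", "blank", "other", "type", "decimals",
    "dname", "universelabel", "flowlabel", "analysisunit", "responseunit",
    "keywords", "formlabel", "filelabel", "sectionlabel", "rosterlabel",
    "cyclelabel",
]

# [##name(optional-arg) = value]; the equal sign is optional.
TAG_RE = re.compile(r"\[##\s*([A-Za-z]+)\s*(?:\(([^)]*)\))?\s*=?\s*(.*?)\s*\]")


def resolve_tag(token):
    """Map a full or abbreviated tag name (3+ letters) to its full name."""
    token = token.lower()
    if token in TAGS:
        return token
    if len(token) < 3:
        return None
    matches = [t for t in TAGS if t.startswith(token)]
    return matches[0] if len(matches) == 1 else None


def find_tags(text):
    """Return (tag, argument, value) tuples for recognized tags in text."""
    found = []
    for name, arg, value in TAG_RE.findall(text):
        tag = resolve_tag(name)
        if tag is not None:
            found.append((tag, arg or None, value))
    return found


print(find_tags("[##lab myvariable] [##MD= 8,9]"))
# [('label', None, 'myvariable'), ('md', None, '8,9')]
```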

NOTE ON VARIABLE TYPES

There are three major types of variables that can be specified in a CASES instrument: integer, float, and character. QEXTRACT converts both the integer and the float types in CASES into the numeric type in IDL. If there are implied decimal places in a number, that information is contained in the layout file. For example, if the layout file specifies that a variable is of type float, and has a width of 4.2, QEXTRACT will translate that into an IDL specification for a variable of numeric type, with a width of 4, and with 2 implied decimal places.
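
The translation described above can be sketched in Python as follows. The function name ‘to_idl_type’ is hypothetical, and the width is assumed to be written as ‘W’ or ‘W.D’ following the 4.2 example; the actual syntax of the layout file may differ. The ‘no data’ type is not handled.

```python
def to_idl_type(cases_type, width_spec):
    """Translate a CASES type and width into (idl_type, width, decimals).

    width_spec is written as 'W' or 'W.D', following the 4.2 example;
    the real layout file syntax may differ.
    """
    width_text, _, decimals_text = str(width_spec).partition(".")
    width = int(width_text)
    if cases_type.lower() == "character":
        return ("character", width, 0)
    # Both integer and float become numeric in IDL.
    decimals = int(decimals_text) if decimals_text else 0
    return ("numeric", width, decimals)


print(to_idl_type("float", "4.2"))    # ('numeric', 4, 2)
print(to_idl_type("integer", "3"))    # ('numeric', 3, 0)
```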

Character variables in CASES are also specified as character type in IDL. Note, however, that you may not really want a variable defined by CASES as a character variable to be treated as such for purposes of analysis. CASES will consider an item to be of character type if it has any non-numeric precodes that are not designated as ‘missing’, even if those non-numeric precodes were never actually used. If such items are really intended to be interpreted as numeric variables for purposes of analysis, the type of that item can be changed to numeric by including ‘[##type=numeric]’ in the Q-language instrument for that item, before running QEXTRACT.

QEXTRACT does not currently attempt to create category labels for character variables; for numeric variables it generates labels for the numeric precodes, but it ignores any non-numeric precodes.

The ‘no data’ type of variable in CASES is used primarily for informational screens. Such variables do not have data or data locations. However, they are of interest when generating an IDOC. QEXTRACT will treat such items as input items but will not assign them any record and column specifications in the IDL file.

LIMITS

The maximum number of items that can be processed at one time (listed in the LAYOUT file) depends on available memory. On PCs, the processing of more than a few thousand items may result in excessive swapping of memory to disk. In such cases, it is helpful to close other open applications, to preserve memory for running QEXTRACT. If the LAYOUT file is still too large, it may be necessary to run the job on a computer with more memory.

KEYWORDS FOR COMMAND FILES

The command file contains the specifications for the run. Only the title is required; all other specifications are optional. These specifications are given in the form "keyword = something." Keywords may be given in any order, in upper or lower case, one to a line. The valid keywords are as follows, with significant characters shown in capital letters:

Keyword       Possible Specification          Default (if no keyword)
---------------------------------------------------------------------
Title=        Title of the study              REQUIRED

Layout=       Name of layout file             Use ‘LAYOUT’

MACros=       Yes                             Use ‘.q’ instrument
               (assumes that the instrument     files
                was translated with the
                ‘-m’ option, to produce
                files with ‘.m’ suffix)

NOCase=       Yes                             Upper/lower case in item
               (assumes that the instrument     names is significant
                was translated with
                ‘case insensitive’ specified
                in the STUDYDEF file)

Output=       Name of file into which         IDOC.IDL
                the IDL will be written

CDir=         Name of directory containing    Current directory
                the CASES instrument files
                (either .q or .m files,
                whichever are being used,
                depending on the ‘macros=’
                option.)

Abbreviations

Keywords can be abbreviated down to the number of characters required to differentiate them from other keywords. Sometimes only one character is required. The keyword for the layout file, for instance, can be given as "layout=" or "lay=" or even "l=". Either upper or lower case may be used.

Comments

Anything on a line beginning with ‘#’ is ignored by the command processor and can therefore be used for comments. Blank lines are also ignored.
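
The keyword, abbreviation, and comment rules can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not QEXTRACT's command processor. The function names are hypothetical, the minimum abbreviation lengths are read off the capital letters in the table above, and the values ‘no’ and ‘.’ are merely stand-ins for "option not given" and "current directory".

```python
# Full keyword -> minimum number of significant characters (from the table).
KEYWORDS = {"title": 1, "layout": 1, "macros": 3, "nocase": 3,
            "output": 1, "cdir": 2}

# Stand-in defaults; 'no' and '.' are assumptions made for this sketch.
DEFAULTS = {"layout": "LAYOUT", "macros": "no", "nocase": "no",
            "output": "IDOC.IDL", "cdir": "."}


def resolve_keyword(token):
    """Return the full keyword for a full or abbreviated token, else None."""
    token = token.strip().lower()
    for keyword, minimum in KEYWORDS.items():
        if len(token) >= minimum and keyword.startswith(token):
            return keyword
    return None


def parse_command_file(text):
    """Parse command-file text into a dict of settings with defaults applied."""
    settings = dict(DEFAULTS)
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comments are ignored
        name, sep, value = stripped.partition("=")
        keyword = resolve_keyword(name)
        if not sep or keyword is None:
            raise ValueError("unrecognized line: " + line)
        settings[keyword] = value.strip()
    if "title" not in settings:
        raise ValueError("Title= is required")
    return settings
```

With this sketch, a line such as ‘l = qxlayout’ sets the layout file name, while a line such as ‘c = somedir’ is rejected because ‘CDir’ needs at least two characters.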

EXAMPLES OF COMMAND FILES

1. Basic commands, using mostly defaults


Title = Fish and Hunt Survey
output = fish.idl

2. Using all the optional keywords


Title = Survey of Program Participation
output = sipp.idl
layout = qxlayout
macros = yes
nocase = yes
cdir = c:\sipp\e-inst


CSM, UC Berkeley
April 12, 2011