SDA 4.1 Documentation for SREPORT BATCH


NAME

sreport batch - run SDA report program in batch mode

USAGE

sreport -b filename [-n]

DESCRIPTION

SREPORT is a general-purpose program to generate summaries of one or more variables. These variables come from a plain text data file and a matching DDL file. The program runs interactively unless the `-b' option is used. The filename given after `-b' refers to a file containing the commands which specify the variables to be summarized and the options and layout to use.

Since the construction of a report layout or template can be somewhat difficult to do from scratch, it is usually a good idea to run SREPORT in interactive mode first. The interactive processor will use many default options and will produce a file containing the commands and the template for the report. The default report will be adequate for many purposes. If, however, you need to refine the report layout, it will be relatively simple to edit the file produced by the interactive processor. The revised file can then be executed by SREPORT in either interactive or batch mode.

In order to check the syntax of the batch file (but not execute the commands), use the `-n' option:

sreport -b filename -n

CONTENTS OF A COMMAND FILE

A command file includes COMMANDS and FUNCTIONS.


COMMANDS

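A command file for SREPORT can include several types of commands and definitions, most of which are optional:

     1. Specification of general report characteristics
     2. Specification of variables (required for BREAKVAR type)
     3. Column specifications (required)
     4. Template for the layout of each row (required)
     5. Template for the rows of totals (optional, BREAKVAR type only)
     6. Templates for header and footer (optional)
     7. Commands for interactive reports (optional)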

DESCRIPTION OF EACH TYPE OF COMMAND

1. SPECIFICATION OF GENERAL REPORT CHARACTERISTICS

These specifications are given in the form "keyword = something" with one keyword per line. Keywords may be given in any order but in general may not be repeated (except as noted after the list of keywords). The valid keywords are as follows (with significant characters shown in capital letters):

Keyword       Possible Specification          Default (if no keyword)
_____________________________________________________________________

(One of the next pair must be given, to indicate the type of report.)

BREAKvar=     Name of break variable          REQUIRED for breakvar type
ROWvars=      Name(s) of row variables        REQUIRED for rowvars type

DATA=         Name(s) of datafile(s)          REQUIRED
DDL=          Name of DDL file                REQUIRED

Weight=       Name of weight variable         No weighting
Filter=       Name(s) and codes of filter     No filters
                variable(s)

SAvefile=     Filename to receive output      Output sent to screen
                                                (standard output)

Margin=       Number of spaces for            0
                left margin
PAgelength=   Number of lines on a page       66
                (0 = continuous output)
NMAX=         Maximum number of cases to read Read all cases
                 (used for testing)

The only keywords that can be repeated are `rowvars=', `data=', and `filter='. When repeated, the specifications given after the equal sign are appended to the previously given specifications. Multiple specifications, however, can also be given after a single keyword by separating the file names or variable names with blanks or commas. If there are too many specifications to fit on one line, the keyword must be repeated.
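
The append rule for repeated keywords can be illustrated with a short Python sketch. This is not part of SREPORT; the function name `collect_keywords` is hypothetical, only full keyword names are recognized (abbreviations are ignored), and `filter=' is left out because its code lists need more careful parsing.

```python
import re

# Keywords whose repeated specifications are appended and split into names.
# `filter=' is also repeatable, but its code lists are not handled here.
SPLIT_AND_APPEND = {"rowvars", "data"}


def collect_keywords(lines):
    """Gather `keyword = something' lines into a dictionary (illustrative only)."""
    settings = {}
    for line in lines:
        keyword, _, spec = line.partition("=")
        keyword = keyword.strip().lower()
        spec = spec.strip()
        if keyword in SPLIT_AND_APPEND:
            # Names may be separated by blanks or commas; repeats are appended.
            names = [name for name in re.split(r"[,\s]+", spec) if name]
            settings.setdefault(keyword, []).extend(names)
        elif keyword in settings:
            raise ValueError(f"keyword `{keyword}' may not be repeated")
        else:
            settings[keyword] = spec
    return settings


example = collect_keywords([
    "rowvars = age, educ",
    "rowvars = income spend",
    "data = testdat",
    "ddl = testddl",
])
print(example["rowvars"])  # ['age', 'educ', 'income', 'spend']
```

The two `rowvars=' lines end up as one list of four names, which is the same result as giving all four names after a single keyword.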

2. SPECIFICATION OF VARIABLES (Required for BREAKVAR type)

The BREAKVAR type of report requires that each variable and function of a variable be defined. These definitions are of the following form:
     &a = age
     &b = education
     &c = mean(&a)
     &d = stddev(&a)
     &e = mean(&b)

Each of the definitions uses syntax of the form `&[character] = something', where any SINGLE upper- or lower-case character can be used after the `&'. In order to use a function of a variable, that variable must have been previously defined; notice in the example above that the mean of `age' is expressed as `mean(&a)'. Many functions can be used. A complete list is given below in the section labeled "Function Definitions."
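
The "define before use" rule can be sketched in Python as a simple check over the definition lines. This is an illustration only, not SREPORT's own parser; the function name `check_definitions` is hypothetical, and the sketch assumes that the character after `&' is a letter.

```python
import re


def check_definitions(lines):
    """Return the set of defined symbols, or raise if one is used too early."""
    defined = set()
    for line in lines:
        target, _, expression = line.partition("=")
        target = target.strip()
        if not re.fullmatch(r"&[A-Za-z]", target):
            raise ValueError(f"not a single-character definition: {target!r}")
        # Single-character tokens are variable references; longer tokens
        # (such as &prompt) are left alone.
        tokens = re.findall(r"&([A-Za-z]+)", expression)
        for token in tokens:
            if len(token) == 1 and token not in defined:
                raise ValueError(f"&{token} is used before it is defined")
        defined.add(target[1])
    return defined


symbols = check_definitions([
    "&a = age",
    "&b = education",
    "&c = mean(&a)",
    "&d = stddev(&a)",
    "&e = mean(&b)",
])
print(sorted(symbols))  # ['a', 'b', 'c', 'd', 'e']
```

Because the comparison is case sensitive, a line such as `&c = mean(&A)' would be rejected when only `&a' has been defined.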

The ROWVARS type of report does not require this series of commands, since the variables are specified with the `rowvars=' keyword as indicated above. However, it is possible to specify the desired statistics for the row variables in the form `&a=MEAN' and then use the `&a' as the definition of a column, as described in the next section. Although this two-step specification of statistics is unnecessarily complex for constructing a template from scratch, the interactive processor generates templates for ROWVARS reports in this manner, because it is easier to use the same syntax for both types of reports.

3. COLUMN SPECIFICATIONS (Required)

The results of the variable and function definitions are placed into specific columns of the report by listing the definitions in the desired order after a line beginning with `*COLUMNS' (which can be abbreviated to `*COL' or `*col'). For the specification of columns, only the order is significant -- not the spacing or the number of lines. Using the variable and function definitions given in the example above, a column specification for a BREAKVAR type of report could be the following:
     *COLUMNS
     &cat  &clabel  &c  &d  &e

The meaning of each column is as follows:
     &cat       The category code of the break variable
     &clabel    The label of the category code
     &c         mean(&a) - the mean of the variable &a (age)
     &d         stddev(&a) - the standard deviation of &a
     &e         mean(&b) - the mean of &b (education)

Note in particular the special meaning of `&cat' and `&clabel' (these can be in upper or lower case, and `clabel' can be abbreviated to `clab'). The meanings of the other symbols are defined within the command file itself.

The column specification for a ROWVARS type of report is simpler. A typical example would be:
     *COLUMNS
     &name  &label  MEAN  STDDEV  NVALID

The meaning of each column is as follows:
     &name      The name of each row variable
     &label     The long label of each row variable
     MEAN       The mean of each row variable
     STDDEV     The standard deviation of each row variable
     NVALID     The number of valid cases for each row variable

Once again, note the special meaning of `&name' and `&label'. The other column specifications are simply the names of functions; any of the arithmetic functions listed in the summary below can be used. (All of these specifications and function names can be in upper or lower case, and `label' can be abbreviated to `lab'.)

4. TEMPLATE FOR THE LAYOUT OF EACH ROW (Required)

The template for each row of the report is given on one or more lines following a line beginning with `*ROWS' (which can be abbreviated to `*ROW' or `*row'). The purpose of the row template is to indicate how wide each field should be, where it should be located on the page, and how numeric and character fields should be formatted. A row template that would match the BREAKVAR column specification above might be:
     *ROWS
     XXX  LLLLLLLLLL      X,XXX.X     X,XXX.XX    XX,XXX.X

This template is a "picture" of how each row of the report will be written. The first defined column (&cat) is a number and will be placed in the first three positions in the row. The second defined column (&clabel) is a string of characters; it will be left justified in the 10-position area designated by `LLLLLLLLLL'. The third and fifth columns (means of two variables) will be numbers printed with one decimal place; commas will also be inserted if the number exceeds 999.9. The fourth column (standard deviation) will be printed with two decimal places. The formats that can be used in row specifications are given below.

The maximum width of a field is defined by the width of the format string. If a number is too large to fit into a field, an error message will be generated, and the program will stop. If a string of characters such as a variable name or label is longer than the length of the format string, the string will be truncated in the report, and a warning message will be displayed.

Since a template can include literal text as well as formats, each format must be clearly recognizable as such. That is to say, each format must only contain consecutive and identical capital letters (such as CCC, LLL, RRRR, or BB); the XX numeric format, however, can include a few other characters as shown below. A format must be separated by blanks from other text in the template. Text format strings must be at least two characters long.

Numeric formats


(All can include decimal points; numbers are rounded to fit.)

   XXXXXX      Whole number, no decimal places
   XXX.XX      Number of decimal places as specified
   XX,XXX      Comma(s) inserted as needed
               (The comma can appear anywhere in the format)
   $X,XXX      Dollar sign and comma(s)
   XXXXX%      Percent sign put at the end of the number
   (XXXX)      Parentheses put around the number
               (The number part of the format can be any of the above)


Text formats


   CCCCCC      Centered text
   LLLLLL      Left-justified text
   RRRRRR      Right-justified text


Format to suppress a field


   BBBBBB      Blank out the field (numeric or text)
               (used to blank out a field in a row or a total)
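
As a rough illustration of how a numeric format "picture" maps to output, the following Python sketch renders a number into a field of the same width as the format string. It is not SREPORT's implementation. The function name `render_numeric` is hypothetical, and the sketch assumes that numbers are right-justified in the field, that a dollar sign sits directly before the digits, and that a comma in the format means standard thousands grouping.

```python
def render_numeric(fmt, value):
    """Render `value' into a field shaped like `fmt', e.g. 'X,XXX.X'."""
    width = len(fmt)
    core, prefix, suffix = fmt, "", ""
    if core.startswith("(") and core.endswith(")"):
        prefix, suffix, core = "(", ")", core[1:-1]
    if core.startswith("$"):
        prefix += "$"
        core = core[1:]
    if core.endswith("%"):
        suffix = "%" + suffix
        core = core[:-1]
    # The X's after the decimal point give the number of decimal places.
    decimals = len(core.split(".", 1)[1]) if "." in core else 0
    grouping = "," if "," in core else ""
    text = prefix + format(value, f"{grouping}.{decimals}f") + suffix
    if len(text) > width:
        # The documentation says an oversized number is an error.
        raise ValueError(f"{text} does not fit the format {fmt}")
    return text.rjust(width)  # right-justification is an assumption


print(repr(render_numeric("X,XXX.X", 1234.56)))  # '1,234.6'
print(repr(render_numeric("X,XXX.X", 42.0)))     # '   42.0'
print(repr(render_numeric("(XXXX)", 12)))        # '  (12)'
```

Passing a value that needs more positions than the format provides, such as 1234 with the format `XXX', raises an error in this sketch, mirroring the rule that an oversized number stops the program.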

5. TEMPLATE FOR THE ROWS OF TOTALS (Optional for BREAKVAR type only)

A report may or may not have rows of totals. These totals are the same statistics as for the other rows, except that they are calculated on all the valid cases of the break variable. By default the totals are not printed. To get totals, include a `*TOTAL' command and a template. The template for the totals is given on one or more lines following a line beginning with `*TOTAL' (which can be abbreviated to `*TOT' or `*tot'). Such a template usually looks very much like the template for the other rows, except that category values and labels are not relevant. A template for totals that would match the BREAKVAR row template given above is the following:
     *TOTAL
                          -------     --------    --------
     Total                X,XXX.X     X,XXX.XX    XX,XXX.X

Notice that the first row of the template has some lines, to indicate the bottom of each column. The next line has the word "Total" and the same format strings as the template for the rows. The format strings are the only required elements, and they can be different from the formats for the other rows -- with a different number of decimal places, for instance. If you want some of the totals but not all, you can blank out the ones you do not want by using a special format string `BBBBBB' (at least two B's are necessary). For example, to print only the first and third totals in the example above, you would use the following command:
     *TOTAL
                          -------                  -------
     Total                X,XXX.X     BBBBBBBB     X,XXX.X

6. TEMPLATES FOR HEADER AND FOOTER (Optional)

It is usually desirable to have some text at the top of a report and also at the bottom. To put text in those positions, use the `*HEADER' and `*FOOTER' commands, followed by the template or "picture" of what you want. For example, a header for a report could be as follows:

     *HEADER

                                                    Page &page

                      Comparison of Groups

                           Mean        Std Dev      Mean
                            Age          Age      Education
      Group

      *(next command or end of file)

The template of the header includes any blank lines up to the next command or until the end of the command file. The text of the header will appear as is at the top of the report, except for the substitution of the value of `&page', which is the page number. A footer can also be specified in the same way as the header. One possible footer is the following:

     *FOOTER everypage

     Report printed on &date
     Weight variable: &weight
     Filter variable: &filter

     *(next command or end of file)

The substitution variables that can be included in the template for a header or a footer are given in the following list. The names can be given in either upper or lower case, and some names can be abbreviated. The minimum required name is given below in upper case; the characters in lower case are optional and are ignored by the program. References to INDIVIDUAL variables, however, are sensitive to case -- `&a' refers to a different variable than `&A'.

     &DATE     Current date, in form: July 15, 1993
     &PAGE     Page number within the report

     &NAME     Name of the break variable
     &LABel    Long label of the break variable

     &FILter   Name of filter variable(s) and codes
     &WEIght   Name of the weight variable

     &DATA     Name of ASCII data file(s) used as input
     &DDL      Name of DDL file

     &a        Name of variable referred to as `&a'
               (if &a=age, the string `age' will be substituted)

If there are no actual filter or weight variables in a report, the `&filter' and `&weight' substitution variables return blanks.

Headers and footers by default appear only once in a multi-page report -- the header appears at the top of the first page, and the footer appears only on the last page. If the word `everypage' (which can be in upper or lower case and can be abbreviated to `EV' or `ev') follows a *HEADER or *FOOTER command, however, the corresponding header or footer is printed on every page of a multi-page report. With the `everypage' specification, the footer is also placed at the bottom of the page, instead of immediately following the last row of statistics.

Templates can include tabs, formfeeds, and other special characters. The user should be aware, however, that the formatting of the resulting report will be dependent on the settings of the output device. The program will issue a warning if such characters are used, in case they were inserted by accident.
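
The substitution rules for headers and footers can be sketched in Python as follows. This is only an illustration of the documented behavior, not SREPORT code. The function name `fill_template` and its two dictionaries are hypothetical, and the sketch assumes that any characters typed after the minimum required name are ignored.

```python
import re

# Minimum required spellings of the substitution names listed above.
SUBSTITUTIONS = ["DATE", "PAGE", "NAME", "LAB", "FIL", "WEI", "DATA", "DDL"]


def fill_template(template, values, variables):
    """Replace &name tokens; `values' is keyed by the minimum spelling."""
    def replace(match):
        token = match.group(1)
        if len(token) == 1:
            # Individual variables are case sensitive: &a is not &A.
            return variables.get(token, match.group(0))
        upper = token.upper()
        for minimum in SUBSTITUTIONS:
            if upper.startswith(minimum):
                # A missing filter or weight is substituted as blank.
                return str(values.get(minimum, ""))
        return match.group(0)  # unknown names are left untouched
    return re.sub(r"&([A-Za-z]+)", replace, template)


print(fill_template("Page &page", {"PAGE": 3}, {}))         # Page 3
print(fill_template("Mean of &a, &A", {}, {"a": "age"}))    # Mean of age, &A
```

In the second call only `&a' is replaced, because `&A' refers to a different (here undefined) variable.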

7. COMMANDS FOR INTERACTIVE REPORTS (Optional)

Reports can be constructed which prompt the user for some of the variable names or other information required to run the report. The following commands (whose names can be given in upper or lower case) may be used for this purpose:
     &say "message"      Display "message" on the screen.
                         If no message is specified,
                         a blank line is produced.

     &prompt             Use a standard prompt to obtain data
                         (can be used with any keyword or
                         reference to a variable).  For keywords
                         that can be repeated, it is permissible
                         to mix fixed specifications with the
                         "&prompt" function in order to allow
                         additional specifications at run-time.
                         However, the "keyword = &prompt" line must
                         follow any fixed specifications for that
                         keyword.

     &ask "message"      Alternative to `&prompt', if you
                         want to customize the prompt for the user.
                         Display "message," input a line of text,
                         and pass the input to one of the keywords.

Some examples of the usage of these interactive commands are the following:
     &say "Enter the names of the two variables to be analyzed"
     &say "Possible variables include age, educ, gender, income, spend"
     &say
     &a = &prompt
     &b = &prompt
     # (The program will prompt the user for variable names, which
     #  will then be used in the remaining commands.)

     weight = &prompt
     savefile = &ask "Enter name of file on which to write results"
     # (The program will prompt the user for the required names.)

     filter = &prompt
     &say "This report can be limited to either male or female respondents"
     filter=gender( &ask"Enter `1' for males; `2' for females" )
     &say "If you want to use another filter, enter the information now"
     filter= &ask"Filter var:" ( &ask"Categories to include:" )
     # (Filters can use the standard prompt, or you can customize the
     #  prompt in various ways.)
     


FUNCTIONS

There are four kinds of functions that can be used with input variables. Note that a ROWVARS type of report can only use the first kind (arithmetic functions).

The name of each function can be given in either upper or lower case. Some function names can be abbreviated. The minimum required name is given below in upper case; the characters in lower case are optional and are ignored by the program.

In the following lists of functions the arguments given as examples have the following meaning:

&a,&b
Reference to a variable (generated by a statement such as `&a=age')

c-g
Numbers (codes for the values of variables)

&x,&y,&z
Most general type of argument - can be a reference to a variable, the result of a function, or a numeric constant

The term `valid' in the following definitions means that the value of a variable is neither equal to the missing-data code nor outside the defined range of valid codes.

1. Arithmetic functions

     MEAN(&a)                 Mean
     MAXimum(&a)              Maximum valid value
     MINimum(&a)              Minimum valid value
     STDDEV(&a)               Standard deviation
     STDERR(&a)               Standard error
     SUM(&a)                  Sum of a variable across cases
     Nvalid(&a)               Number of valid cases
     VARiance(&a)             Variance
     WNvalid(&a)              Weighted number of valid cases
     WSTDERR(&a)              Standard error, computed with weighted N
     
The functions `wnvalid' and `wstderr' generate the same numbers as `nvalid' and `stderr' if no weight variable has been specified.

In a ROWVARS type of report, only the function name is given; it applies automatically to every row variable.
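
The role of `valid' cases in the arithmetic functions can be illustrated with a small Python sketch. It is not how SREPORT is implemented. All function names are hypothetical, and the sketch assumes a single missing-data code and an inclusive range of valid codes for the variable.

```python
def valid_values(values, missing_code, low, high):
    """Keep values that are not the missing-data code and lie within range."""
    return [v for v in values if v != missing_code and low <= v <= high]


def nvalid(values, missing_code, low, high):
    """Number of valid cases (unweighted)."""
    return len(valid_values(values, missing_code, low, high))


def mean(values, missing_code, low, high):
    """Mean computed over the valid cases only."""
    kept = valid_values(values, missing_code, low, high)
    return sum(kept) / len(kept)


# Ages with 99 as the missing-data code and 18-98 as the valid range.
ages = [25, 40, 99, 17, 130]
print(nvalid(ages, 99, 18, 98))  # 2
print(mean(ages, 99, 18, 98))    # 32.5
```

Only 25 and 40 count as valid here: 99 is the missing-data code, and 17 and 130 fall outside the assumed range.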

2. Frequencies


     FREQuency(&a,c-d,e,f-g)       Number of cases that fall within the
                                   given ranges of `&a'.

     BIFREQuency(&a,c-d; &b,e-f,g) Number of cases that fall within
                                   the given ranges of BOTH variables.
                                   NOTICE THE SEMICOLON, used to separate the
                                   ranges of the two variables.
     
The character `*' used as the range of a variable refers to all valid codes. If a weight variable has been specified, the number of cases is weighted.
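
The counting done by FREQUENCY can be sketched in Python as shown below. This is a hedged illustration rather than SREPORT's own code. The function names are hypothetical, ranges are assumed to be inclusive at both ends, codes are assumed to be non-negative (so that `-' always separates the ends of a range), and the input list is assumed to hold only valid values.

```python
def parse_ranges(spec):
    """Turn '18-30,45' into [(18.0, 30.0), (45.0, 45.0)]."""
    ranges = []
    for part in spec.split(","):
        low, dash, high = part.strip().partition("-")
        ranges.append((float(low), float(high) if dash else float(low)))
    return ranges


def frequency(values, spec, weights=None):
    """Count (or sum the weights of) cases falling in the given ranges."""
    if weights is None:
        weights = [1] * len(values)
    if spec.strip() == "*":
        return sum(weights)  # `*' stands for all valid codes
    ranges = parse_ranges(spec)
    return sum(w for v, w in zip(values, weights)
               if any(low <= v <= high for low, high in ranges))


ages = [17, 18, 25, 30, 31, 45]
print(frequency(ages, "18-30,45"))                        # 4
print(frequency(ages, "18-30,45", [1, 2, 1, 1, 1, 0.5]))  # 4.5
```

With weights supplied, each matching case contributes its weight instead of 1, which corresponds to the weighted number of cases described above.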

3. Percentages

COLPCTN(&m)
The column or vertical percent based on the NUMBER OF CASES in a column. The argument `&m' can be either a reference to a variable (produced by a statement such as `&m=age') or the result of a FREQUENCY or BIFREQUENCY command (produced by a series of statements such as `&a=age' and `&m=FREQ(&a,18-30)').

COLPCTSUM(&a)
The column or vertical percent based on the SUM OF THE VALUES of the variable given as the argument.

ROWPCT(&x,&y)
The row or horizontal percent based on the values of the two arguments. This function divides the first argument by the second argument and multiplies by 100: &x / &y * 100.
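
The arithmetic behind the percentage functions can be sketched in Python. The ROWPCT formula is taken directly from the description above. The column percent is an assumed reading, in which each category's value is divided by the total over all categories of the break variable. The function names are hypothetical.

```python
import math


def rowpct(x, y):
    """Row percent as documented: &x / &y * 100."""
    return x / y * 100


def column_percent(per_category):
    """Each category's share of the column total, in percent.

    With case counts this corresponds to COLPCTN; with sums of a
    variable it corresponds to COLPCTSUM (assumed interpretation).
    """
    total = sum(per_category.values())
    return {category: 100 * value / total
            for category, value in per_category.items()}


print(rowpct(25, 200))                    # 12.5
print(column_percent({1: 30, 2: 70}))     # {1: 30.0, 2: 70.0}
```

A typical use of ROWPCT would pair a frequency with the number of valid cases, so that 25 matching cases out of 200 give 12.5 percent.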

4. Functions of Aggregates

The following functions operate on the aggregated values of variables or functions, within categories of the break variable.
     ADD(&x,&y,&z,...)        Add the arguments
     SUBtract(&x,&y)          Subtract: &x - &y
     MULTiply(&x,&y,&z,...)   Multiply the arguments
     DIVide(&x,&y)            Divide: &x / &y
     AVerage(&x,&y,&z,...)    Average (mean) of the arguments
     GReat(&x,&y,&z,...)      The greatest argument (e.g., great(2,4,6) = 6)
     LEast(&x,&y,&z,...)      The smallest argument (e.g., least(2,4,6) = 2)
     SQRT(&x)                 Square root of argument
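
The functions of aggregates, together with the rule that only the capitalized part of a name is required, can be illustrated in Python. This is a sketch, not the program's own dispatch logic. The table `AGGREGATES` and the function `apply_aggregate` are hypothetical, and the sketch assumes that any characters typed after the minimum name are ignored.

```python
import math

# Keys are the minimum required spellings of each function name.
AGGREGATES = {
    "ADD": lambda *args: sum(args),
    "SUB": lambda x, y: x - y,
    "MULT": lambda *args: math.prod(args),
    "DIV": lambda x, y: x / y,
    "AV": lambda *args: sum(args) / len(args),
    "GR": lambda *args: max(args),
    "LE": lambda *args: min(args),
    "SQRT": lambda x: math.sqrt(x),
}


def apply_aggregate(name, *args):
    """Look up a function by its (possibly abbreviated) name and apply it."""
    upper = name.upper()
    for minimum, function in AGGREGATES.items():
        if upper.startswith(minimum):
            return function(*args)
    raise ValueError(f"unknown function: {name}")


print(apply_aggregate("great", 2, 4, 6))   # 6
print(apply_aggregate("LE", 2, 4, 6))      # 2
print(apply_aggregate("divide", 10, 4))    # 2.5
```

In a report, the arguments would be the aggregated values for one category of the break variable, such as two means or a sum and a count.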


EXAMPLE OF A COMMAND FILE

The following example illustrates many of the features of a command file. Each type of command is discussed in detail in the preceding sections.

Note that anything on a line beginning with `#' is ignored by the batch processor and can therefore be used for comments. Blank lines are ignored except within templates.

Additional examples of command files can be examined in the SREPORT example file.

Example of a command file for a BREAKVAR type of report

     # General report characteristics
     breakvar = gender
     data = testdat
     ddl = testddl

     # Specification of variables
     &a = age
     &b = education
     &c = mean(&a)
     &d = stddev(&a)
     &e = mean(&b)

     # Which variables go in which column (required)
     *COLUMNS
     &cat  &clabel  &c  &d  &e

     # Template for each row (required)
     *ROWS
     XXX  LLLLLLLLLL      X,XXX.X     X,XXX.XX    XX,XXX.X

     # Template for a row of totals (optional)
     *TOTAL
                          -------     --------    --------
     Total                X,XXX.X     X,XXX.XX    XX,XXX.X

     # Template for header - top of report (optional)
     *HEADER

                                                    Page &page

                      Comparison of Groups

                           Mean        Std Dev      Mean
                            Age          Age      Education
      Group

     # Template for footer - bottom of report (optional)
     *FOOTER everypage

     Report printed on &date
     Weight variable: &weight
     Filter variable: &filter

MULTIPLE REPORTS

More than one report can be requested in the same command file. Each report specification is ended by a line beginning with `*END' or `**'. The values of the following keywords are carried over to subsequent reports unless they are redefined: data=, ddl=, savefile=, margin=, pagelength=, nmax=.
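
The carry-over of keyword values between reports can be sketched in Python. This is a simplified illustration, not the batch processor itself. The function names are hypothetical, only full lower-case keyword names are recognized, the append rule for repeated keywords is ignored, and template lines that happen to contain `=' are not handled.

```python
# Keywords whose values persist into later reports unless redefined.
CARRIED = ("data", "ddl", "savefile", "margin", "pagelength", "nmax")


def split_reports(text):
    """Split a command file into reports at lines beginning *END or **."""
    reports, current = [], []
    for line in text.splitlines():
        if line.upper().startswith("*END") or line.startswith("**"):
            reports.append(current)
            current = []
        else:
            current.append(line)
    if any(line.strip() for line in current):
        reports.append(current)
    return reports


def effective_settings(text):
    """Return one keyword dictionary per report, with carry-over applied."""
    carried, result = {}, []
    for report in split_reports(text):
        own = {}
        for line in report:
            # Skip comments, variable definitions, commands, and plain text.
            if line.lstrip().startswith(("#", "&", "*")) or "=" not in line:
                continue
            keyword, value = line.split("=", 1)
            own[keyword.strip().lower()] = value.strip()
        settings = {**carried, **own}
        carried = {k: v for k, v in settings.items() if k in CARRIED}
        result.append(settings)
    return result


commands = (
    "breakvar = gender\ndata = testdat\nddl = testddl\n*END\n"
    "breakvar = region\nnmax = 100\n**\n"
    "breakvar = educ\n"
)
for settings in effective_settings(commands):
    print(settings)
```

In this example the second and third reports reuse `data=' and `ddl=' from the first, while `breakvar=' has to be given again for each report because it is not in the carried-over list.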

SEE ALSO

sreport              Basic information about SREPORT
sreport examples     Examples of SREPORT command files
sreport functions    Summary of SREPORT functions and formats
DDL                  Data Description Language


CSM, UC Berkeley/ISA
March 18, 2019