SAS Programming Tips for RTRA Users

Alternative to the disallowed word PUT

The word PUT is disallowed in RTRA because the PUT statement allows a user to write values from the microdata to the SAS log. However, users may still need what the PUT function does: create a character value by applying a format (typically to convert numeric values to character). Because the word PUT is disallowed, use the PUTC or PUTN function instead; both behave like the PUT function. PUTC creates a character value by applying a character format. PUTN creates a character value by applying a numeric format.

Note: Unlike the PUT function, the PUTC and PUTN functions take the format to apply (the second argument) as a character value, so a format written directly in the call must be enclosed in quotation marks. For example:

AgeChar = PUTN(Age, "3.");
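
As a minimal side-by-side sketch of the two functions, the data step below applies a numeric format with PUTN and a character format with PUTC. The data set name work.FormatDemo, the variables Age, Prov, AgeChar and ProvChar, and their values are hypothetical and chosen only for illustration; they do not come from any RTRA file.

```sas
data work.FormatDemo;
    /* Define the result variables first so their lengths are explicit */
    length AgeChar $ 3 ProvChar $ 2;

    Age  = 42;      /* hypothetical numeric value   */
    Prov = "ON";    /* hypothetical character value */

    /* PUTN applies a numeric format; the result is right-aligned in 3 characters */
    AgeChar = PUTN(Age, "3.");

    /* PUTC applies a character format; note the $ prefix on the format name */
    ProvChar = PUTC(Prov, "$2.");
run;
```

Defining AgeChar and ProvChar with a LENGTH statement keeps the results at the intended width rather than at whatever default length SAS would otherwise assign to a function result.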

Converting character values to numeric

In some cases, a user may want to convert character microdata values to numeric. For example, the LFS microdata variable SP_WEARN is a character variable. Because of this, SP_WEARN cannot be used as an RTRA analysis variable (in RTRAMean for example). It must be converted to numeric first. This conversion can be done using the INPUT function.

For example, in the data step below, a new numeric variable SP_WEARN_NUM is created by applying the INPUT function to SP_WEARN. It is assumed that the values in SP_WEARN include two implicit decimal places.

data work.LFS;

    set RTRAData.LFS200005;

    length SP_WEARN_NUM 8;

    SP_WEARN_NUM = INPUT(SP_WEARN,7.2);

run;

The new variable SP_WEARN_NUM can then be used as an analysis variable in the RTRA procedures.
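
To illustrate what "two implicit decimal places" means in practice, the sketch below applies the same 7.2 informat to two made-up character values. The data set name work.InformatDemo and the variables Raw and Value are hypothetical and are not part of the LFS file. When the text contains no decimal point, the informat treats the last two digits as decimals; when the text already contains a decimal point, the value is read as written.

```sas
data work.InformatDemo;
    length Raw $ 7 Value 8;

    /* No decimal point in the text: the last two digits become decimals */
    Raw = "0123456";
    Value = INPUT(Raw, 7.2);    /* Value is 1234.56 */
    output;

    /* Decimal point already present: the .2 part of the informat is ignored */
    Raw = "1234.56";
    Value = INPUT(Raw, 7.2);    /* Value is 1234.56 */
    output;
run;
```

If the values of a character variable turn out to carry no implied decimals, an informat without a decimal part (for example 7.) would be the appropriate choice instead. The data dictionary for the file is the place to confirm which applies.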

Applying the KEEP option to the RTRAData data set

Applying the KEEP option to the RTRAData data set can make the data step more efficient because SAS will only retrieve the variables in the KEEP list. It is useful when only a small number of variables are needed. Note that if KEEP is specified, the variable named ID must be included in the list of variables.

For example:

data work.CSDDis;

    set RTRAData.csd2012_disab(keep=DDIS_FL REF_AGE SEX DCLASS DLFS ID);

run;

Note: While KEEP can make the data step more efficient when only a small number of variables are needed, KEEP is not a requirement. If there is a large number of variables to keep, it is easier to omit KEEP. SAS will automatically keep all variables (including the variable ID).
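
One point worth keeping in mind when KEEP is combined with a derived variable: every variable used in the derivation must also appear in the KEEP list, otherwise SAS treats it as a new, uninitialized variable with missing values. The sketch below illustrates this under stated assumptions; the derived variable Adult and its age cut-offs are hypothetical and chosen only for illustration.

```sas
data work.CSDDis;
    /* REF_AGE is kept because the derivation below needs it;  */
    /* ID is kept because RTRA requires it whenever KEEP is used */
    set RTRAData.csd2012_disab(keep=REF_AGE SEX ID);

    length Adult $ 7;

    /* Hypothetical cut-offs, with a catch-all for any other value */
    if (18 <= REF_AGE <= 120) then Adult = "Yes";
    else if (0 <= REF_AGE < 18) then Adult = "No";
    else Adult = "Unknown";
run;
```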

Defining new variables with a LENGTH statement

The example below shows how the values of a new character variable can be inadvertently truncated when the variable is not defined with a LENGTH statement.

data work.CSDDis;

    set RTRAData.csd2012_disab;

    if (REF_AGE < 10) then AgeGroup = "Under10";

    else if (10 <= REF_AGE <= 30) then AgeGroup = "Between10and30";

    else if (31 <= REF_AGE <= 90) then AgeGroup = "Between31and90";

    else if (REF_AGE > 90) then AgeGroup = "OlderThan90";

    else AgeGroup = "AgeUnknown";

run;

Since the variable AgeGroup is not defined with a LENGTH statement, SAS uses the first occurrence of AgeGroup in the data step to determine the length to assign the variable. The first occurrence is where AgeGroup is assigned the value Under10. Therefore, SAS assigns a length of 7 to AgeGroup. The problem with this is that the length of 7 is not sufficient to accommodate values assigned to AgeGroup later in the data step.

Here are the values of AgeGroup in the output data set for the different age groups. Notice the truncation that has occurred:

REF_AGE            AgeGroup [char(7)]
< 10               Under10
10 - 30            Between
31 - 90            Between
> 90               OlderTh
Any other value    AgeUnkn

If AgeGroup is a class variable, the values in the tabulated results will be truncated as shown above. Even worse, all REF_AGE values from 10 to 90 will end up in the same category: Between.

To avoid this problem, use a LENGTH statement to assign a sufficient length to AgeGroup before assigning it a value:

data work.CSDDis;

    set RTRAData.csd2012_disab;

    length AgeGroup $ 15;

    if (REF_AGE < 10) then AgeGroup = "Under10";

    else if (10 <= REF_AGE <= 30) then AgeGroup = "Between10and30";

    else if (31 <= REF_AGE <= 90) then AgeGroup = "Between31and90";

    else if (REF_AGE > 90) then AgeGroup = "OlderThan90";

    else AgeGroup = "AgeUnknown";

run;

With the LENGTH statement in place, the values of AgeGroup are no longer truncated:

REF_AGE            AgeGroup [char(15)]
< 10               Under10
10 - 30            Between10and30
31 - 90            Between31and90
> 90               OlderThan90
Any other value    AgeUnknown

Missing ELSE statement when defining a derived variable

When defining a derived variable in a data step, IF/ELSE statements are usually used.

For example:

data work.CSDDis;

    set RTRAData.csd2012_disab;

    length AgeGroup $ 15;

    if (0 <= REF_AGE < 10) then AgeGroup = "Under10";

    else if (10 <= REF_AGE <= 30)  then AgeGroup = "Between10and30";

    else if (31 <= REF_AGE <= 90)  then AgeGroup = "Between31and90";

    else if (91 <= REF_AGE <=120) then AgeGroup = "Between91and120";

run;

The potential problem with this code is that it ignores any special values of REF_AGE that may exist in the data. For example, the data set csd2012_disab may contain missing REF_AGE values (.), or a value such as 999 may represent “Not Stated”. For observations where REF_AGE is not in the range 0 to 120, none of the conditions is true, so AgeGroup is left blank. If AgeGroup is then used as a class variable in RTRA, RTRA will generate an error message because a class variable cannot have any missing values.

To prevent this problem, an additional ELSE statement should be used as a “catch all”. This ensures that AgeGroup will be non-blank in all observations in the output data set.

data work.CSDDis;

    set RTRAData.csd2012_disab;

    length AgeGroup $ 15;

    if (0 <= REF_AGE < 10) then AgeGroup = "Under10";

    else if (10 <= REF_AGE <= 30)  then AgeGroup = "Between10and30";

    else if (31 <= REF_AGE <= 90)  then AgeGroup = "Between31and90";

    else if (91 <= REF_AGE <=120) then AgeGroup = "Between91and120";

    else AgeGroup = "Other";

run;
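
A possible refinement, sketched here under assumptions, is to give known special values their own categories before the final catch-all, so that they can be told apart in the tabulated results. The sketch assumes that 999 really is the “Not Stated” code for REF_AGE, which the text above offers only as an example; the category labels Missing and NotStated are likewise hypothetical. The data dictionary should be checked for the actual codes.

```sas
data work.CSDDis;
    set RTRAData.csd2012_disab;

    length AgeGroup $ 15;

    /* Test special values first. SAS treats a missing numeric value as   */
    /* smaller than any number, so a condition such as REF_AGE < 10 with  */
    /* no lower bound would also be true for missing values.              */
    if missing(REF_AGE) then AgeGroup = "Missing";
    else if (REF_AGE = 999) then AgeGroup = "NotStated";   /* assumed code */
    else if (0 <= REF_AGE < 10) then AgeGroup = "Under10";
    else if (10 <= REF_AGE <= 30) then AgeGroup = "Between10and30";
    else if (31 <= REF_AGE <= 90) then AgeGroup = "Between31and90";
    else if (91 <= REF_AGE <= 120) then AgeGroup = "Between91and120";
    /* Catch-all: anything not covered above still gets a non-blank value */
    else AgeGroup = "Other";
run;
```

The final ELSE remains necessary even with the extra branches, since it is the only statement that guarantees a non-blank AgeGroup for values nobody anticipated.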
