SAS Programming Tips

Alternative to the disallowed word PUT

The word PUT is not allowed in Real Time Remote Access (RTRA) because the PUT statement allows a user to write values from the microdata to the SAS log. However, users may still need what the PUT function does: create a character value by applying a format (typically to convert numeric values to character). Since the word PUT is disallowed, users can instead use the PUTC or PUTN functions, which are similar to the PUT function. PUTC creates a character value by applying a character format. PUTN creates a character value by applying a numeric format.

Note: Unlike the PUT function, the PUTC and PUTN functions require the format (the second argument) to be enclosed in quotation marks. For example:

AgeChar = PUTN(Age, "3.");
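As a minimal sketch of the two functions side by side, the data step below applies PUTN with a numeric format and PUTC with a character format. The data set name, variable names and in-line values are hypothetical and exist only to make the example self-contained; it is intended for an ordinary SAS session rather than as a confirmed RTRA program.

```sas
/* Hypothetical test data: names and values are illustrative only */
data work.PutDemo;
   input Age SexCode $;
   /* Define the result variables explicitly so their lengths are known */
   length AgeChar $ 3 SexUpper $ 1;
   /* PUTN: numeric value, numeric format in quotation marks */
   AgeChar = PUTN(Age, "3.");
   /* PUTC: character value, character format in quotation marks */
   SexUpper = PUTC(SexCode, "$upcase1.");
   datalines;
25 f
7 m
;
run;
```

Defining AgeChar and SexUpper with a LENGTH statement first avoids relying on the default length that SAS would otherwise assign to the results.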

Converting character values to numeric

In some cases, a user may want to convert character microdata values to numeric. For example, the LFS microdata variable SP_WEARN is a character variable. Because of this, SP_WEARN cannot be used as an RTRA statistical analysis variable (in RTRAMean for example). It must be converted to numeric first. This conversion can be done using the INPUT function.

In the data step example below, a new numeric variable SP_WEARN_NUM is created by applying the INPUT function to SP_WEARN. It is assumed that the values in SP_WEARN include two implicit decimal places.

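data work.LFS;
   set RTRAData.LFS200005;
   /* Define the new variable as numeric */
   length SP_WEARN_NUM 8;
   /* Read 7 characters, assuming two implicit decimal places */
   SP_WEARN_NUM = INPUT(SP_WEARN,7.2);
run;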

The new variable SP_WEARN_NUM can then be used as an analysis variable in the RTRA procedures.
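To illustrate what "two implicit decimal places" means for the 7.2 informat, here is a small self-contained sketch. The data set name, variable names and values are hypothetical and are not taken from the LFS microdata.

```sas
/* Hypothetical character values: illustrative only, not LFS data */
data work.InputDemo;
   input RawValue $ 1-7;
   length NumValue 8;
   /* With the 7.2 informat, a value without a decimal point is   */
   /* divided by 100; a value that already contains a decimal     */
   /* point is read as written                                     */
   NumValue = INPUT(RawValue, 7.2);
   datalines;
0012345
0000500
123.45
;
run;
```

In this sketch the first value is read as 123.45 and the second as 5. The third already contains a decimal point, so the implied decimal places are ignored and it is also read as 123.45. If the character values in a given file do not follow the implicit-decimal convention, a different informat width or decimal specification would be needed.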

Applying the KEEP option to the RTRAData data set

Applying the KEEP option to the RTRAData data set can make the data step more efficient, because SAS retrieves only the variables listed in the KEEP option. This is useful when only a small number of variables are needed. Note that if KEEP is specified, the variable named ID must be included in the list of variables.

For example:

data work.CSDDis;
   /* Retrieve only the listed variables; ID must be included */
   set RTRAData.csd2012_disab(keep=DDIS_FL REF_AGE SEX DCLASS DLFS ID);
run;

Note: While KEEP can make the data step more efficient when only a small number of variables are needed, KEEP is not a requirement. If there is a large number of variables to keep, it is easier to omit the KEEP option. SAS will automatically keep all variables (including the variable ID).

Defining new variables with a LENGTH statement

The example below shows how the values of a new character variable can be inadvertently truncated when the variable is not defined with a LENGTH statement.

data work.CSDDis;
   set RTRAData.csd2012_disab;
   /* No LENGTH statement: AgeGroup takes its length from the first assignment */
   if (REF_AGE < 10) then AgeGroup = "Under10";
   else if (10 <= REF_AGE <= 30) then AgeGroup = "Between10and30";
   else if (31 <= REF_AGE <= 90) then AgeGroup = "Between31and90";
   else if (REF_AGE > 90) then AgeGroup = "OlderThan90";
   else AgeGroup = "AgeUnknown";
run;

Since the new variable AgeGroup is not defined with a LENGTH statement, SAS uses the first occurrence of AgeGroup in the data step to determine the character length to assign the variable. The first occurrence is where AgeGroup is assigned the value "Under10". Therefore, SAS assigns a length of 7 to the variable AgeGroup. The problem with this is that the length of 7 is not sufficient to accommodate character values assigned to AgeGroup later in the data step, such as "Between10and30".

Here are the values of AgeGroup in the output data set for the different age groups. Notice the truncation that has occurred:

REF_AGE            AgeGroup [char(7)]
< 10               Under10
10 - 30            Between
31 - 90            Between
> 90               OlderTh
Any other value    AgeUnkn

If AgeGroup is a class variable, the values in the tabulated results will be truncated as shown above. Even worse, all REF_AGE values from 10 to 90 will end up in the same category, "Between".
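The truncation can be confirmed without writing anything to the log by storing the variable's defined length in the data set itself. The sketch below is self-contained and uses the VLENGTH function. The data set name, the variable StoredLength, the simplified "Other" category and the in-line ages are hypothetical.

```sas
/* Hypothetical ages: illustrative only */
data work.TruncDemo;
   input REF_AGE;
   /* No LENGTH statement: the first assignment fixes the length at 7 */
   if (REF_AGE < 10) then AgeGroup = "Under10";
   else if (10 <= REF_AGE <= 30) then AgeGroup = "Between10and30";
   else if (31 <= REF_AGE <= 90) then AgeGroup = "Between31and90";
   else AgeGroup = "Other";
   /* VLENGTH returns the length the variable was defined with */
   StoredLength = vlength(AgeGroup);
   datalines;
5
20
45
;
run;
```

In this sketch StoredLength is 7 on every observation, and the ages 20 and 45 both receive the value "Between".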

To avoid this problem, use a LENGTH statement to assign a sufficient length to AgeGroup before assigning it a value:

data work.CSDDis;
   set RTRAData.csd2012_disab;
   /* Define AgeGroup with enough room for the longest value */
   length AgeGroup $ 15;
   if (REF_AGE < 10) then AgeGroup = "Under10";
   else if (10 <= REF_AGE <= 30) then AgeGroup = "Between10and30";
   else if (31 <= REF_AGE <= 90) then AgeGroup = "Between31and90";
   else if (REF_AGE > 90) then AgeGroup = "OlderThan90";
   else AgeGroup = "AgeUnknown";
run;

REF_AGE            AgeGroup [char(15)]
< 10               Under10
10 - 30            Between10and30
31 - 90            Between31and90
> 90               OlderThan90
Any other value    AgeUnknown

Missing ELSE statement when defining a derived variable

When defining a derived variable in a data step, IF/ELSE statements are usually used.

For example:

data work.CSDDis;
   set RTRAData.csd2012_disab;
   length AgeGroup $ 15;
   if (0 <= REF_AGE < 10) then AgeGroup = "Under10";
   else if (10 <= REF_AGE <= 30) then AgeGroup = "Between10and30";
   else if (31 <= REF_AGE <= 90) then AgeGroup = "Between31and90";
   else if (91 <= REF_AGE <= 120) then AgeGroup = "Between91and120";
   /* No final ELSE: any other REF_AGE value leaves AgeGroup blank */
run;

The potential problem with this code is that it ignores any special values of REF_AGE that may exist in the data. For example, the data set csd2012_disab may contain missing REF_AGE values (.) or a value such as 999 may represent "Not Stated". For observations where REF_AGE is not 0-120, AgeGroup will be set to blank. If AgeGroup is used as a class variable in RTRA, RTRA will generate an error message since a class variable cannot have any missing values.

To prevent this problem, an additional ELSE statement should be used as a "catch all". This ensures that AgeGroup will be non-blank in all observations in the output data set.

data work.CSDDis;
   set RTRAData.csd2012_disab;
   length AgeGroup $ 15;
   if (0 <= REF_AGE < 10) then AgeGroup = "Under10";
   else if (10 <= REF_AGE <= 30) then AgeGroup = "Between10and30";
   else if (31 <= REF_AGE <= 90) then AgeGroup = "Between31and90";
   else if (91 <= REF_AGE <= 120) then AgeGroup = "Between91and120";
   /* Catch-all: ensures AgeGroup is never blank */
   else AgeGroup = "Other";
run;

In the example shown above, for all observations where REF_AGE is not 0-120, AgeGroup will be assigned a value of "Other".
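If it is useful to tell missing ages apart from other special values, one possible variation is to test for missing values first with the MISSING function and keep the catch-all for everything else. This is a sketch only: the data set name, the label "Missing", and the in-line values (including 999 as a "Not Stated" code) are hypothetical.

```sas
/* Hypothetical ages, including a missing value and a special code */
data work.ElseDemo;
   input REF_AGE;
   length AgeGroup $ 15;
   /* Handle missing values explicitly before the range tests */
   if missing(REF_AGE) then AgeGroup = "Missing";
   else if (0 <= REF_AGE < 10) then AgeGroup = "Under10";
   else if (10 <= REF_AGE <= 30) then AgeGroup = "Between10and30";
   else if (31 <= REF_AGE <= 90) then AgeGroup = "Between31and90";
   else if (91 <= REF_AGE <= 120) then AgeGroup = "Between91and120";
   /* Catch-all for special codes such as 999 */
   else AgeGroup = "Other";
   datalines;
5
.
999
;
run;
```

SAS treats a missing numeric value as smaller than any number, so a condition written as REF_AGE < 10 with no lower bound would place missing ages in "Under10". Either the explicit MISSING test or a lower bound such as 0 <= REF_AGE prevents this.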
