Importing SAS formats catalogue with negative format values #328

@Adamishere

Passing along an issue presented in the R haven package, which appears to be an upstream issue with ReadStat that haven uses: tidyverse/haven#768.

To summarize, the attached example (test.zip) contains a sas7bdat file (test.sas7bdat) with a single numeric variable named x with values -7, 1, and 2, and a SAS format catalog file (format.sas7bcat) that defines the following format:

proc format;
value testf
-7="Missing"
1="Yes"
2="No"
;
run;

The format value -7 = "Missing" gets imported by haven (using ReadStat) as -0.625 = "Missing". The reporters of the haven issue also noted that they can reproduce this error in pyreadstat, and suggested it may be an upstream issue with ReadStat.

Some additional investigation by me (not in the attached example) suggests a deterministic pattern between the original SAS format values and the values imported through ReadStat. The lagged differences of the imported values come in runs whose lengths double (1, 2, 4, ...), and each time the lagged difference changes, it decreases by a factor of 4 (e.g., 2.00 -> 0.50 -> 0.125).

SAS format value    Imported value    Lagged difference of imported value
-1                  -4.0000000        N/A
-2                  -2.0000000        2.000000000
-3                  -1.5000000        0.500000000
-4                  -1.0000000        0.500000000
-5                  -0.8750000        0.125000000
-6                  -0.7500000        0.125000000
-7                  -0.6250000        0.125000000
-8                  -0.5000000        0.125000000
...                 ...               ...
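
In case it helps with debugging, here is a minimal Python sketch built only from the eight rows in the table above. It recomputes the lagged differences, then tests one guess about where the numbers come from: that the exponent and mantissa bits of the IEEE 754 double are inverted for negative values while the sign bit is kept. That guess is an assumption inferred from the numbers alone. It has not been checked against the ReadStat source or the sas7bcat file layout, and the helper name `flip_non_sign_bits` is hypothetical.

```python
import struct

# (SAS format value, value imported via ReadStat), copied from the table above.
observed = [
    (-1, -4.0), (-2, -2.0), (-3, -1.5), (-4, -1.0),
    (-5, -0.875), (-6, -0.75), (-7, -0.625), (-8, -0.5),
]


def lagged_differences(values):
    """Return the difference between each value and the one before it."""
    return [b - a for a, b in zip(values, values[1:])]


def flip_non_sign_bits(x):
    """Hypothetical transform: invert the 63 exponent and mantissa bits
    of a double and leave the sign bit unchanged."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    flipped = bits ^ 0x7FFFFFFFFFFFFFFF
    (y,) = struct.unpack(">d", struct.pack(">Q", flipped))
    return y


imported = [value for _, value in observed]
diffs = lagged_differences(imported)
print("lagged differences:", diffs)

# Compare the guess against each observed row, allowing for the rounding
# in the printed table (the flipped value differs from it by a tiny amount).
for sas_value, imported_value in observed:
    guess = flip_non_sign_bits(float(sas_value))
    matches = abs(guess - imported_value) < 1e-9
    print(sas_value, imported_value, guess, matches)
```

For these eight rows the guess agrees with every imported value to within rounding, so the pattern looks consistent with a bit inversion that is not being undone on import. This is only a hypothesis to test, not a confirmed diagnosis.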
