Notes and metadata not converted to utf-8

It appears that ReadStat is not converting the encoding of some metadata for Stata dta and SAS xpt files.

This came up in Roche/pyreadstat#298 because pyreadstat expects all text to be returned to it as utf-8 and errors when this is not the case. Tagging @ofajardo (pyreadstat maintainer).

## Examples

Errors occur when reading notes from Stata .dta files (#73) (`"These data are a subset of those used in the study Caulkins, J.P. and R. Padman (1993), \x93Quantity Discounts and Quality Premia for Illicit Drugs\x94, Journal of the American Statistical Association, 88, 748-757"`):

```sh
wget http://www.principlesofeconometrics.com/stata/cocaine.dta

# errors because ReadStat returns the notes as WINDOWS-1252 encoded text
python -c 'import pyreadstat; pyreadstat.read_dta("cocaine.dta")'
```
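To illustrate why this fails downstream, here is a minimal sketch using only the Python standard library. It takes the quoted title from the note above as raw bytes and shows that they are not valid UTF-8 but decode cleanly as Windows-1252. The variable names are illustrative only, and the snippet does not call ReadStat or pyreadstat.

```python
# Bytes of the quoted title from the note, as shown above.
raw_note = b"\x93Quantity Discounts and Quality Premia for Illicit Drugs\x94"

try:
    raw_note.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError as exc:
    # 0x93 cannot start a UTF-8 sequence, so decoding fails on the first byte.
    utf8_ok = False
    print(f"not valid UTF-8: {exc.reason} at byte {exc.start}")

# Decoding with the Windows-1252 code page yields curly quotation marks.
decoded = raw_note.decode("cp1252")
print(decoded)
```

In Windows-1252, `\x93` and `\x94` map to the left and right double quotation marks, which is consistent with the note being passed through without conversion.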

For value labels (`"don\xe2\x80�t know"`):

```sh
wget https://gss.norc.org/documents/stata/GSS_stata.zip
unzip GSS_stata.zip GSS_stata/gss7224_r1.dta
python -c 'import pyreadstat; pyreadstat.read_dta("GSS_stata/gss7224_r1.dta", row_limit=10)'
```

For column labels (`"Ferritin(\xb5g/L)"`):

```sh
wget https://wwwn.cdc.gov/Nchs/Data/Nhanes/Public/2021/DataFiles/FERTIN_L.xpt
python -c 'import pyreadstat; pyreadstat.read_xport("FERTIN_L.xpt")'
```
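For reference, the kind of conversion expected for these metadata strings can be sketched as follows. `to_utf8` is a hypothetical helper, not part of ReadStat or pyreadstat, and the default source encoding of Windows-1252 is an assumption (Latin-1 would give the same result for this particular byte).

```python
def to_utf8(raw: bytes, source_encoding: str = "cp1252") -> bytes:
    """Return `raw` re-encoded as UTF-8 (hypothetical helper).

    Bytes that are already valid UTF-8 are returned unchanged; otherwise
    they are decoded with `source_encoding`, which is an assumption here.
    """
    try:
        raw.decode("utf-8")
        return raw  # already valid UTF-8, nothing to do
    except UnicodeDecodeError:
        return raw.decode(source_encoding).encode("utf-8")


# Column label bytes as reported above: 0xb5 is the micro sign in cp1252.
label = to_utf8(b"Ferritin(\xb5g/L)")
print(label.decode("utf-8"))
```

Trying UTF-8 first keeps strings that are already converted intact, at the cost of misreading the rare legacy string that happens to be valid UTF-8.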

This is similar in flavor to #152 and #172.
