Error when reading a Spanish-language SPSS file with pandas (Python)

Good morning!

I am trying to work with an SPSS file (.sav) in Python.

This is my code:

import pandas as pd

df=pd.read_spss('C:/Users/bonif/Documents/CSALUD01.sav')

df.head()

I get this error:

df=pd.read_spss('C:/Users/bonif/Documents/CSALUD01.sav')
  File "C:\Users\bonif\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\spss.py", line 44, in read_spss
    df, _ = pyreadstat.read_sav(
  File "pyreadstat\pyreadstat.pyx", line 342, in pyreadstat.pyreadstat.read_sav
  File "pyreadstat\_readstat_parser.pyx", line 1034, in pyreadstat._readstat_parser.run_conversion
  File "pyreadstat\_readstat_parser.pyx", line 845, in pyreadstat._readstat_parser.run_readstat_parser
  File "pyreadstat\_readstat_parser.pyx", line 775, in pyreadstat._readstat_parser.check_exit_status
pyreadstat._readstat_parser.ReadstatError: Unable to convert string to the requested encoding (invalid byte sequence)

I suspect the error occurs because some words contain the letter "ñ" or accented characters such as "á". How can I solve this?

The dataset is in this Google Drive folder: https://drive.google.com/drive/folders/1P8v5NWE-GdAEJRZdmrp5KiL-DODClmfU?usp=sharing

Thank you so much

Andres Portocarrero asked Oct 24 '25 14:10


1 Answer

As ti7 suggests, use pyreadstat directly and specify the encoding explicitly. In this case latin1 does the trick:

>>> import pyreadstat
# This raises an error
>>> df, meta = pyreadstat.read_sav("CSALUD01.sav")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pyreadstat/pyreadstat.pyx", line 342, in pyreadstat.pyreadstat.read_sav
  File "pyreadstat/_readstat_parser.pyx", line 1034, in pyreadstat._readstat_parser.run_conversion
  File "pyreadstat/_readstat_parser.pyx", line 845, in pyreadstat._readstat_parser.run_readstat_parser
  File "pyreadstat/_readstat_parser.pyx", line 775, in pyreadstat._readstat_parser.check_exit_status
pyreadstat._readstat_parser.ReadstatError: Unable to convert string to the requested encoding (invalid byte sequence)

# This is fine
>>> df, meta = pyreadstat.read_sav("CSALUD01.sav", encoding="latin1")
>>> 
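To see why an explicit encoding helps, here is a minimal sketch that needs neither the file nor pyreadstat. It assumes the strings in the .sav file are stored as Latin-1 bytes (which is what the fix above suggests, though the file itself may use a close relative such as cp1252). The sample text and the try_decode helper are made up for illustration.

```python
# Sketch: Latin-1 bytes for Spanish text are not valid UTF-8.
# The sample string is hypothetical; it is not taken from CSALUD01.sav.
sample = "año, salud pública"

# Latin-1 stores each character in one byte: ñ -> 0xF1, ú -> 0xFA.
raw = sample.encode("latin1")


def try_decode(data: bytes, encoding: str):
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# In UTF-8, 0xF1 announces a multi-byte sequence, but the next byte ("o")
# is not a continuation byte, so decoding fails.
as_utf8 = try_decode(raw, "utf-8")

# Decoding with the encoding the bytes were written in round-trips cleanly.
as_latin1 = try_decode(raw, "latin1")

print(as_utf8)
print(as_latin1)
```

This is the same kind of "invalid byte sequence" failure, only at the level of a single string: once the reader is told the real encoding, "ñ" and "á" decode correctly.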


Otto Fajardo answered Oct 26 '25 06:10


