I'm looking to work with SPSS files (.sav) using pandas. Since I don't have SPSS itself, here's what a typical file looks like when converted to .csv:

After looking into what the first two rows mean (I don't know SPSS), it seems that the first row contains the variable labels and the second row contains the variable names.
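If a CSV export does have that layout (labels in the first row, variable names in the second), one way to read it back without losing either row is to treat both rows as a two-level header. Below is a minimal sketch with pandas. The CSV text, its column names and values are made up for illustration, and `read_labelled_csv` is a hypothetical helper, not part of any library.

```python
import io

import pandas as pd

# Made-up CSV export: row 1 = variable labels, row 2 = variable names.
CSV_TEXT = """Respondent ID,Age in years,Gender
id,age,gender
1,34,F
2,51,M
"""


def read_labelled_csv(text):
    """Hypothetical helper: return (data, labels), labels mapping name -> label."""
    # header=[0, 1] makes both leading rows header rows, so each column
    # becomes a (label, name) tuple in a MultiIndex.
    df = pd.read_csv(io.StringIO(text), header=[0, 1])
    labels = {name: label for label, name in df.columns}
    # Keep the short variable names as the actual column names.
    df.columns = [name for _, name in df.columns]
    return df, labels


data, labels = read_labelled_csv(CSV_TEXT)
print(labels)  # {'id': 'Respondent ID', 'age': 'Age in years', 'gender': 'Gender'}
print(data.head())
```

Keeping the labels in a separate dict leaves short, code-friendly column names while the longer labels stay available for display.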

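When I load the file into pandas like this:

```python
import pandas.rpy.common as com

def savtocsv(filename):
    # Evaluate R's foreign::read.spss and get an R data.frame back
    w = com.robj.r('foreign::read.spss("%s", to.data.frame=TRUE)' % filename)
    # Convert the R data.frame into a pandas DataFrame
    w = com.convert_robj(w)
    return w
```

and then call head() on the result, the first row (the labels) is missing: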

How can labels be maintained?
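A side issue with the code above (and with the answer's functions): `filename` is pasted into the R source with `%s`, so a path containing a backslash (common on Windows) or a double quote produces an invalid or different R string. A minimal sketch of escaping the path first, assuming only R's standard double-quoted string syntax; `r_string_literal` and `read_spss_expr` are hypothetical helper names.

```python
def r_string_literal(text):
    """Hypothetical helper: quote a Python string as an R double-quoted literal."""
    # Escape backslashes first so the quotes escaped next are not doubled.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def read_spss_expr(filename, to_data_frame=True):
    """Hypothetical helper: build the foreign::read.spss(...) R expression."""
    flag = "TRUE" if to_data_frame else "FALSE"
    return "foreign::read.spss(%s, to.data.frame=%s)" % (r_string_literal(filename), flag)


print(read_spss_expr(r"C:\data\survey.sav"))
# foreign::read.spss("C:\\data\\survey.sav", to.data.frame=TRUE)
```

The resulting string could then be passed to `com.robj.r(...)` in place of the `%s`-formatted one.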
The labels in a .sav file end up in the variable.labels attribute of the object returned by read.spss, not in the data itself.
You can get the variable labels like this:

```python
import pandas.rpy.common as com

def get_labels(filename):
    # Ask R for the "variable.labels" attribute of the read.spss result
    w = com.robj.r('attr(foreign::read.spss("%s"), "variable.labels")' % filename)
    # Convert the R object holding the labels to its Python equivalent
    w = com.convert_robj(w)
    return w
```
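Once you have the labels, you may prefer to keep the short variable names as columns and carry the labels alongside them. Below is a minimal sketch with plain pandas. The names, labels and data are made-up stand-ins for what read.spss and get_labels would return, and it assumes one label per column in column order (the same assumption the next snippet makes). `label_lookup` is a hypothetical helper. `DataFrame.attrs` exists in pandas 1.0 and later but is documented as experimental.

```python
import pandas as pd

# Made-up stand-ins: variable names, their labels (one may be empty), and data.
names = ["id", "age", "gender"]
labels = ["Respondent ID", "Age in years", ""]
df = pd.DataFrame({"id": [1, 2], "age": [34, 51], "gender": ["F", "M"]})


def label_lookup(names, labels):
    """Hypothetical helper: map each name to its label, falling back to the name."""
    return {name: (label or name) for name, label in zip(names, labels)}


lookup = label_lookup(names, labels)
# Attach the mapping to the DataFrame (DataFrame.attrs is experimental in pandas).
df.attrs["variable_labels"] = lookup
print(df.attrs["variable_labels"]["age"])  # Age in years
```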
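If you want to use the labels as the column names of your DataFrame:

```python
import pandas.rpy.common as com

def savtocsv(filename):
    # Read the .sav file into an R data.frame (still an R object at this point)
    w = com.robj.r('foreign::read.spss("%s", to.data.frame=TRUE)' % filename)
    # Grab the labels while w is still an R object; the conversion below
    # does not carry them over
    cols = list(com.robj.r("attr")(w, "variable.labels"))
    w = com.convert_robj(w)
    # Replace the short variable names with the labels
    w.columns = cols
    return w
```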
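Replacing the columns throws away the short variable names. If you want to keep both, mirroring the two header rows of the CSV export, a `MultiIndex` with a label level and a name level is one option. A minimal sketch with made-up data; `add_label_level` is a hypothetical helper, and it assumes exactly one label per column in column order.

```python
import pandas as pd


def add_label_level(df, labels):
    """Hypothetical helper: return a copy whose columns are (label, name) pairs."""
    if len(labels) != len(df.columns):
        raise ValueError("need exactly one label per column")
    out = df.copy()
    out.columns = pd.MultiIndex.from_arrays(
        [list(labels), list(df.columns)], names=["label", "name"]
    )
    return out


df = pd.DataFrame({"id": [1, 2], "age": [34, 51]})
both = add_label_level(df, ["Respondent ID", "Age in years"])
print(both.columns.tolist())  # [('Respondent ID', 'id'), ('Age in years', 'age')]
# Drop the label level to select columns by their short names again.
print(both.droplevel("label", axis=1)["age"].tolist())  # [34, 51]
```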