Keep variable types after creating a DataFrame

I'm trying to call the kproto() function from the R package clustMixType from Julia to cluster mixed-type data, but I get the error No numeric variables in x! Try using kmodes() from package.... My data should have 3 variables: 2 continuous and 1 categorical. It seems that after I called DataFrame(), all the variables became categorical. Is there a way to keep the variable types when using DataFrame(), so that I have mixed-type data (continuous and categorical) to pass to kproto()?

using RCall, DataFrames, Distributions  # DataFrame() and Normal() come from these packages
@rlibrary clustMixType

# group 1 variables
x1=rand(Normal(0,3),10)
x2=rand(Normal(1,2),10)
x3=["1","1","2","2","0","1","1","2","2","0"]
g1=hcat(x1,x2,x3) 

# group 2 variables
y1=rand(Normal(0,4),10)
y2=rand(Normal(-1,6),10)
y3=["1","1","2","1","1","2","2","0","2","0"]
g2=hcat(y1,y2,y3)
 
#create the data
df0=vcat(g1,g2)
df1 = DataFrame(df0, :auto)  # :auto generates the column names x1, x2, x3

#use R function
R"kproto($df1, 2)"
Adam asked Oct 28 '25 03:10


2 Answers

I don't know anything about the R package and what kind of input it expects, but the issue is probably how you construct the data matrix from which you construct your DataFrame, not the DataFrame constructor itself.

When you concatenate a numerical and a string column, Julia falls back on the element type Any for the resulting matrix:

julia> g1=hcat(x1,x2,x3)
10×3 Matrix{Any}:
  0.708309  -4.84767   "1"
  0.566883  -0.214217  "1"
...
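
The fallback is easy to check in isolation. Below is a minimal sketch using only Base Julia, with made-up values standing in for the question's columns (the names nums and labels are illustrative, not from the question):

```julia
# Illustrative stand-ins for a numeric column and a string column
nums = [0.5, 1.5, 2.5]
labels = ["1", "2", "0"]

# Float64 and String have no common concrete type, so hcat widens to Any
m = hcat(nums, labels)
println(eltype(m))

# Concatenating vectors of the same type keeps the concrete element type
v = vcat(nums, [3.5, 4.5])
println(eltype(v))
```

This is why stacking each variable separately with vcat preserves its type, while combining different variables into one matrix with hcat does not.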

That means your df0 matrix is:

julia> #create the data
       df0=vcat(g1,g2)
20×3 Matrix{Any}:
  0.708309   -4.84767   "1"
  0.566883   -0.214217  "1"
...

and the DataFrame constructor will just carry this lack of type information through rather than trying to infer column types.

julia> DataFrame(df0, :auto)
20×3 DataFrame
 Row │ x1         x2         x3  
     │ Any        Any        Any 
─────┼───────────────────────────
   1 │ 0.708309   -4.84767   1
   2 │ 0.566883   -0.214217  1
...

A simple way of getting around this is to just not concatenate your columns into a single matrix, but to construct the DataFrame from the columns:

julia> DataFrame([vcat(x1, y1), vcat(x2, y2), vcat(x3, y3)], :auto)
20×3 DataFrame
 Row │ x1         x2          x3     
     │ Float64    Float64     String 
─────┼───────────────────────────────
   1 │  0.708309   -4.84767   1
   2 │  0.566883   -0.214217  1
...

As you can see, we now have two Float64 numerical columns x1 and x2 in the resulting DataFrame.
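
Passing the columns by name is another way to build the same kind of table. The sketch below uses small made-up vectors in place of the random draws from the question, so the values are hypothetical. Note also that the clustMixType documentation describes the input to kproto() as a data frame of numerics and factors, so the string column may additionally need converting (for example with categorical from CategoricalArrays.jl) before RCall hands it to R; that step is an assumption and is not shown or tested here.

```julia
using DataFrames

# Hypothetical stand-ins for the question's two groups
x1, y1 = [0.7, 0.5], [1.2, -0.3]
x2, y2 = [-4.8, -0.2], [2.1, 0.4]
x3, y3 = ["1", "2"], ["0", "1"]

# Stack each variable across the groups first, then build the table column by column
df = DataFrame(x1 = vcat(x1, y1), x2 = vcat(x2, y2), x3 = vcat(x3, y3))

# Each column keeps its own concrete element type
println(eltype.(eachcol(df)))
```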

Nils Gudat answered Oct 31 '25 19:10



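To add to the nice answer by Nils (the problem does indeed arise when the matrix is constructed, not when the DataFrame is created): if you already have a DataFrame whose columns have element type Any, there is a little trick. Broadcasting identity over it narrows each column to the concrete type of its values: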

julia> df = DataFrame([1 1.0 "1"; 2 2.0 "2"], [:int, :float, :string])
2×3 DataFrame
 Row │ int  float  string
     │ Any  Any    Any
─────┼────────────────────
   1 │ 1    1.0    1
   2 │ 2    2.0    2

julia> identity.(df)
2×3 DataFrame
 Row │ int    float    string
     │ Int64  Float64  String
─────┼────────────────────────
   1 │     1      1.0  1
   2 │     2      2.0  2
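
The same narrowing can be seen on a plain vector, which may make the trick less mysterious. A minimal Base Julia sketch (the names loose and tight are illustrative):

```julia
# A vector declared with element type Any that happens to hold only integers
loose = Any[1, 2, 3]

# Broadcasting allocates a new array and picks its element type from the values it produces
tight = identity.(loose)

println(typeof(loose))
println(typeof(tight))
```

Judging by the output above, identity.(df) applies the same idea to each column of the DataFrame.
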
Bogumił Kamiński answered Oct 31 '25 17:10
