I'm trying to use kproto() function from R package clustMixType to cluster mixed-type data in Julia, but I'm getting error No numeric variables in x! Try using kmodes() from package.... My data should have 3 variables: 2 continuous and 1 categorical. It seems after I used DataFrame() all the variables became categorical. Is there a way to avoid changing the variables type after using DataFrame() so that I have mixed-type data (continuous and categorical) to use kproto()?
using RCall
@rlibrary clustMixType
# group 1 variables
x1=rand(Normal(0,3),10)
x2=rand(Normal(1,2),10)
x3=["1","1","2","2","0","1","1","2","2","0"]
g1=hcat(x1,x2,x3)
# group 2 variables
y1=rand(Normal(0,4),10)
y2=rand(Normal(-1,6),10)
y3=["1","1","2","1","1","2","2","0","2","0"]
g2=hcat(y1,y2,y3)
#create the data
df0=vcat(g1,g2)
df1 = DataFrame(df0)
#use R function
R"kproto($df1, 2)"
I don't know anything about the R package and what kind of input it expects, but the issue is probably how you construct the data matrix from which you construct your DataFrame, not the DataFrame constructor itself.
When you concatenate a numerical and a string column, Julia falls back on the element type Any for the resulting matrix:
julia> g1=hcat(x1,x2,x3)
10×3 Matrix{Any}:
0.708309 -4.84767 "1"
0.566883 -0.214217 "1"
...
That means your df0 matrix is:
julia> #create the data
df0=vcat(g1,g2)
20×3 Matrix{Any}:
0.708309 -4.84767 "1"
0.566883 -0.214217 "1"
...
and the DataFrame constructor will just carry this lack of type information through rather than trying to infer column types.
julia> DataFrame(df0)
20×3 DataFrame
Row │ x1 x2 x3
│ Any Any Any
─────┼───────────────────────────
1 │ 0.708309 -4.84767 1
2 │ 0.566883 -0.214217 1
...
A simple way of getting around this is to just not concatenate your columns into a single matrix, but to construct the DataFrame from the columns:
julia> DataFrame([vcat(x1, y1), vcat(x2, y2), vcat(x3, y3)])
20×3 DataFrame
Row │ x1 x2 x3
│ Float64 Float64 String
─────┼───────────────────────────────
1 │ 0.708309 -4.84767 1
2 │ 0.566883 -0.214217 1
...
As you can see, we now have two Float64 numerical columns x1 and x2 in the resulting DataFrame.
As an addition to the nice answer by Nils (as the problem is indeed when a matrix is constructed not when DataFrame is created) there is this little trick:
julia> df = DataFrame([1 1.0 "1"; 2 2.0 "2"], [:int, :float, :string])
2×3 DataFrame
Row │ int float string
│ Any Any Any
─────┼────────────────────
1 │ 1 1.0 1
2 │ 2 2.0 2
julia> identity.(df)
2×3 DataFrame
Row │ int float string
│ Int64 Float64 String
─────┼────────────────────────
1 │ 1 1.0 1
2 │ 2 2.0 2
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With