I am trying to add columns of mismatched length (number of rows) to a DataFrame, but it throws this error:
DimensionMismatch("length of new column Target which is 60000 must match the number of rows in data frame (47040000)")
My code snippet is,
df = DataFrame(:Feature => train_x, :Target => train_y)
#train_x has 47040000 rows
#train_y has 60000 rows
Please suggest a solution for this problem. Thank you in advance.
Are you sure this is what you're trying to do? Normally one would expect as many rows of features as there are rows in the target column, so this error might point to a conceptual issue in your code.
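One way to check for such a conceptual issue is to see whether the feature length is an exact multiple of the target length. Here it is: 47040000 / 60000 = 784, which may mean train_x is a flattened array holding 784 values per sample (that is only a guess from the numbers; I don't know your data). A minimal sketch with small, hypothetical stand-in sizes, assuming each sample's values are stored contiguously in the flat vector:

```julia
# Hypothetical stand-ins for the real data: 6 samples with 4 features each
n_samples, n_features = 6, 4
train_y = collect(1:n_samples)
train_x = collect(1.0:(n_samples * n_features))  # flat vector of length 24

# Check that the feature vector length is an exact multiple of the target length
@assert length(train_x) % length(train_y) == 0
features_per_sample = length(train_x) ÷ length(train_y)

# ASSUMPTION: the values of each sample sit next to each other in train_x.
# reshape fills column by column, so each column is one sample;
# permutedims then turns that into one row per sample.
X = permutedims(reshape(train_x, features_per_sample, length(train_y)))

println(size(X))  # one row per target value
```

If your data is laid out the other way round (all samples' first feature, then all samples' second feature, and so on), reshape(train_x, length(train_y), features_per_sample) without permutedims would be the matching call.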
If you absolutely have to do this though, I see two options:
1. Pad the shorter column with missing or some value of your choice, so :Target => [train_y; fill(missing, length(train_x) - length(train_y))]. Here I'm padding at the end of the vector, which might or might not be appropriate in your case.
2. Do a leftjoin of a DataFrame with your train_x column onto a DataFrame with your train_y column. For this you will need an index column in both DataFrames that describes how the rows of y match to x. If you just add a running index 1:length(train_*) to both DataFrames, the result will be the same as padding the end of train_y with missing.
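Both options can be sketched on small, hypothetical stand-in vectors. The column name :idx is just something I picked for the running index, and I sort after the join so the comparison does not depend on the row order the join returns:

```julia
using DataFrames

# Hypothetical small stand-ins for the real vectors
train_x = [10, 20, 30, 40, 50]
train_y = [1, 2]

# Option 1: pad the shorter vector with `missing` at the end
n_pad = length(train_x) - length(train_y)
padded = DataFrame(:Feature => train_x,
                   :Target => [train_y; fill(missing, n_pad)])

# Option 2: left join on a running index column (named :idx here)
dfx = DataFrame(:idx => collect(eachindex(train_x)), :Feature => train_x)
dfy = DataFrame(:idx => collect(eachindex(train_y)), :Target => train_y)
joined = leftjoin(dfx, dfy, on = :idx)
sort!(joined, :idx)  # make the row order explicit before comparing

# With a running index, both options give the same Target column
println(isequal(joined.Target, padded.Target))
```

In both cases the Target column ends up allowing missing values, so anything downstream has to handle missing explicitly.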
Since a DataFrame is actually a set of columns this is possible:
using DataFrames
df = DataFrame(x=Int[], y=Int[])
append!(df.x,[1,2])
append!(df.y,[1,2,3])
However, since such a data frame does not make sense, you will not be able to work with it via the standard DataFrames API (it will be seen as a corrupt DataFrame):
julia> df[1,:]
DataFrameRowError showing value of type DataFrameRow{DataFrame,DataFrames.Index}:
ERROR: AssertionError: Data frame is corrupt: length of column :y (3) does not match length of column 1 (2). The column vector has likely been resized unintentionally (either directly or because it is shared with another data frame).