
How to visualize high-volume 3-dimensional data

I have a data set like the following:

import numpy as np
from pandas import DataFrame
mypos = np.random.randint(10, size=(100, 2))
mydata = DataFrame(mypos, columns=['x', 'y'])
myres = np.random.rand(100, 1)
mydata['res'] = myres

The res variable is continuous. The x and y variables are integers representing positions (so their values repeat often), and res represents a kind of correlation between each pair of positions.

What are the best ways of visualizing this data set? Approaches I have already considered:

  1. Scatter plot, with the res variable visualized by a color gradient.
  2. Parallel coordinates plot.

The first approach is problematic when the number of positions gets large, because high values of the res variable (which are the values we care about) would be drowned in a sea of small dots.
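
For reference, here is a minimal sketch of approach 1, assuming matplotlib is available (it is not imported in the snippet above). The fixed seed and the "Blues" colormap are illustrative choices. Rows are drawn in ascending order of res, which is one way to keep a high value from being hidden under a low one at the same position; it does not remove the crowding when there are many distinct positions.

```python
import numpy as np
import matplotlib.pyplot as plt
from pandas import DataFrame

# Same kind of synthetic data as above, with a fixed seed (an assumption,
# added only so the figure is repeatable).
rng = np.random.default_rng(0)
mydata = DataFrame(rng.integers(10, size=(100, 2)), columns=["x", "y"])
mydata["res"] = rng.random(100)

# Sort ascending so rows with the highest res are drawn last, i.e. on top.
ordered = mydata.sort_values("res")

fig, ax = plt.subplots()
sc = ax.scatter(ordered["x"], ordered["y"], c=ordered["res"], cmap="Blues", s=60)
fig.colorbar(sc, ax=ax, label="res")
ax.set_xlabel("x")
ax.set_ylabel("y")
# Call plt.show() in an interactive session to display the figure.
```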

The second approach could be promising, but I am having trouble producing it. I have tried the parallel_coordinates function from the pandas module, but it does not behave the way I would like. (See this question: parallel coordinates plot for continous data in pandas.)

asked Nov 22 '25 08:11 by qed


1 Answer

I hope this helps to find a solution in R. Good luck.

# you need this package for the colour palette
library(RColorBrewer)

# create the random data
dd <- data.frame(
    x = round(runif(100, 0, 10), 0),
    y = round(runif(100, 0, 10), 0),
    res = runif(100)
)

# pick the number of colours (granularity of colour scale)
nColors <- 100 

# create the colour palette
cols <- colorRampPalette(colors = c("white", "blue"))(nColors)

# get a zScale for the colours
zScale <- seq(min(dd$res), max(dd$res), length.out = nColors)

# function that returns the nearest colour given a value of res
# (which.min returns a single index even when two scale values are equally close)
findNearestColour <- function(x) {
    colorIndex <- which.min(abs(zScale - x))
    return(cols[colorIndex])
}

# the first plot is the scatterplot
### this has problems because points come out on top of each other
plot(y ~ x, dd, type = "n")
for (i in seq_len(nrow(dd))) {
    with(dd[i, ],
        points(y ~ x, col = findNearestColour(res), pch = 19)
    )
}

# this is your parallel coordinates plot (a little better)
# each row becomes a line from its x value (left side) to its y value (right side)
plot(1, 1, xlim = c(0, 1), ylim = c(min(dd$x, dd$y), max(dd$x, dd$y)),
     type = "n", axes = FALSE, ylab = "", xlab = "")
for (i in seq_len(nrow(dd))) {
    with(dd[i, ],
        segments(0, x, 1, y, col = findNearestColour(res))
    )
}
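
Since the question's data is built in Python, the second plot can also be sketched with matplotlib. This is a rough port rather than part of the R solution above: the seed, the "Blues" colormap, and the step of drawing rows in ascending order of res (so strong lines end up on top) are my own assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection

# Synthetic data shaped like the question's (fixed seed is an assumption).
rng = np.random.default_rng(0)
x = rng.integers(10, size=100)
y = rng.integers(10, size=100)
res = rng.random(100)

# Draw low-res lines first so high-res lines end up on top.
order = np.argsort(res)

# One segment per row: from (0, x) on the left axis to (1, y) on the right axis.
segments = np.stack(
    [np.column_stack([np.zeros(100), x[order]]),
     np.column_stack([np.ones(100), y[order]])],
    axis=1,
)  # shape (100, 2, 2)

fig, ax = plt.subplots()
lines = LineCollection(segments, cmap="Blues")
lines.set_array(res[order])  # colour each segment by its res value
ax.add_collection(lines)
ax.set_xlim(0, 1)
ax.set_ylim(min(x.min(), y.min()), max(x.max(), y.max()))
ax.set_xticks([0, 1])
ax.set_xticklabels(["x", "y"])
fig.colorbar(lines, ax=ax, label="res")
# Call plt.show() in an interactive session to display the figure.
```

Using a single LineCollection instead of a loop keeps the colour mapping in one place and lets the colorbar be built from the same res values.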
answered Nov 24 '25 20:11 by roman