Similar to this question
Is there any way to calculate the area of this ellipse when type = "norm"?
Default is type = "t". type = "norm" displays a different ellipse because it assumes a multivariate normal distribution instead of multivariate t-distribution
Here is the code and the plot (using similar code as other post):
library(ggplot2)
set.seed(1234)
data <- data.frame(x = rnorm(1:1000), y = rnorm(1:1000))
ggplot (data, aes (x = x, y = y))+
geom_point()+
stat_ellipse(type = "norm")
Previous answer was:
#Plot object
p = ggplot (data, aes (x = x, y = y))+
geom_point()+
stat_ellipse(segments=201) # Default is 51. We use a finer grid for more accurate area.
#Get ellipse coordinates from plot
pb = ggplot_build(p)
el = pb$data[[2]][c("x","y")]
# Center of ellipse
ctr = MASS::cov.trob(el)$center
# I tried changing this to 'stats::cov.wt' instead of 'MASS::cov.trob'
#from what is saw from (https://github.com/tidyverse/ggplot2/blob/master/R/stat-ellipse.R#L98)
# Calculate distance to center from each point on the ellipse
dist2center <- sqrt(rowSums((t(t(el)-ctr))^2))
# Calculate area of ellipse from semi-major and semi-minor axes.
These are, respectively, the largest and smallest values of dist2center.
pi*min(dist2center)*max(dist2center)
Changing to stats::cov.wt wasn't enough to get the area of the "norm" ellipse (value calculated was the same). Any ideas on how to change the code?
Thanks!
Nice question, I learned something. But I cannot reproduce your problem and get (of course) different values with the different approaches.
I think the approach in the linked answer is not quite correct because the ellipse center is not calculated with the data, but based on the ellipse coordinates. I have updated to calculate this based on the data.
library(ggplot2)
set.seed(1234)
data <- data.frame(x = rnorm(1:1000), y = rnorm(1:1000))
p_norm <- ggplot(data, aes(x = x, y = y)) +
geom_point() +
stat_ellipse(type = "norm")
pb <- ggplot_build(p_norm)
el <- pb$data[[2]][c("x", "y")]
ctr <- MASS::cov.trob(data)$center #updated previous answer here
dist2center <- sqrt(rowSums((t(t(el) - ctr))^2))
pi * min(dist2center) * max(dist2center)
#> [1] 18.40872
Created on 2020-02-27 by the reprex package (v0.3.0)
update thanks to Axeman for the thoughts.
The area can be directly calculated from the covariance matrix by calculating the eigenvalues first. You need to scale the variances / eigenvalues by the factor of confidence that you want to get. This blog helped me a lot to understand this a bit better
set.seed(1234)
dat <- data.frame(x = rnorm(1:1000), y = rnorm(1:1000))
cov_dat <- cov(dat) # covariance matrix
eig_dat <- eigen(cov(dat))$values #eigenvalues of covariance matrix
vec <- sqrt(5.991* eig_dat) # half the length of major and minor axis for the 95% confidence ellipse
pi * vec[1] * vec[2]
#> [1] 18.38858
Created on 2020-02-27 by the reprex package (v0.3.0)
In this particular case, the covariances are zero, and the eigenvalues will be more or less the variance of the variables. So you can use just the variance for your calculation. - given that both are normally distributed.
set.seed(1234)
data <- data.frame(x = rnorm(1:1000), y = rnorm(1:1000))
pi * 5.991 * sd(data$x) * sd(data$y) # factor for 95% confidence = 5.991
#> [1] 18.41814
Created on 2020-02-27 by the reprex package (v0.3.0)
The factor 5.991 represents the Chi-square likelihood for the 95% confidence of the data. For more information, see this thread
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With