Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to calculate the area of a ggplot stat_ellipse() when 'type = "norm"?

Tags:

r

ggplot2

Similar to this question

Is there any way to calculate the area of this ellipse when type = "norm"?

Default is type = "t". type = "norm" displays a different ellipse because it assumes a multivariate normal distribution instead of multivariate t-distribution

Here is the code and the plot (using similar code as other post):

library(ggplot2)
set.seed(1234)
data <- data.frame(x = rnorm(1:1000), y = rnorm(1:1000))

ggplot (data, aes (x = x, y = y))+
  geom_point()+
  stat_ellipse(type = "norm")

Previous answer was:

#Plot object
p = ggplot (data, aes (x = x, y = y))+
  geom_point()+
  stat_ellipse(segments=201) # Default is 51. We use a finer grid for more accurate area.

#Get ellipse coordinates from plot

pb = ggplot_build(p)
el = pb$data[[2]][c("x","y")]

# Center of ellipse

ctr = MASS::cov.trob(el)$center 
# I tried changing this to 'stats::cov.wt' instead of 'MASS::cov.trob' 
#from what is saw from (https://github.com/tidyverse/ggplot2/blob/master/R/stat-ellipse.R#L98)

# Calculate distance to center from each point on the ellipse

dist2center <- sqrt(rowSums((t(t(el)-ctr))^2))

# Calculate area of ellipse from semi-major and semi-minor axes. 
These are, respectively, the largest and smallest values of dist2center. 

pi*min(dist2center)*max(dist2center)

Changing to stats::cov.wt wasn't enough to get the area of the "norm" ellipse (value calculated was the same). Any ideas on how to change the code?

Thanks!

like image 491
Olivia Jean Avatar asked Dec 06 '25 05:12

Olivia Jean


1 Answers

Nice question, I learned something. But I cannot reproduce your problem and get (of course) different values with the different approaches.

I think the approach in the linked answer is not quite correct because the ellipse center is not calculated with the data, but based on the ellipse coordinates. I have updated to calculate this based on the data.

library(ggplot2)

set.seed(1234)
data <- data.frame(x = rnorm(1:1000), y = rnorm(1:1000))

p_norm <- ggplot(data, aes(x = x, y = y)) +
  geom_point() +
  stat_ellipse(type = "norm")

pb <- ggplot_build(p_norm)
el <- pb$data[[2]][c("x", "y")]
ctr <- MASS::cov.trob(data)$center #updated previous answer here
dist2center <- sqrt(rowSums((t(t(el) - ctr))^2))
pi * min(dist2center) * max(dist2center)
#> [1] 18.40872

Created on 2020-02-27 by the reprex package (v0.3.0)

update thanks to Axeman for the thoughts.

The area can be directly calculated from the covariance matrix by calculating the eigenvalues first. You need to scale the variances / eigenvalues by the factor of confidence that you want to get. This blog helped me a lot to understand this a bit better

set.seed(1234)
dat <- data.frame(x = rnorm(1:1000), y = rnorm(1:1000))

cov_dat <- cov(dat) # covariance matrix

eig_dat <- eigen(cov(dat))$values #eigenvalues of covariance matrix

vec <- sqrt(5.991* eig_dat) # half the length of major and minor axis for the 95% confidence ellipse

pi * vec[1] * vec[2]  
#> [1] 18.38858

Created on 2020-02-27 by the reprex package (v0.3.0)

In this particular case, the covariances are zero, and the eigenvalues will be more or less the variance of the variables. So you can use just the variance for your calculation. - given that both are normally distributed.

set.seed(1234)
data <- data.frame(x = rnorm(1:1000), y = rnorm(1:1000))

pi * 5.991 * sd(data$x) * sd(data$y) # factor for 95% confidence = 5.991
#> [1] 18.41814

Created on 2020-02-27 by the reprex package (v0.3.0)

The factor 5.991 represents the Chi-square likelihood for the 95% confidence of the data. For more information, see this thread

like image 193
tjebo Avatar answered Dec 07 '25 20:12

tjebo



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!