How to calculate the area of a ggplot stat_ellipse() when 'type = "norm"?

Question

Similar to this question

Is there any way to calculate the area of this ellipse when type = "norm"?

Default is type = "t". type = "norm" displays a different ellipse because it assumes a multivariate normal distribution instead of multivariate t-distribution

Here is the code and the plot (using similar code as other post):

library(ggplot2)
set.seed(1234)
data <- data.frame(x = rnorm(1:1000), y = rnorm(1:1000))

ggplot (data, aes (x = x, y = y))+
  geom_point()+
  stat_ellipse(type = "norm")

Previous answer was:

#Plot object
p = ggplot (data, aes (x = x, y = y))+
  geom_point()+
  stat_ellipse(segments=201) # Default is 51. We use a finer grid for more accurate area.

#Get ellipse coordinates from plot

pb = ggplot_build(p)
el = pb$data[[2]][c("x","y")]

# Center of ellipse

ctr = MASS::cov.trob(el)$center 
# I tried changing this to 'stats::cov.wt' instead of 'MASS::cov.trob' 
#from what is saw from (https://github.com/tidyverse/ggplot2/blob/master/R/stat-ellipse.R#L98)

# Calculate distance to center from each point on the ellipse

dist2center <- sqrt(rowSums((t(t(el)-ctr))^2))

# Calculate area of ellipse from semi-major and semi-minor axes. 
These are, respectively, the largest and smallest values of dist2center. 

pi*min(dist2center)*max(dist2center)

Changing to stats::cov.wt wasn't enough to get the area of the "norm" ellipse (value calculated was the same). Any ideas on how to change the code?

Thanks!

tjebo · Accepted Answer

Nice question, I learned something. But I cannot reproduce your problem and get (of course) different values with the different approaches.

I think the approach in the linked answer is not quite correct because the ellipse center is not calculated with the data, but based on the ellipse coordinates. I have updated to calculate this based on the data.

library(ggplot2)

set.seed(1234)
data <- data.frame(x = rnorm(1:1000), y = rnorm(1:1000))

p_norm <- ggplot(data, aes(x = x, y = y)) +
  geom_point() +
  stat_ellipse(type = "norm")

pb <- ggplot_build(p_norm)
el <- pb$data[[2]][c("x", "y")]
ctr <- MASS::cov.trob(data)$center #updated previous answer here
dist2center <- sqrt(rowSums((t(t(el) - ctr))^2))
pi * min(dist2center) * max(dist2center)
#> [1] 18.40872

^{Created on 2020-02-27 by the reprex package (v0.3.0)}

update thanks to Axeman for the thoughts.

The area can be directly calculated from the covariance matrix by calculating the eigenvalues first. You need to scale the variances / eigenvalues by the factor of confidence that you want to get. This blog helped me a lot to understand this a bit better

set.seed(1234)
dat <- data.frame(x = rnorm(1:1000), y = rnorm(1:1000))

cov_dat <- cov(dat) # covariance matrix

eig_dat <- eigen(cov(dat))$values #eigenvalues of covariance matrix

vec <- sqrt(5.991* eig_dat) # half the length of major and minor axis for the 95% confidence ellipse

pi * vec[1] * vec[2]  
#> [1] 18.38858

^{Created on 2020-02-27 by the reprex package (v0.3.0)}

In this particular case, the covariances are zero, and the eigenvalues will be more or less the variance of the variables. So you can use just the variance for your calculation. - given that both are normally distributed.

set.seed(1234)
data <- data.frame(x = rnorm(1:1000), y = rnorm(1:1000))

pi * 5.991 * sd(data$x) * sd(data$y) # factor for 95% confidence = 5.991
#> [1] 18.41814

^{Created on 2020-02-27 by the reprex package (v0.3.0)}

The factor 5.991 represents the Chi-square likelihood for the 95% confidence of the data. For more information, see this thread

How to calculate the area of a ggplot stat_ellipse() when 'type = "norm"?

Tags:

r

ggplot2

Olivia Jean

1 Answers

tjebo

Recent Activity

Donate For Us

How to calculate the area of a ggplot stat_ellipse() when 'type = "norm"?

Tags:

r

ggplot2

Olivia Jean

1 Answers

tjebo

Related questions

Recent Activity

Donate For Us