Sometimes I try to simulate data by using the rnorm function, which I have done below:
mom.iq <- rnorm(n=1000,
mean=120,
sd=15)
kid.score <- rnorm(n=1000,
mean=45,
sd=20)
df <- data.frame(mom.iq,
kid.score)
But when I plot something like this, I usually end up with data that's essentially uncorrelated:
library(ggpubr)
ggscatter(df,
x="mom.iq",
y="kid.score")+
geom_smooth(method = "lm")

However, I would like to simulate something with a stronger correlation if possible. Is there an easy way to do this within R? I'm aware that I could just produce my own values manually, but that's not very practical for recreating large samples.
You are generating two independent variables, so it is expected that they are not correlated. Instead, you can generate one variable as a function of the other:
# In order to make the values reproducible
set.seed(12345)
# Generate independent variable
x <- rnorm(n=1000, mean=120, sd=15)
# Generate the dependent variable
y <- 3*x + 6 + rnorm(n=1000, mean = 0, sd = 5)
I used 3 and 6, but you can choose any slope and intercept (a and b) to get a linear dependence of the form y = a*x + b.
The rnorm(n=1000, mean = 0, sd = 5) term is added to introduce some variability and avoid a perfect correlation between x and y. If you want more strongly correlated data, reduce its standard deviation (sd); to get a weaker correlation, increase it.
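If you would rather aim for a specific correlation than tune sd by trial and error, the same idea can be turned around. For y = a*x + b + noise with independent noise, the population correlation is r when a = r * sd_y / sd_x and the noise sd is sd_y * sqrt(1 - r^2). The sketch below applies this to the variables from the question. The target of 0.6 and the name target_r are illustrative assumptions, not something from the original post, and the sample correlation will only be close to the target, not exactly equal to it.

```r
# In order to make the values reproducible
set.seed(12345)

n        <- 1000
target_r <- 0.6            # assumed target correlation (illustrative)
mean_x   <- 120; sd_x <- 15   # mom.iq parameters from the question
mean_y   <- 45;  sd_y <- 20   # kid.score parameters from the question

# Slope, intercept and noise sd implied by the target correlation
a        <- target_r * sd_y / sd_x
b        <- mean_y - a * mean_x
noise_sd <- sd_y * sqrt(1 - target_r^2)

# Generate the independent variable, then the dependent one
mom.iq    <- rnorm(n = n, mean = mean_x, sd = sd_x)
kid.score <- a * mom.iq + b + rnorm(n = n, mean = 0, sd = noise_sd)

df <- data.frame(mom.iq, kid.score)

# Sample correlation; should be near target_r, up to sampling noise
observed_r <- cor(df$mom.iq, df$kid.score)
print(observed_r)
```

The resulting df can be passed to the ggscatter call from the question unchanged. kid.score keeps roughly the mean and sd that were asked for, because the slope and noise sd are derived from them.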