I have a dataset that looks like the following:
| INCOME | WEALTH |
|---|---|
| 10.000 | 100000 |
| 15.000 | 111000 |
| 14.200 | 123456 |
| 12.654 | 654321 |
I have many more rows.
I now want to find how much INCOME a household in a specific WEALTH percentile has. The following quantiles are relevant:
c(0.01,0.05,0.1,0.25,0.5,0.75,0.9,0.95,0.99)
I have always used the following code to get specific percentile values:
a <- quantile(WEALTH, probs = c(0.01,0.05,0.1,0.25,0.5,0.75,0.9,0.95,0.99))
But now I want to base my percentiles on WEALTH but get the respective INCOME. I have tried the following code but the results are not plausible:
df$percentile = ntile(df$WEALTH,100)
df <- df[df$percentile %in% c(1,5,10,25,50,75,90,95,99), ]
a <- df %>%
  group_by(percentile) %>%
  summarise(max = max(INCOME))
The results that I get are not consistent with other parts of the analysis that I have done. I assume that the percentiles from the "quantile" function are calculated differently than by simply taking the maximum.
I'm not sure if I understood your question correctly, but the quantile function has different methods of calculation. I, for example, always go for type 6, since this is what I was taught in my stat courses.
type: an integer between 1 and 9 selecting one of the nine quantile algorithms detailed below to be used.
Read more about the different types by running ?quantile (the help page for quantile).
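To illustrate how much the type argument can matter, here is a minimal base R sketch. The data frame and its values are made up for illustration, and the last step (looking up INCOME through a type 1 WEALTH quantile) is only one possible reading of what the question asks for, not a confirmed solution.

```r
# Hypothetical toy data using the question's column names; values are made up.
df <- data.frame(
  INCOME = c(10000, 15000, 14200, 12654),
  WEALTH = c(100000, 111000, 123456, 654321)
)
probs <- c(0.5, 0.9)

# Same data, different algorithms: type 7 is R's default, type 6 is the one
# mentioned above. Both interpolate between observations, but differently.
q7 <- quantile(df$WEALTH, probs = probs, type = 7)
q6 <- quantile(df$WEALTH, probs = probs, type = 6)
print(q7)
print(q6)

# Type 1 does not interpolate, so every quantile is an observed WEALTH value.
# That makes it possible to look up the INCOME of the matching household.
# Assumption: WEALTH has no ties; match() returns only the first hit.
q1 <- quantile(df$WEALTH, probs = probs, type = 1)
income_at_q <- df$INCOME[match(q1, df$WEALTH)]
print(data.frame(prob = probs, WEALTH = unname(q1), INCOME = income_at_q))
```

With only four rows the types already disagree at the 90th percentile, so a mismatch between a quantile-based result and a "maximum within an ntile bucket" result is not surprising.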