Properly calling variable name when creating multiple Benford plots

Question

I am creating Benford plots for all the numeric variables in my dataset. https://en.wikipedia.org/wiki/Benford%27s_law

Running a single variable

#install.packages("benford.analysis")
library(benford.analysis)
plot(benford(iris$Sepal.Length))

Looks great. And the legend says "Dataset: iris$Sepal.Length", perfect!.

Benford 1

Using apply to run 4 variables,

apply(iris[1:4], 2, function(x) plot(benford(x)))

Creates four plots, however, each plot's legend says "Dataset: x"

Benford 2

I attempted to use a for loop,

for (i in colnames(iris[1:4])){
  plot(benford(iris[[i]]))
}

This creates four plots, but now the legends says "Dataset: iris[[i]]". And I would like the name of the variable on each chart.

Benford 3

I tried a different loop, hoping to get titles with an evaluated parsed string like "iris$Sepal.Length":

for (i in colnames(iris[1:4])){
  plot(benford(eval(parse(text=paste0("iris$", i)))))
}

But now the legend says "Dataset: eval(parse(text=paste0("iris$", i)))".

Benford 4

AND, Now I've run into the infamous eval(parse(text=paste0( (eg: How to "eval" results returned by "paste0"? and R: eval(parse(...)) is often suboptimal )

I would like labels such as "Dataset: iris$Sepal.Length" or "Dataset: Sepal.Length". How can I create multiple plots with meaningfully variable names in the legend?

Chris · Accepted Answer

This is happening because of the first line within the benford function=:

benford <- function(data, number.of.digits = 2, sign = "positive", discrete=TRUE, round=3){

  data.name <- as.character(deparse(substitute(data)))

Source: https://github.com/cran/benford.analysis/blob/master/R/functions-new.R

data.name is then used to name your graph. Whatever variable name or expression you pass to the function will unfortunately be caught by the deparse(substitute()) call, and will be used as the name for your graph.

One short-term solution is to copy and rewrite the function:

#install.packages("benford.analysis")
library(benford.analysis)
#install.packages("data.table")
library(data.table) # needed for function

# load hidden functions into namespace - needed for function
r <- unclass(lsf.str(envir = asNamespace("benford.analysis"), all = T))
for(name in r) eval(parse(text=paste0(name, '<-benford.analysis:::', name)))


benford_rev <- function{} # see below

for (i in colnames(iris[1:4])){
   plot(benford_rev(iris[[i]], data.name = i))
}

enter image description here

This has negative side effects of:

Not being maintainable with package revisions
Fills your GlobalEnv with normally hidden functions in the package

So hopefully someone can propose a better way!

benford_rev <- function(data, number.of.digits = 2, sign = "positive", discrete=TRUE, round=3, data.name = as.character(deparse(substitute(data)))){ # changed

 # removed line

  benford.digits <- generate.benford.digits(number.of.digits)

  benford.dist <- generate.benford.distribution(benford.digits)

  empirical.distribution <- generate.empirical.distribution(data, number.of.digits,sign, second.order = FALSE, benford.digits)

  n <- length(empirical.distribution$data)

  second.order <- generate.empirical.distribution(data, number.of.digits,sign, second.order = TRUE, benford.digits, discrete = discrete, round = round)

  n.second.order <- length(second.order$data)

  benford.dist.freq <- benford.dist*n

  ## calculating useful summaries and differences
  difference <- empirical.distribution$dist.freq - benford.dist.freq

  squared.diff <- ((empirical.distribution$dist.freq - benford.dist.freq)^2)/benford.dist.freq

  absolute.diff <- abs(empirical.distribution$dist.freq - benford.dist.freq)

  ### chi-squared test
  chisq.bfd <- chisq.test.bfd(squared.diff, data.name)

  ### MAD
  mean.abs.dev <- sum(abs(empirical.distribution$dist - benford.dist)/(length(benford.dist)))

  if (number.of.digits > 3) {
    MAD.conformity <- NA
  } else {
    digits.used <- c("First Digit", "First-Two Digits", "First-Three Digits")[number.of.digits]  
    MAD.conformity <- MAD.conformity(MAD = mean.abs.dev, digits.used)$conformity
  }





  ### Summation
  summation <- generate.summation(benford.digits,empirical.distribution$data, empirical.distribution$data.digits)
  abs.excess.summation <- abs(summation - mean(summation))

  ### Mantissa
  mantissa <- extract.mantissa(empirical.distribution$data)
  mean.mantissa <- mean(mantissa)
  var.mantissa <- var(mantissa)
  ek.mantissa <- excess.kurtosis(mantissa)
  sk.mantissa <- skewness(mantissa)

  ### Mantissa Arc Test
  mat.bfd <- mantissa.arc.test(mantissa, data.name)

  ### Distortion Factor
  distortion.factor <- DF(empirical.distribution$data)  

  ## recovering the lines of the numbers
  if (sign == "positive") lines <- which(data > 0 & !is.na(data))
  if (sign == "negative") lines <- which(data < 0 & !is.na(data))
  if (sign == "both")     lines <- which(data != 0 & !is.na(data))
  #lines <- which(data %in% empirical.distribution$data)

  ## output
  output <- list(info = list(data.name = data.name,
                             n = n,
                             n.second.order = n.second.order,
                             number.of.digits = number.of.digits),

                 data = data.table(lines.used = lines,
                                   data.used = empirical.distribution$data,
                                   data.mantissa = mantissa,
                                   data.digits = empirical.distribution$data.digits),

                 s.o.data = data.table(second.order = second.order$data,
                                       data.second.order.digits = second.order$data.digits),

                 bfd = data.table(digits = benford.digits,
                                  data.dist = empirical.distribution$dist,
                                  data.second.order.dist = second.order$dist,
                                  benford.dist = benford.dist,
                                  data.second.order.dist.freq = second.order$dist.freq,
                                  data.dist.freq = empirical.distribution$dist.freq,
                                  benford.dist.freq = benford.dist.freq,
                                  benford.so.dist.freq = benford.dist*n.second.order,
                                  data.summation = summation,
                                  abs.excess.summation = abs.excess.summation,
                                  difference = difference,
                                  squared.diff = squared.diff,
                                  absolute.diff = absolute.diff),

                 mantissa = data.table(statistic = c("Mean Mantissa", 
                                                     "Var Mantissa", 
                                                     "Ex. Kurtosis Mantissa",
                                                     "Skewness Mantissa"),
                                       values = c(mean.mantissa = mean.mantissa,
                                                  var.mantissa = var.mantissa,
                                                  ek.mantissa = ek.mantissa,
                                                  sk.mantissa = sk.mantissa)),
                 MAD = mean.abs.dev,

                 MAD.conformity = MAD.conformity,

                 distortion.factor = distortion.factor,

                 stats = list(chisq = chisq.bfd,
                              mantissa.arc.test = mat.bfd)
  )

  class(output) <- "Benford"

  return(output)

}

Properly calling variable name when creating multiple Benford plots

Tags:

r

benford.analysis

M.Viking

1 Answers

Chris

Recent Activity

Donate For Us

Properly calling variable name when creating multiple Benford plots

Tags:

r

benford.analysis

M.Viking

1 Answers

Chris

Related questions

Recent Activity

Donate For Us