
R: Count the number of files with a specific extension in different sub-directories/folders

Tags: r

I have a bibliographic directory (/Biblio) with 66 subdirectories (/01 folder, /02 folder, … /66 folder). Each contains a varying number of files with different extensions (e.g. pdf, txt, csv, …), plus sub-subfolders holding files with similar extensions, but I am not interested in the contents of those sub-subfolders. Some subfolders do not contain any “pdf” file. I want to count the number of “pdf” files in each subfolder.

I can list the pdf files in all subfolders of “/Biblio” with:

BiblioPath = "C:/Biblio"
BiblioDir = list.dirs(path = BiblioPath, full.names = TRUE, recursive = FALSE)
BiblioFiles = list.files(path = BiblioDir, pattern = "pdf", recursive = FALSE, full.names = TRUE) 

(Note: the string “pdf” never occurs in the file names themselves, only as the extension.) “BiblioFiles” is the full list of the pdf files, but I do not know how to count how many “pdf” files are in each subdirectory without a loop.

JASC asked Oct 28 '25 14:10


1 Answer

tidyverse:

library(tidyverse)

# Recursively list every file whose name ends in ".pdf"
fils <- list.files("~/Development", pattern = "\\.pdf$", full.names = TRUE, recursive = TRUE)

# One row per file (its parent directory), then count rows per directory
tibble(
  dir = dirname(fils)
) %>% 
  count(dir) %>% 
  mutate(dir = map_chr(dir, digest::digest)) # you don't need to see my dir names so just remove this from your work

## # A tibble: 14 x 2
##                                 dir     n
##                               <chr> <int>
##  1 06e6c4fed6e941d00c04cae3bd24888b    18
##  2 98bf27d6686a52772cb642a136473d86     9
##  3 c07bfc45ce148933269d7913e1c5e833     1
##  4 84088c9c18b0eb10478f17870886b481     1
##  5 baeb85661aad8bff2f2b52cb55f14ede     1
##  6 c484306deae0a70b46854ede3e6b317a    22
##  7 70750a506855c6c6e09f8bdff32550f8     4
##  8 8c5cbe2598f1f24f1549aaafd77b14c9     1
##  9 9008083601c1a75def1d1418d8acf39e     1
## 10 0c25ef8d27250f211d56eff8641f8beb     1
## 11 3e30987a34a74cb6846abc51e48e7f9e     1
## 12 e71c330b185bf4974d26d5379793671b     1
## 13 fe2e8912e58ba889cf7c6c3ec565b2ee     4
## 14 e07698c59f5c11ac61e927e91c2e8493    27

base:

# Recursively list every file whose name ends in ".pdf"
fils <- list.files("~/Development", pattern = "\\.pdf$", full.names = TRUE, recursive = TRUE)
dirs <- dirname(fils)
dirs <- sapply(dirs, digest::digest) # you don't need to see my dir names so just remove this from your work
# Tabulate the number of files per directory
as.data.frame(table(dirs))
##                                dirs Freq
## 1  06e6c4fed6e941d00c04cae3bd24888b   18
## 2  0c25ef8d27250f211d56eff8641f8beb    1
## 3  3e30987a34a74cb6846abc51e48e7f9e    1
## 4  70750a506855c6c6e09f8bdff32550f8    4
## 5  84088c9c18b0eb10478f17870886b481    1
## 6  8c5cbe2598f1f24f1549aaafd77b14c9    1
## 7  9008083601c1a75def1d1418d8acf39e    1
## 8  98bf27d6686a52772cb642a136473d86    9
## 9  baeb85661aad8bff2f2b52cb55f14ede    1
## 10 c07bfc45ce148933269d7913e1c5e833    1
## 11 c484306deae0a70b46854ede3e6b317a   22
## 12 e07698c59f5c11ac61e927e91c2e8493   27
## 13 e71c330b185bf4974d26d5379793671b    1
## 14 fe2e8912e58ba889cf7c6c3ec565b2ee    4
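
Both versions above search recursively, so PDFs in sub-subfolders are counted under their own directories, and folders without any PDF do not appear in the result. If you only want the immediate subfolders and want zero counts kept, a minimal base R sketch might look like the following. The helper name `count_pdfs` and the throwaway demo directory are hypothetical, used only for illustration, and `vapply` is still an implicit loop over the subfolders.

```r
# Hypothetical helper: count PDF files directly inside each immediate
# subfolder of `root`. Sub-subfolders are ignored and zero counts are kept.
count_pdfs <- function(root) {
  subdirs <- list.dirs(root, full.names = TRUE, recursive = FALSE)
  n <- vapply(
    subdirs,
    function(d) {
      hits <- list.files(d, pattern = "\\.pdf$", full.names = TRUE,
                         ignore.case = TRUE)
      # list.files() also returns directories, so drop any that matched
      sum(!dir.exists(hits))
    },
    integer(1)
  )
  data.frame(dir = basename(subdirs), n = unname(n), stringsAsFactors = FALSE)
}

# Demo on a temporary directory tree (assumed layout, not the asker's data)
root <- file.path(tempdir(), "Biblio_demo")
unlink(root, recursive = TRUE)
dir.create(file.path(root, "01 folder", "nested"), recursive = TRUE)
dir.create(file.path(root, "02 folder"))
file.create(file.path(root, "01 folder", c("a.pdf", "b.pdf", "notes.txt")))
file.create(file.path(root, "01 folder", "nested", "c.pdf"))

counts <- count_pdfs(root)
print(counts)
```

In the demo, "01 folder" holds two PDFs directly (the one in "nested" is not counted) and "02 folder" holds none, so it appears with a count of 0. For the question's layout you would call `count_pdfs("C:/Biblio")`.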

hrbrmstr answered Oct 31 '25 05:10