I want to skip the lines that precede the header row when reading a file into a data frame. How can I skip every line before ID_REF or, if ID_REF is not present, find the first line matching ILMN_ and delete everything above it, keeping only the line immediately before it if that line does not contain #?
# GEOarchive matrix file.
ID_REF 1688628068_A.AVG_Signal 1688628068_A.Avg_NBEADS 1688628068_A.BEAD_STDERR 1688628068_A.Detection Pval
ILMN_1343291 62821.84 135 413.9399 0
ILMN_1343292 3255.167 131 47.76587 0
ILMN_1343293 42924.91 152 539.3026 0
ILMN_1343294 55255.21 100 746.1457 0
On Linux, you can filter the file with awk and read the result with fread, or pipe it into read.table. Here, awk also changes the delimiter to a comma.
pth <- '/home/akrun/file.txt' #change it to your path
v1 <- sprintf("awk '/^(ID_REF|ILMN)/{ matched = 1} matched {$1=$1; print}' OFS=\",\" %s", pth)
Then read the output of the command with fread:
library(data.table)
fread(cmd = v1)  # older data.table versions: fread(v1)
# ID_REF 1688628068_A.AVG_Signal 1688628068_A.Avg_NBEADS
#1: ILMN_1343291 62821.840 135
#2: ILMN_1343292 3255.167 131
#3: ILMN_1343293 42924.910 152
#4: ILMN_1343294 55255.210 100
# 1688628068_A.BEAD_STDERR 1688628068_A.Detection_Pval
#1: 413.93990 0
#2: 47.76587 0
#3: 539.30260 0
#4: 746.14570 0
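The comma conversion relies on a standard awk idiom: assigning `$1 = $1` makes awk rebuild the record, joining the fields with OFS, so any run of spaces between columns collapses into a single comma. A minimal sketch that can be tried directly in a shell (the helper name `to_csv` is hypothetical, used only for illustration):

```shell
# Hypothetical helper: re-join whitespace-separated fields with commas.
to_csv() {
  # $1 = $1 forces awk to rebuild the record using OFS as the separator.
  awk '{ $1 = $1; print }' OFS=","
}

# Two sample rows with uneven spacing between the columns.
printf 'ID_REF  A.AVG_Signal   A.Avg_NBEADS\nILMN_1343291 62821.84   135\n' | to_csv
```

Because awk splits on any whitespace by default, a column name that itself contains a space would be split into two fields by this step.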
Or using read.table
read.table(pipe(v1), header=TRUE, sep=',', check.names=FALSE)
# ID_REF 1688628068_A.AVG_Signal 1688628068_A.Avg_NBEADS
#1 ILMN_1343291 62821.840 135
#2 ILMN_1343292 3255.167 131
#3 ILMN_1343293 42924.910 152
#4 ILMN_1343294 55255.210 100
# 1688628068_A.BEAD_STDERR 1688628068_A.Detection_Pval
#1 413.93990 0
#2 47.76587 0
#3 539.30260 0
#4 746.14570 0
NOTE: I changed the column name from 1688628068_A.Detection Pval to 1688628068_A.Detection_Pval, since the space inside the name would be read as a field separator.
For some reason, the extra spaces create problems with fread, but not with read.table. So the following, which leaves the delimiter unchanged, also works with read.table:
v2 <- sprintf("awk '/^(ID_REF|ILMN)/{ matched = 1} matched { print}' %s", pth)
read.table(pipe(v2), header=TRUE, check.names=FALSE)
# ID_REF 1688628068_A.AVG_Signal 1688628068_A.Avg_NBEADS
#1 ILMN_1343291 62821.840 135
#2 ILMN_1343292 3255.167 131
#3 ILMN_1343293 42924.910 152
#4 ILMN_1343294 55255.210 100
# 1688628068_A.BEAD_STDERR 1688628068_A.Detection_Pval
#1 413.93990 0
#2 47.76587 0
#3 539.30260 0
#4 746.14570 0
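The commands above start printing at the first line that begins with ID_REF or ILMN. The question also asks for a fallback: when ID_REF is absent, keep the line immediately before the first ILMN_ line if it does not contain #. One way this could be sketched in awk is shown below; the function name `keep_from_header` is hypothetical, and the sketch assumes the header, when present, starts its line.

```shell
# Hypothetical helper: print the file from its header row onward.
keep_from_header() {
  awk '
    # Already past the preamble: pass every line through.
    started   { print; next }
    # An ID_REF line is the header: start printing here.
    /^ID_REF/ { started = 1; print; next }
    # First ILMN_ line with no ID_REF seen: keep the previous line
    # as the header, but only if it does not contain "#".
    /^ILMN_/  { if (prev != "" && prev !~ /#/) print prev
                started = 1; print; next }
    # Still in the preamble: remember the most recent line.
              { prev = $0 }
  ' "$1"
}
```

From R, the same awk program could be passed through pipe() to read.table in the way v2 is used above.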