
Is there any way to write into a Pandas dataframe using nothing as the delimiter?

Tags: python, pandas

I have a text file with many DNA sequences, one per line, each 20 base pairs long. I would like to read the file into a dataframe with each base in its own column, without a for loop or anything else that iterates explicitly through the entire file, since the file is very large.

I've tried using "" as the delimiter, but that just reads the entire line into one column. I've also tried "." and "\w", neither of which did what I wanted.

For example, for a file that has:

ACGT
CGTA
GTAC
TACG

The dataframe should look like this:

      1   2   3   4
1     A   C   G   T
2     C   G   T   A
3     G   T   A   C
4     T   A   C   G
Ethan Li asked Nov 22 '25 10:11


2 Answers

You can read each line as a single column and split it afterwards:

import pandas as pd

# Contents of the file ("dna.txt" is a placeholder name):
# ATGC
# CTAG

df = pd.read_csv("dna.txt", header=None)
# df
#       0
# 0  ATGC
# 1  CTAG

df[0].str.split('', expand=True)

Output:

    0   1   2   3   4   5
0       A   T   G   C
1       C   T   A   G

Splitting on an empty string leaves two extra empty columns, one at the front and one at the back. You can drop them easily, for example:

df[0].str.split('', expand=True).iloc[:,1:-1]

gives:

   1  2  3  4
0  A  T  G  C
1  C  T  A  G
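
For reference, here is a self-contained sketch of the read-then-split approach above, using the sample from the question. It assumes an in-memory `io.StringIO` buffer as a stand-in for the real file, and the column name `seq` is only an illustrative choice. It also shifts the row index so the result starts at 1, as in the layout the question asks for.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for the sequence file.
sample = io.StringIO("ACGT\nCGTA\nGTAC\nTACG\n")

# Read each line as one string column; the file has no header row.
df = pd.read_csv(sample, header=None, names=["seq"], dtype=str)

# Splitting on "" yields an empty string before the first character and
# after the last one, so the two outer columns are dropped.
bases = df["seq"].str.split("", expand=True).iloc[:, 1:-1]

# The remaining column labels are already 1..n; shift the row index
# so that it is 1-based as well.
bases.index = bases.index + 1

print(bases)
```

Note that `str.split` still processes every row internally; it only removes the explicit Python loop from your own code.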
Quang Hoang answered Nov 24 '25 01:11


You can use pandas.read_fwf (the fixed-width reader) instead of pandas.read_csv to accomplish this. Given a file named "dna.txt" containing:

ACGT
CGTA
GTAC
TACG

You can execute the following:

import pandas as pd

df = pd.read_fwf("dna.txt", header=None, widths=[1] * 4)  # four 1-character columns
print(df)

To output:

   0  1  2  3
0  A  C  G  T
1  C  G  T  A
2  G  T  A  C
3  T  A  C  G
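
The question mentions 20 bases per line, so the width count should not have to be hard-coded. Below is a minimal sketch that takes the sequence length from the first line. It assumes every line has the same length and again uses an in-memory `io.StringIO` buffer in place of the real file; the names `text` and `n_bases` are illustrative.

```python
import io

import pandas as pd

# Hypothetical sample; with a real file, read its first line instead.
text = "ACGT\nCGTA\nGTAC\nTACG\n"

# Sequence length taken from the first line (assumes all lines match it).
n_bases = len(text.splitlines()[0])

# One fixed-width field of width 1 per base.
df = pd.read_fwf(io.StringIO(text), header=None, widths=[1] * n_bases)

# Optional: relabel rows and columns from 1, as in the question's example.
df.index = df.index + 1
df.columns = range(1, n_bases + 1)

print(df)
```

For the 20-base file in the question, `n_bases` would come out as 20 and the rest stays the same.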
Primer81 answered Nov 24 '25 01:11