Is there any way to write into a Pandas dataframe using nothing as the delimiter?

Question

I have a text file with many DNA sequences, each one on a separate line with 20 base pairs. I would like to read the file into a dataframe with each base as its own column without using a for loop or something else that requires an iteration through the entire file, since the file is very large.

I've tried using "" as the delimiter, but that puts the entire line into a single column. I've also tried "." and "\w", neither of which did what I wanted.

For example, for a file that has:

ACGT
CGTA
GTAC
TACG

The dataframe should look like this:

      1   2   3   4
1     A   C   G   T
2     C   G   T   A
3     G   T   A   C
4     T   A   C   G

Quang Hoang · Accepted Answer

You can read each line into a single column and split it afterwards:

# csv
# ATGC
# CTAG

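import pandas as pd

# "sequences.txt" is a placeholder; pass the path to the file shown above
df = pd.read_csv("sequences.txt", header=None)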
# df
#       0
# 0  ATGC
# 1  CTAG

df[0].str.split('', expand=True)

Output:

    0   1   2   3   4   5
0       A   T   G   C
1       C   T   A   G

The split leaves two extra empty columns, one at the front and one at the back. You can drop them easily, for example:

df[0].str.split('', expand=True).iloc[:,1:-1]

gives:

   1  2  3  4
0  A  T  G  C
1  C  T  A  G
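
If the empty-string split behaves differently on your pandas version, one variant that avoids it is to turn each string into a list of characters. This is only a minimal sketch: it assumes the two sample lines from this answer, uses io.StringIO as a stand-in for the real file, and the names sample, raw and bases are illustrative.

```python
import io

import pandas as pd

# Stand-in for the real file; replace io.StringIO(...) with your file path.
sample = io.StringIO("ATGC\nCTAG\n")

# Read every line into a single string column labelled 0.
raw = pd.read_csv(sample, header=None, dtype=str)

# list("ATGC") gives ["A", "T", "G", "C"], so each row becomes a list of bases.
bases = pd.DataFrame(raw[0].map(list).tolist(), index=raw.index)

print(bases)
```

Note that map still visits every row in Python, so this may not be the fastest option for a very large file.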

Primer81 · Answer

You can use pandas.read_fwf instead of pandas.read_csv to accomplish this. read_fwf reads fixed-width columns, so each base can be declared as a column of width 1. If you have a file named "dna.txt" containing:

ACGT
CGTA
GTAC
TACG

You can execute the following:

import pandas as pd

# One single-character column per base; use [1] * 20 for 20-base sequences
df = pd.read_fwf("dna.txt", header=None, widths=[1] * 4)
print(df)

This prints:

   0  1  2  3
0  A  C  G  T
1  C  G  T  A
2  G  T  A  C
3  T  A  C  G
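
The same idea can be sketched without hard-coding the number of bases. The version below is an illustration rather than a definitive implementation: it assumes every line has the same length, reads the width from the first line, uses io.StringIO in place of "dna.txt", and relabels rows and columns from 1 to match the table in the question. The names text and n_bases are made up for the example.

```python
import io

import pandas as pd

# Stand-in for "dna.txt"; pass the real path to read_fwf in practice.
text = "ACGT\nCGTA\nGTAC\nTACG\n"

# Assumption: all lines are equally long, so the first line gives the width.
n_bases = len(text.splitlines()[0])

# One fixed-width column of width 1 per base.
df = pd.read_fwf(io.StringIO(text), header=None, widths=[1] * n_bases)

# Relabel rows and columns starting at 1, as in the question's example.
df.index = range(1, len(df) + 1)
df.columns = range(1, n_bases + 1)

print(df)
```

For the 20-base sequences in the question, n_bases would come out as 20, which is equivalent to passing widths=[1] * 20.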

Tags: python, pandas

Ethan Li
