parse a continuous text file containing only lines into a pandas dataframe

Question

I have a text file whose lines repeat in a fixed pattern, and I want to convert it into a dataframe.

10/21/2019
abcdef
100.00
10/22/2019
ghijk
120.00

There is an obvious pattern (each record spans three lines: date, description, amount), and I'd like the dataframe to look like this:

Date       | Description | Amount
10/21/2019 | abcdef      | 100.00
10/22/2019 | ghijk       | 120.00

How is this done?

Thanks.

sammywemmy · Accepted Answer

Use a bit of regex to pull out the details, then forward-fill the first two columns and drop the rows that still contain nulls:

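import pandas as pd

# df1 is assumed to hold the file's lines in a single string column named "text"
# Amount accepts any two-decimal value, not only ones ending in .00
pattern = r"(?P<Date>\d{2}/\d{2}/\d{4})|(?P<Description>[a-z]+)|(?P<Amount>\d+\.\d{2})"

res = (df1.text.str.extract(pattern)
       .assign(Date=lambda x: x.Date.ffill(),
               Description=lambda x: x.Description.ffill())
       .dropna(how='any')
      )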

res


         Date Description  Amount
2  10/21/2019      abcdef  100.00
5  10/22/2019       ghijk  120.00
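For reference, here is a self-contained sketch of the regex approach that runs as-is. It assumes the file is loaded with `pd.read_csv(..., header=None, names=["text"], dtype=str)`; the question's sample lines are fed in through `io.StringIO` in place of a real file, and the Amount group accepts any two-decimal value (`\d+\.\d{2}`) rather than only `.00`. A minimal sketch, not the answerer's exact code.

```python
import io

import pandas as pd

# Sample input from the question; io.StringIO stands in for the real file
raw = """10/21/2019
abcdef
100.00
10/22/2019
ghijk
120.00
"""

# One string column named "text", one row per line (dtype=str keeps "100.00" as text)
df1 = pd.read_csv(io.StringIO(raw), header=None, names=["text"], dtype=str)

# Each line matches exactly one of the three named groups; the others become NaN
pattern = r"(?P<Date>\d{2}/\d{2}/\d{4})|(?P<Description>[a-z]+)|(?P<Amount>\d+\.\d{2})"

res = (
    df1["text"].str.extract(pattern)
    # carry the date and description down to the amount row of each record
    .assign(Date=lambda x: x["Date"].ffill(),
            Description=lambda x: x["Description"].ffill())
    # only the amount rows now have all three fields filled
    .dropna(how="any")
    # renumber rows 0, 1, ... instead of keeping the original line positions
    .reset_index(drop=True)
)
print(res)
```

The `reset_index(drop=True)` step is an addition; without it the rows keep the labels 2 and 5, as in the output above.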

If you'd rather avoid regex and the format is fixed (every record is exactly three lines, so the total line count is a multiple of 3), you can reshape the data with numpy and create a new dataframe.

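import numpy as np
import pandas as pd

# reshape the single column into rows of 3 (date, description, amount)
# thanks to @Chester: removes unnecessary computation
res = np.reshape(df1.to_numpy(), (-1, 3))

# create the new dataframe
pd.DataFrame(res, columns=['Date', 'Description', 'Amount'])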

         Date Description  Amount
0  10/21/2019      abcdef  100.00
1  10/22/2019       ghijk  120.00
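A self-contained sketch of the reshape approach, again using `io.StringIO` in place of the real file. It adds an explicit check that the line count is a multiple of 3 (the assumption `reshape` depends on), so a truncated file fails with a clear message instead of a generic reshape error. The check is an illustrative addition, not part of the answer.

```python
import io

import numpy as np
import pandas as pd

# Sample input from the question; io.StringIO stands in for the real file
raw = "10/21/2019\nabcdef\n100.00\n10/22/2019\nghijk\n120.00\n"

df1 = pd.read_csv(io.StringIO(raw), header=None, names=["text"], dtype=str)

values = df1["text"].to_numpy()
# reshape only works if every record is complete (line count divisible by 3)
if len(values) % 3 != 0:
    raise ValueError(f"expected a multiple of 3 lines, got {len(values)}")

# row-major reshape: lines 0-2 become row 0, lines 3-5 become row 1, ...
res = pd.DataFrame(np.reshape(values, (-1, 3)),
                   columns=["Date", "Description", "Amount"])
print(res)
```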

wombatonfire · Answer

Read the raw data from the file into a Series and take its underlying PandasArray, so that the slices below are combined by position instead of being aligned on their index labels:

import pandas as pd
raw_data = pd.read_csv(r"path\to\a\data\file.txt", names=['raw_data'], dtype=str).squeeze("columns").array

Create a DataFrame using slicing:

df = pd.DataFrame(data={'Date': raw_data[::3], 'Description': raw_data[1::3], 'Amount': raw_data[2::3]})

Just two simple steps, with no regexes or extra transformations. Short and efficient.
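A runnable sketch of the slicing approach using the question's sample lines via `io.StringIO`. The `squeeze=True` keyword of `read_csv` was deprecated in pandas 1.4 and removed in 2.0, so this version calls `DataFrame.squeeze("columns")` instead. The final dtype conversion (`pd.to_datetime` with a `%m/%d/%Y` format, `astype(float)`) is an optional extra, assuming the dates are month/day/year as in the sample.

```python
import io

import pandas as pd

# Sample input from the question; io.StringIO stands in for the real file
raw = "10/21/2019\nabcdef\n100.00\n10/22/2019\nghijk\n120.00\n"

# One value per line; squeeze("columns") turns the one-column frame into a Series
raw_data = (
    pd.read_csv(io.StringIO(raw), header=None, names=["raw_data"], dtype=str)
    .squeeze("columns")
    .array  # plain array: the slices below are combined by position, not index label
)

# Every 3rd line starting at offsets 0, 1, 2 gives the three columns
df = pd.DataFrame({
    "Date": raw_data[::3],
    "Description": raw_data[1::3],
    "Amount": raw_data[2::3],
})

# Optional: convert from strings to proper dtypes
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")
df["Amount"] = df["Amount"].astype(float)
print(df)
```

Using the array rather than the Series matters: if the three Series slices were passed directly, the DataFrame constructor would align them on their index labels (0, 3 vs 1, 4 vs 2, 5) and fill the gaps with NaN.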

Tags: python, pandas

gio888
