How to parse tree-like data into a nested list in Python?

I have some files with a tree-like structure. For example:

A
  Result
    a11
    a12
  Lolim
    a21
    a22
  Uplim
    a31
    a32
B
  Result
    b11
    b12
  Lolim
    b21
    b22

I would like to parse these files to obtain a dataframe that looks like this:

Name Result Lolim Uplim
A    a12    a22   a32
B    b12    b22   NA

My idea was to split the file into two parts, A and B, and then split each part into subcategories: Result, Lolim and Uplim for A, and Result and Lolim for B. Finally, each subcategory would be split into its 2 values. That would give me a nested list from which I could create a dataframe. But I don't know how to obtain this nested list.

Or is there another method for this? Can you recommend modules or functions that would be useful?

sanyi14ka asked Feb 20 '26 20:02


1 Answer

import collections
import pandas as pd

with open("data_tree.dat", "r") as data:
    dct = collections.OrderedDict()
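    key = ""
    sub_key = ""
    sub_indent = None  # indentation of the subcategory lines, learned from the file
    for line in data:
        if not line.strip():  # skip blank lines
            continue
        indent = len(line) - len(line.lstrip(" "))  # number of leading spaces
        if indent == 0:  # top level, e.g. "A"
            key = line.strip()
            dct[key] = collections.OrderedDict()
        elif sub_indent is None or indent <= sub_indent:  # subcategory, e.g. "Result"
            sub_indent = indent
            sub_key = line.strip()
            dct[key][sub_key] = ""
        else:  # indented deeper than a subcategory: a value
            item = line.strip()
            dct[key][sub_key] = item  # overwrite, last element only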

df = pd.DataFrame.from_dict(dct).transpose()
df.columns.names = ["Name"]
df = df[["Result", "Lolim", "Uplim"]]  # if column order matters
df = df.fillna("NA")  # in case you want NA and not NaN

print(df)

Output:

Name Result Lolim Uplim
A       a12   a22   a32
B       b12   b22   NA

This assumes that data_tree.dat has the tree structure shown in the question and is located in the same folder as the .py file containing the above code.

Or as a function:

import collections
import pandas as pd


def dat_to_df(path_to_file):
    with open(path_to_file, "r") as data:
        dct = collections.OrderedDict()
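        key = ""
        sub_key = ""
        sub_indent = None  # indentation of the subcategory lines, learned from the file
        for line in data:
            if not line.strip():  # skip blank lines
                continue
            indent = len(line) - len(line.lstrip(" "))  # number of leading spaces
            if indent == 0:  # top level, e.g. "A"
                key = line.strip()
                dct[key] = collections.OrderedDict()
            elif sub_indent is None or indent <= sub_indent:  # subcategory, e.g. "Result"
                sub_indent = indent
                sub_key = line.strip()
                dct[key][sub_key] = ""
            else:  # indented deeper than a subcategory: a value
                item = line.strip()
                dct[key][sub_key] = item  # overwrite, last element only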

    df = pd.DataFrame.from_dict(dct).transpose()
    df.columns.names = ["Name"]
    df = df[["Result", "Lolim", "Uplim"]]
    return df.fillna("NA")

dataframe = dat_to_df("data_tree.dat")

print(dataframe)
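
If you want the nested structure itself, as described in the question, rather than only the last value of each subcategory, something along these lines might work. It is only a sketch: the helper names parse_tree and last_value_rows are made up for illustration, the sample text is copied from the question, and it assumes that top-level names are not indented and that values are indented deeper than their subcategory.

```python
import io

# Sample data copied from the question (two spaces per indentation level).
SAMPLE = """\
A
  Result
    a11
    a12
  Lolim
    a21
    a22
  Uplim
    a31
    a32
B
  Result
    b11
    b12
  Lolim
    b21
    b22
"""


def parse_tree(lines):
    """Return {name: {subcategory: [values, ...]}} built from indented lines."""
    tree = {}
    name = sub = None
    sub_indent = None  # indentation of subcategory lines, learned from the input
    for line in lines:
        text = line.strip()
        if not text:
            continue  # ignore blank lines
        indent = len(line) - len(line.lstrip(" "))  # number of leading spaces
        if indent == 0:  # top level, e.g. "A"
            name = text
            tree[name] = {}
        elif sub_indent is None or indent <= sub_indent:  # subcategory
            sub_indent = indent
            sub = text
            tree[name][sub] = []
        else:  # deeper than a subcategory: a value, keep all of them
            tree[name][sub].append(text)
    return tree


def last_value_rows(tree, columns):
    """Build one row per name with the last value of each column, or "NA"."""
    rows = []
    for name, subs in tree.items():
        row = [name]
        for col in columns:
            values = subs.get(col, [])
            row.append(values[-1] if values else "NA")
        rows.append(row)
    return rows


tree = parse_tree(io.StringIO(SAMPLE))
rows = last_value_rows(tree, ["Result", "Lolim", "Uplim"])
for row in rows:
    print(row)
```

The rows can then be handed to pandas, for example with pd.DataFrame(rows, columns=["Name", "Result", "Lolim", "Uplim"]), while tree keeps every value in case you need more than the last one later.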
Spherical Cowboy answered Feb 22 '26 10:02