Writing big Python classes the right way [closed]

When writing a Python class that has separate functions for getting the data and for parsing the data, what is the most correct way to structure it? You can populate attributes such as self.data one by one and then run parse functions that populate self.parsed_data. Or is it more correct to write functions that accept the raw data as a parameter and return the parsed data?

Examples below. MyClass1 stores the data in instance attributes, while MyClass2 passes it between methods as parameters. I think MyClass2 is the more correct of the two.

So, what is correct? And why? I have been trying to decide between these two coding styles for a while, and I want to know which of them is considered best practice.

class MyClass1(object):
    def __init__(self):
        self.raw_data = None

    def _parse_data(self):
        # This is a fairly complex function xml/json parser
        raw_data = self.raw_data
        data = raw_data  # Much more work is done here to process raw_data
        cache.set('cache_key', data, 600)  # Cache for 10 minutes
        return data

    def _populate_data(self):
        # This function grabs data from an external source
        self.raw_data = 'some raw data, xml, json or alike..'

    def get_parsed_data(self):
        cached_data = cache.get('cache_key')
        if cached_data:
            return cached_data
        else:
            self._populate_data()
            return self._parse_data()

mc1 = MyClass1()
print(mc1.get_parsed_data())


class MyClass2(object):
    def _parse_data(self, raw_data):
        # This is a fairly complex function xml/json parser
        data = raw_data  # After some complicated work of parsing raw_data
        cache.set('cache_key', data, 600)  # Cache for 10 minutes
        return data

    def _get_data(self):
        # This function grabs data from an external source
        return 'some raw data, xml, json or alike..'

    def get_parsed_data(self):
        cached_data = cache.get('cache_key')
        if cached_data:
            return cached_data
        else:
            return self._parse_data(self._get_data())

mc2 = MyClass2()
print(mc2.get_parsed_data())
xeor asked Sep 11 '25 15:09


2 Answers

Ultimately, it comes down to personal preference. But IMO, it's better to just have a module-level function called parse_data which takes in the raw data, does a bunch of work and returns the parsed data. I assume your cache keys are somehow derived from the raw data, which means the parse_data function can also implement your caching logic.

The reason I prefer a function over a full-blown class is simplicity. If you want a class that exposes data fields pulled from your raw data, so that users of your objects can write something like obj.some_attr instead of digging through a lower-level data structure (e.g. JSON, XML, a Python dict, etc.), I would make a simple "value object" class that contains only data fields and no parsing logic. The aforementioned parse_data function would then return an instance of this class, essentially acting as a factory function for your data class. This leads to less state, simpler objects and no laziness, which makes your code easier to reason about.

This would also make it easier to unit test consumers of this class, because in those tests, you can simply instantiate the data object with fields, instead of having to provide a big blob of test raw data.
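As a rough illustration of the value object and factory function described above, assuming JSON input with `name` and `age` fields. The `Person` type, the field names, and the `greeting` consumer are all made up for this example.

```python
import json
from collections import namedtuple

# Hypothetical value object: data fields only, no parsing logic.
Person = namedtuple("Person", ["name", "age"])


def parse_data(raw_data):
    """Factory function: turn a raw JSON string into a Person."""
    fields = json.loads(raw_data)
    return Person(name=fields["name"], age=int(fields["age"]))


def greeting(person):
    """Example consumer that depends only on the value object."""
    return "Hello, {} ({})".format(person.name, person.age)


# A consumer test builds the object directly; no raw blob is needed.
assert greeting(Person(name="Ada", age=36)) == "Hello, Ada (36)"
```

The consumer never sees the raw format, so changing the parser does not affect its tests.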

tom answered Sep 13 '25 05:09


For me the most correct class is the class the user understands and uses with as few errors as possible.

When I look at MyClass2, I ask myself how I would use it:

mc2 = MyClass2()
print(mc2.get_parsed_data())

I would prefer to write only:

print(get_parsed_data())

Sometimes it is better to not write classes at all.
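One possible shape for such a class-free version, sketched under assumptions: `fetch_raw_data` and `parse_data` are placeholders for the real fetching and parsing steps, and the module-level `_cached` tuple replaces the question's cache backend. The `fetch` and `now` parameters are hypothetical hooks that exist only to make the function easy to test.

```python
import time

_CACHE_SECONDS = 600  # ten minutes, as in the question
_cached = None        # (expiry_time, parsed_data) or None


def fetch_raw_data():
    """Placeholder for the call to the external source."""
    return "some raw data, xml, json or alike.."


def parse_data(raw_data):
    """Placeholder for the complex parsing step."""
    return raw_data.upper()


def get_parsed_data(fetch=fetch_raw_data, now=time.monotonic):
    """Return parsed data, refetching only after the cached copy expires."""
    global _cached
    if _cached is not None and now() < _cached[0]:
        return _cached[1]
    parsed = parse_data(fetch())
    _cached = (now() + _CACHE_SECONDS, parsed)
    return parsed
```

Callers then write just `get_parsed_data()`, with no object to construct first.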

User answered Sep 13 '25 06:09