Given this data:
foo kk type1 1 2 3
bar kk type2 3 5 1
I would like to create a dictionary of dictionaries of lists.
In Perl this is called a hash of hashes of arrays. It can be achieved with the following line (executable here: https://eval.in/118535):
push @{$hohoa{$name}{$type}},($v1,$v2,$v3);
Output of $hohoa in Perl:
$VAR1 = {
          'bar' => {
                     'type2' => [
                                  '3',
                                  '5',
                                  '1'
                                ]
                   },
          'foo' => {
                     'type1' => [
                                  '1',
                                  '2',
                                  '3'
                                ]
                   }
        };
What's the way to do it in Python?
Update: Why doesn't the following for-loop variation store all the values?
#!/usr/bin/env python
import sys
import pprint
from collections import defaultdict
outerdict = defaultdict(dict)
with open('data.txt') as infh:
    for line in infh:
        name, _, type_, values = line.split(None, 3)
        valist = values.split();
        for i in range(len(valist)):
            thval = valist[i];
            outerdict[name][type] = thval
pp = pprint.PrettyPrinter(indent=4)
pp.pprint(outerdict)
It prints this:
defaultdict(<type 'dict'>, {'foo': {<type 'type'>: '3'}, 'bar': {<type 'type'>: '1'}})
Update 2: The output seems problematic when the data looks like this:
foo kk type1 1.2 2.10 3.3
bar kk type2 3.2 5.2 1.0
It depends on what you are trying to achieve; how many keys should be added to the inner dict?
The simplest way is to just create new dict literals for the inner dict:
outerdict = {}
outerdict[name] = {type_: [v1, v2, v3]}
or you could use dict.setdefault() to materialize the inner dict as needed:
outerdict.setdefault(name, {})[type_] = [v1, v2, v3]
or you could use collections.defaultdict() to have it handle new values for you:
from collections import defaultdict
outerdict = defaultdict(dict)
outerdict[name][type_] = [v1, v2, v3]
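To make the difference between the three options concrete, here is a small sketch that stores a second type under the same name. The sample values and the variable names (literal, with_setdefault, with_defaultdict) are made up for illustration. The dict literal replaces the whole inner dict, while the other two add a key to it:

```python
from collections import defaultdict

name = "foo"

# 1. dict literal: assigning again replaces the whole inner dict
literal = {}
literal[name] = {"type1": [1, 2, 3]}
literal[name] = {"type2": [4, 5, 6]}

# 2. dict.setdefault(): the inner dict is created once, then reused
with_setdefault = {}
with_setdefault.setdefault(name, {})["type1"] = [1, 2, 3]
with_setdefault.setdefault(name, {})["type2"] = [4, 5, 6]

# 3. defaultdict(dict): a missing inner dict is created on first access
with_defaultdict = defaultdict(dict)
with_defaultdict[name]["type1"] = [1, 2, 3]
with_defaultdict[name]["type2"] = [4, 5, 6]

print(literal)                 # only type2 is left
print(with_setdefault)         # both type1 and type2
print(dict(with_defaultdict))  # both type1 and type2
```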
When parsing a file line by line, I'd use the last option, albeit a little simplified:
from collections import defaultdict
outerdict = defaultdict(dict)
with open(filename) as infh:
    for line in infh:
        name, _, type_, *values = line.split()
        outerdict[name][type_] = [int(i) for i in values]
This uses Python 3 syntax to capture the remaining whitespace-delimited values on the line past the first 3 into values.
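As a quick check of the starred assignment, the following sketch runs the same loop over the two sample lines from the question. It assumes Python 3 and uses io.StringIO as a stand-in for the open file, so no data.txt is needed:

```python
import io
from collections import defaultdict

# In-memory stand-in for the file, holding the sample data from the question
sample = io.StringIO("foo kk type1 1 2 3\nbar kk type2 3 5 1\n")

outerdict = defaultdict(dict)
for line in sample:
    # *values collects every field after the first three into a list
    name, _, type_, *values = line.split()
    outerdict[name][type_] = [int(i) for i in values]

print(dict(outerdict))
```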
The Python 2 version would be:
with open(filename) as infh:
    for line in infh:
        name, _, type_, values = line.split(None, 3)
        outerdict[name][type_] = map(int, values.split())
where I limited the whitespace split to just 3 splits (giving you 4 values), then split the values string separately.
To have the inner-most list accumulate all values for repeated (name, type_) key combinations, you'll need to use a slightly more complex defaultdict setup; one that produces an inner defaultdict() set to produce list values:
outerdict = defaultdict(lambda: defaultdict(list))
with open(filename) as infh:
    for line in infh:
        name, _, type_, values = line.split(None, 3)
        outerdict[name][type_].extend(map(int, values.split()))
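This also bears on the two updates in the question. The loop there assigns outerdict[name][type] = thval on every pass, so each value overwrites the previous one, and it uses the built-in type rather than type_ as the key, which is why the output shows <type 'type'>. The decimal data in the second update cannot be parsed with int(), so float() is the likely fix. Below is a sketch of the accumulating version with those two changes. The third input line is a made-up repeat of (foo, type1), added only to show the accumulation, and io.StringIO stands in for the file:

```python
import io
from collections import defaultdict

# Sample data from the second update, plus one hypothetical repeated key
sample = io.StringIO(
    "foo kk type1 1.2 2.10 3.3\n"
    "bar kk type2 3.2 5.2 1.0\n"
    "foo kk type1 4.5\n"
)

outerdict = defaultdict(lambda: defaultdict(list))
for line in sample:
    name, _, type_, values = line.split(None, 3)
    # extend() appends to the existing list instead of replacing it;
    # float() accepts decimal strings that int() would reject
    outerdict[name][type_].extend(float(v) for v in values.split())

for name, inner in outerdict.items():
    print(name, dict(inner))
```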
For the file you actually posted, I'd use a different approach altogether:
import csv
from collections import defaultdict
from itertools import islice
outerdict = defaultdict(lambda: defaultdict(list))
with open('ImmgenCons_all_celltypes_MicroarrayExp.csv', 'rb') as infh:
    reader = csv.reader(infh, skipinitialspace=True)
    # first row contains metadata we need
    celltypes = next(reader, [])[3:]
    # next two rows can be skipped
    next(islice(infh, 2, 2), None)
    for row in reader:
        name = row[1]
        for celltype, value in zip(celltypes, row[3:]):
            outerdict[name][celltype].append(float(value))
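The code above is written for Python 2 (note the 'rb' mode). For Python 3, a rough equivalent could look like the sketch below. The sample rows are invented, and the layout (cell types from the fourth header column on, two rows to skip, the name in the second column) is only what the code above implies, so treat it as an assumption about the real file. Here the two rows are skipped through the reader itself rather than through the file object:

```python
import csv
import io
from collections import defaultdict

# Hypothetical stand-in for the real CSV; contents are made up
sample = io.StringIO(
    "id, name, desc, typeA, typeB\n"
    "skip, skip, skip, skip, skip\n"
    "skip, skip, skip, skip, skip\n"
    "1, foo, x, 1.2, 2.10\n"
    "2, bar, y, 3.2, 5.2\n"
    "3, foo, z, 3.3, 1.0\n"
)

outerdict = defaultdict(lambda: defaultdict(list))
reader = csv.reader(sample, skipinitialspace=True)
# first row contains the cell type names, after three leading columns
celltypes = next(reader, [])[3:]
# skip the next two rows
for _ in range(2):
    next(reader, None)
for row in reader:
    name = row[1]
    for celltype, value in zip(celltypes, row[3:]):
        outerdict[name][celltype].append(float(value))
```

With a real file in Python 3, you would open it in text mode with newline='' as the csv module documentation recommends, instead of 'rb'.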