 

Python 3 - how to make mosaic plots from higher-dimensional data?

I have a pandas-DataFrame:

import pandas as pd

data = pd.read_csv(r'C:\data-path\demographics.csv', sep=',')
print(data)

PersonID  Married  No. of Children  Sex
1         yes      0                male
2         no       0                female
3         no       1                male
4         yes      1                male
5         no       1                female
6         no       2                female
7         no       1                male
8         no       2                male
9         no       2                male
10        no       1                male
11        no       0                female

Now I try to create a mosaic plot from it, using statsmodels.graphics.mosaicplot:

from statsmodels.graphics.mosaicplot import mosaic

mosaic(data, ['Married', 'No. of Children'])

...which works. However, whenever I try to add a third dimension, for example:

mosaic(data, ['Married', 'No. of Children', 'Sex'])

... I get the following error message:

ValueError: at least one proportion should begreater than zero

I am not sure what it wants from me. Is there some parameter missing or wrongly set?

It also doesn't matter which columns/dimensions I choose, or in what order. Whenever I have more than 2, I get an error.

Anybody have an idea?

Thanks in advance

Carlo1990 asked Nov 29 '25 14:11


1 Answer

After some tinkering, I didn't find a solution, but I did find the origin of the bug.

It lies within the source code of the mosaicplot module: http://nipy.bic.berkeley.edu/nightly/statsmodels/doc/html/_modules/statsmodels/graphics/mosaicplot.html

In short: it cannot handle classes in a dataset that are empty, i.e. that have 0 instances.

Consider the dataset from the original question and then the following function call:

mosaic(data, ['Married', 'No. of Children'])

The mosaic function first determines how many classes the first category has (in this case: 2) and how often each of these classes occurs. Based on that, it generates a list of 'proportions' for the plot's rectangles, which for the 'Married' category is

[2, 9]

...as there are 2 'yes' and 9 'no' instances.
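To make the first-level counting concrete, here is a minimal sketch that reproduces the [2, 9] list from the question's data using only the standard library. It illustrates the counting idea only and is not the statsmodels implementation; the ROWS literal and the 'yes'-before-'no' ordering are assumptions chosen to match the list above.

```python
from collections import Counter

# The 11 rows from the question: (PersonID, Married, No. of Children, Sex)
ROWS = [
    (1, "yes", 0, "male"), (2, "no", 0, "female"), (3, "no", 1, "male"),
    (4, "yes", 1, "male"), (5, "no", 1, "female"), (6, "no", 2, "female"),
    (7, "no", 1, "male"), (8, "no", 2, "male"), (9, "no", 2, "male"),
    (10, "no", 1, "male"), (11, "no", 0, "female"),
]

# Count how often each class of the first category ('Married') occurs
married_counts = Counter(married for _, married, _, _ in ROWS)

# Proportions in the order used above: 'yes' first, then 'no'
proportions = [married_counts["yes"], married_counts["no"]]
print(proportions)  # [2, 9]
```

With the DataFrame from the question, data['Married'].value_counts() gives the same counts, though pandas orders them by frequency.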

For each of these classes, the rectangle is split again according to the second category, here 'No. of Children'. There are 3 classes (0, 1, and 2), and this generates the following proportions:

[1, 1, 0] (1 married with 0 children, 1 married with 1 child, 0 married with 2 children)

[2, 4, 3] (2 singles with 0 children, 4 singles with 1 child, 3 singles with 2 children)

Based on the above alone, it is able to draw a perfectly fine mosaic plot.
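The second-level split can be checked the same way. This sketch counts (Married, No. of Children) pairs with a Counter, which returns 0 for pairs that never occur. That is exactly where the 0 in [1, 1, 0] comes from. The children levels (0, 1, 2) are hard-coded here as an assumption taken from the question's data.

```python
from collections import Counter

# The 11 rows from the question: (PersonID, Married, No. of Children, Sex)
ROWS = [
    (1, "yes", 0, "male"), (2, "no", 0, "female"), (3, "no", 1, "male"),
    (4, "yes", 1, "male"), (5, "no", 1, "female"), (6, "no", 2, "female"),
    (7, "no", 1, "male"), (8, "no", 2, "male"), (9, "no", 2, "male"),
    (10, "no", 1, "male"), (11, "no", 0, "female"),
]

# Count each (Married, No. of Children) combination
pair_counts = Counter((married, children) for _, married, children, _ in ROWS)

# For each 'Married' class, one proportion per 'No. of Children' class
children_levels = (0, 1, 2)
splits = {m: [pair_counts[(m, c)] for c in children_levels] for m in ("yes", "no")}
print(splits)  # {'yes': [1, 1, 0], 'no': [2, 4, 3]}
```

In pandas, pd.crosstab(data['Married'], data['No. of Children']) shows the same contingency table, including the 0 cell.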

However, once we take a third category into account (for instance 'Sex'), the 0 in one of the lists above becomes a problem. It produces the list [0, 0], as there are 0 married men and 0 married women with 2 children.
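A small sketch can show which third-level split ends up all zeros for the question's data. It computes, for every (Married, No. of Children) rectangle, the [male, female] counts and collects the rectangles whose split sums to zero. The level orderings are assumptions for illustration, not taken from statsmodels.

```python
from collections import Counter

# The 11 rows from the question: (PersonID, Married, No. of Children, Sex)
ROWS = [
    (1, "yes", 0, "male"), (2, "no", 0, "female"), (3, "no", 1, "male"),
    (4, "yes", 1, "male"), (5, "no", 1, "female"), (6, "no", 2, "female"),
    (7, "no", 1, "male"), (8, "no", 2, "male"), (9, "no", 2, "male"),
    (10, "no", 1, "male"), (11, "no", 0, "female"),
]

# Count every (Married, No. of Children, Sex) combination; unseen keys count as 0
triple_counts = Counter((married, children, sex) for _, married, children, sex in ROWS)

empty_cells = []
for married in ("yes", "no"):
    for children in (0, 1, 2):
        # Third-level split of this rectangle: [male count, female count]
        split = [triple_counts[(married, children, sex)] for sex in ("male", "female")]
        if sum(split) == 0:
            empty_cells.append((married, children))

print(empty_cells)  # [('yes', 2)]
```

Only the "married with 2 children" rectangle gets the all-zero split [0, 0]; every other rectangle has at least one person in it.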

In line 45 of the linked source code, there is an if-clause that raises this exception for lists that are all zeros (as they are 'not meaningful').
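The guard described above can be imitated with a hypothetical helper, check_split. This is not the statsmodels code; it only mirrors the intent of rejecting negative proportions and lists that are all zero. It shows why [1, 0] passes while the [0, 0] split from the third category triggers the ValueError.

```python
def check_split(proportion):
    """Hypothetical stand-in for the if-clause described above (not statsmodels' code)."""
    values = [float(p) for p in proportion]
    if any(v < 0 for v in values):
        raise ValueError("proportions should be positive")
    if all(v == 0 for v in values):
        # An all-zero split cannot be divided into meaningful rectangles
        raise ValueError("at least one proportion should be greater than zero")
    return values


accepted = check_split([1, 0])  # fine: one class is non-empty
try:
    check_split([0, 0])         # the 'married with 2 children' split
    message = None
except ValueError as exc:
    message = str(exc)
print(message)
```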

As said above, I was not able to find a fix or workaround for this. Simply commenting out said if-clause allows all the splits to be performed normally. However, drawing the mosaic plot then throws an exception in matplotlib's backend_agg.py, because somehow there are now NaN (not a number) values.
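One plausible source of the NaN values, assuming (as the linked source appears to do) that each split turns its proportions into cumulative edges and scales them by the total: for an all-zero split, that total is 0, and 0/0 in floating point is NaN. The helper split_edges below is hypothetical and only illustrates this arithmetic with NumPy.

```python
import numpy as np


def split_edges(proportion):
    """Hypothetical: cumulative rectangle edges scaled to [0, 1] by the total."""
    edges = np.concatenate(([0.0], np.cumsum(np.asarray(proportion, dtype=float))))
    with np.errstate(invalid="ignore"):  # silence the 0/0 RuntimeWarning
        return edges / edges[-1]


ok = split_edges([1, 1, 0])  # array([0. , 0.5, 1. , 1. ])
bad = split_edges([0, 0])    # array([nan, nan, nan])
print(ok, bad)
```

Coordinates like these would then reach matplotlib as NaN, which fits the backend_agg.py error, although this sketch does not prove that statsmodels takes exactly this path.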

I have no idea why this happens, and I would be glad if someone brighter and more experienced than me could look into it.

I still won't rule out that I just have to set some parameters differently.

Carlo1990 answered Dec 01 '25 07:12


