I'm developing a Django app. Use-case scenario is this:
50 users, each of whom can store up to 300 time series, and each time series has around 7,000 rows.
Each user can, at any time, ask to retrieve all of their 300 time series and to perform some advanced data analysis on the last N rows of each. The analysis cannot be done in SQL, only in Pandas, where it doesn't take much time... but retrieving 300,000 rows into separate dataframes does!
Users can also ask for the results of some analyses that can be performed in SQL (like aggregation + sum by date), and those are considerably faster (to the point where I wouldn't be writing this post if that was all of it).
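For reference, this is roughly what those SQL-side analyses look like through the ORM. It's only a minimal sketch: it assumes the TimeSerieRow model shown further down, and "number" is just the placeholder field name from that model.

from django.db.models import Sum

# Aggregate entirely in SQL: sum "number" per date for a single series.
# `t` is one TimeSerie instance belonging to the user.
daily_totals = (
    TimeSerieRow.objects
    .filter(timeserie=t)
    .values("date")
    .annotate(total=Sum("number"))
    .order_by("date")
)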
After some browsing and thinking, I've concluded that storing time series in SQL is not a good solution (read here).
The ideal deployment architecture looks like this (each bucket is a separate server!):

Problem: time series in SQL are too slow to retrieve in a multi-user app.
Researched solutions (from this article):
PyStore: https://github.com/ranaroussi/pystore
Arctic: https://github.com/manahl/arctic
Here are some problems:
1) Although these solutions are massively faster for pulling a multi-million-row time series into a single dataframe, I might need to pull around 500,000 rows into 300 different dataframes. Would that still be as fast? (See the sketch after the bottleneck code below for what that per-series loop might look like.)
This is the current db structure I'm using:
class TimeSerie(models.Model):
    ...

class TimeSerieRow(models.Model):
    date = models.DateField()
    timeserie = models.ForeignKey(TimeSerie, on_delete=models.CASCADE)
    number = ...
    another_number = ...
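As an aside, the per-series, date-filtered queries below benefit from a composite index on (timeserie, date). The Meta block here is my own sketch of how that would typically be declared inside TimeSerieRow, not part of the actual model:

    class Meta:
        # assumed composite index for "rows of one series, filtered/ordered by date"
        indexes = [models.Index(fields=["timeserie", "date"])]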
And this is the bottleneck in my application:
# One query per series: ~300 round trips to the database, plus a DataFrame
# construction for each of them.
for t in TimeSerie.objects.filter(user=user):
    q = TimeSerieRow.objects.filter(timeserie=t).order_by("date")
    q = q.filter( ... time filters ...)
    df = pd.DataFrame(q.values())
    # ... analysis on df
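For comparison, here is what that same per-series loop might look like with PyStore, reading each series as its own item. This is just a sketch: the store/collection/item names are made up, and it assumes the data has already been written into PyStore (one item per time series):

import pystore

pystore.set_path("/data/pystore")                  # local (or shared) storage path
store = pystore.store("timeseries_store")          # hypothetical store name
collection = store.collection(f"user_{user.id}")   # hypothetical: one collection per user

dataframes = {}
for name in collection.list_items():
    # Each item comes back as a Parquet-backed dataframe; the "last N rows"
    # filtering would still happen in Pandas here.
    df = collection.item(name).to_pandas()
    dataframes[name] = df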
2) Even if PyStore or Arctic can do that faster, it would mean losing the ability to decouple my db from the Django instances: I'd be using the resources of one machine much more effectively, but I'd be stuck with a single machine and no scalability (or have to use as many separate databases as machines). Can PyStore/Arctic avoid this and provide an adapter for remote storage?
Is there a Python/Linux solution that can solve this problem? Which architecture can I use to overcome it? Should I give up on my app's scalability and/or accept that every N new users I'll have to spawn a separate database?
The article you refer to in your post is probably the best answer to your question. Clearly good research, and a few good solutions are proposed there (don't forget to take a look at InfluxDB).
Regarding the decoupling of the storage solution from your instances, I don't see the problem:
As long as you decouple the backing store from your instances and share it among them, you'll have the same setup as for your PostgreSQL database: MongoDB or InfluxDB can run on a separate, centralised instance, and the file storage for PyStore can be shared, e.g. via a shared mounted volume. The Python libraries that access these stores of course run on your Django instances, just like psycopg2 does.
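To make that concrete, here is a rough sketch of both options; the hostnames, library names and paths below are placeholders:

# Arctic: the data lives in MongoDB, which can run on its own server,
# so every Django instance just connects to the same host.
from arctic import Arctic

store = Arctic("mongodb.internal.example.com")    # central MongoDB instance
store.initialize_library("timeseries.user_data")  # one-off setup
library = store["timeseries.user_data"]
df = library.read("some_series_name").data        # .data is the DataFrame

# PyStore: purely file-based, so point it at a shared mounted volume
# (NFS or similar) that all instances can see.
import pystore
pystore.set_path("/mnt/shared/pystore")
store = pystore.store("timeseries_store")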