
Should I use Rails for consistency? (for ETL project)

CONTEXT

  • I'm new to Ruby and all that jazz, but I'm not new to dev.
  • I'm taking over a project based on 2 rails/puma repositories for web & APIs.
  • I'm building a new repository for a backend data processing app, using Kiba, that will run through scheduled jobs.
  • Also, I'm to be joined by other devs later on, so I'd like to make something maintainable by design.

MY QUESTION : Should I use Rails on that ETL project?

Using it means we can apply the same folder structure as the other repos, use RSpec all the same etc. It also appeared to me that Rails changes the way classes like Hash act.

At the same time, it seems to bring unnecessary complexity to a project that will run from the CLI and could consist of only a dozen files.

Tristan M asked Oct 17 '25 13:10


2 Answers

Kiba author here! This is an important question, thanks for asking it!

MY QUESTION : Should I use Rails on that ETL project?

By default, I would recommend starting with a separate project (a kind of "macro-service" approach), unless you have important things (more than just RSpec & ENV setup) to reuse from the Rails app.

If you expect significant coupling between the app and the ETL (e.g. by "scheduled jobs" you mean jobs triggered through Sidekiq to react to events, or you have classes shared between the 2 projects), then you can place the ETL in an etl subfolder of your Rails app, for instance. This provides a bit of separation and leaves you the opportunity to split the code out later if that becomes a better path (this is a middle ground I'm using on some projects).

If it is not the case, though, and the data pipeline is expected to become large and live its own life, you can instead split it to its own project.

Using it means we can apply the same folder structure as the other repos, use RSpec all the same etc.

You can use RSpec or minitest from a dedicated ETL (pure Ruby) project too, introduce a notion of ETL_ENV (development, test, production), build your own ENV-based (or file based) configuration with dotenv or similar, and support cron jobs from there too if you need that.
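As a minimal sketch of the ETL_ENV idea, assuming a plain Ruby project: `EtlConfig`, its method names, and the default of "development" are hypothetical choices for illustration, not something Kiba or dotenv provides. A gem such as dotenv could populate ENV from a file before this code runs; that step is left out here.

```ruby
# Hypothetical configuration module for a standalone (non-Rails) ETL project.
module EtlConfig
  VALID_ENVS = %w[development test production].freeze

  # Current environment, read from ETL_ENV; falls back to "development".
  # `vars` defaults to ENV but accepts a Hash, which keeps this easy to test.
  def self.env(vars = ENV)
    name = vars.fetch("ETL_ENV", "development")
    raise ArgumentError, "unknown ETL_ENV: #{name.inspect}" unless VALID_ENVS.include?(name)

    name
  end

  # Fetch a required setting, failing loudly when it is missing.
  def self.fetch(key, vars = ENV)
    vars.fetch(key) { raise KeyError, "missing required setting #{key}" }
  end
end
```

A job script could then branch on `EtlConfig.env` or read a connection string with `EtlConfig.fetch("DATABASE_URL")` (the key name is only an example).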

Pure Ruby projects can be structured just like a Rails app, and there is usually less magic (more explicit), which is helpful.

It also appeared to me that Rails changes the way classes like Hash act.

I would actually recommend being explicit about depending on that. Today I prefer to "cherry-pick" the exact extensions I need, at the top of each file (as described here).
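As a rough illustration of cherry-picking, the sketch below loads only the Hash key extensions from Active Support rather than all of Rails. `normalize_row` is a hypothetical helper, and the `rescue LoadError` branch is only there so the snippet still runs when the activesupport gem is not installed; in a real project you would list the gem in your Gemfile and drop the fallback.

```ruby
# Load only what this file needs from Active Support, not the whole framework.
begin
  require "active_support"
  require "active_support/core_ext/hash/keys" # Hash#symbolize_keys and friends
rescue LoadError
  # activesupport is not installed; the helper below falls back to plain Ruby.
end

# Hypothetical helper: turn the string keys of a row into symbols.
def normalize_row(row)
  if row.respond_to?(:symbolize_keys)
    row.symbolize_keys           # Active Support extension
  else
    row.transform_keys(&:to_sym) # plain Ruby (2.5+) equivalent for string keys
  end
end
```

Keeping the require at the top of the file makes it visible which non-standard Hash methods that file relies on.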

One last word: you can test whole Kiba ETL pipelines just as much as your individual ETL components, and I would recommend doing so (I will cover that in a future blog post), since it helps with moving things around, upgrading Ruby with ease, and generally scaling the team of developers (CI + tests).
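To give an idea of what testable components can look like, here is a small sketch in plain Ruby. The classes follow Kiba's conventions (a transform responds to `process(row)` and returns the row or nil to drop it; a destination responds to `write(row)` and `close`), but `ParseAmount`, `ArrayDestination` and `run_rows` are hypothetical names, and the snippet does not load Kiba itself.

```ruby
# Transform: parse the :amount field, dropping rows where it is not numeric.
class ParseAmount
  def process(row)
    amount = Float(row[:amount], exception: false)
    return nil if amount.nil? # returning nil dismisses the row

    row.merge(amount: amount)
  end
end

# Destination: collect rows in memory, which is handy in tests.
class ArrayDestination
  attr_reader :rows

  def initialize
    @rows = []
  end

  def write(row)
    @rows << row
  end

  def close; end
end

# Hypothetical helper that pushes rows through one transform and one
# destination by hand, so components can be checked without a job definition.
def run_rows(rows, transform, destination)
  rows.each do |row|
    result = transform.process(row)
    destination.write(result) if result
  end
  destination.close
  destination
end
```

In an RSpec or minitest suite, you would feed a handful of rows through `run_rows` and assert on `destination.rows`; the same in-memory destination can also be plugged into a real job definition to test a full pipeline.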

I hope this provides enough guidance for you to make a decision. If not, please leave a comment!

Thibaut Barrère answered Oct 19 '25 10:10


From my point of view, using Rails for ETL projects is overhead. Take a look at dry-rb. Using https://dry-rb.org/gems/dry-system/ you can build a small application to process data. There is also a gem for building CLIs: https://dry-rb.org/gems/dry-cli/

Here is a list of all dry-rb gems: https://dry-rb.org/gems/

Yakov answered Oct 19 '25 09:10