My team and I are using Unity Catalog in Databricks for ease of data storage and retrieval. So far so good, until I needed to install a library for reading Excel files easily...
I've hit a pretty big roadblock: according to Databricks, you give up the ability to use third-party libraries in UC. Is there any possible workaround? Spark does not have a native ability to open .xlsx files. Disabling UC would be a big setback, as it makes data retrieval/access straightforward for other teams.
I was thinking of running a notebook on a non-UC cluster and somehow passing the results to the UC-enabled notebook, but I think that's not possible, unless I'm missing something.

EDIT: User-defined functions, which are often needed for transformations, and their set of APIs also don't work in UC shared cluster mode.
I'm running into the same thing, as we just started a proof of concept for Unity Catalog. What I have found is that the limitation only applies to libraries installed as cluster libraries on shared clusters. You can still use third-party libraries as notebook-scoped libraries, e.g.:
%pip install library_of_interest
I have not found a way to install a library for more than one user or for more than one notebook.
Single user clusters seem to still allow you to install cluster libraries though!
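For the Excel case specifically, here is a rough sketch of how the notebook-scoped route could look. It assumes `openpyxl` has been installed with a `%pip` cell, that pandas is already available on the cluster, and that `spark` is the session Databricks provides in the notebook. The helper names, the file path, and the table name are all made up for illustration, and I have not checked this on every UC access mode.

```python
# Run in its own notebook cell first (Databricks magic, not Python):
#   %pip install openpyxl

import re


def sanitize_column(name):
    """Make an Excel header safe as a table column name (hypothetical helper)."""
    # Replace every run of characters other than letters, digits, and
    # underscores with a single underscore, then trim stray underscores.
    cleaned = re.sub(r"[^0-9A-Za-z_]+", "_", str(name).strip()).strip("_")
    return cleaned.lower() or "unnamed"


def excel_to_spark(spark, path, sheet_name=0):
    """Read one sheet with pandas and hand it to Spark (hypothetical helper)."""
    # Imported lazily so the function can be defined before the %pip cell runs.
    import pandas as pd

    pdf = pd.read_excel(path, sheet_name=sheet_name, engine="openpyxl")
    pdf.columns = [sanitize_column(c) for c in pdf.columns]
    return spark.createDataFrame(pdf)


# Hypothetical usage inside a notebook, where `spark` already exists:
# df = excel_to_spark(spark, "/Volumes/main/default/raw/report.xlsx")
# df.write.mode("overwrite").saveAsTable("main.default.report")
```

The header cleanup is there because Excel column titles often contain spaces and punctuation that table column names tend to reject. Two headers that clean to the same name would still collide, so treat this as a starting point only.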