
Azure Data Factory Data-Set Slicing

I have some trouble understanding slicing (dataset availability) in Azure Data Factory. Say I have a source dataset that never changes, and for some reason I set up hourly slicing for it. Will each slice then be identical? What is the point of using slices at all in that case (i.e. why is it required)? In another case, say my source dataset is continuously appended with new data (for example, an event log), and each morning I want to run some analysis over the full history of that log. Should I then set up daily slicing? Will each slice include the full history or just the last day?

Lars asked Dec 06 '25 05:12



1 Answer

The slices are the intervals at which the pipeline is executed within the period defined by the pipeline's start and end properties. If you have a fixed source and you execute an activity more than once, it will always read the same data, because the source does not change. Let's say you set the start and end times one day apart and set the frequency to 1 hour: the activity will be executed 24 times. You will have 24 slices, all using the same data source.
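To make the arithmetic concrete, here is a minimal Python sketch of how a one-day active period divides into hourly slices. It is not Data Factory code: the `slice_windows` helper and the example dates are hypothetical and only illustrate the counting.

```python
from datetime import datetime, timedelta

def slice_windows(start, end, interval):
    """Yield (slice_start, slice_end) pairs covering the period [start, end)."""
    current = start
    while current < end:
        # The last window is clipped so it never runs past the period end.
        yield current, min(current + interval, end)
        current += interval

# Hypothetical active period: exactly one day.
start = datetime(2025, 12, 6)
end = start + timedelta(days=1)

# Hourly frequency gives one window per hour of the active period.
slices = list(slice_windows(start, end, timedelta(hours=1)))
print(len(slices))  # 24
```

Each window only describes a time range. If the source never changes, every one of the 24 runs reads the same data.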

For your second scenario, where the data keeps changing, you can set the frequency to once a day. What gets processed depends on the activity you define in the pipeline. For example, the pipeline might delete the old source data once it finishes processing, or the activity might contain logic that takes only the new data.
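As a rough illustration of "logic in the activity that takes only the new data", the Python sketch below filters an event log by a daily slice window. The `events_in_slice` helper, the event records, and the `timestamp` field are all assumptions made for this example, not part of Data Factory.

```python
from datetime import datetime, timedelta

def events_in_slice(events, slice_start, slice_end):
    """Return only the events whose timestamp falls inside [slice_start, slice_end)."""
    return [e for e in events if slice_start <= e["timestamp"] < slice_end]

# Hypothetical event log that keeps growing over time.
events = [
    {"timestamp": datetime(2025, 12, 5, 23, 30), "message": "old"},
    {"timestamp": datetime(2025, 12, 6, 8, 0), "message": "new"},
    {"timestamp": datetime(2025, 12, 6, 17, 45), "message": "newer"},
]

# Daily slice for 2025-12-06.
slice_start = datetime(2025, 12, 6)
slice_end = slice_start + timedelta(days=1)

# An activity that filters on the slice window sees only that day's events.
daily = events_in_slice(events, slice_start, slice_end)

# An activity that ignores the window sees the full history on every run.
full_history = list(events)
```

In this sketch the slice only supplies the time window. Whether a run processes one day or the whole log is decided by the activity's own logic.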

Nava Vaisman Levy answered Dec 08 '25 19:12



