
Joining massive CTE tables (13,000,000 rows+) performance problems

We have a production database that manages personnel bookings at hundreds of branches, years in advance, with minute-level accuracy.

Part of this system is a set of reports that highlight gaps, i.e. they compare branch opening hours with staff bookings to see whether any branch is open with nobody booked.

They also check for overlaps, double bookings etc. at the same time, so minute-level accuracy is required.

The way we're doing this is to expand the start and end times of opening hours and bookings into minutes with an integer tally table:

--===== Create and populate the Tally table on the fly
 SELECT TOP 16777216
        IDENTITY(INT,1,1) AS N
   INTO dbo.Tally
   FROM Master.dbo.SysColumns sc1,
        Master.dbo.SysColumns sc2,
        Master.dbo.SysColumns sc3

--===== Add a Primary Key to maximize performance
  ALTER TABLE dbo.Tally
    ADD CONSTRAINT PK_Tally_N 
        PRIMARY KEY CLUSTERED (N) WITH FILLFACTOR = 100

We utilise this static indexed tally table to expand opening hours and bookings as follows:

SELECT   [BranchID] ,
        [DayOfWeek] ,
        DATEADD(MINUTE, N - 1, StartTime)
FROM     OpeningHours
        LEFT OUTER JOIN tally ON tally.N BETWEEN 0
                                         AND     DATEDIFF(MINUTE, OpeningHours.StartTime, OpeningHours.EndTime) + 1

The problem is, once we have the 13,000,000 "open minutes" and the "booked minutes" we then need to join the results to see what's covered:

SELECT   OpenDatesAndMinutes.[Date] ,
                                OpenDatesAndMinutes.[Time] ,
                                OpenDatesAndMinutes.[BranchID] ,
                                ISNULL(BookedMinutes.BookingCount, 0) AS BookingCount
                       FROM     OpenDatesAndMinutes
                                LEFT OUTER JOIN BookedMinutes ON OpenDatesAndMinutes.BranchID = BookedMinutes.BranchID
                                                                 AND OpenDatesAndMinutes.[Date] = BookedMinutes.[Date]
                                                                 AND OpenDatesAndMinutes.[Time] = BookedMinutes.[Time]

As you can imagine, joining on branch, date and time across 13,000,000 rows, all held in CTEs, takes ages. Running it for a week isn't too bad (about 10 seconds), but if we run it for 6 months (13,000,000 minutes) it bloats to 25 minutes or more.

Once we have joined the open minutes to the booked minutes we then group the data on islands and present to the user:

CrossTabPrep ( [Date], [Time], [BranchID], [BookingCount], [Grp] )
  AS ( SELECT   [Date] ,
                [Time] ,
                [BranchID] ,
                [BookingCount] ,
                DATEPART(HOUR, Time) * 60 + DATEPART(MINUTE, Time) - ROW_NUMBER() OVER ( PARTITION BY [BranchID], Date, [BookingCount] ORDER BY Time ) AS [Grp]
       FROM     PreRender
     ),
FinalRender ( [BranchID], [Date], [Start Time], [End Time], [Duration], [EntryCount], [EntryColour] )
  AS ( SELECT   [BranchID] ,
                [Date] ,
                MIN([Time]) AS [Start Time] ,
                MAX([Time]) AS [End Time] ,
                ISNULL(DATEDIFF(MINUTE, MIN([Time]), MAX([Time])), 0) AS Duration ,
                [BookingCount] AS EntryCount ,
                CASE WHEN [BookingCount] = 0 THEN 'Red'
                     WHEN [BookingCount] = 1 THEN 'Green'
                     ELSE 'Yellow'
                END AS EntryColour
       FROM     CrossTabPrep
       GROUP BY [BranchID] ,
                [Date] ,
                [BookingCount] ,
                [Grp]
     )

Quite simply, is my method efficient? Is there any way I can improve on it whilst retaining minute-level accuracy? When dealing with massive CTEs such as these, would there be any benefit in dumping the data into indexed temp tables and joining those instead?

Another thing I was considering is replacing the DATE and TIME(0) data types that the big join uses. Would it be more efficient if I cast these to integers?

Here is the Full CTE in case that helps:

WITH    OpeningHours ( [BranchID], [DayOfWeek], [StartTime], [EndTime] )
          AS ( SELECT   BranchID ,
                        DayOfWeek ,
                        CONVERT(TIME(0), AM_open) ,
                        CONVERT(TIME(0), AM_close)
               FROM     db_BranchDetails.dbo.tbl_ShopOpeningTimes (NOLOCK)
                        INNER JOIN @tbl_Days Filter_Days ON db_BranchDetails.dbo.tbl_ShopOpeningTimes.DayOfWeek = Filter_Days.DayNumber
               WHERE    CONVERT(TIME(0), AM_open) <> CONVERT(TIME(0), '00:00:00')
               UNION ALL
               SELECT   BranchID ,
                        DayOfWeek ,
                        CONVERT(TIME(0), PM_open) ,
                        CONVERT(TIME(0), PM_close)
               FROM     db_BranchDetails.dbo.tbl_ShopOpeningTimes (NOLOCK)
                        INNER JOIN @tbl_Days Filter_Days ON db_BranchDetails.dbo.tbl_ShopOpeningTimes.DayOfWeek = Filter_Days.DayNumber
               WHERE    CONVERT(TIME(0), PM_open) <> CONVERT(TIME(0), '00:00:00')
               UNION ALL
               SELECT   BranchID ,
                        DayOfWeek ,
                        CONVERT(TIME(0), EVE_open) ,
                        CONVERT(TIME(0), EVE_close)
               FROM     db_BranchDetails.dbo.tbl_ShopOpeningTimes (NOLOCK)
                        INNER JOIN @tbl_Days Filter_Days ON db_BranchDetails.dbo.tbl_ShopOpeningTimes.DayOfWeek = Filter_Days.DayNumber
               WHERE    CONVERT(TIME(0), EVE_open) <> CONVERT(TIME(0), '00:00:00')
             ),
        DateRange ( [Date], [DayOfWeek] )
          AS ( SELECT   CONVERT(DATE, DATEADD(DAY, N - 1, @StartDate)) ,
                        DATEPART(WEEKDAY, DATEADD(DAY, N - 1, @StartDate))
               FROM     tally (NOLOCK)
               WHERE    N <= DATEDIFF(DAY, @StartDate, @EndDate) + 1
             ),
        OpenMinutes ( [BranchID], [DayOfWeek], [Time] )
          AS ( SELECT   [BranchID] ,
                        [DayOfWeek] ,
                        DATEADD(MINUTE, N - 1, StartTime)
               FROM     OpeningHours
                        LEFT OUTER JOIN tally ON tally.N BETWEEN 0
                                                         AND     DATEDIFF(MINUTE, OpeningHours.StartTime, OpeningHours.EndTime) + 1
             ),
        OpenDatesAndMinutes ( [Date], [Time], [BranchID] )
          AS ( SELECT   DateRange.[Date] ,
                        OpenMinutes.[Time] ,
                        OpenMinutes.BranchID
               FROM     DateRange
                        LEFT OUTER JOIN OpenMinutes ON DateRange.DayOfWeek = OpenMinutes.DayOfWeek
               WHERE    OpenMinutes.BranchID IS NOT NULL
             ),
        WhiteListEmployees ( [DET_NUMBERA] )
          AS ( SELECT   DET_NUMBERA
               FROM     [dbo].[tbl_ChrisCache_WhiteList]
               WHERE    [TimeSheetV2_SecurityContext] = @TimeSheetV2_SecurityContext
             ),
        BookedMinutesByRole ( [Date], [Time], [BranchID], BookingCount )
          AS ( SELECT   [BookingDate] ,
                        DATEADD(MINUTE, N - 1, StartTime) ,
                        BranchID ,
                        COUNT(BookingID) AS Bookings
               FROM     tbl_Booking (NOLOCK)
                        INNER JOIN tbl_BookingReason  (NOLOCK) ON dbo.tbl_BookingReason.ReasonID = dbo.tbl_Booking.ReasonID
                        INNER JOIN tbl_ChrisCache  (NOLOCK) ON dbo.tbl_Booking.DET_NUMBERA = dbo.tbl_ChrisCache.DET_NUMBERA
                        INNER JOIN @ValidPosCodes AS Filter_PostCodes ON dbo.tbl_ChrisCache.POS_NUMBERA = Filter_PostCodes.POSCODE
                        LEFT OUTER JOIN tally (NOLOCK) ON tally.N BETWEEN 0
                                                                  AND     DATEDIFF(MINUTE, tbl_Booking.StartTime, tbl_Booking.EndTime) + 1
               WHERE    ( Void = 0 )
                        AND tbl_BookingReason.CoverRequired = 0 --#### Only use bookings that dont require cover
                        AND tbl_booking.BranchID <> '023'   --#### Branch 23 will always have messy data
                        AND ( dbo.tbl_Booking.BookingDate BETWEEN @StartDate
                                                          AND     @EndDate )
               GROUP BY [BookingDate] ,
                        BranchID ,
                        DATEADD(MINUTE, N - 1, StartTime)
             ),
        BookedMinutesByWhiteList ( [Date], [Time], [BranchID], BookingCount )
          AS ( SELECT   [BookingDate] ,
                        DATEADD(MINUTE, N - 1, StartTime) ,
                        BranchID ,
                        COUNT(BookingID) AS Bookings
               FROM     tbl_Booking(NOLOCK)
                        INNER JOIN tbl_BookingReason (NOLOCK) ON dbo.tbl_BookingReason.ReasonID = dbo.tbl_Booking.ReasonID
                        INNER JOIN tbl_ChrisCache (NOLOCK) ON dbo.tbl_Booking.DET_NUMBERA = dbo.tbl_ChrisCache.DET_NUMBERA
                        INNER JOIN WhiteListEmployees Filter_WhiteList ON dbo.tbl_Booking.DET_NUMBERA = Filter_WhiteList.DET_NUMBERA
                        LEFT OUTER JOIN tally (NOLOCK) ON tally.N BETWEEN 0
                                                                  AND     DATEDIFF(MINUTE, tbl_Booking.StartTime, tbl_Booking.EndTime) + 1
               WHERE    ( Void = 0 )
                        AND tbl_BookingReason.CoverRequired = 0 --#### Only use bookings that dont require cover
                        AND tbl_booking.BranchID <> '023'   --#### Branch 23 will always have messy data
                        AND ( dbo.tbl_Booking.BookingDate BETWEEN @StartDate
                                                          AND     @EndDate )
               GROUP BY [BookingDate] ,
                        BranchID ,
                        DATEADD(MINUTE, N - 1, StartTime)
             ),
        BookedMinutes ( [Date], [Time], [BranchID], BookingCount )
          AS ( SELECT   [Date] ,
                        [Time] ,
                        [BranchID] ,
                        BookingCount
               FROM     BookedMinutesByRole
               UNION
               SELECT   [Date] ,
                        [Time] ,
                        [BranchID] ,
                        BookingCount
               FROM     BookedMinutesByWhiteList
             ),
        PreRender ( [Date], [Time], [BranchID], [BookingCount] )
          AS ( SELECT   OpenDatesAndMinutes.[Date] ,
                        OpenDatesAndMinutes.[Time] ,
                        OpenDatesAndMinutes.[BranchID] ,
                        ISNULL(BookedMinutes.BookingCount, 0) AS BookingCount
               FROM     OpenDatesAndMinutes
                        LEFT OUTER JOIN BookedMinutes ON OpenDatesAndMinutes.BranchID = BookedMinutes.BranchID
                                                         AND OpenDatesAndMinutes.[Date] = BookedMinutes.[Date]
                                                         AND OpenDatesAndMinutes.[Time] = BookedMinutes.[Time]
             ),
        CrossTabPrep ( [Date], [Time], [BranchID], [BookingCount], [Grp] )
          AS ( SELECT   [Date] ,
                        [Time] ,
                        [BranchID] ,
                        [BookingCount] ,
                        DATEPART(HOUR, Time) * 60 + DATEPART(MINUTE, Time) - ROW_NUMBER() OVER ( PARTITION BY [BranchID], Date, [BookingCount] ORDER BY Time ) AS [Grp]
               FROM     PreRender
             ),
        DeletedBranches ( [BranchID] )
          AS ( SELECT   [ShopNo]
               FROM     [dbo].[vw_BranchList]
               WHERE    [Branch_Deleted] = 1
             ),
        FinalRender ( [BranchID], [Date], [Start Time], [End Time], [Duration], [EntryCount], [EntryColour] )
          AS ( SELECT   [BranchID] ,
                        [Date] ,
                        MIN([Time]) AS [Start Time] ,
                        MAX([Time]) AS [End Time] ,
                        ISNULL(DATEDIFF(MINUTE, MIN([Time]), MAX([Time])), 0) AS Duration ,
                        --dbo.format_timeV2(ISNULL(DATEDIFF(SECOND, MIN([Time]), MAX([Time])), 0)) AS DurationF ,
                        [BookingCount] AS EntryCount ,
                        CASE WHEN [BookingCount] = 0 THEN 'Red'
                             WHEN [BookingCount] = 1 THEN 'Green'
                             ELSE 'Yellow'
                        END AS EntryColour
               FROM     CrossTabPrep
               GROUP BY [BranchID] ,
                        [Date] ,
                        [BookingCount] ,
                        [Grp]
             )
            SELECT  [BranchID] ,
                    CONVERT(VARCHAR(10), DATEADD(DAY, 7, CONVERT(DATETIME, CONVERT(VARCHAR(10), DATEADD(day, -1 - ( DATEPART(dw, [Date]) + @@DATEFIRST - 2 ) % 7, [Date]), 103) + ' 23:59:59', 103)), 103) AS WeekEnding ,
                    [Date] ,
                    [Start Time] ,
                    [End Time] ,
                    [Duration] ,
                    CONVERT(VARCHAR, ( [Duration] * 60 ) / 3600) + 'h ' + CONVERT(VARCHAR, ROUND(( ( CONVERT(FLOAT, ( ( [Duration] * 60 ) % 3600 )) ) / 3600 ) * 60, 0)) + 'm' AS [DurationF] ,
                    [EntryCount] ,
                    [EntryColour] ,
                    CASE WHEN [EntryCount] = 0 THEN 'Red'
                         WHEN [EntryCount] >= 1 THEN 'Green'
                    END AS DurationColour ,
                    CASE WHEN [EntryCount] = 0 THEN 'This period of open-time isnt covered'
                         WHEN [EntryCount] >= 1 THEN 'This period of open-time is covered by ' + CONVERT(VARCHAR, [EntryCount]) + ' booking(s)'
                    END AS [DurationComment]
            FROM    FinalRender
            WHERE   FinalRender.BranchID NOT IN ( SELECT    [BranchID]
                                                  FROM      DeletedBranches )
HeavenCore asked Mar 11 '26 04:03


1 Answer

It's funny, because you have answered your own question with the questions at the end. You should simply try them all, but to summarize:

  1. Materialize the CTEs for better performance. You never know when SQL Server will evaluate a CTE more than once.
  2. You can build indexes on temporary tables.
  3. I'm not sure how you got from [DayOfWeek], DATEADD(MINUTE, N - 1, StartTime) on one side to the join on [Date], [Time] on the other, but having two columns here doesn't make sense. Use either a single datetime or a bigint representing the seconds from an epoch; a Unix-style timestamp works well here.
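
A rough sketch of how the three points could fit together, not a drop-in replacement for the query in the question. The temp table names, the MinuteKey column, the 2000-01-01 epoch and the VARCHAR(10) type for BranchID are all assumptions made for illustration, and the sample rows stand in for the output of the OpenDatesAndMinutes and BookedMinutes CTEs. It uses an INT count of minutes as the key, which is one variant of the single-column idea in point 3.

```sql
-- Hypothetical names throughout: #OpenMinutes, #BookedMinutes, MinuteKey, @Epoch.
-- MinuteKey = whole minutes since an arbitrary epoch, so the join needs
-- one integer comparison instead of separate [Date] and [Time] comparisons.
DECLARE @Epoch DATE = '20000101';   -- assumed epoch, any fixed date would do

CREATE TABLE #OpenMinutes
(
    BranchID  VARCHAR(10) NOT NULL,  -- assumed type; the question compares it to '023'
    MinuteKey INT         NOT NULL,
    PRIMARY KEY CLUSTERED (BranchID, MinuteKey)
);

CREATE TABLE #BookedMinutes
(
    BranchID     VARCHAR(10) NOT NULL,
    MinuteKey    INT         NOT NULL,
    BookingCount INT         NOT NULL,
    -- Assumes one row per branch per minute. If the UNION in BookedMinutes can
    -- return two different counts for the same minute, aggregate them first.
    PRIMARY KEY CLUSTERED (BranchID, MinuteKey)
);

-- In the real report this SELECT would read from the OpenDatesAndMinutes CTE;
-- the VALUES list is sample data only.
INSERT INTO #OpenMinutes (BranchID, MinuteKey)
SELECT  s.BranchID,
        DATEDIFF(DAY, @Epoch, s.[Date]) * 1440
          + DATEPART(HOUR, s.[Time]) * 60 + DATEPART(MINUTE, s.[Time])
FROM    ( VALUES ('001', CAST('20130311' AS DATE), CAST('09:00' AS TIME(0))),
                 ('001', CAST('20130311' AS DATE), CAST('09:01' AS TIME(0))),
                 ('001', CAST('20130311' AS DATE), CAST('09:02' AS TIME(0)))
        ) AS s (BranchID, [Date], [Time]);

-- Likewise, this would read from the BookedMinutes CTE.
INSERT INTO #BookedMinutes (BranchID, MinuteKey, BookingCount)
SELECT  s.BranchID,
        DATEDIFF(DAY, @Epoch, s.[Date]) * 1440
          + DATEPART(HOUR, s.[Time]) * 60 + DATEPART(MINUTE, s.[Time]),
        s.BookingCount
FROM    ( VALUES ('001', CAST('20130311' AS DATE), CAST('09:01' AS TIME(0)), 1)
        ) AS s (BranchID, [Date], [Time], BookingCount);

-- The big join now runs against two indexed tables on (BranchID, MinuteKey).
SELECT  o.BranchID,
        DATEADD(MINUTE, o.MinuteKey, CAST(@Epoch AS DATETIME)) AS MinuteStart,
        ISNULL(b.BookingCount, 0) AS BookingCount
FROM    #OpenMinutes AS o
        LEFT OUTER JOIN #BookedMinutes AS b
               ON  b.BranchID  = o.BranchID
               AND b.MinuteKey = o.MinuteKey
ORDER BY o.BranchID, o.MinuteKey;

DROP TABLE #OpenMinutes, #BookedMinutes;
```

Because both tables are clustered on the same key, the optimizer at least has the option of a merge join without sorting. Whether that beats the all-CTE version on your data is something to measure rather than assume.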
RichardTheKiwi answered Mar 12 '26 22:03


