Start reading massive text file from the end

Tags:

vb.net

I'd like to ask for some alternative approaches to my problem.

Basically, I'm reading a .txt log file that averages 8 million lines, around 600 MB of raw text.

I'm currently using StreamReader to make 2 passes over those 8 million lines, sorting and filtering the important parts of the log file, but one complete run takes my computer ~50 seconds.

One way I could optimize this is to have the first pass start reading from the end, because the most important data is located in approximately the final 200k lines. Unfortunately, from what I've found, StreamReader can't do this. Any ideas on how to do it?

Some general restrictions:

  • The number of lines varies
  • The size of the file varies
  • The location of the important data varies, but it is approximately within the final 200k lines

Here's the loop code for the first pass over the log file, just to give you an idea:

Do Until sr.EndOfStream = True                                                                              'Read whole File
            Dim streambuff As String = sr.ReadLine                                                      'Line currently being read
            Dim CombatLogNames() As String                                                              'Array to Store CombatLogNames
            Dim searcher As String

    If streambuff.Contains("CombatLogNames flags:0x1") Then                                             'Keyword to Filter CombatLogNames Packets in the .txt

        Dim check As String = streambuff                                                                'Duplicate of the Line being read
        Dim index1 As Char = check.Substring(check.IndexOf("(") + 1)                                    '
        Dim index2 As Char = check.Substring(check.IndexOf("(") + 2)                                    'Used to bypass the first CombatLogNames packet that contain only 1 entry


        If (check.IndexOf("(") <> -1 And index1 <> "" And index2 <> " ") Then                           'Stricter Filters for CombatLogNames

            Dim endCLN As Integer = 0                                                                   'Signifies the end of CombatLogNames Packet
            Dim x As Integer = 0                                                                        'Counter for array

            While (endCLN = 0 And streambuff <> "---- CNETMsg_Tick")                                    'Loops until the end keyword for CombatLogNames is seen

                streambuff = sr.ReadLine                                                                'Reads a new line to flush out "CombatLogNames flags:0x1" which is unneeded
                If ((streambuff.Contains("---- CNETMsg_Tick") = True) Or (streambuff.Contains("ResponseKeys flags:0x0 ") = True)) Then

                    endCLN = 1                                                                          'Value change to determine end of CombatLogName packet

                Else

                    ReDim Preserve CombatLogNames(x)                                                    'Resizes the array while preserving the values
                    searcher = streambuff.Trim.Remove(streambuff.IndexOf("(") - 5).Remove(0, _
                    streambuff.Trim.Remove(streambuff.IndexOf("(")).IndexOf("'"))                       'Additional filtering to get only valuable data
                    CombatLogNames(x) = search(searcher)
                    x += 1                                                                              '+1 to Array counter

                End If
            End While
        Else
            'MsgBox("Something went wrong, Flame the coder of this program!!")                          'Bug Testing code that is disabled
        End If
    Else
    End If

    If (sr.EndOfStream = True) Then

        ReDim GlobalArr(CombatLogNames.Length - 1)                                                      'Resizing the Global array to prime it for copying data
        Array.Copy(CombatLogNames, GlobalArr, CombatLogNames.Length)                                    'Just copying the array to make it global

    End If
Loop

asked Feb 01 '26 09:02 by MDuh


1 Answer

You CAN set the BaseStream to the desired reading position; you just can't set it to a specific LINE (because counting lines requires reading the complete file).

    Using sw As New StreamWriter("foo.txt", False, System.Text.Encoding.ASCII)
        For i = 1 To 100
            sw.WriteLine("the quick brown fox jumps ovr the lazy dog")
        Next

    End Using
    Using sr As New StreamReader("foo.txt", System.Text.Encoding.ASCII)
        sr.BaseStream.Seek(-100, SeekOrigin.End)
        Dim garbage = sr.ReadLine ' can not use, because very likely not a COMPLETE line
        While Not sr.EndOfStream
            Dim line = sr.ReadLine
            Console.WriteLine(line)
        End While
    End Using
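
The same idea can be wrapped in a small helper for a real log file. This is only a sketch under assumptions: the name `ReadTail` and the `tailBytes` parameter are made up for illustration, the file is assumed to be single-byte (ASCII) text, and the byte count has to be estimated because the line count is unknown.

```vbnet
Imports System
Imports System.IO
Imports System.Text
Imports System.Collections.Generic

Module TailReadSketch

    ' Hypothetical helper: returns the complete lines found in roughly
    ' the last tailBytes bytes of the file.
    Function ReadTail(path As String, tailBytes As Long) As List(Of String)
        Dim result As New List(Of String)
        Using fs As New FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite)
            ' Clamp the start so we never seek before the beginning of a small file.
            Dim start As Long = Math.Max(0L, fs.Length - tailBytes)
            fs.Seek(start, SeekOrigin.Begin)
            ' False = do not look for a byte order mark in the middle of the file.
            Using sr As New StreamReader(fs, Encoding.ASCII, False)
                ' Unless we start at offset 0, the first line is most likely cut off,
                ' so it is thrown away (a whole line may be lost if the seek
                ' happens to land exactly on a line start).
                If start > 0 Then sr.ReadLine()
                While Not sr.EndOfStream
                    result.Add(sr.ReadLine())
                End While
            End Using
        End Using
        Return result
    End Function

End Module
```

Going by the numbers in the question (about 600 MB for 8 million lines, so roughly 75 bytes per line on average), the final 200k lines would be around 15 MB, so something like `ReadTail("log.txt", 32L * 1024 * 1024)` would probably leave a comfortable margin. Note that the seek is done before the first read; if you move the BaseStream after the reader has already read something, you also need to call `sr.DiscardBufferedData()`.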

For any later read attempt on the same file, you could simply save the final position (of the BaseStream) and, on the next read, advance to that position before you start reading lines.
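
A minimal sketch of that "remember where you stopped" approach might look like the following. The helper name `ReadFrom` is hypothetical, and it assumes the file only grows by appending whole lines and that each call reads all the way to the end. That last point matters because StreamReader buffers ahead, so the BaseStream position only matches what you have consumed once the reader has reached the end of the stream.

```vbnet
Imports System.IO
Imports System.Text
Imports System.Collections.Generic

Module ResumeReadSketch

    ' Hypothetical helper: appends every line from startOffset to the end of the
    ' file to "lines" and returns the offset to pass in on the next call.
    Function ReadFrom(path As String, startOffset As Long, lines As List(Of String)) As Long
        Using fs As New FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite)
            ' If the file is shorter than the saved offset it was probably
            ' replaced or truncated, so start over from the beginning.
            If startOffset > fs.Length Then startOffset = 0
            fs.Seek(startOffset, SeekOrigin.Begin)
            Using sr As New StreamReader(fs, Encoding.ASCII, False)
                While Not sr.EndOfStream
                    lines.Add(sr.ReadLine())
                End While
                ' The reader has consumed everything, so the underlying
                ' stream now sits at the end of the data that was read.
                Return fs.Position
            End Using
        End Using
    End Function

End Module
```

On the first run you would call it with offset 0 (or combine it with a seek near the end), store the returned value somewhere, and pass it back in on the next run so only the newly appended lines are read.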

answered Feb 03 '26 09:02 by igrimpe
