I have a lot of text data and want to translate it into different languages.
Possible ways I know:
The problem is that all these services limit text length, number of calls, etc., which makes them inconvenient to use.
What services or approaches would you advise in this case?
If you happen to paste a long text that has more than 5000 characters, you'll get an error message ("maximum characters exceeded: X characters over 5000 maximum") and a "translate more" option that lets you translate the rest of the text.
On your computer, open a document in Google Docs. In the top menu, click Tools > Translate document. Enter a name for the translated document and select a language. Click Translate.
I had to solve the same problem when integrating language translation with an XMPP chat server. I partitioned my payload (the text I needed to translate) into smaller subsets of complete sentences.
I can't recall the exact limit, but with Google's REST-based translation URL, each request carried a set of complete sentences totalling at most 1024 characters, so a large paragraph resulted in multiple translation service calls.
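The batching described above could be sketched roughly as follows. This is only an illustration in Python, not the code used with the XMPP server: the function name `pack_sentences` and the `max_chars` parameter are made up for the example, and the 1024 default simply mirrors the figure recalled above.

```python
def pack_sentences(sentences, max_chars=1024):
    """Greedily group whole sentences into chunks of at most max_chars.

    `sentences` is a list of already-split, complete sentences.
    The name and the default limit are assumptions for this sketch.
    """
    chunks = []
    current = ""
    for sentence in sentences:
        candidate = sentence if not current else current + " " + sentence
        if len(candidate) <= max_chars:
            # The sentence still fits in the chunk being built.
            current = candidate
        else:
            if current:
                chunks.append(current)
            # A single sentence longer than the limit is kept whole here;
            # a real client would have to split it further.
            current = sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    for chunk in pack_sentences(["One.", "Two!", "Three?"], max_chars=10):
        print(chunk)
```

Each returned chunk would then go out as one translation request, so the number of calls grows with the text length rather than with the sentence count.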
Break your big text into tokenized strings, and then pass each token through the translator via a loop. Store the translated output in an array and once all tokens are translated and stored in the array, put them back together and you will have a completely translated document.
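A minimal Python sketch of that loop, assuming one token per line. The `translate` argument is a placeholder for whatever service call you actually use; it is not a real API, and the demo below just passes `str.upper` so the example runs without network access.

```python
def translate_text(text, translate):
    """Translate text token by token and put the pieces back together.

    `translate` is any callable taking a string and returning a string;
    it stands in for a real translation service call (an assumption).
    """
    tokens = text.split("\n")  # one token per line
    translated = []
    for token in tokens:
        if token.strip():
            translated.append(translate(token))
        else:
            # Blank lines are kept as-is so the layout survives.
            translated.append(token)
    return "\n".join(translated)


if __name__ == "__main__":
    print(translate_text("hello\n\nworld", str.upper))
```

If the service limits the number of calls, the body of the loop is the natural place to add a pause or a retry.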
Just to prove a point, I threw this together :) It is rough around the edges, but it will handle a whole lot of text, and it is just as accurate as Google Translate because it uses the Google API. I processed Apple's entire 2005 SEC 10-K filing with this code at the click of one button (it took about 45 minutes).
The result was basically identical to what you would get if you copied and pasted one sentence at a time into Google Translate. It isn't perfect (ending punctuation is not accurate and I didn't write to the text file line by line), but it does show a proof of concept. It could have better punctuation if you worked with Regex some more.
Imports System.IO
Imports System.Text.RegularExpressions
Public Class Form1
    Dim file As String = "Translate Me.txt"
    Dim lineCount As Integer = countLines()
    Private Function countLines() As Integer
        If IO.File.Exists(file) Then
            Dim reader As New StreamReader(file)
            Dim lineCount As Integer = Split(reader.ReadToEnd.Trim(), Environment.NewLine).Length
            reader.Close()
            Return lineCount
        Else
            MsgBox(file + " cannot be found anywhere!", 0, "Oops!")
        End If
        Return 1
    End Function
    Private Sub translateText()
        Dim lineLoop As Integer = 0
        Dim currentLine As String
        Dim currentLineSplit() As String
        Dim input1 As New StreamReader(file)
        Dim input2 As New StreamReader(file)
        Dim filePunctuation As Integer = 1
        Dim linePunctuation As Integer = 1
        ' Sentence terminators used to split each line.
        Dim delimiters() As Char = {"."c, "!"c, "?"c}
        Dim entireFile As String
        entireFile = (input1.ReadToEnd)
        ' Count sentence terminators in the whole file to size the result array.
        For i = 1 To Len(entireFile)
            Dim ch As String = Mid$(entireFile, i, 1)
            If ch = "." OrElse ch = "!" OrElse ch = "?" Then filePunctuation += 1
        Next
        Dim sentenceArraySize = filePunctuation + lineCount
        Dim sentenceArrayCount = 0
        Dim sentence(sentenceArraySize) As String
        Dim sentenceLoop As Integer
        While lineLoop < lineCount
            linePunctuation = 1
            currentLine = (input2.ReadLine)
            ' Count sentence terminators on this line.
            For i = 1 To Len(currentLine)
                Dim ch As String = Mid$(currentLine, i, 1)
                If ch = "." OrElse ch = "!" OrElse ch = "?" Then linePunctuation += 1
            Next
            currentLineSplit = currentLine.Split(delimiters)
            sentenceLoop = 0
            While linePunctuation > 0
                Try
                    Dim trans As New Google.API.Translate.TranslateClient("")
                    sentence(sentenceArrayCount) = trans.Translate(currentLineSplit(sentenceLoop), Google.API.Translate.Language.English, Google.API.Translate.Language.German, Google.API.Translate.TranslateFormat.Text)
                    sentenceLoop += 1
                    linePunctuation -= 1
                    sentenceArrayCount += 1
                Catch ex As Exception
                    sentenceLoop += 1
                    linePunctuation -= 1
                End Try
            End While
            lineLoop += 1
        End While
        Dim newFile As String = "Translated Text.txt"
        Dim outputLoopCount As Integer = 0
        Using output As StreamWriter = New StreamWriter(newFile)
            ' Write only the entries that were actually filled in.
            While outputLoopCount < sentenceArrayCount
                output.Write(sentence(outputLoopCount) + ". ")
                outputLoopCount += 1
            End While
        End Using
        input1.Close()
        input2.Close()
    End Sub
    Private Sub translateButton_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles translateButton.Click
        translateText()
    End Sub
End Class
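On the punctuation point: the code above splits on ".", "!" and "?" and then writes ". " after every sentence, so the original terminators are lost. One possible regex-based fix, sketched here in Python rather than VB, captures each sentence together with its own terminator. The `translate` callable is again a stand-in for the real service call, and the pattern is a simple assumption that will still mis-split abbreviations and decimals such as "3.14".

```python
import re

# One sentence = a run of non-terminators, then its terminators (if any).
SENTENCE = re.compile(r"\s*([^.!?]+)([.!?]*)")


def translate_keeping_punctuation(text, translate):
    """Translate sentence by sentence, re-attaching each original terminator.

    `translate` is a placeholder callable (string in, string out).
    """
    pieces = []
    for body, ending in SENTENCE.findall(text):
        body = body.strip()
        if body:
            # Only the sentence body is sent for translation.
            pieces.append(translate(body) + ending)
    return " ".join(pieces)


if __name__ == "__main__":
    print(translate_keeping_punctuation("Hello there. Are you ok? Great!", str.upper))
```

Because only the sentence body is translated, question marks and exclamation marks come through unchanged instead of being turned into periods.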