I have a text file with a 1122 x 1122 matrix of precipitation measurements. Each measurement is represented with 4 decimal digits. Example lines look like this:
0.0234 0.0023 0.0123 0.3223 0.1234 0.0032 0.1236 0.0000 ....
(and so on: each line is 1122 values long and there are 1122 lines in total.)
I need this same text file, but with all values divided by 6. (And I have to do this for 920 files like that.)
I managed to do this, but in a no doubt atrociously inefficient and memory-hungry way.
I am sure there is a much faster and more professional way to do this. I have looked at endless sites about Matrix.Divide but don't see (or understand) a solution there for this problem. Any help will be appreciated! This is the code snippet used for each file:
    foreach (string inputline in inputfile)
    {
        int count = 0;
        string[] str_precip = inputline.Split(' ');  // holds string measurements
        string[] str_divided_precip = new string[str_precip.Length]; // will hold string measurements divided by divider (6)
        foreach (string measurements in str_precip)
        {
            str_divided_precip[count] = ((Convert.ToDouble(measurements)) / 6).ToString("F4", CultureInfo.CreateSpecificCulture("en-US"));
            count++;
        }
        string divline = string.Join(" ", str_divided_precip);
        using (System.IO.StreamWriter newfile = new System.IO.StreamWriter(@"asc_files\divfile.txt", true))
        {
            newfile.WriteLine(divline);
        }
    } 
Assuming the files are well-formed, you should essentially be able to process them a character at a time without needing to create any arrays or do any complicated string parsing.
This snippet shows the general approach:
string s = "12.4567 0.1234\n"; // just an example
decimal d = 0;
foreach (char c in s)
{
    if (char.IsDigit(c))
    {
        d *= 10;
        d += c - '0';
    }
    else if (c == ' ' || c == '\n')
    {
        d /= 60000; // divide by 10000 to get 4dps; divide by 6 here too
        Console.Write(d.ToString("F4"));
        Console.Write(c);
        d = 0;
    }
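    else
    {
        // No special processing needed as long as the input file always has 4dp.
        // '\r' is tolerated (and dropped) so Windows line endings don't trip the assert.
        Debug.Assert(c == '.' || c == '\r');
    }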
}
Clearly you would be writing to a (buffered) file stream instead of the console.
You could probably roll your own faster version of ToString("F4"), but I doubt it would make a significant difference to the timings. If this approach lets you avoid creating a new array for each line of the input file, though, I'd expect it to make a substantial difference. (In contrast, a single array per file, used as the buffer of a buffered writer, is worthwhile, especially if it is declared big enough from the start.)
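For what it's worth, a hand-rolled replacement for ToString("F4") might look roughly like the sketch below. It keeps everything in integer arithmetic: the value arrives already scaled by 10000 (so 0.0234 is 234), is divided with round-half-up, and the digits are appended directly. The names FixedPoint and AppendDivided are made up for illustration, the code assumes non-negative values (which holds for precipitation), and exact ties may round differently from ToString("F4").

```csharp
using System;
using System.Text;

public static class FixedPoint
{
    // 'scaled' is the measurement multiplied by 10000 (e.g. 0.0234 -> 234).
    // Appends scaled / divisor, rounded half up to 4 decimal places, to 'sb'.
    // Assumes scaled >= 0 and divisor > 0.
    public static void AppendDivided(StringBuilder sb, long scaled, long divisor)
    {
        // Rounded quotient, still scaled by 10000.
        long q = (scaled + divisor / 2) / divisor;

        sb.Append(q / 10000);   // integer part
        sb.Append('.');

        // Fractional part, always written as exactly 4 digits.
        long frac = q % 10000;
        sb.Append((char)('0' + frac / 1000));
        sb.Append((char)('0' + frac / 100 % 10));
        sb.Append((char)('0' + frac / 10 % 10));
        sb.Append((char)('0' + frac % 10));
    }
}
```

In the loop above you would call this in place of the division and ToString, passing the accumulated digits as 'scaled' and 6 as 'divisor', and flush the StringBuilder to the output stream once per line or once per file.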
Edit (by Sani Singh Huttunen)
Sorry for editing your post but you are absolutely correct about this.
Fixed-point arithmetic will provide a significant improvement in this case.
After introducing StreamReader (~10% improvement), float (another ~35% improvement) and other improvements (yet another ~20% improvement) (see comments) this approach takes ~12 minutes (system specs in my answer):
public void DivideMatrixByScalarFixedPoint(string inputFilename, string outputFilename)
{
    using (var inFile = new StreamReader(inputFilename))
    using (var outFile = new StreamWriter(outputFilename))
    {
        var d = 0;
        while (!inFile.EndOfStream)
        {
            var c = (char) inFile.Read();
            if (c >= '0' && c <= '9')
            {
                d = (d * 10) + (c - '0');
            }
            else if (c == ' ' || c == '\n')
            {
                // divide by 10000 to get 4dps; divide by 6 here too
                outFile.Write((d / 60000f).ToString("F4", CultureInfo.InvariantCulture.NumberFormat));
                outFile.Write(c);
                d = 0;
            }
        }
    }
}
You open and close the output file for every line; I think we can do better! Just replace it with this code:
using (System.IO.StreamWriter newfile = new System.IO.StreamWriter(@"asc_files\divfile.txt", true))
{
    foreach (string inputline in inputfile)
    {
        int count = 0;
        foreach (string measurements in inputline.Split(' '))
        {
            newfile.Write((Convert.ToDouble(measurements) / 6).ToString("F4", CultureInfo.CreateSpecificCulture("en-US")));
            if (++count < 1122)
            {
                newfile.Write(" ");
            }
        }
        newfile.WriteLine();
    }
} 
For the reading part, you may want to read one line at a time with ReadLine() instead of reading the whole file in one huge block and then splitting it in memory. This streaming approach will greatly reduce memory allocation and, depending on your hardware (how much memory you have, how fast your disks are, HDD or SSD), may improve performance noticeably!
Please let me know how it performs now, I'm very curious!
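As a rough sketch of the line-at-a-time reading suggested above (not a tuned implementation): the helper below takes a TextReader and a TextWriter, so it works with a StreamReader/StreamWriter pair on real files. The names LineStreamer and DivideLines are made up for illustration, and it assumes values are separated by single spaces and use '.' as the decimal separator.

```csharp
using System;
using System.Globalization;
using System.IO;

public static class LineStreamer
{
    // Reads 'reader' one line at a time, divides every space-separated
    // value by 'divisor', and writes the result to 'writer' with 4 decimals.
    public static void DivideLines(TextReader reader, TextWriter writer, double divisor)
    {
        // Look the culture up once instead of once per value.
        CultureInfo culture = CultureInfo.InvariantCulture;

        string line;
        while ((line = reader.ReadLine()) != null)
        {
            string[] parts = line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
            for (int i = 0; i < parts.Length; i++)
            {
                if (i > 0)
                {
                    writer.Write(' ');
                }
                double value = double.Parse(parts[i], culture);
                writer.Write((value / divisor).ToString("F4", culture));
            }
            writer.Write('\n');
        }
    }
}
```

With files you would wrap it as `using (var r = new StreamReader(inputPath)) using (var w = new StreamWriter(outputPath)) LineStreamer.DivideLines(r, w, 6);`, where inputPath and outputPath are placeholders for your own file names. Only one line is held in memory at a time, and the output file is opened once per input file.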