We are generating xlsx files using a perl script. Files usually contains thousands of records. This makes spotting errors a very difficult operation.
This process was working since years without problems.
This week we got a request to check a file which contains errors. While opening Excel prompted that the file contains errors and asked whether we want to repair them.
In fact we do not want to recover the data but want to know which part of the file is corrupt. The error should be coming from corrupt data and we are interested to identify these data.
the log message shows the following:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<recoveryLog xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
<logFileName>error068200_01.xml</logFileName> </br>
<summary>Errors were detected in file 'D:\Temp\20161020\file_name.xlsx'</summary>
<repairedRecords summary="Following is a list of repairs:"><repairedRecord>Repaired Records: Cell information from /xl/worksheets/sheet1.xml part</repairedRecord>
</repairedRecords>
</recoveryLog>
The error should come from corrupt data. Is there any tool/method which helps to spot this corrupt data?
I tried renaming it a zip file, extracting it and opening it via an XML editor but was not able to find any errors in XML file.
We also checked that the different XML file structures are fine.
Thank you and best regards
As expected, the problem was coming from text cells containing numbers having an E in the middle.
I used the following steps to identify the erronous cells.
1. Wrote small Java class to read the file. The class was checking the cell type and displaying the value afterwards.
The java program generated an Exception at some line "Cannot get a numeric value from a text cell" even If I was correctly checking the cell type before displaying the content.
2. I checked the opened Excel file at that line and found that the cell contains only 'inf'.
3. I opened the file using open office and looked at the same cells. They contain 0.
4. I debugged the program generating the data and found out that these cells contain data like '914E5514'. Seems that E which was interpreted by Excel as an exponent.
We changed the program to use the format '@' for that cell and this solved the issue.
Thank you.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With