I have read several articles on machine learning.
An example is this https://medium.com/technology-invention-and-more/how-to-build-a-simple-neural-network-in-9-lines-of-python-code-cc8f23647ca1 and was wondering if it's possible to modify the code to do the use case below.
Let's say the machine is given an excel file (assuming .xlsx) that has only 1 sheet.
Inside the sheet there will be a table (defined as having a header row and at least 2 data rows and will be agreed as a table by common sense humans who read the file. Not a named table or object as defined by MS Excel)
The following is unknown before hand:
This table in the excel file is also NOT named or identified beforehand in any way. There are no objects in this excel file. No named tables.
If I were to attempt to use VBA to list objects, I will get this error message

There will be at least 1 other cell outside of this table that will contain some text or value.
How or what algorithm using what software library can I train a machine to identify where does the table start and end?
E.g.
The table first cell (identified as the top left most cell in the header row) is A4 and the table last cell (identified as the bottom right most cell) is G12
The machine is able to read the excel file and say that the table is at A4:G12 even if there are other cells outside the table that has data.
My various google search involving "excel and machine learning" or "identify tables in excel" tends to give me articles on how to feed data to machine learning software using excel files or how to find/name data tables that are named in the Excel. I want to emphasize the table data is NOT named.
Also come across answers talking about extracting such table data from PDFs. But my focus is on Excel.
I have read that machine learning can be used to read images. So I imagine trying to "read" an excel file which is a highly structured file just to find where a table is should be possible.
Here are some sample screenshots


Here are the links to the Excel files
https://www.dropbox.com/s/l3vjjsgunp0zu23/A4toG12.xlsx?dl=0
https://www.dropbox.com/s/nwzw0211ruhwvf0/G7toN19.xlsx?dl=0
Please note that these are dummy files to illustrate my point about identifying the table data.
The comment by JanLauGe about ctrl + a is interesting See Using machine learning to parse excel file and extract table data with no named tables involved? If so, how do I get started?
I have googled for the equivalent in VBA but to no avail.
I wrote a script as suggested by Scott Craner to find all the cells with values I get this.
Thanks to his comments, I got it to work.
Public Sub LookForCells()
For Each block In ActiveSheet.Columns("A:Z").SpecialCells(xlCellTypeConstants, 23).Areas
MsgBox block.Count
Next block
End Sub
And you should see the following

I realize this is an old question, but found it while considering this topic and going through the exercise of extracting tabular data from Excel files. We also concluded that ML wasn't the right answer, at least for now, but that a rules-based approach was better.
The rules that we landed on were to identify table "corners" that had either a blank cell or a sheet boundary above and to the left, that were part of at least a 2x2 matrix of densely or sparsely populated data.
The second thing that helped was to extract greedily, and rely on filtering to narrow down the data we were interested in.
If you are still interested in this topic, I released a Python tool on on GitHub and PyPI (called "eparse") that crawls a filesystem for Excel spreadsheets, parses tabular data from them, and injects the parsed tabular data into a database. I also wrote an article about it.
I think for your particular use case you will be much better off with more conventional approaches, e.g.:
However, to answer your question: To apply the most conventional flavours of machine learning (supervised learning) to your problem you will need labelled training data. This would consist of a number of excel spreadsheets for which the table is known and saved, ideally as the index of cells that are part of the table.
An alternative would be to use a pre-trained model if you can find one, or more sophisticated methods such as reinforcement learning (but not sure how this would work in this case)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With