Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Correct way to link sheet names and XML files

Tags:

excel

openxml

I've inherited a spreadsheet module that reads *.xlsx files with XML functions. The app maps data using sheet names and that part of the module is pretty broken.

I don't have a month to find and read the specs of the Office Open XML format so I've composed a quick hack after a quick look to some samples files:

  1. Open xl/workbook.xml and loop /workbook/sheets:

    • Grab names from name.
    • Grab IDs from r:id.
    • Use file order as display order.
  2. Open xl/_rels/workbook.xml.rels and loop /Relationships filtering by Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet:

    • Map to name through Id.
    • Map to file through Target.

Is this algorithm correct?

like image 514
Álvaro González Avatar asked Oct 21 '25 11:10

Álvaro González


1 Answers

This is an older post and OP has likely moved on, but in case any one else is interested heres what I found.

I came across this problem as well while writing a simple parser to do some XLSX massaging and couldn't find the documentation to indicate the exact way the XML file names map to the sheets defined in workbook.xml.

Here's what I discovered:

There seems to be three ways to determine this; I've illustrated 2 below and OP's would be the third.

First, given from what I've seen OP's solution should indeed work and is probably the correct way to do this as the relationship between reference IDs and files are maintained in the .rels files. I took a different approach and, in hindsight, I should have used OPs solution but I didn't known exactly how the relationships worked at the time.


Given this excerpt from a xl/workbook.xml file which is a fragment of an XLSX file generated from Excel where I:

  1. Added 3 sheets (with some content so I could tell them apart in the XML files) then saved.
  2. Deleted the second sheet then saved again.
  3. Moved the new second sheet (was the third originally) in front of the first and resaved for a third time.
<sheets>
  <sheet name="Third Sheet" sheetId="3" r:id="rId1"/>
  <sheet name="First Sheet" sheetId="1" r:id="rId2"/>
</sheets>

In the extracted XML files the XLSX sheet mapping are listed below based on the content I originally added to the sheets:

xl/worksheets/sheet1.xml ---> sheetId="3" Third Sheet r:id="rid1"
xl/worksheets/sheet2.xml ---> sheetId="1" First Sheet r:id="rid2"

So from that it looks to be that the sheet names can map as such:

  1. The positional index + 1 of the sheet element under the sheets element in the xl/workbook.xml file matches the trailing ID on the sheet filename.

  2. The trailing number of the id attribute (rid1 and rid3) matches the trailing ID on the sheet filename.

PSA:

Don't be like me and initially assume the sheetID attribute maps to the sheet filename, that is incorrect. Also, use OP's solution as it is more correct and relies on hard references rather than mine which infers the references.

like image 196
Bradley Avatar answered Oct 23 '25 14:10

Bradley