I have an issue trying to create a large PDF file. Basically, I have a list of byte arrays, each containing a PDF, and I want to merge them into a single PDF. This works great for smaller files (under 2,000 pages), but when I tried creating a 12,000-page file it bombed.

Originally I was using a MemoryStream, but after some research a common solution was to use a FileStream instead. So I tried a FileStream approach, but I get similar results. The list contains 3,800 records, each containing 4 pages. The MemoryStream bombs after around 570 records, the FileStream after about 680. The file size at the time of the crash was 60MB.

What am I doing wrong? Here is the code I have; it crashes on the copy.AddPage(curPg); call inside the for loop.
private byte[] MergePDFs(List<byte[]> PDFs)
{
    iTextSharp.text.Document doc = new iTextSharp.text.Document();
    byte[] completePDF;
    Guid uniqueId = Guid.NewGuid();
    string tempFileName = Server.MapPath("~/" + uniqueId.ToString() + ".pdf");
    //using (MemoryStream ms = new MemoryStream())
    using (FileStream ms = new FileStream(tempFileName, FileMode.Create, FileAccess.Write, FileShare.Read))
    {
        iTextSharp.text.pdf.PdfCopy copy = new iTextSharp.text.pdf.PdfCopy(doc, ms);
        doc.Open();
        int i = 0;
        foreach (byte[] PDF in PDFs)
        {
            i++;
            // Create a reader
            iTextSharp.text.pdf.PdfReader reader = new iTextSharp.text.pdf.PdfReader(PDF);
            // Cycle through all the pages
            for (int currentPageNumber = 1; currentPageNumber <= reader.NumberOfPages; ++currentPageNumber)
            {
                // Read a page
                iTextSharp.text.pdf.PdfImportedPage curPg = copy.GetImportedPage(reader, currentPageNumber);
                // Add the page over to the rest of them
                copy.AddPage(curPg);
            }
            // Close the reader
            reader.Close();
        }
        // Close the document
        doc.Close();
        // Close the copier
        copy.Close();
        // Convert the memorystream to a byte array
        //completePDF = ms.ToArray();
    }
    //return completePDF;
    return GetPDFsByteArray(tempFileName);
}
A couple of notes:

PdfCopy implements IDisposable, so you should try and see if a using statement helps. PdfCopy.FreeReader() will also help: it writes out the pages imported from a reader and releases the memory the copier is holding for them, instead of keeping every imported page around until the whole document closes.
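To illustrate, here's a minimal sketch of your MergePDFs with both changes applied. It keeps your page-by-page AddPage loop and your existing GetPDFsByteArray helper, and assumes iTextSharp 5.x; treat it as a starting point rather than a drop-in replacement:

    private byte[] MergePDFs(List<byte[]> PDFs)
    {
        string tempFileName = Server.MapPath("~/" + Guid.NewGuid().ToString() + ".pdf");
        using (FileStream fs = new FileStream(tempFileName, FileMode.Create, FileAccess.Write, FileShare.Read))
        using (iTextSharp.text.Document doc = new iTextSharp.text.Document())
        using (iTextSharp.text.pdf.PdfCopy copy = new iTextSharp.text.pdf.PdfCopy(doc, fs))
        {
            doc.Open();
            foreach (byte[] PDF in PDFs)
            {
                iTextSharp.text.pdf.PdfReader reader = new iTextSharp.text.pdf.PdfReader(PDF);
                for (int page = 1; page <= reader.NumberOfPages; ++page)
                {
                    copy.AddPage(copy.GetImportedPage(reader, page));
                }
                // Flush this reader's pages and release the memory the copier
                // is holding for them before moving on to the next document
                copy.FreeReader(reader);
                reader.Close();
            }
        }
        return GetPDFsByteArray(tempFileName);
    }

The FreeReader call is the important difference: without it, PdfCopy keeps a reference to every imported page until the document closes, which is why both your MemoryStream and FileStream variants eventually blow up regardless of where the output bytes are going.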
Anyway, not sure if you're using MVC or WebForms, but here's a simple working HTTP handler, tested with a 15-page, 125KB test file, that runs on my workstation:
<%@ WebHandler Language="C#" Class="MergeFiles" %>

using System;
using System.Collections.Generic;
using System.Web;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

public class MergeFiles : IHttpHandler
{
    public void ProcessRequest(HttpContext context)
    {
        // Build a test list: the same small PDF repeated 4,000 times
        List<byte[]> pdfs = new List<byte[]>();
        var pdf = File.ReadAllBytes(context.Server.MapPath("~/app_data/test.pdf"));
        for (int i = 0; i < 4000; ++i) pdfs.Add(pdf);

        var Response = context.Response;
        Response.ContentType = "application/pdf";
        Response.AddHeader(
            "content-disposition",
            "attachment; filename=MergeLotsOfPdfs.pdf"
        );
        Response.BinaryWrite(MergeLotsOfPdfs(pdfs));
    }

    byte[] MergeLotsOfPdfs(List<byte[]> pdfs)
    {
        using (var ms = new MemoryStream())
        {
            using (Document document = new Document())
            {
                using (PdfCopy copy = new PdfCopy(document, ms))
                {
                    document.Open();
                    for (int i = 0; i < pdfs.Count; ++i)
                    {
                        // Open the reader through RandomAccessFileOrArray so the
                        // source isn't fully loaded into reader memory up front
                        using (PdfReader reader = new PdfReader(
                            new RandomAccessFileOrArray(pdfs[i]), null))
                        {
                            copy.AddDocument(reader);
                            // Flush this reader's content and let the copier
                            // release what it's holding before the next file
                            copy.FreeReader(reader);
                        }
                    }
                }
            }
            return ms.ToArray();
        }
    }

    public bool IsReusable { get { return false; } }
}
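To try it, save the above in the site root as, say, MergeFiles.ashx (any name works, since the @ WebHandler directive compiles it on first request), drop a test.pdf into ~/app_data, and request the handler in a browser.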
Tried to make the output file similar to what you described in the question, but YMMV, depending on how large the individual PDFs you're dealing with are.