I have a 261 MB text file (xdebug output), and when I read it in, it occupies an additional 2 GB of dynamic space.
(defun stream->string (tmp-stream)
  (do ((line (read-line tmp-stream nil nil)
             (read-line tmp-stream nil nil))
       (lines nil))
      ((not line) (progn
                    (FORMAT T "COLLECTED~%")
                    (FORMAT nil "~{~a~^~%~}" (reverse lines))))
    (push line lines)))
(defparameter *test* nil)

(progn
  (setf *test* nil)
  (sb-ext:gc :full t)
  (room)
  (FORMAT T "----~%")
  (with-open-file (stream "/home/.../debugFiles/xdebug_1.xt")
    (room)
    (FORMAT T "----~%")
    (setf *test* (stream->string stream))
    (sb-ext:gc :full t)
    (room)
    (FORMAT T "----~%"))
  (sb-ext:gc :full t)
  (room))
Output
Dynamic space usage is: 84,598,224 bytes.
Read-only space usage is: 5,856 bytes.
Static space usage is: 4,160 bytes.
Control stack usage is: 8,408 bytes.
Binding stack usage is: 1,072 bytes.
Control and binding stack usage is for the current thread only.
Garbage collection is currently enabled.
Breakdown for dynamic space:
20,841,808 bytes for 20,691 code objects.
15,989,600 bytes for 999,350 cons objects.
14,532,960 bytes for 118,880 simple-vector objects.
13,951,792 bytes for 168,301 instance objects.
5,994,864 bytes for 41,648 simple-character-string objects.
13,287,200 bytes for 215,901 other objects.
84,598,224 bytes for 1,564,771 dynamic objects (space total.)
----
Dynamic space usage is: 85,346,752 bytes.
Read-only space usage is: 5,856 bytes.
Static space usage is: 4,160 bytes.
Control stack usage is: 8,536 bytes.
Binding stack usage is: 1,072 bytes.
Control and binding stack usage is for the current thread only.
Garbage collection is currently enabled.
Breakdown for dynamic space:
20,842,928 bytes for 20,692 code objects.
16,125,008 bytes for 1,007,813 cons objects.
14,698,784 bytes for 120,834 simple-vector objects.
14,239,440 bytes for 171,411 instance objects.
6,014,144 bytes for 41,776 simple-character-string objects.
13,426,448 bytes for 219,723 other objects.
85,346,752 bytes for 1,582,249 dynamic objects (space total.)
----
COLLECTED
Dynamic space usage is: 2,557,851,296 bytes.
Read-only space usage is: 5,856 bytes.
Static space usage is: 4,160 bytes.
Control stack usage is: 8,536 bytes.
Binding stack usage is: 1,072 bytes.
Control and binding stack usage is for the current thread only.
Garbage collection is currently enabled.
Breakdown for dynamic space:
2,466,544,480 bytes for 817,255 simple-character-string objects.
91,306,816 bytes for 2,303,370 other objects.
2,557,851,296 bytes for 3,120,625 dynamic objects (space total.)
----
Dynamic space usage is: 1,131,069,056 bytes.
Read-only space usage is: 5,856 bytes.
Static space usage is: 4,160 bytes.
Control stack usage is: 8,360 bytes.
Binding stack usage is: 1,072 bytes.
Control and binding stack usage is for the current thread only.
Garbage collection is currently enabled.
Breakdown for dynamic space:
1,053,183,424 bytes for 41,547 simple-character-string objects.
77,885,632 bytes for 1,510,521 other objects.
1,131,069,056 bytes for 1,552,068 dynamic objects (space total.)
I could understand a tripling of the size (even though this would still surprise me):
*test*
However, a factor of 10 is way too big.
How can that be?
As Rainer points out, your problem is that SBCL represents strings as vectors of UTF-32 code points, which means that each character takes 32 bits (4 bytes).
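As a rough check of that explanation against the ROOM output in the question: after the final full GC, simple-character-string objects take 1,053,183,424 bytes, versus 5,994,864 bytes before the file was read. A small sketch of the arithmetic, assuming 4 bytes per character and ignoring per-string header overhead (the variable and function names are made up for illustration):

```lisp
;; Figures copied from the ROOM output in the question.
(defparameter *string-bytes-before* 5994864)    ; string space before reading the file
(defparameter *string-bytes-after* 1053183424)  ; string space after the final full GC

;; Assumption: each character takes 4 bytes and header overhead is negligible.
(defun estimated-characters (bytes-before bytes-after)
  "Estimate how many 4-byte characters account for the growth in string space."
  (floor (- bytes-after bytes-before) 4))

(estimated-characters *string-bytes-before* *string-bytes-after*)
;; => 261797140
```

That is roughly 261.8 million characters, which is about what you would expect from a 261 MB file of single-byte characters held once in memory at 4 bytes per character.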
Ideally, the right way to handle a file like this is to process it as a stream, line by line, rather than slurping it all into memory. If that isn't an option for you, and you're confident that every character in your file is a base-char (i.e. an ASCII character), you can pass :element-type 'base-char to with-open-file and coerce the result of read-line to simple-base-string, which in SBCL stores one byte per character. This might look like:
(defun file->lines (path)
  (with-open-file (stream path :element-type 'base-char)
    (do ((line (read-line stream nil nil)
               (read-line stream nil nil))
         (lines nil))
        ((not line) (nreverse lines))
      (push (coerce line 'simple-base-string) lines))))
Also, note that if your file has many lines, the overhead of storing the lines in a linked list may be significant. If you can predict the number of lines in your file, you may get better performance by pre-allocating a large vector and storing the lines in it, like:
(defun file->lines (path number-of-lines)
  (with-open-file (stream path :element-type 'base-char)
    (do ((line (read-line stream nil nil)
               (read-line stream nil nil))
         (lines (make-array number-of-lines :fill-pointer 0)))
        ((not line) lines)
      (vector-push (coerce line 'simple-base-string) lines))))
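But make sure your number-of-lines is an overestimate. vector-push never grows the vector: once it is full, vector-push returns nil and any further lines are silently dropped. vector-push-extend would grow the vector instead, at the cost of a slow reallocate and copy, which is why I wrote vector-push rather than vector-push-extend.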
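To see what an underestimate does in practice, here is a minimal sketch using only standard vector-push on a deliberately tiny vector. The variable *lines* and the helper push-line-or-fail are hypothetical names introduced for illustration, not part of the code above:

```lisp
;; A vector with room for two elements and an empty fill pointer.
(defparameter *lines* (make-array 2 :fill-pointer 0))

(vector-push "first" *lines*)   ; => 0, stored at index 0
(vector-push "second" *lines*)  ; => 1, stored at index 1
(vector-push "third" *lines*)   ; => NIL, the vector is full and "third" is not stored

(length *lines*)                ; => 2

;; Hypothetical helper: signal an error instead of silently losing a line.
(defun push-line-or-fail (line lines)
  "Store LINE in LINES, or signal an error if LINES is already full."
  (or (vector-push line lines)
      (error "Vector of ~D lines is full; ~S was not stored."
             (length lines) line)))
```

If losing lines is not acceptable, calling something like push-line-or-fail in place of vector-push at least makes a too-small estimate visible.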
If you can't predict the number of lines, you may be best off reading into a list, then coercing it to a vector at the end, like:
(defun file->lines (path)
  (with-open-file (stream path :element-type 'base-char)
    (do ((line (read-line stream nil nil)
               (read-line stream nil nil))
         (lines nil))
        ((not line) (coerce (nreverse lines) 'vector))
      (push (coerce line 'simple-base-string) lines))))
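For comparison, the streaming approach recommended at the start of this answer could be sketched as follows. map-file-lines and count-file-lines are hypothetical names; the idea is only that each line is handed to a function and then becomes garbage, so memory use stays close to the size of the longest line:

```lisp
(defun map-file-lines (function path)
  "Call FUNCTION on each line of the file at PATH, one line at a time."
  (with-open-file (stream path)
    (loop for line = (read-line stream nil nil)
          while line
          do (funcall function line))))

;; Hypothetical usage: count the lines without retaining any of them.
(defun count-file-lines (path)
  (let ((count 0))
    (map-file-lines (lambda (line)
                      (declare (ignore line))
                      (incf count))
                    path)
    count))
```

Whether this fits depends on what you do with the xdebug output. It works when each line can be handled independently or folded into a small summary.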