Speed increase for in memory decoding #15

Open
psilord opened this issue Jun 11, 2016 · 4 comments

Comments

psilord commented Jun 11, 2016

Hello, I found a way to get roughly a tenfold speed increase when decoding
in-memory encoded data. Thank you!

;; in the fast-io package, replace these two macros with the
;; supplied forms. The THE is the part I added. It informs the
;; compiler that fast-read-byte is returning an
;; (unsigned-byte 8). This way the compiler inference can
;; propagate properly and we can remove the need for bignum
;; and a bunch of type checking.
;;
;; Fix this in file:
;; io.lisp in fast-io.

(defmacro read-unsigned-be (size buffer)
  (with-gensyms (value)
    (once-only (buffer)
      `(let ((,value 0))
         ,@(loop for i from (* (1- size) 8) downto 0 by 8
                 collect `(setf (ldb (byte 8 ,i) ,value) (the (unsigned-byte 8) (fast-read-byte ,buffer))))
         ,value))))

(defmacro read-unsigned-le (size buffer)
  (with-gensyms (value)
    (once-only (buffer)
      `(let ((,value 0))
         ,@(loop for i from 0 below (* 8 size) by 8
                 collect `(setf (ldb (byte 8 ,i) ,value) (the (unsigned-byte 8) (fast-read-byte ,buffer))))
         ,value))))
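
For readers without fast-io at hand, here is a rough, self-contained sketch of what a four-byte big-endian read of this shape boils down to. It is an illustration only: `read-u32-be` is a hypothetical name, and `aref` on a byte vector stands in for `fast-read-byte`. The point is that every value stored through `ldb` is declared an `(unsigned-byte 8)`, so the accumulator can stay a fixed-width integer.

```lisp
;; Hypothetical stand-alone illustration; READ-U32-BE is not part of fast-io.
;; AREF on a byte vector plays the role of FAST-READ-BYTE here.
(defun read-u32-be (bytes start)
  "Assemble four bytes of BYTES, beginning at START, into a big-endian integer."
  (declare (type (simple-array (unsigned-byte 8) (*)) bytes)
           (type fixnum start))
  (let ((value 0))
    (declare (type (unsigned-byte 32) value))
    ;; Most significant byte first: bit positions 24, 16, 8, 0.
    (loop for shift from 24 downto 0 by 8
          for index from start
          do (setf (ldb (byte 8 shift) value)
                   (the (unsigned-byte 8) (aref bytes index))))
    value))
```

For example, calling it on the bytes 1, 2, 3, 4 with a start of 0 yields `#x01020304`. Note that `the` is a promise to the compiler rather than a check, so whether it is verified depends on the implementation and the optimization settings.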

psilord commented Jun 11, 2016

Looking at this more closely, my change actually has an error. I hadn't realized that fast-read-byte can also return an eof-value in addition to an unsigned byte. I would suggest fixing it to return multiple values (via VALUES) if eof-value is not an (unsigned-byte 8).

Hrm, thinking about it more: no generic eof-value can be an (unsigned-byte 8), since it would overlap the domain of bytes that fast-read-byte can legitimately return. So returning multiple values is very much a solution here: it lets the eof-value be reported outside the domain of (unsigned-byte 8).

It also seems that no current code uses the eof-value feature, so my fix is valid, but it is a hidden land mine.
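
A minimal sketch of the multiple-values idea, assuming a plain byte vector as the input. `byte-source` and `read-byte-or-eof` are hypothetical names, not fast-io's API. The primary value is always an `(unsigned-byte 8)`, and end of input is reported through a second value instead of an in-band eof-value.

```lisp
;; Hypothetical sketch; BYTE-SOURCE and READ-BYTE-OR-EOF are not part of fast-io.
(defstruct (byte-source (:constructor make-byte-source (data)))
  (data (make-array 0 :element-type '(unsigned-byte 8))
   :type (simple-array (unsigned-byte 8) (*)))
  (pos 0 :type fixnum))

;; The declared return type keeps the primary value inside (unsigned-byte 8).
(declaim (ftype (function (byte-source) (values (unsigned-byte 8) boolean))
                read-byte-or-eof))
(defun read-byte-or-eof (source)
  "Return the next byte and NIL, or 0 and T once the data is exhausted."
  (let ((data (byte-source-data source))
        (pos (byte-source-pos source)))
    (if (< pos (length data))
        (progn
          (setf (byte-source-pos source) (1+ pos))
          (values (aref data pos) nil))
        ;; The 0 is only a placeholder; callers must look at the second value.
        (values 0 t))))
```

A caller that does not care about end of input can ignore the second value and still gets a byte-typed result, which is what lets type inference propagate.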

psilord commented Jun 11, 2016

And here is a patch that changes fast-read-byte to use a 1MB cache vector when reading from the stream, as opposed to reading byte by byte. This made my read performance from disk about 25% faster.

This patch also includes the above patch to allow type inference to happen.

io.txt
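
The attached patch is not reproduced in this thread, so the following is only a sketch of the general technique under my own assumptions, not the contents of io.txt. All names (`chunked-reader`, `make-chunked-reader`, `chunked-read-byte`, `*chunk-size*`) are hypothetical. The idea is to refill a large byte vector with one `read-sequence` call and then serve individual bytes from it.

```lisp
;; Hypothetical sketch of chunked reading; not the code from io.txt.
(defparameter *chunk-size* (* 1024 1024)
  "Default cache size in bytes (1MB).")

(defstruct (chunked-reader (:constructor %make-chunked-reader (source chunk)))
  source                                  ; binary input stream
  (chunk (make-array 0 :element-type '(unsigned-byte 8))
   :type (simple-array (unsigned-byte 8) (*)))
  (pos 0 :type fixnum)                    ; next unread index in CHUNK
  (end 0 :type fixnum))                   ; number of valid bytes in CHUNK

(defun make-chunked-reader (stream &optional (chunk-size *chunk-size*))
  (%make-chunked-reader
   stream (make-array chunk-size :element-type '(unsigned-byte 8))))

(defun chunked-read-byte (reader)
  "Return the next byte and NIL, or 0 and T at end of file."
  (when (>= (chunked-reader-pos reader) (chunked-reader-end reader))
    ;; Cache exhausted: refill it with a single READ-SEQUENCE call.
    (setf (chunked-reader-end reader)
          (read-sequence (chunked-reader-chunk reader)
                         (chunked-reader-source reader))
          (chunked-reader-pos reader) 0))
  (let ((pos (chunked-reader-pos reader)))
    (if (< pos (chunked-reader-end reader))
        (progn
          (setf (chunked-reader-pos reader) (1+ pos))
          (values (aref (chunked-reader-chunk reader) pos) nil))
        (values 0 t))))
```

The stream passed to `make-chunked-reader` is assumed to be opened with `:element-type '(unsigned-byte 8)`. One trade-off of this design is that the reader consumes the underlying stream ahead of the caller, so the stream's own file position no longer matches the logical read position.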

psilord commented Jun 11, 2016

And, here is the function I used to test it:

(defun cpk-test (num-elements file)
  ;; Round-trip NUM-ELEMENTS 32-bit integers through FILE, timing each phase.
  (let ((data (make-array num-elements
                          :element-type '(unsigned-byte 32)
                          :initial-element 0)))
    (format t "Initializing data array.~%")
    (loop for i from 0 below num-elements do (setf (aref data i) i))

    (format t "encoding...~%")
    (time (encode-to-file data file))

    (format t "decoding...~%")
    (let ((result (car (time (decode-file file)))))
      (format t "Found ~A elements.~%" (length result))
      (format t "First 32 elements: ")
      (loop for i from 0 below (min num-elements 32)
            do (format t "~A " (aref result i)))
      (format t "~%")))
  ;; SBCL-specific: force a full collection between runs.
  (sb-ext:gc :full t)
  T)

psilord commented Jun 14, 2016

I checked out SBCL 1.3.6 and discovered that the THE patch has almost no effect there, so the compiler seems to have gotten smarter. The buffering patch, however, did improve performance. So the THE patch seems superfluous at this time.

I did my initial tests with SBCL 1.3.1, where the THE patch produced huge speedups. With 1.3.6 I get the same speedups without it.
