Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add static-dispatch versions #13

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

digikar99
Copy link

This is still of limited use though.

  1. It does not emit compiler notes when static-dispatch fails. This will require changes in static-dispatch itself.
  2. It still needs a fair amount of declarations and inlining to actually get much benefits out of this approach.

In the upcoming commits, I will attempt the second task. For example, the following shows a generic-+ in its disassembly.

(disassemble
 (lambda (x y)
   (declare (optimize speed)
            (type (simple-array single-float 1) x y))
   (num-utils:e2+ x y)))

To get rid of it requires one to inline the ref inside mapping-array. But after doing both of these, we get a 2.5x performance boost on SBCL 2.3.4:

CL-USER> (let ((x (aops:rand* 'single-float 1000))
               (y (aops:rand* 'single-float 1000)))
          (declare (notinline num-utils:e2+))
          (time (loop repeat 10000 do (num-utils:e2+ x y))))
Evaluation took:
  0.252 seconds of real time
  0.250618 seconds of total run time (0.250618 user, 0.000000 system)
  99.60% CPU
  553,358,300 processor cycles
  40,972,272 bytes consed
NIL
CL-USER> (let ((x (aops:rand* 'single-float 1000))
               (y (aops:rand* 'single-float 1000)))
           (declare (optimize speed)
                    (type (simple-array single-float 1) x y))
           (time (loop repeat 10000 do (num-utils:e2+ x y))))
Evaluation took:
  0.104 seconds of real time
  0.103522 seconds of total run time (0.103522 user, 0.000000 system)
  100.00% CPU
  228,570,630 processor cycles
  40,938,752 bytes consed
NIL

This can be optimized further if we allow for an optional out argument to the e2+ function.

num-utils.asd Outdated
(defsystem "num-utils/tests"
:version "1.0.0"
:description "Unit tests for NUM-UTILS."
:author "Steven Nunez <[email protected]>"
:license "Same as NUM-UTILS -- this is part of the NUM-UTILS library."
#+asdf-unicode :encoding #+asdf-unicode :utf-8
:depends-on (#:num-utils
:depends-on (#:num-utils/static-dispatch
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this temporary? It looks like this replaces all the num-util tests with the ones from static-dispatch. Ideally we could keep two test systems until static-dispatch is ready to be folded into the main system.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, yes, my bad! I will fix it.

@digikar99
Copy link
Author

I think I can relate to tpapp now. Optimizing Common Lisp - even with SBCL - is not trivial. There are many gotchas. For instance, take sequence-minimum and suppose we also inline it:

(declaim (inline sequence-minimum))
(defun sequence-minimum (x)
  "Return the maximum value in the sequence X"
  (check-type x alexandria:proper-sequence)
  (cond ((listp x) (apply 'min x))
    ((vectorp x) (reduce #'min x))))

First of all, as of SBCL 2.3.4, reduce does not inline. So, we might attempt to write it using a loop, expecting that types will be propagated appropriately during inlining - as they should.

(defun sequence-minimum (x)
  "Return the minimum value in the sequence X"
  (check-type x alexandria:proper-sequence)
  (cond ((listp x)
         (cond ((null x)
                (error "Empty sequence"))
               ((null (cdr x))
                (car x))
               (t
                (let ((min (car x)))
                  (dolist (elt x)
                    (setf min (min min elt)))
                  min))))
        ((vectorp x)
         (case (length x)
           (0 (error "Empty sequence"))
           (1 (row-major-aref x 0))
           (t (let ((min (row-major-aref x 0)))
                (dotimes (index (1- (array-total-size x)))
                  (let ((elt (row-major-aref x (1+ index))))
                    (setf min (if (< elt min) elt min))))
                min))))))

However, even this is not enough, a generic-< is left in the disassembly of

(disassemble
 (lambda (x)
   (declare (optimize speed)
            (type (simple-array single-float 1) x))
   (num-utils:sequence-minimum x)))

To actually avoid a generic-<, completely - and we get a 10-15x speed boost if we do this - we need something like the following:

(defun sequence-minimum (x)
  "Return the minimum value in the sequence X"
  (check-type x alexandria:proper-sequence)
  (etypecase x
    (list
     (cond ((null x)
            (error "Empty sequence"))
           ((null (cdr x))
            (car x))
           (t
            (let ((min (car x)))
              (dolist (elt x)
                (setf min (min min elt)))
              min))))
    (vector
     (case (length x)
       (0 (error "Empty sequence"))
       (1 (row-major-aref x 0))
       (t (let ((min (row-major-aref x 0)))
            (declare (type (simple-array single-float 1) x)
                     (type single-float min))     ; <== THIS TYPE DECLARATION
            (dotimes (index (1- (array-total-size x)))
              (let ((elt (row-major-aref x (1+ index))))
                (setf min (if (< elt min) elt min))))
            min))))))

But that makes the code non-generic to vector element types :/. In this particular case, polymorphic-functions too are only helpful if we write a polymorph-compiler-macro, which actually should not be necessary at all. To actually the generic versions we need better type inference.

So, I'm unsure how much benefit static-dispatch alone could get us. May be a 1.5-2x, but so long as generic-+, generic-< etc remain, we are losing out on a 10x performance boost.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants