Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Have replace-* work with characters. Add a compiler macro for this. #97

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

K1D77A
Copy link

@K1D77A K1D77A commented Jun 3, 2023

Add 3 aliases for the replace-* called substitute-* because cl-user:replace is destructive.

Add 3 aliases for the replace-* called substitute-* because
cl-user:replace is destructive.
@K1D77A
Copy link
Author

K1D77A commented Jun 3, 2023

I was also going to ask why such heavy use of ppcre is used when many functions could be replaced with plain CL?

@kilianmh
Copy link
Collaborator

kilianmh commented Jun 3, 2023

Hi @K1D77A, thank you for your pull request!

I appreciate the suggestions of setting alias functions and adding support for characters in the replace-* functions.

Here are some comments:

  1. Rename all replacement functions to substitute-*, while the replace-* functions should be alias (considered deprecated). Additionally, nsubstitute-* functions should be defined with define-modify-macro. This approach should align better with the cl standard? @vindarel

  2. Currently, the compiler macros cause sbcl 2.3.5 compilation to crash due to infinite inlining. In my opinion, compiler macros will not work here because we need to either check the type of old & new at runtime, or alternatively, we can simply coerce old & new to strings using (string ...).

  3. In my opinion, you are right that we should use standard cl-functions if possible. However, it might sometimes make sense to use cl-ppcre, if we give the regex option to the users split string doesn't allow regular expression #63

@K1D77A
Copy link
Author

K1D77A commented Jun 8, 2023

(define-compiler-macro substitute-first (&whole form &rest args)
  (declare (ignore form))
  (destructuring-bind (old new string)
      args 
    (if (and (constantp old)
             (characterp old)
             (constantp new)
             (characterp new))
        `(%substitute-first-character ,old ,new ,string)
        `(%substitue-first-string ,old ,new ,string))))

(defun %substitute-first-character (old-char new-char string)
  (substitute new-char old-char string :test #'char= :count 1))

(defun substitute-first (old new s)
  "Substitute the first occurrence of `old` by `new` in `s`. Arguments are not regexs."
  (if (and (characterp old)
           (characterp new))
      (%substitute-first-character old new s)
      (%substitute-first-string (string old) (string new) s)))

(defun %substitute-first-string (old new s)
  (let* ((ppcre:*allow-quoting* t)
         (old (concatenate 'string "\\Q" old))) ;; treat metacharacters as normal.
    ;; We need the (list new): see !52
    (ppcre:regex-replace old s (list new))))

Does this still cause infinite inlining for you?

@vindarel
Copy link
Owner

vindarel commented Jun 8, 2023

Add 3 aliases for the replace-* called substitute-* because cl-user:replace is destructive.

good catch, but is it really an issue though? The replace verb is less CL-standard-compliant but more… standard in general and discoverable. Isn't it enough to trust cl-str? (accordingly, it is documented nowhere that replace-all returns a new string).

The goal of cl-str was not to make a better standard.

It is much simpler to coerce old and new to a string, did you have an extra motivation for the compiler macro? (super-optimizing?)

Thanks for sending a PR (and for help in the review).

@kilianmh
Copy link
Collaborator

From cl-ppcre documentation (Hints, comments, performance considerations):

CL-PPCRE uses compiler macros to pre-compile scanners at load time if possible. This happens if the compiler can determine that the regular expression (no matter if it's a string or an S-expression) is constant at compile time and is intended to save the time for creating scanners at execution time (probably creating the same scanner over and over in a loop). Make sure you don't prevent the compiler from helping you. For example, a definition like this one is usually not a good idea:

(defun regex-match (regex target)
  ;; don't do that!
  (scan regex target))

Accordingly, the old parameter should be stringified at run-time only if it is
not a constant (= neither string nor character).
Passing the compile-time known old string, or (stringified!) character, in a compiler macro to cl-ppcre:regex-replace should allow the creation of scanners at load time and therefore improve run time performance for these specific cases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants