-
-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make search more consistent #24121
Make search more consistent #24121
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are too many unrelated formatting changes in this PR. Also I won't repeat comments I made at #24116. I'll review it again when it's more focused.
or UTF-8 strings). The third argument optionally specifies a starting index. The return | ||
value is a range of indexes where the matching sequence is found, such that `s[search(s,x)] == x`: | ||
or UTF-8 strings). | ||
Alternatively the first and the second arguments may be arrays of `Int8` or `UInt8`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather not document this, it really should be an internal function which shouldn't have been exposed. I was preparing a PR to merge search
with findfirst
which would also handle this.
or UTF-8 strings). | ||
Alternatively the first and the second arguments may be arrays of `Int8` or `UInt8`. | ||
The third argument optionally specifies a starting index. | ||
If it is not a valid index into the first argument then error is thrown. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This goes without saying, I don't think it's worth making the description longer than it already is to mention this.
Alternatively the first and the second arguments may be arrays of `Int8` or `UInt8`. | ||
The third argument optionally specifies a starting index. | ||
If it is not a valid index into the first argument then error is thrown. | ||
The return value when the second argument is a string or a vector of bytes is the samllest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"samllest". "exits".
|
||
If the first argument is empty and third argument is not given `0:-1` is returned. | ||
|
||
The return value when the second argument is a signle character, a vector or a set of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"signle"
`search(string, 'c')` = `index` such that `string[index] == 'c'`, or `0` if unmatched. | ||
In particular a search for a zero length second argument returns `i:(i-1)`. | ||
|
||
If the first argument is empty and third argument is not given `0:-1` is returned. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better give the general rule and use start=start(s)
in the signature. People will figure out what happens in special cases.
in(c::Char, s::AbstractString) = (search(s,c)!=0) | ||
search(s::AbstractString, c::Chars) = s == "" ? 0 : search(s, c, start(s)) | ||
|
||
in(c::Char, s::AbstractString) = search(s,c) != 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Plase don't change unrelated lines. Same below. This makes it really painful to review.
function _search_bloom_mask(c) | ||
UInt64(1) << (c & 63) | ||
end | ||
|
||
_nthbyte(s::String, i) = codeunit(s, i) | ||
_nthbyte(a::ByteArray, i) = a[i] | ||
_nthbyte(s::String, i) = @inbounds codeunit(s, i) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better add @inbounds
at the call site, and add @propagate_inbounds
annotations here.
|
||
Similar to [`search`](@ref), but return only the start index at which | ||
the substring is found, or `0` if it is not. | ||
Similar to [`search`](@ref), but return only the start index at which the second argument |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"substring" was clearer, let's not make the docs worse because of that UInt8
thing.
@nalimilan Sorry for messing with the code formatting - lesson taken and I will fix it. Before I correct this PR I would like a clear recommendation about the issue also discussed in #24116 regarding Currently:
and I think this is what users would expect.
is something we want to avoid as The only problem is what we want to return in
which is painful and error prone as in general user might not know if Remember that my proposal additionally fixes:
which is incorrect for sure. But I would say that @nalimilan What do you think is a proper way to go given the above? |
I hadn't realized it was such a Catch 22. I'd say we should have a special case for The only other solution I can think of is to allow |
Just to make sure I understand your sentence:
If this is the decision I can implement it and it should go into the documentation then. |
That's my preferred solution at this stage, but you may want to way for possible comments at #24103 (comment). |
closing as it is outdated. |
This is an implementation of #24103. If will fail the tests now. I will fix the tests when there is an agreement that the proposed functionality is OK.
All key assumptions how
search
now works are given in its docstring