Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

@W-8522881@ Invalidate input URL if it contains special characters #534

Merged
merged 8 commits into from
Jan 8, 2021

Conversation

winterslu
Copy link
Contributor

@winterslu winterslu commented Dec 9, 2020

Due to latest java.net.URL security constrain, we invalidate URL containing special characters, for example '@', this issue should be addressed globally and properly by normalizing special characters into single form.

@winterslu winterslu added the bug label Dec 9, 2020
@winterslu winterslu changed the title encoding url with uri in text Escape special characters in URL Dec 9, 2020
@@ -181,11 +182,11 @@ class URL(value: Option[String]) extends Text(value){
/**
* Extracts url domain, i.e. 'salesforce.com', 'data.com' etc.
*/
def domain: Option[String] = value map (new java.net.URL(_).getHost)
def domain: Option[String] = value map (s => new java.net.URL(new URI(s, false).toString).getHost)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather make it a parameter with a default value, e.g.

/**
 * Extracts url domain, i.e. 'salesforce.com', 'data.com' etc.
 *
 * @param escaped true if URI character sequence is in escaped form. false otherwise. 
 */ 
def domain(escaped: Boolean = false): Option[String] = value map (s => new java.net.URL(new URI(s, escaped).toString).getHost)

@winterslu winterslu changed the title Escape special characters in URL @W-8522881@ Escape special characters in URL Dec 9, 2020
@gerashegalov
Copy link
Contributor

I still don't see an exception:

scala> val url = new java.net.URL("http://specialchars.@.com/")
url: java.net.URL = http://specialchars.@.com/

scala> url.getHost
res5: String = specialchars.@.com

@winterslu
Copy link
Contributor Author

Looks like a JDK related issue, with openjdk version "1.8.0_192" issue cannot be reproduced, however with openjdk version "1.8.0_232" issue can be reproduced on local. Thanks to @gerashegalov checking this

@codecov
Copy link

codecov bot commented Dec 9, 2020

Codecov Report

Merging #534 (dcc70fa) into master (91724f1) will increase coverage by 28.33%.
The diff coverage is 100.00%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master     #534       +/-   ##
===========================================
+ Coverage   58.44%   86.77%   +28.33%     
===========================================
  Files         347      347               
  Lines       12019    12019               
  Branches      389      394        +5     
===========================================
+ Hits         7024    10430     +3406     
+ Misses       4995     1589     -3406     
Impacted Files Coverage Δ
.../scala/com/salesforce/op/features/types/Text.scala 94.36% <100.00%> (+78.87%) ⬆️
.../op/features/types/FeatureTypeSparkConverter.scala 99.12% <0.00%> (+0.87%) ⬆️
...orce/op/aggregators/MonoidAggregatorDefaults.scala 100.00% <0.00%> (+1.78%) ⬆️
...a/com/salesforce/op/filters/RawFeatureFilter.scala 92.97% <0.00%> (+2.16%) ⬆️
...p/evaluators/OpBinaryClassificationEvaluator.scala 82.60% <0.00%> (+2.17%) ⬆️
...s/impl/feature/DecisionTreeNumericBucketizer.scala 94.73% <0.00%> (+2.63%) ⬆️
...com/salesforce/op/test/PassengerFeaturesTest.scala 90.00% <0.00%> (+3.33%) ⬆️
...ain/scala/com/salesforce/op/aggregators/Maps.scala 96.55% <0.00%> (+3.44%) ⬆️
...rce/op/stages/impl/feature/ScalerTransformer.scala 96.29% <0.00%> (+3.70%) ⬆️
...op/stages/impl/regression/OpXGBoostRegressor.scala 26.92% <0.00%> (+3.84%) ⬆️
... and 165 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 91724f1...dcc70fa. Read the comment docs.

@gerashegalov
Copy link
Contributor

it looks like it was a CVE patch: http://hg.openjdk.java.net/jdk-updates/jdk11u/rev/2c5fa7382e75

@gerashegalov
Copy link
Contributor

instead of working around the security patch I suggest we simply return None for illegal URI.

@tovbinm
Copy link
Collaborator

tovbinm commented Dec 9, 2020

@gerashegalov that might be hiding the problem under the carpet. None would be indistinguishable from a null value in this case, or any other illegal URL. I this particular case the solution exists.

@gerashegalov gerashegalov added the BUG P3 git2gus P3 label Dec 10, 2020
@gerashegalov
Copy link
Contributor

@gerashegalov that might be hiding the problem under the carpet. None would be indistinguishable from a null value in this case, or any other illegal URL. I this particular case the solution exists.

I am not convinced that replacing the URL parsing library is an acceptable solution in this case.

@winterslu
Copy link
Contributor Author

@tovbinm I traced to this change in jdk, and also the reason it is been added. Line http://hg.openjdk.java.net/jdk-updates/jdk11u/rev/2c5fa7382e75#l3.39 declared there is a CRLF concern has been masked out.
@gerashegalov I can return None, but still, its not given reasonable response and hiding the problem

@tovbinm
Copy link
Collaborator

tovbinm commented Dec 11, 2020

@winterslu +1

@gerashegalov
Copy link
Contributor

gerashegalov commented Dec 11, 2020

This PR in the original form with escaping was "sweeping the problem under the rug" @tovbinm

@gerashegalov
Copy link
Contributor

gerashegalov commented Jan 6, 2021

note that implementation no longer matches the PR description

@winterslu winterslu changed the title @W-8522881@ Escape special characters in URL @W-8522881@ Invalid input URL if it it contains special characters Jan 8, 2021
@winterslu winterslu changed the title @W-8522881@ Invalid input URL if it it contains special characters @W-8522881@ Invalid input URL if it contains special characters Jan 8, 2021
@winterslu winterslu changed the title @W-8522881@ Invalid input URL if it contains special characters @W-8522881@ Invalidate input URL if it contains special characters Jan 8, 2021
@winterslu
Copy link
Contributor Author

@crupley @gerashegalov To merge and close this PR, i have created an issue (feature request) #539 to log additional works that would handle string/url input in this bug, by replacing special characters with standard form.

@winterslu winterslu merged commit 37df638 into master Jan 8, 2021
@tovbinm
Copy link
Collaborator

tovbinm commented Jan 8, 2021

Thank you.

@tovbinm tovbinm deleted the klu/text-url-encoding branch January 18, 2021 05:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
BUG P3 git2gus P3 bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants