-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cvs n rows issue #337 #51
base: master
Are you sure you want to change the base?
Conversation
The following vulnerabilities are fixed with an upgrade: - https://snyk.io/vuln/SNYK-JAVA-COMAMAZONREDSHIFT-6841702 Co-authored-by: snyk-bot <[email protected]>
Add statement about intended use.
… lines read from the csv file
…ious implementation
I think this problem has actually a bit more to it than the label "good first issue" suggests.
|
What happens when sampleSize is set to a positive value, and the csv has more lines than that value (this is a likely scenario)? By default, sampleSize is set to 100.000 in the GUI ("Rows per table" in the Scan tab), and also in example config files. Unless I miss something, the proposed implementation would lead to inconsistencies for the "N rows" column for csv files where the actual number of lines exceeds the sample size: it would no longer reflect the actual number of lines (rows) in the file. |
Ah yes, that would indeed be confusing. I also don't see a way to count the number of lines without incurring a performance cost in the case a |
What I am thinking of, but we would need to discuss this more extensive for feasibilty:
Part of the reason I am considering this, is that in case of SQL databases, we now also do a count, which could also be slow. However, there are "cheap" ways to obtain an estimate from (some) SQL databases as well. Giving a rough estimate aligns better with the intent of the WhiteRabbit scan: a "quick" survey of what the data looks like. |
…cate an estimate (some other manner of comunicating estimates should be used)
This pull request resolves issue #337 and adds row length to the cvs report.
It does so by using the scanned number of lines as the number of rows. Furthermore I have edited the test csv scanreport to include row length as well.