Throw Warning When Reading File with Blank Lines#7707
Throw Warning When Reading File with Blank Lines#7707Asa-Henry wants to merge 12 commits intoRdatatable:masterfrom
Conversation
…t it doesn't solve the problem yet.
…kip' is greater than 0 when blank lines are present, so I also check if blank lines should be skipped so I can throw a warning to let the user know.
|
PR could have meaningful title |
|
please add a test case |
|
This is a good start. Can you run You can see that the tests don't pass. I think that tests 1578.1, 1578.2, 1578.6, 1883 will need to be updated for the new warning, possibly 1867.07 and 1867.11 too. Unfortunately, other tests show a regression:
Any ideas how the spurious warnings can be fixed? If you need to debug |
|
Thanks for this comprehensive response @aitap! This made it easier to understand what went wrong with my changes! In response to your question: it appears I introduced confusion between a file with intermittent blank lines and a file which has the header and the data separated by a blank line. For issues such as 1840.2, looks like I over looked how spaces come into play when parsing... I picked up on that |
… accomodate situation where the header and data are separated by a blank line.
…y a blank line. In the case of each line separated by a blank line, 'prevStart' is always NULL because each line could be a possible header.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #7707 +/- ##
=======================================
Coverage 99.04% 99.04%
=======================================
Files 87 87
Lines 17123 17041 -82
=======================================
- Hits 16959 16878 -81
+ Misses 164 163 -1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
What is the problem with the 'atime performance tests'? It's saying that |
That CI doesn't work on PRs from forks -- you can ignore it |
|
May I get some help writing a test? I'm trying to make a test using the input shown in the initial issue #3339. The following is what I have: input = "x y\n\n1 a\n\n2 b\n\n3 c"
test(
1578.10,
fread(input),
data.table(V1=3, V2=c),
warning="The rows in this file appear to be separated by blank lines. This resulted in most rows being skipped. If this was not the intended outcome, please consider setting 'blank.lines.skip' to TRUE."
)But then test 1578.1 fails. If I don't add my test, then all tests pass. |
|
Congratulations on figuring this out! (Sorry I didn't find the time to help.) Your test should work after you replace |
…nted out. Causes an error in test 1578.1?
|
Oh, right. Test numbers are ordinary doubles, so |
|
I might need some more support writing an issue or two for my pull request. When testing locally, is there a file where more information about what went wrong with particular issue is stored? |
|
A failed test tests/*.R leaves a file named data.table.Rcheck/tests/*.Rout.fail.
In your case the difference boils down to the extra newline. If you capture the warning condition using withCallingHandlers() or tryCatch(), you can see that it doesn't end with a newline; R adds a newline by itself when it prints the warning.
|
|
Thanks for pointing out the newline @aitap! Looks like that was the last base to be covered! Is it okay that the issue is '1578.91'? I noticed there is a scheme where 0s are added as needed to accommodate tests which have many significant figures. I believe the last thing is to add to NEWS.md? |
|
Yes, a NEWS.md entry is in order.
You can also renumber the tests to go from .01 to .10 instead of .1 to .91 if you'd like to. The former option is more aesthetically pleasing (people reading the code will be happier) and the latter gives a smaller diff (less potential for conflicts with other PRs, easier to git bisect).
|
…hed where 0s are prepended when there is more than one significant digit following the decimal.
Adds a check to detect when blank lines are present in the file passed to
fread, but the user did not specify to skip blank lines. This change throws a warning to let the user know they may want to consider setting theblank.lines.skiptoTRUE. Closes #3339