-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Could the project switch to using log4j for logs? #1
Comments
My initial reaction is that it would be quite a lot of work for very little Why specifically log4j? John
|
I don't have any strong opinion about log4j. It is pretty standard and does the job well. |
IMHO log4j would be a pretty bad choice, since it is a logging framework, not a logging API. Apache Commons Logging for example is split into an API and an implementation part. The default implementation can be replaced with e.g. a SLF4J bridge which can then log using any logging backend, including log4j. There is also a logging API built directly into Java: Java Utility Logging - it requires no extra dependencies and can likewise (although with a bit of extra effort) be rerouted over another logging backend. My all-time-favourite is still Apache Commons Logging, since it is very easy to re-route, widely used, and very stable (no releases since ages). I usually re-route it to log4j using SLF4J, but that's only an option, not a must. |
Stuff that gets done around here tends to fall into one or more of the It does sound like if we were to use any alternate framework, Java Utility On Fri, Sep 13, 2013 at 12:05 AM, reckart [email protected] wrote:
|
On writing to stdout/stderr: There are certainly a lot of things written to stderr. However, there should not be any logging/error messages written to stdout. If anyone finds any, we will fix them. Things should (and at least mainly do) run cleanly so that only genuine NLP output gets sent to stdout. Logging: There is now some use of logging in CoreNLP (whereas a few years ago, there was none). I think my preference would be to stick to j.u.l, which is what is used a bit now in the code, just because the marginal gains in flexibility from some of the other frameworks just don't seem to outweigh the cost of adding another library dependency (exacerbated by the fact that there are several different ones that different people like...). Our use of logging is likely to increase gradually over time, but, as AngledLuffa commented, there's not likely to be someone tasked to convert everything. If someone like you wanted to take this on, we'd certainly be willing to support a change to j.u.l everywhere. |
I'd be very interested in getting involved here. I think having CoreNLP write to a logging API which can route into any particular logging backend of the library user's choice seems like the right approach - this seems pretty industry standard at this point. It's unfortunate to write / maintain / use programs which maintain a nice separation between logging API and logging backend for the purpose of maintaining consistent logging format, level, and log destination, only to include libraries which don't take advantage of these nice logging APIs and just dump unformatted text onto stdout / stderr (I don't mean to pick on CoreNLP here alone - there are hundreds, probably thousands of libraries in Maven Central which blindly dump text onto stdout / stderr). As for a logging API for CoreNLP to interface with, my recommendation would be to use SLF4J. It doesn't bias the client code (which makes calls to CoreNLP) to use any particular logging backend, nor does it require client code to provide additional backend shims to re-route logging events between backends. For example, by using SLF4j as the logging API, a program including the CoreNLP library needs only to include a single logging backend of their choice, for example logback (the successor to log4j), log4j, commons-logging, or java logging (jul). No logging bridges or shims are required for the end client to use any logging backend that they choose. (CoreNLP -> SLF4j API -> SLF4j implementation -> logging backend -> log destination) If CoreNLP used commons-logging as the logging API, then programs which want to use logback, log4j, or jul as the backend would need to include some sort of log event redirection shim (as reckart mentioned above) (typically in the form of SLF4j plus some additional commons-logging to destination backend bridge) which increases logging backend complexity, adds additional library dependencies to the client program, and increases logging event overhead. (CoreNLP -> commons-logging -> slf4j-commons-logging-shim -> slf4j API -> slf4j implementation -> logging backend -> log destination). If CoreNLP used JUL as the logging API, then programs which want to use logback, log4j, or commons-logging as the backend need the shims described in the paragraph above - plus, JUL + shims has some very unfortunate performance penalties as desribed (look for jul-to-slf4j bridge "Note on performance" section). (CoreNLP -> jul -> slf4j-jul-shim -> slf4j API -> slf4j implementation -> logging backend -> log destination). Anyway, I'd love to discuss further and/or get involved to help out. |
The thing is the Commons Logging is basically done - I doubt it will ever change again - as stable as it can be. While with SLF4J I've run into trouble in the past due to conflicting versions of SLF4J being on the classpath, etc. I like the SFL4J and I think it's a great choice for applications - for frameworks, I'd still vote for commons logging. If you want to use commons logging with log4j, no additional shims are needed. Just add log4j to the classpath and commons logging picks it up and uses it automatically. If you prefer another backend, you can route it via SLF4J. |
Cool, thanks for the insight. If CoreNLP is starting to migrate to Commons Logging, then that's fine - maybe I could help to just finish it up? The latest stable release still writes to stderr without Commons Logging, in places. Question for Commons Logging: do you happen to know if a library (like CoreNLP) uses Commons Logging as the API and client code uses Logback (which is log4j's "successor") as the backend, does that work without additional shims (ie, no need to SLF4j or any other libraries to be on the classpath?) |
I don't think the folks in Stanford are convinced just yet that they want to do this, but maybe if somebody comes up with a patch? No idea... I guess it will be quite some work and would be a pity if they later didn't merge it. Unless logback comes with a "shim" that allows it to pose as another framework such as log4j, I doubt it. ACL apparently supports these frameworks: Log4j (1.2), LogKit, Avalog, and Java 1.4. Btw. I just noted that ACL indeed is still being improved - veeery slowly - and activity picked up only recently again: there were 6 years between the release of 1.1.1 and 1.1.2. The latest version is 1.2 (Jul 2014). Changes are apparently minimal and mostly bugfixes. http://commons.apache.org/proper/commons-logging/changes-report.html |
@dinoboy197 You'll need to use SLF4J if you want to use/control/configure both logback and ACL. |
Heh. Better late than never, right? :) Anyway, as crazy as it may seem, I may write a patch to convert library methods to use ACL versus |
Hi everyone, good news: We are convinced to start moving to logging! Our plan is to go with slf4j, essentially for the reasons @dinoboy197 gives: It just seems the most modern, most flexible facade, and it seems to be winning in practice: http://zeroturnaround.com/rebellabs/the-state-of-logging-in-java-2013/ . This is likely to be a slow, gradual implementation for the reasons @AngledLuffa gave, but in the next release, you should start to see slf4j appearing and j.u.l -- and a few System.err.println's -- disappearing. Happy to have help. We are actually at 0 unmerged pulled requests right now (!). |
Any progress in moving to logging? Or at least give to user an ability to put custom LogListener |
Super awesome! I missed this message about a month ago. Very excited!
@manning - What, in particular can I do to help out this effort? Once slf4j-api is integrated into the build, I can start replacing print statements with slf4j statements, but am not sure if you want smaller PRs, larger PRs, PRs for certain classes, packages, etc. Let me know! |
Since work on this has started (slowly) on this - and slf4j-api is now present in the project - I'm going to close this issue. |
@manning - thanks for merging that first PR! That one only had 4 files - was that size ok? The second slf4j PR that I filed (with 11 files) was in response to a comment from @J38 mentioning to me that I should file individual PRs for whole packages in corenlp. Which size do you like better? |
Just in case anyone looks at this issue, we've decided to change again what we're doing for logging. (Flip-flopping Californian liberals, or something!) While we agree that much of the world would like logging support, the adoption of slf4j logging in v.3.6.0 was unpleasant for many users, as it meant that not only CoreNLP but all our other smaller tools (NER, parser, classifier, POS tagger, ...) required the slf4j library. This caused breakages for many tools, especially integrations into non-Java frameworks which used to just load one jar and to have everything that they needed.... So, for v.3.6.1, we're going to switch to logging in CoreNLP directly via the Redwoods logging package that @gangeli wrote (and which already appeared in parts of CoreNLP). We've extended it -- and speeded it up -- and it will log via slf4j, if slf4j is present, and otherwise it will (by default) log to stderr. If you looked at what's in master on github, we've now made a lot of progress on swapping System.err.println's across to Redwoods logging commands, though there are still quite a few minor things to fix. |
Out of curiosity, if the goal was to keep to a single jar, why not just use java.util.logging? I'm not familiar with Redwoods. What features are available there that's not available in j.u.l? |
There are actually features in Redwood that are not in j.u.l (tracks, --Gabor On Mon, Feb 22, 2016 at 4:40 PM, Mark Woon [email protected] wrote:
|
j.u.l can also be routed through anything you like, for example through SLF4J. Only the respective logging backend needs to provide an adapter for j.u.l. Although the way that j.u.l is implemented makes it just a tad bit more effort to route it through something else than e.g. Apache Commons Logging. |
I use SLF4J, so I'm familiar with routing j.u.l. Just curious about Redwood. It sounds like it will automatically use SLF4J if it's detected, so that works fine for me. |
Please reconsider this. For all I can tell, Redwood comes at quite some overhead. Interfaces such as slf4j are designed to minimize performance cost of logging, both enabled and disabled. There is a reason why they use message templates, for example: to avoid string concatenation when logging is disabled. Redwood appears to be very expensive here, performing several array allocations for every log call (even when disabled) and by not allowing the use of templates. All the good logging frameworks are designed so that the Hotspot VM will inline the logger call, and when logging is disabled, it is a Also, the logger should never kill the application, but
Use an exception, and give users a chance to catch and recover, rather than doing a hard exit.
I believe the better approach here is to simply provide a "standalone" jar (also known as "fat jar") for these users, which is self-contained and thus includes e.g. |
I completely support this approach. Logging framework changes should not cause undue pain with existing developers. However, I would hate to see a lock-in approach where best-practices in Java software engineering (for instance, third-party logging) can't be introduced to benefit those clients who want to follow them. Perhaps this idea of dual distribution (fat jar with custom logging AND standalone jar via maven dependency which pulls in slf4j) suggestion is something which could be rolled into a major version change; ex: v3 -> v4 with explicit communication to clients that they now have choices? |
Redwood does indeed come with some overhead, but CoreNLP doesn't exactly spew a lot of logs, even on the debug channel. Are there benchmarks that show measurable performance degradation? On my machine, I count 7.7M log messages / second, which means that even if CoreNLP were to spew an absurd 1k log messages per second, this would add an extra second for every 2 hours of runtime. Have you encountered concrete issues with runtime / interfacing with Redwood? As the author of Redwood I of course I bring my own biases into the discussion, but in my view CoreNLP is first and foremost an academic library meant to support NLP research. Redwood was created first and foremost as a logger to facilitate this: hierarchical logging ( Second, I do think there's an appeal to having pure CoreNLP be entirely dependency-free. Sure, certain features depend on external libraries (SUTime / serialization formats / etc.), but the library will boot without these. Keep in mind that not everyone uses Maven, and not everyone calls CoreNLP from Java. "Fat Jars" are a wonderful idea for applications, but often lead to a special sort of Jar hell with libraries, and I dislike the "use Maven or be sad" answer to that. |
@gangeli - understood. I'm commenting from a non-academic setting; my organization makes use of CoreNLP in combination with a variety of other libraries in a combined experiment that we use internally (so a fat jar doesn't work as losing that dependency versioning and conflict resolution creates a mess). Having said that, whatever internal implementation a library (such as corenlp) has to interact with the slf4j api (redwood, for instance) is non-material; as long as the slf4j-api is the final logging backend, that's what matters. I know Chris mentioned that a year ago ("and it will log via slf4j, if slf4j is present, and otherwise it will (by default) log to stderr.") - that's the important part for me. |
@gangeli I am an academic user. Nevertheless, I do care about performance; because it limits how fast I can do development iterations and in the worst case, may mean missing a deadline. But yes, I have not benchmarked the share of runtime cause by logging currently, and it may well be negligible overall (but note that you want a next-to-zero overhead logging to make people actually use it). Fat jars of course should only be used standalone, then there is no dependency hell. It's just the solution for those unable to handle multiple jars; the problematic case mentioned above. These are the users that will want a fat jar:
There is no excuse to use
slf4j has a |
Some thoughts: 1.) I'll look into improving the I do think we have to be mindful of making sure Stanford CoreNLP doesn't bring entire applications down and gives them a chance to recover if for instance a pipeline has a serious problem. 2.) I am extremely skeptical that Redwood logging is causing any kind of meaningful performance hit. I would have to see someone provide a concrete example to consider this an issue. 3.) We support slf4j logging, so if users want to plug in their favorite logging framework that works with slf4j, we allow for that (though with the added cost of Redwood). 4.) If there are more problems with Redwood please feel free to let me know and I am open to modifying it to address concerns. 5.) Thanks everyone for bringing up great points! |
It is relatively straightforward to create a fat jar in addition to a non-fat JAR by adding the maven-shade-plugin to a POM. So offering both shouldn't complicate your build setup too much. |
Some numbers, with a jmh microbenchmark, tinylog backend, info level, logging disabled: Logging 1 million string constants:
Logging with the format string pattern (i.e.
Logging with string concatenation antipattern (
So while slf4j adds some overhead over e.g. tinylog, the cost of the facade is small. The cost of redwood however is massive. With redwood, I can ignore 0.5 disabled log statements per microsecond. With the others, this is around 1000. With format strings, about 130 (apparently there is still some array construction or boxing involved then). We can also look at the runtime of a single call to
Then with a format string:
Or with string concatenation:
Cost of string construction with
So Redwood Channels logging costs about 1600 ns per call, alternative approaches are about 1 ns for constant strings, and 8 ns for string patterns. Avoid using string concatenation in logging statements, unless they are warning level or higher (so usually enabled anyway). Instead, use templates. You really don't want to use Redwood Channels for any debug-level logging. Too heavyweight. Right now, it may not matter, but if you at some point want to add (optional) detailed logging for debugging, you may find this cost become problematic. |
I agree entirely with this statement, and with the benchmarks showing Redwood to be much slower than SLF4J. But, the point of Redwood is not to be a replacement for SLF4J -- SLF4J does a wonderful job of what it does. The point is to provide a different sort of logging, very much tuned to a particular use-case. CoreNLP is not expected to spew debug logs anytime in the near future (why would it? It's not like we analyze logs, or keep historical logs, or any of the normal things companies do with logs.). I don't see this as a runtime concern for any practical application. If we're optimizing for efficiency, there's much more low hanging fruit in, e.g., your IntPair bug, or in CoreMaps, or in the Simple API serializing+deserializing on every annotation. RE Fat Jar: My issue isn't with our own build process, but with people using it downstream. I'd like people to be able to put CoreNLP into a lib/ folder and add it to their classpath, and have it work (as, actually, I did for all of my research projects, or as wrappers in other languages often do it). This is awkward if CoreNLP doesn't boot as a simple Jar, and problematic if CoreNLP includes dependencies that I may be including from elsewhere -- effectively forcing a Maven-like dependency resolution scheme. SLF4J itself is not awful, because there aren't many versions and it doesn't have transitive dependencies, but nonetheless it's a trend I'd like to argue against. RE Fatal: I've used this in my own projects to bring down multi-threaded programs. There's nothing worse than starting a run, having the main thread crash on something, and then having the whole process hang (holding up, e.g., cluster resources) because a non-daemon thread didn't exit. Sure, all my threads should be Daemons; but again, academic code. Yes, it breaks abstractions, since it's more than what a logging framework should do, but the whole point of Redwood is to be a bit different -- otherwise again, SLF4J does a wonderful job. Redwood also manages threads a bit to synchronize tracks in multithreaded applications, and does its own logic to suppress repeated log messages, etc. Clearly out of scope for a "logging framework," but useful in code that I and others have written. That said, I'm not opposed to removing the function, or changing its behavior. And of course, at the end of the day the final say is Chris' and Jason's. I'm just voicing opinions. :) |
A big +1 for removing the System.exit() in logger. |
Does Redwood have any means of logging error traces (i.e. Throwables)? I got this out-of-memory warning (logged as info) due to some Wikipedia documents, and the backtrace could be helpful to see where it got the OOM: #382 On a side note, when running multi threaded, the out of memory will also affect other threads, so it's not as easy as just continuing...) I probably need to rerun the entire database, because I can't reliably tell which documents were correctly processed, and which were not... Also, CoreNLP currently has much of its debug logging code-disabled via |
Redwood should support stack traces. The following works for me:
Yielding:
The |
I see a lot of logs printed to System.out or System.err.
Would it be possible to use a library like log4j http://logging.apache.org/log4j/2.x/ and use log.error, log.warning, log.info, log.debug instead?
That would make it easier for users of the StanfordCoreNLP to manage which logs should be printed by choosing the log level of the project.
The text was updated successfully, but these errors were encountered: