Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

As a user, if my posted declaration fails validation, I want to receive a 4xx error and I want the validation issues returned #336

Closed
HankHerr-NOAA opened this issue Oct 11, 2024 · 16 comments · Fixed by #338
Assignees
Labels
bug Something isn't working
Milestone

Comments

@HankHerr-NOAA
Copy link
Contributor

The declaration I attempted to post to the COWRES is below; note that probability_threhsolds is declared twice. When posted, curl returns the following:

[hank.herr@nwcal-ised-dev1 issue103897]$ curl --cacert cowres_dod_ca_chain.pem -d "projectConfig=$(cat hefs_validation_failure_135592.yml)" https://[COWRES]/job/
<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/>
<title>Error 500 Request failed.</title>
</head>
<body><h2>HTTP ERROR 500 Request failed.</h2>
<table>
<tr><th>URI:</th><td>/job/</td></tr>
<tr><th>STATUS:</th><td>500</td></tr>
<tr><th>MESSAGE:</th><td>Request failed.</td></tr>
<tr><th>SERVLET:</th><td>org.glassfish.jersey.servlet.ServletContainer-5c448ef</td></tr>
</table>
<hr/><a href="https://jetty.org">Powered by Jetty:// 11.0.24</a><hr/>

</body>
</html>

The same thing happens if I paste the declaration to the API front-end and click Check to validate it: I'm forwarded to a page that reflects the above HTML.

In the tasker logs, I can see the validation error:

Caused by: org.yaml.snakeyaml.constructor.DuplicateKeyException: while constructing a mapping
 in 'string', line 1, column 1:
    label: HEFS_EnsPost Streamflow
    ^
found duplicate key probability_thresholds
 in 'string', line 50, column 1:
    probability_thresholds:

That message is embedded in an exception that is very long and looks like this (snipped):

Oct 11, 2024 11:57:12 AM org.glassfish.jersey.server.ServerRuntime$Responder process
WARNING: An exception mapping did not successfully produce and processed a response. Logging the exception propagated to the default exception mapper.
while constructing a mapping
 in 'string', line 1, column 1:
    label: HEFS_EnsPost Streamflow
    ^
found duplicate key probability_thresholds
 in 'string', line 50, column 1:
    probability_thresholds:
    ^

    at org.yaml.snakeyaml.constructor.SafeConstructor.processDuplicateKeys(SafeConstructor.java:126)
    at org.yaml.snakeyaml.constructor.SafeConstructor.flattenMapping(SafeConstructor.java:78)
    at org.yaml.snakeyaml.constructor.SafeConstructor.flattenMapping(SafeConstructor.java:73)
    at org.yaml.snakeyaml.constructor.SafeConstructor.constructMapping2ndStep(SafeConstructor.java:209)
    at org.yaml.snakeyaml.constructor.BaseConstructor.constructMapping(BaseConstructor.java:552)
    at org.yaml.snakeyaml.constructor.SafeConstructor$ConstructYamlMap.construct(SafeConstructor.java:597)
    at org.yaml.snakeyaml.constructor.BaseConstructor.constructObjectNoCheck(BaseConstructor.java:264)
    at org.yaml.snakeyaml.constructor.BaseConstructor.constructObject(BaseConstructor.java:247)
    at org.yaml.snakeyaml.constructor.BaseConstructor.constructDocument(BaseConstructor.java:201)
    at org.yaml.snakeyaml.constructor.BaseConstructor.getSingleData(BaseConstructor.java:185)
    at org.yaml.snakeyaml.Yaml.loadFromReader(Yaml.java:493)
    at org.yaml.snakeyaml.Yaml.load(Yaml.java:422)
    at wres.config.yaml.DeclarationFactory.deserialize(DeclarationFactory.java:408)
    at wres.config.yaml.DeclarationValidator.validate(DeclarationValidator.java:183)
    at wres.config.yaml.DeclarationValidator.validate(DeclarationValidator.java:252)
    at wres.tasker.WresJob.postWresValidateHtml(WresJob.java:423

... SNIP ...

2024-10-11T11:57:12.774+0000  [qtp1191430552-34871] WARN HttpChannelState - unhandled due to prior sendError
jakarta.servlet.ServletException: while constructing a mapping
 in 'string', line 1, column 1:
    label: HEFS_EnsPost Streamflow
    ^
found duplicate key probability_thresholds
 in 'string', line 50, column 1:
    probability_thresholds:
    ^

    at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:412)
    at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:349)
... SNIP ...

Caused by: org.yaml.snakeyaml.constructor.DuplicateKeyException: while constructing a mapping
 in 'string', line 1, column 1:
    label: HEFS_EnsPost Streamflow
    ^
found duplicate key probability_thresholds
 in 'string', line 50, column 1:
    probability_thresholds:
    ^
...

I'll try to get a complete copy of the exception, but its pages long which makes it hard for me to grab through an ssh terminal. I'll have to download the log and open it up locally to grab the entire stack trace.

More information in a bit,

Hank

label: HEFS_EnsPost Streamflow
observed:
  label: OBS Streamflow
  sources: [omitted]/THNW1_QME.xml
  variable: QME
  feature_authority: nws lid
  type: observations
predicted:
  label: HEFS_newSeason Streamflow
  sources: [omitted]/THNW1_HEFS_EnsPost_SQINnewSeasonDefinition.tgz
  variable: SQIN
  type: ensemble forecasts
baseline:
  label: HEFS_oldSeason Streamflow
  sources: [omitted]/THNW1_HEFS_EnsPost_SQIN.tgz
  variable: SQIN
  type: ensemble forecasts
  separate_metrics: true
features:
  - {observed: THNW1, predicted: THNW1, baseline: THNW1}
unit: cms
lead_times:
  minimum: 18
  maximum: 720
  unit: hours
lead_time_pools:
  period: 24
  frequency: 24
  unit: hours
time_scale:
  function: mean
  period: 24
  unit: hours
pair_frequency:
  period: 24
  unit: hours
cross_pair: exact
probability_thresholds: [0.01,0.1,0.5,0.9,0.95,0.99,0.995,0.999]
minimum_sample_size: 30
season:
  minimum_day: 1
  minimum_month: 1
  maximum_day: 31
  maximum_month: 12
metrics:
  - continuous ranked probability score difference
sampling_uncertainty:
  sample_size: 1000
  quantiles: [0.05,0.95]
probability_thresholds:
  values: [0.01,0.1,0.5,0.9,0.95,0.99,0.995,0.999]
  apply_to: predicted
duration_format: hours
decimal_format: '#0.000'
output_formats:
  - csv2
  - format: png
    orientation: lead threshold
@HankHerr-NOAA HankHerr-NOAA added the bug Something isn't working label Oct 11, 2024
@HankHerr-NOAA HankHerr-NOAA added this to the v6.27 milestone Oct 11, 2024
@HankHerr-NOAA
Copy link
Contributor Author

A standalone is reporting an "uncaught exception" as well, so I don't think this is in the tasker:

2024-10-11T12:15:34.256+0000 89356 [main] ERROR Main - Encountered an uncaught exception in thread Thread[main,5,main].
org.yaml.snakeyaml.constructor.DuplicateKeyException: while constructing a mapping
 in 'string', line 1, column 1:
    label: HEFS_EnsPost Streamflow
    ^
found duplicate key probability_thresholds
 in 'string', line 50, column 1:
    probability_thresholds:
    ^

        at org.yaml.snakeyaml.constructor.SafeConstructor.processDuplicateKeys(SafeConstructor.java:126)
        at org.yaml.snakeyaml.constructor.SafeConstructor.flattenMapping(SafeConstructor.java:78)
        at org.yaml.snakeyaml.constructor.SafeConstructor.flattenMapping(SafeConstructor.java:73)
        at org.yaml.snakeyaml.constructor.SafeConstructor.constructMapping2ndStep(SafeConstructor.java:209)
        at org.yaml.snakeyaml.constructor.BaseConstructor.constructMapping(BaseConstructor.java:552)
        at org.yaml.snakeyaml.constructor.SafeConstructor$ConstructYamlMap.construct(SafeConstructor.java:597)
        at org.yaml.snakeyaml.constructor.BaseConstructor.constructObjectNoCheck(BaseConstructor.java:264)
        at org.yaml.snakeyaml.constructor.BaseConstructor.constructObject(BaseConstructor.java:247)
        at org.yaml.snakeyaml.constructor.BaseConstructor.constructDocument(BaseConstructor.java:201)
        at org.yaml.snakeyaml.constructor.BaseConstructor.getSingleData(BaseConstructor.java:185)
        at org.yaml.snakeyaml.Yaml.loadFromReader(Yaml.java:493)
        at org.yaml.snakeyaml.Yaml.load(Yaml.java:422)
        at wres.config.yaml.DeclarationFactory.deserialize(DeclarationFactory.java:411)
        at wres.config.yaml.DeclarationFactory.from(DeclarationFactory.java:314)
        at wres.config.yaml.DeclarationFactory.from(DeclarationFactory.java:255)
        at wres.config.MultiDeclarationFactory.from(MultiDeclarationFactory.java:92)
        at wres.pipeline.Evaluator.evaluate(Evaluator.java:161)
        at wres.Functions.evaluate(Functions.java:135)
        at wres.Functions.evaluate(Functions.java:97)
        at wres.helpers.MainUtilities.call(MainUtilities.java:107)
        at wres.Main.completeExecution(Main.java:206)
        at wres.Main.main(Main.java:126)

Hank

@james-d-brown
Copy link
Collaborator

james-d-brown commented Oct 11, 2024

No, this is another one of those instances where the underlying parser (snakeyaml) can throw an exception that is either not advertised in the api or has not been properly handled on our end (core wres). We have had this several times before with various exception types and may need to resort to catching RuntimeException and rethrowing the relevant 4xx exception trigger (DeclarationException or whatever our core wres api advertises).

@epag
Copy link
Collaborator

epag commented Oct 15, 2024

This is what we currently have on the embedded server. Then when we attempt to get the results we catch the "unknown" exception in a 500 INTERNAL SERVER ERROR. Prior when this had come up we had said that we should have that checking on the embedded server and to simply add the unadvertised exceptions to the core as they come up and wrap them into UserInputException, but I am not opposed to adding a catch all for these exceptions to the formatting stuff so we don't have this known timebomb

catch ( UserInputException e )
{
String failureMessage = "I received something I could not parse. The top-level exception was";
LOGGER.info( failureMessage, e );
updateStatus( COMPLETED, id );
return Response.status( Response.Status.BAD_REQUEST )
.entity( failureMessage + e.getMessage() )
.build();
}
catch ( InternalWresException iwe )
{
String failureMessage = "WRES experienced an internal issue. The top-level exception was";
LOGGER.info( failureMessage, iwe );
updateStatus( COMPLETED, id );
return Response.status( Response.Status.INTERNAL_SERVER_ERROR )
.entity( failureMessage + iwe.getMessage() )
.build();
}
catch ( RuntimeException r )
{
// When we encounter an unexpected exception we want to set the status to closed before propagating the exception
updateStatus( COMPLETED, id );
throw r;
}

@james-d-brown
Copy link
Collaborator

We should address in the core app because the api advertises typed exceptions.

I wouldn't really describe as a time bomb, though. Nothing breaks. It just produces a wrongly typed exception and resulting http error code. Not brilliant, but not a big deal.

Will fix this and add a unit test later today after finishing some HEFS work.

@james-d-brown
Copy link
Collaborator

Incidentally, I think we're just mitigating a bug in a dependent api because these various exception types are not advertised, from what I recall, or aggregated, so it's a whac-a-mole as they arise. Still, better not to catch a generic RuntimeException when the consequences of an uncaught exception are quite small.

@epag
Copy link
Collaborator

epag commented Oct 15, 2024

Another question this potentially arises is:

Should we create a job id before validation? Currently since this validation exception isn't handled we are redirected to a generic 500 page. From what Hank shows the interal logging is correct, but since an job ID is never generated we can't actually propogate that information to the user in any sort of way.

Also if we were to generate the job id first we could properly log validation failures in the database. I don't know how much effort that would be, just thinking outloud

@james-d-brown
Copy link
Collaborator

james-d-brown commented Oct 15, 2024

Probably.

Afterall, any web service api call is a job in some sense, at least a web service job.

Not all jobs are evaluation jobs and not all web service jobs generate core app jobs (specifically, this validation pathway) but they are all jobs. This is reflected in the core app api, for example, where any call to validate will generate a core app job id and even a database record when using a database. Conversely, the web server pre-validation pathway is currently making a direct call to a helper in a core app module, rather than generating a core app job. This in itself is a bit yucky, but probably necessary for efficiency (I think we decided). On the other hand, a validation from the POV of the web service api should at least generate a web service job id for a user to track, if not a core app job id.

@james-d-brown
Copy link
Collaborator

I think we discussed adding some "validation workers" at some point, i.e., core app instances that are dedicated to validation tasks (or non-evaluation tasks). There is probably a connected ticket to which we need to close the loop. This would eliminate the slightly yucky special snowflake pathway for pre-validation via the tasker. I think the reason we decided not to send these validation jobs to a core app was to avoid hogging an evaluation worker with declaration validation tasks, although they are quick, so the above would probably be the best solution and I vaguely recall a ticket on our backlog.

@james-d-brown
Copy link
Collaborator

Although, tbh, it is debatable whether that extra complexity is worthwhile. We could just send all validation jobs via the same pathway as evaluation jobs, using evaluation workers. Again, they should complete very quickly indeed and users engaged in validation checks are not different users than those engaged in evaluations, so it isn't as though we have thousands of validation endpoint users competing with evaluation end point users.

@james-d-brown
Copy link
Collaborator

Anyway, this is total déjà vu and we should dig out the earlier discussion/ticket and close the loop.

@james-d-brown james-d-brown self-assigned this Oct 15, 2024
@james-d-brown
Copy link
Collaborator

Fixing the superficial issue with the core app.

@james-d-brown
Copy link
Collaborator

Going to create a separate issue to have the validation go through the service front door and/or discuss why this hasn't been done, as I cannot immediately find the earlier conversation.

@HankHerr-NOAA
Copy link
Contributor Author

We created a special pathway for validation, because we did not want to have validation waiting on a worker to be free. As a user, it would be annoying if I have to wait an hour for my evaluation to get processed just to see that it failed validation.

However, maybe I'm missing a key point made above, so let me review the comments more closely,

Hank

@HankHerr-NOAA
Copy link
Contributor Author

So I think I see the opposite case, more or less, mentioned above where we want to, "avoid hogging an evaluation worker with declaration validation tasks". However, that wasn't the primary concern if it was a concern at all. The primary concern was a COWRES that was busy handling 5 evaluations (which is rare, but has happened) and then someone posting another job (perhaps via the WRES GUI) and having to wait a while just to see that it failed validation. We want quick feedback on evaluations that don't pass validation.

At least, that's what I recall,

Hank

@james-d-brown
Copy link
Collaborator

Please add your comments to #339, but I think they are already addressed there.

james-d-brown added a commit that referenced this issue Oct 16, 2024
Catch and rethrow an exception with an advertised type, #336.
@epag
Copy link
Collaborator

epag commented Oct 21, 2024

image
This looks good now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants