Skip to content
This repository has been archived by the owner on Mar 3, 2023. It is now read-only.

Fix for missing physical plan in UI #3786

Merged
merged 85 commits into from
Apr 12, 2022
Merged

Fix for missing physical plan in UI #3786

merged 85 commits into from
Apr 12, 2022

Conversation

nicknezis
Copy link
Contributor

Fixes #3784 and Fixes #3785

@nicknezis
Copy link
Contributor Author

Nice catch. The tests seem to work for me, but there's definitely more to clean up as you mentioned. I guess I started making the code match this line:

var u = baseUrl + '/topologies/metrics/timeline?'
with the assumption we had broken something in the Tracker cleanup. But looking back at old versions of the code, this UI query may never have matched up with the tracker api. Very odd. I personally prefer the /metrics/query syntax, but might be easier to just roll back to /metricsquery style route. And instead I'll update the javascript to use the proper api call.

@surahman
Copy link
Member

surahman commented Mar 3, 2022

But looking back at old versions of the code, this UI query may never have matched up with the tracker api. Very odd.

My guess is technical debt.

I personally prefer the /metrics/query syntax, but might be easier to just roll back to /metricsquery style route.

I agree. Good REST API design necessitates versioning of the API as well as nesting under path routes. All metrics queries should be nested under the /metrics/ path.

Edit: I stand corrected, this is generating a query request.

var url = baseUrl + '/topologies/metricstimeline?';

@nicknezis nicknezis requested a review from surahman March 4, 2022 04:30
@nicknezis nicknezis marked this pull request as draft March 4, 2022 05:22
Copy link
Member

@surahman surahman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great to pay down tech debt. I think there might be a non-functional Python docstring update required:

### <a name="metricsquery">Metrics Query Language</a>

@nicknezis
Copy link
Contributor Author

I'm running into some other issues with parameter mismatch betwen ui and tracker. So I'm doing a bit of testing. One example is topology vs topology_name. There is an alias here, but it didn't seem to work. I was testing with a failed Tracker API call and got different response when using either topology or topology_name. Perhaps the alias isn't working? @Code0x58 might have some insight in helping answer that question.

Another issue I'm seeing is an exception on an API call because role isn't present. But it's listed as Optional. The parameter is treated differently between the two set of routes. Example 1 and Example 2.

@surahman
Copy link
Member

surahman commented Mar 4, 2022

I have always had issues bringing up the UI so I can only look at the code.

One example is topology vs topology_name. There is an alias here, but it didn't seem to work.

The naming convention for topologies has changed on K8s and I am not sure if this would affect the UI. Labels have the topology's name and the StatefulSet name has either -manager or -executors appended to it. I am not sure how the tracker manages the scraping and collecting of stats.

Another issue I'm seeing is an exception on an API call because role isn't present. But it's listed as Optional. The parameter is treated differently between the two set of routes. Example 1 and Example 2.

According to the Optional docs, a value of None will be returned as required. How is a role value of None handled? Exception? What changed?

@nicknezis
Copy link
Contributor Author

So for testing heron-tracker and heron-ui, I realized doing the local install is much easier. You can try this:

  1. bazel run --config=darwin_nostyle -- scripts/packages:heron-install.sh --user
  2. Add ~/.heron/bin/ to your $PATH
  3. Submit analytic:
heron submit local ~/.heron/examples/heron-streamlet-examples.jar \  org.apache.heron.examples.streamlet.WindowedWordCountTopology \
WindowedWordCountTopology \
--deploy-deactivated
  1. Run the tracker: heron-tracker --type file --rootpath ~/.herondata/repository/state/local/
  2. Run the UI: heron-ui

@Code0x58
Copy link
Contributor

Code0x58 commented Mar 6, 2022

I'm running into some other issues with parameter mismatch betwen ui and tracker. So I'm doing a bit of testing. One example is topology vs topology_name. There is an alias here, but it didn't seem to work. I was testing with a failed Tracker API call and got different response when using either topology or topology_name. Perhaps the alias isn't working? @Code0x58 might have some insight in helping answer that question.

It looks like alias may be a bit of a misnomer, it may be better to call it source, if this reference covers everything

@nicknezis
Copy link
Contributor Author

I've been banging my head against this code. Things are definitely broken. I have more drastic changes, but I haven't pushed them yet because I'm still testing. The exceptions page is still broken. Somewhere we lost the combo of component and instances query parameters.

@surahman
Copy link
Member

surahman commented Mar 7, 2022

I am time-constrained for the next few weeks and I still cannot bring up the UI. I typically find it faster to jump in with a debugger and find the issue causing the problem and address it there. Failing that it is time to find what caused the issue.

We need to isolate the commit in which this happened. I would start by checking out the last tagged release to see if the UI is working there. If it is, run a Git bisect between that commit and master branch's HEAD to find the problematic commit. You should have a better idea of what caused the issue. There were major changes to the backend of the UI in the PR that bumped Python to 3.8/3.9, so that is a potential candidate.

@nicknezis nicknezis marked this pull request as ready for review April 7, 2022 22:38
@nicknezis nicknezis requested a review from surahman April 7, 2022 22:42
Copy link
Member

@surahman surahman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incredible work!

I have not had the time to deployment test this PR and it would be nice if someone else could assist with that. There are some old review comments that still need to be addressed a handful of new ones. Overall it is pretty much in a mergeable state.

heron/statemgrs/src/python/statemanager.py Outdated Show resolved Hide resolved
heron/tools/tracker/src/python/main.py Outdated Show resolved Hide resolved
heron/tools/tracker/src/python/metricstimeline.py Outdated Show resolved Hide resolved
heron/tools/ui/src/python/main.py Outdated Show resolved Hide resolved
heronpy/connectors/pulsar/pulsarstreamlet.py Outdated Show resolved Hide resolved
@surahman
Copy link
Member

surahman commented Apr 8, 2022

I just removed a reference to update the global Log that I added.

@nicknezis nicknezis marked this pull request as draft April 8, 2022 18:09
@nicknezis
Copy link
Contributor Author

I can't figure out why the timeline is not working. I've tracked my issue to this line of code. Within the compute_max function, I can print the final dict that is returned. It has values. But when I print data, it only has an empty dict.

@surahman
Copy link
Member

surahman commented Apr 9, 2022

I am not sure but I do not think there is a need to convert the zipped together values into a list and then a dict. But then I also do not believe this would affect the return value.

return dict(list(zip(keys, values)))

Maybe try this:

return dict(zip(keys, values))

@nicknezis
Copy link
Contributor Author

I am not sure but I do not think there is a need to convert the zipped together values into a list and then a dict. But then I also do not believe this would affect the return value.

return dict(list(zip(keys, values)))

Maybe try this:

return dict(zip(keys, values))

I think zip() returns an Iterator and the list() call converts to an actual List. I was not familiar with the zip method, but it seems to be a standard pattern. When I print the value returned from that call I can see the value. It's only when that same result is returned from the compute_max that I get an empty {} in the data variable.

@surahman
Copy link
Member

surahman commented Apr 9, 2022

I think zip() returns an Iterator and the list() call converts to an actual List.

I believe dict might have a constructor to handle zip return values as seen here.

@nicknezis nicknezis marked this pull request as ready for review April 10, 2022 16:52
@Code0x58
Copy link
Contributor

I can't figure out why the timeline is not working. I've tracked my issue to this line of code. Within the compute_max function, I can print the final dict that is returned. It has values. But when I print data, it only has an empty dict.

Did you do that print within the len(filtered_ts) > 0 and len(filtered_ts[0]["timeline"]) > 0: branch of compute_max(...)? I'm wondering if the {} you are seeing is from the fall-though branch.

Beware that timelines and values in compute_max(...) are generators, so will be exhausted if you do something with them before they are used in the return dict(list(zip(keys, values))), so if you did print(dict(list(zip(keys, values)))) it would consume values and lead to zip of the keys list and an empty generator which results in an empty zip (python 3.10 adds strict=True to help detect these) and would so make the return dict(...) line return an empty dict.

p.s.

I am not sure but I do not think there is a need to convert the zipped together values into a list and then a dict. But then I also do not believe this would affect the return value.

return dict(list(zip(keys, values)))

Maybe try this:

return dict(zip(keys, values))

both are equivalent and will be handled the same way due to python's duck typing, as both a generator (returned by zip, in type annotations it is compatible with typing.Iterator but is more strictly a typing.Generator) and list will be treated as iterables. The use of list(generator) can be faster for some cases, but constructs an object which will be discarded so can be a waste of memory.

@surahman
Copy link
Member

Thank you for the detailed explanation!

@nicknezis
Copy link
Contributor Author

nicknezis commented Apr 11, 2022

I think this PR is finally done. I'm now able to see the timeline in the UI and most of the functionality is working (as good as it probably was working in the past). There are some super deep menus I think I'd like to clean up in the future, but this is good to go. Thank you for the help @surahman @Code0x58 and @thinker0 !

Edit: I'll add. This PR fixes the following functionality which was temporarily broken with the upgrade to Python 3. Most of the issues were related to the Heron UI and Heron Tracker APIs. Some of the issues were due to the updated code enforcing type hints in and out of methods. I've manually verified the below items.

  1. Front page status circles
  2. Front page timelines
  3. Exception browser in the UI
  4. Topology Counters
  5. Container File browser
  6. File stats
  7. Log viewer

@thinker0
Copy link
Member

thinker0 commented Apr 11, 2022

+1

My Env Test Success

@thinker0
Copy link
Member

#3814 typo

Copy link
Member

@surahman surahman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⭐ ⭐ ⭐ ⭐ ⭐ 👏

@nicknezis nicknezis merged commit 700125f into master Apr 12, 2022
@nicknezis nicknezis deleted the nicknezis/tracker-fixes branch April 12, 2022 00:27
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Heron UI does not render timeline metrics Heron UI fails to show the Physical Plan layout
4 participants