Add map visualization using Vega #1629

Merged: ar-jan merged 17 commits into develop from feature/map-vega on Aug 29, 2024


ar-jan (Contributor) commented Jul 13, 2024

Initial version of a Vega map. Some notes and questions:

I looked at using a separate file vega-spec.ts for holding the specification, but since we need component values like this.results it would have to involve more passing values back and forth via functions, so I didn't do that.
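For reference, a separate spec file could expose a single builder function, so that component values are passed in once as arguments rather than back and forth. This is only a sketch: `buildMapSpec`, `MapSpecOptions`, `GeoResult` and the `lon`/`lat` field names are hypothetical and not taken from this PR.

```typescript
// Hypothetical shape of one search result; the real component's type may differ.
interface GeoResult {
    id: string;
    lon: number;
    lat: number;
}

// Values the component would hand over when (re)building the spec.
interface MapSpecOptions {
    results: GeoResult[];
    center: [number, number]; // [longitude, latitude]
    scale: number;
}

// Returns a plain Vega spec object; nothing here depends on the component itself.
function buildMapSpec(options: MapSpecOptions) {
    return {
        $schema: 'https://vega.github.io/schema/vega/v5.json',
        width: 600,
        height: 400,
        projections: [
            {
                name: 'projection',
                type: 'mercator',
                center: options.center,
                scale: options.scale,
                translate: [{ signal: 'width / 2' }, { signal: 'height / 2' }],
            },
        ],
        data: [
            {
                name: 'world',
                url: 'assets/world-atlas/land-110m.json',
                format: { type: 'topojson', mesh: 'land' },
            },
            {
                name: 'points',
                values: options.results,
                // Projects lon/lat to pixel coordinates, written to x and y.
                transform: [
                    { type: 'geopoint', projection: 'projection', fields: ['lon', 'lat'] },
                ],
            },
        ],
        marks: [
            {
                type: 'shape',
                from: { data: 'world' },
                encode: { enter: { stroke: { value: '#888' }, fill: { value: 'transparent' } } },
                transform: [{ type: 'geoshape', projection: 'projection' }],
            },
            {
                type: 'symbol',
                from: { data: 'points' },
                encode: { enter: { x: { field: 'x' }, y: { field: 'y' }, size: { value: 40 } } },
            },
        ],
    };
}
```

The component would then call something like `buildMapSpec({ results: this.results, center, scale })` and hand the returned object to vega-embed.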

Reusability. It might make sense to make certain things configurable while keeping the rest of the spec the same, e.g. options to use a background map of land mass without borders as here, and another one with country borders. On the other hand there might be all kinds of configurable things per corpus, like background colors, symbol sizes, and desired map center and zoom level (could also be dynamic based on search results, but that is more involved and even then a corpus default config may be desirable). So it could make sense to allow providing a vega spec per corpus that uses a Vega visualization. Or maybe we postpone such issues till it becomes relevant.

I first looked at Luka's experiment in this branch, but found that using vega-embed makes various things like tooltips easier.

Background geodata. At some point I ran into issues with data not loading from the CDN due to CORS headers, so I added a local asset from world-atlas. But it actually does work from the CDN now:

@@ -161,7 +161,7 @@ export class MapComponent implements OnChanges {
             "data": [
                 {
                     "name": "world",
-                    "url": "assets/world-atlas/land-110m.json",
+                    "url": "https://unpkg.com/world-atlas@2/land-110m.json",
                     "format": {
                         "type": "topojson",
                         "mesh": "land",

Do we have a preference for local or CDN? The local file does come with a different license (ISC). Alternatively, we could generate our own TopoJSON files from Natural Earth data, which is public domain.

Lastly, I'm not sure what the best approach for lifecycle is here. I think currently this.vegaMap is only guaranteed to exist at ngAfterViewInit, but is expected by functions used in ngOnChanges. This works in practice, but I suppose it isn't correct?
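One common way to handle this ordering is to let `ngOnChanges` skip rendering until the view is ready, and to do the first render from `ngAfterViewInit`. The sketch below uses a plain class with no Angular imports to show only the control flow; `MapLifecycleSketch`, `viewReady`, `renderCount` and `ContainerRef` are illustrative names, not code from this PR.

```typescript
// Stand-in for the element reference Angular would assign to a @ViewChild
// property; in a real component it is only guaranteed from ngAfterViewInit on.
interface ContainerRef {
    nativeElement: { id: string };
}

class MapLifecycleSketch {
    vegaMap?: ContainerRef; // assigned by Angular in a real component
    results: unknown[] = [];
    renderCount = 0; // only here to make the behaviour observable
    private viewReady = false;

    // Can run before the view exists, so it renders only once the view is ready.
    ngOnChanges(): void {
        if (this.viewReady) {
            this.render();
        }
    }

    // First hook at which the container is guaranteed to exist.
    ngAfterViewInit(): void {
        this.viewReady = true;
        this.render();
    }

    private render(): void {
        if (!this.vegaMap) {
            return; // defensive: nothing to draw into yet
        }
        // A real component would build the spec and call vega-embed here.
        this.renderCount += 1;
    }
}
```

With this shape, an early `ngOnChanges` call is a no-op, and later input changes re-render as usual.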

ar-jan requested a review from BeritJanssen on July 15, 2024, 09:37

lukavdplas (Contributor) commented

> Reusability. It might make sense to make certain things configurable while keeping the rest of the spec the same, e.g. options to use a background map of land mass without borders as here, and another one with country borders. On the other hand there might be all kinds of configurable things per corpus, like background colors, symbol sizes, and desired map center and zoom level (could also be dynamic based on search results, but that is more involved and even then a corpus default config may be desirable). So it could make sense to allow providing a vega spec per corpus that uses a Vega visualization. Or maybe we postpone such issues till it becomes relevant.

Of course, it would be nice if this implementation can readily be reused in future corpora where documents have associated coordinates, and it's really the design philosophy of I-analyzer that analysis modules are independent of corpora.

However, there is no such thing as a "normal" or "neutral" map representation, so it's hard to judge now what kind of map would make sense for future corpora, or what should be configurable. I recommend against building lots of options "just in case".

Currently, I-analyzer doesn't really have a system to add visualisation configurations per corpus. (Though some things are set implicitly based on elasticsearch mappings and such.) So even if you know what to configure, there is a question of where these options would even go.

For what it's worth, my intuition would be:

  • Whether or not political borders are relevant (and if so, which) is definitely going to depend on the corpus, but I think it's an acceptable choice to only offer land masses for now.
  • Background colours, symbol sizes, etc., are all aesthetic choices that should be made for the whole application to maintain a consistent look. We don't offer these kinds of options for other visualisations either. (That is, not per corpus.)
  • It would be a shame if the centre or zoom level end up being hard coded, because that will really affect reusability. Setting this dynamically is probably preferable since, as mentioned, we don't actually have a configuration system for things like this. You might get better UX if this is based on the complete corpus rather than the queryset.

In the long term, a plugin system for visualisations would make it more acceptable to have a visualisation plugin tailored to a single corpus, or might include a system to add some configurations per corpus.

ar-jan (Contributor, Author) commented Jul 15, 2024

> It would be a shame if the centre or zoom level end up being hard coded, because that will really affect reusability. Setting this dynamically is probably preferable since, as mentioned, we don't actually have a configuration system for things like this. You might get better UX if this is based on the complete corpus rather than the queryset.

Right, it's better for this to be dynamic. The most straightforward way to do that is probably via a Vega transform, but that will be based on the current results rather than the whole corpus. If we want to have it per corpus, we are probably back to needing a corpus configuration system for such options. Maybe a separate backend method that calculates the center or bounding box for the corpus is possible, but that seems wasteful?
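As a rough illustration of the results-based option, the center can be derived from the bounding box of the current results. The sketch below does this in plain TypeScript rather than as a Vega transform; `LonLat`, `BoundingBox`, `boundingBox` and `centerOf` are hypothetical names. It takes the midpoint of the extremes and does not handle point sets that cross the antimeridian.

```typescript
interface LonLat {
    lon: number;
    lat: number;
}

interface BoundingBox {
    west: number;
    south: number;
    east: number;
    north: number;
}

// Smallest lon/lat box containing all points; undefined for an empty result set.
function boundingBox(points: LonLat[]): BoundingBox | undefined {
    if (points.length === 0) {
        return undefined;
    }
    let west = Infinity;
    let south = Infinity;
    let east = -Infinity;
    let north = -Infinity;
    for (const p of points) {
        west = Math.min(west, p.lon);
        east = Math.max(east, p.lon);
        south = Math.min(south, p.lat);
        north = Math.max(north, p.lat);
    }
    return { west, south, east, north };
}

// Midpoint of the box as [longitude, latitude], the order Vega projections use.
function centerOf(box: BoundingBox): [number, number] {
    return [(box.west + box.east) / 2, (box.south + box.north) / 2];
}
```

The same box could also feed a zoom heuristic, since its width and height indicate how far the results are spread out.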

lukavdplas (Contributor) commented

> If we want to have it per corpus, we are probably back to needing a corpus configuration system for such options. Maybe a separate backend method that calculates the center or bounding box for the corpus is possible, but that seems wasteful?

It's plausible that elasticsearch can give you a bounding box for the whole index super efficiently (i.e., that it optimises storage for that kind of question), so then there wouldn't really be any meaningful waste in making an aggregation request during runtime. That's just speculation, though.

BeritJanssen (Contributor) commented Jul 17, 2024

Yep, Elasticsearch can do bounding box queries:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-geo-bounding-box-query.html

EDIT: I only glanced over the PR before and realize now that this wasn't the question. At the same time, a bounding box query for dynamic centering is a good starting point. If it proves too inefficient in the long run, we might save the corpus' bounding box in the database, Elasticsearch index, or file.
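To make the distinction concrete: the geo bounding box query linked above filters documents by a given box, whereas Elasticsearch's `geo_bounds` aggregation computes the box that contains all matching documents. A minimal sketch of the request body and of reading the result follows. The field name passed in, the aggregation name `corpus_bounds`, and the helper names are assumptions; the backend in this PR may do this differently.

```typescript
// Request body for a geo_bounds aggregation over a geo_point field.
// size: 0 asks for the aggregation only, without returning any hits.
function corpusBoundsRequest(field: string) {
    return {
        size: 0,
        aggs: {
            corpus_bounds: { geo_bounds: { field } },
        },
    };
}

// The part of the search response this sketch relies on.
// bounds is treated as optional in case no document has a value for the field.
interface GeoBoundsResponse {
    aggregations: {
        corpus_bounds: {
            bounds?: {
                top_left: { lat: number; lon: number };
                bottom_right: { lat: number; lon: number };
            };
        };
    };
}

// Midpoint of the returned box as [longitude, latitude].
function centerFromBounds(response: GeoBoundsResponse): [number, number] | undefined {
    const bounds = response.aggregations.corpus_bounds.bounds;
    if (!bounds) {
        return undefined;
    }
    return [
        (bounds.top_left.lon + bounds.bottom_right.lon) / 2,
        (bounds.top_left.lat + bounds.bottom_right.lat) / 2,
    ];
}
```

Running this without a query gives bounds for the whole index. Adding the current search query to the same request body would give bounds for the result set instead.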

BeritJanssen (Contributor) left a review comment

Nice! It would be good to change the following points before merging though:
[Image: Screenshot 2024-07-24 at 14 41 21]

  • change the colour of the markers to "I-Analyzer blue"
  • hide the color palette tool as this doesn't do anything

One thing I was wondering about is whether we want to investigate working with map tiles, e.g., Open Map Tiles Positron. The feature-less map is fine for now, but I remember that the researchers were interested in seeing names of cities, rivers, etc. on the map, which might help with visualizing political centers and/or travelling routes in the ancient world, too. This goes beyond the scope of this PR, and I also cannot estimate the complexity of integrating this into the map visualization with Vega (though it seems that libraries for this purpose do exist).

NB: my screenshot shows data points in the water or in central Africa - this is probably due to my obsolete dataset (which had lat/long swapping problems)

A question from Tijmen: what happens with exactly overlapping data points? Are they jittered? Or could it happen that one datapoint exactly overlays another?

BeritJanssen (Contributor) left a review comment

LGTM, apart from the permission issue: better to be less restrictive here. I see that the MapView also has IsAuthenticated. In practice, we'll see little difference right now whether this permission is on or off, as the only corpus we'll have it available for is an authentication-only one. But with plans to open this corpus at a later stage, it's good to already change permissions now.

Review thread on backend/visualization/views.py (outdated, resolved)

ar-jan (Contributor, Author) commented Aug 29, 2024

I seem to have angered the Lifecycle Hook Gods: I was getting occasional TypeError: this.mapCenter is undefined errors using ngOnInit. Does this change make sense? It causes some otherwise unneeded requests to fetch the map center, but it seems more reliable.

ar-jan merged commit 4e5a6d7 into develop on Aug 29, 2024 (2 checks passed)
ar-jan deleted the feature/map-vega branch on August 29, 2024, 13:58
ar-jan mentioned this pull request on Sep 5, 2024 and Dec 12, 2024