Skip to content

Conversation

@lucaswoj
Copy link
Contributor

@lucaswoj lucaswoj commented Dec 4, 2025

Summary

Part of #4364

This PR carves out an exception from the optimization added in #6668. When GeoJSON data is loaded from a URL, we now send it back to the main thread so that it can be:

When GeoJSON is provided inline or via diff updates, the main thread already has the data, so we skip sending it back (preserving the #6668 optimization).

This will make GeoJSONSource#updateData much faster when the GeoJSON data was loaded from a URL.

Benchmarks

$ npm run benchmark -- GeoJSONSourceUpdateData && npm run benchmark -- GeoJSONSourceSetData 

> [email protected] benchmark
> node --no-warnings --loader ts-node/esm test/bench/run-benchmarks.ts GeoJSONSourceUpdateData

Starting headless chrome at: http://localhost:9966/test/bench/versions/index.html
                                      v5.14.0                  main  set-data-url 85c1f56 
 GeoJSONSourceUpdateData                                92.3814 ms            92.6344 ms 

> [email protected] benchmark
> node --no-warnings --loader ts-node/esm test/bench/run-benchmarks.ts GeoJSONSourceSetData

Starting headless chrome at: http://localhost:9966/test/bench/versions/index.html
                                   v5.14.0                  main  set-data-url 85c1f56 
 GeoJSONSourceSetData                               279.7234 ms           284.3850 ms 

Launch Checklist

  • Confirm your changes do not include backports from Mapbox projects (unless with compliant license)
  • Briefly describe the changes in this PR.
  • Link to related issues.
  • Include before/after visuals or gifs if this PR includes visual changes.
  • Write tests for all new functionality.
  • Document any changes to public APIs.
  • Post benchmark scores.
  • Add an entry to CHANGELOG.md under the ## main section.

@lucaswoj lucaswoj changed the title Send GeoJSON data from worker to main thread IF loaded from a URL Send GeoJSON data from worker to main if loaded from a URL Dec 4, 2025
@codecov
Copy link

codecov bot commented Dec 4, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 92.42%. Comparing base (f681bbb) to head (6af1a9b).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #6811   +/-   ##
=======================================
  Coverage   92.42%   92.42%           
=======================================
  Files         288      288           
  Lines       23807    23809    +2     
  Branches     5056     5058    +2     
=======================================
+ Hits        22003    22005    +2     
  Misses       1804     1804           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@HarelM
Copy link
Collaborator

HarelM commented Dec 5, 2025

How about fetching it in the main thread instead of serializing it from the worker?
I think the browser will cache the response so the worker won't really fetch it again.
This is similar to what we did in the rtl plugin fetching.

@lucaswoj
Copy link
Contributor Author

lucaswoj commented Dec 5, 2025

There's no way to guarantee the browser will cache the response. If the response has a Cache-Control: no-store header, for example, it will certainly be fetched twice.

@HarelM
Copy link
Collaborator

HarelM commented Dec 5, 2025

Does it make sense now to return the data in getData method not from the worker but from the main thread class?

@lucaswoj
Copy link
Contributor Author

lucaswoj commented Dec 5, 2025

Does it make sense now to return the data in getData method not from the worker but from the main thread class?

As we discussed in #6738, there are still cases where GeoJSONSource#getData returns a different value than GeoJSONSource#serialize

... if you called setData with some improperly wound GeoJSON geometries, and then called Map#getStyle, you'd get the rewound geometries back. Even though this is undocumented behavior, I suspect at least a couple folks are relying on it. They will need to use the GeoJSONSource#getData method instead.

I think it's worth having this method around for backwards compatibility.

(Sidenote: If and when we're ready to make some breaking changes, I'd love to see GeoJSONSource#serialize and Map#getStyle become async and pull this giant dataset from the worker only when it's requested)

@lucaswoj lucaswoj requested a review from HarelM December 5, 2025 07:44
@HarelM
Copy link
Collaborator

HarelM commented Dec 5, 2025

If you have ideas for breaking changes please add them to the following issue:

@HarelM
Copy link
Collaborator

HarelM commented Dec 5, 2025

I'm not sure I'm following the last comment. If we now have the data on the main thread, as the user set it, what's the point of going to the worker to get it?
It can be out of scope for this PR but I don't understand why not do it.

@lucaswoj
Copy link
Contributor Author

lucaswoj commented Dec 5, 2025

I'll try to explain a different way: GeoJSONSource#getData doesn't return the data exactly "as the user set it." It returns data that has undergone some additional cleanup (namely, rewinding geometries via the geojson-rewind package and possibly filtering via the GeoJSONSourceOptions#filter option). Our documentation doesn't explicitly say that GeoJSONSource#getData returns rewound geometries but it's possible that some users really depend on that. I see no reason to break GeoJSONSource#getData for them.

@lucaswoj
Copy link
Contributor Author

lucaswoj commented Dec 5, 2025

Oh! I might've explained the wrong part:

If we now have the data on the main thread, as the user set it, what's the point of going to the worker to get it?

The only reason we must keep the full GeoJSON data on the main thread is to support the syncronous GeoJSONSource#serialize method. If that method were async, it's possible we could entirely get rid of the need to keep it in memory on the main thread. In some cases, this could save hundreds of megabytes of memory usage.

@HarelM
Copy link
Collaborator

HarelM commented Dec 5, 2025

The only reason we must keep the full GeoJSON data on the main thread is to support the syncronous GeoJSONSource#serialize method

I thinks this is no longer true given the recent changes with reload tile and the possible solution for undefined properties, or am I understanding this wrong?

@HarelM
Copy link
Collaborator

HarelM commented Dec 5, 2025

In any case, we can always change that in a future PR, this code can be merged regardless of this discussion.

@HarelM HarelM merged commit d103e4f into maplibre:main Dec 5, 2025
26 checks passed
@lucaswoj
Copy link
Contributor Author

lucaswoj commented Dec 5, 2025

I thinks this is no longer true given the recent changes with reload tile and the possible solution for undefined properties, or am I understanding this wrong?

It could easily be true if we merged #6801 instead of #6800; and chose one of the other possible solutions for undefined properties in #6803

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants