This repository has been archived by the owner on Sep 17, 2020. It is now read-only.

"Indexing has stalled" #78

Open
nar001 opened this issue May 3, 2019 · 4 comments

nar001 commented May 3, 2019

So I'm trying to load a WARC file; it goes to 100% and then tells me it has stalled. Navigating to the interface shows "Almost Done!" but it never goes any further. A long time ago, I used to be able to browse it, but now it doesn't work and I'm not quite sure why. Thanks!

nienkedekker commented Oct 11, 2019

I'm running into the same issue on the latest release on macOS. It indexes and then stalls, eating up CPU like crazy. This is the additional information available:

[Screenshot 2019-10-11 22 47 21]

http://localhost:54292 just returns this JSON:

{"/live": {"modes": ["list_sources", "index", "resource"]}, "/live/postreq": {"modes": ["list_sources", "index", "resource"]}, "/extract": {"modes": ["list_sources", "index", "resource"]}, "/extract/postreq": {"modes": ["list_sources", "index", "resource"]}, "/replay": {"modes": ["list_sources", "index", "resource"]}, "/replay/postreq": {"modes": ["list_sources", "index", "resource"]}, "/replay-coll": {"modes": ["list_sources", "index", "resource"]}, "/replay-coll/postreq": {"modes": ["list_sources", "index", "resource"]}, "/patch": {"modes": ["list_sources", "index", "resource"]}, "/patch/postreq": {"modes": ["list_sources", "index", "resource"]}}

The WARC file I'm trying to open is 5.01 GB. Please let me know if you need any additional information :)

ikreymer added a commit to Rhizome-Conifer/conifer that referenced this issue Nov 9, 2019
- limit pages and bookmarks to 10000
- add settings to limit bookmarks and pages separately
- include page and bookmark creation in progress bar
before, page/bookmark creation was taking a long time but not included in progress update
should fix #768, likely webrecorder/webrecorder-player#87, webrecorder/webrecorder-player#78, webrecorder/webrecorder-player#86
ikreymer added a commit to Rhizome-Conifer/conifer that referenced this issue Nov 9, 2019
* upload improvements:
- limit pages and bookmarks to 10000
- add settings to limit bookmarks and pages separately
- include page and bookmark creation in progress bar, last 80-90% for page indexing, and 90-100% for bookmark creation.
- optimize: use zscan_iter() for iterating over pages, add polyfill for fakeredis to still use zrange
- fix tests
- bump version to 4.8.4

previously, page/bookmark creation was taking a long time but not included in progress update
should fix #768, likely webrecorder/webrecorder-player#87, webrecorder/webrecorder-player#78, webrecorder/webrecorder-player#86
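
The `zscan_iter()` / `zrange` item in the commit message above can be illustrated with a minimal sketch. This is not the code from the referenced commit: the helper names `iter_sorted_set` and `first_n` and the `client` argument are hypothetical, and the only assumption is a redis-py style client that offers `zrange(key, start, end, withscores=True)` and, optionally, `zscan_iter(key, count=...)`.

```python
from itertools import islice
from typing import Any, Iterable, Iterator, Tuple


def iter_sorted_set(client: Any, key: str, batch: int = 1000) -> Iterator[Tuple[Any, float]]:
    """Yield (member, score) pairs from a sorted set.

    Hypothetical helper: prefers the cursor-based zscan_iter(), which fetches
    the set in batches, and falls back to a single zrange() call when the
    client (for example a test double) does not provide zscan_iter().
    """
    scan = getattr(client, "zscan_iter", None)
    if callable(scan):
        # `count` is only a hint for how many items to fetch per round trip.
        yield from scan(key, count=batch)
    else:
        # Fallback: load the whole range in one call.
        yield from client.zrange(key, 0, -1, withscores=True)


def first_n(pairs: Iterable[Tuple[Any, float]], limit: int = 10000) -> Iterator[Tuple[Any, float]]:
    """Stop after `limit` items, mirroring the 10000 page/bookmark cap."""
    return islice(pairs, limit)
```

One design caveat: a scan-based iteration does not guarantee score order the way `zrange` does, so a caller that needs ordered pages would have to sort the capped result itself.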

ikreymer (Member) commented

Please try the 1.8.0 release. We've made some improvements to large WARC indexing, and it should work much better.

alvar-freude commented

I have the same issue with (small) HAR files and the 1.8.0 release (macOS).

The progress bar is at 100%.

Extra Debug Info is:

Created user local with the email test@localhost and the role: 'public-archivist'
ERROR PARSING: /path/to/file.har
'pages'
WARCSERVER_HOST=http://localhost:52971

skip {'name': 'Admin', 'description': 'Admin API'}
skip {'name': 'Stats', 'description': 'Stats API'}
skip {'name': 'Automation', 'description': 'Automation API'}
APP_HOST=http://localhost:52972

The page at http://localhost:52972 shows the message "Almost Done!" and a progress bar at 100%. It seems that everything has finished, but something else fails …
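
The bare `'pages'` line after `ERROR PARSING` in the debug output above looks like the message of a Python `KeyError`, which would suggest the parser expected a `pages` key in the HAR file. That is only a guess from the log, not something confirmed in this thread. In the HAR 1.2 format the `pages` list inside `log` is optional, so a quick check of the file can at least rule this in or out. The sketch below is a standalone diagnostic; the function name `har_summary` is hypothetical and it is not part of the player.

```python
import json


def har_summary(har_text: str) -> dict:
    """Report whether a HAR document has the optional `log.pages` list.

    Hypothetical diagnostic helper: it only inspects the JSON structure and
    does not reproduce how the player parses HAR files.
    """
    log = json.loads(har_text).get("log", {})
    return {
        "has_pages": "pages" in log,
        "page_count": len(log.get("pages") or []),
        "entry_count": len(log.get("entries") or []),
        "creator": (log.get("creator") or {}).get("name"),
    }


# Tiny made-up sample without a `pages` list, for illustration only.
sample = '{"log": {"version": "1.2", "creator": {"name": "example"}, "entries": []}}'
print(har_summary(sample))
```

To check a real export, read the file with `open(path, encoding="utf-8").read()` and pass the text to `har_summary`. If `has_pages` is false for the failing file and true for the working one, that would support the missing-key guess.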

alvar-freude commented Apr 20, 2020

I ran some more tests. With HAR files from the Safari developer toolbar there seems to be no problem. A simple website (like "hello world" without any other files) is OK, and this GitHub page here is also OK.

But a HAR file saved with the Firefox developer toolbar has the problem described above. Even a really simple HAR fails.
