Difference between revisions of "Zimit"

1,835 bytes added, 12:31, 24 May 2023
 
(2 intermediate revisions by 2 users not shown)
Line 125: Line 125:
* JSONP needs access to the "callback" querystring value of the request.


We could do the static rewriting by setting placeholders (<code>${RW_SERVER_HOST}</code>, <code>${RW_URL}</code>, ...) for things that need to be rewritten dynamically.
Wombat initialization would be inserted in the HTML page at this step. Wombat itself will be used exactly the same way we use it now (catching URL changes/requests coming from JS and rewriting them to "local" URLs).
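
As an illustration of the placeholder idea, here is a minimal Python sketch of what the read-time substitution could look like. It is only an assumption about one possible shape: the function name <code>fill_placeholders</code>, the regular expression, and the rule that every placeholder starts with <code>RW_</code> are hypothetical and not part of any existing Zimit, warc2zim or libkiwix API. Only the <code>${RW_...}</code> names are matched, so other <code>$</code> sequences in the content (JS template literals, jQuery) are left alone.

```python
import re

# Hypothetical convention: every placeholder written at creation time
# looks like ${RW_SOMETHING}. Anything else containing "$" is ignored.
PLACEHOLDER = re.compile(r"\$\{(RW_[A-Z_]+)\}")


def fill_placeholders(content: str, values: dict) -> str:
    """Replace ${RW_*} placeholders with values known only at reading time.

    Placeholders without a value in `values` are left intact.
    """
    def substitute(match):
        return values.get(match.group(1), match.group(0))

    return PLACEHOLDER.sub(substitute, content)


# Example values a reader might know when serving the entry (assumed).
request_values = {
    "RW_SERVER_HOST": "http://localhost:8080/content/myzim",
    "RW_URL": "https://example.org/page",
}
stored = '<a href="${RW_SERVER_HOST}/example.org/page">${RW_URL}</a>'
served = fill_placeholders(stored, request_values)
```

A substitution this simple could be reimplemented in each reader without porting the full rewriting logic, which seems to be the point of moving most of the work to creation time.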


=== At reading ===
Line 161: Line 162:
* The fuzzy rules (how to generate fuzzy URLs from the data-driven fuzzy_rules). https://github.com/webrecorder/wabac.js/blob/main/src/fuzzymatcher.js seems to be a good "specification".


== Notes/Questions: ==


* Revisit and redirect are different: a redirect makes kiwix-serve return a 302 to the target, whereas a revisit makes kiwix-serve answer with a 2xx and the content of the revisit's target.
* We may store <code>H</code> revisits as redirect entries in the ZIM file anyway.
* Restrict <code>H/url</code> lookup to entries with a specific mimetype (there's no standard, we can set an <code>X-HTTP-Headers</code>).
* Maybe keep a switch (using a ''private'' tag?) to toggle content rewriting, as there is no reason to run it on ZIMs that don't need it.
* I can't think of any use (other than debugging) for exposing the fuzzy rules. Not having them in C would be another reason to allow pylibzim to access the X namespace. Right now the only way is via ID, which is sort of a hack.
* Are we keeping the ''modifier'' prefix? You mention it at creation time but not afterwards. I understand it's '''mostly''' Content-Type based and used to toggle rewriting. I understand what's written above as: we'll conditionally rewrite some stuff but use the Content-Type instead. Correct?
* What will our entry paths look like? Full URL? <code>/<nowiki>https://developer.mozilla.org/en-US/</nowiki></code>? Current warc2zim stores a canonicalized version without scheme in the ZIM, but the content and the SW use full URLs.
* We'll need to reconstruct the URL by concatenating any query parameters sent to the reader/kiwix-serve. We should be aware that this could be challenging on some websites, as a website could generate both <code>/home?article_id=32&lang=fr</code> and <code>/home?lang=fr&article_id=32</code>: in a normal dynamic server context these are the same, but in our static one they are not. The SW probably took care of that; we should look into how it was implemented.
* We won't have any chrome or iframe anymore. MainPage would be the start URL.
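
To make the query-parameter ordering concern concrete, here is a minimal Python sketch of one possible canonicalization: sorting parameters by name so that both orderings map to the same entry path. This is an assumption for illustration only. It is not how the SW or warc2zim actually handles it (that still needs to be checked), and the function name <code>canonical_query_url</code> is hypothetical. The sort is stable and keyed on the parameter name only, so repeated parameters keep their relative order. Note that <code>urlencode</code> may also normalize percent-encoding.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonical_query_url(url: str) -> str:
    """Return `url` with its query parameters sorted by name.

    The fragment is dropped because it is never sent to the server.
    """
    parts = urlsplit(url)
    # keep_blank_values keeps parameters such as "?debug=" in the result.
    params = parse_qsl(parts.query, keep_blank_values=True)
    # sorted() is stable: repeated keys keep their original relative order.
    params = sorted(params, key=lambda pair: pair[0])
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(params), "")
    )


# Both orderings from the example above resolve to the same path.
first = canonical_query_url("/home?article_id=32&lang=fr")
second = canonical_query_url("/home?lang=fr&article_id=32")
same_entry = first == second
```

The same canonicalization would have to be applied both when writing entry paths at creation time and when looking them up at reading time, otherwise the lookup would still miss.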


== Questions ==
Line 189: Line 197:
* What are “prefix queries”? “prefix search”?
* How does the replayer cache system work? What's its main purpose? Can it be turned off?
* What's the difference between a ''page'' (as in pages.jsonl) and a <code>text/html</code> entry? Status code only?
* Is there a WARC testing suite with various use and corner cases?