Difference between revisions of "Zimit"

1,543 bytes added, 12:31, 24 May 2023
* The fuzzy rules (how to generate a fuzzy URL from the data-driven fuzzy_rules). https://github.com/webrecorder/wabac.js/blob/main/src/fuzzymatcher.js seems to be a good "specification".
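To make the fuzzy-rules item more concrete, here is a minimal Python sketch of the general idea: a data-driven rule says which URLs it applies to and which query arguments actually identify the resource, and the fuzzy URL is built by dropping the other arguments (e.g. cache-busting timestamps). The <code>FuzzyRule</code> structure, the <code>fuzzy_url</code> function and the <code>api.example.com</code> rule are all hypothetical; this is not the wabac.js rule schema, which is defined in fuzzymatcher.js and may work quite differently.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


@dataclass(frozen=True)
class FuzzyRule:
    """Hypothetical, simplified fuzzy rule (not the wabac.js schema)."""
    match: re.Pattern            # URLs this rule applies to
    keep_args: Tuple[str, ...]   # query arguments that identify the resource


def fuzzy_url(url: str, rules: List[FuzzyRule]) -> str:
    """Return the fuzzy form of `url` according to the first matching rule."""
    for rule in rules:
        if rule.match.search(url):
            parts = urlsplit(url)
            # Keep only the identifying arguments, in their original order.
            kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
                    if k in rule.keep_args]
            # Rebuild the URL without the dropped arguments and without a fragment.
            return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))
    return url  # no rule matched: the URL is used as-is


RULES = [
    # Hypothetical endpoint whose "_" argument is a cache-busting timestamp.
    FuzzyRule(re.compile(r"^https?://api\.example\.com/search"), ("q", "page")),
]
```

With these rules, <code>fuzzy_url("https://api.example.com/search?_=1684929060&q=zim&page=2", RULES)</code> yields <code>https://api.example.com/search?q=zim&page=2</code>, so captures that differ only in the timestamp resolve to the same entry.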


== Notes/Questions ==


* Revisit and redirect are different: a redirect makes kiwix-serve return a 302 to the target, while a revisit makes kiwix-serve answer with a 2xx and the content of the revisit's target.
* We may still store <code>H</code> revisits as redirect entries in the ZIM file.
* Restrict <code>H/url</code> lookups to entries with a specific MIME type (there is no standard one for this; we can set a custom one such as <code>X-HTTP-Headers</code>).
* Maybe keep a switch (using a ''private'' tag?) to toggle content rewriting, as there is no reason to run it on ZIMs that don't need it.
* I can't think of any use (other than debugging) for exposing the fuzzy rules. Not having them in C would be another reason to allow pylibzim to access the X namespace. Right now the only way is via the entry ID, which is something of a hack.
* Are we keeping the ''modifier'' prefix? You mention it at creation time but not afterwards. I understand it is '''mostly''' Content-Type based and used to toggle rewriting. My reading of the above is: we'll conditionally rewrite some content, but base that decision on the Content-Type instead. Correct?
* What will our entry paths look like? Full URLs, such as <code>/<nowiki>https://developer.mozilla.org/en-US/</nowiki></code>? The current warc2zim stores a canonicalized version without the scheme in the ZIM, but the content and the SW use full URLs.
* We'll need to reconstruct the URL by concatenating any query parameters sent to the reader/kiwix-serve. This could be challenging on some websites: a site could generate both <code>/home?article_id=32&lang=fr</code> and <code>/home?lang=fr&article_id=32</code>, which are the same resource for a normal dynamic server but two distinct URLs in our static context. The SW probably took care of that; we should look into how it was implemented.
* We won't have any chrome or iframe anymore. The MainPage would be the start URL.
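The revisit/redirect distinction in these notes can be sketched as a toy request handler. <code>Entry</code>, its <code>kind</code> and <code>target</code> fields, and <code>serve</code> are hypothetical stand-ins, not the libzim or kiwix-serve API; the point is only that a redirect produces a 302 with a <code>Location</code> header, while a revisit is resolved server-side and answered with a 2xx carrying the target's content.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Entry:
    """Hypothetical in-memory stand-in for a ZIM entry (not the libzim API)."""
    content: bytes = b""
    mimetype: str = "text/html"
    kind: str = "item"            # "item", "redirect" or "revisit"
    target: Optional[str] = None  # target path for "redirect" and "revisit"


Response = Tuple[int, Dict[str, str], bytes]


def serve(entries: Dict[str, Entry], path: str, max_hops: int = 10) -> Response:
    """Answer a request for `path` using the redirect/revisit semantics above."""
    entry = entries.get(path)
    for _ in range(max_hops):
        if entry is None:
            break
        if entry.kind == "redirect":
            # Redirect: send the client to the target with a 302.
            return 302, {"Location": entry.target}, b""
        if entry.kind == "revisit":
            # Revisit: resolve the target here and answer 2xx with its content.
            entry = entries.get(entry.target)
            continue
        return 200, {"Content-Type": entry.mimetype}, entry.content
    # Missing entry, missing target, or a revisit chain longer than max_hops.
    return 404, {}, b""
```

If <code>H</code> revisits end up stored as plain ZIM redirect entries, the server would presumably need some extra marker to tell them apart from real HTTP redirects, since the two would otherwise look identical in the ZIM.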
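One possible approach to the entry-path and query-ordering notes is to canonicalize every URL the same way at ZIM creation time and at request time. This sketch drops the scheme and fragment (similar in spirit to the current warc2zim canonicalization, whose exact rules may differ) and sorts the query arguments so that both <code>/home</code> variants map to one path. <code>canonical_path</code> is hypothetical, and sorting is an assumption, not necessarily what the SW does.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit


def canonical_path(url: str) -> str:
    """Hypothetical ZIM entry path for `url`.

    Drops the scheme and fragment, and sorts query arguments so that
    argument order does not create distinct entries.
    """
    parts = urlsplit(url)
    path = parts.netloc + (parts.path or "/")
    args = parse_qsl(parts.query, keep_blank_values=True)
    if args:
        # sorted() orders by argument name, then by value for repeated names.
        path += "?" + urlencode(sorted(args))
    return path
```

Both <code>https://example.com/home?article_id=32&lang=fr</code> and <code>https://example.com/home?lang=fr&article_id=32</code> map to <code>example.com/home?article_id=32&lang=fr</code>. Note that <code>parse_qsl</code>/<code>urlencode</code> also normalize percent-encoding (a space becomes <code>+</code>), and that sorting would wrongly merge URLs on the rare sites where argument order is meaningful.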
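The rewriting switch and the Content-Type based decision discussed in these notes could be combined into a single check at serve time. In this sketch, <code>rewriting_enabled</code> stands in for the hypothetical ZIM-level switch (for instance read from a ''private'' tag), and <code>REWRITABLE_TYPES</code> is an illustrative list, not a decided one.

```python
from typing import FrozenSet

# Hypothetical set of Content-Types whose bodies would need URL rewriting.
REWRITABLE_TYPES: FrozenSet[str] = frozenset({
    "text/html",
    "text/css",
    "text/javascript",
    "application/javascript",
    "application/json",
})


def should_rewrite(rewriting_enabled: bool, content_type: str) -> bool:
    """Decide whether an entry's content should be rewritten when served.

    rewriting_enabled: the (hypothetical) ZIM-level switch.
    content_type: the entry's Content-Type, possibly with parameters.
    """
    if not rewriting_enabled:
        # ZIMs that don't need rewriting skip it entirely.
        return False
    # Drop parameters such as "; charset=utf-8" and normalize case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in REWRITABLE_TYPES
```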


== Questions ==