Difference between revisions of "Zimit"

855 bytes added ,  16:19, 23 May 2023
no edit summary
Line 102: Line 102:


JS rewriter: rewrites a few links, but mostly wraps the code in a "wombat context".
HTML rewriter: rewrites the HTML and uses the CSS/JS rewriters as sub-rewriters for <code><style></code>/<code><script></code> tags.


JSONP rewriter: may rewrite the content based on the request's query string (!!!!!)
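
To illustrate why the JSONP rewriter depends on the request, here is a minimal sketch of such a rewrite. It assumes the payload has the form <code>name(...)</code> and that the query string parameter is called <code>callback</code>; the function name <code>rewrite_jsonp</code> and the regular expression are hypothetical and are not taken from pywb or wabac.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Assumption: a JSONP body starts with a function name followed by "(".
_JSONP_CALL = re.compile(r"^\s*([\w.$]+)\s*\(")


def rewrite_jsonp(body: str, requested_url: str) -> str:
    """Replace the recorded callback name with the one from the request."""
    query = parse_qs(urlsplit(requested_url).query)
    callbacks = query.get("callback")
    if not callbacks:
        # No callback in the request: nothing to rewrite.
        return body
    match = _JSONP_CALL.match(body)
    if match is None:
        # The body does not look like JSONP: leave it untouched.
        return body
    # Keep everything after the recorded name, starting at "(".
    return callbacks[0] + body[match.end(1):]
```

The same record therefore yields different output for different requests, which is why the rewriting cannot be done entirely ahead of time.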
Line 119: Line 121:


* <code><server_host></code>: depends on the production environment (host name, root prefix)
* <code><collection></code>: depends on the ZIM filename (we may change this to rely on the zimid instead?)
* <code><requested_url></code>: in case of a "revisit", pywb and wabac return the content of another record. Is the content then rewritten based on the requested URL or on the record URL? Likewise, with fuzzy matching, the requested URL differs from the record URL.
* JSONP needs access to the "callback" query string value of the request.


Line 149: Line 151:
This workflow should be compatible with existing ZIM files (which have neither <code>H</code> nor <code>W/fuzzy_rules</code>).


Searching by <code>C/url</code> first avoids having to store a <code>H/url</code> entry for the common case, even for warc2zim files.


This allows potential fuzzy matching for other ZIM files (from specific scrapers).
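
The lookup order described here could be sketched as follows. This is an illustration under assumptions, not the actual kiwix-serve or wabac logic: the ZIM file is modelled as a plain dictionary of paths, a fuzzy rule is assumed to be a (regex, replacement) pair, and <code>resolve</code> is a hypothetical name. <code>H/url</code> entries are left out because their format is still open.

```python
import re
from typing import Optional

# Hypothetical rule shape: (regex matched against the URL, replacement).
FuzzyRule = tuple[str, str]


def resolve(entries: dict[str, str], url: str,
            fuzzy_rules: Optional[list[FuzzyRule]] = None) -> Optional[str]:
    """Return the path of the content entry serving `url`, or None."""
    exact = "C/" + url
    if exact in entries:
        # Common case: exact match, no extra entry needed.
        return exact
    # Existing ZIM files carry no rules, so this loop is simply skipped.
    for pattern, replacement in fuzzy_rules or []:
        candidate, count = re.subn(pattern, replacement, url, count=1)
        if count and "C/" + candidate in entries:
            return "C/" + candidate
    return None
```

With no rules supplied, the function degrades to a plain <code>C/url</code> lookup, which is what keeps existing ZIM files working.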
Line 157: Line 159:
* The possible placeholders (<code>${RW_SERVER_HOST}</code>, ...) and their values
* The <code>H/url</code> header format (just the subset of headers to apply?)
* The fuzzy rules (how to generate fuzzy URLs from the data-driven fuzzy_rules). https://github.com/webrecorder/wabac.js/blob/main/src/fuzzymatcher.js seems to be a good "specification".
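
As an illustration of the placeholder idea, the sketch below substitutes values at serving time. Only <code>${RW_SERVER_HOST}</code> appears in this page; the assumption that every placeholder starts with <code>RW_</code>, the function name <code>fill_placeholders</code> and the example host are hypothetical.

```python
import re

# Assumption: placeholders look like ${RW_SOMETHING}; only RW_SERVER_HOST
# is named in the specification so far.
_PLACEHOLDER = re.compile(r"\$\{(RW_[A-Z_]+)\}")


def fill_placeholders(content: str, values: dict[str, str]) -> str:
    """Replace known ${RW_...} placeholders, leaving anything else as-is."""
    return _PLACEHOLDER.sub(
        lambda m: values.get(m.group(1), m.group(0)), content
    )
```

Restricting the pattern to a dedicated prefix keeps JavaScript template literals such as <code>${name}</code> in the stored content from being touched by mistake.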
 
== Notes ==
 
* Revisit and redirect are different: a redirect makes kiwix-serve return a 302 to the target, whereas a revisit makes kiwix-serve answer with a 2xx and the content of the revisit's target.
* We may nevertheless store <code>H</code> revisits as redirect entries in the ZIM file.
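
The difference between the two cases can be sketched as below. This is a hypothetical model, not kiwix-serve code: the <code>Response</code> class and <code>respond</code> function are invented for illustration, and 200 stands in for the "2xx" mentioned above.

```python
from dataclasses import dataclass, field


@dataclass
class Response:
    status: int
    headers: dict[str, str] = field(default_factory=dict)
    body: bytes = b""


def respond(kind: str, target_url: str, target_body: bytes) -> Response:
    """Build the answer for an entry that points at another record."""
    if kind == "redirect":
        # The client is told to fetch the target itself.
        return Response(302, {"Location": target_url})
    if kind == "revisit":
        # The target's content is served under the requested URL.
        return Response(200, body=target_body)
    raise ValueError(f"unknown entry kind: {kind}")
```

Even if both kinds are stored the same way in the ZIM file, the server still has to know which one it is dealing with in order to pick the right branch.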


== Questions ==
Line 165: Line 172:
* Is URL rewriting really data-driven? Same question for fuzzy matching?
* Can we easily use Wombat without the rest of Wabac?
=== Matthieu ===
* What information is needed to rewrite HTML/CSS/JS content? To what extent is it tied to the current request? I have identified the <code>callback</code> query string value. Anything else?
* Do we rewrite content using the URL of the record or the requested URL?
* pywb can work framed or frameless (https://pywb.readthedocs.io/en/latest/manual/configuring.html#framed-vs-frameless-replay). We are using a framed system with a service worker (SW). Why? Is it necessary?