Difference between revisions of "ZIM file format"

Enhance explanations around URLs encoding in ZIM / HTML document
(Better Major and Minor description)
(Enhance explanations around URLs encoding in ZIM / HTML document)
Line 298: Line 298:
== URLs ==
== URLs ==


=== URL Encoding ===
=== URL Encoding in the ZIM ===
The URLs in the UrlPointerlist are utf-8 and are not url encoded (https://www.ietf.org/rfc/rfc1738.txt)
The URLs in the UrlPointerlist are encoded in UTF-8 and are '''not''' URL-encoded.


Some readers process the requests that already do the url decoding internally whereas most readers will handle the URLs directly. In this case you have to do the decoding before you pass the parameter to libzim.
For instance, if you store in the ZIM an HTML document with an href pointing to `characters%20%C3%A9ncoding.html`, you have to store the corresponding ZIM entry at the URL `characters éncoding.html`.
 
Or if you want to store a ZIM entry at `index.html?param=value`, the HTML document pointing to it will have to use the `index.html%3Fparam%3Dvalue` href.
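
The two examples above can be reproduced with a minimal Python sketch. It uses only the standard library; the helper names `href_to_zim_path` and `zim_path_to_href` are hypothetical and are not part of libzim, and the choice to leave `/` unescaped is an assumption about how a ZIM creator would write hrefs.

```python
from urllib.parse import quote, unquote


def href_to_zim_path(href: str) -> str:
    """Percent-decode an href to get the (UTF-8, not URL-encoded) ZIM entry URL."""
    return unquote(href)


def zim_path_to_href(path: str) -> str:
    """Percent-encode a ZIM entry URL for use in an HTML href.

    Assumption: '/' is kept as a path separator; '?' and '=' are escaped
    so that they stay part of the entry URL instead of becoming a query string.
    """
    return quote(path, safe="/")


print(href_to_zim_path("characters%20%C3%A9ncoding.html"))
print(zim_path_to_href("index.html?param=value"))
```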
 
The reason behind it is that libzim is agnostic of which kind of content and which kind of readers will be used. Everything around URL encoding is purely linked to HTTP / HTML / Web standards.
 
When serving web content (which is usually the case), some readers process the requests and already do the url decoding internally, whereas most readers will handle the URLs directly.
 
The same applies to the query string, which some webservers absorb and do not pass on to libzim.
 
In any case, the reader will have to do the HTTP URL decoding before passing the parameter to libzim.
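
As a rough illustration of that decoding step, the sketch below shows how a reader might turn a raw HTTP request target into the value it looks up in the ZIM. The function name `entry_path_from_request` is hypothetical, and dropping the leading `/` and discarding the real query string are assumptions about one possible reader, not libzim behaviour.

```python
from urllib.parse import unquote, urlsplit


def entry_path_from_request(request_target: str) -> str:
    """Return the decoded entry URL for a raw (still percent-encoded) request target."""
    # Split off a real query string first; an escaped '%3F' stays in the path.
    raw_path = urlsplit(request_target).path
    # Assumption: the reader serves entries from the root, so drop one leading '/'.
    if raw_path.startswith("/"):
        raw_path = raw_path[1:]
    # Decode exactly once; decoding twice would corrupt URLs that contain '%'.
    return unquote(raw_path)


print(entry_path_from_request("/index.html%3Fparam%3Dvalue?lang=en"))
```

If the webserver or widget in front of the reader has already decoded the path, this step must be skipped, otherwise the URL would be decoded twice.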


=== Local Anchors ===
=== Local Anchors ===
Many articles - especially when a table of contents is used - use local anchors to jump within an article.   
Many HTML hrefs - especially when a table of contents is used - use local anchors to jump within a document.   


<pre>
<pre>
Line 310: Line 320:
</pre>
</pre>


The browser handles these local anchors by itself. It will determine if another article has to be loaded (local anchor inside another article than the currently shown) and will send a request only with the article URL without the local anchor - in our example "foo". After the article has been loaded the browser will then search for the local anchor tag and jump to the right location.
When a web browser is used as a reader, it handles these local anchors client-side. The anchor is never sent to the webserver, let alone to libzim. The browser will determine by itself if another ZIM entry has to be loaded (local anchor inside a document other than the one currently shown) and will send a request with only the document URL, without the local anchor - in our example "foo". After the document has been loaded, the browser will then search for the local anchor tag and jump to the right location.


If you use a common rendering engine or HTML widget you don't have to care about these cases; you can just use the requests as they are submitted by the engine / widget.
If you use a common rendering engine or HTML widget you don't have to care about these cases; you can just use the requests as they are submitted by the engine / widget.


Should you render the article contents by yourself you have to consider this and take care of it before you hand requests to libzim.
Should you render the article contents yourself, you have to consider this and take care of it before you hand requests to libzim.
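
For a reader that renders content itself, the anchor handling could look like the following sketch. It is only an illustration using the Python standard library; `split_local_anchor` is a hypothetical helper, and the anchor name `section` is made up for the example.

```python
from urllib.parse import unquote, urldefrag


def split_local_anchor(href: str) -> tuple[str, str]:
    """Split an href into (entry URL to request, local anchor to scroll to)."""
    # Split on the first literal '#'; an escaped '%23' stays in the URL part.
    url, fragment = urldefrag(href)
    # Only the decoded URL part is ever looked up; the anchor stays in the reader.
    return unquote(url), unquote(fragment)


entry_url, anchor = split_local_anchor("foo#section")
print(entry_url, anchor)
```

An empty entry URL (as with `#section`) means the anchor points into the document currently shown, so no new entry has to be loaded.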


== Encodings ==
== Encodings ==