Difference between revisions of "Content team"

From openZIM
Latest revision as of 12:04, 9 February 2024

The Content team gathers people in charge of providing books in the ZIM format ("books" being understood here as web content stored as single web archives).

Purpose

Provide web-based educational content to people without internet access, and make the experience as seamless as possible. Access and discovery must be user-friendly and market-ready; the content must be up to date and as portable as technically possible.

Goals

  • Book curation must remain focused on educational material, broadly construed;
  • Books should have proper visual formatting;
  • Books should be up-to-date like custom apps;
  • The Kiwix Library should allow easy and friendly discovery of content.

Responsibilities

  • Content Requests
    • Collaborate with requesters to qualify requests properly. Keep them informed.
    • Ensure we are allowed and able to fulfill requests
    • Initiate new recipes and manage the first publication of new books
    • Collaborate with the scraper dev. team if necessary
    • Keep the tickets up to date
  • Scraping
    • Ensure Zimfarm works fine and contribute to its improvements with dev. team
    • Analyse failures or unexpected behaviors
    • Ensure recipes run properly, fix configuration when necessary and contribute to scraper improvements with dev. team
    • Ensure workers are online and are properly configured
    • Ensure the scrape lifecycle is correct (reasonable pipeline size, running scrapes progressing appropriately, not too many failures)
  • Library management
    • Ensure ZIM filenames and location (paths) are correct
    • Ensure ZIM Metadata are correct
    • Ensure ZIM files are recent and kept up to date (AFAP)
    • Ensure library is coherent and user-friendly

Policies

Publishing

  • Content has to be legal in Switzerland
  • Content should not advertise fringe theories
  • Content should preferably be free content
  • If not free, content should be:
    • Open content OR
    • Educational content OR
    • Content with an authorization of reproduction
  • Any content we publish should
    • have (almost) no user-visible errors
    • have proper/correct metadata
    • be easily discoverable in the public library

Content Requests

  • Allow everybody to request new content, or changes to or deletion of existing content
  • Track the lifecycle of our content portfolio in full transparency
  • New content should be assessed and vetted against the publishing policy (see above)
  • Content requests should be closed:
    • when fully implemented (user visible)
    • when refused or impossible to implement
  • ZIM Metadata should be given for new content
  • Start scraping only once all prerequisites are satisfied

Scraping

  • Scraping leadership means the initiative should come from the content team
  • First analysis of error should be done by content team
  • If error in scraper is suspected
    • The issue should be reported in the corresponding scraper code repository
    • Scraper problem analysis does not supersede the content request in any manner
  • ZIM quality should be vetted against publishing policy
  • Any recipe should first run successfully in dev before being put in production
  • Hardware resources should be used sparingly

Library Management

Custom Apps

Processes

Content Requests

Scraping

Library Management

Custom Apps

Workflows

To create a new recipe for Youtube files

It’s recommended to clone an existing Youtube recipe.

  • Name the recipe according to the naming conventions (https://github.com/openzim/overview/wiki/Naming-Convention).
  • In the Language field, choose the language of the website you are creating the recipe for.
  • In the Category field, choose (other).
  • In the Warehouse path field, always choose (/.hidden/.dev) for the first run so that the resulting file can be tested. Once the file has been tested and everything is correct, update the recipe with the proper path (videos).
  • Make sure the Status is set to Enabled.
  • Set Periodicity to either monthly or quarterly.
  • In the Offliner field, choose Youtube.
  • In the Platform field, choose Youtube.
  • Leave the remaining fields unchanged.
In Youtube command flags:
  • In Playlist mode: choose (Not Set) if the recipe is for a whole channel.
  • If the recipe is for a playlist, choose (Set).
  • In Type: choose (Channel) or (Playlist), depending on the file you need.
  • In Youtube ID: type the ID of the channel or the playlist.
  • For the API Key: several keys are available, mostly allocated according to channel or playlist size. Ask for the list and choose the appropriate key.
  • In Zim Name: type the recipe name, following the naming conventions (https://github.com/openzim/overview/wiki/Naming-Convention).
  • In Title: type the name you want for the output file.
  • In Description: type a short description of the ZIM file.
  • Leave Optimisation Cache URL as it is (cloned from the old recipe).
  • Leave the rest of the fields empty or as they are in the cloned recipe.
  • Finally, click (Update offliner details) at the bottom.
  • Review all your entries once more, then go back to the top of the page and click (Request).
  • After about an hour (or the next day if the source website is large), check whether the recipe failed or succeeded.
  • If it succeeded, go to dev.library.kiwix.org (https://dev.library.kiwix.org/) and check the file you created: check its size and that it works properly. If the file does not appear, wait a bit, as updates are made every 15 minutes.
  • If the file looks good and complete, go back to your recipe and, in the Warehouse path field, change (/.hidden/.dev) to the proper category for your file's content (Wikipedia, Wikihow, … etc).
  • Click (Update offliner details), then click (Request) again.
  • Finally, check the file at https://library.kiwix.org/. If all is good, do not forget to go back to the initial ticket (most likely at zim-requests), post the link to the output file, and close the ticket.
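
The checklist above can be summarised as a small Python sketch. It is only an illustration of the two-stage flow (first run to /.hidden/.dev, then promotion to the proper path): the class, field, and function names are hypothetical, they mirror the form labels rather than any Zimfarm API, and the example values are placeholders.

```python
from dataclasses import dataclass, replace

# Test location named in the workflow above.
DEV_WAREHOUSE_PATH = "/.hidden/.dev"


@dataclass(frozen=True)
class YoutubeRecipe:
    """Hypothetical mirror of the form fields; not a Zimfarm data model."""
    name: str
    language: str
    youtube_id: str
    id_type: str                      # "channel" or "playlist"
    title: str
    description: str
    category: str = "other"
    warehouse_path: str = DEV_WAREHOUSE_PATH  # first run always goes to dev
    periodicity: str = "monthly"      # or "quarterly"
    enabled: bool = True

    @property
    def playlist_mode(self) -> bool:
        # "Set" for a playlist, "Not Set" for a whole channel.
        return self.id_type == "playlist"


def check_recipe(recipe: YoutubeRecipe) -> list:
    """Return a list of problems found against the checklist."""
    problems = []
    if recipe.id_type not in ("channel", "playlist"):
        problems.append("Type must be channel or playlist")
    if recipe.periodicity not in ("monthly", "quarterly"):
        problems.append("Periodicity must be monthly or quarterly")
    if not recipe.enabled:
        problems.append("Status must be Enabled")
    if not recipe.youtube_id:
        problems.append("Youtube ID is missing")
    if not recipe.title or not recipe.description:
        problems.append("Title and Description are required")
    return problems


def promote(recipe: YoutubeRecipe, production_path: str) -> YoutubeRecipe:
    """Return a copy pointing at the production path.

    Meant to be called only after the dev file has been checked by hand.
    """
    if recipe.warehouse_path != DEV_WAREHOUSE_PATH:
        raise ValueError("recipe is not on the dev warehouse path")
    return replace(recipe, warehouse_path=production_path)


# Placeholder values for illustration only.
draft = YoutubeRecipe(
    name="example_recipe_name",
    language="en",
    youtube_id="EXAMPLE_ID",
    id_type="channel",
    title="Example title",
    description="Short description of the ZIM file",
)
print(check_recipe(draft))
print(promote(draft, "videos").warehouse_path)
```

Keeping the dev path as the default value reflects the rule that a recipe should first run successfully in dev before being put in production; the promotion step is deliberately a separate, explicit call.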

Members

See also