doc: add DEV.md and update README.md
yzqzss committed Dec 4, 2024
1 parent f901972 commit d545f80
Showing 2 changed files with 89 additions and 11 deletions.
72 changes: 72 additions & 0 deletions DEV.md
@@ -0,0 +1,72 @@
# Snippets

## API Output format

https://www.mediawiki.org/wiki/API:Data_formats#Output

> The standard and default output format in MediaWiki is JSON. All other formats are discouraged.
>
> The output format should always be specified using format=yourformat with yourformat being one of the following:
>
> - json: JSON format. (recommended)
> - php: serialized PHP format. (deprecated)
> - xml: XML format. (deprecated)
> - txt: PHP print_r() format. (removed in 1.27)
> - dbg: PHP var_export() format. (removed in 1.27)
> - yaml: YAML format. (removed in 1.27)
> - wddx: WDDX format. (removed in 1.26)
> - dump: PHP var_dump() format. (removed in 1.26)
> - none: Returns a blank response. (1.21+)

In practice, the `json` format is not available on some old wikis.
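
One way a client might cope with wikis that lack `json` is to request it first and fall back to `xml` when the body does not parse. The sketch below only illustrates that idea: `query_with_fallback` and the `fetch` callable are hypothetical names standing in for whatever HTTP layer is in use, and this is not wikiteam3's actual code.

```python
import json
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Tuple


def query_with_fallback(fetch: Callable[[Dict[str, str]], str],
                        params: Dict[str, str]) -> Tuple[str, object]:
    """Try format=json first; fall back to format=xml if the body is not JSON.

    `fetch` is a hypothetical callable that sends the parameters to api.php
    and returns the response body as text.
    """
    body = fetch({**params, "format": "json"})
    try:
        return "json", json.loads(body)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        pass
    # Assumption: the old wiki still answers format=xml with well-formed XML.
    body = fetch({**params, "format": "xml"})
    return "xml", ET.fromstring(body)
```

The caller receives the format that actually worked, so later parsing code can branch on it instead of guessing.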

## Allpages

https://www.mediawiki.org/wiki/API:Allpages (>= 1.8)


## Allimages

https://www.mediawiki.org/wiki/API:Allimages (>= 1.13)

## Redirects

https://www.mediawiki.org/wiki/Manual:Redirect_table

## Logs

https://www.mediawiki.org/wiki/Manual:Logging_table

## Continuation

https://www.mediawiki.org/wiki/API:Continue (≥ 1.26)
https://www.mediawiki.org/wiki/API:Raw_query_continue (≥ 1.9)

> From MediaWiki 1.21 to 1.25, it was required to specify continue= (i.e. with an empty string as the value) in the initial request to get continuation data in the format described above. Without doing that, API results would indicate there is additional data by returning a query-continue element, explained in Raw query continue.
> Prior to 1.21, that raw continuation (`query-continue`) was the only option.
>
> If your application needs to use the raw continuation in MediaWiki 1.26 or later, you must specify rawcontinue= to request it.
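
Because a dump tool has to talk to wikis on both sides of these version boundaries, a request loop usually needs to understand both response shapes. The following is a minimal sketch under stated assumptions, not the project's implementation: `iter_query` and `fetch` are hypothetical names, `fetch` is assumed to return the decoded JSON response as a dict, and error handling is omitted.

```python
from typing import Callable, Dict, Iterator


def iter_query(fetch: Callable[[Dict[str, str]], dict],
               params: Dict[str, str]) -> Iterator[dict]:
    """Yield each API response, following either continuation style."""
    # An empty continue= opts in to the new style on MediaWiki 1.21-1.25.
    request = {**params, "action": "query", "format": "json", "continue": ""}
    while True:
        result = fetch(request)
        yield result
        if "continue" in result:
            # New style: merge every returned key into the next request.
            request = {**request, **result["continue"]}
        elif "query-continue" in result:
            # Raw style: the values are nested one level deeper, per module.
            merged = dict(request)
            for module_values in result["query-continue"].values():
                merged.update(module_values)
            request = merged
        else:
            # No continuation element: the result set is complete.
            return
```

Merging all raw `query-continue` values at once is a simplification; queries that combine a generator with properties may need the more careful handling described on the Raw query continue page.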

# Workarounds

## Truncated API response causes infinite loop

https://github.com/mediawiki-client-tools/mediawiki-dump-generator/issues/166
https://phabricator.wikimedia.org/T86611

wikiteam3 workaround: https://github.com/saveweb/wikiteam3/commit/76465d34898b80e8c0eb6d9652aa8efa403a7ce7

## MWUnknownContentModelException

> "The content model xxxxxx is not registered on this wiki;"

Some extensions use custom content models for their own purposes but do not register a handler for exporting that content.

wikiteam3 workaround: https://github.com/saveweb/wikiteam3/commit/fd5a02a649dcf3bdab7ac1268445b0550130e6ee

## Insecure SSL

https://docs.openssl.org/1.1.1/man1/ciphers/
https://docs.openssl.org/master/man1/openssl-ciphers/

workaround: https://github.com/DigitalDwagon/wikiteam3/blob/8a054882de19c6b69bc03798d3044b7b5c4c3c88/wikiteam3/utils/monkey_patch.py#L63-L84
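
For orientation, the general shape of such a workaround with Python's standard `ssl` module might look like the sketch below. It is an illustration only and not a copy of the linked `monkey_patch.py`: the function name `make_legacy_ssl_context` is hypothetical, and the cipher string `DEFAULT:@SECLEVEL=0` is an assumption based on the OpenSSL ciphers manual linked above.

```python
import ssl


def make_legacy_ssl_context() -> ssl.SSLContext:
    """Build a deliberately permissive TLS client context for legacy servers."""
    ctx = ssl.create_default_context()
    # Order matters: check_hostname must be disabled before verify_mode
    # can be set to CERT_NONE.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        # @SECLEVEL=0 lowers OpenSSL's security level so that weak ciphers
        # and small keys are accepted again.
        ctx.set_ciphers("DEFAULT:@SECLEVEL=0")
    except ssl.SSLError:
        # Assumption: some TLS libraries (for example LibreSSL builds) may
        # reject this string; the default cipher list is kept in that case.
        pass
    return ctx
```

A context like this can be passed to `urllib.request.urlopen(url, context=ctx)`. It disables certificate verification entirely, so it is only appropriate for archiving wikis that cannot be reached any other way.
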
28 changes: 17 additions & 11 deletions README.md
@@ -57,16 +57,17 @@ pip install wikiteam3 --upgrade

```bash
usage: wikiteam3dumpgenerator [-h] [-v] [--cookies cookies.txt] [--delay 1.5]
- [--retries 5] [--path PATH] [--resume] [--force]
- [--user USER] [--pass PASSWORD]
- [--http-user HTTP_USER]
+ [--retries 5] [--hard-retries 3] [--path PATH]
+ [--resume] [--force] [--user USER]
+ [--pass PASSWORD] [--http-user HTTP_USER]
[--http-pass HTTP_PASSWORD] [--insecure]
[--verbose] [--api_chunksize 50] [--api API]
[--index INDEX] [--index-check-threshold 0.80]
[--xml] [--curonly] [--xmlapiexport]
[--xmlrevisions] [--xmlrevisions_page]
- [--namespaces 1,2,3] [--exnamespaces 1,2,3]
- [--images] [--bypass-cdn-image-compression]
+ [--redirects] [--namespaces 1,2,3]
+ [--exnamespaces 1,2,3] [--images]
+ [--bypass-cdn-image-compression]
[--image-timestamp-interval 2019-01-02T01:36:06Z/2023-08-12T10:36:06Z]
[--ia-wbm-booster {0,1,2,3}]
[--assert-max-pages 123]
@@ -85,7 +86,11 @@ options:
--delay 1.5 adds a delay (in seconds) [NOTE: most HTTP servers
have a 5s HTTP/1.1 keep-alive timeout, you should
consider it if you wanna reuse the connection]
- --retries 5 Maximum number of retries for
+ --retries 5 Maximum number of retries for each request before
+ failing.
+ --hard-retries 3 Maximum number of hard retries for each request before
+ failing. (for now, this only controls the hard retries
+ during images downloading)
--path PATH path to store wiki dump at
--resume resumes previous incomplete dump (requires --path)
--force download it even if Wikimedia site or a recent dump
@@ -125,9 +130,11 @@ Data to download:
--xmlrevisions_page [[! Development only !]] Export all revisions from an
API generator, but query page by page MediaWiki 1.27+
only. (default: --curonly)
+ --redirects Dump page redirects via API:Allredirects
--namespaces 1,2,3 comma-separated value of namespaces to include (all by
default)
- --exnamespaces 1,2,3 comma-separated value of namespaces to exclude
+ --exnamespaces 1,2,3 [lack maintenance] comma-separated value of namespaces
+ to exclude
--images Generates an image dump
Image dump options:
@@ -242,9 +249,8 @@ In the above example, `--path` is only necessary if the download path (wikidump
```bash
usage: Upload wikidump to the Internet Archive. [-h] [-kf KEYS_FILE]
- [-c {opensource,test_collection,wikiteam}]
- [--dry-run] [-u]
- [--bin-zstd BIN_ZSTD]
+ [-c COLLECTION] [--dry-run]
+ [-u] [--bin-zstd BIN_ZSTD]
[--zstd-level {17,18,19,20,21,22}]
[--rezstd]
[--rezstd-endpoint URL]
@@ -261,7 +267,7 @@ options:
Path to the IA S3 keys file. (first line: access key,
second line: secret key) [default:
~/.wikiteam3_ia_keys.txt]
- -c, --collection {opensource,test_collection,wikiteam}
+ -c, --collection COLLECTION
--dry-run Dry run, do not upload anything.
-u, --update Update existing item. [!! not implemented yet !!]
--bin-zstd BIN_ZSTD Path to zstd binary. [default: zstd]