This repository has been archived by the owner on Mar 13, 2023. It is now read-only.
Currently, analyze_chunk() removes all titles that contain : under the assumption that these are non-mainspace titles. However, article titles can contain colons, e.g. Batman v Superman: Dawn of Justice or UTC+03:00 (or, on Simple Wikipedia, UTC+08:00 or Avatar: The Last Airbender). Many of these titles are actually redirects to titles without a colon, but all redirects are already removed by this point in the function, so that's immaterial.
There are a crazy number of non-articles with a colon in the title so we'll need to go back to the drawing board about how to filter these out while keeping the articles we want.
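One possible direction, sketched here under assumptions rather than taken from the project's code: instead of dropping every title that contains a colon, drop only titles whose text before the first colon matches a known namespace prefix. The names `NAMESPACE_PREFIXES` and `is_mainspace_title` are hypothetical, and the prefix list below is an illustrative subset; a real list would need to come from the target wiki's own namespace configuration, since it differs between wikis.

```python
# Illustrative subset of English Wikipedia namespace prefixes (assumption:
# a complete, per-wiki list would be loaded from the wiki's configuration).
NAMESPACE_PREFIXES = frozenset({
    "talk", "user", "user talk", "wikipedia", "wikipedia talk",
    "file", "file talk", "mediawiki", "mediawiki talk",
    "template", "template talk", "help", "help talk",
    "category", "category talk", "portal", "draft", "module",
})


def is_mainspace_title(title: str) -> bool:
    """Return True unless the title starts with a known namespace prefix."""
    prefix, sep, _rest = title.partition(":")
    if not sep:
        # No colon at all, so there cannot be a namespace prefix.
        return True
    # Normalise underscores and case before comparing against the list.
    normalised = prefix.strip().replace("_", " ").lower()
    return normalised not in NAMESPACE_PREFIXES


titles = [
    "Batman v Superman: Dawn of Justice",
    "UTC+03:00",
    "Avatar: The Last Airbender",
    "Template:Infobox",
    "Category_talk:Films",
]
kept = [t for t in titles if is_mainspace_title(t)]
```

With this check the three article titles from the issue are kept while the `Template:` and `Category_talk:` entries are dropped. It will still misclassify any article whose title happens to begin with a namespace word followed by a colon, so if the input data carries a namespace identifier, filtering on that would be more reliable than matching on the title text.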