v3.0.0 #1299
kba announced in Announcements
Changed:
- `ocrd`) level from `INFO` to `WARNING`
- `initLogging`: do not remove any previous handlers/levels, unless `force_reinit`
- `disableLogging`: remove all handlers, reset all levels - instead of being selective
- `weakref` with `__del__` to trigger `shutdown`
- `OCRD_MAX_PARALLEL_PAGES>1`: log via `QueueHandler` in subprocess, `QueueListener` in main
- `ocrd_utils.initLogging`: also add handler to root logger (as in file config), but disable message propagation to avoid duplication
- `ocrd_network` in `src/ocrd/decorators/__init__.py` once needed
- `Processor.process_page_file`: skip computing `process_page_pcgts` if output already exists, but `OCRD_EXISTING_OUTPUT!=OVERWRITE`
- `OCRD_MAX_PARALLEL_PAGES>1`: switch from multithreading to multiprocessing, depend on `loky` instead of stdlib `concurrent.futures`
- `OCRD_PROCESSING_PAGE_TIMEOUT>0`: actually enforce timeout within worker
- `OCRD_MAX_MISSING_OUTPUTS>0`: abort early if too many failures already, prospectively
- `Processor.process_workspace`: split up into overridable sub-methods:
  - `process_workspace_submit_tasks` (iterate input file group and schedule page tasks)
  - `process_workspace_submit_page_task` (download input files and submit single page task)
  - `process_workspace_handle_tasks` (monitor page tasks and aggregate results)
  - `process_workspace_handle_page_task` (await single page task and handle errors)
- `Processor` / `Workspace.add_file`: always `force` if `OCRD_EXISTING_OUTPUT==OVERWRITE`
- `Processor.verify`: revert 3.0.0b1 enforcing cardinality checks (stay backwards compatible)
- `Processor.verify`: check output fileGrps, too (must not exist unless `OCRD_EXISTING_OUTPUT=OVERWRITE|SKIP` or disjoint `--page-id` range)
- `input-files`: do not try to validate tasks here (now covered by `Processor.verify()`)
- `run_processor`: be robust if `ocrd_tool` is missing `steps`
- `PcGtsType.PageType.id` via `make_xml_id`: replace `/` with `_`
- `ocrd_utils`, `ocrd_models`, `ocrd_modelfactory`, `ocrd_validators` and `ocrd_network` are not published as separate packages anymore, everything is contained in `ocrd` - you should adapt your `requirements.txt` accordingly
- `Processor.parameter` now a property (attribute always exists, but `None` for non-processing contexts)
- `Processor.parameter` is now a `frozendict` (contents immutable)
- `Processor.parameter` validate when(ever) set instead of (just) the constructor
- `Processor.parameter` will also trigger (`Processor.shutdown()` and) `Processor.setup()`
- `get_processor(... instance_caching=True)`: use `min(max_instances, OCRD_MAX_PROCESSOR_CACHE)`
- `Processor.verify` always validates fileGrp cardinalities (because we have `ocrd-tool.json` defaults now)
- `OcrdMets.add_agent` without positional arguments
- `ocrd bashlib input-files` now uses normal Processor decorator, and gets passed actual `ocrd-tool.json` and tool name from bashlib's `ocrd__wrap`
- `OcrdPage` as proxy of `PcGtsType` instead of alias; also contains `etree` and `mapping` now
- `page_from_file`: removed kwarg `with_tree` - use `OcrdPage.etree` and `OcrdPage.mapping` instead
- `Processor.zip_input_files` now can throw `ocrd.NonUniqueInputFile` and `ocrd.MissingInputFile` (the latter only if `OCRD_MISSING_INPUT=ABORT`)
- `Processor.zip_input_files` does not by default use `require_first` anymore (so the first file in any input file tuple per page can be `None` as well)
- `Workspace.overwrite_mode`, merely delegate to `OCRD_EXISTING_OUTPUT=OVERWRITE`
- `ocrd_utils.config`
- `Processor.process`
- `ocrd-tool.json`
- `Processor` constructor, add as members (i.e. `show_help`, `dump_json`, `dump_module_dir`, `list_resources`, `show_resource`, `resolve_resource`)
- `Processor` constructor (i.e. `workspace`, `page_id`, `input_file_grp`, `output_file_grp`; now all set by `run_processor`)
- `ocrd-tool.json` metadata to `Processor` constructor
- `ocrd.processor`: Handle loading of bundled `ocrd-tool.json` generically

Fixed:
- `ocrd --help` output was broken for multiline config options, bertsky/core#25
- `initLogging` before instantiating processors in `ocrd_cli_wrap_processor`, bertsky/core#24, #1296
- `initLogging`: only add root handler instead of multiple redundant handlers with `propagate=false`
- `setOverrideLogLevel`: override all currently active loggers' level
- `OcrdMets.get_physical_pages`: cover `return_divs` w/o `for_fileIds` and `for_pageIds`
- `ocrd_utils.config` gets reset whenever changing it globally
- `OcrdMetsServer.add_file`: pass on `force` kwarg
- `ocrd.cli.workspace`: consistently pass on `--mets-server-url` and `--backup`
- `ocrd.cli.validate "tasks"`: pass on `--mets-server-url`
- `ocrd.cli.bashlib "input-files"`: pass on `--mets-server-url`
- `lib.bash input-files`: pass on `--mets-server-url`, `--overwrite`, and parameters
- `lib.bash`: fix `errexit` handling
- `ocrd.cli.ocrd-tool "resolve-resource"`: forgot to actually print result
- `Processor.metadata_location`: `src` workaround respects namespace packages, qurator-spk/eynollah#134
- `Workspace.reload_mets`: handle ClientSideOcrdMets as well
- `disableLogging`: also re-instate root logger to Python defaults
- `--log-filename`, and show in `--help`
- `ocrd workspace clone`: do pass on `--file-grp` (for download filtering)

Added:
- `ocrd-filter` processor to remove segments based on XPath expressions, bertsky/core#21
- `pc:pixelarea` for the number of pixels of the bounding box (or sum area on node sets), bertsky/core#21
- `pc:textequiv` for the first TextEquiv unicode string (or concatenated string on node sets), bertsky/core#21
- `OcrdPage`: new `PageType.get_ReadingOrderGroups()` to retrieve recursive RO as dict
- `server`: add subcommands `reload` and `save`
- `physical_pages`
- `--resolve-resource`, too
- `Processor.process_page_file` / `OcrdPageResultImage`: allow `None` besides `AlternativeImageType`
- `OcrdConfig.reset_defaults` to reset config variables to their defaults
- `Processor.max_workers`: class attribute to control per-page parallelism of this implementation
- `Processor.max_page_seconds`: class attribute to control per-page timeout of this implementation
- `OCRD_MAX_PARALLEL_PAGES` for whether and how many workers should process pages in parallel
- `OCRD_PROCESSING_PAGE_TIMEOUT` for whether and how long processors should wait for single pages
- `OCRD_MAX_MISSING_OUTPUTS` for maximum rate (fraction) of pages before making `OCRD_MISSING_OUTPUT=abort`
- `Processor.metadata_filename`: expose to make local path of `ocrd-tool.json` in Python distribution reusable+overridable
- `Processor.metadata_location`: expose to make absolute path of `ocrd-tool.json` reusable+overridable
- `Processor.metadata_rawdict`: expose to make in-memory contents of `ocrd-tool.json` reusable+overridable
- `Processor.metadata`: expose to make validated and default-expanded contents of `ocrd-tool.json` reusable+overridable
- `Processor.shutdown`: to shut down processor after processing, optional
- `Processor.max_instances`: class attribute to control instance caching of this implementation
- `OCRD_DOWNLOAD_INPUT` for whether input files should be downloaded before processing
- `OCRD_MISSING_INPUT` for how to handle missing input files (`SKIP` or `ABORT`)
- `OCRD_MISSING_OUTPUT` for how to handle processing failures (`SKIP` or `ABORT` or `COPY`); the latter behaves like ocrd-dummy for the failed page(s)
- `OCRD_EXISTING_OUTPUT` for how to handle existing output files (`SKIP` or `ABORT` or `OVERWRITE`)
- `--debug` as short-hand for `ABORT` choices above
- `Processor.logger` set up by constructor already (for re-use by processor implementors)
- `default`-expand and validate `ocrd_tool.json` in `Processor` constructor, log invalidities
- `deprecation` in `ocrd_tool.json` by reporting warnings
- `Processor.process_workspace`: process a complete workspace, with default implementation
- `Processor.process_page_file`: process an OcrdFile, with default implementation
- `Processor.process_page_pcgts`: process a single OcrdPage, produce a single OcrdPage, required to implement
- `Processor.verify`: handle fileGrp cardinality verification, with default implementation
- `Processor.setup`: to set up processor before processing, optional

This discussion was created from the release v3.0.0.
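The `OCRD_EXISTING_OUTPUT` choices listed under Added (`SKIP`, `ABORT`, `OVERWRITE`) can be pictured with a small decision helper. This is a minimal sketch and not code from `ocrd`: the function `existing_output_action` and its return strings are hypothetical, and treating an unset variable like `ABORT` is an assumption of this example.

```python
import os

VALID_CHOICES = ("SKIP", "ABORT", "OVERWRITE")


def existing_output_action(output_exists, env=None):
    """Decide what to do for one page (hypothetical helper, not part of ocrd)."""
    env = os.environ if env is None else env
    # Assumption of this sketch: an unset variable behaves like ABORT.
    choice = env.get("OCRD_EXISTING_OUTPUT", "ABORT").upper()
    if choice not in VALID_CHOICES:
        raise ValueError(f"unsupported OCRD_EXISTING_OUTPUT value: {choice}")
    if not output_exists:
        # Nothing to conflict with: process the page normally.
        return "process"
    if choice == "SKIP":
        # Keep the existing output and do not recompute the page.
        return "skip"
    if choice == "OVERWRITE":
        # Recompute the page and replace the existing output.
        return "overwrite"
    raise FileExistsError("output already exists and OCRD_EXISTING_OUTPUT=ABORT")
```

Passing a plain dict as `env` keeps the helper easy to try out without touching the real environment, for example `existing_output_action(True, {"OCRD_EXISTING_OUTPUT": "SKIP"})`.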
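The interplay of `OCRD_MISSING_OUTPUT` (`SKIP`, `ABORT`, `COPY`) and `OCRD_MAX_MISSING_OUTPUTS` (a maximum fraction of failed pages) from the Added list can be sketched as follows. This is one possible reading under stated assumptions, not the library's implementation: `handle_page_failures` is a hypothetical function, failures are given as precomputed booleans, and the run is aborted as soon as the failed fraction of all pages exceeds the limit.

```python
def handle_page_failures(results, missing_output="SKIP", max_missing_outputs=0.1):
    """Apply a failure policy to per-page results (hypothetical sketch).

    results: list of (page_id, succeeded) tuples in processing order.
    Returns (skipped_pages, copied_pages).
    """
    results = list(results)
    total = len(results)
    skipped, copied = [], []
    for page_id, succeeded in results:
        if succeeded:
            continue
        if missing_output == "ABORT":
            # Any single failure stops the whole run.
            raise RuntimeError(f"page {page_id} failed and OCRD_MISSING_OUTPUT=ABORT")
        if missing_output == "COPY":
            # Fall back to passing the input through, like ocrd-dummy would.
            copied.append(page_id)
        elif missing_output == "SKIP":
            # Leave this page without output and carry on.
            skipped.append(page_id)
        else:
            raise ValueError(f"unsupported OCRD_MISSING_OUTPUT value: {missing_output}")
        failed = len(skipped) + len(copied)
        # A limit of 0 disables the check in this sketch.
        if max_missing_outputs > 0 and failed / total > max_missing_outputs:
            raise RuntimeError(
                f"{failed} of {total} pages failed, exceeding the allowed "
                f"fraction {max_missing_outputs}"
            )
    return skipped, copied
```

With ten pages and a limit of `0.1`, a single failure is tolerated, while a second failure raises because the failed fraction then exceeds the limit.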
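The Changed entry for `OCRD_MAX_PARALLEL_PAGES>1` mentions logging via `QueueHandler` in the subprocess and `QueueListener` in the main process. The sketch below shows that standard-library pattern in isolation. It is an illustration only: worker threads and a `queue.Queue` stand in for the worker processes (a `multiprocessing.Queue` could be passed to the same handler classes), and `ListHandler`, `worker` and `run_pages` are hypothetical names.

```python
import logging
import logging.handlers
import queue
import threading


class ListHandler(logging.Handler):
    """Collects formatted messages so the outcome is easy to inspect."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(self.format(record))


def worker(log_queue, page_id):
    # Worker side: the only handler is a QueueHandler, so records are
    # forwarded to the queue instead of being written in the worker.
    logger = logging.getLogger(f"sketch.worker.{page_id}")
    logger.handlers = [logging.handlers.QueueHandler(log_queue)]
    logger.propagate = False
    logger.setLevel(logging.INFO)
    logger.info("processed page %s", page_id)


def run_pages(page_ids):
    log_queue = queue.Queue()
    sink = ListHandler()
    # Main side: the listener drains the queue and hands records to the sink.
    listener = logging.handlers.QueueListener(log_queue, sink)
    listener.start()
    threads = [threading.Thread(target=worker, args=(log_queue, p)) for p in page_ids]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    # stop() processes the records still queued before it returns.
    listener.stop()
    return sink.messages
```

Because only the listener writes to the real handlers, messages from parallel workers end up in one place; their relative order depends on scheduling.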