Tutorial or Quickstart? #11
If you can share your PDF form, or a redacted/modified version of it, I can write a YAML config and add it to the examples.
Can you help write a YAML example for the PDF below?
@stephenbyrne-mfj UPDATED: This has now been fixed, so feel free to use it and include those docs and images in your example folders.
For your PDF:
```yaml
extractor: "pdf.itext5"
header:
  default: 200 # Ignore the "Learning and Development" portion
footer:
  default: 500
maxRowDistance: 2
rootRecordType: certificate
recordTypes:
  certificate:
    label: "Certificate"
    valueTypes:
      - name
      - course
      - date
valueTypes:
  name:
    label: "Name"
  course:
    label: "Course Name"
  date:
    label: "Completion Date"
    # strip out the "on " before the date
    replacements:
      -
        pattern: "on\ *(.*)"
        replacement: "$1"
initialState: "INIT"
states:
  INIT:
    transitions:
      -
        condition: certifiesThat
        nextState: certifiesThat
  certifiesThat:
    include: false
    transitions:
      -
        condition: any
        nextState: name
  name:
    transitions:
      -
        condition: hasSuccessfullyCompleted
        nextState: hasSuccessfullyCompleted
      -
        condition: any
        nextState: name
  hasSuccessfullyCompleted:
    include: false
    transitions:
      -
        condition: any
        nextState: course
  course:
    transitions:
      -
        condition: date
        nextState: date
      -
        condition: any
        nextState: course
  date:
    transitions:
      -
        condition: certifiesThat
        nextState: certifiesThat
conditions:
  any: '1 = 1'
  certifiesThat: 'text = "Certifies that"'
  hasSuccessfullyCompleted: 'text = "has successfully completed"'
  date: 'text =~ /on .*/ and fontSize = 14.0'
```

Generates:
@krambox This is a good, simple example. Please create a pull request to add the PDF to the examples. I do not want to assume that we can distribute your PDF; your submitting the PR makes the permission more explicit.
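For anyone puzzling over the `replacements` entry on the `date` value type in the certificate config: `"on\ *(.*)"` is the regex `on *(.*)` (`\ ` is just an escaped space, in both double-quoted YAML and Java regex), and `"$1"` means "keep capture group 1". The `$1` spelling is Java-style replacement syntax, which I'm assuming because the extractor is iText 5-based. Here is a rough Python equivalent, where the same back-reference is written `\1`; `strip_on_prefix` is a hypothetical name and the date is made up:

```python
import re

# Same regex as the config's `pattern`: "on", any number of spaces, then capture the rest.
PATTERN = re.compile(r"on *(.*)")


def strip_on_prefix(text: str) -> str:
    """Replace the whole match with capture group 1, i.e. drop the leading 'on '."""
    # Python writes the config's "$1" back-reference as r"\1".
    return PATTERN.sub(r"\1", text)


print(strip_on_prefix("on 12 March 2017"))  # 12 March 2017
```

Text without a leading "on" is returned unchanged, since the pattern never matches it.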
Speaking of tutorials:
Measuring up your PDF

This is very easy to do in Acrobat Reader DC if you open the additional tools. Then just select …

Maybe you should add the above note and picture to your wiki, the README, or wherever it can be easily found.
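One small thing that helps when turning measurements into config values: PDF page coordinates are in points, 72 to the inch (unless a page sets `/UserUnit`, which is uncommon). The thread doesn't say what unit the `header`/`footer` numbers use, so treat this as an assumption: if they are PDF points and your measuring tool reports inches or millimetres, the conversion is a simple scale. A minimal sketch with hypothetical helper names:

```python
# Default PDF user space: 1 unit = 1/72 inch.
POINTS_PER_INCH = 72.0
MM_PER_INCH = 25.4


def inches_to_points(inches: float) -> float:
    """Convert a length measured in inches to PDF points."""
    return inches * POINTS_PER_INCH


def mm_to_points(mm: float) -> float:
    """Convert a length measured in millimetres to PDF points."""
    return mm / MM_PER_INCH * POINTS_PER_INCH


# e.g. a header band measured at 2.5 in, or a footer band at 40 mm
print(inches_to_points(2.5))       # 180.0
print(round(mm_to_points(40), 1))  # 113.4
```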
I usually run …
Writing the finite-state machine (FSM) config is definitely the hardest part. I agree there should be a tutorial (or maybe a video?) that explains how it works and walks through developing one for a simple document.
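Until there is a proper tutorial, here is how I read the certificate config earlier in this thread, written as a small Python simulation. This is my interpretation, not the tool's actual implementation. The assumptions are that transitions are tried in order and the first matching condition wins, that a chunk matching no transition is ignored, that every "Certifies that" starts a new certificate record, and that `text =~ /on .*/` means "starts with 'on '". Chunks are simplified to `(text, fontSize)` pairs, and the sample text is made up.

```python
import re
from typing import Callable

# Simplified text chunk: (text, font size). Real extractors expose much more.
Chunk = tuple[str, float]

# The config's `conditions`, re-expressed as Python predicates.
CONDITIONS: dict[str, Callable[[Chunk], bool]] = {
    "any": lambda c: True,
    "certifiesThat": lambda c: c[0] == "Certifies that",
    "hasSuccessfullyCompleted": lambda c: c[0] == "has successfully completed",
    "date": lambda c: re.match(r"on .*", c[0]) is not None and c[1] == 14.0,
}

# The config's `states`: ordered (condition, nextState) pairs plus `include: false`.
STATES = {
    "INIT": {"transitions": [("certifiesThat", "certifiesThat")]},
    "certifiesThat": {"include": False, "transitions": [("any", "name")]},
    "name": {"transitions": [("hasSuccessfullyCompleted", "hasSuccessfullyCompleted"),
                             ("any", "name")]},
    "hasSuccessfullyCompleted": {"include": False, "transitions": [("any", "course")]},
    "course": {"transitions": [("date", "date"), ("any", "course")]},
    "date": {"transitions": [("certifiesThat", "certifiesThat")]},
}
VALUE_TYPES = {"name", "course", "date"}  # the certificate record's value types


def run(chunks: list[Chunk]) -> list[dict[str, str]]:
    records: list[dict[str, str]] = []
    state = "INIT"
    for text, size in chunks:
        # First matching transition wins; `any` is listed last as the fallback.
        for condition, next_state in STATES[state]["transitions"]:
            if CONDITIONS[condition]((text, size)):
                state = next_state
                break
        else:
            continue  # assumed: no transition matched, so the chunk is ignored
        if state == "certifiesThat":
            records.append({})  # assumed: each "Certifies that" opens a new record
        if STATES[state].get("include", True) and state in VALUE_TYPES:
            if state == "date":
                text = re.sub(r"on *(.*)", r"\1", text)  # the `replacements` rule
            record = records[-1]
            # Chunks seen while staying in one state are joined (e.g. a two-line course name).
            record[state] = (record.get(state, "") + " " + text).strip()
    return records


doc = [
    ("Learning and Development", 18.0),  # no transition out of INIT matches: ignored
    ("Certifies that", 12.0),
    ("Jane Doe", 20.0),
    ("has successfully completed", 12.0),
    ("Intro to", 16.0),
    ("Widgets", 16.0),
    ("on 12 March 2017", 14.0),
]
print(run(doc))
# [{'name': 'Jane Doe', 'course': 'Intro to Widgets', 'date': '12 March 2017'}]
```

The self-loops (`any` -> `name`, `any` -> `course`) are what let a value span several chunks. The `fontSize = 14.0` check is what stops a course title that happens to start with "on " from being mistaken for the date.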
Yeah, when people design an FSM they usually include a "loopy" chart known as a state diagram. It would be great if we could find a tool that generates one for us. The links from Wikipedia:
From the old SMC tutorial:
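As a starting point for generating the diagram, the `states` section maps quite directly onto Graphviz's DOT language: one node per state and one edge per transition, labelled with its condition. Below is a rough sketch using the certificate config from earlier in the thread, transcribed by hand. `to_dot`, `STATES` and `EXCLUDED` are hypothetical names, and drawing `include: false` states dashed is just my choice. With PyYAML you could build the same mapping from `yaml.safe_load(...)["states"]` instead.

```python
# `states` from the certificate config, as {state: [(condition, nextState), ...]}.
STATES = {
    "INIT": [("certifiesThat", "certifiesThat")],
    "certifiesThat": [("any", "name")],
    "name": [("hasSuccessfullyCompleted", "hasSuccessfullyCompleted"), ("any", "name")],
    "hasSuccessfullyCompleted": [("any", "course")],
    "course": [("date", "date"), ("any", "course")],
    "date": [("certifiesThat", "certifiesThat")],
}
EXCLUDED = {"certifiesThat", "hasSuccessfullyCompleted"}  # states with include: false


def to_dot(states, initial, excluded=frozenset()):
    """Render an FSM as Graphviz DOT, one labelled edge per transition."""
    lines = ["digraph fsm {", "  rankdir=LR;", "  __start [shape=point];",
             f'  __start -> "{initial}";']
    for state in states:
        # Dashed nodes mark states whose text is not recorded.
        style = "dashed" if state in excluded else "solid"
        lines.append(f'  "{state}" [shape=ellipse, style={style}];')
    for state, transitions in states.items():
        for condition, next_state in transitions:
            lines.append(f'  "{state}" -> "{next_state}" [label="{condition}"];')
    lines.append("}")
    return "\n".join(lines)


print(to_dot(STATES, "INIT", EXCLUDED))
```

Save the output as `fsm.dot` and render it with Graphviz, e.g. `dot -Tpng fsm.dot -o fsm.png`.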
Does anyone know of a getting-started tutorial, or a few more (simpler) example YAML files? I'm having a hard time right now. For example, I want to extract a number from exactly one position in a PDF form.
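I don't know whether the YAML format can match on position directly; nothing in this thread shows it. The underlying idea, though, is just "take the text chunk whose coordinates fall inside a box, then parse a number out of it". Here is a plain Python sketch of that logic. `TextChunk`, `number_at` and the coordinates are all hypothetical. In practice the chunk positions would come from whatever text extractor you use, and the box from measuring the form. Note that PDF coordinates normally start at the bottom-left, so `y` grows upward.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class TextChunk:
    text: str
    x: float  # left edge, in PDF points
    y: float  # baseline, in PDF points (origin at bottom-left)


# An optional sign, digits, and an optional decimal part.
NUMBER = re.compile(r"[-+]?\d+(?:\.\d+)?")


def number_at(chunks: list[TextChunk], x_min: float, x_max: float,
              y_min: float, y_max: float) -> Optional[float]:
    """Return the first number found in a chunk inside the box, or None."""
    for chunk in chunks:
        if x_min <= chunk.x <= x_max and y_min <= chunk.y <= y_max:
            match = NUMBER.search(chunk.text)
            if match:
                return float(match.group())
    return None


chunks = [
    TextChunk("Invoice no.", 50, 700),
    TextChunk("12345", 150, 700),
    TextChunk("Total: 99.50", 150, 100),
]
print(number_at(chunks, 140, 200, 690, 710))  # 12345.0
```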