I am trying to create a YAML configuration for this file. Below is the YAML structure I have created. I want to extract the Full Name and the reference code. I am able to get the reference code but not the Full Name.
# Use the pdfbox parser, since it's the same one we used to originally extract the text to build this planning document.
extractor: "pdf.pdfbox"

# All measurements are in points. 1 point = 1/72 of an inch.
# x-coordinates are from the left edge of the page.
# y-coordinates are from the top edge of the page.
header:
  # ignore anything less than this many points from the top, default and per-page
  default: 690
footer:
  # ignore anything less than this many points from the bottom, default and per-page
  default: 7160

# Text segments are generally parsed in order, top to bottom, left to right.
# If two text segments have y-coordinates within this many points, consider them on the same line,
# and process the one further left first, even if it is 0.4pt lower on the page.
maxRowDistance: 4

# Define the output data record.
# The main record type we're collecting information on is RAF,
# so we'll have that be the root type for our harvested information.
rootRecordType: RAF
recordTypes:
  RAF:
    label: "RAF" # Labels are used when nested recordTypes come into play, like this document.
    valueTypes:
      # Not sure what to name a valueType? Just make something up!
      - URC
      - Name

valueTypes:
  URC:
    # In the CSV, use "Unique Reference Code" as the column header instead of "URC".
    label: "Unique Reference Code"
  Name:
    label: "Full Name"

# Now we define the finite-state machine.
# Let's name the state that our machine starts off with:
initialState: "INIT"

# When each text segment is encountered, each transition for the current state is checked.
states:
  INIT:
    include: false
    transitions:
      - condition: URC
        nextState: URC
      - condition: any
        nextState: INIT
  URC:
    startRecord: true
    transitions:
      - condition: any
        nextState: Name
  Name:
    include: true
    transitions:
      - condition: Name
        nextState: Name
      - condition: any
        nextState: INIT

# Here we define the conditions:
conditions:
  # An example of comparing text with regex.
  # In this case, we're making sure that the text contains a run of exactly 32 lowercase hexadecimal characters.
  URC: 'text =~ /\b[a-f0-9]{32}\b/'
  Name: 'text =~ /^[A-Z][a-z]+(?: [A-Z][a-z]+)* [A-Z][a-z]+$/'
  # Need a condition that is always true? "1=1" does that for you.
  any: "1 = 1"
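To see how these transitions play out, here is a rough Python sketch that traces which state each text segment would land in. It is not the harvester's own code: it assumes the first matching transition wins and that each segment triggers exactly one transition, and the sample segments are made up for illustration.

```python
import re

# Conditions mirrored from the YAML above (assumption: "=~" means "regex is found in text").
CONDITIONS = {
    "URC": lambda t: re.search(r"\b[a-f0-9]{32}\b", t) is not None,
    "Name": lambda t: re.search(r"^[A-Z][a-z]+(?: [A-Z][a-z]+)* [A-Z][a-z]+$", t) is not None,
    "any": lambda t: True,
}

# (condition, nextState) pairs per state, in the order they appear in the YAML.
TRANSITIONS = {
    "INIT": [("URC", "URC"), ("any", "INIT")],
    "URC": [("any", "Name")],
    "Name": [("Name", "Name"), ("any", "INIT")],
}


def trace(segments, state="INIT"):
    """Return (segment, state entered) pairs, assuming first-match-wins."""
    path = []
    for text in segments:
        for condition, next_state in TRANSITIONS[state]:
            if CONDITIONS[condition](text):
                state = next_state
                break
        path.append((text, state))
    return path


if __name__ == "__main__":
    # Hypothetical segments; the real order depends on what pdfbox extracts.
    sample = ["Reference", "d41d8cd98f00b204e9800998ecf8427e", "Full Name:", "John Smith"]
    for text, state in trace(sample):
        print(f"{state:5} <- {text!r}")
```

Under these assumptions, whatever segment immediately follows the reference code enters the Name state regardless of its content, so it may be worth checking which segment actually comes after the code in the extracted text.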
I'm trying to map the image you supplied onto your YAML file, but with some of the labels blurred I'm having a difficult time. Can you supply me with a blank copy of the form? With that, I should be able to get a better understanding of what you're trying to do and what is actually happening. I'll then try to give you some reasonable insight into how to proceed.
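In the meantime, one way to narrow things down is to check the two regular expressions against the kind of text the form contains. The sketch below uses Python's `re` module as a stand-in; the harvester's own regex engine may differ in details, and the sample strings are invented, not taken from the form.

```python
import re

# Patterns copied from the `conditions` section of the YAML.
URC_RE = re.compile(r"\b[a-f0-9]{32}\b")
NAME_RE = re.compile(r"^[A-Z][a-z]+(?: [A-Z][a-z]+)* [A-Z][a-z]+$")


def matches_urc(text: str) -> bool:
    """True if the text contains 32 lowercase hex characters as a whole word."""
    return URC_RE.search(text) is not None


def matches_name(text: str) -> bool:
    """True if the whole text is two or more capitalised words separated by single spaces."""
    return NAME_RE.search(text) is not None


if __name__ == "__main__":
    # Hypothetical samples for illustration only.
    samples = [
        "John Smith",
        "Mary Ann Jones",
        "JOHN SMITH",        # all caps
        "John A. Smith",     # initial with a period
        "Jean-Luc Picard",   # hyphenated
        "Name: John Smith",  # label and value in one segment
        " John Smith",       # leading space
    ]
    for s in samples:
        print(f"{s!r:22} Name={matches_name(s)}")
```

The Name pattern is anchored at both ends and only accepts words made of one uppercase letter followed by lowercase letters, so all-caps names, initials, hyphens, or a label in the same text segment would all fail to match.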