Knowledge has an incompatible new v3 file format #1253

markmc · 2024-07-23T14:38:35Z

A new v3 knowledge format has been added to InstructLab, with no backwards compatibility for v1 or v2 contributions. This will be released in InstructLab v0.18.0.

Existing knowledge contributions need to be updated, along with any documentation on creating knowledge contributions.

https://github.com/instructlab/instructlab/blob/main/scripts/test-data/e2e-qna-knowledge.yaml is an example of the new format

This is part of #160

The changes here originated from aakankshaduggal@5baf6df

There are two major changes here.

- When parsing a `qna.yaml` file from a taxonomy tree, adjust for the new schema for knowledge. There is no attempt to maintain compatibility with prior versions of the schema (v1, v2).
- Change how we translate the taxonomy data into the dataset sent into the pipeline as input. Instead of implementing a sliding window approach of 3 sample qna pairs at a time over all chunks of the document, we now create a row per seed_example (context and associated qna pairs) for each chunk of knowledge docs.

Co-authored-by: abhi1092 <[email protected]>
Co-authored-by: shiv <[email protected]>
Co-authored-by: Aakanksha Duggal <[email protected]>
Signed-off-by: Russell Bryant <[email protected]>
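To make the second change concrete, here is a minimal sketch of "a row per seed_example for each chunk". It is not the actual SDG code: the function name `build_rows`, the row keys (`document`, `context`, `qna_pairs`), and the `questions_and_answers` key on each seed example are assumptions chosen for illustration.

```python
from typing import Any, Dict, List


def build_rows(
    seed_examples: List[Dict[str, Any]], chunks: List[str]
) -> List[Dict[str, Any]]:
    """Pair every document chunk with every seed example.

    Hypothetical helper: the real pipeline's row layout may differ.
    Each seed example is assumed to carry a "context" string and a
    "questions_and_answers" list.
    """
    rows: List[Dict[str, Any]] = []
    for chunk in chunks:
        for example in seed_examples:
            rows.append(
                {
                    "document": chunk,
                    "context": example["context"],
                    # Copy the list so rows do not share one mutable object.
                    "qna_pairs": list(example["questions_and_answers"]),
                }
            )
    return rows
```

Under these assumptions, a document split into N chunks with M seed examples yields N × M rows, each holding one context together with all of its Q&A pairs.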

bjhargrave · 2024-08-02T16:37:12Z

We have existing v1 knowledge in the main branch which needs to be fixed or removed.

markmc · 2024-08-06T17:19:02Z

xref #1260

markmc · 2024-08-06T17:19:38Z

v3 example: #1255

markmc · 2024-08-06T17:21:07Z

From @juliadenham 👍

The YAML file must include at least 5 context fields, each with at least 3 Q&A pairs relating to that context.

Context should be a chunk from the knowledge document(s) being submitted and should be in markdown format.

Length constraints:
- 500 words for context
- 250 words for Q&A pairs
- 750 words total

`document_outline` is a new field that replaces `task_description`. It should describe the information, noting specifics from each context chunk.
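The requirements above could be checked before submitting with something like the following sketch. It is not an official InstructLab validator. It assumes the YAML has already been loaded into a dict, that the key names are `seed_examples`, `context`, `questions_and_answers`, `question`, and `answer`, that the word limits are maximums applied per context, and that words are counted by splitting on whitespace.

```python
from typing import Any, Dict, List

MIN_CONTEXTS = 5
MIN_QNA_PER_CONTEXT = 3
MAX_CONTEXT_WORDS = 500
MAX_QNA_WORDS = 250
MAX_TOTAL_WORDS = 750


def word_count(text: str) -> int:
    """Count whitespace-separated words (an assumed definition of 'word')."""
    return len(text.split())


def check_knowledge(data: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems: List[str] = []

    if not str(data.get("document_outline", "")).strip():
        problems.append("document_outline is missing or empty")

    examples = data.get("seed_examples", [])
    if len(examples) < MIN_CONTEXTS:
        problems.append(
            f"found {len(examples)} contexts, need at least {MIN_CONTEXTS}"
        )

    for index, example in enumerate(examples):
        pairs = example.get("questions_and_answers", [])
        if len(pairs) < MIN_QNA_PER_CONTEXT:
            problems.append(
                f"context {index}: found {len(pairs)} Q&A pairs, "
                f"need at least {MIN_QNA_PER_CONTEXT}"
            )

        context_words = word_count(example.get("context", ""))
        qna_words = sum(
            word_count(pair.get("question", "")) + word_count(pair.get("answer", ""))
            for pair in pairs
        )
        if context_words > MAX_CONTEXT_WORDS:
            problems.append(f"context {index}: context has {context_words} words")
        if qna_words > MAX_QNA_WORDS:
            problems.append(f"context {index}: Q&A pairs have {qna_words} words")
        # Implied by the two checks above; kept because the total is listed separately.
        if context_words + qna_words > MAX_TOTAL_WORDS:
            problems.append(
                f"context {index}: total is {context_words + qna_words} words"
            )

    return problems
```

Returning a list of problems rather than raising on the first one lets a contributor fix everything in a single pass.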

juliadenham · 2024-08-21T20:59:07Z

We now support V3 knowledge, yay!

markmc mentioned this issue Jul 23, 2024

[Epic] Support for v3 schema of knowledge taxonomy additions instructlab/sdg#160

Closed

27 tasks

juliadenham closed this as completed Aug 21, 2024
