-
Notifications
You must be signed in to change notification settings - Fork 1
Tuple Sets
Conceptually, tuple sets flow from one execution graph node to another. In practice, the story is far more complex. Here we discuss the tuple set idea itself and how tuple sets are implemented in Drill. The first thing to realize is that Drill code does not use the term tuple set, instead the code uses a number of lower-level, implementation-focused terms. We use the tuple set concept as a way to conceptualize the details.
In Drill, a tuple set is implemented as a set of related components:
- Record Batch, conceptually, about the same as a tuple set. In practice, a Drill record batch is both a tuple set and and operator, as we'll see below.
- Vector Accessible a collection of vectors. The record batch is a type of vector accessible.
- Schema describes the set of columns within each tuple.
- Record the column values for a single tuple. However, records are never realized as such in Drill.
- Value Vector the underlying columar data representation of the values for a single column.
- Vector Container is a data structure that holds the set of value vectors for a tuple set.
- Selection Vector identifies the set of tuples to include when passing a tuple set downstream. (Or, conversely, implies the set of tuples removed by a filtering or other operation.)
The record batch in Drill can be very confusing to those first learning the code. The term seems to imply, well, a batch of records, which would seem to be the same thing as a tuple set. However, for whatever reason, as Drill evolved, a record batch became both a batch of records and a node in the operator graph. For example, the filter record batch does all of the following:
- Holds a link to the upstream node,
- Calls the upstream node to produce a new tuple set,
- Applies the specified filter operation to the incoming tuple set,
- Holds the outgoing tuple set for use by the downstream node.
The details are complex, we'll work up to them slowly. As we proceed, you'll sometimes think of a Record Batch as a processing node, sometimes as the tuple set that emerges from that node.
See the Record Batch design document. The design document starts from the memory representation of vectors; this document starts with a conceptual view. Between the two, you should gain a good basis for understanding Drill code.
Let's focus for the moment on the actual collection of records that make up a tuple set. Or, in terms of implementation, the collection of value vectors that make up vector accesible.
The org.apache.drill.exec.record.VectorAccessible
interface provides the basic structure of a tuple set:
- Schema:
org.apache.drill.exec.record.BatchSchema
- Record count
- Selection vector(s):
org.apache.drill.exec.record.selection.SelectionVector2
and...4
- Column values
In theory, this is all you need: the schema (definition of columns), the columns, the total number of records, and the list of records actually included in the result set.
Vector Accessible provides three ways to access the vectors of column values:
- Via an iterator defined by the parent interface:
Iterable<VectorWrapper<?>>
- By column id (sort of):
getValueAccessorById( )
- By column name:
getValueVectorId( )
Accessing vectors by index or name appears, however to be complex; it appears that vector accessibles do not define a simple column index due, perhaps, due to the hierarchical relationships among columns.
While VectorAccessible
is an interface, it appears that VectorContainer
serves a nearly-identical purpose, but is a concrete class that holds the implementation of a vector accessible. Just as a vector accessible is realized as the operator that produces the record batch, a vector container seems to be associated with the operator that produced it. This association is not essential; it is simply an artifact of how the code came together.
Columns are implemented as value vectors, but the relationship is indirect. In the simplest case of a fixed-width, non-nullable scalar (an INTEGER NOT NULL for example), a single value vector represents the column. But, in most cases, two, three or more vectors represent each column. Further, columns may form a hierarchy: a column may represent a structure such as a map. In this case, the column is a logical concept made up of a collection of child columns, each of which may be formed of multiple vectors.
The vector concept in Drill can refer to a single vector (array) or a collection of vectors. We will use the term simple vector to refer to a single vector. Drill uses the term hyper vector to refer to a logical vector comprised of multiple other vectors (which may themselves be simple or hyper.)
The Vector Wrapper class (org.apache.drill.exec.record.VectorWrapper
) represents a logical vector: either simple or hyper.
- If the vector is simple,
isHyper()
isfalse
and the value vector is available usinggetValueVector()
(singular). - If the vector is hyper,
isHyper()
istrue
and the set of vectors is available usinggetValueVectors()
(plural). ((Need to clarify this.)) If the vector represents a structure (map), then a child's vector (wrapper) are available usinggetChildWrapper( )
.
An excellent description of value vectors is available. Conceptually, a value vector is an array of values, perhaps with a level or two of indirection. The implementation, however is quite complex. Value vectors are generated using code templates. A different vector class exists for each "major" data type. A major data type is defined as a pair of (minor type, mode) where the minor type is one of the many SQL types that Drill supports, and the "mode" represent three kinds of cardinality:
-
Required - cardinality of (exactly) 1 value per row, described in SQL as
NOT NULL
. - Optional - cardinality of (0,1): either a value, or the value is null. This is described as "nullable" in SQL terminology.
- Repeated - cardinality of (0,*): any number of values. The field is a list of values.
Data types are defined in the Types.proto
protobuf definition in the drill-protocol
project.
Drill defines 38 data types and 3 modes, which gives a total of 114 value vector classes. Understanding this amount of code can be overwhelming. It helps to understand why Drill needs so many classes.
First, the structure of the three modes differs:
- Required is a simple vector of values (fixed-width) or with one layer of indirection (variable-width).
- Optional includes a separate bit-vector of null-value flags.
- Repeated includes an index vector that points to the first list value for each row.
Second, the data format differs for each minor type. Differences include:
- Field width
- Java type of each entry
- Implied encoding or representation (such as character encoding, date/time format, etc.)
The details are explained quite well in the design document cited above.
As it turns out, Drill supports only a subset of the existing vector classes, but it is not entirely clear which are supported. While some are obvious (all modes of INTEGER, say), and some are declared as not supported (any mode of SMALLINT), others reside in a gray area (such as time and decimal types).
All value vectors implement the org.apache.drill.exec.vector.ValueVector
interface, which has very useful Javadoc. Vectors are backed by byte buffers implemented by DrillBuf
. Vectors have a distinct life cycle: written (once), then permanently read-only. The Mutator
interface performs the write phase, the Accessor
the read phase. Each of these is type-specific, so each generated class contains its own implementation.
The IntVector
class is a good example: it implements the (required, INTEGER) major type. Values are 4 bytes wide, and represented with the Java int
type. The associated Accessor
provides the int get(int index)
operation to get the column value for a tuple (given as an index from the start of the tuple set.)
Because the value vector and accessor classes are specific to the major type, code that works with value vectors must also be specific to the major type. In theory, this means every operator must have over 100 different implementations: one per major type. Since such complexity would be impossible to develop or maintain, Drill uses a different solution: Drill generates the required code on the fly for each query as described elsewhere in these notes.
Tuple sets are collections of vectors plus a schema. The org.apache.drill.exec.record.BatchSchema
class provides the schema for each batch. Logically, a schema is a collection of columns, represented as a MaterializedField
instance. A column can be atomic or can have structure. Columns have the following basic meta-data (along with other advanced information):
- Name
- Path
- Type ("minor type", a SQL data type)
- Cardinality (mode): required, optional or repeated
- Width (for CHAR and similar types)
- Scale and Precision (for DECIMAL types)
- Allows NULL values (i.e. "nullable")
- List of child fields
As it turns out, all of the above are derived from four fundamental attributes:
- Name
- (Type, Cardinality) pair ("major type")
- Parent field (implied)
- Child fields
The major type determines the width, scale, precision and null support. The nesting hierarchy determines the field path.
The batch schema is a collection of (root level) materialized fields. The columns (appear to be) unordered. The columns for a schema can be accessed either via an interator or by index. (But not, strangely, by name.)
((How to reference columns and their values.))
As described earlier, tuple sets flow from one node (operator) to the next. In practice, the picture is a bit more complex. Alghough we said that VectorAccessible and VectorContainers hold collection of vectors, they are not direct implementations of the tuple set concept. In Drill, it is not the tuple sets that flow from operator to operator, but instead the columnar data (data buffers) that flow. Specifically, DrillBuf
instances transfer from one vector container to the next.
The means of the transfer is the org.apache.drill.exec.record.TransferPair
class. Each instance implicitly identifies the "from" vector and data buffer. Each instance identifies the destination vector, along with properties. The implementation of the transfer pair does the actual transfer. Each value vector type provides its own transfer pair implementation.