-
Notifications
You must be signed in to change notification settings - Fork 1
Performance Basics
Drill performance work consists of focusing on one particular area, discovering current performance, and seeking ways to make that particular operation or query run faster. The "school of hard knocks" has taught that the unwary novice Drill developer can spend time "discovering" that Drill runs slow if you set the wrong options. This page describes those options, when to use them, and how to turn them off when doing performance runs.
Drill has a very complex memory allocator to manage direct memory. Clearly, those who wrestled this part of Drill into perfection found the need to carefully monitor each allocation. This shows up in the effect of enabling assertions (the -ea option the Java command) when running Drill. In general, assertions are very handy to include debugging checks that Java will optimize out at run-time. Drill, unfortunately, uses the -ea option as an indication to turn on detailed memory allocation logging. This logging has the effect of adding 10x to 20x to the run time of each query.
Be sure to turn on assertions when changing Drill to catch memory issues early. (Assertions are enabled for Drill unit tests run through Maven, for example.) But, be sure assertions are off when doing performance runs. Assertions will be off if you use the out-of-the-box Drill launch scripts (those in $DRILL_HOME/bin
) but may be on if you run Drill through your favorite IDE.
Note that DRILL-5122 is in progress to address this issue.
Drill provides comprehensive logging via the Logback Classic logging system. Like most loggers, Logback provides a variety of logging levels. At the default logging levels, logging will consume much of the run time of a query.
Before doing any performance runs, change the log level to error
. If running Drill using the out-of-the-box scripts, look for $DRILL_HOME/conf/logback.xml
(or $DRILL_SITE/logback.xml
). Change these lines:
<logger name="org.apache.drill" additivity="false">
<level value="error" />
<appender-ref ref="FILE" />
</logger>
<logger name="org.apache.drill" additivity="false">
<level value="error" />
<appender-ref ref="SOCKET" />
</logger>
That is, change the levels from the default of info
and debug
to error
. Note that all log levels must be at error
. If any logger (even one you are not using, such as the SOCKET
logger) is at debug
, you will pay the debug logging cost.
If you run Drill from an IDE, beware of the multiple (or missing) Logback configuration files. ([DRILL-5114)(https://issues.apache.org/jira/browse/DRILL-5114).
Depending on your focus, it is helpful to limit Drill's parallelism in order to force Drill to run a query on a single Drillbit in a single fragment. Doing so focusses efforts on the performance of a specific set of operators, if that is what you want to optimize.
ALTER SESSION SET `planner.width.max_per_node` = 1;
This can also be done via a data source that supports just one fragment, such as the mock data source described below.
The Drill query profile contains a wealth of information. While the Drill UI is a great start, and is very good for visualizing the query, there is no substitute for looking at the query profile itself. Best to look at the JSON form. Use a JSON formatter web site or program to turn Drill's JSON into something you can read. Or, create a simple program to grab out the important information such as timings. For example:
Op: 0 Screen
Setup: 0 - 0%, 0%
Process: 3 - 0%, 0%
Wait: 4
Op: 1 Project
Setup: 1 - 100%, 0%
Process: 1 - 0%, 0%
Op: 2 SelectionVectorRemover
Setup: 0 - 0%, 0%
Process: 762 - 12%, 12%
Op: 3 Sort
Setup: 0 - 0%, 0%
Process: 5016 - 84%, 84%
Spills: 0
Memory: 118
Op: 4 Scan
Setup: 0 - 0%, 0%
Process: 143 - 2%, 2%
Total:
Setup: 1
Process: 5925
The above tell us that the sort takes 5016 ms for this query which is 84% of the run time. This tells us exactly where to focus our efforts.
The profile contains various metrics (encoded, unfortunately as numbers) that give insight into the operation of each operator. For example, the "Spills" number in the sort operator is a numbered metric (number 0 in this case.)
Often performance work requires testing, changing something, and testing again. This cycle is much easier if done from within an IDE using a test framework. Drill offers several (such as BaseTestQuery
). DRILL-5126 describes a simple, unified framework that makes it very easy to configure an embedded Drillbit with a known set of configuration and session options, run a query, and get the results in various forms, including simply timing and counting the results. For example:
FixtureBuilder builder = ClusterFixture.builder()
.configProperty(ExecConstants.SYS_STORE_PROVIDER_LOCAL_ENABLE_WRITE, true)
.configProperty(ExecConstants.REMOVER_ENABLE_GENERIC_COPIER, true)
.configProperty(ClassBuilder.SAVE_CODE_OPTION, true)
.sessionOption(ExecConstants.MAX_QUERY_MEMORY_PER_NODE_KEY, 3L * 1024 * 1024 * 1024)
.maxParallelization(1)
;
try (ClusterFixture cluster = builder.build();
ClientFixture client = cluster.clientFixture()) {
String sql = "SELECT id_i FROM `mock`.employee_10M ORDER BY id_i";
QuerySummary summary = client.queryBuilder().sql(sql).run();
System.out.println(String.format("Sorted %,d records in %d batches.", summary.recordCount(), summary.batchCount()));
System.out.println(String.format("Query Id: %s, elapsed: %d ms", summary.queryIdString(), summary.runTimeMs()));
}
}
Running an embedded Drillbit in this way provides very fast turn-around: no need to set up a Drill install directory, edit config files and so on. The IDE does the work of rebuilding (quickly) prior to each run. Of course, this works only if you are focused on specific internal bits of Drill; it is not as useful if you are interested in the distributed aspects of Drill.
Drill provides a mock data source that has historically only been available when used in a physical plan. (See TestSimpleExternalSort
.) DRILL-5152 describes an enhancement to give access to the mock data source from SQL. For example, to generate a table of 10 million rows, each with a single int column with random values:
SELECT id_i FROM `mock`.employee_10M ORDER BY id_i
Good advice is to optimize the inner loop of a program as that is where Drill spends the most time. The query profile will guide you to the operator that takes the most time. We can either find ways to eliminate the operator altogether (perhaps choose a different plan), or can optimize the operator itself. Since most of the work is done in generated code, that is where we should focus. Unfortunately, it can be hard to work with that code as it exists only as byte codes at runtime.
Using DRILL-5052, you can ask Drill to generate the code as "plain old Java" which Drill will save to disk so that your IDE can find it. You may wish to experiment with alternative forms of the generated code. The easy way to do that is:
- Make a copy of the generated code and place the copy somewhere in your source tree (for example, in the same package as the operator itself.)
- Give the copied class a unique name. For a generated
SorterGen7
say, perhaps call itSorterExp
. - Find the place where the operator generates the class. Comment out that code and replace it with your experimental version:
return new SorterExp();
- At this point, you can run only one query: the one used to generate the code.
- Rerun the query to ensure that the test code works.
- Have at the test code, trying different code variations and measuring the results to see if you can find improvements.
Once you are satisfied, put the operator back as it was. The next step would be to alter the code generator to emit the newly optimized code in place of the original. Use the code capture & debug mechanism described above to verify that the generated code is correct.
Experience has shown that the exact same Drill run on a Mac will have radically different performance characteristics over the span of a few days. The reasons are not yet clear. A particular run started at about 6 seconds. Optimizations were done and time dropped. At some point, times started climbing again; despite no code changes. Times grew to 8 then 10, finally 15 seconds. No other programs were using CPU during the runs. Because of this long-term degradation, subtle performance improvements can be masked so that it becomes hard to determine which changes actually improved performance.
One key factor is clearly whether the Mac is on batter or external power. After a long period of degradation, a particular run, which started at 6 seconds, took about 12. Removing the power charger caused the average to increase to 16 seconds on battery. Reconnecting the power brought the average down to about 15 seconds.
Some factors that have been ruled out:
- Restart of IDE had no effect.
- Rebuild of Drill using Maven (
mvn clean install -DskipTests
) made no difference. - Restart of the Mac made no difference.
There must be some state within Drill itself causing the shift.