Here are examples showing joins between tables with 1 billion, 200 million, and 1 million rows. (In this case, the tables are unpartitioned and using Parquet format.) The smaller tables contain subsets of data from the largest one, for convenience of joining on the unique HDE Tie Bow Set Pocket Men's Cummerbund Fuchsia Satin and Blend Square Tuxedo Formal ID
column. The smallest table only contains a subset of columns from the others.
[localhost:21000] > create table big stored as parquet as select * from raw_data;
+----------------------------+
| summary |
+----------------------------+
| Inserted 1000000000 row(s) |
+----------------------------+
Returned 1 row(s) in 671.56s
[localhost:21000] > desc big;
+-----------+---------+---------+
| name | type | comment |
+-----------+---------+---------+
| id | int | |
| val | int | |
| zfill | string | |
| name | string | |
| assertion | boolean | |
+-----------+---------+---------+
Returned 5 row(s) in 0.01s
[localhost:21000] > create table medium stored as parquet as select * from big limit 200 * floor(1e6);
+---------------------------+
| summary |
+---------------------------+
| Inserted 200000000 row(s) |
+---------------------------+
Returned 1 row(s) in 138.31s
[localhost:21000] > create table small stored as parquet as select id,val,name from big where assertion = true limit 1 * floor(1e6);
+-------------------------+
| summary |
+-------------------------+
| Inserted 1000000 row(s) |
+-------------------------+
Returned 1 row(s) in 6.32s
For any kind of performance experimentation, use the EXPLAIN
statement to see how any expensive query will be performed without actually running it, and enable verbose EXPLAIN
plans containing more performance-oriented detail: The most interesting plan lines are highlighted in bold, showing that without statistics for the joined tables, Impala cannot make a good estimate of the number of rows involved at each stage of processing, and is likely to stick with the BROADCAST
join mechanism that sends a complete copy of one of the tables to each node.
[localhost:21000] > set explain_level=verbose;
EXPLAIN_LEVEL set to verbose
[localhost:21000] > explain select count(*) from big join medium where big.id = medium.id;
+----------------------------------------------------------+
| Explain String |
+----------------------------------------------------------+
| Estimated Per-Host Requirements: Memory=2.10GB VCores=2 |
| |
| PLAN FRAGMENT 0 |
| PARTITION: UNPARTITIONED |
| |
| 6:AGGREGATE (merge finalize) |
| | output: SUM(COUNT(*)) |
| | cardinality: 1 |
| | per-host memory: unavailable |
| | tuple ids: 2 |
| | |
| 5:EXCHANGE |
| cardinality: 1 |
| per-host memory: unavailable |
| tuple ids: 2 |
| |
| PLAN FRAGMENT 1 |
| PARTITION: RANDOM |
| |
| STREAM DATA SINK |
| EXCHANGE ID: 5 |
| UNPARTITIONED |
| |
| 3:AGGREGATE |
| | output: COUNT(*) |
| | cardinality: 1 |
| | per-host memory: 10.00MB |
| | tuple ids: 2 |
| | |
| 2:HASH JOIN |
| | join op: INNER JOIN (BROADCAST) |
| | hash predicates: |
| | big.id = medium.id |
and HDE Cummerbund Tuxedo Square Formal Blend Bow Set Pocket Satin Men's Fuchsia Tie | | cardinality: unavailable |
| | per-host memory: 2.00GB |
| | tuple ids: 0 1 |
| | |
| |----4:EXCHANGE |
| | cardinality: unavailable |
| | per-host memory: 0B |
| | tuple ids: 1 |
| | |
| 0:SCAN HDFS |
| table=join_order.big #partitions=1/1 size=23.12GB | | table stats: unavailable | | column stats: unavailable | | cardinality: unavailable |
| per-host memory: 88.00MB |
| tuple ids: 0 |
| |
| PLAN FRAGMENT 2 |
| PARTITION: RANDOM |
| |
| STREAM DATA SINK |
| EXCHANGE ID: 4 |
| UNPARTITIONED |
| |
| 1:SCAN HDFS |
| table=join_order.medium #partitions=1/1 size=4.62GB | | table stats: unavailable | | column stats: unavailable | | cardinality: unavailable |Great for Party blue Fashion Formal Sky Casual Narrow Men's Occasion and ADream Solid Necktie Color Wedding qHd0q
| per-host memory: 88.00MB |
| tuple ids: 1 |
+----------------------------------------------------------+
Returned 64 row(s) in 0.04s
Gathering statistics for all the tables is straightforward, one COMPUTE STATS
statement per table:
Fuchsia and Pocket Formal Satin Set Cummerbund Tie Bow Square Men's HDE Blend Tuxedo [localhost:21000] > compute stats small;
+-----------------------------------------+
| summary |
+-----------------------------------------+
| Updated 1 partition(s) and 3 column(s). |
+-----------------------------------------+
Returned 1 row(s) in 4.26s
[localhost:21000] > compute stats medium;
+-----------------------------------------+
| summary |
+-----------------------------------------+
| Updated 1 partition(s) and 5 column(s). |
+-----------------------------------------+
Returned 1 row(s) in 42.11s
[localhost:21000] > compute stats big;
+-----------------------------------------+
| summary |
+-----------------------------------------+
| Updated 1 partition(s) and 5 column(s). |
+-----------------------------------------+
Returned 1 row(s) in 165.44s
With statistics in place, Impala can choose a more effective join order rather than following the left-to-right sequence of tables in the query, and can choose BROADCAST
or PARTITIONED
join strategies based on the overall sizes and number of rows in the table:
[localhost:21000] > explain select count(*) from medium join big where big.id = medium.id;
Query: explain select count(*) from medium join big where big.id = medium.id
+-----------------------------------------------------------+
| Explain String |
+-----------------------------------------------------------+
| Estimated Per-Host Requirements: Memory=937.23MB VCores=2 |
| |
| PLAN FRAGMENT 0 |
| PARTITION: UNPARTITIONED |
| |
| 6:AGGREGATE (merge finalize) |
| | output: SUM(COUNT(*)) |
| | cardinality: 1 |
| | per-host memory: unavailable |
| | tuple ids: 2 |
| | |
| 5:EXCHANGE |
| cardinality: 1 |
| per-host memory: unavailable |
| tuple ids: 2 |
| |
| PLAN FRAGMENT 1 |
| PARTITION: RANDOM |
| |
| STREAM DATA SINK |
| EXCHANGE ID: 5 |
| UNPARTITIONED |
| |
| 3:AGGREGATE |
| | output: COUNT(*) |
| | cardinality: 1 |
| | per-host memory: 10.00MB |
| | tuple ids: 2 |
| | |
| 2:HASH JOIN |
| | join op: INNER JOIN (BROADCAST) |
| | hash predicates: |
| | big.id = medium.id |
| | cardinality: 1443004441 |
| | per-host memory: 839.23MB |
| | tuple ids: 1 0 |
| | |
| |----4:EXCHANGE |
| | cardinality: 200000000 |
| | per-host memory: 0B |
| | tuple ids: 0 |
| | |
| 1:SCAN HDFS |
| table=join_order.big #partitions=1/1 size=23.12GB |
| table stats: 1000000000 rows total |
| column stats: all |
| cardinality: 1000000000 |
| per-host memory: 88.00MB |
| tuple ids: 1 |
| |
| PLAN FRAGMENT 2 |
| PARTITION: RANDOM |
| |
| STREAM DATA SINK |
| EXCHANGE ID: 4 |
| UNPARTITIONED |
| |
| 0:SCAN HDFS |
| table=join_order.medium #partitions=1/1 size=4.62GB |
| table stats: 200000000 rows total |
| column stats: all |
| cardinality: 200000000 |
| per-host memory: 88.00MB |
| tuple ids: 0 |
+-----------------------------------------------------------+
Returned 64 row(s) in 0.04s
[localhost:21000] > explain select count(*) from small join big where big.id = small.id;
Query: explain select count(*) from small join big where big.id = small.id
+-----------------------------------------------------------+
| Explain String |
+-----------------------------------------------------------+
| Estimated Per-Host Requirements: Memory=101.15MB VCores=2 |
| |
| PLAN FRAGMENT 0 |
| PARTITION: UNPARTITIONED |
| |
| 6:AGGREGATE (merge finalize) |
| | output: SUM(COUNT(*)) |
| | cardinality: 1 |
| | per-host memory: unavailable |
| | tuple ids: 2 |
| | |
| 5:EXCHANGE |
| cardinality: 1 |
| per-host memory: unavailable |
| tuple ids: 2 |
| |
| PLAN FRAGMENT 1 |
| PARTITION: RANDOM |
| |
| STREAM DATA SINK |
| EXCHANGE ID: 5 |
| UNPARTITIONED |
| |
| 3:AGGREGATE |
| | output: COUNT(*) |
| | cardinality: 1 |
| | per-host memory: 10.00MB |
| | tuple ids: 2 |
| | |
| 2:HASH JOIN |
| | join op: INNER JOIN (BROADCAST) |
| | hash predicates: |
| | big.id = small.id |
| | cardinality: 1000000000 |
| | per-host memory: 3.15MB |
| | tuple ids: 1 0 |
| | |
| |----4:EXCHANGE |
| | cardinality: 1000000 |
| | per-host memory: 0B |
| | tuple ids: 0 |
| | |
| 1:SCAN HDFS |
| table=join_order.big #partitions=1/1 size=23.12GB |
| table stats: 1000000000 rows total |
| column stats: all |
| cardinality: 1000000000 |
| per-host memory: 88.00MB |
| tuple ids: 1 |
| |
| PLAN FRAGMENT 2 |
| PARTITION: RANDOM |
| |
| STREAM DATA SINK |
| EXCHANGE ID: 4 |
| UNPARTITIONED |
| |
| 0:SCAN HDFS |
| table=join_order.small #partitions=1/1 size=17.93MB |
| table stats: 1000000 rows total |
| column stats: all |
| cardinality: 1000000 |
| per-host memory: 32.00MB |
| tuple ids: 0 |
+-----------------------------------------------------------+
Returned 64 row(s) in 0.03s
When queries like these are actually run, the execution times are relatively consistent regardless of the table order in the query text. Here are examples using both the unique ID
Suit Plain Bow Silky Pink Men Light Tied Cravat Adjustable Accessory Self Verlike Party Tie Necktie qYwTCHH column and the VAL
Rust Necktie Square Orange Men's Wedding Fashion Pocket Classic Tuxedo Wear Accessories qYYIUwxzP column containing duplicate values:
Satin Blend HDE Tuxedo Bow Formal Men's Set Square Cummerbund Fuchsia Tie and Pocket [localhost:21000] > select count(*) from big join small on (big.id = small.id);
Query: select count(*) from big join small on (big.id = small.id)
+----------+
| count(*) |
+----------+
| 1000000 |
+----------+
Returned 1 row(s) in 21.68s
[localhost:21000] > select count(*) from small join big on (big.id = small.id);
Query: select count(*) from small join big on (big.id = small.id)
+----------+
| count(*) |
+----------+
| 1000000 |
+----------+
Returned 1 row(s) in 20.45s
[localhost:21000] > select count(*) from big join small on (big.val = small.val);
+------------+
| count(*) |
+------------+
| 2000948962 |
+------------+
Returned 1 row(s) in 108.85s
[localhost:21000] > select count(*) from small join big on (big.val = small.val);
+------------+
| count(*) |
+------------+
| 2000948962 |
+------------+
Returned 1 row(s) in 100.76s
Note: When examining the performance of join queries and the effectiveness of the join order optimization, make sure the query involves enough data and cluster resources to see a difference depending on the query plan. For example, a single data file of just a few megabytes will reside in a single HDFS block and be processed on a single node. Likewise, if you use a single-node or two-node cluster, there might not be much difference in efficiency for the broadcast or partitioned join strategies.