Spark Enhanced Open Source Feature: Optimized SQL Query of Cross-Source Data
Scenario
Enterprises usually store massive data, such as from various databases and warehouses, for management and information collection. However, diversified data sources, hybrid dataset structures, and scattered data storage lower query efficiency.
The open source Spark only supports simple filter pushdown during querying of multi-source data. The SQL engine performance is deteriorated due of a large amount of unnecessary data transmission. The pushdown function is enhanced, so that aggregate, complex projection, and complex predicate can be pushed to data sources, reducing unnecessary data transmission and improving query performance.
Only the JDBC data source supports pushdown of query operations, such as aggregate, projection, predicate, aggregate over inner join, and aggregate over union all. All pushdown operations can be enabled based on your requirements.
Module | Before Enhancement | After Enhancement |
---|---|---|
aggregate | The pushdown of aggregate is not supported. |
|
projection | Only pushdown of simple projection is supported. Example: select a, b from table |
|
predicate | Only simple filtering with the column name on the left of the operator and values on the right is supported. Example: select * from table where a>0 or b in ("aaa", "bbb") |
|
aggregate over inner join | Related data from the two tables must be loaded to Spark. The join operation must be performed before the aggregate operation. | The following functions are supported:
The following scenarios are not supported:
|
aggregate over union all | Related data from the two tables must be loaded to Spark. union must be performed before aggregate. | Supported scenarios: Aggregation functions including sum, avg, max, min, and count are supported. Unsupported scenarios:
|
Precautions
- If external data source is Hive, query operation cannot be performed on foreign tables created by Spark.
- Only MySQL and MPPDB data sources are supported.
- Scenario
- Precautions