Data Lake Insight (DLI) is a serverless data processing and analysis service that is fully compatible with the Apache Spark and Apache Flink ecosystems. It frees you from managing any servers.
DLI supports multiple query methods, including standard SQL, Spark SQL, and Flink SQL, and is compatible with mainstream data formats. You can use standard SQL or Spark and Flink applications to query data in these formats without ETL. DLI also supports SQL statements and Spark applications that access heterogeneous data sources, including CloudTable, RDS, DWS, CSS, OBS, custom databases on ECSs, and offline databases.
For details about DLI functions, see Features.
| Function | Description |
|---|---|
| DLI is a data processing and analytics service built on the serverless architecture. | DLI is a serverless big data query and analytics service. With DLI, you only pay for the actual compute resources used, with no need to maintain or manage cloud servers. |
| DLI supports multiple compute engines. | DLI is fully compatible with ecosystems like Apache Spark and Apache Flink, and supports standard SQL, Spark SQL, and Flink SQL. It is compatible with mainstream data formats such as CSV, JSON, Parquet, and ORC. |
| DLI supports multiple connection methods. | DLI provides multiple connection methods to meet diverse user needs and scenarios. Connection methods: |
| DLI can connect to multiple data sources for cross-source data analysis. | |
| Three basic job types supported by DLI | |
| DLI supports decoupled storage and compute. | After storing data in OBS, you can connect DLI to OBS for data analysis. Under the decoupled storage and compute architecture, storage resources and compute resources can be requested and billed separately, reducing costs and improving resource utilization. You can choose single-AZ or multi-AZ storage when creating an OBS bucket for storing redundant data on the DLI console. The differences between the two storage policies are as follows: |
| DLI manages and schedules resources in a unified manner using elastic resource pools. | The backend of elastic resource pools adopts a CCE cluster architecture, supporting heterogeneous resources, so you can manage and schedule resources in a unified manner. For details, see Creating an Elastic Resource Pool and Creating Queues Within It. |
DLI includes the following core modules:
| Module | Description |
|---|---|
| Ecosystem tools | DLI leverages its robust serverless architecture and multimodal engine support to fulfill the diverse needs of various industries, driving their digital transformation and fostering innovation. |
| Compute engine | |
| Unified resource management | |
| Unified metadata management | |
| Storage service | OBS and databases are used to store structured or unstructured data for data analysis, providing persistent data storage services. |
| Data source connection | |
| Data applications | DLI can connect to mainstream BI tools in the industry to flexibly meet data presentation needs. |
A web-based service management platform is provided. You can access DLI using the management console or HTTPS-based APIs, or connect to the DLI server through the JDBC client.
You can submit SQL, Spark, or Flink jobs on the DLI management console.
If you need to integrate DLI into a third-party system for secondary development, you can call DLI APIs to use the service.
For details, see Data Lake Insight API Reference.
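As a rough illustration of what calling DLI over HTTPS from a third-party system might look like, the following Python sketch builds a request that submits a SQL job, without sending it. The endpoint host pattern, the `/v1.0/{project_id}/jobs/submit-job` path, the body fields (`sql`, `queue_name`, `currentdb`), and the `X-Auth-Token` header are assumptions not stated on this page, and the region, project ID, and token are placeholders. Check all of them against the Data Lake Insight API Reference before use.

```python
import json
import urllib.request

# All values below are illustrative placeholders, not taken from this page.
REGION = "example-region"        # hypothetical region ID
PROJECT_ID = "your-project-id"   # hypothetical project ID
TOKEN = "your-iam-token"         # hypothetical authentication token


def build_sql_job_request(sql, queue_name, database):
    """Build (but do not send) an HTTPS request that submits a SQL job."""
    # Assumed endpoint and path; verify both in the API Reference.
    url = (
        f"https://dli.{REGION}.myhuaweicloud.com"
        f"/v1.0/{PROJECT_ID}/jobs/submit-job"
    )
    # Assumed body field names; verify them in the API Reference.
    body = {"sql": sql, "queue_name": queue_name, "currentdb": database}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Auth-Token": TOKEN},
        method="POST",
    )


request = build_sql_job_request("SELECT 1", "default", "default")
print(request.get_method(), request.full_url)
```

To actually submit the job, you would pass the request to `urllib.request.urlopen(request)` with real credentials and parse the JSON response. The sketch stops short of that so it can be run without an account.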
DataArts Studio is a one-stop data operations platform that provides intelligent data lifecycle management. It supports intelligent construction of industrial knowledge libraries and incorporates data foundations such as big data storage, computing, and analysis engines. With DataArts Studio, your company can easily construct end-to-end intelligent data systems. These systems can help eliminate data silos, unify data standards, accelerate data monetization, and promote digital transformation.
Create a data connection on the DataArts Studio management console to access DLI for data analysis.