Basic Concepts

HBase Table

An HBase table is conceptually a three-dimensional mapping: a row key, a column key, and a timestamp together map to a cell value. All data in HBase is stored in these table cells.
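The mapping described above can be pictured with a small in-memory model. This is only an illustrative sketch, not the HBase client API: the class name ToyTable, its methods, and the sample row keys are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical in-memory model: (row key, column key, timestamp) -> cell value.
Cell = Tuple[str, str, int]


class ToyTable:
    def __init__(self) -> None:
        self._cells: Dict[Cell, bytes] = {}

    def put(self, row: str, column: str, timestamp: int, value: bytes) -> None:
        # Each distinct timestamp keeps its own version of the cell.
        self._cells[(row, column, timestamp)] = value

    def get(self, row: str, column: str,
            timestamp: Optional[int] = None) -> Optional[bytes]:
        """Return the value at the given timestamp, or the newest version if none is given."""
        if timestamp is not None:
            return self._cells.get((row, column, timestamp))
        versions = [ts for (r, c, ts) in self._cells if r == row and c == column]
        if not versions:
            return None
        return self._cells[(row, column, max(versions))]


table = ToyTable()
table.put("user#1001", "info:name", 1, b"Alice")
table.put("user#1001", "info:name", 2, b"Alicia")
latest = table.get("user#1001", "info:name")              # newest version
first = table.get("user#1001", "info:name", timestamp=1)  # explicit version
```

Reading without a timestamp returns the newest version, while passing a timestamp selects one specific version of the same cell.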

Column

A column is one dimension of an HBase table. A column name has the format <family>:<label>, where <family> and <label> can be any combination of characters. An HBase table consists of a set of column families, and each column in the table belongs to exactly one column family.
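As a minimal sketch of the <family>:<label> format, the helper below splits a column name into its two parts. The function name split_column_name is hypothetical, and the sketch assumes that the first colon is the separator, so any later colons belong to the label.

```python
from typing import Tuple


def split_column_name(name: str) -> Tuple[str, str]:
    """Split '<family>:<label>' at the first colon (assumed to be the separator)."""
    family, separator, label = name.partition(":")
    if not separator or not family:
        raise ValueError(f"expected '<family>:<label>', got {name!r}")
    return family, label


family, label = split_column_name("info:name")
```

Here family is "info" and label is "name"; a name without a colon raises ValueError.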

Column Family

A column family is a collection of columns defined in the HBase table schema. You must create a column family before you can create columns in it. A column family groups data that shares the same properties. For a given row, all data in the same column family is stored on the same server. Attributes such as compression, timestamps, and data block caching are configured per column family.

Timestamp

A timestamp is a 64-bit integer used to index different versions of the same data. A timestamp can be automatically assigned by HBase when data is written or assigned by users.

Index

CloudTable is a big data storage service that provides efficient random key-value (KV) queries. On top of this, CloudTable introduces a self-developed, distributed, multidimensional term index feature whose storage format and computation are based on bitmaps. You can define which HBase fields need a term index based on service requirements, and the term index data is generated automatically when you write data. The term index also provides efficient multidimensional term query APIs based on the Lucene syntax. These APIs suit scenarios such as user profiles, recommendation systems, AI, and spatiotemporal data analysis.
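To illustrate the general idea of a bitmap-based term index, the sketch below keeps one bitmap per term and answers a multi-term query with a bitwise AND. It is not CloudTable's implementation or its query API: the class BitmapTermIndex, its methods, and the sample terms are all hypothetical.

```python
from typing import Dict, List


class BitmapTermIndex:
    """Toy bitmap index: each term maps to an int whose set bits mark matching rows."""

    def __init__(self) -> None:
        self._row_keys: List[str] = []
        self._bitmaps: Dict[str, int] = {}

    def add(self, row_key: str, terms: List[str]) -> None:
        # The row's position in the list is its bit position in every bitmap.
        position = len(self._row_keys)
        self._row_keys.append(row_key)
        for term in terms:
            self._bitmaps[term] = self._bitmaps.get(term, 0) | (1 << position)

    def query_and(self, terms: List[str]) -> List[str]:
        """Return the row keys that carry every one of the given terms."""
        if not terms:
            return []
        result = self._bitmaps.get(terms[0], 0)
        for term in terms[1:]:
            result &= self._bitmaps.get(term, 0)
        return [key for i, key in enumerate(self._row_keys) if (result >> i) & 1]


index = BitmapTermIndex()
index.add("u1", ["city:paris", "tier:gold"])
index.add("u2", ["city:paris", "tier:silver"])
index.add("u3", ["city:rome", "tier:gold"])
matches = index.query_and(["city:paris", "tier:gold"])
```

Because each term is a bitmap, combining several conditions costs one bitwise operation per term, which is what makes this layout attractive for multidimensional filtering.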

CloudTable supports a term index ("term" is the word Apache Lucene uses for a tag, so a term index is a tag index). After you create a CloudTable cluster, you can develop a client application on an ECS to run multidimensional term queries.

Partition

Partitions divide a table's data into distinct logical segments based on defined criteria. Logically, a single table is split into multiple partitions, which simplifies data management.
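As a rough sketch of partitioning, the example below groups rows into monthly partitions by a date column. The criterion (one partition per month), the partition naming scheme, and the sample rows are assumptions chosen for illustration only.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List


def partition_name(day: date) -> str:
    """Map a date to a hypothetical monthly partition name such as 'p202403'."""
    return f"p{day.year:04d}{day.month:02d}"


rows = [
    {"order_id": 1, "day": date(2024, 3, 15)},
    {"order_id": 2, "day": date(2024, 3, 28)},
    {"order_id": 3, "day": date(2024, 4, 2)},
]

# Each row lands in exactly one logical segment of the table.
partitions: Dict[str, List[dict]] = defaultdict(list)
for row in rows:
    partitions[partition_name(row["day"])].append(row)
```

With this layout, a management task such as dropping one month of data only touches a single partition.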

Bucketing

Data is divided into different buckets based on the hash values of bucketing columns.
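A minimal sketch of hash bucketing follows. The use of CRC32 as the hash function and the bucket count of 8 are assumptions for illustration; the actual hash function used by the service is not specified here.

```python
import zlib


def bucket_for(key: str, num_buckets: int) -> int:
    """Assign a bucketing-column value to a bucket by hashing it (CRC32 assumed)."""
    if num_buckets <= 0:
        raise ValueError("num_buckets must be positive")
    # zlib.crc32 is deterministic across runs, unlike Python's built-in hash() for str.
    return zlib.crc32(key.encode("utf-8")) % num_buckets


bucket = bucket_for("user#1001", 8)
```

The same key always maps to the same bucket, so rows with equal bucketing-column values are stored together.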

FE (Frontend)

Frontend nodes handle user access requests, parse and plan queries, and manage metadata and nodes.

BE (Backend)

Backend nodes are responsible for both storing data and executing query plans. Data is divided into shards and replicated across multiple backend nodes for redundancy and availability.

Replicas

To ensure data security and maintain high service availability during exceptional circumstances, ClickHouse offers a replica configuration that replicates data from a single server to two or more redundant servers.

Shard

In ultra-large-scale data processing scenarios, the storage and computing resources of a single server can become a significant bottleneck. To improve service efficiency, the cloud database ClickHouse uses a distributed architecture in which massive datasets are stored across multiple servers. Each server stores and processes a subset of the overall data, and each such server is referred to as a shard.
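The relationship between shards and replicas can be sketched as follows. The cluster layout, the server names, the CRC32-based routing, and the helper functions are hypothetical and do not reflect how ClickHouse routes data internally.

```python
import zlib
from typing import List, Optional, Set

# Hypothetical layout: two shards, each replicated on two servers.
CLUSTER: List[List[str]] = [
    ["server-a1", "server-a2"],  # shard 0 and its replica
    ["server-b1", "server-b2"],  # shard 1 and its replica
]


def servers_for(key: str) -> List[str]:
    """Return the servers holding the shard that owns this key."""
    shard = zlib.crc32(key.encode("utf-8")) % len(CLUSTER)
    return CLUSTER[shard]


def pick_server(key: str, healthy: Set[str]) -> Optional[str]:
    """Pick the first healthy replica of the owning shard, or None if all are down."""
    for server in servers_for(key):
        if server in healthy:
            return server
    return None


replicas = servers_for("order#42")
chosen = pick_server("order#42", healthy=set(replicas))
```

Each key belongs to exactly one shard, and a request can still be served as long as at least one replica of that shard is healthy.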

Region and AZ

A region is a geographic area where CloudTable is located.

Availability zones (AZs) in the same region can communicate with each other over the intranet, while AZs in different regions cannot.

Cloud data centers are deployed around the world, so you can subscribe to CloudTable in different regions and design applications to better meet customer requirements or comply with local laws and other demands.

Each region contains multiple AZs whose power supplies and networks are physically isolated from one another. Each AZ has cost-effective, low-latency network connections to the other AZs in the same region and is not affected by faults in other AZs. Provisioning CloudTable in separate AZs therefore protects your applications against faults confined to a single location.