
Using a UDF

This section describes how to develop, upload, and use UDFs.

Step 1: Creating a JAR Package

  1. Develop a UDF based on the UDF development specifications.
    Note

    UDF development rules:

    1. A UDF must implement at least one evaluate() method. The evaluate() method can be overloaded.
    2. UDF method invocation must be thread-safe.
    3. Do not load large external files (greater than 100 MB) into memory when implementing a UDF. Otherwise, memory may be exhausted.
    4. Catch and handle possible exceptions inside the UDF. Do not propagate exceptions to the service. Use a try-catch block to handle exceptions and record exception information if necessary.
    5. Do not define static collection classes to store temporary data, and do not query large objects from external data sources. Otherwise, memory usage will be high.
    6. Ensure that the packages imported in the class do not conflict with the packages on the server. You can run the grep -lr "Fully-qualified class name" command to check for JAR package conflicts. Use fully qualified class names to avoid such conflicts.

    Sample code:

    package org.apache.doris.udf;

    public class AddOne {
        // Returns value + 1, or null when the input is null.
        public Integer evaluate(Integer value) {
            return value == null ? null : value + 1;
        }
    }
  2. After the UDF is developed, use Maven to package the function into a JAR file.
  3. In the IDEA Terminal window, run the mvn clean package command to build the package.

    Figure 1 Function packaging


  4. After the build is complete, the message "BUILD SUCCESS" is displayed, the target directory is generated, and the generated JAR file is stored in that directory.
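
The development rules in Step 1 (an overloaded evaluate() method and exception handling inside the UDF) could be sketched roughly as follows. SafeAdd is a hypothetical class name used only for illustration. Whether each overload needs to be registered separately on the console is not stated here, so treat this as a sketch rather than a confirmed pattern.

```java
// Hypothetical UDF for illustration only; it is not part of the original sample.
// When packaging for upload, declare the class public in its own source file,
// as AddOne is in the sample code.
class SafeAdd {
    // Overload 1: add one to an integer; null input yields null.
    public Integer evaluate(Integer value) {
        return value == null ? null : value + 1;
    }

    // Overload 2: accept the number as text. Malformed input is handled
    // inside the UDF instead of letting the exception reach the caller.
    public Integer evaluate(String text) {
        if (text == null) {
            return null;
        }
        try {
            return evaluate(Integer.valueOf(text.trim()));
        } catch (NumberFormatException e) {
            // Record the failure here if needed, then return null.
            return null;
        }
    }
}
```

The class keeps no static or shared mutable state, so concurrent calls do not interfere with each other.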

Step 2: Uploading the UDF

  1. Log in to the CloudTable console.
  2. Click the region selector in the upper left corner and select a region.
  3. Select the target cluster and choose Cluster Name > UDF Management. The UDF Management page is displayed.
  4. Click New UDF. The page for adding a UDF is displayed.
  5. Set the parameters and click OK.
    Table 1 Parameter description

    | Parameter | Description |
    |---|---|
    | JAR Package File | Add the JAR package obtained in Step 1. |
    | Function Name | The function name can be customized, for example, addone1. NOTE: The function name must start with a letter. |
    | Parameters Configuration | Configure the parameter type based on the function. The parameter type can be String, Int, Boolean, TinyInt, SmallInt, BigInt, LargeInt, Float, Double, Date, Datetime, or Decimal. |
    | Return Value Type | Configure the return value type based on the function. The return value type can be String, Int, Boolean, TinyInt, SmallInt, BigInt, LargeInt, Float, Double, Date, Datetime, or Decimal. |
    | Symbol | UDF class name. |
    | Aggregate Function | Enable or disable aggregate function support. |
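
As a rough illustration of how the parameter and return types relate to a function signature, the sketch below shows a hypothetical two-argument UDF. The class name RepeatText and the assumption that String and Int correspond to Java String and Integer are illustrative and should be checked against the cluster's UDF type mapping.

```java
// Hypothetical UDF for illustration only.
// Assumed console settings for this signature:
//   Parameters Configuration: String, Int
//   Return Value Type:        String
// When packaging for upload, declare the class public in its own source file.
class RepeatText {
    // Repeats text the given number of times; returns null for null
    // arguments or a negative count.
    public String evaluate(String text, Integer times) {
        if (text == null || times == null || times < 0) {
            return null;
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < times; i++) {
            sb.append(text);
        }
        return sb.toString();
    }
}
```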

Step 3: Using the UDF

  1. Upload the UDF and connect to the cluster. For details about how to connect to a cluster, see Using the MySQL Client to Connect to a Common Doris Cluster.
  2. View the UDFs.
    show global functions;
  3. Call the function. For example, pass 5 as the input value.
    select addone1(5);
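
The same call can also be issued from application code. The following is a minimal sketch, assuming the cluster accepts standard JDBC connections over the MySQL protocol and that a MySQL JDBC driver is on the classpath. CallUdf, buildQuery, and call are hypothetical names, and the connection URL, user name, and password are left to the caller.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper for illustration only.
class CallUdf {
    // Builds "select <function>(<argument>)". The name check is an
    // illustrative guard so that only a plain identifier reaches the SQL text.
    static String buildQuery(String functionName, int argument) {
        if (!functionName.matches("[A-Za-z][A-Za-z0-9_]*")) {
            throw new IllegalArgumentException("Invalid function name: " + functionName);
        }
        return "select " + functionName + "(" + argument + ")";
    }

    // Runs the query on an open connection and returns the single result,
    // or null if the function returned NULL or no row came back.
    static Integer call(Connection conn, String functionName, int argument) throws SQLException {
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(buildQuery(functionName, argument))) {
            if (!rs.next()) {
                return null;
            }
            int value = rs.getInt(1);
            return rs.wasNull() ? null : value;
        }
    }
}
```

With a connection obtained from DriverManager.getConnection using your own cluster address and credentials, CallUdf.call(conn, "addone1", 5) sends the same statement as the example above.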

FAQs

  • When UDF JAR packages are uploaded, only one JAR package is retained for each file name. The number of JAR packages must be less than 20, and each package must be smaller than 10 MB.
  • The account name must be defined in the authentication template. Otherwise, the account will not have permission to upload the JAR package.
  • Why Do I Need to Upload a JAR Package to an OBS Bucket?

    This prevents the JAR package from being lost on the cluster agent during cluster scale-out or upgrade. If the JAR package is lost, you can reload it from OBS and synchronize it to the agent.

    If the JAR package is lost, click Modify in the Operation column on the page for adding a UDF and upload the JAR package again.