
· 8 min read
livi12138

Overview

Our team needed to analyze data on the same page using both SQL and Python. During our investigation we found that Linkis meets this need. Since we use Huawei MRS, which differs from the open-source distributions, we also did some secondary development and adaptation. This article shares that experience, hoping to help anyone with similar needs.

Environment and versions

  • JDK-1.8.0_112, Maven-3.5.2
  • Hadoop-3.1.1, Spark-3.1.1, Hive-3.1.0, Zookeeper-3.5.9 (Huawei MRS versions)
  • Linkis-1.3.0
  • Scriptis-Web 1.1.0

Dependency adjustment and packaging

First download the 1.3.0 source code from the Linkis official website, then adjust the dependency versions.

Adjust the outermost Linkis pom file:

<hadoop.version>3.1.1</hadoop.version>
<zookeeper.version>3.5.9</zookeeper.version>
<curator.version>4.2.0</curator.version>
<guava.version>30.0-jre</guava.version>
<json4s.version>3.7.0-M5</json4s.version>
<scala.version>2.12.15</scala.version>
<scala.binary.version>2.12</scala.binary.version>

The pom file of linkis-engineplugin-hive:

<hive.version>3.1.2</hive.version>

The pom file of linkis-engineplugin-spark:

<spark.version>3.1.1</spark.version>

The pom file of linkis-hadoop-common:

<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-hdfs</artifactId> <!-- Just replace this line with <artifactId>hadoop-hdfs-client</artifactId> -->
<version>${hadoop.version}</version>
</dependency>

Modify hadoop-hdfs to:

<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-hdfs-client</artifactId>
<version>${hadoop.version}</version>
</dependency>

linkis-label-common

Modify the default versions in org.apache.linkis.manager.label.conf.LabelCommonConfig, which makes it easier to compile the scheduling components yourself later:

    public static final CommonVars<String> SPARK_ENGINE_VERSION =
        CommonVars.apply("wds.linkis.spark.engine.version", "3.1.1");

    public static final CommonVars<String> HIVE_ENGINE_VERSION =
        CommonVars.apply("wds.linkis.hive.engine.version", "3.1.2");

linkis-computation-governance-common

Modify the default versions in org.apache.linkis.governance.common.conf.GovernanceCommonConf, which makes it easier to compile the scheduling components yourself later:

  val SPARK_ENGINE_VERSION = CommonVars("wds.linkis.spark.engine.version", "3.1.1")

  val HIVE_ENGINE_VERSION = CommonVars("wds.linkis.hive.engine.version", "3.1.2")

Compilation

After the above adjustments, you can start a full compilation by executing the following commands in turn:

    cd linkis-x.x.x
    mvn -N install
    mvn clean install -DskipTests

Compilation errors

  • If an error occurs during compilation, try compiling the failing module on its own to see whether the error reproduces, and adjust according to the specific error.
  • Since much of Linkis is written in Scala, it is recommended to set up a Scala environment first to make reading the source code easier.
  • Jar package conflicts are the most common problem, especially after upgrading Hadoop; adjust the dependency versions patiently.

DataSphereStudio pom file

Because we upgraded the Scala version, an error is reported when deploying: the connection to BML exits with java.net.SocketException: Connection reset. Here you need to modify the Scala version and recompile.

  1. Delete the old lower-version dss-gateway-support jar package.
  2. Change the Scala version in DSS 1.1.0 to 2.12 and compile to get a new dss-gateway-support-1.1.0.jar, then use it to replace the original jar package under linkis_installhome/lib/linkis-spring-cloud-services/linkis-mg-gateway.
<!-- Keep the Scala environment consistent -->
<scala.version>2.12.15</scala.version>

Adjusting the dependency versions as above solves most of the problems. If problems remain, examine the corresponding logs carefully. Once a complete package compiles, the full Linkis build has succeeded and can be deployed.

Deployment

  • In order to give the engine nodes sufficient resources to execute scripts, we adopted a multi-server deployment. The general deployment structure is as follows:
  • SLB, 1 instance: load balancing, round-robin
  • ECS-WEB, 2 instances: Nginx, static resource deployment, backend proxy forwarding
  • ECS-APP, 2 instances: microservice governance, computation governance, public enhancement and other nodes
  • ECS-APP, 4 instances: EngineConnManager nodes

Linkis deployment

  • Although we deploy on multiple nodes, we did not split the code: the full package is placed on every server, and only the startup script is modified so that each node starts just the services it needs.

Refer to the single-machine deployment example on the official website: https://linkis.apache.org/zh-CN/docs/1.3.0/deployment/deploy-quick

Points to note for Linkis deployment

    1. Deployment user: the startup user of the Linkis core processes. This user is also given administrator permissions by default, and the corresponding administrator login password is generated during deployment and stored in conf/linkis-mg-gateway.properties. Linkis supports specifying the submitting and executing users of a task. The main Linkis service switches to the corresponding user via sudo -u ${linkis-user} and then executes the engine startup command, so the user of the linkis-engine process is the executor of the task. By default the deployment user is both the submitter and the executor of a task; if you want to change this to the login user, you need to modify the org.apache.linkis.entrance.restful.EntranceRestfulApi class, where the submit and execute user are set:

       json.put(TaskConstant.EXECUTE_USER, ModuleUserUtils.getOperationUser(req));
       json.put(TaskConstant.SUBMIT_USER, SecurityFilter.getLoginUsername(req));

       Change the submit user and execute user above to the user logged in on the Scriptis page.
    2. sudo -u ${linkis-user} switches to the corresponding user. If you use the login user, this command may fail, and the command here needs to be modified.
  • org.apache.linkis.ecm.server.operator.EngineConnYarnLogOperator.sudoCommands
private def sudoCommands(creator: String, command: String): Array[String] = {
Array(
"/bin/bash",
"-c",
"sudo su " + creator + " -c \"source ~/.bashrc 2>/dev/null; " + command + "\""
)
}

Change to:
private def sudoCommands(creator: String, command: String): Array[String] = {
Array(
"/bin/bash",
"-c",
"\"source ~/.bashrc 2>/dev/null; " + command + "\""
)
}
    3. The MySQL driver package must be copied to lib/linkis-commons/public-module/ and lib/linkis-spring-cloud-services/linkis-mg-gateway/.
    4. By default a static user and password are used. The static user is the deployment user; the static password is a string generated during deployment and stored in ${LINKIS_HOME}/conf/linkis-mg-gateway.properties.
    5. Database script execution: Linkis itself needs a database, but when we executed the data-insertion script of Linkis 1.3.0 we hit an error; at the time we simply deleted the part of the data that caused the error.
    6. Yarn authentication: when running a Spark task, the task is submitted to a queue, and the queue's resource information is fetched first to determine whether there are resources available for submission. This requires Yarn authentication; if file-based authentication is enabled, the file must be placed in the corresponding directory on the server, and the information updated in the linkis_cg_rm_external_resource_provider table.

Install the web front end

  • The web side uses Nginx as the static resource server. Download the front-end installation package, decompress it, and place it in the corresponding directory on the Nginx server.

Scriptis tool installation

  • Scriptis is a pure front-end project, integrated as a component into the DSS web code. We only need to compile the DSS web project for the separate Scriptis module and upload the compiled static resources to the server to access it. Note: the Linkis standalone deployment uses session-based verification by default, so you need to log in to the Linkis management console first and then log in to Scriptis to use it.

Nginx deployment example

nginx.conf

upstream linkisServer{
server ip:port;
server ip:port;
}
server {
listen 8088; # access port
server_name localhost;
#charset koi8-r;
#access_log /var/log/nginx/host.access.log main;
#Scriptis static resources
location /scriptis {
# Modify to your own front-end path
alias /home/nginx/scriptis-web; # static file directory
#root /home/hadoop/dss/web/dss/linkis;
index index.html index.htm;
}
#The default resource path points to the static resources of the management console front end
location / {
# Modify to your own front-end path
root /home/nginx/linkis-web/dist; # static file directory
#root /home/hadoop/dss/web/dss/linkis;
index index.html index.htm;
}

location /ws {
proxy_pass http://linkisServer/api; # backend Linkis address
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection upgrade;
}

location /api {
proxy_pass http://linkisServer/api; # backend Linkis address
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header x_real_ip $remote_addr;
proxy_set_header remote_addr $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_http_version 1.1;
proxy_connect_timeout 4s;
proxy_read_timeout 600s;
proxy_send_timeout 12s;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection upgrade;
}

#error_page 404 /404.html;
# redirect server error pages to the static page /50x.html
#
error_page 500 502 503 504 /50x.html;
location = /50x.html {
root /usr/share/nginx/html;
}
}

How to troubleshoot problems

    1. Linkis has more than 100 modules, which end up as 7 services: linkis-cg-engineconnmanager, linkis-cg-engineplugin, linkis-cg-entrance, linkis-cg-linkismanager, linkis-mg-gateway, linkis-mg-eureka and linkis-ps-publicservice, each with its own responsibilities. Among them, linkis-cg-engineconnmanager is responsible for managing and starting engine services: it generates the corresponding engine script to pull up the engine. Our team therefore deployed linkis-cg-engineconnmanager alone on servers with enough resources for execution.
    2. Engines such as JDBC and Spark need certain jar packages to run; in Linkis these are called materials. These jars are bundled into the linkis-cg-engineplugin service during packaging, where conf and lib directories appear. When that service starts, the two directories are compressed and uploaded to the material store as two zip files; we use OSS to store this material information, so during execution it is downloaded to the linkis-cg-engineconnmanager service. After configuring wds.linkis.engineconn.public.dir and wds.linkis.engineconn.root.dir, the packages are pulled into wds.linkis.engineconn.public.dir, which is a directory under wds.linkis.engineconn.root.dir.
    3. If you want to check an engine log, look under the directory configured by wds.linkis.engineconn.root.dir. The log path is also shown in the log view of the Scriptis page; just copy it from there to find the log.

· 2 min read

This article mainly introduces the integration of the OceanBase database in Linkis 1.3.2. The OceanBase database is compatible with most functions and syntax of MySQL 5.7/8.0, so it can be used as if it were MySQL.

1. Preparations

1.1 Environment installation

Install and deploy the OceanBase database, see Quick experience

1.2 Environment Verification

You can use the MySQL command to verify the installation of the OceanBase database.

mysql -h${ip} -P${port} -u${username} -p${password} -D${db_name}

The connection is successful as shown in the figure below:

2. Linkis submits OceanBase database tasks

2.1 Submit tasks through the shell

shell

 sh ./bin/linkis-cli -engineType jdbc-4 -codeType jdbc -code "show tables" -submitUser hadoop -proxyUser hadoop -runtimeMap wds.linkis.jdbc.connect.url=jdbc:mysql://${ip}:${port}/${db_name} -runtimeMap wds.linkis.jdbc.driver=com.mysql.jdbc.Driver -runtimeMap wds.linkis.jdbc.username=${username} -runtimeMap wds.linkis.jdbc.password=${password}

2.2 Submit tasks through Linkis SDK

Linkis provides SDK of Java and Scala to submit tasks to Linkis server. For details, please refer to JAVA SDK Manual. For OceanBase tasks, you only need to modify EngineConnType and CodeType parameters in Demo:

Map<String, Object> labels = new HashMap<String, Object>(); 
labels.put(LabelKeyConstant.ENGINE_TYPE_KEY, "jdbc-4"); // required engineType Label
labels.put(LabelKeyConstant.USER_CREATOR_TYPE_KEY, "hadoop-IDE");// required execute user and creator
labels.put(LabelKeyConstant.CODE_TYPE_KEY, "jdbc"); // required codeType
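
The JDBC connection parameters from the shell example above are passed the same way in the SDK, through the task's runtime map. A minimal sketch (the map below only illustrates the parameter keys; how it is attached to the submitted job depends on the Demo you start from):

// Illustrative only: fill in the real OceanBase connection information.
Map<String, Object> runtimeMap = new HashMap<String, Object>();
runtimeMap.put("wds.linkis.jdbc.connect.url", "jdbc:mysql://${ip}:${port}/${db_name}");
runtimeMap.put("wds.linkis.jdbc.driver", "com.mysql.jdbc.Driver");
runtimeMap.put("wds.linkis.jdbc.username", "${username}");
runtimeMap.put("wds.linkis.jdbc.password", "${password}");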

2.3 Multi-data source support

Address: Login Management Platform --> Data Source Management

Step 1: Create a new data source

Step 2: Connection test

Click Test Connect button to test

Step 3: Publish data source

Step 4: Submit the OceanBase task by specifying the data source name

Request URL: http://${gateway_url}:${port}/api/rest_j/v1/entrance/submit

Request method: POST

Request parameter:

{
"executionContent": {
"code": "show databases",
"runType": "jdbc"
},
"params": {
"variable": {},
"configuration": {
"startup": {},
"runtime": {
"wds.linkis.engine.runtime.datasource": "ob-test"
}
}
},
"labels": {
"engineType": "jdbc-4"
}
}

Response:

{
"method": "/api/entrance/submit",
"status": 0,
"message": "OK",
"data": {
"taskID": 93,
"execID": "exec_id018017linkis-cg-entrance000830fb1364:9104IDE_hadoop_jdbc_0"
}
}
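
For reference, here is a minimal Java sketch of sending this request. The gateway address is illustrative, and authentication is omitted: in a real call you first need to log in (or use token access) and attach the resulting cookie or token headers, otherwise the gateway rejects the request.

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

public class SubmitOceanBaseTask {
    public static void main(String[] args) throws Exception {
        String gatewayUrl = "http://127.0.0.1:9001"; // illustrative gateway address
        String body = "{"
            + "\"executionContent\": {\"code\": \"show databases\", \"runType\": \"jdbc\"},"
            + "\"params\": {\"variable\": {}, \"configuration\": {\"startup\": {},"
            + "\"runtime\": {\"wds.linkis.engine.runtime.datasource\": \"ob-test\"}}},"
            + "\"labels\": {\"engineType\": \"jdbc-4\"}"
            + "}";

        HttpURLConnection conn =
            (HttpURLConnection) new URL(gatewayUrl + "/api/rest_j/v1/entrance/submit").openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        // Authentication headers/cookies from a prior login would be added here.
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }
        try (Scanner scanner = new Scanner(conn.getInputStream(), StandardCharsets.UTF_8.name())) {
            // On success the response contains the taskID and execID shown above.
            System.out.println(scanner.useDelimiter("\\A").next());
        }
    }
}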

· 4 min read
aiceflower

Background

Engine material management is the Linkis subsystem for managing engine material files. It stores users' various engine files, including information such as the engine type and engine version. The overall process is that a compressed engine file is uploaded to the material library (BML) through the front-end browser, then decompressed and verified. If an engine does not exist locally when it needs to be executed, it is looked up in the material library, downloaded, installed and registered for execution.

It provides the following features:

1) Supports uploading packaged engine files. The upload size is limited by the Nginx configuration, and the file type must be zip. Zip files packaged manually in a Windows environment are not supported.

2) Supports updating existing engine materials. An update adds a new material version in BML, and the current version can be rolled back and deleted.

3) An engine involves two engine materials, namely lib and conf, which can be managed separately.

Architecture Diagram

Architecture Description

  1. Engine material management requires administrator privileges in the Linkis web management console, and the administrator field in the test environment needs to be set during development and debugging.

  2. Engine material management involves adding, updating, and deleting engine material files. Material files are divided into lib and conf and stored separately. Two versions are involved: the version of the engine itself and the material version. In an update operation, if the material is modified, a new material version is added and stored in BML; material versions support deletion and rollback.

  3. Use the BML Service to store the engine material files, call the BML service to store the files through RPC, and obtain the stored resource id and version and save them.

Core process

  1. Upload the engine plug-in file of zip type, first store it in the Home directory of the engine plug-in and decompress the file, and then start the refresh program.
  2. Compress the conf and lib directories in the decompressed engine file, upload it to the BML (material management system), obtain the corresponding BML resource id and resource version, and read the corresponding engine name and version information.
  3. In the engine material resource table, add a new engine material record, and each upload will generate lib and conf data respectively. In addition to recording the name and type information of the engine, the most important thing is to record the information of the engine in the material management system, including the resource id and version information of the engine, which are linked to the resource table in BML.

Database Design

Engine Material Resource Information Table (linkis_cg_engine_conn_plugin_bml_resources)

Field name | Function | Remarks
id | engine material package identification id | primary key
engine_conn_type | the engine type the resource belongs to | such as Spark
version | engine version | such as Spark's v2.4.3
file_name | engine file name | such as lib.zip
file_size | engine file size |
last_modified | last modification time of the file |
bml_resource_id | id of the resource recorded in BML (material management system) | the id used to identify the engine file in BML
bml_resource_version | resource version recorded in BML | such as v000001
create_time | resource creation time |
last_update_time | last update time of the resource |

· 6 min read
aiceflower

Foreword

With the development of the business and the update and iteration of community products, we found that Linkis 1.X has too many services; they can be merged appropriately to reduce the number of services and make deployment and debugging easier. At present, Linkis services are mainly divided into three categories: computation governance services (CG: entrance/ecp/ecm/linkismanager), public enhancement services (PS: publicservice/datasource/cs) and microservice governance services (MG: gateway/eureka). These three categories extend into too many sub-services, which can be merged: all PS services can be merged into one, the CG services can be merged, and the ecm service can be separated out.

Service merge changes

The main changes of this service merge are as follows:

  • Support Restful service forwarding: the modification point is mainly the Gateway forwarding logic, similar to the existing publicservice merge parameter wds.linkis.gateway.conf.publicservice.list
  • Support changing remote RPC service calls into local calls, similar to LocalMessageSender; the return of a local call can now be completed by changing the Sender
  • Configuration file changes
  • Service start and stop script changes

To be achieved

  • Basic goal: merge PS services into one service
  • Basic goal: merge CG services into CG-Service and ECM
  • Advanced goal: merge CG services into one service
  • Final goal: remove Eureka, merge the Gateway into the single service

Specific changes

Gateway changes (org.apache.linkis.gateway.ujes.route.HaContextGatewayRouter)

//before modification
def route(gatewayContext: GatewayContext): ServiceInstance = {

if (gatewayContext.getGatewayRoute.getRequestURI.contains(HaContextGatewayRouter.CONTEXT_SERVICE_STR) ||
gatewayContext.getGatewayRoute.getRequestURI.contains(HaContextGatewayRouter.OLD_CONTEXT_SERVICE_PREFIX)){
val params: util.HashMap[String, String] = gatewayContext.getGatewayRoute.getParams
if (!gatewayContext.getRequest.getQueryParams.isEmpty) {
for ((k, vArr) <- gatewayContext.getRequest.getQueryParams) {
if (vArr.nonEmpty) {
params.putIfAbsent(k, vArr.head)
}
}
}
if (gatewayContext.getRequest.getHeaders.containsKey(ContextHTTPConstant.CONTEXT_ID_STR)) {
params.putIfAbsent(ContextHTTPConstant.CONTEXT_ID_STR, gatewayContext.getRequest.getHeaders.get(ContextHTTPConstant.CONTEXT_ID_STR)(0))
}
if (null == params || params.isEmpty) {
dealContextCreate(gatewayContext)
} else {
var contextId : String = null
for ((key, value) <- params) {
if (key.equalsIgnoreCase(ContextHTTPConstant.CONTEXT_ID_STR)) {
contextId = value
}
}
if (StringUtils.isNotBlank(contextId)) {
dealContextAccess(contextId.toString, gatewayContext)
} else {
dealContextCreate(gatewayContext)
}
}
}else{
null
}
}
//after modification
override def route(gatewayContext: GatewayContext): ServiceInstance = {

if (
gatewayContext.getGatewayRoute.getRequestURI.contains(
RPCConfiguration.CONTEXT_SERVICE_REQUEST_PREFIX
)
) {
val params: util.HashMap[String, String] = gatewayContext.getGatewayRoute.getParams
if (!gatewayContext.getRequest.getQueryParams.isEmpty) {
for ((k, vArr) <- gatewayContext.getRequest.getQueryParams.asScala) {
if (vArr.nonEmpty) {
params.putIfAbsent(k, vArr.head)
}
}
}
if (gatewayContext.getRequest.getHeaders.containsKey(ContextHTTPConstant.CONTEXT_ID_STR)) {
params.putIfAbsent(
ContextHTTPConstant.CONTEXT_ID_STR,
gatewayContext.getRequest.getHeaders.get(ContextHTTPConstant.CONTEXT_ID_STR)(0)
)
}
if (null == params || params.isEmpty) {
dealContextCreate(gatewayContext)
} else {
var contextId: String = null
for ((key, value) <- params.asScala) {
if (key.equalsIgnoreCase(ContextHTTPConstant.CONTEXT_ID_STR)) {
contextId = value
}
}
if (StringUtils.isNotBlank(contextId)) {
dealContextAccess(contextId, gatewayContext)
} else {
dealContextCreate(gatewayContext)
}
}
} else {
null
}
}


// before modification
def dealContextCreate(gatewayContext:GatewayContext):ServiceInstance = {
val serviceId = findService(HaContextGatewayRouter.CONTEXT_SERVICE_STR, list => {
val services = list.filter(_.contains(HaContextGatewayRouter.CONTEXT_SERVICE_STR))
services.headOption
})
val serviceInstances = ServiceInstanceUtils.getRPCServerLoader.getServiceInstances(serviceId.orNull)
if (serviceInstances.size > 0) {
val index = new Random().nextInt(serviceInstances.size)
serviceInstances(index)
} else {
logger.error(s"No valid instance for service : " + serviceId.orNull)
null
}
}
//after modification
def dealContextCreate(gatewayContext: GatewayContext): ServiceInstance = {
val serviceId = findService(
RPCConfiguration.CONTEXT_SERVICE_NAME,
list => {
val services = list.filter(_.contains(RPCConfiguration.CONTEXT_SERVICE_NAME))
services.headOption
}
)
val serviceInstances =
ServiceInstanceUtils.getRPCServerLoader.getServiceInstances(serviceId.orNull)
if (serviceInstances.size > 0) {
val index = new Random().nextInt(serviceInstances.size)
serviceInstances(index)
} else {
logger.error(s"No valid instance for service : " + serviceId.orNull)
null
}
}

// before modification
def dealContextAccess(contextIdStr:String, gatewayContext: GatewayContext):ServiceInstance = {
val contextId : String = {
var tmpId : String = null
if (serializationHelper.accepts(contextIdStr)) {
val contextID : ContextID = serializationHelper.deserialize(contextIdStr).asInstanceOf[ContextID]
if (null != contextID) {
tmpId = contextID.getContextId
} else {
logger.error(s"Deserializate contextID null. contextIDStr : " + contextIdStr)
}
} else {
logger.error(s"ContxtIDStr cannot be deserialized. contextIDStr : " + contextIdStr)
}
if (null == tmpId) {
contextIdStr
} else {
tmpId
}
}
val instances = contextIDParser.parse(contextId)
var serviceId:Option[String] = None
serviceId = findService(HaContextGatewayRouter.CONTEXT_SERVICE_STR, list => {
val services = list.filter(_.contains(HaContextGatewayRouter.CONTEXT_SERVICE_STR))
services.headOption
})
val serviceInstances = ServiceInstanceUtils.getRPCServerLoader.getServiceInstances(serviceId.orNull)
if (instances.size() > 0) {
serviceId.map(ServiceInstance(_, instances.get(0))).orNull
} else if (serviceInstances.size > 0) {
serviceInstances(0)
} else {
logger.error(s"No valid instance for service : " + serviceId.orNull)
null
}
}

}
//after modification
def dealContextAccess(contextIdStr: String, gatewayContext: GatewayContext): ServiceInstance = {
val contextId: String = {
var tmpId: String = null
if (serializationHelper.accepts(contextIdStr)) {
val contextID: ContextID =
serializationHelper.deserialize(contextIdStr).asInstanceOf[ContextID]
if (null != contextID) {
tmpId = contextID.getContextId
} else {
logger.error(s"Deserializate contextID null. contextIDStr : " + contextIdStr)
}
} else {
logger.error(s"ContxtIDStr cannot be deserialized. contextIDStr : " + contextIdStr)
}
if (null == tmpId) {
contextIdStr
} else {
tmpId
}
}
val instances = contextIDParser.parse(contextId)
var serviceId: Option[String] = None
serviceId = findService(
RPCConfiguration.CONTEXT_SERVICE_NAME,
list => {
val services = list.filter(_.contains(RPCConfiguration.CONTEXT_SERVICE_NAME))
services.headOption
}
)
val serviceInstances =
ServiceInstanceUtils.getRPCServerLoader.getServiceInstances(serviceId.orNull)
if (instances.size() > 0) {
serviceId.map(ServiceInstance(_, instances.get(0))).orNull
} else if (serviceInstances.size > 0) {
serviceInstances(0)
} else {
logger.error(s"No valid instance for service : " + serviceId.orNull)
null
}
}

// before modification
object HaContextGatewayRouter{
val CONTEXT_ID_STR:String = "contextId"
val CONTEXT_SERVICE_STR:String = "ps-cs"
@Deprecated
val OLD_CONTEXT_SERVICE_PREFIX = "contextservice"
val CONTEXT_REGEX: Regex = (normalPath(API_URL_PREFIX) + "rest_[a-zA-Z][a-zA-Z_0-9]*/(v\\d+)/contextservice/" + ".+").r
}
//after modification
object HaContextGatewayRouter {

val CONTEXT_ID_STR: String = "contextId"

@deprecated("please use RPCConfiguration.CONTEXT_SERVICE_REQUEST_PREFIX")
val CONTEXT_SERVICE_REQUEST_PREFIX = RPCConfiguration.CONTEXT_SERVICE_REQUEST_PREFIX

@deprecated("please use RPCConfiguration.CONTEXT_SERVICE_NAME")
val CONTEXT_SERVICE_NAME: String =
if (
RPCConfiguration.ENABLE_PUBLIC_SERVICE.getValue && RPCConfiguration.PUBLIC_SERVICE_LIST
.exists(_.equalsIgnoreCase(RPCConfiguration.CONTEXT_SERVICE_REQUEST_PREFIX))
) {
RPCConfiguration.PUBLIC_SERVICE_APPLICATION_NAME.getValue
} else {
RPCConfiguration.CONTEXT_SERVICE_APPLICATION_NAME.getValue
}

val CONTEXT_REGEX: Regex =
(normalPath(API_URL_PREFIX) + "rest_[a-zA-Z][a-zA-Z_0-9]*/(v\\d+)/contextservice/" + ".+").r

}

RPC Service Change(org.apache.linkis.rpc.conf.RPCConfiguration)

//before modification
val BDP_RPC_BROADCAST_THREAD_SIZE: CommonVars[Integer] = CommonVars("wds.linkis.rpc.broadcast.thread.num", new Integer(25))
//after modification
val BDP_RPC_BROADCAST_THREAD_SIZE: CommonVars[Integer] = CommonVars("wds.linkis.rpc.broadcast.thread.num", 25)

//before modification
val PUBLIC_SERVICE_LIST: Array[String] = CommonVars("wds.linkis.gateway.conf.publicservice.list", "query,jobhistory,application,configuration,filesystem,udf,variable,microservice,errorcode,bml,datasource").getValue.split(",")
//after change
val PUBLIC_SERVICE_LIST: Array[String] = CommonVars("wds.linkis.gateway.conf.publicservice.list", "cs,contextservice,data-source-manager,metadataquery,metadatamanager,query,jobhistory,application,configuration,filesystem,udf,variable,microservice,errorcode,bml,datasource").getValue.split(",")

Configuration file changes

## Removed part

# Delete the following configuration files:
linkis-dist/package/conf/linkis-ps-cs.properties
linkis-dist/package/conf/linkis-ps-data-source-manager.properties
linkis-dist/package/conf/linkis-ps-metadataquery.properties

##modified part

#modify linkis-dist/package/conf/linkis-ps-publicservice.properties
#restful before modification
wds.linkis.server.restful.scan.packages=org.apache.linkis.jobhistory.restful,org.apache.linkis.variable.restful,org.apache.linkis.configuration.restful,org.apache.linkis.udf.api,org.apache.linkis.filesystem.restful,org.apache.linkis.filesystem.restful,org.apache.linkis.instance.label.restful,org.apache.linkis.metadata.restful.api,org.apache.linkis.cs.server.restful,org.apache.linkis.bml.restful,org.apache.linkis.errorcode.server.restful

#restful after modification
wds.linkis.server.restful.scan.packages=org.apache.linkis.cs.server.restful,org.apache.linkis.datasourcemanager.core.restful,org.apache.linkis.metadata.query.server.restful,org.apache.linkis.jobhistory.restful,org.apache.linkis.variable.restful,org.apache.linkis.configuration.restful,org.apache.linkis.udf.api,org.apache.linkis.filesystem.restful,org.apache.linkis.filesystem.restful,org.apache.linkis.instance.label.restful,org.apache.linkis.metadata.restful.api,org.apache.linkis.cs.server.restful,org.apache.linkis.bml.restful,org.apache.linkis.errorcode.server.restful

#mybatis before modification
wds.linkis.server.mybatis.mapperLocations=classpath:org/apache/linkis/jobhistory/dao/impl/*.xml,classpath:org/apache/linkis/variable/dao/impl/*.xml,classpath:org/apache/linkis/configuration/dao/impl/*.xml,classpath:org/apache/linkis/udf/dao/impl/*.xml,classpath:org/apache/linkis/instance/label/dao/impl/*.xml,classpath:org/apache/linkis/metadata/hive/dao/impl/*.xml,org/apache/linkis/metadata/dao/impl/*.xml,classpath:org/apache/linkis/bml/dao/impl/*.xml

wds.linkis.server.mybatis.typeAliasesPackage=org.apache.linkis.configuration.entity,org.apache.linkis.jobhistory.entity,org.apache.linkis.udf.entity,org.apache.linkis.variable.entity,org.apache.linkis.instance.label.entity,org.apache.linkis.manager.entity,org.apache.linkis.metadata.domain,org.apache.linkis.bml.entity

wds.linkis.server.mybatis.BasePackage=org.apache.linkis.jobhistory.dao,org.apache.linkis.variable.dao,org.apache.linkis.configuration.dao,org.apache.linkis.udf.dao,org.apache.linkis.instance.label.dao,org.apache.linkis.metadata.hive.dao,org.apache.linkis.metadata.dao,org.apache.linkis.bml.dao,org.apache.linkis.errorcode.server.dao,org.apache.linkis.publicservice.common.lock.dao

#mybatis after modification
wds.linkis.server.mybatis.mapperLocations=classpath*:org/apache/linkis/cs/persistence/dao/impl/*.xml,classpath:org/apache/linkis/datasourcemanager/core/dao/mapper/*.xml,classpath:org/apache/linkis/jobhistory/dao/impl/*.xml,classpath:org/apache/linkis/variable/dao/impl/*.xml,classpath:org/apache/linkis/configuration/dao/impl/*.xml,classpath:org/apache/linkis/udf/dao/impl/*.xml,classpath:org/apache/linkis/instance/label/dao/impl/*.xml,classpath:org/apache/linkis/metadata/hive/dao/impl/*.xml,org/apache/linkis/metadata/dao/impl/*.xml,classpath:org/apache/linkis/bml/dao/impl/*.xml

wds.linkis.server.mybatis.typeAliasesPackage=org.apache.linkis.cs.persistence.entity,org.apache.linkis.datasourcemanager.common.domain,org.apache.linkis.datasourcemanager.core.vo,org.apache.linkis.configuration.entity,org.apache.linkis.jobhistory.entity,org.apache.linkis.udf.entity,org.apache.linkis.variable.entity,org.apache.linkis.instance.label.entity,org.apache.linkis.manager.entity,org.apache.linkis.metadata.domain,org.apache.linkis.bml.entity

wds.linkis.server.mybatis.BasePackage=org.apache.linkis.cs.persistence.dao,org.apache.linkis.datasourcemanager.core.dao,org.apache.linkis.jobhistory.dao,org.apache.linkis.variable.dao,org.apache.linkis.configuration.dao,org.apache.linkis.udf.dao,org.apache.linkis.instance.label.dao,org.apache.linkis.metadata.hive.dao,org.apache.linkis.metadata.dao,org.apache.linkis.bml.dao,org.apache.linkis.errorcode.server.dao,org.apache.linkis.publicservice.common.lock.dao

Deployment script changes (linkis-dist/package/sbin/linkis-start-all.sh)

Remove the following part from the startup script:

#linkis-ps-cs
SERVER_NAME="ps-cs"
SERVER_IP=$CS_INSTALL_IP
startApp

if [ "$ENABLE_METADATA_QUERY" == "true" ]; then
#linkis-ps-data-source-manager
SERVER_NAME="ps-data-source-manager"
SERVER_IP=$DATASOURCE_MANAGER_INSTALL_IP
startApp

#linkis-ps-metadataquery
SERVER_NAME="ps-metadataquery"
SERVER_IP=$METADATA_QUERY_INSTALL_IP
startApp
fi

#linkis-ps-cs
SERVER_NAME="ps-cs"
SERVER_IP=$CS_INSTALL_IP
checkServer

if [ "$ENABLE_METADATA_QUERY" == "true" ]; then
#linkis-ps-data-source-manager
SERVER_NAME="ps-data-source-manager"
SERVER_IP=$DATASOURCE_MANAGER_INSTALL_IP
checkServer

#linkis-ps-metadataquery
SERVER_NAME="ps-metadataquery"
SERVER_IP=$METADATA_QUERY_INSTALL_IP
checkServer
fi


#Service stop script remove the following part
#linkis-ps-cs
SERVER_NAME="ps-cs"
SERVER_IP=$CS_INSTALL_IP
stopApp

if [ "$ENABLE_METADATA_QUERY" == "true" ]; then
#linkis-ps-data-source-manager
SERVER_NAME ="ps-data-source-manager"
SERVER_IP=$DATASOURCE_MANAGER_INSTALL_IP
stopApp

#linkis-ps-metadataquery
SERVER_NAME="ps-metadataquery"
SERVER_IP=$METADATA_QUERY_INSTALL_IP
stopApp
fi

For more details on service merge changes, see: https://github.com/apache/linkis/pull/2927/files

· 5 min read
kevinWdong

With the development of the business and the update and iteration of community products, we found that Linkis 1.X has greatly improved its performance in resource management and engine management, and can better meet the requirements of building a data mid-platform. Compared with version 0.9.3, which we used before, the user experience has also been greatly improved, and problems such as being unable to view details on the task failure page have been fixed. Therefore, we decided to upgrade Linkis and the WDS suite. The following are the specific practical operations, which we hope will give you a reference.

1.Environment

CDH6.3.2 Component versions

  • hadoop:3.0.0-cdh6.3.2
  • hive:2.1.1-cdh6.3.2
  • spark:2.4.8

hardware environment 

128G cloud physical machine*2

2. Linkis installation and deployment

2.1 Compile code or release installation package?

This installation deployment adopts the release installation package method. In order to adapt to the company's CDH6.3.2 version, the dependency packages of hadoop and hive need to be replaced with the CDH6.3.2 version. Here, the installation package is directly replaced. The dependent packages and modules to be replaced are shown in the following list.

// Modules involved 

linkis-engineconn-plugins/spark
linkis-engineconn-plugins/hive
/linkis-commons/public-module
/linkis-computation-governance/
// List of cdh packages that need to be replaced

./lib/linkis-engineconn-plugins/spark/dist/v2.4.8/lib/hive-shims-0.23-2.1.1-cdh6.3.2.jar
./lib/linkis-engineconn-plugins/spark/dist/v2.4.8/lib/hive-shims-scheduler-2.1.1-cdh6.3.2.jar
./lib/linkis-engineconn-plugins/spark/dist/v2.4.8/lib/hadoop-annotations-3.0.0-cdh6.3.2.jar
./lib/linkis-engineconn-plugins/spark/dist/v2.4.8/lib/hadoop-auth-3.0.0-cdh6.3.2.jar
./lib/linkis-engineconn-plugins/spark/dist/v2.4.8/lib/hadoop-common-3.0.0-cdh6.3.2.jar
./lib/linkis-engineconn-plugins/spark/dist/v2.4.8/lib/hadoop-hdfs-3.0.0-cdh6.3.2.jar
./lib/linkis-engineconn-plugins/spark/dist/v2.4.8/lib/hadoop-hdfs-client-3.0.0-cdh6.3.2.jar
./lib/linkis-engineconn-plugins/hive/dist/v2.1.1/lib/hadoop-client-3.0.0-cdh6.3.2.jar
./lib/linkis-engineconn-plugins/hive/dist/v2.1.1/lib/hadoop-mapreduce-client-common-3.0.0-cdh6.3.2.jar
./lib/linkis-engineconn-plugins/hive/dist/v2.1.1/lib/hadoop-mapreduce-client-jobclient-3.0.0-cdh6.3.2.jar
./lib/linkis-engineconn-plugins/hive/dist/v2.1.1/lib/hadoop-yarn-api-3.0.0-cdh6.3.2.jar
./lib/linkis-engineconn-plugins/hive/dist/v2.1.1/lib/hadoop-yarn-client-3.0.0-cdh6.3.2.jar
./lib/linkis-engineconn-plugins/hive/dist/v2.1.1/lib/hadoop-yarn-server-common-3.0.0-cdh6.3.2.jar
./lib/linkis-engineconn-plugins/hive/dist/v2.1.1/lib/hadoop-hdfs-client-3.0.0-cdh6.3.2.jar
./lib/linkis-engineconn-plugins/hive/dist/v2.1.1/lib/hadoop-mapreduce-client-core-3.0.0-cdh6.3.2.jar
./lib/linkis-engineconn-plugins/hive/dist/v2.1.1/lib/hadoop-mapreduce-client-shuffle-3.0.0-cdh6.3.2.jar
./lib/linkis-engineconn-plugins/hive/dist/v2.1.1/lib/hadoop-yarn-common-3.0.0-cdh6.3.2.jar
./lib/linkis-engineconn-plugins/flink/dist/v1.12.2/lib/hadoop-annotations-3.0.0-cdh6.3.2.jar
./lib/linkis-engineconn-plugins/flink/dist/v1.12.2/lib/hadoop-auth-3.0.0-cdh6.3.2.jar
./lib/linkis-engineconn-plugins/flink/dist/v1.12.2/lib/hadoop-mapreduce-client-core-3.0.0-cdh6.3.2.jar
./lib/linkis-engineconn-plugins/flink/dist/v1.12.2/lib/hadoop-yarn-api-3.0.0-cdh6.3.2.jar
./lib/linkis-engineconn-plugins/flink/dist/v1.12.2/lib/hadoop-yarn-client-3.0.0-cdh6.3.2.jar
./lib/linkis-engineconn-plugins/flink/dist/v1.12.2/lib/hadoop-yarn-common-3.0.0-cdh6.3.2.jar
./lib/linkis-commons/public-module/hadoop-annotations-3.0.0-cdh6.3.2.jar
./lib/linkis-commons/public-module/hadoop-auth-3.0.0-cdh6.3.2.jar
./lib/linkis-commons/public-module/hadoop-common-3.0.0-cdh6.3.2.jar
./lib/linkis-commons/public-module/hadoop-hdfs-client-3.0.0-cdh6.3.2.jar
./lib/linkis-computation-governance/linkis-cg-linkismanager/hadoop-annotations-3.0.0-cdh6.3.2.jar
./lib/linkis-computation-governance/linkis-cg-linkismanager/hadoop-auth-3.0.0-cdh6.3.2.jar
./lib/linkis-computation-governance/linkis-cg-linkismanager/hadoop-yarn-api-3.0.0-cdh6.3.2.jar
./lib/linkis-computation-governance/linkis-cg-linkismanager/hadoop-yarn-client-3.0.0-cdh6.3.2.jar
./lib/linkis-computation-governance/linkis-cg-linkismanager/hadoop-yarn-common-3.0.0-cdh6.3.2.jar

2.2 Problems encountered during deployment

2.2.1 Kerberos configuration

The following needs to be added to the shared linkis.properties configuration.

It also needs to be added to each engine's conf:

wds.linkis.keytab.enable=true
wds.linkis.keytab.file=/hadoop/bigdata/kerberos/keytab
wds.linkis.keytab.host.enabled=false
wds.linkis.keytab.host=your_host

2.2.2 Error is reported after Hadoop dependency package is replaced

java.lang.NoClassDefFoundError:org/apache/commons/configuration2/Configuration


Cause: Configuration class conflict. Add a commons-configuration2-2.1.1.jar under the linkis commons module to resolve the conflict

2.2.3 Running spark, python, etc. in script reports no plugin for XXX

Phenomenon: After modifying the version of Spark/Python in the configuration file, the startup engine reports no plugin for XXX


Reason: the engine versions are hard-coded in LabelCommonConfig.java and GovernanceCommonConf.scala. Modify the corresponding versions, recompile, and replace all jars containing these two classes (linkis-computation-governance-common-1.1.1.jar and linkis-label-common-1.1.1.jar) in Linkis and the other components (including Schedulis).

2.2.4 Python engine execution error, initialization failed

  • Modify python.py and remove the imported pandas module
  • Configure the Python load directory by modifying the Python engine's linkis-engineconn.properties:
pythonVersion=/usr/local/bin/python3.6

2.2.5 Failed to run the pyspark task and reported an error


Reason: PYSPARK_PYTHON is not set.

Solution:

Set two parameters in /etc/profile:

export PYSPARK_PYTHON=/usr/local/bin/python3.6
export PYSPARK_DRIVER_PYTHON=/usr/local/bin/python3.6

2.2.6 Error occurs when executing the pyspark task

java.lang.NoSuchFieldError: HIVE_STATS_JDBC_TIMEOUT


Reason: Spark 2.4.8 uses the Hive 1.2.1 packages, but our Hive has been upgraded to version 2.1.1 and this parameter was removed in Hive 2. The Spark SQL code still references the Hive parameter, so an error is reported.

Therefore, the HIVE_STATS_JDBC_TIMEOUT parameter is removed from the spark-sql/hive code, which is then recompiled and packaged to replace spark-hive_2.11-2.4.8.jar in Spark 2.4.8.

2.2.7 Proxy user exception during jdbc engine execution

Phenomenon: user A executes JDBC task 1 and the engine is reused. Then user B executes JDBC task 2, and the submitter of task 2 turns out to be user A.

Analysis reason:

ConnectionManager::getConnection


When creating a datasource, we decide whether to create it according to a key. The key is the JDBC URL, but this granularity may be too coarse, because different users may access the same datasource, such as Hive: their URLs are the same but their usernames and passwords differ. So when the first user creates the datasource, the username is already fixed; when the second user comes in and the datasource is found to exist, it is used directly instead of creating a new one. As a result, the code submitted by user B is executed as user A.

Solution: reduce the granularity of the datasource cache map key by changing it to jdbc.url + jdbc.user.
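
A minimal sketch of this idea (illustrative class and method names, not the actual Linkis ConnectionManager code; it caches plain connections rather than pooled datasources for brevity): the cache key now combines the JDBC URL with the username, so two users hitting the same URL no longer share one entry.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ConnectionCache {
    private final Map<String, Connection> cache = new ConcurrentHashMap<String, Connection>();

    // Key = jdbc.url + jdbc.user, so different users on the same URL get separate entries.
    private String cacheKey(String jdbcUrl, String jdbcUser) {
        return jdbcUrl + "#" + jdbcUser;
    }

    public synchronized Connection getOrCreate(String jdbcUrl, String jdbcUser, String jdbcPassword)
            throws SQLException {
        String key = cacheKey(jdbcUrl, jdbcUser);
        Connection cached = cache.get(key);
        if (cached == null || cached.isClosed()) {
            cached = DriverManager.getConnection(jdbcUrl, jdbcUser, jdbcPassword);
            cache.put(key, cached);
        }
        return cached;
    }
}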

3. DSS deployment

The installation process follows the official website documentation for installation and configuration. The following describes some issues encountered during installation and debugging.

3.1 The database list displayed on the left side of the DSS is incomplete

Analysis: the database information displayed in the DSS data source module comes from the Hive metastore database. However, because permissions in CDH6 are controlled through Sentry, most of the Hive table metadata is not present in the Hive metastore, so the displayed data is incomplete.

Solution:

The original logic was changed to connect to Hive over JDBC and obtain the table information to display from JDBC.

Simple logic description:

The JDBC properties are obtained from the IDE JDBC configuration set on the Linkis console.

DBS: get the schemas through connection.getMetaData()

TBS: connection.getMetaData().getTables() gets the tables under the corresponding db

COLUMNS: get the column information of a table by executing describe table
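
A rough, self-contained sketch of the logic above using plain JDBC (the connection URL, credentials and table name are illustrative; in DSS this is wired into the data source module rather than a main method):

import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJdbcMetadata {
    public static void main(String[] args) throws Exception {
        // Connection properties come from the IDE JDBC configuration on the Linkis console.
        String url = "jdbc:hive2://hiveserver2-host:10000/default"; // illustrative
        try (Connection conn = DriverManager.getConnection(url, "user", "password")) {
            DatabaseMetaData meta = conn.getMetaData();

            // DBS: list schemas via connection.getMetaData()
            try (ResultSet schemas = meta.getSchemas()) {
                while (schemas.next()) {
                    System.out.println("db: " + schemas.getString("TABLE_SCHEM"));
                }
            }

            // TBS: list tables under a db via getTables()
            try (ResultSet tables = meta.getTables(null, "default", "%", new String[] {"TABLE"})) {
                while (tables.next()) {
                    System.out.println("table: " + tables.getString("TABLE_NAME"));
                }
            }

            // COLUMNS: describe a table to get its columns
            try (Statement stmt = conn.createStatement();
                 ResultSet cols = stmt.executeQuery("describe default.some_table")) {
                while (cols.next()) {
                    System.out.println("column: " + cols.getString(1) + " " + cols.getString(2));
                }
            }
        }
    }
}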

3.2 Executing a jdbc script in a DSS workflow reports the error "jdbc name is empty"

Analysis: The default creator in the dss workflow is Schedulis. Because the related engine parameters of Schedulis are not configured in the management console, the parameters read are all empty.

Adding a Schedulis category in the console gives the error "The Schedulis directory already exists". Because the creator in the scheduling system is schedulis, the Schedulis category cannot be added. In order to better identify each system, the default creator in the DSS workflow was changed to nod_execution; this can be set by adding wds.linkis.flow.job.creator.v1=nod_execution in dss-flow-execution-server.properties.