How Do I Read Uploaded Files for a Spark Jar Job?
You can use SparkFiles to read a file submitted with the --file option from a local path: SparkFiles.get("Name of the uploaded file").
Note
- The file path obtained in the Driver is different from the path obtained in an Executor, so a path obtained in the Driver cannot be passed to an Executor.
- Executor code must still call SparkFiles.get("filename") itself to obtain its local file path.
- The SparkFiles.get() method can be called only after Spark has been initialized.
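For comparison, in open-source Spark the same file distribution is done with the --files option of spark-submit; the sketch below is a hedged example (the jar name, class name, and file path are placeholders, not values from this document):

```shell
# Distribute /path/to/test to every node; executors can then
# resolve it locally with SparkFiles.get("test").
spark-submit \
  --class main.java.DliTest \
  --files /path/to/test \
  DliTest.jar
```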
The Scala code is as follows:
package main.java

import org.apache.spark.SparkFiles
import org.apache.spark.sql.SparkSession
import scala.io.Source

object DliTest {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("SparkTest").getOrCreate()

    // Driver: obtains the uploaded file.
    println(SparkFiles.get("test"))

    spark.sparkContext.parallelize(Array(1, 2, 3, 4))
      // Executor: obtains the uploaded file.
      .map(_ => println(SparkFiles.get("test")))
      .map(_ => println(Source.fromFile(SparkFiles.get("test")).mkString))
      .collect()
  }
}
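The executor side of the example simply reads the file at the local path with Source.fromFile. The standalone sketch below shows that read pattern without Spark (the object name, method name, and temporary file are assumptions for illustration only):

```scala
import scala.io.Source
import java.io.{File, PrintWriter}

object ReadDemo {
  // Write a small temporary file, then read it back the same way the
  // executor code reads SparkFiles.get(...): Source.fromFile + mkString.
  def roundTrip(content: String): String = {
    val f = File.createTempFile("uploaded", ".txt")
    f.deleteOnExit()
    val w = new PrintWriter(f)
    w.write(content)
    w.close()
    val src = Source.fromFile(f.getAbsolutePath)
    try src.mkString finally src.close()
  }

  def main(args: Array[String]): Unit =
    println(roundTrip("hello from the uploaded file"))
}
```

In the Spark job, the only difference is that the path comes from SparkFiles.get("filename") instead of a temporary file.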