Definition
Setl is the entry point of the framework. It instantiates pipelines and manages repositories and connectors.
Instantiation
Instantiation with default configuration:
val setl = Setl.builder()
  .withDefaultConfigLoader() // load the default config file according to the JVM property app.environment
  .getOrCreate()
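The app.environment property mentioned in the comment above is an ordinary JVM system property, so it can be supplied with a -D flag at launch (for example -Dapp.environment=local) or set programmatically before the builder runs. A minimal sketch, assuming an environment named "local"; that name is a placeholder, and how Setl maps the value to a config file name is not shown here:

```scala
object EnvironmentPropertyDemo extends App {
  // "local" is a placeholder environment name, not a value required by Setl.
  System.setProperty("app.environment", "local")

  // Read the property back the same way any JVM library could.
  println(sys.props.get("app.environment")) // prints Some(local)
}
```

Setting the property in code only helps if it happens before Setl.builder() is called, which is why a launch-time -D flag is usually the safer choice.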
Or you can specify a configuration file:
val setl = Setl.builder()
  .withDefaultConfigLoader("conf_file_name.conf")
  .getOrCreate()
Or you can use your own SparkConf object:
val setl = Setl.builder()
  .setSparkConf(sparkConf)
  .withDefaultConfigLoader()
  .getOrCreate()
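The snippet above assumes that a sparkConf value already exists. One way it might be built with Spark's own SparkConf class is sketched below; the application name, master URL and shuffle setting are placeholder values chosen for illustration, not settings that Setl requires:

```scala
import org.apache.spark.SparkConf

object SparkConfSketch {
  // Placeholder values; replace them with the settings your application needs.
  val sparkConf: SparkConf = new SparkConf()
    .setAppName("my_setl_app")                // stored under spark.app.name
    .setMaster("local[*]")                    // stored under spark.master
    .set("spark.sql.shuffle.partitions", "8") // any other Spark option
}
```

The resulting SparkConfSketch.sparkConf can then be passed to setSparkConf as in the builder call above.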
Or you can specify the config path of Setl settings in your configuration file:
val setl = Setl.builder()
  .setSetlConfigPath("path_of_setl_configuration")
  .withDefaultConfigLoader()
  .getOrCreate()
Repository management
Setl helps you create and manage the SparkRepository instances of your application.
To register a SparkRepository:
setl
  .setSparkRepository[MyClass]("configPath")
  .setSparkRepository[MyClass]("anotherConfigPath", consumer = Seq(classOf[MyFactory1], classOf[MyFactory2]))
When Setl creates a new pipeline, all the registered SparkRepositories will be passed into the delivery pool.
To get a registered SparkRepository:
val myClassRepository: SparkRepository[MyClass] = setl.getSparkRepository[MyClass]("configPath")
You can register as many SparkRepositories as you need, even if they have the same type. Each registered SparkRepository is identified by its config path.
Warning: the setSparkRepository method will not update an existing repository that has the same config path. To update a registered repository, use the resetSparkRepository method.
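The difference between the two methods can be illustrated with a small, dependency-free model. RepositoryRegistry and its set, reset and get methods are hypothetical names invented for this sketch; they only mimic the behavior described above and are not Setl's implementation:

```scala
import scala.collection.mutable

// Hypothetical stand-in for a registry keyed by config path.
final class RepositoryRegistry[R] {
  private val byConfigPath = mutable.LinkedHashMap.empty[String, R]

  // Models the documented setSparkRepository behavior: an existing entry is kept.
  def set(configPath: String, repo: => R): this.type = {
    if (!byConfigPath.contains(configPath)) byConfigPath.update(configPath, repo)
    this
  }

  // Models the documented resetSparkRepository behavior: the entry is always replaced.
  def reset(configPath: String, repo: => R): this.type = {
    byConfigPath.update(configPath, repo)
    this
  }

  def get(configPath: String): Option[R] = byConfigPath.get(configPath)
}

object RegistryDemo extends App {
  val registry = new RepositoryRegistry[String]

  // The second call is ignored because "configPath" is already registered.
  registry.set("configPath", "first").set("configPath", "second")
  println(registry.get("configPath")) // prints Some(first)

  // reset replaces the entry stored under the same config path.
  registry.reset("configPath", "third")
  println(registry.get("configPath")) // prints Some(third)
}
```

Strings stand in for repositories here so that the example runs without Spark.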
To use a repository in a Factory:
class MyFactory extends Factory[Any] with HasSparkSession {
  import spark.implicits._

  @Delivery
  val repo = SparkRepository[MyClass] // this variable will be delivered at runtime

  @Delivery(autoLoad = true)
  val ds = spark.emptyDataset[MyClass] // the pipeline will invoke the "findAll" method of an available
                                       // SparkRepository[MyClass] and deliver the result to this variable
}
Connector management
Setl handles Connectors in much the same way as repositories. You can register as many connectors as you need, as long as each of them has a different configuration path and delivery ID.
To register a connector:
setl
  .setConnector("config_path_1")                                    // type: Connector, deliveryID: config_path_1
  .setConnector("config_path_2", "id1")                             // type: Connector, deliveryID: id1
  .setConnector("config_path_3", classOf[CSVConnector])             // type: CSVConnector, deliveryID: config_path_3
  .setConnector("config_path_4", "id2", classOf[DynamoDBConnector]) // type: DynamoDBConnector, deliveryID: id2
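The comments above suggest two defaults: the delivery ID falls back to the config path, and the delivered type falls back to the generic Connector. A minimal sketch of that rule follows. The Connector and CSVConnector types declared here are empty stand-ins, and Registration is a hypothetical helper; none of them are Setl's real classes:

```scala
// Empty stand-ins so the sketch compiles without Setl on the classpath.
trait Connector
class CSVConnector extends Connector

// Hypothetical record of what one registration resolves to.
final case class Registration(
    configPath: String,
    deliveryId: String,
    connectorType: Class[_ <: Connector]
)

object Registration {
  // The delivery ID defaults to the config path, the type to the generic Connector.
  def of(
      configPath: String,
      deliveryId: Option[String] = None,
      connectorType: Class[_ <: Connector] = classOf[Connector]
  ): Registration =
    Registration(configPath, deliveryId.getOrElse(configPath), connectorType)
}

object ConnectorDefaultsDemo extends App {
  val registrations = Seq(
    Registration.of("config_path_1"),
    Registration.of("config_path_2", Some("id1")),
    Registration.of("config_path_3", connectorType = classOf[CSVConnector])
  )

  // Show the delivery ID that each registration ends up with.
  registrations.foreach(r => println(s"${r.configPath} is delivered under ID ${r.deliveryId}"))
}
```

Because a connector is looked up by its delivery ID, two registrations that resolve to the same ID would be ambiguous, which is consistent with the uniqueness requirement stated above.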
To use connectors in a Factory:
class MyFactory extends Factory[Any] {
  @Delivery(id = "config_path_1")
  var connector1: Connector = _

  @Delivery(id = "id1")
  var connector2: Connector = _

  @Delivery(id = "config_path_3")
  var connector3: CSVConnector = _

  @Delivery(id = "id2")
  var connector4: DynamoDBConnector = _
}
Pipeline management
A new Pipeline can be instantiated by calling newPipeline(). By default, the newly created pipeline contains all the registered repositories and connectors.
val pipeline = setl.newPipeline()