
Definition

Setl is the entry point of the framework. It handles pipeline instantiation and the management of repositories and connectors.

Instantiation

Instantiation with default configuration:

val setl = Setl.builder()
  .withDefaultConfigLoader()  // load the default config file according to the jvm property: app.environment
  .getOrCreate()
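
The environment is resolved through the app.environment JVM property. A minimal sketch of selecting it programmatically (assuming an environment named local, backed by a local.conf file in your resources directory):

System.setProperty("app.environment", "local")  // must be set before building Setl

val setl = Setl.builder()
  .withDefaultConfigLoader()  // now resolves local.conf
  .getOrCreate()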

Or you can specify a configuration file:

val setl = Setl.builder()
  .withDefaultConfigLoader("conf_file_name.conf") 
  .getOrCreate()

Or you can use your own SparkConf object:

val setl = Setl.builder()
  .setSparkConf(sparkConf)
  .withDefaultConfigLoader() 
  .getOrCreate()
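
For reference, such a SparkConf is built with the standard Spark API (the app name and master below are placeholder values):

import org.apache.spark.SparkConf

val sparkConf = new SparkConf()
  .setAppName("my_setl_app")  // placeholder application name
  .setMaster("local[*]")      // placeholder master: run locally with all cores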

Or you can specify the config path of Setl settings in your configuration file:

val setl = Setl.builder()
  .setSetlConfigPath("path_of_setl_configuration")
  .withDefaultConfigLoader() 
  .getOrCreate()
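
As an illustration, the matching entry in the configuration file could look like the following sketch (the path name and all values are assumptions; Spark properties go under its spark block):

path_of_setl_configuration {
  spark {
    spark.app.name = "my_setl_app"  // hypothetical values
    spark.master = "local[*]"
  }
}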

Repository management

Setl helps you create and manage the SparkRepositories of your application.

To register a SparkRepository:

setl
  .setSparkRepository[MyClass]("configPath")
  .setSparkRepository[MyClass]("anotherConfigPath", consumer = Seq[Class[MyFactory1], Class[MyFactory2])

When Setl creates a new pipeline, all the registered SparkRepositories will be passed into the delivery pool.
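
Each config path refers to an entry of the configuration file that describes the underlying storage. A hedged sketch of such an entry (the storage type, path, and options are illustrative):

configPath {
  storage = "CSV"  // illustrative storage type
  path = "src/main/resources/my_class_data"
  inferSchema = "true"
  header = "true"
  delimiter = ";"
  saveMode = "Overwrite"
}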

To get a SparkRepository:

val myClassRepository: SparkRepository[MyClass] = setl.getSparkRepository[MyClass]("configPath")
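
Once retrieved, the repository can read from and write to its underlying storage. A minimal usage sketch with its findAll and save methods:

import org.apache.spark.sql.Dataset

val data: Dataset[MyClass] = myClassRepository.findAll()  // load all records from the storage
myClassRepository.save(data)                              // write a dataset back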

You can register as many SparkRepositories as you need, even if some of them share the same type. Each registered SparkRepository is identified by its config path.

:warning: The setSparkRepository method will not update an existing repository that has the same config path. To update a registered repository, use the resetSparkRepository method.
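
For example, to replace the repository registered above under the same config path:

setl.resetSparkRepository[MyClass]("configPath")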

To use a repository in a Factory:

class MyFactory extends Factory[Any] with HasSparkSession {
  import spark.implicits._

  @Delivery
  val repo = SparkRepository[MyClass]  // this variable will be delivered at runtime

  @Delivery(autoLoad = true)
  val ds = spark.emptyDataset[MyClass]  // the pipeline will invoke the "findAll" method of an available
                                        // SparkRepository[MyClass] and deliver the result to this variable
}

Connector management

Setl handles Connectors in a similar way to repositories. You can register as many connectors as you need, as long as each of them has a different configuration path and delivery ID.

To register a connector:

setl
  .setConnector("config_path_1")  // type: Connector, deliveryID: config_path_1
  .setConnector("config_path_2", "id1")  // type: Connector, deliveryID: id1
  .setConnector("config_path_3", classOf[CSVConnector])  // type: CSVConnector, deliveryID: config_path_3
  .setConnector("config_path_4", "id2", classOf[DynamoDBConnector])  // type: DynamoDBConnector, deliveryID: id2

To use connectors in a Factory:

class MyFactory extends Factory[Any] {
  @Delivery(id = "config_path_1")
  var connector1: Connector = _ 

  @Delivery(id = "id1")
  var connector2: Connector = _ 

  @Delivery(id = "config_path_3")
  var connector3: CSVConnector = _ 

  @Delivery(id = "id2")
  var connector4: DynamoDBConnector = _ 
}
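
Inside the factory's methods, these connectors can then read and write DataFrames. A minimal sketch (assuming it runs in one of the factory's read/process/write methods):

import org.apache.spark.sql.DataFrame

val df: DataFrame = connector1.read()  // read the data configured at "config_path_1"
connector3.write(df)                   // write it back through the CSV connector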

Pipeline management

A new Pipeline can be instantiated by calling newPipeline(). By default, the newly created pipeline contains all the registered repositories and connectors.

val pipeline = setl.newPipeline()
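
From there, a typical next step is to add stages (factories) and run the pipeline. A minimal sketch, assuming the MyFactory class from the examples above:

pipeline
  .addStage[MyFactory]()  // register the factory as a stage of the pipeline
  .run()                  // execute the stages; registered repositories and connectors are delivered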