In this article, we shall discuss how to use different Spark configurations while creating a PySpark session, and how to validate those configurations. Spark Session is the entry point to any Spark functionality.
Table of contents
1. Create Spark Session With Configuration
2. Configuring Spark Session in Pyspark
3. Validate Spark Session Configuration
4. Conclusion
1. Create Spark Session With Configuration
Spark Session provides a unified interface for interacting with different Spark APIs and allows applications to run on a Spark cluster. Spark Session was introduced in Spark 2.0 as a replacement for the earlier Spark Context and SQL Context APIs.
To create a Spark Session in PySpark, you can use the SparkSession builder. Here is an example of how to create a Spark Session in PySpark:
# Imports
from pyspark.sql import SparkSession
# Create a SparkSession object
spark = SparkSession.builder \
    .appName("MyApp") \
    .master("local[2]") \
    .config("spark.executor.memory", "2g") \
    .getOrCreate()
In this example, we set the Spark master URL to "local[2]" to run Spark locally with two cores, and we set the amount of executor memory for the Spark Session to "2g". You can customize these options as per your requirements.
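If you need to tune other behavior, you can chain additional config() calls on the builder. Below is a minimal sketch; "spark.sql.shuffle.partitions" is a standard Spark SQL property used here purely for illustration, not something required by the example above.
# Imports
from pyspark.sql import SparkSession
# A minimal sketch: chain extra config() calls on the builder.
# "spark.sql.shuffle.partitions" is a standard Spark SQL property,
# shown here only as an illustration.
spark = SparkSession.builder \
    .appName("MyApp") \
    .master("local[2]") \
    .config("spark.executor.memory", "2g") \
    .config("spark.sql.shuffle.partitions", "8") \
    .getOrCreate()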
2. Configuring Spark Session in Pyspark
To change the Spark Session configuration in PySpark, you can use the SparkConf() class to set the configuration properties and then pass this SparkConf object while creating the SparkSession object.
Here’s an example:
# Imports
from pyspark.sql import SparkSession
from pyspark.conf import SparkConf
# Create a SparkConf object
conf = SparkConf().setAppName("MyApp") \
    .setMaster("local[2]") \
    .set("spark.executor.memory", "2g")
# Create a SparkSession object
spark = SparkSession.builder.config(conf=conf).getOrCreate()
Now, you can use the SparkSession object to perform various Spark operations.
In this example, we are changing the Spark Session configuration in PySpark by setting three configuration properties on the SparkConf object.
The first property setAppName() sets the name of the application.
The second property setMaster() specifies the Spark cluster manager to connect to. Here, we are running in local mode with two cores.
The third property set("spark.executor.memory", "2g") sets the amount of memory to be used by each executor in the Spark cluster.
Finally, we pass the SparkConf object to the config() method of the SparkSession builder and create a SparkSession object. You can change the configuration properties as per your requirement. Just make sure to set them before creating the SparkSession object.
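Once the session is created from the SparkConf, you can use it like any other SparkSession. Here is a small illustrative check (the DataFrame contents are made up for this sketch):
# Use the session built from the SparkConf above (illustrative only)
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
df.show()
# The application name set through SparkConf is visible on the SparkContext
print(spark.sparkContext.appName)  # MyApp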
3. Validate Spark Session Configuration
To validate the Spark Session configuration in PySpark, you can use the getOrCreate() method of the SparkSession builder to get the current SparkSession, and then use the SparkContext's getConf() method to retrieve the configuration settings.
# Imports
from pyspark.sql import SparkSession
from pyspark.conf import SparkConf
# Create a SparkConf object
conf = SparkConf().setAppName("MyApp") \
    .setMaster("local[2]") \
    .set("spark.executor.memory", "2g")
# Create a SparkSession object
spark = SparkSession.builder.config(conf=conf).getOrCreate()
# Retrieve the SparkConf object from the SparkContext
conf = spark.sparkContext.getConf()
# Print the configuration settings
print("spark.app.name =", conf.get("spark.app.name"))
print("spark.master =", conf.get("spark.master"))
print("spark.executor.memory =", conf.get("spark.executor.memory"))
# Output
spark.app.name = MyApp
spark.master = local[2]
spark.executor.memory = 2g
In this example, we retrieve the SparkConf object from the SparkContext and print the values of three configuration properties: spark.app.name, spark.master, and spark.executor.memory. You can add or remove configuration properties to validate their values.
You can run this code after setting your Spark Session configuration properties to see the values of those properties. If the printed values match your configuration, it means that your configuration has been successfully applied to the Spark Session.
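As an additional check, the runtime configuration interface spark.conf exposes the same properties; a quick sketch:
# A quick sketch: read the same properties through spark.conf
print(spark.conf.get("spark.app.name"))         # MyApp
print(spark.conf.get("spark.executor.memory"))  # 2g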
You can also set most of these values while submitting the Spark application using spark-submit, as shown in the sketch below.
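For example, a roughly equivalent spark-submit command could look like the following sketch, where my_app.py is a hypothetical script name used only for illustration:
spark-submit \
  --name MyApp \
  --master "local[2]" \
  --conf spark.executor.memory=2g \
  my_app.py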
4. Conclusion
In conclusion, the Spark Session in PySpark can be configured using the config() method of the SparkSession builder. You can set various configuration properties, such as the application name, the Spark master URL, and the executor memory, to customize the behavior of your Spark application.