Commit 1311ed8

Create Delimiter.py
Add Delimiter.py for PySpark and update contributors list. This pull request adds the Delimiter.py script, a PySpark example that splits a row containing mixed delimiters, and includes myself (divith raju) in the contributors list.
1 parent 8f46090 commit 1311ed8

1 file changed: +28 -0 lines changed

Delimiter.py

Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
+# Split a row that mixes several delimiters (comma, tab, pipe) in PySpark
+
+from pyspark.sql import SparkSession
+from pyspark.sql.functions import split
+
+# Initialize Spark Session
+spark = SparkSession.builder.appName("mixedDelimiterExample").getOrCreate()
+
+# Sample data
+data = ["1,Alice\t30|New York"]
+
+# Create a DataFrame with a single string column (named "value" by default)
+df = spark.createDataFrame(data, "string")
+
+# Split on any one of the delimiters; the pattern is a regex, so use a raw string
+split_col = split(df['value'], r'[,\t|]')
+
+# Create a new column for each split part
+df = df.withColumn('id', split_col.getItem(0))\
+    .withColumn('name', split_col.getItem(1))\
+    .withColumn('age', split_col.getItem(2))\
+    .withColumn('city', split_col.getItem(3))
+
+# Select and show the result
+df.select('id', 'name', 'age', 'city').show()
+
+# Stop Spark Session
+spark.stop()
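
The delimiter pattern in the script can be checked without a Spark cluster. The following is a minimal sketch in plain Python, assuming that the standard `re` module treats this particular character class the same way Spark's regex engine does (comma, tab, and pipe each match as a single delimiter). The names `DELIMITERS`, `FIELDS`, and `split_mixed` are hypothetical helpers introduced for illustration, and the field-count check is an addition that the committed script does not perform.

```python
import re

# Character class matching any one of the three delimiters: comma, tab, or pipe.
# Inside [...] the pipe is a literal character, so it needs no escaping.
DELIMITERS = re.compile(r"[,\t|]")

# Column names taken from the diff above.
FIELDS = ("id", "name", "age", "city")


def split_mixed(row: str) -> dict:
    """Split one mixed-delimiter row and label the parts by position."""
    parts = DELIMITERS.split(row)
    # Illustrative guard (not in the committed script): reject malformed rows.
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}: {row!r}")
    return dict(zip(FIELDS, parts))


record = split_mixed("1,Alice\t30|New York")
print(record)
# {'id': '1', 'name': 'Alice', 'age': '30', 'city': 'New York'}
```

Using a character class rather than an alternation such as `,|\t|\|` avoids having to escape the pipe, which is otherwise the regex "or" operator.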
