[STRUCTURED] Data with Darshil [Apache Spark With Databricks For Data Engineering]
[STRUCTURED Apache Spark] (Quick in 10 Min)
# Reference=> Apache in 10 Min ~ DatawithDarshil
Index
- Intro
- Hadoop
- Apache Spark
1: Intro
- Big Data: Needs to be organized for business insights
- Old Tech : Hadoop
- New Tech : Apache Spark [Online Streaming], Databricks [cloud-based platform; supports Apache Spark and other big data frameworks], Delta Lake [ACID transactions], Lakehouse [Data Lake + Data Warehouse]
2: Hadoop
>> Components
- HDFS => "Storage"
- Map Reduce => "Process"
>> Limitations
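- Batches => "Wait..." [results arrive only after the whole batch job finishes]
- Disk => "Slow" [intermediate results are written to disk between steps]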
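To make the "Process" part concrete, here is a minimal sketch of the Map Reduce idea as a word count in plain Python. It is not Hadoop code: the function names (map_phase, shuffle_phase, reduce_phase) and the sample lines are made up for illustration, and everything runs on one machine in memory.

```python
from collections import defaultdict

def map_phase(line):
    # Map: emit one (word, 1) pair for every word in the line.
    return [(word.lower(), 1) for word in line.split()]

def shuffle_phase(pairs):
    # Shuffle: group all values that share the same key.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    # Reduce: collapse the values of one key into a single result.
    return key, sum(values)

# Hypothetical input; in Hadoop these lines would be blocks stored in HDFS.
lines = ["spark is fast", "hadoop is batch", "spark is in memory"]

mapped = [pair for line in lines for pair in map_phase(line)]
grouped = shuffle_phase(mapped)
counts = dict(reduce_phase(key, values) for key, values in grouped.items())
print(counts)
```

In real Hadoop the output of each phase is written to disk before the next phase reads it, which is where the "Slow" limitation above comes from.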
3: Apache Spark [Online Process]
>> Components
- Driver Process => "Manages Tasks" [HEART: Manages]
- Executor Process => "Actual Work" [Worker Node]
>> Features
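- In-memory processing => "RDD" (Resilient Distributed Dataset) [Backbone]
- Speed => about 10x faster than Hadoop Map Reduce [depends on workload]
- Multi-language support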
>> Architecture
- Driver process
- Cluster Manager (Manages both)
- Executor process
Cluster Manager: Manages and coordinates the execution of tasks across a cluster of computers
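The driver / executor split can be imitated on one machine. The sketch below is only an analogy, not Spark code: a thread pool stands in for the executors, and the names driver and executor_task are hypothetical. The "driver" cuts the data into partitions, hands one task per partition to the pool, and combines the partial results.

```python
from concurrent.futures import ThreadPoolExecutor

def executor_task(partition):
    # "Executor": does the actual work on its own slice of the data.
    return sum(x * x for x in partition)

def driver(data, num_executors=4):
    # "Driver": splits the job into tasks, one per partition.
    partitions = [data[i::num_executors] for i in range(num_executors)]
    # The pool plays the role of the executors the cluster manager hands out.
    with ThreadPoolExecutor(max_workers=num_executors) as pool:
        partial_sums = list(pool.map(executor_task, partitions))
    # The driver combines the partial results into the final answer.
    return sum(partial_sums)

total = driver(list(range(1000)))
print(total)
```

In a real cluster the executors are separate processes on worker nodes, and the cluster manager decides which machines they run on.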
>> How does Apache Spark execute code in parallel?
>Compulsory Step :
- Create a Spark session
> Spark session [Multi-language support]:
- Python, Java, Scala
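> Sample Code of Spark Session
# Code:
from pyspark.sql import SparkSession
# Create (or reuse) a Spark session, the entry point to Spark
spark = SparkSession.builder.getOrCreate()
# DataFrame with one column, "number", holding 0 to 999
myRange = spark.range(1000).toDF("number")
# Action: print the first 20 rows
myRange.show()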
>>> What is a DataFrame in Python and in Spark:
DataFrame => table of rows and columns, e.g. an MS Excel sheet
Python (pandas) : DataFrame sits on a single computer
Spark : DataFrame is split across multiple computers
>>> Terms
Lazy Evaluation : Spark records transformations as a DAG (plan) and runs nothing until an action is called
Transformation : e.g. filter by gender
Actions : e.g. .show()
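The three terms above fit together like this. The sketch is a toy model in plain Python, not the Spark API: LazyDataset and its methods are hypothetical names, and the "plan" is a simple list (a straight-line DAG). Transformations only add a step to the plan; the action collect() is what finally runs it.

```python
class LazyDataset:
    """Toy model: transformations are recorded, an action runs them."""

    def __init__(self, rows, plan=None):
        self._rows = rows
        self._plan = plan or []

    def filter(self, predicate):
        # Transformation: record the step, do no work yet.
        return LazyDataset(self._rows, self._plan + [("filter", predicate)])

    def map(self, func):
        # Transformation: record the step, do no work yet.
        return LazyDataset(self._rows, self._plan + [("map", func)])

    def collect(self):
        # Action: execute every recorded step in order.
        result = list(self._rows)
        for kind, fn in self._plan:
            if kind == "filter":
                result = [row for row in result if fn(row)]
            else:
                result = [fn(row) for row in result]
        return result

calls = []

def is_female(person):
    calls.append(person["name"])  # track when the filter really runs
    return person["gender"] == "F"

people = LazyDataset([
    {"name": "Asha", "gender": "F"},
    {"name": "Ravi", "gender": "M"},
])

women = people.filter(is_female)                  # transformation
calls_before_action = len(calls)                  # still 0: nothing has run
names = women.map(lambda p: p["name"]).collect()  # action triggers the plan
print(calls_before_action, names)
```

Because the whole plan is known before anything runs, an engine like Spark gets the chance to optimize it first; this toy version skips that part.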
>>> Mini project on Spark preprocessing
Steps
- Create Spark session
- Import Dataset
- Convert to table
- Write SQL Query on it
Addition : You can also apply NumPy and pandas functions to it (for example, after converting with .toPandas()).
# Code
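>>> Apache Spark Optimization Techniques
- Coalesce function : Less shuffling [only reduces the number of partitions]
- Repartitioning : More shuffling [full shuffle; can increase or decrease the number of partitions]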
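Why one shuffles less than the other can be shown with a toy model in plain Python. This is not how Spark implements either call: the functions below are hypothetical stand-ins, and each partition is just a list. The coalesce stand-in glues whole partitions together, while the repartition stand-in deals every row out again, so each input partition is split across several outputs.

```python
def coalesce(partitions, n):
    """Merge whole partitions into n groups; no partition is split."""
    merged = [[] for _ in range(n)]
    for index, part in enumerate(partitions):
        merged[index % n].extend(part)
    return merged

def repartition(partitions, n):
    """Deal every row round-robin into n new partitions (a full shuffle)."""
    new_parts = [[] for _ in range(n)]
    position = 0
    for part in partitions:
        for row in part:
            new_parts[position % n].append(row)
            position += 1
    return new_parts

def split_inputs(old_parts, new_parts):
    """Count input partitions whose rows landed in more than one output."""
    count = 0
    for part in old_parts:
        targets = {
            i for i, new in enumerate(new_parts) for row in part if row in new
        }
        if len(targets) > 1:
            count += 1
    return count

parts = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]

coalesced = coalesce(parts, 2)
repartitioned = repartition(parts, 2)

print(split_inputs(parts, coalesced))      # partitions broken up by coalesce
print(split_inputs(parts, repartitioned))  # partitions broken up by repartition
```

In PySpark the corresponding calls are df.coalesce(n) and df.repartition(n). The trade-off is that coalesce can leave partitions uneven in size, while repartition pays for the shuffle to balance them.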
____________________________________________________________________________
# Reference Darshil Notes : Obsidian Link
Main Topics in Apache Spark in this Course: [Total 38 Topics] [Doing...]
A] Apache Spark Guide [1-8]
1. What is Apache Spark?
2. Why do we need Spark? Big Data Problem
3. Spark Architecture
4. Spark DataFrame
5. Partitions
6. Transformations
7. Lazy Evaluation
8. Actions
B] End to End Example [9]
9. End-to-End Example
C] Structured API [10 - 16]
10. Structured API overview
11. Basic Structured operations
12. Working with different types of data
13. User Defined Functions
14. Joins
15. Data Sources
16. Spark
D] Lower Level API [17 - 19]
17. Resilient Distributed Dataset (RDD)
18. Advanced RDDs
19. Distributed Shared Variable
E] Production Application (Deployment and Debugging) [20 - 24]
20. How Spark Runs on a Cluster
21. The Lifecycle of a Spark Application
22. Spark Deployment
23. Monitoring and debugging
24. Debugging and common errors
F] Databricks [25 - 38]
25. Overview of Databricks and ecosystem
26. Databricks architecture
27. Databricks Lakehouse Architecture
28. Setting up Databricks environment
29. Understand Databricks workspace
30. Creating Cluster
31. Databricks Notebooks and File System (DBFS)
32. Delta Lake
33. Advanced Delta Lake (Time Travel, Optimize, Vacuum)
34. Database and Tables on Databricks
35. Views in Databricks
36. Delta Table (Working with Files)
37. Medallion Architecture Complete Guide
38. In-Depth Parquet Files
_________________________________________________________________________________
Now My Language [English + Marathi + Hindi Notes]