Posts

SAURABH NOTES BLOG

POST 1] Unity Catalog Links and Quick Notes, by Nishtha Jain and Pratima Jain
To Do Now =>
    1: Make All Links Proper Here [  ]
    2: Start Learning and Making Notes [  ]

DATA ENGINEER BLOG

1] Post: Microsoft Fabric
Understand what Microsoft Fabric is in a simple way...
Written Notes :> Link2Pdf
Reference :> Youtube Video Link2video
________________________________________________________________________________
2] Post: Celebal Technology Link
Celebal Tracker:
    Tracker :> Link
    Sample :> Link
    Colab :> Link
    Curriculum :> Link

SAURABH SKILLS BLOG

 SQL SKILL : SQL

Time + Work (A2 -->) [10-7]

Most Important Topics:
    1. Azure Databricks: [Rajas Playlist | Data Master Playlist | Partner Academy Data Engineer Course]
    2. UC Migration: [Structured theory from learnings: Cloudpandith 4 videos + 1 Cloudpandith project video, then Naval Yemul]
    3. LeetCode: Solve 2 coding problems daily [Time 5 to 6:30]

THE DATA ENGINEER Topics ~sauru_6527

Topics:
Databricks
    Naval Yemul Playlist [  ]
    Rajas DE [  ]
    Partners Academy Course for Data Engineer: Link
Apache Spark
    Course By Darshil Parmar [  ]
Azure Cloud
    DataMaster: Naval Yemul [  ]
UC Migration (Catalog: Hive to Unity)
    1 End to End Project [  ]
Projects
    Spotify [  ]
    Uber [  ]
    UC Migration [  ]
Delta Live Table (DLT) Framework
    S/w Engineer Test [  ]
    Sample DLT with SQL Script Naval Yemul Playlist [ Done ]
Python
    Course by Darshil: Python for Data Engineer [  ]
    Rishabh Pandey Bootcamp [  ]
    Pyspark Functions [  ]
    Practice [  ]
SQL
    Practice [  ]
Presentation
    August Study Presentation in Celebal [Your Review of First Month] [  ]

[STRUCTURED] Data with Darshil [Apache Spark With Databricks For Data Engineering ]

[STRUCTURED Apache Spark] (Quick in 10 Min)
# Reference => Apache in 10 Min ~ DatawithDarshil
Index
    Intro
    Hadoop
    Apache Spark

[ STRUCTURED Databricks ]

>>>>> Databricks Cluster UI (~ Ref Youtube Video: ~Mr.Ktalks Tech)
# Cluster Diagram Explanation:
Cluster: Nodes and Executors
    Set of VMs
    1 Node = 1 Executor
    Each Executor: has 1 or more Cores
    1 Core = 1 Partition at a time
    Each Core: takes 1 task at a time and is used for parallelism
Driver:
    Step 1: Write code in the Driver
    Step 2: Everything gets divided into STAGES and TASKS
    This is done with the help of the DAG
DAG: divides every job into stages and tasks
    each TASK: executes on an executor
    each EXECUTOR: is essentially made up of cores
    each CORE: defines the degree of parallelism that happens when the JOB RUNS....
# Cluster: Set of Virtual Machines to do work
Creating compute resources for processing Big Data
Types:
    1] All-Purpose Compute
    2] Job Compute
1] All-Purpose Compute [Everything]
    Analyze data in NB
    Create, Terminate and Restart
    Cost: Expensive
2] Job Compute [Just to run NB as Job with ADF pipeline and Databricks]
    Only supports running a NB as a Job
    No restart
    Cost : L...
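# Quick sketch of the core/partition rule from the cluster notes above. This is only a rough back-of-the-envelope model, assuming 1 node = 1 executor and 1 core = 1 task (1 partition) at a time. The function names and the example numbers (4 nodes, 4 cores, 40 partitions) are made up for illustration and are not from the video.

```python
import math


def cluster_parallelism(num_nodes: int, cores_per_executor: int) -> int:
    """Total task slots, assuming 1 node = 1 executor (as in the notes)."""
    return num_nodes * cores_per_executor


def task_waves(num_partitions: int, total_cores: int) -> int:
    """Rounds of tasks needed when each core handles one partition at a time."""
    if total_cores <= 0:
        raise ValueError("total_cores must be positive")
    return math.ceil(num_partitions / total_cores)


# Hypothetical cluster: 4 worker nodes with 4 cores each, data in 40 partitions.
slots = cluster_parallelism(num_nodes=4, cores_per_executor=4)
waves = task_waves(num_partitions=40, total_cores=slots)
print(slots, waves)
```

With these made-up numbers the last wave only keeps part of the cluster busy, which is why the partition count is usually picked with the total core count in mind.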

[Unstructured to Structured] Learn, Relearn and Unlearn Technology

# Terms Used by Data Engineers
> Tables: Only Structured
> Volume: All Types [Structured + Unstructured]
> Catalog Binding: Restricts user permissions across different categories:
    1. production
    2. development
    3. testing
> SCIM Provisioning: No need to explicitly define the link/sync with Azure Active Directory. When a resource leaves the company, we don't need to explicitly remove their ID. This is called SCIM provisioning.
> UC:
    Storage Credentials
    External Locations
    Celebal Tech Utility (UNITY LAUNCHER)
    => UCX: similar to Unity Launcher (UR Utility)
> Sync and deep clone
> Migrating Notebooks from Hive to UC [Steps]:
    Two-level --> three-level namespace (UC only supports the three-level namespace)
    If we use a mount: we need to replace it with an external location
    If we use RDDs: replace them with DataFrames
> Group vs Service Principal
    Group: handled by users
    Service Principal: handled by a machine
> CI/CD and Branching Strategies in Databricks
> Q: Ho...
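# Rough sketch of two of the notebook migration steps noted above (two-level to three-level names, mount path to external location). It is plain Python string handling only, not a Databricks or UCX API. The helper names, the catalog name `dev_catalog` and the storage account `examplestorage` are all hypothetical.

```python
from typing import Dict


def to_three_level(table_name: str, catalog: str) -> str:
    """Turn a two-level `schema.table` name into `catalog.schema.table`."""
    parts = table_name.split(".")
    if len(parts) == 3:
        return table_name  # already three-level, leave it alone
    if len(parts) == 2:
        return f"{catalog}.{table_name}"
    raise ValueError(f"expected schema.table, got {table_name!r}")


def replace_mount(path: str, mount_map: Dict[str, str]) -> str:
    """Swap a /mnt/... prefix for the matching external location URL."""
    for mount_point, external_url in mount_map.items():
        prefix = mount_point.rstrip("/")
        if path == prefix or path.startswith(prefix + "/"):
            return external_url.rstrip("/") + path[len(prefix):]
    return path  # not under a known mount, keep as-is


# Hypothetical mapping from an old mount point to an external location.
mounts = {"/mnt/raw": "abfss://raw@examplestorage.dfs.core.windows.net"}

print(to_three_level("sales.orders", catalog="dev_catalog"))
print(replace_mount("/mnt/raw/sales/2024", mounts))
```

In a real migration the mapping would come from the storage credentials and external locations already set up in UC. This only shows the shape of the rewrite.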