Posts

Quick 1] Material Master

Issue 1: Three fields of table AUSP are coming from the gold layer; instead, get them from the domain.
Solution =>
Option 1] Keep the columns sourced from silver [no code changes; document the fields in the FDD]
# Description: AUSP ATINN characteristic codes are not documented in the FDD/XLSX
# Current State: The notebook uses ATINN 0000000029/0000000034/0000000045 with KLART=001, but the FDD only references AUSP.ATWRT generically
# Suggested Fix: Create an ATINN-to-field mapping table in the FDD: 0000000029 → MPN, 0000000034 → Name, 0000000045 → Model
Option 2] Change the domain logic
# Code: Found in Master_product
# LOC: 44, 657

Issue 2: Spelling of the description field name
# Source mapping sheet: the DW field name is coming through incorrectly from the main source DW
=> SQL Table: tbl_Material_Valuation_Tableau
SQL Fields: Material Decription
# LOC: 77
# Code: -- Material description from MAKT.MAKTX (EN) via pr...
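The ATINN-to-field mapping from Option 1 can be sketched as a characteristic pivot. This is an illustration only, using sqlite3 with a simplified AUSP layout; the ATINN codes and the MPN/Name/Model mapping come from the notes above, while the sample material and values are invented.

```python
import sqlite3

# Sketch: pivot the three AUSP characteristic rows (KLART=001) into named
# columns per the FDD mapping 0000000029->MPN, 0000000034->Name, 0000000045->Model.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE AUSP (OBJEK TEXT, ATINN TEXT, KLART TEXT, ATWRT TEXT)")
con.executemany(
    "INSERT INTO AUSP VALUES (?, ?, ?, ?)",
    [
        ("MAT1", "0000000029", "001", "MPN-123"),  # hypothetical sample rows
        ("MAT1", "0000000034", "001", "Widget"),
        ("MAT1", "0000000045", "001", "X-200"),
    ],
)
row = con.execute(
    """
    SELECT OBJEK,
           MAX(CASE WHEN ATINN = '0000000029' THEN ATWRT END) AS MPN,
           MAX(CASE WHEN ATINN = '0000000034' THEN ATWRT END) AS Name,
           MAX(CASE WHEN ATINN = '0000000045' THEN ATWRT END) AS Model
    FROM AUSP
    WHERE KLART = '001'
    GROUP BY OBJEK
    """
).fetchone()
print(row)  # ('MAT1', 'MPN-123', 'Widget', 'X-200')
```

Documenting exactly this pivot in the FDD would close the gap flagged in the Current State note.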

Quick 6] Price Realization Tool - EVOQUA

Report: Price Realization Tool - EVOQUA
>> Field Mapping Validation: 19 fails
>> Join Validation: all pass
>> Filter Validation: all pass
>> Derived Field Validation: 5 fails
>> Data Quality Checks: all pass
>> Action items

Quick: 5] Profitability_Prod - Profitability_Actuals_Tableau

Report: Profitability_Actuals_Tableau
>> Field Mapping Validation: 4 field fails
>> Join Validation: no fails
>> Filter Validation: no fails
>> Derived Field Validation: no fails
>> Data Quality Checks: 2 fails
>> Action items

Quick :4] Pricing POC_Prod - Purchase info Record

Report name: Purchase Info Record

Field Mapping:
# Fix: Find in BW table
# LOC:
# Code:
'' AS `Last Price Update`, -- A017.DATAB
CAST(NULL AS STRING) AS `Last Close PO Delivery Date`, -- EKET.EINDT
CAST(NULL AS STRING) AS `Last Close PO Price` -- EKPO.NETPR
# Comment: Three fields =>
Last Price Update [ Z017 - DATAB ]
Last Close PO Delivery Date [ EKET - EINDT ]
Last Close PO Price [ EKPO - NETPR ]
________________________________________________________________________________
Rough =>
Planned Delivery Time (in Days): Fix: hardcoded precision may not match SAP; verify the SAP EINE-APLFZ field type
Net Price: Fix: hardcoded precision/scale; verify the SAP EINE-NETPR field type
Price Unit: Fix: hardcoded precision may not match SAP; verify the SAP EINE-PEINH field type
goods_invoice_receipt: Fix:
Last Price Update: Fix:
Last Close...
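One way the NULL placeholders could eventually be replaced is a "latest closed PO" lookup. This is a sketch only, with heavily simplified EKET/EKPO schemas and invented sample data; the real join keys (EBELN/EBELP, closure flags) and the definition of "last close" would have to come from the FDD.

```python
import sqlite3

# Sketch (assumed simplified schemas): derive Last Close PO Delivery Date and
# Last Close PO Price by picking the most recent PO line per material, rather
# than emitting CAST(NULL AS STRING) placeholders.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE EKET (EBELN TEXT, MATNR TEXT, EINDT TEXT)")
con.execute("CREATE TABLE EKPO (EBELN TEXT, MATNR TEXT, NETPR REAL)")
con.executemany("INSERT INTO EKET VALUES (?,?,?)",
                [("PO1", "M1", "2024-01-10"), ("PO2", "M1", "2024-03-05")])
con.executemany("INSERT INTO EKPO VALUES (?,?,?)",
                [("PO1", "M1", 9.50), ("PO2", "M1", 10.25)])
row = con.execute(
    """
    SELECT t.MATNR,
           t.EINDT AS last_close_po_delivery_date,   -- EKET.EINDT
           p.NETPR AS last_close_po_price            -- EKPO.NETPR
    FROM EKET t
    JOIN EKPO p ON p.EBELN = t.EBELN AND p.MATNR = t.MATNR
    WHERE t.EINDT = (SELECT MAX(e2.EINDT) FROM EKET e2 WHERE e2.MATNR = t.MATNR)
    """
).fetchone()
print(row)  # ('M1', '2024-03-05', 10.25)
```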

Quick: 3] Price list with release status

Report name: Price list with release status

Issue 1: Field Mapping validation
# Fields under the A006 table =>
1] PL_SALES_ORG_ID
2] PL_DISTRIBTN_CHANNEL_ID
3] PL_PRICE_LIST_ID
4] PL_CnTy
5] PL_MATERIAL
6] PL_VALID_START_DATE
7] PL_VALID_END_DATE
# Extra fields:
1] load_date
2] RELEASE_STATUS

Issue 2: Join validation
Join name: A006 Pricing Master
Fix: Implement an explicit JOIN to the A006 table on the appropriate keys
Join name: KONP Pricing Conditions
Fix: No fix required
Join name: TCURR Currency Conversion Validation
Review of your comment: NOT JOINED; the FDD documents detailed conversion logic.
In your code:
No TCURR table is used
No exchange-rate logic exists
The amount is taken directly from sales orders
Why this fails:
Code:
CAST(so.Rate AS decimal(17, 2)) AS `PL_List Price Amount`
so.rate_unit AS `PL_Unit_CURRENCY`
This means:
The amount is in the original transaction currency
There is no conversion to the reporting currency
If your dataset contains: Curren...
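The missing TCURR step could look roughly like the following. This is a sketch only, run through sqlite3: the FCURR/TCURR/UKURS column names follow SAP's TCURR layout, but the reporting currency ('USD'), the sample rate, and the omission of rate type (KURST) and validity-date handling are all assumptions.

```python
import sqlite3

# Sketch: the TCURR join the validation flags as missing -- convert the sales
# order amount from its transaction currency into a reporting currency.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE salesorders (Rate REAL, rate_unit TEXT)")
con.execute("CREATE TABLE TCURR (FCURR TEXT, TCURR TEXT, UKURS REAL)")
con.execute("INSERT INTO salesorders VALUES (100.0, 'EUR')")   # sample txn
con.execute("INSERT INTO TCURR VALUES ('EUR', 'USD', 1.10)")   # sample rate
row = con.execute(
    """
    SELECT ROUND(so.Rate * tc.UKURS, 2) AS PL_List_Price_Amount,
           tc.TCURR AS PL_Unit_CURRENCY           -- reporting currency
    FROM salesorders so
    JOIN TCURR tc ON tc.FCURR = so.rate_unit      -- from transaction currency
                 AND tc.TCURR = 'USD'             -- assumed reporting currency
    """
).fetchone()
print(row)
```

Without such a join, rows in different transaction currencies are summed as if they were comparable, which is exactly why the validation fails.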

QUICK: 2]Material Valuation

Report: Material Valuation (validation and fixes)

Issue 1: Material Decription [naming issue: check the main source] [Impact: LOW]

Issue 2: [Impact: MEDIUM] Joins =>
Join name: A] Product to Sales Org [one line needs to be added]
"One product_id has multiple sales orgs"
# Code: New updated code for Product to Sales Org:
SALES_ORG_DISTINCT AS (
    SELECT DISTINCT
        product_id,
        sales_org_code,  # CT_XYLEM_19_03
        Item_category_group_from_material_master,
        Delivering_Plant_Own_or_External
    FROM {gold_catalog_name}.master_data.product_sales_organization
# Comment: A new line needs to be added in the join.
B] Product_Plant to Inventory [Impact: LOW]
# LOC: 27
# Code: CT_XYLEM=> ...
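The reason for the SELECT DISTINCT CTE can be shown with a toy example. This is an illustration only (sqlite3, invented table contents): because one product_id maps to multiple sales-org rows, joining the raw table fans out the row count, while deduplicating on just the needed columns keeps the join 1:1.

```python
import sqlite3

# Illustration: join fan-out vs. the SALES_ORG_DISTINCT dedup from the notes.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE product (product_id TEXT)")
con.execute("CREATE TABLE product_sales_organization "
            "(product_id TEXT, sales_org_code TEXT, plant TEXT)")
con.execute("INSERT INTO product VALUES ('P1')")
con.executemany("INSERT INTO product_sales_organization VALUES (?,?,?)",
                [("P1", "S100", "A"), ("P1", "S100", "B")])  # same org, 2 plants

fanned = con.execute("""
    SELECT COUNT(*) FROM product p
    JOIN product_sales_organization so ON so.product_id = p.product_id
""").fetchone()[0]

deduped = con.execute("""
    WITH SALES_ORG_DISTINCT AS (
        SELECT DISTINCT product_id, sales_org_code
        FROM product_sales_organization
    )
    SELECT COUNT(*) FROM product p
    JOIN SALES_ORG_DISTINCT so ON so.product_id = p.product_id
""").fetchone()[0]
print(fanned, deduped)  # 2 1
```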

SAURABH NOTES BLOG

POST 1] Unity Catalog Links and Quick Notes
By Nishtha Jain and Pratima Jain
To Do Now =>
1: Fix all the links here [ ]
2: Start learning and making notes [ ]

DATA ENGINEER BLOG

1] Post: Microsoft Fabric
Understand what Microsoft Fabric is, in a simple way.
Written notes :>> Link2Pdf
Reference :>> YouTube video: Link2video
________________________________________________________________________________
2] Post: Celebal Technology Link
Celebal Tracker:
Tracker :> Link
Sample :> Link
Colab :> Link
Curriculum :> Link

SAURABH SKILLS BLOG

 SQL SKILL : SQL

Time + Work (A2 -->) [10-7]

Most Important Topics:
1. Azure Databricks: [Rajas Playlist | Data Master Playlist | Partner Academy Data Engineer Course]
2. UC Migration: [Structured theory from learnings: Cloudpandith 4 videos + 1 project video by Cloudpandith, then Naval Yemul]
3. LeetCode: Solve 2 coding problems daily [Time: 5 to 6:30]

THE DATA ENGINEER Topics ~sauru_6527

Topics:
Databricks
    Naval Yemul Playlist [ ]
    Rajas DE [ ]
    Partners Academy Course for Data Engineer: Link
    Apache Spark Course by Darshil Parmar [ ]
Azure Cloud
    DataMaster: Naval Yemul [ ]
    UC Migration (Catalog: Hive to Unity), 1 end-to-end project [ ]
Projects
    Spotify [ ]
    Uber [ ]
    UC Migration [ ]
    Delta Live Tables (DLT) Framework S/w Engineer Test [ ]
    Sample DLT with SQL Script: Naval Yemul Playlist [ Done ]
Python
    Course by Darshil: Python for Data Engineers [ ]
    Rishabh Pandey Bootcamp [ ]
    PySpark Functions [ ]
    Practice [ ]
SQL
    Practice [ ]
Presentation
    August Study Presentation in Celebal [your review of the first month] [ ]

[STRUCTURED] Data with Darshil [Apache Spark With Databricks For Data Engineering ]

[STRUCTURED Apache Spark] (Quick in 10 Min)
# Reference => Apache Spark in 10 Min ~ Data with Darshil
Index: Intro | Hadoop | Apache Spark

[ STRUCTURED Databricks ]

>>>>> Databricks Cluster UI (Ref: YouTube video ~ Mr. Ktalks Tech)
# Cluster Diagram Explanation:
Cluster: nodes and executors, a set of VMs
1 node = 1 executor
Each executor has 1 or more cores
1 core = 1 partition
Each core takes 1 task and is used for parallelism
Driver:
Step 1: Write code in the driver
Step 2: Everything gets divided into STAGES and TASKS; this is done with the help of the DAG
DAG: divides all jobs into stages and tasks
Each TASK executes on an executor
Each EXECUTOR is made up of its cores
Each CORE defines the degree of parallelism when the JOB runs
# Cluster: a set of virtual machines that do the work; creates compute resources for processing big data
Types:
1] All-Purpose Compute
2] Job Compute
1] All-Purpose Compute [everything]
Analyze data in a notebook
Create, terminate, and restart
Cost: expensive
2] Job Compute [just to run a NB as a job, e.g. with an ADF pipeline and Databricks]
Only supports running a notebook as a job
No restart
Cost: L...
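The "1 core = 1 task at a time" idea above can be illustrated without Spark. This is plain Python, not Databricks: a thread pool with 4 workers plays the role of an executor with 4 cores, and 8 partitions become 8 tasks of which at most 4 run concurrently. All names here are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor

def process_partition(partition_id):
    # stand-in for the work a Spark task does on one partition
    return partition_id * partition_id

# "Executor with 4 cores": at most 4 of the 8 "partition tasks" run at once.
with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(process_partition, range(8)))
print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```

The scheduling role the notes assign to the DAG (splitting a job into stages, then tasks per partition) is what Spark's driver does before handing tasks to executor cores like this.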

[Unstructured to Structured ]Learn Relearn and Unlearn Technology

# Terms Used by Data Engineers
> Tables: structured data only
> Volume: all types [structured + unstructured]
> Catalog Binding: restricts user permissions; different categories: 1. production 2. development 3. testing
> SCIM ("Scheme") Provisioning: no need to explicitly link/sync Azure Active Directory IDs; when a resource leaves the company, we don't need to explicitly remove their ID
> UC: Storage Credentials, External Locations
> Celebal Tech utility (UNITY LAUNCHER) => UCX is similar to Unity Launcher (our utility)
> Sync and deep clone
> Migrating notebooks from Hive [steps]: two-level --> three-level namespace (UC only supports the three-level namespace). If we use mounts, we need to replace them with external locations. If we use RDDs, replace them with DataFrames.
> Group vs Service Principal: a Group is handled by users; a Service Principal is handled by machines
> CI/CD and branching strategies in Databricks
> Q: Ho...
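The "two-level --> three-level namespace" migration step can be sketched as a reference rewrite. This is a toy illustration only: the `main` catalog name and the regex are assumptions, and a real migration tool (e.g. UCX) handles many more cases (views, string SQL in Python, temp views, mounts).

```python
import re

CATALOG = "main"  # hypothetical target UC catalog

def to_three_level(sql: str, catalog: str = CATALOG) -> str:
    # Rewrite bare two-level `schema.table` references after FROM/JOIN into
    # UC's three-level `catalog.schema.table`; skip already-three-level ones.
    pattern = r"\b(FROM|JOIN)\s+(?!\w+\.\w+\.\w+)(\w+)\.(\w+)\b"
    return re.sub(pattern, rf"\1 {catalog}.\2.\3", sql, flags=re.IGNORECASE)

print(to_three_level("SELECT * FROM sales.orders o JOIN main.dim.dates d ON 1=1"))
# SELECT * FROM main.sales.orders o JOIN main.dim.dates d ON 1=1
```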