• 5+ years of coding experience with distributed systems and exposure to Big Data technologies such as Hadoop MapReduce, Pig, Sqoop, YARN, Hive, HBase, and PySpark
• GCP-certified data engineering professional with hands-on experience in GCP Dataproc, with Spark and Hive as core skills
• Hands-on experience building data pipelines from Pub/Sub to GCS and from GCS to BigQuery
• Experience migrating large-scale data and workloads from on-prem to GCP:
o Code (HiveQL or Spark): on-prem to GCP (Dataproc, BigQuery)
o Data (Teradata, Oracle, NoSQL, data lake, data warehouse): on-prem to GCP (GCS, BigQuery)
• Strong experience in building and maintaining data warehouses with BigQuery
• Experience developing on GCP Dataflow / Cloud Composer
• Must have strong capability in Spark and Hive tuning and performance optimization
• Must have strong capability in debugging, diagnosing, and troubleshooting complex Hadoop jobs (Spark, Hive, Oozie workflows) or their equivalents in Dataproc
