In-depth understanding of distributed computing with Spark.
Deep understanding of Spark architecture and internals.
Proven experience in data ingestion, data integration, and data analytics with Spark, preferably PySpark.
Expertise in ETL processes, data warehousing, and data lakes.
Hands-on experience with Python for big data and analytics.
Hands-on experience with the Agile Scrum model is an added advantage.
Knowledge of CI/CD and orchestration tools is desirable.
Knowledge of AWS S3, Redshift, and Lambda is preferred.
Mandatory skillsets:
Python, Spark, PySpark, SQL, AWS (Glue, EMR)