• Good understanding of RDBMS data concepts and unstructured data
• Very strong SQL/querying skills
• Understanding of Hadoop and its components (Impala, ZooKeeper…) for creating ETL pipelines and preparing plans for ETL architecture and procedures
• Hands-on experience with Spark/Scala programming
• Hands-on experience extracting and transforming data using ETL tools
• Experience handling large data sets and on-the-fly transformations
• Comfortable working with Python and familiar with different data structures
• Experience coordinating activities across a global team
• Continuous process improvement/quality experience is a plus
• Experience designing data schemas and applying normalization techniques to store data efficiently