Experience: 3 to 5 years
Description
Key Responsibilities:
∙ Manage and architect the Hadoop ecosystem in a GxP-regulated environment
∙ Build out and maintain GxP-compliant clusters in data centers around the world
∙ Tune the multi-tenant Hadoop ecosystem for operational efficiency, balancing various workloads
∙ Implement security, encryption, authentication, and authorization controls to adhere to corporate security policies and maintain GxP compliance
∙ Enable High Availability and resiliency in the cluster
∙ Work with data architects on logical data models and physical database designs optimized for performance, availability, and reliability
∙ Help tune and optimize backend and frontend data operations
∙ Serve as a query tuning and optimization technical expert, providing feedback to the team
∙ Script and automate support for development, QA, and production database environments, production deployments, and the management of services and infrastructure
∙ Mentor development team members
∙ Proactively help resolve difficult technical issues
∙ Provide technical knowledge to teams during project discovery and architecture phases
∙ Keep management informed of work activities and schedules
∙ Assess new initiatives to determine the work effort and estimate the necessary time-to-completion
∙ Document new development, procedures or test plans as needed
∙ Participate in data builds and deployment efforts. Help mature our Continuous Integration and Continuous Deployment methodologies
∙ Participate in projects through various phases
∙ Perform other related duties as assigned
∙ Partner with business units to develop effective solutions to business challenges
Technical Skills and Qualifications:
∙ Degree in Computer Science (or current Computer Science student)
∙ 3+ years of strong native SQL skills
∙ 3+ years of strong experience with database and data warehousing/data lake concepts and techniques, including relational and dimensional modeling, star/snowflake schema design, BI, data warehouse operating environments and related technologies, ETL, MDM, and data governance practices
∙ 3+ years’ experience with Hadoop, Hive, Impala, HBase, and related technologies
∙ 2+ years’ experience working in Linux
∙ 1+ years of strong experience with low-latency (near-real-time) systems, working with terabyte-scale data sets and loading and processing billions of records per day
∙ 1+ years’ experience with Chef/Puppet, Bash, Linux scripting, Ansible, and/or Terraform
∙ 1+ years’ experience with containerization (Mesosphere, Docker, Kubernetes)
∙ 1+ years’ experience with MPP (shared-nothing) database systems and NoSQL systems
∙ 1+ years’ experience with Spark, Scala, Python, Java, and/or R
∙ 1+ years’ experience with the Cloudera distribution is a plus
∙ Ability to work in a fast-paced, team-oriented environment
∙ Ability to complete the full software development lifecycle and deliver in an Agile/Scrum environment, leveraging Continuous Integration/Continuous Deployment
∙ Strong interpersonal skills, including a positive, solution-oriented attitude
∙ Must be passionate, flexible, and innovative in applying tools, experience, and other resources to deliver successfully against challenging, ever-changing business requirements
∙ Must be able to interface with various solution/business areas to understand requirements and support development
∙ Healthcare and/or reference data experience is a plus