Years of experience required: 4
Skill set: IBM Spectrum LSF, Linux, basic networking
Strong written and oral communication skills
Strong knowledge of Linux distributions and Windows clients
Highly proficient in assessing, analyzing, and fine-tuning complex, interconnected environments.
Highly proficient in handling complex Linux (RHEL/CentOS/SUSE) and Unix (Solaris) administration tasks, including networking, network protocols, network services and other services (NIS, autofs, NFS), system resource usage analysis, and performance tuning
Fair knowledge of storage, firewalls, Windows OS, packaging, network licensing, and configuration management tools (DesignSync and ClearCase)
Excellent analytical and troubleshooting skills.
Python/Perl/Shell scripting
Deliverables:
The team must own and carry out the activities listed below and share the results with users, the business, and other teams within IT. The team must be able to handle these tasks without any dependency on others.
LSF job administration:
  - Job type analysis: analyze pending reasons, parallel jobs, and job states (including the reason for each job's current state); analyze and change resource usage settings; terminate and reschedule jobs; check host load; exclude jobs in the GUI; handle job chunking, inefficient jobs, multi-user jobs, etc.
Queue management, including responding to users about maximum capacity on specific queues
Monitoring LSF Livemon.
  - Based on observations and metrics, the team must take action, such as working with customers, educating them, and resolving any issues.
  - Identify inefficiencies and work with end users to take the necessary action
Monitoring LSF service availability and performing host checks
  - Make sure hosts and services respond; if they do not, work with the infrastructure team and other teams to diagnose and fix the problem, and make sure the service is restored to normal working condition.
Provide access to LSF queues and support the user community with requests
Open/close machines in the cluster when hardware issues occur, to avoid loss of productivity for the engineering community
Collaborate with the infrastructure team to address hardware issues promptly and maintain high farm capacity
Must troubleshoot all reported LSF issues and escalate to L3 if required
Troubleshoot LSF profile issues within SLA
Monitoring LSF server logs and storage paths to avoid service disruption
Monitor cluster inefficiencies daily and interact with users about them
The LSF responsibilities described above apply globally
Responsibilities
Work on escalations and tickets passed from the L1 team; provide 24/7 production-stop support for issues globally; and handle audits, off-hours support, and off-hours conference calls
Actively participate in calls, tech forums, and meetings (architecture, solutions, strategy, ITR, change requests), and work on and provide solutions.
Work with the PSA (Production Stop Analyst), keep them apprised of the situation, and help resolve it
Participate in and run projects, upgrades, patching, and maintenance; provide new solutions; recommend policies; and drive them to successful closure, meeting milestones on time.
Work with business units, customers, partners, the rest of the IT teams across the globe, and various non-IT support teams (e.g., the Asset team)
Must have integrity, be inquisitive and flexible, and have good decision-making skills
Understand the business, its priorities, its customers, and its job patterns.
Develop processes that leverage the Level 1 support team for support activities
Document procedures and conduct training/knowledge sharing sessions
Prepare and provide training for customers, help with first-level installation, and maintain the KB, project pages, and other documents
Attend off-hours conference calls as required by projects
On-call details: the team must provide 24×7 support based on a roster
  - The on-call phone/connection will be provided and managed by the Managed Services Partner
  - The support engineer is expected to monitor SMS/message alerts on their mobile phone and take the necessary action in case of issues
  - The Managed Services Partner's employee is expected to join production-stop bridges and provide resolutions
  - The on-call roster will be published by the Managed Services Partner
  - TI will configure production-stop call routing for each domain to the numbers specified by the Managed Services Partner
Receive and document incidents and service requests via web tickets, phone calls, or email
Monitor and route incoming emails and L1/L2/L3 support inquiries, convert them to tickets, and work toward resolution
Own and execute planned maintenance activities outside working hours and on weekends
Requirements
Experience
Infrastructure automation and scripting experience using Shell/Perl/Python
Demonstrable experience in an IT Service Delivery environment
Collaborate with other IT teams and business users
Identifying service improvement opportunities in key processes and procedures
Positive, energetic & helpful
Must be able to deliver on time under high pressure
Must have integrity, be inquisitive and flexible, and have good decision-making skills
Benefits
As per company norms