IND (New) Lead Support Engineer

Hyderabad, Telangana, India | Full-time | COVID-19 remote

Founded in 2002, Quantium combines the best of human and artificial intelligence to power possibilities for individuals, organisations and society. Our solutions make sense of what has happened and what will, could or should be done to re-shape industries and societies around the needs of the people they serve. 

As one of the world’s fully diversified data science and AI leaders, we operate across every sector of the economy, and we’re growing fast. With growth comes opportunity! We’re passionate about building out our team of smart, fun, diverse and motivated people. 

We combine a team of experts that spans data scientists, actuaries, statisticians, business analysts, strategy consultants, engineers, technologists, programmers, product developers, and futurists – all dedicated to harnessing the power of data to drive transformational outcomes for our clients. 

We actively foster a culture where our people can stretch themselves to reach their full potential. We also know that work has to work for you, and modern life is fast-paced and balance can be tricky. You want to work where you are respected and valued as an individual, not a number. Quantium embraces a flexible and supportive environment dedicated to powering possibilities for our team members, clients and partners. 

Role summary

As a Lead Engineer, you will be accountable for the product support of a set of Quantium products, covering the scope of services below:

Client Provisioning

  • Managing the process of client provisioning; working with platform and infrastructure teams to provision new client infrastructure and associated storage/service accounts
  • Setting up various client-specific environments to support our standard product release processes
  • Building data pipeline packages, including application and Spark specific configuration and data assets
  • This includes developing client-specific data transfer and ETL processes
  • Installation of any real-time systems using the components provided by the core product team

BAU Operations

  • Performing maintenance and upgrades to underlying operating systems (or docker images)
  • Execution and monitoring of client data pipelines on their scheduled cadence (typically monthly, but can be daily if required) to ensure the delivery of data assets
  • Monitoring of any real-time services to ensure performance and availability continue to meet the agreed SLAs
  • Accurately assessing the impact of issues against severity definitions in order to establish the urgency of a response
  • Investigating issues and escalating to the appropriate staff engineering teams in cases where they can't be resolved directly
  • Escalation may be done directly by contacting staff engineering teams or logging JIRA tickets including adequate information about issues (depending on severity)
  • Responding to triaged support tickets from customers
  • In some cases, proactively contacting customers to communicate ongoing outages and provide estimates of time to resolution
  • Maintenance of internal and external status pages detailing incidents in production environments
  • Logging of detailed shift summaries and incident reports in OpsGenie (or PagerDuty, etc)

Continuous Delivery and Improvement

  • Managing upgrades to the client-specific packages to bring in newly developed core features and continue to improve the performance and reliability of the system, which involves:
  • Liaising with the core product team to understand any upgrade procedures required
  • Working with the operations analysts to sign off on the impact of these changes from both technical and analytics perspectives
  • Performing deployments via CI/CD systems to bring production, staging, and UAT systems in line with stable software releases
  • Analysing systems and adding metrics in order to improve system instrumentation and operational observability
  • Building dashboards, monitors, and test automation to ensure teams have visibility of operational state of systems in (near) real-time
  • Designing and building tools to automate repetitive support tasks

Key responsibilities

  • Provide expert support to users, by acting as a key point of contact for day-to-day issues and escalations
  • Ensure key processes and procedures are adhered to within the team, providing seamless support to users who interact with the product support team and improving operational effectiveness
  • Provide constructive feedback to the Production Support Engineers and Senior Production Support Engineers within the team, and empower them to consistently deliver high-quality output
  • Consistently coach and guide team members to enhance their efficiency, and assist them in escalating cases to other support teams in a timely manner as required
  • Drive and facilitate discussions with stakeholders, development teams, and third-party teams while handling high-priority cases
  • Manage stakeholders by guiding the team and, at times, providing timely communication directly to stakeholders and relevant parties during Sev 1/Sev 2 outage and non-outage scenarios
  • Help avoid escalations where possible and, when an escalation does occur, help manage it by ensuring quick issue resolution with user satisfaction in mind
  • Identify and capture feedback on application functionality, and route it to stakeholders and the development team as required
  • Act as a people manager/counsellor for team members seeking guidance on areas such as career growth, leave management, shift alignment, and logistics-related requirements
  • Identify knowledge gaps within the team and work to implement training to address gaps and support development of individuals
  • Support case inflow handling by allocating work or empowering senior members within the team, and provide daily reporting on it
  • Develop content for Knowledge Base articles and RCAs within the team, and ensure the team regularly creates and uses these articles as references for quicker resolution
  • Manage license assignments and ensure efficient use of the available licenses for users at all times
  • Manage all compliance and legal requirements/usage as required
  • Bring in new ideas and process metrics to improve efficiency within the team

Experience and education required

  • B.E./M.E. in Computer Science, Information Technology, Electronics and Communications, or equivalent, with 8+ years of industry experience
  • Has successfully led the support of one or more software systems
  • Has managed and led a team and has directly managed clients and stakeholders
  • Proficiency in (at least) one scripting language, such as Ruby or Python
  • Proficiency in shell scripting and using GNU tools under Linux
  • Exposure to cloud computing technologies, especially networking and system architecture, would be beneficial
  • Exposure to Infrastructure/Configuration as Code and CI/CD technologies would also be beneficial
  • Kubernetes: exposure to the fundamentals of k8s is needed; ideally, candidates would be capable of administering a cluster if required
  • Exposure to Docker, Kubernetes, Ansible, and Terraform is good to have
  • Data Engineering: a working understanding of tools such as Spark is needed to debug issues and run pipelines
  • Problem-solving skills: excel at resolving problems encountered by users
  • Have a deep understanding of the product they handle as well as the processes behind it
  • Attention to detail is a very important trait for successful support resources
  • Excellent communication skills, with the ability to liaise with both customers and internal stakeholders to explain issues and provide updates
  • Ability to understand and articulate technical concepts in clear terms so that documentation around incidents is concise and unambiguous
  • Be strong generalists with a demonstrated track record when it comes to solving complex problems
  • Be curious about why things aren't working and willing to take the time to think critically during the investigation of issues
  • Be capable of building tooling to automate repeatable tasks or contribute back to tool codebases when gaps are identified

What does success look like?

  • Tickets delivered within SLAs, minimal reactivations and high user satisfaction
  • Has become established as an SME on product support and BAU processes, with infrastructure insight and technical knowledge
  • Automated processes established wherever feasible to make the support process highly efficient
  • Availability SLAs for services, BAU processes and systems are met by establishing proper monitoring of systems and services for both availability and performance
  • Well-defined dashboards and reports that clearly publish the support KPIs and performance metrics
  • Strong process documentation and knowledge base created
  • Objectives and development plans in place for all team members
  • Employee engagement high within the team