Data Centre Operations Lead - Varite : Job Details

Data Centre Operations Lead

Varite

Job Location : Jersey City, NJ, USA

Posted on : 2024-12-10T07:33:20Z

Job Description :
Job Title: Data Centre Operations Lead
Location: Rockville, MD - Onsite from day 1 - Contract to hire
Duration: 6 months - Contract to hire
Pay Range: $55 - $60

Job Description:
• Lead the data center operations team, providing guidance, training, and support to ensure high performance and operational excellence. Act as the primary point of contact for all data center-related issues and escalations.
• Oversee the daily operations of data center facilities, ensuring high availability and reliability of all systems.
• Manage the data center infrastructure technology stack end to end - VMware/VxRail/Citrix/LogicMonitor/Moogsoft/AD/Azure AD SSO, Azure Security Policy/PKI/Windows & Linux Servers/Vulnerability management/BeyondTrust Password Safe and AD Bridge/Storage & Backup tools, etc.
• Ensure adherence to operational standards and best practices.
• Drive major incidents and potential incidents end to end, with periodic updates to client stakeholders for approvals/recommendations.
• Lead, mentor, and manage a team of data center operation engineers.
• Provide guidance and support for professional development and performance improvement.
• Coordinate and manage the team's daily activities, ensuring alignment with organizational goals and priorities.
• Lead the response to data center incidents, ensuring timely resolution and minimal impact on business operations.
• Perform root cause analysis and implement preventive measures to avoid recurrence of issues.
• Develop and maintain incident management processes and procedures.
• Plan and oversee scheduled maintenance and upgrades of data center infrastructure.
• Ensure that all hardware and software components are up-to-date and functioning optimally.
• Coordinate with vendors and service providers for maintenance and support activities.
• Monitor and analyze data center resource usage, ensuring efficient utilization and avoiding over-provisioning.
• Conduct capacity planning to support future growth and demand.
• Implement optimization strategies to enhance performance and reduce operational costs.
• Ensure data center infrastructure adheres to security policies, standards, and best practices.
• Implement and maintain security controls to protect data and systems.
• Ensure compliance with regulatory requirements and industry standards (e.g., ISO 27001, HIPAA).
• Develop and implement disaster recovery and business continuity plans for data center operations.
• Ensure regular testing and validation of disaster recovery procedures.
• Ensure data center infrastructure is resilient and can recover quickly from failures or disruptions.
• Work closely with other IT teams, business units, and stakeholders to understand requirements and deliver solutions that meet their needs.
• Collaborate with vendors and service providers to evaluate and integrate new technologies and services.
• Communicate effectively with stakeholders, providing regular updates on data center operations and performance.
• Maintain comprehensive documentation of data center infrastructure, configurations, processes, and procedures.
• Generate regular reports on data center performance, incidents, and operational metrics.
• Ensure documentation is up-to-date and accessible to relevant stakeholders.

Here are some technical responsibilities in detail.

Active Directory and Cloud Services
• Administer Azure AD, manage security groups, GPO, SSO, and application configurations.
• Handle public cloud directory services, Oracle IDCS, network/file shares, SCP policies, privileged user management, and service account passwords.
• Conduct AD audits, schema updates, backup/restore services, and assist with JSOX, FDA, and GQS audits.
• Manage ticket queues and follow up on aging tickets.
• End-to-end support for Active Directory Domains (Azure AD, AD security groups, GPO, SSO, application configurations, etc.).

IT Environment Monitoring
• 24x7 ITSM queue-based monitoring.
• Triage and first-level troubleshooting based on alert severity.
• Incident resolution using Standard Operating Procedures.

Vendor Coordination
• Coordinate with vendors for infrastructure on public/private Cloud.
• Provide vendor contact details and escalation matrix.

Citrix Architecture and Optimization
• Maintain Citrix architecture and seek continuous optimization.
• Participate in architecture design and planning with the steering committee.
• Recommend system and end-user performance improvements.
• Implement approved performance improvements.

Citrix Environment Support
• Support Citrix environment and integrate with Otsuka-specific technologies.
• Order, install, update, and maintain Citrix servers and tools.
• Assess, consolidate, upgrade, and manage Citrix infrastructure, including SDX appliances.
• Manage NetScaler infrastructure and upgrades.

IT Service Continuity and Disaster Recovery (DR) Services
• Strategy and Policy Definition
• Coordination and Execution
• Data Management
• Testing and Reporting
• DR Activation and Coordination
• Review and Enhancement

Onsite and Remote Support
• Onsite server support, IMAC services, and remote software installation.
• Decommissioning, proactive evaluation, and datacenter assessment.

Windows Server Management & Projects
• Administer and monitor Windows servers, including health checks and problem management.
• Manage local users, groups, shares, and server disk/storage.
• Handle event logs, vendor coordination, and performance issues.
• Install and manage IIS, apply security patches, and troubleshoot clusters.
• Oversee DNS, SCOM, certificate management, migrations, and server deployments.

Linux Server Administration and Projects
• User Administration - Manage user accounts, environments, and home directories.
• OS Package Administration - Add/remove OS packages and troubleshoot issues.
• Storage Management - Create/manage file systems, logical volumes, and clean up disk space.
• NIS and NFS Management - Administer NIS tables and services, install/configure NFS servers.
• Network and Security - Configure/manage NTP, DNS, and implement security standards.
• OS Upgrade and Patching - Upgrade/patch Linux OS, configure SSSD and AD, manage disk and security.
• High Availability and Compliance - Build/configure HA environments, enforce security, and ensure regulatory compliance.
• Server Builds and Management - Install/configure NIS, mail, DNS servers, and centralized syslog servers.

DC Power Tools
• Tool Stack - LogicMonitor, Moogsoft, ManageEngine, BeyondTrust Password Safe, BeyondTrust AD Bridge, Commvault Compliance Search, Veritas HubStor, etc. - Management and Support

LogicMonitor Administration
• Installation and Configuration - Install and configure LogicMonitor Collectors and group servers for monitoring.
• Monitoring and Reporting - Configure monitoring settings, create HLD/Templates/SOPs, and integrate with Moogsoft.
• Maintenance and Troubleshooting - Backup/restore LogicMonitor Collectors, troubleshoot devices, and modify LogicModules.
• Consultancy and Coordination - Provide consultancy, manage stakeholders, oversee platform support, and monitor infrastructure services.

Moogsoft Administration and Issues
• Integration and Event Management - Resolve Element Layer Tool integration issues and missing events/alarms at the Moogsoft layer.
• Ticketing and Situation Formulation - Address ticketing problems with ITSM tools and inconsistencies in situation formulation/Cookbook.
• Maintenance and Upgrades - Fix maintenance window malfunctions and perform Moogsoft module upgrades.
• Configuration Management - Manage Moogsoft ReC, Ipe additions/deletions/modifications, and Cookbook enablement/disablement.
• TeamRooms and API Integration - Create/modify/delete Moogsoft TeamRooms and integrate Moogsoft AI Operations with vendor APIs to automate ticketing.
• Updates and Enhancements - Manage Moogsoft updates and enhancements.

Storage Backup & Data Management
• Define performance, data segregation, backup, restore, archival, retention, reliability, encryption, security, scheduling, and access control needs.
• Recommend hierarchical storage solutions (shared/dedicated, tiered storage, platforms) and procedures to meet requirements and SLRs.
• Review and approve storage and backup solutions and procedures.
• Procure and manage data storage infrastructure (SAN, NAS, tape, optical).
• Provide and manage backup and archival consumables for Otsuka facilities.
• Maintain data set placement, manage data catalogs, and configure Nimble SAN and NAS switches.
• Notify Otsuka of any data losses or risks.
• Perform data and file backups/restores per procedures and SLRs.
• Manage file transfers, data movement, and input processing for third-party media.
• Decommission storage and backup environments per policies.
• Develop and maintain backup schedules, manage backup media, and ensure data retention.
• Work with third-party vendors to archive data at secure offsite locations.
• Conduct media testing to ensure data recovery capability and integrity.
• Test end-to-end system recovery, remediate flaws, and coordinate with vendors.
• Recover files/data as required, provide recovery updates, and manage data replication to DR sites.

Qualifications we seek in you!

Minimum Qualifications / Skills:
• Bachelor's degree in Computer Science, Information Technology, Electrical Engineering, or a related field. Advanced degrees or relevant professional training are a plus.
• Minimum 10 years of experience in data center operations, with at least 5 years in a leadership or senior technical role.
• Extensive experience in data center operations, with a proven track record of managing large-scale data center environments.
• Strong leadership and team management skills, with the ability to motivate and develop a high-performing operations team.
• In-depth knowledge of data center infrastructure, including servers, storage, networking, power, and cooling systems.
• Excellent problem-solving and analytical skills, with the ability to diagnose and resolve complex technical issues.
• Experience with incident and problem management, change management, and capacity planning.
• Strong understanding of compliance, security, and regulatory requirements related to data center operations.
• Effective communication and interpersonal skills, with the ability to interact with stakeholders at all levels.
• Experience in vendor management and contract negotiations.
• A proactive approach to continuous improvement and innovation in data center operations.

Preferred Qualifications / Skills:
• Relevant certifications from Microsoft, VMware, Citrix, and storage vendors are highly desirable.
• Experience with ITIL or other IT service management frameworks.
• Familiarity with cloud computing and hybrid data center environments.
• Excellent communication and collaboration skills, with the ability to effectively interact with technical and non-technical stakeholders at all levels of the organization.
• Strong analytical and problem-solving skills, with the ability to identify root causes of issues and implement effective solutions in a timely manner.
• Proven ability to work independently as well as part of a team, with a proactive and self-motivated attitude towards achieving project goals.

Data Center - Has experience managing data center operations - racking and stacking and some data center migrations, etc.
VMware - Has experience working on VMware ESXi and some VxRail management.
Storage - Experienced with Pure Storage extensions, EMC, and some NetApp - hardware installations and configuration.
Backup - Experienced with Cohesity, Veeam, and Data Domain a few years back.
AD - Experience with AD and Domain Controller setup.
Linux and Windows Admin experience.
Ansible experience.
Scripting - Python/Bash/PowerShell for automation.
Monitoring - Zenoss, LogicMonitor, SolarWinds
Azure - Some recent experience with Azure Compute, Storage, Networking, VPN and VPC and backups, ARM templates, etc., but had to recollect.
AWS experience.
Terraform experience.