Role Purpose
This role brings best-practice standards to the orchestration of the many ETL workflows that accumulate over time and need to be optimised for better throughput of system resources. It also introduces governance of SQL query design, including identifying and killing/terminating long-running queries that consume excessive system resources, enabling an operational process that is efficient and economical to maintain and scale.
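On SQL Server, for instance, identifying candidates for termination might start from the dynamic management views. The sketch below is illustrative only; the 10-minute threshold is an assumption, not a stated policy.

```sql
-- Minimal sketch: surface queries running longer than 10 minutes
-- (illustrative threshold) via SQL Server DMVs.
SELECT r.session_id,
       r.start_time,
       DATEDIFF(MINUTE, r.start_time, SYSDATETIME()) AS minutes_running,
       r.status,
       t.text AS query_text
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.session_id <> @@SPID
  AND DATEDIFF(MINUTE, r.start_time, SYSDATETIME()) > 10
ORDER BY minutes_running DESC;

-- After review, a runaway session can be terminated:
-- KILL 123;  -- replace 123 with the offending session_id
```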
Example Responsibilities
- Monitor end-to-end (e2e) execution time and performance of data ingestion, transformation and curation
- Resource utilisation and optimisation (quantify $$ cost savings): optimise resource usage to keep CPU, memory and disk I/O balanced
- Data accuracy: prevent data leakage
- Review the existing SSIS packages and stored procedures, take stock of redundancies, and propose optimisation solutions
- Apply expert-level understanding of database management fundamentals, including query tuning, backup maintenance, disk management and replication setup
- Improve system reliability and scalability, and automate remediation of issues
- Should have experience with tools in each of these areas:
  - SQL databases and administration, ETL, and data warehousing
  - Data modeling skills/tools (e.g. SQL Server Management Studio (SSMS), Azure database technologies, MySQL, PostgreSQL)
  - Data integration, analytics and business intelligence (BI) skills/tools (e.g. SQL Server Integration Services (SSIS), SQL Server Reporting Services (SSRS))
- Demonstrable root-cause analysis (RCA) skills, backed by monitoring and observability of key measures for production jobs, data load delays, cube refreshes and job orchestration
- Ensure all monitoring and observability metrics are persisted to a dashboard, backed by standard views that support future RCAs (see the view sketch after this list)
- Documentation: develop comprehensive documentation for optimised ETL process configuration
- Work collaboratively with cross-functional teams, both internal and external, including product engineering, data engineering, operations and business groups
- Explore open-source tech stacks and big data concepts on the cloud to enable greater scalability at lower cost
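As an illustration of the standard-views idea above, a single view over a job-run log gives dashboards and future RCAs one consistent shape to query. The `etl.job_run_log` table and its columns here are hypothetical, not an existing schema.

```sql
-- Illustrative sketch: a standard view over a hypothetical etl.job_run_log
-- table, so dashboards and future RCAs query one consistent shape.
CREATE VIEW etl.vw_job_run_metrics AS
SELECT job_name,
       run_date,
       start_time,
       end_time,
       DATEDIFF(SECOND, start_time, end_time) AS duration_seconds,
       rows_loaded,
       status                        -- e.g. 'Succeeded' / 'Failed'
FROM etl.job_run_log;
```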
Experience and Qualifications:
- 5-7 years of relevant experience with a deep understanding of DBA management, data governance and standards, and associated technologies (Azure, SQL Server, Oracle, PostgreSQL)
- Proven experience in leading, establishing and operationalising DBA techniques for production jobs (optimisation, RCA of data load delays, partitioning, query tuning, backup management, I/O throughput, disk management) in a complex, global organization
- 4+ years of demonstrable experience in creating and leveraging data monitoring and observability metrics to support data operations and management
- 3+ years of experience handling large datasets, implementing coding standards, troubleshooting, and building data processing frameworks that are scalable, reusable and economical to scale
- 3+ years of experience in SQL optimisation and performance tuning, plus development experience in programming languages such as Python, PySpark and Scala
- 3+ years of work experience across multiple ETL tools, big data platforms (Oozie, HDFS, Spark, Hive, etc.) and ANSI SQL databases (Teradata, SQL Server)
- Exposure to Jenkins and Unix shell scripting
- 2+ years of cloud data engineering experience on at least one cloud (Azure, AWS, GCP)
- 6 years of relevant experience demonstrating technical expertise and understanding of system interdependencies and their impact on test data
- 3 years of relevant experience with project delivery methodologies such as SDLC, iterative, Agile, Scrum and Kanban
- 6 years of experience with test automation tools and frameworks
- Ability to abstract technical details and effectively communicate to audiences at different levels
- 3 years of mentoring and coaching junior team members to ramp them up to deliver on data quality (DQ) practices
Great-to-Have Experience and Qualifications:
- Practical knowledge of industry frameworks for DBA work, such as MS SQL Server best practices and DAMA
- Leveraging AI to build self-healing capabilities:
  - Automated error handling and self-healing
  - Self-optimisation
  - Analysing historical data and predicting future trends, enabling proactive adjustments to ETL processes
  - Predicting system resource usage and requirements
  - Detecting anomalies in usage and access patterns, and data security threats (a simple statistical baseline is sketched after this list)
  - Continuous learning and feedback loops, with continuous improvement in accuracy and effectiveness
- Ability to navigate through ambiguity to clarify objectives and execution plans
- Understanding of the fintech and online identity verification business and its products
- Proven ability to lead others without direct authority in a matrixed environment
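For the anomaly detection item above, a simple statistical baseline (not the AI-driven approach the item ultimately envisions) can flag job runs whose duration deviates sharply from that job's history. This sketch reuses the hypothetical `etl.vw_job_run_metrics` view from the earlier example; the 3-standard-deviation cut-off is an illustrative assumption.

```sql
-- Illustrative baseline: flag runs more than 3 standard deviations
-- slower than the job's historical mean duration.
WITH stats AS (
    SELECT job_name,
           AVG(duration_seconds * 1.0) AS mean_duration,
           STDEV(duration_seconds)     AS stdev_duration
    FROM etl.vw_job_run_metrics
    GROUP BY job_name
)
SELECT m.job_name, m.run_date, m.duration_seconds
FROM etl.vw_job_run_metrics AS m
JOIN stats AS s ON s.job_name = m.job_name
WHERE s.stdev_duration > 0
  AND m.duration_seconds > s.mean_duration + 3 * s.stdev_duration
ORDER BY m.run_date DESC;
```

An AI-driven approach would replace this fixed cut-off with a learned model, but the same standard view can feed both.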