about 1 year ago
Our organization is modernizing our HR data & analytics technology services, inclusive of self-serve & API-driven access to Big Data. The Sr. Data Engineer-Data Lake will fill a lead role on the Bank's Global Human Resources Technology, Data Lake Development Team. We are looking for a well-rounded data engineer who "gets it" and sets the example for effective collaboration in a team environment. He/She must demonstrate excellent communication skills and critical thinking to ensure all assumptions, constraints, and behaviors are well thought through. The individual ensures the systems design and requirements are aligned to achieve the desired business outcomes, and that team practices and coding/quality principles are aligned to achieve the desired technology outcomes. This individual will develop and deliver complex solutions in a Data Lake environment, covering technical design, development, and testing.
The desired individual will have demonstrated experience in standing up Big Data operational & analytics platforms and translating business requirements into scalable & sustainable technical solutions. Qualified candidates must be well-versed in data warehousing and 'Big Data' distributed processing and storage technologies, as well as Data Lake design patterns. In addition, the selected candidate should have knowledge of and experience with data management techniques such as Metadata Management, Data Quality (DQ) Management, Data Governance, Data Integration/Ingestion, Data Architecture, and Data Profiling.
- developing and scripting in Python as well as SQL in Linux environments.
- integrating data with Sqoop and ingesting multi-record-type files in various data formats such as Parquet, Avro, and JSON.
- Create and maintain optimal data pipeline architecture in Cloudera CDH or a similar platform, with application development skills in Hive, Sqoop, and PySpark.
- Participate in status meetings to track progress, resolve issues, articulate & mitigate risks and escalate concerns in a timely manner.
- understands and evangelizes great design, engineering, & organizational practices (unit testing, release procedures, coding design and documentation protocols) that meet or exceed the Bank's change management procedures
- sets the bar for team communications and interactions; is an excellent teammate to peers, influencing them in a positive direction
- uses versioning tools such as Git/Bitbucket
- sets up jobs using AutoSys for automation
Candidate should have:
- 10+ years of overall IT experience with 5+ years on Big Data technologies such as Hadoop, Hive, Spark, PySpark, Sentry, HBase, Sqoop, Impala, Kafka.
- 5+ years Data Warehousing experience, including manipulation/transformation of millions of rows with optimal methods
- 5+ years developer experience building software in Python/Scala/Java, PL/SQL
- Experience with Jenkins Pipeline, Bitbucket, and Python unit test code development tools.
- Advanced-level analytical, debugging, and critical thinking skills.
- Knowledge of search engine technologies such as Apache Solr or Elasticsearch, with the ability to define schemas, create collections, ingest data into the search engine, and retrieve data using streaming APIs, is a plus
- Experience in Master Data Management (MDM) concepts, tools, and best practices