• Find preferred job with Jobstinger
  • ID: #15823913
  • Job type: Permanent
  • Salary: $80,000 - $140,000
  • Source: Acunor Infotech
  • Date: 2021-06-07
  • Deadline: 2021-08-06

Data Engineer

Pittsburgh, Pennsylvania 15201, USA

Permanent

Vacancy expired!

Note - Visa sponsorship provided.

Role and Responsibilities
  • Design, build, and deploy batch and streaming data pipelines in the data lake using the Hadoop technology stack and tools such as Python, PySpark, Spark Streaming, and Hive
  • Design, build, and deploy error handling, data reconciliation, audit-log monitoring, and job scheduling using PySpark, shell scripting, and cron jobs
  • Extract data from multiple data sources into the raw data layer
  • Decode data from flat files and text files; write Python code to parse raw files and convert them into business variables
  • Develop and implement coding best practices for Spark, Python, and PySpark
  • Collaborate with offshore and onshore teams and communicate status, issues, and risks daily
  • Develop the data model and structure for the data lake to ensure alignment with the data domain, integration needs, and efficient access to the data
  • Analyze existing and future data requirements, including data volumes, data growth, data types, latency requirements, data quality, the volatility of source systems, and analytic workload requirements
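The flat-file decoding task above can be sketched in plain Python. The pipe-delimited layout, field names, and derived "amount" variable here are hypothetical examples, not taken from the posting; a real pipeline would apply the same idea with the actual file specification.

```python
import csv
import io

# Hypothetical raw layout: account_id|txn_date|amount_cents|currency
RAW_FIELDS = ["account_id", "txn_date", "amount_cents", "currency"]

def decode_flat_file(raw_text):
    """Parse a pipe-delimited flat file and derive business variables.

    Returns one dict per valid record containing the raw fields plus a
    derived 'amount' value in whole currency units.
    """
    reader = csv.DictReader(io.StringIO(raw_text),
                            fieldnames=RAW_FIELDS, delimiter="|")
    records = []
    for row in reader:
        cents = row.get("amount_cents")
        # Skip malformed rows rather than failing the whole batch.
        if cents is None or not cents.strip().lstrip("-").isdigit():
            continue
        row["amount"] = int(cents) / 100  # derived business variable
        records.append(row)
    return records

sample = "A100|2021-06-07|12550|USD\nA101|2021-06-08|bad|USD\n"
decoded = decode_flat_file(sample)
```

In a production job the same parsing logic would typically run inside a PySpark transformation, with the rejected rows routed to a reconciliation or audit table instead of being silently dropped.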

Required Skill Set
  • Must-have skills: PySpark, Spark, HDFS, Hadoop, Oozie, shell scripting, Hive, Impala
  • Familiarity with MongoDB, JSON, XML, and other data sources / file formats
  • Hands-on experience with Hadoop ecosystems, including Hive, HDFS, MapReduce, Spark, and Oozie
  • Hands-on experience with scripting/programming languages such as PySpark, Python, and Scala to write batch and streaming jobs
  • Hands-on experience with big data platforms such as Cloudera, Hortonworks, or MapR
  • Experience with ETL and ELT approaches to data ingestion and integration in the Hadoop ecosystem
  • Experience delivering solutions using an Agile approach
  • Deep understanding of data warehouse and data lake design, standards, and best practices
  • Good to have: experience working in the financial services domain
  • Outstanding written and verbal communication skills
