  • ID: #13526121
  • Job type: Contract
  • Salary: Depends on experience
  • Source: ITmPowered
  • Date: 2021-05-17
  • Deadline: 2021-07-16

Vacancy expired!

Sr. Big Data Engineer - Data Wrangler - ITmPowered Consulting

As a Senior Data Engineer, you will be tasked with creating a data ecosystem that delivers the right data, for the right question, at the right time, making data meaningful and actionable. You will deliver Data Engineering solutions including large-scale data structures, a data lake, and data pipelines and workflows to gather, cleanse, test, and curate datasets. You will wrangle data with Trifacta and Arcadia Discover to structure, clean, enrich, validate, and publish data to an operational data model / ODS that feeds BI dashboards and enterprise-wide drill-down/drill-up analysis. You will ingest data from multiple sources (Splunk, Qualis, CMDB/asset inventory, CyberArk, Armis, ForeScout, automated patch management systems, governance and standards data, risk management data, cyber, ITSM, etc.) to create accurate, high-quality operational datasets in the enterprise Big Data platform (a 100-node Spark/Hadoop cluster) for use by several teams. You will develop high-performance Big Data pipelines using Kafka, Sqoop, Flume, Java, Scala, Hive, Pig, and Hadoop (Cloudera) to operationalize the ingestion, ETL, transforms, aggregations, joins, enrichment, validations, and final ODS BI dashboard underpinnings, building in data quality, data validation, governance, and maintenance automation. You will work with Data Scientists, Technology, Cyber, and executive leadership to enable better decision making across the organization; deliver runbooks, data models, source-to-target mappings, and updated data dictionaries; and mentor and train client teams in optimized data solution engineering.
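As a rough illustration of the ingest, cleanse, and publish flow described above, the minimal Spark/Scala sketch below shows the shape of one such job. The feed path, column names, and the ods.asset_inventory target table are hypothetical placeholders, not the client's actual sources or schema.

```scala
// Minimal sketch only: ingest a raw feed, apply basic cleansing/enrichment,
// and publish a curated table to the ODS layer backing the BI dashboards.
// Paths, columns, and table names below are illustrative assumptions.
import org.apache.spark.sql.{SparkSession, functions => F}

object AssetIngestSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("asset-ingest-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Ingest: read a raw asset-inventory feed already landed on HDFS.
    val raw = spark.read.option("header", "true").csv("/data/raw/cmdb/assets")

    // Cleanse and enrich: trim the key, drop rows without an asset id,
    // and stamp the load date so downstream checks can reconcile by batch.
    val curated = raw
      .withColumn("asset_id", F.trim(F.col("asset_id")))
      .filter(F.col("asset_id").isNotNull)
      .withColumn("load_date", F.current_date())

    // Publish: write the curated dataset to the operational data store.
    curated.write.mode("overwrite").format("parquet").saveAsTable("ods.asset_inventory")

    spark.stop()
  }
}
```

In practice each of the listed sources (Splunk, CMDB, CyberArk, and so on) would carry its own ingestion and validation logic; the sketch only conveys the ingest-cleanse-publish pattern.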

Responsibilities

  • Deliver Data Engineering solutions and contribute to analytics/solution design and operationalization.
  • Ingest, merge, prepare, test, and document curated datasets from a variety of novel external and internal sources for BI dashboard and analytics projects across IT Risk, Cyber Security, PKI, and Compliance.
  • Apply data wrangling, data matching, and ETL techniques to explore a variety of data sources, gain data expertise, perform summary analyses, and curate datasets.
  • Ingest, aggregate, transform, cleanse, and publish quality, actionable data.
  • Be part of a fast-paced, high-impact team working with an entrepreneurial mindset and best-of-breed tools in our Big Data environment: Hadoop/Cloudera, Java, SQL, R, Spark, Python, Trifacta, Arcadia.
  • Develop large-scale data structures and pipelines to organize, collect, and standardize data that generates insights and addresses reporting needs.
  • Develop ingestion of raw data from 10-20+ data source feeds using Flume, Sqoop, Kafka, etc.
  • Develop data pipelines to process data using MapReduce, Spark, Oozie, Hive, HiveQL, Pig, Java, and SQL.
  • Collaborate with big data teams to transform data and integrate algorithms and models into automated processes.
  • Leverage Hadoop and Spark architectures when designing and optimizing queries to build data pipelines.
  • Build data models, data marts, and an ODS to support Cyber Security, Technology Risk, data science, business intelligence, and other internal customers.
  • Integrate data from a variety of sources, ensuring it adheres to data quality and accessibility standards.
  • Write ETL (Extract / Transform / Load) processes, data systems, and tools for analytic processing.
  • Use strong programming skills in Java, Python, and SQL to build robust data pipelines and dynamic systems.
  • Leverage Cloudera, Parquet, partitioning, and bucketing; develop mappers, reducers, and map-/reduce-side joins.
  • Build data quality into the ingestion, ETL, aggregation, parsing, and joining processes; build automated checks back to the raw datasets to validate and report data quality, anomalies, nulls, etc. (a minimal sketch of such a check follows this list).
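As a minimal sketch, reusing the same hypothetical table names as the earlier example, an automated check back to the raw data might reconcile row counts between the raw and curated layers and report null rates on key columns. The table names and the key-column list are assumptions for illustration.

```scala
// Minimal data-quality check sketch: count reconciliation plus null-rate
// reporting. Table names and key columns are illustrative assumptions.
import org.apache.spark.sql.{SparkSession, functions => F}

object QualityChecksSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("dq-checks-sketch")
      .enableHiveSupport()
      .getOrCreate()

    val raw     = spark.table("raw.cmdb_assets")      // hypothetical raw table
    val curated = spark.table("ods.asset_inventory")  // hypothetical curated table

    // Row-count reconciliation: how many records were dropped during curation?
    val rawCount = raw.count()
    val curCount = curated.count()
    println(s"raw=$rawCount curated=$curCount dropped=${rawCount - curCount}")

    // Null-rate report on the columns the dashboards depend on.
    val keyCols = Seq("asset_id", "owner", "last_seen")
    val nullRates = curated.select(
      keyCols.map(c =>
        (F.sum(F.when(F.col(c).isNull, 1).otherwise(0)) / F.count(F.lit(1))).alias(c)
      ): _*
    )
    nullRates.show(truncate = false)

    spark.stop()
  }
}
```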

Required skills:

  • Graduate-level degree in computer science or engineering, or relevant experience in Business Intelligence, Data Mining, Data Engineering, Database Engineering, or Big Data (MapReduce) programming.
  • 5-7 years of data development and design experience with Big Data, ODS, data modeling, and BI dashboards.
  • 3-5 years of data wrangling experience leveraging Trifacta (or similar), including data structuring, ingestion, cleansing, enrichment, and validation logic on Hadoop HDFS, MapReduce, or Spark.
  • Expertise in data development and management software: Trifacta, Arcadia, Alteryx, Tamr, Paxdata, Spotfire.
  • Coding experience with Java, SQL, and any of the following: Spark, Scala, Python, HQL, PigQL.
  • Experience with data formats such as Parquet, ORC, or Avro, and moving data into and out of HDFS (a brief example follows this list).
  • Experience working with data visualization tools such as Arcadia, Tableau, Protovis, Vega, and D3.
  • 5+ years of experience working in the Hadoop ecosystem with Hadoop distributions (Cloudera 5.x, Hortonworks).
  • Hands-on working experience with Hadoop technologies (HDFS, Hive, Impala, Sqoop, UDFs, Oozie, MapReduce, the Spark framework, etc.).
  • Knowledge of Hadoop architecture and HDFS commands, and experience designing and optimizing queries against data in the HDFS environment.
  • Experience building exploratory data analysis reports (histograms, box plots, Pareto charts, scatter plots) using data visualization tools such as Tableau or Spotfire.
  • Experience building data transformation and processing solutions.
  • Ability to manipulate voluminous data with varying degrees of structure across disparate sources to build and communicate actionable insights for internal or external parties.
  • Strong interpersonal skills to present information and communicate with non-technical users and executives.
  • Working knowledge of SQL, relational databases, and NoSQL datastores (Cassandra, MongoDB, Neo4j).
  • Experience with streaming technologies (Kafka, Spark Streaming, etc.).
  • Strong skills in programming languages (Python, shell scripting, Java).
  • Experience with data structures, algorithms, various RDBMSs, data types, and primary/foreign key constraints.
  • Hadoop or Cloudera Developer Certification (CDH 5.x) preferred.
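As a brief example of moving data into and out of HDFS as Parquet with partitioning, the sketch below round-trips one feed. The HDFS paths and the scan_date partition column are assumptions made purely for illustration.

```scala
// Minimal sketch: land a CSV feed on HDFS as partitioned Parquet, then read
// it back with a partition-pruned filter. Paths and columns are assumptions.
import org.apache.spark.sql.SparkSession

object ParquetRoundTripSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("parquet-roundtrip-sketch").getOrCreate()

    // Read a raw CSV feed already landed on HDFS (hypothetical path; the
    // feed is assumed to carry a scan_date column).
    val scans = spark.read.option("header", "true").csv("hdfs:///data/raw/vuln_scans")

    // Write it back as Parquet, partitioned by scan date, so queries that
    // filter on a single day touch only one directory.
    scans.write
      .mode("overwrite")
      .partitionBy("scan_date")
      .parquet("hdfs:///data/curated/vuln_scans")

    // Read the curated Parquet data back; the filter benefits from pruning.
    val oneDay = spark.read.parquet("hdfs:///data/curated/vuln_scans")
      .filter("scan_date = '2021-05-01'")
    println(s"rows for 2021-05-01: ${oneDay.count()}")

    spark.stop()
  }
}
```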

About ITmPowered Consulting

ITmPowered's Application Information and Data Management (AIDM) Practice creates solutions that cover the entire lifecycle of information utilization, from ideation through implementation. We engage with clients to build and mature Big Data programs from the ground up: methodology design, solution engineering, data enablement, data quality, analytics, and BI dashboarding of Big Data ecosystems. We offer consulting and development services to help our clients define their strategy and solution architecture, and our teams then deliver and manage high-performance big data, data warehousing, business analytics, and dashboarding applications that provide tangible business benefits. Outstanding solutions delivered.

Logistics:

  • Local Denver resources only. Remote until COVID abates, then on-site only; no remote work thereafter.
  • No sub-contracting. No sponsorship available.
