ID: #13526121
Job type: Contract
Salary: Depends on experience
Source: ITmPowered
Date: 2021-05-17
Deadline: 2021-07-16
Sr. Big Data Engineer - Data Wrangler - Cybersecurity (BHJOB22048717)
Denver, Colorado 80111, USA (Contract)
Vacancy expired!
- Deliver data engineering solutions and contribute to analytics/solution design and operationalization.
- Ingest, merge, prepare, test, and document curated datasets from a variety of novel external and internal sources for BI dashboard and analytics projects spanning IT Risk, Cyber Security, PKI, and Compliance.
- Apply data wrangling, data matching, and ETL techniques to explore a variety of data sources, build data expertise, perform summary analyses, and curate datasets.
- Ingest, aggregate, transform, cleanse, and publish quality, actionable data.
- Be part of a fast-paced, high-impact team working with an entrepreneurial mindset and best-of-breed tools in our Big Data environment: Hadoop/Cloudera, Java, SQL, R, Spark, Python, Trifacta, Arcadia.
- Develop large-scale data structures and pipelines to organize, collect, and standardize data that generates insights and addresses reporting needs.
- Develop ingestion of raw data from 10-20+ source feeds using Flume, Sqoop, Kafka, etc.
- Develop data pipelines to process data using MapReduce, Spark, Oozie, Hive, HiveQL, Pig, Java, and SQL (a minimal pipeline sketch follows this list).
- Collaborate with the big data team to transform data and integrate algorithms and models into automated processes.
- Leverage Hadoop or Spark architectures when designing and optimizing queries to build data pipelines.
- Build data models, data marts, and an ODS to support Cyber Security, Technology Risk, business intelligence, data science, and other internal customers.
- Integrate data from a variety of sources, ensuring it adheres to data quality and accessibility standards.
- Write ETL (Extract/Transform/Load) processes, data systems, and tools for analytic processing.
- Use strong programming skills in Java, Python, and SQL to build robust data pipelines and dynamic systems.
- Leverage Cloudera, Parquet, partitioning, and bucketing. Develop mappers, reducers, and map-side/reduce-side joins.
- Build data quality into the ingestion, ETL, aggregation, parsing, and joining processes. Build in automated checks back to the raw datasets to validate and report data quality, anomalies, nulls, etc.
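As an illustration of the pipeline work described above, here is a minimal sketch assuming a PySpark environment: ingest a raw HDFS feed, cleanse and type it, publish partitioned Parquet, and check the curated output back against the raw rows. The paths, feed name, and columns (cyber_events, host_id, event_ts) are hypothetical, not the employer's actual schema.

```python
# Hedged sketch of a batch curation pipeline; all paths, column names,
# and the feed schema are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("cyber-risk-curation").getOrCreate()

# Ingest: raw CSV landed in HDFS by an upstream Flume/Sqoop/Kafka feed.
raw = spark.read.option("header", "true").csv("hdfs:///data/raw/cyber_events/")

# Cleanse/transform: type the timestamp, derive a partition column,
# and drop rows missing the join key.
curated = (
    raw
    .withColumn("event_ts", F.to_timestamp("event_ts"))
    .withColumn("event_date", F.to_date("event_ts"))
    .filter(F.col("host_id").isNotNull())
)

# Publish: partitioned Parquet so downstream BI queries can prune by date.
(curated.write
    .mode("overwrite")
    .partitionBy("event_date")
    .parquet("hdfs:///data/curated/cyber_events/"))

# Data quality: validate the curated set back against raw and report
# row loss and null rates rather than dropping records silently.
raw_count = raw.count()
curated_count = curated.count()
null_hosts = raw.filter(F.col("host_id").isNull()).count()
print(f"raw={raw_count} curated={curated_count} dropped_null_host={null_hosts}")
```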
- Graduate-level degree in computer science or engineering, or equivalent experience in Business Intelligence, Data Mining, Data Engineering, Database Engineering, or Big Data (MapReduce) programming.
- 5-7 years of data development and design experience with Big Data, ODS, data modeling, and BI dashboards.
- 3-5 years' data wrangling experience leveraging Trifacta (or similar), including data structuring, ingestion, cleansing, enrichment, and validation logic on Hadoop HDFS, MapReduce, or Spark.
- Expertise in data development and management software: Trifacta, Arcadia, Alteryx, Tamr, Paxata, Spotfire.
- Coding experience with Java, SQL, and any of the following: Spark, Scala, Python, HiveQL, Pig Latin.
- Experience with data formats including Parquet, ORC, or Avro, and moving data into and out of HDFS.
- Experience working with data visualization tools such as Arcadia, Tableau, Protovis, Vega, and D3.
- 5+ years working in the Hadoop ecosystem with Hadoop distributions (Cloudera 5.x, Hortonworks).
- Hands-on working experience with Hadoop technologies (HDFS, Hive, Impala, Sqoop, UDFs, Oozie, MapReduce, the Spark framework, etc.).
- Knowledge of Hadoop architecture and HDFS commands, and experience designing and optimizing queries against data in the HDFS environment (a query sketch follows this list).
- Experience building exploratory data analysis reports (histograms, box plots, Pareto charts, scatter plots) using data visualization tools such as Tableau or Spotfire.
- Experience building data transformation and processing solutions.
- Ability to manipulate voluminous data with varying degrees of structure across disparate sources to build and communicate actionable insights for internal or external parties.
- Strong interpersonal skills to present information and communicate with non-technical users and executives.
- Working knowledge of SQL, relational databases, and NoSQL datastores (Cassandra, MongoDB, Neo4j).
- Experience with streaming technologies (Kafka, Spark Streaming, etc.; a streaming sketch follows this list).
- Strong skills in programming languages (Python, shell scripting, Java).
- Experience with data structures, algorithms, various RDBMSs, data types, and primary/foreign key constraints.
- Hadoop or Cloudera Developer Certification (CDH 5.x) preferred.
- Local Denver candidates only. Remote until COVID abates, then on-site only; no long-term remote work.
- No sub-contracting. No sponsorship available.
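As a companion to the query-optimization requirement above, a hedged sketch of a partition-pruned query over the (assumed) Parquet layout from the earlier pipeline example; the table and path names remain hypothetical.

```python
# Hedged sketch: querying an assumed partitioned Parquet layout so Spark
# prunes partitions instead of scanning all of HDFS.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("cyber-risk-queries").getOrCreate()

spark.read.parquet("hdfs:///data/curated/cyber_events/") \
     .createOrReplaceTempView("cyber_events")

# Filtering on the partition column (event_date) lets Spark read only the
# matching directories -- the core of optimizing queries against HDFS data.
daily_summary = spark.sql("""
    SELECT event_date, host_id, COUNT(*) AS events
    FROM cyber_events
    WHERE event_date >= DATE '2021-05-01'
    GROUP BY event_date, host_id
""")
daily_summary.show()
```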
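And for the streaming requirement, a minimal sketch of a Spark Structured Streaming job consuming a Kafka topic and landing micro-batches in HDFS; the broker address, topic name, and paths are assumptions, and the job would need the spark-sql-kafka connector on the classpath.

```python
# Hedged sketch of a Kafka -> Spark Structured Streaming -> HDFS job;
# broker, topic, and output/checkpoint paths are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("cyber-event-stream").getOrCreate()

stream = (spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "cyber-events")
    .load())

# Kafka delivers key/value as binary; cast the value to string before parsing.
events = stream.select(F.col("value").cast("string").alias("payload"))

# Land micro-batches as Parquet in HDFS, with a checkpoint for recovery.
query = (events.writeStream
    .format("parquet")
    .option("path", "hdfs:///data/stream/cyber_events/")
    .option("checkpointLocation", "hdfs:///checkpoints/cyber_events/")
    .start())
query.awaitTermination()
```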