Job Title : Big Data Engineer
Responsibilities : Build PySpark-based applications for both batch and streaming requirements on Cloudera distributed systems, drawing on in-depth knowledge of Hadoop and NoSQL databases. Develop and execute data pipeline testing processes and validate business rules and policies. Optimize the performance of Spark applications on Hadoop by tuning SparkContext, Spark SQL, and DataFrame configurations. Optimize data access performance by choosing appropriate native Hadoop file formats (Avro, Parquet, ORC, etc.) and compression codecs. Design and build real-time applications using Apache Kafka and Spark Streaming. Build integrated solutions using Unix shell scripting, RDBMS, Hive, and HDFS (file system, file types, and compression codecs). Process large volumes of structured and unstructured data, including integrating data from multiple sources. Create and maintain an integration and regression testing framework on Jenkins, integrated with Bitbucket and/or Git repositories. Create projects in Zephyr and Jira and maintain project folders and structures.
Job Qualifications : 8 years of experience in Business Intelligence in the banking domain. Bachelor's degree in Computer Science Engineering.
Location : Riyadh, Saudi Arabia