With the ever-increasing importance of data analytics, the need to ingest data quickly, reliably, and increasingly in real time grows accordingly. Today, around 100 data engineers (the majority of them externals) design, develop, maintain, and operate this logic. The main technologies used are Teradata (data warehouse), Cloudera (data lake), Informatica and home-made tools (ETL), Kafka/Flink/Couchbase (real-time ingestion), PowerBI/MicroStrategy (reporting), GitLab/Jenkins (CI/CD), Spark/PySpark (big data processing), …
Recent efficiency initiatives revealed room for improvement in code quality, time to market, and the number of man-days needed to implement requests.