Position Expired
This job is no longer accepting applications.
Data Wrangler
Genomics England
Are you interested in working with large-scale datasets and impacting the future of genomic healthcare?
We are currently recruiting for a Data Wrangler to join us here at Genomics England!
As a Data Wrangler, you will specialise in optimising the performance and seamless movement of large data volumes using specialist tooling. You will be responsible for curating and transforming datasets, generating key statistics, and deriving new datasets tailored for diverse audiences.
This role will also include managing data workflows, developing and maintaining data pipelines, collaborating with cross-functional teams to understand data requirements, and ensuring data integrity. Additionally, you will explore new technologies and contribute to knowledge sharing across the Data Chapter.
Key responsibilities
Design and build data solutions that meet business needs and requirements across clinical and research domains
Extract, transform and load data to support research and clinical practices
Generation and derivation of statistics and data visualisations to support data-driven decision making
Codifying repeatable data processes, increasing productivity and efficiencies and supporting the standards for data usage throughout Genomics England
Ensure data quality is at the centre of data delivery, using processes including automated routines and self-healing/monitoring
Implementation and adherence to software development best practices
Developing testing routines and datasets to ensure robust and consistent product delivery
Developing healthcare data models and associated artifacts
Managing associated data storage and management solutions, ensuring the optimum architecture exists to support data solutions
Working with Cloud First technologies to leverage the latest healthcare data solutions
Example tooling
Cloud: AWS or equivalent Cloud experience
Data Processing: AWS Glue, Python, SageMaker, Prefect
Data models: XML, JSON, HL7, FHIR, OMOP
Databases: AWS S3 & Athena, AWS DynamoDB, AWS RDS, AWS Aurora (Postgres)
Continuous deployment: AWS Lambda, Docker, Kubernetes
Languages: Python, SQL
Visualisation software: Tableau
Practices: DMBOK2, Continuous Integration/Continuous Deployment (CI/CD)
Qualifications
Ideally, a Master’s degree or equivalent experience working in data management, biostatistics, clinical informatics or data analysis.