Four recommendations for NHS operations to get ahead in the data science revolution

May 2, 2017 • Reading time 7 minutes

Netflix analyses millions of data points to make the best movie recommendations. Amazon uses its gigantic data network to get goods to you quickly. Google indexes the entire internet to help you find search results in milliseconds.

What about the NHS? How much data does your GP collect to give you the best advice? How much data does your hospital collect to schedule your care efficiently? How much data does your social care provider collect to ensure your needs are met? How are data analysed when they have been collected?

In healthcare, we often rely on doctors collecting single data points when we are sick, or on hospitals planning for demand and capacity based on anecdote. So we find tumours by chance and cancel operations because a thousand-bed hospital doesn’t have space on the day of surgery for another admission.

What are the key priorities for the NHS to catch up?[1]

  1. Standardised datasets need to be easily linkable using agreed protocols
  2. The legal framework for using linked data needs to be navigable
  3. Starting new data collections should happen early
  4. Capability development should be nurtured

1) Standard data needs to be easily linkable using agreed protocols

The UK is a leading nation on the collection of hospital data. NHS organisations are digitizing, NHS numbers can be used to link data, and there are national collections (like HES). Significant amounts of data are collected, but this is just the beginning:

  • Data are mainly collected from hospitals, so they only focus on the moments when we are already sick.
  • Within hospitals there are often tens of different databases (each with its own purpose), but they do not communicate well with one another.[2]
  • High value data (e.g. the opinions of doctors) are often recorded on hand written notes and paper, or on whiteboards that are regularly cleaned.

The problem is that hospitals started digitizing locally (within departments rather than across a health system). This means that there are often excellent digital systems in place, but the compatibility between systems is poor. A doctor’s name, for example, may be recorded in several different ways (e.g. Mr White versus WHITM). This creates a headache for analysts, but it is also a blocker to deploying machine learning algorithms that could, for instance, help predict which outpatient appointments are most likely to be missed.
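To make the naming problem concrete, here is a minimal sketch of how two local systems might be linked once an agreed clinician code exists. The lookup table, the field names ("surgeon", "consultant") and the records are all hypothetical, and a real programme would maintain the mapping centrally rather than in code.

```python
import re

# Hypothetical lookup: every variant seen in a local system maps to one
# agreed clinician code. The entries here are illustrative only.
CANONICAL = {
    "mr white": "WHITM",
    "white m": "WHITM",
    "whitm": "WHITM",
}


def normalise(name):
    """Lower-case the name, drop punctuation and collapse whitespace."""
    cleaned = re.sub(r"[^a-z ]", " ", name.lower())
    return " ".join(cleaned.split())


def clinician_code(name):
    """Return the agreed code, or None if the variant is not in the lookup."""
    return CANONICAL.get(normalise(name))


def link(theatre_rows, outpatient_rows):
    """Join two lists of dict records on the agreed clinician code."""
    by_code = {}
    for row in outpatient_rows:
        by_code.setdefault(clinician_code(row["consultant"]), []).append(row)

    linked = []
    for row in theatre_rows:
        code = clinician_code(row["surgeon"])
        if code is None:
            continue  # unknown variants are left unlinked rather than guessed
        for match in by_code.get(code, []):
            linked.append({"code": code, "theatre": row, "outpatient": match})
    return linked
```

Unknown spellings are deliberately left unlinked here; guessing a match between clinicians is the kind of error a shared standard is meant to remove.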

The 5YFV outlines the importance of linked data, but standardising and combining datasets is hard work. And this is not being made easier by the many local programmes in hospitals (or hospital departments) that standardise their own data for their own needs. Work to standardise data is underway (e.g. the Health and Social Care Data Integration and Intelligence Project (HSCDIIP)) and there is some additional funding (as with the paperless NHS initiative), but more is needed.


Trust-wide (ideally wider) consistent standards need to be developed for all new and existing data collections. These should be simple and easy to implement at a team level with minimal interruption to local priorities, e.g. consistent variable naming and identifiers, standard data formats, rules on editing rights and storage, etc.
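As an illustration of how lightweight such standards could be at team level, the sketch below converts local column headings to one naming style and local date strings to ISO 8601. The snake_case convention and the list of accepted date formats are assumptions made for the example, not an existing NHS standard.

```python
import re
from datetime import datetime

# Assumed set of date formats found in local extracts (day-first, UK style).
DATE_FORMATS = ("%d/%m/%Y", "%d-%m-%Y", "%Y-%m-%d")


def standard_name(raw):
    """Convert a local column heading to lower snake_case."""
    words = re.findall(r"[A-Za-z0-9]+", raw)
    return "_".join(word.lower() for word in words)


def standard_date(raw):
    """Return the date as an ISO 8601 string (YYYY-MM-DD)."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next accepted format
    raise ValueError(f"Unrecognised date format: {raw!r}")
```

Rejecting unrecognised dates outright, rather than guessing, keeps ambiguous values visible so the team that owns the data can correct them.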

2) The legal framework for using linked data needs to be navigable

The concept of data systems that can speak to each other is not new. But, there are some legislative hurdles and barriers that are complex to navigate – especially when combined with public opinion and distrust.

The initiative is a good example of where this can go wrong. A plan to bring together health and social care data from across the country was derailed by privacy concerns that could not easily be addressed by current legislation. This has happened before and can lead to mistrust. One example is the recent clash between the NHS and the Home Office, in which the NHS was forced to hand over data to the Home Office in the context of immigration policy.

Laws are struggling to keep pace with developments in data and tech. This is not surprising given that 90% of the world’s data were collected in the last five years and data governance legislation is dated: the Data Protection Act, for example, is from 1998.

There are difficult questions like: Who should access the data? What safeguards are needed? How to anonymise? How to punish abuse? How to safeguard against discrimination? What data are needed?

Importantly, not all data are created equal. Some data should never be collected, other data should be tightly access-controlled, other data should be accessible but anonymised, and finally some data (especially aggregated data) might be happy in the public domain.[3]


Pragmatic guidance is needed, which starts from the perspective that data can generate a lot of value for the NHS and seeks to make it safe. A three-tiered approach should be considered, where some data should not be collected, some data should be aggregated and anonymised, and some data should be made public. Consultation with the public and NHS leadership has to determine which data fall into which bucket.
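A rough sketch of what the three tiers could look like in practice is shown below. The field-to-tier assignments are invented for illustration (the real ones would come out of the consultation), and the small-number suppression threshold of 5 is an assumption of this example rather than a rule stated here.

```python
from collections import Counter
from enum import Enum


class Tier(Enum):
    DO_NOT_COLLECT = "do not collect"
    AGGREGATE_AND_ANONYMISE = "aggregate and anonymise"
    PUBLIC = "public"


# Hypothetical assignments, for illustration only.
FIELD_TIERS = {
    "immigration_status": Tier.DO_NOT_COLLECT,
    "ward": Tier.AGGREGATE_AND_ANONYMISE,
    "hospital_name": Tier.PUBLIC,
}


def strip_uncollectable(record):
    """Drop fields marked 'do not collect', and any field with no agreed tier."""
    return {
        field: value
        for field, value in record.items()
        if FIELD_TIERS.get(field, Tier.DO_NOT_COLLECT) is not Tier.DO_NOT_COLLECT
    }


def aggregate_counts(records, field, min_count=5):
    """Count records per value of `field`, hiding counts below `min_count`.

    Small counts are returned as None so that individuals in small groups
    are harder to re-identify (the threshold is an assumed default).
    """
    counts = Counter(record[field] for record in records)
    return {
        value: (count if count >= min_count else None)
        for value, count in counts.items()
    }
```

Treating any field without an agreed tier as "do not collect" is a cautious default: nothing flows through until someone has decided which bucket it belongs in.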

3) Starting new data collections should happen early

Currently, most of the NHS’ data is collected when people are ill. Data from wearable technology, decoded genomes and so on, have great value in pre-empting illness. A recent paper published in the Clinical Psychological Science journal uses electronic health records to predict suicide for example.

Much has been written about the health side of wearable technology. But what about the operational side? Our health systems need to plan, schedule, and organize in the short, medium and long run. Having more data on the healthy population will enable not only better medical care but improved operations. In the short run this can mean improved doctor rosters and better emergency, surgical or bed planning. In the medium or long run it can mean effective hiring decisions or well laid out operational focal points. An exciting initiative here is underway by Verily that will collect a whole heap of data from 10 000 healthy individuals over 4 years. There is no reason the NHS could not implement similar concepts on an even larger scale!

Predictive analytics will be key not just for the health of individuals and the prevention of illness but for the whole health system. We can already use the data we have to build predictive models that allow us to better schedule, align capacity and organize our care. Adding the new data into now-updated legacy systems governed by the right privacy and access legislation will allow us to build better models that are more secure and better meet the needs of doctors and administrators.


Collecting (and linking) wider sources of data will help to plan health services now and in the future. New initiatives should consider how to collect data before people fall ill, and how to involve people in collecting it themselves.

4) Capability development should be nurtured

Let’s assume our health systems have secure data repositories that collect the data they need and are governed by consistent privacy and access legislation. Let’s even assume patients collect data while healthy and are synced with their GP for early warning.

This all sounds great. But it poses one last hurdle: developing the capability and capacity to analyse this data. We need more statisticians, data scientists, genomicists and bioinformaticians to analyse the data and develop systems that alert doctors, align resources in hospitals and can generally work with bigger and bigger data sets containing an ever greater variety of data types (e.g. text, figures and images).

Many data scientists currently in academia or tech would love to work on problems relating to population health rather than problems relating to sales in big companies. If the NHS puts in place sensible systems, development frameworks and platforms, people from Microsoft, Google, and Facebook would come back to the NHS to solve some of the most challenging problems faced.


As part of a coherent data science strategy there should be a focus on creating new career paths for talented data science professionals in the NHS. Create collaborations with computer and data science programs at local universities including scholarships and awards programs. Run Kaggle competitions to identify talent and solve problems.

Is the NHS late to the tech revolution?

There are many reasons that the NHS has not led the tech revolution. The economics of healthcare, the difficulty of rewarding prevention and the misalignment of incentives are just a few of these reasons. But while there are hurdles, there are things the NHS is doing to catch up. To support this we need to ensure our legacy systems are coherent and can talk to each other, we need to decide which data we want to collect and how we protect it, and we need to improve the ways we analyse it. The NHS is pushing forward in all of these directions. Support from legislators, funding and connections with academia and tech will pave the way.

If we succeed, it will allow hospital administrators to better allocate resources, plan for demand and capacity, schedule operations and organise care. Doctors can stop wasting their time on admin problems and spend more time with patients. There is no reason that Netflix can participate in the tech revolution but the NHS can’t. Challenges need to be overcome – hardware, software, legal and organizational – but it is not tenable that we use all our resources and innovation in technology to develop better movie recommendations while public health care lags behind. The technology will come – we need to create the conditions for its success now.


[1] Also have a look at Obama’s cancer moonshot report, which inspired parts of this blog post

[2] Have a look at this doctor’s explanation of using these systems in everyday life

[3] An interesting podcast on that question here:


Edge Health are a specialist UK healthcare analytics consultancy that use data and insights to improve the delivery of health and care services, so that better outcomes can be delivered more efficiently.