a blog by Owen Boswarva

1 Aug

Post: 1 August 2014

Yesterday the Environment Agency added some new open data releases to its Datashare page. These datasets are reusable under the Open Government Licence.

River and Coastal Maintenance Programme

This download contains two spreadsheets, one describing the EA’s “Frequent” maintenance programme and the other its “Intermittent” maintenance programme. These programmes are about managing flood and coastal erosion risk.

Several comments:

1. Although the download page and spreadsheet names indicate this data relates to the 2013/14 programmes, I think it is actually forward-looking and relates to activities planned for the 2014/15 year. The Intermittent spreadsheet is titled “Intermittent Maintenance 2014/15”. I have asked EA to clarify.

2. EA has only released spreadsheet data, and not the spatial data for the Flood Risk Management System areas to which the data applies. I have obtained a copy of the spatial data (see my post from February) but the EA released that under its more restrictive "EA OpenData" terms, which are not fully compliant with the Open Definition. Ideally I would like to see the FRMS polygons released under the OGL (or the OS OpenData Licence, if they contain OS derived data).

3. EA has published the spreadsheets in the closed XLSX format. In line with the Government’s push to adopt open standards it would be better to use a format that is directly supported by free software, such as XLS or ODS. That said, XLSX is an improvement on releasing the data only in a 33MB XLSM file.

4. I note the spreadsheets omit the Flood Defence Grant in Aid amounts allocated for each watercourse/asset. I think EA should try to be transparent about those figures; both Defra and to a lesser extent EA itself have been criticised in the past for obfuscating the amounts spent on flood protection.

River Habitat Surveys - Survey Details and Summary Results

This is a large spreadsheet. The dataset is described as follows:

River Habitat Survey (RHS) is the Environment Agency standard for collecting data on the physical character and quality of river habitats across the UK. It is a substantial dataset of significant research interest. RHS is a standard field survey of a 500m stretch of river where data is collected in a replicable manner. At 50m intervals a ‘spot-check’ is conducted to record specific details about bank and channel physical attributes, man-made modifications, land uses and vegetation structure. Since 1994 approximately 24,000 surveys have been carried out. The bulk of surveys were carried out between 1994 to 1997 and 2006 to 2008. Surveys are still carried out for specific drivers, for example assessing habitat availability and Water Framework Directive.

I haven’t explored this dataset in any detail but it looks interesting.

See the AfA286 metadata in EA’s Information for Re-Use Register for more technical details.

Water Framework Directive datasets

EA has also released three new WFD spatial datasets.

WFD - Management Catchments Cycle 2 (Draft) is described as follows:

Management Catchments are the geographical units for which action plans are drafted in implementing the Water Framework Directive (WFD). Catchments have an action plan published that relates to all waterbodies that fall within its boundaries.

WFD - Operational Catchments Cycle 2 (Draft) is described as follows:

Operational Catchments (Cycle 2) show how Water Framework Directive (WFD) work is grouped geographically for practical management purposes.

WFD - River Basin Districts Cycle 2 (Draft) is described as follows:

River Basin Districts are the geographical units showing the area of land and sea, made up of one or more neighbouring river basins together with their associated groundwaters and coastal waters for assessment and action under the Water Framework Directive (WFD).


I confess I have difficulty keeping track of the differences between all the WFD spatial data. Of course it’s better to have too many datasets than too few; only a few years ago it was difficult to find any open spatial data that provided a hydrologically significant segmentation of the country.

For purposes of practical reuse, however, it’s worth noting that the three datasets above are at a more general level of geography than the WFD - River Waterbody Catchments Cycle 2 (Draft) dataset, previously released by the EA as open data.

Recorded Flood Outlines (not released)

There have been some hints online that EA is preparing to release its Recorded Flood Outlines dataset as open data.

This spatial dataset, previously known as Historic Flood Outlines, contains the individual location outlines and approved attributes for records of historic flooding. See the AfA008 metadata in EA’s Information for Re-Use Register for more technical details.

Recorded Flood Outlines has substantially more potential for reuse than the more “niche” datasets mentioned above. I have worked with this dataset previously in a commercial context, and am quite looking forward to its release as open data. However it’s understandable that there might be some protracted discussions within the EA prior to making the data available to the general public.

Image produced from the WFD datasets described above. Attribution: “Contains Environment Agency information © Environment Agency and database right”.

22 Jul

Post: 22 July 2014

Updated: 24 July 2014 (see bottom of the post)

This week the Open Data User Group has published a benefits case arguing for open data release of an “authoritative” GP dataset.

ODUG calls on the Department of Health to organise an open dataset of all GP and dental practices, to include practice details, opening times, location, contact details, patient acceptance criteria, and a list of individual practitioners.

Geographic coverage is not mentioned, but as the call is to DoH I’m assuming ODUG is focused only (or at least mainly) on the data for England.

This is the first new benefits case from ODUG since last summer (list of previous benefits cases), so it’s worth taking a look at both the case itself and the related blog post by Giuseppe Sollazzo. My comments are below.

Existing Datasets

The current best sources for core bulk data on GP practices in England (codes, addresses, contacts, etc.) are:

Downloads from the HSCIC site
Downloads from the NHS Choices site
Download from the CQC site

Those datasets are all reusable under the Open Government Licence, i.e. they are open data.

Several side points before I get into the substance of the ODUG case:

1. NHS Choices staff are employed by HSCIC, so the first two datasets are effectively the responsibility of the same public authority. However there are substantial differences between the datasets as they reflect the underlying purposes for which they are maintained.

2. ODUG criticises the NHS Choices dataset as follows:

"the branding of the NHS Choices dataset as a ‘Freedom Of Information’ dataset is troubling from an Open Data perspective, mainly for its "on demand" nature: a FOI data release, being a reactive response to a request, does not establish an ongoing process; while data release under an Open licence often comes proactively from the publishing entity, which in doing so creates a sustainable data update procedure".

I think this is rather over the top. NHS Choices hasn’t “branded” the data as a FOI dataset. It has merely made it available, along with a number of other useful data files, in the FOI section of its site. It would be nice if the NHS Choices site also had a dedicated open data landing page. However it’s perfectly sensible to draw users’ attention to existing datasets that they may want to know about before submitting a FOI request. NHS Choices says the data files are updated daily, so they are clearly not being published as a “reactive response” to FOI requests.

3. ODUG maintains that the GP practices data on the HSCIC site is not open data, and points to a page about “responsibilities in using the ODS data”. However HSCIC has recorded that dataset (EGPCUR) as reusable under the OGL. (The ODS “responsibilities” page seems to be written for NHS users. A literal reading only permits use of the data in connection with NHS-related activities, which is obviously not the actual licensing position.)

It’s also worth noting that elsewhere on its site HSCIC publishes an open dataset of practice codes, names and addresses as part of its monthly release of GP prescribing data.

Why are the HSCIC/NHS Choices datasets not “authoritative”?

There’s nothing wrong with arguing that existing datasets could be made more useful by improving the quality, or updating them more frequently, or appending data from other sources.

But we can have those arguments about most of the nation’s information infrastructure. A dataset doesn’t need to be ideal to be authoritative in practice.

The HSCIC and NHS Choices datasets are produced by the relevant official body, they are in wide use, and there are currently no better equivalents. The datasets are therefore, on the face of it, authoritative.

ODUG proposes that DoH establishes “an ongoing process to build, update and maintain on data.gov.uk an authoritative dataset of medical practices and operating practitioners, drawing on the datasets made available by HSCIC and NHS Choices”.

I’m not sure how ODUG expects DoH to build an authoritative dataset by drawing on datasets it has dismissed as non-authoritative. ODUG’s call is to DoH, but in practice DoH would surely delegate any such new process to HSCIC. So what is ODUG proposing HSCIC should do differently?

Maintaining the new dataset on data.gov.uk is also unlikely to add credibility, given the current state of the DGU catalogue and other functionality. HSCIC already has its own platforms and they seem serviceable for the publication of data. What in the ODUG proposal requires the involvement of data.gov.uk?

Release of open data or creation of a new data product?

The typical model of open data activism is to argue for the release of existing data assets (usually those held by public authorities) for reuse under an open licence. ODUG was originally set up to frame those arguments based on views from UK data users (within terms of reference from the Cabinet Office).

I’ve never been entirely on board with the idea of submitting “benefits cases” for release of open data, because it seems to conflict with the principle of “open by default”. In my view the onus should be reversed; public authorities should be required to demonstrate why we should not be able to reuse data that they hold. Benefits cases should only be necessary when there are significant costs involved in extracting and publishing the data.

However that model of open data release assumes we are talking about data that the public authority already holds and maintains in order to deliver its public task.

In this instance ODUG seems to be arguing for creation of a new data product, combining the existing HSCIC/NHS Choices datasets with data from other sources such as GMC’s Medical Register and patient acceptance criteria for each GP practice.

That last source in particular would probably involve quite a bit of ongoing administration and processing, as patient acceptance criteria are not held centrally or in a standard format.


Arguing for release of existing data is one thing. Arguing for the creation of new data products and new processes is something more.

I have no doubt there is room for improvement in the existing open data that HSCIC publishes on GP and dental practices. However public datasets are mainly produced to support a public task. I will be surprised if DoH takes up these ODUG recommendations without a more detailed demonstration of why the existing data and processes are inadequate to meet the requirements of the agencies and public bodies it supports.

For purposes of reuse beyond the needs of the health system itself, I think we are already quite well served by the existing open data on GP and dental practices. The ODUG benefits case is somewhat perfunctory; in the absence of more detailed analysis I am unconvinced by its attempts to talk down the value of the existing open datasets.

In my view the most interesting element of the ODUG benefits case is the idea that the Government should require the General Medical Council to release data from the Medical Register on individual practitioners. This register is an existing, useful source of public data that is not currently available for reuse under an open licence. I think a focus on that element, properly explicated, would make a more practical and worthwhile proposal.

Update (24 July 2014)

Giuseppe Sollazzo has written a new post in response to my post above, and I am grateful to him for engaging in further discussion.

Giuseppe’s new post provides a useful gloss on the benefits case and the thinking behind it. However in general the post seems to be more about what ODUG meant to say than what it did say. I cannot find much in there that changes my perspective on the benefits case itself.

There are sound arguments for releasing additional open data in this space, such as data from the Medical Register and data on patient acceptance criteria (if the administrative costs of collecting and maintaining that data can be justified).

But with respect to the core bulk data on GP practices, it seems to me that the key question is whether HSCIC’s EGPCUR dataset — the most robust of the existing sources — is currently available under the Open Government Licence.

ODUG may well be in contact with users who consider the EGPCUR dataset to be inadequate for technical or qualitative reasons. But the benefits case doesn’t get into that detail. As near as I can see the EGPCUR dataset is pretty good for general purposes — provided it is reusable as open data.

So why is this in doubt? ODUG’s benefits case states plainly that the HSCIC dataset is not open data. I assume ODUG may also have given that advice to users who have consulted it. ODUG’s public profile is such that some potential users may take that as definitive and be discouraged from using the data. This is worrisome.

Giuseppe’s new post says ODUG is “asking for clarity”. But if ODUG has already engaged directly and positively with NHS England, why was the licensing position not simply confirmed with the data publisher at the earliest stage?

I have difficulty taking seriously the proposition that EGPCUR is not open data, given that the Department of Health listed it as an Open Government Licence dataset more than three years ago. However this question makes quite a difference to the strength of ODUG’s case. The ongoing availability of a credible open dataset of GP practices is likely to be of wider concern to the open data community than enhancements to existing datasets.

17 Jul

Post: 17 July 2014

The Public Data Group is a collection of four data-rich government organisations (Companies House, Land Registry, Met Office and Ordnance Survey) that report to the UK’s Department for Business, Innovation & Skills (BIS).

The PDG organisations are trading funds, encouraged by Government to generate commercial revenue from the data assets that they control. However all four organisations make available at least some data under an open licence.

On Tuesday the Public Data Group issued a statement containing some commitments to future releases of open data. The statement is a bit short on detail, so this post is an attempt to add some context to the planned releases.


Companies House

The headline announcement in the PDG statement is the decision that Companies House will “make all of its digital data available free of charge”, from the second quarter of 2015 (April - June).

This follows previous initiatives to open up Companies House data:

The current Companies House Price List is online. My interpretation of Tuesday’s announcement is that Companies House will, at minimum, remove the £1 charge for access to individual Company Records via WebCHeck. “Electronic images” will also be free, which I think means PDF copies of the records.

The important unanswered question is whether this release will also include any new bulk downloads of data. According to the Open Definition, a dataset is only properly open data if it is available in bulk.

Bulk release is necessary for any kind of serious analysis of companies data. If the Government is serious about leveraging the free availability of Companies House data to “boost the UK economy”, then bulk release is essential.

Land Registry

According to the PDG statement Land Registry “will release their Price Paid Data for commercially owned properties for free by March 2015.”

There are no further details provided. It will be interesting to see what data is contained in that release. Land Registry currently makes available its Price Paid Data for residential transactions back to 1995, as open data. However it does not publish any statistics on commercial transactions. My past understanding was that Land Registry did not maintain a separate dataset for commercial sales (of either land or properties).

There are several open questions: How complete or extensive is Land Registry’s data on prices paid for commercially owned properties? Does Land Registry intend to release data on historical as well as new transactions (bearing in mind that the initial release of residential Price Paid Data was only new transactions)? And is there likely to be resistance from commercial property owners to the open publication of sale prices?

The PDG statement also says that in 2014/15 Land Registry intends to “make the whole Index Map polygon layer covering England and Wales available at a cost recovery price.”

This is not an open data release, of course. The news may be welcomed by Land Registry licensees, but it remains to be seen how much of a saving they will realise. “Cost recovery price” should not be confused with “marginal price”. Index Map polygons are based on Ordnance Survey spatial data, so (as we have seen with the INSPIRE Index Polygons) it will be OS pricing that determines the actual cost of reuse.

Met Office

There is not much open data on the horizon from Met Office. However the PDG statement says Met Office is creating something called the “National Archive for the Nations Memory of the Weather”, and that “a selection of this will be available as Open Data.”

This could be significant or not, depending on what Met Office decides to release. Data required to maintain the nation’s “memory of the weather” could range from detailed weather observations, to historical documents, to anecdotal information about notable weather events.

Ordnance Survey - enhancements to OS OpenData

Ordnance Survey’s open data programme is more extensive than those of the other trading funds, as it was launched under the previous Government. Tuesday’s PDG statement notes several future developments.

The existing OS Street View product (raster base-mapping) will be enhanced with new features added such as car parks, major paths, major cycle routes and hill.

OS will release an “enhanced Gazetteer”. This is presumably the “Gazetteer of Great Britain” that OS demonstrated at GeoBusiness 2014 in May. I have seen some sample data for this product (via the OS Insight developer programme); it looks like a useful addition to the OS OpenData suite.

Ordnance Survey - Public Rights of Way

OS will be “working with” Defra to “provide consultancy, technology and to enhance public access (through a portal) to Rights of Way data”. I guess this is good news, though lack of a portal is not the main blockage to release of Public Rights of Way data.

I’ve written about PRoW data before. The main problem is that there has been no organised effort from central government to encourage local councils to release the data. OS is part of that problem (and therefore has to be part of the solution) because most councils use OS data to maintain their Rights of Way maps and need OS’s permission to release those maps as open data. However DCLG and the Local Government Association should be pushing this harder as well. The ideal would be publication of an open national PRoW dataset, collated from the many local sources.

There is already a pretty good portal for the 80 or so local PRoW datasets available as open data: Barry Cornelius’s Rowmaps site.

Ordnance Survey - Derived River Network

Of the various open data commitments mentioned in the PDG statement, this is the one I am personally most excited about: Ordnance Survey plans to release a new “Derived River Network” open data product.

My understanding is that this dataset will be derived from the new Water Layer in MasterMap. (The other relevant dataset in this space, the Detailed River Network that OS developed with the Environment Agency, is being deprecated.)

The River Network is a dataset that I have been campaigning for since 2012, via the ODUG process and other channels such as the Defra Transparency Panel.

Since then more spatial data about our rivers has become available, most recently the Cycle 2 draft of the EA’s Water Framework Directive (WFD) River Waterbodies dataset (last month’s post). However I am gratified to see that there is now sufficient support for open release of a good general-purpose vector map of the river network.

The Derived River Network dataset will complement the Environment Agency’s recent open data release of live flood warning and river level feeds as well as plans to release the NaFRA national flood risk dataset.

Photo credit: Open_Data_stickers.jpg by Jonathan Gray (CC0 1.0). It’s an iconic image and I was too lazy to find something more imaginative to illustrate this post.

15 Jul

Post: 15 July 2014

Yesterday the Government published the outcome of its recent public consultation on privatisation of Land Registry.

The Government received 304 formal responses to the consultation. 264 responded directly to the objective questions.

Of those, 91% of respondents answered No to the following question:

Q1. Do you agree that by creating a more delivery-focused organisation at arm’s length from Government, Land Registry will be able to carry out its operations more efficiently and effectively for its customers?

This is the key question indicating support or opposition to the privatisation plan. 5% of respondents agreed and 4% were not sure.
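As a rough check on scale, the rounded percentages can be turned back into approximate respondent counts. The sketch below assumes that all three percentages share the base of 264 direct respondents; because the published figures are rounded, the counts are estimates only.

```python
# Approximate respondent counts implied by the rounded Q1 percentages.
# Assumption: all three percentages use the 264 direct respondents as base.
DIRECT_RESPONDENTS = 264
Q1_SHARES = {"No": 0.91, "Yes": 0.05, "Not sure": 0.04}


def approximate_counts(total, shares):
    """Convert rounded shares into nearest-whole-number counts."""
    return {answer: round(total * share) for answer, share in shares.items()}


q1_counts = approximate_counts(DIRECT_RESPONDENTS, Q1_SHARES)
for answer, count in q1_counts.items():
    print(f"{answer}: roughly {count} respondents")
```

On those assumptions, around 240 of the 264 direct respondents answered No.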

According to media reports, Land Registry privatisation plans have now been abandoned by ministers. However the Government has told Parliament only that “at this time, no decision has been taken”. The local MP for Durham, where Land Registry employs 400 people, has said she would be seeking reassurances the plan had been scrapped — and not just delayed.

So who supports privatisation of Land Registry?


We know that an overwhelming majority of consultation responses opposed privatisation of Land Registry. But who does support privatisation?

The Government has published 105 consultation responses, excluding “those which were explicitly marked as from individuals or those we believe are a personal response.”

Those respondents (organisations, businesses, representative bodies, trade unions, etc.) are listed in this spreadsheet, with their responses to Q1.

With a couple of exceptions, SMEs oppose privatisation of Land Registry. Legal representatives, local councils, trade unions and professional organisations also consistently said No.

However the large businesses that responded to the consultation almost all supported privatisation. The following were unequivocally in favour:

Capita
Decision Insight Information Group (DIIG)
Equiniti
IBM
Landmark Information Group
Silver Lake Europe LLP
Teranet

Capita is the UK’s largest business process outsourcing and professional services company. IBM is a well-known global technology company.

DIIG is the UK and Ireland’s leading property searches group. Landmark is a reseller of property related environmental risk information and digital mapping. (DIIG and Landmark are both owned by DMGT, which also owns the Daily Mail.)

Equiniti is the UK’s leading share registration business, and owned by the private equity group Advent International. Silver Lake Europe is the European branch of the US-based Silver Lake private equity firm. (Silver Lake’s consultation response was marked “PRIVATE & CONFIDENTIAL”. The Government released it anyway; possibly an oversight.)

Teranet is a Canadian company that operates electronic property search and registration services, equivalent to Land Registry, in the provinces of Ontario and Manitoba.

Photo credit: HM Land Registry Office, Croydon by Heortlea, CC BY-SA 3.0

11 Jul

Post: 11 July 2014

There are various ways to judge the extent to which local authorities have engaged with the open data agenda. One of these is to look at whether a council is delivering the bare minimum required by Government.

In England that bare minimum is the publication of spending data, i.e. payments to suppliers, for reuse under an open licence.

I would certainly not argue that spending data is the most important dataset held by local authorities. However it was an early priority for the current Government. Spending data was the top bullet point for local government in the Prime Minister’s "transparency" letter of May 2010. It was also central to the Code of Recommended Practice issued by DCLG in 2011. (If all goes to plan an enhanced version of that code will shortly be mandatory for English councils.)

At this point, several years into the current Government’s programme, it is unsurprising to find that nearly all councils in England — 349 out of 353 — do publish and maintain an archive of their spending data online. (Spending data is also collected and analysed on a number of third party sites.)
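Whether a council is maintaining an archive, rather than just publishing the latest month, is something that can be checked mechanically from the published files. The following is a minimal sketch only: the sample rows, the column names (`payment_date`, `supplier`, `amount`) and the ISO date format are all assumptions for illustration, since councils do not publish to a single schema.

```python
import csv
import io

# Hypothetical extract of a council spending file. The column names and
# the YYYY-MM-DD date format are assumptions, not a real council schema.
SAMPLE_CSV = """payment_date,supplier,amount
2014-04-03,Acme Supplies Ltd,1250.00
2014-05-17,Example Services,640.50
2014-06-02,Acme Supplies Ltd,980.00
"""


def months_covered(csv_text, date_column="payment_date"):
    """Return the sorted list of distinct YYYY-MM months found in the file."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return sorted({row[date_column][:7] for row in reader})


months = months_covered(SAMPLE_CSV)
print(f"{len(months)} month(s) of data: {', '.join(months)}")
```

A result of a single month across everything a council publishes would suggest that no archive is being maintained.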

This provides us with a useful baseline for identifying those councils most resistant to the open data agenda. Based on this one metric, these are the four “least transparent” councils:


4. Enfield London Borough Council

Enfield does publish expenditures of £500 and above, in CSV format. However it only publishes the most recent month’s data.

3. Southwark London Borough Council

Southwark publishes spending over £250. However like Enfield it only publishes one month of data. The data is only available as a PDF file, and contains far less detail than the norm — just the vendor name and an amount.

2. Rother District Council

Rother does not publish its spending data online at all. Following is the explanation on the Council’s website:

Supplier expenditure data exceeding £500 has been removed from the Rother District Council web site on 18th November 2011.

For the foreseeable future spend data will not be published on this web site.

The removal of this data is to protect the Council and its Suppliers against fraudulent activities.

Requests for this information can be made via the normal Freedom of Information, (FOI) channels, with each request being given due attention under the rules and guidelines of this legislation.

(National Archives has captured some of the 2010 data files.)

1. Wigan Council

Wigan also does not publish its spending data online at all. This is the message on the Council’s website:

Information relating to our spend over £500 is available on request. Please email providing details of dates or any specific information you require.

Wigan edges out Rother as the “least transparent” council in England because, in addition to not publishing its spending data online, it also recently refused a Freedom of Information request for the missing data.

Wigan seems to have been motivated by the same vague concerns about fraud put forward by Rother. However the Information Commissioner was unconvinced and ordered the Council to release the data to the FOI requester:

"The Commissioner considers it extremely unlikely that the DCLG would issue recommendations which would expose local authorities throughout the country to potential fraud."

Well done, Wigan Council.

Is the above unfair to the councils?

There are lots of ways we can measure the open data performance of a public authority, and I’ve simplified for the sake of brevity.

Open data and transparency are related but not the same thing. As far as transparency is concerned, a robust commitment to Freedom of Information is more important than open data. Councils can also demonstrate support for open data by publishing useful datasets that have nothing much to do with the transparency agenda.

However none of the above four councils seem to be doing anything special in the open data space to offset the poor impression created by their refusal to publish spending data properly. (Enfield and Rother have no dataset records in the catalogue. Wigan has one record; ironically, for the spending data that it no longer publishes. Southwark has a slew of records for geographic datasets, but none of them are open data.)

Photo credit: The Civil Centre in Wigan by Dave Green, CC BY-SA 2.0
