'Big Data' come to the global oil industry

OpenOil's Johnny West argues it can lead to more oversight and scrutiny.

What we realised when we put together the first guide to public domain data on the oil industry is that Big Data are coming to oil and gas. We only just scratched the surface in the first edition.

Right now, you can see who drilled to what depth in the North Sea last week. The Canadian province of Nova Scotia posted geological data it had gathered at a cost of $15 million on the Net to encourage investors, who eventually closed with a $900 million commitment. The Brazilian company Petrobras is talking of using open source algorithms to make interpretation more efficient. And it's only just beginning.

End of Easy Oil, rise of Digital Oil

At the level of production and exploration, data acquisition has increased exponentially, and interpretative algorithms by an order of magnitude again, functions of both Moore's Law of processor power and the end of Easy Oil. BP hired ships to process terabytes of data in place when surveying an area of Libyan waters the size of Ireland. Oil companies are talking in terms of the Digital Oil Field, while satellite imagery and surveillance aircraft use heat imaging and gravimetric techniques to extrapolate information about what lies below ground, at costs which are dropping by the year. Meters measure flows through pipelines to a fraction of a percent across multiple grades of crude oil and transmit their data to a gathering center in real time.

Downstream, EITI reports and initiatives such as the Open Government Initiative are beginning to yield data about money flows. Dodd-Frank, a law passed by the US Congress in 2010, is about to increase the flow of financial reporting by most oil and gas majors exponentially, simply so that they can comply with the requirements for listing on US financial markets.

And this is before we get to the growing ability of semantic techniques to mine large public data repositories to establish network graphs of acquaintance or business interest, and the gradual, machine-assisted dissolution of the language barrier to distributing such information in dozens of languages around the world. And the rising assertiveness of publics and transnational civil society, empowered to use such technologies, to be informed and consulted on decisions and operations that play a large part in determining their economic future.

Leading to openness or overload?

But there are two competing visions of what the arrival of Big Data means, and in this the extractive industries are possibly no different from other sectors.

The first, which we subscribe to at OpenOil, is that these trends predict something that still seems unimaginable to many today: an open oil industry whose profit centers, rents and operational workings are as visible as those of other business sectors, within a decade.

At the limit, it is possible to imagine that Dodd-Frank and parallel European legislation might provide enough information to allow reverse engineering of oil company business models. Think about it. Every payment of over $100,000 to any government in the world should be declared in public filings to the Securities and Exchange Commission, disaggregated most probably down to the level of a legal concession area. If we know what the tax rates are, and what production is, and what sales are, then it should be possible to use the tax payments data to make more refined guesses about how much profit companies are making country by country. And that, after all, is one of the most startlingly unknowable features of the oil industry today. You can sell a barrel of oil from Libya and one from Canada for $90, say, but if the one from Libya cost $6 to produce and the one from Canada cost $60, the Libyan barrel yields $84 of profit against $30 for the Canadian one, nearly three times as much. Same company, same staff, same product, nearly a threefold difference in profit. And what percentage of Company X's profits comes from Libya in any given year, or from Canada, or from any of the 40 other countries it might be operating in, is pretty much pure conjecture.
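The arithmetic behind the barrel comparison, and the simplest possible version of the reverse-engineering idea, can be sketched in a few lines of Python. This is an illustration only: the function names are ours, the flat tax rate and the $50 million payment are invented figures, and real fiscal regimes (royalties, cost recovery, production sharing) are far more complicated than a single rate.

```python
def margin_per_barrel(price, cost):
    """Pre-tax profit on one barrel: sale price minus production cost."""
    return price - cost


def implied_taxable_profit(tax_paid, tax_rate):
    """Back out taxable profit from a disclosed payment.

    Assumes one flat profit tax rate, which is a deliberate simplification.
    """
    if not 0 < tax_rate <= 1:
        raise ValueError("tax_rate must be greater than 0 and at most 1")
    return tax_paid / tax_rate


# The example from the text: both barrels sell for $90.
libya = margin_per_barrel(90, 6)     # $84 per barrel
canada = margin_per_barrel(90, 60)   # $30 per barrel
ratio = libya / canada               # 2.8, i.e. nearly three times

# Hypothetical disclosure: $50 million paid at an assumed 50% flat rate.
profit = implied_taxable_profit(50_000_000, 0.5)

print(f"Libya ${libya}, Canada ${canada}, ratio {ratio:.1f}x")
print(f"Implied taxable profit: ${profit:,.0f}")
```

The point is less the division itself than the inputs: once payments are disclosed per country or per concession, the only unknowns left are ones that can often be estimated from public contracts and production figures.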

This of course is one of the reasons companies gave in their objections to a wider scope for the Dodd-Frank regulations: that such disclosures would effectively give away the shop. But it should be noted that if it did happen, it would not be the end of the oil business by any measure. It would be more the transformation of oil into just another industry, with the kind of scrutiny that manufacturing or retailing, for example, regularly come under.

On the other hand, there is the possibility that this explosion of data, hard fought for and now coming, could simply bring information overload to the field of transparency in spades. Malcolm Gladwell gave the best explanation of this so far when he pointed out in his latest book “What the Dog Saw” that the Enron affair was many things, but hardly a cover-up. Jonathan Weil and the other journalists who broke the Enron scandal did it using documents Enron had filed with the SEC years before, documents that nobody had bothered to read. Then there was the group of graduate students at Cornell University who had made Enron a class project in 1998, years before the scandal broke, and concluded that shareholders should sell because the company was probably manipulating the stock price. According to this scenario, then, an explosion of available information does little or nothing to increase real oversight and scrutiny.

Making sense of big data: standards and semantics

We hold to the optimistic scenario, ultimately, for two reasons.

First, data are becoming more standardised in the transparency world. We see this both within and without EITI.

And second, the technology is emerging to deal with Big Data, in the form of semantics, data mining, and machine-assisted translation. Here in Berlin this year has seen the birth, for example, of the Wikidata project. Consider the fact that the information in Wikipedia is essentially dumb. Suppose you want to know the answer to the question: which are the ten most populous cities in the world with a woman as mayor? The information is certainly in there, but you can't just ask the question and get an answer, because no human or machine has parsed and analysed the statements in the data. You would have to find a list of the biggest cities and go through them one by one until you accumulated ten women mayors. Wikidata will provide the kind of database structure that Wikipedia lacks, simultaneously across 270 languages, while still remaining open to public editing.
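To see why structure matters, here is a minimal Python sketch of what the mayor question looks like once the facts are held as records rather than prose. It does not use Wikidata's actual interface; the City record, the function name and the placeholder entries are all invented for illustration, and the cities are deliberately not real ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class City:
    name: str
    population: int
    mayor_gender: str  # "female", "male" or "unknown"


def top_cities_with_woman_mayor(cities, n=10):
    """Return the n most populous cities whose mayor is recorded as a woman."""
    matching = [c for c in cities if c.mayor_gender == "female"]
    return sorted(matching, key=lambda c: c.population, reverse=True)[:n]


# Placeholder records, not real data.
cities = [
    City("City A", 9_000_000, "male"),
    City("City B", 7_500_000, "female"),
    City("City C", 3_200_000, "female"),
    City("City D", 12_000_000, "female"),
    City("City E", 1_100_000, "unknown"),
]

top_two = [c.name for c in top_cities_with_woman_mayor(cities, n=2)]
print(top_two)  # ['City D', 'City B']
```

With structured statements the question becomes a filter and a sort. With free text, it is the one-by-one reading exercise described above.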

It's just one initiative, but an example of what is possible and coming in the public space.

It may be hard to see right now because we are in transition. The publishing model in and around the extractive industries is still dominated by a 20th century business model of scarcity and access. Pay $1,000 a year and we'll provide unique access, unparalleled analysis, say scores of newsletters, and for a time, based on the personal networks of the founder, that may sometimes be true in some regions. It's almost always oversold, though, extended so far and so thin that most of those customers should want their money back. In the Big Data world the possibility to add value lies not in the data but in the metadata - what the data actually mean. Can you find the ten women mayors from three million Wikipedia articles? Can you reverse engineer a reasonable estimate of what Company X must have grossed and what its Internal Rate of Return in Nigeria was last year? Can you generate network graphs to follow the trail of a chain of subsidiaries across six stages of ownership and four jurisdictions, back to the Cayman Islands? In the haystack of public domain data, can you establish rules which allow software programs to pick out the needles?
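The subsidiary question is, at bottom, a walk along a graph. The Python sketch below follows a chain of ownership upwards from an operating company. Everything in it is hypothetical: the company names, the jurisdictions assigned to them and the function name are invented, and it assumes each entity has exactly one parent, whereas real ownership structures often have several shareholders at each level.

```python
# Hypothetical registry: entity -> its single parent (a simplification).
OWNED_BY = {
    "Operating Co": "Holding 1",
    "Holding 1": "Holding 2",
    "Holding 2": "Holding 3",
    "Holding 3": "Holding 4",
    "Holding 4": "Holding 5",
    "Holding 5": "Ultimate Parent",
}

# Hypothetical jurisdiction of registration for each entity.
JURISDICTION = {
    "Operating Co": "Nigeria",
    "Holding 1": "Nigeria",
    "Holding 2": "Netherlands",
    "Holding 3": "Netherlands",
    "Holding 4": "Luxembourg",
    "Holding 5": "Luxembourg",
    "Ultimate Parent": "Cayman Islands",
}


def ownership_chain(entity, owned_by):
    """Follow parent links upwards until an entity with no recorded parent."""
    chain = [entity]
    seen = {entity}
    while entity in owned_by:
        entity = owned_by[entity]
        if entity in seen:
            # Guard against circular ownership records.
            raise ValueError(f"circular ownership at {entity!r}")
        seen.add(entity)
        chain.append(entity)
    return chain


chain = ownership_chain("Operating Co", OWNED_BY)
stages = len(chain) - 1                           # number of ownership links
jurisdictions = {JURISDICTION[e] for e in chain}  # distinct jurisdictions crossed

print(" -> ".join(chain))
print(f"{stages} stages, {len(jurisdictions)} jurisdictions, "
      f"ending in {JURISDICTION[chain[-1]]}")
```

The traversal is the easy part. The hard part, and the place where value is added, is assembling a registry like OWNED_BY from filings scattered across different countries and languages.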

The challenge for the transparency community here is to invest and skill up in techniques that are now common in the world of business intelligence - semantic Web, entity extraction, sentiment analysis. We need to recognise the unavoidably geeky nature of a lot of this work, and to marry that well to the political dynamic of genuine engagement with civil society and media in the EITI countries and elsewhere. False egalitarianism serves nobody. We should not be afraid - sometimes - to invest in complexity and implement "high end" projects if we can see how to bring the results back to the constituencies that matter.

If we don't do this, the transparency movement as a whole will remain under-equipped and overwhelmed by the flood of Big Data that is now coming to this sector.

OpenOil's guide to Data in the Oil Industry is available online at http://data.openoil.net and in print form on request. Johnny West is founder of OpenOil, a Berlin-based consultancy which works on energy sector governance.