Data management realities – the role of the chief data officer

7 December 2016

As data volumes continue to grow, and regulation-driven transparency becomes a priority for the industry and its consumers, Virginie O’Shea, research director at Aite Group, looks at the role of the chief data officer and what they need to be aware of.

The post-crisis era of increased transparency has resulted in many firms stepping in to assess the quality and the management of the core reference data sets on which they are basing their trading, risk management and operational decisions. Reporting requirements to regulators and clients have also escalated in this environment, which, in turn, has increased firms’ internal data aggregation, storage and management requirements. This has given rise to the establishment of more data governance programmes within firms across the industry, and the creation of C-suite executive positions dedicated to championing data management and data governance at the enterprise level.

Whether they are called chief data officer (CDO) or any other variant, these individuals face a tough task ahead – how do you effectively manage data across operational and geographic silos, and introduce a culture of data responsibility within the business, while meeting ongoing tactical and strategic targets, and cost-saving goals?

Aite Group research indicates that as of the end of October 2015, 24% of the top 100 global banks, ranked by value of assets under management (AUM), had a CDO in place. Some of these banks had even hired more than one CDO to focus on their various divisions and data sets.

Data in the future

The capital markets are faced with a hectic regulatory outlook over the next three years and many of these regulations have direct implications for the way data is stored, managed and communicated. The Markets in Financial Instruments Regulation (MiFIR), due to be implemented in January 2018 in Europe, for example, extends the number of fields for transaction reports to 64, up from 24 under its predecessor, increases the scope of reporting from equities to nearly all other instruments and sets minimum data storage requirements of five years for this information. Global over-the-counter (OTC) derivatives regulations have also introduced new reporting requirements and related market infrastructures in the form of trade repositories.

Not only do these regulations increase the volume of reported data, they also directly scrutinise the quality of the data and require firms to adopt new standards and classifications, and provide a data audit trail. To this end, the Basel Committee on Banking Supervision (BCBS) 239 compels banks to adopt new data management principles that are focused on supporting risk management reporting and governance. Tier-1 banks had to adopt the data aggregation principles by January 2016, which necessarily involved a focus on improving data ownership by the business and the introduction of consistent data quality frameworks across geographies and the enterprise.

A watchful eye

Regulators are keen to see an audit trail for data underlying many financial institutions’ operational decisions such as risk analytics or the pricing of trades, which means that data must be tagged with metadata to prove lineage and provenance of individual items. Some firms are working on centralisation programmes in order to store and scrutinise this data in a comprehensive manner as well as to bring down data costs in the long term. Given that many incoming regulations require the storage of this data for a minimum period of five years, the scale of the data problem is becoming much greater than before.
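To make the idea of tagging data with lineage and provenance metadata concrete, the following is a minimal sketch in Python. It assumes a simple in-memory model; the class names (`TaggedValue`, `LineageStep`) and the system names (`vendor_feed_A`, `golden_copy`, `risk_engine`) are hypothetical and not drawn from any particular firm's architecture or vendor product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class LineageStep:
    """One hop in a data item's journey (all names here are illustrative)."""
    system: str
    action: str
    at: datetime


@dataclass
class TaggedValue:
    """A data item that carries its provenance (source) and lineage (hops)."""
    name: str
    value: float
    source: str
    lineage: list = field(default_factory=list)

    def record(self, system: str, action: str) -> None:
        # Append a timestamped step each time a system touches the item.
        self.lineage.append(LineageStep(system, action, datetime.now(timezone.utc)))

    def audit_trail(self) -> list:
        # Render the lineage in order, oldest step first.
        return [f"{step.system}: {step.action}" for step in self.lineage]


# Hypothetical example: a price flowing from a vendor feed to a risk engine.
price = TaggedValue("closing_price", 101.25, source="vendor_feed_A")
price.record("vendor_feed_A", "received")
price.record("golden_copy", "validated")
price.record("risk_engine", "consumed")

for line in price.audit_trail():
    print(line)
```

In practice this metadata would more likely be held in a database or a dedicated lineage tool than in application objects, but the principle is the same: each item records where it came from and which systems have handled it.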

A useful industry-wide proxy for scale is the US equity and options markets’ consolidated audit trail (CAT) initiative, and in particular the projected data capacity requirements of the new system after a five-year implementation period. The CAT proposal estimates that five years’ worth of US equity and options quote, order and execution data will total around 21 petabytes (PB) of data, which is equivalent to the amount of data Google was processing every day in 2008.

If the US market represents around half of the global equities and options market, then a global CAT could potentially require anywhere between 42 and 50PB of data in the same time period. While this is a significant, potentially big-data, challenge in terms of volume, it should be noted that this is structured data provided by market infrastructures and vendors; hence, the complexity and variety of the data are less of a problem than they would be for unstructured data formats.
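The back-of-the-envelope scaling above can be reproduced in a few lines of Python. This is only a sketch of the arithmetic: the 21PB figure and the "around half" market share come from the text, while the 252 trading days per year and the use of decimal units (1PB = 1,000TB) are assumptions added here for illustration.

```python
CAT_FIVE_YEAR_PB = 21          # from the CAT proposal estimate cited above
US_SHARE_OF_GLOBAL = 0.5       # "around half", as stated in the text
RETENTION_YEARS = 5
TRADING_DAYS_PER_YEAR = 252    # assumption, not from the source
TB_PER_PB = 1000               # assumption: decimal units


def global_estimate_pb(us_pb: float, us_share: float) -> float:
    """Scale a US-only volume up to a global figure using the US market share."""
    return us_pb / us_share


global_pb = global_estimate_pb(CAT_FIVE_YEAR_PB, US_SHARE_OF_GLOBAL)

# Average daily US volume implied by the five-year total.
us_tb_per_day = CAT_FIVE_YEAR_PB * TB_PER_PB / (RETENTION_YEARS * TRADING_DAYS_PER_YEAR)

print(f"Global five-year estimate: {global_pb:.0f}PB")
print(f"Implied US average: {us_tb_per_day:.1f}TB per trading day")
```

A US share slightly below one half pushes the global figure above 42PB, which is consistent with the 42 to 50PB range quoted.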

Beyond the standard

The technology environments that data management teams are currently dealing with are dominated by standard relational databases, though there is a fairly high level of data-related tooling, including tools focused on integration, business intelligence and data quality. Historically, in-house teams have built data management technology because of a desire to craft a platform that meets the bespoke needs of the business, or so the theory goes. The reality of such projects is that they are often subject to ‘scope creep’ and delays due to reliance on internal IT staff members who must split their time over multiple projects. A common complaint by business heads is that the support of new asset classes takes too long because the data management team must understand and model the required data components, and the IT department has to build the required functionality. Firms that have a rigid, canonical data model underlying their internal data architecture must gather end-user requirements into a definition document on which the new asset class schema design must be based.

Though the primary focus of the data management function is on dealing with structured data residing in relational databases or warehouses, there has been an uptick in requirements to manage unstructured data items such as source documentation stored in PDF format, especially in the realm of legal entity data and know-your-client (KYC) contexts. The adoption of electronic trading has spread across different regions and asset classes, which has meant the diversity of data sources and sheer volume of data have also increased substantially over the past decade.

Most firms are dealing with a fragmented technology environment, and this means that data management teams face a significant challenge to aggregate internal data and respond to internal business, client and regulatory demands. As part of an industry-wide push to reduce operational risk, these firms are also attempting to improve the manner in which they measure the maturity of their key internal data sets.

Focus areas for improving data sets

Data-quality metrics such as accuracy, completeness, timeliness, integrity, consistency and appropriateness of individual data items.
How satisfied the business end users and clients are with regard to data transparency and availability overall.
The amount of manual reconciliation or data cleansing that must be done to get data into the right state for consumption, including the cost and number of employees dedicated to these tasks.
The ability of the business to respond to data requirements (internal and external) in a timely manner and not be hampered by scalability issues.
The elimination of legacy applications and systems via the consolidation of data environments.
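
As an illustration of how two of the data-quality metrics listed above, completeness and timeliness, might be measured, here is a minimal Python sketch. The required fields, the seven-day freshness threshold and the sample records (including the made-up identifiers) are all hypothetical; a real programme would define these per data set and agree them with the business.

```python
from datetime import date

# Hypothetical required fields for a reference data record.
REQUIRED_FIELDS = ("isin", "lei", "currency")

# Made-up sample records; the identifiers are placeholders, not real codes.
records = [
    {"isin": "XS0000000001", "lei": "LEI-A", "currency": "EUR", "updated": date(2016, 12, 1)},
    {"isin": "XS0000000002", "lei": None, "currency": "USD", "updated": date(2016, 11, 1)},
    {"isin": "XS0000000003", "lei": "LEI-C", "currency": "", "updated": date(2016, 12, 5)},
    {"isin": "XS0000000004", "lei": "LEI-D", "currency": "GBP", "updated": date(2016, 12, 6)},
]


def completeness(rows, fields):
    """Share of rows in which every required field is populated."""
    complete = sum(1 for row in rows if all(row.get(f) for f in fields))
    return complete / len(rows)


def timeliness(rows, as_of, max_age_days):
    """Share of rows updated within max_age_days of the as_of date."""
    fresh = sum(1 for row in rows if (as_of - row["updated"]).days <= max_age_days)
    return fresh / len(rows)


as_of = date(2016, 12, 7)
completeness_score = completeness(records, REQUIRED_FIELDS)
timeliness_score = timeliness(records, as_of, max_age_days=7)

print(f"Completeness: {completeness_score:.0%}")
print(f"Timeliness: {timeliness_score:.0%}")
```

Tracking scores like these over time, per data set, is one way a firm could evidence whether the maturity of its key data sets is improving.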

Virginie O’Shea is a research director with Aite Group, heading up the Institutional Securities and Investments practice, and covering data management, collateral management, legal entity onboarding and post-trade technology.