Naveed Asem joined DFIN in 2016 as the head of data and analytics. He now leads a growing team of data architects and analysts focused on delivering business value from data through modern data management technologies, analytics and machine learning. The team is responsible not only for implementing DFIN’s enterprise data warehouse and data lakes, but also for everything from data ingestion and curation to the delivery of reporting and the filing of artifacts with regulatory bodies in the U.S. and EU.
In this interview, we asked Asem to share some best practices when it comes to data management and answer a few questions we get from clients to help demystify the world of data.
Let’s begin with a trending topic. Many of today’s financial firms aim to standardize data processes so they can efficiently disclose data with integrity. What are some of the tangible steps they can take on this front?
That’s a great question, because data integrity means different things at different stages of the data lifecycle, so its goals and purposes evolve along with it. To establish fundamentally sound data integrity practices, firms can begin by maintaining healthy audit, balance and control over data throughout the lifecycle. They can also maintain high data quality by establishing a reasonable data governance process. I say “reasonable” because many organizations adopt processes so complex that the cost outweighs the value they provide. And finally, firms can ensure that the data flowing through their organization is of high value for its given purpose. For instance, when the purpose is analytics, firms want to ensure that the data can actually offer useful insights.
And from the regulators’ side of things, what are industry leaders doing to receive and analyze data in ways that maximize what structured data can offer?
Well, precisely how regulators use data is the “secret sauce” of their work and therefore not always disclosed. But given the substantial potential of the data they collect, leading regulators are using it to understand market trends by looking at key indicators, to detect fraudulent and illegal financial practices, and to present an accurate, “pristine” view of financial facts. This view helps investors more clearly understand the risk and reward of their investments.
The four V’s — volume, variety, velocity and veracity — are often cited as the key dimensions for Big Data. But what is the most important Big Data dimension when it comes to digital disclosure?
In governance, risk and compliance, “validated” is actually the V we care about most. That’s because regulators assume and expect that firms will share regulatory information that is valid and trustworthy and that follows the expected structure.
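For readers who want a concrete picture of what “follows the expected structure” can mean in practice, here is a minimal Python sketch of a pre-submission validation check. The field names, types and rules are hypothetical illustrations, not any regulator’s actual schema or DFIN’s tooling.

```python
# Minimal sketch of the "validated" idea: check that a disclosure record is
# complete, well-typed and structurally sound before it goes out the door.
# The field names and rules below are hypothetical, not a real filing schema.
from datetime import date

REQUIRED_FIELDS = {"cik": str, "form_type": str, "period_end": date, "total_assets": float}

def validate(record: dict) -> list[str]:
    """Return a list of validation errors; an empty list means the record passes."""
    errors = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            errors.append(f"{field} should be {expected_type.__name__}")
    if not errors and record["total_assets"] < 0:
        errors.append("total_assets cannot be negative")
    return errors

print(validate({"cik": "0000320193", "form_type": "10-K",
                "period_end": date(2023, 9, 30), "total_assets": 352583000000.0}))
```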
Artificial Intelligence is another buzzword right now. What connections does AI have to disclosure?
We are already seeing how AI can help address disclosure challenges such as the need for transparency, for insights and for action on those insights. Because it is typically based on a large body of structured and unstructured information, disclosure can be very difficult for the average investor to comprehend and use in making decisions. AI can bridge that gap by uncovering hidden meanings and relationships. The result can be critical insights into, say, the illegal financial practices of companies or the risk associated with investments, at a level of depth that few human beings could reach on their own in a timely manner.
Speaking of depth, is there a simple way to define the meaning and importance of “data lake”?
A data lake is a simple concept often described with big words. At its simplest, a data lake helps firms manage data that may initially seem ambiguous or have no clear purpose. Because we generally agree that such data will be useful down the road, a data lake architecture integrates the collection, storage and consumption of both structured and unstructured data to help derive more meaningful value from it.
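To make the “collect first, structure later” idea tangible, here is a minimal Python sketch of a data lake raw zone. The directory layout, file names and sample records are hypothetical examples, not a description of DFIN’s platform.

```python
# Minimal illustration of a data lake "raw zone": land files as-is,
# whether structured (CSV) or unstructured (free text), then apply
# structure only when the data is consumed. Paths and contents are
# hypothetical.
from pathlib import Path
import csv

RAW_ZONE = Path("lake/raw/2024-06-01")   # hypothetical landing partition
RAW_ZONE.mkdir(parents=True, exist_ok=True)

# Land a structured file (e.g., a filing index) exactly as received.
with open(RAW_ZONE / "filings.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["cik", "form", "filed"])
    writer.writerow(["0000320193", "10-K", "2023-11-03"])

# Land an unstructured file (e.g., a narrative disclosure) alongside it.
(RAW_ZONE / "risk_factors.txt").write_text(
    "Our results may be affected by market volatility..."
)

# Consumption: structure is applied when the data is read, not when it lands.
with open(RAW_ZONE / "filings.csv", newline="") as f:
    for row in csv.DictReader(f):
        print(row["cik"], row["form"], row["filed"])
```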
How are you able to leverage Big Data potential to help bring greater insights to DFIN clients?
At DFIN, we collect data from disparate sources and process it to derive insights using cutting-edge tools. For instance, we use natural language processing and natural language generation both to extract structure where required and to build narrative text from structured facts. In this way we not only improve efficiency and speed to market but also unlock key capabilities that our competitors can’t match. Now, with SEC EDGAR data, we can provide clients with analytics and comparisons against their peers, sectors and defined comparison groups. In short, we can slice and dice data in even more of the ways clients may want.
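As a rough illustration of the kind of peer comparison described above, here is a hedged Python sketch that pulls one reported fact for a small peer group from the SEC’s public XBRL company-concept endpoint. The concept tag, CIKs and peer labels are illustrative assumptions, the endpoint details should be verified against SEC documentation, and this is not DFIN’s production pipeline.

```python
# Hedged sketch: compare one reported fact across a small peer group using
# SEC EDGAR's public XBRL "company concept" endpoint. The concept tag and
# CIKs are assumptions; some registrants report revenue under other tags.
import requests

HEADERS = {"User-Agent": "example analyst contact@example.com"}  # SEC asks for a contact UA
CONCEPT_URL = "https://data.sec.gov/api/xbrl/companyconcept/CIK{cik}/us-gaap/{tag}.json"

# Hypothetical peer group keyed by 10-digit CIK.
PEERS = {"0000320193": "Company A", "0000789019": "Company B"}

def latest_annual_value(cik: str, tag: str = "Revenues"):
    """Return the most recently reported USD value from a 10-K, or None."""
    resp = requests.get(CONCEPT_URL.format(cik=cik, tag=tag), headers=HEADERS, timeout=30)
    resp.raise_for_status()
    facts = resp.json().get("units", {}).get("USD", [])
    annual = [f for f in facts if f.get("form") == "10-K"]
    if not annual:
        return None
    return max(annual, key=lambda f: f["end"])["val"]

for cik, name in PEERS.items():
    print(name, latest_annual_value(cik))
```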