Evolution of data in Web 2
Draws a parallel between the evolution of data in Web 2 and the lessons we can learn from it.
Looking at how data evolved in Web 2, we can trace the progress of the dotcoms launching in the late 90s: first keeping small amounts of data in files processed by Perl scripts via CGI; then relational databases such as Oracle and Microsoft SQL Server supporting business domains; then an exponential explosion of data leading to distributed NoSQL databases and processing systems, and map-reduce frameworks such as Hadoop; and finally data moving to the cloud, with the emergence of modern cloud data platforms such as Snowflake and BigQuery, which can crunch terabytes of data to extract valuable knowledge from it.
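The map-reduce paradigm that Hadoop popularized can be sketched in a few lines. This is a minimal, single-machine illustration of the idea (map, shuffle, reduce), not Hadoop's actual API; the word-count example and function names are ours:

```python
from collections import defaultdict
from itertools import chain

def map_phase(doc):
    # Mapper: emit a (word, 1) pair for every word in the document.
    return [(word.lower(), 1) for word in doc.split()]

def shuffle(pairs):
    # Shuffle: group all values by key, as the framework would between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reducer: combine each key's values into a single result.
    return {key: sum(values) for key, values in groups.items()}

docs = ["the web grew", "the data grew"]
mapped = chain.from_iterable(map_phase(d) for d in docs)
counts = reduce_phase(shuffle(mapped))
# counts: {'the': 2, 'web': 1, 'grew': 2, 'data': 1}
```

In a real Hadoop cluster the map and reduce phases run in parallel across many machines, with the shuffle handled by the framework; the programming model, however, is exactly this shape.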
The way Web 2 data technologies evolved shows a typically reactive approach to the challenges faced. Data systems first followed the demands of web applications; then, as data grew, organizations started dumping all their application data and logs into cheap cloud object storage, hoping to analyze it and extract knowledge from it later.
These large burial grounds of data on AWS S3 became known as data lakes, and in many organizations have since been rechristened "data swamps": a lack of planning and a shortage of data engineers prevented many orgs from realizing the potential of being a truly data-driven business, with all the data required for decision-making at managers' fingertips.
The data fabric architecture tried to address this by integrating data, in whatever system and form it is found, through a common interface accessible to decision makers. Still, it suffered from excessive centralization, shortages of the data engineers required to process custom requests, and subpar data quality.
Data mesh, the newest data engineering paradigm, breaks away from the centralized data warehouse or data lake idea and embraces a domain-driven, microservices-oriented architecture, in which the team that builds and maintains a microservice treats the data it produces as a data product. Since that team also possesses the domain knowledge, it can guarantee high-quality, meaningful data as its output. We will explore the concept of data mesh in later chapters, since it is the most relevant to the decentralized p2p architecture of Web 3.
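The "data as a product" idea can be made concrete with a small sketch. Everything here is illustrative, assuming a hypothetical `DataProduct` wrapper: the point is that the owning domain team declares the schema and quality checks itself, rather than deferring them to a central data team:

```python
from dataclasses import dataclass, field

@dataclass
class DataProduct:
    # A hypothetical data product: the domain team owns the schema
    # and the quality checks for the data it publishes to the mesh.
    name: str
    owner_team: str
    schema: dict                          # column name -> expected type
    quality_checks: list = field(default_factory=list)

    def publish(self, rows):
        for row in rows:
            for column, expected_type in self.schema.items():
                if not isinstance(row[column], expected_type):
                    raise TypeError(f"{column} violates the declared schema")
            for check in self.quality_checks:
                if not check(row):
                    raise ValueError(f"quality check failed for {row}")
        return rows  # in practice: write to the mesh's shared platform

# The checkout team publishes its orders as a data product it stands behind.
orders = DataProduct(
    name="orders",
    owner_team="checkout",
    schema={"order_id": str, "amount": float},
    quality_checks=[lambda row: row["amount"] >= 0],
)
published = orders.publish([{"order_id": "o-1", "amount": 9.99}])
```

Because the checks live with the producing team, consumers downstream can rely on the product's contract instead of cleaning the data themselves.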
The current state of data evolution in Web 3 looks like this:
We are now in a situation where all Web 3 metadata analytics is done either by indexers, such as The Graph protocol, which resemble the data warehouse / data lake concept in that they ingest anything they can find in L1 chains and IPFS, leaving it to the consumer to figure out the meaning and deal with the quality of the data; or by data marketplaces, such as Ocean Protocol, which allow publishing domain-specific, highly curated data products but say nothing about cross-domain integration. Projecting from Web 2's evolution that cross-domain integration will become important, can we suggest something for our vision of a Web 3-driven future, where everything in both worlds, physical and virtual, is tokenized as NFTs?
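To make the consumer-side burden concrete, here is what querying an indexer typically looks like. Indexers such as The Graph expose subgraphs over GraphQL; the endpoint URL and the entity names (`tokens`, `owner`, `tokenURI`) below are illustrative assumptions, not a real subgraph schema:

```python
import json

# Hypothetical subgraph endpoint -- a real one depends on the deployed subgraph.
SUBGRAPH_URL = "https://api.example.com/subgraphs/name/some-nft-subgraph"

# A GraphQL query asking for the first five tokens held by one owner.
query = """
{
  tokens(first: 5, where: { owner: "0xabc" }) {
    id
    tokenURI
  }
}
"""

payload = json.dumps({"query": query})
# A consumer would POST this payload to SUBGRAPH_URL (e.g. with
# urllib.request) and receive JSON rows back. Interpreting what the
# fields mean, and how trustworthy they are, is left entirely to the
# consumer -- exactly the data-lake problem described above.
```

Contrast this with a data-mesh-style product, where the publishing team would ship the schema and quality guarantees alongside the data.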