The Evolution of Big Data Joe: 2025 Edition
What I Got Right, What I Misjudged, and What I Learned After 8 More Years in the Trenches
Back in January 2017, I wrote a post called “The Evolution of Big Data Joe.” At that time, I had just moved from consulting into Western Digital to build their Big Data & Analytics ecosystem from scratch. So much has changed since then — for me personally, for the industry, and for how we think about data, cloud, AI, HPC, and engineering enablement.
Today, after leading large global data, cloud, and IT organizations across Western Digital, Intel, Solidigm, and Marvell, and working at the intersection of Big Data + HPC + Cloud + AI + EDA … I’ve had a front-row seat to the evolution of everything I believed in 2017.
Here’s what I got right… what I got wrong… and what I see differently now.
1. “Hadoop is not Big Data.”
2025 Update: Still true… and even more true than I realized.
In 2017, I said Hadoop wasn’t Big Data.
In 2025, we can add: Hadoop is no longer Big Anything.
The entire ecosystem moved:
Hadoop → Spark
HDFS → Object Storage (S3, GCS, ADLS, on-prem S3-compatible stores)
YARN → Kubernetes
MapReduce → Spark / Flink / Ray
Hive → Snowflake / BigQuery / Databricks / Iceberg
I was right that the “ecosystem matters more than Hadoop.”
But I underestimated how fast the ecosystem would leave Hadoop behind.
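To make that shift concrete, here’s a minimal PySpark sketch of the modern pattern: Spark reading Parquet straight from object storage instead of HDFS. The bucket and path names are placeholders, and it assumes the S3A connector and credentials are already configured in your environment.

```python
# Minimal sketch: Spark + object storage instead of Hadoop/HDFS.
# Assumes the Hadoop S3A connector and credentials are already configured;
# the bucket and path names are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("object-storage-analytics")
    .getOrCreate()
)

# Read Parquet directly from S3 -- no HDFS, no YARN, no MapReduce.
events = spark.read.parquet("s3a://my-data-lake/events/")

daily_counts = (
    events.groupBy("event_date")
    .count()
    .orderBy("event_date")
)

daily_counts.show()
```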
2. “MPP is needed.”
2025 Update: Yes… but now MPP is invisible.
Back then, I argued that MPP engines like Redshift, Teradata, and Netezza were essential.
Today, MPP is everywhere — you just don’t think about it anymore:
BigQuery abstracts everything.
Snowflake turned MPP into “elastic compute clusters.”
Databricks SQL closed most of the gaps.
Presto/Trino matured dramatically.
Iceberg/Delta/Hudi gave us real open table formats.
What I got wrong:
I thought Hadoop-native SQL engines would never get there.
But Trino + Iceberg has proven that assumptions age fast.
What I got right:
Relational analytics never died. It simply moved to the cloud and became elastic.
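One hedged illustration of how “invisible” MPP has become: a few lines against a Trino cluster with an Iceberg catalog are enough to fan a query out across a distributed engine. This sketch uses the open-source `trino` Python client; the host, user, catalog, schema, and table names are all assumptions.

```python
# Minimal sketch: querying an Iceberg table through Trino.
# Host, user, catalog, schema, and table names are hypothetical;
# assumes the open-source `trino` client is installed (pip install trino).
from trino.dbapi import connect

conn = connect(
    host="trino.example.internal",
    port=8080,
    user="analyst",
    catalog="iceberg",
    schema="analytics",
)

cur = conn.cursor()
cur.execute(
    """
    SELECT event_date, count(*) AS events
    FROM web_events
    GROUP BY event_date
    ORDER BY event_date
    """
)

# The MPP fan-out, shuffles, and table-format handling all happen behind this call.
for row in cur.fetchall():
    print(row)
```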
3. “NoSQL should be considered.”
2025 Update: Correct… and now it’s mainstream and boring (in a good way).
MongoDB, Cassandra, Redis — all still here.
But the ecosystem expanded:
DynamoDB
Bigtable
ScyllaDB
AlloyDB for PostgreSQL
And vector DBs like Pinecone, Milvus, Chroma
What I underestimated:
The fusion of NoSQL + SQL.
Today you see hybrid engines everywhere … Postgres with JSON, AlloyDB with vectors, Redis with search.
NoSQL is no longer about odd workloads … it’s just part of normal architecture.
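A hedged sketch of that fusion: one Postgres query serving relational columns and document-style JSONB side by side. The connection string, table, and column names are placeholders, and it assumes `psycopg2` (or any Postgres driver) plus an existing JSONB column.

```python
# Minimal sketch: relational + document-style access in a single Postgres query.
# Connection string, table, and column names are hypothetical placeholders;
# assumes psycopg2 is installed and a JSONB column named `payload` exists.
import json
import psycopg2

conn = psycopg2.connect("dbname=appdb user=app host=localhost")
cur = conn.cursor()

# Relational filter + JSONB containment + JSON field extraction in one statement.
cur.execute(
    """
    SELECT order_id, payload->>'customer' AS customer
    FROM orders
    WHERE status = %s
      AND payload @> %s::jsonb
    """,
    ("shipped", json.dumps({"channel": "web"})),
)

for order_id, customer in cur.fetchall():
    print(order_id, customer)

cur.close()
conn.close()
```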
4. “Train data scientists internally… context matters.”
2025 Update: 100% correct… but the definition changed.
In 2017, I said “Data Scientist is not a general title” … and that’s still true.
But today:
AI Engineers
ML Engineers
Analytics Engineers
Prompt/Model Engineers
…have blurred into a new type of technical athlete.
Back then I said companies should grow “Data Hackers.”
Today, the modern version is:
“Full-Stack Data Engineers who understand the business.”
And in 2025, AI copilots dramatically accelerate their learning curve.
I was right about context.
I was wrong that ramp-up would always be slow … AI now collapses the ramp time.
5. “Big Data needs a dedicated IT team.”
2025 Update: Still true… but today that team looks completely different.
In 2017, I imagined a cross-functional “DevOps for Big Data” team.
Today, that team is more like:
Platform Engineering
SRE
Cloud Engineering
Data Platform Engineering
ML Platform Engineering
FinOps
HPC/EDA Infrastructure
Full-stack Observability
And in semiconductor + engineering-focused companies (Intel, Solidigm, Marvell):
IT has become a strategic enabler.
Not service desk.
Not ticket takers.
Not “keep the lights on.”
The modern IT teams I lead today are a fusion of:
HPC + EDA + Cloud + Storage + Observability + FinOps + Automation + Engineering Experience.
That didn’t exist when I wrote the post in 2017.
6. “Hadoop can exist in the cloud successfully.”
2025 Update: Hadoop didn’t just move to the cloud… it disappeared into it.
What I said then:
“Today, I wouldn’t deploy Hadoop any other way.”
What I’d say now:
“Today, I wouldn’t deploy Hadoop at all.”
“Use Spark + object storage and keep moving.”
Cloud didn’t just make Hadoop viable… it made it obsolete.
And ironically… everything we ran on a Hadoop cluster is now serverless or containerized.
7. What I never saw coming in 2017
Here’s what 2017 Joe couldn’t have predicted:
HPC and EDA workflows moving to the cloud at scale
Massive GPU footprints for AI/LLMs
Hybrid EDA flows spanning on-prem NetApp + AWS Nitro fleets
FinOps as a core discipline in engineering
Data pipelines becoming real-time and ML-driven
AI copilots in every engineering workflow
S3/GCS becoming the center of gravity for everything
Iceberg/Delta/Hudi becoming the new file system
AI/ML shaping how storage, compute, and networks are designed
Omni-Path/InfiniBand decisions becoming cloud decisions
2017 was Big Data.
2025 is Cloud + AI + HPC + Data + Engineering Experience as one ecosystem.
8. What hasn’t changed
What’s still true:
The landscape is always shifting
Architecture is never done
Talent beats tools
Context beats credentials
Platforms matter more than products
Engineering and IT win together
Data is still only valuable when it impacts the business
Most of all:
I still love this space… maybe more than ever.
So… what have you changed your mind about?
I’ve shared mine… now I’d love to hear yours.
Thanks for reading.
— Big Data Joe