Credit by Bill Schmarzo
Welcome to today’s technology architecture challenge!
For many companies, the technology architecture is becoming more of a hindrance than an enabler of value creation. It resembles an archeological dig, with layers of ancient technologies piled on top of even more ancient technologies. The result: added weight, obstructed vision, and a lack of flexibility, agility and mobility.
Modern digital companies like Google, Facebook, Twitter, Apple, Netflix, Amazon, and Airbnb have taken a technology architecture approach that builds on open source technologies and increasingly treats the technology infrastructure as “disposable.”
And the reason for this open approach, in my humble opinion, is two-fold:
Firstly, building upon open source technologies provides the flexibility, agility and mobility for companies to move to the next best technology without the constraints and architectural lock-in of traditional technology. Basing the technology infrastructure on open source technologies not only prevents vendor architectural lock-in but also allows these companies to advance their technology capabilities at their own pace and at the pace of the business.
Secondly and more importantly, these digital companies understand that the technology isn’t the source of business value and differentiation. They understand that the source of business value and differentiation lies in the data that these organizations are masterfully amassing via every customer engagement and every use of the product or service, and in the customer, product and operational insights (Intellectual Property in the form of customer, product and operational propensities, tendencies, associations, relationships and patterns) that lead to new Intellectual Property (IP) monetization and commercialization opportunities.
Understanding the Modern Digital Business Architecture
1. Lyft started with AWS Redshift but transitioned to Apache Hive when it started to run into scalability issues.
2. Migrated to Presto to provide a more powerful query engine that supports data exploration analytics across multiple data sources.
3. Uses Apache Spark to support massive ETL batch processing and to train its machine learning models.
4. Uses Druid, a column-oriented in-memory OLAP data store that excels at performing drill-downs and roll-ups over a large set of high dimensional data.
5. Uses Jupyter, a popular notebook-style interface for working with data and machine learning algorithms, together with the PySpark library (data science team).
6. Uses Apache Airflow, a workflow orchestration tool, to create repeatable data engineering and data science workflows that can be executed atop the container orchestration platform Kubernetes.
7. Uses a mixture of Apache Kafka, Apache Flink, and Spark to build streaming services.
Hive, Presto, Spark, Druid, Jupyter, Airflow, Kafka, Flink, Kubernetes…crazy-assed names for open source technologies that one is not going to find from the monolithic technology vendors of yesteryear (see Figure 1).
*Figure 1: Source: Hortonworks*
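To make the idea of a "repeatable workflow" from item 6 a little more concrete, here is a minimal sketch in plain Python. It is not Airflow's API and not Lyft's pipeline. It only illustrates the underlying idea that a workflow is a set of tasks with declared dependencies, and that an orchestrator runs each task once all of its upstream tasks have finished. The task names and the `run_pipeline` and `make_actions` helpers are hypothetical, invented for this illustration. The sketch relies on the standard library's `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical task names (assumptions for illustration only).
# Each task maps to the set of tasks that must finish before it can run.
PIPELINE = {
    "extract_rides": set(),
    "clean_rides": {"extract_rides"},
    "build_features": {"clean_rides"},
    "train_model": {"build_features"},
    "publish_report": {"clean_rides"},
}


def run_pipeline(dependencies, actions):
    """Run every task after its dependencies; return the execution order."""
    order = []
    # static_order() yields tasks so that dependencies always come first.
    for task in TopologicalSorter(dependencies).static_order():
        actions[task]()
        order.append(task)
    return order


def make_actions(dependencies, log):
    """Build stand-in actions that only record that the task ran."""
    return {name: (lambda n=name: log.append(n)) for name in dependencies}


if __name__ == "__main__":
    log = []
    order = run_pipeline(PIPELINE, make_actions(PIPELINE, log))
    print(" -> ".join(order))
```

Because the dependencies are declared as data rather than buried in scripts, the same workflow can be re-run, inspected, or moved to different infrastructure without rewriting the tasks themselves, which is the "disposable infrastructure" point in miniature.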
Lyft and other modern digital companies understand that their technology architecture should serve two basic purposes:
1. Facilitate the capture, refinement, curation, sharing, management, governance and analysis of the company’s invaluable data assets.
2. Build a technology architecture that doesn’t get in the way of point #1.
The Economics of the Modern Business
Data is the economic asset of lasting and differentiated value. Data is the source of the customer, product and operational insights that the modern company uses to differentiate its products and services while driving towards operational excellence. And Data Science is the heart of the data value creation process.
These data- and analytics-centric companies are tightly integrating the data science and business teams in order to define the parameters of analytics success and to identify, capture and operationalize the sources of customer, product and operational value creation (see Figure 2).