Category: Other · No API key required

Databricks Company

Databricks is a unified data and AI platform built by the creators of Apache Spark, offering the Lakehouse architecture, Delta Lake, and AI model training capabilities.

Author: hanxueyuanhubclawhub

Overview

Databricks is the unified data and AI platform founded by the creators of Apache Spark, valued at $43B as a private company.

When to Load This Skill

  • User asks about Databricks history, Apache Spark, or data/AI platforms
  • Need analysis of Databricks vs. Snowflake competition or Lakehouse architecture
  • Questions about MosaicML acquisition, DBRX model, or the AI data infrastructure market

Historical Timeline

  • 2013: Ali Ghodsi, Matei Zaharia (Spark creator), Ion Stoica, and four other co-founders from UC Berkeley's AMPLab found Databricks
  • 2013: Spark, first open-sourced in 2010, moves to the Apache Software Foundation and goes on to become a dominant big data processing engine
  • 2019: Introduces Delta Lake — open-source storage layer bringing ACID transactions to data lakes
  • 2020: Introduces the Lakehouse architecture, which unifies the data warehouse and the data lake
  • 2021: Revenue passes $500M; valued at $38B
  • 2023: Acquires MosaicML ($1.3B) and enters generative AI model training
  • 2024: Launches DBRX (open-source LLM); valued at $43B; revenue ~$2B+

Business Model

Databricks operates as a Platform-as-a-Service with consumption-based pricing across compute (Databricks Runtime), storage (Delta Lake tables), and AI/ML services. Unity Catalog provides governance. The company is expanding from data engineering into BI, AI/ML, and governance.

Competitive Moat

  • Apache Spark originators: deep technical authority and community influence
  • Delta Lake ecosystem: open-source standard that competitors must support
  • Lakehouse architecture: unifies data engineering, analytics, and AI — one platform instead of multiple tools
  • MosaicML acquisition: vertical integration from data infrastructure to model training
  • Open-source strategy: Spark, Delta Lake, MLflow create developer lock-in and community advocacy

Key Data

Valuation: $43B (private, 2024) | Revenue: ~$2B+ (2024) | Customers: 10,000+ | Spark users: 1M+ developers | Employees: ~7,000+

Interesting Facts

  • Apache Spark was originally a class project at UC Berkeley's AMPLab — the paper was rejected from two conferences before it became the most popular big data framework
  • Databricks is named after the fictional 'databrick' unit the founders jokingly used to measure Spark cluster processing capacity