Seoul Restaurant Survival

Of 528,000+ Seoul restaurants, only 35% reach 10 years. Where you open matters more than what you cook.

Timeline: 2024
Company: Independent
Role: Analyst
Stack: Python · pandas · lifelines · Matplotlib

Overview

The median Seoul restaurant lifespan is 6.1 years. In Jongno-gu the median reaches 10 years; in Yangcheon-gu it is 4.7.

I pulled 528,000+ restaurant license records from Seoul's open-data portal and treated business lifespan as a survival problem. Kaplan-Meier curves, log-rank tests, and Cox proportional hazards regression isolated which covariates actually predict survival — and the answer wasn't cuisine.

Technologies

Language: Python
Analysis: pandas, lifelines, NumPy
Visualization: Matplotlib, seaborn
Data source: Seoul Open Data

The Problem

"Don't open a Korean place in Gangnam, the market's saturated" — except is it?

Restaurant survival advice in Korea is mostly folklore: cuisine choice, neighborhood buzz, foot traffic. None of it is grounded in a long-tail dataset of actual openings and closures. And without survival analysis, which handles right-censoring properly, you can't separate "still open" from "lived a long life": a restaurant that is still operating either gets dropped or gets counted as if it closed at the cutoff, and both choices understate lifespans.
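To make the censoring point concrete, here is a minimal sketch of the Kaplan-Meier product-limit estimator in plain Python. The toy durations are made up for illustration and the function names are hypothetical; the project itself relied on lifelines rather than hand-rolled code.

```python
from collections import Counter


def kaplan_meier(durations, observed):
    """Return [(t, S(t))] at every time with at least one recorded closure."""
    closures = Counter(t for t, e in zip(durations, observed) if e)
    exits = Counter(durations)  # closed or censored, everyone leaves at their own time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = closures.get(t, 0)
        if d:
            # Only recorded closures lower the curve.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        # Censored restaurants still counted in the risk set up to time t.
        at_risk -= exits[t]
    return curve


def median_survival(curve):
    """First time at which the survival estimate is at or below 50%."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # the curve never reaches 50%, so the median is not observed


# Toy data (made up): years in business, and whether a closure was recorded.
durations = [2, 3, 3, 5, 8, 8]
observed = [True, True, False, True, False, False]

curve = kaplan_meier(durations, observed)
km_median = median_survival(curve)

# Naive alternative: average lifespan of the restaurants that closed.
closed = [t for t, e in zip(durations, observed) if e]
naive_mean_closed = sum(closed) / len(closed)
```

On this toy data the Kaplan-Meier median is 5 years, while averaging only the three recorded closures gives about 3.3 years, because the restaurants still open at 3 and 8 years never enter that average.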

How might we treat 528K Seoul restaurants as a survival cohort and rank what actually predicts longevity, with statistical rigor instead of intuition?

Product Vision

A district-and-cuisine survival explorer for anyone deciding where to open.

The end state is a public tool: pick a district + cuisine combination, then see its Kaplan-Meier curve, its median lifespan, and how its hazard compares to alternatives.

Core architecture

Censoring-aware survival analysis on 528K licenses

Restaurants still active at the dataset cutoff are treated as right-censored. The pipeline fits a Kaplan-Meier curve per stratum (district × cuisine), runs log-rank tests for between-group differences, and fits a Cox PH model to estimate hazard ratios while controlling for confounders.
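The censoring step can be sketched with pandas as below. The column names (`opened_on`, `closed_on`, `district`, `cuisine`), the three sample rows, and the cutoff date are all assumptions made for illustration; the actual open-data export uses its own schema.

```python
import pandas as pd

# Hypothetical schema and made-up rows standing in for the license records.
records = pd.DataFrame({
    "district": ["Jongno-gu", "Jongno-gu", "Yangcheon-gu"],
    "cuisine": ["Korean", "Cafe", "Korean"],
    "opened_on": ["2010-03-01", "2018-06-15", "2015-01-10"],
    "closed_on": ["2016-03-01", None, "2019-01-10"],  # None = still active
})

CUTOFF = pd.Timestamp("2024-01-01")  # assumed dataset cutoff

opened = pd.to_datetime(records["opened_on"])
closed = pd.to_datetime(records["closed_on"])

# event = 1 when a closure was recorded, 0 when the record is right-censored.
records["event"] = closed.notna().astype(int)

# Censored restaurants are only followed up to the cutoff date.
end = closed.fillna(CUTOFF)
records["duration_years"] = (end - opened).dt.days / 365.25
```

The resulting `duration_years` and `event` columns are the two inputs a survival estimator such as lifelines' `KaplanMeierFitter.fit` takes (durations and `event_observed`).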

  • 528K+ license records
  • 25 districts stratified
  • 6.1 yr median lifespan

My Contribution

End-to-end: data acquisition through statistical write-up.

Built the entire pipeline solo — from pulling raw open-data CSVs to fitting models to producing the figures. The interesting work wasn't running the model; it was cleaning license records that had inconsistent district codes, then designing comparisons that actually answered the "where should I open" question.

What I worked on

  • Cleaning 528K license records with inconsistent district codes and date formats
  • Stratified Kaplan-Meier fits across district × cuisine combinations
  • Pairwise log-rank tests with Bonferroni correction across 25 districts
  • Cox proportional hazards regression controlling for cuisine, size, and opening year
  • Matplotlib visualizations of survival curves with confidence bands

Where It's Going

From a static notebook into an interactive survival explorer.

Next: a Streamlit front-end where users pick district + cuisine and get the curve. Past that: extend the methodology to other Korean metros and to non-restaurant small-business categories — the survival framework generalizes.

  • Interactive: Streamlit explorer
  • Multi-city: extend beyond Seoul
  • Generalize: other SMB categories