Renal Cell Carcinoma Region Detection and Subtyping Dataset

Complete Region Annotation

Areas inside the red lines are cancerous regions; areas outside them are non-cancerous.

Minimal Point Based Annotation

A red point marks a position in a cancerous region.

A green point marks a position in a non-cancerous region.

This dataset is derived from the TCGA database and covers three TCGA projects (KIRC, KIRP, and KICH). It contains 667 WSIs in total, all scanned at 40x magnification and selected by experienced pathologists.

  • Our RCC region detection dataset includes three subtypes, ccRCC, pRCC, and chRCC, with 123, 88, and 46 WSIs respectively.

  • Each WSI is annotated in two ways: Minimal Point-Based (Min-Point) annotation and Complete Region annotation.


  • The dataset provided here is for research purposes only. Commercial use is not allowed.

  • If you intend to publish research that uses any of these datasets, you must cite our publication.

Minimal Point-Based Annotation Rules

  1. Mark the same number of points on cancerous and non-cancerous regions. We set this number to 5 per class for the RCC datasets.

  2. Distribute the points evenly across the whole image.

  3. Do not mark points on blank, edge, badly stained, damaged (man-made), or otherwise abnormal areas.
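
Rules 1 and 2 can be sanity-checked automatically. The following is a minimal sketch, not part of the dataset tooling: the function names, the (x, y) tuple representation of points, and the grid-based spread heuristic are all assumptions made for illustration. Rule 3 depends on image content and is not covered.

```python
def check_point_counts(cancer_points, normal_points, per_class=5):
    """Return a list of problems with rule 1 (equal counts, 5 per class for RCC).

    Points are assumed to be (x, y) tuples; this representation is hypothetical.
    """
    problems = []
    if len(cancer_points) != per_class:
        problems.append(f"expected {per_class} cancer points, got {len(cancer_points)}")
    if len(normal_points) != per_class:
        problems.append(f"expected {per_class} normal points, got {len(normal_points)}")
    return problems


def occupied_cells(points, width, height, grid=3):
    """Count how many cells of a grid x grid partition contain at least one point.

    A rough heuristic for rule 2: a low count suggests the points are clustered.
    """
    cells = set()
    for x, y in points:
        col = min(int(x * grid / width), grid - 1)
        row = min(int(y * grid / height), grid - 1)
        cells.add((row, col))
    return len(cells)
```

An empty list from check_point_counts means the counts satisfy rule 1. What cell count qualifies as "evenly distributed" is a judgment call that the rules do not quantify.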


Statistics

Annotation Time

Min-Point annotation reduces annotation time to roughly one-twentieth of that required for complete region annotation. Noisy annotation is an annotation method that falls between Min-Point and Complete Region annotation.

Details

A large RCC classification dataset with two types of annotation. The test set is composed of two parts: a patch-level test set for cancer region detection and a WSI-level test set for subtyping.

Papers

Renal Cell Carcinoma Detection and Subtyping with Minimal Point-Based Annotation in Whole-Slide Images

Zeyu Gao, Pargorn Puttapirat, Jiangbo Shi, and Chen Li

Applications

Cancer Region Detection Results (Heat Maps)

Subtyping Results: Blue (ccRCC), Red (chRCC), Green (pRCC)

Data Format

This dataset is composed of two parts, and each sample is stored in an individual folder named after the TCGA ID of the corresponding .svs file:

  1. l_label: The complete region annotation is saved as a PNG file. Regions marked in green are cancer regions; regions marked in black are background or normal tissue.

  2. p_label: The Min-Point annotation is saved in two folders, cancer and normal. Each folder contains five PNG files that indicate the point-marked regions.
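
The folder layout described above can be indexed with a few lines of standard-library Python. This is a minimal sketch under one stated assumption: that the layout is root/l_label/<TCGA id>/*.png and root/p_label/<TCGA id>/{cancer,normal}/*.png. The exact nesting and file names are not specified here, so adjust the paths if the downloaded archive differs. The function name index_annotations is hypothetical.

```python
from pathlib import Path


def index_annotations(root):
    """Map each TCGA id to its annotation files.

    Assumed layout (not confirmed by the dataset description):
      root/l_label/<tcga_id>/*.png
      root/p_label/<tcga_id>/cancer/*.png
      root/p_label/<tcga_id>/normal/*.png
    """
    root = Path(root)
    index = {}

    def entry(tcga_id):
        # Create the record for a sample on first sight.
        return index.setdefault(tcga_id, {"region": [], "cancer": [], "normal": []})

    region_root = root / "l_label"
    if region_root.is_dir():
        for sample_dir in sorted(region_root.iterdir()):
            if sample_dir.is_dir():
                entry(sample_dir.name)["region"] = sorted(sample_dir.glob("*.png"))

    point_root = root / "p_label"
    if point_root.is_dir():
        for sample_dir in sorted(point_root.iterdir()):
            if sample_dir.is_dir():
                for cls in ("cancer", "normal"):
                    entry(sample_dir.name)[cls] = sorted((sample_dir / cls).glob("*.png"))

    return index
```

The resulting dictionary makes it easy to flag incomplete samples, for example any TCGA id whose cancer or normal list does not contain exactly five files.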

Note that only the annotation data is available here. The original WSIs (.svs files) must be downloaded from the TCGA portal, and the labeled and unlabeled image patches must be cropped from the original WSIs based on the annotation data.
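
As a starting point for patch cropping, a region mask can be turned into a per-tile cancer fraction, which then decides which tiles to crop from the WSI. The sketch below assumes NumPy is available and that the mask has already been loaded as an H x W x 3 (or H x W x 4) uint8 array. The colour threshold, the tile size, and both function names are illustrative assumptions. The scale factor between the mask and the 40x WSI is not stated here and must be determined from the data before mapping tiles back to slide coordinates.

```python
import numpy as np


def cancer_mask_from_rgb(rgb, threshold=128):
    """Boolean mask that is True where a pixel is green (cancer region).

    Assumes green means a high G channel with low R and B; black pixels
    (background or normal tissue) come out False.
    """
    rgb = np.asarray(rgb)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return (g >= threshold) & (r < threshold) & (b < threshold)


def tile_cancer_fraction(mask, tile):
    """Fraction of cancer pixels in each non-overlapping tile x tile block.

    Rows and columns that do not fill a complete tile are dropped.
    """
    h, w = mask.shape
    rows, cols = h // tile, w // tile
    trimmed = mask[: rows * tile, : cols * tile]
    blocks = trimmed.reshape(rows, tile, cols, tile)
    return blocks.mean(axis=(1, 3))
```

Tiles with a fraction near 1 are candidates for cancer patches and tiles near 0 for non-cancer patches; where to draw the cut-off for mixed tiles is a design choice left to the user.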