Region-based Annotated Child Pornography Dataset (RCPD)

This dataset is a private database owned by the Brazilian Federal Police. Its structure is described in the paper “A Benchmark Methodology for Child Pornography Detection”. The dataset is intended for assessing and comparing the performance of child pornography detection methods. Its files are not publicly available in any form, but researchers can submit their detection methods for evaluation against RCPD by following the instructions below.

Evaluation Instructions

To submit your method, send an email to macedo.jjmn@dpf.gov.br with a zip file containing your method and a script for building a Docker container (such as a Dockerfile) that defines its libraries and dependencies. Alternatively, you can email a link to these files. You must also provide code that runs your method on the test images and meets the following requirements:

  • assume that the test images are located in a folder named “data” inside your method’s directory;
  • write a comma-separated (CSV) file named results.txt containing, on each line, the name of a tested image and your prediction for child pornography (‘True’ or ‘False’):

filename001.jpg,True
filename002.jpg,False
filename003.jpg,False
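Before submitting, it may help to check that results.txt matches the format above. The sketch below is a hypothetical helper (parse_results is not part of the RCPD instructions); it assumes one "filename,prediction" pair per line, with the prediction spelled exactly True or False, and rejects malformed or duplicated lines.

```python
import csv
import io


def parse_results(text):
    """Parse the contents of results.txt into a {filename: bool} dict.

    Hypothetical helper: assumes one "filename,True|False" pair per line,
    as in the example above. Raises ValueError on malformed lines.
    """
    predictions = {}
    reader = csv.reader(io.StringIO(text))
    for line_no, row in enumerate(reader, start=1):
        if not row:
            continue  # tolerate blank lines
        if len(row) != 2:
            raise ValueError(f"line {line_no}: expected 2 fields, got {len(row)}")
        filename, label = row
        if label not in ("True", "False"):
            raise ValueError(
                f"line {line_no}: prediction must be 'True' or 'False', got {label!r}"
            )
        if filename in predictions:
            raise ValueError(f"line {line_no}: duplicate filename {filename!r}")
        predictions[filename] = label == "True"
    return predictions


# Usage: validate the sample output shown above.
sample = "filename001.jpg,True\nfilename002.jpg,False\nfilename003.jpg,False\n"
print(parse_results(sample))
# {'filename001.jpg': True, 'filename002.jpg': False, 'filename003.jpg': False}
```

Using the csv module rather than a plain split(",") means quoted filenames containing commas are also handled.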

The following Python script is an example of the required test code:


import os
import random
from os.path import abspath, dirname, isfile, join


def predict(file_path):
    """Predict whether the image at file_path is child pornography.

    Returns True or False.
    """
    # Call your method here; this placeholder returns a random guess.
    return random.choice([True, False])


if __name__ == "__main__":
    script_dir = dirname(abspath(__file__))
    dset = join(script_dir, "data")  # test images are placed here
    results = join(script_dir, "results.txt")
    # Only regular files are tested; subdirectories are skipped.
    fnames = [f for f in os.listdir(dset) if isfile(join(dset, f))]
    with open(results, "w") as output:
        for f in fnames:
            res = predict(join(dset, f))
            # One "filename,True|False" line per image.
            output.write(f + "," + str(res) + "\n")

Scoreboard

Results of methods and tools evaluated against the RCPD dataset:

Method/Tool      Accuracy (%)   Precision (%)   Recall (%)   F1-Score (%)
Macedo et al.    79.84          68.64           64.61        66.56
LED              76.47          75.34           57.21        66.30
NuDetective      57.43          78.74           41.24        54.13
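The scoreboard reports accuracy, precision, recall and F1-score. This page does not publish the evaluation script, so the sketch below only illustrates the standard definitions of these metrics, assuming that a True prediction (child pornography) is the positive class. The score function and its dict-based inputs are hypothetical. For reference, the usual F1 formula 2PR/(P+R) applied to the Macedo et al. precision and recall gives about 66.56%, matching the table.

```python
def score(predictions, ground_truth):
    """Compute accuracy, precision, recall and F1, all as percentages.

    Hypothetical sketch: both arguments map filename -> bool, with True
    (child pornography) treated as the positive class. Every ground-truth
    image must have a prediction, otherwise a KeyError is raised.
    """
    tp = fp = tn = fn = 0
    for filename, actual in ground_truth.items():
        predicted = predictions[filename]
        if predicted and actual:
            tp += 1  # true positive
        elif predicted:
            fp += 1  # false positive
        elif actual:
            fn += 1  # false negative
        else:
            tn += 1  # true negative
    total = tp + fp + tn + fn
    accuracy = 100.0 * (tp + tn) / total if total else 0.0
    precision = 100.0 * tp / (tp + fp) if tp + fp else 0.0
    recall = 100.0 * tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Usage: one true positive, one false negative, one false positive, one true negative.
truth = {"a.jpg": True, "b.jpg": True, "c.jpg": False, "d.jpg": False}
preds = {"a.jpg": True, "b.jpg": False, "c.jpg": True, "d.jpg": False}
print(score(preds, truth))
# {'accuracy': 50.0, 'precision': 50.0, 'recall': 50.0, 'f1': 50.0}
```

Precision and recall pull in opposite directions here, which is why a tool such as NuDetective can have the highest precision in the table but the lowest F1-score.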

Please cite this dataset as:

Macedo, Joao, Filipe Costa, and Jefersson A. dos Santos. “A Benchmark Methodology for Child Pornography Detection.” 2018 31st IEEE SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI). 2018.



@inproceedings{macedo2018rcpd,
  title={A Benchmark Methodology for Child Pornography Detection},
  author={J. Macedo and F. Costa and J. A. dos Santos},
  booktitle={Graphics, Patterns and Images (SIBGRAPI), 2018 31st SIBGRAPI Conference on},
  year={2018},
  organization={IEEE}
}