PornProbe: LDA-SVM for Porn Detection

PornProbe is a pornography detection system that utilizes a large-scale training dataset of 420,615 samples, including 65,827 positive samples, to effectively identify pornographic content in videos and images. The system employs a hierarchical LDA-SVM model that combines unsupervised clustering and supervised learning to achieve high precision and recall while ensuring efficiency in training and testing. This approach allows for the handling of vast amounts of data, making it suitable for online detection of pornographic materials.

PornProbe: an LDA-SVM based Pornography Detection System

Sheng Tang1,2, Jintao Li1, Yongdong Zhang1, Cheng Xie1, Ming Li1, Yizhi Liu1, Xiufeng Hua1
1 Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China, 100190
{ts, jtli, zhyd, xiecheng, mli, liuyizhi, huaxiufeng}@[Link]

Yan-Tao Zheng2, Jinhui Tang2, Tat-Seng Chua2
2 School of Computing, National University of Singapore, Singapore, 117417
{yantaozheng, tangjh, chuats}@[Link]

ABSTRACT
We present PornProbe, a pornography detection system that detects pornographic content in videos. To build such a detection system, we leverage a large-scale training data set with 65,827 positive training image samples out of a total of 420,615 training samples, and a novel detection scheme based on hierarchical LDA-SVM. The system combines the unsupervised clustering of Latent Dirichlet Allocation (LDA) with the supervised learning of the Support Vector Machine (SVM), so as to achieve both high precision and recall while ensuring efficiency in both training and testing. This demonstration shows how the system detects the pornographic scenes in restricted artistic (RA) movies.

Categories and Subject Descriptors
H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval;
H.2.4 [Database Management]: Systems - multimedia databases

General Terms: Algorithms, Experimentation, Performance

Keywords: Latent Dirichlet Allocation, SVM, Pornography Detection

This work was supported by National Natural Science Foundation of China (60873165), National Basic Research Program of China (973 Program, 2007CB311105) and Co-building Program of Beijing Municipal Education.

Copyright is held by the author/owner(s).
MM’09, October 19–24, 2009, Beijing, China.
ACM 978-1-60558-608-3/09/10.

1. Introduction
With the advancement of the Internet, the proliferation of images and videos has inevitably increased the chance of individuals encountering adult-oriented content such as pornographic images and videos. Pornographic materials contain subject matter that induces sexual excitement. As these materials raise many social, moral and religious issues, automatic detection of pornographic content in images and videos is in high demand to facilitate possible regulation and censorship.

In this demonstration, we present PornProbe, a pornography detection system that can detect pornographic video shots and images. The design of PornProbe aims to accomplish two goals. First, we want to achieve high precision and recall in detection. This matters because, compared to the total amount of images and videos on the Internet, the number of pornographic images and videos is relatively small. Second, the system must be highly efficient, in order to scale up to the huge amount of videos and images on the Internet. In this research, we tackle the above goals from the aspects of database and machine learning. Since pornographic images vary considerably, it is necessary to establish a large-scale image database that covers as many variations of such images as possible (such as significant variations in race, lighting conditions, textures, viewing angles, and complicated body structures). Here, we set up a large-scale database with more than 10^5 images. To the best of our knowledge, this is the largest pornographic image dataset, as most existing systems use fewer than 10^3 pornographic images for training. However, training support vector machines on such large data sets is very time-consuming and is often a bottleneck. Therefore, it is important to develop fast algorithms for training SVMs that learn the pornography detection rules both efficiently and effectively from large-scale training databases.

2. Large-scale training data set
Table 1. Statistics of our training data set
            Images    Video Keyframes    Total
Positive    21,699    44,128             65,827
Negative    51,680    303,108            354,788

Table 1 presents the statistics of our large-scale training image database (including key frames extracted from videos) for pornography detection. Altogether, we collected 420,615 training image samples from a wide variety of sources. For videos, we collected 1,108 pornographic videos from offline VCD sources; 20,000 short pornographic video clips gathered from online media streams by the skin-based detection method [1] from Dec 2007 to Dec 2008; and about 65,000 non-pornographic videos from YouTube, Tudou, YouKu and other websites. For images, we used the non-pornographic images from the Corel database and downloaded pornographic ones from Pinkworld.

3. LDA-SVM Model
Our proposed solution is motivated by the insight from psychophysical studies that humans can perform coarse categorization of visual objects quite easily and quickly, followed by successively finer but slower discrimination [2]. Specifically, as shown in Figure 1, we propose a hierarchical LDA-SVM model which can scale up to large data sets through a combination of unsupervised clustering and supervised learning. First, we use the generative Latent Dirichlet Allocation (LDA) [3] to model the relationship between images and mine their hidden structure. We then perform coarse categorization by clustering the large-scale training data set into small topic sets according to the
principle of maximum membership probability determined by the topic-simplex representation vector inferred by the LDA model. Second, we perform successively finer discrimination by training on each clustered topic set to generate multiple smaller SVM models. These topic-based SVM models are generally more effective, since their optimal separating hyperplanes can be much simpler and generalize better on the small homogeneous topic sets than on the large overall data set. Finally, we propose an adaptive Bayesian approach to fuse the membership probability with the probability predicted by the corresponding SVM models over all the topic clusters. For a given test sample, we adaptively select only the most probable clusters' SVM models for prediction. Furthermore, as the number of support vectors (SVs) is greatly reduced by training on the smaller topic sets, testing efficiency can be considerably improved. This makes online detection practical in spite of the large training data set.

Fig.1 The framework of PornProbe. Training: feature extraction, unsupervised LDA clustering of the training data into Topic 1, Topic 2, ..., Topic K (producing the LDA model), and LDA-SVM training of SVM 1, SVM 2, ..., SVM K. Testing: feature extraction on the test data, LDA inference, then SVM prediction and adaptive Bayesian fusion, which yield the porn degree.

The separate training of multiple SVM models on each topic set gives rise to the parallelism of SVMs. Distributed training can drastically lower the computational complexity from O(n^2) for a single SVM to O(n^2/k^2) for each topic model. The overall computational complexity for training all the k SVM models is O(n^2/k), where n is the total number of training samples and k is the topic number. Finding the optimal topic number k is not necessary, since the first clustering is only a coarse categorization: we only need to partition the training data set broadly. Therefore, k can be roughly determined by the ratio of the total number of training samples n to the desired average size of a topic set.

In our hierarchical framework for combining unsupervised clustering and supervised learning, we use the global color histogram for coarse categorization at the top layer due to its relatively lower dimension and faster extraction [4]; and the prior fusion (concatenation) of the color moment and edge histogram for finer discrimination at the bottom layer, to further remove the false detections produced by many existing skin-based methods. Although there is no special emphasis on detecting skin, skin detection is actually included in the latent semantic analysis of the color histogram and in the training of SVM models with the color moment. Since we train multiple models on a large-scale training data set that includes nearly all possible variations, our method is more stable and robust than skin-based methods and single SVM methods.

For comparison, we evaluate the following 3 systems: (a) the skin-based method [1]; (b) a single SVM (where we randomly selected 120,000 training samples from our training data set instead of the whole set, due to the impractical amount of training computation); and (c) our proposed LDA-SVM (k=40) method. Figure 2 shows the ROC curves of testing the 3 methods on 1695 pornographic and 5415 non-pornographic samples. The figure clearly indicates that LDA-SVM is much more effective than the other two methods. To test the effectiveness of our adaptive Bayesian fusion method, we compare it against the average fusion method, whose ROC curve is also shown in Fig.2 (the cyan one). The training time, testing time and the numbers of samples and SVs of the single SVM method and LDA-SVM are shown in Table 2, which demonstrates the high training and testing efficiency of the proposed method.

Fig.2 ROC Curves

Table 2. Training time and testing time of the SVM methods
           Training samples    Number of SVs     Training time    Testing time for 320×240
SVM        120,000             24,112            72 hours         667 ms
LDA-SVM    420,615             1842 per topic    6 hours          49 ms

4. Demonstration
In this demonstration, we focus on the effectiveness and efficiency of PornProbe in pornography detection. Figure 3 shows the interface of PornProbe. The interface consists of 3 panes: a folder browsing pane (left-top) used for selecting the video to be examined; a play-back pane (left-bottom) used for playing back the detected shots or the whole video; and a result browsing pane (right) used for displaying the detected pornographic key frames. After the input video is specified, the pornography detection process starts examining the key frames of the input video. Once a pornographic key frame is detected, it is blurred and displayed in the result pane. The pornographic content in a key frame is measured by its porn degree, which is shown at the bottom of each key frame. Our demonstration shows that our system achieves both high precision and recall while ensuring efficiency in testing.

Fig.3 The interface of PornProbe

5. References
[1] H. Zheng; “Maximum entropy modeling for skin detection: with an application to Internet filtering”; Ph.D. Thesis, Université des Sciences et Technologies de Lille, France, 2004.
[2] S. Thorpe, D. Fize, and C. Marlot; “Speed of processing in the human visual system”; Nature, 381:520-522, June 1996.
[3] D. M. Blei, A. Y. Ng, and M. I. Jordan; “Latent Dirichlet allocation”; Journal of Machine Learning Research, 3:993-1022, 2003.
[4] S. Tang, J.-T. Li, M. Li, C. Xie, Y.-Z. Liu, K. Tao, S.-X. Xu; “TRECVID 2008 High-Level Feature Extraction By MCG-ICT-CAS”; Proc. TRECVID Workshop, Gaithersburg, USA, Nov 2008.
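The coarse categorization step of Section 3, which assigns each training sample to the topic with the maximum membership probability, can be illustrated with a minimal sketch. The function name `partition_by_topic` and the toy topic vectors are assumptions made for illustration; in the system itself the vectors would come from LDA inference on color histograms, which is not reproduced here.

```python
from collections import defaultdict


def partition_by_topic(topic_vectors):
    """Group sample indices by their most probable topic.

    topic_vectors: one topic distribution per sample (a list of k
    probabilities), standing in for the vectors inferred by an LDA model.
    Returns a dict mapping topic index -> list of sample indices.
    """
    topic_sets = defaultdict(list)
    for index, theta in enumerate(topic_vectors):
        # Maximum membership probability: take the argmax topic.
        best_topic = max(range(len(theta)), key=lambda t: theta[t])
        topic_sets[best_topic].append(index)
    return dict(topic_sets)


# Hypothetical topic vectors for five samples and k = 3 topics.
thetas = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.2, 0.6],
    [0.3, 0.5, 0.2],
]
topic_sets = partition_by_topic(thetas)
```

Each resulting topic set would then be used to train its own smaller SVM, which is what allows the k models to be trained independently.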

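The paper does not give the formula of its adaptive Bayesian fusion, so the following is only one plausible reading of Section 3: a membership-weighted average of the per-topic SVM probabilities, restricted to the most probable topics. The function `fuse_porn_degree`, the `mass` cut-off used to decide how many topics to keep, and the toy models are all assumptions, not the authors' method.

```python
def fuse_porn_degree(topic_probs, svm_probs, mass=0.9):
    """Fuse topic membership with per-topic SVM outputs (assumed form).

    topic_probs: P(topic | x) for each of the k topics, summing to 1.
    svm_probs:   k zero-argument callables; svm_probs[t]() returns the
                 topic-t SVM's probability that x is pornographic.
                 Callables are used so unselected models are never run.
    mass:        hypothetical cut-off; topics are added in decreasing
                 probability until they cover this much membership mass.
    """
    ranked = sorted(range(len(topic_probs)),
                    key=lambda t: topic_probs[t], reverse=True)
    selected, covered = [], 0.0
    for t in ranked:
        selected.append(t)
        covered += topic_probs[t]
        if covered >= mass:
            break
    # Membership-weighted average over the selected topics only.
    fused = sum(topic_probs[t] * svm_probs[t]() for t in selected) / covered
    return fused, selected


calls = []  # records which topic models were actually evaluated


def make_model(topic, prob):
    """Build a stand-in for a trained topic SVM with a fixed output."""
    def run():
        calls.append(topic)
        return prob
    return run


models = [make_model(0, 0.1), make_model(1, 0.9), make_model(2, 0.5)]
degree, used = fuse_porn_degree([0.05, 0.70, 0.25], models)
```

In this toy run only the two most probable topics are evaluated, which mirrors the stated motivation for adaptive selection: fewer SVM evaluations per test sample.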

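The complexity argument of Section 3 can be checked with a short calculation. The sketch below assumes an idealized quadratic training cost and equally sized topic sets; the helper names and the target topic size of 10,500 are illustrative assumptions, chosen so that the rule of thumb for k lands near the k=40 used in the paper's experiments.

```python
def choose_topic_number(n_samples, desired_topic_size):
    """Rough k: total samples divided by the desired average topic-set size."""
    return max(1, round(n_samples / desired_topic_size))


def relative_training_cost(n_samples, k):
    """Costs under a quadratic model, relative to one SVM on all n samples.

    Assumes equally sized topic sets, which real LDA clusters will not be.
    """
    single = n_samples ** 2
    per_topic = (n_samples / k) ** 2   # one topic model: n^2 / k^2
    all_topics = k * per_topic         # k models one after another: n^2 / k
    return per_topic / single, all_topics / single


n = 420_615                                             # training set size from Table 1
k = choose_topic_number(n, desired_topic_size=10_500)   # hypothetical target size
per_topic_ratio, total_ratio = relative_training_cost(n, k)
```

Under these assumptions each topic model costs 1/k^2 of the single-SVM cost and the whole set of models costs 1/k of it, so the saving grows with k even without parallel training.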