Improving Data Collection on Article Clustering by Using Distributed Focused Crawler

Data Science: Journal of Computing and Applied Informatics

 
 
Field Value
 
Title Improving Data Collection on Article Clustering by Using Distributed Focused Crawler
 
Creator Gunawan, Dani
Amalia
Najwan, Atras
 
Description Collecting or harvesting data from the Internet is often done with a web crawler. A general web crawler can be developed to focus on a certain topic; this type of web crawler is called a focused crawler. To improve data collection performance, creating a focused crawler is not enough, since a focused crawler only makes efficient use of network bandwidth and storage capacity. This research proposes a distributed focused crawler that improves web crawling performance while remaining efficient in network bandwidth and storage capacity. The distributed focused crawler implements crawl scheduling, site ordering to determine the URL queue, and focused crawling using Naïve Bayes. This research also tests web crawling performance under multithreading and observes the CPU and memory utilization. The conclusion is that web crawling performance decreases when too many threads are used: CPU and memory utilization become very high, while the performance of the distributed focused crawler becomes low.
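The focused-crawling idea in the description can be illustrated with a minimal sketch. It is not the authors' implementation: the class and function names (`NaiveBayesRelevance`, `Frontier`, `crawl`), the in-memory `pages` dictionary standing in for the web, and the numeric priority used to order the URL queue are all assumptions made for illustration. The distributed part and the article's actual site-ordering rule are not covered here.

```python
import heapq
import math
from collections import Counter


class NaiveBayesRelevance:
    """Tiny multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def __init__(self):
        self.word_counts = {"relevant": Counter(), "irrelevant": Counter()}
        self.doc_counts = Counter()
        self.vocab = set()

    def train(self, text, label):
        words = text.lower().split()
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def log_score(self, text, label):
        # log P(label) + sum over words of log P(word | label)
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        total_words = sum(self.word_counts[label].values())
        for word in text.lower().split():
            count = self.word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(self.vocab)))
        return score

    def is_relevant(self, text):
        return self.log_score(text, "relevant") > self.log_score(text, "irrelevant")


class Frontier:
    """URL queue ordered by a hypothetical priority (lower is fetched first)."""

    def __init__(self):
        self._heap = []
        self._seen = set()
        self._counter = 0  # tie-breaker: equal priorities pop in insertion order

    def push(self, url, priority):
        if url not in self._seen:
            self._seen.add(url)
            heapq.heappush(self._heap, (priority, self._counter, url))
            self._counter += 1

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


def crawl(pages, seeds, classifier):
    """Visit pages from an in-memory dict {url: (text, links)}.

    Links are followed only from pages the classifier labels relevant,
    which is the bandwidth- and storage-saving idea of a focused crawler.
    """
    frontier = Frontier()
    for url in seeds:
        frontier.push(url, 0)
    kept = []
    while len(frontier):
        url = frontier.pop()
        text, links = pages.get(url, ("", []))
        if classifier.is_relevant(text):
            kept.append(url)
            for link in links:
                frontier.push(link, 0)
    return kept
```

In this sketch a page that is only reachable through an irrelevant page is never fetched. A multithreaded variant would have several workers popping from the same frontier, which is where the thread-count trade-off noted in the description would appear.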
 
Publisher Talenta Publisher
 
Date 2017-07-18
 
Type info:eu-repo/semantics/article
info:eu-repo/semantics/publishedVersion
Peer-reviewed Article
 
Format application/pdf
 
Identifier https://talenta.usu.ac.id/index.php/JoCAI/article/view/82
10.32734/jocai.v1.i1-82
 
Source Data Science: Journal of Computing and Applied Informatics; Vol 1 No 1 (2017): Data Science: Journal of Computing and Applied Informatics (JoCAI); 1-12
2580-829X
2580-6769
 
Language eng
 
Relation https://talenta.usu.ac.id/index.php/JoCAI/article/view/82/45
 
Rights Copyright (c) 2017 Journal of Computing and Applied Informatics
 
