A&S Graduate Studies
[PAST EVENT] Shengye Wan, Computer Science M.S. Thesis Defense
April 6, 2016
12pm - 2pm
Location
Wren Building, Grammar School Classroom
111 Jamestown Rd
Williamsburg, VA 23185
Abstract:
Web crawlers have been developed for several malicious purposes, such as downloading server data without permission from the website administrator. Armored, stealthy crawlers keep evolving to defeat new anti-crawler mechanisms in the arms race between crawler developers and crawler defenders.
In this paper, we develop a new anti-crawler mechanism called PathMarker to detect and constrain crawlers that crawl server content stealthily and persistently. The basic idea is to add a marker to each web page URL and then encrypt the URL and marker together. By using the URL path and the user information contained in the marker as novel machine-learning features, we can accurately detect stealthy crawlers at the earliest stage. Besides detecting crawlers effectively, PathMarker can also dramatically reduce their efficiency before they are detected, by misleading them into visiting the same page repeatedly under URLs that carry different markers. We deploy our approach on a forum website to collect normal users' data. The evaluation results show that PathMarker can quickly capture all 12 open-source and in-house crawlers, plus two external crawlers (Googlebot and Yahoo Slurp).
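The marker idea can be illustrated with a minimal Python sketch. This is not the PathMarker implementation: the marker fields (`user`, `parent`), the function names `mark_url` and `unmark_url`, and the key are all hypothetical, and the cipher is a toy SHA-256 keystream chosen only because Python's standard library has no symmetric cipher. A real deployment would use a vetted authenticated cipher.

```python
import base64
import hashlib
import hmac
import json
import os

# Hypothetical server-side secret; an assumption for this sketch.
SECRET_KEY = b"demo-key-not-for-production"


def _keystream(nonce: bytes, length: int) -> bytes:
    """Derive a toy keystream from SHA-256 in counter mode (illustration only)."""
    out = b""
    counter = 0
    while len(out) < length:
        block = SECRET_KEY + nonce + counter.to_bytes(4, "big")
        out += hashlib.sha256(block).digest()
        counter += 1
    return out[:length]


def mark_url(path: str, user_id: str, parent_path: str) -> str:
    """Return an opaque token that hides the page path together with its marker."""
    # Assumed marker contents: who requested the link and which page it came from.
    plain = json.dumps(
        {"path": path, "user": user_id, "parent": parent_path}
    ).encode()
    nonce = os.urandom(8)  # fresh nonce, so the same page gets a new token each time
    cipher = bytes(a ^ b for a, b in zip(plain, _keystream(nonce, len(plain))))
    tag = hmac.new(SECRET_KEY, nonce + cipher, hashlib.sha256).digest()[:16]
    return base64.urlsafe_b64encode(nonce + tag + cipher).decode()


def unmark_url(token: str) -> dict:
    """Verify the token and recover the path and marker; reject tampered tokens."""
    raw = base64.urlsafe_b64decode(token.encode())
    nonce, tag, cipher = raw[:8], raw[8:24], raw[24:]
    expected = hmac.new(SECRET_KEY, nonce + cipher, hashlib.sha256).digest()[:16]
    if not hmac.compare_digest(tag, expected):
        raise ValueError("tampered or foreign token")
    plain = bytes(a ^ b for a, b in zip(cipher, _keystream(nonce, len(cipher))))
    return json.loads(plain)
```

Because every token embeds a fresh nonce and the requesting user, two links to the same page look unrelated to a crawler, which therefore cannot deduplicate them, while the server can still recover the path and marker from each request and feed them to a detector.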
Bio:
Shengye Wan is an M.S. student in the Computer Science department. He is supervised by Dr. Kun Sun, and his research interest is system and network security. His thesis work focuses on detecting malicious crawlers and protecting valuable website content. Shengye Wan received his B.E. from Huazhong University of Science and Technology in 2014.