Automated web scraping tutorial using jsoup, JPA, EclipseLink and ADF Essentials 12.1.3

Q: How does a programmer surf the web? A: With a scraper.

There are several sites I visit periodically to track a specific set of data relevant to my interests. This rinse-and-repeat process quickly becomes boring and error-prone. So I started thinking like a programmer: if only a piece of software could do this for me, I could spend the free time drinking coffee and surfing other sites! This is pretty much what a web crawler does, and “scraping” is the more specific term for extracting targeted data from particular pages. Below is a two-part video covering a couple of scraping examples I’ve built, followed by a basic tutorial on how to build your own.
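The “track a specific set of data” loop comes down to three steps: fetch the page, extract the items you care about, and report only what changed since the last visit. Here is a minimal JDK-only sketch of that last step. It assumes each scraped item has a stable ID string (a PubMed ID, for example). The class name `NewItemTracker` is hypothetical, and the set of seen IDs lives only in memory. The jar list below suggests the tutorial instead persists results to MySQL through JPA, so a real cron job would load previously seen IDs from the database on each run.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper (not from the tutorial): remembers item IDs seen on
// earlier runs and reports only the ones that are new.
public class NewItemTracker {

    // Insertion-ordered, so the set reflects the order items first appeared.
    private final Set<String> seen = new LinkedHashSet<>();

    // Returns the IDs in `current` that were not seen before, then marks them as seen.
    // Duplicates within one call are reported only once.
    public List<String> newItems(List<String> current) {
        List<String> fresh = new ArrayList<>();
        for (String id : current) {
            if (seen.add(id)) { // add() returns false if the ID was already present
                fresh.add(id);
            }
        }
        return fresh;
    }

    public static void main(String[] args) {
        NewItemTracker tracker = new NewItemTracker();
        // First run: everything is new.
        System.out.println(tracker.newItems(List.of("101", "102")));        // [101, 102]
        // Second run: only 103 appeared since last time.
        System.out.println(tracker.newItems(List.of("101", "102", "103"))); // [103]
    }
}
```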

Part 1: Introduction

Part 2: Implementation

To run the main class in a Linux environment (on Windows, use a semicolon instead of a colon as the classpath separator):
java -cp "jsoup-cronjob-pubmed.jar:lib/*" com.adfhomebrew.jsoup.cronjob.pubmed.PubMedSearchClient
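For readers without the video at hand, here is a rough JDK-only sketch of what a scraping entry point like this might do first: build a search URL and download the page. It is not the author’s PubMedSearchClient. The class name `PubMedSearchSketch` and the `https://pubmed.ncbi.nlm.nih.gov/?term=` URL pattern are assumptions. The sketch also uses Java 11’s `java.net.http.HttpClient` for the download, whereas the original relies on jsoup, where `Jsoup.connect(url).get()` fetches and parses in one step.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a scraping entry point; not the author's PubMedSearchClient.
public class PubMedSearchSketch {

    // Assumed PubMed search URL pattern; verify against the live site before relying on it.
    static final String SEARCH_BASE = "https://pubmed.ncbi.nlm.nih.gov/?term=";

    // Builds a search URL, URL-encoding the term (spaces become '+', '&' becomes %26).
    static String buildSearchUrl(String term) {
        return SEARCH_BASE + URLEncoder.encode(term, StandardCharsets.UTF_8);
    }

    // Downloads the raw HTML of the search page; the original parses it with jsoup instead.
    static String fetch(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        // Join command-line words into one search term; fall back to a sample query.
        String term = args.length > 0 ? String.join(" ", args) : "jsoup";
        String url = buildSearchUrl(term);
        System.out.println("Fetching " + url);
        String html = fetch(url);
        System.out.println("Downloaded " + html.length() + " characters");
    }
}
```

With Java 11 or later it can be run straight from source, e.g. `java PubMedSearchSketch.java "breast cancer"`. In a jsoup-based version, the downloaded HTML would then go through `Jsoup.parse(html)` and `select(...)` with CSS selectors matched to the page’s actual markup.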

Jars needed:
TopLink/EclipseLink: JDEV_INSTALL\oracle_common\modules\oracle.toplink_12.1.3\eclipselink.jar

JPA: JDEV_INSTALL\oracle_common\modules\javax.persistence_2.0.jar

MySQL JDBC driver (Connector/J): http://dev.mysql.com/downloads/connector/j/5.0.html

jsoup: http://jsoup.org/download
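Since the run command picks these jars up from `lib/*`, a missing jar only shows up as a `ClassNotFoundException` or `NoClassDefFoundError` at run time. A small, hypothetical startup check (not part of the tutorial code) can report missing jars up front by probing one well-known class from each. The class names assume the JPA 2.0 API, EclipseLink’s JPA provider, and the Connector/J 5.x driver listed above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical startup check: reports which required classes are missing from the classpath.
public class ClasspathCheck {

    // One well-known class per jar in the list (assumes Connector/J 5.x naming).
    static final List<String> REQUIRED = List.of(
            "org.jsoup.Jsoup",                                 // jsoup
            "javax.persistence.Persistence",                   // JPA 2.0 API
            "org.eclipse.persistence.jpa.PersistenceProvider", // EclipseLink JPA provider
            "com.mysql.jdbc.Driver"                            // MySQL Connector/J 5.x driver
    );

    // Returns the names that cannot be loaded; found classes are loaded but not initialized.
    static List<String> missingClasses(List<String> classNames) {
        ClassLoader loader = ClasspathCheck.class.getClassLoader();
        List<String> missing = new ArrayList<>();
        for (String name : classNames) {
            try {
                Class.forName(name, false, loader);
            } catch (ClassNotFoundException | LinkageError e) {
                // LinkageError covers a class whose own dependencies are absent.
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> missing = missingClasses(REQUIRED);
        if (missing.isEmpty()) {
            System.out.println("All required classes found.");
        } else {
            System.out.println("Missing from classpath: " + missing);
            System.exit(1);
        }
    }
}
```

Run it with the same `lib/*` entry in `-cp` as the main class, plus the directory containing the compiled check.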

The live examples and links to tutorial code can be found at http://www.adfhomebrew.com

About wesfang

www.linkedin.com/in/wesfang/ https://twitter.com/wesleyfang

One Response to Automated web scraping tutorial using jsoup, JPA, EclipseLink and ADF Essentials 12.1.3

  1. Web Scraping says:

    Nice tutorial… I am a scraper too, and I do my scraping with the Scrapy and BeautifulSoup frameworks…