Get source of a website

Discussion of Common Lisp
Post Reply
Zaph

Get source of a website

Post by Zaph » Sat Apr 18, 2009 3:28 am

Hi Forum,
I'm quite new to Common Lisp, so please excuse my trivial question.
I'm looking for a way to get the HTML source of a website.
The idea is that a user enters the name of a city, my app queries Wikipedia and parses the city's coordinates out of the returned HTML. Together with the current date, the app then calculates the time of sunrise at that location. I admit this sounds kind of stupid, but it is important for my project (a growth-analysis app for plants).
Well... I hope you can give me a hint on how to fetch the source code of a website with a known URL right from my app.

PS: I'm working with SBCL on OS X

dmitry_vk
Posts: 96
Joined: Sat Jun 28, 2008 8:01 am
Location: Russia, Kazan
Contact:

Re: Get source of a website

Post by dmitry_vk » Mon Apr 20, 2009 1:10 pm

You could use Drakma (http://www.weitz.de/drakma/). For example:

Code:

(defvar *wiki-main-page* (drakma:http-request "http://en.wikipedia.org/"))
(subseq *wiki-main-page* 0 1000) =>
"<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">
<html xmlns=\"http://www.w3.org/1999/xhtml\" xml:lang=\"en\" lang=\"en\" dir=\"ltr\">
	<head>
		<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\" />
		<meta http-equiv=\"Content-Style-Type\" content=\"text/css\" />
		<meta name=\"generator\" content=\"MediaWiki 1.15alpha\" />
		<meta name=\"keywords\" content=\"Main Page,1653,1862,1884,1968,1994 Pacific hurricane season,1999,2009,420 (cannabis culture),Accipiter,Africa\" />
		<link rel=\"apple-touch-icon\" href=\"http://en.wikipedia.org/apple-touch-icon.png\" />
		<link rel=\"shortcut icon\" href=\"/favicon.ico\" />
		<link rel=\"search\" type=\"application/opensearchdescription+xml\" href=\"/w/opensearch_desc.php\" title=\"Wikipedia (en)\" />
		<link rel=\"copyright\" href=\"http://www.gnu.org/copyleft/fdl.html\" />
		<link rel=\"alternate\" type=\"application/rss+xml\" title=\"Wikipedia RSS Feed\" href=\"/w/index.php?title=Special:RecentChange"
To construct and manipulate URLs you can use the PURI library (http://puri.b9.com/).
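For instance, a small sketch of building an article URL with PURI, assuming the library is loaded; the helper name and the /wiki/ path convention are my own choices, and city names with spaces would still need escaping (Wikipedia uses underscores):

```lisp
;; Build a URI for the English Wikipedia article about CITY by merging
;; a relative path against the site's base URI.
(defun city-article-uri (city)
  "Return a PURI URI object for the Wikipedia article about CITY."
  (puri:merge-uris (format nil "/wiki/~a" (substitute #\_ #\Space city))
                   (puri:parse-uri "http://en.wikipedia.org/")))

;; (puri:render-uri (city-article-uri "Berlin") *standard-output*)
;; prints http://en.wikipedia.org/wiki/Berlin
```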

dmitry_vk
Posts: 96
Joined: Sat Jun 28, 2008 8:01 am
Location: Russia, Kazan
Contact:

Re: Get source of a website

Post by dmitry_vk » Mon Apr 20, 2009 1:18 pm

There is a simpler way to get the data out of Wikipedia:
1) Database dumps are available for download at http://download.wikimedia.org/. You can import them into a database and query it directly. As a benefit, the text will be simpler to parse, because it is in wiki markup rather than HTML.
2) The DBpedia project extracts exactly this kind of structured information from Wikipedia.
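Along the same lines, you can fetch an article's raw wiki markup over HTTP instead of its rendered HTML. A hedged sketch using Drakma's :parameters argument; action=raw is a standard MediaWiki feature, while the helper name is my own:

```lisp
;; Fetch the raw wiki markup of an article. The coordinates then appear
;; in a {{coord|...}} template, which is easier to locate than
;; coordinates scattered through rendered HTML.
(defun fetch-wikitext (title)
  "Return the wiki markup of the English Wikipedia article TITLE."
  (drakma:http-request "http://en.wikipedia.org/w/index.php"
                       :parameters `(("title" . ,(substitute #\_ #\Space title))
                                     ("action" . "raw"))))
```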

Inaimathi
Posts: 4
Joined: Thu Sep 30, 2010 12:14 pm

Re: Get source of a website

Post by Inaimathi » Thu Sep 30, 2010 12:19 pm

Does anyone know if there's a mirror of the puri library download?

b9.com is currently down, and all download links for this library point to it.

Inaimathi
Posts: 4
Joined: Thu Sep 30, 2010 12:14 pm

Re: Get source of a website

Post by Inaimathi » Fri Oct 01, 2010 6:58 am

Found one by the time the previous post was approved.

http://ftp.de.debian.org/debian/pool/ma ... rig.tar.gz
