Hi Forum,
I'm quite new to Common Lisp, so please excuse my trivial question.
I am looking for a way to get the HTML source of a website.
The idea is that a user can enter the name of a city, my app queries Wikipedia and parses the city's coordinates out of the returned HTML. Together with the current date, the app then calculates the sunrise at that location. I admit this sounds kind of stupid, but it is important for my project (a kind of growth-analysis app for plants).
Well... I hope you can give me a hint on how to get the source code of a website with a known URL right out of my app.
PS: I'm working with SBCL on OS X.
Re: Get source of a website
You could use Drakma (http://www.weitz.de/drakma/). To build the URLs you can use the PURI library (http://puri.b9.com/); a sketch combining the two follows the output below. Example:
Code:
(defvar *wiki-main-page* (drakma:http-request "http://en.wikipedia.org/"))
(subseq *wiki-main-page* 0 1000) =>
"<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">
<html xmlns=\"http://www.w3.org/1999/xhtml\" xml:lang=\"en\" lang=\"en\" dir=\"ltr\">
<head>
<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\" />
<meta http-equiv=\"Content-Style-Type\" content=\"text/css\" />
<meta name=\"generator\" content=\"MediaWiki 1.15alpha\" />
<meta name=\"keywords\" content=\"Main Page,1653,1862,1884,1968,1994 Pacific hurricane season,1999,2009,420 (cannabis culture),Accipiter,Africa\" />
<link rel=\"apple-touch-icon\" href=\"http://en.wikipedia.org/apple-touch-icon.png\" />
<link rel=\"shortcut icon\" href=\"/favicon.ico\" />
<link rel=\"search\" type=\"application/opensearchdescription+xml\" href=\"/w/opensearch_desc.php\" title=\"Wikipedia (en)\" />
<link rel=\"copyright\" href=\"http://www.gnu.org/copyleft/fdl.html\" />
<link rel=\"alternate\" type=\"application/rss+xml\" title=\"Wikipedia RSS Feed\" href=\"/w/index.php?title=Special:RecentChange"
Re: Get source of a website
There is a simpler way to get the data out of Wikipedia:
1) There are database dumps available for download at http://download.wikimedia.org/. You can import them into a database and query that. As a benefit, they will be simpler to parse, because the text is in wiki markup, not in HTML markup.
2) There is the DBpedia project, which extracts exactly this kind of structured information from Wikipedia; a sketch of querying it follows below.
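To illustrate option 2, here is a hedged sketch of asking DBpedia's public SPARQL endpoint for a city's coordinates with Drakma. The endpoint URL, the resource name "Berlin", the WGS84 property URIs, and the "format" parameter are assumptions about DBpedia's setup, not something confirmed in this thread.

Code:

;; SPARQL query for the latitude and longitude of one resource.
(defvar *coords-query*
  "SELECT ?lat ?long WHERE {
     <http://dbpedia.org/resource/Berlin>
       <http://www.w3.org/2003/01/geo/wgs84_pos#lat>  ?lat ;
       <http://www.w3.org/2003/01/geo/wgs84_pos#long> ?long .
   }")

;; :PARAMETERS sends the query as GET parameters; asking for CSV
;; keeps the reply trivial to parse.
(drakma:http-request "http://dbpedia.org/sparql"
                     :parameters `(("query" . ,*coords-query*)
                                   ("format" . "text/csv")))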
Re: Get source of a website
Does anyone know if there's a mirror of the puri library download?
b9.com is currently down, and all download links for this library point to it.
Re: Get source of a website
Found one by the time the previous post was approved:
http://ftp.de.debian.org/debian/pool/ma ... rig.tar.gz