Task Description:
] how to read some data off another web page and ...
<h1 style="text-align: center;">prj-scraper</h1>
<h2>[previously]</h2>
<ol>
<li>[2013-mm-dd] C# read html into dom</li>
<li><strong>[2013-10-06] &gt; x] prj-screen scraping</strong>
<ol>
<li>] write tutorial with different methods of screen scraping</li>
<li>] example - read values (for qualifying results) from mrn page into string</li>
<li>x] research</li>
</ol>
</li>
<li><strong>[2013-10-07] &gt; i] prj-scraper - 001-research</strong>
<ol>
<li style="text-align: left;">] api for racing data VS screen scraping</li>
</ol>
</li>
</ol>
<h2>[currently]</h2>
<ol>
<li>] </li>
</ol>
<h2>[next]</h2>
<ol>
<li>] </li>
<li><strong>] CD tech-dev-www READ ?? xslt ?? &amp;&amp;</strong>
<ol>
<li>] built into browser, ] use to transform xml into html,</li>
<li>] use .this VS using "jq" or "svr side - framework - methods" to do transformation, b/c it is native to browser,
<ol>
<li>] implementation of std across browser platforms</li>
</ol>
</li>
<li>] ? use to deserialize objects or ???</li>
</ol>
</li>
<li>] src = CD bk language javascript</li>
</ol>
<div>
<div>] html screen scraping</div>
<div>ex = mrn = "id=ctl00_cphMain_phExtra_ctl00_seasonStatsGridView"</div>
<div>- use WebClient(), download page as string</div>
<div>- use regex to parse string</div>
<div>OR</div>
<div>- use library like HTMLAgilityPack (reads string into dom elements)</div>
<div>- has methods to query,</div>
<div>- uses XSLT, to</div>
<div>OR</div>
<div>- directly use XML document class, LINQ to XML</div>
<div>- browsers have XSLT transform</div>
<div>REVIEW</div>
<div>- example prev "weather pages"</div>
</div>
<h2>[reference]</h2>
<div>
<ol>
<li><a href="http://snipd.net/parsing-xhtml-into-a-dom-tree-in-c" target="_blank">Parsing (X)HTML into a DOM tree in C#</a></li>
<li><a href="http://mark-dot-net.blogspot.ca/2012/09/screen-scraping-in-c-using-linqpad-and.html" target="_blank">Sound Code: Screen-Scraping in C# using LINQPad and HTML Agility Pack</a></li>
<li><a href="http://viziblr.com/news/2010/10/9/scraping-the-nhl-2010-2011-schedule-with-c-linq-and-the-html.html" target="_blank">viziblr - News - Scraping the NHL 2010-2011 Schedule with C#, LINQ, and the HTML Agility Pack</a></li>
<li><a href="http://madskristensen.net/post/Screen-scraping-in-C" target="_blank">Screen scraping in C# | .NET Slave (mads kristensen)</a></li>
<li><a href="http://hoonzis.blogspot.ca/2013/05/screen-scraping-in-c-using-webclient.html" target="_blank">The Wall: Screen scraping in C# using WebClient</a></li>
<li><a href="http://htmlagilitypack.codeplex.com/releases/view/90925#ReviewsAnchor" target="_blank">Html Agility Pack - Download: HAP 1.4.6</a> - a C# library for parsing html files</li>
<li>2015-05-08] <a href="https://blog.hartleybrody.com/web-scraping/" target="_blank">https://blog.hartleybrody.com/web-scraping/</a> comments: <a href="https://news.ycombinator.com/item?id=4893864" target="_blank">https://news.ycombinator.com/item?id=4893864</a></li>
<li>] <a href="http://www.reddit.com/r/startups/comments/35y2jm/how_legal_is_scraping_to_jumpstart_a_twosided/" target="_blank">legalities of scraping sites</a></li>
</ol>
</div>
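<div>] sketch - the "html screen scraping" steps above (download page as string, read it into dom-like elements, pull one table by id). This is a rough illustration only, written in Python's standard library as a stand-in for the C# WebClient / HTMLAgilityPack route in the notes. The table id is the one from the mrn example; the sample markup, the driver names and the helper names (download_string, TableScraper, scrape_table) are made up for the example, and the real page layout may differ.</div>

```python
from html.parser import HTMLParser
from urllib.request import urlopen

# Table id copied from the notes; everything in SAMPLE_HTML is invented.
TABLE_ID = "ctl00_cphMain_phExtra_ctl00_seasonStatsGridView"

SAMPLE_HTML = """
<html><body>
<table id="other"><tr><td>ignore me</td></tr></table>
<table id="ctl00_cphMain_phExtra_ctl00_seasonStatsGridView">
  <tr><th>Pos</th><th>Driver</th><th>Speed</th></tr>
  <tr><td>1</td><td>Driver A</td><td>190.1</td></tr>
  <tr><td>2</td><td>Driver B</td><td>189.7</td></tr>
</table>
</body></html>
"""


def download_string(url):
    """Fetch a page as text (plays the role of WebClient.DownloadString)."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class TableScraper(HTMLParser):
    """Collect the cell text of the one table whose id matches table_id."""

    def __init__(self, table_id):
        super().__init__()
        self.table_id = table_id
        self.depth = 0        # 1 while directly inside the target table
        self.in_cell = False  # True between <td>/<th> and its end tag
        self.rows = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            if self.depth > 0:
                self.depth += 1  # nested table: track it, but skip its cells
            elif dict(attrs).get("id") == self.table_id:
                self.depth = 1
        elif self.depth == 1:
            if tag == "tr":
                self.rows.append([])
            elif tag in ("td", "th") and self.rows:
                self.rows[-1].append("")
                self.in_cell = True

    def handle_endtag(self, tag):
        if tag == "table" and self.depth > 0:
            self.depth -= 1
        elif tag in ("td", "th") and self.depth == 1:
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell and self.depth == 1:
            self.rows[-1][-1] += data.strip()


def scrape_table(html, table_id=TABLE_ID):
    """Return the target table as a list of rows, each a list of cell strings."""
    parser = TableScraper(table_id)
    parser.feed(html)
    parser.close()
    return parser.rows


if __name__ == "__main__":
    for row in scrape_table(SAMPLE_HTML):
        print(row)
```

<div>- for a live page, scrape_table(download_string(url)) would chain the two steps; the url is left out here because the notes do not give it. A parser is used instead of regex because it copes better with nested tags and attribute order.</div>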