Web parsing

Sunday, July 27th, 2008

Five web sources need to be parsed and data entries (say search results) need to be extracted. What is the best approach?

One could use regular expressions to work on the data. However, I am more familiar with XPath selectors (similar to CSS selectors) thanks to my experience with jQuery, so I’ll describe an approach that doesn’t use regular expressions.

Some information on how the two (XPath selectors and CSS selectors) are interrelated is given in this post by John Resig (creator of jQuery).

Here are the steps required to extract the data from the web sources:

  • Start by accessing the website itself: connect to the page via an HTTP library available in your language (all good languages have one anyway).
  • Once you’ve got the raw HTML as a string, you need to ‘massage’ it into XML. The approach depends on the language: I have found that BeautifulSoup is good for Python, and JTidy might be good for Java.
  • The above libraries will transform your HTML string into a well-formed XML tree structure. By analysing this tree you can manually identify where the result entries sit and repeat. For example, you may find that your XML tree has a snippet like the following:
<tr>
  <td colspan="3">
    <a href="..." class="medium-text" target="_self">
      Experiments on Design Pattern Discovery
    </a>
    <div class="authors">
      Jing Dong, Yajing Zhao
    </div>
  </td>
</tr>
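  • In the above example we would create an XPath selector as follows (an exact match on colspan, so that values such as 13 or 30 are not picked up by accident):
//tr/td[@colspan='3']
  • This would return a list of the elements that matched the selector, each with contents like the following: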
<a href="..." class="medium-text" target="_self">
  Experiments on Design Pattern Discovery
</a>
<div class="authors">
  Jing Dong, Yajing Zhao
</div>
  • Once you have that list you can start pulling the details out of each result entry. To do this you may write custom string parsing functions, for example one that pulls the authors out of the result entry and separates them from its title.
  • Alternatively, you could apply Natural Language Processing to the entries. NLP attempts to identify the different kinds of words and phrases within a larger body of text. For Python I believe the NLTK is appropriate. However, NLP is beyond the scope of this discussion.
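
The extraction steps above can be sketched in Python. This is only a minimal sketch under some assumptions: the page has already been fetched and tidied into well-formed XML (here, the snippet above wrapped in a hypothetical <table> root), and it uses the standard library’s xml.etree.ElementTree, which supports a limited subset of XPath, rather than BeautifulSoup. The names TIDIED and extract_entries are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: this stands in for HTML that was already fetched and tidied
# into well-formed XML. The <table> wrapper is hypothetical.
TIDIED = """
<table>
  <tr>
    <td colspan="3">
      <a href="..." class="medium-text" target="_self">
        Experiments on Design Pattern Discovery
      </a>
      <div class="authors">
        Jing Dong, Yajing Zhao
      </div>
    </td>
  </tr>
</table>
"""


def extract_entries(xml_text):
    """Return a list of {'title', 'authors'} dicts, one per result cell."""
    root = ET.fromstring(xml_text)
    entries = []
    # ElementTree's XPath subset supports exact attribute matches.
    for cell in root.findall(".//tr/td[@colspan='3']"):
        link = cell.find("a")
        authors = cell.find("div[@class='authors']")
        if link is None or authors is None:
            continue  # not a result entry, skip it
        # Collapse the whitespace left over from the page layout.
        title = " ".join((link.text or "").split())
        # Custom string parsing: authors are separated by commas.
        names = [n.strip() for n in (authors.text or "").split(",") if n.strip()]
        entries.append({"title": title, "authors": names})
    return entries


entries = extract_entries(TIDIED)
print(entries)
```

Splitting the authors on commas is the kind of custom string parsing mentioned above. It works for this snippet, but real pages may well need something more careful.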

iPhone

Thursday, July 24th, 2008

I’ve signed up to a 24-month contract with Optus for an iPhone on a $61 a month plan. I rationalised it as follows:

  • Previously I’ve been on prepaid with Optus and recharging about once a month, sometimes more, at a rate of $30 a recharge.
  • To be neat, let’s say I was recharging at a rate of $36 a month. Then my new additional fees are $25 a month over 24 months.
  • This totals $600 of what appears to be unnecessary fees. However, one should bear in mind that my existing phone has been on its last legs lately (it’s unable to send SMSs now), so I needed a new phone.
  • Considering I needed a new phone, we can assume I’d have bought one of approx. $200. This leaves $400 of unnecessary fees in my decision.
  • However! My old phone was also my music player, so I would need a new iPod as well. Assuming I were to buy a basic music-playing iPod (i.e. not an iPod Touch), the closest would be the $250 8GB iPod Nano. Let’s extrapolate and pretend there was a 16GB Nano at $300 (being conservative here).
  • That leaves $100 of unnecessary fees in my decision. I can justify this by what the $100 buys me: maintaining the convenience of having my phone and iPod as a single unit, gaining the ability to check emails and RSS feeds on the train, and owning something really swish.
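
For anyone who wants to check the sums above, here they are as a few lines of Python. The variable names are my own, and the $36, $200 and $300 figures are the rough assumptions from the list, not real prices I paid.

```python
MONTHS = 24             # contract length
PLAN_PER_MONTH = 61     # new plan
PREPAID_PER_MONTH = 36  # assumed old prepaid spend
NEW_PHONE = 200         # assumed cost of a replacement phone
NEW_IPOD = 300          # pretend 16GB Nano

extra_per_month = PLAN_PER_MONTH - PREPAID_PER_MONTH  # 25
extra_total = extra_per_month * MONTHS                # 600
after_phone = extra_total - NEW_PHONE                 # 400
after_ipod = after_phone - NEW_IPOD                   # 100

print(extra_per_month, extra_total, after_phone, after_ipod)
```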

So there you have it, my rationalised iPhone buying frenzy. I believe this whole experience will also be a lesson for me, as I’ve never signed up to an actual contract before, and I look forward to the benefits and problems I expect to encounter.

Wii Fencing

Tuesday, July 22nd, 2008

A blog I read called GoNintendo reported on an issue with Wii Sports Resort’s fencing game.

Effectively, the 1:1 sword fighting fails because players don’t actually care about the gameplay and just shake the Wiimote around instead. It’s similar to what I’ve experienced when playing Street Fighter against a complete novice: the player jumps around unpredictably, and I sometimes have more difficulty with them than with an average player.

How do you solve this issue? I imagine the main difference between real-life fencing and this new 1:1 Wii fencing is that real fencing tires people out very quickly if they try to flail their foil around constantly.

Perhaps Nintendo need to add some kind of stamina component to the game? One that regenerates but decreases very quickly if the player makes ridiculous swinging motions.

Or does that go against the principle behind the Wii Sports games? Should any loss of stamina only be derived from our natural, human stamina? After all, I’ve heard of people becoming exhausted by Wii boxing.

Mirror’s Edge

Sunday, July 20th, 2008

So EA are responsible for some awesome looking new game?

A first-person game that’s not primarily a shooter? (Amazing!) With gymnastic abilities to boot.

The gameplay trailer reminds me of an old and kooky animated series known as Aeon Flux.