Tuesday, July 22, 2008
Future Movie Releases
Using the Internet Movie Database, I was able to dig into Hollywood’s future releases. With some hacking skills (Google) I grabbed all upcoming releases, from the ‘script only’ stage to post-production. With possibly a little over 2,000 movies coming out between October 2008 and 2013, we have a lot coming our way. But when you see the list of future releases you can’t help but feel like puking. The sheer lack of creativity out in the Hollywood hills is mind-boggling. The general idea of a ‘new’ movie is something like this: just take a popular (comic) book character, hire a hot female cast member, add some terrorists threatening the world (which in most movies means New York) with some sort of global warming bomb, and get the script ready for a possible sequel. People will be swarming to download it on BitTorrent, er, buy a ticket at the box office!
Below are Hollywood’s more popular ideas listed by category:
Games are hot
After a long period of books becoming movies (Harry Potter) and the current run of comic book characters becoming movies (Spider-Man, The Hulk), our future holds… drum-roll… games that become movies! ‘Tomb Raider’ set it all off by taking in $300M worldwide. The movie was a big hit, not really because of its script quality but rather due to Angelina’s hard-to-ignore front assets. Hollywood is now attempting to achieve the same thing with the numerous computer games listed below:
| Prince of Persia: The Sands of Time | 16 June 2009 |
| Halo | 2009 |
| Max Payne | 15 October 2008 |
| Metal Gear Solid | 2009 |
| Mortal Kombat | 2010 |
| Warcraft | 2009 |
| Gears of War | 2009 |
| The Sims: The Movie | 2009 |
Sequels, Triquels and more
Hollywood loves sequels. Sequels provide some kind of guarantee that people will come to the theaters, thus bringing in the money. Take American Pie, for example. Everybody loved it, but while part 1 brought in ‘only’ $102M, the horrible sequel somehow cashed in 40% more (over $145M). It gets worse with the ‘Scary Movie’ series, which reached a global total of $800M with four movies. Some noteworthy sequels:
| Transformers: Revenge of the Fallen – There just can’t be enough Megan Fox movies. | 24 June 2009 |
| The Incredible Hulk – So it’s like the Hulk, but then even more incredible? | 6 June 2008 |
| Sin City 2 – Doubt it will be nice since Jessica has become a mom. | 2010 |
| The Brazilian Job – Like the Italian job/heist but now in Brazil. How did they come up with it? | 2009 |
Every time I hope the moviemakers will just give up after yet another bad sequel, but somehow you guys can’t seem to get enough of paying money for a movie that really wouldn’t be worth sitting through even if you were the one getting paid to see it! Anyway, that leads us to triquels, fourquels…
| Harry Potter and the Half-Blood Prince – Lord Voldemort! (hehe, I said it out loud) | 19 November 2008 |
| High School Musical 3: Senior Year – I’ll wait it out till Vanessa’s college freshman year. | 22 October 2008 |
| Quantum of Solace – With a $230M budget it must be good. Right? | 31 October 2008 |
| Spider-Man 4 – I hope they get back together. | May 2011 |
| Harry Potter and the Deathly Hallows: Part I – So like… another one? | 19 November 2010 |
| Jurassic Park IV – They should’ve stopped four movies earlier. | 2009 |
| Superman: Man of Steel – Our yearly Superman movie release… | June 2009 |
| Toy Story 3 – Wasn’t part 2 released in your local supermarket VHS section? | 18 June 2010 |
| Shrek Goes Fourth – They better keep Puss in boots! | May 2010 |
Destined for disaster
Sometimes you just need a tiny bit of information to predict if a movie will be hot or not. Somehow these movies just don’t sound like box office successes.
| Fast and Furious – Come on Vin, couldn’t you make the title more original? | 4 June 2009 |
| Watchmen – Based on a comic book.... next. | 5 March 2009 |
| Jennifer's Body – edit: wrongly listed here since it has Megan Fox | 2009 |
| Justice League: Mortal – Official summary “Superman, Batman, Wonder Woman, Aquaman, the Flash and other superheroes unite to fight against evil forces.” Enough said. | 2011 |
| The First Avenger: Captain America – We’ve all had enough with superheroes by the time this gets released. We – the people - want movies based on supermodels or playmates. | 2011 |
Retro = cool
Didn’t you love The A-Team? Of course you did, everybody did! So Hollywood decided to make a movie of it (sound familiar?). Too bad all the original actors are overweight now, so they have to bring in others (it won’t be the same without the real B.A. “I ain't gittin' on no plane!” Baracus, will it?). Anyway, go over the list below and decide what will go on your 2009 calendar.
| Dragonball – I’ve been practicing “Kamehameha” since. | 27 March 2009 |
| G.I. Joe: Rise of Cobra – Why the F is Brendan Fraser in this movie? | 6 August 2009 |
| X-Men Origins: Wolverine – Soon to be released in a $3 dvd-movie collection near you. | 29 April 2009 |
| The A-Team – Let’s hope that with Bruce Willis the movie will work out. | 11 June 2009 |
Money, money, money!
So what would you do, if you had 100+ million dollars? Apparently the producers of the movies below got that question and answered by saying “We will make a movie!”.
| Lincoln | 2010 | $100M |
| Hawaii Five-0 | 2010 | $100M |
| Land of the Lost | 15 July 2009 | $100M |
| Dragonball | 27 March 2009 | $100M |
| Halo | 2009 | $100M |
| The Curious Case of Benjamin Button | | $100M |
| Battle Angel | June 2009 | $200M |
| Terminator Salvation | 22 May 2009 | $200M |
| Quantum of Solace | 31 October 2008 | $230M |
| Harry Potter and the Deathly Hallows: Part I | | $250M |
Wednesday, July 16, 2008
Domains in Asia
Last week I visited the beautiful country of Thailand, travelling from Koh Phi Phi (an island in the south) via Bangkok to Chiang Mai and Pai (up north in the jungle) and back to Bangkok: a huge city that spans as far as the eye can see from the 84th floor of its highest building.
The Thai are overall very friendly and enthusiastic people, which makes a nice break from European so-called ‘hospitality’ (watch out for the elephants though!).
Besides the great experience of a first time in Asia, I had to post something on what I noticed about domains in this non-English-speaking country (their language goes something like this: ราชอาณาจักรไทย). Although only a limited group of Thai can read, write and speak English, all website references point to English / ASCII-character .com (or .th) domains. Not only companies serving foreigners in tourism and transportation but also (local) government sites refer to .com domains on billboards whose descriptions are otherwise entirely in Thai.
For me this emphasizes that although the number of top-level domains will soon increase significantly and IDNs are targeted at a big portion of the web in Asia, an “ascii-domain.com” is strong and widely used even on the other side of the world.
PS: I wish I had taken some pictures as examples, but I had to pretend to my girlfriend that I had no work on my mind.
PS2: Get a Blu-ray player and buy the “Planet Earth” documentary from the BBC. 550 minutes of jaw-dropping entertainment.
Friday, December 21, 2007
No civilization will last forever
While watching BBC’s Earth: The Power of the Planet I was pleasantly surprised that I wasn’t confronted with the typical nature-show patronizing speech on how our actions have caused global warming (or cooling? Let’s say climate change) and how very bad we are for using (fossil) energy. Instead, the presenter ended the series with a very insightful speech which I feel captures what the whole ‘we should all save the planet’ debate is really about:
our planet is really tough
and there is nothing to suggest that it is going to change anytime soon
in the long run, earth can cope with anything we can throw at it
we could clear all the jungles, but a jungle can regrow over a few thousand years
we could burn all earth’s fossil fuels, flooding the atmosphere with carbon dioxide
but even then, it will take the planet only a million years or so for the atmosphere to recover
even the animals we are wiping out will eventually be replaced by others equally rich in diversity
as the relentless work of evolution continues
it’s only a question of time
the earth will be just fine.
it’s not to say rapid changes we force on earth don’t matter
that is because humans operate on a different time scale
we have evolved to live in a world as it is now
so in changing this world, we are altering the environment that has allowed the human race to thrive
we could be creating conditions that threaten the long term survival of our civilization
so all this stuff about saving planet earth, well that is not the problem:
planet earth doesn’t need saving, earth is a great survivor
it’s not the planet we should be worrying about, it’s us.
(transcribed from: Dr Iain Stewart)
Wednesday, October 24, 2007
How is the domain market different from any other bubble?
An overview of the current domain name market.
We’ve had the Dutch Tulip Bubble (1637), the South Sea Bubble (1720), the Industrial Bubble (1929), the Internet Company Bubble (2000), and the recent subprime mortgage/US housing bubble, in each of which the asset, whether a tulip bulb, a property or a stock, rose to extreme prices in a short period and imploded in an even shorter one.
Exploding (stock) prices, and thus short-term profits, attract people of all classes who want to ‘invest’ and share in the profits. This creates an asset price based on what someone else would pay for it, without a realistic view of the real tangible asset (the company) that is supposed to provide the equity holder with dividends and/or long-term growth. The huge investment demand led to 159 dotcom IPOs in the first quarter of 2000: none of these companies was making any profit, yet their stock sold as if it were being given away for free.
Only a small bump in the road was needed for the bubble to implode. No new buyers could be found at the current price, and people sold, sold and sold, just because everybody else sold. The realisation that it was a bubble came too late.
The drop was strengthened by derivatives, where margin calls amplified the initial fall in price. This ‘side effect’ resulted in multi-billion-dollar write-offs on imploded assets by the world’s biggest banks, hedge funds and insurance companies.
For a few months now another market has proven to be booming and very liquid, with huge profits: the domain name market. Although a domain costs around $8/year (only $5 in bulk), after-market sales show profits of over 20,000% on, for example, English dictionary words with the .com extension.
Here is a short list of recent domains sold over 1 million dollars on the domain after-market:
Computer.com ($2,200,000)
Creditcards.com ($2,750,000)
CreditCheck.com ($3,000,000)
SEO.com ($5,000,000)
Beer.com ($7,000,000)
Porn.com ($9,500,000)
Seniors.com ($1,800,000)
[ See recent sales on sedo.com ]
A domain auction in June ’07 sold 16 domains for over $100,000 each, and this October another 12 domains sold for over $100,000 each. For example, CarSales.com sold for $400,000, an obvious fit for an intermediary company that sells cars. More questionable in future value is CrosswordPuzzles.com, which was auctioned for $210,000.
Remember, this price is paid for the domain name only: no content, website, marketing, or company. Just a domain name.
Why have these prices exploded? It seems the market has become saturated: all the ‘good’ names are already owned by someone. This is an important shift, because people expected domain names to have limited value (not exceeding the price of a new domain) as long as you could simply register one that was still available. But prices are exploding, built on a fundamental economic principle: something is only worth something if its availability is limited. Gold and diamonds have value because their supply is limited.
Domain names seemed unlimited, with millions of (real) combinations possible, but it seems that by now every dotcom domain name is owned, which makes availability outside the after-market limited.
In mid-2007 there were 138 million active domain names across all top-level domains, according to VeriSign. A large percentage of the domain market consists of individuals holding a portfolio of names (from a single domain to a large collection), where each name represents only an $8 investment. In the second quarter of 2007, 14.5 million new registrations were made.
The market is now flooded with buyers: companies starting a web company, individuals who want a website/weblog with a catchy name, large portfolio holders, and gold-seekers buying domains hoping to sell them for more.
For a multi-million-dollar widget-producing company, owning widget.com and widgets.com is prime real estate. Good real-life examples are vodka.com ($3,000,000), sold to a Russian vodka company, and beer.com ($7,000,000), sold to a Belgian brewer. With search engine rankings being highly volatile, this is a sound long-term investment that leaves you independent of any other company (like Google).
But we shouldn’t underestimate the value of the domain itself. Its value is in the steady stream of type-in traffic. The domain can be prime online real estate: people looking for ‘widget’ go online and type ‘widget.com’ as the URL (the “.com” extension has been promoted so much that it’s the first thing on people’s minds).
For visitors arriving at widget.com, domain owners set up a landing page that usually shows nothing but advertisements relevant to ‘widget’. With current cost-per-click prices in niche markets of over $2, a few clicks through to an advertiser could earn back the yearly cost of the domain, and it is not uncommon to pass this break-even point. Now the most interesting part: what if you have 2,000 domain names that each bring in $10/month? That would result in $20,000 revenue/month at a profit margin of over 90%, because the domain owner has no costs beyond the registration price.
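To make the arithmetic concrete, here is a small back-of-the-envelope sketch. The inputs are the figures quoted in this post ($8/year per domain, $10/month revenue per domain, $2 per click); the function name is made up for illustration, and it assumes for simplicity that the domain owner receives the full click price. Under those assumptions the margin lands at roughly 93%.

```javascript
// Rough economics of a parked-domain portfolio.
// Assumption: the owner keeps the full cost-per-click amount (a simplification).
function portfolioEconomics({ domains, revenuePerDomainPerMonth, costPerDomainPerYear, costPerClick }) {
  const monthlyRevenue = domains * revenuePerDomainPerMonth;
  // Registration is paid per year, so spread it over twelve months.
  const monthlyCost = (domains * costPerDomainPerYear) / 12;
  const margin = (monthlyRevenue - monthlyCost) / monthlyRevenue;
  // Clicks a single domain needs per year to pay for its own registration.
  const breakEvenClicksPerYear = Math.ceil(costPerDomainPerYear / costPerClick);
  return { monthlyRevenue, monthlyCost, margin, breakEvenClicksPerYear };
}

const portfolio = portfolioEconomics({
  domains: 2000,
  revenuePerDomainPerMonth: 10,
  costPerDomainPerYear: 8,
  costPerClick: 2,
});

console.log(portfolio.monthlyRevenue);                  // 20000
console.log(Math.round(portfolio.monthlyCost));         // 1333
console.log((portfolio.margin * 100).toFixed(1) + "%"); // 93.3%
console.log(portfolio.breakEvenClicksPerYear);          // 4
```

So four paid clicks a year already cover a domain's registration; everything after that is margin.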
And it’s a win-win-win situation. The visitor eventually lands on the page of the highest-paying relevant advertiser, the advertiser gets a potential customer who is interested in their product, and the domain owner has made a few cents without any promotion (costs).
All of this makes domains a very attractive asset, thanks to the stream of income for an unlimited period and the possible after-market value.
But you are too late*; as I said before, the good names are already taken. Before 1999 almost all dictionary words were gone, and in 2001, after the internet bubble, many quality domain names became publicly available again, only to be grabbed for $8 within seconds. Between 2001 and 2005 all valuable two-word combinations (widgetwidget.com) were sold (the value of a domain is not in its shortness, but in what it describes), leaving only domains that are overpriced relative to the initial registration price.
So we see huge market growth. But the main question is: is the domain market a bubble waiting to implode?
The resemblance to all previous bubbles is that the market is flooded with buyers wanting to make a short-term profit, the general media is joining the craze, lousy domains (assets) are sold in the after-market for 1,000% profit, and joining in now won’t make you rich.
More buyers attract more money, creating a self-fulfilling prophecy on the price, until ... But that leads to the underlying question: are domain names a long-term business opportunity?
There are many uncertainties*, too many to discuss here, but can we assume that all of this rests on whether people keep typing in these domains? Will these visitor numbers and CPC prices keep growing? Is providing a landing page with nothing but ads a long-term growth business?
* Update: when I say “you are too late”, I don’t refer to new business ideas or unique/niche domain profiles… of course you can make money if you are an entrepreneur.
I just want to warn you that you might be entering a ‘bubble’ because of little knowledge of the market and blindness caused by the success stories.
Thursday, September 20, 2007
Plug-and-Play YouTube videos
YouTube released a new data API a few weeks ago, based on Google’s GData. Reading through the docs to implement it in my YouTube search, I found the option for JSON (in script). With the JSON-in-script output you can grab YouTube’s videos without having to use a local proxy for AJAX requests. The local proxy is normally needed because JavaScript security settings block requests to other domains, but it usually slows the process down a lot. Using server-side scripting, where the data is grabbed, HTMLized and included, is also usually too difficult to implement if you just want to add a few vids somewhere.
I wrote a plug-and-play script to include YouTube videos in your site instantly without having to use local proxies, Flash, or any other workaround. All you need to do is include the JavaScript, and wherever you want to add a few videos (for example in a blog post) you add this code:
<div id="youtubeDiv">
<script>
insertVideos('youtubeDiv','search','madonna','15',1);
</script>
</div>
How it works
Instructions and examples are on this page. It’s a very early release so options like sorting are not functioning yet.
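For readers curious about the JSON-in-script trick itself, here is a minimal generic sketch of the idea, not the actual plug-and-play script. The endpoint `https://api.example.com/videos`, the parameter names `q` and `max`, the `callback` parameter name and both function names are assumptions for illustration only. The point is that a `<script>` tag may load from another domain, and the server wraps its JSON in a call to a function you name.

```javascript
// Build the request URL: the normal query parameters plus the name of the
// callback function the server should wrap its JSON response in.
function buildJsonpUrl(baseUrl, params, callbackName) {
  const query = Object.entries({ ...params, callback: callbackName })
    .map(([key, value]) => encodeURIComponent(key) + "=" + encodeURIComponent(value))
    .join("&");
  return baseUrl + (baseUrl.includes("?") ? "&" : "?") + query;
}

// Browser-only part: inject a <script> tag. When it loads, the response
// executes as JavaScript and calls window[callbackName](data).
function loadJsonp(baseUrl, params, onData) {
  const callbackName = "jsonpCallback" + Date.now();
  const script = document.createElement("script");
  window[callbackName] = function (data) {
    delete window[callbackName]; // clean up the temporary global
    script.remove();
    onData(data);
  };
  script.src = buildJsonpUrl(baseUrl, params, callbackName);
  document.head.appendChild(script);
}

console.log(buildJsonpUrl("https://api.example.com/videos", { q: "madonna", max: 15 }, "showVideos"));
// https://api.example.com/videos?q=madonna&max=15&callback=showVideos
```

Because the response runs as code in your page, this only makes sense with a data source you trust.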
Saturday, June 16, 2007
Please no more Digg Traffic!!!
List of most digged domains in 2007
Interesting news last week: the popular blogs Lifehacker and Gizmodo announced that they would prefer people not to digg every article anymore. A surprising post, which of course resulted in mass diggs, because if there is one thing the digg community loves, it is talking about themselves…
Their reasons to stop digging their website include the following:
- Prefer people to click their banners, instead of digging.
- Don’t want to keep adding servers every week.
- They want to be viewed as professional journalists, instead of link bait writers.
- And many other reasons we common people will never understand...
But let’s see how many frontpage stories these and other sites actually get. With the Digg API I grabbed the data from 01 Feb 2007 to 15 June 2007 and counted which domains had the most stories made popular (the notorious frontpage). A quick sort of the array produced a list that shows 3,033 unique domains getting 13,091 frontpage listings in the last four and a half months. On average there are 90 frontpage stories from 20 sources (domains) per day.
[...]
treehugger.com (74)
metacafe.com (82)
flickr.com (84)
lifehacker.com (87)
thinkprogress.org (95)
reuters.com (104)
destructoid.com (105)
msn.com (105)
google.com (107)
rawstory.com (110)
crooksandliars.com (111)
washingtonpost.com (112)
consumerist.com (116)
break.com (119)
go.com (126)
wired.com (152)
nytimes.com (179)
blogspot.com (188) - various blogs
cnn.com (188)
gizmodo.com (194)
yahoo.com (206)
engadget.com (302)
arstechnica.com (395)
co.uk (469) - various sites
youtube.com (1249)
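The counting behind a list like the one above can be sketched in a few lines. This is an illustration rather than the script I actually ran: the story URLs are made up, both function names are hypothetical, and it assumes the grouping keeps only the last two labels of the hostname, which would explain why every British site ends up under co.uk and every Blogger blog under blogspot.com.

```javascript
// Reduce a story URL to its last two hostname labels,
// e.g. "www.youtube.com" -> "youtube.com", "news.bbc.co.uk" -> "co.uk".
function frontpageDomain(storyUrl) {
  const labels = new URL(storyUrl).hostname.toLowerCase().split(".");
  return labels.slice(-2).join(".");
}

// Count stories per domain and sort ascending by count (ties alphabetically).
function countDomains(storyUrls) {
  const counts = new Map();
  for (const url of storyUrls) {
    const domain = frontpageDomain(url);
    counts.set(domain, (counts.get(domain) || 0) + 1);
  }
  return [...counts.entries()].sort((a, b) => a[1] - b[1] || a[0].localeCompare(b[0]));
}

// Made-up sample data standing in for the API results.
const sampleStories = [
  "http://www.youtube.com/watch?v=abc",
  "http://youtube.com/watch?v=def",
  "http://news.bbc.co.uk/2/hi/technology/1.stm",
  "http://www.guardian.co.uk/technology",
  "http://foo.blogspot.com/2007/05/post.html",
  "http://www.youtube.com/watch?v=ghi",
];

for (const [domain, count] of countDomains(sampleStories)) {
  console.log(`${domain} (${count})`);
}
// blogspot.com (1)
// co.uk (2)
// youtube.com (3)
```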
Although Gizmodo.com got 194 frontpage stories and Lifehacker.com 87, YouTube should have been the one complaining, with 1,249 stories (well, videos in this case)! Surprisingly, only about 3,000 unique domains made it to the digg homepage in this period: I thought the internet was bigger?!
So if this list teaches us one thing, it is that expanding digg’s horizon beyond the current 3,000 websites might not be such a bad idea, because the ‘top’ sites don’t appreciate it anyway!
PS: See this page for the complete list of frontpage domains.
Thursday, April 26, 2007
WordPress Theme Generator
Because I “do stuff-with-internet”, friends frequently ask me if I can create a blog for them. I always end up with the same default theme (Kubrick), just because I don’t have time to make a unique theme every time.
Somehow nobody ever created a theme generator where you click & try a few colors, a nice layout, tabs, titles, logo, and end up with something nice!
So here we are many hours later: WordPress Theme Generator.
It lets you design a complete theme (with widget support), save to a .zip file, extract and upload. Easy & Fast. No need for CSS or PHP knowledge. Let me know your comments, so I can improve (if needed).
Few Simple Examples:
So, if you want to create a WordPress theme fast & easy: Try it out.
Wednesday, March 28, 2007
Awareness Project
Today the website millionsoulsaware.org has been launched. Millionsoulsaware.org is a not for profit project that has the mission to raise awareness by featuring an article on an important topic that needs attention. Millionsoulsaware.org doesn’t ask for donations, but asks you to spread the word. The millionsoulsaware.org goal is to get one million souls aware on the current subject: Refugee camps worldwide.
In the upcoming weeks I will be promoting this in the web community, and I hope to reach the goal of a million souls aware within several weeks. I’ve added ‘ads’ for the project to my AdSense alternate ad code, and I ask website owners who are reading this to look at it and try it out: help raise awareness of an important topic and no more blank ads on your site. If you have a blog you might wanna try writing about something actually important.
If you have time to read this, make some time to also read the article. Awareness is the starting point for a better world. Thank you.
UPDATE 21 April 2007: Millionsoulsaware.org is going at a good pace! The counter is over 11,000 souls aware after three weeks, but in a way we already reached the 1 million goal, because the banner (above) is being downloaded almost 1 million times a day (!), with the help of my lyrics site friends. The click-through rate is low, but we’ll get there…
Monday, February 12, 2007
R.I.P. - A tribute to web 1.0
[...] .gif files optimized for Windows 95 on a 36k modem. These sites had no AJAX techniques, profiles, or blogs, let alone an option to comment. The internet was a place to look around rather than to interact.
In ’96 Hotmail was introduced: the first place to get a free email address, disconnected from an ISP.
Hotmail was probably the first contact many new web users had with the powers of the internet: communicating by email. Four years after its introduction, 30 million people worldwide were exchanging @hotmail email addresses. At some point n00bs may even have thought it was the only way to ‘email’.
Hotmail was bought by Microsoft at the end of 1997 for just 400 million dollars, a bargain by internet bubble standards.
Now in 2007 the end of Hotmail is near (although @hotmail.com addresses won’t go anywhere), as it is being transformed into “Live” mail to become an integrated part of Microsoft’s “Live” family.
Geocities was the most popular place to create your own free homepage on the web.
In 1997 Geocities was the fifth most popular website, with over 500,000 homepages created. Yahoo bought Geocities two years later for $3.57 billion and started to actively commercialize the homepages with various types of advertising, which turned out to be their death sentence. With ‘real’ web hosting becoming affordable for anybody, the need for free homepages in this form vanished. Geocities accounts are now only used for outdated information, and to upload/download illegal mp3 files...
Search engine Altavista was the Google of the last millennium. The first real effort to index the World Wide Web.
It was popular because it was one of the few search engines that actually came up with good search results.
But Altavista had a hard time fighting spam listings in their results.
While spam grew exponentially in Altavista, some company named Google found a way to prioritize web pages more intelligently, and thus keep spam out better.
When people tried Google and compared it with Altavista, it became an easy switch. Since then Altavista’s market share has dropped to almost nothing, with visitors coming only from old bookmarks. Altavista never recovered (or tried to). Yahoo! is now the proud owner of this piece of history.
ICQ (for the younger people: a play on “I seek you”), created in 1996, was an easy-to-use instant messenger program where you could add friends to your list and see if they were online. Doesn't sound new at all, but back then it was revolutionary for the masses and it became the ‘application’ everybody had installed.
ICQ was acquired by AOL in June 1998 for a whopping $287 million plus contingent payments of up to $120 million over three years based on growth performance levels.
What went wrong? Eventually the program got too many additional features that made the application heavy and unorganized. Meanwhile competition from AOL IM, Yahoo IM, and MSN Messenger increased, and the friends on your ICQ list left the application, eventually resulting in a mass abandonment of the network.
Netscape, now only famous for the oldschool “optimized for Netscape” on outdated webpages, has dropped from a browser share of over 50% in ’98 to less than 1% now.
What went wrong? Netscape was a ‘victim’ of Microsoft’s notorious ways of dealing with competitors. But in the end most of the blame lies with Netscape itself, due to a lack of innovation and an inability to tie customers to their product. The Netscape browser was good in the beginning but got slowwww and buggy, and had an (even more) ugly layout compared to Internet Explorer.
Struggling to survive, Netscape became a non-innovative, boring web portal in 2006, waiting to disappear completely into the history books.
Bringing an online standard in streaming audio since ’95. The first audio from the web was transmitted in the Realplayer format. This was in a time of .wav files and slow 36k modems: not a good combination. Real had created the solution with their applications, and (live) internet broadcasting was born.
But what went wrong? The Realplayer audio format (and the player) became obsolete due to small mp3 files that could be saved locally, and Windows Media Player, which came standard on all PCs.
Yes, the death story resembles Netscape vs. Internet Explorer. The program also became too commercial, with annoying ‘buy pro version’ pop-ups every 10 minutes.
The web hasn’t always been an ‘open’ place. In the previous millennium there was only one company where you could buy a .com, .net or .org domain.
For the small (
It took until the beginning of 2000 before they lost their monopoly position and domain prices dropped by over 95%.
Since then innovation has halted and Network Solutions has become one of thousands of anonymous domain registrars.
Thursday, December 07, 2006
8 questions about the web you always wanted answers to
The most popular 10,000 websites analyzed - 8 Questions & Answers
Is porn dominating the web?
Is China taking over the web?
Hola, 你好, Konnichi-wa – excuse me, what language?
Are all websites made in Silicon Valley?
I was already link building my Geocities.com account!
Is it true that Yahoo and MSN are more used than Google?
Has the web evolved to web 2.0?
Why do I always see ‘ads by Goooooogle’?
Tuesday, October 17, 2006
Why Can't I Change?
some thoughts on change
It’s in our human psychology to keep the status quo: we prefer going the route we were going all along. The opposite of the status quo is change. Humans are very bad at initiating change. Change means that you have to put in effort, it’s unpredictable, it creates risk, and worst of all, it means that we were wrong before.
Changes are often wanted to improve a current situation. At a certain point you have to decide to change while the option to continue is still open. Visualize this as a crossroads where you can continue straight on, but can also change and turn. This kind of wanting to change rarely succeeds.
Most changes made in our lives are forced changes:
- Forced into change: at some point you are at a “T” intersection, forced to make a decision because continuing is no option. The change is often postponed as long as possible.
- Gradual change: at some point you are at a “Y” intersection, where continuing straight on is no option and a decision (thus a change) has to be made.
Our day-to-day decisions are made unconsciously through the use of heuristics. It’s too complicated for our brains to think everything over. Change has to be initiated by our conscious mind, because our unconscious will prefer the status quo and its heuristics. These separate parts of our brain don’t work well together, and the old heuristics conflict with the change our conscious mind wants.
E.g. smoking
All signs and information indicate that people should stop smoking because it makes them sick, and of all the millions of people who smoke, around 70% want to stop. Quitting smoking is one of those changes you have to decide upon, put effort into, and do on your own. But only a few (about 6% actually succeed) are able to change (stop) without a doctor telling them that it is quit or die (the “T” intersection).
Question
Think about it: What have you ever consciously changed in your life?
Wednesday, October 04, 2006
Create your own Tag Cloud - Easy!
For a website that wanted to be very web 2.0, I had to create a tag cloud like the ones on del.icio.us or Flickr. People think they are cool and useful, so who am I to disagree?! Why re-invent the wheel every time, when we have the internet as an unlimited source of code to steal, er, of examples?
So as part III of my coding-give-aways* (I,II) I give you:
The Tag Cloud Creator
1) make a $variable with all the words you want in your tag cloud.
2) grab this php example file that is only 30 lines in size (you can use it any way you want - (you like that ey!))
3) include it somewhere on your site, upload it to your server and - if you are not the dumbest nerd - you should get something like this:

Digg.com - as search cloud or try it on other sites.
5) now you have created your own tag cloud to use for a search engine, photo archive, or whatever you want.
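If you are wondering what goes on inside such a script, the core is tiny: count how often each word occurs and scale the font size between a minimum and a maximum. Below is a sketch of that idea in JavaScript rather than the PHP file linked above; the function names, the 10px to 30px range and the sample words are all made up for illustration.

```javascript
// Escape the few characters that matter when putting a word into HTML.
function escapeHtml(text) {
  const entities = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;" };
  return text.replace(/[&<>"]/g, (char) => entities[char]);
}

// Count words and map each count linearly onto a font size.
function buildTagCloud(text, { minSize = 10, maxSize = 30 } = {}) {
  const counts = new Map();
  for (const word of text.toLowerCase().split(/\s+/).filter(Boolean)) {
    counts.set(word, (counts.get(word) || 0) + 1);
  }
  const values = [...counts.values()];
  const lowest = Math.min(...values);
  const highest = Math.max(...values);
  const spread = highest - lowest || 1; // avoid dividing by zero when all counts are equal
  return [...counts.entries()]
    .sort((a, b) => a[0].localeCompare(b[0])) // tag clouds are usually alphabetical
    .map(([word, count]) => ({
      word,
      count,
      size: Math.round(minSize + ((count - lowest) / spread) * (maxSize - minSize)),
    }));
}

// Turn the sized tags into a row of <span> elements.
function renderTagCloud(tags) {
  return tags
    .map((tag) => `<span style="font-size:${tag.size}px">${escapeHtml(tag.word)}</span>`)
    .join(" ");
}

const tags = buildTagCloud("php mysql php ajax php mysql");
console.log(tags.map((tag) => `${tag.word}:${tag.size}`).join(" "));
// ajax:10 mysql:20 php:30
console.log(renderTagCloud(tags));
```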
So have fun, and tell your social community friends
Saturday, September 16, 2006
Easy Fuzzy Logic with MySql – The end of “no results found”
As a web programmer I ran into a problem when running a complicated (user) search on MySQL: the matching is too strict, which produces the well-known “no results found” message even though good (although not perfect) results exist!
The problem
When a traditional search is initiated, SQL queries are generated along the lines of:
User search: where tv_manufacturer = "sony" and tv_description like "%widescreen%" and tv_price < 1000;
The user is asking for a television that is a Sony AND widescreen AND less than 1000 dollars. This gives very accurate results, but misses the opportunity when the best-matching TV is $1050. In real life the user would be okay with paying $50 more, but our query won’t allow it. We want that almost-perfect match shown in the results!
This query can be rewritten by replacing the AND with OR, but by using OR we get inaccurate results, because the results will show any TV below 1000 dollars OR any Sony OR any widescreen: useless.
The good news is that we can solve this without having to ask the user for something as factual and nerdy as: WIDESCREEN AND (SONY OR 1000 DOLLAR), which is way too difficult.
The answer is in what is named ‘fuzzy logic’. Fuzzy logic uses mathematical algorithms to be more natural and (semi-)intelligent:
User search: a preferably Sony TV with widescreen support for more or less a 1000 dollars, I prefer less. Please.
A few specialist software companies offer fuzzy logic software, but this is highly tailored to the specific needs of the system.
But MySQL has a solution that, with a few hacks, gives accurate results.
The solution:
The solution is to be found in the “MATCH AGAINST” function of MySQL. It is a text matching system where you can add your preferences, and the query gives points to indicate how well each row matches. Very few people use this, maybe because they are disappointed that it only matches text. But in this post I will show you how to also integrate a (in the real world less strict) demand like: less than $1000.
We do this by encoding the numbers as words. In this case the price of each TV in the database is encoded into a unique word like “pricemaxthousand”, etc.
All the features of the TV are stored in a new (text-only) column named encodedsqlrow.
So we get this: encodedsqlrow = “sony widescreen pricethousandtotwothousand diagonalthirtyinch”.
With the match against function we can also search “IN BOOLEAN MODE”. This will add ‘preferences’ to every search demand (word) in our query.
The preferences you can give to a demand (word) are, in order:
+ = Obligated
> = Important
< = More or less important
- = Without
(A word without an operator is optional. Note that ~ is not a ‘more or less’ operator: in MySQL it makes a word count against the score.)
And last but not least, we can retrieve a score with every result, so the most accurate results can be listed at the top.
With all this together we can build, for a user, a search query that returns more natural, human-like picks.
Creating our query:
if ($demandpricemax <= 1000)
$encodedsearch = ">sony +widescreen <pricemaxthousand";
Getting the score:
SELECT tv_manufacturer, MATCH (encodedsqlrow) AGAINST ('$encodedsearch' IN BOOLEAN MODE) AS score
Setting the match search:
WHERE MATCH (encodedsqlrow) AGAINST ('$encodedsearch' IN BOOLEAN MODE) ORDER BY score DESC
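The two encoding steps (numbers to words, demands to a boolean-mode search string) can be sketched as follows. This is an illustration in JavaScript, not the code running on my site: the price buckets, the word `priceovertwothousand`, the preference names and all function names are assumptions made up for the example. It deliberately avoids the `~` operator, since in MySQL boolean mode that makes a word count against the score; a weaker wish is expressed with `<` (or with no operator at all).

```javascript
// Hypothetical price buckets; the words follow the naming used in this post.
const PRICE_BUCKETS = [
  { upTo: 1000, word: "pricemaxthousand" },
  { upTo: 2000, word: "pricethousandtotwothousand" },
  { upTo: Infinity, word: "priceovertwothousand" },
];

// Encode a numeric price as the word of the first bucket it fits in.
function encodePrice(price) {
  return PRICE_BUCKETS.find((bucket) => price <= bucket.upTo).word;
}

// Build the text that would be stored in the encodedsqlrow column.
function encodeRow(tv) {
  const words = [tv.manufacturer.toLowerCase()];
  if (tv.widescreen) words.push("widescreen");
  words.push(encodePrice(tv.price));
  return words.join(" ");
}

// Map a preference to a MySQL boolean-mode operator.
const OPERATORS = new Map([
  ["required", "+"],
  ["important", ">"],
  ["optional", ""],
  ["lessImportant", "<"],
  ["excluded", "-"],
]);

// Turn a list of demands into the string passed to AGAINST (... IN BOOLEAN MODE).
function buildBooleanSearch(demands) {
  return demands
    .map(({ word, preference }) => {
      if (!OPERATORS.has(preference)) throw new Error("Unknown preference: " + preference);
      // Only plain encoded words are allowed, so no operators can be smuggled in.
      if (!/^[a-z0-9]+$/.test(word)) throw new Error("Unexpected search word: " + word);
      return OPERATORS.get(preference) + word;
    })
    .join(" ");
}

console.log(encodeRow({ manufacturer: "Sony", widescreen: true, price: 1050 }));
// sony widescreen pricethousandtotwothousand
console.log(
  buildBooleanSearch([
    { word: "sony", preference: "important" },
    { word: "widescreen", preference: "required" },
    { word: encodePrice(1000), preference: "lessImportant" },
  ])
);
// >sony +widescreen <pricemaxthousand
```

With this search string the $1050 Sony still comes back (it is widescreen, and a Sony), just with a lower score than a TV that also sits in the right price bucket. When the string goes into a real query, passing it as a bound parameter is safer than pasting it into the SQL.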
Example Page – integrated:
For a Dutch website I built this function so that it matches all studies (1800) against the many demands of a to-be-student. He could say, for example: I am searching for a study that is obligated to be in Amsterdam, more or less important in the economic field, important with an average workload, important with mostly female students, and more or less important at a university.
That is a lot of demands, and it still gives accurate results, including studies in Amsterdam that have mostly male students.
Have any questions or want to bash this text? My email address is on the right-hand side of your screen.
Note: the database column (encodedsqlrow) must have a FULLTEXT index (in phpMyAdmin: the blue “T” under ‘actions’). This makes it searchable for the MATCH AGAINST function. Otherwise it won’t work.
Sources:
http://en.wikipedia.org/wiki/Fuzzy_logic
http://www.seattlerobotics.org/encoder/mar98/fuz/flindex.html
http://www.wcc.nl/
http://www.kiesjestudie.nl/l-studietest.html
http://dev.mysql.com/doc/refman/5.0/en/fulltext-boolean.html
http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html
Thursday, August 31, 2006
Use Cache to Speed up webserver
Last month I was getting high traffic on one of my sites and it was killing the server: waiting times of over 10 seconds (or timeouts) due to too many queries on MySQL.
So I had to find a way to stop querying MySQL on every ‘hit’ on my site. This post will fill you in on how to cache MySQL query results the easy and fast way, with text files in a folder as the cache.
Let’s start off with the example script (in PHP): show me!
This is what it does:
Step 1) We have a SQL query. Before we just ask mysql to get us that information, we check if a file exists (named after that sql query) in our “cache” folder.
Step 2) If it does exist, we check how old the file is.
- Over X days? Load the data from mysql, and update the file.
- Under X days? Open the file, and print it so we don’t need to load the server with a new query.
Step 3) Oohh!! Wow, that’s already it! So all you have to do to implement this: put your queries in a function, create a directory “cache” that is writable (777), load the MySQL results and write them to a file. For a webserver, opening a file (the cache) is faster than connecting to MySQL and computing the query.
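The steps above can be sketched as follows. This is a Node.js illustration of the same idea, not the PHP example script: `cachedQuery`, its options and the stand-in `runQuery` function are made-up names, the cache file is named after an MD5 hash of the query (an assumption, chosen because raw SQL makes an awkward filename), and the query runs synchronously to keep the sketch short.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");
const crypto = require("crypto");

// Return cached rows if the cache file is fresh enough; otherwise run the
// query, store the result as JSON, and return it.
function cachedQuery(sql, runQuery, { cacheDir, maxAgeMs }) {
  fs.mkdirSync(cacheDir, { recursive: true });
  // Step 1: one cache file per query, named after a hash of the SQL text.
  const fileName = crypto.createHash("md5").update(sql).digest("hex") + ".json";
  const file = path.join(cacheDir, fileName);

  // Step 2: if the file exists and is young enough, serve it from disk.
  if (fs.existsSync(file)) {
    const ageMs = Date.now() - fs.statSync(file).mtimeMs;
    if (ageMs < maxAgeMs) {
      return JSON.parse(fs.readFileSync(file, "utf8"));
    }
  }

  // Otherwise hit the database and refresh the cache file.
  const rows = runQuery(sql);
  fs.writeFileSync(file, JSON.stringify(rows));
  return rows;
}

// Usage with a stand-in for the real database call.
const fakeDatabase = (sql) => [{ id: 1, title: "Result for: " + sql }];
const options = { cacheDir: path.join(os.tmpdir(), "query-cache-demo"), maxAgeMs: 24 * 60 * 60 * 1000 };
console.log(cachedQuery("SELECT id, title FROM posts LIMIT 1", fakeDatabase, options));
```

The trade-off is staleness: visitors may see data up to `maxAgeMs` old, which is fine for lists that rarely change and wrong for anything that must be live.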
Look at the example above, and I think you are all done.
note: this doesn’t cache the whole page, only a small portion that needs data from the database.
Update: This site is now served from a 64-bit Dual Core Opteron 265 server with 2 GB.