Dev.Opera - Follow the standards, break the rules

Discuss the articles posted on Dev.Opera.

By pavel.studeny, Thursday, 10. April 2008, 20:50:06

Indexing and searching in Opera - visited pages search

In this article, Pavel Studený lifts the lid off Opera Quick Find History Search, an exciting new Opera feature that allows you to search the full text of previously visited pages.

( Read the article )
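Later in the thread, the article's inverted index comes up. For readers who want a concrete picture of that idea, here is a minimal in-memory sketch: each word maps to the set of visited pages containing it, and a query returns the pages that contain every query word. It is not Opera's implementation; the class `HistoryIndex`, its methods, and the simple word-splitting tokenizer are hypothetical choices made for illustration.

```python
import re
from collections import defaultdict

# Hypothetical tokenizer: lowercase runs of word characters. It does not
# handle languages written without spaces between words.
_WORD_RE = re.compile(r"\w+")


def tokenize(text: str) -> list[str]:
    return [w.lower() for w in _WORD_RE.findall(text)]


class HistoryIndex:
    """Minimal in-memory inverted index: word -> set of page URLs."""

    def __init__(self) -> None:
        self._postings: dict[str, set[str]] = defaultdict(set)

    def add_page(self, url: str, text: str) -> None:
        # Record the page under every distinct word it contains.
        for word in set(tokenize(text)):
            self._postings[word].add(url)

    def search(self, query: str) -> set[str]:
        # AND semantics: a page must contain every query word.
        words = tokenize(query)
        if not words:
            return set()
        # .get() does not insert missing keys into the defaultdict.
        result = set(self._postings.get(words[0], set()))
        for word in words[1:]:
            result &= self._postings.get(word, set())
        return result


index = HistoryIndex()
index.add_page("http://example.com/a", "Opera indexes visited pages")
index.add_page("http://example.com/b", "Search the full text of pages")
print(sorted(index.search("pages")))          # ['http://example.com/a', 'http://example.com/b']
print(sorted(index.search("visited pages")))  # ['http://example.com/a']
```

A browser-scale index would likely persist the postings to disk and store a rank per word rather than bare URL sets; both are left out to keep the sketch short.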

By FataL, Tuesday, 15. April 2008, 18:01:08

I have a question regarding the indexing algorithm...
Why doesn't Opera 9.5 include parts of the URL in the index?
For example, I very often read the Desktop Team blog. It has the word "desktopteam" as part of its URL. When I type desktopteam into the address field, I get no results. It's strange to have all the page content indexed, but not the URL...

Any comments on this issue?

By pavel.studeny, Wednesday, 23. April 2008, 15:40:40

Only domains are indexed from a URL. It might change.
Furthermore, some types of page, such as https, are not indexed to protect your privacy.
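Pavel's answer means that only the domain part of a URL produces searchable words, which explains why a word that appears elsewhere in the URL (like "desktopteam" in FataL's example) finds nothing. A minimal sketch of the difference, assuming URL words are split on dots, hyphens and other non-alphanumeric characters. The function `url_terms` and its `include_path` flag are hypothetical; `include_path=True` stands in for the change Pavel says might come.

```python
import re
from urllib.parse import urlsplit


def url_terms(url: str, include_path: bool = False) -> list[str]:
    """Return index terms for a URL (hypothetical helper).

    By default only the host name contributes terms, mirroring the
    "only domains are indexed" behaviour; include_path=True also adds
    words from the path.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""  # urlsplit lowercases the host name
    # "blogs.example.com" -> ["blogs", "example", "com"]
    terms = [t for t in re.split(r"[.\-]", host) if t]
    if include_path:
        # Split the path on anything that is not an ASCII letter or digit.
        terms += [t.lower() for t in re.split(r"[^0-9A-Za-z]+", parts.path) if t]
    return terms


url = "http://blogs.example.com/desktopteam/blog/"
print(url_terms(url))                     # ['blogs', 'example', 'com']
print(url_terms(url, include_path=True))  # ['blogs', 'example', 'com', 'desktopteam', 'blog']
```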

By kamalesh, Thursday, 24. April 2008, 00:25:46

Great article on another cool Opera innovation, Pavel. :smile:

Can you explain whether the results from Opera History Search should be sorted the same way as the results in the address bar? It appears that OHS sorts strictly by relevance, while searching from the address bar sorts by relevance AND recency. Am I seeing that correctly?

(This is in Build 4784, not sure if you'll further tweak this in subsequent builds.)

By FataL, Thursday, 24. April 2008, 18:41:47

Originally posted by pavel.studeny:

Only domains are indexed from a URL. It might change.

Would be great! :up: :wait:

Originally posted by pavel.studeny:

Furthermore, some types of page, such as https, are not indexed to protect your privacy.

That's fair, but I would make it depend on the Cache HTTPS option (if I decided to cache HTTPS pages, why not allow searching them?).
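FataL's suggestion can be stated as a small indexing policy: skip HTTPS pages by default, but index them when the user has opted to cache HTTPS pages anyway. A sketch under that assumption; `should_index` and its `cache_https` parameter are hypothetical names standing in for the browser's cache setting, not real Opera preferences, and the sketch only considers http and https pages.

```python
from urllib.parse import urlsplit


def should_index(url: str, cache_https: bool) -> bool:
    """Decide whether a visited page goes into the history index.

    Hypothetical policy: HTTPS pages are skipped unless the user has chosen
    to cache HTTPS pages; plain HTTP pages are always indexed; any other
    scheme is ignored in this sketch.
    """
    scheme = urlsplit(url).scheme.lower()
    if scheme == "https":
        return cache_https
    return scheme == "http"


print(should_index("http://example.com/", cache_https=False))    # True
print(should_index("https://bank.example/", cache_https=False))  # False
print(should_index("https://bank.example/", cache_https=True))   # True
```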

By olli O, Friday, 25. April 2008, 14:54:30

"Furthermore, some types of page, such as https, are not indexed to protect your privacy."

IMO that is a stupid bug

By pavel.studeny, Tuesday, 27. May 2008, 15:35:40

Yes, the results can be sorted in a few different ways. They are sorted by relevance in opera:historysearch, while the address bar is tuned for the best performance, in case you type quickly.
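kamalesh's observation and Pavel's reply describe two orderings of the same hits: relevance only (opera:historysearch) and one that also seems to favour recent visits (the address bar, as kamalesh observed it). The thread does not give Opera's actual formula, so the blend below is an assumption: relevance multiplied by an exponential decay controlled by a hypothetical `half_life_days` parameter. `Hit` and both function names are likewise illustrative.

```python
import math
from dataclasses import dataclass


@dataclass
class Hit:
    url: str
    relevance: float  # text-match score from the index (higher is better)
    age_days: float   # days since the page was last visited


def by_relevance(hits: list[Hit]) -> list[Hit]:
    # Relevance-only ordering, like the one described for opera:historysearch.
    return sorted(hits, key=lambda h: h.relevance, reverse=True)


def by_relevance_and_recency(hits: list[Hit], half_life_days: float = 7.0) -> list[Hit]:
    # Hypothetical blend: a hit's relevance is halved for every
    # half_life_days since the last visit.
    def score(h: Hit) -> float:
        return h.relevance * math.pow(0.5, h.age_days / half_life_days)

    return sorted(hits, key=score, reverse=True)


hits = [Hit("http://old.example/", 0.9, 30.0), Hit("http://new.example/", 0.6, 1.0)]
print([h.url for h in by_relevance(hits)])              # ['http://old.example/', 'http://new.example/']
print([h.url for h in by_relevance_and_recency(hits)])  # ['http://new.example/', 'http://old.example/']
```

With a 7-day half-life, the older but more relevant page drops below the fresher one; a longer half-life moves the blended order back toward pure relevance.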

By HaJotKEX (Banned User), Tuesday, 27. May 2008, 15:57:50

Originally posted by olli:

IMO that is a stupid bug

Completely agreed... :D

It's at least worth an option in the Preferences Editor (opera:config)!

By Profnovice, Tuesday, 3. June 2008, 11:00:09

I have one question concerning indexing!!!
Is it possible to index all the content of pages without including the URL in the indexed page content?

By eestlane, Tuesday, 15. July 2008, 12:49:21

A good article. However, I couldn't find an explanation of how the rank of each word is calculated before it goes into the database. How is it done?

I'm planning to develop something similar with PHP and MySQL.
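The thread doesn't say how Opera computes a word's rank. As one common textbook alternative (not Opera's method), TF-IDF weights a word by how often it appears in a page, discounted by how many pages contain it: weight = (count / page length) × log(N / df). A sketch in Python; the same arithmetic carries over to a PHP/MySQL setup like the one eestlane mentions. The function name `tf_idf` and its input format (page id mapped to a token list) are assumptions.

```python
import math
from collections import Counter


def tf_idf(docs: dict[str, list[str]]) -> dict[str, dict[str, float]]:
    """Compute a TF-IDF weight for each (page, word) pair.

    docs maps a page id to its list of tokens. Uses the standard
    tf * log(N / df) formulation.
    """
    n_docs = len(docs)
    # Document frequency: in how many pages each word appears.
    df = Counter(word for tokens in docs.values() for word in set(tokens))
    weights: dict[str, dict[str, float]] = {}
    for doc_id, tokens in docs.items():
        counts = Counter(tokens)
        total = len(tokens)  # an empty page yields no counts, so no division by zero
        weights[doc_id] = {
            word: (count / total) * math.log(n_docs / df[word])
            for word, count in counts.items()
        }
    return weights


docs = {"a": ["opera", "search", "history"], "b": ["opera", "browser"]}
w = tf_idf(docs)
print(round(w["a"]["search"], 3))  # 0.231
print(w["a"]["opera"])             # 0.0
```

A word that occurs in every page, such as "opera" here, gets weight 0 because log(N / df) = log(1) = 0; that is one language-independent way very common words can fade out of the ranking without a hand-written stop-word list.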

By gnpk, Tuesday, 23. September 2008, 19:05:51

Can you explain a little about how the rank for a particular word is calculated? Some words might be skipped in the ranking (for example, a word like "the" in English). But that is very much language dependent. As mentioned in the article, some languages might not have such grammatical elements (btw, I didn't know about the lack of spaces in Japanese and Arabic - thanks for that). So, how do you pick which words become part of the inverted index?
Also, if you could explain how you pick words in non-English text (specifically Asian languages), it would be very interesting to know.

Gangadhar
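Gangadhar's question covers two separate problems: skipping very common words, and finding word boundaries in scripts written without spaces. The thread doesn't describe Opera's approach. Two common techniques, shown here only as an illustration, are a per-language stop-word list for space-delimited text and overlapping character bigrams for runs of Japanese kana or CJK ideographs, which need no dictionary. The tiny stop-word list, the Unicode ranges, and the function names are assumptions for this sketch.

```python
import re

# Tiny illustrative English stop-word list (assumption; real lists are per language).
STOP_WORDS = {"the", "a", "an", "of", "and"}

# Runs of hiragana, katakana, or CJK unified ideographs, which are usually
# written without spaces between words.
_CJK_RUN = re.compile(r"[\u3040-\u30ff\u4e00-\u9fff]+")
_WORD = re.compile(r"\w+")


def bigrams(run: str) -> list[str]:
    # Overlapping character pairs; a single character is kept as-is.
    if len(run) == 1:
        return [run]
    return [run[i:i + 2] for i in range(len(run) - 1)]


def _words(segment: str) -> list[str]:
    # Space-delimited words, lowercased, with stop words removed.
    return [w.lower() for w in _WORD.findall(segment) if w.lower() not in STOP_WORDS]


def index_terms(text: str) -> list[str]:
    terms: list[str] = []
    pos = 0
    for m in _CJK_RUN.finditer(text):
        terms += _words(text[pos:m.start()])  # text before the CJK run
        terms += bigrams(m.group())           # the CJK run itself
        pos = m.end()
    terms += _words(text[pos:])               # text after the last CJK run
    return terms


print(index_terms("Search the 検索履歴 page"))  # ['search', '検索', '索履', '履歴', 'page']
```

Bigrams over-generate (索履 is not a real word), but because queries are split the same way, a search for 履歴 still matches. Dictionary-based segmentation gives cleaner terms at the cost of needing a word list per language.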

By jeremyhudson, Friday, 21. November 2008, 15:45:09

A really useful new feature in the Opera browser, although, as already pointed out above, it is still limited when it comes to URL indexing; such restrictions call for at least more customizable preferences. Still, it's good to see that browser developers don't rest on their laurels.

By kamalesh, Friday, 21. November 2008, 16:38:18

What limitation, Jeremy? Opera can find the web page title, all the contents of that page, and the entire URL path.

(And it's not new, since it's been available since April; though I guess it's new if you use Firefox's half-"awesome" bar or Chrome's "One Box" address bar...)
