Last year, in "The Future of Web Search" (ONLINE, May/June 1999, p. 54-61), I took on the somewhat quixotic task of painting a portrait of the state of Web search in the year 2004. All of the industry leaders I spoke with were amused by the audacity of the idea--after all, on Internet time, most people can't predict what will happen five months from now, let alone five years hence.
Nonetheless, some long-term trends emerged that are definitely shaping the constantly shifting landscape of search. But inevitably, given the startling acceleration in the growth of the Web and players seeking to capture the minds (and dollars) of its users, new ideas, technologies, and projects have sprung up that hold significant promise--and implications--for the future of Web search.
CONVERGENCE: THE EVERYWHERE WEB
The most ballyhooed and visible example of convergence in the past year was the proposed merger between AOL and Time Warner. The deal has been widely touted as the killer marriage of a giant content company with the dominant provider of Internet access, but it remains to be seen whether the new company will emerge as an online titan or succumb to the sclerotic forces that inevitably tend to bedevil huge companies.
Also widely heralded were new-generation mini-browsers, appearing on everything from palm organizers to cellular phones to (calling Dick Tracy) wrist watches. These devices will soon become pervasive thanks to advances in micromachining--carving out and building up microscopic structures on silicon wafers--that will allow multiple components to be constructed on a single chip at very reasonable cost. Search is a killer app for mini-browsers, though it will predominantly be search for real-world and real-time information--maps and directions, stock quotes, news and sports scores, and so on.
Convergence also arrived in not-so-obvious forms. Scores of Internet-enabled devices were announced at the CES International show in Las Vegas in January. Internet refrigerators, dishwashers, and microwave ovens will soon be commonplace.
It's easy to dismiss these new developments as irrelevant for serious searchers. Look at the companies involved in the new generation of Web gizmos, though, and you'll see the major players behind today's Internet infrastructure: Cisco, Sun Microsystems, Microsoft, Intel. You'll also see serious involvement from makers of consumer-entertainment products.
For these new products to be successful, massive amounts of R&D money will be spent, with particular emphasis on making them easy to use. This should drive a huge wave of innovation, particularly in the areas of connectivity and user-interface design. These products will also spur (finally) the creation of an effective micropayment system.
The implications of micropayments for both traditional information providers and Web search engines/portals are enormous. Documents will shatter into packets; packets into straightforward "answers" that are true and succinct, not links back to full documents. Micropayment capabilities will once and for all overcome the pervasive but ridiculous notion that "information wants to be free," allowing content owners to slice and dice information in myriad ways, and searchers to become information consumers who won't think twice about paying for what they're getting because their needs will finally be satisfied.
While a certain part of the Web population will continue to want full-text search and retrieval, a much larger portion will use search simply to solve problems. When your Web-enabled VCR can automatically answer hard questions like "How do I get the clock to stop blinking 12:00?" by querying the U.S. Naval Observatory Master Clock and automatically reprogramming itself, even "user friendly" services like Ask Jeeves may find themselves with countless underutilized answers in search of a questioner.
INFORMATION UNDERLOAD: THE SEARCH ENGINES AWAKEN
July 1999: A study published in Nature by NEC Research Institute scientists Steve Lawrence and Lee Giles estimated that the publicly indexable Web had over 800 million pages, but that no search engine indexed more than 16% of the total. Like a glass of cold water splashed in their collective faces, this seemed to awaken the major search services and spur them to action. Suddenly, the competition to have the biggest index of the Web became intense. Within months of publication of the NEC study, search-engine claims of doubling or tripling index sizes became commonplace.
Then, in early 2000, Inktomi, together with the NEC Research Institute, announced the results of a study showing that the Web had grown to more than 1 billion documents. At this juncture, relative newcomer FAST appeared to be the leader, with 300 million pages in its index. Both AltaVista and Excite claimed numbers in the 250-million-page range. All claimed to have sampled many more pages than they indexed, presumably filtering out duplicate pages or spam.
As impressive as these catch-up numbers are, the engines are still lagging behind the growth rate of the Web. IDG estimates that the Web will grow to over 13 billion pages over the next three years. It's beginning to appear that centralized approaches to creating Web indexes may not scale with the Web's explosive growth. Catching up will likely require adopting some sort of distributed search approach, similar to the approach patented by Infoseek.
Distributed search engines would have much in common with the massive scalable databases used by scientists who are increasingly working with huge datasets. For example, CERN (the birthplace of the Web) is building the world's largest particle accelerator, the Large Hadron Collider (LHC). The LHC will generate at least one petabyte of data per year. To put that in perspective, consider that the entire Library of Congress stores less than a thousandth of that amount (one terabyte).
To cope with this truly staggering amount of information, the LHC will decentralize storage, maintaining databases housed on computers around the world. Any researcher with Web access can query these servers and display results in seconds--from the full, unfiltered dataset. Adapting this technique to search indexes has obvious advantages, and as a technological solution, distributed search appears to have great potential. Whether the business models of the search engines can adapt to embrace the approach is another question altogether.
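To make the distributed approach concrete, here is a minimal sketch in Python of how a search front end might fan a query out to several index shards in parallel and merge whatever comes back into a single ranked list. The shard names and canned results are invented stand-ins for real index servers, not a description of any actual engine's architecture.

```python
# Minimal sketch of distributed search: fan a query out to several
# index shards in parallel and merge results by score.
# The shard "endpoints" and result format here are hypothetical.
from concurrent.futures import ThreadPoolExecutor

# Pretend each shard is a callable returning (url, score) pairs.
SHARDS = {
    "shard-us":   lambda q: [("http://example.com/a", 0.92), ("http://example.com/b", 0.40)],
    "shard-eu":   lambda q: [("http://example.org/c", 0.87)],
    "shard-asia": lambda q: [("http://example.net/d", 0.55)],
}

def search_shard(name, query):
    """Query a single shard; in a real system this would be a network call."""
    return SHARDS[name](query)

def distributed_search(query, top_k=10):
    # Fan out to all shards concurrently, then merge and rank globally.
    with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
        futures = [pool.submit(search_shard, name, query) for name in SHARDS]
        hits = [hit for f in futures for hit in f.result()]
    return sorted(hits, key=lambda h: h[1], reverse=True)[:top_k]

if __name__ == "__main__":
    for url, score in distributed_search("large hadron collider"):
        print(f"{score:.2f}  {url}")
```

In practice, the hard problems are keeping the shards fresh and normalizing scores computed on different machines; the toy merge above simply assumes the scores are comparable.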
POWER TO THE PEOPLE
A surprising development during 1999 was the surging popularity of human-compiled directories of the Web, most notably the Open Directory Project (ODP). Though it is little more than a year and a half old, the ODP now provides directory data to more than 100 search engines, including AltaVista, AOL Search, Dogpile, HotBot, Lycos, Metacrawler, and Netscape Search. This gives the ODP a total reach of users comparable to that of traffic leader Yahoo!
However, if automatically spidered search engines can't keep up with the explosive growth of the Web, human-compiled directories don't have a prayer of being comprehensive. This doesn't mean they will be any less popular. What we'll likely see over the next few years is the emergence of hybrid human/machine-compiled directories. These will take two primary forms: machine-compiled directories that will be "edited" by humans for quality control, and systems that apply intelligence to searching an existing human-compiled directory. The Inktomi directory engine is an example of the former, and Oingo is an example of the latter. Oingo currently uses ODP data as its source, though its technology is readily adaptable to any source of directory data.
Rather than doing simple keyword matching, Oingo conducts searches within what it calls "semantic space," bringing up categories and documents that are close in meaning to the concepts the searcher is interested in. The key advantage of this approach is that it can retrieve results a traditional plain-text search would miss: the absence of a particular word from a page does not preclude that page's relevance if the document is conceptually related to the query.
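The following toy example suggests how matching in a "semantic space" differs from plain keyword matching. The concept vectors are invented and the arithmetic is deliberately crude; it is meant only to illustrate why a page can rank well for a word it never contains, not to reproduce Oingo's actual technology.

```python
# Toy illustration of "semantic space" matching: queries and documents
# are mapped to concept vectors and ranked by cosine similarity, so a
# page can match a query word it never mentions.
# The vectors below are invented; this is not Oingo's actual method.
import math

CONCEPTS = {                    # word -> (animals, autos, finance)
    "jaguar":  (0.7, 0.6, 0.0),
    "leopard": (0.9, 0.0, 0.0),
    "sedan":   (0.0, 0.9, 0.0),
    "stock":   (0.0, 0.1, 0.9),
}

def vectorize(text):
    dims = len(next(iter(CONCEPTS.values())))
    total = [0.0] * dims
    for word in text.lower().split():
        for i, v in enumerate(CONCEPTS.get(word, (0.0,) * dims)):
            total[i] += v
    return total

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

docs = ["the leopard is a big cat", "a new sedan with a quiet engine"]
query = "jaguar"
ranked = sorted(docs, key=lambda d: cosine(vectorize(query), vectorize(d)), reverse=True)
print(ranked[0])   # the leopard page ranks first, though it never says "jaguar"
```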
Oingo also provides a sophisticated filtering mechanism that allows successively greater degrees of control over search results by specifying the exact meaning of query words, eliminating irrelevant alternate definitions.
SEARCH GETS PERSONAL
The holy grail for search services has always been a system that adapts itself to user needs by observing, learning, and reconfiguring itself to deliver only totally relevant results while screening out the dross. Most systems that attempt personalization apply Artificial Intelligence methods to some degree. AI has made great strides in the past few years, but we're still a long way from truly intelligent agents that essentially become your all-knowing e-librarian.
But there are less ambitious approaches to capturing information about what people find relevant and fine-tuning relevance algorithms to reflect this information. Google's PageRank system does this in part by analyzing page "importance," a form of citation analysis that amounts to a virtual peer-review process for Web pages. Direct Hit takes a different approach to measuring the relative popularity of a page. The system observes which pages are selected from search results, and how long visitors spend reading the pages. Direct Hit has compiled data on more than one billion of these "relevancy records" and continuously updates its user-relevancy rankings based on newly gathered data.
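As an illustration of the citation-analysis idea, here is a simplified PageRank-style calculation by power iteration over a tiny, invented link graph. It follows the general shape of the published algorithm (with the usual 0.85 damping factor) but omits the many refinements a production engine would add.

```python
# Simplified PageRank-style "importance" score computed by power iteration.
# The tiny link graph below is invented for illustration only.
def pagerank(links, damping=0.85, iterations=50):
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
        for page, outlinks in links.items():
            if not outlinks:                 # dangling page: spread its rank evenly
                for p in pages:
                    new_rank[p] += damping * rank[page] / len(pages)
            else:
                for target in outlinks:
                    new_rank[target] += damping * rank[page] / len(outlinks)
        rank = new_rank
    return rank

links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
for page, score in sorted(pagerank(links).items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```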
Both of these systems rely on analyzing the behavior of the aggregate population of Web users. New systems are emerging that bring the focus to the level of the individual. Backflip is an interesting and highly promising example of such a system.
Backflip essentially creates a Yahoo-like directory from your bookmarks or Internet favorites. Because it is constructed from pages and sites that you have already vetted by bookmarking them, the directory is almost totally relevant to your needs. And it transcends a simple list of bookmarks, because Backflip captures the full text of all bookmarked pages, and then allows you to run sophisticated keyword queries on just that limited set of pages.
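A minimal sketch of the underlying idea, assuming nothing about Backflip's actual implementation: build an inverted index over the full text of your own bookmarked pages and answer keyword queries from that personal collection alone. The page texts and URLs below are placeholders.

```python
# Sketch of the Backflip idea: index the full text of your own bookmarks,
# then run keyword queries against only that personal collection.
from collections import defaultdict

bookmarks = {
    "http://example.com/python-tips": "tips and tricks for python generators",
    "http://example.com/gardening":   "growing tomatoes in raised beds",
}

index = defaultdict(set)                 # word -> set of bookmarked URLs
for url, text in bookmarks.items():
    for word in text.lower().split():
        index[word].add(url)

def search_bookmarks(query):
    """Return bookmarked pages containing every query word."""
    sets = [index.get(w, set()) for w in query.lower().split()]
    return set.intersection(*sets) if sets else set()

print(search_bookmarks("python generators"))   # only your own pages come back
```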
Soon, Backflip will introduce the capability to do keyword searches on every page you've ever visited, whether you've bookmarked it or not. No longer will you need to use a search engine to find a previously visited page--and you won't want to, since the visited search space of even a power user will be orders of magnitude smaller than even a smallish general-purpose directory, assuring dramatically more relevant results for nearly any query.
Don't expect the search engines to ignore this threat. They could easily offer a similar system by taking advantage of information they already provide to advertising networks in the form of click-stream data. With appropriate permissions, you could allow a search engine to analyze your personal Web-surfing data (perhaps under license from ad servers like DoubleClick or Flycast) and use it as a massive filter on the engine's full index. The privacy implications of this approach are enormous, of course, but so is the potential for dramatically improved, customized search results.
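One plausible way such a filter could work, sketched here with invented data: boost results whose hosts appear in the user's click-stream history, and let everything else fall down the list. This is an assumption about how personalization might be wired up, not a description of any engine's actual system.

```python
# Sketch of personalization via click-stream data: boost results from
# sites the user has actually visited (history obtained, with permission,
# from an ad network or the browser). All data here is invented.
from urllib.parse import urlparse

visited_hosts = {"docs.python.org", "example.com"}   # from the user's click stream

engine_results = [                                   # (url, engine relevance score)
    ("http://spam.example.net/page", 0.90),
    ("http://docs.python.org/library", 0.80),
    ("http://example.com/article", 0.75),
]

def personalize(results, history, boost=0.25):
    """Re-rank results, boosting pages on hosts the user has visited before."""
    def score(item):
        url, base = item
        return base + (boost if urlparse(url).netloc in history else 0.0)
    return sorted(results, key=score, reverse=True)

for url, base in personalize(engine_results, visited_hosts):
    print(url)
```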
BROWSER-FREE SEARCHING
Searching the Web typically means calling up a search engine or directory in your browser window, entering keywords, and viewing the results. We're now seeing a new class of tools emerge that operate independently from the main browser window. Alexa was one of the first of these utilities, providing links to related sites and other information.
AltaVista offers Discovery, a thin standalone bar that provides AltaVista search and several other useful features. It includes the Hyperbolic Tree by Inxight, which displays Web-page relationships and lets you navigate sites visually. Like Backflip, Discovery can search your previously viewed Web pages, and it can also summarize Web pages, highlight keywords, and perform several other useful tasks. Excite's Assistant offers different features, but is also browser-independent.
Two relatively new programs point to further browser-free innovations that we'll likely see more of in the future. These are applications designed to provide ready reference information at your fingertips. GuruNet is a deceptively simple utility that allows you to highlight a word on a Web page and get dictionary and encyclopedia definitions for the word in GuruNet's pop-up window, as well as a translation of the word into the language of your choice. Depending on the word, you may also be offered science- or technology-related definitions.
GuruNet also provides relevant RealNames links and the results of an Oingo search for the word, giving you direct access to relevant Web pages without the steps involved in calling up a search engine or directory and entering a query.
Flyswat is what you might call an "association engine." It scans the full text of every word on a Web page, and automatically hyperlinks words for which it has additional information. Clicking the Flyswat link brings up a window with links to focused, targeted information related to the word.
For example, if a page mentions a publicly traded company, Flyswat automatically turns its name into a link. Clicking the link brings up a menu with literally dozens of sources of information about the company, including links to analysis, news, various investment-related sources, the company's home page, and so on. For names of people, Flyswat creates links to biographical information. Cities, countries, or other places get links to maps, country facts, weather, and tourist information. In essence, Flyswat automatically constructs targeted search results using all of the significant keywords on a page.
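Conceptually, this kind of "association engine" amounts to a dictionary-driven annotation pass over the page text. The sketch below, with an invented entity table and placeholder link targets, shows the basic mechanics; Flyswat's own matching is certainly more sophisticated.

```python
# Sketch of a Flyswat-style "association engine": scan page text and wrap
# any term found in a lookup table with a link to related resources.
# The entity table and link targets are invented for illustration.
import re

ENTITIES = {
    "Microsoft": "https://example.com/company/microsoft",   # hypothetical targets
    "Paris":     "https://example.com/place/paris",
}

def annotate(html_text):
    """Return the text with known entities turned into hyperlinks."""
    def link(match):
        name = match.group(0)
        return f'<a href="{ENTITIES[name]}">{name}</a>'
    pattern = r"\b(" + "|".join(re.escape(name) for name in ENTITIES) + r")\b"
    return re.sub(pattern, link, html_text)

print(annotate("Microsoft opened a new office in Paris last week."))
```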
WHOLE DOCUMENT QUERIES
Most current search systems are limited to keyword queries. Some systems also allow natural-language queries, but typically these are limited to a single phrase or question.
New systems under development allow searching based on similarity of content rather than similarity of vocabulary. These systems allow users to submit paragraphs or even entire pages of text as a query, then attempt to actually understand what the query is about. They work by evaluating words in context and assigning documents numerical values that can be used for sophisticated relevance calculations.
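A bare-bones illustration of the whole-document approach, using nothing more than term-frequency vectors and cosine similarity: the entire paragraph is the query, and documents are ranked by how closely their vocabulary profiles match it. Real systems layer on IDF weighting, stemming, and word-sense analysis; this only shows the basic mechanics, with invented sample texts.

```python
# Sketch of whole-document querying: treat an entire paragraph as the
# query, turn query and documents into term-frequency vectors, and
# rank by cosine similarity.
import math
from collections import Counter

def vector(text):
    return Counter(w for w in text.lower().split() if len(w) > 2)

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

documents = [
    "the large hadron collider will produce petabytes of particle physics data",
    "growing tomatoes in raised garden beds requires regular watering",
]

query_paragraph = (
    "Physicists at CERN expect the new collider to generate enormous "
    "volumes of experimental data every year."
)

qv = vector(query_paragraph)
for doc in sorted(documents, key=lambda d: cosine(qv, vector(d)), reverse=True):
    print(round(cosine(qv, vector(doc)), 3), doc)
```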
Using huge chunks of text rather than specific keywords may seem like a sure recipe for muddying the clarity of a query, but the approach seems to significantly improve recall and precision. Ejemoni is one such new service, which was expected to be demonstrated in beta form earlier this year. The program was tested at Vanderbilt University in a controlled academic environment on a relatively small database, and achieved impressive scores in both recall (85%+) and precision (86%+). Though it remains to be seen whether the system can scale to the full size of the Web, Ejemoni has attracted high-powered talent with impressive credentials to its executive team and board of directors. Two other promising systems that go beyond traditional keyword searching are Albert and Simpli. (Editor's Note: Dialog's WebTop also allows full-text, cut-and-paste searching. See Mick O'Leary's description in O'LEARY ONLINE on page 91.)
BEYOND HAL
A common archetype for the intelligent computer is HAL, the murderous silicon star of 2001: A Space Odyssey. HAL's flaw, and that of all current search tools, is the lack of a fundamental component of human intelligence: common sense. Lacking the thousands of real-life heuristics we humans have learned from the day of our birth, search engines are still essentially idiot savants, able to perform startling feats of recall without really "knowing" what they are doing.
One of the most interesting attempts at integrating common sense into a computer program is the CYC Knowledge Server. The work of noted AI researcher Doug Lenat, CYC (pronounced "psych") attempts to introduce common sense into the search equation by tapping into a large knowledgebase of facts, rules of thumb, and other tools for reasoning about objects and events of everyday life. The knowledgebase is divided into hundreds of "microtheories" that share common assumptions, but which can also appear to contain contradictory facts when applied to different domains of knowledge or action. Humans, using common sense, can resolve these apparent contradictions based on context. For example, it's not acceptable to whoop like a lunatic when you approve of a business decision at a meeting, but it's equally unacceptable not to go bonkers when your favorite football team scores a touchdown during a big game. The goal with CYC is to dramatically increase tolerance for ambiguity or uncertainty while simultaneously providing a "reasoning engine" that approximates human common sense.
CYC works well on semi-structured information, such as the records in the Internet Movie Database. CYC began as a well-funded research project in 1984; one hopes that the technology underlying it, and systems like it, will become more widespread.
THE FUTURE ISN'T WHAT IT USED TO BE
These are just a few of the countless developments and innovations that are shaping and transforming the landscape of Web search as we move into the new millennium. For space reasons, we didn't touch on other areas where notable progress is being made, such as intelligent agents, visualization and interface design, metadata standards, and the development of micromachines and bio-silicon chips that will soon allow people to "jack in" to the Web, directly inhabiting the world of cyberspace as described in novels like William Gibson's Neuromancer and Neal Stephenson's Snow Crash.
This isn't science fiction anymore. Carver Mead and his group at Caltech are working with neuromorphic analog VLSI chips--silicon models that mimic, or capture in some way, the functioning of biological neural systems. It will only be a matter of time until these systems begin to interact with one another, learning, storing, and of course "remembering" on demand (i.e., becoming searchable).
Ray Kurzweil, writing in Scientific American, says that "by 2019, a $1,000 computer will at least match the processing power of the human brain. By 2029, the software for intelligence will have been largely mastered, and the average personal computer will be equivalent to 1,000 brains."
We can only hope that computers with this much intelligence will be well-equipped with common sense and a healthy sense of ethics. We appear to be on the right track. Google co-founder Sergey Brin, speaking at the Search Engine Strategies 1999 conference, offered this optimistic prediction: "In the future, search engines should be as useful as HAL in the movie 2001: A Space Odyssey--but hopefully they won't kill people."
Search Engines & Directories
FAST
http://www.alltheweb.com
Oingo
http://www.oingo.com
Direct Hit
http://www.directhit.com
Open Directory Project
http://www.dmoz.org
Sites Using Open Directory Data
http://dmoz.org/Computers/Internet/WWW/Searching_the_Web/Directories/Open_Directory_Project/Sites_Using_ODP_Data/
HAL's Legacy, by Douglas Lenat
http://www.cyc.com/halslegacy.html
The Coming Merging of Mind and Machine,
by Ray Kurzweil
http://www.sciam.com/1999/0999bionic/0999kurzweil.html
Carver Mead's Physics of Computation Group
http://www.pcmp.caltech.edu/
Chris Sherman (websearch.guide@about.com or csherman@searchwise.net) is the Web Search Guide for About.com, http://websearch.about.com. He holds an MA from Stanford University in Interactive Educational Technology and has worked in the Internet/Multimedia industry for two decades, currently as President of Searchwise.net, a Web consulting and training firm.