PageRank

PageRank is a family of algorithms for assigning numerical weightings to hyperlinked documents (or World Wide Web pages) indexed by a search engine. Its properties are much discussed by search engine optimization (SEO) experts. The PageRank system is used by the popular search engine Google to help determine a page's relevance or importance. It was developed by Google's founders Lawrence E. Page and Sergey Brin while at Stanford University in 1998. As [http://www.google.com/technology/ Google puts it]:

:PageRank relies on the uniquely democratic nature of the web by using its vast link structure as an indicator of an individual page's value. Google interprets a link from page A to page B as a vote, by page A, for page B. But Google looks at more than the sheer volume of votes, or links a page receives; it also analyzes the page that casts the vote. Votes cast by pages that are themselves "important" weigh more heavily and help to make other pages "important."

In other words, a page's rank results from a "ballot" among all the other pages on the World Wide Web about how important that page is. A hyperlink to a page counts as a vote of support. The PageRank of a page is defined recursively and depends on the number and PageRank of all pages that link to it ("incoming links"). A page that is linked to by many pages with high rank receives a high rank itself. If there are no links to a web page, there is no support for that page.

The Google Toolbar PageRank goes from 0 to 10. It seems to be a logarithmic scale, but the exact details of this scale are unknown.

The name PageRank is a trademark of Google. Whether the pun on the name Larry Page and the word "page" was intentional or accidental remains an open question. The PageRank process has been patented.

An alternative to the PageRank algorithm is the HITS algorithm proposed by Jon Kleinberg.
==PageRank algorithm==
===Simplified===
Suppose a small universe of four web pages: A, B, C and D. If all those pages link to A, then the PR (PageRank) of page A would be the sum of the PR of pages B, C and D.
:<math>PR(A) = PR(B) + PR(C) + PR(D)</math>
But then suppose page B also has a link to page C, and page D has links to all three pages. One cannot vote twice, so page B is considered to have given half a vote to each page it links to. By the same logic, only one third of D's vote is counted for A's PageRank.
:<math>PR(A) = \frac{PR(B)}{2} + \frac{PR(C)}{1} + \frac{PR(D)}{3}</math>
In other words, divide each page's PR by the total number of links that come from that page, denoted ''L''.
:<math>PR(A) = \frac{PR(B)}{L(B)} + \frac{PR(C)}{L(C)} + \frac{PR(D)}{L(D)}</math>
Finally, all of this is reduced by a certain percentage by multiplying it by a factor ''q''. For reasons explained below, no page can have a PageRank of 0. As such, Google performs a mathematical operation and gives everyone a minimum of 1 − ''q''. That is, if every page is reduced by 15% (''q'' = 0.85), each page is given back 0.15.
:<math>PR(A) = \left( \frac{PR(B)}{L(B)} + \frac{PR(C)}{L(C)} + \frac{PR(D)}{L(D)} + \cdots \right) q + 1 - q</math>
So one page's PageRank is calculated from the PageRank of other pages. Google is always recalculating the PageRanks. If you give all pages a PageRank of any number (except 0) and repeatedly recalculate everything, all PageRanks will change and tend to stabilize at some point. The stabilized values are the ones used by the search engine.

===Complex===
The formula uses a model of a ''random surfer'' who gets bored after several clicks and switches to a random page. The PageRank value of a page reflects the frequency of hits on that page by the random surfer. It can be understood as a Markov process in which the states are pages and the transitions are the links between pages, all of them equally probable. If a page has no links to other pages, it becomes a sink and makes the model unusable, because sink pages trap the random surfer forever. However, the solution is quite simple: if the random surfer arrives at a sink page, it picks another URL at random and continues surfing.
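The random surfer with the sink rule described above can be simulated directly. The following is only a minimal sketch in Python, not Google's implementation. It assumes the four-page example from the simplified section (B links to A and C, C links to A, D links to A, B and C) and additionally assumes that A has no outgoing links, which the text does not state. The names `simulate_surfer` and `example_links`, the step count and the seed are all illustrative choices. Only the sink rule is modelled here; the additional random jumps from non-sink pages are discussed next.

```python
import random

# Hypothetical four-page web; A is assumed to be a sink (no outgoing links).
example_links = {
    "A": [],
    "B": ["A", "C"],
    "C": ["A"],
    "D": ["A", "B", "C"],
}

def simulate_surfer(links, steps=200_000, seed=0):
    """Return the fraction of visits each page receives from a random surfer.

    On a page with outgoing links the surfer follows one of them, all equally
    probable. On a sink page the surfer picks another page uniformly at random.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    pages = sorted(links)
    visits = {page: 0 for page in pages}
    current = pages[0]
    for _ in range(steps):
        visits[current] += 1
        outgoing = links[current]
        if outgoing:
            current = rng.choice(outgoing)
        else:
            # Sink rule: jump to any other page instead of getting trapped.
            current = rng.choice([p for p in pages if p != current])
    return {page: visits[page] / steps for page in pages}

freq = simulate_surfer(example_links)
for page, share in freq.items():
    print(page, round(share, 3))
```

For this particular graph the long-run visit frequencies can also be worked out by hand from the transition probabilities: they are 9/22 for A, 4/22 for B, 6/22 for C and 3/22 for D, so the simulated shares should come out close to those values.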
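To be fair to pages that are not sinks, these random transitions are added to all nodes in the Web, with a residual probability of usually ''q'' = 0.15, estimated from the frequency with which an average surfer uses his or her browser's bookmark feature. (Here ''q'' is the probability of a random jump, so links are followed with probability 1 − ''q''; this is the complement of the factor called ''q'' in the simplified formula.)

So, the equation is as follows:
:<math>\mathrm{PageRank}(p_i) = \frac{q}{N} + (1 - q) \sum_{p_j \in M(p_i)} \frac{\mathrm{PageRank}(p_j)}{L(p_j)}</math>
where <math>p_1, p_2, \ldots, p_N</math> are the pages under consideration, <math>M(p_i)</math> is the set of pages that link to <math>p_i</math>, <math>L(p_j)</math> is the number of links coming from page <math>p_j</math>, and ''N'' is the total number of pages.

The PageRank values are the entries of the dominant eigenvector of the modified adjacency matrix. This makes PageRank a particularly elegant metric: the eigenvector is
:<math>\mathbf{R} = \begin{bmatrix} \mathrm{PageRank}(p_1) \\ \mathrm{PageRank}(p_2) \\ \vdots \\ \mathrm{PageRank}(p_N) \end{bmatrix}</math>
where <math>\mathbf{R}</math> is the solution of the equation
:<math>\mathbf{R} = \begin{bmatrix} q/N \\ q/N \\ \vdots \\ q/N \end{bmatrix} + (1 - q) \begin{bmatrix} \ell(p_1,p_1) & \ell(p_1,p_2) & \cdots & \ell(p_1,p_N) \\ \ell(p_2,p_1) & \ddots & & \vdots \\ \vdots & & \ell(p_i,p_j) & \\ \ell(p_N,p_1) & \cdots & & \ell(p_N,p_N) \end{bmatrix} \mathbf{R}</math>
where the adjacency function <math>\ell(p_i,p_j)</math> is 0 if page <math>p_j</math> does not link to <math>p_i</math>, and normalised such that, for each ''j'',
:<math>\sum_{i=1}^{N} \ell(p_i,p_j) = 1,</math>
i.e. the elements of each column sum up to 1.

The values of the PageRank eigenvector are fast to approximate (only a few iterations are needed) and in practice it gives good results.

As a consequence of Markov process theory, it can be shown that the PageRank of a page is the probability of being at that page after a large number of clicks. This happens to equal <math>t^{-1}</math>, where <math>t</math> is the expected value of the number of clicks (or random jumps) required to get from the page back to itself.

The main disadvantage is that it favors older pages, because a new page, even a very good one, will not have many links unless it is part of an existing site (a site being a densely connected set of pages). That is why PageRank should be combined with textual analysis or other ranking methods.

PageRank seems to favor Wikipedia pages, often putting them high or at the top of searches for several encyclopedic topics. A common theory is that this is because Wikipedia is very interconnected, with each article having many internal links from other articles, which in turn have links from many other sites on the Web pointing to them. Compared to Wikipedia, and similar high quality content-rich sites, the rest of the World Wide Web is relatively loosely connected.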
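As an aside on the formula in this section, PageRank(p_i) = q/N + (1 − q) · Σ PageRank(p_j)/L(p_j) summed over the pages p_j that link to p_i, the repeated recalculation can be sketched in a few lines of Python. This is a hedged illustration, not Google's actual implementation. The function name `pagerank`, the tolerance, the iteration cap and the four-page graph are assumptions made for the example, as is the choice to spread a sink's rank uniformly over all N pages (one common way to model the "pick a random URL" rule).

```python
# Hypothetical four-page web; A is assumed to be a sink (no outgoing links).
example_links = {
    "A": [],
    "B": ["A", "C"],
    "C": ["A"],
    "D": ["A", "B", "C"],
}

def pagerank(links, q=0.15, tol=1e-12, max_iter=1000):
    """Iterate PageRank(p_i) = q/N + (1 - q) * sum(PageRank(p_j) / L(p_j)).

    `links` maps each page to the list of pages it links to.
    `q` is the random-jump probability, as in the text.
    """
    pages = sorted(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # any non-zero start works
    for _ in range(max_iter):
        # Assumption: rank sitting on sink pages is shared equally by all pages.
        sink_mass = sum(rank[p] for p in pages if not links[p])
        new_rank = {p: q / n + (1 - q) * sink_mass / n for p in pages}
        for page in pages:
            outgoing = links[page]
            for target in outgoing:
                # Each link passes on an equal share of the linking page's rank.
                new_rank[target] += (1 - q) * rank[page] / len(outgoing)
        change = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if change < tol:  # values have stabilized
            break
    return rank

ranks = pagerank(example_links)
for page, value in ranks.items():
    print(page, round(value, 4))
```

With this sink convention the values always sum to 1, so they can be read as the probabilities of finding the random surfer on each page. In the example, A collects links from every other page and ends up with the highest value, while D, which nothing links to, receives only its share of the random jumps.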
However, Google is known to actively penalize link farms and other schemes to artificially inflate PageRank. How Google tells the difference between highly inter-linked web sites and link farms is one of Google's trade secrets.

==See also==
*List of websites with a high PageRank

==External links==
* [http://www.google.com/technology/ Our Search: Google Technology] by Google
* [http://www-db.stanford.edu/~backrub/google.html The Anatomy of a Large-Scale Hypertextual Web Search Engine] by Sergey Brin & Lawrence Page - 1998
* [http://dbpubs.stanford.edu:8090/pub/showDoc.Fulltext?lang=en&doc=1999-66&format=pdf&compression= The PageRank Citation Ranking: Bringing Order to the Web] by Lawrence Page, Sergey Brin, Rajeev Motwani & Terry Winograd - 1999 (PDF)
* [http://www.voelspriet2.nl/PageRank.pdf Page Rank Uncovered] by Chris Ridings & Mike Shishigin (PDF)
* [http://www.cs.washington.edu/homes/pedrod/papers/nips01b.pdf The Intelligent Surfer: Probabilistic Combination of Link and Content Information in PageRank] by Matthew Richardson and Pedro Domingos - 2002 (PDF)
* [http://www.google.com/technology/pigeonrank.html PigeonRank]: a Google-sized violation of animal welfare by exposing helpless birds to possibly harmful webpages (humour).
* [http://www.mipagerank.com/ Mi PageRank]: PageRank Calculator online (Spanish)

Google PageRank talk:

Wikipedia Announcements says that Wikipedia has a PageRank of 0.7. How does one discover this PageRank?
:If you have a Google toolbar add-on to your browser the PageRanks are indicated within it.
:Since the Google toolbar is Windows/IE only, another trick is to search the Google Directory.
----
''A common theory for why this is is because the Wikipedia is very interconnected, with each article having many internal links from other articles, which in turn have links from many other sites on the Web pointing to them.
Compared to Wikipedia, and similar high quality content-rich sites, the rest of the World Wide Web is relatively loosely connected.''

This cannot be correct. Simply being more tightly woven with links cannot increase the total flux of links, which determines the PageRank of pages within a site. A more plausible explanation would be:
#Wikipedia has a high ''ratio'' of internal site links to external links
#Wikipedia as a whole has a very high PageRank
#Some of Wikipedia's highly ranked competitors (e.g. xrefer) have broken Google indexing of their entries (actually, xrefer is no longer free as in beer at all)
#One of Google's other algorithms (a lot more goes into their rankings than just PageRank) is favouring Wikipedia -- maybe they simply decided "let's boost Wikipedia" :)
--User:Pde 18:51 17 Jun 2003 (UTC)

I think it probably is correct, depending on how Google seeds its PageRank. In the PageRank paper, they mention two possible seeds: a uniform allocation to all top-level webpages, and a uniform allocation to all pages. The former makes sense, because it bootstraps off the domain name system, making attacks on PageRank costly. (To increase your PageRank, you'd have to buy lots of domain names.) The latter means link farms (which Wikipedia closely resembles ;) get a high PageRank. In any case, most links are internal, so not much "energy" leaks out. So in either seeding system, Wikipedia would do quite well. For a more detailed discussion, see the work by Monica Bianchini and friends. User:Clausen 01:33 18 Jun 2003 (UTC)
----
The link to "PowerPoint HTML presentation by Larry Page" is dead. User:Alon 15:22, May 5, 2004 (UTC)
----
I've deleted the list of sites that have at some point had a high PageRank. It's subjective, highly non-comprehensive, will never be up to date, and we've got external links to sites that do the same, only better.
--User:ALargeElk | User talk:ALargeElk 10:15, 28 Jul 2004 (UTC)

== Add a link to my paper: The Cost of Attack of PageRank? ==
Hi all,

Does anyone want to add a link to my work on PageRank: members.optusnet.com.au/clausen/reputation

I think adding it myself would be too much self-promotion. Contributions: survey of all the different PageRank formulas around (lots of mistakes!), proof of convergence, analysis of cost of attack. So, if you think the interested reader should know about it, add it yourself... --User:Clausen 21:50, 20 Sep 2004 (UTC)

== How many? ==
If the "PageRank" as shown in the Google toolbar is 5/10, is there a way to figure out how many of the currently 8,058,044,651 pages link to that page? --User:Jerryseinfeld 22:32, 3 Jan 2005 (UTC)

No. It could have one link from a page with a PageRank of ~6, or it could have 100000 links from pages of PageRank 1. It also depends on how many outgoing links it has. (Outgoing links reduce PageRank.) Read my thesis! http://members.optusnet.com.au/clausen/reputation --User:Clausen 12:25, 29 May 2005 (UTC)

== expected number of clicks ==
Removed ''This happens to equal <math>t^{-1}</math> where <math>t</math> is the expected value of the number of clicks (or random jumps) required to get from the page back to itself''. This would be true only if the algorithm were seeded with the unit vector of the page. --User:Tgr 10:51, 29 May 2005 (UTC)

Actually, you can interpret non-uniform seed vectors as a probability distribution over the web pages the random surfer "jumps" to when it "gets bored". In this interpretation, the "random surfer" interpretation of <math>t^{-1}</math> still holds. --User:Clausen 12:24, 29 May 2005 (UTC)

Yeah, I misunderstood the point the article made. Reverted to original.
--User:Tgr 22:51, 29 May 2005 (UTC)
These materials are based on Wikipedia and licensed under the GNU FDL