July 13, 2009 by Josh Young
Posted in Uncategorized | Leave a Comment »
July 12, 2009 by Josh Young
July 11, 2009 by Josh Young
July 8, 2009 by Josh Young
-
Cassandra is a hybrid non-relational database in the same class as Google's BigTable. It is more featureful than a key/value store like Dynomite, but supports fewer query types than a document store like MongoDB. This does not mean that SQL as a general-purpose runtime and reporting tool is going away. However, at web-scale, it is more flexible to separate the concerns. Runtime object lookups can be handled by a low-latency, strict, self-managed system like Cassandra. Asynchronous analytics and reporting can be handled by a high-latency, flexible, un-managed system like Hadoop.
-
Bricolage has become one of the most dominant themes of the new online world. We offer the following collection of some of our favorite places to discover marvelous things online. All are curated by the careful eyes and hands of one or a few editors.
July 7, 2009 by Josh Young
-
We present a method for automatic generation of in-text explanatory hyperlinks for use in web publishing, using English Wikipedia as the training set, which allows us to capture the current cultural knowledge.
-
I developed a fairly extensive preprocessor of the standard Wikipedia XML dump into my own extended XML format, which eliminates some information and adds other useful information.
July 3, 2009 by Josh Young
-
This book outlines the human side of the information seeking process, and focuses on the aspects of this process that can best be supported by the user interface. See especially the chapters on Information Visualization for Search Interfaces (10) and Information Visualization for Text Analysis (11).
-
The TimesTags service can help you build a tag set, standardize names of people and organizations, or identify subjects that are currently making news. The TimesTags service matches your query to the controlled vocabularies that fuel NYTimes.com metadata. You supply a string of characters, and the service returns a ranked list of suggested terms.
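As a rough illustration of the idea of matching a typed string against a controlled vocabulary and returning ranked suggestions, here is a minimal local sketch. It does not call or reproduce the TimesTags API; the `suggest` function, the sample vocabulary, and the ranking rule (whole-term prefix first, then word prefix, then substring, with shorter terms first within each tier) are all assumptions made for the example.

```python
def suggest(query, vocabulary, limit=5):
    """Return up to `limit` vocabulary terms matching `query`, best first.

    Hypothetical ranking, not the service's actual algorithm:
      0 = the term starts with the query
      1 = some word inside the term starts with the query
      2 = the query appears anywhere in the term
    Ties within a tier are broken by shorter term, then alphabetically.
    """
    q = query.strip().lower()
    if not q:
        return []
    scored = []
    for term in vocabulary:
        t = term.lower()
        if t.startswith(q):
            rank = 0
        elif any(word.startswith(q) for word in t.split()):
            rank = 1
        elif q in t:
            rank = 2
        else:
            continue  # no match at all, so leave the term out
        scored.append((rank, len(term), term))
    scored.sort()
    return [term for _, _, term in scored[:limit]]


# Made-up controlled vocabulary, for illustration only.
vocabulary = [
    "Obama, Barack",
    "Obama, Michelle",
    "Baseball",
    "Basketball",
    "Alabama",
]

print(suggest("oba", vocabulary))
print(suggest("ba", vocabulary))
```

A real service would presumably rank on more than string position (for example, how newsworthy a term currently is), which this sketch does not attempt.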
-
The advent of file sharing has weakened copyright. Today, more than 60% of internet traffic consists of consumers sharing music, movies, books, and games. Yet file sharing has not undermined the incentives of authors to produce new works. The cannibalization of sales that is due to file sharing is more modest than many observers assume. File sharing increases the demand for complements to protected works. And monetary incentives simply play a reduced role in motivating authors to remain creative.
July 1, 2009 by Josh Young
-
The business-oriented social networking site LinkedIn announced this morning that it was giving DeMatteo Monness exclusive access to a "proprietary set of search tools and promotional services" that will "enhance the overall depth and breadth of the DM Consultant Network." A source tells SAI that the deal is a "big blow to GLG," or Gerson Lehrman Group, the established top banana, with an "expert council" membership of nearly 200,000. GLG, privately held, has had valuations approaching $1 billion. It also has relationships with Credit Suisse and the other major banks. Last year the company made $284 million in revenues, according to Financial News. DM's revenue is thought to be in the $10-50 million range.
-
The Twitter ecosystem at the 140 Character conference. This is the video that complements "The PREZI presentation i’m giving on June 16th at the 140 Character conference in New York."
-
The PREZI presentation i’m giving on June 16th at the 140 Character conference in New York.
-
This blog post is a summary of the forthcoming white paper from OneRiot, “The Inner Workings of a Realtime Search Engine.”
-
Marti Hearst’s new book, Search User Interfaces, is out, as Daniel Tunkelang reported earlier. The Social Search section discusses collaborative filtering, recommendation systems, and collaborative search, describing several systems along the full range of depth of mediation. Marti’s book joins a parade of other recent publications related to information retrieval, including…
June 30, 2009 by Josh Young
-
A highly accessible introduction to natural language processing, the field that supports a variety of language technologies, from predictive text and email filtering to automatic summarization and translation. Readers will learn how to write Python programs to work with large collections of unstructured text.
-
If you've read this far, then you probably intend to use dbacl to automatically classify text documents, and possibly execute certain actions depending on the outcome. The bad news is that dbacl isn't designed for this. The good news is that there is a companion program, bayesol, which is. To use it, you just need to learn some Bayesian Decision Theory.
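The core idea of Bayesian decision theory mentioned here can be sketched in a few lines: given the probability of each category and a cost for each (action, category) pair, pick the action with the lowest expected cost. This is a generic illustration only; it does not use bayesol's input format or internals, and the function name, categories, actions, and numbers are all hypothetical.

```python
def bayes_decision(posterior, loss):
    """Pick the action that minimizes expected loss.

    posterior: dict mapping category -> probability (should sum to 1)
    loss:      dict mapping action -> {category: cost of taking that
               action when the document truly belongs to that category}
    Returns (best_action, expected_loss_per_action).
    """
    expected = {
        action: sum(posterior[cat] * costs[cat] for cat in posterior)
        for action, costs in loss.items()
    }
    best = min(expected, key=expected.get)
    return best, expected


# Hypothetical classifier output: the message is probably spam.
posterior = {"spam": 0.7, "ham": 0.3}

# Hypothetical costs: deleting a legitimate message is ten times
# worse than delivering a spam message.
loss = {
    "delete": {"spam": 0.0, "ham": 10.0},
    "deliver": {"spam": 1.0, "ham": 0.0},
}

action, expected = bayes_decision(posterior, loss)
print(action)
```

With these made-up numbers the chosen action is `deliver`, even though `spam` is the more probable category, because wrongly deleting a legitimate message costs so much more. That gap between "most likely category" and "best action" is why classification alone does not settle what to do with a document.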
June 21, 2009 by Josh Young
-
One of the things he meant was that the question of whether we can mean is a trap, and also an addiction. A sense of fraudulence, of falling short of authenticity, is endemic to contemporary man, just as it is endemic to contemporary art. But it is not the only thing that is endemic to him or his art. It can exist alongside generosity, freedom and truth.
-
It's going to take a while, but I'm going to prove to you that the nexus where television and fiction converse and consort is self-conscious irony. Television regards irony the way the educated lonely regard television. Television both fears irony's capacity to expose, and needs it.
June 20, 2009 by Josh Young
-
Twitter disables automatic URL shortening if there is any slowness or other problem accessing the shortener. As far as bit.ly goes, they do have an API [http://bit.ly/apidocs] for getting all of the short versions of a long URL, so you might want to give that a shot. We've talked many times about the shortened/lengthened URL issue in search and hopefully we'll come up with a scalable solution at some point.