Friday, May 23, 2014

Related Work: Find It If You Can

Find It If You Can: A Game for Modeling Different Types of Web Search Success Using Interaction Data
Mikhail Ageev, Qi Guo, Dmitry Lagun, Eugene Agichtein
SIGIR 2011

A lot of recent search research makes use of large-scale query log analysis. This poses a challenge for researchers who are not affiliated with a major search engine, because very few public logs are available to the research community. Find It If You Can, by Ageev et al., provides a nice example of how information retrieval researchers might gather their own search logs. In doing so, it also addresses a persistent challenge with using logs: while we know what users are doing, we don't actually know what they are thinking.

Tuesday, May 6, 2014

The Dangers of Sharing Log Data

A lot of my research relies on analyzing behavioral log data, including query logs (example: personal navigation), web browser logs (example: web revisitation patterns), social media logs (example: #TwitterSearch), IM logs (example: impact of availability state), and GPS traces (example: trajectory-aware search). Behavioral logs provide a picture of human behavior as seen through the lens of the system that captures and records user activity.

However, behavioral logs can also provide a picture of a specific individual, and as such they raise privacy concerns. Would you be willing to share your query history with me? I search for fairly mundane things, but even so, there’s no way I’d share an unfiltered version of my queries with you. As a result, despite good intentions, several companies that have tried to make behavioral logs available to the research community have ended up in hot water. The two best-known examples are AOL and Netflix.

Monday, May 5, 2014

Public Behavioral Logs

Large-scale behavioral log analysis allows us to do many things, including building better search engines, predicting health epidemics, and supporting communities through crises. But the logs themselves can be hard to come by. This post covers some of the different types of behavioral logs that are publicly available for study.