your data
A few thoughts reflecting on Sen. Wyden’s not-quite proposal. As noted on HN, there’s some question of exactly what “your data” is. Is it information you created (or otherwise control), or is it information about you? Is it an email you composed by typing on a keyboard, or is it a log entry created by an autonomous system of whose existence you are unaware? The thornier issues of what the government can or cannot do are best deferred until this basic question is answered.
A complete “your data” test would likely involve several factors, much like the fair use test does, and be decided on a case-by-case basis. For starters, though, we can begin by asking one question: to what extent can you describe the data? The owner of some data is likely to be the party that can describe the data (and, importantly, its format) most accurately and completely. This is the tried-and-true Lost and Found test. “Hey, I lost my iPod.” “Can you describe it?” If the hotel concierge has a green iPod, but I tell them I lost a black iPod, it’s probably not mine.
Certainly, for information about you, you’ll be able to describe some of it. If I know I called 555-9876 at 1:23 for 45 minutes, I can describe that call record. But I probably can’t tell you which cell tower ID I was connected to, or various other facts included in that record. The same goes for a website I run: you can make a reasonable guess that I log web visits in the standard web log format, but you don’t know which filenames I use, which URLs are logged specially, or when or if I aggregate log entries by IP or by URL.
Another question to ask: what would be the consequences of revealing this data to its purported owner? For example, you might claim ownership over all web access logs tied to your IP. But revealing to any connecting IP a history of its previous visits would be a gross violation of privacy, since one IP address is often shared by many people. Imagine if Wikipedia had a myhistory URL that showed you the last 20 articles viewed by whatever IP you’re connecting from. Suddenly public wifi browsing would be a lot more exciting. In other words, if your data cannot be easily separated from others’ data, it may not be your data.
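To make the myhistory thought experiment concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the `IpHistory` class, its method names, and the sample address are made up for illustration, and nothing here describes how Wikipedia or any real site stores its logs. It only shows what happens when browsing history is keyed by IP and nothing else.

```python
from collections import defaultdict, deque

HISTORY_LEN = 20  # the "last 20 articles" from the thought experiment


class IpHistory:
    """Hypothetical per-IP browsing history, keyed by address alone."""

    def __init__(self):
        # Each IP gets a bounded queue; the oldest entries fall off the front.
        self._by_ip = defaultdict(lambda: deque(maxlen=HISTORY_LEN))

    def record(self, ip, article):
        self._by_ip[ip].append(article)

    def myhistory(self, ip):
        # Returns whatever was viewed from this IP, by anyone.
        return list(self._by_ip.get(ip, ()))


log = IpHistory()

# Two different people on the same cafe wifi share one public address.
# (203.0.113.7 is from a range reserved for documentation.)
log.record("203.0.113.7", "Bankruptcy")  # visitor A
log.record("203.0.113.7", "Coffee")      # visitor B

# Visitor B asks for "their" history and gets visitor A's reading too.
leaked = log.myhistory("203.0.113.7")
```

The store has no way to tell the two visitors apart, so any honest answer to "show me my history" also answers "show me everyone else's."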
As a currently relevant side debate: if you take a picture or video of a police officer, is that your picture or the police officer’s picture? Whatever line we draw between your data and data about you should give the right answer to variants of that question. The question may well be moot, though, since a third answer is possible: public data.
More to the point, though, we should consider scenarios in which we might find ourselves in possession of others’ data, and what restrictions we would or would not like to have. If I notice some suspicious hacking attempts, will the language of this new law permit me to share my logs with Peter to cross-correlate, or will that violate my hacker friend’s expectation of privacy?