Monday, May 12, 2008

Random Thoughts on Data Portability Complications [Technical Note]

This is kind of off-topic for my blog, so if you're interested in business & spirituality, just skip this technical note.

I've linked to some of the major Data Portability Discussions & Articles at FriendFeed. See this Data Portability discussion started by Robert Scoble on FriendFeed.

Ellen Petry Leanse, author of "MicroPost: Data Portability…what’s in it for me?" (@chep2m), asked me why I was worried about Data Portability. Could I provide details?

My quotes from Robert Scoble's discussion:

Open standards have some tough questions which are only semi-technical. (1) How do we handle people wanting different data in LinkedIn vs. Facebook? e.g. work e-mail in LinkedIn, personal e-mail in Facebook. (2) Who actually owns a tag on a picture in Facebook? Who owns the wall post - the owner or the recipient, especially of a chain-post? (3) Who owns "scraped" data (à la ZoomInfo), and who has the rights to correct the data? I had to e-mail Spock to change some old data CONTROLLED by a defunct e-mail.

Data portability on social networks opens up OS issues (1) replication - e.g. loops, time-stamps with different clocks (2) failure-recovery & sabotage-recovery modes and decisions (3) control & authentication (4) telling apart "updates" versus "deliberate changes" - the desire to have different information in 2 places - versioning & forking (5) security, when the weakest link fails

I used to work in an Operating Systems laboratory. One of our topics was synchronization.

Let's say you have 3 accounts: MySpace, LinkedIn, Facebook...

(1) Some people keep different info deliberately.
e.g. one lady has a full bio on LinkedIn, but very light personal stuff (no work details) on MySpace for dating
Some of my friends have different e-mails:
(a) LinkedIn is a networking address, and not their best one
(b) Facebook for bloggers often has no phone numbers, but for most people it's the only place they give out cell numbers

I've been watching the behaviors of ~500 friends on social networks for the past 10 months, and they vary a LOT.

Some use LinkedIn like a big Plaxo, others like a very trusted group of friends.
Most use Facebook as personal and give home address & cell phone, but some use it for business networking - no e-mail, no phone, just website

(2) Most people consider phone numbers more private than e-mails (DM, Twitter, etc...)
EXCEPT doctors. For liability reasons, they don't want patients sending them info via e-mail.

(3) But let's say you want all your MySpace, LinkedIn, Facebook info to be the same.
a) Your e-mail is "first@first.com" on all
b) MySpace is updated to "second@second.com"
c) MySpace goes down before transmitting info to others.
d) LinkedIn is updated to "third@third.com"
e) MySpace comes back up.
f) What e-mail does Facebook import? Esp. if time clocks are messed up?
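
Steps a) through f) can be sketched in a few lines of Python. This is only an illustration, not how any of these sites actually sync: the `Replica` class, the `compare` function and the per-site edit counters (a simple "version vector") are all assumptions made up for this example. The point is that counters do not depend on anyone's wall clock, so Facebook can at least tell that the MySpace and LinkedIn edits were made independently, even if it still cannot tell which one I want.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One site's copy of the e-mail field plus a version vector (hypothetical model)."""
    site: str
    email: str
    clock: dict = field(default_factory=dict)  # site name -> number of edits made there

    def local_update(self, email):
        """Record an edit made directly on this site."""
        self.email = email
        self.clock[self.site] = self.clock.get(self.site, 0) + 1


def compare(a, b):
    """Compare two version vectors: 'a_newer', 'b_newer', 'equal', or 'conflict'."""
    sites = set(a) | set(b)
    a_ahead = any(a.get(s, 0) > b.get(s, 0) for s in sites)
    b_ahead = any(b.get(s, 0) > a.get(s, 0) for s in sites)
    if a_ahead and b_ahead:
        return "conflict"  # each side has an edit the other never saw
    if a_ahead:
        return "a_newer"
    if b_ahead:
        return "b_newer"
    return "equal"


# Step a: everyone starts with the same address.
myspace = Replica("myspace", "first@first.com")
linkedin = Replica("linkedin", "first@first.com")
facebook = Replica("facebook", "first@first.com")

myspace.local_update("second@second.com")  # step b, then MySpace goes down (step c)
linkedin.local_update("third@third.com")   # step d

# Steps e and f: MySpace is back and Facebook looks at both copies.
print(compare(myspace.clock, facebook.clock))   # a_newer
print(compare(linkedin.clock, facebook.clock))  # a_newer
print(compare(myspace.clock, linkedin.clock))   # conflict
```

Both edits are "newer" than what Facebook has, and they conflict with each other. Detecting that is the easy part; deciding which address wins is the part that still needs a policy or a person.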

What happens if Tribe.net crashes, and is restored from an older backup, which has an older e-mail?
Will the "older e-mail" be perceived as an update?
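
One way to at least notice the restored-backup case is for each site to remember the highest version number it has seen from every other site. The sketch below is a hypothetical illustration (the `PeerTracker` class and the idea that sites announce an integer version are my assumptions, not any real API). If a site suddenly announces a lower version than before, the receiver treats it as a possible rollback instead of an update.

```python
class PeerTracker:
    """Remembers the highest version number seen from each peer site (hypothetical model)."""

    def __init__(self):
        self.highest_seen = {}  # peer name -> highest version observed so far

    def classify(self, peer, version):
        """Return 'update', 'no_change', or 'possible_rollback' for an announced version."""
        seen = self.highest_seen.get(peer, 0)
        if version > seen:
            self.highest_seen[peer] = version
            return "update"
        if version == seen:
            return "no_change"
        return "possible_rollback"  # peer went backwards, e.g. restored from an old backup


tracker = PeerTracker()
print(tracker.classify("tribe.net", 7))  # update
print(tracker.classify("tribe.net", 4))  # possible_rollback
```

This only flags the problem. If the restored site keeps getting edited until its counter passes the old high-water mark, the check stops helping, which is one more reason a human may need to look.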

There are so many scenarios! After 1-2 years of research, I gave up the "automatic conflict resolver" idea, and decided that "humans must be involved" (or that someone much smarter than me was needed to create a decision tree or Bayesian Network to automatically resolve conflicts). The best I think we can do is a 95% automatic resolver, which asks for human confirmation in tough cases (or ALL cases via Captcha).
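
A "mostly automatic" resolver could look roughly like the sketch below. It is a minimal illustration under my own assumptions: the `resolve` function, the `relation` labels and the `ask_human` callback are hypothetical names, and a real system would need something smarter to decide what counts as an easy case.

```python
def resolve(current, incoming, relation, ask_human):
    """Pick a value automatically in easy cases; hand tough cases to a person.

    relation is assumed to be 'incoming_newer', 'current_newer', or 'conflict'.
    ask_human is any callable taking (current, incoming) and returning the chosen value.
    """
    if current == incoming:
        return current  # nothing to decide
    if relation == "incoming_newer":
        return incoming
    if relation == "current_newer":
        return current
    return ask_human(current, incoming)  # conflict or unknown: do not guess


pending = []  # cases that were handed to a person


def keep_current_and_log(current, incoming):
    """Stand-in for a real confirmation prompt: keeps the current value and logs the case."""
    pending.append((current, incoming))
    return current


print(resolve("first@first.com", "second@second.com", "incoming_newer", keep_current_and_log))
# second@second.com
print(resolve("second@second.com", "third@third.com", "conflict", keep_current_and_log))
# second@second.com
print(len(pending))  # 1
```

The design choice here is that anything the code cannot classify falls through to the human, so a new kind of mess shows up as a question rather than as silently overwritten data.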

(4) "Single point of failure" - If someone hacks MySpace.com, the rest of my sites are relatively safe. What about the future? I was a hacker in the past. Everything is hackable. We hired a guy who broke into a lot of major corporations as "Head of Security" for one of our companies. Also, many friends have had servers physically stolen. The rash of stolen credit card data stories lately does NOT make me feel safer.

Some personal details on variant info.

(5) I have 6-10+ e-mail accounts. I give Facebook & LinkedIn almost all e-mail addresses, since I use them like "white pages" for high school, college, work, and fun friends to find me.

(6) I'm a nomad. What do I use for location? What about people with 2-10+ homes/countries?

(7) What happens when "scraped data" starts fighting with "real data"? Spock.com scraped a page for me, which I was UNABLE to edit, since the e-mail address on that page had been defunct for 6 years. It took me 2 weeks to correct & recorrect the information (since a later bug caused it to resurface).
