Information and Content Management

Saturday, 27 July 2013

The Semantic Web

http://www.w3.org/2001/sw/

There's talk of another Web standard on the horizon; it's the 3rd version and is dubbed 'The Semantic Web'. The name refers to the method of information gathering which is used to personalize content towards a user: advertisements, search results, and so forth. To really understand the semantic web - what it is and its implications - one must first understand its history, or rather, the versions of the web preceding it. It should also be noted that there is no clearly defined point at which the web changes versions; the Web is constantly evolving, and we're just ascribing characteristics to an era.

The 'Web 1.0' is known as the 'Static Web'; it's centred around a top-down approach wherein web pages are put up by the maintainer of the website, and there is no external input. A pro of this system is that content is put up by a single person or team, so its authenticity can be tied to that website. The cons mostly pertain to the lack of input: there are no external comments, and so no reviews of the information's authenticity by third parties; links and other content must be shared via email, or some other kind of peer-to-peer communication.

The 'Web 2.0' could be seen as a more 'Social Web'; it deals with people's propensity towards social interaction, and provides a means for reviewing others' content, for collaborative efforts, and for general discussion. It focuses on a more bottom-up approach where people can comment on, or otherwise interact with, web pages. This lets the static nature of 'Web 1.0' evolve into something more dynamic. The major pro is that users can voice their opinions; this has many implications, foremost of which is that the quality of something can be assessed by a third party. It also proves useful in sharing content and ideas. The major con of user-submitted content is that it's not necessarily credible; anyone with an opinion can voice it, regardless of its integrity.

The term 'Web 2.0' was coined in 1999, and popularized in 2004, but there are examples of Web 2.0 concepts being used before the term was coined; Amazon, for example, has allowed user reviews of its products since its launch in 1995. This goes to show the fluidity of the evolution of the Web.

Finally, we get to the main point - the 'Web 3.0'. It's a prospective version, and there's some debate over what it'll be, but the most common interpretation is referred to as the 'Semantic Web'. Its gist is evident from the keyword 'semantic', which pertains to the meaning of things. So a search engine in the Web 3.0 era would gather and store information and relate results to a search profile; it would be able to extrapolate from a query.

This might mean that a Web 3.0 era search engine would take into account a personality profile of its users; when searching for 'Egypt', a journalist might find news on protests, whereas a backpacker might find travel deals. Results would relate more closely to the user, who would likely find what they're looking for more quickly. This does help streamline searching, but it also pulls the wool over one's eyes; one becomes trapped in a 'bubble', seeing only what the profile predicts. Another implication is a mismatched profile when using someone else's computer.
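
As a rough sketch of the idea, assume a hypothetical search engine that stores a set of interests per user and boosts results whose tags overlap them. Everything here (the function name `rank_results`, the tags, the result titles) is invented for illustration and isn't drawn from any real engine:

```python
def rank_results(results, interests):
    """Order results by how many of their tags match the user's interests.

    `results` is a list of (title, tags) pairs; `interests` is a set of tags.
    Both are hypothetical stand-ins for whatever a real engine would store.
    """
    def overlap(result):
        _title, tags = result
        return len(set(tags) & interests)

    # sorted() is stable, so results with equal overlap keep their original order.
    return [title for title, _tags in sorted(results, key=overlap, reverse=True)]


RESULTS = [
    ("Cheap flights to Cairo", ["travel", "deals"]),
    ("Protests continue in Cairo", ["news", "politics"]),
    ("History of Egypt", ["reference"]),
]

# Same query, same results, different profiles.
journalist = rank_results(RESULTS, {"news", "politics"})
backpacker = rank_results(RESULTS, {"travel", "deals"})
```

Both users get the same three results, but the journalist's list leads with the protest story and the backpacker's with the flight deal. The 'bubble' follows from the same mechanism: whatever doesn't match the profile sinks.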

As before, there are websites which already implement Web 3.0 concepts before the era; Google, for example, takes search profiles and location data into account with its search results.

There are also implications for privacy; detailed data profiles would be stored on every individual, psychologically profiling and categorising them. Eric Schmidt, formerly CEO of Google, is quoted as saying "We Know Where You Are. We Know Where You've Been. We Can More Or Less Know What You're Thinking About." I disagree with tracking, particularly on this scale; I find it unnecessary - Google claims their tracking significantly helps their search results, but anonymous alternatives such as DuckDuckGo often provide similar capabilities.

There are also philosophical implications to consider - if sensors pervade every facet of one's life, would the data gathered be a true representation of that person? Can a person really be defined as a collection of data? What are the limits of personality profiling? Can enough data be gathered on a person such that their personality and preferences can be deduced, essentially making that person completely predictable? There's even a question pertaining to qualia as to where the line of sentience is drawn: if a machine has the capability of collecting sensory input and utilizing it autonomously, isn't that the (or a) definition of sentience?

Friday, 26 July 2013

Facebook's Privacy and Security

https://www.facebookbrand.com/

The following is an assessment of some of Facebook's policies pertaining to privacy and security; it is by no means comprehensive, and any legalities should be taken from their source:
Terms of Service
Data Use Policy
Facebook Principles

Privacy

Facebook outlines in a list of their principles that they believe in a free flow of information (points 1 & 3), to the extent granted by the owner of said information (point 2).

A section of their Data Use Policy outlines the kind of data Facebook gathers on its users; it mentions things like the information required for signing up to the site and information posted on the site. Such information might extend from the user's birthday and gender, to status updates and photos. It goes on to say that it also gathers information which others have posted about the user, such as when they've been tagged in a photo, and pages which the user has 'liked'.

The policy also mentions how Facebook shares the information it has gathered. Information which is set as being publicly accessible is just that - accessible by anyone. Said information is also associated with the user, and can show up in a search for them. Other information can't be made private at all - your name and profile picture, for example. The policy also mentions the information's role in advertisements, in suggestions of people you might know, in service improvements and other internal operations, and so forth. It goes on to say that the information is not shared without either receiving the user's permission, giving the user notice, or removing identifying information.

They also outline in their Terms of Service (point 2, sub-point 1) that when you post a piece of intellectual property, you maintain ownership of said content, but grant them a temporary license. This temporary license "...ends when you delete your IP content or your account unless your content has been shared with others, and they have not deleted it."

They go on to say (point 2, sub-point 2) that deleting IP content is akin to emptying the recycling bin of your computer. It's not specified what this means, but it probably means that the reference to the file is removed from the lookup list on the hard drive, while the data itself stays on the drive until the space it was using is overwritten. The file is thus rendered inaccessible, but not necessarily unrecoverable. [1][2]
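
To make that guess concrete, here is a toy model of the general file-system behaviour described above. It is only an illustration of the recycling-bin analogy, not a description of how Facebook actually stores or deletes content; the `ToyDisk` class and the file names are made up:

```python
class ToyDisk:
    """A toy model of a disk: an index of names plus a list of storage blocks."""

    def __init__(self, size):
        self.blocks = [None] * size   # raw storage
        self.index = {}               # file name -> block number

    def write(self, name, data):
        # Use the first block that no index entry points at.
        used = set(self.index.values())
        block = next(i for i in range(len(self.blocks)) if i not in used)
        self.blocks[block] = data
        self.index[name] = block

    def delete(self, name):
        # Only the index entry goes; the block's contents are left alone.
        del self.index[name]


disk = ToyDisk(size=2)
disk.write("photo.jpg", b"holiday snap")
disk.delete("photo.jpg")

still_listed = "photo.jpg" in disk.index          # no longer reachable by name
still_on_disk = b"holiday snap" in disk.blocks    # but the bytes are still there

disk.write("notes.txt", b"shopping list")         # reuses the freed block
overwritten = b"holiday snap" not in disk.blocks  # only now is the data gone
```

Between the delete and the second write, the photo is inaccessible through the index yet still recoverable by anyone who reads the raw blocks.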

Security

In their Terms of Service, Facebook mentions some of the things it does and doesn't want its users, developers and advertisers to do:

Point 3 covers matters such as not uploading viruses or malicious code; not spamming or advertising without Facebook's permission; not displaying mature content without age-based restrictions, and so forth.

Point 4 mentions account-based activities such as sharing one's password, and the people who aren't allowed to create an account.

Point 5 mentions other people's rights - not gathering their data under a false guise; not breaching copyright, etc.

Point 9 mentions developers' responsibilities - limits to the access and use of information, particularly in relation to the user's consent thereof; misrepresentation of affiliations with Facebook; and a clause allowing Facebook to audit an application to ensure its safety.

Point 15 mentions termination of a user's account if they violate the policy.

There are many other points which I haven't raised, but perhaps the most important is point 16, sub-point 3, which states that, while they try to keep the site bug-free, its use is without warranty, and thus at the user's peril.

The policy doesn't seem to mention what they'll do in the event of a security breach, but perhaps the gist could be inferred from their preventative measures:
There is a Facebook page set up pertaining to security: https://www.facebook.com/security
There is also a page offering a bounty starting at $500 per bug found: https://www.facebook.com/whitehat
Facebook even released a message after its white hat program turned up a bug which could have exposed the contact details of over 6 million of its users.

Some bugs have also been found throughout the course of Facebook's history, mainly involving editing the design of the page; one such example is this guy who edited the HTML DOM via JavaScript to force his own CSS, and states that "Using JavaScript, a designer is then able to modify the HTML DOM to add (or delete) page content at will." Another example involved a self-propagating worm which changed the appearance of the profile of anyone who viewed an infected profile to a layout similar to that of MySpace. He then goes on to say that "Upon further penetration testing into Facebook, we've found at least three different XSS vulnerabilities, but none as major as the original bug. The vulnerabilities could be used to steal accounts with just the click of a button..". Typically, these bugs are patched within a day of their discovery. These are examples of XSS (cross-site scripting) bugs, which are quite hazardous; an excerpt from this article explains why:

"...cross-site scripting vulnerabilities are fairly common. More serious is the design flaw that allows the vulnerability to be widely used. Once a vulnerability has been found on the Facebook site, there are no limits on what the attacker can do. Hidden form IDs can be harvested for any form. (Notably, one of these forms will submit a charge to a user's credit card.)"

This article details the creation of such a bug.
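
For a sense of how this class of bug arises in general, here is a minimal sketch. It is not the actual Facebook vulnerability; the two `render_comment` functions are hypothetical, and only `html.escape` is a real standard-library call. The point is simply that user-supplied text pasted into a page unescaped is treated as markup, script tags included:

```python
import html


def render_comment_unsafe(comment):
    """The vulnerable version: user text is pasted into the page as-is."""
    return "<p>" + comment + "</p>"


def render_comment(comment):
    """The safer version: escape the user's text before building the HTML."""
    return "<p>" + html.escape(comment) + "</p>"


payload = "<script>alert('xss')</script>"

unsafe = render_comment_unsafe(payload)  # the script tag survives intact
safe = render_comment(payload)           # the tag is turned into harmless text
```

A browser given `unsafe` would run the script with the viewing user's session; given `safe`, it would just display the text of the tag.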

Summary

The terms outlined in the Data Use Policy and Terms of Service aren't unreasonable; Facebook has even set up a page for discussions on policy changes after an outcry over a change to the ToS which meant they could "Do anything they want with your content. Forever." However, they occasionally cross the line into complete invasion of privacy, and perhaps their most depraved act was their purported involvement with the NSA surveillance operation, PRISM, which Mark Zuckerberg (co-founder of Facebook) denied. His comment did, however, come under fire for its likeness to the one released by Larry Page (co-founder of Google). One of the criticisms is the careful wording; each maintains that he has not given "direct access" to his company's servers, but some employees within the companies have come out (on condition of anonymity, as their divulgence of such is illegal) indicating that plans to place information on intermediary file servers were discussed. One article explains it as such:

"In at least two cases, at Google and Facebook, one of the plans discussed was to build separate, secure portals, like a digital version of the secure physical rooms that have long existed for classified information, in some instances on company servers. Through these online rooms, the government would request data, companies would deposit it and the government would retrieve it, people briefed on the discussions said."

Other such articles: [1][2][3]

So given Facebook's history of surreptitious surveillance, one's trust might begin to diminish.


Thursday, 25 July 2013

A Review of RSS

https://www.tinyurl.com/rss-use

RSS is usually expanded as 'Really Simple Syndication'. It's a standard based on XML, and provides a means of subscribing to feeds with an aggregator. This week, I was tasked with using one such aggregator to subscribe to several RSS feeds and evaluate my experience.

RSS is good in that it lets you keep up to date with the latest news, blogs, social media and so forth. It also saves time by retrieving said feeds automatically, and displays them in a single place, without having to go to each individual site to which you're subscribed.
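
Since a feed is just XML, the core of what an aggregator does can be sketched in a few lines. This is a minimal illustration using Python's standard library; the sample feed, its URLs and the `read_feed` helper are all made up, and a real aggregator would download the XML and handle far more variation:

```python
import xml.etree.ElementTree as ET

# A made-up RSS 2.0 document standing in for a downloaded feed.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>http://example.com/</link>
    <description>A placeholder feed</description>
    <item>
      <title>First post</title>
      <link>http://example.com/first</link>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/second</link>
    </item>
  </channel>
</rss>"""


def read_feed(xml_text):
    """Return the channel title and a list of (title, link) pairs."""
    channel = ET.fromstring(xml_text).find("channel")
    items = [
        (item.findtext("title"), item.findtext("link"))
        for item in channel.findall("item")
    ]
    return channel.findtext("title"), items


feed_title, entries = read_feed(SAMPLE_FEED)
```

Running `read_feed` over every subscribed feed and merging the results is, in essence, the 'single place' an aggregator presents.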

The concept is good in theory, but I've also come across some drawbacks; after subscribing to a few feeds, there's a glut of updates, and it becomes tedious to sift through all of them. This in itself wouldn't be particularly bad, but I've found that most of the posts in the feeds are themselves tedious. It could be that the feeds to which I'm subscribed are sub-par, but reading some of the other blogs, there seems to be a consensus. Unfortunately, I've found myself ignoring the feeds and tending towards the main sites.

In short, RSS is a very good concept, which hasn't been executed as well as it perhaps could have been.

Information and Content Management - 3623ICT