Data scientists and security experts have been saying “Think before you click” and “What goes on the internet stays forever” for a long time now. But have you ever wondered why and how the internet stores data?
The Internet Archive
Various technological developments, such as cloud computing, web archives, and analytics platforms, enable people to store online data, making it accessible to researchers who want to study the history and development of a digital society. Web archiving, however, began with the Internet Archive, which provides free, permanent access to a vast collection of digital materials.
Because most web content is born digital, it exists only in that format, and its lifetime is short. In the late 1990s, national libraries and the Internet Archive realized the importance of preserving digital information, so they started archiving web content. In 2003, UNESCO recognized digital content as cultural heritage and called for appropriate action to preserve it.
In 1996, Brewster Kahle founded the Internet Archive, a nonprofit organization based in California. In 1999, the Archive expanded its collections to include scanned books, videos, and music.
In 2001, the Wayback Machine was born. The Archive developed this search interface to give users access to historical versions of archived web pages for a specified uniform resource locator (URL). Today, the Internet Archive houses more than 40 million gigabytes (about 40 petabytes) of data.
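To make the idea of looking up a historical version of a URL concrete, here is a minimal Python sketch. It assumes the Internet Archive's public availability endpoint at https://archive.org/wayback/available and the JSON shape that endpoint is generally documented to return; neither is described in this article, so treat both as assumptions. The function names are hypothetical and chosen only for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: the Internet Archive's public availability API.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_availability_url(page_url, timestamp=None):
    """Build the query URL for a page, optionally near a YYYYMMDD[hhmmss] timestamp."""
    params = {"url": page_url}
    if timestamp is not None:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(payload):
    """Return (snapshot_url, timestamp) from a decoded response, or None if nothing is archived."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest["url"], closest["timestamp"]


def lookup(page_url, timestamp=None):
    """Query the endpoint over the network and return the closest snapshot, if any."""
    with urlopen(build_availability_url(page_url, timestamp), timeout=10) as resp:
        return closest_snapshot(json.load(resp))
```

Calling lookup("example.com", "20060101") would ask for the capture closest to January 1, 2006. The result depends on what the Archive holds at the time of the request, so no expected output is shown here.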
The Archive developed its own technologies for collecting and storing data. In a given week, it has 7,000 web crawler bots scouring the internet and copying millions of web pages. The bots take snapshots of each page at varying times and frequencies to preserve the website at different moments in time.
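The snapshot idea can be illustrated with a small in-memory store. This is only a sketch, not the Archive's actual technology: the class name SnapshotStore, its methods, and the 14-digit YYYYMMDDhhmmss timestamp strings are all assumptions made for the example. Each capture is filed under its URL and timestamp, and a lookup returns the most recent capture at or before a requested moment.

```python
import bisect
import hashlib
from collections import defaultdict


class SnapshotStore:
    """Hypothetical in-memory store of page captures, keyed by URL and timestamp."""

    def __init__(self):
        self._times = defaultdict(list)   # url -> sorted list of timestamps
        self._bodies = defaultdict(dict)  # url -> {timestamp: (digest, body)}

    def capture(self, url, timestamp, body):
        """Record a snapshot; timestamps are 'YYYYMMDDhhmmss' strings (assumed format)."""
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        if timestamp not in self._bodies[url]:
            # Fixed-width digit strings sort chronologically, so insort keeps order.
            bisect.insort(self._times[url], timestamp)
        self._bodies[url][timestamp] = (digest, body)
        return digest

    def as_of(self, url, timestamp):
        """Return the body of the latest snapshot taken at or before `timestamp`, or None."""
        times = self._times.get(url, [])
        index = bisect.bisect_right(times, timestamp)
        if index == 0:
            return None
        return self._bodies[url][times[index - 1]][1]


store = SnapshotStore()
store.capture("example.com", "20190101120000", "<h1>Old home page</h1>")
store.capture("example.com", "20190301120000", "<h1>New home page</h1>")
page_in_february = store.as_of("example.com", "20190201000000")
```

Here page_in_february holds the January capture, because that was the latest snapshot available on the requested date. A real archive would also need persistent storage and deduplication, which this sketch leaves out.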
Its latest feat was preserving public Google+ posts before the platform started deleting data and officially shut down.
Data Storage and Collection in the Age of Social Media
More and more technologies are becoming available for businesses and third-party users, enabling them to collect and store pertinent information from the internet, including social media. This proves to be advantageous in several ways.
For one, lawyers and law enforcement bodies can use web archive browser plug-ins to capture and store online evidence to support their cases. This technology is especially useful now that courts may accept social media content as legal documents.
Although this innovation could benefit people in many ways, it also carries potential consequences. In the age of social media, where people freely share personal details, thoughts, and opinions, this kind of technology poses a data security risk for millions of users.
Late last year, Facebook suffered a massive data breach that affected almost 50 million user accounts. The attackers were able to take over the accounts, forcing Facebook to log out 90 million users. That figure includes the 50 million hacked profiles plus another 40 million that were potentially at risk.
This threat to data security is a reminder that while technology can be beneficial, it can also have major pitfalls. Security experts continue to remind internet and social media users to take extra precautions and watch what they share on the web.