According to current estimates, the Deep Web, or UnderNet, contains somewhere between 750,000 petabytes and 2 exabytes of information (1 exabyte = 1,000,000 terabytes (TB)). The numbers defy comprehension by simple human minds.
To put 2 exabytes into perspective, the world's largest library, the Library of Congress, which contains virtually everything written since the beginning of writing, houses approximately 8TB of data. As an aside, the Wire in 2013 estimated that the NSA collects 29TB of data from its various sources every day.
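For readers who want to see the comparison worked through, here is a rough back-of-the-envelope sketch. It uses only the figures quoted above and assumes decimal units (1 exabyte = 1,000,000 terabytes); the variable names are purely illustrative.

```python
# Figures quoted in the article; decimal (SI) units are an assumption.
TB_PER_EXABYTE = 1_000_000

deep_web_tb = 2 * TB_PER_EXABYTE   # upper figure quoted for the Deep Web
library_of_congress_tb = 8         # quoted size of the Library of Congress
nsa_daily_tb = 29                  # quoted daily NSA collection

# How many Libraries of Congress would fit into 2 exabytes?
libraries = deep_web_tb / library_of_congress_tb

# How long would it take to gather 2 exabytes at 29TB per day?
nsa_years = deep_web_tb / nsa_daily_tb / 365

print(f"{libraries:,.0f} Libraries of Congress")    # 250,000 Libraries of Congress
print(f"about {nsa_years:.0f} years of NSA collection")  # about 189 years
```

On these numbers, 2 exabytes is a quarter of a million Libraries of Congress, or the best part of two centuries of collection at the quoted NSA rate.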
The Deep Web is the new frontier of information science, but massive technical challenges are still to be understood and resolved in order to mine this wealth of information.
Some sites have login passwords that restrict the ability of web crawlers to gain access and to index information. There are data incompatibilities, and form data that requires specific inputs in order to gain specific responses. There are internal pages with no external links, unpublished and unlisted posts, and information in JPEG or MP4 format, the contents of which cannot be analysed by indexing mechanisms.
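A toy sketch may make the point concrete. It is not how any real search engine works; the page names, the `PAGES` table and the `crawl` function are all hypothetical. It simply follows links from a starting page, the way a crawler does, and shows that pages behind a login, or pages nothing links to, never get indexed.

```python
from collections import deque

# Hypothetical toy site: each page lists its outgoing links
# and whether a login is needed to read it.
PAGES = {
    "home": {"links": ["about", "members"], "login": False},
    "about": {"links": ["home"], "login": False},
    "members": {"links": ["private-report"], "login": True},
    "private-report": {"links": [], "login": True},
    "orphan-post": {"links": ["home"], "login": False},  # nothing links here
}

def crawl(start):
    """Breadth-first crawl by an anonymous crawler that cannot pass a login."""
    indexed = set()
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if page in indexed or page not in PAGES:
            continue
        if PAGES[page]["login"]:
            continue  # login wall: neither the page nor its links are seen
        indexed.add(page)
        queue.extend(PAGES[page]["links"])
    return indexed

surface = crawl("home")
deep = set(PAGES) - surface

print(sorted(surface))  # ['about', 'home']
print(sorted(deep))     # ['members', 'orphan-post', 'private-report']
```

The unlisted post is perfectly public, yet it stays invisible because the crawler has no path to it; the members' pages stay invisible because the crawler has no password.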
These and hundreds of other technical difficulties have left the true meat of the Internet untapped.
The Deep Web contains shockingly valuable information. Can you imagine how cancer research would blossom if every researcher had instant access to every research paper done by every single university and research lab in the world?
These papers generally take years or decades to float up to the Surface Web where they can be indexed by Google.
Or imagine a skyscraper architect having instant access to up-to-the-minute materials research. Politics, greed, jealousy and fear generally keep these papers buried so deep in the web that competitors, or just the general public who truly need help, will not get access to them until the papers are no longer topical.
Every corporation worth its salt is throwing money at Deep Web research, not least Google. The company that unlocks the mysteries of the Deep Web will obtain power of an enormous magnitude.
In my profession, unlocking locked doors is my stock in trade. I have been playing with the Deep Web for as long as the term has existed (coined in 2002 if I remember correctly). As a security researcher, I must understand how people break into things before I can create a means of preventing such a break-in. Would you, for example, buy a lock from a company that professed to have no knowledge of how locks are picked? It would seem foolish to me.
The Dark Web unveiled
The most astonishing subset of the Deep Web is a collection of dark alleys called the Dark Web. The Dark Web is generally thought of as a collection of criminal elements intent on subverting the law, stealing our money, and possibly kidnapping our daughters. The Silk Road, one of the most famous elements of the Dark Web, was taken down by the FBI a few short years ago.
The Dark Web, unlike the rest of the Deep Web, which is unknown due simply to our technical incompetence, is defined as that subset of the Deep Web which is purposely obscured to avoid detection by those who might become its enemies.
In the Dark Web we of course find mind-numbing pornography, advertisements for hit men, drugs of every kind, fake Cartier watches that even Cartier cannot distinguish, human traffickers of every kind, money launderers - and even lawyers.
However, we also find an equal quantity of human rights activists who, if their identities were known, would certainly be executed by their home countries.
We find scientific or religious theories that are unpopular and would invite repercussions if the authors were known. We find whistle-blowers who pass documents of delicate sensitivity but powerful impact.
We also find the 29 terabytes of information collected by the NSA every day, and data from every other covert agency, in the Dark Web. It also contains the cream of the crop of conspiracy theorists.
I am not in any way a conspiracy theorist and have no time for such nonsense. At my age, life is certainly too short. I have spent much of my free time cruising the Dark Web and avoiding the subset of conspiracy folks. But sometimes, a mind such as mine, if it is late enough at night and I have had a glass of wine too many, gets trapped, like a fly in a spiderweb, in these convoluted horrors.
This happened two days ago when my Dark Web search engine - which, in spite of its limited capabilities, was proudly developed by myself - popped up a still frame from an ancient Simpsons cartoon (right).
It was accompanied by the caption: Not for the faint of heart.
At first glance the illusion of the New York Twin Towers consistent with the number "11" was clear, as was the large number 9.
New York was clearly written at the top, and the girl's three fingers were separated on the right hand: the bottom two fingers were clasped together, and the index finger was splayed slightly apart. To a conspiracy theorist they might be construed as representing the second Millennium: year one.
As a life-long sceptic, I bounced up to the surface web for a moment to validate the frame. I expected it to be completely bogus, or an episode of The Simpsons, clearly done in bad taste, from long after 9/11. However, much to the detriment of my productivity and focus for the evening, the frame was legitimate and first appeared in the episode "The City of New York vs. Homer Simpson", broadcast for the first time on 21 September 1997, almost four years before the 9/11 tragedy.
Learning things of no value
I was, horrifically, caught in the spider's web. Returning to the UnderNet, I began reading a 220-page analysis of the World Political/Financial Conditions of 1997 and its relationship to Fox Inc, Matt Groening, Al Jean, John Fink, Matt Selman and dozens of other "Ivy League Conspirators".
I found myself learning things of absolutely no value to me, and if true, about which I could do nothing.
Yet some dark corner of my being refused to let me tear my eyes from the screen. When the condensed plot was read aloud on the screen, I felt my heart sink. Homer Simpson goes to a bar in New York to find a "designated driver". It seems that Homer is responsible for 91% of all traffic accidents in New York and cannot drive (the percent sign is written to look like a "1", so yet another 911).
Homer's car is parked in the middle of the World Trade Center square. When he arrives he has to go to the bathroom, chooses the South Tower first, finds the restroom out of order, then goes to the North Tower (the South Tower, I believe, was the first to collapse, the North was the second).
After analysing the entire episode, my mind is numb. You can all verify this for yourselves by simply perusing the surface web.
In any case, sanity somehow entered my brain in the guise of my wife who reminded me that I had only an hour left for my deadline, and that I appeared to have not even begun the work.
I slowly extracted myself, tripping only once over a profoundly deep medical analysis proving that Michelle Obama was a man, and briefly glancing at three alternate theories for the exact date and time for the end of the world. And - briefly - scanning a few foolproof techniques for marrying wealthy women.
Deeper down the rabbit hole
I wrote the Deep Web Browser for my own benefit (and because I bore easily). It is not in any form that can be sold or could be useful to another. I'm a firm believer in brute force when it comes to software and resort to subtlety and elegance only when I'm tired or being slothful. This is why I do not share it with anyone other than my best and most tolerant friends.
However, one day someone will produce a usable Deep Web browser, and possibly even a Dark Web browser.
I hope when this happens, society does not immediately shut down due to people's inability to ever tear themselves away from their computer screens, due to the lure of infinitely imagined pornography, fantasies of cheap and discreet hitmen who can avenge all of your past slights, high quality cocaine for $5 a gram, escorts from Bangkok who will come home with you, show you the greatest time of your life, and afterwards, clean your house.
Not to mention the near-infinite number of pages designed to suck your brain out of your ears.
My next instalment is a distillation of interviews I have had so far with some of the half million highly publicised members of Adult FriendFinder whose records were hacked from FriendFinder Networks Inc. and displayed prominently, including sexual preferences, fantasies, etc., on a well-known Darknet site. It has proved to be both tragic and entertaining so far, and it dovetails nicely into what our real problem is – privacy.