Have you ever thought about the capabilities of the NSA and their ability to spy on US citizens? Many people do not have the experience to comprehend the potential ability of the world’s greatest spy agency. I am not an employee or contractor for the NSA and as such I do not have any direct knowledge of their capabilities. However, I have been in IT for well over 3 decades in varying capacities. That experience gives me the ability to provide some history of IT capabilities and to provide some scope for the size of the NSA.
By Dave H, a reader of SHTFBlog
To the IT savvy, forgive me. For the rest of you, let’s start with some background information. What is the difference between data and information? An easy way to think of this is your phone book. The collection of names, numbers and addresses is data. In and of itself it has no value. However, when you look up a number you retrieve information. Information is created when data is used to answer a question.
A query is a question, or an interrogation of data, that provides information. Metadata is simply data about data. When you call someone, your conversation is the data. The metadata is the record that this number called that number on this date for this duration. Another concept is Moore’s Law. Moore’s law, as commonly paraphrased, states that computing power will double about every 2 years. Moore’s law can be roughly applied to data storage as well. Keep in mind the power of 2. If you start with 1, in 2 years it will double to 2, in 2 more years it grows to 4, and in 2 more years it grows to 8.
Enough definitions; let’s talk about some history. Back in the early 1980s I worked on an inverted list architecture database called Model 204. When I worked on this system we were told that it was developed by the CIA. It used a bit map index structure to provide extremely fast answers to queries. However, it had problems. It was very slow to store data, and if it was presented with a query on a field that was not indexed, it would attempt to resolve the request by walking through the database one record at a time.
For example, if you presented a query that said “Tell me all the oil wells drilled in Oklahoma,” M204 would instantly provide that response. That would probably be too much data, so let’s narrow the scope: tell me all the wells in Oklahoma in these 5 counties that were operated by Anadarko. It would also give that answer instantly because all of the fields in the query were indexed when the data was stored. If we added another field to the query that was not indexed, the database would walk through every record to resolve the query. So “tell me all the wells drilled in Oklahoma in these 5 counties operated by Anadarko that used a 3-inch diameter pipe” would take the system hours to resolve. In other words, 1970s technology allowed for instant responses to queries that were predefined with the proper index structures. Another point of reference for the 1980s is data storage. We kept track of 3 million oil wells with 45GB of storage. To put this in context, you can put 128GB of storage in an iPhone 6.
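To make the indexed-versus-unindexed difference concrete, here is a minimal sketch of a bitmap index in Python. It is not Model 204's actual design; the sample records, field names and function names are all made up for illustration. Each indexed value gets a bitmap, an indexed query is answered by ANDing bitmaps together, and a query that touches an unindexed field falls back to visiting every record.

```python
from collections import defaultdict

# Hypothetical sample records; field names and values are illustrative only.
WELLS = [
    {"state": "OK", "county": "Caddo", "operator": "Anadarko", "pipe_in": 3.0},
    {"state": "OK", "county": "Grady", "operator": "Anadarko", "pipe_in": 5.5},
    {"state": "OK", "county": "Caddo", "operator": "Other Co", "pipe_in": 3.0},
    {"state": "TX", "county": "Reeves", "operator": "Anadarko", "pipe_in": 3.0},
]

# Only these fields get an index; pipe_in is deliberately left out.
INDEXED_FIELDS = ("state", "county", "operator")


def build_bitmaps(records, fields):
    """Build one bitmap (stored as a Python int) per (field, value) pair.

    Bit i is set when record i has that value for that field.
    """
    bitmaps = defaultdict(int)
    for position, record in enumerate(records):
        for field in fields:
            bitmaps[(field, record[field])] |= 1 << position
    return bitmaps


def indexed_query(bitmaps, record_count, **criteria):
    """Answer a query using only bitwise ANDs over the prebuilt bitmaps."""
    result = (1 << record_count) - 1  # start with every record selected
    for field, value in criteria.items():
        result &= bitmaps.get((field, value), 0)
    return [i for i in range(record_count) if (result >> i) & 1]


def scan_query(records, **criteria):
    """Fallback for unindexed fields: visit every record one at a time."""
    return [
        i for i, record in enumerate(records)
        if all(record[field] == value for field, value in criteria.items())
    ]


bitmaps = build_bitmaps(WELLS, INDEXED_FIELDS)
fast = indexed_query(bitmaps, len(WELLS), state="OK", operator="Anadarko")
slow = scan_query(WELLS, state="OK", operator="Anadarko", pipe_in=3.0)
print(fast)  # record positions found through the index
print(slow)  # record positions found by the full scan
```

With four records the two paths feel the same. With millions of records the indexed path still does a handful of bitwise operations, while the scan has to touch every record, which is the "hours to resolve" case described above.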
Two Kids & Enterprise Search
Let’s fast forward to the middle of the 1990s. Google comes on the scene. It can provide an instant response to ad hoc queries. Let’s think about what happens with a Google search. It crawls through the Internet and provides millions of pages and information about those pages instantly. Those algorithms are so complex they are listed as one of the top ten most guarded corporate secrets (Video).
So how did a 21-year-old and a 22-year-old student at Stanford come up with something so complex that to this day it remains a secret? They didn’t. These 2 guys would need to be so smart that they would make Einstein look like a drooling moron. This was the work of teams of brilliant people that took decades to complete and probably cost millions if not billions of dollars to develop. To complete this work they would have needed intimate knowledge of the Internet. They would also have needed to develop new index structures never seen before.
They would need to find new ways to store and retrieve the data. They would need to develop this in a way to scale up to millions if not billions of transactions without losing performance. But you say it could be possible that these geniuses did develop Google. You may even think that given enough monkeys with enough typewriters and enough time they too could write the World Book Encyclopedia. Well, these 2 monkeys would have needed to develop new groundbreaking work in 3 years. There simply weren’t enough monkeys or enough time to pull it off. They probably developed the front end. One last point: if they were so smart, where is their other groundbreaking work? Chrome? Google Glass? No, those are extensions of the previous technology. So, the doubters will say that they had momentary flashes of genius. Whatever!
So, who did develop the algorithms for Google? Who would have the need to provide instant responses to ad hoc queries? Who would have the money to put the mathematicians and scientists together for an extended period of time to create this body of work? It was the CIA, DARPA or the NSA. The story on the street is it was the NSA.
Why would the NSA give these two guys their trade secrets? The answer is pretty simple. The NSA was still bound by the 4th Amendment and could not collect information on US citizens. Private corporations are not bound by the 4th Amendment. They can collect any information they want and sell it to whomever they want. Just as the credit agencies collect information on you and sell it, Google can collect information on you and sell it to the NSA. This is how they got around the 4th Amendment. The Patriot Act ended the restrictions and your 1st and 4th Amendment rights (reference).
Back to 1990s technology: we can see that it can do instant ad hoc queries. However, it cannot correlate the data. If we do a Google search about the number of oil wells in Oklahoma we will get a page that provides that information. We can even do a query about the number of oil wells in Oklahoma operated by Anadarko. When we start looking at specific counties, the response starts to fall apart. The other problem with Google from 1995 is that it cannot handle natural language queries.
So what about today? We can see the natural language barrier has been broken. IBM’s Watson was at the forefront, and now that technology is available from Apple and Google on telephones. There is other technology becoming available called data analytics and predictive analytics. Here is a very good explanation of data analytics (click here).
In short, data analytics is the process of developing conclusions from raw data. This means that we can now run natural language queries against large quantities of raw data and correlate conclusions. Data analytics does not predict future behavior. That is where predictive analytics comes into play. Predictive analytics uses various statistical methods as well as data mining to determine the probability of someone taking a future action. Let’s take something easy, say ammunition purchases. Someone could look at your past purchasing history to determine whether you would purchase ammunition at a certain price and how much you would purchase. So, if you saw Lake City 5.56 ammo at $1000/1000 you probably would not purchase any ammunition. However, if Lake City 5.56 ammo was $200/1000 you would probably purchase several thousand rounds. Predictive analytics would give the statistical probabilities of your personal purchases.
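As a rough illustration of the ammunition example, the sketch below estimates a purchase probability from past behavior. Everything in it is an assumption made for the example: the purchase history is invented, the function name is hypothetical, and the method (counting purchases among past offers at similar prices, with simple smoothing) is about the simplest statistical estimate there is. Real predictive analytics would use far richer data and models.

```python
# Invented purchase history: (price per 1000 rounds in dollars, did they buy?)
HISTORY = [
    (180, True), (220, True), (250, True), (300, True), (350, True),
    (400, False), (600, False), (900, False), (1000, False), (1100, False),
]


def purchase_probability(history, price, band=150):
    """Estimate the chance of a purchase at `price`.

    Looks only at past offers within `band` dollars of the quoted price and
    returns the smoothed share of those offers that ended in a purchase.
    The +1 / +2 terms (Laplace smoothing) keep the estimate away from a
    flat 0 or 1 when there are only a few nearby observations.
    """
    nearby = [bought for past_price, bought in history
              if abs(past_price - price) <= band]
    return (sum(nearby) + 1) / (len(nearby) + 2)


cheap = purchase_probability(HISTORY, 200)
expensive = purchase_probability(HISTORY, 1000)
print(f"Chance of buying at $200/1000:  {cheap:.2f}")
print(f"Chance of buying at $1000/1000: {expensive:.2f}")
```

With this made-up history the estimate comes out high at $200/1000 and low at $1000/1000, which matches the intuition in the paragraph above.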
So, where does this go? Let’s say I am the Feds and I want to know the anti-government groups in Idaho. I also want to know who is in each group and how likely they are to take action against the government. Data and predictive analytics provide this answer. How do they do it? The query could be done in a natural language like the question posed above. I would start by finding people in Idaho who visit anti-government websites. I would then look at the metadata for their phone calls and emails to determine links between the individuals. This would give me the groups. I would then look at their purchasing habits to determine their resolve as a rebel. Do they buy survivalist gear? Do they write for these anti-government sites? How much long term food storage have they purchased? How many firearm purchases did they make? Did they buy large quantities of ammunition? Did they purchase survivalist books? Did they check them out from the library? So, you take the metadata from website visits along with the metadata from phone and email records to determine the groups, then you use the information from purchasing history to determine the seriousness of the groups.
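The "links between individuals give me the groups" step is ordinary graph analysis. Here is a minimal sketch, assuming the metadata has already been reduced to pairs of who contacted whom. The names and records are invented, and I have no knowledge of how any agency actually does this; the point is only that grouping people from metadata alone takes a few lines of code.

```python
from collections import defaultdict, deque

# Invented call metadata: (caller, callee). No conversation content is needed.
CALLS = [
    ("alice", "bob"),
    ("bob", "carol"),
    ("carol", "alice"),
    ("dave", "erin"),
]


def find_groups(call_records):
    """Group people who are linked, directly or indirectly, by any call.

    Builds an undirected graph from the records and returns its connected
    components, each as a sorted list of names.
    """
    graph = defaultdict(set)
    for caller, callee in call_records:
        graph[caller].add(callee)
        graph[callee].add(caller)

    seen = set()
    groups = []
    for start in sorted(graph):
        if start in seen:
            continue
        # Breadth-first walk outward from one person to everyone reachable.
        seen.add(start)
        queue = deque([start])
        group = []
        while queue:
            person = queue.popleft()
            group.append(person)
            for contact in sorted(graph[person]):
                if contact not in seen:
                    seen.add(contact)
                    queue.append(contact)
        groups.append(sorted(group))
    return groups


groups = find_groups(CALLS)
for number, members in enumerate(groups, start=1):
    print(f"Group {number}: {', '.join(members)}")
```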
But what about the likelihood that any of these groups would take action against the government? Once I have determined the active anti-government groups in Idaho, I would use predictive analytics to determine the likelihood they will attack. This means I would use the actual contents of their emails and phone calls to determine their intent. The amazing part is this can be done automatically for all 330,000,000 Americans.
Then there is the Patriot Act. (Reference)
Both of these programs opened the door to unprecedented surveillance on US citizens. They are a direct assault on our personal freedoms and liberties but it actually started with the Echelon Project. (Reference)
So, how much does the NSA have? Their new data center in Bluffdale, Utah is a 1,000,000 sq ft facility. They say that 100,000 sq ft is currently data center and the remaining space is administrative space. Let’s translate: that’s another 900,000 sq ft for expansion. The data center currently consumes 65 megawatts of power. Also, keep in mind that this is only the newest of the NSA’s data centers. They obviously had other facilities before they built this one. (Reference) This is right off the NSA website.
Data Storage Capacity
In February 2012, Utah Governor Gary R. Herbert revealed that the Utah Data Center would be the “first facility in the world expected to gather and house a yottabyte”. Since then, conflicting media reports have also estimated our storage capacity in terms of zettabytes and exabytes. While the actual capacity is classified for NATIONAL SECURITY REASONS, we can say this: The Utah Data Center was built with future expansion in mind and the ultimate capacity will definitely be “alottabytes”!
The steady rise in available computer power and the development of novel computer platforms will enable us to easily turn the huge volume of incoming data into an asset to be exploited, for the good of the nation.
So, What Is A Yottabyte?
1 YB = 1000^8 bytes = 10^24 bytes = 1,000,000,000,000,000,000,000,000 bytes = 1,000 zettabytes = 1 trillion terabytes
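If you want to check the arithmetic yourself, the short Python sketch below rebuilds the decimal (power of 1000) units and confirms the equalities above. The iPhone comparison reuses the 128GB figure mentioned earlier in this article; the variable names are just for illustration.

```python
# Decimal storage prefixes, each 1000 times larger than the one before.
PREFIXES = ["kilo", "mega", "giga", "tera", "peta", "exa", "zetta", "yotta"]
BYTES_PER = {name: 1000 ** (i + 1) for i, name in enumerate(PREFIXES)}

YOTTABYTE = BYTES_PER["yotta"]

zettabytes_per_yb = YOTTABYTE // BYTES_PER["zetta"]
terabytes_per_yb = YOTTABYTE // BYTES_PER["tera"]
# How many 128GB phones would it take to hold one yottabyte?
phones_per_yb = YOTTABYTE // (128 * BYTES_PER["giga"])

print(f"1 YB = {YOTTABYTE:,} bytes")
print(f"1 YB = {zettabytes_per_yb:,} zettabytes")
print(f"1 YB = {terabytes_per_yb:,} terabytes")
print(f"1 YB = {phones_per_yb:,} phones with 128GB each")
```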
Now keep in mind Moore’s law. That capacity will double roughly every 2 years. What kind of communication line would you need to fill a yottabyte? Let’s look at it like it was a swimming pool. Could you fill a 1,000,000sqft swimming pool with a garden hose? How about a fire hose? You would need something much larger to feed this capacity. When you hear that the NSA has direct links into carrier gear you should probably believe it. Going back to Moore’s law 1YB will turn into 2YB and then 4YB and 8YB and on and on and on. I was in a class a while back and one of the people was from Yahoo. They slipped and said that we would be in shock at the amount of data they were required to send to the NSA. (Reference)
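The garden hose question can be put into rough numbers. The sketch below estimates how long it would take to fill one yottabyte over a single link, and how capacity grows if it doubles every 2 years. The link speeds are my own illustrative picks, not figures from the NSA or any carrier, and the doubling function simply applies the rule of thumb described above.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600
YOTTABYTE_BITS = 8 * 10 ** 24  # 10^24 bytes at 8 bits per byte

# Illustrative link speeds in bits per second (assumptions, not real figures).
LINKS = {
    "garden hose (100 Mbit/s)": 10 ** 8,
    "fire hose (100 Gbit/s)": 10 ** 11,
    "something much larger (1 Pbit/s)": 10 ** 15,
}


def years_to_fill(bits_per_second, capacity_bits=YOTTABYTE_BITS):
    """Years needed to move `capacity_bits` over one link running flat out."""
    return capacity_bits / bits_per_second / SECONDS_PER_YEAR


def capacity_after(years, start=1, doubling_period=2):
    """Capacity after `years` if it doubles once per full doubling period."""
    return start * 2 ** (years // doubling_period)


for name, speed in LINKS.items():
    print(f"{name}: about {years_to_fill(speed):,.0f} years to fill 1 YB")

for years in (0, 2, 4, 6):
    print(f"After {years} years: {capacity_after(years)} YB")
```

Even the "fire hose" in this sketch would need millions of years on its own, which is the point of the swimming pool comparison: storage on that scale only makes sense if it is fed by a very large number of very fast links.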
Should we be afraid? You bet. Look at the quote from the NSA page.
“The steady rise in available computer power and the development of novel computer platforms will enable us to easily turn the huge volume of incoming data into an asset to be exploited, for the good of the nation.”
Who benefits from this surveillance? Obviously, both the Republicans and the Democrats are okay with the NSA spying or they would not have funded the facility. What happens if our government becomes malevolent? What happens if a Hitler comes to power in the US? Why can’t we get our 1st and 4th amendments back? What does it say about the capabilities of the NSA, if natural language, data analytics and predictive analytics have been released to the common public? I hope this article gives you enough information to open your eyes and ask questions about our friends at the NSA.