This blog post shows how much data you can infer from an arbitrary email sign-up. You will be surprised how much you can learn and how accurately a good Bayesian guessing approach works.
My IP address
So let’s see what we get when you sign up for a newsletter. First, there is your IP address. In this example I will use my current address, 188.8.131.52.
There are services like MaxMind that tell you where the person behind a given IP address is located, with an accuracy of about 15 km. Here is what I get:
| IP Address | Country Code | Location | Postal Code | Coordinates | ISP |
In reality, I am currently at Dortustr. 57 in Potsdam, which is 18.5 km away, so MaxMind’s guess is pretty accurate.
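To check a guess like this, you can compute the great-circle distance between the estimated coordinates and the real ones and compare it with the provider’s accuracy radius. Below is a minimal sketch using the haversine formula. It assumes a spherical Earth with a mean radius of 6371 km. The function names and the coordinates in the example are my own placeholders, not MaxMind’s output or API.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation (assumption)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_accuracy(estimate: tuple[float, float], actual: tuple[float, float],
                    radius_km: float) -> bool:
    """True if the real location lies inside the provider's accuracy radius."""
    return haversine_km(*estimate, *actual) <= radius_km


# Placeholder coordinates: two points one degree of latitude apart.
print(round(haversine_km(0.0, 0.0, 1.0, 0.0), 2))  # 111.19
```

With a 15 km radius, a true distance of 18.5 km would fall just outside the stated accuracy. For a city-level guess, that still counts as close.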
My HTTP request
Here is the HTTP request my browser sent when I signed up:
POST /semRecSys-rest/NewsletterService/6ecb5bf5-3580-445c-a3d2-bc64493e19b7 HTTP/1.1
Host: recsys.incentergy.de
Connection: keep-alive
Content-Length: 46
Cache-Control: max-age=0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Origin: http://www.incentergy.de
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36
Content-Type: application/x-www-form-urlencoded
Referer: http://www.incentergy.de/
Accept-Encoding: gzip,deflate,sdch
Accept-Language: en,de;q=0.8
Cookie: evertimeCookie=d6d46657-31d9-4713-8e81-09695591f8e5; AWSELB=B7DB2B2316675BAFCE6F4ECBED33D738EADC2F06DB1CED99806F8AB0AD854BD143BE1BD276D82B2570B0C1BC51175259A6C92A8F7DF56A3C17A112737899AA98ED97141989; __utma=124758746.791261515.1383593642.1389706837.1390039665.40; __utmb=1247587184.108.40.2060039665; __utmc=124758746; __utmz=124758746.1389028775.37.6.utmcsr=incentergy|utmccn=newsletter_2624FE5B-F07F-4EC5-8095-2DEE366ACA5C|utmcmd=email

mbox=manuel.blechschmidt%40gmail.com&submit=go
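You can see that my browser automatically transmits a lot of information about me. The most important part is that I am using a Mac with Chrome. Google Analytics has also done its job, and its cookies already contain more information about me. For example, the __utmz cookie records that I previously arrived through an email newsletter campaign (utmcsr=incentergy, utmcmd=email).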
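You only need a few lines to pull these facts out of the raw request. The following is a rough sketch that uses only the Python standard library. The helper names are hypothetical. The substring rules for the User-Agent are a deliberate simplification, and production code would use a maintained UA parser. The `__utmz` layout (four dot-separated numbers followed by `|`-separated campaign fields) is read off the cookie value in the request above.

```python
def parse_user_agent(ua: str) -> dict[str, str]:
    """Rough OS/browser detection by substring matching (simplified on purpose)."""
    if "Macintosh" in ua or "Mac OS X" in ua:
        os_name = "Mac"
    elif "Windows" in ua:
        os_name = "Windows"
    elif "Linux" in ua:
        os_name = "Linux"
    else:
        os_name = "unknown"
    # Chrome's User-Agent also contains "Safari/", so Chrome must be tested first.
    if "Chrome/" in ua:
        browser = "Chrome"
    elif "Firefox/" in ua:
        browser = "Firefox"
    elif "Safari/" in ua:
        browser = "Safari"
    else:
        browser = "unknown"
    return {"os": os_name, "browser": browser}


def parse_accept_language(header: str) -> list[tuple[str, float]]:
    """Return (language, q) pairs, most preferred first; q defaults to 1.0."""
    langs = []
    for part in header.split(","):
        lang, _, params = part.strip().partition(";")
        if not lang:
            continue
        params = params.strip()
        q = float(params[2:]) if params.startswith("q=") else 1.0
        langs.append((lang.strip(), q))
    return sorted(langs, key=lambda item: item[1], reverse=True)


def parse_cookie_header(header: str) -> dict[str, str]:
    """Split a Cookie header into name -> value (values may contain '=')."""
    cookies = {}
    for pair in header.split(";"):
        name, sep, value = pair.strip().partition("=")
        if sep:
            cookies[name] = value
    return cookies


def parse_utmz(value: str) -> dict[str, str]:
    """Extract the campaign fields (utmcsr, utmccn, utmcmd, ...) from __utmz."""
    parts = value.split(".", 4)
    if len(parts) < 5:
        return {}
    fields = {}
    for item in parts[4].split("|"):
        key, sep, val = item.partition("=")
        if sep:
            fields[key] = val
    return fields


ua = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.36 "
      "(KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36")
cookies = parse_cookie_header(
    "__utmc=124758746; __utmz=124758746.1389028775.37.6.utmcsr=incentergy"
    "|utmccn=newsletter_2624FE5B-F07F-4EC5-8095-2DEE366ACA5C|utmcmd=email"
)
print(parse_user_agent(ua))                  # {'os': 'Mac', 'browser': 'Chrome'}
print(parse_accept_language("en,de;q=0.8"))  # [('en', 1.0), ('de', 0.8)]
print(parse_utmz(cookies["__utmz"]))
```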
My email address itself
As you can see, my email address contains my first and last name. It is easy to guess that my first name is Manuel and my last name is Blechschmidt. With this data we can do more. Let’s start:
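Here is a short sketch of how to guess the name from the address. The heuristic is my own assumption: it expects the local part to have the form `first.last` (or `first_last`). It returns nothing for addresses like `info@example.com` and will misfire on nicknames or usernames. The helper name is hypothetical.

```python
from typing import Optional
from urllib.parse import parse_qs


def names_from_email(address: str) -> Optional[tuple[str, str]]:
    """Guess (first, last) from a first.last@domain address, else None."""
    local = address.split("@", 1)[0]
    parts = [p for p in local.replace("_", ".").split(".") if p]
    if len(parts) != 2 or not all(p.isalpha() for p in parts):
        return None
    first, last = parts
    return first.capitalize(), last.capitalize()


# The form body from the request above; parse_qs also decodes %40 back to "@".
form = parse_qs("mbox=manuel.blechschmidt%40gmail.com&submit=go")
email = form["mbox"][0]
print(email)                    # manuel.blechschmidt@gmail.com
print(names_from_email(email))  # ('Manuel', 'Blechschmidt')
```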
Gender is the easiest one: Manuel is a male given name. Easy 🙂
Names follow trends, just like fashion: in some years parents are more likely to give their children one name than another. A famous example is Kevin. There is even a term for the phenomenon, Kevinism (German: Kevinismus).
So let’s see how old someone named Manuel is likely to be.
If I were American, I would most likely be around 5 or around 18 years old. But stop: we already figured out from my IP address that I am German, so let’s look at the German data.
Hmm, so Manuel was quite common between 1970 and 1992. In reality, I was born in 1986 and I am 27.
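The reasoning in the last few paragraphs is a Bayesian estimate. Suppose you count how many babies received a name in each birth year. That count is proportional to P(name | year) × P(year), so normalizing it gives P(birth year | name). Using the counts for the country from the IP lookup also conditions on the country. Below is a minimal sketch. The frequencies are invented placeholders, shaped only loosely after the description above, not real name statistics. The sketch also ignores mortality and migration.

```python
# HYPOTHETICAL relative frequencies of babies named "Manuel" per birth year.
# Invented for illustration; substitute real name statistics per country.
NAME_FREQ = {
    "US": {1980: 2, 1996: 3, 2009: 5},
    "DE": {1970: 2, 1980: 4, 1990: 3, 2005: 1},
}
REFERENCE_YEAR = 2014  # assumed "now" for turning birth years into ages


def birth_year_posterior(freq_by_year: dict[int, float]) -> dict[int, float]:
    """P(birth year | name, country) by normalizing the name's yearly frequency."""
    total = sum(freq_by_year.values())
    return {year: n / total for year, n in freq_by_year.items()}


def expected_age(posterior: dict[int, float], now: int = REFERENCE_YEAR) -> float:
    """Posterior mean age (ignores whether the birthday has passed yet)."""
    return sum(p * (now - year) for year, p in posterior.items())


# Conditioning on the country from the IP lookup changes the answer.
for country in ("US", "DE"):
    posterior = birth_year_posterior(NAME_FREQ[country])
    most_likely = max(posterior, key=posterior.get)
    print(country, most_likely, round(expected_age(posterior), 1))
```

The same name gives a very different age estimate depending on the country, which is why the IP lookup matters here.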
OK, I have an age and a location. Can I now say something about creditworthiness? Yes, I can: this is called geo scoring. Salary is also correlated with age.
Because I don’t have accurate scoring data for postal codes, I will just use the average apartment rents (Mietspiegel, the local rent index), which most German municipalities publish under § 558c Abs. 4 BGB. Immobilienscout24 also collects and visualizes this data.
Teltow is not that expensive, so based on this information alone my creditworthiness is not that good. This information is also somewhat unreliable.
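Here is a sketch of the rent-index proxy. It maps the place from the IP lookup to its average rent and buckets that rent relative to a reference level. The rents, the reference value and the 10 percent tolerance are invented placeholders, not Mietspiegel figures, and `rent_bucket` is a hypothetical helper. The bucket depends entirely on the place the lookup returned. That is why an IP location that is off by a town or two makes this score unreliable.

```python
# HYPOTHETICAL average rents in EUR per square metre, invented for
# illustration; real values would come from the local Mietspiegel.
AVG_RENT_EUR_PER_SQM = {"Teltow": 7.0, "Potsdam": 8.5}
REFERENCE_RENT = 8.0  # hypothetical reference level, e.g. a regional average


def rent_bucket(place: str, tolerance: float = 0.10) -> str:
    """Bucket a place by its average rent relative to the reference level."""
    rent = AVG_RENT_EUR_PER_SQM.get(place)
    if rent is None:
        return "unknown"
    ratio = rent / REFERENCE_RENT
    if ratio < 1 - tolerance:
        return "below average"
    if ratio > 1 + tolerance:
        return "above average"
    return "average"


# The bucket depends only on the place the IP lookup returned.
print(rent_bucket("Teltow"), rent_bucket("Potsdam"))
```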
But wait, the HTTP request told us something else: I am using a Mac. Can we use this? Yes, we can.
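One way to fold the Mac into the estimate is a naive Bayes update. Convert the geo-based probability to odds, multiply by the likelihood ratio P(Mac | good credit) / P(Mac | bad credit), and convert back. The sketch below assumes the pieces of evidence are conditionally independent given the credit class. The prior and the two Mac-usage rates are invented for illustration.

```python
import math


def update_probability(prior: float, likelihood_ratios: list[float]) -> float:
    """Naive Bayes in odds form: posterior odds = prior odds * product of LRs.

    Assumes each piece of evidence is conditionally independent given the class.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    log_odds = math.log(prior / (1.0 - prior))
    for lr in likelihood_ratios:
        log_odds += math.log(lr)
    return 1.0 / (1.0 + math.exp(-log_odds))


# HYPOTHETICAL numbers, invented for illustration only.
prior_good_credit = 0.4   # e.g. from the rent-based geo score
p_mac_given_good = 0.30   # share of Mac users among good-credit customers
p_mac_given_bad = 0.20    # share of Mac users among bad-credit customers
mac_lr = p_mac_given_good / p_mac_given_bad

print(round(update_probability(prior_good_credit, [mac_lr]), 3))  # 0.5
```

With these made-up numbers, the Mac lifts the estimate from 0.4 to 0.5. Further evidence, such as age, would simply add another likelihood ratio to the list.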
OK, the next thing is marital status. Again, we can use age and location: the older I get, the more likely I am to be married, and if I live in a city, I am more likely to be single.
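A small logistic model can combine age and location. Because the age is itself only a guess, the prediction can be averaged over the age distribution. The coefficients and the age distribution below are invented placeholders, not fitted values. Only their signs follow the reasoning above: older means more likely married, and city means less likely.

```python
import math

# HYPOTHETICAL coefficients, invented for illustration (not fitted to data).
INTERCEPT = -4.0
AGE_COEF = 0.12    # positive: older people are more likely to be married
CITY_COEF = -0.5   # negative: city dwellers are more likely to be single


def p_married(age: float, lives_in_city: bool) -> float:
    """Logistic model: P(married | age, city)."""
    z = INTERCEPT + AGE_COEF * age + (CITY_COEF if lives_in_city else 0.0)
    return 1.0 / (1.0 + math.exp(-z))


def p_married_marginal(age_dist: dict[int, float], lives_in_city: bool) -> float:
    """Average over an uncertain age: sum over a of P(age=a) * P(married | a, city)."""
    return sum(p * p_married(age, lives_in_city) for age, p in age_dist.items())


# HYPOTHETICAL age distribution, e.g. from a name-based age estimate.
age_dist = {24: 0.3, 29: 0.4, 34: 0.3}
print(round(p_married_marginal(age_dist, lives_in_city=True), 3))
print(round(p_married_marginal(age_dist, lives_in_city=False), 3))
```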
Currently I have no idea how to guess how many children a person has based on the given data. Any ideas are appreciated.
Data from other networks
Ad networks like DoubleClick or iAd already offer to pass along user-specific data, which can later be used when bidding on advertisements. This data could be used here as well.
Context of the website
Every website has its target group. This can also be taken into account for guessing information about you.
If you are eager to learn more about how to use all this information about internet users to make them happier and increase your revenue, subscribe to our newsletter in the upper right corner or contact us.