Strange Activity on my Arduino Website

Hi

I would appreciate any thoughts as regards this matter ...

For some days I have been receiving a series of one-off, identical HTTP requests at random times during the day to my Arduino website at http://www.2wg.co.nz

What is identical about these requests is that they feature the same GET URL (access/download the file at /public/overview.pdf) and involve no other web page browsing. Some system (or systems) is invoking a single URL multiple times a day.

These HTTP requests contain identical User-Agent, Accept, Accept-Language, Accept-Encoding, Host, Referer and Connection header fields. When I say identical, note that there are sometimes very minor differences in the User-Agent field, possibly indicating multiple very similarly configured machines making the requests.

What is not identical about these HTTP requests is that they seem to be coming from random IP addresses all around the world.

What I am observing seems to be symptomatic of a distributed denial of service attack (but the volumes of the requests are insignificant) and/or IP Address spoofing.

It happens that until this afternoon the /public/overview.pdf file was the largest publicly accessible file/document on my website.

Here is an example of one of the HTTP requests as written to my system's logs:

6th Jun 22:35:41 ** HTML REQUEST **
- Browser IP: 183.249.42.227
- Socket #: 1
- Dest Port: 36459
- GET /PUBLIC/OVERVIEW.PDF/ HTTP/1.1
- Host: www.2wg.co.nz
- USER-AGENT: MOZILLA/5.0 (MACINTOSH; INTEL MAC OS X 10.7; RV:34.0) GECKO/20100101 FIREFOX/34.0
- Accept: text/html, application/xml;q=0.9, application/xhtml+xml, image/png, image/webp, image/jpeg, image/gif, image/x-xbitmap,
- */*;q=0.1
- Accept-Language: en
- Accept-Encoding: gzip, deflate
- Referer: http://www.2wg.co.nz/
- Connection: keep-alive

For yesterday and part of today, here is the list of IP addresses for a series of these (typically) identical HTTP requests. You will see that I have looked up the location of the various IP addresses.

6th Jun 05:28:11 IP: 23.94.63.56     New York  , United States , Static
6th Jun 07:01:58 IP: 83.143.242.28               United Kingdom, Static
6th Jun 06:37:56 IP: 198.12.91.229   New York  , United States , Static
6th Jun 15:21:51 IP: 31.220.44.161   Bayern    , Germany       , Static
6th Jun 19:15:00 IP: 80.91.175.86    Kyyiv     , Ukraine       , Static
6th Jun 22:07:31 IP: 180.180.119.88  Bangkok   , Thailand      , Static
6th Jun 22:35:41 IP: 183.249.42.227              China         , Static   
7th Jun 01:25:12 IP: 78.129.131.98   London    , United Kingdom, Static
7th Jun 02:17:32 IP: 213.107.68.149              United Kingdom, Dynamic
7th Jun 02:18:42 IP: 213.107.68.149
7th Jun 02:19:45 IP: 213.107.68.149
7th Jun 02:25:16 IP: 172.245.125.161 New York  , United States , Dynamic
7th Jun 05:52:47 IP: 45.61.34.145    New Mexico, United States , Static
7th Jun 07:29:02 IP: 104.143.16.53   Colorado  , United States , Static 
7th Jun 07:30:22 IP: 94.126.144.73               Portugal      , Static
7th Jun 07:31:00 IP: 104.140.83.57   Nevada    , United States , Static
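
The list above can be tallied mechanically to see how many distinct addresses are involved and which ones come back. The following is a minimal sketch in Python, run off the Arduino on a copy of the log; the function name `hits_per_ip` and the simplified line format (location columns dropped) are assumptions made for illustration, not part of the logging system described here.

```python
import re
from collections import Counter

# Timestamps and addresses copied from the list above (location columns dropped).
LOG = """\
6th Jun 05:28:11 IP: 23.94.63.56
6th Jun 07:01:58 IP: 83.143.242.28
6th Jun 06:37:56 IP: 198.12.91.229
6th Jun 15:21:51 IP: 31.220.44.161
6th Jun 19:15:00 IP: 80.91.175.86
6th Jun 22:07:31 IP: 180.180.119.88
6th Jun 22:35:41 IP: 183.249.42.227
7th Jun 01:25:12 IP: 78.129.131.98
7th Jun 02:17:32 IP: 213.107.68.149
7th Jun 02:18:42 IP: 213.107.68.149
7th Jun 02:19:45 IP: 213.107.68.149
7th Jun 02:25:16 IP: 172.245.125.161
7th Jun 05:52:47 IP: 45.61.34.145
7th Jun 07:29:02 IP: 104.143.16.53
7th Jun 07:30:22 IP: 94.126.144.73
7th Jun 07:31:00 IP: 104.140.83.57
"""

# Matches a dotted-quad IPv4 address following the "IP:" label.
IP_PATTERN = re.compile(r"IP:\s*(\d{1,3}(?:\.\d{1,3}){3})")


def hits_per_ip(log_text):
    """Count how many log lines mention each IP address."""
    return Counter(match.group(1) for match in IP_PATTERN.finditer(log_text))


counts = hits_per_ip(LOG)
repeat_visitors = {ip: n for ip, n in counts.items() if n > 1}

print(len(counts), "distinct addresses in", sum(counts.values()), "requests")
print("Addresses seen more than once:", repeat_visitors)
```

Almost every address appears exactly once, which fits many independent hosts each fetching the file a single time rather than one client retrying.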

Observing these incoming HTTP requests prompted me to research IP address spoofing and to change four local application URLs (commands) that are only supposed to work on my local area network. The system now uses continuously changing random, time-restricted command strings that are only published on my system's local web page. Hopefully this covers external spoofing of my local LAN IP addresses to invoke these critical URL commands.
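
One way such random, time-restricted command strings could work is sketched below in Python rather than Arduino C++. This is an illustration under stated assumptions, not the scheme actually running on the website: the class name `CommandTokens`, the 60-second lifetime, the single-use rule and the `"reboot"` command name are all hypothetical.

```python
import secrets
import time

TOKEN_LIFETIME_S = 60  # assumed lifetime; the post does not state one


class CommandTokens:
    """Issue random, short-lived, single-use tokens for privileged URL commands."""

    def __init__(self, lifetime_s=TOKEN_LIFETIME_S):
        self.lifetime_s = lifetime_s
        self._issued = {}  # token -> (command, expiry timestamp)

    def issue(self, command, now=None):
        """Create a token to embed in the local page's link for `command`."""
        now = time.time() if now is None else now
        token = secrets.token_urlsafe(16)  # unpredictable random string
        self._issued[token] = (command, now + self.lifetime_s)
        return token

    def redeem(self, token, command, now=None):
        """Return True only for a known, unexpired token issued for `command`."""
        now = time.time() if now is None else now
        entry = self._issued.pop(token, None)  # removed on first use
        if entry is None:
            return False
        issued_for, expires_at = entry
        return issued_for == command and now <= expires_at


tokens = CommandTokens()
link_token = tokens.issue("reboot")          # published only on the local page
print(tokens.redeem(link_token, "reboot"))   # first use within the lifetime
print(tokens.redeem(link_token, "reboot"))   # replay of the same token
```

The point of the design is that a remote sender who cannot read the local page has no way to learn a valid token, so forging a LAN source address alone is not enough to trigger a command.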

Anyway, any thoughts as to what is behind this activity, and any precautions that I should consider, would be appreciated.

Cheers

Catweazle NZ

Peter_n:
When I google the IP addresses one by one, most of them look suspicious. They are reported in forum-spam databases (one is listed at projecthoneypot.org, which was set up to catch forum spammers) or they appear on some kind of blacklist.
When I search for ordinary IP addresses, the results from Google are totally different.
I think it is a botnet of automated forum spammers.

That file (overview.pdf) could be in a database. In such cases I rename the file, so it can't be found anymore.

Hi

Thanks for the information. I have swapped the 440k file for a 10k file that can be downloaded much more quickly with minimal impact on my website. I will observe this activity for a few days and may yet delete the file altogether. I agree the filename has likely got into a database somewhere.

I have seen other instances of IP addresses accessing odd files on my system. I may need to add another level of control functionality to my system to identify and ignore this stuff.

Catweazle NZ

Peter_n:
Web crawlers and search engines are searching the internet all the time. Some try to find hidden files. That probably explains the odd file requests you see. One of my websites has 200 visitors a day, and that is only half the traffic; the rest is those crawlers and search engines.

I am not worried about web crawlers. My system identifies most of them; Google, for example, has visited my website 79 times today.

My system also does a pretty good job of weeding out invalid URL hacks - mostly php attacks.

What I am interested in is why I am getting these several near-identical, valid-URL HTTP requests from all around the world. You have given me a helpful suggestion, but I am interested in further comments or confirmation from others.

Catweazle NZ

After many months of putting up with this activity I worked out that the files being accessed (which today are more recent files, not the same as before) have been indexed by Bing, despite instructions in my robots.txt file for all web crawlers not to access/index the files in question.

If I type the file names into Bing they come up as the first search result - but I do not know what search term people all around the world are using to find these files in Bing. It is also the case that I never get a reference to Bing when people access these files.

I have changed my robots.txt file to include specific Bing instructions not to index the files - time will tell if it works.
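
A set of robots.txt rules can be checked offline before waiting to see how a crawler behaves. The sketch below uses Python's standard `urllib.robotparser`; the rules in `ROBOTS_TXT` are an assumed example of Bing-specific instructions, not the actual file on the website, and `is_allowed` is a hypothetical helper. The `bingbot` token is the name Bing's crawler uses to identify itself.

```python
from urllib.robotparser import RobotFileParser

# Assumed example rules: block /public/ for Bing's crawler by name,
# and for every other crawler via the wildcard entry.
ROBOTS_TXT = """\
User-agent: bingbot
Disallow: /public/

User-agent: *
Disallow: /public/
"""


def is_allowed(robots_txt, user_agent, path):
    """Return True if the rules permit `user_agent` to fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


for agent in ("bingbot", "Googlebot"):
    for path in ("/public/overview.pdf", "/"):
        print(agent, path, is_allowed(ROBOTS_TXT, agent, path))
```

This only confirms that the rules say what was intended. Following robots.txt is voluntary on the crawler's side, so it does nothing against clients that ignore it.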

Catweazle NZ

Paul_KD7HB:
Is it possible that someone using the Tor browser is accessing your site? I just accessed your site using Tor and it worked fine. Each access would appear to come from a different location around the world!

Paul

Hi

It is possible - my application does not do anything special for individual browsers and I do not attempt to monitor and record statistics about individual browser access.

The HTTP requests that I am receiving relate to files that should not be indexed by any web crawler and should only be reached by an inquisitive end-user browsing down through several levels of web pages on my website. But I receive random direct HTTP requests for these files because, apparently, Bing is indexing them and users are finding them in Bing searches. I can understand how Bing might be doing the indexing (a certain sequence of browsing steps could lead it to ignore the settings in my website's robots.txt file).

In general, the accesses to these files from random IP addresses around the world (say 10 to 20 per day) are single hits to my website. They are not followed by any other browsing activity from the original IP address or, as far as I can tell, from any substitute IP address within a short period of time.

Overall this is not a problem because the traffic volumes are low. But I am keen to understand the cause and stop it if possible, to maximise my system's capacity to respond to other HTTP requests. Initially I suspected this was a low-volume denial of service attack - but the volumes are so low, and the activity has been going on for so long, that I doubt it.

Anyway, Bing seems not to have indexed recent files - so I am hopeful that it is now correctly following the instructions within the updated robots.txt file.

I am just waiting to see how this develops.

Cheers

Catweazle NZ