Before web-based APIs became the dominant way of sharing data between services, we had web scraping. Web scraping is a data extraction technique in which you pull information from websites.
The technologies we will be using to accomplish this are:
- Node
- ExpressJS: The Node framework that everyone uses and loves.
- Request: Helps us make HTTP calls
- Cheerio: Implementation of core jQuery specifically for the server (helps us traverse the DOM and extract data)
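To make the idea concrete before bringing in those libraries, here is a minimal sketch of the extraction step using only plain JavaScript. It assumes the HTML has already been downloaded into a string, and `extractLinks` is a hypothetical helper name, not part of Request or Cheerio. A regular expression like this only copes with simple, well-formed markup, which is exactly why a real parser such as Cheerio is the better choice in practice.

```javascript
// Illustration only: pull the href of every <a> tag out of an HTML string.
// extractLinks is a hypothetical helper; real pages need a real parser.
function extractLinks(html) {
  var links = [];
  // Match href="..." or href='...' inside an opening <a> tag.
  var pattern = /<a\b[^>]*\bhref\s*=\s*(["'])(.*?)\1/gi;
  var match;
  while ((match = pattern.exec(html)) !== null) {
    links.push(match[2]); // group 2 is the URL between the quotes
  }
  return links;
}

// Sample markup standing in for a downloaded page (assumed, not from a real site).
var sampleHtml =
  '<ul>' +
  '<li><a href="/posts/1">First post</a></li>' +
  "<li><a class='ext' href='https://example.com/'>Example</a></li>" +
  '</ul>';

console.log(extractLinks(sampleHtml));
```

With Cheerio, the same step would typically be a selector query over the parsed document rather than a hand-written pattern.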
Setup
Our setup will be pretty simple. If you’re already familiar with NodeJS, go ahead and set up your project and include Express, Request and Cheerio as your dependencies.
Install request and cheerio
    $ npm install request cheerio --save
Or declare them as dependencies in package.json and run npm install:
    {
Usage
Example 1
    var https = require('https');
Example 2
    var express = require('express');
Example 3
    var request = require('request');
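Packet capture tool
We run Linux servers, and sometimes we need to capture their network packets to analyze the type and characteristics of an attack, so that we can block it at the firewall based on those characteristics. On the Linux command line, we use tcpdump to capture packets.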
Install tcpdump
    yum install -y tcpdump
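Once tcpdump has produced some output, the analysis step can be sketched in Node as well. The example below counts packets per source address, which is one simple way to spot a noisy sender worth blocking. It assumes the lines follow tcpdump's usual IPv4 text format with numeric addresses (for instance when captured with `tcpdump -n`). The helper name `countBySource` and the sample lines are made up for illustration.

```javascript
// Count packets per source IPv4 address from tcpdump text output.
// ASSUMPTION: lines look like tcpdump's IPv4 output with numeric addresses, e.g.
//   12:00:01.000001 IP 203.0.113.5.51000 > 192.0.2.10.80: Flags [S], ...
// countBySource is a hypothetical helper, not part of tcpdump.
function countBySource(lines) {
  var counts = {};
  // Capture the four-octet source address; the trailing ".port" is optional
  // because some protocols (such as ICMP) print no port.
  var pattern = /\bIP (\d{1,3}(?:\.\d{1,3}){3})(?:\.\d+)? > /;
  lines.forEach(function (line) {
    var match = pattern.exec(line);
    if (match) {
      counts[match[1]] = (counts[match[1]] || 0) + 1;
    }
  });
  return counts;
}

// Made-up sample lines using documentation-reserved addresses.
var sample = [
  '12:00:01.000001 IP 203.0.113.5.51000 > 192.0.2.10.80: Flags [S], seq 1, win 64240, length 0',
  '12:00:01.000002 IP 203.0.113.5.51001 > 192.0.2.10.80: Flags [S], seq 2, win 64240, length 0',
  '12:00:01.000003 IP 198.51.100.7.40000 > 192.0.2.10.22: Flags [S], seq 3, win 64240, length 0'
];

console.log(countBySource(sample));
```

Lines that do not match the pattern are skipped, so headers or non-IPv4 traffic in the capture do not affect the counts.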
More info: