# Algerian News Webscraper Command-Line Tool

DZNews-CLI is a command-line tool for scraping Algerian news articles from multiple sources and categories. Fetch, save, and organize news data for research, monitoring, or personal use, all from your terminal.
## Table of Contents

- [Features](#features)
- [Installation](#installation)
- [Usage Guide](#usage-guide)
- [Supported Sources and Categories](#supported-sources-and-categories)
- [Network Resilience, Anti-Block & Request Timing](#network-resilience-anti-block--request-timing)
- [Troubleshooting & Error Handling](#troubleshooting--error-handling)
- [Screenshots and Demos](#screenshots-and-demos)
- [Testing](#testing)
- [Changelog](#changelog)
- [Contributing](#contributing)
- [License](#license)
## Features

- Scrape articles from several Algerian news websites
- Filter by date and category
- Export data as JSON or CSV
- Command-line interface with options for output directory and format
- Choose custom output directory in the standard CLI mode
- Interactive mode for guided scraping
- Extensible for additional sources and categories
- Network-resilient scraping with retry logic and anti-block techniques (randomized headers, user-agents, and delays)
## Installation

### Global install (recommended)

Recommended for most users. Requires Node.js v16+.

```bash
npm install -g dznews
```

Now you can run `dznews` as a global command from any directory.

### Run with npx

If you prefer not to install globally, you can use:

```bash
npx dznews <command>
```

Example:

```bash
npx dznews scrape ennahar 2025-10-09 sports --export csv
```

### Install from source

If you want the latest source or to contribute:

```bash
git clone https://github.com/HarrouAnderrahman/DZNews.git
cd DZNews
npm install
npm link
```

Now you can use `dznews` globally.
## Usage Guide

### List sources and categories

Show all available news sources:

```bash
dznews sources
```

Show all categories for a given source:

```bash
dznews cat <source>
dznews cat ennahar
```

### Scrape articles

Scrape articles by source, date, and category:

```bash
dznews scrape <source> <date> <category> [options]
```

Examples:

```bash
dznews scrape ennahar 2025-10-09 sports --export json
dznews scrape elbilad 2025-10-09 politics --export csv
```

- `<source>`: News site to scrape (see Supported Sources and Categories)
- `<date>`: Date to collect articles until (format: `YYYY-MM-DD`)
- `<category>`: News category (see the categories for your source)
- `--export` or `-e`: Output format (`json` or `csv`)

### Output format

- Output is saved in the `scrapedData/` directory by default.
- Default is JSON:
  Output file: `scrapedData/EnnaharData_2025-10-09_sports.json`
- Use `--export csv` or `-e` for CSV:
  Output file: `scrapedData/EnnaharData_2025-10-09_sports.csv`
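Since the JSON export is a plain file on disk, it can be post-processed with a few lines of Node.js. A minimal sketch, assuming the export is a JSON array of article objects (the exact fields depend on the source):

```js
const fs = require("fs");

// Path matches the default output location shown above.
const raw = fs.readFileSync("scrapedData/EnnaharData_2025-10-09_sports.json", "utf8");

// Assumption: the export is an array of article objects.
const articles = JSON.parse(raw);
console.log(`Loaded ${articles.length} articles`);
```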
### Custom output directory

You can choose the directory where the scraped data will be saved, but only in the normal CLI mode (not in interactive mode).

Add the option `--directory <path>` or `-d <path>` to the scrape command:

```bash
dznews scrape <source> <date> <category> --export json --directory /path/to/save/
dznews scrape elbilad 2025-10-09 politics --export csv -d /home/user/Documents/
```

- The specified path must exist and be a directory.
- If the directory exists, your data will be saved inside it in a `scrapedData/` subfolder.
- If the path points to something that is not a directory, DZNews will throw an error and not save the data (see the sketch below).
- Note: The `--directory`/`-d` option is not available in interactive mode.
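The validation described above amounts to a few lines of Node.js. This is a minimal sketch of the documented behavior, not the project's actual code; the function name `resolveOutputDir` is illustrative:

```js
const fs = require("fs");
const path = require("path");

// Illustrative sketch of the directory check described above (name is hypothetical).
function resolveOutputDir(userDir) {
  if (!fs.existsSync(userDir) || !fs.statSync(userDir).isDirectory()) {
    // Documented behavior: throw instead of saving anywhere else.
    throw new Error(`Invalid output directory: ${userDir}`);
  }
  // Scraped data goes into a scrapedData/ subfolder of the chosen directory.
  const outDir = path.join(userDir, "scrapedData");
  if (!fs.existsSync(outDir)) fs.mkdirSync(outDir);
  return outDir;
}
```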
### Interactive mode

For guided, beginner-friendly scraping:

```bash
dznews interactive
```

You'll be prompted for source, category, date, and format.
### Output files

All files are saved in the `scrapedData/` directory inside the chosen output directory (or inside the package folder by default). Check the output for filenames like:

```
EnnaharData_2025-10-09_sports.json
ElbiladData_2025-10-09_politics.csv
```
## Supported Sources and Categories

To list all supported sources:

```bash
dznews sources
```

DZNews currently supports several Algerian news sites, including `ennahar`, `elbilad`, `elkhabar`, and Dzair-Tube; run `dznews sources` for the full, up-to-date list.

All sites use the same keyword mapping for categories in DZNews. Note: Not every site has every category; some categories are only available on certain sites due to differences in their structure and coverage.
| Category Keyword | Description |
|---|---|
| national, algeria | National news |
| sports | Sports |
| international | International news |
| politics | Politics |
| security | Security |
| economy | Economy/Business |
| islam | Islamic/Religion |
| society | Society |
| hightech | Technology/Science |
| culture | Culture/Arts |
| ... | More categories may be supported in future releases |
To see which categories are available for a specific source, run:

```bash
dznews cat <source>
```

For example:

```bash
dznews cat elkhabar
```

## Network Resilience, Anti-Block & Request Timing

DZNews includes a dedicated Axios configuration in `scrapers/utils/axiosConfig.js`. This module is responsible for:
- Retrying failed network requests a configurable number of times.
- Randomizing browser-like headers (user-agent, platform, etc.) on each attempt.
- Controlling timing between requests using a `sleep(min, max)` helper so that scraping does not look like instant, machine-speed traffic.
These techniques were added because some news sites can flag or temporarily block traffic that:
- Uses default HTTP client headers instead of real browser headers.
- Sends many requests in a very short time window.
- Keeps exactly the same headers and timing pattern for every request.
By rotating user-agents and adjusting the `sleep(min, max)` values, DZNews behaves more like a real user browsing the site, while still being scriptable.
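Header rotation of this kind usually boils down to picking a random entry from a pool on every attempt. A minimal sketch of the idea; the `USER_AGENTS` pool and `randomHeaders()` name here are illustrative, not the module's actual internals:

```js
// Illustrative pool of browser-like user-agent/platform pairs (abbreviated).
const USER_AGENTS = [
  { ua: "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36", platform: '"Windows"' },
  { ua: "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36", platform: '"macOS"' },
  { ua: "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36", platform: '"Linux"' },
];

// Build a fresh set of browser-like headers for each request attempt.
function randomHeaders() {
  const pick = USER_AGENTS[Math.floor(Math.random() * USER_AGENTS.length)];
  return {
    "User-Agent": pick.ua,
    Accept: "text/html,application/xhtml+xml",
    "Accept-Language": "fr-FR,fr;q=0.9,ar;q=0.8,en;q=0.7",
    "Sec-CH-UA-Platform": pick.platform,
  };
}
```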
Inside `axiosConfig.js` you'll find:

```js
function sleep(min = 150, max = 300) {
  const time = Math.floor(Math.random() * (max - min + 1) + min);
  return new Promise((res) => {
    setTimeout(res, time);
  });
}
```

This function is used in two situations:
1. Before every request (short delay):
   - Called as `await sleep()`, which uses the defaults `min = 150`, `max = 300` ms.
   - This is the "per-request" human-like delay.

2. Between retries (longer delay on network errors):
   - Called as `await sleep(delayBetweenRetries, delayBetweenRetries + 2000)`.
   - This uses `delayBetweenRetries` as the base and adds a bit of jitter.
   - Still, the shape of your traffic is mostly defined by the short min/max you set on the default `sleep()`.

A sketch of how the two delays fit together follows.
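Conceptually, the request loop looks something like this: a simplified sketch, not the module's exact code, reusing the `sleep` helper shown above and the illustrative `randomHeaders()` from the earlier sketch:

```js
const axios = require("axios");

// Simplified sketch of the retry loop described above (not the module's exact code).
async function axiosGetRetry(url, maxAttempts = 20, delayBetweenRetries = 5000) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    await sleep(); // short per-request delay (defaults: 150-300 ms)
    try {
      return await axios.get(url, { headers: randomHeaders() });
    } catch (err) {
      if (attempt === maxAttempts) throw err; // give up after the last attempt
      await sleep(delayBetweenRetries, delayBetweenRetries + 2000); // longer delay with jitter
    }
  }
}
```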
In practice, the most important knobs for how "fast" and "bot-like" DZNews feels are the `min` and `max` arguments of `sleep`. You can:

- Change the defaults in the function signature:

  ```js
  function sleep(min = 500, max = 1500) { ... }
  ```

- Or call it with explicit values wherever it's used:

  ```js
  await sleep(800, 2000); // instead of await sleep();
  ```
Below are practical ranges for the short pre-request delay (`sleep(min, max)` before each axios call). All values are in milliseconds:

- **Balanced / Recommended (default-ish behavior)**
  - `min`: 300–800, `max`: 1200–2500
  - Feels like a reasonably fast human clicking around.
  - Good balance between speed and not looking too robotic.

- **Safer / More Polite (less likely to trigger blocks)**
  - `min`: 1000–2000 (1–2 seconds), `max`: 3000–6000 (3–6 seconds)
  - Slower crawl, but much gentler on the target sites.
  - Recommended if you plan to scrape a lot of pages or run DZNews frequently.

- **More Aggressive (faster, higher risk)**
  - `min`: 100–200, `max`: 300–800
  - Close to the current default (`150–300`).
  - Pages are fetched quickly; the risk of being flagged is higher, especially from VPN/RDP or shared IPs.

- **Worst / Strongly Not Recommended**
  - `min`: 0–50, `max`: 100–200
  - Requests look almost instant and very uniform.
  - Much more likely to be detected as a scraper or rate-limited.
  - Avoid this unless you know exactly what you are doing and accept the risk.

Rule of thumb:

- If you start getting blocked or throttled → increase both `min` and `max`.
- If everything is fine and you want it a bit faster → lower them slightly, but keep at least a few hundred milliseconds of delay.
The exported function in `axiosConfig.js` looks like this:

```js
module.exports = async function axiosGetRetry(
  url,
  maxAttempts = 20,
  delayBetweenRetries = 5000
) {
  // ...
};
```

- `maxAttempts` (default: `20`): Maximum number of retry attempts on network errors for a single URL.
- `delayBetweenRetries` (default: `5000` ms = 5 seconds): Base delay between retry attempts, in milliseconds. Combined with `sleep(delayBetweenRetries, delayBetweenRetries + 2000)` to add jitter.

These values matter when there are network problems or temporary blocks, but for the overall scraping "speed" and "stealthiness", the short `sleep(min, max)` before each request is usually more important.
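For reference, calling it from a scraper might look like this (a minimal usage sketch; the URL, argument values, and require path are illustrative):

```js
// Hypothetical caller; the require path assumes the file sits in scrapers/utils/.
const axiosGetRetry = require("./utils/axiosConfig");

(async () => {
  // Up to 10 attempts, with ~6 s (plus jitter) between retries; URL is illustrative.
  const response = await axiosGetRetry("https://example.com/category/sports", 10, 6000);
  console.log(response.status);
})();
```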
Suggested ranges:

- **Balanced / Recommended**
  - `maxAttempts`: 10–20
  - `delayBetweenRetries`: 4000–7000 ms

- **Safer / More Polite**
  - `maxAttempts`: 5–10
  - `delayBetweenRetries`: 8000–15000 ms

- **Aggressive (use with caution)**
  - `maxAttempts`: 20–30
  - `delayBetweenRetries`: 1000–3000 ms

Avoid combining very small `sleep(min, max)` values with a huge `maxAttempts` and a tiny `delayBetweenRetries`; that pattern is the most likely to cause blocks.
## Troubleshooting & Error Handling

- **Unknown source or category:** Run `dznews sources` or `dznews cat <source>` to see valid inputs.
- **Date format errors:** Dates must be `YYYY-MM-DD` and before today. Example: `2025-10-09`
- **No output file created:** Check for errors in the console. Make sure `scrapedData/` exists and you have write permissions.
- **Directory errors:** If using `--directory`/`-d`, make sure the specified path exists and is a directory. Otherwise, DZNews will throw an error.
- **Network errors:** Ensure you have internet access. Some sites may block scraping; try again later.
- **Permission errors (Linux/Mac):** Use `sudo npm install -g dznews` if you see "permission denied".
- **Network errors & retries:** DZNews automatically retries failed network requests (e.g., temporary network outages) up to a configurable number of times, with delays between attempts. You can tune `sleep(min, max)`, `maxAttempts`, and `delayBetweenRetries` in `scrapers/utils/axiosConfig.js` as described in Network Resilience, Anti-Block & Request Timing.
- Run with `--help` for usage info: `dznews --help`
- Open an issue in the GitHub repo if you encounter problems.
## Screenshots and Demos

- Command demo
- Interactive CLI
- CSV output example
## Testing

To run the included smoke test (it scrapes a random category from each source):

```bash
npm test
```

or

```bash
node tests/smokeTest.js
```

## Changelog

- Improved HTTP request realism and anti-block behavior in `scrapers/utils/axiosConfig.js`:
  - Added multiple realistic browser user-agent + platform combinations (Windows, macOS, Linux), randomized on each request attempt.
  - Set consistent browser-like headers (`User-Agent`, `Accept`, `Accept-Language`, `Referer`, etc.) to avoid obvious bot signatures.
  - Introduced a `sleep(min, max)` helper used before each request and between retries to space out calls and reduce the chance of being flagged for very fast, machine-like traffic.
- Documented how to tune scraping speed and safety:
  - Explained how `sleep(min, max)` controls per-request delays and why these values are the most important for "human-like" timing.
  - Added recommended ranges for `min` and `max` (safer, balanced, and aggressive settings), and noted combinations that are risky and not recommended.
  - Clarified how to customize `maxAttempts` and `delayBetweenRetries` for retry behavior on network errors, and how these interact with the `sleep` function.
- Fixed these bugs:
  - Dzair-Tube wasn't scraping anymore because the site changed its date element.
  - A missing existence check for a variable in the scraper manager made it unusable.
  - The directory-choosing option was missing from all scrapers except the Ennahar scraper.
- Added support for a custom output directory in normal CLI mode using `--directory <path>` or `-d <path>`.
- Minimal validation: only saves data if the given directory exists and is a directory; throws an error otherwise.
- The output directory option is not available in interactive mode.
- Documentation updated to reflect the new output directory option.
- Initial stable release
- Added a simple smoke test
- Modular architecture for easy extension
- Improved error handling
- Added network error handling and retry logic in `axiosConfig.js`:
  - If a network error occurs during scraping, DZNews will automatically retry requests up to 5 times (default).
  - Maximum attempts and retry delay can be customized in code (`maxAttempts`, `delay`).
## Contributing

Contributions are welcome.
- Fork the repository
- Create a feature branch
- Commit your changes
- Open a pull request
Report bugs or request features via issues.
## License

This project is licensed under the MIT License. See LICENSE for details.
Developed by Harrou Abderrahman


