Real User Case: “I’m Scraping 300+ Product Prices With Selenium”
A Reddit user was scraping more than 300 product prices from the same website with Selenium. They had already added wait times between actions, but still wanted to know what else they could do to avoid getting caught.
That is the exact problem many scrapers face. Adding a delay helps, but it does not solve everything. A scraper can still get detected if:
The scraper retries too aggressively after errors.
The site sees the same pattern every day.
In practice, avoiding detection is not about a single trick. It’s about building a layered scraping strategy that makes the traffic look more natural and less predictable.
Part 1. How Websites Detect Web Scrapers
Before fixing detection, you need to understand what websites look at.
1. IP reputation: Websites check whether traffic comes from a clean residential IP, a datacenter, a VPN, a public proxy, or an overused address.
2. Request frequency: Too many requests from the same IP or session can trigger rate limits, CAPTCHAs, or temporary blocks.
3. HTTP headers: Missing, mismatched, or unnatural headers can make a request look non-human.
4. TLS fingerprinting: Even before page content loads, servers can inspect connection-level fingerprints that differ between normal browsers and automation tools.
5. Browser fingerprinting: Sites can evaluate screen size, fonts, plugins, canvas behavior, WebGL, timezone, language, and automation flags.
6. Cookie and session behavior: If cookies, IPs, user agents, and regions do not stay consistent, the session may look suspicious.
7. Behavioral analysis: Real users do not click, scroll, browse, and retry in perfectly timed loops. Repetitive behavior is easy to flag.
8. CAPTCHA triggers: CAPTCHAs often appear when several risk signals stack together: bad IP reputation, high request volume, automation fingerprints, or inconsistent sessions.
Part 2. 12 Ways to Avoid Detection While Scraping the Web
1. Respect robots.txt and crawl rules.
Start by checking whether the site provides crawl guidance. Some pages may be disallowed, some may have rate expectations, and some data may be available through APIs, feeds, or sitemaps.
This helps you avoid unnecessary friction and reduces the chance of hammering pages the site clearly does not want crawled.
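As a minimal sketch of that check, Python's standard `urllib.robotparser` can answer "may I fetch this URL?" before any request goes out. The robots.txt content, the `price-monitor-bot` user agent, and the example.com URLs below are all made up for illustration; a real scraper would load the file from the target site instead of a string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. In practice, set_url() followed by read()
# would download the real file from the target site.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /cart/
Disallow: /checkout/
Crawl-delay: 5
"""

USER_AGENT = "price-monitor-bot"  # hypothetical scraper name


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a reusable parser object."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)

# Product pages are not disallowed, cart pages are.
allowed_product = parser.can_fetch(USER_AGENT, "https://example.com/products/123")
allowed_cart = parser.can_fetch(USER_AGENT, "https://example.com/cart/")

# Crawl-delay is optional; crawl_delay() returns None when the site sets none.
crawl_delay = parser.crawl_delay(USER_AGENT)
```

If `crawl_delay` comes back as a number, it is a reasonable lower bound for the pause between requests to that site.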
2. Build a crawl budget before scraping.
Do not scrape everything just because you can.
A crawl budget defines:
Which pages matter most
How often the data changes
How many pages to scrape per hour
Which pages can be checked less often
When to pause or retry
For example, a product price scraper does not need to request every product page every minute if prices update once a day. A smarter schedule reduces detection and saves proxy spend.
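One way to turn a crawl budget into code is to record how often each page group is expected to change and only scrape the groups that are actually due. The sketch below is an illustration under assumed names (`PageGroup`, `due_groups`) and assumed intervals, not a prescribed design.

```python
from dataclasses import dataclass


@dataclass
class PageGroup:
    name: str
    check_interval_hours: float    # how often the data is expected to change
    last_checked_hours_ago: float  # time since the group was last scraped


def due_groups(groups, max_groups_per_run):
    """Return the names of groups that are due, most overdue first, capped by the budget."""
    overdue = [g for g in groups if g.last_checked_hours_ago >= g.check_interval_hours]
    overdue.sort(
        key=lambda g: g.last_checked_hours_ago / g.check_interval_hours,
        reverse=True,
    )
    return [g.name for g in overdue[:max_groups_per_run]]


# Hypothetical groups: prices change daily, stock every few hours,
# product descriptions roughly weekly.
groups = [
    PageGroup("daily_prices", check_interval_hours=24, last_checked_hours_ago=30),
    PageGroup("stock_levels", check_interval_hours=6, last_checked_hours_ago=7),
    PageGroup("descriptions", check_interval_hours=168, last_checked_hours_ago=48),
]

to_scrape = due_groups(groups, max_groups_per_run=2)
```

Here the description pages are skipped because they were checked two days ago and are only expected to change weekly.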
3. Reduce request frequency.
High request speed is one of the easiest patterns to detect.
Use:
Lower concurrency
Random delays
Longer pauses between page groups
Separate schedules by page type
Slower crawling during high-risk periods
If you see 429 Too Many Requests, do not retry faster. Slow down.
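Servers that return 429 often include a `Retry-After` header, which HTTP defines as either a number of seconds or an HTTP date. The helper below is a small sketch that converts that header into a wait time; the function name and the 60-second fallback are assumptions chosen for illustration.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(header_value, now=None, default=60.0):
    """Convert a Retry-After header value into seconds to wait.

    Falls back to `default` (an assumed value) when the header is
    missing or cannot be parsed.
    """
    if header_value is None:
        return default
    value = header_value.strip()
    if value.isdigit():
        # Form 1: delay in seconds, e.g. "120"
        return float(value)
    try:
        # Form 2: HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())
```

A scraper would sleep for at least this long before touching the same site again, and ideally lower its concurrency for the rest of the run as well.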
4. Randomize timing naturally.
A fixed delay like exactly 3 seconds between every request can look scripted. Real browsing has variation.
Better timing patterns include:
Randomized delay windows
Longer pauses after several pages
Slower speed on heavy pages
Backoff after errors
Different schedules for different categories
The goal is not to fake human behavior perfectly. The goal is to avoid robotic repetition.
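The timing patterns above can be sketched with a small delay function: a randomized window for normal requests and a longer pause after every few pages. The specific ranges (2 to 6 seconds, 20 to 45 seconds, every 10 pages) are assumptions for illustration and should be tuned per target.

```python
import random


def next_delay(pages_done, rng, base=(2.0, 6.0), long_pause_every=10, long_pause=(20.0, 45.0)):
    """Return how many seconds to wait before the next request.

    pages_done is the number of pages fetched so far in this run.
    """
    if pages_done > 0 and pages_done % long_pause_every == 0:
        # Longer break after each batch of pages
        return rng.uniform(*long_pause)
    # Normal case: a random delay inside the base window
    return rng.uniform(*base)


# A seeded generator makes the sketch reproducible; a real run could use random.Random().
rng = random.Random(42)
delays = [next_delay(n, rng) for n in range(1, 31)]
```

In a scraper loop, each value would be passed to `time.sleep()` between requests.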
5. Use the right proxy type.
Proxy choice should match the target.
Scraping Scenario | Best Proxy Type | Why
--- | --- | ---
Product price scraping | Residential proxies | Real-user-like IPs and location flexibility
SERP tracking | Residential proxies | Regional accuracy and cleaner trust signals
Long sessions | Static ISP proxies | Stable IP continuity
Low-risk static pages | Datacenter proxies | Fast and cost-effective
Mobile-first sites | Mobile proxies | Closer to real mobile traffic
Region-specific pages | Residential proxies | Country/city targeting
Account dashboards | ISP proxies | Stable sessions and fewer IP changes
For most users, residential proxies should be the default. ISP proxies are better when session stability matters.
6. Rotate IPs carefully.
IP rotation is useful, but bad rotation can create new problems.
✅ Good rotation:
Keeps the same IP during one session
Uses one region per workflow
Rotates between product groups or page batches
Lowers request volume per IP
Uses residential proxies for stricter targets
❌ Bad rotation:
Changes IP every request during a logged-in session
Switches countries randomly
Sends the same cookie from many IPs
Retries blocked requests instantly from a new IP
Rotation should make scraping look distributed, not chaotic.
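A simple way to get "same IP during one session, one region per workflow" is to derive the proxy from a stable session key instead of picking one at random per request. The sketch below assumes a hypothetical `PROXY_POOLS` mapping with placeholder proxy URLs; hashing is just one possible way to keep the assignment stable.

```python
import hashlib

# Hypothetical proxy pools, one per region. Real endpoints come from the proxy provider.
PROXY_POOLS = {
    "us": ["http://proxy-us-1.example:8000", "http://proxy-us-2.example:8000"],
    "de": ["http://proxy-de-1.example:8000", "http://proxy-de-2.example:8000"],
}


def sticky_proxy(session_key, region, pools):
    """Map a session key to one proxy in the region's pool, the same one every time."""
    pool = pools[region]
    digest = hashlib.sha256(session_key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(pool)
    return pool[index]


# The same product batch always goes through the same US proxy,
# while different batches spread across the pool.
proxy_a = sticky_proxy("batch-electronics", "us", PROXY_POOLS)
proxy_a_again = sticky_proxy("batch-electronics", "us", PROXY_POOLS)
```

Rotation then happens by changing the session key between product groups or page batches, not between individual requests.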
7. Keep headers realistic and consistent.
Headers help websites understand what kind of client is making the request.
Important headers include:
User-Agent
Accept
Accept-Language
Accept-Encoding
Referer
Connection
Sec-Fetch headers
The mistake is not only using “wrong” headers. It is using inconsistent headers. If your user agent says Chrome on Windows but your other browser signals look like something else, the request stands out.
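One lightweight safeguard is to define a single header profile per session and run a sanity check on it before sending anything. The sketch below is illustrative only: the profile values mimic a desktop Chrome on Windows, the `header_problems` helper is hypothetical, and the platform check covers just one kind of inconsistency (User-Agent versus the `Sec-CH-UA-Platform` client hint).

```python
# Illustrative header profile for a Chrome-on-Windows session.
CHROME_WINDOWS_PROFILE = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept-Encoding": "gzip, deflate, br",
    "Sec-CH-UA-Platform": '"Windows"',
}

REQUIRED_HEADERS = ("User-Agent", "Accept", "Accept-Language", "Accept-Encoding")


def ua_platform(user_agent):
    """Rough platform guess from a User-Agent string (simplified on purpose)."""
    if "Windows NT" in user_agent:
        return "Windows"
    if "Mac OS X" in user_agent:
        return "macOS"
    if "Android" in user_agent:
        return "Android"
    if "Linux" in user_agent:
        return "Linux"
    return ""


def header_problems(headers):
    """Return a list of human-readable problems found in a header profile."""
    problems = [f"missing {name}" for name in REQUIRED_HEADERS if name not in headers]
    hinted = headers.get("Sec-CH-UA-Platform", "").strip('"')
    guessed = ua_platform(headers.get("User-Agent", ""))
    if hinted and guessed and hinted != guessed:
        problems.append(f"User-Agent says {guessed} but platform hint says {hinted}")
    return problems


problems = header_problems(CHROME_WINDOWS_PROFILE)
```

A check like this only catches self-contradictions inside the headers. It says nothing about TLS or browser fingerprints, which have to match the claimed browser too.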
8. Manage cookies and sessions carefully.
Cookies are part of identity. Treat them with the same care as IPs.
Good session management:
Keep cookies tied to the same IP when possible
Avoid resetting cookies on every request
Do not reuse one cookie jar across unrelated regions
Keep user agent, timezone, language, and IP location aligned
Use sticky sessions for flows that require continuity
If a session starts with a US residential IP, do not suddenly continue it from a different country.
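The session rules above can be sketched as a small store that binds cookies, proxy, user agent, and language together, keyed by site and country. `SessionProfile` and `SessionStore` are hypothetical names, and the proxy URLs are placeholders; the point is only that one identity is created once and then reused as a unit.

```python
from dataclasses import dataclass, field
from http.cookiejar import CookieJar


@dataclass
class SessionProfile:
    """Everything that should stay aligned for one scraping identity."""
    proxy_url: str
    country: str
    user_agent: str
    accept_language: str
    cookies: CookieJar = field(default_factory=CookieJar)


class SessionStore:
    """Hands out one profile per (site, country) so cookie jars never cross regions."""

    def __init__(self):
        self._profiles = {}

    def get_or_create(self, site, country, proxy_url, user_agent, accept_language):
        key = (site, country)
        if key not in self._profiles:
            self._profiles[key] = SessionProfile(
                proxy_url, country, user_agent, accept_language
            )
        # An existing profile is returned unchanged, so the original proxy,
        # headers, and cookies stay together for the life of the session.
        return self._profiles[key]


store = SessionStore()
us = store.get_or_create("shop.example", "US", "http://proxy-us-1.example:8000", "UA-1", "en-US")
us_again = store.get_or_create("shop.example", "US", "http://proxy-us-2.example:8000", "UA-1", "en-US")
de = store.get_or_create("shop.example", "DE", "http://proxy-de-1.example:8000", "UA-1", "de-DE")
```

Note that the second US call asks for a different proxy but still gets the original profile back, which is the sticky behavior the list describes.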
9. Avoid obvious browser automation fingerprints.
Selenium and Playwright are useful, but default automation setups can be detectable.
Use browser automation only when you need it:
JavaScript-rendered pages
Infinite scroll
Screenshots
Dynamic product data
Login-like flows
UI interaction testing
If the data is available in static HTML or a public endpoint, browser automation may be unnecessary and slower. The less browser automation you need, the fewer browser-level signals you expose.
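Before reaching for Selenium, it is worth checking whether the price is already present in the raw HTML. The sketch below uses Python's built-in `html.parser` on two made-up HTML snippets; the `price` class name is an assumption about the target's markup, not something every site uses.

```python
from html.parser import HTMLParser


class PriceFinder(HTMLParser):
    """Collects the text of elements that carry a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self._capture = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.css_class in classes:
            self._capture = True

    def handle_data(self, data):
        # Take the first non-empty text after a matching start tag
        if self._capture and data.strip():
            self.prices.append(data.strip())
            self._capture = False


def static_prices(html, css_class="price"):
    """Return price strings found directly in the HTML, without running JavaScript."""
    finder = PriceFinder(css_class)
    finder.feed(html)
    return finder.prices


# Hypothetical responses: one server-rendered page, one JavaScript-only shell.
static_page = '<div class="product"><span class="price">$19.99</span></div>'
js_shell = '<div id="app"></div><script src="bundle.js"></script>'

found_static = static_prices(static_page)
found_shell = static_prices(js_shell)
```

If the static check returns prices, a plain HTTP client is enough for that page type. An empty result suggests the data is rendered client-side and a browser (or the underlying data endpoint) is needed.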
10. Handle CAPTCHA, 403, and 429 responses correctly.
Blocks become worse when scrapers respond badly.
A good scraper should:
Pause after repeated 403 errors
Slow down after 429 errors
Stop retry loops after CAPTCHA
Log which proxy triggered the failure
Separate temporary errors from hard blocks
Avoid immediate retries on the same page
A CAPTCHA is not just an obstacle. It is a signal that your current setup is too noisy.
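The rules above can be expressed as a small decision function that maps each response to an action instead of retrying blindly. This is a sketch with assumed thresholds (three consecutive 403s, a 5-second base backoff capped at 5 minutes) and a hypothetical `Action` enum; CAPTCHA detection itself is passed in as a flag because it depends on the target.

```python
from enum import Enum


class Action(Enum):
    PROCEED = "proceed"          # response is usable
    BACKOFF = "backoff"          # slow down, then retry later
    PAUSE_PROXY = "pause_proxy"  # rest this proxy and log it
    STOP = "stop"                # stop retrying this page for now


def decide(status, saw_captcha, consecutive_403=0, max_403=3):
    """Choose what to do after a response (thresholds are illustrative)."""
    if saw_captcha:
        return Action.STOP
    if status == 429:
        return Action.BACKOFF
    if status == 403:
        return Action.PAUSE_PROXY if consecutive_403 >= max_403 else Action.BACKOFF
    if status in (500, 502, 503, 504):
        # Temporary server-side errors are not hard blocks
        return Action.BACKOFF
    return Action.PROCEED


def backoff_delay(attempt, base=5.0, cap=300.0):
    """Exponential backoff in seconds: base, 2x base, 4x base, ... up to cap."""
    return min(cap, base * (2 ** attempt))
```

For example, a third 403 in a row from the same proxy returns `Action.PAUSE_PROXY`, while a single 429 returns `Action.BACKOFF` with a delay that doubles on each further attempt.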
11. Monitor block signals with real metrics.
You need data from your own scraper.
Track:
Success rate
403 rate
429 rate
CAPTCHA rate
Timeout rate
Retry rate
Average latency
Proxy failure rate
Region-level success rate
Target page type failure rate
This gives you original performance data. Instead of guessing whether proxies are working, you can see which proxy type, region, and request speed performs best.
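A minimal version of this tracking needs nothing more than counters grouped by whatever dimension is being compared. The `BlockMetrics` class below is a hypothetical sketch, and the recorded outcomes are invented sample data, not measurements.

```python
from collections import Counter, defaultdict


class BlockMetrics:
    """Counts request outcomes per group (proxy type, region, page type, ...)."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def record(self, group, outcome):
        self.counts[group][outcome] += 1

    def rate(self, group, outcome):
        """Share of requests in `group` that ended with `outcome` (0.0 if no data)."""
        total = sum(self.counts[group].values())
        return self.counts[group][outcome] / total if total else 0.0


metrics = BlockMetrics()

# Invented sample outcomes for two proxy types
for outcome in ["ok"] * 18 + ["403", "429"]:
    metrics.record("residential", outcome)
for outcome in ["ok"] * 6 + ["403"] * 3 + ["captcha"]:
    metrics.record("datacenter", outcome)

residential_success = metrics.rate("residential", "ok")
datacenter_403 = metrics.rate("datacenter", "403")
```

The same class can be keyed by region or page type to fill in the other metrics on the list.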
12. Use target-specific scraping strategies.
Different websites need different strategies.
For e-commerce sites:
Slow down product page checks
Avoid refreshing cart or checkout pages aggressively
Use residential proxies for regional prices
Monitor stock pages in batches
For search results:
Use geo-targeted residential proxies
Keep language and region consistent
Watch CAPTCHA rates closely
For travel sites:
Use region-specific IPs
Track price changes less aggressively
Keep sessions stable
For social platforms:
Avoid unstable IP switching
Use ISP or mobile proxies for session consistency
Separate account environments carefully
A scraper that works on one website may fail on another. Treat each target as its own system.
Part 3. Why Nstproxy Is a Strong Choice for Web Scraping
Nstproxy is a strong proxy solution for web scraping because it addresses the real pain points behind scraping detection: blocks, CAPTCHAs, IP reputation, geo-targeting, rotation, long sessions, and scaling.
Scraping detection is not solved by one proxy type. A price scraper, SERP tracker, travel data monitor, and long-session crawler all need different IP behavior. Nstproxy stands out because it offers multiple proxy products in one platform, allowing users to choose the right setup for each scraping stage.
Supports eCommerce, SERP, social media, and market research scraping
Easy to scale from small tasks to enterprise-level projects
Recommended starting setup
For most public web scraping projects:
Use Nstproxy Residential Proxies for rotating public data collection.
Use Nstproxy Static ISP Proxies for stable long-running sessions.
Use Nstproxy Datacenter Proxies for low-risk, high-speed crawling.
Use Nstproxy Mobile Proxies for mobile-specific targets.
This gives you flexibility instead of forcing every scraping workflow through the same IP pool.
Scraping Stability Testing Table
Use this table to test whether your setup is improving.
Metric | Healthy Range | Warning Sign | What to Adjust
--- | --- | --- | ---
Success rate | 90%+ on stable targets | Falling below baseline | Reduce speed or improve proxies
403 rate | Low and stable | Sudden spike | Check IP quality and headers
429 rate | Rare | Frequent rate limits | Lower concurrency
CAPTCHA rate | Low | Increasing over time | Review IP reputation and browser signals
Timeout rate | Low | Region-specific failures | Test proxy location
Retry count | Controlled | Repeating same URLs | Add backoff
Latency | Stable | Slow proxy pool | Switch region or proxy type
Block by page type | Isolated | Same page type fails | Change target-specific strategy
This is where original data matters. Your own logs are more valuable than generic advice.
Part 4. Final Recommendation
The best way to avoid detection while scraping the web is to reduce suspicious patterns at every layer: request rate, IP reputation, headers, browser behavior, session continuity, and error handling.
If you are scraping 300+ product prices like the Reddit user described at the start, do not stop at adding wait time. Build a full scraping stability system:
Set a crawl budget.
Slow down request frequency.
Use clean residential or ISP proxies.
Keep sessions consistent.
Monitor block signals.
Adjust based on real performance data.
For most scraping projects, Nstproxy is a strong choice because it offers the proxy flexibility needed for different targets. Start with Nstproxy Residential Proxies for public data scraping and geo-targeted collection. Use Nstproxy Static ISP Proxies for long sessions. Use Datacenter Proxies for low-risk high-speed crawling, and Mobile Proxies for mobile-first pages.
Part 5. FAQs
1. How do websites detect web scraping?
Websites detect scraping through IP reputation, request speed, HTTP headers, TLS fingerprints, browser fingerprints, cookies, CAPTCHA triggers, and behavior patterns.
2. How can I avoid detection while scraping the web?
Use slower request pacing, realistic headers, clean proxies, consistent sessions, smart retries, browser automation only when needed, and monitoring for 403, 429, CAPTCHA, and latency changes.
3. What is the best proxy type for scraping?
Residential proxies are best for most public web scraping because they look closer to normal user traffic. Static ISP proxies are better for long sessions, and datacenter proxies are better for low-risk high-speed crawling.
4. Should I rotate proxies every request?
Not always. Per-request rotation can work for simple public pages, but sticky sessions are better when cookies, region, or session continuity matter.
5. Is Selenium safe for scraping?
Selenium is useful for JavaScript-heavy pages, but it can expose automation signals. Use it only when browser rendering is necessary.
6. Can Nstproxy help reduce scraping blocks?
Yes. Nstproxy helps reduce IP-based friction by offering residential proxies, static ISP proxies, datacenter proxies, mobile proxies, geo-targeting, rotation, and HTTP/SOCKS5 support.