How the Analysis Works
RateBud pulls live review and rating data directly from Amazon using a combination of official Amazon APIs (the Product Advertising API) and vetted third-party data providers. We don't scrape — we query structured data sources for review text, star ratings, reviewer profiles, purchase verification status, review timestamps, and product metadata.
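Each source returns reviews in a slightly different shape, so everything is normalized into a common record before analysis. The sketch below shows what such a record might look like; the field names are illustrative, not Amazon's API schema.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Review:
    """Normalized review record assembled from our data sources (illustrative)."""
    product_asin: str        # Amazon product identifier
    reviewer_id: str         # stable ID used for cross-review profiling
    rating: int              # 1-5 stars
    body: str                # full review text
    verified_purchase: bool  # purchase-verification status
    posted_at: datetime      # review timestamp
```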
Once we have the reviews, our AI models run several layers of analysis:
- Content quality and sentiment analysis: We evaluate the substance of each review. Is the reviewer mentioning specific product details, or offering vague, generic praise? Sentiment analysis flags reviews that read as unnaturally positive or templated.
- AI-generated text detection: Our models look for linguistic patterns common in LLM-generated reviews: overly polished phrasing, characteristic punctuation habits (em-dashes, the "It's not X, it's Y" construction), and a lack of the messy specificity that real human writing tends to have. Classifiers built on these patterns are reasonably good at separating human-written from machine-generated text; a simplified sketch follows this list.
- Reviewer behavior profiling: This is one of our most important signals. We examine the reviewer's history: are they reviewing a grab bag of unrelated products and giving everything five stars? Are they copy-pasting near-identical language across different products (see the similarity sketch below)? Real people have messy, inconsistent review histories. Fake reviewers don't.
- Timing pattern analysis: We examine when reviews were posted relative to each other. Natural reviews accumulate gradually; suspicious bursts, say 50 reviews appearing in a single day, are flagged. We also cross-reference social media trends to account for viral moments that can legitimately spike review volume.
- Statistical distribution analysis: Authentic products have natural rating distributions. When 90% of reviews are five stars with almost nothing in between, that's a pattern worth investigating. The timing and distribution checks are sketched together below.
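To make the AI-text signal concrete, here is a deliberately simplified sketch of the lexical tells described above. Our production detector is a trained model, not a handful of regexes; these patterns, and the idea of simply counting them, are illustrative only.

```python
import re

# Crude, illustrative tells; the production detector is a learned classifier.
EM_DASH = re.compile("\u2014")  # the em-dash character
NOT_X_BUT_Y = re.compile(r"\b[Ii]t'?s not (?:just )?\w+[^.!?]*, it'?s \w+")
GENERIC_PRAISE = re.compile(r"\b(game[- ]changer|must[- ]have|exceeded my expectations)\b", re.I)

def ai_text_tells(text: str) -> int:
    """Count how many LLM-style tells appear in a review body."""
    return sum(bool(p.search(text)) for p in (EM_DASH, NOT_X_BUT_Y, GENERIC_PRAISE))
```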
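The copy-paste signal from reviewer profiling can be approximated with a simple set-overlap check over a reviewer's past review texts. This is a minimal sketch; production uses richer similarity measures, and the 0.6 threshold is a placeholder.

```python
def jaccard(a: str, b: str) -> float:
    """Token-set overlap between two review bodies (0 = disjoint, 1 = identical)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if (ta or tb) else 0.0

def copy_paste_score(history: list[str], threshold: float = 0.6) -> float:
    """Fraction of review pairs in one reviewer's history that are near-duplicates."""
    pairs = [(a, b) for i, a in enumerate(history) for b in history[i + 1:]]
    if not pairs:
        return 0.0
    return sum(jaccard(a, b) >= threshold for a, b in pairs) / len(pairs)
```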
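Timing and distribution are the most mechanical checks of the five. A minimal version of both follows; the 10x burst multiplier and the 90% five-star cutoff are illustrative placeholders, not our production thresholds.

```python
from collections import Counter
from datetime import date, datetime

def burst_days(timestamps: list[datetime], multiplier: float = 10.0) -> list[date]:
    """Flag days whose review count far exceeds the product's median daily volume."""
    if not timestamps:
        return []
    per_day = Counter(ts.date() for ts in timestamps)
    median = sorted(per_day.values())[len(per_day) // 2]
    return [day for day, n in per_day.items() if n >= multiplier * max(median, 1)]

def five_star_skew(ratings: list[int]) -> bool:
    """True when the distribution is suspiciously top-heavy (~90%+ five-star)."""
    return ratings.count(5) / len(ratings) >= 0.9 if ratings else False
```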
Scoring
Each analysis factor produces a sub-score out of 100. These are weighted and combined into a single Trust Score (0–100), which maps to a letter grade.
The individual factor scores are displayed on every product analysis page so you can see exactly which signals contributed to the overall grade. We don't hide behind a single number.
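As a sketch, the combination step is roughly a weighted average. The factor names and weights below are hypothetical placeholders for illustration, not our published weighting.

```python
def trust_score(sub_scores: dict[str, float], weights: dict[str, float]) -> float:
    """Combine per-factor sub-scores (each 0-100) into a single 0-100 Trust Score."""
    total = sum(weights.values())
    return sum(sub_scores[factor] * w for factor, w in weights.items()) / total

# Hypothetical weights for the five factors described above.
WEIGHTS = {"content": 0.20, "ai_text": 0.20, "behavior": 0.25, "timing": 0.15, "distribution": 0.20}
```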
Known Limitations
No detection system is perfect. Here's where RateBud struggles, and we think being upfront about this is more useful than pretending otherwise.
Paid-but-honest reviews are nearly invisible. If a reviewer was paid to try a product but genuinely used it and wrote their honest opinion, our models will likely miss it. We can detect when someone explicitly discloses compensation, but if they don't? The review reads like a real one because, in most ways, it is one. This is a fundamental limitation of any automated detection approach.
Small review counts limit accuracy. Products with fewer than 10–15 reviews simply don't give us enough data for strong statistical conclusions. The patterns we look for (timing anomalies, rating distributions, language clusters) need a baseline to compare against. New products with thin review histories are inherently harder to score, and we flag this on those pages.
Amazon limits data access. Amazon doesn't allow bulk scraping of their entire review corpus. The APIs and data sources we use return a representative sample, not every single review. This is a constraint that every review analysis tool faces, including the ones that don't mention it. Our analysis is based on the data we can access — typically a significant portion of reviews, but rarely 100% of them.
The accuracy number. We cite 95% accuracy based on manual validation across thousands of reviews: we compare our automated scores against hand-verified assessments of the same reviews. Precision (how often a review we flag really is fake) is strong, but overall accuracy varies by category and product type. Electronics tend to produce cleaner signal than, say, supplements, where manipulation tactics are more sophisticated.
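The validation itself is easy to state in code: compare automated verdicts against hand-labeled reviews and compute standard metrics. A minimal sketch, assuming binary fake/genuine labels:

```python
def accuracy_and_precision(predicted: list[bool], labeled: list[bool]) -> tuple[float, float]:
    """predicted[i] / labeled[i]: True means the review is (judged) fake."""
    correct = sum(p == y for p, y in zip(predicted, labeled))
    flagged = [y for p, y in zip(predicted, labeled) if p]
    accuracy = correct / len(labeled)
    precision = sum(flagged) / len(flagged) if flagged else 0.0  # of flagged, how many truly fake
    return accuracy, precision
```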
How We Keep Improving
The algorithm is never "done." I personally review RateBud analysis pages on a regular basis, then go to the actual Amazon product listing and manually read through the reviews to see how well our scoring holds up in practice. When something looks off, I dig into the data and adjust.
I've also purchased products specifically to test the analysis — buying items our model flagged as suspicious and items it rated highly, then evaluating whether the real-world experience lined up with what the reviews described. It's the only way to truly validate whether the system is working.
We're constantly refining the model weights, adding new detection signals, and updating our AI text detection as language models evolve. The fake review industry adapts, so we have to adapt faster.
I work in tech and built RateBud because I was personally struggling to trust reviews on Amazon. With Fakespot shutting down and a wave of low-quality AI tools popping up to fill the gap, I wanted to build something principled — a tool that uses AI thoughtfully, not as a shortcut, and focuses on providing genuine value to people trying to shop smarter.