WordPress Technical SEO Case Study: Cleaning Up After Domain Migrations and Legacy Content
Imagewize.com has been online for over a decade, but it has had a winding path. From March 2018 to mid-2019 the domain redirected to imwz.io — a short-lived technical development blog where we published Laravel, Vue.js, and Magento tutorials — before switching back to imagewize.com in late 2019. My focus then moved to a startup, smart48.com, which ran for five to six years before winding down in 2024–2025. Imagewize stayed online throughout, but with only light maintenance and no focused SEO work. Meanwhile content had accumulated — the imwz.io tutorials, pages absorbed from a sister WooCommerce services site (wooaid.com), and whatever we had added along the way — while the service offering gradually narrowed to a clear focus: WordPress and WooCommerce for SMEs.
By late 2025, the result was a fast, clean, healthy WordPress site with 310 published posts — many of them inherited from earlier eras. Laravel tutorials. Vue.js guides. Magento how-tos. Joomla articles. None of it aligned with what the business actually sells today. Google was dutifully crawling hundreds of pages that had nothing to do with WordPress services.
This is the situation a lot of established SME sites end up in. Domain migrations, absorbed sister sites, pivoted service offerings — the content accumulates, the focus drifts, and Google’s understanding of the site drifts with it. By April 2026, after five months of focused technical SEO cleanup, Google Search Console confirmed it had begun collecting search impressions for the site, now running at 1,200+ impressions per 28-day window and climbing each week. The first organic referrals from content pages arrived in February.
No link building campaign. No new domain. No algorithm lottery win. The gains came from fixing the WordPress technical SEO foundation — cleaning up what Google found when it crawled the site and helping it understand what the business actually offers now.
This WordPress technical SEO case study documents exactly what we did, in the order we did it, with the before and after numbers. If your site has a history — redirects from old domains, content from pivoted service offerings, tag archives full of legacy topics — this is the playbook.
The Inherited Situation: What a Decade of Growth Left Behind
Before touching anything, we ran a full technical audit. Key findings from November 2025:
- 310 published blog posts — a solid content base, but heavily skewed toward legacy developer content (Laravel, Vue.js, Magento tutorials) inherited from the imwz.io era, when the business offered broader development services
- 61 orphan pages — pages with zero internal links pointing to them; present in the sitemap but effectively invisible to crawlers
- Ahrefs health score: 93 — the surface-level checks (broken links, redirect loops, hreflang) were clean; the structural problems (orphan content, topical dilution, thin archive pages) were the real issue
- 11 redirect chains in the sitemap — legacy URLs from the WooCommerce shop that had been deprecated when the business moved away from selling plugins
- 3 GSC-reported 404s at baseline — low, but about to grow as Google caught up with the shop deprecation
- Organic visibility: minimal — GSC had not yet begun collecting search impressions reliably
The site was not broken. Performance was good, and the technical fundamentals were in place. But Google was being asked to crawl a decade’s worth of content covering topics the business no longer offered, with no internal link structure connecting the valuable WordPress pages to each other, and a sitemap still pointing at URLs that redirected.
The audit told us: Google is crawling, but it does not understand what this site is about in 2026.
Fix 1: Crawl Budget — Refocusing Google on the Current Service Offering
The most significant single change was also the most natural one for a consolidated site: telling Google which content still matters. This is content pruning in the technical SEO sense — not deleting legacy posts, but removing them from Google’s crawl and index targets.
The site had 135 posts covering Laravel, Vue.js, Magento, and Joomla — topics we had written about years earlier when the imwz.io blog covered broader development work. These posts were honest, technically solid, and well-written. They were just no longer aligned with the current service: WordPress and WooCommerce for SMEs.
Google was spending crawl budget visiting all 135 of these posts. None of them targeted relevant keywords for the current business. None of them linked to our service pages. And because they had never attracted backlinks or engagement within the WordPress niche, Google was repeatedly crawling them, finding nothing new, and leaving — while our actual service content waited for attention.
What we did: using WP-CLI and The SEO Framework’s _genesis_noindex meta field, we bulk-noindexed these posts by category keyword.
# Example: noindex all Laravel posts
# (wp post list passes WP_Query vars as flags, so search is --s, not --search)
wp post list --post_type=post --post_status=publish \
--fields=ID --s="laravel" --format=ids --path=web/wp \
| xargs -n1 -I{} wp post meta update {} _genesis_noindex 1 --path=web/wp
- Laravel: 90 posts noindexed
- Vue.js: 31 posts noindexed
- Magento: 8 posts noindexed
- Joomla: 6 posts noindexed
- Total: 135 posts removed from Google’s crawl target
Why noindex instead of delete? Deleting 135 posts would generate 135 new 404s overnight, and any backlinks or referral traffic pointing to those URLs would die. Noindex keeps the content live and preserves link equity — Google just stops surfacing the pages in search and, over time, stops crawling them as aggressively.
One important edge case: Sage, Trellis, and Bedrock posts that internally reference Laravel are still indexed. They are WordPress-relevant content — the Laravel mention is incidental. We reviewed these manually and re-indexed the false positives after the bulk operation.
Result after ~3 weeks: GSC “Crawled not indexed” count started dropping as Google began re-evaluating what to crawl.
Fix 2: Noindex Tag and Category Archives
After handling the off-topic posts, we turned to a structural issue: tag and category archive pages.
Every WordPress post with a tag or category gets an automatically generated archive page listing all posts with that label. On imagewize.com, this meant:
- 21 tag archive pages (e.g. /tag/trellis/, /tag/wp-cli/, /tag/laravel/)
- 15 category archive pages (e.g. /category/wordpress/, /category/security/)
These pages contain no original content — just a list of post titles and excerpts Google could already find directly on those posts. Google knows this. We saw it in GSC: almost all tag and category archives sat in “Crawled not indexed” — Google visited them, decided they were not worth adding to the index, and left.
The fix: noindex all of them via The SEO Framework, which lets you set per-term noindex in bulk. Nav category links remain intact for users — this only affects crawling, not site navigation.
Category archives noindexed (15): /category/wordpress/, /category/woocommerce/,
/category/security/, /category/devops/, /category/ecommerce/, and 10 more.
Tag archives noindexed (21): /tag/trellis/, /tag/wp-cli/, /tag/wordfence/,
/tag/laravel/, /tag/database/, and 16 more.
Before: Google crawling 36 thin archive pages with no indexing value.
After: those pages move to “Excluded by noindex” in GSC, freeing crawl budget for service pages and blog posts.
Fix 3: Block Junk URLs at the Robots Level
While reviewing Google Search Console’s “Crawled not indexed” queue, we noticed a pattern: Google had crawled dozens of low-value internal URLs that it should not have been touching at all.
- /wp/wp-login.php — the WordPress login page, crawled repeatedly
- /?s=searchterm — search result pages with no indexable content
- /?wc-ajax=... — WooCommerce AJAX endpoints that survived the shop deprecation
- /wp/wp-includes/js/... — JavaScript file paths that had somehow been followed
The SEO Framework handles this with a virtual robots.txt file, but we needed static control. We created a hard robots.txt file in Bedrock’s site/web/ directory (which overrides the SEO Framework virtual file) with explicit disallow rules:
User-agent: *
Disallow: /wp/wp-login.php
Disallow: /wp/wp-includes/js/
Disallow: /?wc-ajax=
Disallow: /?s=
Deployed and verified live at https://imagewize.com/robots.txt within the same day.
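Because a static file like this is easy to break silently in a later deploy, it is worth a scripted sanity check. A minimal sketch (the local filename is illustrative; the rules are the ones above, and the grep simply asserts that every Disallow path is rooted at `/`, which Googlebot requires):

```shell
# Recreate the static robots.txt locally, as it ships in Bedrock's web/ dir.
cat > robots.txt <<'EOF'
User-agent: *
Disallow: /wp/wp-login.php
Disallow: /wp/wp-includes/js/
Disallow: /?wc-ajax=
Disallow: /?s=
EOF

# Sanity check: count Disallow rules that are properly rooted at "/".
grep -c '^Disallow: /' robots.txt
```

Run before deploy (or in CI), a count other than the expected 4 flags a malformed or truncated file.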
Fix 4: 404 Cleanup and Redirect Strategy
Fixing 404 errors on a WordPress site is not a single action — it is a triage exercise. Some URLs need 301 redirects, some need 410 Gone, and some just need to be deleted without ceremony. The 410 vs 404 choice matters more than most teams realise, and getting it wrong leaves GSC cluttered for months.
At baseline in November 2025, GSC reported 3 404 errors. By February 2026, that had grown to 29. Deprecating the WooCommerce shop was the trigger: product and checkout URLs disappeared without redirects, and Google started flagging them.
We addressed these in three rounds across January–April 2026.
Round 1: 301 Redirects for Old Category Slugs (January)
Added 301 redirects for duplicate path variants caused by old category slugs left in URLs from earlier permalink structures:
location = /wordpress-stuff/bedrock-modern-wordpress-stack/ {
return 301 /bedrock-modern-wordpress-stack/;
}
location = /seo/turbo-boost-your-website/ {
return 301 /turbo-boost-your-website/;
}
Round 2: 410 Gone for Permanently Dead Endpoints (April)
After reviewing the full GSC drilldown, we returned 410 Gone for URLs that are never coming back:
| URL pattern | Why 410 |
|---|---|
| /checkout/order-received/ | Dead WooCommerce confirmation URL from the deprecated shop |
| /index.php?rest_route=/slimstat/v1/hit | Slimstat REST endpoint; plugin removed |
| /app/plugins/* | Bedrock internal paths Google had somehow indexed |
| /app/themes/* | Same — internal Bedrock paths, not public content |
410 is more accurate than 301 or 404 for these: a 404 says “not found right now,” while a 410 says “gone permanently, stop asking.” Google drops 410 URLs from the index faster than 404s.
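In an Nginx setup like this one, the path-based patterns above can be answered with explicit location blocks. A sketch (placement inside the server block and the exact match modifiers are assumptions; the Slimstat endpoint is identified by its query string, so it needs a check on `$args` rather than a location match):

```nginx
# Hard 410s for URLs that will never return.
location = /checkout/order-received/ { return 410; }

# Prefix matches for Bedrock-internal paths (^~ stops regex location checks).
location ^~ /app/plugins/ { return 410; }
location ^~ /app/themes/  { return 410; }

# Note: /index.php?rest_route=/slimstat/v1/hit is a query string, so it
# cannot be caught by a location block; test $args in the server block instead.
```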
The Tag Cleanup (February)
Separately, we found 85 garbage tags that had been created accidentally — PHP code fragments pasted into the tag field during old content edits. Each one generated an empty archive page that Google had crawled and was now flagging as thin content. Deleted via WP-CLI in a single pass:
wp term list post_tag --fields=term_id,name,count \
--format=csv --path=web/wp | grep ",0$" \
| cut -d, -f1 | xargs -I{} wp term delete post_tag {} --path=web/wp
Round 3: Service Page Restructure (April 21)
Deleting old service pages to simplify the site structure caused a second 404 spike — from 37 to 51 affected URLs in two weeks. This is expected: when you intentionally remove pages, Google keeps trying to recrawl them until you signal the new status.
Six new 301 redirects covered the deleted service pages:
| Old URL | New URL |
|---|---|
| /shopify/ | /services/ (service no longer offered) |
| /services/social-media-marketing/ | /services/ |
| /services/web-development/ | /services/wordpress-development/ |
| /services/on-call-wordpress-developer/backups/ | parent service page |
| /services/maintenance/wordpress-security/ | canonical security page |
| /product-category/maintenance/ | /services/on-call-wordpress-developer/ |
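One way to express the concrete mappings from the table is a single Nginx map rather than separate location blocks, which keeps the redirect list easy to audit as it grows. A sketch (the two rows whose targets are described rather than listed are omitted; placing the map in the http context is an assumption about the config layout):

```nginx
# http context: map old paths to their new homes ("" = no redirect).
map $uri $redirect_to {
    default                           "";
    /shopify/                         /services/;
    /services/social-media-marketing/ /services/;
    /services/web-development/        /services/wordpress-development/;
    /product-category/maintenance/    /services/on-call-wordpress-developer/;
}

# server context: issue the 301 only when the map matched.
if ($redirect_to) { return 301 $redirect_to; }
```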
Additional 410 rules covered other permanently removed content: the Laravel sub-page (no longer an offered service), a deprecated WooCommerce product tag, an off-topic SaaS post, and five orphaned ?page_id= URLs from deleted pages.
The Result
3 active 404s at baseline → spike to 51 during cleanup → back toward baseline as Google processes the new signals over 4–6 weeks. Access logs from April 22 show 10 × 410 responses in a single 24-hour window, confirming Google is actively processing the gone signals.
Fix 5: Orphan Pages — Internal Linking Audit
The Ahrefs audit in January 2026 surfaced 61 orphan pages: URLs that existed in the sitemap but had zero internal links pointing to them from anywhere on the site.
This matters because Google discovers pages primarily through links. A page with no internal links is effectively invisible to Googlebot unless it is in the sitemap — and even then, crawl budget deprioritizes it. Five of these orphaned pages were active service pages we were relying on for business leads.
The fix required different approaches depending on the type of orphan.
Navigation fixes (quick wins):
- Added /tutorials/ to the main navigation — it was a high-value hub page with no nav entry
- Added /services/web-development/ to the footer navigation
Hub page content fixes: we updated the /tutorials/ hub page with structured sections linking out to orphaned technical posts:
- WooCommerce tutorials section (5 posts linked)
- WordPress section (6 posts linked)
- Trellis section (2 posts linked)
- WP-CLI section (2 posts linked)
- Security section (4 posts linked via /wordpress-security/ as intermediary)
- Server & Hosting section (3 posts linked)
Result: 34 of 61 orphaned pages resolved over 3–4 weeks. The remaining 27 require content-level edits to include contextual links from relevant existing posts — ongoing work.
Fix 6: Server-Level Security — Reducing Bot Noise
This one often gets overlooked in SEO discussions, but it matters for crawl efficiency.
Across January–April 2026, our access logs showed significant malicious bot traffic — not just nuisance bots, but aggressive scanners making hundreds of requests per day probing for vulnerable endpoints (wp-login.php, .env, phpinfo, webshell paths).
This has direct SEO implications:
- Bandwidth consumption competes with legitimate crawler traffic
- 404 spikes from scanning create false positives in GSC data
- Server response time degrades under heavy scanning load, affecting Googlebot’s crawl efficiency
We handle this at the Nginx level using deny lists maintained in Trellis:
# deny-ips.conf.j2 — excerpt
deny 185.177.72.0/24; # FBW Networks — coordinated subnet scanner
deny 20.107.195.165; # Azure — PHP webshell probing
deny 45.148.10.247; # TECHOFF SRV NL — 1,197 requests in 24h, AbuseIPDB 100/100
Blocking 45.148.10.247 alone reduced daily request volume by ~27% the day it was added — that IP was generating over 1,000 requests per day from a single scanner.
The process: weekly review of access logs, cross-referenced against AbuseIPDB scores. Any IP scoring 90+ with recognizable malicious patterns gets blocked. All changes deploy via trellis provision --tags wordpress-setup production.
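The triage step of that weekly review is a one-liner over the access log. A sketch, run here against a three-line sample standing in for the real log (IPs and paths illustrative, log format is Nginx's default combined format):

```shell
# Sample lines standing in for /var/log/nginx/access.log.
cat > access-sample.log <<'EOF'
45.148.10.247 - - [22/Apr/2026:00:01:02 +0000] "GET /wp-login.php HTTP/1.1" 404 146
45.148.10.247 - - [22/Apr/2026:00:01:03 +0000] "GET /.env HTTP/1.1" 404 146
66.249.66.1 - - [22/Apr/2026:00:02:00 +0000] "GET /services/ HTTP/1.1" 200 5120
EOF

# Requests per IP, noisiest first: candidates for an AbuseIPDB lookup.
awk '{print $1}' access-sample.log | sort | uniq -c | sort -rn
```

Against the sample, the scanner IP surfaces at the top with 2 requests; on a real log, any unfamiliar IP near the top gets cross-referenced against AbuseIPDB before it is added to the deny list.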
As of April 22, 2026, the monitoring logs confirm the blocking is working: XML-RPC attempts return 444 (Nginx drop) across all 16 IPs that tried; no 5xx errors in the 24-hour window. Three additional IPs were queued for blocking after the April 22 review — 93.123.109.180 (backup/credential scanner, 178 × 404), 45.148.10.246 (sibling of the already-blocked .247 in the TECHOFF SRV subnet), and 45.92.1.39 (Alfa PHP webshell scanner).
One thing the April 22 logs surfaced that was not on our radar in November: AI crawlers. Anthropic’s ClaudeBot, Amazon’s Amazonbot, Meta’s meta-externalagent, and several others collectively made 176 requests and consumed 5 MB of bandwidth in a single 24-hour window — 4.3% of all requests, 11.5% of bandwidth. Meta AI was notably inefficient: 26 requests but 2.79 MB, the highest bandwidth of any AI crawler. This is not a blocking priority yet, but it is worth tracking. As AI-powered search tools index content for their own knowledge bases, monitoring their crawl patterns is becoming part of the access log review.
Fix 7: Schema Markup and Service Page Quality
The last piece was improving what Google found when it actually visited our most important pages.
GSC was showing a growing set of URLs in “Crawled not indexed” that were not thin because of their page type, the way tag archives are: they were genuinely thin, written without SEO in mind. The /services/ hub page, for example, had 483 words and no structured data. Google visited it repeatedly and declined to index it.
Service page rewrite: we rewrote /services/ from scratch using Nynaeve’s custom Gutenberg blocks, and took the opportunity to simplify the commercial model at the same time. The old page offered fixed WooCommerce product packages — a holdover from when the business sold productised bundles. The new page reflects how we actually work today: services delivered at an hourly starter rate or on custom quotes, with a transparent “from” price on each service so prospects know roughly what to budget.
The page now contains:
- Service overview cards for each core service, each with a “from” price (WordPress development, speed optimisation, Trellis hosting setup)
- A pricing section that leads with the hourly starter rate for smaller or scoped work, and explains that larger projects get a custom written quote before any work begins
- “Why choose Imagewize” with supporting evidence
- FAQ accordion (5 questions)
Then added JSON-LD schema directly in the page content:
{
"@context": "https://schema.org",
"@type": "Service",
"name": "WordPress Web Design & Development",
"provider": {
"@type": "Organization",
"name": "Imagewize"
},
"areaServed": ["United States", "Europe"]
}
Plus FAQPage schema for the accordion section. The services hub became the first page on the site to have both Service and FAQPage structured data.
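For reference, a minimal FAQPage sketch in the same style as the Service block above; the question and answer text here is illustrative, not the page's actual FAQ copy:

```json
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "Do you work on existing WordPress sites?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Yes. We take over maintenance and development of existing WordPress and WooCommerce sites."
      }
    }
  ]
}
```

In practice the mainEntity array carries one Question object per accordion item, so the five-question accordion maps to five entries.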
Results: What the Numbers Show
Here is where things stand as of April 2026, measured against the November 2025 baseline:
| Metric | Baseline (Nov 2025) | April 2026 | Change |
|---|---|---|---|
| GSC impression collection | Not started | Confirmed Apr 12 | Milestone reached |
| GSC impressions (28d) | Minimal | 1,200+ and climbing | New measurable baseline |
| First organic referrals | None recorded | Feb 24 onward | Content pages ranking |
| Organic landing pages (24h) | 0 | 19 distinct pages | Measured Apr 22 access logs |
| New content indexing speed | — | 4 days | Skool post published Apr 18, ranking Apr 22 |
| GSC “Crawled not indexed” | ~310 (est.) | 273 | −47 in 3 weeks |
| Active 404s (GSC) | 3 | 51 → being resolved | Spiked during page deletions; redirects and 410s added Apr 21 |
| Orphan pages | 61 | 27 remaining | 34 fixed |
| Off-topic legacy posts indexed | 135 | 0 | Removed from index |
| Tag/category archives indexed | ~36 | 0 | All noindexed |
The milestone that matters most: on April 12, 2026, Google Search Console officially confirmed: “On April 12, 2026, we started collecting Google Search impressions for your website.”
That confirmation — impressions being collected reliably — is the formal sign that Google now understands what the site is about and considers it a legitimate organic search participant for its target topics. The early February referrals were already encouraging; the April confirmation is the point where measurement becomes continuous and the growth work begins.
What We Learned
A few WordPress technical SEO takeaways that apply to any established site with history — domain migrations, absorbed sister sites, pivoted service offerings.
1. Crawl budget is finite and your legacy content is competing with your current content for it. On a site with 310 posts, 135 of them being from an older service offering is not a content quality problem — it is a crawl signal problem. Content pruning (via noindex, not deletion) is the cleanest fix. Every time Googlebot visits a Laravel tutorial on a site that now offers WordPress services, it is using budget and reinforcing the wrong topical signal.
2. Category and tag archives are almost always thin. Unless you are investing in unique descriptive content for each archive (custom descriptions, featured posts, editorial curation), these pages have no indexing value. Noindex them by default and come back to individual ones only if you have a compelling reason.
3. Internal linking is how Google understands your site’s structure. Sixty-one orphan pages is not a niche problem. It happens on every site that grows organically without a deliberate linking strategy. The fix is not complicated — it is just time-consuming. Hub pages (like /tutorials/) are the most efficient intervention.
4. 410 vs 404 is a decision, not a default. A 404 says “I can’t find this right now.” A 410 says “this is gone permanently, stop asking.” For deprecated shop URLs or dead plugin endpoints, 410 clears them from GSC faster than leaving them as 404s. Reach for 410 whenever you are certain the URL will never return.
5. Blocking malicious bots at Nginx has direct SEO value. It is not just a security measure. Reducing 27% of daily request volume from a single scanner IP has meaningful effects on server performance and log clarity.
What’s Next
The technical foundation is now solid. The work shifts toward:
- Improving service page content — five key pages are still in “Crawled not indexed” due to thin content; each needs a full rewrite with schema
- Completing orphan page linking — 27 remaining pages need contextual links from existing content
- Tracking the 4-month traffic plan — baseline captured at ~15–40 daily organic users, target is 3× by August 2026
- Adding CTAs to top technical posts — the Sage 10→11 migration post has a CTA live; 4 more high-traffic technical posts still need one
The numbers are moving in the right direction. Five months of focused WordPress technical SEO work took a site carrying a decade of legacy content and got it to the point where Google understands what it is about today — and is actively collecting impressions for the right topics. That is a repeatable process any established WordPress site can follow.
Imagewize builds and maintains WordPress sites for SMEs in the US and Europe. If your site has accumulated years of content that no longer reflects your current services — or you are seeing orphan pages, thin archives, or crawl budget problems flagged in Search Console — get in touch.