Building a News Aggregator API as a Take-Home Assignment
I recently completed a take-home assignment for a Backend Web Developer position: build a news aggregator API in Laravel. Here's how I approached the architecture, dealt with dead APIs, and kept things simple without sacrificing good engineering.
A few days ago I got a take-home assignment: build a news aggregator backend in Laravel. Fetch articles from multiple sources, store them, expose a RESTful API with filtering. Straightforward enough on paper. In practice, half the suggested source APIs were dead or unsuitable. Here's what I built, why I made the decisions I did, and what I'd change with more time.
The Brief
The task asked for a Laravel API that:
- Pulls articles from multiple news sources on demand
- Stores them in a database with authors and tags
- Exposes endpoints to list, filter, and read articles
- Follows clean code principles
The suggested sources were NewsAPI, OpenNews, NewsCred, The Guardian, NYTimes, BBC News, and NewsAPI.org. Half of these either don't exist anymore, have no public API, or are themselves aggregators (which feels redundant when you're building one). The BBC public API has been discontinued entirely.
So I picked four real, working sources and built around those.
Source Selection
| Source | Approach | Notes |
|---|---|---|
| The Guardian | Official REST API | Free tier, great content API |
| New York Times | Official REST API | Free tier, metadata only (no full body) |
| ESPN | Unofficial API | Undocumented but stable for soccer leagues |
| BBC | Public RSS feeds | No API, but RSS is officially maintained |
For BBC I fetched 9 category feeds (Top Stories, World, Business, Technology, Science & Environment, Health, Entertainment & Arts, Politics, UK). RSS gives me everything I need except the full post content: title, summary, published date, link, and categories, all without requiring auth. It's not glamorous, but it works, and it's better than pretending the BBC API still exists.
The ESPN integration uses an undocumented internal API. I iterate over 6 leagues (Premier League, La Liga, Bundesliga, Serie A, MLS, UEFA Champions League) and pull soccer news per league. It could break if ESPN changes their internals, but for a take-home it's fine, and it was a fun find.
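The loop looks roughly like this. This is a sketch, not the actual code: the endpoint shape and league slugs are assumptions based on ESPN's widely known unofficial "site API", and `$maxItems` / the DTO mapping are stand-ins.

```php
use Illuminate\Support\Facades\Http;

// Hypothetical sketch: per-league fetch against ESPN's unofficial API.
// The endpoint is undocumented and could change without notice.
$leagues = ['eng.1', 'esp.1', 'ger.1', 'ita.1', 'usa.1', 'uefa.champions'];
$perLeague = (int) ceil($maxItems / count($leagues));

foreach ($leagues as $league) {
    $response = Http::get(
        "https://site.api.espn.com/apis/site/v2/sports/soccer/{$league}/news",
        ['limit' => $perLeague]
    );

    foreach ($response->json('articles', []) as $article) {
        // map to an ESPN DTO and insert (omitted)
    }
}
```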
Architecture
I wanted a clean separation between "how do we talk to the API" and "what do we do with the data". Three layers handle this:
Services
Each source has a dedicated service class (GuardianNewsService, NYTimesNewsService, etc.) that extends a shared NewsService base. The service is responsible for one thing: making HTTP requests and returning typed DTOs.
NewsService provides the shared insert() method that persists an article, authors, tags, pivots, in a single database transaction. All source services inherit this.
public function insert(NewsItemDTO $article): Document
{
    return DB::transaction(function () use ($article) {
        $document = Document::query()->create([...]);

        $authors = $article->authors->map(fn (AuthorDTO $author) =>
            Author::query()->firstOrCreate(['slug' => $author->slug], ['name' => $author->name])
        )->map(fn (Author $author) => $author->id);
        $document->authors()->sync($authors);

        $tags = $article->tags->mapWithKeys(function ($data) {
            $tag = Tag::query()->firstOrCreate(['slug' => $data->slug], ['title' => $data->title]);
            return [$tag->id => ['role' => $data->role]];
        });
        $document->tags()->sync($tags);

        return $document;
    });
}

DTOs
Each source maps its raw API response into a typed DTO. GuardianNewsItemDTO, NYTimesNewsItemDTO, etc. all extend the base NewsItemDTO. This means insert() doesn't know or care which source the article came from.
class NewsItemDTO
{
    public function __construct(
        public DocumentSource $sourceType,
        public string $sourceId,
        public string $title,
        public string $content,
        public Carbon $publishedAt,
        public ?string $image,
        public ?Collection $authors,
        public ?Collection $tags,
    ) {}
}

Tags carry a role enum value, KEYWORD or CATEGORY, stored on the pivot table. Guardian's API returns both keywords and sections, so this distinction matters for filtering later.
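To illustrate the mapping step, a Guardian item might be converted like this. A sketch only: the `fromApi` factory, the `AuthorDTO`/`TagDTO` constructors, and the `TagRole` enum name are assumptions; the response field names (`webTitle`, `webPublicationDate`, `fields.body`, tag `type`s) come from the Guardian Content API.

```php
class GuardianNewsItemDTO extends NewsItemDTO
{
    // Hypothetical factory; field names follow the Guardian Content API.
    public static function fromApi(array $item): self
    {
        return new self(
            sourceType: DocumentSource::GUARDIAN,
            sourceId: $item['id'],
            title: $item['webTitle'],
            content: $item['fields']['body'] ?? '',
            publishedAt: Carbon::parse($item['webPublicationDate']),
            image: $item['fields']['thumbnail'] ?? null,
            authors: collect($item['tags'] ?? [])
                ->where('type', 'contributor')
                ->map(fn (array $t) => new AuthorDTO(
                    slug: Str::slug($t['webTitle']),
                    name: $t['webTitle'],
                )),
            tags: collect($item['tags'] ?? [])
                ->where('type', 'keyword')
                ->map(fn (array $t) => new TagDTO(
                    slug: $t['id'],
                    title: $t['webTitle'],
                    role: TagRole::KEYWORD,
                )),
        );
    }
}
```

Because every source produces the same base DTO shape, the shared `insert()` never needs source-specific logic.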
Sources
Sources implement a single ISource interface with one method: fetch(?int $maxItems). The source handles pagination, deduplication, and per-article error isolation.
interface ISource
{
    public function fetch(?int $maxItems = null): void;
}

Each source queries the database for existing IDs before inserting, so re-running the fetch command is safe. Errors on individual articles are logged and skipped; one bad article doesn't abort the whole batch.
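The dedup-and-isolate loop described above might look like the following (a sketch with assumed names; not the actual implementation):

```php
// Look up already-stored IDs so re-running the command is idempotent.
$existing = Document::query()
    ->where('source_type', DocumentSource::GUARDIAN->value)
    ->whereIn('source_id', $items->pluck('sourceId'))
    ->pluck('source_id')
    ->all();

foreach ($items as $dto) {
    if (in_array($dto->sourceId, $existing, true)) {
        continue; // already stored
    }

    try {
        $this->service->insert($dto);
    } catch (\Throwable $e) {
        // One bad article is logged and skipped, not fatal to the batch.
        Log::warning('Skipping article', [
            'source_id' => $dto->sourceId,
            'error'     => $e->getMessage(),
        ]);
    }
}
```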
Pagination strategies differ by source:
- Guardian: cursor-based, using the latest stored source_id as the starting point
- NYTimes: page-based, hard-limited to 1000 results by the API
- ESPN: page-based per league, distributes maxItems evenly across leagues
- BBC: no pagination needed; RSS always returns the current items
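The Guardian's "latest stored source_id as the starting point" idea can be sketched like this (assumed names, including the hypothetical `fetchPage()` helper; not the actual code):

```php
// Page forward through newest-first results until we hit the
// newest article we already have in the database.
$latestStored = Document::query()
    ->where('source_type', DocumentSource::GUARDIAN->value)
    ->latest('published_at')
    ->value('source_id');

$fetched = 0;
$page = 1;
while ($fetched < $maxItems) {
    $items = $this->service->fetchPage($page++); // hypothetical helper

    if ($items->isEmpty()) {
        break; // no more results from the API
    }

    foreach ($items as $dto) {
        if ($dto->sourceId === $latestStored) {
            return; // caught up with what's already stored
        }
        $this->service->insert($dto);
        $fetched++;
    }
}
```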
Contracts
Every service is bound to an interface in the IoC container. IGuardianNewsService, INYTimesNewsService, etc. This keeps sources decoupled from service implementations and makes mocking in tests straightforward.
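The bindings amount to a few lines in a service provider. The interface and class names below follow the pattern described above; the exact wiring is an assumption.

```php
class AppServiceProvider extends ServiceProvider
{
    public function register(): void
    {
        // One interface-to-implementation binding per source service.
        $this->app->bind(IGuardianNewsService::class, GuardianNewsService::class);
        $this->app->bind(INYTimesNewsService::class, NYTimesNewsService::class);
        $this->app->bind(IESPNNewsService::class, ESPNNewsService::class);
        $this->app->bind(IBBCNewsService::class, BBCNewsService::class);
    }
}
```

In tests, swapping a binding for a mock is then a one-liner with `$this->app->bind()` or `$this->mock()`.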
The CLI Command
php artisan app:fetch --source=guardian --max-items=200

The DocumentSource enum drives source resolution:
enum DocumentSource: string
{
    case GUARDIAN = 'guardian';
    case NYTIMES = 'nytimes';
    case ESPN = 'espn';
    case BBC = 'bbc';

    public function getHandler(): ISource
    {
        return app(match ($this) {
            DocumentSource::GUARDIAN => Guardian::class,
            DocumentSource::NYTIMES => NYTimes::class,
            DocumentSource::ESPN => ESPN::class,
            DocumentSource::BBC => BBC::class,
        });
    }
}

Adding a new source means: new service, new source class, new enum case. Nothing else changes.
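With the enum doing the resolution, the command itself reduces to a few lines; roughly (a sketch, not the actual handle() method):

```php
public function handle(): int
{
    // Resolve the enum case from --source; DocumentSource::from()
    // throws on an unknown value, which is the desired behavior.
    $source = DocumentSource::from($this->option('source'));
    $maxItems = $this->option('max-items');

    $source->getHandler()->fetch($maxItems !== null ? (int) $maxItems : null);

    return self::SUCCESS;
}
```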
The API
The document list endpoint is powered by spatie/laravel-query-builder, which gives the client control over what data comes back and how it's filtered, all from query parameters, with no extra controller code.
Filtering: custom filter classes handle each concern:
- filter[title]: partial title match
- filter[tag-slug]: articles tagged with a given slug
- filter[author-slug]: articles by a given author
- filter[source-type]: by source (guardian, nytimes, espn, bbc)
- filter[published-from] / filter[published-to]: date range
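Wired together, the list action might look like this. The `QueryBuilder` and `AllowedFilter` calls are spatie/laravel-query-builder's real API; the custom filter class names are assumptions matching the filters above.

```php
public function index(): AnonymousResourceCollection
{
    $documents = QueryBuilder::for(Document::class)
        ->allowedFilters([
            AllowedFilter::partial('title'),
            AllowedFilter::custom('tag-slug', new TagSlugFilter()),
            AllowedFilter::custom('author-slug', new AuthorSlugFilter()),
            AllowedFilter::exact('source-type', 'source_type'),
            AllowedFilter::custom('published-from', new PublishedFromFilter()),
            AllowedFilter::custom('published-to', new PublishedToFilter()),
        ])
        ->allowedFields(['title', 'slug', 'published_at', 'content', 'image'])
        ->allowedIncludes(['authors', 'tags'])
        ->allowedSorts(['published_at', 'title'])
        ->defaultSort('-published_at')
        ->paginate();

    return DocumentResource::collection($documents);
}
```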
Field selection: the client can request only the columns it needs:
GET /api/documents?fields[documents]=title,slug,published_at

By default, the content column is excluded from list responses (articles can be large). It's only returned on the single-document endpoint or when the client explicitly requests it via fields.
Includes: relationships are opt-in, not always loaded:
GET /api/documents?include=authors,tags

Sorting: prefix with - for descending:
GET /api/documents?sort=-published_at

The result is a flexible, frontend-friendly API where the client fetches exactly what it needs, nothing more.
OpenAPI docs are auto-generated by dedoc/scramble, no manual spec maintenance needed.
Performance: Laravel Octane
The app runs on Laravel Octane. The difference from a standard PHP setup is worth explaining.
With a traditional PHP-FPM setup, every HTTP request boots Laravel from scratch, loads the framework, registers service providers, resolves bindings, then handles the request and discards everything. That bootstrap cost is paid on every single request.
Octane starts Laravel once and keeps it alive as a long-running process. Requests are handled by the already-booted application, so the framework overhead is paid once at startup, not per request. The result is significantly lower CPU and memory usage under load, and much higher throughput: the same server handles more concurrent requests without needing more resources.
For a news API where the application state is read-heavy and mostly stateless between requests, this is a straightforward win.
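Starting the server is a single command. The flags below are real octane:start options, but the server choice and worker count here are illustrative, not the project's actual settings.

```shell
# Octane can run on Swoole, RoadRunner, or FrankenPHP; worker count
# is typically tuned to CPU cores, and --max-requests recycles
# workers periodically to guard against memory leaks.
php artisan octane:start --server=swoole --workers=4 --max-requests=500
```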
Database Schema
documents : source_type, source_id, title, content, slug, image, published_at
authors : name, slug
tags : title, slug
document_author: pivot (cascades on delete)
document_tag : pivot with role enum (KEYWORD / CATEGORY)

Authors and tags are deduplicated by slug using firstOrCreate. The same author appearing across multiple articles shares one row.
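The document_tag pivot from the schema above could be created with a migration along these lines (a sketch; exact column types are assumed, with role stored as a string backed by the enum):

```php
Schema::create('document_tag', function (Blueprint $table) {
    $table->foreignId('document_id')->constrained()->cascadeOnDelete();
    $table->foreignId('tag_id')->constrained()->cascadeOnDelete();
    $table->string('role'); // KEYWORD or CATEGORY, from the role enum
    $table->primary(['document_id', 'tag_id']);
});
```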
Testing
Tests are split by layer:
- Service tests: MockHandler for Guzzle, no real HTTP. Tests that the service maps API responses to correct DTOs.
- Source tests: mock the service, test pagination logic, deduplication, error handling.
- Controller tests: full HTTP feature tests with RefreshDatabase on in-memory SQLite.
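A service test in this style might look like the following. MockHandler and HandlerStack are Guzzle's real test utilities; the fixture path, the `fetchPage()` helper, and the class names are assumptions.

```php
public function test_maps_guardian_response_to_dtos(): void
{
    // Queue a canned response; no real HTTP leaves the test.
    $mock = new MockHandler([
        new Response(200, [], file_get_contents(__DIR__.'/fixtures/guardian.json')),
    ]);
    $client = new Client(['handler' => HandlerStack::create($mock)]);

    $service = new GuardianNewsService($client);
    $items = $service->fetchPage(1); // hypothetical helper

    $this->assertInstanceOf(GuardianNewsItemDTO::class, $items->first());
    $this->assertCount(0, $mock); // every queued response was consumed
}
```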
Several bugs surfaced only because of this coverage: the authors() relation pointed at the wrong pivot table name, two filter classes called whereHas with singular relation names, and a form request read the wrong route parameter. All caught and fixed. This is exactly why you write tests.
What I'd Do Differently
Scheduler: Right now you run the fetch command manually. In production you'd schedule it with php artisan schedule:work or a cron. Trivial to add; I left it out to keep the scope clean.
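In Laravel 11+ that's one line per source in routes/console.php (a sketch; the hourly cadence is illustrative):

```php
use Illuminate\Support\Facades\Schedule;

// Stagger or tune per source as needed; hourly is just an example.
Schedule::command('app:fetch --source=guardian')->hourly();
Schedule::command('app:fetch --source=nytimes')->hourly();
Schedule::command('app:fetch --source=espn')->hourly();
Schedule::command('app:fetch --source=bbc')->hourly();
```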
Queue: Fetching hundreds of articles per source is currently synchronous. For production, each source fetch should be a queued job.
Full article body for NYTimes: The NYT API doesn't return article body text, only abstracts. A real implementation would scrape or use a different endpoint. I stored what the API gives.
ESPN stability: The unofficial API could change without notice. Guardian and NYTimes have stable, documented APIs. Worth replacing ESPN with a proper source if this went to production.
Closing
The assignment took 2 days. The interesting part wasn't the Laravel plumbing (that's routine); it was the source selection problem. Half the suggested APIs don't exist anymore, and BBC RSS is a legitimate engineering decision, not a cop-out. Sometimes the right answer to "integrate with X API" is "X API is gone; here's what I did instead and why."
The code is on GitHub: itsmattius/briefing