TextSorter

How to Remove Duplicate Lines from Text, Excel, and Google Sheets


Duplicate lines are one of those problems that shouldn’t be a big deal but somehow always are. You merge two customer lists and suddenly you’ve got 400 people listed twice. You export data from a CRM and half the entries are repeated. You paste a keyword list from three different sources and it’s a mess.

The good news: removing them takes about 3 seconds. The bad news: most people spend way too long trying to do it manually or fighting with spreadsheet formulas.

The Fastest Way: An Online Tool

If you just want duplicates gone and don’t need a lecture about it, here’s the move:

  1. Open the Remove Duplicates tool on TextSorter
  2. Paste your text (one item per line)
  3. Done. Duplicates are gone.

The tool keeps the first occurrence of each line and removes all the copies. It preserves your original order. It runs in your browser, so your data never goes anywhere. And yes, it handles lists with tens of thousands of lines without choking.
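“Keep the first occurrence, preserve the order” is a simple idea under the hood. Here’s a minimal JavaScript sketch of it. It’s an illustration, not TextSorter’s actual code, and the function name dedupeLines is made up:

```javascript
// Keep the first occurrence of each line, drop later copies, keep original order.
function dedupeLines(text) {
  const seen = new Set(); // lines we've already kept
  const kept = [];
  for (const line of text.split(/\r?\n/)) { // handles Windows and Unix line endings
    if (!seen.has(line)) {
      seen.add(line);
      kept.push(line);
    }
  }
  return kept.join("\n");
}

console.log(dedupeLines("pear\napple\npear\nfig\napple")); // logs pear, apple, fig on separate lines
```

A Set lookup is fast, so a single pass like this stays quick even on long lists.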

But if you want to understand the different ways to do this across different tools, keep reading. There are some tricks that’ll save you serious time.

Why Duplicates Show Up in the First Place

Before we get into the how, it’s worth understanding the why. Duplicates aren’t random. They almost always come from one of these situations:

Merging lists from multiple sources. You’ve got a mailing list from your website, another from a trade show, and a third from a purchased database. Combine them and, surprise, some people are on all three lists.

Copy-paste accidents. You pasted the same block of text twice without noticing. Happens all the time when you’re working fast.

Database exports with joins. If you export data from a database and your query involves a join, you can get row multiplication. One customer with three orders becomes three rows of that customer.

Scraping and crawling. Web scraping tools can revisit pages or encounter the same content through different URLs. Your extracted data ends up with duplicates.

Log files. Server logs, application logs, and error logs often contain repeated entries, especially if a system retries failed operations.

Survey and form responses. People sometimes submit forms twice (double-clicking the submit button is practically a sport). Some form systems catch this. Many don’t.

The scale of the problem is actually bigger than most people think. Data quality research suggests that somewhere between 10% and 30% of data in enterprise databases contains duplicates. For unstructured text data and manually compiled lists, it’s often higher.

Removing Duplicates in Plain Text (Online)

This is the simplest scenario. You have a list of items, one per line, and you want the duplicates gone.

TextSorter’s Remove Duplicates tool handles this with one click. Here’s what makes it better than just doing a find-and-replace:

It preserves order. The first occurrence of each line stays exactly where it was. You’re not re-sorting or shuffling anything.

It handles case sensitivity. Toggle between treating “apple” and “Apple” as the same (case-insensitive) or different (case-sensitive). For most real-world cleanup, you want case-insensitive.

It shows you what happened. You can see how many duplicates were removed, so you know the scale of the problem.

It handles whitespace. Leading and trailing spaces, which make two visually identical lines technically “different,” are trimmed before lines are compared.

Here’s a practical example. You start with this:

banana
Apple
cherry
apple
BANANA
Cherry
banana

Case-insensitive deduplication gives you:

banana
Apple
cherry

Three unique items. Four duplicates removed. Two seconds of your time.
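If you want to reproduce that case-insensitive result yourself, here’s a short JavaScript sketch. The function and its option names (caseSensitive, trim) are hypothetical, not TextSorter’s settings. It trims each line, lowercases it to build a comparison key, and keeps the first line that produces each key, exactly as written:

```javascript
// Order-preserving dedup with trim and case options, plus a count of removed lines.
function removeDuplicates(lines, { caseSensitive = false, trim = true } = {}) {
  const seen = new Set();
  const kept = [];
  for (const line of lines) {
    let key = trim ? line.trim() : line;         // ignore leading/trailing spaces
    if (!caseSensitive) key = key.toLowerCase(); // "Apple" and "apple" share a key
    if (!seen.has(key)) {
      seen.add(key);
      kept.push(line);                           // keep the first occurrence as written
    }
  }
  return { kept, removed: lines.length - kept.length };
}

const input = ["banana", "Apple", "cherry", "apple", "BANANA", "Cherry", "banana"];
const result = removeDuplicates(input);
console.log(result.kept, result.removed); // kept: banana, Apple, cherry; removed: 4
```

Passing { caseSensitive: true } would keep “Apple” and “apple” as separate items.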

Removing Duplicates in Excel

Excel has built-in duplicate removal, but there are actually three different methods, each useful in different situations.

Method 1: The Remove Duplicates Button

This is the most common approach:

  1. Select the range of cells containing your data
  2. Go to the Data tab on the ribbon
  3. Click Remove Duplicates
  4. A dialog box appears. Check which columns should be compared. If you check all columns, rows must match in every column to count as duplicates. If you check just one column (like email), rows are considered duplicates if that one column matches, even if other columns differ.
  5. Click OK. Excel tells you how many duplicates were removed.
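The column checkboxes boil down to choosing which cells make up the comparison key. Here’s a rough JavaScript sketch of that idea with made-up sample data. removeDuplicateRows is a hypothetical name, this version keeps the first row of each duplicate group, and it doesn’t try to reproduce Excel’s exact comparison rules (such as how it treats letter case):

```javascript
// Rows count as duplicates when the chosen key columns match,
// even if other columns differ. The first row of each group is kept.
function removeDuplicateRows(rows, keyColumns) {
  const seen = new Set();
  return rows.filter((row) => {
    // JSON.stringify gives an unambiguous key for the selected cells
    const key = JSON.stringify(keyColumns.map((c) => row[c]));
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

// Columns: 0 = name, 1 = email, 2 = city (sample data, not real customers)
const customers = [
  ["Ana", "ana@example.com", "Lisbon"],
  ["Ana P.", "ana@example.com", "Porto"],
  ["Ben", "ben@example.com", "Leeds"],
];
console.log(removeDuplicateRows(customers, [1]).length);       // email only: 2 rows
console.log(removeDuplicateRows(customers, [0, 1, 2]).length); // all columns: 3 rows
```

Checking only the email column merges the two Ana rows even though their names and cities differ; checking every column keeps all three.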

The catch: this modifies your data in place. If you make a mistake, you need to undo immediately. There’s no “preview” mode. So always work on a copy of your data if you’re not sure.

Method 2: The UNIQUE Function (Microsoft 365 and Excel 2021 or later)

If you have a modern version of Excel, the UNIQUE function is beautiful:

=UNIQUE(A2:A100)

This returns a dynamic array of unique values in a new location. Your original data stays untouched. The function updates automatically if you change the source data.

You can even get fancy with it:

=UNIQUE(A2:C100, FALSE, FALSE)

The second argument, by_col, is FALSE, so rows are compared with each other (TRUE would compare columns instead). The third argument, exactly_once, is FALSE, so every distinct row is returned once; set it to TRUE to return only the rows that appear exactly once in the source.
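To make the difference between the two settings of the third argument concrete, here’s a rough JavaScript emulation of UNIQUE over rows with the by-column option off. uniqueRows is a hypothetical name, and it only mimics the documented meaning of exactly_once; it doesn’t copy Excel’s own comparison rules (case handling, for instance):

```javascript
// Rough emulation of UNIQUE(range, FALSE, exactly_once) over rows.
function uniqueRows(rows, exactlyOnce = false) {
  const counts = new Map(); // row key -> number of occurrences
  const order = [];         // first occurrence of each distinct row, in order
  for (const row of rows) {
    const key = JSON.stringify(row);
    if (!counts.has(key)) order.push(row);
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  // exactly_once = TRUE keeps only rows that occurred a single time
  return exactlyOnce
    ? order.filter((row) => counts.get(JSON.stringify(row)) === 1)
    : order;
}

const data = [["a", 1], ["b", 2], ["a", 1]];
console.log(uniqueRows(data));       // rows a,1 and b,2
console.log(uniqueRows(data, true)); // only row b,2
```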

Method 3: Conditional Formatting to Highlight Duplicates

Sometimes you don’t want to remove duplicates; you just want to see them. Select your range, go to Home > Conditional Formatting > Highlight Cells Rules > Duplicate Values. Excel highlights every value that appears more than once, including its first occurrence, in a color of your choice. Then you can review them manually before deciding what to delete.

This is really useful when some duplicates might be legitimate, for example two different customers who share a name but have different email addresses. The highlight lets you investigate before deleting.
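Outside a spreadsheet, the same highlight-then-review idea means flagging duplicates instead of deleting them. Here’s a minimal JavaScript sketch (flagDuplicates is a hypothetical helper) that, like Excel’s highlighting rule, marks every occurrence of a repeated value rather than just the later copies:

```javascript
// Flag every value that appears more than once, without removing anything.
function flagDuplicates(values) {
  const counts = new Map();
  for (const v of values) counts.set(v, (counts.get(v) ?? 0) + 1);
  return values.map((v) => ({ value: v, duplicate: counts.get(v) > 1 }));
}

console.log(flagDuplicates(["Ann", "Bob", "Ann"]));
// Ann is flagged both times, Bob is not
```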

Removing Duplicates in Google Sheets

Google Sheets has caught up to Excel on this front. Here are the options:

Method 1: Remove Duplicates Feature

Select your range, then Data > Data cleanup > Remove duplicates. You get a dialog similar to Excel’s, where you choose which columns to check. It tells you how many duplicates it found and removed.

Method 2: The UNIQUE Function

Google Sheets has had the UNIQUE function longer than Excel:

=UNIQUE(A2:A100)

Same behavior: returns unique values in a new column without touching the original data. It’s dynamic, meaning the output updates when the source changes.

Method 3: Apps Script for Complex Deduplication

For power users, Google Apps Script can handle more complex scenarios:

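function removeDuplicates() {
  var sheet = SpreadsheetApp.getActiveSheet();
  var data = sheet.getDataRange().getValues();
  var seen = new Set();   // keys of rows already kept (Set needs the V8 runtime)
  var unique = [];

  for (var i = 0; i < data.length; i++) {
    // JSON.stringify builds an unambiguous key for the whole row; joining with '|'
    // would wrongly merge rows such as ["a|b", "c"] and ["a", "b|c"].
    var key = JSON.stringify(data[i]);
    if (!seen.has(key)) {
      seen.add(key);
      unique.push(data[i]);
    }
  }

  // Rewrites the sheet in place: formulas are replaced by their current values.
  // The first row (e.g. a header) is always kept, since it is a first occurrence.
  sheet.clearContents();
  sheet.getRange(1, 1, unique.length, unique[0].length).setValues(unique);
}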

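As written, this compares entire rows with exact, case-sensitive matching. Because it’s plain code, it’s a starting point for custom logic: change how the key is built to get case-insensitive matching, ignore certain columns, or plug in fuzzy matching. It rewrites the sheet in place, so run it on a copy first.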
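For example, the key in the script above could come from a helper that ignores case and stray spaces in text cells. Here’s a standalone JavaScript sketch you can test outside Apps Script. rowKey and dedupeRows are hypothetical names, and trimming plus lowercasing is an assumption about the custom logic you want:

```javascript
// Hypothetical key builder: "  Alice " and "alice" count as the same cell value.
// Non-string cells (numbers, dates, booleans) are compared as-is.
function rowKey(row) {
  return JSON.stringify(
    row.map((cell) => (typeof cell === "string" ? cell.trim().toLowerCase() : cell))
  );
}

// Keep the first row for each normalized key, preserving order.
function dedupeRows(rows) {
  const seen = new Set();
  return rows.filter((row) => {
    const key = rowKey(row);
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

console.log(dedupeRows([["Alice", 1], ["  alice ", 1], ["Bob", 2]]).length); // 2
```

Inside Apps Script, rowKey(data[i]) would take the place of the original key line.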

Command Line Methods (For Developers)

If you’re comfortable with the terminal, these are blazing fast for large files.

Linux/macOS

sort file.txt | uniq > unique_file.txt

Classic Unix pipeline: sort puts identical lines next to each other, then uniq removes adjacent duplicates. (sort -u file.txt does both in one step.) Note that this changes the order of your file because of the sort.
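Here’s the same sort-then-drop-neighbors idea as a small JavaScript sketch (sortUniq is a hypothetical name). One caveat it makes visible: JavaScript’s default sort compares UTF-16 code units, while the sort command uses your locale’s collation rules, so the two can order lines differently.

```javascript
// Mimics `sort file.txt | uniq`: sort so identical lines become neighbors,
// then drop any line equal to the one right before it.
function sortUniq(lines) {
  const sorted = [...lines].sort(); // code-unit order, not locale collation
  return sorted.filter((line, i) => i === 0 || line !== sorted[i - 1]);
}

console.log(sortUniq(["pear", "apple", "pear", "fig"])); // apple, fig, pear
```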

To remove duplicates while preserving original order:

awk '!seen[$0]++' file.txt > unique_file.txt

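This awk one-liner keeps an associative array (a hash) counting how many times each line has appeared. The first occurrence passes through; later occurrences are skipped. It’s fast even on files with millions of lines, but it holds every distinct line in memory, so a huge file of mostly unique lines can exhaust RAM. In that case sort -u, which can spill to temporary files on disk, is the safer choice, at the cost of losing the original order.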
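The trick is in !seen[$0]++. Here’s a JavaScript sketch that spells out the same logic step by step (awkStyleDedupe is a hypothetical name):

```javascript
// Mirrors awk '!seen[$0]++': a line's count starts at 0, so on its first
// appearance !0 is true and the line is kept; the post-increment then makes
// every later copy evaluate to false.
function awkStyleDedupe(lines) {
  const seen = new Map(); // one entry per distinct line
  const out = [];
  for (const line of lines) {
    const count = seen.get(line) ?? 0; // value before the increment, like seen[$0]++
    seen.set(line, count + 1);
    if (!count) out.push(line);        // only the first occurrence passes
  }
  return out;
}

console.log(awkStyleDedupe(["b", "a", "b", "c", "a"])); // b, a, c
```

The map holds one entry per distinct line, which is exactly why memory grows with the number of unique lines.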

Windows PowerShell

Get-Content file.txt | Select-Object -Unique | Set-Content unique_file.txt

Or the sort-based approach:

Get-Content file.txt | Sort-Object -Unique | Set-Content unique_file.txt

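The first one preserves order and compares case-sensitively, so “Apple” and “apple” both survive. The second one sorts and deduplicates, and because Sort-Object ignores case by default, it also treats “Apple” and “apple” as duplicates.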

When Deduplication Gets Tricky

Simple exact-match deduplication works great for most cases. But sometimes the real world makes it complicated:

Near-duplicates. “John Smith” and “John  Smith” (with a double space) look the same at a glance but are technically different strings. Good dedup tools normalize whitespace; basic ones don’t.

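Encoding differences. “café” can be stored with a single precomposed “é” (U+00E9) or as a plain “e” followed by a combining acute accent (U+0301). They render identically but are different character sequences, so a plain string comparison treats them as different. Unicode normalization (for example, converting both to the NFC form) fixes this.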

Trailing punctuation. “hello” and “hello,” and “hello.” might or might not be considered duplicates depending on your use case.

Mixed case. We already covered this, but it’s worth repeating: case-insensitive matching catches far more duplicates in real-world data.

Leading/trailing whitespace. A line with an invisible space at the end looks identical to one without it, but string comparison says they’re different. Always trim before comparing.
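Here’s a JavaScript sketch of a normalize-then-compare approach covering those cases. normalizeLine and dedupeNormalized are hypothetical names, and the exact steps (NFC normalization, whitespace collapsing, lowercasing, optional trailing-punctuation removal) are assumptions you should adjust to your data:

```javascript
// Build a comparison key that ignores the trivial differences listed above.
function normalizeLine(line, { stripTrailingPunctuation = false } = {}) {
  let s = line.normalize("NFC");     // composed and decomposed "é" become identical
  s = s.trim().replace(/\s+/g, " "); // trim ends, collapse runs of whitespace
  if (stripTrailingPunctuation) s = s.replace(/[.,;:!?]+$/, ""); // judgment call
  return s.toLowerCase();            // case-insensitive comparison
}

// Keep the first original line for each normalized key.
function dedupeNormalized(lines, options) {
  const seen = new Set();
  return lines.filter((line) => {
    const key = normalizeLine(line, options);
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

console.log(dedupeNormalized(["John Smith", "John  Smith", " john smith "])); // only "John Smith"
console.log(dedupeNormalized(["caf\u00e9", "cafe\u0301"]).length);            // 1
```

Punctuation stripping is off by default because “hello” and “hello.” are only duplicates in some use cases.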

This is another reason why a purpose-built tool beats a raw find-and-replace. TextSorter’s Remove Duplicates tool handles whitespace trimming and case sensitivity options out of the box.

What About Fuzzy Deduplication?

Fuzzy matching is when you want to catch near-duplicates that aren’t exact matches. Like “Jon Smith” and “John Smith.” Or “123 Main St” and “123 Main Street.”

This goes beyond what simple text tools do and usually requires specialized software or libraries (like Python’s fuzzywuzzy or rapidfuzz). If you’re dealing with fuzzy duplicates in a business context, you’re looking at data quality platforms or some custom scripting.
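To give a feel for the core idea, here’s a toy JavaScript sketch based on Levenshtein edit distance (the number of single-character insertions, deletions, and substitutions between two strings). It is not how rapidfuzz or fuzzywuzzy work internally, the function names are hypothetical, and the 0.85 threshold is an arbitrary assumption. It also compares every pair, so it’s only practical for small lists:

```javascript
// Classic dynamic-programming edit distance, keeping one previous row.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Similarity in [0, 1]; 1 means identical.
function similarity(a, b) {
  const longest = Math.max(a.length, b.length);
  return longest === 0 ? 1 : 1 - levenshtein(a, b) / longest;
}

// Keep a line only if it isn't too similar to a line already kept.
function fuzzyDedupe(lines, threshold = 0.85) {
  const kept = [];
  for (const line of lines) {
    const norm = line.toLowerCase();
    if (!kept.some((k) => similarity(k.toLowerCase(), norm) >= threshold)) kept.push(line);
  }
  return kept;
}

console.log(fuzzyDedupe(["John Smith", "Jon Smith", "Mary Jones"])); // John Smith, Mary Jones
```

“Jon Smith” scores 0.9 against “John Smith” and gets merged, but “123 Main St” vs “123 Main Street” scores only about 0.73, so abbreviations like that usually need a normalization step (expanding “St” to “Street”) rather than a looser threshold.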

For most everyday text cleanup, exact matching (with case-insensitive option) handles 95% of scenarios. The other 5% is a whole different rabbit hole.

Quick Comparison: Which Method to Use

| Scenario | Best Method |
|---|---|
| Quick cleanup of a pasted list | Online tool (TextSorter) |
| Spreadsheet with multiple columns | Excel/Sheets Remove Duplicates |
| Need to keep original data intact | UNIQUE function |
| Large file (millions of lines) | Command line (awk) |
| Complex matching rules | Custom script |
| Just want to see duplicates, not remove | Conditional formatting |

The One Tip That Saves the Most Time

Before deduplicating, clean your data first. Trim whitespace, standardize case, remove extra spaces. Then deduplicate. You’ll catch way more duplicates this way because you’ve eliminated the trivial differences that make identical items look different.
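You can check this on your own data by counting exact duplicates before and after a cleaning pass. Here’s a small JavaScript sketch with made-up sample lines; clean and countDuplicates are hypothetical helpers that mirror the steps above:

```javascript
// Exact duplicates = total lines minus distinct lines.
function countDuplicates(lines) {
  return lines.length - new Set(lines).size;
}

// Cleaning pass: trim, collapse extra spaces, standardize case.
const clean = (line) => line.trim().replace(/\s+/g, " ").toLowerCase();

const raw = ["Alice Smith", "alice smith ", "Alice  Smith", "Bob Lee"];
console.log(countDuplicates(raw));            // 0: every line differs somewhere
console.log(countDuplicates(raw.map(clean))); // 2: three lines collapse into one
```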

TextSorter has tools for this workflow: start with the Clean Text tool to normalize your text, then use Remove Duplicates to strip the copies. Two steps, and your list is pristine.
